Running & Listing Simulations

Preparing Simulation Runs

It is very often the case that you want to run "batch simulations", i.e. just submit a bunch of different simulations, all using same algorithms and code but just different parameters. This scenario always requires the user to prepare a set of simulation parameter containers which are then passed into some kind of "main" function that starts the simulation.

To make the preparation part simpler we provide the following functionality:

DrWatson.dict_list — Function

dict_list(c::AbstractDict)

Expand the dictionary c into a vector of dictionaries. Each entry has a unique combination from the product of the Vector values of the dictionary while the non-Vector values are kept constant for all possibilities. The keys of the entries are the same.

Whether the values of c are iterable or not is of no concern; the function considers as "iterable" only subtypes of Vector.

To restrict some values in the dictionary so that they only appear in the resulting dictionaries, if a certain condition is met, the macro @onlyif can be used on those values.

Use the function dict_list_count to get the number of dictionaries that dict_list will produce.

Examples

julia> c = Dict(:a => [1, 2], :b => 4);

julia> dict_list(c)
2-element Array{Dict{Symbol,Int64},1}:
 Dict(:a=>1,:b=>4)
 Dict(:a=>2,:b=>4)

julia> c[:model] = "linear"; c[:run] = ["bi", "tri"];

julia> dict_list(c)
4-element Array{Dict{Symbol,Any},1}:
 Dict(:a=>1,:b=>4,:run=>"bi",:model=>"linear")
 Dict(:a=>2,:b=>4,:run=>"bi",:model=>"linear")
 Dict(:a=>1,:b=>4,:run=>"tri",:model=>"linear")
 Dict(:a=>2,:b=>4,:run=>"tri",:model=>"linear")

julia> c[:e] = [[1, 2], [3, 5]];

julia> dict_list(c)
8-element Array{Dict{Symbol,Any},1}:
 Dict(:a=>1,:b=>4,:run=>"bi",:e=>[1, 2],:model=>"linear")
 Dict(:a=>2,:b=>4,:run=>"bi",:e=>[1, 2],:model=>"linear")
 Dict(:a=>1,:b=>4,:run=>"tri",:e=>[1, 2],:model=>"linear")
 Dict(:a=>2,:b=>4,:run=>"tri",:e=>[1, 2],:model=>"linear")
 Dict(:a=>1,:b=>4,:run=>"bi",:e=>[3, 5],:model=>"linear")
 Dict(:a=>2,:b=>4,:run=>"bi",:e=>[3, 5],:model=>"linear")
 Dict(:a=>1,:b=>4,:run=>"tri",:e=>[3, 5],:model=>"linear")
 Dict(:a=>2,:b=>4,:run=>"tri",:e=>[3, 5],:model=>"linear")

source

DrWatson.dict_list_count — Function

dict_list_count(c) -> N

Return the number of dictionaries that will be created by calling dict_list(c).

source

DrWatson.@onlyif — Macro

@onlyif(ex, value)

Tag value to only appear in a dictionary created with dict_list if the Julia expression ex (see below) is evaluated as true. If value is a subtype of Vector, @onlyif is applied to each entry. Since @onlyif is applied to a value and not to a dictionary key, it is possible to restrict only some of the values of a vector. This means that based on ex the number of options for a particular key varies.

Within ex it is possible to extract values of the dictionary passed to dict_list by a shorthand notation where only the key must be provided. For example ex = :(:N == 1) is tranformed in the call dict_list(d) to an expression analogous to :(d[:N] == 1) by using the function lookup_candidate. This is supported for Symbol and String keys.

Examples

julia> d = Dict(:a => [1, 2], :b => 4, :c => @onlyif(:a == 1, [10, 11]));

julia> dict_list(d) # only in case `:a` is `1` the dictionary will get key `:c`
3-element Array{Dict{Symbol,Int64},1}:
 Dict(:a => 1,:b => 4,:c => 10)
 Dict(:a => 1,:b => 4,:c => 11)
 Dict(:a => 2,:b => 4)

 julia> d = Dict(:a => [1, 2], :b => 4, :c => [10, @onlyif(:a == 1, 11)]);

 julia> dict_list(d) # only in case `:a` is `1` the dictionary will get extra value `11` for key `:c`
 3-element Array{Dict{Symbol,Int64},1}:
 Dict(:a => 1,:b => 4,:c => 10)
 Dict(:a => 1,:b => 4,:c => 11)
 Dict(:a => 2,:b => 4,:c => 10)

See the Defining parameter sets with restrictions section for more examples.

source

Using the above function means that you can write your "preparation" step into a single dictionary and then let it automatically expand into many parameter containers. This keeps the code cleaner but also consistent, provided that it follows one simple rule: Anything that is a Vector has many parameters, otherwise it is one parameter. dict_list considers this true irrespectively of what the Vector contains. This allows users to use any iterable custom type as a single "parameter" of a simulation.

See the Preparing & running jobs for a very convenient application!

Saving Temporary Dictionaries

The functionality of dict_list is great, but can fall short in cases of submitting jobs to a computer cluster. For serial clusters, each run is submitted to a different Julia process and thus one cannot propagate a Julia in-memory Dict (for parallel clusters using pmap is fine).

To balance this, we have here some simple functionality that stores the result of dict_list (or any other dictionary collection, really) to files with temporary names. The names are returned and can then be propagated into a main-like Julia process that can take the temp-name as an input, load the dictionary and then extract the data.

DrWatson.tmpsave — Function

tmpsave(dicts::Vector{Dict} [, tmp]; kwargs...) -> r

Save each entry in dicts into a unique temporary file in the directory tmp. Then return the list of file names (relative to tmp) that were used for saving each dictionary. Each dictionary can then be loaded back by calling

wload(nth_tmpfilename, "params")

tmp defaults to projectdir("_research", "tmp").

Collecting Results

Requires `DataFrames`

The function collect_results! is only available if you do using DataFrames in your Julia session.

There are cases where you have saved a bunch of simulation results in a bunch of different files in a folder. It is useful to be able to collect all of these results into a single table, in this case a DataFrame. The function collect_results! provides this functionality. Importantly, the function is "future-proof" which means that it works nicely even if you add new parameters or remove old parameters from your results as your project progresses!

DrWatson.collect_results! — Function

collect_results!([filename,] folder; kwargs...) -> df

Search the folder (and possibly all subfolders) for new result-files and add them to df which is a DataFrame containing all the information from each result-file. If a result-file is missing keys that are already columns in df, they will be set as missing. If on the other hand new keys are encountered, a new column will be added and filled with missing for all previous entries.

If no file exists in filename, then df will be saved there. If however filename exists, the existing df will be first loaded and then reused. The reused df has some results already collected: files already included in df are skipped in subsequent calls to collect_results! while new result-files are simply appended to the dataframe.

filename defaults to:

filename = joinpath(dirname(folder), "results_$(basename(folder)).jld2")

See also collect_results.

Warning

df contains a column :path which is the path where each result-file is saved to. This is used to not reload and reprocess files already present in df when searching for new ones.

If you have an entry :path in your saved result-files this will probably break collect_results (untested).

Keyword Arguments

subfolders::Bool = false : If true also scan all subfolders of folder for result-files.
valid_filetypes = [".bson", ".jld", ".jld2"]: Only files that have these endings are interpreted as result-files. Other files are skipped.
rpath = nothing : If not nothing stores relpath(file,rpath) of result-files in df. By default the absolute path is used.
verbose = true : Print (using @info) information about the process.
update = false : Update data from modified files and remove entries for deleted files.
white_list : List of keys to use from result file. By default uses all keys from all loaded result-files.
black_list = [:gitcommit, :gitpatch, :script]: List of keys not to include from result-file.
special_list = []: List of additional (derived) key-value pairs to put in df as explained below.

special_list is a Vector where each entry is a derived quantity to be included in df. There are two types of entries. The first option is of the form key => func where the key is a symbol to be used as column name in the DataFrame. The function entry always takes a single argument, which is the loaded result-file (a dictionary). The second option is to provide just one function func. This function also takes the single dictionary argument but returns one or more key => value pairs. This second notation may be useful when one wants to extract values for multiple columns in a single step. As an example consider that each result-file contains a field :longvector too large to be included in the df. The quantity of interest is the mean and the variance of said field. To have these values in your results first use black_list = [:longvector] and then define

special_list = [ :lv_mean => data -> mean(data[:longvector]),
                 :lv_lar  => data -> var(data[:longvector]) ]

In case this operation fails the values will be treated as missing.

source

DrWatson.collect_results — Function

collect_results(folder; kwargs...) -> df

Do exactly the same as collect_results! but don't care to load (or later save) an existing dataframe. Thus all found results files are processed.

source

For an example of using this functionality please have a look at the Real World Examples page!