Running & Listing Simulations

Preparing Simulation Runs

It is very often the case that you want to run "batch simulations", i.e. submit a set of simulations that all use the same algorithms and code but different parameters. This scenario requires you to prepare a set of simulation parameter containers, which are then passed into some kind of "main" function that starts the simulation.

To make the preparation part simpler we provide the following functionality:

DrWatson.dict_list (Function)
dict_list(c::AbstractDict)

Expand the dictionary c into a vector of dictionaries. Each entry has a unique combination from the product of the Vector values of the dictionary while the non-Vector values are kept constant for all possibilities. The keys of the entries are the same.

Whether the values of c are iterable or not is of no concern; the function considers as "iterable" only subtypes of Vector.

To restrict some values so that they appear in the resulting dictionaries only if a certain condition is met, use the macro @onlyif on those values.

Use the function dict_list_count to get the number of dictionaries that dict_list will produce.

Examples

julia> c = Dict(:a => [1, 2], :b => 4);

julia> dict_list(c)
2-element Array{Dict{Symbol,Int64},1}:
 Dict(:a=>1,:b=>4)
 Dict(:a=>2,:b=>4)

julia> c[:model] = "linear"; c[:run] = ["bi", "tri"];

julia> dict_list(c)
4-element Array{Dict{Symbol,Any},1}:
 Dict(:a=>1,:b=>4,:run=>"bi",:model=>"linear")
 Dict(:a=>2,:b=>4,:run=>"bi",:model=>"linear")
 Dict(:a=>1,:b=>4,:run=>"tri",:model=>"linear")
 Dict(:a=>2,:b=>4,:run=>"tri",:model=>"linear")

julia> c[:e] = [[1, 2], [3, 5]];

julia> dict_list(c)
8-element Array{Dict{Symbol,Any},1}:
 Dict(:a=>1,:b=>4,:run=>"bi",:e=>[1, 2],:model=>"linear")
 Dict(:a=>2,:b=>4,:run=>"bi",:e=>[1, 2],:model=>"linear")
 Dict(:a=>1,:b=>4,:run=>"tri",:e=>[1, 2],:model=>"linear")
 Dict(:a=>2,:b=>4,:run=>"tri",:e=>[1, 2],:model=>"linear")
 Dict(:a=>1,:b=>4,:run=>"bi",:e=>[3, 5],:model=>"linear")
 Dict(:a=>2,:b=>4,:run=>"bi",:e=>[3, 5],:model=>"linear")
 Dict(:a=>1,:b=>4,:run=>"tri",:e=>[3, 5],:model=>"linear")
 Dict(:a=>2,:b=>4,:run=>"tri",:e=>[3, 5],:model=>"linear")

DrWatson.@onlyif (Macro)
@onlyif(ex, value)

Tag value to only appear in a dictionary created with dict_list if the Julia expression ex (see below) evaluates to true. If value is a Vector, @onlyif is applied to each entry. Since @onlyif is applied to a value and not to a dictionary key, it is possible to restrict only some of the values of a vector. This means that the number of options for a particular key can vary depending on ex.

Within ex it is possible to extract values of the dictionary passed to dict_list by a shorthand notation where only the key must be provided. For example, ex = :(:N == 1) is transformed during the call dict_list(d) into an expression analogous to :(d[:N] == 1) by the function lookup_candidate. This is supported for Symbol and String keys.

Examples

julia> d = Dict(:a => [1, 2], :b => 4, :c => @onlyif(:a == 1, [10, 11]));

julia> dict_list(d) # only in case `:a` is `1` the dictionary will get key `:c`
3-element Array{Dict{Symbol,Int64},1}:
 Dict(:a => 1,:b => 4,:c => 10)
 Dict(:a => 1,:b => 4,:c => 11)
 Dict(:a => 2,:b => 4)

julia> d = Dict(:a => [1, 2], :b => 4, :c => [10, @onlyif(:a == 1, 11)]);

julia> dict_list(d) # only in case `:a` is `1` the dictionary will get extra value `11` for key `:c`
3-element Array{Dict{Symbol,Int64},1}:
 Dict(:a => 1,:b => 4,:c => 10)
 Dict(:a => 1,:b => 4,:c => 11)
 Dict(:a => 2,:b => 4,:c => 10)

See the Defining parameter sets with restrictions section for more examples.
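One way to picture the effect of @onlyif on a whole key is as a post-processing step on the expanded dictionaries. The sketch below is only an illustration under that assumption and does not describe DrWatson's internals. The helper restrict_key is hypothetical, and the four expanded dictionaries are written out by hand instead of calling dict_list.

```julia
# Hypothetical helper: drop `key` from every dictionary for which `cond` is
# false, then remove the duplicates that this creates.
function restrict_key(dicts, key, cond)
    restricted = [cond(d) ? d : filter(p -> first(p) != key, d) for d in dicts]
    return unique(restricted)
end

# The four combinations of :a in [1, 2] and :c in [10, 11], written by hand.
expanded = [Dict(:a => a, :b => 4, :c => c) for a in [1, 2] for c in [10, 11]]

# Keep :c only where :a == 1, mirroring the first example above.
result = restrict_key(expanded, :c, d -> d[:a] == 1)
```

The two dictionaries with :a equal to 2 become identical once :c is dropped, so only one of them survives, which gives the three entries shown in the first example.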

Using dict_list means that you can write your "preparation" step as a single dictionary and let it expand automatically into many parameter containers. This keeps the code cleaner and also consistent, provided that it follows one simple rule: anything that is a Vector holds many parameter values, and anything else is a single parameter. dict_list applies this rule irrespective of what the Vector contains. This allows users to use any iterable custom type as a single "parameter" of a simulation.
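To make the rule concrete, here is a minimal sketch of such an expansion using only Base Julia. The names expand_params and count_params are hypothetical stand-ins for illustration; they are not DrWatson's implementation of dict_list or dict_list_count, and they ignore @onlyif entirely.

```julia
# Hypothetical helper: expand Vector-valued entries into every combination,
# keeping all other values fixed.
function expand_params(c::AbstractDict)
    # Keys whose values are Vectors are the ones that vary between runs.
    varying = [k for (k, v) in c if v isa Vector]
    isempty(varying) && return [Dict{Any,Any}(c)]
    # Cartesian product over the values of all varying keys.
    combos = Iterators.product((c[k] for k in varying)...)
    return vec([merge(Dict{Any,Any}(c), Dict{Any,Any}(zip(varying, combo)))
                for combo in combos])
end

# Hypothetical helper: number of combinations without building them.
count_params(c) = prod((length(v) for v in values(c) if v isa Vector); init = 1)

c = Dict(:a => [1, 2], :b => 4, :run => ["bi", "tri"])
runs = expand_params(c)
```

Each element of runs could then be passed to your "main" simulation function, for example in a plain for loop.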

See the Preparing & running jobs section for a very convenient application!

Saving Temporary Dictionaries

The functionality of dict_list is great, but it can fall short when submitting jobs to a computer cluster. On typical clusters that use qsub or Slurm, each run is submitted to a different Julia process, so an in-memory Julia Dict cannot be passed along. (If you are already on a machine with a large number of connected processors/nodes, simply using pmap is fine.)

To address this, we provide some simple functionality that stores the result of dict_list (or any other collection of dictionaries) in files with temporary names. The names are returned and can then be passed to a main-like Julia process that takes the temporary name as input, loads the dictionary and extracts the data.

DrWatson.tmpsave (Function)
tmpsave(dicts::Vector{Dict} [, tmp]; kwargs...) -> r

Save each entry in dicts into a unique temporary file in the directory tmp. Then return the list of file names (relative to tmp) that were used for saving each dictionary. Each dictionary can then be loaded back by calling

wload(nth_tmpfilename, "params")

tmp defaults to projectdir("_research", "tmp").

See also dict_list.

Keywords

  • l = 8 : number of characters in the random string.
  • prefix = "" : prefix each temporary name will have.
  • suffix = "jld2" : ending of the temporary names (no need for the dot).
  • kwargs... : Any additional keywords are passed through to wsave (e.g. compression).

An example usage is shown in Using a Serial Cluster.
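The save-then-load pattern itself can be sketched with the Julia standard library alone. The code below is only an illustration of the idea, not how tmpsave works internally: save_params and load_params are hypothetical names, and Serialization is used in place of wsave and wload. With DrWatson you would call tmpsave and then wload(nth_tmpfilename, "params") as described above.

```julia
using Random          # standard library: randstring
using Serialization   # standard library: serialize, deserialize

# Hypothetical stand-in for the tmpsave idea: write each dictionary to a
# randomly named file in `dir` and return the file names (relative to `dir`).
function save_params(dicts, dir; l = 8, prefix = "", suffix = "jls")
    mkpath(dir)
    fnames = String[]
    for d in dicts
        fname = prefix * randstring(l) * "." * suffix
        while isfile(joinpath(dir, fname))   # regenerate on a name clash
            fname = prefix * randstring(l) * "." * suffix
        end
        serialize(joinpath(dir, fname), d)
        push!(fnames, fname)
    end
    return fnames
end

# What a separately launched "main" process would do with one file name.
load_params(dir, fname) = deserialize(joinpath(dir, fname))

dir = mktempdir()
dicts = [Dict(:a => 1, :b => 4), Dict(:a => 2, :b => 4)]
files = save_params(dicts, dir)
loaded = [load_params(dir, f) for f in files]
```

In a cluster setting each entry of files would typically be handed to a separate job as a command-line argument, and only the load step would run inside that job.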

Collecting Results

There are cases where you have saved many simulation results in different files in a folder. It is useful to be able to collect all of these results into a single table, in this case a DataFrame. The function collect_results! provides this functionality. Importantly, the function is "future-proof", which means that it works nicely even if you add new parameters or remove old parameters from your results as your project progresses!

DrWatson.collect_results! (Function)
collect_results!([filename,] folder; kwargs...) -> df
Requires `DataFrames`

The function collect_results! is only available if you do using DataFrames in your Julia session.

Search the folder (and possibly all subfolders) for new result-files and add them to df which is a DataFrame containing all the information from each result-file. If a result-file is missing keys that are already columns in df, they will be set as missing. If on the other hand new keys are encountered, a new column will be added and filled with missing for all previous entries.
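The column-filling rule above can be pictured with plain dictionaries. This is only a sketch: collect_rows is a hypothetical helper that is not part of DrWatson, and a vector of dictionaries stands in for the DataFrame.

```julia
# Hypothetical illustration of the column-filling rule.
function collect_rows(results)
    # Every key seen in any result becomes a column.
    cols = Symbol[]
    for r in results, k in keys(r)
        k in cols || push!(cols, k)
    end
    sort!(cols)
    # A result that lacks a column gets `missing` in that position.
    rows = [Dict(k => get(r, k, missing) for k in cols) for r in results]
    return cols, rows
end

old = Dict(:a => 1, :x => 0.5)
new = Dict(:a => 2, :x => 0.7, :b => "extra")   # a parameter added later
cols, rows = collect_rows([old, new])
```

Here the older result has no :b entry, so its row holds missing in that column, while the newer result contributes the new column.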

If no file exists in filename, then df will be saved there. If however filename exists, the existing df will be first loaded and then reused. The reused df has some results already collected: files already included in df are skipped in subsequent calls to collect_results! while new result-files are simply appended to the dataframe.

filename defaults to:

filename = joinpath(dirname(folder), "results_$(basename(folder)).jld2")

See also collect_results.

Don't use `:path` as a parameter name.

df contains a column :path which is the path where each result-file is saved to. This is used to not reload and reprocess files already present in df when searching for new ones.

Keyword Arguments

  • subfolders::Bool = false : If true also scan all subfolders of folder for result-files.
  • valid_filetypes = [".bson", ".jld", ".jld2"]: Only files that have these endings are interpreted as result-files. Other files are skipped.
  • rpath = nothing : If not nothing, then it must be a path to a folder. The path column of the result-files is then relpath(file, rpath), instead of the absolute path, which is used by default.
  • verbose = true : Print (using @info) information about the process.
  • update = false : Update data from modified files and remove entries for deleted files.
  • rinclude = [r""] : Only include files whose name matches any of these Regex expressions. Default value includes all files.
  • rexclude = [r"^\b$"] : Exclude any files whose name matches any of these Regex expressions. Default value does not exclude any files.
  • white_list : List of keys to use from result file. By default uses all keys from all loaded result-files.
  • black_list = [:gitcommit, :gitpatch, :script]: List of keys not to include from result-file.
  • special_list = []: List of additional (derived) key-value pairs to put in df as explained below.

special_list is a Vector where each entry is a derived quantity to be included in df. There are two types of entries.

The first option is of the form key => func, where key is a symbol to be used as the column name in the DataFrame. The function func always takes a single argument, which is the loaded result-file (a dictionary).

The second option is to provide just one function func. This function also takes the single dictionary argument but returns one or more key => value pairs. This second notation may be useful when one wants to extract values for multiple columns in a single step.

As an example, consider that each result-file contains a field :longvector that is too large to be included in df. The quantities of interest are the mean and the variance of said field. To have these values in your results, first use black_list = [:longvector] and then define

special_list = [ :lv_mean => data -> mean(data[:longvector]),   # mean and var come from Statistics
                 :lv_var  => data -> var(data[:longvector]) ]

In case this operation fails the values will be treated as missing.
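The two kinds of entries could be evaluated along the following lines. This is a sketch under assumptions, not DrWatson's implementation: derive_columns is a hypothetical helper, the bare-function form is assumed to return a collection of key => value pairs, and the :broken, :lv_min and :lv_max entries are made up for illustration.

```julia
using Statistics   # standard library: mean, var

# Hypothetical helper: evaluate every special_list entry on one loaded result.
function derive_columns(data, special_list)
    derived = Dict{Symbol,Any}()
    for entry in special_list
        try
            if entry isa Pair
                # First form: key => func, one column per entry.
                derived[first(entry)] = last(entry)(data)
            else
                # Second form: a bare function returning key => value pairs.
                for (k, v) in entry(data)
                    derived[k] = v
                end
            end
        catch
            # A failing key => func entry yields `missing` for that key.
            entry isa Pair && (derived[first(entry)] = missing)
        end
    end
    return derived
end

data = Dict(:longvector => [1.0, 2.0, 3.0, 4.0])
special_list = [
    :lv_mean => d -> mean(d[:longvector]),
    :lv_var  => d -> var(d[:longvector]),
    :broken  => d -> d[:not_there],          # throws a KeyError
    d -> [:lv_min => minimum(d[:longvector]), :lv_max => maximum(d[:longvector])],
]
derived = derive_columns(data, special_list)
```

Wrapping each entry in its own try block means one failing derived quantity does not prevent the others from being computed.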

For an example of using this functionality please have a look at the Real World Examples page!