Running & Listing Simulations

Preparing Simulation Runs

It is very often the case that you want to run "batch simulations", i.e. submit a number of different simulations that all use the same algorithms and code but different parameters. This scenario always requires the user to prepare a set of simulation parameter containers, which are then passed into some kind of "main" function that starts the simulation.

To make the preparation part simpler we provide the following functionality:

DrWatson.dict_list — Function
dict_list(c::Dict)

Expand the dictionary c into a vector of dictionaries. Each entry has a unique combination from the product of the Vector values of the dictionary while the non-Vector values are kept constant for all possibilities. The keys of the entries are the same.

Whether the values of c are iterable or not is of no concern; the function considers as "iterable" only subtypes of Vector.

Use the function dict_list_count to get the number of dictionaries that dict_list will produce.

Examples

julia> c = Dict(:a => [1, 2], :b => 4);

julia> dict_list(c)
2-element Array{Dict{Symbol,Int64},1}:
 Dict(:a=>1,:b=>4)
 Dict(:a=>2,:b=>4)

julia> c[:model] = "linear"; c[:run] = ["bi", "tri"];

julia> dict_list(c)
4-element Array{Dict{Symbol,Any},1}:
 Dict(:a=>1,:b=>4,:run=>"bi",:model=>"linear")
 Dict(:a=>2,:b=>4,:run=>"bi",:model=>"linear")
 Dict(:a=>1,:b=>4,:run=>"tri",:model=>"linear")
 Dict(:a=>2,:b=>4,:run=>"tri",:model=>"linear")

julia> c[:e] = [[1, 2], [3, 5]];

julia> dict_list(c)
8-element Array{Dict{Symbol,Any},1}:
 Dict(:a=>1,:b=>4,:run=>"bi",:e=>[1, 2],:model=>"linear")
 Dict(:a=>2,:b=>4,:run=>"bi",:e=>[1, 2],:model=>"linear")
 Dict(:a=>1,:b=>4,:run=>"tri",:e=>[1, 2],:model=>"linear")
 Dict(:a=>2,:b=>4,:run=>"tri",:e=>[1, 2],:model=>"linear")
 Dict(:a=>1,:b=>4,:run=>"bi",:e=>[3, 5],:model=>"linear")
 Dict(:a=>2,:b=>4,:run=>"bi",:e=>[3, 5],:model=>"linear")
 Dict(:a=>1,:b=>4,:run=>"tri",:e=>[3, 5],:model=>"linear")
 Dict(:a=>2,:b=>4,:run=>"tri",:e=>[3, 5],:model=>"linear")
dict_list_count(c) -> N

Return the number of dictionaries that will be created by calling dict_list(c).


Using the above function means that you can write your "preparation" step as a single dictionary and then let it automatically expand into many parameter containers. This keeps the code cleaner as well as consistent, provided that it follows one simple rule: anything that is a Vector counts as many parameters; everything else is a single parameter. dict_list applies this rule irrespective of what the Vector contains, which allows users to use any iterable custom type as a single "parameter" of a simulation.
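To illustrate the rule, here is a minimal re-implementation sketch of the expansion logic (this is not DrWatson's actual code; `expand_dict` is a hypothetical name used only for this example). Values that are `Vector`s are expanded via a product, while everything else, even iterables such as tuples, is kept as a single constant parameter:

```julia
# Sketch of the dict_list expansion rule: only Vector values are expanded.
function expand_dict(c::Dict)
    expandable = [k => v for (k, v) in c if v isa Vector]
    constant   = [k => v for (k, v) in c if !(v isa Vector)]
    isempty(expandable) && return [Dict(constant)]
    ks = first.(expandable)  # keys with Vector values
    vs = last.(expandable)   # the Vectors themselves
    return vec([Dict(vcat(constant, [k => v for (k, v) in zip(ks, combo)]))
                for combo in Iterators.product(vs...)])
end

c = Dict(:a => [1, 2], :b => 4, :rates => (0.1, 0.2))
ds = expand_dict(c)
length(ds)  # 2 — only :a is expanded; the tuple :rates stays a single parameter
```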

See Preparing & running jobs for a very convenient application!

Saving Temporary Dictionaries

The functionality of dict_list is great, but it can fall short when submitting jobs to a computer cluster. On serial clusters each run is submitted as a separate Julia process, so an in-memory Julia Dict cannot be propagated between runs (for parallel clusters, using pmap is fine).

To address this, we provide some simple functionality that stores the result of dict_list (or any other collection of dictionaries, really) to files with temporary names. The names are returned and can then be passed to a main-like Julia process, which takes the temp-name as input, loads the dictionary and extracts the data.

DrWatson.tmpsave — Function
tmpsave(dicts::Vector{Dict} [, tmp]; kwargs...) -> r

Save each entry in dicts into a unique temporary file in the directory tmp. Then return the list of file names (relative to tmp) that were used for saving each dictionary.

tmp defaults to projectdir("_research", "tmp").

See also dict_list.

Keywords

  • l = 8 : number of characters in the random string.
  • prefix = "" : prefix each temporary name will have.
  • suffix = "bson" : ending of the temporary names (no need for the dot).

An example usage is shown in Using a Serial Cluster.
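The workflow can also be sketched with only standard-library tools. The function below, `tmpsave_sketch`, is a hypothetical stand-in for tmpsave written for illustration (tmpsave itself saves BSON files into projectdir("_research", "tmp")); the point is the pattern: serialize each dictionary under a random name, hand the names to separately launched processes:

```julia
using Random, Serialization

# Hypothetical stand-in for tmpsave: serialize each dict under a random
# file name inside `tmp` and return the list of names (relative to tmp).
function tmpsave_sketch(dicts, tmp; l = 8, prefix = "", suffix = "jls")
    mkpath(tmp)
    names = String[]
    for d in dicts
        name = prefix * randstring(l) * "." * suffix
        serialize(joinpath(tmp, name), d)
        push!(names, name)
    end
    return names
end

# "Submission" side:
tmp = mktempdir()
names = tmpsave_sketch([Dict(:a => 1), Dict(:a => 2)], tmp)

# "main"-like side: a separate process would receive one name,
# load the dictionary and extract its parameters.
d = deserialize(joinpath(tmp, names[1]))
d[:a]  # 1
```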

Collecting Results

Requires `DataFrames`

The function collect_results! is only available if you do using DataFrames in your Julia session.

There are cases where you have saved a bunch of simulation results in a bunch of different files in a folder. It is useful to be able to collect all of these results into a single table, in this case a DataFrame. The function collect_results! provides this functionality. Importantly, the function is "future-proof" which means that it works nicely even if you add new parameters or remove old parameters from your results as your project progresses!

collect_results!([filename,] folder; kwargs...) -> df

Search the folder (and possibly all subfolders) for new result-files and add them to df which is a DataFrame containing all the information from each result-file. If a result-file is missing keys that are already columns in df, they will be set as missing. If on the other hand new keys are encountered, a new column will be added and filled with missing for all previous entries.

If no file exists in filename, then df will be saved there. If however filename exists, the existing df will be first loaded and then reused. The reused df has some results already collected: files already included in df are skipped in subsequent calls to collect_results! while new result-files are simply appended to the dataframe.

filename defaults to:

filename = joinpath(dirname(folder), "results_$(basename(folder)).bson")

See also collect_results.

Warning

df contains a column :path which is the path where each result-file is saved to. This is used to not reload and reprocess files already present in df when searching for new ones.

If you have an entry :path in your saved result-files this will probably break collect_results (untested).

Keyword Arguments

  • subfolders::Bool = false : If true also scan all subfolders of folder for result-files.
  • valid_filetypes = [".bson", ".jld", ".jld2"]: Only files that have these endings are interpreted as result-files. Other files are skipped.
  • verbose = true : Print (using @info) information about the process.
  • white_list : List of keys to use from result file. By default uses all keys from all loaded result-files.
  • black_list = []: List of keys not to include from result-file.
  • special_list = []: List of additional (derived) key-value pairs to put in df as explained below.

special_list is a Vector{Pair{Symbol, Function}} where each entry is a derived quantity to be included in df. The function entry always takes a single argument, which is the loaded result-file (a dictionary). As an example consider that each result-file contains a field :longvector too large to be included in the df. The quantity of interest is the mean and the variance of said field. To have these values in your results first use black_list = [:longvector] and then define

special_list = [ :lv_mean => data -> mean(data[:longvector]),
                 :lv_var  => data -> var(data[:longvector]) ]

In case this operation fails the values will be treated as missing.

collect_results(folder; kwargs...) -> df

Do exactly the same as collect_results!, but do not load (or later save) an existing dataframe; all found result-files are processed.

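The core idea behind the "future-proof" collection can be sketched without DataFrames. The function below, `collect_sketch`, is a hypothetical illustration (not DrWatson's implementation): the table takes the union of all keys found across result-files and fills absent entries with `missing`:

```julia
# Sketch of the collection logic: result-files may have different keys;
# take the union of all keys and fill absent entries with `missing`.
function collect_sketch(results::Vector{<:Dict})
    allkeys = sort!(collect(union(keys.(results)...)))
    rows = [Dict(k => get(r, k, missing) for k in allkeys) for r in results]
    return (columns = allkeys, rows = rows)
end

r1 = Dict(:a => 1, :b => 2)  # an "old" result-file
r2 = Dict(:a => 3, :c => 4)  # a "new" one with an extra key
tab = collect_sketch([r1, r2])
tab.rows[1][:c]  # missing — the old file lacks the new key
```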

For an example of using this functionality please have a look at the Real World Examples page!