Running & Listing Simulations
Preparing Simulation Runs
It is very often the case that you want to run "batch simulations", i.e. submit a bunch of different simulations that all use the same algorithms and code but different parameters. This scenario always requires the user to prepare a set of simulation parameter containers, which are then passed into some kind of "main" function that starts the simulation.
To make the preparation part simpler we provide the following functionality:
DrWatson.dict_list - Function.
dict_list(c::Dict)
Expand the dictionary c into a vector of dictionaries. Each entry has a unique combination from the product of the Vector values of the dictionary while the non-Vector values are kept constant for all possibilities. The keys of the entries are the same.
Whether the values of c are iterable or not is of no concern; the function considers as "iterable" only subtypes of Vector.
Use the function dict_list_count to get the number of dictionaries that dict_list will produce.
Examples
julia> c = Dict(:a => [1, 2], :b => 4);
julia> dict_list(c)
2-element Array{Dict{Symbol,Int64},1}:
Dict(:a=>1,:b=>4)
Dict(:a=>2,:b=>4)
julia> c[:model] = "linear"; c[:run] = ["bi", "tri"];
julia> dict_list(c)
4-element Array{Dict{Symbol,Any},1}:
Dict(:a=>1,:b=>4,:run=>"bi",:model=>"linear")
Dict(:a=>2,:b=>4,:run=>"bi",:model=>"linear")
Dict(:a=>1,:b=>4,:run=>"tri",:model=>"linear")
Dict(:a=>2,:b=>4,:run=>"tri",:model=>"linear")
julia> c[:e] = [[1, 2], [3, 5]];
julia> dict_list(c)
8-element Array{Dict{Symbol,Any},1}:
Dict(:a=>1,:b=>4,:run=>"bi",:e=>[1, 2],:model=>"linear")
Dict(:a=>2,:b=>4,:run=>"bi",:e=>[1, 2],:model=>"linear")
Dict(:a=>1,:b=>4,:run=>"tri",:e=>[1, 2],:model=>"linear")
Dict(:a=>2,:b=>4,:run=>"tri",:e=>[1, 2],:model=>"linear")
Dict(:a=>1,:b=>4,:run=>"bi",:e=>[3, 5],:model=>"linear")
Dict(:a=>2,:b=>4,:run=>"bi",:e=>[3, 5],:model=>"linear")
Dict(:a=>1,:b=>4,:run=>"tri",:e=>[3, 5],:model=>"linear")
Dict(:a=>2,:b=>4,:run=>"tri",:e=>[3, 5],:model=>"linear")
DrWatson.dict_list_count - Function.
dict_list_count(c) -> N
Return the number of dictionaries that will be created by calling dict_list(c).
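The expansion rule described above can be sketched in plain Julia. This is only an illustration under stated assumptions, not DrWatson's source code: the names expand_dict and count_expanded are hypothetical, and the sketch simply takes the Cartesian product of all Vector values while copying every other entry unchanged.

```julia
# Hypothetical re-implementation for illustration; not DrWatson's actual code.
function expand_dict(c::Dict)
    # Only keys whose values are Vectors get expanded.
    vkeys = [k for (k, v) in c if v isa Vector]
    isempty(vkeys) && return [copy(c)]
    out = Dict{keytype(c),Any}[]
    # Iterate over the Cartesian product of all Vector values.
    for combo in Iterators.product((c[k] for k in vkeys)...)
        d = Dict{keytype(c),Any}(c)      # start from a copy with the constant entries
        for (k, v) in zip(vkeys, combo)
            d[k] = v                     # replace each Vector by one of its elements
        end
        push!(out, d)
    end
    return out
end

# The number of results is the product of the lengths of the Vector values.
count_expanded(c::Dict) = prod(Int[length(v) for v in values(c) if v isa Vector])

c = Dict(:a => [1, 2], :b => 4, :run => ["bi", "tri"], :e => [[1, 2], [3, 5]])
dicts = expand_dict(c)
println(length(dicts), " dictionaries, predicted count: ", count_expanded(c))
```

Note that the value of :e is a Vector of Vectors, so each inner Vector ends up as a single parameter value in the expanded dictionaries.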
Using the above function means that you can write your "preparation" step as a single dictionary and then let it expand automatically into many parameter containers. This keeps the code cleaner and also consistent, provided that it follows one simple rule: anything that is a Vector holds many parameter values; anything else is a single parameter. dict_list applies this rule irrespective of what the Vector contains. This allows users to use any iterable custom type as a single "parameter" of a simulation.
See the Preparing & running jobs section for a very convenient application!
Saving Temporary Dictionaries
The functionality of dict_list is great, but it can fall short when submitting jobs to a computer cluster. On serial clusters, each run is submitted to a different Julia process, so an in-memory Julia Dict cannot be passed along (on parallel clusters, using pmap is fine).
To work around this, we provide some simple functionality that stores the result of dict_list (or any other collection of dictionaries, really) to files with temporary names. The names are returned and can then be passed to a main-like Julia process that takes the temp-name as an input, loads the dictionary and then extracts the data.
DrWatson.tmpsave - Function.
tmpsave(dicts::Vector{Dict} [, tmp]; kwargs...) -> r
Save each entry in dicts into a unique temporary file in the directory tmp. Then return the list of file names (relative to tmp) that were used for saving each dictionary.
tmp defaults to projectdir("_research", "tmp").
See also dict_list.
Keywords
l = 8: number of characters in the random string.
prefix = "": prefix each temporary name will have.
suffix = "bson": ending of the temporary names (no need for the dot).
An example usage is shown in Using a Serial Cluster.
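To make the save-then-load round trip concrete, here is a minimal sketch that uses only Julia's standard libraries. It is an assumption-laden stand-in, not tmpsave itself: the helper save_tmp_dicts is hypothetical, and it writes files with the Serialization standard library (suffix "jls") instead of the BSON files that tmpsave produces.

```julia
using Serialization, Random

# Hypothetical helper mimicking the idea of tmpsave with stdlib tools only.
function save_tmp_dicts(dicts, tmp = mktempdir(); l = 8, prefix = "", suffix = "jls")
    mkpath(tmp)
    names = String[]
    for d in dicts
        name = prefix * randstring(l) * "." * suffix
        # Draw again in the unlikely case that the name is already taken.
        while isfile(joinpath(tmp, name))
            name = prefix * randstring(l) * "." * suffix
        end
        serialize(joinpath(tmp, name), d)
        push!(names, name)
    end
    return names   # file names relative to tmp
end

tmp = mktempdir()
dicts = [Dict(:a => 1, :b => 4), Dict(:a => 2, :b => 4)]
names = save_tmp_dicts(dicts, tmp)

# What a "main"-like process would do after receiving one of the names:
params = deserialize(joinpath(tmp, names[1]))
```

Each returned name can be handed to a separate Julia process as a command-line argument; that process then only needs the directory and the name to recover its parameters.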
Collecting Results
The function collect_results! is only available if you do using DataFrames in your Julia session.
There are cases where you have saved a bunch of simulation results in a bunch of different files in a folder. It is useful to be able to collect all of these results into a single table, in this case a DataFrame. The function collect_results! provides this functionality. Importantly, the function is "future-proof", which means that it works nicely even if you add new parameters or remove old parameters from your results as your project progresses!
DrWatson.collect_results! - Function.
collect_results!([filename,] folder; kwargs...) -> df
Search the folder (and possibly all subfolders) for new result-files and add them to df, which is a DataFrame containing all the information from each result-file. If a result-file is missing keys that are already columns in df, they will be set as missing. If on the other hand new keys are encountered, a new column will be added and filled with missing for all previous entries.
If no file exists in filename, then df will be saved there. If however filename exists, the existing df will be first loaded and then reused. The reused df has some results already collected: files already included in df are skipped in subsequent calls to collect_results! while new result-files are simply appended to the dataframe.
filename defaults to:
filename = joinpath(dirname(folder), "results_$(basename(folder)).bson")
See also collect_results.
df contains a column :path which is the path where each result-file is saved to. This is used to not reload and reprocess files already present in df when searching for new ones.
If you have an entry :path in your saved result-files this will probably break collect_results (untested).
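The bookkeeping described above (skipping known paths and filling gaps with missing) can be sketched without DataFrames. This is a simplified illustration, not how collect_results! is implemented: the function add_result! is hypothetical, rows are plain dictionaries instead of DataFrame rows, and the file paths are made up.

```julia
# Hypothetical sketch using a vector of dictionaries as the "table".
function add_result!(rows::Vector{Dict{Symbol,Any}}, path::String, data::Dict)
    # Skip result-files that were already collected, identified by their path.
    any(r -> r[:path] == path, rows) && return rows
    newrow = Dict{Symbol,Any}(data)
    newrow[:path] = path
    # Keys not seen before become new "columns", filled with missing in older rows.
    for k in keys(newrow), r in rows
        haskey(r, k) || (r[k] = missing)
    end
    # Known columns that the new result lacks are set to missing.
    if !isempty(rows)
        for k in keys(first(rows))
            haskey(newrow, k) || (newrow[k] = missing)
        end
    end
    push!(rows, newrow)
    return rows
end

rows = Dict{Symbol,Any}[]
add_result!(rows, "sims/run1.bson", Dict(:a => 1, :b => 2.0))
add_result!(rows, "sims/run2.bson", Dict(:a => 2, :c => "new"))
add_result!(rows, "sims/run1.bson", Dict(:a => 1, :b => 2.0))  # already present, skipped
```

Because the sketch writes its own :path entry into every row, a result that already contains a :path key would be overwritten, which mirrors the caveat stated above.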
Keyword Arguments
subfolders::Bool = false: If true, also scan all subfolders of folder for result-files.
valid_filetypes = [".bson", ".jld", ".jld2"]: Only files that have these endings are interpreted as result-files. Other files are skipped.
verbose = true: Print (using @info) information about the process.
white_list: List of keys to use from result file. By default uses all keys from all loaded result-files.
black_list = []: List of keys not to include from result-file.
special_list = []: List of additional (derived) key-value pairs to put in df as explained below.
special_list is a Vector{Pair{Symbol, Function}} where each entry is a derived quantity to be included in df. The function entry always takes a single argument, which is the loaded result-file (a dictionary). As an example, consider that each result-file contains a field :longvector that is too large to be included in df. The quantities of interest are the mean and the variance of said field (mean and var are provided by the Statistics standard library). To have these values in your results, first use black_list = [:longvector] and then define
special_list = [ :lv_mean => data -> mean(data[:longvector]),
                 :lv_var => data -> var(data[:longvector]) ]
In case this operation fails the values will be treated as missing.
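How black_list and special_list interact for a single result can be sketched as follows. The function process_result is hypothetical and only imitates the documented behavior: blacklisted keys are dropped, each derived quantity is computed from the full loaded dictionary, and a failing computation yields missing. The :broken entry is included on purpose to show the failure case.

```julia
using Statistics

# Hypothetical sketch; not the code that collect_results! runs internally.
function process_result(data::Dict; black_list = Symbol[], special_list = [])
    # Keep every key that is not blacklisted.
    row = Dict{Symbol,Any}(k => v for (k, v) in data if !(k in black_list))
    for (name, f) in special_list
        # A derived quantity that throws is recorded as missing.
        row[name] = try
            f(data)
        catch
            missing
        end
    end
    return row
end

data = Dict(:a => 1, :longvector => [1.0, 2.0, 3.0, 4.0])
special_list = [
    :lv_mean => d -> mean(d[:longvector]),
    :lv_var  => d -> var(d[:longvector]),
    :broken  => d -> d[:does_not_exist],   # throws a KeyError
]
row = process_result(data; black_list = [:longvector], special_list = special_list)
```

The derived functions receive the full data dictionary, so :longvector is still available to them even though it is excluded from the resulting row.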
DrWatson.collect_results - Function.
collect_results(folder; kwargs...) -> df
Do exactly the same as collect_results!, but without loading (or later saving) an existing dataframe. Thus all result-files found are processed.
For an example of using this functionality please have a look at the Real World Examples page!