Running & Listing Simulations
Preparing Simulation Runs
It is very often the case that you want to run "batch simulations", i.e. submit many different simulations that all use the same algorithms and code but different parameters. This scenario always requires the user to prepare a set of simulation parameter containers which are then passed into some kind of "main" function that starts the simulation.
To make the preparation part simpler we provide the following functionality:
DrWatson.dict_list — Function
dict_list(c::AbstractDict)
Expand the dictionary c into a vector of dictionaries. Each entry has a unique combination from the product of the Vector values of the dictionary while the non-Vector values are kept constant for all possibilities. The keys of the entries are the same.
Whether the values of c are iterable or not is of no concern; the function considers as "iterable" only subtypes of Vector.
To restrict some values so that they appear in the resulting dictionaries only if a certain condition is met, use the macro @onlyif on those values.
To compute some parameters on creation of dict_list as a function of other specified parameters, use the type Derived.
Use the function dict_list_count to get the number of dictionaries that dict_list will produce.
Examples
julia> c = Dict(:a => [1, 2], :b => 4);
julia> dict_list(c)
2-element Array{Dict{Symbol,Int64},1}:
Dict(:a=>1,:b=>4)
Dict(:a=>2,:b=>4)
julia> c[:model] = "linear"; c[:run] = ["bi", "tri"];
julia> dict_list(c)
4-element Array{Dict{Symbol,Any},1}:
Dict(:a=>1,:b=>4,:run=>"bi",:model=>"linear")
Dict(:a=>2,:b=>4,:run=>"bi",:model=>"linear")
Dict(:a=>1,:b=>4,:run=>"tri",:model=>"linear")
Dict(:a=>2,:b=>4,:run=>"tri",:model=>"linear")
julia> c[:e] = [[1, 2], [3, 5]];
julia> dict_list(c)
8-element Array{Dict{Symbol,Any},1}:
Dict(:a=>1,:b=>4,:run=>"bi",:e=>[1, 2],:model=>"linear")
Dict(:a=>2,:b=>4,:run=>"bi",:e=>[1, 2],:model=>"linear")
Dict(:a=>1,:b=>4,:run=>"tri",:e=>[1, 2],:model=>"linear")
Dict(:a=>2,:b=>4,:run=>"tri",:e=>[1, 2],:model=>"linear")
Dict(:a=>1,:b=>4,:run=>"bi",:e=>[3, 5],:model=>"linear")
Dict(:a=>2,:b=>4,:run=>"bi",:e=>[3, 5],:model=>"linear")
Dict(:a=>1,:b=>4,:run=>"tri",:e=>[3, 5],:model=>"linear")
Dict(:a=>2,:b=>4,:run=>"tri",:e=>[3, 5],:model=>"linear")
Example using Derived
julia> p = Dict(:α => [1, 2],
:solver => ["SolverA","SolverB"],
:β => Derived(:α, x -> x^2),
)
Dict{Symbol, Any} with 3 entries:
:α => [1, 2]
:solver => ["SolverA", "SolverB"]
:β => Derived{Symbol}(:α, #51)
julia> dict_list(p)
4-element Vector{Dict{Symbol, Any}}:
Dict(:α => 1, :solver => "SolverA", :β => 1)
Dict(:α => 2, :solver => "SolverA", :β => 4)
Dict(:α => 1, :solver => "SolverB", :β => 1)
Dict(:α => 2, :solver => "SolverB", :β => 4)
DrWatson.dict_list_count — Function
dict_list_count(c) -> N
Return the number of dictionaries that will be created by calling dict_list(c).
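As a quick illustration, the count can be checked before any expansion is done. This is a minimal sketch that assumes DrWatson is installed; the dictionary contents are made up for the example.

```julia
using DrWatson

# Two Vector-valued keys with two options each; :b is a single fixed parameter.
c = Dict(:a => [1, 2], :b => 4, :run => ["bi", "tri"])

n = dict_list_count(c)      # number of combinations, without building them
dicts = dict_list(c)        # the actual expansion

println("dict_list_count reports ", n, ", dict_list produced ", length(dicts))
```

Checking the count first can be handy when a parameter dictionary has many Vector-valued keys and the product grows quickly.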
DrWatson.@onlyif — Macro
@onlyif(ex, value)
Tag value to only appear in a dictionary created with dict_list if the Julia expression ex (see below) is evaluated as true. If value is a subtype of Vector, @onlyif is applied to each entry. Since @onlyif is applied to a value and not to a dictionary key, it is possible to restrict only some of the values of a vector. This means that based on ex the number of options for a particular key varies.
Within ex it is possible to extract values of the dictionary passed to dict_list by a shorthand notation where only the key must be provided. For example ex = :(:N == 1) is transformed in the call dict_list(d) to an expression analogous to :(d[:N] == 1) by using the function lookup_candidate. This is supported for Symbol and String keys.
Examples
julia> d = Dict(:a => [1, 2], :b => 4, :c => @onlyif(:a == 1, [10, 11]));
julia> dict_list(d) # only in case `:a` is `1` the dictionary will get key `:c`
3-element Array{Dict{Symbol,Int64},1}:
Dict(:a => 1,:b => 4,:c => 10)
Dict(:a => 1,:b => 4,:c => 11)
Dict(:a => 2,:b => 4)
julia> d = Dict(:a => [1, 2], :b => 4, :c => [10, @onlyif(:a == 1, 11)]);
julia> dict_list(d) # only in case `:a` is `1` the dictionary will get extra value `11` for key `:c`
3-element Array{Dict{Symbol,Int64},1}:
Dict(:a => 1,:b => 4,:c => 10)
Dict(:a => 1,:b => 4,:c => 11)
Dict(:a => 2,:b => 4,:c => 10)
See the Defining parameter sets with restrictions section for more examples.
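The examples above use Symbol keys. Since the shorthand is stated to work for String keys as well, the same restriction might be written as in the sketch below. It assumes DrWatson is installed; the dictionary contents are illustrative only.

```julia
using DrWatson

# Same structure as the first example above, but with String keys.
d = Dict("a" => [1, 2], "b" => 4, "c" => @onlyif("a" == 1, [10, 11]))
dicts = dict_list(d)

# Dictionaries that did not receive the key "c" come from the "a" == 2 case.
without_c = filter(x -> !haskey(x, "c"), dicts)
println(length(dicts), " dictionaries, ", length(without_c), " without \"c\"")
```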
DrWatson.Derived — Type
Derived(parameters::Vector{Union{String,Symbol}}, function::Function)
Derived(parameter::Union{String,Symbol}, function::Function)
Wrap the name(s) of one or more parameters and a function. After the possible parameter combinations are created, dict_list will replace instances of Derived by the result of the function, evaluated with the value of the parameter(s).
Examples
julia> p = Dict(:α => [1, 2],
:solver => ["SolverA","SolverB"],
:β => Derived(:α, x -> x^2),
)
Dict{Symbol, Any} with 3 entries:
:α => [1, 2]
:solver => ["SolverA", "SolverB"]
:β => Derived{Symbol}(:α, #51)
julia> dict_list(p)
4-element Vector{Dict{Symbol, Any}}:
Dict(:α => 1, :solver => "SolverA", :β => 1)
Dict(:α => 2, :solver => "SolverA", :β => 4)
Dict(:α => 1, :solver => "SolverB", :β => 1)
Dict(:α => 2, :solver => "SolverB", :β => 4)
A vector of parameter names can also be passed when the accompanying function uses multiple arguments:
julia> p2 = Dict(:α => [1, 2],
:β => [10,100],
:solver => ["SolverA","SolverB"],
:γ => Derived([:α,:β], (x,y) -> x^2 + 2y),
)
Dict{Symbol, Any} with 4 entries:
:α => [1, 2]
:γ => Derived{Symbol}([:α, :β], #7)
:solver => ["SolverA", "SolverB"]
:β => [10, 100]
julia> dict_list(p2)
8-element Vector{Dict{Symbol, Any}}:
Dict(:α => 1, :γ => 21, :solver => "SolverA", :β => 10)
Dict(:α => 2, :γ => 24, :solver => "SolverA", :β => 10)
Dict(:α => 1, :γ => 21, :solver => "SolverB", :β => 10)
Dict(:α => 2, :γ => 24, :solver => "SolverB", :β => 10)
Dict(:α => 1, :γ => 201, :solver => "SolverA", :β => 100)
Dict(:α => 2, :γ => 204, :solver => "SolverA", :β => 100)
Dict(:α => 1, :γ => 201, :solver => "SolverB", :β => 100)
Dict(:α => 2, :γ => 204, :solver => "SolverB", :β => 100)
Using the above function means that you can write your "preparation" step into a single dictionary and then let it automatically expand into many parameter containers. This keeps the code cleaner but also consistent, provided that it follows one simple rule: anything that is a Vector holds many parameter values, while anything else is a single parameter. dict_list considers this true irrespective of what the Vector contains. This allows users to use any iterable custom type as a single "parameter" of a simulation.
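The rule can be illustrated with a few lines of plain Julia. The function expand_dict below is a hypothetical helper written only for this illustration; it is not DrWatson's implementation and it ignores @onlyif and Derived entirely. It only shows the "Vector means many values, anything else means one value" convention.

```julia
# Sketch of the expansion rule only; NOT DrWatson's actual implementation.
function expand_dict(c::AbstractDict)
    K = keytype(c)
    # Keys whose values are Vectors are expanded; all others are held fixed.
    vec_keys = [k for (k, v) in c if v isa Vector]
    fixed = Dict{K,Any}(k => v for (k, v) in c if !(v isa Vector))
    # Cartesian product over the options of every Vector-valued key.
    combos = Iterators.product((c[k] for k in vec_keys)...)
    return vec([merge(fixed, Dict{K,Any}(zip(vec_keys, combo))) for combo in combos])
end

c = Dict(:a => [1, 2], :b => 4, :e => [[1, 2], [3, 5]])
out = expand_dict(c)
foreach(println, out)
```

Note that the inner vectors of :e are treated as single parameter values: only the outer Vector is expanded, and a non-Vector iterable (such as a range or a tuple) would be kept as one parameter.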
See the Preparing & running jobs section for a very convenient application!
Saving Temporary Dictionaries
The functionality of dict_list is great, but can fall short in cases of submitting jobs to a computer cluster. For typical clusters that use qsub or slurm, each run is submitted to a different Julia process and thus one cannot propagate a Julia in-memory Dict (if you are already on a machine with a large number of connected processors/nodes, simply using pmap is fine).
To address this, we provide some simple functionality that stores the result of dict_list (or any other dictionary collection, really) to files with temporary names. The names are returned and can then be propagated into a main-like Julia process that can take the temp-name as an input, load the dictionary and then extract the data.
DrWatson.tmpsave — Function
tmpsave(dicts::Vector{Dict} [, tmp]; kwargs...) -> r
Save each entry in dicts into a unique temporary file in the directory tmp. Then return the list of file names (relative to tmp) that were used for saving each dictionary. Each dictionary can then be loaded back by calling
wload(nth_tmpfilename, "params")
tmp defaults to projectdir("_research", "tmp").
See also dict_list.
Keywords
l = 8: number of characters in the random string.
prefix = "": prefix each temporary name will have.
suffix = "jld2": ending of the temporary names (no need for the dot).
kwargs...: Any additional keywords are passed through to wsave (e.g. compression).
An example usage is shown in Using a Serial Cluster.
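For a quick local test of the round trip, something like the following sketch may help. It assumes DrWatson and JLD2 are installed (the default suffix is "jld2"); the temporary directory and the "run_" prefix are choices made only for this illustration.

```julia
using DrWatson  # saving with the default "jld2" suffix also assumes JLD2 is installed

dicts = dict_list(Dict(:a => [1, 2], :b => 4))

# Illustrative location; by default tmpsave uses projectdir("_research", "tmp").
tmp = mktempdir()
tmpnames = tmpsave(dicts, tmp; prefix = "run_")

# On a cluster each name would be handed to a separate job.
# Here the dictionaries are simply loaded back in the same session.
loaded = [wload(joinpath(tmp, name), "params") for name in tmpnames]
foreach(println, loaded)
```

Because the returned names are relative to tmp, they have to be joined with the directory before loading.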
Collecting Results
There are cases where you have saved many simulation results in many different files in a folder. It is useful to be able to collect all of these results into a single table, in this case a DataFrame. The function collect_results! provides this functionality. Importantly, the function is "future-proof", which means that it works nicely even if you add new parameters or remove old parameters from your results as your project progresses!
DrWatson.collect_results! — Function
collect_results!([filename,] folder; kwargs...) -> df
The function collect_results! is only available if you do using DataFrames in your Julia session.
Search the folder (and possibly all subfolders) for new result-files and add them to df which is a DataFrame containing all the information from each result-file. If a result-file is missing keys that are already columns in df, they will be set as missing. If on the other hand new keys are encountered, a new column will be added and filled with missing for all previous entries.
If no file exists in filename, then df will be saved there. If however filename exists, the existing df will be first loaded and then reused. The reused df has some results already collected: files already included in df are skipped in subsequent calls to collect_results! while new result-files are simply appended to the dataframe.
filename defaults to:
filename = joinpath(dirname(folder), "results_$(basename(folder)).jld2")
See also collect_results.
df contains a column :path which is the path where each result-file is saved to. This is used to not reload and reprocess files already present in df when searching for new ones.
Keyword Arguments
subfolders::Bool = false: If true, also scan all subfolders of folder for result-files.
valid_filetypes = [".bson", ".jld", ".jld2"]: Only files that have these endings are interpreted as result-files. Other files are skipped.
rpath = nothing: If not nothing, then it must be a path to a folder. The path column of the result-files is then relpath(file, rpath), instead of the absolute path, which is used by default.
verbose = true: Print (using @info) information about the process.
update = false: Update data from modified files and remove entries for deleted files.
rinclude = [r""]: Only include files whose name matches any of these Regex expressions. Default value includes all files.
rexclude = [r"^\b$"]: Exclude any files whose name matches any of these Regex expressions. Default value does not exclude any files.
white_list: List of keys to use from result file. By default uses all keys from all loaded result-files.
black_list = [:gitcommit, :gitpatch, :script]: List of keys not to include from result-file.
special_list = []: List of additional (derived) key-value pairs to put in df as explained below.
load_function = wload: Load function. Defaults to wload. You may want to specify a custom load function for example if you store results as a struct and you want the fields of the struct to form the columns of the dataframe. The struct is saved to file as a one-element dictionary so the dataframe will only have a single column. To work around this you could convert it to a dictionary by specifying load_function = (filename) -> struct2dict(wload(filename)["mykey"]). This way collect_results will receive a Dict whose keys are the fields of the struct.
special_list is a Vector where each entry is a derived quantity to be included in df. There are two types of entries. The first option is of the form key => func where the key is a symbol to be used as column name in the DataFrame. The function entry always takes a single argument, which is the loaded result-file (a dictionary). The second option is to provide just one function func. This function also takes the single dictionary argument but returns one or more key => value pairs. This second notation may be useful when one wants to extract values for multiple columns in a single step. As an example consider that each result-file contains a field :longvector too large to be included in the df. The quantities of interest are the mean and the variance of said field. To have these values in your results first use black_list = [:longvector] and then define
special_list = [ :lv_mean => data -> mean(data[:longvector]),
:lv_var => data -> var(data[:longvector]) ]
In case this operation fails the values will be treated as missing.
DrWatson.collect_results — Function
collect_results(folder; kwargs...) -> df
Do exactly the same as collect_results! but without loading (or later saving) an existing dataframe. Thus all result files found are processed.
For an example of using this functionality please have a look at the Real World Examples page!
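As a small self-contained illustration of the collection step, the sketch below writes two toy result files into a temporary folder and gathers them with collect_results. It assumes DrWatson, DataFrames and JLD2 are installed; the folder, the file contents and the "out" key are invented for this example.

```julia
using DrWatson, DataFrames  # assumes JLD2 is installed for the ".jld2" files

# Illustrative stand-in for a real results folder inside a project.
folder = mktempdir()

for p in dict_list(Dict(:a => [1, 2], :b => 4))
    # Hypothetical "result": the parameters plus one output value.
    # String keys are used here because the files are written as JLD2.
    result = Dict("a" => p[:a], "b" => p[:b], "out" => p[:a] * p[:b])
    wsave(joinpath(folder, savename(p, "jld2")), result)
end

# Collect every result file in the folder into one DataFrame.
df = collect_results(folder; verbose = false)
println(df)
```

Each row corresponds to one file, and the path column records where that row came from.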