Saving Tools
This page discusses several tools that can significantly improve the process of saving and loading files in a scientific context.
These tools are also used in the examples on the Real World Examples page. After reading the documentation here, it may be worth having a look there as well!
For saving and loading files we use FileIO.save and FileIO.load. This means that you have to install whichever saving backend you want to use yourself: FileIO by itself does not install a package that saves data, it only provides the interface!
In addition, DrWatson re-exports FileIO.save and FileIO.load for convenience!
Converting a struct to a dictionary
savename can produce a name from any Julia composite type. To save something, though, one needs a dictionary, so the following function can be used to directly save a struct with any saving function:
DrWatson.struct2dict - Function
struct2dict(s) -> d
Convert a Julia composite type s to a dictionary d with key type Symbol that maps each field of s to its value. This can be useful in e.g. saving:
tagsave(savename(s), struct2dict(s))
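To make the field-to-value mapping concrete, here is a minimal sketch of what struct2dict computes, written with Base functions only (fieldnames and getfield). The names my_struct2dict and Params are hypothetical illustrations, not DrWatson code, and the real function may differ in details such as the value type of the returned dictionary.

```julia
# Hypothetical stand-in for DrWatson.struct2dict, for illustration only.
# Maps every field name (a Symbol) of `s` to the value stored in that field.
function my_struct2dict(s)
    return Dict{Symbol,Any}(f => getfield(s, f) for f in fieldnames(typeof(s)))
end

# Hypothetical parameter container.
struct Params
    a::Int
    b::Float64
end

# `d` contains :a => 1 and :b => 2.5
d = my_struct2dict(Params(1, 2.5))
```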
Safely saving data
Almost all packages that save data overwrite existing files by default (if given the name of an existing file). This is the default behavior because it is often what is desired.
Sometimes it is not, though! The consequences of overwriting data can range from irrelevant to catastrophic. To avoid this, we provide an alternative way to save data that never overwrites existing files:
DrWatson.safesave - Function
safesave(filename, data)
Safely save data in filename by ensuring that no existing files are overwritten. Do this by renaming already existing data with a backup-number ending like #1, #2, .... For example, if filename = test.bson, the first time you safesave it, the file is saved normally. The second time, the existing save is renamed to test_#1.bson and a new file test.bson is then saved.
If a backup file already exists, then its backup-number is incremented (e.g. going from #2 to #3). For example, safesaving test.bson a third time will rename the old test_#1.bson to test_#2.bson, rename the old test.bson to test_#1.bson and then save a new test.bson with the latest data.
See also tagsave.
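The renaming scheme can be sketched with Base file operations alone. The helpers backupname, rotate_backups! and naive_safesave below are hypothetical: they write plain strings instead of calling a FileIO backend and are not DrWatson's implementation. They only reproduce the numbering described above, assuming the existing backup numbers have no gaps.

```julia
# Hypothetical helpers illustrating safesave-style backup rotation.

# "dir/test.bson", 2  ->  "dir/test_#2.bson"
function backupname(file, n)
    root, ext = splitext(file)
    return string(root, "_#", n, ext)
end

# Shift existing backups one number up (highest first, so nothing is
# overwritten), then move the current file to backup #1.
function rotate_backups!(file)
    isfile(file) || return
    n = 1
    while isfile(backupname(file, n))
        n += 1
    end
    for k in n-1:-1:1
        mv(backupname(file, k), backupname(file, k + 1))
    end
    mv(file, backupname(file, 1))
end

function naive_safesave(file, contents::AbstractString)
    rotate_backups!(file)
    write(file, contents)
end

# Save three times to the same name inside a temporary directory.
f = joinpath(mktempdir(), "test.bson")
for v in ("first", "second", "third")
    naive_safesave(f, v)
end
```

After the loop, test.bson holds the latest data, test_#1.bson the previous save and test_#2.bson the oldest one, matching the third-save example in the docstring.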
Tagging a run using Git
For reproducibility reasons (and also to not go insane when asking "HOW DID I GET THOSE RESUUUULTS"), it is useful to "tag" any simulation/result/process with the Git commit of the repository.
To this end, the following functions can be used to ensure reproducibility:
DrWatson.tag! - Function
tag!(d::Dict, gitpath = projectdir(), storepatch = true) -> d
Tag d by adding an extra field gitcommit, whose value is the gitdescribe of the repository at gitpath (by default the project directory). Do nothing if a key gitcommit already exists or if the Git repository is not found. If the Git repository is dirty, i.e. there are uncommitted changes, then the output of git diff HEAD is stored in the field gitpatch. Note that patches for binary files are not stored.
Notice that if String is not a subtype of the value type of d, then a new dictionary is created and returned. Otherwise the operation is in-place (and the dictionary is returned again).
To restore a repository to the state of a particular model run:
- check out the relevant commit with git checkout xyz, where xyz is the value stored in the gitcommit field (without the "_dirty" suffix, if present);
- apply the patch with git apply patch, where the string stored in the gitpatch field first needs to be written to the file patch.
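The two restore steps can be scripted. The sketch below uses a hypothetical helper, restore_commands (not part of DrWatson), that takes a dictionary produced by tag!, writes the stored patch to a file and builds, without running, the corresponding git commands. It assumes Symbol keys and strips a trailing "_dirty" from the stored commit, since gitdescribe appends that suffix and it is not part of any Git revision name. The commit and patch values are made up.

```julia
# Hypothetical helper: turns a tagged dictionary into the git commands
# needed to restore the repository. Nothing is executed here.
function restore_commands(d::AbstractDict, patchfile::AbstractString)
    # gitdescribe appends "_dirty" for dirty repositories; git cannot check that out
    commit = replace(string(d[:gitcommit]), r"_dirty$" => "")
    cmds = Cmd[`git checkout $commit`]                 # step 1
    if haskey(d, :gitpatch)
        write(patchfile, d[:gitpatch])                 # step 2: store the patch in a file
        push!(cmds, `git apply $patchfile`)
    end
    return cmds
end

# Example with made-up values:
tagged = Dict{Symbol,Any}(
    :gitcommit => "3bf684c6a115e3dce484b7f200b66d3ced8b0832_dirty",
    :gitpatch => "diff --git a/x.jl b/x.jl\n",
)
patchfile = joinpath(mktempdir(), "patch")
cmds = restore_commands(tagged, patchfile)
# To actually restore, run them from the repository root, e.g.
# cd(repo) do; foreach(run, cmds); end
```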
Examples
julia> d = Dict(:x => 3, :y => 4)
Dict{Symbol,Int64} with 2 entries:
:y => 4
:x => 3
julia> tag!(d)
Dict{Symbol,Any} with 3 entries:
:y => 4
:gitcommit => "96df587e45b29e7a46348a3d780db1f85f41de04"
:x => 3
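The in-place versus copy rule can be illustrated with a simplified, hypothetical sketch_tag!, which takes the commit string as an argument instead of querying Git and only handles Symbol keys. It is not DrWatson's implementation; it only shows why the example above returns a new Dict{Symbol,Any}: a dictionary with value type Int64 cannot hold a String.

```julia
# Hypothetical simplification of tag!: no Git calls, Symbol keys only.
function sketch_tag!(d::AbstractDict{Symbol,V}, commit::AbstractString) where {V}
    haskey(d, :gitcommit) && return d          # never overwrite an existing tag
    if String <: V
        d[:gitcommit] = commit                 # value type admits a String: mutate d
        return d
    else
        nd = Dict{Symbol,Any}(d)               # otherwise copy into a wider Dict
        nd[:gitcommit] = commit
        return nd
    end
end

d_int = Dict(:x => 3, :y => 4)                 # Dict{Symbol,Int64}: a copy is returned
t_int = sketch_tag!(d_int, "96df587e45b29e7a46348a3d780db1f85f41de04")

d_any = Dict{Symbol,Any}(:x => 3)              # Dict{Symbol,Any}: tagged in place
t_any = sketch_tag!(d_any, "96df587e45b29e7a46348a3d780db1f85f41de04")
```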
DrWatson.@tag! - Macro
@tag!(d, gitpath = projectdir()) -> d
Do the same as tag!, but also add another field script that holds the path of the script that called @tag!, relative to gitpath. The saved string ends with #line_number, which indicates the line within the script at which @tag! was called.
Examples
julia> d = Dict(:x => 3)
Dict{Symbol,Int64} with 1 entry:
:x => 3
julia> @tag!(d) # running from a script or inline evaluation of Juno
Dict{Symbol,Any} with 3 entries:
:gitcommit => "618b72bc0936404ab6a4dd8d15385868b8299d68"
:script => "test\\stools_tests.jl#10"
:x => 3
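The reason @tag! is a macro rather than a function is that a macro can see where it was called: Julia passes every macro a hidden __source__ argument (a LineNumberNode) holding the file and line of the call site. The hypothetical @script_location below is a minimal sketch of that mechanism only; it is not how DrWatson builds the script field, which may differ, for example in how paths are normalized.

```julia
# Hypothetical macro: returns "relative/path/to/file.jl#line" for its call site.
macro script_location(gitpath)
    # __source__.file is a Symbol (or nothing); __source__.line is an Int
    file = __source__.file === nothing ? "unknown" : String(__source__.file)
    line = __source__.line
    # relpath is evaluated at run time, relative to the given directory
    return :(string(relpath($file, $(esc(gitpath))), "#", $line))
end

loc = @script_location(pwd())
```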
DrWatson.gitdescribe - Function
gitdescribe(gitpath = projectdir()) -> gitstr
Return a string gitstr with the output of git describe if an annotated git tag exists, otherwise the currently active commit id of the Git repository present in gitpath, which by default is the currently active project. If the repository is dirty when this function is called, the string will end with "_dirty".
Return nothing if gitpath is not a Git repository, i.e. not a directory within a Git repository.
The format of the git describe output in general is
`"TAGNAME-[NUMBER_OF_COMMITS_AHEAD-]gLATEST_COMMIT_HASH[_dirty]"`
If the latest tag is v1.2.3, there are 5 additional commits, and the latest commit hash is 334a0f225d9fba86161ab4c8892d4f023688159c, the output will be v1.2.3-5-g334a0f. Notice that git shortens the hash as long as the shortened form is unambiguous.
More information about the git describe output can be found at https://git-scm.com/docs/git-describe.
See also tag!.
Examples
julia> gitdescribe() # a tag exists
"v1.2.3-g7364ab"
julia> gitdescribe() # a tag doesn't exist
"96df587e45b29e7a46348a3d780db1f85f41de04"
julia> gitdescribe(path_to_a_dirty_repo)
"3bf684c6a115e3dce484b7f200b66d3ced8b0832_dirty"
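To show how the format above decomposes, here is a hypothetical helper, parse_gitdescribe (not part of DrWatson), that splits such a string into tag, number of commits ahead, hash and dirty flag with a regular expression. It assumes tag names do not themselves end in something that looks like -g followed by hex digits; a string without any -g part is treated as a bare commit id.

```julia
# Hypothetical helper that splits a gitdescribe-style string into its parts,
# following "TAGNAME-[NUMBER_OF_COMMITS_AHEAD-]gLATEST_COMMIT_HASH[_dirty]".
function parse_gitdescribe(s::AbstractString)
    dirty = endswith(s, "_dirty")
    core = dirty ? chop(s; tail = length("_dirty")) : s
    # lazy tag match, optional "N-" commit count, then "g" + hex hash
    m = match(r"^(.+?)-(?:(\d+)-)?g([0-9a-f]+)$", core)
    if m === nothing
        # no tag: the string is just a commit id
        return (tag = nothing, ahead = 0, hash = String(core), dirty = dirty)
    end
    ahead = m.captures[2] === nothing ? 0 : parse(Int, m.captures[2])
    return (tag = String(m.captures[1]), ahead = ahead,
            hash = String(m.captures[3]), dirty = dirty)
end

r = parse_gitdescribe("v1.2.3-5-g334a0f")
```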
DrWatson.gitpatch - Function
gitpatch(gitpath = projectdir())
Generate a patch describing the changes of a dirty repository compared to its last commit, i.e. what git diff HEAD produces. The gitpath needs to point to a directory within a Git repository, otherwise nothing is returned.
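For intuition, the behavior of gitpatch can be approximated with Julia's process API by capturing the output of git diff HEAD. This is a hypothetical sketch (sketch_gitpatch is not DrWatson's function): it assumes the git executable is on the PATH and, in the spirit of DrWatson's policy that these functions never error, it returns nothing instead of throwing when the command fails. Plain git diff does not include binary file contents, which matches the note that binary patches are not stored.

```julia
# Hypothetical approximation of gitpatch: capture `git diff HEAD` as a String.
# Returns "" for a clean repository and `nothing` if git fails
# (e.g. `gitpath` is not inside a Git repository, or git is not installed).
function sketch_gitpatch(gitpath::AbstractString = pwd())
    # silence git's own error messages
    cmd = pipeline(`git -C $gitpath diff HEAD`; stderr = devnull)
    try
        return read(cmd, String)
    catch
        return nothing
    end
end

# A fresh temporary directory is not a repository, so this gives `nothing`.
patch = sketch_gitpatch(mktempdir())
```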
Please note that tag! operates in place only when possible; otherwise a new dictionary is returned. Also (importantly), these functions never throw an error, because they are most commonly used when saving simulations, and an error there could result in data not being saved!
Automatic Tagging during Saving
If you don't want to call tag! every time before saving a file, you can simply use tagsave or @tagsave, which can also incorporate safesave if need be!
DrWatson.tagsave - Function
DrWatson.@tagsave - Macro
Produce or Load
produce_or_load is a function that integrates conveniently with savename: it loads a file if it exists, and otherwise produces it, saves it and then returns it!
This saves you the effort of checking whether a file exists and then either loading it or running some code and saving the result, i.e. writing a bunch of if clauses in your code! produce_or_load really shines in interactive sessions where some results take a couple of minutes to compute.
DrWatson.produce_or_load - Function
produce_or_load([path="",] c, f; kwargs...) -> file, s
Let s = joinpath(path, savename(prefix, c, suffix)). If a file named s exists, then load it and return it, along with the global path that it is saved at (s).
If the file does not exist, then call file = f(c), save file as s and then return file, s. The function f must return a dictionary. The macros @dict and @strdict can help with that.
Keywords
- tag = true : Save the file using tagsave.
- gitpath = projectdir() : Path to search for a Git repo.
- suffix = "bson", prefix = default_prefix(c) : Used in savename.
- force = false : If true, then don't check whether file s exists; produce it and save it anyway.
- loadfile = true : If false, this function does not actually load the file, but only checks if it exists. The return value in this case is always nothing, s, regardless of whether the file must be produced or not.
- verbose = true : Print info about the process.
- kwargs... : All other keywords are propagated to savename.
See also savename.
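To illustrate the control flow of produce_or_load, here is a deliberately simplified, hypothetical sketch_produce_or_load. It substitutes the standard library Serialization module for FileIO, takes the full file name instead of building it with savename, and does no Git tagging, so it is not a drop-in replacement for DrWatson's function. It only shows the load-if-present, otherwise-produce-and-save logic and the effect of force.

```julia
using Serialization

# Hypothetical, simplified stand-in for produce_or_load (no savename, no tagging).
function sketch_produce_or_load(file::AbstractString, c, f; force::Bool = false)
    if !force && isfile(file)
        return deserialize(file), file         # file exists: just load it
    end
    data = f(c)                                # otherwise produce the result...
    data isa AbstractDict || error("f must return a dictionary")
    dir = dirname(file)
    isempty(dir) || mkpath(dir)
    serialize(file, data)                      # ...save it...
    return data, file                          # ...and return it
end

# Usage: count how often the "expensive" computation actually runs.
ncalls = Ref(0)
expensive(c) = (ncalls[] += 1; Dict("result" => 2 * c[:a]))

file = joinpath(mktempdir(), "a=3.jls")
d1, _ = sketch_produce_or_load(file, Dict(:a => 3), expensive)   # produces and saves
d2, _ = sketch_produce_or_load(file, Dict(:a => 3), expensive)   # loads from disk
```

The second call never invokes expensive, which is exactly the saving that makes this pattern useful in interactive sessions.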