Project Setup
Part of the functionality of DrWatson is creating and navigating through a project setup consistently. This works even if you move your project to a different location/computer or send it to a colleague with a different Julia installation. In addition, the navigation process is identical across any project that uses DrWatson.
This can "just work" (TM) because of the following principles:
- Your science project is also a Julia project defined by a Project.toml file. This way the project tracks the used packages (and their versions) and can be shared with any other Julia user.
- You first activate this project environment before running any code. This way you ensure that your project runs on the specified package installation (instead of the global one). See Activating a Project for ways to do this.
- You use the functions projectdir, datadir, etc. from DrWatson to navigate your project (see Navigating a Project).
Importantly, our suggested project setup was designed to be fully reproducible, see Reproducibility.
Default Project Setup
DrWatson suggests a universal project structure for any scientific project, which is the following:
│projectdir          <- Project's main folder. It is initialized as a Git
│                       repository with a reasonable .gitignore file.
│
├── _research        <- WIP scripts, code, notes, comments,
│   │                   to-dos and anything in an alpha state.
│   └── tmp          <- Temporary data folder.
│
├── data             <- **Immutable and add-only!**
│   ├── sims         <- Data resulting directly from simulations.
│   ├── exp_pro      <- Data from processing experiments.
│   └── exp_raw      <- Raw experimental data.
│
├── plots            <- Self-explanatory.
├── notebooks        <- Jupyter, Weave or any other mixed media notebooks.
│
├── papers           <- Scientific papers resulting from the project.
│
├── scripts          <- Various scripts, e.g. simulations, plotting, analysis.
│   │                   The scripts use the `src` folder for their base code.
│   └── intro.jl     <- Simple file that uses DrWatson and uses its greeting.
│
├── src              <- Source code for use in this project. Contains functions,
│                       structures and modules that are used throughout
│                       the project and in multiple scripts.
│
├── test             <- Folder containing tests for `src`.
│   └── runtests.jl  <- Main test file, also run via continuous integration.
│
├── README.md        <- Optional top-level README for anyone using this project.
├── .gitignore       <- by default ignores _research, data, plots, videos,
│                       notebooks and latex-compilation related files.
│
├── Manifest.toml    <- Contains full list of exact package versions used currently.
└── Project.toml     <- Main project file, allows activation and installation.
                        Includes DrWatson by default.
src vs scripts
The src and scripts folders seem to have similar functionality, but there is a distinction between the two. You can follow these rules of thumb to decide where to put file.jl:
- If include("file.jl") produces anything, be it data files, plots or even output to the console, then it should be in scripts.
- If it is functionality used across multiple files or pipelines, it should be in src.
- src should only contain files that define functions or types but do not output anything.
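As a minimal sketch of this split, the block below shows the contents of two hypothetical files side by side. The file names model.jl and run_model.jl, and the functions logistic_step and trajectory, are invented for illustration; both parts are placed in one block only so that it runs as-is.

```julia
# --- contents of a hypothetical src/model.jl ---
# Only defines functionality; including it produces no output.
logistic_step(r, x) = r * x * (1 - x)

function trajectory(r, x0, n)
    xs = [float(x0)]
    for _ in 1:n
        push!(xs, logistic_step(r, xs[end]))
    end
    return xs
end

# --- contents of a hypothetical scripts/run_model.jl ---
# In a real project this script would start with:
#     using DrWatson
#     @quickactivate
#     include(srcdir("model.jl"))
# It prints to the console, so it belongs in scripts.
xs = trajectory(3.7, 0.4, 5)
println("Final state after 5 steps: ", xs[end])
```

Including the first part alone has no visible effect, which is the test for whether a file fits in src.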
Initializing a Project
To initialize a project as described in the Default Project Setup section, we provide the following function:
DrWatson.initialize_project — Function
initialize_project(path [, name]; kwargs...)
Initialize a scientific project expected by DrWatson in path (directory representing an empty folder). If name is not given, it is assumed to be the folder's name.
The new project remains activated for you to immediately add packages.
Keywords
- readme = true: Adds a README.md file.
- authors = nothing: If a string or container of strings, adds the authors in the Project.toml file and README.md.
- force = false: If the path is not empty then throw an error. If however force is true then recursively delete everything in the path and create the project.
- git = true: Make the project a Git repository.
- add_test = true: Add some additional files for testing the project. The tests run automatically during continuous integration (if hosted on GitHub), or manually by running the contents of the test/runtests.jl file.
- add_docs = false: Add some additional files for generating documentation for the project, which can be generated locally by running docs/make.jl but is also generated and hosted during continuous integration using Documenter.jl (if hosted on GitHub). If this option is enabled, Documenter also becomes a dependency of the project.
  To host the docs online, set the keyword github_name with the name of the GitHub account you plan to upload at, and then manually enable the gh-pages deployment by going to settings/pages of the GitHub repo, and choosing as "Source" the gh-pages branch.
  Typically, a full documentation is not necessary for most projects, because README.md can serve as the documentation, hence this feature is false by default.
- template = DrWatson.DEFAULT_TEMPLATE: A template containing the folder structure of the project. It should be a vector containing strings (folders) or pairs of String => Vector{String}, containing a folder and subfolders (this can be nested further). Example:
  DEFAULT_TEMPLATE = [
      "_research", "src", "scripts", "data", "plots", "notebooks", "papers",
      "data" => ["sims", "exp_raw", "exp_pro"],
  ]
  The default derivative functions of projectdir, such as datadir, have been written with the default template in mind.
- placeholder = false: Add "hidden" placeholder files in each default folder to ensure that project folder structure is maintained when the directory is cloned (because empty folders are not pushed to a remote). Only used when git = true.
- folders_to_gitignore = ["data", "videos", "plots", "notebooks", "_research"]: Folders to include in the created .gitignore.
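A minimal usage sketch of the call described above, assuming DrWatson is installed in the current environment. The project name "ExampleProject" and the author "Jane Doe" are placeholders, and a temporary folder is used so that nothing existing is touched. Since the generated Project.toml includes DrWatson by default, the call may need to resolve packages.

```julia
using DrWatson

# Hypothetical location: a fresh, empty temporary folder.
path = mktempdir()

# `git = false` skips creating a Git repository for this throwaway example.
initialize_project(path, "ExampleProject"; authors = "Jane Doe", git = false)

# List the top-level entries that were created.
foreach(println, readdir(path))
```

Keep in mind that, as stated in the docstring, the new project remains activated after this call.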
Activating a Project
This part of DrWatson's functionality requires you to have your scientific project (and as a consequence, the Julia project) activated. This can be done in multiple ways:
- doing Pkg.activate("path/to/project") programmatically
- using the startup flag --project=path when starting Julia
- by setting the JULIA_PROJECT environment variable
- using the function quickactivate or the macro @quickactivate offered by DrWatson.
We recommend the fourth approach, although it does come with a caveat (see the docstring of quickactivate).
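The first three options can be sketched without DrWatson. The example below activates a project programmatically and prints the active project file; the temporary folder is only a stand-in for "path/to/project".

```julia
using Pkg

# Stand-in for "path/to/project"; a temporary folder keeps the example harmless.
project_path = mktempdir()

# Option 1: activate programmatically.
Pkg.activate(project_path)

# Base.active_project() returns the path of the active project file.
println(Base.active_project())

# Options 2 and 3 are used from the shell instead, for example:
#   julia --project=path/to/project script.jl
#   JULIA_PROJECT=path/to/project julia script.jl
```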
DrWatson.quickactivate — Function
quickactivate(path [, name::String])
Activate the project found by recursively searching the path and its parents for a valid Julia project file via the findproject function. Optionally check if name is the same as the activated project's name. If it is not, throw an error. See also @quickactivate. Do nothing if the project found is already active, or if no project file is found.
Example:
using DrWatson
quickactivate("path/to/project", "Best project in the WORLD")
Notice that this function first activates the project and then checks if it matches the name.
Note that to access quickactivate you need to be using DrWatson. For this to be possible DrWatson must already be added in the existing global environment. If you use quickactivate and share your project, tell your co-workers that they need to add DrWatson globally (the default README.md created by initialize_project says this automatically).
In addition, in your scripts write:
using DrWatson # YES
quickactivate(@__DIR__)
using Package1, Package2
# do stuff
instead of the erroneous:
using DrWatson, Package1, Package2 # NO!
quickactivate(@__DIR__)
# do stuff
This ensures that the packages you use will all have the versions dictated by your activated project (besides DrWatson, since this is impossible to do using quickactivate).
DrWatson.@quickactivate — Macro
@quickactivate
Equivalent to quickactivate(@__DIR__).
@quickactivate name::String
Equivalent to quickactivate(@__DIR__, name).
Notice that since @quickactivate is a macro, standard caveats apply when using Distributed computing. Specifically, you need to import DrWatson and use @quickactivate in different begin blocks as follows:
using Distributed
addprocs(8)
@everywhere using DrWatson
@everywhere begin
@quickactivate "TestEnv"
using Distributions, ...
# remaining imports
end
Pluto.jl understands the @quickactivate macro and will switch to using the standard Julia package manager once it encounters it (or quickactivate). But, because @quickactivate is a macro, it needs to be executed in a new cell, after using DrWatson. I.e., you need to split
begin
using DrWatson
@quickactivate "Whatever"
end
into two different cells:
using DrWatson
@quickactivate "Whatever"
@quickactivate ProjectName::Symbol
If given a Symbol, then first call quickactivate(@__DIR__, string(ProjectName)), and then do using ProjectName, as if the symbol was representing a module name.
This ties in with the Making your project a usable module functionality, see the docs for an example.
DrWatson.findproject — Function
findproject(dir = pwd()) -> project_path
Recursively search dir and its parents for a valid Julia project file (anything in Base.project_names). If it is found return its path, otherwise issue a warning and return nothing.
The function stops searching if it hits either the home directory or the root directory.
Notice that to get the current project's name you can use:
DrWatson.projectname — Function
projectname()
Return the name of the currently active project.
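To illustrate the kind of upward search that findproject is described as doing, here is a simplified sketch in plain Julia. It is not DrWatson's implementation: the function name find_project_sketch is hypothetical, it returns the folder containing the project file, it stops only at the filesystem root (not at the home directory), and it returns nothing silently instead of issuing a warning.

```julia
# Hypothetical helper, for illustration only.
function find_project_sketch(dir = pwd())
    dir = abspath(dir)
    while true
        # Base.project_names lists the file names Julia accepts as project files.
        for name in Base.project_names
            isfile(joinpath(dir, name)) && return dir
        end
        parent = dirname(dir)
        parent == dir && return nothing   # reached the filesystem root
        dir = parent
    end
end

# Usage: a temporary project with a nested scripts folder.
root = mktempdir()
touch(joinpath(root, "Project.toml"))
nested = mkpath(joinpath(root, "scripts", "analysis"))
println(find_project_sketch(nested) == root)
```

This is why a script can call quickactivate(@__DIR__) from any subfolder of the project and still find the project root.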
Including Julia packages/modules in src
Notice that the project initialized by DrWatson does not represent a Julia package. It represents a scientific project. That being said, it is often the case that you want to develop normal Julia modules (and perhaps later publish them as packages) inside your project, so that you can later use them in your code with using ModuleName. The proper way to do this is to initialize Julia packages, using the package manager, inside the src folder, using these steps:
- Activate your project that uses DrWatson.
- Change directory to the project's main folder (important!).
- Go into package mode and initialize a package with the name that you want: generate src/ModuleName
- dev the local path to ModuleName using the package manager: dev src/ModuleName. Notice that this command uses a local path, see this PR for more details.
  - If you don't care to make this module a Julia package, simply delete its .git folder: src/ModuleName/.git.
  - If you do care about publishing this module as a Julia package, then it is mandatory to keep it as a Git repository. In this case it is sensible to put src/ModuleName/.git into the main .gitignore file.
Now whenever you do using ModuleName, the local version will be used. This will still work even if you transfer your project to another computer, because the Manifest.toml file stores the local path.
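The package-mode steps above can also be scripted with the functional Pkg API. The sketch below is one way to do it, assuming a temporary folder as a stand-in for the project's main folder; ModuleName is the placeholder name used in the text. Depending on your setup, Pkg.develop may need access to a package registry.

```julia
using Pkg

# Hypothetical project root; in practice this is your DrWatson project folder.
project_root = mktempdir()

# Work from the project's main folder so that the relative path is used.
cd(project_root) do
    Pkg.activate(".")                      # activate the project
    Pkg.generate("src/ModuleName")         # same as `generate src/ModuleName`
    Pkg.develop(path = "src/ModuleName")   # same as `dev src/ModuleName`
end

println(isfile(joinpath(project_root, "src", "ModuleName", "src", "ModuleName.jl")))
```

After these calls, using ModuleName should load the local version while the project is active.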
Navigating a Project
To be able to navigate the project consistently, DrWatson provides the core function
DrWatson.projectdir — Function
projectdir()
Return the directory of the currently active project.
projectdir(args...) = joinpath(projectdir(), args...)
Join the path of the currently active project with args (typically other subfolders).
Besides the above, the following derivative functions
datadir()
srcdir()
plotsdir()
scriptsdir()
papersdir()
behave exactly like projectdir but have as root the appropriate subdirectory. These are also defined due to the frequent use of these subdirectories.
All of these functions take advantage of joinpath, ensuring an error-free path creation that works across different operating systems. It is strongly advised to use projectdir and its derivatives by giving them the subpaths as arguments, instead of concatenating path strings with *:
datadir("foo", "test.jld2") # preferred
datadir() * "/foo/test.jld2" # not recommended
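A short sketch of how the derivative functions relate to projectdir, assuming DrWatson is installed. The subfolder and file name ("sims", "run1.jld2") are placeholders; the printed path depends on which project is currently active.

```julia
using DrWatson

# datadir has the "data" subfolder of the active project as its root,
# so these two calls build the same path.
a = datadir("sims", "run1.jld2")
b = projectdir("data", "sims", "run1.jld2")

println(a)
println(a == b)
```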
Custom directory functions
It is straightforward to make custom directory functions if there is a directory you created that you access often. Simply define
customdir(args...) = projectdir("custom", args...)
to make the customdir version that works exactly like e.g. datadir but for "custom" instead of "data".
Reproducibility
The project setup approach that DrWatson suggests is designed to work flawlessly with Julia standards, to be easy to share and to be fully reproducible. There are three reasons that true reproducibility is possible:
- The project's used packages are embedded in the project because of Manifest.toml.
- The navigation around the folders of the project uses local directories.
- The project is a Git repository, which means that it has a detailed (and re-traceable) history of all changes and additions.
If you send your entire project folder to a colleague, they only need to do:
julia> cd("path/to/project")
pkg> activate .
pkg> instantiate
to use your project (assuming of course that you are both using the same Julia installation and version). All required packages and dependencies will be installed and then any script that was running in your computer will also be running in their computer in the same way!
In addition, with DrWatson you have the possibility of "tagging" each simulation created with the commit id, see the discussion around gitdescribe and tag!. This way, any data result obtained at any moment can be truly reproduced simply by resetting the Git tree to the appropriate commit and running the code.
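As a rough sketch of what tagging looks like in practice, the example below attaches Git information to a result dictionary. It assumes DrWatson is installed and that the active project is a Git repository; otherwise no commit information is available to attach. The keys :temperature and :energy are invented for illustration.

```julia
using DrWatson

# Hypothetical simulation result.
result = Dict{Symbol,Any}(:temperature => 0.5, :energy => -1.23)

# Attach Git information of the active project to the result.
tag!(result)

# Inspect the current commit description and the keys now stored in the result.
println(gitdescribe())
println(collect(keys(result)))
```

Saving the tagged dictionary together with the data records which commit produced it.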
Transitioning an existing project to DrWatson
If you already have an existing project with scripts, data, etc., then there is no need to use the initialize_project function. The only requirement is that everything that belongs to your project is contained within a single folder (which can have an arbitrary number of subfolders). If your project is already a Julia project (which means it has its own Project.toml and Manifest.toml files), then nothing more needs to be done and you can immediately start using DrWatson with it. Although we recommend following the Default Project Setup, you don't have to do this either, since you can create your own Custom directory functions.
If your project is not also a Julia project, the steps necessary are still quite simple. You can do:
julia> cd("path/to/project")
pkg> activate .
pkg> add Package1 Package2 ...
Julia will automatically make the Project.toml and Manifest.toml files for you as you add packages used by your project.