Software

Good software packages may make your experimental cycle (and life) a lot easier.

Intended for: BSc, MSc, PhD

Back-up & Version control

Managing Experiments

Deep Learning Libraries

Datasets

Reinforcement Learning Environments

Parallelization

LaTeX documents

References

Presentations & Diagrams

Back-up & Version control

The first thing to set up is version control and a back-up for all your relevant documents. You don't want to lose progress because you accidentally deleted a file or folder.
For your code, use Git (with GitHub), even if you work on a project alone. It naturally allows you to branch off in new experimental directions and later merge or discard them.
For all other files, I advise using a Dropbox account (the first 2 GB are free, which is definitely enough for a few projects if you mostly store text files and images). Install the Dropbox folder on all your devices, and work on all your relevant documents from inside this folder. You then always have a back-up, and your files are automatically up to date on all of your devices (so you do not have to carry a laptop around all the time).

Git (with GitHub)

Dropbox

Managing Experiments

Managing experiments can be a lot of hassle: you need to start multiple experiments with different hyperparameters and repetitions, log all their results, generate learning curves from them, etc. A few years back, researchers would all write this code themselves (which often took more time than the actual experiment code). Luckily, these days there are very convenient packages that help you launch experiments, control them, log all your results, and directly visualize them. The best option is Weights and Biases.

Weights and Biases
Very important: in bigger experiments, always separate result logging from plotting. You do not want to redo your experiments if you want to slightly adjust your plot. Weights and Biases automatically handles this for you.
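
If you want to see what this separation looks like without any package, here is a minimal sketch using only the Python standard library. It is not how Weights and Biases works internally; the function names, the file name results.jsonl, and the fake loss curve in run_experiment are all hypothetical placeholders for your own training code.

```python
import json
import random
import statistics
import tempfile
from pathlib import Path


def run_experiment(learning_rate, seed, n_steps=20):
    """Hypothetical stand-in for a real training run: a noisy, decreasing loss curve."""
    rng = random.Random(seed)
    return [1.0 / (1.0 + learning_rate * step) + rng.uniform(0.0, 0.01)
            for step in range(n_steps)]


def log_results(path, learning_rates, seeds):
    """Stage 1: run every setting once and write the raw results to a JSON-lines file."""
    with open(path, "w") as f:
        for lr in learning_rates:
            for seed in seeds:
                record = {"learning_rate": lr, "seed": seed,
                          "losses": run_experiment(lr, seed)}
                f.write(json.dumps(record) + "\n")


def load_mean_curves(path):
    """Stage 2: read the log and average over repetitions; no experiment is rerun."""
    curves = {}
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            curves.setdefault(record["learning_rate"], []).append(record["losses"])
    return {lr: [statistics.mean(step_values) for step_values in zip(*runs)]
            for lr, runs in curves.items()}


# Assumed layout: one temporary directory holding a single log file.
log_path = Path(tempfile.mkdtemp()) / "results.jsonl"
log_results(log_path, learning_rates=[0.1, 1.0], seeds=[0, 1, 2])

# Plotting code would start here and only ever touch the log file.
mean_curves = load_mean_curves(log_path)
for lr, curve in mean_curves.items():
    print(f"lr={lr}: final mean loss {curve[-1]:.3f}")
```

Because the second stage only reads the log, you can change colours, smoothing, or axis ranges as often as you like without rerunning a single experiment.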

Deep Learning Libraries

To implement and train your neural network, you want to use a deep learning library. These packages are actually 'automatic differentiation engines': they allow you to build a network/graph through which you can automatically differentiate (a loss with respect to the variables). Examples are:
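
To make the term 'automatic differentiation' concrete, here is a toy sketch of reverse-mode differentiation on scalars in plain Python. It only illustrates the idea of recording a graph and applying the chain rule backwards; the Value class is a hypothetical name, and real libraries work on tensors with far more operations and optimizations.

```python
class Value:
    """A scalar that remembers how it was computed, so gradients can flow back."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # the Values this one was computed from
        self._local_grads = local_grads  # d(self)/d(parent) for each parent

    @staticmethod
    def _wrap(other):
        return other if isinstance(other, Value) else Value(other)

    def __add__(self, other):
        other = Value._wrap(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __sub__(self, other):
        other = Value._wrap(other)
        return Value(self.data - other.data, (self, other), (1.0, -1.0))

    def __mul__(self, other):
        other = Value._wrap(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Order the graph so that each node is handled after everything that uses it.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            # Chain rule: pass this node's gradient on to its parents.
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad


# A one-parameter-pair "network": prediction = w * x + b, with a squared-error loss.
w, b = Value(3.0), Value(0.5)
x, y = 2.0, 7.0
error = w * x + b - y
loss = error * error
loss.backward()
print(loss.data, w.grad, b.grad)  # 0.25 -2.0 -1.0
```

The gradients match the hand-derived values 2 * error * x and 2 * error, which is exactly the bookkeeping a deep learning library does for you on much larger graphs.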

Datasets

For supervised learning experiments, you will need a dataset to train on. There is a huge variety of available datasets, which range in type of challenge and difficulty.

Some common examples for computer vision include MNIST, ImageNet, CIFAR-10/100.

Overview of datasets gives a more extensive list, also for other fields (natural language processing, speech recognition, etc.).

Reinforcement Learning Environments

For reinforcement learning experiments, you usually need environments to test on. Most environments follow the Environment class template introduced in Gym. Many researchers have written new environments (or whole environment packages) in the same template; you may find examples in the lists below.
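
To give a feeling for this template, here is a dependency-free sketch of an environment that mimics the classic Gym interface: reset() returns an observation, and step(action) returns (observation, reward, done, info). The CorridorEnv class and its dynamics are hypothetical and do not inherit from the real Gym base class. Note that newer Gymnasium releases use a slightly different signature (reset also returns an info dictionary, and step splits done into terminated and truncated), so check the version you install.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk right along a corridor to reach the goal."""

    def __init__(self, length=5, max_steps=20):
        self.length = length
        self.max_steps = max_steps
        self.position = 0
        self.steps_taken = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.position = 0
        self.steps_taken = 0
        return self.position

    def step(self, action):
        """Apply an action (0 = left, 1 = right) and return the transition."""
        move = 1 if action == 1 else -1
        # Clip so the agent cannot leave the corridor.
        self.position = min(max(self.position + move, 0), self.length - 1)
        self.steps_taken += 1
        reached_goal = self.position == self.length - 1
        reward = 1.0 if reached_goal else 0.0
        done = reached_goal or self.steps_taken >= self.max_steps
        return self.position, reward, done, {}


def run_episode(env, policy):
    """Generic interaction loop; it works for any environment with this interface."""
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        observation, reward, done, _ = env.step(policy(observation))
        total_reward += reward
    return total_reward


env = CorridorEnv()
always_right = run_episode(env, lambda obs: 1)
rng = random.Random(0)
random_policy = run_episode(env, lambda obs: rng.choice([0, 1]))
print(always_right, random_policy)
```

The point of the shared template is the run_episode loop: once your agent is written against reset and step, you can swap in a different environment without touching the agent code.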

Parallelization

When computation is a bottleneck, you may want to parallelize your code. However, read the considerations below first:

Note that in ML experiments you often need to run repetitions and different hyperparameter settings. Therefore, think carefully about whether you even need parallelization: you need to run many separate experiments anyway, which you can also start next to each other (and run each of them longer).
Note that Python has a Global Interpreter Lock (GIL): therefore, threading within a Python process will generally not speed up your code (much).
Note that deep learning libraries usually parallelize the neural network operations over the available resources automatically. Therefore, you often do not need to implement parallelization to improve that part of your code.

If computation outside of the network operations remains the bottleneck, then you may want to parallelize your Python code. The best option is to use Ray.
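
To illustrate the general idea of getting around the GIL with separate processes, here is a small sketch. Note that it deliberately uses the standard library's concurrent.futures instead of Ray, so that it runs without extra installs; it is not Ray's API. The simulate function is a hypothetical CPU-bound task standing in for your own code.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def simulate(seed, n_samples=50_000):
    """Hypothetical CPU-bound task: a Monte Carlo estimate of pi from one seed."""
    rng = random.Random(seed)
    inside = sum(1 for _ in range(n_samples)
                 if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * inside / n_samples


def run_parallel(seeds, max_workers=2):
    """Run one task per seed in separate processes, each with its own interpreter and GIL."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(simulate, seeds))


if __name__ == "__main__":
    # The guard matters: worker processes may re-import this file on some platforms.
    estimates = run_parallel(seeds=[0, 1, 2, 3])
    print([round(estimate, 3) for estimate in estimates])
```

Each task here is fully independent and takes only a seed as input, which is the easy case. If your workers need to share large objects or state, that is where a dedicated package starts to pay off.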

LaTeX documents

You can of course work with a local TeX installation (I usually still do). However, an alternative is to use an online LaTeX editor, such as Overleaf. Two important benefits are i) automatic version control and ii) easy sharing with supervisors/collaborators.

Overleaf

References

To keep track of your references, it can be helpful to use a reference manager. For a bachelor's or master's thesis this is probably not necessary, but as a PhD student it might be useful. For a paper, you can simply export the required bibliography file from your reference manager. Some popular options:

Presentations & Diagrams

For presentations and diagrams, I mostly use online tools. You can make very neat presentations with Google Slides, and all your progress is automatically stored. For conceptual drawings, you can of course write scripts, but they take a lot of time. I usually prefer the online tool Diagrams.net, which provides fast prototyping and a clean layout.
