Create our own CMIP6 Python environment

This notebook sets up an environment using conda for our CMIP6 analysis.

This allows you to install and use specific versions of Python packages, which has three advantages:

  1. You can use newer versions of packages than those available by default on the JASMIN servers.
  2. You can use consistent versions of packages across all your analysis on JASMIN, whether you are using the JASMIN Notebook Service or the JASMIN Scientific Analysis (sci) servers.
  3. You can allow others (including your future self) to install the same package versions, ensuring that your work is reproducible.

*Hint: if all you're looking to do is set up a Python environment with some up-to-date versions of frequently-used packages, then you may find that it's quicker and easier to use our pre-installed environment.*

Overview of workflow

You need to run through these steps once for each environment that you want to create. You then need to associate each notebook with its environment, which is done once per notebook.

The environment is defined in a file which by convention is usually named environment.yml. We provide an example environment.yml file, which will give you the latest versions of several Python packages. You can install these and then freeze them: freezing records the exact version numbers you ended up installing in an environment file, so that anyone else running your code gets the same versions.
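As a rough illustration of the file format, the following shell snippet writes a minimal environment.yml if one does not already exist. The channel and package list are assumptions chosen for illustration, not the contents of the example file provided with this notebook. ipykernel is included because Jupyter needs it to run code inside the environment.

```shell
# Write a minimal, illustrative environment.yml unless one is already present.
# The channel and package names below are assumptions, not the provided example file.
if [ ! -f environment.yml ]; then
cat > environment.yml <<'EOF'
name: cmip6
channels:
  - conda-forge
dependencies:
  - python
  - numpy
  - xarray
  - ipykernel
EOF
fi
```

Leaving out version numbers at this stage asks conda for the latest compatible versions; the exact versions are recorded later, when the environment is frozen.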

Step 1: Create a new environment with packages from environment.yml

First we create a new conda environment, installing all the packages defined in the environment file.

We'll assume that you name the environment cmip6 and that your environment definition is in environment.yml (if you are working with more than one environment then you'll need to give each a distinct name and definition file).

We're going to run commands in the shell using the special Jupyter ! syntax. This is equivalent to running the commands directly on the command line (if you are working at the command line rather than in JupyterLab, type the commands directly, without the !).
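A minimal sketch of the creation step, written as plain shell and using the names assumed above (cmip6 and environment.yml). The check around the command is an addition for safety and is not required by conda.

```shell
# Names assumed in the text: environment "cmip6", defined in environment.yml.
ENV_NAME=cmip6
ENV_FILE=environment.yml

# Create the environment only when conda and the definition file are available.
if command -v conda >/dev/null 2>&1 && [ -f "$ENV_FILE" ]; then
    conda env create --name "$ENV_NAME" --file "$ENV_FILE"
else
    echo "conda or $ENV_FILE not found; nothing created" >&2
fi
```

From a notebook cell, the core command would be `!conda env create --name cmip6 --file environment.yml`. Solving and downloading the packages can take several minutes.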

If you've used pip virtual environments before, you'll recognise that this is similar to how you use a requirements.txt file with pip.

Step 2: Freeze the versions of the packages that you have just installed

By freezing the exact versions of all the packages that we have installed to a new environment_frozen.yml file, we ensure that we have a record of all the software needed to recreate our results. We can then reinstall an (almost) identical copy of the environment on another machine, or at a later date.

If you've used virtual environments with pip before, you'll recognise that this is a bit like the pip freeze command.
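One way to do the freeze is with conda env export, sketched below under the assumption that the environment is named cmip6. The existence check is an added precaution so that an empty file is not written by mistake.

```shell
ENV_NAME=cmip6
FROZEN_FILE=environment_frozen.yml

# Export the exact package versions, but only if the environment exists.
if conda env list 2>/dev/null | grep -q "^${ENV_NAME}[[:space:]]"; then
    conda env export --name "$ENV_NAME" > "$FROZEN_FILE"
else
    echo "environment $ENV_NAME not found; nothing exported" >&2
fi
```

The exported file lists every package in the environment, including dependencies that were pulled in automatically, so it is usually much longer than the original environment.yml.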

Step 3: Register your environment with Jupyter

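Next we need to register the new environment with Jupyter as a kernel, so that it appears in the list of kernels that notebooks can use.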

We only need to do this once per environment.

If you made changes to the environment file in the previous steps then make sure you still include the ipykernel package, because Jupyter uses this to execute code within the environment.
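The registration could look something like the sketch below. It uses conda run to call ipykernel from inside the cmip6 environment, which is one possible approach (activating the environment first and then running the same python command is another). The display name is an arbitrary label chosen for illustration.

```shell
ENV_NAME=cmip6

# Run ipykernel from inside the environment to register it as a Jupyter kernel
# for the current user. The display name is an illustrative label (assumption).
if conda env list 2>/dev/null | grep -q "^${ENV_NAME}[[:space:]]"; then
    conda run --name "$ENV_NAME" python -m ipykernel install --user \
        --name "$ENV_NAME" --display-name "Python ($ENV_NAME)"
else
    echo "environment $ENV_NAME not found; kernel not registered" >&2
fi
```

You may need to refresh the JupyterLab page before the new kernel shows up in the kernel list.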

Finally, we need to tell Jupyter to use the environment when executing code. We need to do this once for each notebook that will be using the environment.

In Jupyter this is called the 'kernel', and can be changed from either the top-right of the screen, or the "Kernel" menu:

[Screenshot of kernel change mechanism]

and then choosing the name of your kernel from the dropdown list:

[Screenshot of kernel change dropdown]

Step 4 (optional): Subsequently add further packages

There are two main ways to add packages:

  1. Install the latest version of a package and then re-freeze the versions of packages you have installed to your environment file
  2. If you already know the exact packages and versions you want, you can add them to the environment file directly and then update your environment

We recommend that you use conda packages where possible, but you can also specify packages to be installed into the environment using pip.

Method 1: Install and freeze

You can install the latest available version of a package using the conda install command.

For example, to install the latest version of the requests library:
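A sketch of the install command, assuming the environment is named cmip6. The --yes flag skips the confirmation prompt, which is helpful when running from a notebook cell where you cannot answer it.

```shell
ENV_NAME=cmip6
PACKAGE=requests

# Install the latest available version of the package into the named environment.
if conda env list 2>/dev/null | grep -q "^${ENV_NAME}[[:space:]]"; then
    conda install --name "$ENV_NAME" --yes "$PACKAGE"
fi
```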

If a package isn't available on the conda repository, then we could try conda-forge or use pip:
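Both fallbacks can be combined, as in the sketch below: try the conda-forge channel first and only fall back to pip if that fails. The package name altair is used purely as an example, and chaining the two commands like this is an illustrative choice rather than a required pattern.

```shell
ENV_NAME=cmip6
PACKAGE=altair  # example package name, for illustration only

if conda env list 2>/dev/null | grep -q "^${ENV_NAME}[[:space:]]"; then
    # Try the conda-forge channel first; if that fails, install with pip
    # using the Python interpreter that belongs to the environment.
    conda install --name "$ENV_NAME" --channel conda-forge --yes "$PACKAGE" \
        || conda run --name "$ENV_NAME" python -m pip install "$PACKAGE"
fi
```

Running pip through the environment's own Python ensures that the package lands in cmip6 rather than in whichever Python happens to be first on your path.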

The final step in this method is to then freeze the exact versions of all the packages that we have installed to our environment file.
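The re-freeze can be done with conda env export, writing to environment_frozen.yml. The sketch below additionally keeps a copy of the previous frozen file and prints the differences, which is an optional extra (with a hypothetical .previous suffix) to make it easy to see what the new install changed.

```shell
ENV_NAME=cmip6
FROZEN_FILE=environment_frozen.yml

if conda env list 2>/dev/null | grep -q "^${ENV_NAME}[[:space:]]"; then
    # Keep the previous frozen file (if any) so the changes can be inspected.
    if [ -f "$FROZEN_FILE" ]; then
        cp "$FROZEN_FILE" "$FROZEN_FILE.previous"
    fi
    conda env export --name "$ENV_NAME" > "$FROZEN_FILE"
    # diff exits with a non-zero status when the files differ, which is expected here.
    if [ -f "$FROZEN_FILE.previous" ]; then
        diff "$FROZEN_FILE.previous" "$FROZEN_FILE" || true
    fi
fi
```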

Method 2: Modify environment file

The other way to add more packages to the environment is to add them to the dependencies section of your environment_frozen.yml file:

dependencies:
 - python=3.9.1
 - numpy=1.19.2
 - pip
 - pip:
    - altair==4.1.0

Note how you can specify the exact versions of python and the packages that you are using (you don't have to, but it's recommended that you do).

Packages that can be installed via conda can be specified as <package name>=<package version>, e.g. numpy=1.19.2.

For packages that aren't included in the default conda repository, you can install them via pip, but note that you must include pip as a conda dependency and then specify version numbers with a double-equals sign, e.g. altair==4.1.0. Instead of pip, you can also try conda-forge.

Finally, to actually install the packages, we need to tell conda to update the environment according to the changes we have made:
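A sketch of the update step, assuming the edited file is environment_frozen.yml and the environment is named cmip6:

```shell
ENV_NAME=cmip6
FROZEN_FILE=environment_frozen.yml

# Apply the edited environment file to the existing environment.
if command -v conda >/dev/null 2>&1 && [ -f "$FROZEN_FILE" ]; then
    conda env update --name "$ENV_NAME" --file "$FROZEN_FILE"
else
    echo "conda or $FROZEN_FILE not found; nothing updated" >&2
fi
```

This installs new packages and changes versions to match the file, but by default it does not uninstall packages that you have deleted from the file.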

If we also want to remove packages from the environment, then we can recreate the environment afresh:
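Recreating could be done by removing the environment and building it again from the frozen file, roughly as follows. The checks are an added precaution so that nothing is deleted unless the file needed to rebuild the environment is present.

```shell
ENV_NAME=cmip6
FROZEN_FILE=environment_frozen.yml

# Only remove the environment if it exists and we have a file to rebuild it from.
if [ -f "$FROZEN_FILE" ] && conda env list 2>/dev/null | grep -q "^${ENV_NAME}[[:space:]]"; then
    conda env remove --name "$ENV_NAME" --yes
    conda env create --name "$ENV_NAME" --file "$FROZEN_FILE"
fi
```

If a notebook is currently running on the cmip6 kernel, it is safer to run these commands from a terminal and restart the kernel afterwards.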

For further information, see the "Managing environments" section of the conda documentation.

Acknowledgements

This notebook is based on the conda envs tutorial from CEDA.

By: James Thomas and William Seviour

Last updated: 27th May 2021