Finding CMIP6 data in the CEDA Archive

This notebook gives a quickstart guide to finding a CMIP6 dataset in the CEDA Archive and using it on JASMIN.

Introduction to the CMIP6 data structure

The CMIP6 dataset is organised in a very specific directory and filename structure. On JASMIN, this structure is available via the top-level directory: /badc/cmip6/data/. Data is stored separately for each variable in NetCDF files ending in .nc.

A full guide to the data structure is provided in the CEDA help. A summary follows.

You can also download a spreadsheet of the variable definitions, and it may be helpful to search the CMIP6_CVs and Earth System Documentation websites.

Directories are structured as follows:

Filenames then repeat a number of these fields, with the addition of a time range, as follows:

The above examples would produce the directory:

/badc/cmip6/data/CMIP6/LUMIP/MOHC/UKESM1-0-LL/deforest-globe/r1i1p1f2/Amon/ch4/gn/v20200203

and filename:

ch4_Amon_UKESM1-0-LL_deforest-globe_r1i1p1f2_gn_185001-192912.nc

for the idealised transient global deforestation dataset.
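As a minimal sketch of how the directory and filename are assembled, the Python below joins a set of fields into the path and filename shown above. The field names (mip_era, activity_id and so on) are an assumption based on the CMIP6 Data Reference Syntax naming, and the helper functions are hypothetical names used only for illustration.

```python
from pathlib import PurePosixPath

# Top-level CMIP6 directory on JASMIN, as given in this notebook.
CMIP6_ROOT = PurePosixPath("/badc/cmip6/data")

# Assumed field order, following the CMIP6 Data Reference Syntax naming.
DIRECTORY_FIELDS = [
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
]
FILENAME_FIELDS = [
    "variable_id", "table_id", "source_id", "experiment_id",
    "member_id", "grid_label",
]


def build_directory(facets):
    """Join the directory fields, in order, under the CMIP6 root."""
    return str(CMIP6_ROOT.joinpath(*(facets[name] for name in DIRECTORY_FIELDS)))


def build_filename(facets, time_range):
    """Join the filename fields with underscores and append the time range."""
    stem = "_".join(facets[name] for name in FILENAME_FIELDS)
    return f"{stem}_{time_range}.nc"


# Values taken from the example directory and filename above.
example_facets = {
    "mip_era": "CMIP6",
    "activity_id": "LUMIP",
    "institution_id": "MOHC",
    "source_id": "UKESM1-0-LL",
    "experiment_id": "deforest-globe",
    "member_id": "r1i1p1f2",
    "table_id": "Amon",
    "variable_id": "ch4",
    "grid_label": "gn",
    "version": "v20200203",
}

print(build_directory(example_facets))
print(build_filename(example_facets, "185001-192912"))
```

Note that the version appears only in the directory, while the time range appears only in the filename.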

Finding CMIP6 data using the CEDA Archive

The CEDA Archive contains a subset of the CMIP6 data. You should first search the CEDA Archive to see if the data you need is already available. If you can't find it, then you can also try searching the full CMIP6 Archive.

Overview

  1. Check the CEDA Archive to see if the data you need is there already:

Note: when browsing the CEDA Archive, some directories initially appear empty and there is a slight delay before they are populated. If you find this is too slow, you can also try the live view.

  2. If you can't find the dataset that you need on the CEDA Archive, then search the Earth System Grid Federation site which contains the full CMIP6 Archive.

Browsing the CEDA Archive using the Catalogue

Screenshot of a project page on the CEDA Archive

Then browse to either:

  1. A collection of datasets, e.g. the UKESM1-0-LL model output collection, or
  2. A dataset directly, e.g. the UKESM1-0-LL model output for the "deforest-globe" experiment

Screenshot of datasets and collections on a project page on the CEDA Archive

On an individual dataset page, you can then view:

  1. An abstract for the data with citation information
  2. The temporal and geographic extent of the data
  3. A definition of the variables supplied in the dataset
  4. An option to download the dataset

Screenshot of a dataset on the CEDA Archive

From the dataset's catalogue page, you can click Download to move to the CEDA Archive, where you can then find the variables you are interested in and locate their individual NetCDF files.

You can then find:

  1. The path to the dataset
  2. The name of the NetCDF file
  3. A button that allows you to copy a JASMIN-compatible path to your clipboard

Note: some directories initially appear empty and there is a slight delay before they are populated.

Screenshot of a directory listing on the CEDA Archive

You should then be able to find the NetCDF data files on JASMIN using these paths, and then use the command ncdump -h to view the header information:
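One way to run ncdump -h from a notebook is through Python's subprocess module. This is a sketch only: the helper names are hypothetical, it assumes the ncdump utility is installed and on your PATH, and the example path simply combines the directory and filename used earlier in this notebook.

```python
import shutil
import subprocess


def ncdump_header_command(nc_path):
    """Build the argument list for 'ncdump -h <file>' (header only, no data)."""
    return ["ncdump", "-h", str(nc_path)]


def show_header(nc_path):
    """Return the header text for a NetCDF file, or a message if that fails."""
    if shutil.which("ncdump") is None:
        return "ncdump was not found on this system"
    result = subprocess.run(
        ncdump_header_command(nc_path), capture_output=True, text=True
    )
    # On failure, ncdump reports the problem on stderr.
    return result.stdout if result.returncode == 0 else result.stderr


if __name__ == "__main__":
    example_file = (
        "/badc/cmip6/data/CMIP6/LUMIP/MOHC/UKESM1-0-LL/deforest-globe/"
        "r1i1p1f2/Amon/ch4/gn/v20200203/"
        "ch4_Amon_UKESM1-0-LL_deforest-globe_r1i1p1f2_gn_185001-192912.nc"
    )
    print(show_header(example_file))
```

The -h option prints only the header (dimensions, variables and attributes), so it is quick even for large files.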

Converting dataset identifiers found on the ESGF site

You may also have found datasets of interest by searching the Earth System Grid Federation (ESGF) site. If you want to load datasets that you have located on this website, you will need to convert the identifiers into directory paths by replacing the dots with slashes, and adding the top-level directory for the CMIP6 data on JASMIN, as follows:

e.g.

CMIP6.LUMIP.MOHC.UKESM1-0-LL.deforest-globe.r1i1p1f2.Amon.ch4.gn

becomes

/badc/cmip6/data/CMIP6/LUMIP/MOHC/UKESM1-0-LL/deforest-globe/r1i1p1f2/Amon/ch4/gn

You can convert the identifier and list the NetCDF files available using the following commands:

This will list all the NetCDF files that are available in that dataset, or it will give you an error saying No such file or directory if the data is not available on JASMIN.
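The same conversion can also be sketched in Python. The function names below are hypothetical, and unlike a shell listing this version returns an empty list, rather than raising an error, when the dataset is not on JASMIN. It searches recursively because the identifier in this example does not include the version directory.

```python
from pathlib import Path

# Top-level CMIP6 directory on JASMIN, as given in this notebook.
CMIP6_ROOT = "/badc/cmip6/data"


def dataset_id_to_path(dataset_id, root=CMIP6_ROOT):
    """Replace the dots in an ESGF identifier with slashes and prepend the root."""
    return root.rstrip("/") + "/" + dataset_id.replace(".", "/")


def list_netcdf_files(dataset_id, root=CMIP6_ROOT):
    """List all .nc files below the dataset directory (empty list if absent)."""
    dataset_dir = Path(dataset_id_to_path(dataset_id, root))
    if not dataset_dir.is_dir():
        return []
    return sorted(str(path) for path in dataset_dir.rglob("*.nc"))


dataset_id = "CMIP6.LUMIP.MOHC.UKESM1-0-LL.deforest-globe.r1i1p1f2.Amon.ch4.gn"
print(dataset_id_to_path(dataset_id))
for nc_file in list_netcdf_files(dataset_id):
    print(nc_file)
```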

Searching for data on the command line

If you are comfortable searching for datafiles via the command line, but do not know the specific directory path you need, you can run a search using one or more wildcards:

The more wildcards you use, the slower the search will be.
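A wildcard search of this kind could be sketched in Python with the standard glob module. The field names are again an assumption based on the CMIP6 Data Reference Syntax, and build_search_pattern is a hypothetical helper: any field you do not supply becomes a * wildcard.

```python
import glob

# Top-level CMIP6 directory on JASMIN, as given in this notebook.
CMIP6_ROOT = "/badc/cmip6/data"

# Assumed directory field order, following the CMIP6 Data Reference Syntax.
DIRECTORY_FIELDS = [
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
]


def build_search_pattern(root=CMIP6_ROOT, **known):
    """Build a glob pattern, using '*' for every field that is not supplied."""
    unexpected = set(known) - set(DIRECTORY_FIELDS)
    if unexpected:
        raise ValueError(f"Unknown field(s): {sorted(unexpected)}")
    parts = [known.get(name, "*") for name in DIRECTORY_FIELDS]
    return "/".join([root.rstrip("/"), *parts, "*.nc"])


# Search for monthly ch4 from UKESM1-0-LL in any LUMIP experiment.
pattern = build_search_pattern(
    mip_era="CMIP6",
    activity_id="LUMIP",
    institution_id="MOHC",
    source_id="UKESM1-0-LL",
    table_id="Amon",
    variable_id="ch4",
)
print(pattern)

# glob returns an empty list if nothing matches (for example, off JASMIN).
matches = sorted(glob.glob(pattern))
print(f"{len(matches)} file(s) found")
```

Supplying the fields nearest the top of the directory tree narrows the search the most, because the filesystem has fewer directories to walk.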

Data that is not available on JASMIN

If you can't find the dataset that you need on the CEDA Archive, then you can try searching the Earth System Grid Federation (ESGF) site which contains the full CMIP6 Archive.

For datasets with only a handful of small data files, you can download them onto JASMIN using curl. For larger datasets, you should first contact the CEDA helpdesk.

For example, from a notebook or on the command line, you can download a NetCDF file by using the HTTP download link on the ESGF website:
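A minimal sketch of such a download, driven from Python, is shown below. The helper names are hypothetical and the URL is a placeholder rather than a real ESGF link: substitute the HTTP download link copied from the ESGF website. It assumes curl is installed and on your PATH.

```python
import subprocess
from pathlib import PurePosixPath
from urllib.parse import urlparse


def curl_download_command(url, dest_dir="."):
    """Build a curl command that saves the file under its original name."""
    filename = PurePosixPath(urlparse(url).path).name
    if not filename.endswith(".nc"):
        raise ValueError(f"URL does not point to a NetCDF file: {url}")
    output = f"{dest_dir.rstrip('/')}/{filename}"
    # --location follows redirects; --fail makes HTTP errors return non-zero.
    return ["curl", "--location", "--fail", "--output", output, url]


def download(url, dest_dir="."):
    """Run the download, raising CalledProcessError if curl fails."""
    subprocess.run(curl_download_command(url, dest_dir), check=True)


# Placeholder URL for illustration only (not a real ESGF data node).
example_url = (
    "https://esgf.example.org/path/to/"
    "ch4_Amon_UKESM1-0-LL_deforest-globe_r1i1p1f2_gn_185001-192912.nc"
)
print(" ".join(curl_download_command(example_url)))
# To perform the download with a real link: download(real_url, dest_dir=".")
```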

And then check its contents using ncdump -h:

Further reading

Acknowledgements

This notebook is inspired by the CMIP6 data at CEDA guide produced by the CEDA team.

By: James Thomas

Last updated: 21st April 2021