This notebook gives a quickstart guide to finding a CMIP6 dataset on the CEDA Archive and using it within JASMIN.
The CMIP6 dataset is organised in a very specific directory and filename structure. On JASMIN, this structure is available via the top-level directory /badc/cmip6/data/. Data is stored separately for each variable in NetCDF files ending in .nc.
A full guide to the data structure is provided in the CEDA help. A summary follows.
You can also download a spreadsheet of the variable definitions, and it may be helpful to search the CMIP6_CVs and Earth System Documentation websites.
Directories are structured as follows:
/badc/cmip6/data/
<mip_era>/
phase of the project, e.g. CMIP6
<activity_id>/
identifier of the MIP, e.g. LUMIP
<institution_id>/
institution responsible for the model, e.g. MOHC
<source_id>/
model used (ES-DOC), e.g. UKESM1-0-LL
<experiment_id>/
set of experiments (ES-DOC, CMIP6 Data Request) being run, e.g. deforest-globe
<variant_label>/
in the form r0i0p0f0, e.g. r1i1p1f2, where the numbers are the indexes for the realization (r), initialization (i), physics (p) and forcing (f)
<table_id>/
MIP table (CMIP6 Data Request) being used, e.g. Amon
<variable_id>/
data variable, e.g. ch4
<grid_label>/
model grid being used, e.g. gn, where:
gm = global mean data
gn = data reported on a model's native grid
gr1 = regridded data reported on a grid other than the native grid and other than the preferred target grid
<version>
normally in the form vYYYYMMDD or latest, e.g. v20200203
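As an illustration of the variant label format described in the list above, the following minimal Python sketch splits a label such as r1i1p1f2 into its four indexes (realization, initialization, physics and forcing in CMIP6). The names VariantLabel and parse_variant_label are hypothetical helpers introduced here for illustration; they are not part of any CEDA or JASMIN tool.

```python
import re
from typing import NamedTuple


class VariantLabel(NamedTuple):
    """The four integer indexes encoded in a variant label (hypothetical helper)."""
    realization: int
    initialization: int
    physics: int
    forcing: int


# r<N>i<N>p<N>f<N>, where each <N> is a non-negative integer index
_VARIANT_RE = re.compile(r"r(\d+)i(\d+)p(\d+)f(\d+)")


def parse_variant_label(label: str) -> VariantLabel:
    """Split a variant label such as 'r1i1p1f2' into its four indexes."""
    match = _VARIANT_RE.fullmatch(label)
    if match is None:
        raise ValueError(f"not a valid variant label: {label!r}")
    return VariantLabel(*(int(group) for group in match.groups()))


print(parse_variant_label("r1i1p1f2"))
```

Rejecting anything that does not match the whole pattern makes a mistyped label fail early, before it is used to build a directory path.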
Filenames then repeat a number of these fields, with the addition of a time range, as follows:
<variable_id>_
(as above)
<table_id>_
(as above)
<source_id>_
(as above)
<experiment_id>_
(as above)
<variant_label>_
(as above)
<grid_label>_
(as above)
<time_range>.nc
normally in the form YYYYMMDD-YYYYMMDD, but may be specified at a different time resolution, e.g. 185001-192912
The above examples would produce the directory:
/badc/cmip6/data/CMIP6/LUMIP/MOHC/UKESM1-0-LL/deforest-globe/r1i1p1f2/Amon/ch4/gn/v20200203
and filename:
ch4_Amon_UKESM1-0-LL_deforest-globe_r1i1p1f2_gn_185001-192912.nc
!ncdump -h /badc/cmip6/data/CMIP6/LUMIP/MOHC/UKESM1-0-LL/deforest-globe/r1i1p1f2/Amon/ch4/gn/v20200203/ch4_Amon_UKESM1-0-LL_deforest-globe_r1i1p1f2_gn_185001-192912.nc
netcdf ch4_Amon_UKESM1-0-LL_deforest-globe_r1i1p1f2_gn_185001-192912 {
dimensions:
    time = UNLIMITED ; // (960 currently)
    bnds = 2 ;
    plev = 19 ;
    lat = 144 ;
    lon = 192 ;
variables:
    double time(time) ;
        time:bounds = "time_bnds" ;
        time:units = "days since 1850-01-01" ;
        time:calendar = "360_day" ;
        time:axis = "T" ;
        time:long_name = "time" ;
        time:standard_name = "time" ;
    double time_bnds(time, bnds) ;
    double plev(plev) ;
        plev:units = "Pa" ;
        plev:axis = "Z" ;
        plev:positive = "down" ;
...
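The directory and filename conventions described above can be sketched in Python as follows. This is only an illustration that assembles the example path from its fields; the constant and function names (DIRECTORY_FIELDS, FILENAME_FIELDS, dataset_directory, dataset_filename) are hypothetical and introduced here, not taken from any CEDA tool.

```python
from pathlib import PurePosixPath

CMIP6_ROOT = PurePosixPath("/badc/cmip6/data")

# Directory levels, in order, below the top-level directory
DIRECTORY_FIELDS = (
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "variant_label", "table_id", "variable_id", "grid_label", "version",
)
# Fields repeated in the filename, in order, before the time range
FILENAME_FIELDS = (
    "variable_id", "table_id", "source_id", "experiment_id",
    "variant_label", "grid_label",
)


def dataset_directory(fields: dict) -> PurePosixPath:
    """Join the directory-level fields, in order, under the top-level directory."""
    return CMIP6_ROOT.joinpath(*(fields[name] for name in DIRECTORY_FIELDS))


def dataset_filename(fields: dict, time_range: str) -> str:
    """Join the filename fields with underscores and append the time range."""
    stem = "_".join(fields[name] for name in FILENAME_FIELDS)
    return f"{stem}_{time_range}.nc"


example = {
    "mip_era": "CMIP6",
    "activity_id": "LUMIP",
    "institution_id": "MOHC",
    "source_id": "UKESM1-0-LL",
    "experiment_id": "deforest-globe",
    "variant_label": "r1i1p1f2",
    "table_id": "Amon",
    "variable_id": "ch4",
    "grid_label": "gn",
    "version": "v20200203",
}

print(dataset_directory(example))
print(dataset_filename(example, "185001-192912"))
```

Keeping the field order in two tuples means the directory and filename layouts are each stated once, so a missing field raises a KeyError rather than silently producing a wrong path.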
The CEDA Archive contains a subset of the CMIP6 data. You should first search the CEDA Archive to see if the data you need is already available. If you can't find it, then you can also try searching the full CMIP6 Archive.
Note: when browsing the CEDA Archive, some directories initially appear empty and there is a slight delay before they are populated. If you find this is too slow, you can also try the live view.
Then browse to either:
On an individual dataset page, you can then view:
From the dataset's catalogue page, you can click Download to move to the CEDA Archive, where you can then find the variables you are interested in and locate their individual NetCDF files.
You can then find:
Note: some directories initially appear empty and there is a slight delay before they are populated.
You should then be able to find the NetCDF data files on JASMIN using these paths, and then use the command ncdump -h to view the header information:
!ncdump -h /badc/cmip6/data/CMIP6/LUMIP/MOHC/UKESM1-0-LL/deforest-globe/r1i1p1f2/Amon/ch4/gn/v20200203/ch4_Amon_UKESM1-0-LL_deforest-globe_r1i1p1f2_gn_185001-192912.nc
netcdf ch4_Amon_UKESM1-0-LL_deforest-globe_r1i1p1f2_gn_185001-192912 {
dimensions:
    time = UNLIMITED ; // (960 currently)
    bnds = 2 ;
    plev = 19 ;
    lat = 144 ;
    lon = 192 ;
variables:
    double time(time) ;
        time:bounds = "time_bnds" ;
        time:units = "days since 1850-01-01" ;
        time:calendar = "360_day" ;
        time:axis = "T" ;
        time:long_name = "time" ;
        time:standard_name = "time" ;
    double time_bnds(time, bnds) ;
    double plev(plev) ;
        plev:units = "Pa" ;
        plev:axis = "Z" ;
        plev:positive = "down" ;
...
You may also have found datasets of interest by searching the Earth System Grid Federation (ESGF) site. If you want to load datasets that you have located on this website, you will need to convert the identifiers into directory paths by replacing the dots with slashes, and adding the top-level directory for the CMIP6 data on JASMIN, as follows:
e.g.
CMIP6.LUMIP.MOHC.UKESM1-0-LL.deforest-globe.r1i1p1f2.Amon.ch4.gn
becomes
/badc/cmip6/data/CMIP6/LUMIP/MOHC/UKESM1-0-LL/deforest-globe/r1i1p1f2/Amon/ch4/gn
You can convert the identifier and list the NetCDF files available using the following commands:
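%%bash
# Dataset identifier as shown on the ESGF site
cmip_identifier=CMIP6.LUMIP.MOHC.UKESM1-0-LL.deforest-globe.r1i1p1f2.Amon.ch4.gn
# Replace the dots with slashes and list everything below the resulting directory
find "/badc/cmip6/data/$(echo "$cmip_identifier" | tr . /)"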
/badc/cmip6/data/CMIP6/LUMIP/MOHC/UKESM1-0-LL/deforest-globe/r1i1p1f2/Amon/ch4/gn
/badc/cmip6/data/CMIP6/LUMIP/MOHC/UKESM1-0-LL/deforest-globe/r1i1p1f2/Amon/ch4/gn/files
/badc/cmip6/data/CMIP6/LUMIP/MOHC/UKESM1-0-LL/deforest-globe/r1i1p1f2/Amon/ch4/gn/files/d20200203
/badc/cmip6/data/CMIP6/LUMIP/MOHC/UKESM1-0-LL/deforest-globe/r1i1p1f2/Amon/ch4/gn/files/d20200203/ch4_Amon_UKESM1-0-LL_deforest-globe_r1i1p1f2_gn_185001-192912.nc
/badc/cmip6/data/CMIP6/LUMIP/MOHC/UKESM1-0-LL/deforest-globe/r1i1p1f2/Amon/ch4/gn/latest
/badc/cmip6/data/CMIP6/LUMIP/MOHC/UKESM1-0-LL/deforest-globe/r1i1p1f2/Amon/ch4/gn/v20200203
/badc/cmip6/data/CMIP6/LUMIP/MOHC/UKESM1-0-LL/deforest-globe/r1i1p1f2/Amon/ch4/gn/v20200203/ch4_Amon_UKESM1-0-LL_deforest-globe_r1i1p1f2_gn_185001-192912.nc
This will list all the directories and NetCDF files that are available in that dataset, or it will give you an error saying No such file or directory if the data is not available on JASMIN.
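The same conversion from identifier to directory path can be sketched in Python, which may be convenient if the paths are then passed to other Python code in the notebook. The function names identifier_to_path and list_netcdf_files are hypothetical, and the sketch assumes that no field in the identifier itself contains a dot. Instead of raising an error, it returns an empty list when the dataset directory is not present.

```python
from pathlib import Path


def identifier_to_path(identifier: str, root: str = "/badc/cmip6/data") -> Path:
    """Turn a dotted dataset identifier into a directory path under `root`."""
    # Assumption: none of the identifier's fields contains a dot
    return Path(root).joinpath(*identifier.split("."))


def list_netcdf_files(identifier: str, root: str = "/badc/cmip6/data") -> list:
    """Return the sorted .nc paths below the dataset directory, or [] if it is missing."""
    directory = identifier_to_path(identifier, root)
    if not directory.is_dir():
        return []
    return sorted(directory.rglob("*.nc"))


cmip_identifier = "CMIP6.LUMIP.MOHC.UKESM1-0-LL.deforest-globe.r1i1p1f2.Amon.ch4.gn"
print(identifier_to_path(cmip_identifier))
```

Calling list_netcdf_files(cmip_identifier) on JASMIN would then return the matching paths as Path objects. Unlike the find command, this sketch only reports entries whose names end in .nc, not the intermediate directories.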
If you are comfortable searching for datafiles via the command line, but do not know the specific directory path you need, you can run a search using one or more wildcards:
!ls /badc/cmip6/data/CMIP6/LUMIP/MOHC/*/deforest-globe/*/Amon/ch4/*/*/*.nc
/badc/cmip6/data/CMIP6/LUMIP/MOHC/UKESM1-0-LL/deforest-globe/r1i1p1f2/Amon/ch4/gn/latest/ch4_Amon_UKESM1-0-LL_deforest-globe_r1i1p1f2_gn_185001-192912.nc
/badc/cmip6/data/CMIP6/LUMIP/MOHC/UKESM1-0-LL/deforest-globe/r1i1p1f2/Amon/ch4/gn/v20200203/ch4_Amon_UKESM1-0-LL_deforest-globe_r1i1p1f2_gn_185001-192912.nc
The more wildcards you use, the slower the search will be.
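A wildcard search of this kind can also be sketched in Python with the standard glob module. In the sketch below, any directory level that is not specified is replaced by a wildcard. The names LEVELS and build_search_pattern are hypothetical helpers introduced for illustration.

```python
import glob

# Directory levels, in order, below the top-level directory
LEVELS = (
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "variant_label", "table_id", "variable_id", "grid_label", "version",
)


def build_search_pattern(root: str = "/badc/cmip6/data", **known: str) -> str:
    """Build a glob pattern, using '*' for every directory level not given."""
    unknown = set(known) - set(LEVELS)
    if unknown:
        raise ValueError(f"unknown field(s): {sorted(unknown)}")
    parts = [known.get(level, "*") for level in LEVELS]
    return "/".join([root.rstrip("/"), *parts, "*.nc"])


pattern = build_search_pattern(
    mip_era="CMIP6",
    activity_id="LUMIP",
    institution_id="MOHC",
    experiment_id="deforest-globe",
    table_id="Amon",
    variable_id="ch4",
)
print(pattern)

# Expand the pattern; the list is simply empty where the archive is not mounted
matches = sorted(glob.glob(pattern))
print(len(matches), "file(s) found")
```

As with the shell version, each unspecified level adds a wildcard, so supplying as many fields as you know keeps the search faster.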
If you can't find the dataset that you need on the CEDA Archive, then you can try searching the Earth System Grid Federation (ESGF) site which contains the full CMIP6 Archive.
For datasets with a handful of small data files, you can download them onto JASMIN using curl. For larger datasets, however, you should first contact the CEDA helpdesk.
For example, from a notebook or on the command line, you can download a NetCDF file by using the HTTP download link on the ESGF website:
!curl -O http://crd-esgf-drc.ec.gc.ca/thredds/fileServer/esgC_dataroot/AR6/CMIP6/LUMIP/CCCma/CanESM5/deforest-globe/r1i1p2f1/Ofx/areacello/gn/v20190429/areacello_Ofx_CanESM5_deforest-globe_r1i1p2f1_gn.nc
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1486k  100 1486k    0     0   210k      0  0:00:07  0:00:07 --:--:--  418k
And then check its contents using ncdump -h:
!ncdump -h areacello_Ofx_CanESM5_deforest-globe_r1i1p2f1_gn.nc
netcdf areacello_Ofx_CanESM5_deforest-globe_r1i1p2f1_gn {
dimensions:
    j = 291 ;
    i = 360 ;
    bnds = 2 ;
    vertices = 4 ;
variables:
    int j(j) ;
        j:units = "1" ;
        j:long_name = "cell index along second dimension" ;
    int i(i) ;
        i:units = "1" ;
        i:long_name = "cell index along first dimension" ;
    double latitude(j, i) ;
        latitude:standard_name = "latitude" ;
        latitude:long_name = "latitude" ;
        latitude:units = "degrees_north" ;
        latitude:missing_value = 1.e+20 ;
        latitude:_FillValue = 1.e+20 ;
        latitude:bounds = "vertices_latitude" ;
...
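Note that the filename downloaded above carries no time range, unlike the ch4 example earlier in this notebook. A minimal Python sketch for splitting a CMIP6 filename (or download URL) back into its fields, treating the time range as optional, might look like the following. The names Cmip6Filename and parse_cmip6_filename are hypothetical, and the sketch assumes that no individual field contains an underscore.

```python
from typing import NamedTuple, Optional


class Cmip6Filename(NamedTuple):
    """Fields encoded in a CMIP6 filename (hypothetical helper)."""
    variable_id: str
    table_id: str
    source_id: str
    experiment_id: str
    variant_label: str
    grid_label: str
    time_range: Optional[str]


def parse_cmip6_filename(name: str) -> Cmip6Filename:
    """Split a filename, path or URL ending in a CMIP6 .nc filename into its fields."""
    filename = name.rsplit("/", 1)[-1]  # drop any directory or URL prefix
    if not filename.endswith(".nc"):
        raise ValueError(f"not a NetCDF filename: {filename!r}")
    # Assumption: fields are separated by underscores and contain none themselves
    parts = filename[: -len(".nc")].split("_")
    if len(parts) == 6:
        return Cmip6Filename(*parts, None)  # no time range in the filename
    if len(parts) == 7:
        return Cmip6Filename(*parts)
    raise ValueError(f"unexpected number of fields in {filename!r}")


fields = parse_cmip6_filename("areacello_Ofx_CanESM5_deforest-globe_r1i1p2f1_gn.nc")
print(fields.variable_id, fields.source_id, fields.time_range)
```

Passing the full download URL works in the same way, because everything up to the final slash is discarded before the name is split.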
This notebook is inspired by the CMIP6 data at CEDA guide produced by the CEDA team.
By: James Thomas
Last updated: 21st April 2021