We're going to try some basic operations with the latest version of the monthly (Amon) near-surface air temperatures (tas) from the Met Office Hadley Centre's (MOHC) Hadley Centre Global Environment Model version 3 (HadGEM3-GC31-LL), for the piControl experiment, variant r1i1p1f1.
We'll be:
The data is available on JASMIN here:
data_directory = '/badc/cmip6/data/CMIP6/CMIP/MOHC/HadGEM3-GC31-LL/piControl/r1i1p1f1/Amon/tas/gn/latest'
!ls {data_directory}
tas_Amon_HadGEM3-GC31-LL_piControl_r1i1p1f1_gn_185001-194912.nc tas_Amon_HadGEM3-GC31-LL_piControl_r1i1p1f1_gn_195001-204912.nc tas_Amon_HadGEM3-GC31-LL_piControl_r1i1p1f1_gn_205001-214912.nc tas_Amon_HadGEM3-GC31-LL_piControl_r1i1p1f1_gn_215001-224912.nc tas_Amon_HadGEM3-GC31-LL_piControl_r1i1p1f1_gn_225001-234912.nc
!ncdump -h {data_directory}/tas_Amon_HadGEM3-GC31-LL_piControl_r1i1p1f1_gn_195001-204912.nc
netcdf tas_Amon_HadGEM3-GC31-LL_piControl_r1i1p1f1_gn_195001-204912 { dimensions: time = UNLIMITED ; // (1200 currently) bnds = 2 ; lat = 144 ; lon = 192 ; variables: double time(time) ; time:bounds = "time_bnds" ; time:units = "days since 1850-01-01" ; time:calendar = "360_day" ; time:axis = "T" ; time:long_name = "time" ; time:standard_name = "time" ; double time_bnds(time, bnds) ; double lat(lat) ; lat:bounds = "lat_bnds" ; lat:units = "degrees_north" ; lat:axis = "Y" ; lat:long_name = "Latitude" ; lat:standard_name = "latitude" ; double lat_bnds(lat, bnds) ; double lon(lon) ; lon:bounds = "lon_bnds" ; lon:units = "degrees_east" ; lon:axis = "X" ; lon:long_name = "Longitude" ; lon:standard_name = "longitude" ; double lon_bnds(lon, bnds) ; double height ; height:units = "m" ; height:axis = "Z" ; height:positive = "up" ; height:long_name = "height" ; height:standard_name = "height" ; float tas(time, lat, lon) ; tas:standard_name = "air_temperature" ; tas:long_name = "Near-Surface Air Temperature" ; tas:comment = "near-surface (usually, 2 meter) air temperature" ; tas:units = "K" ; tas:original_name = "mo: (stash: m01s03i236, lbproc: 128)" ; tas:cell_methods = "area: time: mean" ; tas:cell_measures = "area: areacella" ; tas:history = "2019-06-20T14:08:01Z altered by CMOR: Treated scalar dimension: \'height\'. 2019-06-20T14:08:01Z altered by CMOR: replaced missing value flag (-1.07374e+09) with standard missing value (1e+20)." ; tas:coordinates = "height" ; tas:missing_value = 1.e+20f ; tas:_FillValue = 1.e+20f ; // global attributes: :Conventions = "CF-1.7 CMIP-6.2" ; :activity_id = "CMIP" ; :branch_method = "standard" ; :branch_time_in_child = 0. ; :branch_time_in_parent = 267840. 
; :creation_date = "2019-06-20T14:08:01Z" ; :cv_version = "6.2.20.1" ; :data_specs_version = "01.00.29" ; :experiment = "pre-industrial control" ; :experiment_id = "piControl" ; :external_variables = "areacella" ; :forcing_index = 1 ; :frequency = "mon" ; :further_info_url = "https://furtherinfo.es-doc.org/CMIP6.MOHC.HadGEM3-GC31-LL.piControl.none.r1i1p1f1" ; :grid = "Native N96 grid; 192 x 144 longitude/latitude" ; :grid_label = "gn" ; :history = "2019-06-20T13:42:01Z ; CMOR rewrote data to be consistent with CMIP6, CF-1.7 CMIP-6.2 and CF standards.;\n", "2019-06-20T13:41:40Z MIP Convert v1.1.0, Python v2.7.12, Iris v1.13.0, Numpy v1.13.3, netcdftime v1.4.1." ; :initialization_index = 1 ; :institution = "Met Office Hadley Centre, Fitzroy Road, Exeter, Devon, EX1 3PB, UK" ; :institution_id = "MOHC" ; :mip_era = "CMIP6" ; :mo_runid = "u-ar766" ; :nominal_resolution = "250 km" ; :parent_activity_id = "CMIP" ; :parent_experiment_id = "piControl-spinup" ; :parent_mip_era = "CMIP6" ; :parent_source_id = "HadGEM3-GC31-LL" ; :parent_time_units = "days since 1850-01-01-00-00-00" ; :parent_variant_label = "r1i1p1f1" ; :physics_index = 1 ; :product = "model-output" ; :realization_index = 1 ; :realm = "atmos" ; :source = "HadGEM3-GC31-LL (2016): \n", "aerosol: UKCA-GLOMAP-mode\n", "atmos: MetUM-HadGEM3-GA7.1 (N96; 192 x 144 longitude/latitude; 85 levels; top level 85 km)\n", "atmosChem: none\n", "land: JULES-HadGEM3-GL7.1\n", "landIce: none\n", "ocean: NEMO-HadGEM3-GO6.0 (eORCA1 tripolar primarily 1 deg with meridional refinement down to 1/3 degree in the tropics; 360 x 330 longitude/latitude; 75 levels; top grid cell 0-1 m)\n", "ocnBgchem: none\n", "seaIce: CICE-HadGEM3-GSI8 (eORCA1 tripolar primarily 1 deg; 360 x 330 longitude/latitude)" ; :source_id = "HadGEM3-GC31-LL" ; :source_type = "AOGCM AER" ; :sub_experiment = "none" ; :sub_experiment_id = "none" ; :table_id = "Amon" ; :table_info = "Creation Date:(13 December 2018) MD5:2b12b5db6db112aa8b8b0d6c1645b121" ; :title = 
"HadGEM3-GC31-LL output prepared for CMIP6" ; :variable_id = "tas" ; :variant_label = "r1i1p1f1" ; :license = "CMIP6 model data produced by the Met Office Hadley Centre is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (https://creativecommons.org/licenses). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file) and at https://ukesm.ac.uk/cmip6. The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law." ; :cmor_version = "3.4.0" ; :tracking_id = "hdl:21.14100/f3ef3b0d-1929-44c9-8088-39beb8d5c7bf" ; }
First, import the libraries we will be using:
from itertools import chain
from glob import glob
import matplotlib.pyplot as plt
import xarray as xr
# Set some plotting defaults
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['figure.dpi'] = 100
We can either load a single NetCDF file:
dataset = xr.load_dataset(data_directory + '/tas_Amon_HadGEM3-GC31-LL_piControl_r1i1p1f1_gn_195001-204912.nc')
open multiple NetCDF files as one dataset by using wildcards:
dataset = xr.open_mfdataset(data_directory + '/*.nc')
or even pass a list of NetCDF files if we want to load more than one variable.
For example, if we wanted to load near-surface air temperature (tas
) and precipitation flux (pr
):
paths_to_load = [
    glob(f'/badc/cmip6/data/CMIP6/CMIP/MOHC/HadGEM3-GC31-LL/piControl/r1i1p1f1/Amon/{variable}/gn/latest/*.nc')
    for variable in ['tas', 'pr']
]
# Flatten the per-variable lists of files into a single list of paths
dataset = xr.open_mfdataset(paths=list(chain.from_iterable(paths_to_load)))
This will (lazily) load the data into an xr.Dataset structure. Loading lazily means that the data will only be loaded into memory as and when calculations absolutely need to be performed (for example, when plotting a graph). See Triggering calculations manually.
Looking at dataset from within the notebook, we get similar information to the ncdump -h command, but formatted nicely and with some interactive buttons to help browse the structure of the data:
dataset
<xarray.Dataset> Dimensions: (bnds: 2, lat: 144, lon: 192, time: 6000) Coordinates: * time (time) object 1850-01-16 00:00:00 ... 2349-12-16 00:00:00 * lat (lat) float64 -89.38 -88.12 -86.88 -85.62 ... 86.88 88.12 89.38 * lon (lon) float64 0.9375 2.812 4.688 6.562 ... 355.3 357.2 359.1 height float64 1.5 Dimensions without coordinates: bnds Data variables: time_bnds (time, bnds) object dask.array<chunksize=(1200, 2), meta=np.ndarray> lat_bnds (time, lat, bnds) float64 dask.array<chunksize=(1200, 144, 2), meta=np.ndarray> lon_bnds (time, lon, bnds) float64 dask.array<chunksize=(1200, 192, 2), meta=np.ndarray> pr (time, lat, lon) float32 dask.array<chunksize=(1200, 144, 192), meta=np.ndarray> tas (time, lat, lon) float32 dask.array<chunksize=(1200, 144, 192), meta=np.ndarray> Attributes: (12/46) Conventions: CF-1.7 CMIP-6.2 activity_id: CMIP branch_method: standard branch_time_in_child: 0.0 branch_time_in_parent: 267840.0 creation_date: 2019-06-20T23:27:07Z ... ... title: HadGEM3-GC31-LL output prepared for CMIP6 variable_id: tas variant_label: r1i1p1f1 license: CMIP6 model data produced by the Met Office Hadle... cmor_version: 3.4.0 tracking_id: hdl:21.14100/0acbe90f-dc1b-42a1-97d0-10880d151ca9
We can also check the dimensions and data variables using the dims and data_vars properties of the dataset, and can access individual attributes by using the attrs property:
dataset.dims
Frozen(SortedKeysDict({'time': 6000, 'bnds': 2, 'lat': 144, 'lon': 192}))
dataset.data_vars
Data variables: time_bnds (time, bnds) object dask.array<chunksize=(1200, 2), meta=np.ndarray> lat_bnds (time, lat, bnds) float64 dask.array<chunksize=(1200, 144, 2), meta=np.ndarray> lon_bnds (time, lon, bnds) float64 dask.array<chunksize=(1200, 192, 2), meta=np.ndarray> pr (time, lat, lon) float32 dask.array<chunksize=(1200, 144, 192), meta=np.ndarray> tas (time, lat, lon) float32 dask.array<chunksize=(1200, 144, 192), meta=np.ndarray>
dataset.attrs['further_info_url']
'https://furtherinfo.es-doc.org/CMIP6.MOHC.HadGEM3-GC31-LL.piControl.none.r1i1p1f1'
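The same properties can be tried without access to JASMIN. The following is only a minimal sketch on a small synthetic dataset; the grid, the constant values and the "toy-model" attribute are made up for illustration and are not part of the CMIP6 data.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the CMIP6 data: 12 months on a tiny 3 x 4 grid.
time = pd.date_range("2000-01-01", periods=12, freq="MS")
toy = xr.Dataset(
    data_vars={
        "tas": (("time", "lat", "lon"), np.full((12, 3, 4), 280.0, dtype="float32")),
    },
    coords={
        "time": time,
        "lat": [-45.0, 0.0, 45.0],
        "lon": [0.0, 90.0, 180.0, 270.0],
    },
    attrs={"source_id": "toy-model"},  # hypothetical attribute value
)

print(dict(toy.sizes))          # dimension names and their lengths
print(list(toy.data_vars))      # names of the data variables
print(toy.attrs["source_id"])   # a single global attribute
```

Here .sizes is used to get a plain mapping of dimension name to length, which is the same property used to display sizes elsewhere in this notebook.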
The recommended way to select data for a coordinate or data variable is to use the dataset like a dictionary:
air_temperature = dataset['tas']
precipitation = dataset['pr']
You will find that xarray does not actually compute the result of a calculation until it's needed (for example, in a plot).
If you want to force the data to be loaded, or a calculation to be made straight away, then you can use .compute(). For example, displaying air_temperature on its own shows only a lazy array:
air_temperature
<xarray.DataArray 'tas' (time: 6000, lat: 144, lon: 192)> dask.array<concatenate, shape=(6000, 144, 192), dtype=float32, chunksize=(1200, 144, 192), chunktype=numpy.ndarray> Coordinates: * time (time) object 1850-01-16 00:00:00 ... 2349-12-16 00:00:00 * lat (lat) float64 -89.38 -88.12 -86.88 -85.62 ... 86.88 88.12 89.38 * lon (lon) float64 0.9375 2.812 4.688 6.562 ... 353.4 355.3 357.2 359.1 height float64 1.5 Attributes: standard_name: air_temperature long_name: Near-Surface Air Temperature comment: near-surface (usually, 2 meter) air temperature units: K original_name: mo: (stash: m01s03i236, lbproc: 128) cell_methods: area: time: mean cell_measures: area: areacella history: 2019-06-20T13:04:39Z altered by CMOR: Treated scalar dime...
This is telling you about the size and shape of the result, but not its value because it has not yet been calculated.
The actual calculation will be done using Dask (note the "15 Tasks, 5 Chunks"):
air_temperature.compute()
<xarray.DataArray 'tas' (time: 6000, lat: 144, lon: 192)> array([[[250.32886, 250.32324, 250.32935, ..., 250.34937, 250.33301, 250.32056], [251.0586 , 250.94751, 250.86621, ..., 251.32129, 251.23145, 251.18994], [251.68872, 251.55005, 251.34009, ..., 252.21875, 252.01123, 251.76636], ..., [235.06787, 235.12476, 235.22192, ..., 234.69995, 234.83398, 234.93921], [233.54102, 233.62524, 233.67993, ..., 233.32544, 233.40308, 233.4773 ], [231.41382, 231.42969, 231.40356, ..., 231.32178, 231.36523, 231.35059]], [[235.42041, 235.44043, 235.44922, ..., 235.46582, 235.44946, 235.43115], [236.75244, 236.69043, 236.62769, ..., 236.94019, 236.8938 , 236.81152], [237.34253, 237.24854, 237.1272 , ..., 237.73315, 237.59863, 237.4458 ], ... [242.58374, 242.7544 , 242.95117, ..., 242.10327, 242.2522 , 242.45459], [241.95752, 242.0127 , 242.0586 , ..., 241.70459, 241.81274, 241.88501], [241.06982, 241.09692, 241.09326, ..., 241.08228, 241.0752 , 241.07373]], [[255.11694, 255.10498, 255.11841, ..., 255.13794, 255.12451, 255.12451], [255.86841, 255.77466, 255.67114, ..., 256.13965, 256.05762, 255.97363], [256.39673, 256.21167, 256.08203, ..., 256.84888, 256.70703, 256.54248], ..., [239.31738, 239.3352 , 239.35742, ..., 239.24683, 239.26514, 239.27979], [239.31421, 239.29858, 239.31665, ..., 239.29224, 239.29468, 239.29883], [239.52539, 239.50659, 239.51709, ..., 239.51123, 239.49268, 239.50684]]], dtype=float32) Coordinates: * time (time) object 1850-01-16 00:00:00 ... 2349-12-16 00:00:00 * lat (lat) float64 -89.38 -88.12 -86.88 -85.62 ... 86.88 88.12 89.38 * lon (lon) float64 0.9375 2.812 4.688 6.562 ... 353.4 355.3 357.2 359.1 height float64 1.5 Attributes: standard_name: air_temperature long_name: Near-Surface Air Temperature comment: near-surface (usually, 2 meter) air temperature units: K original_name: mo: (stash: m01s03i236, lbproc: 128) cell_methods: area: time: mean cell_measures: area: areacella history: 2019-06-20T13:04:39Z altered by CMOR: Treated scalar dime...
To select a value from a dimension, we can use the .sel() method.
For example, to select the air temperatures (for all latitudes and longitudes) for January 2000:
temperature_january = air_temperature.sel(time='2000-01')
temperature_january.sizes
Frozen({'time': 1, 'lat': 144, 'lon': 192})
(Note: for simplicity we're just displaying the size of the resulting data)
When selecting by date, depending on the precision of the date we supply, more than one value may be returned.
For example, to select each of the monthly air temperatures (for all latitudes and longitudes) for 2000:
temperature_2000 = air_temperature.sel(time='2000')
temperature_2000.sizes
Frozen({'time': 12, 'lat': 144, 'lon': 192})
If an exact match to your filter conditions doesn't exist, you can use the method='nearest' argument to find the closest matching point. This is especially useful when selecting locations:
bristol_temperature = air_temperature.sel(lat=51.4578, lon=-2.6017, method='nearest')
bristol_temperature.sizes
Frozen({'time': 6000})
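One caveat with nearest-neighbour selection on this grid: the longitudes run from 0 to 360 degrees east, so a negative longitude such as -2.6017 is matched to the first grid point (0.9375) rather than to the cell near 357 degrees east. The sketch below illustrates this on a synthetic longitude axis with the same spacing as the model grid; the toy array and the variable names are assumptions for illustration only.

```python
import numpy as np
import xarray as xr

# Toy longitude axis with the same 0-360 layout and 1.875 degree spacing
# as the model grid shown above.
lon = 0.9375 + 1.875 * np.arange(192)
toy = xr.DataArray(np.arange(192.0), coords={"lon": lon}, dims="lon")

bristol_lon = -2.6017  # degrees east, on a -180..180 convention

# Nearest match on the raw value snaps to the first grid point ...
naive = toy.sel(lon=bristol_lon, method="nearest")
# ... whereas wrapping into 0..360 first finds the cell closest to Bristol.
wrapped = toy.sel(lon=bristol_lon % 360, method="nearest")

print(float(naive["lon"]), float(wrapped["lon"]))
```

Whether the wrapped value is what you want depends on your analysis; the point is only that method='nearest' compares raw coordinate values and knows nothing about longitude being cyclic.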
It's possible to use the slice(start, end[, step]) function to specify ranges:
temperature_decade = air_temperature.sel(time=slice("2000-01", "2009-12"))
temperature_decade.sizes
Frozen({'time': 120, 'lat': 144, 'lon': 192})
Bear in mind that, unlike regular Python slicing, the range is inclusive of both the start and end values supplied.
temperature_equator = air_temperature.sel(lat=slice(-10, 10))
temperature_equator.sizes
Frozen({'time': 6000, 'lat': 16, 'lon': 192})
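To make the inclusive behaviour concrete, here is a small sketch comparing label-based slicing with .sel() against position-based slicing with .isel(). The seven-point latitude axis is made up for illustration.

```python
import numpy as np
import xarray as xr

# Hypothetical latitude axis, chosen so the slice bounds are exact grid values.
lat = np.array([-15.0, -10.0, -5.0, 0.0, 5.0, 10.0, 15.0])
toy = xr.DataArray(np.arange(7), coords={"lat": lat}, dims="lat")

by_label = toy.sel(lat=slice(-10, 10))    # label-based: both -10 and 10 are kept
by_position = toy.isel(lat=slice(1, 5))   # position-based: index 5 is excluded

print(by_label["lat"].values)
print(by_position["lat"].values)
```

The label-based slice returns five points, while the positional slice over indices 1 to 5 returns four, as with ordinary Python slicing.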
For more complex queries, it is often convenient to use boolean arrays, which you can give helpful names to.
For example, to select air temperatures at extreme latitudes when it is winter in the northern hemisphere:
is_winter = air_temperature['time'].dt.season == 'DJF'
is_extreme_latitude = abs(air_temperature['lat']) > 60
temperature_winter_poles = air_temperature.isel(time=is_winter, lat=is_extreme_latitude)
temperature_winter_poles.sizes
Frozen({'time': 1500, 'lat': 48, 'lon': 192})
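The same pattern can be checked on a small array where the expected sizes are easy to count by hand. This is a sketch on synthetic data, assuming a standard calendar built with pandas rather than the 360-day calendar of the model.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of monthly time steps and five latitudes (all values are zero;
# only the coordinates matter here).
time = pd.date_range("2000-01-01", periods=24, freq="MS")
lat = np.array([-75.0, -45.0, 0.0, 45.0, 75.0])
toy = xr.DataArray(
    np.zeros((24, 5)),
    coords={"time": time, "lat": lat},
    dims=("time", "lat"),
)

# Named boolean masks, one per dimension.
is_winter = toy["time"].dt.season == "DJF"
is_extreme_latitude = abs(toy["lat"]) > 60

subset = toy.isel(time=is_winter, lat=is_extreme_latitude)
print(dict(subset.sizes))
```

Two years contain six DJF months (January, February and December of each year), and two of the five latitudes lie poleward of 60 degrees, so the result has 6 time steps and 2 latitudes.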
Sometimes you will need to select data by position in the dataset. This can be done using the .isel() method, but it is generally better to use .sel() to select by value, unless you're selecting using a boolean array.
For example, to select the first data point in the time dimension:
first_temperature = air_temperature.isel(time=0)
first_temperature.sizes
Frozen({'lat': 144, 'lon': 192})
You can aggregate your data using the .mean(), .max(), .min(), .median(), .std(), .sum(), etc. methods.
For example, the mean temperature over all time:
average_temperature = air_temperature.mean(dim='time')
average_temperature.sizes
Frozen({'lat': 144, 'lon': 192})
or over every longitude as well:
average_temperature = air_temperature.mean(dim=['time', 'lon'])
average_temperature.sizes
Frozen({'lat': 144})
or over all the data:
average_temperature = air_temperature.mean()
average_temperature.sizes
Frozen({})
Recall that we can call .compute() to find out the actual value:
average_temperature.compute()
<xarray.DataArray 'tas' ()> array(277.46924, dtype=float32) Coordinates: height float64 1.5
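As a quick check of how the dim argument controls which dimensions are collapsed, the following sketch aggregates a small synthetic array whose overall mean is easy to verify by hand. The array shape and values are assumptions chosen for illustration.

```python
import numpy as np
import xarray as xr

# The values 0..23 arranged as 2 time steps x 3 latitudes x 4 longitudes.
toy = xr.DataArray(
    np.arange(24, dtype="float64").reshape(2, 3, 4),
    dims=("time", "lat", "lon"),
)

over_time = toy.mean(dim="time")               # collapses time only
over_time_lon = toy.mean(dim=["time", "lon"])  # collapses time and lon
overall = toy.mean()                           # collapses every dimension

print(dict(over_time.sizes), dict(over_time_lon.sizes), float(overall))
```

Note that these are unweighted means: every grid cell counts equally, regardless of the area it covers.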
When working with time series data, you can use .resample(), which has the same syntax as Pandas, in combination with the aggregation functions.
For example, to calculate the annual mean, minimum and maximum temperatures for Bristol:
bristol_temperature = air_temperature.sel(lat=51.4578, lon=-2.6017, method="nearest")
annual_bristol_temperatures = (
    bristol_temperature
    .resample(time="1Y")
)
mean_annual_temperature = annual_bristol_temperatures.mean()
min_annual_temperature = annual_bristol_temperatures.min()
max_annual_temperature = annual_bristol_temperatures.max()
mean_annual_temperature.sizes
Frozen({'time': 500})
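Resampling can be tried on a short synthetic series where the annual values are known in advance. This sketch assumes a standard calendar built with pandas and uses the "YS" (year start) frequency alias, which labels each year by its first day; it differs from the "1Y" alias used above only in how the resulting time steps are labelled.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of monthly values: 280 K throughout the first year, 282 K in the second.
time = pd.date_range("2000-01-01", periods=24, freq="MS")
values = np.concatenate([np.full(12, 280.0), np.full(12, 282.0)])
toy = xr.DataArray(values, coords={"time": time}, dims="time")

# Group the monthly values into calendar years, then aggregate each group.
annual = toy.resample(time="YS")
annual_mean = annual.mean()
annual_max = annual.max()

print(annual_mean.values)
print(dict(annual_mean.sizes))
```

The 24 monthly values reduce to one value per year, in the same way that the 6000 monthly time steps above reduce to 500 annual values.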
Grouping data by value works similarly to resampling, again using a familiar Pandas-style syntax, except additional calendar functionality is available.
For example, to calculate the mean temperature in Bristol for each season:
bristol_seasonal_temperature = (
    bristol_temperature
    .groupby("time.season")
    .mean()
    .reindex(season=["DJF", "MAM", "JJA", "SON"])  # Put the values in a useful order
)
bristol_seasonal_temperature.compute()
<xarray.DataArray 'tas' (season: 4)> array([278.642 , 280.7515 , 288.65384, 285.46402], dtype=float32) Coordinates: * season (season) <U3 'DJF' 'MAM' 'JJA' 'SON' lat float64 51.88 lon float64 0.9375 height float64 1.5
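To see exactly which months fall into each season, here is a minimal sketch on a synthetic one-year series in which each value equals its month number. The data and the standard pandas calendar are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# One year of monthly data where the value is the month number (1..12).
time = pd.date_range("2000-01-01", periods=12, freq="MS")
toy = xr.DataArray(np.arange(1.0, 13.0), coords={"time": time}, dims="time")

seasonal = (
    toy
    .groupby("time.season")
    .mean()
    .reindex(season=["DJF", "MAM", "JJA", "SON"])  # chronological season order
)

print(seasonal.values)
```

DJF averages December, January and February of the same calendar year (months 12, 1 and 2), so its mean here is 5.0; grouping by season does not keep consecutive winters together.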
TODO
Putting it all together:
air_temperature = dataset['tas']
bristol_temperature = air_temperature.sel(lat=51.4578, lon=-2.6017, method="nearest")
mean_annual_temperature = bristol_temperature.resample(time="1Y").mean()
min_annual_temperature = bristol_temperature.resample(time="1Y").min()
max_annual_temperature = bristol_temperature.resample(time="1Y").max()
mean_annual_temperature.plot()
plt.fill_between(
    x=min_annual_temperature['time'].values,
    y1=min_annual_temperature.values,
    y2=max_annual_temperature.values,
    alpha=0.4,
)
<matplotlib.collections.PolyCollection at 0x7f823813cf10>
TODO
(
    dataset['tas']
    .sel(time="2000-01")
).plot()
<matplotlib.collections.QuadMesh at 0x7f8250564310>
TODO
(
    dataset['tas']
    .groupby('time.season')
    .mean()
    .reindex(season=['DJF', 'MAM', 'JJA', 'SON'])  # Put the values in a useful order
).plot(col='season')
<xarray.plot.facetgrid.FacetGrid at 0x7f825056ca60>