Loading data using Xarray

We're going to try some basic operations with:

We'll be:

The data is available on JASMIN here:

First, import the libraries we will be using:

Loading single or multiple NetCDF files

We can either load a single NetCDF file:

or open multiple NetCDF files as a single dataset by using wildcards:

or even pass a list of NetCDF files if we want to load more than one variable.

For example, if we wanted to load near-surface air temperature (tas) and precipitation flux (pr):
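The sketch below shows all three loading patterns in one place. The JASMIN files are not reproduced here, so it first writes small synthetic tas and pr files to a temporary directory. The file names tas_2000.nc, pr_2000.nc and so on are hypothetical. The sketch assumes a NetCDF backend (netCDF4 or scipy) and Dask are installed, because open_mfdataset uses Dask.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
import xarray as xr

# Write hypothetical yearly files of synthetic monthly tas and pr data,
# standing in for the real files on JASMIN.
tmpdir = Path(tempfile.mkdtemp())
dims = ("time", "lat", "lon")
for year in (2000, 2001):
    coords = {
        "time": pd.date_range(f"{year}-01-01", periods=12, freq="MS"),
        "lat": [-45.0, 0.0, 45.0],
        "lon": [0.0, 90.0, 180.0, 270.0],
    }
    rng = np.random.default_rng(year)
    tas = xr.DataArray(rng.normal(280.0, 5.0, (12, 3, 4)), coords=coords, dims=dims, name="tas")
    pr = xr.DataArray(rng.gamma(2.0, 1e-5, (12, 3, 4)), coords=coords, dims=dims, name="pr")
    tas.to_netcdf(tmpdir / f"tas_{year}.nc")
    pr.to_netcdf(tmpdir / f"pr_{year}.nc")

# 1. A single file.
single = xr.open_dataset(tmpdir / "tas_2000.nc")

# 2. Every file matching a wildcard, combined along time (uses Dask).
tas_all = xr.open_mfdataset(str(tmpdir / "tas_*.nc"), combine="by_coords")

# 3. An explicit list of files holding different variables, merged into one dataset.
files = [str(tmpdir / f"{var}_{year}.nc") for var in ("tas", "pr") for year in (2000, 2001)]
both = xr.open_mfdataset(files, combine="by_coords")

print(single.sizes["time"], tas_all.sizes["time"], sorted(both.data_vars))
```

combine="by_coords" lets xarray work out the concatenation order from the time coordinate values. It also merges files that hold different variables.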

This will (lazily) load the data into an xr.Dataset structure. Loading lazily means that the data is only read into memory as and when a calculation actually needs it (for example, when plotting a graph). See Triggering calculations manually below.

Displaying the dataset from within the notebook gives similar information to the ncdump -h command. The output is formatted nicely, with interactive buttons to help browse the structure of the data:

We can also check the dimensions, coordinates and data variables using the dims, coords and data_vars properties of the dataset, and access individual attributes using the attrs property:
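Here is a minimal sketch of these properties. It builds a small synthetic dataset in memory in place of one opened from JASMIN, and the global attribute "title" is hypothetical. Outside a notebook, print(ds) gives a text version of the interactive view.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical synthetic dataset standing in for one opened from JASMIN.
ds = xr.Dataset(
    data_vars={"tas": (("time", "lat", "lon"), np.full((12, 3, 4), 280.0), {"units": "K"})},
    coords={
        "time": pd.date_range("2000-01-01", periods=12, freq="MS"),
        "lat": [-45.0, 0.0, 45.0],
        "lon": [0.0, 90.0, 180.0, 270.0],
    },
    attrs={"title": "Synthetic example data"},  # hypothetical global attribute
)

print(ds)                  # text equivalent of the notebook's HTML view
print(list(ds.dims))       # dimension names
print(dict(ds.sizes))      # dimension names and their lengths
print(list(ds.coords))     # coordinate names
print(list(ds.data_vars))  # data variable names
print(ds.attrs["title"])   # a single global attribute
```

In recent xarray versions, ds.sizes is the preferred way to get dimension lengths. ds.dims is used here only for the dimension names.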

Selecting data for a coordinate or data variable

The recommended way to select data for a coordinate or data variable is to use the dataset like a dictionary:
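This short sketch shows dictionary-style access on a small synthetic dataset, which is an assumption standing in for the real data. Both data variables and coordinates come back as xr.DataArray objects.

```python
import numpy as np
import xarray as xr

# Hypothetical synthetic dataset.
ds = xr.Dataset(
    {"tas": (("lat", "lon"), np.full((3, 4), 280.0), {"units": "K"})},
    coords={"lat": [-45.0, 0.0, 45.0], "lon": [0.0, 90.0, 180.0, 270.0]},
)

tas = ds["tas"]  # a data variable, returned as an xr.DataArray
lat = ds["lat"]  # a coordinate, also returned as an xr.DataArray
print(tas.attrs["units"])  # per-variable attributes live on the DataArray
```

Attribute-style access (ds.tas) also works in simple cases. Dictionary-style access is more robust: it still works when a variable name clashes with a Dataset method or contains characters that are not valid in a Python identifier.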

Triggering calculations manually

You will find that, for Dask-backed data such as that returned by open_mfdataset, xarray does not actually compute the result of a calculation until it's needed (for example, in a plot).

If you want to force the data to be loaded, or a calculation to be made straight away, then you can use .compute(), for example:

This tells you the size and shape of the result, but not its value, because it has not yet been calculated.

The actual calculation will be done using Dask (note the "15 Tasks, 5 Chunks"):
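The sketch below illustrates lazy evaluation, assuming Dask is installed. It uses a synthetic array that is explicitly split into chunks, which mimics what open_mfdataset returns. The task counts you see in the notebook depend on the real data and are not reproduced here.

```python
import numpy as np
import xarray as xr

# Hypothetical data split into 5 Dask chunks of 4 time steps each (requires Dask).
tas = xr.DataArray(np.ones((20, 3, 4)), dims=("time", "lat", "lon"), name="tas").chunk({"time": 4})

lazy_mean = tas.mean()        # builds a task graph; nothing is computed yet
print(lazy_mean)              # shows shape and dtype, backed by a dask array
result = lazy_mean.compute()  # runs the graph and returns in-memory data
print(float(result))          # → 1.0
```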

Selecting data by value, range or condition

By value

To select a value from a dimension, we can use the .sel() method.

For example, to select the air temperatures (for all latitudes and longitudes) for January 2000:

(Note: for simplicity we're just displaying the size of the resulting data)

When selecting by date, depending on the precision of the date we supply, more than one value may be returned.

For example, to select each of the monthly air temperatures (for all latitudes and longitudes) for 2000:
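Here is a minimal sketch of how date precision affects .sel(). It assumes synthetic monthly data for 1999 to 2001, time-stamped in the middle of each month. A month string matches one time step, and a year string matches all twelve.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical monthly data stamped on the 16th of each month, 1999-2001.
time = pd.date_range("1999-01-01", periods=36, freq="MS") + pd.Timedelta(days=15)
tas = xr.DataArray(
    np.random.default_rng(0).normal(280.0, 5.0, (36, 3, 4)),
    coords={"time": time, "lat": [-45.0, 0.0, 45.0], "lon": [0.0, 90.0, 180.0, 270.0]},
    dims=("time", "lat", "lon"),
    name="tas",
)

jan_2000 = tas.sel(time="2000-01")  # month precision: one time step
year_2000 = tas.sel(time="2000")    # year precision: every month in 2000
print(dict(jan_2000.sizes))   # → {'time': 1, 'lat': 3, 'lon': 4}
print(dict(year_2000.sizes))  # → {'time': 12, 'lat': 3, 'lon': 4}
```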

If an exact match to your filter conditions doesn't exist, you can use the method='nearest' argument to find the closest matching point. This is especially useful when selecting locations:
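The next sketch makes several assumptions. It uses a synthetic global grid (2.5° latitude by 3.75° longitude) whose longitudes run from 0 to 360. It also takes Bristol to be at roughly 51.45°N, 2.59°W, which on that grid is 357.41°E.

```python
import numpy as np
import xarray as xr

# Hypothetical global grid with longitudes in the 0-360 convention.
lat = np.arange(-90.0, 90.1, 2.5)
lon = np.arange(0.0, 360.0, 3.75)
values = np.zeros((lat.size, lon.size)) + 288.0 - 0.4 * np.abs(lat)[:, None]
tas = xr.DataArray(values, coords={"lat": lat, "lon": lon}, dims=("lat", "lon"), name="tas")

# 2.59 degrees West is 360 - 2.59 = 357.41 degrees East on this grid.
bristol = tas.sel(lat=51.45, lon=357.41, method="nearest")
print(float(bristol["lat"]), float(bristol["lon"]))  # → 52.5 356.25
```

method="nearest" compares coordinate values numerically and does not wrap around the 0/360 seam. For that reason the western longitude is converted first. Passing lon=-2.59 here would pick 0.0 rather than the closer grid point at 356.25.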

By range

It's possible to use the slice(start, end[, step]) function to specify ranges:

Bear in mind that unlike regular Python slicing, the range is inclusive of the start and end values supplied.
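The sketch below selects a box in space and a range of months, using synthetic data on the same kind of grid as above (an assumption). Both end points of each slice are kept.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical global monthly data for 2000-2001.
lat = np.arange(-90.0, 90.1, 2.5)
lon = np.arange(0.0, 360.0, 3.75)
time = pd.date_range("2000-01-01", periods=24, freq="MS")
tas = xr.DataArray(
    np.zeros((time.size, lat.size, lon.size)),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
)

# Label-based slices include both end points.
box = tas.sel(lat=slice(50.0, 60.0), lon=slice(345.0, 360.0), time=slice("2000-03", "2000-05"))
print(box["lat"].values.tolist())  # → [50.0, 52.5, 55.0, 57.5, 60.0]
```

The slice must follow the order in which the coordinate is stored. If latitude runs from north to south, the range has to be written as slice(60.0, 50.0).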

By condition

For more complex queries, it is often convenient to use boolean arrays, which you can assign to descriptively named variables.

For example, to select air temperatures at extreme latitudes when it is winter in the northern hemisphere:
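Here is a minimal sketch of this query on synthetic monthly data (an assumption). It takes "extreme" to mean |latitude| ≥ 60° and northern winter to mean December, January and February. Because boolean arrays pick out positions, the sketch uses .isel().

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical global monthly data for 1999-2001.
lat = np.arange(-90.0, 90.1, 2.5)
lon = np.arange(0.0, 360.0, 3.75)
time = pd.date_range("1999-01-01", periods=36, freq="MS")
tas = xr.DataArray(
    np.zeros((time.size, lat.size, lon.size)),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
)

# Descriptively named conditions make the query read like the question.
is_extreme_latitude = np.abs(tas["lat"]) >= 60.0
is_northern_winter = tas["time"].dt.month.isin([12, 1, 2])

extreme_winter = tas.isel(lat=is_extreme_latitude, time=is_northern_winter)
print(dict(extreme_winter.sizes))
```

An alternative is tas.where(condition, drop=True). It masks values that fail the condition and drops labels where every value is masked.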

Selecting data by position

Sometimes you will need to select data by position in the dataset. This can be done using the .isel() method, but it is generally better to use .sel() to select by value, unless you're selecting using a boolean array.

For example, to select the first data point in the time dimension:
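This short sketch shows positional selection on synthetic monthly data (an assumption). An integer removes the dimension, while a positional slice keeps it and, like ordinary Python slicing, excludes the end position.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical monthly data for 2000.
time = pd.date_range("2000-01-01", periods=12, freq="MS")
tas = xr.DataArray(
    np.arange(12 * 3 * 4, dtype=float).reshape(12, 3, 4),
    coords={"time": time},
    dims=("time", "lat", "lon"),
)

first = tas.isel(time=0)                   # first time step; the time dimension is dropped
last = tas.isel(time=-1)                   # negative positions count from the end
first_three = tas.isel(time=slice(0, 3))   # positional slice: end position excluded
print(first["time"].values, dict(first_three.sizes))
```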

Aggregating data by dimension

You can aggregate your data using the .mean(), .max(), .min(), .median(), .std(), .sum(), etc. methods.

For example, the mean temperature over all time:

or over every longitude as well:

or over all the data:

Recall that we can call .compute() to find out what this is:
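The sketch below aggregates synthetic data over one dimension, over several, and over everything. The synthetic values are an assumption: 280 K plus the month index, so every mean works out to 285.5 K. The data here is NumPy-backed, so results are computed immediately. With Dask-backed data you would call .compute() as described above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical data: 280 K plus the month index (0-11) at every grid point.
time = pd.date_range("2000-01-01", periods=12, freq="MS")
values = 280.0 + np.arange(12, dtype=float)[:, None, None] * np.ones((1, 3, 4))
tas = xr.DataArray(
    values,
    coords={"time": time, "lat": [-45.0, 0.0, 45.0], "lon": [0.0, 90.0, 180.0, 270.0]},
    dims=("time", "lat", "lon"),
)

time_mean = tas.mean("time")               # remaining dims: lat, lon
time_lon_mean = tas.mean(["time", "lon"])  # remaining dims: lat
overall_mean = tas.mean()                  # a scalar
print(float(overall_mean))  # → 285.5
```

A plain mean over latitude treats every grid cell equally. For an area-weighted global mean, tas.weighted(...) with cos(latitude) weights is one option.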

Resampling time series data

When working with time series data, you can use .resample(), which uses the same syntax as Pandas, in combination with the aggregation functions.

For example, to calculate the annual mean, minimum and maximum temperatures for Bristol:
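Here is a minimal sketch of annual resampling. It uses a synthetic monthly series standing in for the grid point nearest Bristol, for example as returned by .sel(..., method="nearest"). The series is an assumption, with a simple seasonal cycle that is coldest in January and warmest in July. "YS" is the Pandas offset alias for year start.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical monthly series for the Bristol grid point, 1999-2001.
time = pd.date_range("1999-01-01", periods=36, freq="MS")
month = time.month.to_numpy()
bristol = xr.DataArray(
    283.0 - 6.0 * np.cos(2 * np.pi * (month - 1) / 12),
    coords={"time": time},
    dims="time",
    name="tas",
)

annual = bristol.resample(time="YS")  # one group per calendar year
annual_mean = annual.mean()
annual_min = annual.min()
annual_max = annual.max()
print(annual_mean.values, annual_min.values, annual_max.values)
```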

Grouping dimensions by value

Grouping data by value works similarly to resampling, again using familiar Pandas-style syntax, except that additional calendar functionality is available (for example, grouping by season).

For example, to calculate the mean temperature in Bristol for each season:
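This sketch groups a synthetic monthly Bristol series by the "time.season" datetime component. The series is the same assumption as in the resampling example. Season labels come back in alphabetical order, so the last step reorders them into calendar order.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical monthly series for the Bristol grid point, 1999-2001.
time = pd.date_range("1999-01-01", periods=36, freq="MS")
month = time.month.to_numpy()
bristol = xr.DataArray(
    283.0 - 6.0 * np.cos(2 * np.pi * (month - 1) / 12),
    coords={"time": time},
    dims="time",
    name="tas",
)

seasonal = bristol.groupby("time.season").mean()           # one value per season
ordered = seasonal.sel(season=["DJF", "MAM", "JJA", "SON"])  # calendar order
print(ordered.values)
```

This produces a climatological mean. December of every year is pooled with January and February of every year, rather than forming consecutive winters.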

Plotting line graphs

TODO

Putting it all together:

Plotting colormeshes

TODO

Plotting facet grids

TODO

Further reading

Acknowledgements

By: James Thomas

Last updated: 22nd April 2021