Getting your hands-on climate data
Author(s) | Anne Fouilloux |
Reviewers |
Overview
Questions:
- What is climate?
- What type of data is available?
Objectives:
- Learn about the terminology
- Learn about the different sources of climate data
- Learn about climate observations, reanalysis, climate predictions and climate projections
Time estimation: 1 hour
Published: Apr 30, 2020
Last modification: Jun 11, 2024
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT
PURL: https://gxy.io/GTN:T00041
Revision: 12
This practical aims to familiarize you with climate science and the terminology used by climate scientists. The target audience is not climate scientists but anyone interested in learning about climate.
Comment: This tutorial is significantly based on Getting your hands-on Climate data.
Comment: Background
The European Copernicus Climate Change Service (C3S) provides authoritative information about the past, present and future climate. C3S is one of the many services provided by Copernicus, the European Union’s Earth Observation Programme, looking at our planet and its environment for the ultimate benefit of all European citizens. The C3S Climate Data Store (CDS) provides a single point of access to a wide range of quality-assured climate datasets distributed in the cloud. Access to the CDS data is open, free and unrestricted. We will be using freely available datasets from the CDS, including observations, historical climate data records, estimates of Essential Climate Variables (ECVs) derived from Earth observations, global and regional climate reanalyses of past observations, seasonal forecasts and climate projections.
For the purpose of this tutorial, sample datasets have been created from data downloaded from C3S through the Copernicus Climate Data Store:
- E-OBS daily gridded meteorological data for Europe from 1950 to present derived from in-situ observations
- Essential climate variables for assessment of climate variability from 1979 to present
To reduce the volume of data, the data resolution (in space and/or time) has been significantly reduced and/or data has been selected for sample locations (Paris, Oslo and Freiburg). The data format may also have been changed (for instance to tabular) to ease processing.
Get data
Hands-on: Data upload
Create a new history for this tutorial. If you are not inspired, you can name it climate101.
To create a new history simply click the new-history icon at the top of the history panel:
Import the files from Zenodo or from the shared data library
https://zenodo.org/record/3776500/files/tg_ens_mean_0.1deg_reg_v20.0e_Paris_daily.csv
https://zenodo.org/record/3776500/files/ts_cities.csv
- Copy the link location
Click galaxy-upload Upload Data at the top of the tool panel
- Select galaxy-wf-edit Paste/Fetch Data
Paste the link(s) into the text field
Press Start
- Close the window
As an alternative to uploading the data from a URL or your computer, the files may also have been made available from a shared data library:
- Go into Data (top panel) then Data libraries
- Navigate to the correct folder as indicated by your instructor.
- On most Galaxies tutorial data will be provided in a folder named GTN - Material -> Topic Name -> Tutorial Name.
- Select the desired files
- Click on Add to History galaxy-dropdown near the top and select as Datasets from the dropdown menu
In the pop-up window, choose
- “Select history”: the history you want to import the data to (or create a new one)
- Click on Import
Check that the datatype is tabular
- Click on the galaxy-pencil pencil icon for the dataset to edit its attributes
- In the central panel, click galaxy-chart-select-data Datatypes tab on the top
- In the galaxy-chart-select-data Assign Datatype, select tabular from “New type” dropdown
- Tip: you can start typing the datatype into the field to filter the dropdown menu
- Click the Save button
If it is not tabular, make sure to convert it using the Galaxy built-in format converters.
- Click on the galaxy-pencil pencil icon for the dataset to edit its attributes
- In the central panel, click on the galaxy-gear Convert tab on the top
- In the upper part galaxy-gear Convert, select Convert CSV to Tabular
- Click the Create dataset button to start the conversion.
Rename Datasets
As “https://zenodo.org/record/3776500/files/tg_ens_mean_0.1deg_reg_v20.0e_Paris_daily.csv” is not a beautiful name and can give errors for some tools, it is good practice to change the dataset name to something more meaningful. For example, remove https://zenodo.org/record/3776500/files/ from both dataset names to obtain tg_ens_mean_0.1deg_reg_v20.0e_Paris_daily.csv and ts_cities.csv, respectively.
- Click on the galaxy-pencil pencil icon for the dataset to edit its attributes
- In the central panel, change the Name field
- Click the Save button
Add a tag to the dataset corresponding to copernicus
Datasets can be tagged. This simplifies the tracking of datasets across the Galaxy interface. Tags can contain any combination of letters or numbers but cannot contain spaces.
To tag a dataset:
- Click on the dataset to expand it
- Click on Add Tags galaxy-tags
- Add tag text. Tags starting with # will be automatically propagated to the outputs of tools using this dataset (see below).
- Press Enter
- Check that the tag appears below the dataset name
Tags beginning with # are special!
They are called Name tags. The unique feature of these tags is that they propagate: if a dataset is labelled with a name tag, all derivatives (children) of this dataset will automatically inherit this tag (see below). The figure below explains why this is so useful. Consider the following analysis (numbers in parentheses correspond to dataset numbers in the figure below):
- a set of forward and reverse reads (datasets 1 and 2) is mapped against a reference using Bowtie2 generating dataset 3;
- dataset 3 is used to calculate read coverage using BedTools Genome Coverage separately for + and - strands. This generates two datasets (4 and 5 for plus and minus, respectively);
- datasets 4 and 5 are used as inputs to Macs2 broadCall datasets generating datasets 6 and 8;
- datasets 6 and 8 are intersected with coordinates of genes (dataset 9) using BedTools Intersect generating datasets 10 and 11.
Now consider that this analysis is done without name tags. This is shown on the left side of the figure. It is hard to trace which datasets contain “plus” data versus “minus” data. For example, does dataset 10 contain “plus” data or “minus” data? Probably “minus” but are you sure? In the case of a small history like the one shown here, it is possible to trace this manually but as the size of a history grows it will become very challenging.
The right side of the figure shows exactly the same analysis, but using name tags. When the analysis was conducted, datasets 4 and 5 were tagged with #plus and #minus, respectively. When they were used as inputs to Macs2, the resulting datasets 6 and 8 automatically inherited them, and so on. As a result it is straightforward to trace both branches (plus and minus) of this analysis.
More information is in a dedicated #nametag tutorial.
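The propagation rule described above can be pictured with a small toy model. This is only a sketch for illustration, not Galaxy's implementation: the `run_tool` helper and the `tags` dictionary are hypothetical names, and the model assumes that only tags starting with `#` are inherited.

```python
def run_tool(output_name, inputs, tags):
    """Register a new dataset that inherits every '#' tag found on its inputs."""
    inherited = set()
    for name in inputs:
        inherited |= {tag for tag in tags[name] if tag.startswith("#")}
    tags[output_name] = inherited
    return output_name

# Hypothetical history: datasets 4 and 5 carry name tags, dataset 4 also has a plain tag
tags = {
    "dataset 4": {"#plus", "coverage"},
    "dataset 5": {"#minus"},
}

run_tool("dataset 6", ["dataset 4"], tags)
run_tool("dataset 8", ["dataset 5"], tags)

for name in sorted(tags):
    print(name, sorted(tags[name]))
```

In this toy model the plain tag `coverage` stays on dataset 4 only, while `#plus` and `#minus` follow their respective branches.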
What is climate?
According to Wikipedia, climate is defined as the average state of everyday weather conditions over a period of 30 years. It is measured by assessing the patterns of variation in temperature, humidity, atmospheric pressure, wind, precipitation, atmospheric particle count and other meteorological variables in a given region over a long period of time (usually 20 or 30 years). Climate differs from weather, in that weather only describes the short-term conditions of these variables in a given region.
Climate versus Weather
Quantities that climate scientists are interested in are similar to those used to assess the weather (temperature, precipitation, etc.). But there is a big difference between climate and weather: weather varies from hour to hour and from day to day whereas climate is defined as the average of weather over several decades or longer.
The figure below shows a woman walking her dog, and we can use it as an analogy to illustrate the difference between weather and climate. If you focus your attention on the dog, you can see that it is all over the place, sometimes upwards, sometimes downwards: this can represent the weather and its variability. The dog (weather) is not following a fully random pattern and varies around a main direction (trend) that is given by the woman: the woman represents the climate and gives us an indication of where both the woman and the dog are likely to be in the future.
Source: Animated short introduction to statistics in climate research from Norwegian infotainment program Siffer. Produced by TeddyTV for NRK. Animation by Ole Christoffer Haga
You can also watch an animated illustration of the difference between climate and weather:
What is the weather like in Paris?
In order to answer this question, we are going to inspect and visualize the dataset tg_ens_mean_0.1deg_reg_v20.0e_Paris_daily.csv using simple Galaxy tools.
Hands-on: Daily temperature time series
Comment: Tip: search for the tool
Many different tools can be used to answer the questions. Here we give you some guidelines to help you choose. Use the tools search box at the top of the tool panel to find the Select lines that match an expression tool and the Datamash tool.
Question
- What was the average temperature in Paris on the 14th of July 2003?
- What are the minimum and maximum temperatures in Paris?
- On which date did the minimum temperature occur?
- On which date did the maximum temperature occur?
- The average temperature in Paris on the 14th of July 2003 was 26.73 degrees Celsius. It can be found by using the Select lines that match an expression tool with parameter “the pattern” set to 2003-07-14.
- The minimum temperature in Paris is -11.6799995 degrees Celsius and the maximum temperature in Paris is 33.579998 degrees Celsius. To find out, you can use the Datamash tool with the following parameters:
- param-file “Input tabular dataset”: tg_ens_mean_0.1deg_reg_v20.0e_Paris_daily.csv
- “Input file has a header line”: Yes
- “Print header line”: Yes
- “Print all fields from input file”: No
- In “Operation to perform on each group”:
- param-repeat “Insert Operation to perform on each group”
- “Type”: minimum
- “On column”: c2
- param-repeat “Insert Operation to perform on each group”
- “Type”: maximum
- “On column”: c2
- The minimum temperature (-11.6799995 degrees Celsius) was observed on January 16, 1985. Several Galaxy tools can be used to find the solution; here we show how to use the Datamash tool with the following parameters:
- param-file “Input tabular dataset”: tg_ens_mean_0.1deg_reg_v20.0e_Paris_daily.csv
- “Input file has a header line”: Yes
- “Print header line”: Yes
- “Print all fields from input file”: Yes
- In “Operation to perform on each group”:
- param-repeat “Insert Operation to perform on each group”
- “Type”: minimum
- “On column”: c2
- The maximum temperature (33.579998 degrees Celsius) was observed on July 25, 2019. For the maximum, repeat the Datamash tool with the following parameters:
- param-file “Input tabular dataset”: tg_ens_mean_0.1deg_reg_v20.0e_Paris_daily.csv
- “Input file has a header line”: Yes
- “Print header line”: Yes
- “Print all fields from input file”: Yes
- In “Operation to perform on each group”:
- param-repeat “Insert Operation to perform on each group”
- “Type”: maximum
- “On column”: c2
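For readers who prefer to see the logic outside Galaxy, the filtering and minimum/maximum steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the column names `date` and `tg` and all values in `SAMPLE` are invented, and exact date matching stands in for the regular-expression matching done by the Select tool.

```python
import csv
import io

# Hypothetical excerpt in a two-column layout (date, daily mean temperature).
# The header names and the values are made up and are not taken from E-OBS.
SAMPLE = """date,tg
2003-07-13,24.1
2003-07-14,26.7
2003-07-15,27.3
2003-12-01,3.2
"""

def load_rows(text):
    """Parse CSV text into (date, temperature) tuples, skipping the header."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header line
    return [(date, float(temp)) for date, temp in reader]

rows = load_rows(SAMPLE)

# Roughly what selecting lines with the pattern 2003-07-14 does
on_day = [temp for date, temp in rows if date == "2003-07-14"]

# Roughly what minimum/maximum on column 2 does while keeping the whole row
coldest = min(rows, key=lambda row: row[1])
warmest = max(rows, key=lambda row: row[1])

print(on_day, coldest, warmest)
```

Keeping the whole row, as “Print all fields from input file” does, is what makes the date of the extreme value available alongside the value itself.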
What is the climate in Paris?
To get some information about the (past and current) climate in Paris, we will first look at monthly averages.
Seasonality
Hands-on: What is the monthly climatological temperature in Paris?
To answer this question, we will compute the overall average temperature for each month (January, February, etc.) over the entire period from 1950 to 2019. This period is sufficiently long for computing monthly climatological temperatures (more than 30 years).
Question
- What is the warmest summer month, i.e. among June, July and August (JJA), in Paris?
- What is the coolest winter month, i.e. among December, January and February (DJF), in Paris?
- The warmest summer month in Paris is July (19.921018171429 degrees Celsius). However, it is interesting to note that in our dataset there is very little difference in the mean temperature between July and August.
- The coolest winter month in Paris is January (4.4669169722484 degrees Celsius).
Below, we show you how we found these results. We will first split all the dates (first column) from YYYY-MM-DD (where YYYY is the year, MM the month and DD the day) into three columns: one for the year, one for the month and one for the day. Use the Text reformatting with awk tool with parameters:
- File to process: tg_ens_mean_0.1deg_reg_v20.0e_Paris_daily.csv
- AWK Program: gsub(/-/,"\t",$1){$1=$1} {print}
Rename the resulting file to split_dates_Paris.csv. Then use the Datamash tool with the following parameters:
- param-file “Input tabular dataset”: split_dates_Paris.csv
- “Group by fields”: 2
- “Input file has a header line”: Yes
- “Print header line”: No
- “Sort input”: Yes
- “Print all fields from input file”: No
- In “Operation to perform on each group”:
- param-repeat “Insert Operation to perform on each group”
- “Type”: Mean
- “On column”: c4
Rename the resulting file to climatology_Paris.csv. Then use Datamash again to get the months where the minimum and maximum temperatures are found:
- param-file “Input tabular dataset”: climatology_Paris.csv
- “Group by fields”:
- “Input file has a header line”: No
- “Print header line”: No
- “Print all fields from input file”: Yes
- In “Operation to perform on each group”:
- param-repeat “Insert Operation to perform on each group”
- “Type”: minimum
- “On column”: c2
Look at the resulting file: the first field gives you the month (01, i.e. January) where the minimum temperature is found.
For the maximum, repeat the Datamash tool with the following parameters:
- param-file “Input tabular dataset”: climatology_Paris.csv
- “Group by fields”:
- “Input file has a header line”: No
- “Print header line”: No
- “Print all fields from input file”: Yes
- In “Operation to perform on each group”:
- param-repeat “Insert Operation to perform on each group”
- “Type”: maximum
- “On column”: c2
The result is in the first column of the resulting file, which indicates 07, i.e. July.
Please note that you may use other Galaxy tools to reach the same results. Results can be slightly different when using different sources of climate information. However, you will always observe the same pattern, i.e. cool months in winter and warm months in summer. We can also clearly see that Paris has a mild climate with, on average, no extreme temperatures.
In this tutorial, we compute the monthly climatological temperatures manually to explain the underlying algorithm. However, many data providers offer pre-computed climatologies that can be downloaded directly. For instance, on the CDS, climatologies are provided for Essential climate variables for assessment of climate variability from 1979 to present.
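The split-by-month and average steps used in this section can be sketched in a few lines of Python. This is only an illustration of the algorithm, not the Galaxy tools themselves: the `monthly_climatology` helper is a hypothetical name and the daily records are invented values.

```python
from collections import defaultdict

# Hypothetical daily records (date, temperature in degrees Celsius);
# the values are invented for illustration only.
daily = [
    ("1950-01-01", 2.0),
    ("1950-01-02", 4.0),
    ("1950-07-01", 18.0),
    ("1951-01-01", 6.0),
    ("1951-07-01", 22.0),
]

def monthly_climatology(records):
    """Average all values sharing the same calendar month, across all years."""
    by_month = defaultdict(list)
    for date, temp in records:
        year, month, day = date.split("-")  # same split as the awk step
        by_month[month].append(temp)
    return {month: sum(vals) / len(vals) for month, vals in sorted(by_month.items())}

climatology = monthly_climatology(daily)
warmest_month = max(climatology, key=climatology.get)
coolest_month = min(climatology, key=climatology.get)

print(climatology, warmest_month, coolest_month)
```

Grouping on the month alone, and ignoring the year, is what turns a daily time series into a climatology: every January from every year contributes to the same average.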
Yearly average
Hands-on: What is the trend (cooling/warming) in the climate for Paris between 1950 and 2019?
To answer this question, we will compute the yearly mean of the temperature in Paris and visualize it.
- Use the Datamash tool with the following parameters:
- param-file “Input tabular dataset”: split_dates_Paris.csv
- “Group by fields”: 1
- “Input file has a header line”: Yes
- “Print header line”: No
- “Print all fields from input file”: No
- “Sort input”: Yes
- In “Operation to perform on each group”:
- param-repeat “Insert Operation to perform on each group”
- “Type”: Mean
- “On column”: c4
Rename the resulting file to yearly_mean_Paris.csv.
- To make a plot, you can use the Scatterplot w ggplot2 tool with the following parameters:
- “Input in tabular format”: yearly_mean_Paris.csv
- “Column to plot on x-axis”: 1
- “Column to plot on y-axis”: 2
- “Plot title”: Yearly mean temperature in Paris from 1950 to 2019
- “Label for x axis”: Year (YYYY)
- “Label for y axis”: Temperature (degrees Celsius)
- And finally in Advanced Options change Type of plot to Points and Lines.
- View galaxy-eye the resulting plot:
Question: Can we easily observe a trend?
The plot clearly shows a slight increase in the yearly mean temperature between 1950 and 2019. Even though the increase amounts to no more than a few degrees Celsius, it is quite significant.
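The tutorial only inspects the trend visually. As a complementary sketch, the yearly averaging can be reproduced in Python and the trend summarized with a least-squares slope. Fitting a straight line is an extra step assumed here for illustration and is not part of the hands-on; the helper names and the sample values are hypothetical.

```python
from collections import defaultdict

# Hypothetical (year, temperature) samples; the values are invented.
samples = [
    (1950, 10.0), (1950, 12.0),
    (1951, 11.0), (1951, 13.0),
    (1952, 12.0), (1952, 14.0),
]

def yearly_means(records):
    """Average all values that share the same year."""
    by_year = defaultdict(list)
    for year, temp in records:
        by_year[year].append(temp)
    return {year: sum(vals) / len(vals) for year, vals in sorted(by_year.items())}

def linear_trend(series):
    """Ordinary least-squares slope of the values against the years."""
    xs = list(series)
    ys = [series[x] for x in xs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

means = yearly_means(samples)
slope = linear_trend(means)  # degrees Celsius per year

print(means, slope)
```

A positive slope indicates warming and a negative slope cooling; on real data the slope should be read together with the year-to-year variability visible in the plot.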
Anomalies
In climate change studies, temperature anomalies are more important than absolute temperature. A temperature anomaly is the difference from an average, or baseline, temperature. The baseline temperature is typically computed by averaging 30 or more years of temperature data. A positive anomaly indicates the observed temperature was warmer than the baseline, while a negative anomaly indicates the observed temperature was cooler than the baseline.
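To make the definition concrete, here is a minimal Python sketch of an anomaly computation. It is an illustration only: the `anomalies` helper is a hypothetical name, the yearly values are invented, and the baseline is deliberately short, whereas a real baseline would cover 30 or more years.

```python
# Hypothetical yearly mean temperatures in degrees Celsius (invented values)
yearly = {1950: 10.0, 1951: 11.0, 1952: 12.0, 1953: 14.0}

def anomalies(series, baseline_years):
    """Subtract the mean over the baseline years from every value."""
    baseline = sum(series[year] for year in baseline_years) / len(baseline_years)
    return {year: value - baseline for year, value in series.items()}

# Baseline shortened to three years purely to keep the example small
result = anomalies(yearly, baseline_years=[1950, 1951, 1952])

print(result)
```

Years warmer than the baseline get a positive anomaly and years cooler than the baseline get a negative one.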
Hands-on: Climate stripes for Paris
Computing temperature anomalies is out of scope of this tutorial and we will therefore use the pre-computed temperature anomalies in ts_cities.csv. A simple way to visualize anomalies and highlight cooling/warming over the years is to use the climate stripes from timeseries tool with the following parameters:
- “timeseries to plot”: ts_cities.csv
- “column name to use for plotting”: tg_anomalies_paris
- “plot title”: Climate stripes for Paris (1950-2019)
View galaxy-eye the resulting plot:
Question: Do you observe a warming or cooling between 1950 and 2019?
The climate stripes clearly show a warming between 1950 and 2019.
Copernicus Climate Bulletins present the current condition of the climate using key climate change indicators. They also provide data, analysis of the maps and guidance on how they are produced. Datasets for temperature anomalies are available and regularly updated with recent dates. For instance, the dataset for March 2020 can be found on the Copernicus site.
Climate variables
Temperature is often the first variable that comes to mind when we talk about climate. However, it is insufficient to fully characterize the climate, and scientists have agreed on a number of variables to systematically observe Earth's changing climate.
That is what we call Essential Climate Variables.
Essential Climate Variables
The Global Climate Observing System (GCOS) and its GCOS expert panels maintain definitions of Essential Climate Variables (ECVs).
GCOS is co-sponsored by the World Meteorological Organization (WMO), the Intergovernmental Oceanographic Commission of the United Nations Educational, Scientific and Cultural Organization (IOC-UNESCO), the United Nations Environment Programme (UN Environment), and the International Science Council (ISC). It regularly assesses the status of global climate observations of the atmosphere, land and ocean and produces guidance for its improvement.
At the moment, there are 54 ECVs.
Source: https://gcos.wmo.int/en/essential-climate-variables
Hands-on: Essential Climate Variables
We will look at the Water Vapor Essential Climate Variable. The humidity of air near the surface of the Earth affects the comfort and health of humans, livestock and wildlife, the swarming behaviour of insects and the occurrence of plant disease. It also affects evaporation and the strength of the hydrological and energy cycles. Evaporation from the surface of the Earth is the source of water in the atmosphere and so is responsible for important feedbacks in the climate system due to clouds and radiation.
- Copernicus Essential Climate Variables tool with the following parameters:
- “Variable(s)”: surface_air_relative_humidity
- “Select type of data”: Monthly mean
- “Select year(s)”: 1980 and 2018
- “Select month”: July
Rename the resulting file to rh_mean_july_1980_2018.nc
- map plot gridded (lat/lon) netCDF data tool with the following parameters:
- “input with geographical coordinates (netCDF format)”: rh_mean_july_1980_2018.nc
- “variable name as given in the netCDF file”: R
- And finally in Advanced Options change:
- “multiple times”: Yes
- “comma separated list of indexes for fields to plot”: 0,1
- “number of rows for subplot grid”: 2
- “subplot title (repeated on each subplot)”: relative humidity in percentage
- “colormap”: PiYG
- View galaxy-eye the resulting plot:
Question: Relative humidity
- Do you observe any significant changes in relative humidity in France between 1980 and 2018?
- Do we have sufficient information to make any conclusions on the change in climate?
- We can see significant changes on the plot over France. The relative humidity of air near the surface of the Earth is lower in July 2018 than in July 1980.
- We do not have sufficient information to draw any conclusions about the change in climate. In our analysis, we only used two different months (July 1980 and July 2018) and can only discuss the differences in average weather between these two months. We learnt that to draw any conclusions on the climate, we would need to compute statistics over a long period of time, i.e. we would need to download about 30 years of data and, for instance, compute anomalies in relative humidity to check if there is any trend. These aspects will be discussed further in other Galaxy tutorials.
Past, present and future climate?
When we talk about climate data, the type of data can vary significantly. We have very few actual observations at climate time scales, and they usually do not cover a large area. In addition to observations, we can make use of:
- Re-analyses where observations and numerical modelling are combined together.
- Climate models.
Observations and re-analyses provide information about the past and current climate, while climate models can provide past, current and future climate information. When it comes to future climate, we usually need to make some assumptions (such as the level of CO2 emissions) and simulate different scenarios, i.e. we run climate models using different assumptions and look at future trends under each of these scenarios: this is what we call climate projections. Climate projections will be discussed in a separate Galaxy tutorial.
Conclusion
We have learnt to differentiate climate from weather and got an overview of the terminology used by climate scientists to identify the various sources of climate data.