Analyse Argo data
Author(s) | Marie Josse |
Reviewers |
Overview
Questions:
- How to use the Pangeo ecosystem to analyse ocean data?
- How to process Argo data to visualise ocean temperature variations?
Objectives:
- How to fetch Argo Data?
- How to get metadata with xarray netcdf tools?
- Learn to get temperature variations.
- Learn to use an interactive tool to visualise Argo temperature variable on an interactive map.
Requirements:
- Introduction to Galaxy Analyses
  - Slides: A short introduction to Galaxy
  - Hands-on: A short introduction to Galaxy
Time estimation: 1 hour
Published: Mar 24, 2024
Last modification: Mar 24, 2024
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT
PURL: https://gxy.io/GTN:T00428
Revision: 1
The ocean is a key component of the Earth’s climate system. It thus needs continuous real-time monitoring to help scientists better understand its dynamics and predict its evolution. All around the world, oceanographers have joined their efforts and set up a Global Ocean Observing System, of which Argo is a key component. Argo is an international program that collects information from inside the ocean using a fleet of robotic instruments that drift with the ocean currents and move up and down between the surface and a mid-water level.
The data used in this tutorial are from the Argo float network. We are interested in the following variables: water temperature, latitude, longitude and time. Our main objective is to plot the water temperature with respect to time. For this, we will be using the netCDF xarray tools available on the Galaxy Europe server (or your favourite Galaxy instance).
These tools are part of the Pangeo ecosystem in which the next generation of open-source analysis tools for ocean, atmosphere and climate science can be developed, distributed, and sustained. These tools must be scalable to meet the current and future challenges of big data, and these solutions should leverage the existing expertise outside of the geoscience community.
This tutorial is part of a set of tutorials on the Galaxy Earth System supported by the EOSC FAIR-EASE project.
Hands-on: Log in to Galaxy
- Open your favorite browser (Chrome, Safari or Firefox, not Internet Explorer!)
- Browse to your Galaxy instance
- In the top menu, click Login or Register
The Galaxy homepage is divided into three panels:
- Tools on the left
- Viewing panel in the middle
- History of analysis and files on the right
The first time you use Galaxy, there will be no files in your history panel.
Argo float data
Argo is a global network of nearly 4000 autonomous probes measuring pressure, temperature and salinity from the surface to 2000 m depth every 10 days. The probes are distributed almost randomly between the 60th parallels. All probe data are collected by satellite in real time, processed by several data centers and finally merged into a single dataset (comprising more than 2 million vertical profiles) made freely available to anyone through an FTP server or monthly zip snapshots.
Each Argo probe is an autonomous, free-drifting, profiling float, i.e. a probe that can’t control its trajectory but can control its buoyancy and thus move up and down the water column. Argo floats continuously repeat the same program, or cycle, illustrated in the figure below. After 9 to 10 days of free drift at a parking depth of about 1000 m, a typical Argo float dives down to 2000 m and then rises back to the surface while measuring pressure, temperature and salinity. Once it reaches the surface, the float sends its measurements by satellite to a data center, where they are processed in real time and made freely available on the web in less than 24 hours.
Here we will focus on the Caribbean Sea surrounding the Antilles during April and May 2021. On the 9th of April 2021, the volcano La Soufriere Saint Vincent (Antilles) erupted. Another tutorial on this event is available on the Galaxy Training Network.
Get Argo data
Hands-on: History management
Create a new history for this tutorial and give it a name (example: “Argo data with Pangeo”) so that you can find it again later if needed.
To create a new history simply click the new-history icon at the top of the history panel:
Hands-on: Argo data fetching
- Argo data access (Galaxy version 0.1.15+galaxy0) with the following parameters:
  - “We have preconfigured some mode of operations for you. What mode do you want to use?”: 🏊 standard mode simplifies the dataset, removes most of its jargon and returns a priori good data
  - “How do you want to select your data of interest ?”: 🗺 For a space/time domain
    - “Input longitude min (+east/-west)”: -75.0
    - “Input longitude max (+east/-west)”: -45.0
    - “Input latitude min (+north/-south)”: 20.0
    - “Input latitude max (+north/-south)”: 30.0
    - “Input pressure min (db)”: 0.0
    - “Input pressure max (db)”: 10.0
    - “Input starting date”: 2021-04
    - “Input ending date”: 2021-06
  - “Which kind of dataset do you want ?”: Physical parameters: temperature, salinity, pressure
- Click on Run Tool
- After a couple of minutes, an Argo data output will appear green in your history.
- Check the datatype of your data: it should be netcdf. If it is not, change it as follows:
  - Click on the pencil icon (galaxy-pencil) for the dataset to edit its attributes
  - In the central panel, click the Datatypes tab at the top
  - In Assign Datatype, select netcdf from the “New type” dropdown
    - Tip: you can start typing the datatype into the field to filter the dropdown menu
  - Click the Save button
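The space/time domain entered above can also be written down as a single list of eight values. The sketch below is only an illustration: `build_region_box` is a hypothetical helper, not part of the Galaxy tool, and the range checks are assumptions about what counts as a sensible box.

```python
from datetime import datetime


def build_region_box(lon_min, lon_max, lat_min, lat_max,
                     pres_min, pres_max, date_start, date_end):
    """Check a space/time domain and return it as a flat list (hypothetical helper)."""
    if not -180.0 <= lon_min < lon_max <= 180.0:
        raise ValueError("expected -180 <= lon_min < lon_max <= 180")
    if not -90.0 <= lat_min < lat_max <= 90.0:
        raise ValueError("expected -90 <= lat_min < lat_max <= 90")
    if not 0.0 <= pres_min < pres_max:
        raise ValueError("expected 0 <= pres_min < pres_max")
    # Dates use the YYYY-MM form shown in the tool parameters.
    if datetime.strptime(date_start, "%Y-%m") >= datetime.strptime(date_end, "%Y-%m"):
        raise ValueError("expected date_start earlier than date_end")
    return [lon_min, lon_max, lat_min, lat_max,
            pres_min, pres_max, date_start, date_end]


# Same values as in the hands-on box above.
box = build_region_box(-75.0, -45.0, 20.0, 30.0, 0.0, 10.0, "2021-04", "2021-06")
print(box)
```

If you work outside Galaxy with the argopy Python library (assuming that is what you use to reach the Argo servers), a list in this order is the kind of argument its `DataFetcher().region(...)` call takes; this is not run here because it needs network access.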
Argo data analysed, managed, and visualised by the Pangeo tools
Xarray tools
First, we’ll use 2 xarray netcdf tools. xarray, formerly known as xray, is a Python package for working with labelled, multi-dimensional (gridded) data. It borrows most of its features from NumPy, but makes them more convenient to use by keeping track of the labels attached to arrays. Gridded data are mainly distributed in the netCDF format, so xarray comes in very handy when dealing with netCDF files.
Knowing more about our data
After fetching the required Argo data, the following stage is to obtain the meta info or metadata of the file. The very purpose of these steps is to obtain information about dimensions, variables, global attributes, etc. The coordinate info helps to know about the actual data entries present under the various variables.
Get metadata
Hands-on: NetCDF dataset with Xarray metadata Galaxy Tool
- NetCDF xarray Metadata Info (Galaxy version 2022.3.0+galaxy0) with the following parameters:
  - “Netcdf file”: output_argo (output of Argo data access tool)
In the info output file, we can identify 4 different sections:
- Dimensions: name of dimensions and corresponding number of elements;
- Coordinates: contains coordinate arrays (longitude, latitude, level and time) with their values;
- Data variables: contains all the variables available in the dataset, here including the physical parameters requested above (temperature, salinity and pressure). For each variable, we get information on its shape and values;
- Global Attributes: at this level, we get the global attributes of the dataset. Each attribute has a name and a value.
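For readers who also use xarray directly, the four sections listed above correspond to four properties of an xarray `Dataset`. The sketch below uses a small synthetic dataset with made-up values instead of the real Argo file; apart from `TEMP` and `N_POINTS`, the variable and coordinate names are assumptions about how the file is laid out.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the Argo output (all values are made up).
# With a real file you would start from xr.open_dataset("<your file>.nc").
ds = xr.Dataset(
    data_vars={
        "TEMP": ("N_POINTS", np.array([25.1, 25.3, 25.0, 25.6, 25.9, 26.2])),
        "PSAL": ("N_POINTS", np.array([36.5, 36.6, 36.4, 36.5, 36.7, 36.6])),
        "PRES": ("N_POINTS", np.array([2.0, 4.0, 6.0, 2.0, 4.0, 6.0])),
    },
    coords={
        "LATITUDE": ("N_POINTS", np.array([22.0, 22.0, 22.0, 24.5, 24.5, 24.5])),
        "LONGITUDE": ("N_POINTS", np.array([-70.0, -70.0, -70.0, -60.5, -60.5, -60.5])),
        "TIME": ("N_POINTS", np.array(["2021-04-10"] * 3 + ["2021-04-20"] * 3,
                                      dtype="datetime64[ns]")),
    },
    attrs={"title": "synthetic example", "source": "made-up values"},
)

print(dict(ds.sizes))      # Dimensions: name and number of elements
print(list(ds.coords))     # Coordinates
print(list(ds.data_vars))  # Data variables
print(ds.attrs)            # Global attributes: name/value pairs
```

Printing `ds` itself gives a text summary organised into the same four sections.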
Coordinates information
Hands-on: Get Coordinate information with Xarray Coordinate
- NetCDF xarray Coordinate Info (Galaxy version 2022.3.0+galaxy0) with the following parameters:
  - “Netcdf file”: output_argo (output of Argo data access tool)
View (galaxy-eye icon) the 5 generated outputs:
- latitude: a tabular file containing all the latitude values of our Xarray dataset;
- longitude: a tabular file containing all the longitude values;
- level: this file contains information on all the different levels (here, we have surface data so level=0 meter);
- time: this tabular file contains all the measurement times. In our case, they fall within the requested period (2021-04 to 2021-06);
- version: this is a text file returning the Xarray package version. It is useful when publishing your Galaxy workflow.
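As a rough idea of what such a tabular coordinate file holds, the sketch below writes one coordinate of a synthetic dataset as a single-column table. `coordinate_to_tabular` is a hypothetical helper and the values are made up; the Galaxy tool may lay out its files differently.

```python
import numpy as np
import xarray as xr

# Synthetic coordinates (made-up values), one entry per measurement point.
ds = xr.Dataset(
    coords={
        "LATITUDE": ("N_POINTS", np.array([22.0, 24.5, 27.0])),
        "LONGITUDE": ("N_POINTS", np.array([-70.0, -60.5, -50.0])),
    }
)


def coordinate_to_tabular(dataset, name):
    """Return one coordinate as text: a header line, then one value per line."""
    values = dataset[name].values.tolist()
    return "\n".join([name] + [str(v) for v in values]) + "\n"


latitude_table = coordinate_to_tabular(ds, "LATITUDE")
print(latitude_table)
```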
Timeseries visualisation
Hands-on: NetCDF timeseries Extractor
- NetCDF timeseries Extractor (Galaxy version 2022.3.0+galaxy0) with the following parameters:
  - “Input netcdf file”: output_argo (output of Argo data access tool)
  - “Tabular of variables”: output (output of NetCDF xarray Metadata Info tool)
  - “Choose the variable to plot”: ``
  - “Datetime selection”: No
  - In “Advanced Plotting Options”:
    - “Plot title”: Temperature
    - “Label for x-axis”: Time
    - “Label for y-axis”: Temperature
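To give an idea of what a temperature time series looks like in code, the sketch below orders synthetic temperature points by time and averages them per day. It is not the implementation of the NetCDF timeseries Extractor tool: the values are made up, and handing the data over to pandas for the daily mean is a choice made here for brevity.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic, deliberately unordered measurements (made-up values).
times = np.array(
    ["2021-04-20T06:00", "2021-04-10T06:00", "2021-05-01T12:00",
     "2021-04-10T18:00", "2021-04-20T18:00"],
    dtype="datetime64[ns]",
)
temps = np.array([25.8, 25.0, 26.5, 25.4, 26.0])
ds = xr.Dataset({"TEMP": ("N_POINTS", temps)},
                coords={"TIME": ("N_POINTS", times)})

# Index the temperatures by their measurement time and sort chronologically.
series = pd.Series(
    ds["TEMP"].values,
    index=pd.DatetimeIndex(ds["TIME"].values),
    name="TEMP",
).sort_index()

# Average all measurements taken on the same calendar day.
daily_mean = series.groupby(series.index.floor("D")).mean()
print(daily_mean)
```

With matplotlib installed, `daily_mean.plot()` would draw the resulting curve, with time on the x-axis and temperature on the y-axis.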
Visualisation mapping with a Galaxy Earth System’s tool
The Earth System is a complex and dynamic system that encompasses the interactions between the atmosphere, oceans, land, and biosphere. Understanding and analyzing data from the Earth System Model (ESM) is essential, for example, to predict and mitigate the impacts of climate change. The ESM that the project tries to implement includes coastal water dynamics, ocean bio-geochemical in-situ data, marine omics observations, volcano activities and land degradation.
To learn more, keep an eye open for all related blogs on the Galaxy Community Hub, or go try the tools directly on Galaxy Earth System.
Ocean Data View (ODV)
Ocean Data View (ODV) is a software package for the interactive exploration, analysis and visualization of oceanographic and other geo-referenced profile, time-series, trajectory or sequence data. To learn more about ODV, check the official page.
ODV is now integrated with Galaxy as an interactive tool. This kind of tool works differently from classical tools, as it allows the user to interact with a dedicated graphical interface. Such tools are used, for example, to give access to Jupyter Notebooks, RStudio or R Shiny apps.
Hands-on: Launch ODV
- ODV with the following parameters:
  - “Select if you are using a ODV collection in a zip folder or if you have your own raw data”: The data you are using are Netcdf or tabular text files
  - “Netcdf or tabular text file. For text file, odv format is recommanded.”: timeseries_tabular (output of NetCDF timeseries Extractor tool)
- Click on Run Tool
- Go to User > Active InteractiveTools
- Wait for the ODV to be running (Job Info)
- Click on ODV
If at some point your ODV interface becomes grey with a red panel at the top saying “X ODV - Disconnected”, do NOT panic ;) You just need to reload your tab (circular arrow at the top left).
You can expand the ODV left panel (where there are 3 vertical dots) to access the “clipboard” menu and paste the content you want into an ODV form. From there you can copy-paste everything from one side to the other. Then, click outside of this panel to collapse it.
Hands-on: Visualise with ODV
In the new ODV tab:
- Close the “Check for Updates” pop-up window
- Go to the top left and click on File, then on Open…
- In the left panel of the pop-up window, select ODV, then the folder galaxy, then data. You should see a folder; open it by double-clicking
- At the bottom of the pop-up window, in Files of type, select All Files (*)
- Click on Open at the bottom right
- On the smaller map on the left, right-click and select Zoom
- Move your cursor over the map; you should see a red rectangle moving along
- Resize the rectangle to cover the selection you want on the map. It can be something similar to the following image (no need to be exactly the same).
- Once you’re happy with your selection, press Enter on your keyboard.
Your data should now open, and you can visualise them!
Hands-on: Visualise the temperature on a map with ODV
- Click on the big + at the top left
- Select “1 SURFACE Window”
- Right-click on the central white rectangle next to the map and select Properties…
- Go to the Data tab of the pop-up window
- For Z-axis, select the variable TEMP @ N POINTS=first, as in the following image.
- Then, click on OK.
- Go to the central map
- Right-click and select Properties…
- For example, to make your data dots bigger, go to “Display Style”, increase the number below “Symbol Size” to 30, and click OK
You can now see bigger dots representing your data.
Hands-on: Save your visualisation map
- Right-click on the map and select Save Plot As…
- In the pop-up window, go to the folder ODV, galaxy, outputs.
- In File name, rename your view (for example subset_temp_argo_data)
- In Files of type, select PNG (*.png *.PNG), then click Save, then OK and OK.
Once you’re finished with ODV:
- At the top left, click on File and select Exit
- If you want to save the other window as well, click on Yes. Here we don’t need it, so click No.
Conclusion
Awesome! You now know how to get Argo data, then get metadata and other information within the Pangeo ecosystem and finally visualise these data with an Earth System tool, Ocean Data View.
Extra information
Even more tutorials and other Earth-System related trainings are coming up soon. Keep an eye open if you are interested!