Multiomics data analysis using MultiGSEA

Author(s)	Thorben Stehling, Matthias Bernt
Reviewers

Overview
Questions:

How to use MultiGSEA for GSEA-based pathway enrichment for multiple omics layers?

Objectives:

Perform GSEA-based pathway enrichment for transcriptomics, proteomics, and metabolomics data.

Understand how to combine p-values across multiple omics layers.

Requirements:

Introduction to Galaxy Analyses

Time estimation: 1 hour

Supporting Materials:

Datasets

Workflows

FAQs

Available on these Galaxies

Known Working

UseGalaxy.eu ✅ ⭐️

Possibly Working

UseGalaxy.cz

UseGalaxy.org.au

Published: Mar 10, 2025

Last modification: Mar 11, 2025

License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT

PURL: https://gxy.io/GTN:T00503

Revision: 2

The multiGSEA package was designed to run a robust GSEA-based pathway enrichment for multiple omics layers (Canzler and Hackermüller 2020). The enrichment is calculated for each omics layer separately, and aggregated p-values are calculated afterwards to derive a composite multi-omics pathway enrichment.

Pathway definitions can be downloaded from up to eight different pathway databases by means of the graphite Bioconductor package (Sales, Calura, and Romualdi 2018). Feature mapping for transcripts and proteins is supported towards Entrez Gene IDs, UniProt, Gene Symbol, RefSeq, and Ensembl IDs. The mapping is accomplished through the AnnotationDbi package (Pagès et al. 2019) and is currently supported for 11 different model organisms, including human, mouse, and rat. ID conversion of metabolite features to Comptox Dashboard IDs (DTXCID, DTXSID), CAS numbers, PubChem IDs (CID), HMDB, KEGG, ChEBI, DrugBank IDs, or common metabolite names is accomplished through the AnnotationHub package metaboliteIDmapping. This package provides a comprehensive ID mapping for more than 1.1 million entries.

This tutorial covers a simple example workflow illustrating how the multiGSEA package works. The omics data sets used throughout the example were originally provided by Quirós et al. 2017. In their publication, the authors analyzed the mitochondrial response to four different toxicants, including Actinonin, Diclofenac, FCCB, and Mito-Block (MB), within the transcriptome, proteome, and metabolome layers. In this tutorial we will focus solely on the Actinonin data set.

For more background information you can read the following articles:

Multiomics analysis by Quirós et al. 2017
Methods for combining p-values by Loughin 2004

Agenda

In this tutorial, we will cover:

Preparing the Data

Get data

Running MultiGSEA

Mitochondrial stress activates metabolic pathways of amino acid biosynthesis.

Conclusion

Preparing the Data

To perform pathway enrichment with MultiGSEA, you’ll need omics datasets in TSV (tab-separated values) format. Each individual data set contains four columns representing the feature (denoted as Symbol), the log2 fold change (logFC), the p-value (pValue), and the adjusted p-value (adj.pValue). We’ll use example data provided on Zenodo.
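The expected layout can be checked locally before uploading. The following is a minimal Python sketch and not part of the tutorial's workflow: the helper name `check_omics_table` and the sample rows are hypothetical, and only the four column names come from the description above. It also assumes that every numeric cell parses as a number, so entries such as NA are reported as errors.

```python
import csv
import io

# Column names as described in the tutorial text.
EXPECTED_COLUMNS = ["Symbol", "logFC", "pValue", "adj.pValue"]


def check_omics_table(handle):
    """Return the number of data rows if the TSV has the expected columns.

    Raises ValueError when a column is missing or a numeric cell is malformed.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    n_rows = 0
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        for column in EXPECTED_COLUMNS[1:]:
            try:
                float(row[column])
            except (TypeError, ValueError):
                raise ValueError(f"line {line_no}: {column!r} is not numeric") from None
        n_rows += 1
    return n_rows


# Hypothetical example rows, invented for illustration only.
SAMPLE = (
    "Symbol\tlogFC\tpValue\tadj.pValue\n"
    "GENE_A\t1.25\t0.0004\t0.012\n"
    "GENE_B\t-0.80\t0.0310\t0.155\n"
)

print(check_omics_table(io.StringIO(SAMPLE)))
```

To check a downloaded file instead, pass an open file handle, for example `open("transcriptome.tsv", newline="")`.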

Get data

Data Upload

Hands On: Getting datasets
Create a new history for this tutorial.

To create a new history, simply click the new-history icon at the top of the history panel.
Import the datasets from Zenodo into your Galaxy instance:
https://zenodo.org/records/14216972/files/transcriptome.tsv
https://zenodo.org/records/14216972/files/proteome.tsv
https://zenodo.org/records/14216972/files/metabolome.tsv

Running MultiGSEA

In this step, you’ll use the MultiGSEA tool to perform GSEA-based pathway enrichment on the uploaded datasets.

Hands On: Run multiGSEA

Run multiGSEA (Galaxy version 1.12.0+galaxy0) with the following parameters:

“Select transcriptomics data”: Enabled

“Transcriptomics data”: Transcriptomics

“Gene ID format in transcriptomics data”: SYMBOL

“Select proteomics data”: Enabled

“Proteomics data”: Proteomics

“Gene ID format in proteomics data”: SYMBOL

“Select metabolomics data”: Enabled

“Metabolomics data”: Metabolomics

“Metabolite ID format”: HMDB

“Supported organisms”: Homo sapiens (Human)

“Pathway databases”: Kegg, Reactome

“Combine p-values method”: Stouffer

“P-value correction method”: BH

Pathway databases: Each database provides its pathway definitions in its own format, so select the databases that are relevant to your analysis. For the tutorial we choose the presets “Kegg” and “Reactome”.

Combine p-values method: Choose a method (here Stouffer for balanced weighting). To measure a pathway response more comprehensively, multiGSEA provides different approaches to compute an aggregated p-value over multiple omics layers. Because no single approach for aggregating p-values performs best under all circumstances, Loughin 2004 proposed basic recommendations on which method to use depending on the structure and expectation of the problem. If small p-values should be emphasized, Fisher’s method should be chosen. In cases where p-values should be treated equally, Stouffer’s method is preferable. If large p-values should be emphasized, the user should select Edgington’s method. Figure 1 indicates the difference between those three methods.

Figure 1: P-value methods
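To make the difference between the three methods concrete, here is a minimal Python sketch of their textbook, unweighted forms. It is an illustration under stated assumptions, not the code multiGSEA runs: the tool may apply weights or other adjustments, so these functions should not be expected to reproduce its combined p-values. The function names and the example input are hypothetical, and all p-values are assumed to lie strictly between 0 and 1.

```python
import math
from statistics import NormalDist


def fisher(pvalues):
    """Fisher's method: -2 * sum(ln p) is chi-square with 2k degrees of freedom."""
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # test statistic divided by 2
    # Closed-form chi-square survival function for an even number (2k) of d.f.
    return math.exp(-half_stat) * sum(
        half_stat**i / math.factorial(i) for i in range(k)
    )


def stouffer(pvalues):
    """Unweighted Stouffer's method: sum of z-scores divided by sqrt(k)."""
    norm = NormalDist()
    z = sum(-norm.inv_cdf(p) for p in pvalues) / math.sqrt(len(pvalues))
    return norm.cdf(-z)


def edgington(pvalues):
    """Edgington's method: the sum of k uniform p-values is Irwin-Hall distributed."""
    k = len(pvalues)
    s = sum(pvalues)
    total = sum(
        (-1) ** j * math.comb(k, j) * (s - j) ** k
        for j in range(math.floor(s) + 1)
    )
    return total / math.factorial(k)


# Hypothetical input: one very small and one large p-value.
example = [0.001, 0.9]
for method in (fisher, stouffer, edgington):
    print(method.__name__, method(example))
```

For this input, Fisher's method returns the smallest combined p-value and Edgington's the largest, with Stouffer's in between, which matches the emphasis described above.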

P-value correction method: Type I and type II errors depend on each other, so reducing type I errors through a p-value adjustment will likely increase the chance of making a type II error, and an appropriate trade-off has to be made. Choose one of the available correction methods. For the tutorial, choose BH (Benjamini-Hochberg), which controls the false discovery rate.
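The Benjamini-Hochberg adjustment itself is short enough to sketch. The Python function below follows the standard step-up definition (each sorted p-value is multiplied by n/rank, then made monotone from the largest rank downwards). The function name and the example values are hypothetical, and the sketch assumes a list without missing values; it is not taken from the multiGSEA source.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    n = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for four pathways.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Note that an adjusted value never falls below its raw p-value, which is the trade-off described above: fewer false positives at the cost of some lost detections.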

Mitochondrial stress activates metabolic pathways of amino acid biosynthesis.

After we performed pathway enrichment on our data we want to continue our analysis by filtering the outputs, e.g. by p-value.

Hands On: Filtering by values

We are going to use the tool Filter data on any column using simple expressions with the following parameters:

“Filter”: Select the output dataset of the multiGSEA tool.

“With following condition”: c9<=0.01
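The condition c9<=0.01 keeps only the rows whose ninth column is at most 0.01. Outside Galaxy, the same idea could be sketched in Python as follows. The helper name `filter_by_column` and the sample rows are hypothetical, and the sketch simply skips lines whose ninth cell is not numeric (for example NA or a header), which may differ from how the Galaxy tool treats such lines.

```python
import csv
import io


def filter_by_column(handle, column, threshold):
    """Yield TSV rows whose 1-based `column` holds a number <= threshold."""
    for row in csv.reader(handle, delimiter="\t"):
        try:
            value = float(row[column - 1])
        except (IndexError, ValueError):
            continue  # skip headers, short rows, and non-numeric cells
        if value <= threshold:
            yield row


# Hypothetical rows: a pathway name followed by eight numeric columns.
ROWS = [
    ["Pathway A"] + ["0.5"] * 7 + ["0.004"],
    ["Pathway B"] + ["0.5"] * 7 + ["0.2"],
    ["Pathway C"] + ["0.5"] * 7 + ["NA"],
]
TABLE = "\n".join("\t".join(row) for row in ROWS) + "\n"

kept = list(filter_by_column(io.StringIO(TABLE), column=9, threshold=0.01))
print([row[0] for row in kept])
```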

Hands On: Filtering by keyword

Mitochondrial stress triggers the activation of amino acid biosynthesis and related metabolic pathways, as highlighted by Quirós et al. 2017, who identified up-regulation of several amino acid-related pathways. These findings align with our results from the multi-omics approach with multiGSEA, which also reveal the enrichment of amino acid-related pathways. This can be seen, for example, by searching for pathways containing the word “amino” in the name. We can search for this using the regular expression "\bamino\b" (\b is a special sequence marking a word boundary).

For further filtering we are going to use the tool Search in textfiles (Galaxy version 9.3+galaxy1) with the following parameters:

“Select lines from”: Select the output of the last filter.

“that”: Choose “Match”

“Type of regex”: Choose “Basic (-G)”

“Regular Expression”: \bamino\b

“Match type”: “case insensitive”

“Show lines preceding the matched line”: 0

“Show lines trailing the matched line”: 0

“Output”: “text file (for further processing)”

This should produce a file with the following content:
(KEGG) Amino sugar and nucleotide sugar metabolism	0.258278145695364	0.571272307893517	8.40563134115856e-07	4.03302191748788e-05	2.34233775080415e-05	0.00310502061987873	8.75854833022965e-05	0.00347274191804089
(KEGG) Biosynthesis of amino acids	3.7749282501799e-05	0.00138827212066317	1.53729180439681e-10	2.45864202583197e-08	0.0794621026894866	0.333364038319823	1.48946778740876e-07	1.13741176493032e-05
(REACTOME) Amino acid and derivative metabolism	4.7058253543142e-06	0.000231903073460604	7.70587893652191e-22	1.84864035687161e-18	0.055363321799308	0.286607081313329	9.03772696553135e-14	2.2775071953139e-11
(REACTOME) Response of EIF2AK4 (GCN2) to amino acid deficiency	6.42075812177139e-07	4.52021371772706e-05	3.76589836665495e-10	5.31434716565013e-08	NA	NA	3.35403180955872e-11	4.83629358772636e-09
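The effect of the word boundaries can be tried out with a few pathway names. The following Python sketch uses the `re` module, which is a different regex engine than the one behind the Galaxy tool, although `\b` has the same meaning in this pattern. The first four names are taken from the output above; the fifth is an extra name added here purely to illustrate a non-match.

```python
import re

# Case-insensitive search for "amino" as a whole word.
WORD_AMINO = re.compile(r"\bamino\b", re.IGNORECASE)

names = [
    "(KEGG) Amino sugar and nucleotide sugar metabolism",
    "(KEGG) Biosynthesis of amino acids",
    "(REACTOME) Amino acid and derivative metabolism",
    "(REACTOME) Response of EIF2AK4 (GCN2) to amino acid deficiency",
    # Illustrative addition, not part of the tutorial output:
    "(KEGG) Aminoacyl-tRNA biosynthesis",
]

matches = [name for name in names if WORD_AMINO.search(name)]
for name in matches:
    print(name)
```

The last name is not printed: in "Aminoacyl" the letters "amino" are followed directly by another letter, so there is no word boundary after them.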

Question

What file format is required for the input data in MultiGSEA?

What is the purpose of the “Combine p-values method” parameter, and which method was selected in this tutorial?

Why is it important to select pathway databases (e.g., KEGG) when using MultiGSEA?

The required file format is TSV.

The “Combine p-values method” parameter is used to aggregate p-values across omics layers. In this tutorial, the method Stouffer was selected to apply balanced weighting.

Selecting pathway databases ensures that the analysis uses appropriate and relevant pathway definitions for enrichment.

Conclusion

In this tutorial, you explored the capabilities of MultiGSEA for performing pathway enrichment analysis across multiple omics layers, including transcriptomics, proteomics, and metabolomics data. By following the steps, you learned how to:

Prepare and upload the required omics datasets.
Configure and execute the MultiGSEA tool within Galaxy.
Combine p-values from different omics layers to derive a unified perspective on pathway enrichment.

You've Finished the Tutorial

Key points

MultiGSEA provides an integrated workflow for pathway enrichment analysis across multi-omics data.

It supports pathway definitions from several databases and robust ID mapping.

Frequently Asked Questions

Have questions about this tutorial? Have a look at the available FAQ pages and support channels

Useful literature

Further information, including links to documentation and original publications, regarding the tools, analysis techniques and the interpretation of results described in this tutorial can be found here.

References

Loughin, T. M., 2004 A systematic comparison of methods for combining p-values from independent tests. Computational Statistics & Data Analysis 47: 467–485. 10.1016/j.csda.2003.11.020
Quirós, P. M., M. A. Prado, N. Zamboni, D. D’Amico, R. W. Williams et al., 2017 Multi-omics analysis identifies ATF4 as a key regulator of the mitochondrial stress response in mammals. Journal of Cell Biology 216: 2027–2045. 10.1083/jcb.201702058
Canzler, S., and J. Hackermüller, 2020 multiGSEA: a GSEA-based pathway enrichment analysis for multi-omics data. BMC bioinformatics 21: 1–13. 10.1186/s12859-020-03910-x

Feedback

Did you use this material as an instructor? Feel free to give us feedback on how it went.
Did you use this material as a learner or student? Click the form below to leave feedback.

Citing this Tutorial

Thorben Stehling, Matthias Bernt, Multiomics data analysis using MultiGSEA (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/proteomics/tutorials/multiGSEA-tutorial/tutorial.html Online; accessed TODAY
Hiltemann, Saskia, Rasche, Helena et al., 2023 Galaxy Training: A Powerful Framework for Teaching! PLOS Computational Biology 10.1371/journal.pcbi.1010752
Batut et al., 2018 Community-Driven Data Analysis Training for Biology Cell Systems 10.1016/j.cels.2018.05.012

@misc{proteomics-multiGSEA-tutorial,
author = "Thorben Stehling and Matthias Bernt",
	title = "Multiomics data analysis using MultiGSEA (Galaxy Training Materials)",
	year = "",
	month = "",
	day = "",
	url = "\url{https://training.galaxyproject.org/training-material/topics/proteomics/tutorials/multiGSEA-tutorial/tutorial.html}",
	note = "[Online; accessed TODAY]"
}
@article{Hiltemann_2023,
	doi = {10.1371/journal.pcbi.1010752},
	url = {https://doi.org/10.1371%2Fjournal.pcbi.1010752},
	year = 2023,
	month = {jan},
	publisher = {Public Library of Science ({PLoS})},
	volume = {19},
	number = {1},
	pages = {e1010752},
	author = {Saskia Hiltemann and Helena Rasche and Simon Gladman and Hans-Rudolf Hotz and Delphine Larivi{\`{e}}re and Daniel Blankenberg and Pratik D. Jagtap and Thomas Wollmann and Anthony Bretaudeau and Nadia Gou{\'{e}} and Timothy J. Griffin and Coline Royaux and Yvan Le Bras and Subina Mehta and Anna Syme and Frederik Coppens and Bert Droesbeke and Nicola Soranzo and Wendi Bacon and Fotis Psomopoulos and Crist{\'{o}}bal Gallardo-Alba and John Davis and Melanie Christine Föll and Matthias Fahrner and Maria A. Doyle and Beatriz Serrano-Solano and Anne Claire Fouilloux and Peter van Heusden and Wolfgang Maier and Dave Clements and Florian Heyl and Björn Grüning and B{\'{e}}r{\'{e}}nice Batut},
	editor = {Francis Ouellette},
	title = {Galaxy Training: A powerful framework for teaching!},
	journal = {PLoS Comput Biol}
}

                   

Congratulations on successfully completing this tutorial!

You can use Ephemeris's shed-tools install command to install the tools used in this tutorial.

shed-tools install [-g GALAXY] [-a API_KEY] -t <(curl https://training.galaxyproject.org/training-material/api/topics/proteomics/tutorials/multiGSEA-tutorial/tutorial.json | jq .admin_install_yaml -r)

Alternatively you can copy and paste the following YAML

---
install_tool_dependencies: true
install_repository_dependencies: true
install_resolver_dependencies: true
tools:
- name: text_processing
  owner: bgruening
  revisions: 86755160afbf
  tool_panel_section_label: Text Manipulation
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: multigsea
  owner: iuc
  revisions: e48b10ce08b8
  tool_panel_section_label: Annotation
  tool_shed_url: https://toolshed.g2.bx.psu.edu/

No feedback has been received yet for this training. Be the first one by filling in the feedback form.