Pathway analysis with the MINERVA Platform

Overview
Creative Commons License: CC-BY Questions:
  • Which pathways are affected in this COVID-19 study?

  • How can I visualise the results of a differential expression analysis in the MINERVA Platform?

Objectives:
  • Perform an analysis using a workflow from WorkflowHub

  • Visualise and interpret the results with MINERVA

Requirements:
Time estimation: 1 hour
Level: Intermediate Intermediate
Supporting Materials:
Published: Mar 28, 2024
Last modification: Mar 28, 2024
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT
purl PURL: https://gxy.io/GTN:T00437
version Revision: 14

This tutorial is a partial reproduction of Togami et al. 2022 wherein they evaluated mRNA and miRNA in a selection of COVID-19 patients and healthy controls. While that paper uses a closed source pipeline, we’ll be reproducing the analysis with open source tools in Galaxy, using a workflow on WorkflowHub developed for the BY-COVID project.

Comment: Full data

The original data is available at NCBI under BioProject PRJNA754796

Agenda

In this tutorial, we will deal with:

  1. Study Design
  2. Analysis
  3. The MINERVA Platform

There are several places you can jump to in this tutorial, using pre-calculated data. We recommend you jump skipping the data download and counting step, and skipping to the analysis, as that precludes the slowest and most data intensive parts of this tutorial. However, the entire process is documented in case you want to reproduce our work.

Study Design

Hands-on: Data upload
  1. Create a new history for this tutorial

    Click the new-history icon at the top of the history panel:

    UI for creating new history

  2. Import the factor table from Zenodo

    https://zenodo.org/records/10405036/files/factordata.tabular
    
    • Copy the link location
    • Click galaxy-upload Upload Data at the top of the tool panel

    • Select galaxy-wf-edit Paste/Fetch Data
    • Paste the link(s) into the text field

    • Press Start

    • Close the window

  3. Check that the datatype

    • Click on the galaxy-pencil pencil icon for the dataset to edit its attributes
    • In the central panel, click galaxy-chart-select-data Datatypes tab on the top
    • In the galaxy-chart-select-data Assign Datatype, select tabular from “New type” dropdown
      • Tip: you can start typing the datatype into the field to filter the dropdown menu
    • Click the Save button

Analysis

We have split this workflow into three parts, based only on how long the first two portions of the workflow take to execute. The rough runtime of the workflow portions, when this was being developed, can be broken down as follows:

Step Time
Data Download ~6h
Processing Counts ~8h
Analysis & Visualisation 15m

These numbers were generated on UseGalaxy.eu and may not represent the most efficient possible computation, as they are executed on a shared cluster that can, at times, be more or less busy.

As such we recommend you skip to the analysis step to progress to the interesting portion of the tutorial. We have provided in the Zenodo record data from the entire analysis, analysed with the Download & Counts steps that can be skipped.

Hands-on: Choose Your Own Tutorial

This is a "Choose Your Own Tutorial" section, where you can select between multiple paths. Click one of the buttons below to select how you want to follow the tutorial

Choose whether you want to run the time-consuming Download Data step.

Data Download

We’ll start by downloading our FASTQ files from the GEO Dataset GSE182152

Hands-on: Download the data from GEO (ETA: 6 Hours)
  1. Import the workflow into Galaxy

    Launch Trim and Filter reads workflow.

  2. Run the workflow with the following parameters:

    • “Sample Table”: factordata.tabular

Here we have cut the SRR* identifiers from the sample table and downloaded them with fasterq, part of the SRA toolkit.

Counts

With that done, we can start to analyse the data using HISAT2 and featureCounts. This workflow takes in the RNA Sequencing data we’ve downloaded previously, before trimming it with cutadapt. Both the trimmed and untrimmed reads are run through FastQC for visualisation.

flowchart TD
  0["ℹ️ Input Collection\nmRNA-Seq Reads"];
  style 0 stroke:#2c3143,stroke-width:4px;
  1["ℹ️ Input Dataset\nUCSC Genome"];
  style 1 stroke:#2c3143,stroke-width:4px;
  2["FastQC"];
  0 -->|output| 2;
  3["Cutadapt"];
  0 -->|output| 3;
  4["FastQC"];
  3 -->|out1| 4;
  5["HISAT2"];
  3 -->|out1| 5;
  6["featureCounts"];
  5 -->|output_alignments| 6;
  5cb5bd5d-df81-4463-8002-e89551780f01["Output\noutput_feature_lengths"];
  6 --> 5cb5bd5d-df81-4463-8002-e89551780f01;
  style 5cb5bd5d-df81-4463-8002-e89551780f01 stroke:#2c3143,stroke-width:4px;
  5dec13e1-0011-472c-a72d-6715fe55c23c["Output\noutput_short"];
  6 --> 5dec13e1-0011-472c-a72d-6715fe55c23c;
  style 5dec13e1-0011-472c-a72d-6715fe55c23c stroke:#2c3143,stroke-width:4px;
  7["Read Distribution"];
  5 -->|output_alignments| 7;
  1 -->|output| 7;
  8["Gene Body Coverage BAM"];
  5 -->|output_alignments| 8;
  1 -->|output| 8;
  9["Multi QC raw reads"];
  2 -->|text_file| 9;
  4 -->|text_file| 9;
  3 -->|report| 9;
  5 -->|summary_file| 9;
  7 -->|output| 9;
  8 -->|outputtxt| 9;
  6 -->|output_summary| 9;
  9217af4e-e34c-4a49-9d67-4b3df511ac2c["Output\nhtml_report"];
  9 --> 9217af4e-e34c-4a49-9d67-4b3df511ac2c;
  style 9217af4e-e34c-4a49-9d67-4b3df511ac2c stroke:#2c3143,stroke-width:4px;
Workflow 1: Counts Workflow for mRNA-Seq BY-COVID Pipeline.

The trimmed reads are then handled by HISAT2 for alignment to the reference human genome, and featureCounts is run for quantification.

This workflow produces a set of featureCounts tables, a set of featureLengths (needed for goseq), and a MultiQC report.

Hands-on: Run the Workflow
  1. Import the workflow into Galaxy

    Launch mRNA-Seq BY-COVID Pipeline (View on WorkflowHub)

  2. Run the workflow with the following parameters:

This workflow produces a handful of outputs: the featureCounts results, and a MultiQC report. Looking at the report we see generally reasonable quality data.

limma

Now we’re ready to analyse the counts files. Here we’ll take the feature counts dataset collection and merge it into one count matrix through the use of “Column join”. This can then be annotated with the human readable names of the genes. This is all passed to limma for differential expression analysis.

With this result in hand, we’re ready to do two further steps: preparing the dataset for goseq, and for analysis in the MINERVA Platform. Goseq is a tool for gene ontology enrichment analysis, and the MINERVA Platform is a tool for visualising pathway analysis.

flowchart TD
  0["ℹ️ Input Collection\nfeatureCounts: Counts"];
  style 0 stroke:#2c3143,stroke-width:4px;
  1["ℹ️ Input Dataset\nfactordata"];
  style 1 stroke:#2c3143,stroke-width:4px;
  2["ℹ️ Input Collection\nfeatureCounts: Lengths"];
  style 2 stroke:#2c3143,stroke-width:4px;
  3["Column join"];
  0 -->|output| 3;
  4["Extract dataset"];
  2 -->|output| 4;
  5["countdata"];
  3 -->|tabular_output| 5;
  e8dcfaaf-0d8d-46eb-8cc9-ce2fbdfc45e5["Output\ncount_data"];
  5 --> e8dcfaaf-0d8d-46eb-8cc9-ce2fbdfc45e5;
  style e8dcfaaf-0d8d-46eb-8cc9-ce2fbdfc45e5 stroke:#2c3143,stroke-width:4px;
  6["annodata"];
  3 -->|tabular_output| 6;
  7["Replace Text"];
  5 -->|outfile| 7;
  8["limma DEG analysis"];
  6 -->|out_tab| 8;
  7 -->|outfile| 8;
  1 -->|output| 8;
  1c8d7950-1003-4c86-9986-c6ed1264cd8b["Output\nlimma_report"];
  8 --> 1c8d7950-1003-4c86-9986-c6ed1264cd8b;
  style 1c8d7950-1003-4c86-9986-c6ed1264cd8b stroke:#2c3143,stroke-width:4px;
  9["Extract dataset"];
  8 -->|outTables| 9;
  10["MINERVA Formatting"];
  9 -->|output| 10;
  9ce4f48b-e742-4b7c-9bde-5d51f11068d3["Output\nminerva_table"];
  10 --> 9ce4f48b-e742-4b7c-9bde-5d51f11068d3;
  style 9ce4f48b-e742-4b7c-9bde-5d51f11068d3 stroke:#2c3143,stroke-width:4px;
  11["Join two Datasets"];
  9 -->|output| 11;
  4 -->|output| 11;
  12["Compute"];
  11 -->|out_file1| 12;
  13["Cut"];
  12 -->|out_file1| 13;
  14["Unique"];
  13 -->|out_file1| 14;
  15["Cut"];
  14 -->|outfile| 15;
  16["Cut"];
  14 -->|outfile| 16;
  17["goseq"];
  15 -->|out_file1| 17;
  16 -->|out_file1| 17;
Workflow 2: Analysis Workflow for mRNA-Seq BY-COVID Pipeline.
Hands-on: Analyse the Counts
  1. Run the workflow with the Factor Data from the first Hands-on, and the datasets from the workflow or Zenodo download, depending on your path:

    Launch mRNA-Seq BY-COVID Pipeline (View on WorkflowHub)

You should have a few outputs, namely the goseq outputs, and a table ready for visualisation in the MINERVA Platform!

The MINERVA Platform

The dataset prepared for the MINERVA Platform must be correctly formatted as a tabular dataset (\t separated values) like the following, with the dbkey set to hg19 or hg38. If you’ve run the above workflow, this should be the case.

SYMBOL  logFC              P.Value               adj.P.Val
TRIM25  2.07376444684004   1.2610025125617e-18   3.57368112059986e-15
ACSL1   2.90647033200259   2.71976234791064e-16  3.85390324698937e-13
NBEAL2  2.45952426389725   2.71787290816654e-14  2.56748394058132e-11
MIR150  -2.55304226607428  9.55912390273625e-14  6.74866152827879e-11
SLC2A3  2.95861349227708   1.19066011437523e-13  6.74866152827879e-11

The tabular dataset, as prepared above is then used by a dedicated MINERVA plugin (Hoksza et al. 2019) to visualise the data on-the-fly in the COVID-19 Disease Map. To visualise and explore the data, follow these steps:

Hands-on: Visualise in MINERVA
  1. Click to expand the final “MINERVA-Ready Table”

  2. Click on the galaxy-barchart (Visualize) icon

    A galaxy dataset with the barchart icon circled in red.

  3. Select “display at Minerva (SARS-CoV-2 Minerva Map)”

    The galaxy visualisation interface with a big display at minerva banner at the top.

    The MINERVA visualisation is only for correctly formatted files with the correct genome (i.e. human, hg19). If you dont’ see MINERVA listed, first check that your dataset is:

    1. recognised as a tabular dataset

      • Click on the galaxy-pencil pencil icon for the dataset to edit its attributes
      • In the central panel, click galaxy-chart-select-data Datatypes tab on the top
      • In the galaxy-chart-select-data Assign Datatype, select tabular from “New type” dropdown
        • Tip: you can start typing the datatype into the field to filter the dropdown menu
      • Click the Save button

    2. Has the correct genome build:

      • Click the desired dataset’s name to expand it.
      • Click on the “?” next to database indicator:

        UI for changing dbkey

      • In the central panel, change the Database/Build field
      • Select your desired database key from the dropdown list: hg19
      • Click the Save button

      It should be specifically hg19 not a patch like hg19Patch5

    If that still doesn’t work, please check that the Galaxy server you are using is updated to 24.0 or later.

Analysis in the MINERVA Platform

Welcome to the MINERVA Covid-19 Disease Map! It has a similar interface to Galaxy, there is an interaction menu on the left, the main area is where you’ll do your investigation, and on the right are your datasets! In this case, the differentially expressed genes analysed above automatically loaded from Galaxy when you clicked “Display at MINERVA”.

MINERVA main view, a search menu on the left, a view of the disease map in the center showing a colourful cell diagram with signalling and T-Cells and CD8+ and more. On the right is a filtering box for the dataset from Galaxy allowing the user to filter by p-value and FC thresholds. 397 genes have been loaded.

After the loading time, marked as “Reading Map Elements”, the dataset will be visible in the right panel of the COVID-19 Disease Map, with the four corresponding columns specified earlier (see image below). The MINERVA-Galaxy plugin allows you to:

  • filter the data table by fold-change (FC) threshold or by p-value (default: adjusted p-value, threshold set to 0.05)
  • Search for specific gene symbols to display (“Search” box)
  • Select specific differential expression values to display in the map (checkboxes in the data tab)
  • Select all entries in the data table for visualisation (Select All)
  • Reset the visualisation

The general process of data exploration looks like:

  1. In the main map, find pathways with matching entries indicated by blue pins.
  2. After selecting what you want to see, browse the COVID-19 Disease Map to explore the pathways with the corresponding expression pattern.
  3. Select a pathway of your choice and in the left panel click the “Associated submap” button
  4. Explore the expression patterns in the diagram that will be displayed.
Hands-on: Explore TLR pathways
  1. Use the Search box above the table on the right to search for TLR.
  2. Select all four TLR genes.

    • TLR3
    • TLR4
    • TLR7
    • TLR8
  3. In the main map, find PAMP Signalling and click on it. (Note: don’t click the blue pin, click the pathway name)

    The PAMP signalling box next to the golgi body. A blue map pin is placed upon the red and whitebox.

  4. In the left panel, click the Associated submap: PAMP Signalling button.
  5. Explore the detailed diagram to examine the expression pattern.

    Question

    What is the expression pattern of TLR3, TLR7, and TLR8 in the PAMP Signalling pathway?

    TLR3 and TLR7 are downregulated (cool/blue colour), TLR8 is upregulated (warm/red colour). image of the aforementioned regulation patterns.

  6. Without closing the PAMP Signalling submap, click “Select all” to visualise the entire data table

    Question

    What is the expression pattern downstream of TLR7 and TLR8, namely how are MYD88 and IRAK4 regulated?

    MYD88 and IRAK4 are strongly and weakly upregulated, respectively, despite TLR7 downregulation. image of the aforementioned regulation patterns.

For further analysis in the MINERVA Platform, a full user guide is available