metaQuantome 3: Taxonomy

Overview
Creative Commons License: CC-BY
Questions:
  • How do I look at differentially expressed taxa?

  • Can we get top 5 or 10 taxa present in the sample?

  • How can I visualize my results?

Objectives:
  • Perform a taxonomic and quantitative analysis of metaproteomic mass spectrometry data.

Requirements:
Time estimation: 1 hour
Level: Intermediate
Supporting Materials:
Published: Oct 29, 2020
Last modification: Mar 14, 2024
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT
PURL: https://gxy.io/GTN:T00224
Revision: 36

The metaQuantome software suite (Easterly et al. 2019) was developed by the Galaxy-P team for quantitative and statistical analysis of metaproteomics data. For taxonomic and functional expression analysis within the microbial community, metaQuantome leverages peptide-level quantitative information to generate visual outputs for data interpretation. It also generates outputs that help in understanding the taxonomic contribution to a selected function, as well as the functions expressed by a selected taxonomic group.

In this tutorial, we will learn specifically about the metaQuantome Taxonomy workflow. In particular, we will learn about how peptide-level quantitation and associated taxonomic information can be used to generate bar plots (for taxonomic composition), volcano plots (to detect differentially expressed taxa) and heatmap cluster analysis.

To demonstrate the use of this workflow, we have used a thermophilic biogas reactor dataset wherein municipal food waste and manure are digested to generate methane gas (Delogu et al. 2020). After one round in the reactor, the microbial community was simplified and enriched via serial dilution. This inoculum was then transferred to a solution of cellulose from Norwegian Spruce and incubated at 65°C. Triplicate samples were taken in a time series from 0 to 43 hours after inoculation, and mass spectrometry data were acquired on a Q-Exactive (Thermo) mass spectrometer. For this training, we have chosen three time points (8 hours, 18 hours and 33 hours) from this dataset.

MetaQuantome-taxonomy-workflow.

Agenda

In this tutorial, we will cover:

  1. Pretreatments
    1. Get data
  2. Download metaQuantome Databases
  3. Create metaQuantome sample file
  4. Run metaQuantome
    1. metaQuantome: expand
    2. metaQuantome: filter
    3. metaQuantome: stat
  5. Visualize your Data
    1. metaQuantome: visualize
    2. metaQuantome: visualize Volcano Plots
  6. Conclusion

Pretreatments

The first step in this tutorial is to get the data from the Zenodo link provided and make sure that it is in the correct format.

Get data

Hands-on: Data upload
  1. Create a new history for this tutorial and give it a meaningful name.

    Click the new-history icon at the top of the history panel:

    1. Click on galaxy-pencil (Edit) next to the history name (which by default is “Unnamed history”)
    2. Type the new name
    3. Click on Save

    If you do not have the galaxy-pencil (Edit) next to the history name:

    1. Click on Unnamed history (or the current name of the history) (Click to rename history) at the top of your history panel
    2. Type the new name
    3. Press Enter

  2. Import the files from Zenodo: a Taxonomy file and an Intensity file.

    https://zenodo.org/record/4110725/files/Intensity-File.tabular
    https://zenodo.org/record/4110725/files/Taxonomy-File.tabular
    
    
    • Copy the link location
    • Click galaxy-upload Upload Data at the top of the tool panel

    • Select galaxy-wf-edit Paste/Fetch Data
    • Paste the link(s) into the text field

    • Press Start

    • Close the window

    Alternatively, import the files from the shared data library (GTN - Material -> proteomics -> metaQuantome 3: Taxonomy)

    As an alternative to uploading the data from a URL or your computer, the files may also have been made available from a shared data library:

    1. Go into Shared data (top panel) then Data libraries
    2. Navigate to the correct folder as indicated by your instructor.
      • On most Galaxies, tutorial data will be provided in a folder named GTN - Material -> Topic Name -> Tutorial Name.
    3. Select the desired files
    4. Click on Add to History galaxy-dropdown near the top and select as Datasets from the dropdown menu
    5. In the pop-up window, choose

      • “Select history”: the history you want to import the data to (or create a new one)
    6. Click on Import
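
If you want to inspect the two input files outside Galaxy, they can also be fetched with a short script. The following is only a sketch: the helper names `local_name` and `fetch_all` are made up for illustration, and the download step needs network access. Within the tutorial itself, the Galaxy upload described above remains the intended route.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

# URLs copied from the hands-on step above.
ZENODO_URLS = [
    "https://zenodo.org/record/4110725/files/Intensity-File.tabular",
    "https://zenodo.org/record/4110725/files/Taxonomy-File.tabular",
]


def local_name(url):
    """Return the file name at the end of a URL path."""
    return Path(urlparse(url).path).name


def fetch_all(urls, dest="."):
    """Download every URL into `dest` and return the local paths.

    Hypothetical helper for illustration; it is not part of metaQuantome or Galaxy.
    """
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for url in urls:
        target = dest_dir / local_name(url)
        urlretrieve(url, str(target))  # requires network access
        paths.append(target)
    return paths


# Show which local files a download would create, without downloading anything.
print([local_name(url) for url in ZENODO_URLS])
```

Calling `fetch_all(ZENODO_URLS, "data")` would then place both files in a `data` folder.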

Download metaQuantome Databases

Hands-on: Run metaQuantome databases
  1. metaQuantome: databases ( Galaxy version 2.0.0-0) with the following parameters:
    • param-select “Databases to Download”: Select All (NCBI Taxonomy Database, Gene Ontology Database and Enzyme Commission (EC) database)
    Comment

    metaQuantome uses freely available bioinformatic databases to expand your set of direct annotations.

    The databases are:

    1. NCBI taxonomy database. It consists of a list of all currently identified taxa and the relationships between them.

    2. Gene Ontology (GO) term database. metaQuantome uses the go-basic.obo file and the metagenomics slim GO (a subset of the full GO). More details are available at the GO term website.

    3. ENZYME database with Enzyme Commission (EC) numbers. This database classifies enzymes and organizes the relationships between them.

Question
  1. Why is it necessary to download metaQuantome databases?
  2. Can these databases be downloaded from other sources?
  1. The metaQuantome databases help the metaQuantome: expand tool relate the information you have provided in your input files to the GO term, taxonomy or EC databases and the relationships they describe.
  2. For metaQuantome, we use freely available databases. This module downloads the most recent releases of the specified databases and stores them in a single file, which can then be accessed by the rest of the metaQuantome modules. For reference, the taxonomy database is the largest (~500 MB), while the GO and EC databases are smaller: ~34 MB and ~10 MB, respectively. Also, note that the databases are stored in the history, so the date of download can be referenced later. Thus, the databases will not be modified, except for the NCBI database.

Create metaQuantome sample file

The create samples file module generates the samples file, an input file for the metaQuantome workflow. This file specifies the column names that belong to each experimental group. These column names are referenced when handling the input data and performing statistical analysis.

Hands-on: Create an experimental Design file for sorting samples
  1. metaQuantome: create samples file ( Galaxy version 2.0.0-0) with the following parameters:
    • param-select “Sample file creation method”: Manually specify experimental conditions and samples
    • In “Insert Samples”:
      • “1.Samples”:
        • “Group Name”: T2
          • “Column”: T2_A1,T2_B1
      • “2.Samples”:
        • “Group Name”: T4
          • “Column”: T4A_1,T4B_1
      • “3.Samples”:
        • “Group Name”: T7
          • “Column”: T7A_1,T7B_1
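
To make the purpose of the samples file concrete, the sketch below writes the same three groups as tab-separated text, with one row per experimental group and the intensity column names joined by commas. The header names `group` and `colnames` are an assumption about the file layout; the file produced by the Galaxy tool is the authoritative version.

```python
import csv
import io

# Group names and intensity column names from the hands-on step above.
GROUPS = {
    "T2": ["T2_A1", "T2_B1"],
    "T4": ["T4A_1", "T4B_1"],
    "T7": ["T7A_1", "T7B_1"],
}


def samples_table(groups):
    """Render groups as tab-separated text, one row per experimental group."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["group", "colnames"])  # assumed header names
    for name, columns in groups.items():
        writer.writerow([name, ",".join(columns)])
    return buffer.getvalue()


print(samples_table(GROUPS))
```

Each column name listed here must match a column header in the Intensity file exactly.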

Run metaQuantome

metaQuantome: expand

The expand module is the first analysis step in the metaQuantome workflow, and can be run to analyze differentially expressed taxa in the samples. In taxonomy mode, the following inputs are required in addition to the metaQuantome databases and the samples file: a tab-separated taxonomy annotation file with a peptide column and a taxonomy annotation column, and an intensity file with a peptide column. The taxonomic annotations should be the lowest common ancestor (LCA) for each peptide, preferably given as NCBI taxonomy IDs. The following column names must also be provided:

  • The name of the peptide column in the taxonomic annotation file (peptide)
  • The name of the taxonomy annotation column in the taxonomy annotation file (taxon_id)
  • The name of the peptide column in the Intensity file (peptide)
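
Before running the tool, it can help to confirm that both input files really contain the column names listed above. The following is a minimal sketch using only the Python standard library; the helper `missing_columns` and the two tiny example tables are invented for illustration and do not come from the tutorial dataset.

```python
import csv
import io


def missing_columns(tabular_text, required):
    """Return the required column names that are absent from the header row."""
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    header = next(reader)
    return [name for name in required if name not in header]


# Tiny made-up stand-ins for the real Taxonomy and Intensity files.
taxonomy_text = "peptide\ttaxon_id\nAAAK\t1224\n"
intensity_text = "peptide\tT2_A1\tT2_B1\nAAAK\t100.0\t120.0\n"

print(missing_columns(taxonomy_text, ["peptide", "taxon_id"]))  # []
print(missing_columns(intensity_text, ["peptide", "T2_A1", "T4A_1"]))  # ['T4A_1']
```

An empty list means every required column is present; any name that is returned needs to be fixed in the file or in the tool parameters.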
Hands-on: Run metaQuantome expand
  1. metaQuantome: expand ( Galaxy version 2.0.0-0) with the following parameters:
    • param-file “Database Archive File”: metaQuantome databases (output of metaQuantome: databases tool)
    • param-file “Samples file”: metaQuantome: create samples_file (output of metaQuantome: create samples file tool)
    • “Mode”: Taxonomic analysis
      • param-file “Taxonomy file”: Taxonomy File (Input dataset)
      • “Taxonomy file: peptide column name”: peptide
      • “Taxonomy column name”: taxon_id
    • param-file “Intensity file”: Intensity File (Input dataset)
      • “Intensity file: peptide column name”: peptide
  2. Rename galaxy-pencil the output file to metaQuantome expand.

    Comment

    The structure of the output file depends on the analysis mode and the experimental design, but the columns generally look like this, with one row for each term:

    • term id: for example, term1
    • info about term (one or more columns): name, rank, etc.
    • mean term intensity (by sample group): note that this is the log2 of the mean intensity
    • term intensity (by sample): this is the log2 of term intensity in each sample. Missing data is coded as NA.
    • number of unique peptides (by sample): integer. 0 is coded as NA
    • number of sample children in each sample: integer. 0 is coded as NA
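
To make the idea of expanding annotations more concrete, the sketch below sums peptide intensities up a toy lineage and reports the log2 of each term's total. It is a simplified illustration and not metaQuantome's implementation: the `PARENT` map, the taxon names and the intensities are invented, whereas the real tool derives the relationships from the NCBI taxonomy database.

```python
import math
from collections import defaultdict

# Toy child -> parent relationships (invented for illustration).
PARENT = {
    "genus_A": "family_X",
    "genus_B": "family_X",
    "family_X": "root",
}

# (LCA annotation, intensity in one sample) for each peptide; values are made up.
PEPTIDES = [
    ("genus_A", 1000.0),
    ("genus_A", 3000.0),
    ("genus_B", 2000.0),
    ("family_X", 2000.0),
]


def lineage(term):
    """Yield the term itself followed by all of its ancestors."""
    while term is not None:
        yield term
        term = PARENT.get(term)


def expand(peptides):
    """Credit each peptide's intensity to its annotation and every ancestor."""
    totals = defaultdict(float)
    for term, intensity in peptides:
        for node in lineage(term):
            totals[node] += intensity
    # Report intensities on the log2 scale, as in the output table.
    return {term: math.log2(total) for term, total in totals.items()}


for term, value in sorted(expand(PEPTIDES).items()):
    print(f"{term}\t{value:.2f}")
```

In this toy example, `family_X` receives the intensity of its own peptide plus that of every peptide assigned to `genus_A` and `genus_B`, which is why higher ranks never have a lower intensity than their children.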
               

metaQuantome: filter

The filter module is the second step in the metaQuantome workflow. It restricts the expanded terms to those that are representative of the data, according to the parameters the user has specified.

Hands-on: Filtering the expanded data
  1. metaQuantome: filter ( Galaxy version 2.0.0-0) with the following parameters:
    • “Mode”: Taxonomic analysis
    • param-file “Samples file”: metaQuantome:create samples_file (output of metaQuantome: create samples file tool)
    • param-file “metaquantome expand file”: metaQuantome:expand (output of metaQuantome: expand tool)
    • “min_peptides”: 1
    • “min_pep_nsamp”: 1
    • “min_children_non_leaf”: 2
    • “min_child_nsamp”: 1
    • “qthreshold”: 2
  2. Rename galaxy-pencil the output file to metaQuantome:Filter.

    Comment

    To learn more about the filter module, please read the metaQuantome paper published by Galaxy-P (Easterly et al. 2019).

Question
  1. Can we change the filter parameters?
  1. The parameters can be changed according to the experimental design or depending on the data.
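
The effect of the filter parameters can be sketched for a single term within one sample group. The logic below is a simplified reading of the parameter names used in the hands-on step, not the tool's actual code: it assumes that a term needs enough peptides in enough samples, that leaf terms are kept while non-leaf terms need enough children, and that `qthreshold` is the minimum number of non-missing intensities per group. The function name `passes_group` is hypothetical.

```python
def passes_group(peptide_counts, child_counts, intensities, *,
                 min_peptides=1, min_pep_nsamp=1,
                 min_children_non_leaf=2, min_child_nsamp=1, qthreshold=2):
    """Apply three checks to one term within one sample group.

    Each argument is a list with one entry per sample; None marks missing
    data (the expand output codes a count of 0 as NA).
    """
    # Samples in which the term is supported by enough peptides.
    enough_peptides = sum(
        1 for n in peptide_counts if n is not None and n >= min_peptides
    ) >= min_pep_nsamp
    # Assumption: leaf terms (no children) are kept; others need enough children.
    enough_children = sum(
        1 for n in child_counts
        if n is None or n == 0 or n >= min_children_non_leaf
    ) >= min_child_nsamp
    # Samples in which the term has a quantified intensity.
    enough_values = sum(1 for x in intensities if x is not None) >= qthreshold
    return enough_peptides and enough_children and enough_values


# A leaf term quantified in both samples of the group is kept.
print(passes_group([3, 2], [None, None], [20.1, 19.8]))
# A term with only one quantified sample fails qthreshold=2.
print(passes_group([1, None], [None, None], [18.0, None]))
```

Raising `min_peptides` or `qthreshold` makes the filter stricter, which is one way the parameters can be adapted to a different experimental design.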

metaQuantome: stat

Hands-on: Statistical analysis of the filtered data on multiple conditions.
  1. metaQuantome: stat ( Galaxy version 2.0.0-0) with the following parameters:
    • param-file “Samples file”: metaQuantome:create samples_file (output of metaQuantome: create samples file tool)
    • param-file “metaquantome expanded or filtered file”: metaQuantome:Filter (output of metaQuantome: filter tool)
    • “Mode”: Taxonomic analysis
    • “Name of the control sample group of control (used as denominator for fold change evaluation).”: T2
    • “Statistical test to perform”: standard t-test (parametric)
    • “Perform paired tests”: Yes
  2. Rename galaxy-pencil the output file to metaQuantome:stat.

    Comment

    In this tutorial, we have used the sample group T2 as the control group for fold change evaluation. Users can choose their own control sample group. The initial version of the metaQuantome stat module could only process two conditions at a time. However, the updated version of metaQuantome can perform statistical analysis across multiple conditions.
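
Because the control group is used as the denominator, each fold change compares one group against T2. Since the mean intensities are reported on the log2 scale, the ratio becomes a difference. The sketch below illustrates this with made-up values; the function `add_fold_changes` is hypothetical, and the column naming pattern follows the `log2fc_T4_over_T2` column used later in the visualization step.

```python
def add_fold_changes(row, groups, control):
    """Return a copy of `row` with log2 fold-change columns against `control`.

    `row` maps column names such as "T2_mean" to log2 mean intensities.
    """
    result = dict(row)
    for group in groups:
        if group == control:
            continue
        # On the log2 scale a ratio becomes a difference.
        column = f"log2fc_{group}_over_{control}"
        result[column] = row[f"{group}_mean"] - row[f"{control}_mean"]
    return result


# Made-up log2 mean intensities for a single term.
row = {"T2_mean": 20.0, "T4_mean": 21.5, "T7_mean": 19.0}
print(add_fold_changes(row, ["T2", "T4", "T7"], control="T2"))
```

A positive value means the term is more abundant in the compared group than in the control, and a negative value means it is less abundant.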

Visualize your Data

The outputs of the visualization module of metaQuantome are high-quality, publication-ready visualizations: bar plots for the analysis of a single sample or experimental condition, and differential abundance analysis, volcano plots, heatmaps, and principal components analysis for comparisons between two or more experimental conditions. Here we show two visualizations: the bar plot and the volcano plot. The heatmap and PCA plot for multiple conditions are under development. The visualization tool has two outputs: an HTML file (figure) and a tabular output containing the plot data.

metaQuantome: visualize

Hands-on: Bar chart visualization of Taxonomy in T2 sample.
  1. metaQuantome: visualize ( Galaxy version 2.0.0-0) with the following parameters:
    • param-file “Tabular file from metaQuantome stats or metaQuantome filter”: metaQuantome:stat (output of metaQuantome: stat tool)
    • param-file “Samples file”: metaQuantome:create samples_file (output of metaQuantome: create samples file tool)
    • “Mode”: Taxonomic analysis
      • “Plot type”: Bar Chart (bar)
        • “Taxonomic rank to restrict to in the plot”: genus
        • “Mean intensity column name”: T2_mean
    Comment

    The bar plot can be run for T2, T4 and T7 to see the top 5 or 10 taxa present in each sample group.


Figure 1: Bar chart visualization of the top 5 genera present in the T2 sample.
Question
  1. Can we select any color for the bar chart?
  2. Can we increase the number of terms to display?
  1. Yes, the available colors are blue, orange, yellow, violet, black and red.
  2. Yes, the user can choose the number of terms to display; however, the dimensions of the image have to be adjusted accordingly when more terms are shown.
Hands-on: Bar chart visualization of Taxonomy in T7 sample.
  1. metaQuantome: visualize ( Galaxy version 2.0.0-0) with the following parameters:
    • param-file “Tabular file from metaQuantome stats or metaQuantome filter”: metaQuantome:stat (output of metaQuantome: stat tool)
    • param-file “Samples file”: metaQuantome:create samples_file (output of metaQuantome: create samples file tool)
    • “Mode”: Taxonomic analysis
      • “Plot type”: Bar Chart (bar)
        • “Taxonomic rank to restrict to in the plot”: genus
        • “Mean intensity column name”: T7_mean

Figure 2: Bar chart visualization of the top 5 genera present in the T7 sample.
Hands-on: Bar chart visualization of Taxonomy in T4 sample.
  1. metaQuantome: visualize ( Galaxy version 2.0.0-0) with the following parameters:
    • param-file “Tabular file from metaQuantome stats or metaQuantome filter”: metaQuantome:stat (output of metaQuantome: stat tool)
    • param-file “Samples file”: metaQuantome:create samples_file (output of metaQuantome: create samples file tool)
    • “Mode”: Taxonomic analysis
      • “Plot type”: Bar Chart (bar)
        • “Taxonomic rank to restrict to in the plot”: genus
        • “Mean intensity column name”: T4_mean

Figure 3: Bar chart visualization of the top 5 genera present in the T4 sample.
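
The selection behind these bar charts can be reproduced from the tabular output: restrict the terms to one taxonomic rank, sort by the chosen mean intensity column, and keep the first few. The sketch below does this on a made-up excerpt; the column names `name` and `rank` are assumptions about the stat output, and `top_terms` is a hypothetical helper.

```python
import csv
import io

# Made-up excerpt of a stat output; the column names `name` and `rank` are assumed.
STAT_TEXT = (
    "id\tname\trank\tT2_mean\n"
    "1\tgenus_A\tgenus\t22.5\n"
    "2\tfamily_X\tfamily\t24.0\n"
    "3\tgenus_B\tgenus\t21.0\n"
    "4\tgenus_C\tgenus\t23.1\n"
)


def top_terms(tabular_text, rank, mean_column, n):
    """Return the names of the `n` terms of a given rank with the highest mean."""
    rows = csv.DictReader(io.StringIO(tabular_text), delimiter="\t")
    # Skip other ranks and terms whose mean is missing (coded as NA).
    selected = [r for r in rows if r["rank"] == rank and r[mean_column] != "NA"]
    selected.sort(key=lambda r: float(r[mean_column]), reverse=True)
    return [r["name"] for r in selected[:n]]


print(top_terms(STAT_TEXT, "genus", "T2_mean", 2))  # ['genus_C', 'genus_A']
```

Switching `mean_column` to `T4_mean` or `T7_mean` would give the ranking for the other two time points, provided those columns exist in the table.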

metaQuantome: visualize Volcano Plots

Hands-on: Volcano Plot visualization of the data T4 and T2.
  1. metaQuantome: visualize ( Galaxy version 2.0.0-0) with the following parameters:
    • param-file “Tabular file from metaQuantome stats or metaQuantome filter”: metaQuantome:stat (output of metaQuantome: stat tool)
    • param-file “Samples file”: metaQuantome:create samples_file (output of metaQuantome: create samples file tool)
    • “Mode”: Taxonomic analysis
      • “Plot type”: Volcano Plot (volcano)
        • “Name of the fold change column in the stat dataframe”: log2fc_T4_over_T2
        • “Name of the Corrected p-value column in the stat dataframe”: corrected_p_T4_over_T2
      • param-select “Flip the fold change (i.e., multiply log fold change by -1)”: No
    Comment

    Here, we show metaQuantome’s taxonomy differential abundance volcano plot. The user may select the significance threshold (0.05 by default), and terms with statistically significant fold changes are colored green and labeled. However, we can currently only compare two conditions at a time. Here we have compared the T4 and T2 time points. The user can also compare T7 and T2.


Figure 4: Volcano plot for differentially expressed taxa comparing sample T4 and T2.
Hands-on: Volcano Plot visualization of the data T7 and T2.
  1. metaQuantome: visualize ( Galaxy version 2.0.0-0) with the following parameters:
    • param-file “Tabular file from metaQuantome stats or metaQuantome filter”: metaQuantome:stat (output of metaQuantome: stat tool)
    • param-file “Samples file”: metaQuantome:create samples_file (output of metaQuantome: create samples file tool)
    • “Mode”: Taxonomic analysis
      • “Plot type”: Volcano Plot (volcano)
        • “Name of the fold change column in the stat dataframe”: log2fc_T7_over_T2
        • “Name of the Corrected p-value column in the stat dataframe”: corrected_p_T7_over_T2

Figure 5: Volcano plot for differentially expressed taxa comparing sample T7 and T2.
Question
  1. Can we choose to Flip the fold change?
  1. Certainly, the user has the ability to flip the fold changes.
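
The two choices discussed for the volcano plot, the significance threshold and the option to flip the fold change, can be sketched as follows. The rows are made up, the `name` column is an assumption, and `classify` is a hypothetical helper rather than part of metaQuantome; the fold change and corrected p-value column names are the ones used in the hands-on step.

```python
def classify(rows, fc_column, p_column, threshold=0.05, flip=False):
    """Return (name, fold change, significant?) tuples for a volcano plot."""
    points = []
    for row in rows:
        # Flipping multiplies the log fold change by -1.
        fold_change = -row[fc_column] if flip else row[fc_column]
        points.append((row["name"], fold_change, row[p_column] < threshold))
    return points


# Made-up stat results for two terms.
rows = [
    {"name": "genus_A", "log2fc_T4_over_T2": 1.5, "corrected_p_T4_over_T2": 0.01},
    {"name": "genus_B", "log2fc_T4_over_T2": -0.2, "corrected_p_T4_over_T2": 0.60},
]

for point in classify(rows, "log2fc_T4_over_T2", "corrected_p_T4_over_T2"):
    print(point)
```

Flipping changes only the sign of the fold change, so a term that is significant before flipping remains significant afterwards; it simply appears on the opposite side of the plot.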

Conclusion

This completes the walkthrough of the metaQuantome taxonomy workflow. This tutorial is a guide to running the metaQuantome modules and can be used for metaproteomics research. We have incorporated only two visualization modules in this workflow, but we will also make the heatmap and PCA plot available. Researchers can use this workflow with their own data; please note that the tool parameters and the workflow will need to be modified accordingly. Also, refer to the metaQuantome data creation workflow to understand how to make files metaQuantome-compatible.

This workflow was developed by the Galaxy-P team at the University of Minnesota. For more information about Galaxy-P or our ongoing work, please visit us at galaxyp.org