Clinical Metaproteomics 4: Quantitation

Overview
Questions:
  • How do we perform quantitation?

Objectives:
  • Perform quantitation using MaxQuant and extract microbial and human proteins and peptides.

Requirements:
Time estimation: 3 hours
Supporting Materials:
Published: Dec 18, 2024
Last modification: Dec 18, 2024
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT
PURL: https://gxy.io/GTN:T00463
Revision: 0

Introduction

The next step of the clinical metaproteomics workflow is quantification. Quantitative proteomics measures and compares the abundance of proteins or peptides across biological samples, which underpins biomarker discovery, comparative analyses, and differential expression studies. Quantitative data can also shed light on the functional roles of proteins, the stoichiometry of protein complexes, and, in pharmacological studies, the effects of drugs on protein expression. In addition, quantification serves as a quality-control step: it helps validate the initial protein identifications and provides the basis for normalizing the data. Quantitative data are essential for hypothesis testing and systems biology, and they are clinically relevant for disease diagnosis, prognosis, and therapeutic decision-making. In short, quantitation is a cornerstone for understanding protein expression and regulation across a wide range of biological and clinical applications.

In this workflow, we perform quantification with MaxQuant; its outputs will be interpreted in the next module.

Quantitation workflow.

Agenda

In this tutorial, we will cover:

  1. Introduction
    1. Get data
  2. Import Workflow
  3. Peptide quantification
    1. Using Text Manipulation Tools to Manage MaxQuant Outputs
    2. Generating a list of quantified proteins and peptides
  4. Conclusion

Get data

Hands-on: Data Upload
  1. Create a new history for this tutorial
  2. Import the files from Zenodo or from the shared data library (GTN - Material -> microbiome -> Clinical Metaproteomics 4: Quantitation):

    https://zenodo.org/records/10105821/files/PTRC_Skubitz_Plex2_F10_9Aug19_Rage_Rep-19-06-08.raw
    https://zenodo.org/records/10105821/files/PTRC_Skubitz_Plex2_F11_9Aug19_Rage_Rep-19-06-08.raw
    https://zenodo.org/records/10105821/files/PTRC_Skubitz_Plex2_F13_9Aug19_Rage_Rep-19-06-08.raw
    https://zenodo.org/records/10105821/files/PTRC_Skubitz_Plex2_F15_9Aug19_Rage_Rep-19-06-08.raw
    https://zenodo.org/records/10105821/files/Experimental-Design_Discovery_MaxQuant.tabular
    https://zenodo.org/records/10105821/files/Quantitation_Database_for_MaxQuant.fasta
    
    • Copy the link location
    • Click Upload Data at the top of the tool panel

    • Select Paste/Fetch Data
    • Paste the link(s) into the text field

    • Press Start

    • Close the window

    As an alternative to uploading the data from a URL or your computer, the files may also have been made available from a shared data library:

    1. Go into Data (top panel) then Data libraries
    2. Navigate to the correct folder as indicated by your instructor.
      • On most Galaxies, tutorial data will be provided in a folder named GTN - Material -> Topic Name -> Tutorial Name.
    3. Select the desired files
    4. Click on Add to History near the top and select as Datasets from the dropdown menu
    5. In the pop-up window, choose

      • “Select history”: the history you want to import the data to (or create a new one)
    6. Click on Import

  3. Rename the datasets
  4. Check that the datatypes are correct

    • Click on the pencil icon for the dataset to edit its attributes
    • In the central panel, click the Datatypes tab at the top
    • In the Assign Datatype section, select the datatype from the “New type” dropdown
      • Tip: you can start typing the datatype into the field to filter the dropdown menu
    • Click the Save button

  5. Add a tag to each dataset corresponding to its input file.
  6. Create a dataset collection of the RAW files.

    Datasets can be tagged. This simplifies the tracking of datasets across the Galaxy interface. Tags can contain any combination of letters or numbers but cannot contain spaces.

    To tag a dataset:

    1. Click on the dataset to expand it
    2. Click on Add Tags
    3. Add tag text. Tags starting with # will be automatically propagated to the outputs of tools using this dataset (see below).
    4. Press Enter
    5. Check that the tag appears below the dataset name

    Tags beginning with # are special!

    They are called Name tags. The unique feature of these tags is that they propagate: if a dataset is labelled with a name tag, all derivatives (children) of this dataset will automatically inherit this tag (see below). The figure below explains why this is so useful. Consider the following analysis (numbers in parentheses correspond to dataset numbers in the figure below):

    1. a set of forward and reverse reads (datasets 1 and 2) is mapped against a reference using Bowtie2 generating dataset 3;
    2. dataset 3 is used to calculate read coverage using BedTools Genome Coverage separately for + and - strands. This generates two datasets (4 and 5 for plus and minus, respectively);
    3. datasets 4 and 5 are used as inputs to Macs2 broadCall, generating datasets 6 and 8;
    4. datasets 6 and 8 are intersected with coordinates of genes (dataset 9) using BedTools Intersect generating datasets 10 and 11.

    A history without name tags versus history with name tags

    Now consider that this analysis is done without name tags. This is shown on the left side of the figure. It is hard to trace which datasets contain “plus” data versus “minus” data. For example, does dataset 10 contain “plus” data or “minus” data? Probably “minus”, but are you sure? In a small history like the one shown here, it is possible to trace this manually, but as a history grows it becomes very challenging.

    The right side of the figure shows exactly the same analysis, but using name tags. When the analysis was conducted, datasets 4 and 5 were tagged with #plus and #minus, respectively. When they were used as inputs to Macs2, the resulting datasets 6 and 8 automatically inherited these tags, and so on. As a result, it is straightforward to trace both branches (plus and minus) of this analysis.

    More information is in a dedicated #nametag tutorial.
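The propagation rule described above can be modelled as a walk over the dataset dependency graph. The Python sketch below is an illustrative model, not Galaxy's implementation. It assumes that every dataset records its input datasets, that a derived dataset inherits the name tags (tags starting with #) of all its inputs while ordinary tags are not inherited, and that higher dataset numbers were created later. The graph mirrors the example analysis; which of datasets 10 and 11 derives from dataset 6 or 8, and the plain tag "bowtie2", are assumptions made for illustration.

```python
from typing import Dict, List, Set

def propagate_name_tags(
    parents: Dict[int, List[int]],
    own_tags: Dict[int, Set[str]],
) -> Dict[int, Set[str]]:
    """Return the effective tags of every dataset in a toy history."""
    effective: Dict[int, Set[str]] = {}
    # Assumption: higher dataset numbers were created later, so inputs come first.
    for ds in sorted(parents):
        tags = set(own_tags.get(ds, set()))
        for parent in parents[ds]:
            # Only name tags (leading '#') are inherited from input datasets.
            tags |= {t for t in effective[parent] if t.startswith("#")}
        effective[ds] = tags
    return effective

# Example analysis: dataset -> list of its input datasets.
parents = {
    1: [], 2: [], 9: [],       # uploaded reads (1, 2) and gene coordinates (9)
    3: [1, 2],                 # Bowtie2 mapping
    4: [3], 5: [3],            # coverage on + and - strands
    6: [4], 8: [5],            # Macs2 broadCall
    10: [6, 9], 11: [8, 9],    # BedTools Intersect (pairing assumed)
}
own_tags = {3: {"bowtie2"}, 4: {"#plus"}, 5: {"#minus"}}

tags = propagate_name_tags(parents, own_tags)
print(tags[10], tags[11])  # {'#plus'} {'#minus'}
```

Because the tags travel with the data, the answer to "plus or minus?" can be read directly from each dataset instead of being reconstructed from the history, and the ordinary tag on dataset 3 stays where it was set.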

Import Workflow

Hands-on: Running the Workflow
  1. Import the workflow into Galaxy:

    Hands-on: Importing and launching a GTN workflow
    Launch the Quantitation Workflow.
    • Click on Workflow on the top menu bar of Galaxy. You will see a list of all your workflows.
    • Click on Import at the top-right of the screen
    • Paste the following URL into the box labelled “Archived Workflow URL”: https://training.galaxyproject.org/training-material/topics/proteomics/tutorials/clinical-mp-4-quantitation/workflows/WF4_Quantitation_Workflow.ga
    • Click the Import workflow button

    Below is a short video demonstrating how to import a workflow from GitHub using this procedure:

    Video: Importing a workflow from URL

  2. Run the workflow using the following parameters:

    • “Send results to a new history”: No
    • “Quantitation_Database-For-MaxQuant”: Quantitation_Database_for_MaxQuant.fasta
    • “Experimental-Design Discovery MaxQuant”: Experimental-Design_Discovery_MaxQuant.tabular
    • “Input Raw-files”: RAW dataset collection
    • Click on Workflow on the top menu bar of Galaxy. You will see a list of all your workflows.
    • Click on the Run workflow button next to your workflow
    • Configure the workflow as needed
    • Click the Run Workflow button at the top-right of the screen
    • You may have to refresh your history to see the queued jobs
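The three workflow inputs above correspond to input steps in the workflow file, a .ga file that is plain JSON. As a minimal sketch, the Python function below lists those input steps from a .ga file you have downloaded yourself. It relies only on the "steps", "type" and "label" keys of the format; the embedded example is a hypothetical, heavily trimmed stand-in for WF4_Quantitation_Workflow.ga, and the real file's labels and tool IDs may differ.

```python
import json
from typing import Any, Dict, List, Tuple

# Galaxy step types that represent workflow inputs.
INPUT_TYPES = {"data_input", "data_collection_input", "parameter_input"}

def list_workflow_inputs(ga: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (label, step type) for every input step, in step order."""
    steps = ga.get("steps", {})
    inputs: List[Tuple[str, str]] = []
    for key in sorted(steps, key=int):  # step keys are strings: "0", "1", ...
        step = steps[key]
        if step.get("type") in INPUT_TYPES:
            inputs.append((step.get("label") or f"step {key}", step["type"]))
    return inputs

# Hypothetical, trimmed stand-in. For the real file:
#   with open("WF4_Quantitation_Workflow.ga") as fh: ga = json.load(fh)
example_ga = json.loads("""
{
  "a_galaxy_workflow": "true",
  "steps": {
    "0": {"type": "data_input", "label": "Quantitation_Database-For-MaxQuant"},
    "1": {"type": "data_input", "label": "Experimental-Design Discovery MaxQuant"},
    "2": {"type": "data_collection_input", "label": "Input Raw-files"},
    "3": {"type": "tool", "label": null, "tool_id": "maxquant"}
  }
}
""")

for label, kind in list_workflow_inputs(example_ga):
    print(f"{label}: {kind}")
```

A "data_collection_input" step is the reason the RAW files must be bundled into a dataset collection before the workflow is run.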

Peptide quantification

In the Discovery module, we used MaxQuant to identify peptides for verification. Now we use MaxQuant again, this time to quantify the PepQuery-verified peptides, both microbial and human. More information about quantitation with MaxQuant is available in the GTN tutorials “Label-free data analysis using MaxQuant” and “MaxQuant and MSstats for the analysis of TMT data”.

The outputs of most interest are the MaxQuant Evidence, Protein Groups, and Peptides files. The Protein Groups and Peptides files are then filtered and grouped to generate lists of quantified microbial proteins and peptides.

Hands-on: Quantify verified peptides (from PepQuery2)
  1. MaxQuant (Galaxy version 1.6.17.0+galaxy4) with the following parameters:
    • In “Input Options”:
      • “FASTA files”: Quantitation Database for MaxQuant (Input dataset)
    • In “Search Options”:
      • “Specify an experimental design template (if needed). For detailed instructions see the help text.”: Experimental-Design_Discovery_MaxQuant.tabular (Input dataset)
      • “minimum peptide length”: 8
      • “Match between runs”: Yes
      • “Maximum peptide length for unspecific searches”: 50
    • In “Protein quantification”:
      • “Use only unmodified peptides”: Yes
        • “Modifications used in protein quantification”: Oxidation (M)
      • In “LFQ Options”:
        • “iBAQ (calculates absolute protein abundances by normalizing to copy number and not protein mass)”: No
    • In “Parameter Group”:
      • “Insert Parameter Group”
        • “Infiles”: RAW dataset collection (Input dataset collection)
        • “fixed modifications”: Carbamidomethyl (C)
        • “variable modifications”: Oxidation (M)
        • “enzyme”: Trypsin/P
        • “Quantitation Methods”: reporter ion MS2
          • “isobaric labeling”: TMT11plex
          • “Filter by PIF”: Yes
    • In “Output Options”:
      • “Select the desired outputs.”: Protein Groups, mqpar.xml, Peptides, Evidence, MSMS
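To make the enzyme and peptide-length settings more concrete: MaxQuant's "Trypsin/P" rule cleaves after lysine (K) or arginine (R) even when the next residue is proline, whereas classic trypsin is blocked by a following proline. The Python sketch below is a simplified in-silico digest, not MaxQuant's implementation. It assumes at most 2 missed cleavages (a common MaxQuant default that is not set explicitly above) and applies the minimum peptide length of 8 from "Search Options"; the maximum length of 50 applies to unspecific searches only, so it is not modelled. The protein sequence is made up.

```python
from typing import List

def digest_trypsin_p(protein: str, missed_cleavages: int = 2,
                     min_length: int = 8) -> List[str]:
    """Simplified Trypsin/P digest: cut after every K or R, ignoring the proline rule."""
    # Split the sequence into fully cleaved fragments.
    fragments: List[str] = []
    start = 0
    for i, residue in enumerate(protein):
        if residue in "KR":
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])  # C-terminal fragment without K/R

    # Join up to `missed_cleavages` neighbouring fragments into longer peptides.
    peptides: List[str] = []
    for i in range(len(fragments)):
        for j in range(i, min(i + missed_cleavages + 1, len(fragments))):
            peptide = "".join(fragments[i:j + 1])
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides

# Made-up sequence: "KP" is cut by Trypsin/P but would not be cut by classic trypsin.
print(digest_trypsin_p("MSTLLDEGKPAVTSLGEQRAGHIWLK", missed_cleavages=0))
# ['MSTLLDEGK', 'PAVTSLGEQR']  ("AGHIWLK" is shorter than 8 residues)
```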
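Question
  1. Why can we switch back to using RAW files for MaxQuant instead of MGF files?
  Solution
  1. MaxQuant prefers RAW files because they contain more information than MGF files. An MGF file is a peak list of MS/MS spectra only, whereas a RAW file also retains the MS1 survey scans that MaxQuant needs for precursor detection and that settings such as Match between runs and Filter by PIF rely on.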
Question
  1. Previously, we used MaxQuant in the Discovery workflow. Why are we using MaxQuant again, instead of SearchGUI/PeptideShaker?
  Solution
  1. Here we use MaxQuant for quantification. SearchGUI/PeptideShaker is used for peptide and protein identification and does not perform the TMT reporter-ion quantification of peptides and proteins needed in this workflow.

Using Text Manipulation Tools to Manage MaxQuant Outputs

Hands-on: Select microbial protein groups and peptides from MaxQuant with Select
  1. Select with the following parameters:
    • “Select lines from”: proteinGroups (output of MaxQuant tool)
    • “that”: NOT Matching
    • “the pattern”: (_HUMAN)|(_REVERSED)|(CON)|(con)
  2. Select with the following parameters:
    • “Select lines from”: peptides (output of MaxQuant tool)
    • “that”: NOT Matching
    • “the pattern”: (_HUMAN)|(_REVERSED)|(CON)|(con)
  3. Cut with the following parameters:
    • “Cut columns”: c1
    • “From”: out_file1 (output of the Select tool in step 1, protein groups)
  4. Cut with the following parameters:
    • “Cut columns”: c1
    • “From”: out_file1 (output of the Select tool in step 2, peptides)
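To check what these Select and Cut steps do to a tab-separated MaxQuant table, here is a minimal Python sketch: drop every line in which the regular expression matches anywhere (case-sensitively), then keep only the first column. It assumes MaxQuant 1.6-style output, where column 1 of proteinGroups.txt is "Protein IDs", column 1 of peptides.txt is "Sequence", and both headers include a "Potential contaminant" column; the sample rows and accessions are invented. Because the pattern is not anchored, "(con)" matches any occurrence of "con", so the header row is dropped too, as is any row whose description contains "con" (for example "domain-containing").

```python
import re
from typing import Iterable, List

# Same pattern as in the Select steps above.
PATTERN = re.compile(r"(_HUMAN)|(_REVERSED)|(CON)|(con)")

def select_not_matching(lines: Iterable[str], pattern: re.Pattern = PATTERN) -> List[str]:
    """Mimic 'Select' with 'NOT Matching': keep lines with no match anywhere."""
    return [line for line in lines if not pattern.search(line)]

def cut_first_column(lines: Iterable[str]) -> List[str]:
    """Mimic 'Cut' with columns 'c1': keep the first tab-separated field."""
    return [line.rstrip("\n").split("\t")[0] for line in lines]

# Invented rows in a heavily trimmed proteinGroups.txt-like layout.
protein_groups = [
    "Protein IDs\tFasta headers\tPotential contaminant\n",
    "sp|P69905|HBA_HUMAN\tHemoglobin subunit alpha\t\n",
    "CON__P02769\tSerum albumin\t+\n",
    "tr|Q8A1B2|Q8A1B2_BACTN\tOuter membrane protein\t\n",
    "tr|Q8A7C3|Q8A7C3_BACTN\tDUF4 domain-containing protein\t\n",
]

kept = select_not_matching(protein_groups)
print(cut_first_column(kept))  # ['tr|Q8A1B2|Q8A1B2_BACTN']
```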

Generating a list of quantified proteins and peptides

Hands-on: Group quantified proteins
  1. Group with the following parameters:
    • “Select data”: out_file1 (output of the Cut tool applied to the protein groups)
    • “Group by column”: c1
Hands-on: Group quantified peptides
  1. Group with the following parameters:
    • “Select data”: out_file1 (output of the Cut tool applied to the peptides)
    • “Group by column”: c1
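Grouping on column 1 without adding any operations reduces each list to its distinct values, so every protein group or peptide appears once. A minimal Python sketch of that effect, assuming the Group tool returns one row per distinct value of the grouping column; the sorted order shown here and the invented sequences are for illustration, and any case-handling options of the Galaxy tool are not modelled:

```python
from itertools import groupby
from typing import Iterable, List

def group_by_first_column(values: Iterable[str]) -> List[str]:
    """Return one entry per distinct value, in sorted order (no aggregate columns)."""
    # groupby only merges adjacent equal items, so sort first.
    return [key for key, _ in groupby(sorted(values))]

peptides = ["ELVDTGAPK", "AGVLLDEK", "ELVDTGAPK", "TPEVLAGR"]  # invented sequences
print(group_by_first_column(peptides))  # ['AGVLLDEK', 'ELVDTGAPK', 'TPEVLAGR']
```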

Conclusion

In summary, this workflow uses MaxQuant to quantify the PepQuery-verified microbial and human peptides and proteins, and then filters the MaxQuant outputs to produce lists of quantified microbial proteins and peptides. Measurements of protein and peptide abundance support biomarker discovery, comparative analyses, and studies of differential protein expression across experimental conditions, with applications in disease research, drug development, and systems biology. Running the analysis as a Galaxy workflow also helps make the results reproducible, and the quantitative outputs generated here will be interpreted in the next module.