This tutorial can be followed with any user-defined database, but it works best if the clinical metaproteomics database generation module was used (see the Database Generation tutorial). The MetaNovo tool generates a more manageable database that contains identified proteins. The MetaNovo-generated database was merged with the Human SwissProt (reviewed only) and contaminants (cRAP) databases to produce a compact database (~21.2k protein sequences) that will be used for peptide identification.
Peptide identification
The MS/MS data will be searched against the compact database Human UniProt Microbial Proteins (from MetaNovo) and cRAP to identify peptide and protein sequences via sequence database searching. For this tutorial, two peptide identification programs will be used: SearchGUI/PeptideShaker and MaxQuant. However, other software, such as FragPipe or Scribe, could be used as well. For the purpose of this tutorial, a dataset collection of the four RAW/MGF files will be used as the MS/MS input.
The goal of this step is to identify proteins from the mass spectrometry data. The algorithms identify peptides in the spectra by searching a protein sequence database to match observed peptide data with theoretical peptide masses and spectra. Scoring and false discovery rate (FDR) control help assess the reliability of matches, followed by protein inference to determine which proteins are present in the sample. These algorithms are essential for interpreting mass spectrometry data, aiding protein identification, quantification, and insights into biological processes and disease mechanisms in proteomics research.
Get data
Hands On: Data Upload
Create a new history for this tutorial
Import the files from Zenodo or from
the shared data library (GTN - Material -> proteomics
-> Clinical Metaproteomics 2: Discovery):
Click galaxy-uploadUpload Data at the top of the tool panel
Select galaxy-wf-editPaste/Fetch Data
Paste the link(s) into the text field
Press Start
Close the window
As an alternative to uploading the data from a URL or your computer, the files may also have been made available from a shared data library:
Go into Libraries (left panel)
Navigate to the correct folder as indicated by your instructor.
On most Galaxies, tutorial data will be provided in a folder named GTN - Material -> Topic Name -> Tutorial Name.
Select the desired files
Click on Add to Historygalaxy-dropdown near the top and select as Datasets from the dropdown menu
In the pop-up window, choose
“Select history”: the history you want to import the data to (or create a new one)
Click on Import
Rename the datasets
Check that the datatype is correct
Click on the galaxy-pencilpencil icon for the dataset to edit its attributes
In the central panel, click galaxy-chart-select-dataDatatypes tab on the top
In the galaxy-chart-select-dataAssign Datatype, select datatypes from “New type” dropdown
Tip: you can start typing the datatype into the field to filter the dropdown menu
Click the Save button
Add a tag to each database corresponding to the user.
Create a dataset collection of all the raw files and MGF files.
Datasets can be tagged. This simplifies the tracking of datasets across the Galaxy interface. Tags can contain any combination of letters or numbers but cannot contain spaces.
To tag a dataset:
Click on the dataset to expand it
Click on Add Tagsgalaxy-tags
Add tag text. Tags starting with # will be automatically propagated to the outputs of tools using this dataset (see below).
Press Enter
Check that the tag appears below the dataset name
Tags beginning with # are special!
They are called Name tags. The unique feature of these tags is that they propagate: if a dataset is labelled with a name tag, all derivatives (children) of this dataset will automatically inherit this tag (see below). The figure below explains why this is so useful. Consider the following analysis (numbers in parenthesis correspond to dataset numbers in the figure below):
a set of forward and reverse reads (datasets 1 and 2) is mapped against a reference using Bowtie2 generating dataset 3;
dataset 3 is used to calculate read coverage using BedTools Genome Coverage separately for + and - strands. This generates two datasets (4 and 5 for plus and minus, respectively);
datasets 4 and 5 are used as inputs to Macs2 broadCall, generating datasets 6 and 8;
datasets 6 and 8 are intersected with coordinates of genes (dataset 9) using BedTools Intersect generating datasets 10 and 11.
Now consider that this analysis is done without name tags. This is shown on the left side of the figure. It is hard to trace which datasets contain “plus” data versus “minus” data. For example, does dataset 10 contain “plus” data or “minus” data? Probably “minus” but are you sure? In the case of a small history like the one shown here, it is possible to trace this manually but as the size of a history grows it will become very challenging.
The right side of the figure shows exactly the same analysis, but using name tags. When the analysis was conducted datasets 4 and 5 were tagged with #plus and #minus, respectively. When they were used as inputs to Macs2 resulting datasets 6 and 8 automatically inherited them and so on… As a result it is straightforward to trace both branches (plus and minus) of this analysis.
Click on Workflow on the top menu bar of Galaxy. You will see a list of all your workflows.
Click on galaxy-uploadImport at the top-right of the screen
Paste the following URL into the box labelled “Archived Workflow URL”: https://training.galaxyproject.org/training-material/topics/proteomics/tutorials/clinical-mp-2-discovery/workflows/WF2_Discovery-Workflow.ga
Click the Import workflow button
Below is a short video demonstrating how to import a workflow from GitHub using this procedure:
Video: Importing a workflow from URL
Run Workflow using the following parameters:
“Send results to a new history”: No
param-file “RAW files”: RAW dataset collection
param-file “Human UniProt Microbial Proteins (from MetaNovo) and cRAP”: Human_UniProt_Microbial_Proteins_(from_MetaNovo)_and_cRAP.fasta
Click on Workflow on the top menu bar of Galaxy. You will see a list of all your workflows.
Click on the workflow-run (Run workflow) button next to your workflow
Configure the workflow as needed
Click the Run Workflow button at the top-right of the screen
You may have to refresh your history to see the queued jobs
Peptide identification
Using the compact database generated by MetaNovo as the input database, we will match MS/MS data to peptide sequences via sequence database searching.
For this tutorial, two peptide identification programs will be used: SearchGUI/PeptideShaker and MaxQuant. For both programs, the dataset collection of the four MS datasets created in the history will be used as the MS/MS input. The RAW MS/MS data files will be converted into Mascot generic format (MGF) files, as that is the standard format in which MS/MS searches are performed.
Peptides identified from each program will be verified with the PepQuery tool to generate a master list of confident verified microbial peptides.
Appending decoy sequences to the FASTA database with FastaCLI
Using the FastaCLI tool, decoy sequences will be appended to the FASTA database. Decoy sequences are protein sequences that are not expected to be present in the samples. For more information on how to generate and append decoy sequences, see GTN Protein FASTA Database Handling.
Hands On: FastaCLI
FastaCLI ( Galaxy version 4.0.41+galaxy1) with the following parameters:
The msconvert tool allows for the conversion of mass spectrometry data files between different formats, such as thermo.raw, mgf, or mzml.
Hands On: msconvert: RAW to MGF
msconvert ( Galaxy version 3.0.20287.2) with the following parameters:
param-collection“Input unrefined MS data”: output (Input dataset collection)
“Do you agree to the vendor licenses?”: Yes
“Output Type”: mgf
In “Data Processing Filters”:
“Apply peak picking?”: Yes
“(Re-)calculate charge states?”: no
Question
Why do we need to use MGF instead of RAW files for Search GUI?
SearchGUI is compatible only with MGF files; hence, you have to use msconvert or the Thermo RAW file converter tool to convert the RAW format to MGF format.
Perform database searching with SearchGUI
SearchGUI is a database-searching tool that comprises different search engines to match sample MS/MS spectra to known peptide sequences. In our analysis, we will use X!Tandem and MS-GF+ as search algorithms within SearchGUI for matching spectra from mass spectrometry data against peptides from the protein sequence database.
The SearchGUI tool will perform a database search based on the parameters we’ve set and will generate a file (called a SearchGUI archive file) that will serve as the input for the PeptideShaker tool. The SearchGUI archive file contains Peptide-Spectral Matches (PSMs), and PeptideShaker is a post-processing software that will assess the confidence of the data. PeptideShaker also infers the identities of proteins based on the matched peptide sequences, and users are able to visualize these outputs to interpret results. More information about database searching using SearchGUI and PeptideShaker is accessible at Metaproteomics tutorial.
Hands On: Peptide discovery using SearchGUI
Search GUI ( Galaxy version 4.0.41+galaxy1) with the following parameters:
param-file “Identification Parameters file”: Identification_Parameters_File (output of Identification Parameters tool)
param-file “Fasta file”: input_database_concatenated_target_decoy (output of FastaCLI tool)
param-file “Input Peak Lists”: output (output of msconvert tool)
“SearchGUI Options”: Default
Question
Why do we need to add decoy sequences to our FASTA database for Search GUI? And how many do we need to add?
Adding decoy sequences helps in FDR estimation, discriminating true positives from false positives, and quality control of the data. The number of decoy sequences you need to add to your database depends on the desired FDR level you want to achieve. A common practice is to use a 1:1 ratio of target sequences to decoy sequences. In other words, for every real protein sequence in your database, you would add a decoy sequence. This allows you to estimate the FDR at 1%, 5%, or any other chosen threshold.
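The target-decoy FDR estimate described in the answer above can be sketched in a few lines. This is an illustrative simplification, not the SearchGUI/PeptideShaker implementation; the scores are invented:

```python
# Sketch: target-decoy FDR estimation (illustrative only).
# Assumes a 1:1 target:decoy database, so FDR ~ decoys / targets
# among the PSMs that pass a given score threshold.
def fdr_at_threshold(psms, threshold):
    """psms: list of (score, is_decoy) tuples."""
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0

# Invented example scores: 1 decoy and 3 targets pass a threshold of 40.
psms = [(50, False), (48, False), (45, True), (44, False), (30, True)]
print(fdr_at_threshold(psms, 40))  # 1 decoy / 3 targets ≈ 0.33
```

In practice, the search engine picks the score threshold so that this estimate stays at or below the chosen FDR level (e.g. 1%).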
Question
What is the Identification Parameters tool?
The Identification Parameters file is an input required by the SearchGUI tool; it contains all the parameters required to run the search algorithms.
Post-processing of SearchGUI output with PeptideShaker
Hands On: Peptide Shaker
Peptide Shaker ( Galaxy version 2.0.33+galaxy1) with the following parameters:
param-file “Compressed SearchGUI results”: searchgui_results (output of Search GUI tool)
In “Exporting options”:
“Follow-up analysis export options”: Do not export
“Identification features reports to be generated”: PSM Report, Peptide Report, Protein Report, Certificate of Analysis
Question
What are the differences between the following reports from PeptideShaker: PSM report, Peptide report, and Protein report?
PSM reports focus on individual peptide-spectrum matches, providing detailed information about each spectrum and its assigned peptide sequence. Peptide reports summarize information about unique peptides and their properties. Protein reports, on the other hand, focus on proteins, including protein inference, grouping, and quantification, making them more suitable for understanding the overall protein composition in a sample. These reports serve different purposes in proteomic data analysis and are used to extract various levels of information from mass spectrometry results.
Using Text Manipulation Tools to Manage Microbial Outputs from SearchGUI/PeptideShaker
Hands On: Selecting microbial peptides from SearchGUI/PeptideShaker with Select tool
Select with the following parameters:
param-file “Select lines from”: output_peptides (output of Peptide Shaker tool)
“that”: NOT Matching
“the pattern”: (_HUMAN)|(_REVERSED)|(CON)|(con)
“Keep header line”: Yes
Question
What is the purpose of this step?
This step is to extract microbial peptides or to remove any peptides that match humans, reverse, contaminants, etc.
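The Select step above can be sketched in Python: keep the header line plus every line that the pattern does not match. This is only an illustration of what the tool does; the example lines are invented:

```python
import re

# Pattern used in the Select step: drop human, reversed-decoy, and
# contaminant entries.
pattern = re.compile(r"(_HUMAN)|(_REVERSED)|(CON)|(con)")

lines = [
    "Proteins\tSequence",                    # header, kept
    "sp|P69905|HBA_HUMAN\tVGAHAGEYGAEALER",  # human, removed
    "tr|A0A024|K1_ECOLI\tLVNELTEFAK",        # microbial, kept
    "CON__P00761\tSSGSSYPSLLQCLK",           # contaminant, removed
]

# "NOT Matching" with "Keep header line: Yes" keeps the header plus
# every line the pattern does NOT match.
kept = [lines[0]] + [line for line in lines[1:] if not pattern.search(line)]
print(kept)  # header and the microbial line only
```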
Hands On: Selecting microbial PSMs from SearchGUI/PeptideShaker with Select
Select with the following parameters:
param-file “Select lines from”: output_psm (output of Peptide Shaker tool)
“that”: NOT Matching
“the pattern”: (_HUMAN)|(_REVERSED)|(CON)|(con)
“Keep header line”: Yes
Hands On: Filtering confident microbial peptides from SearchGUI/PeptideShaker (SGPS) with Filter
Filter with the following parameters:
param-file “Filter”: out_file1 (output of Select tool)
“With following condition”: c17=='Confident'
“Number of header lines to skip”: 1
Question
In the Filtering steps, what does “Confidence” mean quantitatively, i.e. what is the percentage cutoff?
The term “Confidence” in the context of proteomic data analysis often refers to a measure of how reliable or trustworthy a particular protein or peptide identification is. However, the specific numerical value or percentage cutoff for confidence can vary depending on the software or approach you are using and the goals of your analysis. In many proteomics studies, researchers use a false discovery rate (FDR) to set a quantitative confidence threshold. Here we have set it as 1%FDR, which means that you’re accepting only 1% or less of your reported identifications as likely to be false positives.
Hands On: Filtering confident microbial PSMs from SGPS with Filter
Filter with the following parameters:
param-file “Filter”: out_file1 (output of Select tool)
“With following condition”: c24=='Confident'
“Number of header lines to skip”: 1
We will merge the Human SwissProt protein database with the contaminants (cRAP) database and convert the resulting FASTA file to a tabular file, which will be used by the Query Tabular tool to extract distinct microbial peptides from SearchGUI/PeptideShaker.
Hands On: Merging Human SwissProt and cRAP databases for Query Tabular with FASTA Merge Files and Filter Unique Sequences
FASTA Merge Files and Filter Unique Sequences ( Galaxy version 1.2.0) with the following parameters:
“Run in batch mode?”: Merge individual FASTAs (output collection if input is collection)
In “Input FASTA File(s)”:
param-repeat“Insert Input FASTA File(s)”
param-file “FASTA File”: Human SwissProt Protein Database (output of Protein Database Downloader tool)
param-file “FASTA File”: Contaminants cRAP database (output of Protein Database Downloader tool)
Hands On: Converting FASTA sequences to TAB-delimited file with FASTA-to-Tabular
FASTA-to-Tabular ( Galaxy version 1.1.0) with the following parameters:
param-file “Convert these sequences”: output (output of FASTA Merge Files and Filter Unique Sequences tool)
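Conceptually, FASTA-to-Tabular turns each FASTA record into one line of "header<TAB>sequence". A minimal Python sketch of that conversion (illustrative only; the Galaxy tool has additional options, and the example records are invented):

```python
# Sketch of a FASTA -> tabular conversion: one record per output line,
# header and concatenated sequence separated by a tab.
def fasta_to_tabular(fasta_text):
    records = []
    header, seq = None, []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            if header is not None:
                records.append(f"{header}\t{''.join(seq)}")
            header, seq = line[1:], []
        else:
            seq.append(line.strip())
    if header is not None:  # flush the last record
        records.append(f"{header}\t{''.join(seq)}")
    return "\n".join(records)

fasta = ">sp|P69905|HBA_HUMAN\nVLSPADKTN\nVKAAWGKV\n>sp|P68871|HBB_HUMAN\nVHLTPEEK"
print(fasta_to_tabular(fasta))
```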
Hands On: Filtering out accession numbers from TAB-delimited file with Filter Tabular
Filter Tabular ( Galaxy version 3.3.0) with the following parameters:
param-file “Tabular Dataset to filter”: output (output of FASTA-to-Tabular tool)
In “Filter Tabular Input Lines”:
param-repeat“Insert Filter Tabular Input Lines”
“Filter By”: select columns
“enter column numbers to keep”: 1
param-repeat“Insert Filter Tabular Input Lines”
“Filter By”: regex replace value in column
“enter column number to replace”: 1
“regex pattern”: ^[^|]+[|]([^| ]+).*$
“replacement expression”: \1
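The regex step can be checked in Python: for a UniProt-style header of the form db|ACCESSION|ENTRY, the pattern captures the accession between the first two pipes and the replacement keeps only that capture group (the example header is invented):

```python
import re

# Pattern and replacement from the Filter Tabular step above.
pattern = r"^[^|]+[|]([^| ]+).*$"
header = "sp|P69905|HBA_HUMAN Hemoglobin subunit alpha"

# "^[^|]+" skips the db prefix, "[|]" the first pipe, and
# "([^| ]+)" captures the accession; "\1" keeps only the capture.
accession = re.sub(pattern, r"\1", header)
print(accession)  # P69905
```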
Question
What’s the difference between a FASTA and Tabular output?
FASTA Output: Typically used to report identified peptide or protein sequences, which are useful for building or updating sequence databases, for downstream sequence analysis, or for re-searching against the sequences.
Tabular Output: Used for presenting various information related to identified peptides or proteins, such as accession numbers, scores, abundance values, and other attributes. Tabular output facilitates data analysis, comparisons, and custom data processing.
Hands On: Querying protein accession numbers and peptide sequences of confident microbial PSMs (from SGPS) with Query Tabular
Query Tabular ( Galaxy version 3.3.0) with the following parameters:
In “Database Table”:
param-repeat“Insert Database Table”
param-file “Tabular Dataset for Table”: out_file1 (output of Filter tool)
“SQL Query to generate tabular output”: SELECT id,Proteins,Sequence FROM psms WHERE psms.ln NOT IN (SELECT distinct prots.ln FROM prots JOIN Uniprot ON prots.prot = Uniprot.prot) ORDER BY psms.ln
“include query result column headers”: Yes
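The SQL query above keeps only PSMs whose protein accessions are absent from the human SwissProt + cRAP (Uniprot) table, i.e. the candidate microbial PSMs. A minimal sqlite3 sketch of the same logic; the table layout mirrors the query, but all data are invented for illustration:

```python
import sqlite3

# psms holds the confident PSMs, prots maps each PSM line number (ln)
# to its protein accessions, and Uniprot holds human + cRAP accessions.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE psms (ln INTEGER, id TEXT, Proteins TEXT, Sequence TEXT);
CREATE TABLE prots (ln INTEGER, prot TEXT);
CREATE TABLE Uniprot (prot TEXT);
INSERT INTO psms VALUES (1, 'psm1', 'P69905', 'VGAHAGEYGAEALER'),
                        (2, 'psm2', 'A0A024', 'LVNELTEFAK');
INSERT INTO prots VALUES (1, 'P69905'), (2, 'A0A024');
INSERT INTO Uniprot VALUES ('P69905');  -- human accession
""")

# Keep PSMs whose line number never joins to a human/cRAP accession.
rows = cur.execute("""
    SELECT id, Proteins, Sequence FROM psms
    WHERE psms.ln NOT IN
      (SELECT DISTINCT prots.ln FROM prots JOIN Uniprot ON prots.prot = Uniprot.prot)
    ORDER BY psms.ln
""").fetchall()
print(rows)  # [('psm2', 'A0A024', 'LVNELTEFAK')]
```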
Hands On: Cutting out peptide sequences from Query Tabular with Cut
Cut with the following parameters:
“Cut columns”: c3
param-file “From”: output (output of Query Tabular tool)
Hands On: Grouping distinct (unique) peptides from SGPS with Group
Group with the following parameters:
param-file “Select data”: out_file1 (output of Cut tool)
“Group by column”: c1
Perform peptide discovery with MaxQuant
MaxQuant is an MS-based proteomics platform that can process raw data and provides improved mass precision and high precursor mass accuracy (HPMA), resulting in increased protein identification and more in-depth proteomic analysis. Raw MS/MS spectra will be searched against the compact MetaNovo-generated database (~21.2k sequences). More information about analysis using MaxQuant is available, including Label-free data analysis and MaxQuant and MSstats for the analysis of TMT data.
Hands On: Peptide discovery using MaxQuant
MaxQuant ( Galaxy version 2.0.3.0+galaxy0) with the following parameters:
In “Input Options”:
param-file“FASTA files”: output (Input dataset)
In “Search Options”:
param-file“Specify an experimental design template (if needed). For detailed instructions see the help text.”: output (Input dataset)
“minimum peptide length”: 8
“Match between runs”: Yes
“Maximum peptide length for unspecific searches”: 50
In “Protein quantification”:
“Use only unmodified peptides”: Yes
“Modifications used in protein quantification”: Oxidation (M)
In “LFQ Options”:
“iBAQ (calculates absolute protein abundances by normalizing to copy number and not protein mass)”: No
“Generate PTXQC (proteomics quality control pipeline) report? (experimental setting)”: False
In “Output Options”:
“Select the desired outputs.”: Protein Groups, mqpar.xml, Peptides, MSMS, msms scans, summary, MaxQuant and PTXQC log, yaml config file
Question
What is the Experimental Design file for MaxQuant?
In MaxQuant, the Experimental Design file is used to specify the experimental conditions, sample groups, and the relationships between different samples in a proteomics experiment. This file is a crucial component of the MaxQuant analysis process because it helps the software correctly organize and analyze the mass spectrometry data. The Experimental Design file is a tab-delimited text file, typically with a “.txt” extension. Here is what you might include in an Experimental Design file for MaxQuant:
Sample names: the name of each sample in your experiment, consistent with the naming conventions used in your raw data files.
Experimental conditions: the condition or treatment group associated with each sample. For example, you might have control and treated groups, and you would assign the appropriate condition to each sample.
Replicates: the replicate for each sample, which is important for assessing the statistical significance of your results. Replicates are typically denoted by numeric values (e.g., “1”, “2”, “3”) or by unique identifiers (e.g., “Replicate A”, “Replicate B”).
Labels: if you are using isobaric labeling methods like TMT (Tandem Mass Tag) or iTRAQ (Isobaric Tags for Relative and Absolute Quantitation), the label associated with each sample. This is important for quantification.
Other metadata: additional information relevant to your experiment, such as the biological source, time points, or anything else that helps describe the samples and experimental conditions.
Using Text Manipulation Tools to Manage Microbial Outputs from MaxQuant
Hands On: Selecting microbial peptides from MaxQuant with Select
Select with the following parameters:
param-file “Select lines from”: peptides (output of MaxQuant tool)
“that”: NOT Matching
“the pattern”: (_HUMAN)|(_REVERSED)|(CON)|(con)
“Keep header line”: Yes
Hands On: Cutting out microbial peptide sequences with Cut
Cut with the following parameters:
“Cut columns”: c1
param-file “From”: out_file1 (output of Select tool)
Hands On: Remove header line from MaxQuant peptide output with Remove beginning
Remove beginning with the following parameters:
param-file “from”: out_file1 (output of Cut tool)
Hands On: Grouping distinct (unique) peptide sequences from MaxQuant with Group
Group with the following parameters:
param-file “Select data”: out_file1 (output of Remove beginning tool)
“Group by column”: c1
Question
How case-sensitive is the Group tool? Can I only group by column values, and not row values?
By default the Group tool is not case-sensitive, but it can be made case-sensitive. The tool groups by column values only.
Process SGPS and MaxQuant peptides to compile one list of unique microbial peptides
Hands On: Concatenate SGPS and MaxQuant peptides into a singular database with Concatenate datasets
Concatenate datasets ( Galaxy version 0.1.1) with the following parameters:
param-files “Datasets to concatenate”: out_file1 (output of Group tool), out_file1 (output of Group tool)
Hands On: Group the peptides from SGPS and MaxQuant to remove duplicates with Group
Group with the following parameters:
param-file “Select data”: out_file1 (output of Concatenate datasets tool)
“Group by column”: c1
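Conceptually, the concatenate-then-group steps reduce to taking the union of the two peptide lists. A minimal Python sketch with invented peptide sequences:

```python
# Invented example peptide lists from the two search pipelines.
sgps_peptides = ["LVNELTEFAK", "SSGSSYPSLLQCLK", "AGLQFPVGR"]
maxquant_peptides = ["LVNELTEFAK", "AGLQFPVGR", "IQDKEGIPPDQQR"]

# Concatenate datasets, then Group by column 1: each distinct
# peptide sequence is kept exactly once.
unique_peptides = sorted(set(sgps_peptides + maxquant_peptides))
print(unique_peptides)  # 4 distinct peptides
```

This deduplicated union is the master list passed to PepQuery for verification in the next tutorial.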
Conclusion
By following this tutorial, you have effectively conducted a search of your MS/MS data against the compact database and successfully retrieved reliable microbial peptides. After identifying these microbial peptides with the assistance of MaxQuant and SearchGUI, the next step is to verify the presence of these peptides. This compiled list of unique peptides will serve as the input for PepQuery to validate the confident identification of microbial peptides with the help of the verification workflow.
Key points
Employ SearchGUI/PeptideShaker and MaxQuant for database searching
Extraction of confident microbial peptides for verification