Proteogenomics 1: Database Creation

proteomics-proteogenomics-dbcreation/galaxy-workflow-mouse-rnaseq-dbcreation

Version
10
Last updated
May 8, 2025
License
None Specified, defaults to CC-BY-4.0
Features

Includes a Galaxy Workflow Report

Tutorial
Proteogenomics 1: Database Creation

Workflow Testing
Tests: ❌
Results: Not yet automated

FAIRness PURL
https://gxy.io/GTN:W00172

flowchart TD
  0["ℹ️ Input Dataset\nTrimmed_ref_5000_uniprot_cRAP.fasta"];
  style 0 stroke:#2c3143,stroke-width:4px;
  1["ℹ️ Input Dataset\nFASTQ_ProB_22LIST.fastqsanger"];
  style 1 stroke:#2c3143,stroke-width:4px;
  2["ℹ️ Input Dataset\nReference Annotation"];
  style 2 stroke:#2c3143,stroke-width:4px;
  3["ℹ️ Input Parameter\nReference Genome"];
  style 3 fill:#ded,stroke:#393,stroke-width:4px;
  4["ℹ️ Input Parameter\nReference Genome Annotation for CustomProDB"];
  style 4 fill:#ded,stroke:#393,stroke-width:4px;
  5["Filter Tabular"];
  0 -->|output| 5;
  6["Replace Text"];
  2 -->|output| 6;
  da9f13a5-80a6-4ed5-8563-b3135cf2bb07["Output\nReference Annotation fixed.gtf"];
  6 --> da9f13a5-80a6-4ed5-8563-b3135cf2bb07;
  style da9f13a5-80a6-4ed5-8563-b3135cf2bb07 stroke:#2c3143,stroke-width:4px;
  7["HISAT2"];
  1 -->|output| 7;
  3 -->|output| 7;
  8468a5c2-313b-4cc2-8676-3f255c579bdd["Output\nHISAT_Output.BAM"];
  7 --> 8468a5c2-313b-4cc2-8676-3f255c579bdd;
  style 8468a5c2-313b-4cc2-8676-3f255c579bdd stroke:#2c3143,stroke-width:4px;
  8["FreeBayes"];
  7 -->|output_alignments| 8;
  3 -->|output| 8;
  9["StringTie"];
  6 -->|outfile| 9;
  7 -->|output_alignments| 9;
  940b9a6a-19de-4430-8ead-9636f1cfcd4c["Output\nStringtie_output.gtf"];
  9 --> 940b9a6a-19de-4430-8ead-9636f1cfcd4c;
  style 940b9a6a-19de-4430-8ead-9636f1cfcd4c stroke:#2c3143,stroke-width:4px;
  10["CustomProDB"];
  7 -->|output_alignments| 10;
  4 -->|output| 10;
  8 -->|output_vcf| 10;
  11["GffCompare"];
  2 -->|output| 11;
  9 -->|output_gtf| 11;
  171de99f-3ec1-487f-92bc-9919dc0d904a["Output\ntranscripts_annotated"];
  11 --> 171de99f-3ec1-487f-92bc-9919dc0d904a;
  style 171de99f-3ec1-487f-92bc-9919dc0d904a stroke:#2c3143,stroke-width:4px;
  12["FASTA Merge Files and Filter Unique Sequences"];
  10 -->|output_rpkm| 12;
  10 -->|output_snv| 12;
  10 -->|output_indel| 12;
  04a7cf44-f98a-4dca-99e5-9a1ae56dbb0d["Output\nMerged and Filtered FASTA from CustomProDB"];
  12 --> 04a7cf44-f98a-4dca-99e5-9a1ae56dbb0d;
  style 04a7cf44-f98a-4dca-99e5-9a1ae56dbb0d stroke:#2c3143,stroke-width:4px;
  13["SQLite to tabular"];
  10 -->|output_genomic_mapping_sqlite| 13;
  5fb81959-b607-4d81-89b5-9da027c538d1["Output\ngenomic_mapping_sqlite"];
  13 --> 5fb81959-b607-4d81-89b5-9da027c538d1;
  style 5fb81959-b607-4d81-89b5-9da027c538d1 stroke:#2c3143,stroke-width:4px;
  14["SQLite to tabular"];
  10 -->|output_variant_annotation_sqlite| 14;
  a5cc8d8b-3401-46c1-a0d0-eb5af7d04b4c["Output\nvariant_annotation_sqlite"];
  14 --> a5cc8d8b-3401-46c1-a0d0-eb5af7d04b4c;
  style a5cc8d8b-3401-46c1-a0d0-eb5af7d04b4c stroke:#2c3143,stroke-width:4px;
  15["FASTA-to-Tabular"];
  10 -->|output_rpkm| 15;
  16["Convert gffCompare annotated GTF to BED"];
  11 -->|transcripts_annotated| 16;
  17["FASTA-to-Tabular"];
  12 -->|output| 17;
  18["Column Regex Find And Replace"];
  13 -->|query_results| 18;
  1d644076-da23-4303-81c9-63e5e87cb053["Output\nSAV_INDEL"];
  18 --> 1d644076-da23-4303-81c9-63e5e87cb053;
  style 1d644076-da23-4303-81c9-63e5e87cb053 stroke:#2c3143,stroke-width:4px;
  19["Column Regex Find And Replace"];
  14 -->|query_results| 19;
  45749ecf-369c-4569-965a-261dcce60d23["Output\nvariant_annotation"];
  19 --> 45749ecf-369c-4569-965a-261dcce60d23;
  style 45749ecf-369c-4569-965a-261dcce60d23 stroke:#2c3143,stroke-width:4px;
  20["Filter Tabular"];
  15 -->|output| 20;
  21["Translate BED transcripts"];
  16 -->|output| 21;
  3119e0c0-30d4-49dd-bdc9-56ca6d7fbbc2["Output\nTranslate cDNA_minus_CDS"];
  21 --> 3119e0c0-30d4-49dd-bdc9-56ca6d7fbbc2;
  style 3119e0c0-30d4-49dd-bdc9-56ca6d7fbbc2 stroke:#2c3143,stroke-width:4px;
  22["Column Regex Find And Replace"];
  17 -->|output| 22;
  23["Query Tabular"];
  19 -->|out_file1| 23;
  de055cf9-fab7-4848-a518-135e3f895fc6["Output\nVariant_annotation_sqlitedb"];
  23 --> de055cf9-fab7-4848-a518-135e3f895fc6;
  style de055cf9-fab7-4848-a518-135e3f895fc6 stroke:#2c3143,stroke-width:4px;
  24["Concatenate multiple datasets"];
  20 -->|output| 24;
  5 -->|output| 24;
  25["bed to protein map"];
  21 -->|translation_bed| 25;
  26["Tabular-to-FASTA"];
  22 -->|out_file1| 26;
  d57ef383-ba16-42c2-9ee9-4fdb4acca1a2["Output\nCustomProDB Merged Fasta"];
  26 --> d57ef383-ba16-42c2-9ee9-4fdb4acca1a2;
  style d57ef383-ba16-42c2-9ee9-4fdb4acca1a2 stroke:#2c3143,stroke-width:4px;
  27["Concatenate multiple datasets"];
  18 -->|out_file1| 27;
  25 -->|output| 27;
  b30d6097-a709-42d3-933a-aeedc4e90ec6["Output\nGenomic_Protein_map"];
  27 --> b30d6097-a709-42d3-933a-aeedc4e90ec6;
  style b30d6097-a709-42d3-933a-aeedc4e90ec6 stroke:#2c3143,stroke-width:4px;
  28["FASTA Merge Files and Filter Unique Sequences"];
  26 -->|output| 28;
  21 -->|translation_fasta| 28;
  0 -->|output| 28;
  ce87fe82-008d-404d-b512-034116326e1e["Output\nUniprot_cRAP_SAV_indel_translatedbed"];
  28 --> ce87fe82-008d-404d-b512-034116326e1e;
  style ce87fe82-008d-404d-b512-034116326e1e stroke:#2c3143,stroke-width:4px;
  29["Query Tabular"];
  27 -->|out_file1| 29;
  d470b980-5ccc-4217-91fb-ced35304e854["Output\ngenomic_mapping_sqlitedb"];
  29 --> d470b980-5ccc-4217-91fb-ced35304e854;
  style d470b980-5ccc-4217-91fb-ced35304e854 stroke:#2c3143,stroke-width:4px;
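
For orientation, the alignment, variant-calling and assembly branch of the flowchart (HISAT2, FreeBayes, StringTie, GffCompare) roughly corresponds to the command-line calls sketched below. This is a dry-run sketch under stated assumptions, not the workflow's actual configuration: all file names and the index prefix are placeholders, single-end reads are assumed, the `samtools` steps are an assumption needed outside Galaxy to obtain a sorted, indexed BAM, and the parameter settings of the Galaxy tool wrappers are not reproduced. The `run` helper is hypothetical and only prints each command unless `DRY_RUN=0` is set. CustomProDB and the downstream text-processing steps are not covered.

```shell
#!/bin/sh
# Placeholder inputs (assumptions, not values taken from the workflow).
READS=reads.fastq                 # single-end RNA-seq reads
GENOME_INDEX=genome_index         # prefix of a prebuilt HISAT2 index
GENOME_FASTA=genome.fa            # reference genome sequence
ANNOTATION=annotation.gtf         # reference annotation (flowchart input 2)
ANNOTATION_FIXED=annotation_fixed.gtf   # output of the Replace Text step

# Print each command instead of executing it, unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Align reads to the reference genome, then sort and index the alignments.
run hisat2 -x "$GENOME_INDEX" -U "$READS" -S aligned.sam
run samtools sort -o aligned.bam aligned.sam
run samtools index aligned.bam

# Call variants from the alignments against the reference genome.
run freebayes -f "$GENOME_FASTA" -v variants.vcf aligned.bam

# Assemble transcripts, guided by the fixed annotation.
run stringtie aligned.bam -G "$ANNOTATION_FIXED" -o assembled.gtf

# Compare assembled transcripts with the original reference annotation.
run gffcompare -r "$ANNOTATION" -o gffcmp assembled.gtf
```

Run as is, the script only lists the commands, which makes it easy to compare them with the tool settings shown in Galaxy before executing anything.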

Inputs

Input	Label
Input dataset	Trimmed_ref_5000_uniprot_cRAP.fasta
Input dataset	FASTQ_ProB_22LIST.fastqsanger
Input dataset	Reference Annotation
Input parameter	Reference Genome
Input parameter	Reference Genome Annotation for CustomProDB

Outputs

From	Label
toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_column/9.5+galaxy0	Replace Text
toolshed.g2.bx.psu.edu/repos/iuc/hisat2/hisat2/2.2.1+galaxy1	HISAT2
toolshed.g2.bx.psu.edu/repos/iuc/stringtie/stringtie/2.2.3+galaxy0	StringTie
toolshed.g2.bx.psu.edu/repos/iuc/gffcompare/gffcompare/0.12.6+galaxy0	GffCompare
toolshed.g2.bx.psu.edu/repos/galaxyp/fasta_merge_files_and_filter_unique_sequences/fasta_merge_files_and_filter_unique_sequences/1.2.0	FASTA Merge Files and Filter Unique Sequences
toolshed.g2.bx.psu.edu/repos/iuc/sqlite_to_tabular/sqlite_to_tabular/3.2.1	SQLite to tabular
toolshed.g2.bx.psu.edu/repos/iuc/sqlite_to_tabular/sqlite_to_tabular/3.2.1	SQLite to tabular
toolshed.g2.bx.psu.edu/repos/galaxyp/regex_find_replace/regexColumn1/1.0.3	Column Regex Find And Replace
toolshed.g2.bx.psu.edu/repos/galaxyp/regex_find_replace/regexColumn1/1.0.3	Column Regex Find And Replace
toolshed.g2.bx.psu.edu/repos/galaxyp/translate_bed/translate_bed/0.1.0	Translate BED transcripts
toolshed.g2.bx.psu.edu/repos/iuc/query_tabular/query_tabular/3.3.2	Query Tabular
toolshed.g2.bx.psu.edu/repos/devteam/tabular_to_fasta/tab2fasta/1.1.1	Tabular-to-FASTA
toolshed.g2.bx.psu.edu/repos/artbio/concatenate_multiple_datasets/cat_multi_datasets/1.4.3	Concatenate multiple datasets
toolshed.g2.bx.psu.edu/repos/galaxyp/fasta_merge_files_and_filter_unique_sequences/fasta_merge_files_and_filter_unique_sequences/1.2.0	FASTA Merge Files and Filter Unique Sequences
toolshed.g2.bx.psu.edu/repos/iuc/query_tabular/query_tabular/3.3.2	Query Tabular

Tools

Tool	Links
toolshed.g2.bx.psu.edu/repos/artbio/concatenate_multiple_datasets/cat_multi_datasets/1.4.3	View in ToolShed
toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_column/9.5+galaxy0	View in ToolShed
toolshed.g2.bx.psu.edu/repos/devteam/fasta_to_tabular/fasta2tab/1.1.1	View in ToolShed
toolshed.g2.bx.psu.edu/repos/devteam/freebayes/freebayes/1.3.9+galaxy0	View in ToolShed
toolshed.g2.bx.psu.edu/repos/devteam/tabular_to_fasta/tab2fasta/1.1.1	View in ToolShed
toolshed.g2.bx.psu.edu/repos/galaxyp/bed_to_protein_map/bed_to_protein_map/0.2.0	View in ToolShed
toolshed.g2.bx.psu.edu/repos/galaxyp/custom_pro_db/custom_pro_db/1.22.0	View in ToolShed
toolshed.g2.bx.psu.edu/repos/galaxyp/fasta_merge_files_and_filter_unique_sequences/fasta_merge_files_and_filter_unique_sequences/1.2.0	View in ToolShed
toolshed.g2.bx.psu.edu/repos/galaxyp/gffcompare_to_bed/gffcompare_to_bed/0.2.1	View in ToolShed
toolshed.g2.bx.psu.edu/repos/galaxyp/regex_find_replace/regexColumn1/1.0.3	View in ToolShed
toolshed.g2.bx.psu.edu/repos/galaxyp/translate_bed/translate_bed/0.1.0	View in ToolShed
toolshed.g2.bx.psu.edu/repos/iuc/filter_tabular/filter_tabular/3.3.1	View in ToolShed
toolshed.g2.bx.psu.edu/repos/iuc/gffcompare/gffcompare/0.12.6+galaxy0	View in ToolShed
toolshed.g2.bx.psu.edu/repos/iuc/hisat2/hisat2/2.2.1+galaxy1	View in ToolShed
toolshed.g2.bx.psu.edu/repos/iuc/query_tabular/query_tabular/3.3.2	View in ToolShed
toolshed.g2.bx.psu.edu/repos/iuc/sqlite_to_tabular/sqlite_to_tabular/3.2.1	View in ToolShed
toolshed.g2.bx.psu.edu/repos/iuc/stringtie/stringtie/2.2.3+galaxy0	View in ToolShed
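
The tool identifiers in the table above appear to follow the layout `<ToolShed host>/repos/<owner>/<repository>/<tool>/<version>`. As a minimal sketch, assuming that layout holds for every entry, the hypothetical helper `parse_tool_id` below splits an identifier into its parts, which can be handy when checking which repositories and versions an instance needs.

```shell
#!/bin/sh
# Split a ToolShed tool ID into owner, repository, tool and version.
# Assumed layout: <host>/repos/<owner>/<repository>/<tool>/<version>
parse_tool_id() {
    # Read the slash-separated fields; host and the literal "repos" are ignored.
    IFS=/ read -r _host _repos owner repo tool version <<EOF
$1
EOF
    printf '%s\t%s\t%s\t%s\n' "$owner" "$repo" "$tool" "$version"
}

parse_tool_id "toolshed.g2.bx.psu.edu/repos/iuc/hisat2/hisat2/2.2.1+galaxy1"
```

The call at the end prints the owner, repository, tool and version of the HISAT2 entry as four tab-separated fields.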

To use these workflows in Galaxy, either click a link to download the workflow file, or right-click and copy the link address, which you can then paste into Galaxy's workflow import form.

Importing into Galaxy

Below are the instructions for importing these workflows directly into your Galaxy server of choice to start using them!

Hands On: Importing a workflow

Click on Workflows in the Galaxy activity bar (on the left side of the screen, or in the top menu bar of older Galaxy instances). You will see a list of all your workflows.

Click on Import at the top-right of the screen.

Provide your workflow

Option 1: Paste the URL of the workflow into the box labelled “Archived Workflow URL”

Option 2: Upload the workflow file in the box labelled “Archived Workflow File”

Click the Import workflow button

Below is a short video demonstrating how to import a workflow from GitHub using this procedure:

Video: Importing a workflow from URL

Version History

Version	Commit	Time	Comments
15	2da1f4758	2025-05-07 16:18:34	fix regex for merging fasta files
14	2b884c8b5	2025-05-06 09:18:08	add workflow tag
13	a022e7683	2025-05-05 22:44:23	update tools in tutorial and workflow
12	28fc28c45	2021-02-08 19:18:49	Update galaxy-workflow-mouse_rnaseq_dbcreation.ga
11	1eca54ca9	2021-01-27 21:47:56	Update galaxy-workflow-mouse_rnaseq_dbcreation.ga
10	0db38ae94	2021-01-27 21:41:38	Update galaxy-workflow-mouse_rnaseq_dbcreation.ga
9	13e2ffc4d	2021-01-27 21:27:34	Update galaxy-workflow-mouse_rnaseq_dbcreation.ga
8	2c4b4646c	2021-01-27 21:17:15	Update galaxy-workflow-mouse_rnaseq_dbcreation.ga
7	f99b54e3c	2021-01-27 21:01:48	Update galaxy-workflow-mouse_rnaseq_dbcreation.ga
6	667ff3de9	2020-01-22 10:59:29	annotation
5	eb4d724e0	2020-01-15 10:41:35	Workflow renaming
4	55fe079b2	2020-01-13 16:30:56	WoLF PSORT WF
3	361236c41	2019-04-04 09:00:14	Changed format of workflows
2	6eef55b7e	2019-02-27 18:54:36	Updated install_tutorial_requirements.sh + minor fixes (#1275)
1	a928824de	2018-08-25 09:12:50	add protegenomics dbcreation tutorial

For Admins

Installing the workflow tools

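# Download the workflow definition
wget https://training.galaxyproject.org/training-material/topics/proteomics/tutorials/proteogenomics-dbcreation/workflows/galaxy-workflow-mouse_rnaseq_dbcreation.ga -O workflow.ga
# Generate the list of tools the workflow needs
workflow-to-tools -w workflow.ga -o tools.yaml
# Install the tools (replace GALAXY with your server URL and API_KEY with an admin API key)
shed-tools install -g GALAXY -a API_KEY -t tools.yaml
# Import the workflow and publish it
workflow-install -g GALAXY -a API_KEY -w workflow.ga --publish-workflows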
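
The same steps can be wrapped in a small function that refuses to run while the server URL or API key is missing. This is only a sketch: the function name `install_workflow_tools` and the environment variable names `GALAXY_URL` and `GALAXY_API_KEY` are assumptions introduced here, standing in for the `GALAXY` and `API_KEY` placeholders above. The commands themselves are unchanged.

```shell
#!/bin/sh
# Workflow location, as given in the commands above.
WORKFLOW_URL="https://training.galaxyproject.org/training-material/topics/proteomics/tutorials/proteogenomics-dbcreation/workflows/galaxy-workflow-mouse_rnaseq_dbcreation.ga"

# Hypothetical wrapper: GALAXY_URL and GALAXY_API_KEY are assumed variable
# names replacing the GALAXY and API_KEY placeholders.
install_workflow_tools() {
    # Stop early if either setting is missing or empty.
    if [ -z "${GALAXY_URL:-}" ] || [ -z "${GALAXY_API_KEY:-}" ]; then
        echo "error: set GALAXY_URL and GALAXY_API_KEY first" >&2
        return 1
    fi
    # Each step runs only if the previous one succeeded.
    wget "$WORKFLOW_URL" -O workflow.ga &&
    workflow-to-tools -w workflow.ga -o tools.yaml &&
    shed-tools install -g "$GALAXY_URL" -a "$GALAXY_API_KEY" -t tools.yaml &&
    workflow-install -g "$GALAXY_URL" -a "$GALAXY_API_KEY" -w workflow.ga --publish-workflows
}
```

After exporting both variables in the shell, calling `install_workflow_tools` runs the four steps in order and stops at the first failure. Reading the key from the environment also keeps it out of the command itself.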