Proteogenomics 1: Database Creation

proteomics-proteogenomics-dbcreation/galaxy-workflow-mouse-rnaseq-dbcreation

Author(s)

Version: 10
Last updated: May 8, 2025
License: None specified; defaults to CC-BY-4.0

Features
Tutorial (hands-on): Proteogenomics 1: Database Creation

Workflow Testing
Tests: ❌
Results: Not yet automated
FAIRness
PURL: https://gxy.io/GTN:W00172
flowchart TD
  0["ℹ️ Input Dataset\nTrimmed_ref_5000_uniprot_cRAP.fasta"];
  style 0 stroke:#2c3143,stroke-width:4px;
  1["ℹ️ Input Dataset\nFASTQ_ProB_22LIST.fastqsanger"];
  style 1 stroke:#2c3143,stroke-width:4px;
  2["ℹ️ Input Dataset\nReference Annotation"];
  style 2 stroke:#2c3143,stroke-width:4px;
  3["ℹ️ Input Parameter\nReference Genome"];
  style 3 fill:#ded,stroke:#393,stroke-width:4px;
  4["ℹ️ Input Parameter\nReference Genome Annotation for CustomProDB"];
  style 4 fill:#ded,stroke:#393,stroke-width:4px;
  5["Filter Tabular"];
  0 -->|output| 5;
  6["Replace Text"];
  2 -->|output| 6;
  da9f13a5-80a6-4ed5-8563-b3135cf2bb07["Output\nReference Annotation fixed.gtf"];
  6 --> da9f13a5-80a6-4ed5-8563-b3135cf2bb07;
  style da9f13a5-80a6-4ed5-8563-b3135cf2bb07 stroke:#2c3143,stroke-width:4px;
  7["HISAT2"];
  1 -->|output| 7;
  3 -->|output| 7;
  8468a5c2-313b-4cc2-8676-3f255c579bdd["Output\nHISAT_Output.BAM"];
  7 --> 8468a5c2-313b-4cc2-8676-3f255c579bdd;
  style 8468a5c2-313b-4cc2-8676-3f255c579bdd stroke:#2c3143,stroke-width:4px;
  8["FreeBayes"];
  7 -->|output_alignments| 8;
  3 -->|output| 8;
  9["StringTie"];
  6 -->|outfile| 9;
  7 -->|output_alignments| 9;
  940b9a6a-19de-4430-8ead-9636f1cfcd4c["Output\nStringtie_output.gtf"];
  9 --> 940b9a6a-19de-4430-8ead-9636f1cfcd4c;
  style 940b9a6a-19de-4430-8ead-9636f1cfcd4c stroke:#2c3143,stroke-width:4px;
  10["CustomProDB"];
  7 -->|output_alignments| 10;
  4 -->|output| 10;
  8 -->|output_vcf| 10;
  11["GffCompare"];
  2 -->|output| 11;
  9 -->|output_gtf| 11;
  171de99f-3ec1-487f-92bc-9919dc0d904a["Output\ntranscripts_annotated"];
  11 --> 171de99f-3ec1-487f-92bc-9919dc0d904a;
  style 171de99f-3ec1-487f-92bc-9919dc0d904a stroke:#2c3143,stroke-width:4px;
  12["FASTA Merge Files and Filter Unique Sequences"];
  10 -->|output_rpkm| 12;
  10 -->|output_snv| 12;
  10 -->|output_indel| 12;
  04a7cf44-f98a-4dca-99e5-9a1ae56dbb0d["Output\nMerged and Filtered FASTA from CustomProDB"];
  12 --> 04a7cf44-f98a-4dca-99e5-9a1ae56dbb0d;
  style 04a7cf44-f98a-4dca-99e5-9a1ae56dbb0d stroke:#2c3143,stroke-width:4px;
  13["SQLite to tabular"];
  10 -->|output_genomic_mapping_sqlite| 13;
  5fb81959-b607-4d81-89b5-9da027c538d1["Output\ngenomic_mapping_sqlite"];
  13 --> 5fb81959-b607-4d81-89b5-9da027c538d1;
  style 5fb81959-b607-4d81-89b5-9da027c538d1 stroke:#2c3143,stroke-width:4px;
  14["SQLite to tabular"];
  10 -->|output_variant_annotation_sqlite| 14;
  a5cc8d8b-3401-46c1-a0d0-eb5af7d04b4c["Output\nvariant_annotation_sqlite"];
  14 --> a5cc8d8b-3401-46c1-a0d0-eb5af7d04b4c;
  style a5cc8d8b-3401-46c1-a0d0-eb5af7d04b4c stroke:#2c3143,stroke-width:4px;
  15["FASTA-to-Tabular"];
  10 -->|output_rpkm| 15;
  16["Convert gffCompare annotated GTF to BED"];
  11 -->|transcripts_annotated| 16;
  17["FASTA-to-Tabular"];
  12 -->|output| 17;
  18["Column Regex Find And Replace"];
  13 -->|query_results| 18;
  1d644076-da23-4303-81c9-63e5e87cb053["Output\nSAV_INDEL"];
  18 --> 1d644076-da23-4303-81c9-63e5e87cb053;
  style 1d644076-da23-4303-81c9-63e5e87cb053 stroke:#2c3143,stroke-width:4px;
  19["Column Regex Find And Replace"];
  14 -->|query_results| 19;
  45749ecf-369c-4569-965a-261dcce60d23["Output\nvariant_annotation"];
  19 --> 45749ecf-369c-4569-965a-261dcce60d23;
  style 45749ecf-369c-4569-965a-261dcce60d23 stroke:#2c3143,stroke-width:4px;
  20["Filter Tabular"];
  15 -->|output| 20;
  21["Translate BED transcripts"];
  16 -->|output| 21;
  3119e0c0-30d4-49dd-bdc9-56ca6d7fbbc2["Output\nTranslate cDNA_minus_CDS"];
  21 --> 3119e0c0-30d4-49dd-bdc9-56ca6d7fbbc2;
  style 3119e0c0-30d4-49dd-bdc9-56ca6d7fbbc2 stroke:#2c3143,stroke-width:4px;
  22["Column Regex Find And Replace"];
  17 -->|output| 22;
  23["Query Tabular"];
  19 -->|out_file1| 23;
  de055cf9-fab7-4848-a518-135e3f895fc6["Output\nVariant_annotation_sqlitedb"];
  23 --> de055cf9-fab7-4848-a518-135e3f895fc6;
  style de055cf9-fab7-4848-a518-135e3f895fc6 stroke:#2c3143,stroke-width:4px;
  24["Concatenate multiple datasets"];
  20 -->|output| 24;
  5 -->|output| 24;
  25["bed to protein map"];
  21 -->|translation_bed| 25;
  26["Tabular-to-FASTA"];
  22 -->|out_file1| 26;
  d57ef383-ba16-42c2-9ee9-4fdb4acca1a2["Output\nCustomProDB Merged Fasta"];
  26 --> d57ef383-ba16-42c2-9ee9-4fdb4acca1a2;
  style d57ef383-ba16-42c2-9ee9-4fdb4acca1a2 stroke:#2c3143,stroke-width:4px;
  27["Concatenate multiple datasets"];
  18 -->|out_file1| 27;
  25 -->|output| 27;
  b30d6097-a709-42d3-933a-aeedc4e90ec6["Output\nGenomic_Protein_map"];
  27 --> b30d6097-a709-42d3-933a-aeedc4e90ec6;
  style b30d6097-a709-42d3-933a-aeedc4e90ec6 stroke:#2c3143,stroke-width:4px;
  28["FASTA Merge Files and Filter Unique Sequences"];
  26 -->|output| 28;
  21 -->|translation_fasta| 28;
  0 -->|output| 28;
  ce87fe82-008d-404d-b512-034116326e1e["Output\nUniprot_cRAP_SAV_indel_translatedbed"];
  28 --> ce87fe82-008d-404d-b512-034116326e1e;
  style ce87fe82-008d-404d-b512-034116326e1e stroke:#2c3143,stroke-width:4px;
  29["Query Tabular"];
  27 -->|out_file1| 29;
  d470b980-5ccc-4217-91fb-ced35304e854["Output\ngenomic_mapping_sqlitedb"];
  29 --> d470b980-5ccc-4217-91fb-ced35304e854;
  style d470b980-5ccc-4217-91fb-ced35304e854 stroke:#2c3143,stroke-width:4px;
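
The edges in the flowchart above encode which step feeds which. As a minimal sketch (not part of the workflow itself), the snippet below parses a handful of those edge lines and lists every step that a given step depends on. The helper names `parse_edges` and `upstream_steps` are hypothetical, and only a subset of the edges is copied in to keep the example short.

```python
import re
from collections import defaultdict

# A few edge lines copied from the flowchart; the full diagram parses the same way.
EDGES = """
  1 -->|output| 7;
  3 -->|output| 7;
  7 -->|output_alignments| 8;
  3 -->|output| 8;
  7 -->|output_alignments| 10;
  4 -->|output| 10;
  8 -->|output_vcf| 10;
"""

# Matches "SRC -->|label| DST;" where SRC and DST are numeric step ids.
EDGE_RE = re.compile(r"^\s*(\d+)\s*-->\|([^|]*)\|\s*(\d+);\s*$")


def parse_edges(text):
    """Return (source, label, target) tuples for every labelled edge line."""
    edges = []
    for line in text.splitlines():
        match = EDGE_RE.match(line)
        if match:
            edges.append((int(match.group(1)), match.group(2), int(match.group(3))))
    return edges


def upstream_steps(edges, step):
    """Return all step ids that `step` depends on, directly or transitively."""
    parents = defaultdict(set)
    for source, _label, target in edges:
        parents[target].add(source)
    seen = set()
    stack = [step]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return sorted(seen)


edges = parse_edges(EDGES)
# Step 10 is CustomProDB; its upstream steps include HISAT2 (7) and FreeBayes (8).
print(upstream_steps(edges, 10))
```

Read against the diagram, this shows that CustomProDB needs the FASTQ input, both reference parameters, the HISAT2 alignment and the FreeBayes variant calls before it can run.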

Inputs

Input            Label
Input dataset    Trimmed_ref_5000_uniprot_cRAP.fasta
Input dataset    FASTQ_ProB_22LIST.fastqsanger
Input dataset    Reference Annotation
Input parameter  Reference Genome
Input parameter  Reference Genome Annotation for CustomProDB

Outputs

From Output Label
toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_column/9.5+galaxy0 Replace Text
toolshed.g2.bx.psu.edu/repos/iuc/hisat2/hisat2/2.2.1+galaxy1 HISAT2
toolshed.g2.bx.psu.edu/repos/iuc/stringtie/stringtie/2.2.3+galaxy0 StringTie
toolshed.g2.bx.psu.edu/repos/iuc/gffcompare/gffcompare/0.12.6+galaxy0 GffCompare
toolshed.g2.bx.psu.edu/repos/galaxyp/fasta_merge_files_and_filter_unique_sequences/fasta_merge_files_and_filter_unique_sequences/1.2.0 FASTA Merge Files and Filter Unique Sequences
toolshed.g2.bx.psu.edu/repos/iuc/sqlite_to_tabular/sqlite_to_tabular/3.2.1 SQLite to tabular
toolshed.g2.bx.psu.edu/repos/iuc/sqlite_to_tabular/sqlite_to_tabular/3.2.1 SQLite to tabular
toolshed.g2.bx.psu.edu/repos/galaxyp/regex_find_replace/regexColumn1/1.0.3 Column Regex Find And Replace
toolshed.g2.bx.psu.edu/repos/galaxyp/regex_find_replace/regexColumn1/1.0.3 Column Regex Find And Replace
toolshed.g2.bx.psu.edu/repos/galaxyp/translate_bed/translate_bed/0.1.0 Translate BED transcripts
toolshed.g2.bx.psu.edu/repos/iuc/query_tabular/query_tabular/3.3.2 Query Tabular
toolshed.g2.bx.psu.edu/repos/devteam/tabular_to_fasta/tab2fasta/1.1.1 Tabular-to-FASTA
toolshed.g2.bx.psu.edu/repos/artbio/concatenate_multiple_datasets/cat_multi_datasets/1.4.3 Concatenate multiple datasets
toolshed.g2.bx.psu.edu/repos/galaxyp/fasta_merge_files_and_filter_unique_sequences/fasta_merge_files_and_filter_unique_sequences/1.2.0 FASTA Merge Files and Filter Unique Sequences
toolshed.g2.bx.psu.edu/repos/iuc/query_tabular/query_tabular/3.3.2 Query Tabular

Tools

Tool
toolshed.g2.bx.psu.edu/repos/artbio/concatenate_multiple_datasets/cat_multi_datasets/1.4.3
toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_column/9.5+galaxy0
toolshed.g2.bx.psu.edu/repos/devteam/fasta_to_tabular/fasta2tab/1.1.1
toolshed.g2.bx.psu.edu/repos/devteam/freebayes/freebayes/1.3.9+galaxy0
toolshed.g2.bx.psu.edu/repos/devteam/tabular_to_fasta/tab2fasta/1.1.1
toolshed.g2.bx.psu.edu/repos/galaxyp/bed_to_protein_map/bed_to_protein_map/0.2.0
toolshed.g2.bx.psu.edu/repos/galaxyp/custom_pro_db/custom_pro_db/1.22.0
toolshed.g2.bx.psu.edu/repos/galaxyp/fasta_merge_files_and_filter_unique_sequences/fasta_merge_files_and_filter_unique_sequences/1.2.0
toolshed.g2.bx.psu.edu/repos/galaxyp/gffcompare_to_bed/gffcompare_to_bed/0.2.1
toolshed.g2.bx.psu.edu/repos/galaxyp/regex_find_replace/regexColumn1/1.0.3
toolshed.g2.bx.psu.edu/repos/galaxyp/translate_bed/translate_bed/0.1.0
toolshed.g2.bx.psu.edu/repos/iuc/filter_tabular/filter_tabular/3.3.1
toolshed.g2.bx.psu.edu/repos/iuc/gffcompare/gffcompare/0.12.6+galaxy0
toolshed.g2.bx.psu.edu/repos/iuc/hisat2/hisat2/2.2.1+galaxy1
toolshed.g2.bx.psu.edu/repos/iuc/query_tabular/query_tabular/3.3.2
toolshed.g2.bx.psu.edu/repos/iuc/sqlite_to_tabular/sqlite_to_tabular/3.2.1
toolshed.g2.bx.psu.edu/repos/iuc/stringtie/stringtie/2.2.3+galaxy0
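
Each tool identifier listed above follows the same slash-separated pattern: Tool Shed host, the literal `repos`, repository owner, repository name, tool id and version. The following is a minimal sketch of splitting an identifier into those parts; the `ToolRef` type and `parse_tool_guid` function are hypothetical names chosen for illustration, and the six-field layout is inferred from the entries in this list rather than from a formal specification.

```python
from typing import NamedTuple


class ToolRef(NamedTuple):
    tool_shed: str
    owner: str
    repository: str
    tool_id: str
    version: str


def parse_tool_guid(guid: str) -> ToolRef:
    """Split 'host/repos/owner/repository/tool_id/version' into its fields."""
    parts = guid.strip().split("/")
    if len(parts) != 6 or parts[1] != "repos":
        raise ValueError(f"unexpected tool identifier layout: {guid!r}")
    tool_shed, _repos, owner, repository, tool_id, version = parts
    return ToolRef(tool_shed, owner, repository, tool_id, version)


ref = parse_tool_guid("toolshed.g2.bx.psu.edu/repos/iuc/hisat2/hisat2/2.2.1+galaxy1")
print(ref.owner, ref.repository, ref.version)
```

Splitting the identifier this way makes it easy to check, for example, which owner maintains a tool or whether two steps use the same tool version.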

To use these workflows in Galaxy, you can either click the links to download them, or right-click a link and copy its URL, which you can then paste into Galaxy's workflow import form.

Importing into Galaxy

Below are the instructions for importing these workflows directly into your Galaxy server of choice to start using them!
Hands On: Importing a workflow
  1. Click on Workflows in the Galaxy activity bar (on the left side of the screen, or in the top menu bar of older Galaxy instances). You will see a list of all your workflows.
  2. Click on Import at the top-right of the screen.
  3. Provide your workflow
    • Option 1: Paste the URL of the workflow into the box labelled “Archived Workflow URL”
    • Option 2: Upload the workflow file in the box labelled “Archived Workflow File”
  4. Click the Import workflow button

Below is a short video demonstrating how to import a workflow from GitHub using this procedure:

Video: Importing a workflow from URL
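
The URL-based import (Option 1) can also be scripted. The sketch below only builds the HTTP request and does not send it. It assumes that the Galaxy server accepts a POST to `/api/workflows` with an `archive_source` field holding the workflow URL, and an `x-api-key` header for authentication; check both against your server's API documentation before relying on them. The server address `https://galaxy.example.org` and the function name `build_import_request` are placeholders.

```python
import json
import urllib.request

# URL of this workflow's .ga file on the training site.
WORKFLOW_URL = (
    "https://training.galaxyproject.org/training-material/topics/proteomics/"
    "tutorials/proteogenomics-dbcreation/workflows/"
    "galaxy-workflow-mouse_rnaseq_dbcreation.ga"
)


def build_import_request(galaxy_url, api_key, workflow_url):
    """Build (but do not send) a request asking Galaxy to import a workflow by URL."""
    # Assumption: the endpoint and the 'archive_source' field match your Galaxy version.
    payload = json.dumps({"archive_source": workflow_url}).encode("utf-8")
    return urllib.request.Request(
        url=galaxy_url.rstrip("/") + "/api/workflows",
        data=payload,
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


# Placeholder server and key; substitute your own values.
request = build_import_request("https://galaxy.example.org", "API_KEY", WORKFLOW_URL)
print(request.get_method(), request.full_url)

# To actually send it (needs network access and a valid API key):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Keeping request construction separate from sending makes it possible to inspect the target URL and payload before touching a real server.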

Version History

Version Commit Time Comments
15 2da1f4758 2025-05-07 16:18:34 fix regex for merging fasta files
14 2b884c8b5 2025-05-06 09:18:08 add workflow tag
13 a022e7683 2025-05-05 22:44:23 update tools in tutorial and workflow
12 28fc28c45 2021-02-08 19:18:49 Update galaxy-workflow-mouse_rnaseq_dbcreation.ga
11 1eca54ca9 2021-01-27 21:47:56 Update galaxy-workflow-mouse_rnaseq_dbcreation.ga
10 0db38ae94 2021-01-27 21:41:38 Update galaxy-workflow-mouse_rnaseq_dbcreation.ga
9 13e2ffc4d 2021-01-27 21:27:34 Update galaxy-workflow-mouse_rnaseq_dbcreation.ga
8 2c4b4646c 2021-01-27 21:17:15 Update galaxy-workflow-mouse_rnaseq_dbcreation.ga
7 f99b54e3c 2021-01-27 21:01:48 Update galaxy-workflow-mouse_rnaseq_dbcreation.ga
6 667ff3de9 2020-01-22 10:59:29 annotation
5 eb4d724e0 2020-01-15 10:41:35 Workflow renaming
4 55fe079b2 2020-01-13 16:30:56 WoLF PSORT WF
3 361236c41 2019-04-04 09:00:14 Changed format of workflows
2 6eef55b7e 2019-02-27 18:54:36 Updated install_tutorial_requirements.sh + minor fixes (#1275)
1 a928824de 2018-08-25 09:12:50 add protegenomics dbcreation tutorial

For Admins

Installing the workflow tools

wget https://training.galaxyproject.org/training-material/topics/proteomics/tutorials/proteogenomics-dbcreation/workflows/galaxy-workflow-mouse_rnaseq_dbcreation.ga -O workflow.ga
workflow-to-tools -w workflow.ga -o tools.yaml
shed-tools install -g GALAXY -a API_KEY -t tools.yaml
workflow-install -g GALAXY -a API_KEY -w workflow.ga --publish-workflows
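
Before installing, it can help to see which Tool Shed repositories a workflow file refers to. A `.ga` file is JSON, so a rough sketch of that inspection is shown below. It is an illustration only, not how `workflow-to-tools` is implemented, and it assumes each tool step carries a `tool_shed_repository` entry with `tool_shed`, `owner` and `name` keys. The embedded sample is a hypothetical, heavily trimmed stand-in for `workflow.ga`, and `tool_repositories` is a made-up helper name.

```python
import json

# Hypothetical, trimmed stand-in for the contents of workflow.ga.
SAMPLE_GA = json.dumps({
    "name": "example",
    "steps": {
        # Input steps have no tool and therefore no repository entry.
        "0": {"type": "data_input", "tool_id": None},
        "1": {
            "type": "tool",
            "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/hisat2/hisat2/2.2.1+galaxy1",
            "tool_shed_repository": {
                "tool_shed": "toolshed.g2.bx.psu.edu",
                "owner": "iuc",
                "name": "hisat2",
            },
        },
        # Two steps using the same tool should yield a single repository.
        "2": {
            "type": "tool",
            "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/sqlite_to_tabular/sqlite_to_tabular/3.2.1",
            "tool_shed_repository": {
                "tool_shed": "toolshed.g2.bx.psu.edu",
                "owner": "iuc",
                "name": "sqlite_to_tabular",
            },
        },
        "3": {
            "type": "tool",
            "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/sqlite_to_tabular/sqlite_to_tabular/3.2.1",
            "tool_shed_repository": {
                "tool_shed": "toolshed.g2.bx.psu.edu",
                "owner": "iuc",
                "name": "sqlite_to_tabular",
            },
        },
    },
})


def tool_repositories(ga_text):
    """Return the sorted, de-duplicated (tool_shed, owner, name) tuples used by a workflow."""
    workflow = json.loads(ga_text)
    repositories = set()
    for step in workflow.get("steps", {}).values():
        repo = step.get("tool_shed_repository")
        if repo:
            repositories.add((repo["tool_shed"], repo["owner"], repo["name"]))
    return sorted(repositories)


for tool_shed, owner, name in tool_repositories(SAMPLE_GA):
    print(f"{tool_shed}: {owner}/{name}")
```

To run this against the real file, read it with `open("workflow.ga").read()` and pass the text to `tool_repositories`.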