QC + Mapping + Counting (single+paired) - Ref Based RNA Seq - Transcriptomics - GTN

transcriptomics-ref-based/qc-mapping-counting-paired-and-single

Author(s)
Bérénice Batut, Mallory Freeberg, Mo Heydarian, Anika Erxleben, Pavankumar Videm, Clemens Blank, Maria Doyle, Nicola Soranzo, Peter van Heusden, Lucille Delisle
Version
6
Last updated
Feb 11, 2025
License
MIT
Tags

transcriptomics

Features

Tutorial
Reference-based RNA-Seq data analysis

Workflow Testing
Tests: ✅
Results: Not yet automated

FAIRness PURL
https://gxy.io/GTN:W00245


flowchart TD
  0["ℹ️ Input Collection\nsingle fastqs"];
  style 0 stroke:#2c3143,stroke-width:4px;
  1["ℹ️ Input Collection\npaired fastqs"];
  style 1 stroke:#2c3143,stroke-width:4px;
  2["ℹ️ Input Dataset\nDrosophila_melanogaster.BDGP6.32.109_UCSC.gtf.gz"];
  style 2 stroke:#2c3143,stroke-width:4px;
  3["Cutadapt: remove bad quality bp"];
  0 -->|output| 3;
  4["Flatten paired collection for Falco"];
  1 -->|output| 4;
  5["Cutadapt"];
  1 -->|output| 5;
  6["Get gene length"];
  2 -->|output| 6;
  077640cc-edbb-4185-9eb1-d11b522774af["Output\nGene length"];
  6 --> 077640cc-edbb-4185-9eb1-d11b522774af;
  style 077640cc-edbb-4185-9eb1-d11b522774af stroke:#2c3143,stroke-width:4px;
  7["convert gtf to bed12"];
  2 -->|output| 7;
  8["STAR: map single reads"];
  2 -->|output| 8;
  3 -->|out1| 8;
  9["Merge fastqs for Falco"];
  4 -->|output| 9;
  0 -->|output| 9;
  10["Merge Cutadapt reports"];
  5 -->|report| 10;
  3 -->|report| 10;
  11["STAR: map paired reads"];
  2 -->|output| 11;
  5 -->|out_pairs| 11;
  12["count reads per gene for SR"];
  8 -->|mapped_reads| 12;
  2 -->|output| 12;
  13["Falco"];
  9 -->|output| 13;
  14["Combine cutadapt results"];
  10 -->|output| 14;
  cab760db-5c9d-4a3c-b768-998bfbac6b57["Output\nmultiqc_cutadapt_html"];
  14 --> cab760db-5c9d-4a3c-b768-998bfbac6b57;
  style cab760db-5c9d-4a3c-b768-998bfbac6b57 stroke:#2c3143,stroke-width:4px;
  15["Merge STAR logs"];
  11 -->|output_log| 15;
  8 -->|output_log| 15;
  16["Merge STAR counts"];
  8 -->|reads_per_gene| 16;
  11 -->|reads_per_gene| 16;
  17["count fragments per gene for PE"];
  11 -->|mapped_reads| 17;
  2 -->|output| 17;
  1527b5d7-1681-4934-9d9e-3a5f86ae0fee["Output\nfeatureCounts_gene_length"];
  17 --> 1527b5d7-1681-4934-9d9e-3a5f86ae0fee;
  style 1527b5d7-1681-4934-9d9e-3a5f86ae0fee stroke:#2c3143,stroke-width:4px;
  18["Merge STAR BAM"];
  11 -->|mapped_reads| 18;
  8 -->|mapped_reads| 18;
  802017f4-fb1a-4243-b50d-2ed46f746f11["Output\nSTAR_BAM"];
  18 --> 802017f4-fb1a-4243-b50d-2ed46f746f11;
  style 802017f4-fb1a-4243-b50d-2ed46f746f11 stroke:#2c3143,stroke-width:4px;
  19["merge coverage unique strand 1"];
  8 -->|signal_unique_str1| 19;
  11 -->|signal_unique_str1| 19;
  20["merge coverage unique strand 2"];
  8 -->|signal_unique_str2| 20;
  11 -->|signal_unique_str2| 20;
  21["Combine FastQC results"];
  13 -->|text_file| 21;
  791893e6-6e65-49fe-b71f-4a7b6482e0ce["Output\nmultiqc_falco_html"];
  21 --> 791893e6-6e65-49fe-b71f-4a7b6482e0ce;
  style 791893e6-6e65-49fe-b71f-4a7b6482e0ce stroke:#2c3143,stroke-width:4px;
  22["Combine STAR Results"];
  15 -->|output| 22;
  204e3f6c-6f54-46f0-b07c-1f31113265e7["Output\nmultiqc_star_html"];
  22 --> 204e3f6c-6f54-46f0-b07c-1f31113265e7;
  style 204e3f6c-6f54-46f0-b07c-1f31113265e7 stroke:#2c3143,stroke-width:4px;
  23["Remove statistics from STAR counts"];
  16 -->|output| 23;
  24["Determine library strandness with STAR"];
  16 -->|output| 24;
  fe7b84dd-4466-4fe7-94a8-408f4ac7ed1a["Output\nmultiqc_star_counts_html"];
  24 --> fe7b84dd-4466-4fe7-94a8-408f4ac7ed1a;
  style fe7b84dd-4466-4fe7-94a8-408f4ac7ed1a stroke:#2c3143,stroke-width:4px;
  25["merge counts from featureCounts"];
  12 -->|output_short| 25;
  17 -->|output_short| 25;
  c82388f8-cb09-4fdf-8a0e-03cdad579f37["Output\nfeatureCounts"];
  25 --> c82388f8-cb09-4fdf-8a0e-03cdad579f37;
  style c82388f8-cb09-4fdf-8a0e-03cdad579f37 stroke:#2c3143,stroke-width:4px;
  26["merge featureCounts summary"];
  12 -->|output_summary| 26;
  17 -->|output_summary| 26;
  27["Determine library strandness with Infer Experiment"];
  18 -->|output| 27;
  7 -->|bed_file| 27;
  940ec3ec-dd2e-4d50-bbc4-756945eb16b2["Output\ninferexperiment"];
  27 --> 940ec3ec-dd2e-4d50-bbc4-756945eb16b2;
  style 940ec3ec-dd2e-4d50-bbc4-756945eb16b2 stroke:#2c3143,stroke-width:4px;
  28["Read Distribution"];
  18 -->|output| 28;
  7 -->|bed_file| 28;
  29["Compute read distribution statistics"];
  18 -->|output| 29;
  7 -->|bed_file| 29;
  30["sample BAM"];
  18 -->|output| 30;
  31["Get reads number per chromosome"];
  18 -->|output| 31;
  32["Remove duplicates"];
  18 -->|output| 32;
  33["Determine library strandness with STAR coverage"];
  19 -->|output| 33;
  20 -->|output| 33;
  2 -->|output| 33;
  89e1b053-03c2-467a-95a0-d2dc404670ec["Output\npgt"];
  33 --> 89e1b053-03c2-467a-95a0-d2dc404670ec;
  style 89e1b053-03c2-467a-95a0-d2dc404670ec stroke:#2c3143,stroke-width:4px;
  34["Select unstranded counts"];
  23 -->|outfile| 34;
  bce755be-ac3b-4346-9ac5-1128a287bf00["Output\ncounts_from_star"];
  34 --> bce755be-ac3b-4346-9ac5-1128a287bf00;
  style bce755be-ac3b-4346-9ac5-1128a287bf00 stroke:#2c3143,stroke-width:4px;
  35["Sort counts to get gene with highest count on feature Counts"];
  25 -->|output| 35;
  6aeb4dd1-445f-4c66-b1ce-4bb8faac53db["Output\nfeatureCounts_sorted"];
  35 --> 6aeb4dd1-445f-4c66-b1ce-4bb8faac53db;
  style 6aeb4dd1-445f-4c66-b1ce-4bb8faac53db stroke:#2c3143,stroke-width:4px;
  36["Combine read asignments statistics"];
  26 -->|output| 36;
  fc72242a-f23c-4ceb-9a8b-5280343ea5d6["Output\nmultiqc_featureCounts_html"];
  36 --> fc72242a-f23c-4ceb-9a8b-5280343ea5d6;
  style fc72242a-f23c-4ceb-9a8b-5280343ea5d6 stroke:#2c3143,stroke-width:4px;
  37["Combine read distribution on known features"];
  29 -->|output| 37;
  07dca732-0ac7-432e-9e61-2b77f921a23b["Output\nmultiqc_read_distrib"];
  37 --> 07dca732-0ac7-432e-9e61-2b77f921a23b;
  style 07dca732-0ac7-432e-9e61-2b77f921a23b stroke:#2c3143,stroke-width:4px;
  38["Get gene body coverage"];
  30 -->|outputsam| 38;
  7 -->|bed_file| 38;
  39["Combine results on reads per chromosome"];
  31 -->|output| 39;
  7bfa8ae7-8ffd-46a1-a56e-815ed2c9f1cf["Output\nmultiqc_reads_per_chrom"];
  39 --> 7bfa8ae7-8ffd-46a1-a56e-815ed2c9f1cf;
  style 7bfa8ae7-8ffd-46a1-a56e-815ed2c9f1cf stroke:#2c3143,stroke-width:4px;
  40["Combine results of duplicate reads"];
  32 -->|metrics_file| 40;
  66553d0f-e851-458b-82c2-f9b30e394bac["Output\nmultiqc_dup"];
  40 --> 66553d0f-e851-458b-82c2-f9b30e394bac;
  style 66553d0f-e851-458b-82c2-f9b30e394bac stroke:#2c3143,stroke-width:4px;
  41["Sort counts to get gene with highest count on STAR"];
  34 -->|out_file1| 41;
  383df008-0ccb-4d67-98dd-33fa5e2db81e["Output\ncounts_from_star_sorted"];
  41 --> 383df008-0ccb-4d67-98dd-33fa5e2db81e;
  style 383df008-0ccb-4d67-98dd-33fa5e2db81e stroke:#2c3143,stroke-width:4px;
  42["Combine gene body coverage"];
  38 -->|outputtxt| 42;
  8544ea5c-faf2-44c9-85d6-40658fc9b9eb["Output\nmultiqc_gene_body_cov"];
  42 --> 8544ea5c-faf2-44c9-85d6-40658fc9b9eb;
  style 8544ea5c-faf2-44c9-85d6-40658fc9b9eb stroke:#2c3143,stroke-width:4px;

Inputs

Input	Label
Input dataset collection	single fastqs
Input dataset collection	paired fastqs
Input dataset	Drosophila_melanogaster.BDGP6.32.109_UCSC.gtf.gz

Outputs

From	Output	Label
toolshed.g2.bx.psu.edu/repos/iuc/length_and_gc_content/length_and_gc_content/0.1.2	Gene length and GC content	Get gene length
toolshed.g2.bx.psu.edu/repos/iuc/gtftobed12/gtftobed12/357	Convert GTF to BED12	convert gtf to bed12
toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.24.1+galaxy0	MultiQC	Combine cutadapt results
toolshed.g2.bx.psu.edu/repos/iuc/featurecounts/featurecounts/2.0.3+galaxy2	featureCounts	count fragments per gene for PE
__MERGE_COLLECTION__	Merge collections	Merge STAR BAM
toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.24.1+galaxy0	MultiQC	Combine FastQC results
toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.24.1+galaxy0	MultiQC	Combine STAR Results
toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.24.1+galaxy0	MultiQC	Determine library strandness with STAR
__MERGE_COLLECTION__	Merge collections	merge counts from featureCounts
toolshed.g2.bx.psu.edu/repos/nilesh/rseqc/rseqc_infer_experiment/5.0.3+galaxy0	Infer Experiment	Determine library strandness with Infer Experiment
toolshed.g2.bx.psu.edu/repos/nilesh/rseqc/rseqc_read_distribution/5.0.3+galaxy0	Read Distribution
toolshed.g2.bx.psu.edu/repos/devteam/picard/picard_MarkDuplicates/3.1.1.0	MarkDuplicates	Remove duplicates
toolshed.g2.bx.psu.edu/repos/iuc/pygenometracks/pygenomeTracks/3.8+galaxy2	pyGenomeTracks	Determine library strandness with STAR coverage
Cut1	Cut	Select unstranded counts
toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_sort_header_tool/9.3+galaxy1	Sort	Sort counts to get gene with highest count on feature Counts
toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.24.1+galaxy0	MultiQC	Combine read asignments statistics
toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.24.1+galaxy0	MultiQC	Combine read distribution on known features
toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.24.1+galaxy0	MultiQC	Combine results on reads per chromosome
toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.24.1+galaxy0	MultiQC	Combine results of duplicate reads
toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_sort_header_tool/9.3+galaxy1	Sort	Sort counts to get gene with highest count on STAR
toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.24.1+galaxy0	MultiQC	Combine gene body coverage
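
The "Remove statistics from STAR counts", "Select unstranded counts" and "Sort counts to get gene with highest count on STAR" steps can be pictured with a small Python sketch. It assumes STAR's usual per-gene count table layout: four summary rows whose names start with N_, then one row per gene with the columns gene ID, unstranded count, first-strand count, second-strand count. The function name and the toy values are hypothetical. The workflow itself performs these steps with Galaxy text-processing tools, not with this code, and their exact parameters are not shown on this page.

```python
def select_unstranded_counts(lines):
    """Drop STAR's N_* summary rows and keep (gene_id, unstranded_count)."""
    counts = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip empty lines and the summary rows (N_unmapped, N_multimapping, ...).
        if not fields[0] or fields[0].startswith("N_"):
            continue
        # Column 2 of the table holds the unstranded count.
        counts.append((fields[0], int(fields[1])))
    return counts


# Toy data in the assumed STAR layout (all values are made up).
example = [
    "N_unmapped\t100\t100\t100",
    "N_multimapping\t50\t50\t50",
    "N_noFeature\t30\t400\t35",
    "N_ambiguous\t5\t1\t2",
    "FBgn0000003\t12\t0\t12",
    "FBgn0000008\t7\t1\t6",
]

unstranded = select_unstranded_counts(example)
# Sort so that the gene with the highest count comes first.
unstranded_sorted = sorted(unstranded, key=lambda item: item[1], reverse=True)
print(unstranded_sorted)
```

This sketch filters summary rows by their N_ prefix, whereas a tool that simply skips the first four lines would rely on their position instead; both give the same result for a well-formed table.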

Tools

Tool	Links
Cut1
__FLATTEN__
__MERGE_COLLECTION__
toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_sort_header_tool/9.3+galaxy1	View in ToolShed
toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_tail_tool/9.3+galaxy1	View in ToolShed
toolshed.g2.bx.psu.edu/repos/devteam/picard/picard_MarkDuplicates/3.1.1.0	View in ToolShed
toolshed.g2.bx.psu.edu/repos/devteam/samtools_idxstats/samtools_idxstats/2.0.5	View in ToolShed
toolshed.g2.bx.psu.edu/repos/iuc/falco/falco/1.2.4+galaxy0	View in ToolShed
toolshed.g2.bx.psu.edu/repos/iuc/featurecounts/featurecounts/2.0.3+galaxy2	View in ToolShed
toolshed.g2.bx.psu.edu/repos/iuc/gtftobed12/gtftobed12/357	View in ToolShed
toolshed.g2.bx.psu.edu/repos/iuc/length_and_gc_content/length_and_gc_content/0.1.2	View in ToolShed
toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.24.1+galaxy0	View in ToolShed
toolshed.g2.bx.psu.edu/repos/iuc/pygenometracks/pygenomeTracks/3.8+galaxy2	View in ToolShed
toolshed.g2.bx.psu.edu/repos/iuc/rgrnastar/rna_star/2.7.11a+galaxy0	View in ToolShed
toolshed.g2.bx.psu.edu/repos/iuc/samtools_view/samtools_view/1.15.1+galaxy2	View in ToolShed
toolshed.g2.bx.psu.edu/repos/lparsons/cutadapt/cutadapt/4.9+galaxy1	View in ToolShed
toolshed.g2.bx.psu.edu/repos/nilesh/rseqc/rseqc_geneBody_coverage/5.0.3+galaxy0	View in ToolShed
toolshed.g2.bx.psu.edu/repos/nilesh/rseqc/rseqc_infer_experiment/5.0.3+galaxy0	View in ToolShed
toolshed.g2.bx.psu.edu/repos/nilesh/rseqc/rseqc_read_distribution/5.0.3+galaxy0	View in ToolShed
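
The ToolShed identifiers in the table above share a visible pattern: host, the literal "repos", repository owner, repository name, tool name, version. The following Python sketch splits an identifier along that pattern. The pattern is inferred from the identifiers listed here, not from a formal specification, and the class and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class ToolId(NamedTuple):
    tool_shed: Optional[str]
    owner: Optional[str]
    repository: Optional[str]
    tool: str
    version: Optional[str]


def parse_tool_id(tool_id: str) -> ToolId:
    """Split a tool identifier into its parts (assumed six-part layout)."""
    parts = tool_id.split("/")
    if len(parts) == 6 and parts[1] == "repos":
        shed, _, owner, repository, tool, version = parts
        return ToolId(shed, owner, repository, tool, version)
    # Built-in tools such as Cut1 or __FLATTEN__ carry no ToolShed path.
    return ToolId(None, None, None, tool_id, None)


parsed = parse_tool_id(
    "toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.24.1+galaxy0"
)
print(parsed.owner, parsed.repository, parsed.version)
print(parse_tool_id("Cut1"))
```

Grouping the parsed identifiers by owner or repository makes it easy to see, for example, that every MultiQC step in this workflow uses the same tool version.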

To use this workflow in Galaxy, either download the workflow file, or right-click the workflow link, copy its address, and paste it into Galaxy's workflow import form.

Importing into Galaxy

The steps below import the workflow directly into the Galaxy server of your choice.

Hands On: Importing a workflow

1. Click Workflows in the Galaxy activity bar (on the left side of the screen, or in the top menu bar of older Galaxy instances). A list of all your workflows appears.

2. Click Import at the top-right of the screen.

3. Provide your workflow:

   Option 1: Paste the URL of the workflow into the box labelled “Archived Workflow URL”

   Option 2: Upload the workflow file in the box labelled “Archived Workflow File”

4. Click the Import workflow button.

Below is a short video demonstrating how to import a workflow from GitHub using this procedure:

Video: Importing a workflow from URL

Version History

Version	Commit	Time	Comments
11	f81845b85	2025-01-21 10:07:17	Use Falco instead of FastQC in ref-based tutorial
10	9a19075e2	2024-10-18 13:22:04	Update ref-based workflows
9	a1251f286	2024-07-05 09:38:54	Removed 'comments' tags
8	d804d52ac	2024-07-05 09:22:56	Updated tools in 'QC + Mapping + Counting (single+paired)' workflow
7	41dead43e	2023-05-02 10:31:07	add mo orcid to workflows
6	36eb5cf82	2023-04-28 17:26:00	update workflows and tests
5	8fc9c9026	2023-04-25 07:46:15	add creators and licence to workflows
4	dc21d9ddb	2023-04-22 08:29:08	update images and results, rearrange workflow for part1
3	9921a8623	2023-04-21 12:37:10	Update first part of the tutorial
2	4d2f611a6	2022-04-28 15:20:51	subset BAM before gene body coverage
1	8bf6877e4	2022-04-15 11:16:13	add workflow for PE and SE in parallel

For Admins

Installing the workflow tools

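# Download the workflow definition
wget https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/ref-based/workflows/qc-mapping-counting-paired-and-single.ga -O workflow.ga
# Extract the list of ToolShed tools the workflow needs (Ephemeris command)
workflow-to-tools -w workflow.ga -o tools.yaml
# Install those tools; replace GALAXY with the server URL and API_KEY with an admin API key
shed-tools install -g GALAXY -a API_KEY -t tools.yaml
# Import the workflow into the server and publish it
workflow-install -g GALAXY -a API_KEY -w workflow.ga --publish-workflows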
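
Before installing anything, it can help to preview which ToolShed repositories the downloaded workflow file refers to. The Python sketch below assumes the usual layout of a Galaxy .ga file: JSON with a "steps" mapping, in which ToolShed-based steps carry a "tool_shed_repository" block with "owner", "name" and "changeset_revision". The function names and the toy workflow, including its revision strings, are made up for illustration. This is not how workflow-to-tools is implemented.

```python
import json


def load_workflow(path):
    """Read a .ga workflow file (JSON) into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def list_toolshed_tools(workflow):
    """Return sorted unique (owner, name, changeset_revision) tuples."""
    tools = set()
    for step in workflow.get("steps", {}).values():
        repo = step.get("tool_shed_repository")
        if repo:  # inputs and built-in tools have no repository block
            revision = repo.get("changeset_revision") or ""
            tools.add((repo["owner"], repo["name"], revision))
    return sorted(tools)


# Toy workflow in the assumed .ga layout (revisions are placeholders).
toy_workflow = {
    "a_galaxy_workflow": "true",
    "steps": {
        "0": {"type": "data_collection_input", "tool_id": None},
        "1": {"tool_shed_repository": {
            "owner": "lparsons", "name": "cutadapt",
            "changeset_revision": "rev2"}},
        "2": {"tool_shed_repository": {
            "owner": "iuc", "name": "multiqc",
            "changeset_revision": "rev1"}},
        "3": {"tool_shed_repository": {
            "owner": "iuc", "name": "multiqc",
            "changeset_revision": "rev1"}},
        "4": {"tool_id": "Cut1"},
    },
}

toy_tools = list_toolshed_tools(toy_workflow)
print(toy_tools)
```

For the real file, `list_toolshed_tools(load_workflow("workflow.ga"))` would apply the same logic to the workflow fetched by the wget command.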