RNAseq_SR

This workflow takes a list of single-read FASTQ files as input. Adapters and low-quality bases are removed with cutadapt. Reads are mapped with STAR using ENCODE parameters; in the same step, STAR counts reads per gene and computes normalized coverage (per million mapped reads) from uniquely mapped reads. The counts are reprocessed to resemble HTSeq-count output. FPKM values are computed with Cufflinks and/or StringTie. The unstranded normalized coverage is computed with bedtools.
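
As an illustration of the count reprocessing mentioned above, here is a minimal Python sketch. It assumes the usual STAR ReadsPerGene.out.tab layout (gene ID, unstranded counts, counts for reads on the same strand as the gene, counts for reads on the opposite strand) with four N_* summary rows at the top. The function name, the toy data, and the choice to drop the summary rows are assumptions made for this example and may differ from what the workflow actually does.

```python
# Zero-based column index in STAR's ReadsPerGene.out.tab for each
# strandedness setting (column 0 holds the gene ID).
STAR_COUNT_COLUMN = {"unstranded": 1, "forward": 2, "reverse": 3}


def star_counts_to_htseq_like(lines, strandedness):
    """Return (gene_id, count) pairs for the chosen strandedness.

    Hypothetical helper: summary rows starting with "N_" are skipped.
    """
    column = STAR_COUNT_COLUMN[strandedness]
    counts = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or fields[0].startswith("N_"):
            continue
        counts.append((fields[0], int(fields[column])))
    return counts


# Toy input, not real data.
example = [
    "N_unmapped\t10\t10\t10",
    "N_multimapping\t5\t5\t5",
    "N_noFeature\t3\t40\t4",
    "N_ambiguous\t1\t0\t1",
    "GeneA\t50\t2\t48",
    "GeneB\t7\t7\t0",
]

for gene, count in star_counts_to_htseq_like(example, "reverse"):
    print(f"{gene}\t{count}")
```

With the toy input, a reverse-stranded library yields 48 for GeneA and 0 for GeneB, while an unstranded library would yield 50 and 7.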

Author(s):
Lucille Delisle
Release: 0.9
License: MIT
UniqueID: 6a424c0e-eb24-4f61-8750-adc09f4db1b1

RNA-seq single-read Workflow

Input datasets

The workflow needs a list of datasets in fastqsanger format,
as well as a GTF file with gene annotations.
Optional, but recommended: a GTF file with regions to exclude from normalization in Cufflinks.
- For instance, a GTF that masks chrM for the mm10 genome:

chrM	chrM_gene	exon	0	16299	.	+	.	gene_id "chrM_gene_plus"; transcript_id "chrM_tx_plus"; exon_id "chrM_ex_plus";
chrM	chrM_gene	exon	0	16299	.	-	.	gene_id "chrM_gene_minus"; transcript_id "chrM_tx_minus"; exon_id "chrM_ex_minus";
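
To build a similar mask for another sequence, the two lines can be generated programmatically. The sketch below is only an illustration: the function name and its arguments are hypothetical, and the coordinates simply mirror the mm10 example above.

```python
def mask_gtf_lines(chrom, start, end):
    """Build one exon line per strand covering chrom:start-end.

    Hypothetical helper; the naming scheme copies the chrM example.
    """
    lines = []
    for sign, label in (("+", "plus"), ("-", "minus")):
        attributes = (
            f'gene_id "{chrom}_gene_{label}"; '
            f'transcript_id "{chrom}_tx_{label}"; '
            f'exon_id "{chrom}_ex_{label}";'
        )
        fields = [
            chrom,            # sequence name
            f"{chrom}_gene",  # source
            "exon",           # feature type
            str(start),
            str(end),
            ".",              # score
            sign,             # strand
            ".",              # frame
            attributes,
        ]
        lines.append("\t".join(fields))
    return lines


for line in mask_gtf_lines("chrM", 0, 16299):
    print(line)
```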

Input values

forward adapter sequence: this depends on the library preparation. Classical Illumina RNA libraries usually use TruSeq adapters, while ISML (a relatively new Illumina library type) uses Nextera adapters. If you don't know, use FastQC to determine whether it is TruSeq or Nextera. If the reads are relatively short (50 bp), they probably contain no adapter, so the choice will not affect your results.
reference_genome: the choices offered in this field depend on the genomes available for STAR.
strandedness: for stranded RNA, reverse means that the read is complementary to the coding sequence, and forward means that the read is in the same orientation as the coding sequence. Only alignments compatible with your library preparation strategy are counted. The same setting is used for the stranded coverage and for FPKM computation with Cufflinks/StringTie.
cufflinks_FPKM: whether to compute FPKM with Cufflinks (this step takes a long time).
stringtie_FPKM: whether to compute FPKM, TPM, and related values with StringTie.
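
For readers unfamiliar with the units, the sketch below shows the textbook definitions of FPKM and TPM computed from raw counts and feature lengths. It is not how Cufflinks or StringTie estimate abundances (both rely on their own models); the function names and the toy numbers are made up for illustration, and the code assumes at least one non-zero count.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase of feature per million counted fragments."""
    total = sum(counts.values())
    return {g: counts[g] * 1e9 / (lengths_bp[g] * total) for g in counts}


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalized rates rescaled to sum to 1e6."""
    rate = {g: counts[g] / lengths_bp[g] for g in counts}
    total_rate = sum(rate.values())
    return {g: rate[g] * 1e6 / total_rate for g in counts}


# Toy values, not real data.
counts = {"GeneA": 100, "GeneB": 300}
lengths = {"GeneA": 1000, "GeneB": 2000}

print(fpkm(counts, lengths))
print(tpm(counts, lengths))
```

TPM values always sum to one million within a sample, which is not true of FPKM values.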

Processing

The workflow removes adapters and low-quality bases and discards any read shorter than 15 bp.
The filtered reads are mapped with STAR using ENCODE parameters (intended for long RNA-seq, but I use them for short RNA-seq as well). STAR is also used to count reads per gene and to generate strand-specific normalized coverage (on uniquely mapped reads).
MultiQC is run to give an overview of the QC. Its report can also be used to determine the strandedness.
FPKM values for genes and transcripts are computed with Cufflinks, with correction for multi-mapped reads (this step is optional).
FPKM/TPM values for genes are computed with StringTie (this step is optional).
The BAM is filtered to keep only uniquely mapped reads (tag NH:i:1).
Unstranded coverage is computed with bedtools and normalized per million uniquely mapped reads.
The three coverage files are converted to bigWig.
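
The filtering and normalization steps above (keeping reads tagged NH:i:1, then scaling coverage per million uniquely mapped reads) can be sketched as follows. To stay dependency-free, this example works on SAM text lines and bedGraph-like tuples; the workflow itself relies on dedicated Galaxy tools for these steps. The function names and the toy records are hypothetical.

```python
def is_uniquely_mapped(sam_line):
    """True if the alignment record carries the optional tag NH:i:1."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional tags start at the 12th column of a SAM record.
    return "NH:i:1" in fields[11:]


def normalize_coverage(intervals, n_unique_reads):
    """Scale (chrom, start, end, depth) tuples to depth per million unique reads."""
    scale = 1e6 / n_unique_reads
    return [(chrom, start, end, depth * scale)
            for chrom, start, end, depth in intervals]


# Toy SAM records, not real data.
sam_lines = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1\tHI:i:1",
    "r2\t0\tchr1\t200\t3\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:2\tHI:i:1",
    "r3\t16\tchr1\t300\t255\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1\tHI:i:1",
]

unique = [line for line in sam_lines
          if not line.startswith("@") and is_uniquely_mapped(line)]
print(len(unique), "uniquely mapped reads")

# A depth of 4 with 2,000,000 unique reads becomes 2.0 per million.
print(normalize_coverage([("chr1", 0, 10, 4)], 2_000_000))
```

Comparing whole tag strings (rather than substrings) prevents NH:i:10 from being mistaken for NH:i:1.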

Warning

The stranded coverage output depends on the strandedness of the library:
- If you have an unstranded library, the stranded coverages are meaningless.
- If you have a forward-stranded library, the label matches the orientation of the reads.
- If you have a reverse-stranded library, forward should correspond to genes on the forward strand and uses the reads mapped on the reverse strand; reverse should correspond to genes on the reverse strand and uses the reads mapped on the forward strand.
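
The rule above can be summarized as a small lookup. The sketch below uses hypothetical names: given the library strandedness and the strand on which reads were mapped, it returns the strand of the genes those reads are expected to come from, or None when the library carries no strand information.

```python
def gene_strand_for_reads(library, read_strand):
    """Return the expected gene strand for reads mapped on read_strand.

    library: "unstranded", "forward" or "reverse"
    read_strand: "forward" or "reverse"
    """
    if read_strand not in ("forward", "reverse"):
        raise ValueError(f"unknown read strand: {read_strand}")
    if library == "unstranded":
        # Read orientation says nothing about the gene strand.
        return None
    if library == "forward":
        # Reads are in the same orientation as the transcript.
        return read_strand
    if library == "reverse":
        # Reads are complementary to the transcript, so flip the strand.
        return "reverse" if read_strand == "forward" else "forward"
    raise ValueError(f"unknown library type: {library}")


for library in ("unstranded", "forward", "reverse"):
    for read_strand in ("forward", "reverse"):
        print(library, read_strand, gene_strand_for_reads(library, read_strand))
```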

Contribution

@lldelisle wrote the workflow and the tests.

@nagoue updated the tools, made the workflow run on usegalaxy.org, and fixed some best-practice issues.

Changelog

[0.9] 2024-09-23

Automatic update

toolshed.g2.bx.psu.edu/repos/lparsons/cutadapt/cutadapt/4.9+galaxy0 was updated to toolshed.g2.bx.psu.edu/repos/lparsons/cutadapt/cutadapt/4.9+galaxy1
toolshed.g2.bx.psu.edu/repos/iuc/rgrnastar/rna_star/2.7.11a+galaxy0 was updated to toolshed.g2.bx.psu.edu/repos/iuc/rgrnastar/rna_star/2.7.11a+galaxy1
toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.11+galaxy1 was updated to toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.24.1+galaxy0

[0.8] 2024-07-15

Automatic update

toolshed.g2.bx.psu.edu/repos/lparsons/cutadapt/cutadapt/4.8+galaxy1 was updated to toolshed.g2.bx.psu.edu/repos/lparsons/cutadapt/cutadapt/4.9+galaxy0

[0.7] 2024-06-25

Automatic update (triggered manually)

toolshed.g2.bx.psu.edu/repos/lparsons/cutadapt/cutadapt/4.6+galaxy1 was updated to toolshed.g2.bx.psu.edu/repos/lparsons/cutadapt/cutadapt/4.8+galaxy1
toolshed.g2.bx.psu.edu/repos/devteam/bamtools_filter/bamFilter/2.5.2+galaxy1 was updated to toolshed.g2.bx.psu.edu/repos/devteam/bamtools_filter/bamFilter/2.5.2+galaxy2
toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_awk_tool/9.3+galaxy0 was updated to toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_awk_tool/9.3+galaxy1
toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_find_and_replace/9.3+galaxy0 was updated to toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_find_and_replace/9.3+galaxy1
toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_genomecoveragebed/2.30.0 was updated to toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_genomecoveragebed/2.31.1
toolshed.g2.bx.psu.edu/repos/iuc/stringtie/stringtie/2.2.1+galaxy1 was updated to toolshed.g2.bx.psu.edu/repos/iuc/stringtie/stringtie/2.2.3+galaxy0

[0.6] 2024-02-05

Automatic update

toolshed.g2.bx.psu.edu/repos/lparsons/cutadapt/cutadapt/4.4+galaxy0 was updated to toolshed.g2.bx.psu.edu/repos/lparsons/cutadapt/cutadapt/4.6+galaxy1
toolshed.g2.bx.psu.edu/repos/iuc/map_param_value/map_param_value/0.1.1 was updated to toolshed.g2.bx.psu.edu/repos/iuc/map_param_value/map_param_value/0.2.0
toolshed.g2.bx.psu.edu/repos/iuc/rgrnastar/rna_star/2.7.10b+galaxy4 was updated to toolshed.g2.bx.psu.edu/repos/iuc/rgrnastar/rna_star/2.7.11a+galaxy0
toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_awk_tool/1.1.2 was updated to toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_awk_tool/9.3+galaxy0
toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_find_and_replace/1.1.4 was updated to toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_find_and_replace/9.3+galaxy0

[0.5] 2023-09-15

Automatic update

toolshed.g2.bx.psu.edu/repos/iuc/rgrnastar/rna_star/2.7.8a+galaxy1 was updated to toolshed.g2.bx.psu.edu/repos/iuc/rgrnastar/rna_star/2.7.10b+galaxy4

Manual update

Use STAR to compute normalized strand-specific coverage
Add an option to use StringTie to compute FPKM
Make cufflinks step optional

[0.4.1] 2023-09-14

Add the author to the Dockstore file

[0.4] 2023-01-16

Automatic update

toolshed.g2.bx.psu.edu/repos/devteam/bamtools_filter/bamFilter/2.5.1+galaxy0 was updated to toolshed.g2.bx.psu.edu/repos/devteam/bamtools_filter/bamFilter/2.5.2+galaxy1

[0.3] 2022-12-17

Automatic update

toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.11+galaxy0 was updated to toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.11+galaxy1

[0.2] 2022-11-02

Automatic update

toolshed.g2.bx.psu.edu/repos/iuc/rgrnastar/rna_star/2.7.8a+galaxy0 was updated to toolshed.g2.bx.psu.edu/repos/iuc/rgrnastar/rna_star/2.7.8a+galaxy1

[0.1] 2022-10-12

First release.
