scRNA-seq_preprocessing_10X_cellPlex

This workflow processes the CMO fastqs with CITE-seq-Count and includes the translation step required for CellPlex processing. In parallel, it processes the Gene Expression fastqs with STARsolo, filters cells with DropletUtils, and reformats all outputs so they can be read directly by the Seurat function 'Read10X'.

Author(s):
Lucille Delisle
Mehmet Tekman
Hans-Rudolf Hotz
Daniel Blankenberg
Wendi Bacon
Release: 0.5
License: MIT
UniqueID: 470536e7-ac58-4859-94d6-2085be9c2e17

Single-cell RNA-seq fastq to matrix for 10X data

These workflows are inspired by the training material, except that the output is in a 'bundle' format: three files (one matrix, one with genes, one with barcodes), similar to the cellranger output format.

Both are designed for fastqs from 10X v3 libraries. One is for regular 10X libraries (one library per sample), while the other is for CellPlex 10X libraries, which allow samples to be multiplexed using CMOs (see this blog article).

Input datasets

Specific for each experiment:
- For both workflows: a list of pairs of fastqs with gene expression reads.
- For CellPlex: in addition, a list of pairs of fastqs with CMO reads.
- For CellPlex: a list of csv files describing the samples and the CMOs used:
  - The first column is the sequence and the second column is the name.
  - /!\ The order of samples must be exactly the same in the collection of CMO fastqs and in the collection of csv files.
Common for all experiments:
- Gene annotations: A gtf file with gene locations
- List of barcodes used by 10X. You can download it at https://zenodo.org/record/3457880/files/3M-february-2018.txt.gz
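
The two-column CMO csv described above (sequence first, name second) can be sanity-checked before it is uploaded. The following is a minimal sketch, not part of the workflow: the function name `read_cmo_sheet` is hypothetical, the sequences and names in the example are placeholders rather than real CMO sequences, and it assumes the csv has no header line.

```python
import csv
import io
import re


def read_cmo_sheet(handle):
    """Return a list of (sequence, name) tuples from a two-column csv."""
    entries = []
    for line_number, row in enumerate(csv.reader(handle), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(
                f"line {line_number}: expected 2 columns, got {len(row)}"
            )
        sequence, name = (field.strip() for field in row)
        # The first column must be the sequence; this catches swapped columns.
        if not re.fullmatch(r"[ACGTN]+", sequence):
            raise ValueError(
                f"line {line_number}: {sequence!r} is not a DNA sequence"
            )
        entries.append((sequence, name))
    return entries


# Placeholder content: these are NOT real CMO sequences or sample names.
example_csv = "ACGTACGTACGTACG,sample_A\nTGCATGCATGCATGC,sample_B\n"
cmo_entries = read_cmo_sheet(io.StringIO(example_csv))
print(cmo_entries)
```

For a real file, pass `open("my_cmos.csv", newline="")` (a hypothetical path) instead of the in-memory example.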

Input values

reference genome: this genome needs to be available for STAR
Barcode Size is same size of the Read: set to true if the length of your GEX R1 exactly matches the size of the cell barcode + UMI. If your R1 contains trailing A, set to false.
number of cells: if this value is too large, no cell barcode correction will be performed when demultiplexing CMOs.
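
To decide the value of 'Barcode Size is same size of the Read', one option is to look at the R1 read lengths. The sketch below assumes 10X v3 chemistry (16 bp cell barcode + 12 bp UMI, so 28 bp in total) and only inspects the first reads. The function names are hypothetical and this check is not part of the workflow.

```python
import itertools

CELL_BARCODE_LENGTH = 16  # assumption: 10X v3 cell barcode length
UMI_LENGTH = 12           # assumption: 10X v3 UMI length


def r1_lengths(lines, max_reads=1000):
    """Return the set of sequence lengths among the first max_reads records."""
    # In a FASTQ file the sequence is the 2nd line of each 4-line record.
    sequences = itertools.islice(lines, 1, None, 4)
    return {len(seq.strip()) for seq in itertools.islice(sequences, max_reads)}


def barcode_size_is_read_size(lines, max_reads=1000):
    """True only if every inspected R1 is exactly cell barcode + UMI long."""
    return r1_lengths(lines, max_reads) == {CELL_BARCODE_LENGTH + UMI_LENGTH}


# Synthetic records for illustration only.
record_28 = ["@read1\n", "A" * 28 + "\n", "+\n", "I" * 28 + "\n"]
record_150 = ["@read2\n", "A" * 150 + "\n", "+\n", "I" * 150 + "\n"]

print(barcode_size_is_read_size(record_28 * 3))          # all reads are 28 bp
print(barcode_size_is_read_size(record_28 + record_150))  # one longer read
```

For a compressed file, the lines can come from `gzip.open(path, "rt")`.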

Processing

Gene expression processing:
- Reads are aligned to the genome and assigned to genes, cell barcodes and UMIs with STARsolo.
- MultiQC reports the mapping rate and the number of reads attributed to genes.
- The output of STARsolo is filtered with DropletUtils to remove cellular barcodes which are probably empty.
- The output of DropletUtils is reorganized as:

Main Collection:
    - Sample 1:
        - matrix.mtx
        - barcodes.tsv
        - genes.tsv
    - Sample 2:
        - matrix.mtx
        - barcodes.tsv
        - genes.tsv
...
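
The bundle layout above is meant to be read with 'Read10X' from Seurat. To illustrate what the three files contain, here is a minimal standard-library sketch that reads one sample folder. It assumes a Matrix Market coordinate file with genes as rows and barcodes as columns, and a tab-separated genes.tsv. The function name `read_bundle` and the tiny example values are made up for illustration.

```python
import tempfile
from pathlib import Path


def read_bundle(sample_dir):
    """Read one sample folder into (genes, barcodes, counts).

    counts maps (gene_index, barcode_index), both 0-based, to a value.
    """
    sample_dir = Path(sample_dir)
    barcodes = (sample_dir / "barcodes.tsv").read_text().split()
    genes = [
        line.split("\t")
        for line in (sample_dir / "genes.tsv").read_text().splitlines()
        if line
    ]
    counts = {}
    with open(sample_dir / "matrix.mtx") as handle:
        # Lines starting with '%' are the Matrix Market header and comments.
        data_lines = (line for line in handle if not line.startswith("%"))
        # The first data line gives: rows, columns, number of entries.
        n_rows, n_cols, n_entries = map(int, next(data_lines).split())
        for line in data_lines:
            if not line.strip():
                continue
            row, col, value = line.split()
            # Matrix Market indices are 1-based.
            counts[(int(row) - 1, int(col) - 1)] = float(value)
    if (n_rows, n_cols) != (len(genes), len(barcodes)):
        raise ValueError("matrix size does not match genes.tsv / barcodes.tsv")
    if n_entries != len(counts):
        raise ValueError("number of entries does not match the size line")
    return genes, barcodes, counts


# Tiny made-up bundle: 2 genes x 3 barcodes, 3 non-zero counts.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "Sample 1"
    sample.mkdir()
    (sample / "genes.tsv").write_text("GENE_ID_1\tGeneA\nGENE_ID_2\tGeneB\n")
    (sample / "barcodes.tsv").write_text("AAAC\nAAAG\nAAAT\n")
    (sample / "matrix.mtx").write_text(
        "%%MatrixMarket matrix coordinate integer general\n"
        "2 3 3\n"
        "1 1 5\n"
        "2 1 1\n"
        "1 3 2\n"
    )
    genes, barcodes, counts = read_bundle(sample)

print(len(genes), len(barcodes), counts)
```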

For the CellPlex workflow:

CMO processing:
- CITE-seq-Count is used to assign reads and generate a matrix where the 'genes' are the CMOs and 'unmapped'.
- Cellular barcodes are translated to match the cellular barcodes of gene expression (see this article).
- The output with UMI matrices is reorganized to match the structure of the gene expression matrices.
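
The translation step can be pictured as a lookup from each CMO cellular barcode to its gene expression counterpart. The sketch below is only an illustration under the assumption that a two-column translation table is available. It does not reproduce the exact rule the workflow applies, and the function name and the barcodes in the example are hypothetical.

```python
def translate_barcodes(cmo_barcodes, translation):
    """Map CMO cellular barcodes to gene expression barcodes.

    translation is a dict {cmo_barcode: gex_barcode}.
    Returns (translated, untranslated) so that nothing is dropped silently.
    """
    translated = []
    untranslated = []
    for barcode in cmo_barcodes:
        if barcode in translation:
            translated.append(translation[barcode])
        else:
            untranslated.append(barcode)
    return translated, untranslated


# Made-up 8 bp barcodes for illustration; real 10X barcodes are longer.
example_table = {
    "AAACCCTT": "AAACCCAA",
    "GGGTTTCC": "GGGTTTGG",
}
translated, untranslated = translate_barcodes(
    ["AAACCCTT", "GGGTTTCC", "TTTTTTTT"], example_table
)
print(translated)
print(untranslated)
```

Returning the untranslated barcodes separately makes it easy to check how many CMO barcodes have no gene expression counterpart.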

Test data

The test dataset has been made as small as possible so that the workflow passes on CI.

The CMO reads come from zenodo and have been subsampled to a fraction of 0.1 with seqtk.
The GEX reads come from SRR13948489 but have been subset to the cells selected in the above zenodo dataset.
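
The idea behind subsampling paired reads to a fraction can be sketched as follows. This is not seqtk's implementation: it is a plain-Python illustration in which each read pair is kept with probability `fraction`, and both mates are kept or dropped together so that R1 and R2 stay in sync. The function names, the seed value and the synthetic reads are assumptions made for this example.

```python
import random


def read_records(lines):
    """Group an iterable of FASTQ lines into 4-line records."""
    iterator = iter(lines)
    return zip(iterator, iterator, iterator, iterator)


def sample_pairs(r1_lines, r2_lines, fraction, seed=1):
    """Keep each read pair with probability `fraction`."""
    rng = random.Random(seed)  # fixed seed makes the selection reproducible
    kept = []
    for rec1, rec2 in zip(read_records(r1_lines), read_records(r2_lines)):
        if rng.random() < fraction:
            kept.append((rec1, rec2))
    return kept


# 50 synthetic read pairs for illustration only.
r1 = []
r2 = []
for i in range(50):
    r1 += [f"@read{i}", "ACGT", "+", "IIII"]
    r2 += [f"@read{i}", "TGCA", "+", "IIII"]

subset = sample_pairs(r1, r2, fraction=0.1)
print(len(subset), "pairs kept out of 50")
```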

Changelog

[0.5] 2024-09-25

Manual update

toolshed.g2.bx.psu.edu/repos/iuc/rna_starsolo/rna_starsolo/2.7.11a+galaxy0 was updated to toolshed.g2.bx.psu.edu/repos/iuc/rna_starsolo/rna_starsolo/2.7.11a+galaxy1
toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.11+galaxy1 was updated to toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.24.1+galaxy0

[0.4] 2024-04-08

Automatic update

toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_line/9.3+galaxy0 was updated to toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_line/9.3+galaxy1
toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_awk_tool/9.3+galaxy0 was updated to toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_awk_tool/9.3+galaxy1
toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_line/9.3+galaxy0 was updated to toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_line/9.3+galaxy1

[0.3] 2024-02-12

Automatic update

toolshed.g2.bx.psu.edu/repos/iuc/rna_starsolo/rna_starsolo/2.7.10b+galaxy4 was updated to toolshed.g2.bx.psu.edu/repos/iuc/rna_starsolo/rna_starsolo/2.7.11a+galaxy0
toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_line/1.1.2 was updated to toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_line/9.3+galaxy0
toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_awk_tool/1.1.2 was updated to toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_awk_tool/9.3+galaxy0

[0.2] 2024-02-05

Tool updates

toolshed.g2.bx.psu.edu/repos/iuc/rgrnastar/rna_star/2.7.10b+galaxy3 was updated to toolshed.g2.bx.psu.edu/repos/iuc/rgrnastar/rna_star/2.7.10b+galaxy4
pick_value was replaced by toolshed.g2.bx.psu.edu/repos/iuc/pick_value/pick_value/0.2.0

[0.1] 2023-12-21

First release.
