sra_manifest_to_concatenated_fastqs_parallel

This workflow takes as input an SRA_manifest from the SRA Run Selector and generates one fastq file or one pair of fastq files for each experiment, concatenating multiple runs if necessary. Outputs are relabelled to match the column specified by the user.

Author(s):
Lucille Delisle
Pierre Osteil
Wolfgang Maier
Release: 0.7
License: MIT
UniqueID: 363898af-e598-4f0e-abd9-e6ded395ce66

SRA manifest to concatenated fastqs

This workflow takes as input an SRA manifest from the SRA Run Selector (or any tabular dataset with a header line), downloads all sequencing run data from the SRA and arranges it into per-sample fastq datasets or pairs of fastq datasets.

It will work out the relationship between runs and samples from the user-indicated run and sample columns in the input and will concatenate sequencing run data as needed to obtain per-sample datasets.

Input dataset

The workflow needs a single tabular input dataset with a header line, listing SRA run identifiers in one column and sample names in another.
SRA manifests obtained via the SRA Run Selector and turned into tabular format represent valid input.

Input values

Column number with SRA run ID

For manifests obtained through the SRA Run Selector this is column 1.

Column number with sample names

The number of the column that should be used to assign sequencing runs to samples. The names in this column will also serve as the labels of the datasets in the output collection. For manifests obtained through the SRA Run Selector, suitable columns might be number 6 (BioSample), 16 (Experiment) or 36 (Sample Name).
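
To illustrate how the two column numbers are interpreted, the following Python sketch groups run IDs by sample name from a tab-separated manifest. It is not the workflow's own implementation: the function name group_runs_by_sample and the three-column example manifest are hypothetical, and the only assumptions taken from the description above are that column numbers are 1-based and that the first line is a header.

```python
import csv
import io


def group_runs_by_sample(manifest_text, run_col, sample_col):
    """Map each sample name to the list of run IDs assigned to it.

    run_col and sample_col are 1-based column numbers; the first line
    of manifest_text is treated as a header and skipped.
    """
    reader = csv.reader(io.StringIO(manifest_text), delimiter="\t")
    next(reader)  # skip the header line
    groups = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        run_id = row[run_col - 1]
        sample = row[sample_col - 1]
        groups.setdefault(sample, []).append(run_id)
    return groups


# Hypothetical manifest: two runs belong to sampleA, one to sampleB.
manifest = (
    "Run\tAssay Type\tSample Name\n"
    "SRR0000001\tRNA-Seq\tsampleA\n"
    "SRR0000002\tRNA-Seq\tsampleA\n"
    "SRR0000003\tRNA-Seq\tsampleB\n"
)
groups = group_runs_by_sample(manifest, run_col=1, sample_col=3)
for sample, runs in groups.items():
    print(sample, runs)
```

Samples with more than one run ID in their list are the ones for which run data would need to be concatenated.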

Processing

The workflow downloads sequencing run data in fastq format with fasterq-dump (one job per SRA run ID).
Run data gets concatenated if it comes from the same sample.
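
The concatenation step can be pictured with the minimal Python sketch below, which appends per-run fastq files in order into a single per-sample file. This is an illustration only and not the tool the workflow uses; the function name concatenate_fastqs, the file names and the read contents are all made up for the example.

```python
import shutil
import tempfile
from pathlib import Path


def concatenate_fastqs(run_files, output_path):
    """Append the given fastq files, in order, into output_path."""
    with open(output_path, "wb") as out:
        for path in run_files:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return output_path


# Demo with two tiny, made-up single-read runs of one sample.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    run1 = tmp / "run1.fastq"
    run2 = tmp / "run2.fastq"
    run1.write_text("@read1\nACGT\n+\nIIII\n")
    run2.write_text("@read2\nTTGA\n+\nIIII\n")
    merged = concatenate_fastqs([run1, run2], tmp / "sampleA.fastq")
    merged_text = merged.read_text()

print(merged_text)
```

For paired-end data the same idea would be applied to the forward and reverse files separately, using the same run order for both so that mates stay in sync.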

Outputs

The workflow produces two outputs: one with paired-end datasets and one with single-read datasets.

Limitations

Special characters in sample names (anything that is not an English alphabet character, digit, underscore, dash, space, dot or comma, i.e. anything outside [a-zA-Z0-9_\- \.,]) will be converted to dashes (-).
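
To check in advance how a sample name would be affected, the rule can be approximated with the Python sketch below. It assumes that each disallowed character is replaced by exactly one dash, which the description above does not state explicitly; the function name sanitize_sample_name and the example name are hypothetical.

```python
import re

# Characters that are kept: letters, digits, underscore, dash, space, dot, comma.
# Everything else matches this negated character class.
_DISALLOWED = re.compile(r"[^a-zA-Z0-9_\- \.,]")


def sanitize_sample_name(name):
    """Replace every character outside the allowed set with a dash."""
    return _DISALLOWED.sub("-", name)


print(sanitize_sample_name("liver (rep 1) / day+3"))
# prints: liver -rep 1- - day-3
```

Note that two different sample names can map to the same label after conversion, so it may be worth checking the sample column for such collisions before running the workflow.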

Changelog

[0.7] 2024-06-17

Automatic update

toolshed.g2.bx.psu.edu/repos/artbio/concatenate_multiple_datasets/cat_multi_datasets/1.4.2 was updated to toolshed.g2.bx.psu.edu/repos/artbio/concatenate_multiple_datasets/cat_multi_datasets/1.4.3

[0.6] 2024-06-10

Automatic update

toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/split_file_to_collection/0.5.1 was updated to toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/split_file_to_collection/0.5.2
toolshed.g2.bx.psu.edu/repos/iuc/sra_tools/fasterq_dump/3.1.0+galaxy1 was updated to toolshed.g2.bx.psu.edu/repos/iuc/sra_tools/fasterq_dump/3.1.1+galaxy0

[0.5] 2024-04-22

Automatic update

toolshed.g2.bx.psu.edu/repos/iuc/sra_tools/fasterq_dump/3.1.0+galaxy0 was updated to toolshed.g2.bx.psu.edu/repos/iuc/sra_tools/fasterq_dump/3.1.0+galaxy1

[0.4] 2024-04-08

Automatic update

toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_find_and_replace/9.3+galaxy0 was updated to toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_find_and_replace/9.3+galaxy1
toolshed.g2.bx.psu.edu/repos/iuc/sra_tools/fasterq_dump/3.0.10+galaxy0 was updated to toolshed.g2.bx.psu.edu/repos/iuc/sra_tools/fasterq_dump/3.1.0+galaxy0

[0.3] 2024-03-11

Automatic update

toolshed.g2.bx.psu.edu/repos/iuc/sra_tools/fasterq_dump/3.0.8+galaxy1 was updated to toolshed.g2.bx.psu.edu/repos/iuc/sra_tools/fasterq_dump/3.0.10+galaxy0

[0.2.4] 2024-03-05

Automatic update

toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_find_and_replace/1.1.4 was updated to toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_find_and_replace/9.3+galaxy0

[0.2.3] 2024-02-05

Automatic update

toolshed.g2.bx.psu.edu/repos/iuc/map_param_value/map_param_value/0.1.1 was updated to toolshed.g2.bx.psu.edu/repos/iuc/map_param_value/map_param_value/0.2.0

[0.2.2] 2023-11-27

Automatic update

toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/split_file_to_collection/0.5.0 was updated to toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/split_file_to_collection/0.5.1

[0.2.1] 2023-11-20

Automatic update

toolshed.g2.bx.psu.edu/repos/iuc/sra_tools/fasterq_dump/3.0.8+galaxy0 was updated to toolshed.g2.bx.psu.edu/repos/iuc/sra_tools/fasterq_dump/3.0.8+galaxy1

[0.2] 2023-11-10

Automatic update

toolshed.g2.bx.psu.edu/repos/iuc/sra_tools/fasterq_dump/3.0.5+galaxy3 was updated to toolshed.g2.bx.psu.edu/repos/iuc/sra_tools/fasterq_dump/3.0.8+galaxy0
toolshed.g2.bx.psu.edu/repos/artbio/concatenate_multiple_datasets/cat_multi_datasets/1.4.1 was updated to toolshed.g2.bx.psu.edu/repos/artbio/concatenate_multiple_datasets/cat_multi_datasets/1.4.2

[0.1] 2023-10-23

First release.