- Taxonomy Profiling and Visualization with Krona
- Pathogen Detection PathoGFAIR Samples Aggregation and Visualisation
- Nanopore Preprocessing
- Gene-based Pathogen Identification
- Allele-based Pathogen Identification
- bacterial_genome_annotation
- amr_gene_detection
- Quality and Contamination Control For Genome Assembly
- Genome assembly with Flye
- Assembly polishing with long reads
- Bacterial Genome Assembly using Shovill
- Mass spectrometry: LC-MS preprocessing with XCMS
- Mass spectrometry: GCMS with metaMS
- QIIME2 Ia: multiplexed data (single-end)
- QIIME2 Ib: multiplexed data (paired-end)
- QIIME2 Ic: Demultiplexed data (single-end)
- QIIME2 Id: Demultiplexed data (paired-end)
- QIIME2 IIa: Denoising (sequence quality control) and feature table creation (single-end)
- QIIME2 IIb: Denoising (sequence quality control) and feature table creation (paired-end)
- QIIME2-III-V-Phylogeny-Rarefaction-Taxonomic-Analysis
- QIIME2 VI: Diversity metrics and estimations
- dada2 amplicon analysis pipeline - for paired end data
- MetaProSIP OpenMS 2.8
- Clinical Metaproteomics Quantitation
- Create GRO and TOP complex files
- dcTMD calculations with GROMACS
- MMGBSA calculations with GROMACS
- Fragment-based virtual screening using rDock for docking and SuCOS for pose scoring
- Generic variation analysis on WGS PE data
- Generic variation analysis reporting
- Segmentation and counting of cell nuclei in fluorescence microscopy images
- Parallel Accession Download
- sra_manifest_to_concatenated_fastqs_parallel
- Pox Virus Illumina Amplicon Workflow from half-genomes
- scRNA-seq_preprocessing_10X_cellPlex
- scRNA-seq_preprocessing_10X_v3_Bundle
- Velocyto-on10X-from-bundled
- Velocyto-on10X-filtered-barcodes
- baredSC_1d_logNorm
- baredSC_2d_logNorm
- COVID-19: variation analysis on WGS PE data
- COVID-19: variation analysis reporting
- COVID-19: consensus construction
- COVID-19: variation analysis on ARTIC PE data
- SARS-CoV-2 Illumina Amplicon pipeline - iVar based
- COVID-19: variation analysis on WGS SE data
- COVID-19: variation analysis of ARTIC ONT data
- ATACseq
- ChIPseq_SR
- Get Confident Peaks From ChIP_SR replicates
- Get Confident Peaks From ChIP_PE replicates
- Get Confident Peaks From ATAC or CUTandRUN replicates
- Hi-C_fastqToCool_hicup_cooler
- cHi-C_fastqToCool_hicup_cooler
- Hi-C_juicermediumtabixToCool_cooler
- Hi-C_fastqToPairs_hicup
- CUTandRUN
- ChIPseq_PE
- Average Bigwig between replicates
- Purge-duplicate-contigs-VGP6
- Assembly-Hifi-only-VGP3
- Scaffolding with Hi-C data VGP8
- Mitogenome-Assembly-VGP0
- Assembly-Hifi-HiC-phasing-VGP4
- Generate Nx and Size plots for multiple assemblies
- Purging-duplicates-one-haplotype-VGP6b
- Assembly-Hifi-Trio-phasing-VGP5
- Scaffolding-BioNano-VGP7
- Assembly-decontamination-VGP9
- kmer-profiling-hifi-trio-VGP2
- kmer-profiling-hifi-VGP1
- RNAseq_PE
- RNAseq_SR
- BREW3R
- Repeat masking with RepeatModeler and RepeatMasker
Taxonomy Profiling and Visualization with Krona
Pathogen Detection PathoGFAIR Samples Aggregation and Visualisation
Pathogens of all samples report generation and visualization
Nanopore Preprocessing
Microbiome - QC and Contamination Filtering
Gene-based Pathogen Identification
Nanopore datasets analysis - Phylogenetic Identification - antibiotic resistance genes detection and contigs building
Allele-based Pathogen Identification
Microbiome - Variant calling and Consensus Building
bacterial_genome_annotation
Annotation of assembled bacterial genomes to detect genes, potential plasmids, integrons, and insertion sequence (IS) elements.
amr_gene_detection
Antimicrobial resistance gene detection from assembled bacterial genomes
Quality and Contamination Control For Genome Assembly
Short paired-end read analysis providing quality analysis, read cleaning, and taxonomic assignment
Genome assembly with Flye
Assemble long reads with Flye, then view assembly statistics and assembly graph
Assembly polishing with long reads
Polish the assembly with Racon using long reads, repeated four times
Bacterial Genome Assembly using Shovill
Assembly of bacterial paired-end short read data with generation of quality metrics and reports
Mass spectrometry: LC-MS preprocessing with XCMS
This workflow is built around the XCMS R package (Smith, C.A. 2006), which can extract, filter, align, and fill gaps, with the option to annotate isotopes, adducts, and fragments using the CAMERA R package (Kuhl, C. 2012). https://training.galaxyproject.org/training-material/topics/metabolomics/tutorials/lcms-preprocessing/tutorial.html
Mass spectrometry: GCMS with metaMS
This workflow is built around the XCMS R package (Smith, C.A. 2006) for extraction and the metaMS R package (Wehrens, R. 2014), and is intended for untargeted metabolomics. https://training.galaxyproject.org/training-material/topics/metabolomics/tutorials/gcms/tutorial.html
QIIME2 Ia: multiplexed data (single-end)
Importing single-end multiplexed data (not demultiplexed yet)
QIIME2 Ib: multiplexed data (paired-end)
Importing paired-end multiplexed data (not demultiplexed yet)
QIIME2 Ic: Demultiplexed data (single-end)
Importing demultiplexed data (single-end)
QIIME2 Id: Demultiplexed data (paired-end)
Importing demultiplexed data (paired-end)
QIIME2 IIa: Denoising (sequence quality control) and feature table creation (single-end)
Use DADA2 for sequence quality control. DADA2 is a pipeline for detecting and correcting (where possible) errors in Illumina amplicon sequence data. As implemented in the q2-dada2 plugin, this quality control process additionally filters any phiX reads (commonly present in marker gene Illumina sequence data) identified in the sequencing data, and filters chimeric sequences.
QIIME2 IIb: Denoising (sequence quality control) and feature table creation (paired-end)
Use DADA2 for sequence quality control. DADA2 is a pipeline for detecting and correcting (where possible) errors in Illumina amplicon sequence data. As implemented in the q2-dada2 plugin, this quality control process additionally filters any phiX reads (commonly present in marker gene Illumina sequence data) identified in the sequencing data, and filters chimeric sequences.
QIIME2-III-V-Phylogeny-Rarefaction-Taxonomic-Analysis
This workflow covers three steps: phylogeny reconstruction (inserting fragments into a reference), alpha rarefaction analysis, and taxonomic analysis.
QIIME2 VI: Diversity metrics and estimations
The first step in hypothesis testing in microbial ecology is typically to look at within- (alpha) and between-sample (beta) diversity. We can calculate diversity metrics, apply appropriate statistical tests, and visualize the data using the q2-diversity plugin.
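As a rough illustration of the two kinds of diversity, the sketch below computes one alpha metric and one beta metric on toy count vectors. The choice of Shannon entropy and Bray-Curtis dissimilarity is an assumption made for illustration; the description above does not say which metrics the workflow reports, and this code does not use q2-diversity.

```python
import math

def shannon_index(counts):
    """Alpha diversity: Shannon entropy (natural log) of one sample's feature counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def bray_curtis(a, b):
    """Beta diversity: Bray-Curtis dissimilarity between two samples' count vectors."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1 - 2 * shared / (sum(a) + sum(b))

# Hypothetical feature counts for two samples over the same four features.
sample_a = [10, 10, 10, 10]   # perfectly even community
sample_b = [40, 0, 0, 0]      # a single dominant feature

alpha_a = shannon_index(sample_a)
alpha_b = shannon_index(sample_b)
beta_ab = bray_curtis(sample_a, sample_b)
```

The even sample has the higher within-sample diversity, while the Bray-Curtis value summarizes how different the two samples are from each other.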
dada2 amplicon analysis pipeline - for paired end data
dada2 amplicon analysis for paired-end data. The workflow has three main outputs: the sequence table (output of makeSequenceTable), the taxonomy (output of assignTaxonomy), and the counts, which track the number of sequences in the samples through the steps (output of sequence counts).
MetaProSIP OpenMS 2.8
Automated inference of stable isotope incorporation rates in proteins for functional metaproteomics
Clinical Metaproteomics Quantitation
Create GRO and TOP complex files
dcTMD calculations with GROMACS
Perform dcTMD free energy simulations and calculations
MMGBSA calculations with GROMACS
MMGBSA simulation and calculation
Fragment-based virtual screening using rDock for docking and SuCOS for pose scoring
Virtual screening of the SARS-CoV-2 main protease with rDock and pose scoring
Generic variation analysis on WGS PE data
Generic variation analysis reporting
This workflow takes a VCF dataset of variants produced by any of the variant calling workflows in https://github.com/galaxyproject/iwc/tree/main/workflows/sars-cov-2-variant-calling and generates tabular lists of variants by Samples and by Variant, and an overview plot of variants and their allele-frequencies.
Segmentation and counting of cell nuclei in fluorescence microscopy images
This workflow performs segmentation and counting of cell nuclei using fluorescence microscopy images. The segmentation step is performed using Otsu thresholding (Otsu, 1979). The workflow is based on the tutorial: https://training.galaxyproject.org/training-material/topics/imaging/tutorials/imaging-introduction/tutorial.html
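The idea behind the two steps can be sketched in plain Python: pick the Otsu threshold that maximizes between-class variance, then count connected bright regions. This is only an illustrative sketch on a hypothetical toy image; the function names, the 4-connectivity choice, and the pixel values are assumptions and do not reflect the tools used in the workflow.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t maximizing between-class variance (foreground: pixel > t)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue          # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break             # no foreground pixels left
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def count_objects(image, threshold):
    """Count 4-connected regions of pixels brighter than the threshold."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] > threshold and not seen[r][c]:
                count += 1
                seen[r][c] = True
                stack = [(r, c)]
                while stack:  # flood fill one region
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx] and image[ny][nx] > threshold):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return count

# Hypothetical 8-bit image: dark background with two bright "nuclei".
image = [
    [10, 12, 11, 10, 10],
    [10, 200, 210, 10, 10],
    [10, 205, 10, 10, 220],
    [11, 10, 10, 10, 215],
]
threshold = otsu_threshold([p for row in image for p in row])
n_nuclei = count_objects(image, threshold)
```

Real microscopy images usually need smoothing and handling of touching nuclei, which this sketch leaves out.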
Parallel Accession Download
Downloads fastq files for sequencing run accessions provided in a text file using fasterq-dump. Creates one job per listed run accession.
sra_manifest_to_concatenated_fastqs_parallel
This workflow takes as input an SRA_manifest from SRA Run Selector and generates one fastq file or one pair of fastq files for each experiment (concatenating multiple runs if necessary). Outputs are relabelled to match the column specified by the user.
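The grouping logic can be pictured with a small sketch: runs sharing an experiment are collected together and the group is keyed by a user-chosen column. The column names (`Run`, `Experiment`, `Sample Name`) and the accessions below are hypothetical placeholders, not a statement about the exact manifest layout the workflow expects.

```python
import csv
import io
from collections import defaultdict

# Hypothetical manifest with made-up accessions.
manifest = """Run,Experiment,Sample Name
SRR001,SRX100,liver
SRR002,SRX100,liver
SRR003,SRX200,brain
"""

def group_runs(manifest_text, label_column):
    """Map each experiment's label to the list of runs that would be concatenated."""
    runs_by_experiment = defaultdict(list)
    label_by_experiment = {}
    for row in csv.DictReader(io.StringIO(manifest_text)):
        runs_by_experiment[row["Experiment"]].append(row["Run"])
        label_by_experiment[row["Experiment"]] = row[label_column]
    return {label_by_experiment[exp]: runs for exp, runs in runs_by_experiment.items()}

groups = group_runs(manifest, "Sample Name")
```

Here the two runs of the first experiment end up under one label, which corresponds to one concatenated fastq (or fastq pair) in the output.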
Pox Virus Illumina Amplicon Workflow from half-genomes
A workflow for the analysis of pox virus genomes sequenced as half-genomes (for ITR resolution) in a tiled-amplicon approach
scRNA-seq_preprocessing_10X_cellPlex
This workflow processes the CMO fastqs with CITE-seq-Count and includes the translation step required for cellPlex processing. In parallel, it processes the Gene Expression fastqs with STARsolo, filters cells with DropletUtils, and reformats all outputs so they can be read easily by the 'Read10X' function from Seurat.
scRNA-seq_preprocessing_10X_v3_Bundle
This workflow processes the Gene Expression fastqs with STARsolo, filters cells with DropletUtils, and reformats all outputs so they can be read easily by the 'Read10X' function from Seurat.
Velocyto-on10X-from-bundled
Run velocyto to get a loom file with spliced and unspliced counts. It extracts the 'barcodes' from the bundled outputs.
Velocyto-on10X-filtered-barcodes
baredSC_1d_logNorm
Run baredSC in 1 dimension in logNorm for 1 to N gaussians and combine models.
baredSC_2d_logNorm
Run baredSC in 2 dimensions in logNorm for 1 to N gaussians and combine models.
COVID-19: variation analysis on WGS PE data
This workflow performs paired-end read mapping with bwa-mem followed by sensitive variant calling across a wide range of allele frequencies (AFs) with lofreq
COVID-19: variation analysis reporting
This workflow takes a VCF dataset of variants produced by any of the *-variant-calling workflows in https://github.com/galaxyproject/iwc/tree/main/workflows/sars-cov-2-variant-calling and generates tabular lists of variants by Samples and by Variant, and an overview plot of variants and their allele-frequencies.
COVID-19: consensus construction
Build a consensus sequence from FILTER PASS variants with intrasample allele-frequency above a configurable consensus threshold. Hard-mask regions with low coverage (but not consensus variants within them) and ambiguous sites.
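A minimal sketch of that logic, restricted to single-nucleotide variants: PASS variants above the threshold are applied, and low-coverage positions are masked with `N` unless a consensus variant sits there. The threshold values, the variant record layout, and the function name are assumptions for illustration; ambiguous-site masking and indels are deliberately left out.

```python
def build_consensus(reference, variants, coverage, af_threshold=0.7, min_cov=5):
    """Apply PASS SNVs above af_threshold, then hard-mask low-coverage positions."""
    seq = list(reference)
    consensus_positions = set()
    for v in variants:
        if v["filter"] == "PASS" and v["af"] > af_threshold:
            index = v["pos"] - 1          # VCF positions are 1-based
            seq[index] = v["alt"]
            consensus_positions.add(index)
    for index, depth in enumerate(coverage):
        # Low coverage is masked, but consensus variants within it are kept.
        if depth < min_cov and index not in consensus_positions:
            seq[index] = "N"
    return "".join(seq)

# Hypothetical toy data.
reference = "ACGTACGT"
variants = [
    {"pos": 2, "alt": "T", "af": 0.90, "filter": "PASS"},     # applied, despite low coverage
    {"pos": 5, "alt": "G", "af": 0.30, "filter": "PASS"},     # below threshold, not applied
    {"pos": 7, "alt": "A", "af": 0.95, "filter": "LowQual"},  # not PASS, not applied
]
coverage = [10, 2, 10, 1, 10, 10, 10, 3]
consensus = build_consensus(reference, variants, coverage)
```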
COVID-19: variation analysis on ARTIC PE data
The workflow for Illumina-sequenced ARTIC data builds on the RNASeq workflow for paired-end data, using the same steps for mapping and variant calling, but adds extra logic for trimming ARTIC primer sequences off reads with the ivar package. In addition, this workflow uses ivar to identify amplicons affected by ARTIC primer-binding site mutations and tries to exclude reads derived from such tainted amplicons when calculating allele frequencies of other variants.
SARS-CoV-2 Illumina Amplicon pipeline - iVar based
Find and annotate variants in ampliconic SARS-CoV-2 Illumina sequencing data and classify samples with pangolin and nextclade
COVID-19: variation analysis on WGS SE data
This workflow performs single-end read mapping with bowtie2 followed by sensitive variant calling across a wide range of allele frequencies (AFs) with lofreq
COVID-19: variation analysis of ARTIC ONT data
This workflow for ONT-sequenced ARTIC data is modeled after the alignment/variant-calling steps of the [ARTIC pipeline](https://artic.readthedocs.io/en/latest/). It performs essentially the same steps as that pipeline’s minion command, i.e. read mapping with minimap2 and variant calling with medaka. Like the Illumina ARTIC workflow, it uses ivar for primer trimming. Since ONT-sequenced reads have a much higher error rate than Illumina-sequenced reads and are therefore plagued more by false-positive variant calls, this workflow makes no attempt to handle amplicons affected by potential primer-binding site mutations.
ATACseq
This workflow takes as input a collection of paired fastqs. It removes low-quality bases and adapters with cutadapt and maps reads with Bowtie2 in end-to-end mode. It then removes reads on MT, discordant pairs, pairs with mapping quality below 30, and PCR duplicates. It computes the pile-up on 5' +- 100bp, calls peaks, and counts the reads falling in the 1kb region centered on each summit. It computes two coverage normalizations: per million reads and per million reads in peaks. Finally, it plots the number of reads for each fragment length.
ChIPseq_SR
This workflow takes as input a collection of fastqs (single reads). It removes adapters with cutadapt, maps reads with bowtie2, and keeps reads with MAPQ of at least 30. Peaks are called with MACS2 on the BAM, using either a fixed extension or a model.
Get Confident Peaks From ChIP_SR replicates
This workflow takes as input SR BAMs from ChIP-seq. It calls peaks on each replicate and intersects them. In parallel, each BAM is downsampled to the smallest number of reads, and peaks are called using all subsets combined. Only peaks from this combined call whose summits overlap the intersection of at least x replicates are kept.
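The final filtering step can be sketched as follows: a peak from the combined call is kept when its summit falls inside a peak in at least x replicates. The tuple layout, half-open BED-style intervals, and the example coordinates are assumptions for illustration, not the workflow's actual implementation.

```python
def confident_peaks(pooled_peaks, replicate_peaks, min_replicates):
    """Keep pooled peaks whose summit lies inside a peak of >= min_replicates replicates.

    pooled_peaks: list of (chrom, start, end, summit), half-open coordinates.
    replicate_peaks: one list of (chrom, start, end) per replicate.
    """
    kept = []
    for chrom, start, end, summit in pooled_peaks:
        support = sum(
            any(c == chrom and s <= summit < e for c, s, e in peaks)
            for peaks in replicate_peaks
        )
        if support >= min_replicates:
            kept.append((chrom, start, end, summit))
    return kept

# Hypothetical peaks for two replicates and the combined (pooled) call.
replicate_1 = [("chr1", 100, 200), ("chr1", 500, 600)]
replicate_2 = [("chr1", 150, 250)]
pooled = [("chr1", 90, 260, 160), ("chr1", 480, 620, 550)]

kept = confident_peaks(pooled, [replicate_1, replicate_2], min_replicates=2)
```

With x set to 2, only the first pooled peak survives, because the second summit is supported by a single replicate.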
Get Confident Peaks From ChIP_PE replicates
This workflow takes as input PE BAMs from ChIP-seq. It calls peaks on each replicate and intersects them. In parallel, each BAM is downsampled to the smallest number of reads, and peaks are called using all subsets combined. Only peaks from this combined call whose summits overlap the intersection of at least x replicates are kept.
Get Confident Peaks From ATAC or CUTandRUN replicates
This workflow takes as input BAMs from ATAC-seq or CUT&RUN. It calls peaks on each replicate and intersects them. In parallel, each BAM is downsampled to the smallest number of reads, and peaks are called using all subsets combined. Only peaks from this combined call whose summits overlap the intersection of at least x replicates are kept.
Hi-C_fastqToCool_hicup_cooler
This workflow takes as input a collection of paired fastqs. It uses HiCUP to go from fastq to a validPair file, using the middle of the fragment as coordinates. The pairs are filtered for MAPQ and sorted by cooler to generate a tabix dataset. Cooler is then used to generate a balanced cool file at the desired resolution.
cHi-C_fastqToCool_hicup_cooler
This workflow takes as input a collection of paired fastqs. It uses HiCUP to go from fastq to a validPair file. The pairs are filtered for MAPQ and for the captured region. They are then sorted by cooler to generate a tabix dataset. Cooler is then used to generate a balanced cool file at the desired resolution.
Hi-C_juicermediumtabixToCool_cooler
This workflow takes as input a collection of juicer medium tabix files and a genome name. It builds a balanced cool file at the desired resolution.
Hi-C_fastqToPairs_hicup
This workflow takes as input a collection of paired fastqs. It uses HiCUP to go from fastq to a validPair file. It first truncates the fastq using the cutting sequence to guess the fill-in, then maps the truncated fastq. Next, it assigns reads to fragments and filters out self-ligated fragments, dangling ends, and internal pairs (it can also filter by size). It then removes duplicates and converts the output to be compatible with juicebox or cooler, using the middle of the fragment as coordinates. Finally, it filters for mapping quality.
CUTandRUN
This workflow takes as input a collection of paired fastqs. It removes adapters with cutadapt and maps pairs with bowtie2, allowing dovetailing. It keeps concordant pairs with MAPQ of at least 30, converts BAM to BED, and calls peaks with MACS2 using "ATAC" parameters.
ChIPseq_PE
This workflow takes as input a collection of paired fastqs. It removes adapters with cutadapt and maps pairs with bowtie2. It keeps concordant pairs with MAPQ of at least 30 and calls peaks with MACS2 on the paired BAM.
Average Bigwig between replicates
We assume the identifiers of the input list are like: sample_name_replicateID. The identifiers of the output list will be: sample_name
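The identifier convention can be sketched as follows: everything before the last underscore is treated as the sample name, and replicates sharing a name are averaged bin by bin. The identifiers and coverage values are hypothetical, and plain lists stand in for bigwig tracks.

```python
from collections import defaultdict

def average_by_sample(coverage_by_id):
    """Group tracks named sample_name_replicateID and average them per bin."""
    tracks_by_sample = defaultdict(list)
    for identifier, values in coverage_by_id.items():
        # Split on the LAST underscore so sample names may contain underscores.
        sample_name, _replicate_id = identifier.rsplit("_", 1)
        tracks_by_sample[sample_name].append(values)
    return {
        sample: [sum(column) / len(column) for column in zip(*tracks)]
        for sample, tracks in tracks_by_sample.items()
    }

# Hypothetical per-bin coverage for three input datasets.
coverage_by_id = {
    "wt_liver_rep1": [1.0, 2.0, 3.0],
    "wt_liver_rep2": [3.0, 4.0, 5.0],
    "ko_rep1": [0.0, 2.0, 4.0],
}
averaged = average_by_sample(coverage_by_id)
```

Splitting on the last underscore means a replicate ID must not itself contain an underscore, otherwise replicates of one sample would be placed in different groups.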
Purge-duplicate-contigs-VGP6
Purge contigs marked as duplicates by purge_dups (could be haplotypic duplication or overlap duplication). This workflow is the 6th workflow of the VGP pipeline. It is meant to be run after one of the contigging steps (Workflow 3, 4, or 5)
Assembly-Hifi-only-VGP3
Scaffolding with Hi-C data VGP8
Mitogenome-Assembly-VGP0
Assembly-Hifi-HiC-phasing-VGP4
Generate Nx and Size plots for multiple assemblies
Purging-duplicates-one-haplotype-VGP6b
Assembly-Hifi-Trio-phasing-VGP5
Scaffolding-BioNano-VGP7
Assembly-decontamination-VGP9
kmer-profiling-hifi-trio-VGP2
Create Meryl Database used for the estimation of assembly parameters and quality control with Merqury. Part of the VGP pipeline.
kmer-profiling-hifi-VGP1
RNAseq_PE
This workflow takes as input a list of paired-end fastqs. Adapters and low-quality bases are removed with cutadapt. Reads are mapped with STAR using ENCODE parameters; genes are counted at the same time, and normalized coverage (per million mapped reads) is computed on uniquely mapped reads. The counts are reprocessed to resemble HTSeq-count output. FPKM values are computed with cufflinks and/or StringTie. The unstranded normalized coverage is computed with bedtools.
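For readers unfamiliar with the two normalizations mentioned, the sketch below shows the textbook arithmetic: a per-million scale factor for coverage and the standard FPKM formula. It is only an illustration with made-up numbers; cufflinks and StringTie estimate abundances with their own models, so their values will not match this simple calculation exactly.

```python
def per_million_scale(total_mapped_reads):
    """Factor that converts raw coverage to coverage per million mapped reads."""
    return 1e6 / total_mapped_reads

def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)

# Hypothetical library of 10 million mapped fragments.
scale = per_million_scale(10_000_000)
raw_coverage = [20, 50, 0]
normalized_coverage = [depth * scale for depth in raw_coverage]

# 500 fragments on a 2 kb transcript gives 25 FPKM in this library.
example_fpkm = fpkm(500, 2000, 10_000_000)
```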
RNAseq_SR
This workflow takes as input a list of single-read fastqs. Adapters and low-quality bases are removed with cutadapt. Reads are mapped with STAR using ENCODE parameters; genes are counted at the same time, and normalized coverage (per million mapped reads) is computed on uniquely mapped reads. The counts are reprocessed to resemble HTSeq-count output. FPKM values are computed with cufflinks and/or StringTie. The unstranded normalized coverage is computed with bedtools.
BREW3R
This workflow takes a collection of BAM (output of STAR) and a gtf. It extends the input gtf using de novo annotation.