  • Gene-based Pathogen Identification
  • Pathogen Detection PathoGFAIR Samples Aggregation and Visualisation
  • Nanopore Preprocessing
  • Allele-based Pathogen Identification
  • Taxonomy Profiling and Visualization with Krona
  • Assembly polishing with long reads
  • Quality and Contamination Control For Genome Assembly
  • Genome assembly with Flye
  • Bacterial Genome Assembly using Shovill
  • bacterial_genome_annotation
  • amr_gene_detection
  • Create GRO and TOP complex files
  • dcTMD calculations with GROMACS
  • Fragment-based virtual screening using rDock for docking and SuCOS for pose scoring
  • MMGBSA calculations with GROMACS
  • COVID-19: variation analysis reporting
  • COVID-19: variation analysis on WGS SE data
  • SARS-CoV-2 Illumina Amplicon pipeline - iVar based
  • COVID-19: variation analysis of ARTIC ONT data
  • COVID-19: variation analysis on WGS PE data
  • COVID-19: consensus construction
  • COVID-19: variation analysis on ARTIC PE data
  • Paired end variant calling in haploid system
  • Generic variation analysis on WGS PE data
  • Generic variation analysis reporting
  • Parallel Accession Download
  • sra_manifest_to_concatenated_fastqs_parallel
  • Segmentation and counting of cell nuclei in fluorescence microscopy images
  • baredSC_1d_logNorm
  • baredSC_2d_logNorm
  • Velocyto-on10X-from-bundled
  • Velocyto-on10X-filtered-barcodes
  • scRNA-seq_preprocessing_10X_cellPlex
  • scRNA-seq_preprocessing_10X_v3_Bundle
  • Differential gene expression for single-cell data using pseudo-bulk counts with edgeR
  • Hi-C_fastqToCool_hicup_cooler
  • cHi-C_fastqToCool_hicup_cooler
  • Hi-C_juicermediumtabixToCool_cooler
  • Hi-C_fastqToPairs_hicup
  • Get Confident Peaks From ChIP_SR replicates
  • Get Confident Peaks From ChIP_PE replicates
  • Get Confident Peaks From ATAC or CUTandRUN replicates
  • ChIPseq_PE
  • CUTandRUN
  • ATACseq
  • ChIPseq_SR
  • Average Bigwig between replicates
  • MetaProSIP OpenMS 2.8
  • Clinical Metaproteomics Data Interpretation
  • Generate a Clinical Metaproteomics Database
  • Clinical Metaproteomics Quantitation
  • Clinical Metaproteomics Verification Workflow
  • Clinical Metaproteomics Discovery Workflow
  • Goseq GO-KEGG Enrichment Analysis
  • RNAseq_DE_filtering_plotting
  • BREW3R
  • RNA-seq for Single-read fastqs
  • RNA-seq for Paired-end fastqs
  • dada2 amplicon analysis pipeline - for paired end data
  • QIIME2 Ia: multiplexed data (single-end)
  • QIIME2 Ib: multiplexed data (paired-end)
  • QIIME2 Ic: Demultiplexed data (single-end)
  • QIIME2 Id: Demultiplexed data (paired-end)
  • QIIME2-III-V-Phylogeny-Rarefaction-Taxonomic-Analysis
  • QIIME2 VI: Diversity metrics and estimations
  • QIIME2 IIa: Denoising (sequence quality control) and feature table creation (single-end)
  • QIIME2 IIb: Denoising (sequence quality control) and feature table creation (paired-end)
  • Repeat masking with RepeatModeler and RepeatMasker
  • Mass spectrometry: GCMS with metaMS
  • Mass spectrometry: LC-MS preprocessing with XCMS
  • Pox Virus Illumina Amplicon Workflow from half-genomes
  • Purging-duplicates-one-haplotype-VGP6b
  • Generate Nx and Size plots for multiple assemblies
  • Mitogenome-Assembly-VGP0
  • Purge-duplicate-contigs-VGP6
  • Assembly-decontamination-VGP9
  • Assembly-Hifi-only-VGP3
  • kmer-profiling-hifi-trio-VGP2
  • Scaffolding-BioNano-VGP7
  • Assembly-Hifi-Trio-phasing-VGP5
  • kmer-profiling-hifi-VGP1
  • Assembly-Hifi-HiC-phasing-VGP4
  • Scaffolding with Hi-C data VGP8

Gene-based Pathogen Identification

Nanopore dataset analysis - Phylogenetic identification - Antibiotic resistance gene detection and contig building

name:Collection, name:PathoGFAIR, name:IWC, name:microGalaxy

Pathogen Detection PathoGFAIR Samples Aggregation and Visualisation

Report generation and visualization of the pathogens detected across all samples

name:Collection, name:microGalaxy, name:PathoGFAIR, name:IWC

Nanopore Preprocessing

Microbiome - QC and Contamination Filtering

name:Collection, name:microGalaxy, name:PathoGFAIR, name:Nanopore, name:IWC

Allele-based Pathogen Identification

Microbiome - Variant calling and Consensus Building

name:Collection, name:microGalaxy, name:PathoGFAIR, name:IWC

Taxonomy Profiling and Visualization with Krona

Microbiome - Taxonomy Profiling

name:Collection, name:microGalaxy, name:PathoGFAIR, name:IWC

Assembly polishing with long reads

Polishing with long reads using Racon, run four times (x4)

Quality and Contamination Control For Genome Assembly

Short paired-end read analysis providing quality analysis, read cleaning and taxonomy assignment

Genomics, fastq, bacterial-genomics, taxonomy-assignment, paired-end, quality, ABRomics, trimming

Genome assembly with Flye

Assemble long reads with Flye, then view the assembly statistics and the assembly graph

Bacterial Genome Assembly using Shovill

Assembly of bacterial paired-end short read data with generation of quality metrics and reports

fastq, Genomics, bacterial-genomics, paired-end, assembly, quality, ABRomics

bacterial_genome_annotation

Annotation of assembled bacterial genomes to detect genes, potential plasmids, integrons and insertion sequence (IS) elements.

Genomics, fasta, ABRomics, bacterial-genomics, Annotation, genome-annotation

amr_gene_detection

Antimicrobial resistance gene detection from assembled bacterial genomes

fasta, Genomics, ABRomics, antibiotic-resistance, antimicrobial-resistance-genes, antimicrobial resistance, bacterial-genomics, AMR, AMR-detection

Create GRO and TOP complex files

dcTMD calculations with GROMACS

Perform dcTMD free energy simulations and calculations

Fragment-based virtual screening using rDock for docking and SuCOS for pose scoring

Virtual screening of the SARS-CoV-2 main protease with rDock and pose scoring

MMGBSA calculations with GROMACS

MMGBSA simulation and calculation

COVID-19: variation analysis reporting

This workflow takes a VCF dataset of variants produced by any of the *-variant-calling workflows in https://github.com/galaxyproject/iwc/tree/main/workflows/sars-cov-2-variant-calling and generates tabular lists of variants by sample and by variant, as well as an overview plot of variants and their allele frequencies.

COVID-19, covid19.galaxyproject.org

COVID-19: variation analysis on WGS SE data

This workflow performs single-end read mapping with bowtie2, followed by sensitive variant calling across a wide range of allele frequencies (AFs) with lofreq.

COVID-19, covid19.galaxyproject.org

SARS-CoV-2 Illumina Amplicon pipeline - iVar based

Find and annotate variants in ampliconic SARS-CoV-2 Illumina sequencing data and classify samples with pangolin and nextclade

COVID-19, ARTIC, iwc

COVID-19: variation analysis of ARTIC ONT data

This workflow for ONT-sequenced ARTIC data is modeled after the alignment/variant-calling steps of the [ARTIC pipeline](https://artic.readthedocs.io/en/latest/). It performs essentially the same steps as that pipeline’s minion command, i.e. read mapping with minimap2 and variant calling with medaka. Like the Illumina ARTIC workflow, it uses ivar for primer trimming. Since ONT-sequenced reads have a much higher error rate than Illumina-sequenced reads and are therefore more affected by false-positive variant calls, this workflow makes no attempt to handle amplicons affected by potential primer-binding site mutations.

COVID-19, ARTIC, ONT, covid19.galaxyproject.org

COVID-19: variation analysis on WGS PE data

This workflow performs paired-end read mapping with bwa-mem, followed by sensitive variant calling across a wide range of allele frequencies (AFs) with lofreq.

COVID-19, covid19.galaxyproject.org, iwc, emergen_validated

COVID-19: consensus construction

Build a consensus sequence from FILTER PASS variants with intrasample allele-frequency above a configurable consensus threshold. Hard-mask regions with low coverage (but not consensus variants within them) and ambiguous sites.

COVID-19, covid19.galaxyproject.org
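
The consensus rule described above can be illustrated with a minimal Python sketch. It is not the workflow's implementation: it handles SNVs only, takes per-position read depth as a plain list, and uses hypothetical threshold names and defaults (`consensus_af`, `min_af`, `min_depth`). It also assumes that an "ambiguous site" is a PASS variant whose allele frequency lies between `min_af` and `consensus_af`.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Variant:
    pos: int            # 0-based reference position (convention assumed here)
    ref: str            # reference base; this sketch handles SNVs only
    alt: str            # alternate base
    af: float           # intrasample allele frequency
    filter_status: str  # VCF FILTER column, e.g. "PASS"


def build_consensus(reference: str, variants: List[Variant], depth: List[int],
                    consensus_af: float = 0.7, min_af: float = 0.25,
                    min_depth: int = 5) -> str:
    """Apply PASS variants at or above consensus_af, then hard-mask with N.

    The default thresholds are hypothetical, not the workflow's defaults.
    """
    seq = list(reference)
    kept: Set[int] = set()  # positions carrying an applied consensus variant
    for v in variants:
        if v.filter_status != "PASS":
            continue  # non-PASS variants are ignored entirely
        if v.af >= consensus_af:
            seq[v.pos] = v.alt
            kept.add(v.pos)
        elif v.af >= min_af:
            seq[v.pos] = "N"  # ambiguous site (assumed definition)
    for i, d in enumerate(depth):
        # low-coverage positions are masked, except consensus variants
        if d < min_depth and i not in kept:
            seq[i] = "N"
    return "".join(seq)
```

With the reference `ACGTACGT`, a PASS variant C→T at position 1 (AF 0.9), a PASS variant T→G at position 3 (AF 0.4), a non-PASS variant at position 5 and depths `[10, 2, 10, 10, 10, 10, 10, 1]`, the sketch returns `ATGNACGN`: the consensus variant survives its low-coverage position, while the ambiguous site and the poorly covered last base are masked.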

COVID-19: variation analysis on ARTIC PE data

The workflow for Illumina-sequenced ARTIC data builds on the RNASeq workflow for paired-end data, using the same steps for mapping and variant calling, but adds extra logic for trimming ARTIC primer sequences off reads with the ivar package. In addition, this workflow uses ivar to identify amplicons affected by ARTIC primer-binding site mutations and tries to exclude reads derived from such tainted amplicons when calculating allele frequencies of other variants.

COVID-19, ARTIC, covid19.galaxyproject.org

Paired end variant calling in haploid system

Workflow for variant analysis against a reference genome in GenBank format

generic, VeuPath, Haploid

Generic variation analysis on WGS PE data

Workflow for variant analysis against a reference genome in GenBank format

mpxv, generic

Generic variation analysis reporting

This workflow takes a VCF dataset of variants produced by any of the variant calling workflows in https://github.com/galaxyproject/iwc/tree/main/workflows/sars-cov-2-variant-calling and generates tabular lists of variants by sample and by variant, as well as an overview plot of variants and their allele frequencies.

mpvx, generic

Parallel Accession Download

Downloads fastq files for sequencing run accessions provided in a text file using fasterq-dump. Creates one job per listed run accession.
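
Creating one job per accession amounts to turning each line of the text file into its own independent command. The minimal Python sketch below only builds the command lines and does not run them; it assumes one accession per line, ignores blank lines, and calls `fasterq-dump` with no extra options, whereas the workflow's actual tool parameters may differ. The helper names are hypothetical.

```python
from typing import List


def parse_accessions(text: str) -> List[str]:
    """One run accession per line; blank lines and surrounding spaces ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def plan_jobs(accessions: List[str]) -> List[List[str]]:
    """Build one independent fasterq-dump invocation per accession."""
    return [["fasterq-dump", acc] for acc in accessions]
```

Each command list could be passed to `subprocess.run`; in Galaxy, the equivalent jobs are scheduled separately, which is what allows them to run in parallel.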

sra_manifest_to_concatenated_fastqs_parallel

This workflow takes as input an SRA manifest (SRA_manifest) from the SRA Run Selector and generates one fastq file, or one pair of fastq files, for each experiment (concatenating multiple runs if necessary). Outputs are relabelled to match the column specified by the user.
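
The grouping and relabelling step can be sketched in Python as follows. It assumes the manifest is the comma-separated table exported by the SRA Run Selector, with `Run` and `Experiment` columns; those column names, the helper `group_runs`, and the choice to keep runs in manifest order are assumptions for illustration. It returns, for each output label, the runs whose fastqs would be concatenated.

```python
import csv
import io
from typing import Dict, List


def group_runs(manifest_csv: str, label_column: str,
               run_column: str = "Run",
               experiment_column: str = "Experiment") -> Dict[str, List[str]]:
    """Map each output label to the runs (in manifest order) to concatenate."""
    runs: Dict[str, List[str]] = {}
    labels: Dict[str, str] = {}
    for row in csv.DictReader(io.StringIO(manifest_csv)):
        experiment = row[experiment_column]
        runs.setdefault(experiment, []).append(row[run_column])
        labels[experiment] = row[label_column]
    # relabel: one entry per experiment, keyed by the user-chosen column
    return {labels[exp]: run_list for exp, run_list in runs.items()}
```

This sketch does not check whether two experiments end up with the same label.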

Segmentation and counting of cell nuclei in fluorescence microscopy images

This workflow performs segmentation and counting of cell nuclei using fluorescence microscopy images. The segmentation step is performed using Otsu thresholding (Otsu, 1979). The workflow is based on the tutorial: https://training.galaxyproject.org/training-material/topics/imaging/tutorials/imaging-introduction/tutorial.html
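
Otsu's method picks the threshold that maximizes the between-class variance of the intensity histogram. The pure-Python sketch below applies it to an 8-bit grayscale image given as nested lists and then counts nuclei as 4-connected foreground components. The image representation, the connectivity, and the helper names are assumptions; the tutorial's tools may add steps such as smoothing or size filtering that are not shown here.

```python
from collections import deque
from typing import List


def otsu_threshold(pixels: List[int], levels: int = 256) -> int:
    """Threshold t maximizing between-class variance; foreground is value > t."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_bg, weight_bg = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        weight_fg = total - weight_bg
        if weight_bg == 0:
            continue
        if weight_fg == 0:
            break
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # unnormalized between-class variance: w_bg * w_fg * (mu_bg - mu_fg)^2
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def count_nuclei(image: List[List[int]]) -> int:
    """Binarize with Otsu, then count 4-connected foreground components."""
    t = otsu_threshold([p for row in image for p in row])
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if image[y][x] <= t or seen[y][x]:
                continue
            count += 1  # new nucleus: flood-fill it so it is counted once
            seen[y][x] = True
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and image[ny][nx] > t:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count
```

On a 5 × 6 test image with two separate bright blobs on a uniform dark background, the threshold falls at the background level and `count_nuclei` returns 2.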

baredSC_1d_logNorm

Run baredSC in 1 dimension in logNorm for 1 to N gaussians and combine models.

baredSC_2d_logNorm

Run baredSC in 2 dimensions in logNorm for 1 to N gaussians and combine models.

Velocyto-on10X-from-bundled

Run velocyto to get a loom file with counts of spliced and unspliced reads. The 'barcodes' are extracted from the bundled outputs.

name:single-cell

Velocyto-on10X-filtered-barcodes

Run velocyto to get a loom file with counts of spliced and unspliced reads.

name:single-cell

scRNA-seq_preprocessing_10X_cellPlex

This workflow processes the CMO fastqs with CITE-seq-Count and includes the translation step required for cellPlex processing. In parallel, it processes the Gene Expression fastqs with STARsolo, filters cells with DropletUtils and reformats all outputs so they can easily be used by the 'Read10X' function from Seurat.

#single-cell

scRNA-seq_preprocessing_10X_v3_Bundle

This workflow processes the Gene Expression fastqs with STARsolo, filters cells with DropletUtils and reformats all outputs so they can easily be used by the 'Read10X' function from Seurat.

#single-cell

Differential gene expression for single-cell data using pseudo-bulk counts with edgeR

This workflow uses the decoupler tool in Galaxy to generate pseudobulk counts from an annotated AnnData file obtained from scRNA-seq analysis. Following the pseudobulk step, differentially expressed genes (DEGs) are identified using the edgeR tool. The workflow also includes data sanitation steps to ensure edgeR runs smoothly and to minimize potential issues. Additionally, a Volcano plot tool is used to visualize the results of the DEG analysis.

Hi-C_fastqToCool_hicup_cooler

This workflow takes as input a collection of paired fastq. It uses HiCUP to go from fastq to validPair file using the middle of the fragment as coordinates. The pairs are filtered for MAPQ and sorted by cooler to generate a tabix dataset. Cooler is used to generate a balanced cool file at the desired resolution.

Hi-C
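
The midpoint convention and MAPQ filter described above can be sketched on a toy record type. `ReadEnd`, its fields and the default `min_mapq=30` are hypothetical choices for illustration; HiCUP and cooler use their own file formats, and the threshold is a workflow parameter.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class ReadEnd:
    chrom: str
    frag_start: int  # start of the restriction fragment the read maps to
    frag_end: int    # end of that fragment
    mapq: int        # mapping quality of the read


def midpoint(end: ReadEnd) -> int:
    """Coordinate reported for a read end: middle of its fragment."""
    return (end.frag_start + end.frag_end) // 2


def filter_and_place(pairs: Iterable[Tuple[ReadEnd, ReadEnd]],
                     min_mapq: int = 30) -> List[Tuple[str, int, str, int]]:
    """Drop pairs where either end is below min_mapq; report fragment midpoints."""
    kept = []
    for a, b in pairs:
        if a.mapq < min_mapq or b.mapq < min_mapq:
            continue
        kept.append((a.chrom, midpoint(a), b.chrom, midpoint(b)))
    return kept
```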

cHi-C_fastqToCool_hicup_cooler

This workflow takes as input a collection of paired fastq. It uses HiCUP to go from fastq to validPair file. The pairs are filtered for MAPQ and for the captured region. Then, they are sorted by cooler to generate a tabix dataset. Cooler is used to generate a balanced cool file at the desired resolution.

Hi-C

Hi-C_juicermediumtabixToCool_cooler

This workflow takes as input a collection of juicer medium tabix files and a genome name. It builds a balanced cool file at the desired resolution.

Hi-C

Hi-C_fastqToPairs_hicup

This workflow takes as input a collection of paired fastq. It uses HiCUP to go from fastq to validPair file. First, the fastq files are truncated using the cutting sequence to guess the fill-in. Then the truncated fastq files are mapped. The reads are then assigned to fragments, and self-ligated, dangling-end and internal pairs are filtered out (filtering by size is also possible). Next, duplicates are removed. The output is converted to be compatible with juicebox or cooler, using the middle of the fragment as coordinates. Finally, pairs are filtered for mapping quality.

Hi-C

Get Confident Peaks From ChIP_SR replicates

This workflow takes as input SR BAM files from ChIP-seq. It calls peaks on each replicate and intersects them. In parallel, each BAM is subsampled to the smallest number of reads. Peaks are then called on all subsets combined. Only the peaks from the combined subsets whose summits intersect the intersection of at least x replicates are kept.

ATAC-seq
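
The replicate-consensus logic can be sketched with plain intervals. In this Python illustration, which uses hypothetical helper names and is not the workflow's implementation (the workflow relies on dedicated peak-calling and interval tools), "the intersection of at least x replicates" is interpreted as the regions covered by peaks from at least x replicates, and a combined-subset peak is kept if its summit lies in one of those regions. The subsampling fraction that equalizes read depth is computed as well. The same logic applies to the PE and ATAC/CUT&RUN variants that follow.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Interval = Tuple[str, int, int]   # (chrom, start, end), half-open
Peak = Tuple[str, int, int, int]  # (chrom, start, end, summit)


def subsample_fractions(read_counts: Dict[str, int]) -> Dict[str, float]:
    """Fraction of reads to keep so every replicate matches the smallest one."""
    smallest = min(read_counts.values())
    return {name: smallest / n for name, n in read_counts.items()}


def _merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping intervals so one replicate counts once per base."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def covered_by_at_least(replicate_peaks: List[List[Interval]], x: int) -> List[Interval]:
    """Regions overlapped by peaks from at least x replicates (sweep line)."""
    events: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for peaks in replicate_peaks:
        for chrom, start, end in _merge(peaks):
            events[chrom].append((start, 1))
            events[chrom].append((end, -1))
    regions: List[Interval] = []
    for chrom in sorted(events):
        depth = 0
        region_start: Optional[int] = None
        # sorting (pos, delta) puts ends (-1) before starts (+1) at equal positions
        for pos, delta in sorted(events[chrom]):
            depth += delta
            if depth >= x and region_start is None:
                region_start = pos
            elif depth < x and region_start is not None:
                if pos > region_start:
                    regions.append((chrom, region_start, pos))
                region_start = None
    return regions


def confident_peaks(combined: List[Peak], regions: List[Interval]) -> List[Peak]:
    """Keep combined-subset peaks whose summit falls inside a region."""
    return [p for p in combined
            if any(c == p[0] and s <= p[3] < e for c, s, e in regions)]
```

Merging the peaks of each replicate first ensures that two overlapping peaks from the same replicate are not counted as two replicates.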

Get Confident Peaks From ChIP_PE replicates

This workflow takes as input PE BAM files from ChIP-seq. It calls peaks on each replicate and intersects them. In parallel, each BAM is subsampled to the smallest number of reads. Peaks are then called on all subsets combined. Only the peaks from the combined subsets whose summits intersect the intersection of at least x replicates are kept.

ATAC-seq

Get Confident Peaks From ATAC or CUTandRUN replicates

This workflow takes as input BAM files from ATAC-seq or CUT&RUN. It calls peaks on each replicate and intersects them. In parallel, each BAM is subsampled to the smallest number of reads. Peaks are then called on all subsets combined. Only the peaks from the combined subsets whose summits intersect the intersection of at least x replicates are kept.

ATAC-seq

ChIPseq_PE

This workflow takes as input a collection of paired fastqs. Adapters are removed with cutadapt and pairs are mapped with bowtie2. Only concordant pairs with a MAPQ of at least 30 are kept. Peaks are called with MACS2 on the paired-end BAM.

ChIP

CUTandRUN

This workflow takes as input a collection of paired fastq. Adapters are removed with cutadapt and pairs are mapped with bowtie2, allowing dovetailing. Only concordant pairs with a MAPQ of at least 30 are kept. The BAM is converted to BED, and peaks are called with MACS2 using "ATAC" parameters.

CUTnRUN

ATACseq

This workflow takes as input a collection of paired fastq. It removes low-quality bases and adapters with cutadapt and maps reads with Bowtie2 in end-to-end mode. It then removes reads on MT, non-concordant pairs, pairs with mapping quality below 30, and PCR duplicates. It computes the pile-up on 5' ends ± 100 bp, calls peaks, and counts the number of reads falling in the 1 kb region centered on each summit. It computes 2 coverage normalizations: normalized by million reads and normalized by million reads in peaks. Finally, it plots the number of reads for each fragment length.

ATACseq
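
The two coverage normalizations reduce to scale factors applied to the raw pile-up. The Python sketch below assumes the total read count and the number of reads falling in peaks are already known; the function names are hypothetical.

```python
from typing import Dict, List


def scale_factors(total_reads: int, reads_in_peaks: int) -> Dict[str, float]:
    """Per-base multipliers for the two coverage normalizations."""
    return {
        "per_million_reads": 1e6 / total_reads,
        "per_million_reads_in_peaks": 1e6 / reads_in_peaks,
    }


def normalize(raw_coverage: List[float], factor: float) -> List[float]:
    """Scale raw pile-up values by a normalization factor."""
    return [value * factor for value in raw_coverage]
```

For a library of 20 million reads with 4 million reads in peaks, a base covered by 40 reads becomes 2.0 reads per million reads and 10.0 reads per million reads in peaks.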

ChIPseq_SR

This workflow takes as input a collection of fastqs (single reads). Adapters are removed with cutadapt and reads are mapped with bowtie2. Only reads with a MAPQ of at least 30 are kept. Peaks are called with MACS2 on the BAM, using either a fixed extension or the model.

ChIP

Average Bigwig between replicates

We assume the identifiers of the input list follow the pattern sample_name_replicateID. The identifiers of the output list will be sample_name.
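
Mapping input identifiers to output identifiers amounts to stripping the replicate suffix and grouping. This Python sketch assumes the replicate ID is whatever follows the last underscore (so sample names may themselves contain underscores) and that replicate tracks share the same binning before averaging; the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def sample_name(identifier: str) -> str:
    """Strip the trailing '_replicateID' from 'sample_name_replicateID'."""
    sample, sep, _replicate = identifier.rpartition("_")
    if not sep or not sample:
        raise ValueError(f"{identifier!r} does not look like sample_name_replicateID")
    return sample


def group_by_sample(identifiers: List[str]) -> Dict[str, List[str]]:
    """Group replicate identifiers under their shared sample name."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for ident in identifiers:
        groups[sample_name(ident)].append(ident)
    return dict(groups)


def average_tracks(tracks: List[List[float]]) -> List[float]:
    """Bin-wise mean of replicate signal tracks (same binning assumed)."""
    return [sum(values) / len(values) for values in zip(*tracks)]
```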

MetaProSIP OpenMS 2.8

Automated inference of stable isotope incorporation rates in proteins for functional metaproteomics

Clinical Metaproteomics Data Interpretation

This workflow will perform taxonomic and functional annotations using Unipept and statistical analysis using MSstatsTMT.

name:clinicalMP

Generate a Clinical Metaproteomics Database

The workflow begins with the Database Generation process. The Galaxy-P team has developed a workflow that collects protein sequences from known disease-causing microorganisms to build a comprehensive database. This extensive database is then refined into a smaller, more relevant dataset using the Metanovo tool.

name:clinicalMP

Clinical Metaproteomics Quantitation

Clinical Metaproteomics 4: Quantitation

name:clinicalMP

Clinical Metaproteomics Verification Workflow

In proteomics research, verifying detected peptides is essential for ensuring data accuracy and biological relevance. This tutorial continues from the clinical metaproteomics discovery workflow, focusing on verifying identified microbial peptides using the PepQuery tool.

name:clinicalMP

Clinical Metaproteomics Discovery Workflow

Workflow for clinical metaproteomics database searching

name:clinicalMP

Goseq GO-KEGG Enrichment Analysis

This workflow is used for GO and KEGG enrichment analysis using GOseq tools.

RNAseq_DE_filtering_plotting

This workflow only works on an experimental setup with exactly 2 conditions. It takes two collections of count tables as input and performs differential expression analysis. Additionally, it filters for DE genes based on adjusted p-value and log2 fold-change thresholds. It also generates informative plots.

transcriptomics, RNAseq
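
The DE filtering step can be sketched as a simple predicate on each result row. The `DEResult` record, its field names and the default thresholds (adjusted p-value below 0.05, absolute log2 fold change of at least 1) are assumptions for illustration, not the workflow's defaults; genes with an NA adjusted p-value are discarded here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DEResult:
    gene: str
    log2fc: float
    padj: Optional[float]  # None stands for an NA adjusted p-value


def filter_de_genes(results: List[DEResult], padj_max: float = 0.05,
                    min_abs_log2fc: float = 1.0) -> List[DEResult]:
    """Keep genes passing both the adjusted p-value and |log2FC| thresholds."""
    return [r for r in results
            if r.padj is not None
            and r.padj < padj_max
            and abs(r.log2fc) >= min_abs_log2fc]
```

Using the absolute fold change keeps both up- and down-regulated genes.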

BREW3R

This workflow takes a collection of BAM files (output of STAR) and a GTF file. It extends the input GTF using de novo annotation.

RNA-seq for Single-read fastqs

This workflow takes as input a list of single-end fastqs. Adapters and low-quality bases are removed with fastp. Reads are mapped with STAR using ENCODE parameters, which simultaneously counts reads per gene and computes normalized coverage (per million mapped reads) on uniquely mapped reads. The counts are reprocessed to resemble HTSeq-count output. Alternatively, featureCounts can be used to count the reads/fragments per gene. FPKM values are computed with cufflinks and/or StringTie. The unstranded normalized coverage is computed with bedtools.

RNAseq, transcriptomics
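
The normalizations mentioned above follow standard formulas: FPKM divides a feature's count by its length in kilobases and by the number of mapped reads or fragments in millions, and per-million coverage divides each position's depth by the number of mapped reads in millions. The Python sketch below writes out these textbook formulas with hypothetical helper names; cufflinks and StringTie estimate FPKM with their own models (for example, effective lengths and multi-mapping handling), so their values may differ.

```python
from typing import Dict, List


def fpkm(counts: Dict[str, int], lengths_bp: Dict[str, int],
         total_mapped: int) -> Dict[str, float]:
    """Fragments per kilobase of feature per million mapped fragments."""
    return {gene: n * 1e9 / (lengths_bp[gene] * total_mapped)
            for gene, n in counts.items()}


def coverage_rpm(coverage: List[int], total_mapped: int) -> List[float]:
    """Per-base coverage scaled to reads per million mapped reads."""
    return [c * 1e6 / total_mapped for c in coverage]
```

For example, 500 fragments on a 2,000 bp gene in a library of 10 million mapped fragments gives an FPKM of 25.0. The same formulas apply to the paired-end workflow that follows, with fragments instead of reads.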

RNA-seq for Paired-end fastqs

This workflow takes as input a list of paired-end fastqs. Adapters and low-quality bases are removed with fastp. Reads are mapped with STAR using ENCODE parameters, which simultaneously counts reads per gene and computes normalized coverage (per million mapped reads) on uniquely mapped reads. The counts are reprocessed to resemble HTSeq-count output. Alternatively, featureCounts can be used to count the reads/fragments per gene. FPKM values are computed with cufflinks and/or StringTie. The unstranded normalized coverage is computed with bedtools.

RNAseq, transcriptomics

dada2 amplicon analysis pipeline - for paired end data

dada2 amplicon analysis for paired-end data. The workflow has three main outputs:
- the sequence table (output of makeSequenceTable)
- the taxonomy (output of assignTaxonomy)
- the counts, which track the number of sequences in the samples through the steps (output of sequence counts)

name:amplicon

QIIME2 Ia: multiplexed data (single-end)

Importing single-end multiplexed data (not demultiplexed yet)

QIIME2 Ib: multiplexed data (paired-end)

Importing paired-end multiplexed data (not demultiplexed yet)

QIIME2 Ic: Demultiplexed data (single-end)

Importing demultiplexed data (single-end)

QIIME2 Id: Demultiplexed data (paired-end)

Importing demultiplexed data (paired-end)

QIIME2-III-V-Phylogeny-Rarefaction-Taxonomic-Analysis

This workflow performs:
- phylogeny reconstruction (inserting fragments into a reference)
- alpha rarefaction analysis
- taxonomic analysis

QIIME2 VI: Diversity metrics and estimations

The first step in hypothesis testing in microbial ecology is typically to look at within- (alpha) and between-sample (beta) diversity. We can calculate diversity metrics, apply appropriate statistical tests, and visualize the data using the q2-diversity plugin.
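
q2-diversity computes several alpha and beta diversity metrics; two common ones, Shannon diversity (alpha) and Bray-Curtis dissimilarity (beta), are written out below in plain Python to show what is computed per sample and per pair of samples. The log base is a parameter here (2 by default, an assumption for this sketch), and the plugin's own defaults, rarefaction step and other metrics are not reproduced.

```python
import math
from typing import Sequence


def shannon(counts: Sequence[int], base: float = 2.0) -> float:
    """Shannon entropy H = -sum(p_i * log(p_i)) over non-zero features."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p) / math.log(base)
    return h


def bray_curtis(u: Sequence[int], v: Sequence[int]) -> float:
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    if len(u) != len(v):
        raise ValueError("samples must cover the same features")
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator
```

A sample with two equally abundant features has a base-2 Shannon diversity of 1, and two samples sharing no features have a Bray-Curtis dissimilarity of 1.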

QIIME2 IIa: Denoising (sequence quality control) and feature table creation (single-end)

Use DADA2 for sequence quality control. DADA2 is a pipeline for detecting and correcting (where possible) errors in Illumina amplicon sequence data. As implemented in the q2-dada2 plugin, this quality control process additionally filters out any phiX reads (commonly present in marker gene Illumina sequence data) identified in the sequencing data, as well as chimeric sequences.

QIIME2 IIb: Denoising (sequence quality control) and feature table creation (paired-end)

Use DADA2 for sequence quality control. DADA2 is a pipeline for detecting and correcting (where possible) errors in Illumina amplicon sequence data. As implemented in the q2-dada2 plugin, this quality control process additionally filters out any phiX reads (commonly present in marker gene Illumina sequence data) identified in the sequencing data, as well as chimeric sequences.

Repeat masking with RepeatModeler and RepeatMasker

Mass spectrometry: GCMS with metaMS

This workflow combines the XCMS R package (Smith, C.A. 2006), used for extraction, with the metaMS R package (Wehrens, R. 2014), designed for untargeted metabolomics. https://training.galaxyproject.org/training-material/topics/metabolomics/tutorials/gcms/tutorial.html

metabolomics, MS, workflow4metabolomics, GC-MS, GTN, metaMS

Mass spectrometry: LC-MS preprocessing with XCMS

This workflow is built on the XCMS R package (Smith, C.A. 2006), which can extract, filter, align and fill gaps, with the possibility to annotate isotopes, adducts and fragments using the CAMERA R package (Kuhl, C. 2012). https://training.galaxyproject.org/training-material/topics/metabolomics/tutorials/lcms-preprocessing/tutorial.html

metabolomics, MS, LC-MS, workflow4metabolomics, xcms, GTN

Pox Virus Illumina Amplicon Workflow from half-genomes

A workflow for the analysis of pox virus genomes sequenced as half-genomes (for ITR resolution) in a tiled-amplicon approach

poxvirology

Purging-duplicates-one-haplotype-VGP6b

VGP_curated

Generate Nx and Size plots for multiple assemblies

Mitogenome-Assembly-VGP0

Reviewed, VGP

Purge-duplicate-contigs-VGP6

Purge contigs marked as duplicates by purge_dups (either haplotypic duplication or overlap duplication). This workflow is the 6th workflow of the VGP pipeline. It is meant to be run after one of the contigging steps (Workflow 3, 4, or 5).

VGP_curated

Assembly-decontamination-VGP9

VGP_curated

Assembly-Hifi-only-VGP3

VGP, Reviewed

kmer-profiling-hifi-trio-VGP2

Create a Meryl database used for estimating assembly parameters and for quality control with Merqury. Part of the VGP pipeline.

Reviewed, VGP

Scaffolding-BioNano-VGP7

VGP_curated

Assembly-Hifi-Trio-phasing-VGP5

VGP, Reviewed

kmer-profiling-hifi-VGP1

Performs k-mer profiling on PacBio data and generates GenomeScope plots and summary for genome characteristics assessment.

Reviewed, VGP
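
A k-mer profile is a histogram of how many distinct k-mers occur at each multiplicity, and GenomeScope fits its model to such a histogram. The toy Python sketch below counts canonical k-mers in memory to show what the profile contains; the real workflow uses Meryl for counting, and the helper names here are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count canonical k-mers across all reads."""
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= _VALID:  # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1
    return counts


def kmer_spectrum(counts: Counter) -> Dict[int, int]:
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    return dict(sorted(Counter(counts.values()).items()))
```

Counting canonical k-mers makes a k-mer and its reverse complement contribute to the same entry, since reads come from both strands.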

Assembly-Hifi-HiC-phasing-VGP4

VGP, Reviewed

Scaffolding with Hi-C data VGP8

Scaffolding using Hi-C data with YaHS.

VGP_curated