- Gene-based Pathogen Identification
- Pathogen Detection PathoGFAIR Samples Aggregation and Visualisation
- Nanopore Preprocessing
- Allele-based Pathogen Identification
- Taxonomy Profiling and Visualization with Krona
- Assembly polishing with long reads
- Quality and Contamination Control For Genome Assembly
- Genome assembly with Flye
- Bacterial Genome Assembly using Shovill
- bacterial_genome_annotation
- amr_gene_detection
- Create GRO and TOP complex files
- dcTMD calculations with GROMACS
- Fragment-based virtual screening using rDock for docking and SuCOS for pose scoring
- MMGBSA calculations with GROMACS
- COVID-19: variation analysis reporting
- COVID-19: variation analysis on WGS SE data
- SARS-CoV-2 Illumina Amplicon pipeline - iVar based
- COVID-19: variation analysis of ARTIC ONT data
- COVID-19: variation analysis on WGS PE data
- COVID-19: consensus construction
- COVID-19: variation analysis on ARTIC PE data
- Paired end variant calling in haploid system
- Generic variation analysis on WGS PE data
- Generic variation analysis reporting
- Parallel Accession Download
- sra_manifest_to_concatenated_fastqs_parallel
- Segmentation and counting of cell nuclei in fluorescence microscopy images
- baredSC_1d_logNorm
- baredSC_2d_logNorm
- Velocyto-on10X-from-bundled
- Velocyto-on10X-filtered-barcodes
- scRNA-seq_preprocessing_10X_cellPlex
- scRNA-seq_preprocessing_10X_v3_Bundle
- Differential gene expression for single-cell data using pseudo-bulk counts with edgeR
- Hi-C_fastqToCool_hicup_cooler
- cHi-C_fastqToCool_hicup_cooler
- Hi-C_juicermediumtabixToCool_cooler
- Hi-C_fastqToPairs_hicup
- Get Confident Peaks From ChIP_SR replicates
- Get Confident Peaks From ChIP_PE replicates
- Get Confident Peaks From ATAC or CUTandRUN replicates
- ChIPseq_PE
- CUTandRUN
- ATACseq
- ChIPseq_SR
- Average Bigwig between replicates
- MetaProSIP OpenMS 2.8
- Clinical Metaproteomics Data Interpretation
- Generate a Clinical Metaproteomics Database
- Clinical Metaproteomics Quantitation
- Clinical Metaproteomics Verification Workflow
- Clinical Metaproteomics Discovery Workflow
- Goseq GO-KEGG Enrichment Analysis
- RNAseq_DE_filtering_plotting
- BREW3R
- RNA-seq for Single-read fastqs
- RNA-seq for Paired-end fastqs
- dada2 amplicon analysis pipeline - for paired end data
- QIIME2 Ia: multiplexed data (single-end)
- QIIME2 Ib: multiplexed data (paired-end)
- QIIME2 Ic: Demultiplexed data (single-end)
- QIIME2 Id: Demultiplexed data (paired-end)
- QIIME2-III-V-Phylogeny-Rarefaction-Taxonomic-Analysis
- QIIME2 VI: Diversity metrics and estimations
- QIIME2 IIa: Denoising (sequence quality control) and feature table creation (single-end)
- QIIME2 IIb: Denoising (sequence quality control) and feature table creation (paired-end)
- Repeat masking with RepeatModeler and RepeatMasker
- Mass spectrometry: GCMS with metaMS
- Mass spectrometry: LC-MS preprocessing with XCMS
- Pox Virus Illumina Amplicon Workflow from half-genomes
- Purging-duplicates-one-haplotype-VGP6b
- Generate Nx and Size plots for multiple assemblies
- Mitogenome-Assembly-VGP0
- Purge-duplicate-contigs-VGP6
- Assembly-decontamination-VGP9
- Assembly-Hifi-only-VGP3
- kmer-profiling-hifi-trio-VGP2
- Scaffolding-BioNano-VGP7
- Assembly-Hifi-Trio-phasing-VGP5
- kmer-profiling-hifi-VGP1
- Assembly-Hifi-HiC-phasing-VGP4
- Scaffolding with Hi-C data VGP8
Gene-based Pathogen Identification
Nanopore datasets analysis - Phylogenetic Identification - antibiotic resistance genes detection and contigs building
Pathogen Detection PathoGFAIR Samples Aggregation and Visualisation
Pathogens of all samples report generation and visualization
Nanopore Preprocessing
Microbiome - QC and Contamination Filtering
Allele-based Pathogen Identification
Microbiome - Variant calling and Consensus Building
Taxonomy Profiling and Visualization with Krona
Assembly polishing with long reads
Racon polish with long reads, x4
Quality and Contamination Control For Genome Assembly
Short paired-end read analysis to provide quality analysis, read cleaning and taxonomic assignment
Genome assembly with Flye
Assemble long reads with Flye, then view assembly statistics and assembly graph
Bacterial Genome Assembly using Shovill
Assembly of bacterial paired-end short read data with generation of quality metrics and reports
bacterial_genome_annotation
Annotation of assembled bacterial genomes to detect genes, potential plasmids, integrons and insertion sequence (IS) elements.
amr_gene_detection
Antimicrobial resistance gene detection from assembled bacterial genomes
Create GRO and TOP complex files
dcTMD calculations with GROMACS
Perform dcTMD free energy simulations and calculations
Fragment-based virtual screening using rDock for docking and SuCOS for pose scoring
Virtual screening of the SARS-CoV-2 main protease with rDock and pose scoring
MMGBSA calculations with GROMACS
MMGBSA simulation and calculation
COVID-19: variation analysis reporting
This workflow takes a VCF dataset of variants produced by any of the *-variant-calling workflows in https://github.com/galaxyproject/iwc/tree/main/workflows/sars-cov-2-variant-calling and generates tabular lists of variants by Samples and by Variant, and an overview plot of variants and their allele-frequencies.
COVID-19: variation analysis on WGS SE data
This workflow performs single-end read mapping with bowtie2, followed by sensitive variant calling across a wide range of allele frequencies (AFs) with lofreq.
SARS-CoV-2 Illumina Amplicon pipeline - iVar based
Find and annotate variants in ampliconic SARS-CoV-2 Illumina sequencing data and classify samples with pangolin and nextclade
COVID-19: variation analysis of ARTIC ONT data
This workflow for ONT-sequenced ARTIC data is modeled after the alignment/variant-calling steps of the [ARTIC pipeline](https://artic.readthedocs.io/en/latest/). It performs essentially the same steps as that pipeline’s minion command, i.e. read mapping with minimap2 and variant calling with medaka. Like the Illumina ARTIC workflow, it uses ivar for primer trimming. Since ONT-sequenced reads have a much higher error rate than Illumina-sequenced reads and are therefore plagued more by false-positive variant calls, this workflow makes no attempt to handle amplicons affected by potential primer-binding site mutations.
COVID-19: variation analysis on WGS PE data
This workflow performs paired-end read mapping with bwa-mem, followed by sensitive variant calling across a wide range of allele frequencies (AFs) with lofreq.
COVID-19: consensus construction
Build a consensus sequence from FILTER PASS variants with intrasample allele-frequency above a configurable consensus threshold. Hard-mask regions with low coverage (but not consensus variants within them) and ambiguous sites.
COVID-19: variation analysis on ARTIC PE data
The workflow for Illumina-sequenced ARTIC data builds on the RNASeq workflow for paired-end data, using the same steps for mapping and variant calling, but adds extra logic for trimming ARTIC primer sequences off reads with the ivar package. In addition, this workflow also uses ivar to identify amplicons affected by ARTIC primer-binding site mutations and tries to exclude reads derived from such tainted amplicons when calculating allele-frequencies of other variants.
Paired end variant calling in haploid system
Workflow for variant analysis against a reference genome in GenBank format
Generic variation analysis on WGS PE data
Generic variation analysis reporting
This workflow takes a VCF dataset of variants produced by any of the variant calling workflows in https://github.com/galaxyproject/iwc/tree/main/workflows/sars-cov-2-variant-calling and generates tabular lists of variants by Samples and by Variant, and an overview plot of variants and their allele-frequencies.
Parallel Accession Download
Downloads fastq files for sequencing run accessions provided in a text file using fasterq-dump. Creates one job per listed run accession.
sra_manifest_to_concatenated_fastqs_parallel
This workflow takes as input an SRA_manifest from SRA Run Selector and generates one fastq file or one pair of fastq files for each experiment (concatenating multiple runs if necessary). Outputs are relabelled to match the column specified by the user.
Segmentation and counting of cell nuclei in fluorescence microscopy images
This workflow performs segmentation and counting of cell nuclei using fluorescence microscopy images. The segmentation step is performed using Otsu thresholding (Otsu, 1979). The workflow is based on the tutorial: https://training.galaxyproject.org/training-material/topics/imaging/tutorials/imaging-introduction/tutorial.html
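As a rough illustration of the Otsu thresholding step mentioned above, the sketch below picks the intensity threshold that maximises the between-class variance of background and foreground pixels. It is not the implementation used by the workflow's tools: the function name `otsu_threshold`, the flat list of 8-bit intensities as input, and the tie-breaking rule (lowest threshold wins) are all assumptions made for this example.

```python
def otsu_threshold(pixels, levels=256):
    """Return the Otsu threshold for a flat list of integer intensities.

    Hypothetical helper for illustration only. Pixels with a value
    <= threshold are treated as background, the rest as foreground.
    """
    # Build the intensity histogram.
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    weight_bg = 0        # number of background pixels so far
    sum_bg = 0           # sum of background intensities so far
    best_threshold = 0
    best_variance = -1.0

    for level in range(levels):
        weight_bg += hist[level]
        if weight_bg == 0:
            continue     # no background pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break        # no foreground pixels left

        sum_bg += level * hist[level]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg

        # Between-class variance (up to a constant factor of 1 / total**2).
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance = variance
            best_threshold = level

    return best_threshold


# Toy image: 50 dark background pixels and 50 bright "nucleus" pixels.
toy_pixels = [10] * 50 + [200] * 50
threshold = otsu_threshold(toy_pixels)
foreground = [p for p in toy_pixels if p > threshold]
```

Thresholding only separates foreground from background. Counting individual nuclei would additionally require labelling connected foreground regions, which is not shown here.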
baredSC_1d_logNorm
Run baredSC in 1 dimension in logNorm for 1 to N gaussians and combine models.
baredSC_2d_logNorm
Run baredSC in 2 dimensions in logNorm for 1 to N gaussians and combine models.
Velocyto-on10X-from-bundled
Run velocyto to get a loom file with spliced and unspliced counts. It will extract the 'barcodes' from the bundled outputs.
Velocyto-on10X-filtered-barcodes
scRNA-seq_preprocessing_10X_cellPlex
This workflow processes the CMO fastqs with CITE-seq-Count and includes the translation step required for cellPlex processing. In parallel, it processes the Gene Expression fastqs with STARsolo, filters cells with DropletUtils and reformats all outputs to be easily used by the function 'Read10X' from Seurat.
scRNA-seq_preprocessing_10X_v3_Bundle
This workflow processes the Gene Expression fastqs with STARsolo, filters cells with DropletUtils and reformats all outputs to be easily used by the function 'Read10X' from Seurat.
Differential gene expression for single-cell data using pseudo-bulk counts with edgeR
This workflow uses the decoupler tool in Galaxy to generate pseudobulk counts from an annotated AnnData file obtained from scRNA-seq analysis. Following the pseudobulk step, differentially expressed genes (DEGs) are identified using the edgeR tool. The workflow also includes data sanitation steps to ensure smooth operation of edgeR and minimize potential issues. Additionally, a Volcano plot tool is used to visualize the results after the DEG analysis.
Hi-C_fastqToCool_hicup_cooler
This workflow takes as input a collection of paired fastqs. It uses HiCUP to go from fastq to validPair file using the middle of the fragment as coordinates. The pairs are filtered for MAPQ and sorted by cooler to generate a tabix dataset. Cooler is used to generate a balanced cool file at the desired resolution.
cHi-C_fastqToCool_hicup_cooler
This workflow takes as input a collection of paired fastqs. It uses HiCUP to go from fastq to validPair file. The pairs are filtered for MAPQ and for the region captured. Then, they are sorted by cooler to generate a tabix dataset. Cooler is used to generate a balanced cool file at the desired resolution.
Hi-C_juicermediumtabixToCool_cooler
This workflow uses as input a collection of juicer medium tabix files and a genome name. It builds a balanced cool file at the desired resolution.
Hi-C_fastqToPairs_hicup
This workflow takes as input a collection of paired fastqs. It uses HiCUP to go from fastq to validPair file. It first truncates the fastqs using the cutting sequence to guess the fill-in, then maps the truncated fastqs. The reads are then assigned to fragments, and self-ligated, dangling-end and internal pairs are filtered out (pairs can also be filtered by size). Duplicates are then removed, and the output is converted to be compatible with juicebox or cooler, using the middle of the fragment as coordinates. Finally, pairs are filtered for mapping quality.
Get Confident Peaks From ChIP_SR replicates
This workflow takes as input SR BAM files from ChIP-seq. It calls peaks on each replicate and intersects them. In parallel, each BAM is subset to the smallest number of reads, and peaks are called using all subsets combined. Only peaks from this combined call whose summits intersect the intersection of at least x replicates are kept.
Get Confident Peaks From ChIP_PE replicates
This workflow takes as input PE BAM files from ChIP-seq. It calls peaks on each replicate and intersects them. In parallel, each BAM is subset to the smallest number of reads, and peaks are called using all subsets combined. Only peaks from this combined call whose summits intersect the intersection of at least x replicates are kept.
Get Confident Peaks From ATAC or CUTandRUN replicates
This workflow takes as input BAM files from ATAC-seq or CUT&RUN. It calls peaks on each replicate and intersects them. In parallel, each BAM is subset to the smallest number of reads, and peaks are called using all subsets combined. Only peaks from this combined call whose summits intersect the intersection of at least x replicates are kept.
ChIPseq_PE
This workflow takes as input a collection of paired fastqs. Adapters are removed with cutadapt and pairs are mapped with bowtie2. Only MAPQ30 concordant pairs are kept. Peaks are called with MACS2 on the paired BAM.
CUTandRUN
This workflow takes as input a collection of paired fastqs. Adapters are removed with cutadapt and pairs are mapped with bowtie2, allowing dovetail. Only MAPQ30 concordant pairs are kept. The BAM is converted to BED, and peaks are called with MACS2 using "ATAC" parameters.
ATACseq
This workflow takes as input a collection of paired fastqs. It removes bad quality bases and adapters with cutadapt and maps with Bowtie2 end-to-end. It removes reads on MT, non-concordant pairs, pairs with mapping quality below 30, and PCR duplicates. It computes the pile-up on 5' +- 100bp, calls peaks, and counts the number of reads falling in the 1kb region centered on each summit. It computes two coverage normalizations: by million reads and by million reads in peaks. Finally, it plots the number of reads for each fragment length.
ChIPseq_SR
This workflow takes as input a collection of fastqs (single reads). Adapters are removed with cutadapt and reads are mapped with bowtie2. Only MAPQ30 reads are kept. Peaks are called with MACS2 on the BAM, using either a fixed extension or a model.
Average Bigwig between replicates
We assume the identifiers of the input list are like: sample_name_replicateID. The identifiers of the output list will be: sample_name
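The naming convention above can be sketched as a small grouping step. This is only an illustration of how `sample_name_replicateID` identifiers map to `sample_name`, not the workflow's own code: the function name `group_replicates` is hypothetical, and the sketch assumes the replicate ID is whatever follows the last underscore (so the replicate ID itself must not contain an underscore).

```python
from collections import defaultdict


def group_replicates(identifiers):
    """Group 'sample_name_replicateID' identifiers by their sample_name.

    Hypothetical helper: assumes the replicate ID is the part after the
    LAST underscore, so sample names may contain underscores themselves.
    """
    groups = defaultdict(list)
    for identifier in identifiers:
        sample_name, separator, replicate_id = identifier.rpartition("_")
        if not separator or not sample_name or not replicate_id:
            raise ValueError(
                f"{identifier!r} does not look like sample_name_replicateID"
            )
        groups[sample_name].append(identifier)
    return dict(groups)


# Example input list identifiers (made up for illustration).
inputs = ["wt_liver_rep1", "wt_liver_rep2", "ko_liver_rep1"]
grouped = group_replicates(inputs)
```

Each key of `grouped` corresponds to one identifier of the output list, and the values are the replicates that would be averaged together.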
MetaProSIP OpenMS 2.8
Automated inference of stable isotope incorporation rates in proteins for functional metaproteomics
Clinical Metaproteomics Data Interpretation
This workflow will perform taxonomic and functional annotations using Unipept and statistical analysis using MSstatsTMT.
Generate a Clinical Metaproteomics Database
The workflow begins with the Database Generation process. The Galaxy-P team has developed a workflow that collects protein sequences from known disease-causing microorganisms to build a comprehensive database. This extensive database is then refined into a smaller, more relevant dataset using the Metanovo tool.
Clinical Metaproteomics Quantitation
Clinical Metaproteomics Verification Workflow
In proteomics research, verifying detected peptides is essential for ensuring data accuracy and biological relevance. This tutorial continues from the clinical metaproteomics discovery workflow, focusing on verifying identified microbial peptides using the PepQuery tool.
Clinical Metaproteomics Discovery Workflow
Goseq GO-KEGG Enrichment Analysis
This workflow is used for GO and KEGG enrichment analysis using GOseq tools.
RNAseq_DE_filtering_plotting
This workflow only works on an experimental setup with exactly 2 conditions. It takes two collections of count tables as input and performs differential expression analysis. Additionally, it filters for DE genes based on adjusted p-value and log2 fold change thresholds. It also generates informative plots.
BREW3R
This workflow takes a collection of BAM files (output of STAR) and a gtf. It extends the input gtf using de novo annotation.
RNA-seq for Single-read fastqs
This workflow takes as input a list of single-end fastqs. Adapters and bad quality bases are removed with fastp. Reads are mapped with STAR using ENCODE parameters; gene counts and normalized coverage (per million mapped reads) on uniquely mapped reads are computed at the same time. The counts are reprocessed to be similar to HTSeq-count output. Alternatively, featureCounts can be used to count the reads/fragments per gene. FPKM are computed with cufflinks and/or with StringTie. The unstranded normalized coverage is computed with bedtools.
RNA-seq for Paired-end fastqs
This workflow takes as input a list of paired-end fastqs. Adapters and bad quality bases are removed with fastp. Reads are mapped with STAR using ENCODE parameters; gene counts and normalized coverage (per million mapped reads) on uniquely mapped reads are computed at the same time. The counts are reprocessed to be similar to HTSeq-count output. Alternatively, featureCounts can be used to count the reads/fragments per gene. FPKM are computed with cufflinks and/or with StringTie. The unstranded normalized coverage is computed with bedtools.
dada2 amplicon analysis pipeline - for paired end data
dada2 amplicon analysis for paired-end data. The workflow has three main outputs: the sequence table (output of makeSequenceTable), the taxonomy (output of assignTaxonomy), and the counts, which make it possible to track the number of sequences in the samples through the steps (output of sequence counts).
QIIME2 Ia: multiplexed data (single-end)
Importing single-end multiplexed data (not demultiplexed yet)
QIIME2 Ib: multiplexed data (paired-end)
Importing paired-end multiplexed data (not demultiplexed yet)
QIIME2 Ic: Demultiplexed data (single-end)
Importing demultiplexed data (single-end)
QIIME2 Id: Demultiplexed data (paired-end)
Importing demultiplexed data (paired-end)
QIIME2-III-V-Phylogeny-Rarefaction-Taxonomic-Analysis
This workflow reconstructs the phylogeny (inserting fragments into a reference) and performs alpha rarefaction analysis and taxonomic analysis.
QIIME2 VI: Diversity metrics and estimations
The first step in hypothesis testing in microbial ecology is typically to look at within- (alpha) and between-sample (beta) diversity. We can calculate diversity metrics, apply appropriate statistical tests, and visualize the data using the q2-diversity plugin.
QIIME2 IIa: Denoising (sequence quality control) and feature table creation (single-end)
Use DADA2 for sequence quality control. DADA2 is a pipeline for detecting and, where possible, correcting errors in Illumina amplicon sequence data. As implemented in the q2-dada2 plugin, this quality control process will additionally filter any phiX reads (commonly present in marker gene Illumina sequence data) that are identified in the sequencing data, and will filter chimeric sequences.
QIIME2 IIb: Denoising (sequence quality control) and feature table creation (paired-end)
Use DADA2 for sequence quality control. DADA2 is a pipeline for detecting and, where possible, correcting errors in Illumina amplicon sequence data. As implemented in the q2-dada2 plugin, this quality control process will additionally filter any phiX reads (commonly present in marker gene Illumina sequence data) that are identified in the sequencing data, and will filter chimeric sequences.
Repeat masking with RepeatModeler and RepeatMasker
Mass spectrometry: GCMS with metaMS
This workflow is composed with the XCMS tool R package (Smith, C.A. 2006) able to extract and the metaMS R package (Wehrens, R 2014) for the field of untargeted metabolomics. https://training.galaxyproject.org/training-material/topics/metabolomics/tutorials/gcms/tutorial.html
Mass spectrometry: LC-MS preprocessing with XCMS
This workflow is composed of the XCMS R package (Smith, C.A. 2006), which can extract, filter, align and fill gaps, with the possibility to annotate isotopes, adducts and fragments using the CAMERA R package (Kuhl, C 2012). https://training.galaxyproject.org/training-material/topics/metabolomics/tutorials/lcms-preprocessing/tutorial.html
Pox Virus Illumina Amplicon Workflow from half-genomes
A workflow for the analysis of pox virus genomes sequenced as half-genomes (for ITR resolution) in a tiled-amplicon approach
Purging-duplicates-one-haplotype-VGP6b
Generate Nx and Size plots for multiple assemblies
Mitogenome-Assembly-VGP0
Purge-duplicate-contigs-VGP6
Purge contigs marked as duplicates by purge_dups (could be haplotypic duplication or overlap duplication). This workflow is the 6th workflow of the VGP pipeline. It is meant to be run after one of the contigging steps (Workflow 3, 4, or 5)
Assembly-decontamination-VGP9
Assembly-Hifi-only-VGP3
kmer-profiling-hifi-trio-VGP2
Create Meryl Database used for the estimation of assembly parameters and quality control with Merqury. Part of the VGP pipeline.
Scaffolding-BioNano-VGP7
Assembly-Hifi-Trio-phasing-VGP5
kmer-profiling-hifi-VGP1
Performs k-mer profiling on PacBio data and generates GenomeScope plots and summary for genome characteristics assessment.