dada2 amplicon analysis pipeline - for paired end data
dada2 amplicon analysis for paired end data
The workflow has three main outputs:
- the sequence table (output of makeSequenceTable)
- the taxonomy (output of assignTaxonomy)
- the counts, which allow tracking the number of sequences in the samples through the steps (output of sequence counts)
- Author(s):
- Release: 0.2
- License: MIT
- UniqueID: 11d717f8-92ef-4dd7-bda7-c193175fece2
Dada2: amplicon analysis for paired end data
Inputs dataset
Paired input data
paired input collection in FASTQ format
Inputs values
Read length forward/reverse reads
length of the forward/reverse reads to which they should be truncated in the filter and trim step
Pool samples
pooling may increase sensitivity
Reference database
that should be used for taxonomic assignment
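The effect of the read length inputs can be pictured with a small Python sketch. It is only an illustration, not the workflow's code: the function names are hypothetical, and it assumes the behaviour documented for dada2's truncLen option, where reads are cut to the given length, reads shorter than that length are discarded, and a pair is kept only if both of its reads survive.

```python
from typing import List, Optional, Tuple

Read = Tuple[str, str]  # (sequence, quality string), a simplified FASTQ record


def truncate_read(read: Read, trunc_len: int) -> Optional[Read]:
    """Cut a read to trunc_len bases; return None if the read is too short."""
    seq, qual = read
    if trunc_len <= 0:
        return read  # assumption: 0 means "do not truncate"
    if len(seq) < trunc_len:
        return None  # reads shorter than the truncation length are dropped
    return seq[:trunc_len], qual[:trunc_len]


def truncate_pairs(
    pairs: List[Tuple[Read, Read]], trunc_fwd: int, trunc_rev: int
) -> List[Tuple[Read, Read]]:
    """Truncate forward and reverse reads; keep a pair only if both reads pass."""
    kept = []
    for fwd, rev in pairs:
        f = truncate_read(fwd, trunc_fwd)
        r = truncate_read(rev, trunc_rev)
        if f is not None and r is not None:
            kept.append((f, r))
    return kept


# Toy reads, invented for illustration only.
pairs = [
    (("ACGTACGT", "IIIIIIII"), ("TTGGCCAA", "IIIIIIII")),
    (("ACG", "III"), ("TTGGCCAA", "IIIIIIII")),  # forward read too short
]
kept = truncate_pairs(pairs, trunc_fwd=6, trunc_rev=4)
```

Forward and reverse lengths are separate parameters because reverse reads often lose quality earlier, so they are typically truncated more aggressively.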
Processing
The workflow follows the steps described in the dada2 tutorial.
As a first step, the input collection is sorted. This is important because the dada2 step outputs a collection in sorted order. If the input collection were not sorted, the samples would be mixed up in the mergePairs step.
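Why the sorting matters can be shown with a toy Python sketch. It assumes, purely for illustration, that a downstream step matches the elements of two collections by position and that one of the collections comes back in sorted order; the sample names and helper functions are hypothetical and are not part of the workflow.

```python
from typing import List, Tuple


def pair_by_position(first: List[str], second: List[str]) -> List[Tuple[str, str]]:
    """Match elements of two collections by position, ignoring their names."""
    return list(zip(first, second))


def sample_of(name: str) -> str:
    """Strip the '_denoised' / '_reads' suffix used in this toy example."""
    return name.rsplit("_", 1)[0]


def all_matched(pairs: List[Tuple[str, str]]) -> bool:
    """True if every pair combines two elements of the same sample."""
    return all(sample_of(a) == sample_of(b) for a, b in pairs)


input_order = ["sampleB", "sampleA", "sampleC"]

# Unsorted input: the reads keep the input order, while the denoising
# output is assumed to come back sorted by sample name.
reads = [f"{s}_reads" for s in input_order]
denoised = [f"{s}_denoised" for s in sorted(input_order)]
mixed = pair_by_position(denoised, reads)

# Sorted input: both collections share the same order, so positions agree.
sorted_reads = [f"{s}_reads" for s in sorted(input_order)]
fixed = pair_by_position(denoised, sorted_reads)
```

In the unsorted case the first pair combines sampleA's denoised data with sampleB's reads, which is the kind of mix-up the initial sorting step prevents.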
FilterAndTrim
quality control by filtering and trimming reads
QualityProfile
is called before and after the FilterAndTrim step
Unzip Collection
separates forward and reverse reads (the next steps are evaluated separately on forward and reverse reads)
learnErrors
learn error rates
dada
filter noisy reads
mergePairs
merge forward and reverse reads
makeSequenceTable
create the sequence table
removeBimeraDenovo
remove chimeric sequences
assignTaxonomy
assign taxonomic information from a reference database
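The sequence counts output, which tracks reads per sample through these steps, can be thought of as a small table with one row per sample and one column per step. The Python sketch below only illustrates that idea: the step labels, the function name, and the numbers are made up for the example and do not reflect the workflow's actual column names or any real data.

```python
from typing import Dict, List

# Hypothetical step labels, chosen for this sketch only.
STEPS = ["input", "filtered", "denoised", "merged", "nonchimeric"]


def track_counts(counts_per_step: Dict[str, Dict[str, int]]) -> List[dict]:
    """Build one row per sample with the read count after each step."""
    samples = sorted(counts_per_step[STEPS[0]])
    table = []
    for sample in samples:
        row = {"sample": sample}
        for step in STEPS:
            # A sample missing from a step has lost all of its reads.
            row[step] = counts_per_step[step].get(sample, 0)
        start = row[STEPS[0]]
        # Fraction of the input reads that survive to the last step.
        row["retained"] = row[STEPS[-1]] / start if start else 0.0
        table.append(row)
    return table


# Invented numbers, for illustration only.
counts = {
    "input": {"sampleA": 1000, "sampleB": 800},
    "filtered": {"sampleA": 900, "sampleB": 600},
    "denoised": {"sampleA": 850, "sampleB": 560},
    "merged": {"sampleA": 700, "sampleB": 440},
    "nonchimeric": {"sampleA": 500, "sampleB": 400},
}
table = track_counts(counts)
```

A sharp drop between two adjacent columns points at the step that needs attention. For example, a large loss at the merging step often suggests that the truncated reads no longer overlap enough.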
TODO
Some possibilities to extend/improve the workflow
- output BIOM
- use ASV1, ... as sequence names in the sequence table and taxonomy output, and additionally output a FASTA file
- allow using a custom taxonomy / make it optional
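The ASV naming idea from the list above could look roughly like the following Python sketch. It is not implemented in the workflow, and the function names and the dictionary-based table layout are assumptions made for the example: each unique sequence gets a short identifier (ASV1, ASV2, ...), the table is re-keyed by those identifiers, and a FASTA file maps the identifiers back to the sequences.

```python
from typing import Dict, List


def name_asvs(sequences: List[str]) -> Dict[str, str]:
    """Assign ASV1, ASV2, ... to the sequences in the order given."""
    return {f"ASV{i}": seq for i, seq in enumerate(sequences, start=1)}


def to_fasta(named: Dict[str, str]) -> str:
    """Render the identifier-to-sequence mapping as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in named.items())


def rename_table(
    seq_table: Dict[str, List[int]], named: Dict[str, str]
) -> Dict[str, List[int]]:
    """Re-key a table from full sequences to their ASV identifiers."""
    by_seq = {seq: name for name, seq in named.items()}
    return {by_seq[seq]: counts for seq, counts in seq_table.items()}


# Toy table: sequence -> counts per sample (invented values).
seq_table = {"ACGT": [3, 0], "TTGA": [1, 5]}
named = name_asvs(list(seq_table))
fasta = to_fasta(named)
short_table = rename_table(seq_table, named)
```

Using the same mapping for the sequence table and the taxonomy output keeps the two files joinable by identifier, while the FASTA file preserves the full sequences.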