Clean and manage Sanger sequences from raw files to aligned consensus

Author(s)	Coline Royaux
Reviewers

Overview
Questions:

How to clean Sanger sequencing files?

Objectives:

Learn how to manage sequencing files (AB1, FASTQ, FASTA)

Learn how to clean your Sanger sequences in an automated and reproducible way

Requirements:

Introduction to Galaxy Analyses

Time estimation: 1 hour

Supporting Materials:

Datasets

Workflows

FAQs

Available on these Galaxies

Known Working

UseGalaxy.eu ✅ ⭐️

UseGalaxy.fr ✅ ⭐️

UseGalaxy.cz ✅

Published: Jan 8, 2024

Last modification: Mar 5, 2024

License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT

PURL: https://gxy.io/GTN:T00383

Rating: 1.0 (1 recent ratings, 3 all time)

Revision: 2

The objective of this tutorial is to learn how to clean and manage AB1 data files freshly obtained from Sanger sequencing. This kind of sequencing targets a specific sequence using short single-stranded DNA fragments called primers, which delimit the ends of the targeted marker. Usually, one gets two .ab1 files for each sample, representing the sense (forward) and the antisense (reverse) strands of DNA.

Here, we’ll be using raw data from “AOPEP variants as a novel cause of recessive dystonia: Generalized dystonia and dystonia-parkinsonism” 2022. In this article, two DNA markers are investigated: CHD8 (Chromodomain-helicase-DNA-binding protein 8) and AOPEP (Aminopeptidase O Putative). We’ll focus on the CHD8 sequences, but you can try to apply the same steps to the AOPEP sequences to practice after the tutorial!

In the first section of the tutorial, we’ll prepare the primer data by:

selecting the right primer sequences with their identifiers;
removing any gaps included in the sequences;
computing the reverse-complement sequence for the antisense primer only.

In the second section of the tutorial, we’ll prepare the Sanger sequence data by:

extracting the ab1 files of the sequence of interest (CHD8) and separating sense and antisense sequences into two distinct data collections;
converting the ab1 files to FASTQ so that they can be used in the following tools;
trimming the low-quality ends of the sequences;
computing the reverse complement for the antisense sequences only;
aligning the sense and antisense sequences;
obtaining a consensus sequence (which results from the correspondence between the nucleotides of the sense and antisense sequences) for each of the three samples.

In the third section of the tutorial, primers and all consensus sequences are finally merged into a single file to be aligned and verified.

Consider a double-strand DNA molecule with the following sequences:

Figure 1: Double-strand DNA

When sequencing, each strand of DNA is read separately in the 5’-3’ orientation. Hence, in the sequence files, the strands are provided as:

Figure 2: Single-strand DNA sequences in output file

To get the antisense sequence in its original orientation, the reverse sequence is computed:

Figure 3: Reversed antisense sequence

To align the sense and antisense sequences, the complement of the reversed antisense sequence is computed:

Figure 4: Reverse-complement of the antisense sequence

The two sequences can be aligned now:

Figure 5: Aligned sense and antisense sequences
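
The reverse and complement operations described above can be sketched in a few lines of Python. This is only an illustration: the example sequences are hypothetical, and the function names are not those of any Galaxy tool.

```python
# Complement of each nucleotide; N (unknown base) and the gap symbol map to themselves.
COMPLEMENT = str.maketrans("ACGTNacgtn-", "TGCANtgcan-")


def reverse(seq: str) -> str:
    """Return the sequence read in the opposite direction."""
    return seq[::-1]


def complement(seq: str) -> str:
    """Return the complementary base for each position, keeping the order."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Reverse the sequence, then take the complement of each base."""
    return complement(reverse(seq))


# Hypothetical strands, both written 5'->3' as they would appear in the sequence files.
sense = "ATGGCCTAAC"
antisense = "GTTAGGCCAT"

# Once reverse-complemented, the antisense read is directly comparable to the sense read.
print(reverse_complement(antisense))
```

With these two hypothetical strands, the reverse-complemented antisense read is identical to the sense read, which is what makes the alignment possible.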

Agenda

In this tutorial, we will cover:

Get data

Prepare primer data

Separate and format primers files

Prepare sequence data

Unzip data files

Filter collection to separate sense and antisense sequence files

Convert AB1 sequence files to FASTQ and trim low-quality ends

Compute reverse complement sequence for antisense (reverse) sequences only

Merge corresponding sense and antisense sequences into single files

Convert FASTQ files to FASTA

Align sequences and retrieve consensus for each sequence

Manage primers and sequences

Merge and align consensus sequences file and primer files

Check that your sequences belong to the right taxonomic group by computing a BLAST on the NCBI database

Conclusion

AOPEP Sanger files

Get data

The authors of “AOPEP variants as a novel cause of recessive dystonia: Generalized dystonia and dystonia-parkinsonism” 2022 have openly shared their raw AB1 files on Zenodo.

Hands On: Data Upload
Create a new history for this tutorial
Import the files from Zenodo:
https://zenodo.org/records/7104640/files/AOPEP_and_CHD8_sequences_20220907.zip
Change Type (set all): from “Auto-detect” to zip and click Start

Copy the link location

Click galaxy-upload Upload Data at the top of the tool panel

Select galaxy-wf-edit Paste/Fetch Data

Paste the link(s) into the text field

Press Start

Close the window

As an alternative to uploading the data from a URL or your computer, the files may also have been made available from a shared data library:

Go into Libraries (left panel)

Navigate to the correct folder as indicated by your instructor.

On most Galaxies, tutorial data will be provided in a folder named GTN - Material -> Topic Name -> Tutorial Name.

Select the desired files

Click on Add to History galaxy-dropdown near the top and select as Datasets from the dropdown menu

In the pop-up window, choose

“Select history”: the history you want to import the data to (or create a new one)

Click on Import
Create the primer FASTA file by copying the following sequences:
>Forward_CHD8
GAGGTGAAAGAATCATAAATTGG
>Reverse_CHD8
CCCTGTGTACAAATAGCTTTTGT
>Forward_AOPEP
TCATGGTTCCAGGCAGAGTTATT
>Reverse_AOPEP
TGCTGTGACAAGCCAACCAATGG
Open the Galaxy Upload Manager (galaxy-upload on the top-right of the tool panel)

Select Paste/Fetch Data

Paste into the text field

Change Type (set all): from “Auto-detect” to fasta

Change the name from “New File” to “Primer file”

Click Start

Note that these primer sequences were invented for the purpose of this tutorial; they are not the sequences used in the publication.

Prepare primer data

Separate and format primers files

Primers must be separated into distinct files because sense (forward) and antisense (reverse) primers won’t be subjected to the same formatting.

Hands On: Create separate files for each primer

Filter FASTA ( Galaxy version 2.3) with the following parameters:

param-file “FASTA sequences”: Primer file

“Criteria for filtering on the headers”: Regular expression on the headers

“Regular expression pattern the header should match”: Reverse_CHD8

Add tags “#Primer” and “#Reverse”

Datasets can be tagged. This simplifies the tracking of datasets across the Galaxy interface. Tags can contain any combination of letters or numbers but cannot contain spaces.

To tag a dataset:

Click on the dataset to expand it

Click on Add Tags galaxy-tags

Add tag text. Tags starting with # will be automatically propagated to the outputs of tools using this dataset (see below).

Press Enter

Check that the tag appears below the dataset name

Tags beginning with # are special!

They are called Name tags. The unique feature of these tags is that they propagate: if a dataset is labelled with a name tag, all derivatives (children) of this dataset will automatically inherit this tag (see below). The figure below explains why this is so useful. Consider the following analysis (numbers in parenthesis correspond to dataset numbers in the figure below):

a set of forward and reverse reads (datasets 1 and 2) is mapped against a reference using Bowtie2 generating dataset 3;

dataset 3 is used to calculate read coverage using BedTools Genome Coverage separately for + and - strands. This generates two datasets (4 and 5 for plus and minus, respectively);

datasets 4 and 5 are used as inputs to Macs2 broadCall datasets generating datasets 6 and 8;

datasets 6 and 8 are intersected with coordinates of genes (dataset 9) using BedTools Intersect generating datasets 10 and 11.

Now consider that this analysis is done without name tags. This is shown on the left side of the figure. It is hard to trace which datasets contain “plus” data versus “minus” data. For example, does dataset 10 contain “plus” data or “minus” data? Probably “minus” but are you sure? In the case of a small history like the one shown here, it is possible to trace this manually but as the size of a history grows it will become very challenging.

The right side of the figure shows exactly the same analysis, but using name tags. When the analysis was conducted datasets 4 and 5 were tagged with #plus and #minus, respectively. When they were used as inputs to Macs2 resulting datasets 6 and 8 automatically inherited them and so on… As a result it is straightforward to trace both branches (plus and minus) of this analysis.

More information is in a dedicated #nametag tutorial.

Expand one of the output datasets of the tool (by clicking on it)

Click re-run galaxy-refresh the tool

This is useful if you want to run the tool again but with slightly different parameters, or if you just want to check which parameter settings you used.

Filter FASTA ( Galaxy version 2.3) with the following parameters:

param-file “FASTA sequences”: Primer file

“Criteria for filtering on the headers”: Regular expression on the headers

“Regular expression pattern the header should match”: Forward_CHD8

Add tags “#Primer” and “#Forward”

Remove any gaps from the primers: Degap.seqs ( Galaxy version 1.39.5.0) with the following parameters:

Click on param-files Multiple datasets

Select several files by keeping the Ctrl (or COMMAND) key pressed and clicking on the files of interest

param-files “fasta - Dataset”: Two Filter FASTA outputs (outputs of Filter FASTA tool)

In the previous hands-on, removing any gaps (- in the FASTA files) is a precaution: there are no gaps in our primer file. However, it is important to remove gaps at this point in case you are using different data, otherwise some steps of the tutorial (e.g. alignment) could fail.
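
As a rough illustration of what the two Filter FASTA runs and the gap removal do, here is a minimal Python sketch. It is not the code of the Filter FASTA or Degap.seqs tools: the helper names are hypothetical, and gap removal is assumed to mean deleting the - character only.

```python
import re

PRIMER_FASTA = """>Forward_CHD8
GAGGTGAAAGAATCATAAATTGG
>Reverse_CHD8
CCCTGTGTACAAATAGCTTTTGT
>Forward_AOPEP
TCATGGTTCCAGGCAGAGTTATT
>Reverse_AOPEP
TGCTGTGACAAGCCAACCAATGG
"""


def parse_fasta(text):
    """Read FASTA text into a dict mapping each header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def filter_by_header(records, pattern):
    """Keep only the records whose header matches the regular expression."""
    regex = re.compile(pattern)
    return {h: s for h, s in records.items() if regex.search(h)}


def degap(seq):
    """Remove gap characters (assumed here to be '-') from a sequence."""
    return seq.replace("-", "")


records = parse_fasta(PRIMER_FASTA)
reverse_primer = {h: degap(s) for h, s in filter_by_header(records, "Reverse_CHD8").items()}
forward_primer = {h: degap(s) for h, s in filter_by_header(records, "Forward_CHD8").items()}
print(reverse_primer)
print(forward_primer)
```

Since the tutorial primers contain no gaps, the degapped sequences are identical to the input; the step only matters for other data.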

The following hands-on is to be applied only to the sequence of the antisense (reverse) primer.

Hands On: Compute Reverse-Complement of the antisense (reverse) primer

Reverse-Complement ( Galaxy version 1.0.2+galaxy0) the antisense (reverse) primer sequence with the following parameters:

param-file “Input file in FASTA or FASTQ format”: Degap.seqs #Reverse FASTA output (output of Degap.seqs tool)

See the introduction for an explanation of the reverse complement.

Prepare sequence data

Unzip data files

Hands On: Unzip

Unzip ( Galaxy version 6.0+galaxy0) with the following parameters:

param-file “input_file”: AOPEP_and_CHD8_sequences_20220907.zip?download=1

“Extract single file”: All files

Question

How many files are there in the ZIP archive?

12 (if you have a different number of files something likely went wrong)

From now on, we’ll be working a lot on data collections:

Click on param-collection Dataset collection in front of the input parameter you want to supply the collection to.

Select the collection you want to use from the list

Filter collection to separate sense and antisense sequence files

As for the primers, sense and antisense sequences will be subjected to slightly different procedures, so they must be separated into distinct data collections.

Hands On: Filter

Extract element identifiers ( Galaxy version 0.0.2) with the following parameters:

param-collection “Dataset collection”: output collection (output of Unzip tool)

Regex Find And Replace ( Galaxy version 1.0.3) with the following parameters:

param-file “Select lines from”: output (output of Extract element identifiers tool)

In “Check”:

param-repeat “Insert Check”

“Find Regex”: ^[A-Za-z0-9_-]+F$

“Replacement”: ``

param-repeat “Insert Check”

“Find Regex”: ^[A-Za-z0-9_-]+AOPEP[A-Za-z0-9_-]+$

“Replacement”: ``

Tag output with “#Reverse”

Regex Find And Replace ( Galaxy version 1.0.3) with the following parameters:

param-file “Select lines from”: output (output of Extract element identifiers tool)

In “Check”:

param-repeat “Insert Check”

“Find Regex”: ^[A-Za-z0-9_-]+R$

“Replacement”: ``

param-repeat “Insert Check”

“Find Regex”: ^[A-Za-z0-9_-]+AOPEP[A-Za-z0-9_-]+$

“Replacement”: ``

Tag output with “#Forward”

Filter collection with the following parameters:

param-collection “Input Collection”: output collection (output of Unzip tool)

“How should the elements to remove be determined?”: Remove if identifiers are ABSENT from file

param-files “Filter out identifiers absent from”: #Forward files list & #Reverse files list (output of Regex Find And Replace tool)

Tag (filtered) outputs with “#Forward” and “#Reverse”

Comment: What's happening in this section?

First step: extract the list of file names in the data collection.

Second step: remove the file names ending with “F” and those containing “AOPEP”, which creates a list of the antisense (reverse) sequence files of the CHD8 marker.

Third step: remove the file names ending with “R” and those containing “AOPEP”, which creates a list of the sense (forward) sequence files of the CHD8 marker.

Fourth step: select files in the collection, which creates two distinct collections, with sense (forward) sequence files on one hand and antisense (reverse) sequence files on the other.

For the second and third step, we used regular expressions (Regex):

Regular expressions are a standardized way of describing patterns in textual data. They can be extremely useful for tasks such as finding and replacing data. They can be a bit tricky to master, but learning even just a few of the basics can help you get the most out of Galaxy.

Finding

Below are just a few examples of basic expressions:

Regular expression Matches

abc an occurrence of abc within your data

(abc|def) abc or def

[abc] a single character which is either a, b, or c

[^abc] a character that is NOT a, b, nor c

[a-z] any lowercase letter

[a-zA-Z] any letter (upper or lower case)

[0-9] numbers 0-9

\d any digit (same as [0-9])

\D any non-digit character

\w any alphanumeric character

\W any non-alphanumeric character

\s any whitespace

\S any non-whitespace character

. any character

\. literal . (period)

{x,y} between x and y repetitions

^ the beginning of the line

$ the end of the line

Note: you see that characters such as *, ?, ., + etc have a special meaning in a regular expression. If you want to match on those characters, you can escape them with a backslash. So \? matches the question mark character exactly.

Examples

Regular expression matches

\d{4} 4 digits (e.g. a year)

chr\d{1,2} chr followed by 1 or 2 digits

.*abc$ anything with abc at the end of the line

^$ empty line

^>.* Line starting with > (e.g. Fasta header)

^[^>].* Line not starting with > (e.g. Fasta sequence)

Replacing

Sometimes you need to capture the exact value you matched on in order to use it in your replacement. We do this using capture groups (...), which we can refer to using \1, \2 etc for the first and second captured values. If you want to refer to the whole match, use &.

Regular expression Input Captures

chr(\d{1,2}) chr14 \1 = 14

(\d{2}) July (\d{4}) 24 July 1984 \1 = 24, \2 = 1984

An expression like s/find/replacement/g indicates a replacement expression, this will search (s) for any occurrence of find, and replace it with replacement. It will do this globally (g) which means it doesn’t stop after the first match.

Example: s/chr(\d{1,2})/CHR\1/g will replace chr14 with CHR14 etc.

You can also use replacement modifier such as convert to lower case \L or upper case \U. Example: s/.*/\U&/g will convert the whole text to upper case.

Note: In Galaxy, you are often asked to provide the find and replacement expressions separately, so you don’t have to use the s/../../g structure.

There is a lot more you can do with regular expressions, and there are a few different flavours in different tools/programming languages, but these are the most important basics that will already allow you to do many of the tasks you might need in your analysis.

Tip: RegexOne is a nice interactive tutorial to learn the basics of regular expressions.

Tip: Regex101.com is a great resource for interactively testing and constructing your regular expressions, it even provides an explanation of a regular expression if you provide one.

Tip: Cyrilex is a visual regular expression tester.

In the expressions used in this hands-on, [A-Za-z0-9_-] matches any single character from A to Z, a to z, 0 to 9, _ or -, and the + that follows means that such a character occurs one or more times.
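
To see how these expressions separate the identifiers, here is a small Python sketch that applies the same patterns to a list of file names. The identifiers below are hypothetical (the real names come from the ZIP archive), and the sketch only mimics the effect of replacing matching lines with an empty string, not the Galaxy tool itself.

```python
import re

# Hypothetical element identifiers, for illustration only.
identifiers = [
    "S1_CHD8_F", "S1_CHD8_R",
    "S1_AOPEP_F", "S1_AOPEP_R",
    "S2_CHD8_F", "S2_CHD8_R",
]

ENDS_WITH_F = re.compile(r"^[A-Za-z0-9_-]+F$")
ENDS_WITH_R = re.compile(r"^[A-Za-z0-9_-]+R$")
CONTAINS_AOPEP = re.compile(r"^[A-Za-z0-9_-]+AOPEP[A-Za-z0-9_-]+$")


def blank_out(lines, patterns):
    """Apply each pattern in turn, replacing a fully matching line with an empty string."""
    result = []
    for line in lines:
        for pattern in patterns:
            line = pattern.sub("", line)
        result.append(line)
    return result


# Blanking names ending in F and AOPEP names leaves the CHD8 reverse files.
reverse_list = [name for name in blank_out(identifiers, [ENDS_WITH_F, CONTAINS_AOPEP]) if name]
# Blanking names ending in R and AOPEP names leaves the CHD8 forward files.
forward_list = [name for name in blank_out(identifiers, [ENDS_WITH_R, CONTAINS_AOPEP]) if name]

print(reverse_list)
print(forward_list)
```

Because every pattern is anchored with ^ and $, a line is either replaced entirely or left untouched, which is why the remaining lines can be used directly as a list of identifiers to keep.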

Convert AB1 sequence files to FASTQ and trim low-quality ends

In Sanger sequencing, the ends of the reads tend to be of low quality (each nucleotide has a quality score reflecting how much it can be trusted). It is important to delete these sections so that wrong nucleotides aren’t introduced into the sequences.

Hands On: AB1 to FASTQ files and trim low quality ends

Do these steps twice! There are two data collections, sense (forward) and antisense (reverse): run these steps once on each “(filtered)” data collection. Re-running a tool from one of its output datasets (expand the dataset and click re-run galaxy-refresh) makes the second pass quicker.

ab1 to FASTQ converter ( Galaxy version 1.20.0) with the following parameters:

param-collection “Input ab1 file”: (filtered) output collection (output of Filter collection tool)

“Do you want trim ends according to quality scores ?”: No, use full sequences.

In this tool, it is possible to trim low-quality ends along with the conversion of the file but parametrization is less precise.

seqtk_trimfq ( Galaxy version 1.3.1) with the following parameters:

param-collection “Input FASTA/Q file”: output collection (output of ab1 to FASTQ converter tool)

“Mode for trimming FASTQ File”: Quality

“Maximally trim down to INT bp”: 0
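
The idea behind quality trimming can be sketched as follows. This is a deliberately simplified illustration and not the algorithm implemented by seqtk_trimfq: it assumes Phred+33 encoded quality strings and uses a hypothetical fixed threshold (min_quality) below which end bases are removed.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def trim_low_quality_ends(sequence, quality_line, min_quality=20):
    """Remove bases from both ends while their quality is below min_quality.

    Simplified sketch: bases inside the read are never removed, whatever their score.
    """
    scores = phred_scores(quality_line)
    start = 0
    end = len(scores)
    while start < end and scores[start] < min_quality:
        start += 1
    while end > start and scores[end - 1] < min_quality:
        end -= 1
    return sequence[start:end], quality_line[start:end]


# Hypothetical read: two poor bases at each end, eight good bases in the middle.
seq = "NNACGTACGTNN"
qual = "!!IIIIIIII##"
trimmed_seq, trimmed_qual = trim_low_quality_ends(seq, qual)
print(trimmed_seq, trimmed_qual)
```

In this encoding, ! corresponds to a score of 0, # to 2 and I to 40, so only the eight central bases survive the trimming.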

Compute reverse complement sequence for antisense (reverse) sequences only

See the introduction for an explanation of the reverse complement.

Hands On: Reverse complement

FASTQ Groomer ( Galaxy version 1.1.5) with the following parameters:

param-collection “File to groom”: #Reverse output collection (output of seqtk_trimfq tool)

“Advanced Options”: Show Advanced Options

“Summarize input data”: Do not Summarize Input (faster)

Comment: What is this step?

This step is necessary to get the right input format for the next step, the Reverse-Complement tool.

Reverse-Complement ( Galaxy version 1.0.2+galaxy0) with the following parameters:

param-collection “Input file in FASTA or FASTQ format”: #Reverse output collection (output of FASTQ Groomer tool)

Merge corresponding sense and antisense sequences into single files

Hands On: Sort collections

Do this step twice! The sense (forward) and antisense (reverse) sequence collections must be in the same order so that each sense sequence is merged with its matching antisense sequence.

Sort collection with the following parameters:

param-collection “Input Collection”: Collection (output of seqtk_trimfq tool & output of Reverse-Complement tool)

“Sort type”: alphabetical

Hands On: Merge sense (forward) and antisense (reverse) sequence files

seqtk_mergepe ( Galaxy version 1.3.1) with the following parameters:

param-collection “Input FASTA/Q file #1”: output (output of Sort collection tool)

param-collection “Input FASTA/Q file #2”: output (output of Sort collection tool)

Check that there are two sequences in each of the three files of the newly created collection.
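
The reason for sorting both collections before merging can be illustrated with a short Python sketch. The identifiers and sequences are hypothetical, and the function only mimics the pairing by position; it is not the Sort collection or seqtk_mergepe code.

```python
def pair_after_sorting(forward, reverse):
    """Sort both collections alphabetically by identifier, then pair elements by position."""
    fwd = sorted(forward.items())
    rev = sorted(reverse.items())
    if len(fwd) != len(rev):
        raise ValueError("both collections must contain the same number of elements")
    # Each merged record holds the sense read followed by its antisense partner.
    return [[f, r] for f, r in zip(fwd, rev)]


# Hypothetical collections, deliberately listed in different orders.
forward = {"s2_CHD8_F": "ACGT", "s1_CHD8_F": "TTGA", "s3_CHD8_F": "GGCA"}
reverse = {"s3_CHD8_R": "GGCT", "s1_CHD8_R": "TTGA", "s2_CHD8_R": "ACGA"}

merged = pair_after_sorting(forward, reverse)
for sense_read, antisense_read in merged:
    print(sense_read[0], "<->", antisense_read[0])
```

Pairing by position only works when the identifiers of the two collections sort in the same sample order, which holds here because they differ only by their final F or R.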

Convert FASTQ files to FASTA

Hands On: FASTQ to FASTA

FASTQ Groomer ( Galaxy version 1.1.5) with the following parameters:

param-collection “File to groom”: default (output of seqtk_mergepe tool)

“Advanced Options”: Show Advanced Options

“Summarize input data”: Do not Summarize Input (faster)

Comment: What is this step?

This step is necessary to get the right input format for the next step, the FASTQ to FASTA tool.

FASTQ to FASTA ( Galaxy version 1.0.2+galaxy2) with the following parameters:

param-collection “FASTQ file to convert”: output collection (output of FASTQ Groomer tool)

“Discard sequences with unknown (N) bases”: no

“Rename sequence names in output file (reduces file size)”: no

“Compress output FASTA”: No

Comment: information

If this step doesn’t work, you can try the FASTQ to tabular tool followed by the tabular to FASTA tool instead.
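
Conceptually, converting FASTQ to FASTA keeps the identifier and the sequence of each record and drops the quality information. A minimal Python sketch, assuming strict four-line FASTQ records (no wrapped sequences) and a hypothetical function name:

```python
def fastq_to_fasta(fastq_text):
    """Convert four-line FASTQ records to FASTA, discarding the quality lines."""
    lines = [line for line in fastq_text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")
    fasta_lines = []
    for i in range(0, len(lines), 4):
        header, sequence, separator, _quality = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record starting at line %d" % (i + 1))
        fasta_lines.append(">" + header[1:])
        fasta_lines.append(sequence)
    return "\n".join(fasta_lines) + "\n"


# Hypothetical merged file holding one sense and one antisense read.
fastq = "@sample1_F\nACGTACGT\n+\nIIIIIIII\n@sample1_R\nACGTACGA\n+\nIIIIIIII\n"
print(fastq_to_fasta(fastq))
```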

Align sequences and retrieve consensus for each sequence

Hands On: Align and consensus

Align sequences ( Galaxy version 1.9.1.0) with the following parameters:

param-collection “Input fasta file”: output collection (output of FASTQ-to-FASTA tool)

“Method for aligning sequences”: clustalw

“Minimum percent sequence identity to closest blast hit to include sequence in alignment”: 0.1

Consensus sequence from aligned FASTA ( Galaxy version 1.0.0) with the following parameters:

param-collection “Input fasta file with at least two sequences”: aligned_sequences (output of Align sequences tool)

Add tag “#Consensus”

Merge.files ( Galaxy version 1.39.5.0) with the following parameters:

“Merge”: fasta files

param-collection “inputs - fasta”: output collection (output of Consensus sequence from aligned FASTA tool)
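
To give an intuition of what a consensus of two aligned reads is, here is a minimal Python sketch. The rules it applies are assumptions made for illustration and may differ from those of the Consensus sequence from aligned FASTA tool: agreeing bases are kept, a gap in one read is filled with the base from the other, and a disagreement is reported as N.

```python
def pairwise_consensus(aligned_a, aligned_b, ambiguous="N"):
    """Build a consensus from two aligned sequences of equal length.

    Assumed rules (illustrative only): keep agreeing bases, fill a gap with the
    base of the other read, and write `ambiguous` where the reads disagree.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    consensus = []
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == b:
            if a != "-":  # a column with gaps in both reads carries no base
                consensus.append(a)
        elif a == "-":
            consensus.append(b)
        elif b == "-":
            consensus.append(a)
        else:
            consensus.append(ambiguous)
    return "".join(consensus)


# Hypothetical aligned sense read and reverse-complemented antisense read.
sense_aligned = "ACGTAC--"
antisense_aligned = "--GTTCGA"
print(pairwise_consensus(sense_aligned, antisense_aligned))
```

In this hypothetical pair, the two reads overlap over four columns and disagree at one of them, so the consensus covers the full length with a single N.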

Manage primers and sequences

Merge and align consensus sequences file and primer files

Hands On: Merge and format consensus sequences + primers file

Merge.files ( Galaxy version 1.39.5.0) with the following parameters:

“Merge”: fasta files

param-files “inputs - fasta”: consensus sequences (output of Merge.files tool), Reverse primer (output of Reverse-Complement tool), Forward primer (output of Degap.seqs tool)

Click on param-files Multiple datasets

Select several files by keeping the Ctrl (or COMMAND) key pressed and clicking on the files of interest

Remove tags “#Forward” and “#Reverse”

Regex Find And Replace ( Galaxy version 1.0.3) with the following parameters:

param-file “Select lines from”: output (output of Merge.files tool)

In “Check”:

param-repeat “Insert Check”

“Find Regex”: ([A-Z-])>

“Replacement”: \1\n>

Comment: What's going on in this second step?

Sometimes, the Merge.files tool doesn’t keep the line feed between the files. This step corrects that and produces a properly formatted FASTA file.

For the second step, we used regular expressions (Regex):

Regular expressions are a standardized way of describing patterns in textual data. They can be extremely useful for tasks such as finding and replacing data. They can be a bit tricky to master, but learning even just a few of the basics can help you get the most out of Galaxy.

Finding

Below are just a few examples of basic expressions:

Regular expression Matches

abc an occurrence of abc within your data

(abc|def) abc or def

[abc] a single character which is either a, b, or c

[^abc] a character that is NOT a, b, nor c

[a-z] any lowercase letter

[a-zA-Z] any letter (upper or lower case)

[0-9] numbers 0-9

\d any digit (same as [0-9])

\D any non-digit character

\w any alphanumeric character

\W any non-alphanumeric character

\s any whitespace

\S any non-whitespace character

. any character

\. literal . (period)

{x,y} between x and y repetitions

^ the beginning of the line

$ the end of the line

Note: you see that characters such as *, ?, ., + etc have a special meaning in a regular expression. If you want to match on those characters, you can escape them with a backslash. So \? matches the question mark character exactly.

Examples

Regular expression matches

\d{4} 4 digits (e.g. a year)

chr\d{1,2} chr followed by 1 or 2 digits

.*abc$ anything with abc at the end of the line

^$ empty line

^>.* Line starting with > (e.g. Fasta header)

^[^>].* Line not starting with > (e.g. Fasta sequence)

Replacing

Sometimes you need to capture the exact value you matched on in order to use it in your replacement. We do this using capture groups (...), which we can refer to using \1, \2 etc. for the first and second captured values. If you want to refer to the whole match, use &.

Regular expression Input Captures

chr(\d{1,2}) chr14 \1 = 14

(\d{2}) July (\d{4}) 24 July 1984 \1 = 24, \2 = 1984

An expression like s/find/replacement/g indicates a replacement expression. It will search (s) for any occurrence of find and replace it with replacement. It will do this globally (g), which means it doesn’t stop after the first match.

Example: s/chr(\d{1,2})/CHR\1/g will replace chr14 with CHR14 etc.

You can also use replacement modifiers, such as \L to convert to lower case or \U to convert to upper case. Example: s/.*/\U&/g will convert the whole text to upper case.

Note: In Galaxy, you are often asked to provide the find and replacement expressions separately, so you don’t have to use the s/../../g structure.

There is a lot more you can do with regular expressions, and there are a few different flavours in different tools/programming languages, but these are the most important basics that will already allow you to do many of the tasks you might need in your analysis.

Tip: RegexOne is a nice interactive tutorial to learn the basics of regular expressions.

Tip: Regex101.com is a great resource for interactively testing and constructing your regular expressions, it even provides an explanation of a regular expression if you provide one.

Tip: Cyrilex is a visual regular expression tester.

Here, [A-Z-] matches any single character from A to Z or -, \1 reinserts the character captured by the parentheses in the “Find Regex” field, and \n is a line feed.
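As a minimal sketch of what this find-and-replace does, the same pattern can be tried with Python's `re` module. This is an illustration outside Galaxy; the sequence names and nucleotides in `merged` are made up.

```python
import re

# Hypothetical merged file where the line feed between two records was lost:
# the header '>primer_F' is glued to the end of the previous sequence.
merged = ">consensus_1\nACGT-ACGT>primer_F\nACGTAC\n"

# ([A-Z-])> : a sequence character (captured as \1) directly followed by '>'.
# \1\n>     : put the captured character back, then a line feed, then '>'.
fixed = re.sub(r"([A-Z-])>", r"\1\n>", merged)
print(fixed)
```

A `>` that already starts a line is left untouched, because the character before it is a line feed, which `[A-Z-]` does not match.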

Once you have the consensus sequences, check whether they contain any ambiguous nucleotides. An ambiguous nucleotide means that the sense and antisense sequences disagree at that position, so some checks are needed.

Y = C or T

R = A or G

W = A or T

S = G or C

K = T or G

M = C or A
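To illustrate how such codes can arise, here is a minimal Python sketch that builds a consensus from two already aligned sequences and writes one of the codes listed above wherever they disagree. It is not the algorithm of the Consensus sequence from aligned FASTA tool; the function name `consensus` and the choice of N for any other disagreement (for example a gap facing a nucleotide) are assumptions made for this example.

```python
# Two-nucleotide ambiguity codes, as listed above.
AMBIGUITY_CODES = {
    frozenset("CT"): "Y",
    frozenset("AG"): "R",
    frozenset("AT"): "W",
    frozenset("GC"): "S",
    frozenset("TG"): "K",
    frozenset("CA"): "M",
}


def consensus(seq_a, seq_b):
    """Return a consensus of two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    result = []
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b:
            result.append(a)
        else:
            # Assumption of this sketch: anything not in the table becomes N.
            result.append(AMBIGUITY_CODES.get(frozenset((a, b)), "N"))
    return "".join(result)


print(consensus("GAATC", "GAATT"))
```

Here the last position differs (C in one strand, T in the other), so the consensus ends with Y.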

Hands On: Look for ambiguous nucleotides

Click on output of Regex Find and Replace tool in the history to expand it

Click on galaxy-barchart Visualize

Select Multiple Sequence Alignment

Set the color scheme to Clustal; ambiguous nucleotides are highlighted in dark blue

There are two nucleotide positions to check, Y at 121 in sequence consensus_B05_CHD8-III6brother-18 and W at 286 in sequence consensus_05_CHD8-III6mother-18

You need to go back to your FASTQ sequences to understand the origin of the ambiguity

Regex Find And Replace ( Galaxy version 1.0.3) with the following parameters:

param-file “Select lines from”: #Consensus #Primer output (output of Regex Find and Replace tool)

In “Check”:

param-repeat “Insert Check”

“Find Regex”: ^[ACTG]+([ACTG]{20}Y)[ACTG]+$

“Replacement”: \1

param-repeat “Insert Check”

“Find Regex”: ^[ACTG]+([ACTG]{20}W)[ACTG]+$

“Replacement”: \1

Comment: What's going on in this step?

We want to retrieve the 20 nucleotides before the ambiguities.

In these expressions, [ACTG] matches any of the four unambiguous nucleotides; it is followed either by +, meaning “at least once”, or by {20}, meaning “exactly 20 times”.

In the output of this tool we get:

the 20 nucleotides before the Y at position 121 in sequence consensus_B05_CHD8-III6brother-18: CAGGCACGATGTCATCGAAT

and the 20 nucleotides before the W at position 286 in sequence consensus_05_CHD8-III6mother-18: AGTCCTCTTAGTTTATAGAT
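The extraction can be sketched with Python's `re` module as follows. This is an illustration outside Galaxy: the record name and the nucleotides around the fragment are made up, and only the 20-nucleotide fragment comes from the tutorial.

```python
import re

# Hypothetical FASTA record: two leading bases, the 20-nucleotide fragment,
# the ambiguous Y, then four trailing bases.
record = ">consensus_example\nAACAGGCACGATGTCATCGAATYGGCC\n"

# Same "Find Regex" as in the step above; MULTILINE makes ^ and $ work per line.
pattern = r"^[ACTG]+([ACTG]{20}Y)[ACTG]+$"
context = re.sub(pattern, r"\1", record, flags=re.MULTILINE)
print(context)
```

In this sketch the captured group also contains the ambiguous base itself, so the sequence line becomes the 20 nucleotides followed by Y. The header line is unchanged because it does not match the pattern.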

FASTQ masker ( Galaxy version 1.1.5) with the following parameters:

param-collection “File to mask”: #Forward #Reverse collection (output of FASTQ groomer tool)

“Mask input with”: Lowercase

“Quality score”: 10

This tool displays low-quality bases in lowercase to permit better detection of potential errors.
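A minimal Python sketch of this kind of masking is shown below. It is not the code of the FASTQ masker tool. It assumes Sanger-encoded qualities (ASCII offset 33) and assumes that bases with a score less than or equal to the threshold are masked; the function name `mask_low_quality` and the example read are made up.

```python
def mask_low_quality(sequence, quality, threshold=10, offset=33):
    """Lowercase every base whose Phred score is at or below the threshold."""
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    masked = []
    for base, symbol in zip(sequence, quality):
        score = ord(symbol) - offset  # decode the quality character
        # Assumption of this sketch: 'less than or equal to' triggers masking.
        masked.append(base.lower() if score <= threshold else base.upper())
    return "".join(masked)


# 'I' encodes a score of 40 and '(' a score of 7 with offset 33.
print(mask_low_quality("GAATC", "IIII("))
```

Only the last base has a score below the threshold here, so it is the only one shown in lowercase.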

Open galaxy-eye the B05_CHD8-III6brother-18 output of FASTQ masker tool and search (Ctrl+F) for CAGGCACGATGTCATCGAAT. In the sense sequence (ID ending with 18F), this fragment is followed by a low-quality c, whereas in the antisense sequence it is followed by a T of decent quality. Additionally, in the galaxy-eye #Consensus #Primer output of Regex Find and Replace tool, the two other consensus sequences (consensus_05_CHD8-III6mother-18 and consensus_07_CHD8-III6-18) have a T at this position. It therefore seems more likely that the nucleotide at position 121 in sequence consensus_B05_CHD8-III6brother-18 is a T.

Open galaxy-eye the 05_CHD8-III6mother-18 output of FASTQ masker tool and search (Ctrl+F) for AGTCCTCTTAGTTTATAGAT. In the antisense sequence (ID ending with 18R), this fragment is followed by a low-quality t, whereas in the sense sequence it is followed by an A of decent quality. Additionally, in the galaxy-eye #Consensus #Primer output of Regex Find and Replace tool, the two other consensus sequences (consensus_B05_CHD8-III6brother-18 and consensus_07_CHD8-III6-18) have an A at this position. It therefore seems more likely that the nucleotide at position 286 in sequence consensus_05_CHD8-III6mother-18 is an A.

You can now correct them by clicking on output of Regex Find and Replace tool in the history to expand it

Click on galaxy-barchart Visualize

Select Editor and:

replace manually the Y with T in consensus_B05_CHD8-III6brother-18

replace manually the W with A in consensus_05_CHD8-III6mother-18 and click on export

Now, you can align your sequences with the primers. Ultimately, it is common to trim each sequence to the region between the primers to keep only the targeted fragment.

Hands On: Align sequences and primers

Align sequences ( Galaxy version 1.9.1.0) with the following parameters:

param-file “Input fasta file”: out_file1 Regex Find And Replace (modified)

“Method for aligning sequences”: mafft

“Minimum percent sequence identity to closest blast hit to include sequence in alignment”: 0.1

Check that your sequences belong to the right taxonomic group by running a BLAST search against the NCBI database

Hands On: NCBI BLAST

NCBI BLAST+ blastn ( Galaxy version 2.10.1+galaxy2) with the following parameters:

param-file “Nucleotide query sequence(s)”: out_file1 (output of Regex Find And Replace tool)

“Subject database/sequences”: Locally installed BLAST database

“Nucleotide BLAST database”: select most recent nt_ database

“Output format”: Tabular (select which columns)

“Standard columns”: qseqid, pident, mismatch and gapopen

“Extended columns”: gaps and salltitles

“Other identifier columns”: saccver

“Advanced Options”: Show Advanced Options

“Maximum hits to consider/show”: 10

“Restrict search of database to a given set of ID’s”: No restriction, search the entire database

Question

Which species do the sequences we cleaned belong to?

Homo sapiens

It is good practice to run such checks: they help make sure the sequencing went as planned and your samples haven’t been contaminated.

Conclusion

We successfully cleaned AB1 sequence files!

AOPEP Sanger files

The history following the same steps but for AOPEP marker files is available: Clean AOPEP sequences

You've Finished the Tutorial

Key points

Check your data and results for mistakes afterward! This procedure is useful but not perfect.

Frequently Asked Questions

Have questions about this tutorial? Have a look at the available FAQ pages and support channels

Useful literature

Further information, including links to documentation and original publications, regarding the tools, analysis techniques and the interpretation of results described in this tutorial can be found here.

References

AOPEP variants as a novel cause of recessive dystonia: Generalized dystonia and dystonia-parkinsonism, 2022 Parkinsonism and related disorders 97: 52–56. 10.1016/j.parkreldis.2022.03.007

Citing this Tutorial

Coline Royaux, Clean and manage Sanger sequences from raw files to aligned consensus (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/sequence-analysis/tutorials/Manage_AB1_Sanger/tutorial.html Online; accessed TODAY
Hiltemann, Saskia, Rasche, Helena et al., 2023 Galaxy Training: A Powerful Framework for Teaching! PLOS Computational Biology 10.1371/journal.pcbi.1010752
Batut et al., 2018 Community-Driven Data Analysis Training for Biology Cell Systems 10.1016/j.cels.2018.05.012

@misc{sequence-analysis-Manage_AB1_Sanger,
author = "Coline Royaux",
	title = "Clean and manage Sanger sequences from raw files to aligned consensus (Galaxy Training Materials)",
	year = "",
	month = "",
	day = "",
	url = "\url{https://training.galaxyproject.org/training-material/topics/sequence-analysis/tutorials/Manage_AB1_Sanger/tutorial.html}",
	note = "[Online; accessed TODAY]"
}
@article{Hiltemann_2023,
	doi = {10.1371/journal.pcbi.1010752},
	url = {https://doi.org/10.1371%2Fjournal.pcbi.1010752},
	year = 2023,
	month = {jan},
	publisher = {Public Library of Science ({PLoS})},
	volume = {19},
	number = {1},
	pages = {e1010752},
	author = {Saskia Hiltemann and Helena Rasche and Simon Gladman and Hans-Rudolf Hotz and Delphine Larivi{\`{e}}re and Daniel Blankenberg and Pratik D. Jagtap and Thomas Wollmann and Anthony Bretaudeau and Nadia Gou{\'{e}} and Timothy J. Griffin and Coline Royaux and Yvan Le Bras and Subina Mehta and Anna Syme and Frederik Coppens and Bert Droesbeke and Nicola Soranzo and Wendi Bacon and Fotis Psomopoulos and Crist{\'{o}}bal Gallardo-Alba and John Davis and Melanie Christine Föll and Matthias Fahrner and Maria A. Doyle and Beatriz Serrano-Solano and Anne Claire Fouilloux and Peter van Heusden and Wolfgang Maier and Dave Clements and Florian Heyl and Björn Grüning and B{\'{e}}r{\'{e}}nice Batut},
	editor = {Francis Ouellette},
	title = {Galaxy Training: A powerful framework for teaching!},
	journal = {PLoS Comput Biol}
}

                   

Congratulations on successfully completing this tutorial!

You can use Ephemeris's shed-tools install command to install the tools used in this tutorial.

shed-tools install [-g GALAXY] [-a API_KEY] -t <(curl https://training.galaxyproject.org/training-material/api/topics/sequence-analysis/tutorials/Manage_AB1_Sanger/tutorial.json | jq .admin_install_yaml -r)

Alternatively you can copy and paste the following YAML

---
install_tool_dependencies: true
install_repository_dependencies: true
install_resolver_dependencies: true
tools:
- name: fastq_groomer
  owner: devteam
  revisions: 47e5dbc3e790
  tool_panel_section_label: FASTA/FASTQ
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: fastq_masker_by_quality
  owner: devteam
  revisions: 9dfda4e310ed
  tool_panel_section_label: FASTA/FASTQ
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: fastq_to_fasta
  owner: devteam
  revisions: 191e43b329f6
  tool_panel_section_label: FASTA/FASTQ
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: fastq_to_tabular
  owner: devteam
  revisions: 4b347702c4aa
  tool_panel_section_label: FASTA/FASTQ
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: fastx_reverse_complement
  owner: devteam
  revisions: 6027ef51ef91
  tool_panel_section_label: FASTA/FASTQ
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: ncbi_blast_plus
  owner: devteam
  revisions: 0e3cf9594bb7
  tool_panel_section_label: NCBI Blast
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: tabular_to_fasta
  owner: devteam
  revisions: 0a7799698fe5
  tool_panel_section_label: FASTA/FASTQ
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: ab1_fastq_converter
  owner: ecology
  revisions: 307518fb51af
  tool_panel_section_label: Convert Formats
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: aligned_to_consensus
  owner: ecology
  revisions: 0ccbe1c20fc3
  tool_panel_section_label: Assembly
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: filter_by_fasta_ids
  owner: galaxyp
  revisions: dff7df6fcab5
  tool_panel_section_label: Proteomics
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: regex_find_replace
  owner: galaxyp
  revisions: 503bcd6ebe4b
  tool_panel_section_label: Text Manipulation
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: unzip
  owner: imgteam
  revisions: 57f0914ddb7b
  tool_panel_section_label: Collection Operations
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: collection_element_identifiers
  owner: iuc
  revisions: d3c07d270a50
  tool_panel_section_label: Collection Operations
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: mothur_degap_seqs
  owner: iuc
  revisions: 6f08af23888a
  tool_panel_section_label: Mothur
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: mothur_merge_files
  owner: iuc
  revisions: bc20680d28d5
  tool_panel_section_label: Mothur
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: qiime_align_seqs
  owner: iuc
  revisions: e8bb88f051ec
  tool_panel_section_label: Sequence analysis
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
- name: seqtk
  owner: iuc
  revisions: 3da72230c066
  tool_panel_section_label: FASTA/FASTQ
  tool_shed_url: https://toolshed.g2.bx.psu.edu/
