Repeat masking with RepeatModeler and RepeatMasker
- Author(s):
- Release: 0.1
- License: MIT
- UniqueID: f25be8fa-7823-456f-9707-a497703f48d7
RepeatMasking Workflow
This workflow uses RepeatModeler to build a de novo repeat library from a genome, and RepeatMasker to annotate and mask the repeats in that genome.
RepeatModeler is a software package for identifying and modeling de novo families of transposable elements (TEs). At the heart of RepeatModeler are three de novo repeat search programs (RECON, RepeatScout and LtrHarvest/Ltr_retriever) which use complementary computational methods to identify repeat element boundaries and family relationships from sequence data.
RepeatMasker is a program that screens DNA sequences for interspersed repeats and low-complexity DNA sequences. It produces a detailed annotation of the repeats present in the query sequence, as well as a modified version of the query sequence in which all annotated repeats are masked (by default, replaced with Ns).
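One quick way to sanity-check a masked genome is to measure how much of it is masked. The sketch below assumes the masked FASTA uses Ns (hard masking) and/or lowercase letters (soft masking). The function name `masked_fraction` is hypothetical, and note that assembly gaps that were already Ns before masking are also counted.

```python
def masked_fraction(fasta_text: str) -> float:
    """Return the fraction of bases that are masked in FASTA text.

    Hypothetical helper: N/n counts as hard-masked and any lowercase
    base counts as soft-masked. Pre-existing assembly gaps (Ns) are
    indistinguishable from masked repeats and are counted too.
    """
    total = 0
    masked = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA headers
        total += len(line)
        # Each base is counted at most once, even if it is both "n" and lowercase.
        masked += sum(1 for base in line if base in "Nn" or base.islower())
    return masked / total if total else 0.0


if __name__ == "__main__":
    example = ">chr1\nACGTNNNNacgt\n>chr2\nAAAA\n"
    print(masked_fraction(example))
```

For a real genome, read the file in chunks or line by line rather than loading it as one string.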
Input dataset for RepeatModeler
- RepeatModeler requires a single input file, a genome in fasta format.
Output datasets for RepeatModeler
- Two output files are generated:
  - a summary file (.tbl)
  - a FASTA file containing alignments in order of appearance in the query sequence
Input dataset for RepeatMasker
- RepeatMasker requires the genome to be masked (FASTA) and the repeat library (FASTA) generated by RepeatModeler
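The hand-off between the two tools can be sketched as three command lines. This is a sketch under stated assumptions, not the workflow's actual tool wrapper: it assumes RepeatModeler 2.x (recent releases take `-threads`, earlier ones used `-pa`, and the consensus library is written as `<db_name>-families.fa`) and RepeatMasker 4.1.x (`-lib`, `-pa`, `-gff`). The helper `build_commands` and the default database name are hypothetical.

```python
import shlex


def build_commands(genome: str, db_name: str = "genome_db", threads: int = 4) -> list[list[str]]:
    """Return the command lines of a RepeatModeler -> RepeatMasker run.

    Hypothetical helper. Assumes RepeatModeler 2.x writes its consensus
    library to <db_name>-families.fa in the working directory; flag names
    differ in older releases of both tools.
    """
    library = f"{db_name}-families.fa"  # assumed RepeatModeler 2.x output name
    return [
        # 1. Build the sequence database RepeatModeler searches.
        ["BuildDatabase", "-name", db_name, genome],
        # 2. De novo discovery of repeat families.
        ["RepeatModeler", "-database", db_name, "-threads", str(threads)],
        # 3. Annotate and mask the genome with the new library; also write a GFF.
        ["RepeatMasker", "-lib", library, "-pa", str(threads), "-gff", genome],
    ]


if __name__ == "__main__":
    for cmd in build_commands("genome.fa"):
        # Print only; swap in subprocess.run(cmd, check=True) to execute.
        print(shlex.join(cmd))
```

Building the commands as lists (rather than shell strings) avoids quoting problems with file names and makes them easy to pass to `subprocess.run`.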
Output datasets for RepeatMasker
- Five output files are generated:
  - the masked sequence, in FASTA format
  - a .gff3 file with the repeat annotations
  - a table summarizing the repeat content of the analyzed sequence
  - a file with statistics on the repeat content of the analyzed sequence
  - a summary of the mutation sites found and the order of grouping
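The repeat annotations can be summarized directly from the GFF file. The sketch below relies only on the standard tab-separated GFF columns (sequence ID, start, end), because the feature types and attributes written by RepeatMasker may vary between versions and wrappers. The function name `repeat_bp_by_sequence` is hypothetical, and overlapping features are not merged, so overlaps are counted more than once.

```python
from collections import defaultdict


def repeat_bp_by_sequence(gff_text: str) -> dict[str, int]:
    """Sum annotated repeat length (bp) per sequence ID from GFF text.

    Hypothetical helper. GFF coordinates are 1-based and inclusive, so a
    feature spans end - start + 1 bases. Overlapping features are not
    merged, so overlapping bases are counted once per feature.
    """
    totals: dict[str, int] = defaultdict(int)
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, ##gff-version and other comments
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a 9-column feature line
        seqid, start, end = fields[0], int(fields[3]), int(fields[4])
        totals[seqid] += end - start + 1
    return dict(totals)


if __name__ == "__main__":
    example = (
        "##gff-version 3\n"
        "chr1\tRepeatMasker\tdispersed_repeat\t1\t100\t.\t+\t.\tTarget=fam1 1 100\n"
        "chr1\tRepeatMasker\tdispersed_repeat\t201\t250\t.\t-\t.\tTarget=fam2 1 50\n"
    )
    print(repeat_bp_by_sequence(example))
```

Dividing each total by the sequence length gives a rough per-sequence repeat density.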