Repeat masking with RepeatModeler and RepeatMasker

  • Author(s): Romane Libouban
  • Release: 0.1
  • License: MIT
  • UniqueID: f25be8fa-7823-456f-9707-a497703f48d7

RepeatMasking Workflow

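This workflow uses RepeatModeler and RepeatMasker to identify and mask repeats in a genome: RepeatModeler builds a de novo library of repeat families, and RepeatMasker uses that library to annotate and mask the repeats in the genome.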

  • RepeatModeler is a software package for the de novo identification and modeling of transposable element (TE) families. At its core are three de novo repeat-finding programs (RECON, RepeatScout and LTRharvest/LTR_retriever), which use complementary computational methods to identify repeat element boundaries and family relationships from sequence data.
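As a toy illustration of one idea behind these tools (RepeatScout seeds repeat families from short words, or l-mers, that occur far more often than expected, then extends them), the sketch below only counts k-mers and reports those seen at least a chosen number of times. It is a simplified sketch under stated assumptions, not RepeatModeler's algorithm: extension into consensus sequences, RECON's alignment clustering and the LTR structural search are all omitted, and `frequent_kmers` is a hypothetical helper.

```python
from collections import Counter


def frequent_kmers(seq: str, k: int, min_count: int) -> dict[str, int]:
    """Return k-mers occurring at least `min_count` times in `seq`.

    Toy illustration only: real repeat finders extend such seeds into
    full-length consensus sequences, which is not done here.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i : i + k]
        for i in range(len(seq) - k + 1)
        if "N" not in seq[i : i + k]  # skip windows touching gaps/ambiguous bases
    )
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    # Hypothetical toy genome: a 12-bp "element" inserted three times.
    element = "GATTACAGGCTA"
    genome = "ACGT" + element + "TTTT" + element + "CCCC" + element + "AG"
    print(sorted(frequent_kmers(genome, k=8, min_count=3)))
```

Raising `k` or `min_count` trades sensitivity for fewer spurious seeds; real tools choose the word length from the genome size.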

  • RepeatMasker is a program that screens DNA sequences for interspersed repeats and low-complexity DNA sequences. Its output is a detailed annotation of the repeats present in the query sequence, together with a modified version of the query sequence in which all annotated repeats have been masked (by default, replaced by Ns).
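Masking itself is a simple operation once the repeat coordinates are known. The sketch below assumes 1-based, inclusive intervals (the begin/end convention of RepeatMasker's .out table) and shows both hard masking with N, RepeatMasker's default, and soft masking to lowercase, which RepeatMasker produces with its -xsmall option. `mask` is a hypothetical helper, not part of either tool.

```python
def mask(seq: str, intervals: list[tuple[int, int]], soft: bool = False) -> str:
    """Mask repeat intervals in `seq`.

    Intervals are assumed 1-based and inclusive. Hard masking replaces
    bases with 'N'; soft masking lowercases them instead.
    """
    chars = list(seq)
    for start, end in intervals:
        for i in range(start - 1, end):  # convert to 0-based indices
            chars[i] = chars[i].lower() if soft else "N"
    return "".join(chars)


if __name__ == "__main__":
    print(mask("ACGTACGT", [(3, 5)]))             # hard-masked
    print(mask("ACGTACGT", [(3, 5)], soft=True))  # soft-masked
```

Soft masking keeps the original bases, so downstream tools (for example gene predictors) can still read them while treating repeats differently.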

Input dataset for RepeatModeler

  • RepeatModeler requires a single input file: a genome in FASTA format.
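On the command line, outside this workflow, RepeatModeler is usually run in two steps: BuildDatabase indexes the genome FASTA, then RepeatModeler runs on that database. A minimal sketch, assuming both executables are on PATH; the file and database names are hypothetical, and thread options are left out because their flag differs between RepeatModeler versions.

```python
import shlex
import shutil
import subprocess


def repeatmodeler_commands(genome_fasta: str, db_name: str) -> list[list[str]]:
    """Build the two command lines of a de novo RepeatModeler run."""
    return [
        # Index the genome FASTA as a RepeatModeler sequence database.
        ["BuildDatabase", "-name", db_name, genome_fasta],
        # Discover and model repeat families in that database.
        ["RepeatModeler", "-database", db_name],
    ]


def run_commands(commands: list[list[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH")
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical names; this only prints the commands.
    # Pass the list to run_commands() to actually execute them.
    for cmd in repeatmodeler_commands("genome.fa", "mygenome"):
        print(shlex.join(cmd))
```

Building argument lists rather than shell strings avoids quoting problems with file names that contain spaces.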

Output datasets for RepeatModeler

  • Two output files are generated:
    • a summary file (.tbl)
    • a FASTA file containing alignments in order of appearance in the query sequence

Input dataset for RepeatMasker

  • RepeatMasker requires the genome in FASTA format and the FASTA file generated by RepeatModeler, which it uses as the repeat library.
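On the command line, a RepeatModeler library is passed to RepeatMasker with its -lib option. The sketch below only builds the argument list; the file names are hypothetical, and "mygenome-families.fa" assumes RepeatModeler 2's default naming of its consensus library. The -dir, -gff and -xsmall options are standard RepeatMasker flags; which of them the workflow itself sets is not stated here.

```python
import shlex


def repeatmasker_command(genome_fasta: str, library_fasta: str,
                         out_dir: str, soft_mask: bool = False) -> list[str]:
    """Build a RepeatMasker command that masks `genome_fasta` with a custom library."""
    cmd = [
        "RepeatMasker",
        "-lib", library_fasta,  # custom repeat library, e.g. from RepeatModeler
        "-dir", out_dir,        # directory for the output files
        "-gff",                 # also write a GFF annotation of the repeats
    ]
    if soft_mask:
        cmd.append("-xsmall")   # lowercase repeats instead of replacing them with N
    cmd.append(genome_fasta)
    return cmd


if __name__ == "__main__":
    # Hypothetical names; prints the command rather than running it.
    print(shlex.join(repeatmasker_command("genome.fa", "mygenome-families.fa", "rm_out")))
```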

Output datasets for RepeatMasker

  • Five output files are generated:
    • a FASTA file of the masked sequence
    • a GFF3 file
    • a table summarizing the repeated content of the sequence analyzed
    • a file with statistics related to the repeated content of the sequence analyzed
    • a summary of the mutation sites found and the order of grouping
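The summary table reports how much of the sequence is repetitive. As a rough cross-check, a similar percentage can be recomputed from a soft-masked FASTA output, assuming RepeatMasker was run with -xsmall (an assumption: with the default hard masking, repeats become N and cannot be told apart from assembly gaps, so this count does not apply). `soft_masked_percent` is a hypothetical helper; it ignores N bases so that gaps do not distort the denominator.

```python
def soft_masked_percent(fasta_text: str) -> float:
    """Percentage of non-N bases that are lowercase (soft-masked) in FASTA text."""
    total = masked = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and headers
        for base in line:
            if base in "Nn":
                continue  # skip gaps and ambiguous bases
            total += 1
            if base.islower():
                masked += 1
    return 100.0 * masked / total if total else 0.0


if __name__ == "__main__":
    example = ">chr1\nACGTacgt\nNNNN\n>chr2\naaaa\n"
    print(round(soft_masked_percent(example), 2))
```

Because of the different treatment of N runs and low-complexity regions, this figure will not necessarily match the table exactly.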