Tool to detect potential transposable elements in a fasta file
Latest commit df3cc7f May 22, 2017 sepro cleaned up readme


Tool to detect potential transposable elements in a fasta file. Some classes of transposable elements (TEs) encode proteins. Without prior knowledge, these often look exactly like protein-coding genes. Some genomes include these TEs in their annotation while others filter them out. For comparative genomics this is problematic, as it creates differences based on a technical bias rather than a valid biological trait (e.g. a simple gene count is not comparable between genomes with and without TEs included).

To solve this issue, these TEs should be detected and removed from all genomes before doing a comparative analysis (e.g. counting genes, constructing gene families, ...). De-TE-ctor is a small pipeline that quickly checks for putative TEs included in the set of protein-coding genes of a genome and reports them so they can be removed.



Clone the package from the git repository

git clone detector
cd detector

Create a virtual environment

virtualenv --python=python3 env
source env/bin/activate
pip install -r requirements.txt

Install the detector module

pip install --editable .


First, you need to create a blast database from a set of known transposable element proteins. Collect sequences of such elements in fasta format and store them in one directory. Note that all files must have the extension .fasta. Alternatively, the files in ./TE_DB/ can be used. For more details on how these were prepared, read the file here.

Create the blast DB using the command below

detector build ./TE_DB/ known_te_db

This command will pick up all fasta-files in the TE_DB folder, concatenate them and build a blast database named known_te_db.
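To make the build step more concrete, here is a minimal Python sketch of what "pick up all fasta-files, concatenate them and build a blast database" could look like. It is not taken from the detector source: the function names `concatenate_fasta` and `makeblastdb_command` are hypothetical, and the use of the NCBI BLAST+ `makeblastdb` program with `-dbtype prot` is an assumption about how the database is built.

```python
from pathlib import Path


def concatenate_fasta(fasta_dir, combined_path):
    """Concatenate every *.fasta file in fasta_dir into one file.

    Returns the number of files that were merged.
    """
    # Only files with the .fasta extension are picked up, as the README requires.
    fasta_files = sorted(Path(fasta_dir).glob("*.fasta"))
    with open(combined_path, "w") as combined:
        for fasta_file in fasta_files:
            text = fasta_file.read_text()
            combined.write(text)
            # Guard against files without a trailing newline, which would
            # otherwise glue the last sequence line to the next header.
            if text and not text.endswith("\n"):
                combined.write("\n")
    return len(fasta_files)


def makeblastdb_command(combined_path, db_name):
    """Build (but do not run) a BLAST+ makeblastdb call for a protein database.

    Assumption: the real tool may use different options.
    """
    return ["makeblastdb", "-in", str(combined_path),
            "-dbtype", "prot", "-out", db_name]
```

The command list returned by `makeblastdb_command` could be passed to `subprocess.run(..., check=True)` on a machine where BLAST+ is installed.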

Next, a fasta-file (with protein sequences) should be blasted against the newly created database of known transposable element proteins.

detector blast species_peptides.fasta known_te_db species_blast_output

This command will blast a protein fasta-file species_peptides.fasta against known_te_db (created in the previous step). The output will be stored in species_blast_output.
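For readers who want to see roughly what such a search involves, the sketch below builds a direct NCBI BLAST+ `blastp` call. This is an illustration under assumptions, not the detector implementation: the helper names `blastp_command` and `run_blastp`, the tabular output format (`-outfmt 6`) and the default e-value cutoff are all hypothetical choices.

```python
import shutil
import subprocess


def blastp_command(query_fasta, db_name, output_path, evalue=1e-5):
    """Build a blastp call that writes tabular (outfmt 6) results.

    Assumption: detector may use a different output format or cutoff.
    """
    return ["blastp",
            "-query", query_fasta,
            "-db", db_name,
            "-out", output_path,
            "-outfmt", "6",
            "-evalue", str(evalue)]


def run_blastp(query_fasta, db_name, output_path):
    """Run blastp, failing early with a clear message if it is not installed."""
    if shutil.which("blastp") is None:
        raise RuntimeError("blastp not found on PATH; install NCBI BLAST+ first")
    subprocess.run(blastp_command(query_fasta, db_name, output_path), check=True)
```

Calling `run_blastp("species_peptides.fasta", "known_te_db", "species_blast_output")` would mirror the arguments of the detector command above.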

Then, sequences similar to known transposable element proteins can be extracted from the output using the analyze command below.

detector analyze species_blast_output species_putative_te.lst

The file species_putative_te.lst will be created if putative transposable element proteins are found.
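The analysis step can be pictured as a filter over the blast hits. The sketch below assumes tab-separated BLAST output in the standard 12-column layout (query ID in column 1, e-value in column 11) and an e-value cutoff of 1e-10. Both are assumptions made for illustration, as are the function names; the criteria detector actually applies may differ.

```python
def putative_te_ids(blast_lines, max_evalue=1e-10):
    """Return the sorted query IDs that have at least one hit at or below max_evalue.

    Assumes tabular BLAST output: qseqid, sseqid, pident, length, mismatch,
    gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    hits = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue <= max_evalue:
            hits.add(query_id)
    return sorted(hits)


def write_id_list(ids, list_path):
    """Write one ID per line; create no file when there are no IDs."""
    if not ids:
        return False
    with open(list_path, "w") as handle:
        handle.write("\n".join(ids) + "\n")
    return True
```

Skipping the write when the list is empty matches the behaviour described above, where the output file only appears if putative TEs are found.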

Finally, detector can be used to remove the putative TEs from a fasta file.

detector filter species_peptides.fasta species_putative_te.lst species_peptides.clean.fasta
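Conceptually, the filter step drops every fasta record whose ID appears in the list of putative TEs. The following sketch shows one way to do that in plain Python. The function name `filter_fasta` is hypothetical, and treating the header text up to the first whitespace as the record ID is an assumption; detector may match IDs differently.

```python
def filter_fasta(fasta_lines, ids_to_remove):
    """Yield the lines of every record whose ID is not in ids_to_remove."""
    keep = True
    for line in fasta_lines:
        if line.startswith(">"):
            # Assumption: the record ID is the header text up to the first whitespace.
            parts = line[1:].split()
            record_id = parts[0] if parts else ""
            keep = record_id not in ids_to_remove
        if keep:
            yield line


def read_id_list(list_path):
    """Read one ID per line, ignoring blank lines."""
    with open(list_path) as handle:
        return {line.strip() for line in handle if line.strip()}
```

With the file names used above, the cleaned file could be produced by opening species_peptides.fasta, passing the file handle together with `read_id_list("species_putative_te.lst")` to `filter_fasta`, and writing the yielded lines to species_peptides.clean.fasta.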


  • Add support for cluster