De-TE-ctor

A tool to detect potential transposable elements (TEs) in a FASTA file. Some classes of TEs encode proteins, and without prior knowledge these often look exactly like protein-coding genes. Some genome annotations include these TEs while others filter them out. This is problematic for comparative genomics because it creates differences that reflect a technical bias rather than a valid biological trait (e.g. a simple gene count is not comparable between genomes annotated with and without TEs).

To solve this issue, these TEs should be detected and removed from all genomes before doing a comparative analysis (e.g. counting genes, constructing gene families, ...). De-TE-ctor is a small pipeline that quickly checks the set of protein-coding genes of a genome for putative TEs and reports them so they can be removed.
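As a rough illustration of what the search step of such a pipeline could look like, the sketch below builds command lines for the NCBI BLAST+ tools `makeblastdb` and `blastp`. It is not the project's implementation: it assumes the input genes are protein sequences, that BLAST+ is installed, and the function names, file names and E-value cutoff are hypothetical.

```python
from typing import List


def makeblastdb_cmd(te_fasta: str, db_name: str) -> List[str]:
    """Command that builds a protein BLAST database from known TE proteins."""
    return ["makeblastdb", "-in", te_fasta, "-dbtype", "prot", "-out", db_name]


def blastp_cmd(query_fasta: str, db_name: str, out_path: str,
               evalue: float = 1e-5) -> List[str]:
    """Command that searches a genome's proteins against the TE database.

    The 1e-5 E-value cutoff is an assumed default, not a value from this project.
    Output format 6 is BLAST's 12-column tab-separated table.
    """
    return ["blastp", "-query", query_fasta, "-db", db_name,
            "-outfmt", "6", "-evalue", str(evalue), "-out", out_path]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`, or written into a job script when running on a cluster.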

TODO

  • Collect sequences for known protein-coding transposable elements
  • Script to initiate the pipeline (build the BLAST DB)
  • Script to BLAST FASTA files (with support for a cluster)
  • Script to parse the output and report potential TEs
  • Script to remove TEs from the initial input, producing a final clean FASTA file
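
The last two items could be sketched roughly as follows. This is a minimal sketch and not the project's code: it assumes the BLAST results are in the standard 12-column tabular format (`-outfmt 6`), and the function names and both thresholds are hypothetical choices that would need tuning.

```python
from typing import Iterable, Iterator, Set


def putative_te_ids(blast_lines: Iterable[str],
                    max_evalue: float = 1e-5,
                    min_identity: float = 30.0) -> Set[str]:
    """Return query IDs with at least one hit passing both (assumed) thresholds."""
    hits: Set[str] = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete tabular record
        # Columns: qseqid sseqid pident length mismatch gapopen
        #          qstart qend sstart send evalue bitscore
        identity = float(fields[2])
        evalue = float(fields[10])
        if evalue <= max_evalue and identity >= min_identity:
            hits.add(fields[0])
    return hits


def remove_records(fasta_lines: Iterable[str],
                   drop_ids: Set[str]) -> Iterator[str]:
    """Yield FASTA lines, skipping every record whose ID is in drop_ids."""
    keep = True
    for line in fasta_lines:
        if line.startswith(">"):
            # The record ID is the first whitespace-delimited token of the header.
            header = line[1:].split()
            record_id = header[0] if header else ""
            keep = record_id not in drop_ids
        if keep:
            yield line
```

Both functions take any iterable of lines, so open file handles can be passed directly, and the set returned by `putative_te_ids` doubles as the report of potential TEs.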