Update README.md
Sebastian Proost committed May 5, 2017
1 parent e0e896b commit 7fb630d
Showing 1 changed file with 12 additions and 1 deletion.
# De-TE-ctor
Tool to detect potential transposable elements in a FASTA file. Some classes of transposable elements (TEs) encode proteins, and without prior knowledge these often look exactly like protein-coding genes. Some genomes include these TEs in their annotation while others filter them out. For comparative genomics this is problematic: it introduces differences between genomes that stem from a technical bias rather than a real biological trait (e.g. a simple gene count is not comparable between genomes with and without TEs included).

To solve this issue, these TEs should be detected and removed in all genomes prior to any comparative analysis (e.g. counting genes, constructing gene families, ...). De-TE-ctor is a small pipeline that quickly checks the set of protein-coding genes of a genome for putative TEs and reports them so they can be removed.


## TODO

* Collect sequences for known protein-coding transposable elements
* Script to initialize the pipeline (build the BLAST database)
* Script to BLAST FASTA files (with support for running on a compute cluster)
* Script to parse the BLAST output and report potential TEs
* Script to remove TEs from the initial input, producing a final *clean* FASTA file
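The last two steps (parsing the BLAST output and removing flagged sequences) could look roughly like the minimal Python sketch below. It assumes the search was run with BLAST+ tabular output (`-outfmt 6`, default 12 columns) and that each sequence ID is the first word of its FASTA header. The function names `parse_te_hits` and `remove_sequences` and the e-value and identity cutoffs are hypothetical illustrations, not part of De-TE-ctor.

```python
from typing import Iterable, Iterator, Set

# Column positions in BLAST+ tabular output (-outfmt 6, default 12 columns).
QUERY_COL = 0
PIDENT_COL = 2
EVALUE_COL = 10


def parse_te_hits(
    blast_lines: Iterable[str],
    max_evalue: float = 1e-10,   # hypothetical cutoff, tune for your data
    min_identity: float = 50.0,  # hypothetical cutoff (% identity)
) -> Set[str]:
    """Return IDs of query sequences with at least one hit to a known TE."""
    hits: Set[str] = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        fields = line.split("\t")
        evalue = float(fields[EVALUE_COL])
        pident = float(fields[PIDENT_COL])
        if evalue <= max_evalue and pident >= min_identity:
            hits.add(fields[QUERY_COL])
    return hits


def remove_sequences(fasta_lines: Iterable[str], exclude: Set[str]) -> Iterator[str]:
    """Yield FASTA lines, dropping every record whose ID is in `exclude`.

    The record ID is assumed to be the first word after '>' on the header line.
    """
    keep = True
    for line in fasta_lines:
        if line.startswith(">"):
            words = line[1:].split()
            record_id = words[0] if words else ""
            keep = record_id not in exclude
        if keep:
            yield line
```

Both functions take iterables of lines, so open file handles can be passed directly, e.g. `remove_sequences(open("proteins.fa"), parse_te_hits(open("hits.tsv")))` with hypothetical file names. Here a gene is flagged on a single hit passing both cutoffs; stricter criteria such as alignment coverage may be preferable in practice.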
