Commit 7fb630d (parent e0e896b), authored by Sebastian Proost on May 5, 2017: 1 changed file with 12 additions and 1 deletion.
# De-TE-ctor
Tool to detect potential transposable elements in a FASTA file. Some classes of transposable elements (TEs) encode proteins, and without prior knowledge these often look exactly like protein-coding genes. Some genomes include these TEs in their annotation while others filter them out. For comparative genomics this is problematic: it creates differences between genomes that stem from a technical bias rather than a genuine biological trait (e.g. a simple gene count is not comparable between a genome with TEs included and one without).

To solve this, these TEs should be detected and removed from all genomes before any comparative analysis (e.g. counting genes, constructing gene families). De-TE-ctor is a small pipeline that quickly checks the set of protein-coding genes of a genome for putative TEs and reports them so they can be removed.
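As a rough illustration of the "check and report" idea, the sketch below assumes a genome's proteins have already been searched against a collection of known TE proteins with BLAST+ using tabular output (`-outfmt 6`), and flags every gene with a hit at or below an e-value cutoff. The 1e-10 cutoff and the names `putative_tes` and `report` are hypothetical choices for illustration, not part of De-TE-ctor.

```python
"""Hypothetical sketch: report putative TEs from BLAST+ tabular output (-outfmt 6)."""
import csv
import io
from typing import Dict, TextIO

# Column order of the default BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def putative_tes(hits: TextIO, max_evalue: float = 1e-10) -> Dict[str, dict]:
    """Return the best hit (lowest e-value) per gene whose e-value <= max_evalue.

    max_evalue is an assumed cutoff; tune it to the TE collection used.
    """
    best: Dict[str, dict] = {}
    for row in csv.reader(hits, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (as written by -outfmt 7)
        hit = dict(zip(OUTFMT6_COLUMNS, row))
        evalue = float(hit["evalue"])
        if evalue > max_evalue:
            continue  # too weak to call the gene a putative TE
        gene = hit["qseqid"]
        if gene not in best or evalue < float(best[gene]["evalue"]):
            best[gene] = hit
    return best


def report(best: Dict[str, dict]) -> str:
    """Format one tab-separated line per putative TE, sorted by gene ID."""
    lines = ["gene\tbest_te_hit\tevalue\tbitscore"]
    for gene in sorted(best):
        hit = best[gene]
        lines.append(f"{gene}\t{hit['sseqid']}\t{hit['evalue']}\t{hit['bitscore']}")
    return "\n".join(lines)


if __name__ == "__main__":
    example = io.StringIO(
        "gene1\tTE_gag_pol\t45.2\t300\t150\t5\t1\t300\t10\t310\t1e-50\t250\n"
        "gene1\tTE_other\t30.0\t120\t80\t2\t5\t125\t1\t120\t1e-5\t60\n"
        "gene2\tTE_gag_pol\t25.0\t50\t35\t1\t1\t50\t1\t50\t0.5\t20\n"
    )
    print(report(putative_tes(example)))
```

Keeping only the best hit per gene keeps the report to one line per candidate, which makes it easy to review before anything is removed.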
## TODO
* Collect sequences of known protein-coding transposable elements.
* Script to initiate the pipeline (build the BLAST database)
* Script to BLAST FASTA files (with support for running on a cluster)
* Script to parse the output and report potential TEs
* Script to remove TEs from the initial input, producing a final *clean* FASTA file
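The scripted TODO steps could be sketched as thin wrappers around the BLAST+ command-line tools plus a plain FASTA filter. This is a hypothetical sketch: the file names, database name, e-value cutoff and function names are assumptions, collecting the TE sequences (step 1) and parsing the hits into a set of gene IDs (step 4) are left out, and cluster support is not covered (only `-num_threads` on a single machine). The filter assumes simple FASTA headers, so that the first word after `>` matches the `qseqid` that BLAST reports.

```python
"""Hypothetical sketch of the TODO steps around BLAST+ (not De-TE-ctor's actual scripts)."""
import io
import subprocess
from typing import List, Set, TextIO


def makeblastdb_cmd(te_fasta: str, db_name: str) -> List[str]:
    # TODO step 2: build a protein BLAST database from the collected TE proteins.
    return ["makeblastdb", "-in", te_fasta, "-dbtype", "prot", "-out", db_name]


def blastp_cmd(query_fasta: str, db_name: str, out_tsv: str,
               evalue: float = 1e-10, threads: int = 1) -> List[str]:
    # TODO step 3: search a genome's proteins against the TE database (tabular output).
    return ["blastp", "-query", query_fasta, "-db", db_name, "-outfmt", "6",
            "-evalue", str(evalue), "-num_threads", str(threads), "-out", out_tsv]


def run(cmd: List[str]) -> None:
    # Requires BLAST+ on PATH; raises CalledProcessError if the tool fails.
    subprocess.run(cmd, check=True)


def remove_records(fasta_in: TextIO, fasta_out: TextIO, drop_ids: Set[str]) -> int:
    """TODO step 5: copy every FASTA record whose ID is not in drop_ids.

    The ID is taken as the first whitespace-separated word after '>'.
    Returns the number of records removed.
    """
    removed = 0
    keep = True
    for line in fasta_in:
        if line.startswith(">"):
            words = line[1:].split()
            keep = (words[0] if words else "") not in drop_ids
            if not keep:
                removed += 1
        if keep:
            fasta_out.write(line)  # header and sequence lines of kept records
    return removed


if __name__ == "__main__":
    # Print the commands instead of calling run(), so this works without BLAST+.
    print(" ".join(makeblastdb_cmd("te_proteins.fa", "te_db")))
    print(" ".join(blastp_cmd("genome_proteins.fa", "te_db", "hits.tsv")))
    fasta = io.StringIO(">gene1 putative TE\nMKV\nLLA\n>gene2\nMSTP\n")
    clean = io.StringIO()
    removed = remove_records(fasta, clean, {"gene1"})
    print(removed)
    print(clean.getvalue(), end="")
```

Building the argument lists separately from `run` keeps the commands easy to inspect or to hand to a cluster scheduler instead of running them locally.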