From 7fb630d297ac96a1af90fe36218fc1e03e31a620 Mon Sep 17 00:00:00 2001
From: Sebastian Proost
Date: Fri, 5 May 2017 13:59:52 +0200
Subject: [PATCH] Update README.md

---
 README.md | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 92c8ed6..0b54208 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,13 @@
 # De-TE-ctor
-Tool to detect potential transposable elements in a fasta file
+Tool to detect potential transposable elements in a fasta file. Some classes of transposable elements (TE) encode for proteins. Often these look, without prior knowledge, exactly like a protein coding gene. Some genomes include these TEs in their annotation while others filter them out. For comparative genomics this is problematic as it creates differences (e.g. a simple gene count would not be comparable between genomes with and without TEs included) based on a technical bias and not a valid biological trait.
+
+To solve this issue these TEs should be detected and removed in all genomes prior to doing a comparative analysis (e.g. counting genes, constructing gene families, ...). De-TE-ctor is a small pipeline to quickly check for putative TEs included in the set of protein coding genes of a genome and report them so they can be removed.
+
+
+TODO
+
+ * Collect sequences for known protein coding transposable elements.
+ * Script to initiate pipeline (build blast DB)
+ * Script to blast fasta files (with support for a cluster)
+ * Script to parse output and report potential TEs
+ * Script to remove TEs from initial input producing a final *clean* fasta file