From 468363d8a81fc0f2fef0bc7c8b446e05ed117b2e Mon Sep 17 00:00:00 2001
From: sepro
Date: Wed, 10 May 2017 14:58:19 +0200
Subject: [PATCH] added documentation

---
 README.md | 36 +++++++++++++++++++++++++++++++-----
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index c247775..636dd43 100644
--- a/README.md
+++ b/README.md
@@ -25,10 +25,36 @@ Install the detector module
 
 ### Commands
 
-TODO
+First, you need to create a blast database from a set of known transposable element proteins. First collect sequences of
+such elements in fasta format and store them in one directory. Note that all files should have the extension .fasta.
+Alternatively, the files in *./TE_DB/* can be used. For more details how these were prepared read the file
+[here](TE_DB/README.md).
 
- * Collect sequences for known protein coding transposable elements.
- * Script to initiate pipeline (build blast DB)
- * Script to blast fasta files (with support for a cluster)
- * Script to parse output and report potential TEs
+Create the blast DB using the command below
+
+    detector build ./TE_DB/ known_te_db
+
+This command will pick up all fasta-files in the TE_DB folder, concatenate them and build a blast database named
+known_te_db.
+
+Next, a fasta-file (with protein sequences) should be blasted against the newly created database with known transposable
+element proteins.
+
+    detector blast species_peptides.fasta known_te_db species_blast_output
+
+ This command will blast a protein fasta-file species_peptides.fasta against known_te_db (created in the previous step).
+ The output will be stored in species_blast_output
+
+ Finally, from the output, sequences similar to known transposable element proteins can be extracted using the analyze
+ command below.
+
+    detector analyze species_blast_output species_putative_te.lst
+
+ The file species_putative_te.lst will be created if putative transposable element proteins are found.
+
+
+# TODO
+
+ * Add support for cluster
+ * Parameters for analyze (desired cutoff) + determine good thresholds
  * Script to remove TEs from initial input producing a final *clean* fasta file