added documentation
proost committed May 10, 2017
1 parent cd2fa4a commit 468363d
Showing 1 changed file with 31 additions and 5 deletions.
36 changes: 31 additions & 5 deletions README.md
@@ -25,10 +25,36 @@ Install the detector module

### Commands

TODO
First, create a BLAST database from a set of known transposable element proteins. Collect the sequences of such
elements in FASTA format and store them in one directory. Note that all files must have the extension .fasta.
Alternatively, the files in *./TE_DB/* can be used. For details on how these were prepared, see
[TE_DB/README.md](TE_DB/README.md).

* Collect sequences for known protein coding transposable elements.
* Script to initiate pipeline (build blast DB)
* Script to blast fasta files (with support for a cluster)
* Script to parse output and report potential TEs
Create the BLAST database using the command below:

    detector build ./TE_DB/ known_te_db

This command will pick up all fasta-files in the TE_DB folder, concatenate them and build a blast database named
known_te_db.
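
As a rough illustration of what the build step describes, the sketch below concatenates every .fasta file in a directory and assembles a `makeblastdb` call. It is not taken from the detector source: the function names and the intermediate file name are hypothetical, and it assumes NCBI BLAST+ is installed and that a protein database (`-dbtype prot`) is wanted.

```python
import subprocess
from pathlib import Path


def concatenate_fasta(fasta_dir, combined_path):
    """Concatenate every *.fasta file in fasta_dir into combined_path.

    Hypothetical helper, not part of detector. Returns the files used.
    """
    fasta_files = sorted(Path(fasta_dir).glob("*.fasta"))
    with open(combined_path, "w") as out:
        for fasta_file in fasta_files:
            text = fasta_file.read_text()
            out.write(text)
            # Make sure the next file's header starts on a new line.
            if text and not text.endswith("\n"):
                out.write("\n")
    return fasta_files


def makeblastdb_command(combined_path, db_name):
    """Assemble the BLAST+ makeblastdb call for a protein database."""
    return ["makeblastdb", "-in", str(combined_path),
            "-dbtype", "prot", "-out", db_name]


def build_database(fasta_dir, db_name, combined_path="combined_te.fasta"):
    """Concatenate the input files, then run makeblastdb (needs BLAST+)."""
    concatenate_fasta(fasta_dir, combined_path)
    subprocess.run(makeblastdb_command(combined_path, db_name), check=True)
```

Under these assumptions, `build_database("./TE_DB/", "known_te_db")` would roughly mirror the command above.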

Next, a fasta-file (with protein sequences) should be blasted against the newly created database of known transposable
element proteins.

    detector blast species_peptides.fasta known_te_db species_blast_output

This command will blast a protein fasta-file species_peptides.fasta against known_te_db (created in the previous step).
The output will be stored in species_blast_output.
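
A protein-against-protein search like this one is typically done with `blastp` from BLAST+. The sketch below shows one plausible way to assemble such a call. It is an assumption, not detector's actual implementation: the function names, the tabular output format (`-outfmt 6`) and the default e-value are chosen here for illustration only.

```python
import subprocess


def blastp_command(query_fasta, db_name, output_path, evalue=1e-5):
    """Assemble a blastp call that writes tabular (outfmt 6) output.

    Hypothetical helper; the e-value default is an illustrative choice.
    """
    return [
        "blastp",
        "-query", query_fasta,   # protein sequences to search with
        "-db", db_name,          # database created in the build step
        "-out", output_path,     # where the hits are written
        "-outfmt", "6",          # tab-separated, one hit per line
        "-evalue", str(evalue),  # discard hits weaker than this
    ]


def run_blast(query_fasta, db_name, output_path):
    """Run the search; requires BLAST+ to be installed and on the PATH."""
    subprocess.run(blastp_command(query_fasta, db_name, output_path),
                   check=True)
```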

Finally, from the output, sequences similar to known transposable element proteins can be extracted using the analyze
command below.

    detector analyze species_blast_output species_putative_te.lst

The file species_putative_te.lst will be created if putative transposable element proteins are found.
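
To make the analyze step more concrete, here is a minimal sketch of how hits could be extracted from BLAST output. It assumes the output is in the default tabular format (`-outfmt 6`, with the e-value in the eleventh column) and uses an e-value cutoff picked purely for illustration. The README does not state the actual format or thresholds detector uses, and both function names are hypothetical.

```python
def putative_te_ids(blast_lines, max_evalue=1e-5):
    """Return sorted query IDs with at least one hit at or below max_evalue.

    Assumes default outfmt 6 columns: qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    hits = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete tabular record
        if float(fields[10]) <= max_evalue:
            hits.add(fields[0])
    return sorted(hits)


def write_hit_list(ids, list_path):
    """Write one ID per line; create the file only if there are hits."""
    if not ids:
        return False
    with open(list_path, "w") as out:
        for seq_id in ids:
            out.write(seq_id + "\n")
    return True
```

Writing the list only when hits exist matches the behaviour described above, where the output file appears only if putative transposable element proteins are found.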


# TODO

* Add support for cluster
* Parameters for analyze (desired cutoff) + determine good thresholds
* Script to remove TEs from initial input producing a final *clean* fasta file
