LSTrAP

LSTrAP, short for Large Scale Transcriptome Analysis Pipeline, greatly facilitates the construction of co-expression networks from RNA-Seq data. The various tools involved are seamlessly connected, and CPU-intensive steps are submitted to a computer cluster automatically.

Workflow

LSTrAP wraps multiple existing tools into a single workflow. To use LSTrAP the following tools need to be installed

Steps in bold are submitted to a cluster.

Preparation

LSTrAP is designed to run on an Oracle Grid Engine computer cluster system and requires all external tools to be installed on the compute nodes. The "module load" system is supported. A comprehensive list of all tools necessary can be found here
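Because every external tool has to be reachable on the compute nodes, it can help to confirm this before starting a run. The following is a minimal sketch, not part of LSTrAP: check_tools is a hypothetical helper, and the executable names in the example call are assumptions that should be replaced with the commands your installation actually provides.

```shell
# Hypothetical helper (not shipped with LSTrAP): report which of the
# given commands cannot be found on the current PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    # Exit status 0 if everything was found, 1 otherwise
    return "$missing"
}

# Example call; the executable names are assumptions, adjust as needed:
# check_tools java tophat htseq-count
```

If the tools are provided through "module load", load the relevant modules first and run the check on a compute node, since the head node may have a different environment.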

Installation

Use git to obtain a copy of the LSTrAP code

git clone https://github.molgen.mpg.de/proost/LSTrAP

Next, move into the directory and copy config.template.ini and data.template.ini

cd LSTrAP
cp config.template.ini config.ini
cp data.template.ini data.ini

Configure config.ini and data.ini using the guidelines below

Configuration of LSTrAP

After copying the templates, config.ini needs to be set up to run on your system. It requires the path to Trimmomatic's jar and the names of the modules in which Bowtie, Tophat ... are installed.

The location of the transcriptome data, the reference genome and a few per-species options need to be defined in data.ini.

Detailed instructions on how to set up both configuration files can be found here

Running LSTrAP

Once properly configured for your system and data, LSTrAP can be run using a single simple command (that should be executed on the head node)

./run.py config.ini data.ini

Options to skip certain steps of the pipeline are included; use the command below for more information

./run.py -h

Quality report

After running LSTrAP, a log file (lstrap.log) is written that reports the samples that failed a quality measure. Note that no samples are excluded from the final network automatically. If certain samples need to be excluded from the final network, remove the htseq file for each sample you wish to exclude and re-run the pipeline, skipping all steps prior to building the network.

./run.py config.ini data.ini --skip-interpro --skip-orthology --skip-bowtie-build --skip-trim-fastq --skip-tophat --skip-htseq --skip-qc
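Before the re-run, the htseq file of an unwanted sample can be moved aside rather than deleted, so the exclusion is easy to undo. The following is a minimal sketch, not part of LSTrAP: exclude_sample is a hypothetical helper, and the directory names, sample identifier and file naming (sample identifier followed by a dot and an extension) are assumptions; use the htseq output directory configured in data.ini.

```shell
# Hypothetical helper (not shipped with LSTrAP): move the htseq file(s)
# of one sample out of the htseq output directory.
#   $1 = htseq output directory, $2 = sample identifier,
#   $3 = directory in which excluded files are kept
exclude_sample() {
    htseq_dir=$1
    sample=$2
    excluded_dir=$3
    mkdir -p "$excluded_dir"
    # Match "<sample>.<extension>" only, so SRR1 does not also match SRR10
    for f in "$htseq_dir"/"$sample".*; do
        [ -e "$f" ] || continue   # nothing matched, skip the literal pattern
        mv -- "$f" "$excluded_dir"/
    done
}

# Example call; paths and sample name are assumptions:
# exclude_sample ./output/htseq SRR000001 ./output/htseq_excluded
```

Moving the file back into the htseq directory and re-running the same command restores the sample in the network.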

More information on how the quality of samples is determined can be found here.

Helper Scripts

LSTrAP comes with a few additional scripts that help users download and process data from the Sequence Read Archive and repeat the analyses and the case study reported in the manuscript (Proost et al., under preparation).

Details for each script can be found here

Contact

LSTrAP was developed by Sebastian Proost and Marek Mutwil at the Max-Planck Institute for Molecular Plant Physiology.

This work is supported by ERA-CAPS through the EVOREPRO project.

License

LSTrAP is freely available under the MIT License