update docs
proost committed Jul 25, 2017
1 parent 444ea84 commit a35b62c
Showing 3 changed files with 50 additions and 38 deletions.
58 changes: 29 additions & 29 deletions README.md
@@ -1,9 +1,16 @@
# LSTrAP

LSTrAP, shot for Large Scale Transcriptome Analysis Pipeline, greatly facilitates the construction of co-expression networks from
RNA Seq data. The various tools involved are seamlessly connected and CPU-intensive steps are submitted to a computer cluster
LSTrAP, short for Large Scale Transcriptome Analysis Pipeline, greatly facilitates the construction of co-expression networks from
RNA-Seq data. The various tools involved are seamlessly connected and CPU-intensive steps are submitted to a computer cluster
automatically.

## Version 1.3 Changelog

* Support for [PBS](https://en.wikipedia.org/wiki/Portable_Batch_System) / [Torque](http://www.adaptivecomputing.com/products/open-source/torque/) scheduler (note proper [configuration](./docs/configuration.md) is required)
* [HISAT2](https://ccb.jhu.edu/software/hisat2/index.shtml) can be used as an alternative to [BowTie2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) and [TopHat 2](https://ccb.jhu.edu/software/tophat/index.shtml)
* Added [helper](./docs/helper.md) script to do PCA on samples
* **Parameter names in data.ini changed**

## Workflow

LSTrAP wraps multiple existing tools into a single workflow. To use LSTrAP the following tools need to be installed
@@ -13,13 +20,10 @@ LSTrAP wraps multiple existing tools into a single workflow. To use LSTrAP the f
Steps in bold are submitted to a cluster. Optional steps can be enabled by adding the flag *‑‑enable‑interpro* and/or
*‑‑enable‑orthology*.

## Preparation

LSTrAP is designed to run on an [Oracle Grid Engine](https://www.oracle.com/sun/index.html) computer cluster system and requires
all external tools to be installed on the compute nodes. The "module load" system is supported. A comprehensive list of all tools
necessary can be found [here](docs/preparation.md). Instructions to run LSTrAP on other systems are provided below.

## Installation
Before installing make sure your system meets all requirements. A detailed list of supported systems and required
software can be found [here](docs/preparation.md).


Use git to obtain a copy of the LSTrAP code

@@ -31,34 +35,35 @@ Next, move into the directory and copy **config.template.ini** and **data.templa
cp config.template.ini config.ini
cp data.template.ini data.ini

Configure config.ini and data.ini using the guidelines below

## Configuration of LSTrAP
Configure config.ini and data.ini using these [guidelines](docs/configuration.md)

After copying the templates, **config.ini** needs to be set up to run on your system. It requires the path to Trimmomatic's jar and the
modules where Bowtie, Tophat ... are installed in (if the [modules](http://modules.sourceforge.net/) environment is used.

The location of the transcriptome data, the refrence genome and a few per-species options need to be defined in **data.ini**.
## Running LSTrAP

Detailed instruction how to set up both configuration files can be found [here](docs/configuration.md)
Once properly configured for your system and data, LSTrAP can be run using a single simple command (that should be
executed on the head node).

## Obtaining and preparing data
./run.py config.ini data.ini

Scripts to download and prepare data from the [Sequence Read Archive](https://www.ncbi.nlm.nih.gov/sra) are included in
LSTrAP in the folder **helper**. Furthermore, it is recommended to remove splice variants from the GFF3 files, a script
to do that is included there as well. Detailed instructions for each script provided to obtain and prepare data can be
found [here](docs/helper.md)
Run using [HISAT2](https://ccb.jhu.edu/software/hisat2/index.shtml)

## Running LSTrAP
./run.py --use-hisat2 config.ini data.ini

Once properly configured for your system and data, LSTrAP can be run using a single simple command (that should be executed on the head node)
Run with InterProScan and/or OrthoFinder

./run.py config.ini data.ini
./run.py --enable-orthology --enable-interproscan config.ini data.ini

Options to enable InterProScan and/or OrthoFinder or to skip certain steps of the pipeline are included, use the command below for more info
Furthermore, steps can be skipped (to avoid re-running steps unnecessarily). Use the command below for more info.

./run.py -h
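
When LSTrAP is launched from another script, the command lines shown above can be assembled programmatically. The following is a minimal sketch and not part of LSTrAP: `build_lstrap_command` and `run_lstrap` are hypothetical helpers, and they only use the flags that appear in this README (`--use-hisat2`, `--enable-orthology`, `--enable-interproscan`). It assumes the script is started from the LSTrAP directory on the head node.

```python
import subprocess


def build_lstrap_command(config="config.ini", data="data.ini",
                         use_hisat2=False, orthology=False, interproscan=False):
    """Assemble the argument list for run.py (hypothetical helper, not part of LSTrAP)."""
    command = ["./run.py"]
    if use_hisat2:
        command.append("--use-hisat2")
    if orthology:
        command.append("--enable-orthology")
    if interproscan:
        command.append("--enable-interproscan")
    # The two configuration files are positional arguments, in this order.
    command.extend([config, data])
    return command


def run_lstrap(**options):
    """Run LSTrAP from the current directory; raises CalledProcessError on failure."""
    subprocess.run(build_lstrap_command(**options), check=True)
```

For example, `build_lstrap_command(use_hisat2=True)` yields the same arguments as the HISAT2 command above. Use `./run.py -h` to confirm which flags your version accepts.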

## Obtaining and preparing data

Scripts to download and prepare data from the [Sequence Read Archive](https://www.ncbi.nlm.nih.gov/sra) are included in
LSTrAP in the folder **helper**. Furthermore, it is recommended to remove splice variants from the GFF3 files, a script
to do that is included there as well. Detailed instructions for each script provided to obtain and prepare data can be
found [here](docs/helper.md)

## Quality report

After running LSTrAP a log file (*lstrap.log*) is written, in which samples which failed a quality measure
@@ -92,11 +97,6 @@ for LSTrAP can be generated.

python3 fasta_to_gff.py /path/to/transcript.cds.fasta > output.gff

## Adapting LSTrAP to other cluster managers

LSTrAP is designed and tested on a cluster running the Oracle Grid Engine (default), PBS/Torque is also supported.

Though due to differences in parameters


## Contact
9 changes: 8 additions & 1 deletion docs/helper.md
@@ -82,10 +82,17 @@ on a normalized expression matrix.


![matrix example](images/matrix.png "Sample distance heatmap (with hierarchical clustering)")


### pca_plot.py

Script to perform a PCA analysis on any expression matrix.

python3 pca_plot.py ./data/sbi.expression.matrix.tpm.txt

### pca_powerlaw.py

*This script and the required data are included to recreate results from the manuscript (Proost et al., under review)*

Script to perform a PCA analysis on the *Sorghum bicolor* data (case study) and draw the node degree distribution. The
required data is included here as well. Note that this script requires sklearn and seaborn.

21 changes: 13 additions & 8 deletions docs/preparation.md
@@ -1,24 +1,29 @@
# Preparing your system

LSTrAP is designed to run on the head node of a Oracle Grid Engine cluster. Apart from a running compute cluster, the essential
tools need to be installed. A full list is provided below, tools can be installed on the grid nodes directly or inside modules.
When opting for the latter, the configuration file needs to contain the exact names of the modules containing the tools.
LSTrAP is designed with High Performance Computing in mind and requires a computer cluster running
[Oracle Grid Engine](https://www.oracle.com/sun/index.html) or [PBS](https://en.wikipedia.org/wiki/Portable_Batch_System)
/ [Torque](http://www.adaptivecomputing.com/products/open-source/torque/). Furthermore, the essential
tools (see below) need to be installed prior to running LSTrAP.
[Environment modules](http://modules.sourceforge.net/) are supported; in that case the configuration file
needs to contain the exact names of the modules containing the tools.

## Required Tools

* [Bowtie2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml)
* [TopHat](https://ccb.jhu.edu/software/tophat/manual.shtml)
* [HISAT2]
* [HISAT2](http://ccb.jhu.edu/software/hisat2/index.shtml)
* [Samtools](http://www.htslib.org/)
* [SRAtools](http://ncbi.github.io/sra-tools/)
* [Python 2.7](https://www.python.org/download/releases/2.7/) + [HTSeq](http://www-huber.embl.de/users/anders/HTSeq/doc/index.html) + all dependencies (including [PySam](https://github.com/pysam-developers/pysam))
* [HTSeq](http://www-huber.embl.de/users/anders/HTSeq/doc/index.html) + all dependencies (including [PySam](https://github.com/pysam-developers/pysam))
* [Python 3.5](https://www.python.org/download/releases/3.5.1/) + SciPy + [Numpy](http://www.numpy.org/)
* [InterProScan](https://www.ebi.ac.uk/interpro/)
* [OrthoFinder](https://github.com/davidemms/OrthoFinder)
* [MCL](http://www.micans.org/mcl/index.html?sec_software)
* [Trimmomatic](http://www.usadellab.org/cms/?page=trimmomatic)

Optional tools
## Optional tools

* [InterProScan](https://www.ebi.ac.uk/interpro/)
* [OrthoFinder](https://github.com/davidemms/OrthoFinder)
* [Python 2.7](https://www.python.org/download/releases/2.7/) (for OrthoFinder)
* [scikit-learn](http://scikit-learn.org/) for Python 3, required for PCA analysis (helper script)
* [seaborn](https://stanford.edu/~mwaskom/software/seaborn/) for Python 3, required for PCA analysis (helper script)
* [Aspera connect client](http://downloads.asperasoft.com/en/downloads/2), required for the *get_sra_ip.py* (helper script)
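
Before a first run it can help to check that the command-line tools listed above are reachable. The following is a minimal sketch and not part of LSTrAP: the executable names are assumptions based on each tool's usual command name, and `missing_executables` is a hypothetical helper. Trimmomatic is not checked because it is distributed as a jar, and tools provided through Environment modules will only be found after the corresponding `module load`.

```python
import shutil

# Executable names are assumptions based on each tool's usual command name;
# adjust them to match the installation on your cluster.
REQUIRED_EXECUTABLES = ["bowtie2", "tophat", "hisat2", "samtools",
                        "fastq-dump", "htseq-count", "mcl"]


def missing_executables(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_executables(REQUIRED_EXECUTABLES)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed executables were found.")
```

Note that this only checks the node where it is run; the tools also need to be available on the compute nodes.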
