update docs

proost · Jul 25, 2017 · a35b62c
1 parent 444ea84
commit a35b62c
Showing 3 changed files with 50 additions and 38 deletions.
diff --git a/README.md b/README.md
@@ -1,9 +1,16 @@
 # LSTrAP
 
-LSTrAP, shot for Large Scale Transcriptome Analysis Pipeline, greatly facilitates the construction of co-expression networks from
-RNA Seq data. The various tools involved are seamlessly connected and  CPU-intensive steps are submitted to a computer cluster 
+LSTrAP, short for Large Scale Transcriptome Analysis Pipeline, greatly facilitates the construction of co-expression networks from
+RNA-Seq data. The various tools involved are seamlessly connected and  CPU-intensive steps are submitted to a computer cluster 
 automatically. 
 
+## Version 1.3 Changelog
+
+  * Support for the [PBS](https://en.wikipedia.org/wiki/Portable_Batch_System) / [Torque](http://www.adaptivecomputing.com/products/open-source/torque/) scheduler (note: proper [configuration](./docs/configuration.md) is required)
+  * [HISAT2](https://ccb.jhu.edu/software/hisat2/index.shtml) can be used as an alternative to [BowTie2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) and [TopHat 2](https://ccb.jhu.edu/software/tophat/index.shtml)
+  * Added [helper](./docs/helper.md) script to do PCA on samples
+  * **Parameter names in data.ini changed**
+
 ## Workflow
 
 LSTrAP wraps multiple existing tools into a single workflow. To use LSTrAP the following tools need to be installed
@@ -13,13 +20,10 @@ LSTrAP wraps multiple existing tools into a single workflow. To use LSTrAP the f
 Steps in bold are submitted to a cluster. Optional steps can be enabled by adding the flag *&#8209;&#8209;enable&#8209;interpro* and/or 
 *&#8209;&#8209;enable&#8209;orthology*.
 
-## Preparation
-
-LSTrAP is designed to run on an [Oracle Grid Engine](https://www.oracle.com/sun/index.html) computer cluster system and requires 
-all external tools to be installed on the compute nodes. The "module load" system is supported. A comprehensive list of all tools 
-necessary can be found  [here](docs/preparation.md). Instructions to run LSTrAP on other systems are provided below.
-
 ## Installation
+Before installing, make sure your system meets all requirements. A detailed list of supported systems and required 
+software can be found [here](docs/preparation.md).
+
 
 Use git to obtain a copy of the LSTrAP code
 
@@ -31,34 +35,35 @@ Next, move into the directory and copy **config.template.ini** and **data.templa
     cp config.template.ini config.ini
     cp data.template.ini data.ini
 
-Configure config.ini and data.ini using the guidelines below
-
-## Configuration of LSTrAP
+Configure config.ini and data.ini using these [guidelines](docs/configuration.md).
 
-After copying the templates, **config.ini** needs to be set up to run on your system. It requires the path to Trimmomatic's jar and the
-modules where Bowtie, Tophat ... are installed in (if the [modules](http://modules.sourceforge.net/) environment is used.
 
-The location of the transcriptome data, the refrence genome and a few per-species options need to be defined in **data.ini**. 
+## Running LSTrAP
 
-Detailed instruction how to set up both configuration files can be found [here](docs/configuration.md)
+Once properly configured for your system and data, LSTrAP can be run using a single simple command (that should be 
+executed on the head node).
 
-## Obtaining and preparing data
+    ./run.py config.ini data.ini
 
-Scripts to download and prepare data from the [Sequence Read Archive](https://www.ncbi.nlm.nih.gov/sra) are included in
-LSTrAP in the folder **helper**. Furthermore, it is recommended to remove splice variants from the GFF3 files, a script
-to do that is included there as well. Detailed instructions for each script provided to obtain and prepare data can be
-found [here](docs/helper.md)
+Run using [HISAT2](https://ccb.jhu.edu/software/hisat2/index.shtml)
 
-## Running LSTrAP
+    ./run.py --use-hisat2 config.ini data.ini
 
-Once properly configured for your system and data, LSTrAP can be run using a single simple command (that should be executed on the head node)
+Run with InterProScan and/or OrthoFinder 
 
-    ./run.py config.ini data.ini
+    ./run.py --enable-orthology --enable-interpro config.ini data.ini
 
-Options to enable InterProScan and/or OrthoFinder or to skip certain steps of the pipeline are included, use the command below for more info
+Furthermore, steps can be skipped (to avoid re-running steps unnecessarily). Use the command below for more info.
 
     ./run.py -h
 
+## Obtaining and preparing data
+
+Scripts to download and prepare data from the [Sequence Read Archive](https://www.ncbi.nlm.nih.gov/sra) are included in
+LSTrAP in the folder **helper**. Furthermore, it is recommended to remove splice variants from the GFF3 files; a script
+to do that is included there as well. Detailed instructions for each script provided to obtain and prepare data can be
+found [here](docs/helper.md).
+
 ## Quality report
 
 After running LSTrAP a log file (*lstrap.log*) is written, in which samples which failed a quality measure
@@ -92,11 +97,6 @@ for LSTrAP can be generated.
 
     python3 fasta_to_gff.py /path/to/transcript.cds.fasta > output.gff
 
-## Adapting LSTrAP to other cluster managers
-
-LSTrAP is designed and tested on a cluster running the Oracle Grid Engine (default), PBS/Torque is also supported.
-
-Though due to differences in parameters 
 
 
 ## Contact
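
The run commands documented in the README changes above (`./run.py config.ini data.ini`, optionally with `--use-hisat2` or `--enable-orthology`) can also be assembled programmatically, for example when launching several species from a wrapper script. The snippet below is only a minimal sketch: `build_lstrap_command` is a hypothetical helper that is not part of LSTrAP, and it uses only flags that appear in this diff.

```python
import shlex


def build_lstrap_command(config="config.ini", data="data.ini",
                         use_hisat2=False, enable_orthology=False):
    """Return the argument list for one LSTrAP run.

    Hypothetical helper (not part of LSTrAP); it only mirrors the
    command lines shown in the README.
    """
    command = ["./run.py"]
    if use_hisat2:
        # Align with HISAT2 instead of Bowtie2 / TopHat 2
        command.append("--use-hisat2")
    if enable_orthology:
        # Enable the optional OrthoFinder step
        command.append("--enable-orthology")
    # The two positional arguments come last: system config, then data config
    command += [config, data]
    return command


if __name__ == "__main__":
    # Print the command so it can be inspected before running it on the head node
    print(shlex.join(build_lstrap_command(use_hisat2=True)))
```

Returning a list rather than a single string keeps the arguments ready for `subprocess.run` without any shell quoting issues.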

diff --git a/docs/helper.md b/docs/helper.md
@@ -82,10 +82,17 @@ on a normalized expression matrix.
 
 
 ![matrix example](images/matrix.png "Sample distance heatmap (with hierarchical clustering)")
-
+
+### pca_plot.py
+
+Script to perform a PCA analysis on any expression matrix.
+
+    python3 pca_plot.py ./data/sbi.expression.matrix.tpm.txt
 
 ### pca_powerlaw.py
 
+*This script and the required data are included to recreate results from the manuscript (Proost et al., under review)*
+
 Script to perform a PCA analysis on the *Sorghum bicolor* data (case study) and draw the node degree distribution. The
 required data is included here as well. Note that this script requires sklearn and seaborn.
 

diff --git a/docs/preparation.md b/docs/preparation.md
@@ -1,24 +1,29 @@
 # Preparing your system
 
-LSTrAP is designed to run on the head node of a Oracle Grid Engine cluster. Apart from a running compute cluster, the essential 
-tools need to be installed. A full list is provided below, tools can be installed on the grid nodes directly or inside modules. 
-When opting for the latter, the configuration file needs to contain the exact names of the modules containing the tools.
+LSTrAP is designed with High Performance Computing in mind and requires a computer cluster running 
+[Oracle Grid Engine](https://www.oracle.com/sun/index.html) or [PBS](https://en.wikipedia.org/wiki/Portable_Batch_System) 
+/ [Torque](http://www.adaptivecomputing.com/products/open-source/torque/). Furthermore, the essential 
+tools (see below) need to be installed prior to running LSTrAP. 
+[Environment modules](http://modules.sourceforge.net/) are supported; in that case the configuration file 
+needs to contain the exact names of the modules containing the tools.
 
+## Required tools
 
   * [Bowtie2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml)
   * [TopHat](https://ccb.jhu.edu/software/tophat/manual.shtml)
-  * [HISAT2]
+  * [HISAT2](http://ccb.jhu.edu/software/hisat2/index.shtml)
   * [Samtools](http://www.htslib.org/)
   * [SRAtools](http://ncbi.github.io/sra-tools/)
-  * [Python 2.7](https://www.python.org/download/releases/2.7/) + [HTSeq](http://www-huber.embl.de/users/anders/HTSeq/doc/index.html) + all dependencies (including [PySam](https://github.com/pysam-developers/pysam))
+  * [HTSeq](http://www-huber.embl.de/users/anders/HTSeq/doc/index.html) + all dependencies (including [PySam](https://github.com/pysam-developers/pysam))
   * [Python 3.5](https://www.python.org/download/releases/3.5.1/) + SciPy + [Numpy](http://www.numpy.org/)
-  * [InterProScan](https://www.ebi.ac.uk/interpro/)
-  * [OrthoFinder](https://github.com/davidemms/OrthoFinder)
   * [MCL](http://www.micans.org/mcl/index.html?sec_software)
   * [Trimmomatic](http://www.usadellab.org/cms/?page=trimmomatic)
 
-Optional tools
+## Optional tools
 
+  * [InterProScan](https://www.ebi.ac.uk/interpro/)
+  * [OrthoFinder](https://github.com/davidemms/OrthoFinder)
+  * [Python 2.7](https://www.python.org/download/releases/2.7/) (for OrthoFinder)
   * [scikit-learn](http://scikit-learn.org/) for Python 3, required for PCA analysis (helper script)
   * [seaborn](https://stanford.edu/~mwaskom/software/seaborn/) for Python 3, required for PCA analysis (helper script)
   * [Aspera connect client](http://downloads.asperasoft.com/en/downloads/2), required for the *get_sra_ip.py* (helper script)
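
Since the updated preparation notes require the tools above to be installed before running LSTrAP, a quick pre-flight check can catch a missing tool early. The following is a rough sketch and not part of LSTrAP: the executable names in `REQUIRED_EXECUTABLES` are assumptions about how these tools are typically installed, and the check only makes sense when the tools are on the `PATH` directly (tools provided through Environment modules need their module loaded first). Trimmomatic is distributed as a jar and is therefore not covered.

```python
import shutil

# Assumed executable names for the required tools; adjust to your installation.
REQUIRED_EXECUTABLES = [
    "bowtie2",
    "tophat2",
    "hisat2",
    "samtools",
    "htseq-count",
    "mcl",
]


def find_missing(executables, which=shutil.which):
    """Return the names that cannot be located on the PATH.

    `which` is injectable so the lookup can be replaced in tests.
    """
    return [name for name in executables if which(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_EXECUTABLES)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed executables were found.")
```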