updated docs
proost committed Mar 14, 2017
1 parent e323626 commit 0ad4e5c
Showing 2 changed files with 64 additions and 2 deletions.
11 changes: 9 additions & 2 deletions README.md
@@ -10,8 +10,8 @@ LSTrAP wraps multiple existing tools into a single workflow. To use LSTrAP the f

![LSTrAP Workflow](docs/images/LSTrAP_workflow.png "Steps automated by LSTrAP")

Steps in bold are submitted to a cluster. Optional steps can be enabled by adding the flag *--enable-interpro* and/or
*--enable-orthology*.
Steps in bold are submitted to a cluster. Optional steps can be enabled by adding the flag *‑‑enable‑interpro* and/or
*‑‑enable‑orthology*.

## Preparation

@@ -70,6 +70,13 @@ steps prior to building the network.

More information on how the quality of samples is determined can be found [here](docs/quality.md).

## Output

In addition to the output generated by each of the included tools, LSTrAP will generate raw and normalized expression matrices, a
co‑expression network and co‑expression clusters.

A detailed overview of the files produced, including examples, can be found [here](docs/example_output.md).

## Helper Scripts

LSTrAP comes with a few additional scripts to assist users to download and process data from the [Sequence Read Archive](http://www.ncbi.nlm.nih.gov/sra),
55 changes: 55 additions & 0 deletions docs/example_output.md
@@ -0,0 +1,55 @@
# Example output

Upon completion, LSTrAP will have run Trimmomatic, Bowtie 2, TopHat2 and HTSeq-Count. Unless specified otherwise, the
raw output from those tools will be stored. LSTrAP also processes the output of these tools further to construct
expression matrices, co-expression networks and clusters. A description of the LSTrAP-specific output can be found
below.

## Expression profiles/matrix

LSTrAP will write the raw expression matrix as well as RPKM- and TPM-normalized versions upon completion. This is a
large tab-separated matrix in which columns are samples and rows are transcripts. Each cell contains the raw or normalized
expression value. A mock example is included below.

A single row, along with the header, can be used to draw an expression profile (e.g. in Excel or R).

    gene    Sample1  Sample2  Sample3  ...  Sample10
    Gene1   2        1        3        ...  0
    Gene2   0        0.5      0        ...  0
    Gene3   2        0.22     0.11     ...  0.5
    ...     ...      ...      ...      ...  ...
    Gene10  1        3        0        ...  0.7
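
As an illustration of drawing a profile from a single row, the sketch below pulls one gene's values out of a tab-separated matrix using only the Python standard library. The matrix text, the `read_profile` helper and the gene and sample names are hypothetical stand-ins modelled on the mock example above; they are not part of LSTrAP.

```python
import csv
import io

# Hypothetical stand-in for an LSTrAP expression matrix (tab-separated,
# first column holds the gene/transcript ID, as in the mock example).
MATRIX_TEXT = (
    "gene\tSample1\tSample2\tSample3\n"
    "Gene1\t2\t1\t3\n"
    "Gene2\t0\t0.5\t0\n"
    "Gene3\t2\t0.22\t0.11\n"
)


def read_profile(handle, gene_id):
    """Return (sample_names, values) for one gene, or None if it is absent."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # skip the ID column
    for row in reader:
        if row and row[0] == gene_id:
            return samples, [float(value) for value in row[1:]]
    return None


samples, values = read_profile(io.StringIO(MATRIX_TEXT), "Gene3")
for sample, value in zip(samples, values):
    print(sample, value)
```

For a real matrix, the `io.StringIO` object would be replaced by an open file handle; the resulting pair of lists can be passed to any plotting tool.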

## Co-expression network

Pearson correlation coefficients (PCC) are calculated from the TPM-normalized expression matrix. A file is written
in which each line starts with a transcript ID followed by a colon, then lists the top 1000 co-expressed genes (tab-separated),
each with its PCC value in parentheses.

    AT1G05660.1: AT1G06120.1(0.975109345421) AT4G01630.1(0.971643917372) AT3G59130.1(0.967450941397) AT2G39040.1(0.961912892051) AT2G43880.1(0.958996761442) ...
    AT5G09780.1: AT3G17010.1(0.949034133987) AT5G57720.1(0.870169887662) AT2G16210.1(0.8604233184) AT5G47600.1(0.818799585331) AT5G37860.1(0.801435539475) ...
    AT2G19740.1: AT3G02560.1(0.842648087998) AT5G28060.1(0.837579535602) AT5G56710.1(0.835775366218) AT2G44860.1(0.828341737973) AT2G39460.1(0.828069004117) ...
    ...
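
A line in this format can be split into a query transcript and its neighbours with a small parser. This is a minimal sketch, not LSTrAP's own code: `parse_pcc_line` is a hypothetical helper, and it assumes that IDs contain no colon, whitespace or parentheses.

```python
import re

# One "ID(score)" entry. Assumption: IDs contain no whitespace or parentheses.
NEIGHBOUR = re.compile(r"([^\s()]+)\(([^\s()]+)\)")


def parse_pcc_line(line):
    """Split 'QUERY: ID(pcc)<tab>ID(pcc)...' into (query, [(id, pcc), ...])."""
    query, _, rest = line.partition(":")
    neighbours = [(gene, float(score)) for gene, score in NEIGHBOUR.findall(rest)]
    return query.strip(), neighbours


line = "AT1G05660.1: AT1G06120.1(0.975109345421)\tAT4G01630.1(0.971643917372)"
query, neighbours = parse_pcc_line(line)
print(query)
for gene, pcc in neighbours:
    print(gene, pcc)
```

Because the entries are matched with a regular expression rather than split on a fixed delimiter, the sketch works whether neighbours are separated by tabs or spaces.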

Furthermore, the co-expression network is prepared for MCL clustering. Only co-expressed pairs with PCC
values > 0.7 are considered. The score stored in this file is PCC - 0.7, as MCL requires the minimal value to be zero.
Each line contains two co-expressed genes and the correlation transformed for use with MCL. This file can be imported
into Cytoscape desktop or Gephi for visualization or further analysis.

    AT1G67450.1 AT4G23110.1 0.0496500079172
    AT1G67450.1 AT4G05630.1 0.0490984038043
    AT1G67450.1 AT5G40430.1 0.0479090219126
    ...
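
The filtering and shifting described above can be sketched as follows. The `to_mcl_edges` function, the toy `network` dictionary and its gene names are hypothetical; whether LSTrAP writes each pair once or in both directions is not stated here, so the sketch simply emits whatever pairs it is given.

```python
PCC_CUTOFF = 0.7  # threshold stated in the text above


def to_mcl_edges(network, cutoff=PCC_CUTOFF):
    """Yield (gene_a, gene_b, pcc - cutoff) for every pair with pcc > cutoff."""
    for query, neighbours in network.items():
        for gene, pcc in neighbours:
            if pcc > cutoff:
                yield query, gene, pcc - cutoff


# Toy network: query ID -> list of (neighbour ID, PCC). Values are made up.
network = {
    "GeneA": [("GeneB", 0.75), ("GeneC", 0.65)],
    "GeneB": [("GeneA", 0.75)],
}

edges = list(to_mcl_edges(network))
for gene_a, gene_b, weight in edges:
    print(f"{gene_a}\t{gene_b}\t{weight}")
```

To recover the original PCC from a line in this file, add 0.7 back to the stored score.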

## Co-expression clusters

Co-expression clusters, detected using MCL, are stored as a text file where each line represents a co-expression
cluster. IDs for transcripts belonging to that cluster are separated by tabs.

    AT2G19740.1 AT3G02560.1 AT5G28060.1 AT5G56710.1 AT2G44860.1 AT2G39460.1 AT1G34030.1 AT1G26880.1 AT3G28900.1 AT3G04920.1 ...
    AT1G69250.1 AT1G52640.1 AT5G15820.1 AT1G17130.1 AT3G24210.1 AT5G27330.1 AT4G21140.1 AT4G12610.1 AT5G49000.1 AT4G25340.1 ...
    AT1G27500.1 AT1G04700.1 AT3G24715.1 AT3G57140.1 AT4G07960.1 AT2G30505.1 AT1G79860.1 AT1G44120.1 AT1G05820.1 AT1G52240.1 ...
    AT2G40030.1 AT3G51290.1 AT4G39600.1 AT2G02790.1 AT3G07200.1 AT2G27040.1 AT5G14610.1 AT3G17840.1 AT2G39620.1 AT2G40720.1 ...
    AT5G05657.1 AT2G16881.1 AT3G23650.1 AT4G23103.1 AT4G08370.1 AT3G24216.1 AT2G02280.1 AT3G31068.1 AT2G01780.1 AT2G03932.1 ...
    AT1G32520.1 AT4G09350.1 AT1G62250.1 AT3G47430.1 AT2G37240.1 AT2G04039.1 AT2G35660.1 AT5G09660.1 AT2G39730.1 AT4G26860.1 ...
    ...
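
Since each line is one cluster, the file maps naturally onto a list of lists. The sketch below reads such lines and builds a lookup from transcript ID to cluster index; `read_clusters`, `gene_to_cluster` and the zero-based numbering are illustrative assumptions, not an LSTrAP convention.

```python
def read_clusters(lines):
    """Parse one cluster per line; members are separated by tabs."""
    clusters = []
    for line in lines:
        members = [m for m in line.rstrip("\n").split("\t") if m]
        if members:  # skip empty lines
            clusters.append(members)
    return clusters


def gene_to_cluster(clusters):
    """Map each transcript ID to the (zero-based) index of its cluster."""
    return {gene: idx for idx, members in enumerate(clusters) for gene in members}


# Toy input using IDs from the example above; real files hold full clusters.
lines = [
    "AT2G19740.1\tAT3G02560.1\tAT5G28060.1\n",
    "AT1G69250.1\tAT1G52640.1\n",
]
clusters = read_clusters(lines)
lookup = gene_to_cluster(clusters)
print(lookup["AT1G52640.1"])
```

With a real file, `lines` would be an open file handle, which Python iterates over line by line.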
