![PARrOT](./PARrOT/vignettes/parrot_logo.PNG "PARrOT")
## PAthway pRedictiOn by mulTimodal genes
## Abstract
This package uses information from large datasets to perform a graph analysis. To this end, an adjacency matrix is computed from information on multimodal genes. Multimodal genes, in this context, are genes whose values across patients show two or more distinguishable normal distributions.
![bimodal](./PARrOT/vignettes/bimodality_example.png "bimodal")
Such genes can be found with the help of [multimodalR](https://github.molgen.mpg.de/loosolab/multimodalR).
#### Availability
All components of the PARrOT R package and the accompanying Python scripts are available for download from the GitHub repository [PARrOT](https://github.molgen.mpg.de/loosolab/PARrOT/).
Get a Docker container [here](https://cloud.docker.com/u/loosolab/repository/docker/loosolab/parrot).
Please make sure to check our other projects at [loosolab](http://loosolab.mpi-bn.mpg.de/).
#### Input
Any data that show a multimodal distribution can be used as input. The package was first tested with mRNA data from TCGA.
A standardized format is needed to connect the multimodality recognition with the pathway analyses. This JavaScript Object Notation (JSON) file lists each gene with its number of modalities and the associated means, standard deviations, sizes and the patients belonging to each modality in the parameter "groups". The locations of the files that contain the clinical data and the expression matrix should also be given. An example of the required data format is given in Table 1.
![Table1](./PARrOT/vignettes/table1.PNG "Table 1")
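As a rough illustration of such a record, the following Python sketch parses a small JSON document and checks that every per-modality list has one entry per modality. Only the parameter name "groups" is taken from the description above; all other key names (`clinical_data`, `expression_matrix`, `genes`, `modus`, `means`, `sds`, `sizes`), the gene name and the paths are assumptions made for this example and may differ from the format shown in Table 1.

```python
import json

# Hypothetical record layout: apart from "groups", the key names are
# assumptions for illustration and may differ from Table 1.
example = """
{
  "clinical_data": "path/to/clinical_data_file",
  "expression_matrix": "path/to/expression_matrix_file",
  "genes": {
    "GENE_A": {
      "modus": 2,
      "means": [2.1, 7.4],
      "sds": [0.5, 0.8],
      "sizes": [2, 1],
      "groups": [["P1", "P2"], ["P3"]]
    }
  }
}
"""

def check_gene(entry):
    """Return True if every per-modality list has one item per modality."""
    n = entry["modus"]
    return all(len(entry[key]) == n for key in ("means", "sds", "sizes", "groups"))

data = json.loads(example)
valid = {gene: check_gene(entry) for gene, entry in data["genes"].items()}
print(valid)
```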
#### Example
```r
table <- readjsonsheet(JSON = JSON)
matrix <- calcscorematrix(table = table)
snode <- buildsinglenode(matrix = matrix)
```
This saves a file named 'adjMatrix_singlenode.csv' in the working directory. It contains three columns: the first and the second contain the gene names and the third the weight of the connection.
These steps also normalize the adjacency matrix and compute statistics for the given data.
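To inspect 'adjMatrix_singlenode.csv' outside of R, the connection list can be read into a simple adjacency dictionary. The sketch below is not part of PARrOT; it assumes a comma-separated file with a header line, one record per connection (two gene names and a weight), and undirected connections. The header names in the toy data are made up.

```python
import csv
import io
from collections import defaultdict

def read_edge_list(handle, has_header=True):
    """Read records of (gene, gene, weight) into a symmetric adjacency dict."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the assumed header line
    adjacency = defaultdict(dict)
    for gene_a, gene_b, weight in reader:
        w = float(weight)
        adjacency[gene_a][gene_b] = w
        adjacency[gene_b][gene_a] = w  # assumption: connections are undirected
    return dict(adjacency)

# Toy data standing in for adjMatrix_singlenode.csv.
sample = io.StringIO("gene1,gene2,weight\nA,B,0.8\nA,C,0.3\n")
adjacency = read_edge_list(sample)
print(adjacency["A"])
```

For a real file, pass `open("adjMatrix_singlenode.csv", newline="")` instead of the `io.StringIO` object.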
````bash
python ./Graph.py -i adjMatrix_singlenode.csv -o <your OUTPUT dir>
````
This command generates different models for the given connection list (graph) together with the corresponding graphics and logs.
It also produces the file block_member.csv, which contains the most probable gene clustering.
This file can be loaded into R and prepared for a second analysis run in which all information from the database is used.
````r
readcluster(clustermember = "block_member.csv", matrix = matrix)
````
This generates one file for each cluster found, named subcluster_<number of cluster>.txt.
These files contain all edges and all modalities of a cluster found in the first run of the graph analysis.
````bash
python ./Graph_subcluster.py -i <OUTPUT dir> -o <your OUTPUT dir>
````
This performs a whole analysis for each cluster.
## Docker
The whole analysis can also be performed with the help of the Docker container.
It can be started with the following command.
````bash
docker run -i -v <dir containing JSON>:/INPUT/ -v <desired OUTPUT dir>:/OUTPUT/ parrot:latest
````
The input directory is supposed to contain exactly one JSON file, and the output directory is supposed to be empty.
The Docker container can be obtained [here](https://cloud.docker.com/u/loosolab/repository/docker/loosolab/parrot).
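The requirements on the two mounted directories can be checked before the container is started. The following Python helper is a hypothetical convenience sketch, not part of PARrOT; the function name `check_mounts` and the `.json` file extension are assumptions.

```python
import tempfile
from pathlib import Path

def check_mounts(input_dir, output_dir):
    """Return a list of problems; an empty list means both directories look fine."""
    problems = []
    json_files = sorted(Path(input_dir).glob("*.json"))
    if len(json_files) != 1:
        problems.append(f"expected exactly one JSON file, found {len(json_files)}")
    if any(Path(output_dir).iterdir()):
        problems.append("output directory is not empty")
    return problems

# Demonstration with temporary directories instead of real data.
with tempfile.TemporaryDirectory() as inp, tempfile.TemporaryDirectory() as out:
    (Path(inp) / "genes.json").write_text("{}")
    ok_result = check_mounts(inp, out)
    (Path(out) / "old.csv").write_text("x")
    bad_result = check_mounts(inp, out)

print(ok_result, bad_result)
```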
#### Structure
The workflow of PARrOT can be displayed in the following flowchart.
![flowchart](./PARrOT/vignettes/flowchart_parrot.png "Flowchart")
The flowchart displays the data stream of the whole framework. The blue rectangles represent functions of the R-package, while the orange rectangles represent the python scripts. All displayed graphs are generated in those functions. The transfer format is mentioned in the arrows between the functions.
A JSON file in the presented format is required as the entry point into the framework. The function readjsonsheet() or spike_in() generates a table that contains all properties of each modality. This table is passed to the calcscorematrix() function, which generates the first plots displaying the statistical properties of the given data. The normalization is also completed at this point.
The generated adjacency matrix is passed to the buildsinglenode() function, which reduces the number of vertices and filters the edges to obtain a computable amount of edges.
The list of these genes is passed to the graph analysis, and the results are processed and validated by the readcluster() function. It also generates a new list for each cluster found, containing all edges and modalities that were filtered in the buildsinglenode() function. In the last step of the framework, those lists are passed to a second graph analysis. This yields a higher resolution of the clusters and affects the significance of the discovered structures, because non-specific cluster members are sorted out.
## Functions of the R package
### readjsonsheet
This function builds a data.table object from the given JSON file. The columns contain Ensembl ID, component, groupsize, proportion, variance, groupmean, FC (distance between means) and groupmember for each modality.
For this, the packages RJSONIO and data.table are used.
### calcscorematrix
This function counts the patients shared between each pair of modalities and performs the normalization. To do so, it takes the sizes of the patient groups, their distribution across the modalities and the fold change into account.
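The counting step can be pictured as a set intersection between the patient groups of two modalities. The Python sketch below only illustrates that idea: the gene and patient names are made up, pairs within one gene are skipped on the assumption that their patient groups are disjoint, and the Jaccard index is an assumed stand-in for the normalization, which in the package also depends on group sizes and fold change.

```python
from itertools import combinations

# Toy modalities: (gene, component) -> set of patient IDs. All names are made up.
modalities = {
    ("GENE_A", 1): {"P1", "P2", "P3"},
    ("GENE_A", 2): {"P4", "P5"},
    ("GENE_B", 1): {"P1", "P2", "P4"},
    ("GENE_B", 2): {"P3", "P5"},
}

def shared_patient_scores(groups):
    """Score each pair of modalities that belong to different genes.

    Returns {(modality_1, modality_2): (shared_count, jaccard)}. The Jaccard
    index is an assumed stand-in for the package's own normalization.
    """
    scores = {}
    for (m1, p1), (m2, p2) in combinations(sorted(groups.items()), 2):
        if m1[0] == m2[0]:
            continue  # same gene: patient groups assumed to be disjoint
        shared = len(p1 & p2)
        scores[(m1, m2)] = (shared, shared / len(p1 | p2))
    return scores

scores = shared_patient_scores(modalities)
print(scores[(("GENE_A", 1), ("GENE_B", 1))])
```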
### buildsinglenode
Because of the number of vertices and edges, the graph has to be reduced. Therefore, only the top n connected genes of each vertex are kept, and the number of vertices is reduced by collapsing modalities into genes.
A global cutoff can also be set. In this case, only the top n edges overall are used.
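The edge filter can be sketched in a few lines of Python. This is an illustration under assumptions, not the package's implementation: an edge is kept here if it is among the n strongest edges of at least one of its two vertices, and the names `top_n_edges` and `global_top` are hypothetical.

```python
from collections import defaultdict

def top_n_edges(edges, n, global_top=None):
    """Keep the n strongest edges per vertex, optionally capped globally.

    edges: iterable of (vertex_a, vertex_b, weight).
    """
    per_vertex = defaultdict(list)
    for a, b, w in edges:
        per_vertex[a].append((a, b, w))
        per_vertex[b].append((a, b, w))
    kept = set()
    for incident in per_vertex.values():
        incident.sort(key=lambda e: e[2], reverse=True)
        kept.update(incident[:n])  # strongest n edges of this vertex
    result = sorted(kept, key=lambda e: e[2], reverse=True)
    return result if global_top is None else result[:global_top]

edges = [("A", "B", 0.9), ("A", "C", 0.5), ("A", "D", 0.1), ("C", "D", 0.4)]
filtered = top_n_edges(edges, n=1)
capped = top_n_edges(edges, n=1, global_top=2)
print(filtered)
print(capped)
```

In this toy graph the weak edge between A and D is dropped, because it is not the strongest edge of either of its vertices.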
#### Python analyses Graph.py
The Python script fits four different generative models to the given graph. The algorithms are implemented in [graph-tool](https://graph-tool.skewed.de/).
These generative models are then compared by their minimum description length, and the best-fitting model is used for the subsequent analyses.
The first two models are stochastic block models. The first is the default version and the second is a variant in which a vertex can belong to several clusters/blocks.
The other two models are nested block models (nbm). The first is again the default version of an nbm, and the second is an nbm with an additional equilibration step.
For all models, a standard and a degree-corrected variant are compared and only the more precise one is used.
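The selection logic can be summarised as "smallest description length wins", applied first within each model (standard versus degree-corrected) and then across models. The Python sketch below shows only this comparison on made-up numbers; the model labels and the function `select_model` are hypothetical, and no graph-tool call is made. In graph-tool, the description length of a fitted state is typically obtained from its `entropy()` method.

```python
def select_model(description_lengths):
    """Pick the model with the smallest description length.

    description_lengths: {(model_name, degree_corrected): description length}.
    First keeps the better of the standard and degree-corrected variant per
    model, then returns the overall best as (name, degree_corrected, length).
    """
    best_per_model = {}
    for (name, deg_corr), dl in description_lengths.items():
        if name not in best_per_model or dl < best_per_model[name][1]:
            best_per_model[name] = (deg_corr, dl)
    name, (deg_corr, dl) = min(best_per_model.items(), key=lambda kv: kv[1][1])
    return name, deg_corr, dl

# Made-up description lengths, for illustration only.
toy = {
    ("sbm", False): 1520.0,
    ("sbm", True): 1490.5,
    ("overlapping_sbm", False): 1610.2,
    ("overlapping_sbm", True): 1588.0,
    ("nested_sbm", False): 1475.3,
    ("nested_sbm", True): 1431.9,
    ("nested_sbm_equilibrated", False): 1470.1,
    ("nested_sbm_equilibrated", True): 1428.4,
}
best = select_model(toy)
print(best)
```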
### readcluster
This function reads in the results of the graph analysis. Using the clusterProfiler package, it performs a gene set enrichment analysis against KEGG. It also writes the files for the second graph analysis.
#### Python analyses Graph_subcluster.py
The second script performs the whole analysis of the first script for each subcluster that readcluster saved with all its information.
### Spike in
The functionality has been tested with a randomised data pool into which a slightly more strongly connected spike was added. The function makes it possible to set sizes, proportions, fold changes, overlap and the number of patients. Most importantly, the number of spikes can also be set.
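The idea of a spike in a random pool can be illustrated with a short Python sketch. It is not the package's spike_in() function: the name `make_pool`, its parameters and the way overlap is modelled (spike genes draw most of their patients from one shared core group) are assumptions for this example, and means, proportions and fold changes are left out.

```python
import random

def make_pool(n_genes, n_patients, n_spike_genes, overlap, seed=0):
    """Return {gene: set of patients in one modality of that gene}.

    Background genes get a random half of the patients. Spike genes draw a
    fraction `overlap` of their patients from a shared core group, so they
    share more patients with each other than random genes do.
    """
    rng = random.Random(seed)
    patients = [f"P{i}" for i in range(n_patients)]
    half = n_patients // 2
    core = rng.sample(patients, half)
    rest = [p for p in patients if p not in core]
    pool = {}
    for g in range(n_genes):
        if g < n_spike_genes:
            keep = rng.sample(core, int(overlap * half))
            fill = rng.sample(rest, half - len(keep))
            pool[f"spike_{g}"] = set(keep) | set(fill)
        else:
            pool[f"random_{g}"] = set(rng.sample(patients, half))
    return pool

pool = make_pool(n_genes=20, n_patients=100, n_spike_genes=3, overlap=0.9)
shared = len(pool["spike_0"] & pool["spike_1"])
print(shared)
```

With these settings, two spike genes each keep 45 of the 50 core patients and therefore share at least 40 patients, while two random genes share about 25 on average.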
## Installation
To install the R package, execute the following commands in R.
```r
library(devtools)
install_github(repo = "loosolab/PARrOT/PARrOT",host = "github.molgen.mpg.de/api/v3")
```
Installation instructions for graph-tool, which is required by the Python scripts, can be found [here](https://git.skewed.de/count0/graph-tool/wikis/installation-instructions).
The Docker container can be obtained from [here](https://cloud.docker.com/u/loosolab/repository/docker/loosolab/parrot).
## How to cite
## License
This project is licensed under the MIT license.