Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
update R package
  • Loading branch information
Ghanbari committed Oct 5, 2017
1 parent cffdbc9 commit 4077cc3
Show file tree
Hide file tree
Showing 6 changed files with 104 additions and 102 deletions.
2 changes: 1 addition & 1 deletion NAMESPACE
Expand Up @@ -2,8 +2,8 @@

export(Dcenter)
export(dpm)
export(fisher_test)
export(get_link)
export(kmeans_links)
export(nldata)
export(rdata)
export(reg.dpm)
Expand Down
51 changes: 0 additions & 51 deletions R/fisher_test.R

This file was deleted.

45 changes: 45 additions & 0 deletions R/kmeans_links.R
@@ -0,0 +1,45 @@
#' Function to apply kmeans clustering on the score of links
#'
#' @param score_matrix score matrix obtained from DPM or reg.DPM.
#'
#' @return a list containing
#' - the list of the links as a data frame. Each line of the data frame corresponds to a link.
#' The first and second columns are the corresponding nodes of the link and the third column is the score of the link.
#' - the threshold that all the links with scores more than it are considered as an edge.
#'
#' @examples
#' ## Load DREAM4 data (insilico_size10_1_knockouts)
#' data(dream4)
#'
#'## Run regularized DPM
#' res <- reg.dpm(data)
#'
#' ## Get the list of the links based on kmeans clustering
#' linklist <- kmeans_links(res)$LinkList
#'
#' ##Get the threshold at significance level alpha=0.05
#' kmeans_links(res)$threshold
#'
#'
#' @export

###########################################################
### Function to compute fisher z transformation pvalues ###
###########################################################
kmeans_links <- function(score_matrix){
score_matrix <- abs(score_matrix)
diag(score_matrix) <- NA
score_matrix[upper.tri(score_matrix)] <- NA
linkList <- reshape2::melt(score_matrix, na.rm=TRUE)
colnames(linkList) <- c("Node1", "Node2", "Score")

# do kmeans on scores
res = kmeans(linkList$Score, centers=2)
linkList <- linkList[res$cluster==which.max(res$centers),]

# compute threshold
threshold = mean(a$centers)

return(list(LinkList=linkList,threshold=threshold))
}

32 changes: 23 additions & 9 deletions README.md
Expand Up @@ -15,6 +15,7 @@ Then install DPM using the `install_github` function in the
```r
library(devtools)
devtools::install_github("ghanbari/DPM",host="github.molgen.mpg.de/api/v3")
library(DPM)
```

### Format of input data
Expand All @@ -29,6 +30,7 @@ head(DataMat)
In the context of gene networks variables correspond to genes. The input is gene expression data as a matrix in which columns correspond to genes and rows correspond to samples. As an example, we have one data from DREAM4 challenge (insilico_size10_1_knockouts):

```r
library(DPM)
data(dream4)
head(dream4)
```
Expand Down Expand Up @@ -90,22 +92,34 @@ ggsave(“ScoreDistribution.png”, plot = last_plot(), device = “png”)
When you decide for the threshold you can visualize the resulted network (see section [Visualization of the network](#visualization-of-the-network) ) and get all the links with a score higher than the threshold (see subsection [Get only the links with a score higher than a threshold](#get-only-the-links-with-a-score-higher-than-a-threshold) )


However, we provide a threshold selection method based on the [fisher z-transformation](https://en.wikipedia.org/wiki/Fisher_transformation). You can compute the threshold for the DREAM4 data with the following command where:
However, we provide a threshold selection method based on k-means clustering. You can compute the threshold for the DREAM4 data with the following command where:

* `score_matrix` is the score matrix obtained from DPM or reg.DPM.
* `alpha` is a significance value
* `nsamp` is the number of samples (number of rows of input matrix for (reg-)DPM )
* `nvar` is the number of variables (number of columns of input matrix for (reg-)DPM )

```r
tr <- fisher_test(dpmres, alpha=0.05, nsamp=nrow(dream4), nnodes=ncol(dream4))$threshold
tr <- fisher_test(regdpmres, alpha=0.05, nsamp=nrow(dream4), nnodes=ncol(dream4))$threshold
dpmtr <- kmeans_links(dpmres)$threshold
regdpmtr <- kmeans_links(regdpmres)$threshold
```

You can also get all the significant links based on fisher z-transformation at significance level alpha:
You can visualize the position of the threshold on the distribution of the scores:

```r
library("ggplot2")
dpmdat <- data.frame(score = abs(dpmres[lower.tri(dpmres)]),name ='DPM')
regdpmdat <-data.frame(score = abs( regdpmres[lower.tri(regdpmres)]), name='reg-DPM')
dat <- rbind(dpmdat,regdpmdat)

ggplot(dat, aes(x=score,color=name))+ theme_bw() + geom_density(size=1.5)+scale_color_manual(values=c("DPM"="SteelBlue3","reg-DPM"="dodgerblue4"))+ geom_vline(xintercept=dpmtr, size=1, color="SteelBlue3") + geom_vline(xintercept=regdpmtr, size=1,color="dodgerblue4")

ggsave(“ScoreDistribution.png”, plot = last_plot(), device =png”)
```


You can also get all the links considered as true links based on k-means clustering:

```r
dpm_sig_links <- fisher_test(dpmres, alpha=0.05, nsamp=nrow(dream4), nnodes=ncol(dream4))$Listlink
regdpm_sig_links <- fisher_test(regdpmres, alpha=0.05, nsamp=nrow(dream4), nnodes=ncol(dream4))$Listlink
dpm_links <- kmeans_links(dpmres)$Listlink
regdpm_links <- kmeans_links(regdpmres)$Listlink
```

As an alternative, you can select the top-ranked links based on (reg.)DPM score which are most likely to be true links.
Expand Down
41 changes: 0 additions & 41 deletions man/fisher_test.Rd

This file was deleted.

35 changes: 35 additions & 0 deletions man/kmeans_links.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

0 comments on commit 4077cc3

Please sign in to comment.