rPID - Partial Information Decomposition for R

Many biological systems involve multiple interacting factors that affect an outcome synergistically and/or redundantly, e.g. the genetic contribution to a phenotype or the tight interplay of genes within a gene-regulatory network (GRN) [2]. Information theory provides measures for characterizing statistical dependencies between pairs of random variables with considerable advantages over simpler measures such as (Pearson) correlation, since it captures non-linear dependencies and reflects the dynamics between pairs or groups of genes [3]. In this setting, we are concerned with how two (or more) random variables X1 and X2, called source variables, jointly or separately specify/predict another random variable Z, called the target variable. The source variables can provide information about the target uniquely, redundantly, or synergistically (see the PI-diagram below). The mutual information (I) between the source variables and the target variable is equal to the sum of four partial information terms:

I(Z; X1, X2) = Unique(X1) + Unique(X2) + Redundancy(X1, X2) + Synergy(X1, X2)

[Figure: PI-diagram of the mutual information decomposition]

Here, we implement the nonnegative decomposition of multivariate information for three random vectors, as proposed by Williams and Beer [1], as an R package.
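
For readers who want to see the underlying math, below is a minimal, self-contained sketch of the Williams and Beer decomposition for two discrete sources and one discrete target, built around their I_min redundancy measure. It is an illustration of the formulas only; the function name wb_pid_sketch is made up here, and this is not the package's internal implementation. Use the pid function described under Usage for actual analyses.

# Illustrative sketch of the Williams & Beer decomposition (not the package code).
# z, x1 and x2 are discrete vectors of equal length.
wb_pid_sketch <- function(z, x1, x2) {
  pz <- prop.table(table(z))                       # marginal p(z)
  # specific information I(Z = z; X) for every state of z
  spec_info <- function(x) {
    pzx <- prop.table(table(z, x))                 # joint p(z, x)
    px  <- colSums(pzx)                            # marginal p(x)
    sapply(rownames(pzx), function(zi) {
      pxz  <- pzx[zi, ] / sum(pzx[zi, ])           # p(x | z = zi)
      pzgx <- pzx[zi, ] / px                       # p(z = zi | x)
      sum(pxz * log2(pzgx / pz[zi]), na.rm = TRUE)
    })
  }
  # mutual information I(A; B) for discrete vectors
  mi <- function(a, b) {
    pab <- prop.table(table(a, b))
    pa  <- rowSums(pab)
    pb  <- colSums(pab)
    sum(pab * log2(pab / outer(pa, pb)), na.rm = TRUE)
  }
  # I_min redundancy: expected minimum specific information over the sources
  redundancy <- sum(pz * pmin(spec_info(x1), spec_info(x2)))
  unique_x1  <- mi(z, x1) - redundancy
  unique_x2  <- mi(z, x2) - redundancy
  synergy    <- mi(z, interaction(x1, x2)) - unique_x1 - unique_x2 - redundancy
  list(unique_x1 = unique_x1, unique_x2 = unique_x2,
       synergy = synergy, redundancy = redundancy)
}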

Usage

Given three random vectors x1, x2 and z, the PID can be calculated with the pid function after discretizing the data:

library(rPID)

# simulate three continuous random vectors
z <- rnorm(100)
x1 <- rnorm(100)
x2 <- rnorm(100)

# discretize the continuous data into bins
zd <- discretize(z)
x1d <- discretize(x1)
x2d <- discretize(x2)

# decompose I(z; x1, x2) into unique, redundant and synergistic terms
decomposition <- pid(zd, x1d, x2d)
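
The result is a list with one entry per partial information term (the names match the output shown in the circuit example below), so the individual terms can be accessed by name:

decomposition$unique_x1
decomposition$unique_x2
decomposition$redundancy
decomposition$synergy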

Example: Genetic circuits

[Figure: AND-gated genetic circuit of AmtR (green), SrpR (blue) and the YFP promoter (red)]

Let's assume that the proteins AmtR (green) and SrpR (blue) can form a heterodimer that acts on the YFP promoter (red) and drives YFP expression. They are connected in an AND-gated genetic circuit, which means that expression of AmtR or SrpR alone does not drive YFP expression.

The corresponding expression truth table looks like this:

| AmtR expression | SrpR expression | YFP expression |
|-----------------|-----------------|----------------|
| low             | low             | low            |
| high            | low             | low             |
| low             | high            | low            |
| high            | high            | high           |

Now assume we collected expression data for AmtR, SrpR and YFP from 400 cells, with each of the four expression combinations present in 100 cells.
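
As an illustration of what such data could look like, here is a small, hypothetical simulation of an AND gate; the packaged circuit_data was not generated with this exact code, and the noise model is made up for the example:

# 100 cells for each of the four input combinations, plus measurement noise
set.seed(1)
states <- expand.grid(AmtR = c(0, 1), SrpR = c(0, 1))        # the four input combinations
cells  <- states[rep(seq_len(nrow(states)), each = 100), ]   # 100 cells per combination
cells$YFP <- as.numeric(cells$AmtR & cells$SrpR)             # AND gate: YFP high only if both inputs are high
sim_data <- as.data.frame(lapply(cells, function(v) v + abs(rnorm(400, sd = 0.05))))

The package already ships such a dataset: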

data(circuit_data)
head(circuit_data)
#           YFP       AmtR        SrpR
# 1 0.006546362 0.04528976 0.003372873
# 2 0.016749429 0.07870458 0.002703958
# 3 0.008629785 0.04528976 0.001836538

The correlation values alone do not clearly associate YFP with AmtR or SrpR, nor do they tell us how the two regulators act together:

cor(circuit_data$YFP, circuit_data$AmtR, method="pearson")
# [1] 0.3688194
cor(circuit_data$YFP, circuit_data$SrpR, method="pearson")
# [1] 0.3539511

However, using partial information decomposition, we can explore three-way interactions and quantify the unique, synergistic and redundant information between the target variable (z = YFP) and the set of two source variables (x1 = AmtR, x2 = SrpR):

# Use discretized expression data
data(circuit_data_discrete)

pid(z = circuit_data_discrete$YFP, x1 = circuit_data_discrete$AmtR, x2 = circuit_data_discrete$SrpR)
# $unique_x1
# [1] 0.0370862
# 
# $unique_x2
# [1] 0.0368186
# 
# $synergy
# [1] 0.597708
# 
# $redundancy
# [1] 0.3251131

The high synergy value indicates that only both source variables together provide full information about the target variable, as expected for an AND gate. Feel free to explore other combinations of source and target variables:

# No unique or synergistic information, total redundancy:
pid(z = circuit_data_discrete$YFP, x1 = circuit_data_discrete$AmtR, x2 = circuit_data_discrete$AmtR)

# No synergistic information, most of the information is unique to x1:
pid(z = circuit_data_discrete$YFP, x1 = circuit_data_discrete$YFP, x2 = circuit_data_discrete$SrpR)

# High synergistic information and unique information from x2:
pid(z = circuit_data_discrete$SrpR, x1 = circuit_data_discrete$AmtR, x2 = circuit_data_discrete$YFP)

Installation and prerequisites

To install the package, use:

# install the 'entropy' dependency
install.packages("entropy")

# install rPID from the MPG GitHub instance
library(devtools)
install_github("loosolab/rPID", host="github.molgen.mpg.de")

To use the Bayesian Blocks discretizer, additionally install the astroML Python package via pip install astroML.

License

The project is licensed under the MIT license.

Literature

1. Williams PL and Beer RD. Nonnegative Decomposition of Multivariate Information. arXiv (2010), https://arxiv.org/abs/1004.2515v1

2. Griffith V and Ho T. Quantifying Redundant Information in Predicting a Target Random Variable. Entropy (2015), doi:10.3390/e17074644

3. Chan et al. Network inference and hypotheses-generation from single-cell transcriptomic data using multivariate information measures. bioRxiv (2016), doi:10.1101/082099
