Many biological systems involve multiple interacting factors affecting an outcome synergistically and/or redundantly, e.g. the genetic contribution to a phenotype or the tight interplay of genes within a gene-regulatory network (GRN)[^2]. Information theory provides a set of measures that characterize statistical dependencies between pairs of random variables with considerable advantages over simpler measures such as (Pearson) correlation, since these measures capture non-linear dependencies and reflect the dynamics between pairs or groups of genes[^3]. In these settings, we are concerned with the statistics of how two (or more) random variables X1, X2, called source variables, jointly or separately specify/predict another random variable Z, called the target variable. The source variables can provide information about the target uniquely, redundantly, or synergistically (see the decomposition below). The mutual information (I) between the source variables and the target variable is equal to the sum of four partial information terms:
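`I(Z; X1, X2) = Unique(X1) + Unique(X2) + Redundancy + Synergy`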
Here, we implemented the nonnegative decomposition of multivariate information for three random vectors, as proposed by Williams and Beer[^1], as an R package.
Given three random vectors `x1`, `x2` and `z`, the PID can be calculated with the `pid` function.
```r
library(rPID)

# Simulate three continuous random vectors
z  <- rnorm(100)
x1 <- rnorm(100)
x2 <- rnorm(100)

# Discretize the continuous values before estimating information terms
zd  <- discretize(z)
x1d <- discretize(x1)
x2d <- discretize(x2)

# Partial information decomposition of z into contributions from x1 and x2
decomposition <- pid(zd, x1d, x2d)
```
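As a quick consistency check, the four terms should add up (modulo estimation error) to the total mutual information I(z; x1, x2); a minimal sketch, assuming the result is a named list as shown in the circuit example further below:

```r
# Sum of the four partial information terms equals I(z; x1, x2)
with(decomposition, unique_x1 + unique_x2 + synergy + redundancy)
```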
Let's assume that the proteins AmtR and SrpR can form a heterodimer that acts on the YFP promoter and drives YFP expression. They are connected in an AND-gated genetic circuit, which means that expression of AmtR or SrpR alone does not drive YFP expression.
The corresponding expression truth table looks like this:
| AmtR expression | SrpR expression | YFP expression |
|---|---|---|
| low | low | low |
| high | low | low |
| low | high | low |
| high | high | high |
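For a noise-free AND gate with uniform, independent inputs, the expected decomposition can be worked out by hand. Under the redundancy measure of Williams and Beer[^1], the unique terms vanish, redundancy is about 0.311 bits, and synergy is 0.5 bits; a small sketch of the arithmetic:

```r
# Shannon entropy of a discrete distribution (in bits)
h <- function(p) -sum(p[p > 0] * log2(p[p > 0]))

H_Z    <- h(c(1/4, 3/4))              # I(Z; X1, X2) = H(Z) ~ 0.811, since YFP is deterministic
I_Z_X1 <- H_Z - 0.5 * h(c(1/2, 1/2))  # ~ 0.311: given AmtR low, YFP is fixed; given high, a fair coin
redundancy <- I_Z_X1                  # for the symmetric AND gate, all of I(Z; X1) is redundant
synergy    <- H_Z - redundancy        # ~ 0.5 bits, provided only by both inputs jointly
```

With noisy expression data and plug-in estimators, the empirical values reported below deviate somewhat from these idealized numbers.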
Now assume we collected expression data for AmtR, SrpR and YFP from 400 cells, with each of the four expression combinations present in 100 cells.
```r
data(circuit_data)
head(circuit_data)
#           YFP       AmtR        SrpR
# 1 0.006546362 0.04528976 0.003372873
# 2 0.016749429 0.07870458 0.002703958
# 3 0.008629785 0.04528976 0.001836538
```
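For illustration, data of this shape could be simulated along the following lines (a hypothetical sketch; the packaged `circuit_data` may have been generated differently):

```r
set.seed(42)

# Log-normal expression noise around a "low" or "high" mean level (hypothetical choice)
expr_level <- function(state, n) rlnorm(n, meanlog = ifelse(state == "high", -3, -5), sdlog = 0.3)

# 100 cells for each of the four input combinations of the truth table
combos <- expand.grid(AmtR = c("low", "high"), SrpR = c("low", "high"), stringsAsFactors = FALSE)
sim_data <- do.call(rbind, lapply(seq_len(nrow(combos)), function(i) {
  # AND gate: YFP is high only if both AmtR and SrpR are high
  yfp_state <- if (combos$AmtR[i] == "high" && combos$SrpR[i] == "high") "high" else "low"
  data.frame(YFP  = expr_level(yfp_state, 100),
             AmtR = expr_level(combos$AmtR[i], 100),
             SrpR = expr_level(combos$SrpR[i], 100))
}))
```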
The correlation values alone do not allow us to attribute YFP expression to AmtR or SrpR; both correlations are moderate and nearly identical:
```r
cor(circuit_data$YFP, circuit_data$AmtR, method = "pearson")
# [1] 0.3688194
cor(circuit_data$YFP, circuit_data$SrpR, method = "pearson")
# [1] 0.3539511
```
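The same limitation applies to pairwise mutual information: because the circuit is symmetric in its inputs, both sources carry roughly the same amount of information about YFP when considered separately. A sketch using the `entropy` package (a dependency of rPID; the bin counts are an arbitrary choice for illustration):

```r
library(entropy)

# Pairwise mutual information from a discretized 2D contingency table
mi.empirical(discretize2d(circuit_data$YFP, circuit_data$AmtR, numBins1 = 10, numBins2 = 10))
mi.empirical(discretize2d(circuit_data$YFP, circuit_data$SrpR, numBins1 = 10, numBins2 = 10))
```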
However, using partial information decomposition, we can explore three-way interactions and quantify unique, synergistic and redundant information between the target variable (`z = YFP`) and the set of two source variables (`x1 = AmtR`, `x2 = SrpR`):
```r
# Use discretized expression data
data(circuit_data_discrete)
pid(z = circuit_data_discrete$YFP, x1 = circuit_data_discrete$AmtR, x2 = circuit_data_discrete$SrpR)
# $unique_x1
# [1] 0.0370862
#
# $unique_x2
# [1] 0.0368186
#
# $synergy
# [1] 0.597708
#
# $redundancy
# [1] 0.3251131
```
The high synergy value indicates that only both source variables together provide full information about the target variable, exactly as expected for an AND gate. Feel free to explore other combinations of source and target variables:
```r
# No unique or synergistic information, total redundancy:
pid(z = circuit_data_discrete$YFP, x1 = circuit_data_discrete$AmtR, x2 = circuit_data_discrete$AmtR)

# No synergistic information, most of the information is unique to x1:
pid(z = circuit_data_discrete$YFP, x1 = circuit_data_discrete$YFP, x2 = circuit_data_discrete$SrpR)

# High synergistic information and unique information from x2:
pid(z = circuit_data_discrete$SrpR, x1 = circuit_data_discrete$AmtR, x2 = circuit_data_discrete$YFP)
```
To install the package, use:
```r
# Install the 'entropy' dependency, then rPID itself
library(devtools)
install.packages("entropy")
install_github("loosolab/rPID", host = "github.molgen.mpg.de")
```
To use the Bayesian Blocks discretizer, additionally install the `astroML` Python package via `pip install astroML`.
The project is licensed under the MIT license.
[^1]: Williams PL and Beer RD. Nonnegative Decomposition of Multivariate Information. arXiv (2010), https://arxiv.org/abs/1004.2515v1
[^2]: Griffith V and Ho T. Quantifying Redundant Information in Predicting a Target Random Variable. Entropy (2015), doi:10.3390/e17074644
[^3]: Chan et al. Network inference and hypotheses-generation from single-cell transcriptomic data using multivariate information measures. bioRxiv (2016), doi:10.1101/082099