Commit

code refactored for icdm submission
kbudhath committed Jun 11, 2018
1 parent 6259733 commit 10943bf
Showing 251 changed files with 1,000 additions and 334,468 deletions.
10 changes: 3 additions & 7 deletions README.md
@@ -1,8 +1,4 @@
# Causal Inference by Stochastic Complexity
The algorithmic Markov condition states that the most likely causal direction between two random variables `X` and `Y` can be identified as the direction with the lower Kolmogorov complexity. Due to the halting problem, however, this notion is not computable.

We hence propose to do causal inference by stochastic complexity. That is, we propose to approximate Kolmogorov complexity via the Minimum Description Length (MDL) principle, using a score that is minimax optimal with regard to the model class under consideration. This means that even in an adversarial setting, such as when the true distribution is not in this class, we still obtain the optimal encoding for the data relative to the class.

We instantiate this framework, which we call CISC, for pairs of univariate discrete variables, using the class of multinomial distributions.
Experiments show that CISC is highly accurate on synthetic, benchmark, and real-world data, outperforming the state of the art by a clear margin, and that it scales extremely well with regard to sample and domain sizes.
# Accurate Causal Inference on Discrete Data
Additive Noise Models (ANMs) provide a theoretically sound approach for inferring the most likely causal direction between pairs of random variables given only a sample from their joint distribution. The key assumption is that the effect is a function of the cause, with additive noise that is independent of the cause. In many cases ANMs are identifiable. Their performance, however, hinges on the chosen dependence measure, the assumptions we make about the true distribution, and the sample size.

In this paper we propose to use Shannon entropy to measure the dependence within an ANM, which gives us a general approach by which we neither have to assume a true distribution, nor have to perform explicit significance tests during optimization. Moreover, through the Minimum Description Length principle, we further show the direct connection between this ANM formulation and the more general Algorithmic Markov Condition (AMC). While practical instantiations of the AMC have so far not been known to be identifiable, we show that, with certain adjustments using ANMs, this is possible. Our information-theoretic formulation gives us a general, efficient, identifiable, and, as the experiments show, highly accurate method for causal inference on pairs of discrete variables, achieving (near) 100% accuracy on both synthetic and real-world data.
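
The abstract above describes scoring an additive noise model with Shannon entropy. As a rough sketch of that idea (not the repository's actual acid.py implementation, which imports its own `entropy` helper as shown below), the snippet uses the conditional mode as a crude stand-in regression function and prefers the direction with the smaller total entropy of cause plus residual; all names and the mode-based fit are assumptions made for illustration.

```python
# Hedged sketch only: scores a discrete ANM in both directions using empirical
# Shannon entropy, as outlined in the README above. The mode-based regression
# and the score H(cause) + H(residual) are simplifying assumptions for this
# example, not the repository's actual algorithm.
from collections import Counter, defaultdict
from math import log2


def entropy(values):
    """Empirical Shannon entropy (in bits) of a discrete sample."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def anm_score(X, Y):
    """Entropy score of the ANM X -> Y with a mode-based regression function."""
    groups = defaultdict(list)
    for x, y in zip(X, Y):
        groups[x].append(y)
    # Crude regression: map each cause value to its most frequent effect value.
    f = {x: Counter(ys).most_common(1)[0][0] for x, ys in groups.items()}
    residuals = [y - f[x] for x, y in zip(X, Y)]
    return entropy(X) + entropy(residuals)


def infer_direction(X, Y):
    """Prefer the direction whose additive-noise fit compresses better."""
    x_to_y, y_to_x = anm_score(X, Y), anm_score(Y, X)
    if x_to_y < y_to_x:
        return "X -> Y"
    if y_to_x < x_to_y:
        return "Y -> X"
    return "undecided"


if __name__ == "__main__":
    import random
    random.seed(0)
    X = [random.randint(0, 5) for _ in range(2000)]
    N = [random.randint(-1, 1) for _ in range(2000)]
    Y = [2 * x + e for x, e in zip(X, N)]  # Y = f(X) + N, with N independent of X
    print(infer_direction(X, Y))
```

On such data the X -> Y score should typically come out smaller, though the mode-based fit is only a rough proxy for a proper discrete regression.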
6 changes: 0 additions & 6 deletions acid.py
@@ -9,12 +9,6 @@
from entropy import entropy


__author__ = "Kailash Budhathoki"
__email__ = "kbudhath@mpi-inf.mpg.de"
__copyright__ = "Copyright (c) 2018"
__license__ = "MIT"


def marginals(X, Y):
Ys = defaultdict(list)
for i, x in enumerate(X):
6 changes: 0 additions & 6 deletions cisc.py
@@ -9,12 +9,6 @@
from sc import stochastic_complexity


__author__ = "Kailash Budhathoki"
__email__ = "kbudhath@mpi-inf.mpg.de"
__copyright__ = "Copyright (c) 2018"
__license__ = "MIT"


def marginals(X, Y):
Ys = defaultdict(list)
for i, x in enumerate(X):
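
The cisc.py excerpt above imports `stochastic_complexity` from `sc`, and the former README describes comparing the stochastic complexity of the data in the two causal directions. The following is a minimal, self-contained sketch of such a decision rule, assuming the standard multinomial NML codelength and a group-wise sum as the conditional term; the helper names are invented for illustration and need not match the repository's `sc.py` or `cisc.py`.

```python
# Hedged sketch of a stochastic-complexity decision rule in the spirit of the
# former README. The group-wise conditional term and all function names are
# assumptions for this example, not a verified reading of cisc.py or sc.py.
from collections import Counter, defaultdict
from math import exp, lgamma, log, log2


def multinomial_complexity(n, k):
    """Parametric complexity C(k, n) of the multinomial model with k categories,
    using the linear-time recurrence of Kontkanen & Myllymaki (2007)."""
    if n == 0 or k <= 1:
        return 1.0
    # C(2, n): sum of maximum-likelihood probabilities over all binomial splits,
    # evaluated in log space for numerical stability at larger n.
    c2 = 0.0
    for h in range(n + 1):
        log_term = lgamma(n + 1) - lgamma(h + 1) - lgamma(n - h + 1)
        if 0 < h < n:
            log_term += h * log(h / n) + (n - h) * log((n - h) / n)
        c2 += exp(log_term)
    c_prev, c_curr = 1.0, c2  # C(1, n), C(2, n)
    for j in range(3, k + 1):
        c_prev, c_curr = c_curr, c_curr + n * c_prev / (j - 2)
    return c_curr


def stochastic_complexity(values, domain_size=None):
    """NML codelength (in bits) of a discrete sequence under the multinomial model."""
    n = len(values)
    counts = Counter(values)
    k = domain_size if domain_size is not None else len(counts)
    data_bits = -sum(c * log2(c / n) for c in counts.values())
    return data_bits + log2(multinomial_complexity(n, k))


def conditional_complexity(X, Y):
    """One plausible conditional term: sum the complexity of Y within each X-group."""
    groups = defaultdict(list)
    for x, y in zip(X, Y):
        groups[x].append(y)
    domain = len(set(Y))
    return sum(stochastic_complexity(ys, domain) for ys in groups.values())


def cisc_direction(X, Y):
    """Prefer the causal direction with the smaller total two-part codelength."""
    x_to_y = stochastic_complexity(X) + conditional_complexity(X, Y)
    y_to_x = stochastic_complexity(Y) + conditional_complexity(Y, X)
    if x_to_y < y_to_x:
        return "X -> Y"
    if y_to_x < x_to_y:
        return "Y -> X"
    return "undecided"
```

Here `cisc_direction(X, Y)` simply reports whichever direction yields the shorter total codelength, returning "undecided" on ties.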
122 changes: 0 additions & 122 deletions crisp.py

This file was deleted.

8 changes: 0 additions & 8 deletions data/acute/acute.names

This file was deleted.

120 changes: 0 additions & 120 deletions data/acute/acute.tsv

This file was deleted.
