IPD — Interaction Preserving Discretisation

How to Compile

You can compile the code yourself from scratch as follows:

cd /code
javac *.java multibinning/business/*.java multibinning/data/*.java

You can then create the .jar file as follows:

jar cmf mainclass.txt ipd.jar *.class multibinning/business/*.class multibinning/data/*.class
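
To check that the manifest was picked up when building the jar, one option is to read the Main-Class attribute back from it. The following is a minimal sketch using the standard java.util.jar API; the class name JarMainClass is hypothetical and not part of the IPD code, and it assumes that mainclass.txt declares a Main-Class entry, which java -jar needs in order to start the program.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper, not part of the IPD code base.
class JarMainClass {

    // Returns the Main-Class manifest attribute of the jar, or null if the
    // jar has no manifest or the attribute is missing.
    static String mainClassOf(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest mf = jar.getManifest(); // null when the jar has no manifest
            if (mf == null) {
                return null;
            }
            return mf.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to ipd.jar in the working directory.
        String path = args.length > 0 ? args[0] : "ipd.jar";
        System.out.println("Main-Class: " + mainClassOf(path));
    }
}
```

If this prints null for Main-Class, java -jar ipd.jar will not be able to find an entry point.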

How to Run

You run IPD as follows:

java -jar ipd.jar

where, for the IPD methods, you need to specify the following parameters:

Parameter               Meaning
-FILE_INPUT             name of input file
-FILE_CP_OUTPUT         name of output file for cut points per dimension
-FILE_RUNTIME_OUTPUT    name of output file for runtime in microseconds
-FILE_DATA_OUTPUT       name of output file for discretized data in .arff format
-NUM_ROWS               number of data points
-NUM_MEASURE_COLS       number of numeric dimensions
-NUM_CAT_CONTEXT_COLS   number of categorical dimensions
-MAX_VAL                maximum value, used for normalization
-METHOD                 method used (0 for IPD_opt, 2 for IPD_gr)

For example,

java -jar ipd.jar -FILE_INPUT example/simple.csv -FILE_CP_OUTPUT cuts.txt \
  -FILE_RUNTIME_OUTPUT runtime.txt -FILE_DATA_OUTPUT out.txt -NUM_ROWS 209 \
  -NUM_MEASURE_COLS 2 -NUM_CAT_CONTEXT_COLS 0 -MAX_VAL 1.0 -METHOD 0

The output files of this example run are included in /example.
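
When IPD is run repeatedly, for instance on several data sets, it can help to assemble the argument list programmatically. The following is a minimal sketch of one way to do this; the class IpdCommand and its methods are hypothetical and not part of the IPD code. It only uses the parameter names and example values listed above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the IPD code base.
class IpdCommand {

    // Turns an ordered map of parameter names (without the leading '-')
    // and their values into the argument list for `java -jar <jar>`.
    static List<String> build(String jarPath, Map<String, String> params) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add(jarPath);
        for (Map.Entry<String, String> e : params.entrySet()) {
            cmd.add("-" + e.getKey());
            cmd.add(e.getValue());
        }
        return cmd;
    }

    // Parameter values taken from the example run in this README.
    static Map<String, String> exampleParams() {
        Map<String, String> p = new LinkedHashMap<>(); // keeps insertion order
        p.put("FILE_INPUT", "example/simple.csv");
        p.put("FILE_CP_OUTPUT", "cuts.txt");
        p.put("FILE_RUNTIME_OUTPUT", "runtime.txt");
        p.put("FILE_DATA_OUTPUT", "out.txt");
        p.put("NUM_ROWS", "209");
        p.put("NUM_MEASURE_COLS", "2");
        p.put("NUM_CAT_CONTEXT_COLS", "0");
        p.put("MAX_VAL", "1.0");
        p.put("METHOD", "0"); // 0 for IPD_opt, 2 for IPD_gr
        return p;
    }

    public static void main(String[] args) {
        List<String> cmd = build("ipd.jar", exampleParams());
        System.out.println(String.join(" ", cmd));
        // To actually launch IPD (requires ipd.jar in the working directory),
        // the list could be passed to ProcessBuilder:
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Passing the arguments as a list rather than a single string avoids quoting problems when file names contain spaces.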

Related Publications

Nguyen, H-V., Müller, E., Vreeken, J. & Böhm, K. Unsupervised Interaction-Preserving Discretization of Multivariate Data. Data Mining and Knowledge Discovery, vol. 28(5), pp. 1366-1397, Springer, 2014.

All synthetic and benchmark data sets used in Nguyen et al. (2014) can be found in /data. The benchmark data sets were drawn from the UCI Machine Learning Repository. The PAMAP data sets can be found at the PAMAP website.

Unadvertised and Disabled Functionality

The IPD code contains a wide spectrum of discretisation methods that are currently not documented. Two such methods, Supervised Univariate Discretisation, and PCA-based Discretisation, require the Weka framework.

These two methods are enabled in the provided jar file. In the source code, however, they have been disabled to ease compilation and general use.

Credits

Most of the IPD code was written by Hoang Vu Nguyen. Kailash Budhathoki fixed bugs and helped make the code easier to use.
