# IPD — Interaction Preserving Discretisation
## How to Compile
You can compile the code yourself from scratch as follows
>cd /code
>javac *.java multibinning/business/*.java multibinning/data/*.java
You can then create the .jar file as follows
>jar cmf mainclass.txt ipd.jar *.class multibinning/business/*.class multibinning/data/*.class
## How to Run
You run IPD as follows
>java -jar ipd.jar <parameters>
where for the IPD methods you'll need to specify the following parameters:
| Parameter | Meaning
| ---------------------- | ----
| -FILE_INPUT | name of input file
| -FILE_CP_OUTPUT | name of output file for cut points per dimension
| -FILE_RUNTIME_OUTPUT | name of output file for runtime in microseconds
| -FILE_DATA_OUTPUT | name of output file for discretized data in .arff format
| -NUM_ROWS | number of data points
| -NUM_MEASURE_COLS | number of numeric dimensions
| -NUM_CAT_CONTEXT_COLS | number of categorical dimensions
| -MAX_VAL | maximum value, used for normalization
| -METHOD | method used (0 for IPD_opt, 2 for IPD_gr)
For example,
>java -jar ipd.jar -FILE_INPUT example/simple.csv -FILE_CP_OUTPUT cuts.txt
-FILE_RUNTIME_OUTPUT runtime.txt -FILE_DATA_OUTPUT out.txt -NUM_ROWS 209
-NUM_MEASURE_COLS 2 -NUM_CAT_CONTEXT_COLS 0 -MAX_VAL 1.0 -METHOD 0
the output files of which have been included in /example.
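
As a minimal sketch, the same call can be assembled from shell variables so that only the input-specific values need editing. The flag names and values are taken from the table and example above, with `-METHOD` set to 2 to select IPD_gr instead of IPD_opt. The variable names and the `ipd_command` helper are illustrative and not part of IPD.

```shell
#!/bin/sh
# Sketch: build the IPD command line from variables.
# Flag names come from the parameter table; the variable names and the
# ipd_command helper are hypothetical and only used for illustration.

INPUT="example/simple.csv"
NUM_ROWS=209             # number of data points in INPUT
NUM_MEASURE_COLS=2       # number of numeric dimensions
NUM_CAT_CONTEXT_COLS=0   # number of categorical dimensions
MAX_VAL=1.0              # maximum value, used for normalization
METHOD=2                 # 0 for IPD_opt, 2 for IPD_gr

# Print the full command line instead of running it, so it can be inspected first.
ipd_command() {
    echo "java -jar ipd.jar" \
        "-FILE_INPUT $INPUT" \
        "-FILE_CP_OUTPUT cuts.txt" \
        "-FILE_RUNTIME_OUTPUT runtime.txt" \
        "-FILE_DATA_OUTPUT out.txt" \
        "-NUM_ROWS $NUM_ROWS" \
        "-NUM_MEASURE_COLS $NUM_MEASURE_COLS" \
        "-NUM_CAT_CONTEXT_COLS $NUM_CAT_CONTEXT_COLS" \
        "-MAX_VAL $MAX_VAL" \
        "-METHOD $METHOD"
}

ipd_command
```

To execute the command rather than print it, run `eval "$(ipd_command)"` from the directory that contains ipd.jar. Note that the output file names are the same as in the example above, so an earlier run's files in the same directory would be overwritten.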
## Related Publications
Nguyen, H-V, Müller, E, Vreeken, J & Böhm, K. Unsupervised Interaction-Preserving Discretization of Multivariate Data. Data Mining and Knowledge Discovery, vol. 28(5), pp. 1366-1397, Springer, 2014.
All synthetic and benchmark data sets used in Nguyen et al. (2014) can be found in /data.
The benchmark datasets were drawn from the [UCI Machine Learning Repository](http://archive.ics.uci.edu/ml/).
The PAMAP data sets can be found at the [PAMAP website](http://www.pamap.org/).
## Unadvertised and Disabled Functionality
The IPD code contains a wide spectrum of discretisation methods that are currently not
documented. Two such methods, Supervised Univariate Discretisation, and PCA-based
Discretisation, require the Weka framework.
These two methods are *enabled* in the provided jar file. To ease compilation and
general use of the code, however, they have been *disabled* in the source code.
## Credits
Most of the IPD code was written by Hoang Vu Nguyen.
Kailash Budhathoki fixed bugs and helped make the code easier to use.