# IPD: Interaction-Preserving Discretisation

## How to Compile

From the repository root, you can compile the code from scratch as follows:

```
cd code
javac *.java multibinning/business/*.java multibinning/data/*.java
```

You can then create the .jar file as follows:

```
jar cmf mainclass.txt ipd.jar *.class multibinning/business/*.class multibinning/data/*.class
```

## How to Run

Run IPD as follows:

```
java -jar ipd.jar <parameters>
```

For the IPD methods, you need to specify the following parameters:

| Parameter | Meaning |
| --- | --- |
| `-FILE_INPUT` | name of input file |
| `-FILE_CP_OUTPUT` | name of output file for cut points per dimension |
| `-FILE_RUNTIME_OUTPUT` | name of output file for runtime (in microseconds) |
| `-FILE_DATA_OUTPUT` | name of output file for discretized data in .arff format |
| `-NUM_ROWS` | number of data points |
| `-NUM_MEASURE_COLS` | number of numeric dimensions |
| `-NUM_CAT_CONTEXT_COLS` | number of categorical dimensions |
| `-MAX_VAL` | maximum value, used for normalization |
| `-METHOD` | method used (`0` for IPD_opt, `2` for IPD_gr) |
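
The `-FILE_CP_OUTPUT` file lists the cut points found for each dimension. As an illustration only, the sketch below shows how k sorted cut points of one dimension split its range into k+1 bins, and how a value can be assigned to a bin by binary search. The class `CutPointBinning`, the method `binIndex`, the example cut points, and the right-closed interval convention `(c[i-1], c[i]]` are all assumptions of this sketch and are not taken from the IPD code; the actual cut-point file format and boundary handling may differ.

```java
import java.util.Arrays;

/**
 * Hypothetical helper, NOT part of the IPD code base.
 * k sorted cut points c[0] < c[1] < ... < c[k-1] define k+1 bins:
 *   bin 0 = (-inf, c[0]], bin i = (c[i-1], c[i]], bin k = (c[k-1], +inf).
 * The right-closed convention is an assumption of this sketch.
 */
public class CutPointBinning {

    /** Returns the index of the first cut point >= value (or k if none), i.e. the bin index. */
    static int binIndex(double[] sortedCuts, double value) {
        int lo = 0;
        int hi = sortedCuts.length; // the answer lies in [0, k]
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedCuts[mid] < value) {
                lo = mid + 1; // value lies to the right of this cut point
            } else {
                hi = mid;     // this cut point is a candidate upper bound
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        double[] cuts = {0.25, 0.5, 0.75};          // made-up cut points for one dimension
        double[] column = {0.1, 0.25, 0.3, 0.6, 0.9}; // made-up values of that dimension
        int[] bins = new int[column.length];
        for (int i = 0; i < column.length; i++) {
            bins[i] = binIndex(cuts, column[i]);
        }
        System.out.println(Arrays.toString(bins));
    }
}
```

Running `main` prints `[0, 0, 1, 2, 3]`. Binary search keeps each lookup at O(log k), which matters little for a handful of cut points but keeps the sketch correct for any k.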

For example, the following command runs IPD_opt (`-METHOD 0`) on `example/simple.csv`:

```
java -jar ipd.jar -FILE_INPUT example/simple.csv -FILE_CP_OUTPUT cuts.txt \
  -FILE_RUNTIME_OUTPUT runtime.txt -FILE_DATA_OUTPUT out.txt -NUM_ROWS 209 \
  -NUM_MEASURE_COLS 2 -NUM_CAT_CONTEXT_COLS 0 -MAX_VAL 1.0 -METHOD 0
```

The resulting output files are included in /example.
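
To drive IPD from another Java program, one option is to assemble the same flag/value pairs and start `ipd.jar` as a child process with `java.lang.ProcessBuilder`. This is a minimal sketch: the class `RunIpdExample` and its helper methods are hypothetical, and it only assumes that ipd.jar accepts parameters as `-KEY value` pairs (as in the example command) and that it is launched from the directory containing ipd.jar and example/.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical launcher, NOT part of the IPD code base. */
public class RunIpdExample {

    /** Turns an ordered map of parameters into "java -jar <jar> -KEY value ...". */
    static List<String> buildCommand(String jarPath, Map<String, String> params) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add(jarPath);
        for (Map.Entry<String, String> e : params.entrySet()) {
            cmd.add(e.getKey());
            cmd.add(e.getValue());
        }
        return cmd;
    }

    /** The parameters of the README example, in the same order. */
    static Map<String, String> exampleParams() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("-FILE_INPUT", "example/simple.csv");
        p.put("-FILE_CP_OUTPUT", "cuts.txt");
        p.put("-FILE_RUNTIME_OUTPUT", "runtime.txt");
        p.put("-FILE_DATA_OUTPUT", "out.txt");
        p.put("-NUM_ROWS", "209");
        p.put("-NUM_MEASURE_COLS", "2");
        p.put("-NUM_CAT_CONTEXT_COLS", "0");
        p.put("-MAX_VAL", "1.0");
        p.put("-METHOD", "0"); // 0 = IPD_opt, 2 = IPD_gr
        return p;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        List<String> cmd = buildCommand("ipd.jar", exampleParams());
        System.out.println(String.join(" ", cmd));
        if (!new File("ipd.jar").isFile()) {
            System.out.println("ipd.jar not found; build it first (see How to Compile).");
            return;
        }
        // Forward IPD's stdout/stderr to this process and wait for it to finish.
        Process proc = new ProcessBuilder(cmd).inheritIO().start();
        int exit = proc.waitFor();
        System.out.println("IPD exited with code " + exit);
    }
}
```

A `LinkedHashMap` keeps the parameters in the same order as the example command, and passing the command to `ProcessBuilder` as a list avoids shell quoting issues with file names.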

## Related Publications

Nguyen, H-V, Müller, E, Vreeken, J & Böhm, K. Unsupervised Interaction-Preserving Discretization of Multivariate Data. *Data Mining and Knowledge Discovery* 28(5), pp. 1366-1397, Springer, 2014.

All synthetic and benchmark data sets used in Nguyen et al. (2014) can be found in /data.
The benchmark data sets were drawn from the [UCI Machine Learning Repository](http://archive.ics.uci.edu/ml/).
The PAMAP data sets can be found at the [PAMAP website](http://www.pamap.org/).

## Unadvertised and Disabled Functionality

The IPD code contains a wide spectrum of discretisation methods that are currently not
documented. Two of these, Supervised Univariate Discretisation and PCA-based
Discretisation, require the Weka framework.
They are *enabled* in the provided jar file, but to ease compilation and
general use of the code, they have been *disabled* in the source code.

## Credits

Most of the IPD code was written by Hoang Vu Nguyen.
Kailash Budhathoki fixed bugs and helped make the code easier to use.