
# IPD — Interaction Preserving Discretisation

## How to Compile

You can compile the code yourself from scratch as follows:
>cd /code
>javac *.java multibinning/business/*.java multibinning/data/*.java

You can then create the .jar file as follows:
>jar cmf mainclass.txt ipd.jar *.class multibinning/business/*.class multibinning/data/*.class


## How to Run

You run IPD as follows:
>java -jar ipd.jar <parameters>

where for the IPD methods you'll need to specify the following parameters:

| Parameter              | Meaning
| ---------------------- | ----
| -FILE_INPUT            | name of input file
| -FILE_CP_OUTPUT        | name of output file for cut points per dimension
| -FILE_RUNTIME_OUTPUT   | name of output file for runtime in microseconds
| -FILE_DATA_OUTPUT      | name of output file for discretized data in .arff format
| -NUM_ROWS              | number of data points
| -NUM_MEASURE_COLS      | number of numeric dimensions
| -NUM_CAT_CONTEXT_COLS  | number of categorical dimensions
| -MAX_VAL               | maximum value, used for normalization
| -METHOD                | method used (0 for IPD_opt, 2 for IPD_gr)

For example,

>java -jar ipd.jar -FILE_INPUT example/simple.csv  -FILE_CP_OUTPUT cuts.txt
  -FILE_RUNTIME_OUTPUT runtime.txt  -FILE_DATA_OUTPUT out.txt  -NUM_ROWS 209
  -NUM_MEASURE_COLS 2 -NUM_CAT_CONTEXT_COLS 0 -MAX_VAL 1.0  -METHOD 0

The output files of this example run are included in /example.
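
If you want to launch IPD from another Java program rather than from a shell, the example command can be assembled as an argument list. The following is only a minimal sketch: the class `IpdCommandSketch` and its methods are hypothetical helpers written for this README, not part of the IPD code, and it assumes that `ipd.jar` is in the working directory and that `java` is on the PATH.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the IPD sources.
class IpdCommandSketch {

    /** Builds the argument list for: java -jar <jarPath> -NAME value ... */
    static List<String> buildCommand(String jarPath, Map<String, String> params) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add(jarPath);
        for (Map.Entry<String, String> e : params.entrySet()) {
            cmd.add("-" + e.getKey());   // parameter name, e.g. -NUM_ROWS
            cmd.add(e.getValue());       // its value, e.g. 209
        }
        return cmd;
    }

    /** Parameter values copied from the example command in this README. */
    static Map<String, String> exampleParams() {
        Map<String, String> p = new LinkedHashMap<>(); // keeps insertion order
        p.put("FILE_INPUT", "example/simple.csv");
        p.put("FILE_CP_OUTPUT", "cuts.txt");
        p.put("FILE_RUNTIME_OUTPUT", "runtime.txt");
        p.put("FILE_DATA_OUTPUT", "out.txt");
        p.put("NUM_ROWS", "209");
        p.put("NUM_MEASURE_COLS", "2");
        p.put("NUM_CAT_CONTEXT_COLS", "0");
        p.put("MAX_VAL", "1.0");
        p.put("METHOD", "0");
        return p;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand("ipd.jar", exampleParams());
        // Print the command so it can be checked before running it.
        System.out.println(String.join(" ", cmd));
        // To actually launch IPD, pass the list to ProcessBuilder, e.g.:
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Passing the arguments as a list avoids shell quoting issues when file names contain spaces.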


## Related Publications

Nguyen, H-V, Müller, E, Vreeken, J & Böhm, K. Unsupervised Interaction-Preserving Discretization of Multivariate Data. Data Mining and Knowledge Discovery, vol. 28(5), pp. 1366-1397, Springer, 2014.

All synthetic and benchmark data sets used in Nguyen et al. (2014) can be found in /data.
The benchmark data sets were drawn from the [UCI Machine Learning Repository](http://archive.ics.uci.edu/ml/).
The PAMAP data sets can be found at the [PAMAP website](http://www.pamap.org/).


## Unadvertised and Disabled Functionality

The IPD code contains a wide spectrum of discretisation methods that are currently not
documented. Two such methods, Supervised Univariate Discretisation and PCA-based
Discretisation, require the Weka framework.

Both are *enabled* in the provided jar file. To ease compilation and
general use of the code, they have been *disabled* in the source code.


## Credits

Most of the IPD code was written by Hoang Vu Nguyen.
Kailash Budhathoki fixed bugs and helped make the code easier to use.