Commit d24d9c9: Minor changes in README
snikumbh committed Jul 19, 2017 (parent commit 94466a1)
1 changed file: README.md (5 additions, 4 deletions)

# CoMIK : Conformal Multi-Instance Kernels

Manuscript authors: Sarvesh Nikumbh, Peter Ebert, Nico Pfeifer

To appear in Proceedings of WABI 2017.
Preprint at bioRxiv (DOI: https://doi.org/10.1101/139618)

## Requirements:
- MATLAB 9.0.0.341360 (R2016a) [_CoMIK_ was developed with this version of MATLAB; compatibility with earlier versions has not yet been checked.]
For simulated dataset 1 provided in the folder `sample_data`
```
comik_wrapper('sample_data/simulated_dataset1/pos.fasta', 'sample_data/simulated_dataset1/neg.fa', 600, 600, [501:600 1101:1200], 'comik_run_simulated_dataset1', [2], 10, 10, [2 5 7], 10.^[1:1:2], 10.^[-3:1:3], 2.0, 10, 5, 'No', 'Yes', 2, 'runLog.txt');
```

_CoMIK_ accepts two FASTA files as input: the first contains the sequences of the positive class and the second contains the sequences of the negative class. The remaining parameters are explained below.

Further details:

The _CoMIK_ wrapper function, `comik_wrapper`, takes the following input arguments:

- positive FASTA filename [type: str]
- negative FASTA filename [type: str]
segment-size: 100 or 200
**Note 3**: Presently, _CoMIK_ uses a MATLAB `parfor` loop to execute the outer cross-validation folds in parallel.
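The parallel outer cross-validation described in Note 3 can be sketched roughly as follows. This is not _CoMIK_'s own code. It uses `cvpartition` (Statistics and Machine Learning Toolbox) as a stand-in for however _CoMIK_ actually assigns sequences to folds. The labels and the number of outer folds are hypothetical. Without Parallel Computing Toolbox, `parfor` runs the iterations serially.

```matlab
% Hypothetical sketch (not CoMIK's code): run outer CV folds in parallel.
% Assumes Statistics and Machine Learning Toolbox for cvpartition.
labels = [ones(50, 1); -ones(50, 1)];   % hypothetical: 50 positive, 50 negative sequences
numOuterFolds = 5;                       % hypothetical number of outer folds

% Stratified K-fold split: passing the class labels keeps class ratios per fold
cvp = cvpartition(labels, 'KFold', numOuterFolds);

numTrain = zeros(numOuterFolds, 1);      % sliced outputs, one entry per fold
numTest  = zeros(numOuterFolds, 1);
parfor k = 1:numOuterFolds
    trainIdx = training(cvp, k);         % logical mask of training sequences
    testIdx  = test(cvp, k);             % logical mask of held-out sequences
    % Placeholder for the real per-fold work (inner CV, training, evaluation);
    % here we only count the sequences on each side of the split.
    numTrain(k) = sum(trainIdx);
    numTest(k)  = sum(testIdx);
end
disp([numTrain numTest]);
```

Each outer fold is independent of the others, so the folds can be distributed across workers without any communication between iterations.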

During the run,
* _CoMIK_ omits from the run all sequences shorter than the specified segment-size. For each outer fold, it reports the number of omitted sequences and writes their FASTA IDs to a separate file named `ommittedFastaIds.txt`.

* the following files are written to disk for each outer fold at intermediate stages of the pipeline. Most of them are read back by the pipeline itself in later stages.
- Run summary file: the resultString is also written to a summary file identified by the segment-size and oligomer-length. It is typically named `runSummary_segment-sizeX_oligoLenY.txt`, where X and Y are the segment-size and oligomer-length set for the run.