From d24d9c9b176f34b16446e6476c3e10e07972d064 Mon Sep 17 00:00:00 2001
From: Sarvesh Prakash Nikumbh
Date: Wed, 19 Jul 2017 18:07:20 +0200
Subject: [PATCH] Minor changes in README

---
 README.md | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index aaa60b7..ff26c58 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,9 @@
 # CoMIK : Conformal Multi-Instance Kernels
 
 Manuscript authors: Sarvesh Nikumbh, Peter Ebert, Nico Pfeifer
+
 To appear in Proceedings of WABI 2017.
-Preprint at bioRxiv:
+DOI at bioRxiv: https://doi.org/10.1101/139618
 
 ## Requirements:
 - MATLAB 9.0.0.341360 (R2016a) [We developed _comik_ using this version of MATLAB. Compatibility with earlier versions is to be checked.]
@@ -28,11 +29,11 @@ For simulated dataset 1 provided in the folder `sample_data`
 comik_wrapper('sample_data/simulated_dataset1/pos.fasta', 'sample_data/simulated_dataset1/neg.fa', 600, 600, [501:600 1101:1200], 'comik_run_simulated_dataset1', [2], 10, 10, [2 5 7], 10.^[1:1:2], 10.^[-3:1:3], 2.0, 10, 5, 'No', 'Yes', 2, 'runLog.txt');
 ```
 
-_comik_ accepts two FASTA files as input -- the first FASTA file containing sequences in the positive class followed by a second FASTA file containing the sequences in the negative class. Other params are explained below.
+_CoMIK_ accepts two FASTA files as input -- the first FASTA file containing sequences in the positive class followed by a second FASTA file containing the sequences in the negative class. Other params are explained below.
 
 Further details:
 
-The comik wrapper function takes the following arguments as input
+The _CoMIK_ wrapper function takes the following arguments as input
 
 - positive FASTA filename [type: str]
 - negative FASTA filename [type: str]
@@ -74,7 +75,7 @@ segment-size: 100 or 200
 **Note 3**: Presently, _comik_ uses MATLAB parfor-loop to execute the outer cross-validation folds in parallel.
 During the run,
-* _comik_ omits the sequences whose lengths are shorter than the segment-size specified from the run. It reports the number of sequences that got ommitted, their FASTA-Ids in a separate file named `ommittedFastaIds.txt` per outer fold separately.
+* _CoMIK_ omits the sequences whose lengths are shorter than the segment-size specified from the run. It reports the number of sequences that got ommitted, their FASTA-Ids in a separate file named `ommittedFastaIds.txt` per outer fold separately.
 * the following files are written to the disk per outer fold at any intermediate stage of the pipeline. Most of these are used by the pipeline itself in its subsequent stages.
   - Run summary file: The resultString is also written to the summary file which is characterized by the segment-size and oligomer-length. The summary file is typically named: 'runSummary_segment-sizeX_oligoLenY.txt' where X and Y are as set for the pipeline run.