Further formatting and typos
puetz committed Mar 23, 2016
1 parent fd610c1 commit de2f7fa
Showing 1 changed file with 65 additions and 48 deletions.
113 changes: 65 additions & 48 deletions README.md
@@ -3,112 +3,129 @@ Brute-force 3-way Interaction calculation for genetic studies

Welcome to kleEpistasis. A tool for performing 3-way genetic interaction analysis.

Contact: [Stefan Kleeberger](mailto:stefan.kleeberger@gmail.com) (Programmer), [Prof. Dr. Bertram Müller-Myhsok](mailto:bmm@psych.mpg.de) (Supervisor), Max Planck Institute of Psychiatry, Munich, 2016

## Still in development!

This tool recently reached beta status, so the core functionality is available. If there is interest in using this tool, I will continue adding features to it. If you would like to work with it and need assistance, please leave [me a note](mailto:stefan.kleeberger@gmail.com).

## Compilation & needed technologies

- A CUDA device with compute capability 2.0 or higher (if yours is not 3.5, change `sm_35` in the Makefile accordingly, e.g. to `sm_20`)
- The complete CUDA toolkit installed
- A large amount of memory, at least at the moment (for 5k SNPs & 1k individuals, we needed ~20 GB)
- At least 8 cores
- Candidate SNPs or at least pre-selected SNPs. Genome-wide won't be feasible ... yet.
- Adjust the _Makefile_ to your system
- Adjust the _helperzz/build.sh_ file to your system
- From within the kleEpistasis directory, you can either run
`make`
or
`./helperzz/build.sh`
which is a small script that creates the needed filesystem structure and compiles using multiple threads simultaneously
- The resulting binary will be located in *bin/kleEpistasis*.

## Changes you may need to make in the _Makefile_

Please customize your Makefile as needed (friendly reminder: in sh shells there must not be any whitespace around the `=` in VARIABLE=VALUE). You may want to change

- the location of your CUDA compiler nvcc (line 5)
- the compute capability according to your device (e.g. sm_35 to sm_20, line 5)
- the location of your CUDA library (line 6)
- the debugging flags: add "-g -G" to OPT (line 7)
- anything else you want to change, as long as you know what you are doing.

## What should my data look like?
To perform brute-force statistical 3-way interaction tests on SNP data with kleEpistasis, you have to provide your genotype data in PLINK binary format and a phenotype in PLINK alternate phenotype format. Please see

- [PLINK bed file documentation](http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#bed)

and

- [PLINK phenotype file documentation](http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#pheno)

for more information regarding file formats.
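
Since `-path` expects the fileset prefix without an extension, a quick pre-flight check can catch a wrong prefix before a long run. The following is a minimal sketch; `check_plink_prefix` is a hypothetical helper and not part of kleEpistasis.

```shell
# Hypothetical helper (not shipped with kleEpistasis): verify that a PLINK
# binary fileset exists for a given prefix, i.e. the value passed to -path.
check_plink_prefix() {
    prefix="$1"
    for ext in bed bim fam; do
        if [ ! -f "${prefix}.${ext}" ]; then
            # Report the first missing file and signal failure.
            echo "missing: ${prefix}.${ext}" >&2
            return 1
        fi
    done
    echo "ok: ${prefix}.bed ${prefix}.bim ${prefix}.fam"
}
```

For the example data used further down, this would be called as `check_plink_prefix /home/testuser/testdata/plink`.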


## Flags:
### REQUIRED

`-path [path]` Absolute or relative path to the PLINK files in binary format (.bed .bim .fam) *without* file extension

`-pathPheno [path]` Absolute or relative path to the PLINK alternate phenotype file *with* file extension

`-outPath [path]` Absolute or relative path to the file where results will be written *with* file extension (.csv)

`-device [0,1,...]` GPU identifier, starting from 0 for the first graphics card and increasing in whole numbers

`-blockSize [2,3,...]` Parameter for optimizing runtime. See the dedicated paragraph below

`-threads [1,2,...]` Number of threads to process results. See the dedicated paragraph below

`-alphaPercent ]0.0;50.0[` Significance level in percent for the two-sided test. Allowed values: the open interval ]0.0;50.0[, i.e. greater than 0.0 and less than 50.0


### OPTIONAL

`-pheno [0,1,...]` If your phenotype file contains multiple phenotypes, you can use this flag to specify which phenotype should be used.
If you skip the `-pheno` flag, the first phenotype will be used (equivalent to `-pheno 0`).
The second phenotype has index 1, the third index 2, and so forth.

`-testBlockSize 1` This will start a test run in which only one sub-matrix is calculated.
Use this flag to reduce runtime while testing for the best possible value for `-blockSize`.
This will not create ANY results!

`-blockSize`:
If you are familiar with CUDA, you know what this is about. Otherwise you needn't get too deep into it.
All you need to know is that this parameter has to be found by trial and error.
Use `-testBlockSize 1` to reduce runtime dramatically and try different values for `-blockSize`.
We achieved the best results with `-blockSize 4` on an NVIDIA Tesla K40.
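
The trial-and-error search can be scripted. The sketch below times one test run per candidate value; it uses only the flags documented here, while `time_block_sizes`, the `KLE_BIN` variable, and the data paths are placeholders to adapt to your setup.

```shell
# Sketch: time one test run per candidate -blockSize value.
# KLE_BIN and the data paths are placeholders, not fixed by kleEpistasis.
KLE_BIN="${KLE_BIN:-./bin/kleEpistasis}"

time_block_sizes() {
    for bs in "$@"; do
        start=$(date +%s)
        status=0
        # -testBlockSize 1 computes a single sub-matrix and writes no results.
        "$KLE_BIN" \
            -path /home/testuser/testdata/plink \
            -pathPheno /home/testuser/testdata/pheno.txt \
            -outPath /home/testuser/results/testDataRes.csv \
            -device 0 \
            -blockSize "$bs" -threads 8 \
            -alphaPercent 5 -testBlockSize 1 >/dev/null 2>&1 || status=$?
        end=$(date +%s)
        # Wall-clock time has one-second resolution.
        echo "blockSize=$bs seconds=$((end - start)) exit=$status"
    done
}
```

Calling `time_block_sizes 2 3 4 8` prints one line per value; the value with the lowest number of seconds and a zero exit status is a reasonable candidate for the complete run.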

`-threads`:
Results will be processed in the background by the CPU while the GPU creates new results.
Depending on your CPU and graphics card, the GPU may have to wait for the CPU to finish before copying the next results.
To avoid this, the task is split into chunks that are processed by one thread each.
We used an Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz combined with an NVIDIA Tesla K40 and 8 threads (`-threads 8`) without encountering wait time.
Altogether, this program will create n+2 threads, where n is the number passed via the `-threads` flag.


## General

### Memory:
For a run on 5k SNPs and 1k individuals, you need at least 20 GB of RAM.

### Execution time:
We were able to perform a run on 5k SNPs and 1k individuals on an Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz combined with an NVIDIA Tesla K40 in approx. 2 hours.
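
To get a feeling for why a brute-force run is this demanding, it helps to count the SNP triples involved. The sketch below assumes that every unordered triple of SNPs is tested exactly once, which this README does not state explicitly; `count_triples` is a hypothetical helper.

```shell
# Number of unordered SNP triples: n choose 3 = n*(n-1)*(n-2)/6.
# Assumes each unordered triple is tested once (an assumption, see above).
count_triples() {
    n="$1"
    echo $(( n * (n - 1) * (n - 2) / 6 ))
}

count_triples 5000   # prints 20820835000
```

Under that assumption, 5k SNPs already give roughly 2 x 10^10 triples, and the count grows with the cube of the number of SNPs, which is why pre-selected SNPs are recommended.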


### Result file:
The file containing the results has 4 columns:
Position_SNP1, Position_SNP2, Position_SNP3, Calculated_Value
and as many rows as significant results have been found.
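
Because each row is a comma-separated triple plus a value, standard tools are enough for a first look at the results. The following sketch lists all rows that involve a given SNP position in any of the first three columns; `rows_with_snp` is a hypothetical helper, and it makes no assumption about what the fourth column contains.

```shell
# Hypothetical helper: print every result row in which the given SNP
# position appears in column 1, 2, or 3 of the comma-separated file.
rows_with_snp() {
    file="$1"
    pos="$2"
    awk -F',' -v p="$pos" '$1 == p || $2 == p || $3 == p' "$file"
}
```

For example, `rows_with_snp /home/testuser/results/testDataRes.csv 42` prints all significant triples that contain the SNP at position 42.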


### Example calls:

For performing a runtime test:
```
./bin/kleEpistasis \
-path /home/testuser/testdata/plink \
-pathPheno /home/testuser/testdata/pheno.txt \
-outPath /home/testuser/results/testDataRes.csv \
-device 0 \
-blockSize 4 -threads 8 \
-alphaPercent 5 -testBlockSize 1
```

For performing a complete run:

```
./bin/kleEpistasis \
-path /home/testuser/testdata/plink \
-pathPheno /home/testuser/testdata/pheno.txt \
-outPath /home/testuser/results/testDataRes.csv \
-device 0 \
-blockSize 4 -threads 8 \
-alphaPercent 5
```
