layout improved and compilation and technical questions added
klee committed Mar 22, 2016
1 parent b0bc2c1 commit fd610c1
Showing 1 changed file with 48 additions and 7 deletions.
55 changes: 48 additions & 7 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,14 +5,51 @@ Welcome to kleEpistasis. A tool for performing 3-way genetic interaction analysi

Contact: stefan.kleeberger@gmail.com (Stefan Kleeberger, Programmer), bmm@psych.mpg.de (Prof. Dr. Bertram Mueller-Myhsok, Supervisor) Max Planck Institute of Psychiatry, Munich, 2015

In order to perform brute-force statistical 3-way interaction tests on SNP data, you will have to provide your data in PLINK binary format and a Phenotype in PLINK alternate phenotype format. Please see "http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#bed" and "http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#pheno" for more information regarding file formats.
## Still in development!

This tool recently reached beta status. The core functionality is in place. If there is interest in using this tool, I will continue adding features to it. If you are interested in working with this tool and need assistance, please leave me a note (email above).

## Compilation & required technologies

Flags:
- You need a CUDA device with compute capability 2.0 or higher (if your device is not 3.5, change sm_35 in the Makefile to match, e.g. sm_20)
- The complete CUDA toolkit must be installed
- You need (at the moment) a large amount of memory (for 5k SNPs & 1k individuals, we needed ~20 GB)
- At least 8 cores
- Candidate SNPs or at least pre-selected SNPs. Genome-wide won't be feasible... yet.
- Make any needed changes in the _Makefile_
- Make any needed changes in the helperzz/build.sh file
- From the kleEpistasis directory, you can either run
`make`
or
`./helperzz/build.sh`
which is a small script I wrote to create the needed filesystem structure and compile using multiple threads simultaneously
- The resulting binary will be located in bin/kleEpistasis

## Changes you may need to make in the _Makefile_

REQUIRED
Please customize your Makefile as needed. You may want to change
(friendly reminder: in sh shells there must not be any whitespace around the `=` in VARIABLE=VALUE)


- the location of your cuda compiler nvcc (line 5)
- the compute capability, according to your device (e.g. sm_35 to sm_20, line 5)
- the location of your cuda library (line 6)
- the OPT variable, to add the debugging flags "-g -G" (line 7)
- and anything else you want to change, as long as you know what you are doing.
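
The compute-capability change can also be scripted. The following is only a sketch: `set_sm_arch` is a hypothetical helper, and it assumes the architecture appears in the Makefile as a literal `sm_NN` token (for example inside an `-arch=sm_35` option). Check your Makefile before relying on it.

```shell
# Hypothetical helper (not part of kleEpistasis): replace every sm_NN token
# in a Makefile with the architecture given as second argument.
# Assumption: the architecture is written literally as sm_<digits>.
set_sm_arch() {
    makefile="$1"
    arch="$2"
    # Write to a temporary file first so the original is only replaced
    # when sed succeeds.
    sed "s/sm_[0-9][0-9]*/${arch}/g" "$makefile" > "${makefile}.tmp" \
        && mv "${makefile}.tmp" "$makefile"
}
```

For example, `set_sm_arch Makefile sm_20` would switch a Makefile that targets sm_35 to sm_20.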

## What should my data look like?
In order to perform brute-force statistical 3-way interaction tests on SNP data, you will have to provide your data in PLINK binary format and a phenotype in PLINK alternate phenotype format. Please see

- "http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#bed"

and

- "http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#pheno"
for more information regarding file formats.
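
Since the tool expects a PLINK binary fileset, it can help to confirm that all three files exist before starting a long run. This is a minimal sketch; `check_plink_prefix` is a hypothetical helper and not part of kleEpistasis. It takes the file prefix without extension.

```shell
# Hypothetical helper (not part of kleEpistasis): verify that prefix.bed,
# prefix.bim and prefix.fam all exist. Returns 0 if the fileset is complete,
# 1 otherwise, and reports each missing file on stderr.
check_plink_prefix() {
    prefix="$1"
    missing=0
    for ext in bed bim fam; do
        if [ ! -f "${prefix}.${ext}" ]; then
            echo "missing: ${prefix}.${ext}" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A call such as `check_plink_prefix /home/testuser/testdata/plink` succeeds only if the complete fileset is present.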


## Flags:
### REQUIRED

'-path [path]' Absolute or relative path to plink files in binary format (.bed .bim .fam) !>> without <<! file extension

Expand All @@ -29,7 +66,7 @@ REQUIRED
'-alphaPercent ]0.0;50.0[' \t Significance level for two-sided test. Allowed values: ]0.0;50.0[


OPTIONAL
### OPTIONAL

'-pheno [0,1,...]' \t If your Phenotype file contains multiple phenotypes, you can use this flag to specify which phenotype should be used.
If you skip the '-pheno' flag, the first phenotype will be used (equivalent to '-pheno 0').
Expand All @@ -53,21 +90,25 @@ We used an Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz combined with a NVIDIA Tesl
This program creates n+2 threads in total, where n is the number passed via the '-threads' flag.


Memory:
## General

### Memory:
A run on 5k SNPs and 1k individuals needs at least 20 GB of RAM.
Execution time:
We were able to perform a run on 5k SNPs and 1k individuals, on an Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz combined with an NVIDIA Tesla K40, in approx. 2 hours.
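
To get a feeling for the size of a brute-force 3-way analysis, the number of distinct SNP triples for n SNPs is n(n-1)(n-2)/6. The sketch below only counts triples. It says nothing about how kleEpistasis schedules the work, and `count_triples` is a hypothetical helper.

```shell
# Hypothetical helper (not part of kleEpistasis): print the number of
# distinct SNP triples, n choose 3, for n SNPs.
# Dividing in two steps keeps intermediate values smaller; the result is
# still limited by the shell's integer width (usually 64 bit).
count_triples() {
    n="$1"
    pairs=$(( n * (n - 1) / 2 ))      # n choose 2, always an integer
    echo $(( pairs * (n - 2) / 3 ))   # n choose 3
}

count_triples 5000   # prints 20820835000
```

The number of SNPs in a PLINK fileset equals the number of lines in its .bim file, so that line count can serve as n.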


Result file:
### Result file:
The file containing the results has 4 columns:
Position_SNP1,Position_SNP2,Position_SNP3,Calculated_Value
and as many rows as significant results have been found.
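
A result file in this layout can be ordered by its fourth column with standard tools. The sketch below assumes that the file is comma-separated, that it has no header line, and that GNU `sort` is available (for the `-g` option); none of this is confirmed by the README. Whether large or small values are the interesting ones depends on the statistic, so reverse the order if needed. `sort_results` is a hypothetical helper.

```shell
# Hypothetical helper (not part of kleEpistasis): print the rows of a result
# file ordered by the fourth column, largest value first.
# Assumptions: comma-separated columns, no header line, GNU sort.
sort_results() {
    # LC_ALL=C forces "." as decimal separator; -g also understands
    # scientific notation such as 1e-5.
    LC_ALL=C sort -t, -k4,4gr "$1"
}
```

For example, `sort_results /home/testuser/results/testDataRes.csv | head -n 20` would show the 20 rows with the largest values.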


Example calls:
### Example calls:

For performing a runtime test:
./bin/kleEpistasis -path /home/testuser/testdata/plink -pathPheno /home/testuser/testdata/pheno.txt -outPath /home/testuser/results/testDataRes.csv -device 0 -blockSize 4 -threads 8 -alphaPercent 5 -testBlockSize 1

For performing a complete run:
./bin/kleEpistasis -path /home/testuser/testdata/plink -pathPheno /home/testuser/testdata/pheno.txt -outPath /home/testuser/results/testDataRes.csv -device 0 -blockSize 4 -threads 8 -alphaPercent 5
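
The two calls above differ only in the '-testBlockSize' flag, so a small wrapper can build both. This is a sketch under assumptions: `klee_command` is a hypothetical helper, it only prints the command line for inspection, and it reuses the device, block size, thread count and alpha values from the examples. Paths containing whitespace are not handled.

```shell
# Hypothetical helper (not part of kleEpistasis): print the command line for
# a run so it can be checked before it is executed.
# Arguments: PLINK prefix, phenotype file, output file, and optionally a
# test block size.
klee_command() {
    prefix="$1"
    pheno="$2"
    out="$3"
    test_block="${4:-}"
    cmd="./bin/kleEpistasis -path $prefix -pathPheno $pheno -outPath $out"
    cmd="$cmd -device 0 -blockSize 4 -threads 8 -alphaPercent 5"
    # With a fourth argument the call mirrors the runtime-test example,
    # without it the complete-run example.
    if [ -n "$test_block" ]; then
        cmd="$cmd -testBlockSize $test_block"
    fi
    echo "$cmd"
}
```

Calling `klee_command /home/testuser/testdata/plink /home/testuser/testdata/pheno.txt /home/testuser/results/testDataRes.csv 1` prints the runtime-test call shown above.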
