# Configuring LSTrAP

## config.ini

Before running LSTrAP, make sure **all paths in config.ini match your system's setup**; e.g. the paths to the Trimmomatic
jar and adapter files will need to be adapted.
### qsub parameters

Additional parameters can be added to the qsub commands in the qsub
parameters section of config.ini. This allows users to submit jobs to
specific queues, with specific options, etc. Furthermore, while the
template is designed for Oracle/Sun Grid Engine, it can be set up to work
with other job management systems such as PBS and Torque. The example
config.ini below includes commented-out settings that show how to
set the number of nodes/cores on PBS/Torque and how to add a walltime (if
required).

**Match the number of requested cores** to the number of cores the job needs. When
starting TopHat with **-p 3**, the job will require 4 cores (3 worker
threads plus a background thread are active when a job is started this
way).
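
As a rough illustration of the rule above, the following Python sketch derives the number of cores to request from the `-p` value in a TopHat command. The helper name `cores_for_tophat` is hypothetical and not part of LSTrAP; the one extra background thread is taken from the TopHat example above and may not hold for other tools.

```python
import re


def cores_for_tophat(command, background_threads=1):
    """Return the number of cores to request for a TopHat command.

    Hypothetical helper: reads the worker thread count from '-p N' and adds
    the background thread described above. If '-p' is absent, one worker
    thread is assumed.
    """
    match = re.search(r"(?:^|\s)-p\s+(\d+)(?=\s|$)", command)
    workers = int(match.group(1)) if match else 1
    return workers + background_threads


tophat_se_cmd = "tophat -p 3 -o ${out} ${genome} ${fq}"
cores = cores_for_tophat(tophat_se_cmd)

# Print a matching Grid Engine setting in the style of the example config.ini.
print("qsub_tophat='-pe cores %d'" % cores)
```

With `-p 3` this yields 4 cores, which matches the `qsub_tophat='-pe cores 4'` setting in the example config.ini.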
### Environment modules

If environment modules are not used, all software needs to be installed on the cluster and its compute nodes. You also need
to set all modules in config.ini to **None**!

If environment modules are used, module names need to be specified in your config file. To see which modules are available on your system, type:

    module avail

Add the module name for each of the tools to your config.ini.
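
A minimal sketch for reviewing the module settings before a run is shown here. It assumes the module entries live in the `[TOOLS]` section and end in `_module`, as in the example config.ini; the function names are hypothetical, and how LSTrAP itself treats a blank entry is not covered by this document, so blank entries are only reported for review.

```python
import configparser

# Shortened stand-in for config.ini (assumption: module keys end in '_module').
EXAMPLE_CONFIG = """
[TOOLS]
bowtie_module=biotools/bowtie2-2.2.6
hisat2_module=
python3_module=None
"""


def module_settings(config_text, section="TOOLS"):
    """Return {key: value} for every '*_module' entry in the given section."""
    # interpolation=None keeps '${var}' placeholders in commands untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return {key: value.strip()
            for key, value in parser[section].items()
            if key.endswith("_module")}


def modules_to_load(config_text):
    """Module names that are actually set (neither blank nor 'None')."""
    settings = module_settings(config_text)
    return sorted(v for v in settings.values() if v and v.lower() != "none")


def blank_modules(config_text):
    """Keys left blank; these are worth reviewing before starting a run."""
    return sorted(k for k, v in module_settings(config_text).items() if not v)


print("set:", modules_to_load(EXAMPLE_CONFIG))
print("blank:", blank_modules(EXAMPLE_CONFIG))
```

Each name reported as set should also appear in the output of `module avail` on your system.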
### Tweaking parameters of individual tools (expert feature)

If you would like to tweak the parameters passed to tools, config.ini is the place to do so. Note, however, that each tool
will run with the same settings for every file. Modifying parameters that would **change the output name or format will
cause the pipeline to break**. Arguments with a name like *${var}* should **not** be changed, as this is how the pipeline
defines the input and output for each tool.

Example config.ini:
```ini
[TOOLS]
; Tool Configuration
;
; Some tools require additional files or might require a hard coded path to the script.
; Please make sure these are set up correctly.

; Trimmomatic Path
; ADJUST THIS
trimmomatic_path=/home/sepro/tools/Trimmomatic-0.36/trimmomatic-0.36.jar

; COMMANDS to run tools
;
; Here the commands used to start different steps are defined, ${name} are variables that will be set by LSTrAP for
; each job.
; Note that in some cases hard coded paths were required, adjust these to match the location of these files on
; your system
bowtie_cmd=bowtie2-build ${in} ${out}
hisat2_build_cmd=hisat2-build ${in} ${out}
; ADJUST PATHS TO ADAPTERS
trimmomatic_se_command=java -jar ${jar} SE -threads 1 ${in} ${out} ILLUMINACLIP:/home/sepro/tools/Trimmomatic-0.36/adapters/TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
trimmomatic_pe_command=java -jar ${jar} PE -threads 1 ${ina} ${inb} ${outap} ${outau} ${outbp} ${outbu} ILLUMINACLIP:/home/sepro/tools/Trimmomatic-0.36/adapters/TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
tophat_se_cmd=tophat -p 3 -o ${out} ${genome} ${fq}
tophat_pe_cmd=tophat -p 3 -o ${out} ${genome} ${forward},${reverse}
hisat2_se_cmd=hisat2 -p 3 -x ${genome} -U ${fq} -S ${out} 2> ${stats}
hisat2_pe_cmd=hisat2 -p 3 -x ${genome} -1 ${forward} -2 ${reverse} -S ${out} 2> ${stats}
htseq_count_cmd=htseq-count -s no -f ${itype} -t ${feature} -i ${field} ${bam} ${gff} > ${out}
interproscan_cmd=interproscan.sh -i ${in_dir}/${in_prefix}${SGE_TASK_ID} -o ${out_dir}/${out_prefix}${SGE_TASK_ID} -f tsv -dp -iprlookup -goterms --tempdir /tmp
pcc_cmd=python3 ./scripts/pcc.py ${in} ${out} ${mcl_out}
mcl_cmd=mcl ${in} --abc -o ${out} -te 4
; ADJUST THIS
mcxdeblast_cmd=perl /apps/biotools/mcl-14.137/bin/mcxdeblast --m9 --line-mode=abc ${blast_in} > ${abc_out}
; ADJUST THIS
orthofinder_cmd=python /home/sepro/OrthoFinder-0.4/orthofinder.py -f ${fasta_dir} -t 8

; qsub parameters (OGE)
qsub_indexing=''
qsub_trimmomatic=''
qsub_tophat='-pe cores 4'
qsub_htseq_count=''
qsub_interproscan='-pe cores 5'
qsub_pcc=''
qsub_mcl='-pe cores 4'
qsub_orthofinder='-pe cores 8'
qsub_mcxdeblast=''
; qsub parameters (PBS/Torque)
; qsub_indexing=''
; qsub_trimmomatic=''
; qsub_tophat='-l nodes=1:ppn=4'
; qsub_htseq_count=''
; qsub_interproscan='-l nodes=1:ppn=5'
; qsub_pcc=''
; qsub_mcl='-l nodes=1:ppn=4'
; qsub_orthofinder='-l nodes=1:ppn=8'
; qsub_mcxdeblast=''

; qsub parameters (PBS/Torque with walltimes)
; qsub_indexing='-l walltime=00:10:00'
; qsub_trimmomatic='-l walltime=00:10:00'
; qsub_tophat='-l nodes=1:ppn=4 -l walltime=00:10:00'
; qsub_htseq_count='-l walltime=00:02:00'
; qsub_interproscan='-l nodes=1:ppn=5 -l walltime=00:10:00'
; qsub_pcc='-l walltime=00:10:00'
; qsub_mcl='-l nodes=1:ppn=4 -l walltime=00:10:00'
; qsub_orthofinder='-l nodes=1:ppn=8 -l walltime=01:00:00'
; qsub_mcxdeblast='-l walltime=00:10:00'
; Module names
; These need to be configured if the required tools are installed as environment modules.
; You can find the modules installed on your system using
;
; module avail
;
; If there is no module system on your cluster, set each module name to None
bowtie_module=biotools/bowtie2-2.2.6
samtools_module=biotools/samtools-1.3
sratoolkit_module=biotools/sratoolkit-2.5.7
tophat_module=biotools/tophat-2.1.0
hisat2_module=
interproscan_module=biotools/interproscan-5.16-55.0
blast_module=biotools/ncbi-blast-2.3.0+
mcl_module=biotools/mcl-14.137
python_module=devel/Python-2.7.10
python3_module=devel/Python-3.5.1
```
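
Since the `${var}` arguments must survive any tweak, a quick check like the sketch below can compare a modified command against the original before it goes into config.ini. The helper names are hypothetical and not part of LSTrAP; the check only looks at placeholder names and cannot tell whether a changed option alters the output format.

```python
import re

# Matches LSTrAP-style placeholders such as ${in}, ${out} or ${SGE_TASK_ID}.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def placeholders(command):
    """Return the set of placeholder names used in a command template."""
    return set(PLACEHOLDER.findall(command))


def check_tweak(original, tweaked):
    """Return (missing, unexpected) placeholder names after editing a command."""
    before, after = placeholders(original), placeholders(tweaked)
    return sorted(before - after), sorted(after - before)


original = "htseq-count -s no -f ${itype} -t ${feature} -i ${field} ${bam} ${gff} > ${out}"
# Only an option value changed: all placeholders are still present.
safe = "htseq-count -s reverse -f ${itype} -t ${feature} -i ${field} ${bam} ${gff} > ${out}"
# The output placeholder was replaced by a fixed file name: this would break the pipeline.
broken = "htseq-count -s no -f ${itype} -t ${feature} -i ${field} ${bam} ${gff} > counts.txt"

print(check_tweak(original, safe))
print(check_tweak(original, broken))
```

An empty pair of lists means the tweak kept every placeholder; anything listed as missing points to an input or output the pipeline can no longer set.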
## data.ini

The location of your data needs to be defined in your data.ini file.

Example data.ini file:
```ini
[GLOBAL]
; add all genomes, use semicolons to separate multiple genomes, e.g. zma;ath
genomes=zma
; enter email to receive status updates from the cluster
; setting the email to None will disable this
email=None
; orthofinder settings (runs on all species)
orthofinder_output=./output/orthofinder

[zma]
cds_fasta=
protein_fasta=
genome_fasta=
gff_file=
gff_feature=CDS
gff_id=Parent
fastq_dir=./data/zma/fastq
tophat_cutoff=65
htseq_cutoff=40
indexing_output=./output/bowtie-build/zma
trimmomatic_output=./output/trimmed_fastq/zma
alignment_output=./tmp/tophat/zma
htseq_output=./output/htseq/zma
exp_matrix_output=./output/zma/exp_matrix.txt
exp_matrix_tpm_output=./output/zma/exp_matrix.tpm.txt
exp_matrix_rpkm_output=./output/zma/exp_matrix.rpkm.txt
interpro_output=./output/interpro/zma
pcc_output=./output/zma/pcc.std.txt
pcc_mcl_output=./output/zma/pcc.mcl.txt
mcl_cluster_output=./output/zma/mcl.clusters.txt
```
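
The example above suggests that every genome listed under `genomes` has a section of the same name. Assuming that is how the file is meant to be laid out, the sketch below lists genomes that lack a section; the function name is hypothetical and this check is not part of LSTrAP.

```python
import configparser

# Shortened stand-in for data.ini with one genome ('ath') lacking a section.
DATA_INI = """
[GLOBAL]
genomes=zma;ath
email=None

[zma]
fastq_dir=./data/zma/fastq
"""


def genomes_without_section(data_text):
    """Return genomes named in [GLOBAL] that have no section of their own."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(data_text)
    # Genomes are separated by semicolons, e.g. zma;ath
    genomes = [g.strip() for g in parser["GLOBAL"]["genomes"].split(";") if g.strip()]
    return [g for g in genomes if not parser.has_section(g)]


print(genomes_without_section(DATA_INI))
```

For the shortened input used here, `ath` is reported because only `[zma]` is defined.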