# Add a new Pipeline to MACSEK
To add a new pipeline to MACSEK, you first have to write a Nextflow pipeline. The pipeline used in this tutorial is in the git repository under 'TOBIAS-nextflow/TOBIAS_MACSEK/MACSEK_tutorial/'; the Docker container that will be built later is published as 'pgoymann/macsek_tutorial'.
The input files will be placed in the same folder in which the pipeline is started, so the input channel of the Nextflow pipeline must look for its input files in the current directory. The results must also be saved in the current directory, in a folder called 'out/'. In the following example we add a fastqc/multiqc pipeline to MACSEK. Write a pipeline that meets the requirements described above. The fastqc/multiqc pipeline looks like this:
Channel.fromPath('*.fq').set{channel_fastqc_file} // input channel picks up the FASTQ files in the current directory
process fastqc{
publishDir 'out/', mode:'copy' // the directory for the results
maxForks 10 // limits the number of jobs started in parallel on your cluster
cpus 4 // if resources are restricted on your cluster, set the CPUs and memory per job as shown here
memory '4 GB'
input:
file(fastq) from channel_fastqc_file
output:
file('*')
file ('*.zip') into channel_multiqc
script:
"""
fastqc $fastq
"""
}
process multiqc{
publishDir 'out/', mode:'copy' // the directory for the results
maxForks 10 // limits the number of jobs started in parallel on your cluster
cpus 4 // if resources are restricted on your cluster, set the CPUs and memory per job as shown here
memory '4 GB'
input:
file(fastqc_result) from channel_multiqc.collect() // collect all fastqc archives so multiqc runs once over all of them
output:
file('*')
script:
"""
multiqc $fastqc_result
"""
}
As the first line shows, the input channel looks for the files in the current directory. If your cluster limits the number of jobs started in parallel or the resources per job, you have to specify those limits in the process directives of the pipeline. The number of jobs started in parallel can be limited with 'maxForks 10', and the resources per job with the 'cpus' and 'memory' directives. The results are stored in a folder called 'out/' in the directory where the pipeline is started. Save the pipeline in a file called 'pipeline.nf'.
You also have to create a config file for the new pipeline, as shown below. Leave the namespace variable empty; MACSEK will automatically detect your namespace and set it in the config file. You only have to change the process section, setting the container image for each pipeline process.
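The directory contract described above (inputs as '*.fq' in the start directory, results in 'out/') can be checked after a test run with a few lines of shell. This is only a sketch: the function name `check_run_dir` is hypothetical and not part of MACSEK.

```shell
# Hypothetical helper (not part of MACSEK): check the directory layout
# that the pipeline relies on.
check_run_dir() {
    local dir="${1:-.}"
    # The input channel globs '*.fq' in the directory where the pipeline starts.
    if ! ls "$dir"/*.fq >/dev/null 2>&1; then
        echo "no *.fq input files in $dir"
        return 1
    fi
    # publishDir 'out/' should contain the copied results after a run.
    if [ -z "$(ls -A "$dir/out" 2>/dev/null)" ]; then
        echo "out/ is missing or empty in $dir"
        return 2
    fi
    echo "ok"
}
```

Calling `check_run_dir .` in the start directory prints `ok` only when both conditions hold.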
k8s {
namespace = ''
serviceAccount = 'nextflowaccount'
storageClaimName = 'workspace'
storageMountPath = '/home/backend/workspace/'
}
process {
executor = "k8s"
withName:fastqc {
container = "quay.io/biocontainers/fastqc:0.11.8--2"
}
withName:multiqc {
container = "quay.io/biocontainers/multiqc:1.6--py36h24bf2e0_0"
}
}
Save both files: the config as 'nextflow.config' and the pipeline as 'pipeline.nf'.
To create the MACSEK container with your pipeline, create a folder with the name of your pipeline in the 'pipelines' folder of the MACSEK docker folder, as shown below. Then copy the pipeline and the config file into this directory. For the example pipeline, change into the tutorial folder, where you will find the pipeline and the config file.
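Since every process needs a container entry in the config, a rough consistency check can catch a forgotten 'withName' block before the image is built. The sketch below is a text-based heuristic, not a Nextflow parser, and the function name `missing_containers` is hypothetical.

```shell
# Hypothetical helper (not part of MACSEK): print every process declared in
# the pipeline file that has no matching withName block in the config file.
missing_containers() {
    local pipeline="$1" config="$2" name
    # Extract names from lines such as "process fastqc{" or "process fastqc {".
    sed -n 's/^[[:space:]]*process[[:space:]]\{1,\}\([A-Za-z_][A-Za-z0-9_]*\)[[:space:]]*{.*$/\1/p' "$pipeline" |
    while read -r name; do
        # Accept withName:fastqc as well as quoted selectors like withName:'fastqc'.
        if ! grep -Eq "withName:[[:space:]]*['\"]?${name}['\"]?[[:space:]]*\{" "$config"; then
            echo "$name"
        fi
    done
}
```

Running `missing_containers pipeline.nf nextflow.config` prints nothing when every process has an entry.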
$ cd TOBIAS-nextflow/TOBIAS_MACSEK/MACSEK_tutorial/
$ cp -r fastqc/ ../docker_container/MACSEK/pipelines/
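Before building the image, it may be worth confirming that the copied folder really contains both files under the expected names. A minimal sketch follows; `check_pipeline_folder` is a hypothetical name, and the file names are the ones given above.

```shell
# Hypothetical helper (not part of MACSEK): verify that a pipeline folder
# holds both files named in this tutorial.
check_pipeline_folder() {
    local dir="$1" f
    for f in pipeline.nf nextflow.config; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            return 1
        fi
    done
    echo "ready: $dir"
}
```

For the tutorial pipeline this would be called as `check_pipeline_folder ../docker_container/MACSEK/pipelines/fastqc`.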
Then create the Docker container and push it to Docker Hub.
$ cd ../docker_container/MACSEK/
$ docker build . --tag <container name, e.g. 'macsek_tutorial'>
$ docker tag <container name, e.g. 'macsek_tutorial'> <your repository, e.g. 'pgoymann/macsek_tutorial:part1'>
$ docker push <your repository, e.g. 'pgoymann/macsek_tutorial:part1'>
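The three Docker commands can be wrapped in one small function so that the image name and repository are typed only once. This is a sketch under assumptions: the function name `publish_macsek_image` and the `DRY_RUN` variable are hypothetical; only the `docker build`, `docker tag` and `docker push` calls come from the steps above.

```shell
# Hypothetical wrapper (not part of MACSEK) around the build, tag and push steps.
# With DRY_RUN=1 the commands are only printed, not executed.
publish_macsek_image() {
    local local_name="$1" remote_name="$2"
    local run=""
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run="echo"
    fi
    # Stop at the first failing step so a broken build is never pushed.
    $run docker build . --tag "$local_name" &&
    $run docker tag "$local_name" "$remote_name" &&
    $run docker push "$remote_name"
}
```

Run from the 'docker_container/MACSEK/' folder, for example `DRY_RUN=1 publish_macsek_image macsek_tutorial pgoymann/macsek_tutorial:part1` to preview the commands first.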
Once you have pushed the Docker container, change back to the setup script to deploy MACSEK with the new container on Kubernetes.
$ cd ../../
$ python Setup_TOBIAS_MACSEK.py --namespace <your namespace> --use_MACSEK_container <location of the new container, e.g. 'pgoymann/macsek_tutorial:part1'>
The service is now running. Test it with the communicator script.
$ python Comunicator_for_MACKSEK_TOBIAS.py -pipeline fastqc -input MACSEK_tutorial/SP1.fq -url <URL from the setup>
You will find the output under 'output/'.
## Add additional variables to the pipeline