
Add a new Pipeline to MACSEK

goymann edited this page Jan 28, 2020 · 21 revisions

To add a new pipeline to MACSEK, you first have to write a Nextflow pipeline. The pipeline used in this tutorial is in the git repository under 'TOBIAS-nextflow/TOBIAS_MACSEK/MACSEK_tutorial/fastqc'; the Docker container that is built later on is published as 'pgoymann/macsek_tutorial'.

Create the new Pipeline

The input files for the pipeline are placed in the same folder in which the pipeline is started, so the input channel of the Nextflow pipeline must look for its input files in the current directory. The results must also be saved in the current directory, in a folder called 'out/'. The following example adds a fastqc/multiqc pipeline to MACSEK. A fastqc/multiqc pipeline that meets these requirements looks like this.

Channel.fromPath('*.fq').set{channel_fastqc_file} // the input channel collects the FASTQ files in the current directory

process fastqc{

  publishDir 'out/', mode:'copy' //the directory for the results
  maxForks 10 // limits the number of jobs started in parallel on your cluster
  cpus 4 // if resources are restricted on your cluster, set the CPUs and memory per job as shown here
  memory '4 GB'
  input:
    file(fastq) from channel_fastqc_file
  output:
    file('*')
    file ('*.zip') into channel_multiqc
  script:
    """
      fastqc $fastq
    """
}

process multiqc{
  publishDir 'out/', mode:'copy' //the directory for the results
  maxForks 10 // limits the number of jobs started in parallel on your cluster
  cpus 4 // if resources are restricted on your cluster, set the CPUs and memory per job as shown here
  memory '4 GB'
  input:
    file(fastqc_result) from  channel_multiqc
  output:
    file('*')

  script:
    """
      multiqc $fastqc_result
    """
}

As the first line shows, the input channel looks for the files in the current directory. If your cluster restricts the number of jobs started in parallel or the resources per job, you have to specify these limits in the pipeline. The number of jobs started in parallel can be limited with 'maxForks 10', and the resources per job with the 'cpus' and 'memory' directives. The results are stored in a folder called 'out/', in the same directory where the pipeline is started. Save the pipeline in a file called 'pipeline.nf'.

Create Config file for the new pipeline

You also have to create a config file for the new pipeline, as shown below. Leave the namespace variable empty; MACSEK automatically detects your namespace and sets it in the config file. You only have to change the process part, setting the container repository for each pipeline process.

k8s {
   namespace = ''
   serviceAccount = 'nextflowaccount'
   storageClaimName = 'workspace'
   storageMountPath = '/home/backend/workspace/'
}
process {
    executor = "k8s"
    withName:fastqc {
        container = "quay.io/biocontainers/fastqc:0.11.8--2"
    }
    withName:multiqc {
        container = "quay.io/biocontainers/multiqc:1.6--py36h24bf2e0_0"
    }
}

Save both files: the config as 'nextflow.config' and the pipeline as 'pipeline.nf'.
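Before building the container, it can help to confirm that a pipeline folder follows the conventions described so far. The following is a minimal sketch, not part of MACSEK: the helper `check_pipeline_folder` is hypothetical, and it only assumes the two file names and the 'out/' publish directory named in the text.

```python
from pathlib import Path

# File names taken from the tutorial text; the helper itself is hypothetical.
REQUIRED_FILES = ("pipeline.nf", "nextflow.config")


def check_pipeline_folder(folder):
    """Return a list of human-readable problems found in a pipeline folder."""
    folder = Path(folder)
    problems = []
    for name in REQUIRED_FILES:
        if not (folder / name).is_file():
            problems.append(f"missing file: {name}")
    pipeline = folder / "pipeline.nf"
    if pipeline.is_file():
        text = pipeline.read_text()
        # The tutorial requires results to be published to 'out/'.
        if "publishDir 'out/'" not in text and 'publishDir "out/"' not in text:
            problems.append("pipeline.nf does not publish results to 'out/'")
    return problems
```

An empty list means the folder passes these basic checks; it does not prove that the pipeline itself runs.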

Create the new docker Container

To create the MACSEK container with your pipeline, create a folder with the name of your pipeline. Then copy the pipeline and the config file into this folder and save the folder in the 'docker_container/MACSEK/pipelines/' directory. For the example pipeline, change into the tutorial folder, where you will find the pipeline and the config file.

$ cd TOBIAS-nextflow/TOBIAS_MACSEK/MACSEK_tutorial/
$ cp -r fastqc/ ../docker_container/MACSEK/pipelines/

Then build the Docker container and push it to Docker Hub.

$ cd ../docker_container/MACSEK/
$ docker build . --tag <local container name, e.g. 'macsek_tutorial'>
$ docker tag <local container name, e.g. 'macsek_tutorial'> <your repository, e.g. 'pgoymann/macsek_tutorial:part4'>
$ docker push <your repository, e.g. 'pgoymann/macsek_tutorial:part4'>
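If you rebuild the container often, the three docker steps can be wrapped in a small script. This is a sketch under the assumption that it is started from 'docker_container/MACSEK/'; the function names `docker_commands` and `build_and_push` are hypothetical, and only the `docker build`, `docker tag`, and `docker push` invocations come from the steps above.

```python
import subprocess


def docker_commands(local_name, repository):
    """Build the three docker invocations from the steps above as argument lists."""
    return [
        ["docker", "build", ".", "--tag", local_name],
        ["docker", "tag", local_name, repository],
        ["docker", "push", repository],
    ]


def build_and_push(local_name, repository, dry_run=True):
    """Print each command and, unless dry_run is set, run them in order."""
    for command in docker_commands(local_name, repository):
        print(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError at the first failing step.
            subprocess.run(command, check=True)
```

Calling `build_and_push('macsek_tutorial', 'pgoymann/macsek_tutorial:part4')` only prints the commands; pass `dry_run=False` to execute them.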

Once you have pushed the Docker container, change back to the setup script to deploy MACSEK with the new container on Kubernetes. The example container is published on Docker Hub as 'pgoymann/macsek_tutorial:part4'.

$ cd ../../
$ python Setup_TOBIAS_MACSEK.py --namespace <your namespace> --use_MACSEK_container <location of the new container, e.g. 'pgoymann/macsek_tutorial:part4'>

The service is now running. Test it with the communicator script.

$ python Comunicator_for_MACKSEK_TOBIAS.py -pipeline fastqc -input MACSEK_tutorial/SP1.fq -url <URL obtained from the setup>

You will find the output under 'output/'.

Add additional variables to the pipeline

Sometimes you need to pass optional parameters to your pipeline. The communicator script can send them to the pipeline with the '--key_pairs' option, where each parameter is specified as variable-name=value. They are written to the config file of the pipeline as environment variables. In the example below, the communicator script sends the string 'Hello' to the pipeline, which prints it on the cluster.

$ python Comunicator_for_MACKSEK_TOBIAS.py -pipeline test --key_pairs string=Hello -url <URL obtained from the setup>

Pipeline:

Channel.from(string).set{channel_string}

process test_string_send{
  echo true
  publishDir 'out/', mode:'copy' //the directory for the results
  maxForks 10 // limits the number of jobs started in parallel on your cluster
  cpus 4 // if resources are restricted on your cluster, set the CPUs and memory per job as shown here
  memory '4 GB'
  input:
    val string from channel_string
  output:
      file('*')
  script:
    """
      echo $string >> 'test.txt'
    """
}

If you add the pipeline above to MACSEK as described in the fastqc example, you can send a string to the cluster and get back a file containing that string.

Run example

The additional variables example is also included in the macsek_tutorial container 'pgoymann/macsek_tutorial:part4'. Create the service with:

$ python Setup_TOBIAS_MACSEK.py --namespace <your namespace>  --use_MACSEK_container pgoymann/macsek_tutorial:part4

Then run:

$ python Comunicator_for_MACKSEK_TOBIAS.py -pipeline test --key_pairs string='Hello Kubernetes' -url <URL obtained from the setup>
$ cat output/test.txt
Hello Kubernetes
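The variable-name=value format used by '--key_pairs' can be illustrated with a small parser. This is only a sketch of the format: how the communicator script actually processes the pairs is not shown here, and the function `parse_key_pairs` is a hypothetical name. It assumes that everything after the first '=' belongs to the value, so a value may itself contain spaces or '='.

```python
def parse_key_pairs(pairs):
    """Turn ['name=value', ...] into a dict, splitting on the first '=' only."""
    result = {}
    for pair in pairs:
        name, separator, value = pair.partition("=")
        if not separator or not name:
            raise ValueError(f"expected variable-name=value, got {pair!r}")
        result[name] = value
    return result
```

With the run above, the shell strips the quotes, so the script would receive the single argument `string=Hello Kubernetes`, which this sketch maps to the variable `string` with the value `Hello Kubernetes`.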