
Add a new Pipeline to MACSEK

goymann edited this page Jan 8, 2020 · 21 revisions

To add a new pipeline to MACSEK, you first have to write a Nextflow pipeline.

Create the new Pipeline

The input files for the pipeline will be located in the same folder in which the pipeline is started, so the input channel of the Nextflow pipeline must search for its input files in the current directory. The results must also be saved in the current directory, in a folder called 'out/'. In the following example we add a fastqc/multiqc pipeline to MACSEK. Write the pipeline according to the requirements described above. The fastqc/multiqc pipeline would look like this:

Channel.fromPath('*.fq').set{channel_fastqc_file} // input channel picks up the fastq files in the current directory

process fastqc{

  publishDir 'out/', mode:'copy' // the directory for the results
  maxForks 10 // limits the number of jobs started in parallel on your cluster
  cpus 4 // if resources are restricted on your cluster, set the CPUs and memory per job as shown here
  memory '4 GB'
  input:
    file(fastq) from channel_fastqc_file
  output:
    file('*')
    file ('*.zip') into channel_multiqc
  script:
    """
      fastqc $fastq
    """
}

process multiqc{
  publishDir 'out/', mode:'copy' // the directory for the results
  maxForks 10 // limits the number of jobs started in parallel on the cluster
  cpus 4 // if resources are restricted on your cluster, set the CPUs and memory per job as shown here
  memory '4 GB'
  input:
    file(fastqc_result) from  channel_multiqc
  output:
    file('*')

  script:
    """
      multiqc $fastqc_result
    """
}

As the first line shows, the input channel searches for the files in the current directory. If your cluster restricts the number of jobs started in parallel or the resources available per job, you have to specify these restrictions in the pipeline. The number of jobs started in parallel can be limited with 'maxForks 10', and the resources per job with the cpus and memory directives. The results are stored in a folder called 'out/' in the same directory where the pipeline is started. Save the pipeline in a file called 'pipeline.nf'.
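
As written, the multiqc process above receives the fastqc archives one at a time, so with several input files it may run once per file. If a single combined report is wanted, one possible variant is sketched below. It assumes the same channel names as in the example and uses Nextflow's collect operator to gather all archives first. The variable name fastqc_results is only illustrative, and this is a sketch rather than the pipeline MACSEK ships with.

```groovy
// Sketch only: a variant of the multiqc process that waits for all
// fastqc archives and hands them to a single multiqc run.
process multiqc {
  publishDir 'out/', mode: 'copy' // results still go to 'out/'
  maxForks 10
  cpus 4
  memory '4 GB'
  input:
    // collect() emits one item: the list of all *.zip files from fastqc
    file(fastqc_results) from channel_multiqc.collect()
  output:
    file('*')
  script:
    """
      multiqc $fastqc_results
    """
}
```

If used, this block would replace the multiqc process in 'pipeline.nf'. The fastqc process stays unchanged.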

Create Config file for the new pipeline

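You also have to create a config file for the new pipeline. It sets the executor to Kubernetes and assigns a container image to each process: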

process {
    executor = "k8s"
    withName:fastqc {
        container = "quay.io/biocontainers/fastqc:0.11.8--2"
    }
    withName:multiqc {
        container = "quay.io/biocontainers/multiqc:1.6--py36h24bf2e0_0"
    }
}
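
The resource limits set inside 'pipeline.nf' can, in Nextflow generally, also be written as process settings in the config file. The fragment below is a minimal sketch of that alternative for the fastqc process, reusing the values from the example above. Whether MACSEK expects the limits in the pipeline file or in the config is not stated here, so treat this as an assumption rather than a requirement.

```groovy
// Sketch only: the limits from pipeline.nf expressed in the config file.
process {
    executor = "k8s"
    withName:fastqc {
        container = "quay.io/biocontainers/fastqc:0.11.8--2"
        cpus = 4          // CPUs requested per job
        memory = '4 GB'   // memory requested per job
        maxForks = 10     // at most 10 fastqc jobs run in parallel
    }
}
```

Keeping the limits in the config leaves the pipeline file independent of any one cluster. Keeping them in the process, as the example above does, keeps everything in a single file.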