
Implementation TOBIAS Kubernetes S3 Version


Running Jobs on Kubernetes

To run a job on the Kubernetes cluster, you have to send a YAML file with the job configuration to the cluster. To get the files needed for the calculation onto the cluster and the results back to your local machine, you need the S3 storage. First, you upload your data to the S3; the job on the cluster then starts and downloads the files from the S3. When the calculation has finished, the job uploads the results to the S3, and your VM can download the results from the S3 to your machine.
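A minimal sketch of such a job manifest is shown below. The image name, bucket, paths, and the use of the AWS CLI for the S3 transfer are illustrative assumptions, not the pipeline's actual values:

```yaml
# Hypothetical job manifest: download input from S3, compute, upload results.
# Image, bucket, and commands are placeholders, not the pipeline's real values.
apiVersion: batch/v1
kind: Job
metadata:
  name: tobias-example-job
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: example/tobias-worker:latest   # assumed image
          command: ["/bin/sh", "-c"]
          args:
            - |
              aws s3 cp s3://example-bucket/input/ /data/input/ --recursive
              run_calculation /data/input /data/output    # placeholder command
              aws s3 cp /data/output/ s3://example-bucket/output/ --recursive
```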

(Figure: Vortrag_TOBIAS_nextflow_Kubernetes_3)

How the pipeline works

The most time-consuming part of running jobs on the cluster is transferring the data through the S3 storage to the cluster. For this reason, the bigwigs are stored on the cluster in an NFS storage, so that every pod can read them from there. The plotting on Kubernetes is split into three processes.
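One common way to provide such a shared volume is a PersistentVolumeClaim with ReadWriteMany access, which every plotting pod can mount. The claim name, storage class, and size below are assumptions for illustration:

```yaml
# Hypothetical PVC for the shared bigwig storage; name, storage class,
# and size are assumptions, not the pipeline's real configuration.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: bigwig-nfs-pvc
spec:
  accessModes:
    - ReadWriteMany        # NFS lets many pods mount the volume at once
  storageClassName: nfs    # assumed storage class backed by NFS
  resources:
    requests:
      storage: 50Gi
```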

1. In the first process, the bigwig files from ATACorrect are sent to the NFS volume on the cluster.

(Figure: Step 1)

2. When the bigwigs are present on the NFS, the pipeline starts the plotting on the cluster. All plots for one motif run in one pod. When the plotting has finished, the plots are uploaded to the S3 storage.

(Figure: Step 2)

3. In the last step, when a pod on the cluster has finished, the files are downloaded from the S3 back to your VM.

(Figure: Step 3)

Processes 2 and 3 must be kept separate, since Nextflow only starts as many processes as there are cores available on the VM. If processes 2 and 3 were merged, only as many motifs would be plotted in parallel as there are cores available; with the split, the cluster can plot many motifs at once while the VM only handles the downloads.
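A minimal Nextflow sketch of this split might look as follows; the process names, the helper scripts, and the channel wiring are assumptions for illustration, not the pipeline's actual code:

```nextflow
// Hypothetical sketch of the process split; all names are placeholders.
motifs_ch = Channel.from('MOTIF_A', 'MOTIF_B', 'MOTIF_C')

process plot_on_cluster {
    // Starts one plotting pod per motif on the cluster; because the heavy
    // work runs in the pod, the VM's cores do not limit plot parallelism.
    input:
    val motif from motifs_ch

    output:
    val motif into plotted_ch

    script:
    """
    start_plot_pod.py --motif ${motif}    # placeholder: submits the pod
    """
}

process download_results {
    // Runs on the VM: once a pod has uploaded its plots to the S3,
    // fetch them back to the local machine.
    input:
    val motif from plotted_ch

    script:
    """
    download_plots.py --motif ${motif}    # placeholder: pulls plots from S3
    """
}
```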

The functions used in the pipeline come from the Python package PYKS. The picture below shows the three processes described above and the PYKS functions they use. The picture also shows a fourth, optional process that cleans up the PVC: it starts a job on the cluster, which mounts the PVC and empties it. This process can be activated in the pipeline configuration file.

(Figure: nextflow_kube_processe)
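Such a cleanup job could look like the sketch below; the image, claim name, mount path, and delete command are assumptions for illustration:

```yaml
# Hypothetical cleanup job: mounts the PVC and deletes its contents.
# Image, claim name, and mount path are placeholders.
apiVersion: batch/v1
kind: Job
metadata:
  name: pvc-cleanup
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: cleanup
          image: busybox
          command: ["/bin/sh", "-c", "rm -rf /data/*"]   # empties the volume
          volumeMounts:
            - name: shared-storage
              mountPath: /data
      volumes:
        - name: shared-storage
          persistentVolumeClaim:
            claimName: bigwig-nfs-pvc   # assumed claim name (see PVC sketch above)
```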