MAnaging Computing SErvices on Kubernetes

MACSEK

MACSEK (MAnaging Computing SErvices on Kubernetes) is a framework that provides a computing service on a Kubernetes cluster, such as those provided by de.NBI. It creates a service on the cluster that accepts file uploads for all kinds of calculations, manages resource allocation on the cluster, and offers the results for download.

Features

  • Combines the computing resources of a cluster with individual software/pipelines
  • Manages the file transfer on the user side
  • Automates the deployment of the service on the cluster
  • Manages the calculation on the cluster

MACSEK Overview

The MACSEK service is realized by two deployments: an NGINX server, which manages the file services, and a software-specific deployment, which starts the pipelines with the respective data. The NGINX server has two accessible URL paths, one for uploading data for processing and one for downloading the generated result files. While the upload path is unrestricted, the download path is restricted to individual workloads. To communicate with the service from a locally running pipeline, MACSEK provides a Python script that can be integrated into, e.g., Nextflow at a certain step.

As a proof of principle, we set up a Nextflow pipeline for the TOBIAS tool. For this example, the pipeline receives BED and BigWig formatted files as input for the calculation. To speed up the upload, the files are automatically packed into a tar archive. In addition to the files, a configuration file is created that includes an automatically generated password and a user ID for the subsequent download of the results. Both are MD5-hashed before being uploaded. In addition to the access data, a unique ID for the calculation is generated, enabling the pipeline to map results back to the calling user. Result files are packed as well, and the transfer of the results back to the end user is again handled by the NGINX server.
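The client-side packaging step described above could look roughly like the following Python sketch. It is not the MACSEK script itself: the function names, the `config.json` file name, its keys, and the credential format are all assumptions made for illustration. Only the overall idea (tar archive, generated user ID and password, MD5 hashing, unique calculation ID) is taken from the description.

```python
import hashlib
import json
import secrets
import tarfile
import uuid
from pathlib import Path


def md5_hex(text: str) -> str:
    """Return the MD5 hex digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def build_upload_bundle(input_files, workdir):
    """Pack input files plus a generated config file into one tar archive.

    Returns (archive_path, credentials). The credentials dict holds the
    plain-text values that the client keeps for the later download.
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)

    # Hypothetical credential format; the real script may differ.
    credentials = {
        "user_id": secrets.token_hex(8),
        "password": secrets.token_urlsafe(16),
        "run_id": uuid.uuid4().hex,  # unique ID of this calculation
    }

    # Only the hashed access data goes into the uploaded config file.
    config = {
        "user_id": md5_hex(credentials["user_id"]),
        "password": md5_hex(credentials["password"]),
        "run_id": credentials["run_id"],
    }
    config_path = workdir / "config.json"  # assumed file name
    config_path.write_text(json.dumps(config, indent=2))

    # One archive per calculation, named after the run ID.
    archive_path = workdir / f"{credentials['run_id']}.tar"
    with tarfile.open(archive_path, "w") as tar:
        for f in input_files:
            tar.add(f, arcname=Path(f).name)
        tar.add(config_path, arcname=config_path.name)
    return archive_path, credentials
```

A caller would pass the BED and BigWig files of one run, upload the returned archive to the NGINX upload path, and keep `credentials` in memory for the download step.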

The three parts of MACSEK: the automatic deployment of the service by the administrator, the automated file transfer from/to the users, and the calculation on the cluster.

MACSEK Details

To manage file transfers, application virtualisation, and the multi-user, multi-process environment on the cluster as a service, MACSEK uses an NGINX server for web-based user data interactions and various types of volumes within the Kubernetes cluster. In the TOBIAS example, the NGINX server receives the input files from a user's local TOBIAS pipeline and forwards them to a persistent volume on the cluster. The TOBIAS deployment constantly monitors this volume for new incoming workloads. Once the NGINX server stores a new input file, a thread is triggered that immediately starts processing the input. After unpacking, the supplied configuration file is used to create a directory on the cluster into which the uploaded files are moved. In addition, the authentication data generated during the pipeline call is used to create an account on the NGINX server for the subsequent result download.

Next, a configuration file for the pipeline is built automatically to start the calculation. Once the pipeline is started, the workload is optimized to use all assigned computing resources of the cluster. To this end, the individual processes are started as independent pods, including their application containers. When the calculation is finished, the results are aggregated and finally transferred to the previously created directory under the given ID. The download path is likewise built according to the assigned ID.

While the calculation runs on Kubernetes, the MACSEK module, which runs locally on the user's client, checks whether the result files are available for download. Once the calculation is finished and the results are downloaded, they are unpacked and provided for further steps that might run locally on the client. Finally, the MACSEK module sends the user ID to the cluster as a signal that the download has been completed. Triggered by this signal, the cluster deletes all files and folders connected to the finished workload.
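The monitoring step on the cluster side (watch the upload volume, start one thread per new archive, unpack it, and move the files into a per-run directory) can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not the actual TOBIAS deployment code: the function names, the `*.tar` naming pattern, and a `config.json` containing a `run_id` key are hypothetical, and account creation on the NGINX server and the pod scheduling are left out.

```python
import json
import shutil
import tarfile
import tempfile
import threading
import time
from pathlib import Path


def process_upload(archive, work_root):
    """Unpack one uploaded archive into a directory named after its run ID.

    Assumes the archive contains a 'config.json' with a 'run_id' key
    (hypothetical layout) and that uploads are trusted.
    """
    staging = Path(tempfile.mkdtemp(prefix="macsek_"))
    with tarfile.open(archive) as tar:
        tar.extractall(staging)
    run_id = json.loads((staging / "config.json").read_text())["run_id"]

    work_root = Path(work_root)
    work_root.mkdir(parents=True, exist_ok=True)
    run_dir = work_root / run_id
    # Rename the staging directory to the per-run directory.
    shutil.move(str(staging), str(run_dir))
    return run_dir


def scan_once(upload_dir, work_root, seen):
    """Start one worker thread for every archive not seen before."""
    threads = []
    for archive in sorted(Path(upload_dir).glob("*.tar")):
        if archive.name in seen:
            continue
        seen.add(archive.name)
        worker = threading.Thread(target=process_upload,
                                  args=(archive, work_root))
        worker.start()
        threads.append(worker)
    return threads


def watch(upload_dir, work_root, interval=5.0, stop=None):
    """Poll the upload directory until the optional stop event is set."""
    seen = set()
    stop = stop or threading.Event()
    while not stop.is_set():
        scan_once(upload_dir, work_root, seen)
        time.sleep(interval)
```

Simple polling keeps the sketch free of external dependencies. A production watcher would also need to make sure an archive is completely written before unpacking it, for example by having the upload land under a temporary name first.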