
Implementation of TOBIAS Cloud Service


The service consists of two deployments: an NGINX deployment, which serves as a file server, and a deployment that starts the pipelines and processes the data. The NGINX exposes two URL paths, one for uploading and one for downloading the result files. While the upload path is open, the download path is only accessible with a user name and password. To communicate with the service, a Python script was written that is integrated into the Nextflow pipeline of TOBIAS. The pipeline receives the BED and BigWig files for the calculation.

For the upload, the Python script automatically packs the files into a tar archive. In addition to the files, a configuration file is created that contains an automatically generated password and a user name for the later download. Both are hashed with MD5 before the upload and written to the configuration file. The configuration file also contains a unique ID for the calculation and specifies the pipeline to be executed. The resulting tar archive is then uploaded to the NGINX.

The NGINX receives the files and writes them to a persistent volume. The second deployment constantly monitors this volume. When NGINX stores a new file, the deployment starts a thread that processes it: the archive is unpacked and the supplied configuration file is read. A directory named after the calculation ID is created and the uploaded files are moved into it. With the authentication data, an account is created in NGINX for the later download. A configuration file for the pipeline, containing the path to the uploaded files, is then built and the pipeline is started with it to calculate the visualizations. To use the full resources of the cluster, the individual processes are started as independent pods, all of which mount the persistent volume from which they read the input files and to which they write the results.

When the calculation is finished, the result data is stored in the previously created directory under the given ID. The result files are packed into a tar archive, and a download path is built from the assigned ID. While the calculation runs on Kubernetes, the Python script, which runs locally on the user's machine, checks whether the finished files are available for download. Once the calculation is finished and the download path is available, the script authenticates itself with the previously generated access data and downloads the tar archive. The archive is unpacked and the result data is available locally. In the last step, the script sends the ID back to the cluster to signal that all files have been downloaded; the cluster then deletes the calculation directory and the user account, and the run is finished.
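The client-side upload step can be sketched roughly as follows. This is a minimal illustration, not the actual TOBIAS script: the config file format, its field names and the upload URL are assumptions, as is the use of `requests` for the HTTP transfer.

```python
import hashlib
import secrets
import string
import tarfile
import uuid
from pathlib import Path

import requests  # assumed HTTP client

UPLOAD_URL = "http://example-cluster/upload/"  # hypothetical NGINX upload path


def build_upload_archive(input_files, pipeline_name, workdir="."):
    """Pack the BED/BigWig files plus a config file into a tar archive."""
    run_id = uuid.uuid4().hex
    user = f"user_{run_id[:8]}"
    password = "".join(secrets.choice(string.ascii_letters + string.digits)
                       for _ in range(16))

    # Credentials are stored MD5-hashed in the config, as described above.
    config = Path(workdir) / "config.txt"
    config.write_text(
        f"id={run_id}\n"
        f"pipeline={pipeline_name}\n"
        f"user={hashlib.md5(user.encode()).hexdigest()}\n"
        f"password={hashlib.md5(password.encode()).hexdigest()}\n"
    )

    archive = Path(workdir) / f"{run_id}.tar"
    with tarfile.open(archive, "w") as tar:
        for f in input_files:
            tar.add(f, arcname=Path(f).name)
        tar.add(config, arcname="config.txt")
    # The plain user name and password are kept locally for the later download.
    return archive, run_id, user, password


def upload(archive):
    """Upload the tar archive through the open NGINX upload path."""
    with open(archive, "rb") as fh:
        resp = requests.put(UPLOAD_URL + archive.name, data=fh)
    resp.raise_for_status()
```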
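The server-side deployment that watches the persistent volume might look roughly like the sketch below. All paths, the config format, the htpasswd handling and the pipeline invocation are assumptions; the actual service additionally starts the individual pipeline processes as separate pods on the cluster.

```python
import subprocess
import tarfile
import threading
import time
from pathlib import Path

WATCH_DIR = Path("/data/uploads")             # persistent volume path (assumed)
RUNS_DIR = Path("/data/runs")                 # per-ID run directories (assumed)
HTPASSWD_FILE = Path("/etc/nginx/.htpasswd")  # assumed NGINX auth file


def read_config(path):
    """Parse the simple key=value config shipped with the upload (format assumed)."""
    return dict(line.split("=", 1) for line in path.read_text().splitlines() if "=" in line)


def handle_upload(archive: Path):
    """Unpack an upload, create the run directory and download account, start the pipeline."""
    workdir = WATCH_DIR / archive.stem
    workdir.mkdir(exist_ok=True)
    with tarfile.open(archive) as tar:
        tar.extractall(workdir)

    cfg = read_config(workdir / "config.txt")
    run_dir = RUNS_DIR / cfg["id"]
    run_dir.mkdir(parents=True, exist_ok=True)
    for f in workdir.iterdir():
        if f.name != "config.txt":
            f.rename(run_dir / f.name)

    # Register the download account; the exact htpasswd hash format is not
    # specified in the wiki, so this line is only illustrative.
    with open(HTPASSWD_FILE, "a") as fh:
        fh.write(f'{cfg["user"]}:{cfg["password"]}\n')

    # Build a pipeline config pointing at the uploaded files and start the pipeline.
    (run_dir / "pipeline.yml").write_text(f"input_dir: {run_dir}\n")
    subprocess.run(["nextflow", "run", cfg["pipeline"],
                    "-params-file", str(run_dir / "pipeline.yml")], check=True)


def watch():
    """Poll the persistent volume and process each new tar archive in its own thread."""
    seen = set()
    while True:
        for archive in WATCH_DIR.glob("*.tar"):
            if archive not in seen:
                seen.add(archive)
                threading.Thread(target=handle_upload, args=(archive,), daemon=True).start()
        time.sleep(10)
```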
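The local download and cleanup step could be sketched like this. The download and cleanup URLs are hypothetical, and HTTP Basic authentication is assumed for the password-protected download path.

```python
import tarfile
import time

import requests  # assumed HTTP client

BASE_URL = "http://example-cluster"                # hypothetical service address
DOWNLOAD_PATH = "/download/{run_id}/results.tar"   # hypothetical result path
DONE_PATH = "/done/{run_id}"                       # hypothetical cleanup endpoint


def wait_and_download(run_id, user, password, poll_seconds=60):
    """Poll the protected download path, fetch and unpack the results, then signal cleanup."""
    url = BASE_URL + DOWNLOAD_PATH.format(run_id=run_id)
    while True:
        resp = requests.get(url, auth=(user, password))
        if resp.status_code == 200:
            break
        time.sleep(poll_seconds)

    archive = f"{run_id}_results.tar"
    with open(archive, "wb") as fh:
        fh.write(resp.content)
    with tarfile.open(archive) as tar:
        tar.extractall(f"results_{run_id}")

    # Tell the cluster that everything was downloaded so it can delete
    # the calculation directory and the download account.
    requests.post(BASE_URL + DONE_PATH.format(run_id=run_id), data={"id": run_id})
```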