ATAC-seq (Assay for Transposase-Accessible Chromatin using high-throughput sequencing) is a sequencing assay for investigating genome-wide chromatin accessibility. The assay uses a Tn5 transposase to insert sequencing adapters into accessible chromatin, enabling mapping of regulatory regions across the genome. Additionally, the local distribution of Tn5 insertions contains information about transcription factor binding, because insertions are visibly depleted around sites bound by protein. These depleted regions are known as footprints.
TOBIAS is a collection of command-line bioinformatics tools for performing footprinting analysis on ATAC-seq data. Please see the TOBIAS GitHub repository for details about the individual tools.
To use the Nextflow pipeline, first clone the repository and create the included conda environments:
$ git clone https://github.molgen.mpg.de/loosolab/TOBIAS-nextflow.git
$ cd TOBIAS-nextflow
$ conda env create -f TOBIAS_MAPOKS/environments/tobias.yaml
$ conda env create -f TOBIAS_MAPOKS/environments/macs.yaml
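Before moving on to the run step, it can help to confirm that the command-line tools used in this README are available on your PATH. The following is a minimal sketch and not part of the TOBIAS repository; the helper name `check_tools` is hypothetical.

```shell
# Hypothetical helper: report any of the given tools that are not on PATH.
# Returns 0 if all tools are found, 1 otherwise.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# The three tools invoked by the commands in this README.
check_tools git conda nextflow || echo "Install the missing tools before running the pipeline."
```

You can also run `conda env list` afterwards to check that both environments were created.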
You can use the example config (TOBIAS_example.config) or adapt it to your own data by replacing the value for each key. Run the pipeline with:
$ cd TOBIAS_MAPOKS
$ conda activate TOBIAS_ENV
$ nextflow run TOBIAS_nextflow_Kubernetes.nf --config TOBIAS_example_kubernetes.config
The pipeline has a filter function for plotting: you can limit the number of best-binding motifs that are plotted per condition. To activate the filter, add the number of motifs you want to plot to the config file:
filter_motifs : #number of motif conditions which are plotted
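To check which value is set before starting a run, you can print the `filter_motifs` line of your config file. This is a small sketch under assumptions: the helper name `show_filter_motifs` is hypothetical, the value 10 is arbitrary, and a throwaway file stands in for your real config.

```shell
# Hypothetical helper: print the filter_motifs line (with its line number)
# from the config file given as the first argument.
show_filter_motifs() {
  grep -n 'filter_motifs' "$1"
}

# Demonstration on a temporary file; 10 is an arbitrary illustrative value.
tmp_config=$(mktemp)
printf 'filter_motifs : 10\n' > "$tmp_config"
show_filter_motifs "$tmp_config"
```

For your own run, pass the path of the config file you actually use instead of the temporary file.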
There are two ways to run the plotting step of the TOBIAS pipeline on Kubernetes. The first performs the plotting on Kubernetes by sending the files to the cluster via S3 storage. The second deploys infrastructure on the cluster to offer a TOBIAS plotting service that calculates the plots on the Kubernetes cluster.
For "local" use, the pipeline uses the framework MAPOKS (Managing Pipelines on Kubernetes with an S3-Storage) to send the files to the cluster via S3 storage and start the calculation. The entire process is integrated into a complete Nextflow TOBIAS pipeline. See the wiki for more information about the implementation.
Plotting for the TOBIAS pipeline on the Kubernetes cluster can also be offered as a service. For this purpose, TOBIAS was integrated into the tool MACSEK (MAnaging Computing SErvices on Kubernetes). The TOBIAS_MACSEK folder contains a TOBIAS pipeline that automatically sends the files to the cluster and performs the calculation. See the wiki for more information about the implementation.
The functions for communicating with the Kubernetes cluster and the S3 storage have been combined in the Python package PYKS (Python Kubernetes S3 Functions). For installation and more information, see the documentation.