How it Works

1.1

1.2 2.1 Clustering

2.2 Motif Estimation

BED_to_FASTA

The process BED_to_FASTA calls the R script bed_to_fasta.R, which extracts the sequences for the regions in a given BED file and writes one FASTA file per cluster. The cluster of each region is given by an additional column in the BED file.
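The grouping step can be illustrated with a minimal Python sketch. This is not the code of bed_to_fasta.R. It assumes that the cluster label sits in the fourth column (`cluster_col=3`) and that the reference sequences are available as an in-memory dictionary; both are assumptions made for illustration, as are the function names.

```python
from collections import defaultdict


def bed_to_fasta_records(bed_lines, genome, cluster_col=3):
    """Group BED regions by cluster.

    Returns {cluster: [(header, sequence), ...]}.
    `genome` maps chromosome names to sequence strings (hypothetical input).
    `cluster_col` is the 0-based index of the cluster column (assumed to be 3).
    """
    clusters = defaultdict(list)
    for line in bed_lines:
        line = line.strip()
        # Skip empty lines and BED header/comment lines
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        cluster = fields[cluster_col]
        # BED coordinates are 0-based and half-open, like Python slices
        sequence = genome[chrom][start:end]
        clusters[cluster].append((f"{chrom}:{start}-{end}", sequence))
    return dict(clusters)


def format_fasta(records):
    """Render a list of (header, sequence) pairs as FASTA text."""
    return "".join(f">{header}\n{sequence}\n" for header, sequence in records)
```

Each value of the returned dictionary can be passed to `format_fasta` and written to its own file, which gives one FASTA file per cluster.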

GLAM2

The process GLAM2 takes the FASTA files created by BED_to_FASTA as input and runs GLAM2, a tool from the MEME Suite. "GLAM2 is a program for finding motifs in sequences, typically amino-acid or nucleotide sequences. The main innovation of GLAM2 is that it allows insertions and deletions in motifs." GLAM2 uses Dirichlet processes to align the sequences and find motifs inside them. The output is one MEME file per cluster.

Tomtom

The process Tomtom takes the MEME files from GLAM2 and a database of known motifs as input and runs the program Tomtom. Tomtom is a tool from the MEME Suite that "compares one or more motifs against a database of known motifs (e.g., JASPAR). Tomtom will rank the motifs in the database and produce an alignment for each significant match". The output is one TSV file for each motif from GLAM2, listing all matching motifs found in the database.

Columns of the TSV-file:

Query_ID	Target_ID	Optimal_offset	p-value	E-value	q-value	Overlap	Query_consensus	Target_consensus	Orientation
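A file with these columns can be read with a few lines of Python. The sketch below is only an illustration of how the table might be consumed: the function name and the q-value cutoff of 0.05 are assumptions and are not taken from the pipeline.

```python
import csv


def significant_matches(tsv_text, max_q_value=0.05):
    """Return {Query_ID: [Target_ID, ...]} for rows with q-value <= max_q_value.

    `max_q_value` is a hypothetical cutoff chosen for this example.
    """
    # Drop blank lines and '#' comment lines before parsing
    lines = [
        line
        for line in tsv_text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    reader = csv.DictReader(lines, delimiter="\t")
    matches = {}
    for row in reader:
        if float(row["q-value"]) <= max_q_value:
            matches.setdefault(row["Query_ID"], []).append(row["Target_ID"])
    return matches
```

Query motifs that do not appear as keys in the result have no database match below the cutoff.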

filter_Motif

filter_Motifs is the last step of the motif estimation. It compares the motifs found by GLAM2 with the known motifs that Tomtom matched against the database. All known motifs are removed from the channel of motifs, and the remaining motifs are passed on to Part 3.2. The filtering step is implemented in Groovy inside the Nextflow script. If no unknown motifs are found, Nextflow runs a process called check_for_unknown_motifs, which stops the script with a message for the user.
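The filtering logic is implemented in Groovy in the pipeline itself; the following Python sketch only mirrors the idea under stated assumptions. The function name, the use of plain motif IDs, and the exception type are hypothetical.

```python
def filter_unknown_motifs(glam2_motif_ids, known_motif_ids):
    """Keep only motifs that Tomtom did not match to a known motif.

    `glam2_motif_ids`: motif IDs produced by GLAM2, in their original order.
    `known_motif_ids`: IDs of GLAM2 motifs with a database match from Tomtom.
    """
    known = set(known_motif_ids)
    unknown = [motif for motif in glam2_motif_ids if motif not in known]
    if not unknown:
        # Stands in for check_for_unknown_motifs, which stops the run with a message
        raise RuntimeError("No unknown motifs were found, stopping.")
    return unknown
```

Raising an error when the result is empty makes the run fail early with a clear message, instead of passing an empty set of motifs to the next part.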