De novo motif discovery and evaluation based on footprints identified by TOBIAS
Start by installing all dependencies listed above. The environment paths for meme-suite must be set, which can be done with the following commands:
export PATH=[meme-suite installation path]/libexec/meme-[meme-suite version]:$PATH
export PATH=[meme-suite installation path]/bin:$PATH
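To confirm that the PATH changes above took effect, a small check like the following can help. This is a minimal sketch, not part of the pipeline: the helper name check_tools is hypothetical, and it assumes that glam2 and tomtom (the meme-suite programs named in this README) are the ones you want to verify.

```shell
# Report which of the given commands cannot be found on PATH.
# Returns 0 if all are found, 1 otherwise.
# (check_tools is a hypothetical helper, not part of the repository.)
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found on PATH: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# glam2 and tomtom are the meme-suite programs mentioned in this README.
check_tools glam2 tomtom || echo "meme-suite PATH setup looks incomplete"
```

If either program is reported as missing, re-check the installation path and version used in the two export commands.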
Download all files from the GitHub repository. The Nextflow script needs a conda environment to run. Nextflow can create the required environment from the given YAML file. On some systems Nextflow exits the run with the following error:
Caused by:
Failed to create Conda environment
command: conda env create --prefix --file env.yml
status : 143
message:
If this error occurs, you have to create the environment before starting the pipeline. To create it, you need the yml-file from the repository. Run the following commands to create the environment:
path=[Path to given masterenv.yml file]
conda env create --name masterenv -f=$path
Once the environment is created, set the variable 'path_env' in the configuration file to the path of the environment.
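One way to find the path to put into 'path_env' is to read it from the output of `conda env list`. The sketch below assumes the environment was created under the name masterenv, as in the command above; the helper name env_path_from_list and the sample paths are hypothetical, and the parsing assumes the path contains no spaces.

```shell
# Print the path column for the environment named "$1",
# reading `conda env list` style output from standard input.
# (env_path_from_list is a hypothetical helper.)
env_path_from_list() {
    awk -v name="$1" '$1 == name { print $NF }'
}

# Demonstration on a sample listing with made-up paths.
env_path_from_list masterenv <<'EOF'
# conda environments:
#
base                  *  /opt/conda
masterenv                /opt/conda/envs/masterenv
EOF

# With a real conda installation the call would be:
#   conda env list | env_path_from_list masterenv
```

For the sample listing this prints /opt/conda/envs/masterenv, which is the kind of value expected for 'path_env'.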
nextflow run pipeline.nf --input [BigWig-file] --bed [BED-file] --genome_fasta [FASTA-file] --jaspar_db [MEME-file]
Required arguments:
--input Path to BigWig-file
--bed Path to BED-file
--genome_fasta Path to genome in FASTA-format
--jaspar_db Path to motif-database in MEME-format
Optional arguments:
Footprint extraction:
--window_length INT (Default: 200)
--step INT (Default: 100)
--percentage INT (Default: 0)
Filter unknown motifs:
--min_size_fp INT (Default: 10)
--max_size_fp INT (Default: 100)
Clustering:
--sequence_coverage INT (Default: 8)
--kmer INT (Default: 10)
--aprox_motif_len INT (Default: 10)
Motif estimation:
--motif_min_len INT Minimum length of Motif (Default: 8)
--motif_max_len INT Maximum length of Motif (Default: 20)
--interation INT Number of iterations done by glam2. More iterations give better results at the cost of a higher runtime. (Default: 10000)
--tomtom_treshold FLOAT Threshold for similarity score. (Default: 0.01)
Creating GTF:
--organism [homo_sapiens | mus_musculus]
--tissues
All arguments can be set in the configuration files.
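As an illustration of a complete call, the sketch below checks that the four required input files exist before launching the pipeline and passes two of the optional motif-length arguments with their documented default values. The file names and the helper require_files are hypothetical; replace them with your own data.

```shell
# Hypothetical file names; replace with your own data.
input="footprints.bw"
bed="peaks.bed"
genome="genome.fa"
motifs="jaspar.meme"

# Return 0 only if every given file exists and is readable.
# (require_files is a hypothetical helper, not part of the repository.)
require_files() {
    rc=0
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "missing or unreadable: $f" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Launch the pipeline only when all required inputs are present.
if require_files "$input" "$bed" "$genome" "$motifs"; then
    nextflow run pipeline.nf \
        --input "$input" --bed "$bed" \
        --genome_fasta "$genome" --jaspar_db "$motifs" \
        --motif_min_len 8 --motif_max_len 20
fi
```

Checking the inputs first avoids starting a Nextflow run that would fail on a mistyped path.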
For further information, read the documentation.