masterJLU2018

De novo motif discovery and evaluation based on footprints identified by TOBIAS

Dependencies

Installation

Start by installing all dependencies listed above. The environment paths for meme-suite must be set. This can be done with the following commands:

export PATH=[meme-suite installation path]/libexec/meme-[meme-suite version]:$PATH
export PATH=[meme-suite installation path]/bin:$PATH
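
To confirm that the paths took effect, it can help to check that the meme-suite tools are found on PATH. The following is only a minimal sketch: the helper name `check_tools` is hypothetical, and the tool names `glam2` and `tomtom` are assumed from the parameter descriptions further down, not taken from an official list of required binaries.

```shell
# check_tools: hypothetical helper. Prints every tool that is not found
# on PATH and returns non-zero if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed tool names; adjust to the binaries your installation provides.
check_tools glam2 tomtom || echo "Check the export PATH lines."
```

If a tool is reported as missing, the placeholders in the export commands most likely do not match the actual installation directory or version.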

Download all files from the GitHub repository. The Nextflow script needs a conda environment to run. Nextflow can create the required environment from the provided YAML file. On some systems, Nextflow aborts the run with the following error:

Caused by:
  Failed to create Conda environment
  command: conda env create --prefix  --file env.yml
  status : 143
  message:

If this error occurs, you have to create the environment before starting the pipeline. To do so, you need the masterenv.yml file from the repository. Run the following commands to create the environment:

path=[Path to given masterenv.yml file]
conda env create --name masterenv -f=$path

Once the environment is created, set the variable 'path_env' in the configuration file to the path of the environment.
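
If the location of the new environment is not known, it can be read from the output of `conda env list`. The sketch below assumes the environment was created under the name `masterenv` as shown above; the helper name `env_path` is hypothetical, and the parsing assumes the path contains no spaces.

```shell
# env_path: hypothetical helper. Reads `conda env list` output on stdin
# and prints the path (last column) of the environment with the given name.
env_path() {
    awk -v name="$1" '$1 == name { print $NF }'
}

# Only query conda if it is installed.
if command -v conda >/dev/null 2>&1; then
    conda env list | env_path masterenv
fi
```

The printed path is the value to use for 'path_env'.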

Quick Start

nextflow run pipeline.nf --input [BigWig-file] --bed [BED-file] --genome_fasta [FASTA-file] --jaspar_db [MEME-file]
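
A run with concrete files and a few optional arguments might look like the following sketch. All file names are hypothetical placeholders, and `require_files` is a hypothetical helper that only checks whether the inputs are readable before the pipeline is started. The optional arguments are set to their documented defaults purely for illustration.

```shell
# Hypothetical file names; replace them with your own data.
input=footprints.bw
bed=peaks.bed
genome=genome.fa
motifs=jaspar.meme

# require_files: hypothetical helper. Reports every unreadable file on
# stderr and returns non-zero if at least one file cannot be read.
require_files() {
    status=0
    for f in "$@"; do
        [ -r "$f" ] || { echo "cannot read: $f" >&2; status=1; }
    done
    return "$status"
}

# Start the pipeline only if all four required inputs are readable.
if require_files "$input" "$bed" "$genome" "$motifs"; then
    nextflow run pipeline.nf --input "$input" --bed "$bed" \
        --genome_fasta "$genome" --jaspar_db "$motifs" \
        --motif_min_len 8 --motif_max_len 20
fi
```

Checking the inputs first avoids starting a run that would fail on a mistyped path.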

Parameters

Required arguments:
	--input Path to BigWig-file
	--bed Path to BED-file
	--genome_fasta Path to genome in FASTA-format
	--jaspar_db Path to motif-database in MEME-format


Optional arguments:
	Footprint extraction:
	--window_length INT (Default: 200)
	--step INT (Default: 100)
	--percentage INT (Default: 0)

	Filter unknown motifs:
	--min_size_fp INT (Default: 10)
	--max_size_fp INT (Default: 100)

	Cluster:
	Sequence preparation/reduction:
	--kmer INT Kmer length (Default: 10)
	--aprox_motif_len INT Motif length (Default: 10)
	--motif_occurence FLOAT Proportion of sequences that contain a motif. Use 1 to assume every sequence contains a motif. (Default: 1)
	--min_seq_length INT Remove all sequences below this value. (Default: 10)
	Clustering:
	--global INT Global (=1) or local (=0) alignment. (Default: 0)
	--identity FLOAT Identity threshold. (Default: 0.8)
	--sequence_coverage INT Minimum aligned nucleotides on both sequences. (Default: 8)
	--memory INT Memory limit in MB. 0 for unlimited. (Default: 800)
	--throw_away_seq INT Remove all sequences equal to or below this length before clustering. (Default: 9)
	--strand INT Align +/+ and +/- (=1) or align only +/+ (=0). (Default: 0)

	Motif estimation:
	--motif_min_len INT	Minimum length of Motif (Default: 8)
	--motif_max_len INT	Maximum length of Motif (Default: 20)
	--interation INT	Number of iterations done by glam2. More iterations: better results, higher runtime. (Default: 10000)
	--tomtom_treshold FLOAT	Threshold for similarity score. (Default: 0.01)

	Creating GTF:
	--organism [homo_sapiens | mus_musculus]
	--tissues
  
All arguments can be set in the configuration files.

For further information, read the documentation.
