# masterJLU2018

De novo motif discovery and evaluation based on footprints identified by TOBIAS.

For further information read the [documentation](https://github.molgen.mpg.de/loosolab/masterJLU2018/wiki).

## Dependencies
* [conda](https://conda.io/docs/user-guide/install/linux.html)
* [Nextflow](https://www.nextflow.io/)
* [MEME-Suite](http://meme-suite.org/doc/install.html?man_type=web)

## Installation
1. Install all dependencies listed above (Nextflow, conda, MEME-Suite) and download all files from the [GitHub repository](https://github.molgen.mpg.de/loosolab/masterJLU2018).
2. Set the [environment paths for meme-suite](http://meme-suite.org/doc/install.html?man_type=web#installingtar). This can be done with the following commands:
```
export PATH=[meme-suite installation path]/libexec/meme-[meme-suite version]:$PATH
export PATH=[meme-suite installation path]/bin:$PATH
```
3. Every other dependency is installed automatically by conda. Before running the pipeline, create and activate the conda environment from the yaml-file given in this repository. This can be done with the following commands:
```console
conda env create -f masterenv.yml
conda activate masterenv
```
4. Set the `wd` parameter in the nextflow.config file to the path where the repository is saved. For example: '~/masterJLU2018/'.

**Important Notes:**
1. For conda, the channel bioconda needs to be set as highest priority! This is required because two different packages with the same name exist in different channels. The pipeline needs the package jellyfish from the channel bioconda and **NOT** the jellyfish package from the channel conda-forge!

## Quick Start
```console
nextflow run pipeline.nf --bigwig [BigWig-file] --bed [BED-file] --genome_fasta [FASTA-file] --motif_db [MEME-file] --organism [mm10|mm9|hg19|hg38]
```

### Demo run
Files for a demo run are provided inside ./demo/. Go to the main directory and run the following command:
```
nextflow run pipeline.nf --bigwig ./demo/buenrostro50k_chr1_fp.bw --bed ./demo/buenrostro50k_chr1_peaks.bed --genome_fasta ./demo/hg38_chr1.fa --motif_db ./demo/jaspar_vertebrates.meme --out ./demo/buenrostro50k_chr1_out/ --organism hg38
```

## Parameters
For a detailed overview of all parameters follow this [link](https://github.molgen.mpg.de/loosolab/masterJLU2018/wiki/Configuration).
```
Required arguments:
  --bigwig                  Path to BigWig-file
  --bed                     Path to BED-file
  --genome_fasta            Path to genome in FASTA-format
  --motif_db                Path to motif-database in MEME-format
  --config                  Path to UROPA configuration file
  --organism                Input organism [hg38 | hg19 | mm9 | mm10]
  --out                     Output Directory (Default: './out/')

Optional arguments:

  --help [0|1]              1 to show this help message. (Default: 0)
  --gtf_path                Path to gtf-file. If path is set the process which creates a gtf-file is skipped.
  --tfbs_path               Path to directory with tfbsscan output. If given tfbsscan will be skipped.

  Footprint extraction:
  --window_length INT       This parameter sets the length of a sliding window. (Default: 200)
  --step INT                This parameter sets the number of positions to slide the window forward. (Default: 100)
  --percentage INT          Threshold in percent (Default: 0)
  --min_gap INT             If footprints are less than X bases apart the footprints will be merged (Default: 6)

  Filter motifs:
  --min_size_fp INT         Minimum sequence length threshold. Smaller sequences are discarded. (Default: 10)
  --max_size_fp INT         Maximum sequence length threshold. Discards all sequences longer than this value. (Default: 200)
  --tfbsscan_method [moods|fimo]  Method used by tfbsscan. (Default: moods)

  Cluster:
  Sequence preparation/ reduction:
  --kmer INT                K-mer length (Default: 10)
  --aprox_motif_len INT     Motif length (Default: 10)
  --motif_occurrence FLOAT  Percentage of motifs over all sequences. Use 1 (Default) to assume every sequence contains a motif.
  --min_seq_length INT      Remove all sequences below this value. (Default: 10)

  Clustering:
  --global INT              Global (=1) or local (=0) alignment. (Default: 0)
  --identity FLOAT          Identity threshold. (Default: 0.8)
  --sequence_coverage INT   Minimum aligned nucleotides on both sequences. (Default: 8)
  --memory INT              Memory limit in MB. 0 for unlimited. (Default: 800)
  --throw_away_seq INT      Remove all sequences equal or below this length before clustering. (Default: 9)
  --strand INT              Align +/+ & +/- (= 1). Or align only +/+ (= 0). (Default: 0)

  Motif estimation:
  --min_seq INT             Sets the minimum number of sequences required for the FASTA-files given to GLAM2. (Default: 100)
  --motif_min_key INT       Minimum number of key positions (aligned columns) in the alignment done by GLAM2. (Default: 8)
  --motif_max_key INT       Maximum number of key positions (aligned columns) in the alignment done by GLAM2. (Default: 20)
  --iteration INT           Number of iterations done by GLAM2. More iterations: better results, higher runtime. (Default: 10000)
  --tomtom_treshold FLOAT   Threshold for similarity score. (Default: 0.01)
  --best_motif INT          Get the best X motifs per cluster. (Default: 3)
  --gap_penalty INT         Set penalty for gaps in GLAM2 (Default: 1000)
  --seed                    Set seed for GLAM2 (Default: 123456789)

  Motif clustering:
  --cluster_motif Boolean   If 1 the pipeline clusters motifs. If it is 0 it does not. (Default: 0)
  --edge_weight INT         Minimum weight of edges in motif-cluster-graph (Default: 5)
  --motif_similarity_thresh FLOAT  Threshold for motif similarity score (Default: 0.00001)

  Creating GTF:
  --tissues List/String     List of one or more keywords for tissue-/category-activity, categories must be specified as in JSON config

  Evaluation:
  --max_uropa_runs INT      Maximum number UROPA runs running parallelized (Default: 10)

All arguments can be set in the configuration files
```

For further information read the [documentation](https://github.molgen.mpg.de/loosolab/masterJLU2018/wiki).
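
The input files and the organism listed under Parameters can be checked before a run is started, so that a typo in a path does not surface only after Nextflow has launched. The following is a minimal sketch, not part of the pipeline: the function name `check_inputs` and its argument order are hypothetical, and it only tests that the four input files exist and that the organism is one of the four values listed above.

```shell
#!/bin/sh
# Hypothetical pre-flight helper; NOT shipped with the pipeline.
# Usage: check_inputs BIGWIG BED GENOME_FASTA MOTIF_DB ORGANISM
# Returns 0 if all four files exist and ORGANISM is one of
# hg38, hg19, mm9, mm10; prints a message to stderr and returns 1 otherwise.
check_inputs() {
    if [ "$#" -ne 5 ]; then
        echo "usage: check_inputs BIGWIG BED GENOME_FASTA MOTIF_DB ORGANISM" >&2
        return 1
    fi

    # The first four arguments must be existing regular files.
    for input_file in "$1" "$2" "$3" "$4"; do
        if [ ! -f "$input_file" ]; then
            echo "missing input file: $input_file" >&2
            return 1
        fi
    done

    # The fifth argument must be one of the organisms named in the help text.
    case "$5" in
        hg38|hg19|mm9|mm10)
            return 0
            ;;
        *)
            echo "unsupported organism: $5" >&2
            return 1
            ;;
    esac
}
```

After sourcing this function, a run could be guarded by calling `check_inputs` with the same paths that are passed to `--bigwig`, `--bed`, `--genome_fasta`, `--motif_db` and `--organism`, and starting `nextflow run pipeline.nf` only if it succeeds.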