Merged
merged 235 commits into from
Dec 14, 2018
235 commits
89eb998
convert BED file to FASTA (one per cluster)
renewiegandt Oct 29, 2018
4c86efd
removing redundant code
renewiegandt Oct 29, 2018
317b83d
starting to work with custom peak calling
anastasiia Oct 30, 2018
400895a
test commit
anastasiia Oct 30, 2018
22c4b90
deleting functions that i don't need
anastasiia Oct 30, 2018
210bdda
saving the footprint before starting the new one
anastasiia Oct 30, 2018
54a5914
background changing to mean + 10%, the score for a footprint is a mea…
anastasiia Oct 30, 2018
c4f5bf3
calculating the position with max score as the position in the middle…
anastasiia Oct 30, 2018
b09d23c
make save_footprint as a function on its own
anastasiia Oct 30, 2018
362cc54
writing to bed file
anastasiia Oct 30, 2018
b061f61
adding the bonus information from the original bed file to the output…
anastasiia Oct 30, 2018
8c25200
changing to the actual start and end position and not the relative on…
anastasiia Oct 30, 2018
36974c2
Basic Nextflow structure
renewiegandt Nov 1, 2018
0dbdf5b
fixed db_channel for tomtom
renewiegandt Nov 1, 2018
2d42086
adding a sorting function after the bed file was written to present i…
anastasiia Nov 1, 2018
59a08fc
adding of a logger, deleting of the unsorted file after the sorting
anastasiia Nov 1, 2018
b4b972f
bed_to_fasta update
renewiegandt Nov 5, 2018
77e40e3
Merge branch 'motif_estiamtion' of https://github.molgen.mpg.de/looso…
renewiegandt Nov 5, 2018
82c0c13
motif estimation docu in nf
renewiegandt Nov 5, 2018
14fef9d
get best motif from meme
renewiegandt Nov 5, 2018
42a279b
Merge branch 'dev' into motif_estiamtion
renewiegandt Nov 7, 2018
5a7b661
added process to get best motif per meme-file
renewiegandt Nov 7, 2018
603c814
added uropa config
renewiegandt Nov 8, 2018
9f4e480
added process create_uropa_config
renewiegandt Nov 8, 2018
87bbda0
added scripts for part 1.2
renewiegandt Nov 8, 2018
da77364
Update: process overlap_with_known_TFBS
renewiegandt Nov 8, 2018
bce329b
backup before changing to use a window for search
anastasiia Nov 12, 2018
1cd5bb1
changing the input parameters, creating a function for searching with…
anastasiia Nov 12, 2018
88c1336
first try with window, the problem is that in some cases the window s…
anastasiia Nov 13, 2018
f1f0988
the search with sliding window is done, it needs to be tested
anastasiia Nov 13, 2018
de32619
need to resolve overlapping footprints
anastasiia Nov 13, 2018
5efbe54
working on overlapping footprints
anastasiia Nov 13, 2018
1a6ce69
added script for process 'footprint_extraction'
renewiegandt Nov 19, 2018
80ffd2a
added script for process 'footprint_extraction'
renewiegandt Nov 19, 2018
64d99dc
adding one dictionary for each peak, so that we can easily look for p…
anastasiia Nov 20, 2018
16110f5
Add files via upload
JannikHamp Nov 20, 2018
a9d0070
Add files via upload
JannikHamp Nov 20, 2018
26d2764
Update README.md
JannikHamp Nov 20, 2018
6fd9948
Update README.md
JannikHamp Nov 20, 2018
ffc665d
deleting of overlaps is working
anastasiia Nov 20, 2018
b92ebad
the number of peaks and the number of footprints is printed with help…
anastasiia Nov 21, 2018
8c4da33
adding the possibility for user to chose how many percent to add to t…
anastasiia Nov 21, 2018
02b7fc8
deleting of an unused function
anastasiia Nov 21, 2018
2429612
using the output file as parameter as well as rounding of the scores
anastasiia Nov 21, 2018
d810baf
deleting of unused imports and explaining why the other imports are n…
anastasiia Nov 21, 2018
492d7f6
final commenting for the important functions in the script
anastasiia Nov 21, 2018
8807ef6
deleting of unused code
anastasiia Nov 21, 2018
2fdbf0d
update, working with unmerged motif files now
JannikHamp Nov 28, 2018
9028820
update, bugfix output parameter error
JannikHamp Nov 29, 2018
a5e902f
added filter for small clusters
renewiegandt Nov 29, 2018
420cb1f
changed tomtom/glam2 parameters; added min_seq parameter
renewiegandt Nov 29, 2018
4b32142
Merge pull request #1 from loosolab/motif_estiamtion
renewiegandt Nov 29, 2018
0aa4cc3
Merge branch 'dev' into peak_calling
Dec 3, 2018
c24dea9
move the call_peaks.py script to the bin directory
Dec 3, 2018
2c0625a
control changes
anastasiia Dec 3, 2018
f16d6e9
adding the environment for the peak calling
anastasiia Dec 3, 2018
e74cc87
editing of the commits
anastasiia Dec 3, 2018
21be98b
Update README.md
renewiegandt Dec 3, 2018
e265495
Update README.md
renewiegandt Dec 3, 2018
8345d59
Update README.md
renewiegandt Dec 3, 2018
d8e6fc3
added yaml file for conda enviroment
renewiegandt Dec 3, 2018
bdb0e83
Update README.md
renewiegandt Dec 3, 2018
4277a65
Update README.md
renewiegandt Dec 3, 2018
146b7f8
added nextflow configuration files
renewiegandt Dec 3, 2018
eb70872
added parameter to config files
renewiegandt Dec 3, 2018
820d02f
update config file
anastasiia Dec 3, 2018
8a0143b
adding dependencies needed for call_peak.py
anastasiia Dec 3, 2018
2f49136
final changes in comments
anastasiia Dec 3, 2018
a48bd8a
added enviroment parameter
renewiegandt Dec 3, 2018
c154116
Merge branch 'dev' of https://github.molgen.mpg.de/loosolab/masterJLU…
renewiegandt Dec 3, 2018
206043b
Merge branch 'peak_calling' into dev
anastasiia Dec 3, 2018
d0b72d8
Merge branch 'dev' of https://github.molgen.mpg.de/loosolab/masterJLU…
anastasiia Dec 3, 2018
8a369df
removed redundant parameter
renewiegandt Dec 3, 2018
476df0c
added conda; bed input
renewiegandt Dec 3, 2018
a5259fd
added parameter to pipeline
renewiegandt Dec 3, 2018
43b5d98
Update README.md
renewiegandt Dec 3, 2018
3fd48cc
Update README.md
renewiegandt Dec 3, 2018
6ec4a85
delete unused logger statements
anastasiia Dec 4, 2018
86f762c
overlap_with_known_TFBS: input/output changes
renewiegandt Dec 4, 2018
f2344b6
add bed sequence reduction script
HendrikSchultheis Dec 4, 2018
ab88742
add cdhit wrapper script
HendrikSchultheis Dec 4, 2018
606485e
Inital commit ! -> Migration to new project
Dec 4, 2018
fe1a516
Update .ignore
Dec 4, 2018
4683fdf
Create README.md
SebastianBeyvers Dec 4, 2018
966761f
fixed cdhit wrapper description
HendrikSchultheis Dec 4, 2018
b82dc6a
Readme updated
Dec 4, 2018
d8655fd
Check for local Folders
Dec 4, 2018
86f28cf
update the name of log-file
anastasiia Dec 4, 2018
835faec
update the name of the log-file
anastasiia Dec 4, 2018
dc7ad08
renaming channels
renewiegandt Dec 4, 2018
392fd12
Merge branch 'dev' into motif_estiamtion
renewiegandt Dec 4, 2018
96efc46
remove sequences below threshold (not above)
HendrikSchultheis Dec 4, 2018
8d06cdc
added parameter to config
renewiegandt Dec 4, 2018
ee6e0d7
finished create_GTF
renewiegandt Dec 4, 2018
9bd64be
show progressbar even in non interactive context
HendrikSchultheis Dec 4, 2018
cef1ec8
merge gtf to motif
renewiegandt Dec 4, 2018
9d23fd9
moved Modules to bin
renewiegandt Dec 4, 2018
c29c46d
Delete .gitignore
renewiegandt Dec 4, 2018
0b1c5c7
Changed config locations
Dec 4, 2018
1c22f75
Added Dev to new local branch
Dec 4, 2018
0529318
changed resultpath
Dec 4, 2018
8b2824e
Added Dev to new local branch
Dec 4, 2018
6cd5985
added documentation
HendrikSchultheis Dec 4, 2018
279c196
added cluster dependencies
HendrikSchultheis Dec 4, 2018
0a04f30
Changed Paths
Dec 4, 2018
9a6c8a2
Merge remote-tracking branch 'origin/dev_gtfdev' into dev_gtfdev
Dec 4, 2018
49c356b
Changed Paths
Dec 4, 2018
b25f97c
Changed Paths
Dec 4, 2018
afe9eb2
minor changes
renewiegandt Dec 4, 2018
f8a339e
Merge branch 'dev_gtfdev' into dev
Dec 4, 2018
a53aa72
Changed Paths
Dec 4, 2018
1285f1a
Changed Paths
Dec 4, 2018
512d262
moved config for gtf creation
renewiegandt Dec 4, 2018
4c3f013
Fixed: "No Module named bin Error"
Dec 4, 2018
55cf595
Merge remote-tracking branch 'origin/dev' into dev
Dec 4, 2018
a0165a9
Fixed: "No Module named bin Error" V2
Dec 4, 2018
396e183
Fixed: "No Module named bin Error" V2
Dec 4, 2018
2dee4f6
Fixed: "No Module named bin Error" V2
Dec 4, 2018
38c9bae
Fixed: "No Module named bin Error" V2
Dec 4, 2018
8c843fb
Fixed: "No Module named bin Error" V2
Dec 4, 2018
0f2822d
Update masterenv.yml
renewiegandt Dec 4, 2018
3faf37a
Merge pull request #2 from loosolab/cluster
renewiegandt Dec 4, 2018
bbf2117
Update masterenv.yml
renewiegandt Dec 4, 2018
d22a0e1
Fixed: "Paths" V2
Dec 4, 2018
142ee35
Merge remote-tracking branch 'origin/dev' into dev
Dec 4, 2018
6bc50be
added clustering + reducing
renewiegandt Dec 4, 2018
4607522
added parameter
renewiegandt Dec 4, 2018
9b258c6
Merge branch 'dev' of https://github.molgen.mpg.de/loosolab/masterJLU…
renewiegandt Dec 4, 2018
24a5a06
fixed minimum sequence length and added check
HendrikSchultheis Dec 5, 2018
689331e
implemented minoverlap_kmer & minoverlap_motif
HendrikSchultheis Dec 5, 2018
617ff9f
better docu; reorder data in reduce_kmer()
HendrikSchultheis Dec 5, 2018
3de2701
Rscript first line changes
JannikHamp Dec 5, 2018
527c4fa
Delete README.md
JannikHamp Dec 5, 2018
f742731
Changed ResultPath
Dec 5, 2018
f6942a5
Changed ResultPath
Dec 5, 2018
c4b7a0d
Changed ResultPath
Dec 5, 2018
a584817
added progress messages; thread param; documentation
HendrikSchultheis Dec 5, 2018
bb4e3ac
merge dev
JannikHamp Dec 5, 2018
014e40d
moved scripts to bin
renewiegandt Dec 5, 2018
6ff051d
Delete .gitignore
renewiegandt Dec 5, 2018
02e946d
added parameter to yaml
renewiegandt Dec 5, 2018
1cdd1c2
added tfbsscan.py to bin
renewiegandt Dec 5, 2018
b3fabce
added and moved parameter in config
renewiegandt Dec 5, 2018
760d91d
added default parameter in script
renewiegandt Dec 5, 2018
ef33438
moved uropa.config to config dir
renewiegandt Dec 5, 2018
4ff3088
added help message
renewiegandt Dec 5, 2018
01f3492
added new config file 'filter_unknown_motifs'
renewiegandt Dec 6, 2018
8476db5
implemented motif_occurence for cutoff computation
HendrikSchultheis Dec 6, 2018
f0e2cef
threads = 0 for all cores to match cd-hit
HendrikSchultheis Dec 6, 2018
1d466c5
help message
renewiegandt Dec 6, 2018
688cb00
Update README.md
renewiegandt Dec 6, 2018
65e17ce
Update README.md
renewiegandt Dec 6, 2018
0bdc101
Update README.md
renewiegandt Dec 6, 2018
a62154c
Update README.md
renewiegandt Dec 6, 2018
0467d7b
added more cdhit parameter
HendrikSchultheis Dec 6, 2018
4b44ca5
changed type of motif_occurence to double
HendrikSchultheis Dec 6, 2018
c6c42ca
renamed ignore_strand to strand
HendrikSchultheis Dec 6, 2018
b769d49
new clustering parameter added
HendrikSchultheis Dec 6, 2018
51d7efe
Merge remote-tracking branch 'origin/dev' into cluster
HendrikSchultheis Dec 6, 2018
2700ce7
Update README.md
renewiegandt Dec 6, 2018
1a45703
fixed ambiguous flags
HendrikSchultheis Dec 6, 2018
cd53d3c
fixed ambiguous flags + nameing bug in find_kmer_regions
HendrikSchultheis Dec 6, 2018
e68a89d
removed param threads
HendrikSchultheis Dec 6, 2018
92baa4d
added parameter to list and description
HendrikSchultheis Dec 6, 2018
8274bc3
Merge pull request #3 from loosolab/cluster
renewiegandt Dec 6, 2018
9b09e5a
minor changes
renewiegandt Dec 6, 2018
88db055
Merge branch 'dev' of https://github.molgen.mpg.de/loosolab/masterJLU…
renewiegandt Dec 6, 2018
a8377ce
removed debug code
renewiegandt Dec 6, 2018
224b6ee
bugfixing
JannikHamp Dec 6, 2018
0eb48b5
Update README.md
renewiegandt Dec 6, 2018
6aca457
...
JannikHamp Dec 6, 2018
119b876
fixing
JannikHamp Dec 6, 2018
b8f435f
fixing
JannikHamp Dec 6, 2018
df70d29
fixing
JannikHamp Dec 6, 2018
ff3570e
fixed failed merge
renewiegandt Dec 6, 2018
4988fab
fixed failed merge
renewiegandt Dec 6, 2018
db937a1
fixed failed merge
renewiegandt Dec 6, 2018
9c3f325
minor fix
renewiegandt Dec 6, 2018
56c89a1
Merge branch 'dev' of https://github.molgen.mpg.de/loosolab/masterJLU…
renewiegandt Dec 6, 2018
cc88f6c
added process merge_meme
renewiegandt Dec 6, 2018
deca64b
removed typo
renewiegandt Dec 7, 2018
b784574
setting up prozesses for motif clustering
renewiegandt Dec 7, 2018
bff5817
added script merge_similar_clusters.R
renewiegandt Dec 7, 2018
10a7e8c
added motif clustering
renewiegandt Dec 8, 2018
e0bc519
Update README.md
renewiegandt Dec 8, 2018
889899f
Update README.md
renewiegandt Dec 8, 2018
250ee92
Update compareBed.sh
JannikHamp Dec 10, 2018
e861fdc
minor changes
renewiegandt Dec 10, 2018
b45f676
Merge branch 'dev' of https://github.molgen.mpg.de/loosolab/masterJLU…
renewiegandt Dec 10, 2018
638133b
Implemented --dir parameter for remote data directory
Dec 10, 2018
0a0e9b8
correcting some errors
anastasiia Dec 10, 2018
c1cb8ca
Update README.md
renewiegandt Dec 10, 2018
42d3238
adding comment about the footprints_extraction
anastasiia Dec 10, 2018
198334a
Update README.md
renewiegandt Dec 10, 2018
5a7a526
improvement of parameters for my part
anastasiia Dec 10, 2018
ffed012
improve parameters for my part once more
anastasiia Dec 10, 2018
037b2bf
Merge branch 'dev' of https://github.molgen.mpg.de/loosolab/masterJLU…
renewiegandt Dec 10, 2018
15aa68a
help message fix
renewiegandt Dec 10, 2018
30c17cc
Updated README.
Dec 10, 2018
f0ad6b3
Merge remote-tracking branch 'origin/dev' into dev
Dec 10, 2018
a3c452d
Updated README.
Dec 10, 2018
55bd528
Update Readme
SebastianBeyvers Dec 10, 2018
479ddad
Added more comments!
Dec 10, 2018
79e263e
Merge remote-tracking branch 'origin/dev' into dev
Dec 10, 2018
22a36a5
Changed Readme
Dec 10, 2018
69434e2
Changed Readme and updated mm_config
Dec 11, 2018
5f849b8
data error fixed
Dec 11, 2018
476c25d
error in output fixed
Dec 11, 2018
7357dc2
Fixed small bug in output formatting for mm10/mm9
Dec 12, 2018
367dd33
updated README
renewiegandt Dec 12, 2018
9026ad9
minor bugfix
renewiegandt Dec 12, 2018
023bf56
added T parameter for sorting
renewiegandt Dec 12, 2018
8c939de
removed parameter
renewiegandt Dec 12, 2018
d6f9898
tfbsscan optional; minor fixes/changes
renewiegandt Dec 12, 2018
ff52b92
fixed yml-file
renewiegandt Dec 12, 2018
9ff4281
minor changes
renewiegandt Dec 12, 2018
748fc0b
Merge branch 'dev' of https://github.molgen.mpg.de/loosolab/masterJLU…
renewiegandt Dec 12, 2018
725d7c0
Update README.md
renewiegandt Dec 12, 2018
5275f18
we do not need this env
anastasiia Dec 13, 2018
1c5f769
Update README.md
renewiegandt Dec 13, 2018
7d05d05
Update merge.R
JannikHamp Dec 13, 2018
dc2363e
check prameters; minor fixes
renewiegandt Dec 13, 2018
6f0076b
Merge branch 'dev' of https://github.molgen.mpg.de/loosolab/masterJLU…
renewiegandt Dec 13, 2018
1f45269
Update README.md
renewiegandt Dec 13, 2018
b69682a
Update README.md
renewiegandt Dec 13, 2018
ee03951
Fixed bug in formatting (gff vs gtf). Converted feature description t…
Dec 14, 2018
7aaf90c
fixes
renewiegandt Dec 14, 2018
9318075
resolve merge conflict
renewiegandt Dec 14, 2018
63548bd
fix
renewiegandt Dec 14, 2018
0d9892a
removed echo true from process
renewiegandt Dec 14, 2018
6513638
Readme formatted
SebastianBeyvers Dec 14, 2018
8a306b1
$"motifs" wrong -> "$motifs" right :)
JannikHamp Dec 14, 2018
fa5c838
minor fix
renewiegandt Dec 14, 2018
156e417
Merge branch 'dev' of https://github.molgen.mpg.de/loosolab/masterJLU…
renewiegandt Dec 14, 2018
8669b3e
added info for FilterMotifs.stats
JannikHamp Dec 14, 2018
104 changes: 103 additions & 1 deletion README.md
@@ -1,3 +1,105 @@
# masterJLU2018

de novo motif discovery and evaluation based on footprints identified by TOBIAS
De novo motif discovery and evaluation based on footprints identified by TOBIAS

For further information read the [documentation](https://github.molgen.mpg.de/loosolab/masterJLU2018/wiki)

## Dependencies
* [conda](https://conda.io/docs/user-guide/install/linux.html)
* [Nextflow](https://www.nextflow.io/)
* [MEME-Suite](http://meme-suite.org/doc/install.html?man_type=web)

## Installation
Start by installing all dependencies listed above. The [environment paths for MEME-Suite](http://meme-suite.org/doc/install.html?man_type=web#installingtar) must be set,
which can be done with the following commands:
```
export PATH=[meme-suite installation path]/libexec/meme-[meme-suite version]:$PATH
export PATH=[meme-suite installation path]/bin:$PATH
```
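
The two exports can be wrapped in a small snippet that also checks whether the MEME-Suite tools are reachable afterwards. This is only a sketch: `MEME_HOME` and `MEME_VERSION` are placeholder variables that are not part of the pipeline, and their default values are examples. `glam2` and `tomtom` are used as probes because the pipeline calls both.

```shell
# Placeholder values (assumptions): point these at your own installation.
MEME_HOME="${MEME_HOME:-$HOME/meme}"
MEME_VERSION="${MEME_VERSION:-5.0.2}"

# Same two exports as above, written with variables.
export PATH="$MEME_HOME/libexec/meme-$MEME_VERSION:$PATH"
export PATH="$MEME_HOME/bin:$PATH"

# Report whether the tools used by the pipeline can now be found.
for tool in glam2 tomtom; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found:   $tool"
  else
    echo "missing: $tool (check MEME_HOME and MEME_VERSION)"
  fi
done
```

Adding the lines to a shell start-up file such as `~/.bashrc` keeps the setting across sessions.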


Download all files from the [GitHub repository](https://github.molgen.mpg.de/loosolab/masterJLU2018).
The Nextflow script needs a conda environment to run. Nextflow can create the required environment from the given YAML file.
On some systems, Nextflow aborts the run with the following error:
```
Caused by:
Failed to create Conda environment
command: conda env create --prefix --file env.yml
status : 143
message:
```
If this error occurs, you have to create the environment before starting the pipeline.
To create the environment, you need the yml-file from the repository.
Run the following commands to create the environment:
```console
path=[Path to given masterenv.yml file]
conda env create --name masterenv -f=$path
```
Once the environment is created, set the variable 'path_env' in the configuration file to its path.
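
To find the value for 'path_env', `conda env list` prints each environment name next to its directory. A minimal sketch, assuming the environment was created under the name `masterenv` as in the command above; the helper name `env_path_from_list` is made up for this example:

```shell
# Print the directory of the environment named "$1" from `conda env list`
# output on stdin. The active environment is marked with "*", so the path
# is always taken from the last field.
env_path_from_list() {
  awk -v name="$1" '$1 == name { print $NF; exit }'
}

if command -v conda >/dev/null 2>&1; then
  path_env=$(conda env list | env_path_from_list masterenv)
  echo "path_env = ${path_env:-not found}"
else
  echo "conda is not on PATH"
fi
```

The helper splits on whitespace, so it does not handle environment directories that contain spaces.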

**Important Note:** For conda, the channel bioconda needs to be set as the highest priority! This is required because two different packages with the same name exist in different channels. The pipeline needs the package jellyfish from the channel bioconda and **NOT** the jellyfish package from the channel conda-forge!
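
One way to set the priority is `conda config --add channels bioconda`, which puts the channel at the top of the list in the user's `.condarc`. The sketch below does that and then reads the order back. The helper `first_channel` is a name made up for this example, and the snippet assumes the list format printed by `conda config --show channels`.

```shell
# Return the first entry of a YAML-style channel list read from stdin.
first_channel() {
  awk '/^ *- / { print $2; exit }'
}

if command -v conda >/dev/null 2>&1; then
  # --add prepends, so bioconda ends up with the highest priority.
  conda config --add channels bioconda
  top=$(conda config --show channels | first_channel)
  if [ "$top" = "bioconda" ]; then
    echo "bioconda has the highest priority"
  else
    echo "highest-priority channel is '$top', expected bioconda"
  fi
else
  echo "conda is not on PATH"
fi
```

If the check reports a different channel, another configuration file on the system may be overriding the user-level setting.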

## Quick Start
```console
nextflow run pipeline.nf --bigwig [BigWig-file] --bed [BED-file] --genome_fasta [FASTA-file] --motif_db [MEME-file]
```
## Parameters
For a detailed overview of all parameters, follow this [link](https://github.molgen.mpg.de/loosolab/masterJLU2018/wiki/Configuration).
```
Required arguments:
--bigwig Path to BigWig-file with scores on the peaks of interest
--bed Path to BED-file with peaks of interest corresponding to the BigWig file
--genome_fasta Path to genome in FASTA-format
--motif_db Path to motif-database in MEME-format


Optional arguments:

--tfbs_path Path to directory with output BED-files from tfbsscan. If given, tfbsscan will not be run.

Footprint extraction:
--window_length INT (Default: 200) Length of the window
--step INT (Default: 100) Interval by which the window slides
--percentage INT (Default: 0) Percentage to be added to the background while searching for footprints

Filter unknown motifs:
--min_size_fp INT (Default: 10)
--max_size_fp INT (Default: 100)

Cluster:
Sequence preparation/ reduction:
--kmer INT Kmer length (Default: 10)
--aprox_motif_len INT Motif length (Default: 10)
--motif_occurence FLOAT Percentage of motifs over all sequences. Use 1 (Default) to assume every sequence contains a motif.
--min_seq_length INT Remove all sequences below this value. (Default: 10)

Clustering:
--global INT Global (=1) or local (=0) alignment. (Default: 0)
--identity FLOAT Identity threshold. (Default: 0.8)
--sequence_coverage INT Minimum aligned nucleotides on both sequences. (Default: 8)
--memory INT Memory limit in MB. 0 for unlimited. (Default: 800)
--throw_away_seq INT Remove all sequences equal or below this length before clustering. (Default: 9)
--strand INT Align +/+ & +/- (= 1). Or align only +/+ (= 0). (Default: 0)

Motif estimation:
--min_seq INT Minimum number of sequences required in the FASTA-files for GLAM2 (Default: 100)
--motif_min_key INT Minimum number of key positions (aligned columns) (Default: 8)
--motif_max_key INT Maximum number of key positions (aligned columns) (Default: 20)
--iteration INT Number of iterations done by glam2. More Iterations: better results, higher runtime. (Default: 10000)
--tomtom_treshold FLOAT Threshold for similarity score. (Default: 0.01)

Motif clustering:
--cluster_motif BOOLEAN If set to 1, motifs will be clustered (Default: 0)
--edge_weight INT Minimum weight of edges in motif-cluster-graph (Default: 5)
--motif_similarity_thresh FLOAT Threshold for motif similarity score (Default: 0.00001)

Creating GTF:
--tissue STRING Filter for one or more tissue/category activity, categories as in JSON config (Default: None)
--organism STRING Source organism: [ hg19 | hg38 or mm9 | mm10 ] (Default: hg38)

All arguments can be set in the configuration files.
```
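
Optional arguments are appended to the Quick Start command in the same `--name value` form. A sketch with made-up file names (`footprints.bw`, `peaks.bed`, `genome.fa` and `motifs.meme` are placeholders) and two optional parameters from the list above. It only prints the command unless `nextflow` and `pipeline.nf` are both available:

```shell
# Placeholder input files (assumptions); replace with your own data.
set -- \
  --bigwig footprints.bw \
  --bed peaks.bed \
  --genome_fasta genome.fa \
  --motif_db motifs.meme \
  --organism mm10 \
  --min_seq 50

if command -v nextflow >/dev/null 2>&1 && [ -f pipeline.nf ]; then
  nextflow run pipeline.nf "$@"
else
  echo "would run: nextflow run pipeline.nf $*"
fi
```

Values that stay the same between runs are better placed in the configuration files, so the command line only carries what changes.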



For further information read the [documentation](https://github.molgen.mpg.de/loosolab/masterJLU2018/wiki)