
Merge dev to motif_estimation to bring branch up to date #9

Merged
merged 186 commits into from
Dec 18, 2018
Commits
16110f5
Add files via upload
JannikHamp Nov 20, 2018
a9d0070
Add files via upload
JannikHamp Nov 20, 2018
26d2764
Update README.md
JannikHamp Nov 20, 2018
6fd9948
Update README.md
JannikHamp Nov 20, 2018
2fdbf0d
update, working with unmerged motif files now
JannikHamp Nov 28, 2018
9028820
update, bugfix output parameter error
JannikHamp Nov 29, 2018
f2344b6
add bed sequence reduction script
HendrikSchultheis Dec 4, 2018
ab88742
add cdhit wrapper script
HendrikSchultheis Dec 4, 2018
966761f
fixed cdhit wrapper description
HendrikSchultheis Dec 4, 2018
96efc46
remove sequences below threshold (not above)
HendrikSchultheis Dec 4, 2018
9bd64be
show progressbar even in non interactive context
HendrikSchultheis Dec 4, 2018
c29c46d
Delete .gitignore
renewiegandt Dec 4, 2018
0b1c5c7
Changed config locations
Dec 4, 2018
1c22f75
Added Dev to new local branch
Dec 4, 2018
0529318
changed resultpath
Dec 4, 2018
8b2824e
Added Dev to new local branch
Dec 4, 2018
6cd5985
added documentation
HendrikSchultheis Dec 4, 2018
279c196
added cluster dependencies
HendrikSchultheis Dec 4, 2018
0a04f30
Changed Paths
Dec 4, 2018
9a6c8a2
Merge remote-tracking branch 'origin/dev_gtfdev' into dev_gtfdev
Dec 4, 2018
49c356b
Changed Paths
Dec 4, 2018
b25f97c
Changed Paths
Dec 4, 2018
afe9eb2
minor changes
renewiegandt Dec 4, 2018
f8a339e
Merge branch 'dev_gtfdev' into dev
Dec 4, 2018
a53aa72
Changed Paths
Dec 4, 2018
1285f1a
Changed Paths
Dec 4, 2018
512d262
moved config for gtf creation
renewiegandt Dec 4, 2018
4c3f013
Fixed: "No Module named bin Error"
Dec 4, 2018
55cf595
Merge remote-tracking branch 'origin/dev' into dev
Dec 4, 2018
a0165a9
Fixed: "No Module named bin Error" V2
Dec 4, 2018
396e183
Fixed: "No Module named bin Error" V2
Dec 4, 2018
2dee4f6
Fixed: "No Module named bin Error" V2
Dec 4, 2018
38c9bae
Fixed: "No Module named bin Error" V2
Dec 4, 2018
8c843fb
Fixed: "No Module named bin Error" V2
Dec 4, 2018
0f2822d
Update masterenv.yml
renewiegandt Dec 4, 2018
3faf37a
Merge pull request #2 from loosolab/cluster
renewiegandt Dec 4, 2018
bbf2117
Update masterenv.yml
renewiegandt Dec 4, 2018
d22a0e1
Fixed: "Paths" V2
Dec 4, 2018
142ee35
Merge remote-tracking branch 'origin/dev' into dev
Dec 4, 2018
6bc50be
added clustering + reducing
renewiegandt Dec 4, 2018
4607522
added parameter
renewiegandt Dec 4, 2018
9b258c6
Merge branch 'dev' of https://github.molgen.mpg.de/loosolab/masterJLU…
renewiegandt Dec 4, 2018
24a5a06
fixed minimum sequence length and added check
HendrikSchultheis Dec 5, 2018
689331e
implemented minoverlap_kmer & minoverlap_motif
HendrikSchultheis Dec 5, 2018
617ff9f
better docu; reorder data in reduce_kmer()
HendrikSchultheis Dec 5, 2018
3de2701
Rscript first line changes
JannikHamp Dec 5, 2018
527c4fa
Delete README.md
JannikHamp Dec 5, 2018
f742731
Changed ResultPath
Dec 5, 2018
f6942a5
Changed ResultPath
Dec 5, 2018
c4b7a0d
Changed ResultPath
Dec 5, 2018
a584817
added progress messages; thread param; documentation
HendrikSchultheis Dec 5, 2018
bb4e3ac
merge dev
JannikHamp Dec 5, 2018
014e40d
moved scripts to bin
renewiegandt Dec 5, 2018
6ff051d
Delete .gitignore
renewiegandt Dec 5, 2018
02e946d
added parameter to yaml
renewiegandt Dec 5, 2018
1cdd1c2
added tfbsscan.py to bin
renewiegandt Dec 5, 2018
b3fabce
added and moved parameter in config
renewiegandt Dec 5, 2018
760d91d
added default parameter in script
renewiegandt Dec 5, 2018
ef33438
moved uropa.config to config dir
renewiegandt Dec 5, 2018
4ff3088
added help message
renewiegandt Dec 5, 2018
01f3492
added new config file 'filter_unknown_motifs'
renewiegandt Dec 6, 2018
8476db5
implemented motif_occurence for cutoff computation
HendrikSchultheis Dec 6, 2018
f0e2cef
threads = 0 for all cores to match cd-hit
HendrikSchultheis Dec 6, 2018
1d466c5
help message
renewiegandt Dec 6, 2018
688cb00
Update README.md
renewiegandt Dec 6, 2018
65e17ce
Update README.md
renewiegandt Dec 6, 2018
0bdc101
Update README.md
renewiegandt Dec 6, 2018
a62154c
Update README.md
renewiegandt Dec 6, 2018
0467d7b
added more cdhit parameter
HendrikSchultheis Dec 6, 2018
4b44ca5
changed type of motif_occurence to double
HendrikSchultheis Dec 6, 2018
c6c42ca
renamed ignore_strand to strand
HendrikSchultheis Dec 6, 2018
b769d49
new clustering parameter added
HendrikSchultheis Dec 6, 2018
51d7efe
Merge remote-tracking branch 'origin/dev' into cluster
HendrikSchultheis Dec 6, 2018
2700ce7
Update README.md
renewiegandt Dec 6, 2018
1a45703
fixed ambiguous flags
HendrikSchultheis Dec 6, 2018
cd53d3c
fixed ambiguous flags + nameing bug in find_kmer_regions
HendrikSchultheis Dec 6, 2018
e68a89d
removed param threads
HendrikSchultheis Dec 6, 2018
92baa4d
added parameter to list and description
HendrikSchultheis Dec 6, 2018
8274bc3
Merge pull request #3 from loosolab/cluster
renewiegandt Dec 6, 2018
9b09e5a
minor changes
renewiegandt Dec 6, 2018
88db055
Merge branch 'dev' of https://github.molgen.mpg.de/loosolab/masterJLU…
renewiegandt Dec 6, 2018
a8377ce
removed debug code
renewiegandt Dec 6, 2018
224b6ee
bugfixing
JannikHamp Dec 6, 2018
0eb48b5
Update README.md
renewiegandt Dec 6, 2018
6aca457
...
JannikHamp Dec 6, 2018
119b876
fixing
JannikHamp Dec 6, 2018
b8f435f
fixing
JannikHamp Dec 6, 2018
df70d29
fixing
JannikHamp Dec 6, 2018
ff3570e
fixed failed merge
renewiegandt Dec 6, 2018
4988fab
fixed failed merge
renewiegandt Dec 6, 2018
db937a1
fixed failed merge
renewiegandt Dec 6, 2018
9c3f325
minor fix
renewiegandt Dec 6, 2018
56c89a1
Merge branch 'dev' of https://github.molgen.mpg.de/loosolab/masterJLU…
renewiegandt Dec 6, 2018
cc88f6c
added process merge_meme
renewiegandt Dec 6, 2018
deca64b
removed typo
renewiegandt Dec 7, 2018
b784574
setting up prozesses for motif clustering
renewiegandt Dec 7, 2018
bff5817
added script merge_similar_clusters.R
renewiegandt Dec 7, 2018
10a7e8c
added motif clustering
renewiegandt Dec 8, 2018
e0bc519
Update README.md
renewiegandt Dec 8, 2018
889899f
Update README.md
renewiegandt Dec 8, 2018
250ee92
Update compareBed.sh
JannikHamp Dec 10, 2018
e861fdc
minor changes
renewiegandt Dec 10, 2018
b45f676
Merge branch 'dev' of https://github.molgen.mpg.de/loosolab/masterJLU…
renewiegandt Dec 10, 2018
638133b
Implemented --dir parameter for remote data directory
Dec 10, 2018
0a0e9b8
correcting some errors
anastasiia Dec 10, 2018
c1cb8ca
Update README.md
renewiegandt Dec 10, 2018
42d3238
adding comment about the footprints_extraction
anastasiia Dec 10, 2018
198334a
Update README.md
renewiegandt Dec 10, 2018
5a7a526
improvement of parameters for my part
anastasiia Dec 10, 2018
ffed012
improve parameters for my part once more
anastasiia Dec 10, 2018
037b2bf
Merge branch 'dev' of https://github.molgen.mpg.de/loosolab/masterJLU…
renewiegandt Dec 10, 2018
15aa68a
help message fix
renewiegandt Dec 10, 2018
30c17cc
Updated README.
Dec 10, 2018
f0ad6b3
Merge remote-tracking branch 'origin/dev' into dev
Dec 10, 2018
a3c452d
Updated README.
Dec 10, 2018
55bd528
Update Readme
SebastianBeyvers Dec 10, 2018
479ddad
Added more comments!
Dec 10, 2018
79e263e
Merge remote-tracking branch 'origin/dev' into dev
Dec 10, 2018
22a36a5
Changed Readme
Dec 10, 2018
69434e2
Changed Readme and updated mm_config
Dec 11, 2018
5f849b8
data error fixed
Dec 11, 2018
476c25d
error in output fixed
Dec 11, 2018
7357dc2
Fixed small bug in output formatting for mm10/mm9
Dec 12, 2018
367dd33
updated README
renewiegandt Dec 12, 2018
9026ad9
minor bugfix
renewiegandt Dec 12, 2018
023bf56
added T parameter for sorting
renewiegandt Dec 12, 2018
8c939de
removed parameter
renewiegandt Dec 12, 2018
d6f9898
tfbsscan optional; minor fixes/changes
renewiegandt Dec 12, 2018
ff52b92
fixed yml-file
renewiegandt Dec 12, 2018
9ff4281
minor changes
renewiegandt Dec 12, 2018
748fc0b
Merge branch 'dev' of https://github.molgen.mpg.de/loosolab/masterJLU…
renewiegandt Dec 12, 2018
725d7c0
Update README.md
renewiegandt Dec 12, 2018
5275f18
we do not need this env
anastasiia Dec 13, 2018
1c5f769
Update README.md
renewiegandt Dec 13, 2018
7d05d05
Update merge.R
JannikHamp Dec 13, 2018
dc2363e
check prameters; minor fixes
renewiegandt Dec 13, 2018
6f0076b
Merge branch 'dev' of https://github.molgen.mpg.de/loosolab/masterJLU…
renewiegandt Dec 13, 2018
1f45269
Update README.md
renewiegandt Dec 13, 2018
b69682a
Update README.md
renewiegandt Dec 13, 2018
ee03951
Fixed bug in formatting (gff vs gtf). Converted feature description t…
Dec 14, 2018
7aaf90c
fixes
renewiegandt Dec 14, 2018
9318075
resolve merge conflict
renewiegandt Dec 14, 2018
0805573
standalone version
JannikHamp Dec 14, 2018
6ef5dcb
Delete compareBed.sh
JannikHamp Dec 14, 2018
91e1dcd
Delete maxScore.R
JannikHamp Dec 14, 2018
6588cdd
Delete merge.R
JannikHamp Dec 14, 2018
63548bd
fix
renewiegandt Dec 14, 2018
0d9892a
removed echo true from process
renewiegandt Dec 14, 2018
6513638
Readme formatted
SebastianBeyvers Dec 14, 2018
8a306b1
$"motifs" wrong -> "$motifs" correct :)
JannikHamp Dec 14, 2018
fa5c838
minor fix
renewiegandt Dec 14, 2018
156e417
Merge branch 'dev' of https://github.molgen.mpg.de/loosolab/masterJLU…
renewiegandt Dec 14, 2018
8669b3e
added info for FilterMotifs.stats
JannikHamp Dec 14, 2018
614a0bb
Merge pull request #4 from loosolab/dev
renewiegandt Dec 14, 2018
e54d535
Update pipeline.nf
renewiegandt Dec 14, 2018
906942e
major bugfix
renewiegandt Dec 14, 2018
d67f477
Merge branch 'dev' of https://github.molgen.mpg.de/loosolab/masterJLU…
renewiegandt Dec 14, 2018
1e3b771
minor changes
renewiegandt Dec 14, 2018
fafe237
Update README.md
renewiegandt Dec 14, 2018
aed0e60
Update README.md
renewiegandt Dec 14, 2018
3ec6036
Update README.md
renewiegandt Dec 14, 2018
6f263b5
Update pipeline.nf
renewiegandt Dec 14, 2018
13d4447
Update pipeline.nf
renewiegandt Dec 14, 2018
565eeaa
Update README.md
renewiegandt Dec 14, 2018
43646c6
Update pipeline.nf
renewiegandt Dec 14, 2018
a390459
Update pipeline.nf
renewiegandt Dec 14, 2018
1d86bc9
Merge branch 'master' into dev
renewiegandt Dec 14, 2018
3a7ef0c
Merge pull request #5 from loosolab/dev
renewiegandt Dec 14, 2018
4b2150c
rename call_peaks to footrptins_extraction
anastasiia Dec 15, 2018
9a2d3eb
Merge pull request #6 from loosolab/master
renewiegandt Dec 15, 2018
62add26
minor fixes
renewiegandt Dec 15, 2018
0f3f388
Merge pull request #7 from loosolab/dev
renewiegandt Dec 15, 2018
6954ecd
Update README.md
renewiegandt Dec 15, 2018
6c42389
Update README.md
renewiegandt Dec 15, 2018
a159e41
Update pipeline.nf
renewiegandt Dec 15, 2018
550ad78
added parameter help
renewiegandt Dec 15, 2018
1bb8521
added separate motif_estimation script
renewiegandt Dec 15, 2018
76b07f9
Update pipeline.nf
renewiegandt Dec 15, 2018
e0ec81b
Update README.md
renewiegandt Dec 15, 2018
50fb433
documentation
renewiegandt Dec 15, 2018
c1f05e2
Merge branch 'master' of https://github.molgen.mpg.de/loosolab/master…
renewiegandt Dec 15, 2018
c998452
Changes output path for create_gtf
renewiegandt Dec 15, 2018
a2f15b3
added complete info text
renewiegandt Dec 15, 2018
e330274
bug fix
renewiegandt Dec 16, 2018
2b99bb8
Merge pull request #8 from loosolab/master
renewiegandt Dec 17, 2018
e4256a9
$workdir/merged.bed -> "$workdir"/merged.bed
JannikHamp Dec 17, 2018
171 changes: 0 additions & 171 deletions .gitignore

This file was deleted.

96 changes: 89 additions & 7 deletions README.md
@@ -2,27 +2,109 @@

De novo motif discovery and evaluation based on footprints identified by TOBIAS

For further information read the [documentation](https://github.molgen.mpg.de/loosolab/masterJLU2018/wiki)

## Dependencies
* [conda](https://conda.io/docs/user-guide/install/linux.html)
* [Nextflow](https://www.nextflow.io/)
* [MEME-Suite](http://meme-suite.org/doc/install.html?man_type=web)

## Installation
Start with installing all dependencies listed above.
Download all files from the [GitHub repository](https://github.molgen.mpg.de/loosolab/masterJLU2018).
The Nextflow script needs a conda environment to run. To create this environment you need the yml-file from the repository.
Start with installing all dependencies listed above. It is required to set the [environment paths for meme-suite](http://meme-suite.org/doc/install.html?man_type=web#installingtar).
This can be done with the following commands:
```
export PATH=[meme-suite installation path]/libexec/meme-[meme-suite version]:$PATH
export PATH=[meme-suite installation path]/bin:$PATH
```
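As a concrete illustration of the two export lines above, the following sketch fills in the placeholders with a hypothetical installation prefix (`$HOME/meme`) and a hypothetical version (`5.0.2`); neither value comes from this README, so adjust both to the local installation. It then reports whether `glam2` and `tomtom`, two MEME-Suite tools, can be found.

```shell
# Hypothetical values; replace with the local installation prefix and version.
meme_prefix="$HOME/meme"
meme_version="5.0.2"

# Same two exports as above, with the placeholders filled in.
export PATH="$meme_prefix/libexec/meme-$meme_version:$PATH"
export PATH="$meme_prefix/bin:$PATH"

# Report whether the MEME-Suite tools can now be resolved.
for tool in glam2 tomtom; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: $(command -v "$tool")"
    else
        echo "$tool: not found on PATH"
    fi
done
```

If a tool is reported as not found, the prefix or version string most likely does not match the directory layout of the installed MEME-Suite.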


Download all files from the [GitHub repository](https://github.molgen.mpg.de/loosolab/masterJLU2018).
The Nextflow script needs a conda environment to run. Nextflow can create the needed environment from the given yaml-file.
On some systems Nextflow exits the run with the following error:
```
Caused by:
Failed to create Conda environment
command: conda env create --prefix --file env.yml
status : 143
message:
```
If this error occurs, you have to create the environment before starting the pipeline.
To create this environment you need the yml-file from the repository.
Run the following commands to create the environment:
```console
path=[Path to given masterenv.yml file]
conda env create --name masterenv -f=$path
source activate masterenv
```
When the environment is created, set the variable 'path_env' in the configuration file to its path.

**Important Note:** The bioconda channel must have the highest priority in conda! This is required because two different packages share the same name in different channels. The pipeline needs the package jellyfish from the bioconda channel and **NOT** the jellyfish package from the conda-forge channel!
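One way to express this priority is the channel order in a `.condarc` file, where channels listed first take precedence. The sketch below only writes an example channel list to a scratch file for review; that bioconda comes first is taken from the note above, while the presence and order of `conda-forge` and `defaults` are assumptions.

```shell
# Sketch: write an example channel list to a scratch file for review.
# Channels listed first have the highest priority in a .condarc file.
example_condarc="$(mktemp)"
cat > "$example_condarc" <<'EOF'
channels:
  - bioconda
  - conda-forge
  - defaults
EOF
cat "$example_condarc"

# Equivalent conda commands (commented out; they modify the user configuration).
# Each --add puts the channel at the top of the list, so bioconda is added last.
# conda config --add channels defaults
# conda config --add channels conda-forge
# conda config --add channels bioconda
```

The scratch file is not read by conda; copy the list into the actual configuration or run the commented commands once the order looks right.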

## Quick Start
```console
nextflow run pipeline.nf --input [INPUT-file] --bed [INPUT-bed] --genome_fasta [path to file] --jaspar_db [path to motif database as meme-file] --config uropa.config
nextflow run pipeline.nf --bigwig [BigWig-file] --bed [BED-file] --genome_fasta [FASTA-file] --motif_db [MEME-file] --config [UROPA-config-file]
```
## Parameters
For a detailed overview of all parameters follow this [link](https://github.molgen.mpg.de/loosolab/masterJLU2018/wiki/Configuration).
```
## Parameter
Required arguments:
--bigwig Path to BigWig-file
--bed Path to BED-file
--genome_fasta Path to genome in FASTA-format
--motif_db Path to motif-database in MEME-format
--config Path to UROPA configuration file
--create_known_tfbs_path Path to directory where output from tfbsscan (known motifs) are stored.
Path can be set as tfbs_path in next run. (Default: './')
--out Output Directory (Default: './out/')

Optional arguments:

--help [0|1] 1 to show this help message. (Default: 0)
--tfbs_path Path to directory with output from tfbsscan. If given tfbsscan will not be run.

Footprint extraction:
--window_length INT This parameter sets the length of a sliding window. (Default: 200)
--step INT This parameter sets the number of positions to slide the window forward. (Default: 100)
--percentage INT Threshold in percent (Default: 0)

Filter unknown motifs:
--min_size_fp INT Minimum sequence length threshold. Smaller sequences are discarded. (Default: 10)
--max_size_fp INT Maximum sequence length threshold. Discards all sequences longer than this value. (Default: 100)

Clustering:
Sequence preparation/ reduction:
--kmer INT Kmer length (Default: 10)
--aprox_motif_len INT Motif length (Default: 10)
--motif_occurence FLOAT Percentage of motifs over all sequences. Use 1 (Default) to assume every sequence contains a motif.
--min_seq_length INT Remove all sequences below this value. (Default: 10)

Clustering:
--global INT Global (=1) or local (=0) alignment. (Default: 0)
--identity FLOAT Identity threshold. (Default: 0.8)
--sequence_coverage INT Minimum aligned nucleotides on both sequences. (Default: 8)
--memory INT Memory limit in MB. 0 for unlimited. (Default: 800)
--throw_away_seq INT Remove all sequences equal or below this length before clustering. (Default: 9)
--strand INT Align +/+ & +/- (= 1). Or align only +/+ (= 0). (Default: 0)

Motif estimation:
--min_seq INT Sets the minimum number of sequences required for the FASTA-files given to GLAM2. (Default: 100)
--motif_min_key INT Minimum number of key positions (aligned columns) in the alignment done by GLAM2. (Default: 8)
--motif_max_key INT Maximum number of key positions (aligned columns) in the alignment done by GLAM2. (Default: 20)
--iteration INT Number of iterations done by GLAM2. More iterations: better results, higher runtime. (Default: 10000)
--tomtom_treshold FLOAT Threshold for similarity score. (Default: 0.01)
--best_motif INT Get the best X motifs per cluster. (Default: 3)

Motif clustering:
--cluster_motif Boolean If 1, the pipeline clusters motifs. If 0, it does not. (Default: 0)
--edge_weight INT Minimum weight of edges in motif-cluster-graph (Default: 5)
--motif_similarity_thresh FLOAT Threshold for motif similarity score (Default: 0.00001)

Creating GTF:
--organism [hg38 | hg19 | mm9 | mm10] Input organism
--tissues List/String List of one or more keywords for tissue-/category-activity, categories must be specified as in JSON config
All arguments can be set in the configuration files.
```
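To show how the arguments in the list above combine, here is a minimal sketch that assembles one invocation with the listed path arguments plus two optional overrides (`--kmer` and `--identity`). All file paths are hypothetical placeholders, the override values are purely illustrative, and the sketch makes no claim about which of the remaining sections need explicit values for a given data set.

```shell
# Hypothetical input paths; none of these file names come from the repository.
bigwig="data/footprints.bw"
bed="data/peaks.bed"
genome="data/genome.fa"
motif_db="data/motifs.meme"
uropa_config="config/uropa.config"

# Path arguments first, then two optional overrides from the list above.
cmd="nextflow run pipeline.nf \
--bigwig $bigwig --bed $bed --genome_fasta $genome \
--motif_db $motif_db --config $uropa_config \
--out ./out/ --kmer 12 --identity 0.9"

# Print the assembled command for review before running it.
echo "$cmd"
```

Once the paths point at real files, the printed line can be pasted into a terminal. Building the command as a plain string keeps the sketch short but assumes paths without spaces.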



For further information read the [documentation](https://github.molgen.mpg.de/loosolab/masterJLU2018/wiki)