
Dev #65

Merged
merged 219 commits into from
Jan 13, 2019

Commits
3b4eb71
print number of reduced kmer
HendrikSchultheis Dec 6, 2018
2b99bb8
Merge pull request #8 from loosolab/master
renewiegandt Dec 17, 2018
e4256a9
$workdir/merged.bed -> "$workdir"/merged.bed
JannikHamp Dec 17, 2018
58c8478
Merge pull request #9 from loosolab/dev
renewiegandt Dec 18, 2018
cc23250
Bugfix in bed_to_fasta.R: Get last and second last instead of fixed i…
renewiegandt Dec 18, 2018
62f6f3f
bed_to_fasta.R: Improved documentation
renewiegandt Dec 18, 2018
cf9dcd8
bed_to_fasta.R: Imporved parametercalling with optparse
renewiegandt Dec 19, 2018
6d5c604
adaption of pipeline.nf to changes in bed_to_fasta.R
renewiegandt Dec 19, 2018
ce52871
Refactoring
renewiegandt Dec 19, 2018
4ec60f7
Merge pull request #13 from loosolab/dev
renewiegandt Dec 19, 2018
b74af7f
changing the separator in the last line
anastasiia Dec 19, 2018
117ab50
adding a column name strand containing always .
anastasiia Dec 19, 2018
98985d1
refactoring; renamed reduce_bed to reduce_sequence
HendrikSchultheis Dec 19, 2018
8faf399
Merge pull request #15 from loosolab/peak_calling
renewiegandt Dec 19, 2018
e0b9d38
check whether jellyfish is installed
HendrikSchultheis Dec 19, 2018
1730868
reduce_bed renamed to reduce_sequence
HendrikSchultheis Dec 19, 2018
3c4f733
get_best_motif.py: fixed bug which caused to print motif header as la…
renewiegandt Dec 19, 2018
88fa298
check whether jellyfish is installed
HendrikSchultheis Dec 19, 2018
e17d1db
check whether cdhit is installed
HendrikSchultheis Dec 19, 2018
dcd185e
omit TODO
HendrikSchultheis Dec 19, 2018
4c16f6f
check for header and forward it if provided
HendrikSchultheis Dec 19, 2018
5a7c84e
automatically detect and keep column names if provided
HendrikSchultheis Dec 19, 2018
97464ca
added author; better missing input error
HendrikSchultheis Dec 19, 2018
cc532bf
added author
HendrikSchultheis Dec 19, 2018
8389226
Fixed typos in get_best_motif.py
renewiegandt Dec 19, 2018
2fca158
Reads BED-files with or without header
renewiegandt Dec 19, 2018
4dea8e4
Imporved description for installation in README.md
renewiegandt Dec 20, 2018
1a7a812
Removed snakemake from yaml-file
renewiegandt Dec 20, 2018
4844609
Set parameter organism as required wihtout an default value
renewiegandt Dec 20, 2018
d60faa7
spell check
HendrikSchultheis Dec 20, 2018
46cfc59
Added Parameter gtf_path. If path is set process create_gtf will be s…
renewiegandt Dec 20, 2018
6507643
spell check
HendrikSchultheis Dec 20, 2018
d86f788
fixed more typos
HendrikSchultheis Dec 20, 2018
756e98f
process description for reduce_sequence and clustering
HendrikSchultheis Dec 20, 2018
5e46266
Fixed typo in bed_to_fasta.R
renewiegandt Dec 20, 2018
80963a3
Merge pull request #12 from loosolab/motif_estiamtion
renewiegandt Dec 20, 2018
1c6bcf1
Added new parameter list to README.mf
renewiegandt Dec 20, 2018
d70610e
Merge branch 'motif_estiamtion' of https://github.molgen.mpg.de/looso…
renewiegandt Dec 20, 2018
935ba3f
Merge pull request #17 from loosolab/cluster
HendrikSchultheis Dec 21, 2018
13bccda
Fixed bug in pipeline.nf: parameter gtf_path is now working
renewiegandt Dec 21, 2018
181fc68
Merge pull request #21 from loosolab/motif_estiamtion
renewiegandt Dec 22, 2018
9a58755
Improving comments
anastasiia Dec 27, 2018
e224922
fix for the help message called using -h
anastasiia Jan 2, 2019
36d7756
fixing the bug with the output directory name
anastasiia Jan 2, 2019
e29ad65
Merge pull request #24 from loosolab/peak_calling
renewiegandt Jan 3, 2019
b7c80c8
sorting scripts depending on their function
renewiegandt Jan 3, 2019
83460e9
Renaming output paths
renewiegandt Jan 3, 2019
fde8a8b
install optparse if not yet installed; added missing author to docume…
HendrikSchultheis Jan 3, 2019
1c392da
Merge pull request #30 from loosolab/cluster
HendrikSchultheis Jan 3, 2019
ab6f883
Merge branch 'dev' into estimation_motifs
renewiegandt Jan 3, 2019
8993670
Merge pull request #28 from loosolab/estimation_motifs
renewiegandt Jan 3, 2019
c0c36bc
Added more comments and author / mail description
Jan 3, 2019
e629131
missing points in readme
anastasiia Jan 3, 2019
e1db4c2
Added functionality to Fix #19
Jan 3, 2019
f3b93f2
Tries to Fix #18
Jan 3, 2019
39cdf16
renamed ID field to gene_id in response to: #29
Jan 3, 2019
6016c18
added "_" separation in the third gtf-column in response to #29
Jan 3, 2019
c9e7c82
Merge pull request #31 from loosolab/anastasiia-patch-1
renewiegandt Jan 3, 2019
91c4ed5
Commented every function / method to resolve #29
Jan 3, 2019
30838d4
Commented every function / method to resolve #29
Jan 3, 2019
a747799
Removed data_files
Jan 3, 2019
94794d5
Merge remote-tracking branch 'origin/dev' into gtf_creation
Jan 3, 2019
694291c
updated script. Now the path of subscripts is automatically aquired. …
JannikHamp Jan 4, 2019
54723f7
Merge pull request #37 from loosolab/dev
anastasiia Jan 4, 2019
e9850e0
if there is no additional information provided by the original input …
anastasiia Jan 4, 2019
2dfa509
checking for the directory the user wants to save the file in, and cr…
anastasiia Jan 4, 2019
18a4fb5
adding a check if the directory was passed by the user
anastasiia Jan 4, 2019
3ebb1b8
added some documentation
JannikHamp Jan 4, 2019
7d834a4
added collumn for strand information
JannikHamp Jan 4, 2019
7de9a1b
added column "strand"
JannikHamp Jan 4, 2019
1a49821
documentation
JannikHamp Jan 4, 2019
9b5b061
documentation
JannikHamp Jan 4, 2019
6d0da12
Merge pull request #38 from loosolab/peak_calling
anastasiia Jan 4, 2019
9ecba8a
documentation
JannikHamp Jan 4, 2019
58eff98
documentation
JannikHamp Jan 4, 2019
e38a469
Added check for optparse to bed_to_fasta.R
renewiegandt Jan 4, 2019
d4a3b6c
Merge branch 'dev' into estimation_motifs
renewiegandt Jan 4, 2019
8b28149
improved motif clustering
renewiegandt Jan 4, 2019
2064e52
imporved structure of merge_similar_clusters.R
renewiegandt Jan 5, 2019
b1cab10
Update compareBed.sh
JannikHamp Jan 5, 2019
f800e77
bed_to_fasta.R: improved syntax; fixed typos
renewiegandt Jan 6, 2019
7ff588b
label_cluster.R: improved syntax and documentation; fixed typos
renewiegandt Jan 6, 2019
798dd77
merge_similar_clusters.R: improved syntax and documentation; fixed typos
renewiegandt Jan 6, 2019
fc9c02c
Added ";" in attributes (last) field -> fix #29
Jan 6, 2019
b3223d1
Added Validator -> Fix #32
Jan 6, 2019
cca8ac4
Added check for \t at the end of lines in BED-file
renewiegandt Jan 6, 2019
f983d22
Remove header = false from fread
renewiegandt Jan 6, 2019
947cbfb
Added Validator
Jan 6, 2019
eb36ef7
Merge pull request #39 from loosolab/estimation_motifs
renewiegandt Jan 6, 2019
4f40b11
Changed comments for pullrequest
Jan 6, 2019
571cf5f
Changed comments for pullrequest
Jan 6, 2019
6cd7e0e
Changed comments for pullrequest
Jan 6, 2019
0d9cca0
Changed comments activity categorizer
Jan 6, 2019
7f8cc35
Changed comments activity table generator
Jan 6, 2019
6150f03
Merge remote-tracking branch 'origin/dev' into gtf_creation
Jan 6, 2019
93ceb85
Added gitignore !
Jan 6, 2019
e3c0023
removed pycache files
Jan 6, 2019
b97841f
Specified statement in GTFGen
Jan 6, 2019
1134d74
merge.R: Set separator from auto to '\t' in fread
renewiegandt Jan 6, 2019
5fd7cc4
Merge branch 'cluster' of https://github.molgen.mpg.de/loosolab/maste…
HendrikSchultheis Jan 7, 2019
c8e16f8
Merge remote-tracking branch 'origin/dev' into cluster
HendrikSchultheis Jan 7, 2019
5ee2113
show help if called without arguments; require = quiet
HendrikSchultheis Jan 7, 2019
dd46d81
show help if called without arguments; require = quiet
HendrikSchultheis Jan 7, 2019
aa1e9f0
fixing the bug with footprint length
anastasiia Jan 7, 2019
c452934
fixing the bug with overlaps + adding a merging function if the footp…
anastasiia Jan 7, 2019
7a5d6df
allowing user to set the parameter for max bp allowed in between the …
anastasiia Jan 7, 2019
ffbf294
Basic parameter checks added
HendrikSchultheis Jan 8, 2019
0308a0c
changing the name of footprints from footprint_ to f_ to make sure th…
anastasiia Jan 8, 2019
e4d5c5c
Merge pull request #40 from loosolab/gtf_creation
SebastianBeyvers Jan 8, 2019
ede935b
Update yaml; removed debug code from pipeline; update README
renewiegandt Jan 8, 2019
be868ef
Merge pull request #41 from loosolab/peak_calling
anastasiia Jan 8, 2019
5b8bb7a
Merge branch 'dev' into estimation_motifs
renewiegandt Jan 8, 2019
37c69e6
this r script is no more necessary, the other does its job
JannikHamp Jan 8, 2019
6802d72
New updated version. Faster and more robust
JannikHamp Jan 8, 2019
b787d14
updated version.. make unique updated, more robust in general
JannikHamp Jan 8, 2019
961e0ec
Added basic parameter checks.
HendrikSchultheis Jan 8, 2019
fe5ab42
get_best_motif.py: Added alternative name to best_motif file
renewiegandt Jan 8, 2019
7c3bb47
added the shebang line
anastasiia Jan 8, 2019
e3dc513
Add shebang to tfbsscan
msbentsen Jan 8, 2019
becfeae
added information for logfile
JannikHamp Jan 8, 2019
c55e8ff
added dicumentation and parameter for .stats output file
JannikHamp Jan 8, 2019
adfa30e
Merge pull request #42 from loosolab/estimation_motifs
renewiegandt Jan 8, 2019
79b7e24
get_best_motif.py: removed whitespace from motif header
renewiegandt Jan 8, 2019
8377340
added script get_motif_seq.R
renewiegandt Jan 8, 2019
3f9e4bc
pipeline.nf: implemented get_motif_seq.R
renewiegandt Jan 8, 2019
49ec389
added r packages: RJSONIO, varhandle to masterenv.yml
renewiegandt Jan 8, 2019
98ab76a
Merge branch 'dev' into estimation_motifs
renewiegandt Jan 8, 2019
4acc20f
documentation changes
JannikHamp Jan 9, 2019
7acce38
Merge pull request #45 from loosolab/peak_calling
anastasiia Jan 9, 2019
f6ddce3
Fixed naming scheme for 3rd column
Jan 9, 2019
d979e16
get_best_motif.py: added parameter cluster id
renewiegandt Jan 9, 2019
ea2e171
get_motif_seq.R: added parameter tmp_path and cluster_id
renewiegandt Jan 9, 2019
8e5049e
pipeline.nf: adjusting to new parameters required by get_best_motif;…
renewiegandt Jan 9, 2019
39f4856
Parameter summary added; fixed wrong default for threads parameter
HendrikSchultheis Jan 9, 2019
90c8c05
updated check for trailing tabs in motiffiles
JannikHamp Jan 9, 2019
1a35ce5
removed echo from testing
JannikHamp Jan 9, 2019
31315e6
the structure is ready, this need to be adjusted to the data the foot…
anastasiia Jan 9, 2019
1439045
Added log file for part 2.2_motif_estimation
renewiegandt Jan 9, 2019
530627c
more documentation, replce =, <-
JannikHamp Jan 9, 2019
53b0857
Parameter summary added
HendrikSchultheis Jan 9, 2019
8d1ff19
Added Shebang in response to #44
Jan 9, 2019
81c6fcd
Merge pull request #33 from loosolab/JannikHamp-patch-1
renewiegandt Jan 9, 2019
9ab7116
Merge pull request #43 from loosolab/cluster
renewiegandt Jan 9, 2019
cff01fc
Merge branch 'dev' into estimation_motifs
renewiegandt Jan 9, 2019
0f4bf46
added missing shebangs to 2.2 scripts #44
renewiegandt Jan 9, 2019
c0a462b
pipeline.nf: adjusting to new parameter required by 2.1 scripts
renewiegandt Jan 9, 2019
d4b1245
fixing the bug with merging. The code will be edited soon to look nic…
anastasiia Jan 9, 2019
0f22336
Merge branch 'peak_calling' of https://github.molgen.mpg.de/loosolab/…
anastasiia Jan 9, 2019
b102737
fixing the typo footprints and not footrpints
anastasiia Jan 9, 2019
13a610f
Merge pull request #49 from loosolab/gtf_creation
SebastianBeyvers Jan 10, 2019
e62bb36
pipeline.nf: fixed typos
renewiegandt Jan 10, 2019
a161ed0
Updated config, updated skript for changes in Ensembl Release 95 temp…
Jan 10, 2019
f3b3db4
get_best_motif.py: Removed additional whitespace in motif header
renewiegandt Jan 10, 2019
2d2f8cb
pipeline.nf: removed code for debuging
renewiegandt Jan 10, 2019
881b445
fixed max_pos calculation
JannikHamp Jan 10, 2019
b128196
correct maxpos calculation
JannikHamp Jan 10, 2019
b2f3773
the max_pos is the index, and so add always 1 after calculating it. O…
anastasiia Jan 10, 2019
5343f60
fixed for correct contains_maxpos calculation
JannikHamp Jan 10, 2019
438e806
Merge pull request #53 from loosolab/JannikHamp-patch-2
renewiegandt Jan 10, 2019
43c55ae
bed_to_fasta.R: changes lapply to vapply
renewiegandt Jan 10, 2019
7d793c0
Merge branch 'dev' into estimation_motifs
renewiegandt Jan 10, 2019
2fe66fd
adding a validation while printing to the output file. If there are p…
anastasiia Jan 10, 2019
310702e
Removed import os from get_best_motif.py
renewiegandt Jan 10, 2019
b70a38b
Removed newline from get_motif_seq
renewiegandt Jan 10, 2019
4397f77
script for calculation of absolute max_pos values at start
JannikHamp Jan 10, 2019
9a4c9c1
added Rscript for maxpos calculation
JannikHamp Jan 10, 2019
428302a
Merge pull request #47 from loosolab/estimation_motifs
HendrikSchultheis Jan 10, 2019
327dd04
fixed for correct max_pos calculation after possible splitting
JannikHamp Jan 10, 2019
3add693
Merge branch 'estimation_motifs' of https://github.molgen.mpg.de/loos…
renewiegandt Jan 10, 2019
0c22a5e
Merge branch 'dev' into tfbsscan-shebang
HendrikSchultheis Jan 10, 2019
58355d4
Merge pull request #46 from loosolab/tfbsscan-shebang
HendrikSchultheis Jan 10, 2019
20849d5
forgot $path to start the new Rscript. now fixed
JannikHamp Jan 10, 2019
1523096
deleting the print statements, adding some comments
anastasiia Jan 10, 2019
97c526d
Fix for #51
Jan 10, 2019
7514dfb
Fixed 1 output value of FilterMotifs.stats output. Flag 1 ratio was n…
JannikHamp Jan 11, 2019
6cd8c7f
check if output ends with .bed
JannikHamp Jan 11, 2019
f1a0690
Delete abs_max_score.R
JannikHamp Jan 11, 2019
9144c1f
max_pos calculation is done in one Rscript
JannikHamp Jan 11, 2019
8ed11f2
documentation
JannikHamp Jan 11, 2019
aa306f9
changed stats file name to compareBed.stats
JannikHamp Jan 11, 2019
cb16c3c
changed name of output compareBed.stats in description
JannikHamp Jan 11, 2019
f4a745c
Merge pull request #54 from loosolab/JannikHamp-patch-1
JannikHamp Jan 11, 2019
3a14de8
added info, that file_paths must not contain the "|" pipe symbol
JannikHamp Jan 11, 2019
6ed7d98
Added sorted outputs to improve handling in IGV
Jan 11, 2019
6c912f8
Removed some typos in Readme
Jan 11, 2019
332036c
Merge pull request #59 from loosolab/JannikHamp-patch-1
renewiegandt Jan 11, 2019
33ad341
Merge pull request #56 from loosolab/gtf_creation
SebastianBeyvers Jan 11, 2019
0da0270
Update README.md
renewiegandt Jan 12, 2019
0fb1f4c
Merge branch 'dev' into estimation_motifs
renewiegandt Jan 12, 2019
9ee772a
pipeline.nf: Default of parameter max_size_fp is set to 200
renewiegandt Jan 12, 2019
4b360e1
README.md: fixed typos
renewiegandt Jan 12, 2019
b15c203
add r package bit64 to environment
HendrikSchultheis Jan 12, 2019
b65dc90
cast start and end column to integer64 to prevent scientific notation
HendrikSchultheis Jan 12, 2019
7c41856
cast start and end column to integer64 to prevent scientific notation
HendrikSchultheis Jan 12, 2019
5acffac
fixed missing point
HendrikSchultheis Jan 12, 2019
3877157
Update README.md
renewiegandt Jan 12, 2019
a9331ac
making the list out of the dictionary in the merge function to ensure…
anastasiia Jan 12, 2019
40088b2
deleting print statements
anastasiia Jan 12, 2019
220a508
changing the name of max_bp_between to min_gap
anastasiia Jan 12, 2019
c398ce3
Merge pull request #50 from loosolab/peak_calling
anastasiia Jan 12, 2019
d644c07
Update README.md
anastasiia Jan 12, 2019
499206d
Merge pull request #63 from loosolab/cluster
HendrikSchultheis Jan 12, 2019
50214f0
Update README.md
renewiegandt Jan 12, 2019
34a2830
fixed .bed.bed unconsitent output name
JannikHamp Jan 12, 2019
f3c72c8
Merge branch 'estimation_motifs' of https://github.molgen.mpg.de/loos…
renewiegandt Jan 12, 2019
f371e1d
yml: fixed version of python and numpy to prevent bug in 1.1
renewiegandt Jan 12, 2019
e3afe9f
remove personal path from nextflow.config
renewiegandt Jan 12, 2019
0428556
renamed config files
renewiegandt Jan 12, 2019
5ea2f7a
Update README and help message in pipeline
renewiegandt Jan 12, 2019
d461966
pipeline.nf: added missing params.
renewiegandt Jan 12, 2019
574881f
Fixed typo
renewiegandt Jan 12, 2019
f1d0dc7
Updated configs to new parameteres
renewiegandt Jan 12, 2019
c93ecaf
Merge pull request #64 from loosolab/JannikHamp-patch-3
renewiegandt Jan 13, 2019
5b201c0
Added demo run files
renewiegandt Jan 13, 2019
1a740c2
Added demo command to README.md
renewiegandt Jan 13, 2019
95edb00
Added missing letter in README.md
renewiegandt Jan 13, 2019
9b849f2
Merge pull request #61 from loosolab/estimation_motifs
renewiegandt Jan 13, 2019
da0e710
Update README.md: Added step 4 of installation
renewiegandt Jan 13, 2019
62c362e
Merge pull request #66 from loosolab/estimation_motifs
SebastianBeyvers Jan 13, 2019
206 changes: 206 additions & 0 deletions .gitignore
@@ -0,0 +1,206 @@
# Created by .ignore support plugin (hsz.mobi)
### JetBrains template
# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio and WebStorm
# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839

# User-specific stuff
.idea
.idea/**/tasks.xml
.idea/**/usage.statistics.xml
.idea/**/dictionaries
.idea/**/shelf

# Sensitive or high-churn files
.idea/**/dataSources/
.idea/**/dataSources.ids
.idea/**/dataSources.local.xml
.idea/**/sqlDataSources.xml
.idea/**/dynamic.xml
.idea/**/uiDesigner.xml
.idea/**/dbnavigator.xml

# Gradle
.idea/**/gradle.xml
.idea/**/libraries

# Gradle and Maven with auto-import
# When using Gradle or Maven with auto-import, you should exclude module files,
# since they will be recreated, and may cause churn. Uncomment if using
# auto-import.
# .idea/modules.xml
# .idea/*.iml
# .idea/modules

# CMake
cmake-build-*/

# Mongo Explorer plugin
.idea/**/mongoSettings.xml

# File-based project format
*.iws

# IntelliJ
out/

# mpeltonen/sbt-idea plugin
.idea_modules/

# JIRA plugin
atlassian-ide-plugin.xml

# Cursive Clojure plugin
.idea/replstate.xml

# Crashlytics plugin (for Android Studio and IntelliJ)
com_crashlytics_export_strings.xml
crashlytics.properties
crashlytics-build.properties
fabric.properties

# Editor-based Rest Client
.idea/httpRequests
### R template
# History files
.Rhistory
.Rapp.history

# Session Data files
.RData

# Example code in package build process
*-Ex.R

# Output files from R CMD build
/*.tar.gz

# Output files from R CMD check
/*.Rcheck/

# RStudio files
.Rproj.user/

# produced vignettes
vignettes/*.html
vignettes/*.pdf

# OAuth2 token, see https://github.com/hadley/httr/releases/tag/v0.3
.httr-oauth

# knitr and R markdown default cache directories
/*_cache/
/cache/

# Temporary files created by R markdown
*.utf8.md
*.knit.md

# Shiny token, see https://shiny.rstudio.com/articles/shinyapps.html
rsconnect/
### Python template
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/


# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
/bin/3.1_create_gtf/data/
91 changes: 51 additions & 40 deletions README.md
@@ -1,48 +1,50 @@
# masterJLU2018

De novo motif discovery and evaluation based on footprints identified by TOBIAS
De novo motif discovery and evaluation based on footprints identified by TOBIAS.

For further information read the [documentation](https://github.molgen.mpg.de/loosolab/masterJLU2018/wiki)
For further information read the [documentation](https://github.molgen.mpg.de/loosolab/masterJLU2018/wiki).

## Dependencies
* [conda](https://conda.io/docs/user-guide/install/linux.html)
* [Nextflow](https://www.nextflow.io/)
* [MEME-Suite](http://meme-suite.org/doc/install.html?man_type=web)

## Installation
Start with installing all dependencies listed above. It is required to set the [enviroment paths for meme-suite](http://meme-suite.org/doc/install.html?man_type=web#installingtar).
1. Start with installing all dependencies listed above (Nextflow, conda, MEME-Suite) and downloading all files from the [GitHub repository](https://github.molgen.mpg.de/loosolab/masterJLU2018).
2. It is required to set the [environment paths for meme-suite](http://meme-suite.org/doc/install.html?man_type=web#installingtar).
This can be done with the following commands:
```
export PATH=[meme-suite installation path]/libexec/meme-[meme-suite version]:$PATH
export PATH=[meme-suite installation path]/bin:$PATH
```


Download all files from the [GitHub repository](https://github.molgen.mpg.de/loosolab/masterJLU2018).
The Nextflow-script needs a conda enviroment to run. Nextflow can create the needed enviroment from the given yaml-file.
On some systems Nextflow exits the run with following error:
```
Caused by:
Failed to create Conda environment
command: conda env create --prefix --file env.yml
status : 143
message:
```
If this error occurs you have to create the enviroment before starting the pipeline.
To create this enviroment you need the yml-file from the repository.
Run the following commands to create the enviroment:
```console
path=[Path to given masterenv.yml file]
conda env create --name masterenv -f=$path
3. Every other dependency will be automatically installed using conda. For that a conda environment has to be created from the yaml-file given in this repository.
It is required to create and activate the environment from the yaml-file beforehand.
This can be done with the following commands:
```console
conda env create -f masterenv.yml
conda activate masterenv
```
When the enviroment is created, set the variable 'path_env' in the configuration file as the path to it.

**Important Note:** For conda the channel bioconda needs to be set as highest priority! This is required due to two differnt packages with the same name in different channels. For the pipeline the package jellyfish from the channel bioconda is needed and **NOT** the jellyfisch package from the channel conda-forge!
4. Set the wd parameter in the nextflow.config file to the path where the repository is saved. For example: '~/masterJLU2018/'.


**Important Note:** For conda the channel bioconda needs to be set as highest priority! This is required due to two different packages with the same name in different channels. For the pipeline the package jellyfish from the channel bioconda is needed and **NOT** the jellyfish package from the channel conda-forge!
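
One way to confirm the channel order is to inspect the output of `conda config --show channels`. The sketch below is illustrative and not part of the pipeline: `first_channel` is a hypothetical helper name, and it assumes the usual list-style output in which the first entry has the highest priority.

```shell
# Hypothetical helper (not part of the repository): print the first,
# i.e. highest-priority, channel from `conda config --show channels`
# output read on stdin.
first_channel() {
  # Channel entries look like "  - bioconda"; print the name of the first one.
  awk '/^[[:space:]]*-[[:space:]]/ { print $2; exit }'
}

# Usage (requires conda on the PATH):
#   conda config --add channels bioconda        # --add puts the channel at the top
#   conda config --show channels | first_channel
```

If the helper prints anything other than `bioconda`, the channel order likely needs to be adjusted before the environment is created.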



## Quick Start
```console
nextflow run pipeline.nf --bigwig [BigWig-file] --bed [BED-file] --genome_fasta [FASTA-file] --motif_db [MEME-file] --config [UROPA-config-file]
nextflow run pipeline.nf --bigwig [BigWig-file] --bed [BED-file] --genome_fasta [FASTA-file] --motif_db [MEME-file] --organism [mm10|mm9|hg19|hg38]
```

### Demo run
There are files provided inside ./demo/ for a demo run.
Go to the main directory and run the following command:
```
nextflow run pipeline.nf --bigwig ./demo/buenrostro50k_chr1_fp.bw --bed ./demo/buenrostro50k_chr1_peaks.bed --genome_fasta ./demo/hg38/hg38_chr1.fa --motif_db ./demo/motif_database/jaspar_vertebrates.meme --out ./demo/buenrostro50k_chr1_out/ --create_known_tfbs_path ./demo/known_tfbs_hg38_chr1/ --organism hg38
```

## Parameters
For a detailed overview for all parameters follow this [link](https://github.molgen.mpg.de/loosolab/masterJLU2018/wiki/Configuration).
```
@@ -52,31 +54,34 @@ Required arguments:
--genome_fasta Path to genome in FASTA-format
--motif_db Path to motif-database in MEME-format
--config Path to UROPA configuration file
--create_known_tfbs_path Path to directory where output from tfbsscan (known motifs) are stored.
Path can be set as tfbs_path in next run. (Default: './')
--out Output Directory (Default: './out/')

--organism Input organism [hg38 | hg19 | mm9 | mm10]
--out Output Directory (Default: './out/')

Optional arguments:

--help [0|1] 1 to show this help message. (Default: 0)
--tfbs_path Path to directory with output from tfbsscan. If given tfbsscan will not be run.
--create_known_tfbs_path Path to directory where output from tfbsscan (known motifs) are stored.
Path can be set as tfbs_path in next run. (Default: './')
--gtf_path Path to gtf-file. If path is set the process which creates a gtf-file is skipped.

Footprint extraction:
--window_length INT This parameter sets the length of a sliding window. (Default: 200)
--step INT This parameter sets the number of positions to slide the window forward. (Default: 100)
--percentage INT Threshold in percent (Default: 0)
--max_bp_between INT If footprints are less than X bases apart the footprints will be merged (Default: 6)

Filter unknown motifs:
Filter motifs:
--min_size_fp INT Minimum sequence length threshold. Smaller sequences are discarded. (Default: 10)
--max_size_fp INT Maximum sequence length threshold. Discards all sequences longer than this value. (Default: 100)
--max_size_fp INT Maximum sequence length threshold. Discards all sequences longer than this value. (Default: 200)
--tfbsscan_method [moods|fimo] Method used by tfbsscan. (Default: moods)

Clustering:
Cluster:
Sequence preparation/ reduction:
--kmer INT Kmer length (Default: 10)
--kmer INT K-mer length (Default: 10)
--aprox_motif_len INT Motif length (Default: 10)
--motif_occurence FLOAT Percentage of motifs over all sequences. Use 1 (Default) to assume every sequence contains a motif.
--min_seq_length Interations Remove all sequences below this value. (Default: 10)

Clustering:
--global INT Global (=1) or local (=0) alignment. (Default: 0)
--identity FLOAT Identity threshold. (Default: 0.8)
@@ -88,23 +93,29 @@ Optional arguments:
Motif estimation:
--min_seq INT Sets the minimum number of sequences required for the FASTA-files given to GLAM2. (Default: 100)
--motif_min_key INT Minimum number of key positions (aligned columns) in the alignment done by GLAM2. (Default: 8)
--motif_max_key INT Maximum number of key positions (aligned columns) in the alignment done by GLAM2.f (Default: 20)
--iteration INT Number of iterations done by glam2. More Iterations: better results, higher runtime. (Default: 10000)
--tomtom_treshold float Threshold for similarity score. (Default: 0.01)
--motif_max_key INT Maximum number of key positions (aligned columns) in the alignment done by GLAM2. (Default: 20)
--iteration INT Number of iterations done by GLAM2. More Iterations: better results, higher runtime. (Default: 10000)
--tomtom_treshold FLOAT Threshold for similarity score. (Default: 0.01)
--best_motif INT Get the best X motifs per cluster. (Default: 3)

Motif clustering:
--cluster_motif Boolean If 1 pipeline clusters motifs. If its 0 it does not. (Defaul: 0)
--cluster_motif Boolean If 1, the pipeline clusters motifs; if 0, it does not. (Default: 0)
--edge_weight INT Minimum weight of edges in motif-cluster-graph (Default: 5)
--motif_similarity_thresh FLOAT Threshold for motif similarity score (Default: 0.00001)

Creating GTF:
--organism [hg38 | hg19 | mm9 | mm10] Input organism
--tissues List/String List of one or more keywords for tissue-/category-activity, categories must be specified as in JSON
config
All arguments can be set in the configuration files
```

For further information read the [documentation](https://github.molgen.mpg.de/loosolab/masterJLU2018/wiki).

## Known issues

For further information read the [documentation](https://github.molgen.mpg.de/loosolab/masterJLU2018/wiki)
For unknown reasons, the tool [MOODS](https://www.cs.helsinki.fi/group/pssmfind/), which is called by tfbsscan, in rare cases returns empty BED files; the problem probably lies in the function _pfm_to_log_odds_. If MOODS does not work as expected and has problems with this function, you will see the following error message:
```
ERROR
All motiffiles have less than 2 lines!
Fix motiffiles and try again.
```
There is no known fix so far. As a workaround, either restart the pipeline after a few hours with the same parameters or change the parameter tfbsscan_method to _fimo_, which forces tfbsscan to use [fimo](http://meme-suite.org/doc/fimo.html). This method takes longer but is not known to produce empty BED files.
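
The fimo workaround could be scripted roughly as follows. This is a sketch only, not part of the repository: `run_with_fallback` and the `NEXTFLOW` variable are hypothetical names, and it assumes that a failed run exits with a non-zero status.

```shell
# Hypothetical wrapper: try the default method (moods) first and
# rerun with fimo if that run fails.
# NEXTFLOW can be overridden, e.g. to point at a specific nextflow binary.
NEXTFLOW="${NEXTFLOW:-nextflow}"

run_with_fallback() {
  # "$@" holds the usual pipeline arguments (--bigwig, --bed, --genome_fasta, ...).
  if "$NEXTFLOW" run pipeline.nf "$@" --tfbsscan_method moods; then
    return 0
  fi
  echo "Run with moods failed; retrying with fimo." >&2
  "$NEXTFLOW" run pipeline.nf "$@" --tfbsscan_method fimo
}

# Usage:
#   run_with_fallback --bigwig [BigWig-file] --bed [BED-file] \
#     --genome_fasta [FASTA-file] --motif_db [MEME-file] --organism hg38
```

Note that the wrapper retries on any failure, not only on the empty-motif-file error, so the output of the first run is still worth checking.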