Merge branch 'dev' into tfbsscan-shebang

loosolab · Jan 10, 2019 · 0c22a5e
2 parents e3dc513 + 13a610f
commit 0c22a5e

Showing 48 changed files with 2,099 additions and 1,014 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,206 @@
+# Created by .ignore support plugin (hsz.mobi)
+### JetBrains template
+# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio and WebStorm
+# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839
+
+# User-specific stuff
+.idea
+.idea/**/tasks.xml
+.idea/**/usage.statistics.xml
+.idea/**/dictionaries
+.idea/**/shelf
+
+# Sensitive or high-churn files
+.idea/**/dataSources/
+.idea/**/dataSources.ids
+.idea/**/dataSources.local.xml
+.idea/**/sqlDataSources.xml
+.idea/**/dynamic.xml
+.idea/**/uiDesigner.xml
+.idea/**/dbnavigator.xml
+
+# Gradle
+.idea/**/gradle.xml
+.idea/**/libraries
+
+# Gradle and Maven with auto-import
+# When using Gradle or Maven with auto-import, you should exclude module files,
+# since they will be recreated, and may cause churn.  Uncomment if using
+# auto-import.
+# .idea/modules.xml
+# .idea/*.iml
+# .idea/modules
+
+# CMake
+cmake-build-*/
+
+# Mongo Explorer plugin
+.idea/**/mongoSettings.xml
+
+# File-based project format
+*.iws
+
+# IntelliJ
+out/
+
+# mpeltonen/sbt-idea plugin
+.idea_modules/
+
+# JIRA plugin
+atlassian-ide-plugin.xml
+
+# Cursive Clojure plugin
+.idea/replstate.xml
+
+# Crashlytics plugin (for Android Studio and IntelliJ)
+com_crashlytics_export_strings.xml
+crashlytics.properties
+crashlytics-build.properties
+fabric.properties
+
+# Editor-based Rest Client
+.idea/httpRequests
+### R template
+# History files
+.Rhistory
+.Rapp.history
+
+# Session Data files
+.RData
+
+# Example code in package build process
+*-Ex.R
+
+# Output files from R CMD build
+/*.tar.gz
+
+# Output files from R CMD check
+/*.Rcheck/
+
+# RStudio files
+.Rproj.user/
+
+# produced vignettes
+vignettes/*.html
+vignettes/*.pdf
+
+# OAuth2 token, see https://github.com/hadley/httr/releases/tag/v0.3
+.httr-oauth
+
+# knitr and R markdown default cache directories
+/*_cache/
+/cache/
+
+# Temporary files created by R markdown
+*.utf8.md
+*.knit.md
+
+# Shiny token, see https://shiny.rstudio.com/articles/shinyapps.html
+rsconnect/
+### Python template
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+.hypothesis/
+.pytest_cache/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# pyenv
+.python-version
+
+# celery beat schedule file
+celerybeat-schedule
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+/bin/3.1_create_gtf/data/
diff --git a/README.md b/README.md
@@ -1,44 +1,29 @@
 # masterJLU2018
 
-De novo motif discovery and evaluation based on footprints identified by TOBIAS
+De novo motif discovery and evaluation based on footprints identified by TOBIAS.
 
-For further information read the [documentation](https://github.molgen.mpg.de/loosolab/masterJLU2018/wiki)
+For further information read the [documentation](https://github.molgen.mpg.de/loosolab/masterJLU2018/wiki).
 
 ## Dependencies
 * [conda](https://conda.io/docs/user-guide/install/linux.html)
 * [Nextflow](https://www.nextflow.io/)
 * [MEME-Suite](http://meme-suite.org/doc/install.html?man_type=web)
 
 ## Installation
-Start with installing all dependencies listed above. It is required to set the [enviroment paths for meme-suite](http://meme-suite.org/doc/install.html?man_type=web#installingtar).
+Start by installing all dependencies listed above (Nextflow, conda, MEME-Suite) and downloading all files from the [GitHub repository](https://github.molgen.mpg.de/loosolab/masterJLU2018).
+The [environment paths for MEME-Suite](http://meme-suite.org/doc/install.html?man_type=web#installingtar) must be set.
 This can be done with the following commands:
 ```
 export PATH=[meme-suite installation path]/libexec/meme-[meme-suite version]:$PATH
 export PATH=[meme-suite installation path]/bin:$PATH
 ```
 
-
-Download all files from the [GitHub repository](https://github.molgen.mpg.de/loosolab/masterJLU2018).
-The Nextflow-script needs a conda enviroment to run. Nextflow can create the needed enviroment from the given yaml-file.
-On some systems Nextflow exits the run with following error:
-```
-Caused by:
-  Failed to create Conda environment
-  command: conda env create --prefix  --file env.yml
-  status : 143
-  message:
-```
-If this error occurs you have to create the enviroment before starting the pipeline.
-To create this enviroment you need the yml-file from the repository.
-Run the following commands to create the enviroment:
-```console
-path=[Path to given masterenv.yml file]
-conda env create --name masterenv -f=$path
-```
-When the enviroment is created, set the variable 'path_env' in the configuration file as the path to it.
+Every other dependency is installed automatically by Nextflow using conda. For this, Nextflow creates a new conda environment, which can be found in the work directory created by Nextflow after the first pipeline run.
+It is **not** required to create and activate the environment from the yaml-file beforehand.
 
 **Important Note:** For conda, the channel bioconda needs to be set as highest priority! This is required because two different packages with the same name exist in different channels. The pipeline needs the package jellyfish from the channel bioconda and **NOT** the jellyfish package from the channel conda-forge!
 
+
 ## Quick Start
 ```console
 nextflow run pipeline.nf --bigwig [BigWig-file] --bed [BED-file] --genome_fasta [FASTA-file] --motif_db [MEME-file] --config [UROPA-config-file]
@@ -52,14 +37,16 @@ Required arguments:
 	--genome_fasta		 Path to genome in FASTA-format
 	--motif_db		 Path to motif-database in MEME-format
 	--config		 Path to UROPA configuration file
-	--create_known_tfbs_path Path to directory where output from tfbsscan (known motifs) are stored.
-				 Path can be set as tfbs_path in next run. (Default: './')
-	--out			 Output Directory (Default: './out/')	
-	
+ 	--organism 		 Input organism [hg38 | hg19 | mm9 | mm10]
+	--out			 Output Directory (Default: './out/')
+
 Optional arguments:
-	
+
 	--help [0|1]		1 to show this help message. (Default: 0)
 	--tfbs_path 		Path to directory with output from tfbsscan. If given tfbsscan will not be run.
+	--create_known_tfbs_path Path to directory where output from tfbsscan (known motifs) are stored.
+				 Path can be set as tfbs_path in next run. (Default: './')
+	--gtf_path			Path to gtf-file. If path is set the process which creates a gtf-file is skipped.
 
 	Footprint extraction:
 	--window_length INT	This parameter sets the length of a sliding window. (Default: 200)
@@ -99,12 +86,28 @@ Optional arguments:
 	--motif_similarity_thresh FLOAT	Threshold for motif similarity score (Default: 0.00001)
 
 	Creating GTF:
-	--organism [hg38 | hg19 | mm9 | mm10]	Input organism
 	--tissues List/String 	List of one or more keywords for tissue-/category-activity, categories must be specified as in JSON
 				config
 All arguments can be set in the configuration files
  ```
 
+For further information read the [documentation](https://github.molgen.mpg.de/loosolab/masterJLU2018/wiki).
 
-
-For further information read the [documentation](https://github.molgen.mpg.de/loosolab/masterJLU2018/wiki)
+## Known issues
+The Nextflow script needs a conda environment to run. Nextflow creates the needed environment from the given yaml-file.
+On some systems Nextflow exits the run with the following error:
+```
+Caused by:
+  Failed to create Conda environment
+  command: conda env create --prefix  --file env.yml
+  status : 143
+  message:
+```
+If this error occurs, you have to create the environment before starting the pipeline.
+To create this environment you need the yml-file from the repository.
+Run the following commands to create the environment:
+```console
+path=[Path to given masterenv.yml file]
+conda env create --name masterenv -f $path
+```
+When the environment is created, set the variable 'path_env' in the configuration file to its path.
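
The manual steps described in this README revision (setting the MEME-Suite paths and pre-creating the conda environment when Nextflow fails with status 143) could be collected in a small shell helper. The following is only a sketch under stated assumptions: the function names `meme_path_entries` and `ensure_conda_env` are hypothetical and not part of the repository; the environment name `masterenv` is the one used in the README.

```shell
#!/bin/sh
# Sketch only: both helper names are hypothetical, not part of the pipeline.

# Print the two PATH entries the README asks for, given the MEME-Suite
# installation path ($1) and the MEME-Suite version ($2).
meme_path_entries() {
    printf '%s/bin:%s/libexec/meme-%s\n' "$1" "$1" "$2"
}

# Create the conda environment from a yml-file ($2) unless an environment
# named $1 is already listed by conda.
ensure_conda_env() {
    if conda env list | awk '{print $1}' | grep -qx -- "$1"; then
        echo "environment '$1' already exists"
    else
        conda env create --name "$1" -f "$2"
    fi
}
```

A possible use, with `/opt/meme` and `5.0.2` as placeholder values: `export PATH="$(meme_path_entries /opt/meme 5.0.2):$PATH"`, then `conda config --add channels bioconda` (which puts bioconda at the top of the channel list), then `ensure_conda_env masterenv masterenv.yml`. Checking for an existing environment first keeps the helper safe to rerun.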