Update README.md #61
Changes from 2 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -10,20 +10,27 @@ For further information read the [documentation](https://github.molgen.mpg.de/lo | |
* [MEME-Suite](http://meme-suite.org/doc/install.html?man_type=web) | ||
|
||
## Installation | ||
Start with installing all dependencies listed above (Nextflow, conda, MEME-Suite) and downloading all files from the [GitHub repository](https://github.molgen.mpg.de/loosolab/masterJLU2018). | ||
It is required to set the [enviroment paths for meme-suite](http://meme-suite.org/doc/install.html?man_type=web#installingtar). | ||
1. Start with installing all dependencies listed above (Nextflow, conda, MEME-Suite) and downloading all files from the [GitHub repository](https://github.molgen.mpg.de/loosolab/masterJLU2018). | ||
2. It is required to set the [enviroment paths for meme-suite](http://meme-suite.org/doc/install.html?man_type=web#installingtar). | ||
this can be done with following commands: | ||
``` | ||
export PATH=[meme-suite instalation path]/libexec/meme-[meme-suite version]:$PATH | ||
export PATH=[meme-suite instalation path]/bin:$PATH | ||
``` | ||
|
||
Every other dependency will be automatically installed by Nextflow using conda. For that a new conda enviroment will be created, which can be found in the from Nextflow created work directory after the first pipeline run. | ||
It is **not** required to create and activate the enviroment from the yaml-file beforehand. | ||
3. Every other dependency will be automatically installed using conda. For that a conda enviroment has to be created from the yaml-file given in this repository. | ||
Review comment: Two whitespaces after 'automatically'.
||
It is required to create and activate the enviroment from the yaml-file beforehand. | ||
Review comment: enviroNment
||
This can be done with following commands: | ||
```condsole | ||
Review comment: console
||
conda env create -f masterenv.yml | ||
conda activate masterenv | ||
``` | ||
|
||
|
||
**Important Note:** For conda the channel bioconda needs to be set as highest priority! This is required due to two different packages with the same name in different channels. For the pipeline the package jellyfish from the channel bioconda is needed and **NOT** the jellyfish package from the channel conda-forge! | ||
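The channel priority described in the note above could be set roughly as follows. This is a minimal sketch, assuming conda is already installed and that no other tool rewrites the channel list afterwards; the guard around the commands is an addition for illustration and not part of the repository.

```console
# Sketch only: assumes conda is installed; does nothing otherwise.
if command -v conda >/dev/null 2>&1; then
    # --add places the channel at the top of the list (highest priority).
    conda config --add channels bioconda
    # Print the resulting channel order to confirm that bioconda comes first.
    conda config --show channels
else
    echo "conda not found on PATH" >&2
fi
```

With bioconda listed first, the jellyfish package should be resolved from bioconda rather than conda-forge when the environment is created.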
|
||
|
||
|
||
## Quick Start | ||
```console | ||
nextflow run pipeline.nf --bigwig [BigWig-file] --bed [BED-file] --genome_fasta [FASTA-file] --motif_db [MEME-file] --config [UROPA-config-file] | ||
Review comment: organism ?!
||
|
@@ -39,52 +46,44 @@ Required arguments: | |
--config Path to UROPA configuration file | ||
--organism Input organism [hg38 | hg19 | mm9 | mm10] | ||
--out Output Directory (Default: './out/') | ||
Optional arguments: | ||
--help [0|1] 1 to show this help message. (Default: 0) | ||
--tfbs_path Path to directory with output from tfbsscan. If given tfbsscan will not be run. | ||
--create_known_tfbs_path Path to directory where output from tfbsscan (known motifs) are stored. | ||
Path can be set as tfbs_path in next run. (Default: './') | ||
--gtf_path Path to gtf-file. If path is set the process which creats a gtf-file is skipped. | ||
Review comment: Arguments should be in one format: with datatype (e.g. INT) or without, not mixed within the document.
Reply: If it's a list of allowed strings, I prefer to write the list instead of 'STRING'. I could add 'STRING' to every path variable, but I don't think that is necessary.
||
Footprint extraction: | ||
--window_length INT This parameter sets the length of a sliding window. (Default: 200) | ||
--step INT This parameter sets the number of positions to slide the window forward. (Default: 100) | ||
--percentage INT Threshold in percent (Default: 0) | ||
Filter unknown motifs: | ||
--min_size_fp INT Minimum sequence length threshold. Smaller sequences are discarded. (Default: 10) | ||
--max_size_fp INT Maximum sequence length threshold. Discards all sequences longer than this value. (Default: 100) | ||
--tfbsscan_method [moods|fimo] Method used by tfbsscan. (Default: moods) | ||
Clustering: | ||
Sequence preparation/ reduction: | ||
--kmer INT Kmer length (Default: 10) | ||
Review comment: *K-mer length
||
--aprox_motif_len INT Motif length (Default: 10) | ||
--motif_occurence FLOAT Percentage of motifs over all sequences. Use 1 (Default) to assume every sequence contains a motif. | ||
--min_seq_length Interations Remove all sequences below this value. (Default: 10) | ||
Clustering: | ||
--global INT Global (=1) or local (=0) alignment. (Default: 0) | ||
--identity FLOAT Identity threshold. (Default: 0.8) | ||
--sequence_coverage INT Minimum aligned nucleotides on both sequences. (Default: 8) | ||
--memory INT Memory limit in MB. 0 for unlimited. (Default: 800) | ||
--throw_away_seq INT Remove all sequences equal or below this length before clustering. (Default: 9) | ||
--strand INT Align +/+ & +/- (= 1). Or align only +/+ (= 0). (Default: 0) | ||
Motif estimation: | ||
--min_seq INT Sets the minimum number of sequences required for the FASTA-files given to GLAM2. (Default: 100) | ||
--motif_min_key INT Minimum number of key positions (aligned columns) in the alignment done by GLAM2. (Default: 8) | ||
--motif_max_key INT Maximum number of key positions (aligned columns) in the alignment done by GLAM2.f (Default: 20) | ||
Review comment: GLAM2.f?
||
--iteration INT Number of iterations done by glam2. More Iterations: better results, higher runtime. (Default: 10000) | ||
Review comment: Here glam2 is all lowercase, while earlier it is all uppercase.
||
--tomtom_treshold float Threshold for similarity score. (Default: 0.01) | ||
Review comment: Datatypes all-caps or not -> only one format should be used
||
--best_motif INT Get the best X motifs per cluster. (Default: 3) | ||
Moitf clustering: | ||
--cluster_motif Boolean If 1 pipeline clusters motifs. If its 0 it does not. (Defaul: 0) | ||
--edge_weight INT Minimum weight of edges in motif-cluster-graph (Default: 5) | ||
--motif_similarity_thresh FLOAT Threshold for motif similarity score (Default: 0.00001) | ||
Creating GTF: | ||
--tissues List/String List of one or more keywords for tissue-/category-activity, categories must be specified as in JSON | ||
config | ||
|
@@ -94,20 +93,6 @@ All arguments can be set in the configuration files | |
For further information read the [documentation](https://github.molgen.mpg.de/loosolab/masterJLU2018/wiki). | ||
|
||
## Known issues | ||
The Nextflow-script needs a conda environment to run. Nextflow creates the needed environment from the given yaml-file. | ||
On some systems Nextflow exits the run with following error: | ||
``` | ||
Caused by: | ||
Failed to create Conda environment | ||
command: conda env create --prefix --file env.yml | ||
status : 143 | ||
message: | ||
``` | ||
If this error occurs you have to create the environment before starting the pipeline. | ||
To create this environment you need the yml-file from the repository. | ||
Run the following commands to create the environment: | ||
```console | ||
path=[Path to given masterenv.yml file] | ||
conda env create --name masterenv -f $path | ||
``` | ||
When the environment is created, set the variable 'path_env' in the configuration file as the path to it. | ||
|
||
For unknown reasons Moods, whisch is called by tfbsscan, rarely returns empty bedfiles. If this happens The pipeline stops. | ||
The is no known fix sofar. If it happens try starting the pipeline again. If it still fails. Change the parameter tfbsscan_method to 'fimo'. This methods takes longer but will cause no error. | ||
Review comment: *There is
Review comment: As a workaround either restart the pipeline or change the parameter
Review comment: enviroNment not enviroment
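The workaround for the empty Moods bed files discussed above (restart the pipeline, then switch `tfbsscan_method` to 'fimo') could be scripted along these lines. This is a sketch under assumptions: `run_pipeline` and `run_with_fallback` are hypothetical helper names that do not exist in the repository, and the caller is assumed not to pass `--tfbsscan_method` itself.

```console
# Hypothetical helper: runs the pipeline with whatever arguments are passed in.
run_pipeline() {
    nextflow run pipeline.nf "$@"
}

# Try the default method (moods) twice, then fall back to fimo.
# Stops at the first attempt that exits successfully.
run_with_fallback() {
    run_pipeline "$@" \
        || run_pipeline "$@" \
        || run_pipeline "$@" --tfbsscan_method fimo
}
```

It would be called with the same arguments as the Quick Start command, for example `run_with_fallback --bigwig [BigWig-file] --bed [BED-file] ...`. Note that any failure triggers the retry, not only the empty-bedfile case, so the fimo run may start for unrelated errors as well.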