
List of Parameters

renewiegandt edited this page Apr 12, 2019 · 13 revisions

All parameters can be set via the command line (e.g. --bigwig /path/to/file) or in the provided configuration files, which are stored in the config directory. You can also write your own configuration files and include them in nextflow.config (https://www.nextflow.io/docs/latest/config.html#config-include).

Required parameters

| Parameter | Input | Description | Default |
| --- | --- | --- | --- |
| bigwig | Path | Path to a bigWig file with the signal over the peaks of interest. | - |
| bed | Path | Path to a BED file with the peaks of interest corresponding to the bigWig file. | - |
| genome_fasta | Path | Path to the genome in FASTA format. | - |
| motif_db | Path | Path to a MEME file containing known motifs (e.g. Jaspar Core database). | - |
| organism | String | Input organism. Must be one of hg38, hg19, mm9, or mm10. | - |
| config | Path | Path to the UROPA configuration file. | - |
| gtf_annotation | Path | Path to the GTF annotation file (e.g. ensembl mainChr). | - |
| out | Path | Output path. | ./out/ |
| help | Boolean | Show the help message. | 0 |
| tfbs_path | Path | Path to a directory with tfbsscan output. If given, tfbsscan is skipped. | - |
| gtf_merged | Path | Path to a GTF file for UROPA annotation. If a path is set, the process that creates a GTF file from the annotation GTF and the GTF containing regulatory elements is skipped. | - |

Footprint extraction

| Parameter | Input | Description | Default |
| --- | --- | --- | --- |
| window_length | Integer | Length of the sliding window. | 200 |
| step | Integer | Number of positions by which the window slides forward. | 100 |
| percentage | Integer | By default, each signal from the bigWig file is compared to a threshold, which is the mean of the signals within a window (or within a peak, if the peak is shorter than the chosen window_length). This parameter raises that threshold by the given percentage: for example, --percentage 10 adds 10% of the threshold found within a window and uses the result as the new threshold. | 0 |
| min_gap | Integer | Minimum number of bp between two footprints. Footprints separated by fewer bp are merged, and the new score is the mean of the scores of both footprints. | 6 |
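The interplay of window_length, step, percentage, and min_gap can be sketched as follows. This is only an illustration of the descriptions above, not the pipeline's actual code: the function names, the (start, end, score) tuple layout, the half-open coordinates, and the handling of the tail of a peak are all assumptions.

```python
from statistics import mean


def window_thresholds(signal, window_length=200, step=100, percentage=0):
    """Return (start, end, threshold) for each sliding window over one peak.

    The threshold is the mean signal in the window, raised by `percentage` percent.
    Assumption: positions after the last full window are simply not covered here.
    """
    factor = 1 + percentage / 100
    if len(signal) <= window_length:
        # Peak shorter than the window: the mean is taken over the whole peak.
        return [(0, len(signal), mean(signal) * factor)]
    return [
        (start, start + window_length, mean(signal[start:start + window_length]) * factor)
        for start in range(0, len(signal) - window_length + 1, step)
    ]


def merge_footprints(footprints, min_gap=6):
    """Merge (start, end, score) footprints separated by fewer than min_gap bp.

    Assumption: coordinates are half-open, so the gap is next_start - previous_end.
    """
    merged = []
    for start, end, score in sorted(footprints):
        if merged and start - merged[-1][1] < min_gap:
            prev_start, prev_end, prev_score = merged[-1]
            # The merged footprint gets the mean of both scores.
            merged[-1] = (prev_start, max(prev_end, end), (prev_score + score) / 2)
        else:
            merged.append((start, end, score))
    return merged
```

For instance, merge_footprints([(0, 10, 2.0), (13, 20, 4.0)]) joins the two footprints because only 3 bp lie between them, which is less than the default min_gap of 6.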

Filter motifs

| Parameter | Input | Description | Default |
| --- | --- | --- | --- |
| min | Integer | Minimum sequence length threshold. Shorter sequences are discarded. | 10 |
| max | Integer | Maximum sequence length threshold. Longer sequences are discarded. | 80 |
| tfbsscan_method | moods or fimo | Method used by tfbsscan. | moods |
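As a rough illustration of the min and max thresholds, the length filter could look like the sketch below. The function name and the use of plain strings for sequences are assumptions; the bounds are treated as inclusive because only shorter and longer sequences are described as discarded.

```python
def filter_by_length(sequences, min_len=10, max_len=80):
    """Keep sequences whose length lies within [min_len, max_len]."""
    return [seq for seq in sequences if min_len <= len(seq) <= max_len]


# Example: a 9 bp and an 81 bp sequence are dropped, a 10 bp sequence is kept.
kept = filter_by_length(["A" * 9, "C" * 10, "G" * 81])
```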

Cluster

| Parameter | Input | Description | Default |
| --- | --- | --- | --- |
| kmer | Integer | k-mer length. | 10 |
| aprox_motif_length | Integer | Approximate motif length. | 10 |
| motif_occurrence | Double | Motif occurrence over all sequences, given as a fraction (e.g. 0.5 for one motif every two sequences). | 1 |
| min_seq_length | Integer | Remove sequences below this length before reduction. | 10 |
| global | Integer | Use global (= 1) or local (= 0) sequence identity. | 0 |
| identity | Double | Minimum identity between sequences to be added to the same cluster. | 0.8 |
| sequence_coverage | Integer | Minimum number of aligned nucleotides between two sequences. | 8 |
| memory | Integer | Memory limit in MB (0 for unlimited). | 800 |
| throw_away_seq | Integer | Remove sequences with this length and below before clustering. | 9 |
| strand | Integer | Do +/+ and +/- alignments during clustering (= 1) or only +/+ (= 0). | 0 |

Motif estimation

| Parameter | Input | Description | Default |
| --- | --- | --- | --- |
| min_seq | Integer | Minimum number of sequences required in the FASTA files given to GLAM2. | 100 |
| motif_min_key | Integer | Equivalent to the -a parameter of GLAM2. Specifies the minimum number of key positions (aligned columns) in the alignment done by GLAM2. | 8 |
| motif_max_key | Integer | Equivalent to the -b parameter of GLAM2. Specifies the maximum number of key positions (aligned columns) in the alignment done by GLAM2. | 20 |
| iteration | Integer | Equivalent to the -n parameter of GLAM2. Specifies how many iterations must pass without improving on the highest-scoring alignment seen so far before each alignment run ends. Higher values give better results at the cost of a longer runtime. | 10000 |
| tomtom_treshold | Float | Equivalent to the -thresh parameter of Tomtom. Tomtom only reports matches with significance values ≤ the set threshold. All clusters with a match are discarded. | 0.01 |
| seed | String | Seed for the GLAM2 run. | 123456789 |
| best_motif | 1-10 | Number of best motifs to return. | 3 |
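The effect of tomtom_treshold can be pictured with the sketch below. It assumes that "sorted out" means clusters matching a known motif are discarded, and it uses a hypothetical dictionary mapping each cluster to the significance value of its best Tomtom match (None if Tomtom found nothing); neither the data layout nor the function name comes from the pipeline.

```python
def clusters_without_match(best_match_significance, tomtom_treshold=0.01):
    """Return clusters that have no match with significance <= the threshold."""
    return [
        cluster
        for cluster, significance in best_match_significance.items()
        if significance is None or significance > tomtom_treshold
    ]


# Hypothetical input: cluster name -> significance of its best match.
remaining = clusters_without_match(
    {"cluster_1": 0.001, "cluster_2": 0.2, "cluster_3": None}
)
```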

Motif clustering (available soon)

| Parameter | Input | Description | Default |
| --- | --- | --- | --- |
| cluster_motif | Boolean | If 1, the pipeline clusters motifs. If 0, it does not. | 0 |
| edge_weight | Integer | Minimum weight of edges in the motif cluster graph. A very high weight produces more and smaller motif clusters. | 50 |
| motif_similarity_thresh | Float | Threshold for the motif similarity score (see tomtom_treshold). | 0.00001 |
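Why a higher edge_weight yields more and smaller clusters can be seen in a toy graph. The sketch below assumes that clusters are the connected components left after dropping edges lighter than edge_weight; how the pipeline actually builds and weights the graph is not documented here, so the function and the example weights are purely hypothetical.

```python
def cluster_motifs(motifs, weighted_edges, edge_weight=50):
    """Group motifs into connected components using only edges >= edge_weight."""
    parent = {motif: motif for motif in motifs}

    def find(motif):
        # Follow parent links to the component root, compressing the path.
        while parent[motif] != motif:
            parent[motif] = parent[parent[motif]]
            motif = parent[motif]
        return motif

    for a, b, weight in weighted_edges:
        if weight >= edge_weight:
            parent[find(a)] = find(b)

    clusters = {}
    for motif in motifs:
        clusters.setdefault(find(motif), set()).add(motif)
    return sorted(clusters.values(), key=sorted)


motifs = ["A", "B", "C", "D"]
edges = [("A", "B", 80), ("B", "C", 55), ("C", "D", 20)]
low = cluster_motifs(motifs, edges, edge_weight=50)   # A, B, C together; D alone
high = cluster_motifs(motifs, edges, edge_weight=60)  # A, B together; C and D alone
```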

Creating GTF

| Parameter | Input | Description | Default |
| --- | --- | --- | --- |
| tissue | String | List of one or more keywords for tissue/category activity. Categories must be specified as in the JSON config, and the list must be whitespace-separated, for example: "Aorta Monocyte T-Cell". | - |
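A whitespace-separated tissue value can be split and checked against the categories of the JSON config as in this sketch. The function name and the known_categories set are hypothetical stand-ins; the real category names come from the pipeline's JSON config.

```python
def parse_tissues(tissue, known_categories):
    """Split a whitespace-separated keyword list and reject unknown categories."""
    keywords = tissue.split()
    unknown = [keyword for keyword in keywords if keyword not in known_categories]
    if unknown:
        raise ValueError(f"Unknown tissue categories: {', '.join(unknown)}")
    return keywords


# Hypothetical category set standing in for the JSON config.
categories = {"Aorta", "Monocyte", "T-Cell", "Liver"}
selected = parse_tissues("Aorta Monocyte T-Cell", categories)
```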

Evaluation

| Parameter | Input | Description | Default |
| --- | --- | --- | --- |
| max_uropa_runs | Integer | Maximum number of UROPA runs executed in parallel. | 10 |