diff --git a/docs/interpretation/CSSv1.xml b/docs/interpretation/CSSv1.xml new file mode 100644 index 0000000..3b3f5d6 --- /dev/null +++ b/docs/interpretation/CSSv1.xml @@ -0,0 +1,197 @@ + + + + CSS + 1 + + Peter Ebert + pebert@mpi-inf.mpg.de + + + This process file is in draft status. + The CSS process is designed to generate basic chromatin state segmentation BED files based on histone modifications only. + This process implements a dual strategy to produce the state segmentation: ChromHMM (by Jason Ernst) is used to generate a segmentation + comparable to the reference segmentations provided by the ROADMAP project and thus forms the basis for the automated + state labelling in this process. Additionally, EpiCSeg (by Alessandro Mammana) is used as a more sophisticated method for state + segmentation. Since there is no reference state labelling available for EpiCSeg, the state labels for EpiCSeg are produced via a simple + overlap analysis followed by a majority vote based on the labelled ChromHMM segmentation. 
+ + + + GALvX_Histone + BAM + collection + To run this process as a default, all six marks need to be available for a sample + + + GALvX_Input + BAM + single + + + + + + chrom_lengths + TXT + single + Common two-column file listing chromosomes (name and length) for assembly + + + state_labels + TXT + single + ROADMAP reference state labels (18 states, 6 histone marks) + + + state_colors + TXT + single + ROADMAP reference state colors + + + blacklist_regions + BED + single + Common blacklist regions, same as for CHP or DHS + + + var_ref_files + BED + collection + It is still undecided whether a fixed set of annotation files should be used for the + reporting feature of the tools (state overlap); due to the automated state labelling procedure, + this is not necessary + + + + + state_segmentation + BED + collection + segmentations produced by ChromHMM and EpiCSeg and post-processed to be properly labelled + + + model + TXT + collection + EpiCSeg model + + + reports + tar.gz + collection + combined set of all other output files that are part of the tool's report + + + + + samtools + 1.2 + tmp_nodup.bam ]]> + GALvX_Histone, GALvX_Input + Remove duplicates + + + samtools + 1.2 + + GALvX_Histone, GALvX_Input + Sort prior to filtering blacklist, output is piped to next command + + + bedtools + 2.20.1 + + + Filter blacklist regions, output is piped to next command + + + samtools + 1.2 + + + Create filtered and indexed BAM file + + + bedtools + 2.20.1 + tmp_nodup_blfilt.bed ]]> + + Make simple BED file as input for ChromHMM + + + java, ChromHMM.jar + 1.7.0_65, 1.10 + + + The cellmarktable info file is built on-the-fly by the pipeline and discarded after the run + + + java, ChromHMM.jar + 1.7.0_65, 1.10 + + + The cellmarktable info file is built on-the-fly by the pipeline and discarded after the run + + + R, EpiCSeg + 3.2.0, 2016-04-04 + + no looping + The command is assembled by repeating the -m parameter (-m Mark:Filepath) for the marks/Input + + + R, EpiCSeg + 3.2.0, 2016-04-04 + 
+ no looping + This normalization step is mandatory if a joint segmentation is done for several samples (e.g. for a whole sub-project); + in the default case of a single sample, this has no relevant effect + + + R, EpiCSeg + 3.2.0, 2016-04-04 + + no looping + + + + bedtools + 2.20.1 + ecs_cmm_isect.tmp ]]> + EpiCSeg and ChromHMM segmentations + + + + Python3 + 3.2.3 + + no looping + Assign labels for the EpiCSeg segmentation based on overlap with ChromHMM segmentation. This command outputs a label mapping file + + + Python3 + 3.2.3 + + no looping + Assign labels for the EpiCSeg segmentation based on overlap with ChromHMM segmentation. Produces a browsertrack BED file to be loaded into, e.g., IGV + + + Python3 + 3.2.3 + + no looping + Assign labels for the EpiCSeg segmentation based on overlap with ChromHMM segmentation. Produces a browsertrack BED file to be loaded into, e.g., IGV + + + Unspec + 0.0 + + no looping + To do: command for packaging the various output files into 2 gzip files + + + + + diff --git a/docs/interpretation/RNBv0.xml b/docs/interpretation/RNBv0.xml new file mode 100644 index 0000000..d60f45f --- /dev/null +++ b/docs/interpretation/RNBv0.xml @@ -0,0 +1,59 @@ + + + + RNB + 0 + + Fabian Mueller + fmueller@mpi-inf.mpg.de + + + This process describes an integrative analysis of bisulfite methylation data. The input comprises methylation calls in bed format, a comma-separated sample annotation file and option settings in XML format. The result is a folder structure of multiple HTML reports describing data loading, quality control, site and sample filtering, identification of batch effects, methylation profiling on the basis of individual CpGs as well as genomic regions, differential methylation analysis and data export. + These reports can be viewed locally or displayed via the internet. 
+ + + + MCSv0.bed + + collection + methylation calls in bed format + + + sampleAnnotation.csv + + single + Sample annotation table + + + analysisOptions.xml + + single + Option settings for RnBeads. These options are also listed in the corresponding analysis metadata. + + + + + methylationCall.bed + + collection + Methylation calls in bed format from other samples that were not obtained by DEEP. + + + + + RnBv0.tar.gz + + single + RnBeads report directory containing HTML reports and associated plots, tables and files. + + + + + RnBeads + 0.99.11 + + + Integrated analysis of multiple samples using RnBeads. Pipeline options are transferred to RnBeads via an XML file, but should also be specified in the analysis metadata. + + + diff --git a/docs/interpretation/RNBv1.xml b/docs/interpretation/RNBv1.xml new file mode 100644 index 0000000..43b6741 --- /dev/null +++ b/docs/interpretation/RNBv1.xml @@ -0,0 +1,47 @@ + + + + RNB + 1 + + Fabian Mueller + fmueller@mpi-inf.mpg.de + + + This process describes an integrative analysis of bisulfite methylation data. The analysis is run from a configuration JSON file. The input comprises methylation calls in bed format, a comma-separated sample annotation file and option settings in XML format which are specified in the analysis metadata. The result is a folder structure of multiple HTML reports describing data loading, quality control, site and sample filtering, identification of batch effects, methylation profiling on the basis of individual CpGs as well as genomic regions, differential methylation analysis and data export. + These reports can be viewed locally or displayed via the internet. + + + + ANALYSIS_CONFIG.JSON + + single + Option settings for RnBeads. These options are also listed in the corresponding analysis metadata. 
+ + + + + species_annotation + R package + single + Assembly-specific annotation package containing standard information like CpG islands, promoter regions etc.; see http://rnbeads.mpi-inf.mpg.de/installation.php + + + + + RNBEADS_REPORT + + single + RnBeads report directory containing HTML reports and associated plots, tables and files. + + + + + RnBeads + >=1.1.3 + + + Integrated analysis of multiple samples using RnBeads. Pipeline options are transferred to RnBeads via a JSON file, but should also be specified in the analysis metadata. + + + diff --git a/docs/interpretation/TFAv1.xml b/docs/interpretation/TFAv1.xml new file mode 100644 index 0000000..a8b46a5 --- /dev/null +++ b/docs/interpretation/TFAv1.xml @@ -0,0 +1,126 @@ + + + + TFA + 1 + + Florian Schmidt + fschmidt@mmci.uni-saarland.de + + + + The TFA (Transcription Factor Annotation) process can be used to obtain scores describing the affinity of Transcription Factors to certain regions of the genome and to a predefined area around the TSS of a set of genes. + We utilise TRAP (Transcription Factor Affinity Prediction, cf. http://bioinformatics.oxfordjournals.org/content/23/2/134.long) to obtain the aforementioned scores. The process takes a region file as input (e.g. DNase peaks), + a reference genome, a gene annotation file, a set of position weight matrices (PWMs, e.g. from the JASPAR database), and a set of precomputed parameters fitted to the PWMs which are required for the pValue computation in TRAP. + In addition, the user has to specify the SampleID and the number of cores that should be used for running TRAP in parallel. + To run the process use the script TFA.sh. The command line is: + sh TFA {genome_reference} {region_file} {SampleID} {Number_of_cores} {pwms} {TRAP_pValue_parameters} {genome_annotation} {TFRank} + + + + + region_file + bed + single + TF prediction will be carried out in all regions specified in this file. 
+ + + + + genome_reference + fa + single + The reference genome of the analysed organism. + + + genome_annotation + gtf + single + A genome annotation file used to extract the position of TSS. + + + pwms + txt + single + A file containing PWMs, e.g. the vertebrate PWMs of the JASPAR database. + + + TRAP_pValue_parameters + txt + single + A file containing precomputed parameters needed for the pValue computation within TRAP. For each PWM contained in jaspar_pwms, these parameters have to be specified. + + + + + affinity_file_peak_view + txt + single + This file contains the so-called affinity scores computed by TRAP for all regions in the region_file. + + + pAffinity_file_peak_view + txt + single + This file contains the -log() transformed pValues computed by TRAP for all regions in the region_file. The pseudocount (to avoid -log(0)) is set to 0.001. + + + affinity_file_gene_view + txt + single + Using the genome_annotation, we computed an additive affinity score for all genes by overlapping all regions of the region file with a window of size 3000bp centered at the TSS of each gene. + + + pAffinity_file_gene_view + txt + single + Using the genome_annotation, we computed an additive score, based on -log transformed pValues, for all genes by overlapping all regions of the region file with a window of size 3000bp centered at the TSS of each gene. + + + affinity_file_rank + txt + single + This file contains the top x TFs according to the highest TRAP affinity scores for each gene in the annotation file. The number of TFs x is specified by the user through setting the variable TFRank. + + + pAffinity_file_rank + txt + single + This file contains the top x TFs according to the highest TRAP pValues (-log transformed) for each gene in the annotation file. The number of TFs x is specified by the user through setting the variable TFRank. + + + + + + bedtools getfasta + 2.23.0 + + + Creates a fasta file containing the sequences for all regions listed in the region_file. 
+ + + convertRYtoN.py + 1.0 + + + As TRAP accepts only A, C, G, T and N, we replace all characters other than A, C, G, or T by N. + + + TRAP.R3script, R3 + 1.0, 3.1.2 + + + This script uses the package TRAP to compute transcription factor binding affinity. It requires the libraries tRap, parallel and Biostrings. FORK is used in the parallelisation. A manual on how to install tRap can be found here: http://trap.molgen.mpg.de/download/TRAP_R_package/tRap-tutorial.html. + + + annotateTSS.py + 1.0 + + + This python script computes gene-specific TF scores using both files obtained from TRAP.R3script (affinity_file_peak_view and pAffinity_file_peak_view) and the genome_annotation. + There are two types of output: the gene_view files and optional gene_rank files. Gene_view files provide affinity scores and pValues for each TF and each gene in the genome_annotation. + The rank files contain the top TFRank transcription factors, individually for all genes. One file is scored according to affinity and one is scored according to pValues. If TFRank is set to none, the rank files are not generated. + The window size determines the size of the window around each TSS. + + +
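Review note on the gene-view step in TFAv1.xml: the description of annotateTSS.py (additive scores over regions overlapping a 3000bp window centered at each TSS, plus an optional top-TFRank list) can be illustrated with a minimal Python sketch. This is not the actual annotateTSS.py; the function names `gene_view_scores` and `top_tfs`, the in-memory data layout, and the use of half-open intervals are assumptions made purely for illustration. Whether the real script weights regions or handles strand is not stated in the process file.

```python
from collections import defaultdict


def gene_view_scores(tss_by_gene, regions, window=3000):
    """Sum per-TF region scores over all regions overlapping a window centred on each TSS.

    tss_by_gene: {gene: (chrom, tss_position)}            (hypothetical layout)
    regions:     [(chrom, start, end, {tf: score}), ...]  (hypothetical layout)
    """
    half = window // 2
    scores = {}
    for gene, (chrom, tss) in tss_by_gene.items():
        win_start, win_end = tss - half, tss + half
        totals = defaultdict(float)
        for r_chrom, r_start, r_end, tf_scores in regions:
            # Half-open intervals [start, end) overlap if each starts before the other ends.
            if r_chrom == chrom and r_start < win_end and r_end > win_start:
                for tf, value in tf_scores.items():
                    totals[tf] += value
        scores[gene] = dict(totals)
    return scores


def top_tfs(gene_scores, tf_rank):
    """Return the tf_rank highest-scoring TFs per gene; None means no rank output."""
    if tf_rank is None:
        return {}
    return {
        gene: sorted(tfs, key=tfs.get, reverse=True)[:tf_rank]
        for gene, tfs in gene_scores.items()
    }


# Toy data, invented for this sketch only.
tss = {"geneA": ("chr1", 10000), "geneB": ("chr2", 5000)}
peaks = [
    ("chr1", 9000, 9500, {"TF1": 1.0, "TF2": 0.5}),
    ("chr1", 11000, 11600, {"TF1": 2.0, "TF2": 0.25}),
    ("chr1", 20000, 20500, {"TF1": 9.0}),
    ("chr2", 100, 200, {"TF1": 1.0}),
]
gene_scores = gene_view_scores(tss, peaks)
ranked = top_tfs(gene_scores, 1)
```

The same aggregation applies to both the affinity and the -log transformed pValue files, since each is described as an additive score; only the per-region input values differ.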