diff --git a/docs/quantification/chip-seq/CHPv5.xml b/docs/quantification/chip-seq/CHPv5.xml
index 3b97b0e..8f92919 100644
--- a/docs/quantification/chip-seq/CHPv5.xml
+++ b/docs/quantification/chip-seq/CHPv5.xml
@@ -8,13 +8,54 @@
pebert@mpi-inf.mpg.de
- Key points:
- (1) deepTools QC (fingerprint / GC bias) for raw BAM files [same as before]
- (2) deepTools QC (fingerprint) for filtered BAM files to create IHEC QC as part of the process [new]
- (3) deepTools correlation for filtered and blacklist removed BAM files [same as before]
- (4) peak calling with MACS2 and histoneHMM on filtered BAM files [partially new]
- (5) deepTools fold-change and coverage tracks for raw BAM files [same as before]
- (6) deepTools coverage track for filtered BAM file [same as before]
+ - This description gives a rough summary of what this process produces as default output
+ - For details, please read the respective step given below carefully
+ (match the output filename as listed under Output files)
+
+ 1. Quality control
+ 1.1 The GC bias and fingerprint QC plots are based on raw / unfiltered BAM files
+ 1.2 The correlation heatmap is based on filtered BAM files, restricted to autosomes
+
+ 2. IHEC ChIP QC
+ 2.1 This process collects QC metrics as defined by the IHEC Assay Standards work group
+ 2.2 The set of metrics is saved in the analysis metadata file (.amd.tsv) that is generated as part of each run
+ 2.3 All metrics below are computed based on filtered BAM files, following the IHEC reference implementation
+ 2.4 2017-Sep.: https://github.com/IHEC/ihec-assay-standards
+ 2.5 final reads: total number of mapped reads in filtered BAM file
+ 2.6 peak reads: total number of reads overlapping called peak regions
+ 2.7 frip: peak reads divided by final reads (fraction of reads in peaks)
+ 2.8 JS dist: Jensen-Shannon distance as computed by deepTools/plotFingerprint
+ 2.9 CHANCE div: CHANCE divergence as computed by deepTools/plotFingerprint
+
+ 3. bigWig signal tracks
+ 3.1 log2 fold-change tracks of histone signal over Input
+ 3.2 Coverage tracks of raw / unfiltered BAM files, normalized to 1x depth
+ 3.3 Coverage tracks of filtered BAM files, with blacklist removed and 1x normalization based
+ on autosomes only. These are the recommended signal tracks to use in order to assess
+ the coverage in regions of interest in generic downstream analyses
+
+ 4. Peak calling
+ 4.1 We use MACS2 and histoneHMM to call peaks
+ 4.2 MACS2 narrow marks: H3K27ac, H3K4me1, H3K4me3
+ 4.3 MACS2 broad marks: H3K27me3, H3K36me3, H3K9me3
+ 4.3.1 The quality and reliability of MACS2 broad peaks are a subject of active debate.
+ We provide the above files mainly for historical reasons and
+ to improve compatibility with other epigenome mapping consortia.
+ 4.3.2 We discourage the use of the MACS2 broad output for downstream analysis
+ 4.4 histoneHMM broad marks: H3K27me3, H3K36me3, H3K4me1, H3K9me3, Input
+ 4.4.1 We call enriched regions on Input since histoneHMM has no option to use the Input during peak calling
+ 4.5 Called peaks are flagged for overlapping with a blacklist region (MACS2 and histoneHMM) and for
+ overlapping with a region enriched in the Input (histoneHMM only). The flagging is realized via
+ the name column in the output files.
+ 4.6 The peak files are standardized to follow ENCODE's broadPeak and narrowPeak format, respectively.
+ For MACS output, this requires a rescaling of the score column to be in the range of 0...1000.
+ For histoneHMM, default values are added and the average posterior is taken as the signalValue.
+
+ 5. File naming
+ 5.1 DEEPID.PROC.DATE.ASSM (plus suffix and extension)
+ 5.2 Example DEEPID: 43_Hm21_WMAs_Ct_H3K4me3_F_1
+ 5.3 PROC = CHPv5 ; DATE = date of run
+ 5.4 ASSM: assembly like hg38 or m38 (GRCm38)
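
The naming scheme in section 5 can be sketched as a pair of helpers. This is a minimal illustration, not part of the pipeline: the function names are hypothetical, and the date format (YYYYMMDD) is an assumption, since the description only says "date of run".

```python
def build_basename(deepid, date, assembly, proc="CHPv5"):
    """Join the four name components with dots: DEEPID.PROC.DATE.ASSM."""
    return ".".join([deepid, proc, date, assembly])


def split_basename(name):
    """Split a name back into its components.

    Relies on the DEEPID containing underscores but no dots, as in the
    example DEEPID given in section 5.2. Any suffix and extension after
    the fourth component is ignored.
    """
    deepid, proc, date, assembly = name.split(".")[:4]
    return {"deepid": deepid, "proc": proc, "date": date, "assembly": assembly}


# Hypothetical run date; YYYYMMDD is assumed, not documented.
example = build_basename("43_Hm21_WMAs_Ct_H3K4me3_F_1", "20170901", "hg38")
parts = split_basename(example + ".bamcov.bw")
```

Splitting on the dot only works as long as no component itself contains one, which holds for the example DEEPID and assemblies listed above.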
@@ -65,6 +106,12 @@
single
2-column, tab-separated table of chromosome sizes for reference genome
+
+ autosomal_regions
+ BED
+ single
+ A file listing all autosomes as BED regions for filtering
+
@@ -117,6 +164,30 @@
collection
Zip archive containing other histoneHMM output files (raw data files not needed by most users)
+
+ DEEPID.PROC.DATE.macs.out
+ zip
+ collection
+ Zip archive containing other MACS output files (raw data files not needed by most users)
+
+
+ DEEPID.PROC.DATE.hhmm.broad
+ BED / broadPeak
+ collection
+ Histone and Input enriched regions called by histoneHMM
+
+
+ DEEPID.PROC.DATE.macs.broad
+ BED / broadPeak
+ collection
+ Histone enriched regions called by MACS
+
+
+ DEEPID.PROC.DATE.macs.narrow
+ BED / narrowPeak
+ collection
+ Histone enriched regions called by MACS
+
DEEPID.PROC.DATE.fgpr
SVG
@@ -129,6 +200,28 @@
single
Fingerprint quality metrics based on raw BAM files
+
+ DEEPID.PROC.DATE.counts-fgpr
+ tsv
+ single
+ Fingerprint raw counts based on raw BAM files
+
+
+
+ DEEPID.PROC.DATE.auto.counts-summ
+ tsv
+ single
+ multiBamSummary raw counts based on filtered and autosome-restricted BAM files
+
+
+ DEEPID.PROC.DATE.auto.summ
+ npz
+ single
+
+ multiBamSummary data file based on filtered and autosome-restricted BAM files.
+ The format is a numpy compatible binary file.
+
+
@@ -156,7 +249,7 @@
]]>
GALvX_Histone, GALvX_Input
- Compute GC bias on raw BAM files
+ Compute GC bias on raw BAM files.
bamCompare
@@ -179,13 +272,17 @@
2.5.3
no looping
- Compute fingerprint on raw BAM files
+
+ Compute fingerprint on raw BAM files. In cases where there are two or more Input control
+ files, the majority file will be selected as JSD-sample file.
+
@@ -214,9 +311,9 @@
DEEPID.tmp.filt.bam
- Due to the previous filtering step, counting simply all reads in the filtered BAM
- file is equivalent to counting only mapped reads. The number of mapped reads is needed
- to compute the FRiP score in a later stage.
+ Due to the previous filtering step, counting simply all reads in the filtered BAM file
+ is equivalent to counting only mapped reads.
+ The number of mapped reads is needed to compute the FRiP score in a later stage.
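
The FRiP score mentioned above is the ratio of the two read counts (peak reads divided by final reads). A minimal sketch, with a hypothetical function name and made-up counts; the pipeline's own implementation may differ:

```python
def compute_frip(peak_reads, final_reads):
    """Fraction of reads in peaks: reads overlapping peaks / mapped reads.

    final_reads is the total read count of the filtered BAM file, which
    after filtering equals the number of mapped reads.
    """
    if final_reads <= 0:
        raise ValueError("final_reads must be positive")
    if peak_reads < 0 or peak_reads > final_reads:
        raise ValueError("peak_reads must lie between 0 and final_reads")
    return peak_reads / float(final_reads)


# Illustrative numbers only, not taken from any real sample.
frip = compute_frip(250, 1000)
```

The explicit float conversion keeps the division correct under both Python 2 and Python 3.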
@@ -224,7 +321,7 @@
2.5.3
@@ -232,6 +329,8 @@
Generate read coverage signal normalized to 1x depth for filtered BAM files.
Remove blacklist regions on-the-fly and consider only autosomes for normalization step.
+ Note that the implementation of this pipeline is designed to support genomes
+ with and without "chr" prefix, hence the different naming styles for --ignoreForNorm.
@@ -239,13 +338,17 @@
2.5.3
no looping
- Compute fingerprint on filtered BAM files to compute IHEC QC measures
+
+ Compute fingerprint on filtered BAM files to collect IHEC QC metrics.
+ The output files of this step are temporary and are discarded after the analysis.
+
MACS2
@@ -257,7 +360,24 @@
]]>
GALvX_Histone
- MACS2 peak calling on filtered BAM files. Parameter "--broad" for libraries H3K27me3/H3K36me/H3K9me3
+
+ MACS2 peak calling on filtered BAM files.
+ Parameter "--broad" set for libraries H3K27me3/H3K36me3/H3K9me3
+
+
+
+ zip
+ 3.0
+
+
+
+ GALvX_Histone
+
+ Zip all secondary MACS output files per histone mark: _peaks.xls, _summits.bed, and _peaks.gappedPeak,
+ depending on parameter "--broad"
+
@@ -278,12 +398,27 @@
DEEPID.hmm.bed &&
- mv DEEPID-zinba-emfit.pdf DEEPID.PROC.DATE.hhmm.emfit.pdf
+ mv DEEPID-zinba-emfit.pdf {DEEPID.PROC.DATE.hhmm.emfit}
]]>
DEEPID-regions.gff
- Make histoneHMM output BED-like for blacklist intersection and standardize name of EM fit PDF.
+ Make histoneHMM output BED-like for blacklist intersection and standardize name of EM fit PDF.
+
+ zip
+ 3.0
+
+
+
+ GALvX_Histone
+
+ Zip all secondary histoneHMM output files per histone mark: -em-posterior.txt, -zinba-params-em.RData,
+ -zinba-params-em.txt, .txt
+
+
+
sambamba
@@ -295,20 +430,47 @@
]]>
DEEPID.tmp.filt.bam
- Get flagstat output for filtered BAM files, specifically number of mapped reads in these files
+ Count number of reads overlapping peak regions, later used for FRiP score
+
- custom
- 0.1
+ bedtools
+ 2.26.0
DEEPID.peak-ovl-bl
]]>
- no looping
+ peak_file
+ Intersect all peak files with blacklist regions for flagging
+
+
+
+ bedtools
+ 2.26.0
+
+ DEEPID.peak-ovl-input
+ ]]>
+
+ histoneHMM_peak_file
+ Intersect all histoneHMM peak files with Input peaks for flagging
+
+
+
+ Python
+ 2.7.13
+
+ {DEEPID.PROC.DATE.macs.narrow} {DEEPID.PROC.DATE.macs.broad} {DEEPID.PROC.DATE.hhmm.broad}
+ ]]>
+
+ peak-file
- Compute FRiP score and record in analysis metadata file (.amd.tsv).
- Values input from the two previous steps.
+ Pipeline-internal on-the-fly peak flagging and standardization: for MACS2 output, the score
+ column is rescaled to the range 0...1000; for histoneHMM, default values are added to fulfill
+ the standard broadPeak/narrowPeak format specifications.
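
One way to bring a score column into the range 0...1000 is linear min-max rescaling. The sketch below assumes that approach for illustration only; the description does not state which rescaling the pipeline applies, and the function name is hypothetical.

```python
def rescale_scores(scores, upper=1000):
    """Linearly map scores onto the integer range 0..upper (min-max scaling).

    Assumption: min-max scaling is used here as an example; the actual
    pipeline may rescale differently.
    """
    if not scores:
        return []
    low, high = min(scores), max(scores)
    if high == low:
        # All scores identical: no spread to rescale, assign the maximum.
        return [upper for _ in scores]
    span = float(high - low)
    return [int(round((s - low) / span * upper)) for s in scores]


# Illustrative raw scores, not taken from real MACS2 output.
rescaled = rescale_scores([10, 20, 30])
```

The result always fits the 0 to 1000 score range expected by the ENCODE broadPeak and narrowPeak formats.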
@@ -317,7 +479,7 @@
0.6.6
@@ -331,8 +493,9 @@
2.5.3
no looping
@@ -346,6 +509,7 @@
plotCorrelation bins --corData SAMPLEID.npz --plotFile {DEEPID.PROC.DATE.bamcorr} --whatToPlot heatmap
--plotTitle {plot_title} --plotFileFormat svg --corMethod {cor_method}
--plotNumbers --zMin -1 --zMax 1 --colorMap coolwarm
+ --outFileCorMatrix {DEEPID.PROC.DATE.corrmat}
]]>
no looping