From abe2cf7cd28e9243e25ab54061ad0b9964a1c9b5 Mon Sep 17 00:00:00 2001 From: Peter Ebert Date: Wed, 6 Sep 2017 16:03:56 +0200 Subject: [PATCH] ENH: first draft of complete CHPv5 process document --- docs/quantification/chip-seq/CHPv5.xml | 281 +++++++++++++++++++++++++ 1 file changed, 281 insertions(+) create mode 100644 docs/quantification/chip-seq/CHPv5.xml diff --git a/docs/quantification/chip-seq/CHPv5.xml b/docs/quantification/chip-seq/CHPv5.xml new file mode 100644 index 0000000..9a828a6 --- /dev/null +++ b/docs/quantification/chip-seq/CHPv5.xml @@ -0,0 +1,281 @@ + + + + CHP + 5 + + Peter Ebert + pebert@mpi-inf.mpg.de + + + Key points: + (1) deepTools QC (fingerprint / GC bias) for raw BAM files [same as before] + (2) deepTools QC (fingerprint) for filtered BAM files to create IHEC QC as part of the process [new] + (3) deepTools correlation for filtered and blacklist removed BAM files [same as before] + (4) peak calling with MACS2 and histoneHMM on filtered BAM files [partially new] + (5) deepTools fold-change and coverage tracks for raw BAM files [same as before] + (6) deepTools coverage track for filtered BAM file [same as before] + + + + GALvX_Histone + BAM + collection + Only paired-end libraries are supported + + + GALvX_Input + BAM + single + Only paired-end libraries are supported + + + GALvX_Index + BAI + collection + + No distinction between histone and Input library for the index files - one index file per BAM file is required + + + + QcSummary + JSON + collection + + The median insert size (field: insertSizeMedian) is extracted from the QC summary file. 
+ + + + + + blacklist_regions + BED + single + Blacklist region + + + reference_genome + 2bit + single + The reference genome file; see DCC/download/results/references/genomes + + + chromosome_sizes + TSV + single + 2-column, tab-separated table of chromosome sizes for reference genome + + + + + DEEPID.PROC.DATE.ext + EXT + collection + Process output not yet defined + + + + + plotFingerprint + 2.5.3 + + + + no looping + Compute fingerprint on raw BAM files + + + computeGCBias + 2.5.3 + + + + GALvX_Histone, GALvX_Input + Compute GC bias on raw BAM files + + + bamCompare + 2.5.3 + + + + GALvX_Histone + + Generate log2 fold-change tracks of signal over input for raw BAM files with scaling method + "readCount" for libraries H3K27me3/H3K9me3, and "SES" otherwise + + + + bamCoverage + 2.5.3 + + + + GALvX_Histone, GALvX_Input + Generate read coverage signal normalized to 1x depth for raw BAM files + + + + sambamba + 0.6.6 + + = 5" + {GALvX_*} + ]]> + + GALvX_Histone, GALvX_Input + + Apply IHEC ChIP QC standard filtering to all BAM files (equivalent to bitflag 3844). + The resulting BAM files are temporary and discarded after the analysis. + + + + MACS2 + 2.1.1.20160309 + + + + GALvX_Histone + MACS2 peak calling on filtered BAM files. 
Parameter "--broad" for libraries H3K4me1/H3K27me3/H3K36me3/H3K9me3 + + + + histoneHMM + 1.7 + + + + GALvX_Histone + HistoneHMM peak calling on filtered BAM files for broad marks: H3K4me1/H3K27me3/H3K9me3/H3K36me3 + + + bamCoverage + 2.5.3 + + + + DEEPID.tmp.filt.bam + Generate read coverage signal normalized to 1x depth for filtered BAM files + + + + multiBamSummary + 2.5.3 + + + + no looping + Create data matrix for correlation plot on filtered BAM files; remove blacklist regions on the fly + + + plotCorrelation + 2.5.3 + + + + no looping + Create heatmap correlation plot using Spearman and Pearson correlation + + + plotFingerprint + 2.5.3 + + + + no looping + Compute fingerprint on filtered BAM files; remove blacklist regions on the fly + + + sambamba + 0.6.6 + + + + DEEPID.tmp.filt.bam + + Get flagstat output for filtered BAM files, specifically number of mapped reads in these files. + This is done to compute the IHEC QC metrics as part of this process. + + + + sambamba + 0.6.6 + + DEEPID.tmp.peak_ovl.cnt + ]]> + + DEEPID.tmp.filt.bam + Get flagstat output for filtered BAM files, specifically number of mapped reads in these files + + + custom + 0.1 + + + + no looping + + Compute FRiP score and record in analysis metadata file (.amd.tsv). + Values input from the two previous steps. + + + + bedtools + 2.26.0 + + {DEEPID.PROC.DATE.peaks} + ]]> + + All peak files + + Discard peaks overlapping with a known blacklist region after computing FRiP score. + There does not seem to be an IHEC standard concerning this, correct? + + + +
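The FRiP step above takes its two inputs from the preceding sambamba steps: the number of mapped reads in the filtered BAM file and the number of reads overlapping peaks. A minimal sketch of that computation follows, assuming FRiP is the plain ratio of the two counts. The function names and the key/value row layout for the .amd.tsv file are hypothetical and not taken from the process document.

```python
def frip_score(reads_in_peaks: int, mapped_reads: int) -> float:
    """Fraction of reads in peaks: reads overlapping peaks / mapped reads.

    Both counts are assumed to come from the same filtered BAM file.
    """
    if mapped_reads <= 0:
        raise ValueError("mapped_reads must be a positive integer")
    if not 0 <= reads_in_peaks <= mapped_reads:
        raise ValueError("reads_in_peaks must lie between 0 and mapped_reads")
    return reads_in_peaks / mapped_reads


def format_amd_row(key: str, value: float) -> str:
    """Format one tab-separated key/value row.

    Hypothetical layout: the actual .amd.tsv schema is not specified here.
    """
    return f"{key}\t{value:.4f}"


if __name__ == "__main__":
    # Illustrative counts only, not real data.
    score = frip_score(reads_in_peaks=250, mapped_reads=1000)
    print(format_amd_row("FRiP", score))
```

Because the blacklist filter on the peak files runs after this step, a score computed this way still includes reads in peaks that are later discarded.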