ENH: first draft of complete CHPv5 process document
pebert committed Sep 6, 2017
1 parent 9a7d641 commit abe2cf7
Showing 1 changed file with 281 additions and 0 deletions.
281 changes: 281 additions & 0 deletions docs/quantification/chip-seq/CHPv5.xml
@@ -0,0 +1,281 @@
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="http://deep.mpi-inf.mpg.de/DAC/files/style/deep_process_style.css"?>
<process>
<name>CHP</name>
<version>5</version>
<author>
<name>Peter Ebert</name>
<email>pebert@mpi-inf.mpg.de</email>
</author>
<description>
Key points:
(1) deepTools QC (fingerprint / GC bias) on raw BAM files [same as before]
(2) deepTools QC (fingerprint) on filtered BAM files to compute IHEC QC metrics as part of the process [new]
(3) deepTools correlation on filtered BAM files with blacklist regions removed [same as before]
(4) peak calling with MACS2 and histoneHMM on filtered BAM files [partially new]
(5) deepTools fold-change and coverage tracks for raw BAM files [same as before]
(6) deepTools coverage tracks for filtered BAM files [same as before]
</description>
<inputs>
<filetype>
<identifier>GALvX_Histone</identifier>
<format>BAM</format>
<quantity>collection</quantity>
<comment>Only paired-end libraries are supported</comment>
</filetype>
<filetype>
<identifier>GALvX_Input</identifier>
<format>BAM</format>
<quantity>single</quantity>
<comment>Only paired-end libraries are supported</comment>
</filetype>
<filetype>
<identifier>GALvX_Index</identifier>
<format>BAI</format>
<quantity>collection</quantity>
<comment>
No distinction between histone and Input library for the index files - one index file per BAM file is required
</comment>
</filetype>
<filetype>
<identifier>QcSummary</identifier>
<format>JSON</format>
<quantity>collection</quantity>
<comment>
The median insert size (field: insertSizeMedian) is extracted from the QC summary file.
</comment>
</filetype>
</inputs>
<references>
<filetype>
<identifier>blacklist_regions</identifier>
<format>BED</format>
<quantity>single</quantity>
<comment>Blacklisted genomic regions; excluded from the correlation, the filtered fingerprint and the final peak files</comment>
</filetype>
<filetype>
<identifier>reference_genome</identifier>
<format>2bit</format>
<quantity>single</quantity>
<comment>The reference genome file; see DCC/download/results/references/genomes</comment>
</filetype>
<filetype>
<identifier>chromosome_sizes</identifier>
<format>TSV</format>
<quantity>single</quantity>
<comment>2-column, tab-separated table of chromosome sizes for reference genome</comment>
</filetype>
</references>
<outputs>
<filetype>
<identifier>DEEPID.PROC.DATE.ext</identifier>
<format>EXT</format>
<quantity>collection</quantity>
<comment>Process output not yet defined</comment>
</filetype>
</outputs>
<software>
<tool>
<name>plotFingerprint</name>
<version>2.5.3</version>
<command_line>
<![CDATA[
plotFingerprint -p {deeptools_parallel} --bamfiles {GALvX_*} --plotFile {DEEPID.PROC.DATE.fgpr}
--labels {plot_labels} --plotTitle {plot_title} --numberOfSamples 500000 --plotFileFormat svg
--outQualityMetrics {DEEPID.PROC.DATE.fgpr.qc} --JSDsample {GALvX_Input}
]]>
</command_line>
<loop>no looping</loop>
<comment>Compute fingerprint on raw BAM files</comment>
</tool>
<tool>
<name>computeGCBias</name>
<version>2.5.3</version>
<command_line>
<![CDATA[
computeGCBias -p {deeptools_parallel} --bamfile {GALvX_*} --effectiveGenomeSize {genomesize}
--genome {reference_genome} --sampleSize 50000000
--GCbiasFrequenciesFile {DEEPID.PROC.DATE.gcfreq} --biasPlot {DEEPID.PROC.DATE.gcbias} --plotFileFormat svg
]]>
</command_line>
<loop>GALvX_Histone, GALvX_Input</loop>
<comment>Compute GC bias on raw BAM files</comment>
</tool>
<tool>
<name>bamCompare</name>
<version>2.5.3</version>
<command_line>
<![CDATA[
bamCompare -p {deeptools_parallel} --bamfile1 {GALvX_Histone} --bamfile2 {GALvX_Input}
--outFileName {DEEPID.PROC.DATE.bamcomp} --outFileFormat bigwig --scaleFactorsMethod {*_scaling}
--ratio log2 --binSize 25
]]>
</command_line>
<loop>GALvX_Histone</loop>
<comment>
Generate log2 fold-change tracks of signal over input for raw BAM files with scaling method
&quot;readCount&quot; for libraries H3K27me3/H3K9me3, and &quot;SES&quot; otherwise
</comment>
</tool>
<tool>
<name>bamCoverage</name>
<version>2.5.3</version>
<command_line>
<![CDATA[
bamCoverage -p {deeptools_parallel} --binSize 25 --bam {GALvX_*} --outFileName {DEEPID.PROC.DATE.bamcov}
--outFileFormat bigwig --normalizeTo1x {genomesize}
]]>
</command_line>
<loop>GALvX_Histone, GALvX_Input</loop>
<comment>Generate read coverage signal normalized to 1x depth for raw BAM files</comment>
</tool>

<tool>
<name>sambamba</name>
<version>0.6.6</version>
<command_line>
<![CDATA[
sambamba view --format=bam --nthreads={sambamba_parallel} --output-filename DEEPID.tmp.filt.bam
--filter="not (duplicate or unmapped or failed_quality_control or supplementary or secondary_alignment) and mapping_quality >= 5"
{GALvX_*}
]]>
</command_line>
<loop>GALvX_Histone, GALvX_Input</loop>
<comment>
Apply IHEC ChIP QC standard filtering to all BAM files (equivalent to bitflag 3844).
The resulting BAM files are temporary and discarded after the analysis.
</comment>
</tool>
<tool>
<name>MACS2</name>
<version>2.1.1.20160309</version>
<command_line>
<![CDATA[
macs2 callpeak -t DEEPID.tmp.filt.bam -c DEEPID_Input.tmp.filt.bam -f BAM --gsize {genomesize}
--keep-dup all --name {*_name_prefix} --nomodel --extsize {*_fraglen} --qvalue 0.05 {*_broad}
]]>
</command_line>
<loop>GALvX_Histone</loop>
<comment>MACS2 peak calling on filtered BAM files. Parameter "--broad" is set for libraries H3K4me1/H3K27me3/H3K36me3/H3K9me3</comment>
</tool>

<tool>
<name>histoneHMM</name>
<version>1.7</version>
<command_line>
<![CDATA[
histoneHMM_call_regions.R -b 750 --chromlen={chromosome_sizes}
--outprefix={DEEPID} --probability=0.1 DEEPID.tmp.filt.bam
]]>
</command_line>
<loop>GALvX_Histone</loop>
<comment>histoneHMM peak calling on filtered BAM files for broad marks: H3K4me1/H3K27me3/H3K9me3/H3K36me3</comment>
</tool>
<tool>
<name>bamCoverage</name>
<version>2.5.3</version>
<command_line>
<![CDATA[
bamCoverage -p {deeptools_parallel} --binSize 25 --bam DEEPID.tmp.filt.bam --outFileName {DEEPID.PROC.DATE.bamcov}
--outFileFormat bigwig --normalizeTo1x {genomesize}
]]>
</command_line>
<loop>DEEPID.tmp.filt.bam</loop>
<comment>Generate read coverage signal normalized to 1x depth for filtered BAM files</comment>
</tool>

<tool>
<name>multiBamSummary</name>
<version>2.5.3</version>
<command_line>
<![CDATA[
multiBamSummary bins -p {deeptools_parallel} --bamfiles DEEPID.tmp.filt.bam --outFileName SAMPLEID.npz
--labels {plot_labels} --binSize 1000 --distanceBetweenBins 2000 --blackListFileName {blacklist_regions}
]]>
</command_line>
<loop>no looping</loop>
<comment>Create data matrix for correlation plot on filtered BAM files; remove blacklist regions on the fly</comment>
</tool>
<tool>
<name>plotCorrelation</name>
<version>2.5.3</version>
<command_line>
<![CDATA[
plotCorrelation --corData SAMPLEID.npz --plotFile {DEEPID.PROC.DATE.bamcorr} --whatToPlot heatmap
--plotTitle {plot_title} --plotFileFormat svg --corMethod {cor_method}
--plotNumbers --zMin -1 --zMax 1 --colorMap coolwarm
]]>
</command_line>
<loop>no looping</loop>
<comment>Create heatmap correlation plot using Spearman and Pearson correlation</comment>
</tool>
<tool>
<name>plotFingerprint</name>
<version>2.5.3</version>
<command_line>
<![CDATA[
plotFingerprint -p {deeptools_parallel} --bamfiles DEEPID.tmp.filt.bam --plotFile {DEEPID.PROC.DATE.fgpr}
--labels {plot_labels} --plotTitle {plot_title} --numberOfSamples 500000 --plotFileFormat svg
--outQualityMetrics {DEEPID.PROC.DATE.fgpr.qc} --JSDsample DEEPID_Input.tmp.filt.bam
--blackListFileName {blacklist_regions}
]]>
</command_line>
<loop>no looping</loop>
<comment>Compute fingerprint on filtered BAM files; remove blacklist regions on the fly</comment>
</tool>
<tool>
<name>sambamba</name>
<version>0.6.6</version>
<command_line>
<![CDATA[
sambamba flagstat --nthreads={sambamba_parallel} DEEPID.tmp.filt.bam
]]>
</command_line>
<loop>DEEPID.tmp.filt.bam</loop>
<comment>
Get flagstat output for filtered BAM files, specifically number of mapped reads in these files.
This is done to compute the IHEC QC metrics as part of this process.
</comment>
</tool>
<tool>
<name>sambamba</name>
<version>0.6.6</version>
<command_line>
<![CDATA[
sambamba view --count --nthreads={sambamba_parallel}
--regions=peak_file DEEPID.tmp.filt.bam > DEEPID.tmp.peak_ovl.cnt
]]>
</command_line>
<loop>DEEPID.tmp.filt.bam</loop>
<comment>Count the reads in filtered BAM files that overlap called peak regions; this count is the numerator of the FRiP score</comment>
</tool>
<tool>
<name>custom</name>
<version>0.1</version>
<command_line>
<![CDATA[
compute_frip: reads_in_peaks / total_mapped_reads
]]>
</command_line>
<loop>no looping</loop>
<comment>
Compute FRiP score and record in analysis metadata file (.amd.tsv).
Values input from the two previous steps.
</comment>
</tool>
<tool>
<name>bedtools</name>
<version>2.26.0</version>
<command_line>
<![CDATA[
bedtools intersect -v -a peak_files -b {blacklist_regions} > {DEEPID.PROC.DATE.peaks}
]]>
</command_line>
<loop>All peak files</loop>
<comment>
Discard peaks overlapping with a known blacklist region after computing FRiP score.
Open question: there does not seem to be an IHEC standard for this step; to be confirmed.
</comment>
</tool>
</software>
</process>
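
The sambamba filtering step states that its filter expression is equivalent to bitflag 3844. The sketch below checks that arithmetic using the flag bit values from the SAM specification and mirrors the `mapping_quality >= 5` condition. The names `EXCLUDED_FLAGS`, `EXCLUDE_MASK`, `MIN_MAPQ` and `passes_filter` are illustrative assumptions and are not part of the process definition.

```python
# SAM flag bits excluded by the filter expression (bit values per the SAM specification).
EXCLUDED_FLAGS = {
    "unmapped": 0x4,
    "secondary_alignment": 0x100,
    "failed_quality_control": 0x200,
    "duplicate": 0x400,
    "supplementary": 0x800,
}

# Combined exclusion mask; this should match the bitflag quoted in the process comment.
EXCLUDE_MASK = sum(EXCLUDED_FLAGS.values())

# Mapping quality threshold taken from the sambamba filter expression.
MIN_MAPQ = 5


def passes_filter(flag: int, mapq: int) -> bool:
    """Return True if a read with this SAM flag and MAPQ would be kept."""
    return (flag & EXCLUDE_MASK) == 0 and mapq >= MIN_MAPQ


if __name__ == "__main__":
    print(EXCLUDE_MASK)            # combined mask of all excluded flag bits
    print(passes_filter(99, 30))   # properly paired first mate with good MAPQ
    print(passes_filter(1123, 30)) # the same read, additionally marked as duplicate
```

Under these assumptions, a read is kept only if none of the five excluded bits is set and its mapping quality is at least 5.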

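The `compute_frip` step divides the number of reads overlapping peaks by the total number of mapped reads, both taken from the two preceding sambamba steps. A minimal sketch of that calculation follows. It assumes the flagstat output contains a samtools-style line of the form `N + M mapped (...)`. The helper names `parse_mapped_reads` and `compute_frip` are hypothetical and do not describe the actual custom script.

```python
import re


def parse_mapped_reads(flagstat_text: str) -> int:
    """Extract the QC-passed mapped read count from flagstat-style text.

    Assumption: the output contains a line "<passed> + <failed> mapped (...)".
    """
    for line in flagstat_text.splitlines():
        match = re.match(r"^(\d+) \+ (\d+) mapped \(", line)
        if match:
            return int(match.group(1))
    raise ValueError("no 'mapped' line found in flagstat output")


def compute_frip(reads_in_peaks: int, total_mapped_reads: int) -> float:
    """Fraction of reads in peaks: reads_in_peaks / total_mapped_reads."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return reads_in_peaks / total_mapped_reads


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    example_flagstat = (
        "1000 + 0 in total (QC-passed reads + QC-failed reads)\n"
        "1000 + 0 mapped (100.00%:N/A)\n"
        "0 + 0 with mate mapped to a different chr\n"
    )
    total = parse_mapped_reads(example_flagstat)
    print(compute_frip(250, total))
```

The numerator would come from the count written to `DEEPID.tmp.peak_ovl.cnt`, which the sketch takes as a plain integer argument.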