ENH: first draft of complete CHPv5 process document
Showing 1 changed file with 281 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,281 @@ | ||
<?xml version="1.0"?> | ||
<?xml-stylesheet type="text/css" href="http://deep.mpi-inf.mpg.de/DAC/files/style/deep_process_style.css"?> | ||
<process> | ||
<name>CHP</name> | ||
<version>5</version> | ||
<author> | ||
<name>Peter Ebert</name> | ||
<email>pebert@mpi-inf.mpg.de</email> | ||
</author> | ||
<description> | ||
Key points: | ||
(1) deepTools QC (fingerprint / GC bias) for raw BAM files [same as before] | ||
(2) deepTools QC (fingerprint) for filtered BAM files to create IHEC QC as part of the process [new] | ||
(3) deepTools correlation for filtered and blacklist removed BAM files [same as before] | ||
(4) peak calling with MACS2 and histoneHMM on filtered BAM files [partially new] | ||
(5) deepTools fold-change and coverage tracks for raw BAM files [same as before] | ||
(6) deepTools coverage track for filtered BAM file [same as before] | ||
</description> | ||
<inputs> | ||
<filetype> | ||
<identifier>GALvX_Histone</identifier> | ||
<format>BAM</format> | ||
<quantity>collection</quantity> | ||
<comment>Only paired-end libraries are supported</comment> | ||
</filetype> | ||
<filetype> | ||
<identifier>GALvX_Input</identifier> | ||
<format>BAM</format> | ||
<quantity>single</quantity> | ||
<comment>Only paired-end libraries are supported</comment> | ||
</filetype> | ||
<filetype> | ||
<identifier>GALvX_Index</identifier> | ||
<format>BAI</format> | ||
<quantity>collection</quantity> | ||
<comment> | ||
No distinction between histone and Input library for the index files - one index file per BAM file is required | ||
</comment> | ||
</filetype> | ||
<filetype> | ||
<identifier>QcSummary</identifier> | ||
<format>JSON</format> | ||
<quantity>collection</quantity> | ||
<comment> | ||
The median insert size (field: insertSizeMedian) is extracted from the QC summary file. | ||
</comment> | ||
</filetype> | ||
</inputs> | ||
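<!--
The QcSummary input above supplies the median insert size. A minimal Python sketch of that lookup follows. It assumes that `insertSizeMedian` is a top-level key of the JSON object, which this process document does not state, and the function name is hypothetical.

```python
import json


def read_insert_size_median(json_text):
    """Return the median insert size from a QC summary given as JSON text."""
    summary = json.loads(json_text)
    # Assumption: the field sits at the top level of the JSON object.
    value = summary["insertSizeMedian"]
    # Accept integers, floats, or numeric strings and round to a whole base count.
    return int(round(float(value)))


# Hypothetical QC summary content, for illustration only.
example = '{"insertSizeMedian": 187.0, "sample": "hypothetical"}'
print(read_insert_size_median(example))
```

If the real summary nests the field under another key, only the lookup line needs to change.
-->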
<references> | ||
<filetype> | ||
<identifier>blacklist_regions</identifier> | ||
<format>BED</format> | ||
<quantity>single</quantity> | ||
<comment>Blacklisted genomic regions</comment> | ||
</filetype> | ||
<filetype> | ||
<identifier>reference_genome</identifier> | ||
<format>2bit</format> | ||
<quantity>single</quantity> | ||
<comment>The reference genome file; see DCC/download/results/references/genomes</comment> | ||
</filetype> | ||
<filetype> | ||
<identifier>chromosome_sizes</identifier> | ||
<format>TSV</format> | ||
<quantity>single</quantity> | ||
<comment>2-column, tab-separated table of chromosome sizes for reference genome</comment> | ||
</filetype> | ||
</references> | ||
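<!--
The chromosome_sizes reference is described as a 2-column, tab-separated table. The Python sketch below shows one way such a table could be read into a dictionary. Skipping lines that start with "#" is an assumption, and the function name is hypothetical.

```python
def parse_chromosome_sizes(text):
    """Parse a 2-column, tab-separated chromosome size table into a dict."""
    sizes = {}
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and "#" header lines carry no data.
        if not line or line.startswith("#"):
            continue
        name, length = line.split("\t")[:2]
        sizes[name] = int(length)
    return sizes


# Illustrative input with two chromosomes.
table = "chr1\t248956422\nchr2\t242193529\n"
print(parse_chromosome_sizes(table))
```
-->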
<outputs> | ||
<filetype> | ||
<identifier>DEEPID.PROC.DATE.ext</identifier> | ||
<format>EXT</format> | ||
<quantity>collection</quantity> | ||
<comment>Process output not yet defined</comment> | ||
</filetype> | ||
</outputs> | ||
<software> | ||
<tool> | ||
<name>plotFingerprint</name> | ||
<version>2.5.3</version> | ||
<command_line> | ||
<![CDATA[ | ||
plotFingerprint -p {deeptools_parallel} --bamfiles {GALvX_*} --plotFile {DEEPID.PROC.DATE.fgpr} | ||
--labels {plot_labels} --plotTitle {plot_title} --numberOfSamples 500000 --plotFileFormat svg | ||
--outQualityMetrics {DEEPID.PROC.DATE.fgpr.qc} --JSDsample {GALvX_Input} | ||
]]> | ||
</command_line> | ||
<loop>no looping</loop> | ||
<comment>Compute fingerprint on raw BAM files</comment> | ||
</tool> | ||
<tool> | ||
<name>computeGCBias</name> | ||
<version>2.5.3</version> | ||
<command_line> | ||
<![CDATA[ | ||
computeGCBias -p {deeptools_parallel} --bamfile {GALvX_*} --effectiveGenomeSize {genomesize} | ||
--genome {reference_genome} --sampleSize 50000000 | ||
--GCbiasFrequenciesFile {DEEPID.PROC.DATE.gcfreq} --biasPlot {DEEPID.PROC.DATE.gcbias} --plotFileFormat svg | ||
]]> | ||
</command_line> | ||
<loop>GALvX_Histone, GALvX_Input</loop> | ||
<comment>Compute GC bias on raw BAM files</comment> | ||
</tool> | ||
<tool> | ||
<name>bamCompare</name> | ||
<version>2.5.3</version> | ||
<command_line> | ||
<![CDATA[ | ||
bamCompare -p {deeptools_parallel} --bamfile1 {GALvX_Histone} --bamfile2 {GALvX_Input} | ||
--outFileName {DEEPID.PROC.DATE.bamcomp} --outFileFormat bigwig --scaleFactorsMethod {*_scaling} | ||
--ratio log2 --binSize 25 | ||
]]> | ||
</command_line> | ||
<loop>GALvX_Histone</loop> | ||
<comment> | ||
Generate log2 fold-change tracks of signal over input for raw BAM files with scaling method | ||
"readCount" for libraries H3K27me3/H3K9me3, and "SES" otherwise | ||
</comment> | ||
</tool> | ||
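<!--
The comment above ties the value of {*_scaling} to the histone mark. A small Python sketch of that rule follows. The function and constant names are hypothetical, and mark names are assumed to be spelled exactly as in this document.

```python
# Marks for which the process document prescribes "readCount" scaling.
READCOUNT_MARKS = frozenset({"H3K27me3", "H3K9me3"})


def scaling_method(mark):
    """Return the bamCompare scaling method for a histone mark."""
    return "readCount" if mark in READCOUNT_MARKS else "SES"


for mark in ("H3K27me3", "H3K4me3"):
    print(mark, scaling_method(mark))
```
-->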
<tool> | ||
<name>bamCoverage</name> | ||
<version>2.5.3</version> | ||
<command_line> | ||
<![CDATA[ | ||
bamCoverage -p {deeptools_parallel} --binSize 25 --bam {GALvX_*} --outFileName {DEEPID.PROC.DATE.bamcov} | ||
--outFileFormat bigwig --normalizeTo1x {genomesize} | ||
]]> | ||
</command_line> | ||
<loop>GALvX_Histone, GALvX_Input</loop> | ||
<comment>Generate read coverage signal normalized to 1x depth for raw BAM files</comment> | ||
</tool> | ||
<tool> | ||
<name>sambamba</name> | ||
<version>0.6.6</version> | ||
<command_line> | ||
<![CDATA[ | ||
sambamba view --format=bam --nthreads={sambamba_parallel} --output-filename=DEEPID.tmp.filt.bam | ||
--filter="not (duplicate or unmapped or failed_quality_control or supplementary or secondary_alignment) and mapping_quality >= 5" | ||
{GALvX_*} | ||
]]> | ||
</command_line> | ||
<loop>GALvX_Histone, GALvX_Input</loop> | ||
<comment> | ||
Apply IHEC ChIP QC standard filtering to all BAM files (equivalent to bitflag 3844). | ||
The resulting BAM files are temporary and discarded after the analysis. | ||
</comment> | ||
</tool> | ||
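<!--
The comment above states that the filter is equivalent to bitflag 3844. The Python sketch below checks that arithmetic using the flag bit values from the SAM specification. The mapping quality threshold is a separate condition and is not part of the bitflag. The dictionary keys mirror the sambamba filter terms; the function name is hypothetical.

```python
# SAM flag bits for the read categories excluded by the filter expression.
SAM_FLAGS = {
    "unmapped": 0x4,
    "secondary_alignment": 0x100,
    "failed_quality_control": 0x200,
    "duplicate": 0x400,
    "supplementary": 0x800,
}

excluded = 0
for bit in SAM_FLAGS.values():
    excluded |= bit
print(excluded)  # 3844


def passes_filter(flag, mapq, min_mapq=5):
    """Return True if a read with this flag and MAPQ would be kept."""
    return (flag & excluded) == 0 and mapq >= min_mapq
```

For example, `passes_filter(99, 60)` is true for a properly paired first mate, while setting the duplicate bit (flag 1123) makes it false.
-->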
<tool> | ||
<name>MACS2</name> | ||
<version>2.1.1.20160309</version> | ||
<command_line> | ||
<![CDATA[ | ||
macs2 callpeak -t DEEPID.tmp.filt.bam -c DEEPID_Input.tmp.filt.bam -f BAM --gsize {genomesize} | ||
--keep-dup all --name {*_name_prefix} --nomodel --extsize {*_fraglen} --qvalue 0.05 {*_broad} | ||
]]> | ||
</command_line> | ||
<loop>GALvX_Histone</loop> | ||
<comment>MACS2 peak calling on filtered BAM files. Parameter "--broad" for libraries H3K4me1/H3K27me3/H3K36me3/H3K9me3</comment> | ||
</tool> | ||
<tool> | ||
<name>histoneHMM</name> | ||
<version>1.7</version> | ||
<command_line> | ||
<![CDATA[ | ||
histoneHMM_call_regions.R -b 750 --chromlen={chromosome_sizes} | ||
--outprefix={DEEPID} --probability=0.1 DEEPID.tmp.filt.bam | ||
]]> | ||
</command_line> | ||
<loop>GALvX_Histone</loop> | ||
<comment>HistoneHMM peak calling on filtered BAM files for broad marks: H3K4me1/H3K27me3/H3K9me3/H3K36me3</comment> | ||
</tool> | ||
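<!--
Both peak calling steps above depend on whether a library is a broad mark. The Python sketch below summarizes that dispatch as read from the two tool comments: MACS2 runs for every histone library and uses its broad option for broad marks, and histoneHMM is added for broad marks only. The function name and the returned labels are hypothetical.

```python
# Broad marks as listed in the MACS2 and histoneHMM comments.
BROAD_MARKS = frozenset({"H3K4me1", "H3K27me3", "H3K9me3", "H3K36me3"})


def peak_callers(mark):
    """Return labels for the peak calling steps applied to a histone mark."""
    if mark in BROAD_MARKS:
        return ["MACS2 (broad)", "histoneHMM"]
    return ["MACS2"]


for mark in ("H3K36me3", "H3K27ac"):
    print(mark, peak_callers(mark))
```
-->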
<tool> | ||
<name>bamCoverage</name> | ||
<version>2.5.3</version> | ||
<command_line> | ||
<![CDATA[ | ||
bamCoverage -p {deeptools_parallel} --binSize 25 --bam DEEPID.tmp.filt.bam --outFileName {DEEPID.PROC.DATE.bamcov} | ||
--outFileFormat bigwig --normalizeTo1x {genomesize} | ||
]]> | ||
</command_line> | ||
<loop>DEEPID.tmp.filt.bam</loop> | ||
<comment>Generate read coverage signal normalized to 1x depth for filtered BAM files</comment> | ||
</tool> | ||
<tool> | ||
<name>multiBamSummary</name> | ||
<version>2.5.3</version> | ||
<command_line> | ||
<![CDATA[ | ||
multiBamSummary bins -p {deeptools_parallel} --bamfiles DEEPID.tmp.filt.bam --outFileName SAMPLEID.npz | ||
--labels {plot_labels} --binSize 1000 --distanceBetweenBins 2000 --blackListFileName {blacklist_regions} | ||
]]> | ||
</command_line> | ||
<loop>no looping</loop> | ||
<comment>Create data matrix for correlation plot on filtered BAM files; remove blacklist regions on the fly</comment> | ||
</tool> | ||
<tool> | ||
<name>plotCorrelation</name> | ||
<version>2.5.3</version> | ||
<command_line> | ||
<![CDATA[ | ||
plotCorrelation --corData SAMPLEID.npz --plotFile {DEEPID.PROC.DATE.bamcorr} --whatToPlot heatmap | ||
--plotTitle {plot_title} --plotFileFormat svg --corMethod {cor_method} | ||
--plotNumbers --zMin -1 --zMax 1 --colorMap coolwarm | ||
]]> | ||
</command_line> | ||
<loop>no looping</loop> | ||
<comment>Create heatmap correlation plots, one per correlation method ({cor_method}: Spearman and Pearson)</comment> | ||
</tool> | ||
<tool> | ||
<name>plotFingerprint</name> | ||
<version>2.5.3</version> | ||
<command_line> | ||
<![CDATA[ | ||
plotFingerprint -p {deeptools_parallel} --bamfiles DEEPID.tmp.filt.bam --plotFile {DEEPID.PROC.DATE.fgpr} | ||
--labels {plot_labels} --plotTitle {plot_title} --numberOfSamples 500000 --plotFileFormat svg | ||
--outQualityMetrics {DEEPID.PROC.DATE.fgpr.qc} --JSDsample DEEPID_Input.tmp.filt.bam | ||
--blackListFileName {blacklist_regions} | ||
]]> | ||
</command_line> | ||
<loop>no looping</loop> | ||
<comment>Compute fingerprint on filtered BAM files; remove blacklist regions on the fly</comment> | ||
</tool> | ||
<tool> | ||
<name>sambamba</name> | ||
<version>0.6.6</version> | ||
<command_line> | ||
<![CDATA[ | ||
sambamba flagstat --nthreads={sambamba_parallel} DEEPID.tmp.filt.bam | ||
]]> | ||
</command_line> | ||
<loop>DEEPID.tmp.filt.bam</loop> | ||
<comment> | ||
Get flagstat output for filtered BAM files, specifically number of mapped reads in these files. | ||
This is done to compute the IHEC QC metrics as part of this process. | ||
</comment> | ||
</tool> | ||
<tool> | ||
<name>sambamba</name> | ||
<version>0.6.6</version> | ||
<command_line> | ||
<![CDATA[ | ||
sambamba view --count --nthreads={sambamba_parallel} | ||
--regions=peak_file DEEPID.tmp.filt.bam > DEEPID.tmp.peak_ovl.cnt | ||
]]> | ||
</command_line> | ||
<loop>DEEPID.tmp.filt.bam</loop> | ||
<comment>Count the reads in filtered BAM files that overlap called peak regions (numerator of the FRiP score)</comment> | ||
</tool> | ||
<tool> | ||
<name>custom</name> | ||
<version>0.1</version> | ||
<command_line> | ||
<![CDATA[ | ||
compute_frip: reads_in_peaks / total_mapped_reads | ||
]]> | ||
</command_line> | ||
<loop>no looping</loop> | ||
<comment> | ||
Compute the FRiP score (fraction of reads in peaks) and record it in the analysis metadata file (.amd.tsv). | ||
Input values are taken from the two previous steps. | ||
</comment> | ||
</tool> | ||
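<!--
The custom step above divides the number of reads in peaks by the number of mapped reads. A minimal Python sketch of that ratio follows. The input checks are an addition for illustration and are not prescribed by the process document.

```python
def compute_frip(reads_in_peaks, total_mapped_reads):
    """Return the fraction of reads in peaks (FRiP)."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    if not 0 <= reads_in_peaks <= total_mapped_reads:
        raise ValueError("reads_in_peaks must lie between 0 and total_mapped_reads")
    return reads_in_peaks / total_mapped_reads


# Illustrative counts, not taken from any real sample.
print(compute_frip(250000, 1000000))  # 0.25
```
-->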
<tool> | ||
<name>bedtools</name> | ||
<version>2.26.0</version> | ||
<command_line> | ||
<![CDATA[ | ||
bedtools intersect -v -a peak_files -b {blacklist_regions} > {DEEPID.PROC.DATE.peaks} | ||
]]> | ||
</command_line> | ||
<loop>All peak files</loop> | ||
<comment> | ||
Discard peaks overlapping with a known blacklist region after computing FRiP score. | ||
There does not seem to be an IHEC standard concerning this, correct? | ||
</comment> | ||
</tool> | ||
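<!--
The bedtools step above keeps only peaks without any overlap with a blacklist region, which is what `bedtools intersect -v` reports. The Python sketch below illustrates that semantics on half-open BED intervals given as (chrom, start, end) tuples. It is a quadratic toy version for clarity, not a replacement for bedtools, and the function names are hypothetical.

```python
def overlaps(a, b):
    """Return True if two half-open (chrom, start, end) intervals overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def drop_blacklisted(peaks, blacklist):
    """Keep peaks that overlap no blacklist region."""
    return [p for p in peaks if not any(overlaps(p, b) for b in blacklist)]


# Illustrative intervals only.
peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
blacklist = [("chr1", 150, 160), ("chr2", 200, 300)]
print(drop_blacklisted(peaks, blacklist))
```

The peak on chr2 is kept because it ends exactly where the blacklist region starts, and half-open intervals that only touch do not overlap.
-->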
</software> | ||
</process> |