ENH: update to CHPv5, still not finalized
pebert committed Sep 11, 2017
1 parent abe2cf7 commit a1778b5
Showing 1 changed file with 150 additions and 75 deletions.
225 changes: 150 additions & 75 deletions docs/quantification/chip-seq/CHPv5.xml
@@ -68,33 +68,90 @@
</references>
<outputs>
<filetype>
<identifier>DEEPID.PROC.DATE.ext</identifier>
<format>EXT</format>
<identifier>DEEPID.PROC.DATE.raw.bamcov</identifier>
<format>bigwig</format>
<quantity>collection</quantity>
<comment>Process output not yet defined</comment>
<comment>Signal coverage track generated from raw BAM files</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.filt.bamcov</identifier>
<format>bigwig</format>
<quantity>collection</quantity>
<comment>
Signal coverage track generated from filtered BAM files (-F 3844, MAPQ >= 5, blacklist regions removed)
</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.ses-fc</identifier>
<format>bigwig</format>
<quantity>collection</quantity>
<comment>SES normalized fold-change signal</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.cnt-fc</identifier>
<format>bigwig</format>
<quantity>collection</quantity>
<comment>Read-count normalized fold-change signal</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.gcfreq</identifier>
<format>svg</format>
<quantity>collection</quantity>
<comment>GC bias plot based on raw BAM files</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.gcfreq</identifier>
<format>txt</format>
<quantity>collection</quantity>
<comment>Obs./exp. GC read frequencies based on raw BAM files</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.hhmm.emfit</identifier>
<format>PDF</format>
<quantity>collection</quantity>
<comment>histoneHMM output visualizing the EM fit. Check this before using the histoneHMM output</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.hhmm.out</identifier>
<format>zip</format>
<quantity>collection</quantity>
<comment>Zip archive containing other histoneHMM output files (raw data files not needed by most users)</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.fgpr</identifier>
<format>SVG</format>
<quantity>single</quantity>
<comment>Fingerprint plots based on raw BAM files</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.qm-fgpr</identifier>
<format>txt</format>
<quantity>single</quantity>
<comment>Fingerprint quality metrics based on raw BAM files</comment>
</filetype>

</outputs>
<software>

<tool>
<name>plotFingerprint</name>
<name>bamCoverage</name>
<version>2.5.3</version>
<command_line>
<![CDATA[
plotFingerprint -p {deeptools_parallel} --bamfiles {GALvX_*} --plotFile {DEEPID.PROC.DATE.fgpr}
--labels {plot_labels} --plotTitle {plot_title} --numberOfSamples 500000 --plotFileFormat svg
--outQualityMetrics {DEEPID.PROC.DATE.fgpr.qc} --JSDsample {GALvX_Input}
bamCoverage -p {deeptools_parallel} --binSize 25 --bam {GALvX_*} --outFileName {DEEPID.PROC.DATE.bamcov}
--outFileFormat bigwig --normalizeTo1x {genomesize}
]]>
</command_line>
<loop>no looping</loop>
<comment>Compute fingerprint on raw BAM files</comment>
<loop>GALvX_Histone, GALvX_Input</loop>
<comment>Generate read coverage signal normalized to 1x depth for raw BAM files</comment>
</tool>
<tool>
<name>computeGCBias</name>
<version>2.5.3</version>
<command_line>
<![CDATA[
computeGCBias -p {deeptools_parallel} --bamfile {GALvX_*} --effectiveGenomeSize {genomesize}
--genome {reference_genome} --sampleSize 50000000
--genome {reference_genome} --sampleSize 50000000 --fragmentLength {*_fraglen}
--GCbiasFrequenciesFile {DEEPID.PROC.DATE.gcfreq} --biasPlot {DEEPID.PROC.DATE.gcbias} --plotFileFormat svg
]]>
</command_line>
@@ -118,24 +175,25 @@
</comment>
</tool>
<tool>
<name>bamCoverage</name>
<name>plotFingerprint</name>
<version>2.5.3</version>
<command_line>
<![CDATA[
bamCoverage -p {deeptools_parallel} --binSize 25 --bam {GALvX_*} --outFileName {DEEPID.PROC.DATE.bamcov}
--outFileFormat bigwig --normalizeTo1x {genomesize}
plotFingerprint -p {deeptools_parallel} --bamfiles {GALvX_*} --plotFile {DEEPID.PROC.DATE.fgpr}
--labels {plot_labels} --plotTitle {plot_title} --numberOfSamples 500000 --plotFileFormat svg
--outQualityMetrics {DEEPID.PROC.DATE.fgpr.qc} --JSDsample {GALvX_Input}
]]>
</command_line>
<loop>GALvX_Histone, GALvX_Input</loop>
<comment>Generate read coverage signal normalized to 1x depth for raw BAM files</comment>
<loop>no looping</loop>
<comment>Compute fingerprint on raw BAM files</comment>
</tool>

<tool>
<name>sambamba</name>
<version>0.6.6</version>
<command_line>
<![CDATA[
sambamba view --format=bam --nthreads={sambamba_parallel} -output-filename DEEPID.tmp.filt.bam
sambamba view --format=bam --nthreads={sambamba_parallel} --output-filename DEEPID.tmp.filt.bam
--filter="not (duplicate or unmapped or failed_quality_control or supplementary or secondary_alignment) and mapping_quality >= 5"
{GALvX_*}
]]>
@@ -145,98 +203,88 @@
Apply IHEC ChIP QC standard filtering to all BAM files (equivalent to bitflag 3844).
The resulting BAM files are temporary and discarded after the analysis.
</comment>
</tool>
<tool>
<name>MACS2</name>
<version>2.1.1.20160309</version>
<command_line>
<![CDATA[
macs2 callpeak -t DEEPID.tmp.filt.bam -c DEEPID_Input.tmp.filt.bam -f BAM --gsize {genomesize}
--keep-dup all --name {*_name_prefix} --nomodel --extsize {*_fraglen} --qvalue 0.05 {*_broad}
]]>
</command_line>
<loop>GALvX_Histone</loop>
<comment>MACS2 peak calling on filtered BAM files. Parameter "--broad" for libraries H3K4me1/H3K27me3/H3K36me/H3K9me3</comment>
</tool>

<tool>
<name>histoneHMM</name>
<version>1.7</version>
<command_line>
<name>sambamba</name>
<version>0.6.6</version>
<command_line>
<![CDATA[
histoneHMM_call_regions.R -b 750 --chromlen={chromosome_sizes}
--outprefix={DEEPID} --probability=0.1 DEEPID.tmp.filt.bam
sambamba view --count DEEPID.tmp.filt.bam > DEEPID.mapped.readcount
]]>
</command_line>
<loop>GALvX_Histone</loop>
<comment>HistoneHMM peak calling on filtered BAM files for broad marks: H3K4me1/H3K27me3/H3K9me3/H3K36me3</comment>
<loop>DEEPID.tmp.filt.bam</loop>
<comment>
Because of the previous filtering step, counting all reads in the filtered BAM
file is equivalent to counting only mapped reads. The number of mapped reads is needed
to compute the FRiP score at a later stage.
</comment>
</tool>
<tool>
<name>bamCoverage</name>
<version>2.5.3</version>
<command_line>
<![CDATA[
bamCoverage -p {deeptools_parallel} --binSize 25 --bam DEEPID.tmp.filt.bam --outFileName {DEEPID.PROC.DATE.bamcov}
--outFileFormat bigwig --normalizeTo1x {genomesize}
--outFileFormat bigwig --normalizeTo1x {genomesize} --blackListFileName {blacklist_regions} --ignoreForNorm chrX chrY chrM X Y M MT
]]>
</command_line>
<loop>DEEPID.tmp.filt.bam</loop>
<comment>Generate read coverage signal normalized to 1x depth for filtered BAM files</comment>
<comment>
Generate read coverage signal normalized to 1x depth for filtered BAM files.
Remove blacklist regions on-the-fly and consider only autosomes for normalization step.
</comment>
</tool>

<tool>
<name>multiBamSummary</name>
<name>plotFingerprint</name>
<version>2.5.3</version>
<command_line>
<![CDATA[
multiBamSummary bins -p {deeptools_parallel} --bamfiles DEEPID.tmp.filt.bam --outFileName SAMPLEID.npz
--labels {plot_labels} --binSize 1000 --distanceBetweenBins 2000 --blackListFileName {blacklist_regions}
plotFingerprint -p {deeptools_parallel} --bamfiles DEEPID.tmp.filt.bam --plotFile {DEEPID.PROC.DATE.fgpr}
--labels {plot_labels} --plotTitle {plot_title} --numberOfSamples 500000 --plotFileFormat svg
--outQualityMetrics {DEEPID.PROC.DATE.fgpr.qc} --JSDsample DEEPID_Input.tmp.filt.bam
]]>
</command_line>
<loop>no looping</loop>
<comment>Create data matrix for correlation plot on filtered BAM files; remove blacklist regions on the fly</comment>
<comment>Compute fingerprint on filtered BAM files to compute IHEC QC measures</comment>
</tool>
<tool>
<name>plotCorrelation</name>
<version>2.5.3</version>
<name>MACS2</name>
<version>2.1.1.20160309</version>
<command_line>
<![CDATA[
plotCorrelation bins --corData SAMPLEID.npz --plotFile {DEEPID.PROC.DATE.bamcorr} --whatToPlot heatmap
--plotTitle {plot_title} --plotFileFormat svg --corMethod {cor_method}
--plotNumbers --zMin -1 --zMax 1 --colorMap coolwarm
macs2 callpeak -t DEEPID.tmp.filt.bam -c DEEPID_Input.tmp.filt.bam -f BAM --gsize {genomesize}
--keep-dup all --name {*_name_prefix} --nomodel --extsize {*_fraglen} --qvalue 0.05 {*_broad}
]]>
</command_line>
<loop>no looping</loop>
<comment>Create heatmap correlation plot using Spearman and Pearson correlation</comment>
<loop>GALvX_Histone</loop>
<comment>MACS2 peak calling on filtered BAM files. Parameter "--broad" for libraries H3K27me3/H3K36me3/H3K9me3</comment>
</tool>
<tool>
<name>plotFingerprint</name>
<version>2.5.3</version>

<tool>
<name>histoneHMM</name>
<version>1.7</version>
<command_line>
<![CDATA[
plotFingerprint -p {deeptools_parallel} --bamfiles DEEPID.tmp.filt.bam --plotFile {DEEPID.PROC.DATE.fgpr}
--labels {plot_labels} --plotTitle {plot_title} --numberOfSamples 500000 --plotFileFormat svg
--outQualityMetrics {DEEPID.PROC.DATE.fgpr.qc} --JSDsample DEEPID_Input.tmp.filt.bam
--blackListFileName {blacklist_regions}
histoneHMM_call_regions.R -b 750 --chromlen={chromosome_sizes}
--outprefix=DEEPID-regions.gff --probability=0.1 DEEPID.tmp.filt.bam
]]>
</command_line>
<loop>no looping</loop>
<comment>Compute fingerprint on filtered BAM files; remove blacklist regions on the fly</comment>
<loop>GALvX_Histone</loop>
<comment>HistoneHMM peak calling on filtered BAM files for broad marks: H3K4me1/H3K27me3/H3K9me3/H3K36me3</comment>
</tool>
<tool>
<name>sambamba</name>
<version>0.6.6</version>
<name>cut, sort, mv</name>
<version>8.13</version>
<command_line>
<![CDATA[
sambamba flagstat --nthreads={sambamba_parallel} DEEPID.tmp.filt.bam
cut -f 1,4,5,9 DEEPID-regions.gff | sort -V -k1,2 > DEEPID.hmm.bed &&
mv DEEPID-zinba-emfit.pdf DEEPID.PROC.DATE.hhmm.emfit.pdf
]]>
</command_line>
<loop>DEEPID.tmp.filt.bam</loop>
<comment>
Get flagstat output for filtered BAM files, specifically number of mapped reads in these files.
This is done to compute the IHEC QC metrics as part of this process.
</comment>
<loop>DEEPID-regions.gff</loop>
<comment>Make histoneHMM output BED-like for blacklist intersection and standardize name of EM fit PDF.</comment>
</tool>

<tool>
<name>sambamba</name>
<version>0.6.6</version>
@@ -263,19 +311,46 @@
Values input from the two previous steps.
</comment>
</tool>

<tool>
<name>bedtools</name>
<version>2.26.0</version>
<name>sambamba</name>
<version>0.6.6</version>
<command_line>
<![CDATA[
bedtools intersect -v -a peak_files -b {blacklist_regions} > {DEEPID.PROC.DATE.peaks}
]]>
</command_line>
<loop>All peak files</loop>
sambamba view --format=bam --nthreads={sambamba_parallel} --output-filename DEEPID.tmp.auto.bam
--regions={autosome_regions} DEEPID.tmp.filt.bam
]]>
</command_line>
<loop>DEEPID.tmp.filt.bam</loop>
<comment>
Discard peaks overlapping with a known blacklist region after computing FRiP score.
There does not seem to be an IHEC standard concerning this, correct?
Restrict filtered BAM files to autosomal regions. These BAM files will be used to plot the correlation heatmaps.
</comment>
</tool>
<tool>
<name>multiBamSummary</name>
<version>2.5.3</version>
<command_line>
<![CDATA[
multiBamSummary bins -p {deeptools_parallel} --bamfiles DEEPID.tmp.auto.bam --outFileName SAMPLEID.npz
--labels {plot_labels} --binSize 1000 --distanceBetweenBins 2000 --blackListFileName {blacklist_regions}
]]>
</command_line>
<loop>no looping</loop>
<comment>Create data matrix for correlation plot on filtered, autosome-restricted BAM files; remove blacklist regions on the fly</comment>
</tool>
<tool>
<name>plotCorrelation</name>
<version>2.5.3</version>
<command_line>
<![CDATA[
plotCorrelation bins --corData SAMPLEID.npz --plotFile {DEEPID.PROC.DATE.bamcorr} --whatToPlot heatmap
--plotTitle {plot_title} --plotFileFormat svg --corMethod {cor_method}
--plotNumbers --zMin -1 --zMax 1 --colorMap coolwarm
]]>
</command_line>
<loop>no looping</loop>
<comment>Create heatmap correlation plot using Spearman and Pearson correlation</comment>
</tool>

</software>
</process>
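
The sambamba filtering step in this process is described as equivalent to SAM bitflag 3844 (the `-F 3844` noted for the filtered coverage track). The following is a minimal Python sketch, not part of the commit, that checks this arithmetic and mimics the filter on plain integers. The flag bit values come from the SAM specification; the names `EXCLUDE_MASK` and `passes_filter` are hypothetical and exist only for illustration.

```python
# Sketch only: reproduces the logic of the sambamba filter expression
#   not (duplicate or unmapped or failed_quality_control
#        or supplementary or secondary_alignment) and mapping_quality >= 5
# using SAM flag bits as defined in the SAM specification.

FLAG_BITS = {
    "unmapped": 0x4,                 # 4
    "secondary_alignment": 0x100,    # 256
    "failed_quality_control": 0x200, # 512
    "duplicate": 0x400,              # 1024
    "supplementary": 0x800,          # 2048
}

# Combined mask of all flags that cause a read to be discarded.
EXCLUDE_MASK = sum(FLAG_BITS.values())


def passes_filter(flag: int, mapq: int, min_mapq: int = 5) -> bool:
    """Return True if a read with this FLAG and MAPQ would be kept."""
    return (flag & EXCLUDE_MASK) == 0 and mapq >= min_mapq


if __name__ == "__main__":
    print(EXCLUDE_MASK)             # combined exclusion mask
    print(passes_filter(99, 30))    # properly paired, first in pair, good MAPQ
    print(passes_filter(1024, 30))  # marked as duplicate
    print(passes_filter(0, 4))      # MAPQ below threshold
```

Since unmapped reads are among the excluded flags, every read that survives this filter is mapped, which is why a plain read count on the filtered BAM file yields the number of mapped reads.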
