<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="http://deep.mpi-inf.mpg.de/DAC/files/style/deep_process_style.css"?>
<process>
<name>CHP</name>
<version>5</version>
<author>
<name>Peter Ebert</name>
<email>pebert@mpi-inf.mpg.de</email>
</author>
<description>
Key points:
(1) deepTools QC (fingerprint / GC bias) for raw BAM files [same as before]
(2) deepTools QC (fingerprint) for filtered BAM files to create IHEC QC as part of the process [new]
(3) deepTools correlation for filtered, blacklist-removed BAM files [same as before]
(4) peak calling with MACS2 and histoneHMM on filtered BAM files [partially new]
(5) deepTools fold-change and coverage tracks for raw BAM files [same as before]
(6) deepTools coverage track for filtered BAM file [same as before]
</description>
<inputs>
<filetype>
<identifier>GALvX_Histone</identifier>
<format>BAM</format>
<quantity>collection</quantity>
<comment>Only paired-end libraries are supported</comment>
</filetype>
<filetype>
<identifier>GALvX_Input</identifier>
<format>BAM</format>
<quantity>single</quantity>
<comment>Only paired-end libraries are supported</comment>
</filetype>
<filetype>
<identifier>GALvX_Index</identifier>
<format>BAI</format>
<quantity>collection</quantity>
<comment>
Index files are not separated by library type (histone vs. Input); one index file per BAM file is required
</comment>
</filetype>
<filetype>
<identifier>QcSummary</identifier>
<format>JSON</format>
<quantity>collection</quantity>
<comment>
The median insert size (field: insertSizeMedian) is extracted from the QC summary file.
</comment>
</filetype>
</inputs>
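<!--
The QcSummary input above supplies the median insert size via the field insertSizeMedian. A minimal sketch of that extraction follows. It assumes the field is a top-level key of the JSON document, which the process description does not confirm; the function name and the example value are hypothetical.

```python
import json


def median_insert_size(qc_summary_text, field="insertSizeMedian"):
    """Return the median insert size from QC summary JSON text.

    Assumption: the field is a top-level key; the real file layout may differ.
    """
    summary = json.loads(qc_summary_text)
    return int(summary[field])


if __name__ == "__main__":
    example = '{"insertSizeMedian": 180}'  # made-up value for illustration only
    print(median_insert_size(example))
```

If the real QC summary nests the field more deeply, only the lookup inside the function would need to change.
-->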
<references>
<filetype>
<identifier>blacklist_regions</identifier>
<format>BED</format>
<quantity>single</quantity>
<comment>Blacklist regions to be excluded from the analysis</comment>
</filetype>
<filetype>
<identifier>reference_genome</identifier>
<format>2bit</format>
<quantity>single</quantity>
<comment>The reference genome file; see DCC/download/results/references/genomes</comment>
</filetype>
<filetype>
<identifier>chromosome_sizes</identifier>
<format>TSV</format>
<quantity>single</quantity>
<comment>2-column, tab-separated table of chromosome sizes for reference genome</comment>
</filetype>
</references>
<outputs>
<filetype>
<identifier>DEEPID.PROC.DATE.raw.bamcov</identifier>
<format>bigwig</format>
<quantity>collection</quantity>
<comment>Signal coverage track generated from raw BAM files</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.filt.bamcov</identifier>
<format>bigwig</format>
<quantity>collection</quantity>
<comment>
Signal coverage track generated from filtered BAM files (flag filter -F 3844, mapping quality >= 5, blacklist regions removed)
</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.ses-fc</identifier>
<format>bigwig</format>
<quantity>collection</quantity>
<comment>SES normalized fold-change signal</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.cnt-fc</identifier>
<format>bigwig</format>
<quantity>collection</quantity>
<comment>Read-count normalized fold-change signal</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.gcfreq</identifier>
<format>svg</format>
<quantity>collection</quantity>
<comment>GC bias plot based on raw BAM files</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.gcfreq</identifier>
<format>txt</format>
<quantity>collection</quantity>
<comment>Obs./exp. GC read frequencies based on raw BAM files</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.hhmm.emfit</identifier>
<format>PDF</format>
<quantity>collection</quantity>
<comment>histoneHMM output visualizing the EM fit. Check this before using the histoneHMM output</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.hhmm.out</identifier>
<format>zip</format>
<quantity>collection</quantity>
<comment>Zip archive containing other histoneHMM output files (raw data files not needed by most users)</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.fgpr</identifier>
<format>SVG</format>
<quantity>single</quantity>
<comment>Fingerprint plots based on raw BAM files</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.qm-fgpr</identifier>
<format>txt</format>
<quantity>single</quantity>
<comment>Fingerprint quality metrics based on raw BAM files</comment>
</filetype>
</outputs>
<software>
<tool>
<name>bamCoverage</name>
<version>2.5.3</version>
<command_line>
<![CDATA[
bamCoverage -p {deeptools_parallel} --binSize 25 --bam {GALvX_*} --outFileName {DEEPID.PROC.DATE.bamcov}
--outFileFormat bigwig --normalizeTo1x {genomesize}
]]>
</command_line>
<loop>GALvX_Histone, GALvX_Input</loop>
<comment>Generate read coverage signal normalized to 1x depth for raw BAM files</comment>
</tool>
<tool>
<name>computeGCBias</name>
<version>2.5.3</version>
<command_line>
<![CDATA[
computeGCBias -p {deeptools_parallel} --bamfile {GALvX_*} --effectiveGenomeSize {genomesize}
--genome {reference_genome} --sampleSize 50000000 --fragmentLength {*_fraglen}
--GCbiasFrequenciesFile {DEEPID.PROC.DATE.gcfreq} --biasPlot {DEEPID.PROC.DATE.gcbias} --plotFileFormat svg
]]>
</command_line>
<loop>GALvX_Histone, GALvX_Input</loop>
<comment>Compute GC bias on raw BAM files</comment>
</tool>
<tool>
<name>bamCompare</name>
<version>2.5.3</version>
<command_line>
<![CDATA[
bamCompare -p {deeptools_parallel} --bamfile1 {GALvX_Histone} --bamfile2 {GALvX_Input}
--outFileName {DEEPID.PROC.DATE.bamcomp} --outFileFormat bigwig --scaleFactorsMethod {*_scaling}
--ratio log2 --binSize 25
]]>
</command_line>
<loop>GALvX_Histone</loop>
<comment>
Generate log2 fold-change tracks of signal over input for raw BAM files with scaling method
&quot;readCount&quot; for libraries H3K27me3/H3K9me3, and &quot;SES&quot; otherwise
</comment>
</tool>
<tool>
<name>plotFingerprint</name>
<version>2.5.3</version>
<command_line>
<![CDATA[
plotFingerprint -p {deeptools_parallel} --bamfiles {GALvX_*} --plotFile {DEEPID.PROC.DATE.fgpr}
--labels {plot_labels} --plotTitle {plot_title} --numberOfSamples 500000 --plotFileFormat svg
--outQualityMetrics {DEEPID.PROC.DATE.fgpr.qc} --JSDsample {GALvX_Input}
]]>
</command_line>
<loop>no looping</loop>
<comment>Compute fingerprint on raw BAM files</comment>
</tool>
<tool>
<name>sambamba</name>
<version>0.6.6</version>
<command_line>
<![CDATA[
sambamba view --format=bam --nthreads={sambamba_parallel} --output-filename DEEPID.tmp.filt.bam
--filter="not (duplicate or unmapped or failed_quality_control or supplementary or secondary_alignment) and mapping_quality >= 5"
{GALvX_*}
]]>
</command_line>
<loop>GALvX_Histone, GALvX_Input</loop>
<comment>
Apply IHEC ChIP QC standard filtering to all BAM files (equivalent to bitflag 3844).
The resulting BAM files are temporary and discarded after the analysis.
</comment>
</tool>
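<!--
The filtering step above is described as equivalent to bitflag 3844. As a minimal sketch of that arithmetic, the five excluded read categories map to the standard SAM FLAG bits below, and their sum is 3844. The dictionary and the helper function are illustrative names, not part of the pipeline.

```python
# Standard SAM FLAG bits for the read categories excluded by the filter expression.
EXCLUDED_FLAGS = {
    "unmapped": 0x4,
    "secondary_alignment": 0x100,
    "failed_quality_control": 0x200,
    "duplicate": 0x400,
    "supplementary": 0x800,
}

# Combined exclusion mask: 4 + 256 + 512 + 1024 + 2048.
EXCLUDE_MASK = sum(EXCLUDED_FLAGS.values())


def passes_filter(flag, mapping_quality, min_mapq=5):
    """Return True if a read would survive the filter (illustrative helper)."""
    return (flag & EXCLUDE_MASK) == 0 and mapping_quality >= min_mapq


if __name__ == "__main__":
    print(EXCLUDE_MASK)             # 3844
    print(passes_filter(99, 30))    # mapped, properly paired read: True
    print(passes_filter(1123, 30))  # same read marked as duplicate: False
```

A read is kept only if none of the masked bits is set and its mapping quality is at least 5, which mirrors the filter expression in the command line.
-->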
<tool>
<name>sambamba</name>
<version>0.6.6</version>
<command_line>
<![CDATA[
sambamba view --count DEEPID.tmp.filt.bam > DEEPID.mapped.readcount
]]>
</command_line>
<loop>DEEPID.tmp.filt.bam</loop>
<comment>
Because unmapped reads were removed in the previous filtering step, simply counting all reads
in the filtered BAM file is equivalent to counting only mapped reads. The number of mapped
reads is needed to compute the FRiP score at a later stage.
</comment>
</tool>
<tool>
<name>bamCoverage</name>
<version>2.5.3</version>
<command_line>
<![CDATA[
bamCoverage -p {deeptools_parallel} --binSize 25 --bam DEEPID.tmp.filt.bam --outFileName {DEEPID.PROC.DATE.bamcov}
--outFileFormat bigwig --normalizeTo1x {genomesize} --blackListFileName {blacklist_regions} --ignoreForNorm chrX chrY chrM X Y M MT
]]>
</command_line>
<loop>DEEPID.tmp.filt.bam</loop>
<comment>
Generate read coverage signal normalized to 1x depth for filtered BAM files.
Remove blacklist regions on-the-fly and consider only autosomes for normalization step.
</comment>
</tool>
<tool>
<name>plotFingerprint</name>
<version>2.5.3</version>
<command_line>
<![CDATA[
plotFingerprint -p {deeptools_parallel} --bamfiles DEEPID.tmp.filt.bam --plotFile {DEEPID.PROC.DATE.fgpr}
--labels {plot_labels} --plotTitle {plot_title} --numberOfSamples 500000 --plotFileFormat svg
--outQualityMetrics {DEEPID.PROC.DATE.fgpr.qc} --JSDsample DEEPID_Input.tmp.filt.bam
]]>
</command_line>
<loop>no looping</loop>
<comment>Compute fingerprint on filtered BAM files to compute IHEC QC measures</comment>
</tool>
<tool>
<name>MACS2</name>
<version>2.1.1.20160309</version>
<command_line>
<![CDATA[
macs2 callpeak -t DEEPID.tmp.filt.bam -c DEEPID_Input.tmp.filt.bam -f BAM --gsize {genomesize}
--keep-dup all --name {*_name_prefix} --nomodel --extsize {*_fraglen} --qvalue 0.05 {*_broad}
]]>
</command_line>
<loop>GALvX_Histone</loop>
<comment>MACS2 peak calling on filtered BAM files. Parameter "--broad" for libraries H3K27me3/H3K36me3/H3K9me3</comment>
</tool>
<tool>
<name>histoneHMM</name>
<version>1.7</version>
<command_line>
<![CDATA[
histoneHMM_call_regions.R -b 750 --chromlen={chromosome_sizes}
--outprefix=DEEPID-regions.gff --probability=0.1 DEEPID.tmp.filt.bam
]]>
</command_line>
<loop>GALvX_Histone</loop>
<comment>histoneHMM peak calling on filtered BAM files for broad marks: H3K4me1/H3K27me3/H3K9me3/H3K36me3</comment>
</tool>
<tool>
<name>cut, sort, mv</name>
<version>8.13</version>
<command_line>
<![CDATA[
cut -f 1,4,5,9 DEEPID-regions.gff | sort -V -k1,2 > DEEPID.hmm.bed &&
mv DEEPID-zinba-emfit.pdf DEEPID.PROC.DATE.hhmm.emfit.pdf
]]>
</command_line>
<loop>DEEPID-regions.gff</loop>
<comment>Make histoneHMM output BED-like for blacklist intersection and standardize name of EM fit PDF.</comment>
</tool>
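<!--
The cut and sort step above keeps GFF columns 1, 4, 5 and 9 and orders the result. A rough Python sketch of the same reshaping follows. The function names are hypothetical, the natural-order key only approximates the version sort used on the command line, and skipping empty or comment lines is an addition that cut itself does not perform.

```python
import re


def natural_key(text):
    """Split text into string and integer parts so that chr2 sorts before chr10."""
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", text)]


def gff_to_bed_like(gff_lines):
    """Keep GFF columns 1, 4, 5 and 9 (seqname, start, end, attributes), then sort."""
    rows = []
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # addition for robustness; not done by cut
        fields = line.split("\t")
        rows.append((fields[0], int(fields[3]), int(fields[4]), fields[8]))
    # Order by chromosome name (natural order), then by start coordinate.
    return sorted(rows, key=lambda row: (natural_key(row[0]), row[1]))


if __name__ == "__main__":
    demo = [
        "chr10\thmm\tregion\t100\t200\t.\t.\t.\tid=a\n",
        "chr2\thmm\tregion\t50\t80\t.\t.\t.\tid=b\n",
    ]
    for row in gff_to_bed_like(demo):
        print("\t".join(str(value) for value in row))
```

As in the original command, coordinates are copied unchanged, so the result is BED-like rather than strict BED (GFF start positions are 1-based).
-->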
<tool>
<name>sambamba</name>
<version>0.6.6</version>
<command_line>
<![CDATA[
sambamba view --count --nthreads={sambamba_parallel}
--regions=peak_file DEEPID.tmp.filt.bam > DEEPID.tmp.peak_ovl.cnt
]]>
</command_line>
<loop>DEEPID.tmp.filt.bam</loop>
<comment>Count reads in the filtered BAM files that overlap the called peak regions (peak_file). This count is the numerator of the FRiP score.</comment>
</tool>
<tool>
<name>custom</name>
<version>0.1</version>
<command_line>
<![CDATA[
compute_frip: reads_in_peaks / total_mapped_reads
]]>
</command_line>
<loop>no looping</loop>
<comment>
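Compute the FRiP score (fraction of reads in peaks) and record it in the analysis metadata file (.amd.tsv).
Inputs are the peak-overlap read count from the previous step and the mapped read count
of the filtered BAM file (DEEPID.mapped.readcount).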
</comment>
</tool>
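<!--
The FRiP step above is given only as the ratio reads_in_peaks / total_mapped_reads. A minimal sketch of that computation follows, assuming the two count files written by the sambamba steps each hold a single integer. Both function names are hypothetical and this is not the pipeline's own script.

```python
def read_count(path):
    """Read a single integer from a count file (assumed format: one number)."""
    with open(path) as handle:
        return int(handle.read().strip())


def compute_frip(reads_in_peaks, total_mapped_reads):
    """Fraction of reads in peaks: reads_in_peaks / total_mapped_reads."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    if not 0 <= reads_in_peaks <= total_mapped_reads:
        raise ValueError("reads_in_peaks must lie between 0 and total_mapped_reads")
    return reads_in_peaks / total_mapped_reads


if __name__ == "__main__":
    # Made-up counts for illustration; real values would come from
    # read_count("DEEPID.tmp.peak_ovl.cnt") and read_count("DEEPID.mapped.readcount").
    print(compute_frip(250000, 1000000))  # 0.25
```

The range checks are a defensive choice: a peak-overlap count larger than the total would indicate that the two counts come from different BAM files.
-->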
<tool>
<name>sambamba</name>
<version>0.6.6</version>
<command_line>
<![CDATA[
sambamba view --format=bam --nthreads={sambamba_parallel} --output-filename DEEPID.tmp.auto.bam
--regions={autosome_regions} DEEPID.tmp.filt.bam
]]>
</command_line>
<loop>DEEPID.tmp.filt.bam</loop>
<comment>
Restrict filtered BAM files to autosomal regions. These BAM files will be used to plot the correlation heatmaps.
</comment>
</tool>
<tool>
<name>multiBamSummary</name>
<version>2.5.3</version>
<command_line>
<![CDATA[
multiBamSummary bins -p {deeptools_parallel} --bamfiles DEEPID.tmp.auto.bam --outFileName SAMPLEID.npz
--labels {plot_labels} --binSize 1000 --distanceBetweenBins 2000 --blackListFileName {blacklist_regions}
]]>
</command_line>
<loop>no looping</loop>
<comment>Create data matrix for correlation plot on filtered BAM files; remove blacklist regions on the fly</comment>
</tool>
<tool>
<name>plotCorrelation</name>
<version>2.5.3</version>
<command_line>
<![CDATA[
plotCorrelation --corData SAMPLEID.npz --plotFile {DEEPID.PROC.DATE.bamcorr} --whatToPlot heatmap
--plotTitle {plot_title} --plotFileFormat svg --corMethod {cor_method}
--plotNumbers --zMin -1 --zMax 1 --colorMap coolwarm
]]>
</command_line>
<loop>no looping</loop>
<comment>Create heatmap correlation plot using Spearman and Pearson correlation</comment>
</tool>
</software>
</process>