<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="http://deep.mpi-inf.mpg.de/DAC/files/style/deep_process_style.css"?>
<process>
<name>CHP</name>
<version>4</version>
<author>
<name>Andreas Richter, Peter Ebert</name>
<email>arichter@ie-freiburg.mpg.de, pebert@mpi-inf.mpg.de</email>
</author>
<description>
Process CHPv4 is a minor update of the previous version that accounts for unresolved quality problems, mainly in the mouse ChIP data.
On average, the mouse data show a higher duplication rate (though not as high as some of the human cell line samples), and the heterochromatic marks
show problematic behavior in the correlation control plot (using Pearson; this process now computes the Pearson correlation on blacklist-filtered and duplicate-removed
BAM files). It is currently unclear whether this is simply due to a sub-optimal blacklist for mouse or indeed indicates strong outliers in the data.
Any downstream analysis should thoroughly check for outliers interfering with the detected signal. This process creates an additional QC plot
using the Spearman correlation metric, which is more robust to outliers. Additionally, coverage tracks are also created from duplicate-free BAM files.
Otherwise, this process is identical to the previous version.
It is assumed that a QC summary file holding the fragment length of the respective library is always available.
This process takes aligned reads from the DCC/DKFZ as input and creates individual and comparative signal tracks as well as peak files for the different histone marks.
Note that before the correlation among all files is computed, a set of known problematic regions is removed; these regions usually show a spurious read distribution that would
otherwise lead to an inaccurate correlation among the files. The last step of this process plots the coverage of the histone signal (and, if available, of the input control)
in a few selected control regions (for details, contact Andreas Richter). Note that these plots are by no means suited to interpreting the data or judging the genome-wide quality
of the entire dataset: the plot of the control regions just shows regions with expected high or low signal compared to the input; the values are scaled for
layout reasons only, and this scaling cannot be taken as a solid data normalization procedure.
Erratum: the metadata in the .amd.tsv file generated by this process are partially wrong.
Specifically, the FRiP score is too low (i.e., too pessimistic) because it is calculated using all reads instead
of only the mapped reads.
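The effect of the erratum can be illustrated with a minimal Python sketch. It assumes that the reported score equals the number of reads in peaks divided by all reads, as the erratum describes; the function names and the read counts are hypothetical and are not part of this process or its .amd.tsv output.

```python
def frip(reads_in_peaks, denominator):
    """Return the fraction of reads in peaks for a given denominator."""
    if not denominator > 0:
        raise ValueError("denominator must be positive")
    return reads_in_peaks / denominator


def rescale_frip(reported_frip, total_reads, mapped_reads):
    """Convert a FRiP based on all reads into one based on mapped reads."""
    if not mapped_reads > 0:
        raise ValueError("mapped_reads must be positive")
    return reported_frip * total_reads / mapped_reads


# Hypothetical library: 40M reads in total, 36M mapped, 9M within peaks.
total_reads, mapped_reads, reads_in_peaks = 40_000_000, 36_000_000, 9_000_000

frip_all = frip(reads_in_peaks, total_reads)      # denominator as in the erratum
frip_mapped = frip(reads_in_peaks, mapped_reads)  # denominator: mapped reads only
frip_fixed = rescale_frip(frip_all, total_reads, mapped_reads)
print(frip_all, frip_mapped, frip_fixed)
```

Under that assumption, a reported score could be rescaled by the ratio of all reads to mapped reads, provided both counts are known for the library.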
</description>
<inputs>
<filetype>
<identifier>GALvX_Histone</identifier>
<format>BAM</format>
<quantity>collection</quantity>
<comment></comment>
</filetype>
<filetype>
<identifier>GALvX_Input</identifier>
<format>BAM</format>
<quantity>single</quantity>
<comment></comment>
</filetype>
<filetype>
<identifier>GALvX_Index</identifier>
<format>BAI index</format>
<quantity>collection</quantity>
<comment>Note that there is no separation of Histone and Input for the index files as one index file is required per BAM file irrespective of its type</comment>
</filetype>
<filetype>
<identifier>GALvX_QcSummary</identifier>
<format>TXT</format>
<quantity>collection</quantity>
<comment>The median fragment length is computed during the mapping in the GAL process and stored in the QcSummary file. This information is used in the CHP process.</comment>
</filetype>
</inputs>
<references>
<filetype>
<identifier>blacklist_regions</identifier>
<format>BED</format>
<quantity>single</quantity>
<comment>ENCODE blacklist extended by A. Richter (FB); see DCC/download/results/references/annotations</comment>
</filetype>
<filetype>
<identifier>reference_genome</identifier>
<format>2bit</format>
<quantity>single</quantity>
<comment>The reference genome file; see DCC/download/results/references/genomes</comment>
</filetype>
<filetype>
<identifier>control_regions</identifier>
<format>BED</format>
<quantity>single</quantity>
<comment>Control regions obtained from A. Richter (FB) for quality control of ChIPseq samples; see DCC/download/results/references/annotations</comment>
</filetype>
</references>
<outputs>
<filetype>
<identifier>DEEPID.PROC.DATE.bamcorr</identifier>
<format>SVG</format>
<quantity>collection</quantity>
<comment>joint deepTools graphics output for all histone marks plus input control; one using Pearson and one using Spearman metric</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.bamfgpr</identifier>
<format>SVG</format>
<quantity>single</quantity>
<comment>joint deepTools graphics output for all histone marks plus input control</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.gcbias</identifier>
<format>SVG</format>
<quantity>collection</quantity>
<comment>deepTools graphics output</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.gcfreq</identifier>
<format>tab-separated text file</format>
<quantity>collection</quantity>
<comment>Required output file, currently not used for any downstream analysis</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.peaks</identifier>
<format>XLS table</format>
<quantity>collection</quantity>
<comment>Standard MACS2 output XLS table for broad and narrow marks</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.broadPeak</identifier>
<format>broadPeak</format>
<quantity>collection</quantity>
<comment>Standard MACS2 output in ENCODE's broadPeak format for broad marks; this file is usually used for subsequent analyses</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.gappedPeak</identifier>
<format>gappedPeak</format>
<quantity>collection</quantity>
<comment>Standard MACS2 output in ENCODE's gappedPeak format for broad marks</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.summits</identifier>
<format>BED</format>
<quantity>collection</quantity>
<comment>Standard MACS2 output for narrow marks</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.narrowPeak</identifier>
<format>narrowPeak</format>
<quantity>collection</quantity>
<comment>Standard MACS2 output for narrow marks; this file is usually used for downstream analyses</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.bamcomp</identifier>
<format>bigwig</format>
<quantity>collection</quantity>
<comment>Input-normalized histone signal tracks</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.bamcov</identifier>
<format>bigwig</format>
<quantity>collection</quantity>
<comment>Sequencing-depth normalized signal coverage tracks; created on raw BAM files and on duplicate-filtered BAM files</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.control</identifier>
<format>SVG</format>
<quantity>collection</quantity>
<comment>A plot of a set of control regions for each histone mark (histone and input signal). Attention: this plot can only be used for a rough quality assessment (experiment failure or success); no interpretation of the data can be based on it.</comment>
</filetype>
</outputs>
<software>
<tool>
<name>samtools</name>
<version>1.2</version>
<command_line><![CDATA[ samtools view -b -F 1024 {GALvX_*} > DEEPID.tmp.nodup.bam ]]></command_line>
<loop>GALvX_Histone, GALvX_Input</loop>
<comment>All reads marked as duplicates are removed for peak calling and read coverage computation. The output of this command is temporary and discarded after the analysis.</comment>
</tool>
<tool>
<name>MACS2</name>
<version>2.1.0.20140616</version>
<command_line><![CDATA[ macs2 callpeak -t DEEPID_Histone.tmp.nodup.bam -c DEEPID_Input.tmp.nodup.bam -f BAM --gsize {genomesize} --keep-dup all --name {*_name_prefix} --nomodel --extsize {*_fraglen} --qvalue 0.05 {*_broad} ]]></command_line>
<loop>GALvX_Histone</loop>
<comment>parameter "--broad" is set for samples H3K4me1/H3K27me3/H3K36me3/H3K9me3</comment>
</tool>
<tool>
<name>bamCoverage</name>
<version>1.5.9.1</version>
<command_line><![CDATA[ bamCoverage -p {deeptools_parallel} --binSize 25 --bam DEEPID.tmp.nodup.bam --outFileName {DEEPID.PROC.DATE.bamcov} --outFileFormat bigwig --normalizeTo1x {genomesize} --fragmentLength {*_fraglen} ]]></command_line>
<loop>DEEPID.tmp.nodup.bam</loop>
<comment>report read coverage normalized to 1x sequencing depth based on duplicate-removed BAM files</comment>
</tool>
<tool>
<name>samtools</name>
<version>1.2</version>
<command_line><![CDATA[ samtools sort -m {samtools_memory} -n -@ {samtools_parallel} -T LIBRARY_tmpsrt -O bam DEEPID.tmp.nodup.bam ]]></command_line>
<loop>GALvX_Histone, GALvX_Input</loop>
<comment>Output is piped to next step</comment>
</tool>
<tool>
<name>bedtools</name>
<version>2.20.1</version>
<command_line><![CDATA[ bedtools pairtobed -ubam -type neither -abam - -b {blacklist_regions} ]]></command_line>
<loop>GALvX_Histone, GALvX_Input</loop>
<comment>Input is the duplicate-filtered BAM file piped from the previous step; output is piped to the next step</comment>
</tool>
<tool>
<name>samtools</name>
<version>1.2</version>
<command_line><![CDATA[ samtools sort -m {samtools_memory} -o DEEPID.tmp.nodup.blfilt.bam -@ {samtools_parallel} -T LIBRARY_tmpsrt2 -O bam - && samtools index DEEPID.tmp.nodup.blfilt.bam ]]></command_line>
<loop>GALvX_Histone, GALvX_Input</loop>
<comment>Input is piped from previous step and output written to a temporary BAM file that is discarded after the analysis</comment>
</tool>
<tool>
<name>bamCorrelate</name>
<version>1.5.9.1</version>
<command_line><![CDATA[ bamCorrelate bins -p {deeptools_parallel} --bamfiles DEEPID.tmp.nodup.blfilt.bam --plotFile {DEEPID.PROC.DATE.bamcorr} --corMethod pearson --labels {plot_labels} --binSize 1000 --distanceBetweenBins 2000 --fragmentLength {all_avg_fraglen} --plotFileFormat svg ]]></command_line>
<loop>no looping</loop>
<comment>A window/bin size of 1 kb is used because the default value (10 kb) would merge multiple narrow signals; approx. 1M sampled bins</comment>
</tool>
<tool>
<name>bamCorrelate</name>
<version>1.5.9.1</version>
<command_line><![CDATA[ bamCorrelate bins -p {deeptools_parallel} --bamfiles DEEPID.tmp.nodup.blfilt.bam --plotFile {DEEPID.PROC.DATE.bamcorr} --corMethod spearman --labels {plot_labels} --binSize 1000 --distanceBetweenBins 2000 --fragmentLength {all_avg_fraglen} --plotFileFormat svg ]]></command_line>
<loop>no looping</loop>
<comment>A window/bin size of 1 kb is used because the default value (10 kb) would merge multiple narrow signals; approx. 1M sampled bins</comment>
</tool>
<tool>
<name>bamFingerprint</name>
<version>1.5.9.1</version>
<command_line><![CDATA[ bamFingerprint -p {deeptools_parallel} --bamfiles {GALvX_*} --plotFile {DEEPID.PROC.DATE.bamfgpr} --labels {plot_labels} --fragmentLength {all_avg_fraglen} --numberOfSamples 500000 --plotFileFormat svg ]]></command_line>
<loop>no looping</loop>
<comment>Input is all raw BAM files</comment>
</tool>
<tool>
<name>computeGCBias</name>
<version>1.5.9.1</version>
<command_line><![CDATA[ computeGCBias -p {deeptools_parallel} --bamfile {GALvX_*} --effectiveGenomeSize {genomesize} --genome {reference_genome} --fragmentLength {*_fraglen} --sampleSize 50000000 --GCbiasFrequenciesFile {DEEPID.PROC.DATE.gcfreq} --biasPlot {DEEPID.PROC.DATE.gcbias} --plotFileFormat svg ]]></command_line>
<loop>GALvX_Histone, GALvX_Input</loop>
<comment>Input is raw BAM file</comment>
</tool>
<tool>
<name>bamCompare</name>
<version>1.5.9.1</version>
<command_line><![CDATA[ bamCompare -p {deeptools_parallel} --bamfile1 {GALvX_Histone} --bamfile2 {GALvX_Input} --outFileName {DEEPID.PROC.DATE.bamcomp} --outFileFormat bigwig --scaleFactorsMethod {*_scaling} --ratio log2 --binSize 25 --fragmentLength {*_fraglen} ]]></command_line>
<loop>GALvX_Histone</loop>
<comment>generates log2 fold-change tracks of signal over input; scaling method: &quot;readCount&quot; for samples H3K27me3/H3K9me3, &quot;SES&quot; otherwise</comment>
</tool>
<tool>
<name>bamCoverage</name>
<version>1.5.9.1</version>
<command_line><![CDATA[ bamCoverage -p {deeptools_parallel} --binSize 25 --bam {GALvX_*} --outFileName {DEEPID.PROC.DATE.bamcov} --outFileFormat bigwig --normalizeTo1x {genomesize} --fragmentLength {*_fraglen} ]]></command_line>
<loop>GALvX_Histone, GALvX_Input</loop>
<comment>report read coverage normalized to 1x sequencing depth based on unfiltered BAM files</comment>
</tool>
<tool>
<name>potty_plotty.py</name>
<version>0.2</version>
<command_line><![CDATA[ potty_plotty.py --bg-label {Input_label} --bg-color {Input_color} --fg-label {H*_label} --fg-color {H*_color} --fg-values {Histone_bamcov} --bg-values {Input_bamcov} --plot-regions {control_regions} --binsize 25 --title DEEPID --outfile {DEEPID.PROC.DATE.control} ]]></command_line>
<loop>GALvX_Histone</loop>
<comment>a bin size of 25 is chosen to match the bin size used for the deepTools coverage tracks</comment>
</tool>
</software>
</process>