<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="http://deep.mpi-inf.mpg.de/DAC/files/style/deep_process_style.css"?>
<process>
<name>CHP</name>
<version>4</version>
<author>
<name>Andreas Richter, Peter Ebert</name>
<email>arichter@ie-freiburg.mpg.de, pebert@mpi-inf.mpg.de</email>
</author>
<description>
Process CHPv4 is a minor update of the previous version that accounts for unresolved quality problems, mainly in the mouse ChIP data.
The mouse data show a higher duplication rate on average (though not as high as some of the human cell line samples), and the heterochromatic marks
show problematic behavior in the Pearson correlation control plot (this process now computes the Pearson correlation on blacklist-filtered, duplicate-removed
BAM files). It is currently unclear whether this is simply due to a sub-optimal blacklist for mouse or is indeed indicative of strong outliers in the data.
Any downstream analysis should thoroughly check for outliers interfering with the detected signal. This process creates an additional QC plot
using the Spearman correlation metric, which is more robust to outliers. Additionally, coverage tracks are now also created from duplicate-free BAM files.
Otherwise, this process is identical to the previous version.
It is assumed that a QC summary file holding information on the fragment length of the respective library is always available.
This process takes as input aligned reads coming from the DCC/DKFZ and creates individual and comparative signal tracks as well as peak files for the different histone marks.
Note that before the correlation among all files is computed, a few known problematic regions are removed; these usually show a spurious read distribution that would
otherwise lead to an inaccurate correlation among the files. The last step of this process plots the coverage of the histone signal (and, if available, of the input control)
in a few selected control regions (for details, contact Andreas Richter). Note that these plots are by no means suited to interpreting the data or judging the genome-wide quality
of the entire dataset: the plot of the control regions just shows regions with expected high or low signal compared to the input; the scaling of the values is performed for
layout reasons and cannot be taken as a solid data normalization procedure.
Erratum: the metadata in the .amd.tsv file generated by this process are partially wrong.
Specifically, the FRiP score is too low (i.e., too pessimistic) since it is calculated using all reads instead
of just the fraction of mapped reads.
</description>
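<!--
Editorial sketch for the erratum in the description: a minimal illustration of why a FRiP
score computed against all reads comes out lower than one computed against mapped reads only.
The function name and all read counts below are hypothetical and chosen for illustration;
they are not taken from any DEEP sample, and this is not the code that wrote the .amd.tsv file.

```python
def frip(reads_in_peaks, denominator_reads):
    """Fraction of reads in peaks for a chosen denominator."""
    if denominator_reads <= 0:
        raise ValueError("denominator_reads must be positive")
    return reads_in_peaks / denominator_reads


# Hypothetical counts, for illustration only
total_reads = 50_000_000
mapped_reads = 40_000_000
reads_in_peaks = 8_000_000

frip_all = frip(reads_in_peaks, total_reads)      # denominator described in the erratum
frip_mapped = frip(reads_in_peaks, mapped_reads)  # denominator restricted to mapped reads

# If both read counts are known, a reported value can be rescaled afterwards
correction = total_reads / mapped_reads
print(frip_all, frip_mapped, frip_all * correction)
```

Under these assumed counts the score rises from 0.16 to 0.20 once unmapped reads are excluded
from the denominator.
-->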
<inputs>
<filetype>
<identifier>GALvX_Histone</identifier>
<format>BAM</format>
<quantity>collection</quantity>
<comment></comment>
</filetype>
<filetype>
<identifier>GALvX_Input</identifier>
<format>BAM</format>
<quantity>single</quantity>
<comment></comment>
</filetype>
<filetype>
<identifier>GALvX_Index</identifier>
<format>BAI index</format>
<quantity>collection</quantity>
<comment>Note that there is no separation of Histone and Input for the index files as one index file is required per BAM file irrespective of its type</comment>
</filetype>
<filetype>
<identifier>GALvX_QcSummary</identifier>
<format>TXT</format>
<quantity>collection</quantity>
<comment>The median fragment length is computed during the mapping in the GAL process and stored in the QcSummary file. This information is used in the CHP process.</comment>
</filetype>
</inputs>
<references>
<filetype>
<identifier>blacklist_regions</identifier>
<format>BED</format>
<quantity>single</quantity>
<comment>ENCODE blacklist extended by A. Richter (FB); see DCC/download/results/references/annotations</comment>
</filetype>
<filetype>
<identifier>reference_genome</identifier>
<format>2bit</format>
<quantity>single</quantity>
<comment>The reference genome file; see DCC/download/results/references/genomes</comment>
</filetype>
<filetype>
<identifier>control_regions</identifier>
<format>BED</format>
<quantity>single</quantity>
<comment>Control regions obtained from A. Richter (FB) for quality control of ChIPseq samples; see DCC/download/results/references/annotations</comment>
</filetype>
</references>
<outputs>
<filetype>
<identifier>DEEPID.PROC.DATE.bamcorr</identifier>
<format>SVG</format>
<quantity>collection</quantity>
<comment>Joint deepTools graphics output for all histone marks plus input control; one plot using the Pearson and one using the Spearman metric</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.bamfgpr</identifier>
<format>SVG</format>
<quantity>single</quantity>
<comment>Joint deepTools graphics output for all histone marks plus input control</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.gcbias</identifier>
<format>SVG</format>
<quantity>collection</quantity>
<comment>deepTools graphics output</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.gcfreq</identifier>
<format>tab-separated text file</format>
<quantity>collection</quantity>
<comment>Required output file, currently not used for any downstream analysis</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.peaks</identifier>
<format>XLS table</format>
<quantity>collection</quantity>
<comment>Standard MACS2 output XLS table for broad and narrow marks</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.broadPeak</identifier>
<format>broadPeak</format>
<quantity>collection</quantity>
<comment>Standard MACS2 output in ENCODE's broadPeak format for broad marks; this file is usually used for subsequent analyses</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.gappedPeak</identifier>
<format>gappedPeak</format>
<quantity>collection</quantity>
<comment>Standard MACS2 output in ENCODE's gappedPeak format for broad marks</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.summits</identifier>
<format>BED</format>
<quantity>collection</quantity>
<comment>Standard MACS2 output for narrow marks</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.narrowPeak</identifier>
<format>narrowPeak</format>
<quantity>collection</quantity>
<comment>Standard MACS2 output for narrow marks; this file is usually used for downstream analyses</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.bamcomp</identifier>
<format>bigwig</format>
<quantity>collection</quantity>
<comment>Input-normalized histone signal tracks</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.bamcov</identifier>
<format>bigwig</format>
<quantity>collection</quantity>
<comment>Sequencing-depth normalized signal coverage tracks; created on raw BAM files and on duplicate-filtered BAM files</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.control</identifier>
<format>SVG</format>
<quantity>collection</quantity>
<comment>A plot of a set of control regions for each histone mark (histone and input signal). Attention: this plot can only be used for a rough quality assessment (experiment failure or success); no interpretation of the data can be based on it.</comment>
</filetype>
</outputs>
<software>
<tool>
<name>samtools</name>
<version>1.2</version>
<command_line><![CDATA[ samtools view -b -F 1024 {GALvX_*} > DEEPID.tmp.nodup.bam ]]></command_line>
<loop>GALvX_Histone, GALvX_Input</loop>
<comment>All reads marked as duplicates are removed for peak calling and read coverage computation. The output of this command is temporary and discarded after the analysis.</comment>
</tool>
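<!--
Editorial sketch: what the "-F 1024" filter in the samtools command above selects. In the SAM
specification, bit 0x400 (decimal 1024) marks a read as a PCR or optical duplicate, and -F
excludes every read that has any of the given bits set. The helper name and the example flag
values are hypothetical and serve only as an illustration of the bit test.

```python
DUPLICATE_FLAG = 0x400  # decimal 1024: "PCR or optical duplicate" in the SAM specification


def keep_read(flag, exclude_mask=DUPLICATE_FLAG):
    """Mimic the -F filter: keep a read only if none of the masked bits are set."""
    return (flag & exclude_mask) == 0


# Hypothetical SAM flag values; 1024, 1040 and 1123 carry the duplicate bit
flags = [0, 16, 99, 1024, 1040, 1123]
kept = [f for f in flags if keep_read(f)]
print(kept)  # [0, 16, 99]
```

The filter relies on duplicates having been marked upstream; reads without the bit set pass
through unchanged.
-->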
<tool>
<name>MACS2</name>
<version>2.1.0.20140616</version>
<command_line><![CDATA[ macs2 callpeak -t DEEPID_Histone.tmp.nodup.bam -c DEEPID_Input.tmp.nodup.bam -f BAM --gsize {genomesize} --keep-dup all --name {*_name_prefix} --nomodel --extsize {*_fraglen} --qvalue 0.05 {*_broad} ]]></command_line>
<loop>GALvX_Histone</loop>
<comment>Parameter "--broad" is set for samples H3K4me1/H3K27me3/H3K36me3/H3K9me3</comment>
</tool>
<tool>
<name>bamCoverage</name>
<version>1.5.9.1</version>
<command_line><![CDATA[ bamCoverage -p {deeptools_parallel} --binSize 25 --bam DEEPID.tmp.nodup.bam --outFileName {DEEPID.PROC.DATE.bamcov} --outFileFormat bigwig --normalizeTo1x {genomesize} --fragmentLength {*_fraglen} ]]></command_line>
<loop>DEEPID.tmp.nodup.bam</loop>
<comment>Report read coverage normalized to 1x sequencing depth based on duplicate-removed BAM files</comment>
</tool>
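<!--
Editorial sketch: the arithmetic behind normalizing coverage to 1x sequencing depth, as used by
the bamCoverage call above. It follows the definition given in the deepTools documentation
(sequencing depth = mapped reads * fragment length / effective genome size, with the scaling
factor being the inverse of that depth). How version 1.5.9.1 counts reads internally has not
been verified here, and the function name and all numbers below are hypothetical.

```python
def one_x_scale_factor(mapped_reads, fragment_length, effective_genome_size):
    """Scaling factor that brings the average coverage to 1x."""
    if min(mapped_reads, fragment_length, effective_genome_size) <= 0:
        raise ValueError("all arguments must be positive")
    depth = mapped_reads * fragment_length / effective_genome_size
    return 1.0 / depth


# Hypothetical library: 30 million mapped reads, 200 bp fragments, 3 Gb effective genome
scale = one_x_scale_factor(30_000_000, 200, 3_000_000_000)
print(scale)  # 0.5, i.e. the library covers the genome 2x on average

# A bin with a raw coverage of 6 would then be reported as 3.0
print(6 * scale)
```
-->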
<tool>
<name>samtools</name>
<version>1.2</version>
<command_line><![CDATA[ samtools sort -m {samtools_memory} -n -@ {samtools_parallel} -T LIBRARY_tmpsrt -O bam DEEPID.tmp.nodup.bam ]]></command_line>
<loop>GALvX_Histone, GALvX_Input</loop>
<comment>Output is piped to next step</comment>
</tool>
<tool>
<name>bedtools</name>
<version>2.20.1</version>
<command_line><![CDATA[ bedtools pairtobed -ubam -type neither -abam - -b {blacklist_regions} ]]></command_line>
<loop>GALvX_Histone, GALvX_Input</loop>
<comment>Input is the duplicate-filtered BAM files; output is piped to next step</comment>
</tool>
<tool>
<name>samtools</name>
<version>1.2</version>
<command_line><![CDATA[ samtools sort -m {samtools_memory} -o DEEPID.tmp.nodup.blfilt.bam -@ {samtools_parallel} -T LIBRARY_tmpsrt2 -O bam - && samtools index DEEPID.tmp.nodup.blfilt.bam ]]></command_line>
<loop>GALvX_Histone, GALvX_Input</loop>
<comment>Input is piped from previous step and output written to a temporary BAM file that is discarded after the analysis</comment>
</tool>
<tool>
<name>bamCorrelate</name>
<version>1.5.9.1</version>
<command_line><![CDATA[ bamCorrelate bins -p {deeptools_parallel} --bamfiles DEEPID.tmp.nodup.blfilt.bam --plotFile {DEEPID.PROC.DATE.bamcorr} --corMethod pearson --labels {plot_labels} --binSize 1000 --distanceBetweenBins 2000 --fragmentLength {all_avg_fraglen} --plotFileFormat svg ]]></command_line>
<loop>no looping</loop>
<comment>Window/bin size of 1 kb, since the default value (10 kb) would merge multiple narrow signals; 1M samples</comment>
</tool>
<tool>
<name>bamCorrelate</name>
<version>1.5.9.1</version>
<command_line><![CDATA[ bamCorrelate bins -p {deeptools_parallel} --bamfiles DEEPID.tmp.nodup.blfilt.bam --plotFile {DEEPID.PROC.DATE.bamcorr} --corMethod spearman --labels {plot_labels} --binSize 1000 --distanceBetweenBins 2000 --fragmentLength {all_avg_fraglen} --plotFileFormat svg ]]></command_line>
<loop>no looping</loop>
<comment>Window/bin size of 1 kb, since the default value (10 kb) would merge multiple narrow signals; 1M samples</comment>
</tool>
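<!--
Editorial sketch: why the description calls the Spearman metric more robust to outliers than
Pearson. The pure-Python functions below are a textbook illustration and not the deepTools
implementation; the two vectors of per-bin read counts are hypothetical. Eight bins are
perfectly anti-correlated between the two samples, and a ninth bin (think of a spurious
high-coverage region that escaped the blacklist) is extreme in both.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical bin counts; the last bin is the shared outlier
sample_a = [1, 2, 3, 4, 5, 6, 7, 8, 1000]
sample_b = [8, 7, 6, 5, 4, 3, 2, 1, 1000]

print(round(pearson(sample_a, sample_b), 4))   # close to 1, driven by the single outlier bin
print(round(spearman(sample_a, sample_b), 4))  # -0.4, reflecting the other eight bins
```

A single extreme bin is enough to pull the Pearson value close to 1 here, while the rank-based
value still reflects the remaining bins.
-->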
<tool>
<name>bamFingerprint</name>
<version>1.5.9.1</version>
<command_line><![CDATA[ bamFingerprint -p {deeptools_parallel} --bamfiles {GALvX_*} --plotFile {DEEPID.PROC.DATE.bamfgpr} --labels {plot_labels} --fragmentLength {all_avg_fraglen} --numberOfSamples 500000 --plotFileFormat svg ]]></command_line>
<loop>no looping</loop>
<comment>Input is all raw BAM files</comment>
</tool>
<tool>
<name>computeGCBias</name>
<version>1.5.9.1</version>
<command_line><![CDATA[ computeGCBias -p {deeptools_parallel} --bamfile {GALvX_*} --effectiveGenomeSize {genomesize} --genome {reference_genome} --fragmentLength {*_fraglen} --sampleSize 50000000 --GCbiasFrequenciesFile {DEEPID.PROC.DATE.gcfreq} --biasPlot {DEEPID.PROC.DATE.gcbias} --plotFileFormat svg ]]></command_line>
<loop>GALvX_Histone, GALvX_Input</loop>
<comment>Input is raw BAM file</comment>
</tool>
<tool>
<name>bamCompare</name>
<version>1.5.9.1</version>
<command_line><![CDATA[ bamCompare -p {deeptools_parallel} --bamfile1 {GALvX_Histone} --bamfile2 {GALvX_Input} --outFileName {DEEPID.PROC.DATE.bamcomp} --outFileFormat bigwig --scaleFactorsMethod {*_scaling} --ratio log2 --binSize 25 --fragmentLength {*_fraglen} ]]></command_line>
<loop>GALvX_Histone</loop>
<comment>generates log2 fold-change tracks of signal over input; scaling_method: "readCount" for samples H3K27me3/H3K9me3, "SES" else</comment>
</tool>
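<!--
Editorial sketch: the per-mark parameter choices stated in the MACS2 and bamCompare tool
comments of this file, collected in one lookup. The function and constant names are
hypothetical and not part of the actual pipeline code. The mark written as "H3K36me" in the
original MACS2 comment is assumed to mean H3K36me3.

```python
# Marks for which the MACS2 comment requests broad peak calling
# (H3K36me3 is an assumed reading of "H3K36me")
BROAD_MARKS = {"H3K4me1", "H3K27me3", "H3K36me3", "H3K9me3"}

# Marks for which the bamCompare comment requests "readCount" scaling
READCOUNT_MARKS = {"H3K27me3", "H3K9me3"}


def mark_settings(mark):
    """Return the peak-calling mode and the scaling method for one histone mark."""
    return {
        "broad": mark in BROAD_MARKS,
        "scaling": "readCount" if mark in READCOUNT_MARKS else "SES",
    }


for mark in ("H3K27ac", "H3K4me1", "H3K9me3"):
    print(mark, mark_settings(mark))
```

Any mark not listed in either set falls back to narrow peak calling and "SES" scaling, which
matches the "else" wording of the bamCompare comment.
-->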
<tool>
<name>bamCoverage</name>
<version>1.5.9.1</version>
<command_line><![CDATA[ bamCoverage -p {deeptools_parallel} --binSize 25 --bam {GALvX_*} --outFileName {DEEPID.PROC.DATE.bamcov} --outFileFormat bigwig --normalizeTo1x {genomesize} --fragmentLength {*_fraglen} ]]></command_line>
<loop>GALvX_Histone, GALvX_Input</loop>
<comment>report read coverage normalized to 1x sequencing depth based on unfiltered BAM files</comment>
</tool>
<tool>
<name>potty_plotty.py</name>
<version>0.2</version>
<command_line><![CDATA[ potty_plotty.py --bg-label {Input_label} --bg-color {Input_color} --fg-label {H*_label} --fg-color {H*_color} --fg-values {Histone_bamcov} --bg-values {Input_bamcov} --plot-regions {control_regions} --binsize 25 --title DEEPID --outfile {DEEPID.PROC.DATE.control} ]]></command_line>
<loop>GALvX_Histone</loop>
<comment>the binsize 25 is selected in agreement with the value for deepTools</comment>
</tool>
</software>
</process>