CHPv3.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="http://deep.mpi-inf.mpg.de/DAC/files/style/deep_process_style.css"?>
<process>
	<name>CHP</name>
	<version>3</version>
	<author>
		<name>Andreas Richter, Peter Ebert</name>
		<email>arichter@ie-freiburg.mpg.de, pebert@mpi-inf.mpg.de</email>
	</author>
	<description>
        Process CHPv3 was created because of software updates and because it is now policy to remove reads marked as duplicates prior to the peak calling step. Additionally, since the
        upstream alignment processes have stabilized, it is now assumed that a QC summary file holding the fragment length of the respective library is always available.
        Otherwise, the workflow of this process version is identical to that of previous versions.
        This process takes aligned reads coming from the DCC/DKFZ as data input and creates individual and comparative signal tracks as well as peak files for the different histone marks.
        Note that before the correlation among all files is computed, reads in a set of known problematic regions (see blacklist_regions) are removed. These regions usually show a spurious read
        distribution that would otherwise lead to an inaccurate correlation among the files. The last step of this process plots the coverage of the histone signal (and, if available, of the input control)
        in a few selected control regions (for details, contact Andreas Richter). Note that these plots are not suited to interpreting the data or to judging the genome-wide quality
        of the entire dataset. The plot only shows regions with expected high or low signal compared to the input; the values are scaled for
        layout reasons, and this scaling must not be taken as a solid data normalization procedure.

		Erratum: the metadata in the .amd.tsv file generated by this process are partially incorrect.
		Specifically, the FRiP score is too low (i.e., too pessimistic) because it is calculated relative to all reads
		instead of only the mapped reads.
	</description>
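	<!--
The erratum above concerns the denominator of the FRiP score (fraction of reads in peaks). The following minimal Python sketch illustrates the effect under stated assumptions: the function name and the example counts are hypothetical and are not taken from the code that writes the .amd.tsv file.

```python
def frip(reads_in_peaks: int, denominator_reads: int) -> float:
    """Fraction of reads in peaks relative to the given read total."""
    if denominator_reads <= 0:
        raise ValueError("denominator_reads must be positive")
    if not 0 <= reads_in_peaks <= denominator_reads:
        raise ValueError("reads_in_peaks must lie between 0 and denominator_reads")
    return reads_in_peaks / denominator_reads


# Hypothetical counts, for illustration only.
all_reads = 1000       # mapped plus unmapped reads
mapped_reads = 800     # mapped reads only
reads_in_peaks = 200

frip_all = frip(reads_in_peaks, all_reads)        # denominator described in the erratum
frip_mapped = frip(reads_in_peaks, mapped_reads)  # denominator the erratum says should be used
print(frip_all, frip_mapped)
```

Because unmapped reads can never fall into a peak, dividing by all reads can only lower the score, which matches the "too pessimistic" wording of the erratum.
	-->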
	<inputs>
		<filetype>
			<identifier>GALvX_Histone</identifier>
			<format>BAM</format>
			<quantity>collection</quantity>
			<comment></comment>
		</filetype>
		<filetype>
			<identifier>GALvX_Input</identifier>
			<format>BAM</format>
			<quantity>single</quantity>
			<comment></comment>
		</filetype>
		<filetype>
			<identifier>GALvX_Index</identifier>
			<format>BAI index</format>
			<quantity>collection</quantity>
			<comment>Note that there is no separation of Histone and Input for the index files as one index file is required per BAM file irrespective of its type</comment>
		</filetype>
		<filetype>
			<identifier>GALvX_QcSummary</identifier>
			<format>TXT</format>
			<quantity>collection</quantity>
			<comment>The median fragment length is computed during the mapping in the GAL process and stored in the QcSummary file. This information is used in the CHP process.</comment>
		</filetype>
	</inputs>
	<references>
		<filetype>
			<identifier>blacklist_regions</identifier>
			<format>BED</format>
			<quantity>single</quantity>
			<comment>ENCODE blacklist extended by A. Richter (FB); see DCC/download/results/references/annotations</comment>
		</filetype>
		<filetype>
			<identifier>reference_genome</identifier>
			<format>2bit</format>
			<quantity>single</quantity>
			<comment>The reference genome file; see DCC/download/results/references/genomes</comment>
		</filetype>
		<filetype>
			<identifier>control_regions</identifier>
			<format>BED</format>
			<quantity>single</quantity>
			<comment>Control regions obtained from A. Richter (FB) for quality control of ChIPseq samples; see DCC/download/results/references/annotations</comment>
		</filetype>
	</references>
	<outputs>
		<filetype>
			<identifier>DEEPID.PROC.DATE.bamcorr</identifier>
			<format>SVG</format>
			<quantity>single</quantity>
			<comment>Joint deepTools correlation plot (bamCorrelate) for all histone marks plus input control</comment>
		</filetype>
		<filetype>
			<identifier>DEEPID.PROC.DATE.bamfgpr</identifier>
			<format>SVG</format>
			<quantity>single</quantity>
			<comment>Joint deepTools fingerprint plot (bamFingerprint) for all histone marks plus input control</comment>
		</filetype>
		<filetype>
			<identifier>DEEPID.PROC.DATE.gcbias</identifier>
			<format>SVG</format>
			<quantity>collection</quantity>
			<comment>deepTools GC bias plot (computeGCBias), one per histone and input BAM file</comment>
		</filetype>
		<filetype>
			<identifier>DEEPID.PROC.DATE.gcfreq</identifier>
			<format>tab-separated text file</format>
			<quantity>collection</quantity>
			<comment>GC bias frequencies file, a required output of computeGCBias; currently not used for any downstream analysis</comment>
		</filetype>
		<filetype>
			<identifier>DEEPID.PROC.DATE.peaks</identifier>
			<format>XLS table</format>
			<quantity>collection</quantity>
			<comment>Standard MACS2 output XLS table for broad and narrow marks</comment>
		</filetype>
		<filetype>
			<identifier>DEEPID.PROC.DATE.broadPeak</identifier>
			<format>broadPeak</format>
			<quantity>collection</quantity>
			<comment>Standard MACS2 output in ENCODE's broadPeak format for broad marks; this file is usually used for subsequent analyses</comment>
		</filetype>
		<filetype>
			<identifier>DEEPID.PROC.DATE.gappedPeak</identifier>
			<format>gappedPeak</format>
			<quantity>collection</quantity>
			<comment>Standard MACS2 output in ENCODE's gappedPeak format for broad marks</comment>
		</filetype>
		<filetype>
			<identifier>DEEPID.PROC.DATE.summits</identifier>
			<format>BED</format>
			<quantity>collection</quantity>
			<comment>Standard MACS2 output for narrow marks</comment>
		</filetype>
		<filetype>
			<identifier>DEEPID.PROC.DATE.narrowPeak</identifier>
			<format>narrowPeak</format>
			<quantity>collection</quantity>
			<comment>Standard MACS2 output in ENCODE's narrowPeak format for narrow marks; this file is usually used for downstream analyses</comment>
		</filetype>
		<filetype>
			<identifier>DEEPID.PROC.DATE.bamcomp</identifier>
			<format>bigwig</format>
			<quantity>collection</quantity>
			<comment>Input-normalized histone signal tracks</comment>
		</filetype>
		<filetype>
			<identifier>DEEPID.PROC.DATE.bamcov</identifier>
			<format>bigwig</format>
			<quantity>collection</quantity>
			<comment>Sequencing-depth normalized signal coverage tracks</comment>
		</filetype>
		<filetype>
			<identifier>DEEPID.PROC.DATE.control</identifier>
			<format>SVG</format>
			<quantity>collection</quantity>
			<comment>A plot of a set of control regions for each histone mark (histone and input signal). Attention: this plot can only be used for a rough quality assessment (experiment failure or success); do not base any interpretation of the data on it.</comment>
		</filetype>
	</outputs>
	<software>
        <tool>
            <name>samtools</name>
            <version>1.2</version>
            <command_line><![CDATA[ samtools sort -m {samtools_memory} -n -@ {samtools_parallel} -T LIBRARY_tmpsrt -O bam {GALvX_*} ]]></command_line>
            <loop>GALvX_Histone, GALvX_Input</loop>
            <comment>Output is piped to next step</comment>
        </tool>
        <tool>
            <name>bedtools</name>
            <version>2.20.1</version>
            <command_line><![CDATA[ bedtools pairtobed -ubam -type neither -abam - -b {blacklist_regions} ]]></command_line>
            <loop>GALvX_Histone, GALvX_Input</loop>
            <comment>Input is piped from previous step, output is piped to next step</comment>
        </tool>
        <tool>
            <name>samtools</name>
            <version>1.2</version>
            <command_line><![CDATA[ samtools sort -m {samtools_memory} -o DEEPID.tmp.blfilt.bam -@ {samtools_parallel} -T LIBRARY_tmpsrt2 -O bam - && samtools index DEEPID.tmp.blfilt.bam ]]></command_line>
            <loop>GALvX_Histone, GALvX_Input</loop>
            <comment>Input is piped from previous step and output written to a temporary BAM file that is discarded after the analysis</comment>
        </tool>
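	<!--
The three tools above form a single shell pipeline per library: sort by read name, drop read pairs that overlap the blacklist, sort by coordinate, then index. The following Python sketch only assembles that pipeline as a string; it is an illustration, not the code used by the process. The function name, its parameters, and the default memory and thread values are hypothetical, while the tool options are copied from the command lines above.

```python
import shlex


def blacklist_filter_pipeline(bam, blacklist_bed, deepid, memory="2G", threads=4):
    """Return the shell pipeline that writes <deepid>.tmp.blfilt.bam and its index."""
    out_bam = f"{deepid}.tmp.blfilt.bam"
    # Step 1: name sort so that both mates of a pair are adjacent for pairtobed.
    name_sort = ["samtools", "sort", "-m", memory, "-n", "-@", str(threads),
                 "-T", "LIBRARY_tmpsrt", "-O", "bam", bam]
    # Step 2: keep only pairs where neither end overlaps a blacklisted region.
    filter_pairs = ["bedtools", "pairtobed", "-ubam", "-type", "neither",
                    "-abam", "-", "-b", blacklist_bed]
    # Step 3: coordinate sort into the temporary BAM file, reading from stdin.
    coord_sort = ["samtools", "sort", "-m", memory, "-o", out_bam, "-@", str(threads),
                  "-T", "LIBRARY_tmpsrt2", "-O", "bam", "-"]
    # Step 4: index the temporary BAM file.
    index = ["samtools", "index", out_bam]

    def join(cmd):
        # Quote every argument so that unusual file names stay intact.
        return " ".join(shlex.quote(arg) for arg in cmd)

    return f"{join(name_sort)} | {join(filter_pairs)} | {join(coord_sort)} && {join(index)}"


print(blacklist_filter_pipeline("histone.bam", "blacklist.bed", "SAMPLE"))
```

Building the command from argument lists keeps each step readable and avoids quoting mistakes when file names contain spaces.
	-->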
		<tool>
			<name>bamCorrelate</name>
			<version>1.5.9.1</version>
			<command_line><![CDATA[ bamCorrelate bins -p {deeptools_parallel} --bamfiles DEEPID.tmp.blfilt.bam --plotFile {DEEPID.PROC.DATE.bamcorr} --corMethod pearson --labels {plot_labels} --binSize 1000 --distanceBetweenBins 2000 --fragmentLength {all_avg_fraglen} --plotFileFormat svg ]]></command_line>
			<loop>no looping</loop>
			<comment>A window/bin size of 1 kb is used because the default value (10 kb) would merge multiple narrow signals; 1m samples</comment>
		</tool>
		<tool>
			<name>bamFingerprint</name>
			<version>1.5.9.1</version>
			<command_line><![CDATA[ bamFingerprint -p {deeptools_parallel} --bamfiles {GALvX_*} --plotFile {DEEPID.PROC.DATE.bamfgpr} --labels {plot_labels} --fragmentLength {all_avg_fraglen} --numberOfSamples 500000 --plotFileFormat svg ]]></command_line>
			<loop>no looping</loop>
			<comment></comment>
		</tool>
		<tool>
			<name>computeGCBias</name>
			<version>1.5.9.1</version>
			<command_line><![CDATA[ computeGCBias -p {deeptools_parallel} --bamfile {GALvX_*} --effectiveGenomeSize {genomesize} --genome {reference_genome} --fragmentLength {*_fraglen} --sampleSize 50000000 --GCbiasFrequenciesFile {DEEPID.PROC.DATE.gcfreq} --biasPlot {DEEPID.PROC.DATE.gcbias} --plotFileFormat svg ]]></command_line>
			<loop>GALvX_Histone, GALvX_Input</loop>
			<comment></comment>
		</tool>
        <tool>
            <name>Picardtools</name>
            <version>1.130</version>
            <command_line><![CDATA[ java -mx{java_memory} -jar picard.jar MarkDuplicates INPUT={GALvX_*} OUTPUT=DEEPID.tmp.nodup.bam METRICS_FILE=/dev/null REMOVE_DUPLICATES=true VALIDATION_STRINGENCY=SILENT QUIET=true VERBOSITY=ERROR ]]></command_line>
            <loop>GALvX_Histone, GALvX_Input</loop>
            <comment>All reads marked as duplicates are removed for the peak calling. The output of this command is temporary and discarded after the peak calling.</comment>
        </tool>
		<tool>
			<name>MACS2</name>
			<version>2.1.0.20140616</version>
			<command_line><![CDATA[ macs2 callpeak -t DEEPID_Histone.tmp.nodup.bam -c DEEPID_Input.tmp.nodup.bam -f BAM --gsize {genomesize} --keep-dup all --name {*_name_prefix} --nomodel --extsize {*_fraglen} --qvalue 0.05 {*_broad} ]]></command_line>
			<loop>GALvX_Histone</loop>
			<comment>parameter &quot;--broad&quot; for samples H3K4me1/H3K27me3/H3K36me3/H3K9me3</comment>
		</tool>
		<tool>
			<name>bamCompare</name>
			<version>1.5.9.1</version>
			<command_line><![CDATA[ bamCompare -p {deeptools_parallel} --bamfile1 {GALvX_Histone} --bamfile2 {GALvX_Input} --outFileName {DEEPID.PROC.DATE.bamcomp} --outFileFormat bigwig --scaleFactorsMethod {*_scaling} --ratio log2 --fragmentLength {*_fraglen} ]]></command_line>
			<loop>GALvX_Histone</loop>
			<comment>Generates log2 fold-change signal tracks; scaling method: &quot;readCount&quot; for samples H3K27me3/H3K9me3, &quot;SES&quot; otherwise</comment>
		</tool>
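	<!--
The MACS2 and bamCompare steps above choose one parameter each depending on the histone mark: the MACS2 broad option and the bamCompare scaling method. A minimal Python sketch of that selection follows. The constant and function names are hypothetical, the H3K36 entry is assumed to refer to H3K36me3, and the marks H3K4me3 and H3K27ac appear only as illustrative examples of marks outside both lists.

```python
# Marks called with the MACS2 broad option, following the MACS2 tool comment.
# Assumption: the H3K36 entry refers to H3K36me3.
BROAD_MARKS = frozenset({"H3K4me1", "H3K27me3", "H3K36me3", "H3K9me3"})

# Marks scaled by read count in bamCompare; every other mark uses SES.
READCOUNT_MARKS = frozenset({"H3K27me3", "H3K9me3"})


def uses_broad_peaks(mark: str) -> bool:
    """True if peaks for this mark are called in broad mode."""
    return mark in BROAD_MARKS


def scaling_method(mark: str) -> str:
    """Value passed to the bamCompare scaling option for this mark."""
    return "readCount" if mark in READCOUNT_MARKS else "SES"


# Illustrative marks only; H3K4me3 and H3K27ac are not named in this file.
for mark in ("H3K4me3", "H3K27ac", "H3K4me1", "H3K27me3"):
    print(mark, uses_broad_peaks(mark), scaling_method(mark))
```

Note that the two lists differ: H3K4me1 and H3K36me3 are called in broad mode but still use SES scaling.
	-->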
		<tool>
			<name>bamCoverage</name>
			<version>1.5.9.1</version>
			<command_line><![CDATA[ bamCoverage -p {deeptools_parallel} --bam {GALvX_*} --outFileName {DEEPID.PROC.DATE.bamcov} --outFileFormat bigwig --normalizeTo1x {genomesize} --fragmentLength {*_fraglen} ]]></command_line>
			<loop>GALvX_Histone, GALvX_Input</loop>
			<comment>Reports read coverage normalized to 1x sequencing depth</comment>
		</tool>
		<tool>
			<name>potty_plotty.py</name>
			<version>0.2</version>
			<command_line><![CDATA[ potty_plotty.py --bg-label {Input_label} --bg-color {Input_color} --fg-label {H*_label} --fg-color {H*_color} --fg-values {Histone_bamcov} --bg-values {Input_bamcov} --plot-regions {control_regions} --binsize 50 --title DEEPID --outfile {DEEPID.PROC.DATE.control} ]]></command_line>
			<loop>GALvX_Histone</loop>
			<comment>The bin size is chosen to match the default value of deepTools (i.e., 50 bp)</comment>
		</tool>
	</software>
</process>