ENH: final version of CHPv5, validates against XSD and together with AMD of test sample
pebert committed Sep 15, 2017
1 parent 65a1b80 commit c96feef
Showing 1 changed file with 61 additions and 56 deletions.
docs/quantification/chip-seq/CHPv5.xml (117 changes: 61 additions, 56 deletions)
@@ -79,159 +79,170 @@
</comment>
</filetype>
<filetype>
<identifier>QcSummary</identifier>
<format>JSON</format>
<identifier>GALvX_QCSummary</identifier>
<format>JSON / txt</format>
<quantity>collection</quantity>
<comment>
The median insert size (field: insertSizeMedian) is extracted from the QC summary file.
Note that for compatibility with previous alignment processes, the QC summary files
may also have the old tabular / text-based format (field: PE_insertsize (mapq&gt;0))
</comment>
</filetype>
</inputs>
<references>
<filetype>
<identifier>blacklist_regions</identifier>
<format>BED</format>
<quantity>single</quantity>
<comment>Blacklist region</comment>
</filetype>
<filetype>
<identifier>reference_genome</identifier>
<format>2bit</format>
<quantity>single</quantity>
<comment>The reference genome file; see DCC/download/results/references/genomes</comment>
</filetype>
<filetype>
<identifier>blacklist_regions</identifier>
<format>BED</format>
<quantity>single</quantity>
<comment>Blacklist region</comment>
</filetype>
<filetype>
<identifier>chromosome_sizes</identifier>
<format>TSV</format>
<quantity>single</quantity>
<comment>2-column, tab-separated table of chromosome sizes for reference genome</comment>
</filetype>
<filetype>
<identifier>autosomal_regions</identifier>
<identifier>autosome_regions</identifier>
<format>BED</format>
<quantity>single</quantity>
<comment>A file listing all autosomes as BED regions for filtering</comment>
</filetype>
</references>
<outputs>
<filetype>
<identifier>DEEPID.PROC.DATE.raw.bamcov</identifier>
<identifier>DEEPID.PROC.DATE.ASSM.raw.bamcov</identifier>
<format>bigwig</format>
<quantity>collection</quantity>
<comment>Signal coverage track generated from raw BAM files</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.filt.bamcov</identifier>
<identifier>DEEPID.PROC.DATE.ASSM.filt.bamcov</identifier>
<format>bigwig</format>
<quantity>collection</quantity>
<comment>
Signal coverage track generated from filtered BAM files. -F 3844 / q >= 5 / blacklist removed
</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.ses-fc</identifier>
<identifier>DEEPID.PROC.DATE.ASSM.ses.log2-Input</identifier>
<format>bigwig</format>
<quantity>collection</quantity>
<comment>SES normalized fold-change signal</comment>
<comment>SES normalized signal-over-Input track</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.cnt-fc</identifier>
<identifier>DEEPID.PROC.DATE.ASSM.cnt.log2-Input</identifier>
<format>bigwig</format>
<quantity>collection</quantity>
<comment>Read-count normalized fold-change signal</comment>
<comment>Read-count normalized signal-over-Input track</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.gcfreq</identifier>
<identifier>DEEPID.PROC.DATE.ASSM.gcbias</identifier>
<format>svg</format>
<quantity>collection</quantity>
<comment>GC bias plot based on raw BAM files</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.gcfreq</identifier>
<identifier>DEEPID.PROC.DATE.ASSM.gcfreq</identifier>
<format>txt</format>
<quantity>collection</quantity>
<comment>Obs./exp. GC read frequencies based on raw BAM files</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.hhmm.emfit</identifier>
<identifier>DEEPID.PROC.DATE.ASSM.hhmm.emfit</identifier>
<format>PDF</format>
<quantity>collection</quantity>
<comment>histoneHMM output visualizing the EM fit. Check this before using the histoneHMM output</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.hhmm.out</identifier>
<identifier>DEEPID.PROC.DATE.ASSM.hhmm.out</identifier>
<format>zip</format>
<quantity>collection</quantity>
<comment>Zip archive containing other histoneHMM output files (raw data files not needed by most users)</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.macs.out</identifier>
<identifier>DEEPID.PROC.DATE.ASSM.macs.out</identifier>
<format>zip</format>
<quantity>collection</quantity>
<comment>Zip archive containing other MACS output files (raw data files not needed by most users)</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.hhmm.broad</identifier>
<identifier>DEEPID.PROC.DATE.ASSM.hhmm.broad</identifier>
<format>BED / broadPeak</format>
<quantity>collection</quantity>
<comment>Histone and Input enriched regions called by histoneHMM</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.macs.broad</identifier>
<identifier>DEEPID.PROC.DATE.ASSM.macs.broad</identifier>
<format>BED / broadPeak</format>
<quantity>collection</quantity>
<comment>Histone enriched regions called by MACS</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.macs.narrow</identifier>
<identifier>DEEPID.PROC.DATE.ASSM.macs.narrow</identifier>
<format>BED / narrowPeak</format>
<quantity>collection</quantity>
<comment>Histone enriched regions called by MACS</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.fgpr</identifier>
<identifier>DEEPID.PROC.DATE.ASSM.fgpr</identifier>
<format>SVG</format>
<quantity>single</quantity>
<comment>Fingerprint plots based on raw BAM files</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.qm-fgpr</identifier>
<identifier>DEEPID.PROC.DATE.ASSM.qm-fgpr</identifier>
<format>txt</format>
<quantity>single</quantity>
<comment>Fingerprint quality metrics based on raw BAM files</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.counts-fgpr</identifier>
<identifier>DEEPID.PROC.DATE.ASSM.counts-fgpr</identifier>
<format>tsv</format>
<quantity>single</quantity>
<comment>Fingerprint raw counts based on raw BAM files</comment>
</filetype>

<filetype>
<identifier>DEEPID.PROC.DATE.auto.counts-summ</identifier>
<identifier>DEEPID.PROC.DATE.ASSM.auto.counts-summ</identifier>
<format>tsv</format>
<quantity>single</quantity>
<comment>multiBamSummary raw counts based on filtered and autosome-restricted BAM files</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.auto.summ</identifier>
<identifier>DEEPID.PROC.DATE.ASSM.auto.summ</identifier>
<format>npz</format>
<quantity>single</quantity>
<comment>
multiBamSummary data file based on filtered and autosome-restricted BAM files.
The format is a numpy compatible binary file.
</comment>
</filetype>

<filetype>
<identifier>DEEPID.PROC.DATE.ASSM.bamcorr</identifier>
<format>SVG</format>
<quantity>collection</quantity>
<comment>Correlation heatmaps using Pearson and Spearman correlation measure</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.ASSM.corrmat</identifier>
<format>tsv</format>
<quantity>collection</quantity>
<comment>Raw correlation matrices</comment>
</filetype>
</outputs>
<software>

<tool>
<name>bamCoverage</name>
<version>2.5.3</version>
<command_line>
<![CDATA[
bamCoverage -p {deeptools_parallel} --binSize 25 --bam {GALvX_*} --outFileName {DEEPID.PROC.DATE.bamcov}
bamCoverage -p {deeptools_parallel} --binSize 25 --bam {GALvX_*} --outFileName {DEEPID.PROC.DATE.ASSM.raw.bamcov}
--outFileFormat bigwig --normalizeTo1x {genomesize}
]]>
</command_line>
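The `GALvX_QCSummary` input comment says the median insert size is read from the JSON field `insertSizeMedian`, with older QC summaries possibly using a tabular text layout and the field `PE_insertsize (mapq>0)` (written `mapq&gt;0` in the XML). A minimal Python sketch of that lookup follows. It assumes the JSON file is a flat object and that the legacy file holds one tab-separated `key<TAB>value` pair per line; both layouts and the helper name `read_median_insert_size` are hypothetical, not taken from the pipeline.

```python
import json


def read_median_insert_size(text: str) -> float:
    """Return the median insert size stored in a QC summary (given as text).

    Hypothetical layouts, assumed for illustration only:
      * current format: a flat JSON object with the key "insertSizeMedian"
      * legacy format: one "key<TAB>value" pair per line, where the key is
        "PE_insertsize (mapq>0)"
    """
    try:
        data = json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        data = None  # not JSON, so fall back to the legacy text layout
    if isinstance(data, dict) and "insertSizeMedian" in data:
        return float(data["insertSizeMedian"])

    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if sep and key.strip() == "PE_insertsize (mapq>0)":
            return float(value.strip())
    raise KeyError("no insert size field found in QC summary")


print(read_median_insert_size('{"insertSizeMedian": 212}'))  # 212.0
print(read_median_insert_size("total_reads\t1000\nPE_insertsize (mapq>0)\t198\n"))  # 198.0
```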
@@ -245,7 +256,7 @@
<![CDATA[
computeGCBias -p {deeptools_parallel} --bamfile {GALvX_*} --effectiveGenomeSize {genomesize}
--genome {reference_genome} --sampleSize 50000000 --fragmentLength {*_fraglen}
--GCbiasFrequenciesFile {DEEPID.PROC.DATE.gcfreq} --biasPlot {DEEPID.PROC.DATE.gcbias} --plotFileFormat svg
--GCbiasFrequenciesFile {DEEPID.PROC.DATE.ASSM.gcfreq} --biasPlot {DEEPID.PROC.DATE.ASSM.gcbias} --plotFileFormat svg
]]>
</command_line>
<loop>GALvX_Histone, GALvX_Input</loop>
@@ -257,7 +268,7 @@
<command_line>
<![CDATA[
bamCompare -p {deeptools_parallel} --bamfile1 {GALvX_Histone} --bamfile2 {GALvX_Input}
--outFileName {DEEPID.PROC.DATE.bamcomp} --outFileFormat bigwig --scaleFactorsMethod {*_scaling}
--outFileName {DEEPID.PROC.DATE.ASSM.*.log2-Input} --outFileFormat bigwig --scaleFactorsMethod {*_scaling}
--ratio log2 --binSize 25
]]>
</command_line>
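The `bamCompare` step writes a log2 ratio of histone signal over Input per 25 bp bin (`--ratio log2 --binSize 25`), with `--scaleFactorsMethod` selecting SES or read-count scaling (the `ses` and `cnt` tracks). The sketch below illustrates only the read-count case on toy per-bin counts: the larger library is scaled down to the size of the smaller one and a pseudocount is added before the ratio is taken. The pseudocount of 1 and the function name are assumptions for illustration; SES scaling is not reproduced here.

```python
import math


def log2_over_input(histone: list[int], control: list[int], pseudocount: float = 1.0) -> list[float]:
    """Per-bin log2((histone + pc) / (control + pc)) after read-count scaling.

    Read-count scaling: each sample is multiplied by
    min(total_histone, total_control) / own_total, so the larger library is
    scaled down to the size of the smaller one. The pseudocount of 1 is an
    assumption for this sketch.
    """
    if len(histone) != len(control):
        raise ValueError("both samples must cover the same bins")
    total_h, total_c = sum(histone), sum(control)
    if total_h == 0 or total_c == 0:
        raise ValueError("both samples need at least one read")
    target = min(total_h, total_c)
    scale_h, scale_c = target / total_h, target / total_c
    return [
        math.log2((h * scale_h + pseudocount) / (c * scale_c + pseudocount))
        for h, c in zip(histone, control)
    ]


# Toy counts for four bins; the histone library is twice the size of the Input.
ratios = log2_over_input([30, 2, 6, 2], [5, 5, 5, 5])
print([round(r, 3) for r in ratios])  # [1.415, -1.585, -0.585, -1.585]
```

Positive values mark bins where the scaled histone signal exceeds the Input; the pseudocount keeps empty bins from producing a division by zero or an infinite log.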
@@ -272,10 +283,10 @@
<version>2.5.3</version>
<command_line>
<![CDATA[
plotFingerprint -p {deeptools_parallel} --bamfiles {GALvX_*} --plotFile {DEEPID.PROC.DATE.raw.fgpr}
plotFingerprint -p {deeptools_parallel} --bamfiles {GALvX_*} --plotFile {DEEPID.PROC.DATE.ASSM.fgpr}
--labels {plot_labels} --plotTitle {plot_title} --numberOfSamples 500000 --plotFileFormat svg
--outQualityMetrics {DEEPID.PROC.DATE.raw.qm-fgpr} --JSDsample {GALvX_Input}
--outRawCounts {DEEPID.PROC.DATE.raw.counts-fgpr}
--outQualityMetrics {DEEPID.PROC.DATE.ASSM.qm-fgpr} --JSDsample {GALvX_Input}
--outRawCounts {DEEPID.PROC.DATE.ASSM.counts-fgpr}
]]>
</command_line>
<loop>no looping</loop>
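`plotFingerprint` summarizes enrichment by sampling genomic bins (`--numberOfSamples 500000`), ranking them by read count and plotting the cumulative share of reads against bin rank. Perfectly uniform coverage, which a good Input approximates, lies on the diagonal, while focal enrichment pushes most reads into the last few bins. The sketch below computes such a curve from toy bin counts; it illustrates the idea only and does not reproduce the metrics written by `--outQualityMetrics`.

```python
from itertools import accumulate


def fingerprint_curve(bin_counts: list[int]) -> list[tuple[float, float]]:
    """Return (fraction of bins, cumulative fraction of reads) points.

    Bins are ranked from lowest to highest coverage, so the last point is
    always (1.0, 1.0).
    """
    ranked = sorted(bin_counts)
    total = sum(ranked)
    n = len(ranked)
    if n == 0 or total == 0:
        raise ValueError("need at least one read in at least one bin")
    return [((i + 1) / n, cum / total) for i, cum in enumerate(accumulate(ranked))]


print(fingerprint_curve([10, 10, 10, 10]))  # [(0.25, 0.25), (0.5, 0.5), (0.75, 0.75), (1.0, 1.0)]
print(fingerprint_curve([0, 1, 1, 38]))     # [(0.25, 0.0), (0.5, 0.025), (0.75, 0.05), (1.0, 1.0)]
```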
@@ -321,7 +332,7 @@
<version>2.5.3</version>
<command_line>
<![CDATA[
bamCoverage -p {deeptools_parallel} --binSize 25 --bam DEEPID.tmp.filt.bam --outFileName {DEEPID.PROC.DATE.filt.bamcov}
bamCoverage -p {deeptools_parallel} --binSize 25 --bam DEEPID.tmp.filt.bam --outFileName {DEEPID.PROC.DATE.ASSM.filt.bamcov}
--outFileFormat bigwig --normalizeTo1x {genomesize} --blackListFileName {blacklist_regions} --ignoreForNorm chrX chrY chrM X Y M MT
]]>
</command_line>
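The `filt.bamcov` output lists its read filters as `-F 3844 / q >= 5 / blacklist removed`. In SAM terms, `-F 3844` drops any read with one of the bits in 3844 = 4 + 256 + 512 + 1024 + 2048 set. The sketch below decodes that mask using the FLAG bit names from the SAM specification and applies both read-level filters; `keep_read` is an illustrative helper, and reading `q >= 5` as a minimum MAPQ of 5 is an interpretation of the comment rather than something taken from the pipeline code.

```python
# Bit values and meanings from the SAM format specification (FLAG field).
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(mask: int) -> list[str]:
    """List the names of all SAM flag bits set in mask, lowest bit first."""
    return [name for bit, name in sorted(SAM_FLAG_BITS.items()) if mask & bit]


def keep_read(flag: int, mapq: int, exclude_mask: int = 3844, min_mapq: int = 5) -> bool:
    """Illustrative `-F 3844` plus `q >= 5` filter: drop reads with any
    excluded bit set or with a mapping quality below the threshold."""
    return (flag & exclude_mask) == 0 and mapq >= min_mapq


print(decode_flag(3844))  # ['unmapped', 'secondary', 'qc_fail', 'duplicate', 'supplementary']
print(keep_read(flag=99, mapq=60))    # mapped, properly paired read: True
print(keep_read(flag=1123, mapq=60))  # 1123 = 99 + 1024 (duplicate): False
```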
@@ -338,10 +349,10 @@
<version>2.5.3</version>
<command_line>
<![CDATA[
plotFingerprint -p {deeptools_parallel} --bamfiles DEEPID.tmp.filt.bam --plotFile DEEPID.PROC.DATE.filt.fgpr.tmp
plotFingerprint -p {deeptools_parallel} --bamfiles DEEPID.tmp.filt.bam --plotFile DEEPID.PROC.DATE.tmp.filt.fgpr
--labels {plot_labels} --plotTitle {plot_title} --numberOfSamples 500000 --plotFileFormat svg
--outQualityMetrics DEEPID.PROC.DATE.filt.qm-fgpr.tmp --JSDsample DEEPID_Input.tmp.filt.bam
--outRawCounts DEEPID.PROC.DATE.filt.counts-fgpr
--outQualityMetrics DEEPID.tmp.filt.qm-fgpr.tmp --JSDsample DEEPID_Input.tmp.filt.bam
--outRawCounts DEEPID.tmp.filt.counts-fgpr
]]>
</command_line>
<loop>no looping</loop>
@@ -370,7 +381,7 @@
<version>3.0</version>
<command_line>
<![CDATA[
zip -9 -X -j -q -D {DEEPID.PROC.DATE.macs.out} DEEPID_macs*
zip -9 -X -j -q -D {DEEPID.PROC.DATE.ASSM.macs.out} DEEPID_macs*
]]>
</command_line>
<loop>GALvX_Histone</loop>
@@ -398,7 +409,7 @@
<command_line>
<![CDATA[
cut -f 1,4,5,9 DEEPID-regions.gff | sort -V -k1,2 > DEEPID.hmm.bed &&
mv DEEPID-zinba-emfit.pdf {DEEPID.PROC.DATE.hhmm.emfit}
mv DEEPID-zinba-emfit.pdf {DEEPID.PROC.DATE.ASSM.hhmm.emfit}
]]>
</command_line>
<loop>DEEPID-regions.gff</loop>
@@ -409,7 +420,7 @@
<version>3.0</version>
<command_line>
<![CDATA[
zip -9 -X -j -q -D {DEEPID.PROC.DATE.hhmm.out} DEEPID_hhmm*
zip -9 -X -j -q -D {DEEPID.PROC.DATE.ASSM.hhmm.out} DEEPID_hhmm*
]]>
</command_line>
<loop>GALvX_Histone</loop>
@@ -418,8 +429,6 @@
-zinba-params-em.txt, .txt
</comment>
</tool>


<tool>
<name>sambamba</name>
<version>0.6.6</version>
@@ -432,7 +441,6 @@
<loop>DEEPID.tmp.filt.bam</loop>
<comment>Count number of reads overlapping peak regions, later used for FRiP score</comment>
</tool>

<tool>
<name>bedtools</name>
<version>2.26.0</version>
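The `sambamba` step above counts reads overlapping peak regions, and its comment says these counts later feed the FRiP score (fraction of reads in peaks). Assuming FRiP is the usual ratio of reads inside peaks to all reads in the same filtered BAM, the final step is a single division; the sketch below makes that explicit with hypothetical counts.

```python
def frip(reads_in_peaks: int, total_reads: int) -> float:
    """Fraction of reads in peaks: reads overlapping peaks / all reads counted."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= reads_in_peaks <= total_reads:
        raise ValueError("reads_in_peaks must lie between 0 and total_reads")
    return reads_in_peaks / total_reads


# Hypothetical counts for one filtered BAM file.
print(frip(reads_in_peaks=4_200_000, total_reads=30_000_000))  # 0.14
```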
@@ -444,7 +452,6 @@
<loop>peak_file</loop>
<comment>Intersect all peak files with blacklist regions for flagging</comment>
</tool>

<tool>
<name>bedtools</name>
<version>2.26.0</version>
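Each peak file is intersected with the blacklist regions so that overlapping peaks can be flagged. The overlap test behind that step can be sketched in plain Python, assuming BED-style 0-based, half-open coordinates and an overlap of at least one base; `flag_blacklisted` is a hypothetical helper, not part of bedtools or the pipeline.

```python
from bisect import bisect_left
from collections import defaultdict

# (chromosome, start, end) in BED convention: 0-based, end-exclusive.
Interval = tuple[str, int, int]


def flag_blacklisted(peaks: list[Interval], blacklist: list[Interval]) -> list[bool]:
    """Return True for every peak overlapping at least one blacklist region.

    Half-open intervals [s1, e1) and [s2, e2) on the same chromosome
    overlap when s1 < e2 and s2 < e1, i.e. they share at least one base.
    """
    regions = defaultdict(list)
    for chrom, start, end in blacklist:
        regions[chrom].append((start, end))
    starts = {}
    for chrom, spans in regions.items():
        spans.sort()
        starts[chrom] = [s for s, _ in spans]

    flags = []
    for chrom, start, end in peaks:
        spans = regions.get(chrom, [])
        # Only regions that begin before the peak ends can overlap it.
        candidates = spans[: bisect_left(starts.get(chrom, []), end)]
        flags.append(any(r_end > start for _, r_end in candidates))
    return flags


blacklist = [("chr1", 100, 200), ("chr2", 0, 50)]
peaks = [("chr1", 150, 300), ("chr1", 200, 250), ("chr2", 40, 60), ("chr3", 0, 10)]
print(flag_blacklisted(peaks, blacklist))  # [True, False, True, False]
```

The second peak starts exactly where the blacklist region ends, so under half-open coordinates the two share no base and the peak is not flagged.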
@@ -456,14 +463,13 @@
<loop>histoneHMM_peak_file</loop>
<comment>Intersect all histoneHMM peak files with Input peaks for flagging</comment>
</tool>

<tool>
<name>Python</name>
<version>2.7.13</version>
<command_line>
<![CDATA[
pipeline-merge peak-file DEEPID.peak-ovl-bl DEEPID.peak-ovl-input
> {DEEPID.PROC.DATE.macs.narrow} {DEEPID.PROC.DATE.macs.broad} {DEEPID.PROC.DATE.hhmm.broad}
> {DEEPID.PROC.DATE.ASSM.macs.narrow} {DEEPID.PROC.DATE.ASSM.macs.broad} {DEEPID.PROC.DATE.ASSM.hhmm.broad}
]]>
</command_line>
<loop>peak-file</loop>
@@ -473,7 +479,6 @@
standard broadPeak/narrowPeak format specifications.
</comment>
</tool>

<tool>
<name>sambamba</name>
<version>0.6.6</version>
@@ -493,9 +498,9 @@
<version>2.5.3</version>
<command_line>
<![CDATA[
multiBamSummary bins -p 8 --bamfiles DEEPID.tmp.auto.bam --outFileName {DEEPID.PROC.DATE.auto.summ}
multiBamSummary bins -p 8 --bamfiles DEEPID.tmp.auto.bam --outFileName {DEEPID.PROC.DATE.ASSM.auto.summ}
--labels {plot_labels} --binSize 1000 --distanceBetweenBins 2000 --blackListFileName {blacklist_regions}
--outRawCounts {DEEPID.PROC.DATE.auto.counts-summ}
--outRawCounts {DEEPID.PROC.DATE.ASSM.auto.counts-summ}
]]>
</command_line>
<loop>no looping</loop>
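`multiBamSummary` runs on `DEEPID.tmp.auto.bam`, which, judging by its name and the `autosome_regions` reference ("A file listing all autosomes as BED regions for filtering"), is restricted to autosomes. One way to derive such a BED file from the two-column `chromosome_sizes` table is sketched below. It assumes autosomes are the chromosomes with a purely numeric name and an optional `chr` prefix, which fits UCSC and Ensembl naming for human and mouse but not every assembly; the helper name is hypothetical.

```python
import re

# Assumed naming convention: autosomes are "chr1", "chr2", ... or "1", "2", ...
AUTOSOME = re.compile(r"(chr)?\d+")


def autosome_bed(chrom_sizes: str) -> str:
    """Convert 'name<TAB>length' rows into BED rows spanning each autosome."""
    rows = []
    for line in chrom_sizes.splitlines():
        if not line.strip():
            continue
        name, length = line.split("\t")[:2]
        if AUTOSOME.fullmatch(name):
            rows.append(f"{name}\t0\t{int(length)}")
    return "".join(row + "\n" for row in rows)


# Toy chromosome sizes, not real genome lengths.
sizes = "chr1\t1000\nchrX\t800\nchr2\t900\nchrM\t16\nchr1_random\t50\n"
print(autosome_bed(sizes), end="")
```

Sex chromosomes, `chrM` and unplaced contigs such as `chr1_random` fail the name pattern and are left out of the BED output.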
@@ -506,10 +511,10 @@
<version>2.5.3</version>
<command_line>
<![CDATA[
plotCorrelation bins --corData SAMPLEID.npz --plotFile {DEEPID.PROC.DATE.bamcorr} --whatToPlot heatmap
--plotTitle {plot_title} --plotFileFormat svg --corMethod {cor_method}
plotCorrelation bins --corData SAMPLEID.npz --plotFile {DEEPID.PROC.DATE.ASSM.bamcorr} --whatToPlot heatmap
--plotTitle {plot_title} --plotFileFormat svg --corMethod corr_method
--plotNumbers --zMin -1 --zMax 1 --colorMap coolwarm
--outFileCorMatrix {DEEPID.PROC.DATE.corrmat}
--outFileCorMatrix {DEEPID.PROC.DATE.ASSM.corrmat}
]]>
</command_line>
<loop>no looping</loop>
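The correlation outputs (`bamcorr`, `corrmat`) are described as Pearson and Spearman correlations of the `multiBamSummary` bin counts, with `--corMethod` choosing the measure. The sketch below shows what the two measures compute on toy per-bin counts: Pearson on the raw values, Spearman on their ranks, with tied values given average ranks. It is a plain-Python illustration, not the deepTools implementation; the sample names and counts are hypothetical.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy per-bin read counts for two samples; one bin is an outlier in sample_a.
sample_a = [1, 2, 3, 4, 100]
sample_b = [2, 4, 6, 8, 10]
print(round(pearson(sample_a, sample_b), 3))  # 0.725
print(spearman(sample_a, sample_b))           # 1.0
```

The single high-count bin pulls the Pearson value down, while Spearman only sees that both samples rank the bins identically; this is one reason to inspect both heatmaps.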
