diff --git a/docs/quantification/chip-seq/CHPv0.xml b/docs/quantification/chip-seq/CHPv0.xml new file mode 100644 index 0000000..11abc8e --- /dev/null +++ b/docs/quantification/chip-seq/CHPv0.xml @@ -0,0 +1,123 @@ + + + + CHP + 0 + + Peter Ebert + pebert@mpi-inf.mpg.de + + + ANDREAS: please insert general comments here + + + + ALNv0_histone.bam + + collection + + + + ALNv0_input.bam + + single + + + + ALNv0.bai + + collection + Index files are renamed internally to .bam.bai since deepTools is expecting index naming like this + + + + + ref_genome + 2bit + single + The reference genome file + + + chip_ctrl_regions + BED + single + Control regions obtained from Freiburg for quality control of ChIPseq samples + + + + + samplesID.PROCESS.DATE.corplot.png + deepTools graphics PNG + single + + + + samplesID.PROCESS.DATE.fgprplot.png + deepTools graphics PNG + single + + + + sampleID.PROCESS.DATE.gcbplot.png + deepTools graphics PNG + collection + + + + sampleID.PROCESS.DATE.gcbfreq.txt + + collection + + + + sampleIDs.PROCESS.DATE.bamcomp.bw + + collection + + + + + + bamCorrelate (deepTools) + 1.5.1-252-gbbb0711 + + no looping + + + + bamFingerprint (deepTools) + 1.5.1-252-gbbb0711 + + no looping + + + + computeGCBias (deepTools) + 1.5.1-252-gbbb0711 + + no looping + + + + MACS2 + macs2 2.0.10.20130915 (tag:beta) + + no looping + + + + bamCompare (deepTools) + 1.5.1-252-gbbb0711 + + no looping + + + + bamCoverage (deepTools) + 1.5.1-252-gbbb0711 + + no looping + + + + \ No newline at end of file diff --git a/docs/quantification/chip-seq/CHPv1.xml b/docs/quantification/chip-seq/CHPv1.xml new file mode 100644 index 0000000..48ff356 --- /dev/null +++ b/docs/quantification/chip-seq/CHPv1.xml @@ -0,0 +1,160 @@ + + + + CHP + 1 + + Andreas Richter, Peter Ebert + arichter@ie-freiburg.mpg.de, pebert@mpi-inf.mpg.de + + + Process CHPv1 has been used to analyse the first set of DEEP pilot data. 
+ * {genomesize}: effective genome size (for deepTools and MACS2): for deeply sequenced samples and random mapping strategy approximated by (genome size) - (#N) + * human hs37d5 (1k genomes): 2.9e9 + * mouse mm10: 2.65e9 + * {fragment_length}: individual median fragment length/insert size of each sample derived from the BAM file (based on alignment statistics from HD [see email from Barbara Hutter, 16 October 2013] or computed by PE_fragment_size) + * use most recent deeptools version from github and most recent MACS2 version + + + + ALNvX_histone.bam + + collection + + + + ALNvX_input.bam + + single + + + + ALNvX.bai + + collection + Index files are renamed internally to .bam.bai since deepTools is expecting index naming like this + + + + + filtered_regions + BED + single + ENCODE blacklist extended by A. Richter (FB); see DCC/download/results/references/annotations + + + reference_genome + 2bit + single + The reference genome file; see DCC/download/results/references/genomes + + + plot_regions + BED + single + Control regions obtained from A. Richter (FB) for quality control of ChIPseq samples; see DCC/download/results/references/annotations + + + + + samplesID.PROCESS.DATE.corplot.cormethod + deepTools graphics PNG + single + + + + samplesID.PROCESS.DATE.fgprplot + deepTools graphics PNG + single + + + + sampleID.PROCESS.DATE.gcbplot + deepTools graphics PNG + collection + + + + sampleID.PROCESS.DATE.gcbfreq + tab-separated text file + collection + + + + sampleIDs.PROCESS.DATE.bamcomp.scalemethod + bigwig + collection + Always compare a signal vs the input + + + sampleID.PROCESS.DATE.bamcov.seqDepthNorm + bigwig + collection + + + + sampleID.PROCESS.DATE.ctrlreg + graphics PNG + collection + + + + + + region_filter.py + 0.1 + + + Script to generate a temporary BAM file with ENCODE blacklist regions excluded, only relevant for bamCorrelate tool. 
The filtered BAM file is discarded at the end of this process + + + bamCorrelate (deepTools) + 1.5.7-5-gcbab7b3 + + + Window/bin size of 1kb since multiple narrow signals will be merged with default value (10kb), 1m samples + + + bamFingerprint (deepTools) + 1.5.7-5-gcbab7b3 + + + + + + computeGCBias (deepTools) + 1.5.7-5-gcbab7b3 + + + + + + MACS2 + 2.0.10.20131216 (tag:beta) + + + parameter "--broad" for samples H3K4me1/H3K27me3/H3K36me/H3K9me3; default q-value cutoff of 0.05 is recommended by the author at least for broad marks and approved by A. Richter for all marks + + + bamCompare (deepTools) + 1.5.7-5-gcbab7b3 + + + scaling_method: "readCount" for samples H3K27me3/H3K9me3, "SES" else + + + bamCoverage (deepTools) + 1.5.7-5-gcbab7b3 + + + report read coverage normalized to 1x sequencing depth + + + signal_plotter.py + 0.1 + + + + + + diff --git a/docs/quantification/chip-seq/CHPv2.xml b/docs/quantification/chip-seq/CHPv2.xml new file mode 100644 index 0000000..2a30a80 --- /dev/null +++ b/docs/quantification/chip-seq/CHPv2.xml @@ -0,0 +1,185 @@ + + + + CHP + 2 + + Andreas Richter, Peter Ebert + arichter@ie-freiburg.mpg.de, pebert@mpi-inf.mpg.de + + + Process CHPv2 has been created to correct a couple of mistakes in the v1 process description and - more importantly - since new software versions have been installed on the DEEP cluster at DAC/MPI-Inf. This process takes as input aligned reads coming from the DCC/DKFZ and creates individual and comparative signal tracks as well as peak files for the different histone marks. Note that before the correlation among all files is computed, a couple of known problematic regions are removed that usually show a spurious read distribution that would subsequently lead to an inaccurate correlation among the files. The last step of this process plots the coverage of the histone signal (and, if available, of the input control) in a few selected control regions (for details, contact Andreas Richter). 
Note that these plots are by no means suited to interpret the data or judge the quality of the entire dataset - the plot of the control regions just shows regions with expected high or low signal compared to the input; the scaling of the values is performed for layout reasons and independently for each region, i.e. plots of different regions cannot be compared directly. + + + + GALvX_histone + BAM + collection + + + + GALvX_input + BAM + single + + + + GALvX_index + BAM index + collection + Index files are renamed internally to .bam.bai since deepTools is expecting index naming like this + + + + + filtered_regions + BED + single + ENCODE blacklist extended by A. Richter (FB); see DCC/download/results/references/annotations + + + reference_genome + 2bit + single + The reference genome file; see DCC/download/results/references/genomes + + + plot_regions + BED + single + Control regions obtained from A. Richter (FB) for quality control of ChIPseq samples; see DCC/download/results/references/annotations + + + + + samplesID.PROCESS.DATE.corplot.cormethod + deepTools graphics PNG + single + + + + samplesID.PROCESS.DATE.fgprplot + deepTools graphics PNG + single + + + + sampleID.PROCESS.DATE.gcbplot + deepTools graphics PNG + collection + + + + sampleID.PROCESS.DATE.gcbfreq + tab-separated text file + collection + + + + sampleID.PROCESS.DATE._peaks.xls + XLS table + collection + Standard MACS2 output XLS table for broad and narrow marks + + + sampleID.PROCESS.DATE._peaks.broadPeak + broadPeak + collection + Standard MACS2 output in ENCODE's broadPeak format for broad marks, this file is usually used for subsequent analyses + + + sampleID.PROCESS.DATE._peaks.gappedPeak + gappedPeak + collection + Standard MACS2 output in ENCODE's gappedPeak format for broad marks + + + sampleID.PROCESS.DATE._summits.bed + BED + collection + Standard MACS2 output for narrow marks + + + sampleID.PROCESS.DATE._peaks.narrowPeak + narrowPeak + collection + Standard MACS2 output for narrow marks, 
this file is usually used for subsequent analyses + + + sampleIDs.PROCESS.DATE.bamcomp.scalemethod + bigwig + collection + Input-normalized histone signal tracks + + + sampleID.PROCESS.DATE.bamcov.seqDepthNorm + bigwig + collection + Sequencing-depth normalized signal coverage tracks + + + sampleID.PROCESS.DATE.ctrlreg + graphics PNG + collection + A plot of a set of control regions for each histone mark (histone and input signal). Attention: this plot can only be used for a rough quality assessment (experiment fail or success), you cannot base any interpretation on this plot. + + + + + region_filter.py + 0.1 + + GALvX_histone, GALvX_input + Script to generate a temporary BAM file with ENCODE blacklist regions excluded, only relevant for bamCorrelate tool. The filtered BAM files are discarded at the end of this process. One temporary file per histone mark plus the input is generated. + + + bamCorrelate (deepTools) + 1.5.8.1 + + no looping + Window/bin size of 1kb since multiple narrow signals will be merged with default value (10kb), 1m samples + + + bamFingerprint (deepTools) + 1.5.8.1 + + no looping + + + + computeGCBias (deepTools) + 1.5.8.1 + + GALvX_histone, GALvX_input + + + + MACS2 + macs2 2.1.0.20140616 + + GALvX_histone + parameter "--broad" for samples H3K4me1/H3K27me3/H3K36me/H3K9me3; default q-value cutoff of 0.05 is recommended by the author at least for broad marks and approved by A. 
Richter for all marks + + + bamCompare (deepTools) + 1.5.8.1 + + GALvX_histone + scaling_method: "readCount" for samples H3K27me3/H3K9me3, "SES" else + + + bamCoverage (deepTools) + 1.5.8.1 + + GALvX_histone, GALvX_input + report read coverage normalized to 1x sequencing depth + + + signal_plotter.py + 0.1 + + only for histone marks: sampleID.PROCESS.DATE.bamcov.seqDepthNorm + + + + diff --git a/docs/quantification/chip-seq/CHPv3.xml b/docs/quantification/chip-seq/CHPv3.xml new file mode 100644 index 0000000..d3d698e --- /dev/null +++ b/docs/quantification/chip-seq/CHPv3.xml @@ -0,0 +1,224 @@ + + + + CHP + 3 + + Andreas Richter, Peter Ebert + arichter@ie-freiburg.mpg.de, pebert@mpi-inf.mpg.de + + + Process CHPv3 had to be created due to software updates and since it is now policy to remove reads marked as duplicates prior to the peak calling step. Additionally, since the + upstream alignment processes have stabilized, it is now assumed that a QC summary file is always available that holds information on the fragment length in the respective library. + Otherwise, this process version still has the identical workflow compared to previous versions. + This process takes as data input aligned reads coming from the DCC/DKFZ and creates individual and comparative signal tracks as well as peak files for the different histone marks. + Note that before the correlation among all files is computed, a couple of known problematic regions are removed that usually show a spurious read distribution that would + subsequently lead to an inaccurate correlation among the files. The last step of this process plots the coverage of the histone signal (and, if available, of the input control) + in a few selected control regions (for details, contact Andreas Richter). 
Note that these plots are by no means suited to interpret the data or judge the genome-wide quality + of the entire dataset - the plot of the control regions just shows regions with expected high or low signal compared to the input; the scaling of the values is performed for + layout reasons and cannot be taken as a solid data normalization procedure. + + Erratum: the metadata in the .amd.tsv file generated by this process are partially wrong. + Specifically, the FRiP score is too low (~ too pessimistic) since it is calculated using all reads instead + of just the fraction of mapped reads. + + + + GALvX_Histone + BAM + collection + + + + GALvX_Input + BAM + single + + + + GALvX_Index + BAI index + collection + Note that there is no separation of Histone and Input for the index files as one index file is required per BAM file irrespective of its type + + + GALvX_QcSummary + TXT + collection + The median fragment length is computed during the mapping in the GAL process and stored in the QcSummary file. This information is used in the CHP process. + + + + + blacklist_regions + BED + single + ENCODE blacklist extended by A. Richter (FB); see DCC/download/results/references/annotations + + + reference_genome + 2bit + single + The reference genome file; see DCC/download/results/references/genomes + + + control_regions + BED + single + Control regions obtained from A. 
Richter (FB) for quality control of ChIPseq samples; see DCC/download/results/references/annotations + + + + + DEEPID.PROC.DATE.bamcorr + SVG + single + joint deepTools graphics output for all histone marks plus input control + + + DEEPID.PROC.DATE.bamfgpr + SVG + single + joint deepTools graphics output for all histone marks plus input control + + + DEEPID.PROC.DATE.gcbias + SVG + collection + deepTools graphics output + + + DEEPID.PROC.DATE.gcfreq + tab-separated text file + collection + Required output file, currently not used for any downstream analysis + + + DEEPID.PROC.DATE.peaks + XLS table + collection + Standard MACS2 output XLS table for broad and narrow marks + + + DEEPID.PROC.DATE.broadPeak + broadPeak + collection + Standard MACS2 output in ENCODE's broadPeak format for broad marks, this file is usually used for subsequent analyses + + + DEEPID.PROC.DATE.gappedPeak + gappedPeak + collection + Standard MACS2 output in ENCODE's gappedPeak format for broad marks + + + DEEPID.PROC.DATE.summits + BED + collection + Standard MACS2 output for narrow marks + + + DEEPID.PROC.DATE.narrowPeak + narrowPeak + collection + Standard MACS2 output for narrow marks, this file is usually used for downstream analyses + + + DEEPID.PROC.DATE.bamcomp + bigwig + collection + Input-normalized histone signal tracks + + + DEEPID.PROC.DATE.bamcov + bigwig + collection + Sequencing-depth normalized signal coverage tracks + + + DEEPID.PROC.DATE.control + SVG + collection + A plot of a set of control regions for each histone mark (histone and input signal). Attention: this plot can only be used for a rough quality assessment (experiment fail or success), you cannot base any interpretation on this plot. 
+ + + + + samtools + 1.2 + + GALvX_Histone, GALvX_Input + Output is piped to next step + + + bedtools + 2.20.1 + + GALvX_Histone, GALvX_Input + Input is piped from previous step, output is piped to next step + + + samtools + 1.2 + + GALvX_Histone, GALvX_Input + Input is piped from previous step and output written to a temporary BAM file that is discarded after the analysis + + + bamCorrelate + 1.5.9.1 + + no looping + Window/bin size of 1kb since multiple narrow signals will be merged with default value (10kb), 1m samples + + + bamFingerprint + 1.5.9.1 + + no looping + + + + computeGCBias + 1.5.9.1 + + GALvX_Histone, GALvX_Input + + + + Picardtools + 1.130 + + GALvX_Histone, GALvX_Input + All reads marked as duplicates are removed for the peak calling. The output of this command is temporary and discarded after the peak calling. + + + MACS2 + 2.1.0.20140616 + + GALvX_Histone + parameter "--broad" for samples H3K4me1/H3K27me3/H3K36me/H3K9me3 + + + bamCompare + 1.5.9.1 + + GALvX_Histone + generates fold-change signal tracks; scaling_method: "readCount" for samples H3K27me3/H3K9me3, "SES" else + + + bamCoverage + 1.5.9.1 + + GALvX_Histone, GALvX_Input + report read coverage normalized to 1x sequencing depth + + + potty_plotty.py + 0.2 + + GALvX_Histone + the binsize is selected according to the default value for deepTools (i.e. 50 bp) + + + diff --git a/docs/quantification/chip-seq/CHPv4.xml b/docs/quantification/chip-seq/CHPv4.xml new file mode 100644 index 0000000..ac755d2 --- /dev/null +++ b/docs/quantification/chip-seq/CHPv4.xml @@ -0,0 +1,243 @@ + + + + CHP + 4 + + Andreas Richter, Peter Ebert + arichter@ie-freiburg.mpg.de, pebert@mpi-inf.mpg.de + + + Process CHPv4 is a minor update of the previous version that accounts for unresolved quality problems mainly in the mouse ChIP data. 
+ The mouse data show a higher duplication rate on average (though not as high as some of the human cell line samples), plus the heterochromatic marks + show problematic behavior in the correlation control plot (using Pearson; this process now computes the Pearson correlation on blacklist-filtered and duplicates-removed + BAM files). Currently, it is unclear if that is simply due to a sub-optimal blacklist for mouse or indeed indicative of strong outliers in the data. + Any downstream analysis should thoroughly check for outliers interfering with detected signal. This process creates an additional QC plot + using the Spearman correlation metric that is more robust w.r.t. outliers. Additionally, coverage tracks are also created from duplicate-free BAM files. + Otherwise, this process is identical to the previous version. + It is assumed that a QC summary file is always available that holds information on the fragment length in the respective library. + This process takes as data input aligned reads coming from the DCC/DKFZ and creates individual and comparative signal tracks as well as peak files for the different histone marks. + Note that before the correlation among all files is computed, a couple of known problematic regions are removed that usually show a spurious read distribution that would + subsequently lead to an inaccurate correlation among the files. The last step of this process plots the coverage of the histone signal (and, if available, of the input control) + in a few selected control regions (for details, contact Andreas Richter). Note that these plots are by no means suited to interpret the data or judge the genome-wide quality + of the entire dataset - the plot of the control regions just shows regions with expected high or low signal compared to the input; the scaling of the values is performed for + layout reasons and cannot be taken as a solid data normalization procedure. 
+ + Erratum: the metadata in the .amd.tsv file generated by this process are partially wrong. + Specifically, the FRiP score is too low (~ too pessimistic) since it is calculated using all reads instead + of just the fraction of mapped reads. + + + + GALvX_Histone + BAM + collection + + + + GALvX_Input + BAM + single + + + + GALvX_Index + BAI index + collection + Note that there is no separation of Histone and Input for the index files as one index file is required per BAM file irrespective of its type + + + GALvX_QcSummary + TXT + collection + The median fragment length is computed during the mapping in the GAL process and stored in the QcSummary file. This information is used in the CHP process. + + + + + blacklist_regions + BED + single + ENCODE blacklist extended by A. Richter (FB); see DCC/download/results/references/annotations + + + reference_genome + 2bit + single + The reference genome file; see DCC/download/results/references/genomes + + + control_regions + BED + single + Control regions obtained from A. 
Richter (FB) for quality control of ChIPseq samples; see DCC/download/results/references/annotations + + + + + DEEPID.PROC.DATE.bamcorr + SVG + collection + joint deepTools graphics output for all histone marks plus input control; one using Pearson and one using Spearman metric + + + DEEPID.PROC.DATE.bamfgpr + SVG + single + joint deepTools graphics output for all histone marks plus input control + + + DEEPID.PROC.DATE.gcbias + SVG + collection + deepTools graphics output + + + DEEPID.PROC.DATE.gcfreq + tab-separated text file + collection + Required output file, currently not used for any downstream analysis + + + DEEPID.PROC.DATE.peaks + XLS table + collection + Standard MACS2 output XLS table for broad and narrow marks + + + DEEPID.PROC.DATE.broadPeak + broadPeak + collection + Standard MACS2 output in ENCODE's broadPeak format for broad marks, this file is usually used for subsequent analyses + + + DEEPID.PROC.DATE.gappedPeak + gappedPeak + collection + Standard MACS2 output in ENCODE's gappedPeak format for broad marks + + + DEEPID.PROC.DATE.summits + BED + collection + Standard MACS2 output for narrow marks + + + DEEPID.PROC.DATE.narrowPeak + narrowPeak + collection + Standard MACS2 output for narrow marks, this file is usually used for downstream analyses + + + DEEPID.PROC.DATE.bamcomp + bigwig + collection + Input-normalized histone signal tracks + + + DEEPID.PROC.DATE.bamcov + bigwig + collection + Sequencing-depth normalized signal coverage tracks; created on raw BAM files and on duplicate-filtered BAM files + + + DEEPID.PROC.DATE.control + SVG + collection + A plot of a set of control regions for each histone mark (histone and input signal). Attention: this plot can only be used for a rough quality assessment (experiment fail or success), you cannot base any interpretation on this plot. 
+ + + + + samtools + 1.2 + DEEPID.tmp.nodup.bam ]]> + GALvX_Histone, GALvX_Input + All reads marked as duplicates are removed for peak calling and read coverage computation. The output of this command is temporary and discarded after the analysis. + + + MACS2 + 2.1.0.20140616 + + GALvX_Histone + parameter "--broad" for samples H3K4me1/H3K27me3/H3K36me/H3K9me3 + + + bamCoverage + 1.5.9.1 + + DEEPID.tmp.nodup.bam + report read coverage normalized to 1x sequencing depth based on duplicate-removed BAM files + + + samtools + 1.2 + + GALvX_Histone, GALvX_Input + Output is piped to next step + + + bedtools + 2.20.1 + + GALvX_Histone, GALvX_Input + Input is the duplicate-filtered BAM files, output is piped to next step + + + samtools + 1.2 + + GALvX_Histone, GALvX_Input + Input is piped from previous step and output written to a temporary BAM file that is discarded after the analysis + + + bamCorrelate + 1.5.9.1 + + no looping + Window/bin size of 1kb since multiple narrow signals will be merged with default value (10kb), 1m samples + + + bamCorrelate + 1.5.9.1 + + no looping + Window/bin size of 1kb since multiple narrow signals will be merged with default value (10kb), 1m samples + + + bamFingerprint + 1.5.9.1 + + no looping + Input is all raw BAM files + + + computeGCBias + 1.5.9.1 + + GALvX_Histone, GALvX_Input + Input is raw BAM file + + + bamCompare + 1.5.9.1 + + GALvX_Histone + generates log2 fold-change tracks of signal over input; scaling_method: "readCount" for samples H3K27me3/H3K9me3, "SES" else + + + bamCoverage + 1.5.9.1 + + GALvX_Histone, GALvX_Input + report read coverage normalized to 1x sequencing depth based on unfiltered BAM files + + + potty_plotty.py + 0.2 + + GALvX_Histone + the binsize 25 is selected in agreement with the value for deepTools + + + diff --git a/docs/quantification/dnase-seq/DHSv1.xml b/docs/quantification/dnase-seq/DHSv1.xml new file mode 100644 index 0000000..2e8648c --- /dev/null +++ b/docs/quantification/dnase-seq/DHSv1.xml @@ -0,0 
+1,129 @@ + + + + DHS + 1 + + Karl Nordström, Peter Ebert + karl.nordstroem@uni-saarland.de, pebert@mpi-inf.mpg.de + + + Process DHS is modelled using the CHP process as blueprint + + + + ALNvX_dnase.bam + + single + + + + ALNvX.bai + + single + Index files are renamed internally to .bam.bai since deepTools is expecting index naming like this + + + + + reference_genome + 2bit + single + The reference genome file; see DCC/download/results/references/genomes + + + plot_regions + BED + single + Control regions? + + + + + samplesID.PROCESS.DATE.corplot.cormethod + deepTools graphics PNG + single + + + + samplesID.PROCESS.DATE.fgprplot + deepTools graphics PNG + single + + + + sampleID.PROCESS.DATE.gcbplot + deepTools graphics PNG + collection + + + + sampleID.PROCESS.DATE.gcbfreq + tab-separated text file + collection + + + + sampleIDs.PROCESS.DATE.bamcomp.scalemethod + bigwig + collection + Always compare a signal vs the input + + + sampleID.PROCESS.DATE.bamcov.seqDepthNorm + bigwig + collection + + + + sampleID.PROCESS.DATE.ctrlreg + graphics PNG + collection + + + + + + region_filter.py + 0.1 + + + Script to generate a temporary BAM file with ENCODE blacklist regions excluded, only relevant for bamCorrelate tool. The filtered BAM file is discarded at the end of this process + + + bamFingerprint (deepTools) + 1.5.7-5-gcbab7b3 + + + + + + computeGCBias (deepTools) + 1.5.7-5-gcbab7b3 + + + + + + MACS2 + 2.0.10.20131216 (tag:beta) + + + parameter "--broad" for samples H3K4me1/H3K27me3/H3K36me/H3K9me3; default q-value cutoff of 0.05 is recommended by the author at least for broad marks and approved by A. 
Richter for all marks + + + bamCoverage (deepTools) + 1.5.7-5-gcbab7b3 + + + report read coverage normalized to 1x sequencing depth + + + signal_plotter.py + 0.1 + + + + + + \ No newline at end of file diff --git a/docs/quantification/dnase-seq/DHSv2.xml b/docs/quantification/dnase-seq/DHSv2.xml new file mode 100644 index 0000000..9aa5766 --- /dev/null +++ b/docs/quantification/dnase-seq/DHSv2.xml @@ -0,0 +1,114 @@ + + + + DHS + 2 + + Karl Nordström, Peter Ebert + karl.nordstroem@uni-saarland.de, pebert@mpi-inf.mpg.de + + + Process DHSv2 has been created to correct a couple of mistakes in the v1 process description and - more importantly - since new software versions have been installed on the DEEP cluster at DAC/MPI-Inf. + This process takes as input aligned reads coming from the DCC/DKFZ and creates a coverage signal track as well as peak files for a DNase BAM file. + The last step of this process plots the coverage of the histone signal (and, if available, of the input control) in a few selected control regions. + Note that these plots are by no means suited to interpret the data or judge the quality of the entire dataset - the plot of the control regions just shows regions with expected high or low signal compared to the input; + the scaling of the values is performed for layout reasons and independently for each region, i.e. plots of different regions cannot be compared directly. 
+ + + + ALNvX_dnase.bam + + single + + + + ALNvX.bai + + single + Index files are renamed internally to .bam.bai since deepTools is expecting index naming like this + + + + + reference_genome + 2bit + single + The reference genome file; see DCC/download/results/references/genomes + + + plot_regions + BED + single + Control regions + + + + + samplesID.PROCESS.DATE.fgprplot + deepTools graphics PNG + single + + + + sampleID.PROCESS.DATE.gcbplot + deepTools graphics PNG + single + + + + sampleID.PROCESS.DATE.gcbfreq + tab-separated text file + single + + + + sampleID.PROCESS.DATE.bamcov.seqDepthNorm + bigwig + single + + + + sampleID.PROCESS.DATE.ctrlreg + graphics PNG + single + + + + + + bamFingerprint (deepTools) + 1.5.8.1 + + no looping + + + + computeGCBias (deepTools) + 1.5.8.1 + + no looping + + + + MACS2 + macs2 2.1.0.20140616 + + no looping + we consider DNase to give broad enriched regions, thus parameter "--broad" by default; default q-value cutoff of 0.05 is recommended by MACS2 author for broad marks + + + bamCoverage (deepTools) + 1.5.8.1 + + no looping + report read coverage normalized to 1x sequencing depth + + + signal_plotter.py + 0.1 + + no looping + + + + \ No newline at end of file diff --git a/docs/quantification/dnase-seq/DHSv3.xml b/docs/quantification/dnase-seq/DHSv3.xml new file mode 100644 index 0000000..48937a1 --- /dev/null +++ b/docs/quantification/dnase-seq/DHSv3.xml @@ -0,0 +1,124 @@ + + + + DHS + 3 + + Karl Nordström, Peter Ebert + karl.nordstroem@uni-saarland.de, pebert@mpi-inf.mpg.de + + + Process DHSv3 has been created after software updates and to correct the command line call for the MACS peak caller. This process takes as input aligned reads coming from + the DCC/DKFZ and creates a coverage signal track as well as peak files for a DNase BAM file. The last step of this process plots the coverage of the DNase signal in a + few selected control regions (for details, contact Nina Gasparoni, UdS Walter). 
Note that this plot is by no means suited to interpret the data or judge the genome-wide + quality of the dataset - the plot of the control regions just shows regions with expected high or low signal. The scaling of the data values is done for layout reasons. + + + + GALvX_DNase + BAM + single + + + + GALvX_Index + BAI + single + + + + GALvX_QcSummary + TXT + single + This file contains QC information collected during the mapping. The median fragment length of the library is extracted from that file. + + + + + reference_genome + 2bit + single + The reference genome file; see DCC/download/results/references/genomes + + + control_regions + BED + single + Control regions, i.e. regions expected to be open or closed + + + + + DEEPID.PROC.DATE.gcbias + SVG + single + deepTools graphics output + + + DEEPID.PROC.DATE.gcfreq + tab-separated text file + single + Required output file, currently not used for any downstream analysis + + + DEEPID.PROC.DATE.peaks + XLS table + single + Standard MACS2 output XLS table for broad and narrow marks + + + DEEPID.PROC.DATE.broadPeak + broadPeak + single + Standard MACS2 output in ENCODE's broadPeak format for broad marks, this file is usually used for subsequent analyses + + + DEEPID.PROC.DATE.gappedPeak + gappedPeak + single + Standard MACS2 output in ENCODE's gappedPeak format for broad marks + + + DEEPID.PROC.DATE.bamcov + bigwig + single + Sequencing-depth normalized signal coverage tracks + + + DEEPID.PROC.DATE.control + SVG + single + Plot of signal value in a set of control regions + + + + + computeGCBias + 1.5.9.1 + + no looping + + + + MACS2 + 2.1.0.20140616 + + no looping + + + + bamCoverage + 1.5.9.1 + + no looping + report read coverage normalized to 1x sequencing depth + + + potty_plotty.py + 0.2 + + no looping + the binsize is selected according to the default value for deepTools (i.e. 50 bp) + + + \ No newline at end of file