<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="http://deep.mpi-inf.mpg.de/DAC/files/style/deep_process_style.css"?>
<process>
<name>GAL</name>
<version>1</version>
<author>
<name>Barbara Hutter</name>
<email>b.hutter@dkfz.de</email>
</author>
<!-- Following section: free text description of process (what, how, why) -->
<description>
* Mapping of raw sequences to the reference genome.
- N pairs of FASTQ files are each processed into a BAM file separately; the resulting BAM files are merged into one at the end.
</description>
<!-- Following section: list input files [samples to be analysed and similar] -->
<inputs>
<filetype>
<identifier>SampleID_R1</identifier>
<format>FASTQ</format>
<quantity>collection</quantity>
<comment>Raw input file with the forward read of the pair ("read1"); reads that failed the Illumina chastity filter have already been removed</comment>
</filetype>
<filetype>
<identifier>SampleID_R2</identifier>
<format>FASTQ</format>
<quantity>collection</quantity>
<comment>Raw input file with the reverse read of the pair ("read2"); reads that failed the Illumina chastity filter have already been removed</comment>
</filetype>
</inputs>
<references>
<filetype>
<identifier>reference_genome</identifier>
<format>FASTA</format>
<quantity>single</quantity>
<comment>The reference genome file; see aspera.dkfz.de > download > results > references > genomes > human/mouse > WholeGenome</comment>
</filetype>
</references>
<!-- Following section: list output files [results produced by the process] -->
<outputs>
<filetype>
<identifier>DEEPID.PROC.DATE.bam</identifier>
<format>BAM</format>
<quantity>single</quantity>
<comment>the bam file merged from all input fastq files, duplicates are marked</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.bai</identifier>
<format>BAI</format>
<quantity>single</quantity>
<comment>Corresponding BAM index file, produced during merging and duplicate marking</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.flagstats</identifier>
<format>text</format>
<quantity>single</quantity>
<comment>simple alignment statistics of the merged, duplicate marked bam</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.QcSummary</identifier>
<format>text</format>
<quantity>single</quantity>
<comment>A summary of alignment statistics such as number of reads, percent of aligned reads, coverage of the genome, duplication level, etc.</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.PicardMarkDupmetrics</identifier>
<format>text</format>
<quantity>single</quantity>
<comment>produced by Picard MarkDuplicates (METRICS_FILE)</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.PicardAlignmentSummarymetrics</identifier>
<format>text</format>
<quantity>single</quantity>
<comment>produced by Picard CollectMultipleMetrics</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.PicardInsertSizemetrics</identifier>
<format>text</format>
<quantity>single</quantity>
<comment>produced by Picard CollectMultipleMetrics</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.PicardQualityByCyclemetrics</identifier>
<format>text</format>
<quantity>single</quantity>
<comment>produced by Picard CollectMultipleMetrics</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.PicardQualityDistributionmetrics</identifier>
<format>text</format>
<quantity>single</quantity>
<comment>produced by Picard CollectMultipleMetrics</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.PicardInsertSizeHistogram</identifier>
<format>PDF</format>
<quantity>single</quantity>
<comment>produced by Picard CollectMultipleMetrics</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.PicardQualityByCyclemetrics</identifier>
<format>PDF</format>
<quantity>single</quantity>
<comment>produced by Picard CollectMultipleMetrics</comment>
</filetype>
<filetype>
<identifier>DEEPID.PROC.DATE.PicardQualityDistributionmetrics</identifier>
<format>PDF</format>
<quantity>single</quantity>
<comment>produced by Picard CollectMultipleMetrics</comment>
</filetype>
</outputs>
<software>
<tool>
<name>bwa</name>
<version>cnybwa-0.6.2</version>
<command_line><![CDATA[ cnybwa-0.6.2 aln -t 12 -q 20 {reference_genome} {SampleID_R*} > sampleID_R*.sai ]]></command_line>
<loop>SampleID_R*</loop>
<comment>Produces an intermediate .sai file for each read1 and read2 fastq file; performed on Convey machines. cnybwa-0.6.2 is a hardware re-implementation of bwa version 0.6.2. -t is the number of threads, -q the parameter for iterative quality trimming of the read down to 35 bp</comment>
</tool>
<tool>
<name>bwa</name>
<version>0.6.2-tpx</version>
<command_line>
<![CDATA[
bwa sampe -P -a 1000 -T -t 8 -r readgroupinformation {reference_genome}
sampleID_R1.sai sampleID_R2.sai {SampleID_R1} {SampleID_R2} > sampleID_Sampe_output
]]>
</command_line>
<loop>SampleID_R*</loop>
<comment>
Pairing of reads to SAM format, piped to next step (samtools view).
Parameters: -a to set maximum insert size to 1000 bp, -t number of threads, -P pre-load index, -T use original buffer size.
The parameter readgroupinformation is initialized in the script as "@RG\tID:$ID\tSM:$SM\tLB:$LB\tPL:ILLUMINA",
where $ID is composed of run and lane (e.g. run140918_SN7001180_0145_C451VACXX_44_Mm08_WEAd_Db1_H3K9me3_F_1_ACAGTG_L001),
$SM the sampletype (e.g. sample_replicate1-H3K9me3_44_Mm08_WEAd_Db1),
and $LB the library (e.g. replicate1-H3K9me3_44_Mm08_WEAd_Db1).
These variables are constructed according to the file path of the fastq files.
</comment>
</tool>
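<!--
Illustration only, not part of the pipeline: a minimal Python sketch of how the read group string described in the comment above could be assembled from its three variables. The function name build_read_group is hypothetical. How ID, SM and LB are derived from the fastq file path is not shown here, so the sketch takes them as ready-made arguments.

```python
def build_read_group(read_group_id, sample, library, platform="ILLUMINA"):
    """Return an @RG line in the form given in the comment above.

    The fields are separated by the two characters backslash and t
    (not a real tab), matching the string as it is written in the
    process description.
    """
    fields = [
        "@RG",
        "ID:" + read_group_id,
        "SM:" + sample,
        "LB:" + library,
        "PL:" + platform,
    ]
    return "\\t".join(fields)


if __name__ == "__main__":
    # Example values copied from the process description.
    print(build_read_group(
        "run140918_SN7001180_0145_C451VACXX_44_Mm08_WEAd_Db1_H3K9me3_F_1_ACAGTG_L001",
        "sample_replicate1-H3K9me3_44_Mm08_WEAd_Db1",
        "replicate1-H3K9me3_44_Mm08_WEAd_Db1",
    ))
```
-->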
<tool>
<name>samtools</name>
<version>0.1.19</version>
<command_line><![CDATA[ cat sampleID_Sampe_output | samtools view -uSbh - | samtools sort -o - > bamfile ]]></command_line>
<loop>sampleID_Sampe_output</loop>
<comment>Input piped from previous step (bwa sampe), conversion of SAM to BAM and sorting by coordinate</comment>
</tool>
<tool>
<name>Picard</name>
<version>1.125</version>
<command_line>
<![CDATA[
java8 -Xmx50G -jar picard-tools-1.125.jar MarkDuplicates I=bamfile*
OUTPUT={DEEPID.PROC.DATE.bam} VALIDATION_STRINGENCY=SILENT REMOVE_DUPLICATES=FALSE
ASSUME_SORTED=TRUE CREATE_INDEX=TRUE MAX_RECORDS_IN_RAM=12500000
METRICS_FILE={DEEPID.PROC.DATE.PicardMarkDupmetrics}
]]>
</command_line>
<loop>no looping</loop>
<comment>
Merging of per-lane bam files, marking of duplicates and index creation {DEEPID.PROC.DATE.bai}.
The Picard commandline gets I=bamfile for each bam file as input, which is simplified above in the command line.
Version 1.61 was used previously; version 1.125 has been used since January 2015.
</comment>
</tool>
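<!--
Illustration only, not part of the pipeline: a minimal Python sketch of how the simplified "I=bamfile*" in the MarkDuplicates command line above could expand to one I= argument per per-lane bam file. The function name build_markduplicates_args and the file names in the example are hypothetical; the remaining options are copied from the command line above.

```python
def build_markduplicates_args(bam_files, output_bam, metrics_file):
    """Return the MarkDuplicates argument list with one I= entry per input."""
    if not bam_files:
        raise ValueError("at least one input bam file is required")
    args = ["MarkDuplicates"]
    # One I=... argument for every per-lane bam file.
    args.extend("I=" + bam for bam in bam_files)
    args.extend([
        "OUTPUT=" + output_bam,
        "VALIDATION_STRINGENCY=SILENT",
        "REMOVE_DUPLICATES=FALSE",
        "ASSUME_SORTED=TRUE",
        "CREATE_INDEX=TRUE",
        "MAX_RECORDS_IN_RAM=12500000",
        "METRICS_FILE=" + metrics_file,
    ])
    return args


if __name__ == "__main__":
    # Hypothetical file names, used only to show the expansion.
    print(" ".join(build_markduplicates_args(
        ["lane1.bam", "lane2.bam"], "merged.bam", "merged.metrics")))
```
-->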
<tool>
<name>samtools</name>
<version>0.1.19</version>
<command_line><![CDATA[ samtools flagstat {DEEPID.PROC.DATE.bam} > {DEEPID.PROC.DATE.flagstats} ]]></command_line>
<loop>no looping</loop>
<comment>simple alignment statistics of the merged, duplicate marked bam</comment>
</tool>
<tool>
<name>Picard</name>
<version>1.61</version>
<command_line>
<![CDATA[
java -Xmx4G -cp picard-tools-1.61.jar -jar CollectMultipleMetrics.jar INPUT={DEEPID.PROC.DATE.bam}
REFERENCE_SEQUENCE={reference_genome} ASSUME_SORTED=true VALIDATION_STRINGENCY=SILENT
OUTPUT={DEEPID.PROC.DATE.Picard*} PROGRAM=CollectAlignmentSummaryMetrics
PROGRAM=CollectInsertSizeMetrics PROGRAM=QualityScoreDistribution PROGRAM=MeanQualityByCycle
]]>
</command_line>
<loop>no looping</loop>
<comment>
Creates several output files:
DEEPID.PROC.DATE.PicardAlignmentSummarymetrics (text)
DEEPID.PROC.DATE.PicardInsertSizemetrics (text)
DEEPID.PROC.DATE.PicardQualityByCyclemetrics (text)
DEEPID.PROC.DATE.PicardQualityDistributionmetrics (text)
DEEPID.PROC.DATE.PicardInsertSizeHistogram (PDF)
DEEPID.PROC.DATE.PicardQualityByCyclemetrics (PDF)
DEEPID.PROC.DATE.PicardQualityDistributionmetrics (PDF)
</tool>
<tool>
<name>QCsummary</name>
<version>n/a</version>
<command_line>
<![CDATA[
perl writeQCsummary.pl -c samplesID.coverage.txt -d samplesID.diffchrom.txt
-f {DEEPID.PROC.DATE.flagstats} -i samplesID.insertsize.txt
-m {DEEPID.PROC.DATE.PicardMarkDupmetrics} -l "genome" -r "all_merged"
-p sampleID -s sampletype > {DEEPID.PROC.DATE.QcSummary}
]]>
</command_line>
<loop>no looping</loop>
<comment>A custom Perl script. Because it is part of the DKFZ whole genome pipeline, it also reads in files that are not relevant for DEEP.</comment>
</tool>
</software>
</process>