diff --git a/docs/alignment/bisulfite/RBAv0.xml b/docs/alignment/bisulfite/RBAv0.xml
new file mode 100644
index 0000000..a0f53df
--- /dev/null
+++ b/docs/alignment/bisulfite/RBAv0.xml
@@ -0,0 +1,219 @@
+
+
+
+ RBA
+ 0
+
+ Karl Nordström, Charles Imbusch
+ karl.nordstroem@uni-saarland.de
+
+
+ The RBAv0 pipeline is a clone of the DEEP BAL process. It trims RRBS reads and aligns them to a reference genome.
+
+ 0. Generate the methylCtools reference index
+ 1. Trim reads with Trim Galore! (Cutadapt)
+ 2. Map reads with methylCtools (BWA)
+ 3. Merge BAM files with Picard tools
+ 4. Index the merged BAM file and generate the coverage, average coverage and flagstat files
+
+ Step 0 is run manually and only once.
+
+
+
+ sampleID_R1.fastq.gz
+ FASTQ
+ collection
+ The current implementation takes a folder as input and trims and maps all FASTQ files in that folder.
+
+
+
+
+ {ASSEMBLY}.fa
+ FASTA
+ single
+ FASTA file containing the genomic reference sequence
+
+
+
+
+ {OUTNAME}.bam
+ BAM
+ single
+ The resulting alignment in BAM format
+
+
+ {OUTNAME}.bam.bai
+ BAI
+ single
+ Index file for the alignment
+
+
+ {OUTNAME}.coverage.bw
+ bigWig
+ single
+ Coverage track in bigWig format.
+
+
+ {OUTNAME}.flagstat
+ TXT
+ single
+ The output from samtools flagstat
+
+
+ {OUTNAME}.rawCov
+ TXT
+ single
+ Contains a single value, the average genomic coverage
+
+
+
+
+ methylCtools
+ 0.9.2
+
+ no looping
+ Introduces C-to-T conversions on both strands of the reference. This step runs only if the converted reference file does not already exist.
+
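The following minimal Python sketch illustrates what converting "both strands" of the reference means. It assumes that one converted copy is produced per strand and shows only the base substitutions. The function name `convert_reference` is hypothetical. This document does not describe how methylCtools names, stores or indexes the converted copies.

```python
def convert_reference(seq: str) -> tuple[str, str]:
    """Return in-silico bisulfite-converted copies of a forward-strand sequence."""
    seq = seq.upper()
    # Forward (Watson) strand: every C is read as T after conversion.
    forward = seq.replace("C", "T")
    # Reverse (Crick) strand: its Cs pair with Gs on the forward strand, so a
    # C-to-T change on that strand shows up as G-to-A in forward coordinates.
    reverse = seq.replace("G", "A")
    return forward, reverse


print(convert_reference("ACGTCG"))  # ('ATGTTG', 'ACATCA')
```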
+
+ bwa
+ 0.7.12-r1039
+
+ no looping
+ Generates the BWA index files. This step runs only if the index does not already exist.
+
+
+ Trim Galore!
+ 0.3.3
+
+ sampleID_R1.fastq.gz
+ A one-to-one process that trims each FASTQ file. Reads are trimmed of the default adapter sequence (AGATCGGAAGAGC) and of low-quality bases (Phred score below 20).
+
+
+ methylCtools
+ 0.9.2
+
+ sampleID_R1.trimmed.fq.gz
+ A one-to-one process that prepares the trimmed files for mapping by converting every C to T and storing the converted positions in the read header.
+
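Below is a minimal sketch of the read conversion and of its later reversal (the back-conversion step that follows mapping). Both functions are hypothetical. They record 0-based C positions in a Python list, whereas methylCtools stores them in the read header in its own format, which is not shown here. The sketch also ignores one complication: reads aligned to the reverse strand are stored reverse-complemented in the BAM file, and a real back-conversion has to account for that.

```python
def convert_read(seq: str) -> tuple[str, list[int]]:
    """C-to-T convert a read and record the 0-based positions that changed."""
    positions = [i for i, base in enumerate(seq) if base == "C"]
    return seq.replace("C", "T"), positions


def restore_read(converted: str, positions: list[int]) -> str:
    """Undo the conversion by writing a C back at every recorded position."""
    bases = list(converted)
    for i in positions:
        bases[i] = "C"
    return "".join(bases)


conv, pos = convert_read("ACCTGC")
print(conv, pos)                # ATTTGT [1, 2, 5]
print(restore_read(conv, pos))  # ACCTGC
```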
+
+ bwa
+ 0.7.12-r1039
+ PIPE1.sai]]>
+ sampleID_R1.conv.fq
+ A one-to-one process that maps each file, again using a quality cutoff of 20. The output is piped to the next step.
+
+
+ bwa
+ 0.7.12-r1039
+ PIPE2.sam]]>
+ PIPE1.sai
+ A one-to-one process that converts each BWA alignment from SAI to SAM format
+
+
+ samtools
+ 1.2 (using htslib 1.2.1)
+ PIPE3.bam]]>
+ PIPE2.sam
+ A one-to-one process that converts the alignment to BAM format
+
+
+ methylCtools
+ 0.9.2
+
+ PIPE3.bam
+ A one-to-one process that converts the reads in the alignment files back to their original sequence, undoing the C-to-T conversion.
+
+
+ samtools
+ 1.2 (using htslib 1.2.1)
+ PIPE5.sam]]>
+ PIPE4.bam
+ A one-to-one process that converts back to SAM format in order to correct some peculiarities introduced by bwa
+
+
+ awk
+ 4.0.1
+ PIPE6.sam ]]>
+ PIPE5.sam
+ A one-to-one process that removes all mapping information from unmapped reads, because bwa sometimes adds such information to them.
+
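The awk command itself is not reproduced above. The sketch below is hedged accordingly. It assumes that "removing mapping information" means resetting RNAME, POS, MAPQ and CIGAR for reads whose FLAG has the unmapped bit (0x4) set, and it shows the per-line logic in Python. The function name is hypothetical. To filter a whole SAM stream, apply it to every line read from standard input.

```python
UNMAPPED = 0x4  # SAM FLAG bit 4: segment unmapped


def clear_unmapped(sam_line: str) -> str:
    """Reset alignment fields of an unmapped read; pass other lines through."""
    line = sam_line.rstrip("\n")
    if line.startswith("@"):
        return line  # header lines are left untouched
    fields = line.split("\t")
    if int(fields[1]) & UNMAPPED:
        # Assumed set of fields to clear; the real awk command is not shown.
        fields[2] = "*"  # RNAME
        fields[3] = "0"  # POS
        fields[4] = "0"  # MAPQ
        fields[5] = "*"  # CIGAR
    return "\t".join(fields)
```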
+
+ Picardtools
+ 1.115(30b1e546cc4dd80c918e151dbfe46b061e63f315_1402927010)
+
+ PIPE6.sam
+ A one-to-one process that assigns reads to read groups according to FLOWCELL and LANE, which are replaced with the corresponding values.
+
+
+ Picardtools
+ 1.115(30b1e546cc4dd80c918e151dbfe46b061e63f315_1402927010)
+
+ no looping
+ Merges all generated BAM files. If multiple FASTQ files were used as input, the argument I=sampleID_R1.bam has to be repeated once for each generated BAM file.
+
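The command above shows a single input, so a small helper that builds one I= argument per BAM file may make the repetition clearer. It uses Picard's legacy KEY=value syntax (I= for input, O= for output). The jar path and the helper name are placeholders, not the pipeline's actual values. The resulting list can be passed to `subprocess.run(..., check=True)`.

```python
def merge_command(bams: list[str], output: str,
                  picard_jar: str = "MergeSamFiles.jar") -> list[str]:
    """Build a Picard merge command with one I= argument per input BAM.

    picard_jar is a placeholder path, not the pipeline's real location.
    """
    if not bams:
        raise ValueError("at least one input BAM is required")
    cmd = ["java", "-jar", picard_jar]
    cmd += [f"I={bam}" for bam in bams]  # repeat I= for every BAM file
    cmd.append(f"O={output}")
    return cmd
```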
+
+ samtools
+ 1.2 (using htslib 1.2.1)
+
+ no looping
+ Generating the index file {OUTNAME}.bam.bai
+
+
+
+ samtools
+ 1.2 (using htslib 1.2.1)
+ PIPE7.txt]]>
+ no looping
+ Extracting the header of the bam file in order to get chromosome lengths for the generation of the coverage file
+
+
+ awk
+ 4.0.1
+ ref.lengths]]>
+ no looping
+ Extracting the chromosome lengths from the sam header
+
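The header-extraction and length-extraction steps amount to reading the LN tag of every @SQ line. This minimal sketch assumes the lengths file should consist of tab-separated `name<TAB>length` lines, the layout bedtools expects for a genome file. The function names are hypothetical.

```python
def chrom_lengths(header_lines):
    """Return (name, length) pairs from the @SQ lines of a SAM header."""
    lengths = []
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        # Every field after the record type is a TAG:VALUE pair.
        tags = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
        lengths.append((tags["SN"], int(tags["LN"])))
    return lengths


def format_lengths(lengths):
    """Render pairs as tab-separated 'name<TAB>length' lines."""
    return "".join(f"{name}\t{length}\n" for name, length in lengths)
```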
+
+ bedtools
+ v2.20.1
+ coverage.bw.tmp ]]>
+ no looping
+ Calculating base-pair-resolution coverage in bedGraph format
+
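bedGraph stores coverage as intervals rather than as one value per base. The exact bedtools invocation is not shown above. Assuming output comparable to `bedtools genomecov -bg` (zero-coverage stretches omitted, 0-based half-open coordinates), the run-length collapsing can be sketched as follows. The function name is hypothetical.

```python
def to_bedgraph(chrom, depths):
    """Collapse per-base depths (list index = 0-based position) into bedGraph records.

    Consecutive positions with equal depth become one half-open [start, end)
    interval; runs with depth 0 are skipped.
    """
    records = []
    start = 0
    for pos in range(1, len(depths) + 1):
        # Close the current run at the end of the list or when the depth changes.
        if pos == len(depths) or depths[pos] != depths[start]:
            if depths[start] > 0:
                records.append((chrom, start, pos, depths[start]))
            start = pos
    return records


print(to_bedgraph("chr1", [0, 0, 2, 2, 3, 0, 1]))
# [('chr1', 2, 4, 2), ('chr1', 4, 5, 3), ('chr1', 6, 7, 1)]
```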
+
+ bedGraphToBigWig
+ v 4
+
+ no looping
+ converting the bedgraph file to bigWig format
+
+
+ samtools
+ 1.2 (using htslib 1.2.1)
+ PIPE8.bam ]]>
+ no looping
+ Filtering out secondary (non-primary) alignments (flag 256) and reads marked as PCR or optical duplicates (flag 1024) before calculating average coverage
+
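SAM FLAG values are bit fields, so the two filters combine into a single mask. The sketch below shows the test applied to each read; the names are illustrative. In samtools, the equivalent is excluding reads with `-F 1280`, although the exact command used here is not shown.

```python
SECONDARY = 0x100  # 256: secondary (non-primary) alignment
DUPLICATE = 0x400  # 1024: PCR or optical duplicate
EXCLUDE = SECONDARY | DUPLICATE  # 1280


def keep_read(flag: int) -> bool:
    """True if neither excluded bit is set in the read's FLAG."""
    return (flag & EXCLUDE) == 0
```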
+
+ samtools
+ 1.2 (using htslib 1.2.1)
+ PIPE9.txt]]>
+ no looping
+ Converting to pileup format, limiting to regions with a coverage below 100000
+
+
+ awk
+ 4.0.1
+ {OUTNAME}.rawCov ]]>
+ no looping
+ Calculating the average coverage
+
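The awk command that computes the average is not shown. The description also does not say whether the sum of depths is divided by the number of pileup positions or by the genome length. The hypothetical function below supports both options. In samtools pileup output, column 4 is the read depth at each position.

```python
def average_coverage(pileup_lines, genome_length=None):
    """Mean depth from samtools pileup lines.

    Divides by genome_length when it is given, otherwise by the number of
    positions reported in the pileup (an assumption; the original awk
    command is not shown).
    """
    total = 0
    positions = 0
    for line in pileup_lines:
        fields = line.rstrip("\n").split("\t")
        total += int(fields[3])  # column 4: number of reads covering the position
        positions += 1
    denominator = genome_length if genome_length is not None else positions
    return total / denominator if denominator else 0.0
```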
+
+ samtools
+ 1.2 (using htslib 1.2.1)
+ {OUTNAME}.flagstat]]>
+ no looping
+ generating the flagstat file
+
+
+
diff --git a/docs/alignment/genome/GALv1.xml b/docs/alignment/genome/GALv1.xml
new file mode 100644
index 0000000..bef8a57
--- /dev/null
+++ b/docs/alignment/genome/GALv1.xml
@@ -0,0 +1,211 @@
+
+
+
+ GAL
+ 1
+
+ Barbara Hutter
+ b.hutter@dkfz.de
+
+
+
+ * mapping of raw sequences to the reference genome
+ - N pairs of fastq files that are processed into bam files separately and merged into one at the end
+
+
+
+
+ SampleID_R1
+ FASTQ
+ collection
+ Raw input file with the forward read of each pair ("read1"), pre-filtered to remove reads that failed the Illumina chastity filter
+
+
+ SampleID_R2
+ FASTQ
+ collection
+ Raw input file with the reverse read of each pair ("read2"), pre-filtered to remove reads that failed the Illumina chastity filter
+
+
+
+
+ reference_genome
+ FASTA
+ single
+ The reference genome file; see aspera.dkfz.de > download > results > references > genomes > human/mouse > WholeGenome
+
+
+
+
+
+ DEEPID.PROC.DATE.bam
+ BAM
+ single
+ The BAM file merged from all input FASTQ files, with duplicates marked
+
+
+ DEEPID.PROC.DATE.bai
+ BAI
+ single
+ Corresponding BAM index file, produced during merging and duplicate marking
+
+
+ DEEPID.PROC.DATE.flagstats
+ text
+ single
+ simple alignment statistics of the merged, duplicate marked bam
+
+
+ DEEPID.PROC.DATE.QcSummary
+ text
+ single
+ A summary of alignment statistics such as number of reads, percentage of aligned reads, genome coverage, duplication level, etc.
+
+
+ DEEPID.PROC.DATE.PicardMarkDupmetrics
+ text
+ single
+ Produced by Picard MarkDuplicates
+
+
+ DEEPID.PROC.DATE.PicardAlignmentSummarymetrics
+ text
+ single
+ produced by Picard CollectMultipleMetrics
+
+
+ DEEPID.PROC.DATE.PicardInsertSizemetrics
+ text
+ single
+ produced by Picard CollectMultipleMetrics
+
+
+ DEEPID.PROC.DATE.PicardQualityByCyclemetrics
+ text
+ single
+ produced by Picard CollectMultipleMetrics
+
+
+ DEEPID.PROC.DATE.PicardQualityDistributionmetrics
+ text
+ single
+ produced by Picard CollectMultipleMetrics
+
+
+ DEEPID.PROC.DATE.PicardInsertSizeHistogram
+ PDF
+ single
+ produced by Picard CollectMultipleMetrics
+
+
+ DEEPID.PROC.DATE.PicardQualityByCyclemetrics
+ PDF
+ single
+ produced by Picard CollectMultipleMetrics
+
+
+ DEEPID.PROC.DATE.PicardQualityDistributionmetrics
+ PDF
+ single
+ produced by Picard CollectMultipleMetrics
+
+
+
+
+ bwa
+ cnybwa-0.6.2
+ sampleID_R*.sai ]]>
+ SampleID_R*
+ Production of an intermediate .sai file for each read1 and read2 FASTQ file, performed on Convey machines. cnybwa-0.6.2 is a hardware re-implementation of bwa version 0.6.2. -t is the number of threads, and -q is the parameter for iterative quality trimming of the read down to a minimum length of 35 bp.
+
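The bwa manual describes -q trimming as cutting a read down to the length x that maximises the sum over i = x+1..l of (INT - q_i), where l is the original read length. The trimming is applied only if the last base has quality q_l < INT. The sketch below follows that formula and adds the 35 bp floor mentioned above. Whether the Convey re-implementation behaves identically is an assumption, and the function name is hypothetical.

```python
BWA_MIN_LEN = 35  # minimum length kept after trimming, per the description above


def bwa_trim_length(quals: list[int], q: int, min_len: int = BWA_MIN_LEN) -> int:
    """Return how many leading bases are kept after -q style trimming.

    quals are Phred scores (already decoded, not ASCII characters).
    """
    if q < 1 or not quals or quals[-1] >= q:
        return len(quals)  # the rule only applies when the last base is below q
    running = best = 0
    keep = len(quals)
    # Walk in from the 3' end, but never cut the read below min_len bases.
    for pos in range(len(quals) - 1, min_len - 1, -1):
        running += q - quals[pos]  # sum of (q - q_i) over bases pos..end
        if running > best:
            best, keep = running, pos
    return keep
```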
+
+ bwa
+ 0.6.2-tpx
+
+ sampleID_Sampe_output
+ ]]>
+
+ SampleID_R*
+
+ Pairing of reads to SAM format, piped to next step (samtools view).
+ Parameters: -a to set maximum insert size to 1000 bp, -t number of threads, -P pre-load index, -T use original buffer size.
+ The parameter readgroupinformation is initialized in the script as "@RG\tID:$ID\tSM:$SM\tLB:$LB\tPL:ILLUMINA",
+ where $ID is composed of run and lane (e.g. run140918_SN7001180_0145_C451VACXX_44_Mm08_WEAd_Db1_H3K9me3_F_1_ACAGTG_L001),
+ $SM the sample type (e.g. sample_replicate1-H3K9me3_44_Mm08_WEAd_Db1),
+ and $LB the library (e.g. replicate1-H3K9me3_44_Mm08_WEAd_Db1).
+ These variables are constructed according to the file path of the fastq files.
+
+
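Below is a minimal sketch of filling the read-group template. It assumes that ID, SM and LB have already been derived from the FASTQ path; that derivation is not shown. The `\t` separators are kept as a literal backslash followed by t, as in the template string in the description. The helper name is hypothetical.

```python
def read_group(rg_id: str, sample: str, library: str, platform: str = "ILLUMINA") -> str:
    """Fill the @RG template; separators stay as literal backslash-t."""
    return f"@RG\\tID:{rg_id}\\tSM:{sample}\\tLB:{library}\\tPL:{platform}"


print(read_group("L001", "s1", "lib1"))  # @RG\tID:L001\tSM:s1\tLB:lib1\tPL:ILLUMINA
```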
+
+ samtools
+ 0.1.19
+ bamfile ]]>
+ sampleID_Sampe_output
+ Input piped from previous step (bwa sampe), conversion of SAM to BAM and sorting by coordinate
+
+
+ Picard
+ 1.125
+
+
+
+ no looping
+
+ Merging of per-lane bam files, marking of duplicates and index creation {DEEPID.PROC.DATE.bai}.
+ The Picard command line receives one I=bamfile argument per BAM file; this is abbreviated to a single argument in the command line above.
+ Version 1.61 was used previously; version 1.125 has been used since January 2015.
+
+
+
+ samtools
+ 0.1.19
+ {DEEPID.PROC.DATE.flagstats} ]]>
+ no looping
+ simple alignment statistics of the merged, duplicate marked bam
+
+
+
+ Picard
+ 1.61
+
+
+
+ no looping
+
+ Creates several output files:
+ DEEPID.PROC.DATE.PicardAlignmentSummarymetrics
+ DEEPID.PROC.DATE.PicardInsertSizemetrics
+ DEEPID.PROC.DATE.PicardQualityByCyclemetrics
+ DEEPID.PROC.DATE.PicardQualityDistributionmetrics
+ DEEPID.PROC.DATE.PicardInsertSizeHistogram (PDF)
+ DEEPID.PROC.DATE.PicardQualityByCyclemetrics (PDF)
+ DEEPID.PROC.DATE.PicardQualityDistributionmetrics (PDF)
+
+
+
+ QCsummary
+ n/a
+
+ {DEEPID.PROC.DATE.QcSummary}
+ ]]>
+
+ no looping
+ A custom Perl script. Because it is part of the DKFZ whole-genome pipeline, it also reads in files that are not relevant for DEEP.
+
+
+