diff --git a/docs/quantification/transcriptome/EXPv1.xml b/docs/quantification/transcriptome/EXPv1.xml
new file mode 100644
index 0000000..a18b242
--- /dev/null
+++ b/docs/quantification/transcriptome/EXPv1.xml
@@ -0,0 +1,146 @@
+
+
+
+ EXP
+ 1
+
+ Matthias Barann
+ m.barann@ikmb.uni-kiel.de
+
+
+ * bam2wig.py: Conversion of BAM file to BigWig coverage tracks. One track per strand will be generated.
+ * htseq-count: Generates read counts on the gene level.
+ * cufflinks: Generates FPKM values for genes and transcript isoforms.
+
+
+
+ .bam
+
+ single
+ Unfiltered aligned reads
+
+
+ .bai
+
+ single
+ Index file for the BAM file
+
+
+
+
+ chromInfo.txt
+ text file
+ single
+ Tab delimited file containing the name and length of the reference sequences: [name][tab][length].
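A minimal Python 3 sketch of one way to derive chromInfo.txt from reference.fa. It assumes, as in the UCSC chrom.sizes convention, that the sequence name is the FASTA header up to the first whitespace. The function names are illustrative and not part of the pipeline.

```python
"""Sketch: build a chromInfo.txt ([name][tab][length]) from a multi-FASTA file.

Assumption: the sequence name is the header text up to the first whitespace.
"""
import io
from typing import Dict, Iterable, TextIO


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence name: sequence length}, preserving file order."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep the ID, drop the description
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)  # sequence may span several lines
    return lengths


def write_chrom_info(lengths: Dict[str, int], out: TextIO) -> None:
    """Write one tab-delimited [name][tab][length] line per sequence."""
    for name, length in lengths.items():
        out.write(f"{name}\t{length}\n")


if __name__ == "__main__":
    demo = io.StringIO(">1 dna:chromosome\nACGTN\nACG\n>MT\nACGTACGT\n")
    buf = io.StringIO()
    write_chrom_info(fasta_lengths(demo), buf)
    print(buf.getvalue(), end="")
```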
+
+
+ gencode.v19.annotation.gtf
+ GTF
+ single
+ Gencode gene annotation file in gene transfer format.
+
+
+ reference.fa
+ multi fasta
+ single
+ The reference genome file; see aspera.dkfz.de > download > results > references > genomes > human > WholeGenome
+
+
+
+
+ [sampleID].EXPv1.[DATE].bamcov.Forward.wig
+ wiggle
+ single
+ Forward strand wiggle file. Usually it is not necessary to keep this file.
+
+
+ [sampleID].EXPv1.[DATE].bamcov.Reverse.wig
+ wiggle
+ single
+ Reverse strand wiggle file. Usually it is not necessary to keep this file.
+
+
+ [sampleID].EXPv1.[DATE].bamcov.Forward.bw
+ BigWig
+ single
+ Forward strand BigWig file. This file will only be generated if the UCSC program wigToBigWig can be found in $PATH.
+
+
+ [sampleID].EXPv1.[DATE].bamcov.Reverse.bw
+ BigWig
+ single
+ Reverse strand BigWig file. This file will only be generated if the UCSC program wigToBigWig can be found in $PATH.
+
+
+ [sampleID].EXPv1.[DATE].readcounts.txt
+ text file
+ single
+ This file contains the read counts on the gene level.
+
+
+ [sampleID].EXPv1.[DATE].genes.fpkm.tracking
+ text file
+ single
+ Output file containing the FPKM values on the gene level.
+
+
+ [sampleID].EXPv1.[DATE].isoforms.fpkm.tracking
+ text file
+ single
+ Output file containing the FPKM values on the isoform level.
+
+
+ [sampleID].EXPv1.[DATE].transcripts.gtf
+ gene transfer format
+ single
+ This file contains the transcripts assembled by cufflinks.
+
+
+
+
+ Python
+ 2.7
+
+ no looping
+
+
+
+ Samtools
+ 0.1.19-44428cd
+
+ no looping
+
+
+
+ bam2wig.py
+ 2.3.9
+
+ no looping
+ The python script is part of the RSeQC software. It converts a BAM file into two wig files (one for each strand). \
+ If the UCSC program wigToBigWig can be located by the python script, the generated wig files are automatically converted to bigWig. \
+ Please note that for some samples the wigToBigWig command might exit with errors. In this case, manually invoking wigToBigWig \
+ on the generated wig files (arguments: input wig, chromosome sizes file, output bigWig) can solve the problem: \
+ wigToBigWig ${_sample}_Forward.wig chromInfo.txt ${_sample}_Forward.bw
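A small Python 3 sketch of the manual wigToBigWig workaround for both strands. It assumes the wig files are named ${_sample}_Forward.wig and ${_sample}_Reverse.wig, and relies on the UCSC usage line of wigToBigWig, which takes three positional arguments (in.wig chrom.sizes out.bw). The helper names are hypothetical.

```python
"""Sketch: re-run wigToBigWig by hand on both strand-specific wig files.

Assumptions: wig files are named <sample>_Forward.wig / <sample>_Reverse.wig;
wigToBigWig is called as: wigToBigWig in.wig chrom.sizes out.bw
"""
import shutil
import subprocess
from typing import List


def wig_to_bigwig_commands(sample: str, chrom_sizes: str = "chromInfo.txt") -> List[List[str]]:
    """Build one wigToBigWig argument list per strand (nothing is executed)."""
    return [
        ["wigToBigWig", f"{sample}_{strand}.wig", chrom_sizes, f"{sample}_{strand}.bw"]
        for strand in ("Forward", "Reverse")
    ]


def run_all(sample: str, chrom_sizes: str = "chromInfo.txt") -> None:
    """Run both conversions; raise if the tool is missing or exits non-zero."""
    if shutil.which("wigToBigWig") is None:
        raise FileNotFoundError("wigToBigWig is not on $PATH")
    for cmd in wig_to_bigwig_commands(sample, chrom_sizes):
        subprocess.run(cmd, check=True)
```

Building the argument lists separately from running them makes it easy to print or log the exact commands before executing anything.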
+
+
+ htseq-count
+ 0.5.4p3
+ ${_sample}.sam
+ htseq-count -s reverse -m intersection-strict -a 20 ${_sample}.sam gencode.v19.annotation.gtf > ${_sample}_htseq.txt ]]>
+
+ no looping
+ htseq-count, which produces the counts used by DESeq, expects paired-end alignments to be sorted by read name (step 1). After sorting, all non-primary alignments are removed during the BAM to SAM conversion. \
+ Invoking htseq-count counts the number of reads per gene. Using the mode 'intersection-strict' results in a rather conservative read count. \
+ Please see http://www-huber.embl.de/users/anders/HTSeq/doc/count.html#count for further information.
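The removal of non-primary alignments can be illustrated with a short Python 3 sketch. It assumes the BAM to SAM step uses `samtools view -F 256`, as in the LXP workflow (the full EXP command is not preserved here): SAM FLAG bit 0x100 (256) marks a secondary alignment, and every record carrying it is dropped.

```python
"""Sketch of the effect of `samtools view -F 256` on SAM records.

FLAG bit 0x100 (256) marks a secondary alignment; such records are dropped.
Header lines are skipped, as `samtools view` does without -h.
"""
from typing import Iterable, Iterator

SECONDARY = 0x100  # SAM FLAG bit 256


def drop_secondary(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield alignment records whose FLAG does not have the 0x100 bit set."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line, not an alignment
        flag = int(line.split("\t", 2)[1])  # FLAG is the second column
        if not flag & SECONDARY:
            yield line
```

Note that `-F 256` only tests bit 0x100; supplementary alignments (bit 0x800) would pass this filter.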
+
+
+ cufflinks
+ v2.0.2
+
+
+ no looping
+ Please see http://cufflinks.cbcb.umd.edu/manual.html for further information.
+
+
+
diff --git a/docs/quantification/transcriptome/LXPv1.xml b/docs/quantification/transcriptome/LXPv1.xml
new file mode 100644
index 0000000..5177b5f
--- /dev/null
+++ b/docs/quantification/transcriptome/LXPv1.xml
@@ -0,0 +1,135 @@
+
+
+ LXP
+ 1
+
+ Anupam Sinha
+ a.sinha@ikmb.uni-kiel.de
+
+
+
+ * htseq-count: Generates read counts on the gene level.
+ * cufflinks: Generates FPKM values for genes and transcript isoforms.
+ * StringTie: Generates FPKM values for genes and transcript isoforms. Also generates .ctab files for analysis using Ballgown.
+
+
+
+
+ .bam
+
+ single
+ Unfiltered aligned reads
+
+
+
+
+
+ gencode.v19.annotation.gtf
+ GTF
+ single
+ Gencode gene annotation file in gene transfer format.
+
+
+ reference.fa
+ multi fasta
+ single
+ The reference genome file; see aspera.dkfz.de > download > results > references > genomes > human > WholeGenome
+
+
+
+
+
+ [sampleID].LXPv1.[DATE].readcounts.txt
+ text file
+ single
+ This file contains the read counts on the gene level.
+
+
+ [sampleID].LXPv1.[DATE].genes.fpkm.tracking
+ text file
+ single
+ Output file containing the FPKM values on the gene level.
+
+
+ [sampleID].LXPv1.[DATE].isoforms.fpkm.tracking
+ text file
+ single
+ Output file containing the FPKM values on the isoform level.
+
+
+ [sampleID].LXPv1.[DATE].transcripts.gtf
+ gene transfer format
+ single
+ This file contains the transcripts assembled by cufflinks.
+
+
+ [sampleID].LXPv1.[DATE].stringtie.gtf
+ gene transfer format
+ single
+ This file contains the transcripts assembled by StringTie.
+
+
+ [sampleID].LXPv1.[DATE].ballgown
+ tab separated fields (.ctab) format
+ five
+ This is a folder containing five .ctab files (e_data, i_data, t_data, e2t and i2t). Three of them contain the expression values of exons, introns and transcripts; the other two (e2t, i2t) list the internal IDs used by Ballgown to associate exons and introns with transcripts.
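A minimal Python 3 sketch for checking that a StringTie `-b` output folder is complete before loading it in Ballgown. It assumes the five tables carry the standard Ballgown file names listed below; the helper names are hypothetical.

```python
"""Sketch: check that a StringTie -b folder holds the five Ballgown tables."""
from pathlib import Path
from typing import Iterable, List

EXPECTED_CTAB = (
    "e_data.ctab",  # exon-level expression
    "i_data.ctab",  # intron-level read counts
    "t_data.ctab",  # transcript-level expression (incl. FPKM)
    "e2t.ctab",     # exon id -> transcript id
    "i2t.ctab",     # intron id -> transcript id
)


def missing_ctab_files(present: Iterable[str]) -> List[str]:
    """Return the expected .ctab names absent from the given file names."""
    present_set = set(present)
    return [name for name in EXPECTED_CTAB if name not in present_set]


def check_ballgown_folder(folder: str) -> List[str]:
    """List the files in `folder` and report which .ctab tables are missing."""
    return missing_ctab_files(p.name for p in Path(folder).iterdir())
```

Separating the name comparison from the directory listing keeps the core check independent of the file system.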
+
+
+
+
+
+ Python
+ 2.7
+
+ no looping
+
+
+
+ Samtools
+ 0.1.19-44428cd
+
+ no looping
+
+
+
+ htseq-count
+ 0.6.1p1
+ samtools sort -n -@ 8 -m 4G ${_sample}.bam ${_sample}_sorted
+ samtools view -F 256 ${_sample}_sorted.bam > ${_sample}.sam
+ htseq-count -s reverse -m union -a 20 ${_sample}.sam gencode.v19.annotation.gtf > ${_sample}_htseq.txt
+
+ no looping
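+ htseq-count, which produces the counts used by DESeq2, expects paired-end alignments to be sorted by read name (step 1). After sorting, all non-primary alignments are removed during the BAM to SAM conversion. \
+ Invoking htseq-count counts the number of reads per gene, using the default counting mode 'union'. \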
+ Please see http://www-huber.embl.de/users/anders/HTSeq/doc/count.html#count for further information.
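The htseq-count output ends with special counters whose names start with "__" (for example __no_feature and __ambiguous). A minimal Python 3 sketch that separates them from the per-gene counts, assuming the tab-separated "<feature>\t<count>" layout written by htseq-count 0.6.x; the function name is hypothetical.

```python
"""Sketch: split htseq-count output into gene counts and special counters."""
from typing import Dict, Iterable, Tuple


def parse_htseq_counts(lines: Iterable[str]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (gene counts, special counters such as __no_feature)."""
    genes: Dict[str, int] = {}
    special: Dict[str, int] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        feature, count = line.split("\t")
        # special counters are prefixed with a double underscore
        target = special if feature.startswith("__") else genes
        target[feature] = int(count)
    return genes, special
```

A large __no_feature value relative to the summed gene counts is a quick hint that the strandedness setting (`-s reverse`) may not match the library.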
+
+
+
+ cufflinks
+ v2.0.2
+
+
+
+ no looping
+ Please see http://cufflinks.cbcb.umd.edu/manual.html for further information.
+
+
+ StringTie
+ v1.0.3
+
+
+
+ no looping
+ Please see http://ccb.jhu.edu/software/stringtie/ for further information. \
+ The "-b" option creates a folder containing the .ctab files for analysis with Ballgown. \
+ Please see https://github.com/alyssafrazee/ballgown for further information.
+
+
+
+
+
diff --git a/docs/quantification/transcriptome/SXPv1.xml b/docs/quantification/transcriptome/SXPv1.xml
new file mode 100644
index 0000000..589091e
--- /dev/null
+++ b/docs/quantification/transcriptome/SXPv1.xml
@@ -0,0 +1,207 @@
+
+
+
+ SXP
+ 1
+
+ Filippos Klironomos
+ filippos.klironomos@mdc-berlin.de
+
+
+ *) miRDeep2 pipeline involves:
+ *) mapping of reads to genome and keeping those uniquely mapped
+ *) extracting bracketing DNA of the uniquely mapped reads
+ *) RNAfold extracted sequences and keeping those that form unbifurcated hairpins
+ *) scoring putative precursors:
+ *) expect a greater number of reads mapping to either the 5p or the 3p arm and very few to the hairpin loop
+ *) short 3' duplex overhang characteristic of Drosha/Dicer processing adds to the score
+ *) relative and absolute stabilities contribute to the score
+ *) if 5' end of mature sequence is identical to that of known mature sequence it adds to the score
+ *) randomly permuting read signatures with putative precursor sequences in order to estimate the false positive rate (FPR)
+ Internally miRDeep2 uses the following packages:
+ RNAfold version 2.1.7
+ RANDFOLD version 2
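The "unbifurcated hairpin" filter in the list above can be illustrated with a simplified Python 3 sketch. This is not miRDeep2's own code: it assumes the input is only the dot-bracket string from RNAfold (without the trailing energy), and treats a structure as an unbifurcated hairpin when all opening brackets precede all closing brackets.

```python
"""Simplified illustration (not miRDeep2's implementation) of the
unbifurcated-hairpin check on an RNAfold dot-bracket string.

A single stem-loop has every '(' before the first ')'. A '(' appearing
after a ')' means a second stem, i.e. the structure branches.
"""


def is_unbifurcated_hairpin(dot_bracket: str) -> bool:
    brackets = [c for c in dot_bracket if c in "()"]
    if not brackets:
        return False  # no base pairs, so no hairpin
    first_close = brackets.index(")") if ")" in brackets else len(brackets)
    # no '(' after the first ')', and opening/closing brackets must balance
    return "(" not in brackets[first_close:] and brackets.count("(") == brackets.count(")")
```

Unpaired positions (dots) are ignored, so internal loops and bulges inside a single stem still count as one hairpin.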
+
+
+
+ config
+ TSV
+ single
+
+ This is the configuration file that miRDeep2 uses to locate the FASTQ library and assign a 3-character identifier to it.
+
+
+
+
+
+ genome
+ fasta
+ single
+
+ The hs37d5 and GRCm38mm10 genomes are modified as follows:
+ *) IDs are simplified: everything to the right of the first whitespace character is removed;
+ *) all ambiguously called nucleotides [URYSWKMBDHV] are masked as "N".
+ The following script does all this:
+ sed -e 's/^>\(\S\+\)\s.*$/>\1/' -e '/^[^>]/s/[UuRrYySsWwKkMmBbDdHhVv]/N/g' hs37d5.fa > hs37d5_simple.fa
+ sed -e 's/^>\(\S\+\)\s.*$/>\1/' -e '/^[^>]/s/[UuRrYySsWwKkMmBbDdHhVv]/N/g' GRCm38mm10.fa > GRCm38mm10_simple.fa
+ ]]>
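A Python 3 sketch that mirrors the two sed expressions, useful for checking their effect on a small FASTA excerpt before running them on a whole genome. It is an illustration under the assumption that lines are read with their trailing newline; the function name is hypothetical.

```python
"""Sketch: Python equivalent of the two sed expressions used to simplify
FASTA headers and mask ambiguity codes."""
import re
from typing import Iterable, Iterator

AMBIGUOUS = re.compile(r"[UuRrYySsWwKkMmBbDdHhVv]")
HEADER = re.compile(r">(\S+)\s")


def simplify_fasta(lines: Iterable[str]) -> Iterator[str]:
    for line in lines:
        if line.startswith(">"):
            # like s/^>\(\S\+\)\s.*$/>\1/ : keep the ID, drop the description
            m = HEADER.match(line)
            yield f">{m.group(1)}\n" if m else line
        else:
            # like /^[^>]/s/[Uu...Vv]/N/g : mask every ambiguity code as N
            yield AMBIGUOUS.sub("N", line)
```

Because the character class contains both cases, soft-masked (lowercase) ambiguity codes also become an uppercase "N", exactly as with the sed command.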
+
+
+
+ genome_index
+ bowtie-index
+ collection
+
+ bowtie version 0.12.7 index of hs37d5_simple.fa and GRCm38mm10_simple.fa generated as follows:
+ bowtie-build -f hs37d5_simple.fa hs37d5_simple.fa
+ bowtie-build -f GRCm38mm10_simple.fa GRCm38mm10_simple.fa
+
+
+
+ miRBase_mature
+ fasta
+ single
+ known mature miRNA reference sequences from miRBase release 20, uploaded to ASPERA
+
+
+ miRBase_hairpin
+ fasta
+ single
+ known precursor (hairpin) miRNA reference sequences from miRBase release 20, uploaded to ASPERA
+
+
+
+
+ SampleID.SXPv1.DATE.known.csv
+ csv
+ single
+
+ expression of known miRNAs quantified by miRDeep2
+
+
+
+ SampleID.SXPv1.DATE.known.bed
+ bed
+ single
+
+ BED track of expression of known miRNAs quantified by miRDeep2
+
+
+
+ SampleID.SXPv1.DATE.known.bedGraph
+ bedGraph
+ single
+
+ bedGraph track of expression of known miRNAs quantified by miRDeep2
+
+
+
+ SampleID.SXPv1.DATE.novel.bed
+ bed
+ single
+
+ BED track of expression of novel miRNAs predicted by miRDeep2
+
+
+
+ SampleID.SXPv1.DATE.novel.bedGraph
+ bedGraph
+ single
+
+ bedGraph track of expression of novel miRNAs predicted by miRDeep2
+
+
+
+
+
+ generate_config
+ missing
+
+ config ]]>
+
+ no looping
+
+ This command creates the configuration file that miRDeep2 uses to locate the FASTQ library {SampleID.fastq} and assign
+ a 3-character internal ID to it, in this case ID1.
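A minimal Python 3 sketch of the configuration file content, assuming (from the TSV format stated for the config input) one tab-separated line per library: the FASTQ path followed by its 3-character ID. The function name and the length check are illustrative assumptions.

```python
"""Sketch: write miRDeep2 config lines "<FASTQ path>\t<3-character ID>"."""
from typing import Iterable, Tuple


def make_config(libraries: Iterable[Tuple[str, str]]) -> str:
    """Return config text for (path, ID) pairs; IDs must be 3 characters."""
    lines = []
    for path, lib_id in libraries:
        if len(lib_id) != 3:
            raise ValueError(f"library ID {lib_id!r} must be 3 characters long")
        lines.append(f"{path}\t{lib_id}")
    return "\n".join(lines) + "\n"
```

For the single-library case described above, `make_config([("SampleID.fastq", "ID1")])` yields one line pairing the FASTQ file with ID1.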
+
+
+
+ mapper.pl
+ miRDeep2.0.0.6
+
+ mapper_summary.log ]]>
+
+ no looping
+
+ use the configuration file to locate the library; remove adaptor provided by {Adaptor};
+ collapse the reads to the file "read_collapsed.fa";
+ map to the reference and output the alignments in the file "reads_vs_genome.arf";
+ print out summary in "mapper_summary.log"
+
+ The ARF is a text-based format consisting of the following columns:
+
+ readID # the ID of the read
+ readLength # length of the read
+ start # start position of the alignment relative to the read
+ end # end position of the alignment relative to the read
+ readSeq # sequence of the read
+ chr # chromosome of reference where read maps
+ refLength # length of the reference segment the read is aligned to
+ start # start position of reference sequence where read maps to
+ end # end position of reference sequence where read maps to
+ referenceSeq # reference sequence where read maps to
+ strand # strand of reference
+ mm # number of mismatches in the alignment
+ MAPQ-like-string # edit string with one character per aligned position: m = match, M = mismatch
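A Python 3 sketch that parses one ARF line into named fields, following the 13 tab-separated columns listed above. The field names are our own, and the integer conversion of the numeric columns is an assumption about the column contents.

```python
"""Sketch: parse one miRDeep2 ARF line into named fields (13 columns)."""
from typing import NamedTuple


class ArfRecord(NamedTuple):
    read_id: str
    read_length: int
    read_start: int
    read_end: int
    read_seq: str
    chrom: str
    ref_length: int
    ref_start: int
    ref_end: int
    ref_seq: str
    strand: str
    mismatches: int
    edit_string: str  # 'm' = match, 'M' = mismatch


def parse_arf_line(line: str) -> ArfRecord:
    f = line.rstrip("\n").split("\t")
    if len(f) != 13:
        raise ValueError(f"expected 13 columns, got {len(f)}")
    return ArfRecord(f[0], int(f[1]), int(f[2]), int(f[3]), f[4], f[5],
                     int(f[6]), int(f[7]), int(f[8]), f[9], f[10],
                     int(f[11]), f[12])
```

A simple consistency check on parsed records is that `mismatches` equals the number of 'M' characters in `edit_string`.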
+
+
+
+ miRDeep2
+ miRDeep2.0.0.6
+
+ miRDeep2.report.log ]]>
+
+ no looping
+ quantify known miRNAs and predict putative novel miRNAs across samples
+
+
+ rename_according_to_metadata_standards
+ missing
+
+
+
+ no looping
+ rename output data files to conform to metadata naming standards
+
+
+ mirdeep2_csv2bed.pl
+ missing
+
+ "{SampleID}.SXPv1.{DATE}.novel.bed"
+ cat "novel_pres_DATE_t_TIME_score-50_to_na.bed" >> "{SampleID}.SXPv1.{DATE}.novel.bed"
+ ]]>
+
+ no looping
+
+ Generate BED tracks from the total precursor read counts of known and novel miRNAs and rename them according to metadata standards.
+ This tool has been uploaded to ASPERA.
+
+
+
+ bed_to_bedGraph
+ missing
+
+ FILENAME"Graph"; print $1,$2,$3,$5 >> FILENAME"Graph"} NR>3 {print $1,$2,$3,$5 >> FILENAME"Graph"}' "{SampleID}.SXPv1.{DATE}.known.bed"
+ gawk 'NR==1 {print "track type=bedGraph description=\"miRDeep2 novel miRNAs\" visibility=2 color=0,0,255 altColor=255,0,0" > FILENAME"Graph"; print $1,$2,$3,$5 >> FILENAME"Graph"} NR>1 {print $1,$2,$3,$5 >> FILENAME"Graph"}' "{SampleID}.SXPv1.{DATE}.novel.bed"
+ ]]>
+
+ no looping
+ convert BED tracks to bedGraph
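A Python 3 sketch equivalent to the novel-miRNA gawk one-liner: it writes the bedGraph track line, then keeps BED columns 1 to 3 (chrom, start, end) and column 5 (the BED score field) of each data line, joined by spaces as gawk's default output separator does. The `skip` parameter for leading non-data lines is a hypothetical addition.

```python
"""Sketch: convert miRDeep2 BED lines to bedGraph, mirroring the gawk command."""
from typing import Iterable, List

TRACK_LINE = ('track type=bedGraph description="miRDeep2 novel miRNAs" '
              'visibility=2 color=0,0,255 altColor=255,0,0')


def bed_to_bedgraph(bed_lines: Iterable[str], track_line: str = TRACK_LINE,
                    skip: int = 0) -> List[str]:
    """Return the track line followed by 'chrom start end score' lines."""
    out = [track_line]
    for i, line in enumerate(bed_lines):
        if i < skip or not line.strip():
            continue  # leading non-data lines or blank lines
        f = line.split()  # whitespace splitting, like gawk's default FS
        out.append(" ".join((f[0], f[1], f[2], f[4])))
    return out
```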
+
+
+
diff --git a/docs/quantification/transcriptome/SXPv2.xml b/docs/quantification/transcriptome/SXPv2.xml
new file mode 100644
index 0000000..63ad5f6
--- /dev/null
+++ b/docs/quantification/transcriptome/SXPv2.xml
@@ -0,0 +1,227 @@
+
+
+
+ SXP
+ 2
+
+ Filippos Klironomos
+ filippos.klironomos@mdc-berlin.de
+
+
+ *) miRDeep2 pipeline involves:
+
+ *) mapping of reads to genome and keeping those uniquely mapped
+ *) extracting bracketing DNA of the uniquely mapped reads
+ *) RNAfold extracted sequences and keeping those that form unbifurcated hairpins
+ *) scoring putative precursors:
+ *) expect a greater number of reads mapping to either the 5p or the 3p arm and very few to the hairpin loop
+ *) short 3' duplex overhang characteristic of Drosha/Dicer processing adds to the score
+ *) relative and absolute stabilities contribute to the score
+ *) if 5' end of mature sequence is identical to that of known mature sequence it adds to the score
+ *) randomly permuting read signatures with putative precursor sequences in order to estimate the false positive rate (FPR)
+
+ Internally miRDeep2 uses the following packages:
+
+ RNAfold version 2.1.7
+ RANDFOLD version 2
+
+
+
+ config
+ TSV
+ single
+
+ This is the configuration file that miRDeep2 uses to locate the FASTQ library and assign a 3-character identifier to it.
+
+
+
+
+
+
+ genome
+ fasta
+ single
+
+ sed -e 's/^>\(\S\+\)\s.*$/>\1/' -e '/^[^>]/s/[UuRrYySsWwKkMmBbDdHhVv]/N/g' hs37d5.fa > hs37d5_simple.fa
+ sed -e 's/^>\(\S\+\)\s.*$/>\1/' -e '/^[^>]/s/[UuRrYySsWwKkMmBbDdHhVv]/N/g' GRCm38mm10.fa > GRCm38mm10_simple.fa
+]]>
+
+
+
+ genome_index
+ bowtie-index
+ collection
+
+ bowtie version 1.1.1 index of hs37d5_simple.fa and GRCm38mm10_simple.fa generated as follows:
+
+ bowtie-build -f hs37d5_simple.fa hs37d5_simple.fa
+ bowtie-build -f GRCm38mm10_simple.fa GRCm38mm10_simple.fa
+
+
+
+ miRBase_mature
+ fasta
+ single
+ known mature miRNA reference sequences from miRBase release 20, uploaded to ASPERA
+
+
+ miRBase_hairpin
+ fasta
+ single
+ known precursor (hairpin) miRNA reference sequences from miRBase release 20, uploaded to ASPERA
+
+
+
+
+
+ SampleID.SXPv2.DATE.known.csv
+ csv
+ single
+
+ expression of known miRNAs quantified by miRDeep2
+
+
+
+ SampleID.SXPv2.DATE.known.bed
+ bed
+ single
+
+ BED track of expression of known miRNAs quantified by miRDeep2
+
+
+
+ SampleID.SXPv2.DATE.known.bedGraph
+ bedGraph
+ single
+
+ bedGraph track of expression of known miRNAs quantified by miRDeep2
+
+
+
+ SampleID.SXPv2.DATE.novel.bed
+ bed
+ single
+
+ BED track of expression of novel miRNAs predicted by miRDeep2
+
+
+
+ SampleID.SXPv2.DATE.novel.bedGraph
+ bedGraph
+ single
+
+ bedGraph track of expression of novel miRNAs predicted by miRDeep2
+
+
+
+
+
+
+
+ generate_config
+ missing
+
+ config
+ ]]>
+
+ no looping
+
+ This command creates the configuration file that miRDeep2 uses to locate the FASTQ library {SampleID.fastq} and assign
+ a 3-character internal ID to it, in this case ID1.
+
+
+
+ mapper.pl
+ miRDeep2.0.0.7
+
+ mapper_summary.log
+ ]]>
+
+ no looping
+
+ use the configuration file to locate the library; remove adaptor provided by {Adaptor};
+ collapse the reads to the file "read_collapsed.fa";
+ map to the reference and output the alignments in the file "reads_vs_genome.arf";
+ print out summary in "mapper_summary.log"
+
+ The ARF is a text-based format consisting of the following columns:
+
+ readID # the ID of the read
+ readLength # length of the read
+ start # start position of the alignment relative to the read
+ end # end position of the alignment relative to the read
+ readSeq # sequence of the read
+ chr # chromosome of reference where read maps
+ refLength # length of the reference segment the read is aligned to
+ start # start position of reference sequence where read maps to
+ end # end position of reference sequence where read maps to
+ referenceSeq # reference sequence where read maps to
+ strand # strand of reference
+ mm # number of mismatches in the alignment
+ MAPQ-like-string # edit string with one character per aligned position: m = match, M = mismatch
+
+
+
+ miRDeep2
+ miRDeep2.0.0.7
+
+ miRDeep2.report.log
+]]>
+
+ no looping
+ quantify known miRNAs and predict putative novel miRNAs across samples
+
+
+ rename_according_to_metadata_standards
+ missing
+
+
+
+ no looping
+ rename output data files to conform to metadata naming standards
+
+
+ mirdeep2_csv2bed.pl
+ missing
+
+ "{SampleID}.SXPv2.{DATE}.novel.bed"
+ cat "novel_pres_DATE_t_TIME_score-50_to_na.bed" >> "{SampleID}.SXPv2.{DATE}.novel.bed"
+]]>
+
+ no looping
+
+ Generate BED tracks from the total precursor read counts of known and novel miRNAs and rename them according to metadata standards.
+ This tool has been uploaded to ASPERA.
+
+
+
+ bed_to_bedGraph
+ missing
+
+ FILENAME"Graph"; print $1,$2,$3,$5 >> FILENAME"Graph"} NR>3 {print $1,$2,$3,$5 >> FILENAME"Graph"}' "{SampleID}.SXPv2.{DATE}.known.bed"
+ gawk 'NR==1 {print "track type=bedGraph description=\"miRDeep2 novel miRNAs\" visibility=2 color=0,0,255 altColor=255,0,0" > FILENAME"Graph"; print $1,$2,$3,$5 >> FILENAME"Graph"} NR>1 {print $1,$2,$3,$5 >> FILENAME"Graph"}' "{SampleID}.SXPv2.{DATE}.novel.bed"
+]]>
+
+ no looping
+ convert BED tracks to bedGraph
+
+
+