diff --git a/docs/misc/THBv1.xml b/docs/misc/THBv1.xml
new file mode 100644
index 0000000..213c675
--- /dev/null
+++ b/docs/misc/THBv1.xml
@@ -0,0 +1,81 @@
+
+
+
+ THB
+ 1
+
+ Peter Ebert
+ pebert@mpi-inf.mpg.de
+
+
+ The trackhub_conv.py Python3 script adds the 'chr' prefix to the chromosome names and filters
+ for chromosomes 1-22 (human) / 1-19 (mouse) and X, Y to keep genomic coordinates compatible between assemblies.
+ Note that the script simply reads the folder contents and converts every file that appears,
+ based on its file name, to be the output of a DEEP process and to be a peak or bigwig file.
+ The converted files are written to the same folder.
+ Important: MACS2 outputs narrowPeak/broadPeak files that are not fully compliant with ENCODE standards:
+ the score column (index 5) has to be in the range 0-1000, so the conversion script rescales these values.
+ Please note that the peak name still refers to the original (unconverted) file.
+ Approximately 1 out of 10 files is chosen at random and checked for consistency by reversing the conversion
+ (except for the scaling of the score column in the case of peak files) and computing the MD5 checksum,
+ which is then compared to the MD5 checksum of the original file after filtering
+ for the appropriate chromosomes as explained above.
+
+
+
+ CHP_peaks
+ narrowPeak
+ collection
+ Standard output of MACS2 in ENCODE narrowPeak format
+
+
+ CHP_peaks
+ broadPeak
+ collection
+ Standard output of MACS2 in ENCODE broadPeak format
+
+
+ DEEP_bigwig
+ bigwig
+ collection
+ Any bigwig output of a standardized DEEP process
+
+
+
+
+ chrom_sizes
+ table
+ single
+ File holding information on chromosome sizes for a UCSC assembly (e.g. hg19, mm10)
+
+
+ field_names
+ AutoSQL
+ collection
+ Folder containing the AutoSQL files needed to convert narrowPeak and broadPeak files into bigBed
+
+
+
+
+ THB_peaks
+ bigbed
+ collection
+ Converted peak files
+
+
+ THB_bigwig
+ bigwig
+ collection
+ Converted bigwig files
+
+
+
+
+ trackhub_conv.py
+ 0.1
+
+ CHP_peaks, DEEP_bigwig
+ Simple Python3 script to handle the batch conversion of files
+
+
+
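Reviewer note on THBv1.xml: the description above names three per-line operations for peak files (chromosome filter, 'chr' prefix, score rescaling). The following is a minimal sketch of those operations, not the actual trackhub_conv.py code. The function names are hypothetical, and the linear min-max rescaling is an assumption, since the description only states that the score is rescaled into 0-1000.

```python
def keep_chromosomes(n_autosomes):
    """Chromosome names to keep: 1..n_autosomes plus X and Y (no 'chr' prefix yet)."""
    return {str(i) for i in range(1, n_autosomes + 1)} | {"X", "Y"}


def convert_peak_lines(lines, n_autosomes=22):
    """Filter, prefix and rescale tab-separated peak lines (hypothetical helper).

    Assumption: scores are rescaled linearly (min-max) into 0-1000.
    """
    keep = keep_chromosomes(n_autosomes)
    rows = [ln.rstrip("\n").split("\t") for ln in lines if ln.strip()]
    rows = [r for r in rows if r[0] in keep]
    if not rows:
        return []
    scores = [float(r[4]) for r in rows]  # 5th column holds the score
    lo, hi = min(scores), max(scores)
    span = hi - lo
    out = []
    for r, s in zip(rows, scores):
        # If all scores are identical there is nothing to scale; 1000 is an arbitrary choice.
        scaled = 1000 if span == 0 else round((s - lo) / span * 1000)
        out.append("\t".join(["chr" + r[0]] + r[1:4] + [str(scaled)] + r[5:]))
    return out
```

Calling `convert_peak_lines(lines, n_autosomes=19)` would apply the mouse chromosome set instead of the human one.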
diff --git a/docs/misc/THBv2.xml b/docs/misc/THBv2.xml
new file mode 100644
index 0000000..1770803
--- /dev/null
+++ b/docs/misc/THBv2.xml
@@ -0,0 +1,125 @@
+
+
+
+ THB
+ 2
+
+ Peter Ebert
+ pebert@mpi-inf.mpg.de
+
+
+ This process merely describes the conversion - not production - of DEEP data files into an IHEC compatible format.
+ If you have any questions about the actual data, please refer to the process XML files describing
+ the data production and contact the author named in the respective file. The trackhub conversion process
+ describes the conversion of standardized DEEP process output files into one of the BIG formats
+ needed to submit the data as an IHEC track hub. Since the reference assemblies used by IHEC are different
+ from the ones used by DEEP, the conversion consists of the following steps:
+ (i) filter data files for chromosomes 1-22 (hsa)/1-19 (mmu) and X,
+ (ii) add "chr" prefix to chromosome names and
+ (iii) for all BED or BED-like files, ensure that these represent a regular BED6+ file; in particular,
+ the "score" column is adjusted by default to be in the range 0-1000 (for details about the
+ formats used, please refer to https://genome.ucsc.edu/FAQ/FAQformat.html).
+ The adjustment works as follows:
+ select one meaningful column (e.g. coverage, signal enrichment or similar), bin the data according
+ to the gray shading scheme used by the UCSC genome browser (see link above) and then assign fixed score
+ values according to the binning.
+
+
+
+ DEEP_bigwig
+ bigWig
+ collection
+ bigWig output of a standardized DEEP process (libraries: histone, DNase, NOMe, WGBS; only raw/unfiltered signal tracks for histone and DNase libs)
+
+
+ DEEP_bed
+ BED or BED-like
+ collection
+ BED or BED-like output of a standardized DEEP process; comprises histone, DNase and NOMe peaks and expressed small/long RNAs
+
+
+
+
+ chrom_sizes
+ table
+ collection
+ Common files containing information about the chromosome sizes for the respective assemblies
+
+
+ field_names
+ AutoSQL
+ collection
+ AutoSQL files describing the different BED files: narrowPeak, broadPeak, gNOMePeak, snRNAexpr, longRNAexpr
+
+
+
+
+ DEEPID.PROC.DATE.bigBed
+ bigBed
+ collection
+ Converted BED or BED-like files
+
+
+ DEEPID.PROC.DATE.bigWig
+ bigWig
+ collection
+ Converted bigWig files
+
+
+
+
+ bigWigToBedGraph, egrep, sort, sed
+ 4, 2.12, 8.13, 4.2.1
+
+ temp_signal.bg ]]>
+
+ DEEP_bigwig
+ Filter all signal tracks and add prefix, make sure that output is sorted (should be by construction)
+
+
+ bedGraphToBigWig
+ 4
+
+
+
+ temp_signal.bg
+ Create final signal tracks
+
+
+ egrep, sort, sed
+ 2.12, 8.13, 4.2.1
+
+ temp_region.bed ]]>
+
+ DEEP_bed
+ Filter all uncompressed BED files and add prefix, make sure that output is sorted
+
+
+ gzip, egrep, sort, sed
+ 1.5, 2.12, 8.13, 4.2.1
+
+ temp_region.bed ]]>
+
+ DEEP_bed (gzipped)
+ Filter all gzipped BED files and add prefix, make sure that output is sorted
+
+
+ python3, numpy
+ 3.2.3, 1.6.2
+
+
+
+ temp_region.bed
+ Python3 function to adjust score column is implemented as part of the pipeline code and executed for all BED files by default
+
+
+ bedToBigBed
+ 2.6
+
+
+
+ temp_region.bed
+ Create final region files. n==1 for snRNA; n==3 for NOMe and broad peaks; n==4 for narrow peaks and long RNAs
+
+
+
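Reviewer note on THBv2.xml: the score adjustment in the description above (bin a meaningful column, then assign fixed scores per bin) can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's function. The nine gray-shading score ranges come from the UCSC format FAQ as far as I recall them (<=166, 167-277, 278-388, 389-499, 500-611, 612-722, 723-833, 834-944, >=945); the representative score per shade, the equal-width binning and the names `SHADE_SCORES` and `bin_scores` are assumptions.

```python
# One representative score inside each of the nine UCSC gray-shading ranges (assumed values).
SHADE_SCORES = (100, 222, 333, 444, 555, 666, 777, 888, 1000)


def bin_scores(values):
    """Map raw column values to fixed scores via equal-width bins (assumed binning)."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # No spread in the data: put everything into the darkest shade (arbitrary choice).
        return [SHADE_SCORES[-1]] * len(values)
    n_bins = len(SHADE_SCORES)
    width = (hi - lo) / n_bins
    scores = []
    for v in values:
        # The maximum value would land in bin n_bins, so clamp it to the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        scores.append(SHADE_SCORES[idx])
    return scores
```

Quantile-based bins would be an equally plausible reading of the description; the XML does not say which is used.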
diff --git a/docs/misc/THBv3.xml b/docs/misc/THBv3.xml
new file mode 100644
index 0000000..19ccf30
--- /dev/null
+++ b/docs/misc/THBv3.xml
@@ -0,0 +1,142 @@
+
+
+
+ THB
+ 3
+
+ Peter Ebert
+ pebert@mpi-inf.mpg.de
+
+
+ This process merely describes the conversion - not production - of DEEP data files into an IHEC compatible format.
+ If you have any questions about the actual data, please refer to the process XML files describing
+ the data production and contact the author named in the respective file. The trackhub conversion process
+ describes the conversion of standardized DEEP process output files into one of the BIG formats
+ needed to submit the data as an IHEC track hub. Since the reference assemblies used by IHEC are different
+ from the ones used by DEEP, the conversion consists of the following steps:
+ (i) filter data files for chromosomes 1-22 (hsa)/1-19 (mmu) and X,
+ (ii) add "chr" prefix to chromosome names and
+ (iii) for all BED or BED-like files, ensure that these represent a regular BED6+ file; in particular,
+ the "score" column is adjusted by default to be in the range 0-1000 (for details about the
+ formats used, please refer to https://genome.ucsc.edu/FAQ/FAQformat.html).
+ The adjustment works as follows:
+ select one meaningful column (e.g. coverage, signal enrichment or similar), bin the data according
+ to the gray shading scheme used by the UCSC genome browser (see link above) and then assign fixed score
+ values according to the binning.
+ Version 3 of the THB process also creates a mapping between filename and track property
+ (i.e. what the data in a file represent), as required by the updated IHEC trackhub specification (JSON format).
+
+
+
+ DEEP_signal
+ bigWig or bedGraph
+ collection
+ bigWig or bedGraph output of a standardized DEEP process (libraries: histone, DNase, NOMe, WGBS; only raw/unfiltered signal tracks for histone and DNase libs)
+
+
+ DEEP_bed
+ BED or BED-like
+ collection
+ BED or BED-like output of a standardized DEEP process; comprises histone, DNase and NOMe peaks and expressed small/long RNAs
+
+
+
+
+ chrom_sizes
+ table
+ collection
+ Common files containing information about the chromosome sizes for the respective assemblies
+
+
+ field_names
+ AutoSQL
+ collection
+ AutoSQL files describing the different BED files: narrowPeak, broadPeak, gNOMePeak, snRNAexpr, longRNAexpr
+
+
+
+
+ DEEPID.PROC.DATE.bigBed
+ bigBed
+ collection
+ Converted BED or BED-like files
+
+
+ DEEPID.PROC.DATE.bigWig
+ bigWig
+ collection
+ Converted bigWig files
+
+
+ DACID.PROC.DATE.prop.tsv
+ tab separated table
+ single
+ Track hub property mapping (filename to track property)
+
+
+
+
+ bigWigToBedGraph, egrep, sort, sed
+ 4, 2.12, 8.13, 4.2.1
+
+ temp_signal.bg ]]>
+
+ DEEP_bigwig
+ Filter all signal tracks and add prefix, make sure that output is sorted (should be by construction)
+
+
+ bedGraphToBigWig
+ 4
+
+
+
+ temp_signal.bg
+ Create final signal tracks
+
+
+ egrep, sort, sed
+ 2.12, 8.13, 4.2.1
+
+ temp_region.bed ]]>
+
+ DEEP_bed
+ Filter all uncompressed BED files and add prefix, make sure that output is sorted
+
+
+ gzip, egrep, sort, sed
+ 1.5, 2.12, 8.13, 4.2.1
+
+ temp_region.bed ]]>
+
+ DEEP_bed (gzipped)
+ Filter all gzipped BED files and add prefix, make sure that output is sorted
+
+
+ python3, numpy
+ 3.2.3, 1.6.2
+
+
+
+ temp_region.bed
+ Python3 function to adjust score column is implemented as part of the pipeline code and executed for all BED files by default
+
+
+ bedToBigBed
+ 2.6
+
+
+
+ temp_region.bed
+ Create final region files. N==1 for snRNA; N==3 for NOMe and broad peaks; N==4 for narrow peaks and long RNAs
+
+
+ python3
+ 3.2.3
+
+
+
+ no looping
+ Python3 function to write the track property mapping (filename to property) to a text file
+
+
+
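Reviewer note on THBv3.xml: the last step above writes the filename-to-property mapping to a tab-separated text file. A minimal sketch of such a writer is shown below. The function name, the header row and the two column names are assumptions, since the XML does not specify the layout of the .prop.tsv file.

```python
import csv
import io


def write_property_map(mapping, handle):
    """Write a filename -> track property mapping as tab-separated rows (assumed layout)."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["filename", "property"])  # hypothetical header row
    for name in sorted(mapping):  # sorted for reproducible output
        writer.writerow([name, mapping[name]])


# Usage with an in-memory buffer; a real run would pass an open file handle instead.
buffer = io.StringIO()
write_property_map({"b.bigWig": "signal", "a.bigBed": "peaks"}, buffer)
```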