
![TOuCAN](./TOuCAN_Schriftzug.png "TOuCAN")
# TOuCAN: Targeted chrOmatin Capture ANalysis - A Nextflow Pipeline


TOuCAN is a Nextflow pipeline for analysing *Targeted Chromatin Capture* (T2C) and *High-throughput Chromatin Capture* (HiC) experiments.
The basic analysis steps are taken from the [original pipeline](https://www.nature.com/articles/nprot.2017.132) (Petros Kolovos et al.).
TOuCAN combines these steps into an easy-to-use pipeline and adds further features for the analysis.

## Features
T2C Analysis:
* MultiPlot
  * Interaction matrix
  * TAD Boundary score
  * Gene annotation
* bed file with all interactions
  * raw/normalized
* bed file with all interactions inside the target region
  * with and without uropa annotation
* easy-to-read viewpoint format for annotated interactions (see Results for explanation)
* restriction maps for different runs on the genome

HiC Analysis:
* Plots
  * Interaction matrix
  * more will be implemented

### MultiPlot Example
![MultiPlot](./multiplot.png "MultiPlot Example")
Note: Green marks values that exceed the maximum of the scale.

## Installation and Command-line usage
### Dependencies
* Linux
* Nextflow version >= 0.30
* R version 3.4.4
  * ggplot2
  * plyr
  * gridExtra
  * gtable
  * RColorBrewer
  * getopt
  * ggbio
  * optparse
* Python version 2.7.8
  * pysam
  * getopt
* HiCExplorer version 2.1
* Conda:
  * bowtie2 version 2.3.3.1*
  * bwa version 0.7.15*
  * SAMtools version 1.3.1*
  * BEDtools version 2.27.1*
  * uropa version 2.0.2 alpha*

``* Will be installed automatically through the conda environment.``

### Installation
To install Nextflow, follow the instructions on the official Nextflow [website](https://www.nextflow.io/).
After installing all dependencies, download TOuCAN from the [TOuCAN GitHub Page](https://github.molgen.mpg.de/loosolab/TOuCAN).
Then add all required parameters to the configuration file.
For details, see the wiki page on the [configuration file setup](https://github.molgen.mpg.de/loosolab/TOuCAN/wiki/How-to-setup-the-TOuCAN-configuration-file).

### Usage
Parameters with default values are optional.
To run the pipeline, use the following command line:
```
Usage: nextflow run TOuCAN.nf --in [Input Path] --out [Output Path] --mode [Mode] [options]

--mode help, h                      - Show this help message

--mode plot                         - Plot data [currently only T2C plots]
    parameters:
        --path_matrix [PATH]        - Path to directory with *.normalized.bed files.
        --chr [chr1,chr2,...,chrY]  - Chromosome that contains the target region.
        --start [INT]               - Start of target region.
        --end [INT]                 - End of target region.
        --score_min [INT]           - Score range: minimum. [default: 0]
        --score_max [INT]           - Score range: maximum. [default: autoscale]
        --pn [STRING]               - Name of the project. [default: 'Project']

--mode T2C                          - Full T2C analysis
    parameters:
        --in [PATH]                 - Path to directory with fastq / fastq.gz files.
        --bam [PATH]                - Path to directory with bam files. [if given, --in [PATH] is ignored]
        --out [PATH]                - Path to output directory.
        --safe_all_files [0|1]      - If 1, saves all temporary files into "OUTPUT/02_analysis/". [default: 0]
        --check_res_maps [0|1]      - If 1, prints the first 5 lines of every restriction map file. [default: 0]
        --chr [chr1,chr2,...,chrY]  - Chromosome that contains the target region.
        --start [INT]               - Start of target region.
        --end [INT]                 - End of target region.
        --score_min [INT]           - Score range: minimum. [default: 0]
        --score_max [INT]           - Score range: maximum. [default: autoscale]
        --pn [STRING]               - Name of the project. [default: 'Project']
        --organsim [mm10,mm9,hg19]  - Genome assembly.

--mode uropa                        - Uropa annotation [T2C]
    parameters:
        --in [PATH]                 - Path to directory with *.normalized.bed files.
        --out [PATH]                - Path to output directory.
        --chr [chr1,chr2,...,chrY]  - Chromosome that contains the target region.
        --start [INT]               - Start of target region.
        --end [INT]                 - End of target region.
        --pn [STRING]               - Name of the project. [default: 'Project']

--mode multiplot                    - Create a plot with interaction map, TAD graph and gene annotation
    parameters:
        --in [PATH]                 - Path to directory with *.normalized.bed files.
        --out [PATH]                - Path to output directory.
        --chr [chr1,chr2,...,chrY]  - Chromosome that contains the target region.
        --start [INT]               - Start of target region.
        --end [INT]                 - End of target region.
        --score_min [INT]           - Score range: minimum. [default: 0]
        --score_max [INT]           - Score range: maximum. [default: autoscale]
        --pn [STRING]               - Name of the project. [default: 'Project']
        --organsim [mm10,mm9,hg19]  - Genome assembly.

--mode HiC                          - Full HiC analysis
    parameters:
        --in [PATH]                 - Path to directory with fastq / fastq.gz files.
        --out [PATH]                - Path to output directory.
        --aln [bwa|bowtie2]         - Alignment tool. [default: bwa]
        --bin [INT]                 - Bin size. [default: 10000]

Skip alignment (BAM files as input):
The BAM files must come from this pipeline and have the following
file extension: "[NAME].(normalized|matrix).bam"

Skip creating restriction maps:
After creating the restriction maps, write their path into the config file
[path_T2C_restriction_maps] to skip creating them again.
```
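The BAM naming requirement from the help text can be checked before a run. The following is a minimal sketch, not part of TOuCAN: the helper name `is_pipeline_bam` is hypothetical, and the regular expression is only a direct transcription of the documented pattern `[NAME].(normalized|matrix).bam`.

```python
import re

# Transcription of the documented pattern "[NAME].(normalized|matrix).bam".
# Assumption: [NAME] may be any non-empty string.
BAM_NAME = re.compile(r"^.+\.(normalized|matrix)\.bam$")


def is_pipeline_bam(filename):
    """Return True if the file name ends in .normalized.bam or .matrix.bam."""
    return BAM_NAME.match(filename) is not None


for name in ["sample1.normalized.bam", "sample1.matrix.bam", "sample1.sorted.bam"]:
    print(name, is_pipeline_bam(name))
```

Files that fail this check would presumably need to be regenerated by the pipeline before they can be passed with `--bam`.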
### Input
HiC and T2C experiments produce two fastq files for each sample: a forward and a reverse read file. Both files must share the same basename, followed by a 'sample extension' that tells them apart.
For example:
* `sample1_R1.fastq`
* `sample1_R2.fastq`

Pass the extension as a parameter on the command line or in the configuration file. In this case, it would look like this:
```
params{
	sample_extension = "_R[12]" // in the configuration file
}
```
```
nextflow run TOuCAN ... --sample_extension '_R[12]' # as command-line parameter
```
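To illustrate what the sample extension does, the sketch below groups file names by the basename that precedes the extension. It is an illustration only: how TOuCAN matches the files internally is not described here, and the function `pair_fastq_files` is a hypothetical helper that treats the extension as a regular expression.

```python
import re
from collections import defaultdict


def pair_fastq_files(filenames, sample_extension=r"_R[12]"):
    """Group fastq file names by the basename in front of the sample extension."""
    # Matches "<basename><extension>.fastq" and "<basename><extension>.fastq.gz".
    pattern = re.compile(
        r"^(?P<base>.+)(?P<ext>" + sample_extension + r")\.fastq(\.gz)?$"
    )
    groups = defaultdict(list)
    for name in filenames:
        match = pattern.match(name)
        if match:
            groups[match.group("base")].append(name)
    # Keep only basenames with exactly two matching files.
    return {base: sorted(files) for base, files in groups.items() if len(files) == 2}


pairs = pair_fastq_files(["sample1_R1.fastq", "sample1_R2.fastq", "sample2_R1.fastq"])
print(pairs)  # {'sample1': ['sample1_R1.fastq', 'sample1_R2.fastq']}
```

Here `sample2` is dropped because only one of its two files is present, which is the kind of problem worth catching before starting a run.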
### Simple Example
The fastq files are stored in '/example/fastq/'. The target region is on chromosome 1 from base 123000000 to base 125000000.
The command for the T2C analysis would look like this:
```shell
nextflow run TOuCAN.nf --mode T2C --in /example/fastq/ --out /out/ --chr chr1 --start 123000000 --end 125000000
```
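When the same analysis is launched for several regions, the command can be assembled programmatically. The sketch below is a hypothetical helper, not part of TOuCAN: `build_t2c_command` uses only the flags documented in the usage text, and the check that `--start` is smaller than `--end` is an assumption about what a valid target region looks like.

```python
import shlex


def build_t2c_command(in_dir, out_dir, chrom, start, end, script="TOuCAN.nf"):
    """Return the argument list for a T2C run using the documented flags."""
    # Assumption: a valid target region has start < end.
    if start >= end:
        raise ValueError("--start must be smaller than --end")
    return [
        "nextflow", "run", script,
        "--mode", "T2C",
        "--in", in_dir,
        "--out", out_dir,
        "--chr", chrom,
        "--start", str(start),
        "--end", str(end),
    ]


cmd = build_t2c_command("/example/fastq/", "/out/", "chr1", 123000000, 125000000)
# Quote each argument so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting list can also be handed to `subprocess.run(cmd, check=True)` to start the pipeline from a script.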

## Results
For a detailed explanation of all results, see the [TOuCAN Results wiki page](https://github.molgen.mpg.de/loosolab/TOuCAN/wiki/TOuCAN-Results).