![TOuCAN](./TOuCAN_Schriftzug.png "TOuCAN")

# TOuCAN: Targeted chrOmatin Capture ANalysis - A Nextflow Pipeline
TOuCAN is a Nextflow pipeline for analysing *Targeted Chromatin Capture* (T2C) and *High-throughput Chromatin Capture* (HiC) experiments.
The basic analysis steps are taken from the [original pipeline](https://www.nature.com/articles/nprot.2017.132) (Petros Kolovos et al.).
TOuCAN combines these steps into an easy-to-use pipeline and adds further features for the analysis.
## Features

T2C Analysis:
* MultiPlot
  * Interaction matrix
  * TAD Boundary score
  * Gene annotation
* bed file with all interactions
  * raw/normalized
* bed file with all interactions inside the target region
  * with and without uropa annotation
* easy to read viewpoint format for annotated interactions (see Results for explanation)
* restriction maps for different runs on the genome

HiC Analysis:
* Plots
  * Interaction matrix
* more will be implemented
### MultiPlot Example
![MultiPlot](./multiplot.png "MultiPlot Example")

Note: Green marks a value that is higher than the maximum of the scale.
## Installation and Command-line usage
### Dependencies
* Linux
* Nextflow version >= 0.30
* R version 3.4.4
  * ggplot2
  * plyr
  * gridExtra
  * gtable
  * RColorBrewer
  * getopt
  * ggbio
  * optparse
* Python version 2.7.8
  * pysam
  * getopt
* HiCExplorer version 2.1
* Conda:
  * bowtie2 version 2.3.3.1*
  * bwa version 0.7.15*
  * SAMtools version 1.3.1*
  * BEDtools version 2.27.1*
  * uropa version 2.0.2 alpha*

``* Installed automatically through the conda environment.``
### Installation
To install Nextflow, follow the instructions on the official Nextflow [website](https://www.nextflow.io/).
After installing all dependencies, download TOuCAN from the [TOuCAN GitHub Page](https://github.molgen.mpg.de/loosolab/TOuCAN).
Then add all required parameters to the configuration file.
For detailed information, see the wiki page on the [configuration file setup](https://github.molgen.mpg.de/loosolab/TOuCAN/wiki/How-to-setup-the-TOuCAN-configuration-file).
### Usage
To run the pipeline, use the following command line.
Parameters with default values are optional.
```
Usage: nextflow run TOuCAN.nf --in [Input Path] --out [Output Path] --mode [Mode] [options]

--mode help, h    - Show this help message

--mode plot       - Plot data [currently only T2C plots]
  parameters:
    --path_matrix [PATH]         - Path to directory with *.normalized.bed files.
    --chr [chr1,chr2,...,chrY]   - Chromosome on which the target region is located.
    --start [INT]                - Start of target region.
    --end [INT]                  - End of target region.
    --score_min [INT]            - Score range: minimum. [default: 0]
    --score_max [INT]            - Score range: maximum. [default: autoscale]
    --pn [STRING]                - Name of the project. [default: 'Project']

--mode T2C        - Full T2C analysis
  parameters:
    --in [PATH]                  - Path to directory with fastq / fastq.gz files.
    --bam [PATH]                 - Path to directory with bam files. [if given, --in [PATH] is ignored]
    --out [PATH]                 - Path to output directory.
    --safe_all_files [0|1]       - If 1, saves all temporary files into "OUTPUT/02_analysis/". [default: 0]
    --check_res_maps [0|1]       - If 1, prints the first 5 lines of every restriction map file. [default: 0]
    --chr [chr1,chr2,...,chrY]   - Chromosome on which the target region is located.
    --start [INT]                - Start of target region.
    --end [INT]                  - End of target region.
    --score_min [INT]            - Score range: minimum. [default: 0]
    --score_max [INT]            - Score range: maximum. [default: autoscale]
    --pn [STRING]                - Name of the project. [default: 'Project']
    --organsim [mm10,mm9,hg19]   - Type of the genome.

--mode uropa      - Uropa annotation [T2C]
  parameters:
    --in [PATH]                  - Path to directory with *.normalized.bed files.
    --out [PATH]                 - Path to output directory.
    --chr [chr1,chr2,...,chrY]   - Chromosome on which the target region is located.
    --start [INT]                - Start of target region.
    --end [INT]                  - End of target region.
    --pn [STRING]                - Name of the project. [default: 'Project']

--mode multiplot  - Create a plot with interaction map, TAD graph and gene annotation
  parameters:
    --in [PATH]                  - Path to directory with *.normalized.bed files.
    --out [PATH]                 - Path to output directory.
    --chr [chr1,chr2,...,chrY]   - Chromosome on which the target region is located.
    --start [INT]                - Start of target region.
    --end [INT]                  - End of target region.
    --score_min [INT]            - Score range: minimum. [default: 0]
    --score_max [INT]            - Score range: maximum. [default: autoscale]
    --pn [STRING]                - Name of the project. [default: 'Project']
    --organsim [mm10,mm9,hg19]   - Type of the genome.

--mode HiC        - Full HiC analysis
  parameters:
    --in [PATH]                  - Path to directory with fastq / fastq.gz files.
    --out [PATH]                 - Path to output directory.
    --aln [bwa|bowtie2]          - Choose alignment tool. [default: bwa]
    --bin [INT]                  - Bin size. [default: 10000]

Skip alignment -> BAM files as input:
  The BAM files need to come from this pipeline and have the following
  file extension: "[NAME].(normalized|matrix).bam" !

Skip creating restriction maps:
  After creating the restriction maps, write their path into the config file to skip
  creating the restriction maps again. [path_T2C_restriction_maps]
```
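
The usage text requires BAM inputs to follow the naming scheme `[NAME].(normalized|matrix).bam`. As a rough illustration, the sketch below checks file names against that scheme before a run. It assumes the scheme can be read as a regular expression (any non-empty name followed by `.normalized.bam` or `.matrix.bam`); the helper `is_pipeline_bam` and the file names are hypothetical and not part of TOuCAN.

```python
import re

# Assumed reading of "[NAME].(normalized|matrix).bam":
# any non-empty name, then ".normalized.bam" or ".matrix.bam".
BAM_NAME = re.compile(r"^.+\.(normalized|matrix)\.bam$")


def is_pipeline_bam(filename):
    """Return True if the file name matches the assumed naming scheme."""
    return BAM_NAME.match(filename) is not None


# Hypothetical file names, for illustration only.
candidates = [
    "sample1.normalized.bam",
    "sample1.matrix.bam",
    "sample1.sorted.bam",
    "sample1.bam",
]
accepted = [name for name in candidates if is_pipeline_bam(name)]
print(accepted)
```

Only the first two names pass this check; the others would need to be regenerated by the pipeline or renamed, depending on how they were produced.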
### Input
HiC and T2C experiments produce two fastq files for each sample: a forward and a reverse fastq file. Both files must share the same basename, followed by a 'sample extension' that distinguishes the two files.
For example:

    sample1_R1.fastq
    sample1_R2.fastq

You have to pass the extension as a parameter on the command line or set it in the configuration file. In this case, it would look like this:
```
params{
sample_extension = "_R[12]" // in the configuration file
}
```
```
nextflow run TOuCAN ... --sample_extension '_R[12]' # as command line parameter
```
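
To make the role of the sample extension concrete, the following sketch groups file names into samples by treating `_R[12]` as a regular expression placed directly before the `.fastq` / `.fastq.gz` suffix. How TOuCAN matches files internally is not documented here, so this is only an illustration under that assumption; the function `pair_fastq` and the `sample2` files are hypothetical.

```python
import re
from collections import defaultdict


def pair_fastq(filenames, sample_extension="_R[12]"):
    """Group FASTQ file names by the basename in front of the sample extension."""
    # Assumption: the extension is a regular expression that sits
    # directly before the .fastq or .fastq.gz suffix.
    pattern = re.compile(
        r"^(?P<base>.+)(?:" + sample_extension + r")\.fastq(?:\.gz)?$"
    )
    samples = defaultdict(list)
    for name in sorted(filenames):  # sorting puts R1 before R2
        match = pattern.match(name)
        if match:
            samples[match.group("base")].append(name)
    return dict(samples)


files = [
    "sample1_R2.fastq",
    "sample1_R1.fastq",
    "sample2_R1.fastq.gz",
    "sample2_R2.fastq.gz",
]
pairs = pair_fastq(files)
for base, members in pairs.items():
    print(base, members)
```

Files whose names do not contain the extension are skipped in this sketch, which is a quick way to spot inputs that would not be recognised as part of a pair.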
### Simple Example
The fastq files are stored in '/example/fastq/'. The target region is on chromosome 1 from base 123000000 to base 125000000.
The command for the T2C analysis would look like this:
```shell
nextflow run TOuCAN.nf --mode T2C --in /example/fastq/ --out /out/ --chr chr1 --start 123000000 --end 125000000
```
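
When several target regions or samples are analysed, it can be convenient to assemble this command in a script. The sketch below builds the same T2C command line as above from its parameters and prints it without running Nextflow. The helper `build_t2c_command` is hypothetical, and the check that `--start` is smaller than `--end` is an assumption added for illustration; whether the pipeline validates this itself is not stated here.

```python
import shlex


def build_t2c_command(in_dir, out_dir, chrom, start, end, script="TOuCAN.nf"):
    """Return the argument list for a T2C run, using flags from the usage text."""
    # Assumed sanity check, not taken from the pipeline.
    if start >= end:
        raise ValueError("--start must be smaller than --end")
    return [
        "nextflow", "run", script,
        "--mode", "T2C",
        "--in", in_dir,
        "--out", out_dir,
        "--chr", chrom,
        "--start", str(start),
        "--end", str(end),
    ]


command = build_t2c_command("/example/fastq/", "/out/", "chr1", 123000000, 125000000)
# shlex.join (Python 3.8+) quotes arguments where needed for a POSIX shell.
print(shlex.join(command))
```

The argument list can be passed to `subprocess.run(command, check=True)` once Nextflow and the configuration file are set up.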
## Results
For a detailed explanation of all results, see the [TOuCAN Results wiki page](https://github.molgen.mpg.de/loosolab/TOuCAN/wiki/TOuCAN-Results).