This is an easy-to-use pipeline for end-to-end pre-processing of HiChIP (or HiC) data using existing tools. It processes the raw data in the background so that you can focus on the biological question to be answered.
The pipeline uses
- HiC-Pro for pre-processing the raw data and generating raw and ICE-normalized contact matrices
- HiCPlotter for static visualization of contact matrices with/without additional annotations
- FitHiC for calling significant interactions
- Juicebox/Juicer for dynamic visualization of contact matrices with/without additional annotations
- HiCcompare for a differential comparison of two contact matrices
Output from the pipeline can be directly used for further downstream analyses.
- HiC-Pro (v2.9.0; https://github.com/nservant/HiC-Pro)
- Juicer (tested with v1.8.8 on Unix; https://github.com/theaidenlab/juicebox/wiki/Download)
- FitHiC R package (tested with v1.2.0; higher versions should work; https://doi.org/doi:10.18129/B9.bioc.FitHiC)
- HiCPlotter (v0.7.1; https://github.com/kcakdemir/HiCPlotter)
- HiCcompare (v1.1.3; https://doi.org/doi:10.18129/B9.bioc.HiCcompare)
Note that HiCPlotter requires Python 2.7.*, so this pipeline currently runs only under Python 2.7.*
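The Python 2.7.* requirement can be checked before starting a run. The snippet below is a minimal sketch of such a check and is not part of the pipeline; the function name `is_supported_python` is hypothetical.

```python
import sys


def is_supported_python(version_info=None):
    """Return True when the interpreter is a Python 2.7.x release.

    `is_supported_python` is a hypothetical helper, not a pipeline function.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only major and minor, so any 2.7.* patch release passes.
    return tuple(version_info[:2]) == (2, 7)


if __name__ == "__main__":
    if not is_supported_python():
        sys.stderr.write(
            "Warning: HiCPlotter requires Python 2.7.*; found %d.%d\n"
            % tuple(sys.version_info[:2])
        )
```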
The generated .hic files can be viewed using Juicebox (desktop or the web version).
More specifically,
- For installing HiC-Pro v2.9.0, follow the instructions at [https://github.com/nservant/HiC-Pro]. Once done, configure it as per the corresponding instructions (e.g., set the paths to the required software in HiC-Pro's config-hicpro.txt), and copy the configured config-hicpro.txt into this folder.
- For installing FitHiC (latest version) and HiCcompare, run the following in R:
## try http:// if https:// URLs are not supported
source("https://bioconductor.org/biocLite.R")
biocLite(c("FitHiC", "HiCcompare"))
$ python run_pipeline.py -h
usage: run_pipeline.py -c CONFIG_FILE -st STAGE [STAGE ...] [-ne] [-h]
Pipeline for processing HiChIP data from end-to-end.
Usage example: $ python run_pipeline.py -c umbrella-config-file.txt -st do_all
Required arguments:
-c CONFIG_FILE, --config_file CONFIG_FILE
Specify the umbrella config file.
-st STAGE [STAGE ...], --stage STAGE [STAGE ...]
Specify what to run. Possible values: [do_hicpro,
do_hicplotter, do_fithic, do_juicebox,
do_differential, suggest_resolution, do_all]. Non-
relevant details from the config file will be ignored.
When *do_all*, we perform sequentially do_hicpro,
do_fithic, and do_juicebox. For others, run the
pipeline individually.
Optional arguments:
-ne, --no_execute Specify this if you do not want to execute commands
right away; just print them in the commands logfile.
-h, --help Show this message and exit.
$ python run_pipeline.py -c umbrella-config-file.txt -st do_hicpro
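For readers who want to see how the options in the help text fit together, here is a minimal `argparse` sketch that mirrors them. It is an illustration based only on the help output, not the actual code of `run_pipeline.py`. The names `build_parser`, `expand_stages`, and `DO_ALL_STAGES` are hypothetical, and dropping repeated stages is an assumption of this sketch.

```python
import argparse

# Stage names as listed in the help text.
STAGES = [
    "do_hicpro", "do_hicplotter", "do_fithic", "do_juicebox",
    "do_differential", "suggest_resolution", "do_all",
]
# Per the help text, do_all runs these three stages in this order.
DO_ALL_STAGES = ["do_hicpro", "do_fithic", "do_juicebox"]


def build_parser():
    """Build a parser accepting the options shown in the help text."""
    parser = argparse.ArgumentParser(
        prog="run_pipeline.py",
        description="Pipeline for processing HiChIP data from end-to-end.",
    )
    parser.add_argument("-c", "--config_file", required=True,
                        help="Specify the umbrella config file.")
    parser.add_argument("-st", "--stage", nargs="+", required=True,
                        choices=STAGES, help="Specify what to run.")
    parser.add_argument("-ne", "--no_execute", action="store_true",
                        help="Only print commands in the commands logfile.")
    return parser


def expand_stages(stages):
    """Replace do_all with its constituent stages, keeping order.

    Skipping stages that already appear is an assumption of this sketch.
    """
    expanded = []
    for stage in stages:
        for name in (DO_ALL_STAGES if stage == "do_all" else [stage]):
            if name not in expanded:
                expanded.append(name)
    return expanded


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["-c", "umbrella-config-file.txt", "-st", "do_all"]
    )
    print(expand_stages(args.stage))
```

Passing `choices=STAGES` lets `argparse` reject unknown stage names before any tool is started.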
Further important points:
- The umbrella-config-file.txt is used to set all the necessary params for the pipeline.
- Use the umbrella-config-file.txt file for setting the various params required by the individual tools.
- Each pipeline run prints information messages on the console, and also logs them, together with all the commands, to a file. This file is named with the keyword 'pHDee' suffixed with a 4-digit number identifying the particular run. See this file for the precise commands that were run in case you have to debug something.
- The -ne option (ne: no_execute) lets you perform a dry run of the pipeline. In this mode, the pipeline forms and logs all commands and messages for a particular stage but doesn't execute them.
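The dry-run behaviour described above follows a common pattern: every command is logged, and it is executed only when the no-execute flag is off. The snippet below is a minimal sketch of that pattern under stated assumptions, not the pipeline's own implementation. The function name `run_command` and the logger name `"pipeline"` are hypothetical.

```python
import logging
import subprocess


def run_command(command, no_execute=False, logger=None):
    """Log a shell command, then run it unless no_execute is set.

    Returns the exit code, or None when the command was only logged.
    `run_command` is a hypothetical helper, not a pipeline function.
    """
    logger = logger or logging.getLogger("pipeline")
    # The command is always logged, so a dry run still records what would run.
    logger.info("COMMAND: %s", command)
    if no_execute:
        return None
    return subprocess.call(command, shell=True)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Dry run: the command is logged but not executed.
    run_command("echo HiC-Pro would start here", no_execute=True)
```

Returning `None` for a dry run keeps it distinguishable from a command that ran and exited with code 0.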
In case of any questions or feedback, please write to snikumbh@mpi-inf.mpg.de