Predicting co-occurring transcription factors on cell-type specific accessible chromatin regions

Alena/coTRaCTE

coTRaCTE

Predicting co-occurring transcription factors on cell-type specific accessible chromatin regions

Pipeline

  1. Derive the cell-type specific DNase hypersensitive sites (CTS-DHSs) and ubiquitous DNase hypersensitive sites (ubiq-DHSs) from DNase-seq experiments. The input files contain read counts on 200bp windows without repeats. List the name of each input file, including its path, together with the corresponding cell type in data/all_files.csv. Output is written to output_directory/top_regions/ as bed files, one file per cell type.

Usage:

Rscript scripts/calculate.cts-dhs.R -c "count.directory" -t "data/all_files.csv" -w "data/ranges_hg19_200bp_masked_sorted.bed" -o "output.dir" -tpr 10000 -m 1

The CTS-DHSs and ubiq-DHSs derived for 90 ENCODE cell types (genome assembly hg19) are stored in data/top_regions/.

  2. Get the FASTA files for all top_regions (each cell type in a separate folder) using:

scripts/Get_fasta_tissues.sh

Before using Get_fasta_tissues.sh, install bedtools and download the corresponding genome.
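
As an illustration of what this step involves, the loop below calls `bedtools getfasta` once per BED file. It is a sketch under stated assumptions and not the contents of Get_fasta_tissues.sh: the function name `regions_to_fasta`, the `DRY_RUN` switch, the genome path and the one-folder-per-cell-type output layout are all hypothetical. Only the `bedtools getfasta -fi ... -bed ... -fo ...` call itself is standard bedtools usage.

```shell
#!/usr/bin/env bash
# Sketch only: extract FASTA sequences for every BED file in a directory.
# regions_to_fasta, DRY_RUN and the output layout are illustrative assumptions.

regions_to_fasta() {
    local genome="$1"    # genome FASTA, e.g. a downloaded hg19.fa (assumed path)
    local bed_dir="$2"   # directory with one BED file per cell type
    local out_dir="$3"   # one sub-folder per cell type is created here
    local bed cell

    for bed in "$bed_dir"/*.bed; do
        [ -e "$bed" ] || continue            # the glob matched no BED files
        cell="$(basename "$bed" .bed)"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Print the command instead of running it
            echo "bedtools getfasta -fi $genome -bed $bed -fo $out_dir/$cell/$cell.fa"
        else
            mkdir -p "$out_dir/$cell"
            bedtools getfasta -fi "$genome" -bed "$bed" -fo "$out_dir/$cell/$cell.fa"
        fi
    done
}
```

Calling `DRY_RUN=1 regions_to_fasta hg19.fa data/top_regions data/top_regions/fasta` prints the commands without running bedtools, which is a cheap way to check paths before a long run.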

  3. Calculate binding affinities for PWMs of interest with TRAP using calculate.affinity.R:

Rscript scripts/calculate.affinity.R -m file.with.matrices -f format.matrices -s "data/top_regions/fasta" -t "data/cell_types.dat" -o "output.folder"

The pre-calculated affinities for TRANSFAC matrices are stored in: data/affinity/ (separate folder for each cell type and separate file for each PWM).

  4. Calculate the TF enrichment for all PWMs in a cell-type specific way and plot heatmaps of the p-values and odds ratios for all matrices and all cell types:

Rscript scripts/calculate.tf.enrichment.R -l "data/list.of.matrices" -a "data/affinity" -t "data/cell_types_test.dat" -k 500 -n 5000 -o "results/enrichment" -p TRUE -d "results/plots"
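
For readers unfamiliar with the odds ratios plotted in this step, the helper below computes the generic odds ratio of a 2x2 table. This README does not state how calculate.tf.enrichment.R defines a hit, which regions form the background, or what `-k` and `-n` mean, so the function name `odds_ratio`, the table layout and the example counts are assumptions for illustration only.

```shell
# Sketch only: odds ratio for a 2x2 table of region counts.
#   a = foreground regions with a hit     b = foreground regions without a hit
#   c = background regions with a hit     d = background regions without a hit
# This table layout is an assumption, not taken from calculate.tf.enrichment.R.
odds_ratio() {
    awk -v a="$1" -v b="$2" -v c="$3" -v d="$4" 'BEGIN {
        if (b * c == 0) { print "NA"; exit }   # denominator is zero
        printf "%.3f\n", (a * d) / (b * c)
    }'
}
```

With invented counts, `odds_ratio 30 470 100 4900` prints `3.128`: a hit is about three times as likely, in odds terms, in the foreground as in the background.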

  5. Calculate the TF co-occurrence in a cell-type specific way for all possible pairs of TFs:

Rscript scripts/calculate.tf.pairs.R -l "data/list.of.matrices" -a "data/affinity" -t "data/cell_types_test.dat" -k 500 -n 5000 -o "results/tf.pairs"
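
To gauge how much work "all possible pairs" implies, the sketch below enumerates every unordered pair of IDs from a list file; n matrices give n(n-1)/2 such pairs. It assumes that the list contains one matrix ID per line and that pairs are unordered. Neither is confirmed by this README, and `all_pairs` is a hypothetical helper, not part of the repository.

```shell
# Sketch only: print every unordered pair of IDs from a one-ID-per-line file.
# The one-ID-per-line layout is an assumption about data/list.of.matrices.
all_pairs() {
    awk 'NF { id[++n] = $1 }                  # keep the first field of non-empty lines
         END {
             for (i = 1; i < n; i++)
                 for (j = i + 1; j <= n; j++)
                     print id[i] "\t" id[j]   # tab-separated pair
         }' "$1"
}
```

For example, `all_pairs data/list.of.matrices | wc -l` would report the number of pairs to be tested per cell type under these assumptions.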

Example calls for testing the scripts are in: tests/call_functions.sh
