Alena/coTRaCTE

coTRaCTE

Predicting co-occurring transcription factors on cell-type specific accessible chromatin regions

Pipeline

  1. Derive the cell-type specific DNase hypersensitive sites (CTS-DHSs) and ubiquitous DNase hypersensitive sites (ubiq-DHSs) from DNase-seq experiments. The inputs are read counts on 200 bp windows excluding repeats. List every input file (full path) together with its cell type in data/all_files.csv. Output is written to output_directory/top_regions/ as BED files, one file per cell type.

Usage:

Rscript scripts/calculate.cts-dhs.R -c "count.directory" -t "data/all_files.csv" -w "data/ranges_hg19_200bp_masked_sorted.bed" -o "output.dir" -tpr 10000 -m 1

The CTS-DHSs and ubiq-DHSs derived for 90 ENCODE cell types (genome hg19) are stored in data/top_regions/.

  2. Get the FASTA files for all top_regions (one folder per cell type) using:

scripts/Get_fasta_tissues.sh

Before running Get_fasta_tissues.sh, install bedtools and download the corresponding genome.
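
For orientation, the step above can be sketched with a plain `bedtools getfasta` loop. This is not the contents of scripts/Get_fasta_tissues.sh. The function name `get_fasta_all`, the output layout (one folder per cell type, one `.fa` file inside it) and the genome file name are assumptions made for illustration. By default the function only prints the commands it would run; pass `run` as the fourth argument to execute them.

```shell
#!/usr/bin/env bash
# Sketch only: a hypothetical stand-in, not scripts/Get_fasta_tissues.sh.
# Assumes one BED file per cell type in BED_DIR and a single genome FASTA.

# Usage: get_fasta_all BED_DIR GENOME_FASTA OUT_DIR [run]
get_fasta_all() {
  local bed_dir=$1 genome=$2 out_dir=$3 mode=${4:-dry-run}
  local bed cell_type
  for bed in "$bed_dir"/*.bed; do
    [ -e "$bed" ] || continue              # glob matched nothing
    cell_type=$(basename "$bed" .bed)      # file name without .bed
    if [ "$mode" = run ]; then
      mkdir -p "$out_dir/$cell_type"
      # -fi: genome FASTA, -bed: regions, -fo: output FASTA
      bedtools getfasta -fi "$genome" -bed "$bed" \
        -fo "$out_dir/$cell_type/$cell_type.fa"
    else
      echo "bedtools getfasta -fi $genome -bed $bed -fo $out_dir/$cell_type/$cell_type.fa"
    fi
  done
}
```

A dry run such as `get_fasta_all data/top_regions hg19.fa data/top_regions/fasta` (where `hg19.fa` is a placeholder for the downloaded genome) lists one command per cell type, which makes it easy to check paths before anything is written.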

  3. Calculate binding affinities for PWMs of interest with TRAP using calculate.affinity.R:

Rscript scripts/calculate.affinity.R -m file.with.matrices -f format.matrices -s "data/top_regions/fasta" -t "data/cell_types.dat" -o "output.folder"

The pre-calculated affinities for TRANSFAC matrices are stored in: data/affinity/ (separate folder for each cell type and separate file for each PWM).

  4. Calculate the TF enrichment for all PWMs in a cell-type specific way and plot heatmaps of p-values and odds ratios for all matrices and all cell types:

Rscript scripts/calculate.tf.enrichment.R -l "data/list.of.matrices" -a "data/affinity" -t "data/cell_types_test.dat" -k 500 -n 5000 -o "results/enrichment" -p TRUE -d "results/plots"
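
As background for reading the odds-ratio heatmap, the snippet below shows how an odds ratio is obtained from a generic 2x2 table. How calculate.tf.enrichment.R builds its table and which test produces the p-values is not documented here, so the table layout, the helper name `odds_ratio` and the counts in the usage note are illustrative assumptions only.

```shell
#!/usr/bin/env bash
# Sketch only: generic odds ratio, not the statistic as implemented
# in scripts/calculate.tf.enrichment.R.
#
# Assumed 2x2 table:
#   A = foreground regions with a motif hit
#   B = foreground regions without a hit
#   C = background regions with a hit
#   D = background regions without a hit
# Odds ratio = (A * D) / (B * C); prints NA when B * C is zero.
odds_ratio() {
  awk -v a="$1" -v b="$2" -v c="$3" -v d="$4" 'BEGIN {
    if (b * c == 0) { print "NA"; exit }
    printf "%.3f\n", (a * d) / (b * c)
  }'
}
```

With made-up counts, `odds_ratio 60 440 200 4800` prints `3.273`. A value above 1 means hits are more frequent in the foreground set than in the background set.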

  5. Calculate the TF co-occurrence in a cell-type specific way for all possible pairs of TFs:

Rscript scripts/calculate.tf.pairs.R -l "data/list.of.matrices" -a "data/affinity" -t "data/cell_types_test.dat" -k 500 -n 5000 -o "results/tf.pairs"

Test scripts are in: tests/call_functions.sh
