lncRNA_networks

Identifying lncRNA-mediated regulatory modules via ChIA-PET network analysis.

This repository contains scripts and programs to construct cell-specific chromatin interaction networks from ChIA-PET experiments. Running /GIT_lncRNA_networks/Code/workflow_lncRNA-network.sh builds and analyses the network for the cell line K562. To analyse your own cell line or use a different annotation, change the marked lines in the workflow file. At a minimum you will probably need to set the project_PATH variable in line 7.

All subscripts are called from /GIT_lncRNA_networks/Code/workflow_lncRNA-network.sh; the code for the different subtasks is organized into corresponding folders. Any data that is generated gets stored under /GIT_lncRNA_networks/Data/ in a folder named after your cell line. The workflow is as follows:

  • get data and preprocess (select reliable PETs, get expression cutoff for RNA-seq)
  • annotate the interaction regions as protein coding genes, lncRNA or chromHMM segments
  • generate and analyze the chromosome-wise interaction networks
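The annotation step can be pictured as an interval-overlap lookup. The following is a minimal Python sketch of that idea, not the repository's own code. The function names, the tuple layout, the half-open coordinates and the rule that gene labels take priority over chromHMM labels are all assumptions made for illustration; only the option values "gene_lnc" and "gene" come from the workflow.

```python
from typing import Iterable, Tuple

# Hypothetical feature layout: (chromosome, start, end, kind), half-open [start, end).
Feature = Tuple[str, int, int, str]


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """Return True if the half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def annotate_region(chrom: str, start: int, end: int,
                    features: Iterable[Feature],
                    gene_lnc_choice: str = "gene_lnc") -> str:
    """Label one interaction region by the kinds of features it overlaps."""
    kinds = {kind for (f_chrom, f_start, f_end, kind) in features
             if f_chrom == chrom and overlaps(start, end, f_start, f_end)}
    if "gene" in kinds and "lncRNA" in kinds:
        # Region overlaps both a protein-coding gene and a lncRNA.
        return "gene_lnc" if gene_lnc_choice == "gene_lnc" else "gene"
    if "gene" in kinds:
        return "gene"
    if "lncRNA" in kinds:
        return "lncRNA"
    # Assumption: fall back to a chromHMM segment label only if no gene overlaps.
    remaining = sorted(kinds)
    return remaining[0] if remaining else "unannotated"


features = [
    ("chr1", 100, 200, "gene"),
    ("chr1", 150, 300, "lncRNA"),
    ("chr1", 400, 500, "enhancer"),
]
print(annotate_region("chr1", 160, 180, features))          # overlaps gene and lncRNA
print(annotate_region("chr1", 160, 180, features, "gene"))  # same region, "gene" mode
```

A linear scan over all features is enough for illustration; for genome-scale data an interval tree or a dedicated tool would be the more practical choice.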

Variables you can adapt in the workflow:

  • line 7: project_PATH -> your directory
  • line 19: cell -> your cell line
  • line 40: method -> method for choosing the expression cutoff for genes used for annotation ("lnc0", "lncg" or "75")
  • line 57: gene_lnc_choice -> choose how to annotate regions overlapping protein coding genes and lncRNAs ("gene_lnc" or "gene")
    More explanations below.

Data:

  1. Cell-specific data: Cell-specific data for generating the interaction networks is downloaded into /GIT_lncRNA_networks/Data/K562. If you don't want to use K562, you need to write your own workflow_this_cell.sh file in /GIT_lncRNA_networks/Data/$cell. This file downloads the following data:
  • ChIA-PET for building the interaction network
  • RNA-seq for finding an expression cutoff for genes
  • chromHMM segmentation file for annotation with regulatory elements
    Your chromHMM file is probably annotated differently, so you have to adapt the categories in /GIT_lncRNA_networks/Code/get_Data/3_prepare_chrom_anno.sh. chromHMM files can be created using the software from http://compbio.mit.edu/ChromHMM/.
  2. General annotation data: Files for positional conservation are already stored under /GIT_lncRNA_networks/Data/posCons_anno/; all other files listed below are downloaded by running the workflow script and the subscripts it calls.
  • human gff files for gene annotation
  • disease annotation
  • positionally conserved protein-coding gene - lncRNA pairs

Annotation:

  • We only want to annotate interaction regions with their overlapping genes if these genes are actually transcribed. The cutoff for protein-coding genes is defined as the minimum between the two modes of the RNA-seq expression distribution. For lncRNAs there is only one mode, so there are different methods of choosing an expression cutoff. The default is "lnc0", which means that we use all lncRNAs that are present in two independent RNA-seq experiments. With "lncg" we use the protein-coding gene cutoff for lncRNAs as well. When "75" is chosen, we select the genes above the 75% quantile, computed separately for protein-coding genes and lncRNAs.
  • If you want to annotate interaction regions which overlap protein-coding genes and lncRNAs only as genes, change gene_lnc_choice in line 57 in /GIT_lncRNA_networks/Code/workflow_lncRNA-network.sh to "gene".
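The cutoff strategies described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the repository's implementation: the function names are hypothetical, the input is assumed to be log-transformed expression values, the histogram is not smoothed (so noisy data may produce spurious peaks), the 75% quantile uses the "inclusive" method of `statistics.quantiles`, and "present" for "lnc0" is interpreted as expression above zero in both experiments.

```python
import statistics
from typing import Dict, List, Sequence, Set


def bimodal_cutoff(log_expr: Sequence[float], n_bins: int = 20) -> float:
    """Centre of the emptiest histogram bin between the two tallest peaks."""
    lo, hi = min(log_expr), max(log_expr)
    if hi == lo:
        raise ValueError("all expression values are identical")
    width = (hi - lo) / n_bins
    counts: List[int] = [0] * n_bins
    for value in log_expr:
        # The maximum value falls on the right edge, so clamp it into the last bin.
        counts[min(int((value - lo) / width), n_bins - 1)] += 1
    # A peak is strictly higher than its left neighbour and not lower than its right one.
    peaks = [i for i in range(n_bins)
             if (i == 0 or counts[i] > counts[i - 1])
             and (i == n_bins - 1 or counts[i] >= counts[i + 1])]
    if len(peaks) < 2:
        raise ValueError("distribution does not show two modes at this bin size")
    left, right = sorted(sorted(peaks, key=lambda i: counts[i], reverse=True)[:2])
    valley = min(range(left + 1, right), key=lambda i: counts[i])
    return lo + (valley + 0.5) * width


def quantile_cutoff(expr: Sequence[float]) -> float:
    """75% quantile of the expression values (the "75" option)."""
    return statistics.quantiles(expr, n=4, method="inclusive")[2]


def present_in_both(rep1: Dict[str, float], rep2: Dict[str, float]) -> Set[str]:
    """IDs with expression above zero in both experiments (one reading of "lnc0")."""
    return {gene for gene, value in rep1.items()
            if value > 0 and rep2.get(gene, 0) > 0}


values = [1] * 10 + [2] * 3 + [3] * 1 + [4] * 4 + [5] * 12
print(bimodal_cutoff(values, n_bins=5))
print(quantile_cutoff([1, 2, 3, 4, 5]))
```

With "lncg", the value returned by `bimodal_cutoff` for protein-coding genes would simply be reused for lncRNAs.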

Networks:

  • Networks are created and analysed mainly with MATLAB. There are two output formats: a .mat file containing the adjacency matrix, node ids, annotation and features, and csv files for nodes and edges which can be loaded into Cytoscape, Gephi or another visualization software of your choice. In /GIT_lncRNA_networks/Code/networks/make_adjacency_matrices.m you can choose whether to create only the whole chromosome network or also its biggest connected component (default is both), and how to score edges when two nodes are reported to be interacting by several different PETs. When collapsing these multiple edges, the default is to use the maximum of all edge weights (max_score), but you can also choose to sum all edge weights (cummulative_score).
  • The MSM clustering algorithm is not publicly available at the moment. Contact Natasa Djurdjevac Conrad if you are interested in using it. Pseudocode is available at http://publications.imp.fu-berlin.de/1127/.
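The two edge-scoring options and the extraction of the biggest connected component can be illustrated with a short Python sketch. The repository does this in MATLAB, so the code below is not its implementation: the function names and the (node_a, node_b, weight) input layout are hypothetical, and only the option names max_score and cummulative_score are taken from the description above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def collapse_edges(pets: Iterable[Tuple[str, str, float]],
                   mode: str = "max_score") -> Dict[Edge, float]:
    """Merge PETs that connect the same pair of nodes into one weighted edge."""
    if mode not in ("max_score", "cummulative_score"):
        raise ValueError(f"unknown mode: {mode}")
    collapsed: Dict[Edge, float] = {}
    for a, b, weight in pets:
        key = (a, b) if a <= b else (b, a)  # treat edges as undirected
        if key not in collapsed:
            collapsed[key] = weight
        elif mode == "max_score":
            collapsed[key] = max(collapsed[key], weight)
        else:
            collapsed[key] += weight
    return collapsed


def largest_component(edges: Iterable[Edge]) -> Set[str]:
    """Return the node set of the biggest connected component."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen: Set[str] = set()
    best: Set[str] = set()
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first traversal of the component containing `start`.
        component, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for neighbour in adjacency[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    stack.append(neighbour)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


pets = [("n1", "n2", 2.0), ("n2", "n1", 5.0), ("n2", "n3", 1.0), ("n4", "n5", 3.0)]
edges = collapse_edges(pets)                         # default: max_score
print(edges)
print(collapse_edges(pets, "cummulative_score"))
print(sorted(largest_component(edges)))
```

Taking the maximum keeps an edge weight on the same scale as a single PET score, while summing rewards pairs of regions supported by many PETs; which is preferable depends on how the scores are interpreted downstream.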
