
XGB Survival Network

This is the code repository corresponding to "A gradient tree boosting and network propagation derived pan-cancer survival network of the tumor microenvironment".

To identify a pan-cancer survival gene network, a two-step approach is applied:

  1. Survival prediction with XGBoost based on gene expression data
  2. Network propagation on the feature importance weights derived in step 1 and inference of a pan-cancer survival gene sub-network

This repository contains Python code for training XGBoost models on a single cancer cohort or on pan-cancer gene expression data, together with Python and R code for downloading the required TCGA data and for creating the figures shown in the paper. The NetCore software was used for network propagation.

Dependencies

Identifying a survival gene network through survival prediction and network propagation requires the software and packages listed below. For installation instructions for the Python and R dependencies, see Installation of Dependencies.

The following software and packages are required for downloading and preprocessing TCGA data:

The following software and packages are required for running the XGBoost survival prediction:

The following software and packages are required for performing network propagation:

The following software and packages are required for re-creating the figures displayed in "A gradient tree boosting and network propagation derived pan-cancer survival network of the tumor microenvironment":

Installation of Dependencies

You can install all required Python packages from a Unix shell as follows:

pip install numpy==1.18.5
pip install pandas==1.1.5
pip install tqdm==4.38.0
pip install scipy==1.2.1
pip install matplotlib==3.1.1
pip install scikit-learn==0.22.2.post1
pip install seaborn==0.9.0
pip install networkx==2.3
pip install xgboost==0.90
pip install mygene==3.1.0
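
Because the pipeline is pinned to specific package versions, it can help to confirm that the active environment matches the pins before training. The following is a minimal sketch using only the standard library (Python 3.8 or later); the function names `installed_version` and `find_mismatches` are illustrative and not part of this repository.

```python
from importlib import metadata

# Pins copied from the pip commands above.
PINNED = {
    "numpy": "1.18.5",
    "pandas": "1.1.5",
    "tqdm": "4.38.0",
    "scipy": "1.2.1",
    "matplotlib": "3.1.1",
    "scikit-learn": "0.22.2.post1",
    "seaborn": "0.9.0",
    "networkx": "2.3",
    "xgboost": "0.90",
    "mygene": "3.1.0",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs from its pin
    to a tuple (expected, found)."""
    mismatches = {}
    for package, expected in pinned.items():
        found = lookup(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in find_mismatches(PINNED).items():
        print(f"{package}: expected {expected}, found {found or 'not installed'}")
```

Running the file as a script prints one line per package that deviates from its pin and prints nothing when the environment matches.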

All R dependencies can be installed by entering an R session and typing:

>if (!require("BiocManager", quietly = TRUE))
>   install.packages("BiocManager")
>BiocManager::install(version = "3.10")
>BiocManager::install("TCGAbiolinks")
>if (!require("optparse"))
>   install.packages("optparse")
>if (!require("dplyr"))
>   install.packages("dplyr")
>if (!require("reshape2"))
>   install.packages("reshape2")
>if (!require("rjson"))
>   install.packages("rjson")
>if (!require("ggplot2"))
>   install.packages("ggplot2")
>if (!require("ggpubr"))
>   install.packages("ggpubr")
>if (!require("corrplot"))
>   install.packages("corrplot")
>if (!require("plyr"))
>   install.packages("plyr")

How to Run

Download and Preprocessing of TCGA Data

To download and preprocess the TCGA data for survival prediction with XGBoost, you can execute the following R script:

Rscript downloadTCGAData.R

This downloads gene expression and clinical data for 25 different TCGA cohorts. To download specific cohorts only, add -c followed by the desired cohort(s) (e.g., 'TCGA-BRCA' or 'TCGA-BRCA', 'TCGA-COAD', 'TCGA-LUAD') to the program call.

Survival Prediction with XGBoost

To run model replications of pan-cancer XGBoost training, please run:

python run_xgb_survival_replications.py 
  -r <result_dir> 
  -f <feature_dir> 
  -s <first_replication> 
  -e <last_replication>

where
result_dir: Survival prediction results will be written to this directory
feature_dir: The selected features will be written to this directory
first_replication: Number of first model replication to be performed (e.g. 1)
last_replication: Number of last model replication to be performed (e.g. 1 to run only one replication or 100 to run 100 model replications with different train-test splits)

To run model replications of single-cohort XGBoost training for a selected cohort, please run instead:

python run_xgb_survival_replications.py 
  -r <result_dir> 
  -f <feature_dir> 
  -s <first_replication> 
  -e <last_replication> 
  -c <cohort>

where
cohort: Name of the selected TCGA cancer cohort (e.g. 'TCGA-COAD')
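
When many replications are launched from another Python program or a batch scheduler, the pan-cancer and single-cohort invocations above can be assembled programmatically. The sketch below only builds the argument list from the documented flags (-r, -f, -s, -e, -c); the helper name `build_replication_command` and the example directories are hypothetical.

```python
import sys


def build_replication_command(result_dir, feature_dir, first, last, cohort=None):
    """Assemble the argument list for run_xgb_survival_replications.py."""
    if first > last:
        raise ValueError("first replication must not exceed last replication")
    command = [
        sys.executable, "run_xgb_survival_replications.py",
        "-r", str(result_dir),
        "-f", str(feature_dir),
        "-s", str(first),
        "-e", str(last),
    ]
    if cohort is not None:
        # Single-cohort training: append the optional cohort flag.
        command += ["-c", cohort]
    return command


if __name__ == "__main__":
    # Hypothetical directories; replace with your own paths.
    cmd = build_replication_command("results", "features", 1, 100, cohort="TCGA-COAD")
    print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Splitting a large range such as 1 to 100 into several smaller ranges lets the replications run as independent jobs.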

To train a model on all data from the 25 TCGA cohorts with more than 20 uncensored patients and test the model on the remaining 8 TCGA cohorts, run:

python run_xgb_survial_test_new_cohorts.py 
  -r <result_dir> 
  -f <feature_dir>

To train a pan-cancer XGBoost model on a random subset of the training data of a specified size, run:

python run_xgb_survival_random_subsets.py 
  -r <result_dir> 
  -f <feature_dir> 
  -n <subsample_size> 
  -s <first_replication> 
  -e <last_replication>

where
subsample_size: The number of patients that are randomly subsampled from the training data for model training (e.g. 500)
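
To illustrate what subsample_size means, the following sketch draws a fixed number of patients without replacement from a list of training patient identifiers. It is an assumption-laden illustration, not the logic of run_xgb_survival_random_subsets.py: the function name, the identifier format, and the use of the replication number as a random seed are all hypothetical.

```python
import random


def subsample_patients(patient_ids, subsample_size, seed):
    """Draw `subsample_size` distinct patients without replacement, reproducibly."""
    patient_ids = list(patient_ids)
    if subsample_size > len(patient_ids):
        raise ValueError("subsample_size exceeds the number of training patients")
    # A local generator keeps the global random state untouched.
    rng = random.Random(seed)
    return rng.sample(patient_ids, subsample_size)


if __name__ == "__main__":
    # Hypothetical identifiers standing in for the training patients.
    training_ids = [f"patient_{i:04d}" for i in range(2000)]
    # Assumption: one seed per replication, so each replication sees a different subset.
    subset = subsample_patients(training_ids, 500, seed=1)
    print(len(subset), "patients selected")
```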

Preparation of Survival Prediction Outputs for Network Propagation

Run the following Python script to prepare the outputs from the XGBoost survival prediction for network propagation with NetCore:

python prepare_XGBoost_results_for_NetCore.py 
  --result_path <result_dir>
  --num_replications <num_reps> 
  --output_path <out_dir>

where
result_dir: The path to the directory containing the XGBoost survival prediction results
num_reps: The number of model replications that have been performed for XGBoost survival prediction
out_dir: The output of this script, which can then be used as input to network propagation, is written to this directory
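
One plausible way to turn per-replication feature importances into a single weight per gene is to average them over all replications. The sketch below shows that idea only. How prepare_XGBoost_results_for_NetCore.py actually aggregates the results is not described here, so the averaging rule, the treatment of unselected genes as zero, and the input structure are assumptions; the gene symbols are placeholders.

```python
from collections import defaultdict


def average_importance(replications):
    """Average per-gene importance over replications.

    `replications` is a list of dicts mapping gene -> importance. A gene that
    is missing from a replication contributes 0 for that replication.
    """
    if not replications:
        raise ValueError("at least one replication is required")
    totals = defaultdict(float)
    for importances in replications:
        for gene, value in importances.items():
            totals[gene] += value
    n = len(replications)
    return {gene: total / n for gene, total in totals.items()}


if __name__ == "__main__":
    # Placeholder importances for two hypothetical replications.
    reps = [{"TP53": 0.4, "EGFR": 0.2}, {"TP53": 0.2}]
    for gene, weight in sorted(average_importance(reps).items()):
        print(f"{gene}\t{weight:.3f}")
```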

Network Propagation with NetCore

To perform network propagation with NetCore, run the following command:

python <path_to_netcore>/netcore/netcore.py 
  -e <network_file>
  -w <weight_file>
  -pd <permutation_dir>
  -o <out_dir>

where
network_file: File containing network in edge list format
weight_file: Weight file containing the gene weights computed from survival prediction outputs
permutation_dir: Path to directory containing permutation files of the network
out_dir: Network propagation results will be written to this directory

Note that before running NetCore on a network for the first time, permutations of the network need to be constructed. For more information, please visit https://github.molgen.mpg.de/barel/NetCore.
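
For readers unfamiliar with network propagation, the general idea can be sketched as a random walk with restart over an edge list, starting from the gene weights. This is a generic, degree-normalized illustration written in pure Python. It is not NetCore's implementation, whose normalization and permutation-based significance testing may differ; the function name, the restart value, and the toy network are assumptions.

```python
def random_walk_with_restart(edges, weights, restart=0.5, tol=1e-9, max_iter=1000):
    """Propagate initial gene weights over an undirected network.

    Iterates p <- restart * p0 + (1 - restart) * W p, where W spreads each
    node's current score equally among its neighbours.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    total = sum(weights.get(node, 0.0) for node in neighbours)
    if total <= 0:
        raise ValueError("at least one network node needs a positive weight")
    # Normalize the initial weights so that the scores sum to 1.
    p0 = {node: weights.get(node, 0.0) / total for node in neighbours}

    p = dict(p0)
    for _ in range(max_iter):
        nxt = {}
        for node in neighbours:
            incoming = sum(p[nb] / len(neighbours[nb]) for nb in neighbours[node])
            nxt[node] = restart * p0[node] + (1 - restart) * incoming
        change = sum(abs(nxt[node] - p[node]) for node in neighbours)
        p = nxt
        if change < tol:
            break
    return p


if __name__ == "__main__":
    # Toy path network A - B - C - D with all initial weight on A.
    toy_edges = [("A", "B"), ("B", "C"), ("C", "D")]
    scores = random_walk_with_restart(toy_edges, {"A": 1.0})
    for node in sorted(scores, key=scores.get, reverse=True):
        print(f"{node}\t{scores[node]:.4f}")
```

In the toy network the score decays with distance from the seed gene, which is the behaviour that lets propagation highlight genes located near highly weighted genes in the network.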
