Permalink
Cannot retrieve contributors at this time
Name already in use
A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
EPIC_Preprocessing_Pipeline/scripts/4_check_matching_of_epigenetic_sex.Rmd
Go to fileThis commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
171 lines (140 sloc)
6.35 KB
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
--- | |
title: "Script 4: Check matching of epigenetic sex" | |
date: '`r format(Sys.time(), "%d %B, %Y")`' | |
output: html_document | |
--- | |
```{r runtime start, include=FALSE} | |
start_time <- Sys.time() | |
``` | |
```{r setup, include=FALSE} | |
needed_packages <- c("BiocManager", "dplyr", "knitr", "ggplot2", "rmarkdown") | |
user_choices <- readRDS("../data/user_choices.rds") | |
if(user_choices$package_path != "PLEASE FILL IN"){ | |
.libPaths(c(user_choices$package_path, .libPaths())) | |
} | |
installed_packages <- needed_packages %in% rownames(installed.packages()) | |
if (any(installed_packages == FALSE)) { | |
install.packages(needed_packages[!installed_packages], repos = "http://cran.us.r-project.org") | |
} | |
lapply(needed_packages, library, character.only = TRUE) | |
if (!("minfi" %in% rownames(installed.packages()))) { | |
BiocManager::install("minfi") | |
} | |
library(minfi) | |
knitr::opts_knit$set(root.dir = paste0(user_choices$personal_path, "/")) | |
knitr::opts_chunk$set(echo = FALSE) | |
``` | |
```{r load data, include=FALSE} | |
load(paste0(user_choices$project_name, "/processed_data/gRatioSet_quality.Rdata")) | |
phenotype_data <- read.csv(paste0(user_choices$project_name, "/processed_data/phenotype_data.csv"), sep = ",") | |
``` | |
# Description | |
This script: | |
* predicts sex with DNA methylation data (gRatioSet_quality, chip-wide medians | |
of measurements on sex chromosomes) | |
* compares epigenetically predicted sex with phenotype information | |
* reports samples with mismatches. These will be excluded in script 5 | |
<br> | |
**Notes:** | |
If mismatches are present, the user should contact the principal investigator of | |
the study to exclude the possibility of transmission errors. | |
Often samples that have failed in QC due to high detection *p* value (Script 3) also | |
fail the sex match check | |
## Prediction of sex with DNA methylation data | |
```{r, predict sex with DNAm data, include= FALSE} | |
predictedSex <- getSex(gRatioSet_quality, cutoff=-2) | |
phenotype_data <- subset(phenotype_data, arrayid %in% rownames(predictedSex),) | |
sex_df <- phenotype_data %>% | |
select(personid, arrayid, reported_sex = sex) %>% | |
mutate(predicted_sex = predictedSex$predictedSex, | |
predicted_sex_ymedian = as.numeric(as.character(predictedSex$yMed)), | |
predicted_sex_xmedian = as.numeric(as.character(predictedSex$xMed)), | |
predicted_sex_difference = predicted_sex_ymedian - predicted_sex_xmedian) | |
# difference = ymedian - xmedian | |
``` | |
## Matching of predicted and reported sex | |
If you have any sex mismatches, they will be displayed in the table below as red and matches will be blue | |
Additionally, a comparison between median total intensities of sex chromosomes is displayed | |
(samples should be further examined if not clearly clustering with male or female clusters) | |
```{r, test sex matches} | |
mismatch_check_df <- sex_df %>% | |
select(reported_sex, predicted_sex) %>% | |
rowwise() %>% | |
mutate(match = n_distinct(unlist(cur_data())) == 1) %>% | |
ungroup() | |
sex_df$match <- mismatch_check_df$match | |
saveRDS(sex_df, paste0(user_choices$project_name, "/reports/predicted_sex.rds")) | |
sex_df %>% dplyr::count(match) %>% | |
ggplot(aes(x = match, y = n, fill = match)) + | |
geom_col(alpha = 0.7) + | |
scale_y_continuous(breaks = ~round(unique(pretty(.)))) + | |
labs(x = "Matched Predicted and Reported Sex", y = "Count") + | |
theme(legend.position = "none") + | |
scale_fill_manual(breaks = c("FALSE", "TRUE"), values = c("darkred", "darkblue")) | |
kable(table(sex_df$reported_sex, sex_df$predicted_sex), | |
col.names = c("Predicted Female", "Predicted Male"), type = "html", | |
caption = "Comparison of Reported to Predicted Sex") | |
plotSex_custom <- function (object, id = NULL) | |
{ | |
plot(x = object$predicted_sex_xmedian, y = object$predicted_sex_ymedian, type = "n", xlab = "X chr, median total intensity (log2)", | |
ylab = "Y chr, median total intensity (log2)") | |
text(x = object$predicted_sex_xmedian, y = object$predicted_sex_ymedian, labels = id, col = ifelse(object$predicted_sex == | |
"M", "deepskyblue", "deeppink3")) | |
legend("bottomleft", c("M", "F"), col = c("deepskyblue", | |
"deeppink3"), pch = 16) | |
} | |
plotSex_custom(sex_df, id = sex_df$personid) | |
``` | |
## Mismatches of predicted and reported sex | |
The following samples were removed due to sex mismatch: | |
```{r, test sex mismatches} | |
sex_df %>% filter(match == FALSE) %>% paged_table() | |
``` | |
## User report | |
```{r, info output for user 1, results='asis'} | |
cat("Sex prediction using DNAm data was completed and saved to the reports folder", sep = "<br>\n") | |
``` | |
```{r, info output for user 2, results='asis'} | |
if(sum(is.na(sex_df$match)) > 0){ | |
cat("Caution! NAs were produced in your predictions. Please check the predicted_sex | |
file in the reports folder", sep = "<br>\n") | |
} | |
``` | |
```{r, info output for user 3, results='asis'} | |
if (sex_df %>% filter(match == TRUE) %>% pull(match) %>% length() == nrow(sex_df)) { | |
cat("There were no sex mismatches between predicted and reported sex", sep = "<br>\n") | |
} else { | |
cat(paste0("Mismatches in ", sex_df %>% filter(match == F) %>% pull(match) %>% length(), | |
" samples were detected and visualized"), sep = "<br>\n") | |
} | |
``` | |
```{r, info output for user 4, results='asis'} | |
cat("Session info text file was updated", sep = "<br>\n") | |
``` | |
```{r, document session info into a text file, include=FALSE} | |
connection <- file(paste0(user_choices$personal_path, "/", user_choices$project_name, | |
"/session_info_all.txt"), "a+") | |
writeLines("######################################################################", connection) | |
writeLines("############################ Script 4: #############################", connection) | |
writeLines("######################################################################", connection) | |
sessioninfo <- sessionInfo() | |
writeLines("\nR Version:", connection) | |
writeLines(paste0(" ", sessioninfo$R.version$version.string), connection) | |
writeLines("\nAttached packages:", connection) | |
# nicely format packages (with version) | |
package_version <- unlist(lapply(sessioninfo$otherPkgs, function(x){paste0(" ",x$Package, " (", x$Version, ")")})) | |
names(package_version) <- NULL | |
for(i in 1:length(package_version)) { | |
writeLines(package_version[i], connection) | |
} | |
writeLines("\n", connection) | |
close(connection) | |
``` | |
```{r runtime end, include=FALSE} | |
end_time <- Sys.time() | |
run_time <- as.numeric(round((end_time - start_time), 1), units = "mins") | |
``` | |
## Completion time | |
```{r, completion time, results = 'asis'} | |
cat("The time for the script to run was:", run_time, "minutes", sep = " ") | |
``` |