Skip to content
Permalink
f1bd3ab1ae
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Go to file
 
 
Cannot retrieve contributors at this time
171 lines (140 sloc) 6.35 KB
---
title: "Script 4: Check matching of epigenetic sex"
date: '`r format(Sys.time(), "%d %B, %Y")`'
output: html_document
---
```{r runtime start, include=FALSE}
start_time <- Sys.time()
```
```{r setup, include=FALSE}
needed_packages <- c("BiocManager", "dplyr", "knitr", "ggplot2", "rmarkdown")
user_choices <- readRDS("../data/user_choices.rds")
if(user_choices$package_path != "PLEASE FILL IN"){
.libPaths(c(user_choices$package_path, .libPaths()))
}
installed_packages <- needed_packages %in% rownames(installed.packages())
if (any(installed_packages == FALSE)) {
install.packages(needed_packages[!installed_packages], repos = "http://cran.us.r-project.org")
}
lapply(needed_packages, library, character.only = TRUE)
if (!("minfi" %in% rownames(installed.packages()))) {
BiocManager::install("minfi")
}
library(minfi)
knitr::opts_knit$set(root.dir = paste0(user_choices$personal_path, "/"))
knitr::opts_chunk$set(echo = FALSE)
```
```{r load data, include=FALSE}
load(paste0(user_choices$project_name, "/processed_data/gRatioSet_quality.Rdata"))
phenotype_data <- read.csv(paste0(user_choices$project_name, "/processed_data/phenotype_data.csv"), sep = ",")
```
# Description
This script:
* predicts sex with DNA methylation data (gRatioSet_quality, chip-wide medians
of measurements on sex chromosomes)
* compares epigenetically predicted sex with phenotype information
* reports samples with mismatches. These will be excluded in script 5
<br>
**Notes:**
If mismatches are present, the user should contact the principal investigator of
the study to exclude the possibility of transmission errors.
Often samples that have failed in QC due to high detection *p* value (Script 3) also
fail the sex match check
## Prediction of sex with DNA methylation data
```{r, predict sex with DNAm data, include= FALSE}
predictedSex <- getSex(gRatioSet_quality, cutoff=-2)
phenotype_data <- subset(phenotype_data, arrayid %in% rownames(predictedSex),)
sex_df <- phenotype_data %>%
select(personid, arrayid, reported_sex = sex) %>%
mutate(predicted_sex = predictedSex$predictedSex,
predicted_sex_ymedian = as.numeric(as.character(predictedSex$yMed)),
predicted_sex_xmedian = as.numeric(as.character(predictedSex$xMed)),
predicted_sex_difference = predicted_sex_ymedian - predicted_sex_xmedian)
# difference = ymedian - xmedian
```
## Matching of predicted and reported sex
If you have any sex mismatches, they will be displayed in the table below as red and matches will be blue
Additionally, a comparison between median total intensities of sex chromosomes is displayed
(samples should be further examined if not clearly clustering with male or female clusters)
```{r, test sex matches}
mismatch_check_df <- sex_df %>%
select(reported_sex, predicted_sex) %>%
rowwise() %>%
mutate(match = n_distinct(unlist(cur_data())) == 1) %>%
ungroup()
sex_df$match <- mismatch_check_df$match
saveRDS(sex_df, paste0(user_choices$project_name, "/reports/predicted_sex.rds"))
sex_df %>% dplyr::count(match) %>%
ggplot(aes(x = match, y = n, fill = match)) +
geom_col(alpha = 0.7) +
scale_y_continuous(breaks = ~round(unique(pretty(.)))) +
labs(x = "Matched Predicted and Reported Sex", y = "Count") +
theme(legend.position = "none") +
scale_fill_manual(breaks = c("FALSE", "TRUE"), values = c("darkred", "darkblue"))
kable(table(sex_df$reported_sex, sex_df$predicted_sex),
col.names = c("Predicted Female", "Predicted Male"), type = "html",
caption = "Comparison of Reported to Predicted Sex")
plotSex_custom <- function (object, id = NULL)
{
plot(x = object$predicted_sex_xmedian, y = object$predicted_sex_ymedian, type = "n", xlab = "X chr, median total intensity (log2)",
ylab = "Y chr, median total intensity (log2)")
text(x = object$predicted_sex_xmedian, y = object$predicted_sex_ymedian, labels = id, col = ifelse(object$predicted_sex ==
"M", "deepskyblue", "deeppink3"))
legend("bottomleft", c("M", "F"), col = c("deepskyblue",
"deeppink3"), pch = 16)
}
plotSex_custom(sex_df, id = sex_df$personid)
```
## Mismatches of predicted and reported sex
The following samples were removed due to sex mismatch:
```{r, test sex mismatches}
sex_df %>% filter(match == FALSE) %>% paged_table()
```
## User report
```{r, info output for user 1, results='asis'}
cat("Sex prediction using DNAm data was completed and saved to the reports folder", sep = "<br>\n")
```
```{r, info output for user 2, results='asis'}
if(sum(is.na(sex_df$match)) > 0){
cat("Caution! NAs were produced in your predictions. Please check the predicted_sex
file in the reports folder", sep = "<br>\n")
}
```
```{r, info output for user 3, results='asis'}
if (sex_df %>% filter(match == TRUE) %>% pull(match) %>% length() == nrow(sex_df)) {
cat("There were no sex mismatches between predicted and reported sex", sep = "<br>\n")
} else {
cat(paste0("Mismatches in ", sex_df %>% filter(match == F) %>% pull(match) %>% length(),
" samples were detected and visualized"), sep = "<br>\n")
}
```
```{r, info output for user 4, results='asis'}
cat("Session info text file was updated", sep = "<br>\n")
```
```{r, document session info into a text file, include=FALSE}
connection <- file(paste0(user_choices$personal_path, "/", user_choices$project_name,
"/session_info_all.txt"), "a+")
writeLines("######################################################################", connection)
writeLines("############################ Script 4: #############################", connection)
writeLines("######################################################################", connection)
sessioninfo <- sessionInfo()
writeLines("\nR Version:", connection)
writeLines(paste0(" ", sessioninfo$R.version$version.string), connection)
writeLines("\nAttached packages:", connection)
# nicely format packages (with version)
package_version <- unlist(lapply(sessioninfo$otherPkgs, function(x){paste0(" ",x$Package, " (", x$Version, ")")}))
names(package_version) <- NULL
for(i in 1:length(package_version)) {
writeLines(package_version[i], connection)
}
writeLines("\n", connection)
close(connection)
```
```{r runtime end, include=FALSE}
end_time <- Sys.time()
run_time <- as.numeric(round((end_time - start_time), 1), units = "mins")
```
## Completion time
```{r, completion time, results = 'asis'}
cat("The time for the script to run was:", run_time, "minutes", sep = " ")
```