Permalink
Cannot retrieve contributors at this time
Name already in use
A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
EPIC_Preprocessing_Pipeline/scripts/3_quality_control.Rmd
Go to fileThis commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
222 lines (186 sloc)
9.4 KB
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
--- | |
title: "Script 3: Quality control" | |
date: '`r format(Sys.time(), "%d %B, %Y")`' | |
output: html_document | |
--- | |
```{r runtime start, include=FALSE} | |
start_time <- Sys.time() | |
``` | |
```{r setup, include=FALSE} | |
needed_packages <- c("BiocManager", "dplyr", "knitr", "tidyr", "ggplot2", "ggrepel", "rmarkdown") | |
user_choices <- readRDS("../data/user_choices.rds") | |
if(user_choices$package_path != "PLEASE FILL IN"){ | |
.libPaths(c(user_choices$package_path, .libPaths())) | |
} | |
installed_packages <- needed_packages %in% rownames(installed.packages()) | |
if (any(installed_packages == FALSE)) { | |
install.packages(needed_packages[!installed_packages], repos = "http://cran.us.r-project.org") | |
} | |
lapply(needed_packages, library, character.only = TRUE) | |
if (!("minfi" %in% rownames(installed.packages()))) { | |
BiocManager::install("minfi") | |
} | |
library(minfi) | |
knitr::opts_knit$set(root.dir = paste0(user_choices$personal_path, "/")) | |
knitr::opts_chunk$set(echo = FALSE) | |
``` | |
```{r load data, include=FALSE} | |
load(paste0(user_choices$project_name, "/processed_data/RGSet_original.Rdata")) | |
load(paste0(user_choices$project_name, "/processed_data/Betas_original.Rdata")) | |
load(paste0(user_choices$project_name, "/processed_data/gRatioSet_original.Rdata")) | |
phenotype_data <- read.csv(paste0(user_choices$project_name, "/processed_data/phenotype_data.csv"), sep = ",") | |
summary_table_preprocessing <- readRDS(paste0(user_choices$project_name, "/reports/summary_table_preprocessing.rds")) | |
``` | |
# Description: | |
This script | |
* calculates detection *p* values for each probe and sample | |
* removes samples with detection *p* values higher that threshold set by user (default 0.05) | |
* saves person IDs of failed samples (with insufficient detection *p* values) | |
* generates reports of poor quality samples according to detection *p* values | |
* generates minfi QC report for remaining samples | |
* generates Individual Density report, asks user to visually examine | |
Please note: Probes with high detection *p* values are not yet removed in this step, but in script 6. | |
<br> | |
**Notes:** | |
Detection *p* values report: | |
These are generated for each probe in every sample and are indicative for quality of signal. | |
The method compares the total signal (M+U) for each probe to the background signal level | |
(estimated from negative control probes). Very small values show a reliable signal | |
<br> | |
Minfi QC report: | |
This provides density plots of beta values for all samples colored by sample group | |
as well as bean plots divided by couple of arrays. This could help with identifying deviant samples | |
For further features of the report read further in the minfi User's Guide: https://bioconductor.org/packages/devel/bioc/vignettes/minfi/inst/doc/minfi.html | |
<br> | |
Individual Density Report: | |
This provides the distribution of beta values across all probes for each sample | |
The curve should have two prominent peaks, since most of probes have either | |
very low methylation (around 0%) or very high methylation (around 100%) | |
Exceptions from this rule (e.g. further or more strongly deviating peaks) might be | |
artifacts and should be excluded from further analysis | |
(**User should notify such artifacts, as they will be requested to enter these in script 5**) | |
```{r, calculate & save detection p values, include=FALSE} | |
detP_original <- detectionP(RGSet_original) | |
save(detP_original, file = paste0(user_choices$project_name, "/reports/detP_original.Rdata")) | |
``` | |
## Mean detection p values report | |
```{r, generate & save report of mean detection p values for each sample} | |
detP_mean_df <- detP_original %>% | |
as.data.frame() %>% | |
summarise(across(everything(), mean)) %>% | |
pivot_longer(cols = everything(), names_to = "personid", values_to = "Mean Detection p value") | |
detP_plot <- detP_mean_df %>% | |
ggplot(aes(x = personid, y = `Mean Detection p value`)) + | |
geom_col() + | |
geom_hline(yintercept = user_choices$detP_sample_threshold, color = "darkred") + | |
geom_label_repel(data = subset(detP_mean_df, `Mean Detection p value` > user_choices$detP_sample_threshold), | |
aes(label = personid), size = 3, box.padding = unit(0.35, "lines"), | |
point.padding = unit(0.3, "lines")) + | |
theme_bw() + | |
labs(title = "Quality Control", x = "Person ID") + | |
theme(axis.text.x = element_text(vjust = -1, hjust = 1, angle = 90)) | |
detP_plot | |
``` | |
## Removal of poor quality samples from RGset and Betas matrix | |
The following samples were removed: | |
```{r, remove poor quality samples with mean detection p value mean > threshold} | |
keep <- colMeans(detP_original) < user_choices$detP_sample_threshold | |
RGSet_quality <- RGSet_original[,keep] | |
save(RGSet_quality, file = paste0(user_choices$project_name, "/processed_data/RGSet_quality.Rdata")) | |
gRatioSet_quality <- gRatioSet_original[,keep] | |
save(gRatioSet_quality, file = paste0(user_choices$project_name, "/processed_data/gRatioSet_quality.Rdata")) | |
Betas_quality <- Betas_original[,keep] | |
save(Betas_quality, file = paste0(user_choices$project_name, "/processed_data/Betas_quality.Rdata")) | |
detP_quality <- detP_original[,keep] | |
save(detP_quality, file = paste0(user_choices$project_name, "/reports/detP_quality.Rdata")) | |
# display removed samples and corresponding detection p values and save for script 5 | |
detP_mean_fails_df <- detP_mean_df %>% filter(`Mean Detection p value` > user_choices$detP_sample_threshold) | |
saveRDS(detP_mean_fails_df, paste0(user_choices$project_name, "/reports/detP_mean_fails_df.rds")) | |
detP_mean_fails_df %>% paged_table() | |
``` | |
## Minfi QC report was created | |
```{r, minfi QC report for remaining samples, include=FALSE} | |
qcReport(RGSet_quality, sampGroups = phenotype_data$slide, pdf = paste0(user_choices$project_name, "/reports/Minfi_QC_report.pdf")) | |
``` | |
```{r, update summary table file for preprocessing steps, include=FALSE} | |
dim_RGSet_quality <- dim(RGSet_quality) | |
dim_Betas_quality <- dim(Betas_quality) | |
step_number <- c("2", "2") | |
step <- c("Quality", "Quality") | |
data_class <- c("RGSet", "Betas") | |
samples <- c(dim_RGSet_quality[2], dim_Betas_quality[2]) | |
probes <- c(dim_RGSet_quality[1], dim_Betas_quality[1]) | |
table_preprocessing_adding <- data.frame(step_number, step, data_class, samples, probes) | |
summary_table_preprocessing <- bind_rows(summary_table_preprocessing, table_preprocessing_adding) | |
saveRDS(summary_table_preprocessing, paste0(user_choices$project_name, "/reports/summary_table_preprocessing.rds")) | |
``` | |
## Individual Density report | |
```{r, individual density report} | |
title_df <- data.frame(rownames(pData(RGSet_quality))) | |
colnames(title_df) <- "arrayid" | |
title_df <- title_df %>% | |
left_join(phenotype_data, by = "arrayid") %>% | |
dplyr::select(arrayid, personid) | |
for (i in 1:ncol(RGSet_quality)){ | |
title_df_current <- title_df %>% filter(arrayid == rownames(pData(RGSet_quality))[i]) %>% pull(personid) | |
titel <- paste0(rownames(pData(RGSet_quality))[i], " / ", title_df_current) | |
densityPlot(as.matrix(Betas_quality[,i]), main = titel) | |
print(i) | |
} | |
``` | |
```{r, save reports as pdf, include=FALSE} | |
pdf(paste0(user_choices$project_name, "/reports/detectionP_report.pdf"), paper = "a4", width = 11, height = 8) | |
detP_plot | |
dev.off() | |
pdf(paste0(user_choices$project_name, "/reports/individual_density.pdf")) | |
for (i in 1:ncol(RGSet_quality)) | |
{titel <- paste(rownames(pData(RGSet_quality))[i]) | |
densityPlot(as.matrix(Betas_quality[,i]), main = titel) | |
print(i)} | |
dev.off() | |
``` | |
## User report | |
```{r, info output for user, results = 'asis'} | |
cat("Detection p values for all probes of every sample were calculated and saved to the reports folder", | |
"Person IDs of failed samples due to detection p value were saved to the reports folder", | |
paste0(nrow(detP_mean_fails_df) ," poor quality samples were removed"), | |
"Minfi QC report and Individual Density reports were generated and saved to the reports folder", | |
"Summary table of preprocessing steps was updated", | |
"Session info text file was updated", sep = "<br>\n") | |
``` | |
## Record any visually spotted distrubution artifacts | |
```{r, request that user records visual artifacts, results = 'asis'} | |
cat("Please record any visually spotted distribution artifacts from the Individual Density Report.", | |
"These will be excluded in script 5.", sep = "<br>\n") | |
``` | |
## Preprocessing summary table | |
```{r, preprocessing summary table} | |
summary_table_preprocessing %>% paged_table() | |
``` | |
```{r, document session info into a text file, include=FALSE} | |
connection <- file(paste0(user_choices$personal_path, "/", user_choices$project_name, | |
"/session_info_all.txt"), "a+") | |
writeLines("######################################################################", connection) | |
writeLines("############################ Script 3: #############################", connection) | |
writeLines("######################################################################", connection) | |
sessioninfo <- sessionInfo() | |
writeLines("\nR Version:", connection) | |
writeLines(paste0(" ", sessioninfo$R.version$version.string), connection) | |
writeLines("\nAttached packages:", connection) | |
# nicely format packages (with version) | |
package_version <- unlist(lapply(sessioninfo$otherPkgs, function(x){paste0(" ",x$Package, " (", x$Version, ")")})) | |
names(package_version) <- NULL | |
for(i in 1:length(package_version)) { | |
writeLines(package_version[i], connection) | |
} | |
writeLines("\n", connection) | |
close(connection) | |
``` | |
```{r runtime end, include=FALSE} | |
end_time <- Sys.time() | |
run_time <- as.numeric(round((end_time - start_time), 1), units = "mins") | |
``` | |
## Completion time | |
```{r, completion time, results = 'asis'} | |
cat("The time for the script to run was:", run_time, "minutes", sep = " ") | |
``` |