Skip to content
Navigation Menu
Toggle navigation
Sign in
In this repository
All GitHub Enterprise
↵
Jump to
↵
No suggested jump to results
In this repository
All GitHub Enterprise
↵
Jump to
↵
In this organization
All GitHub Enterprise
↵
Jump to
↵
In this repository
All GitHub Enterprise
↵
Jump to
↵
Sign in
Reseting focus
You signed in with another tab or window.
Reload
to refresh your session.
You signed out in another tab or window.
Reload
to refresh your session.
You switched accounts on another tab or window.
Reload
to refresh your session.
Dismiss alert
{{ message }}
mpip
/
EPIC_Preprocessing_Pipeline
Public
Notifications
You must be signed in to change notification settings
Fork
1
Star
0
Code
Issues
0
Pull requests
0
Actions
Projects
1
Security
Insights
Additional navigation options
Code
Issues
Pull requests
Actions
Projects
Security
Insights
Files
9168ca9
data
pipeline_flowchart
scripts
10_correction_batch_effects_unfiltered.Rmd
11_annotate_hg19_create_pseudo_epicv1_from_epicv2.Rmd
1_definitions_and_setup.Rmd
2_import_raw_DNAm_data.Rmd
3_quality_control.Rmd
4_check_matching_of_epigenetic_sex.Rmd
5_samples_exclusion.Rmd
6_filtering_cpgs.Rmd
7_normalization.Rmd
8_removal_of_outliers_detect_batch_effects.Rmd
9_correction_batch_effects_filtered.Rmd
.gitignore
LICENSE
README.md
Breadcrumbs
EPIC_Preprocessing_Pipeline
/
scripts
/
4_check_matching_of_epigenetic_sex.Rmd
Blame
Blame
Latest commit
History
History
171 lines (140 loc) · 6.35 KB
Breadcrumbs
EPIC_Preprocessing_Pipeline
/
scripts
/
4_check_matching_of_epigenetic_sex.Rmd
Top
File metadata and controls
Code
Blame
171 lines (140 loc) · 6.35 KB
Raw
--- title: "Script 4: Check matching of epigenetic sex" date: '`r format(Sys.time(), "%d %B, %Y")`' output: html_document --- ```{r runtime start, include=FALSE} start_time <- Sys.time() ``` ```{r setup, include=FALSE} needed_packages <- c("BiocManager", "dplyr", "knitr", "ggplot2", "rmarkdown") user_choices <- readRDS("../data/user_choices.rds") if(user_choices$package_path != "PLEASE FILL IN"){ .libPaths(c(user_choices$package_path, .libPaths())) } installed_packages <- needed_packages %in% rownames(installed.packages()) if (any(installed_packages == FALSE)) { install.packages(needed_packages[!installed_packages], repos = "http://cran.us.r-project.org") } lapply(needed_packages, library, character.only = TRUE) if (!("minfi" %in% rownames(installed.packages()))) { BiocManager::install("minfi") } library(minfi) knitr::opts_knit$set(root.dir = paste0(user_choices$personal_path, "/")) knitr::opts_chunk$set(echo = FALSE) ``` ```{r load data, include=FALSE} load(paste0(user_choices$project_name, "/processed_data/gRatioSet_quality.Rdata")) phenotype_data <- read.csv(paste0(user_choices$project_name, "/processed_data/phenotype_data.csv"), sep = ",") ``` # Description This script: * predicts sex with DNA methylation data (gRatioSet_quality, chip-wide medians of measurements on sex chromosomes) * compares epigenetically predicted sex with phenotype information * reports samples with mismatches. These will be excluded in script 5 <br> **Notes:** If mismatches are present, the user should contact the principal investigator of the study to exclude the possibility of transmission errors. Often samples that have failed in QC due to high detection *p* value (Script 3) also fail the sex match check ## Prediction of sex with DNA methylation data ```{r, predict sex with DNAm data, include= FALSE} predictedSex <- getSex(gRatioSet_quality, cutoff=-2) phenotype_data <- subset(phenotype_data, arrayid %in% rownames(predictedSex),) sex_df <- phenotype_data %>% select(personid, arrayid, reported_sex = sex) %>% mutate(predicted_sex = predictedSex$predictedSex, predicted_sex_ymedian = as.numeric(as.character(predictedSex$yMed)), predicted_sex_xmedian = as.numeric(as.character(predictedSex$xMed)), predicted_sex_difference = predicted_sex_ymedian - predicted_sex_xmedian) # difference = ymedian - xmedian ``` ## Matching of predicted and reported sex If you have any sex mismatches, they will be displayed in the table below as red and matches will be blue Additionally, a comparison between median total intensities of sex chromosomes is displayed (samples should be further examined if not clearly clustering with male or female clusters) ```{r, test sex matches} mismatch_check_df <- sex_df %>% select(reported_sex, predicted_sex) %>% rowwise() %>% mutate(match = n_distinct(unlist(cur_data())) == 1) %>% ungroup() sex_df$match <- mismatch_check_df$match saveRDS(sex_df, paste0(user_choices$project_name, "/reports/predicted_sex.rds")) sex_df %>% dplyr::count(match) %>% ggplot(aes(x = match, y = n, fill = match)) + geom_col(alpha = 0.7) + scale_y_continuous(breaks = ~round(unique(pretty(.)))) + labs(x = "Matched Predicted and Reported Sex", y = "Count") + theme(legend.position = "none") + scale_fill_manual(breaks = c("FALSE", "TRUE"), values = c("darkred", "darkblue")) kable(table(sex_df$reported_sex, sex_df$predicted_sex), col.names = c("Predicted Female", "Predicted Male"), type = "html", caption = "Comparison of Reported to Predicted Sex") plotSex_custom <- function (object, id = NULL) { plot(x = object$predicted_sex_xmedian, y = object$predicted_sex_ymedian, type = "n", xlab = "X chr, median total intensity (log2)", ylab = "Y chr, median total intensity (log2)") text(x = object$predicted_sex_xmedian, y = object$predicted_sex_ymedian, labels = id, col = ifelse(object$predicted_sex == "M", "deepskyblue", "deeppink3")) legend("bottomleft", c("M", "F"), col = c("deepskyblue", "deeppink3"), pch = 16) } plotSex_custom(sex_df, id = sex_df$personid) ``` ## Mismatches of predicted and reported sex The following samples were removed due to sex mismatch: ```{r, test sex mismatches} sex_df %>% filter(match == FALSE) %>% paged_table() ``` ## User report ```{r, info output for user 1, results='asis'} cat("Sex prediction using DNAm data was completed and saved to the reports folder", sep = "<br>\n") ``` ```{r, info output for user 2, results='asis'} if(sum(is.na(sex_df$match)) > 0){ cat("Caution! NAs were produced in your predictions. Please check the predicted_sex file in the reports folder", sep = "<br>\n") } ``` ```{r, info output for user 3, results='asis'} if (sex_df %>% filter(match == TRUE) %>% pull(match) %>% length() == nrow(sex_df)) { cat("There were no sex mismatches between predicted and reported sex", sep = "<br>\n") } else { cat(paste0("Mismatches in ", sex_df %>% filter(match == F) %>% pull(match) %>% length(), " samples were detected and visualized"), sep = "<br>\n") } ``` ```{r, info output for user 4, results='asis'} cat("Session info text file was updated", sep = "<br>\n") ``` ```{r, document session info into a text file, include=FALSE} connection <- file(paste0(user_choices$personal_path, "/", user_choices$project_name, "/session_info_all.txt"), "a+") writeLines("######################################################################", connection) writeLines("############################ Script 4: #############################", connection) writeLines("######################################################################", connection) sessioninfo <- sessionInfo() writeLines("\nR Version:", connection) writeLines(paste0(" ", sessioninfo$R.version$version.string), connection) writeLines("\nAttached packages:", connection) # nicely format packages (with version) package_version <- unlist(lapply(sessioninfo$otherPkgs, function(x){paste0(" ",x$Package, " (", x$Version, ")")})) names(package_version) <- NULL for(i in 1:length(package_version)) { writeLines(package_version[i], connection) } writeLines("\n", connection) close(connection) ``` ```{r runtime end, include=FALSE} end_time <- Sys.time() run_time <- as.numeric(round((end_time - start_time), 1), units = "mins") ``` ## Completion time ```{r, completion time, results = 'asis'} cat("The time for the script to run was:", run_time, "minutes", sep = " ") ```
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
You can’t perform that action at this time.