Skip to content
Navigation Menu
Toggle navigation
Sign in
In this repository
All GitHub Enterprise
↵
Jump to
↵
No suggested jump to results
In this repository
All GitHub Enterprise
↵
Jump to
↵
In this organization
All GitHub Enterprise
↵
Jump to
↵
In this repository
All GitHub Enterprise
↵
Jump to
↵
Sign in
Reseting focus
You signed in with another tab or window.
Reload
to refresh your session.
You signed out in another tab or window.
Reload
to refresh your session.
You switched accounts on another tab or window.
Reload
to refresh your session.
Dismiss alert
{{ message }}
mpip
/
EPIC_Preprocessing_Pipeline
Public
Notifications
You must be signed in to change notification settings
Fork
1
Star
0
Code
Issues
0
Pull requests
0
Actions
Projects
1
Security
Insights
Additional navigation options
Code
Issues
Pull requests
Actions
Projects
Security
Insights
Files
1246ab6
data
pipeline_flowchart
scripts
10_correction_batch_effects_unfiltered.Rmd
1_definitions_and_setup.Rmd
2_import_raw_DNAm_data.Rmd
3_quality_control.Rmd
4_check_matching_of_epigenetic_sex.Rmd
5_samples_exclusion.Rmd
6_normalization.Rmd
7_filtering_cpgs.Rmd
8_removal_of_outliers_detect_batch_effects.Rmd
9_correction_batch_effects_filtered.Rmd
.gitignore
LICENSE
README.md
Breadcrumbs
EPIC_Preprocessing_Pipeline
/
scripts
/
2_import_raw_DNAm_data.Rmd
Blame
Blame
Latest commit
History
History
165 lines (135 loc) · 6.79 KB
Breadcrumbs
EPIC_Preprocessing_Pipeline
/
scripts
/
2_import_raw_DNAm_data.Rmd
Top
File metadata and controls
Code
Blame
165 lines (135 loc) · 6.79 KB
Raw
--- title: "Script 2: Import raw DNAm data" date: '`r format(Sys.time(), "%d %B, %Y")`' output: html_document --- ```{r runtime start, include=FALSE} start_time <- Sys.time() ``` ```{r setup, include=FALSE} needed_packages <- c("BiocManager", "knitr", "dplyr", "rmarkdown") installed_packages <- needed_packages %in% rownames(installed.packages()) if (any(installed_packages == FALSE)) { install.packages(needed_packages[!installed_packages], repos = "http://cran.us.r-project.org") } lapply(needed_packages, library, character.only = TRUE) needed_biocmanager_packages <- c("minfi", "IlluminaHumanMethylationEPICanno.ilm10b4.hg19", "IlluminaHumanMethylationEPICmanifest") installed_biocmanager_packages <- needed_biocmanager_packages %in% rownames(installed.packages()) if (any(installed_biocmanager_packages == FALSE)) { BiocManager::install(needed_biocmanager_packages[!installed_biocmanager_packages]) } lapply(needed_biocmanager_packages, library, character.only = TRUE) user_choices <- readRDS("../data/user_choices.rds") if (user_choices$array_type == "v2") { if (!("jokergoo/IlluminaHumanMethylationEPICv2manifest" %in% rownames(installed.packages()))) { BiocManager::install("jokergoo/IlluminaHumanMethylationEPICv2manifest") } if (!("IlluminaHumanMethylationEPICv2anno.20a1.hg38" %in% rownames(installed.packages()))) { BiocManager::install("jokergoo/IlluminaHumanMethylationEPICv2anno.20a1.hg38") } library(IlluminaHumanMethylationEPICv2manifest) library(IlluminaHumanMethylationEPICv2anno.20a1.hg38) } knitr::opts_knit$set(root.dir = paste0(user_choices$personal_path, "/")) knitr::opts_chunk$set(echo = FALSE) ``` # Description This script: * reads raw intensity signals (variables: red & green) from IDAT files to **RGChannelSet object** * organizes data by probe level (variables: methylated & unmethylated) to **MethylSet object** * converts MethylSet object to **RatioSet object** (variables: Beta and M values (logratio of Beta)) * gets raw beta values (**Betas**) from RatioSet obejct * maps RatioSet with genomic position of loci to create **gRatioSet** * saves **phenotype data** (meta data) as **PhenoData** file from RGSet * creates **summary file of preprocessing steps** to record current number of samples and probes as pipeline progresses. <br> **Further information about the data classes mentioned above is available in the minfi User's Guide:** https://bioconductor.org/packages/devel/bioc/vignettes/minfi/inst/doc/minfi.html <br> **Note:** Recording object dimensions (samples, probes) for the different files throughout the pipeline is very important. Please check the summary file with dimensions of object at the end of each step in the EPIC pipeline ```{r, create & save red green channel set, include = FALSE} targets <- read.csv(paste0(user_choices$project_name, "/processed_data/phenotype_data.csv"), sep = ",") RGSet_original <- read.metharray.exp(targets = targets) if (user_choices$array_type == "v2") { RGSet_original@annotation <- c(array = "IlluminaHumanMethylationEPICv2", annotation = "20a1.hg38") } save(RGSet_original, file = paste0(user_choices$project_name, "/processed_data/RGSet_original.Rdata")) ``` ```{r, create & save methyl set, include=FALSE} Mset_original <- preprocessRaw(RGSet_original) save(Mset_original, file = paste0(user_choices$project_name, "/processed_data/Mset_original.Rdata")) ``` ```{r, create & save ratio set, include=FALSE} RatioSet_original <- ratioConvert(Mset_original, what = "both", keepCN = TRUE) save(RatioSet_original, file = paste0(user_choices$project_name, "/processed_data/RatioSet_original.Rdata")) ``` ```{r, vreate & save raw bets as beta values matrix, include=FALSE} Betas_original <- getBeta(RatioSet_original) save(Betas_original, file = paste0(user_choices$project_name, "/processed_data/Betas_original.Rdata")) ``` ```{r, create & save gRatio set with genomic positions of loci, include=FALSE} gRatioSet_original <- mapToGenome(RatioSet_original, mergeManifest=TRUE) save(gRatioSet_original, file = paste0(user_choices$project_name, "/processed_data/gRatioSet_original.Rdata")) ``` ```{r, create & save phenotype data file, include=FALSE} PhenoData_original <- pData(RGSet_original) save(PhenoData_original, file = paste0(user_choices$project_name, "/processed_data/PhenoData_original.Rdata")) ``` ```{r, create & save summary table file for preprocessing steps, include=FALSE} dim_RGSet_original <- dim(RGSet_original) dim_Betas_original <- dim(Betas_original) step_number <- c("1", "1") step <- c("Original", "Original") data_class <- c("RGSet", "Betas") samples <- c(dim_RGSet_original[2], dim_Betas_original[2]) probes <- c(dim_RGSet_original[1], dim_Betas_original[1]) summary_table_preprocessing <- data.frame(step_number, step, data_class, samples, probes) saveRDS(summary_table_preprocessing, paste0(user_choices$project_name, "/reports/summary_table_preprocessing.rds")) # save for future additions ``` # Summaries of created data ```{r, present class structures} RGSet_original Mset_original RatioSet_original ``` ## User report ```{r, info output for user, results = 'asis'} cat("RGChannelSet (RGSet), MethylSet (MSet), RatioSet, Betas and gRatioSet were created and named with the ending 'original'", "All 'original' datasets were saved to the processed_data folder", "Summary table of preprocessing steps was saved to the reports folder", "Session info text file was updated", sep = "<br>\n") ``` ## Preprocessing summary table ```{r, preprocessing summary table} summary_table_preprocessing %>% paged_table() ``` ```{r, document session info into a text file, include=FALSE} connection <- file(paste0(user_choices$personal_path, "/", user_choices$project_name, "/session_info_all.txt"), "a+") writeLines("######################################################################", connection) writeLines("############################ Script 2: #############################", connection) writeLines("######################################################################", connection) sessioninfo <- sessionInfo() writeLines("\nR Version:", connection) writeLines(paste0(" ", sessioninfo$R.version$version.string), connection) writeLines("\nAttached packages:", connection) # nicely format packages (with version) package_version <- unlist(lapply(sessioninfo$otherPkgs, function(x){paste0(" ",x$Package, " (", x$Version, ")")})) names(package_version) <- NULL for(i in 1:length(package_version)) { writeLines(package_version[i], connection) } writeLines("\n", connection) close(connection) ``` ```{r runtime end, include=FALSE} end_time <- Sys.time() run_time <- as.numeric(round((end_time - start_time), 1), units = "mins") ``` ## Completion time ```{r, completion time, results = 'asis'} cat("The time for the script to run was:", run_time, "minutes", sep = " ") ```
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
You can’t perform that action at this time.