```{r parameters-and-defaults, include = FALSE}
module <- "scRNAseq"
section <- "normalization"
```
```{r parameter-merge, include = FALSE}
local_params <- module %>%
options() %>%
magrittr::extract2(module) %>%
magrittr::extract2(section) %>%
ReporteR.base::validate_params(parameters_and_defaults)
```
```{r scRNAseq-normalization-B-TMM-checks, include = FALSE}
num_figure_rows <- 3
assertive.sets::assert_is_subset(local_params$features, colnames(SummarizedExperiment::colData(object_filtered)))
if (assertive.properties::is_empty(local_params$features)) {
local_params$features <- c(NULL)
num_figure_rows <- 2
}
```
### TMM normalization
Scaling by library size is an intuitive form of normalization: a sample sequenced to half the depth is expected to yield, on average, half as many reads for each gene. This is generally appropriate for normalizing between replicate samples of the same RNA population. For many biological applications, however, library-size scaling is too simple. The number of reads expected to map to a gene depends not only on the expression level and length of that gene, but also on the composition of the RNA population being sampled. If a large number of genes are unique to, or highly expressed in, one experimental condition, less of the sequencing 'real estate' is left for the remaining genes in that sample. Unless it is adjusted for, this sampling artifact can skew the differential expression (DE) analysis towards one experimental condition.
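
The composition effect can be reproduced with a toy example. The sketch below uses hypothetical counts for two samples of 100 genes: in sample B, ten genes are ten-fold higher, while the other 90 genes are unchanged. The data are made up for illustration; the point is only what library-size scaling (here, counts per million) does to the unchanged genes.

```r
# Hypothetical toy counts for two samples of 100 genes.
# Genes 1-10 are 10-fold higher in sample B; genes 11-100 are unchanged.
sample_a <- rep(100, 100)
sample_b <- c(rep(1000, 10), rep(100, 90))

# Library-size scaling: counts per million (CPM)
cpm_a <- sample_a / sum(sample_a) * 1e6
cpm_b <- sample_b / sum(sample_b) * 1e6

# Apparent fold change (B / A) of an unchanged gene after library-size scaling
fc_unchanged <- cpm_b[11] / cpm_a[11]
print(round(fc_unchanged, 3))
print(round(log2(fc_unchanged), 2))
```

Although nothing happened to genes 11 to 100, after library-size scaling all 90 of them appear about 1.9-fold lower in sample B (a ratio of 10/19, roughly -0.93 on the log2 scale), simply because the ten up-regulated genes absorb a larger share of sample B's reads.
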

TMM normalization [@robinson_tmm_2010] estimates a global fold change in relative RNA production between two samples. It is an empirical strategy that equates the overall expression levels of genes between samples, under the assumption that the majority of genes are not differentially expressed. The ratio is estimated robustly as a weighted mean of the gene-wise log expression ratios (M values) between a sample and a reference, computed after trimming genes with extreme M values or extreme average expression (A values); hence the name trimmed mean of M values (TMM). Here, each cell is treated as a sample. Figure \@ref(fig:scRNAseq-normalization-B-TMM-figure) depicts the effects of this normalization strategy.
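
To make the estimator concrete: with $Y_{gk}$ the count of gene $g$ in sample $k$, $N_k$ the library size of sample $k$, $r$ the reference sample and $G^*$ the set of genes that survive trimming, the quantities defined in [@robinson_tmm_2010] are

$$M_g = \log_2\frac{Y_{gk}/N_k}{Y_{gr}/N_r}, \qquad A_g = \frac{1}{2}\log_2\left(\frac{Y_{gk}}{N_k}\cdot\frac{Y_{gr}}{N_r}\right), \qquad \log_2 \mathrm{TMM}_k^{(r)} = \frac{\sum_{g \in G^*} w_g M_g}{\sum_{g \in G^*} w_g},$$

with weights $w_g = \left(\frac{N_k - Y_{gk}}{N_k Y_{gk}} + \frac{N_r - Y_{gr}}{N_r Y_{gr}}\right)^{-1}$. The base-R function below is a minimal sketch of that computation for one sample against a given reference. It is a simplified re-implementation for illustration, not the edgeR code that scater calls; the function name `tmm_factor` and the toy data are hypothetical, and the default trims (30% of M values, 5% of A values at each end) mirror the `logratioTrim` and `sumTrim` defaults of `edgeR::calcNormFactors()`.

```r
# Minimal TMM sketch (hypothetical helper, not edgeR's implementation).
# 'obs' and 'ref' are raw count vectors over the same genes in the same order.
tmm_factor <- function(obs, ref, logratio_trim = 0.3, sum_trim = 0.05) {
  n_obs <- sum(obs)
  n_ref <- sum(ref)
  # M values: log2 ratio of library-size-scaled expression
  m <- log2((obs / n_obs) / (ref / n_ref))
  # A values: average log2 expression of the two samples
  a <- (log2(obs / n_obs) + log2(ref / n_ref)) / 2
  # Approximate (delta-method) variance of each M value; weights are 1 / v
  v <- (n_obs - obs) / n_obs / obs + (n_ref - ref) / n_ref / ref
  # Drop genes with a zero count in either sample (M or A not finite)
  ok <- is.finite(m) & is.finite(a)
  m <- m[ok]
  a <- a[ok]
  v <- v[ok]
  # Rank-based trimming of extreme M values and extreme A values at both ends
  n <- length(m)
  lo_m <- floor(n * logratio_trim) + 1
  hi_m <- n + 1 - lo_m
  lo_a <- floor(n * sum_trim) + 1
  hi_a <- n + 1 - lo_a
  keep <- rank(m) >= lo_m & rank(m) <= hi_m & rank(a) >= lo_a & rank(a) <= hi_a
  # Precision-weighted mean of the retained M values, back-transformed to a ratio
  2^(sum(m[keep] / v[keep]) / sum(1 / v[keep]))
}

# Hypothetical toy data: ten genes are strongly up-regulated in 'obs'
ref <- rep(100, 100)
obs <- c(rep(1000, 10), rep(100, 90))
f <- tmm_factor(obs, ref)
# Effective library size of 'obs' after the TMM correction
eff_lib <- sum(obs) * f
print(c(factor = f, effective_library = eff_lib))
```

Trimming discards the ten up-regulated genes, so the factor is estimated from the 90 unchanged genes alone: `f` comes out at 10/19 (about 0.53) and the effective library size of `obs` matches that of `ref`, which removes the composition bias in this toy case. edgeR additionally picks the reference sample itself and rescales the factors of all samples so that their product is 1; both steps are left out of this sketch.
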
```{r scRNAseq-normalization-B-TMM-processing, include = FALSE, echo = FALSE}
object_filtered <- scater::normalizeExprs(object_filtered, method="TMM")
SummarizedExperiment::assay(object_filtered, "norm_TMM") <- SummarizedExperiment::assay(object_filtered, "logcounts")
# Reset logcounts
SummarizedExperiment::assay(object_filtered, "logcounts") <- log2(SummarizedExperiment::assay(object_filtered, "counts") + 1)
```
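
The processing chunk above keeps the TMM-based log expression values in a separate `norm_TMM` assay and then restores `logcounts` to plain `log2(counts + 1)`, so that the figure can compare both. As a rough sketch of what `norm_TMM` contains, the snippet below assumes the convention older scater versions followed for `normalizeExprs(method = "TMM")`: per-cell size factors equal to library size times TMM factor, centred to a mean of 1, followed by `log2(count / size_factor + 1)`. That convention is an assumption here, and the count matrix and TMM factors are made-up numbers; in practice the factors would come from a function such as `edgeR::calcNormFactors()`.

```r
# Hypothetical count matrix: 3 genes (rows) x 3 cells (columns), filled column-wise
counts <- matrix(c(10, 0, 5,
                   20, 0, 10,
                   30, 5, 15), nrow = 3)
# Made-up TMM factors, one per cell (assumption)
tmm <- c(0.9, 1.1, 1.0)

# Effective size factors: library size x TMM factor, centred on 1 (assumed convention)
size_factors <- colSums(counts) * tmm
size_factors <- size_factors / mean(size_factors)

# TMM-normalized log expression, analogous to the "norm_TMM" assay
norm_tmm <- log2(sweep(counts, 2, size_factors, "/") + 1)

# Un-normalized log counts, analogous to the restored "logcounts" assay
logcounts_raw <- log2(counts + 1)
print(round(norm_tmm, 2))
```

Centring the size factors on 1 keeps the normalized values on roughly the same scale as the raw counts, which is what makes a side-by-side comparison with `log2(counts + 1)` meaningful.
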
```{r scRNAseq-normalization-B-TMM-figure-params, include = FALSE, echo = FALSE}
fig_height <- ReporteR.base::estimate_figure_height(
height_in_panels = num_figure_rows,
panel_height_in_in = params$formatting_defaults$figures$panel_height_in,
axis_space_in_in = params$formatting_defaults$figures$axis_space_in,
mpf_row_space = as.numeric(grid::convertUnit(grid::unit(5, 'mm'), 'in')),
max_height_in_in = params$formatting_defaults$figures$max_height_in)
```
```{r scRNAseq-normalization-B-TMM-figure, echo = FALSE, message=FALSE, warning=FALSE, fig.height = fig_height$global, fig.cap = paste("Results of TMM normalization.", caption_norm_pca, ifelse(num_figure_rows == 3, caption_norm_pca_extra, ""))}
figure_normalization_TMM <- multipanelfigure::multi_panel_figure(height = fig_height$sub, columns = 3, rows = num_figure_rows, unit = "in")
# Top row: log-transformed raw counts (logcounts = log2(counts + 1))
figure_normalization_TMM <- multipanelfigure::fill_panel(figure_normalization_TMM,
scater::plotPCASCE(object_filtered, ntop = 10, exprs_values = "logcounts", colour_by = local_params$features[1], add_ticks = FALSE) +
theme_norm_pca)
figure_normalization_TMM <- multipanelfigure::fill_panel(figure_normalization_TMM,
scater::plotExplanatoryVariables(object_filtered, exprs_values = "logcounts", variables = local_params$features) +
theme_norm_pca,
column = 2:3)
# Second row: TMM-normalized values
figure_normalization_TMM <- multipanelfigure::fill_panel(figure_normalization_TMM,
scater::plotPCASCE(object_filtered, ntop = 10, exprs_values = "norm_TMM", colour_by = local_params$features[1], add_ticks = FALSE) +
ggplot2::guides(colour = "none") +
theme_norm_pca)
figure_normalization_TMM <- multipanelfigure::fill_panel(figure_normalization_TMM,
scater::plotExplanatoryVariables(object_filtered, exprs_values = "norm_TMM", variables = local_params$features) +
theme_norm_pca,
column = 2:3)
# Optional third row: PCA of normalized values coloured by each of the first three variables
if(num_figure_rows == 3) {
for(i in 1:min(3, length(local_params$features))) {
figure_normalization_TMM <- multipanelfigure::fill_panel(figure_normalization_TMM,
scater::plotPCASCE(object_filtered, ntop = 10, exprs_values = "norm_TMM", colour_by = local_params$features[i], add_ticks = FALSE) +
ggplot2::guides(colour = "none") +
theme_norm_pca)
}
}
figure_normalization_TMM
```
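
The wide panels in the first two rows come from `scater::plotExplanatoryVariables()`, which shows, for each selected cell-level variable, the distribution across genes of the variance in expression explained (R²) by that variable. The base-R sketch below computes the same kind of per-gene R² for a single variable on simulated data. The data and the `batch` variable are hypothetical, and scater's own implementation differs in details such as how the results are plotted.

```r
set.seed(42)
# Simulated log expression: 50 genes (rows) x 20 cells (columns)
n_genes <- 50
batch <- factor(rep(c("b1", "b2"), each = 10))  # hypothetical cell-level variable
expr <- matrix(rnorm(n_genes * 20), nrow = n_genes)
# Give the first 10 genes a strong batch effect
expr[1:10, batch == "b2"] <- expr[1:10, batch == "b2"] + 3

# Per-gene R^2 of a linear model of expression on the batch variable
r2 <- apply(expr, 1, function(y) summary(lm(y ~ batch))$r.squared)
print(summary(100 * r2))  # percentage of variance explained, per gene
```

After a successful normalization, technical variables such as library size should explain less of the variance in the normalized values than in the raw log counts, which is what comparing the first and second rows of the figure is meant to reveal.
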