```{r parameters-and-defaults, include = FALSE}
module <- "scRNAseq"
section <- "normalization"
```

```{r parameter-merge, include = FALSE}
local_params <- module %>%
  options() %>%
  magrittr::extract2(module) %>%
  magrittr::extract2(section) %>%
  ReporteR.base::validate_params(parameters_and_defaults)
```

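```{r scRNAseq-normalization-Z-compare-checks, include = FALSE}
# Variable used to colour cells in the RLE plot: an explicit `batch` parameter
# takes precedence, otherwise the first entry of `features` is used; NULL
# (no colouring) if neither is set.
if (assertive.properties::is_non_empty(local_params$batch)) {
  batch <- local_params$batch
} else if (assertive.properties::is_non_empty(local_params$features)) {
  batch <- local_params$features[1]
} else {
  batch <- NULL
}
```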

### Comparison

A gene expression normalization procedure aims to remove unwanted variation introduced by technical confounders, such as differences in sequencing depth, or by processing samples in different laboratories, on different days, or on different machines (so-called *batch effects*). **R**elative **L**og **E**xpression (RLE) plots [@gandolfo_rle_2018] can be used to assess whether a particular normalization strategy has succeeded in removing this unwanted variation.

RLE values are computed as the deviation of each feature's expression from the median expression of that feature across all samples (here: cells) of the experiment. If $Y_{i,j}$ denotes the (potentially normalized) log expression of gene $i$ in cell $j$, the RLE value of gene $i$ in cell $j$ is

$$RLE_{i,j} = Y_{i,j} - \operatorname{median}_{k}\left(Y_{i,k}\right),$$

where the median is taken over all cells $k$ of the experiment.
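The formula translates directly into base R. The sketch below assumes a genes × cells matrix of log expression values; the function name `compute_rle` and the 3 × 4 toy matrix are hypothetical and serve only as an illustration (the figure in this section is produced with `scater::plotRLE`, not with this code).

```r
# Hypothetical helper: RLE values for a genes x cells matrix of log expression.
# Each gene's (row's) median across cells is subtracted from every entry of that row.
compute_rle <- function(logexpr) {
  gene_medians <- apply(logexpr, 1, median)
  sweep(logexpr, 1, gene_medians, FUN = "-")
}

# Hypothetical toy data: 3 genes x 4 cells, already on the log scale
Y <- matrix(c(1, 2, 3, 4,
              2, 2, 2, 2,
              5, 3, 4, 6),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

rle_mat <- compute_rle(Y)
rle_mat

# Per-cell medians of the RLE values, i.e. the centres of the boxplots in an RLE plot
apply(rle_mat, 2, median)
# cell1 cell2 cell3 cell4
#   0.0  -0.5   0.0   1.5
```

By construction every row of `rle_mat` has median zero; the per-cell summaries are what an RLE plot displays, and here `cell4` would stand out with a median deviation of 1.5.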

An RLE plot shows the distribution of the RLE values of each sample as a boxplot (Figure \@ref(fig:scRNAseq-normalization-Z-compare-figure)). Under the assumption that most features are not differentially expressed between samples and that no unwanted variation remains, the boxplots should be centered around 0 and have similar spreads, reflecting only random variation. Other behavior can be a sign of strong between-sample heterogeneity and/or of a failure to remove unwanted variation.
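To illustrate this behavior, the following sketch simulates Poisson counts for cells that differ only in sequencing depth. All numbers (500 genes, the depth factors, the pseudocount of 1) are illustrative assumptions, and the known simulated size factors stand in for estimated ones such as those produced by the sumfactor or TMM methods compared in the figure.

```r
set.seed(1)

n_genes <- 500
# Hypothetical per-cell sequencing-depth (size) factors; cells are otherwise identical
size_factors <- c(0.25, 0.5, 1, 2, 4, 8)
gene_means <- exp(runif(n_genes, log(100), log(1000)))

# Poisson counts: expected count = gene mean x cell size factor
counts <- matrix(
  rpois(n_genes * length(size_factors), outer(gene_means, size_factors)),
  nrow = n_genes
)

# RLE values: deviation of each entry from its gene's median across cells
compute_rle <- function(logexpr) {
  sweep(logexpr, 1, apply(logexpr, 1, median), FUN = "-")
}

# Unnormalized vs. size-factor-normalized log2 expression (pseudocount 1)
log_raw  <- log2(counts + 1)
log_norm <- log2(sweep(counts, 2, size_factors, FUN = "/") + 1)

rle_raw  <- compute_rle(log_raw)
rle_norm <- compute_rle(log_norm)

# Per-cell RLE medians: the boxplot centres of an RLE plot
round(apply(rle_raw, 2, median), 2)
round(apply(rle_norm, 2, median), 2)

# Side-by-side RLE plots, one box per cell
op <- par(mfrow = c(1, 2))
boxplot(rle_raw, main = "log2(counts + 1)", ylab = "RLE")
abline(h = 0, lty = 2)
boxplot(rle_norm, main = "size-factor normalized")
abline(h = 0, lty = 2)
par(op)
```

In the unnormalized data the per-cell medians increase with the simulated depth factors (roughly by log2 of each factor, minus a common offset), whereas after division by the size factors they sit close to zero with comparable spreads, which is the pattern a successful normalization should show.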

```{r scRNAseq-normalization-Z-compare-figure, echo=FALSE, message=FALSE, warning=FALSE, fig.cap="Relative log expression (RLE) values for different normalizations. Boxplots represent the log deviations from the gene-wise median and should be centered near zero with a similar spread. Watch out for other behavior, which can be a sign of failed or inappropriate normalization."}
# Normalized assays to compare, named "norm_<method>" in the object
methods <- c("sumfactor", "TMM")
exprs_mat_list <- setNames(as.list(paste0("norm_", methods)), methods)
# TPM ("abundance") and raw counts are unlogged; the normalized assays are logged
scater::plotRLE(
  object_filtered,
  exprs_mats = c(list(tpm = "abundance", counts = "counts"), exprs_mat_list),
  exprs_logged = c(FALSE, FALSE, rep(TRUE, times = length(exprs_mat_list))),
  colour_by = batch
)
```