```{r parameters-and-defaults, include = FALSE}
module <- "scRNAseq"
section <- "dimension_reduction"
```

```{r parameter-merge, include = FALSE}
local_params <- module %>%
  options() %>%
  magrittr::extract2(module) %>%
  magrittr::extract2(section) %>%
  ReporteR.base::validate_params(parameters_and_defaults)
```

```{r scRNAseq-dimension-reduction-E-rsvd-checks, include = FALSE}
assertive.sets::assert_is_subset(local_params$features, colnames(SummarizedExperiment::colData(object_filtered)))
```
### Randomized singular value decomposition

**S**ingular **v**alue **d**ecomposition (SVD) factorizes a matrix into the product of three matrices. Matrix decompositions play a fundamental role in applied mathematics, statistical computing, and machine learning. Low-rank matrix decompositions in particular are widely used for data analysis, dimensionality reduction, and data compression. Geometrically, the singular values are proportional to the lengths of the semi-axes of an ellipsoid fitted to the data by least squares, and for a centered input matrix their sum of squares is proportional to the total variance. Truncating the decomposition (i.e. keeping only the largest singular values, which account for most of the variance) therefore compresses the data. Classical SVD is computationally expensive, especially for large inputs. Leveraging **randomness** is an effective strategy to derive a much smaller matrix from a high-dimensional matrix that still captures most of its information, by randomly sampling the range of the input [@halko_rsvd_2009; @erichson_rsvd_2016]. The randomness should not obscure essential information in the data, as long as the input matrix contains some low-rank structure. A deterministic matrix factorization (SVD) is then applied to the smaller matrix to obtain a near-optimal low-rank approximation. This approximation is typically much faster to compute than a classical SVD at comparable accuracy, which makes it suitable for scalable *big data* applications.
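
The steps described above (random sampling of the range, followed by a deterministic SVD of the much smaller matrix) can be sketched in base R. This is a minimal illustration in the spirit of [@halko_rsvd_2009], not the implementation used by this report: the function `randomized_svd()`, its arguments `oversample` and `n_power_iter`, and the simulated matrix are all assumptions made for the example.

```r
# Illustrative sketch only; all names below are hypothetical and not part of
# singlecellutils or any other package used in this report.
randomized_svd <- function(A, k, oversample = 10L, n_power_iter = 2L) {
  # Number of random probes: target rank plus a small safety margin
  l <- min(k + oversample, nrow(A), ncol(A))
  # Gaussian test matrix used to sample the range of A
  omega <- matrix(stats::rnorm(ncol(A) * l), nrow = ncol(A), ncol = l)
  # Orthonormal basis Q for the sampled range
  Q <- qr.Q(qr(A %*% omega))
  # Optional power iterations sharpen the basis when singular values decay slowly
  for (i in seq_len(n_power_iter)) {
    Z <- qr.Q(qr(crossprod(A, Q)))
    Q <- qr.Q(qr(A %*% Z))
  }
  # Project A onto the basis and run a deterministic SVD on the small matrix
  B <- crossprod(Q, A)
  s <- svd(B)
  keep <- seq_len(k)
  list(d = s$d[keep],
       u = (Q %*% s$u)[, keep, drop = FALSE],
       v = s$v[, keep, drop = FALSE])
}

# Simulated input with rank-5 structure plus a little noise
set.seed(1)
m <- 200; n <- 50; r <- 5
A <- matrix(rnorm(m * r), m, r) %*% matrix(rnorm(r * n), r, n) +
  0.01 * matrix(rnorm(m * n), m, n)

approx <- randomized_svd(A, k = r)
exact <- svd(A)

# Largest relative deviation of the approximate singular values from the exact ones
max_rel_err <- max(abs(approx$d - exact$d[seq_len(r)]) / exact$d[seq_len(r)])
# Share of the total sum of squares retained by the r leading singular values
retained <- sum(approx$d^2) / sum(A^2)
```

In this sketch `retained` corresponds to the truncation idea from the text: the fraction of the total sum of squares kept by the leading singular values. For real data, where the low-rank structure is less clean than in the simulation, more oversampling or more power iterations may be needed.
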

**U**niform **M**anifold **A**pproximation and **P**rojection (UMAP) [@mcinnes_umap_2018] is a manifold learning technique for dimension reduction that seeks to preserve more of the global data structure than t-SNE. UMAP constructs a topological representation of the high-dimensional data and then optimizes a low-dimensional representation to be as structurally similar to it as possible. It builds on mathematical foundations related to Laplacian eigenmaps and category theoretic approaches to the geometric realization of fuzzy simplicial sets.

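
As a rough illustration of the second step, the following sketch embeds an already reduced matrix into two dimensions. It assumes the `uwot` package is installed, which this report does not itself require, and it uses a random matrix named `reduced` as a hypothetical stand-in for the cells-by-components output of a preceding SVD. It is not the code path taken by `singlecellutils::reduce_dimension`.

```r
# Illustrative only: assumes the uwot package is available; the chunk is
# skipped silently otherwise.
if (requireNamespace("uwot", quietly = TRUE)) {
  set.seed(1)
  # Hypothetical stand-in for a cells x components matrix from a prior SVD step
  reduced <- matrix(rnorm(300 * 10), nrow = 300, ncol = 10)
  # Two-dimensional embedding for plotting; n_neighbors controls how local the
  # neighbourhood graph is
  embedding <- uwot::umap(reduced, n_neighbors = 10, n_components = 2)
  dim(embedding)
}
```

Smaller values of `n_neighbors` emphasize local structure, while larger values favour the global layout.
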
```{r scRNAseq-dimension-reduction-E-rsvd-processing, echo=FALSE, include=FALSE}
# Optionally restrict the decomposition to features flagged in rowData(...)$is_het
features <- NULL
if (local_params$subset_het) {
  features <- SummarizedExperiment::rowData(object_filtered)$is_het
}
# Randomized SVD on the selected assay, then UMAP on the resulting components
object_filtered <- singlecellutils::reduce_dimension(object_filtered, exprs_values = local_params$assay, flavor = "rsvd", features = features, n_dims = local_params$dims)
object_filtered <- singlecellutils::reduce_dimension(object_filtered, flavor = "umap", slot = "rsvd_umap", exprs_values = NULL, use_dimred = "rsvd", n_neighbors = 10L)
```

```{r scRNAseq-dimension-reduction-E-rsvd-figure-params, include = FALSE, echo = FALSE}
fig_height <- ReporteR.base::estimate_figure_height(
  height_in_panels = 1,
  panel_height_in_in = params$formatting_defaults$figures$panel_height_in,
  axis_space_in_in = params$formatting_defaults$figures$axis_space_in,
  mpf_row_space = as.numeric(grid::convertUnit(grid::unit(5, 'mm'), 'in')),
  max_height_in_in = params$formatting_defaults$figures$max_height_in)
caption <- glue::glue("Dimensionality reduction using **randomized SVD** [@halko_rsvd_2009; @erichson_rsvd_2016], keeping the first {local_params$dims} singular values, followed by a further reduction into two dimensions using **UMAP** [@mcinnes_umap_2018] for visualization.")
```

```{r scRNAseq-dimension-reduction-E-rsvd-figure, echo = FALSE, message=FALSE, warning=FALSE, fig.height = fig_height$global, fig.cap = caption}
# Show at most three features, one panel per feature
n_panels <- min(3, length(local_params$features))
figure_dimred_rsvd <- multipanelfigure::multi_panel_figure(height = fig_height$sub, columns = n_panels, rows = 1, unit = "in")
# seq_len() avoids the 1:0 pitfall if no features are configured
for (i in seq_len(n_panels)) {
  tmp_plot <- object_filtered %>%
    scater::plotReducedDim(use_dimred = "rsvd_umap", colour_by = local_params$features[i], add_ticks = FALSE) +
    ggplot2::guides(colour = "none") +
    theme_dimred_scatter
  figure_dimred_rsvd <- multipanelfigure::fill_panel(figure_dimred_rsvd, tmp_plot)
}
figure_dimred_rsvd
```