Permalink
Cannot retrieve contributors at this time
Name already in use
A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
bit_R_workshop/R_workshop.2.Rmd
Go to fileThis commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
768 lines (597 sloc)
24.7 KB
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
--- | |
title: "R Workshop -- Day 2" | |
author: "Franziska Metge" | |
output: | |
html_document: default | |
pdf_document: default | |
--- | |
```{r "setup", include=FALSE} | |
require("knitr") | |
opts_knit$set(root.dir = "~/bit_R_workshop") | |
``` | |
# Outline | |
1. Loops (solution to yesterday's homework) | |
3. Write your own function | |
4. Installing R packages (cran and bioconductor) | |
5. Basic plotting functions | |
<!-- 6. Step by Step DEG analysis to take home --> | |
# Yesterday's Homework | |
1. Load the table generated in 'create_gene_expression_table.R' using `source()` | |
2. Return all genes which are significant differentially expressed | |
3. Use the `t.test()$p.value` and a for loop | |
4. Groups are as following: | |
+ Group 1: Rep 1-4 | |
+ Group 2: Rep 5-8 | |
```{r eval = T} | |
source('data/create_gene_expression_table.R') | |
head(Experiment_DEG) | |
``` | |
```{r eval = T} | |
# number of rows | |
nrow(Experiment_DEG) | |
# Calculate the t.test for each gene using a for loop | |
for(i in 1:nrow(Experiment_DEG)){ | |
ttest = t.test(Experiment_DEG[i, 2:5], Experiment_DEG[i, 6:9]) | |
Experiment_DEG[i, 'pvals'] = as.numeric(ttest$p.value) | |
} | |
sig_genes = subset(Experiment_DEG, pvals < 0.05) | |
head(sig_genes) | |
nrow(sig_genes) | |
``` | |
# Write your own function | |
When to write a function: | |
- if you repeat the same procedure with minor changes or on different data | |
How is the basic structure of a function | |
```{r eval = F} | |
myFunctionName <- function(arg1, arg2, ...){ | |
<Do something with arg1 and arg2...> | |
return(result) | |
} | |
``` | |
There are two main principles how to write a function | |
- Think first and design the function as you go along (more experianced programmers) | |
- Wrap a function around already written code | |
## Easy examples | |
You want a function that calculates the square of a given number | |
```{r eval = T} | |
Square <- function(x){ | |
result = x * x | |
return(result) | |
} | |
Square(3) | |
``` | |
No we want to return the square and the qube of a given number | |
```{r eval = T} | |
exponent_2_and_3 <- function(x){ | |
sq = x * x | |
cb = x * x * x | |
return(c(sq, cb)) | |
} | |
exponent_2_and_3(3) | |
``` | |
Sometimes it can be useful to return a list | |
```{r eval = T} | |
exponent_2_and_3_list <- function(x){ | |
sq = x * x | |
cb = x * x * x | |
return(list(square = sq , cube = cb)) | |
} | |
exponent_2_and_3_list(3) | |
``` | |
## Hands On | |
* Write a function that takes in two numbers and returns the log2 fold change. | |
- calculate log2 fold change: `log2(y/x)` | |
```{r, eval = T, echo = F, collapse = TRUE} | |
my_log2fc <- function(x, y){ | |
lf = log2(y/x) | |
return(lf) | |
} | |
print("my_log2fc(3, 6)") | |
my_log2fc(3, 6) | |
``` | |
* Write a function that takes in two numbers and returns the bigger one or both if they are equal. | |
- remember the signs to for bigger than `>`, less than `<`, or equal to `==` | |
```{r, eval = T, echo = F, collapse = TRUE} | |
my_bigger <- function(x, y){ | |
if(x > y){ | |
return(x) | |
} else if(y > x){ | |
return(y) | |
} else { | |
return(c(x, y)) | |
} | |
} | |
print("my_bigger(3, 5)") | |
my_bigger(3, 5) | |
``` | |
## Hands on | |
Write a function (`return_bigger <- function(arguments){do something}`) which takes in a number `x` and a vector `v` of numbers. Return all numbers bigger than the given number. | |
Using a for loop (`for(i in v){do something}`) and if (`if(condtions){do something}`) statements. You can use `append()` to add a value to a vector. First create an empty vector `y <- numeric(0)`, then fill vector using `for`, `if` and `y <- append(y, number)`. | |
```{r eval = T, echo= F} | |
return_bigger <- function(x, v){ | |
bigger = numeric(0) | |
for(i in v){ | |
if(i > x){ | |
bigger = append(bigger, i) | |
} | |
} | |
return(bigger) | |
} | |
``` | |
When you finished writing the function try to run it on the following test case: | |
```{r eval = T} | |
return_bigger(5, c(10,3,2,16,4,7)) | |
``` | |
Did it work? Let's discuss some other occasions it is useful to write or use a function. | |
### A bit more complex example of a function | |
```{r eval = T, collapse=TRUE} | |
myVolcanoPlot <- function(D, group1, group2, alpha ){ | |
# Input: | |
# D > dataframe to be analyzed | |
# group1 > column indeces for group1 | |
# group2 > column indeces for group2 | |
# alpha > p.value cut off for significant genes | |
# this function calculates the pvalue using a t.test | |
# the group means and the log2fc and makes a plot | |
# calculate the pvalue for each row using apply | |
for(i in 1:nrow(D)){ | |
# calculate p.value | |
D[i,'p.value'] = as.numeric(t.test(D[i,group1], D[i,group2])$p.value) | |
# calculate mean for group 1 | |
D[i,'mean.g1'] = mean(as.numeric(D[i,group1]), na.rm = TRUE) | |
# calculate mean for group 2 | |
D[i,'mean.g2'] = mean(as.numeric(D[i,group2]), na.rm = TRUE) | |
# calculate log2 fold change | |
D[i,'log2fc'] = log2(D[i,'mean.g2']/D[i,'mean.g1']) | |
} | |
D[, 'q.value'] = p.adjust(D[,'p.value'], method = 'fdr') | |
plot(D[,'log2fc'], -log10(D[,'q.value']), pch = 19, | |
xlab = 'log2fc', ylab = '-log10(q.value)', cex.axis = 1.3, cex.lab = 1.3, | |
col = ifelse(D[,'q.value'] <= alpha, | |
ifelse(D[, 'log2fc'] < 0, '#2166ac50', '#b2182b50'), | |
'#bdbdbd30')) # nested ifelse call to make three colors | |
# add legend | |
legend('bottomleft', c('n.s', 'sig down', 'sig up'), col = c('#bdbdbd30', '#2166ac50', '#b2182b50'), pch = 19, cex = 0.8, bty = 'n') | |
return(D) | |
} | |
head(Experiment_DEG) | |
AD = myVolcanoPlot(Experiment_DEG, 2:5, 6:9, 0.05) | |
head(AD) | |
``` | |
*** | |
# Install packages | |
If you cannnot find a function for what you are looking for, odds are, that somebody else already encountered the same problem and wrote a package for it. There are two main resources for additional packages in R. | |
The more general packages can be found on [Cran](https://cran.r-project.org/). Packages related to analysis of biological data are most often found on [Bioconductor](https://www.bioconductor.org/). | |
First find a package you are looking for via Google and read through the [manual](https://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf) or the vingette to confirm that it is the right package for you. Ask yourself the following questions: | |
1. Is it applicable to my data? | |
2. Is it returning results I am interested in? | |
3. Is it being maintained? | |
4. Do I find the manual somewhat easy to understand? | |
If you are happy with the package, download it either through `install.packages()` if you found it on Cran, or `biocLite()` if you found it on Bioconductor. | |
### Cran | |
```{r eval = F} | |
# install the package, only need to do this once | |
install.packages('pheatmap') | |
# every time you start R and want to use the package, you need to load it | |
library(pheatmap) | |
``` | |
### Bioconductor | |
```{r eval = F} | |
if (!requireNamespace("BiocManager", quietly = TRUE)) | |
install.packages("BiocManager", repos='http://cran.us.r-project.org' , dependencies = TRUE) | |
BiocManager::install("edgeR") | |
library(edgeR) | |
``` | |
We will install more packages later. In general it is quite straight forward. | |
*** | |
# Basic plotting functions | |
With R you can make various different plots and graphics. In this Workshop we will only cover basic plotting functions but we are more than happy to help you with a specific plot you are interested in. | |
## Scatter plot | |
Simple scatter plot. A scatterplot can be used to view how one variable (y) behaves with respect to another variable (x). | |
```{r} | |
# create two vectors with random variables | |
# 100 random numbers from a uniform distribution between 20 and 80 | |
x = runif(100, 20, 80) | |
# 100 random numbers from a normal distribution but the mean is differen | |
# for every random number (depends on x) | |
y = rnorm(100, x, 10) | |
# plot | |
plot(x, y ) | |
# make the plot a bit prettier | |
# add title and axis labels | |
plot(x, y, main = 'Simple Plot', xlab = 'WT', ylab = 'Treat1') | |
# make the text bigger | |
plot(x, y, main = 'Simple Plot', xlab = 'WT', ylab = 'Treat1', | |
cex.main = 2, cex.axis = 1.5, cex.lab = 1.5) | |
# change the color and shape of the dots | |
plot(x, y, main = 'Simple Plot', xlab = 'WT', ylab = 'Treat1', | |
cex.main = 2, cex.axis = 1.5, cex.lab = 1.5, | |
pch = 17, col = 'olivedrab') | |
# plot with shape and colors | |
plot(x, y, main = 'Simple Plot', xlab = 'WT', ylab = 'Treat1', | |
cex.main = 2, cex.axis = 1.5, cex.lab = 1.5, | |
pch = ifelse(abs(x - mean(x)) > sd(x), 6, 17), | |
col = ifelse(y > x, 'firebrick2', 'dodgerblue2')) | |
``` | |
<!-- | |
add pictures of possbile shapes and colors in R | |
--> | |
```{r} | |
# add a legends to the plot | |
plot(x, y, main = 'Simple Plot', xlab = 'WT', ylab = 'Treat1', | |
cex.main = 2, cex.axis = 1.5, cex.lab = 1.5, cex = 1.5, | |
pch = ifelse(abs(x - mean(x)) > sd(x), 6, 17), | |
col = ifelse(y > x, 'firebrick2', 'dodgerblue2')) | |
legend('topleft', c('outside sd', 'inside sd'), pch = c(6, 17), cex = 0.8) | |
legend('bottomright', c('x < y', 'x > y'), fill = c('firebrick2', 'dodgerblue2'), | |
cex = 0.8, bty = 'n') | |
# add a regression line | |
abline(lm(y ~ x), lty = 2, lwd = 1.5) | |
``` | |
*** | |
Resources to help you choose colors for your plot: | |
- R color brewer: https://colorbrewer2.org/ | |
- R color pdf: http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf | |
### Hands On | |
* Read in your test data frame from the file `airway_gene_expression.tsv` and `airway_sample_info.tsv`. | |
- `D = read.delim2(file, as.is = T, header = T)` | |
* Plot the mean over all samples before the intervention against the mean of all samples after the intervention with a nice title and axis labels. | |
- select columns based on vector `airway.D[airway.I[airway.I$Intervention == "before",]$SampleName]` | |
- calculate row wise mean (`rowMeans()`) | |
- plot data (`plot(x = mean_before, y = mean_after, ...)`) | |
* Change the shape and color of the data points. | |
- `plot(..., pch = ,...)` to change shape | |
- `plot(..., col = ,...)` to change color | |
* Change the axis to log scale. | |
- `plot(..., log = 'xy', ...)` | |
```{r, eval = TRUE, echo = FALSE, collapse = TRUE, warning = FALSE} | |
airway.D = read.delim2('data/airway_gene_expression.tsv', as.is = T, header = T) | |
airway.I = read.delim2('data/airway_sample_info.tsv', as.is = T, header = T) | |
print('head(airway.D)') | |
head(airway.D) | |
print('head(airway.I)') | |
head(airway.I) | |
mean.before = rowMeans(airway.D[airway.I[airway.I$Intervention == "before",]$SampleName]) | |
mean.after = rowMeans(airway.D[airway.I[airway.I$Intervention == "after",]$SampleName]) | |
plot(x = mean.before, y = mean.after, pch = 19, col = 'olivedrab', main = 'Airway Data', xlab = 'before intervention [avg]', ylab = 'after intervention [avg]', log = 'xy') | |
``` | |
## Volcano plot | |
A volcano plot is a special type of scatter plot with the log2fc on the x-axis and the p-value on the y-axis. Usually, the significant dots are colored in red and the not significant dots in gray. In our field of research, these dots usually represent genes, proteins or metabolites. The fold change is calculate between two different groups (Condition, age, strain, etc.). The p.value is usually calculated a statistical test (t.test, wilcox.test). | |
```{r, eval = T, collapse = T} | |
head(Experiment_DEG) | |
# calculate p.values using t.test and apply | |
for(i in 1:nrow(Experiment_DEG)){ | |
# calculate p.value | |
Experiment_DEG[i,'p.value'] = as.numeric(t.test(Experiment_DEG[i,2:5], Experiment_DEG[i,6:9])$p.value) | |
} | |
# to calculate the log2fc you first need to calculate the genewise mean of both groups | |
mean_groupA = rowMeans(Experiment_DEG[,2:5]) | |
mean_groupB = rowMeans(Experiment_DEG[,6:9]) | |
# calculate the log2fc based on the group means | |
log2fc = log2(mean_groupB/mean_groupA) | |
# plot the data | |
plot(x = log2fc, y = -log(Experiment_DEG$p.value), col = as.factor(Experiment_DEG$p.value < 0.05), | |
main = 'Volcano plot', pch = 19, cex.axis = 1.3, cex.lab = 1.5, cex.main = 2) | |
legend('top', c('n.s.', 'p < 0.05'), col = c('black', 'red'), pch = 19, bty = 'n') | |
``` | |
Not exactly how most volcano plots look like, but this nicely represents how the synthetic data was generated. | |
### Hands On | |
Work on the airway dataset | |
```{r, collapse = T, echo = TRUE} | |
airway.ge = read.delim("data/airway_gene_expression.tsv", as.is = T, header = T, sep = '\t') | |
# make Gene ID to row names | |
head(airway.ge, n = 5) | |
``` | |
* Remove low abundant `rowMeans < 5` genes from the airway data | |
- calculate the `RM = rowMeans()` using columns `2:9` of `airway.ge` | |
- subset `airway.ge` using `subset(airway.ge, RM > 5)` | |
* Calculate a p-value for your test data frame | |
- use a `for(i in 1:nrow(expressed.airway)){ calculate p-value for row i}` | |
- calculate p.value using t.test `as.numieric(t.test()$p.value)` | |
* Calculate the log2 fold changes using average across the replicates | |
- calculate `mean.before = rowMean(expressed.airway[, <correct columns>])` | |
- calculate `mean.after = rowMean(expressed.airway[, <correct columns>])` | |
- `log2(mean.after/mean.before)` | |
* Plot a volcano plot | |
- `plot(x = log2fc, y = -log10(p.value), ...)` | |
* Add a line at p = 0.01 | |
- add horizontal lines using `abline(h = )` | |
* Add a legend | |
- add legend using `legend()` | |
```{r, eval = T, echo = FALSE, collapse=TRUE} | |
airway.D = subset(airway.D, rowMeans(airway.D[, 2:9]) > 5) | |
mean.before = rowMeans(airway.D[airway.I[airway.I$Intervention == "before",]$SampleName]) | |
mean.after = rowMeans(airway.D[airway.I[airway.I$Intervention == "after",]$SampleName]) | |
log2fc = log2(mean.after/mean.before) | |
print("log2fc") | |
head(log2fc) | |
p.value = apply(airway.D[, 2:9], 1, | |
function(x) {as.numeric(t.test(x[c(1, 3, 5, 7)], x[c(2, 4, 6, 8)])$p.value)}) | |
print("pvalue") | |
head(p.value) | |
plot(x = log2fc, y = -log(p.value), col = as.factor(p.value < 0.05), | |
main = 'Volcano plot', pch = 19, cex.axis = 1.3, cex.lab = 1.5, cex.main = 2) | |
abline(h = -log(0.01)) | |
legend('top', c('n.s.', 'p < 0.05'), col = c('black', 'red'), pch = 19, bty = 'n') | |
``` | |
## PCA plot | |
- The principal component analysis (pca) is an orthogonal transformation | |
- Correlated variables → linearly uncorrelated variables (principal components, PC) | |
- Dimensionality reduction, # PC <= min(# observation, # variables) | |
- First PC has the largest possible variance | |
- Different algorithms to obtain PCs | |
- PCA is mostly used as a tool in: | |
- exploratory data analysis | |
- visualize genetic distance and relatedness between populations. | |
```{r, eval = T, collapse = F} | |
# load example data | |
wine = read.csv('data/wine.csv') | |
head(wine) | |
# define the first column as a factor | |
wineClasses = factor(wine$Cvs) | |
# compute principal component | |
winePCA = prcomp(scale(wine[, -1])) | |
# look at the results | |
summary(winePCA) | |
# look at the names of the results in order to find out what you can use to plot | |
names(winePCA) | |
# plot first vs. second component as scatterplot | |
plot(winePCA$x[,1:2], col = wineClasses, pch = 19) | |
``` | |
### Hands On | |
* Perform a PCA on the your test data (for the samples) | |
- calculate pca only on numeric data `prcomp(t(airway.ge[,<correct columns>]))` | |
* Plot the data using the coordinate system of the first and second component | |
- plot data in new coordinate system: `plot(pca$x[, <correct column for first and second component>], ...)` | |
* Color dots based on before and after intervention | |
- `plot(..., col = , ...)` | |
* Give each patient a different shape | |
- `plot(..., pch = , ...)` | |
* Add a legend | |
- `legend()` | |
```{r, eval = TRUE, echo = FALSE, collapse = TRUE} | |
airway.pca = prcomp(scale(t(airway.D[, -1]))) | |
print(summary(airway.pca)) | |
plot(airway.pca$x[,1:2], col = as.factor(airway.I$Intervention), pch = c(1,1,2,2,3,3,4,4)) | |
legend('topleft', levels(as.factor(airway.I$Intervention)), pch = 1, col = as.factor(airway.I$Intervention)) | |
legend('topright', paste("Patient", c(1, 2, 3, 4)), pch = c(1, 2, 3,4), col = 'black') | |
``` | |
## Other plot option | |
Using the `type` argument in the `plot(x, y, ..., type = ...)` function allows you to change the style how the data is portrayed. This table gives an overview of the different options | |
| symbol | plot | | |
|--|--| | |
| "p" | for points (default)| | |
| "l" | for lines | | |
| "b" | for lines between the dots, but dots as points | | |
| "c" | for the lines part alone of "b" | | |
| "o" | for lines and dots to be 'overplotted'| | |
| "h" | for 'histogram' like vertical lines | | |
| "s" | for stair steps | | |
| "S" | for other steps, see 'Details'| | |
| "n" | for no plotting | | |
Example | |
```{r, eval = T} | |
y = rnorm(50) | |
plot(y, type = 'b') | |
plot(y, type = 'h') | |
plot(y, type = 's') | |
``` | |
## Histogram | |
```{r eval = T} | |
# 100 random numbers from a uniform distribution between 20 and 80 | |
x = rnorm(1000, 20, 80) | |
# a simple histogram | |
hist(x) | |
# sometimes it is useful to separate the data into more bins | |
hist(x, breaks = 20) | |
# you can also store the information generated by the histogram into a new variable | |
x.hist = hist(x, breaks = 20) | |
head(x.hist) | |
``` | |
You can add the title, x-axis label and color like in the plot function, lets play around a bit `hist(x, main = , col = , xlab = , xlim = )` | |
To overlap two distributions, simply add the argument `add = TRUE` to the second plot call | |
```{r} | |
x = runif(100, 20, 80) | |
y = rnorm(100, x, 10) | |
hist(x, xlab = 'WT',main = 'Simple Histogram', | |
cex.main = 2, cex.axis = 1.5, cex.lab = 1.5, | |
col = 'gray', xlim = c(0, 100)) | |
hist(y,col = '#0000FF50', add = T) | |
``` | |
### Hands On | |
Work on airway data: | |
```{r, collapse=TRUE} | |
airway.D = read.delim2('data/airway_gene_expression.tsv', as.is = T, header = T) | |
airway.I = read.delim2('data/airway_sample_info.tsv', as.is = T, header = T) | |
# make Gene ID to row names | |
head(airway.ge, n = 5) | |
# subset to rows with mean > 5 | |
airway.D = subset(airway.D, rowMeans(airway.D[, 2:9]) > 5) | |
# claculate averages for before and after intervention | |
mean.before = rowMeans(airway.D[airway.I[airway.I$Intervention == "before",]$SampleName]) | |
mean.after = rowMeans(airway.D[airway.I[airway.I$Intervention == "after",]$SampleName]) | |
``` | |
* Draw a histogram of the `log2(mean)` before the intervention (average over all replicates) | |
+ `hist(... , breaks = 100)` | |
* Add the histogram of log2(mean) after the intervention (average over all replicates) | |
+ using the same breaks `H1 = hist(...)` and `H2 = hist(..., breaks = H1$breaks, add = TRUE )` | |
```{r, eval = T, echo = F} | |
H1 = hist(log2(mean.before), breaks = 100, col = 'darkgray') | |
hist(log2(mean.after), breaks = H1$breaks, col = '#20502050', add = TRUE) | |
``` | |
## Heatmap | |
Heatmaps are another regular plot to describe expression data among many other applications. | |
We will use the data and code from this webpage: https://davetang.org/muse/2018/05/15/making-a-heatmap-in-r-with-the-pheatmap-package/ | |
```{r, message=FALSE, eval = TRUE} | |
library(pheatmap) | |
# Generate some data | |
test = matrix(rnorm(200), 20, 10) | |
test[1:10, seq(1, 10, 2)] = test[1:10, seq(1, 10, 2)] + 3 | |
test[11:20, seq(2, 10, 2)] = test[11:20, seq(2, 10, 2)] + 2 | |
test[15:20, seq(2, 10, 2)] = test[15:20, seq(2, 10, 2)] + 4 | |
colnames(test) = paste("Test", 1:10, sep = "") | |
rownames(test) = paste("Gene", 1:20, sep = "") | |
# Draw heatmaps | |
pheatmap(test) | |
``` | |
### Hands On | |
* Draw a heatmap from the first 300 lines of test data set | |
- `pheatmpa(as.matrix(airway.D[<correct row indeces>, <only numerical columns>]))` | |
```{r, eval = TRUE, echo = FALSE} | |
library(pheatmap) | |
pheatmap(as.matrix(airway.D[1:300, -1]), scale = 'row') | |
``` | |
* Draw a heatmap of only the significantly (p <= 0.01) changed rows | |
```{r, collapse=TRUE} | |
airway.D = read.delim2('data/airway_gene_expression.tsv', as.is = T, header = T) | |
airway.D = subset(airway.D, rowMeans(airway.D[, 2:9]) > 5) | |
for(i in 1:nrow(airway.D)){ | |
# calculate p.value | |
airway.D[i,'p.value'] = as.numeric(t.test(airway.D[i,c(2, 4, 6, 8)], airway.D[i,c(3,5,7,9)])$p.value) | |
} | |
head(airway.D) | |
``` | |
- use `subset(data, condition)` to subset to only significant genes | |
```{r, eval = TRUE, echo = FALSE} | |
sig.airway = subset(airway.D, p.value <= 0.01) | |
pheatmap(as.matrix(sig.airway[, 2:9]), scale = 'row') | |
``` | |
<!-- ## Venndiagram --> | |
<!-- Imagine you have two gene sets. To visualize the overlap of these two sets you can draw a venn diagram. To draw a venn diagram you need to load the `VennDiagram` --> | |
<!-- ```{r, message= FALSE, collapse=TRUE} --> | |
<!-- library(VennDiagram) --> | |
<!-- genes1 = paste('gene', seq(1, 30)) --> | |
<!-- genes2 = paste('gene', seq(1, 50, 2)) --> | |
<!-- venn.diagram(list(group1 = genes1, group2 = genes2), --> | |
<!-- fill = c('dodgerblue', 'firebrick'), --> | |
<!-- filename = 'venn1.tiff') --> | |
<!-- genes.overlap = calculate.overlap(list(group1 = genes1, group2 = genes2)) --> | |
<!-- genes.overlap --> | |
<!-- ``` --> | |
## Barplot | |
```{r eval = T} | |
M = matrix(c(12, 13, 11, 9, 5, 7 ), byrow = T, ncol = 3) | |
barplot(M) | |
barplot(M, beside = T, main = 'Barplot, beside = T', names.arg = c('A', 'B', 'C')) | |
``` | |
### Hands On | |
* Draw a barplot with the average expression level per replicate | |
- use `colMeans(airway.D[, 2:9])` | |
```{r, eval = T, collapse = T, echo = F} | |
barplot(colMeans(airway.D[, 2:9]), las = 2, cex.names = 0.5) | |
``` | |
## Boxplot | |
```{r eval = T} | |
x = rnorm(100, 10, 2) | |
y = rnorm(100, 15, 2) | |
boxplot(list(x, y)) | |
``` | |
You can also plot multiple boxes by using a group variable | |
```{r} | |
x.and.y = c(x, y) | |
groups.x.and.y = c(rep('treatment 1', 50), rep('treatment 2', 80), rep( 'treatment 3', 70)) | |
boxplot(x.and.y ~ groups.x.and.y) | |
``` | |
### Hands On | |
* Draw a boxplot using the test data table, one box per replicate | |
+ log2 of the expression values for better visualization | |
+ `log2(airway.D[, <correct columns>])` | |
+ `boxplot()` | |
```{r, eval = T, collapse = T, echo = FALSE, warning=FALSE} | |
boxplot(as.matrix(log2(airway.D[,c(2,4, 6, 8, 3, 5, 7, 9)])), las = 2, cex.axis = 0.5, ylab = 'log2(expression)' ) | |
``` | |
<!-- ## Plot properties --> | |
<!-- You can modify almost everything about the plot properties. To modify plot properties use `par()` --> | |
<!-- ### Margins --> | |
<!-- Sometimes the axis label does not fit onto the plot or you want to add subtitle and need more space on the bottom. You can use `par(mar = c(bottom, left, top, right))` to modify the plot margins. --> | |
<!-- ```{r} --> | |
<!-- par(mar = c(7, 5, 2, 1)) --> | |
<!-- boxplot(x.and.y ~ groups.x.and.y, las = 2) --> | |
<!-- ``` --> | |
<!-- Sometimes you want multiple plots in one graphic. Use `par(mfrow = (nrow, ncol))` --> | |
<!-- ```{r} --> | |
<!-- par(mfrow = c(1, 2)) --> | |
<!-- boxplot(x.and.y ~ groups.x.and.y, las = 2) --> | |
<!-- boxplot(list(x, y)) --> | |
<!-- ``` --> | |
<!-- ## Add text to a plot --> | |
<!-- ### Within the plot --> | |
<!-- ```{r eval = T} --> | |
<!-- M = matrix(c(12, 13, 11, 9, 5, 7 ), byrow = T, ncol = 3) --> | |
<!-- bp = barplot(M, beside = T, main = 'Barplot, beside = T', --> | |
<!-- names.arg = c('A', 'B', 'C'), ylim = c(0, 15)) --> | |
<!-- text(x = colMeans(bp), y = c(12, 13, 11), labels = '**', cex = 1.5, pos = 3) --> | |
<!-- ``` --> | |
<!-- ### Within the margins --> | |
<!-- ```{r eval = T} --> | |
<!-- M = matrix(c(12, 13, 11, 9, 5, 7 ), byrow = T, ncol = 3) --> | |
<!-- bp = barplot(M, beside = T, main = 'Barplot, beside = T', names.arg = c('A', 'B', 'C')) --> | |
<!-- mtext('Samples', 1, 3) --> | |
<!-- ``` --> | |
### Advanced links (plotting) | |
- data to viz: https://www.data-to-viz.com/ | |
- R plot gallery: https://www.r-graph-gallery.com/ | |
# End of the day challenge | |
Pick a plot from the R graph gallery and reproduce the plot. I also encourage you to explore the data. In some cases you will need to install extra libraries, but don't be afraid, R-studio usually helps you with that. If we have time I would like you to share your plot and quickly explain what you visualized. | |
# Recommended further reading | |
- Introduction to the tidyverse: https://r4ds.had.co.nz/introduction.html | |
# And for tomorrow: | |
Tomorrow we will perform a real differential gene expression analysis, following the DESeq2 Tutorial | |
<https://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html> | |
To ensure that everything is running smoothly, please make sure that you have the following libraries installed. If not please install them using either `install.packages()` or Bioconductor. | |
```{r, eval = FALSE, collapse = T} | |
# loading all libraries | |
# cran libraries | |
install.packages("magrittr") | |
install.packages("dplyr") | |
install.packages("ggplot2") | |
install.packages("pheatmap") | |
install.packages("RColorBrewer") | |
install.packages("PoiClaClu") | |
install.packages("ggbeeswarm") | |
install.packages("dbplyr") | |
library("magrittr") | |
library("dplyr") | |
library("ggplot2") | |
library("pheatmap") | |
library("RColorBrewer") | |
library("PoiClaClu") | |
library("ggbeeswarm") | |
# bioconductor libraries | |
if (!requireNamespace("BiocManager", quietly = TRUE)) | |
install.packages("BiocManager", repos='http://cran.us.r-project.org' , dependencies = TRUE) | |
BiocManager::install("airway") | |
BiocManager::install("GenomicFeatures") | |
BiocManager::install("GenomicAlignments") | |
BiocManager::install("Rsamtools") | |
BiocManager::install("BiocParallel") | |
BiocManager::install("DESeq2") | |
BiocManager::install("vsn") | |
BiocManager::install("apeglm") | |
BiocManager::install("genefilter") | |
BiocManager::install("AnnotationDbi") | |
BiocManager::install("org.Hs.eg.db") | |
BiocManager::install("ReportingTools") | |
BiocManager::install("Gviz") | |
BiocManager::install("sva") | |
BiocManager::install("RUVSeq") | |
BiocManager::install("fission") | |
BiocManager::install("tximeta") | |
library("airway") | |
library("GenomicFeatures") | |
library("GenomicAlignments") | |
library("Rsamtools") | |
library("BiocParallel") | |
library("DESeq2") | |
library("vsn") | |
library("apeglm") | |
library("genefilter") | |
library("AnnotationDbi") | |
library("org.Hs.eg.db") | |
library("ReportingTools") | |
library("Gviz") | |
library("sva") | |
library("RUVSeq") | |
library("fission") | |
library("tximeta") | |
``` | |