Skip to content

Commit

Permalink
#10 updated R workshop material
Browse files Browse the repository at this point in the history
  • Loading branch information
FranziMe committed Oct 20, 2021
1 parent 2435696 commit 3aabdae
Show file tree
Hide file tree
Showing 13 changed files with 3,234 additions and 218 deletions.
574 changes: 574 additions & 0 deletions DESeq2.R

Large diffs are not rendered by default.

146 changes: 79 additions & 67 deletions R_workshop.2.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -2,8 +2,8 @@
title: "R Workshop -- Day 2"
author: "Franziska Metge"
output:
html_document: default
pdf_document: default
html_document: default
---


Expand Down Expand Up @@ -239,8 +239,10 @@ library(pheatmap)

### Bioconductor
```{r eval = F}
source("https://bioconductor.org/biocLite.R")
biocLite("edgeR")
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager", repos='http://cran.us.r-project.org' , dependencies = TRUE)
BiocManager::install("edgeR")
library(edgeR)
```
Expand Down Expand Up @@ -325,8 +327,8 @@ Resources to help you choose colors for your plot:
- `plot(..., log = 'xy', ...)`

```{r, eval = TRUE, echo = FALSE, collapse = TRUE, warning = FALSE}
airway.D <- read.delim2('data/airway_gene_expression.tsv', as.is = T, header = T)
airway.I <- read.delim2('data/airway_sample_info.tsv', as.is = T, header = T)
airway.D = read.delim2('data/airway_gene_expression.tsv', as.is = T, header = T)
airway.I = read.delim2('data/airway_sample_info.tsv', as.is = T, header = T)
print('head(airway.D)')
head(airway.D)
print('head(airway.I)')
Expand Down Expand Up @@ -368,14 +370,14 @@ Not exactly how most volcano plots look like, but this nicely represents how the
### Hands On
Work on the airway dataset
```{r, collapse = T, echo = TRUE}
airway.ge <- read.delim("data/airway_gene_expression.tsv", as.is = T, header = T, sep = '\t')
airway.ge = read.delim("data/airway_gene_expression.tsv", as.is = T, header = T, sep = '\t')
# make Gene ID to row names
head(airway.ge, n = 5)
```

* Remove low abundant `rowMeans < 1` genes from the airway data
* Remove low abundant `rowMeans < 5` genes from the airway data
- calculate the `RM = rowMeans()` using columns `2:9` of `airway.ge`
- subset `airway.ge` using `subset(airway.ge, RM > 1)`
- subset `airway.ge` using `subset(airway.ge, RM > 5)`
* Calculate a p-value for your test data frame
- use a `for(i in 1:nrow(expressed.airway)){ calculate p-value for row i}`
- calculate p.value using t.test `as.numieric(t.test()$p.value)`
Expand All @@ -391,7 +393,7 @@ head(airway.ge, n = 5)
- add legend using `legend()`

```{r, eval = T, echo = FALSE, collapse=TRUE}
airway.D = subset(airway.D, rowMeans(airway.D[, 2:9]) > 1)
airway.D = subset(airway.D, rowMeans(airway.D[, 2:9]) > 5)
mean.before = rowMeans(airway.D[airway.I[airway.I$Intervention == "before",]$SampleName])
mean.after = rowMeans(airway.D[airway.I[airway.I$Intervention == "after",]$SampleName])
Expand Down Expand Up @@ -425,12 +427,12 @@ legend('top', c('n.s.', 'p < 0.05'), col = c('black', 'red'), pch = 19, bty = 'n

```{r, eval = T, collapse = F}
# load example data
wine <- read.csv('data/wine.csv')
wine = read.csv('data/wine.csv')
head(wine)
# define the first column as a factor
wineClasses <- factor(wine$Cvs)
wineClasses = factor(wine$Cvs)
# compute principal component
winePCA <- prcomp(scale(wine[, -1]))
winePCA = prcomp(scale(wine[, -1]))
# look at the results
summary(winePCA)
# look at the names of the results in order to find out what you can use to plot
Expand Down Expand Up @@ -494,7 +496,7 @@ hist(x)
# sometimes it is useful to separate the data into more bins
hist(x, breaks = 20)
# you can also store the information generated by the histogram into a new variable
x.hist <- hist(x, breaks = 20)
x.hist = hist(x, breaks = 20)
head(x.hist)
```
You can add the title, x-axis label and color like in the plot function, lets play around a bit `hist(x, main = , col = , xlab = , xlim = )`
Expand All @@ -514,12 +516,12 @@ hist(y,col = '#0000FF50', add = T)
### Hands On
Work on airway data:
```{r, collapse=TRUE}
airway.D <- read.delim2('data/airway_gene_expression.tsv', as.is = T, header = T)
airway.I <- read.delim2('data/airway_sample_info.tsv', as.is = T, header = T)
airway.D = read.delim2('data/airway_gene_expression.tsv', as.is = T, header = T)
airway.I = read.delim2('data/airway_sample_info.tsv', as.is = T, header = T)
# make Gene ID to row names
head(airway.ge, n = 5)
# subset to rows with mean > 1
airway.D = subset(airway.D, rowMeans(airway.D[, 2:9]) > 1)
# subset to rows with mean > 5
airway.D = subset(airway.D, rowMeans(airway.D[, 2:9]) > 5)
# claculate averages for before and after intervention
mean.before = rowMeans(airway.D[airway.I[airway.I$Intervention == "before",]$SampleName])
Expand All @@ -541,14 +543,21 @@ hist(log2(mean.after), breaks = H1$breaks, col = '#20502050', add = TRUE)
Heatmaps are another regular plot to describe expression data among many other applications.
We will use the data and code from this webpage: https://davetang.org/muse/2018/05/15/making-a-heatmap-in-r-with-the-pheatmap-package/

```{r eval = T, message=FALSE}
```{r, eval = F, echo = TRUE}
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager", repos='http://cran.us.r-project.org' ,
dependencies = TRUE, version = "1.30.10")
BiocManager::install("DESeq")
```

```{r eval = T, message=FALSE, eval = FALSE}
library(pheatmap)
library(DESeq)
# load data and subset
example_file <- system.file ("extra/TagSeqExample.tab", package="DESeq")
data <- read.delim(example_file, header=T, row.names="gene")
data_subset <- as.matrix(data[rowSums(data)>50000,])
example_file = system.file ("extra/TagSeqExample.tab", package="DESeq")
data = read.delim(example_file, header=T, row.names="gene")
data_subset = as.matrix(data[rowSums(data)>50000,])
# create heatmap using pheatmap
pheatmap(data_subset)
Expand All @@ -567,8 +576,8 @@ pheatmap(as.matrix(airway.D[1:300, -1]), scale = 'row')

* Draw a heatmap of only the significantly (p <= 0.01) changed rows
```{r, collapse=TRUE}
airway.D <- read.delim2('data/airway_gene_expression.tsv', as.is = T, header = T)
airway.D = subset(airway.D, rowMeans(airway.D[, 2:9]) > 1)
airway.D = read.delim2('data/airway_gene_expression.tsv', as.is = T, header = T)
airway.D = subset(airway.D, rowMeans(airway.D[, 2:9]) > 5)
for(i in 1:nrow(airway.D)){
# calculate p.value
Expand All @@ -581,25 +590,25 @@ head(airway.D)
- use `subset(data, condition)` to subset to only significant genes

```{r, eval = TRUE, echo = FALSE}
sig.airway <- subset(airway.D, p.value <= 0.01)
sig.airway = subset(airway.D, p.value <= 0.01)
pheatmap(as.matrix(sig.airway[, 2:9]), scale = 'row')
```


## Venndiagram
Imagine you have two gene sets. To visualize the overlap of these two sets you can draw a venn diagram. To draw a venn diagram you need to load the `VennDiagram`
<!-- ## Venndiagram -->
<!-- Imagine you have two gene sets. To visualize the overlap of these two sets you can draw a venn diagram. To draw a venn diagram you need to load the `VennDiagram` -->

```{r, message= FALSE, collapse=TRUE}
library(VennDiagram)
genes1 = paste('gene', seq(1, 30))
genes2 = paste('gene', seq(1, 50, 2))
venn.diagram(list(group1 = genes1, group2 = genes2),
fill = c('dodgerblue', 'firebrick'),
filename = 'venn1.tiff')
<!-- ```{r, message= FALSE, collapse=TRUE} -->
<!-- library(VennDiagram) -->
<!-- genes1 = paste('gene', seq(1, 30)) -->
<!-- genes2 = paste('gene', seq(1, 50, 2)) -->
<!-- venn.diagram(list(group1 = genes1, group2 = genes2), -->
<!-- fill = c('dodgerblue', 'firebrick'), -->
<!-- filename = 'venn1.tiff') -->

genes.overlap = calculate.overlap(list(group1 = genes1, group2 = genes2))
genes.overlap
```
<!-- genes.overlap = calculate.overlap(list(group1 = genes1, group2 = genes2)) -->
<!-- genes.overlap -->
<!-- ``` -->

## Barplot
```{r eval = T}
Expand Down Expand Up @@ -644,53 +653,56 @@ boxplot(x.and.y ~ groups.x.and.y)
boxplot(as.matrix(log2(airway.D[,c(2,4, 6, 8, 3, 5, 7, 9)])), las = 2, cex.axis = 0.5, ylab = 'log2(expression)' )
```

## Plot properties
You can modify almost everything about the plot properties. To modify plot properties use `par()`
<!-- ## Plot properties -->
<!-- You can modify almost everything about the plot properties. To modify plot properties use `par()` -->

### Margins
Sometimes the axis label does not fit onto the plot or you want to add subtitle and need more space on the bottom. You can use `par(mar = c(bottom, left, top, right))` to modify the plot margins.
<!-- ### Margins -->
<!-- Sometimes the axis label does not fit onto the plot or you want to add subtitle and need more space on the bottom. You can use `par(mar = c(bottom, left, top, right))` to modify the plot margins. -->

```{r}
par(mar = c(7, 5, 2, 1))
boxplot(x.and.y ~ groups.x.and.y, las = 2)
```
<!-- ```{r} -->
<!-- par(mar = c(7, 5, 2, 1)) -->
<!-- boxplot(x.and.y ~ groups.x.and.y, las = 2) -->
<!-- ``` -->

Sometimes you want multiple plots in one graphic. Use `par(mfrow = (nrow, ncol))`
```{r}
par(mfrow = c(1, 2))
boxplot(x.and.y ~ groups.x.and.y, las = 2)
boxplot(list(x, y))
```
<!-- Sometimes you want multiple plots in one graphic. Use `par(mfrow = (nrow, ncol))` -->
<!-- ```{r} -->
<!-- par(mfrow = c(1, 2)) -->
<!-- boxplot(x.and.y ~ groups.x.and.y, las = 2) -->
<!-- boxplot(list(x, y)) -->
<!-- ``` -->

## Add text to a plot
<!-- ## Add text to a plot -->

### Within the plot
```{r eval = T}
M = matrix(c(12, 13, 11, 9, 5, 7 ), byrow = T, ncol = 3)
bp = barplot(M, beside = T, main = 'Barplot, beside = T',
names.arg = c('A', 'B', 'C'), ylim = c(0, 15))
text(x = colMeans(bp), y = c(12, 13, 11), labels = '**', cex = 1.5, pos = 3)
```
<!-- ### Within the plot -->
<!-- ```{r eval = T} -->
<!-- M = matrix(c(12, 13, 11, 9, 5, 7 ), byrow = T, ncol = 3) -->
<!-- bp = barplot(M, beside = T, main = 'Barplot, beside = T', -->
<!-- names.arg = c('A', 'B', 'C'), ylim = c(0, 15)) -->
<!-- text(x = colMeans(bp), y = c(12, 13, 11), labels = '**', cex = 1.5, pos = 3) -->
<!-- ``` -->

### Within the margins
```{r eval = T}
M = matrix(c(12, 13, 11, 9, 5, 7 ), byrow = T, ncol = 3)
bp = barplot(M, beside = T, main = 'Barplot, beside = T', names.arg = c('A', 'B', 'C'))
mtext('Samples', 1, 3)
```
<!-- ### Within the margins -->
<!-- ```{r eval = T} -->
<!-- M = matrix(c(12, 13, 11, 9, 5, 7 ), byrow = T, ncol = 3) -->
<!-- bp = barplot(M, beside = T, main = 'Barplot, beside = T', names.arg = c('A', 'B', 'C')) -->
<!-- mtext('Samples', 1, 3) -->
<!-- ``` -->

### Advanced links (plotting)
- data to viz: https://www.data-to-viz.com/
- R plot gallery: https://www.r-graph-gallery.com/


# End of the day challenge
Pick a plot from data-to-viz web-page and reproduce the plot. I also encourage you to explore the data. In some cases you will need to install extra libraries, but don't be afraid, R-studio usually helps you with that. If we have time I would like you to share your plot and quickly explain what you visualized.
Pick a plot from the R graph gallery and reproduce the plot. I also encourage you to explore the data. In some cases you will need to install extra libraries, but don't be afraid, R-studio usually helps you with that. If we have time I would like you to share your plot and quickly explain what you visualized.

# Recommended further reading
- Introduction to the tidyverse: https://r4ds.had.co.nz/introduction.html

# And for tomorrow:

Tomorrow we will perform a real differential gene expression analysis, following the DESeq2 Tutorial
<http://master.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html>
<https://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html>

To ensure that everything is running smoothly, please make sure that you have the following libraries installed. If not please install them using either `install.packages()` or Bioconductor.

Expand Down Expand Up @@ -718,7 +730,7 @@ library("ggbeeswarm")
# bioconductor libraries
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager", repos='http://cran.us.r-project.org' , dependencies = TRUE, version = "1.30.10")
install.packages("BiocManager", repos='http://cran.us.r-project.org' , dependencies = TRUE)
BiocManager::install("airway")
BiocManager::install("GenomicFeatures")
Expand Down
196 changes: 93 additions & 103 deletions R_workshop.2.html

Large diffs are not rendered by default.

Binary file modified R_workshop.2.pdf
Binary file not shown.
Loading

0 comments on commit 3aabdae

Please sign in to comment.