R_workshop.2.Rmd

---
title: "R Workshop -- Day 2"
author: "Franziska Metge"
output:
  html_document: default
  pdf_document: default
---


```{r "setup", include=FALSE}
require("knitr")
opts_knit$set(root.dir = "~/bit_R_workshop")
```


# Outline

1. Loops (solution to yesterday's homework)
3. Write your own function
4. Installing R packages (cran and bioconductor)
5. Basic plotting functions
<!-- 6. Step by Step DEG analysis to take home -->


# Yesterday's Homework
1. Load the table generated in 'create_gene_expression_table.R' using `source()`
2. Return all genes which are significant differentially expressed
3. Use the `t.test()$p.value` and a for loop
4. Groups are as following:
  + Group 1: Rep 1-4
  + Group 2: Rep 5-8

```{r eval = T}
source('data/create_gene_expression_table.R')
head(Experiment_DEG)
```

```{r eval = T}
# number of rows
nrow(Experiment_DEG)

# Calculate the t.test for each gene using a for loop
for(i in 1:nrow(Experiment_DEG)){
  ttest = t.test(Experiment_DEG[i, 2:5], Experiment_DEG[i, 6:9])
  Experiment_DEG[i, 'pvals'] = as.numeric(ttest$p.value)
}

sig_genes = subset(Experiment_DEG, pvals < 0.05)
head(sig_genes)
nrow(sig_genes)
```

# Write your own function

When to write a function:

- if you repeat the same procedure with minor changes or on different data

How is the basic structure of a function
```{r eval = F}
myFunctionName <- function(arg1, arg2, ...){
  <Do something with arg1 and arg2...>
  return(result)
}
```

There are two main principles how to write a function

- Think first and design the function as you go along (more experianced programmers)
- Wrap a function around already written code


## Easy examples
You want a function that calculates the square of a given number
```{r eval = T}
Square <- function(x){

  result = x * x

  return(result)
}

Square(3)
```

No we want to return the square and the qube of a given number
```{r eval = T}
exponent_2_and_3 <- function(x){

  sq = x * x
  cb = x * x * x

  return(c(sq, cb))
}

exponent_2_and_3(3)
```

Sometimes it can be useful to return a list
```{r eval = T}
exponent_2_and_3_list <- function(x){

  sq = x * x
  cb = x * x * x

  return(list(square = sq , cube = cb))
}

exponent_2_and_3_list(3)

```

## Hands On
* Write a function that takes in two numbers and returns the log2 fold change.
  - calculate log2 fold change: `log2(y/x)`
```{r, eval = T, echo = F, collapse = TRUE}
my_log2fc <- function(x, y){
  lf = log2(y/x)
  return(lf)
}

print("my_log2fc(3, 6)")
my_log2fc(3, 6)
```

* Write a function that takes in two numbers and returns the bigger one or both if they are equal.
  - remember the signs to for bigger than `>`, less than `<`, or equal to `==`
```{r, eval = T, echo = F, collapse = TRUE}
my_bigger <- function(x, y){
  if(x > y){
    return(x)
  } else if(y > x){
    return(y)
  } else {
    return(c(x, y))
  }
}

print("my_bigger(3, 5)")

my_bigger(3, 5)
```

## Hands on

Write a function (`return_bigger <- function(arguments){do something}`) which takes in a number `x` and a vector `v` of numbers. Return all numbers bigger than the given number.

Using a for loop (`for(i in v){do something}`) and if (`if(condtions){do something}`) statements. You can use `append()` to add a value to a vector. First create an empty vector `y <- numeric(0)`, then fill vector using `for`, `if` and `y <- append(y, number)`.

```{r eval = T, echo= F}
return_bigger <- function(x, v){
  bigger = numeric(0)
  for(i in v){
    if(i > x){
      bigger = append(bigger, i)
    }
  }
  return(bigger)
}
```

When you finished writing the function try to run it on the following test case:

```{r eval = T}
return_bigger(5, c(10,3,2,16,4,7))
```

Did it work? Let's discuss some other occasions it is useful to write or use a function.

### A bit more complex example of a function
```{r eval = T, collapse=TRUE}
myVolcanoPlot <- function(D, group1, group2, alpha ){
  # Input:
  # D > dataframe to be analyzed
  # group1 > column indeces for group1
  # group2 > column indeces for group2
  # alpha > p.value cut off for significant genes

  # this function calculates the pvalue using a t.test
  # the group means and the log2fc and makes a plot

  # calculate the pvalue for each row using apply
  for(i in 1:nrow(D)){
    # calculate p.value
    D[i,'p.value'] = as.numeric(t.test(D[i,group1], D[i,group2])$p.value)

    # calculate mean for group 1
    D[i,'mean.g1'] = mean(as.numeric(D[i,group1]), na.rm = TRUE)

    # calculate mean for group 2
    D[i,'mean.g2'] = mean(as.numeric(D[i,group2]), na.rm = TRUE)

    # calculate log2 fold change
    D[i,'log2fc'] = log2(D[i,'mean.g2']/D[i,'mean.g1'])
  }

  D[, 'q.value'] = p.adjust(D[,'p.value'], method = 'fdr')
  plot(D[,'log2fc'], -log10(D[,'q.value']), pch = 19,
       xlab = 'log2fc', ylab = '-log10(q.value)', cex.axis = 1.3, cex.lab = 1.3,
       col = ifelse(D[,'q.value'] <= alpha,
                    ifelse(D[, 'log2fc'] < 0, '#2166ac50', '#b2182b50'),
                    '#bdbdbd30')) # nested ifelse call to make three colors
  # add legend
  legend('bottomleft', c('n.s', 'sig down', 'sig up'), col = c('#bdbdbd30', '#2166ac50', '#b2182b50'), pch = 19, cex = 0.8, bty = 'n')

  return(D)
}

head(Experiment_DEG)
AD = myVolcanoPlot(Experiment_DEG, 2:5, 6:9, 0.05)
head(AD)

```

***

# Install packages

If you cannnot find a function for what you are looking for, odds are, that somebody else already encountered the same problem and wrote a package for it. There are two main resources for additional packages in R.
The more general packages can be found on [Cran](https://cran.r-project.org/). Packages related to analysis of biological data are most often found on [Bioconductor](https://www.bioconductor.org/).
First find a package you are looking for via Google and read through the [manual](https://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf) or the vingette to confirm that it is the right package for you. Ask yourself the following questions:

1. Is it applicable to my data?
2. Is it returning results I am interested in?
3. Is it being maintained?
4. Do I find the manual somewhat easy to understand?

If you are happy with the package, download it either through `install.packages()` if you found it on Cran, or `biocLite()` if you found it on Bioconductor.

### Cran
```{r eval = F}
# install the package, only need to do this once
install.packages('pheatmap')

# every time you start R and want to use the package, you need to load it
library(pheatmap)

```

### Bioconductor
```{r eval = F}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager", repos='http://cran.us.r-project.org' , dependencies = TRUE)

BiocManager::install("edgeR")

library(edgeR)
```

We will install more packages later. In general it is quite straight forward.

***

# Basic plotting functions
With R you can make various different plots and graphics. In this Workshop we will only cover basic plotting functions but we are more than happy to help you with a specific plot you are interested in.

## Scatter plot
Simple scatter plot. A scatterplot can be used to view how one variable (y) behaves with respect to another variable (x).
```{r}
# create two vectors with random variables
# 100 random numbers from a uniform distribution between 20 and 80
x = runif(100, 20, 80)
# 100 random numbers from a normal distribution but the mean is differen
# for every random number (depends on x)
y = rnorm(100, x, 10)

# plot
plot(x, y )

# make the plot a bit prettier
# add title and axis labels
plot(x, y, main = 'Simple Plot', xlab = 'WT', ylab = 'Treat1')

# make the text bigger
plot(x, y, main = 'Simple Plot', xlab = 'WT', ylab = 'Treat1',
     cex.main = 2, cex.axis = 1.5, cex.lab = 1.5)

# change the color and shape of the dots
plot(x, y, main = 'Simple Plot', xlab = 'WT', ylab = 'Treat1',
     cex.main = 2, cex.axis = 1.5, cex.lab = 1.5,
     pch = 17, col = 'olivedrab')

# plot with shape and colors
plot(x, y, main = 'Simple Plot', xlab = 'WT', ylab = 'Treat1',
     cex.main = 2, cex.axis = 1.5, cex.lab = 1.5,
     pch = ifelse(abs(x - mean(x)) > sd(x), 6, 17),
     col = ifelse(y > x, 'firebrick2', 'dodgerblue2'))

```
<!--
add pictures of possbile shapes and colors in R
-->
```{r}
# add a legends to the plot
plot(x, y, main = 'Simple Plot', xlab = 'WT', ylab = 'Treat1',
     cex.main = 2, cex.axis = 1.5, cex.lab = 1.5, cex = 1.5,
          pch = ifelse(abs(x - mean(x)) > sd(x), 6, 17),
     col = ifelse(y > x, 'firebrick2', 'dodgerblue2'))

legend('topleft', c('outside sd', 'inside sd'), pch = c(6, 17), cex = 0.8)
legend('bottomright', c('x < y', 'x > y'), fill = c('firebrick2', 'dodgerblue2'),
       cex = 0.8, bty = 'n')

# add a regression line
abline(lm(y ~ x), lty = 2, lwd = 1.5)
```

***

Resources to help you choose colors for your plot:
  - R color brewer: https://colorbrewer2.org/
  - R color pdf: http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf


### Hands On

* Read in your test data frame from the file `airway_gene_expression.tsv` and `airway_sample_info.tsv`.
  - `D = read.delim2(file, as.is = T, header = T)`
* Plot the mean over all samples before the intervention against the mean of all samples after the intervention with a nice title and axis labels.
  - select columns based on vector `airway.D[airway.I[airway.I$Intervention == "before",]$SampleName]`
  - calculate row wise mean (`rowMeans()`)
  - plot data (`plot(x = mean_before, y = mean_after, ...)`)
* Change the shape and color of the data points.
  - `plot(..., pch = ,...)` to change shape
  - `plot(..., col = ,...)` to change color
* Change the axis to log scale.
  - `plot(..., log = 'xy', ...)`

```{r, eval = TRUE, echo = FALSE, collapse = TRUE, warning = FALSE}
airway.D = read.delim2('data/airway_gene_expression.tsv', as.is = T, header = T)
airway.I = read.delim2('data/airway_sample_info.tsv', as.is = T, header = T)
print('head(airway.D)')
head(airway.D)
print('head(airway.I)')
head(airway.I)
mean.before = rowMeans(airway.D[airway.I[airway.I$Intervention == "before",]$SampleName])
mean.after = rowMeans(airway.D[airway.I[airway.I$Intervention == "after",]$SampleName])

plot(x = mean.before, y = mean.after, pch = 19, col = 'olivedrab', main = 'Airway Data', xlab = 'before intervention [avg]', ylab = 'after intervention [avg]', log = 'xy')

```

## Volcano plot
A volcano plot is a special type of scatter plot with the log2fc on the x-axis and the p-value on the y-axis. Usually, the significant dots are colored in red and the not significant dots in gray. In our field of research, these dots usually represent genes, proteins or metabolites. The fold change is calculate between two different groups (Condition, age, strain, etc.). The p.value is usually calculated a statistical test (t.test, wilcox.test).

```{r, eval = T, collapse = T}
head(Experiment_DEG)
# calculate p.values using t.test and apply
for(i in 1:nrow(Experiment_DEG)){
  # calculate p.value
  Experiment_DEG[i,'p.value'] = as.numeric(t.test(Experiment_DEG[i,2:5], Experiment_DEG[i,6:9])$p.value)
}

# to calculate the log2fc you first need to calculate the genewise mean of both groups
mean_groupA = rowMeans(Experiment_DEG[,2:5])
mean_groupB = rowMeans(Experiment_DEG[,6:9])

# calculate the log2fc based on the group means
log2fc = log2(mean_groupB/mean_groupA)

# plot the data
plot(x = log2fc, y = -log(Experiment_DEG$p.value), col = as.factor(Experiment_DEG$p.value < 0.05),
     main = 'Volcano plot', pch = 19, cex.axis = 1.3, cex.lab = 1.5, cex.main = 2)
legend('top', c('n.s.', 'p < 0.05'), col = c('black', 'red'), pch = 19, bty = 'n')

```

Not exactly how most volcano plots look like, but this nicely represents how the synthetic data was generated.

### Hands On
Work on the airway dataset
```{r, collapse = T, echo = TRUE}
airway.ge = read.delim("data/airway_gene_expression.tsv", as.is = T, header = T, sep = '\t')
# make Gene ID to row names
head(airway.ge, n = 5)
```

* Remove low abundant `rowMeans < 5` genes from the airway data
  - calculate the `RM = rowMeans()` using columns `2:9` of `airway.ge`
  - subset `airway.ge` using `subset(airway.ge, RM > 5)`
* Calculate a p-value for your test data frame
  - use a `for(i in 1:nrow(expressed.airway)){ calculate p-value for row i}`
  - calculate p.value using t.test `as.numieric(t.test()$p.value)`
* Calculate the log2 fold changes using average across the replicates
  - calculate `mean.before = rowMean(expressed.airway[, <correct columns>])`
  - calculate `mean.after = rowMean(expressed.airway[, <correct columns>])`
  - `log2(mean.after/mean.before)`
* Plot a volcano plot
  - `plot(x = log2fc, y = -log10(p.value), ...)`
* Add a line at p = 0.01
  - add horizontal lines using `abline(h = )`
* Add a legend
  - add legend using `legend()`

```{r, eval = T, echo = FALSE, collapse=TRUE}
airway.D = subset(airway.D, rowMeans(airway.D[, 2:9]) > 5)

mean.before = rowMeans(airway.D[airway.I[airway.I$Intervention == "before",]$SampleName])
mean.after = rowMeans(airway.D[airway.I[airway.I$Intervention == "after",]$SampleName])

log2fc = log2(mean.after/mean.before)

print("log2fc")
head(log2fc)

p.value = apply(airway.D[, 2:9], 1,
                function(x) {as.numeric(t.test(x[c(1, 3, 5, 7)], x[c(2, 4, 6, 8)])$p.value)})
print("pvalue")
head(p.value)
plot(x = log2fc, y = -log(p.value), col = as.factor(p.value < 0.05),
     main = 'Volcano plot', pch = 19, cex.axis = 1.3, cex.lab = 1.5, cex.main = 2)
abline(h = -log(0.01))
legend('top', c('n.s.', 'p < 0.05'), col = c('black', 'red'), pch = 19, bty = 'n')

```

## PCA plot
- The principal component analysis (pca) is an orthogonal transformation
- Correlated variables → linearly uncorrelated variables (principal components, PC)
- Dimensionality reduction, # PC <= min(# observation, # variables)
- First PC has the largest possible variance
- Different algorithms to obtain PCs
- PCA is mostly used as a tool in:
  - exploratory data analysis
  - visualize genetic distance and relatedness between populations.


```{r, eval = T, collapse = F}
# load example data
wine = read.csv('data/wine.csv')
head(wine)
# define the first column as a factor
wineClasses = factor(wine$Cvs)
# compute principal component
winePCA = prcomp(scale(wine[, -1]))
# look at the results
summary(winePCA)
# look at the names of the results in order to find out what you can use to plot
names(winePCA)
# plot first vs. second component as scatterplot
plot(winePCA$x[,1:2], col = wineClasses, pch = 19)
```

### Hands On

* Perform a PCA on the your test data (for the samples)
  - calculate pca only on numeric data `prcomp(t(airway.ge[,<correct columns>]))`
* Plot the data using the coordinate system of the first and second component
  - plot data in new coordinate system: `plot(pca$x[, <correct column for first and second component>], ...)`
* Color dots based on before and after intervention
  - `plot(..., col = , ...)`
* Give each patient a different shape
  - `plot(..., pch = , ...)`
* Add a legend
  - `legend()`

```{r, eval = TRUE, echo = FALSE, collapse = TRUE}
airway.pca = prcomp(scale(t(airway.D[, -1])))
print(summary(airway.pca))
plot(airway.pca$x[,1:2], col = as.factor(airway.I$Intervention), pch = c(1,1,2,2,3,3,4,4))
legend('topleft', levels(as.factor(airway.I$Intervention)), pch = 1, col =  as.factor(airway.I$Intervention))
legend('topright', paste("Patient", c(1, 2, 3, 4)), pch = c(1, 2, 3,4), col = 'black')
```


## Other plot option
Using the `type` argument in the `plot(x, y, ..., type = ...)` function allows you to change the style how the data is portrayed. This table gives an overview of the different options

| symbol | plot |
|--|--|
| "p" | for points (default)|
| "l" | for lines |
| "b" | for lines between the dots, but dots as points |
| "c" | for the lines part alone of "b" |
| "o" | for lines and dots to be 'overplotted'|
| "h" | for 'histogram' like vertical lines |
| "s" | for stair steps |
| "S" | for other steps, see 'Details'|
| "n" | for no plotting |

Example
```{r, eval = T}
y = rnorm(50)
plot(y, type = 'b')
plot(y, type = 'h')
plot(y, type = 's')
```

## Histogram

```{r eval = T}
# 100 random numbers from a uniform distribution between 20 and 80
x = rnorm(1000, 20, 80)
# a simple histogram
hist(x)
# sometimes it is useful to separate the data into more bins
hist(x, breaks = 20)
# you can also store the information generated by the histogram into a new variable
x.hist = hist(x, breaks = 20)
head(x.hist)
```
You can add the title, x-axis label and color like in the plot function, lets play around a bit `hist(x, main = , col = , xlab = , xlim = )`

To overlap two distributions, simply add the argument `add = TRUE` to the second plot call
```{r}
x = runif(100, 20, 80)
y = rnorm(100, x, 10)

hist(x, xlab = 'WT',main = 'Simple Histogram',
     cex.main = 2, cex.axis = 1.5, cex.lab = 1.5,
     col = 'gray', xlim = c(0, 100))

hist(y,col = '#0000FF50', add = T)
```

### Hands On
Work on airway data:
```{r, collapse=TRUE}
airway.D = read.delim2('data/airway_gene_expression.tsv', as.is = T, header = T)
airway.I = read.delim2('data/airway_sample_info.tsv', as.is = T, header = T)
# make Gene ID to row names
head(airway.ge, n = 5)
# subset to rows with mean > 5
airway.D = subset(airway.D, rowMeans(airway.D[, 2:9]) > 5)

# claculate averages for before and after intervention
mean.before = rowMeans(airway.D[airway.I[airway.I$Intervention == "before",]$SampleName])
mean.after = rowMeans(airway.D[airway.I[airway.I$Intervention == "after",]$SampleName])

```

* Draw a histogram of the `log2(mean)` before the intervention (average over all replicates)
  + `hist(... , breaks = 100)`
* Add the histogram of log2(mean) after the intervention (average over all replicates)
  + using the same breaks `H1 = hist(...)` and `H2 = hist(..., breaks = H1$breaks, add = TRUE )`

```{r, eval = T, echo = F}
H1 = hist(log2(mean.before), breaks = 100, col = 'darkgray')
hist(log2(mean.after), breaks = H1$breaks, col = '#20502050', add = TRUE)
```

## Heatmap
Heatmaps are another regular plot to describe expression data among many other applications.
We will use the data and code from this webpage: https://davetang.org/muse/2018/05/15/making-a-heatmap-in-r-with-the-pheatmap-package/

```{r, message=FALSE, eval = TRUE}
library(pheatmap)

# Generate some data
test = matrix(rnorm(200), 20, 10)
test[1:10, seq(1, 10, 2)] = test[1:10, seq(1, 10, 2)] + 3
test[11:20, seq(2, 10, 2)] = test[11:20, seq(2, 10, 2)] + 2
test[15:20, seq(2, 10, 2)] = test[15:20, seq(2, 10, 2)] + 4
colnames(test) = paste("Test", 1:10, sep = "")
rownames(test) = paste("Gene", 1:20, sep = "")

# Draw heatmaps
pheatmap(test)
```


### Hands On

* Draw a heatmap from the first 300 lines of test data set
  - `pheatmpa(as.matrix(airway.D[<correct row indeces>, <only numerical columns>]))`


```{r, eval = TRUE, echo = FALSE}
library(pheatmap)
pheatmap(as.matrix(airway.D[1:300, -1]), scale = 'row')
```

* Draw a heatmap of only the significantly (p <= 0.01) changed rows
```{r, collapse=TRUE}
airway.D = read.delim2('data/airway_gene_expression.tsv', as.is = T, header = T)
airway.D = subset(airway.D, rowMeans(airway.D[, 2:9]) > 5)

  for(i in 1:nrow(airway.D)){
    # calculate p.value
    airway.D[i,'p.value'] = as.numeric(t.test(airway.D[i,c(2, 4, 6, 8)], airway.D[i,c(3,5,7,9)])$p.value)
}

head(airway.D)
```

  - use `subset(data, condition)` to subset to only significant genes

```{r, eval = TRUE, echo = FALSE}
sig.airway = subset(airway.D, p.value <= 0.01)
pheatmap(as.matrix(sig.airway[, 2:9]),  scale = 'row')
```


<!-- ## Venndiagram -->
<!-- Imagine you have two gene sets. To visualize the overlap of these two sets you can draw a venn diagram. To draw a venn diagram you need to load the `VennDiagram` -->

<!-- ```{r, message= FALSE, collapse=TRUE} -->
<!-- library(VennDiagram) -->
<!-- genes1 = paste('gene', seq(1, 30)) -->
<!-- genes2 = paste('gene', seq(1, 50, 2)) -->
<!-- venn.diagram(list(group1 = genes1, group2 = genes2),  -->
<!--              fill = c('dodgerblue', 'firebrick'), -->
<!--              filename = 'venn1.tiff') -->

<!-- genes.overlap = calculate.overlap(list(group1 = genes1, group2 = genes2)) -->
<!-- genes.overlap -->
<!-- ``` -->

## Barplot
```{r eval = T}
M = matrix(c(12, 13, 11, 9, 5, 7 ), byrow = T, ncol = 3)
barplot(M)
barplot(M, beside = T, main = 'Barplot, beside = T', names.arg = c('A', 'B', 'C'))
```

### Hands On

* Draw a barplot with the average expression level per replicate
  - use `colMeans(airway.D[, 2:9])`

```{r, eval = T, collapse = T, echo = F}
barplot(colMeans(airway.D[, 2:9]), las = 2, cex.names = 0.5)
```

## Boxplot
```{r eval = T}
x = rnorm(100, 10, 2)
y = rnorm(100, 15, 2)

boxplot(list(x, y))
```

You can also plot multiple boxes by using a group variable
```{r}
x.and.y = c(x, y)
groups.x.and.y = c(rep('treatment 1', 50), rep('treatment 2', 80), rep( 'treatment 3', 70))

boxplot(x.and.y ~ groups.x.and.y)
```

### Hands On

* Draw a boxplot using the test data table, one box per replicate
  + log2 of the expression values for better visualization
  + `log2(airway.D[, <correct columns>])`
  + `boxplot()`

```{r, eval = T, collapse = T, echo = FALSE, warning=FALSE}
boxplot(as.matrix(log2(airway.D[,c(2,4, 6, 8, 3, 5, 7, 9)])), las = 2, cex.axis = 0.5, ylab = 'log2(expression)' )
```

<!-- ## Plot properties -->
<!-- You can modify almost everything about the plot properties. To modify plot properties use `par()` -->

<!-- ### Margins -->
<!-- Sometimes the axis label does not fit onto the plot or you want to add subtitle and need more space on the bottom. You can use `par(mar = c(bottom, left, top, right))` to modify the plot margins. -->

<!-- ```{r} -->
<!-- par(mar = c(7, 5, 2, 1)) -->
<!-- boxplot(x.and.y ~ groups.x.and.y, las = 2) -->
<!-- ``` -->

<!-- Sometimes you want multiple plots in one graphic. Use `par(mfrow = (nrow, ncol))` -->
<!-- ```{r} -->
<!-- par(mfrow = c(1, 2)) -->
<!-- boxplot(x.and.y ~ groups.x.and.y, las = 2) -->
<!-- boxplot(list(x, y)) -->
<!-- ``` -->

<!-- ## Add text to a plot -->

<!-- ### Within the plot -->
<!-- ```{r eval = T} -->
<!-- M = matrix(c(12, 13, 11, 9, 5, 7 ), byrow = T, ncol = 3) -->
<!-- bp = barplot(M, beside = T, main = 'Barplot, beside = T',  -->
<!--              names.arg = c('A', 'B', 'C'), ylim = c(0, 15)) -->
<!-- text(x = colMeans(bp), y = c(12, 13, 11), labels = '**', cex = 1.5, pos = 3) -->
<!-- ``` -->

<!-- ### Within the margins -->
<!-- ```{r eval = T} -->
<!-- M = matrix(c(12, 13, 11, 9, 5, 7 ), byrow = T, ncol = 3) -->
<!-- bp = barplot(M, beside = T, main = 'Barplot, beside = T', names.arg = c('A', 'B', 'C')) -->
<!-- mtext('Samples', 1, 3) -->
<!-- ``` -->

### Advanced links (plotting)
- data to viz: https://www.data-to-viz.com/
- R plot gallery: https://www.r-graph-gallery.com/


# End of the day challenge
Pick a plot from the R graph gallery and reproduce the plot. I also encourage you to explore the data. In some cases you will need to install extra libraries, but don't be afraid, R-studio usually helps you with that. If we have time I would like you to share your plot and quickly explain what you visualized.

# Recommended further reading
- Introduction to the tidyverse: https://r4ds.had.co.nz/introduction.html

# And for tomorrow:

Tomorrow we will perform a real differential gene expression analysis, following the DESeq2 Tutorial
 <https://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html>

 To ensure that everything is running smoothly, please make sure that you have the following libraries installed. If not please install them using either `install.packages()` or Bioconductor.

```{r, eval = FALSE, collapse = T}
# loading all libraries

# cran libraries
install.packages("magrittr")
install.packages("dplyr")
install.packages("ggplot2")
install.packages("pheatmap")
install.packages("RColorBrewer")
install.packages("PoiClaClu")
install.packages("ggbeeswarm")
install.packages("dbplyr")

library("magrittr")
library("dplyr")
library("ggplot2")
library("pheatmap")
library("RColorBrewer")
library("PoiClaClu")
library("ggbeeswarm")


# bioconductor libraries
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager", repos='http://cran.us.r-project.org' , dependencies = TRUE)

BiocManager::install("airway")
BiocManager::install("GenomicFeatures")
BiocManager::install("GenomicAlignments")
BiocManager::install("Rsamtools")
BiocManager::install("BiocParallel")
BiocManager::install("DESeq2")
BiocManager::install("vsn")
BiocManager::install("apeglm")
BiocManager::install("genefilter")
BiocManager::install("AnnotationDbi")
BiocManager::install("org.Hs.eg.db")
BiocManager::install("ReportingTools")
BiocManager::install("Gviz")
BiocManager::install("sva")
BiocManager::install("RUVSeq")
BiocManager::install("fission")
BiocManager::install("tximeta")

library("airway")
library("GenomicFeatures")
library("GenomicAlignments")
library("Rsamtools")
library("BiocParallel")
library("DESeq2")
library("vsn")
library("apeglm")
library("genefilter")
library("AnnotationDbi")
library("org.Hs.eg.db")
library("ReportingTools")
library("Gviz")
library("sva")
library("RUVSeq")
library("fission")
library("tximeta")
```