---
title: "R Workshop -- Introduction to R"
output:
  html_document: default
  pdf_document: default
fontsize: 8pt
---

```{r "setup", include=FALSE}
require("knitr")
opts_knit$set(root.dir = "~/bit_R_workshop/")
```
# Outline
1. Introduction to RStudio
2. Overview of basic functions
3. How to get help
4. Read in data
5. Basic table functions
6. Basic table manipulations
7. Loops and if statements

# Introduction to RStudio

RStudio is a graphical interface for the R language. In general it has four main panels. To open a new R script, choose **File &rarr; New File &rarr; R Script**. You can write your code in a script and run it either line by line using **Ctrl** + **Enter** or with the Run button. The **#** starts a comment in R: lines starting with **#** are not executed.

```{r}
# in this document gray boxes will indicate code
3+7

# white boxes starting with ## will indicate results calculated by R
```
Now, please share your screen with us for a few moments to make sure we are all on the same page. Thank you!

***

R can be used as a simple calculator
```{r, collapse = TRUE}
# addition
18 + 6

# subtraction
18 - 6

# multiplication
18 * 6

# division
18 / 6
```

***

# Programming Syntax in R

1. Difference between Console and Script
2. What is a variable
    + `name = value`

3. What is a function
    + `name(argument = value)`
    + `name(argument1 = value1, argument2 = value2)`

4. You can have nested functions
    + `name(arguments = name(arguments = values))`

***


R can also be used as a more advanced calculator
```{r, collapse = TRUE}
# square a number
4 ^ 2

# square root of a number
sqrt(x = 16)

# sqrt is the function name
# 16 is the argument

# log-transform data
log10(100)
log2(8)

# nested calculations
sqrt(x = log2(x = 8))
# or short
sqrt(log2(8))
```

R can be used to check if a value is bigger/smaller than another
```{r, collapse = TRUE}
# comparison operators return TRUE or FALSE
6 == 6 # equal to
6 != 6 # not equal to
6 > 6 # greater than
6 >= 6 # greater than or equal to
6 < 6 # less than
6 <= 6 # less than or equal to
```

You can store a result or any value in a variable to use later
```{r, collapse = TRUE}
# assign the value 6 to variable x
x = 6
# check out what x is
x
# or
print(x)
# use x
x + 5
```

## Hands On
**Try to execute the following calculations**

* add 5 and 7
* divide the result by 2
* multiply 3 and 4
* test if the result is smaller or equal to 12


***
# Relevant basic data types

### Numeric
```{r eval = FALSE}
x = 10.5 # or x <- 10.5

# access variables
x
# or
print(x)
```

### Logical
```{r eval = FALSE}
x = TRUE # or T; the opposite is FALSE (or F)
# TRUE counts as 1, FALSE counts as 0
# you can calculate using TRUE and FALSE, e.g.
TRUE + TRUE
FALSE + TRUE
```

### Character
```{r eval = FALSE}
apple = "Apple"
x = "six"
```

You can inspect data types and convert between them
```{r eval = FALSE}
x = 5; y = '5'; z = TRUE
# find out data type
class(x)
typeof(y)

# test if a variable is of a certain data type
is.numeric(y)
is.logical(z)

# force a data type on a variable
as.numeric(y)
as.character(x)
as.numeric('apple') # not a number: returns NA with a warning
```

## Hands On
**Create the following Variables**

* create one variable with six as character
* create one variable with 6 as a number
* compare if both variables are the same
* `a = 'apple'`, check type of `a`

***
# Complex data types

### Vector
A vector is a sequence of elements of the same basic data type.
```{r eval = FALSE}
# numerical vector
a = c(2, 3, 4)

# logical vector
b = c(TRUE, FALSE, TRUE)

# character vector
c = c('apple', 'six')

# mixed vector?
d = c('six', a, 5)

# access values in a vector
a[1]
a[1] + 4
a[1] = a[1] + 4
print(a)
```
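
The "mixed vector?" line above raises the question of what happens when types are combined. As a minimal sketch (the variable `e` is introduced here only for illustration): `c()` converts all elements to the most general type present, in the order logical, numeric, character.

```r
# c() coerces all elements to one common type:
# logical -> numeric -> character
a = c(2, 3, 4)
d = c('six', a, 5)
class(d)   # character: the numbers were converted to strings
print(d)

# logical and numeric mixed: TRUE becomes 1, FALSE becomes 0
e = c(TRUE, 2, FALSE)
class(e)   # numeric
print(e)
```

Because of this coercion, `d[2] + 1` fails: the value is stored as the string `'2'` and would need `as.numeric()` first.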

### Matrix
A matrix is a collection of elements of the same data type arranged in a two-dimensional rectangular layout.
```{r eval = FALSE}
# 3 columns, fill matrix column wise (default)
A = matrix(c(2,3,4,5,6,7), ncol = 3)

# 2 rows, fill matrix row-wise
A = matrix(c(2,3,4,5,6,7), nrow = 2, byrow = T)

# alternative matrix generation
B = matrix(1, 2, 3)
C = matrix(NA, 3, 2)

# transpose a matrix
t(A)

# concatenate two matrices row- or column-wise
D = cbind(A, B)
E = rbind(A, B)

# access matrix elements
# rows
D[1,]
# columns
D[,2]
# individual cells
D[1,2]
```

### List
A list is a generic vector containing objects of the same or different data types.
``` {r eval = FALSE}
n = c(2, 3, 5)
s = c('aa', 'bb', 'cc', 'dd')
b = c(TRUE, FALSE, TRUE)

# list without names
x = list(n, s, b)

# same list with names
xn = list(Numbers = n, Strings = s, Boolean = b)
print(xn)

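# access list elements
x[2]            # single brackets return a list of length 1
x[[2]]          # double brackets return the element itself
xn[['Strings']] # access by name (only works for the named list)
xn$Strings      # shorthand for xn[['Strings']]; xn['Strings'] returns a list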
```
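
The difference between single and double brackets is easy to miss. The following sketch uses a small made-up list (`sub` and `el` are illustrative names) and checks what each access style returns:

```r
xn = list(Numbers = c(2, 3, 5), Strings = c('aa', 'bb'))

# single brackets return a (shorter) list
sub = xn['Strings']
class(sub)   # list

# double brackets (or $) return the element itself
el = xn[['Strings']]
class(el)    # character

# only the element itself can be indexed further in the usual way
el[2]
```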

### Data Frame
A data frame is a list of vectors of equal length, organized in rows and columns.
```{r eval = FALSE}
n = c(2, 3, 5)
s = c('aa', 'bb', 'cc')
b = c(TRUE, FALSE, TRUE)

# data frame without explicit column names (the variable names n, s, b are used)
df = data.frame(n, s, b)

# data frame with column names
names(df) = c('Numbers', 'Strings', 'Boolean')
dfn = data.frame(Numbers = n, Strings = s, Boolean = b)

# compare list and data frame
print(dfn)

# access elements in a data frame
dfn$Numbers # whole first column
dfn[,1] # whole first column
dfn[2,3] # second element of the third column
```

Compare `list` and `data.frame`
``` {r collapse = TRUE}
# input
n = c(2, 3, 5)
s = c('aa', 'bb', 'cc')
b = c(TRUE, FALSE, TRUE)

list(Numbers = n, Strings = s, Boolean = b)

data.frame(Numbers = n, Strings = s, Boolean = b)
```

# Hands On
Create a dataframe based on the following table:

| Name   | Age | Height | Sport      |
|:------:|----:|-------:|:----------:|
| Alice  | 25  |  1.65  | Ski        |
| Ben    | 32  |  1.96  | Basketball |
| Charly | 27  |  1.86  | Soccer     |
| Dawn   | 33  |  1.74  | Volleyball |

`n = c('Alice', 'Ben', 'Charly', 'Dawn')`
`a = c(25, 32, 27, 33)`
`h = c(1.65, 1.96, 1.86, 1.74)`
`s = c('Ski', 'Basketball', 'Soccer', 'Volleyball')`

- retrieve the value in the second row, third column
- retrieve the fourth column
- retrieve the first row

***

# Basic functions

### Sum
```{r eval = FALSE}
M = matrix(c(5, 6, 7, 8), 2)
sum(M)
rowSums(M)
colSums(M)
```

### Mean
```{r eval = FALSE}
M = matrix(c(5, 6, 7, 8), 2)
mean(M)
rowMeans(M)
colMeans(M)
```

### Variance and standard deviation
```{r eval = FALSE, collapse = T}
M = matrix(c(5, 6, 7, 8), 2)
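var(M)             # for a matrix: covariance matrix of the columns
var(as.numeric(M)) # variance of all values
sd(M)              # standard deviation of all values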
```

### Median
```{r eval = TRUE, collapse=T}
mean(c(5, 6, 7, 89))
median(c(5, 6, 7, 89))
```

### Quantiles
```{r eval = TRUE, collapse=T}
quantile(c(5,6,7,89), probs = c(.25, .5, .75))
```

### Sequence
```{r eval = TRUE, collapse=T}
seq(from = 1, to = 10, by = 1)
```

### Repeat
```{r eval = TRUE, collapse=T}
rep(x = NA, times = 3)
rep(c(0,1), 5)
```

### Random numbers
```{r eval = TRUE, collapse=T}
# random numbers from a normal distribution
rnorm(n = 5, mean = 5, sd = 1)

# random numbers from a uniform distribution
runif(n = 5, min = 1, max = 5)
```

Distributions available in R include:

* Normal: `rnorm(n, mean, sd)`
* Uniform: `runif(n, min, max)`
* Binomial: `rbinom(n, size, prob)`
* Beta: `rbeta(n, shape1, shape2, ncp)`
* Exponential: `rexp(n, rate)`
* Poisson: `rpois(n, lambda)`
* Chi^2^: `rchisq(n, df, ncp)`
* Student's t: `rt(n, df, ncp)`
* ...

There are four basic functions for each distribution, shown here for the normal distribution:

* `rnorm(n, mean, sd)` generates random numbers
* `dnorm(x, mean, sd)` returns the density
* `pnorm(q, mean, sd)` returns the cumulative distribution function
* `qnorm(p, mean, sd)` returns the quantiles of the distribution
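
A short sketch of the four functions for the standard normal distribution. The variable names (`draws`, `dens`, `prob`, `quant`) are only chosen for this illustration:

```r
set.seed(1)   # make the random numbers reproducible

draws = rnorm(n = 1000, mean = 0, sd = 1)    # 1000 random numbers
dens  = dnorm(x = 0, mean = 0, sd = 1)       # height of the density curve at 0
prob  = pnorm(q = 1.96, mean = 0, sd = 1)    # P(X <= 1.96), about 0.975
quant = qnorm(p = 0.975, mean = 0, sd = 1)   # value below which 97.5 % fall, about 1.96

# pnorm() and qnorm() are inverse to each other
qnorm(pnorm(1.5))
```

The same prefixes work for the other distributions, e.g. `rbinom()`, `dbinom()`, `pbinom()` and `qbinom()`.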

# Hands On

0. Set the seed to 0: `set.seed(0)`
1. Create a variable `x = ` with 10,000 random numbers from a binomial distribution with `size = 50` and `prob = 0.5`.
2. Calculate the mean and standard deviation of `x`.
3. Create a 10 by 10 matrix `A = matrix()` with random numbers from a normal distribution.
4. Calculate the row-wise and column-wise means of `A`.

****

# How to get help

### Inside R
If you have a rough idea of what the function might be called
```{r eval = FALSE}
??rowMeans
```
If you know the exact name of the function
```{r eval = FALSE}
?rnorm
```

All R help pages are structured the same way

| Section     | Explanation                                                                                              |
|:------------|:---------------------------------------------------------------------------------------------------------|
| Description | A short overview of what the function intends to do.                                                     |
| Usage       | How to call this function and which arguments may be supplied. The order of the arguments is meaningful. |
| Arguments   | A detailed description of the arguments that are passed to the function.                                 |
| Details     | A more or less detailed description of the function.                                                     |
| Value       | The value which is returned by the function.                                                             |
| References  | Citation and related functions.                                                                          |
| Examples    | Different examples on how to use the function.                                                           |

### Outside of R
[Google](https://www.google.com/):
Try looking for `R calculate row wise mean`. A good resource is [stackoverflow.com](https://stackoverflow.com/questions/10945703/calculate-row-means-on-subset-of-columns)

# Hands On
Open the help page for `quantile()`. What are the default settings of this function, and which arguments can you pass to it?

***

# Read in data

### Table
```{r eval = FALSE}
# default sep = '', header = F (any whitespace separates)
D = read.table('data/mouse.tab', as.is = T, header = T)

# default sep = '\t', header = T
D = read.delim('data/mouse.tab', as.is = T, header = T)
```
### Csv
```{r eval = FALSE}
# default separator: ',', header = T, decimals as '.'
D = read.csv('data/mouse.csv', as.is = T, header = T)
```
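
If you want to try the read functions without the workshop files, the following sketch writes a small made-up table to a temporary file and reads it back in. The table `mice` and its values are invented for this example:

```r
# made-up example table
mice = data.frame(mouseID = c('m1', 'm2', 'm3'),
                  weight  = c(21.5, 19.8, 23.1))

# write it to a temporary csv file and read it back in
path = tempfile(fileext = '.csv')
write.csv(mice, file = path, row.names = FALSE)
D = read.csv(path, as.is = TRUE, header = TRUE)

str(D)    # check the column types
head(D)   # look at the first rows
```

Checking the result with `str()` right after reading is a good habit: it shows immediately whether numbers were read as numbers and text as text.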

### Excel
```{r eval = FALSE}
# install.packages('xlsx')
library(xlsx)
D = xlsx::read.xlsx('data/mouse.xlsx', sheetName = 'mouse')
# or
# install.packages('openxlsx')
library(openxlsx)
D = openxlsx::read.xlsx('data/mouse.xlsx', sheet = 'mouse')
```

### RData object
An RData file lets you load a previously saved workspace, e.g. from a co-worker or an earlier R session
```{r}
load('data/rworkshop.RData')
```

### Script
`source()` allows you to run a full script
```{r eval = FALSE}
ls()
source('data/test_table_script.R')
ls()
```

Let's take a look at the table
```{r eval = T}
source('data/test_table_script.R')
head(Experiment1)
```

# Hands On
Read in `data/airway_gene_expression.tsv` as `airway.ge` and print the first 5 rows.

```{r, eval = T, collapse = T, echo = FALSE}
airway.ge <- read.delim("data/airway_gene_expression.tsv", as.is = T, header = T, sep = '\t')
head(airway.ge, n = 5)
```

# Basic matrix functions
Adding two columns or rows
```{r, eval = T, collapse = T, echo = T}
M = matrix(round(rnorm(25, 10, 5),1), ncol = 5)
print(M)
M[,1] + M[,2]
M[3,] + M[4,]
```
Multiplying two columns
```{r, eval = T, collapse=T}
M[,1] * M[,2]
M * 2
```

Normalizing all columns to sum up to 1
```{r, eval = T, collapse = T}
SUMS = colSums(M)
SUMS

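# this does not work: SUMS is recycled down each column of M,
# so row i is divided by the sum of column i
colSums(M/SUMS)

# transpose first, so that each column of M is divided by its own sum
M.norm = t(t(M)/SUMS)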
colSums(M.norm)
```
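
The transposition is needed because R recycles a vector down the columns of a matrix. As a sketch of two base R alternatives that avoid the double `t()`, here are `sweep()` and `prop.table()` on a small made-up matrix:

```r
M = matrix(1:6, nrow = 2)   # 2 rows, 3 columns
SUMS = colSums(M)

# divide every column (MARGIN = 2) by its column sum
M.norm1 = sweep(M, MARGIN = 2, STATS = SUMS, FUN = '/')

# the transposition approach
M.norm2 = t(t(M) / SUMS)

# prop.table() does the same in one call
M.norm3 = prop.table(M, margin = 2)

colSums(M.norm1)            # every column now sums to 1
all.equal(M.norm1, M.norm2)
```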

# Basic table functions
Get a summary of all the columns of the table
```{r eval = T}
summary(Experiment1)
```

For one column, count the occurrences of each item
```{r eval = T}
table(Experiment1$cage)
```

To calculate the correlation between two variables we can use the function `cor()`
```{r eval = T, collapse = T}
cor(Experiment1$cage, Experiment1$weight, method = 'spearman')
cor(Experiment1$lifespan, Experiment1$weight)
```

To see if the lifespan differs between the treatment and the control we can use the Wilcoxon test. The Wilcoxon signed-rank test (`paired = TRUE`) or the Mann-Whitney U test (`paired = FALSE`) is a non-parametric test: it is used when the data cannot be assumed to be normally distributed.
```{r eval = T}
wilcox.test(Experiment1$lifespan ~ Experiment1$treatment)
```
Another commonly used test is the t-test (`t.test()`), which tests if the means of two groups differ from one another. You should only use `t.test()` if the values in your groups are approximately normally distributed.
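
As a sketch of how these tests are called, the following example uses simulated data. The two groups `control` and `treatment` and their values are invented here and are not part of the workshop data:

```r
set.seed(42)
# simulated lifespans (in weeks) for two hypothetical groups
control   = rnorm(n = 20, mean = 100, sd = 10)
treatment = rnorm(n = 20, mean = 110, sd = 10)

# Shapiro-Wilk test: a small p-value suggests the data are not normal
shapiro.test(control)$p.value

# two-sample t-test (Welch's version is the default in R)
res = t.test(control, treatment)
res$p.value

# non-parametric alternative
wilcox.test(control, treatment)$p.value
```

`shapiro.test()` is one of several ways to check normality; a histogram or a Q-Q plot (`qqnorm()`) is often just as informative.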

## Hands On
Work on the airway data
```{r, collapse = T, echo = TRUE}
airway.ge <- read.delim("data/airway_gene_expression.tsv", as.is = T, header = T, sep = '\t')
head(airway.ge, n = 5)
```

* Calculate the **correlation** (`cor()`) between variable `S_01.control.Rep1` and variable `S_03.control.Rep2` in `airway.ge`.
```{r, eval = T, echo = F, collapse = T}
airway.cor = cor(airway.ge$S_01.control.Rep1, airway.ge$S_03.control.Rep2)
print(airway.cor)
```

```{r, eval = TRUE, echo = FALSE, warning=FALSE}
plot(x = airway.ge$S_01.control.Rep1, y = airway.ge$S_03.control.Rep2, ylab = 'S_03.control.Rep2', xlab = 'S_01.control.Rep1', pch = 19, col = '#50505050', log = 'xy')
text(x = 1, y = max(airway.ge$S_03.control.Rep2), labels = paste('R =', round(airway.cor, 2)), pos = 4)
```

* Perform a **t-test** (`t.test()`) comparing variable `S_01.control.Rep1` and variable `S_02.__treat.Rep1` in `airway.ge`.

```{r, eval = T, echo = FALSE}
boxplot(list(S_01.control.Rep1 = log10(airway.ge$S_01.control.Rep1 + 1), S_02.__treat.Rep1 = log10(airway.ge$S_02.__treat.Rep1 + 1)))
```

```{r, eval = T, echo = F, collapse = T}
print('paired = T')
t.test(airway.ge$S_01.control.Rep1, airway.ge$S_02.__treat.Rep1 , paired = T)

print('paired = F')
t.test(airway.ge$S_01.control.Rep1, airway.ge$S_02.__treat.Rep1 , paired = F)
```


***

To visualize the correlation between the weight and lifespan, we can use the basic plot function
```{r}
plot(Experiment1$lifespan, Experiment1$weight, xlab = 'lifespan [weeks]',
     ylab = 'weight [g]', main = 'Experiment', pch = 19)
```

Now we can add the regression line: `lm()` fits the linear model and `abline()` draws the fitted line in the plot
```{r}
# fit the linear model to the data: weight (y axis) as a function of lifespan (x axis)
L = lm(Experiment1$weight ~ Experiment1$lifespan)

# plot the data
plot(Experiment1$lifespan, Experiment1$weight, xlab = 'lifespan [weeks]',
     ylab = 'weight [g]', main = 'Experiment', pch = 19)

# add the regression line
abline(L, col = 'red')
```

## Hands On

Work on the airway data
```{r, collapse = T, echo = TRUE}
airway.ge <- read.delim("data/airway_gene_expression.tsv", as.is = T, header = T, sep = '\t')
head(airway.ge, n = 5)
```

* Calculate the regression line (`lm()`) between variable `S_01.control.Rep1` and `S_02.__treat.Rep1` in `airway.ge`.
* Plot the data including the regression line (`L = lm()`, `plot()`, `abline(L)`).

```{r, eval = T, echo = F, collapse = T}
airway.lm = lm(airway.ge$S_02.__treat.Rep1 ~ airway.ge$S_01.control.Rep1)
plot(airway.ge$S_01.control.Rep1, airway.ge$S_02.__treat.Rep1, xlab = 'control', ylab = 'treated',
     main = 'Airway', pch = 19)
abline(airway.lm, col = 'red', lty = 2)
abline(a = 0, b = 1, lwd = 0.5, lty = 3)

```

# Basic table manipulations
Imagine you have a second treatment you want to add to the bottom of the existing data.frame. You can use `rbind()` to do that
```{r}
Experiment = rbind(Experiment1, Treatment3)
```

Or you have the body size of each mouse and you want to add it to the right of the existing data.frame, assuming the same order in both data.frames. You can use `cbind()` to do that
```{r}
ExperimentS = cbind(Experiment, size)
```

If the data.frames are not sorted the same way but both tables have an ID column, you can use `merge()` to combine both tables
```{r}
ExperimentP = merge(ExperimentS, Exp2, by.x = 'mouseID', by.y = 'mouseID_pupps',
                     all = TRUE)
```
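
To see what `merge()` does without the workshop data, here is a sketch with two small made-up tables whose IDs are in different order. The tables `weights` and `pups` and their values are invented for this example:

```r
# two made-up tables with the IDs in different order
weights = data.frame(mouseID = c('m1', 'm2', 'm3'),
                     weight  = c(21.5, 19.8, 23.1))
pups    = data.frame(mouseID_pupps = c('m3', 'm1', 'm4'),
                     pups          = c(6, 8, 5))

# all = TRUE keeps mice that appear in only one table (filled with NA)
both = merge(weights, pups, by.x = 'mouseID', by.y = 'mouseID_pupps', all = TRUE)
print(both)

# all = FALSE (the default) keeps only mice found in both tables
shared = merge(weights, pups, by.x = 'mouseID', by.y = 'mouseID_pupps')
print(shared)
```

Unlike `cbind()`, `merge()` matches rows by ID, so the order of the rows in the two tables does not matter.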

Now, if you only want to look at the mice from cage 10
```{r}
Exp_cage10 = subset(ExperimentP, cage == 10)
```
or mice from cages 10 and 11
```{r}
Exp_cage10_11 = subset(ExperimentP, cage %in% c(10, 11))
```
then you realize you only need the first 8 columns, easy
```{r}
Exp_cage10_11 = subset(ExperimentP, cage %in% c(10, 11))[,1:8]
```

To view the names of a data.frame we can use `names()`
```{r}
names(Exp_cage10_11)
```
Now we want to change 'sex.x' to 'sex'
```{r}
names(Exp_cage10_11)[3] = 'sex'
```

## Hands On
* Subset your `airway.ge` to genes which have a mean expression of more than 5, e.g. `rowMeans > 5`, `subset(data, condition)`
```{r, collapse = T, echo = TRUE}
airway.ge <- read.delim("data/airway_gene_expression.tsv", as.is = T, header = T, sep = '\t')
# use the gene IDs as row names
row.names(airway.ge) = airway.ge$GeneID
# remove column 'GeneID' from table
airway.ge = airway.ge[,-1]
head(airway.ge, n = 5)

# now keep only rows with an average > 5
# 1. calculate row wise mean using rowMeans()
# 2. subset airway using subset()
```

* From the smaller data frame, make a matrix using all numerical columns.

```{r, eval = T, echo = F, collapse = T}
small_airway = subset(airway.ge, rowMeans(airway.ge) > 5)
small_airway.matrix = as.matrix(small_airway)

head(small_airway.matrix)

paste("typeof:", typeof(small_airway.matrix))
paste("class:",  class(small_airway.matrix))
```


***

# Conditional statements
Imagine you write a script that tells you if you need to take an umbrella with you based on the chances that it will rain today.
```{r eval = T}
chance_of_rain = 0.7
if(chance_of_rain > 0.5){
  print('Better take an umbrella.')
} else {
  print("You don't need to take an umbrella")
}
```

You can also include several conditions
```{r eval = T}
chance_of_rain = 0.7
if(chance_of_rain > 0.8){
  print('Better take an umbrella.')
} else if(chance_of_rain < 0.3){
  print("No need to take an umbrella today")
} else {
  print("It may or may not rain")
}
```

### General structure of if-statements
```{r eval = F}
# one condition
if (condition){ do something }

# more than one condition
if (condition){
  do something
} else if (condition 2){
  do something else
} else {
  do something else
}
# you may add as many else ifs as you need
```

### Examples
```{r eval = T, collapse= T}
x = 5; y = NA; z = 'six'

# example 1
if (x == 5){
  print("x is equal to 5")
}

# example 2
if (x == 5){
  x * 2
} else {
  print('x is not equal to 5')
}

# example 3
x = 8
if (x < 5) {
  x * 2
} else if (x == 5) {
  x
} else {
  x / 2
}

# example 4
if(is.na(y)){
  print('y is not available')
} else {
  print(y)
}

# example 5
if (is.character(z)){
  print(length(z)) # number of elements in z
  print(nchar(z))  # number of characters in z
} else if(is.numeric(z)){
  z * 3
} else {
  print("can't work with z. It's neither a number nor a character")
}


```

## Hands On

* Test if 6 is bigger than 2 times 3. (bigger: `>`; less than: `<`; equal to: `==`)
* Test if "bigger" is longer than "smaller". (length of a string: `nchar()`)
* Print the longer word or both if they are of equal length.

# Examples using vectors
```{r eval = T, collapse = T}
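a = c(1, 2, 3)
b = c(1, 2, 3)
c = c(1, 3)

# comparing two vectors returns one logical value per element
a == b

# if() needs exactly one TRUE or FALSE; since R 4.2.0 a longer condition is an error
# if (a == b) { ... }   # error: the condition has length > 1

# use all() to reduce the comparison to a single value
if (all(a == b)) {
  print(paste("a and b agree on", sum(a == b), "elements", sep = ' '))
}

# a and c differ in length, so check the lengths first
if (length(a) == length(c) && all(a == c)) {
  print(paste("a and c agree on", sum(a == c), "elements", sep = ' '))
} else {
  print("a and c cannot be compared, they differ in length")
}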
```
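
When a decision is needed for every element of a vector, `if` is the wrong tool because it handles only a single value. A minimal sketch using the vectorized `ifelse()` together with `any()` and `all()`; the values in `chance_of_rain` are made up:

```r
chance_of_rain = c(0.1, 0.7, 0.5, 0.9)

# if() handles one value; ifelse() decides for every element
advice = ifelse(chance_of_rain > 0.5, 'umbrella', 'no umbrella')
print(advice)

any(chance_of_rain > 0.5)   # is at least one value above 0.5?
all(chance_of_rain > 0.5)   # are all values above 0.5?
```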

# Loops
Loops can be used if you want to repeat the same thing several times on different items. There are two different types of loops

### while
* `while (condition == TRUE) { do something }`
* repeat while condition is true
* use if you do not know how often you need to repeat something
* could run infinitely

### for
* `for (i in some_list) { do something }`
* repeat something a set amount of times
* no conditional statement needed.
* use if you have a fixed number of iterations (e.g. rows in a data.frame, all files in a folder)
* will almost always finish

Most procedures can be written as `for` or as `while` loops. `for` loops are almost always the safer choice.

### simple examples
```{r eval = TRUE}
# initialize number for while loop
number = 1
# start while loop
while(number  < 50){
  print(number)
  number = number * 2
}
print(number)

print(1:8)
# for loop
for(i in 1:8){
  print(paste('i is now:', i))
}

# nested for loops, e.g. for pairwise comparisons
conditions = c('control', 'drugA', 'drugB')
# iterate over conditions twice
for(cond1 in conditions){
  print(cond1)
  for (cond2 in conditions){
    print(cond2)
    print(paste('t.test(', cond1, ',', cond2,')', sep = ' '))
  }
}
```
# Hands on

1. what will happen?
```{r eval = F}
# for loop
for (x in 2:4){
  print(c(x, x*x))
}

# while loop
x = 1
while(x < 4){
  x <- x + 1
  print(c(x, x*x))
}

# and now
x = 1
while(x < 4){
  print(c(x, x*x))
  x = x + 1
}

# or now?
x = 1
while(x < 4){
  print(c(x, x*x))
}

```

2. Sum up all numbers from one to ten using a for loop (start with `x = 0`; in each iteration add `i` to `x`: `x = x + i`).
3. For each item in `c(9, 12, 10, 10, 13,  6,  6, 10, 10,  7,  8,  7,  9, 11,  5)`, add a random number between -0.5 and 0.5.
4. Keep dividing 1000 by 2 until the result is 2 or smaller. Count how many iterations are needed.

# Loops and if statements combined

```{r}

# the nested loop from the simple examples also tests each condition
# against itself, which doesn't make sense
# better: skip pairs where both conditions are the same
for(cond1 in conditions){
  print(cond1)
  for (cond2 in conditions){
    if (cond1 != cond2){
      print(cond2)
      print(paste('t.test(', cond1, ',', cond2,')', sep = ' '))
    }
  }
}

# now we still have some redundancy in the comparisons as t.test(A,B) is the same as
# t.test(B,A)
for(i in seq_along(conditions)){
  print(paste( 'i: ' , i, ', conditions[i]: ', conditions[i], sep = ''))
  for (j in seq_along(conditions)){
    print(paste( 'j: ' , j, ', conditions[j]: ', conditions[j], sep = ''))
    if (i < j){
      print(paste('t.test(', conditions[i], ',', conditions[j],')', sep = ' '))
    }
  }
}


```
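
For unique pairs, base R also offers `combn()`, which lists every combination exactly once. This is an alternative sketch, not a replacement for understanding the nested loops; `cond_pairs` is a name chosen for this example:

```r
conditions = c('control', 'drugA', 'drugB')

# every unordered pair exactly once; one pair per column
cond_pairs = combn(conditions, 2)
print(cond_pairs)

# loop over the columns of the pair matrix
for (k in seq_len(ncol(cond_pairs))) {
  print(paste('t.test(', cond_pairs[1, k], ',', cond_pairs[2, k], ')', sep = ' '))
}
```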


### Umbrella example
We want to ask our computer for every day of the forecast if we need to take an umbrella or not.

### while
```{r eval = T}
# weather forecast: chance of rain in percent for each day
forecast = c(5, 10, 15, 75, 73, 30, 0)

# initialize the start of the while loop
day = 1

# start while loop; stop after the last day of the forecast
while(day <= length(forecast)){
  print(day)
  chance_of_rain = forecast[day]

  if(chance_of_rain > 30){
    print('take an umbrella')
  } else {
    print("you don't need an umbrella")
  }
  day = day + 1
}
```

### or with a for loop
```{r eval = T}
# weather forecast: chance of rain in percent for each day
forecast = c(5, 10, 15, 75, 73, 30, 0)
# the loop variable takes each forecast value in turn
for(chance_of_rain in forecast){
  print(chance_of_rain)
  if(chance_of_rain > 30){
    print("take an umbrella")
  } else {
    print("you don't need an umbrella")
  }
}
```

### exit a loop
```{r eval = T}
for (x in 2:4){
  if(x == 4){
    break
  } else {
    print(x)
  }
}
```

### skip an iteration in a loop
```{r  eval = T}
for (x in 2:4){
  if(x == 3){
    next
  } else {
    print(x)
  }
}
```

## Hands on
1. Calculate the mean (`mean()`) and standard deviation (`sd()`) in the following vector `c(9, 12, 10, 10, 13,  6,  6, 10, 10,  7,  8,  7,  9, 11,  5)`.
2. For each item, if the value is less than the mean, add the standard deviation; if the value is higher than the mean, subtract the standard deviation.
3. Calculate the mean and standard deviation of the resulting vector.

# Loops for more use cases
You have a table of gene expression data. Each row represents one gene. Each column represents a sample. The first three columns are replicates of the wild type. The following three columns are drug treated replicates. Now for each gene you want to know if there is a significant difference in the values between wild type and drug treatment. You can use a t.test or wilcox.test for each gene.

### no loop
```{r eval = T, collapse = T}
gene = read.csv("data/gene_exp1.csv")
head(gene)
t.test(gene[1,2:4], gene[1,5:7])
t.test(gene[2,2:4], gene[2,5:7])
t.test(gene[3,2:4], gene[3,5:7])
t.test(gene[4,2:4], gene[4,5:7])
t.test(gene[5,2:4], gene[5,5:7])
t.test(gene[6,2:4], gene[6,5:7])
t.test(gene[7,2:4], gene[7,5:7])
t.test(gene[8,2:4], gene[8,5:7])
t.test(gene[9,2:4], gene[9,5:7])
head(gene)
```
But we actually want to save the p-value within the data frame so we can use it later on. Using `names()`, we can see which values are returned by `t.test()`
```{r}
names(t.test(gene[1,2:4], gene[1,5:7]))
```
We want the 'p.value' value to save it in our dataframe, let's try it
```{r, collapse = T}
ttest = t.test(gene[1,2:4], gene[1,5:7])
ttest$p.value

# or
t.test(gene[1,2:4], gene[1,5:7])$p.value
```
Great, now we can save the p-value in the correct spot
```{r, collapse = T}
gene[1, 8] = t.test(gene[1,2:4], gene[1,5:7])$p.value
head(gene)
# and with a loop, we don't have to repeat everything
for(i in 1:9){
  ttest = t.test(gene[i,2:4], gene[i,5:7])
  gene[i,8] = ttest$p.value
}
# take a look at the data frame
head(gene)

# keep only the significant genes
significant = subset(gene, Diff <= 0.05)
head(significant)
```
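
The same row-by-row test can also be written with `apply()`, which calls a function once per row. The sketch below uses simulated values, since the layout of `gene_exp1.csv` is only assumed here: 9 genes in rows, three wild type and three drug treated samples in columns. The matrix `expr` and its column names are invented.

```r
set.seed(1)
# simulated expression values: 9 genes (rows), 6 samples (columns)
expr = matrix(rnorm(9 * 6, mean = 10, sd = 2), nrow = 9)
colnames(expr) = c('wt1', 'wt2', 'wt3', 'drug1', 'drug2', 'drug3')

# apply() calls the function once per row (MARGIN = 1)
pvalues = apply(expr, MARGIN = 1, FUN = function(row) {
  t.test(row[1:3], row[4:6])$p.value
})

# the explicit for loop gives the same result
pvalues_loop = numeric(nrow(expr))
for (i in seq_len(nrow(expr))) {
  pvalues_loop[i] = t.test(expr[i, 1:3], expr[i, 4:6])$p.value
}
all.equal(pvalues, pvalues_loop)
```

Using `seq_len(nrow(expr))` instead of a fixed `1:9` means the loop still works when the number of genes changes.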

# Homework
1. Load the table generated in 'create_gene_expression_table.R' using `source()`
2. Return all genes which are significantly differentially expressed
3. Use `t.test()$p.value` and a for loop
4. The groups are as follows:
    + Group 1: Rep 1-4
    + Group 2: Rep 5-8

# End of day 1