---
title: "R Workshop -- Introduction to R"
output:
  html_document: default
  pdf_document: default
fontsize: 8pt
---
```{r "setup", include=FALSE} | |
require("knitr") | |
opts_knit$set(root.dir = "~/bit_R_workshop/") | |
``` | |
# Outline | |
1. Introduction to R Studio
2. Overview of basic functions
3. How to get help
4. Read in data
5. Basic table functions
6. Basic table manipulations
7. Loops and if statements
# Introduction to R Studio | |
R Studio is a graphical interface for the R language. In general, it has four main panels. To open a new R script, choose **File → New File → R Script**. You may write your code in a script and run it either line by line using **ctrl** + **enter** or with the Run button. The **#** indicates a comment in R. Lines starting with **#** will not be executed.
```{r} | |
# in this document gray boxes will indicate code | |
3+7 | |
# white boxes starting with ## will indicate results calculated by R | |
``` | |
Now, please share your screen with us for a few moments to make sure we are all on the same page. Thank you! | |
*** | |
R can be used as a simple calculator | |
```{r, collapse = TRUE} | |
# adding | |
18 + 6 | |
# subtracting
18 - 6 | |
# multiplication | |
18 * 6 | |
# Division | |
18/6 | |
``` | |
*** | |
# Programming Syntax in R | |
1. Difference between Console and Script | |
2. What is a variable | |
+ `name = value` | |
3. What is a function | |
+ `name(argument = value)` | |
+ `name(argument1 = value1, argument2 = value2)` | |
4. You can have nested functions | |
+ `name(arguments = name(arguments = values))` | |
*** | |
R can also be used as a more advanced calculator | |
```{r, collapse = TRUE} | |
# square a number | |
4 ^ 2 | |
# square root of a number | |
sqrt(x = 16) | |
# sqrt is the function name | |
# 16 is the argument | |
# log-transform data | |
log10(100) | |
log2(8) | |
# nested calculations | |
sqrt(x = log2(x = 8)) | |
# or short | |
sqrt(log2(8)) | |
``` | |
R can be used to check if a value is bigger/smaller than another | |
```{r, collapse = TRUE} | |
# compare two values
6 == 6 # equal to
6 > 6 # greater than
6 >= 6 # greater than or equal to
6 < 6 # less than
``` | |
You can store a result or any value in a variable to use later | |
```{r, collapse = TRUE} | |
# assign the value 6 to variable x | |
x = 6 | |
# check out what x is | |
x | |
# or | |
print(x) | |
# use x | |
x + 5 | |
``` | |
## Hands On | |
**Try to execute the following calculations** | |
* add 5 and 7 | |
* divide the result by 2 | |
* multiply 3 and 4 | |
* test if the result is smaller or equal to 12 | |
*** | |
# Relevant basic data types | |
### Numeric | |
```{r eval = FALSE} | |
x = 10.5 # or x <- 10.5
# access variables | |
x | |
# or | |
print(x) | |
``` | |
### Logical | |
```{r eval = FALSE} | |
x = TRUE # T; FALSE; F; | |
# T == 1, F == 0 | |
# you can calculate using TRUE and FALSE, i.e. | |
TRUE + TRUE | |
FALSE + TRUE | |
``` | |
### Character | |
```{r eval = FALSE} | |
apple = "Apple" | |
x = "six" | |
``` | |
You can find out as well as manipulate data types | |
```{r eval = FALSE} | |
x = 5; y = '5'; z = TRUE | |
# find out data type | |
class(x) | |
typeof(y) | |
# test if a variable is of a certain data type | |
is.numeric(y) | |
is.logical(z) | |
# force a data type on a variable
as.numeric(y)
as.character(x)
as.numeric('apple') # returns NA with a warning
``` | |
## Hands On | |
**Create the following Variables** | |
* create one variable with six as character | |
* create one variable with 6 as a number | |
* compare if both variables are the same | |
* `a = 'apple'`, check type of `a` | |
*** | |
# Complex data types | |
### Vector | |
A vector is a sequence of elements of the same basic data type. | |
```{r eval = FALSE} | |
# numerical vector | |
a = c(2, 3, 4) | |
# logical vector | |
b = c(TRUE, FALSE, TRUE) | |
# character vector | |
c = c('apple', 'six') | |
# mixed vector? | |
d = c('six', a, 5) | |
# access values in a vector | |
a[1] | |
a[1] + 4 | |
a[1] = a[1] + 4 | |
print(a) | |
``` | |
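The "mixed vector?" question above can be checked directly. The following is a minimal sketch (the names `nums` and `mixed` are only used for illustration): because a vector holds a single data type, R converts all elements to the most general type present.

```r
# combining a character with numbers converts everything to character
nums = c(2, 3, 4)
mixed = c('six', nums, 5)
class(mixed)
print(mixed)
# combining logical and numeric values converts the logicals to numbers
class(c(TRUE, 2))
```

Once the numbers are stored as characters, calculations such as `mixed[2] + 1` no longer work without converting back using `as.numeric()`.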
### Matrix | |
A matrix is a collection of elements of the same data type arranged in a two-dimensional rectangular layout. | |
```{r eval = FALSE} | |
# 3 columns, fill matrix column wise (default) | |
A = matrix(c(2,3,4,5,6,7), ncol = 3) | |
# 2 rows, fill matrix row-wise | |
A = matrix(c(2,3,4,5,6,7), nrow = 2, byrow = T) | |
# alternative matrix generation | |
B = matrix(1, 2, 3) | |
C = matrix(NA, 3, 2) | |
# transpose a matrix | |
t(A) | |
# concatenate two matrices row- or column-wise | |
D = cbind(A, B) | |
E = rbind(A, B) | |
# access matrix elements | |
# rows | |
D[1,] | |
# columns | |
D[,2] | |
# individual cells | |
D[1,2] | |
``` | |
### List | |
A list is a generic vector containing objects of the same or different data types | |
``` {r eval = FALSE} | |
n = c(2, 3, 5) | |
s = c('aa', 'bb', 'cc', 'dd') | |
b = c(TRUE, FALSE, TRUE) | |
# list without names | |
x = list(n, s, b) | |
# same list with names | |
xn = list(Numbers = n, Strings = s, Boolean = b) | |
print(xn) | |
# access list elements | |
x[2] # a list of length 1, vs. x[[2]]
x[[2]] # the character vector itself
xn[['Strings']] # access by name only works on the named list
xn$Strings # vs. xn['Strings'] or xn[['Strings']]
``` | |
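The difference between single and double brackets is easier to see when checking the class of the result. A small sketch, assuming a named list like the one above (the names `sub_list` and `element` are hypothetical):

```r
n = c(2, 3, 5)
s = c('aa', 'bb', 'cc', 'dd')
xn = list(Numbers = n, Strings = s)
# single brackets keep the list structure: the result is a list of length 1
sub_list = xn['Strings']
class(sub_list)
# double brackets (or $) extract the element itself: a character vector
element = xn[['Strings']]
class(element)
# the extracted element can be indexed like any other vector
element[2]
```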
### Data Frame | |
A data frame is a list of vectors of equal length organized in rows and columns
```{r eval = FALSE} | |
n = c(2, 3, 5) | |
s = c('aa', 'bb', 'cc') | |
b = c(TRUE, FALSE, TRUE) | |
# data frame without names
df = data.frame(n, s, b) | |
# data frame with column names | |
names(df) = c('Numbers', 'Strings', 'Boolean') | |
dfn = data.frame(Numbers = n, Strings = s, Boolean = b) | |
# compare list and data frame | |
print(dfn) | |
# access elements in a data frame | |
dfn$Numbers # whole first column | |
dfn[,1] # whole first column | |
dfn[2,3] # second element of the third column | |
``` | |
Compare `list` and `data.frame` | |
``` {r collapse = TRUE} | |
# input | |
n = c(2, 3, 5) | |
s = c('aa', 'bb', 'cc') | |
b = c(TRUE, FALSE, TRUE) | |
list(Numbers = n, Strings = s, Boolean = b) | |
data.frame(Numbers = n, Strings = s, Boolean = b) | |
``` | |
# Hands On | |
Create a dataframe based on the following table: | |
| Name | Age | Height | Sport | | |
|:------:|----:|-------:|:----------:| | |
| Alice | 25 | 1.65 | Ski | | |
| Ben | 32 | 1.96 | Basketball |
| Charly | 27 | 1.86 | Soccer | | |
| Dawn | 33 | 1.74 | Volleyball | | |
`n = c('Alice', 'Ben', 'Charly', 'Dawn')` | |
`a = c(25, 32, 27, 33)` | |
`h = c(1.65, 1.96, 1.86, 1.74)` | |
`s = c('Ski', 'Basketball', 'Soccer', 'Volleyball')` | |
- retrieve the object in the second row, third column
- retrieve the fourth column | |
- retrieve the first row | |
*** | |
# Basic functions | |
### Sum | |
```{r eval = FALSE} | |
M = matrix(c(5, 6, 7, 8), 2) | |
sum(M) | |
rowSums(M) | |
colSums(M) | |
``` | |
### Mean | |
```{r eval = FALSE} | |
M = matrix(c(5, 6, 7, 8), 2) | |
mean(M) | |
rowMeans(M) | |
colMeans(M) | |
``` | |
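### Variance and standard deviation
```{r eval = FALSE, collapse = T}
M = matrix(c(5, 6, 7, 8), 2)
# var() on a matrix returns the covariance matrix of its columns
var(M)
# convert to a vector first to get the variance of all values
var(as.numeric(M))
# sd() treats the matrix as one vector of values
sd(M)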
``` | |
### Median | |
```{r eval = TRUE, collapse=T} | |
mean(c(5, 6, 7, 89)) | |
median(c(5, 6, 7, 89)) | |
``` | |
### Quantiles | |
```{r eval = TRUE, collapse=T} | |
quantile(c(5,6,7,89), probs = c(.25, .5, .75)) | |
``` | |
### Sequence | |
```{r eval = TRUE, collapse=T} | |
seq(from = 1, to = 10, by = 1) | |
``` | |
### Repeat | |
```{r eval = TRUE, collapse=T} | |
rep(x = NA, times = 3) | |
rep(c(0,1), 5) | |
``` | |
### Random numbers | |
```{r eval = TRUE, collapse=T} | |
# random numbers from a normal distribution | |
rnorm(n = 5, mean = 5, sd = 1) | |
# random numbers from a uniform distribution | |
runif(n = 5, min = 1, max = 5) | |
``` | |
Distributions available in R include:
* Normal: `rnorm(n, mean, sd)` | |
* Uniform: `runif(n, min, max)` | |
* Binomial: `rbinom(n, size, prob)` | |
* Beta: `rbeta(n, shape1, shape2, ncp)` | |
* Exponential: `rexp(n, rate)`
* Poisson: `rpois(n, lambda)` | |
* Chi^2^: `rchisq(n, df, ncp)` | |
* Student's t: `rt(n, df, ncp)` | |
* ... | |
There are four basic functions for each distribution:
* `rnorm(n, mean, sd)` To generate random numbers
* `dnorm(x, mean, sd)` To get the density
* `pnorm(q, mean, sd)` To get the (cumulative) distribution function
* `qnorm(p, mean, sd)` To get the quantiles of the distribution
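
The four function types listed above can be tried out on the standard normal distribution. This is only a sketch; the variable names (`d0`, `p0`, `q975`, `r1`, `r2`) are chosen for illustration.

```r
# density of the standard normal distribution at 0
d0 = dnorm(0, mean = 0, sd = 1)
d0
# probability of observing a value less than or equal to 0
p0 = pnorm(0, mean = 0, sd = 1)
p0
# qnorm() is the inverse of pnorm(): the value below which 97.5 % of the distribution lies
q975 = qnorm(0.975, mean = 0, sd = 1)
pnorm(q975)
# random numbers are reproducible if the same seed is set before drawing
set.seed(1)
r1 = rnorm(3)
set.seed(1)
r2 = rnorm(3)
identical(r1, r2)
```

The same prefixes work for the other distributions, e.g. `dbinom()`, `pbinom()`, `qbinom()` and `rbinom()`.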
# Hands On | |
0. Set the seed to 0: `set.seed(0)` | |
1. Create a variable `x = ` with 10,000 random numbers from a binomial distribution with `size = 50` and `prob = 0.5`. | |
2. Calculate the mean and standard deviation of `x`. | |
3. Create a 10 by 10 matrix `A = matrix()` with random numbers from a normal distribution
4. Calculate the row-wise and column-wise mean of `A`
**** | |
# How to get help | |
### Inside R | |
If you have a rough idea how the function might be called | |
```{r eval = FALSE} | |
??rowMeans | |
``` | |
If you know the exact name of the function | |
```{r eval = FALSE} | |
?rnorm | |
``` | |
All R help pages are structured the same way
| Section | Explanation | | |
|:------------|:---------------------------------------------------------------------------------------------------------| | |
| Description | A short overview of what the function intends to do. | | |
| Usage | How to call this function and which arguments may be supplied. The order of the arguments is meaningful. | | |
| Arguments | A detailed description of the arguments that are passed to the function. |
| Details | A more or less detailed description of the function. | | |
| Value | The value which is returned by the function. | | |
| References | Citation and related functions. | | |
| Examples | Different examples on how to use the function. | | |
### Outside of R | |
[Google](https://www.google.com/): | |
Try looking for `R calculate row wise mean`. A good address is [stackoverflow.com](https://stackoverflow.com/questions/10945703/calculate-row-means-on-subset-of-columns)
# Hands On | |
Open the help page for `quantile()`. What are the default settings for this function, and which arguments can you pass to it?
*** | |
# Read in data | |
### Table | |
```{r eval = FALSE} | |
# default sep = '', header = F (any white space separates)
D = read.table('data/mouse.tab', as.is = T, header = T) | |
# default sep = '\t', header = T | |
D = read.delim('data/mouse.tab', as.is = T, header = T) | |
``` | |
### Csv | |
```{r eval = FALSE} | |
# default separator: ',', header = T, decimals as '.'
D = read.csv('data/mouse.csv', as.is = T, header = T) | |
``` | |
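To try the import functions without the workshop files, a small table can be written to a temporary file and read back in. This is a sketch with made-up values; `toy`, `toy_file`, `toy_in` and `toy_in2` are hypothetical names.

```r
# small example table (made-up values)
toy = data.frame(mouseID = c('m1', 'm2', 'm3'), weight = c(21.5, 23.1, 19.8))
# write it to a temporary csv file
toy_file = tempfile(fileext = '.csv')
write.csv(toy, toy_file, row.names = FALSE)
# read it back in; as.is = TRUE keeps text columns as characters
toy_in = read.csv(toy_file, as.is = TRUE, header = TRUE)
str(toy_in)
# the same file read with read.table() needs sep and header to be set explicitly
toy_in2 = read.table(toy_file, sep = ',', header = TRUE, as.is = TRUE)
str(toy_in2)
```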
### Excel | |
```{r eval = FALSE} | |
# install.packages('xlsx') | |
library(xlsx) | |
D = xlsx::read.xlsx('data/mouse.xlsx', sheetName = 'mouse') | |
# or | |
# install.packages('openxlsx') | |
library(openxlsx) | |
D = openxlsx::read.xlsx('data/mouse.xlsx', sheet = 'mouse') | |
``` | |
### RData object | |
An RData file lets you load a previously saved workspace, e.g. from a co-worker or an earlier R session
```{r} | |
load('data/rworkshop.RData') | |
``` | |
### script | |
`source()` allows you to run a full script
```{r eval = FALSE} | |
ls() | |
source('data/test_table_script.R') | |
ls() | |
``` | |
Let's take a look at the table
```{r eval = T} | |
source('data/test_table_script.R') | |
head(Experiment1) | |
``` | |
# Hands On | |
Read in `data/airway_gene_expression.tsv` as `airway.ge` and print the first 5 rows. | |
```{r, eval = T, collapse = T, echo = FALSE} | |
airway.ge <- read.delim("data/airway_gene_expression.tsv", as.is = T, header = T, sep = '\t') | |
head(airway.ge, n = 5) | |
``` | |
# Basic matrix functions | |
Adding two columns or rows | |
```{r, eval = T, collapse = T, echo = T} | |
M = matrix(round(rnorm(25, 10, 5),1), ncol = 5) | |
print(M) | |
M[,1] + M[,2] | |
M[3,] + M[4,] | |
``` | |
Multiplying two columns | |
```{r, eval = T, collapse=T} | |
M[,1] * M[,2] | |
M * 2 | |
``` | |
Normalizing all columns to sum up to 1
```{r, eval = T, collapse = T}
SUMS = colSums(M)
SUMS
colSums(M/SUMS)
# it is not that easy: M/SUMS recycles SUMS down each column,
# so every row (not every column) is divided by one of the sums
M.norm = t(t(M)/SUMS)
colSums(M.norm)
``` | |
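Why the direct division fails is easier to follow on a small matrix with round numbers. The sketch below uses a hypothetical 2 x 2 matrix `M2`; `sweep()` is a base R function that offers an alternative to transposing twice.

```r
# small matrix with easy numbers: the column sums are 4 and 12
M2 = matrix(c(1, 3, 4, 8), ncol = 2)
sums = colSums(M2)
# M2 / sums recycles 'sums' down the columns, so each row is divided by one of the sums
wrong = M2 / sums
colSums(wrong)
# transposing twice divides each column by its own sum
right = t(t(M2) / sums)
colSums(right)
# sweep() does the same in one step (MARGIN = 2 means column-wise)
right2 = sweep(M2, MARGIN = 2, STATS = sums, FUN = '/')
colSums(right2)
```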
# Basic table functions | |
Get a Summary of all the columns of the table | |
```{r eval = T} | |
summary(Experiment1) | |
``` | |
For one column, count the occurrences of each item
```{r eval = T} | |
table(Experiment1$cage) | |
``` | |
To calculate the correlation between two variables we can use the function `cor()` | |
```{r eval = T, collapse = T} | |
cor(Experiment1$cage, Experiment1$weight, method = 'spearman') | |
cor(Experiment1$lifespan, Experiment1$weight) | |
``` | |
To see if the lifespan differs between the treatment and the control, we can use `wilcox.test()`. It performs the Wilcoxon signed-rank test (`paired = TRUE`) or the Mann-Whitney U test (`paired = FALSE`, the default) and does not assume normally distributed data.
```{r eval = T} | |
wilcox.test(Experiment1$lifespan ~ Experiment1$treatment) | |
``` | |
Another commonly used test is the t-test (`t.test()`), which tests whether the means of two groups differ. It assumes that the values in each group are approximately normally distributed.
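
A minimal sketch of both tests on simulated data (the numbers are made up and are not the workshop data; `control`, `treated`, `tt` and `wt` are hypothetical names):

```r
set.seed(42)
# two simulated groups drawn from normal distributions with different means
control = rnorm(20, mean = 100, sd = 10)
treated = rnorm(20, mean = 110, sd = 10)
# two sample t-test (Welch's version is the default, var.equal = FALSE)
tt = t.test(control, treated)
tt$p.value
# the non-parametric alternative on the same data
wt = wilcox.test(control, treated)
wt$p.value
```

Both functions return a list, so individual results such as the p-value can be accessed with `$`.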
## Hands On | |
Work on the airway data | |
```{r, collapse = T, echo = TRUE} | |
airway.ge <- read.delim("data/airway_gene_expression.tsv", as.is = T, header = T, sep = '\t') | |
head(airway.ge, n = 5) | |
``` | |
* Calculate the **correlation** (`cor()`) between variable `S_01.control.Rep1` and variable `S_03.control.Rep2` in `airway.ge`. | |
```{r, eval = T, echo = F, collapse = T} | |
airway.cor = cor(airway.ge$S_01.control.Rep1, airway.ge$S_03.control.Rep2) | |
print(airway.cor) | |
``` | |
```{r, eval = TRUE, echo = FALSE, warning=FALSE} | |
plot(x = airway.ge$S_01.control.Rep1, y = airway.ge$S_03.control.Rep2, ylab = 'S_03.control.Rep2', xlab = 'S_01.control.Rep1', pch = 19, col = '#50505050', log = 'xy') | |
text(x = 1, y = max(airway.ge$S_03.control.Rep2), labels = paste('R =', round(airway.cor, 2)), pos = 4) | |
``` | |
* Calculate the **T Test** (`t.test()`) comparing variable `S_01.control.Rep1` and variable `S_02.__treat.Rep1` in `airway.ge` . | |
```{r, eval = T, echo = FALSE} | |
boxplot(list(S_01.control.Rep1 = log10(airway.ge$S_01.control.Rep1 + 1), S_02.__treat.Rep1 = log10(airway.ge$S_02.__treat.Rep1 + 1))) | |
``` | |
```{r, eval = T, echo = F, collapse = T} | |
print('paired = T') | |
t.test(airway.ge$S_01.control.Rep1, airway.ge$S_02.__treat.Rep1 , paired = T) | |
print('paired = F') | |
t.test(airway.ge$S_01.control.Rep1, airway.ge$S_02.__treat.Rep1 , paired = F) | |
``` | |
*** | |
To visualize the correlation between weight and lifespan, we can use the basic plot function
```{r} | |
plot(Experiment1$lifespan, Experiment1$weight, xlab = 'lifespan [weeks]', | |
ylab = 'weight [g]', main = 'Experiment', pch = 19) | |
``` | |
And now we can add the regression line using the linear model fitting function `lm()` and `abline()` to draw the fitted line in the plot | |
```{r} | |
# fit the linear model to the data: weight (y axis) as a function of lifespan (x axis)
L = lm(Experiment1$weight ~ Experiment1$lifespan)
# plot the data | |
plot(Experiment1$lifespan, Experiment1$weight, xlab = 'lifespan [weeks]', | |
ylab = 'weight [g]', main = 'Experiment', pch = 19) | |
# add the regression line | |
abline(L, col = 'red') | |
``` | |
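In a formula `y ~ x`, the variable left of `~` is the response (y axis) and the variable on the right is the predictor (x axis), so the order has to match the order used in `plot(x, y)`. A sketch on simulated data (made-up values; `x_sim`, `y_sim` and `fit` are hypothetical names):

```r
set.seed(1)
# simulated data: y depends on x with intercept 5 and slope 2, plus some noise
x_sim = 1:20
y_sim = 5 + 2 * x_sim + rnorm(20, sd = 1)
# response on the left of ~, predictor on the right
fit = lm(y_sim ~ x_sim)
# intercept and slope of the fitted line
coef(fit)
# plot x against y and add the fitted line
plot(x_sim, y_sim, pch = 19)
abline(fit, col = 'red')
```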
## Hands On | |
Work on the airway data | |
```{r, collapse = T, echo = TRUE} | |
airway.ge <- read.delim("data/airway_gene_expression.tsv", as.is = T, header = T, sep = '\t') | |
head(airway.ge, n = 5) | |
``` | |
* Calculate the regression line (`lm()`) between variable `S_01.control.Rep1` and `S_02.__treat.Rep1` in `airway.ge`. | |
* Plot the data including the regression line. (`L = lm()` `plot()`, `abline(L)`) | |
```{r, eval = T, echo = F, collapse = T} | |
airway.lm = lm(airway.ge$S_02.__treat.Rep1 ~ airway.ge$S_01.control.Rep1) | |
plot(airway.ge$S_01.control.Rep1, airway.ge$S_02.__treat.Rep1, xlab = 'control', ylab = 'treated', | |
main = 'Airway', pch = 19) | |
abline(airway.lm, col = 'red', lty = 2) | |
abline(a = 0, b = 1, lwd = 0.5, lty = 3) | |
``` | |
# Basic table manipulations | |
Imagine you have a second treatment you want to add to the bottom of the existing data.frame. You can use `rbind()` to do that | |
```{r} | |
Experiment = rbind(Experiment1, Treatment3) | |
``` | |
Or you have the body size of each mouse and want to add it to the right of the existing data.frame, assuming the same order in both data.frames. You can use `cbind()` to do that
```{r} | |
ExperimentS = cbind(Experiment, size) | |
``` | |
If the data.frames are not sorted the same way but both tables have an ID column, you can use `merge()` to combine both tables
```{r} | |
ExperimentP = merge(ExperimentS, Exp2, by.x = 'mouseID', by.y = 'mouseID_pupps', | |
all = TRUE) | |
``` | |
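What `all = TRUE` does becomes clearer on two tiny tables. This sketch uses made-up values; `weights`, `pups`, `both` and `shared` are hypothetical names, and only the ID column names follow the example above.

```r
# two small tables (made-up values) with differently named ID columns
weights = data.frame(mouseID = c('m1', 'm2', 'm3'), weight = c(21.5, 23.1, 19.8))
pups = data.frame(mouseID_pupps = c('m3', 'm1', 'm4'), pups = c(6, 8, 5))
# all = TRUE keeps mice that appear in only one table and fills the gaps with NA
both = merge(weights, pups, by.x = 'mouseID', by.y = 'mouseID_pupps', all = TRUE)
print(both)
# all = FALSE (the default) keeps only mice found in both tables
shared = merge(weights, pups, by.x = 'mouseID', by.y = 'mouseID_pupps')
print(shared)
```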
Now, if you only want to look at the mice from cage 10
```{r} | |
Exp_cage10 = subset(ExperimentP, cage == 10) | |
``` | |
or mice from cages 10 and 11 | |
```{r} | |
Exp_cage10_11 = subset(ExperimentP, cage %in% c(10, 11)) | |
``` | |
then you realize you only need the first 8 columns, easy | |
```{r} | |
Exp_cage10_11 = subset(ExperimentP, cage %in% c(10, 11))[,1:8] | |
``` | |
To view the names of a data.frame we can use `names()` | |
```{r} | |
names(Exp_cage10_11) | |
``` | |
Now we want to change 'sex.x' to 'sex' | |
```{r} | |
names(Exp_cage10_11)[3] = 'sex' | |
``` | |
## Hands On | |
* Subset your `airway.ge` to genes which have a mean expression of more than 5, e.g. `rowMeans > 5`, `subset(data, condition)` | |
```{r, collapse = T, echo = TRUE} | |
airway.ge <- read.delim("data/airway_gene_expression.tsv", as.is = T, header = T, sep = '\t') | |
# use the gene IDs as row names
row.names(airway.ge) = airway.ge$GeneID | |
# remove column 'GeneID' from table | |
airway.ge = airway.ge[,-1] | |
head(airway.ge, n = 5) | |
# now keep only rows with an average > 5 | |
# 1. calculate row wise mean using rowMeans() | |
# 2. subset airway using subset() | |
``` | |
* From the smaller data frame, make a matrix using all numerical columns. | |
```{r, eval = T, echo = F, collapse = T} | |
small_airway = subset(airway.ge, rowMeans(airway.ge) > 5) | |
small_airway.matrix = as.matrix(small_airway) | |
head(small_airway.matrix) | |
paste("typeof:", typeof(small_airway.matrix)) | |
paste("class:", class(small_airway.matrix)) | |
``` | |
*** | |
# Conditional statements | |
Imagine you write a script that tells you if you need to take an umbrella with you based on the chances that it will rain today. | |
```{r eval = T} | |
chance_of_rain = 0.7 | |
if(chance_of_rain > 0.5){ | |
print('Better take an umbrella.') | |
} else { | |
print("You don't need to take an umbrella") | |
} | |
``` | |
You can also include several conditions | |
```{r eval = T} | |
chance_of_rain = 0.7 | |
if(chance_of_rain > 0.8){ | |
print('Better take an umbrella.') | |
} else if(chance_of_rain < 0.3){ | |
print("No need to take an umbrella today") | |
} else { | |
print("It may or may not rain") | |
} | |
``` | |
### General structure of if-statements | |
```{r eval = F} | |
# one condition | |
if (condition){ do something } | |
# more than one condition
if (condition){ | |
do something | |
} else if(condition 2) { | |
do something else | |
}else { | |
do something else | |
} | |
# you may add as many else ifs as you need | |
``` | |
### Examples | |
```{r eval = T, collapse= T} | |
x = 5; y = NA; z = 'six' | |
# example 1 | |
if (x == 5){ | |
print("x is equal to 5") | |
} | |
# example 2 | |
if (x == 5){ | |
x * 2 | |
} else { | |
print('x is not equal to 5') | |
} | |
# example 3 | |
x = 8 | |
if (x < 5) { | |
x * 2 | |
} else if (x == 5) { | |
x | |
} else { | |
x / 2 | |
} | |
# example 4 | |
if(is.na(y)){ | |
print('y is not available') | |
} else { | |
print(y) | |
} | |
# example 5 | |
if (is.character(z)){ | |
length(z) | |
nchar(z) | |
} else if(is.numeric(z)){ | |
z * 3 | |
} else { | |
print("can't work with z. It's neither a number nor a character") | |
} | |
``` | |
## Hands On | |
* Test if 6 is bigger than 2 times 3. (bigger: `>`; less than: `<`; equal to: `==`) | |
* Test if "bigger" is longer than "smaller". (length of a string: `nchar()`) | |
* Print the longer word or both if they are of equal length. | |
# Examples using vectors | |
```{r eval = T, collapse = T}
a = c(1, 2, 3)
b = c(1, 2, 3)
c = c(1, 3)
# a == b compares element by element and returns one value per element
a == b
# if() needs a single TRUE or FALSE (since R 4.2.0 a longer condition is an error),
# so reduce the comparison to one value with all()
if (all(a == b)) {
  print(paste("a and b agree on", sum(a == b), "elements", sep = ' '))
}
# a == c recycles the shorter vector and gives a warning
#a == c
# how to fix this problem: check the lengths first
if(length(a) == length(c) && all(a == c)){
  print("a and c are the same")
} else if(length(a) != length(c)){
  print("a and c cannot be compared, they differ in length")
} else {
  print(paste("a and c agree on", sum(a == c), "elements", sep = ' '))
}
```
# Loops | |
Loops can be used if you want to repeat the same thing several times on different items. There are two different types of loops | |
### while | |
* `while (condition == TRUE) { do something }` | |
* repeat while condition is true | |
* use if you do not know how often you need to repeat something | |
* could run infinitely | |
### for | |
* `for (i in some_list) { do something }` | |
* repeat something a set amount of times | |
* no conditional statement needed. | |
* use if you have a fixed number of iterations (e.g. rows in a data.frame, all files in a folder) | |
* will almost always finish | |
Most procedures can be written as `for` or as `while` loops. `for` loops are almost always the safer choice.
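
As a sketch of how the same task looks in both loop types, the following computes 1 * 2 * 3 * 4 * 5 once with `for` and once with `while` (the names `result_for` and `result_while` are only illustrative):

```r
# for loop: the number of iterations is fixed in advance
result_for = 1
for (i in 1:5) {
  result_for = result_for * i
}
result_for
# while loop: the counter has to be updated by hand,
# forgetting 'i = i + 1' would make this loop run forever
result_while = 1
i = 1
while (i <= 5) {
  result_while = result_while * i
  i = i + 1
}
result_while
```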
### simple examples | |
```{r eval = TRUE} | |
# initialize number for while loop | |
number = 1 | |
# start while loop | |
while(number < 50){ | |
print(number) | |
number = number * 2 | |
} | |
print(number) | |
print(1:8) | |
# for loop | |
for(i in 1:8){ | |
print(paste('i is now:', i)) | |
} | |
# for as pairwise comparisons | |
conditions = c('control', 'drugA', 'drugB') | |
# iterate over conditions twice
for(cond1 in conditions){ | |
print(cond1) | |
for (cond2 in conditions){ | |
print(cond2) | |
print(paste('t.test(', cond1, ',', cond2,')', sep = ' ')) | |
} | |
} | |
``` | |
# Hands on | |
1. what will happen? | |
```{r eval = F} | |
# for loop | |
for (x in 2:4){ | |
print(c(x, x*x)) | |
} | |
# while loop | |
x = 1 | |
while(x < 4){ | |
x <- x + 1 | |
print(c(x, x*x)) | |
} | |
# and now | |
x = 1 | |
while(x < 4){ | |
print(c(x, x*x)) | |
x = x + 1 | |
} | |
# or now? | |
x = 1 | |
while(x < 4){ | |
print(c(x, x*x)) | |
} | |
``` | |
2. sum up all numbers from one to ten using a for loop (start with `x = 0`; in each iteration add `i` to `x`: `x = x + i`)
3. for each item in `c(9, 12, 10, 10, 13, 6, 6, 10, 10, 7, 8, 7, 9, 11, 5)`, add a random number between -0.5 and 0.5
4. keep dividing 1000 by 2 until the result is 2 or smaller. Count how many iterations are needed
# Loops and if statements combined | |
```{r} | |
# # for as pairwise comparisons | |
# conditions = c('control', 'drugA', 'drugB') | |
# # iterate over conditions twice
# for(cond1 in conditions){ | |
# print(cond1) | |
# for (cond2 in conditions){ | |
# print(cond2) | |
# print(paste('t.test(', cond1, ',', cond2,')', sep = ' ')) | |
# } | |
# } | |
# testing one condition against itself doesn't make sense | |
# better | |
for(cond1 in conditions){ | |
print(cond1) | |
for (cond2 in conditions){ | |
if (cond1 != cond2){ | |
print(cond2) | |
print(paste('t.test(', cond1, ',', cond2,')', sep = ' ')) | |
} | |
} | |
} | |
# now we still have some redundancy in the comparisons as t.test(A,B) is the same as | |
# t.test(B,A) | |
for(i in 1:length(conditions)){ | |
print(paste( 'i: ' , i, ', condition[i]: ', conditions[i], sep = '')) | |
for (j in 1:length(conditions)){ | |
print(paste( 'j: ' , j, ', condition[j]: ', conditions[j], sep = '')) | |
if (i < j){ | |
print(paste('t.test(', conditions[i], ',', conditions[j],')', sep = ' ')) | |
} | |
} | |
} | |
``` | |
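The nested loop with `i < j` above lists every pair exactly once. Base R's `combn()` produces the same pairs directly; the sketch below is an alternative, not a replacement for the loop logic shown above (`cond_pairs` is a hypothetical name).

```r
conditions = c('control', 'drugA', 'drugB')
# combn() returns every unordered pair once, one pair per column
cond_pairs = combn(conditions, 2)
print(cond_pairs)
# loop over the columns to build the comparisons
for (k in seq_len(ncol(cond_pairs))) {
  print(paste('t.test(', cond_pairs[1, k], ',', cond_pairs[2, k], ')', sep = ' '))
}
```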
### Umbrella example | |
We want to ask our computer for every day of the forecast if we need to take an umbrella or not.
### while | |
```{r eval = T} | |
# weather forecast (chance of rain in %)
forcast = c(5, 10, 15, 75, 73, 30, 0) | |
# initialize the start of the while loop | |
day = 1 | |
# start while loop | |
while(day < 8){ | |
print(day) | |
chance_of_rain = forcast[day] | |
if(chance_of_rain > 30){ | |
print('take an umbrella') | |
} else { | |
print("you don't need an umbrella")
} | |
day = day + 1 | |
} | |
``` | |
### or with a for loop | |
```{r eval = T} | |
# weather forecast (chance of rain in %)
forcast = c(5, 10, 15, 75, 73, 30, 0) | |
for(day in forcast){ | |
print(day) | |
chance_of_rain = day | |
if(chance_of_rain > 30){ | |
print("take umbrella") | |
} else { | |
print("you don’t need an umbrella") | |
} | |
} | |
``` | |
### exit a loop | |
```{r eval = T} | |
for (x in 2:4){ | |
if(x == 4){ | |
break | |
} else { | |
print(x) | |
} | |
} | |
``` | |
### skip an iteration in a loop | |
```{r eval = T} | |
for (x in 2:4){ | |
if(x == 3){ | |
next | |
} else { | |
print(x) | |
} | |
} | |
``` | |
## Hands on | |
1. Calculate the mean (`mean()`) and standard deviation (`sd()`) in the following vector `c(9, 12, 10, 10, 13, 6, 6, 10, 10, 7, 8, 7, 9, 11, 5)`. | |
2. For each item, if the value is less than the mean, add the standard deviation; if the value is higher than the mean, subtract the standard deviation.
3. Calculate the mean and standard deviation of the resulting vector. | |
# More use cases for loops
You have a table of gene expression data. Each row represents one gene. Each column represents a sample. The first three columns are replicates of the wild type. The following three columns are drug treated replicates. Now for each gene you want to know if there is a significant difference in the values between wild type and drug treatment. You can use a t.test or wilcox.test for each gene. | |
### no loop | |
```{r eval = T, collapse = T} | |
gene = read.csv("data/gene_exp1.csv") | |
head(gene) | |
t.test(gene[1,2:4], gene[1,5:7]) | |
t.test(gene[2,2:4], gene[2,5:7]) | |
t.test(gene[3,2:4], gene[3,5:7]) | |
t.test(gene[4,2:4], gene[4,5:7]) | |
t.test(gene[5,2:4], gene[5,5:7]) | |
t.test(gene[6,2:4], gene[6,5:7]) | |
t.test(gene[7,2:4], gene[7,5:7]) | |
t.test(gene[8,2:4], gene[8,5:7]) | |
t.test(gene[9,2:4], gene[9,5:7]) | |
head(gene) | |
``` | |
But we actually want to save the p-value in the data frame so we can use it later on. Using `names()`, we can see which values are returned by `t.test()`
```{r} | |
names(t.test(gene[1,2:4], gene[1,5:7])) | |
``` | |
We want to save the `p.value` element in our data frame. Let's try it
```{r, collapse = T} | |
ttest = t.test(gene[1,2:4], gene[1,5:7]) | |
ttest$p.value | |
# or | |
t.test(gene[1,2:4], gene[1,5:7])$p.value | |
``` | |
Great, now we can save the p-value in the correct spot
```{r, collapse = T} | |
gene[1, 8] = t.test(gene[1,2:4], gene[1,5:7])$p.value | |
head(gene) | |
# and with a loop, we don't have to repeat everything | |
for(i in 1:9){ | |
ttest = t.test(gene[i,2:4], gene[i,5:7]) | |
gene[i,8] = ttest$p.value | |
} | |
# take a look at the data frame | |
head(gene) | |
# keep only the significant genes | |
significant = subset(gene, Diff <= 0.05) | |
head(significant) | |
``` | |
# Homework | |
1. Load the table generated in 'create_gene_expression_table.R' using `source()` | |
2. Return all genes which are significantly differentially expressed
3. Use the `t.test()$p.value` and a for loop | |
4. Groups are as follows:
+ Group 1: Rep 1-4 | |
+ Group 2: Rep 5-8 | |
# End of day 1 |