The aim of this session is to introduce you to good programming practice and the use of R Markdown for creating reproducible analyses.
The successful student will be able to:
Analysis workflows should conform to the same practices as lab projects and notebooks: structured and documented pipelines.
Reproducibility is a continuum; some is better than none:
Ask yourself:
Two good references are Wilson et al. (2014) and Wilson et al. (2017).
Live demonstration.
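The YAML header at the top of the file, enclosed between two `---` lines, gives metadata about the Rmd document and its output. It is followed by a code chunk named `setup` with the option `include=FALSE`. That first code chunk is for setting some default code chunk options.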
I often use these:
```{r setup, include = FALSE}
knitr::opts_chunk$set(echo = FALSE,
                      warning = FALSE,
                      message = FALSE)
```
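`echo = FALSE` means the code will not be included by default; this is normally what you want in a report.
`warning = FALSE` and `message = FALSE` stop warnings and messages from appearing in the output.
`include = FALSE` means neither the code nor its output will appear, although the code is still run.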
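To see what the setup chunk does, you can run the same call in an ordinary R session. This is a minimal sketch, assuming the knitr package is installed; `opts_chunk$get()` reads back the current default for a named option. Options written in an individual chunk's header (for example `echo = TRUE`) still override these defaults for that one chunk.

```r
library(knitr)

# Out of the box, knitr shows the code of every chunk
echo_default <- opts_chunk$get("echo")

# The setup chunk changes the defaults for every chunk that follows it
opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE)

# Read the new defaults back to confirm the change
echo_now <- opts_chunk$get("echo")
message_now <- opts_chunk$get("message")
```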
Have separate named code chunks for each process: set up, package loading, data import, data tidying (maybe several chunks), different analyses, figures etc.
Naming chunks makes it easier to debug.
After the `setup` chunk, I typically have a chunk for loading all the packages I need. I usually add brief comments explaining why I need them if they aren't packages I use all the time.
```{r pkg}
library(tidyverse)
```
Then the chunk for importing.
```{r import}
# code and comments for data import
```
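As a sketch of what an import chunk might contain, the code below writes a tiny, made-up CSV file to a temporary location and reads it back. The column names and values are hypothetical, and base R's `read.csv()` is used so the example runs without extra packages; in a tidyverse workflow, `readr::read_csv()` would be the usual choice.

```r
# Hypothetical raw data written to a temporary file so the sketch is
# self-contained; a real report would read the project's own data file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,width_mm,site",
             "1,10.2,A",
             "2,11.5,B",
             "3,,A"),
           csv_path)

# Import the raw data, keeping text columns as character, not factor
raw <- read.csv(csv_path, stringsAsFactors = FALSE)

# Check the structure: the empty width in row 3 should be read as NA
str(raw)
```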
Then chunks for tidying. These may have names that describe the type of tidying.
```{r tidy}
# code and comments
```
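A tidying chunk might look something like the following minimal sketch. The data frame and its column names are hypothetical, and base R is used to keep the example self-contained; `dplyr::rename()` and `tidyr::drop_na()` do the same jobs in the tidyverse.

```r
# Hypothetical raw data with awkward column names, as often exported
raw <- data.frame(ID = 1:4,
                  `Width (mm)` = c(10.2, 11.5, NA, 9.8),
                  check.names = FALSE)

# Give the columns short, lower-case, syntactic names
names(raw) <- c("id", "width_mm")

# Drop rows with a missing width and record how many were removed,
# so the number can be reported in the text
clean <- raw[!is.na(raw$width_mm), ]
n_removed <- nrow(raw) - nrow(clean)
```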
And so on.
The report narrative sits around all of these chunks of code. The exact organisation depends on the project, and I often reorganise my chunks as the analysis develops.
“Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread.” The tidyverse style guide
Some key points:
Use `<-`, not `=`, for assignment.
Use `"` for quoting text (not `'`) unless the text contains double quotes.
# Ugly code
names(pigeon)[1]<-"interorbital"; hist(pigeon$interorbital,xlim=c(8,14),main=NULL,xlab="Width (mm)",ylab="Number of pigeons",col="grey")
# Well-styled code
names(pigeon)[1] <- "interorbital"
hist(pigeon$interorbital,
     xlim = c(8, 14),
     main = NULL,
     xlab = "Width (mm)",
     ylab = "Number of pigeons",
     col = "grey")
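The assignment and quoting rules above can be seen together in a short, self-contained snippet. The variable names and values are illustrative only.

```r
# Assignment uses <-; = is kept for naming arguments inside function calls
n_pigeons <- 40
widths <- seq(from = 8, to = 14, length.out = n_pigeons)

# Text is quoted with double quotes by default...
axis_label <- "Width (mm)"

# ...and with single quotes only when the text itself contains double quotes
note <- 'Measured as "interorbital" width'
```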
Write code that expresses the structure of the problem and its solution.
Avoid hard-coding numbers wherever possible.
# bad
sum(3, 5, 6, 7, 8) / 5
## [1] 5.8
(3 - 5.8)^2 + (5 - 5.8)^2 + (6 - 5.8)^2 + (7 - 5.8)^2 + (8 - 5.8)^2
## [1] 14.8
# good
offspring <- c(3, 5, 6, 7, 8)
mean_offspring <- sum(offspring) / length(offspring)
sum((offspring - mean_offspring)^2)
## [1] 14.8
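Taking the idea one step further, the calculation can be wrapped in a function so that it expresses the structure of the problem for any data, not just these five values. This is a minimal sketch; the function name `sum_sq_dev()` is made up for illustration. The last line cross-checks the result against base R's `var()`, which divides the same sum by n - 1.

```r
# Sum of squared deviations from the mean, for any numeric vector
sum_sq_dev <- function(x) {
  x_mean <- sum(x) / length(x)  # the mean, computed as in the example above
  sum((x - x_mean)^2)           # square each deviation, then add them up
}

offspring <- c(3, 5, 6, 7, 8)
ss <- sum_sq_dev(offspring)
ss  # 14.8, matching the result above

# Cross-check: var() is this sum divided by n - 1
all.equal(ss, var(offspring) * (length(offspring) - 1))
```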
Much of the description of the project will be in the report text, often at a high level. Additional information needed to understand the properties of the data, and the rationale and mechanics of the analyses, needs to be documented in comments.
Use comments extensively, both to give an overview of what a chunk does and to explain individual steps.
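The short example below sketches two kinds of comment: an overview at the top saying what the chunk is for, and line comments explaining less obvious steps. The task (standardising the offspring counts) is hypothetical, chosen only to illustrate the commenting.

```r
# Overview: express each brood's offspring count as a z-score, i.e. relative
# to the mean and spread of the sample, so unusual broods stand out.
offspring <- c(3, 5, 6, 7, 8)

# scale() subtracts the mean and divides by the sample standard deviation.
# It returns a one-column matrix, so as.vector() turns it back into a vector.
z <- as.vector(scale(offspring))
```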
Use the case study and exercises from Workshop 2 (Tidying data and the tidyverse) to develop a report generated through R Markdown. The data are in Y101_Y102_Y201_Y202_Y101-5.csv.
Do:
Allaire, JJ, Yihui Xie, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, Hadley Wickham, Joe Cheng, Winston Chang, and Richard Iannone. 2019. rmarkdown: Dynamic Documents for R. https://github.com/rstudio/rmarkdown.
Bryan, Jennifer. 2018. “Excuse Me, Do You Have a Moment to Talk About Version Control?” Am. Stat. 72 (1): 20–27.
Wilson, Greg, D A Aruliah, C Titus Brown, Neil P Chue Hong, Matt Davis, Richard T Guy, Steven H D Haddock, et al. 2014. “Best Practices for Scientific Computing.” PLoS Biol. 12 (1): e1001745.
Wilson, Greg, Jennifer Bryan, Karen Cranston, Justin Kitzes, Lex Nederbragt, and Tracy K Teal. 2017. “Good Enough Practices in Scientific Computing.” PLoS Comput. Biol. 13 (6): e1005510.
Xie, Yihui, J.J. Allaire, and Garrett Grolemund. 2018. R Markdown: The Definitive Guide. Boca Raton, Florida: Chapman & Hall/CRC. https://bookdown.org/yihui/rmarkdown.