

You don't need to watch all these videos urgently. See the week-by-week guide in the VLE for a gentle video-watching schedule. New concepts take time to learn, so be patient with yourself. :-)


We include video guides to the workshops. Use these after the live workshop if you need some extra guidance.


Lectures and module info

Lecture 1

Lecture 1 video (Part A) | 35 minutes
Introduction to big data biology concepts. Introduction to high-throughput methods.

Lecture 1 video (Part B) | 18 minutes
Outline of the module. How we teach, and what you need to do.

Download as PDF | PowerPoint

Lecture 2

Lecture 2 video | 34 minutes

Download as PDF | PowerPoint


RStudio Skills

Installing R and RStudio | 9 minutes
Website to download RStudio
Keywords: RStudio, packages, tidyverse, install.packages(), library()

Loading data and saving data in R | 10 minutes
Keywords: read.table, read.delim, save.image, load

Walkthrough of ggplot2 | 26 minutes
See also data-to-viz.com to find out what plot may suit your data.
Keywords: tidyverse, ggplot2, geom_histogram, geom_density, geom_violin, geom_boxplot, stat_compare_means

Making plots in R | 17 minutes
See the R commands.
Keywords: boxplot, scatterplot, barplot, box and whiskers plot, histogram, hist, stripchart, summary, head

Data types in R | 13 minutes
Explains data types in R (strings, characters, numeric, Booleans).
Keywords: variables, case-sensitive, TRUE/FALSE, strings, characters, numeric, Boolean (TRUE/FALSE), substring, paste, sub, max, min, abs, and (&), or (|), factors, as.character, as.numeric

Data structures in R | 13 minutes
Explains vectors, data frames, matrices (matrix), lists and how to use them.
Keywords: table, list, read.table, data.frame, column names, matrix, matrices, vector, dim, names

What is a function? | 7 minutes
Keywords: apply, sapply, lapply

Apply and Sapply | 15 minutes
Keywords: apply, sapply, applying a function to rows or columns of a matrix


Feel free to use and/or modify any R code on this website.


Data Analysis Concepts

R tools to summarise large data sets | 30 minutes
Keywords: summary, mean, median, nrow, ncol, dim, subset, hist, head, tail
R code is here

Multiple test correction
Keywords: Bonferroni

Exploring data with plots and summaries | 12 minutes
See the R commands.
Keywords: correlation, mean, median, summary, hist, plot, log scale, log10

Correlation is not causation | 7 minutes
Keywords: pirates, ice cream, sharks, correlation, causation, cor.test, crocodilians

Linear models | 9 minutes
Keywords: gradient, intercept, lm, glucosinolates, F-test, Brassica


Workshop video guides


Please note: not all these workshop videos will be available at the start of term. We'll upload them before you need them though.


Fungal Ecology Dataset

Introduction to the Fungal Ecology Dataset | 11 minutes
Keywords: Daphne Ezer, ecology, soil

Workshop 1 (part 1) | 9 minutes
Keywords: fungi, ecology, metagenomics, operational taxonomic units (OTUs), read.csv, dim, class

Workshop 1 (part 2) | 14 minutes
Keywords: fungi, ecology, metagenomics, rownames, colnames, hist, histogram, which, colSums, barplot

Workshop 1 (part 3, optional) | 12 minutes
Keywords: grep, logical functions, which, TRUE, FALSE, sort, table

Workshop 2 (part 1) | 15 minutes
Keywords: which, grep, load, sapply

Workshop 2 (part 2) | 16 minutes
Keywords: sapply, unique, missing data, library(seqinr), write.fasta, save.image

Workshop 3 | 20 minutes
Keywords: dim, sum, barplot, plot, technical artifacts, rainbow colour palette, length, legend, order, Chi-squared test, chisq.test, library(MASS)

Workshop 4 | 20 minutes
Keywords: fungal diversity, load, as.numeric, unique, pie chart, library(Rgraphviz), Simpson's Index of Diversity, ANOVA test


Fission Yeast Dataset

Introduction to the Yeast Dataset | 7 minutes
Keywords: Fission yeast, Schizosaccharomyces pombe, Daniel Jeffares, essential genes, PomBase, AnGeLi, gene expression, mRNA half-life, protein copies/cell

Workshop 1 | 25 minutes
Keywords: setwd, rm, hist, load, subset, nrow, ncol, summary, log10, pdf

Workshop 2 | 24 minutes
Keywords: box and whiskers plot, boxplot, wilcox.test, log10, mRNA copies per cell, essential genes

Workshop 3 | 38 minutes
Keywords: ggplot2, transposon, merge

Workshop 4 | 34 minutes
Keywords: conservation, phyloP, bar plot, matrix, chisq.test, figure legend


Brassica Dataset

Introduction to the Brassica Dataset | 11 minutes
Keywords: Brassica, Andrea Harper, oilseed rape, RPKM, RNA-seq, glucosinolate

Workshop 1 | 22 minutes
Keywords: read.delim, dim, class, Brassica, rownames, OSR101_RPKM2[,-1], using square brackets, [], is.numeric, sapply, hist, row.names, rowMeans, subset, RPKM, summary, write.table, barplot, save.image

Workshop 2 | 24 minutes
Keywords: read.table, read.delim, rownames, "for" loops, OSR_merge[1:5,1:5], paste, linear model, lm, abline, line of best fit, anova, coefficients, summary, as.data.frame, ncol

Workshop 3 | 26 minutes
Keywords: qqnorm, qqline, read.delim, library("car"), qqPlot, results$P.value, merge (to merge data frames), order, library(ggplot2), geom_point, theme_classic, -log10, Bonferroni multiple test correction, gl, false discovery rate, FDR, colnames

Workshop 4 | 16 minutes
Keywords: read.delim, NCBI Blast, stringsAsFactors, rownames


Pseudosuchia (crocodile) Macroevolution Dataset

Introduction to the Crocodile Macroevolution Dataset
Keywords: Katie Davis, evolution, phylogeny

Workshop 1 | 27 minutes
Keywords: macroevolution, Pseudosuchia, global climate change, read.csv, dim, class, plot, xlim, sea level, save.image

Workshop 2 | 28 minutes
Keywords: macroevolution, Pseudosuchia, global climate change, read.csv, dim, class, plot, xlim, sea level, save.image

Workshop 3 | 17 minutes
Keywords: macroevolution, Pseudosuchia, loading libraries, BAMMtools, Bayesian, phylogeny, phylogenetic trees, speciation, plotRateThroughTime

Workshop 4 | 26 minutes
Keywords: ggplot2, Detrended Cross-Correlation Analysis


Report Writing

Workshop 5: Report Writing Guide | 38 minutes