All data are from unpublished research by Katie E. Davis & Alex Payne.

Introduction

The biological question

Our biological question is: how was the diversification of Pseudosuchia shaped by environmental changes in the geological past?

You will do this by testing for significant correlations between the speciation rate of Pseudosuchia and global environmental change through geological time.

Over the course of the four workshops you will be guided towards answering this question. We will show you:

  • How to explore the different data types.
  • How to partition the phylogenetic data by habitat so that you can test whether ecology affected biotic responses to environmental change.
  • How to plot phylogenetic trees.
  • How to plot time series from environmental & diversification data.
  • How to carry out correlation analyses between these time series.
  • How to plot the output as a histogram.
  • How to test for statistical significance.

Aim of this workshop

The purpose of this workshop is to familiarise yourself with the phylogenetic data you will be using throughout the rest of the workshops. You will be extracting a speciation rate time series from these data that you will use in the last workshop to carry out some correlation analyses.

You have been provided with the following data:

  • A phylogenetic tree of Pseudosuchia.
  • Habitat data so you can partition the data into terrestrial and marine taxa.

Learning outcomes

  • Loading & exploring phylogenetic data.
  • Subsetting data.
  • Plotting phylogenetic trees.
  • Plotting nice looking phylogenetic trees.

Important!!!

###DATAMUNGE###

When you see anything between these tags it means that we’re either altering the data to the required format or working out parameters that are very specific to phylogenetic trees. This is beyond what I would expect you to be able to code yourself so do not worry about it. You will not be expected to code anything this complex for your report analyses so just use it when necessary and try not to overthink it! You might want to rename variables to avoid overwriting them but that is all. The novel aspect of your report should come from the analyses you choose to run and how you choose to present your results.

###/DATAMUNGE###

Getting started

You can find all the data needed for these workshops here:

https://www-users.york.ac.uk/~kd856/WorkshopData/

You will also need to load the data you saved after the first workshop.

load("Crocs_Workshop1.RData")

This time we also have some libraries we need to load.

library(phytools)
## Loading required package: ape
## Loading required package: maps
library(strap)
## Loading required package: geoscale

Loading phylogenetic trees into R

Now we can load our Pseudosuchia data starting with the phylogenetic tree.

tree <- read.tree("fossilCrocPhylogeny.tre")

Let’s try plotting our tree.

plot(tree)

We can’t read that! How else can we see what it contains? Let’s take a look:

tree
## 
## Phylogenetic tree with 536 tips and 535 internal nodes.
## 
## Tip labels:
##   Acaenasuchus_geoffreyi, Desmatosuchus_haplocerus, Desmatosuchus_smalli, Lucasuchus_hunti, Sierritasuchus_macalpini, Longosuchus_meadei, ...
## 
## Rooted; includes branch lengths.

You can see from this that your tree contains 536 tips and 535 internal nodes. But what else makes up a phylogenetic tree?

What makes up a phylogenetic tree?

Let’s take a few minutes to look at what makes up a phylogenetic tree. You can also do this for your subtree, once you’ve created it, and compare the results.

Start typing tree$ then press tab to find out what it contains. You should find the following:

tree$edge
tree$edge.length
tree$Nnode
tree$tip.label
tree$root.edge

Why don’t you explore some of these? See what you can learn about your tree. I will give you some hints first: edges are the branches, nodes are the points at which branches connect, the tip labels are the OTUs (Operational Taxonomic Units, i.e. the taxa in the tree) and the root is the deepest (oldest) node in the tree.
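As a minimal sketch of what these components look like, the example below builds a tiny four-taxon tree from a Newick string and prints each part. The toy tree, its taxon names and its branch lengths are made up for illustration; only the functions come from ape, which is loaded with phytools.

```r
library(ape)

# Toy four-taxon tree standing in for the workshop tree
# (taxon names and branch lengths are invented for illustration)
toy <- read.tree(text = "((A:1,B:1):1,(C:1.5,D:0.5):0.5);")

names(toy)        # the components you can reach with toy$
toy$edge          # one row per branch: parent node number, child node number
toy$edge.length   # branch lengths, in the same order as the rows of toy$edge
toy$tip.label     # the taxa (OTUs)
toy$Nnode         # number of internal nodes

Ntip(toy)         # number of tips
is.rooted(toy)    # does the tree have a root?
```

In ape, tips are numbered 1 to Ntip(toy) and internal nodes follow on from there, so the numbers in toy$edge can be matched back to toy$tip.label. The same calls should work on tree and treeT.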

Subsetting data

Now you know a bit about what makes up a phylogeny, let’s try subsetting our data so we can take a closer look at just the terrestrial species.

First we need to load in the habitat data.

habitatdata <- read.csv("HabitatData.csv", header=TRUE, stringsAsFactors = FALSE)

Does it look right?

head(habitatdata)
##                            Taxon     Habitat
## 1         Acaenasuchus_geoffreyi Terrestrial
## 2     Adamanasuchus_eisenhardtae Terrestrial
## 3         Adamantinasuchus_navae Terrestrial
## 4             Adzhosuchus_fuscus Terrestrial
## 5               Aeolodon_priscus      Marine
## 6 Aetobarbakinoides_brasiliensis Terrestrial

Now we can use subset() to extract a list of just terrestrial taxa.

TerrestrialTaxa <- subset(habitatdata, habitatdata$Habitat=='Terrestrial')$Taxon

Check your output. Are these the taxa listed as terrestrial in the csv file?

head(TerrestrialTaxa)
## [1] "Acaenasuchus_geoffreyi"         "Adamanasuchus_eisenhardtae"    
## [3] "Adamantinasuchus_navae"         "Adzhosuchus_fuscus"            
## [5] "Aetobarbakinoides_brasiliensis" "Aetosauroides_scagliai"
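Beyond looking at the first few names, a few quick counts can help confirm that a subset did what you intended. The sketch below uses a small made-up habitat table and a made-up vector of tip labels in place of habitatdata and tree$tip.label, so the object names starting with toy_ are assumptions for illustration only.

```r
# Toy habitat table standing in for habitatdata (invented taxa)
toy_habitat <- data.frame(
  Taxon   = c("Taxon_a", "Taxon_b", "Taxon_c", "Taxon_d"),
  Habitat = c("Terrestrial", "Marine", "Terrestrial", "Marine"),
  stringsAsFactors = FALSE
)

# Pretend tip labels from a tree
toy_tips <- c("Taxon_a", "Taxon_b", "Taxon_c")

# How many taxa are coded for each habitat?
table(toy_habitat$Habitat)

# Same subsetting pattern as in the workshop
toy_terrestrial <- subset(toy_habitat, Habitat == "Terrestrial")$Taxon

# The length should equal the Terrestrial count in the table above
length(toy_terrestrial)

# Any terrestrial taxa that are missing from the tree? (ideally none)
setdiff(toy_terrestrial, toy_tips)
```

With the workshop data you would swap in habitatdata, TerrestrialTaxa and tree$tip.label.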

Now we want to extract the subtree containing only the terrestrial taxa. A subtree is just the part of the phylogenetic tree that only contains the taxa of interest. In this case, we want to extract the parts of the phylogeny that are made up of only terrestrial taxa.

treeT <- keep.tip(tree, TerrestrialTaxa)

Now we have our new tree let’s take a closer look.

treeT
## 
## Phylogenetic tree with 207 tips and 206 internal nodes.
## 
## Tip labels:
##   Acaenasuchus_geoffreyi, Desmatosuchus_haplocerus, Desmatosuchus_smalli, Lucasuchus_hunti, Sierritasuchus_macalpini, Longosuchus_meadei, ...
## 
## Rooted; includes branch lengths.

Notice that it now only contains 207 taxa, which should match those in your CSV file coded as “Terrestrial”. If you google the first one (Acaenasuchus geoffreyi), what can you find out about it? Can you confirm that it really is a terrestrial species? Hint: the Paleobiology Database is a really great resource, see below for more details.
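One way to confirm that a subtree holds exactly the taxa you asked for is to compare its tip labels with your list. This is a sketch on a made-up four-taxon tree; the intersect() step is an extra safeguard that is not part of the workshop code, added here in case a name in the list is absent from the tree.

```r
library(ape)

# Toy tree and a toy list of taxa to keep (all names invented)
toy    <- read.tree(text = "((A:1,B:1):1,(C:1.5,D:0.5):0.5);")
wanted <- c("A", "C", "D")

# Keep only names that really are tips of the tree
wanted <- intersect(wanted, toy$tip.label)

# Extract the subtree
toy_sub <- keep.tip(toy, wanted)

Ntip(toy_sub)                          # number of tips left
setequal(toy_sub$tip.label, wanted)    # same set of names?
```

For the workshop objects the equivalent check would compare treeT$tip.label with TerrestrialTaxa.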

Remember to also take a look inside your tree by typing treeT$ then pressing tab.

Plotting phylogenetic trees

And now let’s plot our subtree.

plot(treeT)

Still looking pretty untidy despite being a lot smaller! Let’s tweak it a bit to see if we can make it a bit clearer.

We can change the font size using cex. Try this:

plot(treeT, cex=0.2)

This is looking much better, though the tree is still big so it’s still difficult to read, especially on a small screen. Why don’t you try saving it to PDF then you can open it up and zoom in to read? Does that help?
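Saving a plot to PDF follows a three-step pattern: open a PDF device, draw the plot, then close the device. The sketch below uses a random 50-tip tree and a temporary file so that it runs on its own; the file name, page size and cex value are assumptions you would adjust for treeT.

```r
library(ape)

# Random toy tree standing in for treeT
set.seed(1)
toy_big <- rtree(50)

# Where to write the PDF (a temporary folder here; use your own path)
out_file <- file.path(tempdir(), "toy_tree.pdf")

# 1. open the PDF device; a tall page gives each tip label more room
pdf(out_file, width = 8, height = 12)

# 2. draw the tree as usual
plot(toy_big, cex = 0.5)

# 3. close the device so the file is written to disk
dev.off()
```

For a tree with a couple of hundred tips you will probably need a taller page, so try increasing height until the labels stop overlapping.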

You could look up the plot() function and try some different settings. I like to plot fan trees - what do you think?
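As one example of a different setting, plot() for trees accepts a type argument, and type = "fan" draws a circular tree. The sketch below uses a random toy tree and writes to a temporary PDF so that it is self-contained; the tree, file name and sizes are assumptions for illustration.

```r
library(ape)

# Random toy tree standing in for treeT
set.seed(42)
toy_big <- rtree(60)

# Square page suits a circular tree
fan_file <- file.path(tempdir(), "toy_fan.pdf")
pdf(fan_file, width = 8, height = 8)

# type = "fan" draws the tree in a circle; no.margin = TRUE uses the whole page
plot(toy_big, type = "fan", cex = 0.5, no.margin = TRUE)

dev.off()
```

To see the plot on screen instead, leave out the pdf() and dev.off() lines. See ?plot.phylo for the other tree types.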

Saving phylogenetic trees to file

You don’t need it for this workshop but it’s also worth knowing how to save your new subtree to file as a Newick string. If you want to visualise your tree in online tools such as IToL (Interactive Tree of Life), you’ll need to do this:

write.tree(treeT, file = 'TerrestrialTaxa.tre')
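If you want to check that a saved tree can be read back in unchanged, a quick round trip is one option. This sketch writes a made-up toy tree to a temporary file; the file name is an assumption.

```r
library(ape)

# Toy tree (invented names and branch lengths)
toy <- read.tree(text = "((A:1,B:1):1,(C:1.5,D:0.5):0.5);")

# Write the tree to a temporary file as a Newick string
tmp_file <- file.path(tempdir(), "toy.tre")
write.tree(toy, file = tmp_file)

# A Newick file is plain text, so you can look at it directly
readLines(tmp_file)

# Read it back in and compare with the original
toy_back <- read.tree(tmp_file)
all.equal(toy, toy_back)
```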

Making a phylogenetic plot scaled to geological time

This is a bit fiddly because the package we’re going to use to make this figure wasn’t designed for phylogenies that are already time-calibrated - but we can make it work! If you want to do this for the marine taxa too you just follow the same steps as above but extract the marine species instead of the terrestrial ones when you subset your data.

###DATAMUNGE###

#Find the root age for plotting: nodeHeights() gives the height above the root at the start and end of every branch
lengths <- nodeHeights(treeT)

#The biggest number is the total height of the tree, which we use as the age of the root
root.time <- max(lengths)

#Set root for plotting
treeT$root.time <- root.time

# grab our OTUs (Operational Taxonomic Units = taxa)
all_otus <- treeT$tip.label

# Create a matrix of zeros with one row per taxon and two columns, this is required by strap
all_otudates <- matrix(0, nrow = length(all_otus), ncol=2)

# Turn the matrix into a data frame
all_otudates <- data.frame(all_otudates)

#set the row names to the taxa (OTUs)
row.names(all_otudates) <- all_otus

# set column names to FAD (First Appearance Datum) and LAD (Last Appearance Datum)
colnames(all_otudates) <- c('FAD','LAD')

###/DATAMUNGE###
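To see what the root age step in the block above is doing, it can help to run nodeHeights() on a tree small enough to check by hand. The toy tree below is made up for illustration. Note one assumption: the maximum height equals the age of the root only when the youngest tip sits at the present day.

```r
library(phytools)

# Toy tree: tips A, B and C are 2 units from the root, tip D only 1 unit
toy <- read.tree(text = "((A:1,B:1):1,(C:1.5,D:0.5):0.5);")

# One row per branch: height above the root at the start and end of that branch
heights <- nodeHeights(toy)
heights

# The largest value is the distance from the root to the furthest tip
toy_root_time <- max(heights)
toy_root_time
```

The rows of heights are in the same order as the rows of toy$edge, so you can match each branch to the nodes it connects.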

Now we can pass all_otudates to geoscalePhylo() from the strap package so that we can plot the tree against a geological timescale.

geoscalePhylo(treeT, ages=all_otudates, cex.tip=0.1, lwd=1, quat.rm=TRUE, units=c("Period", "Epoch"), boxes="Epoch")

How much better is that?! You can now start to get a feel for what was going on throughout the evolutionary history of this group. Again, you might want this as a PDF for your report.

Paleobiology database & species data exploration

Now you’ve had a look at your phylogenetic data it’s worth finding out a little about this group. The Paleobiology Database is a fantastic online resource for searching for taxon information. Take a look & try searching for some of the species names you’ve looked at today. What can you find out about them?

Paleobiology database online

You can either explore a little or go straight to the search box in the top right hand corner and enter a taxon name.

Saving data

Last of all, don’t forget to save your data.

save.image("Crocs_Workshop2.RData")

For next time

For next time, repeat this workshop for the marine species. You will find it really helpful, and much more interesting, to have both sets of results for your report. You could also plot your environmental time series from the last workshop against your phylogenetic tree with a geological time scale. Does it look like anything changes in your phylogeny at the same time as any significant environmental changes?
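Since the marine analysis repeats the terrestrial steps with a different habitat value, one option is to wrap the subsetting in a small function. The helper subtree_for_habitat() below is hypothetical and not part of the workshop code; it is shown on a made-up toy tree and habitat table, and it assumes the table has Taxon and Habitat columns like HabitatData.csv.

```r
library(ape)

# Hypothetical helper: extract the subtree for one habitat
subtree_for_habitat <- function(phy, habitat_table, habitat) {
  # taxa coded with the requested habitat
  taxa <- habitat_table$Taxon[habitat_table$Habitat == habitat]
  # keep only those that are present as tips in the tree
  keep.tip(phy, intersect(taxa, phy$tip.label))
}

# Toy tree and toy habitat table (all names and values invented)
toy <- read.tree(text = "((A:1,B:1):1,(C:1.5,(D:0.25,E:0.25):0.25):0.5);")
toy_habitat <- data.frame(
  Taxon   = c("A", "B", "C", "D", "E"),
  Habitat = c("Terrestrial", "Marine", "Marine", "Terrestrial", "Marine"),
  stringsAsFactors = FALSE
)

toy_marine <- subtree_for_habitat(toy, toy_habitat, "Marine")
toy_marine$tip.label
```

With the workshop objects, a call along the lines of treeM <- subtree_for_habitat(tree, habitatdata, "Marine") would give a marine subtree under a new name, so that treeT is not overwritten.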

Resources

You might find it helpful to read up on phylogenetic trees and the geological timescale. Here are some webpages to get you started.

Introduction to phylogenetic trees

Very basic introduction:

https://en.wikipedia.org/wiki/Phylogenetic_tree

A basic overview:

https://www.khanacademy.org/science/high-school-biology/hs-evolution/hs-phylogeny/a/phylogenetic-trees

Less basic but a really nice summary of the key concepts:

https://www.nature.com/scitable/topicpage/reading-a-phylogenetic-tree-the-meaning-of-41956/

An interactive tutorial:

https://courses.lumenlearning.com/wm-biology1/chapter/outcome-phylogenetic-trees/

Geological time resources

Introduction to the geological timescale:

https://personalpages.manchester.ac.uk/staff/russell.garwood/EART22101/Geological_column/index.html#first

Interactive Earth:

https://www.smithsonianmag.com/science-nature/travel-through-deep-time-interactive-earth-180952886/

BGS geological timeline:

https://www.bgs.ac.uk/discoveringGeology/time/timeline/entertimeline.html?

Climate change:

https://www.bgs.ac.uk/discoveringGeology/climateChange/home.html

Climate through time I:

https://www.bgs.ac.uk/discoveringGeology/climateChange/images/climateThroughTime/climate_through_time_large.jpg

Climate through time II:

https://www.bgs.ac.uk/discoveringGeology/climateChange/climateThroughTime/images/ClimatethroughTimeOnline_screengrab.jpg

Geological time resources:

https://serc.carleton.edu/NAGTWorkshops/time/visualizations/geotime.html

https://ei.lehigh.edu/learners/cc/geologictimeline.html

University of California, Berkeley - Explorations through time (interactive modules that explore the history of life on Earth):

https://ucmp.berkeley.edu/education/explotime.html

Rocks and fossils factsheets:

https://www.geolsoc.org.uk/rocksfossils

Past climates:

https://www.bgs.ac.uk/discoveringGeology/climateChange/general/pastClimatesExamples.html

Future climates:

https://www.bgs.ac.uk/discoveringGeology/climateChange/general/futureClimates.html

Thinking like a geologist could help us fight climate change:

https://www.theverge.com/2018/10/23/18015908/marcia-bjornerud-timefulness-geology-climate-change-environment

Understanding the history of Earth’s climate:

https://time.com/5680432/climate-change-history-carbon/

Greenhouse Earth – the story of ancient climate change:

https://www.bgs.ac.uk/discoveringGeology/climateChange/greenhouseEarth.html