Back to Big Data Biology main page

All data are from unpublished research by Katie E. Davis & Alex Payne.

Introduction

The biological question

Our biological question is to explore how the diversification of Pseudosuchia was shaped by environmental changes in the geological past.

You will do this by testing for significant correlations between the speciation rate of Pseudosuchia and global environmental change through geological time.

Over the course of the four workshops you will be guided towards answering the hypothesis by showing you:

  • How to explore the different data types.
  • How to partition the phylogenetic data by habitat so that you can test whether ecology affected biotic responses to environmental change.
  • How to plot phylogenetic trees.
  • How to plot time series from environmental & diversification data.
  • How to carry out correlation analyses between these time series.
  • How to plot the output as a histogram.
  • How to test for statistical significance.

Aim of this workshop

The purpose of this workshop is to familiarise yourself with the diversification rate data. You will be extracting a speciation rate time series from these data that you will use in the last workshop to carry out some correlation analyses.

You have been provided with the following data:

  • Diversification rate data that contain speciation & extinction rates.

Learning outcomes

  • Loading & exploring diversification rate data.
  • Plotting rate through time data.
  • Understanding where diversification rate data come from and what they can tell us about past rates of diversification.

Getting started

You can find all the data needed for these workshops here:

https://www-users.york.ac.uk/~kd856/WorkshopData/

You will also need to load the data you saved after the first workshop.

load("Crocs_Workshop2.RData")

Again, we have some libraries we need to load.

library(BAMMtools)
## Loading required package: ape
library(phytools)
## Loading required package: maps

Extracting rates through time from BAMM/fossilBAMM output data

This session we’re going to explore the diversification dynamics of this group. These data come from a piece of software called fossilBAMM. Later on I link you to the documentation so you can read more about how this is done. Note that the documentation links not to FossilBAMM but to BAMM. FossilBAMM is just a version of BAMM that can model diversification rates for fossil datasets and I will probably use the two names interchangeably as this is the only difference.

First we need to extract “rates through time”. These data are the speciation and extinction rates through time for our group. Remember that our group is composed of the terrestrial species within Pseudosuchia but first we need to load all the diversification rate data before we can start to extract the information relevant to our group. This might take a few minutes to load so this is a good opportunity to put the kettle on!

edata <- getEventData(tree, eventdata ='fossilCrocDiversificationData.txt', burnin=0.1)
## Reading event datafile:  fossilCrocDiversificationData.txt 
##      ...........
## Read a total of 10000 samples from posterior
## 
## Discarded as burnin: GENERATIONS <  2997000
## Analyzing  9001  samples from posterior
## 
## Setting recursive sequence on tree...
## 
## Done with recursive sequence

You might be wondering what this means: ‘burnin = 01’. This means that we are removing the first 10% of the samples to account for the burn-in phase of the analysis. The rationale behind this is that the diversification analysis that produced these data probably didn’t immediately find the most probable solutions. Removing the first 10% means that we’re left with a set of results of equally high probability. See the resources listed at the end of this workshop for more on this.

Now we can extract the subtree that contains only the terrestrial taxa, this may also take a minute or two. NB: this is not quite the same as the subtree in workshop 2, it contains the same taxa but that subtree was a phylo object - this means it just contained the components that make up a phylogenetic tree, ie. tips, nodes and edges (branches). This subtree is a BAMM object, it contains the tree AND the associated diversification data.

streeTerrestrial <- subtreeBAMM(edata, tips=TerrestrialTaxa)

Take a look at streeTerrestrial, it should tell you some information about the extracted tree & associated data.

streeTerrestrial
## 
## Phylogenetic tree with 207 tips and 206 internal nodes.
## 
## Tip labels:
##   Acaenasuchus_geoffreyi, Desmatosuchus_haplocerus, Desmatosuchus_smalli, Lucasuchus_hunti, Sierritasuchus_macalpini, Longosuchus_meadei, ...
## 
## Rooted; includes branch lengths.
## 
## Posterior samples: 9001 
## 
## List elements:
##   edge Nnode tip.label edge.length begin end downseq lastvisit numberEvents eventData     eventVectors tipStates tipLambda meanTipLambda eventBranchSegs tipMu meanTipMu type

Now we need to extract the rates through time for our subtree. This may take a minute or two again.

rtt_T <- getRateThroughTimeMatrix(streeTerrestrial)

Exploring rate through time data

Let’s take a look.

summary(rtt_T)
##        Length Class  Mode     
## lambda 900100 -none- numeric  
## mu     900100 -none- numeric  
## times     100 -none- numeric  
## type        1 -none- character

This shows you what rtt_T contains. You can also type rtt_T$ then press tab for a list.

Try exploring what is in rtt_T.

First let’s look at the times, these are your 100 times with the ages in MYR.

rtt_T$times
##   [1]   0.000000   2.848220   5.696439   8.544659  11.392879  14.241098
##   [7]  17.089318  19.937538  22.785758  25.633977  28.482197  31.330417
##  [13]  34.178636  37.026856  39.875076  42.723295  45.571515  48.419735
##  [19]  51.267954  54.116174  56.964394  59.812614  62.660833  65.509053
##  [25]  68.357273  71.205492  74.053712  76.901932  79.750151  82.598371
##  [31]  85.446591  88.294811  91.143030  93.991250  96.839470  99.687689
##  [37] 102.535909 105.384129 108.232348 111.080568 113.928788 116.777007
##  [43] 119.625227 122.473447 125.321667 128.169886 131.018106 133.866326
##  [49] 136.714545 139.562765 142.410985 145.259204 148.107424 150.955644
##  [55] 153.803863 156.652083 159.500303 162.348523 165.196742 168.044962
##  [61] 170.893182 173.741401 176.589621 179.437841 182.286060 185.134280
##  [67] 187.982500 190.830719 193.678939 196.527159 199.375379 202.223598
##  [73] 205.071818 207.920038 210.768257 213.616477 216.464697 219.312916
##  [79] 222.161136 225.009356 227.857575 230.705795 233.554015 236.402235
##  [85] 239.250454 242.098674 244.946894 247.795113 250.643333 253.491553
##  [91] 256.339772 259.187992 262.036212 264.884432 267.732651 270.580871
##  [97] 273.429091 276.277310 279.125530 281.973750

Now let’s look at lambda.

rtt_T$lambda

Why are there so many results? This is because the full data set contain 9,001 simulations of diversification rate change over the phylogenetic tree. You can only see the full results by scrolling up but there are 100 columns with 9,001 speciation rates for each column. The 100 columns represent geological time, ie. this is showing the speciation rates at 100 points throughout the evolutionary history of this group. In this matrix we are just seeing that there are 100 times recorded as indices, we cannot see the ages associated with these indices.

Now, let’s look at mu (extinction rates). We don’t explore extinction in the workshop material but an option for further analysis would be to try running all the analyses for extinction as well as speciation.

rtt_T$mu

Type - there are only two types of BAMM analysis ‘diversification’ and ‘trait’. These data come from a ‘diversification’ analysis.

rtt_T$type

Plotting rates through time

Let’s take a look at the speciation rate time series, if you want to plot extinction you’ll need to change the ratetype to ‘extinction’, you might also want to change the colour. Feel free to edit the plots to your preference too ready for your report.

plotRateThroughTime(streeTerrestrial, ratetype='speciation', avgCol="blue", ylim=c(0,0.5), cex.axis=2, intervalCol='blue', intervals=c(0.05, 0.95), opacity=0.1)

Why are there confidence intervals and not just a single line plot for the result?

Don’t forget that you can save as a PDF to use in your report. You can also edit the settings to change the look of these figures if you want.

Where do these data come from?

At this point I strongly recommend you spend a bit of time reading about where the diversification rate data come from. This will really help you to gain a deeper understanding of what you’re looking at today. The possibilities go well beyond what we look at in these workshops so I recommend the following sections in the documentation as a starting point - 1, 3.6 & 8.9.

Section 8 is about the basic analyses you can carry out on these data using the package ‘BAMMtools’. An advanced option for your report could be to try exploring some of these. For example, can you find out where rate shifts took place on the Pseudosuchia phylogeny?

This is the link to the software documentation:

http://bamm-project.org/documentation.html

And in the resources section, at the end of the workshop, you will further useful links.

Saving data

Don’t forget to save your data.

save.image("Crocs_Workshop3.RData")

You might have some time spare in this workshop so you could either start exploring extinction rates using the above code or start working through workshops 2 and 3 with the marine taxa.

For next time

For next time, continue working on the marine taxa and also look at extinction rates for both terrestrial and marine taxa. You might also want to try combining your plots so that you can your phylogeny, along with speciation rate change and environmental change all in the same place. We’ll be analysing this statistically in the next workshop but you’re likely to find combining your plots useful as a visualisation tool. Finally, you could also consider exploring the BAMMtools options some more, such as locations of rate shifts and macroevolutionary cohort analysis.

Resources

Here are the papers describing BAMM, fossilBAMM & BAMMtools:

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0089543

https://academic.oup.com/sysbio/article/68/1/1/4999317

https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.12199

Bayesian statistics resources

Basics of Bayesian statistics:

https://www.quantstart.com/articles/Bayesian-Statistics-A-Beginners-Guide/

https://www.analyticsvidhya.com/blog/2016/06/bayesian-statistics-beginners-simple-english/

MCMC sampling in Bayesian inference:

https://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo#Convergence

https://link.springer.com/article/10.3758/s13423-016-1015-8

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3619958/