Back to Big Data Biology main page
All data are from unpublished research by Katie E. Davis & Alex Payne.
Our biological question is to explore how the diversification of Pseudosuchia was shaped by environmental changes in the geological past.
You will do this by testing for significant correlations between the speciation rate of Pseudosuchia and global environmental change through geological time.
Over the course of the four workshops you will be guided towards answering the hypothesis by showing you:
The purpose of this workshop is to familiarise yourself with the diversification rate data. You will be extracting a speciation rate time series from these data that you will use in the last workshop to carry out some correlation analyses.
You have been provided with the following data:
You can find all the data needed for these workshops here:
https://www-users.york.ac.uk/~kd856/WorkshopData/
You will also need to load the data you saved after the first workshop.
load("Crocs_Workshop2.RData")
Again, we have some libraries we need to load.
library(BAMMtools)
## Loading required package: ape
library(phytools)
## Loading required package: maps
This session we’re going to explore the diversification dynamics of this group. These data come from a piece of software called fossilBAMM. Later on I link you to the documentation so you can read more about how this is done. Note that the documentation links not to FossilBAMM but to BAMM. FossilBAMM is just a version of BAMM that can model diversification rates for fossil datasets and I will probably use the two names interchangeably as this is the only difference.
First we need to extract “rates through time”. These data are the speciation and extinction rates through time for our group. Remember that our group is composed of the terrestrial species within Pseudosuchia but first we need to load all the diversification rate data before we can start to extract the information relevant to our group. This might take a few minutes to load so this is a good opportunity to put the kettle on!
edata <- getEventData(tree, eventdata ='fossilCrocDiversificationData.txt', burnin=0.1)
## Reading event datafile: fossilCrocDiversificationData.txt
## ...........
## Read a total of 10000 samples from posterior
##
## Discarded as burnin: GENERATIONS < 2997000
## Analyzing 9001 samples from posterior
##
## Setting recursive sequence on tree...
##
## Done with recursive sequence
You might be wondering what this means: ‘burnin = 01’. This means that we are removing the first 10% of the samples to account for the burn-in phase of the analysis. The rationale behind this is that the diversification analysis that produced these data probably didn’t immediately find the most probable solutions. Removing the first 10% means that we’re left with a set of results of equally high probability. See the resources listed at the end of this workshop for more on this.
Now we can extract the subtree that contains only the terrestrial taxa, this may also take a minute or two. NB: this is not quite the same as the subtree in workshop 2, it contains the same taxa but that subtree was a phylo object - this means it just contained the components that make up a phylogenetic tree, ie. tips, nodes and edges (branches). This subtree is a BAMM object, it contains the tree AND the associated diversification data.
streeTerrestrial <- subtreeBAMM(edata, tips=TerrestrialTaxa)
Take a look at streeTerrestrial, it should tell you some information about the extracted tree & associated data.
streeTerrestrial
##
## Phylogenetic tree with 207 tips and 206 internal nodes.
##
## Tip labels:
## Acaenasuchus_geoffreyi, Desmatosuchus_haplocerus, Desmatosuchus_smalli, Lucasuchus_hunti, Sierritasuchus_macalpini, Longosuchus_meadei, ...
##
## Rooted; includes branch lengths.
##
## Posterior samples: 9001
##
## List elements:
## edge Nnode tip.label edge.length begin end downseq lastvisit numberEvents eventData eventVectors tipStates tipLambda meanTipLambda eventBranchSegs tipMu meanTipMu type
Now we need to extract the rates through time for our subtree. This may take a minute or two again.
rtt_T <- getRateThroughTimeMatrix(streeTerrestrial)
Let’s take a look.
summary(rtt_T)
## Length Class Mode
## lambda 900100 -none- numeric
## mu 900100 -none- numeric
## times 100 -none- numeric
## type 1 -none- character
This shows you what rtt_T contains. You can also type rtt_T$ then press tab for a list.
Try exploring what is in rtt_T.
First let’s look at the times, these are your 100 times with the ages in MYR.
rtt_T$times
## [1] 0.000000 2.848220 5.696439 8.544659 11.392879 14.241098
## [7] 17.089318 19.937538 22.785758 25.633977 28.482197 31.330417
## [13] 34.178636 37.026856 39.875076 42.723295 45.571515 48.419735
## [19] 51.267954 54.116174 56.964394 59.812614 62.660833 65.509053
## [25] 68.357273 71.205492 74.053712 76.901932 79.750151 82.598371
## [31] 85.446591 88.294811 91.143030 93.991250 96.839470 99.687689
## [37] 102.535909 105.384129 108.232348 111.080568 113.928788 116.777007
## [43] 119.625227 122.473447 125.321667 128.169886 131.018106 133.866326
## [49] 136.714545 139.562765 142.410985 145.259204 148.107424 150.955644
## [55] 153.803863 156.652083 159.500303 162.348523 165.196742 168.044962
## [61] 170.893182 173.741401 176.589621 179.437841 182.286060 185.134280
## [67] 187.982500 190.830719 193.678939 196.527159 199.375379 202.223598
## [73] 205.071818 207.920038 210.768257 213.616477 216.464697 219.312916
## [79] 222.161136 225.009356 227.857575 230.705795 233.554015 236.402235
## [85] 239.250454 242.098674 244.946894 247.795113 250.643333 253.491553
## [91] 256.339772 259.187992 262.036212 264.884432 267.732651 270.580871
## [97] 273.429091 276.277310 279.125530 281.973750
Now let’s look at lambda.
rtt_T$lambda
Why are there so many results? This is because the full data set contain 9,001 simulations of diversification rate change over the phylogenetic tree. You can only see the full results by scrolling up but there are 100 columns with 9,001 speciation rates for each column. The 100 columns represent geological time, ie. this is showing the speciation rates at 100 points throughout the evolutionary history of this group. In this matrix we are just seeing that there are 100 times recorded as indices, we cannot see the ages associated with these indices.
Now, let’s look at mu (extinction rates). We don’t explore extinction in the workshop material but an option for further analysis would be to try running all the analyses for extinction as well as speciation.
rtt_T$mu
Type - there are only two types of BAMM analysis ‘diversification’ and ‘trait’. These data come from a ‘diversification’ analysis.
rtt_T$type
Let’s take a look at the speciation rate time series, if you want to plot extinction you’ll need to change the ratetype to ‘extinction’, you might also want to change the colour. Feel free to edit the plots to your preference too ready for your report.
plotRateThroughTime(streeTerrestrial, ratetype='speciation', avgCol="blue", ylim=c(0,0.5), cex.axis=2, intervalCol='blue', intervals=c(0.05, 0.95), opacity=0.1)
Why are there confidence intervals and not just a single line plot for the result?
Don’t forget that you can save as a PDF to use in your report. You can also edit the settings to change the look of these figures if you want.
At this point I strongly recommend you spend a bit of time reading about where the diversification rate data come from. This will really help you to gain a deeper understanding of what you’re looking at today. The possibilities go well beyond what we look at in these workshops so I recommend the following sections in the documentation as a starting point - 1, 3.6 & 8.9.
Section 8 is about the basic analyses you can carry out on these data using the package ‘BAMMtools’. An advanced option for your report could be to try exploring some of these. For example, can you find out where rate shifts took place on the Pseudosuchia phylogeny?
This is the link to the software documentation:
http://bamm-project.org/documentation.html
And in the resources section, at the end of the workshop, you will further useful links.
Don’t forget to save your data.
save.image("Crocs_Workshop3.RData")
You might have some time spare in this workshop so you could either start exploring extinction rates using the above code or start working through workshops 2 and 3 with the marine taxa.
For next time, continue working on the marine taxa and also look at extinction rates for both terrestrial and marine taxa. You might also want to try combining your plots so that you can your phylogeny, along with speciation rate change and environmental change all in the same place. We’ll be analysing this statistically in the next workshop but you’re likely to find combining your plots useful as a visualisation tool. Finally, you could also consider exploring the BAMMtools options some more, such as locations of rate shifts and macroevolutionary cohort analysis.
Here are the papers describing BAMM, fossilBAMM & BAMMtools:
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0089543
https://academic.oup.com/sysbio/article/68/1/1/4999317
https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.12199
Bayesian statistics resources
Basics of Bayesian statistics:
https://www.quantstart.com/articles/Bayesian-Statistics-A-Beginners-Guide/
https://www.analyticsvidhya.com/blog/2016/06/bayesian-statistics-beginners-simple-english/
MCMC sampling in Bayesian inference:
https://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo#Convergence