back to Big Data Biology main page
On the first day of the workshop, we were introduced to a data set that listed the counts of unique OTUs from a number of environmental samples collected in soil near hedge roots.
In the second workshop, we found the distribution of phyla in these samples (cummulatively, across all samples).
In the third workshop, we found the distribution of phyla in each of the environmental samples, separately.
Now, we will introduce a new data table that contains ‘meta-data’ about where each of the samples were collected.
We will use this meta-data to ask: how does the sampling location impact the phyla distribution of the sample?
We will learn about the importance of meta-data for answering biological questions.
We also learn the following specific data analysis techniques:
You will also have the opportunity to learn the following new programming skills if you choose to (i.e. optional): You will also review some of the functions that were introduced in earlier workshops (including: sapply, which, table, unique) and how to draw histograms, scatter plots, pie charts, and boxplots.
load("Workshop3_ITS.Rda")
#Unlike when we read in the table in week 1, we use the parameters "header=TRUE" and "row.names=1".
#header=TRUE means that the first row represents column names
#row.names=1 means that the first row represents row names
meta=read.csv("GroupVars.csv", header=TRUE, row.names=1, stringsAsFactors = FALSE)
#What are the dimensions of this table?
dim(meta)
## [1] 47 5
#What are the column names and row names?
colnames(meta)
## [1] "Field" "HedgeLocation" "Root.number" "Latitude"
## [5] "Longitude"
rownames(meta)
## [1] "W3" "W6" "W8" "W9" "W14" "W17" "W18" "W19" "W22" "W23" "W24" "W25"
## [13] "W26" "W27" "W28" "W33" "W34" "W35" "W36" "W41" "W42" "W43" "W44" "W50"
## [25] "W51" "W52" "W53" "W58" "W59" "W60" "W61" "W65" "W66" "W69" "W70" "W73"
## [37] "W74" "W76" "W77" "W79" "W81" "W83" "W85" "W86" "W87" "W89" "W95"
The row names refer to environmental samples and the column names refer to various kinds of meta-data. For instance, ‘Field’ refers to which field each hedge came from, and the latitude and longitude refer to the geographic location.
Let’s look at the whole table:
meta
## Field HedgeLocation Root.number Latitude Longitude
## W3 BSSE 1 3 53.86877 -1.326241
## W6 BSSE 1 6 53.86877 -1.326242
## W8 BSSE 1 8 53.86877 -1.326243
## W9 BSSE 2 9 53.86891 -1.326051
## W14 BSSE 2 14 53.86891 -1.326052
## W17 BSSE 3 17 53.86901 -1.325844
## W18 BSSE 3 18 53.86901 -1.325845
## W19 BSSE 3 19 53.86901 -1.325846
## W22 BSSE 3 22 53.86901 -1.325847
## W23 BSSE 3 23 53.86901 -1.325848
## W24 BSSE 3 24 53.86901 -1.325849
## W25 BSSW 4 25 53.87087 -1.332464
## W26 BSSW 4 26 53.87087 -1.332465
## W27 BSSW 4 27 53.87087 -1.332466
## W28 BSSW 4 28 53.87087 -1.332467
## W33 BSSW 5 33 53.87108 -1.332263
## W34 BSSW 5 34 53.87108 -1.332264
## W35 BSSW 5 35 53.87108 -1.332265
## W36 BSSW 5 36 53.87108 -1.332266
## W41 BSSW 6 41 53.87075 -1.332559
## W42 BSSW 6 42 53.87075 -1.332560
## W43 BSSW 6 43 53.87075 -1.332561
## W44 BSSW 6 44 53.87075 -1.332562
## W50 Copse 7 50 53.87373 -1.335384
## W51 Copse 7 51 53.87373 -1.335385
## W52 Copse 7 52 53.87373 -1.335386
## W53 Copse 7 53 53.87373 -1.335387
## W58 Copse 8 58 53.87388 -1.335302
## W59 Copse 8 59 53.87388 -1.335303
## W60 Copse 8 60 53.87388 -1.335304
## W61 Copse 8 61 53.87388 -1.335305
## W65 Copse 9 65 53.87403 -1.335211
## W66 Copse 9 66 53.87403 -1.335212
## W69 Copse 9 69 53.87403 -1.335213
## W70 Copse 9 70 53.87403 -1.335214
## W73 Hillside 10 73 53.87284 -1.328333
## W74 Hillside 10 74 53.87284 -1.328334
## W76 Hillside 10 76 53.87284 -1.328335
## W77 Hillside 10 77 53.87284 -1.328336
## W79 Hillside 10 79 53.87284 -1.328337
## W81 Hillside 11 81 53.87287 -1.328524
## W83 Hillside 11 83 53.87287 -1.328525
## W85 Hillside 11 85 53.87287 -1.328526
## W86 Hillside 11 86 53.87287 -1.328527
## W87 Hillside 11 87 53.87287 -1.328528
## W89 Hillside 12 89 53.87290 -1.328704
## W95 Hillside 12 95 53.87290 -1.328705
To visualise the geographic variability in sampling, let us plot the longitude and latitudes of the samples:
long=as.numeric(meta[,"Longitude"])
lat=as.numeric(meta[,"Latitude"])
plot(long, lat, xlab="longitude", ylab="latitude")
What can you say about the distribution of environmental samples from this plot?
How many unique longitudes are there (use unique function)?
length(unique(long))
## [1] 47
Given this answer, doesn’t it seem strange that we can you only see 12 dots in the graph? Maybe some of the longitudes are very close to one another. Let’s check:
rounded_long=round(long, 4) #round to 4 decimal places
length(unique(rounded_long))
## [1] 12
Okay, that seems to explain it. There seem to be 12 sampling locations, but multiple samples taken in each location. Indeed, each of the dots on this scatterplot correspond to a unique ‘HedgeLocation’, a column in the table:
unique(meta[, 'HedgeLocation'])
## [1] 1 2 3 4 5 6 7 8 9 10 11 12
We are interested in visualising how the diversity varies across different hedges. In this section, we will evaluate this qualitatively using pie charts.
Remember that in the previous workshop, we calculated a count-table ‘hedgerow_counts_by_phylum’, where the rows were phyla and the columns were environmental samples.
dim(hedgerow_counts_by_phylum) # reminder of its dimensions
## [1] 15 47
hedgerow_counts_by_phylum[1:5, 1:5] # let's take a peak at the first 5 rows and columns
## W14 W17 W18 W19 W22
## k__Fungi;p__Ascomycota 25607 14359 21956 17524 20237
## k__Fungi;p__Basidiomycota 13518 1325 2440 2063 1145
## k__Fungi;p__Chytridiomycota 10 0 31 108 293
## k__Fungi;p__Entomophthoromycota 0 0 0 0 0
## k__Fungi;p__Glomeromycota 1468 120 0 86 31
Let’s group these by hedge location. The algorithm is as follows:
unique_hedges=unique(meta[,"HedgeLocation"])
hedgerow_counts_by_phylum_and_hedge=sapply(unique_hedges, function(hedge){
ids_field=which(meta[,"HedgeLocation"]==hedge) #step 1
samples_in_field=rownames(meta)[ids_field] #step 2
subTable=hedgerow_counts_by_phylum[,samples_in_field] #step 3
rowSums(subTable) #step 4
})
hedgerow_counts_by_phylum_and_hedge
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## k__Fungi;p__Ascomycota 16400 58487 116787 32165 96155 43913 103498
## k__Fungi;p__Basidiomycota 48255 19287 12913 18464 3740 15990 30305
## k__Fungi;p__Chytridiomycota 247 10 676 58 3 0 216
## k__Fungi;p__Entomophthoromycota 0 0 0 0 0 0 0
## k__Fungi;p__Glomeromycota 428 1491 238 20828 4330 3128 3849
## k__Fungi;p__GS19 0 0 0 0 0 0 0
## k__Fungi;p__Kickxellomycota 5 0 73 0 3 11 31
## k__Fungi;p__Mortierellomycota 63 234 768 779 443 55 1812
## k__Fungi;p__Mucoromycota 0 0 12 0 0 0 14
## k__Fungi;p__Olpidiomycota 12 1 5092 42 0 463 0
## k__Fungi;p__Rozellomycota 595 8 381 112 48 16 91
## k__Fungi;p__unidentified 2922 116 265 128 0 6 12
## k__Fungi;p__Zoopagomycota 817 0 69 1 6 9 34
## k__Protista;p__Cercozoa 0 0 5 136 3 36 28
## other 1 62 119 184 39 14 73
## [,8] [,9] [,10] [,11] [,12]
## k__Fungi;p__Ascomycota 67507 74472 50598 62683 17270
## k__Fungi;p__Basidiomycota 8538 4471 4429 29665 7278
## k__Fungi;p__Chytridiomycota 55 12 5 318 44
## k__Fungi;p__Entomophthoromycota 20 0 0 0 0
## k__Fungi;p__Glomeromycota 28 2 2340 19589 4880
## k__Fungi;p__GS19 0 0 0 0 28
## k__Fungi;p__Kickxellomycota 0 7 1 0 0
## k__Fungi;p__Mortierellomycota 550 91 796 58 49
## k__Fungi;p__Mucoromycota 26 0 2 13 25
## k__Fungi;p__Olpidiomycota 3483 32 3665 543 19
## k__Fungi;p__Rozellomycota 41 0 1730 236 6
## k__Fungi;p__unidentified 129 115 29 63 15
## k__Fungi;p__Zoopagomycota 26 0 258 381 385
## k__Protista;p__Cercozoa 7 4 5 38 14
## other 137 71 78 157 91
Now, we might want to draw the frequencies of the phyla as pie charts. Normally, a pie chart is drawn with the ‘pie’ function.
pie(hedgerow_counts_by_phylum_and_hedge[,1])
We can make this prettier, using the colour labels and names that we made in the previous workshop
pie(hedgerow_counts_by_phylum_and_hedge[,1], col=col_labels, labels=shorterNames)
However, we would like to include spatial information as well: let us draw the scatter plot we drew above, but with pie charts showing the diversity of samples, instead of dots. Get rid of the # symbol to install the necessary packages.
#install.packages("BiocManager")
BiocManager::install(c("Rgraphviz"))
## Bioconductor version 3.10 (BiocManager 1.30.10), R 3.6.1 (2019-07-05)
## Installing package(s) 'Rgraphviz'
##
## The downloaded binary packages are in
## /var/folders/m9/f1j_khfs5_153g67p0y0c1thj5r65_/T//RtmpFOVT22/downloaded_packages
## Old packages: 'ade4', 'ape', 'backports', 'bibtex', 'broom', 'callr', 'cli',
## 'clipr', 'coda', 'codetools', 'colorspace', 'covr', 'cowplot', 'cpp11',
## 'data.table', 'devtools', 'digest', 'doParallel', 'DT', 'expm', 'FDboost',
## 'foreach', 'Formula', 'fs', 'GGally', 'ggplot2', 'ggrepel', 'gh', 'glue',
## 'gplots', 'gridGraphics', 'htmltools', 'htmlwidgets', 'httpuv', 'httr',
## 'hunspell', 'isoband', 'iterators', 'jsonlite', 'KernSmooth', 'knitr',
## 'labeling', 'later', 'libcoin', 'lme4', 'magick', 'magrittr', 'MASS',
## 'Matrix', 'mboost', 'mgcv', 'mvtnorm', 'network', 'nlme', 'openssl',
## 'partykit', 'pbkrtest', 'pillar', 'pkgbuild', 'processx', 'promises', 'ps',
## 'R.methodsS3', 'R.oo', 'R.utils', 'R6', 'Rcpp', 'RcppArmadillo', 'RcppEigen',
## 'Rdpack', 'readr', 'remotes', 'rlang', 'rmarkdown', 'roxygen2', 'rprojroot',
## 'rstudioapi', 'segmented', 'seqinr', 'shiny', 'sna', 'sp', 'spelling',
## 'statnet.common', 'stringi', 'survival', 'sys', 'testthat', 'tibble',
## 'tinytex', 'usethis', 'V8', 'vctrs', 'withr', 'xfun'
library("Rgraphviz")
## Loading required package: graph
## Loading required package: BiocGenerics
## Loading required package: parallel
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
##
## clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
## clusterExport, clusterMap, parApply, parCapply, parLapply,
## parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, append, as.data.frame, basename, cbind, colnames,
## dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
## grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
## order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
## rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
## union, unique, unsplit, which, which.max, which.min
## Loading required package: grid
#Draw the plot boundary:
plot(c(), xlim=c(min(long), max(long)), ylim=c(min(lat), max(lat)), xlab="Longitude", ylab="Latitutde")
#Draw a pie chart for each hedge:
a=sapply(unique_hedges, function(i){
firstRow=which(meta[,"HedgeLocation"]==i)[1] #get the meta-data for one row from this hedge
hedge_longitude=as.numeric(meta[firstRow, "Longitude"]) # get the longitude for this hedge
hedge_latitude=as.numeric(meta[firstRow, "Latitude"]) # get the latitude for this hedge
vals=hedgerow_counts_by_phylum_and_hedge[,i] # get the phylum counts for this hedge
nonZero=which(vals>0) # pieGlyphs gives an error if any of the values equal 0, so we must only include phyla with counts>0
pieGlyph(vals[nonZero], hedge_longitude, hedge_latitude, radius=0.00009, labels=NA, col=col_labels[nonZero]) # use the pieGlyph function to draw a pie chart in the correct position
})
How would you interpret this graph from a biological perspective? Qualitatively, how does spatial location of the sample impact the fungal phyla distribution? Do there appear to be any samples that are very different from the others? Do you think there is more diversity within a field or between fields?
One way to summarise the diversity of an environmental sample is to ask: if I were to randomly select two individuals from this environment, what is the probability that they will be the same kind of organism (for instance, the same species)? This is the definition of Simpson’s Index (D). In a very diverse environment, it gives a low value (because the probability of selecting the same kind of organism twice is very low) and vice versa. Simpson’s Index of Diversity is 1-D, so that high numbers mean more diversity.
In this section, we will calculate the Simpson’s Index of Diversity for each sample:
BiocManager::install(c("vegan"))
## Bioconductor version 3.10 (BiocManager 1.30.10), R 3.6.1 (2019-07-05)
## Installing package(s) 'vegan'
##
## The downloaded binary packages are in
## /var/folders/m9/f1j_khfs5_153g67p0y0c1thj5r65_/T//RtmpFOVT22/downloaded_packages
## Old packages: 'ade4', 'ape', 'backports', 'bibtex', 'broom', 'callr', 'cli',
## 'clipr', 'coda', 'codetools', 'colorspace', 'covr', 'cowplot', 'cpp11',
## 'data.table', 'devtools', 'digest', 'doParallel', 'DT', 'expm', 'FDboost',
## 'foreach', 'Formula', 'fs', 'GGally', 'ggplot2', 'ggrepel', 'gh', 'glue',
## 'gplots', 'gridGraphics', 'htmltools', 'htmlwidgets', 'httpuv', 'httr',
## 'hunspell', 'isoband', 'iterators', 'jsonlite', 'KernSmooth', 'knitr',
## 'labeling', 'later', 'libcoin', 'lme4', 'magick', 'magrittr', 'MASS',
## 'Matrix', 'mboost', 'mgcv', 'mvtnorm', 'network', 'nlme', 'openssl',
## 'partykit', 'pbkrtest', 'pillar', 'pkgbuild', 'processx', 'promises', 'ps',
## 'R.methodsS3', 'R.oo', 'R.utils', 'R6', 'Rcpp', 'RcppArmadillo', 'RcppEigen',
## 'Rdpack', 'readr', 'remotes', 'rlang', 'rmarkdown', 'roxygen2', 'rprojroot',
## 'rstudioapi', 'segmented', 'seqinr', 'shiny', 'sna', 'sp', 'spelling',
## 'statnet.common', 'stringi', 'survival', 'sys', 'testthat', 'tibble',
## 'tinytex', 'usethis', 'V8', 'vctrs', 'withr', 'xfun'
library("vegan")
## Warning: package 'vegan' was built under R version 3.6.2
## Loading required package: permute
## Loading required package: lattice
## Warning: package 'lattice' was built under R version 3.6.2
## This is vegan 2.5-7
simpson=diversity(t(hedgerow_counts_by_phylum), index="simpson")
simpson
## W14 W17 W18 W19 W22 W23 W24
## 0.49807288 0.16762914 0.20365149 0.32745844 0.33475559 0.33443232 0.15291603
## W25 W26 W27 W28 W33 W34 W35
## 0.57822915 0.57717751 0.46502694 0.57514129 0.06508104 0.09464753 0.22801737
## W36 W3 W41 W42 W43 W44 W50
## 0.18903588 0.56936660 0.00000000 0.42078375 0.43765607 0.49588258 0.30263233
## W51 W52 W53 W58 W59 W60 W61
## 0.08393509 0.49390262 0.41267318 0.36155131 0.33054538 0.19056867 0.31051566
## W65 W66 W69 W6 W70 W73 W74
## 0.18836565 0.10858469 0.19968855 0.45407865 0.02029461 0.43566361 0.37081157
## W76 W77 W79 W81 W83 W85 W86
## 0.21508357 0.52838941 0.32021222 0.49288508 0.56722568 0.50000000 0.38121734
## W87 W89 W8 W95 W9
## 0.51108194 0.55993412 0.02184443 0.57841302 0.25865596
hist(simpson)
The ‘Field’ column tells us which field the sample was taken from.
How many unique fields are there (how many unique values are there in the "Field’ column of meta)? (use unique function)?
length(unique(meta[,"Field"]))
## [1] 4
How many samples are in each field (use table function)
table(meta[,"Field"])
##
## BSSE BSSW Copse Hillside
## 11 12 12 12
Let us find the Simpson Index of Diversity’s for samples corresponding to each field.
unique_fields=unique(meta[,"Field"])
simpson_by_field=sapply(unique_fields, function(i){ #for every field:
rows_in_field=which(meta[,"Field"]==i) #find the rows that correspond to that field
rownames_in_field=rownames(meta)[rows_in_field] # find the rownames of these rows
simpson[rows_in_field] #get the Simpson Index of Diversity for these rows
})
simpson_by_field
## $BSSE
## W14 W17 W18 W19 W22 W23 W24 W25
## 0.4980729 0.1676291 0.2036515 0.3274584 0.3347556 0.3344323 0.1529160 0.5782291
## W26 W27 W28
## 0.5771775 0.4650269 0.5751413
##
## $BSSW
## W33 W34 W35 W36 W3 W41 W42
## 0.06508104 0.09464753 0.22801737 0.18903588 0.56936660 0.00000000 0.42078375
## W43 W44 W50 W51 W52
## 0.43765607 0.49588258 0.30263233 0.08393509 0.49390262
##
## $Copse
## W53 W58 W59 W60 W61 W65 W66
## 0.41267318 0.36155131 0.33054538 0.19056867 0.31051566 0.18836565 0.10858469
## W69 W6 W70 W73 W74
## 0.19968855 0.45407865 0.02029461 0.43566361 0.37081157
##
## $Hillside
## W76 W77 W79 W81 W83 W85 W86
## 0.21508357 0.52838941 0.32021222 0.49288508 0.56722568 0.50000000 0.38121734
## W87 W89 W8 W95 W9
## 0.51108194 0.55993412 0.02184443 0.57841302 0.25865596
Let us compare the Simpson Index of Diversity for these fields:
boxplot(simpson_by_field, xlab="Field", ylab="Simpson Index of Diversity")
Qualitatively, how would you interpret this graph?
Last week, we used a Chi-squared test, because we were comparing two categorical variables (sample and phylum). Now, we have one categorical value (field) and one continuous variable (Simpson’s Index of Diversity), so an ANOVA (Analysis of Variation) test might be better suited.
Our null hypothesis is that the fields have the same mean Simpson’s Index of Diversity.
Some ANOVA test assumptions:
How would you evaluate whether these assumptions seem to hold? As part of your final report, you can choose to check whether these assumptions hold.
For now, we will skip this step and jump right into combining the Simpson’s Index of Diversity and the Field labels into a single data frame.
sampleOrder=rownames(meta)
df_simpson=data.frame(Field=meta[,"Field"], Simpson=simpson[sampleOrder])
df_simpson
## Field Simpson
## W3 BSSE 0.56936660
## W6 BSSE 0.45407865
## W8 BSSE 0.02184443
## W9 BSSE 0.25865596
## W14 BSSE 0.49807288
## W17 BSSE 0.16762914
## W18 BSSE 0.20365149
## W19 BSSE 0.32745844
## W22 BSSE 0.33475559
## W23 BSSE 0.33443232
## W24 BSSE 0.15291603
## W25 BSSW 0.57822915
## W26 BSSW 0.57717751
## W27 BSSW 0.46502694
## W28 BSSW 0.57514129
## W33 BSSW 0.06508104
## W34 BSSW 0.09464753
## W35 BSSW 0.22801737
## W36 BSSW 0.18903588
## W41 BSSW 0.00000000
## W42 BSSW 0.42078375
## W43 BSSW 0.43765607
## W44 BSSW 0.49588258
## W50 Copse 0.30263233
## W51 Copse 0.08393509
## W52 Copse 0.49390262
## W53 Copse 0.41267318
## W58 Copse 0.36155131
## W59 Copse 0.33054538
## W60 Copse 0.19056867
## W61 Copse 0.31051566
## W65 Copse 0.18836565
## W66 Copse 0.10858469
## W69 Copse 0.19968855
## W70 Copse 0.02029461
## W73 Hillside 0.43566361
## W74 Hillside 0.37081157
## W76 Hillside 0.21508357
## W77 Hillside 0.52838941
## W79 Hillside 0.32021222
## W81 Hillside 0.49288508
## W83 Hillside 0.56722568
## W85 Hillside 0.50000000
## W86 Hillside 0.38121734
## W87 Hillside 0.51108194
## W89 Hillside 0.55993412
## W95 Hillside 0.57841302
Now, let’s make a linear model that predicts the Simpson’s Index of Diversity based on the field:
linMod=lm(Simpson ~ Field, data=df_simpson)
Finally, we can perform an ANOVA on this linear model:
anova(linMod)
## Analysis of Variance Table
##
## Response: Simpson
## Df Sum Sq Mean Sq F value Pr(>F)
## Field 3 0.27143 0.090477 3.4189 0.02552 *
## Residuals 43 1.13795 0.026464
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
How do we interpret these results: What can we say about the biodiversities of the different fields? Remember in Workshop 1 we noticed that some samples have very few OTUs (information we stored in the variable colSums_ITS)
sort(colSums_ITS)
## W41 W85 W65 W79 W59 W25 W77 W73 W53 W8 W89 W95 W17
## 2 2 19 237 1218 6444 10257 11592 11932 12951 14370 15734 15806
## W86 W33 W26 W70 W61 W44 W74 W69 W43 W81 W19 W83 W76
## 17160 17250 17683 17971 18065 18558 18927 20390 20994 21377 21584 21723 22923
## W27 W24 W42 W6 W18 W52 W22 W28 W23 W35 W58 W36 W34
## 22999 23488 24087 24371 24756 24803 25151 25771 26613 28248 28273 29147 30125
## W3 W60 W9 W66 W14 W51 W87 W50
## 32423 32991 38771 40897 40925 45262 53482 57966
If we remove the samples that have less than 1000 reads sequenced, will the ANOVA results stay the same?
high_counts=which(colSums_ITS>1000)
rownames_high_counts=names(high_counts)
df_simpson_highcounts=df_simpson[rownames_high_counts,]
linMod2=lm(Simpson ~ Field, data=df_simpson_highcounts)
anova(linMod2)
## Analysis of Variance Table
##
## Response: Simpson
## Df Sum Sq Mean Sq F value Pr(>F)
## Field 3 0.25926 0.086419 3.4261 0.02631 *
## Residuals 39 0.98374 0.025224
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Bonus code for those who want to use this for your project: Here is some code that will help you test the assumptions of the ANOVA test– can you understand what this does and why this is relevant for testing the ANOVA test assumptions?
resids=resid(linMod) #returns the residuals
fields_as_numbers=c(1:4)
names(fields_as_numbers)=unique_fields
fields=meta[,"Field"]
plot(fields_as_numbers[fields], resids, xlab="fields (numbered)", ylab="residuals")
Today we compared the diversity of the different groups of environmental samples, based on the location in which they were sampled. Here are some biological questions to consider:
Statistical questions: