Introduction

Aims

In the first of three related workshops we revisit tests you have seen previously, such as t-tests and ANOVA, but now in the framework of the General Linear Model. We will learn to apply the lm() function and interpret its output.

Objectives

By actively following the first lecture, working through workbook examples during the workshop and completing any follow-up independent study, the successful student will be able to:

  • Explain the link between t-tests, ANOVA and regression
  • Appropriately apply linear models using lm()
  • Interpret the results using summary() and anova() and relate them to the outputs of t.test() and aov()
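The link named in the first objective can be checked directly. The sketch below is only an illustration: the data are simulated and the names conc and treatment are made up for the purpose. It fits the same two-group comparison with t.test() and with lm(), then compares the test statistics.

```r
# Simulated two-group data; 'conc' and 'treatment' are illustrative names only
set.seed(1)
dat <- data.frame(
  conc = c(rnorm(15, mean = 5, sd = 1), rnorm(15, mean = 6.5, sd = 1)),
  treatment = factor(rep(c("control", "treated"), each = 15))
)

# Classical two-sample t-test, assuming equal variances
tt <- t.test(conc ~ treatment, data = dat, var.equal = TRUE)

# The same comparison expressed as a linear model
mod <- lm(conc ~ treatment, data = dat)
coefs <- summary(mod)$coefficients

# The two t statistics have the same size. The sign differs because t.test()
# reports control minus treated, whereas lm() reports treated minus control.
tt$statistic
coefs["treatmenttreated", "t value"]

# With two groups, the F value in the ANOVA table is the square of t
anova(mod)
summary(aov(conc ~ treatment, data = dat))
```

Note that var.equal = TRUE is needed for the match: the default Welch test in t.test() does not assume equal variances, so its result will generally differ slightly from lm().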

You can optionally stretch yourself by asking for more in-depth explanation, creating figures to go with your analyses, or attempting the ‘More advanced examples’.

Workbook Instructions

The workbook for this session is divided into three sections.

You are not expected to do all of the workbook examples.

Choose one (or two, if keen) from each section that best matches your biological interests.

For each example you choose:

  • write comments in your scripts!
  • read in the data file
  • check you understand the structure of the data
  • identify the response and explanatory variables
  • build a model with lm()
  • examine the model result using summary() and anova()
  • what are the group means?
  • what does the summary() reveal about significant effects?
  • use plot(mod, which = 1) and plot(mod, which = 2) to examine the assumptions
  • consider whether to do post-hoc testing with lsmeans() and pairs()
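The steps above might look something like the following in a script. This is a sketch under stated assumptions, not the suggested answer: the data are simulated and written to a temporary file so that the code runs on its own, and the column names adiponectin and treatment, the group labels and the presence of a header row are all assumptions rather than the layout of the real files.

```r
# Made-up data written to a temporary file so the sketch is self-contained.
# In the workshop you would read the real file instead, e.g. "adipocytes.txt".
set.seed(42)
tmp <- tempfile(fileext = ".txt")
write.table(
  data.frame(
    adiponectin = round(c(rnorm(15, 5.5, 1.2), rnorm(15, 7.5, 1.2)), 2),
    treatment = rep(c("control", "nicotinic"), each = 15)
  ),
  file = tmp, row.names = FALSE, quote = FALSE
)

# Read in the data file (header = TRUE assumes the first row holds column names)
adip <- read.table(tmp, header = TRUE, stringsAsFactors = TRUE)

# Check the structure of the data
str(adip)

# Response: adiponectin (continuous); explanatory: treatment (factor, two levels)
mod <- lm(adiponectin ~ treatment, data = adip)

# Examine the model
summary(mod)
anova(mod)

# Group means, to compare with the coefficients in summary(mod)
means <- tapply(adip$adiponectin, adip$treatment, mean)
means

# Examine the assumptions: residuals vs fitted values, then normal Q-Q plot
plot(mod, which = 1)
plot(mod, which = 2)
```

With R's default treatment contrasts, the intercept is the mean of the first factor level (alphabetically, here the control group) and the second coefficient is the difference between the two group means.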

Optional Extension: Practice your plotting skills.

Workbook

Section 1

Choose one of the following examples:


Nicotinic acid on adipocytes

This example is about the effect of nicotinic acid treatment on the adiponectin secretion of an adipocyte cell line. Adiponectin is exclusively secreted from adipose tissue and modulates a number of metabolic processes. Nicotinic acid can affect adiponectin secretion. 3T3-L1 adipocytes were treated with nicotinic acid or with a control treatment and adiponectin concentration (pg/mL) measured. The data are in adipocytes.txt. Each row represents an independent sample of adipocytes. The first column gives the adiponectin concentration and the second column indicates whether the sample was treated with nicotinic acid or not.


Omega 3 Cannabis sativa

Some plant biotechnologists are trying to increase the quantity of omega 3 fatty acids in Cannabis sativa. They have developed a genetically modified line using genes from Linum usitatissimum (linseed). They grow 50 wild-type and 50 modified plants to maturity, collect the seeds and determine the amount of omega 3 fatty acids. The data are in csativa.txt. Do you think their modification has been successful?


Egg laying in a parasitic wasp

The data in wasp.txt concern the egg-laying behaviour of a species of parasitic wasp that lays its eggs on a beetle larva. Wasps and other hymenopterans (ants and bees) are haplo-diploid: unfertilised eggs are haploid and develop into males, whereas fertilised eggs are diploid and develop into females. Researchers wanted to know if mating status affected the time the wasp takes to lay its eggs (in hours). Each row represents an individual wasp. The first column gives the time taken and the second column indicates whether the wasp is mated (1) or unmated (0).
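When a grouping variable is coded as 0 and 1, as mating status is here, it can help to see how lm() treats it. The sketch below uses simulated data; the column names time and status and the labels "unmated" and "mated" are assumptions made for illustration, not the contents of wasp.txt.

```r
# Simulated data with a 0/1 indicator; names and values are illustrative only
set.seed(7)
wasp <- data.frame(
  time = c(rnorm(15, mean = 20, sd = 4), rnorm(15, mean = 24, sd = 4)),
  status = rep(c(0, 1), each = 15)
)

# A labelled factor version of the same variable
wasp$status_f <- factor(wasp$status, levels = c(0, 1),
                        labels = c("unmated", "mated"))

# Model with the numeric 0/1 indicator
mod_num <- lm(time ~ status, data = wasp)

# Model with the factor
mod_fac <- lm(time ~ status_f, data = wasp)

# Both give the same estimates: the intercept is the mean of the group coded 0
# and the second coefficient is the difference between the group means
coef(mod_num)
coef(mod_fac)
mean(wasp$time[wasp$status == 0])
```

This equivalence only holds for two groups coded 0 and 1. With three or more numeric codes, lm() would fit a straight line through the codes unless the variable is converted with factor(), and the factor version also gives more readable output.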

Section 2

Choose one of:


Myoglobin in seals

The myoglobin concentration of skeletal muscle of three species of seal, in grams per kilogram of muscle, was determined and the data are given in seal.txt. We want to know if there is a difference between species. Each row represents an individual seal. The first column gives the myoglobin concentration and the second column indicates species.
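A design with one explanatory variable at three levels can be sketched as follows. The data are simulated, and the species labels sp_A, sp_B and sp_C and the column names are hypothetical. The workbook suggests lsmeans() and pairs() for post-hoc testing; this sketch swaps in base R's TukeyHSD() so that it runs without additional packages.

```r
# Simulated data for three groups; all names and values are illustrative only
set.seed(3)
seal <- data.frame(
  myoglobin = c(rnorm(10, mean = 42, sd = 8),
                rnorm(10, mean = 50, sd = 8),
                rnorm(10, mean = 37, sd = 8)),
  species = factor(rep(c("sp_A", "sp_B", "sp_C"), each = 10))
)

# One-way ANOVA expressed as a linear model
mod <- lm(myoglobin ~ species, data = seal)

# Overall test of a species effect (same F as summary(aov(...)) would give)
anova(mod)

# Coefficients: the intercept is the mean of the first level (sp_A);
# the other rows are differences between each remaining level and sp_A
summary(mod)
means <- tapply(seal$myoglobin, seal$species, mean)
means

# Post-hoc pairwise comparisons, here with Tukey's HSD from base R
tuk <- TukeyHSD(aov(myoglobin ~ species, data = seal))
tuk
```

summary() only compares each level with the first, so a post-hoc test is what tells you about the remaining pair (here sp_B versus sp_C).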


Comparing standardisation methods

Researchers measured the concentration of long-chain hydrocarbons in a single unknown sample by gas chromatography, using three methods of standardisation. They wish to determine whether the three standardisation methods give the same concentrations. The data are given in analyte.txt. The first column gives the analyte concentration determined in parts per million and the second column indicates the standardisation method (‘standard’, ‘internal standard’ or ‘standard addition’).


Insecticides

The data in biomass.txt are taken from an experiment in which the biomass (g) of insect pest species was measured on plots sprayed with different insecticides. The intention was to determine which insecticide was most effective. This example is slightly more challenging.

Section 3

Choose one of:


Fertilisers on crop yield

The data in yield.txt come from a two-factor design in which crop yield (in kilograms) was determined from plots treated with low and high levels of nitrogen and low and high levels of potassium.
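A two-factor design like this one can be fitted with both main effects and their interaction. The following is a minimal sketch on simulated data; the column names nitrogen, potassium and yield, the number of plots and the effect sizes are all assumptions for illustration.

```r
# Simulated balanced two-factor data; names and values are illustrative only
set.seed(11)
yield_dat <- expand.grid(
  plot = 1:10,
  nitrogen = c("low", "high"),
  potassium = c("low", "high")
)  # expand.grid() turns the character vectors into factors, "low" first

# Made-up cell means, including an interaction between the two factors
cell_mean <- with(yield_dat,
  10 + 2 * (nitrogen == "high") + 1 * (potassium == "high") +
    1.5 * (nitrogen == "high" & potassium == "high"))
yield_dat$yield <- rnorm(nrow(yield_dat), mean = cell_mean, sd = 1.5)

# nitrogen * potassium expands to nitrogen + potassium + nitrogen:potassium
mod <- lm(yield ~ nitrogen * potassium, data = yield_dat)

# One row per main effect, one for the interaction and one for the residuals
anova(mod)

# Coefficients are relative to the low nitrogen, low potassium plots
summary(mod)

# Mean yield in each of the four treatment combinations
with(yield_dat, tapply(yield, list(nitrogen, potassium), mean))
```

If the interaction row in anova() is significant, the effect of one fertiliser depends on the level of the other, so the main effects should be interpreted with the table of cell means in view.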


Neuroscience

This example concerns the effect of maternal choline deficiency on neuron cross sectional area in two brain regions in mice. Postnatal cognitive performance is influenced by the choline intake in utero. To better understand this phenomenon, pregnant mice were fed a control or choline-deficient diet and their offspring examined. The cross sectional area (CSA) of cholinergic neurons was determined in two brain regions, the MSN and the DB. The data are given in neuronregion.txt.

More advanced examples

These are a few more complex examples that the especially keen should be able to tackle, because the ‘general linear model’ is extendable and the same principles apply.


Response to cancer treatment

This example concerns the effect of patient genotype and glutathione concentration on sensitivity to cytotoxic drugs. Patients vary in their response to cancer treatment. This may be because sensitivity to cytotoxic (anti-cancer) drugs is influenced by genotype. However, glutathione (GSH) concentration is also implicated in treatment sensitivity. Researchers measured treatment sensitivity and GSH concentration for patients that had one of three alleles (“A2”, “AA01”, “B34”). The data are in response.txt and comprise the following variables:

  • GSH: a continuous measure of glutathione concentration
  • sens: a continuous measure of treatment sensitivity in arbitrary units
  • genotype: a factor with three levels, A2, AA01 and B34

Clover yield

Replicated plots of clover were grown in one of three rotations (2-, 4- or 8-year cycles) and the total seed production calculated. The density of yarrow stems is known to affect clover yield, so this was included as a covariate. The data are in clover.txt and comprise the following variables:

  • clov.y: a continuous measure of clover yield
  • yarrow.s: a (practically) continuous measure of yarrow stem density in arbitrary units
  • cycle: a factor with three levels, A, B and C
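A model with one continuous covariate and one factor uses the same lm() syntax as the earlier examples. The sketch below uses the variable names listed above, but the data are simulated and the effect sizes are invented, so it shows the shape of the analysis rather than the suggested answer.

```r
# Simulated data using the variable names above; the values are made up
set.seed(5)
clover <- data.frame(
  yarrow.s = runif(30, min = 0, max = 100),
  cycle = factor(rep(c("A", "B", "C"), each = 10))
)
cycle_shift <- c(A = 0, B = 5, C = 10)  # hypothetical differences between cycles
clover$clov.y <- 60 - 0.3 * clover$yarrow.s +
  unname(cycle_shift[as.character(clover$cycle)]) +
  rnorm(30, sd = 5)

# Full model: does the slope for yarrow density differ between cycles?
mod_int <- lm(clov.y ~ yarrow.s * cycle, data = clover)
anova(mod_int)

# Additive model: a common slope with a separate intercept for each cycle
mod_add <- lm(clov.y ~ yarrow.s + cycle, data = clover)
summary(mod_add)
```

anova() tests terms sequentially, so when a covariate is involved the order of terms in the formula can change the table. Putting yarrow.s first tests the effect of cycle after adjusting for yarrow density.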

The Rmd file

Suggested analyses and interpretation for Workbook examples are marked:

#============== WORKBOOK EXAMPLE ==============#

Suggested analyses and interpretation for more advanced examples are marked:

#============== MORE ADVANCED EXAMPLES ==============#

Rmd file