Introduction

Aims

In the third of three related workshops we will learn to apply the glm() function to binary (binomial) response data and to interpret the results.

Objectives

By actively following the first lecture, working through workbook examples during the workshop and completing any follow-up independent study, the successful student will be able to:

  • Explain the link between the general linear model and the generalised linear model
  • Recognise where a generalised linear model for binomially distributed data would be appropriate and apply glm()
  • Determine which effects are significant using summary() and anova()

You can optionally stretch yourself by asking for more in-depth explanations of the meaning of the estimates and the direction and magnitude of the effects, or by creating figures to go with your analyses. Biomedical Scientists might be particularly interested in interpreting binomial glm estimates as ‘odds ratios’: the estimates are reported on the log-odds scale, so exponentiating a slope estimate gives an odds ratio.
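As a minimal sketch of the odds-ratio idea, the example below fits a binomial glm to simulated data and exponentiates the estimates. The variable names dose and outcome and all the simulated values are hypothetical and are not taken from any workbook data set.

```r
# Simulated data: 'dose' and 'outcome' are hypothetical names, not workbook variables
set.seed(2)
dose <- runif(60, min = 0, max = 10)
outcome <- rbinom(60, size = 1, prob = plogis(-2 + 0.4 * dose))

# Binomial glm with the default logit link
mod <- glm(outcome ~ dose, family = binomial)

# Estimates are on the log-odds scale
coef(mod)

# Exponentiating moves them to the odds scale:
#   exp(intercept) is the odds of outcome = 1 when dose is 0
#   exp(slope) is the odds ratio for a one-unit increase in dose
odds_ratios <- exp(coef(mod))
odds_ratios
```

An odds ratio above 1 for the slope indicates that the odds of the outcome rise as the explanatory variable increases; a value below 1 indicates that they fall.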

Workbook Instructions

The workbook for this session is divided into two sections.

You are not expected to do all of the workbook examples.

Choose one from each section that best matches your biological interests. For each example you choose, you should:

  • write comments in your scripts!
  • read in the data file
  • check you understand the structure of the data
  • identify the response and explanatory variables
  • build a model with glm()
  • examine the model result using summary() and anova()
  • identify the model estimates
  • interpret the results - you might find it helpful to use predict()
  • use plot(mod, which = 1) and plot(mod, which = 2) to examine the assumptions
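The steps above, from reading in the data to using predict(), can be sketched on simulated data as follows. This is an illustration under stated assumptions, not the suggested analysis for any workbook example: the column names size and present, the tab-delimited file layout and every simulated value are hypothetical.

```r
# Simulated stand-in data: 'size' and 'present' are hypothetical column names
set.seed(1)
dat <- data.frame(size = runif(40, min = 0.2, max = 1.2))
dat$present <- rbinom(40, size = 1, prob = plogis(-3 + 5 * dat$size))

# Write then re-read a tab-delimited file to mimic reading in a data file
# (the layout of the real workbook files is an assumption here)
path <- tempfile(fileext = ".txt")
write.table(dat, path, sep = "\t", row.names = FALSE, quote = FALSE)
dat <- read.table(path, header = TRUE)

# Check the structure of the data
str(dat)

# Response: present (0/1); explanatory variable: size (continuous)
mod <- glm(present ~ size, data = dat, family = binomial)

# Examine the model
summary(mod)                # estimates are on the log-odds (logit) scale
anova(mod, test = "Chisq")  # analysis of deviance table

# Predicted probability of presence at chosen values of the explanatory variable
newdat <- data.frame(size = c(0.3, 0.6, 0.9))
newdat$prob <- predict(mod, newdata = newdat, type = "response")
newdat

# The diagnostic plots, plot(mod, which = 1) and plot(mod, which = 2), would follow here
```

Using type = "response" in predict() returns probabilities rather than log odds, which are usually easier to interpret and to plot.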

Optional Extension: Practice your plotting skills.

Section 1

Choose one of:


Wolf Spiders

This example concerns the effect of sand grain size on the presence of wolf spiders. Suzuki et al. (2006) measured sand grain size on 28 beaches in Japan and observed the presence or absence of the burrowing wolf spider Lycosa ishikariana on each beach. The data are in grainsize.txt. Can you predict the presence of spiders from the sand grain size?


Oesophageal cancer

This example examines the effect of alcohol consumption on the incidence of oesophageal cancer in men over 55 years of age. Thirty men aged 55 years and over were surveyed for their alcohol consumption then followed up 10 years later for the occurrence of oesophageal cancer. The data are in oesoph.txt and comprise two variables:

  • status : a variable which indicates whether the individual had developed oesophageal cancer (1) or not (0)
  • alcohol : the amount of alcohol consumed per week in grams

Section 2

Choose one of:


Skin micro-organisms

Human skin is colonised by a diverse collection of micro-organisms which vary considerably between individuals. The presence or absence of a particular micro-organism on the skin of a number of individuals was determined along with variables which might influence presence. The data are in microrg.txt and comprise the following variables:

  • melanin : a continuous measure of the concentration of melanin in the individual’s skin determined by the SR method
  • age : the individual’s age in years (to one tenth of a year)
  • presence : whether the micro-organism is absent (0) or present (1) on the individual’s skin
  • gender : female or male

The goal of analysis was to determine if the presence of the micro-organism could be predicted from an individual’s gender, melanin concentration or age.
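A model with several explanatory variables, one of them categorical, might be set up along the following lines. The sketch uses the variable names listed above, but the data are simulated, so the values, the sample size and the resulting estimates are hypothetical and say nothing about the real microrg.txt data.

```r
# Simulated data using the variable names listed above; all values are hypothetical
set.seed(3)
n <- 80
skin <- data.frame(
  melanin = runif(n, min = 1, max = 5),
  age     = round(runif(n, min = 18, max = 70), 1),
  gender  = factor(sample(c("female", "male"), n, replace = TRUE))
)
skin$presence <- rbinom(n, size = 1, prob = plogis(-1 + 0.03 * skin$age))

# Binomial glm with one categorical and two continuous explanatory variables
mod <- glm(presence ~ gender + melanin + age, data = skin, family = binomial)

# The 'gendermale' estimate is the difference in log odds between male and female
summary(mod)

# Analysis of deviance: terms are tested sequentially, in the order given in the formula
anova(mod, test = "Chisq")
```

Because anova() on a glm tests terms sequentially, changing the order of the explanatory variables in the formula can change the deviance attributed to each one.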

The Rmd file

Suggested analyses and interpretation for Workbook examples are marked:

#============== WORKBOOK EXAMPLE ==============#

Rmd file