# Exercise: observer agreement about eye colour

This website is for students following the M.Sc. in Evidence Based Practice at the University of York.

183 students were observed twice, each time by a different student observer. The observers measured height (mm), arm circumference (mm), head circumference, and pulse (beats/min), and recorded sex and eye colour (black, brown, blue, grey, hazel, green, or other). They entered these data into a computer file, with eye colour and sex entered as numerical codes.

The following table shows the eye colour recorded by the first observer (rows) against the eye colour recorded by the second observer (columns):

| First observer | black | brown | blue | grey | hazel | green | other | Total |
|----------------|------:|------:|-----:|-----:|------:|------:|------:|------:|
| black          |     6 |     4 |    0 |    0 |     0 |     0 |     0 |    10 |
| brown          |     6 |    69 |    0 |    0 |     4 |     0 |     1 |    80 |
| blue           |     0 |     0 |   39 |    1 |     0 |     2 |     2 |    44 |
| grey           |     0 |     1 |    1 |    4 |     0 |     4 |     0 |    10 |
| hazel          |     0 |     1 |    0 |    0 |     9 |     4 |     0 |    14 |
| green          |     0 |     0 |    1 |    1 |     1 |    15 |     2 |    20 |
| other          |     0 |     0 |    0 |    0 |     0 |     2 |     3 |     5 |
| Total          |    12 |    75 |   41 |    6 |    14 |    27 |     8 |   183 |

The Stata output for the kappa statistic for this table is:

```
. kap eye1 eye2

             Expected
Agreement   Agreement     Kappa   Std. Err.         Z      Prob>Z
-----------------------------------------------------------------
  79.23%      26.16%     0.7188     0.0385      18.69      0.0000
```
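As a check on this output, the unweighted kappa can be reproduced from the table itself. The sketch below (in Python with NumPy; the variable names are our own) computes the observed agreement from the diagonal, the expected agreement from the products of the row and column totals, and kappa as the chance-corrected ratio of the two:

```python
import numpy as np

# Cross-tabulation of eye colour: rows = first observer, columns = second
# observer, in the order black, brown, blue, grey, hazel, green, other.
table = np.array([
    [6,  4,  0, 0, 0,  0, 0],
    [6, 69,  0, 0, 4,  0, 1],
    [0,  0, 39, 1, 0,  2, 2],
    [0,  1,  1, 4, 0,  4, 0],
    [0,  1,  0, 0, 9,  4, 0],
    [0,  0,  1, 1, 1, 15, 2],
    [0,  0,  0, 0, 0,  2, 3],
])

n = table.sum()                                  # 183 student pairs
observed = np.trace(table) / n                   # proportion of exact agreement
# Expected agreement under independence: sum of row total * column total / n^2.
expected = (table.sum(axis=1) * table.sum(axis=0)).sum() / n**2
kappa = (observed - expected) / (1 - expected)

print(f"Observed {observed:.2%}, expected {expected:.2%}, kappa {kappa:.4f}")
# → Observed 79.23%, expected 26.16%, kappa 0.7188
```

The three figures agree with the Stata output above to the printed precision.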

Question 1:

How would you describe the level of agreement in this table?

Question 2:

The expected agreement is much lower than for sex, where it was 54.52%. Why is this?

Question 3:

How could we improve the kappa statistic?

Question 4:

What pairs of categories might be regarded as minor disagreements?

Question 5:

What might be plausible weights for the pairs of eye colour categories?

We can use the following disagreement weights:

|       | black | brown | blue | grey | hazel | green | other |
|-------|------:|------:|-----:|-----:|------:|------:|------:|
| black |     0 |     1 |    2 |    2 |     2 |     2 |     2 |
| brown |     1 |     0 |    2 |    2 |     1 |     2 |     2 |
| blue  |     2 |     2 |    0 |    1 |     2 |     2 |     2 |
| grey  |     2 |     2 |    1 |    0 |     2 |     1 |     2 |
| hazel |     2 |     1 |    2 |    2 |     0 |     1 |     2 |
| green |     2 |     2 |    2 |    1 |     1 |     0 |     2 |
| other |     2 |     2 |    2 |    2 |     2 |     2 |     0 |
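With disagreement weights, the weighted kappa is one minus the ratio of the weighted observed disagreement to the weighted expected disagreement. A sketch in Python with NumPy (the variable names are our own, and the contingency table is repeated from the eye colour table above):

```python
import numpy as np

# Contingency table: rows = first observer, columns = second observer,
# in the order black, brown, blue, grey, hazel, green, other.
table = np.array([
    [6,  4,  0, 0, 0,  0, 0],
    [6, 69,  0, 0, 4,  0, 1],
    [0,  0, 39, 1, 0,  2, 2],
    [0,  1,  1, 4, 0,  4, 0],
    [0,  1,  0, 0, 9,  4, 0],
    [0,  0,  1, 1, 1, 15, 2],
    [0,  0,  0, 0, 0,  2, 3],
])

# Disagreement weights: 0 = agreement, 1 = minor, 2 = major disagreement.
w = np.array([
    [0, 1, 2, 2, 2, 2, 2],
    [1, 0, 2, 2, 1, 2, 2],
    [2, 2, 0, 1, 2, 2, 2],
    [2, 2, 1, 0, 2, 1, 2],
    [2, 1, 2, 2, 0, 1, 2],
    [2, 2, 2, 1, 1, 0, 2],
    [2, 2, 2, 2, 2, 2, 0],
])

n = table.sum()
p = table / n                                              # observed proportions
e = np.outer(table.sum(axis=1), table.sum(axis=0)) / n**2  # expected proportions

# Weighted kappa from disagreement weights: the diagonal contributes
# nothing because its weights are zero.
kappa_w = 1 - (w * p).sum() / (w * e).sum()
print(f"Weighted kappa: {kappa_w:.4f}")
```

Because minor disagreements (weight 1) are penalised less than major ones (weight 2), the weighted kappa comes out somewhat higher than the unweighted value of 0.7188.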

We could instead use agreement weights, which some programs, such as Stata, require. (SPSS 16 does not calculate weighted kappa.)

Question 6:

What weights for agreement would correspond to these disagreement weights?