This website is for students following the M.Sc. in Evidence Based Practice at the University of York.
Each of 183 students was observed twice, by two different student observers. The observers measured height (mm), arm circumference (mm), head circumference, and pulse (beats/min), and recorded sex and eye colour (black, brown, blue, grey, hazel, green, other). They entered these data into a computer file, with eye colour and sex entered as numerical codes.
The following table shows the eye colour recorded by the two observers, with rows for the first observer and columns for the second:

| First observer | black | brown | blue | grey | hazel | green | other | Total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| black | 6 | 4 | 0 | 0 | 0 | 0 | 0 | 10 |
| brown | 6 | 69 | 0 | 0 | 4 | 0 | 1 | 80 |
| blue | 0 | 0 | 39 | 1 | 0 | 2 | 2 | 44 |
| grey | 0 | 1 | 1 | 4 | 0 | 4 | 0 | 10 |
| hazel | 0 | 1 | 0 | 0 | 9 | 4 | 0 | 14 |
| green | 0 | 0 | 1 | 1 | 1 | 15 | 2 | 20 |
| other | 0 | 0 | 0 | 0 | 0 | 2 | 3 | 5 |
| Total | 12 | 75 | 41 | 6 | 14 | 27 | 8 | 183 |
The Stata output for the kappa statistic for this table is:
```
. kap eye1 eye2

             Expected
Agreement    Agreement     Kappa   Std. Err.         Z      Prob>Z
-----------------------------------------------------------------
  79.23%      26.16%     0.7188     0.0385      18.69      0.0000
```
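For readers who want to check the arithmetic behind this output, the following is a minimal sketch in Python (ours, not part of the course materials; the variable names are our own). Kappa is (observed agreement - expected agreement) / (1 - expected agreement), where the expected agreement is what the marginal totals of the table would produce by chance.

```python
# Unofficial check of the unweighted kappa, using the eye colour table above.
# Rows: first observer; columns: second observer.
table = [
    [6,  4,  0, 0, 0,  0, 0],   # black
    [6, 69,  0, 0, 4,  0, 1],   # brown
    [0,  0, 39, 1, 0,  2, 2],   # blue
    [0,  1,  1, 4, 0,  4, 0],   # grey
    [0,  1,  0, 0, 9,  4, 0],   # hazel
    [0,  0,  1, 1, 1, 15, 2],   # green
    [0,  0,  0, 0, 0,  2, 3],   # other
]

n = sum(map(sum, table))                         # 183 students
row_totals = [sum(row) for row in table]         # first observer's totals
col_totals = [sum(col) for col in zip(*table)]   # second observer's totals

# Observed agreement: proportion on the diagonal (145/183).
observed = sum(table[i][i] for i in range(7)) / n
# Expected agreement: chance agreement from the marginal totals.
expected = sum(row_totals[i] * col_totals[i] for i in range(7)) / n ** 2

kappa = (observed - expected) / (1 - expected)
print(f"Agreement = {observed:.2%}, Expected = {expected:.2%}, Kappa = {kappa:.4f}")
# Agreement = 79.23%, Expected = 26.16%, Kappa = 0.7188
```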
Question 1:
How would you describe the level of agreement in this table?
Question 2:
The expected agreement is much lower than for sex, where it was 54.52%. Why is this?
Question 3:
How could we improve the kappa statistic?
Question 4:
What pairs of categories might be regarded as minor disagreements?
Question 5:
What might be plausible weights for the pairs of eye colour categories?
We can use the following disagreement weights:
| | black | brown | blue | grey | hazel | green | other |
| --- | --- | --- | --- | --- | --- | --- | --- |
| black | 0 | 1 | 2 | 2 | 2 | 2 | 2 |
| brown | 1 | 0 | 2 | 2 | 1 | 2 | 2 |
| blue | 2 | 2 | 0 | 1 | 2 | 2 | 2 |
| grey | 2 | 2 | 1 | 0 | 2 | 1 | 2 |
| hazel | 2 | 1 | 2 | 2 | 0 | 1 | 2 |
| green | 2 | 2 | 2 | 1 | 1 | 0 | 2 |
| other | 2 | 2 | 2 | 2 | 2 | 2 | 0 |
Some programs, such as Stata, require agreement weights rather than disagreement weights, so we could use those instead. (SPSS 16 does not do weighted kappa.)
Question 6:
What weights for agreement would correspond to these disagreement weights?
This is the Stata output:
```
. kapwgt eyes 1 \ 0.5 1 \ 0 0 1 \ 0 0 0.5 1 \ 0 0.5 0 0 1 \ 0 0 0 0.5 0.5 1 \ 0 0 0 0 0 0 1

. kap eye1 eye2, wgt(eyes)

Ratings weighted by:
  1.0000  0.5000  0.0000  0.0000  0.0000  0.0000  0.0000
  0.5000  1.0000  0.0000  0.0000  0.5000  0.0000  0.0000
  0.0000  0.0000  1.0000  0.5000  0.0000  0.0000  0.0000
  0.0000  0.0000  0.5000  1.0000  0.0000  0.5000  0.0000
  0.0000  0.5000  0.0000  0.0000  1.0000  0.5000  0.0000
  0.0000  0.0000  0.0000  0.5000  0.5000  1.0000  0.0000
  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  1.0000

             Expected
Agreement    Agreement     Kappa   Std. Err.         Z      Prob>Z
-----------------------------------------------------------------
  86.61%      34.52%     0.7955     0.0432      18.40      0.0000
```
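Again as an unofficial check, the following Python sketch (ours, not part of the course materials) converts the disagreement weights above into the agreement weights Stata uses, taking each agreement weight as 1 minus half the disagreement weight, and reproduces the weighted kappa:

```python
# Unofficial check of the weighted kappa. Agreement weight = 1 - disagreement/2,
# so disagreement 0 -> 1, 1 -> 0.5, and 2 -> 0, matching the kapwgt matrix above.
disagree = [
    [0, 1, 2, 2, 2, 2, 2],   # black
    [1, 0, 2, 2, 1, 2, 2],   # brown
    [2, 2, 0, 1, 2, 2, 2],   # blue
    [2, 2, 1, 0, 2, 1, 2],   # grey
    [2, 1, 2, 2, 0, 1, 2],   # hazel
    [2, 2, 2, 1, 1, 0, 2],   # green
    [2, 2, 2, 2, 2, 2, 0],   # other
]
weight = [[1 - d / 2 for d in row] for row in disagree]

table = [                    # same two-observer table as before
    [6,  4,  0, 0, 0,  0, 0],
    [6, 69,  0, 0, 4,  0, 1],
    [0,  0, 39, 1, 0,  2, 2],
    [0,  1,  1, 4, 0,  4, 0],
    [0,  1,  0, 0, 9,  4, 0],
    [0,  0,  1, 1, 1, 15, 2],
    [0,  0,  0, 0, 0,  2, 3],
]
n = sum(map(sum, table))
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

# Every cell now counts in proportion to its agreement weight, so the
# "minor" disagreements get partial credit instead of none.
observed = sum(weight[i][j] * table[i][j]
               for i in range(7) for j in range(7)) / n
expected = sum(weight[i][j] * row_totals[i] * col_totals[j]
               for i in range(7) for j in range(7)) / n ** 2

kappa = (observed - expected) / (1 - expected)
print(f"Agreement = {observed:.2%}, Expected = {expected:.2%}, Kappa = {kappa:.4f}")
# Agreement = 86.61%, Expected = 34.52%, Kappa = 0.7955
```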
Question 7:
How does the weighting change the results?
This page maintained by Martin Bland.
Last updated: 21 July, 2008.