Exercise: Meibography

This website is for students following the M.Sc. in Evidence Based Practice at the University of York.

Meibography is the measurement of the number and condition of the meibomian glands in the eyelids.

The following is the abstract of a paper on the subject:

Purpose: To evaluate the within- and between-reader reliability and the interrelation between 2 methods of grading meibography images.

Methods: A video meibography sequence (1200 frames) was captured from 290 patients using near-infrared light (650-700 nm) and a near-infrared CCD camera. One frame was selected for grading by 2 masked readers using 2 scales, where the first reader graded the image on 2 occasions and the second reader graded the image on 1 occasion. The first grading scale was a gestalt assessment (categorically graded), which is an assessment of partial meibomian glands within the image. The second was a count of individual whole glands. Within- and between-reader reliability and concurrent validity between the scales were examined.

Results: Within-reader reliability of the gestalt scale was moderate to high (simple kappa = 0.78, 95% confidence interval [CI] = 0.71 to 0.85 and weighted kappa = 0.91, 95% CI = 0.88 to 0.95). Within-reader reliability of individual gland counting was moderate via a 95% limits of agreement analysis (-2.84 to 2.76 glands). Between-reader reliability of the gestalt scale was fair (simple kappa = 0.38, 95% CI = 0.30 to 0.46 and weighted kappa = 0.57, 95% CI = 0.47 to 0.68). Between-reader reliability of gland counting was fair via a 95% limits of agreement analysis (-4.46 to 5.08 glands). There was a strong relation between the gestalt scale and gland counting indicating good concurrent validity (Z = -15.15, P < 0.0001).

Conclusions: These methods of grading meibography images demonstrate good within-reader reliability and fair between-reader reliability. Responsiveness to change will need to be addressed in future studies.

(Source: Nichols JJ, Berntsen DA, Mitchell GL, Nichols KK. An assessment of grading scales for meibography images. Cornea 2005; 24: 382-388.)
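
As an aid to reading the kappa statistics in the Results, here is a minimal sketch of how simple and weighted kappa can be calculated from a cross-tabulation of two sets of grades. The table of counts is invented for illustration and is not data from the paper. The abstract does not say how many categories the gestalt scale has or which weights were used, so the three categories and the linear weights here are assumptions.

```python
def cohen_kappa(table, weights="simple"):
    """Cohen's kappa from a square table of counts.

    Rows are grades on the first occasion, columns are grades on the second.
    weights="simple" counts only exact agreement; weights="linear" gives
    partial credit to near misses (an assumed weighting scheme).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):
        if weights == "simple":
            return 1.0 if i == j else 0.0
        if weights == "linear":
            return 1.0 - abs(i - j) / (k - 1)
        raise ValueError("weights must be 'simple' or 'linear'")

    cells = [(i, j) for i in range(k) for j in range(k)]
    # Observed (weighted) proportion of agreement.
    p_obs = sum(w(i, j) * table[i][j] for i, j in cells) / n
    # Agreement expected by chance, from the marginal totals.
    p_exp = sum(w(i, j) * row_tot[i] * col_tot[j] for i, j in cells) / n ** 2
    return (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical counts for 100 images graded twice on a 3-category ordered scale.
hypothetical_table = [
    [20, 5, 0],
    [4, 30, 6],
    [1, 5, 29],
]

simple_kappa = cohen_kappa(hypothetical_table, "simple")
weighted_kappa = cohen_kappa(hypothetical_table, "linear")
print(round(simple_kappa, 3), round(weighted_kappa, 3))  # 0.679 0.735
```

In this invented table most disagreements are between adjacent categories, which earn partial credit under the weighted version but none under the simple version.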

Questions about this abstract:

  1. What is meant by ‘simple kappa = 0.78’? What does this tell us?
  2. What is meant by ‘weighted kappa = 0.91’? What does this tell us?
  3. Why is weighted kappa larger than simple kappa?
  4. Why is the limits of agreement approach inappropriate for the comparison of two observers? What would be better?
  5. If there were a bias between repeated observations by the same observer, what would this tell us?
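
For reference when thinking about questions 4 and 5, the 95% limits of agreement quoted in the abstract are conventionally calculated as the mean difference between paired measurements plus or minus 1.96 standard deviations of the differences. The sketch below assumes that convention. The gland counts are invented for illustration and are not data from the paper. On this definition the midpoint of the limits is the mean difference (the bias), so the within-reader limits of -2.84 to 2.76 glands would correspond to a mean difference of about -0.04 glands.

```python
from statistics import mean, stdev


def limits_of_agreement(first, second):
    """Return (lower limit, mean difference, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)   # mean difference between the two sets of counts
    sd = stdev(diffs)    # sample standard deviation of the differences
    return bias - 1.96 * sd, bias, bias + 1.96 * sd


# Hypothetical gland counts for 8 images, each counted on two occasions.
first_count = [12, 15, 9, 20, 14, 11, 17, 13]
second_count = [13, 14, 10, 18, 15, 11, 16, 15]

lower, bias, upper = limits_of_agreement(first_count, second_count)
print(f"bias = {bias:.3f}, limits = {lower:.2f} to {upper:.2f}")
```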
