The usual way to analyse observer agreement for categorical data is by Cohen's kappa statistics. In the problem considered here, there are an unknown number of "subjects" being observed, and observers only record when the feature being studied is present. No observation is made when it is absent. This makes kappa inappropriate.
The method described here was developed for the study of some cerebral embolus data. The results were published as:
Markus H, Bland JM, Rose G, Sitzer M, Siebler M. How good is intercenter agreement in the identification of embolic signals in carotid artery disease? Stroke 1996; 27: 1249-1252.
Things have been slightly simplified in this description of the method. The data consist of 125 moments in 3 hours of tape when at least one of five observers recorded an abnormality which is thought to represent a cerebral embolus passing. I have used "yes" as shorthand for recording an abnormality and "no" for failing to record an abnormality. There are no moments when all record no abnormality because the method of data collection does not allow for this.
The data are downloadable as a Stata dictionary file, a simple, self-explanatory text format. The detection of an embolus is coded "1"; if the observer did not record an embolus at that moment it is coded "0".
The problem is that the number of observations where both observers would say "no" is unknown. Consider for example observers 1 and 2. The table is:
| Observer 1 | Observer 2: no | Observer 2: yes | Total |
|---|---|---|---|
| no | 10 | 5 | 15 |
| yes | 11 | 99 | 110 |
| Total | 21 | 104 | 125 |

Cohen's kappa = 0.483
But if we had only the observations of Observers 1 and 2, there would be no observations in the first cell:
| Observer 1 | Observer 2: no | Observer 2: yes | Total |
|---|---|---|---|
| no | 0 | 5 | 5 |
| yes | 11 | 99 | 110 |
| Total | 11 | 104 | 115 |

Cohen's kappa = -0.064
But in fact there is an unknown, possibly large, number of moments where both observers would say "no". Suppose, for example, that there were 1000:
| Observer 1 | Observer 2: no | Observer 2: yes | Total |
|---|---|---|---|
| no | 1000 | 5 | 1005 |
| yes | 11 | 99 | 110 |
| Total | 1011 | 104 | 1115 |

Cohen's kappa = 0.917
Thus kappa cannot be estimated here, because we do not know how many "no"s there are.
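To see how strongly kappa depends on that unknowable cell, here is a minimal Python sketch. The function name cohen_kappa_2x2 and the hard-coded tables are illustrative labels of mine, not part of the original analysis; the sketch simply recomputes kappa for the three tables above.

```python
# Minimal sketch: Cohen's kappa for a 2x2 agreement table, recomputed with
# different counts in the "both say no" cell that cannot actually be observed.

def cohen_kappa_2x2(table):
    """table[i][j] = moments rated i by Observer 1 and j by Observer 2 (0 = no, 1 = yes)."""
    total = sum(sum(row) for row in table)
    p_observed = (table[0][0] + table[1][1]) / total
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    p_expected = sum(row_totals[k] * col_totals[k] for k in range(2)) / total ** 2
    return (p_observed - p_expected) / (1 - p_expected)

for both_no in (10, 0, 1000):
    table = [[both_no, 5], [11, 99]]
    print(both_no, round(cohen_kappa_2x2(table), 3))
# prints 0.483, -0.064 and 0.917: kappa is driven by the cell we cannot observe
```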
We need a different approach. I suggest estimating the probability that if one observer says "yes" another will say "yes" also, i.e. that if one observer records an embolus another observer will also record it.
To estimate this probability, all we need are the numbers of observers giving "yes" assessments for each recorded moment. Denote the number of observers by n and the number of recorded moments by m, and denote the number of observers rating moment i as "yes" by r_i. For each observer rating moment i as "yes" there are n - 1 other observers, r_i - 1 of whom classify the moment as "yes". Hence the proportion of the other observers rating the moment as "yes" is (r_i - 1)/(n - 1). The total number of "yes" ratings over all moments is Sum r_i, and the average proportion of further observers who also rate "yes" is

p_yes = Sum r_i(r_i - 1) / ((n - 1) Sum r_i)
      = (Sum r_i² - Sum r_i) / ((n - 1) Sum r_i)
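As a minimal sketch, this formula translates directly into code. The function name p_yes and its arguments are my own labels, assuming the r_i are supplied as a simple list of counts.

```python
def p_yes(r, n):
    """Probability that a further observer also rates a moment "yes".

    r -- one entry per recorded moment: the number of observers rating
         that moment "yes" (the r_i above)
    n -- total number of observers
    """
    sum_r = sum(r)                     # Sum r_i, the total number of "yes" ratings
    sum_r_sq = sum(x * x for x in r)   # Sum r_i^2
    return (sum_r_sq - sum_r) / ((n - 1) * sum_r)
```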
We can apply this to the following table for Observers 1 and 2, with no observations in the first cell:
| Observer 1 | Observer 2: no | Observer 2: yes | Total |
|---|---|---|---|
| no | 0 | 5 | 5 |
| yes | 11 | 99 | 110 |
| Total | 11 | 104 | 115 |
The number of observers is n = 2 and the number of moments is m = 115. There are 5 + 11 = 16 moments where only one observer rates "yes" and 99 where both observers rate "yes". Hence

Sum r_i = 16 + 99 × 2 = 214
Sum r_i² = 16 + 99 × 2² = 412
p_yes = (412 - 214)/((2 - 1) × 214) = 0.93
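Reusing the hypothetical p_yes helper sketched above, the same arithmetic for Observers 1 and 2 looks like this:

```python
# 16 moments rated "yes" by only one of the two observers, 99 by both
r = [1] * 16 + [2] * 99
print(round(p_yes(r, n=2), 2))   # 0.93
```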
If we apply this to the version of the table with an arbitrarily chosen large number of observations where both say "no", e.g. 1000:
| Observer 1 | Observer 2: no | Observer 2: yes | Total |
|---|---|---|---|
| no | 1000 | 5 | 1005 |
| yes | 11 | 99 | 110 |
| Total | 1011 | 104 | 1115 |
we get the same answer. The number of observers is n = 2 and the number of moments is now m = 1115. However, there are still 5 + 11 = 16 moments where only one observer rates "yes" and 99 where both observers rate "yes", so

Sum r_i = 16 + 99 × 2 = 214
Sum r_i² = 16 + 99 × 2² = 412
p_yes = (412 - 214)/((2 - 1) × 214) = 0.93

as before.
Thus the method does not depend on the moments at which no observer records an embolus.
If we use all five observers, we have a total of 125 observations. The numbers of moments with each possible number of "yes"s are:
"yes"s | Count |
---|---|
1 | 18 |
2 | 8 |
3 | 8 |
4 | 10 |
5 | 81 |
Total | 125 |
You can find this distribution by adding the variables representing each observer's observations (0 for "no", 1 for "yes") and tabulating the result. From it we find

Sum r_i = (1 × 18) + (2 × 8) + (3 × 8) + (4 × 10) + (5 × 81) = 503

by multiplying each number of "yes"s by its count and adding, and

Sum r_i² = (1 × 18) + (4 × 8) + (9 × 8) + (16 × 10) + (25 × 81) = 2307

by multiplying each squared number of "yes"s by its count and adding. The probability that another observer would also say "yes" is then

p_yes = (2307 - 503)/((5 - 1) × 503) = 0.90
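The same calculation, starting from the frequency table above, as a self-contained sketch (the variable names are mine):

```python
# Moments observed with each possible number of "yes" ratings (table above)
counts = {1: 18, 2: 8, 3: 8, 4: 10, 5: 81}
n = 5  # number of observers

sum_r = sum(k * c for k, c in counts.items())         # Sum r_i   = 503
sum_r_sq = sum(k * k * c for k, c in counts.items())  # Sum r_i^2 = 2307

p_yes = (sum_r_sq - sum_r) / ((n - 1) * sum_r)
print(f"{p_yes:.2f}")   # 0.90
```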
Thus we can conclude that if any one of these observers recorded an embolus, another observer would also record it with probability 0.90. In other words, a second observer would agree with 90% of the emboli recorded.
If you wish to use this method, please acknowledge the original paper by Markus et al. and this web site.
This page maintained by Martin Bland.
Last updated: 20 March, 2009