How do I measure observer agreement when only positive observations are recorded?

The usual way to analyse observer agreement for categorical data is Cohen's kappa statistic. In the problem considered here, there is an unknown number of "subjects" being observed, and observers record only when the feature being studied is present. No observation is made when it is absent, which makes kappa inappropriate.

The method described here was developed for the study of some cerebral embolus data. The results were published as:

Markus H, Bland JM, Rose G, Sitzer M, Siebler M. How good is intercenter agreement in the identification of embolic signals in carotid artery disease? Stroke 1996; 27: 1249-1252.

Things have been slightly simplified in this description of the method. The data consist of 125 moments in 3 hours of tape when at least one of five observers recorded an abnormality which is thought to represent a cerebral embolus passing. I have used "yes" as shorthand for recording an abnormality and "no" for failing to record an abnormality. There are no moments when all record no abnormality because the method of data collection does not allow for this.

The data are downloadable as a Stata dictionary file, a simple, self-explanatory text format. Detection of an embolus is coded "1"; if the observer did not record an embolus at that moment, it is coded "0".

Why Cohen's kappa doesn't work

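The problem is that the number of observations where both observers would say "no" is unknown. Consider, for example, Observers 1 and 2. Using all 125 moments, where the 10 moments in the first cell were recorded only by one or more of the other three observers, the table is: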

              Observer 2
Observer 1    no   yes   Total
no            10     5      15
yes           11    99     110
Total         21   104     125

Cohen's kappa = 0.483

But if we had only the observations of Observers 1 and 2, there would be no observations in the first cell:

              Observer 2
Observer 1    no   yes   Total
no             0     5       5
yes           11    99     110
Total         11   104     115

Cohen's kappa = -0.064

But in fact there is an unknown, large number of observations where both say "no", e.g. 1000:

              Observer 2
Observer 1      no   yes   Total
no            1000     5    1005
yes             11    99     110
Total         1011   104    1115

Cohen's kappa = 0.917

Thus kappa cannot be estimated here, because we do not know how many "no"s there are.
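
The three kappa values above can be checked with a short sketch. The function name cohen_kappa and its argument order are illustrative choices for this page, not part of the original analysis; only the both-"no" cell is varied.

```python
def cohen_kappa(both_no, no_yes, yes_no, both_yes):
    """Cohen's kappa for a 2x2 table of counts.

    Rows are Observer 1 (no, yes); columns are Observer 2 (no, yes).
    no_yes means Observer 1 said "no" and Observer 2 said "yes".
    """
    total = both_no + no_yes + yes_no + both_yes
    observed = (both_no + both_yes) / total
    # Agreement expected by chance, from the row and column totals.
    row_no, row_yes = both_no + no_yes, yes_no + both_yes
    col_no, col_yes = both_no + yes_no, no_yes + both_yes
    expected = (row_no * col_no + row_yes * col_yes) / total**2
    return (observed - expected) / (1 - expected)


# Only the both-"no" cell differs between the three tables.
for both_no in (10, 0, 1000):
    kappa = cohen_kappa(both_no, 5, 11, 99)
    print(f"both 'no' = {both_no:4d}: kappa = {kappa:.3f}")
```

With the other three cells fixed, kappa moves from -0.064 to 0.917 purely through the choice of the unknown cell.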

Probability that another observer would agree

We need a different approach. I suggest estimating the probability that if one observer says "yes" another will say "yes" also, i.e. that if one observer records an embolus another observer will also record it.

To estimate this probability, all we need are the numbers of observers giving "yes" assessments for each moment when a "yes" is observed. Denote the numbers of observers and recorded moments by n and m respectively, and the number of observers rating moment i as "yes" by r_i. For each observer rating moment i as "yes" there are n-1 other observers, r_i-1 of whom classify the moment as "yes". Hence the proportion of other observers rating the moment as "yes" is (r_i-1)/(n-1). The total number of "yes" ratings over all moments is Sum r_i, and the average proportion of further observers who also rate as "yes" is
p_yes = (Sum r_i(r_i-1)/(n-1)) / (Sum r_i)
      = (Sum r_i² - Sum r_i) / ((n-1) Sum r_i)
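
As a rough check on this derivation, the sketch below computes p_yes in two ways: by literally averaging, over every "yes" rating, the proportion of the other observers who also said "yes", and by the closed-form expression. The ratings matrix is hypothetical, invented only for illustration, and the function names are my own.

```python
def p_yes_closed_form(r, n):
    """p_yes = (Sum r_i^2 - Sum r_i) / ((n-1) * Sum r_i)."""
    sum_r = sum(r)
    sum_r2 = sum(x * x for x in r)
    return (sum_r2 - sum_r) / ((n - 1) * sum_r)


def p_yes_by_counting(ratings):
    """Average, over every "yes" rating, of the proportion of the
    other observers who also said "yes" at the same moment."""
    n = len(ratings[0])
    proportions = []
    for moment in ratings:
        r_i = sum(moment)
        for rating in moment:
            if rating == 1:
                proportions.append((r_i - 1) / (n - 1))
    return sum(proportions) / len(proportions)


# Hypothetical data: one row per moment, one column per observer.
ratings = [
    [1, 1, 1],
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
]
r = [sum(moment) for moment in ratings]
print(p_yes_by_counting(ratings), p_yes_closed_form(r, n=3))
```

Both approaches give 0.625 for this small example, as the algebra says they should.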

We can apply this to the following table for Observers 1 and 2, with no observations in the first cell:

              Observer 2
Observer 1    no   yes   Total
no             0     5       5
yes           11    99     110
Total         11   104     115

The number of observers is n=2 and the number of moments is m=115. There are 5 + 11 = 16 moments where one observer rates it as "yes" and 99 where both observers rate it as "yes".
Sum r_i = 16 + 99 times 2 = 214.
Sum r_i² = 16 + 99 times 2² = 412.
p_yes = (412-214)/((2-1) times 214) = 0.93

If we apply this to the version of the table with an arbitrarily large number of observations where both say "no", e.g. 1000:

              Observer 2
Observer 1      no   yes   Total
no            1000     5    1005
yes             11    99     110
Total         1011   104    1115

we get the same thing. The number of observers is n=2 and the number of moments is now m=1115. However, there are still 5 + 11 = 16 moments where one observer rates it as "yes" and 99 where both observers rate it as "yes".
Sum r_i = 16 + 99 times 2 = 214.
Sum r_i² = 16 + 99 times 2² = 412.
p_yes = (412-214)/((2-1) times 214) = 0.93
as before.

This method is not dependent on the moments when no embolus is observed.
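
The reason is that a moment with r_i = 0 adds nothing to either Sum r_i or Sum r_i². A minimal sketch of this, using the two-observer counts from the tables above (the helper names p_yes and r_values are illustrative):

```python
def p_yes(r, n):
    """p_yes = (Sum r_i^2 - Sum r_i) / ((n-1) * Sum r_i)."""
    sum_r = sum(r)
    sum_r2 = sum(x * x for x in r)
    return (sum_r2 - sum_r) / ((n - 1) * sum_r)


def r_values(both_no, one_yes, both_yes):
    """Expand two-observer cell counts into one r_i per moment."""
    return [0] * both_no + [1] * one_yes + [2] * both_yes


# Moments with r_i = 0 contribute nothing to either sum.
for both_no in (0, 1000):
    r = r_values(both_no, one_yes=5 + 11, both_yes=99)
    print(f"m = {len(r):4d}: p_yes = {p_yes(r, n=2):.2f}")
```

Both lines report p_yes = 0.93, whether m is 115 or 1115.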

If we use all five observers, we have a total of 125 observations. The numbers of moments with each possible number of "yes"s are:

"yes"s   Count
1           18
2            8
3            8
4           10
5           81
Total      125

You can find this by adding the variables representing each observer's observations (0 for "no", 1 for "yes") and tabulating the result. From this we find
Sum r_i = 503
by multiplying the count by the number of "yes"s and adding. We get
Sum r_i² = 2307
by multiplying the count by the number of "yes"s squared and adding. The probability that another observer would say "yes" is then found by:
p_yes = (2307-503)/((5-1) times 503) = 0.90

Thus we can conclude that if any one of these observers recorded an embolus, another observer would also record it with probability 0.90. In other words, a second observer would agree with 90% of the emboli recorded.
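
The five-observer calculation can be reproduced directly from the table of counts. This is only a sketch: the dictionary layout and variable names are my own, and the counts are those tabulated above.

```python
# Number of moments for each possible number of "yes" ratings.
counts = {1: 18, 2: 8, 3: 8, 4: 10, 5: 81}
n = 5  # number of observers

moments = sum(counts.values())
# Count times number of "yes"s, summed over the rows of the table.
sum_r = sum(k * c for k, c in counts.items())
# Count times number of "yes"s squared, summed over the rows.
sum_r2 = sum(k * k * c for k, c in counts.items())
p_yes = (sum_r2 - sum_r) / ((n - 1) * sum_r)

print(f"moments = {moments}, Sum r_i = {sum_r}, Sum r_i^2 = {sum_r2}")
print(f"p_yes = {p_yes:.2f}")
```

From these counts the sketch gives 125 moments, Sum r_i = 503, Sum r_i² = 2307 and p_yes = 0.90.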

If you wish to use this method, please acknowledge the original paper by Markus et al. and this web site.

This page maintained by Martin Bland.
Last updated: 20 March, 2009