How does pooling blood samples affect standard deviation?

The question came from Paul Wicks.

This is the scenario:

We are trying to describe a population in terms of a marker in the blood. We have 24 blood samples. The 24 measurements have standard deviation SD1.

1) In the first case we work out a mean and a standard error of the mean. The standard error is SD1/sqrt(24). Let us call it SE1.

2) In the second case we mix up the blood samples into 3 'pooled' samples with 8 samples in each. The mean of the pooled samples is the same as the mean of the unpooled samples. The 3 pooled samples have standard deviation SD2. The standard error of our three samples is SD2/sqrt(3). Let us call it SE2.

Now I would think that SE1=SE2, but I am wrong. In our experiment, SE2 is approximately SE1*sqrt(8), implying SD1=SD2. I theorized that SD2=SD1/sqrt(8), as I felt that the pooled samples would show less variation.

Where have I gone wrong? Why is it SD1=SD2 and not SE1=SE2?
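The arithmetic in the question can be checked in a few lines of Python. SD1 is set to 1 arbitrarily, because only the ratios matter, and the variable names are illustrative only. The sketch compares what Paul expected (pooling 8 samples divides the SD by sqrt(8)) with what he observed (SD2 roughly equal to SD1).

```python
import math

n_subjects, n_pools = 24, 3
sd1 = 1.0  # arbitrary value; only ratios matter here

se1 = sd1 / math.sqrt(n_subjects)

# What Paul expected: the SD of 8-sample pools is SD1 / sqrt(8)
se2_expected = (sd1 / math.sqrt(8)) / math.sqrt(n_pools)

# What Paul observed: SD2 roughly equal to SD1
se2_observed = sd1 / math.sqrt(n_pools)

print(se2_expected / se1)  # close to 1
print(se2_observed / se1)  # close to sqrt(8), about 2.83
```

Under the expected reduction the two standard errors agree, while equal SDs give SE2 = SE1*sqrt(24/3) = SE1*sqrt(8).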

My answer

The problem is that a measurement has two sources of variation: variation between subjects, which is the "true" variation, and variation within the subject, which is the measurement error. Pooling samples reduces the first component but not the second.

Let us call the between-subjects standard deviation s_b and the within-subject standard deviation s_w.

First we measure 24 subjects. The variance is
SD1² = s_b² + s_w²
and the standard error of the mean is
SE1 = root((s_b² + s_w²)/24) = root(s_b²/24 + s_w²/24).
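The variance formula for single samples can be checked by simulation. This is a minimal sketch that assumes each measurement is a true value plus an independent measurement error, with both drawn from normal distributions; normality is an assumption of the sketch only, since the formula needs just independence. The function name `simulate_unpooled_variance` and its parameters are hypothetical.

```python
import random
import statistics


def simulate_unpooled_variance(s_b, s_w, n_subjects=24, n_reps=5000, seed=1):
    """Average sample variance of n_subjects single-sample measurements.

    Assumed model (hypothetical, for illustration): each subject's true value
    is drawn from N(0, s_b^2), and each measurement adds an independent error
    drawn from N(0, s_w^2).
    """
    rng = random.Random(seed)
    variances = []
    for _ in range(n_reps):
        measurements = [
            rng.gauss(0.0, s_b) + rng.gauss(0.0, s_w)  # true value + error
            for _ in range(n_subjects)
        ]
        # statistics.variance uses the n - 1 divisor, so it is unbiased
        variances.append(statistics.variance(measurements))
    return statistics.mean(variances)


s_b, s_w = 2.0, 1.0
print(simulate_unpooled_variance(s_b, s_w), s_b**2 + s_w**2)
```

The simulated average variance should be close to s_b² + s_w² = 5 for these illustrative values.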

Now we pool 8 subjects' blood, presumably chosen at random. The variance of measurements of such pools is
SD2² = s_b²/8 + s_w².
The between-pools component is smaller than the between-subjects component for single samples, because the "true" value for a pool is the average of the "true" values for its 8 subjects (assuming equal amounts of blood from each subject are mixed), so its variance is s_b²/8. The measurement error is the same as for a single sample, because it comes from the measurement process, not from the subjects. There are 3 such pools, so the standard error of the mean is
SE2 = root((s_b²/8 + s_w²)/3) = root(s_b²/24 + s_w²/3).
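The same kind of simulation can be run for the pooled design. This sketch assumes, as above, normally distributed true values and errors, that a pool's true value is the mean of its members' true values, and that each pool is measured once. The function `simulate_pooled_variance` and its parameters are hypothetical.

```python
import random
import statistics


def simulate_pooled_variance(s_b, s_w, pool_size=8, n_pools=3, n_reps=10000, seed=2):
    """Average sample variance of n_pools measurements on pools of pool_size subjects.

    Assumed model (hypothetical): true values ~ N(0, s_b^2); mixing equal
    volumes makes the pool's true value the mean of its members' values;
    each pool is measured once, adding one error ~ N(0, s_w^2).
    """
    rng = random.Random(seed)
    variances = []
    for _ in range(n_reps):
        pool_measurements = []
        for _ in range(n_pools):
            true_values = [rng.gauss(0.0, s_b) for _ in range(pool_size)]
            pool_true = sum(true_values) / pool_size
            # One measurement per pool, so only one measurement error
            pool_measurements.append(pool_true + rng.gauss(0.0, s_w))
        variances.append(statistics.variance(pool_measurements))
    return statistics.mean(variances)


s_b, s_w = 2.0, 1.0
print(simulate_pooled_variance(s_b, s_w), s_b**2 / 8 + s_w**2)
```

Averaging the true values divides the between-subjects variance by 8, but the single measurement error per pool is not reduced, so the result should be close to s_b²/8 + s_w² = 1.5 here.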

So SD1 should be greater than SD2, and SE1 should be less than SE2. How much greater or less depends on the relative sizes of s_b and s_w.
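Because the two variance formulas are linear in s_b² and s_w², they can be solved for the components: s_b² = 8(SD1² − SD2²)/7 and s_w² = (8 SD2² − SD1²)/7. The sketch below does this for a general pool size m. It is an illustration under the same model, not a recommended estimation method; with only 3 pools SD2 is very imprecise, and sampling noise can make either estimate negative. The function name `variance_components` is hypothetical.

```python
import math


def variance_components(sd1, sd2, pool_size=8):
    """Solve SD1^2 = s_b^2 + s_w^2 and SD2^2 = s_b^2/m + s_w^2 for (s_b^2, s_w^2)."""
    m = pool_size
    s_b_sq = m * (sd1**2 - sd2**2) / (m - 1)
    s_w_sq = (m * sd2**2 - sd1**2) / (m - 1)
    return s_b_sq, s_w_sq


# Check with s_b = 2 and s_w = 1: SD1^2 = 5 and SD2^2 = 4/8 + 1 = 1.5
s_b_sq, s_w_sq = variance_components(math.sqrt(5.0), math.sqrt(1.5))
print(s_b_sq, s_w_sq)  # close to 4 and 1
```

When SD1 and SD2 are about equal, as in Paul's experiment, the sketch gives s_b² close to 0 and s_w² close to SD1².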

If s_b is much greater than s_w, the standard errors will be similar.
If s_b is much less than s_w, the standard deviations will be similar.
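The two limiting cases can be seen numerically by plugging extreme values into the formulas above. The helper `sd_and_se` and the values 100 and 1 are illustrative choices, not from the original.

```python
import math


def sd_and_se(s_b, s_w, n_subjects=24, pool_size=8):
    """SD1, SD2, SE1, SE2 from the formulas in the text."""
    n_pools = n_subjects // pool_size
    sd1 = math.sqrt(s_b**2 + s_w**2)
    sd2 = math.sqrt(s_b**2 / pool_size + s_w**2)
    se1 = sd1 / math.sqrt(n_subjects)
    se2 = sd2 / math.sqrt(n_pools)
    return sd1, sd2, se1, se2


for s_b, s_w in [(100.0, 1.0), (1.0, 100.0)]:
    sd1, sd2, se1, se2 = sd_and_se(s_b, s_w)
    print(f"s_b={s_b}, s_w={s_w}: SD1/SD2={sd1 / sd2:.3f}, SE2/SE1={se2 / se1:.3f}")
```

With s_b dominant the standard errors nearly agree and SD1/SD2 is near sqrt(8); with s_w dominant the standard deviations nearly agree and SE2/SE1 is near sqrt(8).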

If s_b=0, then SE1 = SE2/root(8), which is approximately what Paul found. I would conclude that the measurement error is large compared with the variation between subjects.
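Dividing the two standard error formulas gives SE2²/SE1² = (s_b² + 8 s_w²)/(s_b² + s_w²), which does not depend on the number of subjects. The sketch below (the function name `se_ratio` is hypothetical) shows that the ratio is exactly sqrt(8) when s_b = 0 and rises towards sqrt(8) as s_w grows relative to s_b. With only 3 pools, an observed ratio is itself a noisy estimate, so it indicates only roughly how large the measurement error is.

```python
import math


def se_ratio(s_b, s_w, pool_size=8):
    """SE2/SE1 implied by the formulas in the text.

    Squaring and dividing gives (s_b^2 + m*s_w^2) / (s_b^2 + s_w^2).
    """
    return math.sqrt((s_b**2 + pool_size * s_w**2) / (s_b**2 + s_w**2))


# s_b = 0: the ratio is sqrt(8) whatever the size of the measurement error
print(se_ratio(0.0, 1.0), math.sqrt(8))

# The ratio climbs towards sqrt(8) as s_w grows relative to s_b = 1
for s_w in [0.0, 0.5, 1.0, 2.0, 5.0]:
    print(s_w, round(se_ratio(1.0, s_w), 3))
```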

This page maintained by Martin Bland.
Last updated: 13 January, 2004.