Statistical inference is the process of drawing conclusions from data, for example by confidence intervals and significance tests. In this lecture we shall look at how we can draw conclusions from samples about the means of populations.
We shall first look at large samples, and at how we can make inferences about a single mean, means in paired data, and the difference between the means of two samples. For each of these we shall use a large sample Normal method or z method.
We shall then look at the same problems for small samples. For a single mean we shall describe the one sample t method, for paired data the paired t method, and for the means of two samples the two sample t method, also called the independent samples t method, or two group t method. For t methods there are strong assumptions about the distribution of the observations. I shall describe how we can use graphical methods to investigate these.
We shall not discuss what to do if we have means of more than two samples. The usual method for any size samples is one-way analysis of variance (anova), the assumptions of which are as for the two sample t method.
We can find confidence intervals and carry out significance tests for the means of large samples using the Normal distribution. We make use of two properties of large samples. First, the means of large samples drawn in the same way will follow a Normal distribution quite closely, as described in Week 2. Second, the standard deviation estimated from a large sample will be close to that for the whole population. This means that the standard error estimated from the sample will be a good estimate.
We find confidence intervals for means of large samples using the Normal distribution. We first estimate the standard error of the mean of the sample. This is easy to do from the standard deviation of the observations: it is the standard deviation divided by the square root of the sample size. Then the 95% confidence interval is the mean minus 1.96 standard errors to the mean plus 1.96 standard errors.
For example, Figure 1 shows the distribution of birthweight in 1749 singleton pregnancies to Caucasian mothers in South London.
Figure 1.  Birthweights of 1749 singleton births to Caucasian mothers 
in South London (data of Brooke et al., 1989) 
This is clearly negatively skew, unlike the distribution of birthweight 
for term births, which is approximately Normal.  
These birthweights have mean = 3296.0 g and standard deviation = 563.2 g.  
The standard error of the mean is 13.5 g.  
Because the sample is large, 
the mean birthweight will be from a Normal distribution 
with mean equal to the mean birthweight in the population and 
standard deviation very close to the estimated standard error 
of the mean, 13.5 g.  
Hence the 95% confidence interval for the population mean birthweight 
will be 3296.0 - 1.96 × 13.5 g to 3296.0 + 1.96 × 13.5 g, 
which gives 3270 g to 3322 g.  
Hence we estimate the mean birthweight in this population 
to be between 3270 and 3322 g.
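
The arithmetic above can be sketched in a few lines of Python. This is only an illustration using the summary figures quoted in the text; the function name `large_sample_ci` is my own invention, not part of any package.

```python
import math

def large_sample_ci(mean, sd, n, z=1.96):
    """Large sample Normal confidence interval for a mean.

    z = 1.96 gives a 95% interval; only appropriate when n is large.
    """
    se = sd / math.sqrt(n)          # standard error of the mean
    return se, mean - z * se, mean + z * se

# Birthweight summary statistics quoted in the text
se, lower, upper = large_sample_ci(mean=3296.0, sd=563.2, n=1749)
print(f"SE = {se:.1f} g, 95% CI = {lower:.0f} g to {upper:.0f} g")
```

This should reproduce the standard error of 13.5 g and the interval 3270 g to 3322 g given above.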
 
The only assumptions we have to make about the data for this method are:
 
In healthcare research, we more often want to compare groups of subjects 
than use a single sample to estimate the mean in the population.  
For example, Christensen et al. (2004) compared interventions for 
depression delivered using the internet.  
They recruited 525 people with symptoms of depression identified in a survey.  
They were randomly allocated to a website, BluePages, 
offering information about depression (n = 165) or a 
cognitive behaviour therapy website, MoodGYM, (n = 182), 
or a control intervention using an attention placebo (n = 178).  
The main outcome measure was the Center for Epidemiologic Studies 
depression scale.  
This consists of 20 questions scored 0 (not depressed) to 3 (depressed) 
and summed, giving a score between 0 and 60.  
The means and standard deviations of their depression scores 
before randomisation and of their falls in depression
 score are shown in Table 1.  
 
 
All the subjects were depressed and so should have initial scores well 
above zero and the standard deviations are approximately half the mean, 
so we can deduce that the depression scores had a positively skew distribution.  
 
Ninety (17%) of subjects did not return post-intervention questionnaires 
and the authors assumed that their scores were unchanged.  
This means that whatever the original distribution of the scores, 
the differences must have a large spike of at least 90 observations at zero.  
They could not have a Normal distribution.  
 
Because the three treatment groups are all fairly large samples, 
we can estimate a confidence interval for the mean fall for each of them.  
For BluePages, 165 subjects had mean fall in depression score = 3.9 
with standard deviation = 9.1.  
The standard error of the mean is 0.71 and so the 95% confidence interval 
for the mean fall is 3.9 - 1.96 × 0.71 to 3.9 + 1.96 × 0.71, 
which is 2.5 to 5.3 points on the depression scale.  
This is our interval estimate for the mean fall in depression score, 
assuming non-responders do not change.  
 
We can also test the null hypothesis that the mean change is zero 
against the alternative hypothesis that there is a change, in either direction.  
Because the sample is large, the mean will be from a Normal distribution 
with standard deviation equal to the standard error of the mean, 0.71.  
Hence the observed sample mean minus the unknown population mean then 
divided by the standard error will be an observation from the 
Standard Normal distribution.  
The null hypothesis says that the population mean is zero, 
so if this is true then sample mean over standard error will be from 
a Standard Normal distribution.  
For the BluePages group this is z = 3.9/0.71 = 5.49.  
The probability of getting such an extreme value from a 
Standard Normal distribution is very small indeed, 0.00000004.  
We cannot trust that the approximation to the Normal distribution 
will be close enough for such very small probabilities to be reliable, 
so we usually quote this as P<0.0001.  
This is the large sample Normal test for a single mean, 
also called the z test for a single mean. 
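
As a rough sketch, the z test for a single mean can be written as follows. The function name `z_test_single_mean` is hypothetical, and the code works from the summary statistics in Table 1 rather than the original data. Because the standard error is not rounded to 0.71 before dividing, z comes out slightly above the 5.49 quoted in the text.

```python
import math
from statistics import NormalDist

def z_test_single_mean(mean, sd, n, null_value=0.0):
    """Large sample Normal (z) test for a single mean, two sided."""
    se = sd / math.sqrt(n)
    z = (mean - null_value) / se
    # Two sided P: probability of a value at least as far from zero as z
    p = 2 * NormalDist().cdf(-abs(z))
    return z, p

# BluePages fall in depression score, from Table 1
z, p = z_test_single_mean(mean=3.9, sd=9.1, n=165)
print(f"z = {z:.2f}, P = {p:.1e}")
```

Whatever the exact value printed, the P value is far below 0.0001 and would be reported as P<0.0001.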
 
The assumptions required for this method are:
 
We can check the last by plotting the difference against the average 
of the two measurements for the subject.  
I shall describe this in detail later under paired t test.
 
We can also find a confidence interval for the difference 
between the means of two independent samples.  
For example, we shall compare the mean fall in score for BluePages with MoodGYM.  
The difference between the means, BluePages minus MoodGYM, = -0.3.  
We can find the standard error for the difference by squaring the 
standard error of each mean, adding, and taking the square root.  
This only works when the groups are independent.  
If we were to do it for paired data like the before and 
after measurements above, the standard error might be much too large.  
For BluePages and MoodGYM, we have
 
√(0.71² + 0.67²) = 0.98.
 
The 95% CI is then given by -0.3 - 1.96 × 0.98 to 
-0.3 + 1.96 × 0.98 = -2.2 to +1.6.
 
We can also do a test of the null hypothesis that in the population 
the difference between the means is zero against the 
alternative hypothesis that the difference in the population is not zero.  
As for the paired example above, because we have a large sample the 
observed difference minus the population difference then divided by 
the estimated standard error of the difference should be an 
observation from a Standard Normal distribution.  
If the null hypothesis were true, the population difference would be zero.  
The test statistic is observed difference divided by its standard error, 
z = -0.3/0.98 = -0.31.  
The probability of an observation from the Standard Normal distribution 
being as far from its expected value, zero, as -0.31 is P=0.76.  
Hence the difference is not significant.  
 
We can tell this from the 95% confidence interval, also, 
as this includes zero, the null hypothesis value for the difference.  
This is the large sample Normal distribution test or z test for the 
means of two independent groups.
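
The whole comparison of two independent large samples can be sketched in one small function. The name `two_sample_z` and its argument order are my own assumptions; the inputs are the means, standard deviations, and numbers from Table 1.

```python
import math
from statistics import NormalDist

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2, z_crit=1.96):
    """Large sample Normal comparison of two independent means."""
    diff = mean1 - mean2
    # Square each standard error, add, and take the square root
    se_diff = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    ci = (diff - z_crit * se_diff, diff + z_crit * se_diff)
    z = diff / se_diff
    p = 2 * NormalDist().cdf(-abs(z))   # two sided P value
    return diff, se_diff, ci, z, p

# BluePages minus MoodGYM, falls in depression score from Table 1
diff, se_diff, ci, z, p = two_sample_z(3.9, 9.1, 165, 4.2, 9.1, 182)
print(f"difference = {diff:.1f}, SE = {se_diff:.2f}")
print(f"95% CI = {ci[0]:.1f} to {ci[1]:.1f}, z = {z:.2f}, P = {p:.2f}")
```

The same function can be reused for other pairs of groups, for example `two_sample_z(3.9, 9.1, 165, 1.0, 8.4, 178)` for BluePages minus control.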
 
We can carry out the same calculations for the comparison of 
each active intervention with control.  
For BluePages, the difference between mean falls is 3.9 - 1.0 = 2.9 
and the standard error of the difference is 0.95.  
Hence the 95% confidence interval is 2.9 - 1.96 × 0.95 
to 2.9 + 1.96 × 0.95 which gives 1.0 to 4.8.  
The test of significance gives z = 2.9/0.95 = 3.05, P = 0.002.  
For MoodGYM minus control the observed difference is 3.2 with 
standard error 0.92, the 95% confidence interval 
being 1.4 to 5.0 and z = 3.48, P = 0.0005.  
 
If we want to test the overall null hypothesis that the three treatments 
would produce the same mean fall in the population, 
we could do this by applying the Bonferroni correction to these three P values.  
Multiplying by 3 would give the smallest P value = 0.0005 × 3 = 0.0015, 
which is still highly significant.  Christensen et al. (2004) 
did not do the analysis exactly as we have here.  
They used an analysis of variance method, which I shall omit, 
to compare all three groups simultaneously.  
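
The Bonferroni correction itself is very simple, as this minimal sketch shows. The helper name `bonferroni` and the dictionary labels are mine; the P values are those quoted in the text for the three pairwise comparisons.

```python
def bonferroni(p_values):
    """Multiply each P value by the number of tests, capping at 1."""
    k = len(p_values)
    return [min(1.0, p * k) for p in p_values]

# P values for the three pairwise comparisons reported in the text
p_values = {"BluePages vs MoodGYM": 0.76,
            "BluePages vs control": 0.002,
            "MoodGYM vs control": 0.0005}
adjusted = dict(zip(p_values, bonferroni(list(p_values.values()))))
for name, p in adjusted.items():
    print(f"{name}: adjusted P = {p:.4f}")
```

The smallest adjusted P value is the one used to test the overall null hypothesis.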
 
The large sample Normal method for comparing two means requires 
two assumptions about the data.  
 
 
Some computer programs do not do large sample z tests directly.  
You have to use the command for a one sample or paired t test, 
or for a two-sample t test with unequal variances.  I describe these below.  
For large samples, they give the same answers as the z tests.
 
When samples are small, we cannot apply the large sample 
Normal distribution methods safely.  
This problem was tackled by a statistician who published under the pseudonym Student, 
because his employers would not allow him to publish the results of his work.  
The probability distribution which he discovered is known as 
Student's t distribution as a result and the methods which 
use it as Student's t tests.
 
We have seen that when the sample is large, 
the observed sample mean minus the population mean 
divided by the standard error follows the Standard Normal distribution.  
When the sample is small this is not so.  
The distribution followed depends on the distribution of the 
observations themselves, unlike the large sample case where this is irrelevant.  
We have to assume that the data themselves come from a population which 
follows a Normal distribution.  
We have seen that some naturally occurring variables do this and some do not.  
We shall see in Week 5 that many variables which do not follow a 
Normal distribution can be made to do so by changing the way in 
which we look at them, using a transformation such as the logarithm.  
When the observations come from a population which follows a 
Normal distribution, then the sample mean minus the population mean 
divided by the standard error of the mean follows Student's t distribution, 
or simply the t distribution.  
Student's t distribution may be defined as the distribution 
which this ratio would follow.
 
Like the Normal distribution, Student's t distribution is a 
family of distributions rather than just one.  
This family has only one parameter, 
the number which tells us with which member of the family of t distributions 
we are dealing.  
This is called the degrees of freedom.  
We have already used this term in the calculation of 
variances and standard deviations.  
The degrees of freedom of the t distribution is equal to the degrees of freedom of 
the standard deviation used in the calculation of the standard error.
 
Figure 2 shows some members of the Student's t distribution family.  
 
Figure 2.  Student's t distribution with 1, 4, and 20 degrees of freedom, 
with the Standard Normal distribution 
When the degrees of freedom are small, corresponding to small samples, 
the t distribution has much longer tails than the Normal.  
This reflects the greater uncertainty in the standard error of the mean.  
As the degrees of freedom and hence the related sample size get bigger, 
the t distribution gets closer and closer to the Standard Normal distribution.  
The t distribution reaches the Normal distribution in theory when 
the sample is infinitely large.  
In practice, it is difficult to tell the Normal and t distributions apart 
at about 30 degrees of freedom.
 
Like the Normal, the t distribution has no simple formulae for its probabilities.  
Instead we use numerical approximations to calculate the number 
which replaces 1.96 in confidence interval calculations 
and the P values in significance tests.  
If we do these calculations using one of the many computer programs available, 
the program will calculate these for us.  
For the purposes of illustration, 
I shall also give a short table of the distribution 
for different degrees of freedom (Table 2).  
 
 
D.f. = Degrees of freedom  
For each of the degrees of freedom given, 
Table 2 gives the value which will be exceeded, 
in either positive or negative direction, with the given probability.  
For example, Figure 3 shows the 5% two sided 
probability points of the t distribution with 4 degrees of freedom.
 
 
Figure 3.  5% probability points of the t distribution with 4 degrees of freedom 
We can use Student's t distribution to replace the Normal distribution 
in confidence intervals and significance tests for small samples.  
To do this we must be able to assume that the observations 
themselves come from a Normal distribution, 
plus other assumptions for different applications as described below.
 
We can use the t distribution to carry out all the analyses of means
of small samples which we did above using the Normal distribution for large samples.  
We seldom want to estimate the mean of a population from the 
mean of a small sample, but we shall start with this as it is the easiest.  
 
For our example, we shall use data from nine patients 
with chronic non-healing wounds (Shukla et al., 2004).  
Biopsies were assessed using the microscopic angiogenesis grading system 
(MAGS) score, which provides an index of how well 
small blood vessels are developing and hence of epithelial regeneration.  
High scores are good.  
The nine observations were 20, 31, 34, 39, 43, 45, 49, 51, and 63.  
 
We can use these measurements to estimate the mean MAGS score in 
non-healing patients.  
The mean score before treatment is 41.7 and the standard deviation is 
12.5 with 8 degrees of freedom.  
The standard error of the mean is 4.2.  
If we had a large sample, we could estimate a 95% confidence interval 
for the mean by subtracting and adding 1.96 standard errors: 
41.7 - 1.96 × 4.2 to 41.7 + 1.96 × 4.2.  
But we have only 9 observations, so this would not be valid.  
Instead we use the t distribution with 8 degrees of freedom.  
From Table 2, 
the 5% point of the t distribution with 8 degrees of freedom is 2.31, 
so the confidence interval for the mean MAGS score is 
41.7 - 2.31 × 4.2 to 41.7 + 2.31 × 4.2 = 32.0 to 51.4.  
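
Here is a minimal sketch of the one sample t interval for the MAGS data. The Python standard library has no t distribution, so I have simply copied the 5% point for 8 degrees of freedom from Table 2 into a constant (`T_8DF_5PC`, a name of my own). Because the code does not round the standard error to 4.2 first, the upper limit may differ from the text in the last digit.

```python
import math
import statistics

mags = [20, 31, 34, 39, 43, 45, 49, 51, 63]   # MAGS scores from the text

n = len(mags)
mean = statistics.mean(mags)
sd = statistics.stdev(mags)        # divisor n - 1, giving n - 1 = 8 d.f.
se = sd / math.sqrt(n)

# 5% two sided point of t with 8 d.f., copied from Table 2
# (an assumption of this sketch: the value is not computed here)
T_8DF_5PC = 2.31

lower = mean - T_8DF_5PC * se
upper = mean + T_8DF_5PC * se
print(f"mean = {mean:.1f}, SD = {sd:.1f}, SE = {se:.1f}")
print(f"95% CI = {lower:.1f} to {upper:.1f}")
```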
 
This is only valid provided we can assume the observations 
come from a Normal distribution.  
We may know from our experience of the measurement that this variable 
usually follows a Normal distribution, 
but we always like to check that our sample is compatible.  
I describe how to do this in the next section.
 
When I introduced the Normal distribution, 
I showed histograms of several large samples and superimposed 
Normal distribution curves on them to show whether 
the Normal distribution fitted the data.  
For small samples, it is very difficult to judge from a histogram 
whether the Normal distribution is a good fit.  
 
Figure 4 shows a histogram for the MAGS score before treatment.
 
Figure 4.  Histogram for the MAGS score before treatment, 
with Normal distribution of same mean and variance  
We cannot really say whether the distribution and the data have the same shape.  
There is a better graphical method to examine the fit of a Normal distribution 
to a set of data, the Normal quantile plot or Normal plot for short.  
A Normal plot is a plot of the observed data against the values 
which we would expect if the data actually followed a Normal distribution.  
Table 3 shows the results of the calculation.  
 
 
First we put our observations into ascending order.  
There are nine of them, and we ask what would be the expected values 
of the smallest observation from a sample of nine from a Normal distribution.  
For the Standard Normal distribution this is -1.28.  
(As usual, we skip the formulae because the computer program will do all this for us.)  
We expect the next up to be -0.84, the next to be -0.52, etc.  
The middle value is expected to be zero, the mean and median of the 
Standard Normal distribution.  
We now convert these to a Normal distribution with the 
same mean and variance as the data by multiplying the 
Standard Normal value by the sample standard deviation and adding the sample mean.  
Thus we would expect the smallest of nine observations 
from a Normal distribution with mean 41.7 and standard deviation 12.5 to be 
-1.28 × 12.5 + 41.7 = 25.6 (using the unrounded mean and standard deviation).  
Compare this to the observed smallest value, which is 20.  
Inspecting Table 3 will show you that 
most of the observed MAGS scores and the MAGS scores we would expect 
if we had a Normal distribution are quite close.  
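
For readers who want to see the calculation, the following sketch reproduces the expected values. It assumes that the i-th smallest of n observations is matched to the i/(n + 1) quantile of the Standard Normal distribution. That is an assumption on my part, one of several conventions in use, but it does give -1.28, -0.84, -0.52, and so on for n = 9.

```python
import statistics
from statistics import NormalDist

mags = sorted([20, 31, 34, 39, 43, 45, 49, 51, 63])
n = len(mags)
mean = statistics.mean(mags)
sd = statistics.stdev(mags)

# Assumed plotting positions: the i-th smallest of n observations is
# matched to the i/(n + 1) quantile of the Standard Normal distribution.
z_scores = [NormalDist().inv_cdf(i / (n + 1)) for i in range(1, n + 1)]

# Rescale to a Normal distribution with the sample mean and SD
expected = [mean + z * sd for z in z_scores]

for obs, z, exp in zip(mags, z_scores, expected):
    print(f"{obs:5d} {z:7.2f} {exp:7.1f}")
```

Plotting `mags` against `expected`, with the line of equality added, would give the Normal plot.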
 
We can now plot the observed MAGS score against the MAGS score 
which would be expected if data followed a Normal distribution.  
If the observed and expected are similar, observations should 
lie close to the line of equality, which joins points where the 
observed and expected would be equal, which we also draw on the graph.  
Figure 5 shows the Normal plot for the MAGS data.  
 
Figure 5.  Normal plot for the MAGS data 
Most of the observations are indeed close to the line, 
suggesting that the observations are quite close to what we 
would expect from a Normal distribution.
 
To see how Normal plots behave with distributions of different shapes, 
we can look at Normal plots and histograms together when we have larger samples.  
Figure 6 shows the Normal plot for the birth weight data of 
Figure 1.  
 
Figure 6.  Normal plot for the birthweight data, all gestational ages 
The distribution is negatively skew and the points deviate away from the line, 
falling below it at either end and rising above it in the middle.  
Figure 7 shows the histogram and Normal plot for 
term birth weights only, which fit a Normal distribution quite well.  
 
Figure 7.  Normal plot for the birthweight data, births at or above 
37 weeks gestation only 
The Normal plot shows a good fit to the straight line of equality.  
Figure 8 shows the Normal plot for serum cholesterol 
in stroke patients.  
 
Figure 8.  Normal plot for cholesterol in stroke patients
 
This is a positively skew distribution and shows the opposite curvature 
to the negatively skew distribution of Figure 6, 
curving upwards rather than down. 
 
There are several ways of drawing Normal plots.  
Some programs, such as SPSS, put the expected Normal values on the vertical axis 
and the observed data on the horizontal axis.  
A downward curve then indicates positive skewness, 
an upward curve negative skewness.  
Some programs use the Standard Normal expected values 
rather than those for a Normal distribution with the same mean 
and standard deviation as the data, in which case the straight 
line depends on the mean and standard deviation rather than being the line of equality.  
Some offer a Normal probability plot rather than a Normal quantile plot, 
but these look very similar and are interpreted in the same way.
 
There are also several significance tests, such as the Shapiro-Wilk, 
Shapiro-Francia, and the splendidly named Kolmogorov-Smirnov tests, 
which can be used to test the null hypothesis that the data come from a 
Normal distribution.  Graphical methods are much more useful in practice.  
If the sample is small, departures from the Normal may not be significant 
just because there are insufficient data to detect them.  
If the sample is large, very small departures from the Normal may be significant, 
but such departures will not affect the results of analyses.
 
The paired t method is the version of the one sample t method 
usually seen in research publications.  
Here we have paired observations, such as the same subject 
before and after an intervention, 
the same subject receiving two different interventions as in a cross-over trial, 
or matched case and control in a case-control study.  
Table 4 shows fuller data from Shukla et al. (2004).  
 
 
In this trial, patients with chronic non-healing wounds 
were randomised to receive topical placental extract or to control.  
The data in Table 4 show the MAGS score before 
and after treatment in a group of 9 of the patients in the active treatment group.  
We want to know whether we have evidence that mean MAGS score changed 
and what the average score might be.  
I have calculated the difference between the MAGS score after treatment 
and the MAGS score before treatment, i.e. the increase in the MAGS score.
 
The authors of the paper did not do any further analysis of these data, 
as they were all positive differences and the MAGS score 
clearly increases following treatment.  
We shall use them to estimate the mean increase in MAGS score.  
The mean and standard deviation of the increase in MAGS 
score are 9.33 and 4.03 respectively.  
We have 9 observations so the number of degrees of freedom 
for the calculation of the standard deviation is 9 - 1 = 8.
 
The standard error of the mean difference is 1.34.  
To estimate the 95% confidence interval for the mean from this small sample, 
we use the 5% point of the t distribution with 8 degrees of freedom.  
From the 8 degrees of freedom row in Table 2 this is 2.31.  
The 95% confidence interval is therefore the mean minus or plus 
2.31 standard errors, 9.33 - 2.31 × 1.34 to 9.33 + 2.31 × 1.34, 
which gives us 6.2 to 12.4.
 
We can also test the null hypothesis that in the population the mean increase is zero.  
The test statistic is the mean divided by its standard error.  
This is 9.33/1.34 = 6.96.  
If we look in the 8 degrees of freedom row in Table 2, 
we see that this is larger than the largest number there, 5.04, 
which corresponds to a probability of 0.001.  
Hence we could say P<0.001.  
In practice, we would do this using a computer program, which gives us P = 0.0001.  
The difference is highly significant.
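
The paired t calculations can be sketched as below, using the before and after scores from Table 4. As in the text, the t values are read from Table 2 rather than computed; the constant names are my own. The t statistic differs slightly from 6.96 because the mean and standard error are not rounded before dividing.

```python
import math
import statistics

before = [20, 31, 34, 39, 43, 45, 49, 51, 63]   # MAGS before treatment
after = [32, 47, 43, 43, 55, 52, 61, 55, 71]    # MAGS after treatment

# Increase in MAGS score for each patient (after minus before)
diffs = [a - b for a, b in zip(after, before)]

n = len(diffs)
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)     # n - 1 = 8 degrees of freedom
se_d = sd_d / math.sqrt(n)

T_8DF_5PC = 2.31    # 5% point of t with 8 d.f., copied from Table 2
T_8DF_01PC = 5.04   # 0.1% point of t with 8 d.f., copied from Table 2

lower = mean_d - T_8DF_5PC * se_d
upper = mean_d + T_8DF_5PC * se_d
t_stat = mean_d / se_d   # test of the null hypothesis: mean increase = 0

print(f"mean = {mean_d:.2f}, SD = {sd_d:.2f}, SE = {se_d:.2f}")
print(f"95% CI = {lower:.1f} to {upper:.1f}, t = {t_stat:.2f}")
print("P<0.001" if t_stat > T_8DF_01PC else "P>=0.001")
```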
 
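There are several assumptions which we must make about the data 
for the paired t method to be valid: the pairs of observations are independent of one another, 
the differences follow a Normal distribution, and the mean and variability of the differences 
are not related to the magnitude of the measurement.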
 
 
The first of these, independence, depends on the design.  
It is met for the MAGS data, because the pairs of data 
come from nine different subjects.  
The second can be tested by a Normal plot, as shown in Figure 9.  
 
Figure 9.  Normal plot for the increases in MAGS score 
This appears to fit the straight line quite well and there is no reason 
to suppose that the differences do not follow a Normal distribution.  
The third, that the mean and the variability are not related to the magnitude, 
can also be investigated graphically.  
We do a scatter plot of the difference against the 
average of the two observations, as in Figure 10.  
 
Figure 10.  Difference versus mean plot for the increases in MAGS score 
We do this because the average of the two measurements is the best estimate 
we have of the subject's true MAGS score over the period.  
Using only one of the measurements, either before or after, 
on the horizontal axis tends to produce spurious relationships between 
difference and magnitude.  
For the MAGS data, Figure 10 shows little evidence 
that either the mean difference or the variability of the differences 
is related to the magnitude of MAGS score for the subject.
 
The two sample t method is also called the unpaired t method or unpaired t test, 
the two group t method, or Student's two sample t test.  
It enables us to estimate the difference between means or 
test the null hypothesis of no difference in the population, 
even when the samples are small.
 
Our example is a comparison of capillary density between patients with diabetic foot ulcers 
and a group of non-ulcerated controls (Table 5).
  
 
 
The data are shown graphically in Figure 11.
 
Figure 11.  Scatter diagram showing capillary density in two groups of patients 
The samples are small, only 23 ulcer patients and 19 controls, 
so we cannot use the large sample Normal method.  
The standard error will not be sufficiently well estimated.
 
For the two-sample t method, we must make three assumptions about the data:
 
If  the distributions in the two populations have the same variance, 
we need only one estimate of variance.  
We call this the common or pooled variance estimate.  
It is a weighted average of the two sample variances, 
weighted by the degrees of freedom.  
The degrees of freedom for this common variance estimate are 
the number of observations minus 2.  
We then use this common estimate of variance to estimate 
the standard error of the difference between the means.
 
For the capillary density example, the common variance = 53.31, SD = 7.30 capillaries/mm², 
df = 19 + 23 – 2 = 40.  
The difference (control – ulcer) = 34.08 – 22.59 = 11.49 capillaries/mm².  
The standard error of the difference = 2.26 capillaries/mm².  
Then the 95% confidence interval for difference is given by 
11.49 – t × 2.26 to 11.49 + t × 2.26.  
Here t comes not from the Normal distribution but the t distribution 
with 40 degrees of freedom.  
It is the 5% point of the distribution, because 5% of observations 
will be further from zero than t, 95% will be closer to zero than t.  
From Table 2, for 40 degrees of freedom, t = 2.02.  
Hence the 95% CI is 
11.49 – 2.02 × 2.26 to 11.49 + 2.02 × 2.26 = 6.92 to 16.07 capillaries/mm².
 
We can also carry out a test of significance, 
testing the null hypothesis that in the population the 
difference between means = 0.  
We take the observed difference divided by its standard error and, 
if the null hypothesis were true, this would be an observation 
from the t distribution with 40 degrees of freedom.  We have
difference/SE = 11.49/2.26 = 5.08.
 
From Table 2, the probability of such an extreme 
value is less than 0.001.  
If we use a good computer program, this will calculate the P value for us more accurately.  
In this case we get P = 0.0000, which we report as P<0.0001.
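
The two sample t calculations can be sketched from the raw data in Table 5. This is an illustration, not the output of any particular program. The 5% point for 40 degrees of freedom is copied from Table 2 into a constant I have called `T_40DF_5PC`, and the standard error is found as the square root of the common variance times (1/n1 + 1/n2).

```python
import math
import statistics

controls = [17.5, 27.5, 27.0, 29.5, 27.0, 29.0, 34.5, 31.0, 35.5, 33.5,
            35.5, 34.0, 36.5, 38.0, 40.0, 39.5, 40.0, 40.0, 52.0]
ulcerated = [9.0, 11.0, 12.5, 18.0, 18.0, 18.0, 18.5, 20.0, 20.0, 22.0,
             22.5, 22.5, 23.0, 23.0, 24.0, 26.5, 26.5, 27.0, 27.5, 28.0,
             28.5, 29.0, 44.5]

n1, n2 = len(controls), len(ulcerated)
df = n1 + n2 - 2

# Pooled variance: sample variances weighted by their degrees of freedom
pooled_var = ((n1 - 1) * statistics.variance(controls)
              + (n2 - 1) * statistics.variance(ulcerated)) / df

diff = statistics.mean(controls) - statistics.mean(ulcerated)
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

T_40DF_5PC = 2.02   # 5% point of t with 40 d.f., copied from Table 2
lower = diff - T_40DF_5PC * se_diff
upper = diff + T_40DF_5PC * se_diff
t_stat = diff / se_diff

print(f"common variance = {pooled_var:.2f}, d.f. = {df}")
print(f"difference = {diff:.2f}, SE = {se_diff:.2f}, t = {t_stat:.2f}")
print(f"95% CI = {lower:.2f} to {upper:.2f}")
```

The limits may differ from those in the text in the second decimal place, because the text uses a more precise value of t than the 2.02 in Table 2.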
 
We can check the assumption that capillary density 
follows a Normal distribution in each population by histograms and Normal plots.  
Figure 12 shows histograms for each group.  
 
Figure 12.  Histograms of capillary density in two groups of patients 
There are not enough observations to judge whether the data follow 
Normal distributions.  
We can improve matters by combining the two groups.  
The distribution would be affected by any difference between the means, 
perhaps even becoming bimodal.  
We get round this by subtracting the group mean from each observation 
to give residuals.  
The residuals have mean = 0 in each group.  
We can then put them together to form a single distribution, 
as shown in Figure 13.  
 
Figure 13.  Distribution of residual capillary density, 
with corresponding Normal distribution curve 
This looks fairly symmetrical, but there are still only a few observations.  
A better way would be to draw a Normal plot for the residuals, 
as shown in Figure 14.  
 
Figure 14.  Distribution of residual capillary density, 
with corresponding Normal distribution curve, and Normal plot 
The Normal plot conforms fairly well to the straight line, 
confirming that the distribution is approximately Normal.
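
The residuals used for these checks are easy to compute. In this sketch the helper name `residuals` is my own. It also shows that the sum of squares of the combined residuals, divided by the 40 degrees of freedom, recovers the common variance.

```python
import statistics

controls = [17.5, 27.5, 27.0, 29.5, 27.0, 29.0, 34.5, 31.0, 35.5, 33.5,
            35.5, 34.0, 36.5, 38.0, 40.0, 39.5, 40.0, 40.0, 52.0]
ulcerated = [9.0, 11.0, 12.5, 18.0, 18.0, 18.0, 18.5, 20.0, 20.0, 22.0,
             22.5, 22.5, 23.0, 23.0, 24.0, 26.5, 26.5, 27.0, 27.5, 28.0,
             28.5, 29.0, 44.5]

def residuals(values):
    """Subtract the group mean from each observation."""
    m = statistics.mean(values)
    return [x - m for x in values]

# Residuals have mean zero within each group, so the two sets can be
# combined into one distribution for a histogram or Normal plot.
combined = residuals(controls) + residuals(ulcerated)

# Residual sum of squares over n1 + n2 - 2 d.f. is the common variance
common_var = sum(r * r for r in combined) / (len(combined) - 2)
print(f"{len(combined)} residuals, common variance = {common_var:.2f}")
```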
 
The other assumption is that the variances are the same in each population.  
For the capillary density, Table 5 
shows that the standard deviations are similar, 
being 7.29 capillaries/mm² for the control sample and 7.31 capillaries/mm² 
for the ulcerated sample.
 
We can also test the equality of variances, either with an F test or Levene's test.  
However, tests have the unfortunate property that they miss 
large differences for small samples, when differences might matter, 
and find them for large samples, when they matter much less.  
It is usually preferable to judge whether the assumption of 
uniform variance is plausible from the scatter plot (Figure 11).  
For the capillary density, the spread of the two groups looks fairly similar.
 
Methods using the t distribution depend on some strong assumptions 
about the distributions from which the data come.  
In general, for two equal sized samples the t method is very resistant 
to deviations from Normality, 
though as the samples become less equal in size the approximation becomes less good.  
The most likely effect of skewness is that we lose power.  
P values are then too large and confidence intervals too wide.  
We can usually correct skewness by a transformation, as described in Week 5.
 
If we cannot assume uniform variance, 
the effect is usually small if the two populations follow Normal distributions.  
However, unequal variance is often associated with skewness in the data.  
When distributions are positively skew, 
the variability usually increases with increasing mean.  
This is the case for the energy expenditure, of course.  
In this case a transformation designed to correct one fault often tends 
to correct the other as well.
 
If distributions are Normal, we can use the Satterthwaite correction 
to the degrees of freedom, often called the two sample t method for 
unequal or unpooled variance.
 
If variances are unequal, we cannot estimate a common variance.  
Instead we use the large sample form of the standard error 
of the difference between means.  
We replace the t value for confidence intervals and significance tests 
by t with fewer degrees of freedom.  
The Satterthwaite degrees of freedom depend on the relative sizes of the variances.  
The larger variance dominates and if one is much larger than 
the other the degrees of freedom for that group are the only degrees of freedom. 
 
For the capillary density example, the equal variance method has degrees of freedom = 40 (= 19 + 23 – 2).  
The unpooled standard error, found as for the comparison of two large sample means, 
is 2.26 capillaries/mm² and Satterthwaite's degrees of freedom = 38.56.  
The degrees of freedom are almost unchanged because the variances here are almost the same.  
We round this down to 38 to use the t table.  
For this example, the t test for equal variances gives P<0.0001, 
unequal variances also gives P<0.0001.  
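
As a sketch of the unequal variance calculation, the function below finds the unpooled standard error and Satterthwaite's degrees of freedom from the summary statistics in Table 5. The function name `satterthwaite` is my own. The degrees of freedom use the usual Welch-Satterthwaite formula, which I am assuming is the one intended here since it reproduces 38.56. The standard error comes out at about 2.26 capillaries/mm².

```python
import math

def satterthwaite(sd1, n1, sd2, n2):
    """Unpooled standard error and Satterthwaite degrees of freedom."""
    v1 = sd1**2 / n1      # squared standard error of the first mean
    v2 = sd2**2 / n2      # squared standard error of the second mean
    se_diff = math.sqrt(v1 + v2)
    df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return se_diff, df

# Summary statistics from Table 5: controls, then ulcerated patients
se_diff, df = satterthwaite(sd1=7.29, n1=19, sd2=7.31, n2=23)
print(f"SE = {se_diff:.2f}, d.f. = {df:.2f}")
```

If one variance were much larger than the other, `df` would fall towards the degrees of freedom of that group alone.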
 
The Satterthwaite 95% confidence interval is 6.91 to 16.07 capillaries/mm², 
compared to the 6.92 to 16.07 capillaries/mm² using the pooled variance method.  
It is very similar.  
This is because the two sample t method is very robust to small 
departures from its assumptions, especially when the groups are of similar size, 
as here.
 
N.B. Satterthwaite's method is an approximation for use in unusual circumstances.  
The equal variance method is the standard t test. 
 
Brooke OG, Anderson HR, Bland JM, Peacock JL, Stewart CM.  (1989)  
Effects on birth weight of smoking, alcohol, caffeine, socioeconomic factors, 
and psychosocial stress.  
British Medical Journal 298, 795-801.
 
Christensen H, Griffiths KM, Jorm AF.  (2004)  
Delivering interventions for depression by using the internet: 
randomised controlled trial.  
British Medical Journal 328, 265-268.
 
Shukla VK, Rasheed MA, Kumar M, Gupta SK, Pandey SS.  (2004)  
A trial to determine the role of placental extract in the treatment 
of chronic non-healing wounds.  
Journal of Wound Care 13, 177-9.
 
Steenmoorle P, Julina GN.  (2004)  
Can laboratory investigation help us to decide when to discontinue larval therapy?  
Journal of Wound Care 13, 38-40.
 
 
This page maintained by Martin Bland. 
 
  
Differences between means in paired large samples 
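Table 1.  Depression scores before randomisation and falls in depression score, by treatment group (Christensen et al., 2004)

Treatment    Number    Baseline scores      Fall in scores
                       Mean      SD         Mean      SD
BluePages     165      21.1     10.4         3.9      9.1
MoodGYM       182      21.8     10.5         4.2      9.1
Controls      178      21.6     11.1         1.0      8.4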
 
  
Comparing the means of two independent large samples 
  
The t distribution
 
 
Table 2.  Two sided probability points of the t distribution

 D.f.            Probability                       D.f.            Probability
          0.10    0.05    0.01    0.001                 0.10    0.05    0.01    0.001
         (10%)    (5%)    (1%)   (0.1%)                (10%)    (5%)    (1%)   (0.1%)
    1     6.31   12.70   63.66   636.62          16     1.75    2.12    2.92     4.02
    2     2.92    4.30    9.93    31.60          17     1.74    2.11    2.90     3.97
    3     2.35    3.18    5.84    12.92          18     1.73    2.10    2.88     3.92
    4     2.13    2.78    4.60     8.61          19     1.73    2.09    2.86     3.88
    5     2.02    2.57    4.03     6.87          20     1.73    2.09    2.85     3.85
    6     1.94    2.45    3.71     5.96          21     1.72    2.08    2.83     3.82
    7     1.90    2.36    3.50     5.41          22     1.72    2.07    2.82     3.79
    8     1.86    2.31    3.36     5.04          23     1.71    2.07    2.81     3.77
    9     1.83    2.26    3.25     4.78          24     1.71    2.06    2.80     3.75
   10     1.81    2.23    3.17     4.59          25     1.71    2.06    2.79     3.73
   11     1.80    2.20    3.11     4.44          30     1.70    2.04    2.75     3.65
   12     1.78    2.18    3.06     4.32          40     1.68    2.02    2.70     3.55
   13     1.77    2.16    3.01     4.22          60     1.67    2.00    2.66     3.46
   14     1.76    2.15    2.98     4.14         120     1.66    1.98    2.62     3.37
   15     1.75    2.13    2.95     4.07    infinity     1.65    1.96    2.58     3.29

Infinity degrees of freedom is the same as the Standard Normal distribution.
The one sample t method 
Checking the assumption of a Normal distribution
  
Table 3.  Calculation of the Normal plot for the MAGS data

MAGS score    Standard Normal,     Normal with
              mean = 0, SD = 1     mean = 41.7, SD = 12.5
    20             -1.28                25.6
    31             -0.84                31.1
    34             -0.52                35.1
    39             -0.25                38.5
    43              0.00                41.7
    45              0.25                44.8
    49              0.52                48.2
    51              0.84                52.2
    63              1.28                57.7
 
 
The paired t method 
  
Table 4.  MAGS score before and after treatment for nine patients (Shukla et al., 2004)

MAGS score    MAGS score    Difference,          Average of
before        after         MAGS after minus     MAGS before and
                            MAGS before          MAGS after
    20            32              12                 26.0
    31            47              16                 39.0
    34            43               9                 38.5
    39            43               4                 41.0
    43            55              12                 49.0
    45            52               7                 48.5
    49            61              12                 55.0
    51            55               4                 53.0
    63            71               8                 67.0
 
  
The two sample t method
  
      
Table 5.  Capillary density (capillaries/mm²) in non-ulcerated controls and in ulcerated patients

            Controls    Ulcerated patients
              17.5             9.0
              27.5            11.0
              27.0            12.5
              29.5            18.0
              27.0            18.0
              29.0            18.0
              34.5            18.5
              31.0            20.0
              35.5            20.0
              33.5            22.0
              35.5            22.5
              34.0            22.5
              36.5            23.0
              38.0            23.0
              40.0            24.0
              39.5            26.5
              40.0            26.5
              40.0            27.0
              52.0            27.5
                              28.0
                              28.5
                              29.0
                              44.5

Number        19              23
Mean          34.08           22.59
SD             7.29            7.31
 
  
References
Last updated: 9 May, 2007.