
- D-1 When should sample size calculations be provided?
- D-2 Why is it important to consider sample size?
- D-3 Information required to calculate a sample size
- D-4 Explanation of statistical terms
- D-5 Which variables should be included in the sample size calculation?
- D-6 Allowing for response rates and other losses to the sample
- D-7 Consistency with study aims and statistical analysis
- D-8 Three specific examples of sample size calculations & statements
- D-9 Sample size statements likely to be rejected

Sample size calculations are required for the vast majority of quantitative studies.

Sample size calculations are not required for qualitative research. (Note: this means research using formal qualitative methods, such as content analysis; simple descriptive projects are in fact still quantitative.)

Sample size calculations may not be required for certain preliminary pilot studies (see A-1.9). (However, such studies will often be performed prior to applying for funding).

If in any doubt, please check with the funding body - missing or inadequate sample size calculations are one of the most common reasons for rejecting proposals.

In studies concerned with estimating some characteristic of a population (e.g. the prevalence of asthmatic children), sample size calculations are important to ensure that estimates are obtained with required precision or confidence. For example, a prevalence of 10% from a sample of size 20 would have a 95% confidence interval of 1% to 31%, which is not very precise or informative. On the other hand, a prevalence of 10% from a sample of size 400 would have a 95% confidence interval of 7% to 13%, which may be considered sufficiently accurate. Sample size calculations help to avoid the former situation.
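As an illustrative sketch, the effect of sample size on precision can be checked with the normal-approximation confidence interval. (The intervals quoted above were calculated with exact methods, so the small-sample figures differ slightly.)

```python
import math

def approx_ci(p, n, z=1.96):
    """Normal-approximation 95% CI for a proportion.
    A rough sketch; exact methods are preferable for small n or extreme p."""
    se = math.sqrt(p * (1 - p) / n)  # standard error of the proportion
    return p - z * se, p + z * se

for n in (20, 400):
    lo, hi = approx_ci(0.10, n)
    print(f"n={n}: 95% CI {lo:.1%} to {hi:.1%}")
```

With n=20 the interval spans roughly 26 percentage points; with n=400 it narrows to about 6.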

In studies concerned with detecting an effect (e.g. a difference between two treatments, or relative risk of a diagnosis if a certain risk factor is present versus absent), sample size calculations are important to ensure that if an effect deemed to be clinically or biologically important exists, then there is a high chance of it being detected, i.e. that the analysis will be statistically significant. If the sample is too small, then even if large differences are observed, it will be impossible to show that these are due to anything more than sampling variation.

It is highly recommended that you ask a professional statistician to conduct the sample size calculation.

Methods for the determination of sample size are described in several general statistics texts, such as Altman (1991), Bland (2000), and Armitage, Berry and Matthews (2002). Two specialised books discuss sample size determination in many situations: for continuous data, use Machin *et al.* (1998); for categorical data, use Lemeshow *et al.* (1996). Both books give tables to simplify the calculation. For sample size in sequential trials, see Whitehead (1997).

The actual calculations for sample size can be done using several computer programs. Our free program Clinstat carries out calculations for the comparison of means and proportions and for testing correlation; it is available via our web directory of randomisation software. Many more options are provided by the commercial package nQuery Advisor (Elashoff 2000). A good free Windows program is PS Power and Sample Size Calculations, by William D. Dupont and Walton D. Plummer.

Your sample size calculation depends on the following factors, which the statistician will want to discuss with you: -

- The variables of interest in your study, including the type of data (type of data is expanded in sections A-4, A-4.1, and A-4.2)
- The desired power*
- The desired significance level*
- The effect size of clinical importance*
- The standard deviation of continuous outcome variables
- Whether analysis will involve one- or two-sided tests*
- Aspects of the design of your study: e.g. is your study ....
- a simple randomised controlled trial (RCT)
- a cluster randomised trial
- an equivalence trial (see D-7)
- a non-randomised intervention study (see B-5.10c)
- an observational study
- a prevalence study
- a study measuring sensitivity and specificity
- does your study have paired data?
- does your study include repeated measures?
- are groups of equal sizes?
- are the data hierarchical?

* Explanations of these terms are given below

Note 1: Non-randomised studies looking for differences or associations will generally require a much larger sample in order to allow adjustment for confounding factors within the analysis (see A-1.6, E-5).

Note 2: It is the absolute sample size which is of most interest, not the sample size as a proportion of the whole population.

Many statistical analyses involve the comparison of two treatments, procedures or subject types. The numerical value summarising the difference of interest is called the effect. In other study designs the effect may be represented by a correlation coefficient, an odds ratio, or a relative risk. We declare the null and alternative hypotheses. Usually, the null hypothesis states that there is no effect (e.g. the difference is zero; the relative risk is one; or the correlation coefficient is zero), and the alternative hypothesis that there is an effect.

The p-value is the probability of obtaining the effect observed in the study (or one stronger) if the null hypothesis of no effect is actually true. It is usually expressed as a proportion (e.g. p=0.03).

The significance level is a cut-off point for the p-value, below which the null hypothesis will be rejected and it will be concluded that there is evidence of an effect. The significance level is typically set at 5%. (The significance level, although a p-value, is usually expressed as a percentage: p=5% is equivalent to p=0.05). If the observed p-value is smaller than 5% then there is only a small probability that the study could have observed the data it did if there was truly no effect, and so it would be concluded that there is evidence of a real effect.

A significance level of 5% also means there is up to a 5% probability of concluding that there is evidence of an effect, when in fact none exists. A significance level of 1% is sometimes more appropriate, if it is very important to avoid concluding that there is evidence of an effect when in reality none exists.

Power is the probability that the null hypothesis will be correctly rejected, i.e. rejected when there is indeed a real difference or association. It can also be thought of as "100 minus the percentage chance of missing a real effect" - therefore the higher the power, the lower the chance of missing a real effect. Power is typically set at 80%, 90% or 95%. Power should not be less than 80%. If it is very important that the study does not miss a real effect, then a power of 90% or more should be applied.
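To make the idea of power concrete, it can be estimated by simulation: generate many hypothetical trials in which a real effect exists and count how often the test is significant. This sketch assumes a two-group comparison of means with a known standard deviation (so a z-test can be used), taking its figures from the worked suicidal-ideation example later on this page.

```python
import math
import random

random.seed(1)

def simulated_power(diff=5.0, sd=7.7, n=38, alpha_z=1.96, sims=10_000):
    """Monte Carlo estimate of power: the fraction of simulated trials in
    which a two-sided z-test (SD assumed known) gives p < 0.05."""
    se = sd * math.sqrt(2 / n)  # standard error of the difference in means
    hits = 0
    for _ in range(sims):
        m1 = sum(random.gauss(diff, sd) for _ in range(n)) / n
        m2 = sum(random.gauss(0.0, sd) for _ in range(n)) / n
        if abs(m1 - m2) / se > alpha_z:
            hits += 1
    return hits / sims

print(simulated_power())  # close to 0.80, matching the 80% power the design aimed for
```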

This is the smallest difference between the group means or proportions (or odds ratio/relative risk closest to unity) which would be considered to be clinically or biologically important. The sample size should be set so that if such a difference exists, then it is very likely that a statistically significant result would be obtained.

In a two-sided test, the null hypothesis states there is no effect, and the alternative hypothesis (often implied) is that a difference exists in either direction. In a one-sided test the alternative hypothesis does specify a direction, for example that an active treatment is better than a placebo, and the null hypothesis then includes both no effect and placebo better than active treatment.

Two-sided tests should be used unless there is a very good reason for doing otherwise. Expectation that the difference will be in a particular direction is not adequate justification for one-sided tests. Medical researchers are sometimes surprised by their results. If the true effect is in the opposite direction to that expected, this generally has very different implications to that of no effect, and should be reported as such; a one-sided test would not allow this. Please see Bland & Altman (1994) for some examples of when one-sided tests may be appropriate.

The sample size calculation should relate to the study's primary outcome variable.

If the study has secondary outcome variables which are also considered important (as is often the case), the sample size should also be sufficient for the analyses of these variables. Separate sample size calculations should ideally be provided for each important variable. (See also B-6 and E-7)

The sample size calculation should relate to the final, achieved sample. Therefore, the initial numbers approached in the study may need to be increased in accordance with the expected response rate, loss to follow up, lack of compliance, and any other predicted reasons for loss of subjects (for clinical trials see also B-10). The link between the initial numbers approached and the final achieved sample size should be made explicit.
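The inflation itself is simple arithmetic: divide the required final sample by the expected retention (or response) rate and round up. A minimal sketch:

```python
import math

def initial_sample(final_n, expected_retention):
    """Number of subjects to approach so that, after expected losses,
    about final_n remain. expected_retention is a proportion (0 to 1)."""
    return math.ceil(final_n / expected_retention)

print(initial_sample(324, 0.70))  # 463 for a 70% response rate
```

Applied to the questionnaire example later on this page, 324 / 0.70 gives 463; that worked example rounds the number delivered up further, to 480.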

The adequacy of a sample size should be assessed according to the purpose of the study. For example, if the aim is to demonstrate that a new drug is superior to an existing one then it is important that the sample size is sufficient to detect a clinically important difference between the two treatments. However, sometimes the aim is to demonstrate that two drugs are equally effective. This type of trial is called an equivalence trial or a 'negative' trial. Pocock (1983) p129-130, discusses sample size considerations for these studies. The sample size required to demonstrate equivalence will be larger than that required to demonstrate a difference. Please check that your sample size calculations relate to the study's stated objectives, and are based on the study's primary outcome variable (see D-5).

The sample size calculation should also be consistent with the study's proposed method of analysis, since both the sample size and the analysis depend on the design of the study (see section E). Please check the consistency between sample size calculation and choice of analysis.

If your study requires the estimation of a single proportion, comparison of two means, or comparison of two proportions, the sample size calculations for these situations are (generally) relatively straightforward, and are therefore presented here. However, it is still strongly recommended that you ask a statistician to conduct the sample size calculation.

Note: The formula presented below is based on 'normal approximation methods', and, unless a very large sample is planned, should not be applied when estimating percentages which are close to 0% or 100%. In these circumstances 'exact methods' should be used. This will generally be the case in studies estimating the sensitivity or specificity of a new technique, where percentages close to 100% are anticipated. Consult a statistician (or at least a computer package) in this case. (See also E-12.1)
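The breakdown near the extremes is easy to demonstrate: with an expected percentage close to 100%, the normal-approximation interval can extend beyond 100%, which is impossible for a proportion. A small sketch (the 98%-from-50-subjects figures are illustrative, not taken from the text):

```python
import math

def approx_ci(p, n, z=1.96):
    """Normal-approximation CI for a proportion -- breaks down near 0 or 1."""
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

lo, hi = approx_ci(0.98, 50)    # e.g. an estimated specificity of 98% from 50 subjects
print(f"{lo:.3f} to {hi:.3f}")  # upper limit exceeds 1.0 -- use exact methods instead
```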

Scenario: The prevalence of dysfunctional breathing amongst asthma patients being treated in general practice is to be assessed using a postal questionnaire survey (Thomas *et al.* 2001).

Required information: -

- Primary outcome variable = presence/absence of dysfunctional breathing
- 'Best guess' of expected percentage (proportion) = 30% (0.30)
- Desired width of 95% confidence interval = 10% (i.e. +/- 5%, or 25% to 35%)

The formula for the sample size for estimation of a single proportion is as follows: -

*n* = 15.4 * *p* * (1-*p*)/*W*^{2}

where *n* = the required sample size
*p* = the expected proportion - here 0.30
*W* = width of confidence interval - here 0.10

Inserting the required information into the formula gives: -

*n* = 15.4 * 0.30 * (0.70)/ 0.10^{2} = 324

Suggested description of this sample size calculation: -

"A sample of 324 patients with asthma will be required to obtain a 95% confidence interval of +/- 5% around a prevalence estimate of 30%. To allow for an expected 70% response rate to the questionnaire, a total of 480 questionnaires will be delivered."

Note: The following calculation only applies when you intend to compare two groups of the same size.

Scenario: A placebo-controlled randomised trial proposes to assess the effectiveness of colony stimulating factors (CSFs) in reducing sepsis in premature babies. A previous study has shown the underlying rate of sepsis to be about 50% in such infants around 2 weeks after birth, and a reduction of this rate to 34% would be of clinical importance.

Required information: -

- Primary outcome variable = presence/absence of sepsis at 14 days after treatment (treatment is for a maximum of 72 hours after birth). Hence, a categorical variable summarised by proportions.
- Size of difference of clinical importance = 16%, or 0.16 (i.e. 50%-34%)
- Significance level = 5%
- Power = 80%
- Type of test = two-sided

The formula for the sample size for comparison of 2 proportions (two-sided) is as follows: -

*n* = [*A* + *B*]^{2} * [(*p*_{1} * (1-*p*_{1})) + (*p*_{2} * (1-*p*_{2}))] / [*p*_{1} - *p*_{2}]^{2}

where *n* = the sample size required in each group (double this for total
sample)

*p*_{1} = first proportion - here 0.50

*p*_{2} = second proportion - here 0.34

*p*_{1}-*p*_{2} = size of difference of clinical
importance - here 0.16

*A* depends on desired significance level (see table) - here 1.96

*B* depends on desired power (see table) - here 0.84

Table of values for A and B

Significance level | A |
---|---|
5% | 1.96 |
1% | 2.58 |

Power | B |
---|---|
80% | 0.84 |
90% | 1.28 |
95% | 1.64 |

Inserting the required information into the formula gives: -

*n* = [1.96 + 0.84]^{2} * [(0.50*0.50) + (0.34*0.66)] / [0.16]^{2} = 146

This gives the number required in each of the trial's two groups. Therefore the total sample size is double this, i.e. 292.

Suggested description of this sample size calculation: -

"A sample size of 292 babies (146 in each of the treatment and placebo groups) will be sufficient to detect a difference of 16% between groups in the sepsis rate at 14 days, with 80% power and a 5% significance level. This 16% difference represents the difference between a 50% sepsis rate in the placebo group and a 34% rate in the treatment group."

Note: The following calculation only applies when you intend to compare two groups of the same size.

Scenario: A randomised controlled trial has been planned to evaluate a brief
psychological intervention in comparison to usual treatment in the reduction of
suicidal ideation amongst patients presenting at hospital with deliberate
self-poisoning. Suicidal ideation will be measured on the Beck scale; the
standard deviation of this scale in a previous study was 7.7, and a difference
of 5 points is considered to be of clinical importance. It is anticipated that
around one third of patients may drop out of treatment (Guthrie *et al.* 2001).

Required information: -

- Primary outcome variable = The Beck scale for suicidal ideation. A continuous variable summarised by means.
- Standard deviation = 7.7 points
- Size of difference of clinical importance = 5 points
- Significance level = 5%
- Power = 80%
- Type of test = two-sided

The formula for the sample size for comparison of 2 means (2-sided) is as follows: -

*n* = [*A* + *B*]^{2} * 2 * *SD*^{2} / *DIFF*^{2}

where *n* = the sample size required in each group (double this for total
sample).

*SD* = standard deviation, of the primary outcome variable - here 7.7.

*DIFF* = size of difference of clinical importance - here 5.0.

*A* depends on desired significance level (see table) - here 1.96.

*B* depends on desired power (see table) - here 1.28.

Table of values for A and B

Significance level | A |
---|---|
5% | 1.96 |
1% | 2.58 |

Power | B |
---|---|
80% | 0.84 |
90% | 1.28 |
95% | 1.64 |

Inserting the required information into the formula gives: -

*n* = [1.96 + 0.84]^{2} * 2 * 7.7^{2} / 5.0^{2} = 38

This gives the number required in each of the trial's two groups. Therefore the total sample size is double this, i.e. 76.

To allow for the predicted dropout rate of around one third, the sample size is increased to 60 in each group, a total sample of 120.

Suggested description of this sample size calculation: -

"A sample size of 38 in each group will be sufficient to detect a difference of 5 points on the Beck scale of suicidal ideation, assuming a standard deviation of 7.7 points, a power of 80%, and a significance level of 5%. This number has been increased to 60 per group (total of 120), to allow for a predicted drop-out from treatment of around one third"

"A previous study in this area recruited 150 subjects and found highly significant results (p=0.014), and therefore a similar sample size should be sufficient here."

Previous studies may have been 'lucky' to find significant results, due to random sampling variation. Calculations of sample size specific to the present, proposed study should be provided - including details of power, significance level, primary outcome variable, effect size of clinical importance for this variable, standard deviation (if a continuous variable), and sample size in each group (if comparing groups).

"Sample sizes are not provided because there is no prior information on which to base them."

Every effort should be made to find previously published information on which to base sample size calculations, or a small pre-study may be conducted to gather this information.

Where prior information on standard deviations is unavailable, sample size calculations can be given in very general terms, i.e. by giving the size of difference that may be detected in terms of a number of standard deviations.
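For example, with the difference of interest expressed as a multiple of the (unknown) standard deviation — a standardised difference — the SD cancels out of the two-means formula, so sample sizes can be tabulated without any prior data. A sketch:

```python
import math

def n_for_standardised_diff(delta, A=1.96, B=0.84):
    """Per-group n to detect a difference of `delta` standard deviations
    (two-sided, 5% significance, 80% power by default); the SD cancels."""
    return math.ceil((A + B)**2 * 2 / delta**2)

for delta in (0.25, 0.5, 1.0):
    print(f"{delta} SD: {n_for_standardised_diff(delta)} per group")
# e.g. a difference of half a standard deviation needs 63 per group
```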

However, if funding is being requested for very preliminary pilot studies (see A-1.9), aimed at assessing feasibility or gathering the information required to calculate sample sizes for a full-scale study, then sample size calculations are not necessary.

"The throughput of the clinic is around 50 patients a year, of whom 10% may refuse to take part in the study. Therefore over the 2 years of the study, the sample size will be 90 patients. "

Although most studies need to balance feasibility with study power, the sample size should not be decided on the number of available patients alone.

Where the number of available patients is a known limiting factor, sample size calculations should still be provided, to indicate either a) the power which the study will have to detect the desired difference of clinical importance, or b) the difference which will be detected when the desired power is applied.
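Option a) can be sketched by rearranging the two-means formula to give approximate power for a fixed per-group n. This is a normal-approximation sketch, and the 25-per-group figure is hypothetical, chosen only to illustrate a recruitment-limited version of the Beck-scale trial.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def power_two_means(n_per_group, diff, sd, A=1.96):
    """Approximate power of a two-sided test with n_per_group fixed,
    obtained by rearranging the two-means sample size formula."""
    z = diff / (sd * math.sqrt(2 / n_per_group))
    return norm_cdf(z - A)

# e.g. if only 25 patients per group were available for the Beck-scale trial
print(f"{power_two_means(25, 5.0, 7.7):.0%}")  # about 63% -- well below the usual 80%
```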

Where the number of available patients is too small to provide sufficient power to detect differences of clinical importance, you may wish to consider extending the length of the study, or collaborating with a colleague to conduct a multi-centre study.

Altman DG. (1991) *Practical Statistics for Medical Research.* Chapman
and Hall, London.

Armitage P, Berry G, Matthews JNS. (2002) *Statistical Methods in Medical
Research, 4th ed.* Blackwell, Oxford.

Bland JM and Altman DG. (1994) One and two sided tests of significance. *British Medical Journal* **309**, 248.

Bland M. (2000)
*An Introduction to Medical Statistics, 3rd. ed.*
Oxford University Press, Oxford.

Elashoff JD. (2000) nQuery Advisor Version 4.0 User's Guide. Los Angeles, CA.

Guthrie E, Kapur N, Mackway-Jones K, Chew-Graham C, Moorey J, Mendel E,
Marino-Francis F, Sanderson S, Turpin C, Boddy G, Tomenson B. (2001)
Randomised controlled trial of brief psychological intervention after
deliberate self poisoning. *British Medical Journal* **323**,
135-138.

Lemeshow S, Hosmer DW, Klar J & Lwanga SK. (1996) *Adequacy of sample
size in health studies.* John Wiley & Sons, Chichester.

Machin D, Campbell MJ, Fayers P, Pinol A. (1998) *Statistical Tables for the Design of Clinical Studies, 2nd ed.* Blackwell, Oxford.

Pocock SJ. (1983) *Clinical Trials: A Practical
Approach.* John Wiley and Sons, Chichester.

Thomas M, McKinley RK, Freeman E, Foy C. (2001)
Prevalence of dysfunctional breathing in patients treated for asthma in primary
care: cross sectional survey. *British Medical Journal* **322**,
1098-1100.

Whitehead J. (1997) *The Design and Analysis of Sequential Clinical Trials, revised 2nd ed.* Wiley, Chichester.



This page is maintained by Martin Bland.

Last updated: 11 September, 2009.