Exercise: Estimation of sample size

1. An opinion pollster wanted to estimate voter preferences to within two percentage points. How could the sample size needed to do this be decided?

Suggested answer
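
For readers who want to check the arithmetic behind question 1, the following is a minimal sketch of one common approach, not necessarily the one given in the suggested answer. It assumes a simple random sample, reads "to within two percentage points" as the half-width of a 95% confidence interval, and uses p = 0.5 as the most conservative guess at the true proportion. The function name `sample_size_for_proportion` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(margin, p=0.5, confidence=0.95):
    """Sample size so that a confidence interval for a proportion has
    half-width `margin`, using the Normal approximation n = z^2 p(1-p) / d^2.

    Assumptions (not from the exercise text): simple random sampling and
    a guessed true proportion `p`; p = 0.5 gives the largest variance.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)


# Worst case: preferences split 50:50, margin of two percentage points.
n_worst_case = sample_size_for_proportion(margin=0.02)

# A less even split needs fewer respondents for the same margin.
n_if_30_percent = sample_size_for_proportion(margin=0.02, p=0.30)

print(n_worst_case, n_if_30_percent)
```

Under these assumptions the worst case works out at about 2,400 respondents. Halving the margin would roughly quadruple that figure, because the margin enters the formula squared.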

The paper ‘Defibrotide for prophylaxis of hepatic veno-occlusive disease in paediatric haemopoietic stem-cell transplantation: an open-label, phase 3, randomized controlled trial’ (Corbacioglu et al. 2012) contains the following statement:

‘We estimated the sample size on the basis of the primary endpoint, assuming rates of veno-occlusive disease to be 30% [four references] in the control group and 15% in the defibrotide group. Assuming a two-sided level of significance at 0.05, power of 80%, and a 10% dropout rate, 135 patients per group were needed (270 patients in total)’.

2. What is meant by ‘two-sided level of significance at 0.05’ and why did they need to specify this?

Suggested answer

3. What are 30% and 15% in this statement and from where do they come?

Suggested answer

4. What is meant by ‘power of 80%’?

Suggested answer
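
The quoted sample size statement can be explored with a short calculation. This is a sketch only: the paper does not say which formula the authors used, so the code assumes one standard Normal-approximation formula for comparing two proportions (pooled variance under the null hypothesis, separate variances under the alternative) and assumes the 10% dropout allowance was applied by dividing by 0.9. The function name `n_per_group` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate number per group needed to detect a difference between
    proportions p1 and p2 with a two-sided test at level `alpha`.

    Assumed formula (a common textbook approximation, not confirmed as
    the authors' method):
      n = [z_a * sqrt(2 * pbar * (1 - pbar))
           + z_b * sqrt(p1(1-p1) + p2(1-p2))]^2 / (p1 - p2)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the power
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# Assumed rates from the quoted statement: 30% control, 15% defibrotide.
n_analysable = n_per_group(0.30, 0.15)

# Inflate for the stated 10% dropout (assumed here to mean dividing by 0.9).
n_recruited = ceil(n_analysable / (1 - 0.10))

print(n_analysable, n_recruited)
```

Under these assumptions the sketch gives 121 analysable patients per group and 135 recruited per group, which agrees with the quoted figure. Changing `alpha`, `power`, or either assumed rate shows how sensitive the answer is to each input.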

The authors go on to report that: ‘The data and safety monitoring board inspected the planned adaptive interim analysis on the primary endpoint, and recommended that the sample size be increased to 180 patients per group to achieve a conditional power for significance of 80%’.

The results were that: ‘22 (12%) of 180 participants in the defibrotide group had veno-occlusive disease by 30 days after HSCT compared with 35 (20%) of 176 controls (risk difference –7.7%, 95% CI –15.3 to –0.1; Z test . . . p = 0.0488 . . .)’.

The 95% confidence intervals for the proportions with veno-occlusive disease were 7% to 17% for the defibrotide group and 14% to 26% for the control group.

5. How do the results of the study compare with the sample size calculations?

Suggested answer
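
When comparing the results with the planning assumptions, it can help to recompute the reported estimates from the raw counts. The sketch below uses the simple large-sample (Wald) interval for a difference between two proportions. That choice is an assumption: the paper's own analysis, a Z test from a competing risk analysis, may have been computed differently. The function name `wald_ci_difference` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci_difference(x1, n1, x2, n2, confidence=0.95):
    """Difference between two proportions (group 1 minus group 2) with a
    large-sample Wald confidence interval.

    This is an illustrative approximation, not necessarily the method
    used in the paper.
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return diff, diff - z * se, diff + z * se


# Counts reported in the results: defibrotide 22/180, control 35/176.
diff, lower, upper = wald_ci_difference(22, 180, 35, 176)

# Difference assumed when the sample size was planned: 15% minus 30%.
planned_diff = 0.15 - 0.30

print(f"observed difference {100 * diff:.1f}% "
      f"(95% CI {100 * lower:.1f}% to {100 * upper:.1f}%)")
print(f"planned difference {100 * planned_diff:.1f}%")
```

With these counts the Wald interval reproduces the reported risk difference and confidence limits to one decimal place, and the planned difference of 15 percentage points can be set beside them.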

In a proposed trial of a health promotion programme, the programme was to be implemented across a whole county. The plan was to use four counties: two allocated to receive the programme and two to act as controls. The programme would be evaluated by a survey of samples of about 750 subjects drawn from the at-risk population in each county. A conventional sample size calculation, which ignored the clustering, had indicated that 1,500 subjects in each treatment group would be required to give 80% power to detect the required difference. The applicants were aware of the problem of cluster randomization and of the need to take it into account in the analysis, e.g. by analysis at the level of the cluster (county). They had an estimate of the intracluster correlation, 0.005, based on a previous study. They argued that this was so small that they could ignore the clustering.

6. Were they correct?

Suggested answer
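
The effect of clustering on the county design can be quantified with the usual design effect, 1 + (m - 1) × ICC, where m is the number of subjects per cluster. The sketch below is an illustration under stated assumptions: equal cluster sizes of exactly 750 and an intracluster correlation of exactly 0.005. The function name `design_effect` is hypothetical.

```python
def design_effect(cluster_size, icc):
    """Design effect for equal-sized clusters: 1 + (m - 1) * ICC.

    It is the factor by which a sample size calculated without regard
    to clustering must be multiplied to keep the same power.
    """
    return 1 + (cluster_size - 1) * icc


# Figures from the proposal: about 750 subjects per county, ICC = 0.005.
deff = design_effect(cluster_size=750, icc=0.005)

# Two counties of 750 give 1,500 subjects per treatment group.
subjects_per_group = 2 * 750

# Number of independent subjects that would carry the same information.
effective_n_per_group = subjects_per_group / deff

print(f"design effect {deff:.3f}")
print(f"effective sample size per group {effective_n_per_group:.0f}")
```

With these inputs the design effect is about 4.7, so the 1,500 subjects per group carry roughly the information of 316 independent subjects. The formula shows why a small intracluster correlation can still matter when clusters are large: the correlation is multiplied by the cluster size minus one.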

Questions taken from Martin Bland, An Introduction to Medical Statistics, OUP, 2015.


This page maintained by Martin Bland.
Last updated: 3 February, 2020.
