Question 5: How could we use a transformation for these data? What problems would be caused by the subject who had zero arm movements recorded?
The figure below shows the data after a logarithmic transformation:
The zero observation causes a problem for the log transformation, as zero has no logarithm. We have set this to the log of half the next largest observation (4), which is a bit arbitrary but better than making it a missing value. Another possibility would be to add a small number, such as one, to every observation before transforming. The log transformation corrects the skewness. However, the variance in the CHD cases is now less than in the controls, so the transformation has over-corrected.
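The two ways of handling a zero described above can be sketched in Python. This is only an illustration: the function names are made up for this example, the counts are hypothetical and are not the exercise data, and the first function takes "the next largest observation" to mean the smallest non-zero value, which is an assumption about the wording.

```python
import math

def log_with_zero_fix(values):
    """Natural log, with each zero replaced by half the smallest positive value."""
    if any(v < 0 for v in values):
        raise ValueError("counts must be non-negative")
    positives = [v for v in values if v > 0]
    if not positives:
        raise ValueError("need at least one positive value")
    substitute = min(positives) / 2  # assumed reading of the rule in the text
    return [math.log(v if v > 0 else substitute) for v in values]

def log_plus_one(values):
    """Natural log after adding one to every observation."""
    if any(v < 0 for v in values):
        raise ValueError("counts must be non-negative")
    return [math.log(v + 1) for v in values]

# Hypothetical movement counts, NOT the data from the exercise.
counts = [0, 8, 15, 40, 120]
fixed = log_with_zero_fix(counts)
shifted = log_plus_one(counts)
print(fixed)
print(shifted)
```

Both approaches keep the subject in the analysis. The first changes only the zero observation, while the second shifts every observation by the same amount before taking logs.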
The figure below shows the data after a square root transformation:
The square root transformation looks better. The variable is a count of the number of movements in a fixed time, and therefore the square root transformation is the first transformation we would try.
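One way to see why the square root suits counts is a rough approximation, assuming (this is not stated in the exercise) that the count X behaves roughly like a Poisson variable with mean μ, so that its variance is also μ. The usual first-order (delta method) approximation then gives:

```latex
\operatorname{Var}\!\left(\sqrt{X}\right)
  \approx \left(\frac{d\sqrt{x}}{dx}\Big|_{x=\mu}\right)^{2} \operatorname{Var}(X)
  = \left(\frac{1}{2\sqrt{\mu}}\right)^{2} \mu
  = \frac{1}{4}
```

Under that assumption the variance on the square root scale no longer depends on the mean, which is what we want when comparing groups with different means.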
The P value from the two-sample t test using the square root transformation is slightly smaller than that for the log transformation (untransformed: P=0.0134, log: P=0.0071, square root: P=0.0066). Making the data fit the assumptions of the method better usually reduces the P value.
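The comparison across scales can be sketched as follows. This is a minimal illustration using only the Python standard library: it computes the pooled-variance two-sample t statistic on the raw, log and square root scales, but not the P value, which would need the t distribution from a statistics package. The function name and the two groups of counts are hypothetical, so the results will not reproduce the P values quoted above.

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample t statistic using the pooled (equal variance) estimate."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se

# Hypothetical positive counts, NOT the data from the exercise.
cases = [4, 9, 12, 20, 35, 60]
controls = [15, 30, 42, 70, 110, 250]

scales = {
    "untransformed": lambda x: x,
    "log": math.log,
    "square root": math.sqrt,
}

results = {}
for name, transform in scales.items():
    t = pooled_t([transform(x) for x in cases], [transform(x) for x in controls])
    results[name] = t
    print(f"{name}: t = {t:.3f} on {len(cases) + len(controls) - 2} d.f.")
```

The same test is applied each time; only the scale of measurement changes, so any difference in the t statistic comes from how well each scale meets the assumptions of Normal distributions and equal variances.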
This page maintained by Martin Bland.
Last updated: 27 July, 2009.