# Bayesian Statistics

## Preface to the First Edition

When I first learned a little statistics, I felt confused, and others I spoke to confessed that they had similar feelings. Not because the mathematics was difficult—most of that was a lot easier than pure mathematics—but because I found it difficult to follow the logic by which inferences were arrived at from data. It sounded as if the statement that a null hypothesis was rejected at the 5% level meant that there was only a 5% chance that that hypothesis was true, and yet the books warned me that this was not a permissible interpretation. Similarly, the statement that a 95% confidence interval for an unknown parameter ran from -2 to +2 sounded as if the parameter lay in that interval with 95% probability, and yet I was warned that all I could say was that if I carried out similar procedures time after time then the unknown parameters would lie in the confidence intervals I constructed 95% of the time. It appeared that the books I looked at were not answering the questions that would naturally occur to a beginner, and that instead they answered some rather recondite questions which no-one was likely to want to ask.
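The frequentist reading described above—that the 95% refers to the long-run behaviour of the procedure, not to any one interval—can be illustrated by a small simulation. The following sketch is purely illustrative (it is not from the text, and the numbers chosen are arbitrary): it repeatedly draws samples from a normal distribution with known variance, forms the usual z-interval each time, and counts how often the random interval happens to cover the fixed true mean.

```python
# Illustrative sketch (hypothetical example, not from the text):
# the long-run coverage of a 95% z-interval for a normal mean.
import math
import random

random.seed(1)
true_mean, sigma, n, trials = 0.0, 1.0, 25, 10_000
covered = 0
for _ in range(trials):
    sample = [random.gauss(true_mean, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    half_width = 1.96 * sigma / math.sqrt(n)  # known-variance z interval
    if xbar - half_width <= true_mean <= xbar + half_width:
        covered += 1

print(covered / trials)  # close to 0.95
```

The point of the exercise is precisely the one the paragraph makes: the probability statement attaches to the ensemble of intervals, and the classical theory declines to say anything probabilistic about the particular interval actually computed.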

Subsequently, I discovered that the whole theory had been worked out in very considerable detail in such books as Lehmann (1959 and 1986). But attempts such as those that Lehmann describes to put everything on a firm foundation raised even more questions. I gathered that the usual t test could be justified as a procedure that was “uniformly most powerful unbiased”, but I could only marvel at the ingenuity that led to the invention of such criteria for the justification of the procedure, while remaining unconvinced that they had anything sensible to say about a general theory of statistical inference. Of course, Lehmann and others with an equal degree of common sense were capable of developing more and more complicated constructions and exceptions so as to build up a theory that appeared to cover most problems without doing anything obviously silly, and yet the whole enterprise seemed reminiscent of the construction of epicycle upon epicycle in order to preserve a theory of planetary motion based on circular motion; there seemed to be an awful lot of “adhockery”.

I went on to discover that Bayesian theory could lead to the sorts of conclusions that I had naïvely expected to get from statistics when I first learned about it. Indeed, some lecture notes of Lindley’s [and subsequently his book, Lindley (1965)] and the pioneering book by Jeffreys (1939, 1948 and 1961) showed that if the statistician had “personal probabilities” that were of a certain conventional type then conclusions very like those in the elementary books I had first looked at could be arrived at, with the difference that a 95% interval really did mean an interval in which the statistician was justified in thinking that there was a 95% probability of finding the unknown parameter. On the other hand, there was the further freedom to adopt other initial choices of personal beliefs and thus to arrive at different conclusions.

Over a number of years I taught the standard, classical, theory of statistics to a large number of students, most of whom appeared to have similar difficulties to those I had myself encountered in understanding the nature of the conclusions that this theory comes to. However, the mere fact that students have difficulty with a theory does not prove it wrong. More importantly, I found that the theory did not improve with better acquaintance, and I went on studying Bayesian theory. It turned out that there were real differences in the conclusions arrived at by classical and Bayesian statisticians, and so the former was not just a special case of the latter corresponding to a conventional choice of prior beliefs. On the contrary, there was a strong disagreement between statisticians as to the conclusions to be arrived at in certain standard situations, of which I will cite three examples for now. One concerns a test of a sharp null hypothesis (for example a test that the mean of a distribution is exactly equal to zero), especially when the sample size is large. A second concerns the Behrens-Fisher problem, that is, the inferences that can be made about the difference between the means of two populations when their variances are not assumed equal. Another is the likelihood principle, which asserts that you can only take account of the probability of events that have actually occurred under various hypotheses, and not of events that might have happened but did not; this principle follows from Bayesian statistics and is contradicted by the classical theory. A particular case concerns the relevance of stopping rules, that is to say whether or not you are entitled to take into account the fact that the experimenter decided when to stop experimenting depending on the results so far available rather than having decided to use a fixed sample size all along.
The more I thought about all these controversies, the more I was convinced that the Bayesians were right on these disputed issues.
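The stopping-rule dispute can be made concrete with a standard textbook example (a hypothetical illustration, not drawn from the text): suppose 9 heads are observed in 12 tosses of a coin, and we test the null hypothesis that the coin is fair. If the experimenter had fixed the number of tosses at 12, the classical one-sided p-value comes from the binomial distribution; if instead the experimenter had resolved to toss until the third tail appeared, it comes from the negative binomial. The likelihood of the observed data is the same in both cases, yet the p-values differ.

```python
# Hypothetical illustration of the stopping-rule problem:
# same data (9 heads, 3 tails), two sampling plans, two p-values.
from math import comb

p = 0.5  # null hypothesis: the coin is fair

# Plan A -- fixed n = 12 tosses: P(at least 9 heads).
p_binom = sum(comb(12, k) * p**12 for k in range(9, 13))

# Plan B -- toss until the 3rd tail: P(at least 9 heads appear first),
# via the complement of the negative-binomial mass function.
p_negbin = 1 - sum(comb(k + 2, 2) * p**(k + 3) for k in range(9))

print(round(p_binom, 4), round(p_negbin, 4))  # 0.073 vs 0.0327
```

At the conventional 5% level the two plans lead to opposite decisions from identical data, which is exactly the state of affairs the likelihood principle objects to; a Bayesian analysis, depending only on the likelihood, is unaffected by the stopping rule.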

At long last, I decided to teach a third-year course on Bayesian statistics at the University of York, which I have now done for a few years. Most of the students who took the course did find the theory more coherent than the classical theory they had learned in the first course on mathematical statistics they had taken in their second year, and it became yet clearer in my own mind that this was the right way to view statistics. I do, however, admit that there are topics (such as nonparametric statistics) which are difficult to fit into a Bayesian framework.