We are constantly bombarded by statistics: a £million here, a £billion there, a 2% rise here, a 1% fall there. What does it all mean? Some people implicitly believe all the reported conclusions, others take "lies, damned lies, and statistics" as their mantra, and believe none of it. This little book, on the other hand, aims for the middle ground, and suggests ways to help evaluate the numbers. It is arranged around a series of chapters covering a wide variety of topics: "how big are those £billions really?" (divide annual budgets by 3 billion to get pounds per person per week: lots of money divided by a lot of people makes some of those quoted billions remarkably small on a personal level); counting (when is an assault not an assault; why no one has ever escaped from a Finnish open prison; why school league tables are virtually meaningless); chance clusters (randomness is clumpy, not smoothly distributed, so a cluster of events doesn't necessarily imply a special cause); reporting the outliers or extreme values as "could be up to xxx!!!" (so add a mental rejoinder of "but probably won't be"); how a few extreme values can skew averages (most people earn quite a bit less than the average wage; nearly everyone has more than the average number of feet); the assumption that correlation implies causation (it may instead be a symptom, or both may be caused by a third factor, or something else); and more.
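The "divide by 3 billion" rule of thumb is easy to check for yourself. Here is a minimal sketch in Python, assuming a UK population of roughly 60 million and 52 weeks in a year; both figures, and the function name, are my own assumptions for illustration rather than anything taken from the book.

```python
# Assumed round figures (not from the book): about 60 million people, 52 weeks a year.
UK_POPULATION = 60_000_000
WEEKS_PER_YEAR = 52

# The divisor behind the "divide by 3 billion" rule of thumb.
PERSON_WEEKS = UK_POPULATION * WEEKS_PER_YEAR  # 3,120,000,000


def pounds_per_person_per_week(annual_budget_pounds):
    """Spread an annual budget over every person and every week of the year."""
    return annual_budget_pounds / PERSON_WEEKS


if __name__ == "__main__":
    for budget in (1_000_000, 1_000_000_000, 100_000_000_000):
        weekly = pounds_per_person_per_week(budget)
        print(f"£{budget:,} a year is about £{weekly:.4f} per person per week")
```

Under these assumptions a quoted £1 billion a year works out at roughly 32p per person per week, which is the sort of deflation the chapter has in mind.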
Here's an example application. As I was driving home the day after reading this book, I saw a sign that said "up to one fifth of all road accidents are caused by tired drivers." Being a natural cynic, I must admit that my immediate reaction on seeing it was: hmm, so, four fifths are caused by drivers who aren't tired? But as this book teaches, to evaluate that number, we need to know what proportion of all drivers are tired (if it's a minuscule proportion, then this is serious, with a small minority of drivers involved with a big problem; if it's one fifth, then it might be irrelevant; and if it's most drivers, then it might seem to be worth driving tired!). Additionally, we need to check that like is being compared with like, so what constitutes "tired" (where on the spectrum from yawning a couple of times, to falling asleep at the wheel, and is it the same in both sets?), and what constitutes "road accident" (where on the spectrum from a slight fender-bender in a traffic queue, to a fatal accident, and is it the same in both sets?). And what about that automatic assumption of correlation being causality: was it the driver's tiredness that caused the accident? Maybe it was whatever they were doing to counteract their tiredness, or maybe it's just that a fifth of all drivers are tired? I don't know the answers to any of these: I would need to do more research to find out.
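The point about needing the proportion of tired drivers can be made concrete with a little arithmetic. The sketch below, in Python, compares the accident rate per tired driver with the rate per non-tired driver. The 1%, 20% and 60% proportions are made-up illustrations, not real data, and the calculation assumes (unrealistically) that "tired" and "accident" are defined the same way in both groups and that each driver spends about the same time on the road.

```python
def relative_risk(share_of_accidents_tired, share_of_drivers_tired):
    """Accident rate per tired driver divided by the rate per non-tired driver.

    Both arguments are fractions strictly between 0 and 1.
    A result above 1 means tired drivers are over-represented in accidents;
    a result of 1 means tiredness makes no apparent difference.
    """
    rate_tired = share_of_accidents_tired / share_of_drivers_tired
    rate_not_tired = (1 - share_of_accidents_tired) / (1 - share_of_drivers_tired)
    return rate_tired / rate_not_tired


if __name__ == "__main__":
    accidents_tired = 0.20  # the "one fifth" on the road sign
    # Hypothetical proportions of drivers who are tired (illustrative only).
    for drivers_tired in (0.01, 0.20, 0.60):
        ratio = relative_risk(accidents_tired, drivers_tired)
        print(f"{drivers_tired:.0%} of drivers tired -> relative risk {ratio:.2f}")
```

With these invented inputs, the same "one fifth" headline corresponds to tired drivers being about 25 times as risky, exactly as risky, or about a sixth as risky as everyone else. Even then, a high ratio shows only an association, not that tiredness caused the accidents.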
This is a good book: it gives a thoughtful account of how difficult it is to get good statistics in the first place, and the horrors that can result from basing policy and targets on poor statistics and single measures. (I would have liked some more pictures and diagrams to help illustrate some of the points being made: maybe their paucity is a result of this book's genesis as a radio programme?) Anyhow, applying its principles should help you make better judgements on the statistics reported in the news. Of course, some of those better judgements will be relatively dull: that it's not as exciting as the screaming headline implies, or (more likely) that there is insufficient information given for you to make a reasoned evaluation (although there is a short section on how to get rough estimates of some of the missing data). So beware. If you take this to heart, you won't ever be able to look at news reports in the same way again. Ever since I read Darrell Huff's How to Lie with Statistics, particularly the part on misleading graphs, and later Edward Tufte's elegant series on how to do graphs right, I can't look at any graphic in the news without muttering something like "misleading axes", and mentally redrawing it. This could affect your reaction to numbers in the same way!