*Probabilistic Reasoning in Intelligent Systems: revised edn*. 1991
*Causality*. 2000
*Causal Inference in Statistics*. 2016, with Madelyn Glymour, Nicholas P. Jewell
*The Book of Why*. 2018, with Dana Mackenzie

PROBABILISTIC REASONING IN INTELLIGENT SYSTEMS is a complete and accessible account
of the theoretical foundations and computational methods
that underlie plausible reasoning under uncertainty.
The author provides a coherent explication of probability
as a language for reasoning with partial belief
and offers a unifying perspective on other AI approaches to uncertainty,
such as the Dempster-Shafer formalism, truth maintenance systems, and nonmonotonic logic.

The author distinguishes syntactic and semantic approaches to uncertainty — and offers techniques, based on belief networks, that provide a mechanism for making semantics-based systems operational. Specifically, network propagation techniques serve as a mechanism for combining the theoretical coherence of probability theory with modern demands of reasoning systems technology: modular declarative inputs, conceptually meaningful inferences, and parallel distributed computation. Application areas include diagnosis, forecasting, image interpretation, multi-sensor fusion, decision support systems, plan recognition, planning, speech recognition — in short, almost every task requiring that conclusions be drawn from uncertain clues and incomplete information.

PROBABILISTIC REASONING IN INTELLIGENT SYSTEMS will be of special interest to scholars and researchers in AI, decision theory, statistics, logic, philosophy, cognitive psychology, and the management sciences. Professionals in the areas of knowledge-based systems, operations research, engineering, and statistics will find theoretical and computational tools of immediate practical use. The book can also be used as an excellent text for graduate-level courses in AI, operations research, or applied probability.

Second printing (1991) includes expanded BIBLIOGRAPHICAL AND HISTORICAL REMARKS sections for each chapter and updated references throughout.

Causality is central to the understanding and use of data.
Without an understanding of cause-effect relationships,
we cannot use data to answer questions as basic as
“Does this treatment harm or help patients?”
But though hundreds of introductory texts are available
on statistical methods of data analysis, until now,
no beginner-level book has been written about
the exploding arsenal of methods that can tease causal information from data.

*Causal Inference in Statistics* fills that gap.
Using simple examples and plain language,
the book lays out how to define causal parameters;
the assumptions necessary to estimate causal parameters in a variety of situations;
how to express those assumptions mathematically;
whether those assumptions have testable implications;
how to predict the effects of interventions; and how to reason counterfactually.
These are the foundational tools that any student of statistics
needs to acquire in order to use statistical methods to answer causal questions of interest.

This book is accessible to anyone with an interest in interpreting data, from undergraduates and professors to researchers and the interested layperson. Examples are drawn from a wide variety of fields, including medicine, public policy, and law; a brief introduction to probability and statistics is provided for the uninitiated; and each chapter comes with study questions to reinforce the reader's understanding.

We have all heard the old saying “correlation is not causation”. This is a problem for statistics, since all it can measure is correlation. Pearl here argues that this is because statisticians are restricting themselves too much, and that it is possible to do more. There is no magic; to get this more, you have to add something into the system, but that something is very reasonable: a causal model.

He organises his argument using the three-runged “ladder of causation”.
On the bottom rung is pure statistics, reasoning about *observations*:
what is the probability of recovery, found by observing these people who have taken a drug?
The second rung allows reasoning about *interventions*:
what is the probability of recovery if I were to give these other people the drug?
And the top rung includes reasoning about *counterfactuals*:
what would have happened if that person had not received the drug?

Intervention (rung 2) is different from observation alone (rung 1)
because the observations may be (almost certainly are) of a biased group:
observing only those who took the drug for whatever reason,
maybe because they were already sick in a particular hospital,
or because they were rich enough to afford it, or some other confounding variable.
The intervention, however, is a different case: people are specifically given the drug.
The purely statistical way of moving up to rung 2 is to run a *randomised controlled trial* (RCT),
to remove the effect of confounding variables,
and thereby to make the observed results the same as the results from intervention.
The RCT is often known as the “gold standard” for experimental research for this reason.
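The rung 1 / rung 2 gap can be sketched in a toy simulation. The specifics here are invented for illustration: wealth is the hidden confounder, rich patients are both more likely to take the drug and more likely to recover anyway, and the drug itself adds a flat +0.2 to everyone's recovery probability.

```python
import random

def simulate(n, randomized, seed=0):
    """Simulate patients with a hidden confounder (wealth).
    Wealth raises both the chance of taking the drug (in the
    observational setting) and the baseline chance of recovery;
    the drug itself adds a fixed +0.2 to recovery probability."""
    rng = random.Random(seed)
    took, recovered = [], []
    for _ in range(n):
        rich = rng.random() < 0.5
        if randomized:                           # rung 2: we assign the drug
            drug = rng.random() < 0.5
        else:                                    # rung 1: patients self-select
            drug = rng.random() < (0.8 if rich else 0.2)
        base = 0.7 if rich else 0.3
        p_recover = min(1.0, base + (0.2 if drug else 0.0))
        took.append(drug)
        recovered.append(rng.random() < p_recover)
    return took, recovered

def p_recover_given_drug(took, recovered):
    # Fraction of drug-takers who recovered.
    taken = [r for t, r in zip(took, recovered) if t]
    return sum(taken) / len(taken)

obs = p_recover_given_drug(*simulate(100_000, randomized=False))
rct = p_recover_given_drug(*simulate(100_000, randomized=True))
print(f"observational P(recover | drug)    ~ {obs:.2f}")  # inflated by confounding
print(f"RCT           P(recover | do(drug)) ~ {rct:.2f}")  # true causal effect ~ 0.70
```

The observational figure comes out higher than the RCT figure, not because the drug works better than it does, but because the drug-takers were disproportionately the rich patients who would have recovered anyway.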

But here’s the thing: what is a confounding variable, and what is not?
In order to know what to control for, and what to ignore,
the experimenter has to have some kind of implicit *causal model* in their head.
It has to be implicit, because statisticians are not allowed to talk about causality!
Yet it must exist to some degree, otherwise how do we even know which variables to measure,
let alone control for?
Pearl argues to make this causal model *explicit*, and use it in the experimental design.
Then, with respect to this now explicit causal model,
it is possible to reason about results more powerfully.
(He does not address how to *discover* this model:
that is a different part of the scientific process, of modelling the world.
However, observations can be used to *test* the model to some degree:
some models are simply too causally strong to support the observed situation.)

Pearl uses this framework to show how and why the RCT works.
More importantly, he also shows that it is possible to reason about interventions
sometimes from observations alone (hence data mining pure observations becomes more powerful),
or sometimes with fewer controlled variables, without the need for a full RCT.
This is extremely useful, since there are many cases where RCTs
are unethical, impractical, or too expensive.
RCTs are not the “gold standard” after all;
they are basically a dumb sledgehammer approach.
He also shows how to use the causal model to calculate which variables do need to be controlled for,
and how controlling for certain variables is precisely the *wrong* thing to do.
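The workhorse here is Pearl's adjustment formula: if the right variables are recorded, the interventional probability can be computed from purely observational data as P(recover | do(drug)) = Σ_z P(recover | drug, z) · P(z). A minimal sketch, reusing the same invented toy world where wealth is the only confounder and (unlike before) happens to be observed:

```python
import random

def simulate_observational(n, seed=1):
    # Same toy world as before: wealth confounds drug-taking and recovery,
    # but this time wealth is recorded alongside each patient.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        rich = rng.random() < 0.5
        drug = rng.random() < (0.8 if rich else 0.2)
        p = min(1.0, (0.7 if rich else 0.3) + (0.2 if drug else 0.0))
        data.append((rich, drug, rng.random() < p))
    return data

def adjusted_effect(data):
    """Adjustment formula:
       P(recover | do(drug)) = sum_z P(recover | drug, z) * P(z)."""
    total = 0.0
    for z in (True, False):
        stratum = [(d, r) for (zz, d, r) in data if zz == z]
        p_z = len(stratum) / len(data)
        treated = [r for (d, r) in stratum if d]
        total += (sum(treated) / len(treated)) * p_z
    return total

data = simulate_observational(200_000)
est = adjusted_effect(data)
print(f"adjusted estimate from observations alone ~ {est:.2f}")  # ~ 0.70
```

No one in this dataset was ever assigned the drug, yet stratifying on the confounder and reweighting by its marginal distribution recovers the same ~0.70 that an RCT would measure.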

Using such causal models also allows us to ascend to the third rung: reasoning about counterfactuals, where experiments are in principle impossible. This gives us power to reason about different worlds: What’s the probability that Fred would have died from lung cancer if he hadn’t smoked? What’s the probability that a heat wave would have happened with less CO2 in the atmosphere?
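Mechanically, a counterfactual query runs in three steps over a structural causal model: abduction (infer the unit's unobserved background factors from what actually happened), action (surgically set the variable of interest), and prediction (run the model forward again). A minimal sketch with an invented linear mechanism Y := 2X + U, standing in for "outcome given treatment plus background factors":

```python
# Toy structural causal model: Y := 2*X + U,
# where U is an unobserved background factor specific to each unit.
def f_y(x, u):
    return 2 * x + u

# What actually happened: this unit had X = 1 and we saw Y = 5.
x_obs, y_obs = 1, 5

# Step 1 (abduction): infer this unit's background factor from the observation.
u = y_obs - 2 * x_obs          # U = 3 for this particular unit

# Step 2 (action): override the mechanism for X — "had not received the drug".
x_cf = 0

# Step 3 (prediction): rerun the model with the abduced U held fixed.
y_cf = f_y(x_cf, u)
print(y_cf)                    # 3 — what Y *would have been* for this same unit
```

The key move is that U is carried over from the actual world into the imagined one, which is what makes this a statement about *this* unit rather than about the population average.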

[p51]
probabilities encode our beliefs about a static world,
causality tells us whether and how probabilities change when the world changes, be it by intervention or by act of imagination.

This is a very nicely written book,
with many real world examples.
The historical detail included shows how and why statisticians neglected causality.
It is not always an easy read – the concepts are quite intricate in places –
but it is a crucially *important* read.
We should never again bow down to “correlation is not causation”:
we now know how to discover when it *is*.