We have seen that relative frequencies converge on theoretical probabilities. But how fast? When can we begin to use an observed relative frequency as a reliable estimate of a probability? This chapter gives some answers. They are a little more technical than most of this book. For practical purposes, all you need to know is how to use the three boxed Normal Facts below.
EXPERIMENTAL BELL-SHAPED CURVES
On page 191 we had the result of a coin-tossing experiment. The graph was roughly in the shape of a bell. Many observed distributions have this property.
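You can see such a bell shape emerge for yourself. Here is a minimal simulation sketch (not from the text; the 10,000 repetitions and the 100 tosses per repetition are arbitrary choices):

    # Repeat an experiment of 100 coin tosses many times and tally
    # how often each number of heads comes up. The tallies form a
    # roughly bell-shaped curve centered near 50.
    import random
    from collections import Counter

    counts = Counter()
    for _ in range(10_000):
        heads = sum(random.random() < 0.5 for _ in range(100))
        counts[heads] += 1

    for k in sorted(counts):
        print(k, "#" * (counts[k] // 20))   # crude text histogram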
Example: Incomes. In modern industrialized countries we have come to expect income distributions to look something like Curve 1 on the next page, with a few incredibly rich people at the right end of the graph. But in feudal times there was no middle class, so we would expect an income distribution like Curve 2. It is “bimodal”–it has two peaks.
Example: Errors. We can never measure with complete accuracy. Any sequence of “exact” measurements of the same quantity will show some variation. We often average the results. We can think of this as a sample mean. A good measuring device will produce results that cluster about the mean, with a small sample standard deviation. A bad measuring device gives results that vary wildly from one another, giving a large standard deviation.
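As a sketch of the arithmetic (the five readings below are invented for illustration):

    # Sample mean and sample standard deviation of repeated measurements.
    from statistics import mean, stdev

    readings = [9.98, 10.02, 10.01, 9.97, 10.02]   # hypothetical data
    print("sample mean:", mean(readings))          # estimate of the quantity
    print("sample sd:", stdev(readings))           # small sd = good device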
The most important new idea about probability is the probability that something happens, on condition that something else happens. This is called conditional probability.
CATEGORICAL AND CONDITIONAL
We express probabilities in numbers. Here is a story I read in the newspaper. The old tennis pro Ivan was discussing the probability that the rising young star Stefan would beat the established player Boris in the semifinals. Ivan was set to play Pete in the other semifinal match. He said,
The probability that Stefan will beat Boris is 40%.
Or he could have said,
The chance of Stefan's winning is 0.4.
These are categorical statements, no ifs and buts about them. Ivan might also have this opinion:
Of course I'm going to win my semifinal match, but if I were to lose, then Stefan would not be so scared of meeting me in the finals, and he would play better; there would then be a 50–50 chance that Stefan would beat Boris.
This is the probability of Stefan's winning in his semifinal match, conditional on Ivan losing the other semifinal. We call it the conditional probability. Here are other examples:
Categorical: The probability that there will be a bumper grain crop on the prairies next summer.
Conditional: The probability that there will be a bumper grain crop next summer, given that there has been very heavy snowfall the previous winter. […]
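In the book's notation, the probability of A conditional on B is written Pr(A/B), and it is defined by Pr(A/B) = Pr(A&B)/Pr(B). A minimal numeric sketch of that definition, with invented numbers for the grain-crop example:

    # Conditional probability from the definition Pr(A/B) = Pr(A&B) / Pr(B).
    # Hypothetical numbers: B = heavy snowfall last winter,
    # A = bumper grain crop next summer.
    pr_B = 0.3          # assumed probability of heavy snowfall
    pr_A_and_B = 0.18   # assumed probability of both together
    print(pr_A_and_B / pr_B)   # Pr(A/B) = 0.6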
Inductive logic is risky. We need it when we are uncertain, not just about what will happen or what is true, but also about what to do. Decisions need more than probability. They are based on the value of the possible outcomes of our actions. The technical name for value is utility. This chapter shows how to combine probability and utility. But it ends with a famous paradox.
ACTS
Should you open a small business?
Should you take an umbrella?
Should you buy a Lotto ticket?
Should you move in with someone you love?
In each case you settle on an act. Doing nothing at all counts as an act.
Acts have consequences.
You go broke (or maybe found a great company).
You stay dry when everyone else is sopping wet (or you mislay your umbrella).
You waste a dollar (or perhaps win a fortune).
You live happily ever after (or split up a week later).
You do absolutely nothing active at all: that counts as an act, too. Some consequences are desirable. Some are not. Suppose you can represent the cost or benefit of a possible consequence by a number–so many dollars, perhaps. Call that number the utility of the consequence.
Suppose you can also represent the probability of each possible consequence of an act by a number.
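The standard way to combine the two numbers is the expected value rule: weight each consequence's utility by its probability and add. A sketch with invented figures for the umbrella decision:

    # Expected utility of an act = sum over its possible consequences of
    # (probability of consequence) x (utility of consequence).
    # All numbers below are invented for illustration.
    def expected_utility(outcomes):
        return sum(p * u for p, u in outcomes)

    take_umbrella  = [(0.3, 5), (0.7, -1)]    # rain, dry: carrying costs a little
    leave_umbrella = [(0.3, -10), (0.7, 0)]   # soaked if it rains
    print(expected_utility(take_umbrella))    # 0.8
    print(expected_utility(leave_umbrella))   # -3.0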
Some core connections between the basic rules of probability, frequency-type probability, and statistical stability. These facts provide the foundations for frequency reasoning. The chapter ends by stating Bernoulli's Theorem, one of the most fundamental facts about probability.
Now we move from three chapters using the belief perspective to four chapters using the frequency perspective. Chapters 13 and 14 gave one reason why belief-type probabilities should satisfy the basic rules of probability. Chapter 15 showed how to apply that result to “learning from experience.” Chapters 16–19 do something similar from the frequency perspective.
THE PROGRAM
▪ This chapter describes some deductive connections between probability rules and our intuitions about stable frequencies.
▪ Chapter 17 extends these connections.
▪ Chapter 18 presents one core idea of frequency-type inductive inference–the significance idea.
▪ Chapter 19 presents a second core idea of frequency-type inductive inference–the confidence idea. This idea explains the way opinion polls are now reported. It also explains how we can think of the use of statistics as inductive behavior.
All the results described in this chapter are deductions from the basic rules of probability. The results are only stated, and not proved, because the proofs are more difficult than other material in this book.
BELIEF AND FREQUENCY COMPARED
The basic rules are for any type of probability. Belief-type and frequency-type probabilities emphasize two fundamentally different types of consequences of the basic rules.
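Bernoulli's Theorem itself is only stated later in the chapter, but its content can be previewed by simulation (a sketch; the 0.6 probability and the trial counts are arbitrary): on repeated independent trials, the relative frequency of an outcome very probably lies close to its probability, and ever more probably so as the trials accumulate.

    # Statistical stability: the relative frequency of success settles
    # near the true probability as the number of trials grows.
    import random

    p = 0.6            # arbitrary true probability of success
    successes = 0
    for n in range(1, 10_001):
        successes += random.random() < p
        if n in (10, 100, 1_000, 10_000):
            print(n, "trials: relative frequency =", successes / n)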
Our final example of inductive logic denies that we make inferences at all. Instead, we behave inductively, in such a way that our system for making decisions has good overall consequences for us. This is the theory of confidence intervals.
SAMPLES AND POPULATIONS
In Chapter 2 there was a box of 60 oranges–a population of oranges. We drew 4 oranges at random–a sample. In Chapter 2 we distinguished two forms of argument:
Statement about a population.
So,
Statement about a sample.

and

Statement about a sample.
So,
Statement about a population.
Bernoulli's Theorem, applied to sampling with replacement from an urn, makes a statement about a sample on the basis of knowledge about the population of the urn and the sampling method. It is an example of the first type of argument.
Now we want to go in the other direction. We take a sample. We want to draw a conclusion about a population. A significance test involves one type of reasoning but does not go very far. We often want to use a sample to estimate something about a population. The most familiar type of estimate based on a sample is the opinion poll.
OPINION POLLS
Before we do some inductive logic, we should pause to think realistically about survey sampling. Consider a controversial survey topic.
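To anticipate where the confidence idea leads, here is a sketch of the usual 95% confidence interval for a polled proportion (the sample size of 1,000 and the 48% figure are invented):

    # Standard 95% confidence interval for a sample proportion:
    # p_hat +/- 1.96 * sqrt(p_hat * (1 - p_hat) / n).
    from math import sqrt

    n = 1_000        # hypothetical number of respondents
    p_hat = 0.48     # hypothetical observed proportion
    margin = 1.96 * sqrt(p_hat * (1 - p_hat) / n)
    print(f"{p_hat:.2f} +/- {margin:.3f}")   # about 0.48 +/- 0.031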
This chapter summarizes the rules you have been using for adding and multiplying probabilities, and for using conditional probability. It also gives a pictorial way to understand the rules.
The rules that follow are informal versions of standard axioms for elementary probability theory.
ASSUMPTIONS
The rules stated here take some things for granted:
▪ The rules are for finite groups of propositions (or events).
▪ If A and B are propositions (or events), then so are AvB, A&B, and ∼A.
▪ Elementary deductive logic (or elementary set theory) is taken for granted.
▪ If A and B are logically equivalent, then Pr(A) = Pr(B). [Or, in set theory, if A and B are events that are provably the same set, then Pr(A) = Pr(B).]
NORMALITY
The probability of any proposition or event A lies between 0 and 1.
(1) 0 ≤ Pr(A) ≤ 1
Why the name “normality”? A measure is said to be normalized if it is put on a scale between 0 and 1.
CERTAINTY
An event that is sure to happen has probability 1. A proposition that is certainly true has probability 1.
(2) Pr(certain proposition) = 1
Pr(sure event) = 1
Often the Greek letter Ω is used to represent certainty: Pr(Ω) = 1.
ADDITIVITY
If two events or propositions A and B are mutually exclusive (disjoint, incompatible), the probability that one or the other happens (or is true) is the sum of their probabilities.
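In the notation of the preceding rules, this would be written:

(3) If A and B are mutually exclusive, then Pr(AvB) = Pr(A) + Pr(B)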
Bayes' Rule is central for applications of personal probability. It offers a way to represent rational change in belief, in the light of new evidence.
BAYES' RULE
Bayes' Rule is of very little interest when you are thinking about frequency-type probabilities. It is just a rule. On page 70 we derived it in a few lines from the definition of conditional probability.
For many problems–shock absorbers, tarantulas, children with toxic metal poisoning, taxicabs–it shortens some calculations. That's all.
But Bayes' Rule really does matter to personal probability, or to any other belief-type probability.
Today, belief-type probability approaches are often called “Bayesian.” If you hear a statistician talking about a Bayesian analysis of a problem, it means some version of ideas that we discuss in this chapter. But there are many versions, ranging from personal to logical. An independent-minded Bayesian named I. J. Good (see page 184) figured out that, in theory, there are 46,656 ways to be a Bayesian!
HYPOTHESES
“Hypothesis” is a fancy word, but daily life is full of hypotheses. Most decisions depend upon comparing the evidence for different hypotheses.
Should Albert quit this course? The drop date is tomorrow. He is a marginal student and has done poorly so far. Will the course get harder, as courses often do? (Hypothesis 1) Or will it stay at its present level, where he can pass the course? (Hypothesis 2)
How belief-type probability can be applied to the problem of induction using the idea of learning from experience by Bayes' Rule.
The idea is already present in Chapters 13–15.
▪ We can represent degrees of belief by numbers between 0 and 1.
▪ Degrees of belief represented by these numbers should satisfy the basic laws of probability, on pain of being “incoherent” if they don't.
▪ If they do satisfy these laws, then Bayes' Rule follows.
▪ Hence we can update earlier degrees of belief by new evidence in a coherent, “rational” way.
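In the book's notation, the updating described in the last two points is done by Bayes' Rule: Pr(H/E) = Pr(H) × Pr(E/H) / Pr(E), where Pr(H) is the earlier degree of belief in hypothesis H, and Pr(H/E) is the revised degree of belief once evidence E is learned.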
This evasion of the problem of induction is called Bayesian.
The Bayesian does not claim to be able to justify any given set of degrees of belief as being uniquely rational. He does claim that he can tell you how it is reasonable to change your beliefs in the light of experience.
The Bayesian says to Hume:
Hume, you're right. Given a set of premises, supposed to be all the reasons bearing on a conclusion, you can form any opinion you like.
But you're not addressing the issue that concerns us!
At any point in our grown-up lives (let's leave babies out of this), we have a lot of opinions and various degrees of belief about our opinions. The question is not whether these opinions are “rational.” The question is whether we are reasonable in modifying these opinions in the light of new experience, new evidence.
The idea of probability leads in two different directions: belief and frequency. Probability makes us think of the degree to which we can be confident of something uncertain, given what we know or can find out. Probability also makes us think of the relative frequency with which some outcome occurs on repeated trials on a chance setup.
Thus far we have used both ideas almost interchangeably, because the basic rules for calculating with each are virtually identical. But now we have to distinguish between them, because the philosophical and practical uses of these two ideas are very different. The distinction is essential for the rest of this book (and for all clear thinking about probability).
We have been doing all these calculations about probabilities, and have not said a word about what we mean by “probability” itself. Now we are going to set things right. Up to now it has not mattered a great deal what we mean by the word. From now on it will make all the difference.
This chapter is an example of one kind of philosophy, often called analytic philosophy. We will try to come to grips with different concepts associated with the idea of probability. Many students find this chapter the hardest one of all. Not surprising! The distinctions that we have to make have bedeviled probability theorists–including some of the very best–for more than 200 years.
Sometimes a decision problem can be solved without using probabilities or expected value at all. These are situations in which one strategy dominates all others, no matter what happens in the actual world. This is called dominance. It is illustrated by a very famous argument for acting as if you believed in God. Variations on that argument lead to other decision rules.
The expected value rule can be used only when some probabilities are available. Sometimes we are so ignorant that we are not inclined to talk even of the probabilities of different alternatives. That is the extreme end of uncertainty. Yet there may still be more or less reasonable decisions.
DOMINANCE
It was a dark and foggy night when Peter made his first trip to Holland, the homeland of his parents. His parents gave him enough money to rent a car to see the family, but after that he was practically broke. He was planning to stay with his distant relatives. So he was driving along a road, somewhat lost, when he came to an overhead signpost. Unfortunately, a garbage bag had blown over the front of the sign, obscuring the first three letters of each town. What he saw was:
▪ AVENHAGE ↑
▪ TERDAM →
Peter figured the topmost town must be “'s Gravenhage” (in English, “The Hague,” where the World Court is located). But the second town might be “Amsterdam” or “Rotterdam.” What should he do?
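The chapter resolves this by dominance reasoning. As a sketch of what a dominance check amounts to, in the style of a payoff table (all utilities below are invented for illustration):

    # One act dominates another if it is at least as good in every
    # possible state of the world, and strictly better in at least one.
    # The payoff numbers are purely hypothetical.
    def dominates(act1, act2):
        return (all(u1 >= u2 for u1, u2 in zip(act1, act2))
                and any(u1 > u2 for u1, u2 in zip(act1, act2)))

    # states: the second town is Amsterdam / the second town is Rotterdam
    act_up    = [10, 10]   # hypothetical payoffs for following the up arrow
    act_right = [10, 2]    # hypothetical payoffs for following the right arrow
    print(dominates(act_up, act_right))   # True: act_up dominates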
One of the most useful consequences of the basic rules helps us understand how to make use of new evidence. Bayes' Rule is one key to “learning from experience.”
Chapter 5 ended with several examples of the same form: urns, shock absorbers, weightlifters. The numbers were changed a bit, but the problems in each case were identical.
For example, on page 51 there were two urns A and B, each containing a known proportion of red and green balls. An urn was picked at random. So we knew:
Pr(A) and Pr(B).
Then there was another event R, such as drawing a red ball from an urn. The probability of getting red from urn A was 0.8. The probability of getting red from urn B was 0.4. So we knew:
Pr(R/A) and Pr(R/B).
Then we asked, what is the probability that the urn drawn was A, conditional on drawing a red ball? We asked for:
Pr(A/R) = ?  Pr(B/R) = ?
Chapter 5 solved these problems directly from the definition of conditional probability. There is an easy rule for solving problems like that. It is called Bayes' Rule.
In the urn problem we ask which of two hypotheses is true: Urn A is selected, or Urn B is selected. In general we will represent hypotheses by the letter H.
We perform an experiment or get some evidence: we draw at random and observe a red ball. In general we represent evidence by the letter E.
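Since the urn was picked at random, Pr(A) = Pr(B) = 0.5, and Bayes' Rule packages the arithmetic. A sketch of the computation with the numbers above:

    # Bayes' Rule: Pr(A/R) = Pr(A) * Pr(R/A) / Pr(R),
    # where Pr(R) = Pr(A) * Pr(R/A) + Pr(B) * Pr(R/B).
    pr_A, pr_B = 0.5, 0.5    # urn picked at random
    pr_R_given_A = 0.8       # chance of red from urn A
    pr_R_given_B = 0.4       # chance of red from urn B
    pr_R = pr_A * pr_R_given_A + pr_B * pr_R_given_B
    print(pr_A * pr_R_given_A / pr_R)   # Pr(A/R) = 2/3
    print(pr_B * pr_R_given_B / pr_R)   # Pr(B/R) = 1/3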
There have been two fundamentally different approaches to probability. One emphasizes the frequency idea. The other emphasizes the belief idea.
Some theorists say that only one of those two ideas really matters. We will call them dogmatists. In this book we are eclectic. Here are two definitions taken from a dictionary:
♦ Eclectic. Adjective. 1. (in art, philosophy, etc.) Selecting what seems best from various styles, doctrines, ideas, methods, etc.
♦ Dogmatic. Adjective. 1. Characterized by making authoritative or arrogant assertions or opinions, etc.
FREQUENCY DOGMATISTS
Some experts believe that all inductive reasoning should be analyzed in terms of frequency-type probabilities. This is a dogmatic philosophy, saying that inductive reasoning should rely on exactly one use of probability. Frequency dogmatists often say that belief-type probabilities “have no role in science.”
BELIEF DOGMATISTS
Some experts believe that all inductive reasoning should be analyzed in terms of belief-type probabilities. This is a dogmatic philosophy, saying that inductive reasoning should rely on exactly one use of probability. Belief dogmatists often say that frequency-type probability “doesn't even make sense.”
OUR ECLECTIC APPROACH
Chapters 16–19 and 22 use the frequency idea. Chapters 13–15 and 21 use the belief idea.
Luckily, most (but not all) data and arguments that a frequency dogmatist can analyze, a belief dogmatist can analyze too. And vice versa. Only in rather specialized situations do the two schools of thought draw really different inferences from the same data.
Statistical hypotheses are compared with data, often collected in carefully designed experiments. Evidence may lead us tentatively to accept or reject hypotheses. Evidence can be good or bad; it can be more, or less, convincing. When is it significant? What are the underlying ideas about accepting and rejecting hypotheses? This chapter introduces two fundamentally different ways of thinking about these issues, both of which are deeply entrenched in statistical practice. One idea is that of significance tests. Another is the power of a test to detect false hypotheses.
ASTROLOGY
Four members of this class went for coffee after the first meeting. Two of them had the same astrological sign (of the zodiac). There are 12 signs of the zodiac. Is this significant? Were they fated to meet?
We need to consider plausible models and ask: how likely is it that this phenomenon would occur by chance alone?
Theoretical probability model: each person is assigned a sign by a chance setup with equal probability for each sign–just as if you drew your sign from a pack of twelve different cards, say a pack of all the clubs except the ace.
Think of a deck of cards with the aces removed, leaving 4 suits of 12 different cards each (analogous to the 12 signs and the 4 people). If we select at random a card from each suit, what is the probability that we get at least two cards that match in value?
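On that model the question has an exact answer, checked here both ways (a sketch; the simulation size is arbitrary):

    # Probability that at least two of 4 people share one of 12 signs,
    # assuming each sign is equally likely and signs are independent.
    import random

    # Exact: 1 minus the probability that all four signs differ.
    exact = 1 - (12 * 11 * 10 * 9) / 12**4
    print(exact)   # about 0.427

    # Simulation check of the same model.
    trials = 100_000
    hits = sum(len(set(random.randrange(12) for _ in range(4))) < 4
               for _ in range(trials))
    print(hits / trials)   # also about 0.43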