In this chapter, we provide a number of probability problems that challenge the reader to test his or her feeling for probabilities. As stated in the introduction, it is possible to fall wide of the mark when using intuitive reasoning to calculate a probability, or to estimate the order of magnitude of a probability. To find out how you fare in this regard, it may be useful to try one or more of these twelve problems. They are playful in nature but are also illustrative of the surprises one can encounter in the solving of practical probability problems. Think carefully about each question before looking up its solution. Solving probability problems usually requires creative thinking, more than technical skills. All of the solutions to the probability questions posed in this chapter can be found scattered throughout the ensuing chapters.
Question 1. A birthday problem (§3.1, §4.2.3)
You go with a friend to a football (soccer) game. The game involves the 22 players of the two teams and one referee. Your friend wagers that, among these 23 persons on the field, at least two will have birthdays on the same day. You will receive ten dollars from your friend if this is not the case. How much money should you pay your friend if he is right, in order for the wager to be a fair one?
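The wager can be checked numerically. The sketch below computes the probability of a shared birthday among 23 people (ignoring leap years and assuming all 365 birthdays are equally likely and independent) and the payout that makes both sides' expected gains equal:

```python
def prob_no_shared_birthday(n: int) -> float:
    """Probability that n people all have distinct birthdays."""
    p = 1.0
    for i in range(n):
        p *= (365 - i) / 365
    return p

p_no_match = prob_no_shared_birthday(23)
p_match = 1 - p_no_match

# A fair wager equalizes the expected gains of both parties: you receive
# $10 with probability p_no_match and pay x with probability p_match, so
# 10 * p_no_match = x * p_match.
fair_payout = 10 * p_no_match / p_match

print(f"P(at least one shared birthday) = {p_match:.4f}")
print(f"Fair payout to your friend: ${fair_payout:.2f}")
```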
Question 2. Probability of winning streaks (§2.1.3, §5.10.1)
A basketball player has a 50% success rate in free throw shots.
How does one calculate the probability of throwing heads more than fifteen times in 25 tosses of a fair coin? What is the probability of winning a lottery prize? Is it exceptional for a city that averages eight serious fires per year to experience twelve serious fires in one particular year? These kinds of questions can be answered with the probability distributions that we will be looking at in this chapter. These are the binomial distribution, the Poisson distribution, and the hypergeometric distribution. A basic knowledge of these distributions is essential in the study of probability theory. This chapter gives insight into the different types of problems to which these probability distributions can be applied. The binomial model refers to a series of independent trials of an experiment that has two possible outcomes. Such an elementary experiment is also known as a Bernoulli experiment, after the famous Swiss mathematician Jakob Bernoulli (1654–1705). In most cases, the two possible outcomes of a Bernoulli experiment will be specified as “success” or “failure.” Many probability problems boil down to determining the probability distribution of the total number of successes in a series of independent trials of a Bernoulli experiment. The Poisson distribution is another important distribution and is used, in particular, to model the occurrence of rare events. When you know the expected value of a Poisson distribution, you know enough to calculate all of the probabilities of that distribution.
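The two opening questions can be answered directly from the standard binomial and Poisson formulas, as the following sketch shows:

```python
import math

# Binomial: probability of more than fifteen heads in 25 tosses of a fair
# coin, P(X > 15) = sum over k = 16..25 of C(25, k) * (1/2)^25.
p_heads = sum(math.comb(25, k) for k in range(16, 26)) / 2**25

# Poisson: a city averages eight serious fires per year; probability of
# exactly twelve fires in a given year, P(N = 12) = e^(-8) * 8^12 / 12!.
p_fires = math.exp(-8) * 8**12 / math.factorial(12)

print(f"P(more than 15 heads in 25 tosses) = {p_heads:.4f}")  # about 0.115
print(f"P(exactly 12 fires | mean 8)       = {p_fires:.4f}")  # about 0.048
```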
Why do so many students find probability difficult? Could it be the way the subject is taught in so many textbooks? When I was a student, a class in topology made a great impression on me. The teacher asked us not to take notes during the first hour of his lectures. In that hour, he explained ideas and concepts from topology in a non-rigorous, intuitive way. All we had to do was listen in order to grasp the concepts being introduced. In the second hour of the lecture, the material from the first hour was treated in a mathematically rigorous way and the students were allowed to take notes. I learned a lot from this approach of interweaving intuition and formal mathematics.
This book is written very much in the same spirit. It first helps you develop a “feel for probabilities” before presenting the more formal mathematics. The book is not written in a theorem–proof style. Instead, it aims to teach the novice the concepts of probability through the use of motivating and insightful examples. No mathematics are introduced without specific examples and applications to motivate the theory. Instruction is driven by the need to answer questions about probability problems that are drawn from real-world contexts. The book is organized into two parts. Part One is informal, using many thought-provoking examples and problems from the real world to help the reader understand what probability really means. Probability can be fun and engaging, but this beautiful branch of mathematics is also indispensable to modern science.
In Chapter 8, conditional probabilities are introduced by conditioning upon the occurrence of an event B of nonzero probability. In applications, this event B is often of the form Y = b for a discrete random variable Y. However, when the random variable Y is continuous, the condition Y = b has probability zero for any number b. The purpose of this chapter is to develop techniques for handling a condition provided by the observed value of a continuous random variable. We will see that the conditional probability density function of X given Y = b for continuous random variables is analogous to the conditional probability mass function of X given Y = b for discrete random variables. The conditional distribution of X given Y = b enables us to define the natural concept of conditional expectation of X given Y = b. This concept allows for an intuitive understanding and is of utmost importance. In statistical applications, it is often more convenient to work with conditional expectations instead of the correlation coefficient when measuring the strength of the relationship between two dependent random variables. In applied probability problems, the computation of the expected value of a random variable X is often greatly simplified by conditioning on an appropriately chosen random variable Y. Learning the value of Y provides additional information about the random variable X and for that reason the computation of the conditional expectation of X given Y = b is often simple.
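The central definitions just described can be previewed in formulas. For continuous random variables X and Y with joint density f(x, y) and marginal density f_Y(y) > 0:

```latex
% Conditional probability density of X given Y = y:
f_{X \mid Y}(x \mid y) = \frac{f(x, y)}{f_Y(y)}.

% Conditional expectation of X given Y = y:
E[X \mid Y = y] = \int_{-\infty}^{\infty} x \, f_{X \mid Y}(x \mid y) \, dx.

% Computing E[X] by conditioning on an appropriately chosen Y:
E[X] = \int_{-\infty}^{\infty} E[X \mid Y = y] \, f_Y(y) \, dy.
```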
Many random phenomena happen in continuous time. Examples include the occurrence of cell-phone calls, the spread of epidemic diseases, and stock-price fluctuations. A continuous-time Markov chain is a very useful stochastic process for modeling such phenomena. It is a process that moves from state to state according to a Markov chain, but in which the times between state transitions are continuous random variables having an exponential distribution.
The purpose of this chapter is to give an elementary introduction to continuous-time Markov chains. The basic concept of the continuous-time Markov chain model is the so-called transition rate function. Several examples will be given to illustrate this basic concept. Next we discuss the time-dependent behavior of the process and give Kolmogorov's differential equations to compute the time-dependent state probabilities. Finally, we present the flow-rate-equation method to compute the limiting state probabilities and illustrate this powerful method with several examples dealing with queueing systems.
Markov chain model
A continuous-time stochastic process {X(t), t ≥ 0} is a collection of random variables indexed by a continuous time parameter t ∈ [0, ∞), where the random variable X(t) is called the state of the process at time t. In an inventory problem X(t) might be the stock on hand at time t and in a queueing problem X(t) might be the number of customers present at time t. The formal definition of a continuous-time Markov chain is a natural extension of the definition of a discrete-time Markov chain.
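As a concrete illustration, the sketch below simulates a two-state continuous-time Markov chain: a machine that alternates between states "up" and "down". The rates used are illustrative assumptions (failure rate 1, repair rate 4), not values from the text; the simulated long-run fraction of time spent up is compared with the value 4/(1 + 4) = 0.8 that the theory of this chapter yields.

```python
import random

random.seed(42)

FAILURE_RATE = 1.0  # rate of leaving state "up"   (assumed)
REPAIR_RATE = 4.0   # rate of leaving state "down" (assumed)

up_time = 0.0
total_time = 0.0
for _ in range(100_000):  # simulate many up/down cycles
    t_up = random.expovariate(FAILURE_RATE)   # exponential sojourn in "up"
    t_down = random.expovariate(REPAIR_RATE)  # exponential sojourn in "down"
    up_time += t_up
    total_time += t_up + t_down

frac_up = up_time / total_time
print(f"simulated fraction of time up = {frac_up:.3f} (theory: 0.800)")
```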
In previous chapters we have dealt with sequences of independent random variables. However, many random systems evolving in time involve sequences of dependent random variables. Think of the outside weather temperature on successive days, or the price of IBM stock at the end of successive trading days. Many such systems have the property that the current state alone contains sufficient information to give the probability distribution of the next state. The probability model with this feature is called a Markov chain. The concepts of state and state transition are at the heart of Markov chain analysis. The line of thinking through the concepts of state and state transition is very useful to analyze many practical problems in applied probability.
Markov chains are named after the Russian mathematician Andrey Markov (1856–1922), who first developed this probability model in order to analyze the alternation of vowels and consonants in Pushkin's poem “Eugene Onegin.” His work helped to launch the modern theory of stochastic processes (a stochastic process is a collection of random variables, indexed by an ordered time variable). The characteristic property of a Markov chain is that its memory goes back only to the most recent state. Knowledge of the current state alone is sufficient to describe the future development of the process. A Markov model is the simplest model for random systems evolving in time when the successive states of the system are not independent.
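A toy version of Markov's vowel/consonant chain can be simulated in a few lines. The transition probabilities below are made-up numbers for illustration, not Markov's actual estimates for Pushkin's text; the point is that the next state depends only on the current state.

```python
import random

random.seed(7)

# Assumed transition probabilities: after a vowel, the next letter is a
# vowel with probability 0.2; after a consonant, with probability 0.6.
P_VOWEL_NEXT = {"vowel": 0.2, "consonant": 0.6}

state = "vowel"
vowel_count = 0
n_steps = 200_000
for _ in range(n_steps):
    if state == "vowel":
        vowel_count += 1
    # the next state depends only on the current state (Markov property)
    state = "vowel" if random.random() < P_VOWEL_NEXT[state] else "consonant"

frac_vowel = vowel_count / n_steps
# Solving the equilibrium equations gives 0.6 / (0.8 + 0.6) = 3/7 vowels.
print(f"simulated fraction of vowels = {frac_vowel:.3f} (theory: {3/7:.3f})")
```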
Constructing the mathematical foundations of probability theory has proven to be a long-lasting process of trial and error. The approach consisting of defining probabilities as relative frequencies in cases of repeatable experiments leads to an unsatisfactory theory. The frequency view of probability has a long history that goes back to Aristotle. It was not until 1933 that the great Russian mathematician Andrey Nikolaevich Kolmogorov (1903–1987) laid a satisfactory mathematical foundation of probability theory. He did this by taking a number of axioms as his starting point, as had been done in other fields of mathematics. Axioms state a number of minimal requirements that the mathematical objects in question (such as points and lines in geometry) must satisfy. In the axiomatic approach of Kolmogorov, probability figures as a function on subsets of a so-called sample space, where the sample space represents the set of all possible outcomes of the experiment. The axioms are the basis for the mathematical theory of probability. As a milestone, the law of large numbers can be deduced from the axioms by logical reasoning. The law of large numbers confirms our intuition that the probability of an event in a repeatable experiment can be estimated by the relative frequency of its occurrence in many repetitions of the experiment. This law is the fundamental link between theory and the real world. Its proof has to be postponed until Chapter 14.
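In formulas, the axioms require that a probability measure P on the events A of a sample space Ω satisfies:

```latex
P(A) \ge 0, \qquad P(\Omega) = 1,
% and, for every sequence A_1, A_2, \ldots of pairwise disjoint events,
P\Bigl(\bigcup_{i=1}^{\infty} A_i\Bigr) = \sum_{i=1}^{\infty} P(A_i).

% The law of large numbers then makes the frequency intuition precise:
% if I_k indicates whether the event A occurred on the kth of n independent
% repetitions of the experiment, the relative frequency converges to P(A),
\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} I_k = P(A)
\quad \text{with probability } 1.
```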
In many practical applications of probability, physical situations are better described by random variables that can take on a continuum of possible values rather than a discrete number of values. Examples are the decay time of a radioactive particle, the time until the occurrence of the next earthquake in a certain region, the lifetime of a battery, the annual rainfall in London, and so on. These examples make clear what the fundamental difference is between discrete random variables taking on a discrete number of values and continuous random variables taking on a continuum of values. Whereas a discrete random variable associates positive probabilities to its individual values, any individual value has probability zero for a continuous random variable. It is only meaningful to speak of the probability of a continuous random variable taking on a value in some interval. Taking the lifetime of a battery as an example, it will be intuitively clear that the probability of this lifetime taking on a specific value becomes zero when a finer and finer unit of time is used. If you can measure the heights of people with infinite precision, the height of a randomly chosen person is a continuous random variable. In reality, heights cannot be measured with infinite precision, but the mathematical analysis of the distribution of heights of people is greatly simplified when using a mathematical model in which the height of a randomly chosen person is modeled as a continuous random variable.
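The battery example can be made concrete. The sketch below assumes, purely for illustration, that the lifetime T of a battery is exponentially distributed with a mean of two years, so that P(a ≤ T ≤ b) = e^(−a/2) − e^(−b/2). An interval of values carries positive probability, but the probability of any specific value vanishes as the interval around it shrinks:

```python
import math

def prob_between(a: float, b: float, mean: float = 2.0) -> float:
    """P(a <= T <= b) for an exponential lifetime T with the given mean."""
    return math.exp(-a / mean) - math.exp(-b / mean)

# A positive probability for an interval of values...
print(f"P(1 <= T <= 3) = {prob_between(1, 3):.4f}")

# ...but the probability concentrated near any single value tends to zero
# as a finer and finer unit of time is used:
for eps in (0.1, 0.01, 0.001):
    print(f"P(2 - {eps} <= T <= 2 + {eps}) = {prob_between(2 - eps, 2 + eps):.5f}")
```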
This appendix first gives some background material on counting methods. Many probability problems require counting techniques. In particular, these techniques are extremely useful for computing probabilities in a chance experiment in which all possible outcomes are equally likely. In such experiments, one needs effective methods to count the number of outcomes in any specific event. In counting problems, it is important to know whether the order in which the elements are counted is relevant or not. After the discussion on counting methods, the appendix summarizes a number of properties of the famous number e and the exponential function e^x, both of which play an important role in probability.
Permutations
How many different ways can you arrange a number of different objects such as letters or numbers? For example, what is the number of different ways that the three letters A, B, and C can be arranged? By writing out all the possibilities ABC, ACB, BAC, BCA, CAB, and CBA, you can see that the total number is 6. This brute-force method of writing down all the possibilities and counting them is naturally not practical when the number of possibilities gets large, for example the number of different ways to arrange the 26 letters of the alphabet. You can also determine that the three letters A, B, and C can be written down in 6 different ways by reasoning as follows. For the first position, there are 3 available letters to choose from, for the second position there are 2 letters left to choose from, and only one letter remains for the third position, giving 3 × 2 × 1 = 6 possibilities in total.
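Both the brute-force method and the counting argument are easy to check in a few lines:

```python
import math
from itertools import permutations

# Brute force: list all arrangements of A, B, and C.
arrangements = ["".join(p) for p in permutations("ABC")]
print(arrangements)        # ['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA']
print(len(arrangements))   # 6

# The counting argument: 3 choices, then 2, then 1, i.e. 3! = 6.
print(math.factorial(3))   # 6

# For the 26 letters of the alphabet, brute force is hopeless, but the
# same argument gives 26! arrangements:
print(math.factorial(26))  # about 4.03 * 10^26
```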
Generating functions were introduced by the Swiss genius Leonhard Euler (1707–1783) in the eighteenth century to facilitate calculations in counting problems. However, this important concept is also extremely useful in applied probability, as was first demonstrated by the work of Abraham de Moivre (1667–1754) who discovered the technique of generating functions independently of Euler. In modern probability theory, generating functions are an indispensable tool in combination with methods from numerical analysis.
The purpose of this chapter is to give the basic properties of generating functions and to show the utility of this concept. First, the generating function is defined for a discrete random variable on nonnegative integers. Next, we consider the more general moment-generating function, which is defined for any random variable. The (moment) generating function is a powerful tool for both theoretical and computational purposes. In particular, it can be used to prove the central limit theorem. A sketch of the proof will be given. This chapter also gives a proof of the strong law of large numbers, using moment-generating functions together with so-called Chernoff bounds. Finally, the strong law of large numbers is used to establish the powerful renewal-reward theorem for stochastic processes having the property that the process probabilistically restarts itself at certain points in time.
Generating functions
We first introduce the concept of generating function for a discrete random variable X whose possible values belong to the set of nonnegative integers.
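The definition, and two basic properties that make the concept so useful, can be stated in formulas. For a random variable X with P(X = k) = p_k for k = 0, 1, 2, ...:

```latex
% Generating function of X:
G_X(z) = E\bigl[z^X\bigr] = \sum_{k=0}^{\infty} p_k z^k, \qquad |z| \le 1.

% Basic properties: G_X(1) = 1, and differentiating term by term gives
G_X'(1) = \sum_{k=1}^{\infty} k\, p_k = E[X].

% The moment-generating function, defined for any random variable X:
M_X(t) = E\bigl[e^{tX}\bigr].
```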
In experiments, one is often interested not only in individual random variables, but also in relationships between two or more random variables. For example, if the experiment is the testing of a new medicine, the researcher might be interested in cholesterol level, blood pressure, and glucose level of a test person. Similarly, a political scientist investigating the behavior of voters might be interested in the income and level of education of a voter. There are many more examples in the physical sciences, medical sciences, and social sciences. In applications, one often wishes to make inferences about one random variable on the basis of observations of other random variables.
The purpose of this chapter is to familiarize the student with the notations and the techniques relating to experiments whose outcomes are described by two or more real numbers. The discussion is restricted to the case of pairs of random variables. The chapter treats joint and marginal densities, along with covariance and correlation. Also, the transformation rule for jointly distributed random variables and regression to the mean are discussed.
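As a preview of covariance and correlation, the sketch below computes both for a small set of paired observations; the height/weight numbers are made up for illustration.

```python
import math

x = [165, 170, 175, 180, 185]       # heights in cm (illustrative data)
y = [60.0, 66.0, 69.0, 77.0, 80.0]  # weights in kg (illustrative data)

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# cov(X, Y) = E[(X - E[X])(Y - E[Y])], estimated from the sample
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)

# The correlation coefficient scales the covariance to the range [-1, 1].
corr = cov_xy / (sd_x * sd_y)
print(f"covariance  = {cov_xy:.2f}")
print(f"correlation = {corr:.3f}")
```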