We present in this chapter one of the simplest methods for general inference in Bayesian networks, which is based on the principle of variable elimination: a process by which we successively remove variables from a Bayesian network while maintaining its ability to answer queries of interest.
Introduction
We saw in Chapter 5 how a number of real-world problems can be solved by posing queries with respect to Bayesian networks. We also identified four types of queries: probability of evidence, prior and posterior marginals, most probable explanation (MPE), and maximum a posteriori hypothesis (MAP). We present in this chapter one of the simplest inference algorithms for answering these types of queries, which is based on the principle of variable elimination. Our interest here will be restricted to computing the probability of evidence and marginal distributions, leaving the discussion of MPE and MAP queries to Chapter 10.
We start in Section 6.2 by introducing the process of eliminating a variable. This process relies on some basic operations on a class of functions known as factors, which we discuss in Section 6.3. We then introduce the variable elimination algorithm in Section 6.4 and see how it can be used to compute prior marginals in Section 6.5. The performance of variable elimination will critically depend on the order in which we eliminate variables. We discuss this issue in Section 6.6, where we also provide some heuristics for choosing good elimination orders.
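For concreteness before the formal development, here is a minimal Python sketch of the two factor operations that drive elimination, multiplication and summing out, applied to a hypothetical three-variable chain A → B → C with binary variables. The network, its CPT values, and the function names are illustrative assumptions only, not the notation used in this chapter.

import itertools

# Factors: a list of variables together with a table mapping their
# assignments (tuples) to numbers. CPT values below are made up.
domains = {"A": (0, 1), "B": (0, 1), "C": (0, 1)}
fA  = (["A"],      {(0,): 0.6, (1,): 0.4})                                 # Pr(A)
fBA = (["A", "B"], {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8})   # Pr(B | A)
fCB = (["B", "C"], {(0, 0): 0.3, (0, 1): 0.7, (1, 0): 0.5, (1, 1): 0.5})   # Pr(C | B)

def multiply(f, g):
    """Factor product, defined over the union of the two variable sets."""
    fv, ft = f
    gv, gt = g
    vars_ = fv + [v for v in gv if v not in fv]
    table = {}
    for assign in itertools.product(*(domains[v] for v in vars_)):
        env = dict(zip(vars_, assign))
        table[assign] = (ft[tuple(env[v] for v in fv)] *
                         gt[tuple(env[v] for v in gv)])
    return vars_, table

def sum_out(f, var):
    """Eliminate a variable by summing it out of the factor."""
    fv, ft = f
    keep = [v for v in fv if v != var]
    table = {}
    for assign, val in ft.items():
        env = dict(zip(fv, assign))
        key = tuple(env[v] for v in keep)
        table[key] = table.get(key, 0.0) + val
    return keep, table

# Eliminate A, then B, to obtain the prior marginal Pr(C).
f1 = sum_out(multiply(fA, fBA), "A")   # a factor over B
f2 = sum_out(multiply(f1, fCB), "B")   # a factor over C
print(f2)                              # {(0,): 0.376, (1,): 0.624}

Eliminating A and then B in this way leaves a factor over C alone, which is precisely the prior marginal Pr(C). A different elimination order would yield the same answer, possibly at a very different cost, which is the issue taken up in Section 6.6.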
We discuss in this chapter a class of approximate inference algorithms which are based on belief propagation. These algorithms provide a full spectrum of approximations, allowing one to trade off approximation quality with computational resources.
Introduction
The algorithm of belief propagation was first introduced as a specialized algorithm that applied only to networks having a polytree structure. This algorithm, which we treated in Section 7.5.4, was later applied to networks with arbitrary structure and found to produce high-quality approximations in certain cases. This observation triggered a line of investigations into the semantics of belief propagation, which had the effect of introducing a generalization of the algorithm that provides a full spectrum of approximations with belief propagation approximations at one end and exact results at the other.
We discuss belief propagation as applied to polytrees in Section 14.2 and then discuss its application to more general networks in Section 14.3. The semantics of belief propagation are exposed in Section 14.4, showing how it can be viewed as searching for an approximate distribution that satisfies some interesting properties. These semantics will then be the basis for developing generalized belief propagation in Sections 14.5–14.7. An alternative semantics for belief propagation will also be given in Section 14.8, together with a corresponding generalization. The difference between the two generalizations of belief propagation is not only in their semantics but also in the way they allow the user to trade off the approximation quality with the computational resources needed to produce them.
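As a preview of the message-passing computations developed in this chapter, the following Python sketch runs iterative (loopy) belief propagation on a small factor graph containing a single loop and reads off approximate marginals. The model, its potentials, and the update schedule are illustrative assumptions and are not taken from the chapter.

import itertools

# Binary variables A, B, C; factors map assignments to (made-up) potentials.
domain = {"A": (0, 1), "B": (0, 1), "C": (0, 1)}
factors = {
    ("A",):     {(0,): 0.6, (1,): 0.4},
    ("A", "B"): {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8},
    ("B", "C"): {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.6},
    ("C", "A"): {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.3, (1, 1): 0.7},
}

def normalize(m):
    z = sum(m.values())
    return {k: v / z for k, v in m.items()}

# Initialize all messages to uniform.
msg_vf = {(v, f): {x: 1.0 for x in domain[v]} for f in factors for v in f}
msg_fv = {(f, v): {x: 1.0 for x in domain[v]} for f in factors for v in f}

for _ in range(50):                                   # a fixed number of sweeps
    # Variable-to-factor messages: product of the other incoming factor messages.
    new_vf = {}
    for (v, f) in msg_vf:
        m = {x: 1.0 for x in domain[v]}
        for g in factors:
            if v in g and g != f:
                for x in m:
                    m[x] *= msg_fv[(g, v)][x]
        new_vf[(v, f)] = normalize(m)
    # Factor-to-variable messages: sum out the factor's other variables.
    new_fv = {}
    for (f, v) in msg_fv:
        m = {x: 0.0 for x in domain[v]}
        others = [u for u in f if u != v]
        for assign in itertools.product(*(domain[u] for u in others)):
            env = dict(zip(others, assign))
            for x in domain[v]:
                env[v] = x
                val = factors[f][tuple(env[u] for u in f)]
                for u in others:
                    val *= new_vf[(u, f)][env[u]]
                m[x] += val
        new_fv[(f, v)] = normalize(m)
    msg_vf, msg_fv = new_vf, new_fv

# Approximate marginal of each variable: product of its incoming factor messages.
for v in domain:
    belief = {x: 1.0 for x in domain[v]}
    for f in factors:
        if v in f:
            for x in belief:
                belief[x] *= msg_fv[(f, v)][x]
    print(v, normalize(belief))

On a polytree the same updates recover exact marginals; on loopy models such as this one they yield the belief propagation approximations whose semantics are examined in Section 14.4.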
We consider in this chapter the computational complexity of probabilistic inference. We also provide some reductions of probabilistic inference to well known problems, allowing us to benefit from specialized algorithms that have been developed for these problems.
Introduction
In previous chapters, we discussed algorithms for answering three types of queries with respect to a Bayesian network that induces a distribution Pr(X). In particular, given some evidence e we discussed algorithms for computing:
The probability of evidence e, Pr(e) (see Chapters 6–8)
The MPE probability for evidence e, MPE_Pr(e) (see Chapter 10)
The MAP probability for variables Q and evidence e, MAP_Pr(Q, e) (see Chapter 10).
In this chapter, we consider the complexity of three decision problems that correspond to these queries. In particular, given a number p, we consider the following problems:
D-PR: Is Pr(e) > p?
D-MPE: Is there a network instantiation x such that Pr(x, e) > p?
D-MAP: Given variables Q ⊆ X, is there an instantiation q such that Pr(q, e) > p?
We also consider a fourth decision problem that includes D-PR as a special case:
D-MAR: Given variables Q ⊆ X and instantiation q, is Pr(q|e) > p?
Note here that when e is the trivial instantiation, D-MAR reduces to asking whether Pr(q) > p, which is identical to D-PR.
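To make the quantities behind these decision problems concrete, the following brute-force Python sketch evaluates each of them on a toy three-variable network by enumerating its joint distribution. The network, its CPT values, the evidence e, the set Q, and the threshold p are all illustrative assumptions; such enumeration takes time exponential in the number of variables, which is exactly why the complexity results below matter.

import itertools

# A toy network A -> B -> C over binary variables, with made-up CPTs.
names = ("A", "B", "C")
pA   = {0: 0.6, 1: 0.4}
pBgA = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8}   # Pr(B | A)
pCgB = {(0, 0): 0.3, (0, 1): 0.7, (1, 0): 0.5, (1, 1): 0.5}   # Pr(C | B)

# Enumerate the joint distribution Pr(A, B, C) explicitly.
joint = {(a, b, c): pA[a] * pBgA[(a, b)] * pCgB[(b, c)]
         for a, b, c in itertools.product((0, 1), repeat=3)}

def consistent(x, partial):
    return all(x[names.index(v)] == val for v, val in partial.items())

def pr(partial):
    """Probability of a (partial) instantiation, by summing the joint."""
    return sum(p for x, p in joint.items() if consistent(x, partial))

e, p = {"C": 1}, 0.5                                      # evidence and threshold (assumptions)

print(pr(e) > p)                                          # D-PR:  Is Pr(e) > p?
print(any(px > p for x, px in joint.items()
          if consistent(x, e)))                           # D-MPE: some full instantiation x with Pr(x, e) > p?
print(any(pr({"A": a, **e}) > p for a in (0, 1)))         # D-MAP: some q over Q = {A} with Pr(q, e) > p?
print(pr({"A": 0, **e}) / pr(e) > p)                      # D-MAR: Is Pr(q | e) > p for q = {A: 0}?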
We provide a number of results on these decision problems in this chapter. In particular, we show in Sections 11.2–11.4 that D-MPE is NP-complete, D-PR and D-MAR are PP-complete, and D-MAP is NP^PP-complete.
We discuss in this chapter a class of approximate inference algorithms based on stochastic sampling: a process by which we repeatedly simulate situations according to their probability and then estimate the probabilities of events based on the frequency of their occurrence in the simulated situations.
Introduction
Consider the Bayesian network in Figure 15.1 and suppose that our goal is to estimate the probability of some event, say, wet grass. Stochastic sampling is a method for estimating such probabilities that works by measuring the frequency at which events materialize in a sequence of situations simulated according to their probability of occurrence. For example, if we simulate 100 situations and find out that the grass is wet in 30 of them, we estimate the probability of wet grass to be 3/10. As we see later, we can efficiently simulate situations according to their probability of occurrence by operating on the corresponding Bayesian network, a process that provides the basis for many of the sampling algorithms we consider in this chapter.
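A minimal Python sketch of this simulation process, often called forward or logic sampling, is given below for a sprinkler-style network; the structure and CPT values are illustrative assumptions and are not the ones in Figure 15.1.

import random

def sample_situation():
    # Sample each variable given its parents, in topological order.
    rain = random.random() < 0.2
    sprinkler = random.random() < (0.01 if rain else 0.4)
    if sprinkler and rain:
        p_wet = 0.99
    elif sprinkler:
        p_wet = 0.90
    elif rain:
        p_wet = 0.80
    else:
        p_wet = 0.05
    wet_grass = random.random() < p_wet
    return rain, sprinkler, wet_grass

# Estimate Pr(wet grass) by its frequency in the simulated situations.
n = 10_000
wet_count = sum(1 for _ in range(n) if sample_situation()[2])
print("estimated Pr(wet grass):", wet_count / n)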
The statements of sampling algorithms are remarkably simple compared to the methods for exact inference discussed in previous chapters, and their accuracy can be made arbitrarily high by increasing the number of sampled situations. However, the design of appropriate sampling methods may not be trivial as we may need to focus the sampling process on a set of situations that are of particular interest.
Automated reasoning has been receiving much interest from a number of fields, including philosophy, cognitive science, and computer science. In this chapter, we consider the particular interest of computer science in automated reasoning over the last few decades, and then focus our attention on probabilistic reasoning using Bayesian networks, which is the main subject of this book.
Automated reasoning
The interest in automated reasoning within computer science dates back to the very early days of artificial intelligence (AI), when much work was initiated on developing computer programs for solving problems that require a high degree of intelligence. Indeed, an influential proposal for building automated reasoning systems was extended by John McCarthy shortly after the term “artificial intelligence” was coined [McCarthy, 1959]. This proposal, sketched in Figure 1.1, calls for a system with two components: a knowledge base, which encodes what we know about the world, and a reasoner (inference engine), which acts on the knowledge base to answer queries of interest. For example, the knowledge base may encode what we know about the theory of sets in mathematics, and the reasoner may be used to prove various theorems about this domain.
McCarthy's proposal was actually more specific than what is suggested by Figure 1.1, as he called for expressing the knowledge base using statements in a suitable logic, and for using logical deduction in realizing the reasoning engine; see Figure 1.2. McCarthy's proposal can then be viewed as having two distinct and orthogonal elements.
Pyramid annotation makes it possible to evaluate quantitatively and qualitatively the content of machine-generated (or human) summaries. Evaluation methods must prove themselves against the same measuring stick – evaluation – as other research methods. First, a formal assessment of pyramid data from the 2003 Document Understanding Conference (DUC) is presented; this addresses whether the form of annotation is reliable and whether score results are consistent across annotators. A combination of interannotator reliability measures of the two manual annotation phases (pyramid creation and annotation of system peer summaries against pyramid models), and significance tests of the similarity of system scores from distinct annotations, produces highly reliable results. The most rigorous test consists of a comparison of peer system rankings produced from two independent sets of pyramid and peer annotations, which produce essentially the same rankings. Three years of DUC data (2003, 2005, 2006) are used to assess the reliability of the method across distinct evaluation settings: distinct systems, document sets, summary lengths, and numbers of model summaries. This functional assessment addresses the method's ability to discriminate systems across years. Results indicate that the statistical power of the method is more than sufficient to identify statistically significant differences among systems, and that the statistical power varies little across the 3 years.
We address in this chapter a number of problems that arise in real-world applications, showing how each can be solved by modeling and reasoning with Bayesian networks.
Introduction
We consider a number of real-world applications in this chapter drawn from the domains of diagnosis, reliability, genetics, channel coding, and commonsense reasoning. For each one of these applications, we state a specific reasoning problem that can be addressed by posing a formal query with respect to a corresponding Bayesian network. We discuss the process of constructing the required network and then identify the specific queries that need to be posed.
There are at least four general types of queries that can be posed with respect to a Bayesian network. Deciding which type of query to use in a specific situation is not always trivial, and some of the queries are guaranteed to be equivalent under certain conditions. We define these query types formally in Section 5.2 and then discuss them and their relationships in more detail when we go over the various applications in Section 5.3.
The construction of a Bayesian network involves three major steps. First, we must decide on the set of relevant variables and their possible values. Next, we must build the network structure by connecting the variables into a DAG. Finally, we must define the CPT for each network variable. The last step is the quantitative part of this construction process and can be the most involved in certain situations.
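The three steps can be pictured with plain data structures, as in the following Python sketch of a small, hypothetical burglary-alarm network; the variable names and CPT entries are illustrative assumptions only.

# Step 1: variables and their possible values.
values = {
    "Burglary":   (True, False),
    "Earthquake": (True, False),
    "Alarm":      (True, False),
}

# Step 2: structure, given as a DAG by listing each variable's parents.
parents = {
    "Burglary":   (),
    "Earthquake": (),
    "Alarm":      ("Burglary", "Earthquake"),
}

# Step 3: a CPT for each variable, mapping every instantiation of its
# parents to a distribution over the variable's values.
cpts = {
    "Burglary":   {(): {True: 0.01, False: 0.99}},
    "Earthquake": {(): {True: 0.02, False: 0.98}},
    "Alarm": {
        (True,  True):  {True: 0.95,  False: 0.05},
        (True,  False): {True: 0.94,  False: 0.06},
        (False, True):  {True: 0.29,  False: 0.71},
        (False, False): {True: 0.001, False: 0.999},
    },
}

Even in this tiny example the quantitative third step dominates: the Alarm CPT requires one distribution per instantiation of its two parents, and the number of such instantiations grows exponentially with the number of parents.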
We propose a novel approach to developing a tractable affective dialogue model for probabilistic frame-based dialogue systems. The affective dialogue model, based on Partially Observable Markov Decision Process (POMDP) and Dynamic Decision Network (DDN) techniques, is composed of two main parts: the slot-level dialogue manager and the global dialogue manager. It has two new features: (1) being able to deal with a large number of slots and (2) being able to take into account some aspects of the user's affective state in deriving the adaptive dialogue strategies. Our implemented prototype dialogue manager can handle hundreds of slots, where each individual slot might have hundreds of values. Our approach is illustrated through a route navigation example in the crisis management domain. We conducted various experiments to evaluate our approach and to compare it with approximate POMDP techniques and handcrafted policies. The experimental results showed that the DDN–POMDP policy outperforms three handcrafted policies when the user's action error is induced by stress as well as when the observation error increases. Further, performance of the one-step look-ahead DDN–POMDP policy after optimizing its internal reward is close to state-of-the-art approximate POMDP counterparts.
Having located a misspelling, a spellchecker generally offers some suggestions for the intended word. Even without using context, a spellchecker can draw on various types of information in ordering its suggestions. A series of experiments is described, beginning with a basic corrector that implements a well-known algorithm for reversing single simple errors, and making successive enhancements to take account of substring matches, pronunciation, known error patterns, syllable structure and word frequency. The improvement in the ordering produced by each enhancement is measured on a large corpus of misspellings. The final version is tested on other corpora against a widely used commercial spellchecker and a research prototype.
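As an illustration of the basic corrector's starting point, the Python sketch below generates suggestions by reversing single simple errors (one insertion, deletion, substitution, or transposition) and filtering them against a dictionary. The tiny dictionary and function names are hypothetical, and the ranking by the further information sources described above is left out.

import string

# A hypothetical toy dictionary; a real corrector would use a full word list.
DICTIONARY = {"separate", "desperate", "definite", "necessary", "occurred"}

def single_error_candidates(word):
    """All strings reachable from `word` by reversing one simple error."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes    = {a + b[1:]               for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces   = {a + c + b[1:]           for a, b in splits if b for c in letters}
    inserts    = {a + c + b               for a, b in splits for c in letters}
    return deletes | transposes | replaces | inserts

def suggest(misspelling):
    # Keep only candidates that are dictionary words; a fuller corrector would
    # then order them using substring matches, pronunciation, error patterns,
    # syllable structure, and word frequency.
    return sorted(single_error_candidates(misspelling) & DICTIONARY)

print(suggest("seperate"))   # ['separate']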
An important problem in knowledge discovery from text is the automatic extraction of semantic relations. This paper addresses the automatic classification of the semantic relations expressed by English genitives. A learning model is introduced based on the statistical analysis of the distribution of genitives' semantic relations in a corpus. The semantic and contextual features of the genitive's noun phrase constituents play a key role in the identification of the semantic relation. The algorithm was trained and tested on a corpus of approximately 20,000 sentences and achieved an f-measure of 79.80 per cent for of-genitives, far better than the 40.60 per cent obtained using a Decision Trees algorithm, the 50.55 per cent obtained using a Naive Bayes algorithm, or the 72.13 per cent obtained using a Support Vector Machines algorithm on the same corpus using the same features. The results were similar for s-genitives: 78.45 per cent using Semantic Scattering, 47.00 per cent using Decision Trees, 43.70 per cent using Naive Bayes, and 70.32 per cent using a Support Vector Machines algorithm. The results demonstrate the importance of word sense disambiguation and semantic generalization/specialization for this task. They also demonstrate that different patterns (in our case the two types of genitive constructions) encode different semantic information and should be treated differently in the sense that different models should be built for different patterns.