Content Listing

15 - Proximal and Mirror-Descent Methods
Ali H. Sayed, École Polytechnique Fédérale de Lausanne
Book:

Inference and Learning from Data

Published online:

17 February 2023

Print publication:

22 December 2022, pp 507-546
- Chapter

Author Index
Ali H. Sayed, École Polytechnique Fédérale de Lausanne
Book:

Inference and Learning from Data

Published online:

17 February 2023

Print publication:

22 December 2022, pp 1009-1032
- Chapter

10 - Lipschitz Conditions
Ali H. Sayed, École Polytechnique Fédérale de Lausanne
Book:

Inference and Learning from Data

Published online:

17 February 2023

Print publication:

22 December 2022, pp 330-340
- Chapter

62 - Bagging and Boosting
Ali H. Sayed, École Polytechnique Fédérale de Lausanne
Book:

Inference and Learning from Data

Published online:

24 February 2023

Print publication:

22 December 2022, pp 2557-2586
- Chapter
Summary

In this chapter we describe two ensemble learning techniques, known as bagging and boosting, which aggregate the decisions of a mixture of learners to enable enhanced classification performance. In particular, they help transform a collection of “weak” learners into a more robust learning machine.
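
As a rough illustration of the bagging half of this summary, the sketch below trains several weak learners on bootstrap resamples of a toy dataset and aggregates their decisions by majority vote. The one-dimensional threshold classifier, the function names `fit_stump` and `bagging`, and the data are all assumptions made for illustration; they are not taken from the chapter.

```python
import random
from collections import Counter

def fit_stump(points):
    """Fit a 1-D threshold rule (a weak learner) to (x, label) pairs, labels in {-1, +1}."""
    best = None
    for t in sorted({x for x, _ in points}):
        for sign in (+1, -1):
            # Count training errors of the rule: predict `sign` when x >= t, else -sign.
            errors = sum(1 for x, y in points if (sign if x >= t else -sign) != y)
            if best is None or errors < best[0]:
                best = (errors, t, sign)
    _, t, sign = best
    return lambda x: sign if x >= t else -sign

def bagging(points, n_learners=11, seed=0):
    """Train each learner on a bootstrap resample and return a majority-vote predictor."""
    rng = random.Random(seed)
    learners = []
    for _ in range(n_learners):
        sample = [rng.choice(points) for _ in points]  # sampling with replacement
        learners.append(fit_stump(sample))

    def predict(x):
        votes = Counter(h(x) for h in learners)
        return votes.most_common(1)[0][0]

    return predict

# Hypothetical toy data: negatives on the left, positives on the right.
data = [(0.0, -1), (1.0, -1), (2.0, -1), (3.0, -1),
        (4.0, +1), (5.0, +1), (6.0, +1), (7.0, +1)]
predict = bagging(data)
print(predict(0.0), predict(7.0))
```

An odd number of learners is used here so that a two-class vote cannot tie. Boosting, which reweights the training samples between rounds, is not shown.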

40 - Independent Component Analysis
Ali H. Sayed, École Polytechnique Fédérale de Lausanne
Book:

Inference and Learning from Data

Published online:

17 March 2023

Print publication:

22 December 2022, pp 1609-1642
- Chapter
Summary

The expectation-maximization (EM) and Baum–Welch algorithms are particularly useful for the processing of data arising from mixture models. Both techniques enable us to identify the parameters of the underlying components, whether the observations are independent of each other or follow a first-order Markovian process. In this chapter, we consider another important example of a mixture model consisting of a collection of independent sources, a mixture matrix, and the observations. The objective is to undo the mixing and recover the original sources. The resulting technique is known as independent component analysis (ICA).
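
One common way to write the mixing model described here is sketched below. The symbols $s$ (sources), $A$ (mixture matrix), $y$ (observations), and $W$ (unmixing matrix) are assumed notation rather than the chapter's own, and the sketch assumes a square, invertible $A$.

```latex
y = A s, \qquad \hat{s} = W y, \qquad W \approx A^{-1}
```

In general the sources can only be recovered up to a scaling and a permutation of their order.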

3 - Random Variables
Ali H. Sayed, École Polytechnique Fédérale de Lausanne
Book:

Inference and Learning from Data

Published online:

17 February 2023

Print publication:

22 December 2022, pp 68-131
- Chapter

34 - Expectation Propagation
Ali H. Sayed, École Polytechnique Fédérale de Lausanne
Book:

Inference and Learning from Data

Published online:

17 March 2023

Print publication:

22 December 2022, pp 1352-1379
- Chapter
Summary

The Laplace method approximates the posterior distribution $f_{z | y} (z | y)$ through a Gaussian probability density function (pdf) that is not always accurate. The Markov chain Monte Carlo (MCMC) method, on the other hand, relies on sampling from auxiliary (proposal) distributions and provides a powerful way to approximate posterior distributions albeit through repeated simulations. In this chapter, we describe a third approach for approximating the posterior distribution, known as expectation propagation (EP). This method restricts the class of distributions from which the posterior is approximated to the Gaussian or exponential family and assumes a factored form for the posterior. The method can become analytically demanding, depending on the nature of the factors used for the posterior, because these factors can make the computation of certain moments unavailable in closed form. The EP method has been observed to lead to good performance in some applications such as the Bayesian logit classification problem, but this behavior is not universal and performance can degrade for other problems, especially when the posterior distribution admits a mixture model.

53 - Self-Organizing Maps
Ali H. Sayed, École Polytechnique Fédérale de Lausanne
Book:

Inference and Learning from Data

Published online:

24 February 2023

Print publication:

22 December 2022, pp 2290-2312
- Chapter
Summary

The $k$-nearest neighbor ($k$-NN) rule is appealing. However, each new feature $h \in \mathbb{R}^{M}$ requires searching over the entire training set of size $N$ to determine the neighborhood around $h$.
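
The cost mentioned above can be seen in a minimal brute-force sketch of the rule: every query computes a distance to each of the $N$ training points before voting. The function name `knn_predict`, the Euclidean distance, and the toy data are assumptions for illustration.

```python
import math
from collections import Counter

def knn_predict(train, h, k=3):
    """Classify feature h by majority vote among its k nearest training points.

    train is a list of (feature_vector, label) pairs. Each call scans all N
    entries, so one query costs on the order of N * M operations.
    """
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], h))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical training set with two well-separated classes in R^2.
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(knn_predict(train, (0.2, 0.1)))
print(knn_predict(train, (5.5, 5.5)))
```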

56 - Linear Discriminant Analysis
Ali H. Sayed, École Polytechnique Fédérale de Lausanne
Book:

Inference and Learning from Data

Published online:

24 February 2023

Print publication:

22 December 2022, pp 2357-2382
- Chapter
Summary

In this chapter, we describe three other data-based generative methods that approximate the solution to the optimal Bayes classifier (52.8) in the absence of knowledge of the conditional probabilities $\mathbb{P}(\boldsymbol{r} = r \mid \boldsymbol{h} = h)$. The methods estimate the prior probabilities $\mathbb{P}(\boldsymbol{r} = r)$ for the classes and, in some cases, assume a Gaussian form for the reverse conditional distribution, $f_{h | r} (h | r)$. The training data is used to estimate the priors and the first- and second-order moments of $f_{h | r} (h | r)$.
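
The general recipe in this summary (estimate class priors and the first two moments from training data, then classify with a Gaussian model for each class) can be sketched as follows. This is a generic scalar-feature illustration under assumed names (`fit_gaussian_classes`, `classify`) and toy data; it is not claimed to match any of the three methods in the chapter.

```python
import math
from collections import defaultdict

def fit_gaussian_classes(samples):
    """Estimate (prior, mean, variance) per class from (h, r) pairs with scalar h."""
    groups = defaultdict(list)
    for h, r in samples:
        groups[r].append(h)
    n = len(samples)
    model = {}
    for r, hs in groups.items():
        mean = sum(hs) / len(hs)
        var = sum((h - mean) ** 2 for h in hs) / len(hs)  # assumed to be > 0
        model[r] = (len(hs) / n, mean, var)
    return model

def classify(model, h):
    """Pick the class maximizing log prior + log Gaussian likelihood of h."""
    def log_score(params):
        prior, mean, var = params
        return (math.log(prior)
                - 0.5 * math.log(2 * math.pi * var)
                - (h - mean) ** 2 / (2 * var))
    return max(model, key=lambda r: log_score(model[r]))

# Hypothetical training data: class 0 around 2, class 1 around 8.
samples = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
model = fit_gaussian_classes(samples)
print(classify(model, 2.5), classify(model, 7.5))
```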

43 - Undirected Graphs
Ali H. Sayed, École Polytechnique Fédérale de Lausanne
Book:

Inference and Learning from Data

Published online:

17 March 2023

Print publication:

22 December 2022, pp 1740-1806
- Chapter
Summary

The discussion in the last two chapters focused on directed graphical models or Bayesian networks, where a directed link from a variable $x_{1}$ toward another variable $x_{2}$ carries with it an implicit connotation of “causal effect” by $x_{1}$ on $x_{2}$. In many instances, this implication need not be appropriate or can even be limiting. For example, there are cases where conditional independence relations cannot be represented by a directed graph. One such example is provided in Prob. 43.1. In this chapter, we examine another form of graphical representation where the links are not required to be directed anymore, and the probability distributions are replaced by potential functions. These are strictly positive functions defined over sets of connected nodes; they broaden the level of representation by graphical models. The potential functions carry with them a connotation of “similarity” or “affinity” among the variables, but can also be rolled back to represent probability distributions. Over undirected graphs, edges linking nodes will continue to reflect pairwise relationships between the variables but will lead to a fundamental factorization result in terms of the product of clique potential functions. We will show that these functions play a prominent role in the development of message-passing algorithms for the solution of inference problems.
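
The factorization in terms of clique potentials is commonly written in the form sketched below for discrete variables. The symbols $\mathcal{C}$ (set of cliques), $\phi_C$ (potential on clique $C$), $x_C$ (variables in $C$), and $Z$ (normalization constant) are assumed notation, not taken from the chapter.

```latex
f(x_1, \dots, x_K) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \phi_C(x_C),
\qquad
Z = \sum_{x_1, \dots, x_K} \prod_{C \in \mathcal{C}} \phi_C(x_C)
```

The constant $Z$ is what turns the product of strictly positive potentials back into a valid probability distribution.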

Reviews
Michael Silverstein, University of Chicago
Book:

Language in Culture

Published online:

22 December 2022

Print publication:

22 December 2022, pp ii-ii
- Chapter

41 - Bayesian Networks
Ali H. Sayed, École Polytechnique Fédérale de Lausanne
Book:

Inference and Learning from Data

Published online:

17 March 2023

Print publication:

22 December 2022, pp 1643-1681
- Chapter
Summary

The inference of a random variable $x$ from observations $\{y_{1}, y_{2}, \dots, y_{N}\}$ requires that we evaluate the posterior distribution $f_{x \mid y_{1 : N}} (x \mid y_{1}, \dots, y_{N})$ as happens, for example, in inference formulations based on mean-square-error (MSE), maximum a-posteriori (MAP), or probability of error metrics. In previous chapters, we described several techniques to facilitate the computation or approximation of such posterior distributions using Monte Carlo or variational inference methods. We will encounter other types of approximations in later chapters. For example, in the context of naïve Bayes classifiers in Chapter 55, we will assume that, conditioned on the latent variable $x$, the observations are independent of each other in order to write

Contents
Ali H. Sayed, École Polytechnique Fédérale de Lausanne
Book:

Inference and Learning from Data

Published online:

17 February 2023

Print publication:

22 December 2022, pp vii-xxvi
- Chapter

21 - Convergence Analysis III: Stochastic Proximal Algorithms
Ali H. Sayed, École Polytechnique Fédérale de Lausanne
Book:

Inference and Learning from Data

Published online:

17 February 2023

Print publication:

22 December 2022, pp 756-778
- Chapter

45 - Value and Policy Iterations
Ali H. Sayed, École Polytechnique Fédérale de Lausanne
Book:

Inference and Learning from Data

Published online:

17 March 2023

Print publication:

22 December 2022, pp 1853-1916
- Chapter
Summary

We continue our treatment of Markov decision processes (MDPs) and focus in this chapter on methods for determining optimal actions or policies. We derive two popular methods known as value iteration and policy iteration, and establish their convergence properties. We also examine the Bellman optimality principle in the context of value and policy learning. In a later section, we extend the discussion to the more challenging case of partially observable MDPs (POMDPs), where the successive states of the MDP are unobservable to the agent, and the agent is only able to sense measurements emitted randomly by the MDP from the various states. We will define POMDPs and explain that they can be reformulated as belief‐MDPs with continuous (rather than discrete) states. This fact complicates the solution of the value iteration. Nevertheless, we will show that the successive value iterates share a useful property, namely, that they are piecewise linear and convex. This property can be exploited by computational methods to reduce the complexity of solving the value iteration for POMDPs.
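
For the fully observable case, value iteration can be sketched as a repeated Bellman update on a small discounted MDP. The data layout (`P[s][a]` as a list of `(next_state, probability)` pairs, `R[s][a]` as an expected reward), the discount factor, and the two-state example are assumptions made for illustration; the POMDP extension is not shown.

```python
def q_values(P, R, V, gamma):
    """Q[s][a] = R[s][a] + gamma * expected value of the next state."""
    return [[R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
             for a in range(len(P[s]))]
            for s in range(len(P))]

def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Iterate V <- max_a Q(s, a) until successive iterates differ by less than tol."""
    V = [0.0] * len(P)
    for _ in range(max_iter):
        V_new = [max(row) for row in q_values(P, R, V, gamma)]
        converged = max(abs(a - b) for a, b in zip(V_new, V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the final value estimate.
    Q = q_values(P, R, V, gamma)
    policy = [max(range(len(row)), key=lambda a: row[a]) for row in Q]
    return V, policy

# Hypothetical MDP: action 0 stays put, action 1 switches state.
# Staying in state 1 pays reward 1; every other choice pays 0.
P = [[[(0, 1.0)], [(1, 1.0)]],
     [[(1, 1.0)], [(0, 1.0)]]]
R = [[0.0, 0.0],
     [1.0, 0.0]]
V, policy = value_iteration(P, R)
print(V, policy)
```

With a discount factor of 0.9, the values should approach 9 for state 0 and 10 for state 1, with the policy moving to state 1 and then staying there.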

18 - Gradient Noise
Ali H. Sayed, École Polytechnique Fédérale de Lausanne
Book:

Inference and Learning from Data

Published online:

17 February 2023

Print publication:

22 December 2022, pp 642-682
- Chapter

59 - Logistic Regression
Ali H. Sayed, École Polytechnique Fédérale de Lausanne
Book:

Inference and Learning from Data

Published online:

24 February 2023

Print publication:

22 December 2022, pp 2457-2498
- Chapter
Summary

In this chapter, we describe a popular discriminative approach for classification problems, known as logistic regression. Assuming binary classification with labels $\gamma \in \{\pm 1\}$ and features $h \in \mathbb{R}^{M}$, we explained earlier in expression (28.85) that the optimal Bayes classifier for predicting $\gamma$ is given by

54 - Decision Trees
Ali H. Sayed, École Polytechnique Fédérale de Lausanne
Book:

Inference and Learning from Data

Published online:

24 February 2023

Print publication:

22 December 2022, pp 2313-2340
- Chapter
Summary

We mentioned earlier in Section 52.3 that the nearest-neighbor (NN) rule for classification and clustering treats equally all attributes within each feature vector, $h_{n} \in \mathbb{R}^{M}$. If, for example, some attributes are more relevant to the classification task than other attributes, then this aspect is ignored by the NN classifier because all entries of the feature vector will contribute similarly to the calculation of Euclidean distances and the determination of neighborhoods.

Contents
Ali H. Sayed, École Polytechnique Fédérale de Lausanne
Book:

Inference and Learning from Data

Published online:

17 March 2023

Print publication:

22 December 2022, pp vii-xxvi
- Chapter

Editorial Acknowledgments
- By E. Summerson Carr, Susan Gal, Constantine V. Nakassis
Michael Silverstein, University of Chicago
Book:

Language in Culture

Published online:

22 December 2022

Print publication:

22 December 2022, pp 334-336
- Chapter
Summary

As readers of Michael Silverstein’s works know, his published papers are always carefully annotated with the places they have been delivered as “talks,” the persons who invited him to speak or write, and the thanks to those who commented and contributed in different ways to the finished product. Tragically, he cannot append that form of acknowledgment to this book. Michael Silverstein died in July 2020, after a year of illness and in the midst of editing and polishing the manuscript of this book. To follow his usual practice of providing a natural history of this text seems a fitting form of acknowledgment to those devoted colleagues who participated in bringing this book to publication.

36807 results in Cambridge Textbooks
