As seen in the preceding chapter, when a reliable model is available to describe the probabilistic relationship between input variable x and target variable t, one is faced with a model-based prediction problem, also known as inference. Inference can in principle be optimally addressed by evaluating functions of the posterior distribution of the output t given the input x.
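As a concrete illustration of this principle (the following standard results are not spelled out in the summary above), the optimal predictor under the quadratic loss is the posterior mean, while under the 0-1 detection loss it is the posterior mode:

```latex
% Optimal (Bayes) predictors expressed as functions of the posterior p(t|x):
% posterior mean under the quadratic loss, posterior mode under the 0-1 loss.
\hat{t}_{\ell_2}(x) = \mathbb{E}_{t \sim p(t \mid x)}[t],
\qquad
\hat{t}_{0\text{-}1}(x) = \arg\max_{t} \, p(t \mid x).
```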
This chapter provides a refresher on probability and linear algebra with the aim of reviewing the necessary background for the rest of the book. Readers not familiar with probability and linear algebra are invited to first consult one of the standard textbooks mentioned in Recommended Resources, Sec. 2.14. Readers well versed in these topics may briefly skim through this chapter to get a sense of the notation used in the book.
In the examples studied in Chapter 4, the exact optimization of the (regularized) training loss was feasible through simple numerical procedures or via closed-form analytical solutions. In practice, exact optimization is often computationally intractable, and scalable implementations must rely on approximate optimization methods that perform local, iterative updates in search of an optimized solution. This chapter provides an introduction to local optimization methods for machine learning.
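As a minimal, illustrative sketch of what such local, iterative updates look like (the quadratic loss, step size, and iteration count below are assumptions made for the example, not choices taken from the chapter), plain gradient descent can be written as:

```python
import numpy as np

def gradient_descent(grad, theta0, lr=0.01, num_iters=5000):
    """Generic local optimizer: repeatedly take a small step against the gradient."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(num_iters):
        theta = theta - lr * grad(theta)  # local, first-order update
    return theta

# Illustrative quadratic training loss f(theta) = (1/2) ||A theta - b||^2,
# whose gradient is A^T (A theta - b).
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])
grad = lambda theta: A.T @ (A @ theta - b)

theta_hat = gradient_descent(grad, theta0=np.zeros(2))
print(theta_hat)  # approaches the exact least squares solution [0, 0.5]
```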
The previous chapter, as well as Chapter 4, has focused on supervised learning problems, which assume the availability of a labeled training set. A labeled data set consists of examples in the form of pairs (𝑥, 𝑡) of input 𝑥 and desired output 𝑡.
This chapter aims to motivate the study of machine learning for an intended audience of students and researchers with an engineering background.
This final chapter covers topics that build on the material discussed in the book, with the aim of pointing to avenues for further study and research. The selection of topics is clearly a matter of personal choice, but care has been taken to present both well-established topics, such as probabilistic graphical models, and emerging ones, such as causality and quantum machine learning. The topics are distinct, and each section can be read separately. The presentation is brief, and only meant as a launching pad for exploration.
As discussed so far in this book, the standard formulation of machine learning makes the following two basic assumptions:
1. Statistical equivalence of training and testing. The statistical properties of the data observed during training match those to be experienced during testing – i.e., the population distribution underlying the generation of the data is the same during both training and testing.
2. Separation of learning tasks. Training is carried out separately for each separate learning task – i.e., for any new data set and/or loss function, training is viewed as a new problem to be addressed from scratch.
In this chapter, we use the optimization tools presented in Chapter 5 to develop supervised learning algorithms that move beyond the simple settings studied in Chapter 4, for which the training problem could be solved exactly, typically by addressing a least squares (LS) problem. We will focus specifically on binary and multi-class classification, with a brief discussion at the end of the chapter about the (direct) extension to regression problems. Following Chapter 4, the presentation will mostly concentrate on parametric model classes, but we will also touch upon mixture models and non-parametric methods.
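As one concrete instance of the kind of algorithm developed in this chapter, the sketch below trains a binary logistic regression classifier by gradient descent on the average cross-entropy training loss; the synthetic data, learning rate, and number of iterations are illustrative assumptions, not specifications from the chapter.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, t, lr=0.1, num_iters=1000):
    """Binary classifier p(t=1|x) = sigmoid(x^T theta), fitted by gradient
    descent on the average cross-entropy (logistic) training loss."""
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        p = sigmoid(X @ theta)          # predicted probabilities
        grad = X.T @ (p - t) / len(t)   # gradient of the average cross-entropy
        theta -= lr * grad              # local, iterative update
    return theta

# Illustrative synthetic data: two 2D Gaussian clusters with labels 0 and 1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=-1.0, size=(50, 2)),
               rng.normal(loc=+1.0, size=(50, 2))])
t = np.concatenate([np.zeros(50), np.ones(50)])

theta = train_logistic_regression(X, t)
accuracy = np.mean((sigmoid(X @ theta) > 0.5) == t)
print(f"training accuracy: {accuracy:.2f}")
```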
This chapter focuses on three key problems that underlie the formulation of many machine learning methods for inference and learning, namely variational inference (VI), amortized VI, and variational expectation maximization (VEM). We have already encountered these problems in simplified forms in previous chapters, and they will be essential in developing the more advanced techniques to be covered in the rest of the book. Notably, VI and amortized VI underpin optimal Bayesian inference, which was used, e.g., in Chapter 6 to design optimal predictors for generative models; and VEM generalizes the EM algorithm that was introduced in Chapter 7 for training directed generative latent-variable models.
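To make the role of VI slightly more concrete (this is the standard formulation and is not quoted from the summary above), VI selects a tractable distribution q(z) that approximates an intractable posterior p(z|x) by maximizing the evidence lower bound (ELBO), which is equivalent to minimizing the KL divergence between q(z) and the posterior:

```latex
% Evidence lower bound (ELBO) maximized by variational inference:
\log p(x) \;\geq\; \mathbb{E}_{q(z)}\!\left[\log p(x, z) - \log q(z)\right]
\;=\; \log p(x) - \mathrm{KL}\!\left(q(z) \,\|\, p(z \mid x)\right).
```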
The previous chapters have adopted a limited range of probabilistic models, namely Bernoulli and categorical distributions for discrete rvs and Gaussian distributions for continuous rvs. While these are common modeling choices, they clearly do not represent many important situations of interest for machine learning applications. For instance, discrete data may a priori take arbitrarily large values, making categorical models unsuitable. Continuous data may need to satisfy certain constraints, such as non-negativity, rendering Gaussian models far from ideal.
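As two standard examples of such alternative models (named here only for illustration; the summary above does not single them out), unbounded count data is often modeled with a Poisson distribution, and non-negative continuous data with a gamma distribution:

```latex
% Poisson pmf for unbounded counts, gamma pdf for non-negative continuous data.
p(x \mid \lambda) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \quad x \in \{0, 1, 2, \dots\};
\qquad
p(x \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\beta x}, \quad x \geq 0.
```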