This chapter delves into the theory and application of reversible Markov Chain Monte Carlo (MCMC) algorithms, focusing on their role in Bayesian inference. It begins with the Metropolis–Hastings algorithm and explores variations such as component-wise updates and the Metropolis-Adjusted Langevin Algorithm (MALA). The chapter also discusses Hamiltonian Monte Carlo (HMC) and the importance of scaling MCMC methods to high-dimensional models and large datasets. Key challenges in applying reversible MCMC to large-scale problems are addressed, with a focus on computational efficiency and algorithmic adjustments that improve scalability.
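Since the Metropolis–Hastings algorithm is the chapter's starting point, a minimal random-walk sketch may help fix ideas. The function name, the Gaussian proposal, and the standard-normal target below are illustrative choices, not taken from the chapter:

```python
import numpy as np

def metropolis_hastings(log_target, init, n_samples, step_size=1.0, rng=None):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(init, dtype=float)
    lp = log_target(x)
    samples = np.empty((n_samples,) + x.shape)
    for i in range(n_samples):
        prop = x + step_size * rng.standard_normal(x.shape)
        lp_prop = log_target(prop)
        # Accept with probability min(1, pi(prop)/pi(x)); the symmetric
        # proposal density cancels in the Hastings ratio.
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
        samples[i] = x
    return samples

# Sample from a standard normal target (log-density up to a constant).
samples = metropolis_hastings(lambda x: -0.5 * np.sum(x**2), np.zeros(1), 5000)
```

Component-wise variants update one coordinate at a time with the same accept/reject rule, and MALA replaces the blind random-walk proposal with a gradient-informed drift.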
This chapter provides a comprehensive overview of the foundational concepts essential for scalable Bayesian learning and Monte Carlo methods. It introduces Monte Carlo integration and its relevance to Bayesian statistics, focusing on techniques such as importance sampling and control variates. The chapter outlines key applications, including logistic regression, Bayesian matrix factorization, and Bayesian neural networks, which serve as illustrative examples throughout the book. It also offers a primer on Markov chains and stochastic differential equations, which are critical for understanding the advanced methods discussed in later chapters. Additionally, the chapter introduces kernel methods in preparation for their application in scalable Markov Chain Monte Carlo (MCMC) diagnostics.
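As a concrete instance of the Monte Carlo techniques the chapter introduces, here is a minimal self-normalised importance sampling sketch. The target, proposal, and test quantity are illustrative, not drawn from the book:

```python
import numpy as np

rng = np.random.default_rng(0)

# Self-normalised importance sampling: estimate E_pi[x^2] for pi = N(0, 1)
# using draws from a heavier-tailed proposal q = N(0, 2^2).
n = 100_000
x = rng.normal(0.0, 2.0, size=n)             # draws from the proposal q
log_w = -0.5 * x**2 + 0.5 * (x / 2.0)**2     # log pi(x) - log q(x), up to a constant
w = np.exp(log_w - log_w.max())              # subtract the max for numerical stability
w /= w.sum()                                 # self-normalise the weights
estimate = np.sum(w * x**2)                  # true value: Var(N(0,1)) = 1
```

Because the weights are self-normalised, both densities only need to be known up to a constant, which is exactly the situation in Bayesian inference; control variates can then be layered on top of such estimators to reduce variance further.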
This chapter focuses on continuous-time MCMC algorithms, particularly those based on piecewise deterministic Markov processes (PDMPs). It introduces PDMPs as a scalable alternative to traditional MCMC, with a detailed explanation of their simulation, invariant distribution, and limiting processes. Various continuous-time samplers, including the bouncy particle sampler and zig-zag process, are compared in terms of efficiency and performance. The chapter also addresses practical aspects of simulating PDMPs, including techniques for exploiting model sparsity and data subsampling. Extensions to these methods, such as handling discontinuous target distributions or distributions defined on spaces of different dimensions, are discussed.
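To make the PDMP idea concrete: in one dimension the zig-zag process moves at constant velocity v in {-1, +1} and flips v at rate max(0, v * U'(x)), where U is the negative log-target. For a standard normal target the rate is linear in time along a trajectory, so event times can be simulated exactly by inverting the integrated rate. The sketch below is an illustrative toy, not the chapter's implementation:

```python
import numpy as np

def zigzag_gaussian(n_events, rng=None):
    """1-D zig-zag process targeting N(0, 1): velocity v flips at rate
    max(0, v * x), simulated exactly by inverting the integrated rate."""
    rng = np.random.default_rng() if rng is None else rng
    x, v, t = 0.0, 1.0, 0.0
    path = [(t, x)]
    for _ in range(n_events):
        e = rng.exponential()
        a = v * x                 # rate at time s after now is max(0, a + s)
        if a >= 0:
            tau = -a + np.sqrt(a * a + 2.0 * e)
        else:                     # rate is zero until s = -a, then grows linearly
            tau = -a + np.sqrt(2.0 * e)
        t += tau
        x += v * tau              # deterministic linear motion between events
        v = -v                    # event: flip the velocity
        path.append((t, x))
    return np.array(path)
```

Expectations under the target are estimated by time-averaging along the continuous piecewise-linear path, not by treating the event positions as samples; for general targets the exact inversion above is replaced by thinning with computable upper bounds on the rate.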
The development of more sophisticated and, especially, approximate sampling algorithms aimed at improving scalability in one or more of the senses already discussed in this book raises important questions: how should a suitable algorithm be selected for a given task, how should its tuning parameters be determined, and how should its convergence be assessed? This chapter presents recent solutions to these problems, whose starting point is to derive explicit upper bounds on an appropriate distance between the posterior and the approximation produced by MCMC. Further, we explain how these same tools can be adapted to provide powerful post-processing methods that can be used retrospectively to improve approximations produced by scalable MCMC.
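One well-known computable discrepancy of this kind is the kernel Stein discrepancy, which can be estimated from samples using only the score function of the target (so the normalising constant is never needed). The 1-D sketch below, with an inverse multiquadric kernel, is illustrative and assumes the target's score is available in closed form:

```python
import numpy as np

def ksd_imq(x, score):
    """V-statistic estimate of the squared kernel Stein discrepancy in 1-D,
    with the inverse multiquadric kernel k(x, y) = (1 + (x - y)^2)^(-1/2)."""
    s = score(x)                          # score s(x) = d/dx log pi(x)
    d = x[:, None] - x[None, :]
    q = 1.0 + d**2
    k = q**-0.5                           # kernel matrix
    dk_dx = -d * q**-1.5                  # derivative of k in its first argument
    dk_dy = d * q**-1.5                   # derivative of k in its second argument
    dk_dxdy = q**-1.5 - 3.0 * d**2 * q**-2.5
    # Stein kernel: k0(x,y) = s(x)s(y)k + s(x) dk/dy + s(y) dk/dx + d2k/dxdy
    k0 = s[:, None] * s[None, :] * k + s[:, None] * dk_dy \
         + s[None, :] * dk_dx + dk_dxdy
    return k0.mean()

rng = np.random.default_rng(0)
score = lambda x: -x                                 # score of a N(0, 1) target
good = ksd_imq(rng.normal(0.0, 1.0, 500), score)     # samples from the target
bad = ksd_imq(rng.normal(2.0, 1.0, 500), score)      # biased samples score worse
```

A discrepancy like this is what makes the post-processing methods possible: sample points can be reweighted or thinned to minimise the estimated discrepancy after the fact.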
This chapter explores the benefits of non-reversible MCMC algorithms in improving sampling efficiency. Revisiting Hamiltonian Monte Carlo (HMC), the chapter discusses the advantages of breaking detailed balance and introduces lifting schemes as a tool to enhance exploration of the parameter space. It reviews non-reversible HMC and alternative algorithms like Gustafson’s method. The chapter also covers techniques like delayed rejection and the discrete bouncy particle sampler, offering a comparison between reversible and non-reversible methods. Theoretical insights and practical implementations are provided to highlight the efficiency gains from non-reversibility.
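Gustafson's guided-walk idea can be sketched in a few lines: the sampler carries a direction variable, proposes only forwards along it, and flips the direction on rejection rather than resampling it, which suppresses random-walk backtracking. The 1-D sketch below, with an illustrative Gaussian target and step size, is a hedged toy, not the chapter's implementation:

```python
import numpy as np

def guided_walk(log_target, init, n_samples, step_size=0.5, rng=None):
    """Gustafson-style lifted sampler: propose only in the current direction d;
    on rejection, flip d instead of drawing a fresh direction."""
    rng = np.random.default_rng() if rng is None else rng
    x, d = float(init), 1.0
    lp = log_target(x)
    samples = np.empty(n_samples)
    for i in range(n_samples):
        prop = x + d * abs(rng.normal(0.0, step_size))  # half-normal step along d
        lp_prop = log_target(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop      # accept: keep moving the same way
        else:
            d = -d                     # reject: reverse direction (the lifting move)
        samples[i] = x
    return samples
```

The lifted chain is non-reversible on the extended space of (position, direction) pairs, yet still leaves the target marginal invariant; persistent motion in one direction is what yields the efficiency gains the chapter analyses.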
This chapter introduces stochastic gradient MCMC (SG-MCMC) algorithms, designed to scale Bayesian inference to large datasets. Beginning with the unadjusted Langevin algorithm (ULA), it extends to more sophisticated methods such as stochastic gradient Langevin dynamics (SGLD). The chapter emphasises controlling the stochasticity in gradient estimators and explores the role of control variates in reducing variance. Convergence properties of SG-MCMC methods are analysed, with experiments demonstrating their performance in logistic regression and Bayesian neural networks. It concludes by outlining a general framework for SG-MCMC and offering practical guidance for efficient, scalable Bayesian learning.
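The core SGLD update is a ULA step driven by a minibatch gradient estimate. The toy conjugate-Gaussian model, batch size, and step size below are illustrative assumptions, chosen only so the sketch is self-contained:

```python
import numpy as np

def sgld(grad_log_post_est, theta0, n_iters, step_size, rng):
    """Stochastic gradient Langevin dynamics: an unadjusted Langevin step
    driven by an unbiased minibatch estimate of the log-posterior gradient."""
    theta = theta0
    chain = np.empty(n_iters)
    for i in range(n_iters):
        g = grad_log_post_est(theta)
        theta = theta + 0.5 * step_size * g \
                + np.sqrt(step_size) * rng.standard_normal()
        chain[i] = theta
    return chain

rng = np.random.default_rng(0)
data = rng.normal(1.0, 1.0, size=1000)       # toy model: y_i ~ N(theta, 1), flat prior

def grad_est(theta, batch=100):
    idx = rng.integers(0, data.size, size=batch)
    # Rescale the minibatch sum so the estimator is unbiased for the full gradient.
    return (data.size / batch) * np.sum(data[idx] - theta)

chain = sgld(grad_est, 0.0, 5000, 1e-3, rng)
```

In this toy run the minibatch noise visibly inflates the stationary variance relative to the true posterior, which is precisely the stochasticity the chapter proposes to control, for instance with control variates built around a posterior mode.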
This chapter examines the critical role of evaluation within the framework of recommender systems, highlighting its significance alongside system construction. We identify three key aspects of evaluation: the impact of metrics on optimization quality, the collaborative nature of evaluation efforts across teams, and the alignment of chosen metrics with organizational goals. Our discussion spans a comprehensive range of evaluation techniques, from offline methods to online experiments. We explore offline evaluation methods and metrics, offline simulation through replay, online A/B testing, and fast online evaluation via interleaving. Ultimately, we propose a multilayer evaluation architecture that integrates these diverse methods to enhance the scientific rigor and efficiency of recommender system assessments.
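As a small illustration of the interleaving idea mentioned above, here is a team-draft interleaving sketch. The function name and the draft policy details are one common formulation, given here as an assumption rather than the chapter's exact algorithm:

```python
import random

def team_draft_interleave(ranking_a, ranking_b, k, rng=None):
    """Team-draft interleaving: in each round the two rankers, in random
    order, contribute their highest-ranked item not yet picked; clicks on
    each team's items are later compared to decide the winner."""
    rng = rng or random.Random()
    interleaved, team = [], {}
    pool_a, pool_b = list(ranking_a), list(ranking_b)
    while len(interleaved) < k and (pool_a or pool_b):
        order = ['a', 'b'] if rng.random() < 0.5 else ['b', 'a']
        for side in order:
            pool = pool_a if side == 'a' else pool_b
            while pool and pool[0] in team:   # skip items already drafted
                pool.pop(0)
            if pool and len(interleaved) < k:
                item = pool.pop(0)
                team[item] = side             # remember who contributed it
                interleaved.append(item)
    return interleaved, team
```

At serving time the user sees only the merged list; clicks are attributed back to the contributing side, so a single impression yields a paired within-user comparison, which is why interleaving reaches significance much faster than a traffic-split A/B test.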
The introduction of advanced deep learning models such as Microsoft’s Deep Crossing, Google’s Wide&Deep, and others like FNN and PNN in 2016 marked a significant shift in the field of recommender systems and computational advertising, establishing deep learning as the dominant approach. This chapter discusses the evolution of traditional recommendation models and highlights two main advancements in deep learning models: enhanced expressivity for uncovering hidden data patterns and flexible model structures tailored to specific business use cases. Drawing on techniques from computer vision, speech, and natural language processing, deep learning recommendation models have rapidly evolved. The chapter summarizes several influential deep learning models and constructs an evolution map. These models are selected based on their industry impact and their role in advancing deep learning recommender systems. Additionally, the chapter introduces applications of Large Language Models (LLMs) in recommender systems, exploring how these models further enhance recommendation technologies.
This chapter explores the integration of deep learning in recommender systems, highlighting its significance as a leading application area with substantial business value. We examine notable advancements driven by industry leaders like Meta, Google, Airbnb, and Alibaba. These innovations mark a transformative shift toward deep learning in recommender systems, evidenced by Alibaba’s ongoing innovations in e-commerce and Airbnb’s applications in search and recommendation. For aspiring recommender system engineers, the current era of open-source code and knowledge sharing provides unparalleled access to cutting-edge applications and insights from industry pioneers. This chapter aims to build a foundational understanding of deep learning recommender systems developed by Meta, Airbnb, YouTube, and Alibaba, encouraging readers to focus on technical details and engineering practices for practical application.
This concluding chapter revisits the overarching architecture of recommender systems, encouraging readers to synthesize the technical details discussed throughout the book into a cohesive knowledge framework. Initially introduced in Chapter 1, the technical architecture diagram serves as a foundational reference for understanding the field. With a comprehensive overview of each module now complete, readers are invited to refine their interpretations of the architecture. Establishing a personal knowledge framework is crucial for identifying gaps, appreciating details, and maintaining a holistic view of the subject.
Embedding technology plays a pivotal role in deep learning, particularly in industries such as recommendation, advertising, and search. It is considered a fundamental operation for transforming sparse vectors into dense representations that can be further processed by neural networks. Beyond its basic role, embedding technology has evolved significantly in both academia and industry, with applications ranging from sequence processing to multifeature heterogeneous data. This chapter discusses the basics of embedding, its evolution from Word2Vec to graph embeddings and multifeature fusion, and its applications in recommender systems, with an emphasis on online deployment and inference.
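The basic operation described above, turning a sparse id into a dense vector, is just a row lookup in a learned table, equivalent to multiplying a one-hot vector by a weight matrix. A minimal NumPy sketch (the table sizes and ids are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# An embedding table maps each sparse categorical id to a dense row vector;
# lookup is row indexing, equivalent to one_hot(id) @ table.
vocab_size, dim = 10_000, 16
table = rng.normal(0.0, 0.01, size=(vocab_size, dim))  # learned during training

item_ids = np.array([3, 42, 9999])     # sparse ids for three items
dense = table[item_ids]                # shape (3, 16): dense inputs to the network

# Mean-pooling the embeddings of a user's interaction history is a common
# way to build a fixed-size user representation from variable-length input.
user_vec = table[np.array([3, 42])].mean(axis=0)
```

In production systems the same table serves online inference, so the chapter's emphasis on deployment concerns lookup latency, table sharding, and keeping the served table in sync with training.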