As one often encounters datasets with more than a few variables, multivariate statistical techniques are needed to extract the information contained in these datasets effectively. In the environmental sciences, examples of multivariate datasets are ubiquitous – the air temperatures recorded by all the weather stations around the globe, the satellite infrared images composed of numerous small pixels, the gridded output from a general circulation model, etc. The number of variables or time series from these datasets ranges from thousands to millions. Without a mastery of multivariate techniques, one is overwhelmed by these gigantic datasets. In this chapter, we review the principal component analysis method and its many variants, and the canonical correlation analysis method. These methods, using standard matrix techniques such as singular value decomposition, are relatively easy to use, but suffer from being linear, a limitation which will be lifted with neural network and kernel methods in later chapters.
Principal component analysis (PCA)
Geometric approach to PCA
We have a dataset with variables y1, …, ym. These variables have been sampled n times. In many situations, the m variables are m time series each containing n observations in time. For instance, one may have a dataset containing the monthly air temperature measured at m stations over n months.
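As a rough illustration of this setup, the following sketch (not taken from the text) builds a hypothetical station-temperature dataset of m variables sampled n times and carries out PCA via singular value decomposition with NumPy; the variable names and the random data are assumptions for the example only.

```python
# Minimal PCA-via-SVD sketch for an (n x m) data matrix:
# rows = n monthly observations, columns = m station temperature series.
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 120                       # hypothetical: 5 stations, 120 months
Y = rng.standard_normal((n, m))     # stand-in for the observed dataset

# Remove the time mean of each variable, so PCA acts on anomalies.
Y_anom = Y - Y.mean(axis=0)

# SVD of the anomaly matrix: Y_anom = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(Y_anom, full_matrices=False)

eofs = Vt          # rows are the spatial patterns (loading vectors)
pcs = U * s        # columns are the principal component time series

# Fraction of total variance explained by each mode.
explained_variance = s**2 / np.sum(s**2)
print(explained_variance)
```

The same decomposition underlies the variants discussed later in the chapter; only the preprocessing of the data matrix and the interpretation of the modes change.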
Broadly speaking, the goal of (mainstream) learning theory is to approximate a function (or some function features) from data samples, perhaps perturbed by noise. To attain this goal, learning theory draws on a variety of diverse subjects. It relies on statistics whose purpose is precisely to infer information from random samples. It also relies on approximation theory, since our estimate of the function must belong to a prespecified class, and therefore the ability of this class to approximate the function accurately is of the essence. And algorithmic considerations are critical because our estimate of the function is the outcome of algorithmic procedures, and the efficiency of these procedures is crucial in practice. Ideas from all these areas have blended together to form a subject whose many successful applications have triggered its rapid growth during the past two decades.
This book aims to give a general overview of the theoretical foundations of learning theory. It is not the first to do so. Yet we wish to emphasize a viewpoint that has drawn little attention in other expositions, namely, that of approximation theory. This emphasis fulfills two purposes. First, we believe it provides a balanced view of the subject. Second, we expect to attract mathematicians working in related fields who find the problems raised in learning theory close to their interests.
While writing this book, we faced a dilemma common to the writing of any book in mathematics: to strike a balance between clarity and conciseness. In particular, we faced the problem of finding a suitable degree of self-containment for a book relying on a variety of subjects.
This book by Felipe Cucker and Ding-Xuan Zhou provides solid mathematical foundations and new insights into the subject called learning theory.
Some years ago, Felipe and I were trying to find something about brain science and artificial intelligence starting from literature on neural nets. It was in this setting that we encountered the beautiful ideas and fast algorithms of learning theory. Eventually we were motivated to write on the mathematical foundations of this new area of science.
I have found this arena, with its new challenges and growing number of applications, to be exciting. For example, the unification of dynamical systems and learning theory is a major problem. Another problem is to develop a comparative study of the useful algorithms currently available and to give unity to these algorithms. How can one talk about the “best algorithm” or find the most appropriate algorithm for a particular task when there are so many desirable features, with their associated trade-offs? How can one see the working of aspects of the human brain and machine vision in the same framework?
I know both authors well. I visited Felipe in Barcelona more than 13 years ago for several months, and when I took a position in Hong Kong in 1995, I asked him to join me. There Lenore Blum, Mike Shub, Felipe, and I finished a book on real computation and complexity. I returned to the USA in 2001, but Felipe continues his job at the City University of Hong Kong. Despite the distance we have continued to write papers together. I came to know Ding-Xuan as a colleague in the math department at City University. We have written…