NN models with more hidden layers than the traditional NN are referred to as deep neural network (DNN) or deep learning (DL) models, which are now widely used in environmental science. For image data, the convolutional neural network (CNN) has been developed: in its convolutional layers, a neuron is connected only to a small patch of neurons in the preceding layer, thereby greatly reducing the number of model weights. Popular DNN architectures include the encoder-decoder and U-net models. For time series modelling, the long short-term memory (LSTM) network and the temporal convolutional network have been developed. The generative adversarial network (GAN) produces highly realistic synthetic data.
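As a minimal sketch of the convolutional idea (assuming TensorFlow/Keras is available; the layer sizes and the 32×32 single-channel input are illustrative, not from the text):

```python
# Minimal CNN sketch in Keras (illustrative architecture, not from the text).
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    # Each convolutional neuron sees only a 3x3 patch of the previous layer,
    # so far fewer weights are needed than in a fully connected layer.
    layers.Conv2D(16, (3, 3), activation="relu", input_shape=(32, 32, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),  # e.g. 10 output classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```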
Principal component analysis (PCA), a classical method for reducing the dimensionality of multivariate datasets, linearly combines the variables to generate new uncorrelated variables that maximize the amount of variance captured. Rotation of the PCA modes is commonly performed to provide more meaningful interpretation. Canonical correlation analysis (CCA) is a generalization of correlation (for two variables) to two groups of variables, with CCA finding modes of maximum correlation between the two groups. Instead of maximum correlation, maximum covariance analysis extracts modes with maximum covariance.
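A minimal PCA sketch in NumPy, using an eigen-decomposition of the covariance matrix (the synthetic data and mixing matrix below are illustrative):

```python
import numpy as np

# x: n samples of p correlated variables (synthetic example)
rng = np.random.default_rng(0)
x = rng.standard_normal((500, 3)) @ np.array([[2.0, 0.5, 0.0],
                                              [0.5, 1.0, 0.3],
                                              [0.0, 0.3, 0.5]])
xc = x - x.mean(axis=0)                  # centre the variables
cov = np.cov(xc, rowvar=False)           # covariance matrix
eigval, eigvec = np.linalg.eigh(cov)     # eigen-decomposition
order = np.argsort(eigval)[::-1]         # sort modes by variance explained
eigval, eigvec = eigval[order], eigvec[:, order]
pcs = xc @ eigvec                        # principal components (uncorrelated)
print("fraction of variance captured by mode 1:", eigval[0] / eigval.sum())
```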
Forecast verification evaluates the quality of the forecasts made by a model, using a variety of forecast scores developed for binary classes, multiple classes, continuous variables and probabilistic forecasts. Skill scores estimate a model’s skill relative to a reference model or benchmark. Problems such as spurious skill and extrapolation on new data are discussed. Model bias in the output of numerical models is alleviated by post-processing methods, while low spatial resolution output from numerical models is enhanced by downscaling methods, especially in climate change studies.
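As a hedged illustration of a skill score, the sketch below computes the mean-squared-error skill score of a forecast against a climatological reference (all numbers are made up):

```python
import numpy as np

obs = np.array([1.2, 0.8, 1.5, 2.0, 1.1])     # observed values (made up)
fcst = np.array([1.0, 0.9, 1.4, 1.8, 1.3])    # model forecasts (made up)
ref = np.full_like(obs, obs.mean())           # reference: climatological mean

mse_fcst = np.mean((fcst - obs) ** 2)
mse_ref = np.mean((ref - obs) ** 2)
skill_score = 1.0 - mse_fcst / mse_ref        # 1 = perfect, 0 = no better than reference
print(f"MSE skill score: {skill_score:.3f}")
```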
Many machine learning methods require non-linear optimization, performed by the backward propagation of model errors, with the process complicated by the presence of multiple minima and saddle points. Numerous gradient descent algorithms are available for optimization, including stochastic gradient descent, conjugate gradient, quasi-Newton and non-linear least squares methods such as Levenberg-Marquardt. In contrast to deterministic optimization, stochastic optimization methods repeatedly introduce randomness during the search process to avoid getting trapped in a local minimum. Evolutionary algorithms, borrowing concepts from biological evolution to solve optimization problems, include the genetic algorithm and differential evolution.
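A minimal sketch of stochastic gradient descent, here fitting a straight line by descending the mean-squared-error gradient on random mini-batches (the learning rate and batch size are illustrative choices):

```python
import numpy as np

# Stochastic gradient descent on a simple least-squares problem:
# fit y = w*x + b by descending the MSE gradient on random mini-batches.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = 3.0 * x + 1.0 + 0.1 * rng.standard_normal(200)   # true w=3, b=1

w, b, lr = 0.0, 0.0, 0.1
for epoch in range(100):
    idx = rng.permutation(200)
    for batch in np.array_split(idx, 10):            # mini-batches of 20
        err = (w * x[batch] + b) - y[batch]
        w -= lr * 2 * np.mean(err * x[batch])        # dMSE/dw
        b -= lr * 2 * np.mean(err)                   # dMSE/db
print(f"estimated w={w:.2f}, b={b:.2f}")
```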
Under supervised learning, when the output variable is discrete or categorical instead of continuous, one has a classification problem instead of a regression problem. Several classification methods are covered: linear discriminant analysis, logistic regression, naive Bayes classifier, K-nearest neighbours, extreme learning machine classifier and multi-layer perceptron classifier. In classification, the cross-entropy objective function is often used in place of the mean squared error function.
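As an illustrative sketch (assuming scikit-learn), logistic regression is fitted to synthetic data and its cross-entropy (log loss) evaluated, with a manual check of the same quantity:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

# Binary classification with logistic regression; the cross-entropy
# (log loss) replaces mean squared error as the objective function.
X, y = make_classification(n_samples=300, n_features=4, random_state=0)
clf = LogisticRegression().fit(X, y)
p = clf.predict_proba(X)[:, 1]                # predicted class probabilities
print("cross-entropy (log loss):", log_loss(y, p))

# Manual check of the same quantity:
eps = 1e-15
p = np.clip(p, eps, 1 - eps)
print("manual:", -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))
```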
A decision tree is a tree-like model of decisions and their consequences, with classification and regression tree (CART) being the most commonly used. Being simple models, decision trees are considered ‘weak learners’ relative to more complex and more accurate models. By using a large ensemble of weak learners, methods such as random forest can compete well against strong learners such as neural networks. An alternative to random forest is boosting. While random forest constructs all the trees independently, boosting constructs one tree at a time, at each step building a weak learner that improves on the previous one.
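A hedged sketch comparing the two ensemble strategies with scikit-learn (the dataset and hyperparameters are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Random forest: many trees grown independently on bootstrap samples.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
# Boosting: shallow trees grown sequentially, each correcting the last.
gb = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=0)

for name, model in [("random forest", rf), ("boosting", gb)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```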
Under time series analysis, one proceeds from Fourier analysis to the design of window functions, then spectral analysis (e.g. computing the spectrum, the cross-spectrum between two time series, wavelets, etc.) and the filtering of frequency bands. The principal component analysis method can be turned into a spectral method known as singular spectrum analysis. Auto-regressive processes and Box-Jenkins models are also covered.
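A minimal spectral-analysis sketch in NumPy: a Hann window is applied to a noisy sine wave before computing the periodogram (the signal parameters are illustrative):

```python
import numpy as np

# Periodogram of a noisy sine wave, with a Hann window applied to
# reduce spectral leakage before the Fourier transform.
n, dt = 512, 1.0                       # samples, time step (arbitrary units)
t = np.arange(n) * dt
x = np.sin(2 * np.pi * 0.1 * t) + 0.5 * np.random.default_rng(0).standard_normal(n)

w = np.hanning(n)                      # Hann window (taper)
xw = (x - x.mean()) * w
spec = np.abs(np.fft.rfft(xw)) ** 2    # power spectrum
freq = np.fft.rfftfreq(n, dt)
print("dominant frequency:", freq[np.argmax(spec[1:]) + 1])  # expect ~0.1
```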
As probability distributions form the cornerstone of statistics, a survey is made of the common families of distributions, including the binomial distribution, Poisson distribution, multinomial distribution, Gaussian distribution, gamma distribution, beta distribution, von Mises distribution, extreme value distributions, t-distribution and chi-squared distribution. Other topics include maximum likelihood estimation, Gaussian mixtures and kernel density estimation.
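As an illustrative sketch (assuming SciPy), a gamma distribution is fitted by maximum likelihood to synthetic data, followed by a kernel density estimate:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=1.5, size=1000)   # synthetic gamma-distributed data

# Maximum likelihood fit of a gamma distribution (location fixed at 0).
a_hat, loc, scale_hat = stats.gamma.fit(x, floc=0)
print(f"MLE shape={a_hat:.2f}, scale={scale_hat:.2f}")   # expect ~2.0, ~1.5

# Non-parametric alternative: kernel density estimation.
kde = stats.gaussian_kde(x)
print("KDE density at x=3:", kde(3.0)[0])
```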
Inspired by the human brain, neural network (NN) models have emerged as the dominant branch of machine learning, with the multi-layer perceptron (MLP) model being the most popular. Non-linear optimization and the presence of local minima during optimization have led to interest in other NN architectures that require only linear least squares optimization, e.g. extreme learning machines (ELM) and radial basis function (RBF) networks. Such models readily adapt to online learning, where a model can be updated inexpensively as new data arrive continually. Applications of NN to predict conditional distributions (by the conditional density network and the mixture density network) and to perform quantile regression are also covered.
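A minimal MLP regression sketch using scikit-learn (the network size and training settings are illustrative):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Fit a small multi-layer perceptron to a noisy non-linear function.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (400, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(400)

mlp = MLPRegressor(hidden_layer_sizes=(20,), activation="tanh",
                   max_iter=5000, random_state=0)
mlp.fit(X, y)
print("prediction at x=1.0:", mlp.predict([[1.0]])[0])   # ~sin(1) = 0.84
```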
A review of basic probability theory – probability density, expectation, mean, variance/covariance, median, median absolute deviation, quantiles, skewness/kurtosis and correlation – is first given. Exploratory data analysis methods (histograms, quantile-quantile plots and boxplots) are then introduced. Finally, topics including the Mahalanobis distance, Bayes’ theorem, classification, clustering and information theory are covered.
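A brief sketch of some of these quantities in NumPy, including the Mahalanobis distance of a point from the sample centre (the bivariate sample is synthetic):

```python
import numpy as np

# Basic exploratory statistics and the Mahalanobis distance of a point
# from the centre of a bivariate sample.
rng = np.random.default_rng(0)
x = rng.multivariate_normal([0, 0], [[2.0, 1.0], [1.0, 1.5]], size=500)

print("median:", np.median(x, axis=0))
print("quartiles of variable 1:", np.quantile(x[:, 0], [0.25, 0.5, 0.75]))

mu = x.mean(axis=0)
vi = np.linalg.inv(np.cov(x, rowvar=False))    # inverse covariance matrix
d = x[0] - mu
print("Mahalanobis distance of first point:", np.sqrt(d @ vi @ d))
```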
Simple linear regression is extended to multiple linear regression (for multiple predictor variables) and to multivariate linear regression (for multiple response variables). Regression with circular data and/or categorical data is covered. Predictor selection and ways to avoid overfitting, using techniques such as ridge regression and lasso, are followed by quantile regression. The assumption of Gaussian noise or residuals is removed in generalized least squares, with applications to optimal fingerprinting in climate change.
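An illustrative comparison of ordinary least squares, ridge and lasso with scikit-learn on nearly collinear predictors (the penalty strengths are arbitrary choices):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# The penalty shrinks coefficients, and lasso can set some exactly to zero.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
X[:, 1] = X[:, 0] + 0.01 * rng.standard_normal(100)   # nearly collinear pair
y = 2.0 * X[:, 0] + X[:, 2] + 0.1 * rng.standard_normal(100)

for name, model in [("OLS", LinearRegression()),
                    ("ridge", Ridge(alpha=1.0)),
                    ("lasso", Lasso(alpha=0.1))]:
    model.fit(X, y)
    print(name, np.round(model.coef_, 2))
```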
The historical development of statistics and artificial intelligence (AI) is outlined, with machine learning (ML) emerging as the dominant branch of AI. Data science is viewed as being composed of a yin part (ML) and a yang part (statistics), and environmental data science is the intersection between data science and environmental science. Supervised learning and unsupervised learning are compared. Basic concepts of underfitting/overfitting and the curse of dimensionality are introduced.
From observed data, statistical inference infers the properties of the underlying probability distribution. For hypothesis testing, the t-test and some non-parametric alternatives are covered. Ways to infer confidence intervals and estimate goodness of fit are followed by the F-test (for comparing variances) and the Mann-Kendall trend test. Bootstrap sampling and field significance are also covered.
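A hedged sketch (assuming SciPy) of a two-sample t-test and a bootstrap confidence interval, on synthetic samples:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 50)          # sample 1
b = rng.normal(0.5, 1.0, 50)          # sample 2 with shifted mean

# Two-sample t-test for a difference in means.
t, p = stats.ttest_ind(a, b)
print(f"t = {t:.2f}, p = {p:.4f}")

# Bootstrap 95% confidence interval for the mean of sample 1.
boot = np.array([rng.choice(a, size=a.size, replace=True).mean()
                 for _ in range(10000)])
print("bootstrap 95% CI:", np.percentile(boot, [2.5, 97.5]))
```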