Can we predict fine wine and alcohol prices? Yes, but it depends on the forecasting horizon. We make this point by considering the Liv-ex Fine Wine 100 and 50 Indices and the retail and wholesale alcohol prices in the United States over the period from January 1992 to March 2022. We use rich and diverse datasets of economic, survey, and financial variables as potential price drivers and adopt several combination/dimension-reduction techniques to extract the most relevant determinants. We build a comprehensive set of models and compare forecast performance across different selling levels and alcohol categories. We show that it is possible to predict fine wine prices at the 2-year horizon and retail/wholesale alcohol prices at horizons ranging from 1 month to 2 years. Our findings stress the importance of including consumer survey data and macroeconomic factors, such as international economic factors and developed-market equity risk factors, to enhance the precision of predictions of retail/wholesale (fine wine) prices.
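A minimal sketch of one dimension-reduction forecasting device of the kind referred to above: a diffusion-index (factor-augmented) regression in which PCA factors extracted from a large predictor panel forecast the price index h steps ahead. The panel, horizon, factor count, and data are illustrative placeholders, not the paper's actual specification.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
T, N, h = 360, 40, 24                           # monthly sample, 40 candidate drivers, 2-year horizon
panel = rng.normal(size=(T, N))                 # economic, survey, and financial predictors (placeholder)
price = np.cumsum(rng.normal(size=T))           # price index level (placeholder)

# Extract a few common factors from the standardized predictor panel
factors = PCA(n_components=3).fit_transform(StandardScaler().fit_transform(panel))

# Regress the h-month-ahead price change on today's factors
X, y = factors[:-h], price[h:] - price[:-h]
model = LinearRegression().fit(X, y)
forecast = model.predict(factors[-1:])          # forecast of the next h-month price change
print(forecast)
```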
In several problems involving fluid flows, computational fluid dynamics (CFD) provides detailed quantitative information and allows the designer to successfully optimize the system by minimizing a cost function. Sometimes, however, one cannot improve the system with CFD alone, because a suitable cost function is not readily available; one notable example is diagnosis in medicine. The application considered here belongs to the field of rhinology; a correct air flow is key for the functioning of the human nose, yet the notion of a functionally normal nose is not available and a cost function cannot be written. An alternative and attractive pathway to diagnosis and surgery planning is offered by data-driven methods. In this work, we consider the machine learning study of nasal impairment caused by anatomic malformations, with the aim of understanding whether fluid dynamic features, available after a CFD analysis, are more effective than purely geometric features at the training of a neural network for regression. Our experiments are carried out on an extremely simplified anatomic model and a correspondingly simple CFD approach; nevertheless, they show that flow-based features perform better than geometry-based ones and allow the training of a neural network with fewer inputs, a crucial advantage in fields like medicine.
Cold rolling involves large deformation of the workpiece, leading to a temperature increase due to plastic deformation. The process is highly nonlinear, and fully modelling it requires large computation times. This paper describes the use of dimension-reduced neural networks (DR-NNs) for predicting temperature changes due to plastic deformation in a two-stage cold rolling process. The main objective of these models is to reduce computational demand, error, and uncertainty in predictions. Material properties, feed velocity, sheet dimensions, and friction models are introduced as inputs for the dimensionality reduction. Different linear and nonlinear dimensionality reduction methods reduce the input space to a smaller set of principal components. The principal components are fed as inputs to the neural networks for predicting the output temperature change. The DR-NNs are compared against a standalone neural network and show improvements in terms of lower computational time and prediction uncertainty.
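A minimal sketch of the DR-NN idea described above: process inputs are compressed with PCA before being fed to a neural network regressor. Feature names, data shapes, and hyperparameters are illustrative only, not those of the paper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))                           # stand-ins for material properties, feed velocity, sheet dimensions, friction parameters
y = X @ rng.normal(size=12) + 0.1 * rng.normal(size=500) # surrogate "temperature change" target

dr_nn = make_pipeline(
    StandardScaler(),
    PCA(n_components=4),                                 # reduce the input space to a few principal components
    MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0),
)
dr_nn.fit(X, y)
print(dr_nn.score(X, y))                                 # R^2 on the training data
```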
Neural network-based learning of the distribution of non-dispatchable renewable electricity generation from sources such as photovoltaics (PV) and wind, as well as of load demands, has recently gained attention. Normalizing flow density models are particularly well suited for this task due to their training through direct log-likelihood maximization. However, research in the field of image generation has shown that standard normalizing flows can only learn smeared-out versions of manifold distributions. Previous works on normalizing flow-based scenario generation do not address this issue, and the smeared-out distributions result in the sampling of noisy time series. In this paper, we exploit the isometry of the principal component analysis (PCA), which sets up the normalizing flow in a lower-dimensional space while maintaining direct and computationally efficient likelihood maximization. We train the resulting principal component flow (PCF) on data of PV and wind power generation as well as load demand in Germany in the years 2013–2015. The results of this investigation show that the PCF preserves critical features of the original distributions, such as the probability density and the frequency behavior of the time series. The application of the PCF is, however, not limited to renewable power generation but extends to any dataset, time series or otherwise, that can be efficiently reduced using PCA.
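A minimal sketch of the principal-component idea described above: data are projected onto an orthonormal PCA basis (an isometry onto the retained subspace), a density model is trained in the reduced space, and samples are mapped back to full-length time series. A Gaussian mixture stands in for the normalizing flow; shapes, component counts, and data are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
T, D, d = 1000, 96, 8                    # e.g. daily profiles with 96 time steps, 8 retained components
X = np.sin(np.linspace(0, 2 * np.pi, D)) * rng.normal(1.0, 0.2, size=(T, 1))   # placeholder profiles

pca = PCA(n_components=d).fit(X)         # orthonormal components: projection is an isometry on the subspace
Z = pca.transform(X)                     # train the density model in the low-dimensional space
density = GaussianMixture(n_components=5, random_state=0).fit(Z)

Z_new, _ = density.sample(10)            # draw low-dimensional scenarios ...
X_new = pca.inverse_transform(Z_new)     # ... and map them back to full-length time series
print(X_new.shape)                       # (10, 96)
```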
Hierarchical density-based spatial clustering of applications with noise (HDBSCAN) and uniform manifold approximation and projection (UMAP), two state-of-the-art algorithms for clustering analysis and dimensionality reduction, respectively, are proposed for the segmentation of core-loss electron energy loss spectroscopy (EELS) spectrum images. The performance of UMAP and HDBSCAN is systematically compared to that of other clustering analysis approaches used in the EELS literature, using a known synthetic dataset; better results are found for the new approaches. Furthermore, UMAP and HDBSCAN are showcased on a real experimental dataset from a core–shell nanoparticle of iron and manganese oxides, as is the triple combination nonnegative matrix factorization–UMAP–HDBSCAN. The results indicate how the complementary use of different combinations may be beneficial in a real-case scenario to attain a complete picture, as different algorithms highlight different aspects of the dataset studied.
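A minimal sketch of the UMAP + HDBSCAN segmentation pipeline, assuming the umap-learn and hdbscan packages; the "spectrum image" here is random placeholder data reshaped to (n_pixels, n_energy_channels), not an actual EELS acquisition.

```python
import numpy as np
import umap
import hdbscan

rng = np.random.default_rng(0)
spectrum_image = rng.random((64, 64, 512))            # hypothetical EELS cube: y, x, energy loss
X = spectrum_image.reshape(-1, spectrum_image.shape[-1])

embedding = umap.UMAP(n_components=2, random_state=0).fit_transform(X)   # dimensionality reduction per pixel spectrum
labels = hdbscan.HDBSCAN(min_cluster_size=50).fit_predict(embedding)     # density-based clustering in the embedding
segmentation = labels.reshape(spectrum_image.shape[:2])                  # per-pixel cluster map
print(np.unique(labels))
```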
Graph Signal Processing (GSP) is a general theory whose goal is to provide tools for the analysis of graph signals that directly generalize Digital Signal Processing (DSP). The goal of this chapter is to understand the graph-spectral properties of these signals, which are typically explained through a linear generative model using graph filters. Are PMU data a graph signal that obeys the linear generative model prevalent in the literature? If so, what kind of graph-filter structure and excitation justifies the properties discussed so far? Can we derive new strategies to sense and process these data based on GSP? By putting the link between PMU data and GSP on the right footing, we can determine to what extent GSP tools are useful and specify how the basic equations can be used to gain theoretical insights that support the observations.
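A minimal sketch of the linear generative model mentioned above: a graph signal is modelled as the output of a graph filter (a polynomial of the graph shift operator) driven by an excitation, and is inspected in the graph Fourier basis. The 4-node graph, filter taps, and excitation are illustrative only.

```python
import numpy as np

A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)     # adjacency matrix of a toy graph
L = np.diag(A.sum(axis=1)) - A                # combinatorial Laplacian used as the graph shift operator
evals, U = np.linalg.eigh(L)                  # eigenvectors form the graph Fourier basis

h = [1.0, 0.5, 0.25]                          # taps of a degree-2 polynomial graph filter
H = sum(c * np.linalg.matrix_power(L, k) for k, c in enumerate(h))

x = np.random.default_rng(0).normal(size=4)   # excitation
y = H @ x                                     # graph signal generated by the linear model
y_hat = U.T @ y                               # graph Fourier transform of the generated signal
print(y_hat)
```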
Part II introduces domain-independent feature extraction methods, and this chapter presents principal component analysis (PCA). We start from its motivation, using an example. Then we gradually discover and develop the PCA algorithm: starting from zero dimensions, then one dimension, and finally the complete algorithm. We analyze its errors in ideal and practical conditions, and establish the equivalence between maximum variance and minimum reconstruction error. Two important issues are also discussed: when we can use PCA, and the relationship between PCA and SVD (singular value decomposition).
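A minimal sketch of the PCA/SVD relationship discussed in this chapter: the principal components of centered data are the right singular vectors of the data matrix, and the explained variances are the squared singular values divided by n - 1. The toy data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # correlated toy data
Xc = X - X.mean(axis=0)                                   # centering is essential for PCA

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt                                           # rows: principal directions
explained_variance = s**2 / (Xc.shape[0] - 1)

# Equivalent route via the eigen-decomposition of the covariance matrix
evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
print(np.allclose(np.sort(evals)[::-1], explained_variance))   # True
```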
This chapter presents a staple of Feature Engineering: the automatic reduction of features, either by direct selection or by projection to a smaller feature space. Central to Feature Engineering are efforts to reduce the number of features, as uninformative features bloat the ML model with unnecessary parameters. In turn, too many parameters either produce suboptimal results, as they are easy to overfit, or require large amounts of training data. These efforts proceed either by explicitly dropping certain features (feature selection) or by mapping the feature vector, if it is sparse, into a lower-dimensional, denser space (dimensionality reduction). We also cover some algorithms that perform feature selection as part of their inner computation (embedded feature selection or regularization). Feature selection takes the spotlight within Feature Engineering due to its intrinsic utility for Error Analysis: some techniques, such as feature ablation using wrapper methods, are used as the starting step before a feature drill-down. Moreover, as feature selection helps build understandable models, it intertwines with Error Analysis, since the analysis profits from such understandable models.
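A minimal sketch contrasting the three routes named above on toy data: explicit feature selection, projection-based dimensionality reduction, and embedded selection via L1 regularization. The dataset, scores, and parameters are illustrative, not taken from the chapter.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=300, n_features=50, n_informative=5, random_state=0)

X_sel = SelectKBest(f_regression, k=5).fit_transform(X, y)   # feature selection: keep original columns
X_proj = PCA(n_components=5).fit_transform(X)                # dimensionality reduction: new dense features

lasso = Lasso(alpha=1.0).fit(X, y)                           # embedded selection / regularization
kept = np.flatnonzero(lasso.coef_)                           # features the model itself retained
print(X_sel.shape, X_proj.shape, kept.size)
```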
When machine learning engineers work with data sets, they may find the results aren't as good as they need. Instead of improving the model or collecting more data, they can use the feature engineering process to help improve results by modifying the data's features to better capture the nature of the problem. This practical guide to feature engineering is an essential addition to any data scientist's or machine learning engineer's toolbox, providing new ideas on how to improve the performance of a machine learning solution. Beginning with the basic concepts and techniques, the text builds up to a unique cross-domain approach that spans data on graphs, texts, time series, and images, with fully worked out case studies. Key topics include binning, out-of-fold estimation, feature selection, dimensionality reduction, and encoding variable-length data. The full source code for the case studies is available on a companion website as Python Jupyter notebooks.
Written by leading authorities in database and Web technologies, this book is essential reading for students and practitioners alike. The popularity of the Web and Internet commerce provides many extremely large datasets from which information can be gleaned by data mining. This book focuses on practical algorithms that have been used to solve key problems in data mining and can be applied successfully to even the largest datasets. It begins with a discussion of the MapReduce framework, an important tool for parallelizing algorithms automatically. The authors explain the tricks of locality-sensitive hashing and stream-processing algorithms for mining data that arrives too fast for exhaustive processing. Other chapters cover the PageRank idea and related tricks for organizing the Web, the problems of finding frequent itemsets, and clustering. This third edition includes new and extended coverage on decision trees, deep learning, and mining social-network graphs.
In this chapter we shall explore the idea of dimensionality reduction in more detail. We begin with a discussion of eigenvalues and their use in “principal component analysis” (PCA). We cover singular-value decomposition, a more powerful version of UV-decomposition. Finally, because we are always interested in the largest data sizes we can handle, we look at another form of decomposition, called CUR-decomposition, which is a variant of singular-value decomposition that keeps the matrices of the decomposition sparse if the original matrix is sparse.
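A minimal sketch of the decompositions covered in this chapter: a truncated SVD and a simplified CUR built from sampled actual rows and columns, so the C and R factors stay as sparse as the original matrix. Sampling here is uniform and the middle matrix is formed with pseudo-inverses for brevity; the chapter's CUR uses norm-based importance sampling.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.random((100, 80)) * (rng.random((100, 80)) < 0.05)   # sparse-ish toy matrix
k = 10

U, s, Vt = np.linalg.svd(M, full_matrices=False)
M_svd = U[:, :k] * s[:k] @ Vt[:k]                            # rank-k SVD approximation (dense factors)

cols = rng.choice(M.shape[1], size=k, replace=False)
rows = rng.choice(M.shape[0], size=k, replace=False)
C, R = M[:, cols], M[rows, :]                                # actual columns/rows: sparsity is preserved
Umid = np.linalg.pinv(C) @ M @ np.linalg.pinv(R)             # small middle matrix linking C and R
M_cur = C @ Umid @ R

print(np.linalg.norm(M - M_svd), np.linalg.norm(M - M_cur))  # approximation errors of the two factorizations
```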
With the development of remote sensing and geostatistical technology, complex environmental variables are increasingly easily quantified and applied in modelling soil organic carbon (SOC). However, this emphasizes data redundancy and multicollinearity problems, adding to the difficulty of selecting the dominant influential auxiliary variables and to the uncertainty in estimating SOC stocks. The current paper considers the spatial characteristics of SOC density (SOCD) to construct prediction models of SOCD after reducing the data dimensionality and complexity with the principal component analysis (PCA) method. A total of 260 topsoil samples were collected from Chahe town, China. Eight environmental variables (elevation, aspect, slope, normalized difference vegetation index, normalized difference moisture index, nearest distances to construction area and road, and land use degree comprehensive index) were pre-analysed by PCA and then extracted as the main principal component variables to construct prediction models. Two geostatistical approaches (ordinary kriging and ordinary co-kriging) and two regression approaches (ordinary least squares and geographically weighted regression (GWR)) were used to estimate SOCD. Results showed that PCA played an important role in reducing the redundancy and multicollinearity of the auxiliary variables, and GWR achieved the highest prediction accuracy among the four models. GWR considered not only the spatial characteristics of SOCD but also the valuable information of the related auxiliary attributes. In summary, PCA-GWR is a promising spatial method for predicting SOC stocks.
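A minimal sketch of the covariate pre-processing described above: the correlated environmental variables are compressed with PCA and the leading components are used as predictors. Ordinary least squares (one of the paper's four models) stands in for GWR here, and the variable values and target are random placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 260                                          # number of topsoil samples
covariates = rng.normal(size=(n, 8))             # stand-ins for elevation, aspect, slope, NDVI, NDMI, distances, land-use index
socd = covariates[:, :3].sum(axis=1) + rng.normal(scale=0.5, size=n)   # surrogate SOC density

Z = StandardScaler().fit_transform(covariates)
pcs = PCA(n_components=3).fit_transform(Z)       # main principal components of the auxiliary variables

model = LinearRegression().fit(pcs, socd)        # regression on the components instead of the raw covariates
print(model.score(pcs, socd))
```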
A numerical time-stepping algorithm for ordinary or partial differential equations is proposed that adaptively modifies the dimensionality of the underlying modal basis expansion. Specifically, the method takes advantage of any underlying low-dimensional manifolds or subspaces in the system by using dimensionality-reduction techniques, such as the proper orthogonal decomposition, to adaptively represent the solution in the optimal basis modes. The method can provide significant computational savings for systems where low-dimensional manifolds are present, since the reduction can lower the dimensionality of the underlying high-dimensional system by orders of magnitude. A comparison of the computational efficiency and error of this method is given, showing the algorithm to be potentially of great value for high-dimensional dynamical systems simulations, especially where slow-manifold dynamics are known to arise. The method is envisioned to automatically take advantage of any potential computational saving associated with dimensionality reduction, much as adaptive time-steppers automatically take advantage of large step sizes whenever possible.
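A minimal sketch of the underlying idea for a linear test system: snapshots of the full state are used to build a POD (SVD) basis, the dynamics are projected onto the leading modes, and time stepping continues in the low-dimensional coordinates. The adaptive re-selection of the basis during the run, which is the paper's main contribution, is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, dt = 200, 5, 1e-3
A = -np.eye(n) + 0.01 * rng.normal(size=(n, n))          # toy high-dimensional linear system dx/dt = A x
x = rng.normal(size=n)

snapshots = []
for _ in range(100):                                     # short full-order run to gather snapshots
    x = x + dt * (A @ x)                                 # explicit Euler step in the full space
    snapshots.append(x.copy())

U, s, _ = np.linalg.svd(np.array(snapshots).T, full_matrices=False)
Phi = U[:, :r]                                           # POD basis: leading modes of the snapshot matrix
Ar = Phi.T @ A @ Phi                                     # reduced operator
a = Phi.T @ x                                            # reduced state

for _ in range(1000):                                    # continue time stepping in the reduced space
    a = a + dt * (Ar @ a)

print(np.linalg.norm(Phi @ a))                           # approximate full state is reconstructed as Phi @ a
```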
The geometric measurement of parts using a coordinate measuring machine (CMM) has been generally adopted in the advanced automotive and aerospace industries. However, for the geometric inspection of deformable free-form parts, special inspection fixtures, in combination with CMMs and/or optical data acquisition devices (scanners), are used. As a result, the geometric inspection of flexible parts is a time-consuming and costly process. The general procedure to eliminate the use of inspection fixtures, based on a distance-preserving nonlinear dimensionality reduction (NLDR) technique, was developed in our previous works, in which we sought out geometric properties that are invariant to inelastic deformations. In this paper we present a systematic comparison of some well-known dimensionality reduction techniques in order to evaluate their accuracy and potential for non-rigid metrology. We demonstrate that even though these techniques may provide acceptable results on artificial data in certain fields like pattern recognition and machine learning, this performance cannot be extended to all real engineering metrology problems where high accuracy is needed.
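A minimal sketch of the kind of comparison described above, run on a standard synthetic manifold rather than scanned part geometry: several well-known nonlinear dimensionality reduction techniques are applied to the same point cloud so that their embeddings can be compared. The methods and parameters shown are illustrative and not the set evaluated in the paper.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap, LocallyLinearEmbedding, MDS

X, _ = make_swiss_roll(n_samples=1000, random_state=0)   # artificial benchmark point cloud

methods = {
    "Isomap": Isomap(n_neighbors=12, n_components=2),
    "LLE": LocallyLinearEmbedding(n_neighbors=12, n_components=2, random_state=0),
    "MDS": MDS(n_components=2, random_state=0),
}
for name, method in methods.items():
    Y = method.fit_transform(X)                          # 2D embedding produced by each technique
    print(name, Y.shape)
```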
We studied the temporal dynamics of shape representation in area V4 of the alert macaque monkey. Analyses were based on two large stimulus sets, one equivalent to the 2D shape stimuli used in a previous study of V2, and the other a set of stereoscopic 3D shape stimuli. As in V2, we found that information conveyed by individual V4 neurons about the stimuli tended to be maximal during the initial transient response and generally lower, albeit statistically significant, afterwards. The population response was substantially correlated from one stimulus to the next during the transients, and decorrelated as responses decayed. V4 responses showed significantly longer latencies than in V2, especially for the 3D stimulus set. Recordings from area V1 in a single animal revealed temporal dynamic patterns in response to the 2D shape stimuli that were largely similar to those in V2 and V4. Together with earlier results, these findings provide evidence for a distributed process of coarse-to-fine representation of shape stimuli in the visual cortex.