A method is proposed that combines dimension reduction and cluster analysis for categorical data by simultaneously assigning individuals to clusters and optimal scaling values to categories in such a way that a single between-cluster variance maximization objective is achieved. In a unified framework, a brief review of alternative methods is provided, and we show that the proposed method is equivalent to GROUPALS applied to categorical data. Performance of the methods is appraised by means of a simulation study. The results of the joint dimension reduction and clustering methods are compared with the so-called tandem approach, a sequential analysis of dimension reduction followed by cluster analysis. The tandem approach is conjectured to perform worse when variables are added that are unrelated to the cluster structure. Our simulation study confirms this conjecture. Moreover, the results of the simulation study indicate that the proposed method also consistently outperforms alternative joint dimension reduction and clustering methods.
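As a point of reference for the comparison above, here is a minimal sketch of the tandem baseline (dimension reduction first, clustering second) on one-hot encoded categorical data, using scikit-learn; the data, component count and cluster count are placeholders, and this is not the proposed joint method, which optimizes the scaling and the cluster assignment under a single criterion.

```python
# Tandem baseline sketch: reduce dimension first, then cluster the scores.
# The joint methods reviewed in the paper optimize both steps under one objective instead.
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def tandem_clustering(df_categorical, n_components=2, n_clusters=3, seed=0):
    """One-hot encode the categories, reduce with PCA, then cluster the scores."""
    X = OneHotEncoder().fit_transform(df_categorical).toarray()
    scores = PCA(n_components=n_components).fit_transform(X)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(scores)
    return scores, labels

# Synthetic categorical data for illustration only
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "v1": rng.choice(["a", "b", "c"], size=200),
    "v2": rng.choice(["x", "y"], size=200),
})
scores, labels = tandem_clustering(df)
```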
PCA is a popular tool for exploring and summarizing multivariate data, especially data consisting of many variables. PCA is, however, often not simple to interpret, because the components are linear combinations of the variables. To address this issue, numerous methods have been proposed to reduce the number of nonzero coefficients in the components, including rotation-thresholding methods and, more recently, PCA methods subject to sparsity-inducing penalties or constraints. Here, we offer guidelines on how to choose among the different sparse PCA methods. The current literature lacks clear guidance on the properties and performance of the different sparse PCA methods, often relying on the misconception that the equivalence of the formulations for ordinary PCA also holds for sparse PCA. To guide potential users of sparse PCA methods, we first discuss several popular sparse PCA methods in terms of whether sparseness is imposed on the loadings or on the weights, the assumed model, and the optimization criterion used to impose sparseness. Second, using an extensive simulation study, we assess each of these methods by means of performance measures such as squared relative error, misidentification rate, and percentage of explained variance for several data-generating models and conditions for the population model. Finally, two examples using empirical data are considered.
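As a rough illustration of one penalty-based formulation, the sketch below contrasts ordinary PCA with scikit-learn's SparsePCA, which places an L1 penalty on the component loadings; the data and the penalty strength are placeholders, and the other formulations discussed above differ in where and how sparseness is imposed.

```python
# Sparse PCA via an L1 penalty on the loadings (one of several possible formulations).
import numpy as np
from sklearn.decomposition import PCA, SparsePCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))                 # placeholder data matrix

pca = PCA(n_components=3).fit(X)
spca = SparsePCA(n_components=3, alpha=1.0, random_state=0).fit(X)

# Ordinary PCA loadings are dense; the penalty drives many sparse-PCA loadings to exactly zero.
print("nonzero loadings, PCA :", np.count_nonzero(pca.components_))
print("nonzero loadings, SPCA:", np.count_nonzero(spca.components_))
```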
In this paper we discuss the use of a recent dimension reduction technique called Locally Linear Embedding, introduced by Roweis and Saul, for performing an exploratory latent structure analysis. The coordinate variables from the locally linear embedding describing the manifold on which the data reside serve as the latent variable scores. We propose the use of semiparametric penalized spline methods for reconstruction of the manifold equations that approximate the data space. We also discuss a cross-validation strategy that can guide the selection of an appropriate number of latent variables. Synthetic as well as real data sets are used to illustrate the proposed approach. A nonlinear latent structure representation of a data set also serves as a data visualization tool.
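A minimal sketch of the embedding step, assuming scikit-learn's LocallyLinearEmbedding and a synthetic swiss-roll data set; the penalized-spline reconstruction of the manifold equations and the cross-validation step are not shown.

```python
# Locally Linear Embedding: use the embedding coordinates as latent variable scores.
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1000, random_state=0)      # synthetic manifold data
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, random_state=0)
latent_scores = lle.fit_transform(X)                         # coordinates on the estimated manifold
print(latent_scores.shape)                                   # (1000, 2)
```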
Visual place recognition (VPR) in condition-varying environments is still an open problem. Popular solutions are convolutional neural network (CNN)-based image descriptors, which have been shown to outperform traditional image descriptors based on hand-crafted visual features. However, current CNN-based descriptors have two drawbacks: (a) high dimensionality and (b) a lack of generalization, leading to low efficiency and poor performance in real robotic applications. In this paper, we propose to use a convolutional autoencoder (CAE) to tackle this problem. We employ a high-level layer of a pre-trained CNN to generate features and train a CAE to map the features to a low-dimensional space, improving the condition invariance of the descriptor and reducing its dimension at the same time. We verify our method on four challenging real-world datasets involving significant illumination changes, and our method is shown to be superior to the state of the art. The code of our work is publicly available at https://github.com/MedlarTea/CAE-VPR.
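The linked repository contains the authors' implementation; the following is only an independent, rough sketch of the idea in PyTorch, compressing feature maps from a high-level layer of a pre-trained CNN with a small convolutional autoencoder. The backbone choice, layer sizes and training objective here are assumptions, not the paper's architecture.

```python
# Rough sketch: compress pre-trained CNN feature maps with a convolutional autoencoder.
# Not the authors' code (see the linked repository); backbone and sizes are assumptions.
import torch
import torch.nn as nn
import torchvision.models as models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)   # torchvision >= 0.13
feature_extractor = nn.Sequential(*list(backbone.children())[:-2])    # up to the last conv block
feature_extractor.eval()

class ConvAutoencoder(nn.Module):
    def __init__(self, in_channels=512):
        super().__init__()
        # Encoder maps 7x7 feature maps down to a compact code (the place descriptor).
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        # Decoder reconstructs the original feature maps (assumes 7x7 inputs).
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 128, kernel_size=3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, in_channels, kernel_size=3, stride=2, padding=1),
        )

    def forward(self, x):
        z = self.encoder(x)              # low-dimensional code; flatten z to use as the descriptor
        return self.decoder(z), z

cae = ConvAutoencoder()
images = torch.randn(4, 3, 224, 224)                      # placeholder image batch
with torch.no_grad():
    feats = feature_extractor(images)                     # (4, 512, 7, 7) for ResNet-18 at 224x224
recon, descriptor = cae(feats)
loss = nn.functional.mse_loss(recon, feats)               # reconstruction objective for training the CAE
```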
Large data sets are difficult to grasp. To make progress, we often seek a few quantities that capture as much of the information in the data as possible. In this chapter, we discuss a procedure called Principal Component Analysis (PCA), also called Empirical Orthogonal Function (EOF) analysis, which finds the components that minimize the sum of squared differences between the components and the data. The components are ordered such that the first approximates the data best (in a least squares sense), the second approximates the data best among all components orthogonal to the first, and so on. In typical climate applications, a principal component consists of two parts: (1) a fixed spatial structure, called an Empirical Orthogonal Function (EOF), and (2) its time-dependent amplitude, called a PC time series. The EOFs are orthogonal and the PC time series are uncorrelated. Principal components often are used as input to other analyses, such as linear regression, canonical correlation analysis, predictable components analysis, or discriminant analysis. The procedure for performing area-weighted PCA is discussed in detail in this chapter.
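A minimal sketch of area-weighted PCA on a gridded anomaly field, assuming NumPy and a placeholder data array: each grid point is weighted by the square root of the cosine of its latitude, an SVD is taken, and each component splits into an EOF pattern and a PC time series.

```python
# Area-weighted PCA/EOF sketch: time x grid anomaly field, sqrt(cos(latitude)) weighting.
import numpy as np

ntime, nlat, nlon = 120, 30, 60
rng = np.random.default_rng(0)
field = rng.normal(size=(ntime, nlat, nlon))              # placeholder anomaly data

lats = np.linspace(-87, 87, nlat)
weights = np.sqrt(np.cos(np.deg2rad(lats)))               # area weighting by latitude
X = (field * weights[None, :, None]).reshape(ntime, -1)   # weighted, flattened to time x space
X -= X.mean(axis=0)                                       # anomalies about the time mean

U, s, Vt = np.linalg.svd(X, full_matrices=False)
eofs = Vt                                                 # rows: orthogonal spatial patterns (EOFs)
pcs = U * s                                               # columns: uncorrelated PC time series
explained = s**2 / np.sum(s**2)                           # fraction of variance per component
print(explained[:3])
```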
Dietary pattern analysis is typically based on dimension reduction and summarises the diet with a small number of scores. We assess ‘joint and individual variance explained’ (JIVE) as a method for extracting dietary patterns from longitudinal data that highlights elements of the diet that are associated over time. The Auckland Birthweight Collaborative Study, in which participants completed an FFQ at ages 3·5 (n 549), 7 (n 591) and 11 (n 617), is used as an example. Data from each time point are projected onto the directions of shared variability produced by JIVE to yield dietary patterns and scores. We assess the ability of the scores to predict future BMI and blood pressure measurements of the participants and make a comparison with principal component analysis (PCA) performed separately at each time point. The diet could be summarised with three JIVE patterns. The patterns were interpretable, with the same interpretation across age groups: a vegetable and whole grain pattern, a sweets and meats pattern and a cereal v. sweet drinks pattern. The first two PCA-derived patterns were similar across age groups and similar to the first two JIVE patterns. The interpretation of the third PCA pattern changed across age groups. Scores produced by the two techniques were similarly effective in predicting future BMI and blood pressure. We conclude that when data from the same participants at multiple ages are available, JIVE provides an advantage over PCA by extracting patterns with a common interpretation across age groups.
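For orientation only, here is a much-simplified sketch of estimating directions of shared variability across data blocks in the spirit of JIVE/AJIVE: per-block PCA bases in sample space are stacked and a second SVD extracts their common directions. The ranks, the synthetic data and the final projection step are placeholders and do not reproduce the estimation procedure used in the paper.

```python
# Much-simplified sketch of shared (joint) structure across data blocks, in the spirit
# of JIVE/AJIVE; ranks and data are placeholders, not the paper's estimation procedure.
import numpy as np

def joint_subspace(blocks, block_ranks, joint_rank):
    """blocks: list of (n_samples x p_k) arrays sharing the same participants."""
    bases = []
    for X, r in zip(blocks, block_ranks):
        Xc = X - X.mean(axis=0)
        U, _, _ = np.linalg.svd(Xc, full_matrices=False)
        bases.append(U[:, :r])                        # per-block basis in sample space
    stacked = np.concatenate(bases, axis=1)
    S, _, _ = np.linalg.svd(stacked, full_matrices=False)
    return S[:, :joint_rank]                          # shared directions in sample space

rng = np.random.default_rng(0)
shared = rng.normal(size=(600, 2))                    # common signal across "time points"
blocks = [shared @ rng.normal(size=(2, p)) + 0.5 * rng.normal(size=(600, p)) for p in (25, 30, 28)]

S = joint_subspace(blocks, block_ranks=[5, 5, 5], joint_rank=2)
patterns = [(X - X.mean(axis=0)).T @ S for X in blocks]                         # pattern-like loadings per block
scores = [(X - X.mean(axis=0)) @ (P / np.linalg.norm(P, axis=0)) for X, P in zip(blocks, patterns)]
```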
In this work we derive, by $\Gamma$-convergence techniques, a model for brittle fracture in linearly elastic plates. Precisely, we start from a brittle linearly elastic thin film with positive thickness $\rho$ and study the limit as $\rho$ tends to $0$. The analysis is performed with no a priori restrictions on the admissible displacements and on the geometry of the fracture set. The limit model is characterized by a Kirchhoff-Love type of structure.
Data are not only ubiquitous in society, but are increasingly complex both in size and dimensionality. Dimension reduction offers researchers and scholars the ability to make such complex, high dimensional data spaces simpler and more manageable. This Element offers readers a suite of modern unsupervised dimension reduction techniques along with hundreds of lines of R code, to efficiently represent the original high dimensional data space in a simplified, lower dimensional subspace. Launching from the earliest dimension reduction technique, principal components analysis, and using real social science data, I introduce and walk readers through application of the following techniques: locally linear embedding, t-distributed stochastic neighbor embedding (t-SNE), uniform manifold approximation and projection, self-organizing maps, and deep autoencoders. The result is a well-stocked toolbox of unsupervised algorithms for tackling the complexities of high dimensional data so common in modern society. All code is publicly accessible on GitHub.
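The Element's own code is in R; as a quick cross-language illustration of one of the listed techniques, here is a minimal t-SNE embedding with scikit-learn on a stand-in data set (the data set and perplexity are placeholders).

```python
# Quick illustration of one listed technique (t-SNE); the Element's own worked code is in R.
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)                 # 64-dimensional digit images as a stand-in
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(embedding.shape)                              # (1797, 2): low-dimensional representation
```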
Starting from three-dimensional non-linear elasticity under the restriction of incompressibility, we derive reduced models to capture the behaviour of strings in response to external forces. Our Γ-convergence analysis of the constrained energy functionals in the limit of shrinking cross-sections gives rise to explicit one-dimensional limit energies. The latter depend on the scaling of the applied forces. The effect of local volume preservation is reflected either in their energy densities through a constrained minimization over the cross-section variables or in the class of admissible deformations. Interestingly, all scaling regimes allow for compression and/or stretching of the string. The main difficulty in the proof of the Γ-limit is to establish recovery sequences that accommodate the non-linear differential constraint imposed by the incompressibility. To this end, we modify classical constructions in the unconstrained case with the help of an inner perturbation argument tailored for 3d-1d dimension reduction problems.
Motivated by the idea of turbomachinery active subspace performance maps, this paper studies dimension reduction in turbomachinery 3D CFD simulations. First, we show that these subspaces exist across different blades—under the same parametrisation—largely independent of their Mach number or Reynolds number. This is demonstrated via a numerical study on three different blades. Then, in an attempt to reduce the computational cost of identifying a suitable dimension reducing subspace, we examine statistical sufficient dimension reduction methods, including sliced inverse regression, sliced average variance estimation, principal Hessian directions and contour regression. Unsatisfied by these results, we evaluate a new idea based on polynomial variable projection—a non-linear least-squares problem. Our results using polynomial variable projection clearly demonstrate that one can accurately identify dimension reducing subspaces for turbomachinery functionals at a fraction of the cost associated with prior methods. We apply these subspaces to the problem of comparing design configurations across different flight points on a working line of a fan blade. We demonstrate how designs that offer a healthy compromise between performance at cruise and sea-level conditions can be easily found by visually inspecting their subspaces.
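For concreteness, here is a from-scratch sketch of one of the sufficient dimension reduction methods mentioned above, sliced inverse regression, on placeholder input/output data; the polynomial variable projection approach the paper favours is not shown.

```python
# Minimal sliced inverse regression (SIR): whiten the inputs, slice the response,
# average the whitened inputs within each slice, and eigendecompose the covariance
# of the slice means to find dimension-reducing directions.
import numpy as np

def sir_directions(X, y, n_slices=10, n_directions=2):
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    W = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T   # whitening matrix (cov^{-1/2})
    Z = Xc @ W
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)              # slice observations on the response
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)               # weighted covariance of slice means
    vals, vecs = np.linalg.eigh(M)
    return W @ vecs[:, ::-1][:, :n_directions]              # map leading directions back

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                                            # placeholder design variables
y = np.sin(X[:, 0] + 0.5 * X[:, 1]) + 0.05 * rng.normal(size=500)        # placeholder functional output
B = sir_directions(X, y)       # columns span an estimated dimension-reducing subspace
```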
One of today’s most propitious immersive technologies is virtual reality (VR). This term is colloquially associated with headsets that transport users to a bespoke, built-for-purpose immersive 3D virtual environment. It has given rise to the field of immersive analytics—a new field of research that aims to use immersive technologies for enhancing and empowering data analytics. However, in developing such a new set of tools, one has to ask whether the move from standard hardware setup to a fully immersive 3D environment is justified—both in terms of efficiency and development costs. To this end, in this paper, we present AeroVR—an immersive aerospace design environment with the objective of aiding the component aerodynamic design process by interactively visualizing performance and geometry. We decompose the design of such an environment into function structures, identify the primary and secondary tasks, present an implementation of the system, and verify the interface in terms of usability and expressiveness. We deploy AeroVR on a prototypical design study of a compressor blade for an engine.
We present an application of statistical graphical models to simulate economic variables for the purpose of risk calculations over long time horizons. We show that this approach is relatively easy to implement, and argue that it is appealing because of the transparent yet flexible means of achieving dimension reduction when many variables must be modelled. Using United Kingdom data as an example, we demonstrate the development of an economic scenario generator that can be used by life insurance companies and pension funds. We compare different algorithms to select a graphical model, based on p-values, AIC, BIC and deviance. We find the economic scenario generator to yield reasonable results and relatively stable structures in our example, suggesting that it would be beneficial for actuaries to include graphical models in their toolkit.
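A loosely related sketch: the paper selects graphical structures via p-values, AIC, BIC and deviance, whereas the example below estimates a sparse Gaussian graphical model with scikit-learn's GraphicalLassoCV and simulates joint scenarios from it; the series and data are placeholders.

```python
# Loosely related sketch: fit a sparse Gaussian graphical model to economic series and
# simulate joint scenarios. The paper's own selection uses p-values/AIC/BIC/deviance,
# not the graphical lasso used here.
import numpy as np
from sklearn.covariance import GraphicalLassoCV

rng = np.random.default_rng(0)
# Placeholder annual changes for four illustrative series (e.g. inflation, wages, equities, yields)
data = rng.multivariate_normal(mean=np.zeros(4),
                               cov=[[1.0, 0.6, 0.2, 0.1],
                                    [0.6, 1.0, 0.2, 0.1],
                                    [0.2, 0.2, 1.0, 0.3],
                                    [0.1, 0.1, 0.3, 1.0]], size=200)

model = GraphicalLassoCV().fit(data)
precision = model.precision_                 # zeros encode conditional independencies (the graph)
scenarios = rng.multivariate_normal(mean=data.mean(axis=0),
                                    cov=model.covariance_, size=1000)
```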
In areas of application, including actuarial science and demography, it is increasingly common to consider a time series of curves; an example of this is age-specific mortality rates observed over a period of years. Given that age can be treated as a discrete or continuous variable, a dimension reduction technique, such as principal component analysis (PCA), is often implemented. However, in the presence of moderate-to-strong temporal dependence, static PCA commonly used for analyzing independent and identically distributed data may not be adequate. As an alternative, we consider a dynamic principal component approach to model temporal dependence in a time series of curves. Inspired by Brillinger’s (1974, Time Series: Data Analysis and Theory. New York: Holt, Rinehart and Winston) theory of dynamic principal components, we introduce a dynamic PCA, which is based on eigen decomposition of estimated long-run covariance. Through a series of empirical applications, we demonstrate the potential improvement of 1-year-ahead point and interval forecast accuracies that the dynamic principal component regression entails when compared with the static counterpart.
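A minimal sketch of the eigendecomposition step, assuming NumPy and placeholder curves: a long-run covariance is estimated as a Bartlett-weighted sum of sample autocovariances of the centered curves, and its leading eigenvectors give dynamic principal components; the paper's estimator and forecasting procedure are more involved.

```python
# Sketch: dynamic principal components via eigendecomposition of an estimated
# long-run covariance (Bartlett-weighted sum of sample autocovariances).
import numpy as np

def long_run_covariance(Y, bandwidth=5):
    """Y: (n_years x n_ages) centered curves observed over time."""
    n, p = Y.shape
    C = Y.T @ Y / n                                    # lag-0 covariance
    for lag in range(1, bandwidth + 1):
        w = 1.0 - lag / (bandwidth + 1.0)              # Bartlett kernel weight
        G = Y[lag:].T @ Y[:-lag] / n                   # lag-h autocovariance
        C += w * (G + G.T)
    return C

rng = np.random.default_rng(0)
n_years, n_ages = 60, 40
trend = np.cumsum(rng.normal(size=(n_years, 1)), axis=0)          # persistent common signal
Y = trend @ rng.normal(size=(1, n_ages)) + 0.3 * rng.normal(size=(n_years, n_ages))  # placeholder curves
Yc = Y - Y.mean(axis=0)

C_lr = long_run_covariance(Yc, bandwidth=5)
evals, evecs = np.linalg.eigh(C_lr)
components = evecs[:, ::-1][:, :2]                     # leading dynamic principal components
scores = Yc @ components                               # score time series for regression/forecasting
```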
We prove that, in the limit of vanishing thickness, equilibrium configurations of inhomogeneous, three-dimensional non-linearly elastic rods converge to equilibrium configurations of the variational limit theory. More precisely, we show that, as $h\searrow 0$, stationary points of the elastic energy of a rod $\Omega_h\subset \mathbb{R}^3$ with cross-sectional diameter $h$ subconverge to stationary points of the $\Gamma$-limit of the energy, provided that the bending energy of the sequence scales appropriately. This generalizes earlier results for homogeneous materials to the case of materials with (not necessarily periodic) inhomogeneities.
We derive asymptotic formulas for the solutions of the mixed boundary value problem for the Poisson equation on the union of a thin cylindrical plate and several thin cylindrical rods. One of the ends of each rod is set into a hole in the plate and the other one is supplied with the Dirichlet condition. The Neumann conditions are imposed on the whole remaining part of the boundary. Elements of the junction are assumed to have contrasting properties, so that the small parameter, i.e. the relative thickness, appears in the differential equation too, while the asymptotic structures crucially depend on the contrast ratio. Asymptotic error estimates are derived in anisotropic weighted Sobolev norms.
A 3D-2D dimension reduction for $-\Delta_1$ is obtained. A power-law approximation from $-\Delta_p$ as $p \to 1$, in terms of $\Gamma$-convergence, duality and asymptotics for least gradient functions, has also been provided.
Dimension reduction of multivariate data was developed by Y. Guan for point processes with Gaussian random fields as covariates. The generalization to fibre and surface processes is straightforward. In inverse regression methods, we suggest slicing based on geometrical marks. An investigation of the properties of this method is presented in simulation studies of random marked sets. In a refined model for dimension reduction, the second-order central subspace is analyzed in detail. A real data pattern is tested for independence of a covariate.
The paper deals with a Dirichlet spectral problem for an elliptic operator with ε-periodic coefficients in a 3D bounded domain of small thickness δ. We study the asymptotic behavior of the spectrum as ε and δ tend to zero. This asymptotic behavior depends crucially on whether ε and δ are of the same order (δ ≈ ε), or ε is much less than δ (δ = ε^τ, τ < 1), or ε is much greater than δ (δ = ε^τ, τ > 1). We consider all three cases.
The question 'How does an organism maintain balance?' provides a unifying theme to introduce undergraduate students to the use of mathematics and modeling techniques in biological research. The availability of inexpensive high-speed motion capture cameras makes it possible to collect the precise and reliable data that facilitates the development of relevant mathematical models. An in-house laboratory component ensures that students have the opportunity to directly compare prediction to observation and motivates the development of projects that push the boundaries of the subject. The projects, by their nature, readily lend themselves to the formation of interdisciplinary student research teams. Thus students have the opportunity to learn skills essential for success in today's workplace, including productive team work, critical thinking, problem solving, project management, and effective communication.