This study introduces an innovative methodology for mortality forecasting that integrates signature-based methods within the functional data framework of the Hyndman–Ullah (HU) model. The new approach, termed the Hyndman–Ullah with truncated signatures (HUts) model, aims to enhance the accuracy and robustness of mortality predictions. By utilizing signature regression, the HUts model captures complex, nonlinear dependencies in mortality data, which improves forecasting accuracy across various demographic conditions. The model is applied to mortality data from 12 countries, and its forecasting performance is compared against variants of the HU models across multiple forecast horizons. Our findings indicate that, overall, the HUts model not only provides more precise point forecasts but also shows robustness against data irregularities, such as those observed in countries with historical outliers. The integration of signature-based methods enables the HUts model to capture complex patterns in mortality data, making it a powerful tool for actuaries and demographers. Prediction intervals are also constructed with bootstrapping methods.
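As a rough illustration of the signature-regression ingredient described above (not the authors' HUts implementation), the sketch below computes depth-2 truncated signatures of time-augmented log-mortality curves by hand and feeds them into a ridge regression. The synthetic curves, the toy response, and the helper function name are all placeholders chosen for the example.

```python
# Minimal sketch: depth-2 truncated signatures of (time, log-mortality) paths
# used as features for a ridge regression. Data and helper names are hypothetical.
import numpy as np
from sklearn.linear_model import Ridge

def truncated_signature_depth2(path):
    """Depth-2 signature of a piecewise-linear path of shape (n_points, d)."""
    inc = np.diff(path, axis=0)                 # segment increments, shape (n-1, d)
    level1 = inc.sum(axis=0)                    # S^i = X_T - X_0
    # S^{ij} = sum_{k<l} dX^i_k dX^j_l + 0.5 * sum_k dX^i_k dX^j_k
    cum = np.cumsum(inc, axis=0) - inc          # sum of increments strictly before k
    level2 = cum.T @ inc + 0.5 * (inc.T @ inc)
    return np.concatenate([level1, level2.ravel()])

rng = np.random.default_rng(0)
ages = np.linspace(0, 100, 101)
features, targets = [], []
for year in range(40):                          # 40 synthetic "years" of mortality curves
    curve = -8 + 0.08 * ages + 0.3 * rng.standard_normal(ages.size)  # toy log-mortality
    path = np.column_stack([ages / 100.0, curve])                    # time-augmented path
    features.append(truncated_signature_depth2(path))
    targets.append(curve.mean() + 0.1 * rng.standard_normal())       # toy response
model = Ridge(alpha=1.0).fit(np.array(features), np.array(targets))
print("R^2 on training data:", model.score(np.array(features), np.array(targets)))
```

Higher truncation depths add further iterated-integral terms; depth 2 is shown only to keep the sketch short.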
This study bridges research on social inclusion with welfare regime theory. By linking social inclusion with welfare regimes, we establish a novel analytical framework for assessing global trends and national divergences in social inclusion based on a multidimensional view of the concept. While scholars have developed typologies for social inclusion and welfare regimes independently of each other, limited insight exists into how social inclusion relates to welfare regime typologies. We develop a social inclusion index for 225 countries using principal component analysis with 10 measures of social inclusion from the United Nations’ Sustainable Development Goals Indicators Database. We then employ clustering algorithms to inductively group countries based on the index. We find six “worlds” of social inclusion based on the index and other social factors – the Low, Mid, and High Social Inclusion regimes and the Low, Mid, and High Social Exclusion regimes.
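A minimal sketch of the two-step design described above, with random stand-in data in place of the UN SDG indicators: the first principal component of the standardized indicators serves as the inclusion index, and countries are then grouped on that index with a clustering algorithm (k-means here, purely as an example).

```python
# Minimal sketch: PCA-based inclusion index followed by clustering of countries.
# The indicator matrix is a random placeholder, not the SDG Indicators Database.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
indicators = rng.normal(size=(225, 10))             # 225 countries x 10 indicators (placeholder)

scores = StandardScaler().fit_transform(indicators)
index = PCA(n_components=1).fit_transform(scores)   # first principal component as the index
labels = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(index)
print("countries per cluster:", np.bincount(labels))
```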
The fuzzy perspective in statistical analysis is first illustrated with reference to the “Informational Paradigm”, which allows us to deal with different types of uncertainty related to the various informational ingredients (data, model, assumptions). Fuzzy empirical data are then introduced, referring to J LR fuzzy variables observed on I observation units. Each observation is characterized by its center and its left and right spreads (LR1 fuzzy number) or by its left and right “centers” and its left and right spreads (LR2 fuzzy number). Two types of component models for LR1 and LR2 fuzzy data are proposed. The estimation of the parameters of these models is based on a least squares approach, exploiting an appropriately introduced distance measure for fuzzy data. A simulation study is carried out in order to assess the efficacy of the suggested models as compared with traditional principal component analysis on the centers and with existing methods for fuzzy and interval-valued data. An application to real fuzzy data is finally performed.
The component loadings are interpreted by considering their magnitudes, which indicate how strongly each of the original variables relates to the corresponding principal component. The usual ad hoc practice in the interpretation process is to ignore the variables with small absolute loadings or to set to zero loadings smaller than some threshold value. This, in fact, makes the component loadings sparse in an artificial and subjective way. We propose a new alternative approach, which produces sparse loadings in an optimal way. The introduced approach is illustrated on two well-known data sets and compared to the existing rotation methods.
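The sketch below illustrates only the ad hoc thresholding practice criticized above, contrasted with an off-the-shelf sparse PCA routine; it does not reproduce the paper's optimal sparsification. The data, the 0.3 cut-off, and the sparsity penalty are arbitrary choices for the example.

```python
# Minimal sketch: subjective thresholding of PCA loadings vs. a generic sparse PCA.
import numpy as np
from sklearn.decomposition import PCA, SparsePCA

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 8))
X -= X.mean(axis=0)

pca = PCA(n_components=2).fit(X)
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)        # 8 variables x 2 components
thresholded = np.where(np.abs(loadings) < 0.3, 0.0, loadings)          # subjective cut-off
sparse = SparsePCA(n_components=2, alpha=1.0, random_state=0).fit(X).components_.T
print("zeros after thresholding:", int((thresholded == 0).sum()),
      "| zeros from SparsePCA:", int((sparse == 0).sum()))
```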
Factor analysis and principal component analysis result in computing a new coordinate system, which is usually rotated to obtain a better interpretation of the results. In the present paper, the idea of rotation to simple structure is extended to two dimensions. While the classical definition of simple structure is aimed at rotating (one-dimensional) factors, the extension to a simple structure for two dimensions is based on the rotation of planes. The resulting planes (principal planes) reveal a better view of the data than planes spanned by factors from classical rotation and hence allow a more reliable interpretation. The usefulness of the method as well as the effectiveness of a proposed algorithm are demonstrated by simulation experiments and an example.
Homogeneity analysis, or multiple correspondence analysis, is usually applied to k separate variables. In this paper we apply it to sets of variables by using sums within sets. The resulting technique is called OVERALS. It uses the notion of optimal scaling, with transformations that can be multiple or single. The single transformations consist of three types: nominal, ordinal, and numerical. The corresponding OVERALS computer program minimizes a least squares loss function by using an alternating least squares algorithm. Many existing linear and nonlinear multivariate analysis techniques are shown to be special cases of OVERALS. An application to data from an epidemiological survey is presented.
The aim of this note is to show that the centroid method has two optimality properties. It yields loadings with the highest sum of absolute values, even in the absence of the constraint that the squared component weights be equal. In addition, it yields scores with maximum variance, subject to the constraint that none of the squared component weights be larger than 1.
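For concreteness, a minimal sketch of the first centroid component under the usual formulation (sign weights w in {−1, +1}, loadings R w / √(w′Rw)), with the signs chosen by greedy flipping on random placeholder data; this only illustrates the object the note studies, not its proofs.

```python
# Minimal sketch of the first centroid component on a sample correlation matrix.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 6))
R = np.corrcoef(X, rowvar=False)

w = np.ones(R.shape[0])
improved = True
while improved:                        # flip any sign that raises w' R w
    improved = False
    for j in range(len(w)):
        w_try = w.copy()
        w_try[j] = -w_try[j]
        if w_try @ R @ w_try > w @ R @ w:
            w, improved = w_try, True

loadings = R @ w / np.sqrt(w @ R @ w)
print("sum of absolute loadings:", np.abs(loadings).sum())
```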
Horn’s parallel analysis is a widely used method for assessing the number of principal components and common factors. We discuss the theoretical foundations of parallel analysis for principal components based on a covariance matrix by making use of arguments from random matrix theory. In particular, we show that (i) for the first component, parallel analysis is an inferential method equivalent to the Tracy–Widom test, (ii) its use to test high-order eigenvalues is equivalent to the use of the joint distribution of the eigenvalues, and thus should be discouraged, and (iii) a formal test for higher-order components can be obtained based on a Tracy–Widom approximation. We illustrate the performance of the two testing procedures using simulated data generated under both a principal component model and a common factors model. For the principal component model, the Tracy–Widom test performs consistently in all conditions, while parallel analysis shows unpredictable behavior for higher-order components. For the common factor model, including major and minor factors, both procedures are heuristic approaches, with variable performance. We conclude that the Tracy–Widom procedure is preferred over parallel analysis for statistically testing the number of principal components based on a covariance matrix.
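A minimal sketch of the parallel analysis procedure discussed above, applied to a covariance matrix: observed eigenvalues are retained while they exceed the 95th percentile of eigenvalues from random normal data of the same dimensions. The data are simulated with a two-component structure purely for illustration, and the Tracy–Widom test itself is not implemented here.

```python
# Minimal sketch of Horn's parallel analysis on a covariance matrix.
import numpy as np

rng = np.random.default_rng(4)
n, p = 300, 10
signal = rng.normal(size=(n, 2)) @ rng.normal(size=(2, p))   # two-component structure
X = signal + rng.normal(size=(n, p))

obs_eig = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
sim_eig = np.array([
    np.sort(np.linalg.eigvalsh(np.cov(rng.normal(size=(n, p)), rowvar=False)))[::-1]
    for _ in range(500)
])
threshold = np.percentile(sim_eig, 95, axis=0)

retained = 0
for observed, cutoff in zip(obs_eig, threshold):   # sequential retention rule
    if observed > cutoff:
        retained += 1
    else:
        break
print("retained components:", retained)
```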
When r principal components are available for k variables, the correlation matrix is approximated in the least squares sense by the loading matrix times its transpose. The approximation is generally not perfect unless r = k. In the present paper it is shown that, when r is at or above the Ledermann bound, r principal components are enough to perfectly reconstruct the correlation matrix, albeit in a way more involved than taking the loading matrix times its transpose. In certain cases just below the Ledermann bound, recovery of the correlation matrix is still possible when the set of all eigenvalues of the correlation matrix is available as additional information.
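For reference (not part of the abstract above), the condition "at or above the Ledermann bound" for k variables is commonly written as

$$ r \;\ge\; \frac{2k + 1 - \sqrt{8k + 1}}{2}, $$

so that, for instance, with k = 6 variables the bound equals (13 − 7)/2 = 3 components.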
The selection of a subset of variables from a pool of candidates is an important problem in several areas of multivariate statistics. Within the context of principal component analysis (PCA), a number of authors have argued that subset selection is crucial for identifying those variables that are required for correct interpretation of the components. In this paper, we adapt the variable neighborhood search (VNS) paradigm to develop two heuristics for variable selection in PCA. The performances of these heuristics were compared to those obtained by a branch-and-bound algorithm, as well as forward stepwise, backward stepwise, and tabu search heuristics. In the first experiment, which considered candidate pools of 18 to 30 variables, the VNS heuristics matched the optimal subset obtained by the branch-and-bound algorithm more frequently than their competitors. In the second experiment, which considered candidate pools of 54 to 90 variables, the VNS heuristics provided better solutions than their competitors for a large percentage of the test problems. An application to a real-world data set is provided to demonstrate the importance of variable selection in the context of PCA.
The specifications of the state space model for some principal component-related models are described, including the independent-group common principal component (CPC) model, the dependent-group CPC model, and principal component-based multivariate analysis of variance. Some derivations are provided to show the equivalence of the state space approach and the existing Wishart-likelihood approach. For each model, a numeric example is used to illustrate the state space approach. In addition, a simulation study is conducted to evaluate the standard error estimates under normality and nonnormality conditions. In order to cope with the nonnormality conditions, robust standard errors are also computed. Finally, other possible applications of the state space approach are discussed.
Concise formulas for the asymptotic standard errors of component loading estimates were derived. The formulas cover the cases of principal component analysis for unstandardized and standardized variables with orthogonal and oblique rotations. The formulas can be used under any distributions for observed variables as long as the asymptotic covariance matrix for sample covariances/correlations is available. The estimated standard errors in numerical examples were shown to be equivalent to those by the methods using information matrices.
Saito and Otsu (1988) compared their OSMOD method of nonmetric principal-component analysis to an early and incorrect implementation of the PRINCIPALS algorithm of Young, Takane, and de Leeuw (1978). In this comment we present results from the current, correct implementations of the algorithm.
The possibility of obtaining locally optimal solutions with categorical data is pointed out for the original version of OSMOD developed by Saito and Otsu. A revision of the initialization strategy in OSMOD is suggested, and its effectiveness in diminishing this possibility is demonstrated.
This paper develops a method of optimal scaling for multivariate ordinal data, in the framework of a generalized principal component analysis. This method yields a multidimensional configuration of items, a unidimensional scale of category weights for each item and, optionally, a multidimensional configuration of subjects. The computation is performed by alternately solving an eigenvalue problem and executing a quasi-Newton projection method. The algorithm is extended for analysis of data with mixed measurement levels or for analysis with a combined weighting of items. Numerical examples and simulations are provided. The algorithm is discussed and compared with some related methods.
The paper derives sufficient conditions for the consistency and asymptotic normality of the least squares estimator of a trilinear decomposition model for multiway data analysis.
Given multivariate multiblock data (e.g., subjects nested in groups are measured on multiple variables), one may be interested in the nature and number of dimensions that underlie the variables, and in differences in dimensional structure across data blocks. To this end, clusterwise simultaneous component analysis (SCA) was proposed which simultaneously clusters blocks with a similar structure and performs an SCA per cluster. However, the number of components was restricted to be the same across clusters, which is often unrealistic. In this paper, this restriction is removed. The resulting challenges with respect to model estimation and selection are resolved.
Transfer learning has been highlighted as a promising framework for increasing the accuracy of data-driven models in the case of data sparsity, specifically by leveraging pretrained knowledge in the training of the target model. The objective of this study is to evaluate whether the number of requisite training samples can be reduced by using various transfer learning models, taking as an example the prediction of the chemical source terms of a data-driven reduced-order model (ROM) that represents the homogeneous ignition of a hydrogen/air mixture. Principal component analysis is applied to reduce the dimensionality of the hydrogen/air mixture in composition space. Artificial neural networks (ANNs) are used to regress the reaction rates of the principal components, and subsequently, a system of ordinary differential equations is solved. As the number of training samples decreases in the target task, the ROM fails to predict the ignition evolution of the hydrogen/air mixture. Three transfer learning strategies are then applied to the training of the ANN model with a sparse dataset. The performance of the ROM with a sparse dataset is remarkably enhanced if the training of the ANN model is restricted by a regularization term that controls the degree of knowledge transfer from source to target tasks. To this end, a novel transfer learning method is introduced, Parameter control via Partial Initialization and Regularization (PaPIR), whereby the amount of knowledge transferred is systematically adjusted in terms of the initialization and regularization schemes of the ANN model in the target task.
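The sketch below illustrates the general idea of regularization-controlled knowledge transfer described above: a target-task regression network is initialized from source-task weights and trained with an extra L2 penalty pulling its parameters toward those source weights. It is an illustrative sketch only, not the authors' PaPIR implementation; the network size, the penalty weight, and the random data standing in for principal-component scores and reaction rates are all assumptions.

```python
# Minimal sketch of transfer via initialization plus a weight-anchoring penalty.
import torch
import torch.nn as nn

def make_net():
    return nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 4))

torch.manual_seed(0)
source_net = make_net()                                # stands in for the pretrained source model
target_net = make_net()
target_net.load_state_dict(source_net.state_dict())   # initialization from the source task

X = torch.randn(64, 4)                                 # placeholder PC scores (sparse target data)
y = torch.randn(64, 4)                                 # placeholder PC reaction rates
lam = 1e-2                                             # controls how strongly source knowledge is kept
opt = torch.optim.Adam(target_net.parameters(), lr=1e-3)

for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(target_net(X), y)
    for p, p_src in zip(target_net.parameters(), source_net.parameters()):
        loss = loss + lam * torch.sum((p - p_src.detach()) ** 2)   # anchor to source weights
    loss.backward()
    opt.step()
print("final loss:", float(loss))
```

Setting lam to zero recovers plain fine-tuning from the source initialization, while large lam keeps the target model close to the source model; tuning this trade-off is the knob the abstract refers to.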
This paper derives from new work on Mesolithic human skeletal material from Strøby Egede, a near coastal site in eastern Sjælland, with two foci. The first confirms sex identifications from original work carried out in 1986. The second, and central focus, re-examines comments by one of us (CM) based on work in 1992, and a new statistical analysis including data from the two Strøby Egede adults. In 1998 it was suggested that the Strøby Egede sample more closely resembled Skateholm, on the coast of Skåne in southern Sweden, than Vedbæk-Bøgebakken on Sjælland, fitting lithic patterns noted earlier by Vang Petersen. We revisit the 1998 suggestion below, comparing data from Strøby Egede to those available from southern Scandinavia and Germany, and suggest that the 1998 comment was, in all probability, incorrect. The analysis below suggests overall morphological similarity between individuals in eastern Sjælland and Skåne, while noting the existence of apparent outliers.
We investigated associations between ‘healthy dietary pattern’ scores, at ages 36, 43, 53 and 60–64 years, and body composition at age 60–64 years among participants from the MRC National Survey of Health and Development (NSHD). Principal component analyses of dietary data (food diaries) at age 60–64 years were used to calculate diet scores (healthy dietary pattern scores) at each age. Higher scores indicated healthier diets (higher consumption of fruit, vegetables and wholegrain bread). Linear regression was used to investigate associations between diet scores at each age and height-adjusted dual-energy X-ray absorptiometry-measured fat and lean mass measures at age 60–64 years. Analyses, adjusting for sex and other potential confounders (age, smoking history, physical activity and occupational class), were implemented among 692 men and women. At age 43, 53 and 60–64 years, higher diet scores were associated with lower fat mass index (FMI) and android:gynoid fat mass ratio; for example, in fully adjusted analyses, a 1 SD increase in diet score at age 60–64 years was associated with a difference in mean FMI of −0·18 SD (95 % CI: −0·25, −0·10). In conditional analyses, higher diet scores at ages 43, 53 and 60–64 years (than expected from diet scores at younger ages) were associated with lower FMI and android:gynoid fat mass ratio in fully adjusted analyses. Diet scores at age 36 years had weaker associations with the outcomes considered. No associations with appendicular lean mass index were robust after full adjustment. This suggests that improvements in diet through adulthood are linked to beneficial effects on adiposity in older age.
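A minimal sketch of the analysis pipeline described above: a PCA-derived diet score from food-group intakes, followed by a linear regression of a body-composition outcome on that score plus confounders. All data are random placeholders (the simulated effect size is arbitrary), and the sketch omits the study's conditional and height-adjusted modelling.

```python
# Minimal sketch: PCA diet score, then confounder-adjusted linear regression.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
n = 692
food_groups = rng.normal(size=(n, 12))          # placeholder diary-derived food-group intakes
confounders = rng.normal(size=(n, 4))           # e.g. age, smoking, activity, occupational class

diet_score = PCA(n_components=1).fit_transform(StandardScaler().fit_transform(food_groups))
fmi = -0.2 * diet_score[:, 0] + confounders @ rng.normal(size=4) + rng.normal(size=n)  # toy outcome

design = np.column_stack([StandardScaler().fit_transform(diet_score), confounders])
fit = LinearRegression().fit(design, fmi)
print("per-SD diet-score association with the outcome:", round(fit.coef_[0], 2))
```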