Millsap and Meredith (1988) have developed a generalization of principal components analysis for the simultaneous analysis of a number of variables observed in several populations or on several occasions. The algorithm they provide has some disadvantages. The present paper offers two alternating least squares algorithms for their method, suitable for small and large data sets, respectively. Lower and upper bounds are given for the loss function to be minimized in the Millsap and Meredith method. These can serve to indicate whether or not a global optimum for the simultaneous components analysis problem has been attained.
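To make the flavour of such algorithms concrete, below is a minimal alternating least squares sketch in Python for a generic simultaneous components model X_g ≈ F_g A' with a loading matrix A shared across groups. It is an illustration under our own simplified loss, not a reproduction of Millsap and Meredith's exact criterion or of the bounds derived in the paper; the data and dimensions are made up.

```python
# Generic ALS sketch for a simultaneous components model X_g ≈ F_g A'
# with loadings A shared across groups (illustrative only, not the
# exact Millsap-Meredith criterion; random toy data).
import numpy as np

rng = np.random.default_rng(0)
groups = [rng.standard_normal((n, 6)) for n in (40, 55, 30)]  # 3 groups, 6 variables
r = 2                                    # number of components
A = rng.standard_normal((6, r))          # initial shared loadings

for _ in range(200):
    # Given A, the least squares scores per group: F_g = X_g A (A'A)^{-1}
    Fs = [X @ A @ np.linalg.inv(A.T @ A) for X in groups]
    # Given the scores, update the shared loadings via the stacked regression
    num = sum(X.T @ F for X, F in zip(groups, Fs))
    den = sum(F.T @ F for F in Fs)
    A = num @ np.linalg.inv(den)

Fs = [X @ A @ np.linalg.inv(A.T @ A) for X in groups]
loss = sum(np.sum((X - F @ A.T) ** 2) for X, F in zip(groups, Fs))
print(f"residual sum of squares: {loss:.3f}")
```

Each update solves an exact least squares subproblem, so the loss is monotonically non-increasing; comparing the attained value against lower and upper bounds, as the paper proposes, indicates whether a global optimum may have been reached.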
A number of methods for the analysis of three-way data are described and shown to be variants of principal components analysis (PCA) of the two-way supermatrix in which each two-way slice is “strung out” into a column vector. The methods are shown to form a hierarchy such that each method is a constrained variant of its predecessor. A strategy is suggested to determine which of the methods yields the most useful description of a given three-way data set.
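As a concrete illustration (our own toy example, not taken from the paper), the "stringing out" operation and the subsequent PCA can be sketched in a few lines of Python:

```python
# Sketch: PCA of the two-way supermatrix obtained by "stringing out" each
# slice of a three-way array (sizes here are arbitrary).
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 5, 4))      # 20 objects, 5 variables, 4 occasions

# String out each object's 5x4 slice into one row: a 20 x 20 supermatrix
supermatrix = X.reshape(20, 5 * 4)
supermatrix = supermatrix - supermatrix.mean(axis=0)   # column-centre

U, s, Vt = np.linalg.svd(supermatrix, full_matrices=False)
scores = U[:, :2] * s[:2]                # object scores on 2 components
explained = (s[:2] ** 2).sum() / (s ** 2).sum()
print(f"variance explained by 2 components: {explained:.1%}")
```

The constrained variants in the hierarchy impose additional structure on the resulting component matrices rather than changing this basic decomposition.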
The Maxbet method is a generalized principal components analysis of a data set in which the group structure of the variables is taken into account. Similarly, the 3-block[12,13] partial Maxdiff method is a generalization of covariance analysis in which only the covariances between blocks (1, 2) and (1, 3) are taken into account. The aim of this paper is to give the global maximum for the 2-block Maxbet and 3-block[12,13] partial Maxdiff problems by picking the best solution from the complete solution set of the multivariate eigenvalue problem involved. To do this, we generalize the characteristic polynomial of a matrix to a system of two characteristic polynomials, and provide the complete solution set of the latter via Sylvester resultants. Examples are provided.
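For reference, the standard power-type iteration for the 2-block Maxbet problem converges to a stationary point of the multivariate eigenvalue problem but, as the paper stresses, not necessarily to the global maximum. A sketch on toy data (our own notation):

```python
# Power-type iteration for 2-block Maxbet: maximise
#   w1'S11w1 + 2 w1'S12w2 + w2'S22w2  subject to ||w1|| = ||w2|| = 1.
# This finds a stationary point only, not a certified global maximum.
import numpy as np

rng = np.random.default_rng(2)
X1 = rng.standard_normal((100, 3))       # block 1 (3 variables)
X2 = rng.standard_normal((100, 4))       # block 2 (4 variables)

S11, S12, S22 = X1.T @ X1, X1.T @ X2, X2.T @ X2

w1 = np.ones(3) / np.sqrt(3)             # unit-norm starting vectors
w2 = np.ones(4) / 2.0
for _ in range(500):
    w1 = S11 @ w1 + S12 @ w2
    w1 /= np.linalg.norm(w1)
    w2 = S12.T @ w1 + S22 @ w2
    w2 /= np.linalg.norm(w2)

value = w1 @ S11 @ w1 + 2 * (w1 @ S12 @ w2) + w2 @ S22 @ w2
print(f"attained criterion value: {value:.3f}")
```

Different starting vectors can end at different stationary points, which is precisely why enumerating the complete solution set via Sylvester resultants, as the paper does, is needed to certify the global maximum.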
In this paper, the statistical significance of the contribution of variables to the principal components in principal components analysis (PCA) is assessed nonparametrically by the use of permutation tests. We compare a new strategy to a strategy used in previous research consisting of permuting the columns (variables) of a data matrix independently and concurrently, thus destroying the entire correlational structure of the data. This strategy is considered appropriate for assessing the significance of the PCA solution as a whole, but is not suitable for assessing the significance of the contribution of single variables. Alternatively, we propose a strategy involving permutation of one variable at a time, while keeping the other variables fixed. We compare the two approaches in a simulation study, considering proportions of Type I and Type II error. We use two corrections for multiple testing: the Bonferroni correction and controlling the False Discovery Rate (FDR). To assess the significance of the variance accounted for by the variables, permuting one variable at a time, combined with FDR correction, yields the most favorable results. This optimal strategy is applied to an empirical data set, and results are compared with bootstrap confidence intervals.
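A minimal sketch of the proposed strategy, permuting one variable at a time and applying a Benjamini-Hochberg FDR correction, is given below; the test statistic (a variable's squared loadings on the first components) is our own illustrative choice and need not match the variance-accounted-for measure used in the paper.

```python
# One-variable-at-a-time permutation test for variable contributions to
# PCA, with Benjamini-Hochberg FDR control (illustrative statistic).
import numpy as np

def contribution(X, j, r=2):
    """Squared loadings of variable j on the first r principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return np.sum(Vt[:r, j] ** 2)

rng = np.random.default_rng(3)
X = rng.standard_normal((80, 5))
X[:, 0] += 2 * X[:, 1]                   # build in one genuine association

n_perm, p_values = 499, []
for j in range(X.shape[1]):
    observed = contribution(X, j)
    exceed = 0
    for _ in range(n_perm):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])   # shuffle variable j only
        exceed += contribution(Xp, j) >= observed
    p_values.append((exceed + 1) / (n_perm + 1))

# Benjamini-Hochberg at the 5 % level
p = np.sort(np.array(p_values))
k = np.arange(1, p.size + 1)
below = np.nonzero(p <= 0.05 * k / p.size)[0]
n_reject = 0 if below.size == 0 else below[-1] + 1
print("p-values:", np.round(p_values, 3), "| BH rejections:", n_reject)
```

Note that only the tested column is shuffled in each permutation, so the correlational structure among the remaining variables is preserved, in contrast with the concurrent-permutation strategy.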
Multitrait-Multimethod (MTMM) matrices are often analyzed by means of confirmatory factor analysis (CFA). However, fitting MTMM models often leads to improper solutions or non-convergence. In an attempt to overcome these problems, various alternative CFA models have been proposed, but none of these has completely solved the problem of improper solutions. In the present paper, an approach is proposed in which improper solutions are ruled out altogether and convergence is guaranteed. The approach is based on constrained variants of components analysis (CA). Besides not yielding improper solutions, these methods have the advantage that they provide component scores which can later be used to relate the components to external variables. The new methods are illustrated by means of simulated data as well as empirical data sets.
Procedures for oblique rotation of factors or principal components typically focus on rotating the pattern matrix such that it becomes optimally simple. An important oblique rotation method that does so is Harris and Kaiser's (1964) independent cluster (HKIC) rotation. In principal components analysis, a case can be made for interpreting the components on the basis of the component weights rather than on the basis of the pattern, so it seems desirable to rotate the components such that the weights rather than the pattern become optimally simple. In the present paper, it is shown that HKIC rotates the components such that both the pattern and the weights matrix become optimally simple. In addition, it is shown that the pattern resulting from HKIC rotation is columnwise proportional to the associated weights matrix, which implies that the interpretation of the components does not depend on whether it is based on the pattern or on the component weights matrix. It is also shown that the latter result only holds for HKIC rotation and slight modifications of it.
We offer an introduction to the five papers that make up this special section. These papers deal with a range of the methodological challenges facing researchers who analyze fMRI data: the spatial, multilevel, and longitudinal nature of the data, the sources of noise, and so on. The papers all provide analyses of data collected by a multi-site consortium, the Function Biomedical Informatics Research Network. Due to the sheer volume of data, univariate procedures are often applied, which leads to a multiple comparisons problem (since the data are necessarily multivariate). The papers in this section include interesting applications, such as a state-space model applied to these data, and conclude with a reflection on basic measurement problems in fMRI. All in all, they provide a good overview of the challenges that fMRI data present to the standard psychometric toolbox, but also of the opportunities they offer for new psychometric modeling.
In this paper we discuss the use of a recent dimension reduction technique called Locally Linear Embedding, introduced by Roweis and Saul, for performing an exploratory latent structure analysis. The coordinate variables from the locally linear embedding describing the manifold on which the data reside serve as the latent variable scores. We propose the use of semiparametric penalized spline methods for reconstruction of the manifold equations that approximate the data space. We also discuss a cross-validation strategy that can guide the selection of an appropriate number of latent variables. Synthetic as well as real data sets are used to illustrate the proposed approach. A nonlinear latent structure representation of a data set also serves as a data visualization tool.
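For readers who want to try the first step, scikit-learn provides an implementation of Locally Linear Embedding; the snippet below (our own toy example on a synthetic swiss roll, not the paper's data) extracts two latent variable scores per observation. The penalized-spline reconstruction of the manifold equations is not shown.

```python
# Latent variable scores from Locally Linear Embedding (scikit-learn).
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1000, random_state=0)   # 3-D manifold data

lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, random_state=0)
latent_scores = lle.fit_transform(X)     # coordinates on the manifold

print(latent_scores.shape)               # (1000, 2): two latent variables
print(f"reconstruction error: {lle.reconstruction_error_:.2e}")
```

The number of components (here 2) is exactly the quantity the paper's cross-validation strategy is designed to select.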
In practice it may happen that a first-try econometric model is not appropriate because it violates one or more of the key assumptions needed to obtain valid results. If there is something wrong with the variables, such as measurement error or strong collinearity, we may do better to modify the estimation method or change the model. In the present chapter we deal with endogeneity, which can, for example, be caused by measurement error, and which implies that one or more regressors are correlated with the unknown error term. This is of course not immediately visible, because the errors are not known beforehand and are estimated jointly with the unknown parameters. Endogeneity can thus arise when a regressor is measured with error and, as we will see, when the data are aggregated at too low a frequency. Another issue is multicollinearity, under which it is difficult to disentangle (the statistical significance of) the separate effects of the regressors. This certainly holds for levels and squares of the same variable. Finally, we deal with the interpretation of model outcomes.
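A small simulation (our own, with made-up numbers) makes the measurement-error case concrete: ordinary least squares is attenuated, while an instrument restores a consistent estimate via two-stage least squares.

```python
# Measurement error makes a regressor endogenous: OLS is biased toward
# zero, while 2SLS with an instrument z recovers the true slope of 2.
import numpy as np

rng = np.random.default_rng(4)
n = 5000
z = rng.standard_normal(n)               # instrument
x_true = z + rng.standard_normal(n)      # true regressor, driven by z
y = 1.0 + 2.0 * x_true + rng.standard_normal(n)
x_obs = x_true + rng.standard_normal(n)  # regressor observed with error

X = np.column_stack([np.ones(n), x_obs])
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# 2SLS: regress x_obs on z, then regress y on the fitted values
Z = np.column_stack([np.ones(n), z])
x_hat = Z @ np.linalg.lstsq(Z, x_obs, rcond=None)[0]
X2 = np.column_stack([np.ones(n), x_hat])
beta_2sls = np.linalg.lstsq(X2, y, rcond=None)[0]

print(f"OLS slope (attenuated, about 1.33): {beta_ols[1]:.2f}")
print(f"2SLS slope (close to 2):            {beta_2sls[1]:.2f}")
```

Here the attenuation factor is var(x_true)/var(x_obs) = 2/3, so OLS converges to about 4/3 rather than the true slope of 2.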
Nowadays we may have access to large databases, sometimes coined Big Data, and for those large datasets simple econometric models will not do. When you have a million people in your database, as insurance firms, telephone providers or charities may have, and you have collected information on these individuals for many years, you simply cannot summarize these data using a small econometric model with just a few regressors. In this chapter we address diverse options for handling Big Data. We kick off with a discussion of what Big Data is and why it is special. Next, we discuss a few options such as selective sampling, aggregation, nonlinear models, and variable reduction. Methods such as ridge regression, the lasso, the elastic net, and artificial neural networks are also addressed; these latter concepts are nowadays described as machine learning methods. We see that with these methods the number of choices increases rapidly, and that reproducibility can suffer. The analysis of Big Data therefore comes at the cost of more analysis and of more choices to make and to report.
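As a taste of the variable-reduction methods mentioned, the sketch below fits ridge, lasso and elastic net to a wide toy regression problem with scikit-learn; the data and tuning values are arbitrary, and in practice the penalty strength would be chosen by cross-validation.

```python
# Shrinkage methods on a wide regression: ridge keeps all coefficients,
# while lasso and elastic net set many exactly to zero (toy data).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso, ElasticNet

X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=10.0, random_state=0)

for model in (Ridge(alpha=1.0),
              Lasso(alpha=1.0),
              ElasticNet(alpha=1.0, l1_ratio=0.5)):
    model.fit(X, y)
    nonzero = int(np.sum(model.coef_ != 0))
    print(f"{type(model).__name__:>10}: {nonzero} nonzero coefficients")
```

This already illustrates the point about choices: the penalty type, its strength and the mixing parameter all have to be set and reported.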
The components or functions derived from an eigenanalysis are linear combinations of the original variables. Principal components analysis (PCA) is a very common method that uses these components to examine patterns among the objects, often in a plot termed an ordination, and to identify which variables drive those patterns. Correspondence analysis (CA) is a related method used when the variables represent counts or abundances. Redundancy analysis and canonical CA are constrained versions of PCA and CA, respectively, in which the components are derived after taking into account the relationships with additional explanatory variables. Finally, we introduce linear discriminant function analysis as a way of identifying and predicting the membership of objects in predefined groups.
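A brief sketch (our own example on the well-known iris data) shows an unconstrained PCA ordination followed by linear discriminant analysis for predicting group membership:

```python
# PCA ordination of objects, then linear discriminant function analysis
# to predict membership of predefined groups (iris data as a stand-in).
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, groups = load_iris(return_X_y=True)

ordination = PCA(n_components=2).fit_transform(X)  # object scores to plot
print("first two ordination points:", ordination[:2])

lda = LinearDiscriminantAnalysis().fit(X, groups)
print(f"resubstitution accuracy: {lda.score(X, groups):.2f}")
```

Constrained ordinations such as redundancy analysis would instead derive the components from the part of the response data explained by the additional explanatory variables.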
Relatively little is known about how the diet of chronically undernourished children may impact cardiometabolic biomarkers. The objective of this exploratory study was to characterise relationships between dietary patterns and the cardiometabolic profile of 153 3–5-year-old Peruvian children with a high prevalence of chronic undernutrition. We collected monthly dietary recalls from children when they were 9–24 months old. At 3–5 years, additional dietary recalls were collected, and blood pressure, height, weight, subscapular skinfolds and fasting plasma glucose, insulin and lipid profiles were assessed. Nutrient intakes were expressed as average density per 100 kcal (i) from 9 to 24 months and (ii) at follow-up. The treelet transform and sparse reduced rank regression (RRR) were used to summarize nutrient intake data. Linear regression models were then used to compare these factors to cardiometabolic outcomes and anthropometry. Linear regression models adjusting for subscapular skinfold-for-age Z-scores (SSFZ) were then used to test whether observed relationships were mediated by body composition. At 3–5 years old, 26 % of children were stunted. Both treelet transform- and sparse RRR-derived child dietary factors were related to protein intake and associated with total cholesterol and SSFZ. Associations between dietary factors and insulin were attenuated after adjusting for SSFZ, suggesting that body composition mediated these relationships. Dietary factors in early childhood, influenced by protein intake, are associated with cholesterol profiles, fasting glucose and body fat in a chronically undernourished population.
Reading difficulties are prevalent worldwide, including in economically developed countries, and are associated with low academic achievement and unemployment. Longitudinal studies have identified several early-childhood predictors of reading ability, but such studies frequently lack the genotype data that would enable testing of predictors with heritable influences. The National Child Development Study (NCDS) is a UK birth cohort study containing direct reading skill variables at every data collection wave from age 7 years through to adulthood, with a subsample (final n = 6431) for whom modern genotype data are available. It is one of the longest-running UK cohort studies for which genotyped data are currently available and is a rich dataset with excellent potential for future phenotypic and gene-by-environment interaction studies in reading. Here, we carry out imputation of the genotype data to the Haplotype Reference Panel, an updated reference panel that offers greater imputation quality. To guide phenotype choice, we report a principal components analysis of nine reading variables, yielding a composite measure of reading ability in the genotyped sample. We include recommendations for the use of composite scores and the most reliable variables for use during childhood when conducting longitudinal, genetically sensitive analyses of reading ability.
As the world’s population is ageing, improving the physical performance (PP) of the older population is becoming important. Although diet is fundamental to maintaining and improving PP, few studies have addressed its role in adults aged ≥ 85 years, and none have been conducted in Asia. This study aimed to identify dietary patterns (DP) and examine their relationship with PP in this population.
Design:
This cross-sectional study (Kawasaki Aging and Wellbeing Project) estimated food consumption using a brief-type self-administered diet history questionnaire. Intakes were adjusted for energy after foods were aggregated into thirty-three groups, excluding possible over- or underestimation. Principal component analysis was used to identify DP, and outcomes included hand grip strength (HGS), the timed up-and-go test and usual walking speed.
Setting:
This study was conducted across several hospitals in Kawasaki City.
Participants:
In total, 1026 community-dwelling older adults (85–89 years) were enrolled.
Results:
Data of 1000 participants (median age: 86·9 years, men: 49·9 %) were included in the analysis. Three major DP (DP1: various foods, DP2: red meats and coffee, DP3: bread and processed meats) were identified. Multiple regression analysis showed that DP2 scores were negatively associated with HGS (B = –0·35; 95 % CI –0·64, –0·06).
Conclusions:
This study suggests a negative association between HGS and DP characterised by red meats and coffee in older adults aged ≥ 85 years in Japan.
In this chapter, we study risks associated with movements of interest rates in financial markets. We begin with a brief discussion of the term structure of interest rates. We then discuss commonly used interest rate sensitive securities. This is followed by the study of different measures of sensitivity to interest rates, including duration and convexity. We consider mitigating interest rate risk through hedging and immunization. Finally, we take a more in-depth look at the drivers of interest rate term structure dynamics.
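As a worked illustration of the sensitivity measures (a sketch with made-up bond parameters, not an example from the chapter), the function below computes the price, Macaulay duration, modified duration and convexity of a level-coupon bond, then applies the second-order approximation of the price change after a yield move:

```python
# Price, Macaulay/modified duration and convexity of a level-coupon bond,
# with the duration-convexity approximation of the price change.
def bond_metrics(face, coupon_rate, ytm, n_years, freq=2):
    c = face * coupon_rate / freq                 # periodic coupon
    y = ytm / freq                                # periodic yield
    periods = range(1, n_years * freq + 1)
    times = [k / freq for k in periods]           # cash-flow times in years
    cfs = [c] * (n_years * freq - 1) + [c + face]
    pvs = [cf / (1 + y) ** (t * freq) for t, cf in zip(times, cfs)]
    price = sum(pvs)
    macaulay = sum(t * pv for t, pv in zip(times, pvs)) / price
    modified = macaulay / (1 + y)
    convexity = sum(t * (t + 1 / freq) * pv
                    for t, pv in zip(times, pvs)) / (price * (1 + y) ** 2)
    return price, macaulay, modified, convexity

price, mac, mod, conv = bond_metrics(face=100, coupon_rate=0.05,
                                     ytm=0.04, n_years=10)
dy = 0.01                                         # 100 basis point rise
approx_change = (-mod * dy + 0.5 * conv * dy ** 2) * price
print(f"price {price:.2f}, Macaulay duration {mac:.2f} years")
print(f"approx price change for +100 bp: {approx_change:.2f}")
```

Duration captures the first-order (linear) exposure to yield changes; the convexity term corrects the approximation for larger moves, which is what makes the pair useful for hedging and immunization.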
Dietary pattern analysis is typically based on dimension reduction and summarises the diet with a small number of scores. We assess ‘joint and individual variance explained’ (JIVE) as a method for extracting dietary patterns from longitudinal data that highlights elements of the diet that are associated over time. The Auckland Birthweight Collaborative Study, in which participants completed an FFQ at ages 3·5 (n 549), 7 (n 591) and 11 (n 617) years, is used as an example. Data from each time point are projected onto the directions of shared variability produced by JIVE to yield dietary patterns and scores. We assess the ability of the scores to predict future BMI and blood pressure measurements of the participants and make a comparison with principal component analysis (PCA) performed separately at each time point. The diet could be summarised with three JIVE patterns. The patterns were interpretable, with the same interpretation across age groups: a vegetable and whole grain pattern, a sweets and meats pattern and a cereal v. sweet drinks pattern. The first two PCA-derived patterns were similar across age groups and similar to the first two JIVE patterns. The interpretation of the third PCA pattern changed across age groups. Scores produced by the two techniques were similarly effective in predicting future BMI and blood pressure. We conclude that when data from the same participants at multiple ages are available, JIVE provides an advantage over PCA by extracting patterns with a common interpretation across age groups.
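The joint-plus-individual idea behind JIVE can be conveyed with a deliberately crude one-pass sketch: shared structure is taken from an SVD of the stacked blocks, and block-specific structure lives in the residuals. The real JIVE algorithm iterates these steps with rank selection and orthogonality constraints, so this is only a conceptual illustration on simulated data.

```python
# Crude one-pass illustration of joint + individual variation: a rank-1
# "joint" component from the stacked blocks, residuals as "individual".
import numpy as np

rng = np.random.default_rng(5)
n = 100                                   # shared participants (columns)
joint_signal = rng.standard_normal((1, n))
blocks = []
for p in (8, 10):                         # two ages, p food variables each
    loadings = rng.standard_normal((p, 1))
    blocks.append(loadings @ joint_signal + 0.5 * rng.standard_normal((p, n)))

stacked = np.vstack(blocks)               # variables stacked, columns shared
U, s, Vt = np.linalg.svd(stacked, full_matrices=False)
joint = s[0] * np.outer(U[:, 0], Vt[0])   # rank-1 joint approximation
scores = Vt[0]                            # one joint score per participant

residual = stacked - joint                # individual structure remains here
print(f"joint variance share: {s[0] ** 2 / (s ** 2).sum():.1%}")
```

The joint scores play the role of the dietary pattern scores with a common interpretation across ages.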
The present study investigated the association between dietary patterns and hypertension applying the Chinese Dietary Balance Index-07 (DBI-07).
Design:
A cross-sectional study on adult nutrition and chronic disease in Inner Mongolia. Dietary data were collected using 24 h recalls over three consecutive days combined with a food-weighing method. Dietary patterns were identified using principal components analysis. Generalized linear models and multivariate logistic regression models were used to examine the associations between DBI-07 and dietary patterns, and between dietary patterns and hypertension.
Setting:
Inner Mongolia (n 1861).
Participants:
A representative sample of adults aged ≥18 years in Inner Mongolia.
Results:
Four major dietary patterns were identified: ‘high protein’, ‘traditional northern’, ‘modern’ and ‘condiments’. Generalized linear models showed that higher factor scores on the ‘high protein’ pattern were associated with lower DBI-07 (βLBS = −1·993, βHBS = −0·206, βDQD = −2·199; all P < 0·001); the opposite held for the ‘condiments’ pattern (βLBS = 0·967, βHBS = 0·751, βDQD = 1·718; all P < 0·001). The OR for hypertension in the highest quartile of the ‘high protein’ pattern compared with the lowest was 0·374 (95 % CI 0·244, 0·573; Ptrend < 0·001) in males. The OR for hypertension in the ‘condiments’ pattern was 1·663 (95 % CI 1·113, 2·483; Ptrend < 0·001) in males and 1·788 (95 % CI 1·155, 2·766; Ptrend < 0·001) in females.
Conclusions:
Our findings suggest that a higher-quality dietary pattern, as evaluated by DBI-07, was related to decreased risk of hypertension, whereas a lower-quality dietary pattern was related to increased risk of hypertension in Inner Mongolia.
Tognini-Bonelli (2001) made the following distinction between corpus-based and corpus-driven studies. While corpus-based studies start with pre-existing theories which are tested using corpus data, in corpus-driven studies the hypothesis is derived by examination of the corpus evidence. This chapter gives an overview of the two families of statistical tests suited to these two approaches. For corpus-based approaches, we use more traditional statistics, such as the t-test or ANOVA, which return a p-value indicating whether we should reject the initial hypothesis. Multi-level modelling (also known as mixed modelling) is a newer technique which shows considerable promise for corpus-based studies; it is also described here and used to analyse the ENNTT subset of the Europarl corpus. Multi-level modelling is useful for the examination of hierarchically structured or “nested” data, where for example translations may be “nested” together in a class if they have the same language of origin. A multi-level model takes account both of the variation between individual translations and of the variation between classes. For example, we might expect the scores (such as vocabulary richness or readability scores) of two translations in the same class to be more similar to each other than those of two translations in different classes.
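A minimal sketch of such a model in Python (using statsmodels; the data, column names and class labels below are hypothetical stand-ins, not the actual ENNTT subset) fits a random intercept for each language-of-origin class:

```python
# Random-intercept multi-level model: translation scores nested in
# language-of-origin classes (simulated stand-in data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
origins = np.repeat(["de", "fr", "it", "pl"], 50)   # class = language of origin
class_effect = {"de": 0.4, "fr": -0.2, "it": 0.1, "pl": -0.3}
score = np.array([class_effect[o] for o in origins]) + rng.normal(0, 1, origins.size)
data = pd.DataFrame({"score": score, "origin": origins})

# Variation between translations (residual) and variation between
# classes (random intercept) are estimated separately.
model = smf.mixedlm("score ~ 1", data, groups=data["origin"]).fit()
print(model.summary())
```

The estimated group variance quantifies how much more alike two translations from the same class are than two translations from different classes.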
Objective:
To describe the relationship between adherence to distinct dietary patterns and nutrition literacy.
Design:
We identified distinct dietary patterns using principal covariates regression (PCovR) and principal components analysis (PCA) from the Diet History Questionnaire II. Nutrition literacy was assessed using the Nutrition Literacy Assessment Instrument (NLit). Cross-sectional relationships between dietary pattern adherence and global and domain-specific NLit scores were tested by multiple linear regression. Mean differences in diet pattern adherence among three predefined nutrition literacy performance categories were tested by ANOVA.
Setting:
Metropolitan Kansas City, USA.
Participants:
Adults (n 386) with at least one of four diet-related diseases.
Results:
Three diet patterns of interest were derived: a PCovR prudent pattern and PCA-derived Western and Mediterranean patterns. After controlling for age, sex, BMI, race, household income, education level and diabetes status, PCovR prudent pattern adherence positively related to global NLit score (P < 0·001, β = 0·36), indicating more intake of prudent diet foods with improved nutrition literacy. Validating the PCovR findings, PCA Western pattern adherence inversely related to global NLit (P = 0·003, β = −0·13) while PCA Mediterranean pattern positively related to global NLit (P = 0·02, β = 0·12). Using predefined cut points, those with poor nutrition literacy consumed more foods associated with the Western diet (fried foods, sugar-sweetened beverages, red meat, processed foods) while those with good nutrition literacy consumed more foods associated with prudent and Mediterranean diets (vegetables, olive oil, nuts).
Conclusions:
Nutrition literacy predicted adherence to healthy/unhealthy diet patterns. These findings warrant future research to determine if improving nutrition literacy effectively improves eating patterns.
Data on the combination of foods consumed simultaneously at specific eating occasions are scarce, primarily due to a lack of assessment tools. We applied a recently developed meal coding system to multiple-day dietary intake data for assessing its ability to estimate food and nutrient intakes and characterise meal-based dietary patterns in the Japanese context. A total of 242 Japanese adults completed sixteen non-consecutive-day weighed dietary records, including 14 734 eating occasions (3788 breakfasts, 3823 lunches, 3856 dinners and 3267 snacks). Common food group combinations were identified by meal type to identify a range of generic meals. Dietary intake was calculated on the basis of not only the standard food composition database but also the substituted generic meal database. In total, eighty generic meals (twenty-three breakfasts, twenty-one lunches, twenty-four dinners and twelve snacks) were identified. The Spearman correlation coefficients between food group intakes calculated based on the standard food composition database and the substituted generic meal database ranged from 0·26 to 0·85 (median 0·69). The corresponding correlations for nutrient intakes ranged from 0·17 to 0·82 (median 0·61). A total of eleven meal patterns were established using principal components analysis, and these accounted for 39·1 % of total meal variance. Considerable variation in patterns was seen in meal type inclusion and choice of staple foods (bread, rice and noodles) and drinks, and also in meal constituents. In conclusion, this study demonstrated the usefulness of a meal coding system for assessing habitual diet, providing a scientific basis towards the development of simple meal-based dietary assessment tools.