Many well-known measures for the comparison of distinct partitions of the same set of n objects are based on the structure of class overlap presented in the form of a contingency table (e.g., Pearson's chi-square statistic, Rand's measure, or Goodman-Kruskal's τb), but all of them can be rephrased through the use of a simple cross-product index defined between the corresponding entries from two n × n proximity matrices that provide particular a priori (numerical) codings of the within- and between-class relationships for each of the partitions. We consider the task of optimally constructing the proximity matrices characterizing the partitions (under suitable restriction) so as to maximize the cross-product measure, or equivalently, the Pearson correlation between their entries. The major result presented states that within the broad classes of matrices that are either symmetric, skew-symmetric, or completely arbitrary, optimal representations are already derivable from what is given by a simple one-dimensional correspondence analysis solution. Besides severely limiting the type of structures that might be of interest to consider for representing the proximity matrices, this result also implies that correspondence analysis beyond one dimension must always be justified from logical bases other than the optimization of a single correlational relationship between the matrices representing the two partitions.
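As a rough illustration of the cross-product idea in the abstract above, the following sketch computes Rand's measure twice: once by counting object pairs directly, and once as a cross-product between two n × n proximity matrices. The +1/-1 coding of "same class" versus "different class" and all function names are assumptions chosen for this example; they are not taken from the paper, which considers more general codings.

```python
from itertools import combinations


def comembership(labels):
    """n x n matrix: 1 if objects i and j fall in the same class, else 0."""
    n = len(labels)
    return [[1 if labels[i] == labels[j] else 0 for j in range(n)]
            for i in range(n)]


def rand_from_pairs(a, b):
    """Rand's measure: share of object pairs on which both partitions agree."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def rand_from_cross_product(a, b):
    """Same measure from a cross-product of +1/-1 coded proximity matrices.

    Assumed coding (illustrative): +1 for 'same class', -1 for
    'different class'. Each ordered pair then contributes +1 when the
    partitions agree on it and -1 when they disagree.
    """
    n = len(a)
    p, q = comembership(a), comembership(b)
    gamma = sum((2 * p[i][j] - 1) * (2 * q[i][j] - 1)
                for i in range(n) for j in range(n) if i != j)
    # gamma / (n(n-1)) lies in [-1, 1]; rescale it to [0, 1].
    return (gamma / (n * (n - 1)) + 1) / 2


if __name__ == "__main__":
    part_a = [0, 0, 1, 1, 2, 2]
    part_b = [0, 0, 0, 1, 1, 1]
    print(rand_from_pairs(part_a, part_b))
    print(rand_from_cross_product(part_a, part_b))
```

Both functions return the same value because the normalized cross-product is a linear rescaling of the agreement count; other codings of the proximity matrices would yield other indices.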
Correspondence analysis can be described as a technique which decomposes the departure from independence in a two-way contingency table. In this paper a form of correspondence analysis is proposed which decomposes the departure from the quasi-independence model. This form seems to be a good alternative to ordinary correspondence analysis in cases where the use of the latter is either impossible or not recommended, for example, in case of missing data or structural zeros. It is shown that Nora's reconstitution of order zero, a procedure well-known in the French literature, is formally identical to our correspondence analysis of incomplete tables. Therefore, reconstitution of order zero can also be interpreted as providing a decomposition of the residuals from the quasi-independence model. Furthermore, correspondence analysis of incomplete tables can be performed using existing programs for ordinary correspondence analysis.
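The description of ordinary correspondence analysis as a decomposition of the departure from independence can be sketched numerically. The example below covers only the ordinary (complete-table) case, not the quasi-independence variant proposed in the abstract. It builds the matrix of standardized residuals from independence, whose sum of squares (the total inertia) equals Pearson's chi-square divided by n, and extracts the leading singular value by power iteration. Function names and the example table are hypothetical, and the code assumes a table with no zero row or column totals.

```python
import math


def standardized_residuals(table):
    """S[i][j] = (p_ij - r_i * c_j) / sqrt(r_i * c_j), with p = table / n."""
    n = sum(map(sum, table))
    r = [sum(row) / n for row in table]            # row masses
    c = [sum(col) / n for col in zip(*table)]      # column masses
    return [[(x / n - ri * cj) / math.sqrt(ri * cj) for x, cj in zip(row, c)]
            for row, ri in zip(table, r)]


def total_inertia(S):
    """Sum of squared standardized residuals (chi-square statistic / n)."""
    return sum(s * s for row in S for s in row)


def leading_singular_value(S, iters=200):
    """Largest singular value of S by power iteration on S^T S."""
    # Start from the row of S with the largest norm (lies in the row space).
    v = list(max(S, key=lambda row: sum(s * s for s in row)))
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:                                  # exact independence
        return 0.0
    v = [x / norm for x in v]
    for _ in range(iters):
        u = [sum(s * x for s, x in zip(row, v)) for row in S]           # S v
        w = [sum(S[i][j] * u[i] for i in range(len(S)))
             for j in range(len(v))]                                    # S^T u
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            return 0.0
        v = [x / norm for x in w]
    u = [sum(s * x for s, x in zip(row, v)) for row in S]
    return math.sqrt(sum(x * x for x in u))


if __name__ == "__main__":
    counts = [[20, 10, 5], [5, 10, 20]]            # made-up 2 x 3 table
    S = standardized_residuals(counts)
    print(total_inertia(S), leading_singular_value(S) ** 2)
```

For a table with two rows the residual matrix has rank one, so the first squared singular value accounts for all of the inertia; larger tables spread the inertia over several dimensions.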
Homogeneity analysis, or multiple correspondence analysis, is usually applied to k separate variables. In this paper we apply it to sets of variables by using sums within sets. The resulting technique is called OVERALS. It uses the notion of optimal scaling, with transformations that can be multiple or single. The single transformations consist of three types: nominal, ordinal, and numerical. The corresponding OVERALS computer program minimizes a least squares loss function by using an alternating least squares algorithm. Many existing linear and nonlinear multivariate analysis techniques are shown to be special cases of OVERALS. An application to data from an epidemiological survey is presented.
A discussion of alternative constraint systems has been lacking in the literature on correspondence analysis and related techniques. This paper reiterates earlier results that an explicit choice of constraints has to be made which can have important effects on the resulting scores. The paper also presents new results on dealing with missing data and probabilistic category assignment.
A method is proposed that combines dimension reduction and cluster analysis for categorical data by simultaneously assigning individuals to clusters and optimal scaling values to categories in such a way that a single between variance maximization objective is achieved. In a unified framework, a brief review of alternative methods is provided and we show that the proposed method is equivalent to GROUPALS applied to categorical data. Performance of the methods is appraised by means of a simulation study. The results of the joint dimension reduction and clustering methods are compared with the so-called tandem approach, a sequential analysis of dimension reduction followed by cluster analysis. The tandem approach is conjectured to perform worse when variables are added that are unrelated to the cluster structure. Our simulation study confirms this conjecture. Moreover, the results of the simulation study indicate that the proposed method also consistently outperforms alternative joint dimension reduction and clustering methods.
In van der Heijden and de Leeuw (1985) it was proposed to use loglinear analysis to detect interactions in a multiway contingency table, and to explore the form of these interactions with correspondence analysis. After performing the exploratory phase of the analysis, we will show here how the results found in this phase can be used for confirmation.
We study the class of multivariate distributions in which all bivariate regressions can be linearized by separate transformation of each of the variables. This class seems more realistic than the multivariate normal or the elliptical distributions, and at the same time its study allows us to combine the results from multivariate analysis with optimal scaling and classical multivariate analysis. In particular a two-stage procedure which first scales the variables optimally, and then fits a simultaneous equations model, is studied in detail and is shown to have some desirable properties.
Taxicab correspondence analysis is based on the taxicab singular value decomposition of a contingency table, and it shares some similar properties with correspondence analysis. It is more robust than the ordinary correspondence analysis, because it gives uniform weights to all the points. The visual map constructed by taxicab correspondence analysis has a larger sweep and clearer perspective than the map obtained by correspondence analysis. Two examples are provided.
Loglinear analysis and correspondence analysis provide us with two different methods for the decomposition of contingency tables. In this paper we will show that there are cases in which these two techniques can be used to complement each other. More specifically, we will show that correspondence analysis can often be viewed as providing a decomposition of the difference between two matrices, each following a specific loglinear model. Therefore, in these cases the correspondence analysis solution can be interpreted in terms of the difference between these loglinear models. A generalization of correspondence analysis, recently proposed by Escofier, will also be discussed. With this decomposition, which includes classical correspondence analysis as a special case, it is possible to use correspondence analysis as a complement to loglinear analysis in more instances than those described for classical correspondence analysis. In this context correspondence analysis is used for the decomposition of the residuals of specific restricted loglinear models.
Dual scaling (DS) is a multivariate exploratory method equivalent to correspondence analysis when analysing contingency tables. However, for the analysis of rating data, different proposals appear in the DS and correspondence analysis literature. It is shown here that a peculiarity of the DS method can be exploited to detect differences in response styles. Response styles occur when respondents use rating scales differently for reasons not related to the questions, often biasing results. A spline-based constrained version of DS is devised which can detect the presence of four prominent types of response styles, and is extended to allow for multiple response styles. An alternating nonnegative least squares algorithm is devised for estimating the parameters. The new method is appraised both by simulation studies and an empirical application.
The manner in which the conditional independence graph of a multiway contingency table affects the fitting and interpretation of the Goodman association model (RC) and of correspondence analysis (CA) is considered. Estimation of the row and column scores is presented in this context by developing a unified framework that includes both models. Incorporation of the conditional independence constraints inherent in the graph may lead to equal or additive scores for the corresponding marginal tables, depending on the topology of the graph. An example of doubly additive scores in the analysis of a Burt subtable is given.
The perturbation theory of the generalized eigenproblem is used to derive influence functions of each squared canonical correlation coefficient and the corresponding canonical vector pair. Three sample versions of these functions are described and some properties are noted. As particular applications, the influence function of the squared multiple correlation coefficient and influence functions of eigenvalues and eigenvectors in correspondence analysis are obtained. Three numerical examples are briefly discussed.
Correspondence analysis (CA) is a statistical method for depicting the relationship between two categorical variables, and usually places an emphasis on graphical representations. In this study, we discuss a CA formulation based on canonical correlation analysis (CCA). In CCA-based formulation, the correlations within and between row/column categories in a reduced dimensional space can be expressed by canonical variables. However, in existing CCA-based formulations, only orthogonal rotation is permitted. Herein, we propose an alternative CCA-based formulation that permits oblique rotation. In the proposed formulation, the CA loss function can be defined as maximizing the generalized coefficient of determination, which is a measure of proximity between two variables. Simulation studies and real data examples are presented in order to demonstrate the benefits of the proposed formulation.
The aim of this paper is to study the analysis of contingency tables with one heavyweight column or one heavyweight entry by taxicab correspondence analysis (TCA). Given that the mathematics of TCA is simpler than the mathematics of correspondence analysis (CA), the influence of one heavyweight column on the outputs of TCA is studied explicitly without recourse to asymptotics as done by Benzécri (Les Cahiers de L’Analyse des Données 4:413–16, 1979). A reweighting of the heavyweight column is proposed, which can also be applied to CA. A real data set is analyzed.
Whereas in one-mode data, individuals or groups are connected directly with one another through interactions or relations, in two-mode data, individuals are indirectly connected with one another through affiliations (events, organizations, associations, alliances, and so on). Affiliation data are often used as a proxy for detecting ties among social actors when direct evidence of ties is difficult to obtain. For example, it is generally easier to know that two people belong to the same club or work in the same department than to know that they have lunch together every Thursday. But affiliation data can also be used to see aspects of social structures not visible in one-mode networks. Duality is a kind of structural relation that shows how levels of social structure intersect with one another. We discuss the classic approach to duality as well as two generalizations that extend the duality approach in hierarchical, temporal, and spatial directions.
The components or functions derived from an eigenanalysis are linear combinations of the original variables. Principal components analysis (PCA) is a very common method that uses these components to examine patterns among the objects, often in a plot termed an ordination, and identify which variables are driving those patterns. Correspondence analysis (CA) is a related method used when the variables represent counts or abundances. Redundancy analysis and canonical CA are constrained versions of PCA and CA, respectively, where the components are derived after taking into account the relationships with additional explanatory variables. Finally, we introduce linear discriminant function analysis as a way of identifying and predicting membership of objects to predefined groups.
The information included on food packages has a crucial role in influencing consumer product associations and purchase decisions. In particular, visual and textual cues on processed and ultra-processed products can convey health-related associations that influence consumer healthiness perception and purchase decisions. In this context, the present work aimed to explore the use of health-related cues on the packages of processed and ultra-processed products sold in Uruguay to provide insights for policy making. A total of 3813 products from thirty-four different food categories found in four of the most important supermarket chains in Uruguay were surveyed. The textual and visual information included on the packages as well as the nutritional composition of the products were analysed. Results showed that 67 % of the products included at least one health-related cue. Pictures of culinary ingredients, natural and minimally processed foods were the most frequent health-related cue, followed by references to naturalness and claims related to critical nutrients. The prevalence of health-related cues largely differed across product categories, ranging from 100 to 17 %. The relationship between the presence of health-related cues on the packages and the excessive content of nutrients associated with non-communicable diseases was assessed using a gradient boosting model, which showed limited predictive ability. This suggests that the inclusion of health-related cues on food packages was not strongly related to the nutritional composition of products and therefore cannot be regarded as a healthiness indicator. These results stress the need to develop stricter labelling regulations to protect consumers from misleading information.
This article examines the phenomenon of so-called North African-style pottery made in early third-century York. The pottery, which was produced in significant quantities in late Ebor ware, is strikingly different from vessels in circulation in Roman Britain and the north-west provinces – so much so that the late Vivien Swan argued that it was ‘made by Africans for the use of Africans’. The present study reassesses the evidence of ceramic genealogical influences, production waste, fabric supply, consumption patterns and contextual finds associations. The results shed new light on the manufacture and use of late Ebor ware by York's military community, qualifying claims made about the repertoire's links with novel culinary practices, cultural diversity and the unique historical circumstances of Severan York.
Disturbances and successional dynamics shape the composition of tree communities, but data remain scarce for tropical forests of West Africa. We assessed the imprint of past disturbances on the composition of evergreen forests in an Ivorian National Park. We hypothesized that (i) Pioneer indices (PI) based on the relative proportion of pioneer and non-pioneer trees relate to changing floristic composition due to successional dynamics, (ii) local community richness peaks at an intermediate value of PI under the Intermediate Disturbance Hypothesis (IDH) and (iii) early successional communities have higher beta diversity due to erratic founder effects. We performed a Correspondence Analysis of tree composition of 38 plots and examined how the main components of floristic variation related to environmental factors and PI. In addition, we tested the relationship between PI, local richness and beta diversity. The variation of PI better explained the main components of floristic variation than abiotic environmental variation, supporting a primary role of successional dynamics in shaping tree communities. We found a peak of richness at intermediate values of PI, supporting the IDH, with a mixture of earlier- and later-successional species and more even abundances. The communities were very diverse and included many endemics and rare species. The results underline that the composition of early successional forests greatly varies depending on chance colonization events, while more similar old-growth communities are eventually observed after several decades. The findings should guide management practices for forest restoration, and for conservation of endangered species depending on their successional status.
This paper proposes a robust text classification and correspondence analysis approach to identification of similar languages. In particular, we propose to use the readily available information of clauses and word length distribution to model similar languages. The modeling and classification are based on the hypothesis that languages are self-adaptive complex systems and hence can be classified by dynamic features describing the system, especially in terms of distributional relations of constituents of a system. For similar languages whose grammatical differences are often subtle, classification based on dynamic system features should be more effective. To test this hypothesis, we considered both regional and genre varieties of Mandarin Chinese for classification. The data are extracted from two comparable balanced corpora to minimize possible confounding factors. The two corpora are the Sinica Corpus from Taiwan and the Lancaster Corpus of Mandarin Chinese from Mainland China, and the two genres are reportage and review. Our text classification and correspondence analysis results show that the linguistically felicitous two-level constituency model combining power functions between words and clauses effectively classifies the two varieties of Chinese for both genres. In addition, we found that genres do have a compounding effect on classification of regional varieties. In particular, reportage in two varieties is more likely to be classified than review, corroborating the complex system view of language variations. That is, language variations and changes typically do not take place evenly across the board for the complete language system. This further supports our hypothesis that dynamic complex system features, such as the power functions captured by the Menzerath–Altmann law, provide effective models in classifications of similar languages.