By extending the definitions of part and bipartial correlation to sets of variates, the notions of part and bipartial canonical correlation analysis are developed and illustrated.
The parameter matrices of factor analysis and principal component analysis are arbitrary with respect to the scale of the factors or components; typically, the scale is fixed so that the factors have unit variance. Oblique transformations to optimize an objective statement of a principle such as simple structure or factor simplicity yield arbitrary solutions, unless the criterion function is invariant with respect to the scale of the factors, or the parameter matrix is scale free with respect to the factors. Criterion functions that are factor scale-free have a number of invariance characteristics, such as being equally applicable to primary pattern or reference structure matrices. A scale-invariant simple structure function of previously studied function components is defined. First and second partial derivatives are obtained, and Newton-Raphson iterations are utilized. The resulting solutions are locally optimal and subjectively pleasing.
A scale-invariant index of factorial simplicity is proposed as a summary statistic for principal components and factor analysis. The index ranges from zero to one, and attains its maximum when all variables are simple rather than factorially complex. A factor scale-free oblique transformation method is developed to maximize the index. In addition, a new orthogonal rotation procedure is developed. These factor transformation methods are implemented using rapidly convergent computer programs. Observed results indicate that the procedures produce meaningfully simple factor pattern solutions.
The recent history of multidimensional data analysis suggests two distinct traditions that have developed along quite different lines. In multidimensional scaling (MDS), the available data typically describe the relationships among a set of objects in terms of similarity/dissimilarity (or (pseudo-)distances). In multivariate analysis (MVA), data usually result from observation on a collection of variables over a common set of objects. This paper starts from a very general multidimensional scaling task, defined on distances between objects derived from one or more sets of multivariate data. Particular special cases of the general problem, following familiar notions from MVA, will be discussed that encompass a variety of analysis techniques, including the possible use of optimal variable transformation. Throughout, it will be noted how certain data analysis approaches are equivalent to familiar MVA solutions when particular problem specifications are combined with particular distance approximations.
There is a unity underlying the diversity of models for the analysis of multivariate data. Essentially, they constitute a family of models, most generally nonlinear, for structural/functional relations between variables drawn from a behavior domain.
We study the class of multivariate distributions in which all bivariate regressions can be linearized by separate transformation of each of the variables. This class seems more realistic than the multivariate normal or the elliptical distributions, and at the same time its study allows us to combine the results from multivariate analysis with optimal scaling and classical multivariate analysis. In particular a two-stage procedure which first scales the variables optimally, and then fits a simultaneous equations model, is studied in detail and is shown to have some desirable properties.
We present an approach for evaluating coherence in multivariate systems that considers all the variables simultaneously. We operationalize the multivariate system as a network and define coherence as the efficiency with which a signal is transmitted throughout the network. We illustrate this approach with time series data from 15 psychophysiological signals representing individuals’ moment-by-moment emotional reactions to emotional films. First, we summarize the time series through nonparametric Receiver Operating Characteristic (ROC) curves. Second, we use Spearman rank correlations to calculate relationships between each pair of variables. Third, based on the obtained associations, we construct a network using the variables as nodes. Finally, we examine signal transmission through all the nodes in the network. Our results indicate that the network consisting of the 15 psychophysiological signals has a small-world structure, with three clusters of variables and strong within-cluster connections. This structure supports an effective signal transmission across the entire network. When compared across experimental conditions, our results indicate that coherence is relatively stronger for intense emotional stimuli than for neutral stimuli. These findings are discussed in relation to multivariate methods and emotion theories.
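The last three steps of the pipeline above (rank correlations, network construction, signal transmission) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how edges are defined or how efficiency is computed, so the correlation threshold and the use of global efficiency (the mean inverse shortest-path length over node pairs) are choices made here, and all function names are hypothetical. The ROC summarization step is omitted, and the ranking assumes data without ties.

```python
from collections import deque

import numpy as np


def spearman_matrix(data):
    """Spearman correlations between the columns of data (n_obs x n_vars).

    Assumes no ties: ranks are ordinal ranks from a double argsort.
    """
    ranks = np.argsort(np.argsort(data, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)


def build_network(corr, threshold):
    """Unweighted adjacency matrix: edge where |rho| >= threshold (assumed rule)."""
    adj = np.abs(corr) >= threshold
    np.fill_diagonal(adj, False)  # no self-loops
    return adj


def global_efficiency(adj):
    """Mean of 1/d(i, j) over ordered node pairs; unreachable pairs add 0."""
    n = adj.shape[0]
    total = 0.0
    for source in range(n):
        # Breadth-first search gives shortest path lengths from source.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in np.flatnonzero(adj[node]):
                nbr = int(nbr)
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(1.0 / d for v, d in dist.items() if v != source)
    return total / (n * (n - 1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signals = rng.normal(size=(200, 15))  # synthetic stand-in for 15 signals
    network = build_network(spearman_matrix(signals), threshold=0.1)
    print(global_efficiency(network))
```

An efficiency close to one means most pairs of variables are directly or almost directly connected, while values near zero indicate a fragmented network.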
In the applications of maximum likelihood factor analysis the occurrence of boundary minima instead of proper minima is no exception at all. In the past the causes of such improper solutions could not be detected. This was impossible because the matrices containing the parameters of the factor analysis model were kept positive definite. By dropping these constraints, it becomes possible to distinguish between the different causes of improper solutions. In this paper some of the most important causes are discussed and illustrated by means of artificial and empirical data.
Generalized structured component analysis (GSCA) is a multivariate method for examining theory-driven relationships between variables including components. GSCA can provide the deterministic component score for each individual once model parameters are estimated. As the traditional GSCA always standardizes all indicators and components, however, it could not utilize information on the indicators’ scale in parameter estimation. Consequently, its component scores could just show the relative standing of each individual for a component, rather than the individual’s absolute standing in terms of the original indicators’ measurement scales. In the paper, we propose a new version of GSCA, named convex GSCA, which can produce a new type of unstandardized components, termed convex components, which can be intuitively interpreted in terms of the original indicators’ scales. We investigate the empirical performance of the proposed method through the analyses of simulated and real data.
Very general multilinear models, called CANDELINC, and a practical least-squares fitting procedure, also called CANDELINC, are described for data consisting of a many-way array. The models incorporate the possibility of general linear constraints, which turn out to have substantial practical value in some applications, by permitting better prediction and understanding. Description of the model, and proof of a theorem which greatly simplifies the least-squares fitting process, is given first for the case involving two-way data and a bilinear model. Model and proof are then extended to the case of N-way data and an N-linear model for general N. The case N = 3 covers many significant applications. Two applications are described: one of two-way CANDELINC, and the other of CANDELINC used as a constrained version of INDSCAL. Possible additional applications are discussed.
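The kind of simplification the abstract mentions for the two-way case can be illustrated numerically. The sketch below assumes the bilinear model is constrained so that its row and column factors lie in the column spaces of two known design matrices with full column rank; under that assumption, least-squares fitting of the full data matrix reduces to fitting a much smaller projected matrix. The exact statement of the theorem in the paper is not given in the abstract, so this is one plausible reading, and all names are hypothetical.

```python
import numpy as np


def orthonormal_basis(design):
    """Orthonormal basis for the column space of a full-column-rank design."""
    q, _ = np.linalg.qr(design)  # reduced QR: q has as many columns as design
    return q


def fit_constrained_bilinear(y, x_rows, x_cols, rank):
    """Least-squares rank-`rank` fit of y with factors in span(x_rows), span(x_cols).

    Returns the fitted matrix, the small projected matrix, and its
    truncated-SVD approximation (the core of the fit).
    """
    qa = orthonormal_basis(x_rows)
    qb = orthonormal_basis(x_cols)
    small = qa.T @ y @ qb  # projected data, far smaller than y
    u, s, vt = np.linalg.svd(small, full_matrices=False)
    core = (u[:, :rank] * s[:rank]) @ vt[:rank]  # best rank-r fit to `small`
    return qa @ core @ qb.T, small, core


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y = rng.normal(size=(20, 12))
    x_rows = rng.normal(size=(20, 4))
    x_cols = rng.normal(size=(12, 3))
    fitted, small, core = fit_constrained_bilinear(y, x_rows, x_cols, rank=2)
    print(small.shape, np.sum((y - fitted) ** 2))
```

The reduction rests on the identity ||Y - Qa M Qb'||^2 = ||Y||^2 - ||Qa' Y Qb||^2 + ||Qa' Y Qb - M||^2 for orthonormal Qa and Qb, so only the last term depends on the model parameters M.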
We propose a prenet (product-based elastic net), a novel penalization method for factor analysis models. The penalty is based on the product of a pair of elements in each row of the loading matrix. The prenet not only shrinks some of the factor loadings toward exactly zero but also enhances the simplicity of the loading matrix, which plays an important role in the interpretation of the common factors. In particular, with a large amount of prenet penalization, the estimated loading matrix possesses a perfect simple structure, which is known as a desirable structure in terms of the simplicity of the loading matrix. Furthermore, the perfect simple structure estimation via the proposed penalization turns out to be a generalization of the k-means clustering of variables. On the other hand, a mild amount of the penalization approximates a loading matrix estimated by the quartimin rotation, one of the most commonly used oblique rotation techniques. Simulation studies compare the performance of our proposed penalization with that of existing methods under a variety of settings. The usefulness of the perfect simple structure estimation via our proposed procedure is presented through various real data applications.
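A minimal sketch of a product-based penalty of this kind is given below. The abstract states only that the penalty is built from products of pairs of elements in each row of the loading matrix; the elastic-net-style mix of an absolute and a squared term, the mixing parameter `gamma`, and the function name are assumptions made for illustration. The sketch shows why a loading matrix with perfect simple structure (at most one nonzero loading per row) incurs no penalty.

```python
import numpy as np


def prenet_penalty(loadings, gamma=0.5):
    """Sum over rows and factor pairs (j < k) of an elastic-net mix of the
    product lambda_ij * lambda_ik (assumed form, see lead-in)."""
    lam = np.asarray(loadings, dtype=float)
    n_factors = lam.shape[1]
    total = 0.0
    for j in range(n_factors):
        for k in range(j + 1, n_factors):
            prod = lam[:, j] * lam[:, k]  # row-wise products for this pair
            total += np.sum(gamma * np.abs(prod) + 0.5 * (1.0 - gamma) * prod ** 2)
    return float(total)


if __name__ == "__main__":
    simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.0, 0.6]])
    complex_ = np.array([[0.8, 0.5], [0.7, 0.0], [0.0, 0.6]])
    print(prenet_penalty(simple), prenet_penalty(complex_))
```

Because every product vanishes as soon as a row has a single nonzero entry, heavy penalization pushes each variable onto one factor, which is consistent with the clustering-of-variables interpretation described above.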
A distinction is drawn between redundancy measurement and the measurement of multivariate association for two sets of variables. Several measures of multivariate association between two sets of variables are examined. It is shown that all of these measures are generalizations of the (univariate) squared-multiple correlation; all are functions of the canonical correlations, and all are invariant under linear transformations of the original sets of variables. It is further shown that the measures can be considered to be symmetric and are strictly ordered for any two sets of observed variables. It is suggested that measures of multivariate relationship may be used to generalize the concept of test reliability to the case of vector random variables.
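Two of the properties claimed above can be checked numerically: that such measures reduce to the squared multiple correlation when one set contains a single variable, and that they are invariant under nonsingular linear transformations of either set. The sketch below computes canonical correlations with a QR-plus-SVD approach and then evaluates three simple functions of them. These three functions are chosen here for illustration and are not necessarily the measures examined in the paper; all names are hypothetical.

```python
import numpy as np


def canonical_correlations(x, y):
    """Canonical correlations between column sets x (n x p) and y (n x q)."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    qx, _ = np.linalg.qr(xc)  # orthonormal basis of centered x
    qy, _ = np.linalg.qr(yc)  # orthonormal basis of centered y
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)  # guard against rounding slightly above 1


def association_measures(rho):
    """Illustrative functions of the squared canonical correlations."""
    r2 = rho ** 2
    return {
        "mean_squared": float(r2.mean()),
        "largest_squared": float(r2.max()),
        "one_minus_product": float(1.0 - np.prod(1.0 - r2)),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    x = rng.normal(size=(100, 3))
    y = x[:, :2] @ rng.normal(size=(2, 2)) + rng.normal(size=(100, 2))
    print(association_measures(canonical_correlations(x, y)))
```

For these three particular functions the ordering mean_squared <= largest_squared <= one_minus_product always holds, which gives a small example of measures that are ordered for any data set.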
In this article we present an exploratory tool for extracting systematic patterns from multivariate data. The technique, hierarchical segmentation (HS), can be used to group multivariate time series into segments with similar discrete-state recurrence patterns and it is not restricted by the stationarity assumption. We use a simulation study to describe the steps and properties of HS. We then use empirical data on daily affect from one couple to illustrate the use of HS for describing the affective dynamics of the dyad. First, we partition the data into three periods that represent different affective states and show different dynamics between both individuals’ affect. We then examine the synchrony between both individuals’ affective states and identify different patterns of coherence across the periods. Finally, we discuss the possibilities of using results from HS to construct confirmatory dynamic models with multiple change points or regime-specific dynamics.
Discriminatory morpho-metric features are obvious on legume seeds. This study utilized seven quantitative and 11 qualitative seed traits to characterize 139 African yam bean (AYB) breeding lines which were developed through single seed descent procedure. The seven quantitative traits were subjected to analysis of variance, and their means were combined with qualitative scores for genetic distance, principal component (PC) and clustering analyses. Significant (P ≤ 0.001) variation existed among the breeding lines for the seven traits. Mean ranges of seed length (SL), width (SW), thickness (ST) and single seed weight (SSW) among the 139 breeding lines were, respectively, 6.77–10.22 mm, 5.70–7.86 mm, 4.96–7.45 mm and 0.15–0.42 g. Positive and significant (P ≤ 0.05) genotypic correlation existed among SSW, SL, SW and ST. Seed colour, pattern, shape, size, surface texture and brilliance varied among the breeding lines. Ranges of phenotypic and genotypic coefficients of variation and broad-sense heritability were 5.49–23.84%, 2.95–19.88% and 28.91–69.54%, respectively. Fourteen (quantitative and qualitative) traits contributed high (≥ 0.30) eigenvector loadings to the first three PC axes, which explained 57.9% of the total variation among the breeding lines. Similarity among the lines was 0.75. Four clusters ensued in the dendrogram, and the groups had genetic similarities of 0.85 (I), 0.82 (II), 0.78 (III) and 0.80 (IV). This research unveiled significant variation among AYB breeding lines with promising reliability for breeding opportunities of the qualitative and quantitative seed traits, which could contribute to higher grain yield and acceptability.
This research aimed to assess the agronomic performance of the progeny (F3 and F4 generations) of 48 newly developed Aus rice lines, using a randomized-complete-block-design under rainfed conditions. We found a wide range of variations in yield and yield-contributing traits among the studied genotypes. High broad-sense heritability percentages were found for sterility percentage (99.50 and 97.20), thousand-grain-weight (88.10 and 90.20 g), plant-height (84.90 and 86.90 cm) and days-to-maturity (84.50 and 97.60 d) in both F3 and F4 generations, respectively. However, the highest genetic advance as mean percentage was observed for sterility (48.00 and 50.60), effective tillers number per hill (ET) (44.70 and 47.10), total tillers number per hill (TT) (43.00 and 45.40) and filled-grains per panicle (41.00 and 43.20) respectively. Notably, the correlation study also identified the traits, TT (r = 0.31 and 0.45), ET (r = 0.30 and 0.44), straw yield (r = 0.57 and 0.39) and harvest index (r = 0.63 and 0.67) as effective for improving grain yield in both F3 and F4 generations, respectively. We identified higher grain yield per hill (g) and shorter to moderate crop growth duration (days) in several distinct accessions, including R1-49-7-1-1, R3-26-4-3-1, R1-6-2-3-1, R1-13-1-1-1, R1-50-1-1-1, R3-49-4-3-1, R1-47-7-3-1, R2-26-6-2-2, R3-30-1-2-1 and R1-44-1-2-1, among the 48 genotypes in both the F3 and F4 generations. A further location-specific agronomic study is recommended to assess the drought tolerance of these promising genotypes. This will further assess their suitability as potential breeding materials when developing rice varieties adapted to grow under fluctuating rainfall conditions.
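Broad-sense heritability and genetic advance as a percentage of the mean, as reported above, are commonly derived from the mean squares of a randomized complete block ANOVA. The sketch below uses one widely used set of textbook formulas on a plot basis; the abstract does not state which variance-component convention the authors used, the selection differential of 2.06 (5% selection intensity) is an assumption, and the input values in the demo are made up for illustration.

```python
import math


def genetic_parameters(ms_genotype, ms_error, n_reps, trait_mean, k=2.06):
    """Heritability and genetic advance from RCBD mean squares (plot basis).

    k is the standardized selection differential (2.06 is assumed here,
    corresponding to 5% selection intensity).
    """
    var_g = (ms_genotype - ms_error) / n_reps  # genotypic variance
    var_p = var_g + ms_error                   # phenotypic variance
    h2 = var_g / var_p                         # broad-sense heritability
    ga = k * h2 * math.sqrt(var_p)             # expected genetic advance
    return {
        "heritability_pct": 100.0 * h2,
        "gcv_pct": 100.0 * math.sqrt(var_g) / trait_mean,
        "pcv_pct": 100.0 * math.sqrt(var_p) / trait_mean,
        "ga_pct_of_mean": 100.0 * ga / trait_mean,
    }


if __name__ == "__main__":
    # Hypothetical mean squares, not taken from the study.
    print(genetic_parameters(ms_genotype=50.0, ms_error=10.0, n_reps=3, trait_mean=25.0))
```

High heritability combined with high genetic advance is usually read as a sign that selection on the trait should be effective, which is how the sterility and tiller-number results above are interpreted.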
Safflower, a semiarid crop, contains a healthy oil rich in unsaturated fatty acids. Genetically diverse accessions are important for the genetic maintenance of safflower and for breeding purposes. The objectives of the present investigation were to evaluate the morphological variation of 100 safflower accessions across two years (2022 and 2023), to explore similar genotypic groups and to identify the traits contributing most to the observed variability. The highest coefficient of variation (CV) was observed for seeds per secondary capitulum, number of capitula per plant and weight of lateral capitulum in the first year, and for number of capitula per plant and capitula per lateral branch in the second year. The factor analysis identified five factors in the first year and six factors in the second: yield components, height, seed yield, capitulum diameter and phenology, with number of branches identified as the extra factor in the second year. Results showed that most of the measured traits contributed to the variation in morphological traits. We defined seven distinct clusters, which made it possible to differentiate safflower accessions based on measured traits across two years. Forty-five accessions were grouped in similar clusters across the two years, with no or similar genotype-by-environment interaction. Some high-yielding accessions, such as C-47 and Lesaf-175, can be entered directly in multi-environment trials for cultivar release purposes. The recognized variation serves as a good resource for future safflower germplasm maintenance and breeding projects.
Multivariate analysis of variance and discriminant analysis were used to establish the crystal chemistry of several Al-rich smectites. The statistical analyses were carried out on 78 samples taken from the literature which were classified on the basis of their physicochemical properties. A strong discrimination exists between beidellites and montmorillonites, ‘non-ideal’ montmorillonites and ‘ideal’ montmorillonites, and Wyoming-type and Cheto-type montmorillonites. Of the Cheto-type montmorillonites, the Tatatilla-type samples are strongly discriminated, whereas the distinction between Chambers- and Otay-types is not strong. AlIV, AlVI, Fe, Mg, and Ca are generally important discriminating variables, whereas the tetrahedral portion of the layer charge, commonly used as a discriminating factor among these minerals, is only moderately significant.
Multivariate statistical analyses of geochemical, mineralogical, and cation-exchange capacity (CEC) data from a Venezuelan oil well were used to construct a model which relates elemental concentrations to mineral abundances. An r-mode factor analysis showed that most of the variance could be accounted for by four independent factors and that these factors were related to individual mineral components: kaolinite, illite, K-feldspar, and heavy minerals. Concentrations of Al, Fe, and K in core samples were used to estimate the abundances of kaolinite, illite, K-feldspar, and, by subtraction from unity, quartz. Concentrations of these elements were also measured remotely in the well by geochemical logging tools and were used to estimate these mineral abundances on a continuous basis as a function of depth. The CEC was estimated from a linear combination of the derived kaolinite and illite abundances. The formation's thermal neutron capture cross section estimated from the log-derived mineralogy and a porosity log agreed well with the measured data. Concentrations of V, among other trace elements, were modeled as linear combinations of the clay mineral abundances. The measured core V agreed with the derived values in shales and water-bearing sands, but exceeded the clay-derived values in samples containing heavy oil. The excess V was used to estimate the V content and API Gravity of the oil. The log-derived clay mineralogy was used to help distinguish nonmarine from transitional depositional environments. Kaolinite was the dominant clay in nonmarine deposits, whereas transitional sediments contained more illite.
The patterns and extent of genetic variation among 56 cashew germplasm with respect to 32 qualitative and 33 quantitative traits were evaluated for two successive years in the present study. Additionally, maturase K gene-based genetic diversity among those breeding materials was also assessed. The cashew hybrids were developed from five crosses (local parent × 2/9 Dicherla; H-2/15 × red hazari; WBDC-V × JGM-1; BLA-39-4 × H-2/15 and H-2/15 × yellow hazari) involving eight parents of Indian cashew. Different genotype groups (parents and their hybrids) showed significant variation in both the years of assessment based on quantitative characters. The highest Shannon–Weaver diversity (H′) was obtained for the colour of the young leaf (0.96), possibly indicating differential exposure to sunlight, mixing of various pigments and another set of chemicals such as phenolics, carotenoids, etc. in trees. From correlation studies, canopy spread, tree spread, nuts/m2 and nuts/panicle were found to be significantly and positively correlated with nut yield. In the year 2021, nuts/m2, area and tree height were the significant explanatory variables that explained 80% of the variation in the yield, whereas in 2022, nuts/m2, tree area, nuts/panicle, kernel weight, shell thickness, inflorescence breadth and sex ratio explained 86% of the variation. Principal component analysis indicated that the genotypes under study are diverse enough to be exploited for the future cashew improvement programmes.
Soybean is a major source of vegetable oil and protein worldwide. Globally, India is among the top five producers where soybean is a major oilseed grown under diverse agro-climatic conditions by small and marginal farmers. The present study aims to identify soybean varieties with higher yield levels, resistance to pests and diseases and adaptability to climatic fluctuations. One hundred and twenty-five (125) indigenous and exotic soybean germplasm accessions and five checks were evaluated and characterized for eight agro-morphological traits at five testing locations and also screened for frog-eye leaf spot (FLS) and yellow mosaic virus (YMV) diseases under hot-spot locations during the rainy season. A wide range of variability was observed among accessions for days to 50% flowering (39–59), plant height (41–111 cm), number of nodes/plant (10–30), pod clusters/plant (14–39), number of pods/plant (40–102), days to maturity (96–115), grain yield/plant (4.89–16.54 g) and 100-seed weight (6.02–13.72 g). Among various traits, 100-seed weight (0.45), number of pods/plant (0.60) and number of pod clusters/plant (0.38) were found to be major yield-contributing traits as they exhibited highly significant correlation with grain yield/plant. Principal components PCI and PCII with eigenvalue >1 accounted for 42.66 and 27.08% of the total variation, respectively. Accessions G24 (EC 393222) from Taiwan and G40 (IMP-1) from the USA belonging to cluster IV were found promising for multiple yield traits and JS 20–38 from cluster III for earliness as per cluster analysis. GGE biplot average environment coordination (AEC) view revealed that the accessions viz., G11 (EC 333872), G2 (EC 251506) and G47 (TNAU-S-55) were the best performing stable genotypes in terms of grain yield/plant across locations. Twelve accessions had a high level of resistance against both FLS and YMV diseases under natural hot-spot conditions which can be utilized as promising donors in the soybean breeding programme.
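The retention rule quoted above (principal components with eigenvalue greater than one, together with the percentage of total variation each explains) can be sketched as follows. The sketch assumes the analysis was run on the correlation matrix of the traits, which the abstract does not state, and it uses synthetic data; the function name is hypothetical.

```python
import numpy as np


def pca_summary(data):
    """Eigenvalues of the trait correlation matrix, percent variance
    explained, and indices of components with eigenvalue > 1."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # sorted largest first
    pct = 100.0 * eigvals / eigvals.sum()
    retained = np.flatnonzero(eigvals > 1.0)
    return eigvals, pct, retained


if __name__ == "__main__":
    rng = np.random.default_rng(4)
    latent = rng.normal(size=(130, 2))  # two synthetic underlying factors
    traits = latent @ rng.normal(size=(2, 8)) + 0.5 * rng.normal(size=(130, 8))
    eigvals, pct, retained = pca_summary(traits)
    print(np.round(pct, 2), retained)
```

With standardized traits the eigenvalues sum to the number of traits, so an eigenvalue above one marks a component that explains more variation than any single original trait.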