Poor socket fit is the leading cause of prosthetic limb discomfort. However, currently clinicians have limited objective data to support and improve socket design. Finite element analysis predictions might help improve the fit, but this requires internal and external anatomy models. While external 3D surface scans are often collected in routine clinical computer-aided design practice, detailed internal anatomy imaging (e.g., MRI or CT) is not. We present a prototype statistical shape model (SSM) describing the transtibial amputated residual limb, generated using a sparse dataset of 33 MRI and CT scans. To describe the maximal shape variance, training scans are size-normalized to their estimated intact tibia length. A mean limb is calculated and principal component analysis used to extract the principal modes of shape variation. In an illustrative use case, the model is interrogated to predict internal bone shapes given a skin surface shape. The model attributes ~52% of shape variance to amputation height and ~17% to slender-bulbous soft tissue profile. In cross-validation, left-out shapes influenced the mean by 0.14–0.88 mm root mean square error (RMSE) surface deviation (median 0.42 mm), and left-out shapes were recreated with 1.82–5.75 mm RMSE (median 3.40 mm). Linear regression between mode scores from skin-only- and full-model SSMs allowed prediction of bone shapes from the skin with 3.56–10.9 mm RMSE (median 6.66 mm). The model showed the feasibility of predicting bone shapes from surface scans, which addresses a key barrier to implementing simulation within clinical practice, and enables more representative prosthetic biomechanics research.
Microwaves (MWs) have emerged as a promising sensing technology to complement optical methods for monitoring floating plastic litter. This study uses machine learning (ML) to identify optimal MW frequencies for detecting floating macroplastics (>5 cm) across S, C, and X-bands. Data were obtained from dedicated wideband backscattering radio measurements conducted in a controlled indoor scenario that mimics deep-sea conditions. The paper presents new strategies to directly analyze the frequency domain signals using ML algorithms, instead of generating an image from those signals and analyzing the image. We propose two ML workflows, one unsupervised, to characterize the difference in feature importance across the measured MW spectrum, and the other supervised, based on multilayer perceptron, to study the detection accuracy in unseen data. For the tested conditions, the backscatter response of the plastic litter is optimal at X-band frequencies, achieving accuracies up to 90% and 80% for lower and higher water wave heights, respectively. Multiclass classification is also investigated to distinguish between different types of plastic targets. ML results are interpreted in terms of the physical phenomena obtained through numerical analysis, and quantified through an energy-based metric.
This chapter covers principal component analysis and low-rank models, which are popular techniques to process high-dimensional datasets with many features. We begin by defining the mean of random vectors and random matrices. Then, we introduce the covariance matrix which encodes the variance of any linear combination of the entries in a random vector, and explain how to estimate it from data. We model the geographic location of Canadian cities as a running example. Next, we present principal component analysis (PCA), a method to extract the directions of maximum variance in a dataset. We explain how to use PCA to find optimal low-dimensional representations of high-dimensional data and apply it to a dataset of human faces. Then, we introduce low-rank models for matrix-valued data and describe how to fit them using the singular-value decomposition. We show that this approach is able to automatically identify meaningful patterns in real-world weather data. Finally, we explain how to estimate missing entries in a matrix under a low-rank assumption and apply this methodology to predict movie ratings via collaborative filtering.
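As a rough illustration of the PCA step described in this abstract, the following sketch centers a data matrix and reads the directions of maximum variance off its singular-value decomposition. The function name `pca`, the synthetic two-dimensional data, and the random seed are assumptions made for this example; they are not taken from the chapter.

```python
import numpy as np

def pca(X, k):
    """Return the sample mean, the top-k principal directions, and their explained-variance ratios."""
    mean = X.mean(axis=0)
    Xc = X - mean                                   # center each feature
    # Rows of Vt are the principal directions of the centered data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s**2 / (X.shape[0] - 1)                   # eigenvalues of the sample covariance matrix
    ratio = var / var.sum()
    return mean, Vt[:k], ratio[:k]

# Hypothetical data: a 2-D cloud stretched along the direction (1, 1)
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])
R = np.array([[1.0, -1.0], [1.0, 1.0]]) / np.sqrt(2)
X = z @ R.T + np.array([10.0, -5.0])

mean, components, ratio = pca(X, 1)
```

Under these assumptions the first principal direction should align (up to sign) with (1, 1)/√2 and account for most of the variance.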
Avocado is a delicious fruit crop of great economic importance. Understanding the extent of variability present in the existing germplasm is important for identifying genotypes with specific traits and utilizing them in crop improvement. Information on genetic variability with respect to morphological and biochemical traits in Indian avocados is limited, which has hindered genetic improvement of the crop. In the current study, 83 avocado accessions from different regions of India were assessed for 17 important morphological and 8 biochemical traits. The results showed the existence of wide variability for traits such as fruit weight (75.88–934.12 g), pulp weight (48.08–736.19 g), seed weight (6.37–32.62 g), FRAP activity (27.65–119.81 mg AEAC/100 g), total carotenoids (0.96–7.17 mg/100 g), oil content (4.91–25.49%) and crude fibre (6.85–20.75%) in the studied accessions. The first three components of principal component analysis explained 54.79 per cent of total variance. Traits such as fruit weight, pulp weight, seed weight, moisture and oil content contributed more significantly towards total variance compared to other traits. The dendrogram constructed using Euclidean distance and Ward's minimum variance method divided the 83 accessions into two major groups and nine sub-clusters, suggesting wide variability in the accessions with respect to the studied traits. In this study, superior accessions for important traits such as fruit size (PA-102, PA-012), high pulp recovery (PA-036, PA-082), thick peel (PA-084, PA-043, PA-011, PA-008), high carotenoids (PA-026, PA-096) and high oil content (PA-044, PA-043, PA-046, PA-045) were identified, which have potential utility in further crop improvement programmes.
Perilla is a self-fertilizing crop widely used in East Asia for its seeds and leaves. Of the two varieties of Perilla, P. frutescens var. frutescens has long been used as a folk plant in South Korea. The seeds are rich in unsaturated fatty acids, which offer significant health benefits, making them popular for use in seed oil or as a spice. The leaves, with their high perilla ketone content and unique aroma, are used as leafy vegetables and spices. The morphological characteristics of crops are complex for various reasons, such as environmental factors and multiplicity. To better understand the morphological variations among three types of Perilla collected from three regions of South Korea, 7 qualitative traits and 10 quantitative traits were investigated using 500 Perilla accessions. The results of principal component analysis (PCA) indicated that the first two components together explained 52.2% of the overall variation. The analysis of the 500 Perilla accessions clearly distinguished cultivated var. frutescens from weedy var. crispa and also revealed differences between cultivated and weedy types of var. frutescens. Significant morphological differences were observed among the three types of Perilla, especially in seed and plant characteristics. When the PCA results were analysed by region, regional differences were observed for all three types of Perilla. Therefore, this study provides a better understanding of the morphological and geographical differences in Perilla grown and naturally occurring in South Korea, which will aid research on crop evolution and differentiation, as well as Perilla breeding programmes.
In this chapter, we present two important and related problems in data analysis: the low-rank approximation and principal component analysis (PCA), both based on singular value decomposition. First, we consider the low-rank approximation problem for mappings between two vector spaces. Next we specialize on the low-rank approximation problem for matrices in both induced norm and the Frobenius norm, which are of independent interest for applications. Then we consider PCA. These results are also useful in machine learning. Furthermore, as an extension of the ideas and methods, we present a study of some related matrix nearness problems.
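The matrix case mentioned in this abstract can be sketched with a truncated singular value decomposition. By the Eckart–Young theorem, the rank-k truncation is a best rank-k approximation in both the spectral (induced 2-) norm and the Frobenius norm, and the errors are determined by the discarded singular values. The function name `low_rank_approx` and the random test matrix are assumptions for illustration only.

```python
import numpy as np

def low_rank_approx(A, k):
    """Return the rank-k truncated-SVD approximation of A and all singular values of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_k = (U[:, :k] * s[:k]) @ Vt[:k]   # keep only the k largest singular triplets
    return A_k, s

# Hypothetical 8 x 6 matrix with random entries
rng = np.random.default_rng(1)
A = rng.normal(size=(8, 6))
A2, s = low_rank_approx(A, 2)

fro_err = np.linalg.norm(A - A2, "fro")   # equals sqrt(s[2]**2 + ... + s[5]**2)
spec_err = np.linalg.norm(A - A2, 2)      # equals s[2], the first discarded singular value
```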
Recent research into vowel covariation has suggested that speakers can be identified as leaders or laggers in multiple ongoing sound changes. What remains unclear is how stable a speaker’s patterns of covariation are over time and whether these leaders and laggers of sound changes remain leaders and laggers over time. We employ corpus data from 51 New Zealand English (NZE) speakers who were recorded at two time-points (eight years apart) and explore covariation between 10 monophthongs using principal component analysis (PCA). The results indicate significant stability across the time-points in two unique vowel clusters, suggesting that speakers’ covariation position within their community remains stable over time. The overall covariation patterns also replicate patterns previously observed in a different corpus of NZE, indicating that patterns of vowel covariation observed with PCA can be stable and replicable across multiple corpora.
Assessing children’s diets is currently challenging and burdensome. Abbreviated FFQ have the potential to assess dietary patterns in a rapid and standardised manner. Using nationally representative UK dietary intake and biomarker data, we developed an abbreviated FFQ to calculate dietary quality scores for pre-school and primary school-aged children. UK National Diet and Nutrition Survey (2008–2016) weekly consumption frequencies of 129 food groups from 4-d diaries were cross-sectionally analysed using principal component analysis. A 129-item score was derived, alongside a 12-item score based on foods with the six highest and six lowest coefficients. Participants included 1069 pre-schoolers and 2565 primary schoolchildren. The first principal component explained 3·4 and 3·0 % of the variation in the original diet variables for pre-school and primary school groups, respectively, and described a prudent diet pattern. Prudent diet scores were characterised by greater consumption of fruit, vegetables and tap water and lower consumption of crisps, manufactured coated chicken/turkey products, purchased chips and soft drinks for both age groups. Correlations between the 129-item and 12-item scores were 0·86 and 0·84 for pre-school and primary school-aged children, respectively. Bland–Altman mean differences between the scores were 0·00 sd; 95 % limits of agreement were −1·05 to 1·05 and −1·10 to 1·10 sd for pre-school and primary school-aged children, respectively. Correlations between dietary scores and nutritional biomarkers showed only minor attenuation for the 12-item compared with the 129-item scores, illustrating acceptable congruence between prudent diet scores. The two 12-item FFQ offer user-friendly tools to measure dietary quality among UK children.
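The Bland–Altman figures quoted in this abstract (a mean difference and 95 % limits of agreement) are conventionally computed as the mean of the paired differences plus or minus 1.96 standard deviations of those differences. The sketch below assumes that conventional definition; the abstract does not state the exact formula used, and the function name `bland_altman` and the toy scores are hypothetical.

```python
import numpy as np

def bland_altman(a, b):
    """Return (mean difference, lower limit, upper limit) for paired measurements a and b."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    mean_diff = d.mean()
    sd = d.std(ddof=1)                    # sample standard deviation of the differences
    return mean_diff, mean_diff - 1.96 * sd, mean_diff + 1.96 * sd

# Hypothetical paired scores from a long and a short questionnaire (not study data)
full_score = [1.0, 2.0, 3.0, 4.0]
short_score = [1.5, 1.5, 3.5, 3.5]
mean_diff, lower, upper = bland_altman(full_score, short_score)
```

When both scores are standardised, as the units of sd in the abstract suggest, the mean difference is zero by construction and the limits are symmetric about zero.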
This chapter covers a number of disparate applications of quantum computing in the area of machine learning. We only consider situations where the dataset is classical (rather than quantum). We cover quantum algorithms for big-data problems relying upon high-dimensional linear algebra, such as Gaussian process regression and support vector machines. We discuss the prospect of achieving a quantum speedup with these algorithms, which face certain input/output caveats and must compete against quantum-inspired classical algorithms. We also cover heuristic quantum algorithms for energy-based models, which are generative machine learning models that learn to produce outputs similar to those in a training dataset. Next, we cover a quantum algorithm for the tensor principal component analysis problem, where a quartic speedup may be available, as well as quantum algorithms for topological data analysis, which aim to compute topologically invariant properties of a dataset. We conclude by covering quantum neural networks and quantum kernel methods, where the machine learning model itself is quantum in nature.
We present a multidimensional data analysis framework for the analysis of ordinal response variables. Underlying the ordinal variables, we assume a continuous latent variable, leading to cumulative logit models. The framework includes unsupervised methods, when no predictor variables are available, and supervised methods, when predictor variables are available. We distinguish between dominance variables and proximity variables, where dominance variables are analyzed using inner product models, whereas the proximity variables are analyzed using distance models. An expectation–majorization–minimization algorithm is derived for estimation of the parameters of the models. We illustrate our methodology with three empirical data sets highlighting the advantages of the proposed framework. A simulation study is conducted to evaluate the performance of the algorithm.
Glucosinolates (GSLs) are significant and specialized metabolites found in Brassicas that have crucial roles in both human health and plant defence. The present study investigated sinigrin, progoitrin and glucoerucin in Indian cauliflower genotypes using high-performance liquid chromatography (HPLC). For this, 37 genotypes of cauliflower from early (14), mid-early (6), mid-late (15) and late (2) maturity groups along with broccoli (two) and Sicilian purple (one) were evaluated in randomized block design during 2019–20 and 2020–21. Glucoerucin was predominant in most of the cauliflower genotypes (30), followed by sinigrin (5) and progoitrin (2). It was also prominent in broccoli genotypes. Progoitrin was the principal GSL in Sicilian Purple ‘PC-1’ (2.430 μmol/g). In cauliflower, glucoerucin, progoitrin and sinigrin ranged from 0.067 to 7.248 μmol/g, 0.001 to 0.849 μmol/g and 0.001 to 3.310 μmol/g, respectively. Pusa Deepali (early), Pusa Sharad (mid-early) and Pusa Shukti (mid-late) were found to be ‘low progoitrin-high glucoerucin’ varieties in their respective groups. In the late group, Pusa Snowball Kt-25 had low progoitrin. Glucoerucin and sinigrin were highest in the mid-early group. Progoitrin was highest in genotypes harvested in the first fortnight of November and the second fortnight of February, whereas sinigrin and glucoerucin were maximum in the genotypes harvested during the second fortnight of November. The K-means clustering identified four clusters, and principal component analysis revealed two principal components. The information on three GSLs in Indian cauliflower will be useful for breeding varieties with desirable GSL profiles for public health and plant defence.
This work aimed to evaluate the impact of conversion from native vegetation to pastures and agriculture on soil quality in the Brazilian semi-arid region and identify which soil attributes have the greatest potential as soil quality indicators. We collected soil samples at 0–10 and 10–20 cm layers from seven municipalities in the Brazilian semi-arid region. We determined the stocks of total soil organic carbon (TOC), total nitrogen (TN), carbon and nitrogen from microbial biomass (MB-C, MB-N), oxidizable fractions, humic substances, granulometry, soil bulk density (BD), pH, P, and cation exchange capacity (CEC). The evaluated systems were pasture, agriculture with different implementation times, and native forest (Caatinga biome). The results show that conventional cultivation and grazing systems lead to substantial losses of fundamental attributes needed to maintain soil quality. The study observed losses of MB-C, TOC, TN, and more recalcitrant fractions like fulvic acid and humin, along with a reduction in soil P and CEC. Soil physical, chemical, and biological attributes work as indicators of separation between environments; however, labile compartments showed greater potential as indicators of land use changes, being considered the main indicators in the soil quality assessment.
This chapter provides a selective review of the factor-augmented regression (FAR) models, where the factors are usually estimated from a large set of observed data, and then as “generated regressors” enter into the next stage of regression. It begins with an introduction to the large-dimensional factor models and the widely used principal component analysis (PCA) estimator. Then we review FAR models with time series data, the extensions of FAR to some nonlinear models, and the factor-augmented panel regressions. Lastly, we briefly introduce some applications of FAR to financial markets.
Metabolic syndrome (MetS) is a widespread and complex health disorder. Dietary habits and consumption of simple sugars have been shown to play an important role in the prevention and treatment of MetS. This cross-sectional study was conducted in a population of 3380 adults from the Shiraz University of Medical Sciences (SUMS) employees’ health cohort. The healthy beverage index (HBI) and healthy beverage score (HBS) were calculated. Risk for MetS and its components, including blood pressure, fasting blood glucose, waist circumference, triglyceride levels, and high-density lipoprotein cholesterol, were measured using standardised protocols. Results showed that higher adherence to HBI (OR = 0.60, 95% CI: 0.48–0.74, P < 0.001) and HBS (OR = 0.80, 95% CI: 0.65–0.97, P = 0.030) was significantly associated with lower risk of MetS. We also observed a significant association between higher levels of HBI and HBS and decreased risk of hypertension, a critical component of MetS. These findings support the notion that healthier beverage consumption, as indicated by higher HBI and HBS levels, may play a critical role in reducing the risk of MetS.
The ripening-dependent changes in antioxidant activities and phytochemical content of mango (Mangifera indica L.) cultivar Safaid Chonsa at various ripening stages were evaluated. The ripening time period was divided into five stages (RSI–RSV) and the pulp was subjected to proximate analysis, antioxidant potential, and UHPLC/MS-based non-targeted metabolite fingerprinting. Proximate analyses depicted variations in moisture, dry matter, fat, protein, carbohydrate, and energy parameters. Maximum DPPH activity (51%) was observed at stages III, IV, and V, while FRSP increased 31% at stage V as compared to stage I. Total antioxidant capacity and total reducing power potential were maximum (295.7 and 345.71 µg AAE/mg extract, respectively) at stage V. Total phenolic content increased from 3.57 µg GAE/mg extract to 5.72 µg GAE/mg extract from stage I to stage III, while a 19% increase in total flavonoid content was observed at stage V as compared to stage I. UHPLC/MS analysis showed the presence of aconitic acid, methylisocitric acid, 4-O-methyl gallate, beta-glucogallin, xanthenes, sakebiose, isobergaptene, fructoselysine 6-phosphate, citbismine C, and many others at different ripening stages of Chonsa mango extracts. The results indicate that during the mango ripening stages, changes in phytochemical composition are positively correlated with antioxidant potential. These phytochemicals have nutritional and nutraceutical effects on human health; therefore, the ripening stage should be considered for consumption of mango.
This study bridges the study of social inclusion with welfare regime theory. By linking social inclusion with welfare regimes, we establish a novel analytical framework for assessing global trends and national divergences in social inclusion based on a multidimensional view of the concept. While scholars have developed typologies for social inclusion and welfare regimes independent of each other, limited insights exist on how social inclusion relates to welfare regime typologies. We develop a social inclusion index for 225 countries using principal component analysis with 10 measures of social inclusion from the United Nations’ Sustainable Development Goals Indicators Database. We then employ clustering algorithms to inductively group countries based on the index. We find six “worlds” of social inclusion based on the index and other social factors – the Low, Mid, and High Social Inclusion regimes and the Low, Mid, and High Social Exclusion regimes.
The fuzzy perspective in statistical analysis is first illustrated with reference to the “Informational Paradigm” allowing us to deal with different types of uncertainties related to the various informational ingredients (data, model, assumptions). The fuzzy empirical data are then introduced, referring to J LR fuzzy variables as observed on I observation units. Each observation is characterized by its center and its left and right spreads (LR1 fuzzy number) or by its left and right “centers” and its left and right spreads (LR2 fuzzy number). Two types of component models for LR1 and LR2 fuzzy data are proposed. The estimation of the parameters of these models is based on a Least Squares approach, exploiting an appropriately introduced distance measure for fuzzy data. A simulation study is carried out in order to assess the efficacy of the suggested models as compared with traditional Principal Component Analysis on the centers and with existing methods for fuzzy and interval valued data. An application to real fuzzy data is finally performed.
The component loadings are interpreted by considering their magnitudes, which indicate how strongly each of the original variables relates to the corresponding principal component. The usual ad hoc practice in the interpretation process is to ignore the variables with small absolute loadings or to set to zero any loadings smaller than some threshold value. This, in fact, makes the component loadings sparse in an artificial and subjective way. We propose a new alternative approach, which produces sparse loadings in an optimal way. The introduced approach is illustrated on two well-known data sets and compared to the existing rotation methods.
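For concreteness, the ad hoc thresholding practice that this abstract criticizes can be sketched as follows. This is not the optimal sparse method the authors propose, whose details the abstract does not give; the function name `threshold_loadings`, the threshold value, and the example loadings are all assumptions.

```python
import numpy as np

def threshold_loadings(loadings, tau):
    """Zero out every loading whose absolute value is below the threshold tau."""
    L = np.asarray(loadings, dtype=float)
    return np.where(np.abs(L) >= tau, L, 0.0)

# Hypothetical loadings: 4 variables (rows) on 2 components (columns)
loadings = np.array([
    [0.70, 0.10],
    [0.65, -0.05],
    [0.20, 0.80],
    [-0.15, 0.58],
])
sparse = threshold_loadings(loadings, tau=0.3)
```

The thresholded columns are in general no longer unit-length or mutually orthogonal, and the result depends on the chosen `tau`, which illustrates why the abstract calls this kind of sparsity artificial and subjective.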
Factor analysis and principal component analysis result in computing a new coordinate system, which is usually rotated to obtain a better interpretation of the results. In the present paper, the idea of rotation to simple structure is extended to two dimensions. While the classical definition of simple structure is aimed at rotating (one-dimensional) factors, the extension to a simple structure for two dimensions is based on the rotation of planes. The resulting planes (principal planes) reveal a better view of the data than planes spanned by factors from classical rotation and hence allow a more reliable interpretation. The usefulness of the method as well as the effectiveness of a proposed algorithm are demonstrated by simulation experiments and an example.
Homogeneity analysis, or multiple correspondence analysis, is usually applied to k separate variables. In this paper we apply it to sets of variables by using sums within sets. The resulting technique is called OVERALS. It uses the notion of optimal scaling, with transformations that can be multiple or single. The single transformations consist of three types: nominal, ordinal, and numerical. The corresponding OVERALS computer program minimizes a least squares loss function by using an alternating least squares algorithm. Many existing linear and nonlinear multivariate analysis techniques are shown to be special cases of OVERALS. An application to data from an epidemiological survey is presented.