This chapter statistically tests the relationship between American hierarchy, property rights, and state capacity using mediation analysis. It finds that American economic hierarchy enhances property rights in partner states, indirectly strengthening state capacity. The analysis explores scope conditions and the interaction between security and economic hierarchy, highlighting their contrasting effects on state-building. The chapter discusses the implications of the quantitative results for cases like Afghanistan.
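As a rough illustration of the method named above, the sketch below runs a product-of-coefficients mediation analysis in Python on simulated data. The variable names (hierarchy, property_rights, capacity) and all numbers are illustrative assumptions, not the chapter's data or estimates.

```python
# Generic product-of-coefficients mediation sketch on simulated data.
# Variable names (hierarchy, property_rights, capacity) are illustrative only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
hierarchy = rng.normal(size=n)                           # "treatment" X (hypothetical)
property_rights = 0.5 * hierarchy + rng.normal(size=n)   # mediator M (hypothetical)
capacity = 0.4 * property_rights + 0.1 * hierarchy + rng.normal(size=n)  # outcome Y

# Path a: mediator regressed on treatment.
a = sm.OLS(property_rights, sm.add_constant(hierarchy)).fit().params[1]

# Path b and direct effect c': outcome regressed on mediator and treatment.
mb = sm.OLS(capacity, sm.add_constant(np.column_stack([property_rights, hierarchy]))).fit()
b, c_prime = mb.params[1], mb.params[2]

indirect = a * b   # effect transmitted through the mediator
print(f"indirect effect = {indirect:.3f}, direct effect = {c_prime:.3f}")
```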
This study investigated the factors influencing the mental health of rural doctors in Hebei Province in order to provide a basis for improving their mental health and enhancing the level of primary health care.
Background:
The aim of this study was to understand the mental health of rural doctors in Hebei Province, identify the factors that influence it, and propose ways to improve their psychological status and the level of medical service they provide.
Methods:
Rural doctors from 11 cities in Hebei Province were randomly selected, and their basic characteristics and mental health status were surveyed via a structured questionnaire and the Symptom Checklist-90 (SCL-90). The differences between the SCL-90 scores of rural doctors in Hebei Province and the Chinese population norm, as well as the proportion of doctors with mental health problems, were compared. Logistic regression was used to analyse the factors that affect the mental health of rural doctors.
Results:
A total of 2593 valid questionnaires were received. The results of the study revealed several findings: the younger the rural doctors, the greater the incidence of mental health problems (OR = 0.792); female rural doctors were more likely to experience mental health issues than their male counterparts (OR = 0.789); rural doctors with disabilities and chronic diseases faced a significantly greater risk of mental health problems compared to healthy rural doctors (OR = 2.268); rural doctors with longer working hours had a greater incidence of mental health problems; and rural doctors with higher education backgrounds had a higher prevalence of somatization (OR = 1.203).
Conclusion:
Rural doctors who are younger, male, have been in medical service longer, have a chronic illness or disability, and have a high degree of education are at greater risk of developing mental health problems. Attention should be given to the mental health of the rural doctor population to improve primary health care services.
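The odds ratios reported above come from logistic regression. As a minimal, hypothetical sketch of that step, the code below fits a logistic model on simulated data and exponentiates the coefficients to obtain odds ratios; the predictor names and simulated values are assumptions for illustration, not the study's data.

```python
# Minimal logistic-regression sketch producing odds ratios; predictor names
# and simulated data are hypothetical, not the study's actual data.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "age_group": rng.integers(1, 6, n),       # ordinal age band (hypothetical)
    "female": rng.integers(0, 2, n),
    "chronic_disease": rng.integers(0, 2, n),
    "weekly_hours": rng.normal(45, 10, n),
})
logit_p = -1.0 - 0.2 * df["age_group"] + 0.3 * df["chronic_disease"] + 0.02 * df["weekly_hours"]
df["mh_problem"] = rng.random(n) < 1 / (1 + np.exp(-logit_p))   # simulated outcome

X = sm.add_constant(df[["age_group", "female", "chronic_disease", "weekly_hours"]])
fit = sm.Logit(df["mh_problem"].astype(float), X).fit(disp=0)
print(np.exp(fit.params))   # exponentiated coefficients = odds ratios
```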
This chapter begins by defining an averaging procedure for random variables, known as the mean. We show that the mean is linear, and also that the mean of the product of independent variables equals the product of their means. Then, we derive the mean of popular parametric distributions. Next, we caution that the mean can be severely distorted by extreme values, as illustrated by an analysis of NBA salaries. In addition, we define the mean square, which is the average squared value of a random variable, and the variance, which is the mean square deviation from the mean. We explain how to estimate the variance from data and use it to describe temperature variability at different geographic locations. Then, we define the conditional mean, a quantity that represents the average of a variable when other variables are fixed. We prove that the conditional mean is an optimal solution to the problem of regression, where the goal is to estimate a quantity of interest as a function of other variables. We end the chapter by studying how to estimate average causal effects.
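Two of the chapter's points lend themselves to a short numerical check: estimating the mean and variance from data, and the conditional mean minimizing mean squared error. The sketch below is only an illustration under an assumed model Y = X^2 + noise, not the chapter's own examples.

```python
# Sketch of two ideas summarized above: estimating mean/variance from data,
# and the conditional mean as the minimum-MSE predictor (simulated example).
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100_000)
y = x**2 + rng.normal(scale=0.5, size=x.size)   # so E[Y | X = x] = x**2

print("sample mean:", y.mean(), "sample variance:", y.var(ddof=1))

# The conditional mean beats the best constant predictor (the unconditional mean).
mse_cond_mean = np.mean((y - x**2) ** 2)        # predict with E[Y | X]
mse_uncond = np.mean((y - y.mean()) ** 2)       # predict with E[Y]
print(mse_cond_mean, "<", mse_uncond)
```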
Political scientists regularly rely on a selection-on-observables assumption to identify causal effects of interest. Once a causal effect has been identified in this way, a wide variety of estimators can, in principle, be used to consistently estimate the effect of interest. While these estimators are all justified by appeals to the same causal identification assumptions, they often differ greatly in how they make use of the data at hand. For instance, methods based on regression rely on an explicit model of the outcome variable but do not explicitly model the treatment assignment process, whereas methods based on propensity scores explicitly model the treatment assignment process but do not explicitly model the outcome variable. Understanding the tradeoffs between estimation methods is complicated by these seemingly fundamental differences. In this paper we seek to rectify this problem. We do so by clarifying how most estimators of causal effects that are justified by an appeal to a selection-on-observables assumption are all special cases of a general weighting estimator. We then explain how this commonality provides for diagnostics that allow for meaningful comparisons across estimation methods—even when the methods are seemingly very different. We illustrate these ideas with two applied examples.
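As a minimal, simulated illustration of the kind of commonality the paper discusses, the sketch below estimates the same average treatment effect by regression adjustment (modeling the outcome) and by inverse-probability weighting (modeling the treatment). The data-generating process is an assumption, and this is not the paper's general weighting estimator or its diagnostics.

```python
# Simulated illustration: under selection on observables, a regression-adjustment
# estimate and an inverse-probability-weighting (IPW) estimate target the same ATE.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 20_000
x = rng.normal(size=n)                              # observed confounder
p = 1 / (1 + np.exp(-0.8 * x))                      # true propensity score
d = rng.random(n) < p                               # treatment assignment
y = 2.0 * d + 1.5 * x + rng.normal(size=n)          # true ATE = 2.0

# Regression adjustment: model the outcome.
ols = sm.OLS(y, sm.add_constant(np.column_stack([d.astype(float), x]))).fit()

# IPW: model the treatment, then reweight outcomes by estimated propensities.
ps = sm.Logit(d.astype(float), sm.add_constant(x)).fit(disp=0).predict(sm.add_constant(x))
ipw = np.mean(d * y / ps) - np.mean((1 - d) * y / (1 - ps))

print("regression adjustment:", ols.params[1], "IPW:", ipw)
```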
Factor score indeterminacy is a characteristic property of factor analysis (FA) models. This research introduces a novel procedure, regression-based factor score exploration (RFE), which uniquely determines factor scores and simultaneously estimates other parameters of the FA model. RFE uniquely determines factor scores by minimizing a loss function that balances FA and multivariate regression, regulated by a tuning parameter. Theoretical aspects of RFE, including the uniqueness of factor scores, the relationship between observed and latent variables, and rotational indeterminacy, are examined. Additionally, clustering-based factor exploration (CFE) is presented as a variant of RFE, derived by generalizing the penalty term to enable the clustering of factor scores. It is demonstrated that CFE creates cluster structures more accurately than the existing method. A simulation study shows that the proposed procedures accurately recover true parameter matrices even in the presence of error-contaminated data, with lower computational demand compared to existing methods. Real data examples illustrate that the proposed procedures provide interpretable results that are highly related to the factor scores obtained by existing methods.
In this chapter, we introduce the reader to basic concepts in machine learning. We start by defining artificial intelligence, machine learning, and deep learning. We give a historical viewpoint on the field, also from the perspective of statistical physics. Then, we give a very basic introduction to different tasks that are amenable to machine learning, such as regression or classification, and explain various types of learning. We end the chapter by explaining how to read the book and how chapters depend on each other.
This paper uses a two-step approach to modelling the probability of a policyholder making an auto insurance claim. We perform clustering via Gaussian mixture models and then fit cluster-specific binary regression models. We use telematics information along with traditional auto insurance information and find that the best model incorporates telematics, without the need for dimension reduction via principal components. We also utilise the probabilistic estimates from the mixture model to account for the uncertainty in the cluster assignments. The clustering process allows for the creation of driving profiles and offers a fairer method for policyholder segmentation than when clustering is not used. By fitting separate regression models to the observations from the respective clusters, we are able to offer differential pricing, which recognises that policyholders have different exposures to risk despite having similar covariate information, such as total miles driven. The approach outlined in this paper offers an explainable and interpretable model that can compete with black box models. Our comparisons are based on a synthesised telematics data set that was emulated from a real insurance data set.
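A minimal sketch of the two-step idea follows, assuming simulated data and hypothetical telematics-style features; the paper's own data, feature set, and model choices are not reproduced here.

```python
# Two-step sketch: Gaussian-mixture clustering of policyholders, then a separate
# claim-probability model per cluster (simulated data, hypothetical features).
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 5_000
X = np.column_stack([
    rng.normal(12, 4, n),      # annual miles driven, in thousands (hypothetical)
    rng.normal(0.3, 0.1, n),   # share of night-time driving (hypothetical)
])
claim = (rng.random(n) < 0.1).astype(int)   # simulated binary claim indicator

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
hard = gmm.predict(X)          # hard cluster labels ("driving profiles")
soft = gmm.predict_proba(X)    # soft assignments capture cluster uncertainty

models = {}
for k in range(gmm.n_components):
    mask = hard == k
    # Weight each observation by its membership probability in cluster k.
    models[k] = LogisticRegression().fit(X[mask], claim[mask],
                                         sample_weight=soft[mask, k])

# Predicted claim probability for a policyholder, mixing over cluster memberships.
x_new = X[:1]
p_new = sum(gmm.predict_proba(x_new)[0, k] * models[k].predict_proba(x_new)[0, 1]
            for k in range(gmm.n_components))
print(p_new)
```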
Regression is a fundamental prediction task common in data-centric engineering applications that involves learning mappings between continuous variables. In many engineering applications (e.g., structural health monitoring), feature-label pairs used to learn such mappings are of limited availability, which hinders the effectiveness of traditional supervised machine learning approaches. This paper proposes a methodology for overcoming the issue of data scarcity by combining active learning (AL) for regression with hierarchical Bayesian modeling. AL is an approach for preferentially acquiring feature-label pairs in a resource-efficient manner. In particular, the current work adopts a risk-informed approach that leverages contextual information associated with regression-based engineering decision-making tasks (e.g., inspection and maintenance). Hierarchical Bayesian modeling allows multiple related regression tasks to be learned over a population, capturing local and global effects. The information sharing facilitated by this modeling approach means that information acquired for one engineering system can improve predictive performance across the population. The proposed methodology is demonstrated using an experimental case study. Specifically, multiple regressions are performed over a population of machining tools, where the quantity of interest is the surface roughness of the workpieces. An inspection and maintenance decision process is defined using these regression tasks, which is in turn used to construct the active-learning algorithm. The novel methodology proposed is benchmarked against an uninformed approach to label acquisition and independent modeling of the regression tasks. It is shown that the proposed approach has superior performance in terms of expected cost, maintaining predictive performance while reducing the number of inspections required.
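As a simplified sketch of active learning for regression, the loop below queries whichever unlabeled point has the largest Gaussian-process predictive uncertainty. This variance-based acquisition rule stands in for the paper's risk-informed rule, and the hierarchical Bayesian population model is not reproduced; the one-dimensional data are simulated.

```python
# Simplified active-learning loop for regression: query the unlabeled point with
# the largest predictive uncertainty (variance-based, not the paper's risk-informed rule).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(5)
X_pool = np.linspace(0, 10, 200)[:, None]
y_pool = np.sin(X_pool).ravel() + rng.normal(scale=0.1, size=200)

labeled = list(rng.choice(200, size=5, replace=False))   # small initial design
for _ in range(20):                                      # label-acquisition budget
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(X_pool[labeled], y_pool[labeled])
    _, std = gp.predict(X_pool, return_std=True)
    std[labeled] = -np.inf                               # never re-query a labeled point
    labeled.append(int(np.argmax(std)))                  # acquire the most uncertain point

print(f"{len(labeled)} labels acquired")
```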
The current context of regressive border regimes challenges critical theory’s commitments. Can we still take recent legal and political practices as starting points for reimagining political norms and institutions based on a reconstruction of hidden emancipatory potentials? The chapter argues that critical border theory could benefit from recentering the idea of political representation, and especially from building on insights of the recent constructivist turn in representation theory. Understanding political representation as shape-shifting and constituency-mobilizing changes long-held assumptions about the spaces, subjects, and demands articulated in border politics. While this representative perspective has diagnostic advantages, it is unable to criticize the legitimacy of existing border regimes owing to its thin normative assumptions. Reconstructive approaches to border politics should therefore use the diagnostic tools of the recent representation scholarship without committing to their limited critical potential.
Multivariate selection can be represented as a linear transformation in a geometric framework. This approach has led to considerable simplification in the study of the effects of selection on factor analysis. In this note this approach is extended to describe the effects of selection on regression analysis and to adjust for the effects of selection using the inverse of the linear transformation.
Bayesian least squares techniques are adapted to estimation of stimulus-response curves, rather broadly conceived. Illustrative examples deal with estimation of person characteristic curves and item characteristic curves in the context of mental testing, and estimation of a stimulus-response curve using data from a psychophysical experiment.
There is a unity underlying the diversity of models for the analysis of multivariate data. Essentially, they constitute a family of models, most generally nonlinear, for structural/functional relations between variables drawn from a behavior domain.
This paper discusses least squares methods for fitting a reformulation of the general Euclidean model for the external analysis of preference data. The reformulated subject weights refer to a common set of reference vectors for all subjects and hence are comparable across subjects. If the rotation of the stimulus space is fixed, the subject weight estimates in the model are uniquely determined. Weight estimates can be guaranteed nonnegative. While the reformulation is a metric model for single stimulus data, the paper briefly discusses extensions to nonmetric, pairwise, and logistic models. The reformulated model is less general than Carroll's earlier formulation.
A summary and interpretation of the recent literature on the indeterminacy of factor scores is given in simple terms. A good index of factor score determinacy is the squared multiple correlation of the factor with the observed variables.
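As a small numerical sketch of that index, the code below computes the squared multiple correlation of a single factor with the observed variables from assumed loadings under a standardized, orthogonal factor model; the loadings are hypothetical.

```python
# Sketch: squared multiple correlation of a factor with the observed variables
# under an orthogonal, standardized-factor model (determinacy = diag(L' S^-1 L)).
import numpy as np

L = np.array([[0.8], [0.7], [0.6], [0.5]])   # hypothetical loadings, one factor
Psi = np.diag(1 - (L**2).ravel())            # unique variances (standardized variables)
Sigma = L @ L.T + Psi                        # model-implied correlation matrix

determinacy = np.diag(L.T @ np.linalg.inv(Sigma) @ L)
print(determinacy)   # closer to 1 = better-determined factor scores
```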
For analyses with missing data, some popular procedures delete cases with missing values, perform analysis with “missing value” correlation or covariance matrices, or estimate missing values by sample means. There are objections to each of these procedures. Several procedures are outlined here for replacing missing values by regression values obtained in various ways, and for adjusting coefficients (such as factor score coefficients) when data are missing. None of the procedures are complex or expensive.
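One of the strategies outlined, replacing missing values by regression predictions, can be sketched in a few lines; the simulated data and the single incomplete variable below are assumptions for illustration, not the paper's procedures in full.

```python
# Minimal regression-imputation sketch: replace missing values of one variable
# with predictions from a regression on the fully observed variables (simulated data).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(6)
n = 1_000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
x3 = 0.6 * x1 - 0.4 * x2 + rng.normal(scale=0.5, size=n)
x3[rng.random(n) < 0.2] = np.nan             # make 20% of x3 missing at random

obs = ~np.isnan(x3)
reg = LinearRegression().fit(np.column_stack([x1, x2])[obs], x3[obs])
x3_imputed = x3.copy()
x3_imputed[~obs] = reg.predict(np.column_stack([x1, x2])[~obs])
print(np.isnan(x3_imputed).sum(), "missing values remain")
```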
Structural equation models with latent variables are sometimes estimated using an intuitive three-step approach, here denoted factor score regression. Consider a structural equation model composed of an explanatory latent variable and a response latent variable related by a structural parameter of scientific interest. In this simple example, estimation of the structural parameter proceeds as follows: First, common factor models are separately estimated for each latent variable. Second, factor scores are separately assigned to each latent variable, based on the estimates. Third, ordinary linear regression analysis is performed among the factor scores, producing an estimate for the structural parameter. We investigate the asymptotic and finite sample performance of different factor score regression methods for structural equation models with latent variables. It is demonstrated that the conventional approach to factor score regression performs very badly. Revised factor score regression, using regression factor scores for the explanatory latent variables and Bartlett scores for the response latent variables, produces consistent estimators for all parameters.
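A sketch of the three steps follows, assuming simulated data, one-factor measurement models fitted with scikit-learn's FactorAnalysis, and hand-computed regression (Thomson) and Bartlett scores; it illustrates the workflow only, not the paper's asymptotic results.

```python
# Three-step factor score regression sketch: (1) fit a factor model per latent
# variable, (2) compute regression scores for the explanatory factor and Bartlett
# scores for the response factor, (3) regress response scores on explanatory scores.
import numpy as np
from sklearn.decomposition import FactorAnalysis

def scores(X, method):
    """Factor scores for a one-factor model fitted to X."""
    fa = FactorAnalysis(n_components=1).fit(X)
    L = fa.components_.T                       # loadings, shape (p, 1)
    if L.sum() < 0:                            # resolve the loadings' sign indeterminacy
        L = -L
    Psi_inv = np.diag(1 / fa.noise_variance_)  # inverse unique variances
    Xc = X - fa.mean_
    if method == "regression":                 # Thomson/regression scores
        Sigma = L @ L.T + np.diag(fa.noise_variance_)
        W = np.linalg.inv(Sigma) @ L
    else:                                      # Bartlett scores
        W = Psi_inv @ L @ np.linalg.inv(L.T @ Psi_inv @ L)
    return Xc @ W

rng = np.random.default_rng(7)
n = 2_000
eta_x = rng.normal(size=(n, 1))                            # explanatory latent variable
eta_y = 0.7 * eta_x + rng.normal(scale=0.5, size=(n, 1))   # structural parameter = 0.7
X = eta_x @ np.array([[0.8, 0.7, 0.6]]) + rng.normal(scale=0.5, size=(n, 3))
Y = eta_y @ np.array([[0.8, 0.7, 0.6]]) + rng.normal(scale=0.5, size=(n, 3))

fx = scores(X, "regression")                  # steps 1-2 for the explanatory side
fy = scores(Y, "bartlett")                    # steps 1-2 for the response side
beta = np.linalg.lstsq(np.column_stack([np.ones(n), fx]), fy, rcond=None)[0][1, 0]
print(beta)                                   # step 3: OLS among the factor scores
```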
In the current paper, we review existing tools for solving variable selection problems in psychology. Modern regularization methods such as lasso regression have recently been introduced in the field and are incorporated into popular methodologies, such as network analysis. However, several recognized limitations of lasso regularization may limit its suitability for psychological research. In this paper, we compare the properties of lasso approaches used for variable selection to Bayesian variable selection approaches. In particular, we highlight advantages of stochastic search variable selection (SSVS) that make it well suited for variable selection applications in psychology. We demonstrate these advantages and contrast SSVS with lasso-type penalization in an application to predict depression symptoms in a large sample and an accompanying simulation study. We investigate the effects of sample size, effect size, and patterns of correlation among predictors on rates of correct and false inclusion and bias in the estimates. SSVS as investigated here is reasonably computationally efficient and sufficiently powerful to detect moderate effects in small sample sizes (or small effects in moderate sample sizes), while protecting against false inclusion and without over-penalizing true effects. We recommend SSVS as a flexible framework that is well suited for the field, discuss limitations, and suggest directions for future development.
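For concreteness, the sketch below applies a lasso variable-selection baseline to simulated data with scikit-learn; SSVS itself requires posterior sampling (e.g., MCMC) and is not reproduced here, and the data-generating setup is an assumption.

```python
# Lasso variable-selection baseline on simulated data (SSVS needs MCMC and is
# not sketched here). Nonzero coefficients are the "selected" predictors.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(8)
n, p = 300, 30
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = [0.8, -0.6, 0.5, 0.4, -0.3]        # only 5 true nonzero effects
y = X @ beta + rng.normal(size=n)

Xs = StandardScaler().fit_transform(X)
lasso = LassoCV(cv=5).fit(Xs, y)
selected = np.flatnonzero(lasso.coef_)        # may include some false inclusions
print("selected predictors:", selected)
```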
In the course of the medical program at the University of Limburg, students complete a total of 24 progress tests, consisting of items drawn from a constant item bank. A model is presented for the growth of knowledge reflected by these results. The Rasch model is used as a starting point, but both ability and difficulty parameters are taken to be random, and moreover the logistic distribution is replaced by the normal. Both individual and group abilities are estimated and explained through simple linear regression. Application to real data shows that the model fits very well.
This chapter empirically analyzes how portfolios of external finance impact aid agreements. The chapter integrates data on external debt and foreign aid to establish a comprehensive picture of developing countries' portfolios of external finance, demonstrating that these have become less reliant on traditional donors over time. The analysis tests if a greater share of finance from Chinese or private sources is associated with favorable terms from traditional donors, using measures of aid volume, infrastructure project share, and conditions attached to World Bank projects. The findings indicate that as countries draw a greater share of their external finance from nontraditional sources, they are more likely to receive aid on preferred terms. The relationship is stronger for countries of strategic significance to donors and, especially, those with higher donor trust.
Suppose you are running a company that provides proofreading services to publishers. You employ people who sit in front of screens, correcting written text. Spelling errors are the most frequent problem, so you are motivated to hire proofreaders who are excellent spellers. Therefore, you decide to give your job applicants a spelling test. It isn’t hard: throw together 25 words, and score everyone on a scale of 0–25. You are now a social scientist, a specialist called a psychometrician, measuring “spelling ability.”