This chapter statistically tests the relationship between American hierarchy, property rights, and state capacity using mediation analysis. It finds that American economic hierarchy enhances property rights in partner states, indirectly strengthening state capacity. The analysis explores scope conditions and the interaction between security and economic hierarchy, highlighting their contrasting effects on state-building. The chapter discusses the implications of the quantitative results for cases like Afghanistan.
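As a rough illustration of the method named above, the sketch below runs a product-of-coefficients mediation analysis in Python on simulated data. The variable names (hierarchy, property_rights, capacity) and all numbers are illustrative assumptions, not the chapter's data or estimates.

```python
# Generic product-of-coefficients mediation sketch on simulated data.
# Variable names (hierarchy, property_rights, capacity) are illustrative only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
hierarchy = rng.normal(size=n)                           # "treatment" X (hypothetical)
property_rights = 0.5 * hierarchy + rng.normal(size=n)   # mediator M (hypothetical)
capacity = 0.4 * property_rights + 0.1 * hierarchy + rng.normal(size=n)  # outcome Y

# Path a: mediator regressed on treatment.
a = sm.OLS(property_rights, sm.add_constant(hierarchy)).fit().params[1]

# Path b and direct effect c': outcome regressed on mediator and treatment.
mb = sm.OLS(capacity, sm.add_constant(np.column_stack([property_rights, hierarchy]))).fit()
b, c_prime = mb.params[1], mb.params[2]

indirect = a * b   # effect transmitted through the mediator
print(f"indirect effect = {indirect:.3f}, direct effect = {c_prime:.3f}")
```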
This study investigated the factors influencing the mental health of rural doctors in Hebei Province in order to provide a basis for improving their mental health and enhancing the level of primary health care.
Background:
The aim of this study was to understand the mental health of rural doctors in Hebei Province, identify the factors that influence it, and propose ways to improve their psychological status and the level of medical service they provide.
Methods:
Rural doctors from 11 cities in Hebei Province were randomly selected, and their basic characteristics and mental health status were surveyed via a structured questionnaire and the Symptom Checklist-90 (SCL-90). The differences between the SCL-90 scores of rural doctors in Hebei Province and the Chinese population norm, as well as the proportion of doctors with mental health problems, were compared. Logistic regression was used to analyse the factors that affect the mental health of rural doctors.
Results:
A total of 2593 valid questionnaires were received. The results of the study revealed several findings: the younger the rural doctors, the greater the incidence of mental health problems (OR = 0.792); female rural doctors were more likely to experience mental health issues than their male counterparts (OR = 0.789); rural doctors with disabilities and chronic diseases faced a significantly greater risk of mental health problems compared to healthy rural doctors (OR = 2.268); rural doctors with longer working hours had a greater incidence of mental health problems; and rural doctors with higher education backgrounds had a higher prevalence of somatization (OR = 1.203).
Conclusion:
Rural doctors who are younger, male, have been in medical service longer, have a chronic illness or disability, and have a high degree of education are at greater risk of developing mental health problems. Attention should be given to the mental health of the rural doctor population to improve primary health care services.
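The odds ratios reported above come from logistic regression. As a minimal, hypothetical sketch of that step, the code below fits a logistic model on simulated data and exponentiates the coefficients to obtain odds ratios; the predictor names and simulated values are assumptions for illustration, not the study's data.

```python
# Minimal logistic-regression sketch producing odds ratios; predictor names
# and simulated data are hypothetical, not the study's actual data.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "age_group": rng.integers(1, 6, n),       # ordinal age band (hypothetical)
    "female": rng.integers(0, 2, n),
    "chronic_disease": rng.integers(0, 2, n),
    "weekly_hours": rng.normal(45, 10, n),
})
logit_p = -1.0 - 0.2 * df["age_group"] + 0.3 * df["chronic_disease"] + 0.02 * df["weekly_hours"]
df["mh_problem"] = rng.random(n) < 1 / (1 + np.exp(-logit_p))   # simulated outcome

X = sm.add_constant(df[["age_group", "female", "chronic_disease", "weekly_hours"]])
fit = sm.Logit(df["mh_problem"].astype(float), X).fit(disp=0)
print(np.exp(fit.params))   # exponentiated coefficients = odds ratios
```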
This chapter begins by defining an averaging procedure for random variables, known as the mean. We show that the mean is linear, and also that the mean of the product of independent variables equals the product of their means. Then, we derive the mean of popular parametric distributions. Next, we caution that the mean can be severely distorted by extreme values, as illustrated by an analysis of NBA salaries. In addition, we define the mean square, which is the average squared value of a random variable, and the variance, which is the mean square deviation from the mean. We explain how to estimate the variance from data and use it to describe temperature variability at different geographic locations. Then, we define the conditional mean, a quantity that represents the average of a variable when other variables are fixed. We prove that the conditional mean is an optimal solution to the problem of regression, where the goal is to estimate a quantity of interest as a function of other variables. We end the chapter by studying how to estimate average causal effects.
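Two of the chapter's points lend themselves to a short numerical check: estimating the mean and variance from data, and the conditional mean minimizing mean squared error. The sketch below is only an illustration under an assumed model Y = X^2 + noise, not the chapter's own examples.

```python
# Sketch of two ideas summarized above: estimating mean/variance from data,
# and the conditional mean as the minimum-MSE predictor (simulated example).
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100_000)
y = x**2 + rng.normal(scale=0.5, size=x.size)   # so E[Y | X = x] = x**2

print("sample mean:", y.mean(), "sample variance:", y.var(ddof=1))

# The conditional mean beats the best constant predictor (the unconditional mean).
mse_cond_mean = np.mean((y - x**2) ** 2)        # predict with E[Y | X]
mse_uncond = np.mean((y - y.mean()) ** 2)       # predict with E[Y]
print(mse_cond_mean, "<", mse_uncond)
```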
Political scientists regularly rely on a selection-on-observables assumption to identify causal effects of interest. Once a causal effect has been identified in this way, a wide variety of estimators can, in principle, be used to consistently estimate the effect of interest. While these estimators are all justified by appeals to the same causal identification assumptions, they often differ greatly in how they make use of the data at hand. For instance, methods based on regression rely on an explicit model of the outcome variable but do not explicitly model the treatment assignment process, whereas methods based on propensity scores explicitly model the treatment assignment process but do not explicitly model the outcome variable. Understanding the tradeoffs between estimation methods is complicated by these seemingly fundamental differences. In this paper we seek to rectify this problem. We do so by clarifying how most estimators of causal effects that are justified by an appeal to a selection-on-observables assumption are all special cases of a general weighting estimator. We then explain how this commonality provides for diagnostics that allow for meaningful comparisons across estimation methods—even when the methods are seemingly very different. We illustrate these ideas with two applied examples.
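As a minimal, simulated illustration of the kind of commonality the paper discusses, the sketch below estimates the same average treatment effect by regression adjustment (modeling the outcome) and by inverse-probability weighting (modeling the treatment). The data-generating process is an assumption, and this is not the paper's general weighting estimator or its diagnostics.

```python
# Simulated illustration: under selection on observables, a regression-adjustment
# estimate and an inverse-probability-weighting (IPW) estimate target the same ATE.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 20_000
x = rng.normal(size=n)                              # observed confounder
p = 1 / (1 + np.exp(-0.8 * x))                      # true propensity score
d = rng.random(n) < p                               # treatment assignment
y = 2.0 * d + 1.5 * x + rng.normal(size=n)          # true ATE = 2.0

# Regression adjustment: model the outcome.
ols = sm.OLS(y, sm.add_constant(np.column_stack([d.astype(float), x]))).fit()

# IPW: model the treatment, then reweight outcomes by estimated propensities.
ps = sm.Logit(d.astype(float), sm.add_constant(x)).fit(disp=0).predict(sm.add_constant(x))
ipw = np.mean(d * y / ps) - np.mean((1 - d) * y / (1 - ps))

print("regression adjustment:", ols.params[1], "IPW:", ipw)
```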
Factor score indeterminacy is a characteristic property of factor analysis (FA) models. This research introduces a novel procedure, regression-based factor score exploration (RFE), which uniquely determines factor scores and simultaneously estimates other parameters of the FA model. RFE uniquely determines factor scores by minimizing a loss function that balances FA and multivariate regression, regulated by a tuning parameter. Theoretical aspects of RFE, including the uniqueness of factor scores, the relationship between observed and latent variables, and rotational indeterminacy, are examined. Additionally, clustering-based factor exploration (CFE) is presented as a variant of RFE, derived by generalizing the penalty term to enable the clustering of factor scores. It is demonstrated that CFE creates cluster structures more accurately than the existing method. A simulation study shows that the proposed procedures accurately recover true parameter matrices even in the presence of error-contaminated data, with lower computational demand compared to existing methods. Real data examples illustrate that the proposed procedures provide interpretable results that are highly related to the factor scores obtained by existing methods.
In this chapter, we introduce the reader to basic concepts in machine learning. We start by defining artificial intelligence, machine learning, and deep learning. We give a historical viewpoint on the field, also from the perspective of statistical physics. Then, we give a very basic introduction to different tasks that are amenable to machine learning, such as regression or classification, and explain various types of learning. We end the chapter by explaining how to read the book and how chapters depend on each other.
This paper uses a two-step approach to modelling the probability of a policyholder making an auto insurance claim. We perform clustering via Gaussian mixture models and then fit cluster-specific binary regression models. We use telematics information along with traditional auto insurance information and find that the best model incorporates telematics, without the need for dimension reduction via principal components. We also utilise the probabilistic estimates from the mixture model to account for the uncertainty in the cluster assignments. The clustering process allows for the creation of driving profiles and offers a fairer method for policyholder segmentation than when clustering is not used. By fitting separate regression models to the observations from the respective clusters, we are able to offer differential pricing, which recognises that policyholders have different exposures to risk despite having similar covariate information, such as total miles driven. The approach outlined in this paper offers an explainable and interpretable model that can compete with black box models. Our comparisons are based on a synthesised telematics data set that was emulated from a real insurance data set.
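A minimal sketch of the two-step idea follows, assuming simulated data and hypothetical telematics-style features; the paper's own data, feature set, and model choices are not reproduced here.

```python
# Two-step sketch: Gaussian-mixture clustering of policyholders, then a separate
# claim-probability model per cluster (simulated data, hypothetical features).
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 5_000
X = np.column_stack([
    rng.normal(12, 4, n),      # annual miles driven, in thousands (hypothetical)
    rng.normal(0.3, 0.1, n),   # share of night-time driving (hypothetical)
])
claim = (rng.random(n) < 0.1).astype(int)   # simulated binary claim indicator

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
hard = gmm.predict(X)          # hard cluster labels ("driving profiles")
soft = gmm.predict_proba(X)    # soft assignments capture cluster uncertainty

models = {}
for k in range(gmm.n_components):
    mask = hard == k
    # Weight each observation by its membership probability in cluster k.
    models[k] = LogisticRegression().fit(X[mask], claim[mask],
                                         sample_weight=soft[mask, k])

# Predicted claim probability for a policyholder, mixing over cluster memberships.
x_new = X[:1]
p_new = sum(gmm.predict_proba(x_new)[0, k] * models[k].predict_proba(x_new)[0, 1]
            for k in range(gmm.n_components))
print(p_new)
```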
Regression is a fundamental prediction task common in data-centric engineering applications that involves learning mappings between continuous variables. In many engineering applications (e.g., structural health monitoring), feature-label pairs used to learn such mappings are of limited availability, which hinders the effectiveness of traditional supervised machine learning approaches. This paper proposes a methodology for overcoming the issue of data scarcity by combining active learning (AL) for regression with hierarchical Bayesian modeling. AL is an approach for preferentially acquiring feature-label pairs in a resource-efficient manner. In particular, the current work adopts a risk-informed approach that leverages contextual information associated with regression-based engineering decision-making tasks (e.g., inspection and maintenance). Hierarchical Bayesian modeling allows multiple related regression tasks to be learned over a population, capturing local and global effects. The information sharing facilitated by this modeling approach means that information acquired for one engineering system can improve predictive performance across the population. The proposed methodology is demonstrated using an experimental case study. Specifically, multiple regressions are performed over a population of machining tools, where the quantity of interest is the surface roughness of the workpieces. An inspection and maintenance decision process is defined using these regression tasks, which is in turn used to construct the active-learning algorithm. The novel methodology proposed is benchmarked against an uninformed approach to label acquisition and independent modeling of the regression tasks. It is shown that the proposed approach has superior performance in terms of expected cost, maintaining predictive performance while reducing the number of inspections required.
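As a simplified sketch of active learning for regression, the loop below queries whichever unlabeled point has the largest Gaussian-process predictive uncertainty. This variance-based acquisition rule stands in for the paper's risk-informed rule, and the hierarchical Bayesian population model is not reproduced; the one-dimensional data are simulated.

```python
# Simplified active-learning loop for regression: query the unlabeled point with
# the largest predictive uncertainty (variance-based, not the paper's risk-informed rule).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(5)
X_pool = np.linspace(0, 10, 200)[:, None]
y_pool = np.sin(X_pool).ravel() + rng.normal(scale=0.1, size=200)

labeled = list(rng.choice(200, size=5, replace=False))   # small initial design
for _ in range(20):                                      # label-acquisition budget
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(X_pool[labeled], y_pool[labeled])
    _, std = gp.predict(X_pool, return_std=True)
    std[labeled] = -np.inf                               # never re-query a labeled point
    labeled.append(int(np.argmax(std)))                  # acquire the most uncertain point

print(f"{len(labeled)} labels acquired")
```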
The current context of regressive border regimes challenges critical theory’s commitments. Can we still take recent legal and political practices as starting points for reimagining political norms and institutions based on a reconstruction of hidden emancipatory potentials? The chapter argues that critical border theory could benefit from recentering the idea of political representation, and especially from building on insights of the recent constructivist turn in representation theory. Understanding political representation as shape-shifting and constituency-mobilizing changes long-held assumptions about the spaces, subjects, and demands articulated in border politics. While this representative perspective has diagnostic advantages, it is unable to criticize the legitimacy of existing border regimes owing to its thin normative assumptions. Reconstructive approaches to border politics should therefore use the diagnostic tools of the recent representation scholarship without committing to their limited critical potential.
Multivariate selection can be represented as a linear transformation in a geometric framework. This approach has led to considerable simplification in the study of the effects of selection on factor analysis. In this note this approach is extended to describe the effects of selection on regression analysis and to adjust for the effects of selection using the inverse of the linear transformation.
Bayesian least squares techniques are adapted to estimation of stimulus-response curves, rather broadly conceived. Illustrative examples deal with estimation of person characteristic curves and item characteristic curves in the context of mental testing, and estimation of a stimulus-response curve using data from a psychophysical experiment.
There is a unity underlying the diversity of models for the analysis of multivariate data. Essentially, they constitute a family of models, most generally nonlinear, for structural/functional relations between variables drawn from a behavior domain.
This paper discusses least squares methods for fitting a reformulation of the general Euclidean model for the external analysis of preference data. The reformulated subject weights refer to a common set of reference vectors for all subjects and hence are comparable across subjects. If the rotation of the stimulus space is fixed, the subject weight estimates in the model are uniquely determined. Weight estimates can be guaranteed nonnegative. While the reformulation is a metric model for single stimulus data, the paper briefly discusses extensions to nonmetric, pairwise, and logistic models. The reformulated model is less general than Carroll's earlier formulation.
A summary and interpretation of the recent literature on the indeterminacy of factor scores is given in simple terms. A good index of factor score determinacy is the squared multiple correlation of the factor with the observed variables.
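As a small numerical sketch of that index, the code below computes the squared multiple correlation of a single factor with the observed variables from assumed loadings under a standardized, orthogonal factor model; the loadings are hypothetical.

```python
# Sketch: squared multiple correlation of a factor with the observed variables
# under an orthogonal, standardized-factor model (determinacy = diag(L' S^-1 L)).
import numpy as np

L = np.array([[0.8], [0.7], [0.6], [0.5]])   # hypothetical loadings, one factor
Psi = np.diag(1 - (L**2).ravel())            # unique variances (standardized variables)
Sigma = L @ L.T + Psi                        # model-implied correlation matrix

determinacy = np.diag(L.T @ np.linalg.inv(Sigma) @ L)
print(determinacy)   # closer to 1 = better-determined factor scores
```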
For analyses with missing data, some popular procedures delete cases with missing values, perform analysis with “missing value” correlation or covariance matrices, or estimate missing values by sample means. There are objections to each of these procedures. Several procedures are outlined here for replacing missing values by regression values obtained in various ways, and for adjusting coefficients (such as factor score coefficients) when data are missing. None of the procedures are complex or expensive.
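One of the strategies outlined, replacing missing values by regression predictions, can be sketched in a few lines; the simulated data and the single incomplete variable below are assumptions for illustration, not the paper's procedures in full.

```python
# Minimal regression-imputation sketch: replace missing values of one variable
# with predictions from a regression on the fully observed variables (simulated data).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(6)
n = 1_000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
x3 = 0.6 * x1 - 0.4 * x2 + rng.normal(scale=0.5, size=n)
x3[rng.random(n) < 0.2] = np.nan             # make 20% of x3 missing at random

obs = ~np.isnan(x3)
reg = LinearRegression().fit(np.column_stack([x1, x2])[obs], x3[obs])
x3_imputed = x3.copy()
x3_imputed[~obs] = reg.predict(np.column_stack([x1, x2])[~obs])
print(np.isnan(x3_imputed).sum(), "missing values remain")
```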
Structural equation models with latent variables are sometimes estimated using an intuitive three-step approach, here denoted factor score regression. Consider a structural equation model composed of an explanatory latent variable and a response latent variable related by a structural parameter of scientific interest. In this simple example, estimation of the structural parameter proceeds as follows: First, common factor models are separately estimated for each latent variable. Second, factor scores are separately assigned to each latent variable, based on the estimates. Third, ordinary linear regression analysis is performed among the factor scores, producing an estimate for the structural parameter. We investigate the asymptotic and finite sample performance of different factor score regression methods for structural equation models with latent variables. It is demonstrated that the conventional approach to factor score regression performs very badly. Revised factor score regression, using regression factor scores for the explanatory latent variables and Bartlett scores for the response latent variables, produces consistent estimators for all parameters.
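A sketch of the three steps follows, assuming simulated data, one-factor measurement models fitted with scikit-learn's FactorAnalysis, and hand-computed regression (Thomson) and Bartlett scores; it illustrates the workflow only, not the paper's asymptotic results.

```python
# Three-step factor score regression sketch: (1) fit a factor model per latent
# variable, (2) compute regression scores for the explanatory factor and Bartlett
# scores for the response factor, (3) regress response scores on explanatory scores.
import numpy as np
from sklearn.decomposition import FactorAnalysis

def scores(X, method):
    """Factor scores for a one-factor model fitted to X."""
    fa = FactorAnalysis(n_components=1).fit(X)
    L = fa.components_.T                       # loadings, shape (p, 1)
    if L.sum() < 0:                            # resolve the loadings' sign indeterminacy
        L = -L
    Psi_inv = np.diag(1 / fa.noise_variance_)  # inverse unique variances
    Xc = X - fa.mean_
    if method == "regression":                 # Thomson/regression scores
        Sigma = L @ L.T + np.diag(fa.noise_variance_)
        W = np.linalg.inv(Sigma) @ L
    else:                                      # Bartlett scores
        W = Psi_inv @ L @ np.linalg.inv(L.T @ Psi_inv @ L)
    return Xc @ W

rng = np.random.default_rng(7)
n = 2_000
eta_x = rng.normal(size=(n, 1))                            # explanatory latent variable
eta_y = 0.7 * eta_x + rng.normal(scale=0.5, size=(n, 1))   # structural parameter = 0.7
X = eta_x @ np.array([[0.8, 0.7, 0.6]]) + rng.normal(scale=0.5, size=(n, 3))
Y = eta_y @ np.array([[0.8, 0.7, 0.6]]) + rng.normal(scale=0.5, size=(n, 3))

fx = scores(X, "regression")                  # steps 1-2 for the explanatory side
fy = scores(Y, "bartlett")                    # steps 1-2 for the response side
beta = np.linalg.lstsq(np.column_stack([np.ones(n), fx]), fy, rcond=None)[0][1, 0]
print(beta)                                   # step 3: OLS among the factor scores
```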
In the current paper, we review existing tools for solving variable selection problems in psychology. Modern regularization methods such as lasso regression have recently been introduced in the field and are incorporated into popular methodologies, such as network analysis. However, several recognized limitations of lasso regularization may limit its suitability for psychological research. In this paper, we compare the properties of lasso approaches used for variable selection to Bayesian variable selection approaches. In particular, we highlight advantages of stochastic search variable selection (SSVS) that make it well suited for variable selection applications in psychology. We demonstrate these advantages and contrast SSVS with lasso-type penalization in an application to predict depression symptoms in a large sample and an accompanying simulation study. We investigate the effects of sample size, effect size, and patterns of correlation among predictors on rates of correct and false inclusion and bias in the estimates. SSVS as investigated here is reasonably computationally efficient and sufficiently powerful to detect moderate effects in small sample sizes (or small effects in moderate sample sizes), while protecting against false inclusion and without over-penalizing true effects. We recommend SSVS as a flexible framework that is well suited for the field, discuss limitations, and suggest directions for future development.
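For concreteness, the sketch below applies a lasso variable-selection baseline to simulated data with scikit-learn; SSVS itself requires posterior sampling (e.g., MCMC) and is not reproduced here, and the data-generating setup is an assumption.

```python
# Lasso variable-selection baseline on simulated data (SSVS needs MCMC and is
# not sketched here). Nonzero coefficients are the "selected" predictors.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(8)
n, p = 300, 30
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = [0.8, -0.6, 0.5, 0.4, -0.3]        # only 5 true nonzero effects
y = X @ beta + rng.normal(size=n)

Xs = StandardScaler().fit_transform(X)
lasso = LassoCV(cv=5).fit(Xs, y)
selected = np.flatnonzero(lasso.coef_)        # may include some false inclusions
print("selected predictors:", selected)
```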
In the course of the medical program at the University of Limburg, students complete a total of 24 progress tests, consisting of items drawn from a constant item bank. A model is presented for the growth of knowledge reflected by these results. The Rasch model is used as a starting point, but both ability and difficulty parameters are taken to be random, and moreover the logistic distribution is replaced by the normal. Both individual and group abilities are estimated and explained through simple linear regression. Application to real data shows that the model fits very well.
This chapter empirically analyzes how portfolios of external finance impact aid agreements. The chapter integrates data on external debt and foreign aid to establish a comprehensive picture of developing countries' portfolios of external finance, demonstrating that these have become less reliant on traditional donors over time. The analysis tests if a greater share of finance from Chinese or private sources is associated with favorable terms from traditional donors, using measures of aid volume, infrastructure project share, and conditions attached to World Bank projects. The findings indicate that as countries draw a greater share of their external finance from nontraditional sources, they are more likely to receive aid on preferred terms. The relationship is stronger for countries of strategic significance to donors and, especially, those with higher donor trust.
Suppose you are running a company that provides proofreading services to publishers. You employ people who sit in front of screens, correcting written text. Spelling errors are the most frequent problem, so you are motivated to hire proofreaders who are excellent spellers. Therefore, you decide to give your job applicants a spelling test. It isn’t hard: throw together 25 words, and score everyone on a scale of 0–25. You are now a social scientist, a specialist called a psychometrician, measuring “spelling ability.”