Background
No research has assessed the psychometric properties of the Hamilton Rating Scale for Anxiety (HRSA) in Ethiopian university students using item response theory (IRT) and classical test theory.
Aims
This study aimed to assess the psychometric properties of the English-language HRSA in Ethiopian students using IRT and classical test theory.
Method
University students (N = 370, age 21.44 ± 2.30 years) in Ethiopia participated in a cross-sectional study. Participants completed a self-reported measure of anxiety, a sociodemographic questionnaire and the interviewer-administered HRSA.
Results
Confirmatory factor analysis (CFA) favoured a one-factor structure: fit indices for the one-factor model and two distinct two-factor models were similar, but high interfactor correlations violated discriminant validity criteria in the two-factor models. The one-factor structure showed structural invariance across gender groups, as evidenced by multi-group CFA. No ceiling or floor effects were observed for HRSA total scores. Infit and outfit mean square values for all items fell within the acceptable range (0.6–1.4). The four threshold estimates (τi1, τi2, τi3 and τi4) for each item were ordered as expected. Differential item functioning analyses showed item-level measurement invariance across gender for all 14 HRSA items, for both uniform and non-uniform estimates. McDonald’s ω and Cronbach’s α for the HRSA were both 0.88. Convergent validity of the interviewer-administered HRSA with the self-reported anxiety subscale of the 21-item Depression, Anxiety and Stress Scale was weak to moderate.
Conclusions
The findings favour the validity of a one-factor structure of the HRSA with adequate item properties (classical and rating scale model), convergent validity, reliability and measurement invariance (structural and item level) across gender groups in Ethiopian university students.
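For context on the reliability coefficients reported above, Cronbach’s α can be computed directly from an item-score matrix. A minimal sketch with toy data (not the HRSA responses):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha from an (n_persons, n_items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / total-score variance)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # per-item sample variances
    total_var = items.sum(axis=1).var(ddof=1)  # variance of person totals
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Toy data: 6 persons x 3 items (hypothetical scores, for illustration only)
scores = np.array([
    [2, 3, 2],
    [1, 1, 0],
    [4, 3, 4],
    [0, 1, 1],
    [3, 4, 3],
    [2, 2, 1],
], dtype=float)
alpha = cronbach_alpha(scores)
```

McDonald’s ω additionally requires the factor loadings from a fitted measurement model, so it is not reproduced here.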
In this article, we propose a series of latent trait models for the responses and response times on low-stakes tests where some test takers respond prematurely, without making a full effort to solve the items. The models consider individual differences in capability and persistence. At the core of the models is a race between the solution process and a process of disengagement that interrupts the solution process. The different processes are modeled with the linear ballistic accumulator model. Within this general framework, we develop model variants that differ in the number of accumulators and in the way the response is generated when the solution process is interrupted. We distinguish no guessing, random guessing and informed guessing, where the guessing probability depends on the status of the solution process. We conduct simulation studies on parameter recovery and trait estimation; they suggest that parameter values and traits can be recovered well under certain conditions. Finally, we apply the model variants to empirical data.
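The race between solution and disengagement can be illustrated with a minimal simulation of two linear ballistic accumulators. This is a sketch of the general mechanism with illustrative parameter values, not the authors’ estimated models:

```python
import numpy as np

rng = np.random.default_rng(42)

def lba_race(v_solve, v_disengage, b=1.0, A=0.5, s=0.3, n=10_000):
    """Race between a solution accumulator and a disengagement
    accumulator, each a linear ballistic accumulator: uniform start
    point in [0, A], normally distributed drift rate, deterministic
    rise to threshold b. Negative drifts are truncated for simplicity."""
    def finish_times(v):
        start = rng.uniform(0.0, A, n)
        drift = np.maximum(rng.normal(v, s, n), 1e-6)
        return (b - start) / drift
    t_solve = finish_times(v_solve)
    t_dis = finish_times(v_disengage)
    wins = t_solve < t_dis                   # solution process wins the race
    return wins.mean(), np.minimum(t_solve, t_dis).mean()

# Higher solution drift -> the solution process usually wins.
p_solved, mean_rt = lba_race(v_solve=1.2, v_disengage=0.8)
```

When the disengagement accumulator wins, the model variants differ in whether the response is omitted, a random guess, or an informed guess.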
The purpose of this study was to measure meal quality in representative samples of schoolchildren in three cities located in different Brazilian regions using the Meal and Snack Assessment Quality (MESA) scale and to examine its association with weight status, socio-demographic characteristics and behavioural variables. This cross-sectional study analysed data on 5612 schoolchildren aged 7–12 years who resided in cities in Southern, Southeastern and Northeastern Brazil. Dietary intake was evaluated using the WebCAAFE questionnaire. Body weight and height were measured to calculate BMI. Weight status was classified based on age- and sex-specific Z-scores. Meal quality was measured using the MESA scale. Associations of meal quality with weight status and socio-demographic and behavioural variables were investigated using multinomial regression analysis. Schoolchildren in Feira de Santana, São Paulo and Florianópolis had a predominance of healthy (41·8 %), mixed (44·4 %) and unhealthy (42·7 %) meal quality, respectively. There was no association with weight status. Schoolchildren living in Feira de Santana, those who reported weekday dietary intakes, and those with lower physical activity and screen activity scores showed higher meal quality. Schoolchildren aged 10–12 years, those who reported weekend dietary intakes, and those with higher screen activity scores exhibited lower meal quality.
Fitting propensity (FP) analysis quantifies model complexity but has been impeded in item response theory (IRT) due to the computational infeasibility of uniformly and randomly sampling multinomial item response patterns under a full-information approach. We adopt a limited-information (LI) approach, wherein we generate data only up to the lower-order margins of the complete item response patterns. We present an algorithm that builds upon classical work on sampling contingency tables with fixed margins by implementing a Sequential Importance Sampling algorithm to Quickly and Uniformly Obtain Contingency tables (SISQUOC). Theoretical justification and comprehensive validation demonstrate the effectiveness of the SISQUOC algorithm for IRT and offer insights into sampling from the complete data space defined by the lower-order margins. We highlight the efficiency and simplicity of the LI approach for generating large and uniformly random datasets of dichotomous and polytomous items. We further present an iterative proportional fitting procedure to reconstruct joint multinomial probabilities after LI-based data generation, facilitating FP evaluation using traditional estimation strategies. We illustrate the proposed approach by examining the FP of the graded response model and generalized partial credit model, with results suggesting that their functional forms express similar degrees of configural complexity.
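The iterative proportional fitting step can be sketched for the simplest case, a two-way table with fixed margins; the seed table and target margins below are illustrative, and this is a generic IPF routine, not the SISQUOC algorithm itself:

```python
import numpy as np

def ipf_2way(seed, row_targets, col_targets, max_iter=1000):
    """Iterative proportional fitting: alternately rescale a seed
    table so its row sums, then its column sums, match the target
    margins, until both sets of margins are reproduced."""
    table = seed.astype(float).copy()
    for _ in range(max_iter):
        table *= (row_targets / table.sum(axis=1))[:, None]  # fit rows
        table *= (col_targets / table.sum(axis=0))[None, :]  # fit columns
        if np.allclose(table.sum(axis=1), row_targets):
            break
    return table

seed = np.ones((2, 3))                # uninformative seed table
rows = np.array([40.0, 60.0])         # illustrative row margins
cols = np.array([20.0, 30.0, 50.0])   # illustrative column margins
fitted = ipf_2way(seed, rows, cols)
```

With a uniform seed, the fitted table is the independence table implied by the margins; informative seeds preserve higher-order association structure while matching the margins.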
Bifactor item response theory (IRT) models are the usual option for modeling composite constructs. In applications, however, researchers typically must assume that all dimensions of the person parameter space are orthogonal, which can result in absurd model interpretations. We propose a new bifactor model—the Completely Oblique Rasch Bifactor (CORB) model—which allows estimation of the correlations between all dimensions. We discuss its relations to other oblique bifactor models and study the conditions for its identification in the dichotomous case. We analytically prove that the model is identified when (a) at least one item loads solely on the general factor and no items are shared between any pair of specific factors (we call this the G-structure), or (b) no items load solely on the general factor but at least one item is shared between every pair of specific factors (the S-structure). Using simulated and real data, we show that this model outperforms other partially oblique bifactor models in terms of model fit because it corresponds to more realistic assumptions about construct structure. We also discuss possible difficulties in interpreting the CORB model’s parameters using, by analogy, the “explaining away” phenomenon from Bayesian reasoning.
The field of criminology is limited by a 'hidden' measurement crisis. It is hidden because scholars either are not aware of the shortcomings of their measures or have implicitly agreed that scales with certain properties merit publication. It is a crisis because the approaches used to construct measures do not employ modern systematic psychometric methods. As a result, the degree to which existing measures have methodological limitations is unknown. The purpose of this Element is to unmask this hidden crisis and provide a case study demonstrating how to build a measure of a prominent criminological construct through modern systematic psychometric methods. Using multiple surveys and item response theory, it develops a ten-item scale of procedural justice in policing. This can be used in primary research and to adjudicate existing measures. The goal is to reveal the nature of the field's measurement crisis and show a strategy for solving it.
Conditional dependence (CD) reflects potential interactions between persons and items in measurement, offering valuable information for deriving personalized diagnoses, evaluations, and feedback. The recent integration of psychometric models with latent space provides an effective way to visualize and quantify person–item interactions unexplained by latent variables and item parameters. In such applications, it is important to recognize the relative nature of CD, as models with different structures and complexities (e.g., due to factor dimensionality and item parameters) produce varying systematic explanations of person and item effects, leading to differing residual variations in both quantitative and qualitative sense. To demonstrate this relativity, we extend the previously developed unidimensional Rasch-based latent space item response model by incorporating between-item multidimensionality and item discrimination parameters. The resulting model can be reduced to simpler models with appropriate constraints, allowing us to explore the relativity in CD by comparing them. Simulation studies demonstrate that (1) the most complex proposed model properly recovers its parameters, (2) it outperforms the traditional IRT models by accounting for CD, and (3) the models in comparison exhibit distinctive extents of CD. The study continues with empirical examples that further illustrate relative changes in the extent and configurations of CD with practical implications.
Dynamic models of aggregate public opinion are increasingly popular, but to date they have been restricted to unidimensional latent traits. This is problematic because in many domains the structure of mass preferences is multidimensional. We address this limitation by deriving a multidimensional ordinal dynamic group-level item response theory (MODGIRT) model. We describe the Bayesian estimation of the model and present a novel workflow for dealing with the difficult problem of identification. With simulations, we show that MODGIRT recovers aggregate parameters without estimating subject-level ideal points and is robust to moderate violations of assumptions. We further validate the model by reproducing at the group level an existing individual-level analysis of British attitudes towards redistribution. We then reanalyze a recent cross-national application of a group-level item response theory model, replacing its domain-specific confirmatory approach with an exploratory MODGIRT model. We describe extensions to allow for overdispersion, differential item functioning, and group-level predictors. A publicly available R package implements these methods.
Dynamic latent variable models generally link units’ positions on a latent dimension over time via random walks. Theoretically, these trajectories are often expected to resemble a mixture of periods of stability interrupted by moments of change. In these cases, a prior distribution such as the regularized horseshoe—that allows for both stasis and change—can prove a better theoretical and empirical fit for the underlying construct than other priors. Replicating Reuning, Kenwick, and Fariss (2019), we find that the regularized horseshoe performs better than the standard normal and the Student’s t-distribution when modeling dynamic latent variable models. Overall, the use of the regularized horseshoe results in more accurate and precise estimates. More broadly, the regularized horseshoe is a promising prior for many similar applications.
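Why the regularized horseshoe accommodates both stasis and change can be sketched by sampling from the prior. The parameterization follows Piironen and Vehtari’s regularized horseshoe; the values of the global scale `tau` and slab scale `c` are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def regularized_horseshoe(n, tau=0.1, c=2.0):
    """Draws from a regularized horseshoe prior: half-Cauchy local
    scales lam_j, softened by the slab scale c, then normal
    innovations. A small tau concentrates mass near zero (stasis),
    while the heavy Cauchy tail still allows occasional large shifts
    (change), whose size is capped near the slab scale c."""
    lam = np.abs(rng.standard_cauchy(n))                 # local scales
    lam_tilde2 = c**2 * lam**2 / (c**2 + tau**2 * lam**2)
    return rng.normal(0.0, tau * np.sqrt(lam_tilde2))

draws = regularized_horseshoe(100_000)
```

Most draws sit near zero, but the tails remain far heavier than a normal prior’s, matching the stability-punctuated-by-change trajectories described above.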
Multidimensional item response theory (MIRT) offers psychometric models for various data settings, most popularly for dichotomous and polytomous data. Less attention has been devoted to count responses. A recent growth in interest in count item response models (CIRM)—perhaps sparked by increased occurrence of psychometric count data, e.g., in the form of process data, clinical symptom frequency, number of ideas or errors in cognitive ability assessment—has focused on unidimensional models. Some recent unidimensional CIRMs rely on the Conway–Maxwell–Poisson distribution as the conditional response distribution which allows conditionally over-, under-, and equidispersed responses. In this article, we generalize to the multidimensional case, introducing the Multidimensional Two-Parameter Conway–Maxwell–Poisson Model (M2PCMPM). Using the expectation-maximization (EM) algorithm, we develop marginal maximum likelihood estimation methods, primarily for exploratory M2PCMPMs. The resulting discrimination matrices are rotationally indeterminate. Recently, regularization of the discrimination matrix has been used to obtain a simple structure (i.e., a sparse solution) for dichotomous and polytomous data. For count data, we also (1) rotate or (2) regularize the discrimination matrix. We develop an EM algorithm with lasso ($\ell _1$) regularization for the M2PCMPM and compare (1) and (2) in a simulation study. We illustrate the proposed model with an empirical example using intelligence test data.
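The Conway–Maxwell–Poisson response distribution at the heart of the model can be sketched by truncating its normalizing series; `lam`, `nu`, and the truncation point below are illustrative choices:

```python
import math

def cmp_pmf(y_max, lam, nu):
    """Conway-Maxwell-Poisson pmf, proportional to lam^y / (y!)^nu,
    normalized by truncating the infinite series at y_max.
    nu = 1 recovers the Poisson; nu < 1 gives over- and nu > 1
    underdispersion relative to the Poisson."""
    weights = [lam**y / math.factorial(y)**nu for y in range(y_max + 1)]
    z = sum(weights)
    return [w / z for w in weights]

def dispersion_ratio(pmf):
    """Variance-to-mean ratio of a pmf supported on 0..len(pmf)-1."""
    mean = sum(y * p for y, p in enumerate(pmf))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
    return var / mean

poisson_ratio = dispersion_ratio(cmp_pmf(60, 3.0, 1.0))  # ~1, equidispersed
under_ratio = dispersion_ratio(cmp_pmf(60, 3.0, 2.0))    # < 1, underdispersed
```

The dispersion parameter ν is what lets the M2PCMPM handle conditionally over-, under-, and equidispersed count responses within one family.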
When multiple items are clustered around a reading passage, the local independence assumption in item response theory is often violated. The amount of information contained in an item cluster is usually overestimated if the violation of local independence is ignored and items are treated as locally independent when in fact they are not. In this article we provide a general method that adjusts for the inflation of information associated with a test containing item clusters. A computational scheme is presented for evaluating the adjustment factor for clusters, both in the restrictive case of two items per cluster and in the general case of more than two items per cluster. The methodology was motivated by a study of the NAEP Reading Assessment. We present a simulation study along with an analysis of a NAEP data set.
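The inflation being corrected is visible in the naive test information, which sums 2PL item informations as if items were locally independent. The item parameters below are hypothetical, and the sketch deliberately omits the paper’s adjustment factor:

```python
import numpy as np

def item_info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta:
    I(theta) = a^2 * P(theta) * (1 - P(theta))."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

# Hypothetical parameters for a four-item passage cluster.
a = np.array([1.2, 0.8, 1.5, 1.0])   # discriminations
b = np.array([-0.5, 0.0, 0.3, 1.0])  # difficulties

# Summing item informations assumes local independence; for
# passage-based clusters this naive total overstates precision,
# which is what a cluster adjustment factor corrects downward.
naive_info = item_info_2pl(0.0, a, b).sum()
```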
In this note, we prove that the 3-parameter logistic model with fixed-effect abilities is identified only up to a linear transformation of the ability scale under mild regularity conditions, contrary to the claims in Theorem 2 of San Martín et al. (Psychometrika, 80(2):450–467, 2015a).
This study proposes a new item parameter linking method for the common-item nonequivalent groups design in item response theory (IRT). Previous studies assumed that examinees are randomly assigned to either test form. However, examinees can frequently select their own test forms, and the forms taken often differ according to examinees’ abilities. In such cases, concurrent calibration or multiple-group IRT modeling that does not model test form selection behavior can yield severely biased results. We proposed a model wherein test form selection behavior depends on test scores, estimated with a Monte Carlo expectation–maximization (MCEM) algorithm. This method provided adequate parameter estimates.
The relationship between the EM algorithm and the Bock-Aitkin procedure is described with a continuous distribution of ability (latent trait) from an EM-algorithm perspective. Previous work has been restricted to the discrete case from a probit-analysis perspective.
When latent variables are used as outcomes in regression analysis, a common approach to the otherwise ignored measurement error is to take a multilevel perspective on item response theory (IRT) modeling. Although recent computational advances allow efficient and accurate estimation of multilevel IRT models, we argue that a two-stage divide-and-conquer strategy still has unique advantages. Within the two-stage framework, three methods that account for heteroscedastic measurement errors of the dependent variable in the stage-II analysis are introduced: closed-form marginal maximum likelihood estimation, the expectation–maximization algorithm, and the moment estimation method. They are compared to naïve two-stage estimation and one-stage MCMC estimation. A simulation study compares the five methods in terms of model parameter recovery and standard error estimation. The pros and cons of each method are also discussed to provide guidelines for practitioners. Finally, a real data example using the National Education Longitudinal Study of 1988 (NELS:88) illustrates the application of the various methods.
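A minimal sketch of a stage-II analysis that weights cases by known stage-I error variances, in the spirit of the closed-form approach described above; the function name and all data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def gls_with_known_error(x, theta_hat, se2, sigma2):
    """Stage-II regression of stage-I ability estimates theta_hat on a
    covariate x, weighting each case by the inverse of its total
    residual variance sigma2 + se2_i, where se2_i is the squared
    standard error of measurement carried over from stage I."""
    X = np.column_stack([np.ones_like(x), x])
    w = 1.0 / (sigma2 + se2)                 # precision weights
    XtWX = X.T @ (w[:, None] * X)
    return np.linalg.solve(XtWX, X.T @ (w * theta_hat))

# Synthetic check: true intercept 0.5, true slope 1.0.
n = 500
x = rng.normal(0.0, 1.0, n)
se2 = rng.uniform(0.05, 0.3, n)              # heteroscedastic stage-I errors
sigma2 = 0.25                                # structural residual variance
theta_hat = 0.5 + 1.0 * x + rng.normal(0.0, np.sqrt(sigma2 + se2))
intercept, slope = gls_with_known_error(x, theta_hat, se2, sigma2)
```

In practice σ² is unknown and must itself be estimated, which is where the marginal MLE, EM, and moment methods compared in the paper differ.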
This paper presents a machine learning approach to multidimensional item response theory (MIRT), a class of latent factor models that can be used to model and predict student performance from observed assessment data. Inspired by collaborative filtering, we define a general class of models that includes many MIRT models. We discuss the use of penalized joint maximum likelihood to estimate individual models and cross-validation to select the best performing model. This model evaluation process can be optimized using batching techniques, such that even sparse large-scale data can be analyzed efficiently. We illustrate our approach with simulated and real data, including an example from a massive open online course. The high-dimensional model fit to this large and sparse dataset does not lend itself well to traditional methods of factor interpretation. By analogy to recommender-system applications, we propose an alternative “validation” of the factor model, using auxiliary information about the popularity of items consulted during an open-book examination in the course.
This paper presents a noncompensatory latent trait model for cognitive diagnosis, the multicomponent latent trait model for diagnosis (MLTM-D). In the MLTM-D, a hierarchical relationship between components and attributes is specified to permit diagnosis at two levels. The MLTM-D generalizes the multicomponent latent trait model (MLTM; Whitely in Psychometrika, 45:479–494, 1980; Embretson in Psychometrika, 49:175–186, 1984) to measures of broad traits, such as achievement tests, in which component structure varies between items. Conditions for model identification are described, and marginal maximum likelihood estimators are presented along with simulation results demonstrating parameter recovery. To illustrate how the MLTM-D can be used for diagnosis, an application to a large-scale test of mathematics achievement is presented. An advantage of the MLTM-D is that it may be more applicable to large-scale assessments with heterogeneous items than are latent class models.
This commentary attempts to present some additional alternatives to the suggestions made by Reise et al. (2021). IRT models, as applied to patient-reported outcome (PRO) scales, may not be fully satisfactory under commonly made assumptions. The suggested change to an alternative parameterization is critically reflected upon, with the intent of initiating discussion around more comprehensive alternatives that allow for more complex latent structures and thus have the potential to be more appropriate for PRO scales as they are applied to diverse populations.
We introduce an extended multivariate logistic response model for multiple choice items; this model includes several earlier proposals as special cases. The discussion includes a theoretical development of the model, a description of the relationship between the model and data, and a marginal maximum likelihood estimation scheme for the item parameters. Comparisons of the performance of different versions of the full model with more constrained forms corresponding to previous proposals are included, using likelihood ratio statistics and empirical data.
An application of a hierarchical IRT model for items in families generated through the application of different combinations of design rules is discussed. Within the families, the items are assumed to differ only in surface features. The parameters of the model are estimated in a Bayesian framework, using a data-augmented Gibbs sampler. An obvious application of the model is computerized algorithmic item generation. Such algorithms have the potential to increase the cost-effectiveness of item generation as well as the flexibility of item administration. The model is applied to data from a non-verbal intelligence test created using design rules. In addition, results from a simulation study conducted to evaluate parameter recovery are presented.