Choose the type of multivariable model based on the type of outcome variable you have. Perform univariate statistics to understand the distribution of your independent and outcome variables. Perform bivariate analysis of your independent variables. Run a correlation matrix to understand how your independent variables are related to one another. Assess your missing data. Perform your analysis and assess how well your model fits the data. Assess the strength of your individual covariates in estimating the outcome. Use regression diagnostics to assess the underlying assumptions of your model. Perform sensitivity analyses to assess the robustness of your findings and consider whether it would be possible to validate your model. Publish your work and soak up the glory.
In setting up your model, include, in addition to the risk factor or group assignment, those variables that have been theorized or shown in prior research to be confounders, as well as those that are empirically associated with both the risk factor and the outcome in bivariate analysis.
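As a rough illustration of several of these steps for a binary outcome, the sketch below uses a simulated dataset with hypothetical variable names (age, sex, smoker, event) and fits a logistic regression with pandas and statsmodels; it is a minimal workflow outline under those assumptions, not a complete analysis.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "age": rng.normal(60, 10, n),
    "sex": rng.integers(0, 2, n),
    "smoker": rng.integers(0, 2, n),
})
logit = -6 + 0.08 * df["age"] + 0.5 * df["smoker"]
df["event"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

covariates = ["age", "sex", "smoker"]

# Univariate statistics: distributions of the independent and outcome variables
print(df[covariates + ["event"]].describe())

# Correlation matrix: how the independent variables relate to one another
print(df[covariates].corr())

# Missing-data check
print(df.isna().sum())

# Model chosen for a binary outcome: logistic regression
X = sm.add_constant(df[covariates].astype(float))
fit = sm.Logit(df["event"], X).fit()
print(fit.summary())            # covariate strength; np.exp(fit.params) gives odds ratios
```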
Exclude variables that are on the intervening pathway between the risk factor and outcome, those that are extraneous because they are not on the causal pathway, redundant variables, and variables with a lot of missing data.
Sample size calculation for multivariable analysis is complicated, but statistical programs exist to help you calculate it. Missing data on independent variables can compromise your multivariable analysis. Several methods exist to compensate for missing data on independent variables, including deleting cases, using indicator variables to flag missing values, and estimating (imputing) the values of missing cases. Methods also exist for estimating missing outcome data, using other data you have about the subject as well as multiple imputation.
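The sketch below illustrates three of these strategies for missing independent-variable data (case deletion, a missingness indicator, and estimating missing values from other variables) on a small simulated DataFrame with hypothetical columns; multiple imputation would normally be done with a dedicated routine rather than the single regression fill-in shown here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({"bmi": rng.normal(27, 4, 200), "age": rng.normal(55, 9, 200)})
df.loc[rng.choice(200, 30, replace=False), "bmi"] = np.nan   # inject missingness

# 1. Delete cases (complete-case analysis)
complete_cases = df.dropna(subset=["bmi"])

# 2. Indicator variable for missingness, with a simple placeholder fill-in
df["bmi_missing"] = df["bmi"].isna().astype(int)
df["bmi_filled"] = df["bmi"].fillna(df["bmi"].mean())

# 3. Estimate (impute) missing values from other variables, here by regression on age
obs = df.dropna(subset=["bmi"])
slope, intercept = np.polyfit(obs["age"], obs["bmi"], 1)
df["bmi_imputed"] = df["bmi"].fillna(intercept + slope * df["age"])
```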
Sensitivity analysis tests how robust the results are to changes in the underlying assumptions of your analysis. In other words, if you made plausible changes in your assumptions, would you still draw the same conclusions? The changes could be a more restrictive or inclusive sample, a different way to measure your variables, a different way of handling missing data, or a change to some other feature of your analysis. With sensitivity analysis you cannot lose. If you vary the assumptions of your analysis and you get the same result, you will have more confidence in the conclusions of your study. Conversely, if plausible changes in your assumptions lead to a different conclusion, you will have learned something important. A common assumption tested in sensitivity analysis is that there are no unmeasured confounders, which can be tested with E-values or falsification analysis. Other common assumptions tested are that losses to follow-up are random, that the sample is unbiased, that the exposure and follow-up periods are correctly specified, that predictors and the outcome are measured without bias, and that the model is correctly specified.
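One concrete example of a sensitivity-analysis quantity mentioned above is the E-value of VanderWeele and Ding for unmeasured confounding; a minimal sketch, with an illustrative risk ratio, follows.

```python
import math

def e_value(rr: float) -> float:
    """Minimum strength of association an unmeasured confounder would need with
    both the exposure and the outcome to fully explain away an observed risk ratio."""
    rr = rr if rr >= 1 else 1.0 / rr          # work on the >= 1 scale
    return rr + math.sqrt(rr * (rr - 1.0))

print(e_value(1.8))   # 3.0: a fairly strong confounder would be required
```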
Modern quantitative evidence synthesis methods often combine patient-level data from different sources, known as individual participant data (IPD) sets. A specific challenge in meta-analysis of IPD sets is the presence of systematically missing data, when certain variables are not measured in some studies, and sporadically missing data, when measurements of certain variables are incomplete across different studies. Multiple imputation (MI) is among the better approaches to deal with missing data. However, MI of hierarchical data, such as IPD meta-analysis, requires advanced imputation routines that preserve the hierarchical data structure and accommodate the presence of both systematically and sporadically missing data. We have recently developed a new class of hierarchical imputation methods within the MICE framework tailored for continuous variables. This article discusses the extensions of this methodology to categorical variables, accommodating the simultaneous presence of systematically and sporadically missing data in nested designs with arbitrary missing data patterns. To address the challenge of the categorical nature of the data, we propose an accept–reject algorithm during the imputation process. Following theoretical discussions, we evaluate the performance of the new methodology through simulation studies and demonstrate its application using an IPD set from patients with kidney disease.
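As a generic illustration of the accept–reject principle named above (not the authors' specific hierarchical imputation routine), the sketch below draws a single categorical value from a target distribution using a uniform proposal.

```python
import numpy as np

def accept_reject_categorical(target_probs, rng):
    """Draw one category from `target_probs` by accept-reject,
    using a uniform proposal over the K categories."""
    target_probs = np.asarray(target_probs, dtype=float)
    p_max = target_probs.max()
    while True:
        candidate = rng.integers(len(target_probs))      # uniform proposal
        if rng.uniform() < target_probs[candidate] / p_max:
            return candidate                             # accept

rng = np.random.default_rng(2)
draws = [accept_reject_categorical([0.6, 0.3, 0.1], rng) for _ in range(10_000)]
print(np.bincount(draws) / 10_000)    # approximately [0.6, 0.3, 0.1]
```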
Workforce planning aims to model and predict supply and demand in medical specialties. In Scotland it is undertaken jointly by the Scottish Government and the Royal College of Psychiatrists in Scotland to ensure workforce sustainability. The survey described in this paper aimed to ascertain why doctors continue to choose to take a break from or delay training programmes, or to pursue alternative jobs and career pathways. Career breaks, time out of training, less than full-time working patterns, dual training and non-clinical careers need to be taken into account during workforce planning, not only to make psychiatry an attractive specialty to work in, but to ensure robust future sustainability of the psychiatric workforce in Scotland and the UK.
Raw data require a great deal of cleaning, coding, and categorizing of observations. Vague standards for this data work can make it troublingly ad hoc, with much opportunity and temptation to influence the final results. Preprocessing rules and assumptions are not often seen as part of the model, but they can influence the result just as much as control variables or functional form assumptions. In this chapter, we discuss the main data processing decisions that analysts often face and how they can affect the results: coding and classifying of variables, processing anomalous and outlier observations, and the use of sample weights.
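A minimal sketch of these three kinds of preprocessing decisions follows, on a simulated dataset with hypothetical column names; each choice (category boundaries, winsorization cut-offs, whether to weight) can shift downstream estimates.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
df = pd.DataFrame({"income": rng.lognormal(10, 1, 1000),
                   "educ_years": rng.integers(8, 21, 1000),
                   "weight": rng.uniform(0.5, 2.0, 1000)})

# Coding / classifying: collapse years of education into categories
df["educ_cat"] = pd.cut(df["educ_years"], bins=[0, 12, 16, 30],
                        labels=["<=HS", "college", "graduate"])

# Anomalous / outlier observations: winsorize income at the 1st and 99th percentiles
lo, hi = df["income"].quantile([0.01, 0.99])
df["income_w"] = df["income"].clip(lower=lo, upper=hi)

# Sample weights: weighted and unweighted means can differ noticeably
print(df["income_w"].mean())
print(np.average(df["income_w"], weights=df["weight"]))
```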
Establishing the effectiveness of treatments for psychopathology requires accurate models of its progression over time and the factors that impact it. Longitudinal data is, however, fraught with missingness, hindering accurate modeling. We re-analyse data on schizophrenia severity in a clinical trial using hidden Markov models (HMMs). We consider missing data in HMMs with a focus on situations where data is missing not at random (MNAR) and missingness depends on the latent states, allowing symptom severity to indirectly impact the probability of missingness. In simulations, we show that including a submodel for state-dependent missingness reduces bias when data is MNAR and state-dependent, whilst not reducing accuracy when data is missing at random (MAR). When missingness depends on time, a model that allows missingness to be both state- and time-dependent is unbiased. Overall, these results show that modelling missingness as state-dependent and including relevant covariates is a useful strategy in applications of HMMs to time series with missing data. Applying the model to data from a clinical trial, we find that drop-out is more likely for patients with less severe symptoms, which may lead to a biased assessment of treatment effectiveness.
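The sketch below illustrates the central idea in simplified form, assuming a Gaussian HMM with illustrative (not fitted) parameters: when an observation is missing, the likelihood contribution comes from a state-dependent missingness probability rather than from the emission density.

```python
import numpy as np
from scipy.stats import norm

def forward_loglik(y, pi, A, means, sds, p_miss):
    """Log-likelihood of a Gaussian HMM in which each latent state k has its
    own probability p_miss[k] of producing a missing observation (np.nan)."""
    log_alpha = None
    for t, obs in enumerate(y):
        if np.isnan(obs):        # missing: only the missingness submodel contributes
            log_b = np.log(p_miss)
        else:                    # observed: emission density times P(observed | state)
            log_b = norm.logpdf(obs, means, sds) + np.log(1.0 - p_miss)
        if t == 0:
            log_alpha = np.log(pi) + log_b
        else:
            log_alpha = log_b + np.logaddexp.reduce(
                log_alpha[:, None] + np.log(A), axis=0)
    return np.logaddexp.reduce(log_alpha)

y = np.array([1.2, np.nan, 0.8, 3.1, np.nan])
print(forward_loglik(y,
                     pi=np.array([0.5, 0.5]),
                     A=np.array([[0.9, 0.1], [0.2, 0.8]]),
                     means=np.array([1.0, 3.0]),
                     sds=np.array([0.5, 0.5]),
                     p_miss=np.array([0.1, 0.4])))
```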
In the present paper, a model for describing dynamic processes is constructed by combining the common Rasch model with the concept of structurally incomplete designs. This is accomplished by mapping each item onto a collection of virtual items, one of which is assumed to be presented to the respondent depending on the preceding responses and/or the feedback obtained. It is shown that, in the case of subject control, no unique conditional maximum likelihood (CML) estimates exist, whereas marginal maximum likelihood (MML) proves a suitable estimation procedure. A hierarchical family of dynamic models is presented, and it is shown how to test special cases against more general ones. Furthermore, it is shown that the model presented is a generalization of a class of mathematical learning models known as Luce's beta-model.
Consider an old test X consisting of s sections and two new tests Y and Z similar to X consisting of p and q sections respectively. All subjects are given test X plus two variable sections from either test Y or Z. Different pairings of variable sections are given to each subsample of subjects. We present a method of estimating the covariance matrix of the combined test (X1, ..., Xs, Y1, ..., Yp, Z1, ..., Zq) and describe an application of these estimation techniques to linear, observed-score, test equating.
The posterior distribution of the bivariate correlation is analytically derived given a data set where x is completely observed but y is missing at random for a portion of the sample. Interval estimates of the correlation are then constructed from the posterior distribution in terms of highest density regions (HDRs). Various choices for the form of the prior distribution are explored. For each of these priors, the resulting Bayesian HDRs are compared with each other and with intervals derived from maximum likelihood theory.
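As a sketch of the interval-construction step only, the code below computes a highest density region (for a unimodal posterior, the shortest interval) from simulated posterior draws of a correlation; the analytical posterior derived in the paper is not reproduced here.

```python
import numpy as np

def hdr_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the sorted posterior draws."""
    s = np.sort(samples)
    n = len(s)
    k = int(np.ceil(mass * n))
    widths = s[k - 1:] - s[:n - k + 1]     # width of every window holding k draws
    i = int(np.argmin(widths))
    return s[i], s[i + k - 1]

rng = np.random.default_rng(4)
rho_draws = np.tanh(rng.normal(np.arctanh(0.5), 0.08, 20000))  # illustrative posterior draws
print(hdr_interval(rho_draws))
```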
Standard procedures for drawing inferences from complex samples do not apply when the variable of interest θ cannot be observed directly, but must be inferred from the values of secondary random variables that depend on θ stochastically. Examples are proficiency variables in item response models and class memberships in latent class models. Rubin's “multiple imputation” techniques yield approximations of sample statistics that would have been obtained, had θ been observable, and associated variance estimates that account for uncertainty due to both the sampling of respondents and the latent nature of θ. The approach is illustrated with data from the National Assessment of Educational Progress.
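The variance combination behind multiple imputation is Rubin's rule: the total variance adds the average within-imputation variance to an inflated between-imputation component. A minimal sketch with illustrative plausible-value estimates follows.

```python
import numpy as np

est = np.array([251.3, 253.1, 250.8, 252.4, 251.9])   # estimate from each of M imputations
var = np.array([4.2, 4.0, 4.3, 4.1, 4.2])             # sampling variance within each imputation

M = len(est)
q_bar = est.mean()                      # combined point estimate
W = var.mean()                          # within-imputation variance
B = est.var(ddof=1)                     # between-imputation variance
T = W + (1 + 1 / M) * B                 # total variance (Rubin's rule)
print(q_bar, np.sqrt(T))
```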
Missing data occur in many real-world studies. Knowing the type of missingness mechanism is important for adopting an appropriate statistical analysis procedure. Many statistical methods assume data are missing completely at random (MCAR) because of its simplicity. Therefore, it is necessary to test whether this assumption is satisfied before applying those procedures. In the literature, most of the procedures for testing MCAR were developed under a normality assumption, which is sometimes difficult to justify in practice. In this paper, we propose a nonparametric test of MCAR for incomplete multivariate data which does not require distributional assumptions. The proposed test is carried out by comparing the distributions of the observed data across different missing-pattern groups. We prove that the proposed test is consistent against any distributional differences in the observed data. Simulation shows that the proposed procedure has the Type I error well controlled at the nominal level for testing MCAR and also has good power against a variety of non-MCAR alternatives.
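As a simplified illustration of the underlying idea, comparing observed-data distributions across missing-data patterns, the sketch below uses a two-sample Kolmogorov–Smirnov test as a stand-in for the paper's own test statistic.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp

rng = np.random.default_rng(5)
n = 1000
x = rng.normal(0, 1, n)
y = x + rng.normal(0, 1, n)
y[x > 0.5] = np.nan                      # missingness in y depends on x: not MCAR

df = pd.DataFrame({"x": x, "y": y})
pattern = df["y"].isna()                 # two missing-data patterns: y missing / y observed
stat, pval = ks_2samp(df.loc[pattern, "x"], df.loc[~pattern, "x"])
print(pval)   # small p-value: observed x differs across patterns, evidence against MCAR
```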
Mediation analysis constitutes an important part of treatment studies, as it identifies the mechanisms by which an intervention achieves its effect. The structural equation model (SEM) is a popular framework for modeling such causal relationships. However, current methods impose various restrictions on study designs and data distributions, limiting the utility of the information they provide in real study applications. In particular, in longitudinal studies missing data are commonly addressed under the assumption of missing at random (MAR), and current methods are unable to handle such missing data if parametric assumptions are violated.
In this paper, we propose a new, robust approach to address the limitations of current SEM within the context of longitudinal mediation analysis by utilizing a class of functional response models (FRM). Being distribution-free, the FRM-based approach does not impose any parametric assumption on data distributions. In addition, by extending the inverse probability weighted (IPW) estimates to the current context, the FRM-based SEM provides valid inference for longitudinal mediation analysis under the two most popular missing data mechanisms: missing completely at random (MCAR) and missing at random (MAR). We illustrate the approach with both real and simulated data.
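A minimal sketch of the inverse probability weighting ingredient under MAR follows, using simulated data with hypothetical variable names: the probability that the follow-up outcome is observed is modeled from a baseline covariate, and complete cases are weighted by the inverse of that probability.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 800
baseline = rng.normal(0, 1, n)
followup = 0.6 * baseline + rng.normal(0, 1, n)
observed = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + baseline))))   # MAR: depends on baseline
followup = np.where(observed == 1, followup, np.nan)

X = sm.add_constant(baseline)
p_obs = sm.Logit(observed, X).fit(disp=0).predict(X)              # P(observed | baseline)
w = observed / p_obs                                              # IPW weights (0 when missing)

naive = np.nanmean(followup)                                      # complete-case mean, biased here
ipw = np.nansum(w * followup) / w.sum()                           # IPW-corrected mean
print(naive, ipw)
```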
In this paper, the constrained maximum likelihood estimation of a two-level covariance structure model with unbalanced designs is considered. The two-level model is reformulated as a single-level model by treating the group-level latent random vectors as hypothetical missing data. Then, the popular EM algorithm is extended to obtain the constrained maximum likelihood estimates. For general nonlinear constraints, the multiplier method is used at the M-step to find the constrained minimum of the conditional expectation. An accelerated EM gradient procedure is derived to handle linear constraints. The empirical performance of the proposed EM-type algorithms is illustrated by some artificial and real examples.
A general approach for fitting a model to a data matrix by weighted least squares (WLS) is studied. This approach consists of iteratively performing (steps of) existing algorithms for ordinary least squares (OLS) fitting of the same model. The approach is based on minimizing a function that majorizes the WLS loss function. The generality of the approach implies that, for every model for which an OLS fitting algorithm is available, the present approach yields a WLS fitting algorithm. In the special case where the WLS weight matrix is binary, the approach reduces to missing data imputation.
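The binary-weight special case can be sketched as follows, assuming a rank-r least squares model for the data matrix: the ordinary (unweighted) fit is iterated, with missing cells re-imputed from the current fit at each step.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(30, 3)) @ rng.normal(size=(3, 8))   # true rank-3 data matrix
W = rng.uniform(size=X.shape) > 0.2                      # binary weights: True = observed
r = 3

Z = np.where(W, X, 0.0)                                  # start: missing entries set to 0
for _ in range(200):
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)     # OLS (rank-r) fit of the current matrix
    fit = (U[:, :r] * s[:r]) @ Vt[:r]
    Z = np.where(W, X, fit)                              # impute missing cells with the fit

print(np.max(np.abs((fit - X)[~W])))   # error on the missing entries shrinks as iterations proceed
```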
Situations sometimes arise in which variables collected in a study are not jointly observed. This typically occurs because of study design. An example is an equating study where distinct groups of subjects are administered different sections of a test. In the normal maximum likelihood function to estimate the covariance matrix among all variables, elements corresponding to those that are not jointly observed are unidentified. If a factor analysis model holds for the variables, however, then all sections of the matrix can be accurately estimated, using the fact that the covariances are a function of the factor loadings. Standard errors of the estimated covariances can be obtained by the delta method. In addition to estimating the covariance matrix in this design, the method can be applied to other problems such as regression factor analysis. Two examples are presented to illustrate the method.
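The identity exploited here can be sketched with a hypothetical single-factor example: under the model Σ = ΛΛ′ + Ψ, the covariance between two variables that were never administered together is still determined by the product of their loadings.

```python
import numpy as np

lam = np.array([0.8, 0.7, 0.6, 0.9])        # loadings (assumed estimated from observed sections)
psi = np.array([0.36, 0.51, 0.64, 0.19])    # unique variances

Sigma = np.outer(lam, lam) + np.diag(psi)   # model-implied covariance matrix

# Suppose variables 1 and 3 were never administered together in this hypothetical design;
# their covariance is nevertheless identified as the product of their loadings:
print(Sigma[0, 2], lam[0] * lam[2])
```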
Existing test statistics for assessing whether incomplete data represent a missing completely at random sample from a single population are based on a normal likelihood rationale and effectively test for homogeneity of means and covariances across missing data patterns. The likelihood approach cannot be implemented adequately if a pattern of missing data contains very few subjects. A generalized least squares rationale is used to develop parallel tests that are expected to be more stable in small samples. Three factors were varied for a simulation: number of variables, percent missing completely at random, and sample size. One thousand data sets were simulated for each condition. The generalized least squares test of homogeneity of means performed close to an ideal Type I error rate for most of the conditions. The generalized least squares test of homogeneity of covariance matrices and a combined test performed quite well also.
Time limits are imposed on many computer-based assessments, and it is common to observe examinees who run out of time, resulting in missingness due to not-reached items. The present study proposes an approach to account for the missingness mechanism of not-reached items via response time censoring. The censoring mechanism is directly incorporated into the observed likelihood of item responses and response times. A marginal maximum likelihood estimator is proposed, and its asymptotic properties are established. The proposed method was evaluated through simulation studies and compared to several alternative approaches that ignore the censoring. An empirical study based on the PISA 2018 Science Test was further conducted.
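As a simplified stand-in for the paper's formulation, the sketch below shows how a right-censored response-time term can enter a likelihood, assuming a lognormal response-time model with illustrative parameters: a reached item contributes its density, while a not-reached item contributes a survival probability.

```python
import numpy as np
from scipy.stats import lognorm

def rt_loglik(times, censor_time, mu, sigma):
    """times: response times in seconds, with np.nan marking a not-reached item;
    censor_time: time at which the not-reached item's response time is censored."""
    ll = 0.0
    for t in times:
        if np.isnan(t):        # not reached: right-censored term P(T > censor_time)
            ll += lognorm.logsf(censor_time, s=sigma, scale=np.exp(mu))
        else:                  # reached: full response-time density
            ll += lognorm.logpdf(t, s=sigma, scale=np.exp(mu))
    return ll

print(rt_loglik(np.array([35.0, 50.0, np.nan]), censor_time=20.0, mu=3.8, sigma=0.5))
```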
Measures of agreement are used in a wide range of behavioral, biomedical, psychosocial, and health-care related research to assess the reliability of diagnostic tests, the psychometric properties of instruments, the fidelity of psychosocial interventions, and the accuracy of proxy outcomes. The concordance correlation coefficient (CCC) is a popular measure of agreement for continuous outcomes. In modern-day applications, data are often clustered, making inference difficult to perform using existing methods. In addition, as longitudinal study designs become increasingly popular, missing data have become a serious issue, and the lack of methods to systematically address this problem has hampered the progress of research in the aforementioned fields. In this paper, we develop a novel approach to tackle the complexities involved in addressing missing data and other related issues when performing CCC analysis within a longitudinal data setting. The approach is illustrated with both real and simulated data.
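For reference, Lin's concordance correlation coefficient itself can be sketched for a single pair of raters, ignoring the clustering and missing-data machinery that the paper develops.

```python
import numpy as np

def ccc(x, y):
    """Lin's concordance correlation coefficient for two continuous ratings."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxy = np.cov(x, y, ddof=1)[0, 1]
    return 2 * sxy / (x.var(ddof=1) + y.var(ddof=1) + (x.mean() - y.mean()) ** 2)

rater_a = [12.1, 15.3, 9.8, 14.0, 11.2]
rater_b = [12.5, 14.9, 10.4, 13.6, 11.0]
print(ccc(rater_a, rater_b))
```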
In knowledge space theory, existing adaptive assessment procedures can only be applied when suitable estimates of their parameters are available. In this paper, an iterative procedure is proposed which updates its parameters as the number of assessments increases. The first assessments are run using parameter values that favor accuracy over efficiency. Subsequent assessments are run using new parameter values estimated from the incomplete response patterns of previous assessments. Parameter estimation is carried out through a new probabilistic model for missing-at-random data. Two simulation studies show that, as the number of assessments increases, the performance of the proposed procedure approaches that of gold standards.