
Identification and Interpretation of the Completely Oblique Rasch Bifactor Model

Published online by Cambridge University Press:  24 April 2025

Denis Federiakin*
Affiliation:
Department of Economic Education, Johannes Gutenberg University of Mainz, Mainz, Germany; Institute of Psychology, Goethe University Frankfurt, Frankfurt, Germany; Centre for Psychometrics and Educational Measurement, Institute of Education, HSE University, Moscow, Russia
Mark R. Wilson
Affiliation:
Berkeley Evaluation and Assessment Research (BEAR) Center, Graduate School of Education, UC Berkeley, Berkeley, CA, USA
*
Corresponding author: Denis Federiakin; Email: denis.federiakin@uni-mainz.de

Abstract

Bifactor Item Response Theory (IRT) models are the usual option for modeling composite constructs. However, in applications, researchers typically must assume that all dimensions of the person parameter space are orthogonal. This can result in absurd model interpretations. We propose a new bifactor model—the Completely Oblique Rasch Bifactor (CORB) model—which allows for estimation of correlations between all dimensions. We discuss relations of this model to other oblique bifactor models and study the conditions for its identification in the dichotomous case. We analytically prove that this model is identified if (a) at least one item loads solely on the general factor and no items are shared between any pair of specific factors (we call this the G-structure), or (b) no items load solely on the general factor, but at least one item is shared between every pair of specific factors (the S-structure). Using simulated and real data, we show that this model outperforms the other partially oblique bifactor models in terms of model fit because it corresponds to more realistic assumptions about construct structure. We also discuss possible difficulties in the interpretation of the CORB model’s parameters using, by analogy, the “explaining away” phenomenon from Bayesian reasoning.

Information

Type
Theory and Methods
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of Psychometric Society

Bifactor models (Holzinger & Swineford, 1937) are a common approach in IRT for modeling composite constructs. These models enable the simultaneous estimation of a general factor, which is measured by all items, and specific factors, which are measured by subsets of them (see, for example, Figure 1). Bifactor models are particularly useful for capturing a general factor in tests with varied item types or in testlet-based assessments, where groups of items are linked by a common stimulus (Reise, 2012). They are also a popular focus in psychometric research because they generalize higher-order models mathematically (Gignac, 2016). Additionally, bifactor models have a constrained form known as the testlet model, which is equivalent to higher-order models (Rijmen, 2010).

Figure 1 A bifactor structure with three specific factors. All items load on the general factor and on one specific factor.

Traditional bifactor models are constrained by a restrictive assumption: the general and specific factors must be orthogonal, meaning they are uncorrelated. According to the traditional framework, this assumption is necessary to ensure model identification (Reise et al., 2010). However, from an interpretational and substantive perspective, this assumption is often nonsensical despite its mathematical justification. For example, consider a bifactor model applied to a test comprising items that measure algebra and geometry to derive a general mathematics score. This approach requires assuming that the general mathematics factor is uncorrelated with both algebra and geometry scores. Additionally, it demands that algebra and geometry scores themselves be uncorrelated. Such assumptions make it challenging to interpret the resulting factor scores as meaningful representations of content domains (Wilson & Gochyyev, 2020).

Eid et al. (2017) highlighted a related paradox using stochastic measurement theory, demonstrating that orthogonal bifactor models should only be applied when the specific factors are interchangeable—essentially drawn at random from the universe of specific factors. This assumption, however, does not hold when specific factors represent distinct subject matter domains, such as algebra and geometry. Eid et al. (2017) further concluded that this requirement is rarely met in practice, leading to the overuse of bifactor models due to inappropriate measurement design.

To address this limitation, researchers often justify their use of bifactor models by aligning their application with specific modeling objectives. Frequently, the focus is on the general factor, with specific factors serving as a mathematical tool to account for local dependencies among items caused by shared content or stimuli (e.g., DeMars, 2013). Alternatively, some researchers emphasize the specific factors and view the general factor as a common source of error variance across all items (Hendy & Biderman, 2019). In such cases, the assumption of total orthogonality contradicts theoretical models of the construct.

Still, psychometricians often treat secondary factors as nuisance dimensions, enabling them to overlook interpretational challenges. However, this approach is suboptimal for modeling composite constructs, as it prioritizes mathematical convenience over an accurate representation of the relationships between components. Attempts have been made to differentiate the contexts in which orthogonal bifactor models are applied. For example, these models have been shown to perform exceptionally well in measurement contexts (Cai et al., 2011; Jeon et al., 2018; Wang & Zhang, 2019) but fail to yield reliable estimates in predictive contexts (Zhang et al., 2021; Zhang, Luo, Sun, et al., 2023).

To address these limitations, several extensions of bifactor IRT models have been proposed, providing partial solutions to the challenges of traditional bifactor models. These extensions allow for the direct estimation of specific entries in the variance–covariance matrix of the latent person parameter space. Notable examples include the Extended Rasch Testlet Model (ETM; Paek et al., 2009) and the Generalized Subdimensional Model (GSM; Brandt & Duckor, 2013), both of which have been developed within the Rasch modeling framework (Rasch, 1993).

The ETM permits the estimation of covariances between specific factors and the general factor while maintaining orthogonality among the specific factors. In contrast, the GSM enforces orthogonality between the general factor and the specific factors but allows correlations among the specific factors, albeit under complex constraints. More recently, partially oblique bifactor models, such as GSM (but without those constraints), have been shown to be analytically identified within the covariance structure modeling framework if the factor loading matrix satisfies certain stringent requirements (Fang et al., 2021). However, these models have demonstrated high numerical instability in practice (Zhang, Luo, Zhang, et al., 2023), leading researchers to advise caution in their use. Furthermore, none of these partially oblique bifactor models allow for the unrestricted estimation of the entire variance–covariance matrix. As a result, the interpretation of factor scores and correlations remains as challenging as it is in traditional bifactor models.

The purpose of this article is twofold. First, from a theoretical perspective, we introduce the Completely Oblique Rasch Bifactor (CORB) model within the confirmatory IRT paradigm. This model, with certain limitations, enables the direct estimation of all entries in the variance–covariance matrix of person parameters, simplifying the interpretation of model parameters. We explore the structure and interpretation of the CORB model in relation to existing oblique bifactor models. As a special case of the Multidimensional Random Coefficients Multinomial Logit Model (MRCMLM; Adams et al., 1997), the CORB model can be calibrated using dedicated software such as the ConQuest program (Adams et al., 2020), the TAM package for the R language (Robitzsch et al., 2025), or other tools for Generalized Linear Mixed Effect Modeling (e.g., de Boeck et al., 2011).

Second, this article makes a practical contribution by describing two specific test dimensionality structures that facilitate the estimation of all correlations among person parameters. The first structure involves having at least one item that does not load on any specific factor, effectively serving as an indicator for the general factor. The second structure requires that every pair of specific factors share at least one item. We demonstrate how these two structures ensure the identification of the CORB model and discuss their practical implications.

The article is organized as follows: First, we describe the MRCMLM framework and outline the conditions necessary for identifying multidimensional Rasch models derived from this framework. Second, we present the CORB model and examine the conditions under which it is identified. Third, we compare the CORB model with other oblique bifactor models. Fourth, we conduct a simulation study to demonstrate that the CORB model is more flexible and performs better than other oblique Rasch models in terms of technical characteristics. Fifth, we provide a real data example using a reading assessment for first-graders and discuss challenges in interpreting the CORB model. Finally, we conclude with a discussion of the CORB model and potential directions for future research and application.

1 MRCMLM framework

Assume a test consists of $I$ items ( $i=1,\dots, I$ ), where each item has ${K}_i$ categories ( ${k}_i={1}_i,\dots, {K}_i$ ), and the total number of categories in the test is $K$ (so that $K=2I$ in the case of a dichotomous test). Without loss of generality, we assume all items are dichotomous. Consequently, each item is described by a single parameter ( ${\xi}_i$ ), and the total number of item parameters, $P$ , equals the number of items, $I$ .

Further, let the test measure $D$ latent factors ${\theta}_d$ ( $d=1,\dots, D$ ). Each of $D$ test scores, ${\theta}_d$ , is assumed to follow a distribution marginalized to have a mean ( $\mu$ ) of zero for model identification (i.e., $\boldsymbol{\mu} =\boldsymbol{0}$ ). For simplicity, this distribution is assumed to be normal with an estimated variance $\mathit{\operatorname{var}}\left({\theta}_d\right)$ . However, this normality assumption is not necessary in the general case (Le & Adams, 2013). The latent space of person parameters is then defined by a multivariate normal distribution characterized by a vector of means $\boldsymbol{\mu}$ and a variance–covariance matrix $\boldsymbol{\varSigma}$ .

According to the reflective perspective on measurement, we assume a predetermined correspondence between every response category of each test item and a specific latent factor. This correspondence is governed by a scoring matrix $\boldsymbol{B}$ (explained below). The first category of every item is scored as zero, which serves to identify the model and establishes this category as the reference category.

Formally, the MRCMLM is expressed as follows:

(1) $$\begin{align}P\left({\boldsymbol{X}}_{ik}=1;\boldsymbol{A},\boldsymbol{B},\boldsymbol{\xi} |\boldsymbol{\theta} \right)=\frac{\exp \left({\boldsymbol{b}}_{ik}^T\boldsymbol{\theta} +{\boldsymbol{a}}_{ik}^T\boldsymbol{\xi} \right)}{\sum_{k=1}^{K_i}\exp \left({\boldsymbol{b}}_{ik}^T\boldsymbol{\theta} +{\boldsymbol{a}}_{ik}^T\boldsymbol{\xi} \right)},\end{align}$$

where ${\boldsymbol{X}}_i$ is a vector-valued random variable indicating ${X}_{ik}=1$ if a response to item $i$ is in category $k$ (out of all possible ${K}_i$ categories) and 0 otherwise,

$\boldsymbol{\xi}$ is a vector of $P$ item parameters ( $=I$ item difficulties in the dichotomous case),

$\boldsymbol{A}$ is the design matrix ( $K\unicode{x2A09} P$ ), composed of design vectors ${\boldsymbol{a}}_{ik}$ (each of length  $P$ ),

$\boldsymbol{\theta}$ is the vector of person parameters, representing a $D$ -dimensional latent space,

$\boldsymbol{B}$ is the scoring matrix ( $K\unicode{x2A09} D$ ), composed of scoring vectors ${\boldsymbol{b}}_{ik}$ (each of length $D$ ).

The design matrix $\boldsymbol{A}$ defines the relationships between item categories and item parameters, while the scoring matrix $\boldsymbol{B}$ links item categories to the test dimensions. If non-zero entries of $\boldsymbol{B}$ are estimated as free parameters, they are interpreted as discrimination (or scoring) parameters, and the model corresponds to the 2PL approach in IRT. Conversely, if these entries are constrained to unity, the model follows the Rasch approach. Generally, the $\boldsymbol{B}$ matrix is structured as a factor loading matrix. The MRCMLM framework encompasses a wide range of models, including multidimensional, dichotomous, and polytomous models (using the adjacent logit link function), as well as other specialized models from the exponential family, within both the Rasch and 2PL paradigms.
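To make Equation 1 concrete, the following base-R sketch computes the probability of a response in category 1 for a single dichotomous item under the Figure 1 structure. All numeric values and object names are illustrative; with the reference category scored zero, its term in the denominator is $\exp(0)=1$, so the probability reduces to the familiar logistic form.

```r
# Minimal sketch of Equation 1 for one dichotomous item (illustrative values).
mrcmlm_prob <- function(b_ik, a_ik, theta, xi) {
  eta <- sum(b_ik * theta) + sum(a_ik * xi)   # linear predictor of Equation 1
  exp(eta) / (1 + exp(eta))                   # reference category contributes exp(0) = 1
}

theta <- c(0.5, -0.2, 0.1, 0.0)          # (general, s1, s2, s3)
xi    <- rep(0, 9)                        # item parameters
b_i3  <- c(1, 1, 0, 0)                    # row of B for item 3 (general + s1)
a_i3  <- replace(rep(0, 9), 3, 1)         # row of A for item 3
mrcmlm_prob(b_i3, a_i3, theta, xi)        # probability of category 1 for item 3
```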

1.1 The Volodin–Adams condition for identifying a D-dimensional Rasch model

Volodin and Adams (2002) outlined the condition required for identifying multidimensional Rasch models with all correlated dimensions. They demonstrated that an oblique multidimensional Rasch model is identifiable if the following condition is met:

(2) $$\begin{align}\mathit{{rank}}\left[{\boldsymbol{A}}_R|\boldsymbol{B}\right]=D+{P}_R\le K-I,\end{align}$$

where ${\boldsymbol{A}}_R$ is a reduced design matrix, carefully constructed to preserve the original model’s structure, and ${P}_R$ is the length of the reduced vector of item parameters corresponding to ${\boldsymbol{A}}_R$ .

If Equation 2 holds, the model permits the direct estimation of all entries in the variance–covariance matrix. However, constructing ${\boldsymbol{A}}_R$ involves systematically dropping $D$ item parameters from $\boldsymbol{A}$ to impose the necessary constraints for identification. To establish this result, Volodin and Adams (2002) derived a series of theorems, which we reproduce and discuss in detail in this section. In Section 1.2, we provide a detailed illustration of this procedure for the test dimensionality structure from Figure 1.

For the dichotomous multidimensional model described in Equation 1, model identification is typically achieved by constraining the average ability in each dimension to 0 ( $\boldsymbol{\mu} =\boldsymbol{0}$ ). However, this constraint is not strictly necessary. In the general case, these averages can be estimated as part of the Rasch model, and the resulting vector of constants $\boldsymbol{c}$ can then be subtracted from the corresponding item difficulties without altering the likelihood of the data:

$$\begin{align*}{\boldsymbol{\xi}}^{\ast}&=\boldsymbol{\xi} -\boldsymbol{Bc},\\{\boldsymbol{\mu}}^{\ast}&=\boldsymbol{\mu} +\boldsymbol{c},\end{align*}$$

then,

$$\begin{align*}P\left({\boldsymbol{X}}_{ik}=1;\boldsymbol{A},\boldsymbol{B},{\boldsymbol{\xi}}^{\ast},\ {\boldsymbol{\mu}}^{\ast },\boldsymbol{\Sigma} \right)=P\left({\boldsymbol{X}}_{ik}=1;\boldsymbol{A},\boldsymbol{B},\boldsymbol{\xi},\ \boldsymbol{\mu},\ \boldsymbol{\Sigma} \right).\end{align*}$$

Naturally, the problem of model identification reduces to demonstrating that ${\boldsymbol{\xi}}^{\ast}\equiv \boldsymbol{\xi}$ and ${\boldsymbol{\mu}}^{\ast}\equiv \boldsymbol{\mu}$ for any response profile $\boldsymbol{x}$ in a vector-valued variable $\boldsymbol{X}$ . In other words, the model is identified if the matrices $\boldsymbol{A}$ and $\boldsymbol{B}$ are such that the equality ${\boldsymbol{x}}^T\left(\boldsymbol{B}\boldsymbol{\mu } +\boldsymbol{A}\boldsymbol{\xi } \right)={\boldsymbol{x}}^T\left(\boldsymbol{B}{\boldsymbol{\mu}}^{\ast }+\boldsymbol{A}{\boldsymbol{\xi}}^{\ast}\right)$ can hold for all $\boldsymbol{x}$ only when ${\boldsymbol{\xi}}^{\ast}\equiv \boldsymbol{\xi}$ and ${\boldsymbol{\mu}}^{\ast}\equiv \boldsymbol{\mu}$ . Volodin and Adams propose a sequence of theorems to establish when this condition holds. To do so, they consider the vector $\boldsymbol{\zeta}$ of length $P+D$ , which concatenates the vectors ${\boldsymbol{\xi}}^T$ and ${\boldsymbol{\mu}}^T$ , and the corresponding vector ${\boldsymbol{\zeta}}^{\ast}$ of the same length, which concatenates the vectors ${{\boldsymbol{\xi}}^{\ast}}^T$ and ${{\boldsymbol{\mu}}^{\ast}}^T$ .
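As a small numerical illustration of this invariance in the dichotomous case, the base-R snippet below uses the design and scoring matrices of the Figure 1 structure (written out in Section 1.2) and arbitrary values to check that shifting $\boldsymbol{\mu}$ by a constant vector $\boldsymbol{c}$ and compensating the item parameters by $\boldsymbol{Bc}$ leaves $\boldsymbol{B}\boldsymbol{\mu }+\boldsymbol{A}\boldsymbol{\xi }$ unchanged; object names are ours.

```r
# Illustrative base-R check: xi* = xi - B c and mu* = mu + c leave B mu + A xi
# unchanged (dichotomous Figure 1 structure; arbitrary values).
A  <- diag(9)                                    # design matrix (Equation 3 below)
B  <- cbind(1, kronecker(diag(3), rep(1, 3)))    # scoring matrix (Equation 4 below)
xi <- rnorm(9); mu <- rep(0, 4); c_vec <- c(0.3, -0.1, 0.2, 0.5)

lhs <- B %*% mu + A %*% xi
rhs <- B %*% (mu + c_vec) + A %*% (xi - B %*% c_vec)
max(abs(lhs - rhs))   # 0: the likelihood of the data is unaffected by the shift
```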

Theorem 1. The model (1) can only be identified if $P+D\le K-I$ .

Proof. Assume $P+D>K-I$ . Then, $\left[\boldsymbol{A}|\boldsymbol{B}\right]$ cannot be of full column rank, as it can have at most $K-I$ non-zero rows. Consequently, there would be no unique solution for the vector $\boldsymbol{\zeta}$ , contradicting ${\boldsymbol{\xi}}^{\ast}\equiv \boldsymbol{\xi}$ and ${\boldsymbol{\mu}}^{\ast}\equiv \boldsymbol{\mu}$ and, hence, the identification of model (1). Therefore, $P+D\le K-I$ must hold.

Theorem 2. The model (1) can only be identified if $\mathit{{rank}}\left[\boldsymbol{A}\right]=P$ , $\mathit{{rank}}\left[\boldsymbol{B}\right]=D$ , and $\mathit{{rank}}\left[\boldsymbol{A}|\boldsymbol{B}\right]=P+D$ .

Proof. Because the matrix $\boldsymbol{A}$ must conform to the vector $\boldsymbol{\xi}$ of length $P$ , it holds that $\mathit{{rank}}\left[\boldsymbol{A}\right]\le P$ . Assume $\mathit{{rank}}\left[\boldsymbol{A}\right]<P$ . In this case, $\boldsymbol{A}\boldsymbol{\xi }$ does not provide a unique solution for $\boldsymbol{\xi}$ , and model (1) cannot be identified. Therefore, if the model is identified, $\mathit{{rank}}\left[\boldsymbol{A}\right]=P$ must hold. Similarly, $\mathit{{rank}}\left[\boldsymbol{B}\right]=D$ must also hold. Consequently, $\mathit{{rank}}\left[\boldsymbol{A}|\boldsymbol{B}\right]=P+D$ must also be true.

Theorem 3. The model (1) is identified if and only if $\mathit{{rank}}\left[\boldsymbol{A}|\boldsymbol{B}\right]=P+D\le K-I$ .

Proof. The necessary conditions directly follow from Theorems 1 and 2. To prove the sufficiency, consider the identification condition for model (1)

${\boldsymbol{x}}^T\left[\boldsymbol{A}|\boldsymbol{B}\right]\left(\boldsymbol{\zeta} -{\boldsymbol{\zeta}}^{\ast}\right)=\boldsymbol{0}\forall \boldsymbol{x}\iff \boldsymbol{\zeta} ={\boldsymbol{\zeta}}^{\ast}$ .

The matrix $\left[\boldsymbol{A}|\boldsymbol{B}\right]$ is of size $\left(K-I\right)\times \left(P+D\right)$ with $\mathit{{rank}}\left[\boldsymbol{A}|\boldsymbol{B}\right]=P+D$ , and $P+D\le K-I$ . Thus, it is possible to remove $\left(K-I\right)-\left(P+D\right)$ rows from $\left[\boldsymbol{A}|\boldsymbol{B}\right]$ to construct a square submatrix of size $\left(P+D\right)\times \left(P+D\right)$ of full rank. Denote this matrix as $\boldsymbol{Z}$ .

Let ${\boldsymbol{x}}^{\ast }$ be a vector corresponding to $\boldsymbol{x}$ with the same elements removed as the rows excluded from $\left[\boldsymbol{A}|\boldsymbol{B}\right]$ to construct $\boldsymbol{Z}$ . To avoid trivial solutions, we constrain $\boldsymbol{x}$ (and ${\boldsymbol{x}}^{\ast }$ ) to not be entirely zero. Then, ${\boldsymbol{x}}^T\left[\boldsymbol{A}|\boldsymbol{B}\right]\left(\boldsymbol{\zeta} -{\boldsymbol{\zeta}}^{\ast}\right)={{\boldsymbol{x}}^{\ast}}^T\boldsymbol{Z}\left(\boldsymbol{\zeta} -{\boldsymbol{\zeta}}^{\ast}\right)=\boldsymbol{0}\ \forall \boldsymbol{x}$ is equivalent to $\boldsymbol{Z}\left(\boldsymbol{\zeta} -{\boldsymbol{\zeta}}^{\ast}\right)=\boldsymbol{0}$ . This holds iff $\boldsymbol{\zeta} ={\boldsymbol{\zeta}}^{\ast}$ , meaning that ${\boldsymbol{\xi}}^{\ast}\equiv \boldsymbol{\xi}$ and ${\boldsymbol{\mu}}^{\ast}\equiv \boldsymbol{\mu}$ $\forall \boldsymbol{x}$ , which is exactly what is required for the model to be identified.

It follows that, in general, the special cases of model (1) are not identified unless the vector $\boldsymbol{\mu}$ is constrained to all zeros. Additionally, this procedure does not address the covariance matrix $\boldsymbol{\Sigma}$ , which spans the latent space of person parameters. Instead, it emphasizes that item parameters play the central role in the identification of Rasch models. This procedure applies broadly to any completely oblique multidimensional Rasch model, including the CORB model.

However, in many scenarios, the constraint of $\boldsymbol{\mu} =\boldsymbol{0}$ still might be insufficient for identification. For instance, if the test dimensionality structure aligns with that shown in Figure 1, the matrix $\left[\boldsymbol{A}|\boldsymbol{B}\right]$ fails to satisfy the condition in Equation 2. More generally, avoiding the constraint $\boldsymbol{\mu} =\boldsymbol{0}$ can be advantageous. In such cases, as noted in Theorem 3, the full matrix $\left[\boldsymbol{A}|\boldsymbol{B}\right]$ will not suffice for identification. This is where the construction of the reduced design matrix ${\boldsymbol{A}}_R$ becomes essential.

After substituting $\boldsymbol{A}$ with ${\boldsymbol{A}}_R$ in $\left[\boldsymbol{A}|\boldsymbol{B}\right]$ (resulting in $\left[{\boldsymbol{A}}_R|\boldsymbol{B}\right]$ ), Theorem 3 can often be proven in the marginal case where $\mathit{{rank}}\left[{\boldsymbol{A}}_R|\boldsymbol{B}\right]={P}_R+D=K-I$ , as demonstrated in this article. The key question, then, is how to construct ${\boldsymbol{A}}_R$ so that it fully preserves the structural properties of $\boldsymbol{A}$ , up to a vector of additive constants $\boldsymbol{c}$ , while still enabling the proof of Theorem 3. The process of constructing ${\boldsymbol{A}}_R$ lies at the core of the Volodin–Adams procedure.

For the Volodin–Adams procedure, $D$ subsets of items ( ${\boldsymbol{J}}_1,\dots, {\boldsymbol{J}}_D$ ) are defined, each of size ${n}_d>1$ . It is not necessary that ${\boldsymbol{J}}_g\cap {\boldsymbol{J}}_h=\varnothing, g\ne h$ ; but it is necessary that ${\boldsymbol{J}}_g\nsubseteq {\boldsymbol{J}}_h$ , $g\ne h$ . Next, a matrix $\boldsymbol{E}$ of size $D\times D$ is constructed, whose $d$ th row is the vector ${\boldsymbol{e}}_d$ , consisting of the column sums of the values in $\boldsymbol{B}$ for the items in ${\boldsymbol{J}}_d$ . Additionally, a set of $D$ items, $\boldsymbol{F}$ , is identified such that ${\boldsymbol{J}}_d\cap \boldsymbol{F}\ne \varnothing$ $\forall d$ .

Theorem 4. If $\det \left(\boldsymbol{E}\right)\ne 0$ , then the completely oblique multidimensional dichotomous model can still be specified if $\boldsymbol{A}$ is substituted with ${\boldsymbol{A}}_R$ , where ${\boldsymbol{A}}_R$ is reduced by $D$ columns compared to $\boldsymbol{A}$ , such that $\mathit{{rank}}\left[{\boldsymbol{A}}_R\right]={P}_R=P-D$ .

Proof. Assume ${\boldsymbol{J}}_d\cap \boldsymbol{F}=\left\{{i}_{n_dd}\right\}$ . Now, set ${\xi}_{i_{n_dd}}=-\sum_{j=1}^{n_d-1}{\xi}_{i_{jd}}$ , where ${i}_{jd}\in {\boldsymbol{J}}_d$ . Under this assumption, the ${i}_{n_dd}$ th row of ${\boldsymbol{A}}_R$ will contain the value “−1” in the columns corresponding to ${i}_{jd}$ ( $j=1,\dots, {n}_d-1$ ), and “0” in the column corresponding to ${i}_{n_dd}$ . Repeat this procedure $D$ times, such that ${\boldsymbol{A}}_R$ contains $D$ all-zero columns. Delete these all-zero columns to obtain ${\boldsymbol{A}}_R$ with $\mathit{{rank}}\left[{\boldsymbol{A}}_R\right]={P}_R=P-D$ .

Now suppose $\exists \boldsymbol{c}$ such that ${\boldsymbol{x}}^T\left(\boldsymbol{B}\boldsymbol{\mu } +{\boldsymbol{A}}_R\boldsymbol{\xi} \right)={\boldsymbol{x}}^T\left(\boldsymbol{B}{\boldsymbol{\mu}}^{\ast }+{\boldsymbol{A}}_R{\boldsymbol{\xi}}^{\ast}\right)$ $\forall \boldsymbol{x}$ . For an extreme case, select $\boldsymbol{x}$ such that it contains values of “1” in positions ${i}_{nd}$ ( $n=1,\dots, {n}_d$ ) and “0” elsewhere. Then, ${\boldsymbol{x}}^T{\boldsymbol{A}}_R\boldsymbol{\xi} ={\boldsymbol{x}}^T{\boldsymbol{A}}_R{\boldsymbol{\xi}}^{\ast}=\boldsymbol{0}$ , and ${\boldsymbol{x}}^T\boldsymbol{B}\left({\boldsymbol{\mu}}^{\ast }-\boldsymbol{\mu} \right)={\boldsymbol{x}}^T\boldsymbol{B}\boldsymbol{c}={\boldsymbol{e}}_d\boldsymbol{c}=\boldsymbol{0}$ $\forall d$ . Now if $\det \left(\boldsymbol{E}\right)\ne 0$ , then the only solution is $\boldsymbol{c}=\boldsymbol{0}$ , which implies ${\boldsymbol{\xi}}^{\ast}\equiv \boldsymbol{\xi}$ and ${\boldsymbol{\mu}}^{\ast}\equiv \boldsymbol{\mu}$ .

This completes the proof of Equation 2. Essentially, Volodin and Adams (2002) demonstrated that the matrix $\boldsymbol{E}$ functions as a scaling and rotation matrix for the vector $\boldsymbol{c}$ (of length $D$ ), which consists of constants that can be added to the vector of means $\boldsymbol{\mu}$ and subtracted from the item parameters in each corresponding dimension without affecting the overall likelihood of the data. If the determinant of $\boldsymbol{E}$ is non-zero, the vector $\boldsymbol{c}$ can only contain zeros, indicating that the model under the given ${\boldsymbol{A}}_R$ is identified. The construction of the matrix ${\boldsymbol{A}}_R$ relies on a well-known result in the partial identification of Rasch models. According to this result, constraining the averages of latent dimensions to zero, or fixing one of the item parameters to zero, does not affect the relative rank order of items and respondents. Instead, it merely shifts the latent scale numerically, leaving the model’s interpretability and validity unaffected.

The full description of the procedure for the general case (including the polytomous case) is beyond the scope of this article; for further details, refer to Volodin and Adams (2002). The Supplementary Materials for this article include the R code for a function that automates this procedure in the dichotomous case.
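As a rough base-R sketch (not the authors’ supplementary function), such a check for the dichotomous case might be organized as follows: given a scoring matrix $\boldsymbol{B}$ and a partition of items into sets ${\boldsymbol{J}}_1,\dots, {\boldsymbol{J}}_D$ , the function builds ${\boldsymbol{A}}_R$ and $\boldsymbol{E}$ as described above and evaluates the condition in Equation 2. The function name and argument layout are ours.

```r
# Rough sketch of a dichotomous identification check in the spirit of the
# Volodin-Adams procedure (illustrative; not the authors' supplementary code).
#   B:    (K - I) x D scoring matrix (one row per item in the dichotomous case)
#   sets: list of D item-index vectors J_1, ..., J_D defining the reduction
check_identification <- function(B, sets) {
  I <- nrow(B); D <- ncol(B)
  # E: the d-th row holds the column sums of B over the items in J_d
  E <- t(sapply(sets, function(J) colSums(B[J, , drop = FALSE])))
  # A_R: within each set, the last item's parameter is set to minus the sum of
  # the others, so its row gets -1s and its (now all-zero) column is dropped
  A_R <- diag(I)
  drop_cols <- integer(0)
  for (J in sets) {
    last <- J[length(J)]
    A_R[last, setdiff(J, last)] <- -1
    A_R[last, last] <- 0
    drop_cols <- c(drop_cols, last)
  }
  A_R <- A_R[, -drop_cols, drop = FALSE]
  P_R <- ncol(A_R)
  rank_AB <- qr(cbind(A_R, B))$rank
  list(det_E = det(E), rank_AB = rank_AB,
       identified = abs(det(E)) > 1e-10 && rank_AB == D + P_R && D + P_R <= I)
}
```

For example, calling `check_identification(B, list(1:2, 3:4, 5:6, 7:9))` with the Figure 1 scoring matrix reproduces the zero determinant derived by hand in Section 1.2.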

1.2 An example of the test dimensionality structure from Figure 1

Using the Volodin and Adams procedure, it can be shown that if a test has a structure similar to the one presented in Figure 1, it is impossible to construct non-nested sets of item parameters with a non-zero determinant of $\boldsymbol{E}$ when all dimensions are oblique. For the structure in Figure 1 under a dichotomous test, $D=4$ , $K=18$ , $I=9$ .

The complete design matrix $\boldsymbol{A}$ is of $9\unicode{x2A09} 9$ size, where each column corresponds to a single item difficulty parameter, and each row corresponds to a single item. Strictly speaking, in the design matrix $\boldsymbol{A}$ , each row should describe a single category for a single item, resulting in a matrix of $18\unicode{x2A09} 9$ size. However, since all rows corresponding to zero categories are redundant (composed entirely of zeros), they can be excluded from the design matrix for simplicity. In this simplified representation, an entry of “0” in the matrix indicates that the corresponding parameter is not applied to the respective item category, and an entry of “1” indicates that the item parameter is applied. The resulting design matrix $\boldsymbol{A}$ is as follows:

(3) $$\begin{align}\boldsymbol{A}=\left[\begin{matrix}1& 0& 0& 0& 0& 0& 0& 0& 0\\ {}0& 1& 0& 0& 0& 0& 0& 0& 0\\ {}0& 0& 1& 0& 0& 0& 0& 0& 0\\ {}0& 0& 0& 1& 0& 0& 0& 0& 0\\ {}0& 0& 0& 0& 1& 0& 0& 0& 0\\ {}0& 0& 0& 0& 0& 1& 0& 0& 0\\ {}0& 0& 0& 0& 0& 0& 1& 0& 0\\ {}0& 0& 0& 0& 0& 0& 0& 1& 0\\ {}0& 0& 0& 0& 0& 0& 0& 0& 1\end{matrix}\right].\end{align}$$

The scoring matrix $\boldsymbol{B}$ for Figure 1 is of $9\unicode{x2A09} 4$ size (analogous to the design matrix $\boldsymbol{A}$ , it would typically be $18\unicode{x2A09} 4$ but, again, zero rows can be excluded for simplicity). Each row in $\boldsymbol{B}$ corresponds to a single item, and each column corresponds to a single latent factor. The general factor is represented by the first column. In this matrix, an entry of “0” indicates that the corresponding category does not load on the respective factor, and an entry of “1” indicates that the item does load on the respective factor. The resulting scoring matrix $\boldsymbol{B}$ is as follows:

(4) $$\begin{align}\boldsymbol{B}=\left[\begin{matrix}1& 1& 0& 0\\ {}1& 1& 0& 0\\ {}1& 1& 0& 0\\ {}1& 0& 1& 0\\ {}1& 0& 1& 0\\ {}1& 0& 1& 0\\ {}1& 0& 0& 1\\ {}1& 0& 0& 1\\ {}1& 0& 0& 1\end{matrix}\right].\end{align}$$

To construct the reduced design matrix ${\boldsymbol{A}}_R$ , define four sets of item parameters: (i) items 1 and 2, (ii) items 3 and 4, (iii) items 5 and 6, and (iv) items 7, 8, and 9. These sets correspond to the grouping indicated by the dashed lines in Equation 5, which illustrate how the item parameters are partitioned into subsets for constructing ${\boldsymbol{A}}_R$ :

(5) $$\begin{align}\boldsymbol{A}=\left[\begin{array}{cc:cc:cc:ccc}1& 0& 0& 0& 0& 0& 0& 0& 0\\{}0& 1& 0& 0& 0& 0& 0& 0& 0\\\hdashline{}0& 0& 1& 0& 0& 0& 0& 0& 0\\{}0& 0& 0& 1& 0& 0& 0& 0& 0\\\hdashline{}0& 0& 0& 0& 1& 0& 0& 0& 0\\{}0& 0& 0& 0& 0& 1& 0& 0& 0\\\hdashline{}0& 0& 0& 0& 0& 0& 1& 0& 0\\{}0& 0& 0& 0& 0& 0& 0& 1& 0\\{}0& 0& 0& 0& 0& 0& 0& 0& 1\end{array}\right].\end{align}$$

In this case, the reduced design matrix ${\boldsymbol{A}}_R$ is defined as follows, with the length of the reduced vector of items parameters ${P}_R=5$ :

(6) $$\begin{align}{\boldsymbol{A}}_R=\left[\begin{array}{c:c:c:cc}1& 0& 0& 0& 0\\ {}-1& 0& 0& 0& 0\\\hdashline{}0& 1& 0& 0& 0\\{}0& -1& 0& 0& 0\\\hdashline{}0& 0& 1& 0& 0\\{}0& 0& -1& 0& 0\\\hdashline{}0& 0& 0& 1& 0\\{}0& 0& 0& 0& 1\\{}0& 0& 0& -1& -1\end{array}\right].\end{align}$$

The corresponding partitioning of the matrix $\boldsymbol{B}$ is

(7) $$\begin{align}\boldsymbol{B}=\left[\begin{array}{cccc}1& 1& 0& 0\\{}1& 1& 0& 0\\\hdashline {}1& 1& 0& 0\\{}1& 0& 1& 0\\\hdashline {}1& 0& 1& 0\\ {}1& 0& 1& 0\\\hdashline {}1& 0& 0& 1\\ {}1& 0& 0& 1\\ {}1& 0& 0& 1\end{array}\right].\end{align}$$

Then, the matrix $\boldsymbol{E}$ , consisting of the set-wise sums of the entries in $\boldsymbol{B}$ is given by

(8) $$\begin{align}\boldsymbol{E}=\left[\begin{matrix}2& 2& 0& 0\\ {}2& 1& 1& 0\\ {}2& 0& 2& 0\\ {}3& 0& 0& 3\end{matrix}\right].\end{align}$$

Consequently, $\det \left(\boldsymbol{E}\right)=0$ , which effectively terminates the Volodin–Adams procedure by showing that ${\boldsymbol{A}}_R$ does not fully preserve the structure of the original model. As a result, $\mathit{{rank}}\left[{\boldsymbol{A}}_R|\boldsymbol{B}\right]$ fails to satisfy Equation 2, confirming that the reduced design matrix does not enable the identification of the model:

(9) $$\begin{align}\mathit{{rank}}\left[{\boldsymbol{A}}_R|\boldsymbol{B}\right]=\mathit{{rank}}\left[\begin{array}{c:c:c:cc|cccc}1& 0& 0& 0& 0& 1& 1& 0& 0\\ {}-1& 0& 0& 0& 0& 1& 1& 0& 0\\\hdashline {}0& 1& 0& 0& 0& 1& 1& 0& 0\\ {}0& -1& 0& 0& 0& 1& 0& 1& 0\\\hdashline {}0& 0& 1& 0& 0& 1& 0& 1& 0\\ {}0& 0& -1& 0& 0& 1& 0& 1& 0\\\hdashline {}0& 0& 0& 1& 0& 1& 0& 0& 1\\ {}0& 0& 0& 0& 1& 1& 0& 0& 1\\ {}0& 0& 0& -1& -1& 1& 0& 0& 1\end{array}\right]=8\ne D+{P}_R.\end{align}$$
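These calculations can be checked numerically; the short base-R snippet below (object names are ours) reproduces the matrix $\boldsymbol{E}$ from Equation 8 and the rank from Equation 9, with the matrices copied from Equations 4 and 6:

```r
# Numeric check of Equations 8 and 9 for the Figure 1 partition (base R only).
B   <- cbind(1, kronecker(diag(3), rep(1, 3)))            # Equation 4
A_R <- rbind(c( 1, 0, 0, 0, 0), c(-1, 0, 0, 0, 0),
             c( 0, 1, 0, 0, 0), c( 0,-1, 0, 0, 0),
             c( 0, 0, 1, 0, 0), c( 0, 0,-1, 0, 0),
             c( 0, 0, 0, 1, 0), c( 0, 0, 0, 0, 1),
             c( 0, 0, 0,-1,-1))                           # Equation 6
E   <- rbind(colSums(B[1:2, ]), colSums(B[3:4, ]),
             colSums(B[5:6, ]), colSums(B[7:9, ]))        # Equation 8
det(E)                    # 0: the procedure terminates for this partition
qr(cbind(A_R, B))$rank    # 8, which falls short of D + P_R = 9
```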

Repeating this procedure for any arbitrary partitioning of items into sets demonstrates that such a test dimensionality structure does not permit the identification of the oblique bifactor model. Consequently, constraining all covariances among person dimensions to zero becomes necessary for model identification, leading to the orthogonal bifactor Rasch model (Wang & Wilson, 2005).

However, it is important to note that this procedure describes an analytical approach to model identification. In practice, the general principles of modeling suggest that some constraints can be introduced into analytically unidentified models to achieve empirical identification (Kenny, 1979; Rindskopf, 1984). The Volodin–Adams procedure does not account for such constraints; it specifically evaluates whether the completely oblique multidimensional Rasch model is analytically identified.

The orthogonal Rasch bifactor model represents an extreme yet common solution for identifying bifactor models, where all factor covariances are simultaneously constrained to zero. In Section 3.3, we discuss that while this solution ensures identification, it may be overly restrictive and unsuitable for certain purposes.

2 The completely oblique Rasch bifactor model

The CORB model is distinguished from the orthogonal bifactor Rasch model by two key features.

2.1 Distinction 1: the variance–covariance matrix

The first distinction is that, unlike the orthogonal bifactor Rasch model, the CORB model enables the simultaneous estimation of all entries in the variance–covariance matrix $\boldsymbol{\Sigma}$ of latent factors. For example, in a test consisting of three specific factors, the variance–covariance matrix of the dimensions in the latent person parameter space for the CORB model takes the form shown in Equation 10. This contrasts with the orthogonal bifactor model, where the corresponding variance–covariance matrix is restricted as shown in Equation 11 (Wang & Wilson, 2005).

(10) $$\begin{align}\boldsymbol{\Sigma} =\left[\begin{matrix}\mathit{\operatorname{var}}\left({\theta}_g\right)& \mathit{\operatorname{cov}}\left({\theta}_g,{\theta}_{s_1}\right)& \mathit{\operatorname{cov}}\left({\theta}_g,{\theta}_{s_2}\right)& \mathit{\operatorname{cov}}\left({\theta}_g,{\theta}_{s_3}\right)\\ {}\mathit{\operatorname{cov}}\left({\theta}_{s_1},{\theta}_g\right)& \mathit{\operatorname{var}}\left({\theta}_{s_1}\right)& \mathit{\operatorname{cov}}\left({\theta}_{s_1},{\theta}_{s_2}\right)& \mathit{\operatorname{cov}}\left({\theta}_{s_1},{\theta}_{s_3}\right)\\ {}\mathit{\operatorname{cov}}\left({\theta}_{s_2},{\theta}_g\right)& \mathit{\operatorname{cov}}\left({\theta}_{s_2},{\theta}_{s_1}\right)& \mathit{\operatorname{var}}\left({\theta}_{s_2}\right)& \mathit{\operatorname{cov}}\left({\theta}_{s_2},{\theta}_{s_3}\right)\\ {}\mathit{\operatorname{cov}}\left({\theta}_{s_3},{\theta}_g\right)& \mathit{\operatorname{cov}}\left({\theta}_{s_3},{\theta}_{s_1}\right)& \mathit{\operatorname{cov}}\left({\theta}_{s_3},{\theta}_{s_2}\right)& \mathit{\operatorname{var}}\left({\theta}_{s_3}\right)\end{matrix}\right].\end{align}$$
(11) $$\begin{align}\boldsymbol{\Sigma} =\left[\begin{matrix}\mathit{\operatorname{var}}\left({\theta}_g\right)& 0& 0& 0\\ {}0& \mathit{\operatorname{var}}\left({\theta}_{s_1}\right)& 0& 0\\ {}0& 0& \mathit{\operatorname{var}}\left({\theta}_{s_2}\right)& 0\\ {}0& 0& 0& \mathit{\operatorname{var}}\left({\theta}_{s_3}\right)\end{matrix}\right].\end{align}$$

From the comparison of the variance–covariance matrices, it is evident that the orthogonal bifactor Rasch model is a special case of the CORB model. Specifically, constraining all off-diagonal elements in Equation 10 to zero results in the matrix form given in Equation 11.

2.2 Distinction 2: the structure of test dimensionality

The orthogonal bifactor Rasch model (Wang & Wilson, 2005), when specified for a test dimensionality structure similar to Figure 1, can be expressed in scalar notation as follows:

(12) $$\begin{align}P\left({X}_i=1|\boldsymbol{\theta} \right)\propto \exp \left({\theta}_g+{\theta}_{s_d}-{\xi}_i\right),\end{align}$$

where $P\left({X}_i=1|\boldsymbol{\theta} \right)$ is the probability of a response of 1 to item $i$ , given the vector-valued latent variable $\boldsymbol{\theta}$ of dimensionality $D$ ,

${\theta}_g$ is the value of the general factor,

${\theta}_{s_d}$ is the value of the specific factor $d$ ( $d=1,\dots, {D}_s$ , where ${D}_s$ is the number of specific factors, so that $D={D}_s+1$ due to the general factor), and

${\xi}_i$ is the difficulty of item $i$ .

We refer to test dimensionality structures similar to Figure 1 as “clear bifactor structures”: no items load solely on the general factor without also loading on specific factors, and no specific factors share any items. Jennrich and Bentler (2012) describe such bifactor structures as “perfect cluster structures,” referring to item clustering logic.

The CORB model is not identified under any such clear bifactor structure. However, the CORB model becomes identifiable when the test dimensionality structure resembles the one shown in Figure 2—that is, when there is at least one item that loads on the general factor but not on any specific factor.

Figure 2 A bifactor structure for identifying the CORB model. Item 1 loads solely on the general factor, while no items are shared between any pair of specific factors. Factor covariances are non-zero but are not depicted in the figure.

To define the test dimensionality structure for a case like Figure 2, researchers must identify two sets of items: $\boldsymbol{G}$ —the set of items (consisting of at least one item) that loads only on the general factor and not on any specific factor, and $\boldsymbol{T}$ —the set of items that load on both the general factor and one specific factor.

The complete scalar formulation of this model is as follows:

(13) $$\begin{align}P\left({X}_i=1|\boldsymbol{\theta} \right)\propto \left\{\begin{array}{c}\exp \left({\theta}_g+{\theta}_{s_d}-{\xi}_i\right),\ if\;i\in \boldsymbol{T},\\ {}\exp \left({\theta}_g-{\xi}_i\right),\ if\;i\in \boldsymbol{G}.\end{array}\right.\end{align}$$

For the structure shown in Figure 2, $\boldsymbol{T}=\left\{2,3,4,5,6,7,8,9,10\right\}$ , representing items that load on both the general and specific factors, and $\boldsymbol{G}=\left\{1\right\}$ , representing the item that loads solely on the general factor. We call CORB models with such structures “G-structures.”

The implementation of the Volodin–Adams procedure, analogous to the outline provided for Equations 3–9, is illustrated below for the G-structure depicted in Figure 2. This implementation demonstrates that the procedure enables the construction of ${\boldsymbol{A}}_R$ , where the corresponding $\boldsymbol{E}$ matrix has a non-zero determinant, thereby satisfying Equation 2.

$$\begin{align*}\boldsymbol{A}=\left[\begin{array}{cc:cc:ccc:ccc}1& 0& 0& 0& 0& 0& 0& 0& 0& 0\\ {}0& 1& 0& 0& 0& 0& 0& 0& 0& 0\\\hdashline {}0& 0& 1& 0& 0& 0& 0& 0& 0& 0\\ {}0& 0& 0& 1& 0& 0& 0& 0& 0& 0\\\hdashline {}0& 0& 0& 0& 1& 0& 0& 0& 0& 0\\ {}0& 0& 0& 0& 0& 1& 0& 0& 0& 0\\ {}0& 0& 0& 0& 0& 0& 1& 0& 0& 0\\\hdashline {}0& 0& 0& 0& 0& 0& 0& 1& 0& 0\\ {}0& 0& 0& 0& 0& 0& 0& 0& 1& 0\\ {}0& 0& 0& 0& 0& 0& 0& 0& 0& 1\end{array}\right],\boldsymbol{B}=\left[\begin{array}{cccc}1& 0& 0& 0\\ {}1& 1& 0& 0\\\hdashline {}1& 1& 0& 0\\ {}1& 1& 0& 0\\\hdashline {}1& 0& 1& 0\\ {}1& 0& 1& 0\\ {}1& 0& 1& 0\\\hdashline {}1& 0& 0& 1\\ {}1& 0& 0& 1\\ {}1& 0& 0& 1\end{array}\right].\end{align*}$$
$$\begin{align*}{\boldsymbol{A}}_R=\left[\begin{array}{c:c:cc:cc}1& 0& 0& 0& 0& 0\\ {}-1& 0& 0& 0& 0& 0\\\hdashline {}0& 1& 0& 0& 0& 0\\ {}0& -1& 0& 0& 0& 0\\\hdashline {}0& 0& 1& 0& 0& 0\\ {}0& 0& 0& 1& 0& 0\\ {}0& 0& -1& -1& 0& 0\\\hdashline {}0& 0& 0& 0& 1& 0\\ {}0& 0& 0& 0& 0& 1\\ {}0& 0& 0& 0& -1& -1\end{array}\right],{P}_R=6.\end{align*}$$
$$\begin{align*}\det \left(\boldsymbol{E}\right)=\det \left[\begin{matrix}2& 1& 0& 0\\ {}2& 2& 0& 0\\ {}3& 0& 3& 0\\ {}3& 0& 0& 3\end{matrix}\right]=18\ne 0.\end{align*}$$
$$\begin{align*}\mathit{{rank}}\left[{\boldsymbol{A}}_R|\boldsymbol{B}\right]=\mathit{{rank}}\left[\begin{array}{c:c:cc:cc|cccc}1& 0& 0& 0& 0& 0& 1& 0& 0& 0\\ {}-1& 0& 0& 0& 0& 0& 1& 1& 0& 0\\\hdashline {}0& 1& 0& 0& 0& 0& 1& 1& 0& 0\\ {}0& -1& 0& 0& 0& 0& 1& 1& 0& 0\\\hdashline {}0& 0& 1& 0& 0& 0& 1& 0& 1& 0\\ {}0& 0& 0& 1& 0& 0& 1& 0& 1& 0\\ {}0& 0& -1& -1& 0& 0& 1& 0& 1& 0\\\hdashline {}0& 0& 0& 0& 1& 0& 1& 0& 0& 1\\ {}0& 0& 0& 0& 0& 1& 1& 0& 0& 1\\ {}0& 0& 0& 0& -1& -1& 1& 0& 0& 1\end{array}\right]=10=D+{P}_R=K-I.\end{align*}$$
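As before, these G-structure calculations can be reproduced in a few lines of base R (the matrices are copied from the displays above; object names are ours):

```r
# Numeric check for the G-structure of Figure 2 (base R only).
B   <- cbind(1, rbind(0, kronecker(diag(3), rep(1, 3))))   # item 1 loads on g only
A_R <- rbind(c( 1,0,0,0,0,0), c(-1,0,0,0,0,0),
             c( 0,1,0,0,0,0), c( 0,-1,0,0,0,0),
             c( 0,0,1,0,0,0), c( 0,0,0,1,0,0), c( 0,0,-1,-1,0,0),
             c( 0,0,0,0,1,0), c( 0,0,0,0,0,1), c( 0,0,0,0,-1,-1))
sets <- list(1:2, 3:4, 5:7, 8:10)
E    <- t(sapply(sets, function(J) colSums(B[J, , drop = FALSE])))
det(E)                    # 18
qr(cbind(A_R, B))$rank    # 10 = D + P_R = K - I, so Equation 2 is satisfied
```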

The intuitive explanation for this logic can be drawn from the work of Zhang, Luo, Zhang, et al. (2023). They demonstrated that the identification of partially oblique bifactor factor-analytical models hinges on the factor loadings matrix. In Rasch modeling, however, all discrimination parameters are constrained to unity. Now, consider a “clear,” completely oblique bifactor structure. As shown earlier (Equations 3–9), such a Rasch bifactor model is not identifiable. However, adding a “construct item” (from the $\boldsymbol{G}$ set) to this bifactor structure increases the number of observed variables without increasing the number of estimated factor loadings. This adjustment renders the model identifiable.

That said, the G-structure requires at least one item that loads solely on the general factor, effectively defining it. Eid et al. (2017) refer to such items as “reference indicators,” as all other indicators’ parameters are estimated relative to this reference. Including general construct items, however, may be impractical when the test is purely composite and comprises distinct components. Eid et al. (2017) emphasize the necessity of such items, and Zhang, Luo, Zhang, et al. (2023) provide detailed guidance on selecting them (including real-world examples, which interested readers may consult in their work). Overall, however, this requirement poses a challenge for test developers and item writers.

Fortunately, an alternative test dimensionality structure can identify the CORB model, as shown in Figure 3. This structure requires that every pair of specific factors share at least one item.

Figure 3 A bifactor structure for identifying the CORB model. No single item loads solely on the general factor, but at least one item is shared between each pair of specific factors (i.e., item 1 for specific factors 1 and 3, item 4 for specific factors 1 and 2, and item 7 for specific factors 2 and 3). Factor covariances are non-zero but are not depicted in the figure.

To specify the structure of test dimensionality for a case similar to Figure 3, one must define two sets of items: $\boldsymbol{S}$ —the set of items that load on two specific factors, and $\boldsymbol{T}$ —the set of items that load on one specific factor.

The complete scalar formulation of this model is given as

(14) $$\begin{align}P\left({X}_i=1|\boldsymbol{\theta} \right)\propto \left\{\begin{array}{c}\exp \left({\theta}_g+{\theta}_{s_d}-{\xi}_i\right),\ if\;i\in \boldsymbol{T},\\ {}\exp \left({\theta}_g+{\theta}_{s_{d_u}}+{\theta}_{s_{d_y}}-{\xi}_i\right),\ if\;i\in \boldsymbol{S},u\ne y.\end{array}\right.\end{align}$$

For the structure depicted in Figure 3: $\boldsymbol{T}=\left\{2,3,5,6,8,9\right\}$ , representing items that load on the general factor and one specific factor, and $\boldsymbol{S}=\left\{1,4,7\right\}$ , representing items that load on the general factor and two specific factors. We call CORB models with such structures “S-structures.”

The calculations below illustrate the implementation of the Volodin and Adams procedure for the test dimensionality structure shown in Figure 3:

$$\begin{align*}\boldsymbol{A}=\left[\begin{array}{cc:cc:cc:ccc}1& 0& 0& 0& 0& 0& 0& 0& 0\\ {}0& 1& 0& 0& 0& 0& 0& 0& 0\\\hdashline {}0& 0& 1& 0& 0& 0& 0& 0& 0\\ {}0& 0& 0& 1& 0& 0& 0& 0& 0\\\hdashline {}0& 0& 0& 0& 1& 0& 0& 0& 0\\ {}0& 0& 0& 0& 0& 1& 0& 0& 0\\\hdashline {}0& 0& 0& 0& 0& 0& 1& 0& 0\\ {}0& 0& 0& 0& 0& 0& 0& 1& 0\\ {}0& 0& 0& 0& 0& 0& 0& 0& 1\end{array}\right],\boldsymbol{B}=\left[\begin{array}{cccc}1& 1& 0& 1\\ {}1& 1& 0& 0\\\hdashline {}1& 1& 0& 0\\ {}1& 1& 1& 0\\\hdashline {}1& 0& 1& 0\\ {}1& 0& 1& 0\\\hdashline {}1& 0& 1& 1\\ {}1& 0& 0& 1\\ {}1& 0& 0& 1\end{array}\right].\end{align*}$$
$$\begin{align*}{\boldsymbol{A}}_R=\left[\begin{array}{c:c:c:cc}1& 0& 0& 0& 0\\ {}-1& 0& 0& 0& 0\\\hdashline {}0& 1& 0& 0& 0\\ {}0& -1& 0& 0& 0\\\hdashline {}0& 0& 1& 0& 0\\ {}0& 0& -1& 0& 0\\\hdashline {}0& 0& 0& 1& 0\\ {}0& 0& 0& 0& 1\\ {}0& 0& 0& -1& -1\end{array}\right],{P}_R=5.\end{align*}$$
$$\begin{align*}\det \left(\boldsymbol{E}\right)=\det \left[\begin{matrix}2& 2& 0& 1\\ {}2& 2& 1& 0\\ {}2& 0& 2& 0\\ {}3& 0& 1& 3\end{matrix}\right]=4\ne 0.\end{align*}$$
$$\begin{align*}\mathit{{rank}}\left[{\boldsymbol{A}}_R|\boldsymbol{B}\right]=\mathit{{rank}}\left[\begin{array}{c:c:c:cc|cccc}1& 0& 0& 0& 0& 1& 1& 0& 1\\ {}-1& 0& 0& 0& 0& 1& 1& 0& 0\\\hdashline {}0& 1& 0& 0& 0& 1& 1& 0& 0\\ {}0& -1& 0& 0& 0& 1& 1& 1& 0\\\hdashline {}0& 0& 1& 0& 0& 1& 0& 1& 0\\ {}0& 0& -1& 0& 0& 1& 0& 1& 0\\\hdashline {}0& 0& 0& 1& 0& 1& 0& 1& 1\\ {}0& 0& 0& 0& 1& 1& 0& 0& 1\\ {}0& 0& 0& -1& -1& 1& 0& 0& 1\end{array}\right]=9=D+{P}_R=K-I.\end{align*}$$
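The same kind of numeric check confirms the S-structure result (matrices copied from the displays above; object names are ours):

```r
# Numeric check for the S-structure of Figure 3 (base R only).
B   <- rbind(c(1,1,0,1), c(1,1,0,0), c(1,1,0,0),
             c(1,1,1,0), c(1,0,1,0), c(1,0,1,0),
             c(1,0,1,1), c(1,0,0,1), c(1,0,0,1))
A_R <- rbind(c( 1, 0, 0, 0, 0), c(-1, 0, 0, 0, 0),
             c( 0, 1, 0, 0, 0), c( 0,-1, 0, 0, 0),
             c( 0, 0, 1, 0, 0), c( 0, 0,-1, 0, 0),
             c( 0, 0, 0, 1, 0), c( 0, 0, 0, 0, 1),
             c( 0, 0, 0,-1,-1))
sets <- list(1:2, 3:4, 5:6, 7:9)
E    <- t(sapply(sets, function(J) colSums(B[J, , drop = FALSE])))
det(E)                    # 4
qr(cbind(A_R, B))$rank    # 9 = D + P_R = K - I, so Equation 2 is satisfied
```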

To intuitively understand why the S-structure allows for CORB model identification, we turn to the geometric interpretation of the multidimensional Item Characteristic Surface (ICS; Ackerman, 1994; Reckase & McKinley, 1991). In this framework, the shape of the multidimensional ICS is determined by two key factors: (1) the angle between the latent dimensions measured by the items with within-item multidimensionality (represented as the arccosine of the Pearson correlation between the factors), and (2) the discrimination parameters of these items on the respective factors.

In Rasch models, however, the discrimination parameters are constrained to unity. As a result, the structure of item response variance in the shared items directly defines the correlations between the latent factors, since the discriminations cannot vary freely. This fixed discrimination ensures that shared items play a critical role in establishing the relationships among the latent dimensions, making the S-structure effective for CORB model identification.

When a test consists of multiple content areas, the S-structure of test dimensionality may offer a more practical approach to CORB model identification. To specify this structure, a test developer can enhance the existing test by adding new items that combine, in a compensatory manner, pairs of specific factors. Alternatively, Bifactor Exploratory Structural Equation Modeling (Morin et al., 2016) can aid in identifying items suitable for inclusion in the $\boldsymbol{S}$ set. Such items should exhibit significant and relatively similar factor loadings to the “main” items associated with the specific factors. This approach is often more feasible than creating “construct items” to define the general factor, which can be challenging in most testing contexts.

It is crucial to note that not all items are suitable to serve as “construct items” in G-structures or “shared items” in S-structures. A defining feature of Rasch modeling is that all items with the same dimensionality structure share the same discrimination parameters. This concept, when related to the logic of factor analysis, implies that all items with the same dimensionality structure allocate the same proportions of response variance to the different latent factors.

For example, if a “shared item” in an S-structure has a distribution of response variance across latent factors that does not align with the variance–covariance structure of other items loading on these factors, it is likely to be flagged as a misfitting item in item fit analyses. Similarly, “construct items” in G-structures are subject to the same requirement. As a result, modifying the dimensionality structure of existing test items or developing new tests that identify the CORB model remains a challenging task.

In both cases, deviations from the clear bifactor structure (Figure 1) are necessary to identify the CORB model. However, it is important to note that the G-structure and S-structure do not exhaust the possible dimensionality structures capable of identifying the CORB model. To determine whether the CORB model is identifiable for a particular test structure, it is necessary to apply the Volodin–Adams procedure.

Additionally, both the G-structure and S-structure also identify the orthogonal Rasch bifactor model. This is because the orthogonal Rasch bifactor model is a special case of the more general CORB model, which is identifiable under these structures. In such cases, instead of being defined solely by Equation 12, the orthogonal Rasch bifactor model would also be described by Equations 13 or 14, depending on the structure.

3 Other oblique bifactor models

3.1 The Extended Rasch Testlet model

The closest relative of the CORB model in the literature is the Extended Rasch Testlet model (ETM; Paek et al., 2009). The ETM allows for the estimation of non-zero correlations between the specific factors and the general factor while maintaining orthogonality among the specific factors. The variance–covariance matrix $\boldsymbol{\Sigma}$ of person parameters for the same number of dimensions as in Equations 10 and 11 is represented as follows:

(15) $$\begin{align}\boldsymbol{\Sigma} =\left[\begin{matrix}\mathit{\operatorname{var}}\left({\theta}_g\right)& \mathit{\operatorname{cov}}\left({\theta}_g,{\theta}_{s_1}\right)& \mathit{\operatorname{cov}}\left({\theta}_g,{\theta}_{s_2}\right)& \mathit{\operatorname{cov}}\left({\theta}_g,{\theta}_{s_3}\right)\\ {}\mathit{\operatorname{cov}}\left({\theta}_{s_1},{\theta}_g\right)& \mathit{\operatorname{var}}\left({\theta}_{s_1}\right)& 0& 0\\ {}\mathit{\operatorname{cov}}\left({\theta}_{s_2},{\theta}_g\right)& 0& \mathit{\operatorname{var}}\left({\theta}_{s_2}\right)& 0\\ {}\mathit{\operatorname{cov}}\left({\theta}_{s_3},{\theta}_g\right)& 0& 0& \mathit{\operatorname{var}}\left({\theta}_{s_3}\right)\end{matrix}\right].\end{align}$$

The scalar specification of the ETM follows the same form as Equation 13 or 14. The only difference between the CORB model and the ETM lies in their variance–covariance structures. Specifically, the ETM is a special case of the CORB model: constraining all off-diagonal elements in Equation 10, except for those corresponding to the covariances between the general factor and specific factors (i.e., the first row and the first column), results in Equation 15. This implies that the same test dimensionality structures that identify the CORB model also identify the ETM. Furthermore, in the original paper (Paek et al., 2009), the G-structure of the ETM was used for model identification, corresponding to Equation 13, as several “construct items” were included.

At the same time, the original orthogonal Rasch bifactor model (originally called the Rasch Testlet Model or RTM; Wang & Wilson, 2005) is a special case of the ETM. Constraining all off-diagonal elements in Equation 15 to zero results in Equation 11, which represents the variance–covariance matrix of the RTM. Consequently, these models form a hierarchy of nested models, enabling their comparison using a likelihood ratio test.

3.2 The Subdimensional family of models

The GSM (Brandt & Duckor, 2013) and the Subdimensional Rasch Model (SRM; Brandt, 2008) allow for the estimation of correlations between specific factors while maintaining orthogonality between the specific factors and the general factor. To achieve this, these models require the exclusion of one specific factor from estimation:

(16) $$\begin{align}\boldsymbol{\Sigma} =\left[\begin{matrix}\mathit{\operatorname{var}}\left({\theta}_g\right)& 0& 0& NA\\ {}0& \mathit{\operatorname{var}}\left({\theta}_{s_1}\right)& \mathit{\operatorname{cov}}\left({\theta}_{s_2},{\theta}_{s_1}\right)& NA\\ {}0& \mathit{\operatorname{cov}}\left({\theta}_{s_1},{\theta}_{s_2}\right)& \mathit{\operatorname{var}}\left({\theta}_{s_2}\right)& NA\\ {} NA& NA& NA& NA\end{matrix}\right].\end{align}$$

In this setup, the last specific factor in Equation 16 is defined as the negative sum of all remaining specific factors. This constraint necessitates a modification of the scoring matrix $\boldsymbol{B}$ , as shown in Equation 17. Below is an example of the scoring matrix $\boldsymbol{B}$ for a clear bifactor structure (Figure 1):

(17) $$\begin{align}\boldsymbol{B}=\left[\begin{array}{ccc}1& 1& 0\\ {}1& 1& 0\\\hdashline {}1& 1& 0\\ {}1& 0& 1\\\hdashline {}1& 0& 1\\ {}1& 0& 1\\ {}1& -1& -1\\ {}1& -1& -1\\ {}1& -1& -1\end{array}\right].\end{align}$$

Comparing Equation 17 with Equation 4 shows that this modification of the scoring matrix $\boldsymbol{B}$ makes the GSM not a special case of the CORB model, as it does not simply constrain some parameters to zero.

Due to the exclusion of a specific factor, it is necessary to recalibrate the GSM with alternative reparameterizations at least three times to obtain the complete variance–covariance matrix of the specific factors (i.e., GSM and SRM require ${D}_s\ge 3$ ). This process involves:

(1) Excluding the last specific factor ( ${s}_{D_s}$ ) to recover all covariances between the specific factors except those involving the last specific factor ( ${s}_{D_s}$ ), as described by Equations 16 and 17.

(2) Excluding the second to last specific factor ( ${s}_{D_s-1}$ ) to recover all covariances involving the last specific factor ( ${s}_{D_s}$ ), except for the covariance between specific factors ${s}_{D_s}$ and ${s}_{D_s-1}$ . This step results in

(18) $$\begin{align} \boldsymbol{\Sigma} =\left[\begin{matrix}\mathit{\operatorname{var}}\left({\theta}_g\right)& 0& NA& 0\\ {}0& \mathit{\operatorname{var}}\left({\theta}_{s_1}\right)& NA& \mathit{\operatorname{cov}}\left({\theta}_{s_3},{\theta}_{s_1}\right)\\ {} NA& NA& NA& NA\\ {}0& \mathit{\operatorname{cov}}\left({\theta}_{s_1},{\theta}_{s_3}\right)& NA& \mathit{\operatorname{var}}\left({\theta}_{s_3}\right)\end{matrix}\right],\end{align}$$
(19) $$\begin{align} \boldsymbol{B}=\left[\begin{array}{ccc}1& 1& 0\\ {}1& 1& 0\\\hdashline {}1& 1& 0\\ {}1& -1& -1\\\hdashline {}1& -1& -1\\ {}1& -1& -1\\ {}1& 0& 1\\ {}1& 0& 1\\ {}1& 0& 1\end{array}\right],\end{align}$$

(3) Excluding the third to last specific factor ( ${s}_{D_s-2}$ ) to recover the covariance of the specific factors ${s}_{D_s}$ and ${s}_{D_s-1}$ . This step results in

(20) $$\begin{align} \boldsymbol{\Sigma} =\left[\begin{matrix}\mathit{\operatorname{var}}\left({\theta}_g\right)& NA& 0& 0\\ {} NA& NA& NA& NA\\ {}0& NA& \mathit{\operatorname{var}}\left({\theta}_{s_2}\right)& \mathit{\operatorname{cov}}\left({\theta}_{s_3},{\theta}_{s_2}\right)\\ {}0& NA& \mathit{\operatorname{cov}}\left({\theta}_{s_2},{\theta}_{s_3}\right)& \mathit{\operatorname{var}}\left({\theta}_{s_3}\right)\end{matrix}\right],\end{align}$$
(21) $$\begin{align} \boldsymbol{B}=\left[\begin{array}{ccc}1& -1& -1\\ {}1& -1& -1\\\hdashline {}1& -1& -1\\ {}1& 1& 0\\\hdashline {}1& 1& 0\\ {}1& 1& 0\\ {}1& 0& 1\\ {}1& 0& 1\\ {}1& 0& 1\end{array}\right].\end{align}$$

These different reparameterizations describe the same latent space of person parameters, differing only in which parameters are directly estimated. This is possible because all reparameterizations satisfy the constraint $\sum_{d=1}^{D_s}{\theta}_{s_d}=0$ for every respondent. The choice of which factor to exclude is arbitrary and does not affect model fit. This can also be verified using the Volodin–Adams procedure across different reparameterizations.

Equation 22 applies to all reparameterizations since they differ only in the scoring matrix $\boldsymbol{B}$ :

(22) $$\begin{align} {\boldsymbol{A}}_R=\left[\begin{array}{c:c:cccc}1& 0& 0& 0& 0& 0\\ {}-1& 0& 0& 0& 0& 0\\\hdashline {}0& 1& 0& 0& 0& 0\\ {}0& -1& 0& 0& 0& 0\\\hdashline {}0& 0& 1& 0& 0& 0\\ {}0& 0& 0& 1& 0& 0\\ {}0& 0& 0& 0& 1& 0\\ {}0& 0& 0& 0& 0& 1\\ {}0& 0& -1& -1& -1& -1\end{array}\right],{P}_R=6.\end{align}$$

Equations 23–25 demonstrate that the determinants of the $\boldsymbol{E}$ matrices are non-zero when the partitioning specified in Equation 22 is applied to Equations 17, 19, and 21:

(23) $$\begin{align} \det \left(\boldsymbol{E}\right)=\det \left[\begin{matrix}2& 2& 0\\ {}2& 1& 1\\ {}5& -3& -1\end{matrix}\right]=18\ne 0,\end{align}$$
(24) $$\begin{align} \det \left(\boldsymbol{E}\right)=\det \left[\begin{matrix}2& 2& 0\\ {}2& 0& -1\\ {}5& -2& 1\end{matrix}\right]=-18\ne 0,\end{align}$$
(25) $$\begin{align} \det \left(\boldsymbol{E}\right)=\det \left[\begin{matrix}2& -2& -2\\ {}2& 0& -1\\ {}5& 2& 3\end{matrix}\right]=18\ne 0.\end{align}$$

The absolute values of all determinants are identical, indicating that not only does ${\boldsymbol{A}}_R$ fully specify the model represented by $\boldsymbol{A}$ , but also that the same partitioning of the $\boldsymbol{A}$ matrix across different parameterizations of the same GSM model results in the same latent person parameter space. This property arises from the nature of the $\boldsymbol{E}$ matrix, which moderates the scaling and rotation of the constants vector $\boldsymbol{c}$ . Equation 26 confirms that all reparameterizations of the GSM model are identifiable.

(26) $$\begin{align}\mathit{{rank}}\left[{\boldsymbol{A}}_R|\boldsymbol{B}\right]&= \mathit{{rank}}\left[\begin{array}{c:c:cccc|ccc}1& 0& 0& 0& 0& 0& 1& 1& 0\\ {}-1& 0& 0& 0& 0& 0& 1& 1& 0\\\hdashline {}0& 1& 0& 0& 0& 0& 1& 1& 0\\ {}0& -1& 0& 0& 0& 0& 1& 0& 1\\\hdashline {}0& 0& 1& 0& 0& 0& 1& 0& 1\\ {}0& 0& 0& 1& 0& 0& 1& 0& 1\\ {}0& 0& 0& 0& 1& 0& 1& -1& -1\\ {}0& 0& 0& 0& 0& 1& 1& -1& -1\\ {}0& 0& -1& -1& -1& -1& 1& -1& -1\end{array}\right]\nonumber\\[6pt]&= \mathit{{rank}}\left[\begin{array}{c:c:cccc|ccc}1& 0& 0& 0& 0& 0& 1& 1& 0\\ {}-1& 0& 0& 0& 0& 0& 1& 1& 0\\\hdashline {}0& 1& 0& 0& 0& 0& 1& 1& 0\\ {}0& -1& 0& 0& 0& 0& 1& -1& -1\\\hdashline {}0& 0& 1& 0& 0& 0& 1& -1& -1\\ {}0& 0& 0& 1& 0& 0& 1& -1& -1\\ {}0& 0& 0& 0& 1& 0& 1& 0& 1\\ {}0& 0& 0& 0& 0& 1& 1& 0& 1\\ {}0& 0& -1& -1& -1& -1& 1& 0& 1\end{array}\right]\nonumber\\[6pt]&= \mathit{{rank}}\left[\begin{array}{c:c:cccc|ccc}1& 0& 0& 0& 0& 0& 1& -1& -1\\ {}-1& 0& 0& 0& 0& 0& 1& -1& -1\\\hdashline {}0& 1& 0& 0& 0& 0& 1& -1& -1\\ {}0& -1& 0& 0& 0& 0& 1& 1& 0\\\hdashline {}0& 0& 1& 0& 0& 0& 1& 1& 0\\ {}0& 0& 0& 1& 0& 0& 1& 1& 0\\ {}0& 0& 0& 0& 1& 0& 1& 0& 1\\ {}0& 0& 0& 0& 0& 1& 1& 0& 1\\ {}0& 0& -1& -1& -1& -1& 1& 0& 1\end{array}\right]=9=D+{P}_R=K-I.\end{align}$$
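These determinant and rank conditions can be checked directly. The following minimal sketch in base R reproduces the checks for the first reparameterization (Equations 17, 22, 23, and 26); the other two reparameterizations follow the same pattern.

```r
# E matrix from Equation 23: a non-zero determinant is required.
E <- matrix(c(2,  2,  0,
              2,  1,  1,
              5, -3, -1), nrow = 3, byrow = TRUE)
det(E)  # 18

# Design matrix A_R from Equation 22 and scoring matrix B from Equation 17.
A_R <- rbind(c( 1,  0,  0,  0,  0,  0),
             c(-1,  0,  0,  0,  0,  0),
             c( 0,  1,  0,  0,  0,  0),
             c( 0, -1,  0,  0,  0,  0),
             c( 0,  0,  1,  0,  0,  0),
             c( 0,  0,  0,  1,  0,  0),
             c( 0,  0,  0,  0,  1,  0),
             c( 0,  0,  0,  0,  0,  1),
             c( 0,  0, -1, -1, -1, -1))
B <- cbind(rep(1, 9),
           rep(c(1, 0, -1), each = 3),
           rep(c(0, 1, -1), each = 3))

# Rank of the row-wise concatenation [A_R | B] from Equation 26.
qr(cbind(A_R, B))$rank  # 9 = D + P_R = K - I
```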

Moreover, empirical comparisons of parameters estimated multiple times across different reparameterizations demonstrate that they converge to the same values (Federiakin, Reference Federiakin2020).

From the model definitions described above, it follows that calibrating the GSM to study the variance–covariance matrix becomes meaningless when the number of specific factors is two. In such cases, one of the two specific factors will always be excluded from calibration under any parameterization, and their correlation will necessarily be constrained to −1, since the sum of the specific factors is fixed to zero for every respondent.

Unlike the ETM and the CORB model, the GSM can be identified in cases of clear bifactor structures. The GSM follows the scalar form:

(27) $$\begin{align}P\left({X}_i=1|\boldsymbol{\theta} \right)\propto \exp \left({k}_d\left({\theta}_g+{\theta}_{s_d}\right)-{\xi}_i\right),\end{align}$$

or equivalently:

(28) $$\begin{align}P\left({X}_i=1|\boldsymbol{\theta} \right)\propto \exp \left({k}_d\left({\theta}_g+{\theta}_{s_d}-{\xi}_i\right)\right).\end{align}$$

The parameter ${k}_d$ distinguishes the GSM from the SRM (which follows Equation 12) and highlights that the GSM is not a special case of the CORB model. The parameter ${k}_d$ is essential for addressing an implicit assumption in the SRM, which assumes equality of variances across all specific factors. Consequently, the GSM requires an additional constraint of $\sum_{d=1}^{D_s}{k_d}^2={D}_s$ . The notation in Equation 27 was initially proposed by Brandt and Duckor (Reference Brandt and Duckor2013), while the notation in Equation 28 was introduced later by Robitzsch et al. (Reference Robitzsch, Kiefer and Wu2025, p. 145) for simplicity in estimation. It is important to note, however, that these two notations are equivalent and both align with the Rasch modeling paradigm.
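The equivalence of the two notations can be verified numerically: the item parameter of Equation 28 is simply the parameter of Equation 27 rescaled by $1/{k}_d$. A minimal check in R with arbitrary illustrative values is given below; for a dichotomous item, the proportional forms reduce to logistic probabilities.

```r
# Arbitrary illustrative values for one item belonging to dimension d.
k_d     <- 1.3
theta_g <- 0.4
theta_s <- -0.2
xi_27   <- 0.7           # item parameter in the notation of Equation 27
xi_28   <- xi_27 / k_d   # the same item expressed in the notation of Equation 28

p_27 <- plogis(k_d * (theta_g + theta_s) - xi_27)
p_28 <- plogis(k_d * (theta_g + theta_s - xi_28))
all.equal(p_27, p_28)    # TRUE: the two notations describe the same model
```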

Additionally, the GSM differs from the ETM and the CORB model in its interpretation of the latent parameter space. In the GSM, the specific factors are orthogonal to the general factor, and their sum is constrained to zero. As a result, the GSM models the relationships among the components within the general factor (see Brandt, Reference Brandt2017, for the algebraic formalization). The construct components themselves, under this unidimensional interpretation, are represented as the sums of the corresponding specific factors and the general factor. In contrast, the ETM and, by extension, the CORB model describe components that are additional to the general factor. In this sense, the GSM is conceptually closer to a unidimensional model, while the ETM and CORB models are “more multidimensional” in their interpretation.

Consequently, the GSM does not belong to the RTM-ETM-CORB model hierarchy. Comparisons between the GSM and these models can only be conducted using information criteria such as AIC (Akaike, Reference Akaike1974) and BIC (Schwarz, Reference Schwarz1978). Both criteria penalize model fit for the number of estimated parameters; the BIC penalty additionally increases with sample size.

3.3 Other possibilities for oblique bifactor modeling

It is important to note that the models discussed so far do not represent the full range of oblique bifactor models. Exploratory bifactor analysis offers additional oblique bifactor solutions. For example, Jennrich and Bentler (Reference Jennrich and Bentler2012) proposed two criteria for bifactor rotation of the factor loading matrix. However, their approach is approximate in nature and comes with additional requirements.

First, their method constrains the specific factors to be orthogonal to the general factor, making its interpretation the reverse of the ETM. Second, their approach is not identified in clear bifactor cases. When the data structure is truly bifactor, Jennrich and Bentler’s criteria fail to provide a unique factor solution. Finally, this approach belongs to the exploratory data analysis paradigm, which poses challenges for its application in hypothesis testing, modeling growth and change, or conducting measurement invariance analysis. As a result, the practical application of these models in testing scenarios remains limited.

Lorenzo-Seva and Ferrando (Reference Lorenzo-Seva and Ferrando2019) proposed a somewhat similar logic for partially oblique exploratory bifactor modeling. Their approach involves a sequence of rotation steps designed to build upon one another, stabilizing the results of their procedure.

Partially oblique confirmatory bifactor models have recently gained attention in the field of factor analysis. Fang et al. (Reference Fang, Guo, Xu, Ying and Zhang2021) demonstrated that, within the covariance structure model (applicable to both identity and probit link functions), it is not analytically necessary for bifactor models to have orthogonal specific factors if the factor loading matrix satisfies certain conditions of linear independence. A key condition for their identification is the linear independence of columns in the submatrices of the factor loadings matrix. Specifically, if the submatrices corresponding to the specific factors have a column rank of at least 2, models with correlated specific factors can be identified (for details, see Fang et al., Reference Fang, Guo, Xu, Ying and Zhang2021).

In their work, Fang et al. (Reference Fang, Guo, Xu, Ying and Zhang2021) adapted the general results of Anderson and Rubin (Reference Anderson and Rubin1956) and the conclusions of Grayson and Marsh (Reference Grayson and Marsh1994) for Multitrait-Multimethod (MTMM) models to bifactor models. However, more recently, Zhang, Luo, Zhang, et al. (Reference Zhang, Luo, Zhang, Sun and Zhang2023) revealed that these models are highly numerically unstable in practice, highlighting the need for more rigorous investigation into their empirical identifiability and cautioning against their unchecked use. Interestingly, Zhang et al. (Reference Zhang, Sun, Cao and Drasgow2021) and Zhang, Luo, Zhang, et al. (Reference Zhang, Luo, Zhang, Sun and Zhang2023) also proposed a model augmentation approach equivalent to the G-structures of test dimensionality described in this article. They demonstrated that this approach stabilizes estimation algorithms and resolves many convergence issues in the case of freely estimated factor loadings. Notably, these suggestions follow the structure of partially oblique bifactor models—specifically Bifactor-(S-1) and Bifactor-(S*I-1) models with correlated specific factors (Eid et al., Reference Eid, Geiser, Koch and Heene2017)—which have been critically discussed by Koch and Eid (Reference Koch and Eid2024).

In the context of this article, these findings suggest that while there are structural parallels between factor analysis and logistic IRT, the identification strategies can differ significantly (Bee et al., Reference Bee, Koch and Eid2023). Further exploration of these differences and their implications for model stability and practical application remains a promising area for future research.

Overall, the Bifactor-(S-1) and Bifactor-(S*I-1) models (Eid et al., Reference Eid, Geiser, Koch and Heene2017), the augmentation approach by Zhang et al. (Reference Zhang, Sun, Cao and Drasgow2021) and Zhang, Luo, Zhang, et al. (Reference Zhang, Luo, Zhang, Sun and Zhang2023), and G-structures all fit within a common structural framework. However, by fixing factor loadings to known values, researchers are able to estimate correlations among all latent factors. Crucially, this alters the interpretation of these correlations. In traditional partially oblique bifactor models (such as Bifactor-(S-1) or Bifactor-(S*I-1)), correlations between specific factors are partial correlations—conditional on the general factor—similar to the correlations between general and specific factors in the ETM model. In contrast, in the CORB model the latent dimensions are not treated as residuals; they are not conditioned on one another. As a result, their variances are not strictly separated, allowing for a more holistic interpretation of the latent structure.

Additionally, the literature describes other completely oblique Rasch bifactor models that impose specific constraints on the variance–covariance matrix of person parameters. For example, Robitzsch et al. (Reference Robitzsch, Kiefer and Wu2025) introduce models with a zero constraint on the sum of covariances across all dimensions (Robitzsch et al., Reference Robitzsch, Kiefer and Wu2025, p. 143), or a zero constraint on the sum of variances and covariances of all dimensions (Robitzsch et al., Reference Robitzsch, Kiefer and Wu2025, pp. 143–144). These models appear to be identifiable under clear bifactor structures, though this conclusion does not directly follow from the Volodin–Adams procedure.

This suggests that certain constraints on the variance–covariance matrix can render analytically unidentified multidimensional Rasch models empirically identifiable. Consequently, some special cases of the CORB model—such as the ETM—may also be empirically identified under clear bifactor structures.

In contrast, the G-structures and S-structures of test dimensionality described in this paper provide analytical (in this context—definitive) identification for the CORB model and all its special cases, including the ETM. However, the models introduced by Robitzsch et al. (Reference Robitzsch, Kiefer and Wu2025) have only been described in the software literature and have not yet been thoroughly studied. Moreover, their practical interpretation remains unclear, as it is nearly impossible to align such constraints with realistic expectations from the data or the structure of the construct being measured.

Finally, a wide range of longitudinal and MTMM models are relevant to this type of bifactor modeling. Specifically, within the longitudinal framework, derivations of Jöreskog’s (Reference Jöreskog1970) simplex model (Wilson et al., Reference Wilson, Zheng and McGuire2012) can be viewed as nested bifactor models with G-structures. These models produce latent estimates of difference scores that reflect changes in ability across measurement occasions. This is conceptually similar to bifactor models in which specific factor estimates represent the difference between the general ability and the ability required to solve the items associated with a given specific factor.

While longitudinal models can estimate the full correlation matrix of latent dimensions—thanks to constraints placed on the factor loadings of anchor items (Duncan & Duncan, Reference Duncan and Duncan2004)—the reliability of the resulting difference scores has been a longstanding concern (e.g., Cronbach & Furby, Reference Cronbach and Furby1970). Although the debate on the reliability of factor scores continues (see Trafimow, Reference Trafimow2015), we explore this issue in the context of the CORB model through our simulation study.

Several special cases of MTMM models are also highly relevant to partially oblique bifactor models. In particular, some MTMM models adopt a latent difference score approach by imposing constraints on factor loadings (e.g., Pohl et al., Reference Pohl, Steyer and Kraus2008). Other models have modified these constraints so that specific factors do not reflect the difference between two abilities but rather the deviation from a person-specific average across all specific abilities—resulting in latent mean models (e.g., Pohl & Steyer, Reference Pohl and Steyer2010).

More broadly, a growing body of research is investigating the conditions under which correlation matrices in these models are identifiable (see Bee et al., Reference Bee, Koch and Eid2023, for a recent review). These modeling approaches have been extended to a variety of applications, ranging from survey validation to rater assessments (Eid et al., Reference Eid, Geiser and Koch2024), and now represent one of the most prominent and rapidly evolving areas in psychometrics.

4 The simulation study

4.1 Design

We conducted a simulation study to examine the recovery of model parameters by the CORB model and compare it to existing partially oblique bifactor models. For simplicity in comparing model fits, the simulations utilized only the G-structure of test dimensionality. The study addressed three Research Questions related to parameter recovery:

RQ1: How does the number of “construct items” affect parameter recovery?

RQ2: How does the number of specific factors affect parameter recovery?

RQ3: How does the number of items per specific factor affect parameter recovery?

To address the research questions:

  1. For RQ1, we varied the number of construct items from 1 to 2 to 3, while keeping the number of items per specific factor and the number of specific factors constant (5 and 3, respectively).

  2. For RQ2, we varied the number of specific factors from 3 to 4 to 5, while keeping the number of items per specific factor and the number of construct items constant (3 and 1, respectively).

  3. For RQ3, we varied the number of items per specific factor from 3 to 5 to 7, while keeping the number of specific factors and the number of construct items constant (5 and 3, respectively).

Overall, we designed 9 simulation conditions, with 100 replications for each condition. In each replication, we calibrated the CORB model, the ETM, three reparameterizations of the GSM (averaging the results across them), and the orthogonal RTM, all using the same test dimensionality structures for comparison.

In the replications of these conditions, we randomly varied the variance–covariance matrices of the latent person parameter space, ensuring they were positive-definite. The variances ranged from 0.3 to 4 logits, with all dimensions (including the general factor) being oblique, reflecting a realistic setup. Across all simulations, the sample size was fixed at 2,000, and the item difficulties were spaced equally from −2 to 2 logits. Items were assigned alternating loads on specific factors, though the number of items varied.
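The exact mechanism for drawing these matrices is not essential; for illustration, the following sketch in base R generates random positive-definite covariance matrices with variances in the required range. The factor-based construction and all object names are illustrative assumptions rather than a description of the study code.

```r
# Draw a random positive-definite covariance matrix with variances in [0.3, 4].
draw_sigma <- function(n_dim, var_range = c(0.3, 4)) {
  W <- matrix(rnorm(n_dim * (n_dim + 2)), nrow = n_dim)  # random factor matrix
  R <- cov2cor(W %*% t(W))                               # random correlation matrix
  s <- sqrt(runif(n_dim, var_range[1], var_range[2]))    # target standard deviations
  diag(s) %*% R %*% diag(s)                              # positive definite by construction
}

set.seed(1)
draw_sigma(n_dim = 4)  # general factor plus three specific factors
```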

To compare the simulation results, we utilized the following metrics:

  • AIC and BIC indices: To assess model fit while accounting for parameter complexity and sample size.

  • Pearson correlation: Between Expected a Posteriori (EAP; Bock & Mislevy, Reference Bock and Mislevy1982) ability estimates and their true values.

  • EAP reliabilities: To assess the consistency of EAP estimates (Adams, Reference Adams2005).

  • Root Mean Squared Error (RMSE) of the factor correlation matrix: estimated as the root mean squared Frobenius norm of the difference matrix between the estimated and true covariance matrices across all replications, which provides inherent normalization of its values and robustness to the varying covariance scale:

$$\begin{align*}RMSE=\sqrt{\frac{\sum_{r=1}^R{\left\Vert \widehat{{\boldsymbol{\Sigma}}_r}-{\boldsymbol{\Sigma}}_r\right\Vert}_F^2}{R}},\end{align*}$$

where ${\left\Vert \widehat{{\boldsymbol{\Sigma}}_r}-{\boldsymbol{\Sigma}}_r\right\Vert}_F$ is the Frobenius norm of the difference matrix between the true covariance matrix ${\boldsymbol{\Sigma}}_r$ in replication $r$ and the estimated covariance matrix $\widehat{{\boldsymbol{\Sigma}}_r}$ ,

$R$ is the total number of replications.

  • Bias in the variance estimates:

$$\begin{align*}Bias=\frac{\sum_{r=1}^R\left(\widehat{\zeta_r}-{\zeta}_r\right)}{R},\end{align*}$$

where ${\zeta}_r$ is the true value of the parameter in replication $r$ and $\widehat{\zeta_r}$ is its estimate in replication $r$.

Because the RMSE is essentially computed on the correlation matrix, it does not account for potential biases in the variance estimates of the latent dimensions. Bias in the variances was used to address this limitation: it allows us to evaluate the general tendency of a model to overestimate or underestimate the variances of the latent dimensions, as well as the expected magnitude of this over- or underestimation.
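Both metrics are straightforward to compute. A minimal sketch in base R is given below, where `sigma_hat` and `sigma_true` are lists of estimated and true matrices across replications and `zeta_hat` and `zeta_true` are vectors of estimated and true variances; all object names are illustrative.

```r
# Root mean squared Frobenius norm of the difference matrices across replications.
rmse_frobenius <- function(sigma_hat, sigma_true) {
  sq_norms <- mapply(function(est, tru) sum((est - tru)^2), sigma_hat, sigma_true)
  sqrt(mean(sq_norms))
}

# Average signed deviation of the variance estimates from their true values.
bias <- function(zeta_hat, zeta_true) {
  mean(zeta_hat - zeta_true)  # positive values indicate overestimation
}
```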

First, we expect the CORB model to yield the most accurate parameter estimates compared to other partially oblique bifactor models, since it was used as the data-generating model and reflects the most realistic assumptions about the construct structure. Specifically, we anticipate that the CORB model will recover the most accurate correlation estimates. Also, we expect the CORB model to demonstrate the best global model-data fit across all conditions.

Second, we expect that the number of specific factors and the number of construct items will have the greatest impact on model fit, as these elements directly influence the dimensionality of the test structure. Therefore, in RQs 1 and 2, we expect the CORB model to outperform the other oblique bifactor models most significantly.

Third, we expect the CORB model to achieve the highest EAP reliabilities of test scores. This is because EAP estimation can incorporate information from the variance–covariance matrix of the dimensions, allowing scores in the oblique multidimensional model to “reinforce” one another in proportion to their correlations (de la Torre & Patz, Reference de la Torre and Patz2005).

For the simulations, we used the TAM package v. 3.7-16 for R (Robitzsch et al., Reference Robitzsch, Kiefer and Wu2021).
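For readers who wish to set up a comparable calibration, a heavily hedged sketch of one possible TAM specification of the CORB model and the ETM under a G-structure is given below. The argument names (`resp`, `B`, `variance.fixed`, `control`) follow our reading of the TAM documentation and should be checked against the installed version; the response matrix is placeholder random data, a smaller nine-item layout is used purely for illustration, and the exact specification used in the study may differ.

```r
library(TAM)

n_items <- 9; n_dim <- 4                        # general factor plus three specific factors
B_scores <- array(0, dim = c(n_items, 2, n_dim))
B_scores[, 2, 1]    <- 1                        # all items score on the general factor
B_scores[1:3, 2, 2] <- 1                        # items 1-3 load on specific factor 1
B_scores[4:6, 2, 3] <- 1                        # items 4-6 load on specific factor 2
B_scores[7:8, 2, 4] <- 1                        # items 7-8 load on specific factor 3
                                                # item 9 is a "construct item" (G-structure)

set.seed(42)
resp <- matrix(rbinom(2000 * n_items, 1, 0.5), nrow = 2000)  # placeholder data only

# CORB model: all entries of the covariance matrix are estimated freely.
corb_fit <- tam.mml(resp = resp, B = B_scores,
                    control = list(snodes = 1500))  # stochastic integration for 4 dimensions
corb_fit$variance                                   # estimated covariance matrix

# ETM: covariances among the specific factors are fixed to zero;
# rows of variance.fixed are (dimension 1, dimension 2, fixed value).
spec_pairs <- t(combn(2:4, 2))
etm_fit <- tam.mml(resp = resp, B = B_scores,
                   variance.fixed = cbind(spec_pairs, 0),
                   control = list(snodes = 1500))
```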

4.2 Results

4.2.1 RQ1: how does the number of “construct items” affect parameter recovery?

The ETM failed to converge in 23% of cases when there was 1 construct item, 19% of cases with 2 construct items, and 22% of cases with 3 construct items. These results suggest that the number of construct items does not significantly impact the ETM’s convergence behavior. In contrast, all other models converged 100% of the time, regardless of the number of construct items. This indicates potential distortions in the person parameter space during estimation, rendering the ETM difficult or impossible to estimate consistently (Table 1).

Table 1 Comparison of the bifactor models of interest for the first research question of the simulation study

In general, the results indicate that the CORB model outperforms other partially oblique bifactor models in terms of the correlation of parameter estimates with their true values and global model fit. Across all simulation conditions, the CORB model consistently provides better parameter recovery. Interestingly, however, the reliability of the general factor in the CORB model is lower than that of the GSM. This aligns with the fact that the GSM primarily focuses on a unidimensional interpretation of the test, thereby forcing more information from item scores into the general factor.

Similarly, the general factor reliability in the ETM is higher than in the CORB model for a related reason: the ETM’s orthogonality assumption between specific factors enhances the general factor’s reliability while weakening the specific factor reliability compared to the CORB model. As expected, the worst performance overall is observed for the completely orthogonal bifactor model, both in terms of reliability and model fit.

When comparing the RMSE of the correlation matrix, the results align with expectations for the RTM, the ETM, and the CORB model: the more general the model, the better it recovers correlations. However, the GSM exhibits a surprising result. Despite fitting better than the ETM according to AIC and BIC, the GSM produces a latent space of person dimensions that deviates the most from the data-generating space. This outcome reflects the GSM’s modeling approach, which constrains the components of the construct to lie within the general factor rather than treating them as additional to it.

Interestingly, the RTM, the ETM, and the GSM show minimal systematic bias in variance estimates (relative to the standard deviation of this bias). In contrast, the CORB model tends to slightly underestimate the variances of both the general and specific factors, particularly when the test includes only 1 construct item. Nevertheless, the CORB model demonstrates superior stability in terms of the bias-variance trade-off compared to other models.

Regarding RQ1, increasing the number of construct items improves the performance of all models. This global improvement can likely be attributed to test length—a well-established factor in improving the precision of parameter estimates, as longer tests provide more data upon which parameter estimates are based.

4.2.2 RQ2: how does the number of specific factors affect parameter recovery?

The ETM failed to converge in 8% of cases for 3 specific factors, 20% of cases for 4 specific factors, and 32% of cases for 5 specific factors. Unlike the results in the previous simulation (RQ1), this suggests that the ETM’s convergence is influenced by the complexity of the latent person parameter space, with higher numbers of specific factors leading to greater convergence issues. In contrast, all other models converged 100% of the time, regardless of the number of specific factors (Table 2).

Table 2 Comparison of the bifactor models of interest for the second research question of the simulation study

In general, the results are consistent with the previous simulation and indicate that the CORB model outperforms the other partially oblique bifactor models across all key statistics—from the correlation between parameter estimates and their true values to global model fit (with the exception of the general factor reliability in the GSM model). The insights from the previous simulation study are repeated here: the GSM tends to recover the most reliable general factor scores, but this comes at the expense of interpreting the specific factors.

The CORB model provides a balance between the reliability of the general factor and the specific factors. It improves the reliability of the general factor compared to the traditional orthogonal bifactor model while simultaneously recovering the most reliable scores for the specific factors. As expected, the CORB model recovers the correlation matrix more accurately than all other bifactor models and remains significantly more stable in terms of variance estimates.

Interestingly, while increasing the number of specific factors reduces the underestimation of the general factor variance on average, the CORB model’s recovery of the general factor variance, although improved and more stable, does not surpass that of the more constrained models (such as the ETM or the GSM). This may indicate that the CORB model requires special convergence criteria or longer estimation times to achieve better performance in complex test structures.

Regarding RQ2, increasing the number of specific factors tends to improve overall model performance for all models. However, as in RQ1, this improvement may primarily result from the increased test length, which enhances parameter precision by providing more data for estimation.

4.2.3 RQ3: how does the number of items per specific factor affect parameter recovery?

The ETM failed to converge in 19% of cases with 3 items per specific factor, 41% of cases with 5 items per specific factor, and 66% of cases with 7 items per specific factor. These results indicate that the convergence of the ETM strongly depends on the length of the testlet, with longer testlets significantly reducing its likelihood of convergence. In contrast, all other models converged 100% of the time, regardless of the number of items per specific factor (Table 3).

Table 3 Comparison of the bifactor models of interest for the third research question of the simulation study

In general, the results are consistent with the previous simulations and once again demonstrate that the CORB model outperforms other partially oblique bifactor models across all simulation conditions and key statistics (with the exception of the general factor reliability in the GSM model). While the GSM model consistently yields the highest reliabilities, the CORB model produces the most accurate parameter estimates, as evidenced by the lower average RMSE of the correlation matrix and, in this case, also by the lower bias in variance estimates. Both increasing the number of “construct items” and lengthening the specific factors positively impact parameter recovery across all models.

Notably, while the GSM model consistently recovers a latent space that is furthest from the data-generating space, it paradoxically exhibits better model fit than the orthogonal bifactor model and the ETM, though not better than the CORB model. This highlights a critical limitation: a naïve comparison of the GSM with other models based solely on global model fit indices (such as AIC and BIC) can lead to substantial distortion in the interpretation of test scores. Such distortion undermines the intended construct validity that test developers aim for when designing the test. Therefore, we strongly recommend exercising caution when using the GSM model alongside the orthogonal bifactor model, the ETM, and the CORB model, as the GSM is fundamentally different from these models. Crude comparisons may result in significant validity threats.

Regarding RQ3, we can again conclude that, in general, the longer the test, the better the results, across all models.

5 A real data example

5.1 The test and the data

For the real data example, we used data from a low-stakes computerized assessment of reading literacy in Russian called “START.” This test is designed to measure first-graders’ reading literacy, defined as their ability to: (1) recognize letters of the Russian alphabet, (2) read words aloud, (3) read a short story aloud (“mechanical” reading), and (4) comprehend reading material (Ivanova & Kardanova-Biryukova, Reference Ivanova and Kardanova-Biryukova2019). The assessment is conducted by teachers, who assist each student by opening the test in an internet browser and determining whether the student’s responses to each item are correct. All teachers follow standardized test administration guidelines provided by the test developers. The test consists of 35 dichotomous items, divided into four subsections based on the construct definition: letter recognition (9 items), reading words aloud (9 items), mechanical reading (3 items), reading comprehension (14 items).

For the sake of illustration, we calibrated the models without using a specific factor for reading comprehension. This approach forces the models to rely solely on the G-structure of the test dimensionality. Initially, this simplification was necessary to identify the CORB model and the ETM. To ensure consistency in model comparison, we applied the same G-structure to the RTM and GSM models. Consequently, all reading comprehension items were treated as “construct items” defining the general factor across all models. For the GSM, this meant that the ${k}_d$ parameter was not estimated for the “construct items,” constraining their discrimination to unity.

This G-structure aligns with the construct definition, as reading literacy is conceptualized as the ability to comprehend texts. In this framework, the “lower-order” skills (letter recognition, word reading aloud, and mechanical reading) are considered prerequisites for reading comprehension.

The data was collected in November 2020 from a region in the Russian Federation. The sample includes 1,000 first-grade students, though it is not representative of the broader population.

5.2 Results

The results of the model comparison are presented in Table 4.

The correlation matrix from the ETM is presented in Table 5.

Table 4 The results of the model comparison of the real data

Note. The results of the GSM are presented for its three reparameterizations.

The likelihood ratio test confirmed that the ETM fits better than the RTM ( ${\chi}^2$ = 254.6, df = 3, p-value < 0.001).

The likelihood ratio test confirmed that the CORB model fits better than both the RTM ( ${\chi}^2$ = 347.1, df = 6, p-value < 0.001) and the ETM ( ${\chi}^2$ = 92.5, df = 3, p-value < 0.001).

Table 5 The correlation matrix from the ETM

The combined correlation matrix from the GSM is presented in Table 6.

Table 6 The combined correlation matrix from all three reparameterizations of the GSM

The correlation matrix from the CORB model is presented in Table 7.

Table 7 The correlation matrix from the CORB model

5.3 Interpretation of results

The results from the real data application indicate that the CORB model fits the data better than other oblique bifactor models and the orthogonal bifactor model, which is expected since the CORB model is more general. However, the most significant distinction of the CORB model lies in its interpretability. Unlike other bifactor models, the CORB model allows for the direct interpretation of specific factors as “components” of general reading skills, as it permits these factors to correlate freely. In contrast, the assumptions of complete or partial orthogonality in other bifactor models imply that the extracted factor scores are abstract constructs, statistically “purified” from the influence of other factors.

The variances of all latent factors appear relatively high compared to similar studies. This can be attributed to the high “guttmanization” of students’ response profiles (Maggino, Reference Maggino and Michalos2014) and the data collection conditions. Guttmanization likely results from the theoretical framework of the test, which presupposes a hierarchical structure of behavior indicators. In such a framework, a student is unlikely to answer a subsequent item correctly if they have already failed a preceding one. Additionally, on the practical side, the teacher (acting as the proctor) may end the testing session prematurely when a student begins to struggle, reinforcing the hierarchical nature of the responses. These factors likely increase item discriminations, and as a result, constraining discriminations to unity leads to relatively high variance estimates.

The estimates from different reparameterizations of the GSM exhibit some numerical fluctuations but tend to converge toward consistent values (albeit slightly less consistently than in previous studies; Federiakin, Reference Federiakin2020).

One of the most challenging results to interpret is the occurrence of negative correlations in the correlation matrices. For example, a naive interpretation of Tables 5 and 7 might suggest that students who excel in letter recognition tend to struggle with reading comprehension, and vice versa. This apparent paradox affects both the ETM and CORB models. In the GSM case, the negative correlations in Table 6 can be explained from a technical standpoint: since the sum of specific dimensions is constrained to zero for each student, increasing one specific factor necessarily decreases the others, thereby inducing negative correlations.

Although the paradox of negative correlations appears puzzling from a content perspective, it is a well-documented phenomenon in within-item multidimensional models (van Rijn & Rijmen, Reference van Rijn and Rijmen2012). This effect is known as the “explaining away phenomenon,” extensively studied within the framework of Bayesian reasoning and the causal interpretation of IRT models (Marsman et al., Reference Marsman, Borsboom, Kruis, Epskamp, van Bork, Waldorp, van der Maas and Maris2018). The principle behind this phenomenon is that “the confirmation of one cause of an observed event reduces the need to invoke alternative causes” (Wellman & Henrion, Reference Wellman and Henrion1993, p. 287). In the context of within-item multidimensional IRT models, this implies that when an item loads on two latent dimensions in a compensatory manner, a student can succeed in answering the item through three possible scenarios:

  1. Compensating for low ability on dimension 1 by having high ability on dimension 2.

  2. Compensating for low ability on dimension 2 by having high ability on dimension 1.

  3. Having high ability on both dimensions.

However, scenario 3 is less likely, as it requires more conditions to be simultaneously satisfied. Therefore, negative correlations between dimensions arise because scenarios 1 and 2 dominate in the sample. Hooker and Finkelman (Reference Hooker and Finkelman2010) proved this result for bifactor models that do not comply with the Schmid and Leiman (Reference Schmid and Leiman1957) constraints. Later, van der Linden (Reference van der Linden2012) provided a rigorous generalization of this result, while van Rijn and Rijmen (Reference van Rijn and Rijmen2012) graphically demonstrated it for all compensatory within-item multidimensional models.
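A small simulation makes this mechanism tangible: two abilities are generated as independent, yet conditioning on a correct response to a compensatory within-item multidimensional item induces a negative correlation between them. All numerical values below are arbitrary illustrative choices.

```r
set.seed(123)
n       <- 100000
theta1  <- rnorm(n)                               # first ability, e.g., a specific factor
theta2  <- rnorm(n)                               # second ability, e.g., the general factor
correct <- rbinom(n, 1, plogis(theta1 + theta2))  # compensatory Rasch-type item, difficulty 0

cor(theta1, theta2)                               # close to zero in the full sample
cor(theta1[correct == 1], theta2[correct == 1])   # negative among respondents who succeeded
```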

Consequently, the negative correlations observed in the ETM and CORB models do not imply that students who are better at reading are worse at recognizing letters, or vice versa. Instead, these negative correlations are statistical artifacts resulting from the conditioning of parameter estimates on the distribution of student abilities. Therefore, this paradoxical outcome does not require extensive content interpretation or explanations based on substantial issues with the construct.

Interestingly, in both the ETM and CORB models, the correlations of the three specific factors with the general factor support the theoretical hierarchy of reading skills proposed by Ivanova and Kardanova-Biryukova (Reference Ivanova and Kardanova-Biryukova2019). Specifically, the closer a specific factor is to reading comprehension (which defines the general factor) in the theoretical hierarchy of skills, the stronger its correlation with the general factor becomes. It is important to note that the hierarchy of skills in this context is defined purely in terms of theoretical interpretation and does not impose structural constraints on the model itself. That is, although students are theoretically expected to acquire skills in a sequential manner, the model treats all skills as independent but correlated dimensions.

As a result, the closer two skills are in terms of cognitive content (e.g., letter recognition is cognitively closer to word reading than to mechanical reading), the stronger their correlation becomes. This effect may act as a counterbalance to the explaining-away phenomenon, driven by the similarity of the cognitive content across latent dimensions.

6 Discussion

Bifactor models are prevalent in psychometric literature because they directly extract the general factor from a truly composite test structure while accounting for local item dependence. However, they are notoriously difficult to interpret, as their identification requires highly restrictive constraints on the variance–covariance matrix. Specifically, the assumption of total orthogonality often results in models where only the general factor is practically interpretable, while the specific factors are typically treated as nuisance dimensions and ignored.

In response to these limitations, several partially oblique bifactor models have been proposed and studied. Notably, the ETM allows for direct estimation of correlations between the general factor and specific factors while maintaining orthogonality among the specific factors. Another well-documented model is the GSM, which allows correlations between specific factors but constrains them to be orthogonal to the general factor. However, the theoretical interpretations of these models vary considerably, as different constraints on the variance–covariance matrix lead to different conceptualizations of the construct being measured. Additionally, other bifactor models similar to the GSM—but without such constraints—can apparently be identified if the factor loading matrix satisfies specific conditions. Despite this, the interpretation and practical application of these models are often as complicated as those of orthogonal bifactor models due to the complexity of their underlying assumptions.

The purpose of this article was twofold:

  1. To introduce the CORB model, which enables the direct estimation of all correlations between latent factors.

  2. To describe the structures of test dimensionality that allow for the CORB model’s identification.

Through simulation studies and a real data example, we demonstrated that the CORB model outperforms other bifactor models in terms of model fit and the recovery of factor correlations. However, successful identification of the CORB model requires a specific test design structure. In this article, we introduced and analyzed two such structures:

  1. G-structure (Figure 2): This structure requires that the test contain at least one “construct item” that loads solely on the general factor.

  2. S-structure (Figure 3): This structure requires that no items load solely on the general factor, but at least one item is shared between every pair of specific factors.

These test dimensionality structures allow for direct estimation of all correlations between specific factors, simplifying the interpretation of the latent person parameter space. To analytically establish the identification of the CORB model, we applied the Volodin–Adams procedure, which verifies the identification of oblique Rasch models by examining the rank and structure of the design and scoring matrices.

However, as a within-item multidimensional compensatory IRT model, the CORB model is susceptible to paradoxical results, where two latent factors that are theoretically expected to correlate positively may instead be estimated as negatively correlated. For example, in the real data application, the specific factor “Letter recognition” was negatively correlated with the general factor, interpreted as reading comprehension. Nevertheless, such results are not truly paradoxical; they can be explained by the “explaining away” phenomenon from the Bayesian reasoning paradigm. From this perspective, these results are merely statistical artifacts that do not require extensive content interpretation.

Broadly, this article addresses the topic of bifactor model identification. Most researchers, particularly applied researchers and test developers, tend to assume that bifactor models must be orthogonal. In certain contexts, such as testlets and item bundles, this assumption is appropriate. Moreover, orthogonality significantly accelerates parameter estimation, as it prevents these models from falling victim to the “curse of dimensionality,” which exponentially increases computational complexity and slows down numerical integration as the number of correlated latent dimensions grows (Rijmen, Reference Rijmen2009). In such cases, factors secondary to the researcher’s primary interest are often treated as nuisance dimensions that explain common variance across items.

However, our work demonstrates that deliberate modifications to the structure of test dimensionality can enable researchers to estimate all entries in the variance–covariance matrix of a bifactor model. This approach allows for models that align more closely with theoretical assumptions about the construct’s structure, particularly when the construct is intentionally composite rather than being artificially defined by the stimuli. Although such models are more computationally demanding and less advantageous from a technical standpoint due to their vulnerability to the curse of dimensionality, they offer significant theoretical benefits. Specifically, they are more useful in cases where researchers seek to explore the nuances of the construct structure (especially the correlation matrix of person dimensions) or apply the model in predictive measurement contexts (Zhang et al., Reference Zhang, Sun, Cao and Drasgow2021; Zhang, Luo, Sun, et al., Reference Zhang, Luo, Sun, Cao and Drasgow2023).

Furthermore, recent advances in parameterizing IRT (Converse, Reference Converse2021) and factor-analytical models (Urban & Bauer, Reference Urban and Bauer2021) as artificial neural networks may help mitigate these computational challenges. Neural networks are far less susceptible—if not entirely immune—to the curse of dimensionality (Cheridito et al., Reference Cheridito, Jentzen and Rossmannek2021). Therefore, parameterizing the CORB model as a neural network could potentially eliminate computational inefficiencies, rendering computational time a negligible concern.

In the context of model identification, our article highlights that the conditions for identifying oblique bifactor models remain an area for further research. Notably, existing models that impose zero constraints on the sum of covariances or on the sum of variances and covariances of person parameters suggest that many oblique bifactor models that are analytically unidentified (under currently known procedures) may, in fact, be empirically identified. Further exploration of identification conditions could pave the way for the development of new oblique bifactor models with practical and theoretically meaningful interpretations.

Additionally, while models with freely estimated discrimination parameters require linear independence of factor loadings on the general and specific factors for identification (Fang et al., Reference Fang, Guo, Xu, Ying and Zhang2021; Zhang, Luo, Zhang, et al., Reference Zhang, Luo, Zhang, Sun and Zhang2023), this requirement appears irrelevant for Rasch models. In Rasch models, discrimination parameters are constrained to unity by definition, resulting in linearly dependent “factor loadings” on the general and specific factors. This distinction underscores the need for a more detailed investigation into the identification conditions specific to Rasch-based bifactor models.

Interestingly, since the ETM is a special case of the CORB model, it is conceptually possible to extend the CORB framework by proposing the Subdimensional Oblique Rasch Bifactor (SORB) model. The SORB model shares conceptual similarities with the GSM and is closely related to partially oblique models in factor analysis (Fang et al., Reference Fang, Guo, Xu, Ying and Zhang2021; Zhang, Luo, Zhang, et al., Reference Zhang, Luo, Zhang, Sun and Zhang2023), while also being a special case of the CORB model. Consequently, the SORB model follows the same identification requirements as the CORB model, since it adheres to the general formulations of Equations 13 or 14, similar to the ETM. However, instead of recovering the variance–covariance matrix in Equation 10, the SORB model recovers the matrix given by Equation 29:

(29) $$\begin{align} \boldsymbol{\Sigma} =\left[\begin{matrix}\mathit{\operatorname{var}}\left({\theta}_g\right)& 0& 0& 0\\ {}0& \mathit{\operatorname{var}}\left({\theta}_{s_1}\right)& \mathit{\operatorname{cov}}\left({\theta}_{s_1},{\theta}_{s_2}\right)& \mathit{\operatorname{cov}}\left({\theta}_{s_1},{\theta}_{s_3}\right)\\ {}0& \mathit{\operatorname{cov}}\left({\theta}_{s_2},{\theta}_{s_1}\right)& \mathit{\operatorname{var}}\left({\theta}_{s_2}\right)& \mathit{\operatorname{cov}}\left({\theta}_{s_2},{\theta}_{s_3}\right)\\ {}0& \mathit{\operatorname{cov}}\left({\theta}_{s_3},{\theta}_{s_1}\right)& \mathit{\operatorname{cov}}\left({\theta}_{s_3},{\theta}_{s_2}\right)& \mathit{\operatorname{var}}\left({\theta}_{s_3}\right)\end{matrix}\right]\end{align}$$

The constraint of correlations between specific factors and the general factor to zero, combined with the free estimation of correlations among specific factors, results in a “reversed” ETM, conceptually similar to the oblique bifactor solutions proposed by Jennrich and Bentler (Reference Jennrich and Bentler2012) and Lorenzo-Seva and Ferrando (Reference Lorenzo-Seva and Ferrando2019), but approached from a confirmatory modeling paradigm. This constraint also makes the orthogonal RTM a special case of both the ETM and the SORB model, though without nesting these models within one another.

This model is closer in interpretation to the ETM and CORB models than to the GSM. Specifically, it models specific factors as additional to the general factor, rather than as components of the general factor, as in the GSM. However, unlike the partially oblique bifactor models from factor analysis (Fang et al., Reference Fang, Guo, Xu, Ying and Zhang2021; Zhang, Luo, Zhang, et al., Reference Zhang, Luo, Zhang, Sun and Zhang2023), it constrains discrimination parameters, potentially making it easier to identify and more numerically stable. Further exploration of this model and its comparison to the GSM could be a valuable area for future research. In particular, a multi-step estimation procedure involving the following steps may improve numerical stability of parameter estimates and allow for estimation of the 2PL counterparts of all models used in this paper:

  1. Preliminary estimation of a model.

  2. Extraction of the estimated correlation matrix of the multivariate ability distribution from the preliminary estimation.

  3. Fixing the correlation matrix in subsequent bifactor models with free factor loadings.

  4. Estimating discrimination parameters given the fixed correlation matrix.

This approach, sketched below, could yield further improved model fit, more stable parameter estimates, and enhanced robustness of bifactor model applications.
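The following heavily hedged sketch illustrates the multi-step procedure above in R. It reuses the `resp` and `B_scores` objects from the calibration sketch in Section 4.1, and the function and argument names (`tam.mml`, `tam.mml.2pl`, `variance.fixed`, `$variance`) reflect our reading of the TAM documentation rather than a definitive implementation.

```r
library(TAM)

# Step 1: preliminary CORB (Rasch) calibration.
prelim <- tam.mml(resp = resp, B = B_scores, control = list(snodes = 1500))

# Step 2: extract the estimated covariance matrix of the ability distribution.
sigma_hat <- prelim$variance

# Step 3: translate the full covariance matrix into a variance.fixed specification
# (rows of the form: dimension 1, dimension 2, fixed value).
idx <- which(upper.tri(sigma_hat, diag = TRUE), arr.ind = TRUE)
var_fixed <- cbind(idx, sigma_hat[idx])

# Step 4: estimate discrimination parameters with the covariance matrix held fixed.
Q_matrix <- B_scores[, 2, ]                     # item-by-dimension loading pattern
two_pl   <- tam.mml.2pl(resp = resp, Q = Q_matrix, irtmodel = "2PL",
                        variance.fixed = var_fixed,
                        control = list(snodes = 1500))
```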

This article has several limitations. First, we only considered dichotomous items. To identify the CORB model in a test with polytomous items, at least one category of at least one item must load solely on the general factor, or alternatively, at least one category of at least one item must be shared between every pair of specific factors. However, this result is specific to the adjacent logit link function. Extending these findings to other link functions, such as probit or cumulative link functions, represents a promising avenue for future research.

Moreover, the proposed structures of test dimensionality reflect a complex but tractable process of test development, particularly under partial credit scoring in educational assessments. Among the two structures discussed in this paper, developing items for the S-structure appears more feasible. For example, consider a test with a clear bifactor structure, where one subdimension represents addition skills and another represents subtraction skills. In this scenario, a researcher could create items requiring both skills (e.g., word problems combining addition and subtraction) to transform a clear bifactor structure into an S-structure.

Conversely, developing items that measure only general arithmetic skills (i.e., not specific to addition or subtraction) is likely more challenging. This example illustrates that while transforming an existing test into an S-structure may be achievable, constructing items for a G-structure, which requires purely general items, can be considerably more difficult.
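For concreteness, the illustrative loading pattern below (in R, with hypothetical item labels) shows how adding two mixed word problems to an otherwise clear bifactor design with addition and subtraction subdimensions yields an S-structure.

```r
# 1 = item loads on the dimension, 0 = it does not; item labels are hypothetical.
q_matrix <- rbind(add1  = c(1, 1, 0), add2  = c(1, 1, 0),
                  sub1  = c(1, 0, 1), sub2  = c(1, 0, 1),
                  word1 = c(1, 1, 1), word2 = c(1, 1, 1))  # shared "word problem" items
colnames(q_matrix) <- c("general", "addition", "subtraction")
q_matrix
```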

Additionally, this paper does not delve into several important applied aspects of the CORB model. For example, we do not explore how the CORB model relates to the intricate connections between second-order models and bifactor models defined by the Schmid–Leiman constraints (Gignac, Reference Gignac2016; Mansolf & Reise, Reference Mansolf and Reise2017; Rijmen, Reference Rijmen2010). Furthermore, we do not address the item development process in detail and only briefly touch upon the topic of item fit.

A more in-depth discussion on the interpretation of the ETM and other partially oblique bifactor models is also necessary. Currently, their detailed interpretation remains unclear—particularly regarding when and how such complex variance–covariance matrix constraints can be expected to align with the underlying construct. Additionally, this article does not examine the potential impact of the CORB model on subscore reporting (Haberman et al., Reference Haberman, Sinharay, Feinberg and Wainer2024). While the demand for interpretable scores on specific factors motivates our work, further research is needed to assess the added value of subscores derived from the CORB model.

Moreover, the CORB model is presented solely within the Rasch measurement paradigm, which assumes that items with the same factor loading structure share identical factor loadings. This marks a key point of divergence from the 2PNO paradigm, where discrimination parameters are freely estimated. Investigating analogous models within the 2PNO paradigm, examining their properties, and generalizing the Volodin–Adams procedure to this paradigm represent promising directions for future research.

Finally, since this paper operates within the confirmatory IRT paradigm, we do not discuss the implications of the CORB model for the exploratory paradigm. Developing advanced rotation methods for exploratory oblique bifactor analysis, determining the appropriate number of latent factors (Chen & Li, Reference Chen and Li2022), assessing exploratory model fit, and addressing other issues in exploratory modeling represent promising avenues for future research.

Supplementary material.

To view supplementary material for this article, please visit http://doi.org/10.1017/psy.2025.14.

Funding statement

The research was supported by the RFBR under research project No. 19-29-14110.

Competing interests

The authors declare none.

Footnotes

1 More generally, they can be assigned to any real number values, as necessary for the Partial Credit Model.

2 We use the symbol $\mid$ (solid vertical line) within square brackets to denote row-based matrix concatenation, ensuring clarity and avoiding confusion with matrix multiplication. Matrix multiplication is not possible in this context due to the non-conformable dimensions of the matrices involved. Similarly, Volodin and Adams (Reference Volodin and Adams2002) addressed this potential ambiguity by using spacing between the symbols denoting matrices for the same purpose.

3 For simplicity, from this point forward, the model equations will be expressed in terms of proportionality functions to avoid specifying the full model denominator as in Equation 1.

References

Ackerman, T. A. (1994). Using multidimensional item response theory to understand what items and tests are measuring. Applied Measurement in Education, 7(4), 255–278. https://doi.org/10.1207/s15324818ame0704_1
Adams, R. J. (2005). Reliability as a measurement design effect. Studies in Educational Evaluation, 31(2–3), 162–172. https://doi.org/10.1016/j.stueduc.2005.05.008
Adams, R. J., Wilson, M., & Wang, W. C. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21(1), 1–23. https://doi.org/10.1177/0146621697211001
Adams, R. J., Wu, M. L., Cloney, D., Berezner, A., & Wilson, M. (2020). ACER ConQuest: Generalised Item Response Modelling Software (Version 5.29) [Computer software]. Australian Council for Educational Research. https://www.acer.org/au/conquest
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723. https://doi.org/10.1109/TAC.1974.1100705
Anderson, T. W., & Rubin, H. (1956). Statistical inference in factor analysis. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 5, 111–150.
Bee, R. M., Koch, T., & Eid, M. (2023). A general theorem and proof for the identification of composed CFA models. Psychometrika, 88(4), 1334–1353. https://doi.org/10.1007/s11336-023-09933-6
Bock, R. D., & Mislevy, R. J. (1982). Adaptive EAP estimation of ability in a microcomputer environment. Applied Psychological Measurement, 6(4), 431–444. https://doi.org/10.1177/014662168200600405
Brandt, S. (2008). Estimation of a Rasch model including subdimensions. IERI Monograph Series. Issues and Methodologies in Large-Scale Assessments, 1, 51–70. https://www.ierinstitute.org/fileadmin/Documents/IERI_Monograph/IERI_Monograph_Volume_01_Chapter_3.pdf
Brandt, S. (2017). Concurrent unidimensional and multidimensional calibration within item response theory. Pensamiento Educativo. Revista de Investigación Educacional Latinoamericana, 54, 116. https://doi.org/10.7764/PEL.54.2.2017.4
Brandt, S., & Duckor, B. (2013). Increasing unidimensional measurement precision using a multidimensional item response model approach. Psychological Test and Assessment Modeling, 55(2), 148. http://www.psychologie-aktuell.com/fileadmin/download/ptam/2-2013_20130625/02_Brandt.pdf
Cai, L., Yang, J. S., & Hansen, M. (2011). Generalized full-information item bifactor analysis. Psychological Methods, 16, 221–248. https://doi.org/10.1037/a0023350
Chen, Y., & Li, X. (2022). Determining the number of factors in high-dimensional generalized latent factor models. Biometrika, 109(3), 769–782. https://doi.org/10.1093/biomet/asab044
Cheridito, P., Jentzen, A., & Rossmannek, F. (2021). Efficient approximation of high-dimensional functions with neural networks. IEEE Transactions on Neural Networks and Learning Systems, 33(7), 3079–3093. http://doi.org/10.1109/TNNLS.2021.3049719
Converse, G. (2021). Neural network methods for application in educational measurement. [Doctoral Dissertation, The University of Iowa].
Cronbach, L. J., & Furby, L. (1970). How we should measure “change”: Or should we? Psychological Bulletin, 74(1), 68–80. https://doi.org/10.1037/h0029382
de Boeck, P., Bakker, M., Zwitser, R., Nivard, M., Hofman, A., Tuerlinckx, F., & Partchev, I. (2011). The estimation of item response models with the lmer function from the lme4 package in R. Journal of Statistical Software, 39, 1–28. https://doi.org/10.18637/jss.v039.i12
de la Torre, J., & Patz, R. J. (2005). Making the most of what we have: A practical application of multidimensional item response theory in test scoring. Journal of Educational and Behavioral Statistics, 30(3), 295–311. https://doi.org/10.3102/10769986030003295
DeMars, C. E. (2013). A tutorial on interpreting bifactor model scores. International Journal of Testing, 13(4), 354–378. https://doi.org/10.1080/15305058.2013.799067
Duncan, T. E., & Duncan, S. C. (2004). An introduction to latent growth curve modeling. Behavior Therapy, 35(2), 333–363. https://doi.org/10.1016/S0005-7894(04)80042-X
Eid, M., Geiser, C., & Koch, T. (2024). Structural equation modeling of multiple rater data. Guilford Publications.
Eid, M., Geiser, C., Koch, T., & Heene, M. (2017). Anomalous results in G-factor models: Explanations and alternatives. Psychological Methods, 22(3), 541. https://doi.org/10.1037/met0000083
Fang, G., Guo, J., Xu, X., Ying, Z., & Zhang, S. (2021). Identifiability of bifactor models. Statistica Sinica, 31, 2309–2330. https://doi.org/10.5705/ss.202020.0386
Federiakin, D. (2020). Calibrating the test of relational reasoning: New information from oblique bifactor models. Frontiers in Psychology, 11, 2129. https://doi.org/10.3389/fpsyg.2020.02129
Gignac, G. E. (2016). The higher-order model imposes a proportionality constraint: That is why the bifactor model tends to fit better. Intelligence, 55, 57–68. https://doi.org/10.1016/j.intell.2016.01.006
Grayson, D., & Marsh, H. W. (1994). Identification with deficient rank loading matrices in confirmatory factor analysis: Multitrait-multimethod models. Psychometrika, 59, 121–134. https://doi.org/10.1007/BF02294271
Haberman, S., Sinharay, S., Feinberg, R. A., & Wainer, H. (2024). Subscores: A practical guide to their production and consumption. Cambridge University Press. https://doi.org/10.1017/9781009413701
Hendy, N. T., & Biderman, M. D. (2019). Using bifactor model of personality to predict academic performance and dishonesty. The International Journal of Management Education, 17(2), 294–303. https://doi.org/10.1016/j.ijme.2019.05.003
Holzinger, K. J., & Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1), 41–54. https://doi.org/10.1007/BF02287965
Hooker, G., & Finkelman, M. (2010). Paradoxical results and item bundles. Psychometrika, 75(2), 249–271. https://doi.org/10.1007/s11336-009-9143-y
Ivanova, A., & Kardanova-Biryukova, K. (2019). Constructing a Russian-language version of the international early reading assessment tool. Educational Studies Moscow, (4). https://doi.org/10.17323/1814-9545-2019-4-93-115
Jennrich, R. I., & Bentler, P. M. (2012). Exploratory bi-factor analysis: The oblique case. Psychometrika, 77(3), 442–454. https://doi.org/10.1007/s11336-012-9269-1
Jeon, M., Rijmen, F., & Rabe-Hesketh, S. (2018). CFA models with a general factor and multiple sets of secondary factors. Psychometrika, 83, 785–808. https://doi.org/10.1007/s11336-018-9633-x
Jöreskog, K. G. (1970). Estimation and testing of simplex models. British Journal of Mathematical and Statistical Psychology, 23(2), 121–145. https://doi.org/10.1111/j.2044-8317.1970.tb00439.x
Kenny, D. A. (1979). Correlation and causality. Wiley.
Koch, T., & Eid, M. (2024). Augmented bifactor models and bifactor-(S-1) models are identical. A comment on Zhang, Luo, Zhang, Sun & Zhang (2023). Structural Equation Modeling: A Multidisciplinary Journal, 31(5), 794–801. https://doi.org/10.1080/10705511.2024.2339387
Le, L. T., & Adams, R. J. (2013). Accuracy of Rasch model item parameter estimation. https://research.acer.edu.au/ar_misc/13
Lorenzo-Seva, U., & Ferrando, P. J. (2019). A general approach for fitting pure exploratory bifactor models. Multivariate Behavioral Research, 54(1), 15–30. https://doi.org/10.1080/00273171.2018.1484339
Maggino, F. (2014). Guttman scale. In Michalos, A. C. (Ed.), Encyclopedia of quality of life and well-being research. Springer. https://doi.org/10.1007/978-94-007-0753-5_1218
Mansolf, M., & Reise, S. P. (2017). When and why the second-order and bifactor models are distinguishable. Intelligence, 61, 120129. https://doi.org/10.1016/j.intell.2017.01.012 Google Scholar
Marsman, M., Borsboom, D., Kruis, J., Epskamp, S., van Bork, R. V., Waldorp, L. J., van der Maas, H. L. J. & Maris, G. (2018). An introduction to network psychometrics: Relating Ising network models to item response theory models. Multivariate Behavioral Research, 53(1), 1535. https://doi.org/10.1080/00273171.2017.1379379 Google Scholar
Morin, A. J. S., Arens, A. K., Marsh, H. W. (2016). A bifactor exploratory structural equation modeling framework for the identification of distinct sources of construct-relevant psychometric multidimensionality. Structural Equation Modeling: A Multidisciplinary Journal, 23(1), 116139. https://doi.org/10.1080/10705511.2014.961800 Google Scholar
Paek, I., Yon, H., Wilson, M., Kang, T. (2009). Random parameter structure and the testlet model: Extension of the Rasch testlet model. Journal of Applied Measurement, 10(4), 394407.Google Scholar
Pohl, S., & Steyer, R. (2010). Modeling common traits and method effects in multitrait-multimethod analysis. Multivariate Behavioral Research, 45(1), 4572. https://doi.org/10.1080/00273170903504729 Google Scholar
Pohl, S., Steyer, R., & Kraus, K. (2008). Modelling method effects as individual causal effects. Journal of the Royal Statistical Society Series A: Statistics in Society, 171(1), 4163. https://doi.org/10.1111/j.1467-985X.2007.00517.x Google Scholar
Rasch, G. (1993). Probabilistic models for some intelligence and attainment tests. MESA Press. www.rasch.org Google Scholar
Reckase, M. D., & McKinley, R. L. (1991). The discriminating power of items that measure more than one dimension. Applied Psychological Measurement, 15(4), 361373. https://doi.org/10.1177/014662169101500407 Google Scholar
Reise, S. P. (2012). The rediscovery of bifactor measurement models. Multivariate Behavioral Research, 47(5), 667696. https://doi.org/10.1080/00273171.2012.715555 Google Scholar
Reise, S. P., Moore, T. M., & Haviland, M. G. (2010). Bifactor models and rotations: Exploring the extent to which multidimensional data yield univocal scale scores. Journal of Personality Assessment, 92(6), 544559. https://doi.org/10.1080/00223891.2010.496477 Google Scholar
Rijmen, F. (2009). Efficient full information maximum likelihood estimation for multidimensional IRT models. ETS Research Report Series, 2009(1), i-31. https://doi.org/10.1002/j.2333-8504.2009.tb02160.x Google Scholar
Rijmen, F. (2010). Formal relations and an empirical comparison among the bi-factor, the testlet, and a second-order multidimensional irt model. Journal of Educational Measurement, 47(3), 361372. http://www.jstor.org/stable/20778960 Google Scholar
Rindskopf, D. (1984). Structural equation models: Empirical identification, Heywood cases, and related problems. Sociological Methods & Research, 13(1), 109119. https://doi.org/10.1177/0049124184013001004 Google Scholar
Robitzsch, A., Kiefer, T., Wu, M. (2021). Package ‘TAM’ (test analysis modules). v. 3.7-16. CRAN.Google Scholar
Robitzsch, A., Kiefer, T., Wu, M. (2025). Package ‘TAM’ (test analysis modules). v. 4.2-21. CRAN. https://cran.r-project.org/web/packages/TAM/TAM.pdf Google Scholar
Schmid, J., & Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1), 5361. https://doi.org/10.1007/BF02289209 Google Scholar
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 461464. https://doi.org/10.1214/aos/1176344136 Google Scholar
Trafimow, D. (2015). A defense against the alleged unreliability of difference scores. Cogent Mathematics, 2(1), 1064626. https://doi.org/10.1080/23311835.2015.1064626 Google Scholar
Urban, C. J., & Bauer, D. J. (2021). A deep learning algorithm for high-dimensional exploratory item factor analysis. Psychometrika, 86(1), 129. https://doi.org/10.1007/s11336-021-09748-3 Google Scholar
van der Linden, W. J. (2012). On compensation in multidimensional response modeling. Psychometrika, 77(1), 2130. https://doi.org/10.1007/s11336-011-9237-1 Google Scholar
van Rijn, P. W., & Rijmen, F. (2012). A note on explaining away and paradoxical results in multidimensional item response theory. ETS Research Report Series, 2012(2), i-10. https://doi.org/10.1002/j.2333-8504.2012.tb02295.x Google Scholar
Volodin, N., & Adams, R. J. (2002). The estimation of polytomous item response models with many dimensions. https://research.acer.edu.au/ar_misc/14 Google Scholar
Wang, C., & Zhang, X. (2019). A note on the conversion of item parameters standard errors. Multivariate Behavioral Research, 54, 307321. https://doi.org/10.1080/00273171.2018.1513829 Google Scholar
Wang, W. C., & Wilson, M. (2005). The Rasch testlet model. Applied Psychological Measurement, 29(2), 126149. https://doi.org/10.1177/0146621604271053 Google Scholar
Wellman, M. P., & Henrion, M. (1993). Explaining ‘explaining away’. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(3), 287292. https://doi.org/10.1109/34.204911 Google Scholar
Wilson, M., & Gochyyev, P. (2020). Having your cake and eating it too: Multiple dimensions and a composite. Measurement, 151, 107247. https://doi.org/10.1016/j.measurement.2019.107247 Google Scholar
Wilson, M., Zheng, X., & McGuire, L. (2012). Formulating latent growth using an explanatory item response model approach. Journal of Applied Measurement, 13(1), 1.Google Scholar
Zhang, B., Luo, J., Sun, T., Cao, M., & Drasgow, F. (2023). Small but nontrivial: A comparison of six strategies to handle cross-loadings in bifactor predictive models. Multivariate Behavioral Research, 58, 115132. https://doi.org/10.1080/00273171.2021.1957664 Google Scholar
Zhang, B., Luo, J., Zhang, S., Sun, T., & Zhang, D. C. (2023). Improving the statistical performance of oblique bifactor measurement and predictive models: An augmentation approach. Structural Equation Modeling: A Multidisciplinary Journal, 31(2), 233252. https://doi.org/10.1080/10705511.2023.2222229 Google Scholar
Zhang, B., Sun, T., Cao, M., & Drasgow, F. (2021). Using bifactor models to examine the predictive validity of hierarchical constructs: Pros, cons, and solutions. Organizational Research Methods, 24, 530571. https://doi.org/10.1177/1094428120915522 Google Scholar
Supplementary material: Federiakin and Wilson supplementary material (File, 215.7 KB).