1 Introduction
Teamwork involves the collaboration of individuals to achieve shared goals that are difficult to accomplish alone (Salas et al., 1992). Studying teamwork explores the cognitive and behavioral dynamics of interpersonal interactions, enhancing our understanding of cooperation (Salas et al., 2017). This shift in focus from individual cognition to teamwork is evident in emerging research on collaborative problem-solving (Li et al., 2023; OECD, 2017), group intelligence (Burton et al., 2024; Woolley et al., 2010), social learning (Pan et al., 2022), and team creativity (Lu et al., 2023).
Teamwork must be effective for teams to outperform individuals. Effective teamwork enhances efficiency, expands knowledge, fosters innovation, reduces errors, and boosts satisfaction, whereas ineffective teamwork often underperforms individual effort (Austin, 2003; Hao et al., 2017). Achieving effective teamwork is challenging, as it relies heavily on team cognition—team-level constructs such as shared understanding and the collective organization of knowledge—which emerges from cognitive interactions among members and plays a critical role in shaping performance (Cannon-Bowers et al., 1993; DeChurch & Mesmer-Magnus, 2010; Klimoski & Mohammed, 1994; Salas et al., 2017). Even teams of highly skilled individuals may struggle to perform well, highlighting the importance of considering team cognition alongside individual cognition (Devine & Philips, 2001; Woolley et al., 2015). Thus, promoting effective teamwork requires objective measures of team performance and the ability to identify strengths and weaknesses in team cognition, including team members' individual cognitions and the interactions or interdependencies among them (Salas et al., 2007, 2017).
In addition to using total scores from self- and expert-report questionnaires to evaluate teamwork performance (Andersson et al., 2017), large-scale assessment programs such as PISA and ATC21S have adopted computer-based collaborative problem-solving tasks to assess team performance. This shift has prompted researchers to explore new psychometric models for analyzing teamwork response data (e.g., Andrews et al., 2017; Li et al., 2023; von Davier & Halpin, 2013; Wilson et al., 2017; Yuan et al., 2019). For instance, Wilson et al. (2017) incorporated a team-level random effect into the Rasch model to characterize interdependence in members' response probabilities, assuming that individual ability and the team-level random effect jointly influence performance in collaborative tasks; Yuan et al. (2019) decomposed collaborative problem-solving behaviors into independent actions (solely influenced by a single member's ability) and collaborative actions (jointly determined by both members' abilities), analyzing them through a two-dimensional Rasch model; and Li et al. (2023) employed a two-dimensional item response theory model to assess individual cognitive ability and social competence, revealing a moderate positive correlation between them. A central premise of these models is that the evaluation of teamwork performance should integrate both cognitive and social dimensions. This dual-dimensional framework recognizes that effective teamwork depends not only on individuals' cognitive abilities but also on the social interactions and the resulting interdependencies among team members.
However, most, if not all, existing models are constructed within the item response theory framework and provide only an overall measurement of individuals' teamwork abilities (e.g., collaborative problem-solving ability) without offering a fine-grained diagnosis of individual cognitive attributes or assessing the corresponding team cognitions. Consequently, these models fall short in identifying specific causes of ineffective teamwork and poor performance, such as whether issues stem from a lack of knowledge or skills among team members or from problems within the collaborative process. This limitation makes it challenging to design targeted team-building interventions or to support the development of the team cognition necessary for effective teamwork (Lacerenza et al., 2018).
In recent decades, cognitive diagnosis, or diagnostic classification (Rupp et al., 2010), has become a widely used assessment paradigm in psychology, education, and the behavioral sciences. Designed to identify individuals' mastery of cognitive attributes (e.g., understanding, knowledge, and skills) and to provide targeted feedback that supports cognitive development (Tang & Zhan, 2021), numerous cognitive diagnostic models (CDMs; von Davier & Lee, 2019) have been developed and applied to diagnose attributes such as spatial rotation (Wang et al., 2018), pathological gambling (Templin & Henson, 2006), numerical literacy (Liang et al., 2021), rational number operations (Tang & Zhan, 2021), and problem-solving skills (Zhan & Qiao, 2022). However, traditional CDMs are limited to diagnosing individual cognitive attributes and cannot analyze teamwork response data, falling short of the assessment needs of team cognition in collaborative settings. This restricts their application in real-world scenarios where teamwork is a fundamental organizational form.
To the best of our knowledge, no CDM has been applied to teamwork-based diagnostic assessments, making this an unexplored area in psychometrics. This study proposes a teamwork CDM framework comprising 12 specific models, collectively referred to as Team-CDMs. These models capture the interdependence among team members through emergent team cognitions by jointly modeling individual cognitive attributes and a team-level construct, termed teamwork quality, that reflects the social dimension of collaboration. The Team-CDMs not only diagnose individual team members' cognitive attributes but also assess teamwork quality, offering valuable insights into the strengths and weaknesses of team cognition.
This paper begins with a review of the log-linear model (LLM; Maris, 1999), a representative CDM for diagnosing individual cognitive attributes, which also serves as the foundation for the proposed Team-CDMs. Subsequently, 12 Team-CDMs are constructed by fully crossing three design factors: (1) teamwork response mode (teamwork-separate and teamwork-unified), (2) teamwork interaction mechanism (disjunctive, additive, and conjunctive), and (3) teamwork-effect specification (random-effect and covariate-modulated). A Bayesian approach is employed for model parameter estimation. The psychometric performance of the Team-CDMs is evaluated through two simulation studies, followed by an empirical study illustrating the practical application of the proposed models. The paper concludes with key findings and directions for future research.
2 Teamwork cognitive diagnostic modeling
2.1 Modeling foundation: log-linear model
Existing CDMs for individual cognitive attributes are typically categorized into four types based on the condensation rules that define how latent attributes influence observed item responses: conjunctive, disjunctive, additive (or compensatory), and saturated models (Henson et al., 2009; Maris, 1999). Conjunctive models, such as the deterministic input, noisy "and" gate (DINA) model (Junker & Sijtsma, 2001), assume that all required attributes must be mastered to produce a correct response. Disjunctive models, such as the deterministic input, noisy "or" gate model (Templin & Henson, 2006), assume that mastery of at least one required attribute is sufficient. Additive models, such as the LLM (Maris, 1999), allow for compensatory relationships among attributes, enabling partial mastery to contribute to the response probability. Saturated models, such as the log-linear CDM (Henson et al., 2009) and the generalized DINA model (de la Torre, 2011), include all possible main effects and interactions among attributes, without imposing constraints tied to a specific condensation rule.
In this study, the LLM is taken as an example to illustrate the conceptualization of the proposed Team-CDMs. There are three primary reasons for choosing the LLM as the foundation instead of a saturated model. First, the LLM strikes a balance between the excessive complexity of saturated models and the simplicity of the DINA model (see Footnote 1). It also serves as a commonly used restricted form of both the general diagnostic model (von Davier, 2008) and the log-linear CDM. This flexibility allows the proposed Team-CDMs to be theoretically extended to a more general model if needed. Second, saturated models require large sample sizes for robust parameter estimation (Chiu et al., 2018), which is often impractical in small-scale teamwork projects. Third, the primary focus of this study is to extend traditional CDMs to the analysis of teamwork responses, rather than to define how multiple cognitive attributes combine to determine the probability of a correct item response (i.e., condensation rules) within traditional CDMs. The selection of the foundational model is therefore not the central objective of the study. While we acknowledge that saturated CDMs can capture complex attribute interactions—particularly in complex cognitive tasks—we intentionally adopt a simpler CDM form to better isolate and emphasize the role of interdependence in teamwork settings.
Let ${Y}_{ni}$ represent the observed response of participant n (n = 1, …, N) to item i (i = 1, …, I). The item response function of the LLM can be expressed as

$$P\left({Y}_{ni}=1\right)=\frac{\exp \left({\gamma}_{0i}+\sum_{k=1}^{K}{\gamma}_{1ki}{\alpha}_{nk}{q}_{ik}\right)}{1+\exp \left({\gamma}_{0i}+\sum_{k=1}^{K}{\gamma}_{1ki}{\alpha}_{nk}{q}_{ik}\right)}, \tag{1}$$

where $P\left({Y}_{ni}=1\right)$ is the probability that participant n correctly responds to item i, and ${\gamma}_{0i}$ and ${\gamma}_{1ki}$ represent the intercept and the kth main effect of item i, respectively. Here, ${\alpha}_{nk}\in \left\{0,1\right\}$ denotes the cognitive attribute status of participant n on attribute k (k = 1, …, K), with ${\alpha}_{nk}=1$ if mastered and ${\alpha}_{nk}=0$ otherwise; ${q}_{ik}\in \left\{0,1\right\}$ is an element of the Q-matrix (Tatsuoka, 1983), with ${q}_{ik}=1$ if attribute k is required for a correct response to item i and ${q}_{ik}=0$ otherwise.
As a compensatory model, the LLM assumes that mastery of one required attribute may compensate for the lack of mastery of another in determining the probability of a correct response. In the LLM, the lowest correct response probability of item i is

$${g}_i=\frac{\exp \left({\gamma}_{0i}\right)}{1+\exp \left({\gamma}_{0i}\right)},$$

which represents the guessing probability of a correct response to item i when no required attributes are mastered. The correct response probability increases progressively with participants' mastery of the required attributes, reaching a highest probability defined as

$${h}_i=\frac{\exp \left({\gamma}_{0i}+\sum_{k=1}^{K}{\gamma}_{1ki}{q}_{ik}\right)}{1+\exp \left({\gamma}_{0i}+\sum_{k=1}^{K}{\gamma}_{1ki}{q}_{ik}\right)},$$

which denotes the ideal probability of a correct response to item i when all required attributes are mastered. Typically, ${g}_i$ and ${h}_i$ are referred to as the item guessing and non-slipping parameters, respectively, with ${s}_i=1-{h}_i$ commonly known as the slipping parameter. Finally, the likelihood function of the LLM can be found in Section A3 in the Supplementary Material.
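As a numeric illustration of the LLM item response function, the following Python sketch computes an item's correct-response probability along with its guessing and non-slipping bounds. The parameter values are illustrative only (the paper's own analyses use R/JAGS); the intercept is set to $-2.197$ so that the guessing probability is approximately 0.1, i.e., $\mathrm{logit}(0.1)\approx -2.197$.

```python
import math

def llm_probability(gamma0, gamma1, alpha, q):
    """Correct-response probability for one item under the LLM (Maris, 1999).

    gamma0 : item intercept
    gamma1 : main effects of the item, one per attribute
    alpha  : 0/1 mastery indicators for one participant
    q      : 0/1 Q-matrix entries for the item
    """
    logit = gamma0 + sum(g * a * qk for g, a, qk in zip(gamma1, alpha, q))
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical item measuring attributes 1 and 3 (K = 3)
gamma0, gamma1, q = -2.197, [1.5, 0.0, 2.0], [1, 0, 1]

g_i = llm_probability(gamma0, gamma1, [0, 0, 0], q)  # guessing: no required attribute mastered
p_partial = llm_probability(gamma0, gamma1, [1, 0, 0], q)  # partial mastery raises the probability
h_i = llm_probability(gamma0, gamma1, [1, 1, 1], q)  # non-slipping: all required attributes mastered
```

Because the LLM is compensatory, the probability rises monotonically from $g_i$ toward $h_i$ as more required attributes are mastered.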
2.2 Teamwork cognitive diagnostic models
Before presenting the new models, we highlight several key points of this study. First, we focus on the dyadic team structure (i.e., each team t [t = 1, 2, …, T] contains two members, A(t) and B(t)), as it is the foundational form of many team configurations, such as "one-with-many" and leaderless teams (Kenny et al., 2006).
Second, teamwork tasks typically include individual response items (answered independently by each member) alongside teamwork response items (answered collaboratively by all team members). Since individual and teamwork response items measure the same latent constructs, this combination provides a benchmark for assessing the advantages or disadvantages of teamwork (Hao et al., 2017, 2019). Based on this, the observed responses to teamwork response items can take two primary modes: (1) the teamwork-unified response mode, where team members collaborate to produce a single unified response (e.g., Woolley et al., 2010; Zhang et al., 2024), and (2) the teamwork-separate response mode, where team members work together but submit individual responses separately (e.g., Li et al., 2023; Wilson et al., 2017; Yuan et al., 2019). Typically, the teamwork-separate response mode serves as a precursor to the teamwork-unified response mode, as members must first generate individual responses before synthesizing them into a unified submission. Note that this study does not aim to evaluate the relative advantages or disadvantages of the two response modes; rather, it treats them as two distinct testing scenarios that may arise in practice. For tasks adopting the teamwork-unified response mode, the complete response data comprise three item response matrices: two individual response matrices, one for each team member (i.e., ${\boldsymbol{Y}}_{A(t)}$ and ${\boldsymbol{Y}}_{B(t)}$), and one matrix for the teamwork-unified items (i.e., ${\boldsymbol{Y}}_{AB(t)}^{\ast }$). In contrast, for tasks using the teamwork-separate response mode, the response data consist of four matrices: two for individual responses (i.e., ${\boldsymbol{Y}}_{A(t)}$ and ${\boldsymbol{Y}}_{B(t)}$) and two member-specific teamwork response matrices (i.e., ${\boldsymbol{Y}}_{A(t)}^{\ast }$ and ${\boldsymbol{Y}}_{B(t)}^{\ast }$). Separate models are constructed for each teamwork response mode.
2.2.1 Models for tasks adopting teamwork-separate response mode
The proposed model contains two components: one for individual item responses and another for teamwork-separate item responses. Let ${Y}_{A(t)i}$ and ${Y}_{B(t)i}$ be the observed individual responses of team members A and B of team t to individual response item i (i = 1, …, ${I}_1$), respectively, and let ${Y}_{A(t)i}^{\ast }$ and ${Y}_{B(t)i}^{\ast }$ be the observed teamwork-separate responses of team members A and B of team t to teamwork response item i (i = 1, …, ${I}_2$), respectively.
For individual response items, the LLM can be applied to each team member as

$$P\left({Y}_{\#(t)i}=1\right)=\frac{\exp \left({\gamma}_{0i}+\sum_{k=1}^{K}{\gamma}_{1ki}{\alpha}_{\#(t)k}{q}_{ik}\right)}{1+\exp \left({\gamma}_{0i}+\sum_{k=1}^{K}{\gamma}_{1ki}{\alpha}_{\#(t)k}{q}_{ik}\right)}, \tag{2}$$

namely,

$$\operatorname{logit} P\left({Y}_{A(t)i}=1\right)={\gamma}_{0i}+\sum_{k=1}^{K}{\gamma}_{1ki}{\alpha}_{A(t)k}{q}_{ik},$$
$$\operatorname{logit} P\left({Y}_{B(t)i}=1\right)={\gamma}_{0i}+\sum_{k=1}^{K}{\gamma}_{1ki}{\alpha}_{B(t)k}{q}_{ik},$$

where # denotes team member A or B; $P\left({Y}_{A(t)i}=1\right)$ and $P\left({Y}_{B(t)i}=1\right)$ are the probabilities that team members A and B of team t correctly respond to item i, respectively; and ${\alpha}_{A(t)k}$ and ${\alpha}_{B(t)k}$ denote the individual cognitive attribute statuses of team members A and B on attribute k, respectively. This is equivalent to dividing the N participants equally into two groups of T = N/2 participants each, with each group responding to the same ${I}_1$ items; ${q}_{ik}$ is an element of the sub-Q-matrix corresponding to individual response items.
For teamwork response items, a major modeling challenge lies in accurately representing the interaction or interdependence between team members. Drawing on the theoretical framework of team cognition (e.g., Cannon-Bowers et al., 1993; Salas et al., 2017), the present study conceptualizes team cognition as a team-level construct emerging from the interaction of team members' cognitive attributes, capturing both the team's collective cognitive capacity (e.g., shared knowledge and skills) and the quality of teamwork (e.g., the alignment, coordination, and communication among team members). In other words, team cognitions are continuous, team-level constructs that correspond to individual cognitive attributes. Hence, the emergence of a specific team cognition is contingent upon at least one team member having mastered the relevant cognitive attribute, which establishes the foundation for shared understanding; the quality of teamwork then further shapes the level of team cognition, with high-quality teamwork enhancing it and low-quality teamwork diminishing it (DeChurch & Mesmer-Magnus, 2010; Lewis, 2003). Figure 1 displays the structural diagram of team cognitions in this study.

Figure 1 The structural diagram of team cognitions. Note: A and B are two team members.
In such a case, a team cognition parameter can be introduced into Equation (2) to account for the interaction or interdependence between team members. Accordingly, the item response function for teamwork-separate responses can be specified as

$$P\left({Y}_{\#(t)i}^{\ast }=1\right)=\frac{\exp \left({\gamma}_{0i}+\sum_{k=1}^{K}\left({\gamma}_{1ki}{\alpha}_{\#(t)k}+{\Theta}_{tk}\right){q}_{ik}^{\ast }\right)}{1+\exp \left({\gamma}_{0i}+\sum_{k=1}^{K}\left({\gamma}_{1ki}{\alpha}_{\#(t)k}+{\Theta}_{tk}\right){q}_{ik}^{\ast }\right)}, \tag{3}$$

namely,

$$\operatorname{logit} P\left({Y}_{A(t)i}^{\ast }=1\right)={\gamma}_{0i}+\sum_{k=1}^{K}\left({\gamma}_{1ki}{\alpha}_{A(t)k}+{\Theta}_{tk}\right){q}_{ik}^{\ast },$$
$$\operatorname{logit} P\left({Y}_{B(t)i}^{\ast }=1\right)={\gamma}_{0i}+\sum_{k=1}^{K}\left({\gamma}_{1ki}{\alpha}_{B(t)k}+{\Theta}_{tk}\right){q}_{ik}^{\ast },$$

where ${\Theta}_{tk}$ represents the team cognition formed by team t on cognitive attribute k, ${q}_{ik}^{\ast }$ is an element of the sub-Q-matrix corresponding to teamwork response items, and all other parameters are as defined above. This model assumes that the probability of a correct teamwork-separate response is jointly determined by the individual member's mastery of the required cognitive attributes and the corresponding team cognition formed through member interactions.
Furthermore, this study hypothesizes that each team member's cognitive attributes contribute to teamwork responses indirectly by shaping shared team cognition through three distinct interaction mechanisms. The first, termed the disjunctive interaction, assumes that if any team member has mastered a given cognitive attribute, the team as a whole possesses that attribute as the basis for team cognition (cf. DeChurch & Mesmer-Magnus, 2010; Xue et al., 2018), with the degree of team cognition influenced by the quality of teamwork. The second, termed the additive interaction, assumes that the degree of team cognition is influenced not only by teamwork quality but also by the cumulative cognitive attributes jointly mastered by team members (cf. Hao et al., 2017; Salas et al., 2017). The third, termed the conjunctive interaction, requires both team members to have mastered a specific cognitive attribute to establish the foundation for the corresponding team cognition, with the degree of team cognition again influenced by the quality of teamwork (cf. Cooke, 2015).
Specifically, given these three distinct interaction mechanisms, the team cognition parameter can be defined as

$${\Theta}_{tk}=\begin{cases}{\tau}_t\cdot \max \left({\alpha}_{A(t)k},{\alpha}_{B(t)k}\right), & \text{disjunctive interaction},\\ {\tau}_t\cdot \left({\alpha}_{A(t)k}+{\alpha}_{B(t)k}\right), & \text{additive interaction},\\ {\tau}_t\cdot {\alpha}_{A(t)k}{\alpha}_{B(t)k}, & \text{conjunctive interaction},\end{cases} \tag{4}$$

where ${\tau}_t\sim N\left(0,{\sigma}_{\tau}^2\right)$ is a team-level random-effect parameter, referred to as the teamwork-effect parameter, which captures the quality of teamwork between the members of team t. Higher values indicate relatively higher teamwork quality compared with other teams in the sampled population; the parameter reflects a team's relative position on a latent continuum of teamwork quality rather than a deterministic judgment of good or bad performance. Specifically, when ${\tau}_t>0$, teamwork provides benefits, enhancing team cognition and increasing the probability of correct teamwork responses. When ${\tau}_t=0$, collaboration among team members is ineffective and no team cognition is formed; in this case, each member contributes to the teamwork response separately, resembling a division of labor without actual collaboration. When ${\tau}_t<0$, teamwork impairs team cognition, reducing the probability of correct teamwork responses.
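The three cases for ${\tau}_t$ can be verified numerically. The sketch below, in Python with illustrative parameter values, assumes the team cognition term enters the logit additively alongside the member's own main effects, as in Equation (3); the disjunctive mechanism is used here, so for a required attribute mastered by at least one member the team cognition equals ${\tau}_t$.

```python
import math

def separate_response_prob(gamma0, gamma1, alpha_member, theta_team, q_star):
    """P(correct teamwork-separate response) for one member on one item:
    the member's own mastery and the team cognition Theta both enter the logit."""
    logit = gamma0 + sum((g * a + th) * qk
                         for g, a, th, qk in zip(gamma1, alpha_member, theta_team, q_star))
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical item requiring two attributes; member A has mastered only the first
gamma0, gamma1, q_star = -1.0, [1.2, 0.8], [1, 1]
alpha_A = [1, 0]

p_pos  = separate_response_prob(gamma0, gamma1, alpha_A, [0.5, 0.5], q_star)    # tau > 0: teamwork helps
p_zero = separate_response_prob(gamma0, gamma1, alpha_A, [0.0, 0.0], q_star)    # tau = 0: no team cognition
p_neg  = separate_response_prob(gamma0, gamma1, alpha_A, [-0.5, -0.5], q_star)  # tau < 0: teamwork hurts
```

With all else fixed, the response probability is ordered as p_neg < p_zero < p_pos, mirroring the interpretation of negative, zero, and positive teamwork effects.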
For ease of understanding, consider a hypothetical example in which the individual cognitive attribute patterns of the two members of team t are (1, 0, 0)′ and (1, 1, 1)′, respectively. Based on Equation (4), the resulting team cognition pattern varies by interaction mechanism: $\left({\tau}_t,{\tau}_t,{\tau}_t\right)'$ under disjunctive interaction, $\left(2{\tau}_t,{\tau}_t,{\tau}_t\right)'$ under additive interaction, and $\left({\tau}_t,0,0\right)'$ under conjunctive interaction. The team cognition pattern, together with the individual-level attribute patterns, then jointly influences the probability of a correct teamwork response, as defined in Equation (3).
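The worked example can be reproduced in a short Python sketch. The max/sum/product operationalizations of the three mechanisms are inferred from the example patterns in the text and are illustrative:

```python
def team_cognition(alpha_A, alpha_B, tau, mechanism):
    """Team cognition Theta_tk per attribute under the three interaction mechanisms."""
    if mechanism == "disjunctive":   # at least one member has mastered the attribute
        return [tau * max(a, b) for a, b in zip(alpha_A, alpha_B)]
    if mechanism == "additive":      # cumulative mastery across both members
        return [tau * (a + b) for a, b in zip(alpha_A, alpha_B)]
    if mechanism == "conjunctive":   # both members must have mastered the attribute
        return [tau * a * b for a, b in zip(alpha_A, alpha_B)]
    raise ValueError(f"unknown mechanism: {mechanism}")

# Worked example from the text: patterns (1, 0, 0)' and (1, 1, 1)'
alpha_A, alpha_B, tau = [1, 0, 0], [1, 1, 1], 0.7
# disjunctive -> (tau, tau, tau); additive -> (2*tau, tau, tau); conjunctive -> (tau, 0, 0)
```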
2.2.2 Models for tasks adopting teamwork-unified response mode
Unlike the teamwork-separate response mode, in which (setting team cognition aside) each member's response probability is determined solely by his or her own mastery of the required attributes, the teamwork-unified response mode assumes that the team's unified response probability is jointly influenced by the attribute mastery of both members. Therefore, the item response function for teamwork-unified responses can be specified as

$$P\left({Y}_{AB(t)i}^{\ast }=1\right)=\frac{\exp \left({\gamma}_{0i}+\sum_{k=1}^{K}\left({\gamma}_{1ki}\frac{{\alpha}_{A(t)k}+{\alpha}_{B(t)k}}{2}+{\Theta}_{tk}\right){q}_{ik}^{\ast }\right)}{1+\exp \left({\gamma}_{0i}+\sum_{k=1}^{K}\left({\gamma}_{1ki}\frac{{\alpha}_{A(t)k}+{\alpha}_{B(t)k}}{2}+{\Theta}_{tk}\right){q}_{ik}^{\ast }\right)}, \tag{5}$$

where $P\left({Y}_{AB(t)i}^{\ast }=1\right)$ denotes the probability that team t produces a correct teamwork-unified response to item i; Equation (4) is incorporated accordingly, and all other parameters are as previously defined. This model illustrates how the team cognition formed by the two members, along with their individual attribute mastery, jointly influences the probability of a correct teamwork-unified response. In this case, the members no longer have separate probabilities of a correct teamwork response. Note that, given the interpersonal equivalence of cognitive attributes, the model does not differentiate the relative contributions of each member's attribute mastery to the teamwork-unified response probability; that is, it assumes equal contributions from both members (Yuan et al., 2019).
2.2.3 Explanatory models with covariates
In the proposed models above, the teamwork-effect parameter (${\tau}_t$) is a team-level random effect that is freely estimated (cf. Wilson et al., 2017). Typically, information such as communication between team members can serve as an indicator of teamwork quality (e.g., Li et al., 2023; Woolley et al., 2010). Consequently, it is reasonable to assume that the teamwork-effect parameter depends on covariates reflecting teamwork quality, such as the frequency of communication between team members and the number of teamwork items answered consistently by both members (see Footnote 2).

Inspired by explanatory models (Park et al., 2018; Wilson & De Boeck, 2004), the teamwork-effect parameter in Equation (4) can be further deconstructed as

$${\tau}_t=\sum_{h=1}^{H}{\tau}_{1h}{z}_{ht}+{\varepsilon}_t,$$

where ${z}_{ht}$ is teamwork covariate h (h = 1, 2, …, H) of team t, weighted by coefficient ${\tau}_{1h}$, and ${\varepsilon}_t\sim N\left(0,{\sigma}_{\varepsilon}^2\right)$ is the residual term.
Overall, this study proposes 12 Team-CDMs based on three design factors: the teamwork response mode (separate and unified), the teamwork interaction mechanism (disjunctive, additive, and conjunctive), and the teamwork-effect specification (random-effect and covariate-modulated). Specifically, combining Equations (2)–(4) results in three models for teamwork-separate responses with different teamwork interaction mechanisms:
– M1: Disjunctive model for teamwork-separate responses;
– M2: Additive model for teamwork-separate responses;
– M3: Conjunctive model for teamwork-separate responses.
Combining Equations (2), (4), and (5) results in another three models for teamwork-unified responses with different teamwork interaction mechanisms:
– M4: Disjunctive model for teamwork-unified responses;
– M5: Additive model for teamwork-unified responses;
– M6: Conjunctive model for teamwork-unified responses.
Deconstructing the random-effect teamwork-effect parameter in the above six models through the inclusion of teamwork covariates yields six corresponding explanatory models:
– M7: Disjunctive model for teamwork-separate responses with covariates;
– M8: Additive model for teamwork-separate responses with covariates;
– M9: Conjunctive model for teamwork-separate responses with covariates;
– M10: Disjunctive model for teamwork-unified responses with covariates;
– M11: Additive model for teamwork-unified responses with covariates;
– M12: Conjunctive model for teamwork-unified responses with covariates.
To facilitate the presentation, models M1–M6 are collectively referred to as standard models, whereas models M7–M12 are designated explanatory models. Figure 2 provides graphical representations and a summary of the proposed Team-CDMs. The likelihood functions of the proposed models can be found in Section A3 in the Supplementary Material.

Figure 2 Graphical representations and summary of the teamwork cognitive diagnostic models (CDMs). Note: (a) Models for tasks adopting the teamwork-separate response mode; (b) models for tasks adopting the teamwork-unified response mode; (c) summary of the 12 teamwork CDMs by the three modeling factors. M1: disjunctive model for teamwork-separate responses; M2: additive model for teamwork-separate responses; M3: conjunctive model for teamwork-separate responses; M4: disjunctive model for teamwork-unified responses; M5: additive model for teamwork-unified responses; M6: conjunctive model for teamwork-unified responses; M7: disjunctive model for teamwork-separate responses with covariates; M8: additive model for teamwork-separate responses with covariates; M9: conjunctive model for teamwork-separate responses with covariates; M10: disjunctive model for teamwork-unified responses with covariates; M11: additive model for teamwork-unified responses with covariates; M12: conjunctive model for teamwork-unified responses with covariates. $\Theta$ = team cognition; A and B = team members A and B; z = covariate; $\otimes$ = interaction mechanism; ${I}_1$ = number of individual response items; ${I}_2$ = number of teamwork response items.
2.3 Bayesian parameter estimation
A fully Bayesian Markov chain Monte Carlo (MCMC) algorithm is employed to estimate the parameters of the proposed models. This approach simplifies the estimation of complex models by simulating the posterior distributions of item parameters and latent variables, enabling subsequent inference on the model parameters. JAGS (Version 4.3.2; Plummer, 2015) is invoked from within the R environment to implement the Bayesian MCMC algorithm. For detailed guidance on using JAGS for Bayesian MCMC estimation in CDMs, see Zhan, Jiao, Man, et al. (2019).
Except for the teamwork-effect parameter, the prior distributions of the other parameters follow those outlined in Zhan, Jiao, Man, et al. (2019). To begin with, item responses are assumed to be independently Bernoulli distributed: ${Y}_{\#(t)i}\sim \mathrm{Bernoulli}\left(P\left({Y}_{\#(t)i}=1\right)\right)$ and ${Y}_{ti}\sim \mathrm{Bernoulli}\left(P\left({Y}_{ti}=1\right)\right)$. Imposing the monotonicity restriction, the priors of the item parameters are specified as ${\gamma}_{0i}\sim N\left(-2.197,1\right)$ and ${\gamma}_{1ki}\sim {N}^{+}\left(0,{2}^2\right)$. Let ${\pi}_c=P\left({\boldsymbol{\alpha}}_c\right)$ be the marginal probability of attribute pattern c in the population (i.e., the mixing proportion), and let $\boldsymbol{\pi} ={\left({\pi}_1,\dots, {\pi}_C\right)}^{\prime }$ be a C-dimensional probability vector over attribute patterns with ${\sum}_{c=1}^C{\pi}_c=1$, where $C={2}^K$ is the number of possible attribute patterns. The prior distribution of the attributes is set as $c\sim \mathrm{Categorical}\left(\boldsymbol{\pi} \right)$, $\boldsymbol{\pi} \sim \mathrm{Dirichlet}\left(\boldsymbol{\lambda} \right)$, and $\boldsymbol{\lambda} ={\left(1,\dots, 1\right)}^{\prime }$, indicating that participant n's (i.e., A(t)'s or B(t)'s) attribute pattern follows a categorical distribution; if participant n is assigned to the cth pattern, then ${\boldsymbol{\alpha}}_n={\boldsymbol{\alpha}}_c$. For standard Team-CDMs, the prior of the teamwork-effect parameter is ${\tau}_t\sim N\left(0,{\sigma}_{\tau}^2\right)$ with hyper-prior ${\sigma}_{\tau}^2\sim \mathrm{InvGamma}\left(1,1\right)$. In contrast, for explanatory Team-CDMs, the prior of the hth coefficient is ${\tau}_{1h}\sim N\left(0,1\right)$, and the prior of the residual term is ${\varepsilon}_t\sim N\left(0,{\sigma}_{\varepsilon}^2\right)$ with hyper-prior ${\sigma}_{\varepsilon}^2\sim \mathrm{InvGamma}\left(2,0.5\right)$.
Finally, the posterior mean is used as the point estimate of continuous parameters, such as item parameters and teamwork-effect parameters, whereas the posterior mode is used as the point estimate of the individual cognitive attributes.
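These two point-estimation rules can be sketched as follows; the draws here are toy values for illustration, whereas in practice they would come from the retained MCMC samples (e.g., the JAGS output):

```python
from collections import Counter

def posterior_mean(draws):
    """Point estimate for a continuous parameter (e.g., a teamwork effect tau_t):
    the mean of its retained posterior draws."""
    return sum(draws) / len(draws)

def posterior_mode_pattern(pattern_draws):
    """Point estimate for a discrete attribute pattern: the most frequently
    sampled pattern across the retained posterior draws."""
    counts = Counter(tuple(p) for p in pattern_draws)
    return counts.most_common(1)[0][0]

# Toy posterior draws (hypothetical values)
tau_draws = [0.42, 0.55, 0.38, 0.61, 0.49]
alpha_draws = [(1, 0, 1), (1, 0, 1), (1, 1, 1), (1, 0, 1)]

tau_hat = posterior_mean(tau_draws)
alpha_hat = posterior_mode_pattern(alpha_draws)
```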
3 Simulation study
We conducted two simulation studies to evaluate the proposed models. The first assessed the psychometric performance of the 12 Team-CDMs under various simulated testing scenarios. The second examined the risks of omitting the teamwork-effect parameter, or team cognition entirely, when analyzing teamwork response data; it also investigated the negative consequences of analyzing only individual response items in a teamwork-based assessment and evaluated each model's performance when analyzing only collaborative response items.
3.1 Simulation Study 1
3.1.1 Design and data generation
In addition to using each of the 12 Team-CDMs as data-generating models, two factors were manipulated: first, the number of teams (T), at four levels (50, 100, 200, and 400), corresponding to totals of 100, 200, 400, and 800 participants; second, the number of individual and teamwork items (${I}_1={I}_2$), at two levels (15 and 30), corresponding to totals of 30 and 60 items. Consistent with the empirical study (see Section 4), five individual latent attributes (K = 5) were diagnosed, with the Q-matrices shown in Figure 3. Each Q-matrix comprises two submatrices: one corresponding to individual response items and the other to teamwork response items. Each submatrix included at least one identity matrix for completeness, and each attribute was measured at least three times to further ensure model identifiability (Gu & Xu, 2020; Köhn & Chiu, 2017; Xu & Zhang, 2016). In addition, each attribute was measured approximately the same number of times to mitigate the potential impact of unbalanced attribute specifications on parameter estimation (Cheng, 2010).

Figure 3 $K$-by-$I$ $\mathbf{Q}^{\prime}$-matrix for the simulation studies. Note: Gray means “1” and blank means “0”; “*” denotes teamwork response items. (a) Individual and teamwork response items each contain 15 items. (b) Individual and teamwork response items each contain 30 items.
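The Q-matrix construction logic described above (an identity submatrix for completeness, every attribute measured at least three times, and roughly balanced attribute coverage) can be sketched as follows. This is a hypothetical illustration of such a construction, not the actual Q-matrices shown in Figure 3:

```python
import numpy as np

def build_q_submatrix(n_items, K=5):
    """Build an n_items-by-K Q-submatrix containing an identity block
    (one single-attribute item per attribute, for completeness) plus
    two-attribute items cycling over adjacent attribute pairs so that
    every attribute is measured a roughly equal number of times."""
    rows = [np.eye(K, dtype=int)]                # K single-attribute items
    for j in range(n_items - K):                 # remaining two-attribute items
        q = np.zeros((1, K), dtype=int)
        q[0, j % K] = 1
        q[0, (j + 1) % K] = 1
        rows.append(q)
    return np.vstack(rows)

# One submatrix for individual items, one for teamwork items (I1 = I2 = 15)
Q_individual = build_q_submatrix(15)
Q_teamwork = build_q_submatrix(15)
Q_full = np.vstack([Q_individual, Q_teamwork])   # 30-by-5 task Q-matrix
attribute_counts = Q_full.sum(axis=0)            # times each attribute is measured
```

With this scheme each submatrix measures every attribute five times, so the identifiability conditions quoted above (identity block, at least three measurements per attribute, balanced coverage) are satisfied by construction.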
Other parameters were generated from specific distributions. Since the standard and explanatory models define the teamwork-effect parameters differently, we used two distinct methods to generate these parameters. For standard models (M1–M6), a multivariate normal distribution was first used to generate the two members’ general abilities (e.g., fluid intelligence in the empirical study) and their teamwork effect:

This setup assumes a moderate positive correlation between the general abilities of the two members, and a low positive correlation between their respective general abilities and the teamwork effect. For explanatory models (M7–M12), H = 3 covariates (${z}_1$, ${z}_2$, and ${z}_3$) were considered, all standardized for simplicity. Specifically, each covariate was generated from a standard normal distribution, with equal weights represented as ${\boldsymbol{\tau}}_1={\left(0.5,0.5,0.5\right)}^{\prime}$. Then, a multivariate normal distribution was used to generate the two members’ general abilities and the residual term of the teamwork effect:

Under these settings, the three covariates account for 75% of the variance in the teamwork effect, with the variance of the teamwork-effect parameter still constrained to 1 (i.e., ${\sigma}_{\tau}^2=1$). Additionally, the correlation between the teamwork effect and each general ability is fixed at 0.3, consistent with the specifications in Equation (6).
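As a numerical check on these settings, the covariate-based generation of the teamwork effect can be sketched as below: with equal weights of 0.5 and a residual variance of $1 - 3 \times 0.5^2 = 0.25$, the covariates explain 0.75/1 = 75% of the variance. The joint generation with the two members’ general abilities (correlation 0.3) is omitted here for brevity:

```python
import numpy as np

rng = np.random.default_rng(42)
T = 200_000                                   # many teams, to verify the variances
tau1 = np.array([0.5, 0.5, 0.5])              # equal covariate weights
Z = rng.standard_normal((T, 3))               # standardized covariates z1, z2, z3
eps = rng.normal(0.0, np.sqrt(0.25), T)       # residual variance: 1 - 3 * 0.5**2
tau = Z @ tau1 + eps                          # teamwork effect, Var(tau) = 1

explained = (Z @ tau1).var() / tau.var()      # share of variance from covariates
```

Running this confirms that the empirical variance of the teamwork effect is close to 1 and that the explained share is close to 0.75.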
Subsequently, a higher-order latent structural model was applied to both standard and explanatory models to generate individual cognitive attributes, enabling the incorporation of correlations among these attributes (de la Torre & Douglas, Reference de la Torre and Douglas2004), as

$$P\left({\alpha}_{\#(t)k}=1\mid {\theta}_{\#(t)}\right)=\frac{\exp \left({\lambda}_{0k}+{\lambda}_{1k}{\theta}_{\#(t)}\right)}{1+\exp \left({\lambda}_{0k}+{\lambda}_{1k}{\theta}_{\#(t)}\right)},$$
where ${\lambda}_{0k}$ is the attribute intercept parameter (fixed at ${\boldsymbol{\lambda}}_0={\left(-1.5,-1,0,1,1.5\right)}^{\prime}$) and ${\lambda}_{1k}$ is the attribute slope parameter (fixed at ${\boldsymbol{\lambda}}_1={\left(1.5,1.5,1.5,1.5,1.5\right)}^{\prime}$). Then, the true mastery of each member on each attribute was generated from a Bernoulli distribution: ${\alpha}_{\#(t)k}\sim \mathrm{Bernoulli}\left(P\left({\alpha}_{\#(t)k}=1\right)\right)$.
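Assuming the usual logistic link of the higher-order model (intercept plus slope times the general ability), this attribute-generation step can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(7)
lam0 = np.array([-1.5, -1.0, 0.0, 1.0, 1.5])   # attribute intercepts
lam1 = np.full(5, 1.5)                         # attribute slopes
N = 50_000                                     # participants (two per team)

theta = rng.standard_normal(N)                 # higher-order (general) ability
# assumed link: logit P(alpha_k = 1) = lam0_k + lam1_k * theta
p = 1.0 / (1.0 + np.exp(-(lam0 + lam1 * theta[:, None])))
alpha = rng.binomial(1, p)                     # true mastery, N-by-5

base_rates = alpha.mean(axis=0)                # marginal mastery proportions
```

Because all attributes load on the same higher-order ability, the simulated mastery indicators are positively correlated, and the increasing intercepts produce increasing marginal mastery rates across the five attributes.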
For item parameters, item intercept parameters were generated from ${\gamma}_{0i}\sim N\left(-2.197,{0.5}^2\right)$ and item main-effect parameters were generated from ${\gamma}_{1ki}\sim N\left(\frac{4.394}{\sum_{k=1}^K{q}_{ik}},{0.5}^2\right)$. In this setting, the guessing parameters ${g}_i$ follow a positively skewed distribution across items (mean ≈ 0.1, minimum ≈ 0.05, maximum ≈ 0.45), and the non-slipping parameters ${h}_i$ follow a negatively skewed distribution (mean ≈ 0.9, minimum ≈ 0.55, maximum ≈ 0.95). Consequently, ${g}_i$ and $1-{h}_i$ are negatively correlated, which is more realistic (Zhan, Jiao, Liao, et al., Reference Zhan, Jiao, Liao and Bian2019).
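The implied item-parameter distributions can be illustrated as follows. The logistic transform from intercepts and main effects to guessing and non-slipping probabilities is assumed here (note that $\mathrm{logit}(0.1)=-2.197$ and $-2.197+4.394=\mathrm{logit}(0.9)$), and the number of required attributes per item is an arbitrary placeholder:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(3)
n_items = 20_000                              # many items, to inspect the ranges
k_req = rng.integers(1, 4, n_items)           # required attributes per item (1-3)

gamma0 = rng.normal(-2.197, 0.5, n_items)     # intercepts; logit(0.1) = -2.197
# main effects: mean 4.394 / k, so the summed effects center on 4.394
gamma1_sum = np.array([rng.normal(4.394 / k, 0.5, k).sum() for k in k_req])

g = sigmoid(gamma0)                           # guessing: P(correct | no mastery)
h = sigmoid(gamma0 + gamma1_sum)              # non-slipping: P(correct | full mastery)
r_g_slip = np.corrcoef(g, 1.0 - h)[0, 1]      # shared intercept -> negative corr
```

Because $g_i$ increases with ${\gamma}_{0i}$ while the slipping probability $1-h_i$ decreases with it, the two are negatively correlated by construction, matching the claim above.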
Finally, the observed individual responses were generated as ${Y}_{\#(t)i}\sim \mathrm{Bernoulli}\left(P\left({Y}_{\#(t)i}=1\right)\right)$, where $P\left({Y}_{\#(t)i}=1\right)$ is defined in Equation (2); the observed teamwork-separate responses were generated as ${Y}_{\#(t)i}^{\ast}\sim \mathrm{Bernoulli}\left(P\left({Y}_{\#(t)i}^{\ast}=1\right)\right)$, where $P\left({Y}_{\#(t)i}^{\ast}=1\right)$ is defined in Equation (3); and the observed teamwork-unified responses were generated as ${Y}_{AB(t)i}^{\ast}\sim \mathrm{Bernoulli}\left(P\left({Y}_{AB(t)i}^{\ast}=1\right)\right)$, where $P\left({Y}_{AB(t)i}^{\ast}=1\right)$ is defined in Equation (5). A total of 50 datasets were generated in each simulated condition.
3.1.2 Analysis
A total of 12 Team-CDMs were used to analyze the data generated under each model. For each simulated condition, 50 replications were conducted using two Markov chains with random starting points. After discarding the first 3,000 of 5,000 iterations per chain as burn-in, 4,000 iterations (2,000 per chain) were retained for parameter inference. To assess convergence, the potential scale reduction factor (PSRF; Brooks & Gelman, Reference Brooks and Gelman1998) was computed for each parameter. In this study, PSRF values were generally below 1.01 and all below 1.2, indicating good convergence under the specified settings.
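A minimal sketch of the PSRF computation used for this convergence check (the basic Gelman-Rubin form; the Brooks-Gelman corrected version adds a small sampling-variability factor):

```python
import numpy as np

def psrf(chains):
    """Potential scale reduction factor for one parameter.
    `chains` is an (m, n) array: m chains of n retained draws each."""
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()        # mean within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)      # between-chain variance
    var_hat = (n - 1) / n * W + B / n            # pooled variance estimate
    return float(np.sqrt(var_hat / W))

rng = np.random.default_rng(0)
mixed = rng.standard_normal((2, 2000))           # two well-mixed chains
stuck = mixed + np.array([[0.0], [5.0]])         # chains centered far apart
```

For well-mixed chains the PSRF is close to 1; chains that have not converged to the same region produce values well above the 1.2 (or stricter 1.01) cutoff used here.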
The bias and root-mean-square error (RMSE) were computed to evaluate the recovery of continuous parameters (e.g., item parameters and the teamwork-effect parameter) as $bias\left(\widehat{x}\right)={\sum}_{r=1}^{50}\frac{{\widehat{x}}_r-{x}_r}{50}$ and $RMSE\left(\widehat{x}\right)=\sqrt{\sum_{r=1}^{50}\frac{{\left({\widehat{x}}_r-{x}_r\right)}^2}{50}}$, respectively, where ${x}_r$ and ${\widehat{x}}_r$ are the generated and estimated values of the model parameter in replication r. In addition, the correlations between the generated and estimated values (denoted as Cor) of the model parameters were computed. The classification accuracy of attributes (CAA) and the classification accuracy of attribute patterns (CAP) were calculated to evaluate diagnostic classification accuracy. For CAA, the accuracy for each attribute was computed as ${CAA}_k=\frac{\sum_{r=1}^{50}{\sum}_{n=1}^NI\left({\widehat{\alpha}}_{nkr}={\alpha}_{nkr}\right)}{NR}$, where N represents the total number of participants (i.e., 2T) and R = 50 is the number of replications. Two types of CAPs were calculated: one based on 5 attributes for individual members and another based on 10 attributes for the two members of a team. The former evaluates the classification accuracy of cognitive attribute patterns for individual participants, whereas the latter treats the team as a whole, assessing the proportion of teams in which the cognitive attribute patterns of both members are accurately classified. Specifically, the CAP for individuals was computed as ${CAP}_{individuals}=\frac{\sum_{r=1}^{50}{\sum}_{n=1}^NI\left({\widehat{\boldsymbol{\alpha}}}_{nr}={\boldsymbol{\alpha}}_{nr}\right)}{NR}$, and for teams as ${CAP}_{Team}=\frac{\sum_{r=1}^{50}{\sum}_{t=1}^TI\left({\left({\widehat{\boldsymbol{\alpha}}}_{A(t)r},{\widehat{\boldsymbol{\alpha}}}_{B(t)r}\right)}^{\prime}={\left({\boldsymbol{\alpha}}_{A(t)r},{\boldsymbol{\alpha}}_{B(t)r}\right)}^{\prime}\right)}{TR}$.
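The evaluation metrics above can be sketched directly; array shapes are hypothetical (R replications, N participants, K attributes), and the team-level CAP is the same pattern check applied to the concatenated profiles of both members:

```python
import numpy as np

def bias_rmse(est, true):
    """Bias and RMSE per parameter over R replications.
    est, true: (R, P) arrays of estimated and generated values."""
    diff = est - true
    return diff.mean(axis=0), np.sqrt((diff ** 2).mean(axis=0))

def classification_rates(alpha_hat, alpha):
    """CAA per attribute and individual-level CAP.
    alpha_hat, alpha: (R, N, K) binary arrays of estimated and true profiles."""
    match = alpha_hat == alpha
    caa = match.mean(axis=(0, 1))                # per-attribute accuracy
    cap = match.all(axis=2).mean()               # whole-pattern accuracy
    return caa, cap

# Toy check: one replication, four participants, two attributes,
# with a single misclassified attribute for the first participant
alpha = np.zeros((1, 4, 2), dtype=int)
alpha_hat = alpha.copy()
alpha_hat[0, 0, 0] = 1
caa, cap = classification_rates(alpha_hat, alpha)

bias, rmse = bias_rmse(np.array([[1.0, 2.0], [3.0, 2.0]]),
                       np.array([[0.0, 2.0], [2.0, 2.0]]))
```

In the toy check, one wrong attribute out of four participants gives a CAA of 0.75 for that attribute and an individual-level CAP of 0.75.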
3.1.3 Results
Table 1 summarizes the computation time of the proposed models across all simulated conditions. Computation time remained highly consistent across the 50 replications within each condition, suggesting stable algorithmic performance. Statistical analysis indicates that the primary factors influencing computation time are those governing data volume: the number of teams, the number of items, and the teamwork response mode. In contrast, the other manipulated variables had minimal impact on computational cost.
Table 1 Computation time of proposed models in Simulation Study 1

Note: T: the number of dyadic teams; I: the number of individual or teamwork response items. All computations were conducted on a Mac mini (Apple M4 Pro, 64 GB Memory) using multi-program parallel processing.
Figure 4 displays the RMSE values for the item guessing and non-slipping parameters, with detailed metrics for bias, RMSE, and correlation (Cor) provided in Table A1 in the Supplementary Material. Overall, item parameter recovery was satisfactory across all models and simulation conditions. Nearly all RMSE values were below 0.1, with the exception of a few non-slipping parameters associated with teamwork response items under small sample size conditions. The explanatory models (M7–M12) slightly outperformed the standard models (M1–M6) in parameter recovery. Similarly, models using the teamwork-separate mode (M1–M3 and M7–M9) demonstrated marginally better recovery than those employing the teamwork-unified mode (M4–M6 and M10–M12). No notable differences were found in parameter recovery between models using different interaction mechanisms. Finally, increasing the number of teams and items, thereby augmenting the amount of available data, significantly improved the accuracy of item parameter recovery.

Figure 4 Root-mean-square error of item parameters in Simulation Study 1. Note: M1: disjunctive model for teamwork-separate responses; M2: additive model for teamwork-separate responses; M3: conjunctive model for teamwork-separate responses; M4: disjunctive model for teamwork-unified responses; M5: additive model for teamwork-unified responses; M6: conjunctive model for teamwork-unified responses; M7: disjunctive model for teamwork-separate responses with covariates; M8: additive model for teamwork-separate responses with covariates; M9: conjunctive model for teamwork-separate responses with covariates; M10: disjunctive model for teamwork-unified responses with covariates; M11: additive model for teamwork-unified responses with covariates; M12: conjunctive model for teamwork-unified responses with covariates; T: the number of dyadic teams; I: the number of individual or teamwork response items; guessing: item guessing parameter, namely, minimum correct response probability of an item; non-slipping: item non-slipping parameter, namely, maximum correct response probability of an item.
Figure 5 presents the classification accuracy of attributes (CAA) and attribute patterns (CAP). Detailed values are available in Table A2 in the Supplementary Material. First, CAA for each attribute was consistently high (above 0.9) across all models and simulated conditions. Second, the standard models and explanatory models exhibited comparable classification accuracy, indicating equivalent performance. Third, models using the teamwork-separate mode (M1–M3 and M7–M9) achieved better classification accuracy than those employing the teamwork-unified mode (M4–M6 and M10–M12). Fourth, no notable differences were found in classification accuracy between models using different interaction mechanisms. Finally, increasing the number of items significantly enhanced classification accuracy for both attributes and attribute patterns.

Figure 5 Classification accuracy of attributes and attribute patterns in Simulation Study 1. Note: M1: disjunctive model for teamwork-separate responses; M2: additive model for teamwork-separate responses; M3: conjunctive model for teamwork-separate responses; M4: disjunctive model for teamwork-unified responses; M5: additive model for teamwork-unified responses; M6: conjunctive model for teamwork-unified responses; M7: disjunctive model for teamwork-separate responses with covariates; M8: additive model for teamwork-separate responses with covariates; M9: conjunctive model for teamwork-separate responses with covariates; M10: disjunctive model for teamwork-unified responses with covariates; M11: additive model for teamwork-unified responses with covariates; M12: conjunctive model for teamwork-unified responses with covariates; T: the number of dyadic teams; I: the number of individual or teamwork response items; Mean_CAA: mean classification accuracy of five attributes; CAP Individuals: classification accuracy of attribute pattern for individual participants (five attributes); CAP team: classification accuracy of attribute pattern for teams (10 attributes).
Figure 6 illustrates the recovery of the teamwork-effect parameter, with detailed values for this parameter and other related parameters (e.g., weights of covariates in the explanatory models) provided in Table A3 in the Supplementary Material. First, recovery accuracy improves as the number of items increases, whereas the number of teams has minimal impact. Second, explanatory models (M7–M12) show superior recovery compared with standard models (M1–M6), emphasizing the advantage of integrating covariates. Third, models using the teamwork-separate mode (M1–M3 and M7–M9) achieve better recovery than those employing the teamwork-unified mode (M4–M6 and M10–M12). Fourth, models based on the additive interaction mechanism (M2, M5, M8, and M11) exhibit slightly better recovery performance compared with those based on the disjunctive and conjunctive interaction mechanisms. Notably, the recovery of the teamwork-effect parameters is poorest under the conjunctive interaction mechanism. One possible explanation is that, in this framework, these parameters affect the response probability only when both members simultaneously master specific required attributes; that is, the prerequisite for social behaviors is relatively stringent. Consequently, fewer teamwork-effect parameters have a meaningful impact on the response data, which restricts their estimation and results in poorer recovery performance.

Figure 6 Root-mean-square error of the teamwork-effect parameter in Simulation Study 1. Note: M1: disjunctive model for teamwork-separate responses; M2: additive model for teamwork-separate responses; M3: conjunctive model for teamwork-separate responses; M4: disjunctive model for teamwork-unified responses; M5: additive model for teamwork-unified responses; M6: conjunctive model for teamwork-unified responses; M7: disjunctive model for teamwork-separate responses with covariates; M8: additive model for teamwork-separate responses with covariates; M9: conjunctive model for teamwork-separate responses with covariates; M10: disjunctive model for teamwork-unified responses with covariates; M11: additive model for teamwork-unified responses with covariates; M12: conjunctive model for teamwork-unified responses with covariates; T: the number of dyadic teams; I: the number of individual or teamwork response items.
Overall, the results of Simulation Study 1 demonstrated that the setup of the 12 proposed Team-CDMs was well justified and that the parameter estimates were sufficiently accurate, indicating good psychometric performance across various testing scenarios.
3.2 Simulation Study 2
3.2.1 Design, data generation, and analysis
This study investigated the risks of omitting team cognitions or teamwork effects in analyzing collaborative response data. It also assessed the drawbacks of focusing solely on individual responses in teamwork assessments and evaluated model performance when analyzing only collaborative responses. Each of the 12 Team-CDMs was used to generate data, with the number of teams fixed at 100 and the number of individual or teamwork items set at 15. For simplicity, this study directly reused the 50 datasets generated under this condition in Simulation Study 1. This relatively limited testing condition was chosen to minimize the ceiling effect on model performance that could arise from an overabundance of information in the data.
For each dataset, five analytical models were applied: (1) the data-generating Team-CDM itself (i.e., M1–M12); (2) a constrained model with the teamwork-effect parameter fixed at 1, which ignores the effect of teamwork quality on team cognition and the probability of correct teamwork responses, retaining only the interaction between team members’ attributes (denoted as M1a–M12a); (3) a constrained model excluding team cognition from the probability of correct teamwork responses altogether, in which the interdependence between team members is ignored (denoted as M1b–M12b); (4) the Team-CDM applied only to individual response data (denoted as M1-1 to M12-1; equivalent to the LLM); and (5) the Team-CDM applied only to collaborative response data (denoted as M1-2 to M12-2).
Notably, fixing the teamwork-effect parameter makes the interaction mechanism the only factor shaping the model’s structure. This effectively eliminates the distinction between each standard model and its corresponding explanatory model, yielding equivalences such as M1a = M7a, M2a = M8a, and so on. Furthermore, when team cognitions are excluded, all models with the same teamwork response mode become identical: M1b = M2b = M3b = M7b = M8b = M9b (models employing the teamwork-separate mode are also equivalent to the LLM) and M4b = M5b = M6b = M10b = M11b = M12b.
To compare the fit of the original and constrained models to the data, three model comparison indices were computed: deviance information criterion (DIC), widely applicable information criterion (WAIC), and leave-one-out cross-validation (LOO). Lower values of these indices indicate a better model fit. The remaining analytical procedures and evaluation metrics were consistent with those used in Simulation Study 1.
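For illustration, WAIC can be computed from a matrix of pointwise log-likelihoods over posterior draws (DIC and LOO follow analogous recipes from the same posterior output); this sketch uses the standard variance-based penalty and reports WAIC on the deviance scale, where lower is better:

```python
import numpy as np

def waic(log_lik):
    """WAIC on the deviance scale (lower is better).
    log_lik: (S, n) pointwise log-likelihoods over S posterior draws."""
    S = log_lik.shape[0]
    # log pointwise predictive density: log of the mean likelihood per point
    lppd = (np.logaddexp.reduce(log_lik, axis=0) - np.log(S)).sum()
    p_waic = log_lik.var(axis=0, ddof=1).sum()   # effective number of parameters
    return -2.0 * (lppd - p_waic)

# Degenerate check: constant log-likelihoods give a zero penalty,
# so WAIC reduces to the plain deviance -2 * sum(log_lik)
constant = np.full((4, 3), -1.0)
```

The log-mean-exp is computed with `np.logaddexp.reduce` for numerical stability, since averaging raw likelihoods underflows for realistic datasets.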
3.2.2 Results
Table 2 presents comparative analyses of model-data fit statistics between the original and constrained models. The results demonstrate that the original parameterization (M1–M12) exhibits a consistently superior fit relative to both constrained variants (M1a–M12a and M1b–M12b) across all conditions. Moreover, all three fit indices successfully identified the theoretically optimal model.
Table 2 Summary of model-data fits in Simulation Study 2

Note: M1: disjunctive model for teamwork-separate responses; M2: additive model for teamwork-separate responses; M3: conjunctive model for teamwork-separate responses; M4: disjunctive model for teamwork-unified responses; M5: additive model for teamwork-unified responses; M6: conjunctive model for teamwork-unified responses; M7: disjunctive model for teamwork-separate responses with covariates; M8: additive model for teamwork-separate responses with covariates; M9: conjunctive model for teamwork-separate responses with covariates; M10: disjunctive model for teamwork-unified responses with covariates; M11: additive model for teamwork-unified responses with covariates; M12: conjunctive model for teamwork-unified responses with covariates; M1a–M12a: M1–M12 with fixed teamwork effect; M1b–M12b: M1–M12 without team cognitions; DIC: deviance information criterion; WAIC: widely applicable information criterion; LOO: leave-one-out cross-validation.
Table A4 in the Supplementary Material reports the recovery of item parameters. First, the original models (M1–M12) achieve the lowest RMSE values across almost all conditions. Second, parameter recovery is relatively poor when analyzing individual response items or collaborative response items alone, though slightly better for the former. This highlights the limitations of focusing solely on individual response items in a collaborative task while also underscoring why they remain necessary: accurate diagnosis of individual cognitive attribute mastery, ensured through individual response items, is essential for isolating and evaluating the unique contribution of team cognitions in the teamwork response items. Third, fixing the teamwork-effect parameter (M1a–M12a) results in a significant increase in RMSE for collaborative response items, underscoring the negative impact of neglecting teamwork quality on parameter estimation, particularly for collaborative tasks. Fourth, excluding team cognition entirely (M1b–M12b) produces RMSE values for individual response items comparable to the unconstrained models; for collaborative response items, however, RMSE values increase moderately. Interestingly, ignoring team cognition entirely yields better item parameter recovery than considering the interaction while neglecting teamwork quality. This finding underscores the importance of accounting for both interaction mechanisms and teamwork quality to accurately model team cognition.
Figure 7 summarizes the CAA and CAP across various models, with detailed CAA values for each attribute provided in Table A5 in the Supplementary Material. First, the Team-CDMs (M1–M12) achieve the highest classification accuracies. Second, classification accuracy is relatively poor when analyzing individual response items or collaborative response items alone. Third, models that fail to fully account for team cognition, whether by ignoring teamwork quality or excluding team cognition entirely, show reduced classification accuracy. Notably, models that exclude team cognition entirely perform slightly better than those that neglect teamwork quality alone, consistent with the findings for item parameter recovery.

Figure 7 Classification accuracy of attributes and attribute patterns in Simulation Study 2. Note: M1: disjunctive model for teamwork-separate responses; M2: additive model for teamwork-separate responses; M3: conjunctive model for teamwork-separate responses; M4: disjunctive model for teamwork-unified responses; M5: additive model for teamwork-unified responses; M6: conjunctive model for teamwork-unified responses; M7: disjunctive model for teamwork-separate responses with covariates; M8: additive model for teamwork-separate responses with covariates; M9: conjunctive model for teamwork-separate responses with covariates; M10: disjunctive model for teamwork-unified responses with covariates; M11: additive model for teamwork-unified responses with covariates; M12: conjunctive model for teamwork-unified responses with covariates; M1-1 to M12-1: M1–M12 for individual response items; M1-2 to M12-2: M1–M12 for collaborative response items; M1a–M12a: M1–M12 with fixed teamwork effect; M1b–M12b: M1–M12 without team cognitions; CAA: classification accuracy of attribute; CAP Individuals: classification accuracy of attribute pattern for individual participants (5 attributes); CAP team: classification accuracy of attribute pattern for teams (10 attributes).
Overall, the results of Simulation Study 2 highlight that accurate modeling of team cognition requires integrating both the teamwork-effect and interaction mechanisms. Models that failed to fully account for these factors—either by neglecting teamwork quality or excluding interaction terms—exhibited poor parameter recovery and classification accuracy, particularly for collaborative response items. Additionally, analyzing individual or collaborative response items in isolation yielded similarly suboptimal results, emphasizing the importance of incorporating both individual contributions and collaborative processes into the modeling of teamwork.
4 Empirical example
4.1 Instrument
A teamwork reasoning task measuring collective intelligence was implemented to demonstrate the application of the proposed Team-CDMs. Collective intelligence can be regarded as an important factor that deeply influences a group’s collaborative problem-solving ability. Similar to how individual intelligence equips an individual with the cognitive means to handle various tasks independently, collective intelligence emerges from the interactions and synergy among group members, endowing the group with a unique capacity to approach and resolve complex problems collaboratively (Woolley et al., Reference Woolley, Chabris, Pentland, Hashmi and Malone2010).
A shortened version of Raven’s Advanced Progressive Matrices (APM; Zhan, Chen, Man, et al., Reference Zhan, Chen, Man and Hao2024) was used as the foundation for this teamwork task to reduce administration time and alleviate participants’ cognitive load. This shortened APM includes 18 items selected from the original 36-item APM using the revised maximum priority index algorithm (Liu et al., Reference Liu, Cai and Tu2018). The selection ensured alignment with the cognitive component distribution of the original APM while maximizing test information under cognitive component constraints. Five cognitive components (Carpenter et al., Reference Carpenter, Just and Shell1990) were treated as cognitive attributes in this study: (α1) constant in a row, (α2) quantitative pairwise progression, (α3) addition/subtraction, (α4) distribution of three, and (α5) distribution of two.
To incorporate the teamwork aspect, the 18 items were divided into two sub-tests with balanced difficulty. One set was randomly assigned as individual response items, and the other as collaborative response items. The original APM’s item serial numbers and their corresponding cognitive attributes (Q-matrix) for individual and teamwork response items are shown in Figure 8d. It should be noted that, owing to the absence of an identity submatrix, the Q-matrix is incomplete and may not fully satisfy the strict identifiability conditions for certain model parameters (Köhn & Chiu, Reference Köhn and Chiu2017; Xu & Zhang, Reference Xu and Zhang2016). This limitation may affect the accuracy of diagnostic classification to some extent. Sample individual and teamwork items are presented in Figure 8a,b. For individual response items, both team members viewed the same item on their screens and responded independently on separate computers without communication. For collaborative response items, resource interdependence was introduced by creating information asymmetry: specific graphs in the matrix (e.g., three in the left or middle column) were masked to encourage verbal interaction. Team members were seated back-to-back to prevent nonverbal communication (e.g., visual or physical cues), as shown in Figure 8c. To ensure simultaneous responses, the system was configured so that a member who submitted an answer first had to wait until the other member also submitted an answer before the team could move on to the next item.

Figure 8 Sample individual and collaborative response items, real test scenario, and Q-matrices. Note: (a) An individual response item consists of a 3 × 3 matrix with figural elements in the matrix area and 12 choices in the response options area; one cell in the matrix is missing and must be selected from the response options; both team members view the same information for the item on their screens. (b) A collaborative response item builds on the individual response item by blocking out part of the graphs, creating information asymmetry. In this case, each team member sees different but complementary information about the item on their screens. (c) Sitting back-to-back prevents communication, such as visual and physical movement. (d) Q-matrices for individual and collaborative response items; gray indicates “1,” whereas white indicates “0.”
Finally, five team cognition constructs corresponding to individual cognitive attributes are formed as (Θ1) shared understanding of constant in a row, (Θ2) shared understanding of quantitative pairwise progression, (Θ3) shared understanding of addition/subtraction, (Θ4) shared understanding of the distribution of three, and (Θ5) shared understanding of the distribution of two.
4.2 Participants
A total of 67 dyadic teams, comprising 134 students, participated in this study (Gender: male = 26, female = 108; Age: M = 20.310, SD = 1.952). All participants were voluntarily recruited from a university in a coastal province in China, and none had previously taken the APM.
4.3 Data analysis
In this study, for collaborative response items, team members were permitted to submit their own individual responses rather than being required to agree on a unified response. This approach enabled the collection of data under both teamwork response modes. First, by recording the responses of each participant for the teamwork items, we naturally obtained data for the teamwork-separate response mode. Second, for the teamwork-unified response mode, a teamwork-unified response was considered correct only if both team members’ answers were consistent and correct (i.e., ${Y}_{ti}=1$ if ${Y}_{A(t)i}={Y}_{B(t)i}=1$). In all other cases, teamwork-unified responses were treated as incorrect (i.e., ${Y}_{ti}=0$ if ${Y}_{A(t)i}={Y}_{B(t)i}=0$, ${Y}_{A(t)i}=1\;\mathrm{and}\;{Y}_{B(t)i}=0$, or ${Y}_{A(t)i}=0\;\mathrm{and}\;{Y}_{B(t)i}=1$).
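On scored 0/1 data, this rule for deriving teamwork-unified responses from the separately recorded responses reduces to a product of the two scores (with raw option data, one would additionally check that the selected options are identical):

```python
import numpy as np

def unify(y_a, y_b):
    """Teamwork-unified score: 1 only when both members' scored responses
    are correct (both correct implies consistent for scored 0/1 data)."""
    return (np.asarray(y_a) * np.asarray(y_b)).astype(int)

y_a = np.array([1, 1, 0, 0])   # member A on four teamwork items
y_b = np.array([1, 0, 1, 0])   # member B on the same items
y_unified = unify(y_a, y_b)    # correct only on the first item
```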
Three teamwork covariates were used in this study: the number of teamwork items answered consistently by the two team members (${z}_1$; M = 8.611, SD = 1.044), the frequency of communication between team members (${z}_2$; M = 172.642, SD = 101.720), and the time difference between the two team members’ submissions of their responses to teamwork items (${z}_3$; in seconds, M = 24.843, SD = 37.958). For ${z}_1$, consistency was recorded only when the options submitted separately by both members were identical, regardless of whether they were correct. For ${z}_2$, the speaking frequency of each team member was recorded individually, and the communication frequency for the pair was defined as the lower of the two frequencies. This approach excluded cases where one member spoke frequently without receiving a response. For ${z}_3$, the total time difference between the two members’ answer submissions across all teamwork items was calculated, without differentiating who waited for whom. Logically, higher values for the first two covariates indicate higher teamwork quality, whereas a higher value for the third covariate suggests lower teamwork quality. To explore the relative importance of these covariates, all three were standardized before being incorporated into the explanatory models.
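A sketch of how the three covariates could be assembled from raw logs for one team, following the definitions above (the function and variable names are illustrative, not the authors' actual processing code):

```python
import numpy as np

def teamwork_covariates(opts_a, opts_b, speak_a, speak_b, t_a, t_b):
    """Raw covariates for one dyad.
    opts_*: selected options per teamwork item; speak_*: per-member speaking
    counts; t_*: per-item submission times in seconds."""
    z1 = int(np.sum(np.asarray(opts_a) == np.asarray(opts_b)))  # identical options
    z2 = min(speak_a, speak_b)           # communication: lower of the two counts
    z3 = float(np.abs(np.asarray(t_a, float) - np.asarray(t_b, float)).sum())
    return z1, z2, z3

def standardize(x):
    """z-score a covariate across teams before entering the explanatory model."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

# Toy dyad: options match on two of three items; A speaks 10 times, B 7;
# submission-time gaps are 2, 1, and 0 seconds
z1, z2, z3 = teamwork_covariates([1, 2, 3], [1, 2, 4], 10, 7,
                                 [5.0, 5.0, 5.0], [7.0, 4.0, 5.0])
zs = standardize([1.0, 2.0, 3.0])
```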
All 12 Team-CDMs, the LLM, and a Rasch model with teamwork effect (denoted as Team-Rasch; details can be found in Section A2 in the Supplementary Material) (cf. Wilson et al., Reference Wilson, Gochyyev and Scalise2017) were applied to fit the data using two Markov chains per model. Each chain included 20,000 iterations, with the first 10,000 discarded as burn-in; the remaining 20,000 iterations (10,000 per chain) were used for model parameter inference. Convergence was checked using the PSRF. The LLM can analyze only the teamwork-separate response mode and was employed to investigate the impact of using a traditional CDM that disregards team cognition in teamwork tasks. The Team-Rasch model likewise applies only to the teamwork-separate response mode and was used to showcase the performance of an existing teamwork model, thereby demonstrating the relative advantages of the proposed models. Posterior predictive model checking (Gelman et al., Reference Gelman, Carlin, Stern, Dunson, Vehtari and Rubin2014) was employed to evaluate the absolute model-data fit. A posterior predictive probability (ppp) value between 0.05 and 0.95 was considered indicative of no systematic differences between the observed and predicted data, suggesting an adequate model-data fit. The ppp values were computed as $ppp={\sum}_{e=1}^EI\left( Sum\left({\boldsymbol{X}}^{postpred(e)}\right)\ge Sum\left(\boldsymbol{X}\right)\right)/E$, where $E$ is the total number of MCMC iterations, $\boldsymbol{X}$ is the observed data, and ${\boldsymbol{X}}^{postpred(e)}$ represents the posterior predicted data in the eth iteration; the posterior predicted data were generated using the item response function of each Team-CDM, based on parameter samples from the posterior distributions (Levy & Mislevy, Reference Levy and Mislevy2016; Zhan, Chen, Wang, et al., Reference Zhan, Chen, Wang and Zhang2024). For model selection, the DIC, WAIC, and LOO were calculated, with lower values indicating a better fit between the model and the data.
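The ppp computation above, using the total score as the discrepancy measure, can be sketched as:

```python
import numpy as np

def ppp_value(x_obs, x_postpred):
    """Posterior predictive p-value with Sum(X) as the discrepancy:
    the share of posterior-predicted datasets whose total score is at
    least the observed total. x_postpred: (E, ...) stack of datasets."""
    obs_sum = x_obs.sum()
    pred_sums = x_postpred.reshape(len(x_postpred), -1).sum(axis=1)
    return float((pred_sums >= obs_sum).mean())

# Toy check: observed total is 4; predicted totals are 4, 5, 3, and 4,
# so three of four draws meet or exceed the observed total
x_obs = np.ones((2, 2))
p1 = np.ones((2, 2))
p2 = p1.copy(); p2[0, 0] = 2.0
p3 = p1.copy(); p3[0, 0] = 0.0
ppp = ppp_value(x_obs, np.stack([p1, p2, p3, p1]))
```

Values near 0 or 1 signal that the model systematically under- or over-predicts the chosen discrepancy, which is why the 0.05-0.95 band is taken as adequate fit.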
4.4 Results
The PSRF values for all parameters across models were below 1.1, indicating good convergence under the specified settings. Table 3 summarizes the model-data fit statistics for the 12 proposed models, the LLM, and the Team-Rasch. All ppp values fell within the range of 0.05–0.95, indicating an acceptable fit to the data. Furthermore, based on the DIC, WAIC, and LOO indices, the following findings can be observed: (1) the LLM exhibited poorer data fit compared with the models using the teamwork-separate mode, highlighting the significance of incorporating team cognition in teamwork tasks. (2) The Team-Rasch model exhibited poorer data fit compared with the models employing the teamwork-separate response mode, but outperformed the LLM. These results suggest that CDMs provide a more appropriate analytical framework for this task and underscore the importance of accounting for interdependence among team members. (3) The models incorporating the additive interaction mechanism generally exhibited the best fit among the three interaction mechanisms. However, in rare instances, the models employing the conjunctive interaction mechanism demonstrated a slightly better fit to the data. (4) The explanatory models consistently outperformed their corresponding standard models. However, because the data analyzed by models using the teamwork-separate and teamwork-unified modes are not identical, their relative fit indices are not directly comparable. For simplicity, the subsequent discussion focuses on parameter estimates from M11, which demonstrated the best relative fit among the proposed models.
Table 3 Summary of model-data fits in empirical study

Note: M1: disjunctive model for teamwork-separate responses; M2: additive model for teamwork-separate responses; M3: conjunctive model for teamwork-separate responses; M4: disjunctive model for teamwork-unified responses; M5: additive model for teamwork-unified responses; M6: conjunctive model for teamwork-unified responses; M7: disjunctive model for teamwork-separate responses with covariates; M8: additive model for teamwork-separate responses with covariates; M9: conjunctive model for teamwork-separate responses with covariates; M10: disjunctive model for teamwork-unified responses with covariates; M11: additive model for teamwork-unified responses with covariates; M12: conjunctive model for teamwork-unified responses with covariates; LLM: log-linear model; TR: Team-Rasch model; DIC: deviance information criterion; WAIC: widely applicable information criterion; LOO: leave-one-out cross-validation; ppp: posterior predictive p-value.
Figure 9 summarizes the estimates of model parameters, including item parameters, attribute patterns, and teamwork effects. First, the overall quality of the items was not satisfactory, particularly due to the high guessing probabilities for the first four individual response items, likely caused by their low difficulty (cf. Liu et al., 2023). Second, the diagnostic results revealed that the 134 participants exhibited 24 of the 32 possible attribute patterns. Among these, the largest group of participants (31.34%) mastered all five attributes, followed by those who were missing the first and fifth attributes. This distribution suggests that the reasoning abilities (or fluid intelligence) of the participants were generally high. As shown in Table A6 in the Supplementary Material, the tetrachoric correlations among the five attributes reveal a low-to-moderate degree of association, suggesting that the attributes are relatively distinct yet not entirely independent. Third, the teamwork-effect parameter estimates were significantly positively correlated with the total number of attributes mastered by the two team members (r(τ, α) = 0.274, p = 0.025), the consistency of answers submitted by the two members (r(τ, $z_1$) = 0.836, p < 0.001), and the total frequency of communication (r(τ, $z_2$) = 0.404, p = 0.001). Conversely, the teamwork effect was significantly negatively correlated with the total waiting time between the two members (r(τ, $z_3$) = −0.493, p < 0.001). However, the partial correlation coefficient indicates that the relationship between the teamwork effect and the total waiting time was not significant when controlling for other variables (pr(τ, $z_3$) = 0.081, p = 0.511), suggesting that the observed correlation depends on the influence of other variables. This finding was further supported by the regression-coefficient estimates of the three covariates: ${\widehat{\tau}}_{11}=0.624$, ${\widehat{\tau}}_{12}=0.206$, and ${\widehat{\tau}}_{13}=0.046$. Hence, the consistency of answers submitted by team members and the frequency of communication between them positively predicted the quality of teamwork. However, the waiting time between the two members did not play a significant role in predicting teamwork quality when considered alongside the former two factors. In other words, while waiting time alone was significantly negatively correlated with teamwork quality, its effect diminished when the other two variables were taken into account.

Figure 9 Summary of parameter estimates in the empirical study. Note: (a) Item parameter estimates, g: guessing parameter, s: slipping parameter. (b) Mixing proportions of individual attribute patterns. (c) Correlation heatmap, τ: teamwork-effect estimates; α: total number of attributes mastered by the team; $z_1$: the number of teamwork items answered consistently by two team members; $z_2$: the frequency of communication between team members; $z_3$: the time difference between two team members submitting their responses to teamwork items. (d) Partial correlation heatmap; *0.01 < p < 0.05; **0.001 < p < 0.01; ***p < 0.001.
To showcase the diagnostic capabilities of the proposed models, Table 4 presents seven illustrative teams deliberately chosen to highlight distinct and characteristic outcomes, thereby facilitating a clearer understanding of the model outputs. When the total score of unified responses does not exceed the sum of individual responses from the two members, the quality of teamwork is typically poor. Additionally, greater consistency in the responses submitted by team members, more frequent communication, and shorter waiting times between them are generally associated with higher teamwork quality, supporting the earlier findings. Further analysis of the teams reveals additional insights. Team 1, composed of two high-level members, achieved perfect scores on both individual and teamwork items, supported by efficient communication with below-average frequency and waiting times. Team 2 likewise comprised high-level members, but their unified response score was lower than their individual scores, despite identical answers. This suggests effective communication but potential biases in their shared understanding of the items, hindering their collaborative performance. In contrast, Team 7, consisting of two low-level members, lacked the shared mastery of attributes necessary for team cognition, making it difficult to perform well on teamwork items despite communication efforts. Team 15 provides an example of high-level members demonstrating poor teamwork, marked by minimal communication and significantly above-average waiting times, indicating near-independent work on teamwork items. Teams 23 and 48, featuring one high-level and one low-level member, showed divergent outcomes. Team 48 achieved very short waiting times and perfect collaborative scores through frequent communication, while the high-level member in Team 23 failed to lead effectively, resulting in no substantial improvement in overall performance.
Finally, Team 50 exemplifies how frequent communication can enable a team to exceed the performance levels of its individual members, demonstrating the potential for teamwork to enhance collective outcomes. These examples illustrate that team performance depends not only on the attributes mastered by members but also on the quality of their teamwork. Furthermore, they underscore the proposed models’ ability to identify specific factors influencing team performance.
Table 4 Information about seven example teams

Note: ${\boldsymbol{\alpha}}_{\mathrm{A}}$ and ${\boldsymbol{\alpha}}_{\mathrm{B}}$: attribute patterns for members A and B; τ: teamwork-effect estimates; α: total number of attributes mastered by the team; $z_1$: the number of teamwork items answered consistently by two team members; $z_2$: the frequency of communication between team members; $z_3$: the time difference between two team members submitting their responses to teamwork items; $\mathrm{Score}_{\mathrm{Individual}}$: total score of individuals on individual response items; $\mathrm{Score}_{\mathrm{Unified}}$: total score of a team on collaborative response items.
Overall, the results of the empirical study validate the model setup, confirm the interpretability of the model parameters, and underscore the benefits of incorporating teamwork-effect parameters.
5 Discussion
This study proposed 12 Team-CDMs to diagnose team members’ cognitive attributes and assess teamwork quality during collaborative tasks. These models effectively identify whether poor team performance arises from individual knowledge or skill deficiencies or low-quality collaboration, such as ineffective communication. By highlighting specific areas for improvement, the proposed approach offers valuable insights for optimizing team-building strategies and implementing targeted interventions to enhance team cognition and overall performance.
Two simulation studies were conducted. Simulation Study 1 demonstrated the robustness and effectiveness of the models in recovering item and teamwork-related parameters. Classification accuracy for cognitive attributes and attribute patterns was consistently high across all models, confirming their suitability for teamwork-based diagnostic assessments. Explanatory models incorporating covariates such as communication frequency and response consistency outperformed standard models, while models using the teamwork-separate mode generally achieved better parameter recovery than those employing the teamwork-unified mode. Increasing the number of items significantly improved parameter recovery and classification accuracy, underscoring the importance of sufficient data volume. Additive interaction mechanisms also showed a slight edge in recovery performance. Simulation Study 2 highlighted the necessity of integrating both teamwork-effect and interaction mechanisms for accurate modeling of team cognition. Unconstrained models achieved the highest classification accuracy and parameter recovery, while models that excluded teamwork quality or team cognition showed diminished performance, particularly for collaborative response items. Models that ignored team cognition entirely performed better than those that neglected only teamwork quality, underscoring the latter's greater impact. Furthermore, analyzing individual or teamwork response items in isolation resulted in suboptimal performance, reinforcing the need to incorporate both individual contributions and collaborative processes in teamwork tasks. Finally, a teamwork reasoning task illustrated the application of the models. The results indicated that team cognition significantly impacts teamwork quality, with the consistency of responses and frequency of communication positively predicting performance. Waiting time between members, however, did not play a significant role when considered alongside the other two factors.
The models successfully identified specific reasons for poor team performance, such as insufficient knowledge or ineffective collaboration, demonstrating their practical utility in real-world assessments.
Despite promising results, further research is needed to address several limitations. First, this study used a constrained CDM—the LLM—as the base model for illustration. While extending it to a generalized CDM without specific condensation rules poses no theoretical challenges, the psychometric performance of such extensions requires further investigation.
Second, the study focused on dyadic teams, simplifying the modeling process but limiting the applicability of the proposed models to broader teamwork scenarios. Future research should explore diagnostic data from teams with three or more members to expand the utility of these models.
Third, while this study provides insights into measuring social/collaborative abilities through the teamwork-effect parameter (i.e., team-level random-effect parameter or teamwork covariates), the data analyzed were outcome-based (i.e., item response accuracy). Observed collaborative behaviors, documented through speech, were not modeled. Future research could integrate such social process data into the Team-CDM framework for a more direct assessment of social abilities.
Fourth, team cognition in this study was constructed using three interaction mechanisms—disjunctive, additive, and conjunctive. While the results demonstrate that the proposed models exhibit good psychometric properties, these mechanisms inevitably simplify the complexities inherent in cognitive interactions among team members. Therefore, future research should further investigate the underlying mechanisms of team cognition formation and develop more nuanced models of cognitive interaction to more accurately capture and interpret the interdependence among team members.
Fifth, the proposed models assume that all cognitive attributes contribute to team cognition through a uniform interaction mechanism. However, in practice, different cognitive attributes may follow distinct interaction mechanisms in forming team cognition. For example, one attribute may follow a disjunctive mechanism, while another attribute follows an additive one. In future work, it would be valuable to develop a generalized interaction framework that allows each cognitive attribute to follow a potentially different interaction mechanism.
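A minimal sketch of the per-attribute generalization suggested here, under assumed operational forms for the three mechanisms: team mastery as the members' maximum (disjunctive), minimum (conjunctive), or average (additive). These forms are illustrative simplifications for exposition, not the paper's exact parameterization.

```python
import numpy as np

# One function per interaction mechanism, mapping two members'
# mastery indicators (0/1) to a team-level mastery value.
MECHANISMS = {
    "disjunctive": lambda a, b: np.maximum(a, b),  # either member suffices
    "conjunctive": lambda a, b: np.minimum(a, b),  # both members required
    "additive":    lambda a, b: (a + b) / 2,       # partial credit per member
}

def team_cognition(alpha_a, alpha_b, mechanism_per_attr):
    """Combine two members' attribute patterns attribute by attribute,
    allowing each attribute its own interaction mechanism."""
    return np.array([MECHANISMS[m](a, b)
                     for a, b, m in zip(alpha_a, alpha_b, mechanism_per_attr)])

# Example: attribute 1 disjunctive, attribute 2 additive, attribute 3 conjunctive.
alpha_a = np.array([1, 1, 0])
alpha_b = np.array([0, 1, 1])
print(team_cognition(alpha_a, alpha_b,
                     ["disjunctive", "additive", "conjunctive"]))
# prints [1. 1. 0.]
```

Allowing the mechanism to vary per attribute, as in the dictionary lookup above, is the structural change such a generalized framework would require; identifying which mechanism governs which attribute from data is the open psychometric question.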
Sixth, the proposed models under the teamwork-unified response mode assume equal contributions from both team members. However, in real-world scenarios, one member may dominate the decision-making process. Therefore, it would be valuable to collect data that capture the relative influence of individual members and to extend the model to account for unequal contributions to team decision-making.
Seventh, to keep the results tractable, certain issues that may have affected the study’s outcomes were simplified. For example, the types and levels of the manipulated independent variables (e.g., item parameters) were insufficiently diverse. Hence, future research could prioritize evaluating the proposed models under a richer set of manipulated conditions in order to provide more theoretical guidance for practical applications.
Eighth, with the rise of technology-enhanced assessments, the use of multimodal data—such as response time (Sinharay & Johnson, 2020; Zhan, Chen, Wang, et al., 2024), eye-tracking (Liu et al., 2023; Man & Harring, 2021; Zhan et al., 2022), brain activations (Jeon et al., 2021), and action sequences (Fu et al., 2024; Han et al., 2022)—is becoming increasingly feasible. Future efforts should integrate multimodal data into teamwork assessments to provide more accurate diagnoses and comprehensive feedback.
Ninth, with the rise of AI agents based on large language models (e.g., ChatGPT), researchers are investigating their impact on teamwork, particularly in human–machine or human–AI collaboration (Burton et al., 2024). Whether AI agents should be treated as independent team members or merely tools to enhance efficiency will shape future teamwork assessment research.
Lastly, Bayesian estimation depends heavily on prior distributions, which reflect analysts’ assumptions. This study adopted priors from existing research without comparing their effects on parameter estimation. Given that prior distributions can strongly influence results in small-scale projects, future studies should carefully select priors based on context-specific considerations, rather than simply replicating those used in this study.
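The prior-sensitivity concern can be made concrete with a conjugate Beta-Binomial toy example (not the paper's model): with a small sample, two defensible priors yield visibly different posterior means, while a large sample makes the choice nearly irrelevant.

```python
# Prior sensitivity in a conjugate Beta-Binomial model: the posterior
# mean is (a + successes) / (a + b + n), so with small n the prior
# pseudo-counts (a, b) pull the estimate noticeably.
def posterior_mean(a, b, successes, n):
    return (a + successes) / (a + b + n)

successes, n = 8, 10   # small sample, 80% observed success rate

flat = posterior_mean(1, 1, successes, n)         # Beta(1, 1): 9/12 = 0.75
informative = posterior_mean(5, 5, successes, n)  # Beta(5, 5): 13/20 = 0.65

large = posterior_mean(1, 1, 800, 1000)           # same rate, n = 1000
assert abs(flat - informative) > 0.05             # priors matter at n = 10
assert abs(large - 0.8) < 0.01                    # data dominate at n = 1000
```

The same logic applies to the hierarchical priors used here: in small-scale teamwork projects, reporting results under at least one alternative prior is a cheap robustness check.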
In summary, this study represents a preliminary attempt in the domain of teamwork cognitive diagnosis. Despite its limitations, we hope that it serves as a catalyst for further exploration and encourages more researchers to engage with this emerging area of inquiry.
Supplementary material
To view supplementary material for this article, please visit http://doi.org/10.1017/psy.2025.10036.
Data availability statement
The datasets generated and/or analyzed during this study are not publicly available due to privacy restrictions, but they are available from the corresponding author upon reasonable request.
Author contributions
Conceptualization: P.Z.; Data curation: Z.W., G.C., H.Q.; Formal analysis: P.Z.; Funding acquisition: P.Z.; Investigation: G.C., H.Q.; Project administration: P.Z.; Software: Z.W.; Supervision: P.Z.; Writing—original draft: P.Z.; Writing—review and editing: P.Z.
Funding statement
This study was supported by the Ministry of Education (MOE) in China Project of Humanities and Social Sciences (Grant No. 24YJA190019) and the Zhejiang Provincial Philosophy and Social Sciences Planning Leading Talents Training Project (Grant No. 25QNYC010ZD).
Competing interests
The authors declared no potential competing interests with respect to the research, authorship, and/or publication of this article.
Ethical standards
Ethical approval was obtained from the Ethics Committee of Zhejiang Normal University (ZSRT2023020).
Informed consent
Informed consent was obtained from all individual participants included in the study.