
A Latent Hidden Markov Model for Process Data

Published online by Cambridge University Press:  01 January 2025

Xueying Tang*
Affiliation:
University of Arizona
* Correspondence should be made to Xueying Tang, University of Arizona, 617 N. Santa Rita Ave., Tucson, AZ 85721, USA. Email: xytang@math.arizona.edu

Abstract

Response process data from computer-based problem-solving items describe respondents’ problem-solving processes as sequences of actions. Such data provide a valuable source for understanding respondents’ problem-solving behaviors. Recently, data-driven feature extraction methods have been developed to compress the information in unstructured process data into relatively low-dimensional features. Although the extracted features can be used as covariates in regression or other models to understand respondents’ response behaviors, the results are often not easy to interpret since the relationship between the extracted features and the original response process is often not explicitly defined. In this paper, we propose a statistical model for describing response processes and how they vary across respondents. The proposed model assumes a response process follows a hidden Markov model given the respondent’s latent traits. The structure of hidden Markov models resembles problem-solving processes, with the hidden states interpreted as problem-solving subtasks or stages. Incorporating the latent traits in hidden Markov models enables us to characterize the heterogeneity of response processes across respondents in a parsimonious and interpretable way. We demonstrate the performance of the proposed model through simulation experiments and case studies of PISA process data.

Type
Theory and Methods
Copyright
Copyright © 2023 The Author(s), under exclusive licence to The Psychometric Society.

Problem-solving is one of the most important skills in the 21st century (Binkley et al., 2012). Evaluation of problem-solving competency has gained increasing popularity in large-scale assessments. For example, Problem-Solving in Technology-Rich Environments (PSTRE) was one of the three domains of the first cycle of the Programme for the International Assessment of Adult Competencies (PIAAC). In the Programme for International Student Assessment (PISA), surveys were conducted to measure students’ problem-solving and collaborative problem-solving skills in 2012 and 2015, respectively. In these assessments, respondents’ skills are measured by computer-based interactive items where respondents are required to fulfill a real-life task through an interface in a simulated computer environment. The use of computers in these items not only facilitates the simulation of problem-solving scenarios, but also enables more comprehensive data collection. When a respondent solves an interactive item, computer log files keep track of the actions taken by the respondent within the interface (e.g., mouse clicks and keyboard inputs). After the completion of the item, the recorded action sequence allows us to reproduce the main process of solving the item; such sequences are thus called response process data.

Compared to traditional dichotomous or polytomous item responses that only record the final response outcomes, process data contain detailed information on how respondents solved an item. These data not only record whether the item was answered correctly or incorrectly, but also demonstrate how the answer was reached. Thus, process data provide a valuable source for understanding respondents’ problem-solving behaviors and improving current psychometric practice. Recent studies have shown that process data are useful for accurate latent trait assessment (Zhang et al., 2023), comprehensive diagnostic classification (Liang et al., 2022; Zhan and Qiao, 2022), early detection of task failure (Ulitzsch et al., 2022b), and problem-solving strategy analysis (Wang et al., 2022).

The rich information in response process data comes with difficulties in data analysis. First of all, process data have a nonstandard format. The traditional response to an item is a single variable. In contrast, the corresponding response process is a sequence of categorical variables (actions), and the number of categories (unique actions) is often a few dozen or more. Moreover, the length of the sequence varies greatly across respondents. The nonstandard data format prevents the direct application of many conventional item response models such as item response theory models (Lord, 1980) and cognitive diagnostic models (Rupp et al., 2010) to process data. In addition, the dependence structure among actions in a response process is often complicated. Usually, the action to be taken in the next step depends not only on the current action, but also on actions taken several steps back, making simple models such as first-order Markov models insufficient for characterizing process data. The unprecedented opportunities and challenges in analyzing response process data call for innovative methods for utilizing the rich behavioral information in the data.

Feature extraction is a common approach to circumventing the difficulties brought by the nonstandard format of response processes. In this approach, a fixed number of features are extracted for each process and then fed to regression or other supervised learning methods to explore the relationship between response processes and other variables of interest. For example, Chen et al. (2019a) and Ulitzsch et al. (2022b) used a collection of summary statistics of response processes to predict the final response outcome to achieve early detection of failure. He and von Davier (2016) and Stadler et al. (2019) identified the differences in the behavior patterns among various groups of respondents with the help of n-gram features. Zhang et al. (2023) developed methods for obtaining accurate latent trait assessment through process features.

Traditionally, process features are summary statistics of a response process such as sequence length, action counts, and item response time (Chen et al., 2019a; Ulitzsch et al., 2022b). Features derived from existing cognitive theory are also used in top-down research studies (Greiff et al., 2016; von Davier et al., 2019). Such features usually do not comprehensively summarize response processes and thus may overlook important information for understanding respondents’ problem-solving behaviors. Moreover, theory-based features are often item-specific and time-consuming to construct if a wide range of items are available. Recently, data-driven feature extraction methods (Tang et al., 2020, 2021) have been developed. These methods do not require substantial prior knowledge about the item. With the help of machine learning and statistical tools, they compress as much of the information in response processes as possible into the extracted features. However, the relationship between the extracted features and the original response process is often not explicitly defined. As a result, although the extracted features can be easily incorporated in psychometrically meaningful machine learning tasks and often lead to performance improvement, identifying the characteristics of response processes that contribute to the improvement requires additional careful examination.

In this paper, we take a different route to analyzing process data. We propose a latent hidden Markov model (LHMM) to describe respondents’ problem-solving processes directly. The proposed model offers three distinct features that set it apart from previous models. First, it naturally accounts for the long-term dependence among actions by adopting hidden Markov models (HMMs) as the basic framework for characterizing response processes. Second, the hidden states in HMMs can be interpreted as problem-solving states or subtasks, providing a detailed understanding of response processes from a subtask perspective (Wang et al., 2022). Third, a latent trait variable is introduced into the HMM to parsimoniously characterize respondents’ heterogeneous behaviors in transitioning between subtasks and completing a subtask. Compared to the analyses based on feature extraction methods, the proposed model directly links respondents’ latent traits to their problem-solving behaviors, providing more straightforward and interpretable results.

Although several models have been recently proposed to characterize response processes (Chen, 2020; Han et al., 2021; Xiao et al., 2021; Xu et al., 2020), none of them possesses all three features of the proposed model. Xiao et al. (2021) also used HMMs to describe response processes, but they did not take into account respondents’ differences in problem-solving processes. Xu et al. (2020) proposed a latent topic model with Markovian transition for process data. This model is essentially an HMM with personalized state transition probability matrices but a shared state-action probability matrix. However, since their model does not use latent trait variables to parameterize the personalized state transition probability matrices, it is difficult to visualize the differences in the response behaviors. Without considering the long-term dependence among actions, Han et al. (2021) proposed a sequential response model which assumes response processes follow a first-order Markov model after conditioning on the latent trait. Finally, Chen (2020) developed a dynamic choice measurement model for process data based on marked point processes. This model takes into account both the heterogeneity of respondents’ behaviors and the complex action dependence structure. However, it requires a significant amount of expert input to specify the dependence structure.

The rest of the paper is organized as follows. We first present an example of problem-solving items and process data in Sect. 1 to motivate the proposed latent hidden Markov model described in Sect. 2. In Sect. 3, we explain how the parameters and latent variables in the proposed model are estimated. Case studies of process data from two problem-solving items in PISA 2012 are presented in Sect. 4 to demonstrate the performance of the proposed model. Simulation studies examining the performance of statistical inference in different scenarios are presented in Sect. 5. We conclude with final remarks in Sect. 6.

1. An Example of Problem-Solving Items and Process Data

Figure 1. Interface of the climate control item in PISA 2012.

In this section, we describe the climate control (CC) item in PISA 2012 as an example of items producing process data. This item is one of the 42 items in the survey for assessing students’ problem-solving skills. In the CC item, students are asked to figure out how to use a new air conditioner to control room climate. The item interface, presented in Fig. 1, includes an air conditioner with three control bars. Each bar, controlling either the temperature or the humidity of the room, can be placed at five different positions: “$--$” ($-2$), “$-$” ($-1$), the default position ($0$), “$+$” ($1$), and “$++$” ($2$). Students are required to determine which climate variable each bar influences by conducting experiments through the interface. They can slide the bars and click the APPLY button to read the humidity and temperature under the current setting from the charts in the interface. Based on the results, they need to match the control bars with the two climate variables.

When students solve the item, the experiments they conduct and the buttons they click are recorded sequentially in the computer log files. The CC item has 126 recorded actions, including one action of clicking the RESET button and 125 ($= 5^3$) actions describing the setting of an experiment conducted by a student. Each of the experiment setting actions describes the positions of the three control bars when the APPLY button is clicked. For example, action “$-1\_2\_0$” means a student places the top, middle, and bottom controls at “$-$,” “$++$,” and the default position, respectively, and then clicks the APPLY button. A recorded sequence describes the entire response of a student and is one observation of process data. For example, the sequence “$1\_0\_0$, RESET, $0\_0\_-2$” shows that the student conducted two experiments when solving the item. In the first experiment, the top control was moved to “$+$,” followed by a click of the APPLY button. Then, the positions of the three controls were reset by a click of the RESET button. Finally, the bottom control was moved to “$--$,” and the APPLY button was clicked to conduct the second experiment. In this paper, we present a statistical model describing such detailed response processes of a group of respondents.
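To make the data format concrete, the following is a minimal Python sketch of how the 126 CC actions could be enumerated and how a recorded sequence could be turned into integer codes. The label format (positions joined by underscores), the ordering of the vocabulary, and the function names `build_action_vocabulary` and `encode_sequence` are assumptions made for illustration; they are not taken from the PISA log files or from the paper.

```python
from itertools import product

# Bar positions as described in the text: -2, -1, 0 (default), 1, 2.
POSITIONS = (-2, -1, 0, 1, 2)

def build_action_vocabulary():
    """Return 126 action labels: RESET plus all 5**3 experiment settings.

    The ordering (RESET first, then settings in product order) is an
    arbitrary choice for this illustration.
    """
    settings = ["_".join(str(p) for p in combo)
                for combo in product(POSITIONS, repeat=3)]
    return ["RESET"] + settings

def encode_sequence(actions, vocabulary):
    """Map action labels to 1-based integer codes 1, ..., M."""
    index = {label: code for code, label in enumerate(vocabulary, start=1)}
    return [index[a] for a in actions]

vocabulary = build_action_vocabulary()
# The example sequence from the text: two experiments separated by a reset.
sequence = ["1_0_0", "RESET", "0_0_-2"]
codes = encode_sequence(sequence, vocabulary)
print(len(vocabulary), codes)
```

Coding actions as integers in this way matches the convention, used in the model description, of representing the actions of an item by $1, \ldots, M$.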

2. Model Description

In this section, we describe a latent hidden Markov model (LHMM) for response process data. We use $\boldsymbol{Y} = (Y_1, \ldots, Y_T)$ to denote a generic response process (action sequence) of an item. For $t = 1, \ldots, T$, the $t$-th element $Y_t$ of $\boldsymbol{Y}$ is a random variable denoting the $t$-th action taken in the process, and it takes value from the action set $\mathcal{A}$ containing $M$ possible actions of the item. For simplicity, the actions in $\mathcal{A}$ are represented by $1, \ldots, M$. We use $\boldsymbol{y} = (y_1, \ldots, y_T)$ to represent a realization of $\boldsymbol{Y}$. Given a sequence $\boldsymbol{x} = (x_1, \ldots, x_T)$ and indices $i < j$, we use $\boldsymbol{x}_{i:j}$ to denote the subsequence of $\boldsymbol{x}$ consisting of $x_i, x_{i+1}, \ldots, x_j$. In the following, we first describe the idea of using HMMs as a framework for modeling response processes in Sect. 2.1. Then, in Sect. 2.2, latent trait variables are introduced to HMMs to account for individual differences in response processes.

2.1. Hidden Markov Models

Under an HMM, the evolution of a response process $\boldsymbol{Y}$ is determined by a hidden state sequence $\boldsymbol{S} = (S_1, \ldots, S_T)$ that evolves according to a first-order Markov model. Each element in $\boldsymbol{S}$ indicates the hidden state of the step. It takes value from the set $\mathcal{S} = \{1, 2, \ldots, K\}$ containing $K$ possible hidden states. In HMMs, the evolution of the action sequence and hidden states is characterized by three groups of parameters: a $K$-dimensional initial state probability vector $\boldsymbol{\pi} = (\pi_1, \ldots, \pi_K)$, a $K \times K$ state transition probability matrix $\boldsymbol{P} = (p_{kl})$ describing the evolution of the hidden states, and a $K \times M$ state-action probability matrix $\boldsymbol{Q} = (q_{kj})$ describing the action distributions under different hidden states. Let $\boldsymbol{s} = (s_1, \ldots, s_T)$ be a realization of $\boldsymbol{S}$. In the initial step, the hidden state $S_1$ follows the initial state probability distribution $\boldsymbol{\pi}$ with

$$P(S_1 = s_1) = \pi_{s_1}. \tag{1}$$

Given $S_1 = s_1$, the distribution of the initial action $Y_1$ is determined by $s_1$ and $\boldsymbol{Q}$ as

$$P(Y_1 = y_1 \mid S_1 = s_1) = q_{s_1, y_1}. \tag{2}$$

For $t = 2, \ldots, T$, given all previous hidden states $\boldsymbol{S}_{1:(t-1)}$ and all previous actions $\boldsymbol{Y}_{1:(t-1)}$, the distribution of hidden state $S_t$ depends only on $S_{t-1}$ as

$$P(S_t = s_t \mid \boldsymbol{S}_{1:(t-1)} = \boldsymbol{s}_{1:(t-1)}, \boldsymbol{Y}_{1:(t-1)} = \boldsymbol{y}_{1:(t-1)}) = P(S_t = s_t \mid S_{t-1} = s_{t-1}) = p_{s_{t-1}, s_t}. \tag{3}$$

Given all previous actions $\boldsymbol{Y}_{1:(t-1)}$ and all hidden states up to step $t$, action $Y_t$ is determined by $S_t$ only as

$$P(Y_t = y_t \mid \boldsymbol{Y}_{1:(t-1)} = \boldsymbol{y}_{1:(t-1)}, \boldsymbol{S}_{1:t} = \boldsymbol{s}_{1:t}) = P(Y_t = y_t \mid S_t = s_t) = q_{s_t, y_t}. \tag{4}$$
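As an illustration of Eqs. (1)-(4), the following Python sketch draws a hidden state sequence and an action sequence from an HMM and evaluates the joint probability of a given pair of sequences as the product of the factors in those four equations. The toy values of $\boldsymbol{\pi}$, $\boldsymbol{P}$, and $\boldsymbol{Q}$ (with $K = 2$ and $M = 3$) and the function names are hypothetical and chosen only for this example; states and actions are coded from 0 rather than 1 to follow Python indexing.

```python
import numpy as np

def sample_hmm(pi, P, Q, T, rng):
    """Draw hidden states and actions (0-based codes) following Eqs. (1)-(4)."""
    K, M = Q.shape
    states = np.empty(T, dtype=int)
    actions = np.empty(T, dtype=int)
    for t in range(T):
        # Eq. (1) for the first step, Eq. (3) for later steps.
        state_probs = pi if t == 0 else P[states[t - 1]]
        states[t] = rng.choice(K, p=state_probs)
        # Eqs. (2) and (4): the action depends only on the current state.
        actions[t] = rng.choice(M, p=Q[states[t]])
    return states, actions

def joint_log_prob(pi, P, Q, states, actions):
    """log P(S = s, Y = y), the sum of the log factors in Eqs. (1)-(4)."""
    logp = np.log(pi[states[0]]) + np.log(Q[states[0], actions[0]])
    for t in range(1, len(states)):
        logp += np.log(P[states[t - 1], states[t]])
        logp += np.log(Q[states[t], actions[t]])
    return logp

# Hypothetical toy parameters: K = 2 hidden states, M = 3 actions.
pi = np.array([0.8, 0.2])
P = np.array([[0.7, 0.3],
              [0.1, 0.9]])
Q = np.array([[0.6, 0.3, 0.1],
              [0.1, 0.2, 0.7]])

rng = np.random.default_rng(0)
states, actions = sample_hmm(pi, P, Q, T=10, rng=rng)
print(states, actions, joint_log_prob(pi, P, Q, states, actions))
```

Each row of `P` and `Q` is a probability distribution, so the rows must sum to one; the sketch does not check this.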

The structure of HMMs is demonstrated in the left panel of Fig. 2.

Figure 2. Structure of HMM (left) and LHMM (right).

The structure formed by the observed sequence (action sequence) and the hidden state sequence in HMMs resembles problem-solving processes. Solving a complex task often involves completing a few simpler subtasks or going through several problem-solving stages. For example, solving the CC item described in Sect. 1 could involve two stages. After the item starts, students may enter an exploration stage in which they explore the interface and try to understand the effects of clicking buttons and sliding control bars. Once they become familiar with the interface, an efficient problem-solving stage may start. The role of problem-solving stages or subtasks in response processes is similar to that of the hidden states in HMMs. In HMMs, the hidden states evolve according to a Markov model and decide the action distribution at each step. In a response process, the stage or subtask changes as the process progresses, and how actions are taken varies across different stages or subtasks. In the efficient problem-solving stage of the CC item, students are more likely to take actions that are directly related to identifying the climate variable associated with each control bar. For example, they may conduct experiments with only one bar moved away from the default position, making actions such as “ - 1 _ 0 _ 0 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$-1\_0\_0$$\end{document} ,” “0_2_0” and “0_0_1” more likely be taken. Such pattern may not exist in the exploration stage. 
With the analogy between hidden states and problem-solving stages, the state transition probability matrix $\boldsymbol{P}$ describes how respondents move between the stages, and the state-action probability matrix $\boldsymbol{Q}$ describes how actions are used differently in different stages.
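The generative structure described above can be illustrated with a short simulation. The following is a minimal sketch, not taken from the paper: the function name `simulate_hmm`, the two-stage interpretation of the states, and all numerical values are hypothetical, and states and actions are indexed from 0 rather than 1.

```python
import numpy as np

def simulate_hmm(pi, P, Q, T, rng):
    """Draw hidden states and actions of length T from an HMM.

    pi: (K,) initial state probabilities
    P:  (K, K) state transition matrix, rows sum to 1
    Q:  (K, M) state-action probability matrix, rows sum to 1
    """
    K, M = Q.shape
    states = np.empty(T, dtype=int)
    actions = np.empty(T, dtype=int)
    for t in range(T):
        # S_1 is drawn from pi; S_t given S_{t-1} is drawn from row S_{t-1} of P.
        probs = pi if t == 0 else P[states[t - 1]]
        states[t] = rng.choice(K, p=probs)
        # Y_t given S_t is drawn from row S_t of Q, as in assumption (4).
        actions[t] = rng.choice(M, p=Q[states[t]])
    return states, actions

# Hypothetical values: state 0 = exploration, state 1 = efficient problem-solving.
pi = np.array([0.8, 0.2])
P = np.array([[0.7, 0.3],
              [0.1, 0.9]])
Q = np.array([[0.4, 0.3, 0.3],
              [0.1, 0.1, 0.8]])
rng = np.random.default_rng(0)
states, actions = simulate_hmm(pi, P, Q, T=10, rng=rng)
```

In this sketch, the second row of `Q` concentrates mass on one action, mimicking a stage in which a specific type of action becomes more likely.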

Using HMMs for modeling response processes also enables us to incorporate long-term dependence among actions in a response process. Actions in a response process are not taken independently. The action that a respondent takes at a given step could depend not only on the action taken in the immediately preceding step but also on the actions taken several steps back. Because the hidden states are unobserved, the marginal distribution of the action sequence under an HMM is generally not Markov, so the action at a given step can depend on the entire action history. Although the strong conditional independence assumptions (3) and (4) imposed on the hidden states and actions limit the complexity of the dependence structure that HMMs can characterize, we will see later in Sect. 4 that such a simplified structure can reasonably describe response processes in our case studies and that the hidden states can indeed be interpreted as problem-solving stages.

2.2. Latent Hidden Markov Models

The above HMM representation of action sequences does not explicitly account for individual differences in problem-solving. If two respondents took the same actions in steps 1 to $t$, then, under HMMs, the distributions of the action at step $t+1$ would be the same for the two respondents. Any difference in the actual actions taken at this step is explained purely by randomness. However, respondents usually behave differently in solving problems. For instance, in the CC item, students with more advanced problem-solving skills may have a higher chance of entering the efficient problem-solving stage. They may also start the response process in the efficient problem-solving stage without going through the exploration stage. These differences lead to different initial probability vectors, state transition probability matrices, and action emission probability matrices for different respondents in the framework of HMMs. To incorporate such individual variation in problem-solving, we introduce an additional latent variable in HMM, producing a latent HMM (LHMM), which is described in detail below.

Let $\theta$ denote a unidimensional variable describing a latent trait of a respondent. Throughout this paper, we assume for simplicity that $\theta$ follows the standard normal distribution. Under LHMM, given $\theta$, the response process follows the HMM whose parameters $\boldsymbol{\pi}$, $\boldsymbol{P}$, and $\boldsymbol{Q}$ are further parametrized as follows. For initial state probabilities,

$$\pi_k(\theta) = P(S_1 = k \mid \theta) =
\begin{cases}
\dfrac{1}{1 + \sum_{k'=2}^{K} \exp(\tau_{k'}\theta + \mu_{k'})} & \text{for } k = 1;\\[2ex]
\dfrac{\exp(\tau_k\theta + \mu_k)}{1 + \sum_{k'=2}^{K} \exp(\tau_{k'}\theta + \mu_{k'})} & \text{for } k = 2, \ldots, K.
\end{cases} \tag{5}$$

For $k = 1, \ldots, K$, the probability of a hidden state jumping from state $k$ to state $l$ is

$$p_{kl}(\theta) = P(S_{t+1} = l \mid S_t = k, \theta) =
\begin{cases}
\dfrac{1}{1 + \sum_{l'=2}^{K} \exp(a_{kl'}\theta + b_{kl'})} & \text{for } l = 1;\\[2ex]
\dfrac{\exp(a_{kl}\theta + b_{kl})}{1 + \sum_{l'=2}^{K} \exp(a_{kl'}\theta + b_{kl'})} & \text{for } l = 2, \ldots, K;
\end{cases} \tag{6}$$

and the probability of taking action j in state k is

$$q_{kj}(\theta) = P(Y_t = j \mid S_t = k, \theta) =
\begin{cases}
\dfrac{1}{1 + \sum_{j'=2}^{M} \exp(c_{kj'}\theta + d_{kj'})} & \text{for } j = 1;\\[2ex]
\dfrac{\exp(c_{kj}\theta + d_{kj})}{1 + \sum_{j'=2}^{M} \exp(c_{kj'}\theta + d_{kj'})} & \text{for } j = 2, \ldots, M.
\end{cases} \tag{7}$$

In other words, given $\theta$, the initial hidden state $S_1$, each subsequent hidden state $S_t$ given the previous hidden state $S_{t-1}$, and the action $Y_t$ given the current state $S_t$ all follow multinomial logistic models (MLM; McCullagh & Nelder, 2018) with $\theta$ as the covariate and state 1 or action 1 as the baseline category. The structure of the model is depicted in the right panel of Fig. 2.
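The mapping from $\theta$ to the HMM parameters in (5), (6), and (7) can be sketched in a few lines. This is an illustrative sketch only: the helper names `baseline_softmax` and `lhmm_params`, the array layout (one column per non-baseline category), and the parameter values are assumptions made here, not part of the paper.

```python
import numpy as np

def baseline_softmax(slopes, intercepts, theta):
    """Multinomial logistic probabilities with the first category as baseline.

    slopes, intercepts: arrays of shape (..., C-1) for categories 2..C.
    Returns an array of shape (..., C) whose last axis sums to 1.
    """
    eta = slopes * theta + intercepts             # linear predictors, categories 2..C
    zeros = np.zeros(eta.shape[:-1] + (1,))       # baseline category has predictor 0
    eta = np.concatenate([zeros, eta], axis=-1)
    eta = eta - eta.max(axis=-1, keepdims=True)   # stabilize the exponentials
    w = np.exp(eta)
    return w / w.sum(axis=-1, keepdims=True)

def lhmm_params(theta, tau, mu, A, B, C, D):
    """Evaluate (5), (6), and (7) at a given theta."""
    pi = baseline_softmax(tau, mu, theta)   # shape (K,)
    P = baseline_softmax(A, B, theta)       # shape (K, K)
    Q = baseline_softmax(C, D, theta)       # shape (K, M)
    return pi, P, Q

# Hypothetical parameter values with K = 2 states and M = 3 actions.
tau, mu = np.array([1.0]), np.array([0.0])
A, B = np.array([[1.0], [0.5]]), np.array([[-1.0], [2.0]])
C, D = np.zeros((2, 2)), np.array([[0.0, 0.0], [-1.0, 1.0]])

pi0, P0, Q0 = lhmm_params(0.0, tau, mu, A, B, C, D)
pi1, P1, Q1 = lhmm_params(1.0, tau, mu, A, B, C, D)
```

With these values, the slope for the transition from state 1 to state 2 is positive, so `P1[0, 1]` exceeds `P0[0, 1]`, while the zero slopes in `C` make the state-action matrix identical at both values of $\theta$.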

In (5), (6), and (7), $\tau_k$, $\mu_k$, $a_{kl}$, $b_{kl}$, $c_{kj}$, and $d_{kj}$ are real-valued parameters. We write $\boldsymbol{\mu} = (\mu_2, \ldots, \mu_K)$, $\boldsymbol{\tau} = (\tau_2, \ldots, \tau_K)$, $\boldsymbol{A} = (a_{kl})$, $\boldsymbol{B} = (b_{kl})$, $\boldsymbol{C} = (c_{kj})$, and $\boldsymbol{D} = (d_{kj})$ as compact notation for the parameters, where $\boldsymbol{A}$ and $\boldsymbol{B}$ are $K \times (K-1)$ matrices and $\boldsymbol{C}$ and $\boldsymbol{D}$ are $K \times (M-1)$ matrices. In addition, we use $\boldsymbol{\eta}$ to denote the vector collecting all parameters ($\boldsymbol{\mu}$, $\boldsymbol{\tau}$, $\boldsymbol{A}$, $\boldsymbol{B}$, $\boldsymbol{C}$, and $\boldsymbol{D}$) in LHMM.

Under the model for the state transition probability in (6), given $S_t = k$, the log odds of jumping to state $l$ versus state 1 at step $t+1$ are $a_{kl}\theta + b_{kl}$. Parameter $b_{kl}$ is the log odds when $\theta = 0$. Parameter $a_{kl}$ controls how sensitive the log odds are to changes in $\theta$. If $a_{kl} > 0$, a larger $\theta$ leads to higher odds of jumping to state $l$ against state 1. If $a_{kl} < 0$, a larger $\theta$ leads to lower odds. In the CC item, if $\theta$ represents students’ problem-solving skill proficiency, we may expect the odds of jumping from the exploration state to the efficient problem-solving state against staying in the exploration state to increase as $\theta$ increases. If the exploration stage is the baseline state (State 1), then $a_{12}$ is expected to be positive. If $a_{kl} = 0$ for $l = 2, \ldots, K$, then the state transition distribution for state $k$ does not depend on $\theta$ and is completely determined by $b_{kl}$, $l = 2, \ldots, K$. Parameters $\boldsymbol{\tau}$ and $\boldsymbol{\mu}$ for modeling the initial state probabilities in (5) and parameters $\boldsymbol{C}$ and $\boldsymbol{D}$ for modeling the state-action probabilities in (7) can be interpreted similarly.
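The log-odds interpretation follows directly from (6), since the common denominator cancels in the ratio $p_{kl}(\theta)/p_{k1}(\theta)$. The short derivation below spells this out and adds a numerical illustration for $K = 2$. The values $a_{12} = 1$ and $b_{12} = 0$ are hypothetical and chosen only to make the arithmetic transparent.

```latex
\begin{align*}
\log\frac{p_{kl}(\theta)}{p_{k1}(\theta)} &= a_{kl}\theta + b_{kl}, \\
\frac{p_{kl}(\theta+1)\,/\,p_{k1}(\theta+1)}{p_{kl}(\theta)\,/\,p_{k1}(\theta)} &= e^{a_{kl}}, \\
p_{12}(\theta) &= \frac{e^{a_{12}\theta + b_{12}}}{1 + e^{a_{12}\theta + b_{12}}} \quad (K = 2), \\
p_{12}(0) = 0.5, \quad p_{12}(1) &= \frac{e}{1+e} \approx 0.731 \quad (a_{12} = 1,\ b_{12} = 0).
\end{align*}
```

In this hypothetical two-state case, a one-unit increase in $\theta$ multiplies the odds of leaving the baseline state by $e^{a_{12}} = e$.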

3. Statistical Inference

In this section, we describe how to estimate the parameters $\boldsymbol{\eta}$, latent trait $\theta$, and hidden states $\boldsymbol{S}$ in LHMM when a set of response processes from different respondents is available. We distinguish response processes and hidden state sequences from different respondents through superscripts. For example, $\boldsymbol{Y}^{(i)} = (Y^{(i)}_1, \ldots, Y^{(i)}_{T_i})$ is the response process of respondent $i$ and $\boldsymbol{S}^{(i)} = (S^{(i)}_1, \ldots, S^{(i)}_{T_i})$ is the corresponding (unobserved) hidden state sequence. The latent trait of respondent $i$ is denoted by $\theta_i$. The set $\mathcal{Y}_n = \{\boldsymbol{y}^{(1)}, \ldots, \boldsymbol{y}^{(n)}\}$ collects $n$ observed response processes from $n$ respondents. In the following, we first give the likelihood function for LHMMs in Sect. 3.1 and then describe how to obtain the marginalized maximum likelihood estimator of the model parameters in Sect. 3.2. Latent trait estimation and hidden state estimation are then discussed in Sects. 3.3 and 3.4, respectively.

3.1. Likelihood Function

Since both $\theta_i$ and the hidden state sequence $\boldsymbol{S}^{(i)}$ in LHMMs are unobservable, the (marginalized) likelihood function of $\boldsymbol{\eta}$ given the $i$-th observed response process $\boldsymbol{Y}^{(i)} = \boldsymbol{y}^{(i)}$ is

$$\begin{aligned}
L_i(\boldsymbol{\eta}) = P\left(\boldsymbol{Y}^{(i)} = \boldsymbol{y}^{(i)} \mid \boldsymbol{\eta}\right)
&= \int \sum_{\boldsymbol{s}^{(i)}} P\left(\boldsymbol{Y}^{(i)} = \boldsymbol{y}^{(i)}, \boldsymbol{S}^{(i)} = \boldsymbol{s}^{(i)}, \theta_i \mid \boldsymbol{\eta}\right) d\theta_i \\
&= \int \phi(\theta_i) \sum_{\boldsymbol{s}^{(i)}} P\left(\boldsymbol{Y}^{(i)} = \boldsymbol{y}^{(i)}, \boldsymbol{S}^{(i)} = \boldsymbol{s}^{(i)} \mid \theta_i, \boldsymbol{\eta}\right) d\theta_i,
\end{aligned} \tag{8}$$

where the summation is over all possible realizations $\boldsymbol{s}^{(i)}$ of $\boldsymbol{S}^{(i)}$ and $\phi(\theta) = \frac{1}{\sqrt{2\pi}} e^{-\theta^2/2}$ is the probability density function of the standard normal distribution.
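One way to evaluate the integral in (8) numerically is Gauss-Hermite quadrature over $\theta_i$, combined with the standard forward recursion for the inner sum over hidden state sequences. The sketch below is offered under that assumption and is not necessarily the scheme used for estimation in this paper. The function names, the number of quadrature nodes, and the parameter values are all hypothetical.

```python
import itertools
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def baseline_softmax(slopes, intercepts, theta):
    """Multinomial logistic probabilities with the first category as baseline."""
    eta = slopes * theta + intercepts
    eta = np.concatenate([np.zeros(eta.shape[:-1] + (1,)), eta], axis=-1)
    w = np.exp(eta - eta.max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)

def hmm_likelihood(y, pi, P, Q):
    """Forward recursion: P(Y = y), summed over all hidden state sequences."""
    alpha = pi * Q[:, y[0]]
    for obs in y[1:]:
        alpha = (alpha @ P) * Q[:, obs]
    return alpha.sum()

def marginal_likelihood(y, tau, mu, A, B, C, D, n_nodes=21):
    """Approximate L_i(eta) = int phi(theta) P(y | theta, eta) d(theta)."""
    # hermegauss integrates against exp(-x^2 / 2); dividing the weights by
    # sqrt(2 * pi) turns them into weights for the standard normal density.
    nodes, weights = hermegauss(n_nodes)
    weights = weights / np.sqrt(2.0 * np.pi)
    total = 0.0
    for theta, w in zip(nodes, weights):
        pi = baseline_softmax(tau, mu, theta)
        P = baseline_softmax(A, B, theta)
        Q = baseline_softmax(C, D, theta)
        total += w * hmm_likelihood(y, pi, P, Q)
    return total

# Hypothetical parameter values with K = 2 states and M = 3 actions (0-indexed).
tau, mu = np.array([1.0]), np.array([0.0])
A, B = np.array([[1.0], [0.5]]), np.array([[-1.0], [2.0]])
C, D = np.array([[0.3, -0.2], [0.0, 1.0]]), np.array([[0.0, 0.0], [-1.0, 1.0]])

y = [0, 2, 2, 1]
L_i = marginal_likelihood(y, tau, mu, A, B, C, D)
```

For long action sequences the forward variables can underflow, so a practical implementation would rescale `alpha` at each step or work on the log scale.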

Given $\theta_i$ and $\boldsymbol{\eta}$, the action sequence $\boldsymbol{Y}^{(i)}$ and the associated hidden state sequence $\boldsymbol{S}^{(i)}$ follow the HMM with the elements of parameters $\boldsymbol{\pi}$, $\boldsymbol{P}$, and $\boldsymbol{Q}$ being $\pi_k(\theta_i)$, $p_{kl}(\theta_i)$, and $q_{kj}(\theta_i)$ defined in (5), (6), and (7). Thus, according to assumptions (1)–(4),

$$P(\boldsymbol{Y}^{(i)} = \boldsymbol{y}^{(i)}, \boldsymbol{S}^{(i)} = \boldsymbol{s}^{(i)} \mid \theta_i, \boldsymbol{\eta}) = \pi^{(i)}_{s^{(i)}_1} q^{(i)}_{s^{(i)}_1, y^{(i)}_1} \prod_{t=2}^{T_i} p^{(i)}_{s^{(i)}_{t-1}, s^{(i)}_t} q^{(i)}_{s^{(i)}_t, y^{(i)}_t}, \tag{9}$$

where $\pi^{(i)}_k$, $p^{(i)}_{kl}$, and $q^{(i)}_{kj}$ are shorthand notation for $\pi_k(\theta_i)$, $p_{kl}(\theta_i)$, and $q_{kj}(\theta_i)$, respectively.
Combining (8) and (9), we obtain the (marginalized) likelihood function for a set of response processes $\mathcal{Y}_n$ under LHMM as

$$L(\boldsymbol{\eta} \mid \mathcal{Y}_n) = \prod_{i=1}^n L_i(\boldsymbol{\eta}) = \prod_{i=1}^n \left\{ \int \phi(\theta_i) \sum_{\boldsymbol{s}^{(i)}} \pi^{(i)}_{s^{(i)}_1} q^{(i)}_{s^{(i)}_1, y^{(i)}_1} \prod_{t=2}^{T_i} p^{(i)}_{s^{(i)}_{t-1}, s^{(i)}_t} q^{(i)}_{s^{(i)}_t, y^{(i)}_t} \, d\theta_i \right\}. \tag{10}$$

The main difficulty in evaluating $L(\boldsymbol{\eta} \mid \mathcal{Y}_n)$ at a given $\boldsymbol{\eta}$ is computing the summation over $\boldsymbol{s}^{(i)}$ and the integration over $\theta_i$.
Since the number of terms involved in the summation in $L_i(\boldsymbol{\eta})$ is of order $K^{T_i}$, it is computationally burdensome to calculate the summation directly, even for a sequence of moderate length $T_i$ and a small number of hidden states $K$. A dynamic programming algorithm called the forward–backward algorithm (Rabiner and Juang, 1986) has been designed to compute the likelihood function for HMMs. It can thus be used to efficiently compute the summation over $\boldsymbol{s}^{(i)}$ in (10) for a given $\theta_i$. For the integration, although the integral does not have a closed-form expression, it can be computed numerically using Gauss–Hermite quadrature.
The details of evaluating the likelihood function $L(\boldsymbol{\eta} \mid \mathcal{Y}_n)$ are provided in Appendix A.
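As a rough illustration of the two computational steps just described, the following Python sketch evaluates a single $L_i(\boldsymbol{\eta})$ by running the forward recursion at each quadrature node and combining the results with the quadrature weights. It is not the procedure of Appendix A. The softmax parameterization in `hmm_params` is only a hypothetical stand-in for (5)–(7), $\phi$ is assumed to be the standard normal density, the three-point rule is hardcoded for brevity, and all function and key names are invented for this example.

```python
import math

# Three-point Gauss-Hermite rule for a standard normal weight (assumed phi):
# sum_m w_m * g(x_m) approximates E[g(Z)] with Z ~ N(0, 1).
NODES = [-math.sqrt(3.0), 0.0, math.sqrt(3.0)]
WEIGHTS = [1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0]


def softmax(scores):
    """Numerically stable softmax of a list of real scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hmm_params(theta, eta):
    """Hypothetical stand-in for (5)-(7): every probability row is a
    softmax of slope * theta + intercept."""
    pi = softmax([t * theta + m for t, m in zip(eta["tau"], eta["mu"])])
    P = [softmax([a * theta + b for a, b in zip(row_a, row_b)])
         for row_a, row_b in zip(eta["a"], eta["b"])]
    Q = [softmax([c * theta + d for c, d in zip(row_c, row_d)])
         for row_c, row_d in zip(eta["c"], eta["d"])]
    return pi, P, Q


def forward_likelihood(y, pi, P, Q):
    """P(Y = y | theta) by the forward recursion: O(T * K^2) operations
    instead of a sum over all K^T hidden state sequences."""
    K = len(pi)
    alpha = [pi[k] * Q[k][y[0]] for k in range(K)]
    for action in y[1:]:
        alpha = [sum(alpha[k] * P[k][l] for k in range(K)) * Q[l][action]
                 for l in range(K)]
    return sum(alpha)


def marginal_likelihood(y, eta):
    """Quadrature approximation of L_i(eta), integrating theta out."""
    return sum(w * forward_likelihood(y, *hmm_params(x, eta))
               for x, w in zip(NODES, WEIGHTS))
```

For long sequences the forward variables underflow, so a scaled or log-space recursion and a larger number of quadrature points would be advisable in practice.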

3.2. Parameter Estimation

We estimate the parameter vector $\boldsymbol{\eta}$ in LHMMs by its maximum likelihood estimator (MLE)

$$\hat{\boldsymbol{\eta}} = \mathop{\mathrm{argmax}}\limits_{\boldsymbol{\eta}} L(\boldsymbol{\eta} \mid \mathcal{Y}_n) = \mathop{\mathrm{argmax}}\limits_{\boldsymbol{\eta}} \log L(\boldsymbol{\eta} \mid \mathcal{Y}_n). \tag{11}$$

Since the likelihood function $L(\boldsymbol{\eta} \mid \mathcal{Y}_n)$ is differentiable and its gradient is computable, gradient-based optimization algorithms can be used to maximize the log-likelihood function $\log L(\boldsymbol{\eta} \mid \mathcal{Y}_n)$. We choose the BFGS algorithm (Broyden, 1970; Fletcher, 1970; Goldfarb, 1970; Shanno, 1970) to obtain $\hat{\boldsymbol{\eta}}$ because of its fast convergence rate. Although the gradient of $L(\boldsymbol{\eta} \mid \mathcal{Y}_n)$ can be numerically computed from $L(\boldsymbol{\eta} \mid \mathcal{Y}_n)$, supplying the exact gradient to the BFGS algorithm often leads to faster computation.
The expressions for the gradient of $L(\boldsymbol{\eta} \mid \mathcal{Y}_n)$ and how they can be evaluated using dynamic programming and Gauss–Hermite quadrature are given in Appendix B.
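Because the likelihood in (10) factorizes over respondents, the gradient of the log-likelihood that is maximized decomposes into per-respondent terms. The identity below is a direct consequence of the chain rule, stated here for convenience rather than taken from Appendix B:

$$\nabla_{\boldsymbol{\eta}} \log L(\boldsymbol{\eta} \mid \mathcal{Y}_n) = \sum_{i=1}^n \nabla_{\boldsymbol{\eta}} \log L_i(\boldsymbol{\eta}) = \sum_{i=1}^n \frac{\nabla_{\boldsymbol{\eta}} L_i(\boldsymbol{\eta})}{L_i(\boldsymbol{\eta})}.$$

Each summand requires only $L_i(\boldsymbol{\eta})$ and its gradient, so the likelihood values and their derivatives can be accumulated respondent by respondent.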

Expectation–Maximization (EM) algorithms (Dempster et al., 1977) are often used to compute MLEs for latent variable models such as HMMs. An EM algorithm can be designed for LHMMs, but unlike the EM algorithm for HMMs, the optimization problem in the maximization step for LHMMs does not have a closed-form solution, so BFGS or other numerical optimization algorithms are still needed. In addition, numerical integration and dynamic programming are also needed in the expectation step, as in the evaluation of the likelihood function. Therefore, EM algorithms do not bring significant convenience in computing the MLE for LHMMs, and we instead maximize the log-likelihood function directly using the BFGS algorithm.

3.3. Latent Trait Estimation

Given the estimated parameters $\hat{\boldsymbol{\eta}}$, the latent trait $\theta_i$ is often estimated by the maximum a posteriori (MAP) estimator or the expected a posteriori (EAP) estimator. We adopt the EAP estimator for estimating $\theta_i$ because it can be easily computed from the intermediate results of computing $\hat{\boldsymbol{\eta}}$, whereas computing the MAP estimator requires optimizing the posterior density function of $\theta_i$ for each $i$.
More specifically, the EAP estimator of $\theta_i$ is

$$\hat{\theta}_i = E(\theta_i \mid \hat{\boldsymbol{\eta}}, \boldsymbol{y}^{(i)}) = \frac{\int \theta_i \sum_{\boldsymbol{s}^{(i)}} P(\boldsymbol{Y}^{(i)} = \boldsymbol{y}^{(i)}, \boldsymbol{S}^{(i)} = \boldsymbol{s}^{(i)} \mid \theta_i, \hat{\boldsymbol{\eta}}) \phi(\theta_i) \, d\theta_i}{\int \sum_{\boldsymbol{s}^{(i)}} P(\boldsymbol{Y}^{(i)} = \boldsymbol{y}^{(i)}, \boldsymbol{S}^{(i)} = \boldsymbol{s}^{(i)} \mid \theta_i, \hat{\boldsymbol{\eta}}) \phi(\theta_i) \, d\theta_i}. \tag{12}$$

The denominator in (12) is $L_i(\hat{\boldsymbol{\eta}})$. Its value is already computed in the final iteration of the optimization algorithm for obtaining $\hat{\boldsymbol{\eta}}$. Also, the numerator can be calculated using Gauss–Hermite quadrature. The components needed in the calculation (e.g., quadrature points, weights, and function values at the quadrature points) can be recycled from computing $L_i(\hat{\boldsymbol{\eta}})$ since the integrand in the numerator is the integrand in the denominator multiplied by $\theta_i$.
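The recycling argument can be made concrete with a short Python sketch. It assumes a quadrature rule for integrals against a standard normal $\phi$ and takes as given the values of $P(\boldsymbol{Y}^{(i)} = \boldsymbol{y}^{(i)} \mid \theta, \hat{\boldsymbol{\eta}})$ at the quadrature nodes. The function name, the three-point rule, and the likelihood values are all made up for illustration.

```python
import math


def eap_estimate(nodes, weights, lik_values):
    """EAP estimate of theta from quantities already available after
    evaluating L_i(eta_hat).

    nodes, weights: quadrature rule for integrals against phi(theta).
    lik_values: likelihood of the observed sequence given theta at each node.
    """
    denominator = sum(w * f for w, f in zip(weights, lik_values))
    # Same integrand as the denominator, multiplied by theta at each node.
    numerator = sum(w * x * f for x, w, f in zip(nodes, weights, lik_values))
    return numerator / denominator


# Three-point rule for a standard normal weight; likelihood values are invented.
nodes = [-math.sqrt(3.0), 0.0, math.sqrt(3.0)]
weights = [1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0]
lik_values = [0.1, 0.2, 0.4]
theta_hat = eap_estimate(nodes, weights, lik_values)
```

Because the likelihood values increase with $\theta$ in this toy input, the resulting estimate is positive. No additional forward recursions are needed beyond those used for the denominator.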

3.4. Hidden State Estimation

Given the estimated parameters $\hat{\boldsymbol{\eta}}$ and latent trait $\hat{\theta}_i$, we estimate $\boldsymbol{s}^{(i)}$ by the most probable hidden state sequence for the action sequence $\boldsymbol{y}^{(i)}$:

$$\hat{\boldsymbol{s}}^{(i)} = \mathop{\mathrm{argmax}}\limits_{\boldsymbol{s}^{(i)}} P(\boldsymbol{Y}^{(i)} = \boldsymbol{y}^{(i)}, \boldsymbol{S}^{(i)} = \boldsymbol{s}^{(i)} \mid \hat{\theta}_i, \hat{\boldsymbol{\eta}}). \tag{13}$$

As the number of possible hidden state sequences grows exponentially with the sequence length, directly maximizing the probability is computationally expensive. Instead, in light of the connection between HMM and LHMM, we obtain $\hat{\boldsymbol{s}}^{(i)}$ through the Viterbi algorithm (Viterbi, 1967), a dynamic programming algorithm for finding the most probable hidden state sequence for an HMM. Details of the algorithm are provided in Appendix C.
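Once $\hat{\theta}_i$ and $\hat{\boldsymbol{\eta}}$ are plugged in, the initial, transition, and state-action probabilities are fixed numbers, so a standard Viterbi recursion applies. The Python sketch below is a generic log-space version, not the code of Appendix C. The function name and the small probability matrices in the example are hypothetical.

```python
import math


def viterbi(y, pi, P, Q):
    """Most probable hidden state sequence for the actions y under an HMM.

    pi[k]: initial probability of state k; P[k][l]: transition k -> l;
    Q[k][j]: probability of action j in state k. Scores are kept in log
    space to avoid underflow on long sequences.
    """
    def log(p):
        return math.log(p) if p > 0.0 else -math.inf

    K = len(pi)
    score = [log(pi[k]) + log(Q[k][y[0]]) for k in range(K)]
    backpointers = []
    for action in y[1:]:
        new_score, pointer = [], []
        for l in range(K):
            # Best predecessor of state l at this step.
            best_k = max(range(K), key=lambda k: score[k] + log(P[k][l]))
            pointer.append(best_k)
            new_score.append(score[best_k] + log(P[best_k][l]) + log(Q[l][action]))
        score = new_score
        backpointers.append(pointer)
    # Trace the best path backwards from the best final state.
    state = max(range(K), key=lambda k: score[k])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical two-state, three-action example (states and actions 0-indexed).
pi = [0.6, 0.4]
P = [[0.7, 0.3], [0.4, 0.6]]
Q = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
best_path = viterbi([0, 1, 2], pi, P, Q)
```

The recursion costs on the order of $T_i K^2$ operations, in contrast to enumerating every hidden state sequence.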

We implement the algorithms mentioned in Sects. 3.1, 3.2, 3.3, and 3.4 in R (R Core Team, 2023). The Gaussian quadrature points and weights are computed using the R package statmod (Giner et al., 2023). The function optim() is used to maximize the marginalized likelihood function for LHMMs. The evaluation of the marginalized likelihood function is implemented using Rcpp (Eddelbuettel and François, 2011) to improve the computational speed.

4. Case Studies

In this section, we demonstrate how LHMM helps understand respondents’ problem-solving processes through two case studies of response processes from PISA 2012. We consider data from two items, the CC item described in Sect. 1 and the TICKET item to be introduced in Sect. 4.2. The two items represent two important types of interactive problem-solving items: MicroDYN systems and finite state automata (OECD, 2014, Chapter 1).

4.1. Climate Control Item

In this section, we present the analysis of the CC response processes of 350 students from the USA. The lengths of the response processes range from 3 to 128, with the average being 21. More than 90% of the response processes contain fewer than 50 actions. As we described in Sect. 1, the item originally has 126 distinct actions. In this case study, we simplify the 125 experiment setting actions so that each action only reflects which bars are placed at a nonzero position at the time of clicking the APPLY button. The simplification leads to nine distinct actions in total: “RESET,” “None,” “Top,” “Middle,” “Bottom,” “Top_Middle,” “Top_Bottom,” “Middle_Bottom,” and “All.”
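The simplification can be sketched in Python as follows. The encoding of a setting as three integer bar positions, each taking one of five values (so that $5^3 = 125$ settings arise), is an assumption made for illustration, as is the function name. The raw log format of the item is not reproduced here.

```python
from itertools import product

BARS = ("Top", "Middle", "Bottom")


def simplify_setting(positions):
    """Map the three bar positions at an APPLY click to a coarse label.

    positions: (top, middle, bottom), where 0 means the bar is at zero.
    """
    moved = [bar for bar, pos in zip(BARS, positions) if pos != 0]
    if not moved:
        return "None"
    if len(moved) == len(BARS):
        return "All"
    return "_".join(moved)


# Hypothetical encoding: each bar takes one of five positions, -2 to 2.
settings = list(product(range(-2, 3), repeat=3))
labels = {simplify_setting(s) for s in settings}
action_set = labels | {"RESET"}
```

Under this encoding the 125 settings collapse to eight labels, and adding “RESET” gives the nine distinct actions listed above.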

The CC item has been well studied in the literature (Chen et al., 2019a; Greiff et al., 2015; Xu et al., 2020). An efficient way of solving the item is to adopt the Varying-One-Thing-At-a-Time (VOTAT) strategy. The action sequence “Top, RESET, Bottom, RESET, Middle” is an example of a response process adopting this strategy. In this case study, with the help of the proposed LHMM, we examine how students’ problem-solving processes vary in terms of adopting the VOTAT strategy. We fit the LHMM with $K = 2$. The choice of $K$ is made based on our understanding of the problem-solving stages involved in the item (Sect. 2.1). Also, the CC item belongs to MicroDYN systems, which typically contain two phases: the knowledge acquisition phase, in which students collect information about the required task, and the knowledge application phase, in which students solve the required task by applying the acquired knowledge (Herborn et al., 2017). Our choice also seems reasonable from this perspective. We will discuss data-driven selection of $K$ further in Sect. 4.3.
For simplicity, we ignore the individual difference in the initial state probability vector $\boldsymbol{\pi}$ by setting $\boldsymbol{\tau} = 0$. As a comparison to LHMM, we also fit the HMM with two hidden states to the response processes. The results are presented below.

4.1.1. Comparison Between LHMM and HMM Fits

Since HMM is a special case of LHMM with $\boldsymbol{\tau}$, $\boldsymbol{a}$, and $\boldsymbol{c}$ being zeros, we compare the goodness of fit of the two models using the likelihood ratio test. The results ($\chi^2(18) = 2394.0$, p-value $< 0.001$) show that the proposed LHMM provides a better fit than the standard HMM. Also, the Bayesian Information Criterion (BIC) values of HMM and LHMM are 25020.7 and 22786.5, respectively.
The results again support incorporating the latent variable $\theta$ to explain the heterogeneity of response processes. We focus on the results from LHMM in the rest of this section. The estimated parameters are provided in Table 3 in Appendix D.

Figure 3. Left: histogram of $\hat{\theta}$ in the CC item. Middle: boxplots of $\hat{\theta}$ grouped by binary item responses. Right: ROC curve of classifying binary item responses using $\hat{\theta}$.

4.1.2. Latent Trait Interpretation

In Fig. 3, we present $\hat{\theta}$ and its relationship with students’ binary item responses. The middle panel displays boxplots of $\hat{\theta}$ grouped by students’ binary item responses indicating whether the item was answered correctly (1) or incorrectly (0). It shows that students who successfully solved the item tend to have a higher $\hat{\theta}$. In addition, the ROC curve of classifying the item response by $\hat{\theta}$ is plotted in the right panel of Fig. 3. The area under the curve (AUC) is 0.709. These results suggest that the latent trait may be related to students’ problem-solving skills.
Further examination of the response processes shows that the VOTAT strategy is often used in the response processes with large $\hat{\theta}$, while the response processes with small $\hat{\theta}$ often lack this feature. Table 1 presents a few examples of response processes with top or bottom 5% of $\hat{\theta}$. Since the ability to use the VOTAT strategy is closely related to problem-solving skills, it is not unreasonable to interpret the latent trait $\theta$ as students’ problem-solving proficiency.
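For readers less familiar with the AUC, it equals the probability that a randomly chosen student who answered correctly has a higher $\hat{\theta}$ than a randomly chosen student who answered incorrectly, with ties counted as one half. The Python sketch below computes it directly from this definition. The function name and the toy scores are illustrative and are not the PISA data.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparisons.

    scores: estimated latent traits; labels: 1 for a correct response,
    0 for an incorrect one. Quadratic in the sample size, which is
    acceptable for a few hundred respondents.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example with four respondents.
toy_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

In the toy example three of the four correct-incorrect pairs are ordered correctly, so the AUC is 0.75. A value of 0.5 would indicate no discrimination.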

Table 1. Examples of response processes with top or bottom 5% of $\hat{\theta}$.

Note: For ease of presentation, the exhibited response processes are shortened by removing consecutively repeating actions.

Figure 4. State-action probability matrices at the quartiles of $\hat{\theta}$ in the CC item. The column names T_M, T_B, and M_B stand for actions “Top_Middle,” “Top_Bottom,” and “Middle_Bottom,” respectively.

4.1.3. Hidden State Interpretation

In Fig. 4, we plot the state-action probability matrices at the quartiles of $\hat{\theta}$ to check the connection between the two hidden states and the problem-solving processes of the CC item. As shown in the figure, actions “Middle,” “Bottom,” and “RESET” are often used in State 1 but rarely used in State 2. These actions are often used in the VOTAT strategy to isolate the effect of a control bar. In contrast, the actions involving placing multiple bars at nonzero positions (e.g., “All” and “Top_Middle”) often have small or near-zero probabilities in State 1 and higher probabilities in State 2. These patterns suggest that students often explore the item interface and try to figure out how to solve the problem in State 2 and apply the efficient VOTAT strategy to solve the problem in State 1. We label States 2 and 1 as the EXPLORE and VOTAT States, respectively.

The probabilities associated with action “Top” under the two states seem to conflict with the interpretation. Intuitively, action “Top” plays a similar role as actions “Middle” and “Bottom.” It should be a critical action in applying the VOTAT strategy. However, in the estimated state-action probability matrices, “Top” has a much higher probability in the EXPLORE State than in the VOTAT State. A possible reason for this counterintuitive pattern is that “Top” is also used when students explore the interface since the top bar is often the first control bar one would move. The fitted LHMM tends to assign the same hidden states to “Top” as those for “All,” “Top_Middle,” and other actions that students used when exploring the interface.

Figure 5. State transition probability curves of the CC item.

4.1.4. Difference in Response Processes Across Students

We investigate the variation in response processes by examining the state transition and the state-action probability curves shown in Figs. 5 and 6. Regarding the state transition probabilities, in both states, the probability of staying in the same state is high, indicating students often stay in one state for a few steps and then jump to the other state. As $\theta$ increases, the probability of staying in the EXPLORE State decreases, while the probability of staying in the VOTAT State has no notable change. These results suggest that students with higher problem-solving proficiency often take fewer actions in the EXPLORE State and reach the efficient VOTAT State faster. Once in the VOTAT State, all students are very likely to stay in the efficient problem-solving state regardless of their problem-solving proficiency.

Figure 6. State-action probability curves of the CC item.

In terms of state-action probability, students’ behaviors vary with $\theta$ in both states. In the EXPLORE State, students with a higher $\theta$ tend to use diverse actions to explore the interface, while those with a lower $\theta$ often place all three bars at nonzero positions without exploring other patterns or using the RESET button. In the VOTAT State, the behavior patterns for students with $\theta$ between 0.5 and 2.5 do not vary greatly. They mainly use actions “Middle,” “Bottom,” and “RESET” to apply the VOTAT strategy.
On the other hand, students with a lower $\theta$ are more likely to take actions “Middle_Bottom,” “Top_Bottom,” and “None.” By examining the action sequences and corresponding estimated hidden state sequences of these students, we found that some of these students seem to use a different way to implement the VOTAT strategy. Instead of placing only one bar at a nonzero position at a time, they place one additional bar at a nonzero position at a time. Inspecting the change in the temperature and humidity readings under the two settings provides information on the climate variable controlled by the additional bar. Action sequence “Middle, Middle_Bottom, Top_Middle_Bottom, Middle_Bottom, Bottom, Top_Bottom” is an example of the response processes performing VOTAT in this way. Students can identify the climate variable associated with the bottom bar by comparing the readings under settings “Middle” and “Middle_Bottom.” Similarly, the readings from “Middle_Bottom” and “Bottom” provide information about the middle bar. The results from LHMM can help us distinguish the two different ways of implementing the VOTAT strategy to some extent.

Figure 7. Interface of the TICKET item in PISA 2012.

4.2. TICKET Item

In this section, we focus on the response processes from the TICKET item. The item requires students to purchase a full fare country train ticket with two individual trips through an automated ticketing machine. The interactive screen of the ticketing machine, along with operating instructions, is included in the item interface. Figure 7 presents screenshots of different pages shown on the ticketing machine screen and how the screen changes in response to button clicks. The upper left screenshot gives the initial page. The same operating instructions are shown in the left panel of each page. They are omitted in the screenshots of subsequent pages to save space. The flow of the pages is marked by the arrows in the figure. When solving the item, a student needs to choose the train network (country trains or city subway), fare type (full fare or concession), and pricing basis (daily or individual trips) for the tickets in sequence. The student will also be asked to choose the number of trips if individual trip tickets are chosen. Tickets are purchased after clicking the BUY button on the final page. If the CANCEL button is clicked on any page, then the screen will return to the initial page with all previous choices cleared. The TICKET item involves 13 distinct actions, each corresponding to clicking a button in the interface. The descriptions of these actions are provided in Table 2.

Table 2. Actions in the TICKET item.

Our dataset contains the response processes of 417 students from the USA. The process length ranges from 4 to 32, with a mean of 6.58 and a standard deviation of 3.71. The action counts and proportions are presented in the last column of Table 2. Among the 417 students, 272 (65.2%) answered the item correctly.

As described above, the TICKET item requires students to make three major choices regarding the network, fare type, and pricing basis of the ticket to be bought. Choosing each aspect of the ticket can be seen as a subtask of the original task. However, the transitions of the subtasks are completely determined by the action in the current step, clearly violating the model assumptions of LHMM. Despite this, we still fit LHMM with three hidden states ($K=3$) to examine whether LHMM can recover the subtask structure under model misspecification. Because of the design of the interface, all students start the item by choosing the train network, so we ignore the individual difference in the initial state probability vector $\boldsymbol{\pi}$ by setting $\boldsymbol{\tau} = \boldsymbol{0}$. The HMM with three hidden states is also fitted to the data for comparison.

4.2.1. Comparison Between LHMM and HMM Fits

As in the analysis of the CC item, we use the likelihood ratio test and BIC to compare the model fits of LHMM and HMM. The likelihood ratio test gives test statistic $\chi^2(42) = 872.0$ and p-value $< .001$. The BIC values of HMM and LHMM are 8663.8 and 8124.3, respectively. These results indicate that LHMM is more appropriate to describe the response processes than HMM. In the remaining parts of the section, we examine the results from LHMM in detail. The estimated parameters are provided in Table 4 in Appendix D.
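The two criteria reported above are linked by a simple identity, which gives a quick consistency check on the numbers. The sketch below is illustrative only. It assumes the usual definition BIC = -2 log L + p log N and that LHMM has 42 more free parameters than HMM (the degrees of freedom of the test). The effective sample size N is not stated in this section, so the code solves for the value implied by the reported figures rather than assuming one; the variable names are hypothetical.

```python
import math

# Values reported in the text for the TICKET item.
lr_stat = 872.0                      # 2 * (logL_LHMM - logL_HMM)
df = 42                              # extra free parameters in LHMM (assumed equal to the test df)
bic_hmm, bic_lhmm = 8663.8, 8124.3

# Under BIC = -2 logL + p log N, the BIC gap equals the LR statistic
# minus the extra complexity penalty:
#   BIC_HMM - BIC_LHMM = lr_stat - df * log N
bic_gap = bic_hmm - bic_lhmm
implied_log_n = (lr_stat - bic_gap) / df
implied_n = math.exp(implied_log_n)

# For reference: number of students times mean process length, both from the text.
approx_total_actions = 417 * 6.58

print(f"BIC gap (HMM - LHMM): {bic_gap:.1f}")
print(f"implied N: {implied_n:.0f}")
print(f"417 * 6.58: {approx_total_actions:.0f}")
```

Under these assumptions, the implied N is close to the total number of recorded actions rather than the number of students. This suggests, but does not confirm, that the penalty term counts actions.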

4.2.2. Latent Trait Interpretation

The left panel of Fig. 8 presents the histogram of the estimated latent traits $\hat{\theta}$ for the 417 students. A tall bar on the high end of $\hat{\theta}$ stands out in the graph. It exists because a large proportion of students share the identical action sequence “country_trains, full_fare, individual, trip_2, Buy,” leading to identical $\hat{\theta}$ values for these students.

Figure 8. Left: histogram of $\hat{\theta}$ in the TICKET item. Middle: boxplots of $\hat{\theta}$ grouped by binary item responses. Right: ROC curve of classifying binary item responses using $\hat{\theta}$.

The boxplots in the middle panel show that students who answered the item correctly tend to have a larger $\hat{\theta}$, and those who answered the item incorrectly tend to have a smaller $\hat{\theta}$. The ROC curve in the right panel of Fig. 8 has AUC 0.899, which is close to 1. Based on these results, we interpret the latent trait as students’ problem-solving skills that the TICKET item is designed to assess. With this interpretation, it is natural that students with the action sequence “country_trains, full_fare, individual, trip_2, Buy” have the highest $\hat{\theta}$, as this action sequence corresponds to the most succinct way of answering the item correctly.
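For readers who want to reproduce this kind of check, the AUC of a score against a binary outcome can be computed without drawing the ROC curve, using its rank-based (Mann-Whitney) form. The following is a minimal sketch, not the computation used in the paper; the function name and the toy data are hypothetical and are not the PISA sample.

```python
def auc_from_scores(scores, labels):
    """Probability that a randomly chosen correct responder has a higher score
    than a randomly chosen incorrect responder, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect response")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: estimated latent traits and binary item responses.
theta_hat = [1.2, 1.2, 0.4, -0.3, 0.9, -1.1, 0.1, -0.6]
correct = [1, 1, 1, 0, 1, 0, 0, 1]
print(auc_from_scores(theta_hat, correct))
```

An AUC near 1 means the score ranks almost every correct responder above almost every incorrect one, while 0.5 corresponds to no separation.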

4.2.3. Connection Between Hidden States and Subtasks

To examine how the hidden states in LHMM are connected with problem-solving subtasks, we present in Fig. 9 the state-action probability matrices at the quartiles of $\hat{\theta}$. In State 1, actions “country_train” and “city_subway” are taken with high probabilities. Since the two actions are the only possible choices for the train network, State 1 is related to the subtask of choosing the train network. Similarly, State 2 is related to the subtask of choosing fare type as actions “full_fare” and “concession” are taken with high probabilities in this state. Finally, in State 3, the high probability actions involve choosing between daily and individual trip tickets and the number of individual trips. Hence, the state is related to the subtask of choosing the pricing basis (and related details) of the tickets. Although LHMM is not provided with the information on how the required task should be solved, the hidden states in the fitted model can be linked to the subtasks of solving the required task. This suggests that LHMM is able to capture the structure of students’ response processes of the TICKET item. For ease of reference, we label the three states as the Network state, the Fare Type state, and the Pricing state, respectively.

Figure 9. State-action probability matrices at the quartiles of $\hat{\theta}$ in the TICKET item.

4.2.4. Difference in Response Processes Across Students

With the interpretations of the latent trait and hidden states in mind, we are now ready to examine how the response processes vary across students. Figure 10 shows the curves of state-action probabilities as functions of the latent trait $\theta$. In the Network state, as $\theta$ increases, the probability of taking the correct action (“country_train”) increases while the probability of taking the incorrect action (“city_subway”) decreases. Similar patterns are also observed in the Fare Type and Pricing states. Since $\theta$ is interpreted as students’ problem-solving proficiency, the results agree with the intuition that students who are better at problem-solving are more likely to make the correct choices.

Figure 10. State-action probability curves of the TICKET item.

Figure 11 exhibits the curves of state transition probabilities. According to the figure, when in the Network state (left panel), the students will almost surely jump to the Fare Type state in the next step regardless of their problem-solving proficiency $\theta$. If the current state is the Fare Type state (middle panel), the next state is either the Pricing state or the Network state. Of the two states, the Pricing state takes about 90% of the probability, and the percentage increases slightly as $\theta$ increases. If the current state is the Pricing state (right panel), the state either stays at the Pricing state (for selecting the number of individual trips) or returns to the Network state. Students with higher $\theta$ values are more likely to stay in the Pricing state. The transition from Fare Type or Pricing to Network is a result of clicking the CANCEL button to correct previous choices or explore the interface more. Students with better problem-solving skills are less likely to make mistakes and need less exploration. Hence, they are less likely to restart the selection process and more likely to remain on the track of choosing the network, fare type, and pricing basis to complete the required task.

Figure 11. State transition probability curves of the TICKET item.

Note that the state transition patterns obtained from LHMM are consistent with how the screen pages are switched in the interface although such information was not utilized in model fitting. It again suggests that LHMM is able to characterize the structure of the response processes of the TICKET item. The resemblance between state transitions and page changes further confirms the links between hidden states and the subtasks.

4.3. Selection of K

Fitting LHMMs requires a pre-specified number of hidden states $K$. In the case studies, we set $K=2$ in the CC item and $K=3$ in the TICKET item based on our understanding of the required tasks in the two items. For both items, we tried $K=2,3,4,5,6$ and found that the chosen $K$ produces the most interpretable results. Data-driven methods for setting $K$ in an HMM are available. Chapter 15.6 of Cappé et al. (2005) gives a penalized maximum likelihood method for selecting $K$. The resulting criterion for selection is similar to BIC but with a heavier penalty on the model complexity.
For the CC item, the modified BIC values for the LHMM with $K=2, 3, 4, 5, 6$ are 22820.11, 22724.54, 22298.61, 23597.99, and 24391.41, with $K=4$ corresponding to the smallest value. However, the values for $K=2,3,4$ do not vary very much. Our choice $K=2$ also seems reasonable in terms of the criterion. For the TICKET item, the modified BIC values are 9402.99, 8586.84, 9169.67, 9695.15, and 10094.50. According to this criterion, $K=3$ is selected, matching our choice. Although strong consistency has been established for the modified BIC, we do not recommend selecting $K$ solely based on the criterion. Whether the obtained results are interpretable is also an important factor.
Finally, to the best of the authors’ knowledge, the ordinary BIC has not been theoretically justified for selecting $K$ in HMMs or LHMMs, as the model with a smaller $K$ is located on the boundary instead of the interior of the parameter space of the model with a larger $K$.
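The selection rule itself is just a minimization over the candidate values of K. As a small illustration, the snippet below applies it to the modified BIC values reported in this section and also prints how far each candidate is from the best one. The dictionary layout and function name are hypothetical choices for this sketch.

```python
# Modified BIC values reported in the text, indexed by the number of hidden states K.
modified_bic = {
    "CC": {2: 22820.11, 3: 22724.54, 4: 22298.61, 5: 23597.99, 6: 24391.41},
    "TICKET": {2: 9402.99, 3: 8586.84, 4: 9169.67, 5: 9695.15, 6: 10094.50},
}

def select_k(values):
    """Return the K with the smallest criterion value."""
    return min(values, key=values.get)

for item, values in modified_bic.items():
    best = select_k(values)
    # Gap of each candidate from the best one, to see which values are close competitors.
    gaps = {k: round(v - values[best], 2) for k, v in values.items()}
    print(item, "selected K =", best, "gaps:", gaps)
```

Printing the gaps alongside the minimizer makes it easier to weigh the criterion against interpretability, as recommended in the text.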

5. Simulation Studies

In this section, we demonstrate the performance of LHMM through simulation studies. We investigate 1) the performance of estimating the parameters, latent traits, and hidden states in LHMM and 2) how well the proposed model can be distinguished from the ordinary HMM.

5.1. Settings

For the first aim, we generate action sequences from the LHMM described in Sect. 2.2 with three hidden states ($K=3$) and ten actions ($M=10$). The parameters of the LHMM are presented in Table 5 in Appendix E. These parameters are chosen so that the resulting probability curves (solid lines in Fig. 14) resemble those estimated from the TICKET item. Datasets with different combinations of sample size $n$ and average sequence length $L$ are generated to investigate how the estimation performance changes with the two quantities. Two choices of $n$, 100 and 500, are considered to represent small and large sample size scenarios. Two choices of $L$, 10 and 50, are considered to represent short and long sequence scenarios. Fifty datasets are generated for each combination of $n$ and $L$. To generate a sequence, we first generate $\theta_i$ from the standard normal distribution and the sequence length $T_i$ from the Poisson distribution with mean $L$.
Then, the hidden states and actions in the sequence are generated sequentially from the LHMM. Note that the sequences in a generated dataset vary in length and action composition.
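The generation procedure described above can be sketched in a few lines. This is an illustration under stated assumptions, not the author's code. The exact parametrization from Sect. 2.2 is not reproduced in this section, so the sketch assumes each probability vector is a softmax of a linear function of $\theta$. The coefficient names `B`, `D`, and `mu`, the randomly drawn coefficient values (which are not those of Table 5), and the rule of forcing at least one action per sequence are all hypothetical.

```python
import math
import random

def softmax(xs):
    """Map real-valued scores to a probability vector."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def poisson(rng, mean):
    """Draw a Poisson variate with Knuth's multiplication method."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def draw(rng, probs):
    """Sample an index from a discrete distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def simulate_sequence(rng, theta, length, tau, mu, A, B, C, D):
    """Generate hidden states and actions sequentially for one respondent."""
    K, M = len(A), len(C[0])
    # Initial state (assumed form: softmax of tau * theta + mu).
    state = draw(rng, softmax([tau[k] * theta + mu[k] for k in range(K)]))
    states, actions = [], []
    for t in range(length):
        if t > 0:  # transition depends on the previous state and on theta
            state = draw(rng, softmax([A[state][l] * theta + B[state][l] for l in range(K)]))
        states.append(state)
        # Action depends on the current state and on theta.
        actions.append(draw(rng, softmax([C[state][j] * theta + D[state][j] for j in range(M)])))
    return states, actions

def simulate_dataset(rng, n, mean_length, params):
    data = []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)                  # latent trait ~ N(0, 1)
        length = max(1, poisson(rng, mean_length))   # assumption: no empty sequences
        states, actions = simulate_sequence(rng, theta, length, **params)
        data.append({"theta": theta, "states": states, "actions": actions})
    return data

def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
K, M = 3, 10
params = {  # placeholder coefficients, NOT the values in Table 5
    "tau": [0.0] * K, "mu": [0.0] * K,
    "A": random_matrix(rng, K, K), "B": random_matrix(rng, K, K),
    "C": random_matrix(rng, K, M), "D": random_matrix(rng, K, M),
}
data = simulate_dataset(rng, n=100, mean_length=10, params=params)
print(len(data), sum(len(d["actions"]) for d in data) / len(data))
```

Calling `simulate_dataset` once per replication with $n \in \{100, 500\}$ and mean length in $\{10, 50\}$ would mirror the four scenarios in the text.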

For each dataset, we fit the proposed LHMM using the algorithms described in Sect. 3. Since the parameters in LHMM are identifiable only up to a permutation of hidden states and a sign change of parameters $\boldsymbol{A}$ and $\boldsymbol{C}$, the following post-estimation processing is conducted to match the estimated latent traits and parameters with the corresponding true values before evaluating the estimation performance. First, to align the estimated and the true latent traits, we multiply $\hat{\theta}$ by $-1$ if the correlation between $\hat{\theta}$ and $\theta$ is negative. Then, the estimated latent traits $\hat{\theta}$ are standardized to have mean zero and standard deviation one. The estimated parameters are also rescaled and shifted accordingly. Furthermore, for each possible permutation of the hidden states, we record the deviation of the estimated probability curves computed using the permuted parameters from the true curves. The deviation measure will be described shortly. The permutation producing the smallest deviation measurement is then chosen to obtain the final permuted parameters and hidden states for evaluation.
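The alignment steps can be sketched as follows. This is an illustrative outline, not the author's implementation. It assumes each probability depends on the latent trait through a linear term of the form slope times $\theta$ plus intercept, so that flipping and standardizing $\hat{\theta}$ can be offset exactly by adjusting the slope and intercept. The function names, the use of the population standard deviation, and the toy matrices are hypothetical.

```python
import math
from itertools import permutations

def align_latent_traits(theta_hat, theta_true):
    """Flip the sign if negatively correlated with the truth, then standardize."""
    n = len(theta_hat)
    mean_hat = sum(theta_hat) / n
    mean_true = sum(theta_true) / n
    cov = sum((h - mean_hat) * (t - mean_true) for h, t in zip(theta_hat, theta_true))
    sign = -1.0 if cov < 0 else 1.0
    sd = math.sqrt(sum((h - mean_hat) ** 2 for h in theta_hat) / n)
    aligned = [sign * (h - mean_hat) / sd for h in theta_hat]
    return aligned, sign, mean_hat, sd

def transform_coefficients(slope, intercept, sign, mean_hat, sd):
    """Rescale and shift so that slope * theta + intercept is unchanged."""
    return slope * sign * sd, intercept + slope * mean_hat

def best_permutation(K, deviation):
    """Try every relabelling of the K states; keep the one with the smallest deviation."""
    return min(permutations(range(K)), key=deviation)

# Toy check of the sign flip and standardization (hypothetical numbers).
theta_true = [-1.0, 0.0, 0.5, 1.5]
theta_hat = [2.1, 0.9, 0.2, -1.0]          # negatively correlated with the truth
aligned, sign, m, sd = align_latent_traits(theta_hat, theta_true)
new_slope, new_intercept = transform_coefficients(0.7, -0.2, sign, m, sd)

# Toy check of the permutation search: the estimate has its two labels swapped.
p_true = [[0.9, 0.1], [0.2, 0.8]]
p_hat = [[0.8, 0.2], [0.1, 0.9]]

def deviation(perm):
    return sum((p_hat[perm[k]][perm[l]] - p_true[k][l]) ** 2
               for k in range(2) for l in range(2))

print(sign, best_permutation(2, deviation))
```

The exhaustive search visits K! relabellings, which is cheap for the small K considered here (six permutations when K = 3).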

For each dataset, we measure the discrepancy between the estimated and true state transition probability curves using root-mean-squared error (RMSE):

$$\text{RMSE}(\boldsymbol{P}) = \sqrt{\frac{1}{nK^2}\sum_{k=1}^K \sum_{l=1}^K \sum_{i=1}^n \left(\hat{p}_{kl}(\theta_i) - p_{kl}(\theta_i)\right)^2},$$

where $\theta_i$ is the true latent trait of the $i$-th sequence in the dataset, $p_{kl}(\theta)$ is the true probability curve of transition from state $k$ to state $l$, and $\hat{p}_{kl}(\theta)$ is the corresponding estimated curve. The RMSEs for evaluating the estimated state-action probability curves and the initial state probability distribution are defined analogously. We use the Pearson correlation between the estimated and true latent traits to evaluate the accuracy of latent trait estimation. The accuracy of hidden state estimation is evaluated by computing the proportion of the estimated hidden states that match the true ones.
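These evaluation measures translate directly into code. The sketch below is a hedged illustration: curves are passed as hypothetical functions that map $\theta$ to a $K \times K$ matrix, and the hidden state accuracy is pooled over all positions in all sequences, which is one reading of "proportion of the estimated hidden states that match the true ones".

```python
import math

def rmse_transition(p_hat, p_true, thetas, K):
    """RMSE between estimated and true transition curves, averaged over the
    n true latent traits and all K * K state pairs."""
    total = 0.0
    for theta in thetas:
        est, tru = p_hat(theta), p_true(theta)
        for k in range(K):
            for l in range(K):
                total += (est[k][l] - tru[k][l]) ** 2
    return math.sqrt(total / (len(thetas) * K * K))

def state_accuracy(est_states, true_states):
    """Proportion of positions, pooled over sequences, where the states agree."""
    matches = sum(e == t for es, ts in zip(est_states, true_states) for e, t in zip(es, ts))
    total = sum(len(ts) for ts in true_states)
    return matches / total

# Hypothetical constant curves for a two-state model, for illustration only.
def true_curve(theta):
    return [[0.7, 0.3], [0.4, 0.6]]

def est_curve(theta):
    return [[0.8, 0.2], [0.3, 0.7]]

thetas = [-1.0, 0.0, 1.0]
print(rmse_transition(est_curve, true_curve, thetas, K=2))
print(state_accuracy([[0, 1, 1], [2]], [[0, 1, 0], [2]]))
```

The same `rmse_transition` pattern applies to the state-action curves by replacing the $K \times K$ matrix with a $K \times M$ one and the divisor with $nKM$.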

To compare the proposed model with the ordinary HMM, we also fit the HMM with $K=3$ to each of the datasets generated previously. In addition, we generate datasets from an HMM with $K=3$ and fit both LHMM and HMM to each dataset. The model parameters are set as their counterparts in the LHMM with $\theta = 0$. As in the settings for generating datasets from LHMM, we consider sample size $n = 100, 500$ and average sequence length $L = 10, 50$. Fifty datasets are generated for each combination of $n$ and $L$. The two fitted models for a given dataset are compared using BIC.

Figure 12. Histograms of the difference between the BIC of LHMM and HMM for datasets generated from LHMM (top) and HMM (bottom).

5.2. Results

5.2.1. Comparison of LHMM and HMM

Figure 12 presents the histograms of the difference between the BIC values of LHMM and HMM in different scenarios. The top and bottom rows correspond to results for datasets generated from LHMM and HMM, respectively. According to the figure, BIC can correctly choose between LHMM and HMM most of the time. Several mistakes are made in the case of $n=100$ and $L=10$ when the true model is LHMM and in the case of $n=500$ and $L=10$ when the true model is HMM. Overall, short sequences may make it difficult to distinguish between LHMM and HMM using BIC.

5.2.2. Parameter Estimation

The RMSEs of the estimated probabilities are given in Fig. 13. In general, the performance of estimating the state transition and state-action probabilities improves as $n$ or $L$ increases. However, increasing sequence length does not reduce the RMSE of the estimated initial state probabilities since only the first action in each sequence provides information on $\boldsymbol{\pi}$. In Fig. 13, the medians of the RMSE for each component are usually below 0.1, indicating reasonable overall estimation performance. We present the estimated probability curves from two randomly selected datasets in Fig. 14. One dataset is selected from the small-$n$-small-$L$ scenario ($n=100, L=10$), and the other one is selected from the large-$n$-large-$L$ scenario ($n=500, L=50$). It is clear that the probability curves are estimated more accurately in the latter scenario. Although the estimated curves in the small-$n$-small-$L$ scenario can deviate from the true curves by a large amount, the overall trends often still resemble the true ones.

Figure. 13 Boxplots of the RMSEs of estimated probabilities.

Figure. 14 True (solid lines) and estimated state transition (top row) and state-action probability curves (bottom row) for scenarios $n=100$, $L=10$ (dashed lines) and $n=500$, $L=50$ (dotted lines). Different elements in a probability distribution are distinguished by colors.

5.2.3. Latent Trait and Hidden States Estimation

Figure 15 presents the results of estimating the latent variables in LHMM. Overall, both latent traits and hidden states are estimated reasonably well. In all scenarios, the median Pearson correlations between estimated and true latent traits are above 0.85, and the median accuracies of estimated hidden states are above 0.8. However, large variation in estimation performance is seen in scenarios with short sequences. Note that both latent traits and hidden states are sequence-specific. While increasing sequence length provides more information for these latent variables, increasing sample size does not directly provide more information. As a result, in Fig. 15, we do not see significant improvement in the estimation performance when n is increased from 100 to 500.
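The two evaluation measures mentioned above can be sketched as follows. The sketch assumes that the labels of the estimated hidden states have already been aligned with the true labels, and the data are hypothetical.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def state_accuracy(estimated_states, true_states):
    """Proportion of time points at which the estimated hidden state is correct."""
    matches = sum(e == t for e, t in zip(estimated_states, true_states))
    return matches / len(true_states)


# Hypothetical latent traits of four respondents and hidden states of one sequence.
true_theta = [-1.0, 0.0, 1.0, 2.0]
est_theta = [-0.8, 0.1, 0.7, 1.6]
true_states = [1, 1, 2, 3, 3]
est_states = [1, 2, 2, 3, 3]

r = pearson_correlation(est_theta, true_theta)
acc = state_accuracy(est_states, true_states)
print(round(r, 3), acc)
```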

Figure. 15 Boxplots of evaluation measures of estimated latent variables.

6. Summary and Discussion

Process data contain rich information on respondents’ problem-solving behaviors that is not available in traditional item responses. This paper proposes an LHMM for characterizing response processes and understanding the heterogeneity of problem-solving behaviors across respondents. Under the proposed model, a response process follows an HMM given the respondent’s latent trait. The parameters in HMM are further parametrized using the latent trait to account for individual differences in solving problems. The structure of HMMs is analogous to problem-solving processes with the hidden states interpreted as problem-solving subtasks. The latent-trait-dependent state transition probability matrix $\boldsymbol{P}$ and state-action probability matrix $\boldsymbol{Q}$ describe how respondents differ in arranging and completing subtasks, respectively.

In LHMM, the latent trait is introduced as a quantity that summarizes the differences in respondents’ behavior patterns in a parsimonious and abstract way. Examining the estimated latent trait $\hat{\theta}$ can help distinguish the behavior patterns of respondents in different groups. For example, in the CC item, after comparing $\hat{\theta}$ with respondents’ age, one may observe that younger respondents have significantly smaller $\hat{\theta}$ and thus are less likely to reach the efficient VOTAT state. Identifying the differences can provide guidance on designing more targeted and even individualized interventions.

Although the proposed model does not impose concrete meaning on the latent trait, it is possible to find meaningful interpretations of the latent trait for at least some items. In the case studies, we interpret the latent trait as respondents’ problem-solving skill by observing the close relationship between $\hat{\theta}$ and the binary item responses. In general, interpreting the latent trait requires knowledge about the item design. One can first infer the meaning of $\theta$ by examining how the state transition probability matrix $\boldsymbol{P}$ and the state-action probability matrix $\boldsymbol{Q}$ change with $\theta$. The conjecture can then be verified by comparing the latent traits with appropriate quantities (e.g., binary item response in our case).

One important feature of the proposed model is the use of HMM as the basic framework for modeling response processes. This feature allows us to connect latent traits to problem-solving subtasks. Latent variable models with simpler structures may not possess this feature although they can still describe the heterogeneity of problem-solving processes. For example, one could consider a set of n-grams (He and von Davier, 2016) of actions. Item response theory (IRT) models can be used to describe the binary matrix recording whether each n-gram appears in each response process or not. We considered a 2PL IRT approach for analyzing the TICKET and CC items. Unigrams and bigrams of actions were used to form the data matrix. The estimated latent traits have a moderate to strong correlation with those obtained from LHMM (0.87 for the TICKET item and 0.47 for the CC item), indicating that similar levels of heterogeneity are summarized in the two models. For the TICKET item, the results from the IRT model show that, as $\theta$ increases, the probability of using “country_train” increases and the probability of using “subway” decreases, but the results do not reveal that the two actions are the two options for determining the train network. Similarly, for the CC item, we are not able to tell from the IRT model results that students with a higher $\theta$ tend to reach the efficient VOTAT State faster.
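As an illustration of the n-gram representation, the binary matrix passed to an IRT model could be built along the following lines. This is a sketch, not the code used in the analysis; the two toy sequences and the action labels "start" and "end" are hypothetical.

```python
def ngrams(sequence, n):
    """All contiguous n-grams of an action sequence, as tuples."""
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]


def ngram_indicator_matrix(sequences, max_n=2):
    """Binary matrix whose (i, j) entry is 1 if n-gram j appears in sequence i."""
    vocab = sorted({g for seq in sequences
                    for n in range(1, max_n + 1)
                    for g in ngrams(seq, n)})
    index = {g: j for j, g in enumerate(vocab)}
    matrix = []
    for seq in sequences:
        row = [0] * len(vocab)
        for n in range(1, max_n + 1):
            for g in ngrams(seq, n):
                row[index[g]] = 1
        matrix.append(row)
    return vocab, matrix


# Two hypothetical response processes.
sequences = [
    ["start", "subway", "end"],
    ["start", "country_train", "end"],
]
vocab, matrix = ngram_indicator_matrix(sequences, max_n=2)
for gram, column in zip(vocab, zip(*matrix)):
    print(gram, column)
```

Each column of the matrix plays the role of a binary item, which illustrates why the order of subtasks is only partially retained in this representation.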

Under the proposed model, the response processes of respondents with the same $\theta$ are assumed to evolve according to the same stochastic process. This assumption does not deprive the model of its ability to describe the behavior diversity among these respondents. In fact, the diversity is characterized by the dispersion of the state transition and the state-action probability distributions. For example, if a state-action distribution is concentrated on a single action, different respondents are very likely to take the same action under the given hidden state. The actions taken by different respondents will be more diverse if more actions are assigned significant probability or if the probability is more evenly distributed among actions. According to the state-action distributions in Fig. 9, in State 1, respondents’ behaviors vary more at $\theta = -0.74$ than at $\theta = 0.71$ since three actions have non-trivial probabilities at the smaller $\theta$, while a single action “country_train” takes almost all probability mass at the larger $\theta$. Similar patterns are observed in other states. Quantitatively, Shannon entropy (Cover and Thomas, 2006) can be used to measure the dispersion or the uncertainty of a distribution, with a higher value indicating more variation. The Shannon entropies of the state-action probability distributions in the three hidden states are 0.34, 0.38, and 0.23 at $\theta = -0.74$ and reduce to 0.02, 0.16, and 0.01 at $\theta = 0.71$. These results are consonant with existing empirical results showing that the behavior patterns of respondents with lower problem-solving proficiency have more variation (Eichmann et al., 2020; He et al., 2019; Ulitzsch et al., 2022a, b).
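The entropy comparison can be sketched in a few lines of Python. The sketch uses the natural logarithm by default; the base and any normalization behind the reported values are not stated in this section, and the two distributions are hypothetical, so the printed numbers are not expected to match those above.

```python
import math


def shannon_entropy(probabilities, base=math.e):
    """Shannon entropy of a discrete distribution; zero-probability terms contribute 0."""
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)


# Hypothetical state-action distributions over three actions.
dispersed = [0.5, 0.3, 0.2]        # several actions with non-trivial probability
concentrated = [0.97, 0.02, 0.01]  # one action takes almost all probability mass

print(round(shannon_entropy(dispersed), 3))
print(round(shannon_entropy(concentrated), 3))
```

A distribution concentrated on one action has entropy close to zero, whereas the entropy is largest when the probability is spread evenly over the actions.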

The proposed model and our data analyses have several limitations, suggesting directions for further investigation. In the case studies, we interpret the latent trait as problem-solving proficiency because of its close relationship with binary item responses. Ideally, the interpretation should be further validated through carefully designed experiments or reliable measurement results. Such analyses will also help gauge the possibility of using LHMM as a measurement model. Also, we currently interpret the hidden states based on the patterns in state-action probability matrices and our understanding of the required tasks. More systematic and less subjective methods for interpreting hidden states would be beneficial for applying the proposed model in practice.

When analyzing the CC item, we grouped multiple recorded actions into one action. The simplification may cause information loss and obscure certain patterns in students’ behaviors. Other ways of grouping the actions may lead to results different from those shown in our case study. Analyzing the response processes without simplifying the action set can avoid the problem but will be more computationally intensive.

Because of the complexity of human behaviors, it is almost certain that LHMM is not the underlying model generating real response processes. Nonetheless, the overly simplified model is helpful for understanding respondents’ problem-solving behaviors, as we showed in the case studies. It would be interesting to see whether more suitable models can be designed to capture more complicated patterns without sacrificing too much interpretability and computational efficiency.

Like many other studies on process data, we focus on analyzing individual items and ignore potential connections between them. Joint modeling of response processes from multiple items can be statistically more efficient than single-item analyses. It can also help us understand how the behavior patterns in different items are related. Although joint analysis using LHMM is conceptually straightforward, it could be computationally challenging due to the increased number of parameters. In addition, there are several model choices to consider, such as whether respondents’ behaviors in different items should be affected by the same latent trait. As it stands, the proposed LHMM is not flexible enough to accommodate such needs. As we discuss below, several extensions can be considered to increase the flexibility of the model and better support joint modeling of response processes from different items.

The proposed model involves a single latent trait in the initial state probability vector $\boldsymbol{\pi}$, state transition probability matrix $\boldsymbol{P}$, and state-action probability matrix $\boldsymbol{Q}$. In complex problem-solving items, it is very likely that multidimensional latent traits should be used for characterizing the heterogeneity of response processes. It is also possible that $\boldsymbol{P}$ and $\boldsymbol{Q}$ are affected by different latent traits. LHMMs involving multidimensional latent traits are straightforward to formulate. However, the MLE of the model parameters may be difficult to obtain since numerical integration in higher dimensions may be computationally expensive and inaccurate. Developing more computationally efficient methods for statistical inference is essential for such models. Jointly estimating model parameters and latent traits may be a direction to take as this method has demonstrated computational advantages in multidimensional item response theory models (Chen et al., 2019b).
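To make the computational cost concrete, the following sketch builds a tensor-product Gauss–Hermite grid from a three-point one-dimensional rule. The tensor-product construction is an assumption made for illustration; it is one common way to extend quadrature to several dimensions, not necessarily the approach one would adopt for a multidimensional LHMM.

```python
import itertools
import math

# Three-point Gauss-Hermite rule for the weight function exp(-x^2).
NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
WEIGHTS = (math.sqrt(math.pi) / 6, 2 * math.sqrt(math.pi) / 3, math.sqrt(math.pi) / 6)


def tensor_grid(dim):
    """Return (point, weight) pairs of the tensor-product rule in `dim` dimensions."""
    grid = []
    for idx in itertools.product(range(len(NODES)), repeat=dim):
        point = tuple(NODES[i] for i in idx)
        weight = math.prod(WEIGHTS[i] for i in idx)
        grid.append((point, weight))
    return grid


for dim in (1, 2, 3, 4):
    # The number of grid points, and hence of likelihood evaluations, is 3 ** dim.
    print(dim, len(tensor_grid(dim)))
```

With $U$ points per dimension and $D$ latent traits, the grid has $U^D$ points, and each point requires a separate pass through every response process.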

Currently, no constraint is imposed on the state transitions in our model. Transitions could occur between any two states. In practice, constraints exist on state transitions. For example, in the TICKET item, the design of the item interface determines that no transition should occur from the Network state to the Pricing state or from the Pricing state to the Fare Type state. In the CC item, it is natural to assume that students will not go back to an inefficient problem-solving state once they figure out the efficient strategy, which prohibits the transition from the VOTAT State to the EXPLORE State. The fitted LHMM allows such transitions, although only with tiny probabilities. Imposing constraints on state transitions reduces the dimension of parameter space and thus improves the stability of the model fit. Developing data-driven methods to detect such structure in the state transition probability matrix is an interesting future direction.
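One simple way to impose known structural zeros is to renormalize each row of the transition matrix over the allowed destination states only. The sketch below assumes that each row is obtained from linear predictors through a softmax transformation; this parametrization, the ordering of the states, and the numerical values are assumptions made for illustration.

```python
import math


def constrained_transition_row(logits, allowed):
    """Softmax over the allowed destination states; forbidden ones get probability 0."""
    if not any(allowed):
        raise ValueError("at least one destination state must be allowed")
    max_logit = max(l for l, ok in zip(logits, allowed) if ok)
    weights = [math.exp(l - max_logit) if ok else 0.0
               for l, ok in zip(logits, allowed)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical row for the Network state of the TICKET item.
# Assumed destination order: Network, Fare Type, Pricing.
logits = [0.2, 1.1, -0.4]      # hypothetical values of the linear predictors
allowed = [True, True, False]  # Network -> Pricing is ruled out by the interface
row = constrained_transition_row(logits, allowed)
print([round(p, 3) for p in row])
```

Parameters attached to forbidden transitions no longer enter the likelihood, which is how such constraints reduce the dimension of the parameter space.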

As a final remark, we would like to point out that the proposed LHMM may not be suitable for response processes from all problem-solving items despite its good performance in characterizing the response structure of the TICKET item and the CC item. Also, if the required task does not have a multi-subtask pattern, one can still fit an LHMM to the response processes, but the hidden states may not have a clear interpretation.

Data Availability

The datasets analyzed in the current study are available at https://www.oecd.org/pisa/pisaproducts/database-cbapisa2012.htm.

Declarations

Conflict of interest

The author has no conflicts of interest to declare that are relevant to the content of this article.

Appendix A LHMM Likelihood Computation

The likelihood for a set of response processes $\mathcal{Y}_n$ following an LHMM is

$$L(\boldsymbol{\eta} \mid \mathcal{Y}_n) = \prod_{i=1}^n P(\boldsymbol{Y}^{(i)} = \boldsymbol{y}^{(i)} \mid \boldsymbol{\eta}) = \prod_{i=1}^n \left\{ \int \phi(\theta_i) P(\boldsymbol{Y}^{(i)} = \boldsymbol{y}^{(i)} \mid \boldsymbol{\eta}, \theta_i) \, d\theta_i \right\}.$$

We demonstrate here how to compute $L_i\left(\boldsymbol{\eta} \mid \boldsymbol{y}^{(i)}\right) = P\left(\boldsymbol{Y}^{(i)} = \boldsymbol{y}^{(i)} \mid \boldsymbol{\eta}\right)$. For notational simplicity, the superscripts and the subscripts denoting different respondents are suppressed hereafter. We explain first how to compute $f(\boldsymbol{\eta}, \theta) = P(\boldsymbol{Y} = \boldsymbol{y} \mid \boldsymbol{\eta}, \theta)$ given $(\boldsymbol{\eta}, \theta)$ and then how to numerically integrate $\phi(\theta) f(\boldsymbol{\eta}, \theta)$ with respect to $\theta$ to obtain $L(\boldsymbol{\eta} \mid \boldsymbol{y})$.

For $k = 1, \ldots, K$ and $t = 1, \ldots, T$, define the forward probability

$$\alpha_t(k \mid \theta) = P(\boldsymbol{Y}_{1:t} = \boldsymbol{y}_{1:t}, S_t = k \mid \boldsymbol{\eta}, \theta). \tag{A1}$$

Given $\boldsymbol{\eta}$ and $\theta$, we can obtain $f(\boldsymbol{\eta}, \theta)$ from the forward probabilities $\alpha_T(k \mid \theta)$ since $f(\boldsymbol{\eta}, \theta) = \sum_{k=1}^K \alpha_T(k \mid \theta)$. According to HMM assumptions (1)–(4), it is easy to verify $\alpha_1(k \mid \theta) = \pi_k(\theta) q_{k, y_1}(\theta)$ and

$$\alpha_t(k \mid \theta) = \sum_{l=1}^K \alpha_{t-1}(l \mid \theta) \, p_{lk}(\theta) \, q_{k, y_t}(\theta), \quad t = 2, \ldots, T, \tag{A2}$$

where $\pi_k(\theta)$, $p_{kl}(\theta)$, and $q_{kj}(\theta)$ are defined in (5)–(7). Therefore, $\alpha_T(k \mid \theta)$ can be computed by first calculating $\alpha_1(k \mid \theta)$ and then applying (A2) recursively.
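The forward recursion can be sketched in Python as follows. The sketch takes the values of the initial state probabilities, the transition matrix, and the state-action matrix at a fixed $\theta$ as plain lists; the numerical values and the function names are hypothetical, and actions are coded as 0-based indices.

```python
def forward_probabilities(pi, P, Q, y):
    """Forward probabilities; alpha[t][k] is the joint probability of the first
    t + 1 actions and of being in state k at that step (0-based t)."""
    K = len(pi)
    # Initialization: alpha_1(k) = pi_k * q_{k, y_1}.
    alpha = [[pi[k] * Q[k][y[0]] for k in range(K)]]
    # Recursion (A2): alpha_t(k) = sum_l alpha_{t-1}(l) p_{lk} q_{k, y_t}.
    for t in range(1, len(y)):
        prev = alpha[-1]
        alpha.append([
            sum(prev[l] * P[l][k] for l in range(K)) * Q[k][y[t]]
            for k in range(K)
        ])
    return alpha


def sequence_likelihood(pi, P, Q, y):
    """f = sum over k of alpha_T(k)."""
    return sum(forward_probabilities(pi, P, Q, y)[-1])


# Hypothetical parameter values at a fixed theta: K = 2 states, 3 actions.
pi = [0.7, 0.3]
P = [[0.8, 0.2], [0.1, 0.9]]
Q = [[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]
print(sequence_likelihood(pi, P, Q, [0, 1, 2]))
```

For long sequences the forward probabilities become very small, so practical implementations usually rescale them at each step or work on the log scale.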

Besides the forward probabilities, one can also define the backward probability

$$\beta_t(k \mid \theta) = P(\boldsymbol{Y}_{(t+1):T} = \boldsymbol{y}_{(t+1):T} \mid S_t = k, \boldsymbol{\eta}, \theta), \quad k = 1, \ldots, K, \quad t = 1, \ldots, T-1. \tag{A3}$$

Letting $\beta_T(k \mid \theta) = 1$, we have the recursive relation

$$\beta_t(k \mid \theta) = \sum_{l=1}^K p_{kl}(\theta) \, q_{l, y_{t+1}}(\theta) \, \beta_{t+1}(l \mid \theta). \tag{A4}$$

Although computing $f(\boldsymbol{\eta}, \theta)$ does not require the backward probabilities, we still compute them when evaluating the likelihood because they, together with the forward probabilities, are essential components for computing the derivatives of the likelihood function. See Appendix B for details.
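A sketch of the backward recursion, together with a consistency check against the forward probabilities, is given below. It relies on the standard identity that the sum over $k$ of $\alpha_t(k \mid \theta)\beta_t(k \mid \theta)$ equals $f(\boldsymbol{\eta}, \theta)$ for every $t$; the parameter values and function names are hypothetical, and actions are coded as 0-based indices.

```python
def forward_probabilities(pi, P, Q, y):
    """Forward probabilities alpha[t][k] at a fixed theta (0-based t)."""
    K = len(pi)
    alpha = [[pi[k] * Q[k][y[0]] for k in range(K)]]
    for t in range(1, len(y)):
        prev = alpha[-1]
        alpha.append([
            sum(prev[l] * P[l][k] for l in range(K)) * Q[k][y[t]]
            for k in range(K)
        ])
    return alpha


def backward_probabilities(P, Q, y):
    """Backward probabilities beta[t][k] at a fixed theta (0-based t)."""
    K, T = len(P), len(y)
    beta = [[1.0] * K for _ in range(T)]  # beta_T(k) = 1 by convention
    # Recursion (A4): beta_t(k) = sum_l p_{kl} q_{l, y_{t+1}} beta_{t+1}(l).
    for t in range(T - 2, -1, -1):
        beta[t] = [
            sum(P[k][l] * Q[l][y[t + 1]] * beta[t + 1][l] for l in range(K))
            for k in range(K)
        ]
    return beta


# Hypothetical parameter values at a fixed theta: K = 2 states, 3 actions.
pi = [0.7, 0.3]
P = [[0.8, 0.2], [0.1, 0.9]]
Q = [[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]
y = [0, 1, 2, 2]

alpha = forward_probabilities(pi, P, Q, y)
beta = backward_probabilities(P, Q, y)
f = sum(alpha[-1])
# The same value of f is recovered at every time point.
print([sum(a * b for a, b in zip(alpha[t], beta[t])) for t in range(len(y))], f)
```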

Given that $f(\boldsymbol{\eta}, \theta)$ is computable, we can approximate

$$L(\boldsymbol{\eta} \mid \boldsymbol{y}) = \int \phi(\theta) f(\boldsymbol{\eta}, \theta) \, d\theta = \frac{1}{\sqrt{\pi}} \int e^{-x^2} f(\boldsymbol{\eta}, \sqrt{2}x) \, dx$$

using Gaussian–Hermite quadrature by $\frac{1}{\sqrt{\pi}} \sum_{u=1}^U w_u f(\boldsymbol{\eta}, \sqrt{2}x_u)$, where $x_1, \ldots, x_U$ are $U$ quadrature points and $w_1, \ldots, w_U$ are the associated weights. The quadrature points and the corresponding weights for a given $U$ can be computed based on the Hermite polynomials. We use the function gauss.quad in the R package statmod for this purpose.
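The quadrature approximation can be illustrated with a small Python sketch. To stay self-contained, it hard-codes the three-point rule ($U = 3$) for the weight function $e^{-x^2}$ instead of calling gauss.quad, and it replaces $f(\boldsymbol{\eta}, \theta)$ with simple test functions of $\theta$; both choices are assumptions made for illustration.

```python
import math

# Three-point Gauss-Hermite rule for the weight function exp(-x^2).
NODES = [-math.sqrt(1.5), 0.0, math.sqrt(1.5)]
WEIGHTS = [math.sqrt(math.pi) / 6, 2 * math.sqrt(math.pi) / 3, math.sqrt(math.pi) / 6]


def normal_expectation(f, nodes=NODES, weights=WEIGHTS):
    """Approximate the integral of phi(theta) * f(theta) over theta, where phi is the
    standard normal density, using the change of variables theta = sqrt(2) * x."""
    total = sum(w * f(math.sqrt(2.0) * x) for x, w in zip(nodes, weights))
    return total / math.sqrt(math.pi)


# Sanity checks with known answers under the standard normal distribution.
print(normal_expectation(lambda theta: 1.0))         # total probability, equal to 1
print(normal_expectation(lambda theta: theta ** 2))  # variance, equal to 1
print(normal_expectation(lambda theta: 1.0 / (1.0 + math.exp(-theta))))  # 0.5 by symmetry
```

A rule with $U$ points integrates polynomials of degree up to $2U - 1$ exactly against the weight function. For larger $U$, the nodes and weights can be taken from a numerical library, for example `numpy.polynomial.hermite.hermgauss`, which uses the same weight function.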

The algorithm for computing the likelihood function for LHMM is summarized in Algorithm 1.

Algorithm 1

(LHMM likelihood computation) The likelihood function $L(\boldsymbol{\eta} \mid \boldsymbol{y})$ for a response process $\boldsymbol{y}$ following LHMM is computed in the following steps.

  1. Obtain Gaussian–Hermite quadrature points $x_1, \ldots, x_U$ and the associated weights $w_1, \ldots, w_U$.

  2. For u = 1 , , U \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$u = 1, \ldots , U$$\end{document} , compute f ( η , 2 x u ) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$f(\varvec{\eta }, \sqrt{2}x_u)$$\end{document} as follows.

    1. Compute α 1 ( k 2 x u ) = π k ( 2 x u ) q k , y 1 ( 2 x u ) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha _1(k \mid \sqrt{2}x_u) = \pi _k(\sqrt{2}x_u) q_{k, y_1}(\sqrt{2}x_u)$$\end{document} and set β T ( k 2 x u ) = 1 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta _T(k \mid \sqrt{2}x_u) = 1$$\end{document} for k = 1 , , K \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$k = 1, \ldots , K$$\end{document} .

    2. For t = 2 , , T \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t = 2, \ldots , T$$\end{document} and k = 1 , , K \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$k = 1, \ldots , K$$\end{document} , compute

      α t ( k 2 x u ) = l = 1 K α t - 1 ( l 2 x u ) p lk ( 2 x u ) q k , y t ( 2 x u ) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \alpha _t(k \mid \sqrt{2}x_u) = \sum _{l = 1}^K \alpha _{t-1} (l \mid \sqrt{2}x_u) p_{lk}(\sqrt{2}x_u)q_{k, y_t}(\sqrt{2}x_u) \end{aligned}$$\end{document}
      and
      β T - t + 1 ( k 2 x u ) = l = 1 K p kl ( 2 x u ) q l , y T - t + 2 ( 2 x u ) β T - t + 2 ( l 2 x u ) . \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \beta _{T-t+1}(k \mid \sqrt{2} x_u) = \sum _{l = 1}^K p_{kl}(\sqrt{2}x_u) q_{l,y_{T-t+2}}(\sqrt{2}x_u) \beta _{T-t+2}(l \mid \sqrt{2}x_u). \end{aligned}$$\end{document}
    3. Compute f ( η , 2 x u ) = k = 1 K α T ( k 2 x u ) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$f(\varvec{\eta }, \sqrt{2}x_u) = \sum _{k = 1}^K \alpha _T(k\mid \sqrt{2}x_u)$$\end{document} .

  3. Compute L ( η y ) = 1 π u = 1 U w u f ( η , 2 x u ) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$L(\varvec{\eta }\mid \varvec{y}) = \frac{1}{\sqrt{\pi }}\sum _{u=1}^U w_u f(\varvec{\eta }, \sqrt{2}x_u)$$\end{document} .
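The steps of Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the author's code: it assumes a multinomial-logit form in which each probability row is a softmax of slope times $\theta$ plus intercept (consistent with the derivatives in Appendix B), stores the parameters in a hypothetical dictionary `eta`, codes actions as 0-based integers, and uses a three-point quadrature rule purely for brevity. Only the forward probabilities are computed, since the likelihood itself does not require the backward ones.

```python
import math

# Assumption: closed-form 3-point Gauss-Hermite rule (U = 3) for illustration.
GH_NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
GH_WEIGHTS = (math.sqrt(math.pi) / 6, 2 * math.sqrt(math.pi) / 3, math.sqrt(math.pi) / 6)


def softmax(scores):
    """Numerically stable softmax of a list of scores."""
    m = max(scores)
    e = [math.exp(s - m) for s in scores]
    z = sum(e)
    return [v / z for v in e]


def hmm_probs(eta, theta):
    """Assumed parameterization: every probability row is softmax(slope * theta + intercept)."""
    pi = softmax([t * theta + m for t, m in zip(eta["tau"], eta["mu"])])
    P = [softmax([a * theta + b for a, b in zip(ar, br)]) for ar, br in zip(eta["A"], eta["B"])]
    Q = [softmax([c * theta + d for c, d in zip(cr, dr)]) for cr, dr in zip(eta["C"], eta["D"])]
    return pi, P, Q


def forward_likelihood(y, pi, P, Q):
    """Step 2: forward recursion, returning f = sum_k alpha_T(k)."""
    K = len(pi)
    alpha = [pi[k] * Q[k][y[0]] for k in range(K)]
    for obs in y[1:]:
        alpha = [sum(alpha[l] * P[l][k] for l in range(K)) * Q[k][obs] for k in range(K)]
    return sum(alpha)


def lhmm_likelihood(y, eta, nodes=GH_NODES, weights=GH_WEIGHTS):
    """Step 3: L = (1/sqrt(pi)) * sum_u w_u * f(eta, sqrt(2) * x_u)."""
    total = 0.0
    for x, w in zip(nodes, weights):
        theta = math.sqrt(2.0) * x
        total += w * forward_likelihood(y, *hmm_probs(eta, theta))
    return total / math.sqrt(math.pi)


# Made-up parameters with K = 2 states and M = 3 actions (first column fixed at zero).
eta = {
    "tau": [0.0, 0.5], "mu": [0.0, -0.3],
    "A": [[0.0, 0.4], [0.0, -0.2]], "B": [[0.0, 0.1], [0.0, 0.6]],
    "C": [[0.0, 0.3, -0.5], [0.0, -0.4, 0.2]], "D": [[0.0, 0.2, 0.1], [0.0, -0.1, 0.5]],
}
likelihood = lhmm_likelihood([0, 2, 1], eta)
```

For long action sequences the products in the forward recursion can underflow, so a practical implementation would rescale the forward probabilities at each step or work on the log scale.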

Appendix B Gradient of LHMM Log-Likelihood Function

For a given element $\eta$ in $\boldsymbol{\eta}$,

$$\frac{\partial \log L(\boldsymbol{\eta})}{\partial \eta} = \sum_{i=1}^n \frac{1}{L_i(\boldsymbol{\eta} \mid \boldsymbol{y}^{(i)})} \frac{\partial L_i(\boldsymbol{\eta} \mid \boldsymbol{y}^{(i)})}{\partial \eta}.$$

The algorithm for calculating $L_i(\boldsymbol{\eta} \mid \boldsymbol{y}^{(i)})$ is presented in Appendix A. We explain here how to compute $\frac{\partial L_i(\boldsymbol{\eta} \mid \boldsymbol{y}^{(i)})}{\partial \eta}$. The superscripts and the subscripts denoting different respondents are suppressed hereafter to simplify notation. Let $f(\boldsymbol{\eta}, \theta) = P(\boldsymbol{Y} = \boldsymbol{y} \mid \boldsymbol{\eta}, \theta)$. Then

$$\frac{\partial L(\boldsymbol{\eta} \mid \boldsymbol{y})}{\partial \eta} = \int \phi(\theta) \frac{\partial f(\boldsymbol{\eta}, \theta)}{\partial \eta} \, d\theta. \tag{A5}$$

If $\frac{\partial f(\boldsymbol{\eta}, \theta)}{\partial \eta}$ is computable given $(\boldsymbol{\eta}, \theta)$, then the integral on the right-hand side of (A5) can be approximated using Gaussian–Hermite quadrature in the same way as the likelihood function. In the remainder of this appendix, we focus on deriving $\frac{\partial f(\boldsymbol{\eta}, \theta)}{\partial \eta}$.
In the following calculations, the initial state probabilities $\pi_k$, the state transition probabilities $p_{kl}$, and the state-action probabilities $q_{kj}$ all depend on $\theta$ as defined in (5)–(7). To simplify notation, we do not explicitly write them as functions of $\theta$.
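Carrying the substitution $\theta = \sqrt{2}x$ from Appendix A over to (A5) suggests the approximation below. It is written out here only as a sketch of the quadrature step described above and reuses the quadrature points $x_u$ and weights $w_u$ of Algorithm 1; it is not a formula quoted from the original derivation.

```latex
\frac{\partial L(\boldsymbol{\eta} \mid \boldsymbol{y})}{\partial \eta}
\approx \frac{1}{\sqrt{\pi}} \sum_{u=1}^{U} w_u
\left. \frac{\partial f(\boldsymbol{\eta}, \theta)}{\partial \eta} \right|_{\theta = \sqrt{2} x_u}
```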

First, consider taking the derivative of $f$ with respect to $\pi_k$, $p_{kl}$, and $q_{kj}$. Define $\boldsymbol{\alpha}_t = (\alpha_t(1), \ldots, \alpha_t(K))^\top$ and $\boldsymbol{\beta}_t = (\beta_t(1), \ldots, \beta_t(K))^\top$, where $\alpha_t(k)$ and $\beta_t(k)$ are the forward and backward probabilities defined in (A1) and (A3), respectively. Then, the relationships in (A2) and (A4) can be expressed compactly as

$$\boldsymbol{\alpha}_t^\top = \boldsymbol{\alpha}_{t-1}^\top \boldsymbol{P} \tilde{\boldsymbol{Q}}_t, \quad \text{and} \quad \boldsymbol{\beta}_t = \boldsymbol{P} \tilde{\boldsymbol{Q}}_{t+1} \boldsymbol{\beta}_{t+1},$$

where $\boldsymbol{P}$ is the state transition probability matrix and $\tilde{\boldsymbol{Q}}_t = \operatorname{diag}\{q_{1, y_t}, \ldots, q_{K, y_t}\}$. Recursively applying the above relationship, we get

$$\boldsymbol{\alpha}_t^\top = \boldsymbol{\pi}^\top \tilde{\boldsymbol{Q}}_1 \boldsymbol{P} \tilde{\boldsymbol{Q}}_2 \cdots \boldsymbol{P} \tilde{\boldsymbol{Q}}_t \quad \text{and} \quad \boldsymbol{\beta}_t = \boldsymbol{P} \tilde{\boldsymbol{Q}}_{t+1} \cdots \boldsymbol{P} \tilde{\boldsymbol{Q}}_T \boldsymbol{1},$$

where $\boldsymbol{1}$ is a column vector of $K$ ones. Let $x$ denote a generic element of $\boldsymbol{\pi}$, $\boldsymbol{P}$, or $\boldsymbol{Q}$. Since $f = \boldsymbol{\pi}^\top \tilde{\boldsymbol{Q}}_1 \boldsymbol{P} \tilde{\boldsymbol{Q}}_2 \cdots \boldsymbol{P} \tilde{\boldsymbol{Q}}_T \boldsymbol{1}$, the product rule gives

$$\frac{\partial f}{\partial x} = \frac{\partial (\boldsymbol{\pi}^\top \tilde{\boldsymbol{Q}}_1)}{\partial x} \boldsymbol{P} \tilde{\boldsymbol{Q}}_2 \cdots \boldsymbol{P} \tilde{\boldsymbol{Q}}_T \boldsymbol{1} + \sum_{t=1}^{T-1} \boldsymbol{\alpha}_t^\top \frac{\partial (\boldsymbol{P} \tilde{\boldsymbol{Q}}_{t+1})}{\partial x} \boldsymbol{\beta}_{t+1}.$$

Replacing $x$ with $\pi_k$, $p_{kl}$, and $q_{kj}$ and simplifying the expression, we obtain

$$\begin{aligned}
\frac{\partial f}{\partial \pi_k} &= q_{k, y_1} \beta_1(k), \quad k = 1, \ldots, K, \\
\frac{\partial f}{\partial p_{kl}} &= \sum_{t=1}^{T-1} \alpha_t(k) \beta_{t+1}(l) q_{l, y_{t+1}}, \quad k, l = 1, \ldots, K, \\
\frac{\partial f}{\partial q_{kj}} &= \sum_{t: y_t = j} \alpha_t(k) \beta_t(k) / q_{kj}, \quad k = 1, \ldots, K, \; j = 1, \ldots, M.
\end{aligned} \tag{A6}$$
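The expressions in (A6) can be checked numerically with a short Python sketch. The function names `forward_backward` and `hmm_partials`, the 0-based action coding, and the toy probabilities are assumptions made for illustration; as in (A6), the entries of $\boldsymbol{\pi}$, $\boldsymbol{P}$, and $\boldsymbol{Q}$ are treated as free variables, and every $q_{kj}$ is assumed to be strictly positive.

```python
def forward_backward(y, pi, P, Q):
    """Forward probabilities alpha[t][k] and backward probabilities beta[t][k] (0-based t)."""
    K, T = len(pi), len(y)
    alpha = [[0.0] * K for _ in range(T)]
    beta = [[1.0] * K for _ in range(T)]  # last row stays at 1 (beta_T = 1)
    for k in range(K):
        alpha[0][k] = pi[k] * Q[k][y[0]]
    for t in range(1, T):
        for k in range(K):
            alpha[t][k] = sum(alpha[t - 1][l] * P[l][k] for l in range(K)) * Q[k][y[t]]
    for t in range(T - 2, -1, -1):
        for k in range(K):
            beta[t][k] = sum(P[k][l] * Q[l][y[t + 1]] * beta[t + 1][l] for l in range(K))
    return alpha, beta


def hmm_partials(y, pi, P, Q):
    """Partial derivatives of f = sum_k alpha_T(k) with respect to pi, P, and Q, following (A6)."""
    K, M, T = len(pi), len(Q[0]), len(y)
    alpha, beta = forward_backward(y, pi, P, Q)
    d_pi = [Q[k][y[0]] * beta[0][k] for k in range(K)]
    d_P = [[sum(alpha[t][k] * beta[t + 1][l] * Q[l][y[t + 1]] for t in range(T - 1))
            for l in range(K)] for k in range(K)]
    d_Q = [[sum(alpha[t][k] * beta[t][k] for t in range(T) if y[t] == j) / Q[k][j]
            for j in range(M)] for k in range(K)]
    return d_pi, d_P, d_Q


# Toy example with K = 2 states and M = 3 actions (made-up numbers).
pi = [0.6, 0.4]
P = [[0.7, 0.3], [0.2, 0.8]]
Q = [[0.5, 0.3, 0.2], [0.1, 0.4, 0.5]]
y = [0, 2, 1, 2]
alpha, beta = forward_backward(y, pi, P, Q)
f_value = sum(alpha[-1])
d_pi, d_P, d_Q = hmm_partials(y, pi, P, Q)
```

Because $f$ is linear in $\boldsymbol{\pi}$, a quick consistency check is that $\sum_k \pi_k \, \partial f / \partial \pi_k$ reproduces `f_value`; the derivatives can also be compared with central finite differences of the forward likelihood.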

According to the chain rule,

$$\begin{aligned}
\frac{\partial f}{\partial \mu_k} &= \sum_{k'=1}^K \frac{\partial f}{\partial \pi_{k'}} \frac{\partial \pi_{k'}}{\partial \mu_k} = \pi_k \left( \frac{\partial f}{\partial \pi_k} - \sum_{k'=1}^K \frac{\partial f}{\partial \pi_{k'}} \pi_{k'} \right), \qquad \frac{\partial f}{\partial \tau_k} = \sum_{k'=1}^K \frac{\partial f}{\partial \pi_{k'}} \frac{\partial \pi_{k'}}{\partial \tau_k} = \theta \frac{\partial f}{\partial \mu_k}, \\
\frac{\partial f}{\partial b_{kl}} &= \sum_{l'=1}^K \frac{\partial f}{\partial p_{kl'}} \frac{\partial p_{kl'}}{\partial b_{kl}} = p_{kl} \left( \frac{\partial f}{\partial p_{kl}} - \sum_{l'=1}^K \frac{\partial f}{\partial p_{kl'}} p_{kl'} \right), \qquad \frac{\partial f}{\partial a_{kl}} = \sum_{l'=1}^K \frac{\partial f}{\partial p_{kl'}} \frac{\partial p_{kl'}}{\partial a_{kl}} = \theta \frac{\partial f}{\partial b_{kl}}, \\
\frac{\partial f}{\partial d_{kj}} &= \sum_{j'=1}^M \frac{\partial f}{\partial q_{kj'}} \frac{\partial q_{kj'}}{\partial d_{kj}} = q_{kj} \left( \frac{\partial f}{\partial q_{kj}} - \sum_{j'=1}^M \frac{\partial f}{\partial q_{kj'}} q_{kj'} \right), \qquad \frac{\partial f}{\partial c_{kj}} = \sum_{j'=1}^M \frac{\partial f}{\partial q_{kj'}} \frac{\partial q_{kj'}}{\partial c_{kj}} = \theta \frac{\partial f}{\partial d_{kj}}.
\end{aligned} \tag{A7}$$

Combining (A6) and (A7) gives $\frac{\partial f}{\partial \eta}$ for $\eta = \tau_k, \mu_k, a_{kl}, b_{kl}, c_{kj}, d_{kj}$.
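The chain-rule step in (A7) has the same form for each row of probabilities, so it can be sketched once in Python. The helper `logit_gradients`, the softmax parameterization (slope times $\theta$ plus intercept), and the toy linear objective `f_of` are assumptions introduced for this illustration; in the actual computation the vector `grad_probs` would hold the derivatives from (A6).

```python
import math


def softmax(scores):
    """Numerically stable softmax of a list of scores."""
    m = max(scores)
    e = [math.exp(s - m) for s in scores]
    z = sum(e)
    return [v / z for v in e]


def logit_gradients(probs, grad_probs, theta):
    """Map df/dprob for one probability row to df/dintercept and df/dslope, as in (A7)."""
    inner = sum(g * p for g, p in zip(grad_probs, probs))
    d_intercept = [p * (g - inner) for p, g in zip(probs, grad_probs)]
    d_slope = [theta * d for d in d_intercept]
    return d_intercept, d_slope


# Toy check: f is a linear function of the initial state probabilities.
theta = 0.7
tau = [0.0, 0.5, -0.2]  # slopes (made-up values)
mu = [0.0, -0.3, 0.4]   # intercepts (made-up values)
c = [0.2, 1.5, -0.7]    # stands in for df/dpi_k


def f_of(tau_values, mu_values):
    """Hypothetical objective f = sum_k c_k * pi_k(theta)."""
    pi = softmax([t * theta + m for t, m in zip(tau_values, mu_values)])
    return sum(ck * pk for ck, pk in zip(c, pi))


pi = softmax([t * theta + m for t, m in zip(tau, mu)])
d_mu, d_tau = logit_gradients(pi, c, theta)
```

The intercept derivatives sum to zero because adding a constant to every intercept leaves the softmax unchanged. This is in line with the identification convention suggested by the zero column prepended to the parameter matrices in Appendix D.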

Appendix C Viterbi Algorithm

Let $\boldsymbol{y}$ be a sequence following the LHMM with parameters $\boldsymbol{\eta}$ and latent trait $\theta$. The most probable hidden state sequence $\hat{\boldsymbol{s}}$ can be found using the Viterbi algorithm. For $k = 1, \ldots, K$ and $t = 2, \ldots, T$, define

$$v_t(k) = \max_{\boldsymbol{s}_{1:(t-1)}} P(\boldsymbol{Y}_{1:t} = \boldsymbol{y}_{1:t}, \boldsymbol{S}_{1:(t-1)} = \boldsymbol{s}_{1:(t-1)}, S_t = k \mid \theta, \boldsymbol{\eta}).$$

According to HMM assumptions (1)–(4), we have the recursive relation

$$v_t(k) = \max_{l = 1, \ldots, K} v_{t-1}(l) p_{lk}(\theta) q_{k, y_t}(\theta),$$

where $v_1(k) = \pi_k(\theta) q_{k, y_1}(\theta)$. Let

$$u_t(k) = \operatorname*{argmax}_{l = 1, \ldots, K} v_{t-1}(l) p_{lk}(\theta) q_{k, y_t}(\theta).$$

After computing $v_t(k)$ and $u_t(k)$ for $k = 1, \ldots, K$ and $t = 2, \ldots, T$ sequentially, the most probable hidden state sequence can be obtained by backtracing:

$$\hat{s}_T = \operatorname*{argmax}_{k = 1, \ldots, K} v_T(k), \quad \hat{s}_t = u_{t+1}(\hat{s}_{t+1}), \quad \text{for } t = T-1, \ldots, 1. \tag{A8}$$

The algorithm is summarized in Algorithm 2.

Algorithm 2

(Viterbi Algorithm) The most probable hidden state sequence $\hat{\boldsymbol{s}}$ for a response process $\boldsymbol{y}$ following the LHMM with latent trait $\theta$ is obtained in the following steps.

  1. For $k = 1, \ldots, K$, compute $v_1(k) = \pi_k(\theta) q_{k, y_1}(\theta)$.

  2. For $t = 2, \ldots, T$,

    1. Compute $w_t(l, k) = v_{t-1}(l) p_{lk}(\theta) q_{k, y_t}(\theta)$ for $k, l = 1, \ldots, K$;

    2. Record $v_t(k) = \max_l w_t(l, k)$ and $u_t(k) = \operatorname*{argmax}_l w_t(l, k)$ for $k = 1, \ldots, K$.

  3. Obtain $\hat{\boldsymbol{s}}$ by backtracing:

    1. $\hat{s}_T = \operatorname*{argmax}_k v_T(k)$;

    2. For $t = T-1, \ldots, 1$, set $\hat{s}_t = u_{t+1}(\hat{s}_{t+1})$.
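A compact Python sketch of the Viterbi steps is given below for a fixed value of $\theta$, so that the probabilities are plain lists. The function name `viterbi`, the 0-based state and action coding, and the toy probabilities are assumptions made for this example. The backtrace follows the stored pointers, $\hat{s}_t = u_{t+1}(\hat{s}_{t+1})$, as in the standard Viterbi algorithm.

```python
def viterbi(y, pi, P, Q):
    """Return the most probable hidden state path and its joint probability with y."""
    K, T = len(pi), len(y)
    v = [pi[k] * Q[k][y[0]] for k in range(K)]  # v_1(k)
    backptr = []  # backptr[t - 1][k] plays the role of u_{t+1}(k) in 1-based notation
    for t in range(1, T):
        new_v, ptr = [], []
        for k in range(K):
            w = [v[l] * P[l][k] * Q[k][y[t]] for l in range(K)]  # w_t(l, k)
            best = max(range(K), key=w.__getitem__)
            new_v.append(w[best])
            ptr.append(best)
        v = new_v
        backptr.append(ptr)
    # Backtracing: start from the best final state and follow the pointers.
    state = max(range(K), key=v.__getitem__)
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, max(v)


# Toy example with K = 2 states and M = 3 actions (made-up numbers).
pi = [0.6, 0.4]
P = [[0.7, 0.3], [0.2, 0.8]]
Q = [[0.5, 0.3, 0.2], [0.1, 0.4, 0.5]]
y = [0, 2, 1, 2]
path, prob = viterbi(y, pi, P, Q)
```

As with the forward recursion, the products become very small for long sequences, so implementations commonly work with log probabilities, replacing products by sums; this does not change the maximizing path.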

Appendix D Estimated LHMM Parameters in Case Studies

Tables 3 and 4 present the LHMM parameter estimates for the CC item and the TICKET item, respectively.

Table 3 Estimated LHMM parameters for the CC item.

For ease of comparison, a column of zeros is prepended to $\boldsymbol{A}$, $\boldsymbol{B}$, $\boldsymbol{C}$, and $\boldsymbol{D}$. In $\boldsymbol{C}$ and $\boldsymbol{D}$, T_M, T_B, and M_B stand for actions “Top_Middle,” “Top_Bottom,” and “Middle_Bottom,” respectively.

Table 4 Estimated LHMM parameters for the TICKET item.

For ease of comparison, a column of zeros is prepended to $\boldsymbol{A}$, $\boldsymbol{B}$, $\boldsymbol{C}$, and $\boldsymbol{D}$. In $\boldsymbol{C}$ and $\boldsymbol{D}$, trains, subway, full, conc., and indv. stand for actions “country_trains,” “city_subway,” “full_fare,” “concession,” and “individual,” respectively.

Appendix E True Parameters in Simulation Studies

Table 5 presents the LHMM parameters used to generate the action sequences in the simulation study. The values are chosen so that the resulting state transition and state-action probability curves are similar to those estimated for the TICKET item.
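As a rough illustration of how parameters of this kind could drive a simulation, the sketch below draws action sequences under an assumed multinomial-logit parameterization: row k of A + θB gives the logits of moving from state k to each state, and row k of C + θD gives the logits of each action in state k, with the first column fixed at zero as the baseline. The function names, the fixed sequence length, the uniform initial state distribution, the standard normal latent trait, and all numeric values are illustrative assumptions; they are not the entries of Table 5 and may differ from the generation procedure actually used in the study.

```python
import math
import random


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def simulate_sequence(theta, A, B, C, D, init_probs, length, rng):
    """Draw one action sequence of a fixed length for latent trait theta.

    Assumed (hypothetical) parameterization: row k of A + theta * B holds the
    logits of moving from state k to each state; row k of C + theta * D holds
    the logits of each action in state k.
    """
    n_states = len(A)
    state = rng.choices(range(n_states), weights=init_probs)[0]
    actions = []
    for _ in range(length):
        # Emit an action from the current hidden state.
        action_probs = softmax([c + theta * d for c, d in zip(C[state], D[state])])
        actions.append(rng.choices(range(len(action_probs)), weights=action_probs)[0])
        # Move to the next hidden state.
        trans_probs = softmax([a + theta * b for a, b in zip(A[state], B[state])])
        state = rng.choices(range(n_states), weights=trans_probs)[0]
    return actions


# Illustrative values only (NOT the entries of Table 5): 2 states, 3 actions,
# with the first column of each matrix fixed at zero as the baseline category.
A = [[0.0, -1.0], [0.0, 1.5]]
B = [[0.0, 0.8], [0.0, -0.5]]
C = [[0.0, 0.5, -0.5], [0.0, -1.0, 1.0]]
D = [[0.0, 1.0, 0.3], [0.0, -0.4, 0.9]]

rng = random.Random(0)
thetas = [rng.gauss(0.0, 1.0) for _ in range(100)]  # assumed standard normal traits
sequences = [simulate_sequence(t, A, B, C, D, [0.5, 0.5], 10, rng) for t in thetas]
```

Setting every entry of B and D to zero in this sketch removes the dependence on θ, so all respondents share the same transition and state-action probabilities, as in an ordinary HMM.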

Table 5 Parameters used for generating action sequences in the simulation study.

Footnotes

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

References

Binkley, M., Erstad, O., Herman, J., Raizen, S., Ripley, M., Miller-Ricci, M., & Rumble, M. (2012). Defining twenty-first century skills. In Assessment and teaching of 21st century skills (pp. 17-66). Springer.
Broyden, C. G. (1970). The convergence of a class of double-rank minimization algorithms 1. General considerations. IMA Journal of Applied Mathematics, 6(1), 76-90.
Cappé, O., Moulines, E., & Rydén, T. (2005). Inference in hidden Markov models. Springer.
Chen, Y. (2020). A continuous-time dynamic choice measurement model for problem-solving process data. Psychometrika, 85(4), 1052-1075.
Chen, Y., Li, X., Liu, J., & Ying, Z. (2019). Statistical analysis of complex problem-solving process data: An event history analysis approach. Frontiers in Psychology, 10, 486.
Chen, Y., Li, X., & Zhang, S. (2019). Joint maximum likelihood estimation for high-dimensional exploratory item factor analysis. Psychometrika, 84(1), 124-146.
Cover, T. M., & Thomas, J. A. (2006). Elements of information theory (2nd ed.). Wiley.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society Series B, 39(1), 1-22.
Eddelbuettel, D., & François, R. (2011). Rcpp: Seamless R and C++ integration. Journal of Statistical Software, 40, 1-18.
Eichmann, B., Greiff, S., Naumann, J., Brandhuber, L., & Goldhammer, F. (2020). Exploring behavioural patterns during complex problem-solving. Journal of Computer Assisted Learning, 36(6), 933-956.
Fletcher, R. (1970). A new approach to variable metric algorithms. The Computer Journal, 13(3), 317-322.
Giner, G., Chen, L., Hu, Y., Dunn, P., Phipson, B., & Chen, Y. (2023). statmod: Statistical modeling [Computer software manual]. Retrieved from https://cran.r-project.org/package=statmod
Goldfarb, D. (1970). A family of variable-metric methods derived by variational means. Mathematics of Computation, 24(109), 23-26.
Greiff, S., Niepel, C., Scherer, R., & Martin, R. (2016). Understanding students’ performance in a computer-based assessment of complex problem solving: An analysis of behavioral data from computer-generated log files. Computers in Human Behavior, 61, 36-46.
Greiff, S., Wüstenberg, S., & Avvisati, F. (2015). Computer-generated log-file analyses as a window into students’ minds? A showcase study based on the PISA 2012 assessment of problem solving. Computers & Education, 91, 92-105.
Han, Y., Liu, H., & Ji, F. (2021). A sequential response model for analyzing process data on technology-based problem-solving tasks. Multivariate Behavioral Research, 57, 960.
He, Q., & von Davier, M. (2016). Analyzing process data from problem-solving items with n-grams: Insights from a computer-based large-scale assessment. In Y. Rosen, S. Ferrara, & M. Mosharraf (Eds.), Handbook of research on technology tools for real-world skill development (pp. 749-776). Information Science Reference. https://doi.org/10.4018/978-1-4666-9441-5.ch029
He, Q., Liao, D., & Jiao, H. (2019). Clustering behavioral patterns using process data in PIAAC problem-solving items. In Theoretical and practical advances in computer-based educational measurement (pp. 189-212). Springer.
Herborn, K., Mustafić, M., & Greiff, S. (2017). Mapping an experiment-based assessment of collaborative behavior onto collaborative problem solving in PISA 2015: A cluster analysis approach for collaborator profiles. Journal of Educational Measurement, 54(1), 103-122.
Liang, K., Tu, D., & Cai, Y. (2022). Using process data to improve classification accuracy of cognitive diagnosis model. Multivariate Behavioral Research.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Routledge.
McCullagh, P., & Nelder, J. (2018). Generalized linear models. Routledge.
OECD. (2014). PISA 2012 results: Creative problem solving: Students’ skills in tackling real-life problems. OECD Publishing.
R Core Team. (2023). R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria. Retrieved from https://www.R-project.org/
Rabiner, L., & Juang, B. (1986). An introduction to hidden Markov models. IEEE ASSP Magazine, 3(1), 4-16.
Rupp, A. A., Templin, J., & Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. Guilford Press.
Shanno, D. F. (1970). Conditioning of quasi-Newton methods for function minimization. Mathematics of Computation, 24(111), 647-656.
Stadler, M., Fischer, F., & Greiff, S. (2019). Taking a closer look: An exploratory analysis of successful and unsuccessful strategy use in complex problems. Frontiers in Psychology, 10, 777.
Tang, X., Wang, Z., He, Q., Liu, J., & Ying, Z. (2020). Automatic feature construction for process data using multidimensional scaling. Psychometrika, 85, 378-397.
Tang, X., Wang, Z., Liu, J., & Ying, Z. (2021). An exploration of process data by action sequence autoencoder. British Journal of Mathematical and Statistical Psychology, 74, 1-33.
Ulitzsch, E., He, Q., & Pohl, S. (2022). Using sequence mining techniques for understanding incorrect behavioral patterns on interactive tasks. Journal of Educational and Behavioral Statistics, 47(1), 3-35.
Ulitzsch, E., Ulitzsch, V., He, Q., & Lüdtke, O. (2022). A machine learning-based procedure for leveraging clickstream data to investigate early predictability of failure on interactive tasks. Behavior Research Methods, 55, 1392.
Viterbi, A. (1967). Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13(2), 260-269.
von Davier, M., Khorramdel, L., He, Q., Shin, H. J., & Chen, H. (2019). Developments in psychometric population models for technology-based large-scale assessments: An overview of challenges and opportunities. Journal of Educational and Behavioral Statistics, 44(6), 671-705.
Wang, Z., Tang, X., Liu, J., & Ying, Z. (2022). Subtask analysis of process data through a predictive model. British Journal of Mathematical and Statistical Psychology.
Xiao, Y., He, Q., Veldkamp, B., & Liu, H. (2021). Exploring latent states of problem-solving competence using hidden Markov model on process data. Journal of Computer Assisted Learning, 37(5), 1232-1247.
Xu, H., Fang, G., & Ying, Z. (2020). A latent topic model with Markov transition for process data. British Journal of Mathematical and Statistical Psychology, 73(3), 474-505.
Zhang, S., Wang, Z., Qi, J., Liu, J., & Ying, Z. (2023). Accurate assessment via process data. Psychometrika, 88, 76-97.
Zhan, P., & Qiao, X. (2022). Diagnostic classification analysis of problem-solving competence using process data: An item expansion method. Psychometrika, 87, 1529.
Figure 1 Interface of the climate control item in PISA 2012.

Figure 2 Structure of HMM (left) and LHMM (right).

Figure 3 Left: histogram of $\hat{\theta}$ in the CC item. Middle: boxplots of $\hat{\theta}$ grouped by binary item responses. Right: ROC curve of classifying binary item responses using $\hat{\theta}$.

Table 1 Examples of response processes with top or bottom 5% of $\hat{\theta}$.

Figure 4 State-action probability matrices at the quartiles of $\hat{\theta}$ in the CC item. The column names T_M, T_B, and M_B stand for actions “Top_Middle,” “Top_Bottom,” and “Middle_Bottom,” respectively.

Figure 5 State transition probability curves of the CC item.

Figure 6 State-action probability curves of the CC item.

Figure 7 Interface of the TICKET item in PISA 2012.

Table 2 Actions in the TICKET item.

Figure 8 Left: histogram of $\hat{\theta}$ in the TICKET item. Middle: boxplots of $\hat{\theta}$ grouped by binary item responses. Right: ROC curve of classifying binary item responses using $\hat{\theta}$.

Figure 9 State-action probability matrices at the quartiles of $\hat{\theta}$ in the TICKET item.

Figure 10 State-action probability curves of the TICKET item.

Figure 11 State transition probability curves of the TICKET item.

Figure 12 Histograms of the difference between the BIC of LHMM and HMM for datasets generated from LHMM (top) and HMM (bottom).

Figure 13 Boxplots of the RMSEs of estimated probabilities.

Figure 14 True (solid lines) and estimated state transition (top row) and state-action probability curves (bottom row) for scenarios $n=100, L=10$ (dashed lines) and $n=500, L=50$ (dotted lines). Different elements in a probability distribution are distinguished by colors.

Figure 15 Boxplots of evaluation measures of estimated latent variables.

Table 3 Estimated LHMM parameters for the CC item.
