
A Note on Exploratory Item Factor Analysis by Singular Value Decomposition

Published online by Cambridge University Press:  01 January 2025

Haoran Zhang
Affiliation:
Fudan University
Yunxiao Chen*
Affiliation:
London School of Economics and Political Science
Xiaoou Li
Affiliation:
University of Minnesota
* Correspondence should be made to Yunxiao Chen, Department of Statistics, London School of Economics and Political Science, London, UK. Email: y.chen186@lse.ac.uk

Abstract

We revisit a singular value decomposition (SVD) algorithm given in Chen et al. (Psychometrika 84:124–146, 2019b) for exploratory item factor analysis (IFA). This algorithm estimates a multidimensional IFA model by SVD and was used to obtain a starting point for joint maximum likelihood estimation in Chen et al. (2019b). Thanks to the analytic and computational properties of SVD, this algorithm guarantees a unique solution and has computational advantage over other exploratory IFA methods. Its computational advantage becomes significant when the numbers of respondents, items, and factors are all large. This algorithm can be viewed as a generalization of principal component analysis to binary data. In this note, we provide the statistical underpinning of the algorithm. In particular, we show its statistical consistency under the same double asymptotic setting as in Chen et al. (2019b). We also demonstrate how this algorithm provides a scree plot for investigating the number of factors and provide its asymptotic theory. Further extensions of the algorithm are discussed. Finally, simulation studies suggest that the algorithm has good finite sample performance.

Type
Theory and Methods
Creative Commons
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Copyright
Copyright © 2020 The Author(s)

1. Background

Exploratory IFA (Bock et al. 1988) has been widely used for analyzing item-level data in social and behavioral sciences (Bartholomew et al. 2008). We consider a standard exploratory IFA setting for binary item response data. Let $Y_{ij} \in \{0, 1\}$ be a random variable denoting individual $i$'s response to item $j$, where $i = 1, \ldots, N$ and $j = 1, \ldots, J$. Moreover, IFA assumes that individual $i$'s responses are driven by $K$ latent factors, denoted by $\boldsymbol{\theta}_i = (\theta_{i1}, \ldots, \theta_{iK})^\top$. We consider a general family of multidimensional IFA models (Reckase 2009), which assumes that

$$\Pr(Y_{ij} = 1 \mid \boldsymbol{\theta}_i) = f(d_j + \mathbf{a}_j^\top \boldsymbol{\theta}_i), \tag{1}$$

where $\mathbf{a}_j = (a_{j1}, \ldots, a_{jK})^\top$ is the vector of loading parameters, $d_j$ is an intercept parameter, and $f: \mathbb{R} \to (0, 1)$ is a pre-specified monotone increasing function, which guarantees that (1) is a valid probability. Using the terminology of generalized linear models, $f$ is called the inverse link function. Note that (1) includes the widely used multidimensional two-parameter logistic (M2PL) model and the multidimensional normal ogive model as special cases, for which $f(x) = \exp(x)/(1+\exp(x))$ and $f(x) = \int_{-\infty}^{x} \exp(-t^2/2)/\sqrt{2\pi}\,dt$, respectively. Moreover, we assume local independence; that is, $Y_{i1}, \ldots, Y_{iJ}$ are conditionally independent given $\boldsymbol{\theta}_i$. Finally, $\boldsymbol{\theta}_i$, $i = 1, \ldots, N$, are independent and identically distributed, following an unknown distribution $F$.
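To make the model concrete, the following is a minimal simulation sketch of (1) under local independence. It is an illustration rather than code from the paper: the sizes `N`, `J`, `K`, the choice of a standard normal `F`, the uniform ranges for the loadings and intercepts, and all function names are assumptions made only for this example.

```python
import numpy as np
from math import erf, sqrt

def logistic(x):
    """Inverse link of the M2PL model: f(x) = exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + np.exp(-x))

def normal_ogive(x):
    """Inverse link of the normal ogive model: the standard normal CDF."""
    return 0.5 * (1.0 + np.vectorize(erf)(x / sqrt(2.0)))

def simulate_responses(theta, A, d, f, rng):
    """Draw an N x J binary matrix with Pr(Y_ij = 1 | theta_i) = f(d_j + a_j' theta_i).

    Given theta, the entries are drawn independently (local independence).
    """
    prob = f(theta @ A.T + d)              # N x J matrix of success probabilities
    return (rng.random(prob.shape) < prob).astype(int)

rng = np.random.default_rng(0)
N, J, K = 500, 20, 2                       # illustrative sizes (assumed)
theta = rng.standard_normal((N, K))        # F taken to be N(0, I): an assumption
A = rng.uniform(0.5, 1.5, size=(J, K))     # hypothetical loading matrix
d = rng.uniform(-1.0, 1.0, size=J)         # hypothetical intercepts
Y = simulate_responses(theta, A, d, logistic, rng)
```

Passing `normal_ogive` instead of `logistic` gives data from the multidimensional normal ogive model with the same parameters.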

A major focus of exploratory IFA is to estimate the loading matrix $A = (a_{jk})_{J\times K}$, which helps to understand the latent structure underlying the set of items. It is worth noting that the loading matrix can only be recovered up to an oblique rotation (Browne 2001; see Footnote 1). That is, model (1) remains unchanged when $\mathbf{a}_j$ and $\boldsymbol{\theta}_i$ are replaced by the rotated loading vector $\tilde{\mathbf{a}}_j = O^\top \mathbf{a}_j$ and $\tilde{\boldsymbol{\theta}}_i = O^{-1}\boldsymbol{\theta}_i$, where $O$ is a $K\times K$ invertible matrix, also known as an oblique rotation. Recognizing this rotational indeterminacy, exploratory IFA typically proceeds in two steps. In the first step, an estimate $\hat{A}$ is obtained, using an arbitrary way to fix the rotation. Then, in the second step, analytic rotation methods are applied to $\hat{A}$ to obtain a sparser loading matrix for better interpretability.

An analytic rotation finds a rotation matrix $O$ such that $\hat{A} O$ minimizes a certain “complexity function,” where a lower value of the complexity function indicates more sparsity in the loading matrix (see Browne 2001 for a review of analytic rotations). It implicitly assumes that the true loading matrix has a sparse pattern; i.e., each item is only directly associated with a small number of factors.
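The rotational indeterminacy discussed above can be checked numerically. The sketch below uses made-up parameter values and an arbitrary invertible matrix `O` (all hypothetical) to show that replacing $\mathbf{a}_j$ by $O^\top \mathbf{a}_j$ and $\boldsymbol{\theta}_i$ by $O^{-1}\boldsymbol{\theta}_i$ leaves every linear predictor $d_j + \mathbf{a}_j^\top \boldsymbol{\theta}_i$, and hence model (1), unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
J, N, K = 6, 4, 2
A = rng.standard_normal((J, K))          # row j is a_j'
Theta = rng.standard_normal((N, K))      # row i is theta_i'
d = rng.standard_normal(J)

# Any invertible K x K matrix acts as an oblique rotation (values are arbitrary).
O = np.array([[1.0, 0.5],
              [0.0, 2.0]])

A_rot = A @ O                            # row j becomes (O' a_j)'
Theta_rot = Theta @ np.linalg.inv(O).T   # row i becomes (O^{-1} theta_i)'

M = Theta @ A.T + d                      # N x J matrix of d_j + a_j' theta_i
M_rot = Theta_rot @ A_rot.T + d          # same matrix from the rotated parameters

print(np.allclose(M, M_rot))             # the linear predictors coincide
```

The loading matrices `A` and `A_rot` differ, yet they imply the same response probabilities, which is why the first-step estimate can fix the rotation arbitrarily.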

In this note, we focus on the first step of exploratory IFA. In particular, we study an estimator given in Chen et al. (2019b) that is based on SVD. Compared to other estimators, this estimator is computationally much faster and does not suffer from convergence issues. It was used to obtain a starting point for a constrained joint maximum likelihood estimator (CJMLE). Simulation studies showed that the convergence of the CJMLE can be improved by using the SVD-based estimator as a starting point. Moreover, this SVD-based estimator itself is reasonably accurate when both $N$ and $J$ are large. Thus, it can be used not only as a starting point for the CJMLE, but also as a quick and high-quality solution to large-scale exploratory IFA problems. In what follows, we investigate the statistical properties of this estimator.

2. Main Results

SVD-Based Estimator. We restate this SVD-based algorithm below (see Footnote 2).

Algorithm 1 (SVD-based estimator for exploratory IFA)

  1. Input the response matrix $Y = (y_{ij})_{N\times J}$, the number of factors $K$, the inverse link function $f$, and the truncation parameter $\epsilon_{N,J} > 0$.

  2. Apply the singular value decomposition to $Y$ and obtain $Y = \sum_{j=1}^{J} \sigma_j \mathbf{u}_j \mathbf{v}_j^\top$, where $\sigma_1 \ge \cdots \ge \sigma_J \ge 0$ are the singular values, and the $\mathbf{u}_j$s and $\mathbf{v}_j$s are the left and right singular vectors, respectively.

  3. Let $X = (x_{ij})_{N\times J} = \sum_{k=1}^{\tilde{K}} \sigma_k \mathbf{u}_k \mathbf{v}_k^\top$, where $\tilde{K} = \max\big\{K+1, \operatorname{arg\,max}_k \{\sigma_k \ge 1.01\sqrt{N}\}\big\}$.

  4. Let $\hat{X} = (\hat{x}_{ij})_{N\times J}$ be defined as

    $$\hat{x}_{ij} = \begin{cases} \epsilon_{N,J}, & \text{if } x_{ij} < \epsilon_{N,J},\\ x_{ij}, & \text{if } \epsilon_{N,J} \le x_{ij} \le 1-\epsilon_{N,J},\\ 1-\epsilon_{N,J}, & \text{if } x_{ij} > 1-\epsilon_{N,J}. \end{cases}$$

  5. Let $\tilde{M} = (\tilde{m}_{ij})_{N\times J}$, where $\tilde{m}_{ij} = f^{-1}(\hat{x}_{ij})$.

  6. Let $\hat{\mathbf{d}} = (\hat{d}_1, \ldots, \hat{d}_J)$, where $\hat{d}_j = \big(\sum_{i=1}^{N} \tilde{m}_{ij}\big)/N$.

  7. Apply the singular value decomposition to $\hat{M} = (\tilde{m}_{ij} - \hat{d}_j)_{N\times J}$ to obtain $\hat{M} = \sum_{j=1}^{J} \hat{\sigma}_j \hat{\mathbf{u}}_j \hat{\mathbf{v}}_j^\top$, where $\hat{\sigma}_1 \ge \cdots \ge \hat{\sigma}_J \ge 0$ are the singular values, and the $\hat{\mathbf{u}}_j$s and $\hat{\mathbf{v}}_j$s are the left and right singular vectors, respectively.

  8. Output $\hat{A} = \frac{1}{\sqrt{N}}(\hat{\sigma}_1 \hat{\mathbf{v}}_1, \ldots, \hat{\sigma}_K \hat{\mathbf{v}}_K)$ and $\hat{\Theta} = \sqrt{N}(\hat{\mathbf{u}}_1, \ldots, \hat{\mathbf{u}}_K)$.
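The steps of Algorithm 1 can be sketched in a few lines of NumPy. This is an illustrative reading of the algorithm, not the authors' implementation: the function name `svd_ifa`, the logit default for $f^{-1}$, and the simulated data are assumptions. In Step 3, the arg max is interpreted as the number of singular values that are at least $1.01\sqrt{N}$, and $\tilde{K}$ is capped at the number of available singular values; both choices are assumptions of this sketch.

```python
import numpy as np

def logit(p):
    """Inverse of the logistic link, used here as the default f^{-1}."""
    return np.log(p / (1.0 - p))

def svd_ifa(Y, K, f_inv=logit, eps=1e-4):
    """Sketch of Algorithm 1; returns (A_hat, d_hat, Theta_hat)."""
    N, J = Y.shape
    # Step 2: SVD of the response matrix.
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    # Step 3: keep singular values >= 1.01 * sqrt(N), and at least K + 1 of them.
    K_tilde = max(K + 1, int(np.sum(s >= 1.01 * np.sqrt(N))))
    K_tilde = min(K_tilde, s.size)       # guard added for this sketch
    X = (U[:, :K_tilde] * s[:K_tilde]) @ Vt[:K_tilde]
    # Step 4: truncate entries to [eps, 1 - eps].
    X_hat = np.clip(X, eps, 1.0 - eps)
    # Step 5: apply the inverse of f entrywise.
    M_tilde = f_inv(X_hat)
    # Step 6: intercepts are the column means.
    d_hat = M_tilde.mean(axis=0)
    # Step 7: SVD of the column-centered matrix.
    U2, s2, Vt2 = np.linalg.svd(M_tilde - d_hat, full_matrices=False)
    # Step 8: scale the top-K singular vectors.
    A_hat = (Vt2[:K].T * s2[:K]) / np.sqrt(N)
    Theta_hat = np.sqrt(N) * U2[:, :K]
    return A_hat, d_hat, Theta_hat

# Illustrative run on data simulated from an M2PL model (all values assumed).
rng = np.random.default_rng(0)
N, J, K = 1000, 40, 2
theta = rng.standard_normal((N, K))
A_true = rng.uniform(0.5, 1.5, size=(J, K))
d_true = rng.uniform(-1.0, 1.0, size=J)
prob = 1.0 / (1.0 + np.exp(-(theta @ A_true.T + d_true)))
Y = (rng.random((N, J)) < prob).astype(float)

A_hat, d_hat, Theta_hat = svd_ifa(Y, K)
```

By construction, the columns of `Theta_hat` are orthogonal with squared norm `N` and the columns of `A_hat` are mutually orthogonal, which is the arbitrary way this estimator fixes the rotation.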

Remark 1

SVD is a powerful tool for the factorization of rectangular matrices that has been widely used in multivariate statistics for dimension reduction (Wall et al. 2003). Thanks to the mathematical properties of SVD, the estimator given by Algorithm 1 is available in analytic form and does not suffer from convergence issues. On the other hand, as the objective functions of the CJMLE and the marginal maximum likelihood estimator (MMLE; Bock and Aitkin 1981) are nonconvex, there is no guarantee of finding their global optima. In addition, this SVD approach is much faster than the other estimators, including the CJMLE and MMLE. In particular, the computation of the MMLE based on the vanilla expectation-maximization algorithm is not affordable when the latent dimension $K$ is of moderate size (e.g., $K \ge 5$). Even the stochastic algorithms for the MMLE (Cai 2010a; 2010b; Zhang et al. 2020) and the alternating minimization algorithm for the CJMLE (Chen et al. 2019b; 2019c) are much slower than the SVD algorithm, as these algorithms typically need a large number of iterations to converge. A speed comparison between the SVD method and the CJMLE is provided in the simulation study.

Remark 2

Algorithm 1 can be viewed as a generalization of PCA to binary data. PCA is an SVD-based algorithm (e.g., Chapter 14, Friedman et al. 2001) that is fast and commonly used for exploratory linear factor analysis. Unfortunately, PCA cannot be applied to exploratory IFA, due to the nonlinear link function in IFA models. Unlike PCA, which applies SVD only once, Algorithm 1 applies SVD twice. The first application of SVD and the inverse transformation (Steps 2–5) denoise and linearize the data. Then, the second application of SVD (Steps 6–7) essentially performs PCA on the linearized data.

Remark 3

Like the CJMLE (Chen et al. 2019b; 2019c), this SVD-based estimator does not require the latent distribution $F$ to be known or to take a parametric form, as is required in the MMLE approach. Moreover, exploratory IFA based on tetrachoric/polychoric correlations (Muthén 1984; Lee et al. 1990; Lee et al. 1992; Jöreskog 1994) or on a composite-likelihood-based estimator (Katsikatsou et al. 2012) requires $F$ to be multivariate normal, with the former approach further requiring the inverse link $f$ to be probit. In this sense, the SVD-based estimator and the CJMLE require fewer model assumptions than the other estimators. As a price, their consistency requires stronger conditions, specifically, a double asymptotic regime where both $N$ and $J$ diverge.

Remark 4

Steps 2–4 of the algorithm essentially follow the same procedure as Chatterjee (2015) for matrix estimation. We thus refer the readers to Chatterjee (2015) for the details. A small difference is that we require $\tilde{K} \ge K+1$ in Step 3 of the algorithm. This modification does not affect the asymptotic behavior of the estimator. However, it can improve the finite-sample performance when $N$ and $J$ are not large enough. Intuitively, we need $\tilde{K}$ to be at least $K+1$ in order to recover the matrix $(d_j + \mathbf{a}_j^\top \boldsymbol{\theta}_i)_{N\times J}$, which is of rank $K+1$. The constant 1.01 in Step 3 of the algorithm follows Theorem 1.1 of Chatterjee (2015), which makes use of the fact that $\mathrm{Var}(Y_{ij}) \le 1/4$. This constant can be replaced by any fixed constant in the open interval (1, 1.5) without affecting the consistency result given in Theorem 1. We set it to 1.01 because, according to Theorem 1.1 of Chatterjee (2015), this constant should be chosen close to 1 for better accuracy.

Remark 5

The truncation step (Step 4) is necessary, as it guarantees the existence of a solution. This is because, even though $x_{ij}$ in Step 3 approximates the true probability $\Pr(Y_{ij} = 1)$, it is not guaranteed to lie in the interval (0, 1). As a consequence, $f^{-1}(x_{ij})$ may not be well defined. The pre-specified truncation parameter $\epsilon_{N,J} > 0$ determines the truncation level. As shown in the sequel, the choice of $\epsilon_{N,J}$ affects the statistical consistency of the proposed algorithm. Under certain circumstances, we will need the truncation parameter $\epsilon_{N,J}$ to decay to zero as $N$ and $J$ grow to infinity, which is why we attach the subscripts $N$ and $J$ to the truncation parameter. In practice, the performance of the proposed method tends to be insensitive to the choice of $\epsilon_{N,J}$ when it is chosen sufficiently small, which is justified theoretically by Propositions 1 and 2 under two specific settings. In the numerical analysis of this paper, we use $\epsilon_{N,J} = 10^{-4}$ as a default value.
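A small numerical illustration of why the truncation matters, assuming the logistic link so that $f^{-1}$ is the logit. The entries of `x` are hypothetical values of the kind a low-rank approximation of a binary matrix can produce, including values outside (0, 1).

```python
import numpy as np

def logit(p):
    """Inverse of the logistic link: f^{-1}(p) = log(p / (1 - p))."""
    return np.log(p / (1.0 - p))

eps = 1e-4                                   # default value used in the paper
x = np.array([-0.03, 0.20, 0.75, 1.02])      # hypothetical entries of X (Step 3)

# Without truncation, entries outside (0, 1) have no valid inverse-link value.
with np.errstate(invalid="ignore", divide="ignore"):
    raw = logit(x)                           # NaN for the first and last entries

x_hat = np.clip(x, eps, 1.0 - eps)           # Step 4: truncate to [eps, 1 - eps]
m_tilde = logit(x_hat)                       # Step 5: finite for every entry

# After truncation, |m_tilde| cannot exceed log((1 - eps) / eps).
bound = np.log((1.0 - eps) / eps)
```

With the logit link, a smaller `eps` only widens the range of admissible values of `m_tilde` logarithmically, which is in line with the remark that results tend to be insensitive to `eps` once it is small.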

Statistical Consistency. In what follows, we establish the consistency of this method. In particular, we show that this SVD-based algorithm is consistent under an asymptotic setting and a notion of consistency similar to those in Chen et al. (2019b) and Chen et al. (2019c). The proofs of our theoretical results are given in the supplementary material. More precisely, we consider a loss function for the recovery of the true loading matrix $A^* = (a_{jk}^*)_{J\times K}$ up to an oblique rotation,

$$L_{N,J}(A^*, \hat{A}) = \min_{O \in \mathbb{R}^{K\times K}} \left\{ \frac{\|A^* - \hat{A} O\|_F^2}{JK} \right\}, \tag{2}$$

where the subscripts $N$ and $J$ are used to emphasize that the loss function depends on the sample size $N$ and the number of items $J$, and $\|X\|_F = \sqrt{\sum_{i}\sum_{j} x_{ij}^2}$ denotes the Frobenius norm of a matrix $X = (x_{ij})$. Under mild technical conditions and a double asymptotic setting where both $N$ and $J$ grow to infinity, we show that the loss function $L_{N,J}(A^*, \hat{A})$ converges to zero in probability. The regularity conditions and the consistency result are formally described in Theorem 1, with two special cases discussed in the sequel. Similar double asymptotic settings have been considered in psychometric research, including the analyses of unidimensional IRT models (Haberman 1977; 2004) and diagnostic classification models (Chiu et al. 2016). The following regularity conditions are needed for our main result in Theorem 1. As will be discussed in the sequel, these conditions are mild.
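As a brief aside before the conditions are stated, the loss in (2) is straightforward to evaluate: the minimization over $O \in \mathbb{R}^{K\times K}$ is an ordinary least squares problem with $\hat{A}$ as the design matrix. The sketch below is illustrative only; the function name `loading_loss` and the test matrices are assumptions.

```python
import numpy as np

def loading_loss(A_true, A_hat):
    """Loss (2): min over K x K matrices O of ||A_true - A_hat O||_F^2 / (J K)."""
    J, K = A_true.shape
    # Least squares solution of A_hat @ O ~ A_true, solved column by column.
    O, *_ = np.linalg.lstsq(A_hat, A_true, rcond=None)
    resid = A_true - A_hat @ O
    return float(np.sum(resid ** 2) / (J * K))

rng = np.random.default_rng(2)
J, K = 30, 3
A_true = rng.standard_normal((J, K))             # hypothetical true loadings

# An estimate equal to the truth up to an invertible transformation has zero loss.
O0 = rng.standard_normal((K, K)) + 3.0 * np.eye(K)
A_rot = A_true @ np.linalg.inv(O0)
loss_rot = loading_loss(A_true, A_rot)

# A noisy estimate has a positive loss, no larger than the unrotated error.
A_noisy = A_true + 0.1 * rng.standard_normal((J, K))
loss_noisy = loading_loss(A_true, A_noisy)
naive = float(np.sum((A_true - A_noisy) ** 2) / (J * K))
```

Because $O$ ranges over all $K \times K$ matrices, the loss measures how well the column space of $\hat{A}$ reproduces $A^*$, so it is unaffected by the arbitrary rotation used in the first step of exploratory IFA.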

  1. A1. There exists a constant C such that $\sqrt{(d_j^*)^2 + \Vert \mathbf{a}_j^*\Vert^2} \le C$, for $j = 1,\ldots,J$, where $d_j^*$ and $\mathbf{a}_j^*$ are the true item parameters.

  2. A2. The true person parameters $\boldsymbol{\theta}_1^*,\ldots,\boldsymbol{\theta}_N^*$ are independent and identically distributed (i.i.d.) following a distribution F which has mean $\mathbf{0}$ and positive definite covariance matrix $\Sigma$.

  3. A3. The inverse link function f is strictly monotone increasing, continuously differentiable, and Lipschitz continuous with Lipschitz constant L. We further assume that

    $$\lim_{x\rightarrow -\infty} f(x) = 0, \quad \text{and} \quad \lim_{x\rightarrow \infty} f(x) = 1.$$
  4. A4. There exists a constant $C_1 > 0$ such that the Kth singular value of $A^*$, denoted by $\sigma_K(A^*)$, satisfies $\sigma_K(A^*) \ge C_1\sqrt{J}$ for all J.

  5. A5. The sample size N is no less than the number of items J, i.e., $N \ge J$.

Theorem 1

Suppose that conditions A1–A5 are satisfied. Further suppose that $\epsilon_{N,J} \le \frac{1}{5}$ and that $\epsilon_{N,J}$ satisfies

$$\Pr\left( \Vert \boldsymbol{\theta}_1^*\Vert \ge h(2\epsilon_{N,J})/C \right) = o(N^{-1}), \tag{3}$$
$$\frac{(h(2\epsilon_{N,J}))^{\frac{K+1}{K+3}}}{(\epsilon_{N,J}\, g(\epsilon_{N,J}))^2} = o(J^{\frac{1}{K+3}}), \tag{4}$$

where

$$h(y) = \max\{ |f^{-1}(y)|, |f^{-1}(1-y)| \}, \quad y \in (0, 0.5), \tag{5}$$
$$g(y) = \inf\{f'(x): x \in [f^{-1}(y), f^{-1}(1-y)]\}, \quad y \in (0, 0.5). \tag{6}$$

Then, the estimate $\hat{A}$ given by Algorithm 1 satisfies $L_{N,J}(A^*,\hat{A}) \overset{pr}{\rightarrow} 0$, as $N, J \rightarrow \infty$.

Remark 6

We remark that the notion of consistency for the estimation of the loading matrix is weaker than that in the traditional sense, since the loss function (2) is an average of the entrywise losses when J grows. Let $\tilde{O}$ minimize the right-hand side of (2), and let $\tilde{A} := (\tilde{a}_{jk})_{J\times K} = \hat{A}\tilde{O}$. Then, the convergence of (2) to 0 implies that for any $\epsilon > 0$, $\left(\sum_{j=1}^J\sum_{k=1}^K 1_{\{\vert a_{jk}^* - \tilde{a}_{jk} \vert > \epsilon\}}\right)/(JK)$ also converges to 0. That is, the proportion of inaccurately estimated loading parameters converges to zero in probability under the optimal rotation. Due to the double asymptotic setting, our theoretical result only suggests the sensible use of the SVD-based algorithm when the sample size N and the number of items J are both large.
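The loss in (2) and the proportion discussed in this remark can be computed directly, as in the following minimal sketch. It assumes that the minimization over an unconstrained $K \times K$ matrix O is solved by ordinary least squares; the function names, the simulated matrices, and the tolerance `eps` are illustrative choices, not part of the original note.

```python
import numpy as np

def loading_loss(A_true, A_hat):
    """Loss (2): min over O in R^{K x K} of ||A_true - A_hat @ O||_F^2 / (J * K)."""
    J, K = A_true.shape
    # With O unconstrained, the minimiser is the least-squares solution.
    O_tilde, *_ = np.linalg.lstsq(A_hat, A_true, rcond=None)
    A_tilde = A_hat @ O_tilde
    return np.sum((A_true - A_tilde) ** 2) / (J * K), A_tilde

def proportion_inaccurate(A_true, A_tilde, eps):
    """Share of entries whose absolute error exceeds eps."""
    return float(np.mean(np.abs(A_true - A_tilde) > eps))

# Illustrative data: a transformed and slightly perturbed copy of A_true.
rng = np.random.default_rng(0)
J, K = 200, 3
A_true = rng.uniform(-1.0, 1.0, size=(J, K))
R = np.array([[2.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 3.0]])  # invertible
A_hat = A_true @ R + 0.05 * rng.standard_normal((J, K))

loss, A_tilde = loading_loss(A_true, A_hat)
eps = 0.1
prop = proportion_inaccurate(A_true, A_tilde, eps)
print(f"loss = {loss:.5f}, proportion with error > {eps}: {prop:.3f}")
```

By Markov's inequality the proportion is at most `loss / eps**2`, which is why convergence of (2) to zero forces the proportion to zero for every fixed tolerance.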

Remark 7

It has been well understood that PCA can consistently estimate a linear factor model under a similar double asymptotic setting (Stock and Watson 2002), which provides the theoretical justification for the use of PCA in exploratory linear factor analysis. Theorem 1 can be viewed as a similar result for exploratory item factor analysis.

Remark 8

We provide some discussions on the regularity conditions required in Theorem 1. Assumption A1 requires that the parameters of each item, including the intercept and slope parameters, should not be too large. That is, the presence of an extreme item is likely to distort the analysis. Assumption A2 is a very standard assumption in exploratory IFA. It is more flexible than many exploratory IFA settings, as it does not require the distribution F to be multivariate normal.

Assumption A3 is satisfied by the logistic and probit link functions, the two most commonly used link functions in exploratory IFA, but it excludes, for example, the multidimensional version of the three-parameter logistic model. Assumption A4 requires that there is sufficient variability in the items. The same assumption is also required in Chen et al. (2019b) and Chen et al. (2019c). In fact, this assumption is satisfied with probability tending to one when the true loadings $\mathbf{a}_j^*$ are i.i.d. samples from a K-variate distribution whose covariance matrix is non-degenerate. Finally, assumption A5 is practically reasonable, as in large-scale measurement, the sample size is usually larger than the number of items. Since people and items are almost mathematically symmetric in the IFA model, similar asymptotic results can be derived when $J \ge N$.
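The claim about A4 can be illustrated numerically. The sketch below draws loadings i.i.d. from a uniform distribution on $[-1, 1]$ (an arbitrary choice made for illustration) and reports $\sigma_K(A^*)/\sqrt{J}$ for growing J; under this design the ratio should settle near $\sqrt{1/3} \approx 0.58$, the square root of the smallest eigenvalue of the second-moment matrix.

```python
import numpy as np

def scaled_smallest_singular_value(A):
    """Return sigma_K(A) / sqrt(J) for a J x K loading matrix A."""
    J, K = A.shape
    singular_values = np.linalg.svd(A, compute_uv=False)  # sorted, descending
    return singular_values[K - 1] / np.sqrt(J)

rng = np.random.default_rng(0)
K = 3
ratios = {}
for J in (100, 400, 1600):
    A = rng.uniform(-1.0, 1.0, size=(J, K))  # illustrative i.i.d. loadings
    ratios[J] = scaled_smallest_singular_value(A)
    print(f"J = {J:5d}: sigma_K(A) / sqrt(J) = {ratios[J]:.3f}")
```

A ratio that stays bounded away from zero as J grows is exactly what A4 asks for, with the lower bound playing the role of $C_1$.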

Remark 9

We further provide some intuitions on the reason why the algorithm works. Steps 2–4 essentially follow the same procedure of Chatterjee (2015) for matrix estimation. The procedure guarantees the loss $\sum_{i,j}\left( f(d_j^* + (\mathbf{a}_j^*)^\top \boldsymbol{\theta}_i^*) - \hat{x}_{ij} \right)^2/(NJ)$ to be small with high probability, where $d_j^*$ and $\mathbf{a}_j^*$ denote the true item-specific parameters and $\boldsymbol{\theta}_i^*$ denotes the true person parameters sampled from distribution F. Further with conditions A1 and A3, Steps 5 and 6 guarantee the average loss $\sum_{i=1}^N \sum_{j=1}^J \left((\mathbf{a}_j^*)^\top \boldsymbol{\theta}_i^* - \hat{\mathbf{a}}_j^\top \hat{\boldsymbol{\theta}}_i\right)^2/(NJ)$ to be small with high probability. Finally, under conditions A2 and A4, the famous Davis–Kahan–Wedin theorem from matrix perturbation theory (see, e.g., Stewart and Sun 1990; O'Rourke et al. 2018) guarantees that $L_{N, J}(A^*, \hat{A})$ is small with high probability.
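To make the three stages described in this remark concrete, here is a minimal sketch for the logistic link. Algorithm 1 is not restated in this section, so the code is a reconstruction under explicit assumptions rather than the authors' implementation: the first-stage rank is taken to be $K + 1$, the intercepts are estimated by column means, and the loadings are the top-K right singular vectors scaled by $\hat{\sigma}_k/\sqrt{N}$. The function name `svd_ifa_sketch` and all simulation settings are hypothetical.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p / (1.0 - p))

def svd_ifa_sketch(Y, K, eps=0.01):
    """Hypothetical reconstruction of an SVD-based estimator (logistic link)."""
    N, _ = Y.shape
    # Stage 1: denoise Y by a truncated SVD, then clip into [eps, 1 - eps].
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    r = K + 1  # assumed rank: K factors plus one direction for the intercepts
    X_hat = np.clip((U[:, :r] * s[:r]) @ Vt[:r], eps, 1.0 - eps)
    # Stage 2: invert the link, then remove item intercepts by column centering.
    M_tilde = logit(X_hat)
    d_hat = M_tilde.mean(axis=0)
    M_hat = M_tilde - d_hat
    # Stage 3: second SVD; scaled top-K right singular vectors as loadings.
    _, s_hat, Vt_hat = np.linalg.svd(M_hat, full_matrices=False)
    A_hat = Vt_hat[:K].T * s_hat[:K] / np.sqrt(N)
    return A_hat, d_hat, s_hat

# Simulated M2PL data (all settings below are illustrative assumptions).
rng = np.random.default_rng(0)
N, J, K = 4000, 200, 2
A_true = rng.uniform(-1.5, 1.5, size=(J, K))
d_true = rng.uniform(-1.0, 1.0, size=J)
theta = rng.standard_normal((N, K))
Y = (rng.random((N, J)) < logistic(d_true + theta @ A_true.T)).astype(float)

A_hat, d_hat, s_hat = svd_ifa_sketch(Y, K)
# Loss (2): the optimal unconstrained O is a least-squares solution.
O_tilde, *_ = np.linalg.lstsq(A_hat, A_true, rcond=None)
loss = np.sum((A_true - A_hat @ O_tilde) ** 2) / (J * K)
baseline = np.mean(A_true ** 2)  # loss attained by the trivial choice O = 0
print(f"loss = {loss:.4f}, trivial baseline = {baseline:.4f}")
```

Because O in (2) is unconstrained, the loss depends on $\hat{A}$ only through its column space, so the scaling used in the last stage does not affect the reported value.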

Remark 10

Equations (3) and (4) are requirements on the truncation parameter $\epsilon_{N,J}$, which depends on both the tail of distribution F and the properties of the inverse link function. Roughly speaking, Equation (3) is saying that $\epsilon_{N,J}$ cannot be too large. This is because, given F and f, the probability in (3) is increasing in $\epsilon_{N,J}$. Requiring the probability to be $o(N^{-1})$ implies that $\epsilon_{N,J}$ cannot be large. This requirement is intuitive, because $\tilde{M}$ can be a poor approximation to $M^* = (m_{ij}^*)_{N\times J} := (d_j^* + (\mathbf{a}_j^*)^\top \boldsymbol{\theta}_i^*)_{N\times J}$ when many entries of $M^*$ are larger in absolute value than $h(\epsilon_{N,J})$. The function $h(\cdot)$ transforms the truncation on $x_{ij}$ to a truncation on $\tilde{m}_{ij}$. Using $h(2\epsilon_{N,J})$ instead of $h(\epsilon_{N,J})$ is for technical reasons.

Equation (4) requires that $\epsilon_{N,J}$ cannot be too small, as the left-hand side of (4) is decreasing in $\epsilon_{N,J}$. This requirement is also intuitive. Note that $|\tilde{m}_{ij}| \le h(\epsilon_{N,J})$, where $h(\epsilon_{N,J})$ is decreasing in $\epsilon_{N,J}$. Therefore, a sufficiently large choice of $\epsilon_{N,J}$ prevents the approximation error $\Vert \tilde{M} - M^*\Vert_F$ from being too large when there exist some extreme estimates $\tilde{m}_{ij}$. The function $g(\cdot)$ measures the local flatness of the inverse link f. The true matrix $M^*$ is more difficult to estimate when $g(\epsilon_{N,J})$ is smaller. This is because $|\tilde{m}_{ij} - m_{ij}^*|$ can be large, even when $|\hat{x}_{ij} - f(m_{ij}^*)|$ is small, due to the local flatness of the inverse link function.
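The two opposing requirements can be tabulated for the logistic link, where (5) and (6) have closed forms: $h(y) = \log\{(1-y)/y\}$, and $g(y) = y(1-y)$ because $f' = f(1-f)$ is smallest at the endpoints of the interval in (6). The sketch below is only an illustration; the values of K and C and the grid of truncation levels are arbitrary assumptions.

```python
import math

def h_logistic(y):
    # h(y) = max(|logit(y)|, |logit(1 - y)|) = log((1 - y) / y) for y in (0, 0.5)
    return math.log((1.0 - y) / y)

def g_logistic(y):
    # f'(x) = f(x)(1 - f(x)) is minimised at the endpoints, giving y(1 - y)
    return y * (1.0 - y)

def lhs_of_4(eps, K):
    """Left-hand side of (4) for the logistic link."""
    return h_logistic(2 * eps) ** ((K + 1) / (K + 3)) / (eps * g_logistic(eps)) ** 2

def threshold_in_3(eps, C):
    """Threshold h(2 eps) / C inside the probability in (3)."""
    return h_logistic(2 * eps) / C

K, C = 3, 2.0  # illustrative values
for eps in (0.2, 0.1, 0.05, 0.01, 0.001):
    print(f"eps = {eps:<6} LHS of (4) = {lhs_of_4(eps, K):.3e}   "
          f"h(2 eps)/C = {threshold_in_3(eps, C):.3f}")
```

As the truncation level shrinks, the threshold in (3) grows, so the tail probability falls, while the left-hand side of (4) grows roughly like $\epsilon_{N,J}^{-4}$ up to a logarithmic factor. A valid choice has to balance the two.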

Remark 11

We take a stochastic design for the true person parameters and a fixed design for the true item parameters, following the convention of item factor analysis (e.g., Bartholomew et al. 2008). It is worth pointing out that the choice between a stochastic and a fixed design is not essential under our double asymptotic regime. For example, the consistency result of Theorem 1 still holds if we replace condition A2 by a corresponding fixed design as in Chen et al. (2019b).

Following the discussion on $\epsilon_{N,J}$ in Remark 10, we consider two concrete settings under which the requirement on $\epsilon_{N,J}$ becomes more specific. These results are given in Propositions 1 and 2.

Proposition 1

Suppose that F has a compact support. More precisely, there exists a constant $C_0$ satisfying

$$\Pr(\Vert \boldsymbol{\theta}_1^*\Vert \ge C_0) = 0,$$

under the law of F. If we fix $\epsilon_{N,J}$ to be a constant $\epsilon$ independent of N and J, satisfying

$$0 < \epsilon \le \frac{1}{2}\min\left\{ 1-f\left( C\sqrt{C_0^2+1}\right), f\left( -C\sqrt{C_0^2+1}\right), \frac{2}{5} \right\}, \tag{7}$$

then (3) and (4) are satisfied. This choice of $\epsilon_{N,J}$, together with the regularity conditions in Theorem 1, guarantees that $L_{N, J}(A^*, \hat{A})$ converges to zero in probability.
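As a small worked example, the upper bound in (7) can be evaluated for the logistic link. The values of C and $C_0$ below are arbitrary illustrations. The quantity $C\sqrt{C_0^2+1}$ is the largest possible value of $|d_j^* + (\mathbf{a}_j^*)^\top \boldsymbol{\theta}_i^*|$ under A1 and the support bound, by the Cauchy–Schwarz inequality.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def epsilon_upper_bound(C, C0, f=logistic):
    """Right-hand side of (7) for an inverse link f."""
    b = C * math.sqrt(C0 ** 2 + 1.0)  # bound on |d_j + a_j' theta_i|
    return 0.5 * min(1.0 - f(b), f(-b), 2.0 / 5.0)

def h_logistic(y):
    # h(y) in (5) for the logistic link and y in (0, 0.5)
    return math.log((1.0 - y) / y)

C, C0 = 1.0, 2.0  # illustrative constants
bound = epsilon_upper_bound(C, C0)
print(f"any constant epsilon in (0, {bound:.4f}] satisfies (7)")
print(f"h(2 * bound) / C = {h_logistic(2 * bound) / C:.4f} versus C0 = {C0}")
```

With a constant at or below this bound, $h(2\epsilon)/C$ is at least $\sqrt{C_0^2+1} > C_0$, so the probability in (3) is exactly zero.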

Proposition 2

Consider exploratory IFA based on the M2PL model, where F is a multivariate sub-Gaussian distribution (see Footnote 3) and f is the logistic link. Suppose that there exists a constant $\beta \ge 1$ such that

$$J \le N \le J^{\beta}. \tag{8}$$

Then, (3) and (4) hold for any $\epsilon_{N,J}$ taking the form

$$\epsilon_{N,J} = \gamma_0 J^{-\gamma_1}, \tag{9}$$

where $\gamma_0$ and $\gamma_1$ are any constants satisfying $\gamma_0 > 0$ and $\gamma_1 \in (0, (4(K+3))^{-1})$. The choice of $\epsilon_{N,J}$ following (9), together with the regularity conditions in Theorem 1, guarantees that $L_{N, J}(A^*, \hat{A})$ converges to zero in probability.
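The range of $\gamma_1$ can be motivated by a short heuristic calculation, offered here as a sketch rather than as the proof. For the logistic link, $f^{-1}(y) = \log\{y/(1-y)\}$ and $f'(x) = f(x)\{1-f(x)\}$, so (5) and (6) reduce to $h(y) = \log\{(1-y)/y\}$ and $g(y) = y(1-y)$. Substituting (9), for J large enough that $2\epsilon_{N,J} < 0.5$,

$$h(2\epsilon_{N,J}) = \log\frac{1 - 2\epsilon_{N,J}}{2\epsilon_{N,J}} = O(\log J), \qquad \epsilon_{N,J}\, g(\epsilon_{N,J}) = \epsilon_{N,J}^2 (1 - \epsilon_{N,J}) \asymp J^{-2\gamma_1},$$

$$\frac{(h(2\epsilon_{N,J}))^{\frac{K+1}{K+3}}}{(\epsilon_{N,J}\, g(\epsilon_{N,J}))^2} = O\left( (\log J)^{\frac{K+1}{K+3}} J^{4\gamma_1} \right) = o\left(J^{\frac{1}{K+3}}\right) \quad \text{whenever } 4\gamma_1 < \frac{1}{K+3}.$$

This matches the stated range $\gamma_1 < (4(K+3))^{-1}$. For (3), the threshold $h(2\epsilon_{N,J})/C$ grows like $\log J$. Assuming a sub-Gaussian tail bound of the usual form $\Pr(\Vert \boldsymbol{\theta}_1^*\Vert \ge t) \le c_1 \exp(-c_2 t^2)$, the probability in (3) decays like $\exp\{-c(\log J)^2\}$ for some constant $c > 0$, which is faster than any power of J. The upper bound $N \le J^{\beta}$ in (8) then yields $o(N^{-1})$.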

According to the result of Proposition 1, it suffices to choose $\epsilon_{N,J}$ as a sufficiently small positive constant, when F has a bounded support. Under the setting of Proposition 2, to ensure consistency, one has to let $\epsilon_{N,J}$ decay to zero at an appropriate rate. Note that even in the second setting where the support of F is unbounded, $\epsilon_{N,J}$ is almost like a constant, as it decays to zero very slowly when J grows. These results suggest that we may choose $\epsilon_{N,J}$ to be a sufficiently small constant in practice.

On the Choice of K

In the previous discussion, the number of factors K is assumed to be known. In practice, however, this information is often unknown and an important task in exploratory IFA is to determine the number of factors based on data. When conducting exploratory linear factor analysis, one typically gains the first idea by examining the scree plot from principal component analysis. Thanks to the connection between Algorithm 1 and PCA as discussed in Remark 2, a similar scree plot is available from the current method.

The scree plot is produced as follows. We first run Algorithm 1, but replace the unknown K in Step 1 of the algorithm by a reasonably large number $K^{\dagger}$. Then, a scree plot can be obtained by plotting $\hat{\sigma}_k$ in descending order, for the $\hat{\sigma}_k$s produced by Step 7 of Algorithm 1.
Figure 1 shows such a scree plot, for which the data are generated from a five-factor model ($K = 5$) with $J = 200$ and $N = 4000$, and the input number of factors is set to be $K^{\dagger} = 10$ in Step 1 of the algorithm. Unsurprisingly, an obvious gap is observed between $\hat{\sigma}_{5}$ and $\hat{\sigma}_{6}$. In fact, when data follow an IFA model, such a gap in the singular values is guaranteed to exist asymptotically, no matter what the input dimension is.
In practice, the latent dimension K can be chosen by identifying the singular value gap from the scree plot.
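
As a rough illustration of reading the gap off the scree plot, the sketch below standardizes a vector of singular values by $\sqrt{NJ}$ and reports the position of the largest drop between consecutive values. The largest-drop rule and the helper names `standardize` and `largest_gap_index` are assumptions made here for illustration; the text only says that the gap is identified from the plot. The singular values are made up and are not those behind Figure 1.

```python
import numpy as np

def standardize(singular_values, N, J):
    """Scale singular values by sqrt(N * J), as on the y-axis of the scree plot."""
    return np.asarray(singular_values, dtype=float) / np.sqrt(N * J)

def largest_gap_index(values):
    """Return k such that the largest drop is between the k-th and (k+1)-th value.

    Hypothetical selection rule: sort in descending order and take the
    position of the largest difference between consecutive values.
    """
    s = np.sort(np.asarray(values, dtype=float))[::-1]
    drops = s[:-1] - s[1:]
    return int(np.argmax(drops)) + 1

# Made-up singular values with a visible gap after the fifth one.
sigma_hat = np.array([910.0, 740.0, 680.0, 610.0, 550.0,
                      90.0, 85.0, 80.0, 78.0, 75.0])
scaled = standardize(sigma_hat, N=4000, J=200)
K_guess = largest_gap_index(scaled)
print(K_guess)
```

With real data the gap may be less clear-cut, so a visual check of the plot remains advisable alongside any automatic rule of this kind.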

Theorem 2

Under the same conditions as Theorem 1 and when the input dimension $K^{\dagger}$ in Algorithm 1 is set fixed (i.e., independent of N and J) but not necessarily equal to the true number of factors, there exists a constant $\delta > 0$ such that for the true number of factors K,

$$\lim_{N, J\rightarrow \infty} \Pr\left( \frac{\hat{\sigma}_{K}}{\sqrt{NJ}} > \delta \right) = 1, \text{ and } \frac{\hat{\sigma}_{K+1}}{\sqrt{NJ}} \overset{pr}{\rightarrow} 0,$$

as N and J grow to infinity simultaneously.

Remark 12

As shown in the proof, the input dimension $K^{\dagger}$ does not affect the asymptotics, as long as it does not grow with N and J. However, for relatively small N and J, the matrix X obtained in Step 3 of the algorithm may not retain enough information when the input dimension is smaller than $K+1$, which may lead to an underestimation of the number of factors. Thus, in practical applications, we recommend choosing the input dimension to be slightly larger than the maximum number of factors one suspects to exist in the data.

Figure 1. A scree plot for choosing the number of factors. The y-axis shows the standardized singular values $\hat{\sigma}_k/\sqrt{NJ}$, where the $\hat{\sigma}_k$s are obtained from Step 7 of Algorithm 1. The data are simulated from an IFA model with $K = 5$, $J = 200$, and $N = 4000$. The input dimension is set to be 10 in Algorithm 1. A singular value gap can be found between the 5th and 6th singular values.

Statistical Efficiency. We further point out that a price is paid for the computational advantage of the SVD-based estimator. To elaborate on this point, we compare it with the CJMLE (Chen et al. 2019b; 2019c). The CJMLE treats both item parameters and latent factors as fixed parameters and maximizes a joint likelihood function with respect to all the fixed parameters. The SVD-based estimator is statistically less efficient than the CJMLE, in the sense that it converges to the true parameters at a much slower rate. To make this comparison, we consider the same setting as in Proposition 1. The following proposition establishes the convergence rate for $\Vert X^* - \hat{X}\Vert_F^2/(NJ)$, which determines the convergence of $\hat{A}$. Here, $X^* = (f(d_{j}^* + \mathbf{a}_j^* (\boldsymbol{\theta}_i^*)^\top))_{N\times J}$ is the true item response probability matrix.

Proposition 3

Suppose that the same assumptions as in Proposition 1 hold and choose $\epsilon_{N,J}$ as in Proposition 1. Then, we have

$$\frac{1}{NJ}\Vert X^* - \hat{X} \Vert_F^2 = O_p(J^{-\frac{1}{K+2}}). \tag{10}$$

On the other hand, as shown in Chen et al. (2019c), the CJMLE achieves the optimal rate (in the minimax sense) for estimating $X^*$, that is, $\Vert X^* - \hat{X}_{JML} \Vert_F^2/(NJ) = O_p(J^{-1})$, where $\hat{X}_{JML}$ denotes the CJMLE. This result suggests that the SVD-based estimator converges at a much slower rate than the CJMLE.
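
To get a feel for the size of this gap, the sketch below evaluates the two rates, $J^{-1/(K+2)}$ and $J^{-1}$, for a few values of J. It is only indicative: both results are stated up to unknown constants, which are set to 1 here, and the function names are hypothetical.

```python
def svd_rate(J, K):
    """Rate J^(-1/(K+2)) from Proposition 3, with the unknown constant set to 1."""
    return J ** (-1.0 / (K + 2))

def cjmle_rate(J):
    """Rate J^(-1) cited for the CJMLE, with the unknown constant set to 1."""
    return 1.0 / J

def items_to_match(J, K):
    """Number of items J' for which J'^(-1/(K+2)) equals J^(-1)."""
    return float(J) ** (K + 2)

# Compare the two rates for K = 5 factors and several test lengths.
for J in (50, 200, 1000):
    print(J, round(svd_rate(J, 5), 3), cjmle_rate(J))
```

For K = 5 and J = 200, the two rates are roughly 0.47 and 0.005. Ignoring constants, the SVD-based rate would reach 0.005 only at about $200^{7} = 1.28 \times 10^{16}$ items, which illustrates how strongly the exponent $1/(K+2)$ slows convergence as K grows.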

3. Extensions

Dealing with Missing Data. With slight modification, Algorithm 1 can handle item response data with missing values. We use the matrix $W = (w_{ij})_{N\times J}$ to indicate which responses are observed, where $w_{ij} = 1$ indicates that the response $Y_{ij}$ is not missing and $w_{ij} = 0$ otherwise. The modified algorithm is described as follows.

Algorithm 2

(SVD-based estimator for exploratory IFA with missing data)

  1. Input nonmissing indicator $W = (w_{ij})_{N\times J}$, nonmissing responses $\{y_{ij}: w_{ij} = 1, i = 1,\ldots, N, j = 1,\ldots, J\}$, the number of factors K, inverse link function f, and truncation parameter $\epsilon_{N,J} > 0$.

  2. Compute $\hat{p} = (\sum_{i=1}^N\sum_{j=1}^J w_{ij})/(NJ)$ as the proportion of observed responses.

  3. For each i and j, let $z_{ij} = y_{ij}$ if $w_{ij} = 1$, and $z_{ij} = 0$ if $w_{ij} = 0$. Write $Z = (z_{ij})_{N\times J}$.

  4. Apply the singular value decomposition to Z to obtain $Z = \sum_{j = 1}^J \sigma_j \mathbf{u}_j\mathbf{v}_j^\top$, where $\sigma_1 \ge \cdots \ge \sigma_J \ge 0$ are the singular values and $\mathbf{u}_j$s and $\mathbf{v}_j$s are left and right singular vectors, respectively.

  5. Let

    $$X = (x_{ij})_{N \times J} = \frac{1}{\hat{p}}\sum_{k = 1}^{\tilde{K}} \sigma_k \mathbf{u}_k\mathbf{v}_k^\top,$$
    where $\tilde{K} = \max\big\{K+1, \arg\max_k\{\sigma_k \ge 1.01 \sqrt{N(\hat{p}+3\hat{p}(1-\hat{p}))}\}\big\}$.
  6. Let $\hat{X} = (\hat{x}_{ij})_{N\times J}$ be defined as

    $$\hat{x}_{ij} = \begin{cases} \epsilon_{N,J}, & \text{if } x_{ij} < \epsilon_{N,J},\\ x_{ij}, & \text{if } \epsilon_{N,J} \le x_{ij} \le 1-\epsilon_{N,J},\\ 1-\epsilon_{N,J}, & \text{if } x_{ij} > 1 - \epsilon_{N,J}. \end{cases}$$
  7. Let $\tilde{M} = (\tilde{m}_{ij})_{N\times J}$, where $\tilde{m}_{ij} = f^{-1}(\hat{x}_{ij})$.

  8. Let $\hat{\mathbf{d}} = (\hat{d}_1,\ldots,\hat{d}_J)$, where $\hat{d}_j = (\sum_{i=1}^N \tilde{m}_{ij})/N$.

  9. Apply singular value decomposition to $\hat{M} = (\tilde{m}_{ij} - \hat{d}_j)_{N\times J}$ to obtain $\hat{M} = \sum_{j = 1}^J \hat{\sigma}_j \hat{\mathbf{u}}_j\hat{\mathbf{v}}_j^\top$, where $\hat{\sigma}_1 \ge \cdots \ge \hat{\sigma}_J \ge 0$ are the singular values and $\hat{\mathbf{u}}_j$s and $\hat{\mathbf{v}}_j$s are the left and right singular vectors, respectively.

  10. Output $\hat{A} = \frac{1}{\sqrt{N}}(\hat{\sigma}_1 \hat{\mathbf{v}}_1,\ldots,\hat{\sigma}_K\hat{\mathbf{v}}_K)$, $\hat{\Theta} = \sqrt{N}(\hat{\mathbf{u}}_1,\ldots,\hat{\mathbf{u}}_K)$.
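
The steps of Algorithm 2 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `svd_ifa_missing` and `logit`, the default `eps=0.01`, and the logistic link are choices made here, and the arg max in Step 5 is read as the number of singular values at or above the threshold. The simulated data serve only to exercise the function.

```python
import numpy as np

def logit(x):
    """Inverse of the logistic function f(x) = exp(x) / (1 + exp(x))."""
    return np.log(x) - np.log1p(-x)

def svd_ifa_missing(Y, W, K, eps=0.01, f_inv=logit):
    """Sketch of Algorithm 2; entries of Y are ignored wherever W == 0."""
    Y = np.asarray(Y, dtype=float)
    W = np.asarray(W, dtype=float)
    N, J = W.shape
    p_hat = W.mean()                                   # Step 2
    Z = np.where(W == 1, Y, 0.0)                       # Step 3: impute zeros
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)   # Step 4
    # Step 5: assumed reading of the arg max as a count above the threshold.
    thr = 1.01 * np.sqrt(N * (p_hat + 3 * p_hat * (1 - p_hat)))
    K_tilde = max(K + 1, int(np.sum(s >= thr)))
    X = (U[:, :K_tilde] * s[:K_tilde]) @ Vt[:K_tilde] / p_hat
    X_hat = np.clip(X, eps, 1 - eps)                   # Step 6: truncation
    M_tilde = f_inv(X_hat)                             # Step 7
    d_hat = M_tilde.mean(axis=0)                       # Step 8
    M_hat = M_tilde - d_hat                            # Step 9: center columns
    U2, s2, Vt2 = np.linalg.svd(M_hat, full_matrices=False)
    A_hat = (Vt2[:K].T * s2[:K]) / np.sqrt(N)          # Step 10
    Theta_hat = np.sqrt(N) * U2[:, :K]
    return d_hat, A_hat, Theta_hat, s2

# Simulated example: logistic model with about 20% of entries missing at random.
rng = np.random.default_rng(0)
N, J, K = 500, 50, 2
theta = rng.normal(size=(N, K))
A = rng.uniform(0.5, 1.5, size=(J, K))
d = rng.normal(size=J)
P = 1.0 / (1.0 + np.exp(-(d + theta @ A.T)))
Y = rng.binomial(1, P)
W = rng.binomial(1, 0.8, size=(N, J))
d_hat, A_hat, Theta_hat, sigma_hat = svd_ifa_missing(Y, W, K)
print(A_hat.shape, Theta_hat.shape)
```

The returned `sigma_hat` holds the singular values from Step 9, so the same call with a larger K could also feed a scree plot. Passing a matrix of ones for `W` gives $\hat{p} = 1$ and no imputation.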

Remark 13

It is easy to see that $\hat{p} = 1$ when there is no missing data. In that case, Algorithm 2 becomes exactly the same as Algorithm 1. Steps 2–5 essentially follow the same procedure of Chatterjee (2015) for matrix completion, and the rest of the steps are the same as those in Algorithm 1. Specifically, missing data are first imputed by zero in Step 3 of the algorithm. The bias brought by the simple imputation procedure is corrected in Step 5, by multiplying the factor $1/\hat{p}$. Similar to Algorithm 1, the choice of $\tilde{K}$ in Step 5 is determined by the procedure of Chatterjee (2015) with a small modification which guarantees $\tilde{K} \ge K+1$.

In fact, when the entries of the item response matrix are missing completely at random, using a similar proof, one can show that $\hat{A}$ given by Algorithm 2 is still consistent, under a mild condition on the missing data mechanism and the same conditions as in Theorem 1. Specifically, the following condition is needed, in addition to conditions A1–A5.

  A6. The $w_{ij}$s are independent and identically distributed from a Bernoulli distribution with $\Pr(w_{ij} = 1) = p$, where $0 < p \le 1$ is a constant which does not depend on N and J.

Under conditions A1–A6, the following proposition guarantees the consistency of the proposed SVD estimator.

Proposition 4

Under the same conditions as Theorem 1 plus condition A6, the estimate $\hat{A}$ given by Algorithm 2 satisfies $L_{N,J}(A^*,\hat{A}) \overset{pr}{\rightarrow} 0$, as $N, J \rightarrow \infty$.
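
The role of the factor $1/\hat{p}$ in Step 5 of Algorithm 2 can be seen from a short calculation. The sketch below is an illustration rather than part of the proof. It assumes, in line with condition A6 and missingness completely at random, that $w_{ij}$ is independent of $Y_{ij}$, treats the person and item parameters as fixed, and writes $x_{ij}^*$ for the $(i,j)$ entry of the true item response probability matrix $X^*$ (notation introduced here). Since zero imputation gives $z_{ij} = w_{ij} Y_{ij}$,

$$\mathrm{E}(z_{ij}) = \mathrm{E}(w_{ij})\,\mathrm{E}(Y_{ij}) = p\, x_{ij}^*, \quad \text{so that} \quad \mathrm{E}\left(\frac{z_{ij}}{p}\right) = x_{ij}^*.$$

Because $\hat{p}$ is the average of NJ independent Bernoulli variables with success probability p, it concentrates around p, which is why rescaling by $1/\hat{p}$ approximately undoes the shrinkage toward zero caused by the imputation.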

Dealing with Ordinal Data. In exploratory IFA, ordinal data are also commonly encountered, due to the wide use of Likert-scale items. With slight modification, the SVD method can also be used to analyze ordinal data. This is achieved by applying Algorithm 1 to multiple dichotomized versions of data.

More precisely, consider data $Y = (Y_{ij})_{N\times J}$, where $Y_{ij} \in \{0, 1,\ldots, T\}$. We consider a general family of graded response-type models:

$$\Pr(Y_{ij}\ge t \mid \boldsymbol{\theta}_i) = f(d_{jt} + \mathbf{a}_j^\top \boldsymbol{\theta}_i), \tag{11}$$

where $d_{jt}$ is an item- and category-specific intercept parameter, and the rest of the notation is the same as in model (1). Note that the linear combination of the factors $\mathbf{a}_j^\top \boldsymbol{\theta}_i$ does not depend on the response category and appears in all the submodels $\Pr(Y_{ij}\ge t \mid \boldsymbol{\theta}_i)$ for $t = 1,\ldots, T$. When f takes the logistic form $f(x) = \exp(x)/(1+\exp(x))$, model (11) becomes the multidimensional graded response model (Muraki and Carlson 1995).

Model (11) is closely related to the general model (1) for binary data. In fact, if we dichotomize data at response category t, i.e., $Y_{ij}^{(t)} = 1_{\{Y_{ij} \ge t\}}$, then the binary data $Y_{ij}^{(t)}$ follow model (1) with the same loading parameters. Therefore, the loading matrix A can be estimated by applying Algorithm 1 to dichotomized data $Y^{(t)} = (1_{\{y_{ij}\ge t\}})_{N\times J}$, for some $t = 1,\ldots, T$. The estimation accuracy may be further improved by aggregating the results from multiple dichotomized versions of data. This aggregation method is summarized by Algorithm 3.

Algorithm 3

(SVD-based estimator for exploratory IFA with ordinal data)

  1. Input response $Y = (y_{ij})_{N\times J}$, the number of categories T, the number of factors K, inverse link function f, and truncation parameter $\epsilon_{N,J} > 0$.

  2. For $t = 1,\ldots, T$, apply Algorithm 1 to dichotomized data $Y^{(t)} = (1_{\{y_{ij}\ge t\}})_{N\times J}$ and obtain $\hat{M}^{(t)}$ from Step 7 of Algorithm 1.

  3. 3. Let M ^ = ( t = 1 T M ^ ( t ) ) / T \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\hat{M}} = (\sum _{t=1}^T {\hat{M}}^{(t)})/{T}$$\end{document} . Apply singular value decomposition to M ^ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\hat{M}} $$\end{document} and obtain M ^ = j = 1 J σ ^ j u ^ j v ^ j \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\hat{M}} = \sum _{j = 1}^J {\hat{\sigma }}_j {\hat{{\mathbf {u}}}}_j{\hat{{\mathbf {v}}}}_j^\top $$\end{document} , where σ ^ 1 σ ^ J 0 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\hat{\sigma }}_1 \ge \cdots \ge {\hat{\sigma }}_J \ge 0$$\end{document} are the singular values and u ^ j \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\hat{{\mathbf {u}}}}_j$$\end{document} s and v ^ j \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\hat{{\mathbf {v}}}}_j$$\end{document} are left and right singular vectors, respectively.

  4. 4. Output A ^ = 1 N ( σ ^ 1 v ^ 1 , , σ ^ K v ^ K ) , Θ ^ = N ( u ^ 1 , , u ^ K ) . \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\hat{A}} = \frac{1}{\sqrt{N}}({\hat{\sigma }}_1 {\hat{{\mathbf {v}}}}_1,\ldots ,{\hat{\sigma }}_K{\hat{{\mathbf {v}}}}_K), {\hat{\Theta }} = \sqrt{N}({\hat{{\mathbf {u}}}}_1,\ldots ,{\hat{{\mathbf {u}}}}_K).$$\end{document}
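The aggregation steps of Algorithm 3 can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation. Algorithm 1 is not restated in this section, so the helper `centered_matrix_standin` is a simplified, hypothetical stand-in for its Steps 1 to 7 (a rank K + 1 approximation, truncation, the logit transform, and column centering). The function names, the logistic link, and the coding of responses as 0, ..., T are all assumptions.

```python
import numpy as np

def logit(p):
    """Inverse of the logistic function f(x) = exp(x) / (1 + exp(x))."""
    return np.log(p) - np.log1p(-p)

def centered_matrix_standin(Y_bin, K, eps):
    """Simplified stand-in for Steps 1-7 of Algorithm 1 (an assumption,
    not the exact algorithm): low-rank approximation, truncation to
    [eps, 1 - eps], logit transform, and column centering."""
    U, s, Vt = np.linalg.svd(Y_bin, full_matrices=False)
    r = K + 1  # keep K + 1 singular values in this simplified version
    X = (U[:, :r] * s[:r]) @ Vt[:r]
    X = np.clip(X, eps, 1.0 - eps)
    M_tilde = logit(X)
    return M_tilde - M_tilde.mean(axis=0, keepdims=True)

def svd_ordinal(Y, T, K, eps=1e-4):
    """Sketch of Algorithm 3: average the T matrices obtained from the
    dichotomized data, then take the top-K singular triplets."""
    N = Y.shape[0]
    # Step 2: one centered matrix per dichotomization threshold t = 1, ..., T.
    M_hat = sum(
        centered_matrix_standin((Y >= t).astype(float), K, eps)
        for t in range(1, T + 1)
    ) / T
    # Step 3: singular value decomposition of the averaged matrix.
    U, s, Vt = np.linalg.svd(M_hat, full_matrices=False)
    # Step 4: scale the leading singular vectors.
    A_hat = (Vt[:K].T * s[:K]) / np.sqrt(N)  # J x K loading estimate
    Theta_hat = np.sqrt(N) * U[:, :K]        # N x K factor score estimate
    return A_hat, Theta_hat
```

With this scaling, the columns of the estimated factor score matrix are orthogonal with squared norm N, so the loading estimate is determined only up to the rotation discussed earlier in the paper.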

4. Simulation

Simulation Setting. We consider $K = 4$ and 8, $J = 200, 400, 600, 800, 1000$, and 1200, and $N = 20J$. For each combination of N, J, and K, two different latent distributions F are considered: one is a K-variate standard normal distribution, and the other is a K-variate normal distribution $N(\mathbf{0}, (\sigma_{ij})_{K\times K})$, where $\sigma_{ij} = 1$ if $i = j$ and $\sigma_{ij} = 0.3$ if $i \ne j$. The inverse link f is chosen to be logistic, i.e., $f(x) = \exp(x)/(1+\exp(x))$. This leads to 24 different simulation settings, for all possible combinations of N, J, K, and F.
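The count of 24 settings and the two latent distributions can be made concrete with a short Python sketch. The variable and function names below are hypothetical, and the sampler only illustrates the stated design; it is not the authors' simulation code.

```python
import itertools
import numpy as np

K_values = [4, 8]
J_values = [200, 400, 600, 800, 1000, 1200]
F_values = ["independent", "correlated"]

# N is tied to J through N = 20 J, so it does not add further settings.
settings = [
    {"K": K, "J": J, "N": 20 * J, "F": F}
    for K, J, F in itertools.product(K_values, J_values, F_values)
]

def latent_covariance(K, F, rho=0.3):
    """Identity matrix for independent factors; otherwise unit variances
    and covariance rho between every pair of factors."""
    if F == "independent":
        return np.eye(K)
    S = np.full((K, K), rho)
    np.fill_diagonal(S, 1.0)
    return S

def sample_factors(N, K, F, rng):
    """Draw N factor vectors from the K-variate normal distribution F."""
    return rng.multivariate_normal(np.zeros(K), latent_covariance(K, F), size=N)
```

Here 2 values of K, 6 values of J, and 2 latent distributions give 2 × 6 × 2 = 24 settings.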

For each simulation setting, 100 independent replications are generated, with the item parameters kept fixed across replications. When $J = 200$ and given K, the item parameters are generated as follows.

  1. $d_1^*, \ldots, d_{200}^*$ are i.i.d. from a uniform distribution over the interval $[-1,1]$.

  2. $\mathbf{a}_1^*, \ldots, \mathbf{a}_{200}^*$ are i.i.d., with $\mathbf{a}_j^* = (a_{j1}^\dagger q_{j1},\ldots, a_{jK}^\dagger q_{jK})^\top$. Here, the $a_{jk}^\dagger$s are i.i.d. from a uniform distribution over the interval [1, 2], and the $\mathbf{q}_j = (q_{j1},\ldots, q_{jK})^\top$ are i.i.d. from a uniform distribution over $\mathcal{Q}_K$. Specifically,

    $$\mathcal{Q}_4 = \left\{ (q_1,\ldots, q_4)^\top : q_k \in \{0,1\},\ \sum_{k=1}^4 q_k \ge 1, \text{ and } \sum_{k=1}^4 q_k \le 3 \right\},$$
    and
    $$\mathcal{Q}_8 = \left\{ (q_1,\ldots, q_8)^\top : q_k \in \{0,1\},\ \sum_{k=1}^8 q_k \ge 1, \text{ and } \sum_{k=1}^8 q_k \le 3 \right\}.$$
    The $\mathbf{q}_j$s lead to sparse loading vectors.
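The two generation steps above can be sketched in Python as follows. The function names, the seed, and the use of NumPy's random generator are assumptions made for illustration; the sketch enumerates $\mathcal{Q}_K$ explicitly and draws each $\mathbf{q}_j$ with equal probability from it.

```python
import itertools
import numpy as np

def design_set(K, max_nonzero=3):
    """All binary vectors of length K with between 1 and max_nonzero ones."""
    return np.array([
        q for q in itertools.product([0, 1], repeat=K)
        if 1 <= sum(q) <= max_nonzero
    ])

def generate_item_parameters(K, J=200, seed=0):
    """Draw intercepts d_j and sparse loading vectors a_j for J items."""
    rng = np.random.default_rng(seed)
    d = rng.uniform(-1.0, 1.0, size=J)             # intercepts on [-1, 1]
    a_dagger = rng.uniform(1.0, 2.0, size=(J, K))  # magnitudes on [1, 2]
    Q_K = design_set(K)
    q = Q_K[rng.integers(len(Q_K), size=J)]        # uniform draw over Q_K
    return d, a_dagger * q                         # zero out loadings where q = 0
```

Under this construction $\mathcal{Q}_4$ has 4 + 6 + 4 = 14 elements and $\mathcal{Q}_8$ has 8 + 28 + 56 = 92, so each item loads on at most three factors.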

When $J > 200$, we set the item parameters by repeating multiple times the parameters under $J = 200$ and the same K. For example, when $J = 400$, we set parameters for items 1–200 and those for items 201–400 to be the same as the parameters generated under the setting $J = 200$.
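A minimal Python sketch of this repetition scheme is given below. The function name and argument names are hypothetical, and the sketch assumes that J is a multiple of 200, as in all settings considered here.

```python
import numpy as np

def repeat_item_parameters(d_200, A_200, J):
    """Stack copies of the J = 200 parameters to obtain parameters for J items."""
    if J % 200 != 0:
        raise ValueError("J must be a multiple of 200")
    reps = J // 200
    # Intercepts are concatenated; loading rows are stacked block by block.
    return np.tile(d_200, reps), np.tile(A_200, (reps, 1))
```

For J = 400, items 201–400 then receive exactly the parameters of items 1–200.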

Results. Each simulated dataset is analyzed using the SVD-based estimator, with the truncation parameter $\epsilon_{N,J}$ set to be $10^{-4}$. The performance of the SVD-based estimator is compared with that of the CJMLE (see footnote 4). The results are shown in Figs. 2, 3, 4, and 5.

Figure 2. Simulation results when $K = 4$ and the true factors are independent. Panel a shows the number of items J in x-axis versus the loss (2) in y-axis, and Panel b shows the number of items J in x-axis versus the computation time (in seconds) in y-axis. For each metric and each method, we show the median, 25% quantile, and 75% quantile based on the 100 independent replications.

Figure 3. Simulation results when $K = 4$ and the true factors are correlated. The two panels show the same metrics as in Fig. 2.

Figure 4. Simulation results when $K = 8$ and the true factors are independent. The two panels show the same metrics as in Fig. 2.

Figure 5. Simulation results when $K = 8$ and the true factors are correlated. The two panels show the same metrics as in Fig. 2.

The loss for the SVD-based estimator decreases when N and J simultaneously grow, under all settings. Reasonable accuracy can be achieved when N and J are reasonably large, in which case the SVD-based estimator may be directly used for data analysis. For example, under the setting that $K = 4$ and F is multivariate standard normal, the loss is already around 0.006 when J is 200, which suggests that the average entrywise error is around 0.08. In addition, for the same N, J, and K, the loss for the SVD-based estimator tends to be smaller when the factors are independent than when they are correlated. This is because the signal in the data is weaker in the latter case, due to the redundant information in correlated factors.
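As a quick check of the arithmetic above, and assuming that loss (2), which is defined earlier in the paper and not restated here, is a mean squared error over the entries of the loading matrix, the typical entrywise error is its square root:

$$\sqrt{0.006} \approx 0.077 \approx 0.08.$$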

Moreover, we compare the performance of the two estimators. The CJMLE is always more accurate than the SVD-based estimator. This is consistent with the asymptotic theory that the CJMLE is statistically more efficient. However, if we compare the computation time of the two approaches, the SVD-based estimator is substantially faster. Under the most time-consuming setting, where $J = 1200$, $K = 8$, and the factors are correlated, the SVD approach takes only about 60 seconds, while the CJMLE takes about 17 minutes. Note that, as shown in Chen et al. (2019b), the CJMLE is already substantially faster than the marginal maximum likelihood estimator. Given its reasonable accuracy and computational advantage, the SVD-based estimator may be a good alternative to the CJMLE and the MMLE in large-scale exploratory IFA problems.

5. Concluding Remarks

As shown in this note, the proposed SVD-based algorithm is statistically consistent and has good finite sample performance in large-scale exploratory IFA problems. Although it is not the most statistically efficient method, the algorithm has unique strengths over other exploratory IFA methods. In particular, it is computationally much faster. In addition, it guarantees a unique solution, while most of the other estimators, including the CJMLE and MMLE, can suffer from convergence issues because they involve nonconvex optimization.

Given its computational advantages and good finite sample performance, the SVD-based estimator can be used not only as a starting point for other estimators to improve their numerical convergence, but also as an alternative estimator for data analysis. Specifically, in large-scale exploratory IFA applications, we suggest starting data exploration with the SVD-based estimator. Using this estimator, we can quickly gain some understanding of the number of factors underlying the data and of the loading structures of IFA models with different numbers of factors. Such initial knowledge helps us to focus on a smaller set of candidate latent dimensions K. For these latent dimensions, we can then further investigate the loading structures by the CJMLE, using the corresponding SVD solutions as starting points. When the sample size and the number of items are relatively small, traditional methods such as the MMLE and the composite-likelihood-based estimator may be more suitable.
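The first exploration step, gauging the number of factors from the singular values, can be sketched in Python as follows. The standardization by $\sqrt{NJ}$ follows the scree plot in Fig. 1. The function names and the "largest consecutive gap" rule are assumptions made for illustration and are not a selection rule proposed in this note; in practice the scree plot would be inspected visually.

```python
import numpy as np

def standardized_singular_values(M_hat):
    """Singular values of an N x J matrix divided by sqrt(N * J)."""
    N, J = M_hat.shape
    return np.linalg.svd(M_hat, compute_uv=False) / np.sqrt(N * J)

def largest_gap_index(values, max_k):
    """Return k (1-based) such that the drop from the k-th to the
    (k + 1)-th value is the largest among the first max_k drops."""
    gaps = values[:max_k] - values[1:max_k + 1]
    return int(np.argmax(gaps)) + 1
```

Applied to the matrix whose singular values are plotted in a scree plot, a clear gap after the k-th standardized singular value points to k as a candidate number of factors, which can then be examined further with the CJMLE.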

One limitation of the SVD-based estimator is that it is not easy to make statistical inference on the estimated loading matrix, such as constructing a confidence interval for an estimated loading parameter. This type of inference problem is not an issue for estimators based on the marginal likelihood, for which the asymptotic regime lets N diverge and keeps J fixed. However, it is a general challenge for both the SVD-based estimator and the CJMLE, whose consistency relies on a double asymptotic regime and for which the notion of consistency is weaker than that in the traditional sense. In recent years, this type of inference problem has received much attention in statistics (Chen et al. 2019a; Xia and Yuan 2019). However, to the best of our knowledge, no results have been obtained under an IFA model. We leave this problem for future investigation.

Acknowledgements

Yunxiao Chen acknowledges the support from the National Academy of Education/Spencer Postdoctoral Fellowship. Xiaoou Li acknowledges the support from NSF under the grant DMS-1712657.

Footnotes

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s11336-020-09704-7) contains supplementary material, which is available to authorized users.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1 We only discuss oblique rotation here, as our general exploratory IFA model does not require the factors to be uncorrelated. If the factors are further required to be uncorrelated, then the loading matrix can be recovered up to an orthogonal rotation, for which the rotation matrix O is an orthogonal matrix (e.g., Kaiser 1958).

2 The original algorithm was described in the supplementary material of Chen et al. (2019b). The algorithm here is a slightly modified version. The major modification is in Step 3 of the algorithm that requires at least $K+1$ singular values to be retained. This modification can improve the finite-sample performance of the algorithm; see Remark 4 for more discussions. The other modifications are mainly to simplify the exposition of the algorithm.

3 We say the distribution of a K-variate random vector $\boldsymbol{\theta}$ is sub-Gaussian, if there exist constants $b_1, b_2 > 0$ such that for any $\mathbf{u} \in \mathbb{R}^K$, $\Vert \mathbf{u}\Vert = 1$ and $t > 0$, $\Pr(|\mathbf{u}^\top \boldsymbol{\theta}| > t) \le b_1 e^{-b_2 t^2}$. In particular, the multivariate normal distribution is sub-Gaussian.

4 The CJMLE is implemented using the R package mirtjml (Zhang et al. 2018). All the computation is conducted on a single Intel® Gold 6130 core.

References

Bartholomew, D. J., Moustaki, I., Galbraith, J., & Steele, F. (2008). Analysis of multivariate social science data. Boca Raton, FL: CRC Press.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443–459.
Bock, R. D., Gibbons, R., & Muraki, E. (1988). Full-information item factor analysis. Applied Psychological Measurement, 12, 261–280.
Browne, M. W. (2001). An overview of analytic rotation in exploratory factor analysis. Multivariate Behavioral Research, 36, 111–150.
Cai, L. (2010). High-dimensional exploratory item factor analysis by a Metropolis–Hastings Robbins–Monro algorithm. Psychometrika, 75, 33–57.
Cai, L. (2010). Metropolis–Hastings Robbins–Monro algorithm for confirmatory item factor analysis. Journal of Educational and Behavioral Statistics, 35(3), 307–335.
Chatterjee, S. (2015). Matrix estimation by universal singular value thresholding. The Annals of Statistics, 43, 177–214.
Chen, Y., Fan, J., Ma, C., & Yan, Y. (2019a). Inference and uncertainty quantification for noisy matrix completion. Proceedings of the National Academy of Sciences, 116, 22931–22937.
Chen, Y., Li, X., & Zhang, S. (2019b). Joint maximum likelihood estimation for high-dimensional exploratory item factor analysis. Psychometrika, 84, 124–146.
Chen, Y., Li, X., & Zhang, S. (2019c). Structured latent factor analysis for large-scale data: Identifiability, estimability, and their implications. Journal of the American Statistical Association. https://doi.org/10.1080/01621459.2019.1635485.
Chiu, C.-Y., Köhn, H.-F., Zheng, Y., & Henson, R. (2016). Joint maximum likelihood estimation for diagnostic classification models. Psychometrika, 81, 1069–1092.
Friedman, J., Hastie, T., & Tibshirani, R. (2001). The elements of statistical learning. New York, NY: Springer.
Haberman, S. J. (1977). Maximum likelihood estimates in exponential response models. The Annals of Statistics, 5, 815–841.
Haberman, S. J. (2004). Joint and conditional maximum likelihood estimation for the Rasch model for binary responses. ETS Research Report Series RR-04-20.
Jöreskog, K. G. (1994). On the estimation of polychoric correlations and their asymptotic covariance matrix. Psychometrika, 59, 381–389.
Kaiser, H. F. (1958). The varimax criterion for analytic rotation in factor analysis. Psychometrika, 23, 187–200.
Katsikatsou, M., Moustaki, I., Yang-Wallentin, F., & Jöreskog, K. G. (2012). Pairwise likelihood estimation for factor analysis models with ordinal data. Computational Statistics and Data Analysis, 56, 4243–4258.
Lee, S.-Y., Poon, W.-Y., & Bentler, P. (1990). Full maximum likelihood analysis of structural equation models with polytomous variables. Statistics and Probability Letters, 9, 91–97.
Lee, S.-Y., Poon, W.-Y., & Bentler, P. M. (1992). Structural equation models with continuous and polytomous variables. Psychometrika, 57, 89–105.
Muraki, E., & Carlson, J. E. (1995). Full-information factor analysis for polytomous item responses. Applied Psychological Measurement, 19, 73–90.
Muthén, B. (1984). A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika, 49, 115–132.
O’Rourke, S., Vu, V., & Wang, K. (2018). Random perturbation of low rank matrices: Improving classical bounds. Linear Algebra and its Applications, 540, 26–59.
Reckase, M. (2009). Multidimensional item response theory. New York, NY: Springer.
Stewart, G., & Sun, J. (1990). Matrix perturbation theory. Cambridge, MA: Academic Press.
Stock, J. H., & Watson, M. W. (2002). Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association, 97, 1167–1179.
Wall, M. E., Rechtsteiner, A., & Rocha, L. M. (2003). Singular value decomposition and principal component analysis. In D. P. Berrar, W. Dubitzky, & M. Granzow (Eds.), A practical approach to microarray data analysis (pp. 91–109). New York, NY: Springer.
Xia, D., & Yuan, M. (2019). Statistical inferences of linear forms for noisy matrix completion. arXiv preprint arXiv:1909.00116.
Zhang, S., Chen, Y., & Li, X. (2018). mirtjml: Joint maximum likelihood estimation for high-dimensional item factor analysis. R package version 1.2.
Zhang, S., Chen, Y., & Liu, Y. (2020). An improved stochastic EM algorithm for large-scale full-information item factor analysis. British Journal of Mathematical and Statistical Psychology, 73, 44–71. https://doi.org/10.1111/bmsp.12153

Figure 1. A scree plot for choosing the number of factors. The y-axis shows the standardized singular values $\hat{\sigma}_k/\sqrt{NJ}$, where the $\hat{\sigma}_k$s are obtained from Step 7 of Algorithm 1. The data are simulated from an IFA model with $K = 5$, $J = 200$, and $N = 4000$. The input dimension is set to be 10 in Algorithm 1. A singular value gap can be found between the 5th and 6th singular values.
