
Modeling Evasive Response Bias in Randomized Response: Cheater Detection Versus Self-protective No-Saying

Published online by Cambridge University Press:  01 January 2025

Khadiga H. A. Sayed*
Affiliation: Utrecht University; Cairo University
Maarten J. L. F. Cruyff
Affiliation: Utrecht University
Peter G. M. van der Heijden
Affiliation: Utrecht University; University of Southampton

*Correspondence should be made to Khadiga H. A. Sayed, Department of Methodology and Statistics, Utrecht University, Padualaan 14, 3584 CH, Utrecht, The Netherlands. Email: k.h.a.sayed@uu.nl

Abstract

Randomized response is an interview technique for sensitive questions designed to eliminate evasive response bias. Since this elimination is only partially successful, two models have been proposed for modeling evasive response bias: the cheater detection model for a design with two sub-samples with different randomization probabilities and the self-protective no sayers model for a design with multiple sensitive questions. This paper shows the correspondence between these models, and introduces models for the new, hybrid “ever/last year” design that account for self-protective no saying and cheating. The model for one set of ever/last year questions has a degree of freedom that can be used for the inclusion of a response bias parameter. Models with multiple degrees of freedom are introduced for extensions of the design with a third randomized response question and a second set of ever/last year questions. The models are illustrated with two surveys on doping use. We conclude with a discussion of the pros and cons of the ever/last year design and its potential for future research.

Type
Original Research
Creative Commons
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Copyright
© 2024 The Author(s)

Randomized response (RR) is an indirect survey method introduced by Warner (1965) to eliminate evasive responses to sensitive questions. RR involves the use of a randomizer (e.g., a die or a spinner) that adds random noise to the responses so that they do not reveal the respondent’s true response, i.e., the truthful response that would have been given to a direct question. Several comparative validation studies [e.g., Umesh and Peterson (1991), Lamb and Stem (1978), Tracy and Fox (1981), Moshagen et al. (2014), Hoffmann et al. (2015), Lara et al. (2006)] and two meta-analyses (Lensvelt-Mulders et al., 2005; Sagoe et al., 2021) have shown that RR tends to yield more valid responses than direct questioning. In general, compared with direct questioning, RR yields higher prevalence estimates when the sensitive attribute is socially undesirable (the “more-is-better” criterion) and lower prevalence estimates when the sensitive attribute is socially desirable (the “less-is-better” criterion) (Mieth et al., 2021; Meisters et al., 2022a).

Although RR protects the respondents’ privacy, several studies showed that RR does not fully eliminate evasive response behavior (Edgell, 1982; Böckenholt et al., 2009; Wolter and Preisendörfer, 2013; Höglinger et al., 2016; John et al., 2018; van der Heijden et al., 2000). For example, in a study by van der Heijden et al. (2000) all respondents were known to have committed fraud, but RR yielded a prevalence estimate of around 50%, and in a study by Edgell (1982), where the outcomes of the randomizer were predetermined, 25% of respondents gave an evasive “no” answer while the randomizer required them to answer “yes.” In a qualitative study of the forced response design (Boruch, 1971) by Boeije and Lensvelt-Mulders (2002), some respondents admitted to having edited their responses because they did not want to falsely incriminate themselves by giving a forced “yes” response.

Since evasive responses bias the prevalence estimates, it is important to correct for them. The problem with RR designs such as Warner’s design, the forced response design (Boruch, 1971), the unrelated question design (Greenberg et al., 1969), and the crosswise design (Tian and Tang, 2013) is that their statistical models are saturated: they have only one non-redundant randomized response proportion (that of the “yes” responses in the sample), and this proportion is used up by the estimation of the prevalence of the sensitive attribute. As a consequence, an additional parameter accounting for evasive response bias would not be identified. To model evasive response bias, one or more additional degrees of freedom need to be generated for the inclusion of the additional parameter. In this paper, we compare three different designs that generate the necessary degree of freedom, and two different models that use this degree of freedom to model evasive response bias.

The sub-samples design in combination with the cheater detection model (CDM) was introduced by Clark and Desharnais (1998). This design generates a degree of freedom by splitting the sample into two non-overlapping sub-samples with different randomization probabilities, and the CDM estimates the prevalence of (i) instruction-adherent carriers of the sensitive attribute, (ii) instruction-adherent non-carriers of the sensitive attribute, and (iii) cheaters, i.e., respondents with unknown true response who give the evasive answer irrespective of the outcome of the randomizer. The prevalence of the sensitive attribute is therefore bounded: the lower bound is the estimated proportion of instruction-adherent carriers (no cheater is a carrier), and the upper bound is the sum of the estimated proportions of instruction-adherent carriers and cheaters (every cheater is a carrier). For details of the statistical properties of this model, see Feth et al. (2017). The CDM was used in combination with the forced response design by Clark and Desharnais (1998), but has also been used in combination with the unrelated question and triangular designs (Ostapczuk et al., 2009; Reiber et al., 2020, 2022; Meisters et al., 2022b).
Topics that have been investigated with the CDM include doping use by elite and recreational athletes (Christiansen et al., 2023; Elbe and Pitsch, 2018; Pitsch and Emrich, 2011; Petróczi et al., 2022; Schröter et al., 2016; Fincoeur and Pitsch, 2017; Frenger et al., 2016), cheating in examinations (Ostapczuk et al., 2009), medication non-adherence (Ostapczuk et al., 2011), intimate partner violence during the COVID-19 pandemic (Reiber et al., 2022), and social welfare fraud (van den Hout et al., 2010a). The latter used a dual sampling scheme with RR questions in one sub-sample and direct questions in the other. The extended crosswise model (Heck et al., 2018), which has recently received much attention, also uses the sub-samples design, but because it does not use the response categories “yes/no,” it does not lend itself to the estimation of cheating or self-protective no saying (SP-no saying) [for details, see Heck et al. (2018), Sayed et al. (2022)], and its discussion is therefore beyond the scope of this paper.

The SP-no model was introduced by Böckenholt and van der Heijden (2007) for the multiple questions design. This design consists of $p\ge 2$ dichotomous sensitive questions inquiring about different sensitive attributes. The SP-no model analyzes the $2^p$ randomized response profiles under the assumption that the probabilities of the $2^p$ true response profiles can be described by a constrained multivariate distribution. This constraint generates the degree of freedom necessary to account for the presence of self-protective no sayers (SP-no sayers), who give an evasive “no” response to all questions irrespective of the outcome of the randomizer. In contrast to the CDM, the SP-no model does not treat the SP-no sayers as a separate category alongside the carriers and non-carriers, but corrects the prevalence estimates of carriers and non-carriers for SP-no saying.
The SP-no model has been used with various constrained multivariate distributions, including an item response theory (IRT) variant (Böckenholt and van der Heijden, 2007; Böckenholt et al., 2009; Fox and Meijer, 2008; De Jong et al., 2010), a log-linear variant (Cruyff et al., 2007; van den Hout et al., 2010b), and a zero-inflated Poisson variant (Cruyff et al., 2008a, b), and it has also been used with a mixture of dichotomous and polytomous questions (Fox et al., 2013; Cruyff et al., 2016). The log-linear and IRT models can generate more than one degree of freedom, which makes it possible to estimate item-specific SP-no parameters that correspond to the sensitivity of the items [see Böckenholt et al. (2009)]. The SP-no model has been used for prevalence estimation of such topics as social welfare fraud (Böckenholt and van der Heijden, 2007; Böckenholt et al., 2009; Cruyff et al., 2007, 2008a, b, 2016), smoking behavior (Fox et al., 2013), attitudes toward academic learning (Fox and Meijer, 2008), and sexual attitudes (De Jong et al., 2010).
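The SP-no mechanism can be illustrated with a short Python sketch. It assumes, purely for illustration, that the true responses to the questions are independent (a hypothetical stand-in for the IRT or log-linear constraints cited above), that every question uses the symmetric transition matrix with randomization probability $p$ introduced in Sect. 1 (called `p_rand` in the code, because $p$ denotes the number of questions here), and that a fraction `theta` of respondents are SP-no sayers who answer “no” to every question. The function names are hypothetical; this is not the authors' estimation code.

```python
from itertools import product


def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def sp_no_profile_probs(p_rand, item_prev, theta):
    """Randomized response profile probabilities with SP-no sayers (sketch).

    p_rand    : symmetric randomization probability p (q = 1 - p)
    item_prev : prevalence of each sensitive attribute; true responses are
                ASSUMED independent across items (illustrative constraint)
    theta     : proportion of SP-no sayers, who answer "no" to every question
    """
    q = 1 - p_rand
    p2 = [[p_rand, q], [q, p_rand]]  # rows: randomized (n, y); cols: true (n, y)
    transition = [[1.0]]
    for _ in item_prev:
        transition = kron(transition, p2)

    # Profiles in the same order as the Kronecker product: "nn...n" first.
    profiles = ["".join(s) for s in product("ny", repeat=len(item_prev))]
    true_probs = []
    for profile in profiles:
        prob = 1.0
        for answer, prev in zip(profile, item_prev):
            prob *= prev if answer == "y" else 1 - prev
        true_probs.append(prob)

    # Instruction-adherent respondents follow pi* = P pi ...
    adherent = [sum(t * s for t, s in zip(row, true_probs)) for row in transition]
    # ... and SP-no sayers put all their mass on the all-"no" profile.
    observed = [(1 - theta) * a for a in adherent]
    observed[0] += theta
    return dict(zip(profiles, observed))


probs = sp_no_profile_probs(0.8, [0.2, 0.1], theta=0.1)
# Naive Warner-type estimate for item A that ignores SP-no saying.
yes_a = probs["yn"] + probs["yy"]
naive_a = (yes_a - 0.2) / (0.8 - 0.2)
```

With these hypothetical values, the naive estimate for the first item is about .147 instead of the true .2, which is the kind of downward bias the SP-no model is designed to correct.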

The CDM and SP-no models appear to be different approaches to modeling evasive response bias, with the former corresponding to the sub-samples design and the latter to the multiple questions design. Although van den Hout et al. (2010a) briefly commented on the correspondence between the models, little is known about their similarities and differences. This makes it difficult for researchers who want to conduct an RR survey and correct for response bias to select the appropriate model. This paper sheds more light on this issue by showing the correspondence between the parameters of both models and by deriving both models for the ever/last year design.

This paper also introduces a new design that can serve as an alternative to the existing designs for detecting evasive response bias. Recently, Sayed et al. (2023) proposed a model for the RR design with “ever” and “last year” questions, the former asking about the presence of a sensitive attribute during the respondent’s lifetime and the latter about its presence during the last year. This design was originally developed to investigate the prevalence of a sensitive attribute over time, but it has some favorable properties that are useful even if the primary interest is not in prevalence over time. The model for this design estimates the prevalence of non-carriers, former carriers, and last year carriers of the sensitive attribute (two free parameters) from the four observed randomized response profiles (three non-redundant proportions), and therefore has one degree of freedom. An important advantage of this model is that it estimates the prevalence of last year carriers more efficiently than when only the “last year” question is asked. Another advantage is that there is no need to split the sample into two sub-samples with different randomization probabilities, nor to assume a constrained multivariate distribution for the true response profiles, to generate the degree of freedom for the estimation of response bias, because this degree of freedom is already available. The ever/last year design has been applied in two studies on doping use in The Netherlands (De Hon et al., 2015; Hilkens et al., 2021), but neither study included a parameter to account for evasive response bias.
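The degree-of-freedom count for the ever/last year design can be made concrete with a small Python sketch. It assumes, for illustration only, that both questions use the same symmetric randomization probability $p$ with independent randomizations, so that the transition matrix consists of the columns of $\boldsymbol{P}_{2\times 2}\otimes\boldsymbol{P}_{2\times 2}$ for the three possible true (ever, last year) profiles: $nn$ (non-carrier), $yn$ (former carrier), and $yy$ (last year carrier). The true profile $ny$ is logically impossible and is dropped. Function names and the example prevalences are hypothetical.

```python
def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def ever_last_year_matrix(p):
    """4 x 3 matrix: randomized (nn, ny, yn, yy) by true (nn, yn, yy)."""
    q = 1 - p
    p4 = kron([[p, q], [q, p]], [[p, q], [q, p]])  # rows/cols: nn, ny, yn, yy
    keep = [0, 2, 3]  # drop true "ny": a last year carrier is also an ever carrier
    return [[row[j] for j in keep] for row in p4]


def ever_last_year_probs(p, pi_non, pi_former, pi_last):
    """Randomized response profile probabilities under instruction adherence."""
    true_probs = (pi_non, pi_former, pi_last)
    return [sum(m * t for m, t in zip(row, true_probs))
            for row in ever_last_year_matrix(p)]


observed_free = 4 - 1  # four randomized response profiles, summing to 1
params_free = 3 - 1    # three true categories, summing to 1
df = observed_free - params_free  # one degree of freedom left for a bias parameter

probs = ever_last_year_probs(0.8, pi_non=0.7, pi_former=0.2, pi_last=0.1)
```

Note that the randomized profile $ny$ still has positive probability even though no respondent can truthfully be in that category; this is what the randomization adds.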

This paper introduces response bias models for the ever/last year design with a single set of ever/last year questions, and for two extensions: one with an additional (third) dichotomous randomized response question and one with a second set of ever/last year questions. The benefits of these extensions are twofold. First, they increase the power to detect response bias. Second, they increase the degrees of freedom, which allows for the inclusion of multiple parameters to test different assumptions about response bias. These models are applied to data from two Dutch surveys: one with a set of ever/last year questions on the use of anabolic steroids and a dichotomous RR question on the use of SARMs, and one with two sets of ever/last year questions on the use of anabolics and blood manipulations.
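As a rough count, assuming that the joint distribution of the true categories is left unrestricted (three categories per set of ever/last year questions and two per dichotomous question) and that each randomized question doubles the number of observed profiles, the degrees of freedom available before any response bias parameter is added are:

$$
\begin{aligned}
\text{one set:}\quad & (2^2-1)-(3-1) = 3-2 = 1,\\
\text{one set plus a third question:}\quad & (2^3-1)-(3\cdot 2-1) = 7-5 = 2,\\
\text{two sets:}\quad & (2^4-1)-(3\cdot 3-1) = 15-8 = 7.
\end{aligned}
$$

Each response bias parameter added to a model uses one of these degrees of freedom.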

The paper is structured as follows. Section 1 reviews the CDM for the sub-samples design and the SP-no model for the multiple questions design, and shows the correspondence between these models by applying the SP-no model to the sub-samples design and the CDM to the multiple questions design. In Sect. 2, we derive the CDM and the SP-no model for the ever/last year design. Section 3 presents the maximum likelihood estimators of the model parameters. Section 4 investigates the power to detect cheating/SP-no saying in the design with one set of ever/last year questions and in the designs with an additional third question and a second set of ever/last year questions. Section 5 presents the results of analyses of the data of a survey among gym users in the Netherlands with ever/last year questions about anabolic steroids and a dichotomous RR question about SARMs use, and of a survey among Dutch elite athletes with ever/last year questions about both the use of androgenic anabolics and blood manipulations. The analyses include prevalence estimation of these types of doping and of cheating/SP-no saying. Section 6 discusses the pros and cons of both approaches for modeling evasive response bias and some limitations. Section 7 ends the paper with a discussion of the pros and cons of the various designs and suggestions for future research.

1. Correspondence between the CDM and SP-no model

In this section, we review the CDM for the sub-samples design and the SP-no model for the multiple questions design, and then show that the SP-no model can also be applied to the sub-samples design, and that the CDM can also be applied to the multiple questions design. We start the section with an introduction of the matrix notation for dichotomous, saturated randomized response models. We use matrix notation because it provides a visualization of the models that facilitates their interpretation. In Sect. 3, we show that the use of matrix notation also greatly facilitates parameter estimation.

Consider a design with a single dichotomous sensitive question. Let $\pi_t$ denote the probability of the true response and $\pi^*_r$ the probability of the randomized response, for $r,t\in\{n=\text{no},\,y=\text{yes}\}$, and let $\boldsymbol{P}_{2\times 2}$ be the $2\times 2$ transition matrix with entries $p_{r|t}$ denoting the conditional randomization probabilities of observing randomized response $r$ given true response $t$.
In matrix notation, the model $\boldsymbol{\pi}^*=\boldsymbol{P}_{2\times 2}\boldsymbol{\pi}$ for this design is given by

$$
\begin{pmatrix}\pi^*_n\\ \pi^*_y\end{pmatrix}
= \begin{pmatrix} p_{n|n} & p_{n|y}\\ p_{y|n} & p_{y|y}\end{pmatrix}
\begin{pmatrix}\pi_n\\ \pi_y\end{pmatrix}
= \begin{pmatrix} p_{n|n}\pi_n+p_{n|y}\pi_y\\ p_{y|n}\pi_n+p_{y|y}\pi_y\end{pmatrix}, \tag{1.1}
$$

for $p_{n|n}\ne p_{n|y}$ (equivalently, $p_{y|n}\ne p_{y|y}$), the condition under which $\boldsymbol{P}_{2\times 2}$ is invertible: since each column sums to 1, its determinant equals $p_{n|n}-p_{n|y}$ (Chaudhuri and Mukerjee, 1988; van den Hout and van der Heijden, 2002).
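Whenever $\boldsymbol{P}_{2\times 2}$ is invertible (its determinant is $p_{n|n}-p_{n|y}$ because each column sums to 1), model (1.1) can be solved for $\boldsymbol{\pi}$ with the moment estimator $\hat{\boldsymbol{\pi}}=\boldsymbol{P}_{2\times 2}^{-1}\hat{\boldsymbol{\pi}}^*$. The Python sketch below (hypothetical function names and a hypothetical asymmetric design) shows this, and also shows that the fitted randomized response probabilities reproduce the observed proportions exactly, which is what it means for the model to be saturated.

```python
def mat_vec(m, v):
    """Multiply a 2 x 2 matrix (list of rows) by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix; raises if the transition matrix is singular."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]  # = p_{n|n} - p_{n|y} if columns sum to 1
    if abs(det) < 1e-12:
        raise ValueError("singular transition matrix: p_{n|n} == p_{n|y}")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def moment_estimate(transition, observed):
    """pi_hat = P^{-1} pi*_hat; may fall outside [0, 1] in small samples."""
    return mat_vec(inverse_2x2(transition), observed)


# Hypothetical asymmetric design: rows randomized (n, y), columns true (n, y).
P = [[0.7, 0.2],
     [0.3, 0.8]]
observed = [0.65, 0.35]                # observed proportions of "no" and "yes"
pi_hat = moment_estimate(P, observed)  # approximately [0.9, 0.1]
fitted = mat_vec(P, pi_hat)            # equals `observed`: zero degrees of freedom left
```

The moment estimator is shown only because it makes the one-to-one mapping between $\boldsymbol{\pi}^*$ and $\boldsymbol{\pi}$ explicit; unlike maximum likelihood estimation restricted to the parameter space, it can produce estimates below 0 or above 1.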

To simplify the notation of the transition matrices, from here on we restrict the model derivations to designs with symmetric randomization probabilities $p=p_{y|y}=p_{n|n}$ and $q=p_{y|n}=p_{n|y}$, with $p+q=1$ and $p\ne q$, so that the probability that a carrier answers $y$ is equal to the probability that a non-carrier answers $n$.

The transition matrix defines the statistical properties of the model. For example, for Warner’s design with probability .8 of answering the sensitive question, $p=.8$ and $q=.2$, and for the unrelated question design (Greenberg et al., 1969) with probability .6 of answering the sensitive question and probability .5 of answering “yes” to the unrelated question, $p=.6+.5\times .4=.8$ and $q=.5\times .4=.2$, so that the models for both designs have the same transition matrix

$$
\boldsymbol{P}_{2\times 2} = \begin{pmatrix} p & q\\ q & p\end{pmatrix} = \begin{pmatrix} .8 & .2\\ .2 & .8\end{pmatrix}, \tag{1.2}
$$

which shows that these designs are mathematically equivalent. The efficiency of a design is determined by the diagonal entries of the transition matrix: the closer $p$ is to 1 (for $p>.5$), the higher the efficiency, with $p=1$ corresponding to the direct question design. For $p=.5$, the design is uninformative because it results in the randomized response probabilities $\pi^*_n=\pi^*_y=.5$, irrespective of the prevalence of the sensitive attribute.
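The equivalence of the two designs and the role of $p$ can be checked numerically. The sketch below (hypothetical function names) computes $p_{y|y}$, $p_{n|n}$, and $q$ for the unrelated question design, and uses the standard sampling variance of the moment estimator for a symmetric design, $\pi^*_y(1-\pi^*_y)/\{n(2p-1)^2\}$ with $\pi^*_y=q+(p-q)\pi_y$, to show how efficiency is lost as $p$ moves from 1 toward .5.

```python
def unrelated_question_probs(p_sensitive, p_unrelated_yes):
    """Return (p_{y|y}, p_{n|n}, p_{y|n}) for the unrelated question design."""
    p_yn = (1 - p_sensitive) * p_unrelated_yes  # non-carrier answers "yes"
    p_yy = p_sensitive + p_yn                   # carrier answers "yes"
    p_nn = p_sensitive + (1 - p_sensitive) * (1 - p_unrelated_yes)
    return p_yy, p_nn, p_yn


def randomized_yes(pi_y, p):
    """pi*_y = q * pi_n + p * pi_y for a symmetric design."""
    q = 1 - p
    return q * (1 - pi_y) + p * pi_y


def moment_variance(pi_y, p, n):
    """Sampling variance of the moment estimator of pi_y (undefined at p = .5)."""
    pi_star = randomized_yes(pi_y, p)
    return pi_star * (1 - pi_star) / (n * (2 * p - 1) ** 2)


# Both diagonal entries are .8 and q = .2: the same matrix as Warner's p = .8.
p_yy, p_nn, q = unrelated_question_probs(0.6, 0.5)
# At p = .5 the randomized "yes" probability is .5 whatever the prevalence.
flat = [randomized_yes(pi_y, 0.5) for pi_y in (0.0, 0.3, 1.0)]
# Variance relative to direct questioning (p = 1) for pi_y = .1, n = 1000.
ratio = moment_variance(0.1, 0.8, 1000) / moment_variance(0.1, 1.0, 1000)
```

With these hypothetical values, `ratio` is about 5.94: the design with $p=.8$ needs roughly six times the sample size of a direct question for the same precision, assuming that the direct question would be answered truthfully. The symmetry $p_{y|y}=p_{n|n}$ of the unrelated question design holds here because the unrelated “yes” probability is .5.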

For a design with two sensitive questions A and B, with $r_{AB},t_{AB}\in\{nn,ny,yn,yy\}$ denoting the randomized and true response profiles, respectively, the bivariate model $\boldsymbol{\pi}^*=\boldsymbol{P}_{4\times 4}\boldsymbol{\pi}$ is given by

$$
\begin{pmatrix}\pi^*_{nn}\\ \pi^*_{ny}\\ \pi^*_{yn}\\ \pi^*_{yy}\end{pmatrix}
= \begin{pmatrix} p^2 & pq & qp & q^2\\ pq & p^2 & q^2 & qp\\ qp & q^2 & p^2 & pq\\ q^2 & qp & pq & p^2\end{pmatrix}
\begin{pmatrix}\pi_{nn}\\ \pi_{ny}\\ \pi_{yn}\\ \pi_{yy}\end{pmatrix}, \tag{1.3}
$$

where $\boldsymbol{P}_{4\times 4}$ is obtained as the Kronecker product $\boldsymbol{P}_{2\times 2}\otimes\boldsymbol{P}_{2\times 2}$. The extension to more than two randomized response variables is straightforward. For example, for three variables ABC the response profiles are $r_{ABC},t_{ABC}\in\{nnn,nny,\dots,yyn,yyy\}$ and the transition matrix is $\boldsymbol{P}_{2\times 2}\otimes\boldsymbol{P}_{2\times 2}\otimes\boldsymbol{P}_{2\times 2}$.
If C is a categorical non-randomized response variable with $p$ categories, the transition matrix is obtained as $\boldsymbol{P}_{2\times 2}\otimes\boldsymbol{P}_{2\times 2}\otimes\boldsymbol{I}_{p\times p}$, where $\boldsymbol{I}_{p\times p}$ is the $p\times p$ identity matrix (van den Hout and van der Heijden, 2002).
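The construction of these transition matrices by repeated Kronecker products can be sketched in plain Python. Function names are hypothetical, and `n_categories` plays the role of the number of categories $p$ of the non-randomized variable, renamed to avoid a clash with the randomization probability $p$.

```python
from functools import reduce


def kron(a, b):
    """Kronecker product; for B of size m x n, (A ⊗ B)[i*m + k][j*n + l] = A[i][j] * B[k][l]."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def identity(k):
    """k x k identity matrix: a non-randomized variable is observed without noise."""
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]


def transition_matrix(p, n_rr, n_categories=None):
    """P ⊗ ... ⊗ P (n_rr times), optionally followed by ⊗ I for a direct question."""
    q = 1 - p
    factors = [[[p, q], [q, p]]] * n_rr
    if n_categories is not None:
        factors.append(identity(n_categories))
    return reduce(kron, factors)


P4 = transition_matrix(0.8, 2)       # the matrix in (1.3) with p = .8
P12 = transition_matrix(0.8, 2, 3)   # two RR questions and a 3-category direct question
```

Every column of the resulting matrices sums to 1, as required for a matrix of conditional probabilities.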

1.1. The CDM for the sub-samples design

The sub-samples design of Clark and Desharnais (1998) splits the sample into two sub-samples $s\in\{1,2\}$ with different randomization probabilities $p_s$ and $q_s$. The CDM for this design is formulated in terms of the conditional randomized response probabilities $\pi^*_{r|s}$, denoting the probability of observing randomized response $r$ given membership of sub-sample $s$, with $\sum_r\pi^*_{r|s}=1$.
The advantage of this formulation is that the sub-sample sizes $n_s$ drop out of the equations, which simplifies notation and interpretation. The model assuming instruction adherence is

$$\begin{pmatrix}\pi^*_{n|1}\\ \pi^*_{y|1}\\ \pi^*_{n|2}\\ \pi^*_{y|2}\end{pmatrix} = \begin{pmatrix} p_1 & q_1\\ q_1 & p_1\\ q_2 & p_2\\ p_2 & q_2\end{pmatrix}\begin{pmatrix}\pi_n\\ \pi_y\end{pmatrix}. \tag{1.4}$$

A common choice for $p_s$ is to set $p_1=p_2$, resulting in complementary randomization probabilities $p=p_{n|n}=p_{y|y}$ for sub-sample 1 and $p=p_{n|y}=p_{y|n}$ for sub-sample 2. In the remainder of this paper we assume that the randomization probabilities are complementary and drop the subscript s from $p_s$ and $q_s$.

This model has one degree of freedom, because there are two non-redundant randomized response probabilities (one per sub-sample) to estimate one non-redundant true response probability. To illustrate the model, consider a design with the two statements “I used doping” and “I never used doping,” in which respondents in sub-sample 1 answer the first statement with probability .8 and the second with probability .2, while respondents in sub-sample 2 answer the first statement with probability .2 and the second with probability .8. With a true prevalence of doping use $\pi_y=.2$ and randomization probability $p=1-q=.8$, the model is given by

$$\begin{pmatrix}.68\\ .32\\ .32\\ .68\end{pmatrix} = \begin{pmatrix}.8 & .2\\ .2 & .8\\ .2 & .8\\ .8 & .2\end{pmatrix}\begin{pmatrix}.8\\ .2\end{pmatrix}.$$

This example shows that under instruction-adherence $\pi^*_{n|1}+\pi^*_{n|2}=1$ and $\pi^*_{y|1}+\pi^*_{y|2}=1$. This holds for any prevalence, because $\pi^*_{n|1}+\pi^*_{n|2}=(p+q)\pi_n+(q+p)\pi_y=\pi_n+\pi_y=1$.
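The numbers in this example can be reproduced with a short NumPy sketch of model (1.4) under complementary randomization probabilities. The function name `subsample_matrix` is illustrative, not taken from the authors.

```python
import numpy as np


def subsample_matrix(p):
    """Transition matrix of model (1.4) with complementary probabilities.
    Rows: (n|1, y|1, n|2, y|2); columns: true responses (n, y)."""
    q = 1.0 - p
    return np.array([[p, q],
                     [q, p],
                     [q, p],
                     [p, q]])


pi = np.array([0.8, 0.2])                  # (pi_n, pi_y)
pi_star = subsample_matrix(0.8) @ pi
print(pi_star.round(2))                    # [0.68 0.32 0.32 0.68]
# Under instruction-adherence the "no" probabilities of both sub-samples sum to 1
print(np.isclose(pi_star[0] + pi_star[2], 1.0))   # True
```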

The CDM postulates the presence of “cheaters,” i.e., respondents who answer “no” irrespective of the outcome of the randomizer and for whom the true response is unknown. Consequently, the CDM distinguishes between three true responses $t\in\{n,y,c\}$, with n denoting the instruction-adherent non-carriers, y the instruction-adherent carriers, and c the cheaters with unknown true response. By replacing $\boldsymbol{\pi}=(\pi_n,\pi_y)'$ of model (1.4) by $\boldsymbol{\tau}=(\tau_n,\tau_y,\tau_c)'$, with $\tau_y+\tau_n+\tau_c=1$, the CDM is given by

$$\begin{pmatrix}\pi^*_{n|1}\\ \pi^*_{y|1}\\ \pi^*_{n|2}\\ \pi^*_{y|2}\end{pmatrix} = \begin{pmatrix} p & q & 1\\ q & p & 0\\ q & p & 1\\ p & q & 0\end{pmatrix}\begin{pmatrix}\tau_n\\ \tau_y\\ \tau_c\end{pmatrix}, \tag{1.5}$$

where the third column of the transition matrix corresponds to the conditional randomization probabilities for the cheaters, for whom $p_{n|c}=1$ and $p_{y|c}=0$ in both sub-samples.

As an example, consider the randomized response probabilities $\boldsymbol{\pi}^*=(.7,.3,.4,.6)'$, which indicate the presence of cheaters because $\pi^*_{n|1}+\pi^*_{n|2}=.7+.4>1$. With $p=.8$, the corresponding true response probabilities are $\boldsymbol{\tau}=(.7,.2,.1)'$, so that the prevalence of doping has a lower bound of $\tau_y=.2$ (the instruction-adherent carriers) and an upper bound of $\tau_y+\tau_c=.2+.1=.3$ (the instruction-adherent carriers and the cheaters).
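Within each sub-sample the “yes” probability is one minus the “no” probability, so the CDM can be solved from $\pi^*_{n|1}$, $\pi^*_{n|2}$ and the constraint $\tau_n+\tau_y+\tau_c=1$. Adding the two “no” rows of (1.5) gives $\pi^*_{n|1}+\pi^*_{n|2}=1+\tau_c$, which is why a sum above 1 signals cheating. The NumPy sketch below (function names illustrative) recovers $\boldsymbol{\tau}$ for the example; with sample data the solution may fall outside $[0,1]$, in which case a constrained maximum likelihood fit would be needed instead.

```python
import numpy as np


def cdm_matrix(p):
    """Transition matrix of the CDM (1.5); columns: true responses (n, y, c)."""
    q = 1.0 - p
    return np.array([[p, q, 1.0],
                     [q, p, 0.0],
                     [q, p, 1.0],
                     [p, q, 0.0]])


def solve_cdm(pi_star, p):
    """Recover tau = (tau_n, tau_y, tau_c) from the two 'no' rows
    and the sum-to-one constraint (exact, noise-free inversion)."""
    P = cdm_matrix(p)
    A = np.vstack([P[0], P[2], np.ones(3)])   # rows n|1, n|2, sum-to-one
    b = np.array([pi_star[0], pi_star[2], 1.0])
    return np.linalg.solve(A, b)


tau = solve_cdm(np.array([0.7, 0.3, 0.4, 0.6]), p=0.8)
print(tau.round(3))                 # [0.7 0.2 0.1]
lower, upper = tau[1], tau[1] + tau[2]   # bounds on the prevalence of doping
```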

1.2. The SP-no model for the multiple questions design

To formulate a general SP-no model for a design with two sensitive questions inquiring about two different sensitive attributes, we extend model (1.3) with the parameters $\theta_{t_{AB}}$, denoting the probability of SP-no saying by respondents with true response profile $t_{AB}$. The model is

$$\begin{pmatrix}\pi^*_{nn}\\ \pi^*_{ny}\\ \pi^*_{yn}\\ \pi^*_{yy}\end{pmatrix} = \begin{pmatrix} p^2 & pq & qp & q^2\\ pq & p^2 & q^2 & qp\\ qp & q^2 & p^2 & pq\\ q^2 & qp & pq & p^2\end{pmatrix}\begin{pmatrix}(1-\theta_{nn})\pi_{nn}\\ (1-\theta_{ny})\pi_{ny}\\ (1-\theta_{yn})\pi_{yn}\\ (1-\theta_{yy})\pi_{yy}\end{pmatrix} + \begin{pmatrix}\theta_{nn}\pi_{nn}+\theta_{ny}\pi_{ny}+\theta_{yn}\pi_{yn}+\theta_{yy}\pi_{yy}\\ 0\\ 0\\ 0\end{pmatrix}, \tag{1.6}$$

where $(1-\theta_{t_{AB}})\pi_{t_{AB}}$ denotes the probability that a respondent has true response profile $t_{AB}$ and is instruction-adherent, and the second vector on the right-hand side contains the total probability of observing an SP-no response, which always takes the form nn.
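To make the roles of the two terms in (1.6) concrete, here is a NumPy sketch with profile-specific SP-no probabilities. The 4×4 matrix is built as the Kronecker product of the 2×2 randomization matrix with itself, which reproduces the row and column order (nn, ny, yn, yy) of (1.6). The function name and the input values in the example are hypothetical.

```python
import numpy as np


def sp_no_general(pi, theta, p):
    """Model (1.6). pi and theta are length-4 arrays ordered (nn, ny, yn, yy)."""
    q = 1.0 - p
    M = np.array([[p, q], [q, p]])
    P = np.kron(M, M)                            # 4x4 matrix of (1.6)
    pi, theta = np.asarray(pi), np.asarray(theta)
    adherent = (1.0 - theta) * pi                # respondents who follow the randomizer
    sp_no = np.array([np.sum(theta * pi), 0.0, 0.0, 0.0])  # SP-no sayers answer nn
    return P @ adherent + sp_no


# hypothetical values: SP-no saying is most common among double carriers
pi_star = sp_no_general([0.4, 0.3, 0.2, 0.1], [0.05, 0.1, 0.1, 0.3], p=0.8)
print(np.isclose(pi_star.sum(), 1.0))   # True
```

The function takes eight inputs (four $\pi_{t_{AB}}$, four $\theta_{t_{AB}}$) but produces only three non-redundant probabilities, which is the over-parameterization addressed in the next paragraph.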

With four randomized response probabilities $\pi^*_{r_{AB}}$ and the eight parameters $\theta_{t_{AB}}$ and $\pi_{t_{AB}}$ to be estimated, the model is clearly over-parameterized. The model can be identified by (i) assuming an equal SP-no probability $\theta$ for all true response profiles and (ii) assuming independence of the two sensitive attributes by formulating the log-linear independence model (A, B), given by $\log\pi_{t_{AB}}=\lambda+\lambda^A_{t_A}+\lambda^B_{t_B}$, for the probabilities of the true response profiles. Using dummy coding for the log-linear model with the true responses $t_A,t_B=n$ as reference category, the model becomes

$$\begin{pmatrix}\pi^*_{nn}\\ \pi^*_{ny}\\ \pi^*_{yn}\\ \pi^*_{yy}\end{pmatrix} = (1-\theta)\begin{pmatrix} p^2 & pq & qp & q^2\\ pq & p^2 & q^2 & qp\\ qp & q^2 & p^2 & pq\\ q^2 & qp & pq & p^2\end{pmatrix}\begin{pmatrix}e^{\lambda}\\ e^{\lambda+\lambda^B_y}\\ e^{\lambda+\lambda^A_y}\\ e^{\lambda+\lambda^A_y+\lambda^B_y}\end{pmatrix} + \theta\begin{pmatrix}1\\ 0\\ 0\\ 0\end{pmatrix}. \tag{1.7}$$

This parameterization shows that the SP-no model can be interpreted as a mixture model, with $1-\theta$ the probability of the latent class of instruction-adherent respondents and $\theta$ the probability of the latent class of SP-no sayers. The vector $(1,0,0,0)'$ can be interpreted as the randomization probabilities of the SP-no sayers, which are 1 for the randomized response profile nn and 0 otherwise.
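A minimal NumPy sketch of (1.7), assuming dummy coding as in the text: $\lambda$ is then not a free parameter but the normalizing constant $-\log\{(1+e^{\lambda^A_y})(1+e^{\lambda^B_y})\}$, which makes the four exponentials sum to one. The function names are illustrative. Marginal prevalences of .5 for A and .2 for B correspond to $\lambda^A_y=0$ and $\lambda^B_y=\log(.2/.8)$.

```python
import numpy as np


def independence_probs(lam_a, lam_b):
    """True-profile probabilities (nn, ny, yn, yy) under log-linear
    independence with dummy coding (n is the reference category)."""
    lam = -np.log((1.0 + np.exp(lam_a)) * (1.0 + np.exp(lam_b)))  # normalizer
    return np.exp(lam + np.array([0.0, lam_b, lam_a, lam_a + lam_b]))


def sp_no_mixture(lam_a, lam_b, theta, p):
    """Model (1.7): mixture of adherent respondents and SP-no sayers."""
    q = 1.0 - p
    M = np.array([[p, q], [q, p]])
    e_nn = np.array([1.0, 0.0, 0.0, 0.0])     # SP-no sayers always give nn
    return (1.0 - theta) * np.kron(M, M) @ independence_probs(lam_a, lam_b) + theta * e_nn


pi = independence_probs(0.0, np.log(0.25))
print(pi.round(2))                                            # [0.4 0.1 0.4 0.1]
print(sp_no_mixture(0.0, np.log(0.25), 0.2, 0.8).round(3))    # [0.472 0.128 0.272 0.128]
```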

The validity of this model depends on the strong assumption that the two sensitive attributes are independent. If the sensitive attributes are not independent, the parameter estimates of model (1.7) will be biased. To illustrate, consider the true response probability vector $\boldsymbol{\pi}=(.4,.1,.4,.1)'$, implying independence of the two sensitive attributes. With a prevalence $\theta=.2$ of SP-no sayers and $p=.8$, the model yields the unbiased estimates $\hat{\boldsymbol{\pi}}=(.4,.1,.4,.1)'$ and $\hat{\theta}=.2$. However, for the vector $\boldsymbol{\pi}=(.4,.3,.2,.1)'$, implying dependence of the two sensitive attributes, the estimates $\hat{\boldsymbol{\pi}}=(.46,.27,.17,.10)'$ and $\hat{\theta}=.164$ are biased.
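These figures can be checked with a short calculation. Model (1.7) has three free parameters for three non-redundant probabilities, so it fits exactly whenever the solution lies inside the parameter space. Writing $\boldsymbol{v}=\boldsymbol{P}^{-1}\boldsymbol{\pi}^*$ and $\boldsymbol{w}=\boldsymbol{P}^{-1}(1,0,0,0)'$, with $\boldsymbol{P}$ the 4×4 matrix of (1.7), the model gives $(1-\theta)\boldsymbol{\pi}=\boldsymbol{v}-\theta\boldsymbol{w}$. Imposing $\pi_{nn}\pi_{yy}=\pi_{ny}\pi_{yn}$ then yields an equation in $\theta$ whose quadratic terms cancel, because $\boldsymbol{w}$ is the Kronecker product of a vector with itself. This closed form is our own shortcut for the noise-free example, not the estimation procedure of the authors; with sampled data, or when the solution falls outside the parameter space, numerical maximum likelihood is required.

```python
import numpy as np

E_NN = np.array([1.0, 0.0, 0.0, 0.0])


def p4(p):
    q = 1.0 - p
    M = np.array([[p, q], [q, p]])
    return np.kron(M, M)                 # the 4x4 matrix of (1.6)/(1.7)


def sp_no_probs(pi, theta, p):
    """Randomized response probabilities under (1.6) with a common theta."""
    return (1.0 - theta) * p4(p) @ np.asarray(pi) + theta * E_NN


def fit_independence_sp_no(pi_star, p):
    """Exact-fit solution of model (1.7); equals the ML estimate when it
    lies inside the parameter space."""
    P_inv = np.linalg.inv(p4(p))
    v = P_inv @ pi_star                  # v - theta * w = (1 - theta) * pi
    w = P_inv @ E_NN
    # independence pi_nn * pi_yy = pi_ny * pi_yn; the theta^2 terms cancel
    num = v[0] * v[3] - v[1] * v[2]
    den = v[0] * w[3] + v[3] * w[0] - v[1] * w[2] - v[2] * w[1]
    theta = num / den
    pi_hat = (v - theta * w) / (1.0 - theta)
    return pi_hat, theta


for pi_true in ([0.4, 0.1, 0.4, 0.1], [0.4, 0.3, 0.2, 0.1]):
    pi_hat, theta_hat = fit_independence_sp_no(sp_no_probs(pi_true, 0.2, 0.8), 0.8)
    print(pi_hat.round(2), round(theta_hat, 3))
```

The first pass recovers the true values; the second reproduces the biased estimates of about (.46, .27, .17, .10)' and .164.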

The independence assumption can be relaxed by asking three questions A, B and C and formulating the log-linear model (AB, AC, BC) given by $\log\pi_{t_{ABC}}=\lambda+\lambda^A_{t_A}+\lambda^B_{t_B}+\lambda^C_{t_C}+\lambda^{AB}_{t_{AB}}+\lambda^{AC}_{t_{AC}}+\lambda^{BC}_{t_{BC}}$. This model constrains the three-factor interaction $\lambda^{ABC}_{t_{ABC}}$ to zero to obtain the degree of freedom needed for model identification, but includes all pairwise interactions. With three or more questions, it also becomes possible to specify an IRT model as introduced by Böckenholt and van der Heijden (2007).
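The identification argument can be checked by counting. The sketch below assumes binary attributes with n coded 0 as the reference category and builds the dummy-coded design matrix of the model with all pairwise interactions. With three questions there are eight response profiles and hence seven non-redundant probabilities (assuming $p\neq .5$, so that the randomization matrix is invertible), while the model has six free $\lambda$ terms (the intercept only normalizes) plus $\theta$.

```python
import itertools

import numpy as np

# Dummy-coded design matrix for three binary attributes (0 = n, 1 = y)
profiles = list(itertools.product([0, 1], repeat=3))    # nnn, nny, ..., yyy
rows = []
for a, b, c in profiles:
    # intercept, main effects A, B, C, pairwise terms AB, AC, BC (no ABC term)
    rows.append([1, a, b, c, a * b, a * c, b * c])
X = np.array(rows)

n_cells = X.shape[0]              # 8 true response profiles
n_loglinear = X.shape[1] - 1      # 6 free lambdas; the intercept normalizes
n_params = n_loglinear + 1        # plus the SP-no parameter theta
print(n_cells - 1, n_params)      # 7 7
print(np.linalg.matrix_rank(X))   # 7
```

Adding the three-factor column $abc$ would give eight parameters for seven non-redundant probabilities, so dropping it is exactly what frees one degree of freedom for $\theta$.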

At first sight the CDM and SP-no models may appear incompatible, but in the next two sections we show that they are two sides of the same coin by expressing the parameters of one model in terms of those of the other.

1.3. The SP-no model for the sub-samples design

To apply the SP-no model to the sub-samples design, we use the general formulation (1.6) with separate SP-no probabilities $\theta_y$ for the carriers and $\theta_n$ for the non-carriers. This yields the model

$$\begin{pmatrix}\pi^*_{n|1}\\ \pi^*_{y|1}\\ \pi^*_{n|2}\\ \pi^*_{y|2}\end{pmatrix} = \begin{pmatrix} p & q\\ q & p\\ q & p\\ p & q\end{pmatrix}\begin{pmatrix}(1-\theta_n)\pi_n\\ (1-\theta_y)\pi_y\end{pmatrix} + \begin{pmatrix}\theta_n\pi_n+\theta_y\pi_y\\ 0\\ \theta_n\pi_n+\theta_y\pi_y\\ 0\end{pmatrix}. \tag{1.8}$$

We have now formulated a model for the sub-samples design for which $\pi_n+\pi_y=1$, and the prevalence of the cheaters $\tau_c$ is replaced by the SP-no sayer parameters $\theta_y$ and $\theta_n$. This allows us to make different assumptions with respect to the true responses of the SP-no sayers and enables us to investigate the correspondence between the parameters of the CDM (1.5) and the SP-no model (1.8).
Table 1 summarizes this correspondence for $\theta_n=\theta_y$, $\theta_n=0$ and $\theta_y=0$. For the derivations of these equality relations we refer to Appendix A on OSF (https://osf.io/autr5/?view_only=2af4b338b9be45cd8657f926438c5f93).

Table 1 Correspondence between the parameters of the CDM and SP-no model.

The table shows that for $\theta_n=\theta_y$ the prevalence of cheaters $\tau_c$ is equal to the prevalence of SP-no sayers $\theta$, and that the prevalence of carriers $\pi_y$ is equal to the conditional probability of being an adherent carrier given being a non-cheater, $\tau_y/(1-\tau_c)$.
For $\theta_n=0$ and $\theta_y=0$, the prevalence of carriers $\pi_y$ corresponds, respectively, to the upper bound $\tau_y+\tau_c$ and the lower bound $\tau_y$ of the prevalence of carriers under the CDM. The remaining SP-no parameter corresponds, respectively, to the conditional probability of cheating given being a cheater or an adherent carrier, $\theta_y=\tau_c/(\tau_c+\tau_y)$, and to the conditional probability of cheating given being a cheater or an adherent non-carrier, $\theta_n=\tau_c/(\tau_c+\tau_n)$.
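As a numerical check of these correspondences, the NumPy sketch below plugs the CDM example $\boldsymbol{\tau}=(.7,.2,.1)'$ with $p=.8$ into model (1.8) under each of the three restrictions. The mappings are our reading of the relations described above (the formal derivations are in Appendix A on OSF), and the function name is illustrative. All three parameter sets reproduce the same randomized response probabilities $(.7,.3,.4,.6)'$, illustrating that the data cannot distinguish between the restrictions.

```python
import numpy as np


def sp_no_subsamples(pi_y, theta_n, theta_y, p):
    """Model (1.8): randomized response probabilities (n|1, y|1, n|2, y|2)."""
    q = 1.0 - p
    P = np.array([[p, q], [q, p], [q, p], [p, q]])
    pi_n = 1.0 - pi_y
    adherent = np.array([(1.0 - theta_n) * pi_n, (1.0 - theta_y) * pi_y])
    sp_no = theta_n * pi_n + theta_y * pi_y      # total SP-no probability
    return P @ adherent + np.array([sp_no, 0.0, sp_no, 0.0])


tau_n, tau_y, tau_c = 0.7, 0.2, 0.1              # CDM example with p = .8
cases = {                                        # (pi_y, theta_n, theta_y)
    "theta_n = theta_y": (tau_y / (1 - tau_c), tau_c, tau_c),
    "theta_n = 0": (tau_y + tau_c, 0.0, tau_c / (tau_y + tau_c)),
    "theta_y = 0": (tau_y, tau_c / (tau_n + tau_c), 0.0),
}
for label, (pi_y, th_n, th_y) in cases.items():
    # each line ends with [0.7 0.3 0.4 0.6]
    print(label, sp_no_subsamples(pi_y, th_n, th_y, p=0.8).round(3))
```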

1.4. The CDM for the multiple questions design

To apply the CDM to the multiple questions design, we replace the vector $\boldsymbol{\pi}=(\pi_{nn},\pi_{ny},\pi_{yn},\pi_{yy})'$ of the SP-no model (1.7) by the vector $\boldsymbol{\tau}=(\tau_{nn},\tau_{ny},\tau_{yn},\tau_{yy},\tau_c)'$ and formulate the log-linear independence model $\log\tau_{t_{AB}}=\lambda+\lambda^A_{t_A}+\lambda^B_{t_B}$ for the true response profiles in $\boldsymbol{\tau}$ corresponding to the instruction-adherent respondents. The vector denoting the latent class of SP-no sayers in (1.7) is replaced by a fifth column in the transition matrix with the randomization probabilities of the cheaters. The model is then given by

$$\begin{pmatrix}\pi^*_{nn}\\ \pi^*_{ny}\\ \pi^*_{yn}\\ \pi^*_{yy}\end{pmatrix} = \begin{pmatrix} p^2 & pq & qp & q^2 & 1\\ pq & p^2 & q^2 & qp & 0\\ qp & q^2 & p^2 & pq & 0\\ q^2 & qp & pq & p^2 & 0\end{pmatrix}\begin{pmatrix}(1-\tau_c)e^{\lambda}\\ (1-\tau_c)e^{\lambda+\lambda^B_y}\\ (1-\tau_c)e^{\lambda+\lambda^A_y}\\ (1-\tau_c)e^{\lambda+\lambda^A_y+\lambda^B_y}\\ \tau_c\end{pmatrix}. \tag{1.9}$$

The correspondence between the parameters of the SP-no model (1.7) and the CDM (1.9) can no longer be established by assuming that the SP-no sayers of model (1.7) are either carriers or non-carriers, because there is now a mixture of carriers and non-carriers on the two sensitive attributes. Under the assumption that all four true response profiles have the same probability of SP-no saying, the correspondence is analogous to that described in the first column of Table 1, i.e., $\theta=\tau_c$ and $\pi_{t_{AB}}=\tau_{t_{AB}}/(1-\tau_c)$.
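A short NumPy sketch confirms this correspondence numerically. It uses illustrative function names and the hypothetical values $\boldsymbol{\pi}=(.4,.1,.4,.1)'$ and $\tau_c=.2$, and shows that the 4×5 transition matrix of (1.9) and the mixture form of (1.7) produce identical randomized response probabilities when $\theta=\tau_c$ and $\tau_{t_{AB}}=(1-\tau_c)\pi_{t_{AB}}$.

```python
import numpy as np


def p4(p):
    q = 1.0 - p
    M = np.array([[p, q], [q, p]])
    return np.kron(M, M)                         # 4x4 matrix of (1.7)


def cdm_multiple(tau_profiles, tau_c, p):
    """Model (1.9) with a fifth column for the cheaters."""
    cheater_col = np.array([[1.0], [0.0], [0.0], [0.0]])
    T = np.hstack([p4(p), cheater_col])          # 4x5 transition matrix
    return T @ np.append(tau_profiles, tau_c)


def sp_no_multiple(pi, theta, p):
    """Model (1.7) written as a mixture."""
    return (1.0 - theta) * p4(p) @ pi + theta * np.array([1.0, 0.0, 0.0, 0.0])


pi = np.array([0.4, 0.1, 0.4, 0.1])
tau_c = 0.2
same = np.allclose(cdm_multiple((1.0 - tau_c) * pi, tau_c, 0.8),
                   sp_no_multiple(pi, tau_c, 0.8))
print(same)   # True
```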

2. Models for ever/last year designs

The ever/last year design is characterized by two questions about the same sensitive attribute: one about its presence during the respondent’s lifetime, and one about its presence in the last year. It can be considered a hybrid of the cheater detection and the multiple questions designs: like the sub-samples design it queries a single sensitive attribute, but like the multiple questions design it employs multiple (in this case two) sensitive questions. The design makes it possible to identify last year carriers, former carriers and non-carriers of the sensitive attribute. The advantages of this design over a single-question design with the three responses “never,” “former” and “last year” are that the model for the ever/last year design has a degree of freedom and that the prevalence of the last year carriers is estimated with higher precision (Sayed et al., 2023).

In this section, we consider models for one set of ever/last year questions, for one set of ever/last year questions and a third, dichotomous randomized response question, and for two sets of ever/last year questions. We introduce the SP-no and CDM versions of these models with the parameters $\theta$ and $\tau_c$, and for the SP-no models with more than one degree of freedom, we include an additional parameter that accounts for evasive responses to the last year question by last year carriers.

2.1. A single set of ever/last year questions

The model for one set of ever/last year questions comes with one degree of freedom. This degree of freedom arises because the true response profile ny (never a carrier, yet a carrier during the last year) is impossible. As a consequence, the parameter $\pi_{ny}$ and the second column of the $\mathbf{P}_{4\times 4}$ transition matrix of model (1.3) are redundant. The null model for this design is given by

$$\begin{pmatrix}\pi^*_{nn}\\ \pi^*_{ny}\\ \pi^*_{yn}\\ \pi^*_{yy}\end{pmatrix} = \begin{pmatrix} p^2 & qp & q^2\\ pq & q^2 & qp\\ qp & p^2 & pq\\ q^2 & pq & p^2 \end{pmatrix} \begin{pmatrix}\pi_{nn}\\ \pi_{yn}\\ \pi_{yy}\end{pmatrix}. \tag{2.1}$$

The true response profiles $t\in \{nn, yn, yy\}$ are, respectively, interpreted as the non-carriers (those who never carried the sensitive attribute), the former carriers (those who have once carried the sensitive attribute, but not in the last year), and the last year carriers (those who carried the sensitive attribute in the last year and possibly before).
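As an illustration only, the following Python sketch (standard library only; the function names are hypothetical, and this is not the authors’ R code from Appendix C) builds the $4\times 3$ matrix of model (2.1) from the single-question probabilities $p$ and $q=1-p$, assuming that the two answers are randomized independently, and computes $\boldsymbol{\pi}^*$ for a given prevalence vector:

```python
def null_transition_matrix(p):
    """Transition matrix of the null model (2.1), stored as a list of rows.

    Rows: observed profiles nn, ny, yn, yy (ever answer, last year answer).
    Columns: true profiles nn, yn, yy (the impossible profile ny is dropped).
    p: probability that a randomized answer equals the true answer.
    """
    q = 1.0 - p
    # P(observed answer | true answer) for one question, keyed (observed, true).
    single = {("n", "n"): p, ("y", "n"): q, ("y", "y"): p, ("n", "y"): q}
    observed = ["nn", "ny", "yn", "yy"]
    true = ["nn", "yn", "yy"]
    # Independent randomization of the two questions: probabilities multiply.
    return [[single[(o[0], t[0])] * single[(o[1], t[1])] for t in true] for o in observed]


def randomized_response_probs(P, pi):
    """pi* = P pi: probabilities of the observed randomized response profiles."""
    return [sum(row[j] * pi[j] for j in range(len(pi))) for row in P]


if __name__ == "__main__":
    P = null_transition_matrix(5 / 6)
    # Hypothetical prevalences of never, former and last year carriers.
    probs = randomized_response_probs(P, [0.90, 0.05, 0.05])
    print([round(x, 4) for x in probs])  # [0.6333, 0.1333, 0.1667, 0.0667]
```

Each column of the matrix sums to one, so $\boldsymbol{\pi}^*$ is a proper probability vector whenever $\boldsymbol{\pi}$ is.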

The CDM version of this model is

$$\begin{pmatrix}\pi^*_{nn}\\ \pi^*_{ny}\\ \pi^*_{yn}\\ \pi^*_{yy}\end{pmatrix} = \begin{pmatrix} p^2 & qp & q^2 & 1\\ pq & q^2 & qp & 0\\ qp & p^2 & pq & 0\\ q^2 & pq & p^2 & 0 \end{pmatrix} \begin{pmatrix}\tau_{nn}\\ \tau_{yn}\\ \tau_{yy}\\ \tau_c\end{pmatrix}, \tag{2.2}$$

and the SP-no model is

$$\begin{pmatrix}\pi^*_{nn}\\ \pi^*_{ny}\\ \pi^*_{yn}\\ \pi^*_{yy}\end{pmatrix} = (1-\theta)\begin{pmatrix} p^2 & qp & q^2\\ pq & q^2 & qp\\ qp & p^2 & pq\\ q^2 & pq & p^2 \end{pmatrix} \begin{pmatrix}\pi_{nn}\\ \pi_{yn}\\ \pi_{yy}\end{pmatrix} + \theta\begin{pmatrix}1\\ 0\\ 0\\ 0\end{pmatrix}. \tag{2.3}$$
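A minimal numerical check that models (2.2) and (2.3) produce the same randomized response probabilities once $\tau_c=\theta$ and $\tau_t=(1-\theta)\pi_t$, under the assumption that every true profile has the same SP-no probability. The prevalences in the example are hypothetical and the helper names are illustrative:

```python
def null_transition_matrix(p):
    """4x3 matrix of model (2.1): rows observed nn, ny, yn, yy; columns true nn, yn, yy."""
    q = 1.0 - p
    single = {("n", "n"): p, ("y", "n"): q, ("y", "y"): p, ("n", "y"): q}
    return [[single[(o[0], t[0])] * single[(o[1], t[1])] for t in ("nn", "yn", "yy")]
            for o in ("nn", "ny", "yn", "yy")]


def matvec(P, x):
    return [sum(row[j] * x[j] for j in range(len(x))) for row in P]


def sp_no_probs(p, pi, theta):
    """Model (2.3): (1 - theta) P pi + theta * (1, 0, 0, 0)."""
    base = matvec(null_transition_matrix(p), pi)
    return [(1 - theta) * b + (theta if i == 0 else 0.0) for i, b in enumerate(base)]


def cdm_probs(p, tau, tau_c):
    """Model (2.2): the extra column (1, 0, 0, 0) sends all cheaters to nn."""
    base = matvec(null_transition_matrix(p), tau)
    return [b + (tau_c if i == 0 else 0.0) for i, b in enumerate(base)]


def sp_no_to_cdm(pi, theta):
    """tau_c = theta and tau_t = (1 - theta) * pi_t."""
    return [(1 - theta) * x for x in pi], theta


def cdm_to_sp_no(tau, tau_c):
    """theta = tau_c and pi_t = tau_t / (1 - tau_c)."""
    return [x / (1 - tau_c) for x in tau], tau_c


if __name__ == "__main__":
    pi, theta = [0.80, 0.12, 0.08], 0.06  # hypothetical values
    tau, tau_c = sp_no_to_cdm(pi, theta)
    print(sp_no_probs(5 / 6, pi, theta))
    print(cdm_probs(5 / 6, tau, tau_c))  # equal to the line above up to rounding
```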

It has been suggested that the ever/last year design may reduce the respondents’ trust in the privacy protection, because two questions on the same sensitive attribute have to be answered. This may especially be the case for last year carriers who have to answer “yes” to the last year question after having already answered “yes” to the ever question. To account for this kind of response bias, we formulate the SP(last year) model. Aside from the general SP-no parameter $\theta$, this model includes the parameter $\theta_{yy\rightarrow yn}$, the probability that a last year carrier answers yn to the ever and last year questions when yy was required. This model can be formulated by bringing $\theta$ and $\theta_{yy\rightarrow yn}$ inside the transition matrix of model (2.3). Given that the yn and yy response profiles are represented by the third and fourth rows of the transition matrix and the last year users by its third column, the transition matrix is given by

$$\mathbf{P}_{4\times 3} = \begin{pmatrix} (1-\theta)p^2+\theta & (1-\theta)qp+\theta & (1-\theta)q^2+\theta\\ (1-\theta)pq & (1-\theta)q^2 & (1-\theta)qp\\ (1-\theta)qp & (1-\theta)p^2 & (1-\theta)pq+\theta_{yy\rightarrow yn}\,p^2\\ (1-\theta)q^2 & (1-\theta)pq & (1-\theta-\theta_{yy\rightarrow yn})\,p^2 \end{pmatrix}. \tag{2.4}$$
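A sketch of matrix (2.4), obtained by rescaling the null matrix by $1-\theta$, adding $\theta$ to the nn row, and moving $\theta_{yy\rightarrow yn}p^2$ from the yy row to the yn row of the last year column. The function name is hypothetical, and all entries are nonnegative only if $\theta+\theta_{yy\rightarrow yn}\le 1$:

```python
def sp_last_year_matrix(p, theta, theta_ly):
    """Transition matrix (2.4) of the SP(last year) model, as a list of rows.

    Rows: observed nn, ny, yn, yy; columns: true nn, yn, yy.
    theta:    general SP-no probability (SP-no sayers answer nn).
    theta_ly: theta_{yy -> yn}; entries stay nonnegative if theta + theta_ly <= 1.
    """
    q = 1.0 - p
    single = {("n", "n"): p, ("y", "n"): q, ("y", "y"): p, ("n", "y"): q}
    null = [[single[(o[0], t[0])] * single[(o[1], t[1])] for t in ("nn", "yn", "yy")]
            for o in ("nn", "ny", "yn", "yy")]
    # (1 - theta) times the null matrix, plus theta in the first (nn) row.
    P = [[(1 - theta) * x + (theta if i == 0 else 0.0) for x in row]
         for i, row in enumerate(null)]
    # Last year carriers (third column): shift theta_ly * p^2 from row yy to row yn.
    P[2][2] += theta_ly * p ** 2
    P[3][2] -= theta_ly * p ** 2
    return P


if __name__ == "__main__":
    for row in sp_last_year_matrix(5 / 6, 0.05, 0.10):
        print([round(x, 4) for x in row])
```

With $\theta_{yy\rightarrow yn}=0$ the matrix reduces to the SP-no model (2.3) written as a single transition matrix, because $\boldsymbol{\pi}$ sums to one.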

For a single set of ever/last year questions this model is not identified, because it has two response bias parameters but only one degree of freedom; it can be identified by the inclusion of more questions.

2.2. Model extensions

Model (2.1) is easily extended to more questions. The model for one set of ever/last year questions and a third, dichotomous question is obtained by extending the true and randomized response profiles with the answers to the third question, and constructing the transition matrix $\mathbf{P}_{8\times 6}=\mathbf{P}_{4\times 3}\otimes \mathbf{P}_{2\times 2}$, where $\mathbf{P}_{4\times 3}$ is the transition matrix of model (2.1), $\mathbf{P}_{2\times 2}$ is that of the third question, and $\otimes$ denotes the Kronecker product. This model has two degrees of freedom (seven free observed cell probabilities minus five free prevalence parameters). The transition matrix of the model for two sets of ever/last year questions is $\mathbf{P}_{16\times 9}=\mathbf{P}_{4\times 3}\otimes \mathbf{P}_{4\times 3}$, so that this model has $15-8=7$ degrees of freedom.
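The Kronecker constructions and the degrees-of-freedom counts can be sketched as follows (hypothetical helper names, standard-library Python). Degrees of freedom are counted as free observed cell probabilities minus free prevalences minus response bias parameters; a nonnegative count is necessary but not sufficient for identification. With two bias parameters, the count for a single set is $-1$, consistent with the non-identification of the SP(last year) model for one set of questions:

```python
def kron(A, B):
    """Kronecker product A (x) B of matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def null_4x3(p):
    """Matrix of model (2.1): rows observed nn, ny, yn, yy; columns true nn, yn, yy."""
    q = 1.0 - p
    single = {("n", "n"): p, ("y", "n"): q, ("y", "y"): p, ("n", "y"): q}
    return [[single[(o[0], t[0])] * single[(o[1], t[1])] for t in ("nn", "yn", "yy")]
            for o in ("nn", "ny", "yn", "yy")]


def dichotomous_2x2(p):
    q = 1.0 - p
    return [[p, q], [q, p]]  # rows: observed n, y; columns: true n, y


def degrees_of_freedom(P, n_bias_params=0):
    """(observed cells - 1) - (true profiles - 1) - response bias parameters."""
    return (len(P) - 1) - (len(P[0]) - 1) - n_bias_params


if __name__ == "__main__":
    p = 5 / 6
    P43 = null_4x3(p)
    P86 = kron(P43, dichotomous_2x2(p))
    P169 = kron(P43, P43)
    for name, P in [("4x3", P43), ("8x6", P86), ("16x9", P169)]:
        print(name, "df =", degrees_of_freedom(P))  # df = 1, 2 and 7, respectively
```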

The derivation of the SP-no and CDM versions of these models is straightforward. For the SP(last year) model, we replace the parameter $\theta_{yy\rightarrow yn}$ in the $4\times 3$ transition matrix (2.4) by $\theta_{yy\cdot\rightarrow yn\cdot}$ in the $8\times 6$ transition matrix for the design with a third question, with the dot representing the response to the third question. Analogously, we define the parameters $\theta_{yy\cdot\cdot\rightarrow yn\cdot\cdot}=\theta_{\cdot\cdot yy\rightarrow \cdot\cdot yn}$ for the design with two sets of ever/last year questions.
The R code for constructing these transition matrices is given in Appendix C (https://osf.io/autr5/?view_only=2af4b338b9be45cd8657f926438c5f93).

3. Estimation

For the examples presented in this paper, the maximum likelihood estimates (MLEs) of the model parameters are obtained by maximization of the kernel of the log-likelihood

$$\ln \ell(\boldsymbol{\Phi}\mid \mathbf{n})=\mathbf{n}'\ln \boldsymbol{\pi}^*, \tag{3.1}$$

where $\boldsymbol{\Phi}$ is the vector with the model parameters $\boldsymbol{\pi}$, $\boldsymbol{\tau}$ and/or $\boldsymbol{\theta}$, and $\mathbf{n}$ the vector with the frequencies of the observed randomized response profiles. Unconstrained maximization of the log-likelihood may result in negative prevalence estimates of the sensitive attribute(s) (van den Hout and van der Heijden, 2004).
To ensure that these parameter estimates are inside the parameter space (0, 1), the parameters $\pi_j$ and $\tau_j$ are estimated via the softmax function $\exp(\beta_j)/\sum_k\exp(\beta_k)$. The sampling variances of $\hat{\pi}_j$ and $\hat{\tau}_j$ are obtained with the delta method (Hoef, 2012). Examples of the use of the delta method can be found in Appendix C (e.g., Section 2.5).
The $\theta$ parameter is estimated directly and is therefore allowed to take a negative value.
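The softmax parametrization, the log-likelihood kernel (3.1) and the delta method can be sketched as below. This is an illustrative Python version, not the authors’ R code; it assumes that one $\beta$ (here $\beta_1=0$) is fixed as a reference category, because the softmax is unchanged when a constant is added to all $\beta_j$, and it takes the estimated covariance matrix of $\hat{\boldsymbol\beta}$ (for example, the inverse observed information returned by an optimizer) as given. The Jacobian of the softmax is $\partial\pi_j/\partial\beta_k=\pi_j(\delta_{jk}-\pi_k)$:

```python
import math


def softmax(beta):
    """exp(beta_j) / sum_k exp(beta_k); subtracting max(beta) avoids overflow."""
    m = max(beta)
    e = [math.exp(b - m) for b in beta]
    s = sum(e)
    return [x / s for x in e]


def log_likelihood_kernel(counts, pi_star):
    """n' ln pi*, as in (3.1); cells with zero counts contribute nothing."""
    return sum(n * math.log(pr) for n, pr in zip(counts, pi_star) if n > 0)


def softmax_jacobian(beta):
    """J[j][k] = d pi_j / d beta_k = pi_j * (1{j == k} - pi_k)."""
    pi = softmax(beta)
    K = len(pi)
    return [[pi[j] * ((1.0 if j == k else 0.0) - pi[k]) for k in range(K)] for j in range(K)]


def delta_method_cov(beta, cov_beta):
    """Delta-method approximation Cov(pi_hat) ~ J Cov(beta_hat) J^T."""
    J = softmax_jacobian(beta)
    K = len(J)
    JC = [[sum(J[i][a] * cov_beta[a][b] for a in range(K)) for b in range(K)] for i in range(K)]
    return [[sum(JC[i][b] * J[j][b] for b in range(K)) for j in range(K)] for i in range(K)]


if __name__ == "__main__":
    # Hypothetical estimates; beta_1 = 0 is the fixed reference category.
    beta_hat = [0.0, -2.8, -2.9]
    cov_beta = [[0.0, 0.0, 0.0], [0.0, 0.04, 0.01], [0.0, 0.01, 0.05]]
    cov_pi = delta_method_cov(beta_hat, cov_beta)
    print(softmax(beta_hat), [math.sqrt(cov_pi[j][j]) for j in range(3)])
```

Because the $\hat\pi_j$ sum to one, the entries of the delta-method covariance matrix sum to zero, which is a convenient sanity check.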

The Akaike information criterion (AIC), computed as twice the number of model parameters minus twice the log-likelihood, is used as model selection criterion. When comparing models with and without response bias parameters, the model with the lowest AIC is considered to be the best model.

The goodness of fit of the models estimated by maximization of the log-likelihood (3.1) can be evaluated with the asymptotically chi-squared distributed $G^2$ statistic

$$G^2_{(df)}=2\,\mathbf{n}'\ln(\mathbf{n}/\hat{\mathbf{n}}) \tag{3.2}$$

where $\hat{\mathbf{n}}$ is the vector with the fitted randomized response frequencies and df the degrees of freedom of the model.
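A sketch of the fit and selection statistics (hypothetical helper names, standard-library Python). The $G^2$ p-value uses the exact upper tail of a chi-squared distribution with integer df, via the standard recursion $Q(x;k+2)=Q(x;k)+(x/2)^{k/2}e^{-x/2}/\Gamma(k/2+1)$ starting from $Q(x;1)=\mathrm{erfc}(\sqrt{x/2})$ and $Q(x;2)=e^{-x/2}$; cells with zero observed counts contribute nothing to $G^2$:

```python
import math


def g_squared(observed, fitted):
    """G^2 = 2 * sum_r n_r ln(n_r / n_hat_r); cells with n_r = 0 contribute 0."""
    return 2.0 * sum(n * math.log(n / m) for n, m in zip(observed, fitted) if n > 0)


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(x / 2.0)), 1
    else:
        q, k = math.exp(-x / 2.0), 2
    while k < df:
        # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) exp(-x/2) / Gamma(k/2 + 1)
        q += (x / 2.0) ** (k / 2.0) * math.exp(-x / 2.0) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def aic(log_likelihood, n_params):
    """AIC = 2 * (number of parameters) - 2 * log-likelihood."""
    return 2.0 * n_params - 2.0 * log_likelihood


if __name__ == "__main__":
    # Sanity check: 3.841 and 5.991 are the .95 quantiles for 1 and 2 df.
    print(round(chi2_sf(3.841458820694124, 1), 4), round(chi2_sf(2 * math.log(20), 2), 4))  # 0.05 0.05
```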

4. Power study

This section presents a power study to detect $\theta/\tau_c\in \{0,.05,.1,.15,.2\}$ for the ever/last year designs with one set of ever/last year questions, one set of ever/last year questions and a third (dichotomous) question, and two sets of ever/last year questions.

Figure 1 Power curves for detecting $\theta/\tau_c$.

For these three designs, $\pi_{never}$ is defined as the prevalence of respondents with the true response profiles nn, nnn or nnnn, respectively, with the prevalence of each remaining true response profile set to $(1-\pi_{never})/k$, where k denotes the number of remaining true response profiles. The probability that the randomized response coincides with the true response is set to $p=5/6$ for each question in the design. In Figure 1, the sample sizes n are displayed on a logarithmic scale.

The plots show that the power to detect cheating/SP-no saying increases with the number of questions and, to a lesser extent, with smaller values for $\pi_{never}$. For example, attaining a power of 80% to detect a cheating/SP-no prevalence of 5% given $\pi_{never}=.9$ requires a sample size of around 10,000 for the design with one set of ever/last year questions, while for the design with two sets of ever/last year questions the required sample size is around 4,000. For $\pi_{never}\in \{0.7, 0.8\}$, the required sample sizes are slightly smaller. The power increases rapidly as $\theta/\tau_c$ increases. For $\theta/\tau_c=0.1$ and $\pi_{never}=.9$, the required sample sizes are around 2,500 for one set of ever/last year questions and around 800 for two sets. Under the most favorable condition, $\pi_{never}=0.7$ and $\theta/\tau_c=0.2$, a sample size of around 200 suffices.
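The procedure behind Figure 1 is not spelled out in this section. As one possible approach, the sketch below approximates the power for the design with one set of ever/last year questions under explicit assumptions: a two-sided likelihood ratio test of $\theta=0$ at $\alpha=.05$ with one degree of freedom (the SP-no model is saturated for this design); a noncentral chi-squared approximation whose noncentrality is $2n$ times the Kullback-Leibler divergence between the SP-no probabilities and the best-fitting null model; and an EM algorithm to find that best fit (a swapped-in technique, not the softmax maximization described in Section 3). Function names are hypothetical, and the numbers need not reproduce Figure 1:

```python
import math


def null_matrix(p):
    """4x3 matrix of model (2.1): rows observed nn, ny, yn, yy; columns true nn, yn, yy."""
    q = 1.0 - p
    single = {("n", "n"): p, ("y", "n"): q, ("y", "y"): p, ("n", "y"): q}
    return [[single[(o[0], t[0])] * single[(o[1], t[1])] for t in ("nn", "yn", "yy")]
            for o in ("nn", "ny", "yn", "yy")]


def matvec(P, x):
    return [sum(row[j] * x[j] for j in range(len(x))) for row in P]


def fit_null_em(P, w, iters=5000):
    """EM fit of pi in pi* = P pi to cell proportions w (maximizes sum_r w_r ln pi*_r)."""
    k = len(P[0])
    pi = [1.0 / k] * k
    for _ in range(iters):
        ps = matvec(P, pi)
        pi = [pi[t] * sum(w[r] * P[r][t] / ps[r] for r in range(len(w))) for t in range(k)]
    return pi


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def power_single_set(n, theta, pi_never, p=5 / 6, z_crit=1.959963984540054):
    """Approximate power of the 1-df likelihood ratio test of theta = 0 at alpha = .05."""
    P = null_matrix(p)
    rest = (1.0 - pi_never) / 2.0  # former and last year carriers share the remainder
    pi = [pi_never, rest, rest]
    alt = [(1 - theta) * x + (theta if i == 0 else 0.0) for i, x in enumerate(matvec(P, pi))]
    fitted = matvec(P, fit_null_em(P, alt))  # best-fitting null model (minimum KL divergence)
    kl = sum(a * math.log(a / f) for a, f in zip(alt, fitted) if a > 0)
    root = math.sqrt(max(2.0 * n * kl, 0.0))  # square root of the noncentrality parameter
    # A noncentral chi-square with 1 df is (Z + root)^2 with Z standard normal.
    return normal_cdf(root - z_crit) + normal_cdf(-root - z_crit)


if __name__ == "__main__":
    for n in (1000, 2500, 10000):
        print(n, round(power_single_set(n, 0.05, 0.9), 3), round(power_single_set(n, 0.10, 0.9), 3))
```

Passing `p=2/3` instead of the default 5/6 gives one way to explore how the randomization probability affects power in this approximation.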

We also investigated the power to detect $\pi_{former}$ and $\pi_{last~year}$ in the design with one set of ever/last year questions, as well as the effect of decreasing the probability p that the randomized response coincides with the true response. The results show that $\pi_{last~year}$ is estimated more efficiently than when it is estimated on the basis of a single question, and that the power to detect $\theta/\tau_c$ increases when using $p=2/3$ instead of $p=5/6$. For the power curves of these studies, we refer to Appendix B (https://osf.io/autr5/?view_only=2af4b338b9be45cd8657f926438c5f93).

5. Examples

In this section, we present analyses of two online surveys which, aside from demographic and sport-related questions, included randomized response questions on the use of doping. Study I was conducted by the HAN University of Applied Sciences and Utrecht University (Hilkens et al., 2021) among 2,269 male gym users. In this study the researchers were interested in the prevalence of current and former use of anabolics, and in the lifetime use of SARMs (because SARMs use is a relatively new phenomenon, the researchers found the distinction between former and current use less relevant). The respondents were asked the ever/last year questions “Have you ever/in the last 12 months used anabolic steroids (e.g., Testosterone, Deca, Winstrol, Dianabol, Anavar)?” and the question “Have you ever used SARMs (Selective Androgen Receptor Modulators)?” To answer these questions, respondents were shown a circle and a square symbol with the answers “yes” and “no” rapidly changing position. When respondents clicked a Stop button, the symbols stopped changing position, and the respondents were asked to answer “Circle” or “Square” depending on the symbol that contained their true answer. The probability that “yes” ended up in the circle was fixed at 5/6, so that $p_{y|y} = p_{n|n} = 5/6$.
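The circle/square procedure can be simulated to see why it yields $p_{y|y}=p_{n|n}=5/6$, assuming that “Circle” is scored as an observed “yes” and that the final position of “yes” is independent of the respondent’s true answer. The function names are hypothetical and the simulation is only a sketch of the mechanism, not the survey software:

```python
import random


def device_answer(true_answer, p_yes_in_circle=5 / 6, rng=random):
    """Return 'Circle' or 'Square' for one respondent.

    After Stop, 'yes' sits in the circle with probability p_yes_in_circle
    (otherwise in the square); the respondent reports the symbol that
    contains their true answer.
    """
    yes_in_circle = rng.random() < p_yes_in_circle
    if true_answer == "yes":
        return "Circle" if yes_in_circle else "Square"
    return "Square" if yes_in_circle else "Circle"


def observed_yes_rate(true_answer, n=100_000, seed=1):
    """Share of simulated respondents whose answer is scored 'yes' (i.e., 'Circle')."""
    rng = random.Random(seed)
    return sum(device_answer(true_answer, rng=rng) == "Circle" for _ in range(n)) / n


if __name__ == "__main__":
    print(observed_yes_rate("yes"), observed_yes_rate("no"))  # close to 5/6 and 1/6
```

A respondent whose true answer is “no” reports “Circle” only when “no” lands in the circle, which happens with probability 1/6, so $p_{n|n}=5/6$ as well.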

Study II (De Hon et al., 2015) was conducted in 2014 by the Doping Authority Netherlands, which was especially interested in distinguishing between current and former use of doping substances. The sample included 1,050 Dutch elite athletes, who were asked ever/last year questions about both the use of anabolics and blood manipulations (e.g., EPO). The study employed two different RR techniques, with 535 athletes answering according to the forced response design (Boruch, 1971) and 515 following a procedure similar to that of Study I but with answer categories “A”/“B” instead of “Circle”/“Square.” As in Study I, both RR techniques used $p_{y|y}=p_{n|n}=5/6$. The data of both studies are shown in Table 2. The R code of the analyses is given in Appendix C.

Table 2 Observed response frequencies of Studies I and II.

5.1. Ever/last year use of anabolics

Table 3 shows the prevalence estimates for the ever/last year questions about anabolic steroid use of Study I. The response frequencies can be obtained from Table 2 by collapsing over the third index. The estimates are obtained with the models presented in Section 2.1. All three models fit the data adequately, but the AIC prefers the null model over the response bias models, so that these data do not provide significant evidence for SP-no saying/cheating. The null model estimates a prevalence of 4.3% for former users of anabolics and 4.7% for last year users.
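Collapsing over the third index amounts to summing the frequencies of profiles that share the first two answers. Because the answers are randomized independently (the Kronecker structure $\mathbf{P}_{8\times 6}=\mathbf{P}_{4\times 3}\otimes\mathbf{P}_{2\times 2}$) and each column of $\mathbf{P}_{2\times 2}$ sums to one, collapsing the null-model probabilities gives model (2.1) for the marginal ever/last year prevalences. A small sketch checking this, with hypothetical prevalences and helper names:

```python
import math
from itertools import product


def collapse_third(freqs):
    """Sum frequencies (or probabilities) over the answer to the third question.

    freqs maps three-letter profiles such as 'yny' (anabolics ever,
    anabolics last year, SARMs ever) to values; the result is keyed by
    the two-letter ever/last year profile.
    """
    out = {}
    for profile, value in freqs.items():
        out[profile[:2]] = out.get(profile[:2], 0) + value
    return out


def p_single(observed, true, p):
    return p if observed == true else 1.0 - p


def null_probs(pi, p):
    """Null-model probabilities of all observed profiles, given true prevalences pi (a dict)."""
    length = len(next(iter(pi)))
    return {"".join(o): sum(pr * math.prod(p_single(o[i], t[i], p) for i in range(length))
                            for t, pr in pi.items())
            for o in product("ny", repeat=length)}


if __name__ == "__main__":
    # Hypothetical true prevalences over (anabolics ever, last year, SARMs ever).
    pi3 = {"nnn": 0.85, "nny": 0.03, "ynn": 0.03, "yny": 0.02, "yyn": 0.04, "yyy": 0.03}
    lhs = collapse_third(null_probs(pi3, 5 / 6))
    rhs = null_probs(collapse_third(pi3), 5 / 6)
    print(all(abs(lhs[k] - rhs[k]) < 1e-12 for k in rhs))  # True
```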

Table 3 Prevalence estimates of anabolics (A) of Study I.

5.2. Ever/last year use of anabolics and ever use of SARMs

Table 4 presents the prevalence estimates of the models presented in Section 2.2 for the ever/last year questions about anabolics and the ever question about SARMs of Study I. For these data, the SP(last year) model has also been included in the analysis. The null, SP-no and CDM models exhibit an adequate fit. The AIC prefers the SP-no and CDM models, which yield a prevalence estimate of 6.4% for cheating/SP-no saying. Although the SP(last year) model yields significant estimates for both $\theta$ parameters, it is not preferred on the basis of the AIC.

Table 4 Parameter estimates of anabolics (A) and SARMs (S) of Study I.

5.3. Ever/last year use of anabolics (A) and blood manipulations (B)

Table 5 presents the parameter estimates of the models for the two sets of ever/last year questions of Study II. All four models exhibit an adequate fit, but the AIC prefers the SP-no and CDM models. These models yield a prevalence estimate of 7.1% for SP-no saying/cheating. The inclusion of the parameters $\theta_{yy\cdot\cdot\rightarrow yn\cdot\cdot}=\theta_{\cdot\cdot yy\rightarrow \cdot\cdot yn}$ of the SP(last year) model does not improve the goodness of fit of these models.

Table 5 Parameter estimates of anabolics and blood manipulations of Study II.

6. Discussion

By formulating CDM and SP-no models for the sub-samples and multiple questions designs, we have shown the correspondence between the parameters of these models. We then introduced the ever/last year design as an alternative to these designs, and formulated models for this design and its extensions to multiple questions. By doing so, we provide practitioners with the tools to make a well-informed choice of design. This choice should primarily be based on the purpose of the study, but it also involves considerations about the power to detect the prevalence of both the sensitive attribute(s) and evasive responses, and the trade-off between making untestable strong assumptions and the interpretability of the prevalence estimates as either point or interval estimates. Below we summarize the benefits and limitations that are relevant for the choice of a particular model and design.

The designs for a single sensitive attribute are the sub-samples design and the ever/last year design. The null models for these designs both have a degree of freedom, but they differ in the assumptions that have to be made to estimate cheating/SP-no saying. The sub-samples design simply assumes that cheaters/SP-no sayers answer “no” to the sensitive question irrespective of the outcome of the randomizer, while the ever/last year design makes the additional strong assumption that cheaters/SP-no sayers do this for both questions. The benefit of the latter design is that it yields a more efficient estimate of the prevalence of last year users than the sub-samples design. The CDM and SP-no models also differ in the assumptions they make with respect to the true responses of the cheaters/SP-no sayers. The CDM is assumption-free in the sense that it does not make any assumptions with respect to the true responses of the cheaters. As a consequence, the prevalence estimates of carriers and non-carriers are interval estimates, the width of which is determined by the prevalence estimate of the cheaters. The SP-no model, on the other hand, does make assumptions with respect to the true responses of the SP-no sayers, and therefore yields point estimates for the carriers and non-carriers that are corrected for SP-no saying.

In this paper we focused on CDM and SP-no models for the ever/last year design, and its extensions with a dichotomous question and another set of ever/last year questions. The main benefit of the extended designs is the increased power to detect cheating/SP-no saying. The power increase is exemplified by the data of Study I, where the models for the ever/last year questions on anabolics yielded a nonsignificant prevalence estimate of 4.6% for cheating/SP-no saying, while the model with the additional SARMs question yielded a significant estimate of 6.4%. A reviewer wondered whether this result could alternatively be explained by a higher proportion of evasive responses to the SARMs question. We have included a simulation study in Appendix C to investigate this. The study shows that a higher proportion of evasive responses to the SARMs question does not bias the prevalence estimates of never, former and last year users of anabolics, and does not lead to higher estimates of SP-no saying/cheating. It does, however, result in an underestimate of SARMs use. The reason for this is that the prevalence estimate for the SARMs question is free to take any value because its (univariate) model is saturated. In the model for the three questions, the excess of evasive responses to the SARMs question will therefore be absorbed by an underestimate of the SARMs prevalence and will not substantially affect the estimate of SP-no saying/cheating.

An advantage of the SP-no model over the CDM is that it allows for the inclusion of additional response bias parameters. The SP(last year) model served as a somewhat speculative example of this. For the data of Study I, this model yielded contradictory results: a highly significant estimate of the parameter $\theta_{yy\cdot \rightarrow yn\cdot}$, but also a higher AIC than the competing models. It is not clear whether these results indicate that the parameter $\theta_{yy\cdot \rightarrow yn\cdot}$ indeed detected some last year users who answered evasively to the last year question, or that the model simply overfitted the data. The data of Study II, however, provided no evidence for the SP(last year) assumption. Future studies may provide a more definite answer to the question of whether last year carriers are indeed inclined to answer “no” to the last year question when they have already answered “yes” to the ever question.
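The SP(last year) idea can be made concrete on the response-pattern probabilities of a single ever/last year pair. This is a minimal sketch under stated assumptions: both questions share the randomization probabilities $p_{y|y}$ and $p_{y|n}$, randomization is independent across the two questions, the true classes are never, former and last year carriers, and SP-no sayers answer “no” to both questions. The parameter `theta_edit` is a hypothetical stand-in for $\theta_{yy\cdot\rightarrow yn\cdot}$ that moves a share of the observed “yes, yes” patterns to “yes, no”; the third question indicated by the dot is omitted, and the prevalences are made-up values.

```python
from itertools import product

# True (ever, last year) status per class; "not ever but last year" is impossible
STATUS = {"never": (False, False), "former": (True, False), "last_year": (True, True)}


def pattern_probs(prev, theta_sp, p_yy, p_yn, theta_edit=0.0):
    """Probabilities of the observed (ever, last year) response patterns.

    prev       : prevalences of 'never', 'former' and 'last_year' (sum to 1)
    theta_sp   : proportion of SP-no sayers, who answer "no" to both questions
    theta_edit : hypothetical share of "yy" patterns edited to "yn"
    """
    def p_answer(answer, carrier):
        p_yes = p_yy if carrier else p_yn
        return p_yes if answer == "y" else 1 - p_yes

    probs = {}
    for r_ever, r_last in product("yn", repeat=2):
        # Compliant respondents: independent randomization per question
        compliant = sum(
            prev[c] * p_answer(r_ever, s[0]) * p_answer(r_last, s[1])
            for c, s in STATUS.items()
        )
        pattern = r_ever + r_last
        sp_mass = theta_sp if pattern == "nn" else 0.0
        probs[pattern] = (1 - theta_sp) * compliant + sp_mass

    # Hypothetical SP(last year) edit: part of the "yy" mass moves to "yn"
    moved = theta_edit * probs["yy"]
    probs["yy"] -= moved
    probs["yn"] += moved
    return probs


prev = {"never": 0.80, "former": 0.12, "last_year": 0.08}
base = pattern_probs(prev, theta_sp=0.05, p_yy=5 / 6, p_yn=1 / 6)
edited = pattern_probs(prev, theta_sp=0.05, p_yy=5 / 6, p_yn=1 / 6, theta_edit=0.3)
for pattern in ("yy", "yn", "ny", "nn"):
    print(pattern, round(base[pattern], 4), round(edited[pattern], 4))
```

The four patterns carry three independent probabilities, and the two free prevalences leave one degree of freedom, which `theta_sp` uses; an additional parameter such as `theta_edit` therefore needs the extra degrees of freedom of an extended design.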

A final word on the interpretation of the cheating/SP-no parameters. These parameters are sometimes erroneously interpreted as the proportion of misreported responses. This is incorrect, because a substantial fraction of the cheaters/SP-no sayers do not have to misreport: the randomizer already requires them to answer “no.” As an illustration, consider a dichotomous question with $p_{n|n}=5/6$ (and, symmetrically, $p_{y|y}=5/6$). Five out of six non-carriers then have to answer “no,” so only one out of six SP-no sayers in this group has to answer “yes” and thus has to misreport. Analogously, five out of every six SP-no sayers in the carriers group have to misreport. Given a prevalence of carriers of 0.1 and of SP-no sayers of 0.2, the total proportion of actually misreported responses in the entire sample is then $(.2)(.9)(1/6)+(.2)(.1)(5/6)\approx 0.047$. In this example, less than one-quarter of the SP-no sayers had to misreport ($0.047/0.2\approx 0.23$).
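The arithmetic of this example can be reproduced exactly with rational numbers. The sketch assumes, as the computation in the text implies, a symmetric design with $p_{n|n}=p_{y|y}=5/6$ and an SP-no class whose carrier prevalence equals that of the sample; the variable names are illustrative.

```python
from fractions import Fraction

p_nn = Fraction(5, 6)        # P(answer "no" | non-carrier)
p_yy = Fraction(5, 6)        # P(answer "yes" | carrier), symmetric design assumed
carriers = Fraction(1, 10)   # prevalence of carriers
sp_no = Fraction(1, 5)       # prevalence of SP-no sayers

# Non-carrier SP-no sayers misreport only when the randomizer asks for "yes"
misreport_noncarriers = sp_no * (1 - carriers) * (1 - p_nn)
# Carrier SP-no sayers misreport whenever the randomizer asks for "yes"
misreport_carriers = sp_no * carriers * p_yy
total = misreport_noncarriers + misreport_carriers

print(f"misreported responses in the sample: {float(total):.3f}")          # 0.047
print(f"share of SP-no sayers who misreport: {float(total / sp_no):.3f}")  # 0.233
```

The exact values are $7/150$ for the sample and $7/30$ for the SP-no sayers, which confirms that fewer than one in four SP-no sayers actually misreport.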

A closing remark concerns regression models for the ever/last year design. While there are numerous examples of regression models for the multiple questions design in the literature [e.g., Böckenholt and van der Heijden (2007)], such models have not yet been developed for the ever/last year design. Appendix D (https://osf.io/autr5/?view_only=2af4b338b9be45cd8657f926438c5f93) derives regression models for this design that allow both the prevalence of the sensitive attribute and the probability of cheating/SP-no saying to be explained in terms of covariates.
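The idea of explaining both the prevalence and the SP-no probability by covariates can be sketched with logistic links. This is a hypothetical single-question illustration, not the model derived in Appendix D: it assumes $P(\text{yes}\mid x)=(1-\theta(x))[\pi(x)p_{y|y}+(1-\pi(x))p_{y|n}]$ with $\operatorname{logit}\pi(x)=a_0+a_1x$ and $\operatorname{logit}\theta(x)=b_0+b_1x$, and the data and coefficients are made up. With a single dichotomous question, $\pi(x)$ and $\theta(x)$ are separated only by the shape of the links, so in practice such links are attached to a design with more than one question, such as the ever/last year design.

```python
import math


def logistic(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def p_yes(x, a, b, p_yy, p_yn):
    """P(yes | x) with hypothetical logistic links for prevalence and SP-no."""
    prev = logistic(a[0] + a[1] * x)    # pi(x): prevalence of the attribute
    theta = logistic(b[0] + b[1] * x)   # theta(x): probability of SP-no saying
    return (1 - theta) * (prev * p_yy + (1 - prev) * p_yn)


def log_likelihood(data, a, b, p_yy, p_yn):
    """Bernoulli log-likelihood for a list of (x, answered_yes) pairs."""
    ll = 0.0
    for x, answered_yes in data:
        p = p_yes(x, a, b, p_yy, p_yn)
        ll += math.log(p) if answered_yes else math.log(1 - p)
    return ll


# Made-up observations and coefficients, for illustration only
data = [(0.0, True), (1.0, False), (2.0, True), (0.5, False)]
ll = log_likelihood(data, a=(-2.0, 0.5), b=(-3.0, 0.2), p_yy=5 / 6, p_yn=1 / 6)
print(f"log-likelihood: {ll:.3f}")
```

Maximizing `log_likelihood` over `a` and `b` with a general-purpose optimizer would give the coefficient estimates.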

7. Conclusion

This paper showed the correspondence between the cheating and SP-no saying models for the sub-samples and multiple questions designs, and derived such models for a design with ever and last year questions about the same sensitive attribute. These models yield prevalence estimates of non-carriers, former carriers and last year carriers, while at the same time allowing for the estimation of cheating/SP-no saying. We furthermore showed that by extending this design with questions about other sensitive attributes, alternative hypotheses about response bias can be tested through the inclusion of additional response bias parameters in the transition matrix. This allowed us to formulate a model that accounts for last year users who edit their response to the last year question when they have already answered “yes” to the ever question, but we found no convincing evidence for this in our example data. For researchers who are more interested in detecting response biases than in the prevalence estimates of the sensitive attributes, we recommend using randomization probabilities closer to 0.5, as our power study showed that this enhances the power to detect such biases. Obviously, this benefit comes at the expense of increased variances of the prevalence estimates of the sensitive attribute. The Achilles heel of the models for multiple questions is the strong assumption that a class of cheaters/SP-no sayers exists who consistently give an evasive “no” answer to all questions. While it seems difficult to test this assumption experimentally, future sensitivity analyses can provide insight into the effects on the parameter estimates when this assumption is not, or only partially, fulfilled.
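The cost side of this recommendation can be quantified with the sampling variance of the single-question moment estimator, $\operatorname{Var}(\hat\pi)=\lambda(1-\lambda)/[n(p_{y|y}-p_{y|n})^2]$, where $\lambda$ is the probability of a “yes” answer. The sketch assumes a symmetric design with $p_{y|y}=p$ and $p_{y|n}=1-p$, a prevalence of 0.10 and $n=1000$ (illustrative values); it shows only the variance penalty, not the gain in power reported by the power study.

```python
import math


def rr_standard_error(prevalence, p, n):
    """Standard error of the moment estimator in a symmetric design.

    p : P(yes | carrier) = P(no | non-carrier), so p_yn = 1 - p (p != 0.5)
    """
    p_yy, p_yn = p, 1 - p
    lam = prevalence * p_yy + (1 - prevalence) * p_yn   # P(yes)
    return math.sqrt(lam * (1 - lam) / (n * (p_yy - p_yn) ** 2))


for p in (5 / 6, 0.75, 0.65, 0.55):
    se = rr_standard_error(prevalence=0.10, p=p, n=1000)
    print(f"p = {p:.3f}: SE = {se:.4f}")
```

As $p$ moves toward 0.5 the factor $(2p-1)^2$ in the denominator shrinks, so the standard error grows from about 0.020 at $p=5/6$ to about 0.158 at $p=0.55$.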

Acknowledgements

The authors thank the editor Prof. Sandip Sinharay and three anonymous reviewers for their helpful and constructive comments.

Author contribution

Conceptualization: Khadiga H. A. Sayed, Maarten J. L. F. Cruyff, Peter G. M. van der Heijden. Investigation: Khadiga H. A. Sayed, Maarten J. L. F. Cruyff, Peter G. M. van der Heijden. Methodology: Khadiga H. A. Sayed, Maarten J. L. F. Cruyff, Peter G. M. van der Heijden. Writing-original draft: Khadiga H. A. Sayed, Maarten J. L. F. Cruyff. Review & editing: Khadiga H. A. Sayed, Maarten J. L. F. Cruyff, Peter G. M. van der Heijden.

Data availability

The dataset, appendix of all derivations, and the R codes necessary to reproduce the results are available on the OSF repository via: https://osf.io/autr5/?view_only=2af4b338b9be45cd8657f926438c5f93

Declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Footnotes

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

Böckenholt, U., Barlas, S., van der Heijden, P. G. M. (2009). Do randomized-response designs eliminate response biases? An empirical study of non-compliance behavior. Journal of Applied Econometrics, 24(3), 377–392.
Böckenholt, U., van der Heijden, P. G. M. (2007). Item randomized-response models for measuring noncompliance: Risk-return perceptions, social influences, and self-protective responses. Psychometrika, 72(2), 245–262.
Boeije, H., Lensvelt-Mulders, G. (2002). Honest by chance: A qualitative interview study to clarify respondents’ (non-) compliance with computer-assisted randomized response. Bulletin of Sociological Methodology/Bulletin de Méthodologie Sociologique, 75(1), 24–39.
Boruch, R. F. (1971). Assuring confidentiality of responses in social research: A note on strategies. The American Sociologist, 6(4), 308–311.
Chaudhuri, A., Mukerjee, R. (1988). Randomized response: Theory and techniques. Marcel Dekker.
Christiansen, A. V., Frenger, M., Chirico, A., Pitsch, W. (2023). Recreational athletes’ use of performance-enhancing substances: Results from the first European randomized response technique survey. Sports Medicine - Open, 9(1), 1–17.
Clark, S. J., Desharnais, R. A. (1998). Honest answers to embarrassing questions: Detecting cheating in the randomized response model. Psychological Methods, 3(2), 160–168.
Cruyff, M. J. L. F., Böckenholt, U., van den Hout, A., van der Heijden, P. G. M. (2008). Accounting for self-protective responses in randomized response data from a social security survey using the zero-inflated Poisson model. The Annals of Applied Statistics, 2(1), 316–331.
Cruyff, M. J. L. F., Böckenholt, U., van der Heijden, P. G. M. (2016). The multidimensional randomized response design: Estimating different aspects of the same sensitive behavior. Behavior Research Methods, 48(1), 390–399.
Cruyff, M. J. L. F., van den Hout, A., van der Heijden, P. G. M. (2008). The analysis of randomized response sum score variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(1), 21–30.
Cruyff, M. J. L. F., van den Hout, A., van der Heijden, P. G. M., Böckenholt, U. (2007). Log-linear randomized-response models taking self-protective response behavior into account. Sociological Methods & Research, 36(2), 266–282.
De Hon, O., Kuipers, H., van Bottenburg, M. (2015). Prevalence of doping use in elite sports: A review of numbers and methods. Sports Medicine, 45(1), 57–69.
De Jong, M. G., Pieters, R., Fox, J.-P. (2010). Reducing social desirability bias through item randomized response: An application to measure underreported desires. Journal of Marketing Research, 47(1), 14–27.
Edgell, S. E., Himmelfarb, S., Duchan, K. L. (1982). Validity of forced response in a randomized response model. Sociological Methods & Research, 11(1), 89–110.
Elbe, A.-M., Pitsch, W. (2018). Doping prevalence among Danish elite athletes. Performance Enhancement & Health, 6(1), 28–32.
Feth, S., Frenger, M., Pitsch, W., Schmelzeisen, P. (2017). Cheater detection for randomized response-techniques: Derivation, analyses and application. Saarländische Universitäts- und Landesbibliothek.
Fincoeur, B., Pitsch, W. (2017). Omgaan met sociale wenselijkheid: Inschatting van de dopingprevalentie aan de hand van de randomized response technique [Dealing with social desirability: Estimating doping prevalence using the randomized response technique]. Panopticon: Journal of Criminal Law, Criminology and Criminal Justice, 38(5), 376–386.
Fox, J.-P., Avetisyan, M., van der Palen, J. (2013). Mixture randomized item-response modeling: A smoking behavior validation study. Statistics in Medicine, 32(27), 4821–4837.
Fox, J.-P., Meijer, R. R. (2008). Using item response theory to obtain individual information from randomized response data: An application using cheating data. Applied Psychological Measurement, 32(8), 595–610.
Frenger, M., Pitsch, W., Emrich, E. (2016). Sport-induced substance use—An empirical study to the extent within a German sports association. PLoS ONE, 11(10).
Greenberg, B. G., Abul-Ela, A.-L. A., Simmons, W. R., Horvitz, D. G. (1969). The unrelated question randomized response model: Theoretical framework. Journal of the American Statistical Association, 64(326), 520–539.
Heck, D. W., Hoffmann, A., Moshagen, M. (2018). Detecting nonadherence without loss in efficiency: A simple extension of the crosswise model. Behavior Research Methods, 50(5), 1895–1905.
Hilkens, L., Cruyff, M., Woertman, L., Benjamins, J., Evers, C. (2021). Social media, body image and resistance training: Creating the perfect ‘Me’ with dietary supplements, anabolic steroids and SARM’s. Sports Medicine - Open, 7, 81.
Hoef, J. M. V. (2012). Who invented the delta method? The American Statistician, 66(2), 124–127.
Hoffmann, A., Diedenhofen, B., Verschuere, B., Musch, J. (2015). A strong validation of the crosswise model using experimentally-induced cheating behavior. Experimental Psychology, 62(6), 403–414.
Höglinger, M., Jann, B., Diekmann, A. (2016). Sensitive questions in online surveys: An experimental evaluation of different implementations of the randomized response technique and the crosswise model. Survey Research Methods, 10(3), 171–187.
John, L. K., Loewenstein, G., Acquisti, A., Vosgerau, J. (2018). When and why randomized response techniques (fail to) elicit the truth. Organizational Behavior and Human Decision Processes, 148, 101–123.
Lamb, C. W., Jr., Stem, D. E., Jr. (1978). An empirical validation of the randomized response technique. Journal of Marketing Research, 15(4), 616–621.
Lara, D., García, S. G., Ellertson, C., Camlin, C., Suárez, J. (2006). The measure of induced abortion levels in Mexico using random response technique. Sociological Methods & Research, 35(2), 279–301.
Lensvelt-Mulders, G. J., Hox, J. J., van der Heijden, P. G. M., Maas, C. J. (2005). Meta-analysis of randomized response research: Thirty-five years of validation. Sociological Methods & Research, 33(3), 319–348.
Meisters, J., Hoffmann, A., Musch, J. (2022). More than random responding: Empirical evidence for the validity of the (extended) crosswise model. Behavior Research Methods, 55, 1–14.
Meisters, J., Hoffmann, A., Musch, J. (2022). A new approach to detecting cheating in sensitive surveys: The cheating detection triangular model. Sociological Methods & Research, 53, 1–41.
Mieth, L., Mayer, M. M., Hoffmann, A., Buchner, A., Bell, R. (2021). Do they really wash their hands? Prevalence estimates for personal hygiene behaviour during the COVID-19 pandemic based on indirect questions. BMC Public Health, 21(1), 1–8.
Moshagen, M., Hilbig, B. E., Erdfelder, E., Moritz, A. (2014). An experimental validation method for questioning techniques that assess sensitive issues. Experimental Psychology, 61(1), 48–54.
Ostapczuk, M., Moshagen, M., Zhao, Z., Musch, J. (2009). Assessing sensitive attributes using the randomized response technique: Evidence for the importance of response symmetry. Journal of Educational and Behavioral Statistics, 34(2), 267–287.
Ostapczuk, M., Musch, J., Moshagen, M. (2011). Improving self-report measures of medication non-adherence using a cheating detection extension of the randomised-response-technique. Statistical Methods in Medical Research, 20(5), 489–503.
Petróczi, A., Cruyff, M., de Hon, O., Sagoe, D., Saugy, M. (2022). Hidden figures: Revisiting doping prevalence estimates previously reported for two major international sport events in the context of further empirical evidence and the extant literature. Frontiers in Sports and Active Living, 4, 1–22.
Pitsch, W., Emrich, E. (2011). The frequency of doping in elite sport: Results of a replication study. International Review for the Sociology of Sport, 47(5), 559–580.
Reiber, F., Bryce, D., Ulrich, R. (2022). Self-protecting responses in randomized response designs: A survey on intimate partner violence during the coronavirus disease 2019 pandemic. Sociological Methods & Research, 53, 1–32.
Reiber, F., Pope, H., Ulrich, R. (2020). Cheater detection using the unrelated question model. Sociological Methods & Research, 52, 1–23.
Sagoe, D., Cruyff, M., Spendiff, O., Chegeni, R., De Hon, O., Saugy, M., van der Heijden, P. G. M., Petróczi, A. (2021). Functionality of the crosswise model for assessing sensitive or transgressive behavior: A systematic review and meta-analysis. Frontiers in Psychology, 12, 1–19.
Sayed, K. H. A., Cruyff, M. J. L. F., van der Heijden, P. G. M., Petróczi, A. (2022). Refinement of the extended crosswise model with a number sequence randomizer: Evidence from three different studies in the UK. PLoS ONE, 17(12).
Sayed, K. H. A., Cruyff, M. J. L. F., van der Heijden, P. G. M. (2023). The analysis of randomized response “ever” and “last year” questions: A non-saturated multinomial model. Behavior Research Methods, 56, 1–14.
Schröter, H., Studzinski, B., Dietz, P., Ulrich, R., Striegel, H., Simon, P. (2016). A comparison of the cheater detection and the unrelated question models: A randomized response survey on physical and cognitive doping in recreational triathletes. PLoS ONE, 11(5).
Tian, G.-L., Tang, M.-L. (2013). Incomplete categorical data design: Non-randomized response techniques for sensitive questions in surveys. CRC Press.
Tracy, P. E., Fox, J. A. (1981). The validity of randomized response for sensitive measurements. American Sociological Review, 46, 187–200.
Umesh, U. N., Peterson, R. A. (1991). A critical evaluation of the randomized response method: Applications, validation, and research agenda. Sociological Methods & Research, 20(1), 104–138.
van den Hout, A., Böckenholt, U., van der Heijden, P. G. M. (2010). Estimating the prevalence of sensitive behaviour and cheating with a dual design for direct questioning and randomized response. Journal of the Royal Statistical Society Series C: Applied Statistics, 59(4), 723–736.
van den Hout, A., Gilchrist, R., van der Heijden, P. G. M. (2010). The randomized response log linear model as a composite link model. Statistical Modelling, 10(1), 57–67.
van den Hout, A., van der Heijden, P. G. M. (2002). Randomized response, statistical disclosure control and misclassification: A review. International Statistical Review, 70(2), 269–288.
van den Hout, A., van der Heijden, P. G. M. (2004). The analysis of multivariate misclassified data with special attention to randomized response data. Sociological Methods & Research, 32(3), 384–410.
van der Heijden, P. G. M., Van Gils, G., Bouts, J., Hox, J. J. (2000). A comparison of randomized response, computer-assisted self-interview, and face-to-face direct questioning: Eliciting sensitive information in the context of welfare and unemployment benefit. Sociological Methods & Research, 28(4), 505–537.
Warner, S. L. (1965). Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309), 63–69.
Wolter, F., Preisendörfer, P. (2013). Asking sensitive questions: An evaluation of the randomized response technique versus direct questioning using individual validation data. Sociological Methods & Research, 42(3), 321–353.
Table 1 Correspondence between the parameters of the CDM and SP-no model.

Figure 1 Power curves for detecting $\theta/\tau_c$.

Table 2 Observed response frequencies of Studies I and II.

Table 3 Prevalence estimates of anabolics (A) of Study I.

Table 4 Parameter estimates of anabolics (A) and SARMs (S) of Study I.

Table 5 Parameter estimates of anabolics and blood manipulations of Study II.