This is a reaction to Borsboom’s (2006) discussion paper on the issue that psychology takes so little notice of the modern developments in psychometrics, in particular, latent variable methods. Contrary to Borsboom, it is argued that latent variables are summaries of interesting data properties, that construct validation should involve studying nomological networks, that psychological research slowly but definitely will incorporate latent variable methods, and that the role of psychometrics in psychology is that of partner, not role model.
This paper analyzes the theoretical, pragmatic, and substantive factors that have hampered the integration between psychology and psychometrics. Theoretical factors include the operationalist mode of thinking which is common throughout psychology, the dominance of classical test theory, and the use of “construct validity” as a catch-all category for a range of challenging psychometric problems. Pragmatic factors include the lack of interest in mathematically precise thinking in psychology, inadequate representation of psychometric modeling in major statistics programs, and insufficient mathematical training in the psychological curriculum. Substantive factors relate to the absence of psychological theories that are sufficiently strong to motivate the structure of psychometric models. Following the identification of these problems, a number of promising recent developments are discussed, and suggestions are made to further the integration of psychology and psychometrics.
Borsboom (2006) attacks psychologists for failing to incorporate psychometric advances in their work, discusses factors that contribute to this regrettable situation, and offers suggestions for ameliorating it. This commentary applauds Borsboom for calling the field to task on this issue and notes additional problems in the field regarding measurement that he could add to his critique. It also chastises Borsboom for occasionally being unnecessarily pejorative in his critique, noting that negative rhetoric is unlikely to make converts of offenders. Finally, it exhorts psychometricians to make their work more accessible and points to Borsboom, Mellenbergh, and Van Heerden (2003) as an excellent example of how this can be done.
A new model of confirmatory factor analysis (CFA) for multitrait-multimethod (MTMM) data sets is presented. It is shown that this model can be defined by only three assumptions in the framework of classical psychometric test theory (CTT). All other properties of the model, particularly the uncorrelatedness of the trait factors with the method factors, are logical consequences of the definition of the model. In the model proposed there are as many trait factors as different traits considered, but the number of method factors is one fewer than the number of methods included in an MTMM study. The covariance structure implied by this model is derived, and it is shown that this model is identified even under conditions under which other CFA-MTMM models are not. The model is illustrated by two empirical applications. Furthermore, its advantages and limitations are discussed with respect to previously developed CFA models for MTMM data.
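To make the described structure concrete, below is a minimal sketch of what such a specification might look like for a hypothetical design with three traits and three methods, written in lavaan-style syntax; the indicator names are placeholders, not taken from the paper's applications.

```python
# Hypothetical CT-C(M-1)-style specification for 3 traits and 3 methods.
# y_tm denotes trait t measured by method m (placeholder names).  Method 1
# is the reference method, so only two method factors appear (one fewer
# than the number of methods), and indicators of the reference method
# load on trait factors only.
mtmm_model = """
T1 =~ y11 + y12 + y13
T2 =~ y21 + y22 + y23
T3 =~ y31 + y32 + y33
M2 =~ y12 + y22 + y32
M3 =~ y13 + y23 + y33
"""
# The string could be passed to an SEM tool that accepts this syntax
# (e.g. semopy.Model(mtmm_model)) and fitted to MTMM data.
```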
Established results on latent variable models are applied to the study of the validity of a psychological test. When the test predicts a criterion by measuring a unidimensional latent construct, not only must the total score predict the criterion, but the joint distribution of criterion scores and item responses must exhibit a certain pattern. The presence of this population pattern may be tested with sample data using the stratified Wilcoxon rank sum test. Often, criterion information is available only for selected examinees, for instance, those who are admitted or hired. Three cases are discussed: (i) selection at random, (ii) selection based on the current test, and (iii) selection based on other measures of the latent construct. Discriminant validity is also discussed.
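As a rough illustration of the kind of test mentioned above, the sketch below approximates a stratified Wilcoxon rank-sum test by running a rank-sum comparison within each stratum (for example, a band of total test scores) and pooling the per-stratum p-values with Stouffer's method; the variable names and the pooling strategy are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from scipy import stats

def stratified_rank_test(criterion, item, strata, alternative="greater"):
    """Approximate stratified Wilcoxon rank-sum test.

    Within each stratum, compare the criterion scores of examinees who
    answered the item correctly with those of examinees who did not, then
    pool the per-stratum p-values with Stouffer's method.  This is only
    an illustration of the idea, not an exact van Elteren-type test.
    """
    criterion, item, strata = map(np.asarray, (criterion, item, strata))
    pvals = []
    for s in np.unique(strata):
        in_stratum = strata == s
        passed = criterion[in_stratum & (item == 1)]
        failed = criterion[in_stratum & (item == 0)]
        if len(passed) and len(failed):
            _, p = stats.mannwhitneyu(passed, failed, alternative=alternative)
            pvals.append(p)
    return stats.combine_pvalues(pvals, method="stouffer")
```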
In this chapter we review advanced psychometric methods for examining the validity of self-report measures of attitudes, beliefs, personality style, and other social psychological and personality constructs that rely on introspection. The methods include confirmatory factor analysis to examine whether measurements can be interpreted as meaningful continua, and measurement invariance analysis to examine whether items are answered the same way in different groups of people. We illustrate the methods using a measure of individual differences in openness to political pluralism, which includes four conceptual facets. To understand how the facets relate to the overall dimension of openness to political pluralism, we compare a second-order factor model and a bifactor model. We also check to see whether the psychometric patterns of item responses are the same for males and females. These psychometric methods can both document the quality of obtained measurements and inform theorists about nuances of their constructs.
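A minimal sketch of the kind of model comparison described above, assuming the Python semopy package (which accepts lavaan-style syntax) and placeholder item names and file paths:

```python
import pandas as pd
import semopy  # assumed SEM package; accepts lavaan-style model syntax

# Second-order model: four facets of openness to political pluralism
# load on a general factor.  Item names and the CSV path are placeholders.
second_order = """
Facet1 =~ i1 + i2 + i3
Facet2 =~ i4 + i5 + i6
Facet3 =~ i7 + i8 + i9
Facet4 =~ i10 + i11 + i12
Openness =~ Facet1 + Facet2 + Facet3 + Facet4
"""

data = pd.read_csv("pluralism_items.csv")   # hypothetical item-level data
model = semopy.Model(second_order)
model.fit(data)
print(semopy.calc_stats(model).T)           # chi-square, CFI, TLI, RMSEA, ...
```

The bifactor competitor would instead let a general factor load directly on every item alongside orthogonal facet factors, and measurement invariance would be examined by fitting the preferred model in the male and female groups with progressively stricter equality constraints.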
The validity of conclusions drawn from specific research studies must be evaluated in light of the purposes for which the research was undertaken. We distinguish four general types of research: description and point estimation, correlation and prediction, causal inference, and explanation. For causal and explanatory research, internal validity is critical – the extent to which a causal relationship can be inferred from the results of variation in the independent and dependent variables of an experiment. Random assignment is discussed as the key to avoiding threats to internal validity. Internal validity is distinguished from construct validity (the relationship between a theoretical construct and the methods used to operationalize that concept) and external validity (the extent to which the results of a research study can be generalized to other contexts). Construct validity is discussed in terms of multiple operations and discriminant and convergent validity assessment. External validity is discussed in terms of replicability, robustness, and relevance of specific research findings.
Lakshmi Balachandran Nair, Libera Università Internazionale degli Studi Sociali Guido Carli, Italy; Michael Gibbert, Università della Svizzera Italiana, Switzerland; Bareerah Hafeez Hoorani, Radboud University Nijmegen, Institute for Management Research, The Netherlands
We introduce and define the single holistic case study design in this chapter. The strengths of the design are discussed in detail, with examples. In particular, we discuss the potential of the single holistic design to provide a detailed explanation of processes. Single holistic case studies also explore the theorizing potential of unique cases, which can reveal new dimensions of a phenomenon or falsify or refute an existing theory. Other strengths we discuss include relatively high data access, construct validity, and the potential to include an unlimited number of variables. The weaknesses of the design (i.e. low internal and external validity) are discussed afterwards. The chapter also addresses some common (mis)conceptions regarding single holistic designs and their external validity.
Sound general and sports nutrition knowledge in athletes is essential for making appropriate dietary choices. Assessment of nutrition knowledge enables evaluation and tailoring of nutrition education. However, few well-validated tools are available to assess nutrition knowledge in athletes. The objective of the present study was to establish the validity of the Platform to Evaluate Athlete Knowledge Sports – Nutrition Questionnaire (PEAKS-NQ) for use with United Kingdom and Irish (UK-I) athletes. To confirm content validity, twenty-three sports nutritionists (SNs) from elite UK-I sports institutes provided feedback on the PEAKS-NQ via a modified Delphi method. After minor changes, the UK-I version of the PEAKS-NQ was administered to UK-I SNs from the British Dietetic Association Sport and Exercise Nutrition Register and to elite athletes (EA) training at elite sports institutes in the UK and Ireland. Independent samples t tests and independent samples median tests were used to compare PEAKS-NQ total and subsection scores between EA and SN (to assess construct validity). Cronbach's alpha (good ≥ 0⋅7) was used to establish internal consistency. The SN achieved greater overall [SN (n 23): 92⋅3 (9⋅3) v. EA (n 154): 71⋅4 (10⋅0)%; P < 0⋅001] and individual section scores (P < 0⋅001), except for Section B, Identification of Food Groups (P = 0⋅07). The largest knowledge differences between SN and EA were in Section D, Applied Sports Nutrition [SN: 88⋅5 (8⋅9) v. EA: 56⋅7 (14⋅5)%; P < 0⋅001]. The overall effect size (ES) was large (2⋅1), with subsection ESs ranging from 0⋅6 to 2⋅3. Cronbach's alpha was good (0⋅83). The PEAKS-NQ had good content and construct validity, supporting its use to assess the nutrition knowledge of UK-I athletes.
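The two statistics central to the validation above are easy to sketch; the snippet below shows a generic Cronbach's alpha computation and a Welch's t-test known-groups comparison. The simulated scores are placeholders for illustration only, not the study's data.

```python
import numpy as np
from scipy import stats

def cronbach_alpha(item_scores):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    x = np.asarray(item_scores, dtype=float)
    k = x.shape[1]
    return k / (k - 1) * (1 - x.var(axis=0, ddof=1).sum()
                          / x.sum(axis=1).var(ddof=1))

rng = np.random.default_rng(0)

# Internal consistency on fake correlated items (shared ability + noise).
ability = rng.normal(size=(200, 1))
items = ability + rng.normal(size=(200, 10))
print(cronbach_alpha(items))              # fairly high alpha for correlated items

# Known-groups comparison on simulated placeholder totals.
sn_scores = rng.normal(92, 9, size=23)    # sports nutritionists
ea_scores = rng.normal(71, 10, size=154)  # elite athletes
print(stats.ttest_ind(sn_scores, ea_scores, equal_var=False))
```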
Psychopathologists have failed to make significant progress toward understanding the causes of psychopathology. Despite the foundational importance of construct validity and measurement to our field, insufficient attention is paid to these concerns in the assessment of psychopathology vulnerabilities prior to their implementation in causal models. I review the current state of construct validity and measurement in psychopathology research, highlighting the lack of consensus regarding how we should define and measure vulnerability constructs. The limited capacity of open science practices to address these definitional and measurement challenges is discussed. Recommendations for progress are made, including the need for consensus agreement on (1) working definitions and (2) measures of vulnerability constructs. Other recommendations include (3) the need to incentivize ‘pre-clinical’ descriptive work focused on measurement development, (4) the formation of open-access databases designed to facilitate measurement evaluation and development, and (5) increased exploration of the use of novel technologies to facilitate the collection of high-quality measures of vulnerability.
Objective:
To evaluate the construct validity of the NIH Toolbox Cognitive Battery (NIH TB-CB) in the healthy oldest-old (85+ years old).
Method:
Our sample from the McKnight Brain Aging Registry consists of 179 individuals, 85 to 99 years of age, screened for memory, neurological, and psychiatric disorders. Following methods from previous research on adults aged 85 and older, we conducted confirmatory factor analyses on models of the NIH TB-CB and same-domain standard neuropsychological measures. We hypothesized that the five-factor model (Reading, Vocabulary, Memory, Working Memory, and Executive/Speed) would have the best fit, consistent with younger populations. We assessed convergent and discriminant validity. We also evaluated demographic and computer-use predictors of NIH TB-CB composite scores.
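A minimal sketch of the nested-model comparison implied by this design, assuming maximum-likelihood chi-square values for the five- and six-factor solutions; the numbers below are placeholders, not the study's results.

```python
from scipy import stats

def chi2_difference(chi2_restricted, df_restricted, chi2_general, df_general):
    """Likelihood-ratio (chi-square difference) test for nested CFA models.

    The more general model (e.g. six factors) should fit at least as well as
    the restricted one (e.g. five factors); a significant difference favours
    the general model.
    """
    d_chi2 = chi2_restricted - chi2_general
    d_df = df_restricted - df_general
    return d_chi2, d_df, stats.chi2.sf(d_chi2, d_df)

# Placeholder fit values for illustration only.
print(chi2_difference(312.4, 180, 271.9, 174))
```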
Results:
Findings suggest the six-factor model (Vocabulary, Reading, Memory, Working Memory, Executive, and Speed) had a better fit than alternative models. NIH TB-CB tests had good convergent and discriminant validity, though tests in the executive functioning domain had high inter-correlations with other cognitive domains. Computer use was strongly associated with higher NIH TB-CB overall and fluid cognition composite scores.
Conclusion:
The NIH TB-CB is a valid assessment for oldest-old samples, with relatively weak validity in the domain of executive functioning. The impact of computer use on composite scores could be due to the executive demands of learning to use a tablet. Strong relationships of executive function with other cognitive domains could be due to cognitive dedifferentiation. Overall, the NIH TB-CB could be useful for testing cognition in the oldest-old and the impact of aging on cognition in older populations.
In this paper, we evaluate the factorial validity of the Spanish short version of the Utrecht Work Engagement Scale (UWES-9) and assess its predictive validity with respect to self-assessed work performance. A total of 229 employees from educational institutions in Ecuador participated. Using a model comparison analysis, the unidimensional model exhibited an excellent goodness of fit, χ2 = 26.176 (24), p = .344; CFI = 1.000; TLI = 1.000; RMSEA = .020; SRMR = .034; it was not improved by more complex models (three-factor model: χ2 = 22.148 (21), p = .391; CFI = 1.000; TLI = 1.000; RMSEA = .016; SRMR = .033; two-factor model: χ2 = 26.080 (23), p = .297; CFI = 1.000; TLI = 1.000; RMSEA = .025; SRMR = .034). Therefore, it is justified as a unidimensional instrument of work engagement. However, upon analyzing the correlation patterns of the overall score and the work engagement dimensions in relation to task performance, contextual performance, and counterproductive behaviors, we conclude that, while the unidimensional model exhibits a good fit, the three-factor theoretical approach is substantively superior in that it maintains differential predictive validity for each theoretical dimension.
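For reference, the two most cited fit indices in the abstract can be computed from the model chi-square, its degrees of freedom, the baseline (independence) model chi-square, and the sample size. The model chi-square, df and N below are those reported above; the baseline value is hypothetical, since it is not reported.

```python
import math

def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative Fit Index: excess misfit of the model relative to the
    excess misfit of the baseline (independence) model."""
    return 1 - max(chi2_model - df_model, 0) / max(chi2_baseline - df_baseline, 0)

def rmsea(chi2_model, df_model, n):
    """Root Mean Square Error of Approximation for sample size n."""
    return math.sqrt(max(chi2_model - df_model, 0) / (df_model * (n - 1)))

# Reported unidimensional model: chi2 = 26.176 on 24 df, N = 229.
print(round(rmsea(26.176, 24, 229), 3))      # ~0.020, as reported
# Baseline chi-square of 850 on 36 df is a hypothetical value.
print(round(cfi(26.176, 24, 850.0, 36), 3))  # close to 1.000
```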
The Rowland Universal Dementia Assessment Scale (RUDAS) is a brief cognitive test, appropriate for people with a minimal level of completed education and sensitive to multicultural contexts. It could be a good instrument for cognitive impairment (CI) screening in Primary Health Care (PHC). It comprises the following areas: recent memory, body orientation, praxis, executive functions and language.
Research Objective:
The objective of this study is to assess the construct validity of the RUDAS by analysing its internal consistency and factorial structure.
Method:
Internal consistency will be calculated using ordinal Cronbach's α, which reflects the average inter-item correlation and, as such, will increase as the correlations between items increase. Exploratory factor analysis will be used to arrange the variables into domains using principal components extraction; five factors will be extracted, reflecting the neuropsychological areas assessed by the test, and the solution will be rotated with the varimax procedure to ease interpretation. The analysis will include the Kaiser–Meyer–Olkin measure of sampling adequacy and Bartlett's test of sphericity. Estimations will be based on Pearson's correlations between indicators using a principal component analysis and later replicated with a tetrachoric correlation matrix. The variance in the tetrachoric model will be analysed to identify convergent iterations and their explanatory power.
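A minimal sketch of this analysis pipeline, assuming the Python factor_analyzer package and a hypothetical item-level file; the tetrachoric replication step is not shown, since the sketch works from Pearson correlations only.

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import (calculate_bartlett_sphericity,
                                              calculate_kmo)

items = pd.read_csv("rudas_items.csv")           # hypothetical item-level data

chi2, p = calculate_bartlett_sphericity(items)   # do the items correlate at all?
kmo_per_item, kmo_total = calculate_kmo(items)   # sampling adequacy (want > 0.6)

efa = FactorAnalyzer(n_factors=5, method="principal", rotation="varimax")
efa.fit(items)
print(efa.loadings_)                             # rotated factor weights
print(efa.get_factor_variance())                 # variance explained per factor
```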
Preliminary results of the ongoing study:
The RUDAS is being administered to 321 participants older than 65 years from seven PHC physicians' consultations in O Grove Health Center. Data collection will be finished by August 2021, and in this poster we will present the final results of the exploratory factor analysis.
Conclusions:
We expect the results of the exploratory factor analysis to replicate previous construct validity studies of the test, in which factor weights were between 0.57 and 0.82 and all were above 40%. Confirmation that the RUDAS has a strong factor structure, with high factor weights and variance ratio, and that the six-item model is appropriate for measurement will support its recommendation as a valid screening instrument for PHC.
Delay discounting paradigms have gained widespread popularity across clinical research. Given their prevalence in the field, researchers have set lofty expectations for the importance of delay discounting as a key transdiagnostic process and a ‘core’ process underlying specific domains of dysfunction (e.g. addiction). We believe delay discounting has been prematurely reified as, in and of itself, a core process underlying psychological dysfunction, despite significant concerns with the construct validity of discounting rates. Specifically, high delay discounting rates are only modestly related to measures of psychological dysfunction and therefore are not ‘core’ to these more complex behavioral problems. Furthermore, discounting rates do not appear to be specifically related to any disorder(s) or dimension(s) of psychopathology. This raises fundamental concerns about the utility of discounting if the measure is only loosely associated with most forms of psychopathology. This stands in striking contrast to claims that discounting can serve as a ‘marker’ for specific disorders, despite never having demonstrated adequate sensitivity or specificity for any disorder that we are aware of. Finally, empirical evidence does not support the generalizability of discounting rates to other decisions made either in the lab or in the real world, and therefore discounting rates cannot and should not serve as a summary measure of an individual's decision-making patterns. We provide recommendations for improving future delay discounting research, but also strongly encourage researchers to consider whether the empirical evidence supports the field's hyper-focus on discounting.
This chapter examines experimental treatments and the theoretical, practical and empirical issues involved in their implementation. I begin by discussing the underlying purpose of experimental treatments. Second, I address what it means to say that a treatment has generalizable effects. Third, I discuss practical issues involved in constructing treatments in a variety of contexts including written, spoken, visual, and behavioral interventions. In the fourth section, I highlight the importance of validating that experimental treatments have induced the intended differences by experimental condition in the independent variable. I point to the general neglect of manipulation checks in experiments in political science and emphasize what can be learned through their inclusion. Contemporary publications provide some evidence of confusion among political scientists about the purposes for which manipulation checks and attention checks are appropriate. In the fifth and final section, I highlight the need for political scientists to move beyond between-subject assignment of treatments to consider far more powerful within-subject and hybrid experimental treatments.
Reliable and valid assessment of sports nutrition knowledge can inform athlete nutrition education to address knowledge gaps. This study aimed to test the reliability and validity of an electronically administered sports nutrition knowledge tool, the Platform to Evaluate Athlete Knowledge of Sports Nutrition Questionnaire (PEAKS-NQ). A 94-item PEAKS-NQ was piloted with 149 developmental athletes (DA) in New Zealand, with a subset invited to complete the PEAKS-NQ again to assess reliability. Reliability was evaluated using the sign test, intraclass correlation and Cronbach’s α. Accredited sports dietitians (ASD; n 255) completed the PEAKS-NQ to establish construct validity via known-groups methodology and provided relevance scores to determine the scale content validity index (S-CVI). Rasch analysis was conducted to identify potentially problematic items and test reliability. Score differences between DA and ASD were analysed using independent t or non-parametric tests. DA (n 88) were 17·8 (sd 1·4) years, 61·4 % female and mostly in high school (94·3 %). ASD (n 45) were 37·8 (sd 7·6) years, 82·2 % female, with >5 years of dietetic experience (59·1 %). ASD scored higher than DA in all sections and overall (91·5 (sd 3·4) v. 67·1 (sd 10·5) %) (P < 0·001). There were no differences between retests (n 18; P = 0·14). Cronbach’s α was 0·86. The S-CVI indicated good content validity (0·88). Rasch analysis resulted in a fifty-item PEAKS-NQ with high item (0·91) and person (0·92) reliability. The PEAKS-NQ is reliable and valid for assessing sports nutrition knowledge, and could help practitioners effectively tailor and evaluate nutrition education.
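Of the reliability checks listed, the sign test is the simplest to sketch: count how many retest totals moved up versus down and compare the split to a fair coin. The data below are simulated placeholders, not the study's retest sample.

```python
import numpy as np
from scipy import stats

def sign_test(first, second):
    """Sign test for a systematic shift between two administrations.

    Counts respondents whose retest total went up versus down (ties are
    dropped) and tests the split against a fair coin.
    """
    diff = np.asarray(second) - np.asarray(first)
    n_up, n_down = int(np.sum(diff > 0)), int(np.sum(diff < 0))
    return stats.binomtest(n_up, n_up + n_down, p=0.5)

rng = np.random.default_rng(1)
test = rng.normal(67, 10, size=18)            # placeholder first administration
retest = test + rng.normal(0, 3, size=18)     # placeholder retest
print(sign_test(test, retest).pvalue)
```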
Flory (this volume) provides a compelling review of evidence bearing on the reliability and validity of diagnostic interviews for personality disorders (PDs). This commentary discusses several issues central to this topic, among the most important of which are: (1) the importance of distinguishing PD categories and constructs from the measures used to quantify them; and (2) the need to separate critiques of overarching conceptual frameworks (e.g., the dimensional perspective on personality) from criticisms of narrower assessment rubrics (e.g., the Five-Factor Model). Given the introspective limitations inherent in human information processing—limitations which are magnified in many forms of personality pathology—rigorous validation of PD assessment tools requires that researchers complement self-report outcome measures with behavioral and performance-based indices of personality dysfunction. To illuminate causal relationships among different features of personality pathology, researchers must use experimental methods to alter PD-related psychological processes and assess the impact of these manipulations on affect, cognition, and behavior.
The authors of this commentary aim to expand on particular points covered in the chapter by Evans, Williams and Simms, and discuss other issues that were not covered there. First, they discuss future research directions for, and the potential utility of, multisource assessments of personality pathology. Second, they emphasize the need for aspects of clinical utility within some of the reviewed measures (e.g., norms, validity scales). Third, they discuss the need for further examinations into the feasibility and utility of longitudinal assessments of personality pathology (e.g., dynamics in the context of treatment). Fourth, they describe two recent measures of personality pathology that warrant further validation. Lastly, they emphasize the need for a conceptual and measurement-based consensus regarding the multidimensional nature of personality pathology as a whole.
The paper discusses the relevance of sufficient psychometric standards for dementia rating scales. The concurrent, convergent and construct validity of the Mini Mental State Examination (MMSE), the Alzheimer's Disease Assessment Scale (ADAS) and the CAMCOG are assessed. The Clinical Global Impressions and the Global Deterioration Scale are used as global scales. The concurrent and convergent validity are satisfactory. The construct validity, expressed by the Cronbach and Loevinger coefficients, is very good for all scales and subscales. Mokken's single-item coefficients show that the MMSE has the best individual hierarchical fit; the item 'reading' can be left out. The ADAS is less unidimensional, and eight items can be left out. The CAMCOG consists of too many items to apply Mokken's single-item coefficients or the Loevinger coefficient. Instead, the CAMCOG subscales are analyzed. This results in a possible reduction of the CAMCOG by 30 items to a total of 35 items. The factor analysis reveals two factors in both the MMSE and the ADAS, while the number of observations does not allow a factor analysis of the CAMCOG to be performed.
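For readers unfamiliar with the Loevinger coefficient, below is a minimal sketch of the overall scalability coefficient H for dichotomous items; full Mokken scale analysis, as used for the scales above, also handles polytomous items and item-level coefficients, which this toy version omits, and the simulated responses are placeholders rather than dementia-scale data.

```python
import numpy as np

def loevinger_H(scores):
    """Overall Loevinger scalability coefficient H for dichotomous items.

    A Guttman 'error' for an ordered item pair is passing the harder item
    while failing the easier one; H = 1 - observed errors / errors expected
    under marginal independence.
    """
    x = np.asarray(scores)
    p = x.mean(axis=0)
    order = np.argsort(-p)          # easiest item first
    x, p = x[:, order], p[order]
    n, k = x.shape
    observed = expected = 0.0
    for i in range(k):              # i is the easier item
        for j in range(i + 1, k):   # j is the harder item
            observed += np.sum((x[:, j] == 1) & (x[:, i] == 0))
            expected += n * p[j] * (1 - p[i])
    return 1 - observed / expected

# Simulated placeholder responses: one latent ability, 8 dichotomous items.
rng = np.random.default_rng(2)
theta = rng.normal(size=(500, 1))
items = (theta + rng.normal(size=(500, 8)) > np.linspace(-1, 1, 8)).astype(int)
print(loevinger_H(items))           # H >= 0.3 conventionally marks a scalable set
```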