This study aimed to translate, culturally adapt, and validate the European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire for Colorectal Cancer (EORTC QLQ-CR29) for Serbian patients.
Methods
The prospective cohort study was conducted at the Clinic for Digestive Surgery, University Clinical Center of Serbia, and included 150 Serbian-speaking colorectal adenocarcinoma patients undergoing colorectal surgery. The translation process involved rigorous forward and backward translations, pilot testing with patients, and statistical analysis for psychometric validation, including internal consistency, reliability, convergent and discriminant validity, concurrent validity, and known-groups validity.
Results
Results showed good internal consistency across most scales (Cronbach’s alpha values ranging from 0.769 to 0.855), with excellent split-half reliability (0.872). Convergent and discriminant validity analyses confirmed the questionnaire’s capacity to measure the constructs to which it was theoretically related. Significant correlations were observed between corresponding scales and items of the EORTC QLQ-C30 and EORTC QLQ-CR29 questionnaires. Known-groups analysis demonstrated the tool’s ability to distinguish between patient groups based on tumor location, stoma presence, and neoadjuvant therapy.
Significance of results
The Serbian version of the EORTC QLQ-CR29 is a reliable and valid instrument for assessing the quality of life in Serbian colorectal cancer patients, reflecting its potential for widespread clinical application.
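The internal consistency figures reported above are Cronbach's alpha values. As a minimal sketch of how such a coefficient is computed from an item-response matrix (the function name and the demo responses are hypothetical illustrations, not the study's data or code):

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for one scale.

    scores: 2-D array-like, rows = respondents, columns = items.
    """
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]                                # number of items
    item_var_sum = x.var(axis=0, ddof=1).sum()    # sum of item variances
    total_var = x.sum(axis=1).var(ddof=1)         # variance of total scores
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Hypothetical responses (5 respondents x 4 items), NOT study data.
demo = np.array([
    [1, 2, 2, 1],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 3, 4, 4],
    [2, 1, 2, 2],
])
alpha = cronbach_alpha(demo)
print(f"alpha = {alpha:.3f}")
```

Sample variances (`ddof=1`) are used for both the items and the total score; the ratio is the same with population variances as long as the choice is consistent.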
Psychometric methods are used to remove underperforming items and reduce error in existing measures, although different approaches can produce different results. This study aimed to determine the implications of applying different psychometric methods for clinical trial outcomes.
Methods
Individual participant data from 15 antidepressant treatment trials from Vivli.org were analyzed. Baseline (pretreatment) and 8-week (range 4–12 weeks) outcome data from the Montgomery-Asberg Depression Rating Scale were subjected to best-practice factor analysis (FA), item response theory (IRT), and network analysis (NA) approaches. Trial outcomes for the original summative scores and psychometric-model scores were assessed using multilevel models. Percentage differences in Cohen’s d effect sizes for the original summative and psychometrically modeled scores were the effects of interest.
Results
Each method produced unidimensional models, but the modified scales varied from 7 to 10 items. Treatment effects (d = 0.072) were unchanged for IRT (10 items), decreased by 1.3%–2.8% (eight-item abbreviated d = 0.070; weighted score d = 0.071) for NA, and increased by 11%–12.5% (seven-item abbreviated model d = 0.081; weighted score d = 0.080) for FA.
Discussion
IRT and NA yielded negligible differences in effect outcomes relative to original trials. FA increased effect sizes and may be the most effective method for identifying the items on which placebo and treatment group outcomes differ.
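The effects of interest in this abstract are percentage differences between effect sizes. A small sketch of that arithmetic, using the rounded Cohen's d values quoted in the Results (the function name is an illustrative assumption; percentages recomputed from rounded d values can differ slightly from those the authors derived from unrounded estimates):

```python
def pct_change(d_model, d_original):
    """Percentage difference of a model-based effect size vs. the original."""
    return 100.0 * (d_model - d_original) / d_original

D_ORIGINAL = 0.072  # original summative score, as reported in the abstract

# Rounded effect sizes quoted in the abstract.
reported = {
    "FA seven-item abbreviated": 0.081,
    "FA weighted score": 0.080,
    "NA eight-item abbreviated": 0.070,
    "NA weighted score": 0.071,
}

for label, d in reported.items():
    print(f"{label}: {pct_change(d, D_ORIGINAL):+.1f}%")
```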
To further investigate the “other side of the bell curve” hypothesis, the current study examined the number of low and high scores on a neuropsychological battery: 1) in cognitively unimpaired or impaired older adults, 2) as they relate to biomarkers of Alzheimer’s disease (AD), and 3) as they relate to traditional scores on this battery.
Method:
In 68 cognitively unimpaired and 97 cognitively impaired participants, the numbers of low (i.e., ≤ 16th percentile) and high (i.e., ≥ 75th percentile) scores on the Repeatable Battery for the Assessment of Neuropsychological Status (RBANS) were calculated, compared between the two groups, and related to biomarkers of AD (i.e., amyloid deposition, hippocampal volumes, ε4 alleles of Apolipoprotein E (APOE)) and RBANS Total score.
Results:
In this cognitively diverse sample, low and high scores were common, with approximately 75% having at least one low score and 86% having at least one high score. Unimpaired participants had significantly more high scores and fewer low scores than their impaired counterparts. The number of low scores was significantly related to more amyloid deposition, smaller hippocampal volume, and having one or more copies of the ε4 allele of APOE. The number of high scores was similarly related to these biomarkers. Low/high scores were comparable to traditional scores on the RBANS in identifying cognitively impaired participants.
Conclusions:
Support for the “other side of the bell curve” hypothesis was equivocal in these analyses, with both sides of the bell curve appearing to provide relevant information in a cognitively diverse sample.
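The low/high score counts described in the Method can be sketched as a simple tally over a participant's subtest percentiles, using the cutoffs stated there (≤ 16th and ≥ 75th percentile). The function name and the example profile are hypothetical and do not come from the study:

```python
def count_low_high(percentiles, low_cut=16, high_cut=75):
    """Return (number of low scores, number of high scores).

    percentiles: iterable of percentile ranks, one per subtest.
    Cutoffs default to those stated in the abstract.
    """
    values = list(percentiles)
    low = sum(1 for p in values if p <= low_cut)
    high = sum(1 for p in values if p >= high_cut)
    return low, high

# Hypothetical subtest percentiles for one participant, NOT study data.
profile = [5, 16, 50, 75, 90]
n_low, n_high = count_low_high(profile)
print(n_low, n_high)
```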
No research has assessed Hamilton Rating Scale for Anxiety (HRSA) psychometric properties in Ethiopian university students, using item response theory (IRT) and classical theory.
Aims
This study aimed to assess psychometric properties of the English HRSA in Ethiopian students, using IRT and classical theory.
Method
University students (N = 370, age 21.44 ± 2.30 years) in Ethiopia participated in a cross-sectional study. Participants completed a self-reported measure of anxiety, a sociodemographics tool and interviewer-administered HRSA.
Results
Confirmatory factor analysis (CFA) favoured a one-factor structure: fit indices for the one-factor model and two distinct two-factor models were similar, but high interfactor correlations violated discriminant validity criteria in the two-factor models. This one-factor structure showed structural invariance as evidenced by multi-group CFA across gender groups. No ceiling/floor effects were seen for the HRSA total scores. Infit and outfit mean square values for all the items were within the acceptable range (0.6–1.4). Four threshold estimates (τi1, τi2, τi3 and τi4) for each item were ordered as expected. Differential item functioning analysis showed item-level measurement invariance for all 14 HRSA items across gender for both uniform and non-uniform estimates. McDonald’s ω and Cronbach’s α for the HRSA tool were both 0.88. The convergent validity of the interviewer-administered HRSA with the self-reported anxiety subscale of the 21-item Depression, Anxiety and Stress Scale was weak to moderate.
Conclusions
The findings favour the validity of a one-factor structure of the HRSA with adequate item properties (classical and rating scale model), convergent validity, reliability and measurement invariance (structural and item level) across gender groups in Ethiopian university students.
Impairments in social interaction are common symptoms of dementia and necessitate the use of validated neuropsychological instruments to measure social cognition. We aim to investigate the Hinting Task – Dutch version (HT-NL), which measures the ability to infer intentions behind indirect speech to assess Theory of Mind, in dementia.
Method:
Sixty-six patients with dementia, of whom 22 had behavioral variant frontotemporal dementia (bvFTD), 21 had primary progressive aphasia, and 23 had Alzheimer’s disease (AD), and 99 healthy control participants were included. We examined the HT-NL’s psychometric properties, including internal consistency, between-group differences using analyses of covariance with Bonferroni-adjusted post hoc comparisons, discriminative ability and concurrent validity using the area under the receiver operating characteristic curve (AUC), and construct validity using Spearman rank correlations with other cognitive tests.
Results:
Internal consistency was acceptable (Cronbach’s α = 0.74). All patient groups scored lower on the HT-NL than the control group. Patients with bvFTD scored lower than patients with AD dementia. The HT-NL showed excellent discriminative ability (AUC = 0.83), comparable to a test of emotion recognition (ΔAUC = 0.03, p = .67). The HT-NL correlated significantly with a test for emotion recognition (r = .45), and with measures of memory and language (r = [.31, .40]), but not with measures of information processing speed, executive functioning, or working memory (r = [.00, .17]). Preliminary normative data are provided.
Conclusions:
The HT-NL is a psychometrically sound and valid instrument and is useful for identifying Theory of Mind impairments in patients with dementia.
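The discriminative ability above is summarised as an area under the ROC curve (AUC). One way to read an AUC is as the probability that a randomly chosen control scores higher than a randomly chosen patient, with ties counted as one half. The sketch below computes it that way; the function name and scores are hypothetical illustrations, not the HT-NL data or the authors' procedure:

```python
import numpy as np

def auc_pairwise(higher_group, lower_group):
    """AUC as P(score from higher_group > score from lower_group), ties = 0.5."""
    hi = np.asarray(higher_group, dtype=float)[:, None]   # column vector
    lo = np.asarray(lower_group, dtype=float)[None, :]    # row vector
    wins = (hi > lo).sum() + 0.5 * (hi == lo).sum()       # all pairwise comparisons
    return wins / (hi.size * lo.size)

# Hypothetical test scores, NOT study data.
controls = [3, 4, 5]
patients = [1, 2, 3]
print(round(auc_pairwise(controls, patients), 3))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.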
The aims of this study were to field and pilot test the Korean version of the Household Emergency Preparedness Instrument (K-HEPI) and perform psychometric testing of the instrument’s reliability and validity.
Methods
The English-to-Korean translation followed a symmetrical translation approach utilizing a decentered process (i.e., both the source and target languages were considered equally important), with a focus on the instrument remaining loyal to the content. After translation, the K-HEPI was field tested with 30 bilingual participants who all reported that the instructions were easy to understand and the items aligned closely with the original English version. The K-HEPI was then pilot tested with 399 Korean-speaking participants in a controlled, before-after study utilizing a disaster preparedness educational intervention.
Results
Confirmatory factor analyses supported the K-HEPI retaining the factor structure of the original English version. The K-HEPI was also found to be psychometrically comparable to the original instrument.
Conclusions
The K-HEPI can validly and reliably assess the disaster preparedness of Korean-speaking populations, enabling clinicians, researchers, emergency management professionals, and policymakers to gather accurate data on disaster preparedness levels in Korean communities, identify gaps in preparedness, develop targeted interventions, and evaluate the effectiveness of disaster preparedness interventions over time.
We compare the Emory 10-item, 4-choice Rey Complex Figure (CF) Recognition task with the Meyers and Lange (M&L) 24-item yes/no CF Recognition task in a large cohort of healthy research participants and in patients with heterogeneous movement disorder diagnoses. While both tasks assess CF recognition, they differ in key aspects including the saliency of target and distractor responses, self-selection versus forced-choice formats, and the length of the item sets.
Participants and Methods:
There were 1056 participants from the Emory Healthy Brain Study (EHBS; average MoCA = 26.8, SD = 2.4) and 223 movement disorder patients undergoing neuropsychological evaluation (average MoCA = 24.3, SD = 4.0).
Results:
Both recognition tasks differentiated between healthy and clinical groups; however, the Emory task demonstrated a larger effect size (Cohen’s d = 1.02) compared to the M&L task (Cohen’s d = 0.79). d-prime scoring of M&L recognition showed comparable group discrimination (Cohen’s d = 0.81). Unidimensional two-parameter logistic item response theory analysis revealed that many M&L items had low discrimination values and extreme difficulty parameters, which contributed to the task’s reduced sensitivity, particularly at lower cognitive proficiency levels relevant to clinical diagnosis. Dimensionality analyses indicated the influence of response sets as a potential contributor to poor item performance.
Conclusions:
The Emory CF Recognition task demonstrates superior psychometric properties and greater sensitivity to cognitive impairment compared to the M&L task. Its ability to more precisely measure lower levels of cognitive functioning, along with its brevity, suggests it may be more effective for diagnostic use, especially in clinical populations with cognitive decline.
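The Results refer to discrimination and difficulty parameters from a two-parameter logistic (2PL) item response model. A minimal sketch of the standard 2PL response function and item information follows, to illustrate why items with low discrimination or extreme difficulty contribute little measurement precision at a given proficiency level. The parameter values are hypothetical and are not estimates from either task:

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response.

    theta: person proficiency; a: item discrimination; b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at proficiency theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

theta = -1.0  # a lower proficiency level (hypothetical)
weak_item = item_information(theta, a=0.4, b=-3.0)    # low discrimination, extreme difficulty
strong_item = item_information(theta, a=1.8, b=-1.0)  # high discrimination, well targeted
print(f"weak: {weak_item:.3f}, strong: {strong_item:.3f}")
```

Information peaks where theta equals b and scales with the square of a, so a well-targeted, highly discriminating item is far more informative at that proficiency level.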
To describe motor, respiratory and quality of life changes in a mixed cohort of adults with spinal muscular atrophy (SMA) from a single tertiary rehabilitation center in Canada and to report preliminary psychometric evidence of a nationally recommended core outcome set over 12 months.
Methods:
This real-world, mixed-treatment cohort, exploratory, single-site, prospective observational study followed fifteen adults with SMA over 12 months. Participants completed the Spinal Muscular Atrophy Recommended Toolkit (SMART), which consists of eight outcome measures (OM) assessed at baseline and 12 months. Concurrent and predictive validity were assessed using Spearman’s Correlation Coefficient (SCC). Longitudinal change and sensitivity to change were evaluated using the Wilcoxon signed-rank test and standardized response mean.
Results:
Ten participants were receiving disease-modifying treatments. None of the OMs demonstrated statistically significant changes over 12 months. Respiratory and motor function measures formed two separate clusters. Only the Children’s Hospital of Philadelphia – Adult Test of Neuromuscular Disorders (CHOP-ATEND) exhibited high sensitivity to change. Forced vital capacity (FVC) >2 L or peak cough flow (PCF) >200 L/min corresponded with ceiling effects of the Revised Upper Limb Module (RULM) and SMA Functional Rating Scale (SMAFRS).
Conclusions:
This exploratory study identified two collinear clusters between SMART OMs, suggesting measurement redundancy. SMART OMs did not demonstrate significant changes over 12 months in this small mixed-treatment cohort. Developing new OMs that are valid, reliable and responsive, and optimizing OM selection will reduce clinic and patient burden, and improve clinical utility in a real-world setting.
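Sensitivity to change in this abstract is evaluated with the standardized response mean (SRM), conventionally defined as the mean of the paired change scores divided by the standard deviation of those change scores. A minimal sketch with hypothetical baseline and 12-month values (not SMART data; the function name is an illustrative assumption):

```python
from statistics import mean, stdev

def standardized_response_mean(baseline, follow_up):
    """SRM = mean(change) / SD(change) for paired measurements."""
    change = [f - b for b, f in zip(baseline, follow_up)]
    return mean(change) / stdev(change)   # stdev uses the sample (n - 1) formula

# Hypothetical paired scores for four participants, NOT study data.
baseline = [10, 12, 14, 16]
month_12 = [11, 14, 15, 18]
srm = standardized_response_mean(baseline, month_12)
print(f"SRM = {srm:.2f}")
```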
The purpose of this study was to measure meal quality in representative samples of schoolchildren in three cities located in different Brazilian regions using the Meal and Snack Assessment Quality (MESA) scale and examine association with weight status, socio-demographic characteristics and behavioural variables. This cross-sectional study analysed data on 5612 schoolchildren aged 7–12 years who resided in cities in Southern, Southeastern and Northeastern Brazil. Dietary intake was evaluated using the WebCAAFE questionnaire. Body weight and height were measured to calculate the BMI. Weight status was classified based on age- and sex-specific Z-scores. Meal quality was measured using the MESA scale. Associations of meal quality with weight status and socio-demographic and behavioural variables were investigated using multinomial regression analysis. Schoolchildren in Feira de Santana, São Paulo and Florianópolis had a predominance of healthy (41·8 %), mixed (44·4 %) and unhealthy (42·7 %) meal quality, respectively. There was no association with weight status. Schoolchildren living in Feira de Santana, those who reported weekday dietary intakes, and those with lower physical activity and screen activity scores showed higher meal quality. Schoolchildren aged 10–12 years, those who reported dietary intakes relative to weekend days, and those with higher screen activity scores exhibited lower meal quality.
To assess for differences in low score frequency on cognitive testing amongst older adults with and without a self-reported history of traumatic brain injury (TBI) in the National Alzheimer’s Coordinating Center (NACC) dataset.
Method:
The sample included adults aged 65 or older who completed the Uniform Data Set 3.0 neuropsychological test battery (N = 7,363) and was divided into individuals with and without a history of TBI, as well as by cognitive status as measured by the CDR. We compared TBI− and TBI+ groups by the prevalence of low scores obtained across testing. Three scores falling at or below the 2nd percentile or four scores at or below the 5th percentile were criteria for an atypical number of low scores. Nonparametric tests assessed associations among low score prevalence and demographics, symptoms of depression, and TBI history.
Results:
Among cognitively normal participants (CDR = 0), older age, male sex and greater levels of depression were associated with low score frequency; among participants with mild cognitive impairment (CDR = 0.5-1), greater levels of depression, shorter duration of time since most recent TBI, and no prior history of TBI were associated with low score frequency.
Conclusions:
Participants with and without a history of TBI largely produced low scores on cognitive testing at similar frequencies. Cognitive status, sex, education, depression, and TBI recency showed variable associations with the number of low scores within subsamples. Future research that includes more comprehensive TBI history is indicated to characterize factors that may modify the association between low scores and TBI history.
People with an intellectual disability are vulnerable to additional disorders such as dementia. Psychometrically sound and specific instruments are needed for assessment of cognitive functioning in cases of suspected dementia.
Aims
To evaluate the construct and item validity, internal consistency and test–retest reliability of a new neuropsychological test battery, the Dementia Test for People with Intellectual Disability (DTIM).
Method
The DTIM was applied to 107 individuals with intellectual disability with (n = 16) and without (n = 91) dementia. The psychometric properties of the DTIM were assessed in a prospective study. The assessors were blinded to the diagnostic assignment.
Results
Confirmatory factor analysis at the scale level showed that a one-factor model fitted the data well (root mean square error of approximation < 0.06, standardised root mean square residual < 0.08, comparative fit index > 0.9). At the domain level, one-factor models showed reasonable-to-good fit indices for five of seven domains. Internal consistency indicated excellent reliability of the overall scale (Cronbach’s α: 0.94 for dementia and 0.95 for controls). Item analysis revealed a wide range of difficulties (0.19–0.75 for dementia, 0.31–0.87 for controls), with minimal floor and ceiling effects. Eleven items (26%) had discrimination values ≤ 0.50. Test–retest reliability (n = 82) was high, with intraclass correlations of 0.95 (total score) and 0.69–0.96 (domains).
Conclusions
The DTIM fits a one-factor model and demonstrates internal and test–retest reliability; thus, it is suitable for use in cases of suspected dementia in people with various intellectual disabilities.
Measurement is an essential activity in neurology, as it allows for collecting and sharing data that can be used for description, comparison and decision making regarding the health status of patients. The adequate assessment of motor and functional signs and symptoms of movement disorders must be done with instruments that have been developed and tested following a standardized methodology. The validation of a scale or instrument is an iterative process that includes several phases and the testing of a number of psychometric properties following the principles of the Classical Test Theory or the Latent Test Theory, each with its own methods and statistical procedures. In this chapter, we review the characteristics and psychometric properties of the main measurement instruments and scales for assessing motor and functional symptoms in movement disorders, particularly those recommended by the Movement Disorders Society.
This methodological study aimed to adapt the DLS, introduced for individuals aged 18-60 years, to those aged 60 years and older and to determine its psychometric properties.
Methods
We collected the data between December 15, 2021 and April 18, 2022. We carried out the study with a sample of individuals aged 60 years and older living in the city center of Burdur, Turkey. The sample was selected using snowball sampling, a non-probability sampling technique. We collected the data using a questionnaire booklet covering an 11-item demographic information form and the DLS. We utilized reliability and validity analyses in the data analysis. The analyses were performed on SPSS 23.0, and a P value < 0.05 was considered statistically significant.
Results
The mean age of the participants was found to be 68.29 (SD = 6.36). The 61-item measurement tool was reduced to 57 items by removing a total of 4 items from the scale. We also calculated Cronbach’s α values to be 0.936 for the mitigation/prevention subscale, 0.935 for the preparedness subscale, 0.939 for the response subscale, and 0.945 for the recovery/rehabilitation subscale.
Conclusions
As adapted in this study, the DLS-S can be validly and reliably used for individuals aged 60 years and older.
The last two decades have been marked by excitement for measuring implicit attitudes and implicit biases, as well as optimism that new technologies have made this possible. Despite considerable attention, this movement is marked by weak measures. Current implicit measures do not have the psychometric properties needed to meet the standards required for psychological assessment or necessary for reliable criterion prediction. Some of the creativity that defines this approach has also introduced measures with unusual properties that constrain their applications and limit interpretations. We illustrate these problems by summarizing our research using the Implicit Association Test (IAT) as a case study to reveal the challenges these measures face. We consider such issues as reliability, validity, model misspecification, sources of both random and systematic method variance, as well as unusual and arbitrary properties of the IAT’s metric and scoring algorithm. We then review and critique four new interpretations of the IAT that have been advanced to defend the measure and its properties. We conclude that the IAT is not a viable measure of individual differences in biases or attitudes. Efforts to prove otherwise have diverted resources and attention, limiting progress in the scientific study of racism and bias.
In this paper, I will review some aspects of psychometric projects that I have been involved in, emphasizing the nature of the work of the psychometricians involved, especially the balance between the statistical and scientific elements of that work. The intent is to seek to understand where psychometrics, as a discipline, has been and where it might be headed, in part at least, by considering one particular journey (my own). In contemplating this, I also look to psychometrics journals to see how psychometricians represent themselves to themselves, and in a complementary way, look to substantive journals to see how psychometrics is represented there (or perhaps, not represented, as the case may be). I present a series of questions in order to consider the issue of what are the appropriate foci of the psychometric discipline. As an example, I present one recent project at the end, where the roles of the psychometricians and the substantive researchers have had to become intertwined in order to make satisfactory progress. In the conclusion I discuss the consequences of such a view for the future of psychometrics.
This paper analyzes the theoretical, pragmatic, and substantive factors that have hampered the integration between psychology and psychometrics. Theoretical factors include the operationalist mode of thinking which is common throughout psychology, the dominance of classical test theory, and the use of “construct validity” as a catch-all category for a range of challenging psychometric problems. Pragmatic factors include the lack of interest in mathematically precise thinking in psychology, inadequate representation of psychometric modeling in major statistics programs, and insufficient mathematical training in the psychological curriculum. Substantive factors relate to the absence of psychological theories that are sufficiently strong to motivate the structure of psychometric models. Following the identification of these problems, a number of promising recent developments are discussed, and suggestions are made to further the integration of psychology and psychometrics.
Human abilities in perceptual domains have conventionally been described with reference to a threshold that may be defined as the maximum amount of stimulation which leads to baseline performance. Traditional psychometric links, such as the probit, logit, and t, are incompatible with a threshold as there are no true scores corresponding to baseline performance. We introduce a truncated probit link for modeling thresholds and develop a two-parameter IRT model based on this link. The model is Bayesian and analysis is performed with MCMC sampling. Through simulation, we show that the model provides for accurate measurement of performance with thresholds. The model is applied to a digit-classification experiment in which digits are briefly flashed and then subsequently masked. Using parameter estimates from the model, individuals’ thresholds for flashed-digit discrimination are estimated.
Borsboom (2006) attacks psychologists for failing to incorporate psychometric advances in their work, discusses factors that contribute to this regrettable situation, and offers suggestions for ameliorating it. This commentary applauds Borsboom for calling the field to task on this issue and notes additional problems in the field regarding measurement that he could add to his critique. It also chastises Borsboom for occasionally being unnecessarily pejorative in his critique, noting that negative rhetoric is unlikely to make converts of offenders. Finally, it exhorts psychometricians to make their work more accessible and points to Borsboom, Mellenbergh, and Van Heerden (2003) as an excellent example of how this can be done.
Educational assessment concerns inference about students' knowledge, skills, and accomplishments. Because data are never so comprehensive and unequivocal as to ensure certitude, test theory evolved in part to address questions of weight, coverage, and import of data. The resulting concepts and techniques can be viewed as applications of more general principles for inference in the presence of uncertainty. Issues of evidence and inference in educational assessment are discussed from this perspective.
A taxonomy of latent structure assumptions (LSAs) for probability matrix decomposition (PMD) models is proposed which includes the original PMD model (Maris, De Boeck, & Van Mechelen, 1996) as well as a three-way extension of the multiple classification latent class model (Maris, 1999). It is shown that PMD models involving different LSAs are actually restricted latent class models with latent variables that depend on some external variables. For parameter estimation a combined approach is proposed that uses both a mode-finding algorithm (EM) and a sampling-based approach (Gibbs sampling). A simulation study is conducted to investigate the extent to which information criteria, specific model checks, and checks for global goodness of fit may help to specify the basic assumptions of the different PMD models. Finally, an application is described with models involving different latent structure assumptions for data on hostile behavior in frustrating situations.