Statement of Research Significance
Research Question(s) or Topic(s)
The psychometric properties, between-group differences, and concurrent and construct validity of the Hinting Task – Dutch version (HT-NL), a measure of Theory of Mind, in dementia.
Main Findings
The HT-NL has sound psychometric properties. Patients with behavioral variant frontotemporal dementia (bvFTD), primary progressive aphasia (PPA) and Alzheimer’s disease (AD) dementia performed worse on the HT-NL compared to control participants. Additionally, patients with bvFTD performed worse than patients with AD dementia. The HT-NL measures social cognition, but is also at least partially dependent on memory and language.
Study Contributions
This study shows that the HT-NL has sound psychometric properties, is able to detect group differences, and demonstrates concurrent and construct validity. The results support the use of the HT-NL as a measure of social cognition for the diagnostics of dementia.
Introduction
Changes in social behavior frequently occur as a result of neurodegenerative diseases (Setién-Suero et al., 2022). Social and behavioral changes are particularly prominent in the behavioral variant of frontotemporal dementia (bvFTD; Henry et al., 2016), but they also occur in primary progressive aphasia (PPA; Fittipaldi et al., 2019; Magno et al., 2022) and Alzheimer’s disease (AD) dementia (Ossenkoppele et al., 2022). Social behavior is essential to human interactions and requires intact sociocognitive abilities (De Jaegher et al., 2010). Social cognition is defined as the cognitive processes that are involved in processing social information, which underlie social interactions (Green et al., 2008). These cognitive processes comprise multiple components, as social signs need to be recognized, understood, and interpreted within their context (Adolphs, 2003; Henry et al., 2014; Kennedy & Adolphs, 2012). The sociocognitive process of understanding social information is referred to as Theory of Mind (ToM), that is, the ability to infer and understand mental states in oneself and others, and to understand that mental states in others can differ from one’s own (Henry et al., 2016; Premack & Woodruff, 1978).
Impairments in ToM have been reported in different types of dementia, including bvFTD, PPA, and AD dementia (Bora et al., 2015; Henry et al., 2014; Magno et al., 2022), and pose a burden on patients and their caregivers (Brioschi Guevara et al., 2015; Formica et al., 2020).
Despite the importance of measuring ToM and the recognition of social cognition as one of the six core components of neurocognitive functioning (American Psychiatric Association, 2013), social cognition is not routinely assessed in clinical practice (McDonald et al., 2023). An important reason is that few neuropsychological tests aimed at social cognition, and in particular ToM, are found to be psychometrically sound instruments. Developing and validating neuropsychological tests for ToM, and social cognition in general, is fundamental for ensuring a reliable and accurate diagnostic assessment of sociocognitive functioning in dementia.
To this end, the Hinting Task (Corcoran et al., 1995) is a promising neuropsychological instrument. The Hinting Task measures the ability to infer intentions behind indirect speech. Originally, the Hinting Task was developed to assess ToM in patients with schizophrenia and has since been investigated primarily in patient groups with various neuropsychiatric disorders, showing sensitivity to impairments in ToM (e.g. Bora et al., 2005; Pinkham et al., 2018). In these clinical populations, the Hinting Task has demonstrated adequate to good results concerning discriminative ability, test-retest reliability, internal consistency, and construct validity (Halverson et al., 2022; Klein et al., 2020; Morrison et al., 2019; Pinkham et al., 2018; Tsui et al., 2024). However, ceiling effects have been reported (Davidson et al., 2018; Frøyhaug et al., 2019), though not consistently (Pinkham et al., 2018).
Limited research has been performed on the Hinting Task in dementia (Braak et al., 2022; Van ’t Hooft et al., 2024). Differences have been found between patients with AD dementia and control participants (Braak et al., 2022), albeit not invariably (Van ’t Hooft et al., 2024). Despite the profound impairments in social cognition in frontotemporal dementia (FTD) spectrum disorders, only one study investigated the Hinting Task in the FTD spectrum and found that patients with bvFTD performed worse compared to control participants and patients with AD dementia (Van ’t Hooft et al., 2024). A psychometric evaluation of the Hinting Task in dementia, however, has not been performed, leaving important questions on the validity of its use in memory clinics unanswered.
The aims of the present study are (1) to examine the psychometric properties of the Hinting Task – Dutch version (HT-NL); (2) to evaluate differences in HT-NL scores between patients with bvFTD, PPA, and AD dementia and control participants, to investigate how well the HT-NL discriminates patients with dementia from control participants, and to compare this discriminative ability with that of another test of social cognition as a measure of concurrent validity; and (3) to examine the associations between the HT-NL and other cognitive tests to assess construct validity. Preliminary normative data based on the control group are also reported to support the use of the HT-NL in research and clinical practice.
Method
Participants
This study included 66 patients with dementia, diagnosed with bvFTD (n = 22), PPA (n = 21; semantic variant PPA = 8, nonfluent variant PPA = 6, logopenic variant PPA = 7), or AD dementia (n = 23), who visited the outpatient memory clinic of Alzheimer Center Erasmus MC in Rotterdam, the Netherlands, between June 2022 and April 2024 for a standardized diagnostic assessment, including a neurological examination, neuropsychological assessment, laboratory testing, and structural magnetic resonance imaging. Results were discussed in a multidisciplinary consensus meeting by neurologists, geriatricians, radiologists, and neuropsychologists in which a clinical diagnosis was made based on the international clinical criteria for bvFTD (Rascovsky et al., 2011), PPA (Gorno-Tempini et al., 2011), and AD (McKhann et al., 2011). Participants were included in the present study only if they had intact comprehension of the test instructions.
Control participants (n = 99) were healthy, community-dwelling adults recruited through word of mouth in the greater areas of Rotterdam and Groningen in the Netherlands. Control participants were included if they had no self-reported history of neurological or psychiatric disorders, scored below cutoff (< 11) on the anxiety and depression subscales of the Hospital Anxiety and Depression Scale (HADS; Bjelland et al., 2002), and scored above 25 points on the Mini-Mental State Examination (MMSE; Folstein et al., 1975).
This research was approved by the Medical Ethics Review Committee of the Erasmus MC University Medical Center (MEC-2022-0546) and was completed in accordance with the Helsinki Declaration. All participants gave written informed consent for their data to be used for scientific analysis.
Measures
The Hinting Task
The Hinting Task is a verbal test that measures the ability to infer intentions behind indirect speech in order to assess ToM. Ten passages describing an interaction between two characters are read aloud to the participant. In each passage, one character makes an indirect inquiry and participants are asked to infer the meaning of this hint. If the initial answer is incorrect, a continuation of the passage with a more obvious hint is provided. A correct response yields a score of 2, a correct response after providing the extra hint yields a score of 1, and if the response remains incorrect after the extra hint, a score of 0 is given. These are the original scoring criteria as reported by Corcoran et al. (1995). Total scores range from 0 to 20, with higher scores suggesting better ToM ability. An example of an item is: George arrives at Angela’s office after a long, hot trip on the motorway. Angela immediately begins to talk about some business ideas. George interrupts Angela to say: “Ugh, it has been a long, hot trip on the motorway.” The question asked is: What does George really want to say when he says this? If an incorrect response is given, the following hint is provided: George continues to say, “I am thirsty.” What does George want Angela to do?
The Hinting Task – Dutch version (HT-NL) was developed with permission of the authors of the original version of the Hinting Task (Corcoran et al., 1995). We translated and adapted the Hinting Task to the Dutch language and culture. Changes in the HT-NL relative to the original version were minor, such as names of the characters (e.g. Gerard for George). The translation from English to Dutch was performed by two independent raters (EB and HB) and the resulting translation was back-translated by a native English speaker. Differences in translations were minor and were resolved by consensus.
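The scoring rule described above can be sketched in a few lines of Python. This is only an illustration of the 2/1/0 rule and the 0 to 20 total; the function names and the example participant are hypothetical and not part of the test materials.

```python
from typing import Optional, Sequence


def score_item(first_correct: bool, second_correct: Optional[bool] = None) -> int:
    """Score one item: 2 = correct at once, 1 = correct after the extra hint, 0 = otherwise."""
    if first_correct:
        return 2
    return 1 if second_correct else 0


def total_score(item_scores: Sequence[int]) -> int:
    """Sum the ten item scores into a total between 0 and 20."""
    if len(item_scores) != 10:
        raise ValueError("expected 10 item scores")
    if any(s not in (0, 1, 2) for s in item_scores):
        raise ValueError("item scores must be 0, 1, or 2")
    return sum(item_scores)


# Hypothetical participant: 7 items correct at once, 2 correct after the
# extra hint, 1 still incorrect after the extra hint.
responses = [(True, None)] * 7 + [(False, True)] * 2 + [(False, False)]
items = [score_item(first, second) for first, second in responses]
print(total_score(items))  # 7*2 + 2*1 + 0 = 16
```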
Cognitive assessment
All participants in this study completed a set of neuropsychological tests, but the test batteries differed slightly between patient groups as the neuropsychological assessment was part of the clinical diagnostic assessment. The MMSE was used as a measure of global cognition and as an estimator of disease severity (Folstein et al., 1975).
The Trail Making Test (TMT; Corrigan & Hinkeldey, 1987) part A was included to measure information processing speed, and TMT part B and the TMT B/A index were included to measure executive functioning. The 60-item Boston Naming Test (BNT; Kaplan et al., 1983) was included as a measure of language, specifically confrontation naming. The category (animal) and letter (D-A-T) fluency tests (Schmand et al., 2008) were included as measures of language and executive functioning. Verbal episodic memory was measured using the immediate recall (sum score of trials 1–5) and the delayed recall of the Dutch version of the Rey Auditory Verbal Learning Test (RAVLT; Van der Elst et al., 2005). The Digit Span test from the WAIS-IV (Wechsler, 2012), including the forward, backward, and sequencing conditions, was included as a measure of working memory. Social cognition was assessed using the Emotion Recognition Test (ERT; Kessels et al., 2014) as a measure of facial emotion recognition. Control participants did not perform the BNT, RAVLT, or Digit Span test.
Level of education was recorded using seven categories in line with the Dutch educational system, in which 1 corresponds to having completed less than primary school and 7 corresponds to having an academic degree (Duits & Kessels, 2014). These levels were converted to years of education in accordance with the Anglo-Saxon educational system.
Statistical analysis
To examine the sample characteristics, we evaluated between-group differences using analyses of variance, except for the categorical variable sex, for which a chi-square test was performed.
In the first step, we examined the psychometric properties of the HT-NL in the total sample (n = 165). After exploring the data distribution of the HT-NL, we assessed the internal consistency using Cronbach’s alpha and Pearson inter-item correlations. Associations of age, sex, and level of education with HT-NL score were assessed using multiple linear regression.
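For readers unfamiliar with the statistic, Cronbach’s alpha can be computed from a participants-by-items score matrix as k/(k − 1) × (1 − sum of item variances / variance of total scores). The sketch below uses only the Python standard library and made-up item scores (three items instead of ten, for brevity); it is not the SPSS procedure used in this study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of participants, each a list of k item scores."""
    k = len(rows[0])
    # Sample variance of each item across participants.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the total score across participants.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up scores (0, 1, or 2 per item) for five participants on three items.
toy_scores = [
    [2, 2, 1],
    [1, 2, 2],
    [0, 1, 0],
    [2, 1, 2],
    [1, 0, 1],
]
alpha = cronbach_alpha(toy_scores)
print(f"{alpha:.2f}")  # 0.67
```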
In the second step, between-group differences in HT-NL scores were examined using analysis of covariance controlling for age. Estimated marginal means were reported and partial eta squared (ηp²) was used as a measure of effect size. Post hoc pairwise comparisons were adjusted using the Bonferroni correction. In a secondary analysis, the aforementioned analysis was additionally adjusted for MMSE score as an estimator of disease severity. In the few cases in which a patient had completed the Montreal Cognitive Assessment (MoCA; Nasreddine et al., 2005) instead of the MMSE, the MoCA score was converted to an equivalent MMSE score according to Van Steenoven et al. (2014). Power analyses were performed using G*Power 3.1.9.7, selecting the statistical test ‘ANCOVA: Fixed effects, main effects and interactions’ within the family of F-tests. Our sample size was sufficient to detect medium to large effects with a power of .80 and the alpha-level set at .05.
The discriminative ability of the HT-NL was assessed using receiver operating characteristic (ROC) analysis, with the area under the curve (AUC) calculated for the contrast of patients with dementia (n = 66) and control participants (n = 99). No analyses with subgroups of patients with dementia were performed due to small sample sizes. Additionally, we compared the discriminative ability of the HT-NL with the ERT (serving as a reference test for social cognition) as a measure of concurrent validity by comparing the AUCs of the two tests for the contrast of patients with dementia and control participants using a paired-sample design. An AUC below .70 was considered poor discrimination, between .70 and .80 was considered acceptable discrimination, between .80 and .90 was considered excellent discrimination, and above .90 was considered outstanding discrimination (Hosmer & Lemeshow, 2000).
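As a worked check on the effect size, partial eta squared can be recovered from an F statistic and its degrees of freedom as (F × df_effect) / (F × df_effect + df_error). The short sketch below applies this identity to the two group-effect F statistics reported in the Results; the function name is ours.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Group effect adjusted for age: F(3, 160) = 26.71
print(f"{partial_eta_squared(26.71, 3, 160):.2f}")  # 0.33
# Group effect adjusted for age and MMSE: F(3, 154) = 12.45
print(f"{partial_eta_squared(12.45, 3, 154):.2f}")  # 0.20
```

Both values agree with the effect sizes reported alongside these F statistics.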
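To make the AUC concrete: for a test on which lower scores indicate impairment, the AUC equals the probability that a randomly chosen control participant scores higher than a randomly chosen patient, with ties counted as one half. The sketch below computes this directly and applies the interpretation bands listed above. The scores are made up, and assigning the boundary values .70, .80, and .90 to the higher band is our assumption.

```python
def auc_lower_is_impaired(patient_scores, control_scores):
    """P(control > patient) + 0.5 * P(control == patient) over all patient-control pairs."""
    wins = 0.0
    for p in patient_scores:
        for c in control_scores:
            if c > p:
                wins += 1.0
            elif c == p:
                wins += 0.5
    return wins / (len(patient_scores) * len(control_scores))


def label_auc(auc):
    """Interpretation bands; boundary values go to the higher band (assumption)."""
    if auc < 0.70:
        return "poor"
    if auc < 0.80:
        return "acceptable"
    if auc < 0.90:
        return "excellent"
    return "outstanding"


# Made-up total scores for four patients and four control participants.
patients = [12, 15, 17, 10]
controls = [18, 17, 16, 19]
auc = auc_lower_is_impaired(patients, controls)
print(auc, label_auc(auc))  # 0.90625 outstanding
print(label_auc(0.83))      # excellent
```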
In the third step, construct validity was assessed using exploratory Spearman rank correlation analyses between the HT-NL and other cognitive tests in the patient group (n = 66). The correlation between the HT-NL and the ERT was examined for convergent validity. Correlations with the TMT, BNT, category and letter fluency tests, RAVLT, and Digit Span test were examined for divergent validity.
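A Spearman rank correlation is a Pearson correlation computed on ranks, with tied values given the average of their ranks. The following standard-library sketch shows the computation on made-up scores; the variable names and data are illustrative only.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the rank-transformed values."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Made-up scores for five patients on two tests (note the tie at 14).
test_a = [10, 14, 14, 18, 19]
test_b = [30, 41, 38, 52, 50]
rho = spearman_rho(test_a, test_b)
print(f"{rho:.2f}")  # 0.87
```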
Preliminary normative data were calculated as percentiles based on the distribution of HT-NL scores in the control group (n = 99), after investigating the effects of age, sex, and level of education using multiple linear regression.
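As an illustration of percentile-based norms, the sketch below tabulates, for each observed score, the percentage of a reference group scoring at or below it. This cumulative-percentage definition is an assumption made for the example (other percentile definitions exist), and the scores are made up rather than taken from the control group.

```python
from collections import Counter


def cumulative_percentiles(scores):
    """Map each observed score to the percentage of the group scoring at or below it."""
    counts = Counter(scores)
    n = len(scores)
    running = 0
    table = {}
    for score in sorted(counts):
        running += counts[score]
        table[score] = 100 * running / n
    return table


# Made-up total scores for ten reference participants.
reference_scores = [16, 17, 17, 18, 18, 18, 19, 19, 20, 20]
norms = cumulative_percentiles(reference_scores)
for score, pct in norms.items():
    print(score, pct)
# 16 10.0
# 17 30.0
# 18 60.0
# 19 80.0
# 20 100.0
```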
All analyses were performed with IBM SPSS Statistics 28.0 (IBM Corporation, 2021), with alpha set at .05.
Results
Characteristics
The participant characteristics are shown in Table 1. Patients with bvFTD, PPA, and AD dementia and control participants differed in age (F(3, 161) = 4.88, p < .01, η² = .08). Patients with bvFTD were younger than patients with AD dementia (p < .001), patients with PPA (p < .05), and control participants (p = .01), and control participants were younger than patients with AD dementia (p = .04). No differences were found between the groups in the distribution of sex (χ²(3, N = 165) = 1.89, p = .60) or in years of education (F(3, 161) = 0.61, p = .61, η² = .01). Months since symptom onset did not differ between the patient groups (F(2, 62) = 0.35, p = .70, η² = .01). MMSE scores differed significantly (F(3, 156) = 60.67, p < .001, η² = .54). All patient groups (p’s < .001) scored lower than the control group, and patients with AD dementia scored lower than patients with bvFTD (p < .001).
Table 1. Characteristics of the control group and patient groups

Note: Data are represented as mean ± standard deviation, unless otherwise specified. Abbreviations: MMSE = Mini-Mental State Examination; RAVLT = Rey Auditory Verbal Learning Test; Control = control participants; bvFTD = behavioral variant frontotemporal dementia; PPA = primary progressive aphasia; AD = Alzheimer’s disease.
Psychometric properties of the HT-NL
Patients with dementia (n = 66) had a mean HT-NL score of 14.2 ± 3.8, with scores ranging from 1 to 19. Control participants (n = 99) had a mean score of 17.7 ± 1.5, with scores ranging from 13 to 20.
Overall, none of the patients and 12 (12.1%) control participants obtained a maximum score of 20. At the item level, item 8 (unpacking shelves) had the highest proportion of both patients (50.0%) and control participants (10.1%) scoring zero points. Most patients obtained the maximum score on item 5 (no money; 81.8%), while most control participants obtained the maximum score on item 10 (heavy luggage; 94.9%).
Based on the total sample (n = 165), internal consistency was acceptable (Cronbach’s α = .74). None of the items were removed, as doing so would have resulted in a lower internal consistency. Inter-item correlations ranged from small to moderate (r = [.07, .41]) with a mean inter-item correlation of .24 ± .08.
The overall model including age, sex, and level of education as predictors of HT-NL score was not significant in the total sample (F(3, 161) = 0.68, p = .56, R² = .01). None of the individual predictors, age (B = 0.01, SE = 0.03, t = 0.26, p = .80), sex (B = −0.64, SE = 0.49, t = −1.29, p = .20), and years of education (B = −0.04, SE = 0.08, t = −0.54, p = .59), were predictive of HT-NL score.
Between-group analyses and discriminative ability
HT-NL scores differed significantly between patients with bvFTD, PPA, and AD dementia and control participants after adjusting for age (Table 2; F(3, 160) = 26.71, p < .001, ηp² = .33). All three patient groups scored lower than the control group (bvFTD: p < .001, PPA: p < .001, AD dementia: p < .01). Within the patient groups, patients with bvFTD scored lower than patients with AD dementia (p = .02). No differences were found between patients with bvFTD and patients with PPA (p = .99), nor between patients with AD dementia and patients with PPA (p = .14).
Table 2. Inferential statistics and adjusted means of the Hinting Task – Dutch version

Note: Data are reported as estimated marginal means ± standard error, which are adjusted for age and for age and MMSE. Abbreviations: HT-NL = Hinting Task – Dutch version; MMSE = Mini-Mental State Examination; Control = control participants; bvFTD = behavioral variant frontotemporal dementia; PPA = primary progressive aphasia; AD = Alzheimer’s disease.
Additional adjustment for MMSE score as an estimator of disease severity yielded largely similar results (F(3, 154) = 12.45, p < .001, ηp² = .20), except that the difference between patients with AD dementia and control participants was no longer significant (p = .99). Patients with bvFTD (p < .001) and patients with PPA (p < .001) scored lower than control participants. Patients with bvFTD (p < .01) and patients with PPA (p < .01) scored lower than patients with AD dementia. No differences were found between patients with bvFTD and patients with PPA (p = .99).
The ROC analysis, to assess the ability of the HT-NL to distinguish between patients with dementia and control participants, yielded an AUC of 0.83 (SE = 0.03, 95% confidence interval (95%CI) = [0.77, 0.89]), indicating excellent discrimination. When comparing the AUCs of the HT-NL and the ERT (serving as a reference test for social cognition), no significant difference was found, indicating that both tests have similar discriminative ability and that the HT-NL demonstrates concurrent validity (Figure 1; ΔAUC = 0.03, SE = 0.32, 95%CI = [−0.09, 0.15], p = .67).

Figure 1. The receiver operating characteristic curves of the Hinting Task – Dutch version and the Emotion Recognition Test. Note. Abbreviations: HT-NL = Hinting Task – Dutch version; ERT = Emotion Recognition Test.
Associations between the HT-NL and other cognitive tests
The correlations between the HT-NL and other cognitive tests in the patient group are shown in Table 3.
Table 3. Spearman rank correlations between the Hinting Task – Dutch version and other cognitive tests in the patient group (n = 66)

Note: Correlations in bold are significant at the .05 level. Abbreviations: HT-NL = Hinting Task – Dutch version; RAVLT = Rey Auditory Verbal Learning Test.
The HT-NL had a moderate to large positive correlation with the ERT, indicating convergent validity. The HT-NL had moderate positive correlations with the BNT and the category fluency test, and a moderate to large positive correlation with the RAVLT immediate recall. No associations were found with TMT part A, TMT part B, TMT B/A index, letter fluency test, RAVLT delayed recall, and Digit Span test, indicating divergent validity.
Preliminary normative data
Preliminary normative data based on the control group (n = 99, age = 67.9 ± 7.7, male sex = 44 (44%), years of education = 12.2 ± 3.0) are presented in Table 4. The overall model including age, sex, and level of education as predictors of HT-NL score was not significant in the control group (F(3, 95) = 1.20, p = .31, R² = .04). None of the individual predictors, age (B = −0.01, SE = 0.02, t = −0.27, p = .79), sex (B = −0.55, SE = 0.31, t = −1.78, p = .08), and years of education (B = 0.03, SE = 0.05, t = 0.67, p = .50), were predictive of HT-NL score.
Table 4. Preliminary normative data based on the control group (n = 99)

Note: Based on 99 control participants (age = 67.9 ± 7.7, male sex = 44 (44%), years of education = 12.2 ± 3.0). Abbreviation: HT-NL = Hinting Task – Dutch version.
Discussion
The current study examined the psychometric properties of the HT-NL, group differences between patients with bvFTD, PPA, and AD dementia and control participants, and the ability of the HT-NL to distinguish patients with dementia from control participants, which was compared with that of a reference test for social cognition to assess concurrent validity. It also examined associations between the HT-NL and other cognitive tests to assess construct validity. The results showed that the HT-NL has acceptable internal consistency and that its scores were not associated with age, sex, or level of education. All patient groups performed worse than the control group, and patients with bvFTD performed worse than patients with AD dementia. The HT-NL could distinguish between patients with dementia and control participants, similar to a reference test for social cognition, thereby showing concurrent validity. The HT-NL showed convergent validity by its association with a test of facial emotion recognition. Significant associations with measures of memory and language were also found. Divergent validity was indicated by the absence of associations with measures of information processing speed, executive functioning, and working memory.
The psychometric properties found in this study are overall in line with previous literature about the Hinting Task in different clinical populations (Halverson et al., 2022; Klein et al., 2020; Morrison et al., 2019; Pinkham et al., 2018; Tsui et al., 2024). Similar to Halverson et al. (2022), no item could be removed to increase the internal consistency, and we thereby find no strong support for a shorter version of the HT-NL. We applied the original scoring criteria instead of the more stringent scoring criteria as described by Klein et al. (2020), which have been found to be more sensitive and to reduce ceiling effects compared to the original scoring. In our sample, however, no clear evidence was found for ceiling effects on the total score of the HT-NL or on any of the items. This held for both patients and control participants and aligns with the absence of ceiling effects in the study by Pinkham et al. (2018), but further research could investigate whether the different scoring criteria yield significantly different results for the HT-NL. The absence of ceiling effects does suggest that the HT-NL is sufficiently challenging to capture the range of performances within dementia, perhaps more than in psychiatric disorders.
However, the number of control participants who obtained the maximum score also appeared to be slightly lower compared to previous research in healthy control participants (Frøyhaug et al., 2019; Klein et al., 2024). This could perhaps be due to our control participants being older in order to match the patients with dementia. Additionally, we cannot exclude a difference in difficulty between the original English version and the Dutch version, potentially related to a cultural difference in (in)directness in communication style (Labrie et al., 2020). Our findings thus highlight the importance of the international validation of instruments for social cognition (Bourdage et al., 2024).
Our results showed that all patient groups performed worse than the control group. In line with these results, worse performance on the Hinting Task has been reported for patients with AD (Braak et al., 2022) and bvFTD (Van ’t Hooft et al., 2024) compared to control participants. The results are also consistent with the broader literature on social cognition in dementia, which shows that patients with AD dementia and bvFTD score lower than control participants on other tests measuring ToM (Henry et al., 2014). Additionally, we found that patients with bvFTD scored lower than patients with AD dementia, which also aligns with a previous study using the Hinting Task (Van ’t Hooft et al., 2024) and with overall findings on ToM in bvFTD and AD dementia (Bora et al., 2015; Henry et al., 2014). Within PPA, relatively few studies examined ToM, or social cognition in general. However, similar to our results, impairments in ToM have been found in patients with PPA compared to healthy control participants (Fittipaldi et al., 2019; Magno et al., 2022). Accounting for disease severity using MMSE scores affected the results only for the patients with AD dementia, who no longer differed significantly from control participants.
This could be due to the broader cognitive impairments in AD dementia, in which impairments in social cognition generally occur later in the disease course than in bvFTD and PPA (Setién-Suero et al., 2022). Impaired social cognition is a specific deficit in bvFTD, whereas in AD dementia it is less specific relative to the other cognitive impairments (Bora et al., 2016). Of note, the MMSE is not an optimal estimator of disease severity in bvFTD and PPA (Bora et al., 2016; Premi et al., 2016), for which the MoCA may be more appropriate (De Boer et al., 2025). The HT-NL showed similar discriminative ability to the ERT, which served as our reference test for social cognition, thereby showing concurrent validity and underlining the diagnostic applicability of the HT-NL.
Our results in terms of construct validity indicate that the HT-NL shows convergent validity through a moderate to large association with a test for facial emotion recognition, which is consistent with previous studies (Braak et al., 2022; Frøyhaug et al., 2019; Mallawaarachchi et al., 2019). We also found significant associations with measures of memory and language, indicating that performance on the HT-NL is at least partly dependent on these cognitive functions, which corroborates previous findings (Deckler et al., 2018; Pérez-Flores et al., 2024). These associations are to be expected as the brief passages have to be understood and retained for a short period of time, after which a verbal response is required. These findings underline that designing tests for social cognition that solely measure a sociocognitive process is difficult.
The results thus reflect the complexity of sociocognitive processes (Thibaudeau et al., 2020) and may call for the development of more direct measures of social cognition, such as gaze direction by means of eye tracking (Bueno et al., 2019; Singleton et al., 2023) or automated speech analysis including linguistic and acoustic features, such as the emotional intensity of words (Vonk et al., 2024). The HT-NL showed divergent validity through the absence of associations with tests of information processing speed, executive functioning, and working memory. Previous studies corroborate the absence of associations with measures of information processing speed and executive functioning, although a previously reported association with working memory was not replicated in the present study (Deckler et al., 2018; Kosutzka et al., 2019).
Strengths of this study include the large group of patients with dementia and the direct comparisons between patients with bvFTD, PPA, and AD dementia and control participants. Moreover, we report preliminary normative data to support the use of the HT-NL in research and clinical practice. A limitation of this study is the heterogeneity of the sample inherent to comparing different types of dementia. Disease onset and progression vary between dementia types, meaning that disease severity is not proportional to the time since disease onset, which complicates patient matching (Rascovsky et al., 2005; Rogalski & Mesulam, 2009). Another limitation is the cross-sectional rather than longitudinal design, which did not allow for evaluating the stability of test scores or the ability to measure decline in ToM, an aspect particularly relevant given the progressive nature of neurodegenerative diseases. In addition, we compared the HT-NL only to a measure of facial emotion recognition to assess convergent validity, whereas comparison with other tests measuring ToM would be preferable; these tests were, however, unavailable in the present study. Furthermore, as social cognition is a multidimensional construct, investigating associations between the HT-NL and other sociocognitive components would provide insight into the embedding of the HT-NL within the broader theoretical framework of social cognition and advance understanding of the interrelations among sociocognitive components (Van den Stock et al., 2021).
Accordingly, the HT-NL could be compared to tests assessing higher-order sociocognitive components, such as emotion regulation, moral reasoning, and social knowledge, in addition to tests assessing ToM and emotion recognition (Eikelboom et al., 2025). Lastly, for a few participants only a MoCA score was available instead of an MMSE score. These scores were converted to MMSE scores according to Van Steenoven et al. (2014), but we acknowledge that using a single test is preferable to relying on conversions.
Our results underline the importance of measuring social cognition in different types of dementia, even in those with relatively mild sociocognitive symptoms (Quesque et al., 2024). The HT-NL has been found to be a clinically useful test to recognize ToM impairments and can therefore facilitate the routine assessment of social cognition in dementia (McDonald et al., 2023). Preliminary normative data based on the control group (Table 4) are reported to support the use of the HT-NL in research and clinical practice, although these should be used with caution until further validation in larger samples.
In conclusion, the HT-NL has sound psychometric properties and shows adequate construct validity. The HT-NL is able to distinguish between patients with dementia and control participants, but more caution is warranted when differentiating between types of dementia. Altogether, the HT-NL is a useful test to identify impairments in ToM in patients with dementia as part of a comprehensive neuropsychological assessment.
Acknowledgements
The authors thank E.L. van der Ende for her help in translating the Hinting Task. This work is part of the YOD-INCLUDED project, a Dutch research project aimed at improving early recognition and diagnosis, studying (hereditary) causes, and providing appropriate care and psychosocial support to people with young-onset dementia and their families.
Funding statement
This work was supported by ZonMw Onderzoeksprogramma Dementie [project number 10510032120002]; Alzheimer Nederland [project number WE.09-2023-04]; and the Bluefield project. Several authors of this publication are members of the European Reference Network for Rare Neurological Diseases – Project ID No. 739510.
Competing interests
The authors declare no potential conflicts of interest.