Statement of Research Significance
Research question(s) or topic(s): This study updated semantic verbal fluency norms for older Taiwanese adults and developed demographic cut-offs to enhance the accuracy of mild cognitive impairment screening in Mandarin speakers.
Main findings: A demographically corrected threshold of 10 words (15th percentile) was identified as the impairment cut-off for community screening. Receiver Operating Characteristic analysis showed that a cut-off of 11.5 words yielded optimal diagnostic accuracy for distinguishing mild cognitive impairment from healthy controls (area under curve = .716, sensitivity = 57.8%, specificity = 73.9%).
Study contributions: This study provides updated, education-sensitive semantic verbal fluency norms tailored for the rapidly changing demographic profile of Taiwan’s aging population. It also establishes practical, demographically adjusted cut-off scores for clinical use. These findings support the inclusion of culturally appropriate semantic verbal fluency tasks in routine cognitive assessments and contribute to improved early detection of mild cognitive impairment among Mandarin-speaking older adults.
Introduction
Semantic verbal fluency (SVF) is a critical tool for assessing global cognitive function and detecting early cognitive impairment. Widely utilized in both clinical and research settings, SVF tasks assess lexical access, memory retrieval, and executive function, involving processes such as information organization, sustained attention, response inhibition, and cognitive flexibility (Delgado-Álvarez et al., 2022; Oh et al., 2019; Mathuranath et al., 2003). Due to these multidimensional demands, SVF performance provides valuable insight into an individual’s cognitive status and is a sensitive marker of early cognitive decline.
SVF is particularly relevant in identifying neurodegenerative disorders such as mild cognitive impairment (MCI), a transitional stage between normal aging and Alzheimer’s disease (AD). Individuals with MCI typically exhibit reduced word output, impaired SVF clustering (i.e., fewer semantically related words grouped together), and diminished cognitive flexibility, reflecting disruptions in semantic memory (Chang et al., 2020; Chasles et al., 2020; Henderson et al., 2023; Wang et al., 2021; Wright et al., 2023). These deficits underscore the utility of SVF for detecting early-stage cognitive decline (Cheung et al., 2004; Wang et al., 2021).
Verbal fluency comprises two main forms: phonemic verbal fluency (PVF) and SVF (Acevedo et al., 2000). PVF, which requires generating words starting with a particular letter (e.g., F, A, S), is commonly used in alphabetic language systems but is less suitable for logographic languages like Mandarin Chinese. In contrast, SVF tasks, such as naming items within a semantic category (e.g., animals or fruits), are more culturally adaptable. Nonetheless, demographic variables such as age, education, gender, and cultural familiarity significantly influence SVF performance (Ardila, 2020; Cameron et al., 2008; Chen, 2010).
Among SVF categories, “animals” is commonly used due to its presumed cultural neutrality (Ardila, 2020). However, this assumption may not hold in Taiwanese populations. In Chinese culture, the 12 zodiac animals carry strong symbolic meaning, with each individual assigned an animal based on their birth year. These animals are deeply embedded in long-term memory and are highly salient for older adults.
Based on our clinical observations, many participants rely on this culturally familiar set as a default retrieval strategy, often listing all 12 zodiac animals early in the task with minimal cognitive effort. This culturally scripted recall pattern may override typical semantic clustering strategies (e.g., land, aquatic, or flying animals), because the zodiac set (e.g., tiger, rabbit) is not organized by biological or functional category. In contrast, Western participants often begin with common domestic or farm animals (e.g., dog, cat, cow) and rely more on semantic search as they exhaust familiar examples. Taiwanese participants, by comparison, may rapidly generate the full zodiac set without engaging in deeper semantic organization. Notably, the number of zodiac animals (12) approaches, and in some cases exceeds, common clinical cut-off scores for MCI detection (e.g., 14 reported in Torralva et al., and Mirandez et al.), increasing the risk of false negatives during screening.
Categories like “fruits” and “vegetables” offer better diagnostic accuracy and more consistent demographic patterns than animals. Fruit fluency is especially effective in distinguishing MCI from normal aging, notably in well-educated individuals (Ardila, 2020; Chen, 2010; Radanovic et al., 2007). Thus, we chose the fruit category to minimize cultural bias and reliably assess semantic organization.
Although several Taiwanese studies have examined SVF using categories like fruits, vegetables, and fish, previous normative references may not reflect recent demographic changes. From 2017 to 2022, Taiwan experienced a notable increase in educational attainment among older adults; the proportion of those aged 55–64 with at least a junior high school education rose from 69.12% to 92.96% (Ministry of the Interior, 2024). Given the well-documented association between education and cognitive performance, updated normative data are essential for ensuring diagnostic accuracy.
Given these demographic changes, the psychometric strength of fruit-based SVF, and its cultural relevance, the present study aimed to update normative SVF data for older Taiwanese adults. We hypothesized that, due to increased educational attainment and health awareness in the aging population, updated norms would show higher semantic fluency scores compared to previously published data. Additionally, we sought to establish clinically meaningful cut-off scores to improve MCI detection accuracy. Lower cut-off scores are typically used to enhance sensitivity in population screening settings, while higher, specificity-focused thresholds tend to be applied when derived from MCI samples. These updated norms aim to support clinicians in accurately interpreting SVF results and enhancing early detection of cognitive decline in Taiwan’s aging population.
Method
Participants
Participants in the study comprised individuals diagnosed with MCI and cognitively healthy older adults (HC group), who were recruited from the Neurology Clinic at Taipei Veterans General Hospital by neurologists. Veteran status was not required for inclusion, and most participants were from the hospital’s general service area. MCI participants were either referred by neurologists or contacted by trained case managers in the outpatient clinic waiting area. HC participants were enlisted through community posters, with some being caregivers or family members of individuals with MCI. Participants were not financially reimbursed but received cognitive screening feedback upon completion.
MCI diagnosis followed Petersen’s criteria (Petersen et al., 2014): (1) self- or informant-reported cognitive complaints; (2) performance at least 1 SD below normative means on cognitive tests; (3) preserved independence in daily functioning; and (4) absence of dementia. Inclusion criteria for the HC group were: (1) age over 45 and (2) normal vision and hearing. Exclusion criteria for both groups were: (1) history of neurological or psychiatric conditions, including traumatic brain injury, stroke, dementia, endocrine disorders, systemic failure, or substance abuse; and (2) Mini-Mental State Examination (MMSE) scores below 24 for those with ≥ 2 years of education or below 18 for those with < 2 years of education (Wang, 2007). The study protocol was approved by the Institutional Review Board of Taipei Veterans General Hospital (Approval No. 2023-01-019CC). All participants provided written informed consent prior to their inclusion in the study, in accordance with institutional guidelines and the ethical standards of the 1964 Declaration of Helsinki and its later amendments.
Neuropsychological assessments
Global cognitive function was assessed using the Mandarin version of the MMSE (Guo et al., 1988). SVF performance was evaluated using a fruit-naming task. Participants were asked to name as many different fruits as possible within 60 s in Mandarin Chinese. Responses were recorded verbatim by the examiner, and repeated or incorrect responses (e.g., non-fruit items) were excluded from the total score. No prompting or feedback was provided.
Object naming was assessed with the Mandarin version of the Boston Naming Test (Chen et al., 2014). Auditory verbal memory was measured using a 12-item word recall test (12-item recall), including immediate recall, delayed recall, and recognition (Vanderploeg et al., 2000). Processing speed was evaluated using the Mandarin version of the Trail Making Test Part A (TMT-A; Cheng et al., 2024). All assessments were administered by trained examiners following standardized procedures.
Statistical analyses
Statistical analyses were conducted using SPSS version 27.0 (IBM Corp., Armonk, NY). All qualifying participants were analyzed using the Intent-to-Treat principle. Outliers were only excluded for data errors or rule violations, preserving ecological validity and representing real-world clinical diversity. Additional sensitivity analyses excluding potential outliers yielded comparable results, confirming the robustness of the findings.
Descriptive statistics summarized demographic and cognitive data. The Shapiro–Wilk test assessed data normality. Pearson’s or Spearman’s correlation coefficients were used as appropriate to examine relationships between SVF performance and demographic factors.
Group comparisons were conducted using analysis of variance (ANOVA) with age, education, and sex as covariates, followed by Bonferroni-corrected post hoc tests. Kruskal–Wallis tests were used for non-parametric comparisons.
Hierarchical multiple regression assessed the effects of demographic variables on raw SVF scores. Correction equations were derived from regression coefficients (González et al., 2001). The 15th percentile was used as the impairment threshold (Petersen et al., 2014; Jak et al., 2009).
Additional neuropsychological measures were incorporated into the analysis to examine cognitive associations and predictors of SVF performance. Pearson’s correlation coefficients were computed separately for the HC and MCI groups to assess relationships between SVF scores and global cognitive measures. Multiple linear regression analyses were then performed within each group to determine which cognitive variables most effectively predicted SVF performance.
Receiver operating characteristic (ROC) curve analysis identified the optimal SVF cut-off for distinguishing MCI from healthy controls. We reported the area under the curve (AUC) and its 95% confidence interval (CI). No a priori thresholds for sensitivity or specificity were established; the optimal cut-off was defined as the score with the maximum Youden’s index, balancing sensitivity and specificity. Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated. PPV and NPV estimates were based on a presumed MCI prevalence of 15.56%, derived from population-based epidemiological data (Bai et al., 2022).
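As an illustration of the cut-off selection described above, the following minimal sketch computes Youden’s index (J = sensitivity + specificity − 1) at each candidate cut-off and keeps the maximum. It is not the SPSS procedure used in the study. The function name, the toy scores, and the convention of placing candidate cut-offs midway between consecutive observed scores are assumptions made for illustration; that convention would, however, account for a non-integer value such as 11.5.

```python
def youden_cutoff(hc_scores, mci_scores):
    """Return (cutoff, J, sensitivity, specificity) maximizing Youden's J.

    Lower scores are treated as indicating impairment. Candidate cut-offs are
    midpoints between consecutive distinct observed scores (an assumed
    convention, not confirmed by the source).
    """
    values = sorted(set(hc_scores) | set(mci_scores))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for c in candidates:
        # Sensitivity: share of MCI cases scoring below the cut-off.
        sensitivity = sum(s < c for s in mci_scores) / len(mci_scores)
        # Specificity: share of healthy controls scoring above the cut-off.
        specificity = sum(s > c for s in hc_scores) / len(hc_scores)
        j = sensitivity + specificity - 1
        if best is None or j > best[1]:
            best = (c, j, sensitivity, specificity)
    return best


# Hypothetical toy scores for illustration only, NOT the study data.
toy_hc = [10, 12, 13, 14, 15, 16]
toy_mci = [8, 9, 10, 11, 11, 14]
cutoff, j, sens, spec = youden_cutoff(toy_hc, toy_mci)
print(cutoff, round(j, 3), round(sens, 3), round(spec, 3))
```

When two cut-offs tie on J, this sketch keeps the lower one; other software may resolve ties differently.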
Results
Demographic and cognitive characteristics
The HC group included 245 participants (mean age = 68.9 ± 8.3 years; mean education = 13.1 ± 6.2 years), and the MCI group included 360 participants (mean age = 71.5 ± 7.9 years; mean education = 10.8 ± 4.3 years). Significant group differences were observed in age (p < .001, Cohen’s d = .32), education (p < .001, Cohen’s d = .45), and MMSE scores (p < .001, Cohen’s d = 1.00). Group differences in sex distribution were not statistically significant (χ² test, df = 1, N = 605, p = .085, Cramér’s V = .07).
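The effect sizes for age and education can be checked from the summary statistics above. The sketch below assumes the pooled-standard-deviation form of Cohen’s d; the source does not state which variant was used, and the function name is illustrative.

```python
import math


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d using the pooled standard deviation (assumed variant)."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


# Group means, SDs, and sample sizes as reported in the text.
d_age = cohens_d(71.5, 7.9, 360, 68.9, 8.3, 245)  # MCI minus HC
d_edu = cohens_d(13.1, 6.2, 245, 10.8, 4.3, 360)  # HC minus MCI
print(round(d_age, 2), round(d_edu, 2))
```

Under this assumption the values round to .32 and .45, in agreement with the reported effect sizes.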
Normative data and demographic influences on SVF performance
SVF scores in the HC group were negatively correlated with age (r = –.414, p < .001) and positively correlated with education (r = .281, p < .001). Regression analysis showed that age, sex, and education explained 20.3%, 4.7%, and 4.0% of the variance in SVF performance, respectively. The results of the post hoc analyses are presented in Figures 1 and 2.

Figure 1. Age Group Differences in SVF. Group comparisons were analyzed using one-way ANOVA followed by Bonferroni-corrected post hoc tests. Error bars represent standard deviations. *p < .05, **p < .001.

Figure 2. Education-Level Differences in SVF. Group comparisons were analyzed using one-way ANOVA followed by Bonferroni-corrected post hoc tests. Error bars represent standard deviations. *p < .05, **p < .001.
SVF scores significantly differed between the HC and MCI groups (F(1, 601) = 59.62, p < .001, ηp² = .090) (Table 1). The HC group (M = 13.6, SD = 3.0) produced more words than the MCI group (M = 11.1, SD = 3.2). The correction equation was adjusted for sex (b = .229), age (b = −.344), and education (b = .258). After adjusting for demographic factors, the 15th percentile of the SVF scores was identified as 10, serving as the impairment threshold.
Table 1. Demographic and cognitive profiles of healthy controls and MCI participants, showing significant differences in age, education level, and MMSE performance

Note: a mean (standard deviation); b [minimum–maximum]. Abbreviations: HC = healthy older adults; MCI = mild cognitive impairment; MMSE = Mini-Mental State Examination; SVF = semantic verbal fluency.
Corrected Score = Raw Score − .229 × (Gender − 1.59) + .344 × (Age − 68.9) − .258 × (Education − 12.8),
where gender was coded as male = 1, female = 2. Age was recorded in years, and education reflects total years of formal schooling. The constants (1.59, 68.9, and 12.8) represent the sample means used for centering.
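The correction equation can be applied directly, as in the short sketch below. The function and argument names are illustrative, and the example examinee is hypothetical; the coefficients and centering constants are those given in the equation above.

```python
def corrected_svf(raw, gender, age, education):
    """Demographically corrected SVF score.

    gender: 1 = male, 2 = female; age in years; education in years of
    formal schooling. Coefficients and centering constants are taken from
    the correction equation in the text.
    """
    return (raw
            - 0.229 * (gender - 1.59)
            + 0.344 * (age - 68.9)
            - 0.258 * (education - 12.8))


# Hypothetical examinee: 75-year-old woman, 6 years of education, 9 fruits named.
example = corrected_svf(raw=9, gender=2, age=75, education=6)
print(round(example, 2))
```

An examinee whose age, education, and gender code equal the centering constants keeps the raw score unchanged. Older age and lower education adjust the score upward before it is compared with the corrected threshold.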
Cognitive correlates and predictors of SVF in HC and MCI groups
For the HC group, Spearman’s correlation showed significant relationships between SVF scores and 12-item immediate recall (r = .327, p < .001), TMT-A (r = −.321, p < .001), MMSE total scores (r = .296, p < .001), 12-item delayed free recall (r = .243, p < .001), the Boston Naming Test (r = .186, p < .001), and the sentence-writing item of the MMSE (r = .179, p = .005). Regression analysis indicated that the MMSE total score was the strongest predictor of SVF performance (β = .194, p = .001) after adjusting for age, education, and sex.
In the MCI group, SVF scores correlated significantly with MMSE (r = .410, p < .001), 12-item immediate recall (r = .473, p < .001), 12-item delayed recall (r = .407, p < .001), the Boston Naming Test (r = .322, p < .001), and TMT-A (r = −.377, p < .001). Regression analysis showed that MMSE (β = .204, p < .001) and the Boston Naming Test (β = .168, p < .001) were the strongest predictors of SVF performance after adjusting for age, education, and sex.
Diagnostic accuracy of SVF for identifying MCI
ROC analysis identified a cut-off score of 11.5 words, yielding an AUC of .716 (95% CI: .68–.76), sensitivity of 57.8%, and specificity of 73.9%. Assuming an MCI prevalence of 15.56%, the PPV was 29.0% and the NPV was 90.5% (Figure 3). As the test scores are discrete integers, the practical threshold was set at the nearest whole number (≥ 12) for clinical applicability.
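The predictive values follow from the sensitivity, specificity, and assumed prevalence by Bayes’ rule. The sketch below reproduces that arithmetic; the function name is illustrative, and the inputs are the figures reported above.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity, specificity, and presumed MCI prevalence as reported in the text.
ppv, npv = predictive_values(0.578, 0.739, 0.1556)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

Substituting a different local base rate for 0.1556 shows how strongly the PPV in particular depends on prevalence.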

Figure 3. Receiver operating characteristic curve for semantic verbal fluency. Receiver operating characteristic analysis illustrating the ability of semantic verbal fluency scores to distinguish individuals with mild cognitive impairment from healthy controls.
Discussion
This study provides updated normative SVF data and proposes clinically relevant cut-off values for identifying MCI in older Taiwanese adults. SVF performance was influenced by age, education, and sex: older participants generated fewer words, those with higher education performed better, and women outperformed men. The cut-off score of 10 is recommended for community screening to maximize sensitivity, while a score of 11.5 is suitable for clinical diagnosis due to higher specificity.
The decline in SVF performance with advancing age is consistent with previous findings and reflects underlying changes in processing speed, executive functioning, and semantic memory retrieval (Chen, 2010; Elgamal et al., 2011; Fichman et al., 2009; Mathuranath et al., 2003). A noticeable drop after age 60 highlights the test’s sensitivity to age-related neurobiological changes. Education also played a key role: individuals with more than 13 years of education outperformed those with fewer than 7 years, likely due to greater semantic knowledge and more effective verbal strategies (Cortés Pascual et al., 2019; Mathuranath et al., 2003). Women performed better than men, aligning with prior studies suggesting sex-related advantages in verbal tasks (Shirdel et al., 2022; Zarino et al., 2014). While the magnitude of gender differences may be smaller than those related to age or education, sociocultural and linguistic factors may still influence SVF performance (Ardila, 2020; Chen, 2010; Zarino et al., 2014).
As shown in Table 2, two education-adjusted cut-off scores are recommended. The 10-word threshold, corresponding to the 15th percentile in healthy controls, prioritizes sensitivity and is suitable for large-scale screening or community-based applications with low base rates of impairment. The 11.5-word cut-off provides enhanced specificity and is well-suited for clinical contexts where reducing false positives is essential, such as during confirmatory assessments or specialist referrals. As expected, cut-off scores established through ROC analyses anchored to the MCI group are more conservative than those derived from normative data, reflecting a preference for specificity over sensitivity in diagnostic confirmation. Owing to its brevity and straightforward administration, SVF can be incorporated into routine geriatric evaluations, for example, after the MMSE; however, its limited diagnostic accuracy underscores the importance of interpreting SVF results in conjunction with other cognitive assessments.
Table 2. Regression-adjusted SVF cut-off scores by education level and diagnostic purpose (screening vs. clinical confirmation)

Note: A cut-off score of 10 corresponds to the 15th percentile of SVF performance in the healthy control group and is recommended for community-based screening to minimize false negatives. A score of 11.5 is supported by ROC analysis and is recommended for clinical settings to increase specificity in MCI detection. These recommendations are based on data from older Taiwanese adults and should be applied with consideration of population-specific characteristics.
Table 3 presents cross-cultural comparisons, revealing considerable variation in SVF thresholds that reflect differences in language, task categories, and sample characteristics. The proposed cut-offs fall between lower values reported in English-speaking populations (Radanovic et al., 2007) and higher thresholds observed in younger or more educated cohorts (Acevedo et al., 2000; Radanovic et al., 2009; Muangpaisan et al., 2010; Mirandez et al., 2017; Shirdel et al., 2022). These findings support the necessity of culturally and educationally appropriate norms to ensure diagnostic accuracy.
Table 3. Summary of articles on SVF cut-off scores for fruits based on ROC analysis and normative data

Note: The PPV and NPV in this study were calculated based on the global prevalence of MCI (15.56%). Abbreviations: PPV = positive predictive value; NPV = negative predictive value.
In Taiwan, the increasing educational attainment among older adults may affect the long-term stability of SVF norms. In comparison to previous normative studies (e.g., Chen, 2010), participants in the current sample were older (M = 68.9 vs. 58.1) and had attained a higher level of education (M = 13.1 vs. 12.1), which corresponds with ongoing demographic changes in Taiwan’s aging population. These differences highlight the need for periodic updates to normative SVF data, ideally every 10 years, or the development of stratified norms by educational level.
Beyond its cross-sectional diagnostic utility, SVF may also serve as a marker for early cognitive changes. Prior research suggests that fluency impairments may precede overt clinical diagnoses and are associated with biomarker-positive states in preclinical populations (Chen, 2010; Elgamal et al., 2011; Fichman et al., 2009; Mathuranath et al., 2003). Unlike previous Taiwanese normative studies, this research uses ROC analysis to establish cut-off scores, providing preliminary evidence that SVF can help differentiate MCI from normal aging. Further longitudinal studies are warranted to investigate SVF’s predictive value for dementia conversion and to explore performance metrics such as clustering and switching.
While this study provides valuable insights, several limitations should be acknowledged. First, the normative sample was drawn from a single hospital in Taipei and may not fully represent the broader Taiwanese elderly population. Further validation in other regions and settings is warranted. Second, the cross-sectional design limits conclusions regarding the predictive validity of SVF performance or cut-off scores for future dementia conversion. In addition, MCI subtypes were not formally identified; although recruitment targeted memory complaints, the sample probably includes varied etiologies. Longitudinal studies that specify etiology are needed to overcome these limitations. Third, only one semantic category (fruits) was used, which may not capture the full range of verbal fluency. Performance may vary with other semantic or phonemic tasks. Fourth, the sensitivity and specificity of the derived SVF cut-off scores were relatively modest, indicating that the test’s diagnostic accuracy alone is limited. Because predictive values are inherently prevalence-dependent, these diagnostic indices may vary across clinical settings and should be interpreted with caution, considering the local base rate of MCI. Given these limitations, SVF performance should be evaluated in conjunction with other cognitive measures and clinical information to support more reliable diagnostic decisions. Finally, although age and education were statistically adjusted for in the analyses, residual demographic differences between the healthy control and MCI groups may still have influenced the results.
Conclusion
This study generated updated SVF norms and empirically derived cut-off scores to identify MCI in older Taiwanese adults. Incorporating ROC-based diagnostic metrics increases the clinical value of SVF, making it an efficient, culturally relevant tool for routine cognitive assessments and improving early detection of cognitive decline in Taiwan’s aging population.
Supplementary material
The supplementary material for this article can be found at http://dx.doi.org/10.1017/S1355617725101665.
Funding statement
This research was funded by Academia Sinica of Taiwan (AS-KPQ-111-KNT); the National Science and Technology Council of Taiwan (114-2314-B-075-020-MY2, 113-2321-B-001-011-, 114-2321-B-A49-011-); Taipei Veterans General Hospital (V112C-016, V113C-047, V114C-079); and the Brain Research Center, National Yang-Ming University, from the Featured Areas Research Center Program within the framework of the Higher Education Sprout Project of the Ministry of Education (MOE) in Taiwan. The interpretations and conclusions contained herein do not represent those of funding agencies.
Competing interests
The authors have no competing interests or conflicts of interest to report.




