1. Introduction
Verbal fluency (VF) tasks are fundamental instruments in the assessment of cognitive functioning (Demetriou & Holtzer, 2017), particularly in the elderly, because they index the speed of access to the mental lexicon and allow its integrity, as well as the integrity of semantic memory, to be assessed. VF tasks also involve executive functions, such as updating, inhibitory control, cognitive flexibility and self-regulation (Aiello et al., 2022; Costa et al., 2014; Henry, Crawford & Phillips, 2004; Lam & Marquardt, 2022; Shao, Janse, Visser & Meyer, 2014; Troyer, 2000). There are four VF variants (grammatical, phonological, categorical/semantic and combined forms); each is performed within a fixed time limit (usually 60 seconds) and under rules defining which responses are not allowed during evocation.
Grammatical VF involves the evocation of adjectives or verbs during a limited time (Piatt, Fields, Paolo & Tröster, 2004). The most widely used forms of VF tasks are the phonological variant (hereafter PVF) and the semantic variant (SVF). PVF requires the production of words that begin with a specific letter (e.g., A), whereas SVF requires the production of words belonging to a specific semantic category (e.g., animals) (Demetriou & Holtzer, 2017). PVF is thought to primarily involve controlled access to memory representations in the mental lexicon, with support from the frontal lobe (Baldo, Schwartz, Wilkins & Dronkers, 2010; Marko et al., 2023). The search process in SVF mirrors the natural way of searching and retrieving information from the lexicon (Shao, Janse, Visser & Meyer, 2014) and consequently reflects automatic associative retrieval processes related to the integrity of the temporal lobes. Thus, the search process and performance in SVF, in addition to executive functions, rely on the organization of semantic memory (Henry, Crawford & Phillips, 2004). In contrast, PVF must adhere to artificial constraints, which require the suppression of natural retrieval methods (Heim, Eickhoff, Friederici & Amunts, 2009). This is why PVF is considered to have a greater executive load; thus, many researchers view these tasks as valuable tools that are particularly sensitive in identifying cognitive disorders affecting the frontal lobe (Cipolotti et al., 2018; Rami et al., 2007).
The cognitive mechanisms underlying each task are difficult to delineate because VF is multidimensional and maps onto cognitive domains including language, executive functions and processing speed. For example, Kiselica, Webber, and Benge (2020) reported that PVF loaded on executive functioning/speed factors, while SVF loaded on language factors. In other studies, both PVF and SVF loaded on the language factor, with limited support for an executive contribution to either type (Dowling, Hermann, La Rue & Sager, 2010; Whiteside et al., 2016). More recently, Aita et al. (2019) found that both tasks loaded onto a unique executive factor.
A special mention must be made of two combined VF forms: alternating SVF and PVF tasks (Costa et al., 2014; Lam & Marquardt, 2022), and SVF with orthographic constraint (Shores, Carstairs & Crawford, 2006). Alternating VF, which requires shifting between semantic categories or letters throughout a task, involves additional attentional resources to track and prepare for upcoming semantic categories or phonemic targets. More importantly, shifting during alternating VF requires active suppression of clustering strategies, which should lead to greater cognitive effort compared to non-alternating VF (Lam & Marquardt, 2022). Compared to typical fluency tasks, the orthographic constraint VF task is assumed to require more inhibitory control abilities (Macoir, Tremblay & Hudon, 2022). To our knowledge, Macoir et al. (2022) is the only previous study showing that VF tasks with higher executive load are useful for detecting cognitive deficits in the preclinical stage of Alzheimer’s disease (AD). However, Macoir et al. did not compare alternating VF and orthographic constraint VF to assess their level of difficulty (executive processing load). In the absence of prior empirical data establishing their relative difficulty, we assume that these VF variants are functionally comparable in terms of their executive demands.
Given all these arguments, it seems possible to establish a gradient of executive processing load across VF tasks: SVF < PVF < Alternating VF = Orthographic Constraint VF (the equal sign indicates that the two tasks are functionally comparable in terms of executive demands).
Several factors, including age, educational level and gender, significantly influence performance on VF tasks, with education being the most impactful (Ekström, Josefsson, Bäckman & Laukka, 2024; Hirnstein, Stuebs, Moè & Hausmann, 2023; Delgado-Losada et al., 2024; López-Higes et al., 2022; Lubrini et al., 2022; Peña-Casanova et al., 2009). Most studies find no gender differences in PVF, although women tend to perform better than men in SVF for certain categories (Costa et al., 2014; López-Higes et al., 2022; Peña-Casanova et al., 2009).
A decline in performance with age is reported in many studies (Cavaco et al., 2013; Contador et al., 2016; López-Higes et al., 2022; Peña-Casanova et al., 2009), although some recent research did not find age effects on PVF tasks (Delgado-Losada et al., 2024). In the Portuguese population, age and education consistently affect both PVF and SVF. Gender effects are inconsistent, but when observed, they tend to favor women in certain categories (Cavaco et al., 2013; Santos Nogueira et al., 2017; Vicente et al., 2022).
Performance on a VF test is usually evaluated by using the total number of correct words given within the time limit (Pekkala, 2012). However, different authors have proposed more informative measures. Troyer et al. (1997) proposed for the SVF tasks the mean cluster size and the number of switches between clusters. The average cluster size is associated with the word production process within a given semantic group/field. Basically, each cluster consists of two or more consecutive words belonging to a specific subcategory (e.g., domestic animals). The number of switches reflects, as its name suggests, the process of shifting from one semantic subcategory to another. It is obtained by counting the number of times each participant switches from one subcategory to another during the task.
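The two measures described above can be illustrated with a minimal sketch. It assumes that every produced word has already been assigned a subcategory label by a rater (the labels below are made up), treats a cluster as a run of two or more consecutive words from the same subcategory, as described in the text, and counts a switch at every change of subcategory. Published scoring manuals may differ in detail (for example, in how cluster size is counted), so this is an illustration and not the scoring procedure used in any particular study.

```python
from itertools import groupby


def cluster_and_switch_scores(subcategories):
    """Return (mean_cluster_size, n_switches) for a list of subcategory labels.

    `subcategories` holds one (hypothetical) rater-assigned label per produced
    word, in production order.
    """
    # Collapse the sequence into runs of consecutive identical labels.
    runs = [len(list(group)) for _, group in groupby(subcategories)]
    # Following the description in the text, a cluster has two or more words.
    clusters = [n for n in runs if n >= 2]
    mean_cluster_size = sum(clusters) / len(clusters) if clusters else 0.0
    # Every transition from one run to the next counts as a switch.
    n_switches = max(len(runs) - 1, 0)
    return mean_cluster_size, n_switches


# Illustrative labels for seven animal names (not real data).
labels = ["domestic", "domestic", "domestic", "wild", "wild", "birds", "domestic"]
print(cluster_and_switch_scores(labels))  # (2.5, 3)
```

In this example the runs have lengths 3, 2, 1 and 1, so two clusters (sizes 3 and 2) give a mean cluster size of 2.5, and three changes of subcategory give three switches.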
In some studies, time segments have also been considered, given that participants generally produce most of the words in the early stages of the task (first 20–30 seconds) using a semi-automatic rapid retrieval process (Linz et al., 2019; Venegas & Mansur, 2011). Recent studies have computed the so-called ‘semantic-phonemic discrepancy’ (Aiello et al., 2023) by subtracting PVF scores from SVF ones (positive and negative values corresponding to a semantic and phonemic advantage, respectively), which seems to predict tauopathy across the AD spectrum.
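As a worked illustration of the discrepancy score just described, the sketch below subtracts a PVF score from an SVF score and labels the sign of the result. The function names and the example scores are hypothetical; whether raw or standardized scores should be used is not specified here and is left to the cited work.

```python
def semantic_phonemic_discrepancy(svf_score, pvf_score):
    """SVF minus PVF: positive values indicate a semantic advantage."""
    return svf_score - pvf_score


def advantage(discrepancy):
    """Label the direction of the discrepancy."""
    if discrepancy > 0:
        return "semantic"
    if discrepancy < 0:
        return "phonemic"
    return "none"


# Made-up scores: 18 animal names versus 12 words beginning with a letter.
d = semantic_phonemic_discrepancy(18, 12)
print(d, advantage(d))  # 6 semantic
```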
The important role played by executive functions in VF has been highlighted above. We also know that impaired executive function is considered a sign of cognitive decline in the prodromal phase of AD (Bairami et al., 2023; García-González, 2024; Nevado et al., 2021). Early identification of neurodegenerative processes is a critical concern for public health as well as clinical and scientific research, since preventing the onset of signs and symptoms of dementia, and thus a diagnosis of AD, is essential to reduce its negative impact (McDonough & Allen, 2019).
The typical progression of neurocognitive disorders associated with Alzheimer’s disease (AD) can be understood as a continuum from normal aging to dementia. It begins with an asymptomatic phase, where individuals show no noticeable symptoms, progressing into subjective cognitive decline (SCD). In this early stage, individuals experience subjective complaints about their memory or cognition, which are linked to an increased risk of future cognitive impairment and AD development (Jessen et al., 2014). Although diagnosing SCD is challenging due to its subjective nature, criteria have been established to identify it, emphasizing the importance of early detection. Evidence suggests that some individuals with SCD may be in a preclinical stage of AD, especially when associated with biomarkers like amyloid deposits (Sanabria et al., 2018) or subtle cognitive deficits such as reduced semantic VF (Nikolai et al., 2018). As the condition advances, some individuals develop mild cognitive impairment (MCI), particularly the amnestic subtype, characterized by objective memory deficits but preserved daily functioning. MCI represents an intermediate state between normal aging and dementia, with a higher likelihood of progression to AD. Diagnostic criteria for MCI include subjective complaints confirmed by informants, measurable cognitive decline and maintained independence in daily activities, although subtle impairments may be present (Demetriou & Holtzer, 2017; Petersen, 2004; Petersen et al., 2014).
Finally, when cognitive decline becomes substantial, interfering with independence and involving multiple domains, a diagnosis of Alzheimer’s dementia is made. This stage is marked by significant impairments in memory, reasoning, language and other cognitive functions, often confirmed through neuropsychological testing and supported by biomarkers such as neuroimaging and cerebrospinal fluid (CSF) analysis (Jack et al., 2018).
In this context, an important challenge is to determine which VF tasks can serve as early markers of a neurodegenerative process in the preclinical stage of the disease. Some longitudinal studies have revealed that patients with preclinical AD show a differentially worse decline in SVF compared to PVF, highlighting the sensitivity of SVF for detecting AD progression in the early stages (Demetriou & Holtzer, 2017; Wright, De Marco & Venneri, 2023). Other cross-sectional studies have reported that individuals with SCD, who are in an intermediate preclinical stage in the continuum from normality to dementia, show lower performance on SVF tasks compared to cognitively intact individuals (Koppara et al., 2015; Nikolai et al., 2018). A recent study conducted by Macoir et al. (2022) concluded that high-executive-processing-load VF tasks are useful for the identification of cognitive deficits in SCD.
To determine the potential contribution of VF to the early identification of cognitive impairment in the preclinical stage of AD, the present study explored the performance of older adults with SCD, MCI and healthy controls (HCs) in VF tasks having different executive processing loads. The HC group was expected to show performance across the VF tasks in line with the previously mentioned gradient of executive load: SVF < PVF < Alternating VF = Orthographic Constraint VF. The group of participants with SCD was expected to perform similarly to the HC group in both the SVF and PVF tasks, and significantly worse in the tasks with high executive load. Finally, the performance of the MCI patient group was expected to be significantly lower than that of the HC group in all VF tasks and, in the case of high-executive-load tasks, also lower than that of the SCD group.
2. Methodology
2.1. Participants
The total sample consisted of 97 participants from the northern region of Portugal, with an average age of 77.40 years (SD = 5.54), distributed by gender (47 females and 50 males). The recruitment process was conducted through direct contact with the heads or directors of senior universities and day social centers, who were informed about the study’s objectives and characteristics. This aimed to reach participants with different cognitive profiles and sociodemographic characteristics, such as educational level. After the initial contact, a letter and/or an email were delivered to the heads or directors of the centers who were interested. These included basic information about the study, the call for volunteer participants in the research and different forms of contact with project personnel.
The educational level of the participants is varied, although primary education predominates as the main category: 1% only have basic literacy (basic ability to read and write), 67% primary education, 21.6% have completed the second and third cycles, 7.2% have secondary education and 3.1% reached higher education. Participants were divided into three groups: HC elders, SCD and patients with MCI according to the following exclusion and inclusion criteria.
Subjects with a prior diagnosis of neurological or psychiatric disease, or with severe symptoms at the start of the study, were excluded. Also excluded were those who had alcohol-related problems or were taking medication that could influence cognitive performance, as well as those whose depressive or anxious symptoms were above the normative cutoff point: 4 on the Geriatric Depression Scale (GDS-15) (Apostle et al., 2014) for depression, and 32 on the State–Trait Anxiety Inventory (STAI-Y2) (Silva & Spielberger, 2007) for anxiety.
According to Jessen et al.’s (2014) criteria, SCD participants have a self-perception of progressive deterioration in cognitive functioning (mainly associated with memory) and have memory or other subjective cognitive complaints. To determine the presence of memory complaints, the following three questions were used:
1. Do you have memory or other cognitive problems?
   Yes/No.
2. Have you lost memory in recent years?
   a) No.
   b) Yes, but not much, it is not a problem because it does not affect my daily life.
   c) Yes, and that is a problem for me because it somewhat affects my daily life.
   d) Yes, and it causes me significant problems because it significantly affects my daily life.
3. Have you consulted a doctor or any professional about your memory problems?
   Yes, who have you consulted?
   No.
The participants were initially categorized as having SCD if they responded yes to question 1, selected alternatives c or d in question 2 and answered yes to question 3 (although people sometimes answer no to this question). Participants’ responses to the Everyday Memory Questionnaire (EMQ) (Ribeiro et al., 2024) evidenced memory failures (greater than 10 points). All of them had normal scores on the Auditory Verbal Learning Test (AVLT) (Cavaco et al., 2015) (difference between the fifth and first trials [learning]; delayed recall) and on the Memory Strategies Test (TEM) (Fernandes et al., 2018), and preserved activities of daily living according to the Functional Scale of Instrumental Activities of Daily Living (IADL) (Lawton, 1969; Núcleo de Estudos de Geriatria da Sociedade Portuguesa de Medicina Interna – GERMI, 2017). Montreal Cognitive Assessment Test (MOCA) scores denote normality according to Freitas et al. norms adjusted for age and educational level (Freitas et al., 2010). The group with SCD consisted of 33 participants (14 women and 16 men), with a mean age of 76.81 (SD = 5.60). In this group, 50% of the participants have basic/primary education, 31.3% have completed the second and third cycles, 15.6% have secondary education and 3.1% have higher education.
The inclusion criteria for participants with MCI (Petersen, 2004; 2011) were based on corroborated memory complaints using the three questions described above, and the above-mentioned cut-off point in EMQ, evidencing significant memory failures. Additionally, they showed a deficit in the memory domain assessed by AVLT and TEM. All participants in this group have sufficiently preserved ability to perform activities of daily living, as evidenced by the IADL, and they do not meet diagnostic criteria for dementia. MOCA scores of these participants are below the mean, considering age and educational level (Freitas et al., 2010). The group of patients with MCI had a total of 34 participants (16 women and 17 men), with a mean age of 79.65 (SD = 5.43). In terms of education, 2.9% have only basic literacy, 94.1% have completed primary education and 2.9% have completed the second and third cycles.
Identification of the HC group involved the absence of complaints related to memory, language or other cognitive functions, in addition to presenting normal or even above-average performance in memory tests (AVLT and TEM) and IADL. MOCA scores denote normality according to Freitas et al. (2010) norms adjusted for age and educational level. The HC group had a total of 31 participants (16 women and 15 men), with a mean age of 75.55 (SD = 4.88). This group had 54.8% participants with primary education, 32.3% had completed the second and third cycles, 6.5% with secondary education and 6.5% with higher education.
2.2. Instruments
Initially, a sociodemographic questionnaire was applied to collect basic information about participants, such as age, gender, education level, medical history and other relevant variables. A set of instruments was selected to classify participants according to the inclusion/exclusion criteria mentioned before:
a) EMQ (Rodrigues et al., 2025; Royle & Lincoln, 2008) is an instrument that assesses memory failures in everyday life. It contributes to a deeper understanding of the memory difficulties perceived by the subjects. The scale items show strong internal reliability (Cronbach’s alpha of 0.92) (Rodrigues et al., 2025). The use of this instrument made it possible to identify participants who presented subjective memory complaints.
b) AVLT (Cavaco et al., 2015; Ryan & Geisser, 1986) is a relevant instrument for evaluating verbal memory in elderly people, especially those with cognitive dysfunctions. This instrument integrates immediate and delayed recall tests and provides information on verbal acquisition, learning and retention (Cavaco et al., 2015). It has excellent internal consistency (Cronbach’s alpha of 0.89). This instrument is crucial to highlight deficits in the memory domain and to differentiate the performance of participants with MCI, SCD and HC.
c) TEM (Fernandes et al., 2018; Yubero, Maestú, Paul & Gil, 2011) is an instrument that assesses the impact of executive functions in declarative memory tasks, distinguishing whether memory deficits are due to primary memory problems or executive difficulties. It is useful for evaluating memory and executive functions in different age groups, presents criterion validity and provides empirical data for interpreting performance in memory tests. TEM has acceptable internal consistency (Cronbach’s alpha of 0.74) (Fernandes et al., 2018).
d) IADL (Lawton, 1969; Núcleo de Estudos de Geriatria da Sociedade Portuguesa de Medicina Interna – GERMI, 2017) is an instrument that assesses the skills older adults need to live independently and healthily. It is essential for identifying changes in functional capacity over time, providing insights into an individual’s daily functioning across eight domains of function (Graf, 2008). This instrument shows high internal consistency (Cronbach’s alpha of 0.88) in its item scale (Araújo, Paúl & Ribeiro, 2017). In this study, it was used to assess the level of independence of the elderly and thus exclude individuals who do not have sufficiently preserved autonomy for daily activities.
e) STAI-Y2 (Silva & Spielberger, 2007; Spielberger et al., 1971) is a self-report questionnaire to assess state–trait anxiety using a Likert scale. It is widely used and reliable across different age groups, with good internal consistency (Cronbach’s alpha of 0.87) (Bieling, Antony & Swinson, 1998; Julian, 2011). This instrument identifies the presence or absence of anxious symptoms, thus allowing the exclusion of participants who present levels of anxious symptoms above the cutoff point.
f) GDS (Apostle et al., 2014; Yesavage et al., 1982) is an instrument that identifies the presence and severity of depressive symptoms. It consists of a series of questions that address feelings and behaviors associated with depression, revealing good internal consistency (Cronbach’s alpha of 0.83) (Apostle et al., 2014; Yesavage et al., 1982). GDS is essential to control for depressive symptoms that can interfere with participants’ cognition, so it is used to exclude participants who have a level of depression above the cutoff point.
g) MOCA (Freitas et al., 2010; Nasreddine et al., 2005) is an effective and practical cognitive screening instrument to detect cognitive impairment. This instrument assesses six cognitive domains: memory, executive functions, attention and concentration, language, visuospatial ability and orientation. Furthermore, it shows good internal consistency in the Portuguese adaptation, with a Cronbach’s alpha coefficient of 0.79 (Freitas et al., 2010).
The experimental phase of the study included four types of VF tasks with different executive loads. In the PVF tasks, participants evoked words starting with the letter ‘A’, and then with the letter ‘S’. In the SVF tasks, participants had to produce animal names and then pieces of clothing. In the alternating VF variant, participants were asked to alternate between a word starting with the letter ‘M’ and the name of a fruit, continuing in the same manner until the end of the established time. In the orthographic constraint VF task, participants had to produce names of parts of the body that do not contain the letter ‘R’ until the end of the established time.
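The rule of the orthographic constraint task can be made concrete with a minimal scoring sketch. Several things here are assumptions for illustration only: the function name, the toy English lexicon (the actual task was administered in Portuguese), and the convention that repeated words are not counted again. A response is scored as correct when it belongs to the target category and does not contain the banned letter.

```python
def score_orthographic_constraint(responses, category_members, banned_letter="r"):
    """Count unique in-category responses that avoid the banned letter."""
    seen = set()
    correct = 0
    for word in responses:
        w = word.strip().lower()
        if w in seen:
            continue  # assumed rule: a repetition is not scored again
        seen.add(w)
        if w in category_members and banned_letter not in w:
            correct += 1
    return correct


# Toy English lexicon of body parts (hypothetical, for illustration only).
BODY_PARTS = {"hand", "foot", "arm", "ear", "nose", "leg", "head"}
responses = ["hand", "arm", "nose", "hand", "table", "leg"]
print(score_orthographic_constraint(responses, BODY_PARTS))  # 3
```

Here "arm" is rejected because it contains the banned letter, the second "hand" is a repetition and "table" is outside the category, leaving three correct responses.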
2.3. Procedure
The study procedure began with sending the project proposal to the Health Ethics Committee of the Portucalense University Infante D. Henrique. The data collection process began with the prior sending of authorization requests to the participant institutions to carry out the collection on their premises, requesting the guarantee of an adequate and quiet environment, with a table and two chairs. After acceptance by the institutions, informed consent was presented to the voluntary participants with details on the purpose of the study, the procedures involved, the exclusion criteria, the rights of the participants and the guarantee of confidentiality of the information provided. Participants had the opportunity to review informed consent and ask questions before agreeing to participate. Data collection began only after the participant signed the informed consent. Furthermore, the data collection process was carried out ethically, guaranteeing the privacy and confidentiality of the participants.
This process was individual and carried out in a single moment, divided into two blocks separated by a break, with an average duration of 60–90 minutes, depending on the participant. The first block consisted of the application of the neuropsychological assessment protocol described in the Instruments subsection, adapted to the Portuguese population.
After a break, all participants were subjected to an experimental phase in which they performed the four VF tasks. All these tasks had an execution time of 90 seconds each. The order of presentation of these tasks was counterbalanced across subjects.
When participants did not follow the instructions, lost the thread or forgot what to do, the examiner provided gentle reminders or prompts to help them refocus on the task. If they continued to have difficulty, the examiner could give a brief rest period or clarify the instructions again. In some cases, if the participant was unable to continue despite prompts, the task was discontinued, and the performance was recorded as incomplete.
2.4. Data analysis
As a preliminary step, we computed mean scores for the PVF and SVF tasks. Descriptive statistics were computed for the sociodemographic variables and neuropsychological characteristics of the groups. Then, a multivariate ANOVA was used to test for significant differences between the groups in all these variables.
In the second phase, we obtained descriptive statistics for the groups across tasks and time intervals, as well as the correlations between participants’ years of education and all dependent measures. A generalized mixed ANOVA (Group [3] x Task [4] x Time interval [3]), with participants’ years of education as a covariate, was conducted to test for significant main and interaction effects. Partial eta squared (η²p) was used to estimate the effect size. A series of ANOVAs was used to explore the significant interactions involving Group. Paired-samples t tests were also used to test for differences between the pairs (1) PVF and SVF, and (2) alternating and orthographic constraint VF in the HC group.
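The Time interval factor presupposes that each 90-second task is split into three segments. A minimal sketch of how word counts per segment could be derived is shown below; it assumes three equal 30-second intervals and that a timestamp (in seconds from task onset) is available for each word. Neither the interval boundaries nor the timing method is stated in the text, and the function name and timestamps are hypothetical.

```python
def words_per_interval(timestamps, task_duration=90.0, n_intervals=3):
    """Count produced words in each of n equal-width time intervals."""
    width = task_duration / n_intervals
    counts = [0] * n_intervals
    for t in timestamps:
        # Ignore anything recorded outside the task window.
        if 0 <= t < task_duration:
            counts[int(t // width)] += 1
    return counts


# Made-up word onsets (seconds) for one participant in one task.
onsets = [1.2, 3.5, 7.9, 14.0, 22.4, 31.0, 45.8, 59.9, 72.3]
print(words_per_interval(onsets))  # [5, 3, 1]
```

The decreasing counts in this made-up example mirror the commonly reported pattern of most words being produced early in the task.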
A discriminant analysis was performed to determine the relative weight of VF variants in subjects’ classification. Finally, ROC curve analyses were performed to check whether the variables with the greatest weight in classification individually showed good diagnostic accuracy and to obtain cut-off values with potential clinical significance for those variables.
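To illustrate how a cut-off value can be derived from an ROC analysis, the sketch below computes sensitivity and specificity at every candidate threshold and selects the one that maximizes Youden's J (sensitivity + specificity - 1). The use of Youden's index is an assumption made for this example, since the text does not state which criterion was applied; the function names and the data are also hypothetical. Lower scores are assumed to indicate impairment, and both classes are assumed to be present.

```python
def roc_points(scores, labels):
    """Return (threshold, sensitivity, specificity) for each candidate cut-off.

    labels: 1 = impaired, 0 = control. A case is flagged as impaired when its
    score is less than or equal to the threshold (lower score = worse).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for thr in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= thr)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s > thr)
        points.append((thr, tp / n_pos, tn / n_neg))
    return points


def youden_cutoff(scores, labels):
    """Pick the threshold with the largest Youden's J (assumed criterion)."""
    return max(roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1)


# Made-up VF scores and group labels, for illustration only.
scores = [4, 5, 6, 7, 8, 9, 10, 12]
labels = [1, 1, 1, 0, 0, 1, 0, 0]
print(youden_cutoff(scores, labels))  # (6, 0.75, 1.0)
```

With these made-up values, a cut-off of 6 flags three of the four impaired cases (sensitivity .75) without misclassifying any control (specificity 1.00).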
3. Results
3.1. Descriptives and differences between groups
A multivariate ANOVA with all the variables summarized in Table 1 showed that there were statistically significant differences between the three groups in the following variables: Years of education (F[2, 89] = 10.88, p < .001), EMQ (F[2, 89] = 44.32, p < .001), learning in AVLT (fifth trial–first trial) (F[2, 89] = 19.06, p < .001), delayed recall in AVLT (F[2, 89] = 48.35, p < .001), TEM1 (F[2, 89] = 48.77, p < .001), TEM2 (F[2, 89] = 24.31, p < .001), TEM3 (F[2, 89] = 46.03, p < .001), TEM4 (F[2, 89] = 61.12, p < .001), TEM5 (F[2, 89] = 38.01, p < .001) and MOCA (F[2, 89] = 95.77, p < .001).
Table 1. Descriptive data by group and statistical differences between groups

Notes: * p < .01; ** p < .001.
Abbreviations: EMQ: Everyday Memory Questionnaire; AVLT: Auditory Verbal Learning Test; TEM: Test de Estrategias de Memoria (English translation: Memory Strategies Test); IADL: Functional Scale of Instrumental Activities of Daily Living; STAI-Y2: State–Trait Anxiety Inventory; GDS-15: Geriatric Depression Scale 15 items; MOCA: Montreal Cognitive Assessment Test.
There were no differences between groups in age, IADL, STAI-Y2 and GDS-15. Significant pairwise post hoc comparisons between groups are displayed in the right half of Table 1.
Descriptive statistics for time intervals across tasks and diagnostic groups are shown in Table 2.
Table 2. Descriptive statistics for time intervals across tasks and groups

Participants’ years of education were strongly associated with all the dependent variables in the study (see Table 3); thus, this sociodemographic variable was included as a covariate in the subsequent analyses.
Table 3. Pearson correlations between all the dependent variables and participants’ years of education

Note: **p < .0001; *p < .05.
A multivariate mixed ANOVA 3 x 4 x 3 was conducted involving one between-subjects factor (diagnostic Group), two repeated measures factors (Task and Time Interval) and the participants’ years of education as a covariate. The analysis revealed significant main effects of Group (F[2, 88] = 46.82, p < .001; Ƞ2p = .52, power = 1.00), Task (F[3, 86] = 32.75, p < .001; Ƞ2p = .53, power = 1.00) and Time Interval (F[2, 87] = 18.43, p < .001; Ƞ2p = .30, power = 1.00). There were also three significant interactions: Group x Task (F[6, 174] = 6.21, p < .001; Ƞ2p = .17, power = .99), Group x Time Interval (F[4, 176] = 15.17, p = .012; Ƞ2p = .07, power = .83) and Task x Time Interval (F[6, 83] = 14.86, p < .001; Ƞ2p = .52, power = 1.00).
Regarding the influence of the covariate, this analysis revealed that participants’ years of education had an overall effect on VF performance (F[1, 88] = 141.14, p < .001; Ƞ2p = .61, power = 1.00). This covariate also interacted significantly with Task (F[3, 264] = 15.52, p < .001; Ƞ2p = .15, power = 1.00), Interval (F[2, 176] = 16.79, p < .001; Ƞ2p = .16, power = .99), and Task x Interval (F[12, 528] = 3.21, p = .004; Ƞ2p = .03, power = .93). A linear trend best characterized the relationship between the covariate and Task (F[1, 88] = 30.48, p < .001; Ƞ2p = .26, power = 1.00) and between the covariate and Interval (F[1, 88] = 26.18, p < .001; Ƞ2p = .23, power = .99). That is, as participants’ education level increases, their performance across different tasks and time intervals tends to change in a predictable, linear way.
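As a reading aid for the effect sizes reported above, partial eta squared can be recovered from an F ratio and its degrees of freedom through the standard identity Ƞ2p = (F × df_effect) / (F × df_effect + df_error). The following minimal Python sketch illustrates this relationship; the function name is hypothetical, and the sketch assumes univariate or exact F statistics, so it is not the authors' analysis syntax.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F ratio and its degrees of freedom.

    Assumes a univariate (or exact) F statistic; hypothetical helper,
    not part of the study's analysis code.
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values and degrees of freedom taken from the mixed ANOVA reported above.
effects = {
    "Group": (46.82, 2, 88),          # reported effect size: .52
    "Task": (32.75, 3, 86),           # reported effect size: .53
    "Time Interval": (18.43, 2, 87),  # reported effect size: .30
}

for name, (f_value, df_effect, df_error) in effects.items():
    eta = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{name}: partial eta squared = {eta:.2f}")
```

Because the identity only needs the reported F and its degrees of freedom, it offers a quick consistency check on any effect size in a results section.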
Next, a detailed analysis of the first two interactions (Group x Task and Group x Time Interval) was carried out through univariate ANOVAs, first across Tasks (SVF, PVF, Alternating VF, and Orthographic constraint VF) and then across Intervals (0–30, 31–60, and 61–90), using post hoc pairwise comparisons between Groups. In the first set of analyses, we used the total number of words produced in 90 seconds for each Task. Table 4 summarizes descriptives and significant differences between groups (pairwise comparisons) across tasks.
Table 4. Descriptives and significant differences between groups (pairwise comparisons) across tasks

Notes: 1: HC; 2: SCD; 3: MCI * p = .01, ** p < .001.
An inspection of Table 4 reveals that the HC and SCD groups differ significantly from the MCI group in all four tasks. In contrast, the HC and SCD groups do not differ in SVF or PVF, but the number of words they produce differs significantly in the Alternating and Orthographic VF versions. Concerning the proposed gradient for the VF tasks in the HC group, there were significant differences between SVF and PVF (paired-samples t test = 6.64, p < .001), and between the Alternating and the Orthographic versions (paired-samples t test = −4.59, p < .001).
Another set of analyses was conducted for the second interaction of interest (Group x Time Interval). For this purpose, we computed the overall number of words produced in each Time Interval (0–30, 31–60, and 61–90) by the three groups of participants. Table 5 shows significant differences between groups (pairwise comparisons) across time intervals.
Table 5. Differences between groups (pairwise comparisons) across time intervals

** p < .001.
Significant contrasts follow the same pattern across time intervals. The overall number of words produced by HC and SCD participants across time intervals is similar, but it differs significantly from that of the MCI group.
3.2. Relevant variables in subjects’ classification
To determine the importance of each VF task for subjects’ classification, a discriminant analysis was computed using the overall number of words produced in each task as independent variables. As the Box’s M statistic was significant, the analysis was performed using separate-groups covariance matrices of the canonical discriminant functions. Table 6 summarizes the eigenvalues of the two discriminant functions obtained in the analysis using the Enter method, the percentage of variance explained by each one, as well as Wilks’ Lambda and Chi-square statistics.
Table 6. Results of discriminant analysis

Results show that 80.04% of the original cases were correctly classified using these functions. The structure matrix (Table 7) shows that the Orthographic constraint and the Alternating VF (in this order) have the highest correlations with the first function, which explains 80.6% of the total variance. Regarding the second function, PVF is the only significant variable; this function explains the remaining 19.4% of the variance.
Table 7. Structure matrix

Notes: Variables are ordered by the absolute size of correlation within the function. An asterisk (*) means that the variable remains in the discriminant equation at the final step.
3.3. ROC analyses
ROC analysis of the Orthographic constraint VF score, using HC as the positive state (the negative state group included SCD and MCI participants), showed an area under the curve (AUC) of 0.863, 95% CI (.756, .915), p < .001. This value indicates that the Orthographic constraint VF score has good diagnostic accuracy. Figure 1 illustrates the corresponding ROC curve.

Figure 1. ROC curve for the VF with Orthographic constraint overall word productivity (VF_ORTHOG). The reference line is grey.
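To make the interpretation of the AUC concrete: it equals the probability that a randomly chosen participant from the positive state group scores higher than a randomly chosen participant from the negative state group, with ties counted as one half (the Mann–Whitney formulation). The sketch below is illustrative only; the function name and the word counts are hypothetical and are not the study data.

```python
from itertools import product


def auc_mann_whitney(positive_scores, negative_scores):
    """AUC as the share of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win."""
    wins = 0.0
    for pos, neg in product(positive_scores, negative_scores):
        if pos > neg:
            wins += 1.0
        elif pos == neg:
            wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical word counts, NOT the study data.
hc_scores = [12, 10, 9, 11, 8]      # positive state (HC)
non_hc_scores = [7, 9, 6, 8, 5]     # negative state (SCD + MCI)

print(auc_mann_whitney(hc_scores, non_hc_scores))
```

An AUC of 0.5 corresponds to the grey reference line (chance-level discrimination), whereas 1.0 would mean that every HC participant outscored every non-HC participant.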
The Kolmogorov–Smirnov (K-S) metric was 0.542 (maximum = 1.0), indicating that the model is adequate for classification and distinguishes between the two groups with medium quality. The optimal cut-off for the Orthographic constraint VF score based on the K-S metric for our sample was 8.5.
Concerning the Alternating VF score, ROC analysis using HC as the positive state (the negative state group again included SCD and MCI participants) showed an AUC of 0.819, 95% CI (.735, .903), p < .001. This value indicates that the Alternating VF score also has good diagnostic accuracy. Figure 2 illustrates the corresponding ROC curve.

Figure 2. ROC curve for the Alternating VF overall word productivity (VF_ALTERN). The reference line is grey.
K-S was .509, again indicating that the model is adequate for classification and distinguishes between the two groups with medium quality. The optimal cut-off for the VF_ALTERN score based on the K-S metric for our sample was 4.5.
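The K-S metric used for both scores is the largest vertical distance between the ROC curve and the reference line, that is, the maximum of sensitivity minus the false-positive rate (equivalent to Youden's J). The sketch below shows how such a cut-off could be located. It assumes, as one common convention, that candidate cut-offs lie midway between consecutive observed scores, which would be consistent with half-point values such as 8.5 and 4.5. The function name and data are hypothetical and do not reproduce the study's results.

```python
def ks_cutoff(positive_scores, negative_scores):
    """Return (K-S statistic, cut-off) maximizing TPR - FPR.

    A case is classified as positive when its score exceeds the cut-off.
    Candidate cut-offs are midpoints between consecutive distinct scores
    (an assumed convention, not confirmed by the source).
    """
    values = sorted(set(positive_scores) | set(negative_scores))
    cutoffs = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_ks, best_cutoff = 0.0, None
    for cutoff in cutoffs:
        tpr = sum(s > cutoff for s in positive_scores) / len(positive_scores)
        fpr = sum(s > cutoff for s in negative_scores) / len(negative_scores)
        if tpr - fpr > best_ks:
            best_ks, best_cutoff = tpr - fpr, cutoff
    return best_ks, best_cutoff


# Hypothetical word counts, NOT the study data.
hc_scores = [19, 20, 21, 22, 19, 23]              # positive state (HC)
non_hc_scores = [14, 15, 16, 17, 18, 19, 20, 13]  # negative state (SCD + MCI)

ks, cutoff = ks_cutoff(hc_scores, non_hc_scores)
print(f"K-S = {ks:.3f}, cut-off = {cutoff}")
```

With these illustrative values, scores at or below the returned cut-off would be flagged as belonging to the negative state group.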
4. Discussion
4.1. Summary of findings
The present study investigated the potential benefits of extending standard VF protocols commonly used in the neuropsychological assessment of age-related cognitive decline. We hypothesized that tasks with greater executive demands – specifically, Alternating and Orthographic constrained VF – would lead to lower performance among healthy older adults (HC) and better distinguish individuals with SCD from their cognitively healthy peers. The results broadly supported these hypotheses. HC participants produced significantly more words in the SVF task than in the PVF task, as expected. Additionally, and somewhat unexpectedly, performance varied between the Alternating and Orthographic VF tasks, suggesting different degrees of executive demand across conditions. In terms of task difficulty, performance among HC participants followed the order: SVF > Orthographic VF > PVF > Alternating VF.
Importantly, this performance gradient was also reflected in the group comparisons. While individuals with SCD performed comparably to HC participants on the SVF and PVF tasks, they showed significantly lower performance on the more demanding VF variants. This indicates that executive-loaded tasks may be more sensitive to subtle cognitive changes and could provide added diagnostic value beyond traditional VF assessments.
As expected, MCI participants performed significantly worse than both the HC and SCD groups across all VF tasks. This is consistent with a wide range of studies documenting global cognitive deficits in MCI (McDonnell et al., Reference McDonnell, Dill, Panos, Amano, Brown, Giurgius and Miller2020; Kirova et al., Reference Kirova, Bays and Lagalwar2015). Nevertheless, since the MCI group had significantly fewer years of education, these differences cannot be attributed solely to cognitive impairment. Education emerged as a strong covariate across all tasks and intervals, highlighting the importance of considering it as a continuous moderating factor. This underscores the need to calibrate expectations of cognitive performance according to educational background in both clinical and research settings.
Nonetheless, the findings reinforce the idea that both traditional (SVF and PVF) and executive-demanding VF tasks are effective tools for identifying cognitive decline at different stages.
4.2. Executive demands in fluency tasks
The performance pattern observed in the HC group aligns well with theoretical expectations regarding executive demand. SVF, which relies primarily on automatic semantic retrieval, was the easiest (Shores et al., Reference Shores, Carstairs and Crawford2006). In contrast, both PVF and orthographic VF required greater cognitive control, while Alternating VF imposed the highest executive demands due to the need for mental set shifting, attentional monitoring and inhibition (Lam & Marquardt, Reference Lam and Marquardt2022; Macoir et al., Reference Macoir, Tremblay and Hudon2022). This gradient is consistent with previous research (e.g., Heim et al., Reference Heim, Eickhoff, Friederici and Amunts2009; Macoir et al., Reference Macoir, Tremblay and Hudon2022; Shao et al., Reference Shao, Janse, Visser and Meyer2014), reinforcing the idea that tasks imposing multiple executive constraints place a heavier burden on cognitive resources.
Notably, orthographic VF appeared less demanding than PVF for HC participants, despite involving inhibitory control. One possible explanation is that semantic-based retrieval (e.g., naming body parts) may benefit from more automatic associative processes (Macoir et al., Reference Macoir, Tremblay and Hudon2022), which remain relatively preserved under orthographic constraints – especially in transparent orthographies like Portuguese. PVF, by contrast, involves artificial constraints and suppresses natural retrieval strategies, thereby increasing cognitive load (Heim et al., Reference Heim, Eickhoff, Friederici and Amunts2009; Marko et al., Reference Marko, Michalko, Dragašek, Vančová, Jarčušková and Riečanský2023).
These task-specific demands were particularly relevant for differentiating SCD from HC individuals. While previous studies have reported inconsistent findings regarding the sensitivity of SVF and PVF in early cognitive impairment (Nikolai et al., Reference Nikolai, Bezdicek, Markova, Stepankova, Michalec, Kopecek and Vyhnalek2018; Aiello et al., Reference Aiello, Verde, Solca, Milone, Giacopuzzi Grigoli, Dubini and Poletti2023), our results emphasize the diagnostic value of tasks specifically designed to engage executive functions. The significant performance gap between SCD and HC groups on executive-loaded VF tasks supports the hypothesis that early executive dysfunction characterizes SCD, a view echoed by recent studies (Jessen et al., Reference Jessen, Wolfsgruber, Kleineindam, Spottke, Altenstein, Bartels and Düzel2023; López-Higes et al., Reference López-Higes, Prados, Rubio, Montejo and Del Río2017; Macoir et al., Reference Macoir, Tremblay and Hudon2022).
Previous studies have also reported that older adults with SCD tend to exhibit slightly lower average performance across several cognitive domains compared to those without SCD (Kielb et al., Reference Kielb, Rogalski, Weintraub and Rademaker2017; Morrison, Reference Morrison2023). These findings suggest that some individuals with SCD may experience subtle neuropsychological deficits that do not yet meet the clinical threshold for MCI but can still be identified through standardized neuropsychological testing. Wolfsgruber et al. (Reference Wolfsgruber, Kleineidam, Guski, Polcher, Frommann and Roeske2020) found that lower performance on composite scores assessing verbal memory, executive function, language and global cognition was associated with elevated CSF biomarkers of AD in memory clinic patients with SCD. This supports the notion that minor neuropsychological deficits in individuals with SCD may signal emerging AD pathology. Patients with SCD who demonstrate subtle deficits on cognitive tests not only show higher levels of AD biomarkers but also face an increased risk of cognitive decline and clinical progression. Therefore, the presence of these minor neuropsychological impairments may help identify a high-risk subgroup among individuals seeking medical advice for subjective cognitive concerns (Stark et al., Reference Stark, Wolfsgruber, Kleineidam, Frommann, Altenstein, Bartels and Wagner2023).
4.3. Clinical implications
Given that all VF tasks are brief and easy-to-administer assessment instruments, we recommend including both classic tasks (SVF and PVF) and tasks with greater executive load (alternating and with orthographic constraint) in neuropsychological assessment protocols. The former tasks would be sensitive in more advanced stages of deterioration (MCI), whereas the latter variants have good diagnostic accuracy to detect patients with incipient executive problems (preclinical stage SCD). Both discriminant and ROC analyses confirmed the diagnostic accuracy of these tasks, with areas under the curve exceeding 0.80 and clear cut-off points (≤ 8.5 for Orthographic VF; ≤ 4.5 for Alternating VF). These benchmarks are consistent with prior findings (Macoir et al., Reference Macoir, Tremblay and Hudon2022; Alegret et al., Reference Alegret, Peretó, Pérez, Valero, Espinosa, Ortega and Boada2018) and provide clinicians with actionable thresholds for early detection. This is especially valuable in primary care or community settings where access to biomarkers or neuroimaging may be limited.
Some previous studies report results compatible with this line of reasoning. For example, Alegret et al. (Reference Alegret, Peretó, Pérez, Valero, Espinosa, Ortega and Boada2018) explored the presence of VF deficits in individuals with MCI and mild AD dementia in a Spanish sample. The researchers also sought to evaluate the utility of VF tests in identifying cognitively healthy individuals who are at risk of progressing to MCI, as well as those with MCI who may later develop dementia, and to establish specific VF cut-off scores that could be useful for cognitive assessment within the Spanish population. To achieve these objectives, the study involved a large sample comprising 568 cognitively healthy participants, 885 individuals with MCI and 367 with mild AD dementia. All participants underwent VF testing alongside a comprehensive neuropsychological battery. Furthermore, longitudinal analyses were conducted on a subset of the sample (231 cognitively healthy individuals and 667 MCI subjects) to identify whether VF performance could predict future diagnosis conversions over time.
The results demonstrated that lower performance on VF tests was significantly associated with the progression from cognitively healthy status to MCI, as well as from MCI to dementia. When analyzing the influence of time until diagnosis conversion, VF performance showed a significant effect in predicting faster progression from normality to MCI. However, this predictive effect was not observed for the transition from MCI to dementia. The researchers also calculated VF cut-off scores along with sensitivity and specificity values for six different conditions, which were stratified by three age ranges and two educational levels. In conclusion, their findings suggest that VF testing is a valuable neuropsychological tool for detecting different stages of cognitive decline related to AD. Since VF deficits appear early in the disease process, this assessment can be particularly useful not only for identifying cognitively healthy individuals at risk of developing MCI but also for monitoring the progression from MCI to dementia.
McDonnell et al. (Reference McDonnell, Dill, Panos, Amano, Brown, Giurgius and Miller2020) evaluated the ability of SVF (animals) and PVF (FAS) to discriminate among normal aging, amnestic-MCI (a-MCI) and AD. Their study included a sample of 332 participants with a mean age of 65.75 (99 healthy older adults, 90 a-MCI and 43 AD patients). Results indicated that SVF and PVF were significant predictors of diagnostic classification, but semantic fluency explained a greater amount of the discriminant ability of the model. These results suggest that, from a practical standpoint, SVF in particular may be an accurate and efficient tool for screening early dementia in time-limited clinical/medical settings. This research also supported the view that VF, especially SVF, declines progressively in AD, with semantic fluency deteriorating faster than phonemic fluency as the disease advances.
There is only one previous study showing that VF tasks with higher executive load are useful for detecting cognitive deficits in the preclinical stage of AD. Macoir et al. (Reference Macoir, Tremblay and Hudon2022) conducted a study in which 60 adults with SCD and 60 HCs, all French-speaking Canadians, performed one free action (verb) fluency task, an alternating fluency task (between words beginning with the letter T and words belonging to the clothes category) and an orthographic constraint fluency task (names of animals whose written form did not involve the letter A). The performance of the participants with SCD and the HCs in the free action fluency task was similar. However, HCs performed significantly better than participants with SCD in the alternating fluency task, which involved cognitive flexibility, and the orthographic constraint fluency task, which required inhibition.
5. Conclusions and limitations of the study
In HC, task performance revealed a clear gradient of difficulty aligned with increasing executive demands, ranging from superior performance in SVF to progressively lower output in the Orthographic constraint, PVF and Alternating fluency tasks.
While participants with SCD matched HCs in traditional VF tasks (SVF and PVF), their reduced performance in the more demanding tasks – particularly those requiring cognitive flexibility and inhibition – supports the hypothesis that VF tasks with higher executive load are sensitive indicators of early cognitive changes in the preclinical phase of AD. These findings align with prior research emphasizing early executive dysfunction in SCD and reinforce the diagnostic value of complex VF paradigms.
Results obtained in the study also highlight the need to consider education not as a categorical variable but as a continuous covariate in cognitive assessment. This approach enhances the precision and fairness of neuropsychological evaluations.
Finally, from a clinical point of view, the study advocates for the integration of both traditional and executive-loaded VF tasks into routine neuropsychological assessments. While classic SVF and PVF tasks remain useful for identifying cognitive decline in more advanced stages (e.g., MCI), tasks such as Alternating VF and Orthographic constraint VF offer significant diagnostic accuracy for detecting early executive deficits, particularly in SCD. Establishing task-specific cut-off scores further contributes to the development of reliable and efficient screening tools for early detection and monitoring of AD-related cognitive decline.
In sum, VF tasks – especially those incorporating higher executive demands – should be regarded as essential components in the early identification and longitudinal monitoring of neurodegenerative processes.
One limitation of the study is that the results were obtained from a small sample of participants in each group, which limits generalizability. The SCD sample shows some degree of heterogeneity in terms of gender, age and educational background; however, its representativeness of the broader population of cognitively healthy individuals with subjective cognitive complaints is limited. The recruitment strategy, although designed to access individuals with varied cognitive and sociodemographic profiles, relied on convenience sampling. The recruiting institutions are more likely to attract older adults who are socially engaged, potentially more health conscious and possibly better functioning than the general population of older adults with subjective cognitive complaints; thus, this issue should be considered when assessing the scope of the results obtained (Snitz et al., Reference Snitz, Wang, Cloonan, Jacobsen, Chang, Hughes and Ganguli2018).
The cross-sectional design is also a limitation that should be overcome in future longitudinal studies, given that individuals with SCD who show more attentional/executive and linguistic difficulties seem to have a higher risk of AD (Valech et al., Reference Valech, Tort-Merino, Coll-Padrós, Olives, León, Rami and Molinuevo2017). Larger and more homogeneous clinical cohorts are also specific goals for future studies. Additionally, future analyses could benefit from incorporating qualitative metrics such as clustering and switching, which may offer further insight into the specific cognitive mechanisms affected in preclinical stages of dementia.
Data availability statement
The data and syntax code describing all statistical analyses, and the detailed description of the data structure (included in the data file), are accessible at: https://osf.io/tbrgm/?view_only=344126762cb24431947fbe540b4c424d
Funding statement
This research did not receive any specific grant from funding agencies in the public, commercial or not-for-profit sectors.
Competing interests
The authors declare none.