Statement of Research Significance
Research Question(s) or Topic(s):
• Cognitive and affective symptoms in patients with cerebellar disorders are referred to as the cerebellar cognitive affective syndrome (CCAS).
• To evaluate cognitive deficits as part of CCAS, a brief screener – the CCAS scale – was developed.
• Considering indications of suboptimal validity, this study aimed to evaluate the validity of the CCAS scale using a gold standard neuropsychological examination and a control group of cerebellar patients.
Main Findings:
• Sensitivity of the CCAS scale was acceptable, but specificity was insufficient due to high false-positive rates.
• Correlations were found between outcomes of the scale and both education and age.
Study Contributions:
• Our findings call for age- and education-dependent reference values, which may improve the validity and usability of the scale.
Introduction
Patients with cerebellar disorders typically have motor- and vestibular-related dysfunctions such as ataxia, but they also commonly have cognitive and affective symptoms (Hernández-Torres et al., 2021; Mak et al., 2016; Reumers et al., 2024, 2025). This cognitive and affective component is seen as the third cornerstone of clinical ataxiology and is referred to as the Cerebellar Cognitive Affective Syndrome (CCAS) (Manto & Mariën, 2015; Schmahmann & Sherman, 1998). CCAS has been reported in a wide variety of cerebellar disorders, including structural lesions and degenerative ataxias (Hadjivassiliou et al., 2017; Ramirez-Zamora et al., 2015). The syndrome is characterized by impaired executive function, disturbed spatial cognition, personality changes, and language deficits (Schmahmann & Sherman, 1998). In recent years, its neurocognitive and affective profile has been described in more detail, including deficits in information processing speed, response inhibition, verbal fluency, and memory, as well as behavioral problems and emotional disturbances (Ahmadian et al., 2019; Hoche et al., 2018; Wolf et al., 2009).
Evaluating impairments as part of CCAS usually requires an extensive neuropsychological assessment, because standard screening instruments such as the Mini-Mental State Examination (MMSE) and the Montreal Cognitive Assessment (MoCA) have a general focus and are not specific to cerebellar deficits. Since patients usually perform in the clinically unimpaired range on these screeners, their lack of sensitivity makes them unsuitable for differentiating between CCAS patients and controls (Alan et al., 2024). However, extensive neuropsychological assessment is time-consuming and not always feasible in clinical practice. Therefore, a brief, easy-to-administer bedside tool, the CCAS scale, was developed to examine cognitive and affective functioning in cerebellar patients (Hoche et al., 2018).
The original CCAS scale was developed (n = 77) and validated (n = 39) in a US cohort that included patients with isolated cerebellar or complex cerebrocerebellar disorders (Hoche et al., 2018). Outcomes were categorized as “possible”, “probable”, and “definite” CCAS, with a sensitivity of 46–95% and a specificity of 78–100% across these categories. The scale was recently evaluated in a larger US sample (n = 309), yielding a sensitivity of 46–83% and a specificity of 46–95% (Selvadurai et al., 2024). The CCAS scale has been translated into several other languages, and its diagnostic properties have been evaluated in cohorts with different etiologies (de Oliveira Scott et al., 2023; Guo et al., 2024; Maas et al., 2021; Rodríguez-Labrada et al., 2021; Szabó-Műhelyi et al., 2024; Thieme et al., 2020, 2022; Van Overwalle et al., 2019).
Results of these studies indicate suboptimal discrimination between patients and controls due to high false-positive rates, and attempts have been made to derive better threshold values. The high false-positive rates may arise because the scale does not take the effects of age and educational level into account (Thieme et al., 2021). Furthermore, the role of dysarthria in the assessment of CCAS-related deficits has recently been stressed (Corben et al., 2024). Dysarthria may inflate apparent verbal fluency deficits and thereby contribute to false-positive outcomes. However, several studies indicate that language deficits persist even after dysarthria is taken into account, suggesting that poor test performance may be partially, but not fully, explained by dysarthria (Cocozza et al., 2018; Stoodley & Schmahmann, 2009).
Despite indications of suboptimal validity, the scale has already been used in multiple studies to describe CCAS in specific etiologies or to assess its prevalence (Abderrakib et al., 2022; Bolzan et al., 2024; Destrebecq et al., 2023; Destrebecq & Naeije, 2023; Dujardin et al., 2024; Naeije et al., 2020; Selvadurai et al., 2024). Using the CCAS scale as a reliable screen is problematic, as it has not yet been validated against extensive neuropsychological assessment, which is considered the gold standard for identifying cognitive dysfunction (Lezak et al., 2012). However, there is no consensus on the exact set of neuropsychological tests needed to establish or refute a CCAS diagnosis. Studies evaluating the scale thus far included cerebellar patients without specifying whether these individuals actually fulfilled the criteria for CCAS. It is possible that not all cerebellar patients have CCAS, or that a substantial proportion of patients have mild cognitive impairment, as recently indicated in a large multicenter study (Liu et al., 2024).
Demonstrating that the CCAS scale discriminates between cerebellar patients and a healthy control group is insufficient to establish its validity, because in clinical practice it will be used to distinguish cerebellar patients with CCAS from those without. Therefore, a comparison with a gold standard that distinguishes patients with and without actual CCAS is needed to establish the scale’s diagnostic accuracy and clinical value.
A valid CCAS screen is essential to accurately identify cognitive and affective deficits in persons with cerebellar disorders, both in neuropsychological research and in clinical practice (Chirino-Pérez et al., 2021; Gok-Dursun et al., 2021; Kotkowski et al., 2021). Validating and, where possible, improving the scale in light of the aforementioned limitations is therefore important. In this study, we addressed two research questions: (1) What is the validity of the CCAS scale in distinguishing CCAS patients from a control group of cerebellar patients, as defined by a gold standard neuropsychological examination? (2) What is the influence of dysarthria on neuropsychological test outcomes and on CCAS scale validity?
Method
Participants and procedure
Data were collected at the Neurology department of Radboud University Medical Center and the Donders Centre for Cognition (DCC) in the Netherlands. Three groups of participants were included: (1) cerebellar patients with CCAS, as determined by an extensive neuropsychological examination, (2) cerebellar patients without CCAS, and (3) healthy controls (HC). Cerebellar patients included all types of degenerative cerebellar ataxia and cerebellar stroke. Minimum required sample sizes were 40, 30, and 30 for these groups, respectively, as determined by a power calculation for receiver operating characteristic (ROC) analysis (power > 0.9, AUC > 0.8) with a two-tailed significance level of 0.05. Cerebellar patients either took part in a randomized controlled trial with CCAS patients in the Netherlands (Dutch Trial Register: NL9121) or visited the outpatient clinic of the neurology or rehabilitation department of our center. Because CCAS status could not be known before the neuropsychological examination, we unintentionally included more CCAS patients than the predetermined sample size. The HC group was recruited from a pool of healthy volunteers at the DCC who participated in a larger neuropsychological test battery. All eligible participants were 18 years or older and fluent in Dutch. Exclusion criteria were any (comorbid) neurological or psychiatric disorders (self-reported). This study was approved by the medical ethics committee (CMO Arnhem-Nijmegen, 2021-8296), as were the trial (CMO Arnhem-Nijmegen, NL73572.091.20) and the study at the DCC (ECSW2017-2306-520). This study was preregistered (AsPredicted#147624), and all participants provided written informed consent before inclusion, in accordance with the Declaration of Helsinki.
Since clear diagnostic criteria for CCAS are lacking, we considered an extensive neuropsychological test battery covering CCAS-related cognitive domains as the gold standard, using established cut-off scores to classify an individual as cognitively impaired. This allowed us to include cerebellar patients with and without CCAS, as well as HC. All patients were assessed with the CCAS scale (± 20 min) and the full neuropsychological test battery (± 60 min), administered at least one week apart to avoid practice, interference, or fatigue effects. All HC were administered the CCAS scale and the MoCA. The MoCA was used to ensure that no cognitive impairments were present; all controls scored ≥26 points (range: 26–30) (Nasreddine et al., 2005). Cognitive assessments were performed in a standardized manner by trained assessors. Background variables included age, sex, education level (Verhage, 1965), disease stage based on ambulatory status (Klockgether et al., 1998), score on the Scale for the Assessment and Rating of Ataxia (SARA), disease duration/time since stroke, and clinical diagnosis.
Gold standard – neuropsychological test battery
An extensive test battery was used as the gold standard to establish or refute a CCAS diagnosis. This battery consisted of widely used, reliable, and validated neuropsychological tests and covered the cognitive domains that are typically compromised in CCAS, as summarized in Table 1. For most tests, raw scores were converted into age-, sex-, and education-corrected z-scores using normative values from a large Dutch dataset (de Vent et al., 2016). For the Emotion Recognition Test, published normative data were used to compute age-, education-, and sex-adjusted z-scores (Kessels et al., 2014). An individual test outcome was considered impaired if its z-score was more than 1.5 standard deviations (SD) below the normative mean (z < −1.5). A patient was classified as cognitively impaired if three or more test outcomes were below −1.5 SD or two or more test outcomes were below −2 SD (Fischer et al., 2014). Performance validity was assessed using the Reliable Digit Span as an embedded performance validity measure (Webber & Soble, 2018).
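The battery-level decision rule described above can be expressed as a small function. The Python sketch below is an illustration only, not the authors' implementation: it assumes the z-scores have already been computed from the normative data and are oriented so that lower values always mean worse performance (for timed tests, the sign would need to be flipped first). The function name `is_cognitively_impaired` is hypothetical.

```python
from typing import Sequence


def is_cognitively_impaired(z_scores: Sequence[float]) -> bool:
    """Gold-standard rule as described in the text (after Fischer et al., 2014).

    A single outcome counts as impaired when z < -1.5. A patient is classified
    as cognitively impaired when at least three outcomes fall below -1.5 SD,
    or at least two outcomes fall below -2 SD.
    Assumes lower z means worse performance for every outcome.
    """
    n_below_1_5 = sum(1 for z in z_scores if z < -1.5)
    n_below_2 = sum(1 for z in z_scores if z < -2.0)
    return n_below_1_5 >= 3 or n_below_2 >= 2


# Three mildly impaired outcomes meet the first criterion.
print(is_cognitively_impaired([-1.6, -1.7, -1.8, 0.2]))  # True
# Two outcomes below -2 SD meet the second criterion.
print(is_cognitively_impaired([-2.1, -2.3, 0.0]))  # True
# Two outcomes below -1.5 SD, only one of them below -2 SD: not impaired.
print(is_cognitively_impaired([-1.6, -2.4, 0.5]))  # False
```

Strict inequalities are used because the text defines impairment as a score below the cut-off, so a z-score of exactly −1.5 does not count as impaired in this sketch.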
Table 1. Test items of the test battery and CCAS scale, with corresponding domains and outcomes, and with maximal scores and threshold values for fails, respectively

The CCAS scale
The CCAS scale was developed by Hoche et al. (2018) and has four parallel versions to attenuate learning effects and facilitate test–retest reliability. Only version 1A has undergone psychometric testing in individuals with cerebellar ataxia; therefore, the authorized Dutch translation of version 1A was used in this study (Mariën et al., 2021). A Dutch administration procedure was also developed (see Supplementary Information 1). The CCAS scale consists of 10 scored items, which are also listed in Table 1. All items have objective scoring criteria, except for the Affect item, which entails a subjective rating of neuropsychiatric domains by the assessor, with input from the patient and caregiver. Each item was scored, and the total raw score ranges from 0 to 120, with lower scores reflecting worse cognitive performance. Each item also has a threshold score that determines a pass or fail. According to the evaluation criteria of the developers (Hoche et al., 2018), one failed item is indicative of “possible” CCAS, two fails of “probable” CCAS, and three or more fails of “definite” CCAS.
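The developers' evaluation rule maps the count of failed items onto three categories. A minimal Python sketch, assuming the item-level pass/fail decisions have already been made with the thresholds in Table 1; the function name and the placeholder label used for zero fails are hypothetical and not part of the scale.

```python
def ccas_category(n_failed_items: int) -> str:
    """Map the number of failed CCAS scale items (0-10) to the developers'
    evaluation categories (Hoche et al., 2018).

    The label "none" for zero fails is a placeholder, not a term from the scale.
    """
    if not 0 <= n_failed_items <= 10:
        raise ValueError("the CCAS scale has 10 scored items")
    if n_failed_items == 0:
        return "none"
    if n_failed_items == 1:
        return "possible"
    if n_failed_items == 2:
        return "probable"
    return "definite"  # three or more failed items


for fails in range(5):
    print(fails, ccas_category(fails))
```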
Dysarthria correction
In addition, we examined the influence of dysarthria on patients’ test results. Dysarthria was quantified using the PATA Rate Task (PRT), a measure of speech rate: patients were instructed to repeat “PA-TA” as quickly as possible for 10 s, and the number of correct repetitions was scored. Two trials were performed, and their average was used as the PRT score. The PRT score was used to correct the timed neuropsychological tests (Semantic Verbal Fluency test and Stroop Color-Word Test) for dysarthria if it fell below the lower limit of normality (Schmitz-Hübsch et al., 2008). This limit was calculated from previously published normative PRT values (mean and SD) in controls, resulting in a threshold of 15.6 (mean − 2 × SD = 28.84 − 2 × 6.6) (Pane et al., 2018; Saccà et al., 2018).
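The arithmetic behind the dysarthria threshold can be checked with a few lines of Python. This is a sketch using the normative values quoted above; the function and constant names are hypothetical, and the default cut-off of 15.6 is the rounded value reported in the text.

```python
# Normative PRT values in controls as quoted in the text
# (Pane et al., 2018; Saccà et al., 2018).
PRT_CONTROL_MEAN = 28.84
PRT_CONTROL_SD = 6.6


def prt_lower_limit(mean: float = PRT_CONTROL_MEAN,
                    sd: float = PRT_CONTROL_SD) -> float:
    """Lower limit of normality: mean minus two standard deviations."""
    return mean - 2 * sd


def prt_score(trial_1: int, trial_2: int) -> float:
    """PRT score: mean number of correct "PA-TA" repetitions over two 10-s trials."""
    return (trial_1 + trial_2) / 2


def needs_dysarthria_correction(prt: float, limit: float = 15.6) -> bool:
    """Timed tests are corrected only when the PRT score is below the limit."""
    return prt < limit


print(round(prt_lower_limit(), 2))  # 15.64, reported as 15.6
print(needs_dysarthria_correction(prt_score(15, 16)))  # mean 15.5 -> True
print(needs_dysarthria_correction(prt_score(16, 16)))  # mean 16.0 -> False
```

Because each trial yields an integer count, the two-trial mean is always a multiple of 0.5, so rounding 15.64 to 15.6 does not change any classification.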
For correction of the Semantic Verbal Fluency test score, the formula previously established by Saccà et al. (2018) was used. This formula uses the PRT score to calculate a corrected time within which the test could be performed. Since we administered the PRT after the fluency test, we subsequently rescaled the original score in proportion to the corrected time:
$\mathrm{corrected\ score} = \frac{\mathrm{corrected\ time}}{60} \times \mathrm{original\ score}$
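The proportional rescaling step can be written as a one-line function. This Python sketch implements only the final step given above; the corrected time itself is treated as an input, because the text does not reproduce the Saccà et al. (2018) formula for it. The function name is hypothetical.

```python
def corrected_fluency_score(original_score: float, corrected_time_s: float,
                            test_duration_s: float = 60.0) -> float:
    """Rescale a 60-s Semantic Verbal Fluency score to the dysarthria-corrected
    time, following the proportional formula in the text.

    `corrected_time_s` must be obtained from the Saccà et al. (2018) formula,
    which is not reproduced here.
    """
    if corrected_time_s <= 0:
        raise ValueError("corrected time must be positive")
    return corrected_time_s / test_duration_s * original_score


print(corrected_fluency_score(12, 75.0))  # 15.0
print(corrected_fluency_score(12, 60.0))  # 12.0, i.e., no change
```

As written, a corrected time longer than 60 s scales the score up and a shorter one scales it down.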
For correction of the Stroop Color-Word Test score, we derived a formula from those provided by Saccà et al. (2018). Correction could only be applied to the scores of Stroop cards II and III, as the score of card I was used as a reference. The following formula was used:
$\mathrm{corrected\ time} = \frac{\mathrm{original\ time} \times \mathrm{repetition\ ratio}}{\mathrm{lower\ limit\ of\ normality} \times \mathrm{PRT\ score}} + \mathrm{cognition\ time}$
The original time is the uncorrected Stroop score (time) of card II or III. The repetition ratio is the normative mean of card I divided by the normative mean of card II or III: 0.752 for card II (43.50/57.87) and 0.416 for card III (43.50/104.52) (Schmand et al., 2012). The cognition time is the original time minus the product of the original time and the repetition ratio (Saccà et al., 2018).
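To make the term-by-term structure of the Stroop correction explicit, the Python sketch below transcribes the printed formula literally, together with the repetition ratios and the cognition time defined above. It is an illustration only: anyone reusing it should check the formula against Saccà et al. (2018), and all function and constant names are hypothetical.

```python
# Normative Stroop card means quoted in the text (Schmand et al., 2012).
NORM_MEAN_CARD_I = 43.50
NORM_MEAN_CARD_II = 57.87
NORM_MEAN_CARD_III = 104.52


def repetition_ratio(norm_mean_card: float) -> float:
    """Normative mean of card I divided by the normative mean of card II or III."""
    return NORM_MEAN_CARD_I / norm_mean_card


def cognition_time(original_time: float, rep_ratio: float) -> float:
    """Original time minus the product of the original time and the repetition ratio."""
    return original_time - original_time * rep_ratio


def corrected_stroop_time(original_time: float, rep_ratio: float,
                          prt: float, lower_limit: float = 15.6) -> float:
    """Literal transcription of the printed formula:
    (original time x repetition ratio) / (lower limit x PRT score) + cognition time.
    """
    first_term = (original_time * rep_ratio) / (lower_limit * prt)
    return first_term + cognition_time(original_time, rep_ratio)


ratio_ii = repetition_ratio(NORM_MEAN_CARD_II)
ratio_iii = repetition_ratio(NORM_MEAN_CARD_III)
print(round(ratio_ii, 3), round(ratio_iii, 3))  # 0.752 0.416
print(corrected_stroop_time(original_time=150.0, rep_ratio=ratio_iii, prt=10.0))
```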
Statistical analyses
Analyses were performed using IBM SPSS Statistics 27.0 (SPSS, Inc., Chicago, IL, USA). Means with SDs or medians with interquartile ranges were reported, as appropriate. Between-group differences in CCAS scale outcomes were tested using (Quade’s) ANCOVA with education as a covariate. Cronbach’s alpha was calculated to assess the internal consistency of the scale and was considered acceptable if ≥0.7 (Nunnally, 1967). Construct validity was evaluated in terms of item-total correlations. Sensitivity and specificity were calculated to assess validity and accuracy; the minimum values for acceptable sensitivity and specificity were 80% and 60%, respectively (Blake et al., 2002). ROC analyses were performed to determine the AUC as a measure of discrimination. AUC values <0.7 were considered poor, 0.7–0.8 acceptable, 0.8–0.9 good, and 0.9–1.0 excellent (F. Li & He, 2018). In addition, we explored whether the validity of the CCAS scale could be improved by simple adaptations. First, we assessed whether deleting test items with many false-positive results would improve validity. Second, we applied a simple correction for lower education, reducing the number of fails by one for patients with Verhage education level ≤ 5 (comparable to ≤ 12 years of education). Third, we applied the correction formula for age, sex, and education effects from the recent study by Liu et al. (2024) to our data to evaluate whether this could improve validity. Optimal threshold values for individual CCAS scale items were determined using the Youden index, with an emphasis on obtaining acceptable specificity (Youden, 1950).
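The accuracy measures named above can be sketched in plain Python for readers who want to see what each one computes. This is not the authors' SPSS workflow, and all names are hypothetical. Assumptions: a screen is "positive" when a participant fails at least `min_fails` items; the AUC is computed through its Mann–Whitney equivalence (no confidence intervals); the Youden search simply maximizes J and does not model the authors' emphasis on specificity; and flooring the education-corrected count at zero is an added assumption. For item scores, where higher means better, the scores would have to be negated before using the last two functions.

```python
from typing import Optional, Sequence, Tuple


def sensitivity_specificity(fails_patients: Sequence[int],
                            fails_controls: Sequence[int],
                            min_fails: int) -> Tuple[float, float]:
    """Screen is positive when a participant fails at least `min_fails` items."""
    tp = sum(1 for f in fails_patients if f >= min_fails)
    tn = sum(1 for f in fails_controls if f < min_fails)
    return tp / len(fails_patients), tn / len(fails_controls)


def auc_mann_whitney(scores_pos: Sequence[float],
                     scores_neg: Sequence[float]) -> float:
    """AUC as P(random patient scores higher than random control), ties = 1/2.
    Higher scores must indicate impairment (e.g., number of failed items)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def youden_threshold(scores_pos: Sequence[float], scores_neg: Sequence[float]
                     ) -> Optional[Tuple[float, float, float, float]]:
    """Return (J, threshold, sensitivity, specificity) maximizing
    J = sensitivity + specificity - 1, where positive means score >= threshold."""
    best = None
    for t in sorted(set(scores_pos) | set(scores_neg)):
        se = sum(1 for s in scores_pos if s >= t) / len(scores_pos)
        sp = sum(1 for s in scores_neg if s < t) / len(scores_neg)
        j = se + sp - 1
        if best is None or j > best[0]:
            best = (j, t, se, sp)
    return best


def education_corrected_fails(n_fails: int, verhage_level: int) -> int:
    """Simple adaptation from the text: one fewer fail for Verhage level <= 5."""
    return max(n_fails - 1, 0) if verhage_level <= 5 else n_fails


# Toy data (invented, not from this study): failed items per participant.
patients = [3, 2, 4, 2]
controls = [0, 1, 2, 0]
print(sensitivity_specificity(patients, controls, min_fails=2))  # (1.0, 0.75)
print(auc_mann_whitney(patients, controls))  # 0.9375
print(youden_threshold(patients, controls))  # (0.75, 2, 1.0, 0.75)
```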
The relationships of the CCAS scale with age, sex, education, disease duration/time since stroke, and disease stage were examined using Spearman or point-biserial correlations, as was the association of the scale with the gold standard. Since these analyses were exploratory, no correction for multiple testing was applied. Statistical significance was set at 0.05 (two-tailed) for all tests. The anonymized datasets used and analyzed during this study are available from the corresponding author upon request by a qualified investigator.
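Both correlation types mentioned above reduce to a Pearson correlation on transformed data: Spearman's rho is the Pearson correlation of ranks (tied values receive their average rank), and the point-biserial coefficient is the Pearson correlation with a 0/1-coded variable such as sex. The Python sketch below illustrates this under those standard definitions; it computes coefficients only, not p-values, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def point_biserial_r(binary: Sequence[int], y: Sequence[float]) -> float:
    """Point-biserial r: Pearson correlation with a 0/1-coded variable."""
    return pearson_r([float(b) for b in binary], y)


# Toy example (invented): education level vs. number of failed items.
education = [4, 5, 5, 6, 7]
fails = [4, 3, 3, 1, 0]
print(average_ranks(education))  # [1.0, 2.5, 2.5, 4.0, 5.0]
print(round(spearman_rho(education, fails), 3))
```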
Results
Participant characteristics
According to the gold standard, 49 cerebellar patients were classified as having CCAS, and 30 cerebellar patients were classified as not having CCAS, serving as “cerebellar controls” (CC). Patients were heterogeneous in diagnosis: 19 had a cerebellar stroke and 60 had a degenerative cerebellar ataxia (details are provided in Supplementary Information 2). Additionally, 32 HC were included. Characteristics per group are shown in Table 2. Validity of the outcomes was evaluated by comparing CCAS patients with CC, or with CC and HC combined. In these comparisons, age (CCAS vs. CC: p = .590; CCAS vs. CC + HC: p = .071) and sex (CCAS vs. CC: p = .172; CCAS vs. CC + HC: p = .703) did not differ significantly. The distribution of disease stages also did not differ significantly (CCAS vs. CC: p = .148). The education level of the CCAS patients differed slightly but significantly from that of both CC (p = .039) and CC + HC (p < .001). The mean (± SD) time between administration of the gold standard and the CCAS scale was 23.5 ± 77.4 days.
Table 2. Participant characteristics per group

SARA = scale for the assessment and rating of ataxia
a n = 47.
Gold standard – test battery performance
The outcomes of the test battery are shown in Figure 1. The cognitive performance of all patients was considered valid according to the Reliable Digit Span, an embedded performance validity measure. Differences between CC and CCAS patients were small for most outcomes, except for the three subtests of the Stroop Color-Word Test. This test was most frequently impaired in CCAS patients and showed the lowest z-scores and the greatest variance. Stroop card III in particular appeared to discriminate well: none of the CC showed impaired performance, compared with 59% of the CCAS patients. All z-scores and raw scores per test outcome are provided in Supplementary Information 3.

Figure 1. Combined boxplot with z-scores of test battery outcomes, including percentages of patients with impairment (z < −1.5). Boxes represent outcomes between the 1st and 3rd quartiles, with solid lines indicating medians and dashed lines indicating means. Whiskers represent the minimum and maximum values (excluding outliers); outliers are indicated by dots. RAVLT = Rey Auditory Verbal Learning Test; ROCF = Rey–Osterrieth Complex Figure; BSAT = Brixton Spatial Anticipation Test; ERT = Emotion Recognition Test.
CCAS scale performance
Outcomes of the CCAS scale per group are listed in Table 3. Between-group differences were assessed with correction for education, since education differed significantly between the groups. The number of failed items was significantly higher in CCAS patients than in CC (F(1,76) = 12.38, p < .001) and in CC and HC combined (F(1,108) = 30.07, p < .001). The total score was significantly lower in CCAS patients than in CC (F(1,76) = 16.51, p < .001) and in CC and HC combined (F(1,108) = 38.45, p < .001). Notably, high percentages of participants in the control groups failed the phonemic fluency, category switching, digit span forward, verbal recall, and affect items. Cronbach’s alpha across the ten items of the CCAS scale was 0.707. Deleting individual test items would not have resulted in a higher alpha. However, three test items had low item-total correlations: cube draw/copy (r = .171), similarities (r = .253), and go/no-go (r = .196).
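For reference, Cronbach's alpha and an item-total correlation can be computed as in the Python sketch below. It is an illustration, not the SPSS output: the text does not state whether the reported item-total correlations were corrected (item removed from the total), so the corrected variant used here is an assumption, the toy data are invented, and the function names are hypothetical.

```python
import math
from statistics import variance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """items[i][p] is the score of participant p on item i.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_var = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - sum_item_var / variance(totals))


def corrected_item_total_r(items: Sequence[Sequence[float]], idx: int) -> float:
    """Pearson r between item `idx` and the total of the remaining items."""
    x = list(items[idx])
    rest = [sum(scores) - scores[idx] for scores in zip(*items)]
    n = len(x)
    mx, my = sum(x) / n, sum(rest) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, rest))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in rest)
    return sxy / math.sqrt(sxx * syy)


# Toy data (invented): three items scored for four participants.
toy_items = [[1, 2, 3, 4], [2, 3, 4, 5], [1, 3, 2, 4]]
print(round(cronbach_alpha(toy_items), 3))
print(round(corrected_item_total_r(toy_items, 0), 3))
```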
Table 3. Outcomes of the CCAS scale per group

Variables are presented as means with standard deviations or frequencies with percentages.
Validity
Sensitivity and specificity were obtained for each of the CCAS scale outcomes: one fail (“possible”), two fails (“probable”), and ≥ three fails (“definite”), applied as cumulative thresholds of at least one, two, or three failed items. Sensitivity was 98%, 88%, and 65% for one, two, or ≥ three fails, respectively. When only the CC were considered, specificity was 10%, 43%, and 67% for one, two, or ≥ three fails, respectively. Specificity was higher when cerebellar and healthy controls were considered together: 32%, 68%, and 81% for one, two, or ≥ three fails, respectively.
The discriminative ability of the CCAS scale as evaluated by ROC analyses is shown in Figure 2. When only CC were considered, the AUC was 0.743 (95% CI: 0.634–0.851) for failed items and 0.762 (95% CI: 0.654–0.869) for the total score. When cerebellar and healthy controls were considered together, the corresponding AUC values were 0.836 (95% CI: 0.763–0.910) and 0.851 (95% CI: 0.782–0.919).

Figure 2. ROC curves for CCAS scale failed items (in red) and total score (in blue); the green line indicates the reference line. (a) CCAS patients versus cerebellar controls only. (b) CCAS patients versus cerebellar and healthy controls combined.
Optimal threshold values
Sensitivity and specificity for individual test items, based on either the original thresholds of the scale or the optimal thresholds determined by the Youden index, are shown in Table 4. The low sensitivities of most test items at the original thresholds improved when the Youden thresholds were applied. However, specificity decreased because the Youden thresholds were higher, and applying them would lead to more false-positive results and worse overall specificity of the scale (8–44%). Note that the Youden threshold for the similarities and go/no-go items equals the maximum score of these items.
Table 4. Sensitivity and specificity of CCAS scale test items, with original threshold values and thresholds determined by Youden Index

Correlations
In the entire sample, outcomes of the CCAS scale were correlated with age (total score ρ = −.328, p < .001; failed items ρ = .290, p = .002) and education (total score ρ = .439, p < .001; failed items ρ = −.508, p < .001). No significant correlations were found for sex, disease duration/time since stroke, or disease stage. Test items of the CCAS scale that were comparable to outcomes of the gold standard test battery were also examined: moderate correlations were found for semantic fluency (ρ = .637, p < .001) and digit span forward (ρ = .572, p < .001), and weaker but significant correlations for digit span backward (ρ = .389, p < .001) and delayed recall (ρ = .372, p < .001). A small correlation was found for the visuospatial items (Rey–Osterrieth Complex Figure copy and cube draw; ρ = .289, p = .010).
For the verbal fluency items, CCAS scale outcomes were also compared with age-, sex-, and education-corrected Dutch normative data (de Vent et al., 2016). Of the 36 CCAS patients who failed the phonemic fluency item of the CCAS scale, 21 were not impaired on this test according to the normative data (impairment defined as z < −1.5). Conversely, of the 13 patients who failed the semantic fluency item, only two were not impaired according to the normative data, whereas 11 patients who were impaired according to the normative data did not fail the CCAS scale item.
Dysarthria correction
CCAS patients scored significantly lower on the PRT than CC (p < .001). All eighteen patients who scored below the lower limit of normality (range: 8–15.5) had CCAS. For these patients, outcomes of the semantic fluency test and the Stroop Color-Word Test in the gold standard battery were corrected, which changed some group allocations: six of the 18 patients (33%) were classified as having CCAS without correction but as cerebellar controls after correction. Dysarthria correction did not considerably affect the validity of the scale, as the corrected sensitivity (65–100%) and specificity (11–61%) were similar to those before correction. PRT performance was correlated with both the total score (ρ = .401, p < .001) and the number of fails (ρ = −.373, p < .001) on the CCAS scale. Among the individual test items, PRT performance correlated significantly with semantic (ρ = .305, p = .007) and phonemic (ρ = .257, p = .024) fluency, category switching (ρ = .262, p = .021), and digit span forward (ρ = .302, p = .008).
Adaptations
The influence of simple adaptations on the validity of the CCAS scale was explored. Deleting test items with many false-positive results, such as the phonemic fluency item, increased specificity, but not sufficiently: when only the CC were considered, specificity was 20%, 53%, and 80% for one, two, or ≥ three fails, respectively. Applying a simple correction for lower education, in which the number of fails was reduced by one for patients with Verhage education level ≤ 5 (comparable to ≤ 12 years of education), slightly increased specificity, but diagnostic accuracy remained insufficient: when only the CC were considered, specificity was 27%, 50%, and 80% for one, two, or ≥ three fails, respectively. Applying the age, sex, and education correction established by Liu et al. (2024) resulted in a sensitivity of 86% and a specificity of 37% when only CC were considered, and a specificity of 52% when HC were also included.
Discussion
This was the first validation study of the CCAS scale that explicitly distinguished between patients with and without proven CCAS, as defined by a gold standard neuropsychological examination. Patients with CCAS scored significantly worse on the scale than CC and than all controls combined. The sensitivity of the scale was acceptable (65–98%), but specificity was insufficient, especially when only CC were considered (10–67%). ROC analyses showed acceptable discriminative ability at the group level, but validity at the individual level was poor due to frequent false-positive outcomes in both control groups. Outcomes of the scale were correlated with education and age, and these factors should be taken into account to improve the validity of the CCAS scale.
We found insufficient specificity due to high false-positive rates of the CCAS scale, leading to overdiagnosis of CCAS as a clinical syndrome. This is in line with other recent studies that challenge the evaluation criteria of Hoche et al. (2018) for the diagnosis of CCAS (Alan et al., 2024; Maas et al., 2021; Selvadurai et al., 2024; Thieme et al., 2022). The high false-positive rates probably reflect variation in performance across age groups and education levels. Control participants without cognitive impairments commonly failed test items that are age- and education-sensitive, such as verbal fluency, category switching, and verbal recall. CCAS scale outcomes were significantly correlated with age and education level, yet these factors are currently not taken into account when interpreting outcomes. Moreover, the discrepancies we found between verbal fluency outcomes on the CCAS scale and outcomes corrected with normative data further illustrate the importance of age and education level in interpretation. The importance of education and age has been stressed before by Thieme et al., who observed more false-positive outcomes in control participants with lower education (Thieme et al., 2021).
The authors of the CCAS scale replied that a correction for education could indeed be required when testing populations with lower educational levels (Schmahmann et al., 2021). Consistent with the earlier comment by Thieme et al. (2021) on the importance of age and education, and with our results, other studies have also found a relationship with education and/or age, suggesting that a correction for both factors could improve the scale’s properties (Rodríguez-Labrada et al., 2021; Thieme et al., 2022).
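One common way to implement an age- and education-dependent correction is regression-based norming: a normative sample is used to predict the expected score from demographic variables, and an individual score is then expressed as a z-score relative to that prediction. The sketch below only illustrates the idea. The class name `RegressionNorms`, the coefficients, the residual SD, the ordinal education coding, and the -1.5 cutoff are hypothetical placeholders, not values from any published CCAS scale norms.

```python
from dataclasses import dataclass


@dataclass
class RegressionNorms:
    """Hypothetical regression-based norms: expected score = intercept
    + b_age * age + b_education * education. All numbers are placeholders."""
    intercept: float
    b_age: float        # expected score change per year of age
    b_education: float  # expected score change per education level (ordinal coding assumed)
    sd_residual: float  # residual SD in the (hypothetical) normative sample

    def expected(self, age: float, education: float) -> float:
        return self.intercept + self.b_age * age + self.b_education * education

    def z_score(self, observed: float, age: float, education: float) -> float:
        # Standardize the observed score against the demographically expected score.
        return (observed - self.expected(age, education)) / self.sd_residual


norms = RegressionNorms(intercept=100.0, b_age=-0.2, b_education=2.0, sd_residual=8.0)
CUTOFF = -1.5  # hypothetical impairment threshold on the z scale

# The same raw total score of 80 for two hypothetical participants.
for label, age, edu in [("older, lower education", 75, 3), ("younger, higher education", 40, 7)]:
    z = norms.z_score(80, age, edu)
    print(f"{label}: z = {z:.2f}, flagged = {z < CUTOFF}")
```

Under these placeholder values, the same raw score of 80 lies within the expected range for the older participant with lower education (z of about -1.4) but clearly below the cutoff for the younger, more highly educated participant (z = -3.25). Age- and education-stratified reference tables pursue the same goal with look-up values instead of a fitted equation.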
Although the CCAS scale has insufficient validity in its current form, we explored whether adaptations could improve its usefulness for clinical and scientific purposes. Deleting test items with many false-positive results did not improve validity or reliability, and applying a simple correction for lower education did not improve diagnostic accuracy. Another approach was to adjust the threshold values. Adjusting thresholds per test item is not recommended, as the thresholds proposed by the Youden index did not increase the validity of the scale. Applying a threshold to the total score or to the number of failed items was also considered but did not yield acceptable sensitivity in the group of cerebellar patients. Thus, a more detailed adjustment for education and age will be required. The recent study by Liu et al. (2024) employed a correction formula that controls for age, sex, and education effects, aiming to improve the evaluation of CCAS scale outcomes. However, when we applied this correction to the data of our cohort, diagnostic accuracy unfortunately remained insufficient (sensitivity 86%; specificity CC: 37%, CC + HC: 52%). An alternative is to establish normative values in larger samples, as has previously been done for the MoCA (Kessels et al., 2022). Education- and/or age-stratified reference values will have to be established before the scale can be recommended for reliable use in daily clinical practice.
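The Youden index mentioned above selects the cutoff c that maximizes J = sensitivity + specificity - 1. The minimal sketch below applies it to a hypothetical total score on which lower values indicate worse performance, using the rule "flag as CCAS if score <= c" and trying every observed score as a candidate cutoff. The function name `youden_cutoff` and the toy data are illustrative assumptions, not the analysis performed in this study.

```python
def youden_cutoff(scores, has_ccas):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1
    for the rule 'flag as CCAS if score <= cutoff' (lower score = worse)."""
    n_pos = sum(has_ccas)
    n_neg = len(has_ccas) - n_pos
    best_cutoff, best_j = None, float("-inf")
    for c in sorted(set(scores)):  # every observed score is a candidate cutoff
        tp = sum(1 for s, t in zip(scores, has_ccas) if t and s <= c)
        tn = sum(1 for s, t in zip(scores, has_ccas) if not t and s > c)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # strict comparison: ties keep the lowest cutoff
            best_cutoff, best_j = c, j
    return best_cutoff, best_j


# Hypothetical toy totals, NOT the study data.
scores = [70, 75, 78, 84, 80, 86, 90, 95, 88, 92]
has_ccas = [True] * 4 + [False] * 6

cutoff, j = youden_cutoff(scores, has_ccas)
print(cutoff, round(j, 2))  # 84 0.83
```

Because the cutoff is chosen on the same data on which it is evaluated, the resulting J tends to be optimistic for new patients. In this cohort, per-item thresholds proposed by the Youden index did not increase the validity of the scale.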
CCAS scale items assumed to reflect the same cognitive processes as the test battery outcomes were not strongly correlated with those outcomes. For some items, such as semantic fluency, we expected a strong correlation because the item was identical to its counterpart in the gold standard test battery. We therefore examined whether scores were systematically higher at the second assessment, which would indicate a learning or practice effect, but this was not the case. Furthermore, the Stroop Color-Word Test was most frequently impaired in patients and discriminated best between patients with and without CCAS, even after dysarthria correction. The CCAS scale contains no comparable item, and adding one could be considered.
All patients who scored below the normality threshold of the PRT, indicative of dysarthria, also had CCAS. Although dysarthria did not seem to affect the validity of the CCAS scale, 33% of patients with dysarthria (6/18) were classified as having CCAS without dysarthria correction but as cerebellar controls with the correction. PRT performance was correlated with the total CCAS scale score, but also with the fluency items, category switching, and digit span forward, which were commonly failed in the cerebellar control group. A previous study on Friedreich’s ataxia found similar significant relationships between measures of speech and verbal test items (Corben et al., 2024). Articulation speed has been mentioned before as a potential confounding factor, together with the suggestion to correct for it in the timed items of the CCAS scale (Chirino-Pérez et al., 2021). Failure on these items may reflect slower speech production rather than cognitive deficits in verbal fluency, although such deficits may still be evident when dysarthria is taken into account (Bolceková et al., 2017; Cocozza et al., 2018; Y. Li et al., 2023; Stoodley & Schmahmann, 2009).
This nevertheless illustrates that dysarthria, a common symptom in cerebellar disorders, may influence timed neuropsychological test outcomes and should be taken into account to prevent misclassification (Paap et al., 2016). Hence, caution is also warranted with the CCAS scale, in which 50% of the total score is determined by dysarthria-sensitive items.
A strength of this study is the use of an extensive battery of reliable and validated neuropsychological tests as the gold standard to substantiate the CCAS diagnosis. This also allowed us to include a control group of cerebellar patients without CCAS, which increased the relevance for clinical practice. Outcomes were slightly less favorable when only the CC group was considered than when all controls were taken together, which illustrates the need to include a cerebellar control group in validity studies. The inclusion of an etiologically heterogeneous group of cerebellar patients increased the external validity of our study. Another strength is that we explored the influence of dysarthria and took it into account by correcting test outcomes. Several limitations of our study should be mentioned. First, selection bias may have occurred due to our recruitment approach. Second, we could not provide a more detailed description of disease severity in our cohort: we had information on the ataxia disease stage for all patients, but recent SARA scores were available for only a subset. Also, we purposely included a mixed-etiology ataxia cohort, as this reflects clinical practice. The CCAS scale was specifically developed to serve as a cognitive screen for any cerebellar patient. However, this “one size fits all” approach can be criticized, and a larger, more etiologically homogeneous ataxia subsample might have yielded different results and a more fine-grained cognitive profile for that specific etiology. Furthermore, the HC group was significantly more highly educated than both patient groups. However, had their education level been more similar to that of the patients, this would probably have resulted in even more false-positive outcomes in the controls. Finally, we are unable to draw conclusions about the test–retest reliability of the scale, because we have no data on the parallel versions.
The need for a validated screener to detect CCAS is high, and there have also been attempts to detect CCAS with other tests or brief combinations of them (Bolceková et al., 2017; Starowicz-Filip et al., 2022). The current scale does not contain an item on social cognition (e.g., an item on mentalizing), which appears to be commonly affected in cerebellar patients, and the Affect item is rather brief and relies on observer ratings rather than a performance task (Van Overwalle et al., 2019). Because neuropsychiatric symptoms vary widely, it may be better to capture them with a separate scale. For instance, the Cerebellar Impulsivity-Compulsivity Assessment Scale and the Cerebellar Neuropsychiatric Rating Scale have been developed specifically for use in cerebellar patients (Karamazovova et al., 2023; Lin et al., 2023; Shao et al., 2024). Future research will have to focus on establishing more extensive normative data for the CCAS scale, explicitly taking education and age into account (Thieme et al., 2021). Subsequently, longitudinal studies including the parallel versions should be performed to assess test–retest reliability and to determine whether the scale is suitable for monitoring changes in cognitive function over time.
Conclusion
In conclusion, we recommend caution when using the CCAS scale in its current form in clinical practice because of its poor specificity. Previous studies have raised similar concerns, yet the scale is already being used as a diagnostic tool and has been endorsed as a “promising biomarker” (Abderrakib et al., 2022; Bolzan et al., 2024; Selvadurai et al., 2024). We argue that the scale is not yet suitable for diagnostic purposes in clinical practice and that an extensive neuropsychological assessment may still be advisable in cerebellar patients with cognitive complaints.
Supplementary material
For supplementary material accompanying this paper visit https://doi.org/10.1017/S1355617725101033.
Acknowledgments
The authors thank Allan Pieterse, Hanneke van Duijnhoven, Sirwan Darweesh, Ilse Willemse, Teije van Prooije, and Lotte van de Venis for their help in the recruitment of cerebellar patients in the control group. We also thank Frank van Overwalle for his contribution to the translation of the CCAS scale to Dutch/Flemish. We are grateful to everyone who participated in this study.
Funding statement
Financial support was provided by a grant from the Hersenstichting Nederland (grant number DR-2019-00313).
Competing interests
The authors have no conflicts of interest to disclose.