Background
Bipolar disorder (BD) is associated with general and domain-specific cognitive impairment that extends to periods of euthymia (Cullen et al., Reference Cullen, Ward, Graham, Deary, Pell, Smith and Evans2016; Martínez-Arán et al., Reference Martínez-Arán, Vieta, Colom, Torrent, Sánchez-Moreno, Reinares and Salamero2004). The prevalence of severe impairment in at least one cognitive domain is approximately 40% (Martino et al., Reference Martino, Strejilevich, Scápola, Igoa, Marengo, Ais and Perinot2008), though overall, there is a substantial discrepancy in the definitions and prevalence of cognitive impairment throughout the literature (Cullen et al., Reference Cullen, Ward, Graham, Deary, Pell, Smith and Evans2016). Domains such as verbal memory and attention appear particularly impaired (Bourne et al., Reference Bourne, Aydemir, Balanzá-Martínez, Bora, Brissos, Cavanagh and Goodwin2013; Zanelli, Reference Zanelli2012) and predict poor functional outcomes (Burdick et al., Reference Burdick, Russo, Frangou, Mahon, Braga, Shanahan and Malhotra2014; Hermens, Naismith, Redoblado Hodge, Scott, & Hickie, Reference Hermens, Naismith, Redoblado Hodge, Scott and Hickie2010; Jordan et al., Reference Jordan, Veru, Lepage, Joober, Malla and Iyer2018), including occupational and social functioning (Brissos, Dias, & Kapczinski, Reference Brissos, Dias and Kapczinski2008; Thompson et al., Reference Thompson, Gallagher, Hughes, Watson, Gray, Ferrier and Young2005).
Delineating factors associated with cognitive impairment is relevant for identifying modifiable risk factors, which may represent treatment targets for interventions such as cognitive remediation (CR) (Strawbridge et al., Reference Strawbridge, Tsapekos, Hodsoll, Mantingh, Yalin, McCrone and Young2021). Factors associated with cognitive impairment include duration of illness (Zanelli, Reference Zanelli2012), number of episodes (López-Jaramillo et al., Reference López-Jaramillo, Lopera-Vásquez, Gallo, Ospina-Duque, Bell, Torrent and Vieta2010), lithium use (Wingo, Wingo, Harvey, & Baldessarini, Reference Wingo, Wingo, Harvey and Baldessarini2009), history of psychosis (Lahera et al., Reference Lahera, Montes, Benito, Valdivia, Medina, Mirapeix and Sáiz-Ruiz2008), BD type (Dittmann et al., Reference Dittmann, Hennig-Fast, Gerber, Seemüller, Riedel, Emanuel Severus and Grunze2008), substance misuse (van Gorp, Altshuler, Theberge, Wilkins, & Dixon, Reference van Gorp, Altshuler, Theberge, Wilkins and Dixon1998), and demographic factors (e.g., age, gender, educational level, premorbid IQ) (Carrus et al., Reference Carrus, Christodoulou, Hadjulis, Haldane, Galea, Koukopoulos and Frangou2010; Lewandowski, Sperry, Malloy, & Forester, Reference Lewandowski, Sperry, Malloy and Forester2014; Martino, Valerio, Szmulewicz, & Strejilevich, Reference Martino, Valerio, Szmulewicz and Strejilevich2017).
Given the likely confounding effects of current mood episodes (King, Stone, Cleare, & Young, Reference King, Stone, Cleare and Young2019), it is recommended to examine cognitive impairment in euthymic BD (Miskowiak et al., Reference Miskowiak, Burdick, Martinez-Aran, Bonnin, Bowie, Carvalho and Vieta2017; Thompson et al., Reference Thompson, Gallagher, Hughes, Watson, Gray, Ferrier and Young2005). An initial systematic review and meta-analysis that examined domain-specific cognitive performance in euthymic BD pooled data from 26 studies (689 BD and 721 healthy controls (HC)) and found executive function and verbal memory to be the most impaired domains (d ≥ 0.8) (Robinson et al., Reference Robinson, Thompson, Gallagher, Goswami, Young, Ferrier and Moore2006). A subsequent individual patient data meta-analysis pooling data from 31 studies (1267 BD and 1609 HC) found the greatest impairment in Trail Making Test B (TMT-B, executive functioning; Reitan & Wolfson, Reference Reitan and Wolfson1992), followed by digit span backwards (working memory; Griffin & Heffernan, Reference Griffin and Heffernan1983), Verbal Learning Test (VLT, verbal memory; Elwood, Reference Elwood1995; de Sousa Magalhães et al., Reference de Sousa Magalhães, Fernandes Malloy-Diniz and Cavalheiro Hamdan2012), Trail Making Test A (TMT-A, attention/processing speed; Bowie & Harvey, Reference Bowie and Harvey2006), and Wisconsin Card Sorting Test (WCST, executive functioning; Jones, Reference Jones2021) (Bourne et al., Reference Bourne, Aydemir, Balanzá-Martínez, Bora, Brissos, Cavanagh and Goodwin2013). These impairments remained significant after controlling for age, gender, and premorbid IQ.
Bourne et al. (Reference Bourne, Aydemir, Balanzá-Martínez, Bora, Brissos, Cavanagh and Goodwin2013) tested the effect of six clinical predictors (number of depressive episodes, number of manic episodes, number of total episodes, number of depressive hospitalisations, number of manic hospitalisations and number of total hospitalisations) on each test. TMT-A was associated with the number of depressive hospitalisations and total episodes. The number of manic episodes was associated with VLT scores; the total number of hospitalisations was associated with TMT-B and WCST Categories. Psychotropic medication was not associated with cognitive impairment. The authors concluded that further longitudinal studies were required. Demographic factors (e.g., age and gender) have also been related to cognitive impairment in BD (Navarra-Ventura et al., Reference Navarra-Ventura, Vicent-Gil, Serra-Blasco, Massons, Crosas, Cobo and Cardoner2021).
Similar to schizophrenia (Jonas et al., Reference Jonas, Lian, Callahan, Ruggero, Clouston, Reichenberg and Kotov2022; Murray, Bora, Modinos, & Vernon, Reference Murray, Bora, Modinos and Vernon2022), controversy exists in BD as to whether cognitive impairment is explained by neurodevelopmental deficits, progressive decline following illness onset, or both (Burdick, Reference Burdick2022; Goodwin, Martinez-Aran, Glahn, & Vieta, Reference Goodwin, Martinez-Aran, Glahn and Vieta2008); this has relied on studies of cognitive functioning and neuroimaging in BD. The neurodevelopmental theory is supported by studies of undiagnosed family members (Sanches, Keshavan, Brambilla, & Soares, Reference Sanches, Keshavan, Brambilla and Soares2008) and cognitive deficits identified in the first episode (Bora & Pantelis, Reference Bora and Pantelis2015; MacCabe et al., Reference MacCabe, Lambe, Cnattingius, Sham, David, Reichenberg and Hultman2010; MacCabe et al., Reference MacCabe, Wicks, Löfving, David, Berndtsson, Gustafsson and Dalman2013). The neuroprogressive theory is predominantly supported by longitudinal analyses of cognitive functioning (Bora & Özerdem, Reference Bora and Özerdem2017), and cross-sectional correlation with illness duration and number of episodes (López-Jaramillo et al., Reference López-Jaramillo, Lopera-Vásquez, Gallo, Ospina-Duque, Bell, Torrent and Vieta2010; Zanelli, Reference Zanelli2012). These inconsistent findings may be explained by the well-established cognitive heterogeneity observed at the population level (Burdick et al., Reference Burdick, Russo, Frangou, Mahon, Braga, Shanahan and Malhotra2014), possibly reflecting distinct patient subgroups with differential illness trajectories (Millett & Burdick, Reference Millett and Burdick2021). Hence, it is likely that both neurodevelopmental abnormalities and neuro-progressive decline underlie and explain cognitive outcomes for different patients. 
An alternative method for testing neurodevelopmental or neurodegenerative theories is through comparing premorbid IQ against other domains. Premorbid IQ appears intact (Lewandowski, Cohen, & Öngur, Reference Lewandowski, Cohen and Öngur2011) or less impaired than other domains (Valerio, Lomastro, & Martino, Reference Valerio, Lomastro and Martino2020) in BD at the population level, supporting a combination of neuroprogressive and neurodevelopmental explanations for cognitive deficits observed in BD.
Aims and hypotheses
Our primary aim was to compare performance on general cognitive functioning, premorbid IQ, and domain-specific cognitive functioning (executive functions, verbal memory, working memory, visuo-spatial memory, attention/processing speed) between euthymic BD and HC. We hypothesised that comparative impairment in BD would be highest on average in attention and verbal memory and lowest in premorbid IQ, in accordance with prior literature (Bourne et al., Reference Bourne, Aydemir, Balanzá-Martínez, Bora, Brissos, Cavanagh and Goodwin2013; López-Jaramillo et al., Reference López-Jaramillo, Lopera-Vásquez, Gallo, Ospina-Duque, Bell, Torrent and Vieta2010).
Our secondary aim was to examine if premorbid IQ, demographic, and clinical factors explained performance differences in cognitive functioning between BD and HC. We hypothesised that lower premorbid IQ (Bourne et al., Reference Bourne, Aydemir, Balanzá-Martínez, Bora, Brissos, Cavanagh and Goodwin2013), male gender (Navarra-Ventura et al., Reference Navarra-Ventura, Vicent-Gil, Serra-Blasco, Massons, Crosas, Cobo and Cardoner2021), BD type 1 (BD1) (Dittmann et al., Reference Dittmann, Hennig-Fast, Gerber, Seemüller, Riedel, Emanuel Severus and Grunze2008), high hospitalisation rate (Levy, Medina, Manove, & Weiss, Reference Levy, Medina, Manove and Weiss2011), history of psychosis (Lahera et al., Reference Lahera, Montes, Benito, Valdivia, Medina, Mirapeix and Sáiz-Ruiz2008), use of antipsychotics, benzodiazepines and anticonvulsants (Cañada et al., Reference Cañada, Sabater, Sierra, Balanzá-Martínez, Berk, Dodd and García-Blanco2021; Torrent et al., Reference Torrent, Martinez-Arán, Daban, Amann, Balanzá-Martínez, del Mar Bonnín and Vieta2011), no use of lithium (Sabater et al., Reference Sabater, Garcia-Blanco, Verdet, Sierra, Ribes, Villar and Livianos2016), number of episodes (Robinson et al., Reference Robinson, Thompson, Gallagher, Goswami, Young, Ferrier and Moore2006), and illness duration (Frey et al., Reference Frey, Zunta-Soares, Caetano, Nicoletti, Hatch, Brambilla and Soares2008; Martino, Samamé, Ibañez, & Strejilevich, Reference Martino, Samamé, Ibañez and Strejilevich2015) would be associated with greater differences in cognitive test performance between BD and HC.
Methods
Protocol development and registration
This systematic review and meta-analysis was registered in the International Prospective Register of Systematic Reviews (PROSPERO; ID: CRD42021284784). The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed throughout the review (Supplementary Material A).
Alterations to PROSPERO registration
Following registration, the reviewers decided to focus specifically on cognitive functioning in BD and to exclude schizoaffective disorder, because the increasing volume of data for both diagnoses allows each to be addressed in a separate review. Additionally, premorbid IQ was added as a focus, following clinical input on its relevance as a predictor of cognitive impairment.
Search methods and strategy
Articles were systematically searched across Embase (1947–2024), Medline (1946–2024), and PsycINFO (1806–2024) in June 2024. No age or date restrictions were implemented. The Population, Exposure, Control, Outcome and Study Design (PECO-S) model was used as a guide in the formation of the search strategy: (bipolar disorder OR manic depress*) AND (cognitive function* OR IQ OR cognitive performance OR cognitive decline). Searches were performed via Ovid. Backwards and forwards citation searches were conducted, and authors of known cohorts examining cognition in BD were contacted.
Data collection
The selection process occurred in two stages: (1) titles and abstracts were screened, and irrelevant articles were removed, and (2) full-text articles were then screened and selected for inclusion based on eligibility criteria (see Figure 1). Two reviewers conducted searches blind to each other’s decisions. The primary author (SS) searched for all studies, with co-authors screening studies before May 2021 (PS) and between May 2021 and June 2024 (WZ).

Figure 1. PRISMA flowchart.
Eligibility criteria
Inclusion criteria involved reporting of (1) euthymic BD and HC samples; (2) general, premorbid IQ (e.g., NART) or domain-specific cognition (e.g., verbal memory) data; (3) diagnosis using a structured interview or psychiatric assessment, based on either the Diagnostic and Statistical Manual of Mental Disorders (DSM-III, DSM-III-R, DSM-IV, DSM-IV-TR, DSM-5) or the International Classification of Diseases (ICD-9 or ICD-10); and (4) observational studies. Qualitative cognitive assessments, such as the executive interview (EXIT; Altshuler et al., Reference Altshuler, Tekell, Biswas, Kilbourne, Evans, Tang and Bauer2007), were excluded from the review. Studies in any language were included and translated into English by native translators. Only baseline data from longitudinal studies were analysed.
Quality assessment
Included studies were assessed for methodological quality using the Newcastle-Ottawa Scale (NOS) (Donnelly, Bracchi, Hewitt, Routledge, & Carter, Reference Donnelly, Bracchi, Hewitt, Routledge and Carter2017), which was completed independently by two reviewers (SS; MZ). Where there was no consensus, a third reviewer made the final decision (SJ). The NOS was completed as reported by previous researchers (Nayebirad et al., Reference Nayebirad, Mohamadi, Yousefi-Koma, Javadi, Farahmand, Atef-Yekta and Kavosi2023). Cross-sectional and cohort studies were assessed and reported separately. A study was defined as ‘very good’ quality where scores were ≥90%, ‘good’ quality where scores were ≥70%, ‘fair’ quality where scores were ≥50% and ‘poor’ quality where scores were <50% (Nayebirad et al., Reference Nayebirad, Mohamadi, Yousefi-Koma, Javadi, Farahmand, Atef-Yekta and Kavosi2023). The criteria for NOS in our study are shown in Supplementary Material B, with results in Supplementary Materials C and D.
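As an illustration only, the percentage thresholds above can be expressed as a small Python function. The function name and the choice to pass the maximum attainable NOS score as a parameter are assumptions made for this sketch; the scoring criteria themselves are given in Supplementary Material B and are not reproduced here.

```python
def nos_quality(score: float, max_score: float) -> str:
    """Map a NOS score to the quality label used in this review.

    `max_score` is a hypothetical parameter: the maximum attainable score
    depends on the NOS version applied to each study design.
    """
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    pct = 100.0 * score / max_score
    if pct >= 90:
        return "very good"
    if pct >= 70:
        return "good"
    if pct >= 50:
        return "fair"
    return "poor"
```

Checking the thresholds from the highest downwards ensures that each study receives exactly one label.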
Data extraction
The following data were extracted from each study: name of the author, year of publication, diagnostic classification, geographical location, cognitive domain, cognitive battery, age of onset, duration of remission, duration of illness, BD1%, number of episodes, number of depressive episodes, number of hypomanic episodes, number of hospitalisations, history of psychosis, YMRS, HAM-D, medication choices (mood stabilisers, lithium, anticonvulsants, antipsychotics, antidepressants, benzodiazepines, and no medication). Sample size, age, ethnicity, sex, cognitive test scores, functioning test scores, and employment rates were recorded for BD and HC separately. Studies were allocated to one of three authors (SS; PS; WZ) for extraction, which was then checked for consistency by the other two.
Coding of cognitive batteries
Cognitive tests were grouped into one of seven categories: general cognitive functioning, premorbid IQ, executive functioning, working memory, verbal memory, visuo-spatial memory, and attention/processing speed. Coding of cognitive tests was performed based on previous studies of domain categorisation (Millgate et al., Reference Millgate, Hide, Lawrie, Murray, MacCabe and Kravariti2022) and in collaboration with four experts in neuropsychological assessment (AR; EK; PG; DT). Supplementary Material E gives a breakdown of cognitive tests.
When a study reported results for a specific domain (e.g., executive functioning) but did not name the test used to obtain them, the data were included within the domain mentioned and the test was labelled as ‘undefined’.
Data synthesis
Fifty-six cognitive tests were grouped into the seven cognitive domains. Effect sizes for each cognitive domain were calculated for the mean difference between BD and HC cognitive performance in each study. Following this, a pooled estimate for Hedges’ g effect sizes between BD and HC was calculated for each domain.
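A minimal sketch of how a single study-level Hedges’ g could be computed from group means, standard deviations, and sample sizes is given below. It uses the common small-sample approximation J = 1 − 3/(4df − 1); this is an assumption for illustration and may differ from the exact correction applied by the statistical software used in the review. The function and argument names are hypothetical.

```python
import math

def hedges_g(mean_bd, sd_bd, n_bd, mean_hc, sd_hc, n_hc):
    """Return (g, variance of g) for the BD minus HC mean difference.

    Assumes higher scores mean better performance, so a negative g
    indicates worse performance in BD. For timed tests where lower is
    better (e.g., completion time), the sign would need to be reversed.
    """
    df = n_bd + n_hc - 2
    # Pooled standard deviation across the two groups
    pooled_sd = math.sqrt(((n_bd - 1) * sd_bd**2 + (n_hc - 1) * sd_hc**2) / df)
    d = (mean_bd - mean_hc) / pooled_sd          # Cohen's d
    j = 1 - 3 / (4 * df - 1)                     # approximate small-sample correction
    g = j * d
    # Large-sample variance of d, scaled by the squared correction factor
    var_d = (n_bd + n_hc) / (n_bd * n_hc) + d**2 / (2 * (n_bd + n_hc))
    return g, j**2 * var_d
```

For example, hypothetical groups of 30 participants each, with means of 45 (BD) and 50 (HC) and a common standard deviation of 10, give d = −0.50 and a slightly smaller g after correction.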
The metaset command in STATA v17.0 (Viinikainen et al., Reference Viinikainen, Böckerman, Hakulinen, Kari, Lehtimäki, Raitakari and Pehkonen2022) was used to generate Hedges’ g effect sizes using a random mixed effects model for differences in cognitive performance between BD and HC groups (Hess, Quinn, Akbarian, & Glatt, Reference Hess, Quinn, Akbarian and Glatt2015). Heterogeneity between included studies was assessed using the I² statistic, with high heterogeneity defined as I² > 75% (Higgins & Thompson, Reference Higgins and Thompson2002). Publication bias was assessed through funnel plots and Egger’s and Begg’s test statistics (Montejo et al., Reference Montejo, Torrent, Jimenez, Martínez-Arán, Blumberg and Burdick2022c).
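The pooling and heterogeneity steps can be sketched as follows. For brevity, this example uses the DerSimonian–Laird estimator of between-study variance and a normal-approximation confidence interval; both are assumptions for illustration and are not necessarily the settings used in the STATA analysis. The function name is hypothetical.

```python
import math

def random_effects_pool(effects, variances):
    """Pool study effects with a DerSimonian-Laird random-effects model.

    Returns (pooled effect, (ci_low, ci_high), tau2, i2_percent).
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two effects with matching variances")
    w = [1.0 / v for v in variances]                      # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0       # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0 # I-squared in percent
    w_re = [1.0 / (v + tau2) for v in variances]          # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2, i2
```

Under the definition above, a returned I² value greater than 75 would be classed as high heterogeneity.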
Meta-regressions were conducted for specific risk factors (age, sex, age of onset, duration of remission, duration of illness, number of episodes, number of manic episodes, number of depressive episodes, % BD1, number of hospitalisations, history of psychosis (yes/no), mood stabilisers (yes/no), lithium (yes/no), anticonvulsants (yes/no), antipsychotics (yes/no), antidepressants (yes/no) and benzodiazepines (yes/no)) in each cognitive domain, using the metareg command in STATA (Fatouros-Bergman, Cervenka, Flyckt, Edman, & Farde, Reference Fatouros-Bergman, Cervenka, Flyckt, Edman and Farde2014).
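To make the idea of a study-level meta-regression concrete, the sketch below fits a single-moderator model by inverse-variance weighted least squares. It is a simplified illustration: the between-study variance `tau2` is supplied by the caller rather than estimated, and the function name is hypothetical, so it should not be read as a reproduction of the metareg procedure.

```python
import math

def meta_regression(effects, variances, moderator, tau2=0.0):
    """Weighted least-squares fit of effect = intercept + slope * moderator.

    Weights are 1 / (within-study variance + tau2). Returns
    (intercept, slope, standard error of slope).
    """
    if not (len(effects) == len(variances) == len(moderator)):
        raise ValueError("inputs must have the same length")
    w = [1.0 / (v + tau2) for v in variances]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, moderator)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, moderator))
    if sxx == 0:
        raise ValueError("moderator has no variation across studies")
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, moderator, effects))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1.0 / sxx)   # valid when the weights are true inverse variances
    return intercept, slope, se_slope
```

Here each study contributes one row: its effect size, its variance, and a study-level summary of the moderator (for example, mean years of education).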
The Benjamini-Hochberg (BH) procedure (Bogdan, Ghosh, & Tokdar, Reference Bogdan, Ghosh and Tokdar2008) was used to control for multiple comparisons in both primary analyses and meta-regressions (Van Haren, Reference Van Haren2024).
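For readers unfamiliar with the BH procedure, a minimal sketch of how BH-adjusted p values can be obtained from a set of raw p values is shown below. The function name is hypothetical, and the grouping of tests into families for correction in this review is not specified here.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p values in the same order as the input."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

An association is then declared significant when its adjusted p value falls below the chosen threshold (e.g., .05).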
Following expert input (BS; EM), a sensitivity analysis was conducted, removing studies that reported domain-specific scores (e.g., executive functioning) whilst not mentioning the specific tests that contributed to that score (e.g., TMT-B).
Results
Search and selection process
The flowchart in Figure 1 outlines the study selection process. Seventy-five observational studies were included in the review and meta-analysis. Otherwise relevant studies were excluded because they were not limited to euthymic cases (Hidese et al., Reference Hidese, Yoshida, Ishida, Matsuo, Hattori and Kunugi2023; Juselius, Kieseppä, Kaprio, Lönnqvist, & Tuulio-Henriksson, Reference Juselius, Kieseppä, Kaprio, Lönnqvist and Tuulio-Henriksson2009; Zanelli, Reference Zanelli2012), and one study was excluded because it measured cognitive functioning through a qualitative measure (i.e., EXIT interview; Altshuler et al., Reference Altshuler, Tekell, Biswas, Kilbourne, Evans, Tang and Bauer2007).
Ten studies reported on multiple groups. Specifically, Navarra-Ventura et al. (Reference Navarra-Ventura, Vicent-Gil, Serra-Blasco, Massons, Crosas, Cobo and Cardoner2021) reported male and female groups separately; Soni, Singh, Shah, and Bagotia (Reference Soni, Singh, Shah and Bagotia2017), Martino et al. (Reference Martino, Strejilevich, Marengo, Ibañez, Scápola and Igoa2014), and Czepielewski et al. (Reference Czepielewski, Massuda, Goi, Sulzbach-Vianna, Reckziegel, Costanzi and Gama2015) reported low and high functioning separately; Lahera et al. (Reference Lahera, Montes, Benito, Valdivia, Medina, Mirapeix and Sáiz-Ruiz2008) and Bora et al. (Reference Bora, Vahip, Akdeniz, Gonul, Eryavuz, Ogut and Alkan2007) reported psychotic and non-psychotic; van Gorp et al. (Reference van Gorp, Altshuler, Theberge, Wilkins and Dixon1998) reported alcohol dependence and no-alcohol dependence; Dittmann et al. (Reference Dittmann, Hennig-Fast, Gerber, Seemüller, Riedel, Emanuel Severus and Grunze2008) reported BD1 and BD2; Rosa et al. (Reference Rosa, Magalhães, Czepielewski, Sulzbach, Goi, Vieta and Kapczinski2014) reported four levels of functioning, ranging from high functioning to being unable to maintain personal self-care; Hasse-Sousa et al. (Reference Hasse-Sousa, Martins, Petry-Perin, de Britto, Scheibe, Bücker and Czepielewski2024) reported with and without suicide attempts; Jones et al. (Reference Jones, Fernandes, Husain, Ortiz, Rajji, Blumberger and Mulsant2023) reported cognitively impaired and not; Yang et al. (Reference Yang, Li, Fu, Wang, Liu, Chen and Liu2024) reported drug-naive and long-term medication. In total, 95 groups were included, with 4404 BD and 4037 HC included in the meta-analysis.
As each group included data from multiple cognitive tests, a total of 349 effect sizes were calculated: 16 groups included data on general cognitive functioning, 32 on premorbid IQ, 83 on executive functioning, 52 on working memory, 65 on verbal memory, 19 on visuo-spatial memory, and 82 on attention/processing speed.
Characteristics of included studies
Tables 1 and 2 present demographic and clinical information regarding each study, respectively.
Table 1. Demographic characteristics of studies

Table 2. Illness type, severity, and functioning of participants in each study group

Demographic
53.47% (SD = 17.63) of BD and 52.92% (SD = 15.34) of HC participants were female. The mean age was 42.53 (SD = 11.03) and 41.04 (SD = 11.40) for BD and HC, respectively, while the mean number of education years was 13.00 (SD = 5.42) for BD and 13.21 (SD = 2.75) for HC.
Illness severity
The mean and median duration of illness (in months) were 181.97 (SD = 83.44) and 173.10 (Q1 = 130.98, Q3 = 234.36, IQR = 103.38), respectively. The mean age of BD onset was 26.83 years (SD = 5.71). The mean number of episodes was 9.47, with 6.46 (SD = 4.38) depressive episodes and 4.66 (SD = 3.51) (hypo)manic episodes. The mean number of hospitalisations and percentage of participants with a history of psychosis were 2.49 (SD = 1.50) and 53.34% (SD = 24.77), respectively.
Medication
8.56% of BD participants were taking no psychotropic medication, 66.24% were on mood stabilisers, 50.69% on lithium, 47.49% on anticonvulsants, 50.25% on antipsychotic medication, 24.04% on benzodiazepines, and 30.27% on antidepressants.
Functional outcome
38.20% of BD were unemployed, compared to 22.01% of HC. Functioning Assessment Short Test (FAST) scores (M = 24.92, SD = 9.98) indicated a moderate level of functional impairment in BD.
Quality assessment
Three of the 75 included studies were of very good quality, 46 of good quality, 23 of fair quality and three of poor quality, with low sample size, lack of comparability between BD and HC in age and years of education, and absence of structured interviews being the primary reasons for reduced quality. Supplementary Materials C and D present a detailed breakdown of the quality assessment of cross-sectional and longitudinal studies, respectively.
Meta-analysis
Meta-analyses for each domain are presented in Table 3, with forest plots in Figures 2 and 3. Negative Hedges’ g effect sizes indicate worse performance in BD versus HC. Group differences were statistically significant in all domains. Funnel plots and Egger’s and Begg’s test statistics are presented in Supplementary Material F, showing no indication of publication bias.
Table 3. Results from meta-analyses and meta-regressions

Note: Predictors of domain-specific cognitive functioning are reported (measures of effect are r scores). *p < .05, **p < .01. p values are adjusted following Benjamini–Hochberg (BH) correction.

Figure 2. Forest plots showing the main effect of group (BD vs. HC) for general cognitive functioning, premorbid IQ, and executive function.

Figure 3. Forest plots showing the main effect of group (BD vs. HC) for verbal memory, visuo-spatial memory, working memory, and attention/processing speed.
The effect size between BD and HC in general cognitive functioning was significant after BH correction (Hedges’ g = −0.58, 95% CI: −0.79, −0.37, p < .01 [number of study groups (k) = 16, I² = 74.45%]). The largest domain-specific effect size was for verbal memory (Hedges’ g = −0.70, 95% CI: −0.79, −0.60, p < .01 [k = 64, I² = 67.87%]), followed by executive function (Hedges’ g = −0.69, 95% CI: −0.78, −0.60, p < .01 [k = 83, I² = 77.82%]), visuo-spatial memory (Hedges’ g = −0.68, 95% CI: −0.83, −0.53, p < .01 [k = 18, I² = 37.47%]), attention/processing speed (Hedges’ g = −0.64, 95% CI: −0.75, −0.54, p < .01 [k = 80, I² = 82.73%]), and working memory (Hedges’ g = −0.61, 95% CI: −0.74, −0.49, p < .01 [k = 67, I² = 74.58%]). A smaller effect size between groups was found for premorbid IQ (Hedges’ g = −0.24, 95% CI: −0.36, −0.12, p < .01 [k = 32, I² = 60.74%]).
Associations of cognitive performance
Data from all 95 groups were included in the meta-regression analyses. Results are presented in Table 3. Prior to BH correction, there was a significant association between higher premorbid IQ and less impairment in working memory, β = .78, p < .01, less impairment in verbal memory, β = .51, p < .05, and less impairment in attention/processing speed, β = .51, p < .05. These associations did not remain significant following BH correction; the only association that remained significant was between more years of education and lower impairment in verbal memory, β = .066, adjusted p < .05.
Higher premorbid IQ was associated with fewer manic episodes, β = −.054, p < .05. Better executive functioning was associated with longer duration of current remission, β = −.023, p < .05. Higher working memory was associated with lower antipsychotic use, β = −.0071, p < .05. Better verbal memory was associated with a BD type 2 diagnosis, β = −.0052, p < .05, and a lower number of hospitalisations, β = −.12, p < .05. Lower visuo-spatial memory was associated with higher antidepressant use, β = −.43, p < .05. Better processing speed was associated with a lower number of hospitalisations, β = −.0077, p < .05.
Sensitivity analysis
Sensitivity analyses were conducted, removing studies that utilised ‘undefined’ cognitive assessments. The largest effect size between BD and HC in cognitive performance was for executive function (Hedges’ g = −.71, CI: −.79, −.63 [number of study groups = 79, I² = 69.78%]); followed by working memory (Hedges’ g = −.61, CI: −.74, −.49 [number of study groups = 49, I² = 74.72%]); and attention/processing speed (Hedges’ g = −.64, CI: −.74, −.54 [number of study groups = 78, I² = 81.89%]).
Discussion
The current meta-analysis examined generalised and domain-specific cognitive functioning in euthymic BD, updating previous reviews and examining a wider range of associative factors (Bourne et al., Reference Bourne, Aydemir, Balanzá-Martínez, Bora, Brissos, Cavanagh and Goodwin2013; Mann-Wrobel, Carreno & Dickinson, Reference Mann-Wrobel, Carreno and Dickinson2011). Cognitive performance was impaired to a similar degree across all domains studied, including executive functioning, verbal memory, attention, visuo-spatial memory, and general cognitive functioning, consistent with an earlier meta-analysis (Bourne et al., Reference Bourne, Aydemir, Balanzá-Martínez, Bora, Brissos, Cavanagh and Goodwin2013). Impairment in premorbid IQ was lower yet statistically significant, partially supporting both neurodevelopmental and neurodegenerative theories. Cognitive performance remained largely unaccounted for by clinical and demographic variables, despite possible cumulative effects of these factors (Tsapekos, Strawbridge, Cella, Wykes, & Young, Reference Tsapekos, Strawbridge, Cella, Wykes and Young2021).
Cognitive decline in euthymic bipolar disorder
The significant impairment in premorbid IQ partially supports a neurodevelopmental trajectory for at least a proportion of people. However, the extent of premorbid impairment was substantially smaller than in other domains at the group level, indicating possible neuroprogressive decline for another subgroup. This is consistent with a model suggesting cognitively distinct trajectories within the BD population (Millett & Burdick, Reference Millett and Burdick2021). Some longitudinal evidence indicates a decline in a subgroup in BD (up to 48%) (Hinrichs et al., Reference Hinrichs, Easter, Angers, Pester, Lai, Marshall and Ryan2017). Other reviews indicate that some studies find no longitudinal decline (Bora & Özerdem, Reference Bora and Özerdem2017; Martino et al., Reference Martino, Samamé, Ibañez and Strejilevich2015). Nevertheless, longitudinal follow-up is often short (averaging 1–5 years), which may not be sufficient to detect decline (Millett & Burdick, Reference Millett and Burdick2021). Populations in our systematic review had a mean illness duration longer than 17 years, indicating established illness (Kim et al., Reference Kim, Seo, Yun, Jung, Park, Lee and Bahk2015). Although general cognitive functioning was the second least impaired domain, small differences and large heterogeneity warrant caution in assuming that these results support evidence of greater impairment in specific domains (Bourne et al., Reference Bourne, Aydemir, Balanzá-Martínez, Bora, Brissos, Cavanagh and Goodwin2013).
Associations with cognitive impairment in euthymic bipolar disorder
Premorbid IQ explained considerable variance (i.e., large coefficient) in several cognitive domains, including working memory (β = .78), verbal memory (β = .51), and attention/processing speed (β = .51), although these did not remain significant following BH correction. Premorbid IQ did not significantly predict variance in other domains or in general cognitive functioning, which is likely explained by a lack of studies that reported data on both premorbid IQ and those domains. This was particularly the case in general cognitive functioning, where only three studies reported both, leading to non-significance, although the coefficient was large (β = .86) (Tsapekos et al., Reference Tsapekos, Strawbridge, Mantingh, Cella, Wykes and Young2020).
Furthermore, considerable variation was left unexplained in cognitive domains, indicating the importance of determining associative factors other than premorbid IQ. Meta-regressions did not indicate any demographic or clinical predictors of cognitive performance, potentially warranting focus on other variables, or determining alternative methods to better detect the association of these variables. After correction, only more years of education significantly predicted better verbal memory in BD compared to HC, which is perhaps unsurprising as verbal memory skills are acquired early in development, which is reflected in years of early education (Schneider, Knopf, & Sodian, Reference Schneider, Knopf, Sodian, Schneider and Bullock2010). Nevertheless, the coefficient of the association was small (β = .066).
Despite our a priori expectation that multiple demographic and clinical factors would accentuate cognitive deficits, only years of education was identified as a significant moderator of cognitive performance. This possibly reflects methodological limitations inherent to study-level meta-regression. When study-level effect sizes are regressed on cohort means, such as the average manic-episode count, genuine within-person associations are vulnerable to ecological bias and may be attenuated or reversed once data are aggregated across heterogeneous samples (Pollet, Stulp, Henzi, & Barrett, Reference Pollet, Stulp, Henzi and Barrett2015). This bias is further compounded by variability in how primary studies operationalised each predictor, collinearity among illness-history indicators, and the loss of statistical power that accompanies covariates reported by only a subset of included investigations. Collectively, these factors are liable to skew relationships documented at the study level, possibly leaving only the modest association between educational attainment and verbal memory observable at the meta-analytic level.
Clarifying whether psychosis history, lithium exposure, episode burden and the remaining hypothesised variables genuinely moderate cognitive outcomes will therefore require participant-level methodologies. Individual-data or federated mega-analyses, together with harmonised prospective cohorts, will permit multilevel modelling that partitions variance within individuals, within studies, and between studies, thereby maximising statistical power while minimising ecological bias (Wakefield, Reference Wakefield2009). These approaches are best suited to delineate the clinical and demographic determinants of cognitive trajectories in euthymic bipolar disorder.
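The ecological bias described above can be illustrated with a small, entirely hypothetical example in which the within-study association between episode count and cognitive score is negative in every study, yet the association between study means is positive. All names and numbers below are invented for illustration and are not drawn from the included studies.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def within_and_between_slopes(studies, within_slope):
    """Return (list of within-study slopes, slope across study means).

    Each hypothetical study has its own baseline score; inside a study,
    score = baseline + within_slope * episodes.
    """
    within, mean_x, mean_y = [], [], []
    for study in studies:
        x = study["episodes"]
        y = [study["baseline"] + within_slope * e for e in x]
        within.append(ols_slope(x, y))
        mean_x.append(sum(x) / len(x))
        mean_y.append(sum(y) / len(y))
    return within, ols_slope(mean_x, mean_y)

# Hypothetical cohorts: those with more episodes also have higher baselines
STUDIES = [
    {"baseline": -0.2, "episodes": [2, 4, 6]},
    {"baseline": 0.4, "episodes": [6, 8, 10]},
    {"baseline": 1.0, "episodes": [10, 12, 14]},
]

within, between = within_and_between_slopes(STUDIES, within_slope=-0.1)
print(within, between)
```

In this constructed case, a regression on study means alone would suggest that more episodes accompany better scores, the opposite of the participant-level relationship, which is why participant-level data are needed to separate the two.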
Limitations
Heterogeneity was observed in several domains, particularly attention/processing speed and executive functioning, warranting caution in the interpretation of comparably small differences in effect size between domains. Although the current review benefited from including samples from several countries, differences in level of functioning (ranging from high functioning to being unable to maintain personal self-care), substance use, BD type, and suicidality add to this heterogeneity, which may have obscured an effect.
Heterogeneity may be explained by evidence of cognitive clusters in BD (i.e., severe impairment across domains, selective impairment in specific domains, and intact cognitive functioning) (Burdick et al., Reference Burdick, Russo, Frangou, Mahon, Braga, Shanahan and Malhotra2014; Tsapekos et al., Reference Tsapekos, Strawbridge, Mantingh, Cella, Wykes and Young2020), which could not be addressed in the group-level comparisons we conducted. Nevertheless, our results are broadly in keeping with those of Bourne et al. (Reference Bourne, Aydemir, Balanzá-Martínez, Bora, Brissos, Cavanagh and Goodwin2013), with greater effect sizes observed in this study.
The reliance on group-level comparisons may also explain why some specific clinical and demographic factors were not associated with cognitive impairment here, despite such associations being reported in individual studies.
Another limitation is the categorising of cognitive tests into domains. Although there is strong evidence of the utility of this (Baune & Malhi, Reference Baune and Malhi2015), some assessments use skills from multiple domains, leading to difficulty in the choice of which domain to use for each test. Other ways of assessing cognitive functioning include studies classifying samples in homogeneous cognitive subgroups using data-driven approaches (Burdick et al., Reference Burdick, Russo, Frangou, Mahon, Braga, Shanahan and Malhotra2014; Tsapekos et al., Reference Tsapekos, Strawbridge, Mantingh, Cella, Wykes and Young2020), suggesting different levels of impairment (i.e., no impairment, impairment in certain domains, and impairment across domains).
On a related note, the use of cross-sectional data makes causal inference difficult. Very few longitudinal studies exist in the literature; those available indicate a decline following the first episode (Zanelli, Reference Zanelli2012; Zanelli et al., Reference Zanelli, Mollon, Sandin, Morgan, Dazzan, Pilecka and Reichenberg2019), warranting future focus on longitudinal studies. Nevertheless, the Bipolar Commission found that BD is, on average, diagnosed 9.5 years after illness onset (Goodwin et al., Reference Goodwin, Dolman, Young, Jones, Richardson and Kitchen2022), indicating the need for researchers to determine alternative ways of following up individuals at risk of a later BD diagnosis, as decline may already have occurred by the time of diagnosis (Zanelli, Reference Zanelli2012).
Finally, a recent analysis of a large cohort (McCutcheon, Keefe, McGuire, & Marquand, Reference McCutcheon, Keefe, McGuire and Marquand2024) found that cognitive impairment across psychotic disorders (including BD) may be related to risk factor exposure (i.e., different exposure to HC) as opposed to disease-specific effects. This suggests that control groups should be scrutinised more closely for risk factor exposure, which was not possible in the current analysis because many of these exposures were not reported.
Implications and future directions
The high prevalence and severity of impairment across domains of cognitive functioning warrant increased focus on both understanding the nature and nuances of this impairment and increasing the provision of interventions to tackle cognitive difficulties. For the former, longitudinal studies will help delineate subgroups with differential cognitive trajectories. Optimally, these studies will focus on those at high risk of developing BD or first-episode psychosis (Keramatian, Torres, & Yatham, Reference Keramatian, Torres and Yatham2021) and people with first-episode mania (Jauhar et al., Reference Jauhar, Ratheesh, Davey, Yatham, McGorry, McGuire and Young2019). This may facilitate the delivery of targeted interventions at an early stage, with the aim of not only restoring potential deficits but also preventing/slowing further decline in cognitive performance (Miskowiak et al., Reference Miskowiak, Seeberg, Jensen, Balanzá-Martínez, del Mar Bonnin, Bowie and Vieta2022). CR aims to improve cognitive functioning through enhancing metacognitive skills, developing compensatory strategies, and training executive functioning. Initial evidence suggests it may be effective in tackling cognitive difficulties and transferring cognitive gains into functional improvement in euthymic BD (Strawbridge et al., Reference Strawbridge, Tsapekos, Hodsoll, Mantingh, Yalin, McCrone and Young2021; Tsapekos et al., Reference Tsapekos, Strawbridge, Cella, Goldsmith, Kalfas, Taylor and Young2023; Tsapekos, Strawbridge, Cella, Wykes, & Young, Reference Tsapekos, Strawbridge, Cella, Wykes and Young2022). Larger trials are currently underway to assess the efficacy and potential mechanisms of this treatment paradigm (Tsapekos et al., Reference Tsapekos, Strawbridge, Cella, Goldsmith, Kalfas, Taylor and Young2023). However, the intervention has not yet been tested specifically at early stages of the illness, which represents a promising direction for future research.
Conclusion
The present systematic review and meta-analysis indicates moderate (>0.5 SD below the mean of HC) cognitive impairment in specific domains (executive function, working memory, verbal memory, visuo-spatial memory and attention/processing speed) and mild impairment (<0.5 SD below the mean of HC) in general cognitive function, in BD, consistent with previous findings of deficits across domains (Bourne et al., Reference Bourne, Aydemir, Balanzá-Martínez, Bora, Brissos, Cavanagh and Goodwin2013). Comparably lower impairment in premorbid IQ provides some basis for both neurodevelopmental and neuroprogressive hypotheses.
Nevertheless, heterogeneity was high across domains, which may be explained by cognitive clusters in BD (Burdick et al., Reference Burdick, Russo, Frangou, Mahon, Braga, Shanahan and Malhotra2014) and by potentially untested correlates (e.g., schizophrenia polygenic risk score (Ohi et al., Reference Ohi, Nishizawa, Sugiyama, Takai, Fujikane, Kuramitsu and Shioiri2023; Wu et al., Reference Wu, Hsu, Lin, Su, Lin, Chen and Wang2024) and family history (Landau, Raymont, & Frangou, Reference Landau, Raymont and Frangou2003)). Future research should determine reasons for heterogeneity in longitudinal analyses to tailor future treatments, such as CR, for individuals at risk of poor cognitive and functional outcomes.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S0033291725101827.
Author contribution
S.S., concept of review; data collection; analysis; and drafting of manuscript. D.T., concept of review; drafting of manuscript; and final approval. P.S., concept; data collection. W.Z., data collection. M.Z., data collection. E.M., drafting of manuscript; final approval. R.S., concept; drafting of manuscript. R.S., statistical analysis. B.C., concept; statistical analysis. P.G., concept; initial drafting of manuscript. J.Z., concept; initial drafting of manuscript. E.K., concept; initial draft of manuscript. A.R., concept; initial draft; and final approval of manuscript. A.H.Y., concept; drafting of manuscript; and final approval. R.M.M., concept; initial draft; and final approval of manuscript. S.J., concept; initial draft; and final approval of manuscript.
Funding statement
This study was supported by the Economic and Social Research Council (project reference 2437217).
Competing interests
S.S., D.T., P.S., W.Z., M.Z., E.M., R.S., B.C., P.G., J.Z., E.K., and A.R. declare none. R.S. has received honoraria from Janssen in the last 36 months. Allan H. Young: Employed by King’s College London; Honorary Consultant, South London and Maudsley NHS Foundation Trust (NHS UK). Editor of Journal of Psychopharmacology and Deputy Editor, BJPsych Open. Paid lectures and advisory boards for the following companies with drugs used in affective and related disorders: Flow Neuroscience, Novartis, Roche, Janssen, Takeda, Noema Pharma, Compass, AstraZeneca, Boehringer Ingelheim, Eli Lilly, LivaNova, Lundbeck, Sunovion, Servier, Allegan, Bionomics, Sumitomo Dainippon Pharma, Sage, Neurocentrx, Otsuka. Principal Investigator in the Restore-Life VNS registry study funded by LivaNova. Principal Investigator on ESKETINTRD3004: “An Open-label, Long-term, Safety and Efficacy Study of Intranasal Esketamine in Treatment-resistant Depression.” Principal Investigator on “The Effects of Psilocybin on Cognitive Function in Healthy Participants.” Principal Investigator on “The Safety and Efficacy of Psilocybin in Participants with Treatment-Resistant Depression (P-TRD).” Principal Investigator on “A Double-Blind, Randomized, Parallel-Group Study with Quetiapine Extended Release as Comparator to Evaluate the Efficacy and Safety of Seltorexant 20 mg as Adjunctive Therapy to Antidepressants in Adult and Elderly Patients with Major Depressive Disorder with Insomnia Symptoms Who Have Responded Inadequately to Antidepressant Therapy.” (Janssen). Principal Investigator on “An Open-label, Long-term, Safety and Efficacy Study of Aticaprant as Adjunctive Therapy in Adult and Elderly Participants with Major Depressive Disorder (MDD).” (Janssen).
Principal Investigator on “A Randomized, Double-blind, Multicentre, Parallel-group, Placebo-controlled Study to Evaluate the Efficacy, Safety, and Tolerability of Aticaprant 10 mg as Adjunctive Therapy in Adult Participants with Major Depressive Disorder (MDD) with Moderate-to-severe Anhedonia and Inadequate Response to Current Antidepressant Therapy.” Principal Investigator on “A Study of Disease Characteristics and Real-life Standard of Care Effectiveness in Patients with Major Depressive Disorder (MDD) With Anhedonia and Inadequate Response to Current Antidepressant Therapy Including an SSRI or SNR.” (Janssen). UK Chief Investigator for Compass; COMP006 & COMP007 studies. UK Chief Investigator for Novartis MDD study MIJ821A12201. Grant funding (past and present): NIMH (USA); CIHR (Canada); NARSAD (USA); Stanley Medical Research Institute (USA); MRC (UK); Wellcome Trust (UK); Royal College of Physicians (Edin); BMA (UK); UBC-VGH Foundation (Canada); WEDC (Canada); CCS Depression Research Fund (Canada); MSFHR (Canada); NIHR (UK). Janssen (UK) EU Horizon 2020. No shareholdings in pharmaceutical companies. R.M.M. has received honoraria from Viatris, Recordati, and acted as a consultant advisor to Merck, Boehringer, Abbvie. Sameer Jauhar: S.J. has received honoraria for educational talks given for Lundbeck, Janssen, Boehringer-Ingelheim, Recordati, Sunovion. He has sat on an advisory board for Boehringer-Ingelheim, and consulted for LB Pharmaceuticals. He has sat on panels for the Wellcome Trust and National Institute for Health and Care Excellence (NICE). S.J. is a Council Member of the British Association for Psychopharmacology (BAP) and Executive Committee member of the Academic Faculty, Royal College of Psychiatrists.