Introduction
Depressive syndromes frequently predate, parallel, or follow the first-episode psychosis (FEP) (Upthegrove et al., Reference Upthegrove, Birchwood, Ross, Brunett, McCollum and Jones2010). Depressive comorbidity in psychotic disorder rates vary depending on criteria and illness stage, and are reported to range from 26% up to 80% (Birchwood, Iqbal, Chadwick, & Trower, Reference Birchwood, Iqbal, Chadwick and Trower2000; Conley, Ascher-Svanum, Zhu, Faries, & Kinon, Reference Conley, Ascher-Svanum, Zhu, Faries and Kinon2007; Cotton et al., Reference Cotton, Lambert, Schimmelmann, Mackinnon, Gleeson, Berk, Hides, Chanen, McGorry and Conus2012; Upthegrove et al., Reference Upthegrove, Birchwood, Ross, Brunett, McCollum and Jones2010). These symptoms may stem from neurobiological processes inherent to psychosis (Alexandros Lalousis et al., Reference Alexandros Lalousis, Wood, Reniers, Schmaal, Azam, Mazziota, Saeed, Wragg and Upthegrove2023; Häfner et al., Reference Häfner, Maurer, Trendler, an, Schmidt and Könnecke2005), as a result of the dysphoric effects of antipsychotic medication (Voruganti & Awad, Reference Voruganti and Awad2004), or as a psychological response to the debilitating and life-changing impact of psychosis (Upthegrove, Marwaha, & Birchwood, Reference Upthegrove, Marwaha and Birchwood2017). As a result of all these factors, ~80% of psychosis patients experience a major depressive episode over the course of their illness (Upthegrove et al., Reference Upthegrove, Birchwood, Ross, Brunett, McCollum and Jones2010). These rates far exceed the lifetime prevalence of major depressive episodes in the general population, reported to be from 14% up to 24% (Bromet et al., Reference Bromet, Andrade, Hwang, Sampson, Alonso, de Girolamo, de Graaf, Demyttenaere, Hu, Iwata, Karam, Kaur, Kostyuchenko, Lépine, Levinson, Matschinger, Mora, Browne, Posada-Villa and Kessler2011; Davis et al., Reference Davis, Coleman, Adams, Allen, Breen, Cullen, Dickens, Fox, Graham, Holliday, Howard, John, Lee, McCabe, McIntosh, Pearsall, Smith, Sudlow, Ward and Hotopf2020).
Studying depressive symptoms in patients with psychosis is crucial due to the association between comorbid depression and poor clinical outcomes and quality of life of individuals at risk for psychosis (Solmi et al., Reference Solmi, Soardo, Kaur, Azis, Cabras, Censori, Fausti, Besana, Salazar de Pablo and Fusar-Poli2023) and FEP patients (Conley et al., Reference Conley, Ascher-Svanum, Zhu, Faries and Kinon2007). Suicidal behavior remains a critical and life-threatening concern among patients with psychotic disorders, with those who have comorbid depression being at the highest risk (McGinty & Upthegrove, Reference McGinty and Upthegrove2020). As a result, clinical guidelines recommend monitoring coexisting conditions in psychosis, including depression (National Institute for Health and Care Excellence, 2014).
Although there is no clear consensus on treatment practices, recent evidence suggests that coadministering antidepressants with antipsychotics can enhance the remission of depressive symptoms in psychosis, potentially improving clinical outcomes without substantially increasing the burden of side effects (Barnes et al., Reference Barnes, Drake, Paton, Cooper, Deakin, Ferrier, Gregory, Haddad, Howes, Jones, Joyce, Lewis, Lingford-Hughes, MacCabe, Owens, Patel, Sinclair, Stone, Talbot and Yung2019; Gregory, Mallikarjun, & Upthegrove, Reference Gregory, Mallikarjun and Upthegrove2017). Current ongoing clinical trials are testing whether the addition of an antidepressant to antipsychotic medication following FEP is clinically useful in the prevention of post-psychotic depression (PPD) (Palmer et al., Reference Palmer, Griffiths, Watkins, Weetman, Ottridge, Patel, Woolley, Tearne, Au, Taylor, Sadiq, Al-Janabi, Major, Marriott, Husain, Katshu, Giacco, Barnes, Walters and Upthegrove2023). At the same time, evidence suggests that some FEP patients with comorbid depression will remit from depressive symptoms with an antipsychotic treatment alone (Phahladira et al., Reference Phahladira, Asmal, Lückhoff, du Plessis, Scheffler, Kilian, Smit, Buckle, Chiliza and Emsley2021). There is also historical, often conflicting, evidence that adjunctive antidepressants may worsen psychotic symptoms, suggesting that some FEP patients with comorbid depression will remit (Kramer et al., Reference Kramer, Vogel, DiJohnson, Dewey, Sheves, Cavicchia, Litle, Schmidt and Kimes1989; Lehman, Lieberman, Dixon, & Association, Reference Lehman, Lieberman, Dixon and Association2004; Preda, MacLean, Mazure, & Bowers, Reference Preda, MacLean, Mazure and Bowers2001). As a result, there is no consensus on the optimal strategy in the triage and management of depressive episodes before, during, and after the FEP, as well as in later psychotic stages (Barnes et al., Reference Barnes, Drake, Paton, Cooper, Deakin, Ferrier, Gregory, Haddad, Howes, Jones, Joyce, Lewis, Lingford-Hughes, MacCabe, Owens, Patel, Sinclair, Stone, Talbot and Yung2019).
In this context, artificial intelligence offers a promising avenue to help address this challenge by identifying FEP patients at risk of experiencing recurrent depressive episodes following initiation of antipsychotic medication, and who may, therefore, benefit from adjunctive treatments. Despite significant advances in the application of artificial intelligence to prognostication in psychosis (Coutts, Koutsouleris, & McGuire, Reference Coutts, Koutsouleris and McGuire2023; Saboori Amleshi et al., Reference Saboori Amleshi, Ilaghi, Rezaei, Zangiabadian, Rezazadeh, Wegener and Arjmand2025), existing tools primarily focus on the prediction of remission defined using the core positive symptoms of psychosis (Coutts et al., Reference Coutts, Koutsouleris and McGuire2023), and other outcome measures, such as functioning (Koutsouleris et al., Reference Koutsouleris, Kahn, Chekroud, Leucht, Falkai, Wobrock, Derks, Fleischhacker and Hasan2016, Reference Koutsouleris, Kambeitz-Ilankovic, Ruhrmann, Rosen, Ruef, Dwyer, Paolini, Chisholm, Kambeitz, Haidl, Schmidt, Gillam, Schultze-Lutter, Falkai, Reiser, Riecher-Rössler, Upthegrove, Hietala, Salokangas and Consortium2018), and quality of life (Beaudoin, Hudon, Giguère, Potvin, & Dumais, Reference Beaudoin, Hudon, Giguère, Potvin and Dumais2022). Prognostication of depressive episodes following FEP is critical, as patients who experience these episodes tend to have a poorer recovery and a higher risk of psychotic relapse (Guerrero-Jiménez, Carrillo de Albornoz Calahorro, Girela-Serrano, Bodoano Sánchez, & Gutiérrez-Rojas, Reference Guerrero-Jiménez, Carrillo de Albornoz Calahorro, Girela-Serrano, Bodoano Sánchez and Gutiérrez-Rojas2022; Jääskeläinen et al., Reference Jääskeläinen, Juola, Korpela, Lehtiniemi, Nietola, Korkeila and Miettunen2018). Furthermore, the high comorbidity between psychosis and depression, along with numerous transdiagnostic findings that bridge these disorders across various biopsychosocial levels (Alexandros Lalousis et al., Reference Alexandros Lalousis, Wood, Reniers, Schmaal, Azam, Mazziota, Saeed, Wragg and Upthegrove2023), underscores the clinical importance of expanding the application of AI to study and predict depressive symptoms in psychosis. To our knowledge, no large-scale multivariate pattern recognition algorithms have been trained to predict comorbid depressive episodes and symptom trajectories in FEP and externally validated in independent cohorts.
The current work fills this gap by utilizing two large, multisite, and longitudinal clinical trials of minimally antipsychotic-treated FEP patients to predict depressive symptom trajectories, including the presence of post-psychotic depressive episode. These trials included the European First-Episode Schizophrenia Trial (EUFEST; ISRCTN68736636) and the Recovery After an Initial Schizophrenia Episode Early Treatment Program (RAISE-ETP; NCT01321177). The datasets comprised sociodemographic, clinical, treatment-related, quality-of-life, blood samples, and neurocognitive measures. Our objective was to predict future depression symptomatology in FEP patients by analyzing baseline variables using machine learning, to assess the prognostic specificity of these models for depressive symptoms and episodes, rather than negative symptoms, and to study the underlying patient and treatment characteristics that predict good versus poor depression outcomes.
Methods
The Supplementary Methods section extensively details the methods of this study. Here, we provide a summary of the fundamental steps.
Study design and data processing
Data from the EUFEST and RAISE-ETP trials were used. The study designs are detailed in previous publications (Fleischhacker, Keet, & Kahn, Reference Fleischhacker, Keet and Kahn2005; Kane et al., Reference Kane, Schooler, Marcy, Correll, Brunette, Mueser, Rosenheck, Addington, Estroff and Robinson2015b). Briefly, written informed consent was obtained from all participants. EUFEST recruited 498 patients (ages 18–40 years) meeting the Diagnostic and Statistical Manual of Mental Disorders, 4th Edition criteria for schizophrenia-spectrum disorders between 2002 and 2006 from 50 centers in 14 European countries and Israel. Participants received 1 year of treatment with antipsychotics, and outcomes included retention, psychotic symptoms assessed with the Positive and Negative Syndrome Scale (PANSS), depressive symptoms assessed using the Calgary Depression Scale for Schizophrenia (CDSS), side effects, and blood-based biomarkers like insulin, glucose, high-density lipoprotein, low-density lipoprotein, and triglycerides. RAISE-ETP recruited 404 patients (ages 15–40 years) from 34 sites across 21 US states between 2010 and 2012. Patients were allocated to either usual community care or the NAVIGATE program, with primary outcomes including quality of life, psychotic symptoms (PANSS), depressive symptoms (CDSS), and analogous blood-based biomarkers. A detailed record of the data domains and their corresponding collection dates in both studies is provided in Supplementary Tables S1 and S2. The trials complied with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2013. All procedures involving patients were approved by the local ethics committees of the participating centers. Since this study is a secondary analysis of previously collected and anonymized data, additional ethical approval was not required.
In EUFEST, we restricted the study to 320 patients for whom outcomes at the 4-week, 6-month, and 12-month visits were available. In RAISE-ETP, we restricted the study to 234 patients for whom outcomes at the 6- and 12-month visits were available. In both cases, patients were also excluded when ≥20% of the baseline variables were missing. Included and excluded patients showed no significant differences in baseline characteristics for either sample (see Supplementary Tables S5 and S6). A full description of the data extraction is provided in Supplementary Methods. A preregistration of the study can be found on the Open Science Framework platform (osf.io/r95b2/).
Statistical analyses
All group-level statistical analyses were performed using the SciPy library (1.10.0) with Python 3.11. The reported p-values are two-sided. First, univariate statistical methods were used to assess the differences between samples and classification groups. Two-sample t-tests were used to compare continuous variables, while the χ 2 test was used for categorical variables. All p-values were corrected for multiple comparisons using the Benjamini–Hochberg false discovery rate correction. For statistical analysis of effects in time series, we used mixed-design analysis of variance for comparisons involving a single between-subjects factor and mixed-effects linear regression models when considering multiple between-subjects factors. Statistical significance was defined at a threshold of α = 0.05.
Machine learning analyses
Model development and validation were guided using the Transparent Reporting of a Multivariate Prediction Model for Individual Prognosis or Diagnosis guidelines. We used the MATLAB-based (2023b, MathWorks Inc.) machine learning software NeuroMiner version 1.2 preprocess features (scaling, pruning, and imputation of missing values), train, cross-validate, interpret, externally validate, and test the specificity of our models. Linear support vector machine (SVM) classifiers and regressors were trained and tested in a repeated, nested cross-validation framework with five outer and inner folds and permutations (see Supplementary Methods). In addition to these cross-validation experiments where patients were pooled across sites, we performed leave-site-out cross-validation analyses to test the geographic transportability of our models across the sites of each trial (see Supplementary Table S9).
Regression models were trained to predict the absolute change in CDSS scores between baseline and follow-up. Classifiers were trained using two different sets of binary labels:
-
1. A label indicating a positive or negative change (irrespective of change of magnitude) in CDSS from baseline to the 6- and 12-month follow-up (±ΔCDSS, improvers vs. deteriorators).
-
2. A classification of patients based on whether they experienced a post-psychotic depressive episode (±PPD) from baseline up to 6 months, with a depressive episode defined according to the CDSS guidelines (CDSS total ≥ 7).
For all regression models, the mean squared error was used as an optimization metric. For ±ΔCDSS classifiers, balanced accuracy (BAC) was used as a metric. To address the high imbalance in sample sizes between the ±PPD groups, we utilized an enhanced BAC metric (see Supplementary Methods).
Initially, models were trained independently with all the available data on the EUFEST or RAISE-ETP samples to predict ΔCDSS (regression) and ±ΔCDSS (classification) at 6- and 12-month follow-ups, as well as PPD (classification). These models were not externally validated due to the lack of a replication sample with identical variables. To overcome this limitation, we trained and validated the respective new models using only the harmonized variables available in both samples. Specifically, we trained on either EUFEST or RAISE-ETP patients as a discovery sample, reported the out-of-training (OOT) performance, and then validated the respective models in the external cohort using out-of-cross-validation (OOCV). A summary of the data processing and design of the analysis is given in Figure 1. Label permutation analysis (1,000 permutations) at discovery and external validation was conducted to assess the statistical significance of the models using previously described methods (Golland & Fischl, Reference Golland, Fischl, Taylor and Noble2003). In addition, models were trained and validated using a condensed set of predictive individual variables with a selection probability over 50% to see if a streamlined tool maintains performance while improving clinical scalability. Models were also retrained without blood-based variables and with only biological variables to evaluate their contribution to predictions.

Figure 1. Schematic design of the data processing and analysis design of the study.
Note: Schematic design of data processing and analysis design. Please refer to the Supplementary Methods for a detailed explanation. Abbreviations: CDSS, Calgary Depression Scale for Schizophrenia; EUFEST, European First-Episode Schizophrenia Trial; FEP, first-episode psychosis; HDL, high-density lipoprotein; LDL, low-density lipoprotein; OOCV, out of cross-validation; OOT, out of training; PPD, post-psychotic depression; RAISE-ETP, Recovery After an Initial Schizophrenia Episode Early Treatment Program; SVM, support vector machine.
Post hoc analyses
To evaluate model prediction consistency, we assessed the correlation of SVM decision scores and classification agreement between OOT and OOCV predictions for the same patients. Receiver operator characteristic (ROC) curves and their area under the curve were used to visually compare model performances. We evaluated the models’ ability to distinguish between negative and depressive symptoms by testing models trained on depressive symptom labels to predict negative symptom labels. We also tested whether PPD models predict positive symptom non-remission to assess whether the pattern underlying depressive episodes also generalizes to non-remission outcomes. This model specificity analysis included significance testing using 1,000 label permutations (see Supplementary Table S11). Model calibration at the OOT and OOCV levels was assessed using calibration curve analysis and measuring the expected calibration error (ECE; see Supplementary Figure S11). In addition, the potential net benefit of utilizing model predictions for treatment decisions, compared to treating all patients or no patients at different decision thresholds, was assessed using decision-curve analysis (see Supplementary Figure S12). To identify robust features, we calculated selection probabilities for models trained on EUFEST and RAISE-ETP samples, and visually compared features selected across both datasets. To evaluate the performance difference of models in the RAISE-ETP patients relative to EUFEST patients, we examined potential differences between both samples using inferential statistics (see Supplementary Methods).
Results
Group descriptives and comparisons
Patients across both clinical trials differed significantly in several sociodemographic and psychopathological variables (Supplementary Table S7); RAISE-ETP patients were younger (mean [SD] age: 23.90 [5.27] vs. 25.93 [5.60]), predominantly male (73.50% vs. 56.56%), had a higher body mass index (BMI; 27.06 [6.85] vs. 22.04 [3.17]), lower PANSS positive (18.84 [5.22] vs. 23.34 [6.16]), PANSS general (20.09 [5.30] vs. 44.53 ([10.77]) and CDSS scores (4.16 [3.79] vs. 5.2 [4.85]), and higher rate of antidepressant prescriptions (33.33% vs. 2.18%).
A summary of baseline sociodemographic and clinical statistics of patients grouped by the three classification labels (± ΔCDSS at 6 and 12 months, and ± PPD) is given in Table 1. In RAISE-ETP, +ΔCDSS patients at 6 months had a significantly higher BMI. In addition, +PPD patients were significantly more likely to be female. In EUFEST, +PPD patients had higher PANSS general symptoms and a different diagnostic distribution, with a higher rate of schizoaffective disorder. CDSS scores at baseline significantly differed in patients grouped by all classification labels in both EUFEST and RAISE-ETP samples.
Table 1. Differences in baseline variables across all classification labels in patients from the EUFEST sample and the RAISE-ETP sample

Abbreviations: BMI, body mass index (calculated as weight in kilograms divided by square of height in meters); CDSS, Calgary Depression Scale for Schizophrenia; CGI, clinical global impression; EUFEST, European First-Episode Schizophrenia Trial; PANSS, Positive and Negative Syndrome Scale; RAISE-ETP, Recovery After an Initial Schizophrenia Episode Early Treatment.
Machine learning analyses
Initially, we utilized all the available variables in the EUFEST and RAISE-ETP samples independently to train the linear SVMs (see Supplementary Results). Then, we reran our SVM regression and classification training pipelines with the harmonized 123 variables available across both samples. We trained the models using the EUFEST as a discovery sample and applied the models to RAISE-ETP as an OOCV sample, and vice versa. The performance metrics (OOT and OOCV) for the two regression and three classification labels are shown in Table 2. Label permutation testing (see Supplementary Methods) was used to test the significance of model predictions. EUFEST-trained classifiers predicted ±ΔCDSS at the 6-month follow-up with an OOT-BAC of 75.09% [2.31] (P < .001, ECE = 12%) and an OOCV-BAC of 66.26% (P < .001, ECE = 6%), at the 12-month follow-up with an OOT-BAC of 77.61% (P < .001, ECE = 14%) and an OOCV-BAC of 68.78% (P < .001, ECE = 11%), and ±PPD with an OOT-BAC of 69.01% (P < .001, ECE = 13%) and an OOCV-BAC of 59.15% (P < .001, ECE = 15%). RAISE-ETP-trained classifiers predicted ± ΔCDSS at the 6-month follow-up with an OOT-BAC of 66.26% (P < .001, ECE = 6%) and an OOCV-BAC of 76.35% (P < .001, ECE = 15%), at the 12-month follow-up with an OOT-BAC of 67.88% (P < .001, ECE = 11%) and an OOCV-BAC of 73.05% (P = 0.026, ECE = 13%), and ±PPD with OOT BAC of 67.67% (P < .001, ECE=12%) and an OOCV-BAC of 58.92% (P < .001, ECE = 12%). Thus, the discriminative performance of the ±ΔCDSS models was consistently higher in the EUFEST sample compared to the RAISE-ETP sample, independently of whether the EUFEST sample was used as a discovery or validation sample. This effect was not observed in models predicting PPD. In addition, the accuracy of models predicting ±ΔCDSS was higher at the 12-month follow-up. These higher performances are evident in the ROC curves shown in Figure 2a. Decision-curve analysis of models at the OOT and OOCV level (Supplementary Figure S12) shows that models predicting ±ΔCDSS provide a clear net benefit at higher threshold probabilities. With the ±PPD label, the net benefit is only evident when models are trained in the EUFEST cohort.
Table 2. Performance metrics of regressors and classifiers in discovery and out-of-sample validation

Abbreviations: AUC-ROC, area under the receiver operating characteristic curve; BAC, balanced accuracy; EUFEST, European First-Episode Schizophrenia Trial; FN, false negative; FP, false positive; NPV, negative predictive value; OOCV, out of cross-validation; OOT, out of training; PPV, positive predictive value; PSI, Prognostic Summary Index; RAISE-ETP, Recovery After an Initial Schizophrenia Episode Early Treatment Program; sens, sensitivity; spec, specificity; TN, true negative; TP, true positive.
Note: The EUFEST OOCV performance corresponds to models trained on RAISE-ETP and validated on EUFEST, whereas the EUFEST OOT performance reflects the performance of models trained and evaluated within the EUFEST cohort. Similarly, the RAISE-ETP OOCV performance pertains to models trained on EUFEST and validated on RAISE-ETP, while the RAISE-ETP OOT performance pertains to models trained and evaluated within the RAISE-ETP cohort.

Figure 2. Predictive performance and predictive features of ±ΔCDSS and ±PPD classifiers.
Note: The performance of classifiers trained using the EUFEST or RAISE-ETP samples was evaluated at the OOT and OOCV levels using ROC curve analysis. This analysis was conducted for models predicting the ±ΔCDSS label at 6 months (a1), at 12 months (a2), and the ±PPD label (a3). In addition, we assessed the robustness of the selected features by visualizing the selection probability of each harmonized feature when the model was trained with either cohort for the ±ΔCDSS label at 6 months (b1), at 12 months (b2), and the ±PPD label (b3). Abbreviations: AUC, area under the curve; CDSS, Calgary Depression Scale for Schizophrenia; CGI, clinical global impression; EUFEST, European First-Episode Schizophrenia Trial; OOCV, out of cross-validation; OOT, out of training; PANSS, Positive and Negative Syndrome Scale; PPD, post-psychotic depression; ROC, receiver operating characteristic; RAISE-ETP, Recovery After an Initial Schizophrenia Episode Early Treatment Program. Note: The EUFEST OOCV performance corresponds to models trained on RAISE-ETP and validated on EUFEST, whereas the EUFEST OOT performance reflects the performance of models trained and evaluated within the EUFEST cohort. Similarly, the RAISE-ETP OOCV performance pertains to models trained on EUFEST and validated on RAISE-ETP, while the RAISE-ETP OOT performance pertains to models trained and evaluated within the RAISE-ETP cohort.
Significant predictors and effect direction were determined by the feature selection probability and the cross-validated ratio (p > 0.5 and CVR±3; see Supplementary Methods). Overall, severity of psychopathology (PANSS and CDSS scores), mental and physical health satisfaction, job satisfaction, living alone, blood cholesterol and triglycerides, pulse, substance use, neurocognitive performance (executive functioning), haloperidol treatment, and diagnosis of schizoaffective disorder were selected by the regression and classification models to predict depressive symptoms (Supplementary Figures S5–S8).
To test the specificity of our models for depressive outcomes, we assessed whether models trained on depressive symptoms also predicted changes in negative symptoms and positive symptom non-remissions (Supplementary Table S11). To assess the clinical applicability of these predictors, each model was retrained with a condensed variable set by using variables that were individual items (not total scores) and had a selection probability of 50% or higher. We statistically compared the predictions of models using the condensed set and all harmonized variables (Supplementary Table S18) and observed that BAC was consistent for both. In addition, we examined the consistency of feature selection independently across the two samples for models trained separately in each cohort (Figure 2b and Supplementary Figure S9). Severity of psychopathology scores (PANSS and CDSS scores), general health satisfaction, living alone, blood cholesterol, and triglycerides were consistently selected irrespective of the training sample. Finally, we evaluated the consistency of model predictions and assessed the contribution of biological variables and history of depression (in the EUFEST sample) to model predictions (see Supplementary Results).
Misclassification analyses
We studied the lower predictive performance of our models in the RAISE-ETP patients. To this end, we first studied the influence of treatment assignment and antidepressant prescription at baseline on depression courses in RAISE-ETP (see Supplementary Results). We then evaluated the interactions between treatment arm, antidepressive prescription, and model misclassification. We examined the trajectories of CDSS scores stratified by treatment program (CC and NAVIGATE), and antidepressant treatment (±AD; Figure 3a). We identified higher model misclassification in patients with a prescribed antidepressant for the ±ΔCDSS (6 months) label and trending toward significance for the ±PPD label (Figure 3b). For the ±PPD label, antidepressant prescription was higher for patients with a predicted and/or observed poor outcome (Figure 3c). For ±ΔCDSS (6 months) labels, patients with a predicted good but observed bad outcome, and vice versa, were more likely to be prescribed antidepressants. We calculated the SVM score distributions, cumulative distributions, and cumulative model errors for patients with and without antidepressant prescriptions and in either treatment program (Figure 3d,e and Supplementary Figures S14 and S15). Models largely did not discriminate between patients with different treatments, except for the ±PPD model, which was more likely to predict PPD in patients with a prescribed antidepressant. In addition, the cumulative error analysis of decision scores showed that for the ±PPD label, the NAVIGATE program had significantly more misclassifications in patients where a negative outcome was predicted (+PPD) but a positive outcome was observed (−PPD; Figure 3e4). A similar effect was seen in patients with a prescribed antidepressant (Figure 3e2), although in this case, the differences in error distributions trended toward significance.

Figure 3. Comparison of CDSS scores, models’ decision scores, and misclassifications for patients in the NAVIGATE and community-based treatment, and by antidepressant prescription.
Note: (a) In the RAISE-ETP sample, we compared the trajectories of CDSS scores for patients in the NAVIGATE treatment program (NAVIGATE), the community-based care (CC), without an antidepressant prescription at baseline (-AD), and with an antidepressant prescription at baseline (+AD). In addition, (b) we compared classifier misclassifications by antidepressant prescription and (c) the percentage of antidepressant prescriptions by type of misclassification. Furthermore, we compared the distributions of SVM decision scores of classifiers by antidepressant prescription and treatment plan for the ±ΔCDSS label at 6 months (d1 and d3), at 12 months (see Supplementary Figure S14), and the ±PPD label (d2 and d4). To study the type of misclassifications by treatment, we compared the cumulative misclassifications by antidepressant prescription and treatment plan for the ±ΔCDSS label at 6 months (e1 and e3), at 12 months (see Supplementary Figure S15), and the ±PPD label (e2 and e4). Abbreviations: CDSS, Calgary Depression Scale for Schizophrenia; KS, Kolmogorov–Smirnov statistic.
Discussion
Using robust cross-validation, we built linear SVM models to predict changes in depressive symptoms (±ΔCDSS) at 6 and 12 months and depressive episodes (±PPD) in the first 6 months in FEP patients. Models trained on EUFEST and RAISE-ETP data predicted depressive symptoms with high sensitivity and specificity, and depressive episodes with moderate specificity but low sensitivity. Decision-curve analysis showed that the clinical applicability of the models would be best optimized at higher thresholds, targeting patients predicted to have the poorest outcomes as defined by depressive symptoms. Key predictors included baseline symptoms, neurocognitive performance, quality of life, and blood markers (e.g. prolactin in EUFEST and alanine transferase in RAISE-ETP). A harmonized subset of variables (n = 123) was used to retrain models, achieving consistent predictive performance across datasets. Harmonized features, such as symptom severity, quality of life, and blood markers (triglycerides and cholesterol), were validated in both samples. Cross-validation between EUFEST and RAISE-ETP models showed robust geographic transportability of predictive signatures between multiple European countries with highly different healthcare systems and the United States. These signatures were specific for depressive symptoms (CDSS), and not PANSS-based negative symptoms or positive non-remission. Finally, we demonstrated that the models maintain their predictive performance when using a highly condensed set of top predictors.
These results have potential clinical implications. First, the findings validate recently developed risk prediction models showing the high prognostic value of baseline symptom severity and history of depression for the prediction of future depressive episodes, both in the context of comorbid psychosis (Banerjee et al., Reference Banerjee, Wu, Bingham, Marino, Meyers, Mulsant, Neufeld, Oliver, Power, Rothschild, Sirey, Voineskos, Whyte, Alexopoulos and Flint2024; Carter et al., Reference Carter, Banerjee, Alexopoulos, Bingham, Marino, Meyers, Mulsant, Neufeld, Rothschild, Voineskos, Whyte and Flint2025) and in major depressive disorder independently (Song et al., Reference Song, Qian, Sui, Greiner, Li, Greenshaw, Liu and Cao2023; Teutenberg et al., Reference Teutenberg, Stein, Thomas-Odenthal, Usemann, Brosch, Winter, Goltermann, Leenings, Konowski, Barkhau, Fisch, Meinert, Flinkenflügel, Schürmeyer, Bonnekoh, Thiel, Kraus, Alexander, Jansen and Kircher2025). Second, the consistent model selection of cognitive deficits and poor quality of life metrics shows the relevance of the patients’ baseline functioning in predicting future depression-related outcomes. Cognitive deficits are a core feature of psychosis and significantly contribute to the impaired functioning of patients with psychosis, and although not extensively studied, it is known that affective symptoms in psychosis are often associated with more pronounced cognitive deficits (McCutcheon, Keefe, & McGuire, Reference McCutcheon, Keefe and McGuire2023). In addition, comorbid depression and general psychopathology have long been associated with lower quality of life and the subjective perception of poorer mental and general health (Watson et al., Reference Watson, Zhang, Rizvi, Tamaiev, Birnbaum and Kane2018), suggesting that a poorer perception of health and quality of life may be influenced by depressive mood in psychosis, and vice versa (Narvaez, Twamley, McKibbin, Heaton, & Patterson, Reference Narvaez, Twamley, McKibbin, Heaton and Patterson2008).
Some biological markers were also consistently selected in both samples (triglycerides and cholesterol blood levels) and independently (prolactin levels in EUFEST and BMI and alanine transaminase levels in RAISE-ETP), adding a key biological dimension to the models’ prediction of depressive symptoms. Excluding these markers increased models’ bias toward predicting the predominant group (patients in remission from depression), evidenced by the higher imbalance between sensitivity and specificity. This suggests that simple blood-based biological variables are necessary to ensure the unbiased and accurate detection of depressive symptom changes in FEP when combined with clinical and quality-of-life predictors. Patients with psychiatric disorders demonstrate immuno-metabolic dysfunction and metabolic comorbidity compared to the normal population (Han, Reference Han2022; Peng, Lin, Lee, & Chen, Reference Peng, Lin, Lee and Chen2024). This association can arise due to antipsychotic side effects (Mondelli et al., Reference Mondelli, Anacker, Vernon, Cattaneo, Natesan, Modo, Dazzan, Kapur and Pariante2013). However, previous studies have shown metabolic impairment in drug-naïve first-episode patients (Garrido-Torres et al., Reference Garrido-Torres, Rocha-Gonzalez, Alameda, Rodriguez-Gangoso, Vilches, Canal-Rivero, Crespo-Facorro and Ruiz-Veguilla2021), and predate illness and prodromal onset, potentially resulting from shared pro-inflammatory predisposition (Palmer et al., Reference Palmer, Morales-Muñoz, Perry, Marwaha, Warwick, Rogers and Upthegrove2024). In the context of comorbid depression and psychosis, serum lipids and prolactin have also been found to be associated with affective symptoms (Gohar et al., Reference Gohar, Dieset, Steen, Mørch, Iversen, Steen, Andreassen and Melle2019). In keeping with the previous literature, the results of this work indicate that cost-effective monitoring of abnormal lipid and metabolic hormonal levels may provide a valuable biomarker for predicting and managing depressive symptoms in FEP patients. In addition, trying to incorporate lifestyle and psychosocial interventions in first-episode patients might have an impact on the prevalence of depressive symptoms in psychosis (Schmitt et al., Reference Schmitt, Maurus, Rossner, Röh, Lembeck, von Wilmsdorff, Takahashi, Rauchmann, Keeser, Hasan, Malchow and Falkai2018).
Importantly, we observed that models predicting depressive symptom trajectories (ΔCDSS) performed better in EUFEST than in RAISE-ETP, both at the OOT and OOCV levels. Baseline comparisons revealed significant differences between the cohorts in sociodemographics, psychopathology, and antidepressant prescriptions. Notably, RAISE-ETP in the NAVIGATE program (Mueser et al., Reference Mueser, Penn, Addington, Brunette, Gingerich, Glynn, Lynde, Gottlieb, Meyer-Kalos and McGurk2015) showed a sharper decline in depressive symptoms. The same effect was also observed in patients without an antidepressant prescription. Since antidepressant prescriptions were not randomized, unlike the treatment program, this effect might be an indication of bias reflecting clinicians’ tendency to prescribe antidepressants to patients they predict to be at higher risk of poor affective outcomes. Patients in the NAVIGATE program were generally prescribed fewer antidepressants, indicating a confounding effect between prescriptions and treatment plans. Interaction analyses revealed a trend, indicating that NAVIGATE was more effective in reducing depressive symptoms among patients prescribed antidepressants at baseline. Model misclassifications for the PPD label were notably higher at negative SVM decision scores for NAVIGATE-treated patients receiving antidepressants. Specifically, these errors involved predicting negative outcomes (+PPD) when positive outcomes (−PPD) were observed. This suggests that these interventions effectively reduce the likelihood of poor outcomes, thus improving treatment results for patients who received a poor prognosis by our models. The positive effect of psychosocial interventions on depressive symptoms in FEP was observed by the RAISE-ETP group (Kane et al., Reference Kane, Robinson, Schooler, Mueser, Penn, Rosenheck, Addington, Brunette, Correll, Estroff, Marcy, Robinson, Meyer-Kalos, Gottlieb, Glynn, Lynde, Pipes, Kurian, Miller and Heinssen2015a) and other trials (Hagen, Nordahl, & Gråwe, Reference Hagen, Nordahl and Gråwe2005; Ruggeri et al., Reference Ruggeri, Bonetto, Lasalvia, Fioritti, de Girolamo, Santonastaso, Pileggi, Neri, Ghigi, Giubilini, Miceli, Scarone, Cocchi, Torresani, Faravelli, Cremonese, Scocco, Leuci, Mazzi and Group2015). In addition, growing evidence suggests that the coadministration of antidepressants is effective in the treatment of depressive symptoms in FEP (Barnes et al., Reference Barnes, Drake, Paton, Cooper, Deakin, Ferrier, Gregory, Haddad, Howes, Jones, Joyce, Lewis, Lingford-Hughes, MacCabe, Owens, Patel, Sinclair, Stone, Talbot and Yung2019; Dondé, Vignaud, Poulet, Brunelin, & Haesebaert, Reference Dondé, Vignaud, Poulet, Brunelin and Haesebaert2018). Despite this evidence, these additional therapies are not standard recommendations in clinical practice (Dondé et al., Reference Dondé, Vignaud, Poulet, Brunelin and Haesebaert2018). Various international treatment guidelines recommend waiting for primary antipsychotics to take effect during the acute phase of the illness, since some FEP patients will remit from depressive symptoms with only antipsychotic medication. Recently published guidelines, including the British Association of Psychopharmacology, suggest that coadministration of antidepressants might be justified but should be considered after careful assessment of the risks of polypharmacy (Barnes et al., Reference Barnes, Drake, Paton, Cooper, Deakin, Ferrier, Gregory, Haddad, Howes, Jones, Joyce, Lewis, Lingford-Hughes, MacCabe, Owens, Patel, Sinclair, Stone, Talbot and Yung2019). This work adds weight to the body of evidence supporting both integrative talking therapies and antidepressants for the treatment of depressive symptoms in FEP. Furthermore, it introduces a potential risk prediction tool that could identify those patients unlikely to remit with antipsychotic medication alone, complementing clinicians’ optimism bias in predicting poor mental health outcomes (Koutsouleris et al., Reference Koutsouleris, Dwyer, Degenhardt, Maj, Urquijo-Castro, Sanfelici, Popovic, Oeztuerk, Haas, Weiske, Ruef, Kambeitz-Ilankovic, Antonucci, Neufang, Schmidt-Kraepelin, Ruhrmann, Penzel, Kambeitz, Haidl and Consortium2021). Ultimately, by identifying patients who are at risk for prolonged depressive symptoms early on, this tool has the potential to improve treatment trajectories, reduce the burden of illness, and enhance patients’ quality of life by preventing periods of insufficient treatment and associated comorbidities.
Limitations
This work has limitations. First, we restricted the analysis to patients in EUFEST and RAISE-ETP who attended the follow-ups used as outcome data. Although we found no baseline differences in included and excluded patients, this decreased the sample size of our analysis to 323 EUFEST patients and 234 RAISE-ETP patients. While we employed robust nested cross-validation procedures, the moderate sample size may still pose a risk of overfitting. In addition, patients who attended the follow-ups may represent a more engaged subgroup of patients, potentially introducing a selection bias. Furthermore, although multinational and multisite, the EUFEST and RAISE-ETP are largely Western samples, which may limit the generalizability of our findings to non-Western populations. While no significant differences in model misclassifications were observed across racial and ethnic groups in the more diverse RAISE-ETP sample (see Supplementary Table S17), further research is necessary to validate these results in broader, more representative populations. Additionally, we used CDSS scores to identify depressive episodes in psychosis, with the limitation that CDSS questionnaire is not a diagnostic interview; a score above 7 suggests moderate to severe depression within 2 weeks and has an 82% specificity and 85% sensitivity for predicting the presence of a major depressive episode in schizophrenia. This limitation should be noted, as the CDSS may not fully reflect the intricacies of a depression diagnosis.
Conclusion
The results of this study demonstrated and independently validated that depressive symptoms and episodes are predictable from baseline information following psychosis onset. Notably, our study also demonstrated that certain concomitant treatments, such as the NAVIGATE treatment program and antidepressants, may be useful in the treatment of depressive symptoms in FEP. Given that integrative treatments require significant resources and a careful assessment of their benefit-risk ratios in the individual patient, the clinical algorithms presented here could serve as scalable stratification tools for the early identification of patients at the highest risk of impactful depressive symptoms and episodes following a conventional treatment for psychosis. Future research should focus on refining these models, validating them at different stages of psychosis, exploring their use with diverse data sources – such as electronic health records or genetic data – and investigating the causal mechanisms underlying depressive symptoms in psychosis.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S0033291725100950.
Acknowledgments
This article uses data collected in the EUFEST and RAISE-ETP trials. RAISE-ETP was supported by an NIMH contract and accessed via the NIMH data archive (NDA). This study is registered in the NDA under DOI: 10.15154/tddz-zt18. EUFEST is funded by the European Foundation for Research in Schizophrenia, which in turn has received funds from AstraZeneca, Pfizer, and Sanofi-Synthelabo. The authors gratefully acknowledge the support of the original funders.
Funding statement
This research received no specific grant from any funding agency, commercial or not-for-profit sectors.
Competing interests
P.A.L. has received honoraria for talks presented at educational meetings organized by Boehringer Ingelheim outside of the submitted work. The remaining authors declare no competing interests exist.