Background
Mental disorders are common, costly, and burdensome (GBD Mental Health Collaborators, 2022). Understanding the interplay between genetic predisposition, lifestyle, and environmental factors in mental health disorders is crucial for developing effective preventative, diagnostic, and management approaches (Wykes et al., 2023). Such research requires large sample sizes, such as those offered by population-based cohorts. The UK Biobank (UKB) is a large-scale, prospective cohort with extensive data on over 500,000 individuals aged between 40 and 69 at recruitment (2006–2010), providing a valuable resource for researchers (Sudlow et al., 2015). It includes questionnaire data, biological samples, and genetic profiles, supplemented by regularly updated linkage to National Health Service (NHS) health records. This allows clinical diagnoses to be used as risk factors and outcomes in longitudinal studies (Allen et al., 2024).
UKB has undergone several ‘enhancements’, involving additional participation from volunteers since the baseline assessment. These include activity monitoring, multimodal imaging, and online questionnaires. Two mental health questionnaires, MHQ1 (primarily completed in 2016) and MHQ2 (primarily completed in 2022), were introduced (Davis et al., 2020, 2025). Intended primarily to provide phenotypes of lifetime common mental disorders, these questionnaires include measures of depression, mania, generalized anxiety, post-traumatic stress disorder (PTSD), and alcohol use disorder, with panic disorder and eating disorders added in MHQ2. Additionally, transdiagnostic features were assessed, such as psychotic experiences, self-harm, quality of life, and resilience. The questionnaires also collected information on known mental health risk factors, including childhood and adult adverse events, cannabis use, loneliness, and social isolation, with COVID-19 exposure added in MHQ2. Figure 1 presents a timeline of UKB’s resource accumulation before and after baseline, while Table 1 outlines available data sources for key mental disorder topics.

Figure 1. UK Biobank timeline showing timing (but not completeness) of enhancements.
Table 1. Mental disorder topics covered in data sources within the UKB

Abbreviations: ADHD, attention-deficit and hyperactivity disorder; ASD, autism spectrum disorder; AUD, alcohol use disorder; HS, help-seeking question ‘Have you ever seen a general practitioner (GP) for nerves, anxiety, tension or depression?’; HWB, Health and Wellbeing questionnaire; o, current; QoL, quality of life; SR, self-report (open question); pSR, prompted self-report (yes/no); SUD, substance use disorder; x, lifetime.
As demonstrated by Figure 1 and Table 1, data on mental health and mental disorders in UKB come from multiple sources, including record linkages, baseline data collection, and enhancements requiring a repeat participant contact, like imaging and questionnaires. Navigating this rich yet complex data landscape is challenging, and no single resource comprehensively describes the breadth of available data. This article aimed to address this gap.
We aimed to describe UKB in a way that will be valuable both for those planning to use UKB mental health data and for those who wish to know more in order to evaluate UKB research. We provide an overview of the research conducted using UKB between 2016 and 2023, and discuss the data that have been used, trends, and considerations. This overview can also serve as a resource for future researchers to identify and build upon previous findings. We explore UKB’s strengths and limitations as an observational data source and signpost to further advice.
Methods
Selection and extraction
We conducted a review of published research utilizing UKB to investigate mental health themes. The UKB website, which records published papers that have been submitted as part of the data access agreement, served as a primary source. However, since the website was not updated between November 2022 and December 2023, we supplemented our search for 2022–2023 papers using Google Scholar. Specifically, for papers published in 2022–2023, we identified studies citing one of three UKB mental health methodology papers: Smith et al. (2013) or Davis et al. (2019, 2020).
Papers were selected if they had a primary focus on mental health or mental disorders, as suggested by the title, and utilized UKB data or results in some form for primary research (i.e. not reviews). Selection was conducted by one researcher (L.M.), with another (K.A.S.D.) providing guidance on uncertainty and indexing. Basic characteristics of the papers were summarized using a standardized template by L.M. and A.S.K.
Classification
Four researchers were responsible for checking and categorizing the papers (K.A.S.D., A.Z., N.T.M., and S.R.C.), with paper allocation based on individual expertise where possible. The guidelines used for categorization, iteratively produced by the researchers, are shown in Supplementary Tables S1–S3. The first divide was into those papers focused on mental health disorder topics (MD papers) and those addressing broader subjects. The MD papers included specific disorders, compound mental disorders, and transdiagnostic features. A compound mental disorder category was applied in cases where broad mental health measures were used. For example, the study by Batty, Deary et al. (2020), which hypothesized a link between mental health disorders and COVID-19 hospitalization, categorized participants based on a baseline question about whether they had ever seen a psychiatrist.
For MD papers, we documented the data used to define the disorder or trait, and the secondary topic, which was one of ‘general/methodological’, ‘environmental factors’, ‘biomarkers’, ‘genetics’, ‘physical health’, or ‘other’. Papers with the secondary topic of inflammatory biomarkers were classified under biomarker research rather than physical health. Papers using genetic methods to explore relationships between mental health and another factor were classified by the factor under investigation rather than as genetic. For example, the study by Coleman et al. (2020) used genetics to examine depression as the mental health topic and trauma as the secondary (environmental) factor. Those papers that were categorized under physical health were further divided into ‘cardiovascular and metabolic’, ‘pain and sensory’, ‘inflammatory’, ‘infectious’, ‘general health’, or ‘other’ subcategories.
Analysis
To identify patterns, we developed a matrix summarizing MD papers by their primary mental health topic, secondary topic, and the UKB data types used. In Tables 2 and 3 and Figure 2, mental disorder topics are shown as rows.
Table 2. Matrix of papers using UK Biobank to research mental health disorder topics classified by mental health topic (rows) and secondary topic (columns) with reference numbers

Note: See Supplementary Material, Section 1.2 for bibliography.
Abbreviations: ADHD, attention-deficit and hyperactivity disorder; ASD, autism spectrum disorder; AUD, alcohol use disorder; eBaseline, enhanced baseline; MHQ, first mental health questionnaire; Rpt, repeat; SUD, substance use disorder.
Table 3. Matrix of UK Biobank published research (papers) that involve both mental health and physical health aspects, sorted into mental health topic and physical health topic, highlighting the role of genetic epidemiological techniques (genetic tx).

Note: Reference numbers refer to bibliography in supplementary materials, Section 1.2.
Abbreviations: CardioM, cardiometabolic; CVD, cardiovascular disorder; dx, disorder(s); IBD, inflammatory bowel disorder; PTSD, post-traumatic stress disorder; tx, technique(s); VTE, venous thromboembolism.

Figure 2. Numbers of papers categorized by mental disorder topic showing the data source used for ascertainment. Data sources, including ‘multiple’ are exclusive, with each paper represented only once. Nil data source occurs when disorder is not specified in UKB, for example, when used as control cohort or GWAS results from UKB used on external cohort. Note: ADHD, attention-deficit and hyperactivity disorder; ASD, autism spectrum disorder; ED, eating disorder; Multiple sp, multiple specific disorders cf ‘mental disorder’ which is one variable for multiple disorders; Other transdiagnostic, psychotic experiences, anhedonia, mood instability and other; Anxiety+, generalized anxiety disorder and/or PTSD; UD, substance or alcohol use disorder.
In Table 2 (and its more detailed counterpart, Supplementary Table S16), columns represent secondary topics. To assign papers to columns in Table 2, we allocated a single secondary topic, prioritizing physical health, environmental factors, biomarkers, and genetics, in that order. This approach maximized sensitivity for studies on physical comorbidities while maintaining specificity for purely genetic aspects of mental health disorders. For example, the Coleman et al. (2020) study is positioned at the intersection of depression and environment in the matrix. It defined depression using MHQ1 lifetime disorder status, so its data type is classified as MHQ in Supplementary Table S16 and Figure 2. The paper is also summarized in the relevant Supplementary Table S4.
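The priority rule described above can be sketched in a few lines of Python. This is an illustration only, not the procedure the reviewers actually ran: the function name, the string labels, and the fallback to ‘general/methodological’ when no prioritized topic applies are all assumptions introduced here.

```python
# Priority order as stated in the text; the string labels are illustrative.
PRIORITY = ["physical health", "environmental factors", "biomarkers", "genetics"]


def assign_secondary_topic(candidate_topics, fallback="general/methodological"):
    """Return a single secondary topic for a paper.

    candidate_topics: set of topics a paper touches on.
    fallback: label used when no prioritized topic applies (an assumption;
    the text does not state how such papers were ordered).
    """
    for topic in PRIORITY:
        if topic in candidate_topics:
            return topic
    if "other" in candidate_topics:
        return "other"
    return fallback


# A paper using genetic methods to study trauma (an environmental factor)
# is placed in the environmental column, not the genetics column.
print(assign_secondary_topic({"genetics", "environmental factors"}))
```

Under this rule a paper is counted under genetics only when none of the three higher-priority topics applies, which is why the genetics column captures purely genetic studies.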
Figure 2 and Supplementary Table S16 detail the data sources used to define mental health topics, again assigning one value per paper, with papers using more than one data source described under ‘multiple’. In Table 3, we applied the same method to studies exploring comorbidities, with mental health disorder topics as rows and the physical health subcategories as columns. The use of genetic techniques was identified from the relevant papers by the researchers and is summarized for the comorbidity papers in Table 3.
Results
We identified 480 papers within the broad mental health domain published between 2016 and 2023. Of these, 338 were classified under mental health disorder topics (MD papers), summarized in Supplementary Materials Appendix 2.1: Supplementary Tables S4–S9. The remaining papers, covering broader mental health topics, are listed in Supplementary Tables S10–S15.
MD papers are shown in Table 2, categorized by disorder (rows) and secondary topics (columns). Supplementary Table S16 also shows the data used. Depression was the most studied disorder, with 140 papers examining depression alone, 25 combining it with anxiety, and 14 with bipolar affective disorder. There were at least 25 depression-related papers in each of the environmental, biomarker, genetic, and physical health topic categories. The next most common disorder classification was ‘multiple disorders’, with 51 papers (66 when including compound mental disorder definitions). Schizophrenia was the second most researched single disorder.
Ninety-two papers from Table 2 had biomarkers as a secondary topic, including 41 related to brain imaging and 19 examining sleep or activity patterns. Seventy-three papers explored environmental factors as a secondary topic, with 27 considering both environmental and genetic influences. Fifty-five papers had genetics as their sole secondary topic, the majority of which (40 of 55) focused on depression alone or in combination with other disorders. Eighteen papers explored secondary topics such as cognition, reproductive traits, and personality traits, categorized as ‘other’; 12 focused solely on the mental disorders they studied, categorized as ‘general’.
The pattern of data use across the mental health disorder topics is illustrated in Figure 2. Eighty-eight of 338 papers (26%) used MHQ1 alone, and the next most popular source (72 papers) was genetic risk, broadly defined as any genetic instrument proxy, such as polygenic risk scores (PRS). Sixty-eight used the baseline questions alone, including 30 papers using the universal baseline questions, 26 using enhanced baseline questions, and 12 using a repeat of the baseline questionnaire items at the time of imaging. Linkage was used as the only data source by 18 papers. Seventy-eight papers used multiple or combined UKB sources.
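As a worked check of these counts, the short Python sketch below converts each reported count into a share of the 338 MD papers. The dictionary keys are shorthand labels introduced here for illustration; the counts themselves are taken from the paragraph above.

```python
TOTAL_MD_PAPERS = 338

# Counts as reported in the text; keys are shorthand labels, not UKB terms.
source_counts = {
    "MHQ1 alone": 88,
    "genetic risk": 72,
    "baseline questions alone": 68,
    "linkage alone": 18,
    "multiple or combined": 78,
}

# Breakdown of the 68 baseline-only papers given in the text.
baseline_breakdown = {"universal": 30, "enhanced": 26, "repeat at imaging": 12}


def share(count, total=TOTAL_MD_PAPERS):
    """Percentage of all MD papers, rounded to the nearest whole number."""
    return round(100 * count / total)


shares = {name: share(n) for name, n in source_counts.items()}
itemised = sum(source_counts.values())
not_itemised = TOTAL_MD_PAPERS - itemised

for name, pct in shares.items():
    print(f"{name}: {source_counts[name]} papers ({pct}%)")
print(f"Papers in categories not itemised in the text: {not_itemised}")
```

The five categories named in the text account for 324 of the 338 papers; the remaining 14 fall under categories not itemised in this paragraph (Figure 2 includes, for example, a nil data source).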
The pattern of use of sources across mental health disorder topics partly reflects the availability of data, as shown in Table 1, but also other factors. For instance, for the two most commonly studied single disorders, depression and schizophrenia, the use of genetic risk to define the disorder was markedly different: 14% (20/140) for depression and 69% (16/26) for schizophrenia. Two further schizophrenia papers used UKB as a control cohort for clinical schizophrenia samples. This difference is likely related to the much lower prevalence of schizophrenia in the cohort.
Eighty-seven studies examined both mental health disorders and a physical health aspect, as shown in the matrix in Table 3. The physical health aspect included both disorders (e.g. diabetes) and traits (e.g. blood pressure). The most common physical health category was cardiometabolic (34 of 87), followed by sensory issues and pain, inflammatory disorders, and infections. General health included indicators of overall health or ill health, such as mortality. Among mental health disorders, affective disorders, primarily depression, were the most frequently studied, appearing in 49 of the 87 comorbidity papers. Genetically informed techniques were widely used to investigate the relationships between mental and physical health aspects, with 31 studies (36%) employing such methods, as noted in Table 3.
Discussion
Using a combination of the UKB website and citation checking, we identified 480 papers that used UKB to investigate mental health and related topics from inception to the end of 2023. Of these, 338 papers focused on mental disorders, with the remainder addressing areas such as cognition and personality. Depression was by far the most frequently researched disorder, appearing as the sole focus in 140 papers and commonly included in studies covering multiple disorders. This emphasis aligns with depression’s high lifetime prevalence in UKB, affecting approximately 24% of participants (Davis et al., 2020). Schizophrenia was the second most studied disorder, appearing in 26 papers. However, its much lower prevalence in the cohort (hundreds rather than thousands of cases) meant that studies typically employed different methodologies, often relying on hospital linkage, genetic risk scores, or imaging rather than self-report or questionnaire data.
Several papers explored multiple disorders or transdiagnostic features, with self-harm being the most commonly studied of these. Among the 87 studies addressing physical-mental comorbidity, cardiometabolic illnesses or traits emerged as the most frequent physical health component. Approximately 36% of these comorbidity papers employed genetic techniques to investigate the relationship between physical and mental disorders.
To support future users of the UKB resource, we have provided guidance on navigating the data showcase, sharing code, and handling linked datasets in Appendix 3 of the Supplementary Material. A critical methodological issue concerns how mental disorder variables are defined within UKB. Multiple data sources are available, each with distinct strengths and limitations. Supplementary Table S17 in Appendix 3 summarizes these. Self-reports are prone to recall bias; hospital and general practitioner (GP) linkage data are often insensitive; brief questionnaires may lack diagnostic specificity; and genetic data capture only a fraction of mental health heterogeneity. Davis et al. (2019) demonstrated that prevalence estimates for depression, generalized anxiety disorder (GAD), bipolar disorder, and psychosis vary substantially depending on the data source. Their findings revealed poor overlap between self-reported diagnoses, symptom-based measures like the Composite International Diagnostic Interview - Short Form (CIDI-SF), and hospital records, with the latter identifying only 5–10% of cases detected by other methods. The latest enhancement, MHQ2, introduces longitudinal data, new disorder ascertainment, and additional social factors, thus expanding the scope for future research (https://osf.io/c65t7).
Incorporating multiple data enhancements can enrich research but also narrows the sample size. For instance, while 157,000 participants completed MHQ1 and 169,000 completed MHQ2, only 111,000 completed both. Applying further filters – for example, restricting to those with GP, imaging, or actigraphy data – may reduce the sample size by up to 80%, thereby limiting statistical power, particularly for advanced genetic and longitudinal analyses.
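The attrition described above can be made concrete with a short Python sketch using the rounded completion figures from the text. Applying the ‘up to 80%’ reduction to the 111,000 participants who completed both questionnaires is an assumption made here purely for illustration; the text does not specify which starting sample that figure refers to.

```python
# Rounded completion figures as quoted in the text.
mhq1 = 157_000
mhq2 = 169_000
both = 111_000

# Inclusion-exclusion: participants who completed at least one questionnaire.
either = mhq1 + mhq2 - both
only_mhq1 = mhq1 - both
only_mhq2 = mhq2 - both

# Share of MHQ1 completers who also completed MHQ2.
retained_pct = round(100 * both / mhq1)


def after_reduction(n, reduction_pct):
    """Sample remaining after an illustrative percentage reduction."""
    return n * (100 - reduction_pct) // 100


# Assumption: the worst-case 80% reduction is applied to the 'both' sample.
worst_case = after_reduction(both, 80)

print(f"Completed at least one MHQ: {either:,}")
print(f"MHQ1 completers retained in the 'both' sample: {retained_pct}%")
print(f"Illustrative sample after an 80% reduction: {worst_case:,}")
```

On these figures, requiring both questionnaires retains about 71% of MHQ1 completers, and a further 80% reduction would leave roughly 22,000 participants.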
Selection effects must also be considered. Despite UKB’s structured sampling via NHS primary care, participation was low (around 6% of those invited), leading to a cohort that is disproportionately well-educated, health conscious, and socioeconomically advantaged (Fry et al., 2017). Minoritized ethnic groups are underrepresented, with most participants of White European ancestry. This demographic skew compromises the evaluation of ethnic disparities and limits generalizability. Genetic studies have further restricted diversity by historically excluding non-European ancestry participants, though more recent efforts aim to leverage the available diversity to improve relevance and impact (Carress, Lawson, & Elhaik, 2021; Singh et al., 2025).
These selection effects are particularly relevant to mental health research, as people with mental disorders tend to be underrepresented in low-participation cohorts (Knudsen, Hotopf, Skogen, Øverland, & Mykletun, 2010). Self-selection into enhancements compounds this bias. For example, MHQ1 participants have significantly lower neuroticism scores than the broader UKB cohort, even though higher neuroticism is a known risk factor for common mental disorders (Davis et al., 2025). While UKB’s estimates for common disorders align reasonably well with population-level data (Davis et al., 2020), linkage with secondary mental health records shows that individuals with severe mental illness are significantly underrepresented (Li et al., 2022). Consequently, UKB is not suitable for estimating prevalence, even with statistical weighting. However, analyses focused on causes, mechanisms, and consequences are less susceptible to these biases, and UKB remains highly valuable for such research aims (Allen et al., 2024; Batty, Gale et al., 2020).
Genetic techniques have been central to much of the mental health research using UKB. The cohort’s size and high-quality genotyping support detection of small genetic effects and exploration of gene–environment interplay (Bycroft et al., 2018; Garg et al., 2024). Beyond examining individual single nucleotide polymorphisms (SNPs), researchers have used PRS, Mendelian randomization, and genetic correlations to study complex traits. Notable examples include the use of PRS as a proxy for unmeasured gut microbiota in studies of mental health (Qi et al., 2021), and multiple studies linking schizophrenia PRS to brain structure (Grama et al., 2020; Neilson et al., 2019). Mendelian randomization has helped infer causal directionality, such as identifying depression as a risk factor for inflammatory bowel disease (Luo, Xu, Noordam, van Heemst, & Li-Gao, 2021). Genetic correlation analyses using linkage disequilibrium score regression have illuminated relationships between mood disorders, biological rhythms, and physical health (Chen et al., 2022; Sirignano et al., 2022).
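For readers less familiar with the term, a polygenic risk score is, at its simplest, a weighted sum of an individual’s risk-allele counts, with weights taken from an external genome-wide association study. The Python sketch below illustrates only that core idea. The function name, dosages, and weights are hypothetical values chosen for illustration, and the sketch omits steps that real pipelines typically include (for example, variant selection and standardization); it does not represent the method of any paper cited here.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of allele dosages for one individual.

    dosages: risk-allele counts per variant (0, 1, or 2).
    weights: per-variant effect sizes from an external study (hypothetical here).
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical individual with three variants; all values are made up.
example_dosages = [0, 1, 2]
example_weights = [0.10, -0.05, 0.20]
score = polygenic_risk_score(example_dosages, example_weights)
print(f"Illustrative score: {score:.2f}")
```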
Candidate gene studies have explored mechanisms such as the relaxin pathway in depression (Wong, Arathimos, Lewis, Young, & Dawe, 2023) and cytochrome P450 genotype in relation to antidepressant-related falls (Pronk et al., 2022).
These approaches help triangulate findings, reduce confounding, and strengthen causal inference (Fallin, Duggal, & Beaty, 2016; Lawlor, Tilling, & Davey Smith, 2017; Power et al., 2024). Nonetheless, interpretation must be cautious. Genetic factors account for only a modest proportion of variance in complex human traits (Tanksley, Motz, Kail, Barnes, & Liu, 2019), and the predominance of White European ancestry in UKB’s genomic data again limits generalizability (Carress et al., 2021). Moreover, the benefits of combining multiple enhancements must be weighed against the resultant drop in usable sample size, which can erode statistical power for genetic and other advanced techniques.
Strengths and limitations
This review offers insight into the breadth of mental health research conducted using UKB. In selecting studies, we assumed that all relevant papers would appear on the UKB website and, for 2022–2023, that researchers citing UKB data would reference one of our identified methodology papers. While we sought consistency by assigning each paper a single secondary topic, some studies may have been missed due to categorization constraints. We did not include studies using the second mental health questionnaire (MHQ2), as doing so would have delayed publication; however, this review remains a useful reference point for researchers exploring the MHQs. The summaries provided in the Supplementary Material are intended as a practical guide rather than a comprehensive listing. In the course of the review, we have created a timeline, a comparison of data types, and other resources. We hope that these will inspire high-quality future research and support the interpretation of UKB results.
Conclusions
Our review of mental health disorder research in UKB highlighted common themes and demonstrated the diverse approaches used to study mental health. It is encouraging to see research exploring mental health from multiple perspectives, as a comprehensive understanding of etiology, risk factors, and mechanisms is essential for advancing prevention and treatment strategies (Patel et al., 2018). UKB provides a unique opportunity to investigate many theories within a single resource, including associations with diet, inflammation, brain connectome types, and genotypes. Moreover, it serves as an important resource for replication, triangulation, and sensitivity analyses, helping to reinforce or refine findings from other studies (Sheehan & Didelez, 2020; VanderWeele, 2021).
A deeper understanding of complex multimorbidity, where mental and physical long-term conditions accumulate over time, is crucial for improving population health (Ronaldson et al., 2021). UKB remains a valuable tool for addressing these questions, offering researchers the ability to explore links between mental and physical health.
As UKB continues to grow in both the scope of its data and the volume of research it supports, it remains adaptable to researchers’ evolving priorities and technological advancements. However, this expansion also increases the dataset’s complexity, requiring careful navigation to maximize its potential. To fully leverage the opportunities UKB offers, researchers must continue refining their methodologies while acknowledging their inherent limitations.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S0033291725101359.
Acknowledgement
K.A.S.D. and M.H. are supported by the National Institute for Health and Care Research (NIHR) KCL/South London and Maudsley NHS Trust Biomedical Research Centre. The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care.
Funding statement
K.A.S.D. and M.H. were part-supported by the NIHR KCL/Maudsley Biomedical Research Centre.
Competing interests
The authors declare no conflicts of interest.