Policy Significance Statement
This research highlights the urgent need for enhanced governance frameworks for human genomic data sharing. Existing regulatory mechanisms, like anonymisation and informed consent, fall short in addressing the unique risks associated with genomic data. Policymakers must consider the dual nature of genomic data—both personal and collective—when developing regulations. This article proposes practical measures for genomic data governance informed by the concept of “genomic contextualism,” including the integration of fair interest balancing and comprehensive data lifecycle management. These recommendations aim to protect individuals and underrepresented groups while maximising the scientific and clinical benefits of genomic data.
1. Introduction
Genomic data (specifically, human genomic data as referred to throughout this article) are a valuable asset for advancing genomic research and scientific understanding. They play a crucial role in unravelling the complex mechanisms of diseases and biological processes (Gürsoy, 2020). Genomic analyses capture emergent properties and interactions absent in discrete genetic assessments (Gallagher and Chen-Plotkin, 2018). High-resolution genomic datasets facilitate population-level analyses of evolutionary patterns and genetic adaptations while allowing examination of molecular processes at cellular levels. Unlike single-gene studies, genomic data reveal complex gene–gene interactions and regulatory networks, providing a comprehensive account of how genomic variation relates to human phenotypes (Ritchie et al., 2015). In addition, the temporal stability of genomic data, when combined with other omics data, deepens our understanding of cellular ageing and disease mechanisms (Unger Avila et al., 2024).
For individuals, genomic data are deepening the understanding of disease care and health management. Notably, data from whole-genome sequencing (WGS) deliver more accurate results in the molecular genetic diagnosis of rare and unknown diseases, as well as the identification of actionable cancer drivers (Bagger et al., 2024). Due to the complexity of gene regulatory networks, WGS data outperform exome sequencing in diagnosing rare diseases, establishing WGS as the preferred first-line resource for this purpose (Wojcik et al., 2024). Genomic data also enable comprehensive identification of genetic variation and make it possible to catalogue how such variation contributes to health and disease when combined with environmental and lifestyle factors (Bick et al., 2024).
Beyond clinical applications, genomic data may also help inform critical life-course decisions, such as reproductive planning (Bilkey et al., 2019). This democratisation of personal health information derived from genomic data has the potential to transform individuals from passive healthcare recipients to active participants in their health trajectories.
The significance of genomic data in driving scientific advancements and benefiting individuals, coupled with the improved efficiency and precision of WGS (Park and Kim, 2016; Satam et al., 2023), has been a catalyst for the expansion of the genomic sequencing industry and the accumulation of vast genomic datasets. Notably, these developments, including industry growth and data proliferation, have spurred greater use and development of platforms designed to enable genomic data sharing. Such platforms aim to advance genomic research and maximise the utility and value of existing datasets (Kumuthini et al., 2023).
However, genomic data sharing also gives rise to significant ethical considerations, which have become a subject of debate. This is because it is the data controllers, such as researchers and commercial companies, rather than the individuals who have undergone WGS, that share genomic data with third parties (Gil and Guerreiro, 2024). For instance, 23andMe (www.23andme.com) utilises anonymised data from its substantial customer base to collaborate with research partners, including pharmaceutical companies (Majumder et al., 2021). This practice is contentious because the advantages derived from technological advancements and product innovations based on shared data primarily benefit data controllers or users, while the associated risks and potential harms predominantly impact individuals and communities (Garner and Kim, 2018; Costello, 2022). A vast literature has examined the diverse concerns associated with genomic data sharing, including privacy risks (Bonomi et al., 2020; Gürsoy, 2022; Wan et al., 2022; Myers et al., 2025) and discriminatory practices (Kaiser et al., 2024; Joly et al., 2020).
This article aims to contribute to addressing governance challenges in genomic data sharing. Given that existing legal and regulatory mechanisms for genomic data sharing are insufficient, how can a more equitable governance framework be developed to mitigate the risks of such sharing while balancing its benefits? Specifically, Section 2 analyses the historical development of human genome projects (HGPs) and the increasing accumulation of genomic data, highlighting the importance that nations attach to such data. Section 3 explores the distinctive characteristics of genomic data and justifies the concept of genomic contextualism. Section 4 presents a taxonomy of the diverse risks linked to genomic data sharing, including violations of individual privacy, group-level harms, and bioterrorism threats. Section 5 examines the regulatory frameworks of the European Union (EU) and China, demonstrating that their current data protection mechanisms are insufficient for governing genomic data. Section 6 proposes nuanced policy recommendations grounded in genomic contextualism. Finally, Section 7 summarises the article’s key findings and discusses future research directions related to genomic data sharing.
2. Defining genomic data and tracing the historical accumulation of genomic datasets
In this section, the concept of genomic data and its rapid accumulation are examined. Genomic data refer to human WGS data, whose emergence traces back to the HGP and whose accumulation is inseparable from numerous transnational and national human genome initiatives.
2.1. Defining genomic data
The discovery of the structure of deoxyribonucleic acid (DNA) by James Watson and Francis Crick in 1953 laid the foundation for modern genomics (Mersha, 2024). In 1977, the advent of DNA sequencing technologies paved the way for obtaining complete human genomic data (Sanger et al., 1977). Since that milestone, advancements in sequencing techniques, particularly next-generation sequencing, have revolutionised the field by enabling rapid and cost-effective analysis of entire genomes (Bentley et al., 2008). To date, the final hard-to-sequence segments of the human genome have been mapped, and hundreds of thousands of individuals have undergone WGS (Kaiser, 2021). Archived genomic data also have the potential to act as a lifelong resource for data subjects, supporting repeated reanalysis and reinterpretation over time.
Building on these technological advances, genomic data are obtained through WGS to offer individuals insights into their genetic composition, including predispositions to diseases, ancestry information, and pharmacogenomic insights affecting medication responses (Bonomi et al., 2020). Genome sequencing entails individuals providing biological samples, such as saliva (Martins et al., 2022), and involves the generation of various types of data, including “sequence read data” comprising WGS and whole-exome sequencing (WES) data, as well as data related to single-nucleotide polymorphisms (SNPs) (Belkadi et al., 2015). It is essential to note that raw personal WGS data cannot be meaningfully interpreted on their own; they must be analysed to derive interpreted, comprehensible genomic information. Interpreting genomic data involves aligning sequences with a reference genome, identifying variants relative to that reference, and recording them in the Variant Call Format (VCF) (Paltiel et al., 2023). Consequently, genomic information can be deduced from genomic data in conjunction with external reference data or information (El Emam, 2011). This implies that the more extensive and accurate the external reference data, the more comprehensive and precise the personal genomic information revealed by WGS will be. The value of WGS data will therefore continue to increase as scientific understanding of the genome deepens. Within the scope of this article, WGS data hold particular significance as a foundational form of genomic data and are the primary focus of the investigation.
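The interpretation steps described above (compare a sample with a reference, identify the differences, record them) can be illustrated with a deliberately simplified sketch. It assumes that the sample sequence is already aligned to the reference at the same coordinates and considers only single-base substitutions. The sequences, the chromosome label "chrToy", and the function names are hypothetical; real pipelines rely on dedicated aligners and variant callers rather than code of this kind.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int   # 1-based position, as used in VCF
    ref: str   # base in the reference
    alt: str   # base observed in the sample


def call_substitutions(chrom: str, reference: str, sample: str) -> List[Variant]:
    """Return single-base differences between two pre-aligned, equal-length sequences."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        Variant(chrom, i + 1, r, s)
        for i, (r, s) in enumerate(zip(reference, sample))
        if r != s
    ]


def to_vcf_lines(variants: List[Variant]) -> List[str]:
    """Format variants as tab-separated records under the eight fixed VCF columns."""
    header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
    rows = [f"{v.chrom}\t{v.pos}\t.\t{v.ref}\t{v.alt}\t.\t.\t." for v in variants]
    return [header] + rows


# Hypothetical toy sequences (not real genomic data).
REFERENCE = "ACGTACGTAC"
SAMPLE = "ACGTTCGTAA"

variants = call_substitutions("chrToy", REFERENCE, SAMPLE)
vcf_lines = to_vcf_lines(variants)
print("\n".join(vcf_lines))
```

The output is defined entirely relative to the chosen reference sequence, which is why the quality of external reference data shapes what can be read from an individual genome. A complete VCF file would also carry meta-information lines, which this sketch omits.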
2.2. Tracing human genome projects and genomic data accumulation
To better understand human genomic data, their accumulation, and the significance of their sharing, it is necessary to review the historical development of HGPs.
The famous HGP, launched in October 1990, is a foundational initiative for human genomic research. It required global collaboration and accelerated biomedical research worldwide. To deliver a key component of the HGP, the International Human Genome Sequencing Consortium, an open partnership involving 20 centres across six countries, was formed (International Human Genome Sequencing Consortium, 2004). This consortium ultimately produced a reference human genomic sequence, providing a basis for human genomic research. Notably, the initial reference sequence contained gaps and errors, and it was refined in 2013 and 2019. Most recently, in 2022, the Telomere-to-Telomere (T2T) Consortium released the T2T-CHM13 reference: a complete 3.055 billion–base pair sequence of a human genome (Nurk et al., 2022).
Following the release of the reference human genomic sequence, understanding the relationship between genotype and phenotype became a central goal in biology and medicine. To deepen knowledge of genetic contributions to human health and disease, the International 1000 Genomes Project was established in 2007. Its aim was to sequence the genomes of at least 1000 volunteers from diverse global populations (Devuyst, 2015). The project reconstructed the genomes of 2504 individuals from 26 populations, using a combination of low-coverage WGS, deep exome sequencing, and dense microarray genotyping (Auton et al., 2015). It characterised a broad range of genetic variation: over 88 million variants in total, including 84.7 million SNPs, 3.6 million short insertions/deletions, and 60,000 structural variants, all phased onto high-quality haplotypes (Auton et al., 2015). This resource serves as a benchmark for surveys of human genetic variation and remains a key component of human genomic studies.
Driven by a more than million-fold fall in the cost of WGS (Satam et al., 2023), paired with significant public investment in genomic research, many countries have launched their own HGPs. As of 2019, over 96 major genomic programmes had been initiated to collect, store, share, and use human genomic data and related health data for diverse objectives (Nunn et al., 2019). Key large-scale national and international initiatives include the US All of Us Research Program (The All of Us Research Program Investigators, 2019) and the European “1 + Million Genomes” Initiative (Saunders et al., 2019), each aiming to sequence at least 1 million individuals to inform evidence-based precision medicine (Howley et al., 2025).
Moreover, there are several notable projects aimed at non-European populations, such as the GenomeAsia 100K Project (Wall et al., 2019), China’s Precision Medicine Initiative (Liu et al., 2020), Singapore’s Health for Life in Singapore Study (Wang et al., 2024a), and Nigeria’s 100K Genome Project (Fatumo et al., 2022).
Alongside these HGPs, derivative human genomic data, often stored in national biobanks, have emerged as a transformative resource for understanding human genetic variation and its links to health and disease. These projects and biobanks now serve as critical platforms for advancing genomic research. By integrating high-resolution human genomic data with comprehensive phenotypic, environmental, and clinical datasets, they enable researchers to uncover the genetic basis of diseases, identify novel biomarkers, and develop precision medicine strategies tailored to diverse populations (Lee et al., 2025). For instance, the UK Biobank, a large-scale biomedical database, has recruited approximately 500,000 participants, with over 200,000 whole genomes made available for global access (Kaiser, 2021). Another example is the All of Us Research Program, which had released genomic data for 245,388 participants as of February 2024, with plans to sequence over 1 million individuals (Bick et al., 2024).
Beyond the growing volume of human genomic data collected through public initiatives, private genomic databases from commercial sources are also substantial. Notably, the WGS sector has grown rapidly, particularly since the rise of direct-to-consumer (DTC) genome sequencing enterprises (McGuire et al., 2009). By early 2019, over 26 million individuals globally had reportedly contributed their personal human genomic information to the databases of four leading testing firms (Majumder et al., 2021).
Human genomic data generated by public and private entities are progressively accumulating. They hold the potential to serve diverse purposes and deliver significant value, profoundly shaping scientific research, medical practice, and individuals’ health care. Meanwhile, many genomic researchers, healthcare practitioners, and other stakeholders support human genomic data sharing. Their goal is to fully deliver the benefits of genomic science to the wider human population. A key example is the Global Alliance for Genomics and Health (GA4GH), a global alliance aimed at enabling the responsible sharing of human genomic data (Rehm et al., 2021).
3. The key features of genomic data and genomic contextualism
Having traced the historical development of HGPs and the growing accumulation of human genomic data from both public initiatives and private sources, it is now critical to explore the inherent characteristics of these data and lay the foundation for their regulation.
3.1. The key features of genomic data
While the commercial use of DTC genome sequencing has commodified both the sequencing process and the information it yields, genomic data are far from ordinary. They are uniquely identifying and possess distinct attributes such as predictive capability, immutability, and group impact (Chapman et al., 2023). More specifically, genomic data have a dual nature. On the one hand, they constitute a form of unique personal data, even more unique than genetic data. Genomic data encompass an individual’s complete genetic makeup, specifically referring to the DNA found in normal reproductive cells. Each individual’s genomic data are unique; even the germline genomes of monozygotic twins exhibit distinctions due to early developmental mutations (Jonsson et al., 2021). Consequently, as intact genetic data, genomic data can reveal unique genetic characteristics with a level of specificity not typically found in other forms of biological substances (Tigard, 2019), such as blood and internal organs, or even in other forms of data, including certain personal genetic data (Rahnasto, 2023).
On the other hand, genomic data not only reflect individual characteristics but also serve as collective data, revealing shared familial and ethnic traits (McGonigle, 2019, p. 3). It is noteworthy that the genomic sequences of any two individuals exhibit approximately 99.9 per cent similarity at the nucleotide level (Hartl and Cochrane, 2017, p. 189). Nevertheless, when considering the approximately 3 billion base pairs in the human genome within a reproductive cell, the 0.1 per cent of the human DNA sequence that varies between genomes (equating to around 3 million base pairs) remains a substantial number. The genomic similarity is often more pronounced within ethnic groups, which may display shared genetic traits (Shriver et al., 1997; Lowe et al., 2001; Spielman et al., 2007), while relatives typically exhibit even greater genetic resemblance (Guo, 2008). Consequently, the disclosure of an individual’s genomic data invariably reveals portions of the genetic data of other individuals to whom they are genetically related, including ancestors (Costello, 2022).
In light of these two characteristics, a parallel can be drawn with the concept of “relational privacy,” which recognises the interconnectedness of individuals within social and familial networks, especially regarding genetic data (Entrikin, 2019; Costello, 2022). I propose to understand genomic data as “collective personal data,” reflecting both individual and shared genetic traits within families or populations. This concept helps us capture the dual nature of genomic data as both personal and collective, challenging traditional binary perspectives on privacy and blurring the distinctions between the individual and the collective. This duality may initially seem paradoxical, aligning with the notion of “essentially oxymoronic concepts” (Neuwirth, 2013), but it underscores the need for a more nuanced understanding of the privacy implications of genomic data and of the governance mechanisms that should apply to them.
Furthermore, this inherent complexity of genomic data is one of the fundamental reasons why they cannot be fully anonymised. The very nature of these data ties individuals to their familial and communal genetic identities, making it difficult to separate personal data from collective implications. Even when an individual’s genomic information is de-identified, its collective attributes can still allow others to recognise their data through group databases (Ohm, 2009). Specific personal details like family names and observable characteristics such as skin and eye colour are publicly accessible and can be linked to genomic data (Bonomi et al., 2020). In addition, genomic data remain constant throughout a person’s lifetime. This enduring uniqueness establishes a strong correlation between genomic data and individual identities, making the data susceptible to re-identification through identification and phenotype inference attacks (Altman et al., 2013; Rocher et al., 2019; Bonomi et al., 2020). Consequently, genomic data constitute personal data that cannot be anonymised.
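The linkage risk described above can be made concrete with a minimal sketch. It assumes a hypothetical de-identified research release that retains attributes inferable from genomic data (a predicted surname and a predicted eye colour) and a hypothetical public register that lists the same attributes alongside names. All records, field names, and the `link_records` function are invented for illustration; real attacks and defences are considerably more sophisticated.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Dict[str, str]
Key = Tuple[str, str]

# Hypothetical de-identified release: no names, but quasi-identifiers remain.
RESEARCH_RELEASE: List[Record] = [
    {"sample_id": "S001", "predicted_surname": "Okafor", "eye_colour": "brown"},
    {"sample_id": "S002", "predicted_surname": "Lind", "eye_colour": "blue"},
    {"sample_id": "S003", "predicted_surname": "Lind", "eye_colour": "brown"},
]

# Hypothetical public register (for example, a genealogy listing).
PUBLIC_REGISTER: List[Record] = [
    {"name": "A. Okafor", "surname": "Okafor", "eye_colour": "brown"},
    {"name": "B. Lind", "surname": "Lind", "eye_colour": "blue"},
    {"name": "C. Lind", "surname": "Lind", "eye_colour": "blue"},
    {"name": "D. Moreau", "surname": "Moreau", "eye_colour": "green"},
]


def link_records(release: List[Record], register: List[Record]) -> Dict[str, List[str]]:
    """Map each sample ID to the register names sharing all quasi-identifiers."""
    index: Dict[Key, List[str]] = defaultdict(list)
    for person in register:
        index[(person["surname"], person["eye_colour"])].append(person["name"])
    return {
        rec["sample_id"]: list(index.get((rec["predicted_surname"], rec["eye_colour"]), []))
        for rec in release
    }


candidates = link_records(RESEARCH_RELEASE, PUBLIC_REGISTER)

# A sample counts as re-identified here only when exactly one person matches.
reidentified = {sid: names[0] for sid, names in candidates.items() if len(names) == 1}
print(reidentified)
```

In this toy data, one sample matches a single named person, one matches two candidates, and one matches nobody. Removing names from the release therefore does not by itself prevent identification, and each additional linkable attribute narrows the candidate set further.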
3.2. Genomic contextualism
There has been ongoing debate about whether genetic data are unique and require special treatment, a concept known as “genetic exceptionalism” (Green and Botkin, 2003). Proponents of this idea argue that genetic data possess distinct characteristics, such as heritability, the potential for incidental findings, and complexity, which set them apart from other types of medical data. For instance, a single inconsequential sequence linked to an individual’s identity could potentially reveal genetic information that the person prefers to keep private (Evans et al., 2010). According to this view, such features of genomic data warrant special policies and protections (Green and Botkin, 2003; Evans et al., 2010).
However, other types of medical and biometric data can be equally sensitive and merit similar safeguards (Price and Cohen, 2019; Migliorini, 2023). In response to this critique, the concept of “genomic contextualism” has been proposed as a more nuanced framework for addressing the ethical and policy challenges surrounding genomic data (Garrison et al., 2019b). Genomic contextualism is grounded in a key characteristic of genomic data: their nature as collective personal data.
To fully grasp this framework, it is first necessary to address a common conflation, namely that of genetic data with genomic data. Genetic data pertain to discrete genes or markers and their variants, focusing on isolated DNA segments linked to specific phenotypic expressions (Hartl and Cochrane, 2017, p. 189). The fundamental difference lies in scope and analytical power: genetic data offer targeted insights into specific biological mechanisms, whereas human genomic data provide a holistic context for understanding the integrative functions of an individual’s complete genetic architecture.
Genomic contextualism posits that the significance of genomic data depends on the specific context in which they are used. Rather than applying blanket policies to genomic data as a whole, this approach advocates for policies tailored to the unique circumstances in which such data are utilised, whether in clinical, research, or societal settings (Garrison et al., 2019a; 2019b). This perspective recognises the uniqueness of genomic data but also emphasises the need for flexibility to account for their varying relevance across different contexts and populations, particularly minority and Indigenous groups whose cultural values and ethical concerns might clash with mainstream approaches to the processing of genomic data (Garrison et al., 2019b).
One key area of application for genomic contextualism is data sharing in genomic research. Large-scale genomic research projects and biobanks routinely generate vast amounts of data, prompting discussions about whether genomic data require special protections compared to other types of research data (Murray, 2019). Genomic data share similarities with other sensitive data, such as medical data, in that privacy breaches can cause significant harm. However, genomic data are distinct in that they can be re-identified using demographic information or by cross-referencing other datasets (Rocher et al., 2019). They can also contain sensitive health predictions or genetic ancestry information, raising privacy and ethical concerns (Garrison et al., 2019b). Therefore, legal data protection regulations that rely on a one-size-fits-all model will fail to address the uniqueness of human genomic data.
4. Tripartite risk taxonomy of genomic data sharing
As mentioned in the previous section, the technological evolution facilitating genomic data proliferation has rendered de-identification measures increasingly vulnerable to reversal, compromising the presumed anonymity of genomic information. In addition, genomic data’s distinctive capacity to reveal temporally extensive and communally diffuse information creates vectors for sensitive disclosure. This architecture of vulnerability engenders a tripartite risk taxonomy: individual privacy violations, group-level harms, and bioterrorism threats.
4.1. Individual privacy violations
Sharing genomic data exposes individuals to a wide range of privacy harms, including physical, psychological, autonomy, and discrimination harms (Citron and Solove, Reference Citron and Solove2022). Privacy harms have both subjective and objective dimensions (Calo, Reference Calo2011). Subjective privacy harm relates to the sense of being monitored without consent, leading to distressing mental states, whereas objective privacy harm involves external actions that exploit personal information against an individual’s wishes (Citron and Solove, Reference Citron and Solove2022).
In the context of genomic data, sharing such data can result in physical, psychological, and autonomy harms. For example, sharing genomic data may expose individuals to potential future misuse, leading to a loss of control over their personal data. This loss of control represents a form of autonomy harm (Citron and Solove, Reference Citron and Solove2022). A notable study illustrates this: researchers combined Y-chromosome haplotype analysis with genealogical registry data to predict the surnames of anonymised participants, directly undermining data control (Gitschier, Reference Gitschier2009). Additionally, discrimination harms occur when individuals face unjust differential treatment based on actual or perceived characteristics inferred from their genomic data (Berndt Rasmussen, Reference Berndt Rasmussen2019). These unjust practices restrict individuals’ access to employment, affordable insurance, housing, and other crucial life opportunities (Citron and Solove, Reference Citron and Solove2022). A famous case, Xie v. Human Resources and Social Security Bureau in Foshan City, was reported in China (Kim et al., Reference Kim, Ho, Ho, Athira, Kato, De Castro, Kang, Huxtable, Zwart, Ives, Lee, Joly and Kim2021). In 2009, 31 applicants to the Foshan local government were denied civil service roles solely because they were thalassemia gene carriers (Qiu, Reference Qiu2010). Three of these applicants later filed a lawsuit alleging discrimination. However, in 2010, the Foshan Intermediate People’s Court ruled that rejecting candidates with the thalassemia gene for civil service positions was legal (Kim et al., Reference Kim, Ho, Ho, Athira, Kato, De Castro, Kang, Huxtable, Zwart, Ives, Lee, Joly and Kim2021). Importantly, thalassemia gene carriers are not equivalent to anaemia patients. This case thus clearly demonstrates how genetic discrimination—rooted in inferences from genomic data—can directly undermine individuals’ interests and access to opportunities.
Genomic data are vulnerable to access, sharing, and use by various entities for a range of purposes, which exacerbates the associated risks and complicates mitigation efforts (Haeusermann et al., Reference Haeusermann, Fadda, Blasimme, Tzovaras and Vayena2018; Bonomi et al., Reference Bonomi, Huang and Ohno-Machado2020; Gürsoy, Reference Gürsoy2022). In the private sector, commercial actors often exploit these data for financial gain. For example, Nutrigenomix (https://nutrigenomix.com) uses genetic profiles to develop personalised nutrition services, promoting these via targeted channels like podcasts and health blogs (Gil and Guerreiro, Reference Gil and Guerreiro2024). Such practices contribute to “DNA data marketplaces,” where companies access genomic data to drive research, develop products, and market these to individuals with relevant genetic predispositions (Ahmed and Shabani, Reference Ahmed and Shabani2019). This sensitive information, when exposed publicly, may precipitate social stigmatisation and personal embarrassment. Concurrently, governmental access to genomic data raises significant concerns regarding privacy infringement (Haag, Reference Haag2019), discriminatory practices, surveillance capabilities, and potential abuse of institutional authority (Ram et al., Reference Ram, Guerrini and McGuire2018).
4.2. Group-level harms
Genomic information transcends individual boundaries, generating cascading implications for biological relatives and broader ethnocultural communities, thereby constituting a collective dimension of genomic identity (McGonigle, Reference McGonigle2016). Concern is mounting over the concept of “relational privacy” (Entrikin, Reference Entrikin2019; Costello, Reference Costello2022). The sharing of genomic data can potentially unveil sensitive information about relatives without their explicit consent (McGonigle, Reference McGonigle2019), expanding privacy risks beyond the individual and influencing familial relationships. For example, comparing genomic data among family members can reveal details about their familial ties. In 2018, law enforcement authorities in the U.S. used consumer genomic databases (e.g. GEDmatch) to identify suspects by tracing distant relatives (Erlich et al., Reference Erlich, Shor, Pe’er and Carmi2018; Ram et al., Reference Ram, Guerrini and McGuire2018). The “Golden State Killer” never submitted his DNA to GEDmatch but was identified through a distant cousin’s genomic profile. This case highlights how sharing one individual’s data can compromise relatives’ privacy without their input, underscoring the need to protect both individual and relational privacy in genomic data practices. Consequently, the repercussions of genomic data extend beyond the individual to encompass relational aspects, impacting all parties involved, even in the absence of explicit consent (Costello, Reference Costello2022).
The concept of group risks associated with genomic data focuses on the potential adverse implications that genomic data can have on specific groups. There is a growing apprehension concerning the collective interests and harms linked to genomic data. The sharing of data from a subset of individuals within a group can impinge on the legitimate interests of other group members (Costello, Reference Costello2022). These groups, defined by shared inherited characteristics, may consist of individuals with particular disease susceptibilities or common physical attributes. When integrated with machine learning (ML) or artificial intelligence (AI) analysis, the sharing of genomic data can endanger group interests, resulting in biases, discrimination, and the establishment and perpetuation of disparities within these specific groups (Chapman et al., Reference Chapman, Quinn, Natri, Berrios, Dwyer, Owens, Heraty and Caplan2023). Furthermore, these harms have the potential to inflict cultural and dignitary risks on these groups (Garrison et al., Reference Garrison, Hudson, Ballantyne, Garba, Martinez, Taualii, Arbour, Caron and Rainie2019b). The Havasupai Tribe case exemplifies this. In 2003, research on the Tribe’s donated blood samples—originally intended to study diabetes—was expanded without consent to investigate its ancestry and familial connections, prompting a legal dispute (Garrison et al., Reference Garrison, Hudson, Ballantyne, Garba, Martinez, Taualii, Arbour, Caron and Rainie2019b). The Tribe contended that these additional studies exceeded their initial agreement, causing cultural, dignitary, and group harm. Subsequently, a settlement was reached in 2010, awarding Tribal members $700,000 in compensation (Garrison, Reference Garrison2013).
Group-based harms manifest as stigmatisation and marginalisation, subjecting affected individuals to systemic disadvantages (Chapman et al., Reference Chapman, Quinn, Natri, Berrios, Dwyer, Owens, Heraty and Caplan2023; Rahnasto, Reference Rahnasto2023). Governmental entities may amplify these vulnerabilities through institutional practices informed by implicit biases and structural prejudices. Historical precedents illustrate this phenomenon, as with Cesare Lombroso’s “born criminal” theory, rejected for its racist underpinnings and biological determinism (Sirgiovanni, Reference Sirgiovanni2017). Should officials gain access to genomic data, such information might serve as justification for discriminatory judgments against individuals with specific genetic variations, thereby intensifying societal stratification and interethnic tensions.
4.3. Bioterrorism threats
The widespread sharing of genomic data presents potential risks of bioterrorism, impacting both national and global security due to the universal nature of the human genome. A primary concern is the potential for genetic modification to facilitate the development of biological weapons, a threat that may be intensified by the extensive dissemination and unlimited access to genomic data. Although the current state of genomic technology presents only a limited immediate threat, an attack of this kind would have extremely serious consequences if it occurred, so regulation cannot wait until the risk materialises. Moreover, bioterrorism threats are magnified by the intersection of modern genomic technologies with advanced AI, ML, automation, and robotic capabilities (Hendrycks et al., Reference Hendrycks, Mazeika and Woodside2023; Brent et al., Reference Brent, McKelvey and Matheny2024). This convergence could empower private biotech platforms or research communities to craft biological weapons targeting specific groups or populations (Lentzos, Reference Lentzos2020; Painter and Bastian, Reference Painter and Bastian2021). In addition, the absence of robust cybersecurity measures within the synthetic biology sector exposes it to the potential for unauthorised synthesis of harmful biological agents by malicious actors (Puzis et al., Reference Puzis, Farbiash, Brodt, Elovici and Greenbaum2020). Unlike conventional biological weapons, which rely on naturally occurring microorganisms to inflict harm (Pal et al., Reference Pal, Tsegaye, Girzaw, Bedada, Godishala and Kandi2017), genetically modified biological agents can target specific populations with highly infectious and pathogenic organisms, thereby increasing the likelihood of severe harm (Brockmann et al., Reference Brockmann, Bauer and Boulanin2019; Ristanovic, Reference Ristanovic, Dishovsky and Pivovarov2009, p. 124).
This is not alarmist rhetoric. All technologies possess dual uses: while human genomic data can drive advancements in health-related genetic technologies, it also has the potential to enable harmful applications. A notable example from 2018 highlights the risks associated with genomics: a member of a three-person team utilised recombinant DNA, polymerase chain reaction (PCR), and synthetic DNA to recreate horsepox, a close relative of smallpox (Brent et al., Reference Brent, McKelvey and Matheny2024). Another group further developed this research, using the same tools, along with clustered regularly interspaced short palindromic repeats (CRISPR) technology, to engineer a different smallpox-related virus. Such studies underscore the ease with which this research could be repurposed to produce lethal pathogens. In 2022, a team of researchers modified an AI system initially designed to create non-toxic therapeutic molecules. They altered its parameters to reward toxicity rather than penalise it (Urbina et al., Reference Urbina, Lentzos, Invernizzi and Ekins2022). Following this adjustment, the system independently generated 40,000 candidate chemical warfare agents within just six hours. While the destructive impact of biotechnology has not yet matched that of nuclear armaments, the pace of technological progress may surpass individual nations’ regulatory capacities. Without adequate regulation, these advancements could lead to a resurgence of bioterrorism, posing a severe threat to the security and welfare of particular ethnic groups or humanity at large. The risk of malicious actors creating tools to harm humans raises critical ethical and security questions, posing major challenges to national, transnational, and global bioterrorism prevention efforts.
The sharing of genomic data generates interconnected potential risks of bioterrorism at both national and global levels, each with varying implications. These risks may arise from individual states pursuing their national interests or from the actions of terrorist groups, extremists, or other malicious entities, all of which pose substantial threats. On a national level, bioterrorism risks are heightened by the tailored development of pathogens designed to exploit the susceptibilities or vulnerabilities of specific populations within countries (Dieuliis, Reference Dieuliis2018). At a global scale, larger nations with greater racial and genetic diversity are harder to target, because an attacker would struggle to identify shared genetic traits and to design a biological threat around them (Wang and Liu, Reference Wang and Liu2025, p. 220). In contrast, smaller and more ethnically homogeneous countries may be more susceptible to biological weapons.
5. Rules for genomic data sharing: a comparison of China and the EU
The efficacy of protection strategies against the risks associated with genomic data sharing is a subject of ongoing debate (Joly et al., Reference Joly, Dupras, Pinkesz, Tovino and Rothstein2020; Gürsoy, Reference Gürsoy2022), with different countries adopting varying approaches (Harbord, Reference Harbord2019; Du and Wang, Reference Du and Wang2020; Paltiel et al., Reference Paltiel, Taylor and Newson2023; Solove, Reference Solove2024). This section examines the data protection frameworks of the EU and China and assesses their suitability for the effective prevention of the risks associated with genomic data sharing. The choice of these two jurisdictions is motivated by two reasons. Firstly, both frameworks are generally very protective of personal data (Ding, Reference Ding2024; Fuster, Reference Fuster2014, p. 1; Peng et al., Reference Peng, Shao and Zheng2022) and both jurisdictions recognise genomic data as a special category of data that requires heightened privacy protection (Rahnasto, Reference Rahnasto2023; Zhang, Reference Zhang2015, p. 51). Secondly, the data protection laws of both jurisdictions have an area of geographical influence that extends beyond the borders of the jurisdiction (Bradford, Reference Bradford2020, p. 27; Erie and Streinz, Reference Erie and Streinz2021). This section first provides an overview of the personal data protection laws in each jurisdiction, before then delving into two key mechanisms: technical security mechanisms and informed consent mechanisms. It analyses these mechanisms from both normative and practical perspectives.
5.1. Overview of relevant personal data protection laws
At the constitutional level, the protection of genomic data is fundamentally linked to the safeguarding of basic human rights. Within the EU, data protection is upheld as a fundamental right by primary law, with a particular emphasis on the protection of sensitive data, encompassing genomic data. Article 8 of the Charter of Fundamental Rights of the European Union (2007) stipulates that “everyone has the right to the protection of personal data concerning him or her.” In Europe more broadly, the European Court of Human Rights (ECtHR), which is a Council of Europe court rather than an EU institution, has underscored the necessity for heightened protection of genetic data, recognising its unique sensitivity compared to other categories of sensitive data. In S. and Marper v. The United Kingdom (2008), the ECtHR highlighted the deeply personal and sensitive nature of genetic data, emphasising its exceptional status. In China, Articles 33 and 38 of the Constitution of the People’s Republic of China (2018) establish a foundation for the right to personal information and personal data (Zhang, Reference Zhang2015, p. 48), thereby providing constitutional grounds for safeguarding genomic data. This has led to the enactment of the Personal Information Protection Law of the People’s Republic of China (2021) (PIPL), which specifically addresses the protection of personal data. Furthermore, Article 28 of the Constitution offers a constitutional basis for the protection of national security. This, in turn, underpins the Biosecurity Law of the People’s Republic of China (2024) (Biosecurity Law), which also relates to the protection of genomic data (Wang, Reference Wang2013, p. 67).
At the legislative level, genomic data receive classification as sensitive data under both EU and Chinese regulatory frameworks. In the EU, Article 9 of the General Data Protection Regulation (2016) (GDPR) establishes “special categories of personal data”—commonly termed sensitive data (Quinn and Malgieri, Reference Quinn and Malgieri2021)—encompassing genetic, health, and biometric data. Genomic data’s inherent capacity to reveal detailed genetic compositions, disease susceptibilities, and distinctive individual and community characteristics substantiates its sensitive categorisation. This position finds additional support through a fortiori reasoning (d’Almeida, Reference d’Almeida2017): if subordinate categories like genetic data warrant sensitive classification, then genomic data, representing a more comprehensive category, merit equivalent or superior protection. While some have noted Article 9(1) GDPR presents an exhaustive enumeration of “special categories” (Quinn and Malgieri, Reference Quinn and Malgieri2021), potentially excluding genomic data under strict interpretation, genomic data’s reducibility to genetic data effectively secures its designation as sensitive data.
While the GDPR was a groundbreaking data protection law, a growing body of legal, socio-political, ethical, and policy research has drawn attention to its shortcomings. For health data, including human genomic data, these shortcomings fall into four broad areas: the limited scope of traditional data protection principles in the face of emerging big data practices, the blurring of key regulatory categories, flaws in the informed consent model, and the Regulation’s narrow focus on harms and discrimination arising from data processing (Marelli et al., Reference Marelli, Lievevrouw and Van Hoyweghen2020). To address these gaps, the EU has advanced a series of legislative measures, including Regulation (EU) 2022/868 (2022) (Data Governance Act), Regulation (EU) 2023/2854 (2023) (Data Act), and Regulation (EU) 2025/327 (2025). The first two are cross-sectoral governance frameworks, introduced to ensure better access to data and more responsible use (Casolari et al., Reference Casolari, Buttaboni and Floridi2023). The third, Regulation (EU) 2025/327 (2025), establishes the European Health Data Space (EHDS), which enables the reuse of health data in healthcare, as well as for research and innovation.
The EHDS has two primary objectives: to enhance individuals’ access to and control over their health data within a healthcare context and to promote societal benefits from data utilisation, such as advancing healthcare delivery and research. Under Article 51.1(f) of Regulation (EU) 2025/327 (2025), health data explicitly include “human genetic, epigenomic, and genomic data.” Beyond aligning with GDPR requirements, this Regulation gives health data access bodies broad discretion to grant data access permits, alongside principles that outline when permits should and should not be issued (Quinn et al., Reference Quinn, Ellyne and Yao2024).
The EHDS’s rules on secondary health data use share similarities with third-party use of shared human genomic data, meaning the Regulation offers useful safeguards for genomic data sharing. Yet, its scope is mainly limited to healthcare and related research contexts; it does not cover commercial scenarios. This is significant because some genomic sequencing is classified as non-health-related, providing services including paternity testing, ancestral origin analysis, athletic ability assessments, matchmaking, and tests for “fun” traits, such as earwax type and eye colour (Hoxhaj et al., Reference Hoxhaj, Stojanovic, Sassano, Acampora and Boccia2020). Thus, while the EHDS effectively protects health-related human genomic data in healthcare settings, it does not offer comprehensive coverage for all human genomic data.
China adopts two primary strategies for the protection of genomic data: first, it aligns with the EU by treating genomic data as sensitive data; second, it implements the Biosecurity Law to address the national security risks that may arise from such data. Similar to the EU, China recognises the concept of sensitive personal information. Article 28 of the PIPL defines sensitive personal information as information that, if exposed or improperly utilised, could potentially infringe upon an individual’s personal dignity or threaten their safety or possessions. It further specifies that this category includes information related to biometric identification, religious beliefs, specific identities, healthcare, financial accounts, personal location, and details concerning minors under the age of fourteen. Consequently, personal genomic sequencing information is appropriately classified as sensitive personal information (Liu et al., Reference Liu, Peng, Wu, Tian and Tian2021; Wang et al., Reference Wang, Wang and Du2024b). Unlike the GDPR, which employs a closed list of sensitive data categories, China’s PIPL does not impose barriers to classifying genomic data as sensitive information.
Article 55 of the Biosecurity Law requires that the use and export of China’s human genetic information comply with ethical principles and not harm public health, national security, or the public interest. Article 56(4) mandates that transporting or mailing this information requires approval from the health department of the State Council. Additionally, the Detailed Rules for the Implementation of the Regulation on the Administration of Human Genetic Resources (2023) stipulates in Article 37(3) that foreign entities providing genome sequencing information resources with over 500 cases must undergo a security review by the Ministry of Science and Technology. This underscores the special considerations given to genomic data.
5.2. Technical security mechanisms
Technical and organisational measures (TOMs) play a crucial role in mitigating the multifaceted risks of human genomic data sharing, and legal frameworks are designed to align with such technological advancements (Staunton et al., Reference Staunton, Slokenberga and Mascalzoni2019). Article 32 of the GDPR, for instance, obliges data controllers and processors to implement TOMs to protect personal data. Anonymisation is often treated as a minimum requirement for enabling data sharing in this context.
Traditional privacy frameworks establish a dichotomy between personal and anonymised data, with the latter excluded from regulatory protection. Article 4(1) of the GDPR defines personal data as information relating to an identified or identifiable natural person, and Recital 26 clarifies that anonymous information falls outside the Regulation’s protective scope. Similarly, Article 4 of the PIPL withholds protection from anonymised information.
This categorical exclusion significantly compromises safeguards for genomic data (Bonomi et al., Reference Bonomi, Huang and Ohno-Machado2020). Data anonymisation regimes operate on the premise that data unlinked to personal identity fall outside the classification of “personal data” (Elliot et al., Reference Elliot, O’Hara, Raab, O’Keefe, Mackey, Dibben, Gowans, Purdam and McCullagh2018), thereby permitting unregulated collection, use, and dissemination without the data subject’s consent. While anonymisation measures represent fundamental protective mechanisms of a technical nature, they prove inadequate in mitigating genomic data risks. As established previously, genomic data resist effective anonymisation, with techniques like de-identification and pseudonymisation demonstrating insufficient protective capacity (Rocher et al., Reference Rocher, Hendrickx and de Montjoye2019).
Anonymisation of genomic data involves the removal of protected health information, such as name, and semi-identifiable information, such as postcode (Bonomi et al., Reference Bonomi, Huang and Ohno-Machado2020). However, there is a consensus that genomic data cannot be truly anonymised (O’Doherty et al., Reference O’Doherty, Shabani, Dove, Bentzen, Borry, Burgess, Chalmers, De Vries, Eckstein, Fullerton, Juengst, Kato, Kaye, Knoppers, Koenig, Manson, McGrail, McGuire, Meslin, Nicol, Prainsack, Terry, Thorogood and Burke2021), although researchers have debated the varying levels of identifiability associated with different types of genetic data (Lowrance and Collins, Reference Lowrance and Collins2007). The anonymisation paradigm faces escalating challenges from evolving analytics and re-identification methodologies within genomic contexts (Purtova, Reference Purtova2018). Individual genetic uniqueness creates robust correlations between genomic data and personal identity, rendering such information particularly susceptible to re-identification through identification attacks and phenotype inference attacks (Altman et al., Reference Altman, Clayton, Kohane, Malin and Roden2013; Rocher et al., Reference Rocher, Hendrickx and de Montjoye2019; Bonomi et al., Reference Bonomi, Huang and Ohno-Machado2020).
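The re-identification concern can be illustrated with a minimal sketch. It assumes a toy dataset from which names have already been removed and asks how many records remain unique on a handful of quasi-identifiers. The field names, the records, and the helper functions are hypothetical and are not drawn from any of the cited studies; a genotype would act as a far richer quasi-identifier than the three attributes used here.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    keys = [tuple(record[q] for q in quasi_identifiers) for record in records]
    return Counter(keys)


def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest group of indistinguishable records."""
    if not records:
        raise ValueError("records must not be empty")
    return min(equivalence_class_sizes(records, quasi_identifiers).values())


def unique_fraction(records, quasi_identifiers):
    """Return the share of records that are unique on the quasi-identifiers."""
    if not records:
        raise ValueError("records must not be empty")
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    unique = sum(1 for size in sizes.values() if size == 1)
    return unique / len(records)


# Hypothetical records with direct identifiers (names) already stripped.
RECORDS = [
    {"postcode": "AB1", "birth_year": 1980, "sex": "F"},
    {"postcode": "AB1", "birth_year": 1980, "sex": "F"},
    {"postcode": "AB1", "birth_year": 1975, "sex": "M"},
    {"postcode": "CD2", "birth_year": 1990, "sex": "F"},
]
QUASI_IDENTIFIERS = ["postcode", "birth_year", "sex"]

if __name__ == "__main__":
    print("k =", k_anonymity(RECORDS, QUASI_IDENTIFIERS))
    print("unique share =", unique_fraction(RECORDS, QUASI_IDENTIFIERS))
```

In this toy case half of the records are unique despite carrying no name, which is the sense in which removing direct identifiers alone does not amount to anonymisation.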
This raises two significant issues. First, data protection frameworks may fail to safeguard individuals’ rights to their genomic data when they rely only on data anonymisation measures. Second, anonymisation provisions can enable data controllers to circumvent the protections established by regulations such as the GDPR and the PIPL. In the context of genomic data sharing, data controllers are required to de-identify or pseudonymise genomic information obtained from clinical medicine, scientific research, and commercial testing. This so-called anonymised data can subsequently be shared without adequately considering the potential risks faced by individuals, groups, and societies involved.
Notably, some argue that data anonymisation could be strengthened by adopting advanced technical measures. As technology and research methodologies evolve, several approaches have emerged to enhance the protection of human genomic data (Bonomi et al., Reference Bonomi, Huang and Ohno-Machado2020), including access control (Erlich et al., Reference Erlich, Williams, Glazer, Yocum, Farahany, Olson, Narayanan, Stein, Witkowski and Kain2014), homomorphic encryption (Deuber et al., Reference Deuber, Egger, Fech, Malavolta, Schröder, Thyagarajan, Battke and Durand2019), secure multiparty computation (Cho et al., Reference Cho, Wu and Berger2018), and differential privacy (Tramèr et al., Reference Tramèr, Huang, Hubaux and Ayday2015). The combined use of multiple such methods is also becoming more common (Raisaro et al., Reference Raisaro, Choi, Pradervand, Colsenet, Jacquemont, Rosat, Mooser and Hubaux2018). Beyond these discrete technical tools, the EU’s EHDS provides a comprehensive platform for genomic data sharing, which helps address risks within healthcare and related research contexts.
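Of the measures listed above, differential privacy is perhaps the simplest to illustrate. The following is a minimal sketch of the Laplace mechanism applied to a single count query, such as the number of cohort members carrying a given variant. The function name, the parameter values, and the example count are assumptions made for illustration and do not reproduce the method of Tramèr et al. or any other cited work.

```python
import random


def dp_count(true_count, epsilon, sensitivity=1.0, rng=random):
    """Return a count perturbed with Laplace noise of scale sensitivity / epsilon.

    A count query changes by at most 1 when one person is added or removed,
    so its sensitivity is 1. Smaller epsilon means more noise and more privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rate = epsilon / sensitivity
    # The difference of two independent exponential draws with the same rate
    # follows a Laplace distribution centred on zero with scale 1 / rate.
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return true_count + noise


if __name__ == "__main__":
    # Hypothetical query: 42 carriers of a variant in a research cohort.
    print(dp_count(42, epsilon=0.5))
```

The released figure is deliberately inexact: the presence or absence of any one participant changes the output distribution only slightly, at the cost of some accuracy for the analyst.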
Nevertheless, ongoing advances in technology and cyberattack methods present persistent challenges to human genomic data protection. A high-profile example is the 2023 data breach at 23andMe. In October 2023, the company suffered a significant breach organised by a cybercriminal known as Golem (Holthouse et al., Reference Holthouse, Owens and Bhunia2025). While official statements of 23andMe (2023) claimed only 14,000 accounts were directly compromised, the attack spread via the platform’s DNA Relatives feature, expanding its impact to expose over 5.5 million customer records. This cybersecurity incident triggered widespread legal action in the US and other jurisdictions, which remains unresolved. On 23 March 2025, 23andMe also filed a voluntary petition in a US bankruptcy court to facilitate a rapid sale of the company (Gerke et al., Reference Gerke, Jacoby and Cohen2025).
23andMe had implemented a certain level of technical security measures, but cyberattacks continue to evolve. In this case, the techniques used were relatively unsophisticated yet highly effective, focusing on brute-force attacks and credential stuffing (Holthouse et al., Reference Holthouse, Owens and Bhunia2025). That even basic attacks of this kind succeeded underscores the need for stronger security safeguards, a lesson relevant to other DTC genomic testing companies. While the cybercriminal bears responsibility for the attack, the incident also highlights a critical systemic issue: cost considerations often deter companies from adopting enhanced or multiple technical protections. As profit-driven entities, many companies seek to minimise costs while meeting only the minimum requirements of data protection regulations, leaving human genomic data vulnerable to emerging threats.
Whether technical measures can fully address the risks of human genomic data sharing is not, however, the focus of this article. What is clear is that some innovative technical approaches already mitigate certain risks associated with such sharing, and technical professionals will likely develop further solutions to tackle its multifaceted threats. As a product of science and technology, human genomic data inherently require technical measures for their protection.
As established earlier, risk assessments of human genomic data sharing confirm that current data anonymisation measures are insufficient, meaning the level of technical protection for these data must be elevated. The EU’s EHDS illustrates this need: it aims to build a secure environment for data access and reuse, which explicitly requires the implementation of multiple technical and organisational safeguards.
Yet, a core question remains unresolved: Who should bear the cost of these technical and/or organisational measures in human genomic data sharing practices? Unlimited free access to genomic data can deliver significant benefits, but the associated risks cannot be shouldered solely by data subjects. For this reason, regulations governing genomic data sharing must go beyond setting basic requirements; they must also allocate clear obligations and responsibilities to the various stakeholders involved.
5.3. Informed consent mechanisms
Besides technical security mechanisms, data protection laws often rely on informed consent mechanisms to legitimise data processors’ activities and avoid establishing complex interest-balancing frameworks. Some may contend that a stringent interpretation of data protection law, particularly the principle of informed consent, could enhance the safeguarding of genomic data. However, informed consent may not provide genuine protection; rather, it can serve to legitimise the data controller’s exploitation of genomic data, often prioritising the controller’s benefit over the individual’s wishes.
In numerous jurisdictions, data processors are required to obtain consent from data subjects before collecting, sharing, or using their genomic data. For instance, Article 6 of the GDPR serves as the primary legal foundation for data collection and processing, with consent being one of the lawful bases it lists. When it comes to sensitive data, Article 9.2(a) of the GDPR includes provisions for cases where “the data subject has given explicit consent to the processing of those personal data for one or more specified purposes.” Similarly, the PIPL incorporates a comparable informed consent principle aimed at safeguarding sensitive information. Informed consent embodies the concept of individual autonomy (Beauchamp, Reference Beauchamp2011) and stands as a fundamental legal principle in relevant legislation. Lawful consent depends on the individual’s decision-making capacity, voluntariness, and a comprehensive grasp of relevant information (Bunnik et al., Reference Bunnik, de Jong, Nijsingh and de Wert2013).
The principle of informed consent is crucial in the realm of genomic data sharing, yet it often lacks effectiveness, enabling data controllers to manipulate the process for diverse motives (Kaye, Reference Kaye2012; Bietti, Reference Bietti2019; Oliva et al., Reference Oliva, Kaphle, Reguant, Sng, Twine, Malakar, Wickramarachchi, Keller, Ranbaduge, Chan, Breen, Buckberry, Guennewig, Haas, Brown, Cowley, Thorne, Jain and Bauer2024). This phenomenon is not new. Rights related to personal data primarily aim to empower individuals with control over their personal information, a concept that has been termed “privacy self-management” (Solove, Reference Solove2013). Under such a framework, several shortcomings of consent are identified (Solove, Reference Solove2013), including (a) cognitive limitations, which suggest that individuals often struggle to make informed and rational decisions regarding consent due to cognitive biases and a lack of understanding of complex privacy issues; (b) meaningless consent, where many individuals consent to data practices without fully grasping the implications, resulting in a scenario where consent fails to provide genuine control over personal information; and (c) structural problems, wherein the sheer volume of entities collecting personal data renders it impractical for individuals to manage their privacy effectively.
Furthermore, privacy harms frequently arise from the aggregation of data over time, complicating individuals’ ability to assess risks and benefits. These issues are particularly relevant in the context of genomic data sharing. For example, regarding cognitive limitations, research indicates that 67% of DTC testing companies fail to provide sufficient information to consumers about the use of their genomic data (Christofides and O’Doherty, Reference Christofides and O’Doherty2016), with issues attributed to ambiguous language and a lack of transparency (Laestadius et al., Reference Laestadius, Rich and Auer2017). When considering genomic data collected in a research context, or data obtained in a clinical setting and intended for future research sharing, the expansive nature of such data sharing poses substantial challenges. It becomes virtually impossible to comprehensively describe, or indeed foresee, all potential future research applications at the time of data collection (McGuire and Beskow, Reference McGuire and Beskow2010).
The complexity of genomic data further complicates matters, making it challenging for data subjects to grasp the implications fully (Majumder et al., Reference Majumder, Guerrini and McGuire2021). This underscores the need for enhanced education among healthcare professionals, so that they can convey these complexities effectively and help address individuals’ cognitive limitations (Martins et al., Reference Martins, Murry, Telford and Moriarty2022). In addition, and irrespective of the level of informed consent that is given upon the first processing, individuals undergoing WGS frequently lack awareness of how their genomic data will be utilised post-collection (McGuire and Beskow, Reference McGuire and Beskow2010; Niemiec and Howard, Reference Niemiec and Howard2016; Rego et al., Reference Rego, Grove, Cho and Ormond2020). Once they realise this, they often express dissatisfaction with companies profiting from their genomic data and perceive a lack of clarity in the consent process (Allyse, Reference Allyse2013).
Moreover, data providers can readily obtain consent from data subjects, either by framing it as a prerequisite in commercial testing environments or by leveraging subjects’ goodwill to advance scientific progress. In the realm of DTC testing, a pressing concern resides in the practice of conditioning access to testing services on consent to data sharing, thereby effectively coercing individuals into acquiescence. For example, while 23andMe does not explicitly detail the future uses of customers’ genomic data, its terms and conditions state: “You understand that by providing any sample, having your information processed, accessing your information, or providing information, you acquire no rights in any research or commercial products that may be developed by 23andMe or its collaborators” (23andMe, 2025). This approach raises profound ethical and legal dilemmas (Raz et al., Reference Raz, Niemiec, Howard, Sterckx, Cockbain and Prainsack2020). In the EU, Article 4 of the GDPR defines “processing” in a manner that obliges companies to obtain informed consent before anonymising, pseudonymising, or sharing data (Shabani and Borry, Reference Shabani and Borry2018). Despite this strict requirement, compliance often amounts to little more than a procedural checkbox: companies make consent a precondition for the use of services, leaving users with no meaningful choice but to accept the terms. In scientific research, by contrast, data subjects frequently donate genomic data voluntarily, motivated by a desire to support technological advancement. Yet, even when driven by altruism, this goodwill does not guarantee that adequate safeguards will be in place when researchers share the data or third parties use the data. While many subjects donate selflessly, the diverse risks inherent in genomic data sharing cannot be dismissed.
A more complex challenge in genomic data governance relates to group consent and the secondary use of such data. Genomic research frequently relies on individual-based consent, even when working with tribal members who reside outside their communities (Tsosie et al., Reference Tsosie, Yracheta and Dickenson2019). Yet, this model fails to account for the unique risks faced by small, cohesive groups like Indigenous tribes, where group-level harms can affect the entire community. Analysing genomic data at a collective level may compromise group privacy (Gusareva et al., Reference Gusareva, Ghosh, Kharkov, Khor, Zarubin, Moshkov, Kalsi, Ratan, Heinle, Cooke, Bravi, Smolnikova, Tereshchenko, Kasparov, Khitrinskaya, Marusin, Razhabov, Golubenko, Swarovskaya, Kolesnikov, Vagaitseva, Eremina, Sukhomyasova, Shtygasheva, Panicker, Ang, Lee, Koh, Leong, Park, Lohar, Yap, Ng, Dacanay, Drautz-Moses, Ramli, Tokunaga, McGonigle, Danjoh, Moreno-Estrada, Tajima, Tanabe, Nakamura, Nakagome, Tatarinova, Stepanov, Schuster and Kim2025), creating impacts that extend far beyond individual data subjects. For this reason, group-level consent, especially for Indigenous communities, is necessary to address these broader risks. The Havasupai Tribe case, referenced earlier, illustrates this clearly: secondary use of their genomic data uncovered information not covered by initial consent clauses—such as details about ancestry and familial connections—that conflicted with the Tribe’s cultural beliefs (Garrison et al., Reference Garrison, Hudson, Ballantyne, Garba, Martinez, Taualii, Arbour, Caron and Rainie2019b). In addition, under the current informed consent framework, the distribution of risks and benefits of genomic research is uneven for Indigenous communities. 
These groups often bear substantial risks from genomic research but gain few of its associated benefits (Hudson et al., Reference Hudson, Garrison, Sterling, Caron, Fox, Yracheta, Anderson, Wilcox, Arbour, Brown, Taualii, Kukutai, Haring, Te Aika, Baynam, Dearden, Chagné, Malhi, Garba, Tiffin, Bolnick, Stott, Rolleston, Ballantyne, Lovett, David-Chavez, Martinez, Sporle, Walter, Reading and Carroll2020).
As Professor Solove (Reference Solove2013) concluded, the framework of “privacy self-management” through consent is fundamentally flawed. In the context of genomic data sharing, the principle of informed consent is often reduced to a performative gesture, prioritising data providers’ interests over individual autonomy (Bonomi et al., Reference Bonomi, Huang and Ohno-Machado2020). This aligns with broader critiques of consent as a mechanism, highlighting cognitive limitations, ambiguous language, the inability to foresee future data uses, and the neglect of group interests—factors that collectively render consent an ineffective safeguard.
6. Proposals for governance reform of genomic data sharing
While the individual and societal benefits of data sharing are significant, the associated risks cannot be overlooked. The above discussion has highlighted the unique characteristics of genomic data and shown that the existing legal mechanisms governing data sharing require enhancement to address the diverse risks involved. To strengthen governance practices for genomic data, this section advocates for a rethinking of the governance mechanisms for genomic data under the concept of genomic contextualism. The proposals put forward here aim to balance two core objectives: safeguarding stakeholders’ interests and ensuring the benefits of genomic data sharing are distributed equitably. This balance is particularly critical for underrepresented communities, which have historically been excluded from reaping the rewards of such research (Fullerton, Reference Fullerton2011). Below, I provide a detailed introduction to these proposals, which are summarised and illustrated in Figure 1. These recommendations can be adapted to both European and Chinese contexts, while also laying the groundwork for enhanced governance of genomic data in other jurisdictions.

Figure 1. Equitable governance of genomic data sharing. Data providers may share human genomic data with third parties for utilisation only if (1) data subjects give informed consent for both the acquisition and subsequent activities; (2) additional group consent is obtained when data subjects belong to a group that may face risks of harm from utilisation. Regardless of the acquisition context (clinical, research, or commercial), subsequent data activities must safeguard stakeholders’ interests through effective risk prevention and equitable distribution of derived benefits.
6.1. Supplementing informed consent with an interest-balancing principle
As discussed earlier, informed consent is ineffective for genomic data sharing: it obscures the unfair power dynamics inherent in such transactions and overlooks the risks and interests of affected groups. Moreover, the informed consent model struggles to apply at the group level, largely because reaching group consensus is inherently challenging. Civic epistemology—a framework for understanding how societies engage with science—helps explain this: different individuals, ethnic groups, and nations hold distinct perspectives on science and technology, shaped by their unique contexts (Jasanoff, Reference Jasanoff2005, p. 250). When considering the wide range of ethnic groups across nations, each with its own cultural background, this diversity of views becomes even more pronounced, further complicating efforts to secure meaningful group consent for human genomic data sharing.
This lack of recognition is not new: Indigenous communities have been the focus of Western scientific research for centuries. For Indigenous peoples and minority groups, however, genomic data are often perceived as more sensitive than other types of health data. This sensitivity is particularly pronounced in genealogy and ancestry research—work that can challenge traditionally held beliefs, reshape cultural histories, and impact claims to identity, land, and other resources (Hudson et al., Reference Hudson, Garrison, Sterling, Caron, Fox, Yracheta, Anderson, Wilcox, Arbour, Brown, Taualii, Kukutai, Haring, Te Aika, Baynam, Dearden, Chagné, Malhi, Garba, Tiffin, Bolnick, Stott, Rolleston, Ballantyne, Lovett, David-Chavez, Martinez, Sporle, Walter, Reading and Carroll2020). A history of unethical behaviour, poor communication, disregard for cultural and spiritual beliefs, and failure to prioritise Indigenous communities’ interests has fostered deep mistrust between researchers and these groups (Garrison et al., Reference Garrison, Hudson, Ballantyne, Garba, Martinez, Taualii, Arbour, Caron and Rainie2019b). Beyond this mistrust, Indigenous peoples also express hesitancy to participate in genomic research due to years of being studied without seeing benefits, receiving results, or being able to prevent exploitation of their potentially patentable genetic material. Compounding this, private research entities primarily prioritise profit. They are reluctant to invest in developing healthcare products for Indigenous or minority groups when such work offers little financial return. Adding to the challenge, policymakers already struggle to design policy frameworks that advance orphan medicinal products (Aartsma-Rus et al., Reference Aartsma-Rus, Dooms and Le Cam2021). This pre-existing gap makes it even harder to incentivise genomic research focused on creating new healthcare solutions for Indigenous peoples or minority groups under current policy structures.
Indigenous peoples also express concerns about protecting their rights and interests. An illustrative example is the aforementioned All of Us project, sponsored by the US National Institutes of Health (USNIH). The National Congress of American Indians called on the USNIH to “assess consultation input to date, and immediately develop clear processes and guidelines: these should require individual sovereign Tribal Nations to provide prior consent before data and specimens are collected from their members, and grant Tribal Nations oversight—including local control and storage of any data or biospecimens linked to or identified as belonging to a citizen of their nation” (National Indian Health Board, 2020). Article 31 of the United Nations Declaration on the Rights of Indigenous Peoples (2007) provides the legal basis for this stance. It states: “Indigenous peoples have the right to maintain, control, protect and develop their cultural heritage, traditional knowledge and traditional cultural expressions, […] including human and genetic resources.”
Therefore, more equitable human genomic data governance must respect the sovereignty and interests of ethnic groups. This goal can be advanced through actions like community-engaged research, clear guidelines, and policies that guarantee Indigenous communities that their interests are protected—steps that may encourage greater participation from Indigenous leaders, communities, and individuals (Garrison et al., Reference Garrison, Hudson, Ballantyne, Garba, Martinez, Taualii, Arbour, Caron and Rainie2019b).
In practice, however, genomic research projects often fail to meet this standard: they frequently recruit Indigenous individuals living in urban centres without establishing formal partnerships or consulting the individuals’ home tribes. To address this gap and achieve genuine fairness, robust due process in legislation and policy development is essential. Scholars advocate for deliberative democratic methods as a solution (Koenig, Reference Koenig2014), which prioritise inclusive dialogue and collective decision-making among all stakeholders. These methods bring diverse community members into discussions about the ethical, social, and practical impacts of genomic research (Lemke et al., Reference Lemke, Esplin, Goldenberg, Gonzaga-Jauregui, Hanchard, Harris-Wai, Ideozu, Isasi, Landstrom, Prince, Turbitt, Sabatello, Vergano, Taylor, Yu, Brothers and Garrison2022), ensuring perspectives from data subjects, providers, and users all shape equitable genomic data management, ultimately supporting fair distribution of benefits and responsibilities.
Yet, these deliberative approaches remain more theoretical than practical for much genomic research. A key challenge is the geographic reality of many tribal communities: many members live in remote areas and have high mobility, making it difficult to implement critical protocols such as recruitment, initial consent, reconsent, and long-term follow-up (Tsosie et al., Reference Tsosie, Yracheta and Dickenson2019). Compounding this, given that genomic research may pose greater harm than benefit to Indigenous peoples, many tribal nations are left questioning whether the value of their involvement outweighs the associated risks.
The interest-balancing mechanism reaches further than prior consent from Indigenous peoples and further than personal self-determination; it also has the potential to enhance or complement the informed consent framework. Importantly, interest balancing focuses on the justice of human genomic data sharing activities themselves, rather than solely prioritising personal autonomy or group sovereignty. Before exploring this further, it is critical to clarify the relationship between interest balancing and risk mitigation. Risk prevention and benefit distribution are two interrelated aspects of the same challenge. When genomic data sharing does not generate sufficient benefits to cover its costs, data providers and users must assume responsibility for risk prevention, including bearing necessary financial obligations. Conversely, when sharing activities yield substantial benefits for providers and users, these gains must not be exclusively appropriated; instead, benefits should be redistributed to ensure data subjects receive equitable returns. While the fair distribution of responsibilities and the fair distribution of benefits are equally important in human genomic data sharing, current practices often impose greater risks than benefits, particularly for data subjects and related groups. For this reason, a heightened focus on risk prevention is imperative, with clear emphasis on the obligations of data providers and users.
That said, human genomic data sharing could and should deliver benefits to individuals and their communities. Personal genomic companies have already explored models to compensate individuals for contributing their genomic data to research (Grishin et al., Reference Grishin, Obbad, Estep, Quinn, Zaranek, Zaranek, Vandewege, Clegg, César, Cifric and Church2018). A notable example is the data dividend model (Kudva and Aswani, Reference Kudva and Aswani2023), where platforms compensate data subjects for data use. In genomic data sharing, many firms use similar strategies to encourage customers to have their genomes sequenced, offering future rewards for sharing, such as helping them sell their data to researchers (Molteni, Reference Molteni2016) and compensating genomic data contributors with company stock (Grishin et al., Reference Grishin, Obbad, Estep, Quinn, Zaranek, Zaranek, Vandewege, Clegg, César, Cifric and Church2018). Mechanisms such as data user fees (Gillette and Hopkins, Reference Gillette and Hopkins1987) can further promote fairness among all stakeholders, ensuring data subjects and related groups receive compensation. Besides financial incentives, data providers and users should also be encouraged or required to share research findings with data subjects and related groups (Ormond et al., Reference Ormond, Stanclift, Reuter, Carter, Murphy, Lindholm and Wheeler2025, p. 1). Concurrently, data users must act responsibly to ensure scientific benefits are equitably shared (Gil and Guerreiro, Reference Gil and Guerreiro2024, p. 1).
The interest-balancing mechanism not only improves the current informed consent framework but also holds independent value. The existing informed consent model is built around individual autonomy or self-management. Yet, even when data controllers legally obtain consent from data subjects, the fairness of these transactions may still be questioned—regardless of who covers the costs of sequencing (Hawkins and Emanuel, Reference Hawkins and Emanuel2008). This issue also applies to groups: even if group consent is secured, the resulting data sharing may still be unfair. For these reasons, there is a need to pursue a more just form of informed consent—one rooted in the principles of interest balancing.
6.2. Enhancing TOMs within a data lifecycle management framework
This subsection examines the practical implementation of informed consent based on interest balancing, focusing on how TOMs can fulfil risk mitigation objectives. TOMs do not hold independent value; their purpose is derived from the foundational mechanism they support—in this case, the interest-balanced informed consent framework. For TOMs specifically, a key principle applies: they must be tailored to the unique characteristics of human genomic data (the subject matter they protect) and aligned with the principle of interest balancing (the core protection goals). In practice, this means effective technical measures should be cost-efficient, with adjustments made to keep pace with evolving technologies and societal needs. To operationalise informed consent based on interest balancing, comprehensive TOMs are required. This mechanism is built around three core components: a data lifecycle management system, human genomic data sharing platforms or spaces with registered access models, and effective protection tools. Together, these elements help mitigate diverse risks and ensure that derived benefits are distributed fairly.
Effective governance of genomic data necessitates a robust lifecycle management system, underscoring the imperative for secure, transparent, and inclusive practices across all stages of data processing. Modelled on product lifecycle frameworks in management science (Stark, Reference Stark and Stark2022), genomic data lifecycle management entails a structured methodology encompassing data acquisition, secure storage, ethical sharing, responsible utilisation, and timely erasure in accordance with data subjects’ preferences. The primary objective of this system is to mitigate diverse risks and safeguard the interests of all stakeholders in the genomic data lifecycle. In China, regulatory provisions complement informed consent to govern this lifecycle. Pursuant to Article 56 of China’s Biosecurity Law, collecting, storing, sharing, or utilising genomic resources above a certain quantity requires approval from the health department of the State Council—with exceptions for routine activities like clinical diagnosis and treatment. Here, “genomic resources” include both biospecimens that generate genomic data and the genomic data itself. Notably, however, China’s current legislation does not stipulate requirements for data erasure—a critical gap in the lifecycle management process.
The emphasis on data acquisition is warranted by the observation that when individuals consent to WGS, whether for medical, research, or commercial purposes, organisations often employ measures like anonymisation or granular consent to acquire data, yet the subsequent stages of storage, sharing, and utilisation remain inadequately regulated. Even at the acquisition stage, whose significance has been recognised, the protection measures remain ineffective. As highlighted earlier, the current individual consent framework lacks provisions for group consent or consultation, and even where such input is needed, it is difficult to enforce. To address this, group consent should be established as a prerequisite for data acquisition in relevant cases. Consequently, human genomic data acquisition and subsequent activities must be covered by separate informed consent from individuals and, when necessary, from associated groups. For comprehensive protection of human genomic data, the definition of “data acquisition” should encompass biospecimen acquisition. Additionally, regulations should specify the conditions under which individuals can collect and submit their own or others’ biospecimens for WGS and provide informed consent for the acquisition and subsequent sharing of genomic data.
Data storage is managed by data providers, and it represents their most significant contribution to genomic data governance, forming the foundation for how their interests are allocated. Storage begins once data acquisition ends, when providers take actual control of the human genomic data. Regulation (EU) 2025/327 (2025) includes provisions on data storage that offer valuable lessons. Article 72 mandates that trusted health data holders store data in a secure processing environment and comply with all requirements of the Regulation. One key compliance obligation, outlined in Article 77, is that reused health data must be publicly available via standardised machine-readable dataset catalogues. Additionally, Article 62 allows these holders to charge fees for making electronic health data available for secondary use. These rules provide a robust model for governing human genomic data storage. Providers should store human genomic data in a unified format (e.g. a standardised metadata structure) to facilitate seamless access, sharing, and usage (Byrd et al., Reference Byrd, Greene, Prasad, Jiang and Greene2020). They must also implement effective technical security measures to prevent unauthorised access and data leaks—such as the breaches that affected 23andMe. Given the costs of maintaining secure, standardised storage, data providers are entitled to fair compensation.
Data sharing is central to risk prevention and interest distribution. Currently, two prominent models dominate research data sharing: the controlled-access model and the registered-access model, both designed to mitigate specific risks (Byrd et al., Reference Byrd, Greene, Prasad, Jiang and Greene2020; Dyke, Reference Dyke, Jiang and Tang2020). The controlled-access model restricts data sharing to approved researchers for specific purposes (Ramos et al., Reference Ramos, Din-Lovinescu, Bookman, McNeil, Baker, Godynskiy, Harris, Lehner, McKeon, Moss, Starks, Sherry, Manolio and Rodriguez2013), exemplified by databases such as the US Database of Genotypes and Phenotypes (dbGaP) (Mailman et al., Reference Mailman, Feolo, Jin, Kimura, Tryka, Bagoutdinov, Hao, Kiang, Paschall, Phan, Popova, Pretel, Ziyabari, Lee, Shao, Wang, Sirotkin, Ward, Kholodov, Zbicz, Beck, Kimelman, Shevelev, Preuss, Yaschenko, Graeff, Ostell and Sherry2007) and the EU’s European Genome–Phenome Archive (Lappalainen et al., Reference Lappalainen, Almeida-King, Kumanduri, Senf, Spalding, ur-Rehman, Saunders, Kandasamy, Caccamo, Leinonen, Vaughan, Laurent, Rowland, Marin-Garcia, Barker, Jokinen, Torres, de Argila, Llobet, Medina, Puy, Alberich, de la Torre, Navarro, Paschall and Flicek2015). Regulation (EU) 2025/327 (2025), which establishes the EHDS, also adopts a controlled-access model for health data reuse. A key feature of this model is the requirement for rigorous review by dedicated data access committees; however, this process can lead to delays (Dyke, Reference Dyke, Jiang and Tang2020). In the EHDS, health data access bodies perform a similar role to these committees, meaning they face the same dilemma of balancing thoroughness with efficiency.
For ethical and efficient data sharing, the registered-access model—proposed by GA4GH—offers a viable alternative. This model builds on the well-established role-based access control framework used in information technology (IT) security. Unlike controlled-access models (which typically require approval for specific research projects), registered access grants users online access based on their role and a prior risk analysis (Dyke et al., Reference Dyke, Linden, Lappalainen, De Argila, Carey, Lloyd, Spalding, Cabili, Kerry, Foreman, Cutts, Shabani, Rodriguez, Haeussler, Walsh, Jiang, Wang, Perrett, Boughtwood, Matern, Brookes, Cupak, Fiume, Pandya, Tulchinsky, Scollen, Törnroos, Das, Evans, Malin, Beck, Brenner, Nyrönen, Blomberg, Firth, Hurles, Philippakis, Rätsch, Brudno, Boycott, Rehm, Baudis, Sherry, Kato, Knoppers, Baker and Flicek2018). In theory, this model could enable access to all shared human genomic data via a unified general registration process, eliminating the need for individualised data access committee reviews. To operate, it would require funding (either public or from contributions by data providers or users) and mandate user registration, identity verification, and declaration of intended data use.
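The registration steps described above (user registration, identity verification, declaration of intended use, and role-based access) can be illustrated with a minimal Python sketch. It is not GA4GH's specification: the role names, the two dataset tiers, and the `can_access` function are hypothetical, chosen only to show how a role-based check could replace project-by-project committee review.

```python
from dataclasses import dataclass

# Hypothetical roles and dataset tiers, for illustration only.
ROLE_PERMISSIONS = {
    "bona_fide_researcher": {"open", "registered"},
    "clinician": {"open", "registered"},
    "public": {"open"},
}


@dataclass
class User:
    name: str
    role: str
    identity_verified: bool = False
    declared_use: str = ""


def can_access(user: User, dataset_tier: str) -> bool:
    """Return True if the user's role and registration status permit access."""
    allowed_tiers = ROLE_PERMISSIONS.get(user.role, set())
    if dataset_tier not in allowed_tiers:
        return False
    if dataset_tier == "open":
        return True
    # Registered tier: verified identity and a declared intended use are both required.
    return user.identity_verified and bool(user.declared_use.strip())
```

In this sketch, access follows from the user's registered role rather than from approval of a specific project, which is the efficiency gain the model aims for; any real deployment would still need the prior risk analysis that determines which datasets belong in which tier.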
Beyond risk mitigation, fair benefit allocation must also be considered in human genomic data sharing. The EHDS falls short here, as it requires electronic health data to be anonymised before secondary reuse, effectively excluding data subjects from compensation. Article 62 of Regulation (EU) 2025/327 (2025) allows health data access bodies to charge fees for providing electronic health data for secondary use, but these data are pseudonymised or anonymised. Such anonymisation is ineffective as a safeguard, yet it bars data subjects (and related groups) from receiving financial benefits from their data. Even if data anonymisation could be achieved, fair benefit allocation would still be needed. The EHDS does include one benefit-allocation provision for data subjects: it states that “[a] healthcare provider or a third party shall not directly or indirectly charge data subjects a fee or costs, or require compensation, for sharing or accessing data.” This is reasonable given individuals’ need for primary access to their own health data. However, when applied to human genomic data, which carry unique value and risks, this approach requires further assessment through a rigorous benefit–cost analysis to ensure equity.
Data utilisation is the core objective of human genomic data sharing, encompassing two key forms: use by data subjects themselves (primary use) and reuse by third parties (secondary use). The EU’s EHDS offers valuable insights here, as it is designed to facilitate access to electronic health data for both primary and secondary uses. For primary use, the EHDS outlines a set of rights in Chapter II of Regulation (EU) 2025/327 (2025) to support individuals and their representatives in accessing electronic health data. However, a critical gap remains: Regulation (EU) 2025/327 (2025) does not explicitly clarify whether individuals who undergo commercial genomic testing can require testing companies to share their genomic data with healthcare providers. This matters because integrating such data into electronic health records (EHRs) could enable both primary use (for personal healthcare) and secondary use (for research). While genomic data integration into EHRs is still in early stages, it holds significant potential—including improving personal health outcomes, enabling effective clinical application of genomic data, and advancing genomic research (Walton et al., Reference Walton, Johnson, Person and Chamala2019).
Third-party secondary reuse of health data under the EHDS is tightly constrained: it is limited to specific purposes and explicitly excludes harmful activities. Article 53 of Regulation (EU) 2025/327 (2025) permits reuse for scientific research in health or care, policymaking and regulatory work, education and teaching, and activities to improve public or occupational health. In contrast, Article 54 prohibits reuse for three key purposes: making detrimental or discriminatory decisions about individuals or groups, conducting advertising or marketing, and developing products or services that could harm individuals, public health, or society. These detailed rules on permitted and prohibited third-party reuse create strict safeguards, helping to control risks linked to data sharing. Genomic data sharing can draw direct lessons from this framework: imposing clear limits on third-party secondary use would similarly mitigate risks while preserving the value of genomic data for beneficial purposes.
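To show how purpose limits of this kind could be applied to a genomic data access request, the following Python sketch screens declared purposes against a permitted list and a prohibited list. The purpose labels merely paraphrase the categories summarised above, and the function name `review_reuse_request` is hypothetical; it is an illustration under those assumptions, not the Regulation's procedure.

```python
# Labels paraphrase the permitted and prohibited categories described in the text.
PERMITTED_PURPOSES = {
    "scientific_research",
    "policymaking_and_regulation",
    "education_and_teaching",
    "public_or_occupational_health",
}
PROHIBITED_PURPOSES = {
    "detrimental_or_discriminatory_decisions",
    "advertising_or_marketing",
    "harmful_products_or_services",
}


def review_reuse_request(declared_purposes):
    """Return (approved, reasons); approved is True only if every purpose is permitted."""
    purposes = set(declared_purposes)
    reasons = []
    if not purposes:
        reasons.append("no purpose declared")
    for purpose in sorted(purposes & PROHIBITED_PURPOSES):
        reasons.append(f"prohibited purpose: {purpose}")
    for purpose in sorted(purposes - PERMITTED_PURPOSES - PROHIBITED_PURPOSES):
        reasons.append(f"purpose not on the permitted list: {purpose}")
    return (not reasons, reasons)
```

A request mixing a permitted and a prohibited purpose is rejected as a whole in this sketch; whether partial approval should be possible is a design choice the text leaves open.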
Data erasure is essential for enabling individuals whose genomic data have been collected to regain control over their data. To achieve comprehensive control, genomic data erasure must include the deletion of entire genomic and genetic datasets, along with associated information and biospecimens. The permanent nature of genomic data and the lack of legal mandates for its erasure in many jurisdictions underscore the need to enshrine a legally enforceable right to data deletion as a cornerstone of data subject autonomy (Gassner, Reference Gassner2021). By embedding erasure as a default mechanism, this framework ensures that individuals retain control over the duration of their data’s existence and mitigates long-term privacy risks associated with indefinite data retention. Regulation (EU) 2025/327 (2025) grants individuals a range of rights that overlap in function with data erasure—such as the right to data portability, the right to restrict access, and the right to opt out of primary data use. However, these rights are not equivalent to a full right to deletion. Notably, the right to opt out only applies to primary use and excludes secondary reuse of health data. This gap stems from the Regulation’s classification of secondary use data as anonymised (and thus non-personal), which also explains why the text does not grant individuals an explicit right to delete their data. A right to delete is nonetheless essential: data subjects can only truly retain control if they can monitor data use, request erasure, and withdraw consent effectively. Without this right, these other protections remain incomplete. Ultimately, incorporating a right to delete into human genomic data sharing systems would balance progress in genomic research with the protection of stakeholder interests.
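The lifecycle stages discussed in this subsection (acquisition, storage, sharing, utilisation, and erasure) can be sketched as a simple state machine in Python. The transition table is an assumption made for illustration: it lets erasure be requested from any stage and treats erasure as final, reflecting the argument that a right to delete should be available throughout the lifecycle. The class and stage names are hypothetical.

```python
from enum import Enum


class Stage(Enum):
    ACQUISITION = "acquisition"
    STORAGE = "storage"
    SHARING = "sharing"
    UTILISATION = "utilisation"
    ERASURE = "erasure"


# Assumed transitions: stored data may be shared and used repeatedly,
# and erasure is reachable from every other stage but leads nowhere.
ALLOWED = {
    Stage.ACQUISITION: {Stage.STORAGE, Stage.ERASURE},
    Stage.STORAGE: {Stage.SHARING, Stage.ERASURE},
    Stage.SHARING: {Stage.UTILISATION, Stage.STORAGE, Stage.ERASURE},
    Stage.UTILISATION: {Stage.STORAGE, Stage.SHARING, Stage.ERASURE},
    Stage.ERASURE: set(),
}


class GenomicRecord:
    """Tracks which lifecycle stage a (hypothetical) genomic record is in."""

    def __init__(self, record_id: str):
        self.record_id = record_id
        self.stage = Stage.ACQUISITION
        self.history = [Stage.ACQUISITION]

    def advance(self, next_stage: Stage) -> None:
        """Move to next_stage, or raise ValueError if the transition is not allowed."""
        if next_stage not in ALLOWED[self.stage]:
            raise ValueError(
                f"cannot move from {self.stage.value} to {next_stage.value}"
            )
        self.stage = next_stage
        self.history.append(next_stage)
```

Keeping a history of stages also gives data subjects something to monitor, which the text identifies as a precondition for meaningful control; a real system would additionally have to propagate erasure to associated information and biospecimens.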
In summary, interest balancing and data lifecycle management function as enhancement mechanisms for genomic data protection, grounded in the principle of genomic contextualism. Their underlying logic differs from that of the individual-centric informed consent mechanism, so the proposals in this article would require either sui generis protection or specific provisions within general or health data protection laws.
7. Conclusion and future work
This article highlights the need for enhanced governance of human genomic data sharing. It reviews the concept of genomic data and the historical development of HGPs, and emphasises at the outset that genomic data constitute collective personal data. Building on this distinctive character, the article argues that the concept of “genomic contextualism” should be applied to govern these data. It also outlines a tripartite taxonomy of risks in genomic data sharing. To improve governance and mitigate the associated risks, the article compares and analyses data protection frameworks in the EU and China, highlighting that current systems may be insufficient to address all the risks posed by genomic data sharing.
The article further puts forward governance reform recommendations to promote responsible data sharing practices. It stresses that group consent is required where genomic data sharing and related activities impact group interests. Additionally, it proposes that data protection systems should be built on the principle of interest balancing, moving beyond over-reliance on informed consent alone. To implement this principle and comprehensively mitigate the risks of genomic data sharing, the article suggests establishing a data lifecycle management framework supported by effective TOMs. Collectively, these recommendations aim to foster responsible genomic data sharing, reduce diverse risks, and ensure equitable benefits for all involved stakeholders.
Beyond the analytical content of this article, this work seeks to drive both practical and critical research on the global landscape of genomic data governance. In particular, cross-border genomic data sharing poses unique challenges, as it involves balancing transnational interests and even intersects with national competition. We must also address the rapid development of AI, ML, synthetic biology, and other related technologies. These fields are converging with genomics and possess enormous potential, capable of delivering profound benefits to humanity or posing severe risks to its well-being.
Data availability statement
No publicly available datasets were utilised or generated in this research.
Acknowledgments
I am grateful to everyone who helped improve the quality of the article at different stages of publication, especially the reviewers and the editor for their comments and suggestions.
Author contribution
G.W.: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Visualization, Writing—original draft, Writing—review and editing. The author alone carried out the research and was responsible for drafting and editing the manuscript.
Competing interests
The author declares none.