10.1 Introduction
10.1.1 Introduction of Artificial Intelligence
Artificial intelligence (AI) is defined as “the science and engineering of making intelligent machines” (McCarthy et al., Reference McCarthy, Minsky, Rochester and Shannon2006). It is a branch of computer science that enables machines to simulate human intelligence and problem-solving capabilities. The concept of AI first emerged in Alan Turing’s renowned article “Computing Machinery and Intelligence” (Turing, Reference Turing1950), which posed the question of whether machines can think. The formal terminology was established during the Dartmouth Conference in 1956 (McCarthy et al., Reference McCarthy, Minsky, Rochester and Shannon2006), an event recognized as the birth of AI.
Over the decades, AI technology has undergone several cycles of hype and setbacks. In its early stage (1950s–1960s), foundational algorithms were developed that laid the groundwork for many of today’s advancements in AI. These algorithms can be broadly classified into two categories: symbolism and connectionism (Smolensky, Reference Smolensky1987). Symbolism, also known as symbolic AI, represented knowledge through symbols and rules, using logic and formal reasoning to manipulate these symbols. This approach excelled in tasks requiring clear and structured knowledge, such as theorem proving and game playing. Connectionism, or connectionist AI, on the other hand, sought to model the interconnected neurons of the brain and aimed to create artificial neural networks (NNs) capable of learning from data.
During this stage, AI technologies were often criticized for addressing what were deemed “toy problems.” A famous example is that early connectionist models, such as the single-layer perceptron, could not learn the XOR function (Minsky & Papert, Reference Minsky and Papert1969). Such critical limitations led to skepticism about the general capabilities of AI. For instance, the Automatic Language Processing Advisory Committee (ALPAC) of the United States stated in its 1966 report that “fully automatic high-quality machine translation was not going to be realized for a long time” (Pierce & Carroll, Reference Pierce and Carroll1966). AI at this stage was akin to human childhood, with its capabilities constrained by limited computational resources and immature reasoning abilities.
In the 1980s, expert systems, a pinnacle of symbolic AI, experienced a significant rise in popularity as the first truly commercial application of AI research and development. Unlike earlier symbolic AI, which relied solely on formal logical expressions, expert systems incorporated expert knowledge by encoding it into a set of rules and heuristics applicable to specific tasks (Waterman, Reference Waterman1985). Utilizing inference engines, these systems could reason through complex problems and provide solutions based on accumulated knowledge. At this stage, AI functioned like a specialist, capable of handling domain-specific applications with a high degree of expertise while being limited in other areas.
A major limitation of expert systems is that they can only solve “well-defined” problems within a narrow domain. Namely, if a human expert can articulate the steps of reasoning needed to solve a problem, an expert system can replicate this process; if the reasoning cannot be explicitly explained, the expert system cannot be applied to the problem effectively. To overcome this limitation, data-driven approaches, particularly NNs, were adopted. NNs mimic the structure and function of the human brain, consisting of interconnected layers of nodes (i.e., neurons). Each node processes its input data and passes the result to the next layer, gradually refining the information through multiple layers and eventually producing the final output. Unlike expert systems, NNs can learn from large datasets, identifying patterns and making predictions without the need for explicit rule-based programming.
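To make this layer-by-layer flow of information concrete, the following toy sketch in Python passes an input through two fully connected layers; the layer sizes, random weights, and activation function are purely illustrative and not tied to any particular system discussed here.

```python
import numpy as np

def forward(x, layers):
    """Pass an input through a stack of fully connected layers.
    Each node computes a weighted sum of its inputs and applies a non-linearity."""
    for W, b in layers:
        x = np.tanh(W @ x + b)   # the output of one layer becomes the input to the next
    return x

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(8, 4)), np.zeros(8)),   # 4 inputs -> 8 hidden nodes
          (rng.normal(size=(2, 8)), np.zeros(2))]   # 8 hidden nodes -> 2 outputs
print(forward(rng.normal(size=4), layers))           # final 2-dimensional output
```

In a real NN, the weights are not random but are adjusted during training (e.g., by backpropagation) so that the outputs match the desired targets.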
Although the idea of NNs was proposed much earlier – such as the basic model of a single neuron developed in 1958 (Rosenblatt, Reference Rosenblatt1958) and the backpropagation algorithm for optimizing parameters in multi-layer networks derived in 1982 (Werbos, Reference Werbos1994) – NNs did not gain widespread popularity until the 2010s. The surge in popularity was driven by dramatic advancements in computing capabilities and the availability of large-scale datasets. These conditions enabled deep NNs (DNNs), which can consist of hundreds of layers and billions of parameters. As a result, NNs began to achieve superior performance over humans across various domains, including object detection, speech recognition, and language processing, thereby leading to the advent of the era of deep learning (DL), which continues to thrive today.
Despite the remarkable advancements in DL technologies, the emergence of large language models (LLMs) stands out as the most exciting accomplishment of recent years. These models are trained in an auto-regressive manner on large-scale corpora, predicting the next word in a sentence based on the context of the preceding words. Their training is further enhanced by reinforcement learning from human feedback (RLHF), where human evaluators provide feedback to refine the model responses, aligning them closely with human expectations (Ouyang et al., Reference Ouyang, Wu, Jiang, Almeida, Wainwright, Mishkin and Lowe2022). As a result, LLMs can understand given instructions and generate human-like text with impressive accuracy. The capabilities of LLMs extend far beyond basic text generation. They can perform a wide range of natural language processing (NLP) tasks such as translation, summarization, question-answering, and even creative writing and coding. Their ability to understand context and generate coherent and contextually appropriate responses makes them incredibly versatile tools. Moreover, LLMs hold significant potential in the pursuit of artificial general intelligence (AGI). Their comprehensive cognitive abilities showcase remarkable progress toward machines that can understand, learn, and apply knowledge across a wide range of tasks at a level comparable to human intelligence.
10.1.2 Model Architectures and Learning Schemes
To explore human–AI collaboration for scientific discovery, it is essential to understand the popular AI model architectures and various learning schemes. Such knowledge allows us to appreciate how AI can be leveraged in scientific research and how these technologies can be tailored to address specific scientific problems. Given that DL dominates AI applications today, we will focus on the relevant topics for DL in this section.
A number of popular architectures have emerged as the foundational models in the DL era, each designed to address specific tasks. Convolutional neural networks (CNNs) are widely used for image and video processing due to their ability to capture spatial hierarchies through convolutional layers (LeCun et al., Reference LeCun, Boser, Denker, Henderson, Howard, Hubbard and Jackel1989). Recurrent neural networks (RNNs) are designed for sequential data (Pearlmutter, 1989), such as time series or natural language, and can retain information across long sequences, making them ideal for tasks like language modeling and translation. Graph neural networks (GNNs) excel in handling graph-structured data (Scarselli et al., Reference Scarselli, Gori, Tsoi, Hagenbuchner and Monfardini2008), consisting of nodes and edges, allowing for effective processing of relationships and interactions within data points, which is particularly useful in social network analysis and molecular biology (Li et al., Reference Li, Huang and Zitnik2022).
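As a minimal illustration of the convolution operation at the heart of a CNN, the sketch below slides a small filter over an image to produce a feature map that responds to a local spatial pattern; the image and kernel here are made up for illustration only.

```python
import numpy as np

def conv2d(image, kernel):
    """Naive 2-D convolution with valid padding: the core operation of a CNN layer."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)  # local weighted sum
    return out

image = np.random.default_rng(0).normal(size=(8, 8))
edge_kernel = np.array([[1.0, -1.0]])          # crude horizontal edge detector
feature_map = conv2d(image, edge_kernel)       # responses to the local pattern
print(feature_map.shape)
```

A trained CNN stacks many such layers, with the kernel weights learned from data rather than hand-specified.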
Self-attention-based models, particularly the transformer architecture, have revolutionized NLP technologies (Vaswani et al., Reference Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez and Polosukhin2017). Transformers utilize self-attention mechanisms to weigh the importance of different words in a sentence when making predictions, allowing for the modeling of long-range dependencies and the parallel processing of sequences. This has led to highly versatile and powerful language models, such as generative pre-trained transformers (GPTs). Additionally, self-attention has been adapted into vision transformers (ViTs) for image recognition, demonstrating superior performance over traditional CNNs in standard benchmarks (Dosovitskiy et al., Reference Dosovitskiy, Beyer, Kolesnikov, Weissenborn, Zhai, Unterthiner and Houlsby2021).
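A minimal sketch of the scaled dot-product self-attention mechanism described above is shown below; the weights are random toy values, and real transformers add multiple attention heads, positional encodings, and learned parameters trained end to end.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token embeddings.

    X          : (seq_len, d_model) input embeddings
    Wq, Wk, Wv : (d_model, d_k) projection matrices for queries, keys, values
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])              # pairwise relevance between tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over each row
    return weights @ V                                   # context-weighted mixture of values

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))                              # 5 tokens, 16-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                                          # (5, 8) attended representations
```

Because every token attends to every other token in a single matrix operation, long-range dependencies can be modeled and whole sequences processed in parallel.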
Diffusion models represent another innovative architecture (Sohl-Dickstein et al., Reference Sohl-Dickstein, Weiss, Maheswaranathan and Ganguli2015), particularly in the field of generative modeling. These models simulate the process of data generation through a sequence of steps that gradually add and then remove noise from the data, effectively “diffusing” and then “denoising” the data to generate new samples. Diffusion models have shown remarkable success in generating high-quality images and have been applied to other areas such as audio and video generation. Their ability to model complex data distributions makes them powerful tools for a variety of generative tasks.
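The sketch below illustrates only the forward “diffusing” half of this process on a toy one-dimensional sample, using an arbitrary noise schedule; an actual diffusion model additionally trains a neural network to reverse these steps, so that sampling starts from pure noise and gradually recovers a data-like sample.

```python
import numpy as np

def forward_diffusion(x0, betas, rng):
    """Gradually corrupt a sample x0 with Gaussian noise, one small step at a time."""
    x = x0.copy()
    trajectory = [x0]
    for beta in betas:                                   # each step adds a little noise
        noise = rng.normal(size=x.shape)
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise
        trajectory.append(x)
    return trajectory                                    # the final state is close to pure noise

rng = np.random.default_rng(0)
x0 = np.linspace(-1.0, 1.0, 8)                           # toy "data" sample
betas = np.linspace(1e-4, 0.2, 50)                       # illustrative noise schedule
traj = forward_diffusion(x0, betas, rng)
print(np.round(traj[0], 2), np.round(traj[-1], 2))       # original vs. fully diffused sample
```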
In addition to the various architectures of AI models, a variety of powerful learning schemes enable machines to learn from vast amounts of data and make intelligent decisions. These schemes are classified based on the nature of the training data and the learning objectives, dividing them into categories such as supervised, unsupervised, semi-supervised, reinforcement, and transfer learning (LeCun et al., Reference LeCun, Bengio and Hinton2015). Supervised learning involves training an AI model on a labeled dataset, where the input data is paired with the correct output. It is highly effective for tasks like classification and regression. Unsupervised learning, on the other hand, deals with unlabeled data, allowing an AI model to identify patterns and relationships within the data on its own, which makes it suitable for clustering and dimensionality reduction tasks. Semi-supervised learning combines both supervised and unsupervised learning by using a small amount of labeled data alongside a large amount of unlabeled data to improve learning efficiency and accuracy. Transfer learning leverages pre-trained models for new tasks, saving time and resources. Additionally, reinforcement learning is a unique scheme where an AI agent learns by interacting with its environment, receiving rewards or penalties based on its actions, which makes it particularly powerful for sequential decision-making problems and tasks that require long-term planning. These schemes collectively empower AI models to tackle a wide array of complex problems, from image and speech recognition to NLP and beyond.
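To make the first two distinctions concrete, the sketch below (using scikit-learn on synthetic data, for illustration only) fits a supervised classifier that sees labels and an unsupervised clustering model that does not; semi-supervised, transfer, and reinforcement learning follow analogous but more involved recipes.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Supervised learning: the model is trained on inputs paired with labels.
clf = LogisticRegression().fit(X, y)
print("supervised accuracy:", clf.score(X, y))

# Unsupervised learning: the same inputs, but no labels; structure is inferred.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("cluster assignments of first ten samples:", clusters[:10])
```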
10.1.3 Early Human–AI Collaboration in Science
Scientific discovery is one of the most exhilarating and profoundly creative endeavors of human intelligence. It demands a deep mastery of existing knowledge, meticulous observation of natural phenomena or experimental data, keen insight into new facts, and boundless imagination to conceive underlying mechanisms. The integration of AI into scientific research is a powerful testament to the capabilities and potential of AI technologies. This journey has a rich history, full of triumphs and breakthroughs. In this section, we delve into the landmark achievements of the pre-DL era (before the 2010s), celebrating the milestones that paved the way. The revolutionary cases from the DL era will be further explored in Section 10.1.4.
As mentioned previously, early AI (1950s–1960s) had quite limited capabilities. The primary efforts to apply AI in scientific research during this period focused on theorem proving. The first theorem-proving program, Logic Theorist, was developed in 1956 (Siekmann & Wrightson, Reference Siekmann and Wrightson2012). It proved propositional logic theorems by using axioms and inference rules. It worked not only for numeric expressions but also for symbolic formulas, with proof searching guided by heuristics. Impressively, Logic Theorist proved thirty-eight out of fifty-two theorems in chapter 2 of Principia Mathematica. The achievement was further advanced in 1960 when all 350 theorems across nine chapters of Principia Mathematica were proved (Wang, Reference Wang1960).
The 1960s through the 1980s witnessed the rise of expert systems. Expert systems like DENDRAL (Feigenbaum et al., Reference Feigenbaum, Buchanan and Lederberg1970) and MYCIN (Davis et al., Reference Davis, Buchanan and Shortliffe1977) demonstrated the potential of AI to replicate human expertise in specific domains. DENDRAL, developed in the 1960s, was one of the first expert systems designed to assist chemists in identifying unknown organic molecules. MYCIN, developed in the 1970s, was used to diagnose bacterial infections and recommend treatments. These systems relied heavily on symbolic AI and rule-based reasoning, encoding expert knowledge into a set of if–then rules.
Despite the success of expert systems, the reliance on symbolic AI posed significant challenges, particularly the knowledge engineering bottleneck, which refers to the labor-intensive process of manually encoding domain-specific knowledge into the system. As scientific knowledge became more complex and dynamic, the limitations of symbolic AI became more pronounced. The need for more scalable and adaptable AI approaches led to the exploration of alternative methods.
The 1990s marked a paradigm shift in AI research with the rise of data-driven machine learning, including connectionist approaches. Unlike symbolic AI, which required explicit programming of rules, these methods enabled systems to learn from data. Techniques such as decision trees, support vector machines, and early NNs began to gain traction in the scientific community. These techniques were applied to specific steps of a wide range of problems, including genomics (Libbrecht & Noble, Reference Libbrecht and Noble2015), drug discovery (Gertrudes et al., Reference Gertrudes, Maltarollo, Silva, Oliveira, Honorio and Da Silva2012), and so on.
In the pre-DL era, AI was primarily used as a tool to address specific steps in scientific research. Researchers provided explicit instructions for solving domain-specific problems, and AI tools operated under human guidance, demonstrating limited autonomy and creativity. The output from AI generally fell within the range anticipated by researchers. While effective within their constrained domains, these early AI systems lacked the ability to generalize and adapt to broader and more complex tasks.
10.1.4 Human–AI Collaboration in Modern Scientific Research
Since 2010, we have been in the midst of an explosive surge in AI technologies, driven by enhanced computing power and large-scale training data. This empowered AI has the capacity to delve deeper and explore broader horizons than human intelligence alone, revolutionizing scientific research across various disciplines. In mathematics, for instance, AI technologies have been harnessed to generate creative code solutions for challenging open problems, offering innovative approaches that were previously unimaginable (Romera-Paredes et al., Reference Romera-Paredes, Barekatain, Novikov, Balog, Kumar, Dupont and Fawzi2024). These initial solutions are further refined and optimized using genetic algorithms, a process that mimics natural evolution. Such AI techniques have yielded groundbreaking results, surpassing human achievements on longstanding open mathematical challenges such as the cap-set problem and the bin-packing problem.
In addition to its capabilities to solve complex problems, AI can also significantly enhance efficiency in scientific research. A notable example is AlphaFold, an AI tool developed by DeepMind, which predicts the 3D structure of proteins (Jumper et al., Reference Jumper, Evans, Pritzel, Green, Figurnov, Ronneberger and Hassabis2021). Proteins are essential to life, and understanding their structure is crucial for elucidating their function. Traditionally, determining protein structures required years of experimental work using techniques like X-ray crystallography or cryo-electron microscopy. Despite these efforts, only around 100,000 unique protein structures had been determined, representing a small fraction of the billions of known protein sequences. AlphaFold revolutionized this field by providing highly accurate predictions of protein structures within a short time period, dramatically accelerating the pace of biological research. The latest database release by AlphaFold contains over 200 million entries. This breakthrough not only saves time and resources but also opens new possibilities for drug discovery and understanding diseases at a molecular level (Goenka et al., Reference Goenka, Gorzynski, Shafin, Fisk, Pesout, Jensen and Ashley2022).
The emergence of LLMs has significantly broadened the scope of AI applications in scientific research. Previously, AI algorithms were typically accessed and utilized through programming, limiting their use to technical processes such as data processing and result analysis. However, LLMs enable researchers to interact with AI using natural language, making AI technologies more accessible and extending their utility beyond technical tasks. LLMs can assist with labor-intensive tasks like literature reviews, and even contribute to more creative endeavors, such as proposing hypotheses and designing experiments (Zhao et al., Reference Zhao, Chen, Zhou, Li, Tang, Harris and Li2024).
In summary, AI has evolved from a mere tool into a vital assistant that supports researchers at every stage of their work. AI systems can rapidly process vast amounts of information, identify patterns that might be challenging for humans to detect, and suggest new avenues for investigation. With the assistance of AI, human researchers can tackle more complex problems, generate deeper insights, and drive scientific innovation more effectively. By combining AI with human intuition and expertise, the human–AI partnership enhances the depth and breadth of scientific research, enabling more innovative and impactful discoveries (Wang et al., Reference Wang, Fu, Du, Gao, Huang, Liu and Zitnik2023).
In the following sections of this chapter, we will explore AI applications across various disciplines, including mathematics, physics, chemistry, and life sciences. These examples will showcase AI technologies in scientific research and illustrate how they can be effectively integrated with human expertise in different fields.
10.2 Human–AI Collaboration in Mathematics
The intersection of AI technology and mathematics represents a transformative collaboration that has significantly accelerated the pace and depth of mathematical discoveries. In this section, we explore how AI aids scientists in mathematical research, highlighting key methodologies, notable achievements, and the evolving dynamics of this partnership.
10.2.1 Automated Theorem Proving
Since the birth of computers, there has been a longstanding desire to make them reason at the level of human thought. This aspiration led to the emergence of automated theorem proving (ATP) systems in the early 1950s, which reached their peak during the 1960s (Russell & Norvig, Reference Russell and Norvig2016). These ATP systems are designed to assist, augment, or even independently carry out the process of proving mathematical theorems.
ATP systems operate by converting mathematical premises and proofs into a formal language, such as first-order or higher-order logic (Loveland, Reference Loveland2016), that computers can process. These languages provide precise syntax and semantics, allowing unambiguous representation of mathematical ideas. The core of an ATP system, its inference engine, then applies logical rules to explore the space of possible proofs that can derive the desired conclusions from these premises. Various strategies, such as resolution, term rewriting, and model checking, can be employed at this stage. To navigate this vast space, sophisticated algorithms, often involving heuristic search techniques like depth-first search, breadth-first search, and A* search, are used. Once a proof is found, ATP systems verify its correctness by checking each logical step against the rules of the formal system, ensuring that the proof is error-free and adheres to mathematical standards.
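A toy illustration of the rule-application loop inside an inference engine is sketched below; it handles only propositional facts and if–then rules, whereas real ATP systems work with far richer logics, unification, and search strategies such as resolution.

```python
def forward_chain(facts, rules):
    """Naive forward chaining: repeatedly apply if-then rules until no new facts appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Fire a rule when all of its premises are already known facts.
            if set(premises) <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

# Toy knowledge base: the fact A, plus the rules A -> B and B -> C.
rules = [(["A"], "B"), (["B"], "C")]
print(forward_chain(["A"], rules))   # {'A', 'B', 'C'}
```

Deriving a desired conclusion then amounts to checking whether it appears among the facts produced by such a (much more sophisticated) search.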
The focus of ATP systems is on proving existing theorems. These systems not only ensure rigor but also handle the complexity of lengthy proofs that might be error-prone if done manually. For instance, the proof of the Kepler Conjecture, a centuries-old problem about the densest arrangement of spheres, was formally verified using the HOL Light proof assistant (Harrison, Reference Harrison2013). This collaboration exemplifies how AI can handle the intricate details of mathematical proofs, allowing human mathematicians to focus on higher-level insights and strategies.
10.2.2 Discovery of New Mathematical Conjectures
With the advancement of AI technology, AI can not only prove existing mathematical theorems but also propose new conjectures that require validation by human researchers.
Mathematical conjectures are propositions or hypotheses that are based on limited evidence and have not yet been proven or disproven. They hold significant meaning as they can lead to profound discoveries and advancements in mathematical theory. Famous examples include the Goldbach Conjecture, which asserts that every even integer greater than two is the sum of two prime numbers and remains unresolved. Another well-known example is Fermat’s Last Theorem, which claims that there are no three positive integers $a$, $b$, and $c$ that satisfy the equation $a^n + b^n = c^n$ for any integer value of $n$ greater than two; this conjecture remained unproven for over 350 years until it was finally resolved in 1994 (Wiles, Reference Wiles1995). Such conjectures not only challenge mathematicians but also drive the development of new methods and theories in mathematics.
Conjectures often arise from patterns observed in data and serve as steppingstones for further exploration and proof within the mathematical community. This data-driven aspect is where modern AI technology comes into play. In light of this, DeepMind, a leading AI research lab, has developed an AI-based framework for discovering new mathematical conjectures (Davies et al., Reference Davies, Veličković, Buesing, Blackwell, Zheng, Tomašev and Kohli2021). This framework aims to identify relationships between two mathematical objects, such as whether two properties $X(z)$ and $Y(z)$ associated with a given variable $z$ satisfy a relationship $f$, formally expressed as $Y(z) \approx f(X(z))$. To achieve this, data samples are computed and collected. AI models are then trained with $X(z)$ as input and $Y(z)$ as output to learn the relationship $f$. When the learned relationship $\hat{f}$ is more accurate than what would be expected by chance, it suggests that a valid relationship may exist, warranting further exploration. This framework has successfully been applied to discover new connections in algebraic and geometric structures of knots in Knot Theory and to propose new resolutions for long-standing open conjectures in Representation Theory.
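The sketch below mimics this workflow on synthetic data; the “invariants” and the hidden relationship are made up for illustration. A model is trained to predict $Y(z)$ from $X(z)$, and beating a trivial baseline by a wide margin is taken as evidence that a relationship worth formalizing and proving may exist.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

# Hypothetical setup: for each object z we compute invariants X(z) and a target Y(z).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 6))                                     # computed invariants X(z)
Y = 2 * X[:, 0] - X[:, 3] + rng.normal(scale=0.1, size=2000)       # hidden relation plus noise

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)
model = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000,
                     random_state=0).fit(X_tr, Y_tr)

baseline = np.mean((Y_te - Y_tr.mean()) ** 2)                      # predict-the-mean baseline
learned = np.mean((Y_te - model.predict(X_te)) ** 2)
print(learned, baseline)   # learned error far below baseline hints at a real relationship
```

In the actual framework, attribution techniques are then used to see which components of $X(z)$ matter most, guiding mathematicians toward a precise, provable statement.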
In these cases, AI acts as a creative partner, generating promising conjectures that mathematicians can rigorously test and prove. This collaboration effectively blends human intuition with machine computational power, enhancing the research process.
10.2.3 Optimization of Solutions
The optimization capabilities of AI are another crucial aspect of its collaboration with mathematicians. Problems in number theory, combinatorics, and other fields often require extensive computation to test hypotheses or explore large solution spaces. AI algorithms, particularly those in evolutionary computation and reinforcement learning, excel at optimizing solutions and searching through vast, complex landscapes.
FunSearch (short for “searching in the function space”) exemplifies this kind of innovative approach (Romera-Paredes et al., Reference Romera-Paredes, Barekatain, Novikov, Balog, Kumar, Dupont and Fawzi2024). Designed for combinatorics problems, FunSearch addresses challenging problems that are difficult to optimize but easy to evaluate. For example, the famous cap set problem involves finding the largest possible set of $n$-dimensional lattice points such that no three points are collinear. For large $n$, brute force methods become impractical due to the exponential growth in the number of possible combinations. However, evaluating whether a given set satisfies the cap set constraint is straightforward, for instance by checking the rank of a matrix composed of these points.
To solve the cap set problem, mathematicians have proposed heuristics to decide how to add points to a cap set without violating the constraints. Different heuristics can lead to different solutions, and there is no consensus on what the optimal heuristic should be.
To efficiently explore this vast heuristic space, FunSearch leverages LLMs to generate code that calculates the priority of adding a point to the set. The effectiveness of different priority functions is then evaluated based on the size of the resulting set. Evolutionary algorithms are employed to select the best programs and feed them back to the LLMs, which generate increasingly improved priority functions. By iterating this process, FunSearch can discover superior solutions that outperform those found by human researchers.
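The sketch below shows the skeleton of this loop in Python for a toy dimension: a cheap validity check, a greedy constructor driven by a priority function, and a placeholder priority of the kind FunSearch asks the LLM to write and then evolves. The actual FunSearch programs, evaluation sandbox, and evolutionary machinery are far more elaborate.

```python
import itertools

def is_cap_set(points):
    """Cap-set condition over the field with three elements: no three distinct
    points lie on a line (equivalently, no three distinct points sum to zero mod 3)."""
    pts = [tuple(p) for p in points]
    seen = set(pts)
    for a, b in itertools.combinations(pts, 2):
        c = tuple((-(x + y)) % 3 for x, y in zip(a, b))   # third point on the line through a, b
        if c in seen and c != a and c != b:
            return False
    return True

def greedy_cap_set(n, priority):
    """Greedily add points in order of a priority function, keeping the cap-set property."""
    cap = []
    for p in sorted(itertools.product(range(3), repeat=n), key=priority, reverse=True):
        if is_cap_set(cap + [list(p)]):
            cap.append(list(p))
    return cap

# A hypothetical priority function of the kind an LLM might propose and evolution refines.
cap = greedy_cap_set(4, priority=lambda p: sum(p) % 3)
print(len(cap))   # the score used to rank competing priority functions
```

FunSearch keeps only the small priority function as the evolving artifact, which makes the discovered solutions compact and interpretable by human mathematicians.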
The aforementioned collaboration targets solving problems with unknown optimal solutions. Human experts pinpoint the critical aspects of the problem (i.e., the heuristics), while AI takes on the arduous task of generating new heuristics. Guided by human expertise, AI uses evaluation metrics and evolutionary algorithms to refine its output continuously. This dynamic partnership produces scalable and innovative solutions, pushing the boundaries of what can be achieved in mathematical problem-solving.
10.2.4 Summary
In this section, we review how AI assists mathematicians across various problem types: (1) proving existing theorems with known conclusions, (2) proposing new conjectures, and (3) optimizing solutions for problems whose optimum is unknown. AI serves not only as a computational and verification tool but also as a means to visualize mathematical structures and explore high-dimensional data, thereby enhancing the mathematical intuition of researchers. It can also offer new perspectives and suggest unconventional approaches, inspiring mathematicians to think beyond traditional paradigms. Currently, human intervention is needed to establish rules, generate training data, and provide guidance. However, as AI continues to advance, future systems may autonomously generate and prove theorems, potentially revolutionizing the field.
10.3 Human–AI Collaboration in Physics
The discovery process in physics traditionally consists of several phases: hypothesis formulation, experimental design, data analysis, and result interpretation. In the conventional research paradigm, each phase relies heavily on human expertise. For instance, the formulation of hypotheses has depended on the intuition and experience of scientists, while experimentation is often complex and resource-intensive, requiring meticulous consideration of numerous variables.
The landscape of physical sciences is undergoing a revolutionary transformation with the emergence of AI. The advancement driven by AI is not confined to any single phase of the scientific discovery process; rather, AI is permeating every stage. AI is enhancing the precision, efficiency, and scope of scientific research, enabling discoveries that were previously unimaginable (Karagiorgi et al., Reference Karagiorgi, Kasieczka, Kravitz, Nachman and Shih2022; Karniadakis et al., Reference Karniadakis, Kevrekidis, Lu, Perdikaris, Wang and Yang2021).
10.3.1 Efficient Data Analysis and Interpretation
One of the primary ways AI assists scientists is through advanced data analysis and interpretation. Physical sciences often involve complex datasets that are challenging to analyze manually. AI algorithms excel at sifting through these large datasets to uncover hidden patterns and correlations.
The discovery of the Higgs boson (Chatrchyan et al., Reference Chatrchyan, Khachatryan, Sirunyan, Tumasyan, Adam, Aguilo and Damiao2012) is a notable example of how AI has played a crucial role in physics research. The Higgs boson, also referred to as the “God particle,” is a fundamental particle associated with the Higgs field, which gives other particles their mass. Studying particles like the Higgs boson requires colliding particles at extremely high energies to reveal the fundamental components of matter and the forces that govern their interactions. By analyzing the particles produced in these high-energy collisions, physicists can test and validate theoretical models, explore the properties of elementary particles, and uncover new physics beyond the Standard Model.
One of the most significant tools for this research is the Large Hadron Collider (LHC), the world’s largest and most powerful particle accelerator, located at the European Organization for Nuclear Research (CERN). Within seconds of a collision at the LHC, data from millions of sensors are recorded. The data rate – over sixty terabytes per second – is too great to be entirely written to disk. To manage this challenge, a trigger system is employed to keep only the events that are interesting enough, targeting specific configurations of particles consistent with a physics process of interest. At the high-level trigger stage, RNNs are used to predict the physical quantities of the particles based on the spatial and temporal signals collected from the sensors. If the prediction matches the predefined criteria for events of interest, the data is deemed interesting enough to keep. As such, the overwhelming majority of events produced during collisions can be rejected.
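A minimal sketch of this idea follows, with random weights and a made-up threshold that are purely illustrative: a recurrent network folds a sequence of sensor readings into a hidden state, predicts a single physical quantity, and the prediction is compared against a trigger criterion to decide whether to keep the event.

```python
import numpy as np

def rnn_predict(signals, Wx, Wh, w_out):
    """Minimal recurrent network: fold a time sequence of sensor readings into a
    hidden state, then predict one physical quantity from the final state."""
    h = np.zeros(Wh.shape[0])
    for x in signals:                          # one time step per sensor reading
        h = np.tanh(Wx @ x + Wh @ h)
    return float(w_out @ h)

rng = np.random.default_rng(0)
Wx, Wh, w_out = rng.normal(size=(16, 4)), rng.normal(size=(16, 16)), rng.normal(size=16)
event = rng.normal(size=(30, 4))               # 30 time steps of 4 sensor channels
keep_event = rnn_predict(event, Wx, Wh, w_out) > 2.5   # hypothetical trigger threshold
print(keep_event)
```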
Despite this filtering, analyzing the remaining data manually would still be an overwhelming task due to its sheer volume and complexity (Andreassen et al., Reference Andreassen, Komiske, Metodiev, Nachman and Thaler2020). Detecting the Higgs boson involved identifying a very specific set of collision events among the remaining immense amount of data. To achieve this, a special clustering algorithm, named particle flow, was designed to sift through the vast quantities of collision data and identify patterns and events that might indicate the presence of new particles or unusual phenomena.
Furthermore, GNNs have emerged as a powerful tool for further data interpretation (DeZoort et al., Reference DeZoort, Battaglia, Biscarat and Vlimant2023). Events recorded by the sensors of the LHC can be naturally represented as graphs, where nodes correspond to individual particles or detector hits, and edges represent the spatial or temporal relationships between them. GNNs leverage this graph structure to effectively capture the complex dependencies and interactions inherent in collision data. By using GNNs, researchers can enhance the accuracy of particle tracking and identification, improve the resolution of reconstructed events, and distinguish between signals of interest and background noise with greater precision. For example, GNNs can be applied to track reconstruction tasks, where they help connect discrete hits into coherent particle trajectories, even in densely populated environments.
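The sketch below shows one message-passing layer of the kind GNNs stack, with toy hit features and a hand-made adjacency matrix; real track-reconstruction GNNs learn both the node updates and which edges to keep as plausible track segments.

```python
import numpy as np

def gnn_layer(node_feats, adjacency, W_self, W_neigh):
    """One message-passing step: each node mixes its own features with the
    mean of its neighbours' features (hits connected by candidate track segments)."""
    deg = adjacency.sum(axis=1, keepdims=True).clip(min=1)
    neigh_mean = (adjacency @ node_feats) / deg
    return np.tanh(node_feats @ W_self + neigh_mean @ W_neigh)

rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 8))                 # 6 detector hits, 8 features each
adj = np.zeros((6, 6))
adj[[0, 1, 2, 3, 4], [1, 2, 3, 4, 5]] = 1       # a chain of hits: 0-1-2-3-4-5
adj = adj + adj.T                               # undirected edges
h = gnn_layer(feats, adj, rng.normal(size=(8, 8)), rng.normal(size=(8, 8)))
print(h.shape)                                  # updated hit representations
```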
As exemplified by the applications of AI at the LHC, its strong computational power and high efficiency enable the effective analysis and interpretation of large-scale experimental data. This allows researchers to focus on innovative tasks, such as discovering new particles, thereby driving scientific innovations and breakthroughs.
10.3.2 Optimized Experimental Design
In addition to the labor-intensive aspects of data analysis, AI can further significantly enhance other intelligent tasks, such as experimental design for scientific discoveries in physics. Take nuclear fusion as an example. Nuclear fusion is a process where two light atomic nuclei combine to form a heavier nucleus, releasing a tremendous amount of energy. It is the same reaction that powers the sun and stars, making it a potentially limitless and clean source of energy. Fusion offers significant advantages over nuclear fission, which involves splitting heavy atomic nuclei and is currently used in nuclear power plants. Unlike fission, fusion produces minimal radioactive waste, poses no risk of catastrophic meltdowns, and uses abundant fuel sources like hydrogen isotopes.
In the context of nuclear fusion, plasma – the fourth state of matter, consisting of charged particles – plays a central role. Plasma must be confined and controlled within a fusion reactor to sustain the fusion reactions, and controlling the plasma shape is crucial for optimizing the performance and stability of the reaction. However, maintaining the high-temperature plasma within a tokamak – a device that confines plasma using magnetic fields – poses a significant challenge. Traditional methods involve controlling multiple time-varying, non-linear variables and precomputing a set of feedforward coil currents and voltages to manage plasma position, current, and shape. While effective, this method demands extensive design effort and expertise, especially when applied to new plasma configurations.
Recent advancements have shown that AI, specifically deep reinforcement learning (RL), can revolutionize this process (Degrave et al., Reference Degrave, Felici, Buchli, Neunert, Tracey, Carpanese and Riedmiller2022). The RL-based approach uses an NN to represent the control policy, which is essentially a strategy for managing the magnetic coils in response to the state of the plasma. Training is guided by a reward function designed to maintain the desired plasma shapes, achieving stable confinement and optimizing performance. Through trial and error, the NN interacts with a simulation environment: it takes actions, observes results, receives rewards or penalties, and adjusts its policy to improve performance. Once the NN has been sufficiently trained in the simulator, its control policy is transferred to an actual tokamak. Experiments conducted on the actual tokamak showcased the efficacy of the RL controller in achieving accurate plasma control across a variety of configurations.
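The sketch below captures this trial-and-error loop in miniature, with a made-up one-dimensional “plasma” simulator and simple random-search policy updates; it is not the deep RL algorithm used in the actual work, only an illustration of acting, receiving a reward, and keeping parameter changes that improve it.

```python
import numpy as np

def simulate(policy_params, rng):
    """Stand-in for a plasma-control simulator: the reward is higher when the
    controller keeps a toy 'plasma state' close to the target value of zero."""
    state, reward = 0.5, 0.0
    for _ in range(50):
        action = np.tanh(policy_params[0] * state + policy_params[1])  # controller output
        state = state + 0.1 * action + 0.01 * rng.normal()             # toy dynamics
        reward -= (state - 0.0) ** 2                                   # penalize deviation
    return reward

rng = np.random.default_rng(0)
params, best = np.zeros(2), -np.inf
for _ in range(200):                           # trial and error guided by the reward signal
    candidate = params + 0.1 * rng.normal(size=2)
    r = simulate(candidate, rng)
    if r > best:                               # keep changes that improve the reward
        params, best = candidate, r
print(params, best)
```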
Such applications of AI not only optimize experimental design but also push the boundaries and accelerate the long journey of physical discoveries. As one plasma physicist noted, “AI would enable us to explore things that we wouldn’t explore otherwise, because we can take risks with this kind of control system we wouldn’t dare take otherwise.”
10.3.3 Summary
In summary, AI has woven itself into every stage of physical discovery, from the intricate design of experiments to the analysis of vast datasets, and the interpretation of results. Its remarkable advancements have transformed the landscape of scientific research, dramatically enhancing the precision and efficiency with which we handle and interpret complex data. Yet, despite these technological strides, AI cannot replace the ingenuity of physicists, particularly in the creative and intuitive process of hypothesis formulation. This stage remains deeply rooted in human insight and imagination, areas where AI still has limited capabilities. The synergy between AI and physicists not only accelerates scientific progress and uncovers new frontiers but also highlights the irreplaceable value of human expertise in steering and shaping research.
10.4 Human–AI Collaboration in Chemistry
Chemistry fundamentally relies on meticulous experimentation and analysis. In the traditional research paradigm, chemists have spent countless hours in laboratories conducting experiments, analyzing results, and drawing conclusions to further their understanding of chemical processes and phenomena. However, human labor, while indispensable, is prone to errors and inconsistencies. Furthermore, the sheer volume and complexity of data generated in modern chemical research can overwhelm human capacity for accurate and efficient analysis.
The integration of AI into the field of chemistry has revolutionized the research paradigm. By leveraging AI, chemists can automate data collection and analysis, enhance predictive modeling, and optimize experimental workflows (Rohrbach et al., Reference Rohrbach, Šiaučiulis, Chisholm, Pirvan, Saleeb, Mehr and Cronin2022; Sanchez-Lengeling & Aspuru-Guzik, Reference Sanchez-Lengeling and Aspuru-Guzik2018; Segler et al., Reference Segler, Preuss and Waller2018). This not only reduces the risk of human error but also significantly accelerates the pace of discovery (Baum et al., Reference Baum, Yu, Ayala, Zhao, Watkins and Zhou2021). In this section, we explore the various ways in which scientists collaborate with AI to advance chemistry discoveries.
10.4.1 Automated Data Collection
Datasets are the cornerstone of chemistry research, providing the essential information needed to understand chemical reactions, properties, and behaviors. Comprehensive and accurate datasets enable chemists to identify patterns, validate hypotheses, and predict outcomes, driving scientific discovery and innovation.
Chemical data can originate from a variety of sources, including laboratory experiments and computational simulations. These data are ultimately recorded in the literature, such as scientific articles and reports. Collecting and curating the valuable information from dispersed sources is time-consuming and resource-intensive. Furthermore, human errors in manual data entry, measurement inconsistencies, and variations in experimental conditions can result in unreliable datasets.
AI addresses many of these challenges by automating the data collection process and enhancing the quality and comprehensiveness of datasets. One notable example is the use of NLP techniques to extract chemical synthesis procedures from millions of materials science papers (Kononova et al., Reference Kononova, Huo, He, Rong, Botari, Sun and Ceder2019). In this text-mining pipeline, paragraphs related to solid-state synthesis are first identified from a vast number of articles using a random forest classifier, which is trained on topics extracted by unsupervised learning. Once these relevant paragraphs are pinpointed, a bidirectional long short-term memory (BiLSTM) NN, designed for processing text sequences, is adopted to recognize the material entities within them. Next, the synthesis operations mentioned in these paragraphs are classified into six categories – such as mixing, heating, and drying – using NNs and sentence dependency tree analysis. Operation conditions such as temperature, time, and atmosphere are then extracted using a regular expression matching approach. Finally, all these components are assembled into a structured synthesis record.
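As a small illustration of the condition-extraction step, the snippet below pulls temperature, time, and atmosphere out of a hypothetical synthesis sentence with regular expressions; the patterns are simplified examples, not the ones used in the published pipeline.

```python
import re

# Hypothetical sentence from a synthesis paragraph.
text = "The mixture was heated at 900 °C for 12 h under an argon atmosphere."

temperature = re.search(r"(\d+(?:\.\d+)?)\s*°C", text)          # e.g. 900
duration = re.search(r"(\d+(?:\.\d+)?)\s*h\b", text)            # e.g. 12
atmosphere = re.search(r"under an? (\w+) atmosphere", text)     # e.g. argon

print(temperature.group(1), duration.group(1), atmosphere.group(1))   # 900 12 argon
```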
This automated pipeline has generated a dataset containing about 20,000 synthesis records from over 4 million papers, a task that would take humans decades to accomplish manually. This, in turn, illustrates the transformative power of AI in streamlining data collection. Under the meticulous guidance of human-designed processes, AI can handle the labor-intensive work of information extraction efficiently and accurately.
10.4.2 Enhanced Predictive Modeling
Molecular prediction in chemistry refers to the process of forecasting the properties and behaviors of molecules before they are physically synthesized and tested. Accurate molecular prediction is essential for guiding experimental work, reducing the time and cost associated with chemical research, and facilitating the discovery of new compounds with desired properties. By predicting how molecules will behave, chemists can make informed decisions and design more effective and efficient experiments.
Chemists often rely on empirical rules and heuristic models for molecular prediction, drawing on accumulated knowledge and observed patterns from previous experiments. For instance, group contribution methods (Constantinou & Gani, Reference Constantinou and Gani1994) predict properties by summing the contributions of individual molecular fragments. While these methods are straightforward and quick to apply, they may lack the precision and flexibility needed for more complex molecules and reactions. To address this challenge, computational chemistry was introduced, which employs computer algorithms to solve the complex equations that describe molecular interactions and reactions. Quantum-mechanical techniques, such as density functional theory (DFT) (Vignale & Rasolt, Reference Vignale and Rasolt1987), allow chemists to calculate electronic structures and predict reactivity with high accuracy. Although these methods can be highly precise, they are often computationally expensive and time-consuming.
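The idea behind the group contribution methods mentioned above can be written in a few lines: a molecular property is estimated as the sum of per-fragment contributions. The fragment values below are placeholders for illustration, not real parameters from any published table.

```python
# Hypothetical per-fragment contributions to some molecular property (illustrative only).
group_contributions = {"CH3": -42.0, "CH2": -20.6, "OH": -208.0}

def estimate_property(fragment_counts):
    """Group-contribution estimate: sum each fragment's contribution times its count."""
    return sum(group_contributions[group] * count
               for group, count in fragment_counts.items())

# e.g., a molecule decomposed into one CH3, one CH2, and one OH fragment.
print(estimate_property({"CH3": 1, "CH2": 1, "OH": 1}))
```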
The advancement of AI technology has opened new avenues for accurate and efficient molecular prediction. Early studies utilized non-NN AI techniques to predict molecular properties based on manually crafted features. For example, a kriging method (Fletcher et al., Reference Fletcher, Davie and Popelier2014), a type of Gaussian process regression, was introduced to predict the electrostatic energies and polarization effects of aromatic amino acids. This approach involves training kriging models on geometries distorted via normal modes of vibration, which significantly reduces computational cost while maintaining accuracy. However, a major limitation of these non-NN AI methods lies in their reliance on the quality of manually extracted features, which requires extensive domain expertise.
To address this challenge, NNs, particularly GNNs, are now widely used to represent molecules more effectively. In GNNs, atoms are treated as nodes and chemical bonds as edges, enabling models to be trained to predict various molecular properties without the need for manually crafted numerical descriptors. Such models have been applied to predict a range of molecular properties, including coordinates (Mansimov et al., Reference Mansimov, Mahmood, Kang and Cho2019), mass spectra (Park et al., Reference Park, Jo and Yoon2024), and toxicity (Cremer et al., Reference Cremer, Medrano Sandonas, Tkatchenko, Clevert and De Fabritiis2023). A compact review can be found in the literature (Wieder et al., Reference Wieder, Kohlbacher, Kuenemann, Garon, Ducrot, Seidel and Langer2020).
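The step from a molecule to the graph a GNN consumes looks roughly like the sketch below, which assumes the RDKit library is available; the particular atom features chosen here are arbitrary examples.

```python
import numpy as np
from rdkit import Chem   # assumes RDKit is installed

mol = Chem.MolFromSmiles("CCO")                 # ethanol as a SMILES string
n = mol.GetNumAtoms()

adjacency = np.zeros((n, n))
for bond in mol.GetBonds():                     # chemical bonds become graph edges
    i, j = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
    adjacency[i, j] = adjacency[j, i] = 1

# Simple per-atom (node) features: atomic number and number of bonded neighbours.
node_features = np.array([[atom.GetAtomicNum(), atom.GetDegree()]
                          for atom in mol.GetAtoms()])

print(adjacency)
print(node_features)   # these arrays are what a GNN would take as input
```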
These AI-based prediction models reduce the reliance on trial-and-error methods, allowing chemists to focus on the most promising compounds. This accelerates the discovery of new materials, pharmaceuticals, and chemicals with specific functionalities.
10.4.3 Automated Synthesis and Experimentation
Building on the advancements in automated data collection and predictive modeling, AI-powered robots are transforming the way chemists approach experimental tasks. These robots are designed to free chemists from repetitive and laborious experimental activities across different scenarios, including materials handling, synthesis, and characterization (Coley et al., Reference Coley, Thomas, Lummiss, Jaworski, Breen, Schultz and Jensen2019, Reference Coley, Eyke and Jensen2020).
For example, solubility screening is a crucial step in determining whether molecular compounds dissolve in a particular solvent. Traditionally, this process is time-consuming and labor-intensive, requiring periodic measurements throughout the experiment. Inspired by how human chemists visually assess solutions, a CNN can be employed to automate this process (Pizzuto et al., Reference Pizzuto, De Berardinis, Longley, Fakhruldeen and Cooper2022). An autonomous robot takes photos of the samples of interest, and the CNN segments the sample in the images to determine whether it has fully dissolved in the solvent. This AI-driven approach not only automates the solubility screening process but also ensures consistent and accurate assessments.
AI algorithms are also used to optimize experimental configurations during autonomous experimentation. The outcome of a chemical experiment depends on various variables, including time, temperature, and atmosphere. As the number of variables increases, the experimental complexity scales exponentially. To efficiently search for the optimal configuration, a regression model is often fitted to historical experimental data (Burger et al., Reference Burger, Maffettone, Gusev, Aitchison, Bai, Wang and Cooper2020). Combined with acquisition functions, this model predicts which values of the variables are most promising to try next, narrowing the search space to configurations that are most likely to yield successful outcomes. This approach accelerates the identification of optimal experimental conditions, reducing the time and resources needed for experimentation.
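A bare-bones version of this search strategy is sketched below for a single experimental variable, with made-up past measurements and a simple upper-confidence acquisition rule; production systems handle many variables and more sophisticated acquisition functions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Conditions already tried (e.g., reaction temperatures) and their measured yields.
past_x = np.array([[20.0], [40.0], [60.0], [80.0]])
past_y = np.array([0.12, 0.35, 0.55, 0.30])

# Surrogate regression model fitted to the historical experiments.
model = GaussianProcessRegressor().fit(past_x, past_y)

# Score candidate conditions: favor high predicted yield plus high uncertainty.
candidates = np.linspace(10, 100, 91).reshape(-1, 1)
mean, std = model.predict(candidates, return_std=True)
next_x = candidates[np.argmax(mean + std)]       # condition to try in the next experiment
print(next_x)
```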
Beyond individual tasks, integrated AI systems can manage entire materials discovery workflows. Traditionally, this process involves slow, laborious, iterative cycles of design, synthesis, testing, and analysis. An AI-driven robotic system can revolutionize this approach by starting with the generation of candidate materials through screening a vast array of possibilities for desirable properties. It then uses computer-aided synthesis planning to propose reaction pathways and finally synthesizes and characterizes the most promising candidates using a chemical robot assistant (Koscher et al., Reference Koscher, Canty, McDonald, Greenman, McGill, Bilodeau and Jensen2023). This integration streamlines the discovery process, enabling faster and more efficient development of new materials.
10.4.4 Summary
In this section, we briefly discuss the transformative trends in collaboration between humans and AI for chemical research, specifically in data collection, predictive modeling, and automated experimentation. The integration of AI enhances accuracy and efficiency, reduces the time and effort required by human researchers, and enables rapid synthesis and analysis of new materials.
Current AI technology still requires meticulous guidance from human experts to accommodate specific data formats, molecular mechanisms, and experimental specifications. However, the future of AI in chemistry is bright, with potential for even greater breakthroughs as AI technologies continue to evolve. As AI advances, its role in chemistry will undoubtedly expand, driving further innovation and enabling scientists to tackle increasingly complex challenges.
10.5 Human–AI Collaboration in Life Science
Life science encompasses a broad range of scientific disciplines focused on the study of living organisms, from the molecular and cellular levels to entire ecosystems. It includes fields such as biology, genetics, biochemistry, pharmacology, and medicine, all of which aim to understand the complex processes that govern life. The life sciences are integral to advancing our knowledge of health, disease, and the mechanisms of evolution, providing the foundation for innovations in healthcare, disease treatment, and enhancing the quality of life.
Traditionally, research in the life sciences has relied heavily on empirical methods, where scientists conduct experiments to test hypotheses derived from observations. This approach, often described as the “hypothesis-driven” paradigm, involves formulating a question, designing experiments, collecting data, and analyzing the results to draw conclusions. While this method has led to significant discoveries, it is often limited by the sheer complexity of biological systems, the scale of data, and the time-intensive nature of experimental work. Moreover, the vast amount of data generated in modern life sciences, particularly with the advent of high-throughput technologies like genomics and proteomics, has outpaced the capacity of traditional analytical methods (Holzinger et al., Reference Holzinger, Keiblinger, Holub, Zatloukal and Müller2023).
These challenges make life sciences well-suited to the transformative power of AI. With its ability to process vast datasets, detect patterns, and make predictions, AI provides a powerful complement to traditional research methods. By integrating AI into their work, scientists can not only accelerate the pace of discovery but also explore new and complex areas that were previously inaccessible. Given the extensive body of work on AI applications in life sciences, this section highlights key examples that showcase AI’s strengths in pattern recognition and predictive modeling.
10.5.1 Sequencing and Analyzing Genetic Information
The advent of genomic sequencing has revolutionized our understanding of biology, unlocking the code of life written in DNA. However, the sheer volume and complexity of genetic data present significant challenges. Sequencing technologies, like next-generation sequencing (NGS), generate massive datasets that require sophisticated analysis to extract meaningful insights. This is where AI steps in, offering powerful tools to decode genetic information, identify genetic variants, and understand their implications for health and disease (Dias & Torkamani, Reference Dias and Torkamani2019).
One of the critical challenges in genomic sequencing is to ensure the accuracy of large-scale DNA sequence data. Errors can occur during the sequencing process, leading to incorrect base calls, which can skew downstream analyses. AI algorithms, particularly DL models, have been developed to improve the accuracy of sequencing reads. By training on large datasets of correctly sequenced DNA, these AI models can learn to distinguish between true genetic variants and sequencing errors, enhancing the reliability of the data produced.
For instance, DeepVariant, developed by Google, is a deep learning tool that has been widely recognized for its ability to call genetic variants with higher accuracy than traditional methods (Yun et al., Reference Yun, Li, Chang, Lin, Carroll and McLean2020). It uses CNNs, similar to those used in image recognition, to analyze raw sequencing data and identify variants, reducing the error rate and improving the quality of genomic data.
Once the raw DNA sequence is obtained, the next step is variant calling – identifying differences between the sequenced genome and a reference genome. These variants can range from single nucleotide polymorphisms (SNPs) to larger structural variations. AI plays a pivotal role in this process by automating the detection and classification of genetic variants.
AI-driven tools like GATK (Genome Analysis Toolkit) use machine learning algorithms to improve the accuracy and speed of variant calling (Lin et al., Reference Lin, Chang, Hsu, Hung, Chien, Hwu and Lee2022). These tools can handle a vast amount of data generated by NGS and identify variants with high precision. Moreover, AI algorithms can prioritize variants based on their potential impact on gene function, guiding researchers toward the most biologically relevant differences.
Beyond variant calling, AI is also essential in interpreting the functional implications of these genetic variants. Machine learning models can analyze vast databases of known gene functions, protein structures, and clinical data to predict how a particular variant might affect an individual’s health. For example, AI tools can predict whether a variant is likely to be pathogenic, helping clinicians make informed decisions about diagnosis and treatment.
A groundbreaking example is the collaboration between the Broad Institute and Google, where AI was used to improve the accuracy and efficiency of genomic data processing with the Genome Analysis Toolkit. The partnership led to the development of deep learning models that could analyze terabytes of sequencing data in a fraction of the time required by traditional methods, paving the way for faster and more accurate genomic studies.
Additionally, AI has been instrumental in large-scale projects such as the UK Biobank (Bycroft et al., Reference Bycroft, Freeman, Petkova, Band, Elliott, Sharp and Marchini2018), where AI-driven analyses are helping researchers understand the genetic determinants of health and disease in a population of over 500,000 participants. The insights gained from such projects are expected to lead to new diagnostics, treatments, and preventive strategies tailored to individual genetic profiles.
10.5.2 Drug Discovery and Development
The process of drug discovery and development has traditionally been a lengthy, expensive, and complex journey, often taking over a decade and costing billions of dollars to bring a new drug to market. This journey involves identifying potential drug candidates, testing their safety and efficacy, and navigating through a rigorous regulatory approval process. The challenges of this process are compounded by the high rate of failure, with many promising compounds falling short during clinical trials. However, the advent of AI has begun to revolutionize this field, offering new tools and approaches that significantly enhance the efficiency and effectiveness of drug discovery (Mak et al., 2023).
One of the first steps in drug discovery is to identify a biological target – typically a protein or gene associated with a disease – that a drug can modulate. AI technologies have proven to be invaluable in this phase (Pun et al., Reference Pun, Ozerov and Zhavoronkov2023). By analyzing large datasets from genomics, proteomics, and other omics technologies, AI can identify potential drug targets that may not be obvious through traditional methods. For example, AI algorithms can sift through a vast amount of genetic data to pinpoint mutations or expressions linked to specific diseases, helping researchers identify new targets for drug development.
Once a target has been identified, the next step is to find chemical compounds that can interact with it effectively. Traditionally, this involved screening large libraries of compounds in the lab – a time-consuming and costly process (Han et al., Reference Han, Yoon, Kim, Lee and Lee2023). AI has dramatically accelerated this process through virtual screening, where machine learning models predict how different compounds will interact with the target. These models can rapidly filter out compounds that are unlikely to be effective, allowing scientists to focus on the most promising candidates.
In addition to screening, AI is also used in lead optimization, where the chemical properties of a compound are refined to improve its efficacy, reduce toxicity, and enhance its pharmacokinetic properties. AI-driven models can predict how modifications to a compound’s structure might impact its behavior in the body, helping scientists design more effective drugs (Vora et al., Reference Vora, Gholap, Jetha, Thakur, Solanki and Chavda2023). For instance, companies like Insilico Medicine and Atomwise use AI to predict the biological activity of molecules, optimizing them for better performance before they even reach the laboratory.
The success of a drug in clinical trials is often the make-or-break point of drug development (Urbina et al., Reference Urbina, Lentzos, Invernizzi and Ekins2022). AI is increasingly being used to improve the design and execution of these trials. Predictive modeling, powered by AI, can analyze patient data to identify which populations are most likely to respond to a new treatment, enabling more targeted and efficient trials. This approach not only increases the likelihood of success but also reduces costs by minimizing the number of participants needed and shortening the trial duration.
Moreover, AI can monitor and analyze data in real-time during clinical trials, identifying potential issues or trends that might otherwise go unnoticed. This capability allows for adaptive trial designs, where protocols can be adjusted based on interim results, further increasing the chances of success.
A striking example of AI for drug discovery is the development of IBM Watson for Drug Discovery, a platform that uses AI to analyze a vast amount of scientific literature and data (Visan & Negut, Reference Visan and Negut2024). In one notable instance, Watson helped researchers at Barrow Neurological Institute identify five new genes linked to amyotrophic lateral sclerosis (ALS), a breakthrough that could lead to new treatment avenues. Similarly, the British AI company BenevolentAI used its technology to identify an existing drug, baricitinib, as a potential treatment for COVID-19, which subsequently received emergency use authorization during the pandemic.
10.5.3 Summary
The integration of AI into life sciences has revolutionized the way research is conducted, enhancing both the speed and accuracy of scientific discoveries. By integrating AI into the research process, scientists can enhance their ability to uncover new insights, accelerate discoveries, and address questions that were previously too complex to tackle. AI not only augments human expertise but also opens up new avenues for exploring the intricacies of life, making it an invaluable partner in the pursuit of scientific discovery in the life sciences.
10.6 Summary
AI has undeniably become a transformative force in scientific discovery, fundamentally changing how researchers approach problems, analyze data, and generate insights. This chapter has explored the dynamic partnership between human researchers and AI systems, emphasizing the powerful synergy that drives scientific progress across various disciplines. In this summary, we delve deeper into this collaboration by highlighting the crucial role of human guidance in applying AI technologies, examining AI’s influence on the core components of scientific discovery, and addressing the ethical considerations that emerge from this evolving relationship.
10.6.1 Impacts of AI on Scientific Discovery
Scientific discovery generally unfolds in three key steps: (i) hypothesis generation and selection, (ii) the design of methods to validate or disprove the hypothesis, and (iii) the derivation of insights from interpreting the results. AI influences each of these components to varying degrees.
The generation of hypotheses is a critical first step in scientific inquiry, traditionally driven by human intuition, experience, and creativity. Such intuition is typically rooted in a deep understanding of a specific scientific domain. AI, with its unparalleled ability to process vast amounts of information, can be harnessed to explore extensive hypothesis spaces, pinpointing those hypotheses that align with existing knowledge. However, it is widely recognized that while AI excels at synthesizing and organizing existing knowledge, it still struggles to generate truly novel insights.
Once a hypothesis is formed, the next step involves developing methods to test it. This task typically falls to human experts who specialize in the relevant scientific field. AI systems can enhance this process by optimizing experimental designs, simulating various scenarios, and reducing reliance on trial and error. This not only minimizes resource use but also accelerates the path to discovery.
The final stage of scientific discovery involves interpreting results and drawing meaningful insights. This is where AI truly shines in its capacity to validate or disprove hypotheses. AI excels at processing and analyzing large datasets, identifying trends, anomalies, and subtle patterns that might elude human analysts. For example, AI can analyze complex genetic data to uncover interactions that contribute to specific traits or diseases, offering insights that could lead to new therapeutic approaches.
10.6.2 Human Guidance to AI for Scientific Discovery
While AI is increasingly embedded in every facet of scientific research, significantly reducing human labor, the success of AI in this domain still hinges on crucial human guidance. For AI to reach its full potential in scientific discovery, expert human intervention is indispensable. This guidance is vital for navigating the complexities of three key areas: (i) acquiring and preparing datasets to train AI models, (ii) designing AI models that are precisely tailored to the chosen datasets, and (iii) developing effective training schemes that enable AI models to learn meaningful patterns from the data. Each of these elements demands a deep understanding of both the scientific domain and the unique capabilities of AI technology.
Dataset Acquisition and Preparation
The success of any AI model hinges on the quality and relevance of the dataset used for training. Human expertise is essential in acquiring datasets that accurately represent the problem domain and ensuring that these datasets are comprehensive, unbiased, and appropriately curated. In scientific discovery, this often involves collecting data from various sources, such as literature, experimental results, and numerical simulations, to create a robust and representative dataset.
Although AI technology can automate many aspects of this process, such as using NLP to extract chemical equations from literature or deploying AI-powered robots to conduct experiments, human guidance remains indispensable. Expertise is essential in addressing issues such as missing values, noise, and outliers, which require a deep understanding of the scientific context to ensure the data fed into the AI model is both accurate and meaningful. Without meticulous human oversight at this stage, AI models risk learning misleading or irrelevant patterns from flawed data, potentially undermining the integrity of the scientific discovery process.
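A small pandas sketch of this kind of expert-guided cleaning is shown below; the column names, imputation strategy, and interquartile-range outlier rule are deliberate assumptions that a domain expert would need to justify for a real dataset.

```python
# Minimal sketch of expert-guided data preparation with pandas.
# Assumes a hypothetical table of experimental measurements; the imputation
# strategy and outlier thresholds below are choices that require scientific
# judgment rather than values that can be set automatically.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "temperature_K": [298.0, 301.5, np.nan, 299.2, 450.0],   # 450 K looks suspicious
    "yield_percent": [71.2, np.nan, 68.9, 70.4, 69.8],
})

# Impute missing values with the column median (a deliberate, documented choice).
df = df.fillna(df.median(numeric_only=True))

# Flag outliers outside 1.5 * IQR; a domain expert decides whether to drop them.
q1, q3 = df["temperature_K"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = (df["temperature_K"] < q1 - 1.5 * iqr) | (df["temperature_K"] > q3 + 1.5 * iqr)
print(df[outliers])                  # review flagged rows before removal
df_clean = df[~outliers]
```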
Model Design and Customization
The design of AI models must be meticulously tailored to the dataset’s specific characteristics and the scientific questions being addressed. Different data types require distinct model architectures: for example, image data typically benefits from CNNs, sequential data like genomic sequences are often best processed with RNNs or transformers, and molecular structures are most effectively analyzed using GNNs.
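The following PyTorch sketch makes this mapping concrete with two deliberately tiny models, one for image-like inputs and one for sequences; graph-structured data would additionally require a GNN library such as PyTorch Geometric, which is omitted here to keep the example self-contained. The architectures are illustrative stand-ins, not recommendations.

```python
# Minimal sketch of matching model architecture to data modality (PyTorch).
import torch
import torch.nn as nn

class TinyCNN(nn.Module):                 # image-like data: (batch, 1, H, W)
    def __init__(self, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, n_classes))
    def forward(self, x):
        return self.net(x)

class TinyRNN(nn.Module):                 # sequence data: (batch, length, features)
    def __init__(self, n_features=4, n_classes=2):
        super().__init__()
        self.rnn = nn.GRU(n_features, 16, batch_first=True)
        self.head = nn.Linear(16, n_classes)
    def forward(self, x):
        _, h = self.rnn(x)
        return self.head(h[-1])

# Choose the architecture from the data modality.
models = {"image": TinyCNN(), "sequence": TinyRNN()}
print(models["image"](torch.randn(2, 1, 28, 28)).shape)      # torch.Size([2, 2])
print(models["sequence"](torch.randn(2, 50, 4)).shape)       # torch.Size([2, 2])
```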
While these guidelines provide a foundation for selecting AI models, human expertise is essential in crafting models that not only perform well on a given dataset but also adhere to the underlying scientific principles. The goal is to create a model that can effectively capture the nuances of the data without overfitting or underfitting, which requires a balance between model complexity and generalization ability. Human judgment is also crucial in ensuring that the model’s outputs are interpretable and aligned with scientific goals, as the ability to explain and validate predictions is fundamental to advancing knowledge and making credible discoveries.
Training Scheme Development
The choice of a training scheme is closely tied to the nature of the task and the annotation status of the dataset. For tasks involving the identification of patterns or the prediction of continuous values, classification and regression training schemes are typically employed. These approaches often require a substantial volume of well-annotated data samples. While manual annotation is possible, it can be extremely labor-intensive. A more efficient alternative is to generate data labels from experimental results or numerical simulations guided by expert-defined rules.
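The sketch below illustrates such a scheme: a hypothetical analytic function stands in for an expensive expert-defined simulator, its outputs serve as regression labels, and a small neural network is trained as a fast surrogate.

```python
# Minimal sketch of a supervised (regression) training scheme in which labels
# are generated by a numerical simulation rather than manual annotation.
# The "simulate" function is a hypothetical analytic stand-in for an
# expensive simulator defined by domain experts.
import numpy as np
from sklearn.neural_network import MLPRegressor

def simulate(params):
    """Stand-in simulator: maps design parameters to a measured quantity."""
    return np.sin(params[:, 0]) + 0.5 * params[:, 1] ** 2

rng = np.random.default_rng(4)
designs = rng.uniform(-2, 2, size=(2000, 2))       # sampled input designs
labels = simulate(designs)                         # the simulation provides labels

surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                         random_state=0).fit(designs, labels)

# The trained surrogate now predicts simulation outcomes almost instantly.
test = rng.uniform(-2, 2, size=(5, 2))
print(surrogate.predict(test))
print(simulate(test))
```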
On the other hand, when the goal is to optimize a problem with limited labeled data but clear criteria for evaluating outcomes, reinforcement learning and genetic algorithms are often the preferred methods. These approaches excel in scenarios where the model learns by interacting with the environment and progressively improving based on feedback or evolving solutions to find the best outcome.
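A minimal genetic-algorithm loop of this kind is sketched below; the fitness function is a hypothetical stand-in for an expert-defined evaluation criterion, such as a predicted binding affinity or a simulated material property.

```python
# Minimal genetic-algorithm sketch for optimization with a clear evaluation
# criterion but little or no labeled data. The "fitness" function is a
# hypothetical stand-in for an expert-defined score.
import numpy as np

rng = np.random.default_rng(5)

def fitness(pop):
    # Higher is better; the optimum lies at x = (1, 1, ..., 1).
    return -np.sum((pop - 1.0) ** 2, axis=1)

pop = rng.normal(size=(50, 10))                    # initial candidate solutions
for generation in range(100):
    scores = fitness(pop)
    parents = pop[np.argsort(scores)[-20:]]        # select the fittest candidates
    # Crossover: average random pairs of parents; mutation: add small noise.
    idx_a = rng.integers(0, len(parents), size=50)
    idx_b = rng.integers(0, len(parents), size=50)
    pop = (parents[idx_a] + parents[idx_b]) / 2 + rng.normal(scale=0.1, size=(50, 10))

best = pop[np.argmax(fitness(pop))]
print("Best solution:", np.round(best, 2), "score:", fitness(best[None])[0])
```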
Training AI models to learn patterns from data is a complex process that demands strategic human oversight. Human intervention is crucial for embedding domain knowledge into the training process, such as incorporating scientific constraints or directing the model’s attention to key data features. This not only enhances the model’s performance but also ensures that the AI system yields scientifically valid and actionable insights, paving the way for meaningful discoveries.
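One common way to embed such knowledge, sketched below in PyTorch, is to add a penalty term to the training loss that discourages physically implausible outputs; the specific non-negativity constraint (for example, a predicted concentration cannot be negative) and its weight are illustrative assumptions.

```python
# Minimal sketch of embedding domain knowledge into training (PyTorch):
# the total loss combines a data-fit term with a penalty that discourages
# physically implausible outputs (here, negative predictions).
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

X = torch.randn(256, 3)
y = X.sum(dim=1, keepdim=True).clamp(min=0.0)      # synthetic non-negative target

lam = 10.0                                         # weight of the constraint term
for step in range(500):
    pred = model(X)
    data_loss = nn.functional.mse_loss(pred, y)
    constraint_loss = torch.relu(-pred).mean()     # penalize negative predictions
    loss = data_loss + lam * constraint_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"data loss {data_loss.item():.4f}, "
      f"constraint violation {constraint_loss.item():.6f}")
```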
10.6.3 Ethical Considerations in Human–AI Collaboration for Scientific Discovery
Human–AI collaboration in scientific discovery holds immense potential, but it also raises significant ethical concerns that must be addressed to ensure that this partnership benefits society without unintended harm. One of the most pressing issues is the potential for bias in AI systems. AI models are trained on existing data and, if that data reflects historical biases – whether related to gender, race, or other factors – the AI could perpetuate and even amplify these biases in scientific research. This could lead to skewed results, misrepresentation of certain populations, or biased scientific conclusions, ultimately affecting the fairness and credibility of scientific discoveries.
Another ethical concern is the transparency and interpretability of AI-driven research. AI systems, particularly complex models like deep neural networks, often operate as “black boxes,” making decisions that are difficult for even experts to fully understand or explain. This lack of transparency can pose significant risks in scientific discovery, where understanding the rationale behind a finding is crucial for its validation and acceptance. If scientists rely too heavily on AI without demanding interpretability, there is a danger of accepting results that cannot be independently verified, leading to a potential erosion of trust in scientific outcomes.
Moreover, the power dynamics in human–AI collaboration also raise ethical questions. As AI systems become more integrated into scientific research, there is a risk that the role of human scientists may be diminished or that decision-making power may shift disproportionately toward those who control the AI systems. This could create inequalities within the scientific community, where certain groups or individuals have more influence over research outcomes due to their access to or control over AI technologies. Ensuring equitable access to AI tools and maintaining a balance in human–AI collaboration is essential to prevent the concentration of power and to foster inclusive scientific progress.
The ethical implications of AI-driven discoveries also extend to the potential consequences of these discoveries themselves. AI has the capability to accelerate the pace of research, leading to breakthroughs that could have profound societal impacts – both positive and negative. For instance, AI could uncover new drugs or therapies at an unprecedented speed, but it could also be used to develop harmful technologies or exacerbate existing inequalities. The scientific community must therefore engage in ongoing ethical reflection and dialogue to consider the broader implications of their work and to ensure that AI-driven discoveries are aligned with societal values and contribute to the common good.
Finally, the integration of AI into scientific research raises questions about accountability. When AI systems are involved in generating hypotheses, designing experiments, or interpreting data, it becomes more challenging to determine who is responsible for the outcomes, particularly if those outcomes are harmful or erroneous. Establishing clear guidelines for accountability in human–AI collaborations is crucial to address this issue. This includes not only holding developers and users of AI systems accountable but also ensuring that there are mechanisms in place to correct or mitigate any negative impacts resulting from AI-driven research.
In conclusion, while the collaboration between humans and AI in scientific discovery holds great promise, it also presents a complex landscape of ethical challenges. Addressing these issues requires a proactive and multidisciplinary approach, involving ethicists, scientists, policymakers, and the public in ongoing discussions to guide the responsible development and use of AI in science. By doing so, we can harness the full potential of AI to advance knowledge while safeguarding the ethical foundations of scientific inquiry.