2.1 Introduction
Artificial intelligence (AI) has progressively gained autonomy and anthropomorphic traits as it advances toward “intelligence.” For instance, AI can assist humans in decision-making by drawing on gathered data, and intelligent voice assistants such as Siri, Alexa, and Cortana possess human-like intonation and can converse with humans, thereby alleviating feelings of loneliness. The integration of AI technology is transforming the field of human–computer interaction, and human–AI interaction and collaboration are emerging as its next direction of development.
The emergence of intelligent manufacturing, intelligent transportation, intelligent medical care, intelligent education, and other scenarios involving human–AI interaction raises the question of whether people can establish effective interactive and collaborative relationships with AI. Humans, as the users of and participants in intelligent systems, communicate their requirements to those systems, and human intention, willingness, and feedback significantly influence the systems’ behavior and decision-making. Hence, it is imperative to examine the future direction of AI from a human-centric standpoint. This chapter seeks to provide a comprehensive understanding of the field of human–AI interaction and collaboration by examining user interaction. It takes a multidisciplinary viewpoint, incorporating insights from information science, computer science, psychology, and other relevant disciplines, and it investigates potential future research directions in this field.
2.2 From Human–Computer Interaction to Human–AI Interaction and Collaboration
2.2.1 Origin: Human–Computer Interaction
Human–AI interaction is a manifestation of the progress of human–computer interaction in the era of artificial intelligence. To begin with, it is important to understand the meaning and origins of human–computer interaction. Human–computer interaction (HCI) is an academic discipline that investigates the various forms of interaction between humans and computers (Pargman et al., 2019). It studies how individuals design, develop, and use interactive computer systems, as well as the potential impact of computers on individuals, organizations, and societies (Myers et al., 1996). It is a multidisciplinary field that draws techniques and approaches from computer science, information science, psychology, sociology, design, and other disciplines (Jiang et al., 2024; S. Kim, 1995). HCI research frequently examines how to help users complete activities, improve their access to information, and interact more conveniently (Myers et al., 1996). Examples include optimizing input and output devices to enhance interaction, transferring information efficiently, and controlling and mastering computer behavior, as well as designing, testing, and evaluating user interface tools and formulating the principles developers should follow when creating them. HCI research has been described as “a field of truly constructive problem solving” (Oulasvirta & Hornbæk, 2016) because it explores multiple fields of practice through interdisciplinary collaboration to solve the problems humans face when interacting with different types of computer systems.
Human–computer interaction emerged alongside the development of computers, tracing its origins to the establishment of the ACM Special Interest Group on Social and Behavioral Computing (SIGSOC) in 1969 (Borman, 1996). SIGSOC initially focused on the advancement of computing from a social science standpoint; as the research developed, it progressively redirected its emphasis toward user requirements and behavioral traits. In 1982, SIGSOC was renamed the ACM Special Interest Group on Computer–Human Interaction (SIGCHI), and the first SIGCHI conference was held in Boston in 1983. The conference was later renamed the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI) and is now recognized as the foremost meeting in the field of human–computer interaction. HCI subsequently entered a golden period of advancement.
Human–computer interaction is intricately connected to the field of human factors and may be described as “human–tool interaction” (Grudin, 2018). Some researchers (Harrison et al., 2011; Sun et al., 2020) have proposed that human–computer interaction research can be divided into three paradigms. The first paradigm, grounded in human factors, focuses on the relationship between physical systems and humans. The second paradigm, grounded in information processing, studies the cognitive processes involved in human–computer information exchange. The third paradigm focuses on the situational factors of human–computer interaction: intelligent technology enhances the perceptual and cognitive abilities of machines, which can actively initiate interactive behaviors by sensing user needs, emphasizing the integration of humans, machines, and the environment. Because it incorporates artificial intelligence technology, the third paradigm introduces new difficulties and possibilities to the field, making it complex and uncertain. “Human–AI interaction” has thus evolved into a state-of-the-art topic in HCI (Amershi et al., 2019).
2.2.2 Development: Human–AI Interaction
The field known as “human–AI interaction” researches and develops solutions for improving communication and collaboration between humans and AI systems. Its goal is to develop AI systems that are easy to use, reliable, morally sound, and beneficial to people. The history of human–AI interaction begins with the advent of AI. During the 1950s and 1960s, researchers created the first AI systems capable of executing complex tasks such as playing chess, proving theorems, and translating languages; at that time, individuals could engage with AI systems by, for example, playing checkers against them. During the 1970s and 1980s, scientists created AI systems capable of emulating human cognitive processes and supporting more authentic interactions, including speech recognition, computer vision, and natural language understanding. Nevertheless, these systems encountered difficulties with reliability and explainability and were frequently criticized as black boxes. Since the 1990s, deep learning has made significant progress: AI has exhibited exceptional proficiency in activities such as image processing, natural language processing, and game playing, and has even outperformed humans in certain domains. New patterns of interaction between humans and AI have also formed, including recommendation systems, voice assistants, and conversational agents. However, AI also raises concerns regarding privacy, security, morality, and ethics.
Within the field of human–AI interaction, researchers continually investigate new approaches to enhance the capabilities of AI systems. For instance, they examine how to make AI systems more transparent and explainable so that people can more easily comprehend how a system operates and what its outputs mean. Researchers also take users’ needs and feedback into account, continually investigating how to develop more responsive AI systems, and they consider how to develop suitable indicators to evaluate the influence of AI systems on users, society, and the environment. Human–AI interaction therefore involves multiple disciplines, including computer science, psychology, sociology, design, and ethics, and has a distinctly interdisciplinary character. Researchers use methods such as user research, prototyping, and evaluation to create user-centered AI systems, with the aim of amplifying and enhancing human capabilities.
The unique characteristics of AI systems, including situational awareness, adaptive learning, autonomous decision-making, and active interaction, have introduced new challenges to the field of human–AI interaction. These include issues such as the allocation of human–computer initiative, situational adaptation, and active response modes, which are not typically encountered in traditional HCI. When AI is introduced into social contexts, AI that assumes a social role should possess anthropomorphic characteristics such as language proficiency, emotion detection, and cognitive abilities (Spatola et al., 2022), which presents further challenges for the study of human–AI interaction.
2.2.3 New Motivation: Human–AI Collaboration
Human–AI collaboration occurs when humans and AI each apply their own strengths to jointly complete a task; the purpose is to establish a synergistic relationship between the two that contributes to the completion of tasks across various fields (Cañas, 2022). During the collaborative process, humans and AI exchange information, comprehend each other’s objectives through interaction, and coordinate tasks and progress through appropriate communication, finally accomplishing tasks proficiently. Studies indicate that collaboration between humans and AI yields superior outcomes compared to situations in which either works alone (Kahn et al., 2020). AI thrives in the domain of computing, rapidly processing data and tasks using predetermined algorithms and rules. Despite significant developments in machine learning, deep learning, and neural networks, however, AI still falls short of humans in cognitive capacity, domain knowledge, creative thinking, and moral judgment. Human–AI collaboration therefore integrates human perception with the computational capabilities of AI to deliver novel solutions to complex challenges.
The aim of human–AI collaboration is to enhance the level and efficacy of overall task accomplishment by assembling a team consisting of humans and AI. Furthermore, it is imperative to take into account the human–AI interaction experience and situational awareness in specific scenarios (Kitchin & Baber, 2016), as well as the division of the AI system’s roles in the workflow. Given that AI systems serve as collaborators, it is essential to address the matter of trust between humans and AI and to investigate the mental models and emotional changes humans experience throughout the collaboration process. Another significant problem that arises during task execution is how to combine the practical experience and expertise of human workers with the results of computations performed by AI. When both humans and AI generate information, the next step should be to transfer that information effectively. In addition, human–AI collaboration also requires the development of effective interactive interfaces and approaches.
The study of human–AI interaction prioritizes improving the interaction methods and interface design of AI systems, with a strong emphasis on enhancing the user experience. In contrast, human–AI collaboration encompasses a broader scope of research, focusing on using the strengths of each party to accomplish work tasks effectively. In practical situations, AI can be utilized in collaborative decision-making environments, where it works alongside humans to offer insights, evaluate data, and participate in decision-making processes (Deshpande et al., 2020; D. Wang et al., 2019). In human-centered collaborative automated driving, AI can help human drivers with navigation and traffic management by evaluating sensor data (Xing et al., 2021). In artistic production, the advancement of generative AI allows designers to employ AI models to generate stimulating samples that enhance their creativity (Jeon et al., 2021). Overall, the relationship and collaboration between humans and AI should prioritize humans, empower human workers to take control, and utilize AI to its fullest potential without excessive dependence (Buçinca et al., 2021).
2.3 A Framework of Human–AI Interaction and Collaboration
With its growing autonomy, AI assumes a novel role in systems of human–computer interaction. AI technology has transformed machines from simple assistants into possible collaborators in human work (Seeber et al., 2020). Researchers have observed this transformation and have investigated research areas such as human–AI interaction (Amershi et al., 2019), human–AI teamwork (Seeber et al., 2020), and human–AI symbiosis (Nagao, 2019). As part of this framework research, Xu and Gao (2024) proposed a conceptual framework of joint cognitive systems intended to reveal the interaction within human–AI collaborative teams, drawing on Erik Hollnagel and David Woods’s joint cognitive systems theory, Mica Endsley’s situation awareness cognitive engineering theory, and the agent theory widely used in the AI community (Xu & Gao, 2024). The framework defines the human–AI team as a joint cognitive system consisting of two cognitive agents; through collaboration and interaction, the team achieves complementarity between human biological intelligence and computer intelligence.
Interaction is essential to the topic of human–AI interaction. Hornbæk et al. (2019) conducted a comprehensive analysis of the articles published at the CHI conference over the preceding thirty-five years, focusing on key issues in the field of HCI, especially the features of interaction style and interaction quality. Interaction style generally refers to factors such as the type of input/output, the technology utilized during the interaction, or the media through which the interaction occurs; mobile interaction, touch-screen interaction, cross-device interaction, and virtual reality interaction are examples of interaction styles frequently characterized by the way they take place. Interaction quality is connected to individuals’ emotions and thoughts and is typically expressed in the user experience, including the utility, usability, efficiency, and other aspects of the interaction.
Research in human–AI interaction aims to investigate the relationships and behavioral patterns that exist between humans and AI. Based on prior studies, we have developed a research framework for human–AI interaction and collaboration (Figure 2.1). As in HCI research in the field of artificial intelligence, humans and AI are considered the two primary subjects of research, connected through interaction and collaboration across a variety of scenarios and tasks. For human–AI collaboration research, we expand the traditional human–computer interaction framework and, following Hornbæk et al. (2019), separate it into two aspects: interaction quality and interaction mode. Interaction quality is most closely linked to the user’s interactive experience. Driven by the idea of “human-centered AI” (GDPi, 2018; Xu, 2019), human perception of AI occupies an important position; the interaction quality aspect therefore includes various human-related subjects, such as users’ mental models of human–AI collaborative teams, the explainability of AI interaction behaviors and outcomes, human trust in AI team members, and human emotional awareness of AI. The interaction mode is associated with the technology of AI interaction and with user behavior during the interaction process. The attributes of AI, including situational awareness, autonomous decision-making, and adaptive learning, introduce new dimensions to the mode of interaction, encompassing the allocation of roles for AI in interactions, the methods of interacting with AI, the boundaries of human–AI interaction, and interaction design in innovative interactive contexts.

Figure 2.1 The framework of Human–AI Interaction and Collaboration (HAII&C)
Based on this Human–AI Interaction and Collaboration (HAII&C) framework, this book reviews the current state of research on related issues and points out future development directions. The literature review is a qualitative study that examines the current literature and highlights research topics. While it may not be exhaustive, it should suffice to identify the main challenges and prospects in the domain. These findings call for increased attention from researchers to human–AI interaction and collaboration and offer strategic suggestions for future initiatives in the field.
2.4 Interaction Quality in Human–AI Interaction and Collaboration
The quality of user interaction is inextricably linked to the user experience. In contrast to traditional HCI research, which has focused on system usability and ease of use, research on human–AI interaction has emerged as a new area of enquiry in the context of emerging technologies. In situations where an AI engages in a collaborative task as a helper or team member, it is pertinent to ascertain whether humans can accurately assess the AI’s capabilities and work with it efficiently.
The question thus arises as to whether humans are more tolerant of AI performance or whether they set more exacting standards for AI. To what extent do humans trust AI, and to what extent do they trust their human colleagues? How do humans perceive the reliability of the results provided by AI? As AI becomes increasingly involved in human activities, these questions will continue to emerge and drive further discussion among researchers. This discussion encompasses a range of topics, including users’ mental models in human–AI interaction and collaboration, explainable AI, users’ trust in AI, and AI anthropomorphism.
2.4.1 Human Mental Model of AI
The ability to construct an accurate mental model of AI influences how humans use AI (Steyvers & Kumar, 2023). A mental model of AI consists of human beliefs about AI and expectations about interacting with it. In general, a mental model is a simplified perception of the world that humans develop, according to which they process information and make predictions (Barnes, 1944). The more accurate the mental model people construct of an AI, the more likely the AI is to be used correctly (Bansal et al., 2019). If a user constructs the wrong mental model, it can lead to inappropriate trust in and reliance on the AI (Steyvers & Kumar, 2023), creating barriers to human–AI collaboration.
Collaborative work emphasizes shared mental models (SMMs) among human teams (Merry et al., 2021), which require team members to hold consistent perceptions of both the team and the task; such shared models better facilitate effective collaboration. In the context of human–AI interaction, researchers explore what humans perceive when treating AI as a potential collaborator. One line of work introduces cognitive science approaches to the field and examines user perception of AI through the user’s perceptual image of the AI entity (Hwang & Won, 2022). Improving our understanding of how humans perceive their AI teammates is an important foundation for understanding human–AI teams in general: Kelly et al. (2023) propose a framework based on Item Response Theory (IRT) to model users’ perceptions and apply this framework in real-world settings to observe human perceptions of the AI as well as of other teammates.
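To make the IRT approach concrete, psychometrics commonly uses the two-parameter logistic (2PL) model, which expresses the probability that agent i answers item j correctly in terms of a latent ability and the item’s properties. The display below is the generic 2PL model, offered only as an illustrative sketch; the exact specification in Kelly et al. (2023) may differ:

\[
P(X_{ij} = 1 \mid \theta_i, a_j, b_j) = \frac{1}{1 + e^{-a_j(\theta_i - b_j)}},
\]

where \(\theta_i\) is the agent’s latent ability, \(b_j\) the item’s difficulty, and \(a_j\) its discrimination. Fitting \(\theta\) for the AI and for human teammates from responses to the same items places ability estimates on a common scale, which is what allows perceptions of AI teammates to be compared with perceptions of human ones.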
Individuals’ perceptions of phenomena are invariably influenced by a multitude of factors: interactive explanations, affective associations, and cognitive biases have all been hypothesized to shape users’ mental models of AI. Nevertheless, empirical evidence indicates that, in the context of adaptive in-vehicle systems, interactive explanations do not markedly enhance system comprehensibility and fail to optimize users’ mental models compared with text-based explanations (Graefe et al., 2023). The researchers posit that participants may already be cognitively proficient in scenarios involving AI-based driving systems, necessitating investigations across more diverse scenarios. Pataranutaporn et al. (2023) found that users’ mental models of AI influence their perceptions, experiences, and interactions, and that users tend to construct these models from their prior views and expectations; consequently, different users perceive trustworthiness, empathy, and validity differently when confronted with the same AI. During the interaction, if the AI responds to the user’s emotions in a manner that evokes a sense of empathy, the user will respond with a similarly empathetic response. This raises the question for developers of whether it is preferable to conceptualize the AI as an emotional entity or as an algorithmic system devoid of emotional capacity. Human understanding and perception of AI are also frequently distorted by cognitive biases. Rastogi et al. (2022) employed a mathematical approach to model these biases and devised a time-based de-anchoring strategy; their study demonstrated that a time-allocation strategy can effectively mitigate anchoring bias and enhance the efficacy of human–AI collaboration.
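The intuition behind time-based de-anchoring can be conveyed with a simple anchoring-and-adjustment sketch. The weighting form below is our illustrative assumption for exposition, not the actual model of Rastogi et al. (2022):

\[
\hat{y}(t) = \lambda(t)\, y_{\mathrm{AI}} + \bigl(1 - \lambda(t)\bigr)\, y_{\mathrm{H}}, \qquad \lambda'(t) < 0,
\]

where \(y_{\mathrm{AI}}\) is the AI’s recommendation (the anchor), \(y_{\mathrm{H}}\) is the human’s independent estimate, and the anchor weight \(\lambda(t)\) decays with the deliberation time \(t\) allocated to the decision. Allocating more time to the cases where the AI is most likely to be wrong then reduces the errors caused by anchoring.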
The mental model is not a fixed entity; rather, it undergoes gradual transformation through interaction and collaboration between humans and AI. To illustrate, in a team voting process conducted in collaboration with AI, the performance of the AI system significantly affects human team members’ confidence in making decisions collectively with the AI: when the AI performs poorly, that confidence tends to decline rapidly (M. Hu et al., 2025). Cao et al. (2023) demonstrated that, in AI-assisted human decision-making, users were more likely to adhere to the AI’s recommendations when given a longer time frame for decision-making, elucidating the impact of time pressure at different stages of the decision-making process on human cognition. Similarly, accumulated experience alters mental models. Observing people playing games with AI players, Villareale et al. (2022) identified two distinct approaches to the development of mental models: one based on observational learning, whereby players observe the actions of others in the game, and one based on a priori experience, whereby players draw on their existing knowledge and understanding to interpret and make sense of the game. The formation of mental models is thus a complex, dynamic process influenced by many factors. These insights provide valuable guidance for researchers engaged in the field of human–AI interaction and collaboration.
2.4.2 Human-Centered Explainable AI
Explainable artificial intelligence (XAI) has emerged as a prominent research area in recent years, driven primarily by the ongoing discourse surrounding “black box” algorithms and the imperative for transparency. Opaque systems have been shown to cause comprehension problems for users, which in turn impair the efficacy of human–machine collaboration. The concepts of “explainability,” “transparency,” and “interpretability” are often intertwined, and the terms “interpretability” and “transparency” are often used interchangeably. Indeed, both “explainability” and “interpretability” reflect the transparency of a system: “explainability” emphasizes that the logic of a system can be understood by humans from a human perspective (Rosenfeld & Richardson, 2019), whereas “interpretability” focuses on the results from an output perspective. Researchers in the HCI community have placed particular emphasis on explainability, which is more closely related to the user during human–AI collaboration.
Providing more detailed and comprehensive explanations enhances the transparency of AI, which facilitates the identification of AI biases, mitigates their influence on human judgments (T.-Y. Hou et al., 2024), and enables users to discern AI failures (Cabrera et al., 2023). To deepen understanding of explainability, researchers have begun examining user requirements for explanations in diverse contexts. C. Chen and Zheng (2023) demonstrated that consumers have a greater need for explainability in AI recommendations in utilitarian scenarios, such as healthcare. Users are more interested in acquiring genuinely useful information than in the intricacies of a technical system, and they seek to use that information to enhance their own abilities (S. S. Y. Kim et al., 2023). Malandri et al. (2023) went further, grounding their work in user needs to develop a system that incorporates user knowledge and experience into the AI interpreter, thereby enhancing the utility of its explanations. The choice of explanation strategy also presents trade-offs that must be weighed in the context of the specific application scenario. B. Wang et al. (2024) distinguished three explainability strategies: global, comparative, and deductive. Global explainability is more appropriate for tasks requiring significant cognitive effort, whereas comparative explainability is better suited to everyday scenarios involving higher social presence and trust. In scenarios requiring rapid interaction with the AI, system designers should consider reducing the mental effort required of the user by employing global or deductive explainability.
Greater explainability in human–AI interaction is not always better, however. Excessive explanatory information can overload the user (Westphal et al., 2023), and providing explanations may increase the perceived complexity of the task. Ribera and Lapedriza (2019) proposed the concept of “user-centered explainable AI,” which entails delivering an optimal amount of high-quality information pertinent to the user. In one user-centered study, when the explanation provided by the AI contradicted the user’s own decision, the user experienced negative emotions and expressed dissatisfaction with the AI (Ebermann et al., 2023). The degree of alignment between user perception and the provided explanation thus influences the explanation’s efficacy. We propose that AI should be able to engage in dialogue with users, elucidating its reasoning processes and offering explanations tailored to the cognitive capabilities of different users.
2.4.3 Trust between AI and Humans
The current state of AI systems does not yet allow for fully reliable decision support, so trust becomes a significant factor in human–AI interaction. Trust in AI refers to the extent to which humans can rely on the technology when faced with uncertainty during human–AI interaction and collaboration. It is important to note that trust is a double-edged sword: excessive trust can lead users to reduce their workloads by delegating tasks to the AI (Harbarth et al., 2025), which can ultimately make collaborative work less efficient. Moreover, understanding the fundamental principles governing a system can mitigate users’ tendency toward excessive trust and complacency. It is therefore essential to establish a suitable trust relationship between humans and AI, also known as calibrated trust, in order to optimize collaborative performance. The question of human trust in AI remains a topic of ongoing debate. One study found that humans demonstrate a higher level of trust when collaborating with AI than when collaborating with other humans (Jain et al., 2022). Conversely, Georganta and Ulfert (2024) found that humans exhibit diminished perceptions of trustworthiness and emotional interpersonal trust in AI when it is introduced as a new team member, relative to human teammates. These disparate outcomes illustrate that trust between humans and AI is a multifaceted matter, shaped by a convergence of many kinds of factors.
Researchers are also investigating the factors that influence the level of trust individuals place in AI systems, including user education, past experiences, user biases, and perceptions of automation (Asan et al., 2020). K. Hou et al. (2023) developed a model suggesting three main groups of factors: interaction characteristics (such as perceived anthropomorphism, perceived rapport, and perceived enjoyment), environmental characteristics (such as peer influence and facilitating conditions), and personal characteristics (such as self-efficacy). Their findings indicate that the interaction itself is an important factor in the trust established between humans and AI team members. Regarding personal characteristics, Tutul et al. (2023) obtained analogous results, indicating that user openness is positively correlated with trust: the more open and accepting a user is, the more likely he or she is to trust the AI’s decisions. To foster appropriate trust in AI, G. Zhang et al. (2022) proposed a “deception strategy” to calibrate trust, in which users were unaware that they were working with AI. However, the results demonstrated that this approach did not enhance the average performance of teamwork. Future research should therefore continue to explore how to appropriately calibrate human trust in AI and improve the joint performance of humans and AI.
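Calibrated trust can be operationalized from interaction logs. The sketch below is a minimal, hypothetical example (the field names and the two reliance rates are our illustrative assumptions, not a published instrument from the studies cited above): it measures how often a user follows the AI when the AI is right versus when it is wrong. Well-calibrated reliance means a high rate in the first case and a low rate in the second.

```python
# Minimal sketch of measuring reliance calibration from interaction logs.
# Illustrative only: field names and metrics are assumptions, not a
# standard instrument from the literature cited in this chapter.

from dataclasses import dataclass

@dataclass
class Trial:
    ai_advice: str        # what the AI recommended
    user_choice: str      # what the user finally chose
    ground_truth: str     # the correct answer, known after the fact

def reliance_rates(trials: list[Trial]) -> tuple[float, float]:
    """Return (appropriate_reliance, overreliance).

    appropriate_reliance: fraction of trials with a correct AI where
        the user followed the advice (higher is better).
    overreliance: fraction of trials with an incorrect AI where the
        user still followed the advice (lower is better).
    """
    ai_right = [t for t in trials if t.ai_advice == t.ground_truth]
    ai_wrong = [t for t in trials if t.ai_advice != t.ground_truth]
    followed = lambda ts: sum(t.user_choice == t.ai_advice for t in ts)
    appropriate = followed(ai_right) / len(ai_right) if ai_right else 0.0
    over = followed(ai_wrong) / len(ai_wrong) if ai_wrong else 0.0
    return appropriate, over

# Example: a user who follows the AI indiscriminately shows high
# overreliance; calibrated trust would drive the second number down.
trials = [
    Trial("A", "A", "A"), Trial("B", "B", "B"),   # AI right, user follows
    Trial("A", "A", "B"), Trial("C", "C", "A"),   # AI wrong, user follows
]
print(reliance_rates(trials))  # -> (1.0, 1.0): trusting but uncalibrated
```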
2.4.4 Anthropomorphism and Emotional Support
Anthropomorphism can be defined as the tendency to attribute human characteristics, motives, intentions, or emotions to nonhuman subjects, whether actual or imagined (Epley et al., 2007). The anthropomorphic characteristics of AI encompass both the physical and the psychological attributes of human beings (M. Zhang et al., 2021): “physical” describes a person’s face or body, whereas “psychological” refers to mind or personality. The uncanny valley theory posits that anthropomorphism exerts inconsistent effects on users (Gursoy et al., 2019). Within a certain range, higher levels of anthropomorphism produce positive effects, including increased human trust in automated systems, enjoyment of interacting with AI assistants (A. Kim et al., 2019), and improved quality of interactions with AI systems (Xie et al., 2024). Beyond some point, however, anthropomorphization may prove detrimental to the user. Anthropomorphization can foster a sense of intimacy between humans and machines, thereby enhancing users’ emotional experience; yet in human–AI interaction and collaboration, this intimacy may intensify the user’s perception that their personal information is being intruded upon, leading to a heightened sense of privacy invasion (Chi et al., 2020).
Through a series of investigations, researchers have endeavored to elucidate the processes by which anthropomorphism influences the efficacy of HCI, including the influence of psychological distance on user perceptions of AI-based assistants (Li & Sung, 2021). Lee et al. (2021) posit that emotional support enhances interactants’ satisfaction with communication, urging designers to prioritize the emotional and empathetic responses of AI over functional considerations. To gain further insight into the impact of anthropomorphism on interaction, Xie et al. (2023) used the smart home as a case study, delineating anthropomorphism into four dimensions: visual, identity, emotional, and auditory cues. The findings indicate that emotional and auditory cues exert a pronounced positive influence on interaction satisfaction, whereas visual and identity cues have a relatively minimal effect. Furthermore, incorporating emotional cues such as humor can directly influence the user’s perception of the interaction content. It is important to note that the study of Xie et al. (2023) focuses on smart home voice assistants; given the characteristics of this technology, users are likely to be influenced by the task environment and to prioritize sound and interaction content during the interaction, so further investigation in other scenarios is needed. To ascertain users’ emotional satisfaction more accurately, Shin et al. (2021) employ a methodology designated Kansei engineering (also referred to as affective engineering) to quantify the influence of voice-based intelligent systems with disparate dialogical styles on users’ emotional satisfaction during interactions. Research on the anthropomorphization and emotional characterization of AI can advance the development of the “emotional quotient (EQ)” of AI systems and facilitate harmonious human–AI interaction.
2.5 Interaction Mode in Human–AI Interaction and Collaboration
Interaction mode primarily signifies the manner or conduct of human beings when engaging with technology assisted by artificial intelligence. With the technological advances of the smart-device age, the interaction environment has undergone a significant transformation, evolving from the traditional WIMP (window, icon, menu, pointing device) paradigm to multimodal interaction that supports a range of modalities, including speech, vision, and gesture. In light of these developments, researchers have studied effective interaction paradigms for the new environment. Within specific interaction processes, advances in AI capability and autonomy have shifted the role of AI, which is no longer constrained to a supporting role but may assume a more prominent position. Concurrently, the enhanced autonomy of AI has prompted scholars to examine the boundaries of interaction and interaction ethics, creating new avenues for HCI research.
2.5.1 Role Assignment in Collaboration
Collaboration between humans and AI can effectively leverage the strengths of machines, including accuracy, speed, and flexibility; integrating these strengths with human traits is a powerful approach to addressing complex problems. It has been demonstrated that AI can facilitate disease assessment by pathologists (Lindvall et al., 2021), assist with the writing of second-language texts (Zou & Huang, 2024), and automate data analysis (D. Wang et al., 2019) to enhance productivity. In these task scenarios, the AI is primarily engaged in a supportive role. There are, however, various types of role assignment in human–AI collaboration. Scholtz (2003) defines five roles for humans in their interactions with robots, based on the level of automation: supervisor, operator, teammate, bystander, and mechanic. In light of the distinctions inherent in the collaborative process between humans and AI, Jiang et al. (2024) postulated four distinct models of human–AI collaboration: assisted intelligence, augmented intelligence, cooperative intelligence, and autonomous intelligence. This classification of collaboration modes is based primarily on the varying degrees of human control and supervision involved in collaborative tasks. A further distinction can be made according to which party is dominant: human-led modes (including assisted and augmented intelligence), modes in which humans and AI divide the work (a complete or incomplete division of labor), and AI-led modes, in which the AI dominates the assignment of tasks.
In response to these different role divisions, scholars have studied the work efficiency of human–AI collaboration. In contexts where humans occupy the dominant role, some studies have revealed that dominant individuals derive a sense of power (P. Hu et al., 2022) from treating AIs as subordinates (Sadeghian & Hassenzahl, 2022). This sense of power manifests in the user commanding the AI assistant to perform desired actions, which the AI then accomplishes to the best of its ability. Such perceived power can make humans overconfident and optimistic, reducing their perception of the potential risks associated with AI; this risk perception in turn affects human attitudes toward AI. Consequently, individuals occupying decision-making roles may be more reluctant to embrace AI than their less influential counterparts (Jain et al., 2022). There is a paucity of research examining AI in a dominant position. One study, however, identified a preference among some collaborators for the independence and speed afforded by an AI leader, suggesting that in scenarios where time is of the essence, assigning the leader role to AI may prove advantageous (Lobo et al., 2024). The AI-dominated collaboration model may offer enhanced efficiency and could become the prevailing model for human–AI collaboration in certain domains, encouraging deeper research in human–AI collaboration.
2.5.2 Effective Intelligent Interaction Paradigm
The advancement of AI has prompted a shift in the manner of human–machine interaction: the objective is now to convey the desired outcome to the machine rather than to instruct it on the means of achieving that outcome. Historically, HCI has undergone a significant transformation, evolving from batch processing to command-based interaction design. Batch-processing interactions typically entail no back-and-forth between the user and the machine; instead, the user specifies the entire workflow and delegates it to the machine for execution. In the command-based interaction paradigm, the user and the computer take turns executing commands, and text-based and graphical user interfaces emerged in the process. While most contemporary generative AI tools engage with users through a user interface, they are capable of performing tasks based on the user’s intent and represent the current intention-based interaction paradigm, which may persist and evolve for an extended period.
In accordance with the intention-guided paradigm, researchers have further analyzed the interaction paradigm and verified its validity. To illustrate, J. Fan et al. (2018) put forth a framework for software research that encompasses interface paradigms, interaction design principles, and mental models. Desolda et al. (2024) examined the enhancement of interaction effects through three distinct interaction strategies: clarification, negotiation, and reconfiguration, using a healthcare environment as a case study. Nevertheless, such interaction remains user-interface centric. In the context of multimodal and pervasive computing, developing methods for interacting with users in natural ways, such as through vision, hearing, and gesture, represents a promising avenue of research. By analyzing the user’s gaze, a system can discern the user’s implied points of interest and project human intentions (Newn et al., 2019). Zhao and Bao (2023) developed a story generation tool that supports multimodal interaction by recognizing the user’s gestures in conjunction with an image generation model. Sensors can also help AI acquire further information, including determining the location of an object, extracting intricate data from speech, and recognizing user gazes and gestures of interest (Paul et al., 2023). In conclusion, the design of multimodal fusion interaction paradigms for AI systems will become a significant issue in the future of human–AI interaction and collaboration.
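As a minimal illustration of what “multimodal fusion” can mean at the interaction layer, the sketch below combines per-modality intent scores (e.g., from speech, gaze, and gesture recognizers) by confidence-weighted late fusion. All names, scores, and weights are hypothetical; the systems cited above use far richer models.

```python
# Minimal sketch of confidence-weighted late fusion of intent scores.
# Hypothetical example: modality scores would come from real speech,
# gaze, and gesture recognizers; weights would be tuned per deployment.

def fuse_intents(modality_scores: dict[str, dict[str, float]],
                 weights: dict[str, float]) -> str:
    """Fuse per-modality intent distributions into one decision."""
    fused: dict[str, float] = {}
    for modality, scores in modality_scores.items():
        w = weights.get(modality, 1.0)
        for intent, p in scores.items():
            fused[intent] = fused.get(intent, 0.0) + w * p
    return max(fused, key=fused.get)

# Speech is ambiguous ("open it"), but gaze disambiguates the referent.
scores = {
    "speech":  {"open_door": 0.5, "open_window": 0.5},
    "gaze":    {"open_window": 0.9, "open_door": 0.1},
    "gesture": {"open_window": 0.6, "open_door": 0.4},
}
weights = {"speech": 1.0, "gaze": 0.8, "gesture": 0.5}
print(fuse_intents(scores, weights))  # -> "open_window"
```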
2.5.3 Human-Controlled Interaction Boundary
AI is typically met with greater skepticism than humans, and designing interactions centered on human control may help alleviate such concerns. Haupt et al. (2025) show that texts produced through interactions in which the AI acts as the author but ultimate control remains in human hands acquire higher credibility, reflecting human concerns about control. Control is a fundamental human need: humans seek to oversee the process and outcome of events (C. Y. Chen et al., 2016), and when their need for control is threatened, they experience frustration or helplessness. Moreover, most AI systems are “black box” models, rendering them vulnerable; in the event of a system malfunction, the loss of human control impedes operators’ ability to comprehend the system’s operation (Sarter & Woods, 1995). However, humans do not always have a strong desire for control. In education, one study revealed that teachers are inclined to collaborate with AI in order to alleviate the burden of pairing students (Yang et al., 2021), reflecting a desire to leverage the autonomy of AI in certain scenarios or tasks to enhance the efficiency of collaboration. Future research should therefore focus on the allocation of control in human–AI interaction and collaboration and on delineating the boundaries of these interactions.
Future research would also benefit from exploring how to maximize the autonomy of AI while maintaining human control. Autonomy is a significant aspect of AI technology, yet research on system autonomy within the HCI community remains in its nascent stages. Further research is required to assess the applicable scenarios and appropriate degree of AI autonomy, as well as the relationship between autonomy and safety, reliability, and accountability. For example, while contemporary self-driving automobiles may be autonomous (Biondi et al., 2019), it is crucial to recognize the potential risks, investigate the boundaries of human–AI interaction so that humans can assume control of the vehicle in emergencies, and develop human-controllable AI.
2.5.4 Ethical Interaction Design
The ethical design of artificial intelligence has been the subject of considerable research and practical application. In a significant development, UNESCO released the first global standard on AI ethics, the Recommendation on the Ethics of Artificial Intelligence (Ethics of Artificial Intelligence | UNESCO, n.d.). This document proposes a series of key principles to enhance privacy and data protection, transparency and explainability, responsibility and accountability, and fairness and nondiscrimination in the development of AI. Prominent technology companies, including Google (Google AI Principles, n.d.), IBM, and Microsoft, have also published ethical principles for the development of AI systems. Nevertheless, there is still a considerable distance to travel before these ethical principles are upheld in human–AI interaction and collaboration.
To this end, researchers in the HCI community are investigating how to incorporate transparent design principles into automated systems so that system error data can be traced back (Santoni de Sio & van den Hoven, 2018). Researchers are also training and optimizing AI behavior by incorporating users into the design and testing process; user interaction and feedback data allow AI behavior to be refined so that the results AI provides better align with human values. Furthermore, the interaction between humans and AI must itself be governed by ethical standards: accountability, authorization, and supervision should be implemented to regulate human interactions, and those working in HCI may wish to incorporate additional ethical cues into the interaction to remind human operators of the legitimacy of the interaction process (van Diggelen et al., 2021). In sum, forming ethical human–AI interaction and collaboration requires not only engineering reliable technology but also guiding human behavior to foster a sustainable human–AI collaboration environment.
2.6 Future Directions and Challenges
2.6.1 Building a Complete Cognitive Framework for Human–AI Interaction
In recent years, research interest in applying AI to decision-making and prediction has increased significantly. It is widely acknowledged that AI has the potential to greatly aid people in practical tasks, and, optimally, both humans and AI will benefit from human–AI interaction and collaboration. However, in large-scale communication and cooperation between humans and AI, a challenging problem arises: how to make AI compatible with human values, cognitive laws, and morality and ethics, so as to form a complete cognitive framework that ensures trust and reliability in human–AI interactions.
This comprehensive cognitive framework encompasses fundamental topics ranging from user perception to value ethics. From a micro perspective, individuals may hold biases or mistrust toward AI based on their intrinsic perceptions, which can hinder human–AI collaboration. In addition, if the design principles of an AI do not align with human cognitive laws, the AI’s behaviors may confuse humans and lead them to discontinue its use. Beyond the limits that distrust imposes on the utilization of AI, excessive reliance on AI might also prompt humans to unquestioningly adopt the outcomes furnished by AI throughout the collaborative procedure, potentially resulting in erroneous judgments. The field of artificial intelligence has transitioned from small models to large models, and large AI models incorporate a diverse array of Internet data into their training sets, which may include erroneous, deceptive, and biased information. These inaccuracies can significantly impair the effectiveness of AI during human–AI interaction, posing a substantial risk. At present, the inability to precisely understand users’ cognitive patterns and to alert the human at the appropriate moment is a significant disadvantage. The instantaneity, complexity, and flexibility of user cognition also present major challenges: the ability to precisely comprehend users’ cognitive, emotional, and demand shifts in human–AI interaction will significantly influence the efficacy of AI.
From a macro perspective, such as that of human values, the contemporary world is a pluralistic society, and AI, as a global topic, will be viewed differently in different cultures. Further discussion is required among researchers to harmonize the value system underpinning AI with the cultural values of human society, including concerns related to protecting privacy, evaluating reliability, and monitoring risks in the context of global AI governance. Attaining consensus within the human community will establish a foundation for the cognitive framework of human–AI collaboration. On this basis, AI can enhance human cognition and comprehension through decision support, aiding humans in developing a more precise grasp of AI and thereby enabling its more accurate utilization (Bansal et al., 2019).
In the future, the following topics merit investigation: how humans perceive the capabilities of AI, including their ability to determine what AI is and is not capable of; how humans evaluate the privacy protection and security of AI, specifically whether sharing information during interactions and collaborations with AI presents a risk to information security; how humans assess AI reliability and trustworthiness, including in which scenarios AI-provided results are more trustworthy and how trust in AI-generated results changes over long periods of interaction and collaboration; and how humans perceive the risks and threats of AI, such as the hazards that inappropriate use brings to practice. A comprehensive depiction of human social values and of individual users’ cognitive systems will promote the formation of a complete cognitive framework for human–AI interaction and collaboration, which can guide interaction design and optimize functions, enabling AI to better empower human beings and improving the efficiency of the future human–AI collaborative society.
2.6.2 Supporting Adaptive Learning in Human–AI Collaboration
The degree of intelligence exhibited by AI systems in collaborative work with humans is significantly influenced by their capacity to learn autonomously from both user requirements and expert knowledge. Improving AI’s capacity for adaptive learning will therefore be a crucial research topic in the domain of human–AI interaction and collaboration.
There are two well-established paradigms in the field of AI: connectionism and symbolicism. Deep neural networks are the most frequently utilized connectionist AI models today. They are trained to acquire perceptual capabilities by learning from data in existing environments and subsequently use these capabilities to make decisions when presented with new information in different contexts. This perceptive competence depends on an enormous amount of training data; in future complex scenarios of human–AI collaboration, it will be difficult for the AI to perform well without pre-prepared training data. At the same time, the training and computation of neural networks are opaque, making it challenging to discern the meaning of the parameters and the rationale behind the outcomes. Among the currently common methods in symbolicist AI is nonmonotonic reasoning. This technique assumes that processing at the perceptual level has been completed and that the information in the given circumstance has been transformed into coherent knowledge, which is then used for logical reasoning to reach valid conclusions. This approach relies on knowledge and requires human specialists to maintain a knowledge base that supports logical reasoning. Symbolicism and connectionism approach problem-solving from distinct angles, and human–AI collaboration has the potential to combine symbolic and connectionist AI, utilizing the respective advantages of each. Integrating human expertise and supervision into machine learning workflows effectively combines human and machine intelligence, enabling human specialists to directly enhance model performance.
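One common way to integrate human expertise into a machine learning workflow is uncertainty-based active learning, in which the model routes its least confident cases to a human expert for labeling and retrains on the result. The sketch below is a minimal illustration of this pattern using scikit-learn; the synthetic dataset, model choice, batch size, and simulated expert are arbitrary assumptions for exposition.

```python
# Minimal human-in-the-loop active learning sketch (scikit-learn).
# Illustrative assumptions: synthetic data, logistic regression, and a
# simulated "expert" that simply reveals the true label.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
labeled = list(range(20))                  # small seed set labeled by experts
pool = [i for i in range(len(X)) if i not in labeled]

model = LogisticRegression(max_iter=1000)
for round_ in range(5):
    model.fit(X[labeled], y[labeled])
    # Ask the expert about the samples the model is least sure of.
    probs = model.predict_proba(X[pool])
    uncertainty = 1 - probs.max(axis=1)    # low top-class probability
    query = [pool[i] for i in np.argsort(uncertainty)[-10:]]
    # The human expert labels the queried samples (simulated here by y).
    labeled.extend(query)
    pool = [i for i in pool if i not in query]

print(f"labeled {len(labeled)} samples; accuracy on remaining pool: "
      f"{model.score(X[pool], y[pool]):.2f}")
```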
The process of human–AI interaction and collaboration is frequently characterized by its dynamic and flexible nature. Besides integrating expert knowledge into model training to enhance overall performance, it is also crucial to consider an intelligent system's ability to adapt to different learning scenarios. Existing AI technology still requires further enhancement in autonomy, and particularly in its capacity for learning: it should advance toward autonomous continuous learning and toward acquiring knowledge through user interaction. Sekmen and Challa (Reference Sekmen and Challa2013) showed that adaptive learning mechanisms enable machine intelligence to continuously and independently update user models, acquire knowledge about users' preferences and communication behaviors, and modify its own behavior accordingly, leading to improved and friendlier interactions. Such adaptive learning robots are highly favored by users.
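The following is a minimal sketch of the kind of adaptive user model this describes: the system keeps a running estimate of one user preference, updates it after each interaction, and adapts its own behavior accordingly. It is an illustrative toy, not Sekmen and Challa's implementation; the class, fields, and the feedback signal are all assumptions.

```python
# Adaptive user-model sketch: learn a preference from interaction
# feedback and adapt the system's behavior to it.
from dataclasses import dataclass

@dataclass
class UserModel:
    """Running estimate of one user's interaction preferences."""
    verbosity: float = 0.5       # 0 = terse replies, 1 = detailed replies
    learning_rate: float = 0.2   # how quickly the estimate adapts

    def update(self, feedback: float) -> None:
        """Blend new feedback (0..1) into the current estimate."""
        self.verbosity += self.learning_rate * (feedback - self.verbosity)

def respond(model: UserModel, answer: str, detail: str) -> str:
    """Adapt the reply's level of detail to the learned preference."""
    return f"{answer} {detail}" if model.verbosity > 0.5 else answer

model = UserModel()
# Simulated sessions: repeated follow-up questions are treated as a
# signal that the user prefers more detailed replies.
for feedback in [0.9, 0.8, 1.0]:
    model.update(feedback)

print(respond(model, "Tram 4 leaves at 09:12.",
              "It runs every 10 minutes until midnight."))
```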
By integrating user knowledge, adaptive learning in human–AI collaboration can make AI's interaction behaviors and approaches friendlier. Future research can investigate efficient methods for human–AI collaboration in information-rich circumstances, for example in fields with rich expert knowledge such as healthcare, education, and finance. From an information science perspective, such work can also provide insights into data and knowledge fusion to support the optimization of machine learning. We can hope that this will lead the way for AI to achieve previously unimaginable levels of self-learning and facilitate more harmonious human–AI partnerships.
2.6.3 Complementing the Strengths of Humans and AI
An important goal in human–AI collaboration is to achieve complementarity (Kelly et al., Reference Kelly, Kumar, Smyth and Steyvers2023) and, in doing so, to obtain performance beyond that of humans alone or AI alone (Donahue et al., Reference Donahue, Chouldechova and Kenthapadi2022). This resembles the division of tasks in a human team, but it is important to note that it describes the ideal situation: collaboration between humans and AI does not always outperform either working alone. Indeed, research has found settings in which human decision-makers were unable to improve the team's overall efficacy when working with AI, and in which AI attained its best performance when left to its own devices (Green & Chen, Reference Green and Chen2019). Nevertheless, we should remain hopeful about the future of human–AI collaboration. Such findings remind us that collaborative work between humans and AI is complex in practice, and that truly complementing the strengths of humans and AI is a major challenge for future development.
As AI undergoes a period of rapid development, the application of large models presents a multitude of possibilities. Yet their "black box" nature and opaque operational processes make it challenging for humans to accurately predict AI's collaborative behavior, which can produce unexpected outcomes that degrade the overall performance of the team. To integrate human and AI capabilities effectively, it is essential first to identify the respective strengths of humans and AI in different task scenarios. This requires analyzing the unique contributions that humans and AI can make as standalone entities, as well as exploring how their respective insights can be combined into a more comprehensive and valuable knowledge base. Given the intricate nature of human knowledge, it is imperative to categorize the knowledge that humans may generate in diverse contexts, establish methodologies for knowledge organization, and integrate the findings of AI data analysis with this knowledge to develop a new knowledge base that can assist decision-making. It is equally essential to investigate how humans perceive and evaluate the capabilities of AI in diverse contexts. In human–AI collaboration, the AI becomes a teammate with a human-like role: in practical scenarios, humans can discern AI's potential advantages and supplementary functions through the explanatory data and instructions that AI provides, and AI can in turn guide humans toward genuine complementarity. Realizing this complementarity requires attention to the collaborative and communicative aspects between the two parties. As AI develops, it will gradually gain autonomy and initiative; through proactive situational awareness it will be able to judge what support human intelligence can provide in the current task, offer feedback and suggestions, seek human help, and thereby stimulate human creativity and initiative. Accordingly, establishing a two-way communication channel in human–AI collaboration, including deciding when the AI should act on its own and when it should defer to its human teammate (see the sketch below), is a crucial element in attaining complementary advantages.
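One concrete mechanism for operationalizing such complementarity is confidence-based deferral: the AI handles the cases it is confident about and routes the rest to a human expert. The sketch below illustrates this idea under stated assumptions; the threshold value and the stand-in model and expert functions are hypothetical, not a method from the works cited above.

```python
# Confidence-based deferral sketch: the AI decides routine cases and
# defers low-confidence cases to its human teammate.
from typing import Callable

def route_decision(x: str,
                   ai_predict: Callable[[str], tuple[str, float]],
                   ask_human: Callable[[str], str],
                   threshold: float = 0.85) -> tuple[str, str]:
    """Return (decision, decider): the AI answers when confident
    enough; otherwise the case is deferred to the human."""
    label, confidence = ai_predict(x)
    if confidence >= threshold:
        return label, "ai"
    return ask_human(x), "human"

# Stand-ins for an AI model and a human expert.
def toy_model(x: str) -> tuple[str, float]:
    return ("routine", 0.95) if "routine" in x else ("unclear", 0.4)

def toy_expert(x: str) -> str:
    return "escalate"

for case in ["routine maintenance request", "ambiguous incident report"]:
    decision, decider = route_decision(case, toy_model, toy_expert)
    print(f"{case!r} -> {decision} (decided by {decider})")
```

The threshold encodes a division of labor: lowering it gives the AI more autonomy, while raising it keeps more decisions with the human, which is one simple way the two-way channel discussed above can be tuned per task.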
In practical situations, effective interaction and collaboration between the two parties can be facilitated by a comprehensive understanding of human intent and of AI's capabilities. In interaction scenarios, it is of great importance to investigate how AI can be used to enhance human intelligence, for instance by facilitating the collection, organization, and sharing of information and expertise through AI participation in crowdsourcing efforts. Furthermore, in the process of human access to knowledge, it is essential to explore how AI can enhance interactive search, improve human understanding of various domains, and meet the needs and cognitive abilities of different users. In collaboration scenarios, applying AI capabilities in an aging society is a promising area of research; potential applications include modeling health-related domain knowledge to help build AI models that collaborate with caregivers to complete care tasks and realize intelligent care. In the field of scientific discovery, sophisticated interactive AI tools can collaborate with human experts to accelerate scientific innovation. It can be posited that future research will be situated at the crossroads of humans and AI, integrating disparate subject matter and drawing on the strengths of both to achieve effective collaboration.