
Exploring dynamic movement representations in context using generative AI: effects of media types on the evaluation of service robot morphology

Published online by Cambridge University Press: 27 August 2025

Yong-Gyun Ghim*
Affiliation:
University of Cincinnati, USA

Abstract:

As service robots increase in number, understanding how people perceive their human-likeness and capabilities in use contexts is crucial. Advancements in generative AI offer the potential to create realistic, dynamic video representations of robots in motion. This study introduces an AI-assisted workflow for creating video representations of robots for evaluation studies. As a comparative study, it explores the effect of AI-generated videos on people's perceptions of robot designs in three service contexts. Nine video clips depicting robots in motion were created and presented in an online survey. Videos increased perceived human-likeness for supermarket robots but had the same effect as images on restaurant and delivery robots. Perceptions of capabilities showed negligible differences between media types, and no significant differences were found in the effectiveness of communication.

Information

Type
Article
Creative Commons
CC BY-NC-ND 4.0
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is unaltered and is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use or in order to create a derivative work.
Copyright
© The Author(s) 2025

1. Introduction

People infer a robot's capabilities from its appearance when they encounter or interact with it (Sims et al., 2005; Kwon et al., 2016; Haring et al., 2018). This initial expectation is crucial, as the acceptance of a robot often depends on whether its actual performance aligns with the expectation (Lohse, 2011), with its level of human-likeness playing a significant role (Dubois-Sage et al., 2023). People not only tend to anthropomorphize robots (Damiano & Dumouchel, 2018), but they also expect greater capabilities from anthropomorphic robots (Phillips et al., 2018). Therefore, the human-likeness level of a robot should be carefully determined to match its function and make it more acceptable (Fink, 2012). At the same time, people's expectations depend on the usage context (Goetz et al., 2003; Roesler et al., 2022), in which the nature of the task affects the robot's design and its appropriate level of human-likeness. For tasks involving close and frequent social interactions, people expect a higher level of human-likeness in a robot's appearance, whereas less anthropomorphic robots are preferred for tasks requiring high physical demands (Roesler et al., 2022). While service robots are expected to grow in number and application areas (Lee, 2021), much about their morphology remains to be explored, particularly in terms of determining the optimal level of human-likeness. Given the wide range of service contexts with varying degrees of social interaction, the human-likeness level of a service robot should be defined carefully based on the nature of its intended usage context.

This raises a key question: how can designers determine the proper level of human-likeness for a service robot in relation to its usage context and design it accordingly, especially in the early stages of the design process? In human-robot interaction (HRI) research, various media types - ranging from text scenarios to images, videos, virtual reality (VR), and live physical robots - have been studied as means of evaluating people's perceptions of a robot's utility, design, and behavior (Xu et al., 2012; Jung et al., 2021; Mara et al., 2021). Though low-fidelity, photos and videos have often been used in larger-scale studies and for early-stage design evaluations. Considering the high cost of robot development, establishing a design direction early using these media can be beneficial. However, while photos or still images can be effective media for evaluating robot designs and user perceptions, they fall short in capturing movement and interaction (Randall & Sabanovic, 2023), which are important factors influencing perceptions of human-likeness. As Dubois-Sage et al. (2023) highlight, appearance, behavior, movement, and voice are four factors that affect anthropomorphism in robot design, among which only appearance can be captured through still images. On the other hand, videos can convey usage context and dynamic movements or interactions of robots more vividly (Jung et al., 2021). Although not without limitations, such as challenges in examining social interactions (Bainbridge et al., 2011), videos have been found to induce responses similar to live interactions with physically present robots when evaluating perceptions of robots' attributes (Woods et al., 2006; Mara et al., 2021).

Building on the strengths of videos in better depicting usage contexts and robot movements, this paper extends the author's previous study on the relationship between usage context, perceived capabilities, and the level of human-likeness of service robots (Ghim, 2024). Using rendered images as evaluation stimuli in three service contexts - restaurants, supermarkets, and delivery - this earlier study revealed preferences for human-likeness levels according to the service context. Across all three contexts, the least human-like designs were rated positively for all capabilities, while a wide range of human-likeness was favored for restaurant robots. It also demonstrated the effectiveness of AI-generated photorealistic, in-context renderings from ideation sketches in communicating robot designs and environments. By comparing with the results from the previous study, this paper examines whether presenting robots in motion through videos enhances the communication of design and context, and whether it elicits different evaluation results compared to still images. Additionally, it proposes a novel method for creating evaluation media for HRI research by leveraging generative AI's ability to create images and videos easily and quickly (Han & Cai, 2023). Through an iterative, co-creative process with AI, short video clips depicting robots in motion within context were generated and presented in an online survey.

The following sections detail the rationale and workflow for co-creating robot video representations with generative AI, the evaluation study procedure, results, and a discussion of findings. This paper contributes to the fields of HRI and design by introducing an AI-assisted workflow for quickly generating video representations of robot designs in motion and context for evaluation studies and by comparing how different media types - still images versus videos - induce different perceptions.

2. Evaluation methods

2.1. Evaluation media and generative AI

For a more accurate evaluation of the perception of robot designs, it is essential to present them in realistic representations, situated within a context, and shown in motion. To understand people's preferences and guide design decisions, it is also important to show multiple design variations for comparison. Conventional industrial design practices for new product development involve creating renderings from CAD models after the initial ideation phase to gather stakeholder feedback, evaluate designs, and make key decisions (Reference Kim and LeeKim & Lee, 2016). Although this occurs before further investment is made in product development, it still requires significant time and resources. The unique nature of robots - with their autonomous and dynamic behavior, strict engineering constraints, and people's relative unfamiliarity with them in real life - adds complexity to design evaluations. Of course, the level of detail and focus in design representations for research purposes differ from those for mass-produced products. Nonetheless, creating representations of robot designs for evaluation studies has consistently required considerable time and effort. Finding a way to reduce this complexity while maintaining or even enhancing the realistic depiction of a robot's design and behavior would significantly improve understanding of the relationship between robot morphology and human perception.

Since generative AI tools became widely accessible in 2022, there has been an explosion of generative AI visualization tools, including image generators like DALL-E, Midjourney, and Stable Diffusion, as well as video generators like OpenAI's Sora, Luma AI's Dream Machine, Pika Labs, and RunwayML. Designers across disciplines quickly recognized their potential for facilitating ideation and began incorporating these tools, particularly image generators, into their design processes (Shi et al., 2023). For instance, designers use these tools to create image boards for inspiration, visualize early ideas quickly into realistic renderings, and generate design variations. Their strengths lie in quickly generating realistic images and facilitating divergent ideation (Chiou et al., 2023; Zhang et al., 2023). Generative image AI tools are also capable of creating contextual images and integrating them into scenes, such as the generative fill feature in Adobe Photoshop, which helps viewers better understand the design's context. These capabilities align with what is desired for the evaluation media mentioned above, highlighting the potential of generative AI for design evaluations in HRI. Among the numerous generative image AI tools, Vizcom (vizcom.ai) - a sketch-to-image generator - has gained significant attention from industrial designers for its ability to quickly turn hand-drawn sketches into photorealistic renderings. Unlike text-to-image generators, tools like Vizcom help preserve the original design intent while enabling the exploration of design variations.

Generative AI for video generation enables general users to easily create realistic and dynamic content from textual or visual inputs while significantly reducing the manual effort and cost of video creation through automation (Zhou et al., 2024). Image-to-video generators, in particular, use an image as input to produce a sequence of frames with temporal consistency (Xu et al., 2024). HRI research can harness these strengths of AI video generators to create videos for evaluation studies with ease. By animating photorealistic renderings, the motion and context of robots can be better presented.
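To make the image-to-video step concrete, the sketch below uses the open-source Stable Video Diffusion model through the Hugging Face diffusers library. This is an assumed stand-in for the commercial tool used in this study (RunwayML), and unlike RunwayML it conditions only on the input image rather than on a text prompt; it is a minimal sketch, not the study's actual pipeline, and the file names are placeholders.

```python
# Minimal image-to-video sketch using the open-source Stable Video Diffusion
# model (an assumed alternative to the RunwayML tool used in this study).
# Requires: pip install diffusers transformers accelerate
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.to("cuda")

# A photorealistic robot rendering serves as the conditioning frame
# (placeholder file name).
image = load_image("restaurant_robot_rendering.png").resize((1024, 576))

# Generate a short frame sequence with temporal consistency, then export it
# as a video clip (~3.5 s at 7 fps for 25 frames).
frames = pipe(image, decode_chunk_size=8, num_frames=25).frames[0]
export_to_video(frames, "restaurant_robot.mp4", fps=7)
```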

2.2. Preparation of evaluation media

To examine the relationship between robot morphology and context, three service contexts were selected for this study: restaurants, supermarkets, and delivery services, with the assumption that the degree of social interaction and the desired level of human-likeness would decrease progressively across these contexts in that order. A senior undergraduate student in the industrial design program was recruited to develop the evaluation materials for this study. The student had prior experience designing a service robot in one of the author's design studio courses, where he was also exposed to robot designs for the selected service contexts through the work of his classmates. The co-creation process with AI for generating robot video representations involved three steps, as illustrated in Figure 1.

Figure 1. AI-assisted workflow for video creation

Step 1 - Sketching: At the beginning of the study, the student was provided with a positioning map on which the corresponding human-likeness levels across the three contexts were mapped according to the assumption outlined above. Based on this map, the student ideated and drew sketches of robots for each context by hand. Each sketch was drawn in a front three-quarter view, in black on a white background.

Step 2 - Rendering: These sketches were then uploaded to Vizcom, resulting in initial renderings without contextual backgrounds. These early renderings contained unintended or misinterpreted elements that required manual adjustments in Photoshop. Contextual background images were also generated in Photoshop and integrated with the refined renderings. The updated images were uploaded to Vizcom again for further adjustments and to create design variations. This process resulted in a variety of robot designs with different levels of human-likeness, all rendered realistically in their respective service contexts. Three renderings were selected for each context to be used in the evaluation study: a base model that most truthfully reflected the original design, and two variations that clearly depicted different levels of human-likeness from one another. In total, nine robot designs across three contexts were prepared as photorealistic in-context renderings.

Step 3 - Animation: Each rendering was then transformed into a five-second animated video using RunwayML (runwayml.com). The renderings for each context were uploaded to RunwayML, with prompts specifically defining the robot's locomotion and restating the environmental and functional context. Figure 2 shows a sequence of video scenes for the base model of the restaurant robot, generated by RunwayML: the robot is seen from slightly above, navigating forward between tables and chairs, and slowly turning to the right. Using the final rendering from Step 2 as the starting scene, the following prompt was given to the AI to generate the video: "A sleek, mobile service robot moves backward. The robot advances at a slow, controlled pace through the narrow space between tables and chairs in a well-lit dining area. It navigates obstacles smoothly, making slight adjustments to maintain backward motion." Because the video generator repeatedly animated the robot moving backward whenever the prompt specified forward movement, the final prompt instead specified "backward," which produced the intended forward motion.
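As a concrete sketch of this per-context prompting workflow, the snippet below batches the nine renderings through a hypothetical `generate_video` wrapper. Only the restaurant prompt is quoted from the study; the supermarket and delivery prompts, the wrapper, and the file names are illustrative assumptions.

```python
# Hypothetical batch workflow for Step 3. `generate_video` stands in for
# whichever image-to-video service is used (RunwayML in this study); only
# the restaurant prompt is quoted from the paper, and the other prompts
# and file names are illustrative.
PROMPTS = {
    "restaurant": (
        "A sleek, mobile service robot moves backward. The robot advances "
        "at a slow, controlled pace through the narrow space between tables "
        "and chairs in a well-lit dining area. It navigates obstacles "
        "smoothly, making slight adjustments to maintain backward motion."
    ),
    "supermarket": (
        "A mobile service robot slowly advances toward the camera past "
        "shelves of groceries, then rotates its body counterclockwise."
    ),
    "delivery": (
        "A delivery robot moves forward along a street in an urban area, "
        "its wheels' rotation clearly visible."
    ),
}

def generate_video(rendering: str, prompt: str, seconds: int = 5) -> str:
    """Hypothetical wrapper around an image-to-video generator's client."""
    out = rendering.replace(".png", ".mp4")
    print(f"Generating {seconds}s clip {out}: {prompt[:40]}...")
    return out  # replace this body with the chosen generator's API call

clips = [
    generate_video(f"{context}_{variant}.png", prompt)
    for context, prompt in PROMPTS.items()
    for variant in ("base", "variation1", "variation2")
]
```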

Figure 2. AI-generated video scenes for the base model of a restaurant robot

In the same manner, videos for supermarket and delivery robots were generated, with prompts slightly adjusted to better match the robot designs, their orientations in the scene, and the environments. For example, Figure 3 shows both the rendering and scenes from a video for one of the supermarket robot designs. In the video, the robot slowly advances toward the camera, leaving the grocery shelves behind, and then rotates its body counterclockwise at the end of the video. The videos for the other two design variations in the supermarket context exhibit similar robot movements.

Figure 3. Rendering and video scenes for a supermarket robot

For the delivery context (Figure 4), each of the three videos shows the corresponding robot moving forward along a street in an urban area, with its wheels' rotation clearly visible.

Figure 4. Rendering and video scenes for a delivery robot

Compared to the other two contexts, the videos for the delivery context show little to no rotational movement of the robots. In total, nine videos across three contexts were generated from the renderings, ready to be used for the evaluation study. The evaluation procedure is explained in the next section.

2.3. Evaluation procedure

An online survey questionnaire was prepared using the videos described in the previous section. The survey consisted of four sections, maintaining the same structure and questions as the previous evaluation study conducted by the author (Ghim, 2024), except that videos were used as evaluation media in place of renderings, allowing for a comparison between different media types.

The first section began with an informed consent form that described the research purpose, procedures, potential risks and benefits, confidentiality details, and the participant's right to withdraw, followed by basic demographic questions. The second section asked questions related to the restaurant context. Renderings of the three robot designs for restaurants were presented at the beginning, arranged horizontally in a row, along with a one-sentence description of the robots' main tasks and operating environment. Then, a video clip of the first robot was presented, immediately followed by three questions measuring perceived capabilities on a 7-point Likert scale: comfort level, performance effectiveness, and suitability to the operating environment. The first two capabilities represent warmth and competence, respectively. As two underlying dimensions of robot perception adopted from social psychology (Carpinella et al., 2017), warmth relates to social and emotional attributes, while competence represents intelligence or ability. These two dimensions were presented as comfort and performance in the survey to better communicate their respective social and functional aspects. The third question measured the relevance of the robot's appearance to the context. The same questions were repeated for the other two designs, following each corresponding video clip. Next, the videos for all three robot designs were presented again simultaneously, cropped in the middle, and arranged horizontally in a row. Additional questions were posed, including 1) an open-ended question asking about the reasons behind the ratings of perceived capabilities, 2) the effectiveness of the videos in communicating the designs and context, rated on a 7-point Likert scale, and 3) the level of human-likeness of each robot design, rated on a scale with selected robot images from the Anthropomorphic Robot Database (ABOT) and their associated human-likeness scores (Phillips et al., 2018), as illustrated in Figure 5 (Ghim, 2024). It should be noted that these reference robots were presented as still images, although animated videos were used to represent the robot designs. The third and fourth sections followed the same sequence and questions as the second section. A question with a seemingly obvious answer was added at the end of each section to screen for false responses.

Figure 5. A human-likeness scale based on ABOT (Ghim, 2024)

The online survey was conducted in November 2024. Design students at the University of Cincinnati in the United States participated after reading the consent form and voluntarily consenting to participate. While seated in a classroom, participants accessed the survey questionnaire via their own laptops or mobile phones. A total of 54 responses were collected, of which five were screened out as false responses.

3. Results and discussion

In total, 49 responses (25 male, 22 female, 2 other) were analyzed for the study, with participants aged between 18 and 35 (M = 21.69, SD = 3.4). To check the reliability of the results, internal consistency was calculated with Cronbach's α on responses to the rating questions for each of the three contexts, excluding the human-likeness ratings: restaurant (α = 0.632), supermarket (α = 0.788), and delivery (α = 0.726). All these values fall within the acceptable range (Hajjar, 2018). An in-depth analysis of the results is described in the following subsections, with a comparison to the results from the previous study conducted in March 2024 with 36 participants (Ghim, 2024). There was no overlap of participants between the two studies.
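For reference, internal consistency can be computed directly from the rating matrix. The sketch below implements the standard Cronbach's α formula in Python; the matrix dimensions and values are illustrative placeholders, not the study's data.

```python
# Cronbach's alpha: alpha = k/(k-1) * (1 - sum of item variances / variance
# of summed scores), for k items rated by n respondents.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: (n_respondents, n_items) matrix of Likert ratings."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # per-question variance
    total_var = items.sum(axis=1).var(ddof=1)  # variance of summed scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Illustrative data: 49 respondents x 9 rating questions for one context.
rng = np.random.default_rng(0)
ratings = rng.integers(1, 8, size=(49, 9)).astype(float)
print(f"alpha = {cronbach_alpha(ratings):.3f}")
```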

3.1. Human-likeness level

The results of the human-likeness scores are shown in Figure 6, along with the robot designs presented in the survey. The scores from the previous study, rated based on still images, are also overlaid below the horizontal axes for comparative analysis.

Figure 6. Results of human-likeness scores by media type

The results of two-tailed t-tests show significant differences between the two media types for the supermarket robots across all three designs (Variation 1: p = 0.045; Base Model: p = 0.044; Variation 2: p = 0.048). Videos increased the human-likeness score by 2.42 for Variation 1, 4.47 for the Base Model, and 3.35 for Variation 2. In contrast, the t-test results show no significant differences for the restaurant robots (Base Model: p = 0.914; Variation 1: p = 0.151; Variation 2: p = 0.975) or the delivery robots (Base Model: p = 0.467; Variation 1: p = 0.601; Variation 2: p = 0.600). There are barely any differences in the human-likeness of the restaurant robots, except for Variation 1, whose score from the video is about 2.46 lower than that from the image; even so, this difference is negligible according to the t-test result. The delivery robots show slight increases in human-likeness scores with videos, ranging from 0.28 to 0.85, but these are all negligible. It can therefore be concluded that, in this study, showing the movement of robots through videos had the same effect as still images on perceptions of the human-likeness of the restaurant and delivery robots, while videos increased the perceived human-likeness of the supermarket robots.
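The comparison between the image-based and video-based samples corresponds to an independent-samples, two-tailed t-test, sketched below with SciPy. The score vectors are illustrative placeholders (the raw ratings are not reproduced here); only the sample sizes match the two studies.

```python
# Two-tailed independent-samples t-test comparing human-likeness scores
# between media types (image-based study, n = 36; video-based study, n = 49).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
image_scores = rng.normal(30.0, 12.0, size=36)  # placeholder ABOT-style scores
video_scores = rng.normal(34.0, 12.0, size=49)

t, p = stats.ttest_ind(image_scores, video_scores)  # two-sided by default
print(f"t = {t:.3f}, p = {p:.3f}")
```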

An analysis of the open-ended question on the reasons for the ratings of perceived capabilities provides insight into this mixed finding. Based on the most frequently recurring themes in the responses, three categories were defined: 1) human-likeness and emotional responses, 2) functional aspects, and 3) aesthetic qualities. Words and phrases in the responses were analyzed and grouped according to these categories. For supermarket robots, the majority of responses fell into the first category (28 out of 49, or 57.1%), with multiple references to arms, heads, and eyes, compared to 21 (42.9%) in the second category and none in the third. This indicates a heightened effect of anthropomorphism and the emotional responses it elicits. A variety of emotions were noted, including scary, uneasy, creepy, threatening, and friendly, with contradictory comments about Variation 2's human-likeness causing either friendly or uncomfortable feelings. In contrast, responses for both the restaurant and delivery contexts showed a shift in focus to aesthetic qualities or functionality over human-likeness and emotions. For instance, aesthetic qualities were mentioned 8.2% more often than human-likeness in the restaurant context, with references to robot size and form qualities such as rounded, sharp, sleek, and organic. For delivery robots, functional aspects drew the most attention, as represented by words like sturdy, mobility, environment, and utilitarian. Several participants also mentioned that human interactions or features were unnecessary for this context, indicating a consideration of usage contexts when assessing robot designs and capabilities. This context-dependent difference in participants' focus may explain the observed increase in perceived human-likeness for supermarket robots in this study.

3.2. Perceived capabilities

Figure 7 shows the ratings for comfort, effectiveness of task performance, and suitability to the environment for each robot design. The hatched bars represent results from the previous study using still images, while the solid bars represent results from the current study using video media. Each chart is divided into two sections - white above and gray below - to distinguish positive and negative ratings.

Figure 7. Results of perceived capabilities

3.2.1. Restaurant robots

One-way ANOVA tests for the current study, performed separately for each capability category, indicate no significant differences among the three restaurant robot designs for comfort level (F(2, 144) = 1.684, p = 0.189), performance effectiveness (F(2, 144) = 2.198, p = 0.115), or suitability (F(2, 144) = 0.829, p = 0.439), consistent with the findings of the previous study. Combined with the human-likeness results, this suggests that, within this study's score range, the level of human-likeness did not significantly affect the perceived capabilities of restaurant robots, reaffirming the findings of the previous study. Additionally, t-tests found no significant differences between the two media types for any capability category across designs, with all p-values above 0.05. The shift from still images to videos did not result in different perceptions of restaurant robots' capabilities.
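The per-category design comparisons reported here and in the following subsections all follow the same pattern: a one-way ANOVA across the three designs, with 3 x 49 = 147 observations yielding the reported degrees of freedom (2, 144). A minimal SciPy sketch, with illustrative ratings in place of the study's data:

```python
# One-way ANOVA across the three designs for one capability category.
# With 3 groups of 49 raters each, df = (k-1, N-k) = (2, 144).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
base, var1, var2 = (rng.integers(1, 8, size=49).astype(float) for _ in range(3))

f, p = stats.f_oneway(base, var1, var2)
print(f"F(2, 144) = {f:.3f}, p = {p:.3f}")
```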

3.2.2. Supermarket robots

For supermarket robots in the current study, ANOVA results suggest significant differences among the three designs for performance (F(2, 144) = 5.820, p = 0.004) and suitability (F(2, 144) = 4.865, p = 0.009), while no significant difference was found for comfort (F(2, 144) = 2.865, p = 0.06). A t-test comparison with the previous study showed a significant difference only in the suitability rating of Variation 1, the least human-like robot in the supermarket context (p = 0.014), with the mean value decreasing from 5.44 (image) to 4.57 (video). Although no significant differences were found for Variation 1 in the comfort or performance ratings, slight decreases were also observed in the mean values for comfort (from 4.92 to 4.63) and performance (from 5.19 to 4.78). This suggests that the overall perception of Variation 1's capabilities diminished slightly when presented through videos, in contrast to its increased human-likeness score. In the previous study, Variation 1 was rated highest across all three capabilities; in the current study, however, its dominance weakened. Not only did the mean differences between Variation 1 and the other designs decrease, but Variation 2 was also rated higher on performance. Several responses to the open-ended question mentioned Variation 1's intimidating look, attributed to its eyes, as well as its lack of grocery-picking functionality. These aspects were likely made more salient by the robot's rotational motion in the video, which fully revealed its front. Aside from this, the media change from images to videos did not result in significant differences in the evaluation.

3.2.3. Delivery robots

For delivery robots in the current study, ANOVA results indicate significant differences among the three designs for comfort (F(2, 144) = 10.367, p < 0.001), performance (F(2, 144) = 8.218, p < 0.001), and suitability (F(2, 144) = 16.473, p < 0.001). Variation 2, the most human-like design, was rated the lowest across all capabilities. The most notable difference between the effects of images and videos is the increase in comfort for Variation 2 (image: M = 3.31; video: M = 4.39), with a t-test result of p = 0.001. Despite this increase, Variation 2's comfort rating remained the lowest among the three designs, exhibiting the same tendency as the image-based results. Aside from this, t-tests revealed no significant differences between the two media types across designs and capabilities, although the rankings of the Base Model and Variation 1 for suitability were reversed with videos. Overall, the media change from images to videos had a limited effect on participants' perceptions of delivery robots.

3.3. Effectiveness of representations

The final category of this study measured how well the evaluation media communicated the design and intended environment of the robots. Positive ratings were 83.7%, 73.5%, and 77.6% for the restaurant, supermarket, and delivery contexts, respectively, for which ANOVA results suggest no significant differences (F(2, 144) = 1.022, p = 0.362). Figure 8 shows the ratings segmented along the scale and their comparison with the results from the previous study. Positive ratings for the restaurant context increased by 11.5 percentage points with video media. However, a t-test comparing the means indicates no significant difference between the two media types (p = 0.091). Conversely, the supermarket and delivery contexts showed decreases in positive ratings with videos of 4.3 and 5.7 percentage points, respectively. No significant differences between media types were found for either context.

Figure 8. Comparison of media types on the effectiveness of design and context communication

The author speculates that the increase in positive ratings for the restaurant context is due to the videos providing a clearer depiction of the restaurant environment, with the camera changing angles as the scenes progress, whereas the renderings showed only a few tables and chairs, making the environment appear relatively ambiguous (Figure 9). On the other hand, the renderings and initial video scenes for the supermarket and delivery contexts conveyed their environments with more detail, such as shelves filled with groceries or streets with buildings, while the camera angle remained fixed throughout the videos (Figures 3 & 4). As this is speculative, further research, including in-depth interviews or more systematic image and video analyses, is needed to uncover the actual reasons. In summary, presenting robot designs and contexts through videos was found to be generally effective, with positive ratings ranging from 73.5% to 83.7%. While the impact of videos on enhancing communication appears to depend on the specific context, no significant differences were found between images and videos.

3.4. Limitations and future studies

This study is limited in several aspects. First, the participant sample size and age range were limited, restricting the generalizability of the findings. Having only design students as survey participants is another limitation, as they are more trained in visual reasoning than the general population. A large-scale study with a broader population is needed to enhance reliability. Second, the relationship between a robot's human-likeness and perceived capabilities was not fully examined. Besides contextual influences, it involves multiple factors and design elements, such as aesthetic qualities, robot size, and the visualization of functional components for task performance. Future research should incorporate in-depth interviews and qualitative methods to explore these aspects. Third, the robot designs in this study lacked a detailed representation of functional components pertinent to their intended tasks, causing confusion among some participants when assessing the robots' capabilities. Future studies should ensure functional components are better integrated into designs and clearly visualized. Finally, the reasons for the overall similarity in effects on perceptions between videos and images were not discussed in depth here. While the author speculates that the main cause was likely the limited depiction of robot motions in the videos - resulting from short duration, lack of manipulators in most designs, and confinement of robot behavior to simple translational and rotational locomotion - this aspect requires further examination through enhanced videos that provide a better representation of dynamic robot movements.

Figure 9. Comparison of the rendering and the final scene of the video for a restaurant robot

4. Conclusion

This paper examined how presenting robots in motion through videos affects people's perceptions of human-likeness and capabilities across three service contexts: restaurants, supermarkets, and delivery. Using an image-to-video generative AI tool, nine short video clips of robots in motion and context were created and employed as evaluation materials for an online survey. Comparisons with a previous study based on renderings yielded several findings on how the two evaluation media types, images and videos, differ in their effects. First, videos did not alter perceptions of human-likeness for restaurant and delivery robots, while they increased the perceived human-likeness of supermarket robots. Second, the change in evaluation media from images to videos had negligible or minimal effects on perceptions of comfort, performance, and suitability; significant differences were found only for Variation 1's suitability in the supermarket context and Variation 2's comfort level in the delivery context. Finally, videos were found to be generally effective in presenting robot designs and their contexts, though no significant differences were found between images and videos. In summary, images are as effective as videos for evaluation studies of robot morphology, though subtle differences warrant further investigation.

References

Bainbridge, W. A., Hart, J. W., Kim, E. S., & Scassellati, B. (2011). The benefits of interactions with physically present robots over video-displayed agents. International Journal of Social Robotics, 3, 41–52. https://doi.org/10.1007/s12369-010-0082-7
Carpinella, C. M., Wyman, A. B., Perez, M. A., & Stroessner, S. J. (2017). The robotic social attributes scale (RoSAS): Development and validation. In Proceedings of the 2017 ACM/IEEE International Conference on Human-Robot Interaction (pp. 254–262). https://doi.org/10.1145/2909824.3020208
Chiou, L. Y., Hung, P. K., Liang, R. H., & Wang, C. T. (2023). Designing with AI: An exploration of co-ideation with image generators. In Proceedings of the 2023 ACM Designing Interactive Systems Conference (pp. 1941–1954). https://doi.org/10.1145/3563657.3596001
Damiano, L., & Dumouchel, P. (2018). Anthropomorphism in human–robot co-evolution. Frontiers in Psychology, 9, 468. https://doi.org/10.3389/fpsyg.2018.00468
Dubois-Sage, M., Jacquet, B., Jamet, F., & Baratgin, J. (2023). We do not anthropomorphize a robot based only on its cover: Context matters too! Applied Sciences, 13(15), 8743. https://doi.org/10.3390/app13158743
Fink, J. (2012). Anthropomorphism and human likeness in the design of robots and human-robot interaction. In Social Robotics: 4th International Conference, ICSR 2012 (pp. 199–208). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-34103-8_20
Ghim, Y. G. (2024). Form follows context: Exploring the effect of usage context on human-likeness of mobile service robots using generative AI. Design Management Journal, 19(1), 95–107. https://doi.org/10.1111/dmj.12099
Goetz, J., Kiesler, S., & Powers, A. (2003). Matching robot appearance and behavior to tasks to improve human-robot cooperation. In The 12th IEEE International Workshop on Robot and Human Interactive Communication (pp. 55–60). IEEE. https://doi.org/10.1109/roman.2003.1251796
Hajjar, S. T. (2018). Statistical analysis: Internal-consistency reliability and construct validity. International Journal of Quantitative and Qualitative Research Methods, 6(1), 27–38.
Han, A., & Cai, Z. (2023). Design implications of generative AI systems for visual storytelling for young learners. In Proceedings of the 22nd Annual ACM Interaction Design and Children Conference (pp. 470–474). https://doi.org/10.1145/3585088.3593867
Haring, K. S., Watanabe, K., Velonaki, M., Tossell, C. C., & Finomore, V. (2018). FFAB: The form function attribution bias in human-robot interaction. IEEE Transactions on Cognitive and Developmental Systems, 10(4), 843–851. https://doi.org/10.1109/tcds.2018.2851569
Jung, M., Lazaro, M. J. S., & Yun, M. H. (2021). Evaluation of methodologies and measures on the usability of social robots: A systematic review. Applied Sciences, 11(4), 1388. https://doi.org/10.3390/app11041388
Kim, K., & Lee, K. P. (2016). Collaborative product design processes of industrial design and engineering design in consumer product companies. Design Studies, 46, 226–260. https://doi.org/10.1016/j.destud.2016.06.003
Kwon, M., Jung, M. F., & Knepper, R. A. (2016). Human expectations of social robots. In 2016 11th ACM/IEEE International Conference on Human-Robot Interaction (HRI) (pp. 463–464). IEEE. https://doi.org/10.1109/hri.2016.7451807
Lee, I. (2021). Service robots: A systematic literature review. Electronics, 10(21), 2658. https://doi.org/10.3390/electronics10212658
Lohse, M. (2011). Bridging the gap between users' expectations and system evaluations. In 20th International Symposium on Robot and Human Interactive Communication (RO-MAN) (pp. 485–490). IEEE. https://doi.org/10.1109/roman.2011.6005252
Mara, M., Stein, J. P., Latoschik, M. E., Lugrin, B., Schreiner, C., Hostettler, R., & Appel, M. (2021). User responses to a humanoid robot observed in real life, virtual reality, 3D and 2D. Frontiers in Psychology, 12, 633178. https://doi.org/10.3389/fpsyg.2021.633178
Phillips, E., Zhao, X., Ullman, D., & Malle, B. F. (2018). What is human-like? Decomposing robots' human-like appearance using the anthropomorphic robot (ABOT) database. In Proceedings of the 2018 ACM/IEEE International Conference on Human-Robot Interaction (pp. 105–113). https://doi.org/10.1145/3171221.3171268
Randall, N., & Sabanovic, S. (2023). A picture might be worth a thousand words, but it's not always enough to evaluate robots. In Proceedings of the 2023 ACM/IEEE International Conference on Human-Robot Interaction (pp. 437–445). https://doi.org/10.1145/3568162.3576970
Roesler, E., Naendrup-Poell, L., Manzey, D., & Onnasch, L. (2022). Why context matters: The influence of application domain on preferred degree of anthropomorphism and gender attribution in human–robot interaction. International Journal of Social Robotics, 14(5), 1155–1166. https://doi.org/10.1007/s12369-021-00860-z
Shi, Y., Gao, T., Jiao, X., & Cao, N. (2023). Understanding design collaboration between designers and artificial intelligence: A systematic literature review. Proceedings of the ACM on Human-Computer Interaction, 7(CSCW2), 1–35. https://doi.org/10.1145/3610217
Sims, V. K., Chin, M. G., Sushil, D. J., Barber, D. J., Ballion, T., Clark, B. R., Garfield, K. A., Dolezal, M. J., Shumaker, R., & Finkelstein, N. (2005). Anthropomorphism of robotic forms: A response to affordances? In Proceedings of the Human Factors and Ergonomics Society Annual Meeting (Vol. 49, No. 3, pp. 602–605). SAGE Publications. https://doi.org/10.1037/e577392012-082
Woods, S. N., Walters, M. L., Koay, K. L., & Dautenhahn, K. (2006). Methodological issues in HRI: A comparison of live and video-based methods in robot to human approach direction trials. In RO-MAN 2006: The 15th IEEE International Symposium on Robot and Human Interactive Communication (pp. 51–58). IEEE. https://doi.org/10.1109/roman.2006.314394
Xu, D., Nie, W., Liu, C., Liu, S., Kautz, J., Wang, Z., & Vahdat, A. (2024). CamCo: Camera-controllable 3D-consistent image-to-video generation. arXiv preprint arXiv:2406.02509.
Xu, Q., Ng, J. S. L., Cheong, Y. L., Tan, O. Y., Wong, J. B., Tay, B. T. C., & Park, T. (2012). Effect of scenario media on human-robot interaction evaluation. In Proceedings of the Seventh Annual ACM/IEEE International Conference on Human-Robot Interaction (pp. 275–276). https://doi.org/10.1145/2157689.2157791
Zhang, C., Wang, W., Pangaro, P., Martelaro, N., & Byrne, D. (2023). Generative image AI using design sketches as input: Opportunities and challenges. In Proceedings of the 15th Conference on Creativity and Cognition (pp. 254–261). https://doi.org/10.1145/3591196.3596820
Zhou, P., Wang, L., Liu, Z., Hao, Y., Hui, P., Tarkoma, S., & Kangasharju, J. (2024). A survey on generative AI and LLM for video generation, understanding, and streaming. arXiv preprint arXiv:2404.16038.