1. Introduction
In 1927, the landmark film Metropolis by Fritz Lang presented a mechanical being capable of mimicking human actions, creating chaos and conflict. Fast forward to today: the rapid progress and spread of artificial intelligence (AI) technologies bring new opportunities and dangers to nearly every sector of industry and society, making it essential to devise thorough strategies for the challenges ahead.
Yet, in contrast to the treacherous automaton Maria depicted in Metropolis, AI can also be characterized by its enabling role: augmenting our skills in writing, operating vehicles, and managing data sets. AI and large language models (LLMs) have become vital across various digitally oriented sectors. The present research, bolstered by the hopeful perspective of numerous recent studies, highlights the promise of AI and LLMs as transformative technologies (Păvăloaia & Necula, 2023).
The fundamental message is that AI is not intended to undermine human agency but to improve our everyday experiences. This also extends to the educational sector, a vital area in nurturing a skilled future workforce and, by extension, the future of society. This study explores the impact and potential benefits of AI-generated feedback for students in the context of teaching ideation and addresses the following research question:
(RQ) To what extent can students profit from AI-generated feedback comments during ideation?
2. Theoretical concept
2.1. Human-AI collaboration in design education
A recent publication, Artificial Intelligence and Education by Holmes et al. (2022), explores the application of AI in education through the lens of the Council of Europe’s core values: human rights, democracy, and the rule of law. The report highlights that AI increasingly shapes education, offering both opportunities and challenges (Holmes et al., 2022, p. 9). The influence of AI spans numerous domains, including design education, where generative design technologies are challenging the roles of designers, urging them to adapt their abilities to effectively leverage these innovations (Păvăloaia & Necula, 2023; Saadi & Yang, 2023). However, perception tends to be skeptical, with studies indicating that AI is often regarded as less competent than human designers, especially in fulfilling specific design criteria (Chong & Yang, 2023). With the emergence of LLMs, the dialogue has shifted toward the dangers and ethical dilemmas associated with AI in design education. Despite these worries, AI has already become ingrained in various activities, generating both advantages and challenges in education. Advancements in AI have reshaped traditional design processes, particularly in engineering and product design: these technologies not only automate and optimize stages from conceptualization to evaluation but also require selecting appropriate AI techniques for specific contexts to maximize benefits (Yüksel et al., 2023). As AI technology advances, its integration into design education presents an opportunity to redefine collaboration between human designers and intelligent systems, ultimately enhancing creativity and efficiency. AI is increasingly operating as a collaborative educational partner, acting as both mentor and companion and fostering students’ intellectual and emotional growth (Kim et al., 2022). This raises the question: How can generative AI be integrated into design education? According to Dwivedi et al. (2023), ChatGPT can generate highly polished text comparable to human writing.
This research investigates whether AI can partner with designers by offering constructive feedback to enhance design ideation. It examines how ChatGPT can support and refine the creative process, evaluating its potential as a feedback instrument in design education.
2.2. Feedback in education
Effective feedback is essential in education, enabling students to refine their ideas, enhance critical thinking, and improve learning outcomes. However, as university enrollment increases, the growing student-to-professor ratio presents challenges in delivering personalized and meaningful feedback.
In Germany, student enrollment rose steadily from under two million in 2002/2003 to nearly three million in 2022/2023 (Statista, 2024). The student-to-professor ratio, averaging 65 students per professor (Professors supervise an average of 65 students, 2022), is high compared with other higher education systems; the United States, for instance, maintains a ratio of approximately 18 students per instructor (Irwin et al., 2021). Even in classes of around 40 students, educators struggle to establish close working relationships and provide detailed feedback (Hevner & vom Brocke, 2023).
The German Science and Humanities Council (2022) emphasizes that feedback should be central to assessing student progress. Educators help students identify promising design concepts by offering feedback that provides guidance and direction. How can this feedback approach be effectively adapted for interactions with generative AI?
Research on generative AI tools—such as ChatGPT and DALL·E—is expanding in education; however, most studies focus on content generation or creative applications rather than feedback. By providing constructive feedback, educators support students in developing critical thinking and problem-solving skills (German Science and Humanities Council, 2022, p. 9). This is particularly important, as effective learning often requires multiple feedback iterations (Hevner & vom Brocke, 2023). Could AI-driven feedback help mitigate the challenges of the high student-to-professor ratio?
Feedback is defined in various ways depending on context. In education, feedback is understood as providing learners with information to assess their performance and guide their improvement (Athimni et al., 2020, pp. 330–332). More than a simple exchange of information, feedback is a complex process integral to teaching and learning (Henderson et al., 2019). It can take many forms: intentional or unintentional, immediate or delayed, and cognitive, affective, motivational, relational, or social (Henderson et al., 2019). This study focuses specifically on written feedback provided to students.
3. Research method
3.1. Study design
The study design (Figure 1) employed an experimental approach to evaluate how students perceived ChatGPT-generated versus human-generated textual feedback on an ideation task. Twenty-five students participated, receiving feedback from both GPT-4 and human instructors for direct comparison.

Figure 1. Overview of study design
A mixed-methods approach combined qualitative measures from open-ended survey responses with quantitative data from structured surveys assessing perceptions of AI- and human-generated feedback. The pre-study aimed to establish a structured experimental framework, including obtaining ethical approval, refining the participant cohort, developing the ideation task, and selecting an AI tool for generating feedback. It involved optimizing the AI’s feedback through prompt engineering to ensure students found it difficult to distinguish between AI- and human-generated feedback. Lastly, students and lecturers were recruited from various universities, and a systematic approach for evaluating feedback was developed to support the core experiment, which followed a three-stage design.
In Round 01, each student developed a startup idea to mitigate academic stress using the ‘Crazy 8s’ method (Knapp et al., 2016, p. 118). They rapidly produced ideas in eight minutes, followed by a 15-minute refinement phase with an optional extension. The task was designed to be engaging and concise, facilitating student involvement with minimal preparation. The students received the following task description: “Get creative and develop a practical, new, and feasible startup idea that helps students reduce their stress. The stress level reflects the intensity of negative mental and physical symptoms caused by stressors in the daily (study) life. Think in detail about the startup’s concrete purpose and function. Please document the outcome in the following template as a written text.”
The ideation outcomes were then submitted for feedback.
In Round 02, human lecturers and ChatGPT evaluated student submissions using a standardized feedback form to ensure consistency. The form guided evaluators to provide a 150-word critique, structured into three key sections: strengths of the submission, improvement areas, and further development recommendations.
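For illustration, a prompt implementing this standardized form might be assembled as in the following R sketch. This is a hypothetical reconstruction: the helper function name and the prompt wording are assumptions, as the study's exact prompt and GPT-4 settings are not reported here.

```r
# Hypothetical reconstruction of the standardized feedback prompt;
# the study's actual wording and GPT-4 parameters are not reported.
build_feedback_prompt <- function(submission) {
  paste0(
    "You are an experienced design lecturer. Review the following student ",
    "startup idea and write a critique of about 150 words, structured into ",
    "three sections: (1) strengths of the submission, (2) areas for ",
    "improvement, and (3) recommendations for further development. ",
    "Keep the tone constructive.\n\nSubmission:\n", submission
  )
}

cat(build_feedback_prompt("A peer-support app that matches students ..."))
```

The resulting prompt would then be sent to GPT-4 (e.g., via the OpenAI API) and the model's response returned to the student in the same written format as the lecturers' comments.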
In Round 03, students received both AI- and human-generated feedback, with the order randomized to minimize bias—13 students received human feedback first, while 12 received AI feedback first. After reading the feedback comments, they accessed a survey link and code via email to complete the survey. This was followed by a semi-structured interview, conducted with prior consent for recording and later transcription. The outcome was 25 completed surveys and 25 transcribed interview recordings for further analysis.
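Assuming simple randomization (the exact assignment procedure is not described here), the counterbalanced ordering could be drawn as in this minimal R sketch; the participant IDs and seed are invented for illustration:

```r
set.seed(1)                          # hypothetical seed for reproducibility
students <- sprintf("S%02d", 1:25)   # invented participant IDs

# Draw 13 students to read the human-generated comment first;
# the remaining 12 read the AI-generated comment first.
human_first <- sample(students, 13)
order_group <- ifelse(students %in% human_first, "human-first", "AI-first")
table(order_group)                   # AI-first: 12, human-first: 13
```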
3.2. Hypothesis development
This research examined the current literature to identify essential factors influencing feedback in concept development and to compare the efficacy of AI-generated feedback with human feedback.
Four search phrases were employed: “helpful feedback”, “AI technology in education”, “AI-created feedback”, and “feedback in idea generation”. The study’s hypotheses were then derived from these findings.
Social-emotional perception. AI-generated feedback is expected to foster a positive social-emotional response, as it can be designed to encourage (Sung et al., 2025). Research by Henderson et al. (2019) and Hattie and Timperley (2007) emphasizes the importance of emotional support in effective feedback. When AI feedback conveys positive emotions that meet or exceed expectations, it enhances social-emotional perception (Han et al., 2023; Mallick et al., 2023). Additionally, acceptance of feedback is strongly influenced by positive support, fairness, and perceived competence (Singh, 2019; Strijbos et al., 2021).
Encouragement and inspiration. AI-generated feedback is perceived as encouraging and inspiring because AI can provide structured, motivational, and solution-oriented feedback (Mohammed & Khalid, 2025). Research by Cui et al. (2020) highlighted that inspiration and encouragement are critical for creativity and productivity, and AI can be designed to deliver feedback that fosters these qualities. Moreover, interacting with AI systems has been shown to inspire and encourage creative thinking, particularly in computational creativity, where AI enhances creative processes in research and art practice (Edmonds, 2022).
Coherence. Coherence involves the clarity and relevance of feedback regarding accuracy and logical reasoning. Feedback should provide precise information tailored to students’ needs (Selvaraj & Azman, 2020). Research such as that by Paterson et al. (2020) indicates that the coherence of feedback relies on linguistic clarity and contextual significance. Aspects mentioned within feedback should be explicitly formulated to help students understand their learning (Woitt et al., 2023). Recent research by Bai (2024) reveals that AI often struggles with nuanced expressions and the logical flow of language.
Another challenge is that AI-generated feedback can be influenced by biases introduced at various stages of the AI pipeline, from dataset creation to data analysis and evaluation. As a result, AI-generated feedback may reflect these biases, potentially leading to inconsistencies and reduced coherence (Srinivasan & Chander, 2021).
Structure and time. Structured feedback positively influences students’ work quality and their perception of feedback usefulness, leading to better learning outcomes (Crain & Bailey, 2022). Whether text-based or digitally recorded, structured comments enhance the learning experience by being more precise, practical, and satisfying for students (Phillips et al., 2016). The immediacy (time) factor is closely associated with feedback structure, as a well-structured commentary allows for quick understanding and overview (Ossenberg et al., 2019). A key expectation regarding AI-generated feedback is its capacity to deliver responses rapidly (Thomas et al., 2024) and in well-structured form (Escalante et al., 2023).
Based on the factors outlined above, this study formulated five hypotheses (H1–H5), which were rigorously tested and subsequently confirmed or rejected:

- H1: AI-generated feedback leaves the recipient with a positive social-emotional perception.
- H2: AI-generated feedback is perceived as encouraging and inspiring.
- H3: AI-generated feedback reveals weaknesses in terms of coherence.
- H4: AI-generated feedback can be generated and transmitted quickly and in a well-structured form.
- H5: AI-generated feedback does not significantly underperform human feedback regarding the properties outlined in H1, H2, and H4.
3.3. Participant recruitment
The students represented a variety of academic programs, including human factors engineering, architecture, computer science, human-computer interaction, design leadership, industrial design, mechanical engineering, and business administration. They ranged from the 5th semester of a Bachelor’s program to the 3rd semester of a Master’s program. To ensure the experiment ran smoothly, participants were required to be familiar with ideation, allowing them to understand and complete the task within a limited timeframe. The participating lecturers were required to have at least two years of teaching experience and ideally to come from a design-related field or be familiar with ideation, ensuring they could provide meaningful feedback on students’ ideation work.
Student participants were recruited through various university channels, including email and interdisciplinary project course groups. In total, 25 students from four universities in Bavaria, Germany, were recruited. Lecturers were recruited via email and represented a multidisciplinary cohort from two German universities, with expertise spanning management, strategic leadership, international relations, branding, design, media theory, and customer experience.
3.4. Data analysis
This study used a mixed-methods design to compare AI-generated and human feedback, combining quantitative and qualitative analyses. The quantitative analysis focused on survey responses and feedback duration. Feedback duration was measured in minutes, and categorical survey responses were assigned numeric codes. Likert-scale responses ranged from 1 (“strongly disagree”) to 5 (“strongly agree”); three-point scale responses were coded 1 (negative), 2 (neutral), and 3 (positive). A benchmark of 3.0 on the 5-point Likert scale was selected due to its common application in academic contexts, as Hattie and Timperley (2007) suggest that scores exceeding 3.0 typically reflect a favorable perception (Kankaraš & Capecchi, 2024). Although Likert-scale data is ordinal, it is frequently treated as interval-scaled; students were therefore guided to view the scale points as evenly spaced, facilitating metric statistical evaluation. Survey data was retrieved from LimeSurvey; open-ended responses were set aside for the qualitative analysis and incomplete submissions discarded, resulting in a refined dataset for analysis in R. Descriptive statistics were computed to evaluate central tendency and variability in testing H1–H4. For H5, a Wilcoxon signed-rank test was conducted to compare lecturer-generated feedback (A) and AI-generated feedback (B), given the paired nature of the data and its non-normal distribution.
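As an illustration of this analysis pipeline, the following R sketch runs the described tests on simulated ratings. The data and variable names are invented; the study's actual scripts are not reproduced here.

```r
set.seed(1)  # simulated 5-point Likert ratings from the same 25 students
ratings_A <- sample(1:5, 25, replace = TRUE)        # lecturer feedback (A)
ratings_B <- sample(3:5, 25, replace = TRUE,
                    prob = c(0.1, 0.3, 0.6))        # AI feedback (B)

# Descriptive statistics: central tendency and variability (H1-H4).
summary(ratings_B)
sd(ratings_B)

# H1-H4: one-sample Wilcoxon signed-rank test against the 3.0 benchmark.
wilcox.test(ratings_B, mu = 3, alternative = "greater")

# H5: paired Wilcoxon signed-rank test, chosen because each student rated
# both comments (paired data) and the ratings are not normally distributed.
wilcox.test(ratings_A, ratings_B, paired = TRUE)
```

With real data, the paired test would be repeated per sub-aspect at α = 0.05, as reported in Section 4.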
The qualitative assessment examined student interviews and open-ended survey responses utilizing a Grounded Theory framework in MAXQDA (Figure 2).

Figure 2. Grounded Theory qualitative evaluation displaying Code-System Matrices based on student interviews and surveys, with screenshots from MAXQDA
A three-step coding process included memo writing to mitigate researcher bias (Chong & Yeo, 2015). Interview transcripts and survey responses were analyzed line by line, grouping similar responses under overarching terms. Unnecessary codes were discarded, and the remaining codes were organized into key factors: coherence, encouragement and inspiration, social-emotional perception, structure and appearance, and potential weaknesses. Codes were hierarchically arranged to align with statistical hypothesis testing.
4. Results
The research findings reveal a positive perception of AI-generated feedback, with students showing favorable attitudes toward its multifaceted benefits, including social-emotional perception, encouragement and inspiration, coherence, structure, and outer appearance. Across all sub-categories, AI feedback consistently outperformed human-generated feedback (Figure 3), with students praising its inspiring tone and organized structure.

Figure 3. Boxplot comparing results of AI-generated and human-generated feedback
H1: AI-generated feedback leaves the recipient with a positive social-emotional perception. A survey assessed the emotional perception of the feedback through questions on positivity, supportiveness, fairness, and competence, measured on a 5-point Likert scale. H1 was supported, as AI feedback achieved average ratings above 3.0; positivity and supportiveness received median values of 5.00 (p = 0.002, p < 0.05). Overall, AI feedback was perceived as supportive and well received.
H2: AI-generated feedback is perceived as encouraging and inspiring. Four survey questions assessed students’ views on encouragement and inspiration. The ratings showed little disagreement, with the lowest scores at 2.00 and 3.00. Both measures had a first quartile of 4.00, meaning 25% of students rated at or below this value, while median values were 4.00 and 5.00. H2 was supported, as AI feedback surpassed the 3.0 threshold for all sub-aspects (p = 0.002, p < 0.05).
H3: AI-generated feedback reveals weaknesses in terms of coherence. Six survey questions evaluated the coherence of AI feedback, assessing correctness, context fit, conclusiveness, order, detail, and overall impression. Median ratings were high across all sub-aspects, with correctness, context fit, and conclusiveness each at 5.00. No sub-aspect fell below 3.0, indicating strong coherence in AI feedback; the anticipated weaknesses did not materialize, leading to the rejection of H3 (p = 0.006, p < 0.05).
H4: AI-generated feedback can be generated and transmitted quickly and in a well-structured form. Students rated the structure of AI feedback on a 5-point Likert scale. The feedback was generated in a median of 1.59 minutes, highlighting its speed. Students rated structure and organization highly, with a minimum median of 4.00 across all questions and 5.00 for overall organization, confirming H4 (p = 0.001, p < 0.05).
H5: AI-generated feedback does not significantly underperform human feedback regarding the properties outlined in H1, H2, and H4. Students completed a survey evaluating feedback received on their ideation results, with an embedded example shown in Figure 4. Examples of both human and AI feedback are illustrated in Figure 5. H5 is supported if AI feedback is not significantly worse than human feedback in social-emotional perception, encouragement and inspiration, coherence, and structure. The Wilcoxon signed-rank test revealed significant differences in social-emotional perception and overall feedback ratings between human- and AI-generated comments. Qualitative analysis identified factors contributing to a more positive social-emotional perception of AI-generated feedback. As with previous elements, the Wilcoxon signed-rank test was applied to each subsection of students’ perceptions, overall perception, and measured time. At a significance level of α = 0.05, the test yielded significant results (p < 0.05) across all sub-aspects. In summary, AI-generated feedback (Comment B) was never rated significantly lower than human-generated feedback in any evaluated aspect; in fact, students rated AI feedback markedly higher across all categories.

Figure 4. An example of a student’s anonymized ideation

Figure 5. An example of feedback from a lecturer and ChatGPT
5. Discussion
The findings of this study indicate that students rated AI-generated feedback higher than human feedback across all measured dimensions. AI-generated feedback was shown to encourage and inspire students in the context of ideation. However, it is essential to understand the reasons for the superior performance of AI and its implications for the educational setting. Students benefited significantly from AI-generated feedback during ideation, particularly in structure and clarity. In contrast, the clarity and organization of human lecturers’ comments varied based on their style, time constraints, and level of involvement.
Additionally, AI-generated feedback generally showed higher encouragement and support, which might have influenced students’ opinions by maintaining a consistently positive tone and avoiding harsh criticism. Whereas human feedback was sometimes too brief or overly critical, AI answers were generally more thorough and organized in a way that improved understanding. However, a crucial subject of future research is whether the students’ awareness of the feedback source influences their reception. Would students have confidence in the results if they knew the feedback came from AI rather than humans?
Given its favorable reception, could AI-generated feedback become standard practice in educational settings? It is important to note the essential role of criticality (Schön, 1983), which is fundamental to design education, where students are encouraged to challenge ideas, refine concepts, and engage in iterative problem-solving. While AI is designed to be supportive and flexible, its potential to question student assumptions remains ambiguous. Furthermore, its propensity to soften its replies when confronted with contradiction casts doubt on its ability to engage critically. This implies that although AI feedback is valuable in directing students, human lecturers remain essential for delivering more in-depth critiques and challenging concepts. Instead of viewing AI as a replacement for human feedback, this research emphasizes the opportunity to advance human feedback methodologies by incorporating the strengths identified in AI-generated responses. Establishing structured guidelines for educators could facilitate improvements in consistency and clarity.
The first limitation of this research is its restriction to textual feedback, even though oral feedback is common practice in education. Secondly, when integrating feedback comments into teaching, both the lecturers and the students matter; however, this study focused on the student perspective. It would also be interesting to know how lecturers rate the quality of AI-generated feedback comments. Thirdly, a significant amount of time was invested in optimizing the prompting, which went through several phases of iteration to reach the required quality and reliability; adapting the approach to an educational framework will therefore demand coding and prompting knowledge. Fourthly, this study differed in some respects from an actual educational setting, as the lecturers did not know the participating students personally and vice versa, and the ideation task was not part of an actual graded study project. This influenced the effort the participating lecturers and students were willing to invest. However, the decision to anonymize participants and conduct an ungraded task outside the real study context was made for ethical reasons.
Rather than substituting traditional feedback approaches, AI can facilitate lecturers by ensuring students receive consistent, structured, and instant feedback. This research advocates for a synergistic strategy, wherein AI supports educators in delivering feedback, thereby fostering a system that maintains the vital human elements of education and learning while capitalizing on the efficiency of AI. However, further research with a larger sample size and in an actual study context should be conducted to support the initial findings of this explorative research.
6. Conclusion
This paper investigated the extent to which students benefit from AI-generated feedback during ideation. It provides answers from an experimental study in which students (n=25) received feedback from the generative AI GPT-4 as well as from six human evaluators (five professors and one lecturer). The results indicate that students view AI feedback positively regarding social and emotional aspects, encouragement, inspiration, coherence, and structure. With the rapid development of generative AI technology, a promising direction for future research could be incorporating real-time feedback conversations through text-to-speech transcriptions. This approach would enable students and educators to document discussions and track progress.
The feedback could be clustered according to learning goals and assessment methods, enhancing the constructive alignment of the educational process (Biggs, 1996).
Furthermore, there is potential for generative AI tools to aid in evaluating design quality based on aesthetic criteria or human assessments, which could become a key focus for AI training (Thoring et al., 2023). Future investigations could also explore Bloom’s (1984) 2 Sigma problem, examining how incorporating AI into education could bring the benefits of individualized tutoring on student achievement to scale. By embracing AI as a collaborator in education and beyond, capitalizing on its capacity for personalized learning, adaptive feedback, and interdisciplinary cooperation, we can transform the narrative from one of apprehension and disorder, as depicted in Metropolis by Fritz Lang, to one of empowerment, innovation, and collective advancement for a more sustainable future.