1. Introduction
This plenary paper considers what automation brings to feedback, exploring the contribution of Automated Writing Evaluation (AWE) and Generative Artificial Intelligence (Gen AI). With teachers overwhelmed by rising expectations to provide more feedback, of better quality, to ever more students, automation is increasingly seen as a solution to teacher despair and burnout. A major question, then, is whether generative AI tools, such as large language models and chatbots like ChatGPT, can help relieve this burden and reduce teacher burnout.
There are obviously many issues embedded in that simple question, and things are moving at pace – and apparently beyond our control. However, I want to focus on five sub-questions that we might want to address about automated feedback:
1. Is it accurate? Do its responses actually reflect the language of target texts?
2. Is it relevant? Can it support the particular needs of students in different disciplines?
3. Is it empathetic? Can it sugar the pill of criticism with sympathetic comments?
4. Is it valued? Do learners trust the feedback, and are they prepared to accept it?
5. Is it educational? Does it produce better writers or just better texts?
First though, where are we with feedback?
2. Feedback on writing
The benefits of feedback given on student writing are now fairly well established, so that we broadly assume that it:
• provides writers with a sense of audience and a grasp of what readers value in a text
• gives students opportunities to learn from feedback about the expectations of the communities they are writing for
• offers the assistance of an expert, guiding the learner
• points forward to other texts students will write
• offers targeted instruction supporting classroom teaching
Feedback, then, is central to whichever writing pedagogy you subscribe to. For Process adherents, feedback is the key to drafting and learning, moving the focus from a concern with accuracy to the development of meaning. For teachers using a Genre model, feedback guides a student through the Zone of Proximal Development, Vygotsky’s idea of the learning area between what students know and what they can achieve with help. Teachers help students learn by gradually reducing support so that students can write more independently, like an expert training an apprentice, with the scaffolding adjusted for genre, proficiency, and familiarity.
As a result, teachers have come under increasing pressure to provide more, including more personalised, more timely, and more detailed responses to students (Hattie & Timperley, 2007), responses that encourage student engagement (Carless, 2016) and contain do-able recommendations for improvement (Ferris & Kurzer, 2019). Written feedback is supposed to be positive, specific, clear, and to point to revision rather than evaluation.
There is, therefore, no shortage of advice, much of it really admonishment just below the surface. The following points, taken from just one website, A guide to ‘giving feedback that sticks’, are offered, it seems, with little thought for the harassed teacher:
• Start by providing positive feedback – establish a constructive atmosphere
• Be specific and clear
• Prioritise key issues
• Explain your corrections, edits, or comments
• Give specific examples
• Flag common issues once
• Connect comments to class content
• Avoid negative language
• Focus on revision, not just evaluation
• Propose recommendations for enhancement
• Set realistic goals
• Help students acquire the ability to monitor their progress
• Refer students to external resources
• End with additional optimistic commentary or motivation
• Write a cover page with two to four points, lessons, or themes
To provide some context here, a review of data from the Higher Education Statistics Agency (HESA) and the Office for National Statistics (ONS) found that some 15,000 UK academics left the sector between 2014 and 2022, while the number of students increased by more than 400,000. This has pushed up student-staff ratios drastically. While the impact of increased student-staff ratios obviously varies depending on the course and how it is delivered, universities across the board have to contend with fewer staff overall. With ever-larger class sizes, more bureaucratic burdens, and heavier workloads, attempting to comply with these feedback expectations is a daunting prospect for teachers. The Conference on College Composition and Communication (CCCC), the leading body representing US writing teachers, for example, attempts to cap writing courses at 20 students, with an ideal of 15. This is a goal rarely met.
As a result, fatigue and burnout are now common. In the US, for example, female teachers reported an exhaustion rate of 63% in 2024, up from 56% in 2022 and the highest of any occupation. Over 270,000 teachers are projected to resign each year for the next three years in the USA (Jacobs-Pinson, 2025). With teachers pushed to their limits, it isn’t surprising that correcting papers often feels like carrying a ball and chain. At this point, turning to AI seems an option too self-evident to ignore.
3. Automated assistance
Automation, then, has the potential to change all this, improving student writing and lifting some of the grind of giving feedback. In the last few years, we have seen new digital resources promising enhanced support for L2 writing instruction. The rapid development and massive uptake of large language models (LLMs) has been extraordinary. ChatGPT, for example, was the fastest web service in history to reach the 100 million user milestone, in January 2023, and now processes 1 billion queries every day (Singh, 2025).
Automated Writing Evaluation (AWE) systems such as Grammarly, ProWritingAid, and Pigai have been with us for over a decade and assist learners by providing corrective feedback on their writing. Grammarly, for example, can review the spelling, grammar, and tone of a piece of writing as well as identify possible instances of plagiarism. It can also offer style and tone recommendations to users. These programmes have commonly focused on linguistic rather than rhetorical and discoursal properties of texts, but with recently added AI capabilities, they can generate and rewrite content based on prompts. Such systems rely on natural language processing to identify errors in writing and suggest improvements, but with the recent rapid emergence of sophisticated generative AI LLMs, such as GPT-4 by OpenAI and PaLM 2 by Google, AWE enters a new era embracing discoursal aspects of writing.
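To make concrete the kind of local, form-focused checking these AWE systems automate, here is a deliberately simplified sketch in Python. The rules, messages, and sample sentence are invented for illustration; commercial tools such as Grammarly or Pigai rely on far larger rule sets and statistical or neural models rather than a handful of regular expressions.

```python
import re

# A toy set of surface-level rules of the kind AWE tools check automatically.
# These three patterns and their messages are purely illustrative.
RULES = [
    (re.compile(r"\b(a)\s+([aeiouAEIOU]\w*)"), "Use 'an' before a vowel sound: 'an {1}'"),
    (re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE), "Repeated word: '{0} {0}'"),
    (re.compile(r"\s+,"), "Remove the space before the comma"),
]

def check(text: str) -> list[str]:
    """Return indirect, Pigai-style prompts that flag rather than correct errors."""
    feedback = []
    for pattern, message in RULES:
        for match in pattern.finditer(text):
            feedback.append(message.format(*match.groups()))
    return feedback

if __name__ == "__main__":
    sample = "It is a relax place , and I feel happy happy there because it is a interesting beach."
    for comment in check(sample):
        print("-", comment)
```

The point of the sketch is simply that such checks operate on surface patterns; nothing in them reaches the rhetorical or discoursal level discussed below.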
These new digital resources have already proven valuable in supporting L2 writing and teaching, offering automatic translation, in-text error correction, automated scoring systems, and other benefits. AI is able to correct and explain language use, offer example sentences and translations (Kohnke et al., 2023), and scaffold and review students’ argumentative writing (Su et al., 2023). The ability to provide instant local and global written feedback, across multiple drafts, targeted to student needs and in greater quantities, is the greatest potential asset. The hope is that this may increase learner motivation and autonomy while relieving teachers of hours of marking.
But are these empty marketing claims designed by tech companies to raise expectations and sell products? Are we overlooking weaknesses in these tools, and even their dangers, in our enthusiasm to remove some of the drudgery of mundane grammar correction? And can AI really deliver on its promise? I don’t want to rain on the parade, but I think we need to explore the questions I introduced at the beginning of this paper before going full throttle celebrating AI:
1. Is it accurate? Does it respond consistently in ways that reflect target texts?
2. Is it rhetorically appropriate? Can it support the specific disciplinary writing requirements?
3. Is it empathetic? Can it deliver critical feedback usefully and sensitively?
4. Is it valued? Are learners prepared to trust the advice? Which students benefit most?
5. Is it educational? Does it produce better writers or just better texts?
Let’s briefly look at each in turn.
1. Is it linguistically accurate? Does it provide correct information and advice to students on their writing in a useful and consistent way?
To look more closely at this, Victor Zhang and I (Zhang & Hyland, 2018) compared the feedback given to Chinese learners by human raters with the AWE programme Pigai. This naturalistic case study looked at how two Chinese students of English engaged with both teacher and AWE feedback on their writing over a 16-week semester. Drawing on student texts, teacher feedback, AWE feedback, and student interviews, we found that teacher feedback addressed more error types (16) than the AWE feedback (8) (Table 1). In addition, many mechanical errors, such as number and abbreviations, were not diagnosed by the AWE. In other words, the programme failed to spot many important errors, suggesting that its feedback can be inaccurate and mislead students into complacency concerning their own grammatical correctness.
Table 1. Comparison of error focus between teacher feedback and AWE feedback

But while teacher feedback may have advantages over the AWE programme in this area, Pigai highlighted, rather than corrected, student errors, potentially encouraging students to correct their own errors (Table 2).
Table 2. Comparison of feedback type between teacher feedback and AWE feedback

While there may be no clear consensus in the literature regarding the effectiveness of such indirect feedback on learning, I think it is clear that this process can help raise students’ awareness of their mistakes, although it does, of course, require them to review and correct their work based on the feedback they receive. The accuracy of AI feedback, however, can vary depending on the sophistication of the AI model, the aspects of writing being evaluated, and the context in which the feedback is given.
Providing linguistically accurate feedback, however, is currently the big claim for AI, as it adds to the potential of AWE in this area. Certainly it seems to do a good job of identifying errors and suggesting changes to surface features of texts. ChatGPT, for example, can offer immediate feedback on grammar, vocabulary, and coherence, while also identifying key concepts (Kung et al., 2023). So, AI feedback has strengths in these areas:
1. Grammar and syntax: AI tools are generally very effective at identifying grammatical errors, punctuation mistakes, and issues with syntax. They can provide suggestions for corrections that are often accurate and helpful.
2. Clarity and conciseness: Many AI writing assistants can analyse sentence structure and suggest ways to improve clarity and conciseness, helping students express their ideas more effectively.
3. Consistency: AI can help ensure an even writing style, tone, and formatting, useful in longer essays.
4. Plagiarism detection: AI tools can identify potential plagiarism, helping students maintain academic integrity.
5. Structure: Some AI systems can evaluate the structure of an essay, checking for the presence of genre stages such as introductions, conclusions, and transitions.
But before we open the champagne, we might pause and recognise some nagging doubts. The ability of digital tools to deliver useful feedback obviously depends on their effectiveness in analysing texts, but this is far from assured. Curry et al. (2024), for example, found that ChatGPT-4 did a poor job of categorising keywords in specialised texts, made false inferences about concordance lines, and failed to identify and analyse direct and indirect questions, making function-to-form analysis problematic. While this research focused on how well the bot could manage the automated qualitative analysis of texts, it also raises questions about its ability to deliver reliable feedback based on an accurate reading of student writing.
There are also limitations in the feedback it provides. Lin and Crosthwaite’s (2024) comparison of human versus ChatGPT-generated written corrective feedback found that while teachers provided fairly consistent WCF in terms of the amount and the ratio of direct/indirect and local/global feedback, that produced by ChatGPT varied considerably even when using the same prompt on the same text, making it difficult to use when attempting to standardise feedback across larger cohorts. There also seem to be significant differences between the scores given by human raters and those generated by GenAI when grading texts, with ChatGPT being a significantly tougher, although more consistent, marker than teachers (Topuz et al., 2025).
2. Is it rhetorically appropriate?
By this I mean does it accurately reflect the target discourses students are aiming to produce?
While teachers often spend inordinate amounts of time correcting students’ written work, research indicates that students actually desire targeted and personalised learning support, such as discourse-level feedback that goes beyond grammar correction (e.g. Chan & Hu, 2023), and assessments which align with the expectations of academic disciplines (Fox & Artemeva, 2017). And, at first glance, it may seem that AI generates texts which fit this bill in being indistinguishable from human ones. Tools such as GPTZero and AICheatCheck have been developed to detect AI involvement in students’ writing but have so far been unable to reliably distinguish the two (Barrot, 2023; Gao et al., 2023). Scarfe et al. (2024), for example, compared examination submissions generated by ChatGPT and undergraduates, with 94% of AI submissions going undetected. Similarly, Gao et al. (2023) showed that human reviewers identified only 68% of research abstracts created by ChatGPT.
Applied linguists have done rather better at distinguishing machine-generated texts from those written by humans. A growing body of research has begun to show that ChatGPT does not write like humans in terms of linguistic features, rhetorical strategies, or even content (Yang et al., 2024). Indeed, in one comparative study, Herbold et al. (2023) found that expert markers rated ChatGPT-generated essays more highly than those produced by German high school students, with the AI models exhibiting linguistic characteristics distinct from those of the human-written essays. Kong and Liu (2024) also found significant differences between human and AI-generated texts. Examining research article abstracts across four disciplines, they found the abstracts differed in the number and types of rhetorical moves employed, regular move patterns, and the frequency of move embedding.
Kevin Jiang and I have also looked at this. In a series of six papers over the last two years, we have compared undergraduate essays with AI-generated texts on the same topics and of the same length, giving us 145 texts, and around 75,000 words, in each corpus. This was our prompt:
‘You are a competent undergraduate student writer of English texts for academic purposes. Write 16 argumentative essays with a persuasive writing style on the topic of [transport]. Each essay is about 500 words long.’
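For readers curious about the mechanics, a corpus like this could in principle be assembled programmatically. The sketch below uses the official OpenAI Python client with the prompt above; the model name, topic list, and file handling are illustrative assumptions rather than the exact procedure we followed.

```python
# Minimal sketch of building an AI-generated comparison corpus.
# Assumes the official `openai` Python client and an API key in the
# OPENAI_API_KEY environment variable; the model name and topics are
# illustrative, not the settings used in our studies.
from pathlib import Path
from openai import OpenAI

client = OpenAI()
topics = ["transport", "education", "genetic engineering"]  # hypothetical topic list

PROMPT = (
    "You are a competent undergraduate student writer of English texts for "
    "academic purposes. Write 16 argumentative essays with a persuasive writing "
    "style on the topic of [{topic}]. Each essay is about 500 words long."
)

out_dir = Path("gpt_corpus")
out_dir.mkdir(exist_ok=True)

for topic in topics:
    response = client.chat.completions.create(
        model="gpt-4",  # illustrative model choice
        messages=[{"role": "user", "content": PROMPT.format(topic=topic)}],
    )
    # Save each batch of essays as one plain-text file for later corpus analysis.
    (out_dir / f"{topic}.txt").write_text(response.choices[0].message.content, encoding="utf-8")
```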
We used the results to compare the use of various interpersonal features including bundles (Jiang & Hyland, 2024), engagement (Jiang & Hyland, 2025a), metadiscourse (Jiang & Hyland, 2025b), and stance (Jiang & Hyland, 2025c). In terms of stance, we found that the ChatGPT-4 essays, though structurally coherent and logically organised, contained a significantly lower frequency of features such as hedges, boosters, attitude markers, and self-mention (Table 3).
Table 3. Frequency of stance features in the two corpora

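Comparisons of this kind rest on normalising raw counts by corpus size, typically per 1,000 words. The sketch below shows that calculation in Python; the tiny word lists are placeholders for the much fuller inventories of stance items used in the actual analyses, and the two sample sentences are only for demonstration.

```python
import re

# Toy lists standing in for the much fuller inventories of stance markers
# used in the published analyses; frequencies are normed per 1,000 words.
STANCE_ITEMS = {
    "hedges": ["may", "might", "perhaps", "possible", "generally"],
    "boosters": ["clearly", "obviously", "certainly", "definitely"],
    "self_mention": ["i", "my", "we", "our"],
}

def tokenise(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())

def normed_frequencies(text: str, per: int = 1000) -> dict[str, float]:
    tokens = tokenise(text)
    total = len(tokens) or 1
    return {
        category: round(sum(tokens.count(item) for item in items) / total * per, 2)
        for category, items in STANCE_ITEMS.items()
    }

student = "I think we might perhaps need a clearer policy. In my opinion it is obviously unfair."
chatgpt = "Efforts to bridge this divide must be multifaceted, addressing access and skills."

print("student:", normed_frequencies(student))
print("chatgpt:", normed_frequencies(chatgpt))
```

In the published studies, differences in these normed counts were then tested for statistical significance.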
This lower frequency of stance features leads to a more impersonal and expository tone, as exemplified here:
(1) Efforts to bridge this divide must be multifaceted, addressing not just physical access to computers and the internet, but also the skills to use them effectively and the development of relevant content. (ChatGPT essay)
(2) It also has an effect on related industries, including feed production, transport and equipment manufacturing. However, this economic impact must be weighed against the subsidies often provided to the beef industry, which can distort market prices and lead to overproduction. (ChatGPT essay)
The student essays, on the other hand, were more heavily loaded with stance features, showing they were able to clearly distinguish ‘plausible reasoning’ from ‘certain knowledge’ in constructing their arguments. For example, they were able to use hedges to display uncertainty, indirectness, or a desire to reduce the imposition on the reader:
(3) To a certain extent, even if sovereignty was lost, some national personality would be preserved. The fact is that, when united, European unity would still in essence be divided.
(Student essay)
(4) The idea of ‘sovereignty’ is an established term in the world of international relations and is generally accepted to mean that while other states may have a right to have some say in a way that …
(Student essay)
ChatGPT also made significantly less use of self-mention. This has much to do with the engineering of the model, where ChatGPT is ‘merely’ pulling out some ‘coherent thread of text’ from the ‘statistics of conventional wisdom’ that it has accumulated (Wolfram, 2023, p. 96). In so doing, it produces what we recognise as a ‘reasonable continuation’ of text based on what it has learned from billions of internet texts of various kinds, rather than seeking to stamp a personal position on its texts (Jiang & Hyland, 2024, 2025a; Herbold et al., 2023). It also reflects the underlying artificial-neural-net structure of ChatGPT, which mimics an idealised human working without the need for self-referencing. The end result is texts which resemble a kind of impersonal and objective formal academic argumentation rarely seen today:
(5) Efforts to bridge this divide must be multifaceted, addressing not just physical access to computers and the internet, but also the skills to use them effectively and the development of relevant content. (ChatGPT essay)
The students, in contrast, didn’t hesitate to mark their views:
(6) As far as schools and colleges are concerned, I think it is very important to study as wide a field as is possible.
(Student essay)
(7) In my opinion if boxing were to be banned then, not only would many people lose their job but I think that boxing would go ‘back street’.
(Student essay)
This more retiring or ‘invisible’ stance means that ChatGPT is unable to accurately offer feedback on the rhetorical and pragmatic aspects of arguments produced by real human writers. The rather bland and faceless ‘positioning’ of AI bots reflects a more traditional style of argument rarely seen in student essays, making it difficult to depend on ChatGPT for reliable feedback on student writing.
The student essays also display greater efforts to create engagement with readers, employing more devices to solicit reader interaction with the ideas and encourage buy-in to the argument. Table 4 lists the most common ways academic writers seek to engage their readers, together with the normed frequencies in each corpus. These features are the most explicit ways in which writers recognise their readers in the text, acknowledge their expectations of inclusion, and respond to their possible objections and alternative interpretations (Hyland, 2005). Clearly, this does not tell us a great deal about the quality of the essays, as more interactional devices do not necessarily mean more effective texts. But significantly fewer markers of engagement reveal a distinctive characteristic of the AI texts and indicate a gulf in the interactional positions taken in the two corpora.
Table 4. Frequency of engagement in the two corpora

Reader mentions and directives, for example, are designed both to foster a shared perspective and to ensure the argument is understood. They create a more conversational tone and help make the argument more accessible, contributing to the impression of reader-awareness and recipient design in a text. This helps construct an effective line of reasoning, establishing a connection with readers:
(8) In order to solve our transport problems, we need to start taking a more long term view.
(Student essay)
(9) This is why it could be said we don’t use our brains as much.
(Student essay)
Questions, on the other hand, invite direct collusion since the reader is addressed as someone with an interest in the issue raised by the question, the ability to recognise the value of asking it, and the good sense to follow the writer’s response to it. Questions, then, are the main strategy of dialogic involvement, inviting readers to orientate themselves in a certain way to the argument and to enter a frame of discourse where they can be led to the writer’s viewpoint (Hyland, 2002). A considerable proportion of the questions in the student essays combined with inclusive we, as writers interjected questions on behalf of an intelligent reader who is brought into the text through this shared exploration of the topic:
(10) But we ought to ask ourselves ‘What happens when the computer-orientated world collapses?’ We would then have to use our brains.
(Student essay)
(11) But are we right to blame him? Let us consider that he has discovered a cure for cancer as a result of genetic engineering.
(Student essay)
In contrast, ChatGPT essays contained very few questions. Instead, the model prioritises clarity and structural coherence, reflecting its algorithmic nature and training. So, while ChatGPT can generate reasoned and contextually appropriate text, it lacks an inherent understanding of audience. Unlike human writers, who develop a mental model of their readers and adjust their writing accordingly (Hyland & Jiang, 2023), ChatGPT does not possess an intrinsic awareness of who might be reading its output. Similarly, M. Zhang and Crosthwaite (2025) found that ChatGPT excels in generating texts with formal and complex vocabulary suited for academic and technical themes, while undergraduate L2 writers focused on personal and social issues, using more varied and context-rich vocabulary.
This gap between informational load and reader engagement is also evident in the grammar of the texts created by the two sources. Research shows that AI-generated texts exhibit a higher density of content, with more noun- and preposition-based structures (Herbold et al., 2023; Markey et al., 2024), while human-authored texts contain more clause-related bundles and stance markers (Jiang & Hyland, 2024). Markey et al. (2024), for instance, analysed a small corpus of AI-generated texts in two genres and compared them with novice and published academic writers, observing that the ChatGPT texts were more informationally dense than the authored texts and often read as dialogically closed, ‘empty’, and ‘fluffy’, so that they ‘may not be congruent with established trajectories of writing development in higher education’ (p. 571).
A point to be made here is that ChatGPT might be modelling features of academic prose that many teachers would encourage in their students. But we are usually not encouraging students to write in the style of research papers; rather, we want them to write in ways that effectively engage with and persuade teachers, supervisors, and peers. Texts created by ChatGPT’s statistical algorithms therefore appear ‘vaguer and more formulaic’ (Gao et al., 2023, p. 1), reinforcing a reductionist view of writing in which it might be seen as a fixed structure following a system of grammatical rules (Vee, 2023). Because it tends to objectivise academic writing, it ‘fails to take account of the beliefs and knowledge writers assume readers will draw on in reading their texts’ (Hyland, 2026, p. 6). Other evidence supports this view. Yang et al. (2024), for example, found that ChatGPT texts rely more on repetitive connective signals and constant thematic progression, suggesting less careful planning for text development, which leads to redundancy and a simplistic, list-like structure.
So, if ChatGPT cannot write papers in the way students do, it does not have the resources to offer useful feedback on them. How can it evaluate, correct, and advise students on their argumentative writing? As Jiang and Hyland (2025b, p. 26) observe:
The limited use of these strategies in ChatGPT texts suggests a missed opportunity for enhancing reader engagement and interaction, which could contribute to more natural, human-like, and persuasive discourse. This points once again to the model’s training parameters, which seem to prioritize informational flow and the conventions of academic formality over interactive discourse that create reader engagement.
The problem is that AI users, whether students or teachers, cannot determine the source texts which make up its training data. While users can now add plug-ins to bring their relevant data into the chatbot to train ChatGPT, this is only available in the subscription version, ChatGPT Plus, which many teachers are unlikely to have access to. The use of third-party builders like BotSonic or SocialIntents to train a custom chatbot may also be beyond the reach of most teachers. Without these workarounds, we cannot be sure that the output produced by Gen AI is truly representative of writing in the target language domain (Crosthwaite & Baisa, 2023; Flowerdew & Petrić, 2024). As a result, there are ‘strong arguments for the continued use of corpus-based pedagogy to meet students’ specific needs in terms of appropriate lexico-grammar for the discipline, thesis sub-section and topic’ (Flowerdew & Petrić, 2024, p. 10).
3. Is it empathetic?
While feedback is asked to carry a heavy informational load, it is often seen as purely informational, a means of channelling reactions and advice to facilitate improvements. Response is therefore discussed, particularly in debates around automation, as if it were an objective, impersonal, and purely didactic discourse: an interaction between a teacher and a text. But research in neuroscience has found that even when criticism is constructive, we tend to focus on how we feel about it rather than its content (Haden, 2025). So while the information in feedback is a key factor in learning to write, it is only effective if it engages with the writer and gives him or her a sense that this is a response to a person rather than a script. This question, then, addresses whether Gen AI can deliver feedback, particularly critical responses to writing, in a way that is both sympathetic and useful.
Teachers are often sensitive to the impact of their comments and give feedback to students rather than to their texts. They think about the effect of their remarks on the reader, which means they try to tone down criticisms and suggestions. Negative feedback may make students apprehensive about writing and lock them into a cycle of failure, lack of motivation, and further failure. Feedback which is sensitively delivered and interpersonal, however, demonstrates respect for students’ work and builds the confidence which may encourage them to revise and reflect.
In a paper Fiona Hyland and I published 25 years ago called ‘Sugaring the pill’ (Hyland & Hyland, 2001), we identified various mitigation strategies teachers used to comment sympathetically on student work.
• Paired comments: Criticism combined with either praise or a suggestion
‘Vocabulary is good but grammar is not accurate and often makes your ideas difficult to understand.’
‘Good movement from general to specific, but you need to make a clearer promise to the reader.’
• Hedged comments: Modal verbs, imprecise quantifiers, usuality devices
‘Some of the material seemed a little long-winded and I wonder if it could have been compressed a little.’
‘There is possibly too much information here.’
• Personal attribution: Teacher responds as ordinary reader rather than as expert
‘I’m sorry, but when reading this essay I couldn’t see any evidence of this really. Perhaps you should have given me your outline to look at with the essay.’
‘I find it hard to know what the main point of each paragraph is.’
• Interrogative form: Express element of doubt or uncertainty in the comment
‘The first two paragraphs – do they need joining?’
‘Did you check your spelling carefully? Why not make a spelling checklist of words you often get wrong and use this before handing in your final?’
GenAI’s ability to provide empathetic feedback may, of course, partly depend on the prompt used, and specific, meaningful feedback might be elicited by, for example, asking it to identify particular strengths in an essay with examples. ChatGPT itself is confident it can provide this kind of supportive feedback, insisting it can do so through:
1. Tone control
phrases like: ‘Great start!’
‘You’ve used some excellent vocabulary here!’
2. Praise + suggestion format
Positive observation: ‘Your topic sentence is clear.’
Encouraging suggestion: ‘You might make the paragraph even stronger by adding a specific example.’
3. Conversational style
‘I noticed you used the word “interesting” – which works, but maybe we can try a more vivid word? How about “fascinating” or “captivating”?’
It then provided an example:
I like to go the beach on weekend. It is relax place where I can hear the sound of water and see the blue sky. Sometime I bring book or listen music. The sand is warm and soft. I feel happy there.
AI Feedback in a Supportive Tone:
Hi! Thanks for sharing your paragraph – it’s a lovely description of a peaceful place. I can really imagine the relaxing sound of the water and the warmth of the sand – nice job painting a picture!
Here are just a few small suggestions to help make it even better:
‘I like to go the beach’ → Try saying ‘I like to go to the beach’ (just a small missing word).
‘on weekend’ → You could say ‘on the weekend’ or ‘on weekends’ – both are common.
‘It is relax place’ → Maybe change it to ‘It is a relaxing place’. We usually use the -ing form when describing something that makes us feel relaxed.
‘Sometime I bring book or listen music’ → Try ‘Sometimes I bring a book or listen to music.’ Just adding the little words helps it flow better!
You’re doing really well! You used great sensory details like ‘blue sky’ and ‘soft sand’, which makes your writing vivid and enjoyable to read. Keep going – you’re clearly developing a strong voice in English writing!
I don’t know what you think, but I feel that expressions like ‘We usually use the -ing form …’, ‘you could say x’, and ‘we usually use y’ lack specificity and linguistic insight: they aren’t really building students’ grammar knowledge or writing abilities. Meanwhile, the gushing positivity of the praise is all too overwhelming. I suspect that this does not really help to develop students’ ability to understand what they have done wrong or to improve their writing skills. This feedback might be sensitive, but it is not instructional, and the whole tenor oozes artificiality. As teachers are aware, students can recognise insincere feedback, as this Chinese student observes:
‘Sometimes maybe the teacher doesn’t mean it, but they just try to encourage you. […] Because there is always “but” after the positive. Sometimes the teacher just tries to find something good in my essay and then may be that strength is not the main point.’
In other words, the Gen AI feedback seems formulaic while positive feedback lacks sincerity and specificity.
4. Is it trusted and valued by learners?
Much of the available research focuses on the personalised nature of the feedback that AI can instantly provide. But does this swing students towards a preference for AI feedback?
Li et al. (2024) found that both GPT-3.5’s and GPT-4o’s written feedback was highly rated by Chinese tertiary L2 English students as more relevant to their specific needs than teachers’ more general feedback. A systematic review of 52 studies of AWE feedback found that students generally considered automated feedback useful and were motivated when using it, although they criticised a lack of accuracy and explicitness, as the feedback tended to be generic and formulaic (Fu et al., 2022). In addition, teachers’ feedback focused mainly on accuracy and was seen as less relevant regarding content and organisation, while GPT’s feedback was considered balanced across each domain. Chan et al.’s (2024) randomised controlled trial with Hong Kong students arrived at similar findings. The researchers concluded that the group who received AI feedback demonstrated statistically significant improvements in the quality of their writing compared with those receiving teacher feedback, and they also reported higher levels of motivation and engagement during revising.
Escalante et al. (2023) found an equal split among students voting for AI-generated versus human-generated feedback after a six-week course with weekly feedback from both on short assignments. The main reason for preferring tutor feedback was that interacting face to face with a human was seen as ‘more engaging’ than reading through AI-generated feedback. Students liked being able to ask follow-up questions and get immediate responses, which also allowed them to develop their writing and speaking skills at the same time.
Those students preferring AI-generated feedback cited the clarity, understandability, consistency, and specificity of the feedback, especially that relating to academic style and vocabulary. Others appreciated the detailed feedback on errors they received from GPT, while several students also commented on how AI was not constrained by time or availability like human tutors, so they could review the feedback at any time. Similarly, Allen and Mizumoto (2024) discovered that Japanese tertiary EAP students preferred AI-assisted editing and proofreading to student-led writing groups, expressing greater confidence in Gen AI’s ability to produce targeted feedback specific to their own writing.
There are very few studies of teachers’ views, although the teachers in Guo and Wang’s (2023) study considered Gen AI feedback useful for supporting their own feedback, especially as ChatGPT is able to deliver more of it than they can themselves. These teachers also appreciated that the feedback addressed content, organisation, and language-related issues with equal quantity and quality.
EAP students in Wang’s (2024) study, however, questioned whether dependence on Gen AI tools might hinder their development of critical thinking skills or lead to a decline in writing abilities. There is also the problem that navigating AI tools requires an additional skill set that may not be intuitive, or even widespread, especially for students and teachers with limited digital literacy. How should they best interrogate the bot? How can we design effective prompts to instruct it to provide the kind of useful feedback students need? With writing increasingly mediated by digital technologies, digital literacy is becoming indispensable in language teaching. In order to achieve accuracy in grading, teachers need to create a rubric and phrase it for the AI interface in a way which ensures that grades and comments are awarded consistently. ChatGPT, however, does not always give the same results when assessing the same text using a rubric. Getting a confident estimate of a final grade for an essay may involve running it through the AI tool over a hundred times and calculating the mean grade and standard deviation. It could be argued, however, that this is, in some ways, little different from a teacher grading a batch of papers and then returning to regrade one, and the computer can do it in a fraction of the time.
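As a rough illustration of that repeated-grading procedure, the sketch below submits the same essay to the model many times and reports the mean grade and standard deviation. It assumes the official OpenAI Python client, a one-line rubric, a hypothetical essay file, and a model prompted to return a bare number, all of which are simplifications of real practice.

```python
# Sketch of estimating a stable grade by repeated AI marking.
# Assumes the official `openai` client; the rubric wording, model name,
# file name and run count are illustrative only.
import re
import statistics
from openai import OpenAI

client = OpenAI()

RUBRIC = ("Score the essay from 0 to 100 for argument, organisation and language. "
          "Reply with the number only.")
essay = open("student_essay.txt", encoding="utf-8").read()  # hypothetical file

scores = []
for _ in range(100):  # the 'over a hundred' runs mentioned above
    reply = client.chat.completions.create(
        model="gpt-4",  # illustrative model choice
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": essay},
        ],
    ).choices[0].message.content
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match:
        scores.append(float(match.group()))

print(f"mean grade = {statistics.mean(scores):.1f}, sd = {statistics.stdev(scores):.1f}")
```

A large standard deviation here is precisely the inconsistency problem noted above; a small one suggests the rubric is constraining the model reasonably well.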
In terms of learners’ digital literacy, a study of seven Chinese students by Zhang and Hyland (2024) found that the students’ familiarity with digital tools had a considerable impact on how they engaged with AWE feedback, regardless of their L2 proficiency. The researchers characterised digital literacy as an awareness of the affordances and constraints of technologies, an ability to evaluate digital information, and a willingness to use digital technologies for peer collaboration, all of which can be key to effective student engagement with AWE feedback.
Finally, it is worth pointing out that, encouragingly, research suggests EAP students value human feedback. Teng (2024) and Han and Li (2024), for example, found that many students missed the ‘human touch’ of teacher feedback compared with the overly formal written feedback provided by ChatGPT. As I will come to, this suggests that ChatGPT might be usefully integrated with teacher feedback.
5. Is it educationally effective?
A great deal of the discussion around AI and feedback implies that it is a mechanical process simply aimed at improving student texts rather than encouraging the human activity of learning. This question, then, focuses on whether AI can generate feedback that produces better writers rather than just better drafts. In other words, are any improvements merely short-term and simply reflected in a corrected draft, or does GenAI encourage a lasting change in behaviour resulting from the student’s ability to recognise the value of the advice they receive? Can they incorporate it into their repertoire, transferring the skills and applying the knowledge in later contexts? This implies developing critical thinking, enhancing metacognitive awareness, and fostering self-regulated learning.
In an experimental study comparing student writing over a six-week course, Escalante et al. (2023) found no difference in learning outcomes between a group receiving AI feedback and one receiving human tutor feedback. While studies have generally found AI to have a positive effect on writing development, particularly if it leads to substantive revisions (Roscoe et al., 2017), some report inconsistent improvements (see Zhai & Ma, 2023). For example, Stevenson and Phakiti (2019) found that L2 students using AWE reduced error rates across drafts of the same task, but that learning did not transfer across tasks.
We have noted that AI feedback is especially good at:
• Spotting grammar and spelling errors
• Suggesting sentence rephrasing
• Identifying awkward phrasing or word choice
But while this kind of surface-level correction can help improve the current draft, it is unlikely to lead to long-term learning unless the student reflects on and understands the feedback. Effective prompts and continuing dialogues with GenAI seem important. Some studies show that when students engage actively with AI feedback (e.g. using it to revise, reflect on errors, compare drafts, and so on), they improve not just their texts but their language and writing skills. However, passive use – just accepting changes without understanding them – tends to improve only the current text, not the student.
Fuchs (2023), for example, warns that students’ uncritical acceptance of Gen AI feedback may mean they fail to develop academic skills such as evaluating sources for quality and reliability. One aspect of improving writers’ skills is the development of their autonomy, but it is unclear how far AI feedback does this. A meta-analysis of 18 peer-reviewed articles on AI feedback by Mohebbi (2025), for example, found that AI promoted self-regulated learning but flagged the dangers of an over-reliance on technology, which can hinder students’ development of critical writing and editing skills (Kohnke et al., 2023). Specific challenges arising from students’ over-reliance on Gen AI are that it could reduce their creativity, criticality, clarity, and linguistic precision, all of which are key skills allowing them to connect their real-world experiences to their academic writing (Creely, 2024; Crompton et al., 2024). In sum, Gen AI may deprive students of learning opportunities and hinder the development of an academic voice (Barrot, 2023; Park & Ahn, 2024).
The central problem, I think, is that we are still at the gee-whizz stage of AI development, where we are overawed by the tech. Applied linguistics journals and conferences are now swamped by studies of ChatGPT. But, as with many new technical developments, we have put greater emphasis on the AI tools than on learners. As a result, we have invested too much in the belief that feedback can improve L2 students’ writing abilities and consolidate language learning if delivered effectively. Far less attention has been given to whether it is received attentively. But as every teacher knows, the provision of feedback does not automatically lead to writing improvement. The affordances of GenAI are obviously related to student training and to individual learners’ engagement with the advice they receive and actual use of the programme. Cheng et al. (2025), for example, report a study in which doctoral students maintained their academic voice while using ChatGPT for research writing after receiving a comprehensive tutorial on AI literacy.
Rather, we have to look at whether students are engaging effectively with the responses they receive to their work and whether this is sufficient to unlock the benefits of feedback. Broadly, we can see engagement as referring to the extent to which students are invested in or committed to their learning. It is a term which embraces a complex of factors visible in students’ responses to texts and their attitudes to writing and responding. It brings together students’ degree of attention, curiosity, interest, and willingness to employ their language proficiency and learning skills to make progress.
Zhang and Hyland (2022) studied two Chinese undergraduate students and found that they adopted different styles of engagement. The engaged learner participated actively and reflected more deeply on both teacher and AWE feedback on her writing. The unengaged learner was less motivated and showed less willingness to use the feedback he received from both sources. More engaged learners spend more time working with feedback, show more positive attitudes toward feedback, and employ more revising strategies in response to it, which suggests that behavioural, affective, and cognitive engagement with both types of feedback interact with each other. Importantly, they noted that L2 students were more cognitively engaged in correcting grammatical errors when rewriting or reorganising texts after receiving both AWE and teacher feedback than with AWE alone.
Combining machine and teacher feedback, with teachers offering more substantive, higher-order responses following feedback from Gen AI, can produce greater benefits in writing (Han & Sari, 2024; Wilson & Czik, 2016). To improve writing skills, rather than simply texts, the feedback has to go beyond formulaic responses to encourage engagement and learning. This means the feedback needs to be:
• Understandable: Students need to grasp why something is wrong.
• Actionable: It should guide them toward better writing strategies.
• Interactive: Ideally, students should be able to ask ‘why?’ and get explanations.
• Pedagogically supported: Teachers need to integrate AI feedback into lessons and follow it up with instruction.
The challenge for teachers is to educate learners to take control of AI tools and use them meaningfully and responsibly to move learning forward.
4. Conclusions
To sum up, research on the effectiveness of AI in providing feedback on essays has shown mixed results. In broad terms:
a. Positive outcomes: AI feedback can lead to improvements in grammar, organisation, and mechanical, surface-level aspects of writing.
b. Comparative studies: While AI can provide more balanced feedback, this is often impersonal, vague, and limited to surface issues. Human feedback provides deeper insights, pragmatic advice, and more nuanced critiques.
c. User experience: The effectiveness of AI feedback can also depend on students’ digital literacy skills and level of engagement with the feedback. Students who actively reflect on and apply AI suggestions tend to benefit more.
AI can be accurate and effective in providing feedback on essays, particularly for grammar, clarity, and structure. However, it has limitations in understanding context, depth of analysis, and rhetorical elements of writing. But at this point it is important to note that things are changing fast, with models improving over time. Herbold et al.’s (2023) study, for instance, found that while ChatGPT-3 had significantly lower linguistic diversity than humans, ChatGPT-4’s was significantly higher than the students’. Unfortunately, research is barely keeping up with technological developments, which means that this discussion may seem dated by the time it is published.
There is, however, a tendency in all discussions of AI to focus attention on the tools rather than on the learners themselves and the skills they need to engage with automated feedback effectively. Navigating AI tools requires new digital literacy skills, an issue which poses challenges for both teachers and students that need addressing. One problem is that a lot of the research has been conducted outside naturalistic learning contexts, raising questions about its usefulness to classroom teachers. We also need to place far more emphasis on teacher and student training in critical AI literacy. Darvin (2025), for example, argues that by understanding how GenAI platforms trained on large datasets can privilege certain ways of thinking and writing, L2 writers can develop a more critical perspective on how these technologies shape the way we write.
For optimal results, we should consider using AI feedback in conjunction with human feedback and self-reflection to help students develop their writing skills more comprehensively. At its most effective, formative feedback is more than information transmission where students simply receive advice on their texts. Teachers recognise that feedback is a dialogue with students designed to encourage reflection and growth. This understanding of feedback as a social practice requires a greater role for teachers in the process, making it crucial that we leverage AWE and GenAI to create more interactive and collaborative feedback loops rather than static, one-way advice.
The way forward, then, would seem to be a Teacher-GenAI-Student partnership, requiring an investment of effort and engagement by students and a new role for AI:
1. AI as scaffolder
Teachers let students write a first draft, get AI feedback, then hold a mini-lesson on common errors before students revise. AI becomes a teaching tool, not just a fixer.
2. AI as error analyst
Students get AI feedback and then categorise their errors (e.g. tenses, word order, lexical choices). Over time, they build an awareness of language and their writing weaknesses.
3. AI as a writing coach
With improved digital literacy among teachers, it may be possible to set up ChatGPT as a coach students can query about improving clarity, structure, or argumentation in their essays, as sketched below. This interactive format helps students think metacognitively.
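One way such a coach might be configured is shown in this minimal sketch, again assuming the official OpenAI Python client; the system prompt and model choice are illustrative and would need adapting to a particular class and to an institution's AI policy.

```python
# Minimal sketch of an interactive 'AI writing coach' loop.
# Assumes the official `openai` Python client; the system prompt and
# model name are illustrative, not a recommended configuration.
from openai import OpenAI

client = OpenAI()

COACH_PROMPT = (
    "You are a writing coach for L2 undergraduate writers. Do not rewrite the "
    "student's text. Instead, ask questions and give indirect, specific advice "
    "on clarity, structure and argumentation, briefly explaining your reasons."
)

history = [{"role": "system", "content": COACH_PROMPT}]

print("Paste a question or a passage (blank line to quit).")
while True:
    query = input("> ").strip()
    if not query:
        break
    history.append({"role": "user", "content": query})
    reply = client.chat.completions.create(model="gpt-4", messages=history)
    answer = reply.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    print(answer)
```

The crucial design choice here is the system prompt, which instructs the model to question and advise rather than rewrite, so that the student, not the bot, does the cognitive work.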
Adopting such a collaborative model, Han and Li (2024) asked over 100 students to complete two writing tasks, with both corrective and holistic feedback provided by ChatGPT and later modified by teachers for students’ subsequent use. Results suggest the students incorporated more of this co-produced feedback into subsequent revisions, in effect ‘leveraging the analytic strengths of AI while maintaining essential teacher-student interactions’ (p. 1). We cannot, then, allow ourselves to be taken in a direction led by the technology and the megacorporations behind it. Teachers do not want to be replaced by AI, and outcomes are better when they are not. At the core of this balanced approach is fostering student agency and critical AI literacy – the ability to use AI effectively while understanding its limitations and biases.
Ken Hyland is an Honorary Professor at the University of East Anglia, UK. He has published over 330 articles and 30 books on writing and academic discourse with over 100,000 citations on Google Scholar. A fifth edition of his Teaching and Researching Writing and a second edition of his co-edited Handbook of EAP will be published by Routledge in 2026. According to the Stanford/Elsevier analysis of the Scopus database, he has been the most influential scholar in language and linguistics for the past five years (2021-2025). A collection of his work, The Essential Hyland, was published in 2018 by Bloomsbury. He is the Editor of two book series with Routledge and Bloomsbury and a Foundation Fellow of the Hong Kong Academy of the Humanities. He was founding co-editor of the Journal of English for Academic Purposes and co-editor of Applied Linguistics.