
Ethical AI for language assessment: Principles, considerations, and emerging tensions

Published online by Cambridge University Press: 01 September 2025

Evelina Galaczi
Cambridge University Press and Assessment, Cambridge, UK

Carla Pastorino-Campos*
Cambridge University Press and Assessment, Cambridge, UK

*Corresponding author: Carla Pastorino-Campos; Email: carla.pastorino@cambridge.org

Abstract

Many language assessments – particularly those considered high-stakes – can significantly affect a person's educational, employment and social opportunities, and should therefore be subject to ethical and regulatory scrutiny of their use of artificial intelligence (AI) in test design, development, delivery, and scoring. It is timely and crucial that language assessment practitioners develop a comprehensive set of principles to ensure ethical practice in their field, as part of a commitment to relational accountability. In this chapter, we contextualize the debate on ethical AI in L2 assessment within global policy documents, and identify a comprehensive set of principles and considerations that pave the way for a shared discourse underpinning an ethical approach to the use of AI in language assessment. Critically, we advocate for an "ethical-by-design" approach to language assessment that upholds core ethical values, balances inherent tensions, mitigates associated risks, and fosters ethical practices.

Type: Research Article
Copyright: © The Author(s), 2025. Published by Cambridge University Press.

