Ethical AI for language assessment: Principles, considerations, and emerging tensions

Evelina Galaczi; Carla Pastorino-Campos

doi:10.1017/S0267190525100081

Published online by Cambridge University Press: 01 September 2025

Evelina Galaczi: Cambridge University Press and Assessment, Cambridge, UK
Carla Pastorino-Campos*: Cambridge University Press and Assessment, Cambridge, UK
*Corresponding author: Carla Pastorino-Campos; Email: carla.pastorino@cambridge.org

Abstract

Many language assessments – particularly those considered high-stakes – have the potential to significantly impact a person’s educational, employment and social opportunities, and should therefore be subject to ethical and regulatory considerations regarding their use of artificial intelligence (AI) in test design, development, delivery, and scoring. It is timely and crucial that the community of language assessment practitioners develop a comprehensive set of principles that can ensure ethical practices in their domain of practice as part of a commitment to relational accountability. In this chapter, we contextualize the debate on ethical AI in L2 assessment within global policy documents, and identify a comprehensive set of principles and considerations which pave the way for a shared discourse to underpin an ethical approach to the use of AI in language assessment. Critically, we advocate for an “ethical-by-design” approach in language assessment that promotes core ethical values, balances inherent tensions, mitigates associated risks, and promotes ethical practices.

Keywords

ethical use of AI; language assessment; human-centred AI; fairness; validity

Information

Type: Research Article
Information: Annual Review of Applied Linguistics, First View, pp. 1–21

DOI: https://doi.org/10.1017/S0267190525100081
Copyright: © The Author(s), 2025. Published by Cambridge University Press.
