28 - Pronunciation

from Part VI - Language Skills and Areas

Published online by Cambridge University Press:  15 June 2025

Glenn Stockwell, Waseda University, Japan
Yijen Wang, Waseda University, Japan

Summary

This chapter addresses pronunciation in second language (L2) learning, which ranges from phoneme-level pronunciation to conversation training. First, the definition of phonemes and their relationship with articulation are explained. Vowels and consonants are classified along several dimensions, and the concept of distinctive features is described. Together, these provide a basis for modeling and identifying phoneme-level pronunciation errors. Suprasegmental features such as stress and rhythm are also addressed. Next, speech analysis methods are described. While formant analysis is effective for diagnosing the pronunciation of vowels, articulatory attribute detection is explored for comprehensive analysis of all phonemes. The chapter then introduces automatic speech recognition (ASR) technology for detecting pronunciation errors. Exercises can be designed around minimal pairs of words, prompted text, or free input. ASR models are also used for pronunciation grading: the goodness of pronunciation (GOP) score is computed for each phoneme and aggregated over all phonemes in the utterance. Nonnative speech modeling is crucial for effective L2 pronunciation learning.
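The GOP computation mentioned in the summary can be illustrated with a minimal sketch. The summary does not give a formula, so the code below assumes one common simplified formulation: the score of a phoneme is the mean log posterior of its canonical label over the frames aligned to it, and the utterance score is the unweighted mean over phonemes. The function names, the frame representation, and the probability floor are all hypothetical choices for illustration and are not taken from the chapter.

```python
import math
from typing import Dict, Sequence, Tuple

# Assumed representation: each frame maps phoneme labels to posterior
# probabilities, as an ASR acoustic model might output them.
Frame = Dict[str, float]
# Assumed alignment format: (canonical phoneme, start frame, end frame exclusive).
Segment = Tuple[str, int, int]


def phoneme_gop(frames: Sequence[Frame], phoneme: str,
                start: int, end: int, floor: float = 1e-10) -> float:
    """Mean log posterior of the canonical phoneme over its aligned frames."""
    if end <= start:
        raise ValueError("segment must contain at least one frame")
    total = 0.0
    for frame in frames[start:end]:
        # Floor the probability so a missing or zero posterior does not give log(0).
        total += math.log(max(frame.get(phoneme, 0.0), floor))
    return total / (end - start)


def utterance_gop(frames: Sequence[Frame],
                  segments: Sequence[Segment]) -> float:
    """Aggregate phoneme-level scores by an unweighted mean (an assumption)."""
    if not segments:
        raise ValueError("at least one segment is required")
    scores = [phoneme_gop(frames, p, s, e) for p, s, e in segments]
    return sum(scores) / len(scores)


# Toy example: the word "rye" with two frames aligned to /r/ and one to /ay/.
frames = [{"r": 0.9, "l": 0.1}, {"r": 0.8, "l": 0.2}, {"ay": 1.0}]
segments = [("r", 0, 2), ("ay", 2, 3)]
print(utterance_gop(frames, segments))
```

Scores closer to zero indicate that the recognizer found the canonical phoneme more plausible. In a real system, a threshold on the phoneme-level score would flag likely errors, and both the threshold and the aggregation would need tuning on nonnative speech.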

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2025

Further Reading

Eskenazi, M. (2009). An overview of spoken language technology for education. Speech Communication, 51, 832–844. https://doi.org/10.1016/j.specom.2009.04.005

This article reviews studies in a variety of areas of spoken language technology in education. It highlights the potential benefits and challenges of incorporating such technology into language learning and assessment.

Pennington, M. C. (2021). Teaching pronunciation: The state of the art 2021. RELC Journal, 52(1), 3–21. https://doi.org/10.1177/00336882211002283

While not dedicated to technology in the teaching of pronunciation, this article brings together past research to show that pronunciation has been relegated to a minor role in communicative language teaching despite its importance. It explores how pronunciation should be taught and includes a discussion of how technology can contribute to improved practice in this regard.

Seileek, A. A., & Elimat, A. K. (2014). Automatic speech recognition technology as an effective means for teaching pronunciation. The JALT CALL Journal, 10(1), 21–47. https://doi.org/10.29140/jaltcall.v10n1.166

In this article, Seileek and Elimat investigate the effectiveness of ASR in improving the pronunciation of EFL learners. The results indicate that ASR technology has the potential to enhance learners’ pronunciation by offering them accurate and timely feedback. However, the authors also recognize the need for additional research and development to optimize the integration of ASR into language education.
