Pronunciation

doi:10.1017/9781009294850.034

28 - Pronunciation

from Part VI - Language Skills and Areas

Published online by Cambridge University Press: 15 June 2025

Tatsuya Kawahara and Masatake Dantsuji

Edited by Glenn Stockwell (The Education University of Hong Kong) and Yijen Wang (Waseda University, Japan)

Summary

This chapter addresses pronunciation in second language (L2) learning, ranging from phoneme-level pronunciation to conversation training. First, the definition of phonemes and their relationship with articulation are explained. Vowels and consonants are classified according to different dimensions. The concept of distinctive features is also described. These provide a basis for modeling and identifying phoneme-level pronunciation errors. Suprasegmental features such as stress and rhythm are also addressed. Next, speech analysis methods are described. While formant analysis is effective for diagnosing the pronunciation of vowels, articulatory attribute detection is explored for comprehensive analysis of all phonemes. The chapter then introduces automatic speech recognition (ASR) technology to detect pronunciation errors. Practice settings can be designed around minimal pairs of words, prompted text, or free input. ASR models are also used for pronunciation grading. The goodness of pronunciation (GOP) score is computed for each phoneme and aggregated over all phonemes in the utterance. Nonnative speech modeling is crucial for effective L2 pronunciation learning.
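The summary states only that a GOP score is computed per phoneme and then aggregated over the utterance; it does not give a formula. The sketch below is therefore an illustration under stated assumptions, not the chapter's method. It assumes that an ASR model has already aligned the speech to the expected phonemes and supplies per-frame phoneme posteriors, and it uses one commonly seen formulation: the frame-averaged log ratio between the posterior of the expected phoneme and the posterior of the best-scoring phoneme. The names `phoneme_gop`, `utterance_gop`, and the toy posterior values are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

# Assumption: one frame maps each phoneme label to its posterior probability.
Frame = Dict[str, float]


def phoneme_gop(frames: Sequence[Frame], target: str, floor: float = 1e-10) -> float:
    """Frame-averaged log(P(target) / max_q P(q)) for one aligned segment.

    The score is 0 when the expected phoneme is the best hypothesis in
    every frame and becomes more negative as competitors win.
    """
    if not frames:
        raise ValueError("segment has no frames")
    total = 0.0
    for frame in frames:
        p_target = max(frame.get(target, 0.0), floor)  # floor avoids log(0)
        p_best = max(max(frame.values()), floor)
        total += math.log(p_target) - math.log(p_best)
    return total / len(frames)


def utterance_gop(segments: Sequence[Tuple[str, Sequence[Frame]]]) -> float:
    """Aggregate by taking the unweighted mean of the per-phoneme scores."""
    if not segments:
        raise ValueError("utterance has no segments")
    scores = [phoneme_gop(frames, target) for target, frames in segments]
    return sum(scores) / len(scores)


# Toy data (invented): the learner should say /r/ then /l/,
# but the second segment sounds more like /r/ to the model.
segments = [
    ("r", [{"r": 0.7, "l": 0.3}, {"r": 0.6, "l": 0.4}]),
    ("l", [{"r": 0.8, "l": 0.2}]),
]

for target, frames in segments:
    print(target, round(phoneme_gop(frames, target), 3))
print("utterance", round(utterance_gop(segments), 3))
```

In this sketch a per-phoneme score well below zero flags a likely mispronunciation, and the utterance-level mean serves as a rough overall grade. A real system would choose the decision threshold and the aggregation weights empirically, ideally with acoustic models adapted to nonnative speech.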

Keywords

computer-assisted pronunciation learning (CAPL); intelligibility; articulation; distinctive features; suprasegmental features; formant frequencies; articulatory attributes; automatic speech recognition (ASR); goodness of pronunciation (GOP)

Information

Type: Chapter
Information: The Cambridge Handbook of Technology in Language Teaching and Learning, pp. 460–478

DOI: https://doi.org/10.1017/9781009294850.034

Publisher: Cambridge University Press

Print publication year: 2025
