Zaiwa (ISO 639-3 code: atb; Glottocode: zaiw1241) belongs to the Burmese branch of the Tibeto-Burman languages and shares many features with Burmese and Achang, which belong to the same branch. It is primarily spoken by a subgroup of the Jingpo people who identify as ‘Zaiwa’. Besides the Zaiwa, the Jingpo people comprise four other subgroups, each with its own language: Jingpo (景颇), Langsu (浪速), Leqi (勒期), and Bola (波拉). Jingpo belongs to the Jingpo branch of the Tibeto-Burman languages, whereas the other three languages, like Zaiwa, belong to the Burmese branch (He, 2016).
Most Zaiwa speakers live in Luxi (潞西), Yingjiang (盈江), Longchuan (陇川), Ruili (瑞丽), Lianghe (梁河), and Wanding (畹町) counties within the Dehong Dai and Jingpo Autonomous Prefecture (德宏傣族景颇族自治州) of Yunnan province (云南省), as well as in the Shan and Kachin states of Myanmar. Zaiwa is widely used in Zaiwa-dominant areas and in communities with a significant Zaiwa presence. It is used not only in daily life, for example within families and in villages, markets, and shops, but also in public domains, including government and judicial offices and radio broadcasting. Among the languages of the Jingpo people, Zaiwa has the largest number of users. Zaiwa speakers often also speak the languages of other Jingpo subgroups as well as Mandarin Chinese. Owing to the extensive promotion of Mandarin, particularly in education and the media, Mandarin has become the predominant second language of young people in the community. Moreover, in neighboring regions and in mixed communities where the Zaiwa subgroup is prominent, members of other ethnic groups such as the Achang, Han, Dai, and Lisu also frequently speak Zaiwa.
According to China’s Sixth National Population Census (2010), the Jingpo ethnic group numbers approximately 140,000. There were over 80,000 Zaiwa speakers within China, constituting more than 60% of the total Jingpo ethnic population in the country (He, 2016). The phonetics of Zaiwa has been studied by Xu and Xu (1984), Dai (1989), Kong (2001), Pan (2014), He (2016), Lu and Kong (2019), and Lu et al. (2025).
Previous research has generally divided Zaiwa into two principal dialects (He, 2016): the Bengwa dialect (迸瓦方言), which is spoken in Wuchalu Township (五岔路乡), Yingjiang County, and parts of Xishan Township (西山乡) in Mang County (芒市), and the Longzhun dialect (龙准方言), which is spoken in Zhefang (遮放) and Manghai (芒海) Townships and the southern part of Xishan Township in Mang County. The Longzhun dialect is also spoken across Longchuan and Ruili Counties.
The present Illustration is based on data collected from a female native Zaiwa speaker from Ruili County. As shown in Figure 1, Ruili is located in the southwestern area of the Dehong Dai and Jingpo Autonomous Prefecture. It is adjacent to Mang County to the east, borders Longchuan County to the north, and shares its northwest, southwest, and southeast borders with Myanmar. As of September 2022, Ruili’s total area was 944.75 square kilometers. At the end of 2021, the county’s permanent resident population totaled 226,639.
The speaker was 26 years old at the time of recording. She was born and raised in Huyu Township of Ruili County and had not spent long periods outside the area. She completed her education at Dehong Teachers’ College, specializing in Zaiwa, and is presently employed by a Village Committee of the Ruili Government. The recording equipment consisted of a laptop computer, an external sound card, a mixing console, and a unidirectional collar clip microphone. The recording software was Adobe Audition 2023, with a sampling rate of 44.1 kHz and 16-bit resolution. Recordings were made in a quiet room. Basic speech parameters were extracted using Praat (Boersma & Weenink, 2020). Chao’s five-point pitch scale is used to transcribe the tones throughout this article (Chao, 1980). Vowels with pressed voice are marked with the “creaky-voiced” diacritic; the distinction between pressed and modal voice vowels is described in detail in the Vowels section.

Figure 1. Distribution of Zaiwa. The areas of Zaiwa are shaded pink. The map is drawn from C1-24 of Language Atlas of China, second edition.
Consonants
As shown in the table below, the consonants of Zaiwa comprise 36 phonemes, including 22 simple consonants and 14 palatalized consonants. All obstruents are voiceless; there are no voiced obstruents and no consonant clusters. Plosives and affricates contrast in aspiration. Aspirated consonants and fricatives cannot combine with vowels with pressed voice. The approximant /ʋ/ has an allophone [w] before the monophthong /a/. Since the fricative /f/ generally appears only in Chinese loanwords, such as /fa21min21/ ‘invention’, examples of it are not provided in the following text.

The following minimal and near-minimal pairs illustrate the contrasts summarized above:

Plosives
There are 13 plosives in Zaiwa, all voiceless, at bilabial, alveolar, and velar places of articulation. All plosives contrast in terms of aspiration. Voice Onset Time (VOT) is close to zero for unaspirated stops and much longer for aspirated stops. Figure 2 presents spectrograms of /tha51/ ‘to argue/rebuke’ and /ta35/ ‘time’ (measure word), demonstrating the difference in VOT between the two. In /ta35/, the stop release and the onset of vocal fold vibration occur almost simultaneously, resulting in a VOT of almost 0 ms, whereas the VOT of /tha51/ is approximately 96 ms. As shown in Figure 3, the VOT of the unaspirated plosives /p t k/ is short, with voicing beginning within approximately 12 ms of the release. By contrast, the VOT of the aspirated plosives /ph th kh/ ranges from 100 to 160 ms. In addition to appearing in onset position, the unaspirated plosives /p/, /t/, and /k/ can also appear as codas following vowels, whereas /ʔ/ can appear only in coda position.
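The VOT values discussed above reduce to the interval between two annotated time points. The following is a minimal Python sketch, assuming that the stop release and the voicing onset have already been marked by hand; the function names and the token times are hypothetical placeholders chosen for illustration and are not measurements from the recordings.

```python
from statistics import mean, stdev

def vot_ms(release_s, voicing_onset_s):
    """VOT in milliseconds: voicing onset time minus stop release time."""
    return (voicing_onset_s - release_s) * 1000.0

def summarize(vots_ms):
    """Mean and sample standard deviation over the tokens of one plosive."""
    return mean(vots_ms), stdev(vots_ms)

# Hypothetical (release, voicing onset) times in seconds for five tokens
# of one aspirated plosive; placeholder values, not data from the study.
tokens = [(0.100, 0.196), (0.250, 0.352), (0.400, 0.498),
          (0.550, 0.655), (0.700, 0.799)]
vots = [vot_ms(release, onset) for release, onset in tokens]
avg, sd = summarize(vots)
```

A summary of this kind, computed per plosive over five tokens, is the sort of quantity a plot of means and standard deviations would display.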

Figure 2. Waveforms and spectrograms for the minimal pairs /tha51/ ‘argue, rebuke’ and /ta35/ ‘time’ (measure).

Figure 3. VOT of plosives in Zaiwa, where unaspirated and aspirated plosives are indicated by distinct colors. The mean and standard deviations were calculated using five tokens of each plosive from a single speaker.
Affricates
There are six affricates in Zaiwa. The spectrogram in Figure 4 shows that the aspirated affricate consists acoustically of a stop release burst, frication, and an aspiration interval, whereas the unaspirated affricate has only a release burst and frication. As shown in Figure 5, the VOT of unaspirated affricates is much shorter than that of aspirated affricates.

Figure 4. Waveforms and spectrograms for the minimal pairs /tse51/ ‘items’ and /tsʰe51/ ‘ten’.

Figure 5. VOT of affricates in Zaiwa, where unaspirated and aspirated affricates are indicated by distinct colors. The mean and standard deviations were calculated using five tokens of each affricate from a single speaker.
Fricatives
Five fricatives are produced at three places of articulation: alveolar, postalveolar, and velar. Fast Fourier Transform (FFT) spectra (made with a 23-ms window centered on the peak of noise intensity) of the five fricatives of Zaiwa are provided in Figure 6. The energy distributions of the five fricatives differ. The energy of /s/ is mainly concentrated between 8,000 and 12,000 Hz. The energy of /ʃ/ is primarily between 4,000 and 6,000 Hz, and that of /x/ is between 2,000 and 4,000 Hz. The energy of /ʃj/ is primarily around 6,000 Hz and that of /xj/ is primarily around 4,000 Hz. Additionally, the peak frequency and center of gravity of the palatalized fricatives’ spectra are higher than those of their non-palatalized counterparts.
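The two spectral measures mentioned above, peak frequency and center of gravity, can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure used for the measurements reported here: it assumes a Hann-windowed frame, a naive discrete Fourier transform, and a power-weighted center of gravity, and all function names are hypothetical. A 23-ms frame at 44.1 kHz would contain about 1,014 samples; the check below uses a 1,000-sample synthetic tone so that the expected result is known.

```python
import cmath
import math

def power_spectrum(samples, sample_rate):
    """Naive DFT power spectrum of a Hann-windowed frame (bins 0..N/2)."""
    n = len(samples)
    windowed = [s * 0.5 * (1 - math.cos(2 * math.pi * i / n))
                for i, s in enumerate(samples)]
    freqs, power = [], []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        freqs.append(k * sample_rate / n)
        power.append(abs(acc) ** 2)
    return freqs, power

def centre_of_gravity(freqs, power):
    """Power-weighted mean frequency of the spectrum."""
    return sum(f * p for f, p in zip(freqs, power)) / sum(power)

def peak_frequency(freqs, power):
    """Frequency of the bin with the highest power."""
    return freqs[max(range(len(power)), key=power.__getitem__)]

# Synthetic check: a pure 4,410 Hz tone (exactly 100 cycles in the frame),
# so both measures should come out at 4,410 Hz.
SR = 44100
frame = [math.sin(2 * math.pi * 4410 * i / SR) for i in range(1000)]
freqs, power = power_spectrum(frame, SR)
cog = centre_of_gravity(freqs, power)
peak = peak_frequency(freqs, power)
```

For real frication noise the two measures generally differ, since the center of gravity reflects the whole energy distribution rather than a single spectral peak.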

Figure 6. FFT (Fast Fourier Transform, blue) and LPC (Linear Predictive Coding, red) spectra (made with a 23-ms window centered on the peak of noise intensity) of the frication in /sa51/ ‘hand basket’, /ʃai51/ ‘naughty’, /ʃje51/ ‘loose’, /xe51/ ‘swing, deny’, and /xje51/ ‘that’.
Nasals
The six nasals of Zaiwa are bilabial (/m/, /mj/), alveolar (/n/, /nj/), and velar (/ŋ/, /ŋj/). /m n ŋ/ can occur in both onset and coda position, while palatalized /mj nj ŋj/ occur only in onset position. Syllabic nasals also occur in certain words, for example, /m̩51/ for ‘mm-hmm’.
Lateral approximants and approximants
There are two lateral approximants, /l lj/, and three central approximants, /ʋ/, /ɹ/, and /j/, in Zaiwa. They are produced at three places of articulation: labiodental, alveolar, and palatal. Xu and Xu (1984) described /ʋ/ and /ɹ/ as the voiced counterparts of the voiceless consonants /f/ and /ʃ/, respectively, and labeled them as the voiced phonemes /v/ and /ʒ/. Subsequent empirical and comparative studies have argued that /ʋ/ and /ɹ/ in Zaiwa are more akin to approximants (Pan, 2014). We adopt the latter analysis on phonetic grounds. In our data, /ʋ/ and /ɹ/ are excited mainly by voicing, with only slight frication, as shown in Figure 7. Furthermore, we identified [w] as an allophone of /ʋ/, which also favors treating /ʋ/ as an approximant. Treating /ʋ/ and /ɹ/ as approximants accounts for both the phonological configuration and the phonological evolution, and it also aligns more closely with the actual pronunciation.

Figure 7. Waveforms and spectrograms for /ʋa21/ [wa] ‘bamboo’ and /ɹa21/ ‘need’.
Palatalized consonants
In Zaiwa, the bilabial, alveolar, and velar plosives, the postalveolar affricates and fricative, the velar fricative, the nasals, and the lateral approximant each have a palatalized counterpart, namely, /pj/, /phj/, /tj/, /thj/, /ʧj/, /ʧhj/, /ʃj/, /kj/, /khj/, /xj/, /mj/, /nj/, /ŋj/, and /lj/. In the spectrogram for plain /p/ in Figure 8(a), F1 remains stable at approximately 1,000 Hz, whereas F2 starts at 1,150 Hz, slowly rises to 1,350 Hz, and then remains stable. In the spectrogram for palatalized /pj/ in Figure 8(b), F1 starts at 400 Hz, gradually rises to 1,000 Hz, and then remains stable, whereas F2 falls from 3,000 Hz to 2,300 Hz and then remains steady. These F1 and F2 trajectories resemble a transition from a high front vowel to a low central vowel. Figure 9 compares the mean F2 onset after plain consonants and after their palatalized counterparts: the F2 onsets after palatalized consonants were significantly higher than those after plain consonants.
Vowels
Monophthongs



Figure 8. Waveforms and spectrograms for the minimal pairs /pa51/ ‘silly’ versus /pja51/ ‘perform’. The red dotted lines represent the formants.
Zaiwa has five vowel qualities, /i e a o u/, each of which contrasts in phonation (modal versus pressed voice), giving 10 monophthongs. The vowels /i ḭ/ are high front vowels, /e ḛ/ are mid-high front vowels, /a a̰/ are low central vowels, /o o̰/ are mid-low back vowels, and /u ṵ/ are high back vowels. All 10 vowels can occur in closed syllables with the nasal codas /m n ŋ/ and the stop codas /p t k ʔ/. The vowel chart in Figure 10 was plotted from the relative F1 and F2 values of the vowel phonemes. For each vowel, we chose 10 syllables with different consonants and used Praat to extract formant values from the stable portion of the vowel. [ɤ] is an allophone of /i/ and /ḭ/, and [ə] is an allophone of /e/ and /ḛ/. The conditions under which these two variants occur are described below.

Figure 9. The mean F2 onset after plain versus palatalized consonants. The mean and standard deviations were calculated using eight tokens of each consonant from the single speaker.

Figure 10. Formant plots for the monophthongs of Zaiwa. F1 and F2 of each vowel are based on mean formant values of 10 open syllables. The ellipses show the F1 and F2 values to two standard deviations.
The phonemes /i/ and /ḭ/ are realized as [ɤ] and [ɤ̰] after the velar consonants /k/, /kh/, and /x/, after bilabials and alveolars, and after the approximant /ʋ/, whereas they remain unchanged after the palatalized counterparts of bilabial, alveolar, and velar consonants. For example, /pik31/ ‘to shoot’ is realized as [pɤk31]. The spectrograms in Figure 11 for [pɤk31] ‘to shoot’ and /tjik55/ ‘urgent, fast’ show that F1 and F2 are markedly closer together in [pɤk31] than in /tjik55/.

Figure 11. Waveforms and spectrograms for [pɤk31] ‘to shoot’ and /tjik55/ ‘urgent, fast’. The red dotted lines represent the formants.
The phonemes /e/ and /ḛ/ are realized as [ə] and [ə̰] after bilabial, alveolar, and velar consonants, after the labiodental approximant /ʋ/, and in syllables without an onset, whereas they remain unchanged after the palatalized counterparts of bilabial, alveolar, and velar consonants. For example, /pe35/ ‘fall off’ is realized as [pə35]. The spectrograms in Figure 12 for [pə35] ‘fall off’ and /tje35/ ‘send’ show that F1 and F2 are considerably closer together in [pə35] than in /tje35/. Figure 13 shows the spectrograms of the two syllables /me21/ ‘green blue’ and /mik21/ ‘greedy’, which are realized as [mə21] and [mɤk21], respectively.
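The two realization rules described above can be summarized as a small lookup. The Python sketch below is illustrative only: the function name is hypothetical, the membership of the bilabial, alveolar, and velar classes is an assumption of the sketch (the text names the classes but does not list every member), and onsets for which the text states no rule are simply left unchanged.

```python
# Onset classes. The exact membership of each set is an assumption made
# for this sketch; the text names the classes without listing all members.
BILABIAL = {"p", "ph", "m"}
ALVEOLAR = {"t", "th", "ts", "tsh", "s", "n", "l"}
VELAR_FOR_I = {"k", "kh", "x"}   # velars named in the text for /i/
VELAR = {"k", "kh", "x", "ŋ"}

# /i/ is centralized after these onsets; /e/ also after a null onset ("").
I_CONTEXTS = BILABIAL | ALVEOLAR | VELAR_FOR_I | {"ʋ"}
E_CONTEXTS = BILABIAL | ALVEOLAR | VELAR | {"ʋ", ""}

PRESSED = "\u0330"  # combining tilde below, marking pressed voice

def realize(onset, vowel, pressed=False):
    """Return the surface vowel for a phonemic onset + vowel sequence."""
    surface = vowel
    if vowel == "i" and onset in I_CONTEXTS:
        surface = "ɤ"
    elif vowel == "e" and onset in E_CONTEXTS:
        surface = "ə"
    # Palatalized onsets (pj, tj, kj, ...) are in neither set, so the
    # vowel is unchanged after them, as described in the text.
    return surface + (PRESSED if pressed else "")
```

For instance, `realize("p", "i")` gives `"ɤ"`, matching /pik31/ realized as [pɤk31], while `realize("tj", "i")` leaves the vowel as `"i"`, matching /tjik55/.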

Figure 12. Waveforms and spectrograms for the minimal pairs [pə35] ‘fall off’ versus /tje35/ ‘send’. The red dotted lines represent the formants.
Vowel phonation
The distinction between vowels with pressed and modal voice is a pivotal feature of the phonology of Zaiwa. After reviewing the manifestations and combinatorial rules of pressed and modal voice vowels across various Tibeto-Burman languages, Dai (1989) argued that this distinction is a critical characteristic of a subset of Tibeto-Burman languages. Regarding the origin of the distinction, Dai (1989) pointed out that the opposition in the Tibeto-Burman languages of China resulted from the convergence of two different paths: one derived from the voiced versus voiceless contrast in onset consonants (noted in languages such as Zaiwa and Jingpo), and the other from the contrast between checked and unchecked vowels (observed in languages such as Yi, Lahu, Lisu, and Hani). For example, one of the falling tones in Lisu is represented as being ‘checked’ by a glottal stop (Tabain, 2019). The former path involves changes in vowel quality conditioned by the elements preceding the vowel, whereas the latter involves changes conditioned by the elements following the vowel. Consonants both before and after the vowel thus influence vowel change, driving vowels toward a pressed versus modal opposition, albeit under different conditions. This is a distinctive feature of the development of the vowel system within the Tibeto-Burman language family, separating it from other language families. From the perspective of phonation types, Kong (2001) indicated that the vowels with modal and pressed voice in Zaiwa belong to different phonation types. The vowels with modal voice are largely modal but with a slightly breathy voice quality, while the vowels with pressed voice are produced with a highly complex form of creaky voice.
Thus, we calculated corrected H1*–H2* using Matlab for five minimal or near-minimal pairs of vowels with pressed and modal voice in Zaiwa. The findings, presented in Figure 14, reveal a large difference in H1*–H2* between vowels with pressed and modal voice. This suggests that vowels with pressed voice are produced with constricted vocal folds.
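To make the measure concrete, the following Python sketch computes the uncorrected H1–H2, that is, the amplitude difference in dB between the first and second harmonics. It is a simplified illustration under stated assumptions, not the Matlab procedure used for the results reported here: it assumes f0 is already known, estimates each harmonic with a single-frequency DFT, and omits the formant correction that the asterisks in H1*–H2* denote. The function names and the synthetic signal are hypothetical.

```python
import cmath
import math

def harmonic_db(samples, sample_rate, freq_hz):
    """Amplitude in dB of the component at freq_hz (single-frequency DFT)."""
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * i / sample_rate)
              for i, x in enumerate(samples))
    return 20 * math.log10(2 * abs(acc) / n)

def h1_minus_h2(samples, sample_rate, f0_hz):
    """Uncorrected H1-H2: level of the first harmonic minus the second."""
    return (harmonic_db(samples, sample_rate, f0_hz)
            - harmonic_db(samples, sample_rate, 2 * f0_hz))

# Synthetic check: f0 = 210 Hz with a second harmonic at half the amplitude
# of the first; 2,100 samples at 44.1 kHz hold exactly 10 periods.
SR = 44100
F0 = 210.0
frame = [math.sin(2 * math.pi * F0 * i / SR)
         + 0.5 * math.sin(2 * math.pi * 2 * F0 * i / SR)
         for i in range(2100)]
difference_db = h1_minus_h2(frame, SR, F0)  # expected: 20*log10(2) dB
```

A lower H1–H2 value is generally associated with a more constricted glottis, which is why the measure is commonly used to compare pressed and modal phonation.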
Cooccurrence restrictions between consonants and vowels
Vowels with a modal voice can combine with all consonants, while vowels with a pressed voice can only combine with unaspirated plosives, unaspirated affricates, nasals, lateral approximants, and approximants.
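The restriction above can be expressed as a simple membership check. The Python sketch below is illustrative: the function name is hypothetical, and the onset symbols are an assumed reconstruction of the 35 onset consonants from the classes named in the text (the glottal stop is excluded because it occurs only in coda position).

```python
# Onsets that may precede a pressed voice vowel, per the classes in the text.
PRESSED_OK = {
    "p", "t", "k", "pj", "tj", "kj",      # unaspirated plosives
    "ts", "ʧ", "ʧj",                      # unaspirated affricates
    "m", "n", "ŋ", "mj", "nj", "ŋj",      # nasals
    "l", "lj",                            # lateral approximants
    "ʋ", "ɹ", "j",                        # approximants
}
# Onsets that may not: aspirated plosives and affricates, and fricatives.
PRESSED_BLOCKED = {
    "ph", "th", "kh", "phj", "thj", "khj",
    "tsh", "ʧh", "ʧhj",
    "f", "s", "ʃ", "x", "ʃj", "xj",
}

def can_combine(onset, pressed):
    """True if the onset can precede a vowel of the given phonation."""
    if onset not in PRESSED_OK | PRESSED_BLOCKED:
        raise ValueError(f"onset not covered by this sketch: {onset!r}")
    # Modal vowels combine with every onset; pressed vowels only with
    # the onsets listed in PRESSED_OK.
    return (not pressed) or onset in PRESSED_OK
```

For example, `can_combine("p", pressed=True)` is true, whereas `can_combine("ph", pressed=True)` and `can_combine("s", pressed=True)` are false.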


Figure 13. Waveforms and spectrograms for [mə21] ‘green blue’ and [mɤk21] ‘greedy’. The red dotted lines represent the formants.

Figure 14. H1*–H2* of vowels with modal and pressed voice. The means and standard deviations were calculated using five tokens of each phonation type from the single speaker.
Diphthongs


In Zaiwa, the primary vowels /a o u/ combine with the codas /-i/ and /-u/ to form eight diphthongs with modal and pressed voice: /ai a̰i oi o̰i ui ṵi au a̰u/, all of which are rising diphthongs. Figure 15 plots F1 and F2 of the diphthongs measured at the 20% and 80% points, which represent the onset and offset values of each diphthong.

Figure 15. Formant plots for the diphthongs of Zaiwa. F1 and F2 of each vowel are based on mean formant values of 5 syllables. The ellipses show the F1 and F2 values to 2 standard deviations.
Tones

The Zaiwa tone system is rather complex. Xu and Xu (1984) suggested that Zaiwa has three tones, with values of 21, 55, and 51. Dai (1989) identified two ways of categorizing Zaiwa tones. The first groups tones by the similarity of their tone values and merges checked and unchecked tones into a single category. This classification yields three tones, marked as 21, 55, and 51, with the 21 and 55 categories each encompassing both checked and unchecked variants. The second separates checked from unchecked tones, yielding five tones: three unchecked (21, 55, 51) and two checked (21, 55). Vowels with the stop codas /p t k ʔ/ can combine only with checked tones. He (2016) agreed that Zaiwa has three tones but argued for revising the tone values to 22, 55, and 51. Furthermore, past studies have shown that the high level tone 55 changes to 35 on modal vowels preceded by unaspirated plosives and affricates, nasals, lateral approximants, or approximants. With other tones, vowels can be either pressed or modal in phonation. Lu & Kong (2019) observed that the Zaiwa tonal system has six tones for monosyllabic words, including two level tones, three falling tones, and one rising tone, with tone values of 55, 44, 51, 31, 21, and 35, respectively. As the phonation types of Zaiwa vowels vary between modal and pressed voice and are closely integrated with tones, Lu & Kong (2019) also confirmed the restrictions on the occurrence of tone 35 reported in previous research.
Based on the study by Lu & Kong (2019) and the current survey, we undertook an acoustic analysis of the Zaiwa tones, in which the fundamental frequency (f0) of each monosyllabic utterance was extracted. When selecting monosyllabic words, we preferred words beginning with unaspirated stops to facilitate the extraction of f0. The f0 values of each tone were extracted using Praat (Boersma & Weenink, 2020). Each tonal contour was obtained by averaging across six tokens. Tonal values were marked using the five-point pitch scale developed by Chao (1980), in which 5 is the highest and 1 is the lowest. As shown in Figure 16, the tone values of the checked and unchecked tones differ. Checked tone 55, marked by a stop coda /p, t, k, ʔ/, was approximately one semitone higher in f0 than unchecked tone 55. The checked tone was therefore transcribed as 55 and the unchecked tone as 44. Similarly, for tone 21, the checked tone with a stop coda /p, t, k, ʔ/ was approximately 31, whereas the unchecked tone was around 21. Given also the considerable duration difference between checked and unchecked tones, we posit that distinguishing between them aligns more closely with perception. In addition, there is a significant difference in f0 between the high-level tone and its variant 35, leading us to conclude that they should be distinguished as two separate tones. We also found that the combination of tone 35 with consonants and vowels agrees with the findings of previous researchers.
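Two computations are implicit in the analysis above: expressing an f0 difference in semitones and mapping f0 values onto the five-point scale. The Python sketch below illustrates one way to do both. The text does not state how f0 was converted to tone values, so the logarithmic normalization used here is an assumption of the sketch, and the function names and the example frequencies are hypothetical.

```python
import math

def semitone_difference(f_a_hz, f_b_hz):
    """Interval from f_b to f_a in semitones (12 semitones per octave)."""
    return 12 * math.log2(f_a_hz / f_b_hz)

def chao_level(f0_hz, f_min_hz, f_max_hz):
    """Map f0 onto levels 1-5 of the speaker's log-scaled pitch range.

    Assumes f_min_hz <= f0_hz <= f_max_hz. The log normalization is an
    assumption of this sketch, not a procedure stated in the text.
    """
    t = 5 * (math.log(f0_hz) - math.log(f_min_hz)) / (
        math.log(f_max_hz) - math.log(f_min_hz))
    return min(5, int(t) + 1)

def chao_value(start_hz, end_hz, f_min_hz, f_max_hz):
    """Two-digit tone value from the f0 at the start and end of a contour."""
    return (f"{chao_level(start_hz, f_min_hz, f_max_hz)}"
            f"{chao_level(end_hz, f_min_hz, f_max_hz)}")

# Placeholder pitch range of 100-200 Hz (not the speaker's actual range):
# a contour falling from the top to the bottom of the range maps to "51".
example = chao_value(200.0, 100.0, 100.0, 200.0)
```

Under this normalization each of the five levels spans one fifth of the speaker's log pitch range, so a difference of about one semitone can move a level tone across a level boundary only when the range is narrow or the tone lies close to a boundary.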

Figure 16. F0 contours of Zaiwa tones for the one speaker.
We agree with Lu & Kong (2019) that the tonal system of Zaiwa should be described as follows: Zaiwa has six tones, comprising two level tones, one rising tone, and three falling tones, with tone values of 55, 44, 35, 51, 31, and 21, respectively. Among these, tones 31 and 55 are checked tones, in that they occur only on syllables ending in a stop coda /p, t, k, ʔ/, while tones 21, 44, and 35 are unchecked tones, meaning that they cannot occur on syllables ending in a stop coda. The occurrence of rising tone 35 is conditional: the onset must be an unaspirated plosive or affricate, a nasal, a lateral approximant, or an approximant, and the vowel must have modal voice. Vowels in the other tones can have either modal or pressed voice.
Syllable structure
There are seven syllable structures, namely V, VV, C, VC, CV, CVV, and CVC. Type C consists only of the syllabic nasals /m/ and /n/, which occur as modal particles; it should therefore be treated as extra-linguistic and not as part of the lexical phonology. As shown in Table 1, syllable types V and VV comprise the monophthongs and diphthongs described above. In syllable types VC, CV, CVV, and CVC, the onset may be any consonant except the glottal stop, and the coda may be one of the nasals /m n ŋ/ or the stops /p t k ʔ/.
Table 1. Syllable structure and examples of Zaiwa

Transcription of the recorded passage
The passage used for the recordings is the story ‘The Wind and the Sun’, which is transcribed using the consonants, vowels, and tones of Zaiwa described above. The transcription below is a broad phonemic one. Following International Phonetic Alphabet conventions, the symbol | indicates a minor intonational break (corresponding to final lengthening), while ‖ denotes a major intonational break preceding a pause. Each sentence is presented in three lines: the first is an IPA transcription; the second, in italics, is a transcription in the Zaiwa writing system; and the third is the interlinear morphemic gloss. Abbreviations used in the glosses follow the Leipzig Glossing Rules (LGR, https://www.eva.mpg.de/lingua/pdf/Glossing-Rules.pdf). The only non-standard abbreviation (not included in the LGR) is PREP = preposition. A free English translation is also provided in quotation marks.

Acknowledgments
This work was supported by the National Social Science Fund of China (No. 22&ZD213). We are grateful to the journal’s editors, Marc Garellek and Marija Tabain, as well as the two anonymous reviewers, for their detailed comments and insightful suggestions, which have significantly contributed to improving the analysis and presentation of the Illustration.
Supplementary material
To view supplementary material for this article (including audio files to accompany the language examples), please visit https://doi.org/10.1017/S002510032510073X

