1. Introduction
Complex speech segments can be defined as single segments articulated with more than one constriction (Ladefoged & Maddieson, 1996). For example, the Arabic emphatic stop /tˤ/ is produced with a primary constriction in the alveolar region and a secondary constriction in the pharyngeal region (e.g., Al-Ani, 1970). Previous research indicates that complex speech segments are usually acquired late (between 5 and 8 years of age) – for example, liquids in English (Shriberg, 1993), doubly articulated stops in Igbo (Nwokah, 1986), and emphatic obstruents in Arabic (Amayreh, 2003; Amayreh & Dyson, 1998).
The current study investigates how Arabic-speaking children aged 3–6 years acquire complex emphatic consonants. By analysing the acoustic properties of these sounds in child-produced speech, we aim to gain insight into their developmental trajectory and the linguistic factors that may influence their acquisition.
1.1. Articulatory and acoustic characteristics of emphatics
Emphatic consonants, also known as “pharyngealised” or “uvularised” consonants, are a set of sounds in Arabic that are produced with two constrictions: a primary constriction in the coronal region and a secondary constriction whose exact location is subject to debate. Some researchers refer to these sounds as pharyngealised, while others argue they are uvularised segments (Al-Ani, 1970; Al-Tamimi et al., 2009; Al-Tamimi & Heselwood, 2011; Al-Tamimi, 2015, 2017; Bin-Muqbil, 2006; Ghazeli, 1977; Laufer & Baer, 1988; Zawaydeh, 1999; Zawaydeh & de Jong, 2011). In Modern Standard Arabic (MSA), which serves as the written and formal language of the Arab world, the four emphatic consonants /sˤ tˤ dˤ ðˤ/ are maintained and clearly distinguished from each other (e.g., Al-Ani, 1970). However, variations in the realisation of emphatic consonants are evident across different spoken varieties of Arabic. For example, /dˤ/ is realised as [ðˤ] in most of the Gulf and Tunisian dialects, while /ðˤ/ is realised as either [dˤ] or [zˤ] in Hijazi, Egyptian, and Lebanese dialects (Watson, 2002).
Arabic emphatics differ acoustically from their plain counterparts. The primary acoustic correlate of emphasis is the lowering of the second formant (F2) of adjacent vowels, most prominently in /a/ compared to /i/ and /u/ (Algryani, 2014; Bin-Muqbil, 2006; Bukshaisha, 1986; Card, 1983; Hassan & Esling, 2011; Israel et al., 2012; Jongman et al., 2011; Kalaldeh & Al-Shdaifat, 2019; Khattab et al., 2006; Laufer & Baer, 1988; Zawaydeh & de Jong, 2011). Other cues to emphasis may include raising of the first and third formant frequencies (F1 and F3) of adjacent vowels and, for stops, shorter voice onset time (VOT) and lower spectral mean of the burst (Abudalbuh, 2010; Al Malwi, 2017; Al-Khairy, 2005; Khattab et al., 2006). In addition to affecting immediately adjacent vowels, the emphasis feature may also spread to adjacent syllables with dialect-specific parameters for the domain and direction of spread, as well as the types of segments that block spreading (Alhammad, 2014; Al-Masri & Jongman, 2004; Card, 1983; S. Davis, 1995; Israel et al., 2012; Younes, 1993).
Moreover, there may be some gender differences in the acoustic realisation of the plain–emphatic contrast. Findings, however, are inconsistent. In some studies, females were reported to show larger acoustic differences between emphatic and plain consonants than males – for example, greater lowering of F2 (Al-Masri & Jongman, 2004; Almbark, 2008; Almuhaimeed, 2021; Kahn, 1975). Other studies have reported that males exhibit stronger cues to emphatics than females (Abudalbuh, 2010; Lehn, 1963; Wahba, 1993).
The Arabic variety of interest in this study is the Saudi Arabian dialect, particularly the urban Hijazi dialect spoken in Taif City (Alzaidi et al., 2019). In this dialect, /tˤ/ and /sˤ/ are realised without allophonic variation and contrast phonemically with the plain (non-emphatic) counterparts /t/ and /s/, respectively (Table 1). The current study is, therefore, only concerned with the voiceless emphatics /tˤ/ and /sˤ/.
Table 1. The Arabic voiceless emphatic obstruents and their plain counterparts

1.2. The acquisition of emphatics
Previous research on the phonological development of Arabic-speaking children classifies the emphatic consonants among the late-acquired sounds (Al-Awaji, 2014; Al-Buainain et al., 2012; Alqattan, 2015; Amayreh, 2003; Amayreh & Dyson, 1998; Ammar & Morsi, 2018; Ayyad, 2011; Dyson & Amayreh, 2000; Hamdan & Amayreh, 2007; Saleh et al., 2007). Typical production errors include emphatic omissions for very young children (1–2 years) or de-emphasis for older children (3–8 years), that is, no secondary articulation, resulting in neutralisation of the plain–emphatic contrast – for example, /tˤ/ may be realised as [t] (Abdoh, 2011; Alajroush, 2019; Al-Awaji, 2014; Al-Buainain et al., 2012; Alqattan, 2015; Amayreh & Dyson, 1998; Ammar & Morsi, 2018; Dyson & Amayreh, 2000; Mashaqba et al., 2022; Saleh et al., 2007).
The generally late acquisition of emphatics has been linked to factors commonly suggested to account for late acquisition across other segment types: articulatory complexity, frequency of occurrence, and functional load (Contreras Kallens et al., 2023; Stokes & Surendran, 2005). First, the required precise coordination between articulatory gestures of the front part and root of the tongue may slow down the acquisition of emphatics, with de-emphasis often cited as evidence. Second, input frequency and functional load are widely recognised as key factors in early phonological development; sounds that occur more frequently in the input and serve to distinguish more minimal pairs are generally acquired earlier (Pye et al., 1987; Stokes & Surendran, 2005; Surendran & Niyogi, 2003). In the case of Arabic, emphatic consonants rank between 16th and 27th in frequency out of the 28 consonants in MSA (Amayera et al., 1999, as cited in Mashaqba et al., 2022), which Amayera et al. suggest may account for their late acquisition. Amayreh (2003) similarly attributes the delayed acquisition of emphatics in Jordanian Arabic to their low functional load, although no measures thereof were provided. However, assessing the impact of these factors for any spoken variety of Arabic, including Saudi-Hijazi Arabic, is complicated by the lack of a (digitised) phonemic lexicon and frequency counts (e.g., Al-Thubaity, 2015).
As Ingram and Babatsouli (2024) note, a complete understanding of phonological development depends on research into under-represented languages, such as Arabic, to which many assumptions drawn from well-studied languages may not necessarily apply.
Most estimates of the age of acquisition for emphatics are based on adults’ perceptual judgements of children’s production and vary substantially across studies, from 2;06 to 7;11. These variations could be due to dialect differences, linguistic factors, and methodology (Mashaqba et al., 2022), or a combination of factors. In terms of dialect, earlier acquisition of (some) emphatics has been observed for Egyptian and Kuwaiti children at 2;06 and 3;00, respectively (Alqattan, 2015; Morsi, 2003), and later acquisition for Ammani and Syrian children, at 4;06 – 4;11 or 6;00 (Amayreh & Dyson, 1998; Mashaqba et al., 2022; Owaida, 2015). Linguistic factors, such as manner of articulation and word position, may also affect the age of acquisition, in line with well-documented trends cross-linguistically (Jakobson, 1968). The emphatic stops are typically acquired before the emphatic fricatives – for example, 3;10 for /tˤ/ versus 4;07 for /sˤ/ for Kuwaiti children (Ayyad, 2011), and word-initial (WI) emphatics are acquired earlier (between 6;00 and 6;11) than word-final (WF) (7;00 and 7;11) for Jordanian children (Mashaqba et al., 2022). Such linguistic factors are not always systematically addressed in studies on emphasis acquisition, highlighting the need for more detailed investigation to fully capture their influence across different developmental contexts.
One prior study has supplemented perceptual judgements with acoustic analysis to investigate the acquisition of Arabic emphatic consonants in 2- to 7-year-old Ammani-Arabic-speaking children (Mashaqba et al., 2022). Using the criteria for the customary, acquisition, and mastery stages of consonant development proposed by Sander (1972), emphatics were customarily produced by age 3 years – more than 50% of the children were able to produce these sounds correctly as judged by adults. The acquisition stage was considered to occur by age 4 years with at least 75% correct production. The mastery stage was by age 5 years, with over 90% of children producing emphatics correctly.
Despite the mastery stage starting at 5 years, the acoustic parameters of children’s productions, particularly F2 − F1 values of adjacent vowels, were not adult-like for all manners of articulation and word positions until well over the age of 6 years (see Footnote 1). For example, children produced acoustically adult-like emphatics earlier in WI and word-medial (WM) positions, while emphatics in WF positions were not adult-like until the age of 7;01 – 7;11. Moreover, the fact that children produced acoustically non-adult-like realisations, even when adults judged their productions largely to be correct, suggests that the fine motor control and specific articulatory gestures required to produce adult-like emphatics may take longer to develop (Zharkova, 2018). Conversely, for children who were perceived to neutralise the contrast, for example, their target /tˤ/ was perceived as [t], the acoustic analysis showed a statistically significant contrast in F2 − F1 between vowels adjacent to plain and emphatic consonants. This suggests that the phonological contrast was covertly produced (Barton & Macken, 1980; see Footnote 2).
The work by Mashaqba et al. (2022) thus highlights that acoustic measurements can capture gradual, age-related changes during the acquisition process of emphatics, both before acquisition begins and after it is completed according to adult judgements. However, while Mashaqba et al. (2022) acoustically examined children’s production of the plain–emphatic contrast, they focused solely on one type of acoustic cue to the contrast, that is, vocalic cues, which may not fully reflect the range of strategies children might use to realise these contrasts. For instance, children might realise the plain–emphatic contrast by only implementing a vocalic difference, such as producing a back vowel. Thus, it is important to consider both the consonantal and vocalic cues when examining children’s production of the plain–emphatic contrast, which may shed light on how children realise this contrast.
1.3. The current study
The current study, therefore, aimed to build on Mashaqba et al.’s (2022) acoustic work by examining the developmental trajectory of the production of emphatic consonants in 3- to 6-year-old children who speak Saudi Arabic, a dialect that is still understudied with respect to children’s phonological and phonetic development in general (Abdoh, 2011; Alajroush, 2019; Turki, 2022). Productions were elicited for stops and fricatives, in WI, WM, and WF positions. Acoustic characterisation of children’s emphatics was conducted by examining not only formant frequencies of adjacent vowels but also VOT durations of stop consonants.
We hypothesised that the acquisition process would be gradual (Macken & Barton, 1980; Mashaqba et al., 2022). Therefore, older children may show larger acoustic differences between plain and emphatic consonants than younger children. We additionally expected that the plain–emphatic contrast might be larger in initial than in final positions, especially for younger compared to older children (Amayreh, 2003; Mashaqba et al., 2022). For WM position, a larger acoustic contrast was expected for vowels immediately preceding than immediately following the target consonant (Al-Masri & Jongman, 2004; Zawaydeh, 1999). We also expected a larger plain–emphatic contrast for stops compared to fricatives (Amayreh, 2003; Jakobson, 1968). Finally, gender differences might be observed such that females may produce larger contrasts compared to males (Al-Masri & Jongman, 2004; Almbark, 2008; Almuhaimeed, 2021; Kahn, 1975). Because a comprehensive description of emphatics in the Saudi-Hijazi dialect is currently still missing from the literature, adult participants were also included, serving as reference data.
2. Methods
2.1. Participants
Thirty-eight monolingual Arabic-speaking children aged between 3;02 and 6;11 from Saudi Arabia participated (M age = 5;01, female = 20, male = 18) – see Table A.1 in Supplementary Appendix A for more details. All children were reported by their parents to have typical cognitive and language development. Thirteen native Saudi Arabic-speaking adults between 17 and 38 years old (M age = 24, female = 5, male = 8) were also recruited.
The study was approved by the Macquarie University Faculty of Medicine, Health, and Human Sciences Human Research Ethics Sub-Committee. Adult participants and parents or legal guardians of child participants signed an informed consent form before the experiment. Verbal assent was obtained from children before the start of the procedure.
2.2. Stimuli
Target consonants examined in this study were the alveolar voiceless emphatic obstruents and their plain counterparts, that is, the oral-stop contrast /tˤ/ vs. /t/ and the fricative contrast /sˤ/ vs. /s/. Consonants were elicited in real words with a ˈCa.CaC structure, in WI and WM positions for the stops, and WI, WM, and WF positions for the fricatives.
The selection of stimulus items to elicit these targets was guided by the need to balance phonological control with developmental appropriateness. While minimal pairs are often preferred in phonetic research for isolating segmental contrasts, they were not always feasible for the current study due to constraints on lexical familiarity to young children, morphosyntactic structure (more details below), and picturability. As such, the study employed 15 near-minimal word pairs (for a total of 30 words), comprising 6 oral-stop pairs and 9 fricative pairs and contrasting an emphatic and plain target consonant within each pair (see Table 2 for examples; for a full list of stimuli, see Table A.2 in Supplementary Appendix A).
Table 2. Sample stimuli for /tˤ/ versus /t/ and /sˤ/ versus /s/ in word-initial, word-medial, and word-final positions

Abbreviations: SG, singular; 3, third person; M, masculine.
All target words were real, picturable nouns or verbs, and were selected to be familiar to young children. Familiarity was assessed through an informal lexical survey involving 14 Saudi-Hijazi Arabic-speaking parents of children aged 2;02–7;11 (M age = 4;02). A word was deemed familiar if at least 60% of parents reported that their child understood it.
In designing the stimulus set, additional morphosyntactic and phonological constraints were considered. In Arabic, WF /t/ usually occurs as a suffix to mark the third-person singular feminine as in (1) or the first-person singular as in (2). As a result, it was difficult to find word pairs where the plain /t/ in WF position was part of the stem, not a suffix/morpheme. Moreover, young Arabic-speaking children primarily use verbs that are marked for singular masculine, the default or unmarked verb form, as in (3), before developing the verbal agreement paradigm after the age of 4 years (Aljenaie, 2010; Omar, 1973). Therefore, a pair like (4a,b) is likely to be produced as (5a,b) by the youngest age group in this study (3 years old), where the final /t/ would be omitted, preventing comparison between the members of the pair.

Therefore, word pairs with the target stops in the final position were excluded. Consequently, target stops were restricted to WI and WM positions, whereas target fricatives were elicited in all three word positions.
To minimise phonetic differences between stimulus words, all target stimulus words were controlled for number of syllables (all disyllabic with the structure ˈCV.CVC) and vowels (all low front vowel /a/). Stress was fixed on the initial syllable, such that WI targets occurred in stressed syllables and medial/final targets in unstressed positions. Apart from the target emphatics, words did not include any other emphatic consonants, to avoid emphasis spread from non-target consonants, which could itself induce F2 lowering. Additionally, to minimise any potential effect of adjacent non-target consonants, non-target consonants nearest to the target consonant within each pair were matched (where possible) for place of articulation.
Two high-frequency, picturable practice items (/moːz/ “banana” and /sajːarah/ “car”) were included at the beginning of the task to familiarise participants with the elicitation format. All auditory stimuli were recorded by the first author, a native speaker of Saudi-Hijazi Arabic, using a high-quality lavalier microphone (Rode Lavalier GO) and Zoom H2n recorder in a quiet room. The pictures and audio recordings were then piloted with a small sample of adults and children to ensure that they effectively elicited the target sounds.
2.3. Procedure
Data collection was conducted in a hybrid in-person and online format. Speech recordings were collected by a trained research assistant who was present with participants to set up recording equipment and closely monitor the experiment. Stimuli were presented to participants in person on a MacBook Pro laptop. Experiment instructions were delivered to participants online via a videotelephony platform (Zoom). The first author monitored the progression of the experiment and participants’ performance via the remote-control function in Zoom. Either before or after the experiment, adult participants and child participants’ parents filled out a short language background questionnaire (see Supplementary Appendix B).
Data collection took place in schools, daycares, and participants’ homes. In each location, recordings were made in the quietest room available. To improve the acoustic conditions of the recordings, foam isolation shields were placed to the sides and in front of participants to reduce background noise and reverberation. Participants’ responses were recorded using a Rode Lavalier GO omnidirectional lavalier microphone connected to a Zoom H2n voice recorder.
Two practice trials were implemented first and were repeated until participants understood the task. Target consonants were then elicited through a single-word repetition task with a picture prompt. In this task, a picture on a computer screen coupled with an audio prompt of the picture’s name was presented to participants. Participants were then instructed to repeat the word five times. To minimise any potential effects of presentation order, stimuli were pseudorandomised by participant, with the restriction that successive trials alternated between words with a target emphatic and words with a target plain consonant.
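One way to satisfy the alternation restriction described above is to shuffle the emphatic and the plain items separately and then interleave them. The following is a minimal sketch under that assumption; the function name, the placeholder word labels, and the random choice of which consonant type comes first are illustrative and are not taken from the study's materials.

```python
import random


def alternating_order(emphatic_words, plain_words, seed=None):
    """Return a trial order in which emphatic and plain targets alternate.

    Hypothetical helper: each list is shuffled independently, the starting
    consonant type is chosen at random, and the two lists are interleaved.
    Assumes both lists have the same length (15 words each in this design).
    """
    if len(emphatic_words) != len(plain_words):
        raise ValueError("expected equally many emphatic and plain words")
    rng = random.Random(seed)
    emphatic = list(emphatic_words)
    plain = list(plain_words)
    rng.shuffle(emphatic)
    rng.shuffle(plain)
    if rng.random() < 0.5:
        first, second = emphatic, plain
    else:
        first, second = plain, emphatic
    order = []
    for a, b in zip(first, second):
        order.extend([a, b])
    return order


# Placeholder labels standing in for the 15 near-minimal pairs.
emphatic_items = [f"E{i:02d}" for i in range(1, 16)]
plain_items = [f"P{i:02d}" for i in range(1, 16)]
trial_order = alternating_order(emphatic_items, plain_items, seed=1)
```

Passing a different seed per participant would give each participant a different order while preserving the alternation between consonant types.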
2.4. Data selection and exclusion criteria
Out of the five repetitions of each target word, only the three most spontaneous and clear repetitions (e.g., those without hesitations or extremely high or low intonation) were considered for acoustic analysis. These were mostly the three middle repetitions. This resulted in a total of 3,403 tokens produced by children and 1,170 tokens produced by adults (see Footnote 3).
Tokens were then excluded from the consonant and vowel analyses if they exhibited excessive background noise, had a clipped waveform signal, or included mispronunciations or disfluencies that rendered the acoustic measurements unreliable for addressing the research questions (e.g., /fasˤal/ realised as /stˤatar/) (n = 96). Additionally, tokens where the target acoustic parameters could not be measured were excluded from the consonant analysis (n = 18). In most cases, this was caused by the absence of a meaningful closure in a target stop consonant in medial position, which may imply that the target stop was realised as a fricative-like segment. This resulted in a total of 4,459 tokens (children = 3,305; adults = 1,154) selected for further analyses.
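The token counts reported in this section can be cross-checked with simple arithmetic. The sketch below uses only figures stated in the text; the variable names are illustrative.

```python
# Tokens selected before exclusions (three repetitions per word).
child_tokens_initial = 3403
adult_tokens_initial = 1170

# A complete adult data set: 13 adults x 30 words x 3 repetitions.
adult_design_total = 13 * 30 * 3

# Exclusions reported in the text.
excluded_noise_or_mispronunciation = 96
excluded_unmeasurable = 18

total_initial = child_tokens_initial + adult_tokens_initial
total_retained = (total_initial
                  - excluded_noise_or_mispronunciation
                  - excluded_unmeasurable)

# Retained tokens per group, as reported.
child_tokens_retained = 3305
adult_tokens_retained = 1154

# Per-group exclusions implied by the reported counts.
child_excluded = child_tokens_initial - child_tokens_retained
adult_excluded = adult_tokens_initial - adult_tokens_retained

print(total_initial, total_retained, child_excluded, adult_excluded)
```

The initial total minus the two exclusion counts equals the reported 4,459 retained tokens, and the per-group differences sum to the same 114 excluded tokens.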
2.5. Data analyses
Acoustic annotation. Selected data were automatically segmented using the Munich Automatic Segmentation System (MAUS) forced aligner (Kisler et al., 2017; Schiel, 1999) with the Arabic grapheme-to-phoneme converter (Al-Tamimi et al., 2022). Segment boundaries were hand-corrected in Praat (Boersma & Weenink, 2023) according to the following acoustic criteria: VOT for stop consonants in initial and medial positions was marked from the onset of the burst release to the onset of periodicity in the waveform (Figures 1 and 2; see Footnote 4). In cases where multiple bursts were observed, the first burst was taken as the onset of the burst release (e.g., Davis, 1994; Sučková, 2020; Tar, 2014; Yang, 2018).

Figure 1. Example of the acoustic annotation of target segments for /tafal/ in word-initial position: (1) represents the VOT interval for the target consonant /t/ and (2) represents the interval of the following target vowel /a/.

Figure 2. Example of the acoustic annotation of target segments for /batˤal/ in word-medial position: (1) represents the interval of the preceding target vowel /a/, (2) represents the interval of the closure duration for the target consonant /tˤ/, (3) represents the VOT interval, and (4) represents the interval of the following target vowel /a/.
Vowel onsets were identified by the first presence of F2, while vowel offset was marked at the end of F2. For WI and WF tokens, only the vowel immediately adjacent to the target consonant was considered for analysis (Figures 1 and 3). For WM, both vowels preceding and following the target consonant were considered (Figure 2).

Figure 3. Example of the acoustic annotation of target segments for /ragasˤ/ in word-final position: (1) represents the interval of the preceding target vowel /a/ and (2) represents the interval of the target consonant /sˤ/.
Blind to the original hand-correction, a second coder hand-corrected around 10% of the automatically segmented data (children = 343; adults = 112 tokens). For stop consonants, VOT measurements are typically considered in agreement when they differ by <10 ms (Fabiano-Smith & Bunta, 2012; Stoehr et al., 2017). However, in our data, we chose a smaller range (5 ms) to increase the chances of finding VOT differences associated with the plain–emphatic contrast. For vowels, measurements were considered in agreement when they differed by <15 ms for adults and <20 ms for children. Percentage agreement was 98% for adults and 100% for children for burst onsets, 94% for adults and 98% for children for vowel onsets, and 95% for adults and 99% for children for vowel offsets.
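The agreement criterion described above amounts to counting the boundary pairs whose absolute difference falls below a tolerance. A minimal sketch follows, assuming boundary times in milliseconds; the function name and the example values are invented for illustration and are not the study's data.

```python
def percent_agreement(coder_a_ms, coder_b_ms, tolerance_ms):
    """Percentage of boundary pairs that differ by less than tolerance_ms."""
    if len(coder_a_ms) != len(coder_b_ms) or not coder_a_ms:
        raise ValueError("need two non-empty lists of equal length")
    agreed = sum(1 for a, b in zip(coder_a_ms, coder_b_ms)
                 if abs(a - b) < tolerance_ms)
    return 100.0 * agreed / len(coder_a_ms)


# Invented burst-onset times (ms) from two coders, for illustration only.
first_coder = [102.0, 250.5, 399.0, 512.0]
second_coder = [104.0, 251.0, 406.0, 511.0]

# Stricter 5 ms criterion used for bursts in this study.
burst_agreement_5ms = percent_agreement(first_coder, second_coder, 5)
# Conventional 10 ms criterion from the VOT literature cited above.
burst_agreement_10ms = percent_agreement(first_coder, second_coder, 10)
```

With these invented values, the third pair differs by 7 ms, so it counts as a disagreement under the 5 ms criterion but as an agreement under the 10 ms criterion.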
Formant measurements. Formant trajectories were first automatically estimated, then hand-corrected using an interactive Praat script (Szalay et al., 2022) that visualises the trajectories overlaid on wide-band spectrograms and enables researchers to identify and manually correct misalignments in the formant trajectories. To minimise any potential effect of the adjacent non-target consonants, formant values were averaged over the initial 35% of the vowel following the target consonant in WI and WM positions, as well as the final 35% of the vowel preceding the target consonant in WM and WF positions.
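The windowed averaging described above can be sketched as follows, assuming a formant track given as parallel lists of sample times and frequency values. The function name, its parameters, and the example track are hypothetical; the actual measurements were made with the Praat script cited above.

```python
def mean_formant_in_window(times_ms, values_hz, vowel_onset_ms,
                           vowel_offset_ms, edge="initial", proportion=0.35):
    """Average formant samples over the initial or final part of a vowel."""
    duration = vowel_offset_ms - vowel_onset_ms
    if duration <= 0:
        raise ValueError("vowel offset must follow vowel onset")
    if edge == "initial":
        # Vowel follows the target consonant: use the first 35%.
        start, end = vowel_onset_ms, vowel_onset_ms + proportion * duration
    elif edge == "final":
        # Vowel precedes the target consonant: use the last 35%.
        start, end = vowel_offset_ms - proportion * duration, vowel_offset_ms
    else:
        raise ValueError("edge must be 'initial' or 'final'")
    selected = [v for t, v in zip(times_ms, values_hz) if start <= t <= end]
    if not selected:
        raise ValueError("no formant samples fall inside the window")
    return sum(selected) / len(selected)


# Invented F2 track (Hz), sampled every 10 ms over a 100 ms vowel.
sample_times = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
sample_f2 = [1100, 1150, 1200, 1250, 1300, 1350,
             1400, 1450, 1500, 1550, 1600]

f2_after_consonant = mean_formant_in_window(
    sample_times, sample_f2, 0, 100, edge="initial")
f2_before_consonant = mean_formant_in_window(
    sample_times, sample_f2, 0, 100, edge="final")
```

For a word-medial target, both calls would apply: the final window of the preceding vowel and the initial window of the following vowel.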
Statistical analyses. All visualisations and analyses were conducted in R statistical software (R Core Team, 2021) using the following packages: lme4 (Bates, Mächler, et al., 2015) and lmerTest (Kuznetsova et al., 2017) for linear mixed-effect models (LMMs), sjPlot to generate tables (Ludecke et al., 2015), and ggplot2 (Wilkinson, 2011) to generate plots.
Six LMMs were fitted separately in R for adults and children with either logged VOT (one adult model and one child model) or the F2:F1 ratio (two adult models and two child models) as the dependent variable. In the vowel models, the ratio of the first two formants (F2:F1) was taken as the dependent variable (rather than the difference [F2 − F1], as Mashaqba et al. [2022] used). This is considered a normalisation technique because formant ratios are less sensitive to developmental changes in the size of the vocal tract and other physiological properties that often vary with age and sex, making it more reliable for reducing interspeaker variability (Sapir et al., 2010).
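The motivation for the ratio can be illustrated under a simplifying assumption: if a shorter vocal tract scaled both formants by the same factor, the difference F2 − F1 would scale with it, whereas the ratio F2:F1 would be unchanged. Real vocal tracts do not scale uniformly, so the following is an idealised sketch with invented formant values, not a description of the study's data.

```python
def formant_difference(f1_hz, f2_hz):
    """F2 - F1 in Hz."""
    return f2_hz - f1_hz


def formant_ratio(f1_hz, f2_hz):
    """F2:F1 as a unitless ratio."""
    return f2_hz / f1_hz


# Invented formant values (Hz) for an /a/-like vowel.
adult_f1, adult_f2 = 700.0, 1400.0

# Hypothetical uniform scaling factor standing in for a shorter vocal tract.
scale = 1.25
child_f1, child_f2 = adult_f1 * scale, adult_f2 * scale

adult_diff = formant_difference(adult_f1, adult_f2)
child_diff = formant_difference(child_f1, child_f2)
adult_ratio = formant_ratio(adult_f1, adult_f2)
child_ratio = formant_ratio(child_f1, child_f2)
```

Under this assumption the difference grows from 700 Hz to 875 Hz purely because of the scaling, while the ratio stays at 2.0 for both speakers.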
All models included fixed main effects for Consonant type (Emphatic [E] = −0.5, Plain [P] = +0.5) and Gender (Male [M] = −0.5, Female [F] = +0.5). For the children’s model, we also included two orthogonal polynomial terms for Age (in months, created using R’s poly() function), linear and quadratic, as visualisation of the children’s data showed curvilinear trends with age. One or two further fixed main effects were included in the analyses, depending on the exact data subset (as detailed below), and all interactions between all main effects were included.
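The idea behind orthogonal polynomial terms can be sketched with a Gram-Schmidt construction: the linear term is the centred age, and the quadratic term is the squared centred age with its mean and its linear component removed, so that the two terms are uncorrelated. This is a sketch of the principle only; R's poly() uses its own numerical routine, so the signs of its columns may differ, and the ages below are invented.

```python
import math


def orthogonal_linear_quadratic(x):
    """Return orthonormal linear and quadratic terms for the values in x.

    Both terms sum to zero, are orthogonal to each other, and have unit
    length. Requires at least three distinct values.
    """
    n = len(x)
    mean_x = sum(x) / n
    linear = [v - mean_x for v in x]
    lin_ss = sum(v * v for v in linear)

    squared = [v * v for v in linear]
    mean_sq = sum(squared) / n
    # Projection of the squared term onto the linear term.
    slope = sum(s * l for s, l in zip(squared, linear)) / lin_ss
    quadratic = [s - mean_sq - slope * l for s, l in zip(squared, linear)]
    quad_ss = sum(v * v for v in quadratic)

    linear = [v / math.sqrt(lin_ss) for v in linear]
    quadratic = [v / math.sqrt(quad_ss) for v in quadratic]
    return linear, quadratic


# Invented ages in months, for illustration only.
ages_months = [38, 45, 52, 60, 67, 74, 83]
age_linear, age_quadratic = orthogonal_linear_quadratic(ages_months)
```

Because the two terms are orthogonal, the estimate for the linear trend is not confounded with the estimate for the curvature, which is the usual reason for preferring this coding over raw age and age squared.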
The consonant analysis, on VOT, only included stops not in the WF position. The additional fixed main effect was thus Word Position (WM = −0.5 and WI = +0.5).
The first vowel analysis was conducted on WI and WF tokens, including both stop and fricative consonants for WI positions but only fricative consonants for WF positions. The additional fixed main effect for these analyses was termed WpMoA, a combination of the Word position and Manner of articulation variables, coded with a sliding-difference contrast with two coefficients. The first coefficient, WF Fricatives (WFFs), estimates to what extent the F2:F1 ratio is higher in WI Fricatives (WIFs) than in WFFs (estimate = WIF−WFF), and the second coefficient, WI Stops (WISs), estimates to what extent the F2:F1 ratio is higher in WISs than in WIFs (estimate = WIS−WIF). This factor has three levels, coded as follows: WIS (WIF−WFF = +1/3; WIS−WIF = +2/3), WIF (WIF−WFF = +1/3; WIS−WIF = −1/3), and WFF (WIF−WFF = −2/3; WIS−WIF = −1/3).
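To see why sliding-difference weights make each coefficient equal to a difference between neighbouring levels, the sketch below assumes the standard weights for a three-level factor (as produced, for example, by contr.sdif in R's MASS package) and checks that an intercept equal to the mean of the three cell means, plus the two difference coefficients, reproduces each cell mean. The cell means are invented.

```python
from fractions import Fraction as F

# Assumed standard sliding-difference weights for the ordered levels
# WFF, WIF, WIS. Each tuple holds the weights for (WIF-WFF, WIS-WIF).
SDIF_CODING = {
    "WFF": (F(-2, 3), F(-1, 3)),
    "WIF": (F(1, 3), F(-1, 3)),
    "WIS": (F(1, 3), F(2, 3)),
}
LEVELS = ("WFF", "WIF", "WIS")


def reconstruct_cell_means(cell_means):
    """Rebuild cell means from the intercept and the two differences."""
    intercept = sum(cell_means[level] for level in LEVELS) / 3
    wif_minus_wff = cell_means["WIF"] - cell_means["WFF"]
    wis_minus_wif = cell_means["WIS"] - cell_means["WIF"]
    return {
        level: (intercept
                + SDIF_CODING[level][0] * wif_minus_wff
                + SDIF_CODING[level][1] * wis_minus_wif)
        for level in LEVELS
    }


# Invented mean F2:F1 ratios per level, for illustration only.
example_means = {"WFF": F(18, 10), "WIF": F(20, 10), "WIS": F(23, 10)}
reconstructed = reconstruct_cell_means(example_means)
```

Each column of weights sums to zero across the three levels, which keeps the intercept at the unweighted mean of the cell means.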
The second vowel analysis was conducted on WM tokens with both stop and fricative target consonants. The additional fixed main effects for these analyses were thus MoA (Fricative [f] = −0.5; Stop [s] = +0.5) and Vowel (Following the target consonant [foll] = −0.5; Preceding the target consonant [pre] = +0.5).
For each LMM, the final model’s random structure was determined through an iterative procedure of model simplification (Bates, Kliegl, et al., 2015), starting with a maximal model incorporating the exhaustive combination of random effects based on the study’s design. Random intercepts were always set for both participant and target word (where “Target word” is defined as a lexical item). By-participant random slopes were included for all within-subject effects (the within-subject main effects and their interactions), and by-item random slopes for within-item effects (the within-item main effects and their interactions if applicable). When the maximal model failed to converge or was singular, indicating an over-parameterisation (Bates, Kliegl, et al., 2015), a systematic simplification process was initiated. Throughout this iterative process, the random slope for “Consonant type” was consistently retained due to its importance to the research questions. The simplification process began with the removal of correlations between random effects. If the model still did not converge or was singular, the next step involved eliminating random slopes for interactions and main effects, removing interactions first and higher-order interactions before lower-order ones. Where several removals were possible, a separate model was fitted for each random slope removal. If multiple models converged and were not singular, the model with the lowest Akaike Information Criterion was selected. If none of those models converged without being singular, each model was further simplified and assessed for convergence.
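The selection step within one round of simplification can be sketched as follows. This is an illustration of the decision rule only, not the study's R code: the class, the function, the labels, and the AIC values are all hypothetical, and the model fitting itself (done with lme4 in the study) is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CandidateFit:
    """Summary of one refitted model after removing one random slope."""
    removed_term: str
    converged: bool
    singular: bool
    aic: float


def select_candidate(fits: List[CandidateFit]) -> Optional[CandidateFit]:
    """Pick the usable candidate with the lowest AIC.

    Returns None when no candidate both converged and was non-singular,
    signalling that every candidate needs further simplification.
    """
    usable = [fit for fit in fits if fit.converged and not fit.singular]
    if not usable:
        return None
    return min(usable, key=lambda fit: fit.aic)


# Invented fit summaries, for illustration only.
example_fits = [
    CandidateFit("by-item Gender slope", True, True, 1510.2),
    CandidateFit("by-participant Word position slope", True, False, 1498.7),
    CandidateFit("by-participant interaction slope", True, False, 1502.3),
    CandidateFit("by-item interaction slope", False, False, 1490.0),
]
chosen = select_candidate(example_fits)
no_usable = select_candidate(example_fits[:1])
```

Note that the non-converged candidate is ignored even though its AIC is lowest, since only converged, non-singular models are compared.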
To address the research questions about the difference (acoustic contrast) between plain and emphatic consonants, only the main effect of Consonant type and its interactions are relevant. Only those effects will thus be reported in the text. The reader is referred to the tables in Supplementary Appendix C for full model results on all effects.
3. Results
3.1. Adult model (VOT)
Results of the Adult VOT model (Figure 4 and Table 3) show a positive significant main effect of Consonant type, suggesting, as expected, that VOT is longer in plain than in emphatic stop consonants. The positive significant interaction between Consonant type and Gender suggests that the VOT difference between the plain and emphatic consonants is larger in females than in males. All the other two-way and higher-order interactions with Consonant type were not significant.

Figure 4. Boxplots showing adult VOT durations for the emphatic /tˤ/ and the plain /t/ across Gender and Word position. The line within the box represents the median. The whiskers represent the 1.5 interquartile range (1.5*IQR). Grey dots are individual data points.
Table 3. LMM results for adults’ logged VOT

Note: *p < 0.05; **p < 0.01; ***p < 0.001.
3.2. Child model (VOT)
Results of the child VOT model (Figure 5 and Table 4) show a significant positive main effect of Consonant type, suggesting that VOT is longer in plain than in emphatic stop consonants. The positive two-way interaction between Consonant type and AgeLinear suggests that as age increases, the acoustic contrast increases. However, this increase in the acoustic contrast with age is stronger before than after the sample’s mean age (60 months), as suggested by the significant negative interaction between Consonant type and AgeQuadratic. The significant positive two-way interaction between Consonant type and Gender suggests that females produce a larger acoustic contrast than males. This larger female contrast is increasingly greater in WM than WI position as children are closer to the sample’s mean age, as suggested by the negative four-way interaction between Consonant type, AgeQuadratic, Word position, and Gender.
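The joint reading of the AgeLinear and AgeQuadratic interactions can be illustrated with a small numerical sketch. It assumes, purely for illustration, that age is centred on the sample mean of 60 months and enters as a raw second-order polynomial; the coefficient values are hypothetical and are not the estimates reported in Table 4, and the function names are introduced here only for the example.

```python
def contrast(age_months, b0, b_lin, b_quad, mean_age=60.0):
    """Predicted plain-emphatic contrast under a centred quadratic age trend."""
    a = age_months - mean_age
    return b0 + b_lin * a + b_quad * a ** 2


def contrast_slope(age_months, b_lin, b_quad, mean_age=60.0):
    """Rate of change of the contrast per month (derivative of `contrast`)."""
    a = age_months - mean_age
    return b_lin + 2.0 * b_quad * a


# Hypothetical coefficients: positive linear term, negative quadratic term.
B_LIN, B_QUAD = 0.02, -0.0005

slope_at_48 = contrast_slope(48, B_LIN, B_QUAD)   # one year below the mean age
slope_at_72 = contrast_slope(72, B_LIN, B_QUAD)   # one year above the mean age
```

With a positive linear and a negative quadratic coefficient, the slope is steeper at 48 months than at 72 months, which is the pattern described above: the contrast keeps growing with age, but more quickly before the mean age than after it.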

Figure 5. Curvilinear plot showing children’s VOT durations of the emphatic /tˤ/ and the plain /t/ across Age, Word position, and Gender. Grey dots are individual data points.
Table 4. LMM results of children’s logged VOT

Note: *p < 0.05; **p < 0.01; ***p < 0.001.
3.3. Adult model (F2:F1 Word-initial and Word-final)
Results of the Adult F2:F1 model for WI and WF tokens (Figure 6 and Table 5) show a positive significant main effect of Consonant type, suggesting that F2:F1 is larger in vowels next to a plain than an emphatic consonant. None of the two-way and three-way interactions with Consonant type were significant.

Figure 6. Boxplots showing adults’ F2:F1 ratios between plain–emphatic consonants in word-initial and word-final positions across MoA and Gender. The line within the box represents the median. The whiskers represent the 1.5 interquartile range (1.5*IQR). Grey dots are individual data points.
Table 5. LMM results for adults’ F2:F1 in word-initial and word-final positions

Note: *p < 0.05; **p < 0.01; ***p < 0.001.
3.4. Child model (F2:F1 Word-initial and Word-final)
Results of the Child F2:F1 model for WI and WF tokens (Figure 7 and Table 6) show a positive significant main effect of Consonant type, suggesting that F2:F1 is larger in vowels next to a plain compared to an emphatic consonant. The significant positive interaction between Consonant type and AgeLinear suggests that the acoustic contrast increases with age, with older children showing a larger acoustic contrast relative to younger children. None of the other two-way and higher-order interactions with Consonant type were significant.

Figure 7. Curvilinear plot showing children’s F2:F1 between emphatic and plain consonants in word-initial and word-final positions across Age, MoA, and Gender. Grey dots are individual data points.
Table 6. LMM results for children’s F2:F1 in word-initial and word-final positions

Note: *p < 0.05; **p < 0.01; ***p < 0.001.
3.5. Adult model (F2:F1 Word-medial)
Results of the Adult F2:F1 model for WM tokens (Figure 8 and Table 7) show a significant positive main effect of Consonant type, suggesting that F2:F1 is larger in vowels next to a plain compared to an emphatic consonant. The significant positive two-way interaction between Consonant type and Vowel suggests that the acoustic contrast is larger in preceding compared to following vowels. The significant positive two-way interaction between Consonant type and MoA suggests that the acoustic contrast is larger in vowels next to stops compared to fricatives. The positive three-way interaction between Consonant type, Vowel, and MoA suggests that the extent to which the acoustic contrast is larger in preceding than following vowels is more apparent in stops compared to fricatives. None of the other two-way and higher-order interactions with Consonant type were significant.

Figure 8. Boxplots showing adults’ F2:F1 between plain and emphatic consonants in word-medial position across MoA, Vowel, and Gender. The line within the box represents the median. The whiskers represent the 1.5 interquartile range (1.5*IQR). Grey dots are individual data points.
Table 7. LMM results for adults’ F2:F1 in word-medial position

Note: *p < 0.05; **p < 0.01; ***p < 0.001.
3.6. Child model (F2:F1 Word-medial)
Results of the Child F2:F1 model for WM tokens (Figure 9 and Table 8) show a significant positive main effect of Consonant type, suggesting that F2:F1 is larger in vowels next to a plain compared to an emphatic consonant.

Figure 9. Curvilinear plot showing children’s F2:F1 between emphatic and plain consonants in word-medial position across Age, MoA, Vowel, and Gender. Grey dots are individual data points.
Table 8. LMM results for children’s F2:F1 in word-medial position

Note: *p < 0.05; **p < 0.01; ***p < 0.001.
The positive two-way interaction between Consonant type and AgeLinear suggests that the acoustic contrast increases as children grow older. The negative two-way interaction between Consonant type and AgeQuadratic, however, indicates that the acoustic contrast increases more before than after the sample’s mean age (60 months).
The positive two-way interaction between Consonant type and Vowel suggests that the acoustic contrast is larger in preceding than following vowels. The positive two-way interaction between Consonant type and MoA suggests that the acoustic contrast is larger in vowels next to stops compared to fricatives. This larger contrast in vowels next to stops is greater in preceding compared to following vowels – as suggested by the positive three-way interaction between Consonant type, Vowel, and MoA.
The negative three-way interaction between Consonant type, AgeQuadratic, and MoA suggests that the extent to which the acoustic contrast is larger in vowels next to stops compared to fricatives reaches its peak around children’s mean age. The positive three-way interaction between Consonant type, AgeQuadratic, and Gender suggests that the extent to which the acoustic contrast is larger in females compared to males is at its minimum around children’s mean age. The negative four-way interaction between Consonant type, AgeLinear, Vowel, and MoA suggests that the extent to which the preceding vowel advantage is larger in vowels next to stops than in fricatives decreases as children grow older.
4. Discussion
This study investigated the production of emphatic consonants in real words by Saudi-Hijazi Arabic-speaking children aged 3–6 years. We hypothesised a gradual acquisition process, with older children exhibiting a larger acoustic contrast between plain and emphatic consonants than younger children. We additionally predicted that the plain–emphatic acoustic contrast would be larger in initial than in final positions and larger for vowels preceding than following the target consonants for WM position. Children were also expected to produce larger acoustic contrasts in stops compared to fricatives. Finally, we predicted that females may produce a larger acoustic contrast compared to males.
The results suggest that children in the 3–6 years age range, on average, produce an acoustic contrast between plain and emphatic consonants that increases with age, with the rate of increase tapering off as children grow older. Notably, the acoustic contrast was detected on both the consonant (stop VOTs) and the immediately adjacent vowels (formant frequency ratios) in similar patterns as for adults, suggesting that children are targeting the emphatic consonants, at least for emphatic stops, rather than implementing a different strategy – for example, producing only a back vowel (Footnote 5).
The increase in acoustic contrast with age is consistent with Mashaqba et al. (2022), who noted that although Ammani-speaking children produced plain–emphatic acoustic contrasts by age 5 years, adult-like production emerged later, around 6–7 years of age. The age-related increase in the contrasts in our study similarly suggests an ongoing refinement of speech production abilities as children grow older (Sadagopan & Smith, 2008; Walsh & Smith, 2002) and may reflect increased lingual control as children mature (Zharkova et al., 2014). Given the articulatory complexity associated with emphatics and cross-linguistic findings of protracted development of complex speech segments, such as English liquids (Howson & Redford, 2021), we hypothesise that the development of these segments may continue after 6;11 – that is, the age of the oldest children tested in this study.
The gradual development of the plain–emphatic contrast may explain why some previous studies relying on perceptual judgements reported “no acquisition” of emphatics until over 6 years old (Amayreh, 2003; Amayreh & Dyson, 1998; Dyson & Amayreh, 2000). Listeners in those studies may have identified children’s still-developing emphatic realisations as not yet meeting their threshold for an emphatic percept. In contrast, the first author of the present study, a native Saudi-Hijazi Arabic speaker, perceives most of the youngest children in the present sample to be producing the target emphatics. This apparent discrepancy between studies poses some questions about children’s emphatic production and adults’ perception thereof. For example, were children in previous studies, from different dialects, producing weaker acoustic contrasts than the children in our sample? Which auditory cues do adult native speakers across dialects rely on when evaluating children’s production of emphatics? Are there inherent biases or variations in perceptual thresholds that might skew interpretations of whether children are indeed producing emphatics accurately? These questions highlight the need for future research to combine instrumental measures with perceptual evaluations, ensuring a comprehensive understanding of the development of emphatics.
4.1. The acoustic contrast and linguistic factors
The current study also explored the effects of the manner of articulation and the target segment word position on the production of emphatics. Consistent with our hypotheses, both children and adults produced larger acoustic contrasts in vowels next to stops compared to fricatives. As for positional effects, there was no evidence for or against a difference between children’s (or adults’) productions in WIFs and WFFs. For WM position, both children and adults produced larger plain–emphatic acoustic contrasts in vowels immediately preceding compared to immediately following the target consonant.
The most parsimonious explanation for this similarity between adults’ and children’s patterns is that children are aligning their production with the adult input. In what follows, we discuss the relevant articulation mechanisms, explaining how these might give rise to each of the observed linguistic effects, even in adults, and might affect children in particular. While the present data do not permit the disentanglement of input factors from articulation development, we aim to embed the observed linguistic and developmental patterns for emphatics into the general literature on phonological and articulatory development.
Regarding the manner of articulation effect, recall that the larger acoustic contrast in vowels next to stops compared to fricatives was observed in both adults and children, aligning with previous findings on other dialects of Arabic (Ghazeli, 1977; Jongman et al., 2011). One possible explanation for this general difference between emphatic stops and emphatic fricatives is the differential degree of tongue involvement. Fricative production requires a precise, narrow constriction to allow for continuous airflow that generates turbulence at the place of constriction. Maintaining this aerodynamic requirement might limit the extent to which the tongue can move (for example, to form a secondary pharyngeal constriction) without disrupting the fricative noise. In contrast, stops require a complete blockage of airflow, allowing the tongue greater freedom to move to different positions before and after the closure without influencing sound production. This flexibility might result in more pronounced spectral differences for emphatic stops compared to fricatives (Jongman et al., 2011).
In the context of language development, the controlled constriction required for fricatives might account for the universal tendencies for children to acquire fricatives later than stops (Gildersleeve-Neumann et al., 2000; Jakobson, 1968; Stoel-Gammon, 1985). The challenge of producing an emphatic fricative is further increased due to the additional complexity of maintaining the secondary gesture. Most studies agree that emphatic fricatives are acquired after emphatic stops (Alqattan, 2015; Ammar & Morsi, 2018; Ayyad, 2011; Morsi, 2003). In the current data, the fact that the rate of increase in the size of the acoustic contrast slows down more in stops compared to fricatives may suggest that the plain–emphatic contrast stabilises earlier in stops than in fricatives, which is in line with previous findings showing that the development of fricative coarticulation is more protracted than that of stops (Zharkova, 2018).
Regarding the difference between vowels before and after target consonants, the analysis of WM position shows that both children and adults produced larger plain–emphatic acoustic contrasts in vowels immediately preceding compared to immediately following the target consonant. This aligns with previous findings on emphatics, which show stronger anticipatory compared to carryover coarticulation in adults (e.g., Al-Masri & Jongman, 2004; Jongman et al., 2011; Zawaydeh, 1999). This can be explained through the different mechanisms claimed to give rise to anticipatory and carryover coarticulation, which may follow different developmental trajectories. Anticipatory coarticulation, where speakers prepare for the production of upcoming sounds, has been hypothesised to reflect planning strategies that need to be learned (Hertrich & Ackermann, 1995; Recasens, 1984; Waldstein & Baum, 1991). In contrast, carryover coarticulation is believed to be more reflexive, arising from the physical properties and limitations of the articulators, often called mechanical inertia constraints (Baum & Waldstein, 1991; Daniloff & Hammarberg, 1973; MacNeilage & DeClerk, 1969; Recasens, 1984).
Whether these positional effects in children’s production might reflect aspects of articulatory development depends on one’s theory of coarticulation development. On the one hand, given that children’s speech production is still in development, children’s larger acoustic contrasts in vowels preceding (rather than following) emphatics might reflect that the anticipatory coarticulation patterns are learned before the maturation of mechanical inertia constraints responsible for carryover coarticulation. Children might thus be learning planning strategies before the mature motor control required for carryover coarticulation. Recent research, on the other hand, claims that both anticipatory and carryover coarticulation might share a common mechanism, and thus, developmental patterns should be similar (Noiray et al., 2018; Rubertus & Noiray, 2020). This view is also compatible with our findings, given that children might use this single mechanism to target the adult pattern of stronger anticipatory than carryover coarticulation. Moreover, the stronger anticipatory emphasis in the adult target might make anticipatory cues to emphatics more perceptually salient and, therefore, a more prominent target for children’s production. The present data, thus, do not permit the disentanglement of different coarticulatory accounts from each other and input factors. Our ongoing work investigates the acquisition of emphasis spread, where the effects of the emphatic consonant extend bidirectionally beyond the immediately adjacent vowel, which could provide more insight into coarticulation development in children and the relative impact of input versus articulation mechanisms.
Regarding word position effects, previous findings indicate that Ammani-speaking children achieve adult-like production of emphatics earlier in WI and WM than in WF positions (Amayreh & Dyson, 1998; Mashaqba et al., 2022), reflecting general cross-linguistic development patterns in consonant acquisition (Levelt et al., 2000; Levelt & de Vijver, 2004). The current study, however, provides no evidence for (or against) a difference between WISs and WIFs or between WIFs and WFFs in children’s (or adults’) productions. This is particularly surprising as WI (but not WF) target consonants were in a stressed syllable, which would be expected to make the contrast in WI position more prominent and, thus, further increase children’s contrast in this position (Beckman, 1998; Echols & Newport, 1992; Rose, 2002). The apparent lack of a word position effect might be due to the opposite effects of coarticulation versus word position and stress. Because WI contrasts reflect carryover coarticulation with the following vowel, while WF contrasts reflect anticipatory coarticulation with the preceding vowel, coarticulation effects in isolation would lead to a larger contrast in WF compared to WI position, similar to the pattern observed in WM position. However, the prominence of the WI contrast, being in a stressed syllable, might have mitigated the effects of coarticulatory direction, or vice versa. Future work may explore this possibility by examining how prosodic effects, such as stress, interact with the coarticulatory development of the emphatic consonant.
4.2. The acoustic contrast and gender
While previous findings suggest that, among adults, males exhibit stronger cues to emphatics than females (Abudalbuh, 2010; Lehn, 1963; Wahba, 1993), other studies reported the opposite, with females producing stronger cues to emphatics than males (Kahn, 1975; Al-Masri & Jongman, 2004). Our adult results are in partial agreement with the latter studies, as we observed that the VOT contrast in stops was larger for females than males. However, no evidence for or against gender-based differences in formant frequency ratios was found. Thus, we cannot claim that females consistently produce stronger cues to emphatics than males. The relatively small sample size (females = 5 and males = 8) might have left us unable to reliably estimate differences between female and male speakers in Hijazi Arabic.
In contrast, female children displayed larger acoustic contrasts than male children for both VOT and vowel formants, with the gender effect for vowel formants most apparent at the upper and lower limits of the age range studied. Because sex-related differences in the vocal tract may not emerge until after puberty (Barbier et al., 2015; Vorperian et al., 2011) and because the vowel formant measure used in this study (F2:F1 ratio) serves as a normalised metric that reduces interspeaker variability, including that related to age and sex (Sapir et al., 2010), the observed differences in the child data are unlikely to reflect anatomical differences. Taken together, the sex-related trends in our data with children under 7 years of age may instead be interpreted as learning of gendered speech patterns (Ford et al., 2018; Foulkes et al., 2005; Munson et al., 2015). In the future, more targeted studies into this issue could collect more comprehensive information about adults’ and children’s social background, as factors such as economic status, education level, and cultural norms may moderate the extent to which gender is associated with acoustic cues to emphatics (Abudalbuh, 2010; Al Malwi, 2017; Al-Masri & Jongman, 2004; Alzoubi, 2017; Khattab et al., 2006).
4.3. Limitations and future directions
In the current study, we acoustically examined the consonantal and vocalic cues in children’s production of the plain–emphatic contrast. The acoustic measurements of the vocalic cues included only 35% of the vowel immediately adjacent to the target consonant, to minimise influences from adjacent non-target consonants. This is a potential limitation, as the analysis window does not allow us to examine the extent to which emphasis is maintained across the entire vowel immediately adjacent to the target emphatic and whether it spreads to vowels further away (e.g., Zawaydeh & de Jong, 2011). Moreover, our specific choice of words allowed us to examine the effect of emphasis only in the context of the low front vowel /a/, which has been previously shown to be more influenced by emphatic consonants than other vowels (e.g., Jongman et al., 2011). Our ongoing work will explore whether the current findings can be generalised across other vowel contexts.
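To make the restricted analysis window concrete, the following Python sketch computes an F2:F1 ratio over the 35% of a vowel nearest the target consonant. It is an illustration under stated assumptions rather than the measurement script used here: the function name `f2_f1_ratio`, the frame-based formant tracks, the use of the vowel edge closest to the consonant, and the choice to divide the mean F2 by the mean F1 are all hypothetical.

```python
from statistics import mean
from typing import Sequence


def f2_f1_ratio(f1_track: Sequence[float],
                f2_track: Sequence[float],
                vowel_position: str,
                proportion: float = 0.35) -> float:
    """F2:F1 ratio over the part of the vowel nearest the target consonant.

    vowel_position: "following" if the vowel follows the consonant (the window
    is taken from the vowel onset), "preceding" if the vowel precedes the
    consonant (the window is taken from the vowel offset).
    """
    if len(f1_track) != len(f2_track) or not f1_track:
        raise ValueError("formant tracks must be non-empty and of equal length")
    if vowel_position not in ("following", "preceding"):
        raise ValueError("vowel_position must be 'following' or 'preceding'")

    # Number of frames in the window (at least one frame).
    n = max(1, round(len(f1_track) * proportion))

    if vowel_position == "following":
        f1_win, f2_win = f1_track[:n], f2_track[:n]
    else:
        f1_win, f2_win = f1_track[-n:], f2_track[-n:]

    # Assumed summary: ratio of the mean F2 to the mean F1 in the window.
    return mean(f2_win) / mean(f1_win)


# Hypothetical 20-frame tracks in Hz (not data from this study).
f1_demo = [600.0] * 7 + [700.0] * 13
f2_demo = [1200.0] * 7 + [1750.0] * 13

ratio_after_consonant = f2_f1_ratio(f1_demo, f2_demo, "following")
ratio_before_consonant = f2_f1_ratio(f1_demo, f2_demo, "preceding")
```

Because only the frames nearest the consonant are used, a window of this kind says nothing about how far the effect of the emphatic extends into the remainder of the vowel, which is the limitation noted above.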
While we examined both consonantal and vocalic cues to emphatics for stops, we only considered vocalic cues to emphatics for fricatives. This decision was motivated by (1) the lack of evidence in previous literature for a spectral difference between plain and emphatic fricatives in adults (Al-Khairy, 2005; Jongman et al., 2011) and (2) the unsuitability of our recordings to perform reliable fine-grained spectral analyses. Although speech was recorded via a professional voice recorder and every effort was taken to improve the acoustic conditions in each recording location, some mild background noise was also captured during elicitation, which may impact spectral measures (Footnote 6). While the quality of the recording may also impact formant measurements, these are typically more robust, and our hand correction further ensures their accuracy. Future work, with recordings made in better sound-attenuated environments, may explore whether children exhibit a spectral difference between plain and emphatic consonants, using measures such as spectral mean and spectral tilt (Al-Khairy, 2005; Al-Tamimi, 2015; Jongman et al., 2011). Understanding these spectral differences could provide more insights into the development of speech production in children, shedding light on how they acquire and refine articulatory patterns.
We also chose not to assess “adult-likeness” in the present study because determining adult-like patterns based on null-hypothesis significance testing may be misleading. While a significant p-value can indicate a difference between adults’ and children’s productions, a non-significant p-value does not necessarily mean they are identical and that the children’s productions are, therefore, “adult-like.” Future research may incorporate both qualitative and quantitative techniques, perhaps alongside longitudinal data collection, to offer a more comprehensive perspective on “adult-likeness” and its relevance in child speech development.
A final limitation to our interpretation of these data stems from the absence of corpus-based frequency data or a comprehensive, searchable dictionary for Saudi-Hijazi Arabic. As such, we cannot investigate or comment on the effects of frequency of lexical items or segmental frequency and functional load – all factors that are known to influence acquisition (Ambridge et al., 2015; Pye et al., 1987; Stokes & Surendran, 2005; Surendran & Niyogi, 2003). Future research is needed to investigate how these factors impact the accuracy and phonetic contrast with which children produce emphatic consonants.
While the focus of the present study was on the acoustic realisation of emphatic contrasts, the non-target-like productions observed during data annotation raise questions for future research. Among 3- and 4-year-olds, emphatic fricatives were occasionally realised with an inserted [t], producing forms such as [statan] for /sˤadam/ and [gafsats] for /gafasˤ/. Final consonant omission was also observed, as in /gafasˤ/ realised as [gafa], suggesting possible articulatory simplification strategies. One 5-year-old produced [qafasˤ] for /gafasˤ/, which may reflect emphasis overspread affecting the place of articulation of /g/ or a style shift towards the MSA pronunciation. In Hijazi Arabic, /q/ is typically realised as [g]; the child’s production of [q], therefore, is possibly triggered by increased exposure to MSA forms in formal settings, for example, schools. A similar observation was reported for Palestinian Arabic children and adolescents, who showed diglossic style shifting towards the Standard Arabic [q] in a school-like picture-naming task (Shetewi et al., 2024). These patterns were not analysed systematically (and were excluded from the current analysis), as they fall outside the acoustic focus of the current study. However, they raise important questions for future research – particularly about the link between phonetic patterns and phonological representations in a diglossic context.
As a final direction for future research, determining whether children are attempting the secondary constriction gesture when producing emphatics, as expected from an adult speaker, requires a closer inspection of the tongue. Future articulatory work may be necessary to confirm the articulatory gestures underlying the current acoustic results and provide a more comprehensive understanding of the developmental trajectory of these sounds.
5. Conclusion
This study is the first to assess the acquisition of emphatics by typically developing Saudi-Hijazi Arabic-speaking children between 3 and 6 years old. It is the first, to the best of the authors’ knowledge, to instrumentally examine both consonantal and vocalic cues to the plain–emphatic contrast. Our results show that children produce both consonantal and vocalic cues to the contrast and progressively enhance these contrasts over time. Additionally, the overall patterns observed in children’s productions closely align with those observed in adults, suggesting the potential impact of adult input on children’s production of emphatics.
Although this study was not designed with applied outcomes as its primary aim, the findings have potential relevance for speech-language therapists and educators working with Arabic-speaking children. The emergence of an acoustic contrast by age 3 years suggests that some children may be developing articulatory targets earlier than perceptual judgements alone would suggest (e.g., Amayreh, 2003; Amayreh & Dyson, 1998). Awareness of the potential of covert contrast (Macken & Barton, 1980; Scobbie, 1998) may help clinicians in accurately assessing children’s abilities and challenges.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S0305000925100214.
Competing interest
The authors declare none.