1. Introduction
Like most Salish languages, Lushootseed (a Coast Salish language) is known for having many obstruents in its phonemic inventory. Of the 37 consonants in Lushootseed, 31 are obstruents (i.e., 84%). The consonant inventory of Lushootseed is given in Table 1. This study examines obstruents in Lushootseed using a number of acoustic measures that characterize the stop, affricate, and fricative contrasts in the language. Given that the data for this study come from archival recordings dating to the 1950s (see Section 2.1 for more details), this study also seeks to address the question of which acoustic measures can be used to analyze these kinds of recordings.
Table 1. Lushootseed consonants. The Americanist phonetic symbols used in the standard orthography for Lushootseed are shown in angled brackets ‹…›

Note: Abbreviations: ejv. = ejective; fric. = fricative; laryng. = laryngealized; lat. = lateral.
Although some work has been done on obstruents in other Salish languages (Flemming et al. 2008; Bird 2016; Bird et al. 2016; Howson & Bird 2021; Percival 2019, 2024), only a few studies conducted acoustic analyses of obstruents in these languages (Flemming et al. 2008; Bird 2016; Percival 2019, 2024; Howson & Bird 2021). The goal of this study is to fill a major gap in the literature on the realization of obstruents in Lushootseed.
1.1 Background
Lushootseed (ISO 639-3: lut) is a Coast Salish language spoken in the Puget Sound region of the Pacific Northwest (PNW). Its territory extends from southern Puget Sound north past the Skagit Valley, and includes the eastern parts of the Kitsap Peninsula and the western parts of the Cascades. According to Ethnologue (Eberhard et al. 2021), there are no fluent speakers remaining; the last fluent speaker of Lushootseed passed away in 2008. However, there are many second-language learners of Lushootseed, as it is a revitalizing language. Because no fluent speakers remain, an examination of archival recordings of fluent elder speakers is of special interest. There are two dialects of Lushootseed: Southern Lushootseed and Northern Lushootseed. Figure 1 is a map of the distribution of these two dialects.

Figure 1. Regional dialects of Lushootseed, adapted from Thom (2011).
Phonological differences between these dialects are not well understood. According to Hess (1977), they differ in their placement of stress. For example, primary stress in the Northern dialect falls on the syllable containing the first non-schwa vowel of a stem, while primary stress in the Southern dialect usually falls on the penultimate syllable of a stem.
There are 37 contrastive consonants in Lushootseed (see Table 1), of which 31 are obstruents: 17 stops (10 pulmonic, six ejectives, and one glottal stop), seven affricates (four pulmonic and three ejectives), and seven fricatives (all voiceless and pulmonic). The secondary articulation of labialization for dorsal obstruents is quite common in PNW languages. Lushootseed is also known to lack nasals in its phonemic inventory. This is because Lushootseed and other neighboring languages in the linguistic area (such as Twana, Quileute, Makah, and Ditidaht) have undergone denasalization, where all nasal stops historically changed to voiced oral stops.
In the sections that follow, I discuss two topics that relate to the sound inventory of Lushootseed. In Section 1.2, I discuss typological research on ejectives. In Section 1.3, I discuss spectral estimates of fricatives and affricates, and potential limitations of obtaining reliable spectral estimates of fricatives and affricates from old archival recordings.
1.2 Ejective typology
Cross-linguistically, ejectives are known to differ in VOT and the voice onset quality of the following vowel (Kingston 1985, 2005; Warner 1996; Wright et al. 2002; Stevens & Hajek 2008; Hargus 2011; Percival 2015, 2016ab, 2019). According to Lindau (1984) and Kingston (1985, 2005), the realization of ejectives differs cross-linguistically along several acoustic properties, which they use to distinguish stiff ejectives from slack ejectives. According to Kingston (2005), stiff ejectives are characterized by a silent period between the consonant release and voice onset, resulting in a long VOT. Moreover, stiff ejectives are achieved with increased longitudinal tension and medial compression of the vocal folds, resulting in a raised f0 at voice onset and a tense or modal voice quality. Other characteristics of stiff ejectives include a sharp rise in the amplitude of the vowel and a relatively intense burst. Stiff ejectives apparently occur in Tigrinya (Kingston 1985), Nez Perce (Aoki 1970), Montana Salish (Flemming et al. 2008), K’ekchi (Ladefoged & Maddieson 1996), and Navajo (Lindau 1984). Slack ejectives, on the other hand, are characterized by a short VOT and little longitudinal tension, resulting in a depressed f0 at voice onset with creaky voice quality at vowel onset. Other characteristics of slack ejectives include a slow rise time in the amplitude of the vowel and a normal burst. Slack ejectives apparently occur in Hausa (Lindau 1984; Lindsey et al. 1992), Quiche (Kingston 1985), and Gitksan (Ingram & Rigsby 1987). Table 2 provides a summary of the Lindau/Kingston classification of stiff and slack ejectives.
Table 2. Proposed ejective typology following Lindau (1984) and Kingston (1985, 2005)

However, Warner (1996), Kingston (1985), Wright et al. (2002), and Percival (2024) report language-dependent and speaker-dependent variation in these acoustic features for ejectives. Although a relatively short VOT, irregular voicing at voice onset, a weak burst, and a slow rise time are observed for ejective stops in Ingush (like slack ejectives), there is a raised pitch at voice onset (like stiff ejectives) (Warner 1996). Kingston (1985) reports speaker variation in f0 following ejectives in Tigrinya. Contrary to the predictions of the ejective typology proposed by Lindau (1984) and Kingston (1985, 2005), Wright et al. (2002) found considerable variation across speakers in Witsuwit’en. Some speakers combined properties of stiff and slack ejectives, such as short VOT (like slack ejectives) and modal/tense voice quality at voice onset (like stiff ejectives). Other speakers showed depressed f0 (like slack ejectives) and long VOT (like stiff ejectives). In an acoustic analysis of ejectives in Hul’q’umi’num’, which is another Coast Salish language, Percival (2019, 2024) found that although ejectives tend to have longer VOT (like stiff ejectives), they also show depressed fundamental frequency and lower (negative) spectral tilt, which is characteristic of creaky voice (like slack ejectives). In a recent typological study of ejectives across languages, Percival (2024) found that many languages do not necessarily fit the typological model of ejectives proposed by Lindau (1984) and Kingston (1985, 2005), and that the realizations of ejectives are more variable along these acoustic parameters than the model predicts.
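The contrast between the stiff/slack model and the mixed profiles reported above can be made concrete with a small sketch. The feature names and categorical labels below are hypothetical, chosen only for illustration; they paraphrase the properties listed in the text and do not reproduce Table 2. Coding Ingush "irregular voicing" as creaky is likewise an assumption of this sketch.

```python
# Illustrative encoding of the stiff/slack ejective typology described in the text.
# Feature names and value labels are hypothetical, not taken from Table 2.
STIFF = {"vot": "long", "f0": "raised", "voice_quality": "modal_or_tense",
         "rise_time": "fast", "burst": "intense"}
SLACK = {"vot": "short", "f0": "depressed", "voice_quality": "creaky",
         "rise_time": "slow", "burst": "normal"}


def classify(profile):
    """Return 'stiff', 'slack', 'mixed', or 'unclassified' for a feature dict."""
    stiff_hits = sum(profile.get(k) == v for k, v in STIFF.items())
    slack_hits = sum(profile.get(k) == v for k, v in SLACK.items())
    if stiff_hits and not slack_hits:
        return "stiff"
    if slack_hits and not stiff_hits:
        return "slack"
    if stiff_hits and slack_hits:
        return "mixed"
    return "unclassified"


# Ingush as summarized from Warner (1996); "irregular voicing" is coded
# as creaky here, which is an assumption of this sketch.
ingush = {"vot": "short", "f0": "raised", "voice_quality": "creaky",
          "rise_time": "slow", "burst": "weak"}

# Hul'q'umi'num' as summarized from Percival (2019, 2024).
hulquminum = {"vot": "long", "f0": "depressed", "voice_quality": "creaky"}

for name, profile in [("Ingush", ingush), ("Hul'q'umi'num'", hulquminum)]:
    print(name, classify(profile))
```

Both profiles come out as mixed, which is the pattern that motivates treating the stiff/slack distinction as a set of separate acoustic dimensions, not a binary class.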
One of the goals of this study is to examine the acoustic properties of Lushootseed ejectives and compare them against the Lindau (1984) and Kingston (1985, 2005) model for ejectives. Moreover, most typological research on ejectives has focused on ejective stops but not affricates. Another goal of this study is to examine the acoustic properties of affricates and to address where affricates fit in our typological understanding of ejectives. As we will see later, Lushootseed ejectives show acoustic properties that are observed in both stiff and slack ejectives. Moreover, affricates show acoustic correlates that differ from those of stops. This suggests that there is no need to classify Lushootseed ejectives as either stiff or slack.
1.3 Spectral estimates of fricatives and affricates from archival recordings
A popular topic in the phonetics literature is the description and classification of fricatives based on spectral measurements (Jesus & Shadle 2002; Jesus & Jackson 2008; Koenig et al. 2013; McMurray & Jongman 2011; Shadle 2012, 2023). Part of the reason for the increased focus on this topic is the relative difficulty of obtaining reliable measures that distinguish fricatives, whose source is very noisy due to the turbulent airflow that characterizes these consonants. Some of the acoustic parameters that have been reported to describe and classify fricatives include spectral peak locations (Żygis et al. 2012; Koenig et al. 2013), spectral moments (Forrest et al. 1988), and spectral slope and curvature (Jannedy & Weirich 2016, 2017).
A challenge for this study is to identify reliable spectral measurements of fricatives and affricates from old archival recordings (see Section 2.1 for more details on the recordings). In general, old archival recordings tend to be noisier and to have poorer sound quality, which could affect spectral measurements. Most studies of fricatives use up-to-date sound equipment (such as condenser microphones and sound-attenuated booths), whereas older recordings (especially recordings dating to the 1950s) may not capture certain aspects of the acoustics of speech as well as the recording equipment used today. Another goal of this study is to identify reliable spectral measurements that can characterize the fricative contrast from these recordings, as well as to discuss methods and limitations of obtaining spectral measures from old archival recordings dating to the 1950s.
2. Methods
2.1 Recordings
The recordings used for this study come from the Leon Metcalf collection, which is stored in the University of Washington’s Burke Museum’s Special Collections. These recordings were made by the musicologist Leon Metcalf, who spent over five years (1950–1956) visiting and recording elder speakers of Lushootseed throughout the Puget Sound (Hilbert 1995:viii). Because of his close ties with several Indigenous members of Tulalip, Metcalf was able to meet and record several Indigenous elders in the Puget Sound region. Metcalf recorded a variety of language materials from these elders, which include recordings of Salish tradition, legends, songs, oral histories, and private correspondences. These recordings were digitized at a sampling rate of 44.1 kHz with a bit depth of 32 bits.
2.2 Speakers
Two speakers are examined: Annie Jack and Martha Lamont. Annie Jack was an elder speaker of the Southern Lushootseed dialect and was born near Green River in the 1870s. She lived in the Muckleshoot Tribal Reservation her entire life. She was recorded by Leon Metcalf sometime between 1951 and 1954. Some of her living descendants include Denise Bill (great-granddaughter), Willard Bill, Jr. (great-grandson), Elise Bill-Gerrish (great-great-granddaughter and daughter of Denise Bill), and Justice Bill (great-great-grandson and son of Willard Bill, Jr.). In this study, seven recordings of Annie Jack (with a total duration of approximately 67 minutes) were examined. These are recordings of traditional Salish stories and oral histories.
Martha Lamont was an elder speaker of the Northern Lushootseed dialect. She was born around the 1880s and lived in the Tulalip reservation her entire life. There are a few recordings of private correspondence between Annie Jack and Martha Lamont that were recorded by Metcalf. Martha Lamont was recorded by Leon Metcalf sometime between 1950 and 1956. Some of her living descendants include Hank Williams (grandson) and Hank’s descendants. In this study, four recordings of Martha Lamont (with a total duration of approximately 51 minutes) were examined. Two of these are recordings of private correspondence, and the other two are recordings of traditional Salish stories and oral histories.
2.3 Measurements
Praat (Boersma & Weenink 2023) and RStudio (R Core Team 2018) were used to segment the data and extract acoustic measurements. Stops and affricates were observed in word-initial and word-medial (intervocalic) position, while fricatives were observed word-initially, word-medially, and word-finally.Footnote 1 Although stress could not be controlled in this study, most of the tokens (especially for stops and affricates) were observed in stress-initial position (except for tokens of the stop [t], which were observed most frequently in the unstressed determiners /ti/ ‘this’ and /tiːɬ/ ‘that’). Only a handful of word-medial (intervocalic) stops and affricates were observed in unstressed position. A total of 1,524 tokens (881 for stops, 425 for affricates, and 218 for fricatives) were examined (see Table A in Appendix A for the total number of tokens for each stop, affricate, and fricative).
The acoustic analysis of stops and affricates was based on Voice Onset Time (VOT), closure duration, burst intensity for voiceless and ejective stops, and the intensity of the frication component for affricates. Closure duration could be compared between word-initial and word-medial (intervocalic) position because there were several word-initial stops that were preceded by a word-final vowel. (However, closure duration for word-initial stops was omitted from the analysis when they were preceded by a word-final stop or by a pause.) VOT has been used to measure aspiration contrasts in stops (Lisker & Abramson 1964). It has also been used to compare ejectives with other laryngeal settings, as well as to compare affricates (Abramson 1995; Ladefoged & Maddieson 1996; Cho & Ladefoged 1999; Abramson & Whalen 2017). According to Lisker & Abramson, VOT is “the single most effective measure whereby homorganic stop categories in languages generally may be distinguished physically and perceptually….” (Lisker & Abramson 1964:136). In this study, I measure VOT as the interval from the initial burst release to the first positive or negative movement of periodicity (i.e., voice onset). For obstruents realized with voicing, VOT was measured as the interval from the first positive or negative movement of periodicity (i.e., voice onset) to the initial burst release, which yields a negative VOT. For affricates, following the suggestions of Abramson (1995) and Abramson & Whalen (2017), both VOT and the duration of the release frication were measured, where the duration of the release frication was measured from the initial release from the closure to the end of the frication period. Burst intensity (dB SPL) was measured for stops. For affricates, intensity (dB SPL) was measured at five time points (10%, 30%, 50%, 70%, 90%) throughout the affricates’ release frication duration.
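The two timing-based measures described above can be sketched as follows. This is a minimal illustration, not the Praat procedure used in the study; the function names and all timestamps are hypothetical, and the landmarks (burst release, voice onset, frication boundaries) are assumed to have been annotated by hand already.

```python
def vot_ms(burst_release_s, voice_onset_s):
    """VOT in milliseconds: voice onset minus burst release.

    The value is positive when voicing starts after the release and
    negative when voicing leads the release (prevoicing).
    """
    return (voice_onset_s - burst_release_s) * 1000.0


def frication_sample_times(start_s, end_s, fractions=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Time points at 10%, 30%, 50%, 70%, and 90% of the release frication."""
    duration = end_s - start_s
    return [start_s + f * duration for f in fractions]


# Hypothetical landmark times (in seconds) for three tokens.
print(round(vot_ms(1.200, 1.289), 1))   # ejective with a long lag
print(round(vot_ms(1.000, 0.950), 1))   # prevoiced stop: negative VOT
print([round(t, 3) for t in frication_sample_times(2.000, 2.100)])
```

Using a single signed quantity for VOT keeps voiced, voiceless, and ejective stops on one scale, so they can enter the same statistical model.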
To examine the voice onset quality of stops and affricates, f0, jitter, and amplitude rise (i.e., intensity difference) were examined. Jitter is the variation in the duration of successive fundamental frequency cycles (Gordon & Ladefoged 2001). Jitter was obtained from a 30 ms window at voice onset and at vowel midpoint. Voiced period marks (pulses) were generated in Praat. These marks were inspected for errors and corrected by hand if necessary. The mean percent jitter ratio was calculated by taking the average absolute difference between consecutive periods, dividing it by the average period, and multiplying by 100 (Koike 1973), which is also the default method of calculating jitter in Praat. Fundamental frequency (f0) was measured by interpolating the pitch track and then extracting the f0 at voice onset and vowel midpoint from the interpolated pitch. The interpolated pitch was extracted from the voicing component of the vowel. f0 was measured at 10% of the voicing component of the vowel as the onset and 50% of the vowel duration as the vowel’s midpoint. Intensity was obtained at the voice onset and the vowel’s peak amplitude. Following Wright et al. (2002), f0, jitter, and intensity were normalized, as summarized in Table 3. f0 perturbation was calculated by subtracting the mean f0 at vowel onset from the mean f0 at vowel midpoint. Jitter perturbation was calculated in a similar way, where the mean jitter at voice onset was subtracted by the mean jitter at the vowel midpoint. RMS amplitude difference was calculated by subtracting the RMS energy (in dB) at the vowel’s peak amplitude from the RMS energy (in dB) at the vowel onset.
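The percent jitter calculation described above can be written out as a short sketch. It assumes the pulse times have already been extracted and hand-corrected; the function name and the example pulse times are hypothetical, and the sketch omits any period-exclusion settings that a tool such as Praat may apply before computing the ratio.

```python
def percent_jitter(pulse_times_s):
    """Mean absolute difference between consecutive periods,
    divided by the mean period, times 100."""
    # A period is the interval between two successive glottal pulses.
    periods = [b - a for a, b in zip(pulse_times_s, pulse_times_s[1:])]
    if len(periods) < 2:
        raise ValueError("need at least three pulses (two periods)")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    mean_abs_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return 100.0 * mean_abs_diff / mean_period


# Hypothetical pulse times (seconds) within a 30 ms window.
regular = [0.000, 0.005, 0.010, 0.015, 0.020]     # steady 5 ms periods
irregular = [0.000, 0.005, 0.011, 0.016, 0.022]   # periods alternate 5 and 6 ms

print(round(percent_jitter(regular), 2))
print(round(percent_jitter(irregular), 2))
```

Higher values at voice onset than at vowel midpoint would point to irregular (creaky-like) voicing after the consonant, which is why the measure is taken at both positions.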
Table 3. Formulas for normalized measures based on Wright et al. (2002)

For fricatives and affricates, I examined the following spectral measurements: (1) FreqM, the frequency of the main peak in the low- and/or mid-frequency ranges depending on the place of articulation (for alveolars, this is between 3–7 kHz; for post-alveolars and laterals, 1.5–4 kHz; and for posterior (dorsal) places of articulation, between 500 Hz–2 kHz) (Koenig et al. 2013; Shadle 2023); and (2) the first and second Discrete Cosine Transform (DCT) coefficients to characterize the spectral shape (Jannedy & Weirich 2017). (See the Discussion, Section 4.2, for why these measurements are preferred over spectral moments.) According to Shadle (2023), the frequency measurement FreqM is transparently related to “aspects that controlled the filter, such as the length of the front cavity” (2023: 1424). FreqM is thus a simple frequency measurement that provides a close approximation to aspects of the filter and has been used for the analysis of sibilants and other fricatives in English (Koenig et al. 2013; Shadle 2023).Footnote 2 According to Jannedy & Weirich (2017), DCT coefficients are obtained by decomposing a spectrum into a set of half-cycle and full-cycle cosine waveforms, where the amplitudes of these waveforms are the DCT coefficients. Four DCT coefficients are distinguished: DCT0 is the spectrum’s mean amplitude; DCT1 reflects the slope; DCT2 reflects the curvature; and DCT3 captures the amplitude of higher frequencies (Jannedy & Weirich 2017:399). In this study, DCT1 (which reflects the slope) and DCT2 (which reflects the curvature) were obtained. DCT coefficients were obtained in RStudio using the package emuR (Jochim et al. 2024). FreqM, DCT1, and DCT2 were extracted from multitaper spectra (Thomson 1982; Percival & Walden 1993; Reidy 2015). The multitaper method generates a consistent spectral estimate from samples of a single interval by multiplying those samples by sets of orthogonally weighted tapers (Blacklock 2004; Reidy 2015; Shadle 2012, 2023). The window length for the spectrum was 10 ms (extracted at the midpoint of the affricate and fricative duration). The recordings were downsampled to 14 kHz so that the Nyquist frequency (7 kHz) matches the cutoff frequency. Multitaper spectra were obtained in RStudio using the package multitaper (Rahim et al. 2014). Moreover, correlations between FreqM, DCT1, and DCT2 were analyzed to observe the relationship between the frequency of the main peak of each fricative/affricate and the slope and curvature of the spectrum.
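To illustrate what FreqM and the DCT coefficients capture, the sketch below computes both from a spectrum that is assumed to have been estimated already (for example, by a multitaper method). It is a plain-Python stand-in, not the emuR or multitaper code used in the study: the function names are hypothetical, the DCT uses a simple 1/N scaling that may differ from emuR's convention, and the example spectrum is invented.

```python
import math

SAMPLE_RATE_HZ = 14_000                          # rate after downsampling
NYQUIST_HZ = SAMPLE_RATE_HZ / 2                  # highest representable frequency
WINDOW_SAMPLES = round(0.010 * SAMPLE_RATE_HZ)   # samples in a 10 ms window


def dct_coefficients(spectrum_db, n_coef=4):
    """First n_coef DCT-II coefficients of a spectrum, scaled by 1/N.

    With this scaling, coefficient 0 is the mean amplitude; coefficients
    1 and 2 weight the spectrum by a half-cycle and a full-cycle cosine.
    """
    n = len(spectrum_db)
    return [sum(x * math.cos(math.pi * k * (i + 0.5) / n)
                for i, x in enumerate(spectrum_db)) / n
            for k in range(n_coef)]


def freq_m(freqs_hz, spectrum_db, lo_hz, hi_hz):
    """Frequency of the highest-amplitude bin within [lo_hz, hi_hz]."""
    in_band = [(a, f) for f, a in zip(freqs_hz, spectrum_db) if lo_hz <= f <= hi_hz]
    if not in_band:
        raise ValueError("no spectral bins inside the requested band")
    return max(in_band)[1]


# Invented spectrum: flat floor, low-frequency noise at 300 Hz, main peak at 5 kHz.
freqs = [100 * i for i in range(71)]             # 0 Hz up to the 7 kHz Nyquist
spectrum = [40.0] * len(freqs)
spectrum[3] = 70.0
spectrum[50] = 60.0

print(WINDOW_SAMPLES, NYQUIST_HZ)
print(freq_m(freqs, spectrum, 3000, 7000))       # search band given for alveolars
print([round(c, 3) for c in dct_coefficients(spectrum)])
```

Restricting the peak search to a place-specific band is what keeps low-frequency recording noise, such as the invented 300 Hz component here, from being picked as the main peak.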
2.4 Statistical analysis
Using the R package lme4 (Bates et al. 2015), the data were fitted with linear mixed-effects models, where the acoustic measurements were treated as dependent variables; segments,Footnote 3 place of articulation (PoA), laryngeal types (voiced vs. voiceless vs. ejective), and word position (initial vs. medial) were treated as fixed effects; and speakers and words were treated as random effects. An absolute t-value of 2 or greater was considered statistically significant. For stop and affricate duration contrasts, word position, PoA, and/or laryngeal types were used as random slopes; for affricate spectral measurements, PoA was used as a random slope; for fricatives, the segments themselves were used as random slopes; and for voice onset quality measurements of stops and affricates, laryngeal type was used as a random slope. The following equation gives the model for the data analysis.
(1) Acoustic measurement ∼ Segment + PoA + Laryngeal Type + Position + (Segment OR PoA OR Laryngeal Type OR Position | Speaker) + (1 | Word)
For stops, backwards difference coding was used to characterize the contrast between each of the stops’ places of articulation (i.e., alveolar compared with bilabial, velar compared with alveolar, uvular compared with velar, etc.). Similarly for fricatives, backwards difference coding was used to characterize the contrast between each fricative with respect to its place of articulation (i.e., post-alveolar compared with alveolar, lateral compared with post-alveolar, uvular compared with lateral, labio-velar compared with uvular, and labio-uvular compared with labio-velar).
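As a sketch of what backwards difference coding does, the code below builds the contrast matrix for an ordered factor so that each coefficient compares one level with the level immediately before it. The paper does not say which implementation was used; this follows the standard successive-difference scheme (the one provided in R by MASS::contr.sdif). The function name and the four-level example are illustrative assumptions.

```python
from fractions import Fraction


def backward_difference_contrasts(k):
    """Return a k x (k-1) contrast matrix as nested lists of Fractions.

    Column j (1-based) compares level j+1 with level j: rows up to j
    get -(k-j)/k and the remaining rows get j/k.
    """
    return [[Fraction(j, k) if i > j else Fraction(-(k - j), k)
             for j in range(1, k)]
            for i in range(1, k + 1)]


# Illustrative ordering of four places of articulation.
levels = ["bilabial", "alveolar", "velar", "uvular"]
matrix = backward_difference_contrasts(len(levels))

for level, row in zip(levels, matrix):
    print(f"{level:9s}", [str(x) for x in row])
```

Because adjacent rows differ by exactly 1 in one column and 0 elsewhere, each fitted coefficient is the difference between the means of two neighboring levels, which is how contrasts such as "velar compared with alveolar" are read off the model.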
3. Results
3.1 Lushootseed stops
The primary articulators for Lushootseed stops include labial /b p p̓/, coronal /d t t̓/, dorsal /ɡ k k̓ ɡʷ kʷ kʷ̓ q q̓ qʷ qʷ̓/, and glottal /ʔ/. Dorsal stops can be distinguished based on the presence/absence of the secondary articulation of labialization. In this section, I analyze each stop based on its closure duration, VOT, burst intensity, and voice onset quality. In Sections 3.1.1 and 3.1.2, I examine duration measurements (closure duration and VOT). In Section 3.1.3, I examine burst intensity, where I compare voiceless stops with ejective stops in Lushootseed. In Section 3.1.4, I examine the voice onset quality of Lushootseed stops.
3.1.1 Closure duration
Table 4 provides descriptive statistics for closure duration of the stops’ three-way laryngeal types (voiced, voiceless, ejective) with respect to word position (word-initial vs. word-medial).
Table 4. Means and standard deviations (in parentheses) of closure duration (in ms) for the stops’ three-way laryngeal types (voiced, voiceless, ejective) with respect to word position (word-initial vs. word-medial)

As Figure 2 illustrates, the closure duration of voiceless stops is longer than the closure duration of voiced stops and ejective stops. Moreover, the closure duration of ejectives did not appear to differ from that of voiced stops.

Figure 2. Closure duration (in ms) for the stops’ three-way laryngeal types (voiced, voiceless, ejective) in word-initial vs. word-medial position. Red diamonds plot the means (here and throughout).
The test from the linear mixed effects model reveals that voiceless stops have a significantly greater closure duration than voiced ($\beta$ = 13.548, SE = 3.556, t = 3.810) and ejective ($\beta$ = 17.075, SE = 4.884, t = 3.496) stops, but voiced and ejective stops did not significantly differ from each other. Moreover, there was no significant effect of word position on closure duration, which suggests that closure duration is approximately the same word-initially and word-medially.
Table 5 provides a summary of closure duration for each of the stops’ place of articulation. Figure 3 illustrates the distribution of closure duration (in ms) for each place of articulation.
Table 5. Means and standard deviations (in parentheses) of closure duration for each of the stops’ place of articulation


Figure 3. Closure duration (in ms) for each of the stops’ place of articulation.
The test from the linear mixed effects model reveals that the closure duration of alveolar stops did not significantly differ from that of bilabial stops or velar stops. The velar and uvular places of articulation did not significantly differ from one another. However, the closure duration of labio-dorsal stops was significantly shorter than the closure duration of their non-labialized counterparts: labio-velars were significantly shorter than velars ($\beta$ = −16.545, SE = 7.346, t = −2.252) and labio-uvulars were significantly shorter than uvulars ($\beta$ = −28.361, SE = 5.370, t = −5.281).
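The reported t-values can be cross-checked against the estimates and standard errors, since each t-value is the estimate divided by its standard error. The short sketch below does this for the four closure duration contrasts reported in this section. It takes the significance criterion to apply to the absolute value of t, which is an assumption consistent with the negative contrasts reported as significant; the dictionary labels are hypothetical.

```python
def t_value(beta, se):
    """t-value of a fixed effect: estimate divided by its standard error."""
    return beta / se


# (estimate, standard error, reported t) copied from the text above.
reported = {
    "voiceless vs. voiced":    (13.548, 3.556, 3.810),
    "voiceless vs. ejective":  (17.075, 4.884, 3.496),
    "labio-velar vs. velar":   (-16.545, 7.346, -2.252),
    "labio-uvular vs. uvular": (-28.361, 5.370, -5.281),
}

for label, (beta, se, t_reported) in reported.items():
    t = t_value(beta, se)
    significant = abs(t) >= 2        # assumed criterion: |t| of 2 or greater
    print(f"{label:25s} t = {t:7.3f} (reported {t_reported:7.3f}) significant: {significant}")
```

All four recomputed values agree with the reported ones to within rounding, and all four clear the threshold.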
3.1.2 Voice Onset Time
Table 6 summarizes the mean VOT (and standard deviation) for all stops in Lushootseed. There were a fair number of observations of each stop (at least 20 tokens) in the data. However, /ɡ/ occurred rarely in the data (there were only nine observations of this stop). This might be due to lexical reasons, as there are far fewer words that begin with /ɡ/ in Lushootseed. According to the Lushootseed dictionary (Bates et al. 1994), there are only 12 entries of words containing /ɡ/. Only three words containing /ɡ/ were observed: the word /ɡəqil/ ‘brightness, sunshine’ (which occurred four times, where two were preceded by the progressive prefix /lə-/), the word /ɡaqʃədid/ ‘move aside’ (which occurred twice), and the word /ɡəq̓ad/ ‘opening it (pass)’ (which occurred three times, where one of the three instances was preceded by the progressive prefix /lə-/). Of these nine tokens, six were word-initial and three were intervocalic.
Table 6. Means and standard deviations (in parentheses) of VOT (in ms) for each stop with respect to laryngeal type and place of articulation

Figure 4 illustrates the distribution of VOT for each stop with respect to their laryngeal type (voiced, voiceless, ejective) and place of articulation.

Figure 4. Box plot illustrating the VOT of each stop with respect to their laryngeal type (voiced, voiceless, ejective) and place of articulation.
As Figure 4 illustrates, VOT was the longest for ejective stops, followed by voiceless stops, and then voiced stops. The test from the linear mixed effects model reveals that ejective stops have a significantly greater VOT than voiceless stops ($\beta$ = 33.302, SE = 5.874, t = 5.670), and that voiceless stops have a significantly greater VOT than voiced stops ($\beta$ = 86.694, SE = 3.860, t = 22.458). Moreover, a place effect on VOT was observed, where dorsal stops have a significantly greater VOT than stops at anterior (bilabial and alveolar) places of articulation ($\beta$ = 63.409, SE = 6.285, t = 10.089). There was no significant difference between the two word positions (i.e., word-initial vs. word-medial (intervocalic)).
When compared with the world’s languages, the VOT of voiceless pulmonic stops in Lushootseed appears to closely match the VOT of voiceless stops in Montana Salish and Yapese (Cho & Ladefoged 1999:219; Flemming et al. 2008). When compared with languages that contrast aspirated stops with unaspirated stops, the VOT of voiceless pulmonic stops in Lushootseed was not as long as the VOT of aspirated stops in languages such as Gaelic, Apache, Navajo, Khonoma Angami, or Tlingit (Cho & Ladefoged 1999; Maddieson 2001). However, it was not as short as the VOT of unaspirated stops in those languages either. Voiceless pulmonic stops in Lushootseed could be interpreted as being slightly aspirated.
Ejectives almost always have a silent period following the burst release, and their VOT is longer than that of voiceless pulmonic stops. According to Cho & Ladefoged (1999), ejectives do not show the same restrictions as plain stops do in terms of VOT for different places of articulation. However, Figure 4 reveals that, as with voiceless stops, there appears to be a place effect on the VOT of ejective stops, where bilabial /p̓/ has the shortest VOT and /qʷ̓/ has the longest VOT. An effect of place of articulation on the VOT of ejectives has also been observed in other languages, where ejective bilabial and/or alveolar stops have a significantly shorter VOT than dorsal stops (Warner 1996; Maddieson 2001; Hajek & Stevens 2005; Stevens & Hajek 2008; Vicenik 2010; Hargus 2011; Percival 2015, 2019, 2024; Bayraktar & Ridouane 2016). The current findings appear to pattern with many languages such as Yapese (Cho & Ladefoged 1999; Maddieson 2001), Nez Perce (Maddieson 2001), Witsuwit’en (Hargus 2011), Georgian (Vicenik 2010), Hul’q’umi’num’ and Délinę Slavey (Percival 2015, 2024), and Waima’a (Hajek & Stevens 2005), where the velar and/or uvular place of articulation has a longer VOT than alveolars and bilabials. Unlike languages such as Montana Salish (Cho & Ladefoged 1999), where the VOT of ejective bilabials is greater than that of ejective alveolars, the VOT of the ejective bilabial stop in Lushootseed appeared to be the shortest on average.
Although the VOT of ejective stops was (on average) greater than that of voiceless pulmonic stops, there was considerable variability in the ejectives’ VOT. This can be observed for the ejective stops /p̓/ and /t̓/, where the distribution of their VOT is considerably wider than that of their voiceless pulmonic counterparts. There was also considerable variability in the VOT of uvular ejectives. Figure 5 shows two waveforms that illustrate the variable VOT of [t̓], where Figure 5a illustrates [t̓] with a long VOT (approximately 89 ms) while Figure 5b illustrates [t̓] with a short VOT (approximately 40 ms). Both come from the same word, /t̓ukʷ̓/ ‘go home’, told by the same speaker, Annie Jack. This suggests that the VOT of ejective stops is variable even within a single word and speaker.

Figure 5. Waveform for the ejective alveolar stop [t̓] with (a) long VOT (approximately 89ms), and (b) short VOT (approximately 40ms). Both are from the word /t̓ukʷ̓/ ‘go home’, said by the same speaker Annie Jack.
There were many instances where the ejective uvular /q̓/ and labio-uvular /qʷ̓/ had a noisier release and were often affricated. This can be observed in Figure 6, where the ejective uvular stop /q̓/ has a long release frication (rf) duration following the release, being realized as [qχ̓].

Figure 6. Waveform for the ejective uvular stop /q̓/, realized as [qχ̓], in the word /q̓ilts/ ‘his skunk cabbage’. Abbreviations: c = closure, rf = release frication, s = silence (‘i’ is the vowel /i/).
It is possible that ejective uvular and labio-uvular stops were produced with a more extended area of contact. The change in cross-sectional area of the constriction for uvulars would be much slower because of the extended area of contact between the tongue dorsum and the uvula. X-ray studies of Lebanese Arabic have shown that the tongue dorsum usually forms a constriction near the upper pharynx in the production of uvular consonants (Delattre Reference Delattre1971), which would create a greater area of contact. The rate of change in the cross-sectional area would be much slower at this region, which would generate a longer period of frication noise following the release. The speakers in this study may have produced ejective uvular stops in a similar manner. Bird (Reference Bird2016) found that in SENĆOŦEN, another Coast Salish language related to Lushootseed, uvular stops tend to have a noisier release than velar stops, and this provides a reliable perceptual cue to the contrast between these two places of articulation. While this was not always observed for plain (pulmonic) voiceless uvular stops in Lushootseed, the noisier (affricated) release can clearly be observed for ejective uvular and labio-uvular stops.
Voiced stops /b d/ were fully voiced in word-initial position (the voiced velar stops /ɡ ɡʷ/ are devoiced in word-initial position, discussed later). Most voiced stops in the data were realized with negative VOT because voicing occurred prior to the burst release. Figure 7 is a waveform of (a) the voiced bilabial stop /b/, and (b) the voiced alveolar stop /d/, in word-initial position.

Figure 7. Waveform and spectrogram of (a) voiced bilabial stop /b/ in the word /bitɬ̓idəxʷ/ ‘pound it now’, and (b) voiced alveolar stop /d/ in the word /dayˀiləxʷ/ ‘just now’.
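The sign convention behind "negative VOT" can be sketched in a few lines. This is a minimal illustration and not the measurement procedure used in this study: the function name `voice_onset_time_ms` and the landmark times are hypothetical, and the sketch assumes that the burst release and the onset of voicing have already been annotated by hand.

```python
def voice_onset_time_ms(burst_time_s: float, voicing_onset_s: float) -> float:
    """Return VOT in milliseconds: voicing onset minus burst release.

    Negative values mean voicing began before the burst (prevoicing);
    positive values mean voicing began after the burst (voicing lag).
    """
    return (voicing_onset_s - burst_time_s) * 1000.0


# Hypothetical landmark times (in seconds), not taken from the recordings.
prevoiced = voice_onset_time_ms(burst_time_s=0.250, voicing_onset_s=0.170)
lagged = voice_onset_time_ms(burst_time_s=0.250, voicing_onset_s=0.290)

print(round(prevoiced, 1), round(lagged, 1))
```

Under this convention, a fully voiced word-initial /b/ or /d/ yields a negative value, while voiceless and ejective stops yield positive values.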
As Figure 7 illustrates, the amplitude of the vocal fold vibration for both /b/ and /d/ gradually increased from the onset to about the middle of the voicing component, and then steadily decreased as it approached the burst release. The initial rise in voicing amplitude at the beginning of the stop closure is attributed to a slight lowering of the larynx, which expands the cavity above the glottis, allowing vocal fold vibration to be maintained (Kent & Moll Reference Kent and Moll1969; Perkell Reference Perkell1969; Ewan & Krones Reference Ewan and Krones1974; Westbury Reference Westbury1979, Reference Westbury1983; Riordan Reference Riordan1980). This is accompanied by a passive expansion of the pharyngeal cavity, where the depression of the hyoid bone pulls the tongue body forward, creating a greater volume in the pharyngeal cavity (Kent & Moll Reference Kent and Moll1969; Ladefoged & Maddieson Reference Ladefoged and Maddieson1996). The arytenoid cartilages are in a neutral position, neither pulled apart nor pushed together (Ladefoged & Maddieson Reference Ladefoged and Maddieson1996:50). However, once the larynx ceases to lower any further, the volume of the expanded cavity above the vocal folds remains static for the rest of the articulatory closure. This leads to a gradual build-up of intraoral pressure, which causes a gradual decrease in voicing amplitude toward the burst release. As the intraoral pressure increases, airflow through the glottis decreases, and eventually voicing dies out (Lindau Reference Lindau1984:148). This contrasts with implosives, where the larynx continually lowers throughout the articulatory closure. In the case of implosives, the lowering of the larynx keeps the supraglottal pressure from increasing. This allows the amplitude of the vocal fold vibration to gradually increase throughout the closure (Lindau Reference Lindau1984; Ladefoged & Maddieson Reference Ladefoged and Maddieson1996).
Maddieson (Reference Maddieson, Haspelmath, Dryer, Gil and Comrie2008) and Cun Xi (Reference Cun2009) suggested that Lushootseed is one of the few Indigenous American languages that has implosives (and not voiced stops) in its phonemic inventory. However, as the current findings reveal (as illustrated in Figure 7), the amplitude of the voicing component gradually decreased toward the burst release (unlike in implosives, where it increases throughout the closure). This suggests that the voiced stops /b d/ in Lushootseed are pulmonic voiced stops and not implosives.
Voiced velar /ɡ/ and labio-velar /ɡʷ/ are usually devoiced word-initially, while they are usually fully voiced intervocalically. This is illustrated in Figure 8, where the voiced velar stop /ɡ/ is devoiced word-initially in the word /ɡaqʃədid/ ‘move aside’ (Figure 8a), while it is fully voiced throughout the closure intervocalically (Figure 8b).

Figure 8. Voiced velar stop /ɡ/ in (a) word-initial position, in the word /ɡaqʃədid/ ‘move aside’; and (b) intervocalically, in the word /ləɡəq̓ətəb/ ‘opening it (pass)’.
3.1.3 Burst intensity
Descriptive statistics for the burst intensity of voiceless and ejective stops are summarized in Table 7. The burst intensity for ejective stops was relatively greater than that of their voiceless pulmonic counterparts at each place of articulation (except for the bilabial place of articulation, where ejective and voiceless bilabial stops did not appear to differ significantly in burst intensity). The burst intensity of voiced stops was omitted from the analysis because there were too many error points for voiced stops (see Footnote 4).
Table 7. Means and standard deviations (in parentheses) of burst intensity (in dB SPL) for voiceless and ejective stops across each stop place of articulation

Figure 9 is a box plot illustrating the distribution of burst intensity for voiceless and ejective stops. Burst intensity for ejectives was relatively greater than the burst intensity of voiceless pulmonic stops across each place of articulation. The results from the linear mixed effects model reveal that ejective stops have a significantly greater burst intensity than voiceless stops ($\beta$ = 5.5062, SE = 1.4291, t = 3.853). This suggests that ejectives in Lushootseed have properties of stiff ejectives, where the burst intensity is relatively greater.

Figure 9. Box plot illustrating the distribution of burst intensity (in dB) for voiceless and ejective stops across each place of articulation.
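The t values reported for the mixed effects models are the ratio of each estimate to its standard error. The sketch below recomputes the t value for the burst-intensity comparison from the reported estimate and standard error. The helper name `t_value` is hypothetical, and the threshold of |t| ≥ 2 is a common rule of thumb assumed here for illustration; it is not a criterion stated in this section.

```python
def t_value(estimate: float, std_error: float) -> float:
    """Wald t statistic for a fixed effect: estimate divided by its standard error."""
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    return estimate / std_error


# Estimate and SE reported above for ejective vs. voiceless burst intensity.
t_burst = t_value(5.5062, 1.4291)

# ASSUMPTION: |t| >= 2 used as an informal significance threshold.
ASSUMED_THRESHOLD = 2.0
print(round(t_burst, 3), abs(t_burst) >= ASSUMED_THRESHOLD)
```

The same ratio can be used to cross-check the other estimates reported in this section.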
The data for VOT and burst intensity suggest that ejectives in Lushootseed may have properties of stiff ejectives because of their relatively louder burst intensity and longer VOT, which was always greater than that of voiceless pulmonic stops. However, as we will see in the next section, ejectives also have properties of slack ejectives, where f0 is depressed and voice quality is creaky at voice onset.
3.1.4 Voice onset quality
Figure 10 is a bar graph illustrating the stops’ (a) f0 perturbation, (b) jitter perturbation, and (c) intensity difference.

Figure 10. Bar graph illustrating the stop’s normalized measurements for voice onset quality: (a) f0 perturbation, (b) jitter perturbation, and (c) intensity difference.
As Figure 10a illustrates, the f0 perturbation for ejective stops is the most depressed. Voiced stops also reveal depressed f0 perturbation, though not to the same extent as ejectives. The f0 perturbation for voiceless stops is slightly raised. The results from the linear mixed effects model reveal that voiceless stops have a significantly greater f0 perturbation than voiced stops ($\beta$ = 6.524, SE = 3.316, t = 2.968), and voiced stops significantly greater than ejectives ($\beta$ = 7.543, SE = 1.879, t = 4.014). This confirms that the f0 perturbation was the most depressed for ejective stops, less depressed for voiced stops, and slightly raised for voiceless stops (i.e., Voiceless > Voiced > Ejective). Figure 10b illustrates that jitter perturbation was the greatest for ejectives, which suggests that ejectives are realized with a creaky voice onset. Voiced and voiceless stops tend to be realized with more modal voicing (with a lower jitter perturbation) than ejectives. The results from the linear mixed effects model reveal that ejectives have a greater jitter perturbation than voiced ($\beta$ = 1.6262, SE = 0.4074, t = 3.992) and voiceless ($\beta$ = 1.6613, SE = 0.3683, t = 4.511) stops, but voiced and voiceless stops did not significantly differ. For intensity difference, ejective stops did not significantly differ from voiced and voiceless stops. There was no significant effect of word position on these three measurements.
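As a rough illustration of why a higher jitter value points to less periodic (creakier) voicing, the sketch below computes local jitter, i.e., the mean absolute difference between consecutive glottal periods divided by the mean period. This is the conventional local-jitter definition and is assumed here for illustration only; the normalized jitter perturbation measure used in this study may be computed differently. The function name `local_jitter` and the period values are invented.

```python
from statistics import mean
from typing import Sequence


def local_jitter(periods_s: Sequence[float]) -> float:
    """Mean absolute difference between consecutive periods, relative to the mean period."""
    if len(periods_s) < 2:
        raise ValueError("at least two glottal periods are required")
    diffs = [abs(b - a) for a, b in zip(periods_s, periods_s[1:])]
    return mean(diffs) / mean(periods_s)


# Invented glottal period durations (in seconds) at vowel onset.
modal_periods = [0.0050, 0.0050, 0.0051, 0.0050, 0.0050]   # nearly periodic
creaky_periods = [0.0050, 0.0062, 0.0047, 0.0066, 0.0049]  # irregular

print(local_jitter(modal_periods) < local_jitter(creaky_periods))
```

Irregular cycle-to-cycle durations, as in creaky voicing, raise the value, whereas perfectly regular periods give a jitter of zero.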
3.1.5 Summary of Lushootseed stops
The current findings reveal that voiceless stops have a longer closure duration than voiced and ejective stops, where voiced and ejective stops have a similar closure duration. Moreover, labio-dorsal stops have a shorter closure duration than their plain (non-labialized) dorsal counterparts. The VOT of ejectives is relatively greater than their voiceless pulmonic counterparts. Contrary to Maddieson (Reference Maddieson, Haspelmath, Dryer, Gil and Comrie2008) and Cun Xi (Reference Cun2009), Lushootseed voiced stops should be analyzed as pulmonic voiced stops and are not implosives. Ejective uvular stops have a noisier (and often affricated) release, which was not always observed for velars and plain (voiceless) uvular stops. Ejective stops have a relatively louder burst intensity than plain (voiceless) stops. For voice onset quality measurements, ejectives have a depressed f0 at voice onset and a creaky voice onset quality. This suggests that ejectives in Lushootseed have both stiff and slack properties.
3.2 Lushootseed affricates
Affricates in Lushootseed occur at three places of articulation: alveolar /dz ts ts̓/ ‹dᶻ c c̓›, post-alveolar /dʒ tʃ tʃ̓/ ‹ǰ č č̓›, and lateral /tɬ̓/ ‹ƛ̓›. Pulmonic affricates in Lushootseed are sibilants, which are the most common type of affricate found in the world’s languages (Ladefoged & Maddieson Reference Ladefoged and Maddieson1996). The other affricates involve a glottalic egressive airstream mechanism (i.e., ejectives), which include /ts̓ tʃ̓ tɬ̓/ ‹c̓ č̓ ƛ̓› (where /tɬ̓/ ‹ƛ̓› does not have a pulmonic counterpart). In this section, I examine each affricate in Lushootseed. In Sections 3.2.1 and 3.2.2, duration measurements (closure duration, VOT, and release frication duration) for each affricate are examined. In Section 3.2.3, spectral measurements (FreqM, DCT1, and DCT2) for each affricate are examined. In Section 3.2.4, intensity characteristics of each affricate are examined. In Section 3.2.5, measurements of the voice onset quality of the affricate’s following vowel are examined.
3.2.1 Closure duration
Table 8 summarizes the closure duration for the affricates’ three-way laryngeal types with respect to their word position.
Table 8. Means and standard deviations (in parentheses) of closure duration (in ms) for the affricates’ three-way laryngeal types (voiced, voiceless, ejective) with respect to word position (word-initial vs. word-medial)

As Figure 11 illustrates, there was an effect of word position on closure duration for the affricates’ three-way laryngeal contrast. In word-initial position, voiced affricates had the longest closure duration. However, in word-medial position, the pattern resembles that of stops (see Section 3.1.1 for more detail), where voiced and ejective affricates have a shorter closure duration than voiceless affricates.
The test from the linear mixed effects model reveals that, in word-initial position, voiced affricates have a significantly greater closure duration than voiceless ($\beta$ = 14.204, SE = 6.946, t = 2.045) and ejective ($\beta$ = 17.936, SE = 6.016, t = 2.981) affricates, but voiceless and ejective affricates did not significantly differ from each other. In word-medial position, however, voiceless affricates had a significantly greater closure duration than voiced ($\beta$ = 15.996, SE = 8.814, t = 2.815) and ejective ($\beta$ = 13.095, SE = 5.546, t = 2.361) affricates, but voiced and ejective affricates did not significantly differ from each other.
Table 9 provides a summary of the closure duration of affricates with respect to their place of articulation. Figure 12 is a box plot illustrating the closure duration for each of the affricates’ places of articulation.
The test from the linear mixed effects model reveals that the alveolar place of articulation has a greater closure duration than post-alveolar ($\beta$ = 8.128, SE = 3.030, t = 2.682) and laterals ($\beta$ = 12.274, SE = 3.986, t = 3.079), but post-alveolar and laterals did not significantly differ in closure duration.
Overall, affricates appear to have a shorter closure duration than stops. Figure 13 illustrates the closure duration (in ms) of stops and affricates. To test the effect of manner of articulation on closure duration, manner (stops vs. affricates) was used as a fixed effect in the linear mixed effects model, which revealed that affricates have a significantly shorter closure duration than stops ($\beta$ = −22.144, SE = 3.783, t = −5.853).
Table 9. Means and standard deviations (in parentheses) of closure duration (in ms) for each of the affricates’ place of articulation


Figure 11. Closure duration (in ms) for the affricates three-way laryngeal types (voiced, voiceless, ejective) in word-initial vs. word-medial position.

Figure 12. Closure duration (in ms) for each of the affricates’ place of articulation.

Figure 13. Closure duration (in ms) for stops and affricates.
3.2.2 Voice onset time and release frication duration
Means and standard deviations of VOT and release frication duration for each affricate are summarized in Table 10. The voiceless alveolar affricate /ts/ ‹c› had a VOT comparable to that of the voiceless post-alveolar affricate /tʃ/ ‹č›. Of the 23 instances of the voiced alveolar affricate /dz/ ‹dᶻ› that were examined, nine were realized with a positive VOT, which could narrowly be transcribed as [d̥z]. The realization of the voiced alveolar affricate /dz/ ‹dᶻ› as either devoiced (i.e., positive VOT) or voiced (i.e., negative VOT) was not predictable from its conditioning environment. Ejective affricates (overall) had a significantly greater VOT than voiceless affricates ($\beta$ = 23.829, SE = 4.777, t = 4.989) and voiced affricates ($\beta$ = 119.285, SE = 7.093, t = 16.818). However, there was no significant difference with respect to place of articulation and word position (i.e., word-initial vs. word-medial). The release frication duration was the longest for voiceless affricates, the shortest for voiced affricates, and in between for ejective affricates. Voiced affricates have a significantly shorter release frication duration than voiceless ($\beta$ = −21.615, SE = 4.816, t = −4.488) and ejective ($\beta$ = −11.852, SE = 5.283, t = −2.243) affricates. There were considerable cross-speaker differences in the frication duration for the alveolar place of articulation, where the ejective alveolar affricate had a greater frication duration than the voiceless one for the speaker Martha Lamont but not for the speaker Annie Jack. Figure 14 is a box plot illustrating the affricates’ (a) VOT, and (b) release frication duration.
Table 10. Means and standard deviations (in parentheses) of VOT (in milliseconds) and release frication duration (in milliseconds) for each affricate’s laryngeal type with respect to place of articulation


Figure 14. Box plot illustrating the distribution of (a) affricates’ VOT, and (b) affricates’ frication duration.
There was considerable variation in the release frication duration of the lateral ejective affricate. Two waveforms of the lateral ejective affricate /tɬ̓/ ‹ƛ̓› are illustrated in Figure 15. As Figure 15a highlights, there was a prolonged period of frication following the transient burst, which suggests that these segments are realized as affricates. However, there were a few instances where the frication duration was short. This can be observed in Figure 15b, which is a waveform of /tɬ̓/ ‹ƛ̓› with a shorter release frication.

Figure 15. Waveform of the lateral ejective affricate /tɬ̓/ ‹ƛ̓› with (a) long release frication, and (b) short release frication. Abbreviations: c = closure, rf = release frication, s = silence (‘u’ and ‘a’ are the vowels /u/ and /a/ respectively).
3.2.3 Spectral properties
Table 11 summarizes the spectral measurements FreqM, DCT1, and DCT2 for each affricate in Lushootseed. (It should be noted that these measures were obtained along the frication component of each affricate.)
As Table 11 summarizes, FreqM for alveolar affricates [dz ts ts̓] was higher than post-alveolars [dʒ tʃ tʃ̓], and post-alveolars higher than laterals [tɬ̓]. Surprisingly, FreqM for the lateral ejective affricate [tɬ̓] was the lowest. DCT1 was the smallest for the alveolar place of articulation, highest for the lateral, and in-between for the post-alveolar affricates. DCT2 was the highest for alveolars, lowest for post-alveolars, and highly variable for laterals. Figure 16 is a box plot illustrating the distributions of FreqM (Figure 16a), DCT1 (Figure 16b), and DCT2 (Figure 16c) for each affricate.
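To make the roles of DCT1 and DCT2 more concrete, the sketch below computes the first three coefficients of an unnormalized type-II discrete cosine transform of a spectrum. It is an illustration under stated assumptions and not the analysis pipeline of this study: the function name `dct_coefficients`, the scaling, and the toy spectra are hypothetical, and sign and scaling conventions differ between implementations. Under the convention used here, coefficient 0 tracks the overall level, coefficient 1 the tilt of the spectrum (positive when energy falls toward higher frequencies), and coefficient 2 its curvature.

```python
import math
from typing import List, Sequence


def dct_coefficients(spectrum_db: Sequence[float], n_coeffs: int = 3) -> List[float]:
    """Unnormalized DCT-II of a spectrum sampled at equally spaced frequencies."""
    n = len(spectrum_db)
    if n == 0:
        raise ValueError("spectrum must not be empty")
    return [
        sum(level * math.cos(math.pi * k * (i + 0.5) / n)
            for i, level in enumerate(spectrum_db))
        for k in range(n_coeffs)
    ]


# Invented toy spectra (in dB) over 8 equally spaced frequency bins.
falling = [40, 36, 32, 28, 24, 20, 16, 12]  # energy concentrated at low frequencies
rising = list(reversed(falling))            # energy concentrated at high frequencies
flat = [26] * 8

dct_falling = dct_coefficients(falling)
dct_rising = dct_coefficients(rising)
dct_flat = dct_coefficients(flat)

print(dct_falling[1] > 0, dct_rising[1] < 0, abs(dct_flat[1]) < 1e-9)
```

With this convention, a spectrum whose energy is concentrated at lower frequencies yields a larger coefficient 1, which matches the direction of the place differences summarized in Table 11.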
Alveolars tend to reach peaks well above 7kHz (Shadle & Scully Reference Shadle and Scully1995; Shadle Reference Shadle, Cohn, Fougeron and Huffman2012, Reference Shadle2023; Koenig et al. Reference Koenig, Shadle, Preston and Mooshammer2013). However, FreqM for the alveolar place of articulation in the current data is relatively low. The relatively low FreqM for the alveolar place of articulation might be explained by the upper frequency cutoff in these recordings (near 5∼7kHz), where there was less noise intensity concentrated at the upper frequency range. This may have obscured absolute measures of FreqM for the alveolar place of articulation.
Table 11. Means and standard deviations (in parentheses) for spectral measurements (FreqM, DCT1, and DCT2) for each affricate in Lushootseed


Figure 16. Boxplots for (a) FreqM (in Hz), (b) DCT1, and (c) DCT2, for each affricate. Green is the alveolar place of articulation, pink is post-alveolar, and purple is lateral.
The results from the linear mixed effects model reveal that the alveolar place of articulation has a significantly greater FreqM than post-alveolars ($\beta$ = 1212.03, SE = 121.702, t = 9.959), and post-alveolars significantly greater than laterals ($\beta$ = 1006.538, SE = 317.102, t = 3.174). There was no significant difference across voice quality (voiced, voiceless, and ejective) for FreqM. For DCT1, the alveolar place of articulation was significantly less than post-alveolars ($\beta$ = −0.870, SE = 0.207, t = −4.185), and post-alveolars significantly less than laterals ($\beta$ = −1.25, SE = 0.495, t = −2.528). Moreover, there were differences across the voice quality in DCT1, where DCT1 for voiced affricates was significantly greater than voiceless affricates ($\beta$ = 0.593, SE = 0.1426, t = 4.159) and voiceless affricates significantly greater than ejective affricates (except for the lateral place of articulation) ($\beta$ = 0.4837, SE = 0.1294, t = 3.737). For DCT2, the alveolar place of articulation was significantly greater than post-alveolars ($\beta$ = 0.7851, SE = 0.2846, t = 2.758). However, laterals did not significantly differ from alveolars and post-alveolars. There was also an effect of voice quality on DCT2, where voiced affricates had a significantly greater DCT2 than voiceless ($\beta$ = 0.884, SE = 0.2772, t = 3.190) and ejective ($\beta$ = 0.8681, SE = 0.1763, t = 4.921) affricates, but voiceless and ejective affricates did not significantly differ from each other.
In other languages, the frequency of the peak for lateral affricates and fricatives tends to be closer to the peak of post-alveolar affricates and fricatives (Ladefoged & Maddieson Reference Ladefoged and Maddieson1996; Gordon et al. Reference Gordon, Barthmaier and Sands2002; Percival Reference Percival2016a; Hargus et al. Reference Hargus, Levow and Wright2020). However, the frequency of the main peak for the lateral ejective affricate was significantly lower than that of post-alveolars in the current data. It is possible that the articulatory release for the lateral ejective affricate was made posteriorly along the sides of the palate. The more posterior the constriction, the lower the frequency of the main peak (Forrest et al. Reference Forrest, Weismer, Milenkovic and Dougall1988; Gordon et al. Reference Gordon, Barthmaier and Sands2002). However, it is also possible that the low FreqM was due to the length of the buccal cavities during the release.
The three spectral measurements (FreqM, DCT1, and DCT2) were fitted into a linear model to test the relationship between FreqM and DCT1, FreqM and DCT2, and DCT1 and DCT2. The results from the linear model reveal that FreqM and DCT1 are significantly (and negatively) correlated, $R^2$ = .57, F(1, 189) = 254.1, p < .001, where the lower the FreqM, the greater the slope (DCT1) of the spectrum. However, neither FreqM nor DCT1 correlates with DCT2. Figure 17 illustrates the linear relationship between FreqM and DCT1: as FreqM decreases, DCT1 (i.e., the slope of the spectrum) increases. This is consistent with the findings of Jannedy & Weirich (Reference Jannedy and Weirich2017), where the slope of the spectrum increases as the length of the cavity in front of the constriction increases.

Figure 17. Scatter plot with a simple regression line (with 95% confidence bands) illustrating the relationship between FreqM and DCT1.
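For a regression with a single predictor, R² and the F statistic are tied together by F = R² / (1 − R²) × residual degrees of freedom. The sketch below fits an ordinary least-squares line to invented toy data and then applies that identity to the reported R² of .57 with 189 residual degrees of freedom, which gives roughly 250.5; the small gap from the reported 254.1 is what one would expect when R² is rounded to two decimals. The function names `simple_regression` and `f_from_r_squared` and the toy data are hypothetical and are not part of the study's analysis.

```python
from typing import Sequence, Tuple


def simple_regression(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (slope, intercept, r_squared) for an ordinary least-squares line."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length (at least 3)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy ** 2 / (sxx * syy)


def f_from_r_squared(r_squared: float, df_residual: int) -> float:
    """F statistic of a one-predictor regression, recovered from R-squared."""
    return r_squared / (1.0 - r_squared) * df_residual


# Invented toy data: higher peak frequency (Hz) paired with a lower spectral tilt.
freq_hz = [1500.0, 2000.0, 2500.0, 3000.0, 3500.0, 4000.0]
tilt = [3.1, 2.6, 2.4, 1.7, 1.5, 0.9]
slope, intercept, r2 = simple_regression(freq_hz, tilt)

# Values reported in the text: R-squared = .57 with 189 residual degrees of freedom.
f_check = f_from_r_squared(0.57, 189)
print(slope < 0, round(f_check, 1))
```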
3.2.4 Intensity characteristics
The intensity (sound pressure level, in dB) of the affricates’ release frication was compared for the three laryngeal types (voiced, voiceless, and ejective) across each place of articulation (alveolar, post-alveolar, and lateral) at five different time points: 10%, 30%, 50%, 70%, and 90% of the frication duration. Figure 18 is an example of a waveform and spectrogram illustrating the five time points where intensity was extracted for the affricate [tʃ̓] ‹č̓›.

Figure 18. Example of time points in red (10%, 30%, 50%, 70%, 90%) where intensity was extracted for the ejective post-alveolar affricate [tʃ̓] ‹č̓›.

Figure 19. Frication intensity (in dB) for five time points (10%, 30%, 50%, 70%, and 90%) of the total frication duration for voiced, voiceless, and ejective affricates by place of articulation (alveolar, post-alveolar, and lateral).
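The sampling scheme at 10%, 30%, 50%, 70%, and 90% of the frication duration can be sketched as follows. This is a simplified illustration and not the extraction script used in this study: the function names, the 10 ms analysis window, and the synthetic decaying signal are assumptions, and the levels are in dB relative to an arbitrary reference of 1.0 rather than calibrated SPL.

```python
import math
from typing import List, Sequence

PROPORTIONS = (0.10, 0.30, 0.50, 0.70, 0.90)


def proportional_time_points(start_s: float, end_s: float,
                             proportions: Sequence[float] = PROPORTIONS) -> List[float]:
    """Times located at the given proportions of the interval [start_s, end_s]."""
    if end_s <= start_s:
        raise ValueError("end must be later than start")
    duration = end_s - start_s
    return [start_s + p * duration for p in proportions]


def rms_db(samples: Sequence[float], reference: float = 1.0) -> float:
    """RMS level in dB relative to an arbitrary reference (not calibrated SPL)."""
    if not samples:
        raise ValueError("no samples in window")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(max(rms, 1e-12) / reference)


def intensity_at(signal: Sequence[float], sample_rate_hz: int,
                 time_s: float, window_s: float = 0.010) -> float:
    """RMS level in a window centred on time_s (ASSUMED window length: 10 ms)."""
    half = int(round(window_s * sample_rate_hz / 2))
    centre = int(round(time_s * sample_rate_hz))
    lo, hi = max(0, centre - half), min(len(signal), centre + half)
    return rms_db(signal[lo:hi])


# Synthetic 100 ms "frication" whose amplitude decays linearly (invented signal).
sample_rate_hz = 16000
n_samples = 1600
signal = [(1.0 - 0.9 * i / n_samples) * math.sin(2 * math.pi * 3000 * i / sample_rate_hz)
          for i in range(n_samples)]

points = proportional_time_points(0.0, 0.100)
levels = [intensity_at(signal, sample_rate_hz, t) for t in points]
print([round(level, 1) for level in levels])
```

Because the synthetic signal decays steadily, the five levels fall from the first time point to the last, which is the kind of profile described below for the lateral ejective affricate.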
The extracted intensity values for each of the affricates’ laryngeal types are illustrated in Figure 19, which is a line chart (with 95% confidence intervals) plotting the average intensity of the release frication at each time point. There are some noteworthy patterns that can be observed in the intensity of the release frication with respect to the affricates’ laryngeal type. As Figure 19 illustrates, intensity drops at 90% of the frication component for ejective affricates. However, intensity increases at 90% for voiced affricates. There is a small change in intensity for voiceless affricates at 90%. Moreover, unlike the ejective alveolar and post-alveolar affricates (which reveal a steady rise in intensity from 10% to about 50% and 70%, and then a fall at 90%), lateral ejective affricates reveal a continual fall in intensity throughout their entire duration (from 10% to 90%).
The results from the linear mixed effects model reveal that there was a significant effect of laryngeal type on intensity at 90% of the release frication, where the intensity of the release frication for voiced affricates was greater than that of voiceless affricates ($\beta$ = 3.62, SE = 0.90, t = 4.024), and voiceless affricates greater than ejectives ($\beta$ = 6.00, SE = 0.9332, t = 6.432). A similar pattern was observed in Délinę Slavey and Eastern Oromo (Percival Reference Percival2016a,Reference Percivalb), where there is a dip in intensity at the end of the release frication period for ejective affricates, a rise in intensity at the end of the release frication period for voiced (or voiceless unaspirated in the case of Délinę Slavey) affricates, and voiceless (or voiceless aspirated in the case of Délinę Slavey) affricates in-between.
3.2.5 Voice onset quality
Figure 20 illustrates the affricates’ (a) f0 perturbation, (b) jitter perturbation, and (c) intensity difference.

Figure 20. Bar graph illustrating the affricates’ normalized measurements for voice onset quality: (a) f0 perturbation, (b) jitter perturbation, and (c) intensity difference.
As Figure 20a illustrates, like stops, the f0 perturbation for ejective affricates was the most depressed, followed by voiced affricates, and voiceless affricates were raised. The results from the linear mixed effects model indicate that the affricates’ laryngeal type had a significant effect on the f0 perturbation of the following vowel, where (like stops) ejective affricates had a significantly lower f0 than voiced affricates (β = −7.798, SE = 3.68, t = −2.118), and voiceless significantly greater than voiced (β = 13.158, SE = 3.67, t = 3.579) (i.e., Voiceless > Voiced > Ejective). Figure 20b illustrates that jitter perturbation for ejective affricates was the greatest, which suggests that there was greater aperiodicity (or creaky voicing) at the onset of the following vowel. The results from the linear mixed effects model reveal that ejective affricates had a significantly greater jitter perturbation than voiceless affricates ($\beta$ = 2.139, SE = 1.061, t = 2.015), and voiceless affricates significantly greater than voiced ($\beta$ = 1.3013, SE = 0.559, t = 2.325) (i.e., Ejective > Voiceless > Voiced). Unlike stops, the intensity difference for ejective affricates (as illustrated in Figure 20c) was greater (or “steeper”) than that of voiced and voiceless affricates. This suggests that the amplitude at voice onset for ejective affricates was significantly lower than that of voiced and voiceless affricates relative to the vowel’s maximum intensity. The intensity difference for ejective affricates was significantly greater than voiced ($\beta$ = 1.665, SE = 0.6621, t = 2.515) and voiceless ($\beta$ = 1.364, SE = 0.4203, t = 3.245) affricates, but voiceless affricates did not significantly differ from voiced affricates (i.e., Ejective > Voiceless, Voiced). There was no significant effect of word position on these three measurements.
3.2.6 Summary of Lushootseed affricates
In word-initial position, voiced affricates have a longer closure duration than voiceless and ejective affricates. However, in word-medial position, the closure duration of affricates parallels that of stops: voiceless affricates have a greater closure duration than voiced and ejective affricates, while voiced and ejective affricates have a similar closure duration. Overall, affricates have a shorter closure duration than stops. The VOT of ejective affricates was greater than that of voiceless and voiced affricates. The release frication duration was the longest for voiceless affricates, the shortest for voiced affricates, and in between for ejective affricates. For spectral estimates, FreqM and DCT1 differed significantly across the affricates’ places of articulation and were negatively correlated with each other, where the lower the FreqM, the greater the slope. The intensity of the release frication differed across the affricates’ laryngeal types (voiced, voiceless, and ejective), where ejective affricates revealed a falling intensity at 90% of the frication component, voiced affricates a rising intensity at 90%, and voiceless affricates in-between. For voice onset quality, ejectives can be characterized as having a depressed f0 at voice onset, creaky voiced onset, and greater intensity difference between the voice onset and the vowel’s peak amplitude.
3.3 Lushootseed fricatives
In Lushootseed, all fricatives are pulmonic and voiceless. There are seven fricatives in Lushootseed, distinguished by place of articulation: /s ʃ ɬ xʷ χ χʷ h/ ‹s š ɬ xʷ x̌ x̌ʷ h›. In this section, I examine the acoustic properties of six of these fricatives, namely /s ʃ ɬ xʷ χ χʷ/ ‹s š ɬ xʷ x̌ x̌ʷ›. Two recordings from the speaker Annie Jack and one recording from the speaker Martha Lamont were analyzed because these recordings have a frequency cutoff at approximately 7kHz, making them suitable for the analysis of [s]. Fricatives were examined in word-initial, word-medial, and word-final positions. In what follows, I provide a qualitative analysis of the spectral properties of each fricative (Sections 3.3.1 to 3.3.3) and a quantitative analysis (Section 3.3.4) that examines each fricative based on its spectral estimates.
3.3.1 Sibilants
Collectively, 52 tokens of /s/ and 28 tokens of /ʃ/ ‹š› were observed in the data. Tokens of /s/ tend to reach their peak amplitude near 4–5kHz, whereas tokens of /ʃ/ ‹š› tend to reach their peak near 2kHz. Figure 21 shows sample multitaper spectrums of /s/ and /ʃ/ ‹š›.

Figure 21. Multitaper spectrums extracted from the midpoint (10ms windows) of the fricative duration, for (a) the voiceless alveolar fricative [s], and (b) the voiceless post-alveolar fricative [ʃ] ‹š›.
As Figure 21 illustrates, the intensity of the peak for the voiceless post-alveolar fricative [ʃ] ‹š› is much greater than that of the voiceless alveolar fricative [s], while the frequency of the main peak for [s] is much higher than that of [ʃ] ‹š›.
Another observation worth noting is the second formant band (near 2kHz) for [s], which can also be observed in Figure 21. There were a few instances where [s] had a second formant band along the total fricative duration. This often occurred in word-medial (intervocalic) position. Figure 22 is a waveform and spectrogram illustrating the second formant band (around 2kHz) along the low- to mid-frequency range for [s] in the word /ʔusil/ ‘dive’.

Figure 22. Waveform and spectrogram illustrating the voiceless alveolar fricative [s] with mid-frequency bands, with the red arrow pointing at the formant structure near 1750Hz.
The second formant band was quite common in word-medial position. According to Richard Wright and Christine Shadle (personal communication), there may have been less lateral tongue bracing along the sides of the palate during the production of the voiceless alveolar fricative [s] in connected speech. This would generate turbulent airflow through a wider passage along the surfaces of the palate, which may introduce extra pole(s) in the transfer function of the signal. Moreover, there may be greater vowel-to-consonant coarticulation when [s] is produced intervocalically in connected speech. These may account for the second formant band observed in [s].
3.3.2 Lateral fricatives
There was a total of 40 lateral fricatives in the data. The voiceless lateral fricative [ɬ] had a unique property of sometimes introducing two peaks in the spectrum. This can be observed in Figure 23, where the arrows point to the two peaks in the multitaper spectrum. As Figure 23 illustrates, the first peak (which was near 2000Hz) has greater intensity than the second peak (which was near 3400Hz). The frequency of the main peak (i.e., the peak with the highest amplitude) occupies the same range as the main peak of the post-alveolar fricative [ʃ] ‹š› (cf. Figure 21b), which is also near 2000Hz.

Figure 23. Multitaper spectrum for the lateral fricative [ɬ], with red arrows pointing at the two peaks.
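Selecting the main peak when a spectrum has more than one can be sketched as below. This is a simplified illustration and not the FreqM procedure used in this study, which may restrict the search to a particular frequency band: the function name `spectral_peaks` is hypothetical, and the toy spectrum is invented to resemble the two-peak shape described above.

```python
from typing import List, Sequence, Tuple


def spectral_peaks(freqs_hz: Sequence[float],
                   levels_db: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (frequency, level) for every local maximum, strongest first."""
    if len(freqs_hz) != len(levels_db):
        raise ValueError("frequencies and levels must have the same length")
    peaks = [
        (freqs_hz[i], levels_db[i])
        for i in range(1, len(levels_db) - 1)
        if levels_db[i - 1] < levels_db[i] > levels_db[i + 1]
    ]
    return sorted(peaks, key=lambda peak: peak[1], reverse=True)


# Invented toy spectrum with two peaks, near 2000 Hz and 3400 Hz.
freqs_hz = [1000, 1400, 2000, 2600, 3000, 3400, 4000, 4600]
levels_db = [20, 28, 41, 30, 27, 35, 24, 18]

peaks = spectral_peaks(freqs_hz, levels_db)
main_peak_hz = peaks[0][0]  # the peak with the highest amplitude
print(peaks, main_peak_hz)
```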
3.3.3 Dorsal fricatives
There was a total of 26 [χ] ‹x̌›, 42 [xʷ], and 30 [χʷ] ‹x̌ʷ› observed in the data. Figure 24 illustrates samples of multitaper spectrums extracted for (a) labio-velar [xʷ], (b) uvular [χ] ‹x̌›, and (c) labio-uvular [χʷ] ‹x̌ʷ›.

Figure 24. Multitaper spectrums of (a) voiceless labio-velar fricative [xʷ], (b) voiceless uvular fricative [χ] ‹x̌›, and (c) voiceless labio-uvular fricative [χʷ] ‹x̌ʷ›.
Dorsal fricatives have peaks in the low-frequency range, primarily because their front cavity is longer than that of sibilants and (presumably) laterals. Moreover, labio-dorsal fricatives have peaks at even lower frequencies than the plain uvular [χ] ‹x̌›, which suggests that lip rounding increases the length of the cavity in front of the constriction, effectively yielding peaks at lower frequencies. Lip rounding also appeared to affect the overall intensity of the fricative noise, reducing intensity at higher frequencies. In addition, the acoustic coupling of lip rounding yielded a similar spectral shape for the labio-velar [xʷ] and the labio-uvular [χʷ] ‹x̌ʷ›. While it is difficult to distinguish labio-velar [xʷ] (Figure 24a) from labio-uvular [χʷ] ‹x̌ʷ› (Figure 24c), the main peak in the example of the labio-uvular [χʷ] ‹x̌ʷ› (measured at 519Hz) is lower in frequency than the main peak in the example of the labio-velar [xʷ] (measured at 847Hz). This may be because the front cavity is longer for the labio-uvular [χʷ] ‹x̌ʷ› than for the labio-velar [xʷ].
Table 12. Means and standard deviations (in parentheses) of measures FreqM, DCT1, and DCT2 for each fricative

3.3.4 Spectral properties
Table 12 summarizes the means and standard deviations of FreqM, DCT1, and DCT2 for each fricative (see Appendix B for a comparison between affricates and fricatives). Figure 25 illustrates the distribution of (a) FreqM, (b) DCT1, and (c) DCT2.
FreqM was the greatest for [s], followed by [ʃ] ‹š› and [ɬ], followed by [χ] ‹x̌›, then [xʷ], then [χʷ] ‹x̌ʷ›. The test from the linear mixed effects model reveals that [ʃ] ‹š› was significantly lower than [s] ($\beta$ = −1856.42, SE = 168.61, t = −11.010), [ʃ] ‹š› did not significantly differ from [ɬ], [χ] ‹x̌› significantly lower than [ʃ] ‹š› and [ɬ] ($\beta$ = −907.40, SE = 100.56, t = −9.023), [xʷ] significantly lower than [χ] ‹x̌› ($\beta$ = −424.15, SE = 98.49, t = −4.307), and [χʷ] ‹x̌ʷ› significantly lower than [xʷ] ($\beta$ = −200.72, SE = 89.43, t = −2.244). In other words, the levels of inequality for FreqM can be characterized as [s] > [ʃ] ‹š›, [ɬ] > [χ] ‹x̌› > [xʷ] > [χʷ] ‹x̌ʷ›.
DCT1 for [s] was the lowest, followed by [ʃ] ‹š› and [ɬ], and then followed by the dorsal fricatives. The test from the linear mixed effects model reveals that DCT1 for [ʃ] ‹š› was significantly greater than [s] ($\beta$ = 1.484, SE = 0.147, t = 10.046), [ʃ] ‹š› did not significantly differ from [ɬ], [χ] was significantly greater than [ʃ] ‹š› and [ɬ] ($\beta$ = 1.155, SE = 0.541, t = 2.133), but none of the dorsal fricatives differed from each other.
DCT2 was the lowest for [ʃ] ‹š›, followed by [s], then [ɬ], then [χ] ‹x̌›, and the labio-dorsal fricatives [xʷ] and [χʷ] ‹x̌ʷ› were the highest. The test from the linear mixed effects model reveals that DCT2 for [s] was significantly greater than [ʃ] ‹š› ($\beta$ = 0.491, SE = 0.112, t = 4.372), [ɬ] was significantly greater than [ʃ] ‹š› ($\beta$ = 1.064, SE = 0.204, t = 5.196) and [s] ($\beta$ = 0.573, SE = 0.176, t = 3.249), [χ] ‹x̌› was significantly greater than [ɬ] ($\beta$ = 0.625, SE = 0.142, t = 4.381), [xʷ] was significantly greater than [χ] ‹x̌› ($\beta$ = 1.198, SE = 0.233, t = 5.132), but [xʷ] and [χʷ] did not significantly differ from one another.
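Each reported t value is the ratio of an estimate to its standard error. As a quick consistency check, the following Python sketch recomputes the t values for the FreqM comparisons from the reported estimates and standard errors. The |t| > 2 threshold in the comment is a common rule of thumb and an assumption here, since the significance criterion is not restated in this section.

```python
# (label, estimate, standard error) copied from the FreqM comparisons above
freqm_contrasts = [
    ("[ʃ] vs. [s]", -1856.42, 168.61),
    ("[χ] vs. [ʃ], [ɬ]", -907.40, 100.56),
    ("[xʷ] vs. [χ]", -424.15, 98.49),
    ("[χʷ] vs. [xʷ]", -200.72, 89.43),
]


def t_ratio(estimate, std_error):
    """t statistic of a fixed effect: the estimate divided by its standard error."""
    return estimate / std_error


t_values = [t_ratio(b, se) for _, b, se in freqm_contrasts]
for (label, _, _), t in zip(freqm_contrasts, t_values):
    # Assumption: |t| > 2 is treated as a rough significance threshold.
    print(f"{label}: t = {t:.3f}, |t| > 2: {abs(t) > 2}")
```

The recomputed ratios agree with the reported t values to within rounding.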
The three spectral measurements were entered into linear models to test the relationship between (a) FreqM and DCT1, (b) FreqM and DCT2, and (c) DCT1 and DCT2. The results reveal the following: (a) FreqM significantly (and negatively) correlated with DCT1 (i.e., the slope) (R² = .797, F(1, 216) = 852.9, p < .001); (b) FreqM significantly (and negatively) correlated with DCT2 (i.e., the curvature) (R² = .512, F(1, 216) = 226.9, p < .001); and (c) DCT1 significantly (and positively) correlated with DCT2 (R² = .487, F(1, 216) = 205.7, p < .001). Figure 26 depicts scatter plots (with a simple regression line) illustrating the relationship between (a) FreqM and DCT1, (b) FreqM and DCT2, and (c) DCT1 and DCT2. As FreqM decreases, DCT1 (i.e., the slope of the spectrum) and DCT2 (i.e., the curvature of the spectrum) increase. Moreover, as illustrated in Figure 26c, as the slope of the spectrum increases, the curvature of the spectrum increases.

Figure 25. Spectral measurements (a) FreqM, (b) DCT1, and (c) DCT2.
3.3.5 Summary of Lushootseed fricatives
These findings have several implications for the effects of place of articulation on the peaks and shapes of the spectrum. Although the lateral fricative [ɬ] has peaks in nearly the same frequency range as the post-alveolar fricative [ʃ] ‹š›, and its slope (DCT1) is approximately the same as that of [ʃ] ‹š›, the curvature (DCT2) of the spectrum for [ɬ] was significantly greater than for [ʃ] ‹š›. Although these findings are consistent with Jannedy & Weirich (2017), where the slope of the spectrum increases as the length of the cavity in front of the constriction increases, there are a few noteworthy observations to consider. Although labio-velar [xʷ] is articulated more anteriorly than labio-uvular [χʷ] ‹x̌ʷ› (thus making the cavity in front of a [xʷ] constriction shorter than that for [χʷ] ‹x̌ʷ›), both yield similar spectral shapes, where the slope and curvature of the spectrums are equally affected by the secondary articulation of lip rounding. What differentiated [xʷ] from [χʷ] ‹x̌ʷ› was the frequency of the main peak (i.e., FreqM), which was significantly lower for [χʷ] ‹x̌ʷ› than for [xʷ]. Moreover, the slope of the spectrum was equally affected for all three dorsal fricatives, while the curvature and frequency of the main peak differentiated labialized dorsal fricatives from plain dorsal fricatives.

Figure 26. Scatter plots with a simple regression line illustrating the relationship between (a) FreqM and DCT1, (b) FreqM and DCT2, and (c) DCT1 and DCT2.
4. Discussion
In this paper, I conducted the first acoustic analysis of obstruents in Lushootseed. The findings have several implications. In Section 4.1, I discuss the implications of the findings for the Lindau (1984) and Kingston (1985, 2005) model of ejectives. In Section 4.2, I discuss the analysis and limitations of obtaining spectral measurements from these recordings.
4.1 Ejective typology
The following patterns for ejective stops can be observed: (i) long (though variable) VOT (like stiff ejectives), (ii) loud burst intensity (like stiff ejectives), (iii) depressed f0 perturbation (like slack ejectives), and (iv) creaky voiced onset (like slack ejectives). The intensity difference for ejective stops is approximately the same as for voiced and voiceless stops. On the other hand, ejective affricates revealed a greater intensity difference than voiced and voiceless affricates (i.e., Ejective > Voiced, Voiceless). Given this variability, the Lindau (1984) and Kingston (1985) typology cannot predict the variation observed in Lushootseed ejectives. The current findings support the view that there are more variable realizations of ejectives across languages and that ejectives cannot simply be divided into either stiff or slack (Warner 1996; Wright et al. 2002; Hajek & Stevens 2005; Percival 2015, 2024).
The typology is based on widely held assumptions about vocal fold tension and compression. As Wright et al. (2002:70–71) state, these assumptions may underestimate the complexities of laryngeal muscular adjustments, as well as timing relationships in the production of ejectives. An increase in f0 is achieved primarily by lengthening and stretching the vocal folds (Stevens 2000). The vocal folds are stretched (via longitudinal tension) through contraction of the cricothyroid (CT) muscle, which is responsible for the rocking and horizontal translation of the thyroid cartilage in relation to the cricoid cartilage. This rocking and translation motion lengthens and stretches the vocal folds. An increase in medial compression accompanies the stiffening of the vocal folds. The muscles that adduct the vocal folds and apply medial compression are the interarytenoid (IA), thyroarytenoid (TA), and lateral cricoarytenoid (LCA) muscles.
Contrary to the assumptions in Lindau’s and Kingston’s models, there may be strong medial compression due to contraction of the adductor muscles (IA, TA, and LCA) in the absence of longitudinal tension. This would inhibit voicing, which results in a long VOT that is accompanied by depressed f0 and creaky voiced onset (Wright et al. 2002:71). Because f0 perturbation was depressed, the longitudinal tension (CT) of the vocal folds must have decreased prior to the relaxation of medial compression (IA, TA, and LCA). Moreover, the decrease in longitudinal tension (CT) may vary in degree in the production of ejectives. There may also have been differing degrees of larynx raising and medial compression. These would account for the variability observed in the data.
The intensity difference for ejective affricates was greater than for voiced and voiceless affricates. The reduced amplitude at voice onset occurs because a constricted glottis (when generating creaky voicing) generally reduces the amplitude (Keating et al. 2015). This is because the pressure below the glottis decreases in response to the decrease in abducting forces (Stevens 2000). There is also reduced intraoral pressure during creaky phonation (Ingram & Rigsby 1987). Because creaky phonation is generated with reduced longitudinal tension and increased medial compression (which results in a lower f0), the spectral peak (peak amplitude) is much smaller at lower frequencies than in modal phonation. The greater intensity difference of the following vowel for ejective affricates is likely to be a consequence of manipulating the adductor muscles (IA, TA, LCA) to produce creaky voicing. However, this was not observed for ejective stops, where the intensity difference was comparable with that of voiceless stops. This suggests that there may have been different types of creaky voicing (types in which subglottal pressure was not reduced in response to the drop in abducting forces) at the vowel onset for some of the ejective stops observed in the data.
It is possible that stiff and slack ejectives are two extreme endpoints that fall along a continuum of varying realizations for ejectives. Due to the variability in the data, the ejectives observed in this study can fall anywhere along this continuum from most stiff to least stiff and from most slack to least slack. Because Lushootseed ejectives revealed instances of both short VOT and long VOT, VOT types do not appear to be strictly exclusive features that distinguish ejectives in Lushootseed. This also holds for voice onset quality. The view that stiff and slack ejectives are two extreme endpoints that fall along a continuum of varying realizations of ejectives has also been noted by Hajek & Stevens (2005) in their study on ejective stops in Waima’a.
Some have claimed that there is a perceptual basis for the stiff and slack distinction across languages, where stiff ejectives are “easy to perceive” (Ham 2008:8) and are less likely to be confused with voiceless unaspirated stops (Bird 2002, 2020), whereas slack ejectives are “hard to perceive” (Ham 2008:8) and are more likely to be confused with voiceless unaspirated stops (Bird 2002, 2020; Percival 2024). According to Fallon (2002), slack ejectives can often be misheard as voiced stops. In a recent cross-linguistic study, Percival (2024) conducted a perceptual experiment in which she manipulated acoustic parameters to assess which acoustic dimensions listeners of different languages use as auditory cues to distinguish ejectives from plain stops. Percival found that, regardless of how ejectives are produced across different languages, listeners not only used the period of silence following the release as a primary cue (regardless of VOT duration), but also used differences in the following vowel that are potentially associated with voice quality (such as creaky voicing) as secondary cues to distinguish ejectives from plain stops. Because listeners used as cues acoustic characteristics that define both stiff (period of silence) and slack (creaky voice quality) ejectives, it is not a matter of whether listeners are perceiving “stiff” or “slack” ejectives. Rather, it depends on the acoustic cue that happens to be the most perceptually salient in distinguishing ejectives from plain stops. The ejectives in Lushootseed have periods of silence following the release, as well as a creaky voice onset. Impressionistically, these were the most salient cues in distinguishing ejectives from plain stops.
It should be noted that the data comes from recordings of connected (spontaneous) speech. The variability that was observed may have been due to changes in the rate of speech, as well as the use of different styles and registers in traditional Salish narratives. This study has the advantage of working with data from natural connected (spontaneous) speech, which is the natural output of everyday communication. A problem with previous accounts of the realization of ejectives is that they primarily relied on data from words produced in a wordlist. The current findings suggest the possibility that ejectives have more variable realizations in natural connected speech than in citation (wordlist) form. Future research may require a comparison between ejectives produced from a wordlist and ejectives produced in connected (spontaneous) speech, especially in other Salish languages with living speakers.
4.2 Implications on the analysis and limitations of obtaining spectral estimates from archival recordings dating to the 1950s
There are severe limitations when obtaining spectral measurements of fricatives (especially [s]) in archival recordings of Lushootseed. These recordings, which date to the 1950s, have frequency cutoffs that vary between 5kHz and 7kHz (the more recent the recording, the higher the frequency cutoff; see Footnote 5). However, the Metcalf recordings that were selected for the analysis of fricatives were recorded more recently (i.e., between 1954 and 1956, where the frequency cutoff is observed near 7kHz) than the older Metcalf recordings (where the frequency cutoff is observed near 5kHz). By carefully selecting the recordings, somewhat reliable estimates of [s] could be obtained, even though [s] tends to reach peaks at (or above) 7kHz (Shadle 2012, 2023; Koenig et al. 2013).
This raises the question of what spectral estimates are the most reliable for characterizing the fricative contrast from these recordings. There are other methods that can be used to analyze fricatives. One of these uses measurements known as “spectral moments” (Forrest et al. 1988), which normalize a spectrum into a Gaussian curve. There are four of these moments: (1) Moment 1, which measures the mean frequency of the Gaussian curve, also called Center of Gravity (CoG); (2) Moment 2, which is the variance around the mean, also called Standard Deviation or Variance; (3) Moment 3, which is a measure of the asymmetry of the spectrum, also called Skew; and (4) Moment 4, which measures the “peakedness” of the Gaussian curve, also known as Kurtosis.
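The four moments can be illustrated with a minimal Python sketch. It assumes the power spectrum is normalized to sum to 1 and treated as a probability distribution over frequency, and that kurtosis is reported as excess kurtosis. The function name, the toy spectrum, and its peak locations are hypothetical and are not taken from the recordings.

```python
import math


def spectral_moments(freqs_hz, power):
    """Return (CoG, SD, skewness, excess kurtosis) of a power spectrum."""
    total = sum(power)
    weights = [p / total for p in power]  # normalize so the weights sum to 1
    pairs = list(zip(freqs_hz, weights))
    cog = sum(f * w for f, w in pairs)                   # Moment 1
    variance = sum((f - cog) ** 2 * w for f, w in pairs)  # Moment 2
    sd = math.sqrt(variance)
    skew = sum((f - cog) ** 3 * w for f, w in pairs) / sd ** 3        # Moment 3
    kurt = sum((f - cog) ** 4 * w for f, w in pairs) / sd ** 4 - 3.0  # Moment 4
    return cog, sd, skew, kurt


def bump(f, centre, width, height):
    """Gaussian-shaped spectral peak used to build the toy spectrum."""
    return height * math.exp(-0.5 * ((f - centre) / width) ** 2)


# Hypothetical spectrum: a main peak at 6 kHz plus a weaker band at 2 kHz.
freqs = [250.0 * k for k in range(1, 29)]  # 250 Hz to 7000 Hz
power = [bump(f, 6000.0, 400.0, 1.0) + bump(f, 2000.0, 400.0, 0.5) for f in freqs]

cog, sd, skew, kurt = spectral_moments(freqs, power)
main_peak = freqs[power.index(max(power))]
print(f"main peak = {main_peak:.0f} Hz, CoG = {cog:.0f} Hz, skew = {skew:.2f}")
```

In this toy case the weaker low-frequency band pulls CoG well below the main peak and makes the skew negative, even though the main peak itself has not moved.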
There are several problems with this kind of normalization method, however. First, when there are several peaks below or above the main peak, they could shift the CoG below or above the frequency of the main peak. Moreover, moments can be strongly affected by the sampling frequency of the spectrum, where a relatively higher Nyquist frequency could affect the moments (Shadle 2023). According to Shadle (2023), although moments characterize the shape of the spectrum, they do not address the probable causes of that shape. Moreover, according to Blacklock (2004), the spectral shape of fricatives (particularly [s]) can be highly variable, and it is unlikely to resemble a Gaussian curve. It is also problematic to normalize the spectrum into a Gaussian curve from these particular recordings, where there is a frequency cutoff above which peaks are expected to be observed for [s]. The amplitudes of additional peaks below the frequency cutoff could affect the mean of the curve. Recall the mid-frequency band of intervocalic /s/ in Section 3.3.1: frequency bands such as these could shift the mean of the curve below the frequency of the main peak.
Table 13. Summary of (1) Center of Gravity (CoG) (in Hz) for [s] and [ʃ] from Praat’s default (measured at midpoint) method, (2) CoG from time-averaging, and (3) FreqM (in Hz) from multitaper spectrums

By extracting the frequency measurement FreqM from multitaper spectrums, it was possible to obtain relative differences across affricate and fricative places of articulation. The method that reliably measured these relative differences was extracting the frequency of the main peak along the low- and mid-frequency ranges (FreqM) from multitaper spectrums (Koenig et al. 2013), as well as extracting DCT coefficients (Jannedy & Weirich 2017). When compared to Praat’s default method for calculating CoG, which failed to capture relative differences in the affricates’ and fricatives’ place of articulation, the spectral measurement FreqM captured relative differences robustly, where the alveolar place of articulation was greater than post-alveolar and lateral, post-alveolar and lateral were greater than labio-velar and uvular, and so on. Table 13 summarizes spectral measurements for [s] and [ʃ] from three different methods: (1) the Praat default calculation of CoG (extracted from the midpoint of the fricative duration); (2) CoG calculated from time-averaged spectrums (Shadle 2012, 2023), a method that extracts several Discrete Fourier Transform (DFT) power spectrums from an interval of the fricative, converts them into a matrix of intensity and sampling frequencies, and then averages them; and (3) FreqM extracted from multitaper spectrums.
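The time-averaging procedure in method (2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure used for Table 13: the Hann window, frame length, hop size, 14 kHz sampling rate, and synthetic signal (a main component at 5250 Hz, a weaker band at 1750 Hz, and added noise) are all hypothetical choices.

```python
import cmath
import math
import random


def power_spectrum(frame):
    """Hann-windowed DFT power spectrum of one frame (bins 0 to N/2)."""
    n = len(frame)
    windowed = [x * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))) ** 2
            for k in range(n // 2 + 1)]


def time_averaged_spectrum(signal, frame_len, hop):
    """Average the power spectra of successive frames, bin by bin."""
    frames = [signal[s:s + frame_len]
              for s in range(0, len(signal) - frame_len + 1, hop)]
    spectra = [power_spectrum(f) for f in frames]  # rows: frames, columns: bins
    return [sum(column) / len(spectra) for column in zip(*spectra)]


def center_of_gravity(freqs, power):
    """Power-weighted mean frequency (Moment 1)."""
    return sum(f * p for f, p in zip(freqs, power)) / sum(power)


# Hypothetical fricative-like interval sampled at 14 kHz.
random.seed(0)
fs, frame_len = 14000, 64
signal = [math.sin(2 * math.pi * 5250 * i / fs)
          + 0.5 * math.sin(2 * math.pi * 1750 * i / fs)
          + random.gauss(0.0, 0.1)
          for i in range(512)]

averaged = time_averaged_spectrum(signal, frame_len, hop=32)
freqs = [k * fs / frame_len for k in range(frame_len // 2 + 1)]
cog = center_of_gravity(freqs, averaged)
main_peak = freqs[averaged.index(max(averaged))]
print(f"CoG = {cog:.0f} Hz, main peak = {main_peak:.0f} Hz")
```

Averaging across frames reduces the variance of each bin, but the resulting CoG is still a weighted mean over the whole spectrum, so energy away from the main peak continues to pull it.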
As Table 13 summarizes, CoG was considerably low for [s] under the two methods used for calculating CoG (far lower than the expected CoG for [s], which is between 4 and 8kHz; Munson 2001; Gordon et al. 2002). Although time-averaging slightly improved the frequency measurement of [s], CoG itself was much lower than the expected range for frequency peaks of [s] (despite capturing relative (and expected) differences between [s] and [ʃ] ‹š›). However, by extracting FreqM from multitaper spectrums, the frequency of the main peak for [s] was far more robust and greater than that for [ʃ] ‹š›. The findings suggest that the method of extracting the frequency measurement FreqM is far more reliable in characterizing relative differences in affricate and fricative places of articulation. Spectral moment measurements, such as CoG, characterize the spectral shape by normalizing the spectrum into a Gaussian curve. However, moments can be strongly affected by a myriad of factors that can confound inferences on articulatory configuration, such as changes in the sampling frequency, which could strongly affect the skew of the Gaussian curve (Shadle 2023). The low-frequency noise in these recordings could affect measures of CoG for [s] when there is also an upper-frequency cutoff near 5–7kHz. For this reason, extracting the frequency of the main peak along the mid-frequency range (i.e., FreqM) is far more reliable in differentiating affricates and fricatives when working with these kinds of recordings.
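Peak picking from a multitaper spectrum can be sketched as below. Two assumptions should be noted: the sketch uses sine tapers as a simple stand-in for the tapers of a standard multitaper estimate, and the search band, taper count, sampling rate, and synthetic signal are hypothetical values chosen for illustration only.

```python
import cmath
import math
import random


def sine_tapers(n, n_tapers):
    """Orthonormal sine tapers (a stand-in for DPSS tapers; an assumption)."""
    return [[math.sqrt(2.0 / (n + 1)) * math.sin(math.pi * (j + 1) * (i + 1) / (n + 1))
             for i in range(n)]
            for j in range(n_tapers)]


def multitaper_spectrum(frame, n_tapers=4):
    """Average the power spectra obtained with each taper (bins 0 to N/2)."""
    n = len(frame)
    averaged = [0.0] * (n // 2 + 1)
    for taper in sine_tapers(n, n_tapers):
        tapered = [x * w for x, w in zip(frame, taper)]
        for k in range(n // 2 + 1):
            coef = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                       for i, x in enumerate(tapered))
            averaged[k] += abs(coef) ** 2 / n_tapers
    return averaged


def main_peak_frequency(freqs, power, lo_hz, hi_hz):
    """Frequency of the highest-amplitude bin within [lo_hz, hi_hz]."""
    in_band = [(p, f) for f, p in zip(freqs, power) if lo_hz <= f <= hi_hz]
    return max(in_band)[1]


# Hypothetical frame: main component at 5250 Hz, weaker band at 1750 Hz, plus noise.
random.seed(1)
fs, n = 14000, 128
frame = [math.sin(2 * math.pi * 5250 * i / fs)
         + 0.5 * math.sin(2 * math.pi * 1750 * i / fs)
         + random.gauss(0.0, 0.05)
         for i in range(n)]

spectrum = multitaper_spectrum(frame)
freqs = [k * fs / n for k in range(n // 2 + 1)]
freq_m = main_peak_frequency(freqs, spectrum, lo_hz=0.0, hi_hz=7000.0)
print(f"main peak = {freq_m:.0f} Hz")
```

Because the measure is the location of the largest peak rather than a weighted mean, the weaker band at 1750 Hz does not shift it. Multitaper averaging broadens the peak slightly, so the selected bin may fall one bin to either side of the true component.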
It should be noted that, as discussed in Section 3.3.4, FreqM is not the only measurement that could reliably differentiate the fricative contrast in these kinds of recordings. DCT coefficients, which are measures of spectral shape, also characterized the fricative contrast in these recordings. For example, although FreqM did not differentiate the post-alveolar fricative [ʃ] ‹š› from the lateral fricative [ɬ], DCT2 (i.e., the curvature of the spectrum) differentiated the two fricatives, where DCT2 for [ɬ] was significantly more positive than for [ʃ] ‹š›. Although FreqM is a useful frequency measurement, an acoustic analysis of the fricative contrast should not be devoid of measures that characterize the shape of the spectrum. DCT coefficients are measures of spectral shape but without the amplitude normalization that is required for calculating spectral moments (Shadle 2023). For this reason, DCT coefficients are preferable to spectral moments.
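How the first DCT coefficients relate to spectral shape can be seen in a minimal Python sketch. It assumes a DCT-II applied to a dB spectrum and divided by the number of bins; sign and scaling conventions differ across implementations, and the frequency scaling used in the study is not specified in this section. The three toy spectra are hypothetical.

```python
import math


def dct_coefficients(spectrum_db, n_coef=3):
    """First n_coef DCT-II coefficients of a spectrum, divided by the bin count."""
    n = len(spectrum_db)
    return [sum(x * math.cos(math.pi * k * (i + 0.5) / n)
                for i, x in enumerate(spectrum_db)) / n
            for k in range(n_coef)]


# Hypothetical dB spectra over 32 equally spaced frequency bins.
rising = [20.0 + i for i in range(32)]                      # energy grows with frequency
falling = [51.0 - i for i in range(32)]                     # energy concentrated at low frequencies
peaked = [40.0 - 0.1 * (i - 15.5) ** 2 for i in range(32)]  # mid-frequency peak

c_rising = dct_coefficients(rising)
c_falling = dct_coefficients(falling)
c_peaked = dct_coefficients(peaked)

for name, c in [("rising", c_rising), ("falling", c_falling), ("peaked", c_peaked)]:
    print(f"{name}: DCT0 = {c[0]:.2f}, DCT1 = {c[1]:.2f}, DCT2 = {c[2]:.2f}")
```

Under this convention DCT0 is the mean level, DCT1 is positive when energy is concentrated at low frequencies and negative when it is concentrated at high frequencies, and DCT2 is negative for a spectrum that bulges in the middle. This matches the direction of the pattern reported above, where a lower main peak goes with a higher DCT1.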
It should be noted that measures of spectral shape (whether from spectral moments or DCT coefficients) can be affected by changes in the sampling frequency. Frequency components above the upper frequency cutoff can adversely affect measures of spectral shape. Spectral components that are effectively preserved in these recordings can only be observed below the frequency cutoff. For this reason, it is important to down-sample the recordings so that the Nyquist frequency is at the cutoff frequency of the microphone signal.
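Down-sampling so that the Nyquist frequency sits at a 7 kHz cutoff can be sketched as follows. This is an illustration under stated assumptions: the 42 kHz input rate, the integer decimation factor of 3, the 101-tap Hamming-windowed sinc filter, and the test signal are hypothetical, and the resampling procedure used in the study is not described in this section.

```python
import math


def lowpass_fir(cutoff_hz, fs_hz, n_taps=101):
    """Hamming-windowed sinc low-pass filter with unity gain at 0 Hz."""
    fc = cutoff_hz / fs_hz  # cutoff in cycles per sample
    mid = (n_taps - 1) / 2
    taps = []
    for i in range(n_taps):
        t = i - mid
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (n_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [h / gain for h in taps]


def decimate(signal, factor, fs_hz, n_taps=101):
    """Low-pass below the new Nyquist frequency, then keep every factor-th sample."""
    new_nyquist = fs_hz / factor / 2
    taps = lowpass_fir(0.9 * new_nyquist, fs_hz, n_taps)
    half = n_taps // 2
    out = []
    for centre in range(0, len(signal), factor):
        acc = 0.0
        for j, h in enumerate(taps):
            idx = centre + j - half  # filter centred on the kept sample (zero delay)
            if 0 <= idx < len(signal):
                acc += h * signal[idx]
        out.append(acc)
    return out, fs_hz / factor


# Hypothetical input: a 1 kHz component to keep and a 12 kHz component to remove.
fs_in = 42000
signal = [math.sin(2 * math.pi * 1000 * i / fs_in)
          + 0.5 * math.sin(2 * math.pi * 12000 * i / fs_in)
          for i in range(1260)]

resampled, fs_out = decimate(signal, 3, fs_in)
print(f"new sampling rate = {fs_out:.0f} Hz, Nyquist = {fs_out / 2:.0f} Hz")
```

Without the low-pass step, the 12 kHz component would alias to 2 kHz at the new 14 kHz rate and contaminate the spectrum below the cutoff.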
5. Conclusion
In this paper, I analyzed the acoustic properties of obstruents in Lushootseed, with the goal of characterizing their acoustic correlates. Several observations were made in this study. In Section 3.1, Lushootseed stops were examined. The VOT of pulmonic voiceless stops indicates that they are slightly aspirated (neither unaspirated nor aspirated). Moreover, the findings also suggest that only voiced stops with a more anterior closure (i.e., /b d/) are fully voiced during the articulatory closure in word-initial position (when preceded by a pause, preceded by a word-final vowel, or (sometimes) preceded by another word-final consonant). This was not observed for voiced velar or labiovelar stops, which are devoiced in those positions. Another key finding was that the closure duration of voiceless stops was the greatest, while the closure duration of ejectives did not significantly differ from that of voiced stops. The findings also reveal that VOT did not significantly differ by word position (word-initial vs. word-medial). This might be because some of the word-initial stops were preceded by a word-final vowel, which would create an environment that patterns closely with word-medial (intervocalic) position. Another interesting finding was that, among the dorsal stops, labio-dorsal stops had a significantly shorter closure duration than their non-labialized counterparts. Another key finding was that ejective stops revealed long (but variable) VOT, greater burst intensity than voiceless stops, depressed f0 perturbation, and creaky voice onset. This suggests that the data does not fit the Lindau (1984) and Kingston (1985, 2005) models for ejectives. However, it does support more recent work, which suggests that there are variable realizations of ejectives across languages (Wright et al. 2002; Percival 2024).
Section 3.2 provides key findings on the acoustic correlates of affricates in Lushootseed. VOT and release frication duration are reliable measurements that characterized the contrast of affricates with respect to their laryngeal type (voiced, voiceless, ejective). Interestingly, the closure duration of voiced affricates was greater than that of voiceless and ejective affricates, but only in word-initial position. In word-medial position, closure duration revealed the same patterns as stops, where voiceless affricates had a significantly greater closure duration than voiced and ejective affricates. Moreover, closure duration for affricates was significantly shorter than for stops, which suggests that different manners of articulation could affect the closure duration of obstruents. Reliable measurements that characterized differences in the affricates’ place of articulation were the spectral measurements FreqM, DCT1, and DCT2. FreqM and DCT1 were negatively correlated in characterizing the place contrast. Moreover, the low FreqM and higher slope of the lateral ejective affricate [tɬ̓] ‹ƛ̓› may suggest articulatory differences in the production of this consonant, where there may have been greater length to the buccal cavities in the production of [tɬ̓] ‹ƛ̓›. The intensity of the release frication for voiced affricates increases at the end of the frication duration as it transitions into the following vowel. On the other hand, the intensity decreases as it transitions into the period of silence for ejective affricates. For voiceless affricates, the intensity of the release frication was in between (it increases slightly).
Unlike ejective stops, whose intensity difference (i.e., the difference in intensity between the following vowel’s max amplitude and the voice onset) did not significantly differ from that of voiceless and voiced stops, ejective affricates revealed a greater intensity difference than voiceless and voiced affricates, which suggests that ejective affricates can be differentiated from ejective stops in terms of their voice onset quality.
Section 3.3 covered the acoustic properties of fricatives in Lushootseed. There are some problems with obtaining frequency measurements of [s] and [ts] from these recordings, because there was an upper frequency cutoff from the microphone signal. This obscured measures of the spectral peak for [s] and [ts], which tend to reach their peak amplitude above 7kHz in other languages (Shadle 1985; Shadle & Scully 1995; Koenig et al. 2013). However, relative differences could be obtained by extracting the spectral measurement FreqM from multitaper spectrums (Shadle 2012, 2023; Koenig et al. 2013) after carefully selecting the recordings whose frequency cutoff is observed at 7kHz. Moreover, DCT coefficients (Jannedy & Weirich 2017), which are measures of spectral shape, also helped characterize the fricative contrast in Lushootseed.
Acknowledgments
First and foremost, I would like to thank the two elder speakers, Annie Jack and Martha Lamont, who are renowned storytellers whose legacies are preserved in these recordings; my gratitude goes to them and their descendants. I would also like to acknowledge the family members of the speakers in this study: Denise Bill (great-granddaughter of Annie Jack), Willard Bill, Jr. (great-grandson of Annie Jack), Elise Bill-Gerrish (great-great-granddaughter of Annie Jack and daughter of Denise Bill), Justice Bill (great-great-grandson of Annie Jack and son of Willard Bill, Jr.); and Hank Williams (grandson of Martha Lamont), his daughter, and his (and Martha’s) descendants. I am extremely grateful for their support. Many thanks go to the Burke Museum for making these recordings available. Special thanks go to the late Leon Metcalf, who spent about six years recording elder speakers of Lushootseed during the 1950s. I would also like to thank two anonymous reviewers who provided valuable input to this article.
Appendix A. Number of tokens for each obstruent
The total number of tokens for each obstruent is summarized in Table A.
Table A. Number of tokens for each obstruent in this study. Orthography in ‹…›

Appendix B. Comparison of spectral measurements between affricates and fricatives
In this appendix, I compare the spectral measurements between the frication portion of affricates and fricatives based on their corresponding places of articulation (i.e., alveolar, post-alveolar, and lateral). Figure B shows box plots illustrating the contrast between affricates and fricatives with respect to their place of articulation for the spectral measurements (a) FreqM, (b) DCT1, and (c) DCT2.

Figure B. Box plots illustrating the affricate and fricative contrast for each place of articulation for spectral measurements (a) FreqM, (b) DCT1, and (c) DCT2.
Manner of articulation (affricate vs. fricative) was used as a fixed effect in the model to test the effects of manner on each spectral measurement for each place of articulation. FreqM for alveolar affricates was significantly less than alveolar fricatives ($\beta$ = −324.34, SE = 118.78, t = −2.731), but alveolar affricates did not significantly differ from alveolar fricatives in DCT1 or DCT2. FreqM for post-alveolar affricates was significantly greater than post-alveolar fricatives ($\beta$ = 413.2, SE = 119.2, t = 3.466) and DCT1 for post-alveolar affricates was significantly less than post-alveolar fricatives ($\beta$ = −0.4552, SE = 0.134, t = −3.391), but DCT2 for post-alveolar affricates did not significantly differ from post-alveolar fricatives. FreqM for lateral ejective affricates was significantly less than lateral fricatives ($\beta$ = −503.8, SE = 222.7, t = −2.262) and DCT1 for lateral ejective affricates was significantly greater than lateral fricatives ($\beta$ = 0.355, SE = 0.176, t = 2.014), but DCT2 for lateral ejective affricates did not significantly differ from lateral fricatives. The reason for the manner effects on FreqM and/or DCT1 for the alveolar and post-alveolar places of articulation is uncertain. However, it could be suspected that the lower FreqM and greater DCT1 for lateral ejective affricates might be due to the articulatory release being made more posteriorly along the sides of the palate than for lateral fricatives. It may have also been due to the length of the buccal cavities in the production of lateral ejective affricates. As discussed in Section 3.2.3 and Section 3.3.4, lateral ejective affricates differed from post-alveolar affricates in FreqM and DCT1, whereas there was no significant difference between lateral fricatives and post-alveolar fricatives in these measurements. This may suggest that there are differences in where the articulatory release was being made in the production of lateral ejective affricates when compared to lateral fricatives. Moreover, a release being made with a closed glottis for lateral ejective affricates might affect the frequency of the main peak in such a way that differs from lateral fricatives.