1. Introduction
We use data from Filipino pop music to address a long-standing debate about stress in Tagalog/Filipino, with three main findings.
First, we find that phonologically prominent syllables are set to longer and stronger musical notes. This is not surprising; many previous works on other languages have found that songwriters choose to begin stressed syllables on strong beats, and/or to give them long durations (to mention a few: Dell & Halle 2009; Hayes 2009; Proto 2013; Proto & Dell 2013; Temperley & Temperley 2013; Bellik 2019). However, to the best of our knowledge, this is the first study of phonological aspects of text-setting in a Philippine language, and possibly in any language of the Austronesian language family.[Footnote 1]
Our second finding concerns the contrast between penultimate and final prominence. Nearly all words in Tagalog/Filipino fall into one of two types: penult-prominent (prominence on the second-to-last syllable of the word), as in [ʔábot] ‘power, capacity’, and ultima-prominent (prominence on the last syllable of the word), as in [ʔabót] ‘arrival’. As discussed below, some scholars treat the two types as qualitatively different, with penult-prominent words like [ʔábot] having underlying length (/ʔaːbot/) and ultima-prominent words like [ʔabót] having only default phrase-final prominence (Constantino 1965; Schachter & Otanes 1972; Soberano 1980; Himmelmann & Kaufman 2020). We find evidence instead for a stress analysis: penultimate and final prominent syllables are treated similarly in the music corpus we examine, even when phrase-medial, both being set to stronger beats and longer notes. As further support for the stress analysis, we find that text-setting prominence does not shift onto phrase-final enclitics; that syllable shape matters for text-setting only where it has been claimed to matter for stress; and that syllables plausibly predicted to bear secondary stress are also set to longer notes and stronger beats. We conclude that Tagalog/Filipino has stress in all words, not just the penult-prominent ones.
Our third finding is that text-setting appears to be sensitive to prominence at an abstract, phonological level, in that it does not mirror the phonetic cues of word prominence, nor is it sensitive to vowel height, even though in speech low vowels should have longer duration and greater loudness.
In the next section, we lay out the two types of word prominence and their phonetic cues, and give background information on the language and music. §3 describes our methods; §4 and §5 give results for duration and beat strength. §6 provides Monte Carlo tests of statistical significance. §7 and §8 give results for phrase-final enclitics and pre-tonic syllables. §9 argues that the text-setting in this corpus does not track phonetics. An html file showing the annotated R code that generated our figures and results is provided as Supplementary Material.
2. Background
2.1. Tagalog and Filipino
The terms ‘Tagalog language’ and ‘Filipino language’ are often used interchangeably. When a distinction is intended, ‘Tagalog’ generally refers to the language of the Tagalog ethnic group in the northern Philippines, whereas ‘Filipino’ is the national language of the Philippines, based on Tagalog and enriched with vocabulary from other Philippine languages, English, Spanish and elsewhere. ‘Filipino’ tends to refer to the language as spoken outside the Tagalog region, or in Philippine cities where Tagalog and non-Tagalog people interact; it can also refer to a prestige variety of the language used in national media (Nolasco 2007). Filipino, especially as used outside of the Tagalog region, has grammatical differences from Tagalog (Rubrico 2012; Demeterio & Dreisbach 2017).
We use the term ‘Filipino’ in this article, because the music we are analysing forms part of the Philippine national mass media; the linguistic sources we cite use the term ‘Tagalog’.
2.2. Word prosody in Filipino
2.2.1. Two types of word
The great majority of words in Filipino fall into two classes: penult-prominent and ultima-prominent. Table 1 lists the phonetic properties that have been observed in Gonzalez (1970), Anderson (2006) and Klimenko et al. (2010);[Footnote 2] examples are in (1). In the song corpus, there are about the same number of words of each kind (by type: 269 penult-prominent, 266 ultima-prominent; by token: 498 penult-prominent, 475 ultima-prominent). To avoid committing prematurely to a phonological analysis, we avoid the IPA stress notation (International Phonetic Association 1999) and instead place an acute accent over the prominent vowel.
Table 1 Two types of word in Filipino.

While lexical words are generally at least disyllabic, there are several monosyllabic function words, for example, [ʔat] ‘and’, [ba] ‘(question particle)’, [din] ‘also’. There are loanwords with antepenultimate prominence, such as [ʔágila] ‘eagle’, from Spanish águila, which was the only such word in the song corpus we used. And there are loans with prominence on a closed penult, such as [bɾiljánte] ‘diamond’, from Spanish brillante.
The spectrograms in Figure 1, made with Praat (Boersma & Weenink 2017) from recordings in the online dictionary of Tagalog.com, illustrate a minimal pair in citation form.[Footnote 3] The top example is [ʔábot] ‘power, capacity’, with penult and ultima vowels similarly long, penult louder than ultima (as can be seen by comparing the heights of the waveforms for [a] and [o]) and high pitch on the penult followed by a pitch fall between the two vowels, as shown by the pitch track (thick black trace overlaid on the spectrogram). The bottom example is [ʔabót] ‘arrival’, with a short penult vowel and long ultima vowel, penult and ultima similarly loud and a pitch fall late in the ultima.

Figure 1 Citation-form disyllables: penult-prominent [ʔábot] ‘power, capacity’ (top) and ultima-prominent [ʔabót] ‘arrival’ (bottom).
Further evidence that final prominence is not associated with longer duration (prominent ultimas are not longer than non-prominent ultimas) comes from Reed’s (2022) acoustic study of reduplication. Tagalog has semi-productive copying of a stem’s first two syllables, as in ka-sáma ‘companion’ vs. ka-sáma-sáma ‘constant companion’. Impressionistically, both copies often have word-level prominence. Reed finds that when penult-prominent roots undergo copying of their first two syllables, the copied penult is also long, as in d[aː]mi-dámi ‘quite a lot’, from dámi ‘quantity’. But when ultima-prominent roots are copied, the copy of the ultima is not long: an[o]-anó ‘what-pl’, from anó ‘what’. Reed takes this difference as evidence for the underlying-length analysis, but it is also consistent with final stress not causing additional lengthening beyond word-final lengthening.
The facts for multi-word utterances are less clear. Schachter & Otanes (1972) describe most Tagalog intonation patterns as including a phrase-final pitch-accent; for some patterns, this accent’s location depends on vowel length. The behaviour of these ‘fixed-P2 patterns’ (Schachter & Otanes 1972: 32) is illustrated in (2): for penult-prominent words, the pitch accent can optionally move to the phrase-final syllable when an enclitic is added; for ultima-prominent words, the pitch accent must move to the phrase-final syllable.[Footnote 4]
The optionality for penult-prominent words, as well as the variety of intonation types and their overlapping semantics or usage, makes it difficult to say whether any given utterance falsifies these claims.
2.2.2. Previous analyses
Previous authors have fallen into two broad camps, summarised in Table 2.[Footnote 5] For some (Constantino 1965; Schachter & Otanes 1972; Soberano 1980; Himmelmann & Kaufman 2020), there is a phonemic length contrast in open penults. This explains the phonetic duration facts: long vowels are pronounced with greater duration, as are all final syllables (leaving the [ʔa] of [ʔabót] short). This analysis must stipulate that long vowels attract pitch accent away from its default phrase-final location, and that a syllable following the pitch accent has lower amplitude. For these authors, stress, if it exists at all in the language, is the surface result of prominence due to vowel length or intonational pitch-accent. (Constantino 1965 uses underlying stress to encode exceptions such as /bɾilˈjante/ → [bɾil.ján.te] ‘diamond’, which has prominence on a closed penult.)
Table 2 Approaches to word prominence in Filipino.

For other authors (Bloomfield 1917; Blake 1925; Ramos 1981; French 1988, 1991; Avery & Lamontagne 1995; Sabbagh 2014; Richards 2017), there is a phonemic stress contrast. This analysis must stipulate that both stressed syllables and word-final syllables are lengthened, but not additively, so that stressed, word-final syllables are not extra-long. Like the vowel-length analysis, the stress analysis stipulates that a syllable following the pitch accent has lower amplitude.
We now review three phonological phenomena relevant to length or stress, concluding that they can be analysed under either approach.[Footnote 6] Thus, new data are needed in order to distinguish between the two approaches.
The first phenomenon is that closed penults cannot bear prominence (with exceptions, mainly in loans): *hágkan is not allowed. Under the length analysis, this is straightforward: as in many languages, a syllable can have either a long vowel or a final consonant, but not both, ruling out *[haːg.kan]. A stress analysis can stipulate that stress must fall on one of the last two moras of the word (with final consonants extrametrical – that is, not bearing a mora). As shown in (3), stress (the bottom-most × mark on the grids) cannot fall on a consonant mora (ruling out (3d) *[haǵ.kan]), nor can it fall on a mora that is not one of the last two (ruling out (3e) *[hág.kan]).[Footnote 7] This is reminiscent of many trochaic languages’ ban on words that end with a heavy penult and a light ultima (see Hayes 1995 on trochaic shortening; Zuraw 2018).
The second phenomenon is prominence shift in verbs. As shown in (4), prominence shifts one syllable to the right when a suffix is added. Exceptional loan verbs with prominence on a closed penult retain that prominence and gain another on the final syllable.
Under a length analysis, ultima-prominent words require no explanation; they have no long vowel underlyingly, and continue to have no long vowel after suffixation. For penult-prominent words, some form of prosodic faithfulness can be invoked, whereby length moves in order to remain penultimate; see Shryock (1993) and Crosswhite (1998) for analyses of similar phenomena in other languages. Prosodic faithfulness has to be overridden by some mechanism that anchors exceptional vowel length in a closed syllable, as in [kweːntuhan]; the final syllable is far enough from the long vowel to get phrase-final prominence.
Under a stress analysis, a similar prosodic faithfulness mechanism is needed to keep stress penultimate when a penult-prominent root is suffixed, and the same mechanism applies to ultima-prominent roots, keeping stress final (see French 1988; Sabbagh 2004). For the exceptional words, again some mechanism keeps the exceptional stress in place, and an additional stress is added to the final syllable to avoid stress lapse.
The third phenomenon is prominence shift in some suffixed nouns. Depending on the morphology, several patterns are possible (see Schachter & Otanes 1972: 98–102). Both penult-prominent and ultima-prominent words can, when suffixed, become either penult-prominent or ultima-prominent, with or without lengthening of the root-initial vowel, and possibly other vowels. A sampling is given in (5). Under any account, there must be morpheme-specific prosodic requirements and optionality. A length account requires length to be deleted, moved, or added; a stress account requires stress to be moved or not moved and pre-tonic length to sometimes be added (see Sabbagh 2004 and Hagberg 2006 for stress-based accounts of part of the pattern).
While these length-based and stress-based accounts have their strengths and weaknesses, both are workable, and these primary phonological data are thus not decisive.
2.3. Original Pilipino Music (OPM)
OPM stands for Original Pilipino Music. The term can refer to any Philippine pop music, but there is a stylistic core of music that is most likely to be labelled OPM. Arceo-Dumlao (2017) is a rich collection of interviews with OPM songwriters, focusing on topics such as how an individual got into the music business, the inspiration for a song, or how a singer and songwriter met. There is little discussion of songwriting mechanics such as word choice, but the interviews do provide insight into the songwriting process. Some famous songs were written, music and lyrics, by one person in one day. In other cases, lyrics were written for a melody that had already been composed by someone else. And in yet other cases, the lyrics were written first and then another songwriter composed the melody.
Other literature on OPM includes engineering projects to classify songs into sub-genres, identify mood, distinguish OPM from K-pop, recommend songs automatically or predict hits (Deja et al. 2016; Mital et al. 2019; Abisado et al. 2021; Sulit 2022; Monterola et al. 2009); humanistic studies of the history, culture and politics of OPM (Lockard 1996; Maceda 2007; Gabrillo 2018; Domingo 2021; Peña 2021; Cayabyab 2021; Prudente 2021; Shunwei & Jia 2022; Nagai 2022; Gaillard 2022); a study of cover performances on social media (Anacin et al. 2021); and social-science studies of music preference (Boer et al. 2013; Limjuco et al. 2014) and of attitudes towards code-switching in OPM (Bareng 2019). We have identified two linguistic studies of OPM. Alegado et al. (2021) analyse instances of English in OPM according to their length and syntax (the corpus we examine here did not happen to contain any code-switches into English). Sumalinog et al. (2021) discuss OPM lyrics’ use of Swardspeak, ‘the vernacular language or code used by Filipino gay men in the Philippines and in the diaspora’ (Manalansan 2003: 46). Strikingly, more than half of the literature we identified has been published since 2020.
We failed to find linguistic analyses of text-setting – the relationship between text and notes – in music of any Philippine language. The closest publication we could find was Anderson (2015), a guide to performing Tagalog Kundiman songs, which notes several instances where it is musically effective for duration, beat strength and/or pitch to correlate with stress.
3. Methods and predictions
3.1. Song corpus
We found sheet music online for 19 usable songs. Nine songs were purchased from the composer Aldy Santos’s Web site, Aldy Sheet Music (aldysheetmusic.com), and ten songs from MuseScore (musescore.com), a site that allows users to upload their own transcriptions. (We found but excluded another seven songs: one was in 3/4 time; one was translated from Visayan, and song translators are working under different constraints; and five were each in a markedly different style from the core 19 songs.) The Appendix Table A1 lists the songs.
All songs were in the 4/4 time signature, which means that each measure has four beats. In this time signature, each beat is a quarter note (or crotchet). For those unfamiliar with musical notation, what is important about this time signature is that one can count along to the music in a repeating pattern of 1–2–3–4, 1–2–3–4, 1–2–3–4, ….
3.2. Coding
After listening to recordings and correcting the sheet music where necessary (this was rare), we hand-converted each piece of sheet music into a spreadsheet. In Figure 2, we show a fragment from ‘Akin ka na lang’ by Francis Salazar, as performed by Morisette (with accent marks added). The opening words are Bákit hindíɁ mo maramdamán… (‘Why don’t you feel…’). Each column in the spreadsheet stands for one sixteenth-note of duration, with sixteen columns per ‘measure’ of music. The rows include a repeating metrical grid (the rows with × marks; Liberman 1975; Lerdahl & Jackendoff 1981, [1983] 1996), to guide us to the correct cell for data entry, and rows to enter information about each syllable.

Figure 2 Transcription fragment, from ‘Akin ka na lang’.
As mentioned above, each measure is counted as 1–2–3–4; these numbers appear on the ‘beat’ row of the spreadsheet. We assume, following the usual convention for music in the 4/4 time signature, that the strongest position in the measure is the 1, or downbeat; we show this by giving the downbeats the tallest columns in the metrical grid, with five × marks. The second-strongest position is the 3, which we give four × marks. The next-strongest positions are the 2 and the 4, with three × marks each. If a musician wants to count to the music more finely, dividing the measure into eight equal parts, they can count 1–and–2–and–3–and–4–and, with an and falling in the middle of each beat. These ands are numbered 1.5, 2.5, 3.5 and 4.5 in the ‘beat’ row, and are the next-strongest positions, with two × marks each. A musician can count even more finely, dividing the measure into 16 equal parts, often as 1–ee–and–a–2–ee–and–a–3–ee–and–a–4–ee–and–a. These ees and as, numbered 1.25, 1.75, 2.25, 2.75, etc., are the weakest positions, with one × mark each.
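The grid just described can be written down compactly. The following is a minimal Python sketch of it, not the authors' R code; the function names `grid_strength` and `beat_label` and the 0-indexed column convention are assumptions made for illustration.

```python
# Sketch: height of the metrical grid (number of × marks) for each
# sixteenth-note column of a 4/4 measure, following the description above.
# Function names and 0-indexed columns are illustrative assumptions.
COLUMNS_PER_MEASURE = 16  # sixteen sixteenth-note columns per measure


def grid_strength(column: int) -> int:
    """Return the number of × marks for a 0-indexed sixteenth-note column."""
    pos = column % COLUMNS_PER_MEASURE
    if pos == 0:
        return 5  # beat 1, the downbeat
    if pos == 8:
        return 4  # beat 3
    if pos % 4 == 0:
        return 3  # beats 2 and 4
    if pos % 2 == 0:
        return 2  # the 'ands': 1.5, 2.5, 3.5, 4.5
    return 1      # the 'ees' and 'as': 1.25, 1.75, 2.25, ...


def beat_label(column: int) -> float:
    """Return the label used in the 'beat' row, from 1.0 up to 4.75."""
    return 1 + (column % COLUMNS_PER_MEASURE) / 4


measure = [grid_strength(c) for c in range(COLUMNS_PER_MEASURE)]
print(measure)
# → [5, 1, 2, 1, 3, 1, 2, 1, 4, 1, 2, 1, 3, 1, 2, 1]
```

Because the grid repeats every measure, taking the column number modulo 16 is enough to recover the strength of any position in a song.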
The other rows encode linguistic information. Ba, for example, is entered in the ‘text’ row, in the column where it begins (the downbeat, identified as 1 in the ‘beat’ row). Because Ba is set to an eighth note, it extends over two columns; the underscore in the next cell indicates continuation. The 1 in the ‘stress’ row indicates that Ba is prominent.[Footnote 8] The L in the ‘length’ row indicates that this syllable has a long vowel (in this case, predictable from being a stressed open penult); closed syllables are coded as C, and open syllables with short vowels as S. The ‘syll_position’ row shows 2, indicating that Ba is a penult (second from the end of the word). The ‘line_number’ row shows that all the syllables depicted here are in the first line of the song. The line is not a repeat, and we made no special notes (so the ‘repeat’ and ‘notes’ rows are blank). Filipino spelling does not indicate prominence, pre-tonic length, or word-final glottal stops; we relied on a combination of dictionary entries and one author’s native-speaker knowledge for the ‘stress’ and ‘length’ rows, and for the Qs indicating word-final glottal stops in the ‘text’ row.
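To make the encoding concrete, here is a minimal Python sketch of how a ‘text’ row of this kind could be turned into syllables with onsets and durations. It is not the authors' R script: the `Syllable` class, the function `parse_text_row` and the treatment of empty cells as rests are assumptions, and apart from Ba (an eighth note on the downbeat, as stated above) the durations in the example row are invented.

```python
from dataclasses import dataclass

SIXTEENTH = 0.25  # one column = one sixteenth note = 0.25 quarter-note beats


@dataclass
class Syllable:
    text: str
    onset_column: int  # 0-indexed column in which the syllable begins
    duration: float    # in quarter-note beats


def parse_text_row(cells):
    """Group a 'text' row into syllables; '_' continues the previous syllable."""
    syllables = []
    for column, cell in enumerate(cells):
        if cell == "_":
            if syllables:
                syllables[-1].duration += SIXTEENTH
        elif cell:
            # A non-empty cell starts a new syllable.
            syllables.append(Syllable(cell, column, SIXTEENTH))
        # Empty cells are assumed to be rests and are skipped.
    return syllables


# Hypothetical row: only Ba's position and length come from the text above.
row = ["Ba", "_", "kit", "_", "", "", "hin", "diQ"]
for syllable in parse_text_row(row):
    print(syllable.text, syllable.onset_column, syllable.duration)
# → Ba 0 0.5
# → kit 2 0.5
# → hin 6 0.25
# → diQ 7 0.25
```

The remaining rows (‘stress’, ‘length’, ‘syll_position’ and so on) could be read off at each syllable's onset column in the same way.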
An R script (R Core Team 2021) reads and processes these spreadsheets.
3.3. Exclusions
We excluded 223 syllables because they belonged to a word that fell entirely or in part on a triplet (a division of a note into three equal parts instead of two or four), because we had no principled way to classify the prominence of the second and third sub-beats of a triplet. All the songs we coded had four beats per measure, but some had short passages with a different number of beats per measure, and we excluded 11 syllables because they belonged to words that fell partly or wholly within such passages. We excluded repeated lines, so that our data set would not appear (in plots and statistical analyses) to be bigger and more consistent than it really is. Finally, we excluded the one word in the corpus with antepenultimate stress, ágila ‘eagle’, a Spanish loan.
3.4. Predictions
We are comparing four syllable types: prominent and non-prominent penults and ultimas. Our absolute null hypothesis is that all four types are assigned to notes of similar length, and beats of similar strength.
The next closest to a null hypothesis is that OPM text-setting purely reflects the phonetics of word prosody, and tells us nothing about its phonology. In that case, non-prominent penults should be assigned to shorter notes than all the rest, as illustrated by the arrows on the left in (6), which start from the syllable type predicted to be set to longer notes, and point to the syllable type predicted to be set to shorter notes. The musical equivalent of loudness is less direct, but there is a tendency for loudness to signal beat strength (Lerdahl & Jackendoff [1983] 1996: 17–18, 78–79). Non-prominent ultimas should then be assigned to weaker beats than all the rest, as illustrated on the right in (6), but with dashed arrows to show that the predictions are less direct, starting from the syllable type predicted to be set to stronger beats, and pointing to the syllable type predicted to be set to weaker beats.
There are two non-null hypotheses: that generated by the vowel-length theory, and that generated by the stress theory. As illustrated in (7), the vowel-length theory predicts that prominent penults, which contain a phonemically long vowel, should be set to longer notes than all other syllable types. Versions of the vowel-length theory that assign predictable stress to those phonemically long vowels also predict that prominent penults should be assigned to stronger beats than the rest (since cross-linguistically, stressed syllables tend to fall on strong beats, as stated in §1). Dashed lines are again used for the beat-strength predictions, to reflect that they are made by only one version of the vowel-length theory.
The hypothesis generated by the stress theory is that prominent syllables, because they are stressed, should be assigned to stronger beats than non-prominent syllables are, as shown in (8). There is less research on note length in uncontroversial stress languages, with some evidence that German stressed syllables are set to longer notes (Girardi & Plag 2022). The arrows for note length are dashed to show that this prediction is less clear.
4. Duration
4.1. Quantifying duration
Duration was quantified in quarter-note beats: a whole note (or semibreve, a note that lasts one measure) has a duration of 4 beats, a half note (minim) has a duration of 2, a quarter note (crotchet) 1, an eighth note (quaver) 0.5 and a sixteenth note (semiquaver) 0.25.
4.2. Duration results
There were 498 penult-prominent words and 475 ultima-prominent words analysed.
4.2.1. All final two syllables
The bean plots in Figure 3, made in R using the beanplot package (Kampstra 2008), show results for the final two syllables of all usable words. On the left are the penult-prominent words, like ʔábot, and on the right are the ultima-prominent words, like ʔabót. The left side of each pair, coloured orange, represents the smoothed distribution of duration for the penult in each type of word, and the right side, coloured sky blue, represents the distribution of duration for the ultima. The four horizontal line segments show the mean of each distribution.[Footnote 9]

Figure 3 Duration of final two syllables of all words.
In the penult-prominent words, on the left, the penult tends to have a shorter duration (0.7 beats on average) than the ultima (1.0 beats). This might seem unexpected, but recall that in speech, final syllables tend to be long regardless of stress. In ultima-prominent words, the gap is bigger: penults have an average duration of 0.4 beats and ultimas 1.5. Overall then, ultimas are long, but less so in penult-prominent words.
Another way of looking at this plot is to compare the two orange distributions to each other: penults are slightly longer when prominent (0.7 on the left > 0.4 on the right). And comparing the two sky-blue distributions to each other, ultimas are longer when prominent (1.5 on the right > 1.0 on the left).
The plot in Figure 4 shows that syllable shape has little consistent effect. Whether a penult-prominent word has a closed or an open penult, ultimas tend to be slightly longer than penults. And regardless of the syllable shapes in an ultima-prominent word, the ultima is substantially longer than the penult.

Figure 4 Duration, broken down by syllable shape.
4.2.2. Separating out phrase- and line-final words
The very long durations seen for some ultimas are mostly line-final syllables, reflecting the musical tendency to place a long note at the end of a line. Splitting up the results into line-final versus line-medial words will allow us to see whether prominent ultimas are still long even when not line-final.
Furthermore, given various authors’ observations about how intonational prominence can track the end of the phrase, rather than the end of the word, we should further split line-medial words into phrase-medial and phrase-final, to see whether prominent ultimas are still long even when phrase-medial. (We assume that line-final words are always phrase-final.)
To determine phrase boundaries, we follow Schachter & Otanes’s (1972: 36) description of where optional pauses may occur; we take these optional pause locations to represent phrase boundaries. Schachter & Otanes state that phrase boundaries never occur after a proclitic or before an enclitic, and are more likely to occur at large syntactic breaks than at small ones, such as between a modifier and the word it modifies. They give the following examples:
We operationalised phrasehood by having a look-up list of enclitics, based mainly on Kaufman (2010).[Footnote 10] If a word was followed by one of these enclitics, it was considered phrase-medial.[Footnote 11] If not, then it was considered phrase-final. In the examples above, our procedure correctly codes all the boldface words, but leaves bagong miscoded as phrase-final. (Monosyllabic words are ignored here because they are excluded from our results.) Following this example and Schachter & Otanes’s description, we decided that two-word modifier–modified pairs connected by the linker suffix -ng, like bagong paaralan, should be hand-coded as belonging to the same phrase; we identified 19 such sequences in the song corpus.
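The look-up rule can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the coding procedure itself: the set `ENCLITICS` is a small illustrative subset of common Tagalog enclitics, not the list based on Kaufman (2010); the function name `code_phrase_position` is hypothetical; and the hand-coded override for -ng modifier–modified pairs is not implemented.

```python
# Illustrative subset only; NOT the look-up list used for the corpus.
ENCLITICS = {"ba", "din", "rin", "ka", "ko", "mo", "na", "pa",
             "lang", "nga", "daw", "raw", "po"}


def code_phrase_position(words):
    """Label a word 'phrase-medial' if the next word is an enclitic,
    and 'phrase-final' otherwise (including the last word of the line)."""
    labels = []
    for i, _word in enumerate(words):
        following = words[i + 1].lower() if i + 1 < len(words) else None
        if following in ENCLITICS:
            labels.append("phrase-medial")
        else:
            labels.append("phrase-final")
    return labels


line = ["Bakit", "hindi", "mo", "maramdaman"]
for word, label in zip(line, code_phrase_position(line)):
    print(word, label)
# → Bakit phrase-final
# → hindi phrase-medial
# → mo phrase-final
# → maramdaman phrase-final
```

Labels assigned to monosyllables such as mo would simply be discarded, since monosyllabic words are excluded from the results. As with bagong in the examples above, a rule of this kind labels a word phrase-final whenever no enclitic follows, which is why the -ng pairs needed hand-coding.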
We assumed that line breaks occurred where displayed on the lyrics website Musixmatch (https://www.musixmatch.com/). Some of the songs rhyme, providing further support for the line-break locations, but we did not use rhyming or other criteria to make changes from Musixmatch’s line breaks.
Figure 5 illustrates this three-way breakdown. As we move from phrase-medial to phrase-final to line-final, durations get longer, especially for ultimas. But the key result still holds in all three environments: prominent penults are longer than non-prominent penults; and prominent ultimas are longer than non-prominent ultimas.

Figure 5 Note duration broken down by position.
4.3. Duration summary
We have seen that, as predicted by both the vowel-length analysis and the stress analysis, prominent penults are set to longer notes than non-prominent penults. As predicted by only the stress analysis, prominent ultimas are also set to longer notes than non-prominent ultimas. This is true even for phrase-medial position, where, in speech, ultima-prominent words may lose their intonational prominence.
We defer discussion of statistical significance to §6. A regression model, included in the Supplementary Material, finds that ultimas are longer, prominent syllables are longer and prominent ultimas are the longest of all. In §6, we argue that regression may not be conservative enough, and provide an alternative method for assessing significance.
5. Beat strength
5.1. Measuring beat strength: syncopation
OPM has extensive anticipatory syncopation (Temperley 1999; Tan et al. 2019), meaning that beats that, according to various expectations, should count as strong, begin slightly early. Two examples are shown in Figure 6: the syllable ram begins on the ‘and’ (second half) of a beat, the second-weakest position, but musically it behaves as though it began on the following measure’s first beat, which is the strongest position; the syllable man begins on the last sixteenth note of a beat, the weakest position, but musically behaves as though it began on the third beat of the measure, the second-strongest position. Temperley’s (1999) solution is to move each syncopated note forward, so that it counts as beginning later than it really does.

Figure 6 Examples of syncopation in first line of ‘Akin ka na lang’.
To operationalise this, we identified, for each syllable, the strongest beat that it contains: for ram, beat 1 of a measure, and for man beat 3 of a measure. One danger is that if a note is very long, it will always end up counting as strong, because it eventually goes on long enough to include a strong position. Therefore, we only looked for the strongest beat contained in the first 1.25 beats of the note. This is enough for a note that begins on the last beat of the measure (or even one sixteenth-note earlier) to count as beginning on a downbeat, if it lasts long enough.[Footnote 12] Our procedure ‘corrected’ 49% of sixteenth notes, 27% of eighth notes and 11% of quarter notes to a stronger position.
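A minimal Python sketch of this correction follows. It is one reading of the procedure, not the authors' R code: the names are hypothetical, columns are 0-indexed sixteenth notes, and the window is assumed to include the position exactly 1.25 beats (five columns) after the onset, since that is what lets a note starting one sixteenth-note before beat 4 reach the next downbeat. The note lengths given for ram and man are invented for the example.

```python
COLUMNS_PER_MEASURE = 16
WINDOW_COLUMNS = 5  # 1.25 beats = five sixteenth-note columns


def grid_strength(column: int) -> int:
    """Number of × marks for a 0-indexed sixteenth-note column in 4/4."""
    pos = column % COLUMNS_PER_MEASURE
    if pos == 0:
        return 5  # beat 1
    if pos == 8:
        return 4  # beat 3
    if pos % 4 == 0:
        return 3  # beats 2 and 4
    if pos % 2 == 0:
        return 2  # second half of a beat
    return 1      # second or fourth quarter of a beat


def corrected_strength(onset_column: int, n_columns: int) -> int:
    """Strongest grid position that the note covers, looking no further
    than 1.25 beats past its onset (endpoint included, by assumption)."""
    last = onset_column + min(n_columns - 1, WINDOW_COLUMNS)
    return max(grid_strength(c) for c in range(onset_column, last + 1))


# ram: starts on the 'and' of beat 4 (column 14); assumed to last 4 columns.
print(corrected_strength(14, 4))  # → 5
# man: starts on the last sixteenth of beat 2 (column 7); assumed 3 columns.
print(corrected_strength(7, 3))   # → 4
# A whole-measure note starting at 1.25 is not promoted to a downbeat.
print(corrected_strength(1, 16))  # → 3
```

Capping the window keeps very long notes from automatically counting as strong, while still letting anticipated notes inherit the strength of the beat they anticipate.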
5.2. Beat strength results
As with duration, there were 498 penult-prominent words and 475 ultima-prominent words analysed.
5.2.1. All final two syllables
Results are plotted in Figure 7. The plot is the same as in Figure 3, except that the vertical axis measures beat strength. The strongest value, 5, is for a note that starts on or contains a downbeat (beat 1 of the measure); the next-strongest, 4, is for a note that starts on or contains beat 3 of a measure; 3 is for a note that starts on or contains beat 2 or 4; 2 is for a note that starts on or contains at most the second half of a beat, and 1 is for a note that starts on and contains at most the second or fourth quarter of a beat.

Figure 7 Beat strength (correcting for syncopation) of final two syllables of all words.
On the left, we see that in penult-prominent words, average strength is somewhat higher in penults than in ultimas; the modal penult of such words is set to a downbeat, while their modal ultima is set to the second half of a beat (the peak of the distribution is at 2). On the right, for ultima-prominent words, ultimas are on average much stronger (modally downbeats) than penults (modally second half of a beat). Thus, beat strength matches prominence, especially for ultima-prominent words.
Just as we did for duration, we can also divide the results by syllable shape, as shown in Figure 8, with little effect: regardless of syllable shape, penult-prominent words have a stronger penult, and ultima-prominent words have a stronger ultima.

Figure 8 Beat strength broken down by syllable shape.
5.2.2. Separating out phrase- and line-final words
Just as with duration, we separate out phrase- and line-final words. As shown in Figure 9, for phrase-medial words, penult-prominent and ultima-prominent words look symmetrical, but moving to phrase-final and then line-final, penult-prominent words show less and less of a strength difference between penults and ultimas. Nevertheless, in all three environments, prominent syllables are stronger than non-prominent syllables within the same word type; prominent penults are stronger than non-prominent penults; and prominent ultimas are stronger than non-prominent ultimas.

Figure 9 Beat strength broken down by position.
5.3. Beat strength summary
The results again support the stress analysis. As both the vowel-length analysis and the stress analysis predict, prominent penults are set to stronger beats than both non-prominent penults and non-prominent ultimas. But as predicted only by the stress analysis, prominent ultimas are also set to stronger beats than both non-prominent ultimas and non-prominent penults. This holds even phrase-medially, where, in speech, ultima-prominent words may lose their intonational prominence.
A regression model, included in the Supplementary Material, finds that prominent ultimas and penults are set to the strongest beats, and non-prominent penults to the weakest. A more conservative method of assessing statistical significance is given now.
6. Monte Carlo tests for significance
Imagine a language where most words have final stress, and a musical style where most lines of music end in a long note. Even if songwriters make no effort to align stress with long notes, stressed syllables will still tend to be placed on long notes, because of line-final syllables. In other words, the simple null hypothesis that the regression models mentioned above use, ‘stressed and unstressed syllables are set to notes of equal duration’, won’t do, because the data are inherently biased to falsify that null hypothesis. We want to construct a null hypothesis that already includes such biases. One way to do this is to randomly re-combine text and music. In some studies, this scrambled, null-hypothesis corpus is constructed by drawing text from prose (the ‘Russian method’ for poetry; Ryan Reference Ryan2011; see references in Hayes Reference Hayes2013), but it is not clear what would be suitable prose to use in the case of OPM. Instead, we follow Gunkel & Ryan (Reference Gunkel, Ryan, Jamison, Melchert and Vine2011) in scrambling the lines of lyrics within our corpus of songs.
We first compute several measures for the real data, such as the mean duration of prominent penults’ notes minus the mean duration of non-prominent penults’ notes (all the measures are listed in (10)). Then, we convert each musical line into a pattern representing the number of syllables, and the locations where a disyllabic or longer word ends. For example, Bákit hindíʔ mo maɾamdamán ‘why can’t you feel it’ has the structure σ σ | σ σ | σ σ σ σ σ |, with | representing the ends of words, not including monosyllabic mo ‘you’. These word boundaries are important because, as we saw in §4.2, word-final syllables tend to be given a long duration regardless of whether the word is penult- or ultima-prominent. Since we are interested not in that effect, but rather in the difference between penult- and ultima-prominent words, we want our null hypothesis to include word-final lengthening.
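As a minimal sketch of this conversion, assume each line is already represented as a list of words, each word being a list of syllables. This representation and the function name are assumptions made for illustration, not the format used in our script.

```python
def line_structure(words):
    """Return the syllable/word-boundary pattern of a line of lyrics.

    Each word is a list of syllables. A boundary mark '|' is written only
    after words of two or more syllables, so monosyllables such as 'mo'
    merge with the following word.
    """
    tokens = []
    for syllables in words:
        tokens.extend("σ" for _ in syllables)
        if len(syllables) >= 2:
            tokens.append("|")
    return " ".join(tokens)


bakit_line = [["bá", "kit"], ["hin", "díʔ"], ["mo"], ["ma", "ɾam", "da", "mán"]]
print(line_structure(bakit_line))
```

Two lines are candidates for recombination only if this function returns the same string for both.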
Our script randomly selects a line from the corpus that has the same structure, such as Sáma-sáma ɾiŋ maɾaɾatíŋ ‘will also arrive together’. (We generally coded words with two-syllable reduplication, like sáma-sáma ‘together’, as two separate words.) In this example, the selected line is from a different song (‘Tagumpay nating lahat’, written by Gary Granada and performed by Lea Salonga). The script combines the old line’s lyrics with the new line’s musical notes, as shown in Figure 10.

Figure 10 Lyrics randomly assigned to a new line of music.

Figure 11 Monte Carlo results.
We recompute the measures of interest on the new, scrambled corpus, and repeat the scrambling procedure 1,000 times, to obtain the distribution of values that we would expect to see under the null hypothesis, following Kessler (Reference Kessler2001); Martin (Reference Martin2007, Reference Martin2011); Hayes et al. (Reference Hayes, Zuraw, Londe and Siptár2009). Each plot in Figure 11 is for one of the measures in (10). The grey bars are a histogram of the measure’s values in the shuffled corpora, and the solid blue line is the value in the actual corpus. The estimated p value for each measure is the proportion of shuffled corpora that lie to the right of the solid blue line – how often we’d expect to see such an extreme result by chance. For many measures, the histogram does not overlap with the solid blue line at all, meaning that the estimated p value is less than 0.001. The measures and p values are given in (10), and depicted graphically in Figure 11.Footnote 13
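The scrambling procedure can be sketched as follows. This is an illustration under stated assumptions rather than our actual script (which is in R, in the Supplementary Material). Each line is a pair of per-syllable labels and per-syllable note durations. The labels `"P"`, `"p"`, `"U"` and `"u"` (prominent and non-prominent penults and ultimas), the toy corpus and all function names are hypothetical. Music is drawn with replacement from lines of the same structure, and a shuffled value equal to the observed one is counted as at least as extreme.

```python
import random
from collections import defaultdict
from statistics import mean


def measure(corpus, label_a, label_b):
    """Mean note duration of label_a syllables minus that of label_b syllables."""
    a = [d for labels, notes in corpus for lab, d in zip(labels, notes) if lab == label_a]
    b = [d for labels, notes in corpus for lab, d in zip(labels, notes) if lab == label_b]
    return mean(a) - mean(b)


def scramble(corpus, structures, rng):
    """Pair each line's lyrics with the notes of a random same-structure line."""
    notes_by_structure = defaultdict(list)
    for (_, notes), structure in zip(corpus, structures):
        notes_by_structure[structure].append(notes)
    return [(labels, rng.choice(notes_by_structure[structure]))
            for (labels, _), structure in zip(corpus, structures)]


def monte_carlo_p(corpus, structures, label_a, label_b, n_iter=1000, seed=0):
    """Observed measure, and the share of scrambled corpora at least as extreme."""
    rng = random.Random(seed)
    observed = measure(corpus, label_a, label_b)
    null_values = [measure(scramble(corpus, structures, rng), label_a, label_b)
                   for _ in range(n_iter)]
    p_value = sum(value >= observed for value in null_values) / n_iter
    return observed, p_value


# Toy corpus (invented for illustration): three lines with the same structure.
toy_corpus = [
    (["P", "u", "p", "U"], [1.0, 0.5, 0.5, 2.0]),
    (["p", "U", "P", "u"], [0.5, 1.0, 1.0, 1.0]),
    (["P", "u", "P", "u"], [1.0, 0.5, 1.0, 0.5]),
]
toy_structures = ["σ σ | σ σ |"] * 3
```

Because the lyrics keep their labels and only the notes are exchanged, any bias that comes from line structure alone, such as long line-final notes, is present in the scrambled corpora as well as in the real one.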
Measures (10a)–(10d) and (10g)–(10j), shown in bold, are those predicted by the stress analysis to be positive, or in any case greater than expected by chance; the other measures are included for completeness, even if no analysis predicts them to be non-zero. Since there are 12 measures being taken, if we require $p < 0.05$ to reject the null hypothesis, a Bonferroni correction (Dunn Reference Dunn1961) adjusts that threshold to $p < 0.004$ (since $0.05/12 \approx 0.0042$). The measures meeting that significance criterion are marked with * in Figure 11.
Except for (10d), then, the predictions of the stress analysis given in §3.4 are supported. The vowel-length analysis and the stress analysis overlap in correctly predicting (10a), (10b), (10g) and (10h) to be positive. The vowel-length analysis alone predicts (10e) and (10k) to be positive, which was not strongly supported. We conclude that our results support the stress analysis over the vowel-length analysis.
For those who prefer a regression analysis, despite its drawbacks, the html file in the Supplementary Material includes regression models that reach similar conclusions.
7. Phrase-final enclitics
Recall from §2.2.2 that some scholars hold that apparent word-final prominence is really phrase-final prominence, because when an ultima-prominent word is put into a phrase, its prominence shifts. For example, ultima-prominent damít ‘clothes’ may appear to have final stress in citation form, but when it is combined with the enclitic ko ‘my’ to form damit ko ‘my clothes’, this ‘stress’ now falls on ko. By contrast, the pa of penult-prominent sapátos ‘shoes’ remains (or at least can remain) ‘stressed’ in sapatos ko.
To test this idea, we examine enclitics like ko. If apparent word-final prominence is really phrase-final prominence, ko should be set to longer and stronger notes when it follows ultima-prominent words (because prominence should shift onto it) than when it follows penult-prominent words.
We extracted all monosyllabic enclitics from the corpus, and counted them as phrase-final if they were followed by a content word. This yielded 143 total tokens of phrase-final monosyllabic enclitics, of which 101 were after penult-prominent words and 42 after ultima-prominent words.
Looking first at note duration, shown in Figure 12, we see that monosyllabic enclitics, whether line-medial or line-final, are not set to longer notes when they follow an ultima-prominent word (yellow) than when they follow a penult-prominent word (green).

Figure 12 Phrase-final clitic duration.
As for beat strength, shown in Figure 13, monosyllabic enclitics are set to a weaker beat when they follow an ultima-prominent word than when they follow a penult-prominent word, the opposite of what phrasal prominence-shifting predicts. This is probably explained by the fact that weak and strong beats tend to alternate in music. When a prominent ultima, like the mit in damít, gets set to a strong beat, a following syllable will tend to be set to a weak beat; whereas when a prominent penult, like the pa in sapátos, is set to a strong beat, an enclitic two syllables later can also be set to a strong beat.

Figure 13 Phrase-final clitic beat strength.
Using the same Monte Carlo method as in §6 (plots in Supplementary Material), we found that enclitics were not set to significantly longer notes after ultima-prominent words than after penult-prominent ($p = 0.55$), nor to significantly stronger beats ($p = 0.82$).
The text-setting data thus contradict the idea that ultima prominence is illusory. Even if intonational pitch accent often moves onto an enclitic in speech, songwriters keep prominence on the word ultima.
8. Pre-tonic syllables
8.1. Introduction
Filipino morphology can create a long vowel earlier in the word – before the penult, that is – though this is often variable or optional. The most common morphological source of long vowels is verb aspect reduplication. As illustrated in (12), aspect reduplication produces a copied, prefixed syllable that typically contains a long vowel. (The infixes -um- and -in- represent voice and aspect.) All but ten of the pre-tonic long vowels in the song corpus come from aspect reduplication.
French (Reference French1988) and French (Reference French1991) give partly conflicting descriptions of secondary stress, and French (Reference French1991) calls for acoustic analysis of secondary stress – which as far as we know has still not been carried out – to clarify the picture. We will focus here on French’s claims about the types of words that are well attested in the song corpus. French’s two accounts agree that aspect reduplicants, like those shown in (12), receive secondary stress; for example, French would transcribe ‘will write’ as [ˌsuˈsulat]. French (Reference French1988) claims that closed syllables in prefixes generally attract secondary stress, as in [màg-pa-ka-ʔáɾal] ‘study intensely’ (and does not address closed, pre-tonic root syllables, as in the penult of [ta:-takbó]). French (Reference French1988) further claims that a closed prefix syllable will not receive secondary stress if a following prefix syllable itself has secondary stress (the context found in aspect reduplication), as in [mag-ˌpa-pa-ka-ˈʔaɾal] ‘will study intensely’, where the prefix /pa/ has undergone aspect reduplication. The two works make conflicting claims about default locations of secondary stress when prefixes are all open syllables with no aspect reduplication.
While acknowledging that much remains to be determined about Filipino secondary stress, we extract two hypotheses from these descriptions. First, pre-tonic syllables that are closed or have long vowels, as in words like [sa:-sabí-hin] ‘will be said’ and [nag-simuláʔ] ‘began’, should tend to be treated as having secondary stress, and thus be set to longer notes and stronger beats than pre-tonic syllables that are open and have a short vowel, as in [ka-ʔibíg-an] ‘friend’. Second, looking just at antepenults, stress clash avoidance should weaken or eliminate this effect when the next syllable is a prominent penult, so that the antepenult in a word like [pag-ʔíbig] ‘love’ or [ma-pa-páwiʔ] ‘will come to an end’ would not be set to particularly long notes or strong beats, despite being closed or having a long vowel, because the following syllable is prominent.
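The two hypotheses can be stated compactly as a predicate over pre-tonic syllables. The sketch below is a deliberate simplification: it treats stress clash avoidance as categorical, although the second hypothesis allows the effect to be merely weakened, and the function and parameter names are hypothetical.

```python
def predicted_secondary_stress(is_closed, has_long_vowel,
                               syllables_from_end, penult_prominent):
    """Whether a pre-tonic syllable is predicted to bear secondary stress.

    syllables_from_end: 3 for an antepenult, 4 for a fourth-to-last syllable.
    penult_prominent: True for penult-prominent words, False for
    ultima-prominent words.
    """
    # Hypothesis 1: only closed or long-vowelled syllables attract secondary stress.
    if not (is_closed or has_long_vowel):
        return False
    # Hypothesis 2: an antepenult directly before a prominent penult is in
    # stress clash, so the prediction is withdrawn (simplified to a yes/no choice).
    in_clash = syllables_from_end == 3 and penult_prominent
    return not in_clash
```

For instance, the closed antepenult of [pag-ʔíbig] is predicted not to bear secondary stress (`predicted_secondary_stress(True, False, 3, True)` is `False`), while the long-vowelled fourth-to-last syllable of [sa:-sabí-hin] is (`predicted_secondary_stress(False, True, 4, True)` is `True`).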
For this part of the analysis, we used words of three or more syllables. Table 3 shows how many tokens were found in each position. Because there were so few observations of fifth-, sixth- and seventh-to-last syllables, they are not included in the analysis.
Table 3 Number of open-syllable observations for analysing pre-tonic length.

8.2. Pre-tonic duration
The bean plots in Figures 14 and 15 show the distribution of note duration for pre-tonic syllables that are short-vowelled and open, short-vowelled and closed, or long-vowelled and open. (There were no long closed tokens.) In fourth-to-last syllables (Figure 14), we see that songwriters have assigned the short open syllables to the shortest note durations. In contrast to the duration data for penults and ultimas above in Figure 4, syllable shape does matter here: the closed syllables pattern with the long-vowel syllables – although, not surprisingly given the small amount of data, the differences are not significant.Footnote 14 This is consistent with French’s (Reference French1991) contention that both closed syllables and long-vowelled syllables (aspect reduplicants) attract secondary stress, and goes against the otherwise appealing notion that the reason a closed penult cannot be prominent is that it can’t have a long vowel: even though these closed pre-tonic syllables have short vowels, they appear to be receiving prominence.Footnote 15

Figure 14 Note duration in fourth-to-last syllables.

Figure 15 Note duration in third-to-last syllables.
For third-to-last syllables, in Figure 15, the data are further divided according to whether the following syllable is prominent (penult-prominent word, as in liːlípas ‘will elapse’) or not (ultima-prominent, as in puːputíʔ ‘will turn white’). The only syllables to receive longer note duration are closed and long-vowelled syllables that are not followed by a prominent penult, and thus not subject to stress clash. The difference between on the one hand long-vowel syllables not subject to stress clash and on the other hand short-vowel syllables was significant ($p < 0.001$).
8.3. Pre-tonic strength
The results for beat strength are less perfectly in line with our secondary-stress predictions, but still broadly support them. In fourth-to-last position, long-vowelled syllables – but not closed syllables – trend towards being set to stronger beats, as shown in Figure 16.

Figure 16 Beat strength in fourth-to-last syllables.
In third-to-last syllables (Figure 17), the syllables set to the strongest beats are those that we predict to have secondary stress: closed and long-vowelled syllables in ultima-prominent words (no stress clash). There also appears to be a difference within the short open syllables between those that are followed by a prominent syllable and those that are not. It could be that short open syllables prefer to bear secondary stress if followed by an unstressed syllable (Blake Reference Blake1925; Avery & Lamontagne Reference Avery and Lamontagne1995). There is also a plausible musical explanation for this: unlike note length, beat strength alternates in the underlying musical structure. Because prominent penults and ultimas tend to be assigned to strong beats, there will thus be a tendency for an antepenult preceding a prominent penult to be weak, and for an antepenult preceding a prominent ultima to be strong. In the case of short-vowelled open syllables, this musical tendency creates a small difference; in the closed and long-vowelled syllables the musical tendency combines with stress clash avoidance to create a bigger difference. The difference between, on the one hand, long-vowelled syllables not subject to stress clash and, on the other, short-vowelled syllables was significant ($p < 0.001$).

Figure 17 Beat strength in third-to-last syllables.
8.4. Pre-tonic syllables summary
We have seen that closed or long-vowelled pre-tonic syllables are set to longer notes and stronger beats, as long as the following syllable is not the tonic (as in a word like liːlípas ‘will elapse (time)’). This supports French’s (Reference French1991) contention that pre-tonic closed syllables and long vowels attract secondary stress, subject to some stress clash avoidance. We saw earlier that open versus closed syllable shape in penults and ultimas, which does not affect stress (except that closed penults may not bear stress), was not important for note length and beat strength. Thus, syllable shape seems to matter for text-setting only where it has been claimed to matter for stress.
9. OPM text-setting does not track phonetics
While text-setting partly tracks the phonetics of duration and loudness (see §2.2.1), there are some mismatches. In speech, the last two syllables of penult-prominent words have two long vowels, and those of ultima-prominent words have a short and then a long vowel; music was a rough match to this (excluding line-final words), as summarised in Table 4, except that prominent ultimas were set to longer notes than either non-prominent ultimas or prominent penults, as predicted by the stress analysis. In speech, penult-prominent words have a loud penult and quiet ultima, which was reflected in beat strength, but prominent ultimas, which in speech have similar loudness to their non-prominent penults, were set to stronger beats, again as predicted by the stress analysis.
Table 4 Summary of phonetic and musical properties of last two syllables of words.

We also looked at vowel height, on the assumption that the low vowel /a/ should be longer and louder in speech than the high vowels /i, u/ (though the acoustic results on this in Gonzalez Reference Gonzalez1970 are not straightforward). The plots in Figure 18 show the note durations (left plot) and beat strengths (right plot) of the last two syllables of penult-prominent and ultima-prominent words. Rather than pairing the distributions of penult and ultima for each word type, the four syllables are all separated out, and each pair of distributions is for a low vowel (left) and a high vowel (right). Within each pair, the left and right distributions are almost identical, with no indication that songwriters assign low vowels to longer notes or stronger beats.

Figure 18 Note duration and beat strength by vowel height.
Although we found that OPM text-setting does not track phonetic detail, there is one area we found where it does track surface rather than underlying phonology. When a [ʔ]-final word is phrase-medial, the [ʔ] usually deletes, and the preceding vowel lengthens in compensation, as in, using Schachter & Otanes’s (Reference Schachter and Otanes1972: 16) length-based notation, [luːtoʔ] ‘cooked’ vs. [luːtuː ba] ‘cooked?’ and [hindiʔ] ‘no’ vs. [hindiː ba] ‘no?’. Within prominent ultimas followed by a consonant-initial enclitic, we found that underlyingly glottal-final ultimas, like the /diʔ/ in ‘no’, are set to somewhat longer notes and stronger beats than other prominent ultimas are, presumably reflecting their surface lengthening. (Plots are provided in the Supplementary Material.)
10. Conclusion
This study has found that prominent penults and prominent ultimas are both set, in a corpus of OPM songs, to longer notes and stronger beats, both phrase-medially and phrase-finally – and that these text-setting tendencies are not simple reflections of duration and loudness in speech. Text-setting seems to reflect stress at the word level, and not merely phrasal prominence: when an ultima-prominent word is followed by a phrase-final enclitic (e.g., damít ko), many authors have observed that intonational prominence tends to shift onto the enclitic, but as we saw in §7, it is the content word’s ultima (mít) that is musically prominent, not the enclitic (ko). Furthermore, while syllable shape (open vs. closed) did not affect text-setting of penults and ultimas, it did affect text-setting of pre-tonic syllables, which is where French (Reference French1991) has claimed that syllable shape affects stress. All this is evidence in favour of analysing Filipino as having stress, even though the stress is realised differently in different positions in speech, with stressed penults having greater duration than unstressed, and stressed ultimas having greater loudness than unstressed (in addition to possible intonational differences). As we discussed in §2.2.2, standard phonological data were insufficient to decide between the length-driven and the stress-driven analyses. We believe that the musical data here provide the first straightforward evidence in favour of one analysis, the stress-driven one.
If the basic phonological data are not decisive for phonologists, how is it that songwriters have converged on treating Filipino as having stress? It is possible that there’s something in the basic data that no phonologists have noticed, but which is decisive for children learning the language. Or cases like Filipino could be telling us that, faced with ambiguous data of the Filipino type, learners are biased to acquire a lexicon and grammar with stress.
Our findings echo those of Domene Moreno & Kabak (Reference Domene Moreno and Kabak2022) for Turkish songs. In Turkish, as in Filipino, it has been proposed that words with non-final prominence bear true stress, while words with final prominence bear only phrase-final accent. Domene Moreno & Kabak measure beat strength and melodic peakhood in a song corpus. They find that, in Western European-style children’s songs, linguistically prominent syllables receive more of both types of musical prominence, with no difference between penultimate prominence and final prominence. Like us, they take this as evidence for word-level stress in Turkish.
Domene Moreno & Kabak found that songs they analysed in the Makam style did not give musical prominence to either type of Turkish prominent syllable. This raises the question of whether Western European-style Turkish children’s songs and OPM songs are both showing influence from English-language pop music’s tendency to align musical prominence and stress. This is possible, but does not explain away their or our results, because songwriters influenced by English songs would still have to decide what counts, in their language, as the equivalent to English’s stress. And in these Turkish and Filipino corpora, the songwriters have decided to treat both final and non-final prominent syllables as needing to be musically prominent.
The one interpretation of our data that could be consistent with an underlying-length analysis is effectively an empty one, where, before any phonology applies, an underlying length contrast gets converted into surface stress for all content words, both in words that have an underlying long vowel (/ʔa:bot/ → [ˈʔa:.bot]) and in words that do not, which receive final stress (/ʔabot/ → [ʔa.ˈbot]). Without direct access to speakers’ underlying representations, the availability of a deeper level of analysis with length only cannot be refuted by any data. More broadly, data alone cannot rule out an analysis of any phenomenon where a feature that the phonology appears to be sensitive to is actually the (un-neutralised) reflex of a different underlying feature, though there could be cross-linguistic or theoretical justifications for such an analysis. We do not, however, find any support for underlying length in the text-setting data, which appear to be sensitive only to stress.
We end with a methodological note on the usefulness of musical data for low- and medium-resource languages. Filipino could be considered a medium-resource language. Unlike for most of the world’s languages, there are corpora and engineering tools, either available or in development: see Jakubíček et al. (Reference Jakubíček, Kilgarriff, Kovář, Rychlý, Suchomel, Hardie and Love2013), Go & Nocon (Reference Go, Nocon and Roxas2017), Go et al. (Reference Go, Nocon and Borra2017), Lazaro et al. (Reference Lazaro, Policarpio and Guevara2009), Ang et al. (Reference Ang, Guevara, Miyanaga, Cajote, Ilao, Bayona and Laguna2014), and many others. But the extent of these resources is small compared to what exists for English, Korean, French and other languages with well-funded public and private research infrastructure.Footnote 16 Our song corpus consists of 1,662 words in total. A spoken corpus of that size would be too small for studying stress correlates, with too many sources of noise (speech rate, inherent duration and loudness of vowels, etc.). But in songs, we have access to songwriters’ categorical decisions about duration and strength, which makes the data clean enough for clear patterns to emerge. We originally coded and analysed just nine songs, and the main patterns were already there; adding the remaining ten songs made us more confident in the results, but didn’t change them. A small corpus of songs, even a number as small as what a research team could transcribe themselves from listening to recordings, can thus be useful for gaining insight into the phonology of a lower-resource language, as long as the object of study occurs with sufficient density in songs. In our case, most of the syllables in a song provided relevant data, so the density of observations per song was high.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0952675725000041.
Data availability statement
The annotated R code that generated our figures and results, and some additional statistical analysis, are available in the Supplementary Material.
Acknowledgements
For their feedback on various parts of this project, we thank participants in the UCLA phonology seminar, the UC Berkeley Phorum, the University of Ottawa linguistics colloquium, the Linguistic Society of the Philippines RIPPLE series (Researches, Insights, and Perspectives on the Philippine Linguistics Enterprise), the Manchester Phonology Meeting and the Linguistic Society of America annual meeting. We especially thank Peter Avery, François Dell, Bruce Hayes, Larry Hyman, Dan Kaufman, Stephanie Reed, Hannah Sande and the editors and reviewers of Phonology for their comments.
Funding statement
This research was supported by a grant from the UCLA Academic Senate’s Committee on Research.
Competing interests
The authors declare no competing interests.
A. Songs
Table A1 lists the songs used in the corpus. Because the sheet music we used is made by listeners, where possible we list the performer whose performance the sheet music is based on, as well as the composer, when known.
Table A1 Songs in the corpus.
