Partisan conflict in nonverbal communication

Published online by Cambridge University Press:  04 December 2025

Mathias Rask
Affiliation:
Department of Political Science, Aarhus University, Aarhus, Denmark
Frederik Hjorth*
Affiliation:
Department of Political Science, University of Copenhagen, Copenhagen, Denmark
*Corresponding author: Frederik Hjorth; Email: fh@ifs.ku.dk

Abstract

In multiparty systems, parties signal conflict through communication, yet standard approaches to measuring partisan conflict in communication consider only the verbal dimension. We expand the study of partisan conflict to the nonverbal dimension by developing a measure of conflict signaling based on variation in a speaker’s expressed emotional arousal, as indicated by changes in vocal pitch. We demonstrate our approach using comprehensive audio data from parliamentary debates in Denmark spanning more than two decades. We find that arousal reflects prevailing patterns of partisan polarization and predicts subsequent legislative behavior. Moreover, we show that consistent with a strategic model of behavior, arousal tracks the electoral and policy incentives faced by legislators. All results persist when we account for the verbal content of speech. By documenting a novel dimension of elite communication of partisan conflict and providing evidence for the strategic use of nonverbal signals, our findings deepen our understanding of the nature of elite partisan communication.

Information

Type
Original Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of EPS Academic Ltd.

1. Introduction

On January 19, 2004, Vermont Governor Howard Dean doomed his presidential campaign. In an attempt to rally his supporters after a disappointing third place in the Iowa caucuses, Dean shouted a list of the primary states ahead followed by a loud, primal yell, now known as the ‘Dean Scream.’ Its subsequent internet virality is widely perceived as a contributing factor in ending Dean’s presidential aspirations. The use of vocal style in politics can also be more deliberate. After becoming the United Kingdom’s prime minister, Margaret Thatcher famously went through extensive voice coaching, dramatically altering her vocal style in order to present a more powerful persona. The ‘Dean Scream’ and Thatcher’s engineered voice change illustrate the role of vocal style, and the nonverbal dimension more generally, in elite political communication.

Although nonverbal communication is broadly understood to matter in the abstract, researchers almost exclusively study the verbal dimension of elite political communication in practice (Taylor et al., 2023). Most notably, a rich literature uses parliamentary speech as a window to party competition, particularly in systems where party cohesion masks within-party differences. This work typically builds on scaling methods using speech word counts (Slapin and Proksch, 2008; Hjorth et al., 2015; Lauderdale and Herzog, 2016), and more recently word embeddings (Rheault and Cochrane, 2020) to capture parties’ and individual legislators’ ideological positions. Another line of work uses statistical models (Gentzkow et al., 2019) and machine learning classifiers (Peterson and Spirling, 2018) to measure polarization from word choice. Finally, scholars have used sentiment and content analysis to uncover non-positional dimensions of partisan conflict (Proksch et al., 2019). While studies in this vein tap into different aspects of partisan conflict, e.g., positional vs. non-positional (Serra, 2010; Jung and Tavits, 2021; Skytte, 2021; Bjarnøe et al., 2023), they share a focus on word choice, i.e., the verbal dimension of speech.

In this paper, we expand the study of elite partisan conflict to the nonverbal dimension. We do so by developing and validating a measure of nonverbal signaling of conflict based on variation in a speaker’s emotional arousal, as indicated by changes in a speaker’s vocal pitch relative to their own baseline. We test our approach using audio data from more than two decades of parliamentary debates in Denmark, the largest collection of natural audio in political science. To preview our findings, we find that emotional arousal measured using vocal pitch closely tracks lines of partisan polarization and that it predicts subsequent partisan behavior in the form of legislative voting. We also find that legislators use heightened arousal in more visible and high-profile debates and when addressing parties with greater policy bargaining leverage, both consistent with legislators using nonverbal communication strategically to signal to voter constituencies and to pressure pivotal parties.

We make three distinct contributions. First, we document a novel dimension by which partisan elites communicate conflict to citizens. This finding challenges the prevailing near-exclusive focus on verbal communication in existing research and highlights the need for multimodal studies of elite political communication.

Second, we extend the emerging “audio as data” methodology to a novel domain. Earlier work has used audio data to study topics such as judicial decision making (Dietrich, Enos and Sen, 2019), oral court arguments (Knox and Lucas, 2021), and gender representation (Dietrich, Hayes and O’Brien, 2019; Rittmann, 2024). However, we are the first to use audio data to study multiparty conflict.

Third, and more substantively, our findings shed new light on the strategic nature of elite nonverbal communication. To reiterate, we find that changes in vocal style are highly predictable from legislators’ vote-seeking and policy-seeking motives, indicating that legislators use vocal style strategically to further their political objectives. Hence, our findings indicate that deliberate, strategic use of nonverbal communication (as in the case of Margaret Thatcher) is widespread in our empirical setting, and spontaneous, non-strategic use (as in the case of Howard Dean) is much less so. Notably, this conclusion cuts against some earlier work seeing vocal style as beyond the speaker’s control (Dietrich, Enos and Sen, 2019). This difference is plausibly attributable to differences in speakers’ institutional constraints, but our findings nevertheless add to our understanding of the nature of elite nonverbal communication. We revisit this question in the concluding section.

We proceed as follows: We elaborate on each of our contributions, situating them in the existing literature. We present our measurement approach, data, and measures for studying the nonverbal dimension of elite partisan conflict. We develop four hypotheses, which we evaluate in the results section. In the concluding section, we discuss the implications of our findings for future work.

2. Nonverbal communication of partisan conflict

Political scientists have developed an array of tools for characterizing political conflict, particularly along partisan lines. The two most prominent are, arguably, roll calls and parliamentary speeches. The former has been used predominantly in Congressional scholarship using DW-NOMINATE scores to study positional polarization between Republicans and Democrats (e.g., McCarty et al., 2016) but also in multiparty contexts such as the European Parliament (e.g., Høyland, 2010). Approaches based on parliamentary speech have documented rising partisan polarization reflected in word choice (Peterson and Spirling, 2018; Gentzkow et al., 2019). While communication-based measures of partisan polarization address the shortcomings of roll-call votes, existing measures have strictly relied on the verbal content of speech, i.e., the words used by partisan elites rather than the style in which they are used. Although a meaningful reduction from a methodological point of view, this narrow focus on verbal content limits our substantive understanding of the nature of partisan polarization in elite communication.

Although the focus on verbal content is substantively and analytically reasonable in many cases, it ignores a significant dimension of human communication. Most importantly, it strips away nonverbal elements of speech as an important marker of interpersonal conflict (Deutsch et al., 2011). Nonverbal speech includes aspects like intonation, volume, and accent, commonly referred to as paralinguistic cues (e.g., Scherer et al., 2003). This omission is remarkable given that the centrality of nonverbal communication in human and social interaction has been firmly established in thousands of linguistics and psychology studies. [Footnote 1] This literature shows that listeners rely on speakers’ vocal cues to make inferences about speakers’ emotional state, intentions, and character traits (Scherer et al., 1984, 2003; Zuckerman and Driver, 1989; Banse and Scherer, 1996; Owren and Bachorowski, 2007; Anderson et al., 2014; Laustsen et al., 2015). [Footnote 2]

We build on earlier work, including a substantial body of experimental evidence, that documents the distinct role played by nonverbal communication in shaping listeners’ perceptions and evaluations. In a validation study, Cochrane et al. (2022) show that human coders consistently infer sentiment (i.e., positive vs. negative) from text and audio clips, but arousal (i.e., how activated a speaker is) is not detected reliably from text, only from audio. Moving this fundamental insight to the political domain, a small number of studies examine how voters’ evaluations of candidate traits are affected by voice characteristics, with lower-pitched candidates being rated as more competent and receiving more votes than higher-pitched candidates (Touati, 1993; Klofstad et al., 2012; Tigue et al., 2012; Klofstad, 2016; Cinar and Kıbrıs, 2023). Recently, Taylor et al. (2023) developed a framework to study the causal effects of multimodal political data sources, such as campaign speeches. Using the framework, they show that vocal delivery matters for voters’ impressions and evaluations of political candidates, even when verbal expressions are held constant.

This experimental body of work mostly considers between-speaker differences in voice characteristics in a small set of audio recordings, but a burgeoning literature is studying within-speaker changes in nonverbal expressions using massive audio collections. For example, Dietrich, Hayes and O’Brien (2019) and Rittmann (2024) show that changes in legislators’ vocal pitch contain information about a legislator’s issue engagement. Another line of work finds that political candidates strategically shift their rhetorical style to align with the demands of their audiences by lowering and heightening their phonetic articulation of vowels (Neumann, 2019). Finally, nonverbal speech characteristics convey the attitudes of US Supreme Court Judges (Knox and Lucas, 2021) and their subsequent voting behavior (Dietrich, Enos and Sen, 2019).

2.1. Voicing partisan conflict

We use these diverse sets of literature as our point of departure to theorize how legislators use nonverbal communication to signal partisan conflict. We focus on a particular aspect of nonverbal communication: the vocal dimension. To be sure, nonverbal communication also involves non-vocal features such as facial expressions, gestures, and body language (Joo et al., 2019; Boussalis et al., 2021; Neumann et al., 2022). Still, vocal communication makes up a significant part of nonverbal communication (Patel and Scherer, 2013) and is particularly relevant in understanding inter-human conflict (Deutsch, 1973). [Footnote 3] We refer to the vocal dimension as nonverbal speech throughout and use nonverbal signaling to denote situations where nonverbal characteristics of a speech contain signals about partisan conflict.

We start from the simple observation that a higher voice is associated with conflict throughout social life. When having emotionally charged discussions with family members, debating politics with friends, or protesting in the streets, humans heighten their voices when they disagree. Intuitively, this happens since disagreement and conflict involve an emotional component that makes a speaker more emotionally activated, reflected in, e.g., a heightened voice. (We refer to ‘voice’ in an auditory sense [e.g., loudness and pitch] and not in the metaphorical sense of being heard or represented [e.g., Mansbridge, 1999].) In the following, we use the term ‘emotional arousal’ (or, interchangeably, simply ‘arousal’) to denote any perception of a speaker being more or less emotionally activated. We stress that ‘emotional’ in this context does not imply autonomic, i.e., arousal can potentially be employed strategically.

The association between a speaker’s emotional arousal, whether strategic or not, and nonverbal conflict signaling is likely to hold in the political domain as well. In the following, we focus on this link as it manifests in speeches given in parliamentary debates, which legislators use primarily to signal the positions of the parties on issues that are up for discussion (Proksch and Slapin, 2012; Bäck et al., 2021). Since most of the deliberations and negotiations over bills take place in committees and behind closed doors, the parliamentary debates generally serve to showcase policy positions and arguments publicly, and to highlight partisan differences (Laver et al., 2021), even when parties are not ideologically distinct (Kosmidis et al., 2019).

When should we expect the emotional arousal of a legislator to convey signals of partisan conflict? We focus on two theoretically plausible drivers of partisan conflict—polarization and policy disagreement—that may cause variation in a speaker’s emotional arousal and discuss and hypothesize how each likely manifests in nonverbal communication. Although legislators can be aroused for other reasons than partisan conflict, this motive is likely to dominate other drivers (e.g., issue engagement) in certain parliamentary speeches, such as dyadic exchanges. We use the term ‘dyadic exchanges’ to refer to speeches in which the speaker addresses a single party by name or an individual legislator from that party. To fix terminology, we call the party of the former the speaker party and the latter the target party.

First, we expect arousal to reflect prevailing patterns of partisan polarization. Across dyadic exchanges, interactions between highly polarized pairs of speakers, as indicated by their partisanship, should be more conflictual on average. To the extent that partisan polarization is a driver of emotional arousal in the context of dyadic exchanges, this should then be reflected in legislators’ nonverbal signals. In European multiparty systems, party competition is generally structured by multiparty blocs (Bale, 2003), which is also true of the Danish case (Kosiara-Pedersen and Kurrild-Klitgaard, 2018). As a result, we test the first hypothesis based on party blocs rather than individual parties, but the approach easily generalizes to non-bloc settings. [Footnote 4] Hence, we expect:

$H_1$: Emotional arousal is higher in speeches with outbloc target parties than in speeches with inbloc target parties.

Second, we consider policy disagreement as a source of conflict. To form this expectation, we turn to the coalition literature, which has shown a close connection between intra-coalition conflict and the fate of bills: Bills on which coalition parties disagree are introduced later to the agenda relative to when coalition parties are united (Martin and Vanberg, 2004), take longer to pass (Martin and Vanberg, 2011), and are subject to greater scrutiny by non-coalition partners (Wonka and Göbel, 2016; Fortunato et al., 2019; Behrens et al., 2024). Moreover, policy disagreement is known to be reflected in verbal expressions: The sentiment expressed by the opposition predicts whether the bills are passed unanimously (Proksch et al., 2019). While the government-opposition divide generally dominates legislative work (e.g., Hix and Noury, 2016), conflict should also arise at the level of each bill (Proksch et al., 2019). Specifically, regardless of the ideological distance and the government-opposition status of the parties, interactions between speakers who disagree on a bill should be more conflictual on average than interactions between speakers who agree. To the extent that policy conflict is a driver of emotional arousal in dyadic exchanges in parliamentary debates on bills, this should be reflected in legislators’ nonverbal signals. Hence, we expect:

$H_2$: Emotional arousal is higher in speech directed at target parties with whom the speaker party disagrees on a bill than in speech directed at target parties with whom it agrees.

At this point, it cannot be ruled out that support for $H_2$ is just a downstream consequence of the stable bloc polarization formulated in $H_1$. To rule out that a relationship with policy disagreement merely reflects pre-existing polarization between parties, we add a test of $H_2$ including dyad fixed effects, which control for differences in bloc affiliation and any other stable dyad-level differences in party alignment.
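
As a rough illustration of the dyad fixed-effects logic, one way to write such a specification is sketched here. The notation is an illustrative assumption rather than the exact estimating equation: $y_{i}$ is the standardized arousal of speech $i$, $D_{i}$ indicates that the speaker and target parties disagree on the bill under debate, $\alpha_{d(i)}$ is an intercept for the speaker-target party dyad of speech $i$, and $X_{i}$ collects any further covariates.

$$y_{i} = \beta D_{i} + \alpha_{d(i)} + X_{i}'\gamma + \varepsilon_{i}$$

Because $\alpha_{d(i)}$ absorbs everything that is constant within a party pair, including bloc affiliation, $\beta$ is identified only from variation in disagreement across bills within the same dyad.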

2.2. The strategic use of nonverbal signals

The first two hypotheses predict that legislators’ nonverbal conflict signals, in the form of heightened emotional arousal, systematically reflect partisan polarization and policy disagreement. Still, they leave open the question of whether this signaling is strategic. We now turn to this question, developing two hypotheses that test key observable implications of strategically used nonverbal conflict signals. We first explain why signaling conflict in parliamentary debate can exert pressure on a target party. We then develop specific expectations of which parties are strategically important to pressure and under which conditions.

To explain how nonverbal conflict signals can put pressure on target parties, we highlight the role of selective media uptake. News media are more likely to cover conflictual and emotional interactions (Schulz, 2007; Dietrich et al., 2018; Gennaro and Ash, 2023), and conflict often appears as a criterion in its own right in contemporary typologies of news selection criteria (Harcup and O’Neill, 2017). As a consequence, conflict signals increase the probability that an exchange in parliament will be picked up in news media. This heightened visibility can, in turn, put pressure on the target party to justify or potentially reconsider its position on the agenda item.

Crucially, legislators’ ability to exploit the media channel varies across debates. The legislative calendar features a subset of high-profile debates, most notably the opening and closing debates in each parliamentary term. These debates are characterized by full attendance, a focus on principled policy debate rather than lawmaking, and, most importantly, significantly increased media coverage. Given the heightened availability of the media channel, we expect legislators to concentrate expressions of conflict in these debates strategically:

$H_3$: Emotional arousal is higher in high-profile than low-profile debates.

Our third hypothesis corresponds closely to the primary hypothesis in Osnabrügge et al. (2021) that legislators strategically employ emotive language in high-profile debates where legislative rhetoric is more likely to reach voters. They find robust support for this expectation, showing that the verbal content of speech varies systematically across debate types in the UK House of Commons. Our third hypothesis articulates our expectation that the strategic language use identified by Osnabrügge et al. (2021) extends to the nonverbal dimension. To isolate the distinct role of the nonverbal signals, we test $H_3$ using covariates capturing the verbal use of emotive language.

Strategic use of nonverbal communication has implications not only for when legislators signal partisan polarization but also for which partisan outgroups are targeted. Whereas $H_3$ captures legislators’ vote-seeking motives (Mayhew, 1974), our next expectation captures policy-seeking motives in the context of parliamentary debate. Given policy-seeking motives, we expect that legislators focus on target parties with the greatest influence on policy-making. Specifically, we expect parties to concentrate conflict signals on target parties with greater bargaining leverage, since they are most likely to ultimately affect policy, through either the existing governing majority or an alternative majority. This leads to our fourth and final hypothesis:

$H_4$: Emotional arousal is positively associated with the target party’s bargaining leverage.

Following traditional conceptualizations (Shapley and Shubik, 1954), we understand bargaining leverage as a function of parties’ probability of participating in an alternative government, were one to form. We measure bargaining leverage using a recently developed approach (Kayser et al., 2023) (see ‘Data and Measures’ below).
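
For intuition about the traditional conceptualization, the Shapley-Shubik index can be computed by brute force: a party's power is the share of all orderings of parties in which it is pivotal, that is, in which its seats first bring the growing coalition to the required majority. The sketch below illustrates only that classic index, not the Kayser et al. (2023) measure used in the analysis; the seat counts, the quota, and the function name `shapley_shubik` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def shapley_shubik(seats, quota):
    """Return each party's Shapley-Shubik power index as an exact fraction.

    seats: dict mapping party name -> number of seats (hypothetical input).
    quota: number of seats needed for a majority.
    """
    parties = list(seats)
    pivots = dict.fromkeys(parties, 0)
    n_orders = 0
    for order in permutations(parties):
        n_orders += 1
        running_total = 0
        for party in order:
            running_total += seats[party]
            if running_total >= quota:
                # This party turns a losing coalition into a winning one.
                pivots[party] += 1
                break
    return {p: Fraction(pivots[p], n_orders) for p in parties}


# Toy three-party legislature with 100 seats and a 51-seat majority threshold.
index = shapley_shubik({"A": 50, "B": 49, "C": 1}, quota=51)
print(index)
```

In this toy legislature the one-seat party C is pivotal as often as the 49-seat party B, which illustrates why leverage can diverge sharply from seat share. Enumerating all orderings is only feasible for a small number of parties.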

Importantly, the prediction in $H_4$ cuts against what a theory of non-strategic nonverbal communication would predict. If nonverbal communication is primarily or entirely a non-strategic, affective reflex, nonverbal signals of conflict should be greater in speeches directed at more extreme targets, which are more likely to arouse anger (Webster, 2021). In Supplementary material, Appendix A, we show that bargaining leverage is higher for mainstream parties and lower for challenger parties at the ideological extremes. Hence, if nonverbal signaling of conflict were primarily an affective reaction to extreme parties, we should expect the opposite of the predicted association. Hypothesis 4 therefore aims to discriminate between the predictions of strategic vs. non-strategic theories of nonverbal communication.

We stress that these hypotheses are implicitly causal, i.e., they reflect our theoretical understanding of nonverbal communication as a reflection of partisan conflict. In some cases, the temporal order of variables rules out reverse causation (e.g., party affiliation temporally precedes nonverbal communication), but confounding remains a concern. As discussed below, we introduce a rich set of covariates to address confounding concerns. That said, causal identification is ultimately limited by the fact that policy conflict is not randomly assigned.

3. How partisan conflict is reflected in audio data

Analyzing how partisan conflict is signaled in politicians’ nonverbal communication is challenging. First, whereas verbal measures such as negativity or scaling estimates can be derived from speech transcripts, nonverbal features of speeches are generally stripped away in transcription. Consequently, text-only transcripts are often ill-suited to study nonverbal dimensions of speech. Second, text-based measures of nonverbal communication rely on the coding procedures of data sources. For instance, Imre et al. (2023) develop a novel measure of ‘coalition mood’ based on applause patterns in parliamentary debates between coalition partners in Germany and Austria. While valuable, this measure depends on the availability of stenographic protocols marking nonverbal communication in party-to-party interactions.

To deal with these shortcomings, we turn to audio recordings of political speeches. Although recordings are widely and publicly available across many political institutions, they have received only scant attention from political scientists (though see Dietrich, Hayes and O’Brien, 2019; Neumann, 2019; Knox and Lucas, 2021; Rittmann, 2024). In addition to conveying the verbal content of speech (i.e., spoken words), audio contains information that goes beyond what we can infer from words alone, and most importantly, it conveys information on the emotional arousal of a speaker (Cochrane et al., 2022).

3.1. How pitch reflects emotional arousal

Our indicator of emotional arousal is based on changes in a speaker’s vocal pitch, the perceptual analog of the fundamental frequency ($F0$) of a waveform (Rabiner and Schafer, 2010). [Footnote 5] The perception of the vocal pitch increases monotonically, but not linearly, with $F0$ such that a voice with a higher $F0$ is perceived as higher and vice versa. Each speaker has a baseline $F0$ which is largely explained by biological and physiological factors such as sex and height (Evans et al., 2006; Pisanski et al., 2016). However, a rich psychological literature on the vocal expression of emotions shows that variation in pitch is a robust and strong indicator of expressed and perceived emotional arousal, independently of the verbal content (Banse and Scherer, 1996; Scherer et al., 2003; Bänziger and Scherer, 2005). [Footnote 6]
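
To make the notion of fundamental frequency concrete, the following minimal sketch estimates $F0$ for a short frame by locating the lag at which the frame's autocorrelation peaks. It is an illustration under simplifying assumptions (a clean synthetic tone, a fixed 75 to 400 Hz search range, and a hypothetical function name `estimate_f0`) and is not the pitch-extraction pipeline used in this paper.

```python
import math


def estimate_f0(frame, sample_rate, f_min=75.0, f_max=400.0):
    """Estimate F0 (in Hz) of one audio frame from its autocorrelation peak.

    The search range f_min..f_max is an assumed range for speech, chosen
    here only for illustration.
    """
    min_lag = int(sample_rate / f_max)  # shortest candidate period, in samples
    max_lag = int(sample_rate / f_min)  # longest candidate period, in samples
    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame with itself at this lag.
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Synthetic 200 Hz tone: 400 samples (50 ms) at an 8 kHz sampling rate.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(400)]
f0 = estimate_f0(tone, sample_rate)
print(f0)
```

Real speech is noisier than a pure tone, so practical pitch trackers add steps such as voicing detection and smoothing across frames that this sketch omits.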

Politicians also use other nonverbal signals such as loudness, speech rate, or jeering to convey partisan and policy disagreement. We focus on the vocal pitch for three reasons. First, as already outlined, vocal pitch is a robust indicator of emotional arousal, which we expect to be higher in conflictual and polarizing contexts. Second, vocal pitch is shown to predict legislators’ issue engagement and policy priorities in both presidential and parliamentary democracies (Dietrich, Hayes and O’Brien, 2019; Rittmann, 2024) and judges’ vote intentions (Dietrich, Enos and Sen, 2019). Third, pitch estimation is less sensitive to recording quality than features such as loudness since it depends less on the spectral characteristics of a sound (Vainio et al., 2023). This is particularly important when analyzing nonverbal features of speech over time.

3.2. Conflict-driven arousal in dyadic exchanges

To be sure, variation in pitch can reflect other motivations than partisan conflict. To mitigate this issue, we consider only a type of interaction in which partisan conflict motives are likely to dominate. Specifically, we consider dyadic exchanges in parliamentary proceedings, where higher emotional arousal expressed in a speech is more likely to indicate partisan conflict than, for example, issue engagement due to the nature of this type of interaction. The partisan nature of such interactions makes them a prime avenue for the expression of partisan conflict. Hence, when legislators target out-partisans in a dyadic exchange, a heightening pitch is, on average, more indicative of partisan conflict in that specific context. Conversely, when legislators heighten their pitch when mentioning their social groups, this is likely more indicative of issue engagement and group commitments (e.g., Dietrich, Hayes and O’Brien, 2019).

3.3. Other drivers of emotional arousal

While we expect conflict motives to dominate variation in arousal in dyadic exchanges, partisan conflict is not the only potential driver of arousal. We illustrate our measurement model in Figure 1 in the form of a directed acyclic graph (DAG).

Figure 1. Directed acyclic graph illustrating documented and hypothesized links between pitch, arousal, conflict, and other causes. The link from partisan conflict is highlighted to illustrate that we expect this motive to dominate in the context of dyadic exchanges.

We rely on theory and evidence establishing pitch as a measure of emotional arousal (link [a]), and we expect arousal to be heightened under conditions of partisan conflict (link [c]). At the same time, prior work has found vocal pitch to be informative of other motivations. Most pertinently, Dietrich, Hayes and O’Brien (2019) and Rittmann (2024) use pitch as an indicator of issue emphasis in parliamentary speech. We illustrate this established link as [b] in Figure 1.

Other work uses pitch as an indicator for other types of motivations, namely voting intentions (Dietrich, Enos and Sen, 2019), and inter-party politics (Arnold and Küpfer, 2024). For simplicity, we represent these and other potential drivers of heightened arousal as link [d] in Figure 1. While we theorize that legislators use arousal as a conflict signal in strategic ways (see the presentations of hypotheses 3 and 4), we note that this measurement model in itself does not hinge on that point, i.e., arousal can signal conflict for either strategic or non-strategic reasons.

3.4. Standardization of vocal pitch

We follow extant work and use a standardized measure at the speech level where the pitch is converted to standard deviations above or below a speaker’s average (Dietrich, Hayes and O’Brien, 2019; Rittmann, 2024). This is done to parse out heterogeneity arising from speaker-specific voice differences such as physiological and biological factors, akin to the reason for including unit fixed effects in dealing with panel data (Rheault and Borwein, 2019) and to take possible measurement error into account (Dietrich, Enos and Sen, 2019). As a consequence, the estimates reflect within-speaker changes in pitch. We compute the baseline of each speaker as the average pitch in all speeches given by a speaker in our corpus (see ‘Data and Measures’). Further information about our pitch measure and the implication of standardization is shown in Supplementary material, Appendix B. [Footnote 7]
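
The standardization step can be sketched as follows. The example assumes each speech has already been reduced to a single mean pitch value in Hz, and it uses the population standard deviation across a speaker's speeches as the scale; both choices, like the function name `standardize_pitch` and the toy values, are illustrative assumptions rather than details taken from the paper.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_pitch(speeches):
    """Convert speech-level pitch to z-scores relative to each speaker's baseline.

    speeches: list of (speaker_id, mean_f0_hz) tuples, one per speech.
    Returns a list of z-scores in the same order as the input.
    """
    by_speaker = defaultdict(list)
    for speaker, f0 in speeches:
        by_speaker[speaker].append(f0)

    # Baseline = the speaker's average pitch over all of their speeches.
    baseline = {s: (mean(v), pstdev(v)) for s, v in by_speaker.items()}

    z_scores = []
    for speaker, f0 in speeches:
        mu, sd = baseline[speaker]
        # Guard against speakers with a single speech or no variation.
        z_scores.append((f0 - mu) / sd if sd > 0 else 0.0)
    return z_scores


# Toy data: one low-pitched speaker (A) and one high-pitched speaker (B).
speeches = [("A", 110.0), ("A", 120.0), ("A", 130.0), ("B", 200.0), ("B", 220.0)]
z = standardize_pitch(speeches)
print(z)
```

Because each speaker is centered on their own mean, a naturally high-pitched and a naturally low-pitched speaker both score zero when speaking at their usual level, so only within-speaker deviations remain.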

Yet, standardization is not without drawbacks. By removing speaker heterogeneity, we pay the cost of not being able to examine the role of stable speaker-level differences in vocal pitch. Most obviously, the gender gap in vocal pitch suggests that men and women face very different constraints and role expectations regarding vocal style (Boussalis et al., Reference Boussalis, Coan, Holman and Müller2021). In Supplementary material, Appendix C, we show that we find no appreciable effect heterogeneities with respect to gender. The question of individual, including gendered, differences is revisited in the concluding section.
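The speaker-level standardization described in this subsection can be illustrated with a minimal sketch. It assumes that each speech is summarized by a single mean pitch value in Hz and that the scale is the speaker's own (population) standard deviation; the function name, the tuple layout, and the toy values are hypothetical and are not taken from the replication material.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_by_speaker(speeches):
    """Return speech-level pitch as z-scores relative to each speaker's own baseline.

    `speeches` is a list of (speaker_id, mean_pitch_hz) tuples (hypothetical layout).
    """
    by_speaker = defaultdict(list)
    for speaker, pitch in speeches:
        by_speaker[speaker].append(pitch)

    # Baseline per speaker: average and spread over all of that speaker's speeches.
    # Assumption: population SD; a sample SD would be an equally defensible choice.
    stats = {s: (mean(v), pstdev(v)) for s, v in by_speaker.items()}

    z_scores = []
    for speaker, pitch in speeches:
        mu, sigma = stats[speaker]
        # A speaker with one speech (or no variation) has no usable scale.
        z_scores.append((pitch - mu) / sigma if sigma > 0 else float("nan"))
    return z_scores


# Toy data: speaker "A" has a low-pitched voice, speaker "B" a high-pitched one.
toy = [("A", 100.0), ("A", 110.0), ("A", 120.0),
       ("B", 200.0), ("B", 210.0), ("B", 220.0)]
z = standardize_by_speaker(toy)
```

In the toy data the two speakers differ by 100 Hz in baseline, yet their standardized values are identical, which is the sense in which standardization removes stable between-speaker differences and leaves only within-speaker variation.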

3.5. Validation

The link between vocal pitch and emotional arousal has been extensively validated in the psychological literature (see e.g., Supplementary material of Dietrich et al., Reference Dietrich, Hayes and O’Brien2019), but not explicitly in the context of parliamentary speeches and as an expression of partisan conflict. To establish that vocal pitch is a valid measure of arousal in such a setting, we conduct three validation exercises and an analysis of the measurement error, which we present in additional detail in Supplementary material, Appendix D. To summarize the results of the validation analysis, we show that (1) coders are able to consistently and reliably infer a legislator’s emotional arousal from speech-level audio recordings, (2) speaker-standardized speech-level estimates of vocal pitch are strongly correlated with the manual arousal codings, and (3) pitch is negatively associated with text sentiment, consistent with the assumption that variation in arousal in the context of dyadic exchanges reflects conflict.

4. Data and measures

We validate our approach in the context of parliamentary debates in the Danish parliament, the Folketing. Audio recordings are available for a vast number of political institutions (e.g., Barari and Šimko, Reference Barari and Šimko2023), but legislatures, in particular, maintain comprehensive archives with more than a thousand hours of recordings. We focus on Denmark since it is possible to obtain digitized recordings spanning more than two decades of debate, longer than any other archive to the best of our knowledge.

4.1. Text-audio corpus and alignment

We collect all available recordings of plenary sessions in the Danish parliament from October 2000 to September 2022 covering six national elections (2001, 2005, 2007, 2011, 2015, and 2019), 28 parliamentary terms, and a total of 2,186 debates containing $850,357$ speeches. To obtain transcripts of the recordings, we rely on a combination of ParlSpeech V2 and manually scraped XML files. We follow extant work using parliamentary speeches (e.g., Dietrich et al., Reference Dietrich, Hayes and O’Brien2019; Castanho Silva et al., Reference Castanho Silva, Pullan and Wäckerle2025) and remove shorter speeches from the corpus. Shorter speeches are typically either procedural or interjections and interruptions that carry little substantive information for the study of partisan and policy conflict. For the main analysis, we retain speeches of $40$ or more words, but we show in Supplementary material, Appendix E that the results are numerically nearly identical regardless of the choice of threshold.Footnote 8 This leaves us with a total of $393,264$ transcribed speeches. As the next step, we align the transcripts with the corresponding audio using the Python library speechannote. We were able to align text and audio for $96.3$ pct. of our speeches (a total of $378,566$). We elaborate on how we construct and preprocess our data in Supplementary material, Appendix F.
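As a rough illustration of the length filter and the alignment share reported above, the following sketch assumes that words are counted as whitespace-separated tokens; the tokenization rule and the helper name `keep_speech` are assumptions rather than details taken from the preprocessing pipeline.

```python
MIN_WORDS = 40  # threshold used in the main analysis


def keep_speech(text, min_words=MIN_WORDS):
    """True if a speech has at least `min_words` whitespace-separated tokens."""
    return len(text.split()) >= min_words


toy_speeches = [
    "Thank you, Mr. Speaker.",   # short procedural remark, dropped
    " ".join(["word"] * 40),      # exactly 40 words, kept
    " ".join(["word"] * 39),      # 39 words, dropped
]
kept = [s for s in toy_speeches if keep_speech(s)]

# Share of retained speeches that were aligned with audio, using the counts in the text.
aligned_share = 378_566 / 393_264
print(f"{aligned_share:.1%}")
```

The final line reproduces the reported alignment rate from the two speech counts.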

4.2. Legislative votes

For our second hypothesis, we measure policy conflict based on disagreement in legislative voting. We obtain voting records from an enhanced version of ParlSpeech V2 (Rauh and Schwalbach, Reference Rauh and Schwalbach2020) where speeches are linked to legislation. This data is available from November 2007 to September 2022. As a final step, we match each speech containing voting records with our preprocessed transcript using fuzzy string matching based on the Jaro–Winkler distance. We were able to match $80$ pct. of the speeches in the enhanced ParlSpeech V2 to our transcript.
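The text does not report which implementation or cutoff was used for the fuzzy matching step. The following is a minimal pure-Python sketch of the standard Jaro–Winkler similarity, with the distance taken as one minus the similarity; the helper `best_match` and its `max_distance` cutoff are hypothetical and serve only to show how a speech could be linked to its closest transcript.

```python
def jaro(s1, s2):
    """Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    """Jaro similarity with the Winkler bonus for a shared prefix."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


def best_match(query, candidates, max_distance=0.15):
    """Closest candidate by Jaro-Winkler distance, or None (hypothetical cutoff)."""
    if not candidates:
        return None
    distance, candidate = min((1.0 - jaro_winkler(query, c), c) for c in candidates)
    return candidate if distance <= max_distance else None
```

For the textbook pair `"MARTHA"` and `"MARHTA"`, the similarity is about 0.961, so `best_match("MARTHA", ["MARHTA", "JONES"])` returns `"MARHTA"`. In practice the cutoff trades off unmatched speeches against false matches.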

4.3. Party dyads

Three of our four hypotheses regard dyadic exchanges in parliament, i.e., where the speaking legislator addresses a party either by name or in the form of an individual legislator from that party. To identify dyadic speeches, we rely on a dictionary approach containing the names of all parties and legislators (Schwalbach, Reference Schwalbach2023). To be classified as a dyad, the speech must reference one party or one or more of its MPs. We refer to such dyadic exchanges as party dyads (or, interchangeably, simply ‘dyads’). We identify dyads in $38.9$ pct. of all aligned speeches. Supplementary material, Appendix G shows how the share of party dyads varies over time, and how this relates to the development of inbloc and outbloc dyads.Footnote 9
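A dictionary-based dyad classifier of the kind described above might look as follows. This is a sketch under stated assumptions: the party labels and MP names are placeholders, matching is case-insensitive on whole terms, and a speech counts as a dyad only if exactly one party is referenced. The actual dictionary and matching rules are not reported in this section.

```python
import re

# Hypothetical dictionary: party label -> terms that count as a reference to it.
PARTY_TERMS = {
    "Party A": ["Party A", "Ms. Jensen"],
    "Party B": ["Party B", "Mr. Nielsen", "Ms. Hansen"],
}


def _compile(terms):
    # Longest terms first so longer names win over their own prefixes.
    pattern = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    # Lookarounds keep "Party A" from matching inside "Party Alliance".
    return re.compile(rf"(?<!\w)(?:{pattern})(?!\w)", flags=re.IGNORECASE)


PATTERNS = {party: _compile(terms) for party, terms in PARTY_TERMS.items()}


def dyad_target(text):
    """Return the single referenced party, or None if zero or several are referenced."""
    referenced = [party for party, rx in PATTERNS.items() if rx.search(text)]
    return referenced[0] if len(referenced) == 1 else None
```

For example, `dyad_target("I would like to ask Mr. Nielsen about the budget.")` returns `"Party B"`, whereas a speech naming both parties, or neither, returns `None`.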

We use the party dyads to define our main predictors in $H_1$ and $H_2$. For $H_1$, we use a binary measure based on a party’s bloc affiliation. The Danish party system is characterized by two blocs, one denoted as left and the other as right. While each bloc contains substantial party differences, the bloc division has typically dominated party competition at the national level (Kosiara-Pedersen and Kurrild-Klitgaard, Reference Kosiara-Pedersen and Kurrild-Klitgaard2018). For $H_2$, we also use a binary measure based on whether two parties voted together. Importantly, this is conceptually and empirically distinct from $H_1$ as parties vote across their bloc affiliation.Footnote 10 While the share of votes where parties agree is larger within blocs, parties vote across blocs on a substantial share of occasions (approx. $39$ pct.).

4.4. High-profile debates

Our third hypothesis $H_3$ predicts that legislators signal higher emotional arousal in debates that generate citizen and media attention. This prediction cuts across both dyadic and non-dyadic exchanges. While most debates are low-profile with a principal focus on law-making, the opening and closing debates of parliamentary terms stand out. These debates last an entire day, often over 12 hours, and cover programmatic policy differences rather than specific legislation. Because of their formal status and ideologically charged content, opening and closing debates receive considerable attention from voters, either directly or indirectly through the extensive media coverage of the debates (Osnabrügge et al., Reference Osnabrügge, Hobolt and Rodon2021). Hence, we define opening and closing debates as high-profile debates here and use the remaining set of debates as the reference category. We collect the dates of these debates, match them to our dataset, and finally generate an indicator of whether a speech is given in a low- or high-profile debate (a total of 6,116 party dyads).

4.5. Target bargaining leverage

Our fourth hypothesis $H_4$ predicts that legislators use heightened emotional arousal to put pressure on parties with higher bargaining leverage. We measure this using a recent approach introduced in Kayser et al. (Reference Kayser, Orlowski and Rehmert2023). Briefly put, the authors start from the premise that parties’ leverage in the policy process ultimately arises from credible threats to leave the government or the ability to join an alternative government. Building on this notion, Kayser et al. (Reference Kayser, Orlowski and Rehmert2023) calculate coalition inclusion probabilities (CIPs) using a predictive model with data on historical coalition patterns, party types and ideologies, election results, public opinion polls, and country-level institutional features as inputs. We use CIPs for Danish parties, which Kayser et al. (Reference Kayser, Orlowski and Rehmert2023) provide at the monthly level starting in 1970. We use the static version of the CIP data, which does not rely on information from between-election opinion polls. In Supplementary material, Appendix A, we present the CIPs for each party from 2000 to 2019.

4.6. Verbal covariates

To rule out that any observed relationship arises due to a strong correlation between verbal and nonverbal speech features, we generate three text-based measures. One reasonable concern is that our nonverbal measure of partisan conflict is encoded in speech sentiment. To account for this possibility, we first define a measure of sentiment capturing the relative use of positive and negative words in a speech using the Danish sentiment tool Sentida (Lauridsen et al., Reference Lauridsen, Dalsgaard and Svendsen2019). A separate concern is that our measure merely tracks emotive rhetoric (Osnabrügge et al., Reference Osnabrügge, Hobolt and Rodon2021). To address this, we follow the approach suggested by Gennaro and Ash (2022) and construct a continuous measure of emotionality at the speech level as our second measure. A final concern is that variation in our nonverbal measure is explained entirely by the topics discussed in party dyads. If polarizing and conflictual policy topics are discussed more in party dyads with an out-party target than in-party targets, an observed relationship could reflect topic selection rather than partisan conflict. If so, this suggests that issue engagement is a confounder. To account for this, we estimate a Structural Topic Model (Roberts et al., Reference Roberts, Stewart, Tingley, Lucas, Leder-Luis, Gadarian, Albertson and Rand2014) with $k=40$ and include the resulting topics as fixed effects. For details on how each verbal covariate is constructed, see Supplementary material, Appendix H.

5. Results

We now present results from tests of each of the four hypotheses. To simplify exposition, we first jointly present the results for $H_1$ and $H_2$, then results for $H_3$ and $H_4$. For each hypothesis, we estimate a series of linear regression models using the OLS estimator. For $H_1$, $H_2$, and $H_4$, the data consists only of speeches classified as party dyads. The dependent variable in all models is the speaker-standardized speech-level vocal pitch during a speech $i$ by each legislator $j$. We refer to this simply as ‘pitch’ or ‘vocal pitch.’ All regression tables are in Supplementary material, Appendix I.
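The family of specifications described above can be written compactly as follows. The notation ($\alpha$, $\beta$, $\boldsymbol{\gamma}$, $D_{ij}$, $\mathbf{x}_{ij}$) is introduced here for illustration only and is an assumption about how the models are parameterized.

```latex
\text{Pitch}_{ij} = \alpha + \beta\, D_{ij} + \mathbf{x}_{ij}^{\top}\boldsymbol{\gamma} + \varepsilon_{ij}
```

Here $\text{Pitch}_{ij}$ is the speaker-standardized pitch of speech $i$ by legislator $j$, and $D_{ij}$ is the hypothesis-specific predictor: an outbloc target for $H_1$, a target party that voted differently for $H_2$, a high-profile debate for $H_3$, and the target party's bargaining leverage for $H_4$. The vector $\mathbf{x}_{ij}$ collects the verbal covariates (sentiment, emotionality, and topic indicators) and is empty in the bivariate specification, so $\beta$ is the coefficient reported in the figures.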

5.1. Hypotheses 1 and 2: polarization and legislative voting

We present results for $H_1$ and $H_2$ in Figure 2. Panel 2a tests our first hypothesis predicting that legislators speak with a higher pitch in speeches between polarized pairs of legislators. We test this using an indicator of whether a speech is directed at an outbloc or inbloc legislator. The outbloc measure is a binary indicator that takes the value of $1$ if the target and speaker blocs differ and $0$ otherwise.
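To make the outbloc indicator and the bivariate specification concrete, the sketch below builds the indicator from a hypothetical bloc assignment and computes the OLS slope by hand. Party labels and pitch values are invented for illustration, and the sketch leaves out the dyad-clustered standard errors and the verbal covariates used in the actual models.

```python
from statistics import mean

# Hypothetical bloc assignment; labels are placeholders, not the Danish parties.
BLOC = {"Party A": "left", "Party B": "left", "Party C": "right"}


def outbloc(speaker_party, target_party):
    """1 if speaker and target belong to different blocs, 0 otherwise."""
    return int(BLOC[speaker_party] != BLOC[target_party])


def ols_slope(x, y):
    """Bivariate OLS slope: cov(x, y) / var(x)."""
    x_bar, y_bar = mean(x), mean(y)
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den = sum((xi - x_bar) ** 2 for xi in x)
    return num / den


# Toy dyadic speeches: (speaker party, target party, standardized pitch).
toy = [
    ("Party A", "Party B", -0.2),
    ("Party A", "Party C", 0.1),
    ("Party C", "Party A", 0.3),
    ("Party B", "Party A", 0.0),
]
x = [outbloc(s, t) for s, t, _ in toy]
y = [pitch for _, _, pitch in toy]
beta = ols_slope(x, y)

# With a binary predictor, the slope equals the difference in group means.
diff_in_means = (mean(p for xi, p in zip(x, y) if xi == 1)
                 - mean(p for xi, p in zip(x, y) if xi == 0))
```

Because the predictor is binary, the bivariate coefficient is simply the average pitch in outbloc speeches minus the average pitch in inbloc speeches, which is how the estimates in Panel 2a can be read.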

Figure 2. Coefficients for partisan polarization (left panel) and policy conflict (right panel) with standardized pitch as the outcome. Predictors are whether the target party is outbloc (left panel) and whether the target party voted differently in a legislative vote on a specific bill (right panel). Standard errors are clustered at the dyad level (speaker party $\leftrightarrow$ target party). Thick and thin error bars are the model-specific 90 and 95 pct. confidence intervals respectively. $Y$-axes are held fixed across the two panels to maximize comparability. (a) Partisan polarization and (b) policy conflict.

Across all five models, we estimate a positive and significant ( $p \lt 0.001$) coefficient. On average, the estimated effect size indicates that legislators speak with a pitch roughly $0.15$ standard deviations higher when talking to outbloc legislators than when talking to inbloc legislators. The relationship is robust to the inclusion of covariates capturing text sentiment, emotionality, or speech topic fixed effects and is only slightly reduced when all three text measures are included. While sentiment and emotionality have virtually no impact on the estimate, speech topic fixed effects result in a minor decrease, but the estimate remains highly significant ( $p \lt 0.001$).

Turning to Panel 2(b), we test whether changes in pitch also convey signals of partisan conflict when the conflict concerns policy disagreement and not general polarization. To do this, our second hypothesis turns the focus to legislative bills, where we predict that nonverbal signals also convey a party’s bill-level disagreement with other parties. If so, legislators should speak with a heightened pitch when addressing legislators from parties with whom they vote differently.

As for the first hypothesis, we find a positive and significant ( $p \lt 0.001$) coefficient across all five models. The relationship is almost invariant to the inclusion of verbal covariates across the board. Once again, sentiment and emotionality have no impact on the result, but the inclusion of speech topic fixed effects slightly reduces the estimate. Yet as for $H_1$, it remains highly significant ( $p \lt 0.001$). In the full model, the estimated coefficient ( $0.2$) is roughly twice the size of the corresponding coefficient for $H_1$. Together, the results strongly indicate that legislators use nonverbal communication to signal partisan and policy conflict and that it accounts for a distinct dimension of elite partisan polarization compared to what is captured by verbal expressions alone. That the coefficients for policy disagreement are roughly twice the size of those for bloc polarization suggests that nonverbal signals of partisan conflict primarily reflect disagreement over policy, at least in the parliamentary arena.

To verify that the results in Panel 2(b) do not simply reflect bloc differences, we also test $H_2$ using models including dyad fixed effects. We present these models in Table J1 in Supplementary material, Appendix J. The coefficient for policy disagreement is robustly significant across these models and retains the entire magnitude ( $0.2025$ compared to $0.2021$) of the coefficients shown in Panel 2(b). Hence, even considering only variation within the same party dyads, nonverbal conflict signals strongly predict policy disagreement. As an additional robustness check, we use legislative vote margin to measure policy disagreement. In Supplementary material, Appendix J, we show that results for $H_2$ are robust to using this alternative measure. Lastly, to explore the potentially confounding role of time-specific factors, we show results by year and weekday in Supplementary material, Appendix J, finding no significant heterogeneity with respect to time.

5.2. Hypotheses 3 and 4: Debate type and target bargaining leverage

We now turn to our third and fourth hypotheses, which are observable implications of a strategic model of nonverbal communication. We use the same specifications with the only change being the predictor in each test. The results are in Figure 3.

Figure 3. Coefficients for debate type (left panel) and bargaining leverage (right panel) with standardized pitch as the outcome. The predictors are whether the speech is given in a high- compared to a low-profile debate (left panel) and the policy bargaining leverage of the target party (right panel). Standard errors are clustered at the dyad level (speaker party $\leftrightarrow$ target party). Thick and thin error bars are the model-specific 90 and 95 pct. confidence intervals respectively. $Y$-axes are held fixed across the two panels to maximize comparability. (a) Electoral: high- vs. low-profile debates. (b) Policy: target party bargaining leverage.

Panel (a) reports the results for our third hypothesis, predicting that legislators signal more partisan conflict in debates that attract substantial attention from the general public. We test this by regressing vocal pitch on an indicator of whether the debate is high-profile ( $=1$) or low-profile ( $=0$) across both dyadic and non-dyadic exchanges. The estimates reported are the coefficients on this indicator, which capture the average difference between high- and low-profile debates. Consistent with the reasoning in Osnabrügge et al. (Reference Osnabrügge, Hobolt and Rodon2021), we expect a positive estimate if legislators use vocal style strategically. Across the five models, this is also what we find. The coefficient is large, positive, strongly statistically significant ( $p \lt 0.001$), and largely invariant to the inclusion of verbal covariates. Once again, the estimate is invariant to sentiment and emotionality but is slightly smaller when topics are included in the model. These results also mirror the findings in Osnabrügge et al. (Reference Osnabrügge, Hobolt and Rodon2021), where the relationship is only slightly reduced. In the full model, legislators speak with a $0.51$ standard deviation higher pitch in high-profile debates than in low-profile debates. We interpret this as indicative of a strategic model of nonverbal communication as high-profile debates reach a much larger electoral audience.

Turning to Panel (b) and our fourth and final hypothesis, we find that nonverbal signaling of conflict, as indicated by changes in pitch, correlates positively with a party’s bargaining leverage. In the bivariate specification, moving across the full range of bargaining leverage is associated with the standardized pitch rising by approximately $0.168$ standard deviations. As for the other hypotheses, the relationship is highly robust to including controls for verbal content, and the coefficient in the specification with the full set of controls remains virtually unchanged. In contrast to Hypotheses 1–3, topics have no impact on the relationship. Together, the results for $H_3$ and $H_4$ are consistent with a strategic model of nonverbal signaling of conflict, suggesting that legislators deliberately use nonverbal communication to further their electoral or policy goals. In substantive terms, the results across the four hypotheses show that nonverbal communication captures a distinct dimension of partisan and policy conflict, and indicate that legislators use nonverbal communication strategically.

All the estimates above are expressed in standard deviations. To contextualize our results, our baseline estimates across tests correspond to around 9 percent (for the smallest observed differences) to around 33 percent (for the largest observed differences) of the difference in pitch between men and women in our sample. All baseline estimates exceed the estimated ‘just noticeable difference’ in pitch of around 5 Hz (Liu, Reference Liu2013), i.e., the minimal difference that is discernible to the human ear. In other words, the differences we observe are noticeable, and for the largest estimates amount to around a third of the range in pitch between the average man and woman.

6. Conclusion and discussion

Elite partisan conflict is an important feature of democratic systems, yet our understanding of how elites communicate partisan conflict to citizens is limited, partly because existing approaches only consider the verbal content of communication. To expand the study of partisan conflict in elite communication to the nonverbal domain, we examine how elite partisan conflict is reflected in legislators’ nonverbal signals. Analyzing audio data from two decades of debates in the Danish parliament, we find that partisan conflict is systematically reflected in a legislator’s nonverbal speech signals and that these signals predict subsequent legislative voting. Furthermore, we find evidence consistent with a strategic use of nonverbal communication: Legislators react more strongly to outbloc targets in high-profile debates and when addressing parties with greater bargaining leverage. Importantly, these associations remain largely unchanged when we account for the verbal content of speech, which strongly suggests that nonverbal communication accounts for a distinct dimension of elite communication of partisan conflict.

Some caveats are in order. One set of caveats pertains to our measurement approaches. Our outcome, speaker-standardized vocal pitch, is a somewhat crude measure that may not capture every aspect of nonverbal signaling of conflict. We expect that using a richer set of audio features can provide a more nuanced and fine-grained measure of nonverbal signaling and increase the precision of the measure. Relatedly, conflict signals in a speech are in practice likely to be concentrated in particular sentences or even single words. Our choice to average pitch across the entire speech effectively glosses over this variation. Future research could improve on this aggregation problem, for example by making use of novel methods for multimodal alignment which enable linkage between text and audio data at the word level (Arnold and Küpfer, Reference Arnold and Küpfer2024).

A second set relates to our measures of the verbal content of speech. We have employed a rich set of text covariates, but if these measures do not fully capture the relevant dimension in speech verbal content, our control strategy correspondingly fails to fully account for the role of the verbal content of speech. We see the development of approaches to more directly compare the role of nonverbal communication to that of verbal content as an important avenue for future research.

A third set of caveats relates to external validity. The evidence presented here comes from the Danish parliamentary system, which is characterized by, among other features, high party cohesion, relatively high party system fractionalization, and a low level of partisan polarization. Without evidence from other contexts, it is uncertain how well these findings travel to party systems with other characteristics. This is also true of previous work on audio in politics, which mainly relies on the United States as a case (although see Rittmann, Reference Rittmann2024). On this front, we see promise in the largely non-language-specific nature of nonverbal communication. Whereas text data approaches require a cross-language approach to extend to other contexts, a given measurement strategy based on audio features alone could in principle be directly applied in novel contexts without accounting for language changes (Scherer et al., Reference Scherer, Banse and Wallbott2001) at least within non-tonal communities. Like Danish, the majority of Indo-European languages are non-tonal, meaning that variation in intonation does not change the meaning of words.

These caveats notwithstanding, our findings hold important implications for the study of political communication and representation. First, our findings expand the set of known features by which elites communicate partisan conflict to citizens. Specifically, we expand this set to the nonverbal domain, studied through legislators’ vocal pitch. This in turn implies that any assessment of partisan polarization among elites based on verbal content alone is incomplete: Even in the hypothetical absence of conflict in the verbal content of speech, elites may still signal conflict by nonverbal means. As a consequence, efforts to encourage elites to engage in more civil, bipartisan, and conciliatory behavior should consider both verbal and nonverbal dimensions of communication.

Second, the indication in our findings that vocal style is employed strategically adds nuance to our understanding of the intentionality of nonverbal communication. Consider for example Dietrich et al. (Reference Dietrich, Enos and Sen2019), who work from the assumption that changes in vocal inflections are “beyond their conscious communication” (238). Consistent with this assumption, the authors find that vocal pitch in US Supreme Court Justices’ oral arguments predicts voting over and above what can be predicted from other observables. Comparing our empirical setting to that in Dietrich et al. (Reference Dietrich, Enos and Sen2019), institutional constraints are likely an important moderator. Most obviously, US Supreme Court Justices are lifetime appointees and thus by institutional design immune from reelection incentives. In contrast, we study a parliamentary setting where reelection motives loom large. Our findings indicate that in a competitive parliamentary setting, nonverbal communication is not beyond the realm of conscious communication. Regardless of the role of these plausible institutional moderators, our findings imply that nonverbal communication cannot be assumed to be an ‘honest’ window into the speakers’ true emotional state and that such an assumption would have to be justified in any specific application (see also Rittmann, Reference Rittmann2024).

Third, our findings have methodological implications for experimental political science. At present, survey experimental treatments in political science overwhelmingly consist of text vignettes designed to convey the stimulus of interest (though see Taylor et al., Reference Taylor, Dean and Christopher2023). This treatment mode dominates even though the nonverbal dimension of political communication carries significant information, which is not reflected in a text vignette but could be captured in, e.g., audio or video snippets. Our findings underscore that researchers should consider the use of multimodal treatments in survey experimental design.

Fourth and finally, our findings have substantive implications for how individual differences in vocal style condition political representation. Our theoretical framework implies that citizens perceive higher pitch to reflect increased emotional arousal, and our findings and validation analysis are consistent with this dynamic. While our analysis parses out stable individual differences by standardizing pitch within legislators, it is worth considering how a given legislator’s communication is affected by their vocal pitch (e.g., Cinar and Kıbrıs, Reference Cinar and Kıbrıs2023). For example, are legislators with a higher baseline vocal pitch generally perceived as signaling more partisan conflict? If citizens fail to adequately correct for elites’ vocal style when interpreting their speech, it could lead to biased elite perceptions. Such misperceptions could underpin for example the perception of women in politics as more emotional, a trait ascription often used to dismiss women candidates (Campbell, Reference Campbell1994). The role of trait inferences, and potential misperceptions, based on vocal style and nonverbal communication more broadly, is an important topic for future research.

Supplementary material

The supplementary material for this article can be found at https://doi.org/10.1017/psrm.2025.10059. Replication material for this article is available at https://doi.org/10.7910/DVN/P5AKBK.

Footnotes

1 A Google Scholar search for “paralinguistic” returns $\approx$98,500 hits in December 2023.

2 See Knox and Lucas (Reference Knox and Lucas2021) for a review.

3 A multimodal approach combining text, audio, and video in studying political conflict is an important next step in unpacking the dimensions of political conflict. A multimodal approach has been pursued by Boussalis et al. (Reference Boussalis, Coan, Holman and Müller2021) in another domain, studying candidates’ emotional displays and voters’ reactions to them.

4 The use of blocs to capture the ideology of legislators in a multiparty setting like the Danish parliament is also used by Laustsen and Petersen (Reference Laustsen and Petersen2017).

5 The $F0$ is a physical property of any sound wave whether it arises from human speech, animal calls, explosions, or traffic noise. In the case of human speech, it is defined as “the number of vibrations per second made by vocal folds to produce a vocalization” (Tusing and Dillard, Reference Tusing and Dillard2000, 150).

6 The link between pitch and emotional arousal is based on a continuous model of emotions where a human’s emotional state can be placed in a two-dimensional space of valence (i.e., sentiment) and intensity (i.e., arousal). Pitch is a measure of the latter and is found to increase in both positive (joy, happiness) and negative states (anger, fear, sadness). Another model uses discrete emotions such as fear, joy, anger, etc. (Ekman, Reference Ekman1992; Reference Ekman1999). However, while pitch is a reliable indicator of how strongly (i.e., with how much arousal) emotions are expressed (Banse and Scherer, Reference Banse and Scherer1996), it is challenging to discriminate between discrete emotions from $F0$ contours. For this task, text-based measures achieve better results (Widmann, Reference Widmann2021).

7 The unstandardized pitch distribution (Panel F2a) has a bimodal shape arising from the physiological differences in the size of the vocal cords between men and women. If we were to interpret changes in absolute differences in pitch, the results would primarily reflect gender differences rather than differences in nonverbal signals of partisan conflict. Standardization effectively removes this heterogeneity, parsing out all time-invariant speaker-level characteristics.

8 We also remove speeches given by chairs as these contain no substantive information.

9 We illustrate the distribution of dyads disaggregated to the party level in Panel G2a in Supplementary material, Figure G2 in Appendix G.

10 See Panel G2b in Supplementary material, Figure G2 in Appendix G for a visualization of the distribution of voting dyads.

References

Anderson, RC, Klofstad, CA, Mayew, WJ and Venkatachalam, M (2014) Vocal fry may undermine the success of young women in the labor market. PloS one 9.
Arnold, C and Küpfer, A (2024) Alignment helps make the most of multimodal data. https://arxiv.org/abs/2405.08454.
Bäck, H, Debus, M and Fernandes, JM (2021) The Politics of Legislative Debates. Oxford, UK: Oxford University Press. https://doi.org/10.1093/oso/9780198849063.001.0001.
Bänziger, T and Scherer, KR (2005) The role of intonation in emotional expressions. Speech Communication 46, 252–267. https://doi.org/10.1016/j.specom.2005.02.016.
Bale, T (2003) Cinderella and her ugly sisters: The mainstream and extreme right in Europe’s bipolarising party systems. West European Politics 26, 67–90. https://doi.org/10.1080/01402380312331280598.
Banse, R and Scherer, KR (1996) Acoustic profiles in vocal emotion expression. Journal of Personality and Social Psychology 70. https://doi.org/10.1037/0022-3514.70.3.614.
Barari, S and Šimko, T (2023) LocalView, a database of public meetings for the study of local politics and policy-making in the United States. Scientific Data 10. https://doi.org/10.1038/s41597-023-02044-y.
Behrens, L, Nyhuis, D and Gschwend, T (2024) Political ambition and opposition legislative review: Bill scrutiny as an intra-party signaling device. European Journal of Political Research 63, 66–88. https://doi.org/10.1111/1475-6765.12583.
Bjarnøe, C, Adams, J and Boydstun, A (2023) “Our issue positions are strong, and our opponents’ valence is weak”: An analysis of parties’ campaign strategies in ten western European democracies. British Journal of Political Science 53, 65–84. https://doi.org/10.1017/S0007123421000715.
Boussalis, C, Coan, TG, Holman, MR and Müller, S (2021) Gender, candidate emotional expression, and voter reactions during televised debates. American Political Science Review 115, 1242–1257. https://doi.org/10.1017/S0003055421000666.
Campbell, S (1994) Being dismissed: The politics of emotional expression. Hypatia 9, 46–65. https://doi.org/10.1111/j.1527-2001.1994.tb00449.x.
Castanho Silva, B, Pullan, D and Wäckerle, J (2025) Blending in or standing out? Gendered political communication in 24 democracies. American Journal of Political Science 69, 653–668. https://doi.org/10.1111/ajps.12876.
Cinar, AC and Kıbrıs, O (2023) Persistence of voice pitch bias against policy differences. Political Science Research and Methods 12, 1–15.
Cochrane, C, Rheault, L, Godbout, J-F, Whyte, T, Wong, MW-C and Borwein, S (2022) The automatic analysis of emotion in political speech based on transcripts. Political Communication 39, 98–121. https://doi.org/10.1080/10584609.2021.1952497.
Deutsch, M, Coleman, PT and Marcus, EC (2011) The Handbook of Conflict Resolution: Theory and Practice. Hoboken, NJ: John Wiley & Sons.
Deutsch, M (1973) The Resolution of Conflict: Constructive and Destructive Processes. New Haven, CT: Yale University Press.
Dietrich, B, Schultz, D and Jaquith, T (2018) This floor speech will be televised: Understanding the factors that influence when floor speeches appear on cable television. Technical Report Working Paper.
Dietrich, BJ, Enos, RD and Sen, M (2019) Emotional arousal predicts voting on the US Supreme Court. Political Analysis 27 (2), 237–243. https://doi.org/10.1017/pan.2018.47.
Dietrich, BJ, Hayes, M and O’Brien, DZ (2019) Pitch perfect: Vocal pitch and the emotional intensity of congressional speech. American Political Science Review 113 (4), 941–962. https://doi.org/10.1017/S0003055419000467.
Ekman, P (1992) Are there basic emotions? Psychological Review 99 (3), 550–553. https://doi.org/10.1037/0033-295X.99.3.550.
Ekman, P (1999) Basic emotions. Handbook of Cognition and Emotion 98. https://doi.org/10.1002/0470013494.ch3.
Evans, S, Neave, N and Wakelin, D (2006) Relationships between vocal characteristics and body size and shape in human males: An evolutionary explanation for a deep male voice. Biological Psychology 72, 160–163. https://doi.org/10.1016/j.biopsycho.2005.09.003.
Fortunato, D, Martin, LW and Vanberg, G (2019) Committee chairs and legislative review in parliamentary democracies. British Journal of Political Science 49, 785–797. https://doi.org/10.1017/S0007123416000673.
Gennaro, Gloria and Ash, Elliott (2023) Emotion and Reason in Political Language. The Economic Journal 132 (643), 10371059. https://doi.org/10.1093/ej/ueab104.CrossRefGoogle Scholar
Gentzkow, M, Shapiro, JM and Taddy, M (2019) Measuring group differences in high-dimensional choices: Method and application to congressional speech. Econometrica 87, 13071340.10.3982/ECTA16566CrossRefGoogle Scholar
Harcup, T and O’Neill, D (2017) What is news? News values revisited (again). Journalism Studies 18, 14701488.10.1080/1461670X.2016.1150193CrossRefGoogle Scholar
Hix, S and Noury, A (2016) Government-opposition or left-right? The institutional determinants of voting in legislatures. Political Science Research and Methods 4, 249273.10.1017/psrm.2015.9CrossRefGoogle Scholar
Hjorth, F, Klemmensen, R, Hobolt, S, Hansen, ME and Kurrild-Klitgaard, P (2015) Computers, coders, and voters: Comparing automated methods for estimating party positions. Research & Politics 2, .10.1177/2053168015580476CrossRefGoogle Scholar
Høyland, B (2010) Procedural and party effects in European Parliament roll-call votes. European Union Politics 11, 597613.10.1177/1465116510379925CrossRefGoogle Scholar
Imre, M, Ecker, A, Meyer, TM and Müller, WC (2023) Coalition mood in European parliamentary democracies. British Journal of Political Science 53, 104121.10.1017/S0007123421000739CrossRefGoogle Scholar
Joo, Jungseock, Erik P, Bucy and Claudia, Seidel (2019) Automated Coding of Televised Leader Displays: Detecting Nonverbal Political Behavior with Computer Vision and Deep Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 15141523.Google Scholar
Jung, J-H and Tavits, M (2021) Valence attacks harm the electoral performance of the left but not the right. The Journal of Politics 83, 277290.10.1086/709299CrossRefGoogle Scholar
Kayser, MA, Orlowski, M and Rehmert, J (2023) Coalition inclusion probabilities: A party-strategic measure for predicting policy and politics. Political Science Research and Methods 11, 328346.10.1017/psrm.2021.75CrossRefGoogle Scholar
Klofstad, Casey A, Rindy C, Anderson and Susan, Peters (2012) Sounds like a winner: voice pitch influences perception of leadership capacity in both men and women. Proceedings of the Royal Society B: Biological Sciences 279 (1738), 26982704.10.1098/rspb.2012.0311CrossRefGoogle Scholar
Klofstad, CA (2016) Candidate voice pitch influences election outcomes. Political Psychology 37, 725738.10.1111/pops.12280CrossRefGoogle Scholar
Knox, D and Lucas, C (2021) A dynamic model of speech for the social sciences. American Political Science Review 115, 649666.10.1017/S000305542000101XCrossRefGoogle Scholar
Kosiara-Pedersen, K and Kurrild-Klitgaard, P (2018) Change and stability in the Danish party system. In Party System change, the European Crisis and the State of democracy. London/New York: Routledge, pp.6379.10.4324/9781315147116-4CrossRefGoogle Scholar
Kosmidis, S, Hobolt, SB, Molloy, E and Whitefield, S (2019) Party competition and emotive rhetoric. Comparative Political Studies 52, 811837.10.1177/0010414018797942CrossRefGoogle Scholar
Lauderdale, BE and Herzog, A (2016) Measuring political positions from legislative speech. Political Analysis 24, 374394.10.1093/pan/mpw017CrossRefGoogle Scholar
Lauridsen, GA, Dalsgaard, JA and Svendsen, LKB (2019) SENTIDA: A new tool for sentiment analysis in Danish. Journal of Language Works-Sprogvidenskabeligt Studentertidsskrift 4, 3853.Google Scholar
Laustsen, L, Petersen, MB and Klofstad, CA (2015) Vote choice, ideology, and social dominance orientation influence preferences for lower pitched voices in political candidates. Evolutionary Psychology 13, .10.1177/1474704915600576CrossRefGoogle ScholarPubMed
Laustsen, L and Petersen, MB (2017) Perceived conflict and leader dominance: Individual and contextual factors behind preferences for dominant leaders. Political Psychology 38, 10831101.10.1111/pops.12403CrossRefGoogle Scholar
Laver, M, Back, H, Debus, M and Fernandes, JM (2021) Analysing the Politics of Legislative Debate. Oxford: Oxford University Press.Google Scholar
Liu, C (2013) Just noticeable difference of tone pitch contour change for English-and Chinese-native listeners. The Journal of The Acoustical Society of America 134, 30113020.10.1121/1.4820887CrossRefGoogle ScholarPubMed
Mansbridge, J (1999) Should blacks represent blacks and women represent women? A contingent “yes”. The Journal of Politics 61, 628657.10.2307/2647821CrossRefGoogle Scholar
Martin, LW and Vanberg, G (2004) Policing the bargain: Coalition government and parliamentary scrutiny. American Journal of Political Science 48, 1327.10.1111/j.0092-5853.2004.00053.xCrossRefGoogle Scholar
Martin, LW and Vanberg, G (2011) Parliaments and Coalitions: The Role of Legislative Institutions in Multiparty Governance. Oxford, UK: Oxford University Press.10.1093/acprof:oso/9780199607884.001.0001CrossRefGoogle Scholar
Mayhew, DR (1974) Congress: The Electoral Connection. New Haven, CT: Yale university press.Google Scholar
McCarty, N, Poole, KT and Rosenthal, H (2016) Polarized America: The Dance of Ideology and Unequal Riches. Cambridge, MA: MIT Press.Google Scholar
Neumann, M, Fowler, EF and Ridout, TN (2022) Body Language and Gender Stereotypes in Campaign Video. Computational Communication Research 4 (1), .10.5117/CCR2022.1.007.NEUMCrossRefGoogle Scholar
Neumann, M (2019) Hooked with phonetics: The strategic use of style-shifting in political rhetoric. In Annual Meeting of the American Political Science Association. Washington, DC.Google Scholar
Osnabrügge, M, Hobolt, SB and Rodon, T (2021) Playing to the gallery: Emotive rhetoric in parliaments. American Political Science Review 115, 885899.10.1017/S0003055421000356CrossRefGoogle Scholar
Owren, MJ and Bachorowski, J-A (2007) Measuring emotion-related vocal acoustics. Handbook of Emotion Elicitation and Assessment 239266.10.1093/oso/9780195169157.003.0016CrossRefGoogle Scholar
Patel, S and Scherer, KR (2013) Vocal Behavior In Hall, JA and Knapp, ML (eds), Berlin/Boston, Nonverbal Communication, Nonverbal Communication, pp.167204.10.1515/9783110238150.167CrossRefGoogle Scholar
Peterson, A and Spirling, A (2018) Classification accuracy as a substantive quantity of interest: Measuring polarization in westminster systems. Political Analysis 26, 120128.10.1017/pan.2017.39CrossRefGoogle Scholar
Pisanski, K, Cartei, V, McGettigan, C, Raine, J and Reby, D (2016) Voice modulation: A window into the origins of human vocal control?. Trends in Cognitive Sciences 20, 30431810.1016/j.tics.2016.01.002CrossRefGoogle ScholarPubMed
Proksch, S-O, Lowe, W, Wäckerle, J and Soroka, S (2019) Multilingual sentiment analysis: A new approach to measuring conflict in legislative speeches. Legislative Studies Quarterly 44, 97131.10.1111/lsq.12218CrossRefGoogle Scholar
Proksch, S-O and Slapin, JB (2012) Institutional foundations of legislative speech. American Journal of Political Science 56, 520537.10.1111/j.1540-5907.2011.00565.xCrossRefGoogle Scholar
Rabiner, L and Schafer, R (2010) Theory and Applications of Digital Speech Processing. Upper Saddle River, NJ: Prentice Hall Press.Google Scholar
Rauh, C and Schwalbach, J (2020) The ParlSpeech V2 data set: Full-text corpora of 6.3 million parliamentary speeches in the key legislative chambers of nine representative democracies. https://doi.org/10.7910/DVN/L4OAKN.CrossRefGoogle Scholar
Rheault, L and Borwein, S (2019) Multimodal techniques for the study of a ect in political videos. Technical Report Working Paper.Google Scholar
Rheault, L and Cochrane, C (2020) Word embeddings for the analysis of ideological placement in parliamentary corpora. Political Analysis 28, 112133.10.1017/pan.2019.26CrossRefGoogle Scholar
Rittmann, O (2024) Legislators’ Emotional Engagement with Women’s Issues: Gendered Patterns of Vocal Pitch in the German Bundestag. British Journal of Political Science 54 (3), 937945. https://doi.org/10.1017/S0007123423000285.CrossRefGoogle Scholar
Roberts, ME, Stewart, BM, Tingley, D, Lucas, C, Leder-Luis, J, Gadarian, SK, Albertson, B and Rand, DG (2014) Structural topic models for open-ended survey responses. American Journal of Political science 58, 10641082.10.1111/ajps.12103CrossRefGoogle Scholar
Scherer, KR, Banse, R and Wallbott, HG (2001) Emotion inferences from vocal expression correlate across languages and cultures. Journal of Cross-Cultural psychology 32, 7692.10.1177/0022022101032001009CrossRefGoogle Scholar
Scherer, KR, Johnstone, T and Klasmeyer, G (2003) Vocal Expression of Emotion., 40. New York, NY: Oxford University Press.Google Scholar
Scherer, KR, Ladd, DR and Silverman, KEA (1984) Vocal cues to speaker affect: Testing two models. The Journal of The Acoustical Society of America 76, 13461356.10.1121/1.391450CrossRefGoogle Scholar
Schulz, I (2007) The journalistic gut feeling. Journalism Practice 1, 190207.10.1080/17512780701275507CrossRefGoogle Scholar
Schwalbach, J (2023) Talking to the populist radical right: A comparative analysis of parliamentary debates. Legislative Studies Quarterly 48, 371397.10.1111/lsq.12397CrossRefGoogle Scholar
Serra, G (2010) Polarization of what? A model of elections with endogenous valence. The Journal of Politics 72, 426437.10.1017/S0022381609990880CrossRefGoogle Scholar
Shapley, LS and Shubik, M (1954) A method for evaluating the distribution of power in a committee system. American Political Science Review 48, 787792.10.2307/1951053CrossRefGoogle Scholar
Skytte, R (2021) Dimensions of elite partisan polarization: Disentangling the effects of incivility and issue polarization. British Journal of Political Science 51, 14571475.10.1017/S0007123419000760CrossRefGoogle Scholar
Slapin, JB and Proksch, S-O (2008) A scaling model for estimating time-series party positions from texts. American Journal of Political Science 52, 705722.10.1111/j.1540-5907.2008.00338.xCrossRefGoogle Scholar
Taylor, J Damann, Dean, K and Christopher, Lucas (2023) A framework for studying causal effects of speech style: application to US presidential campaigns, Journal of the Royal Statistical Society Series A: Statistics in Society, 2025, qnaf059. https://doi.org/10.1093/jrsssa/qnaf059.Google Scholar
Tigue, CC, Borak, DJ, O’Connor, JJM, Schandl, C and Feinberg, DR (2012) Voice pitch influences voting behavior. Evolution and Human Behavior 33, 210216.10.1016/j.evolhumbehav.2011.09.004CrossRefGoogle Scholar
Touati, P (1993) Prosodic aspects of political rhetoric. In ESCA Workshop on Prosody.Google Scholar
Tusing, KJ and Dillard, JP (2000) The sounds of dominance. Vocal precursors of perceived dominance during interpersonal influence. Human Communication Research 26, 148171.Google Scholar
Vainio, M, Suni, A, Šimko, J Kakouros, S (2023) The power of prosody and prosody of power: An acoustic analysis of finnish parliamentary speech. arXiv preprint arXiv:2305.16040.Google Scholar
Webster, SW (2021) The role of political elites in eliciting mass-level political anger. In The Forum., 19, Berlin/Boston: De Gruyter, pp.415437.Google Scholar
Widmann, T (2021) How emotional are populists really? Factors explaining emotional appeals in the communication of political parties. Political Psychology 42, 163181.10.1111/pops.12693CrossRefGoogle Scholar
Wonka, A and Göbel, S (2016) Parliamentary scrutiny and partisan conflict in the Euro crisis. The case of the German Bundestag. Comparative European Politics 14, 215231.10.1057/cep.2015.43CrossRefGoogle Scholar
Zuckerman, M and Driver, RE (1989) What sounds beautiful is good: The vocal attractiveness stereotype. Journal of Nonverbal Behavior 13, 6782.10.1007/BF00990791CrossRefGoogle Scholar
Figure 1. Directed acyclic graph illustrating documented and hypothesized links between pitch, arousal, conflict, and other causes. The link from partisan conflict is highlighted because we expect this motive to dominate in dyadic exchanges.

Figure 2. Coefficients for partisan polarization (left panel) and policy conflict (right panel) with standardized pitch as the outcome. Predictors are whether the target party is outbloc (left panel) and whether the target party voted differently in a legislative vote on a specific bill (right panel). Standard errors are clustered at the dyad level (speaker party $\leftrightarrow$ target party). Thick and thin error bars are the model-specific 90% and 95% confidence intervals, respectively. $Y$-axes are held fixed across the two panels to maximize comparability. (a) Partisan polarization. (b) Policy conflict.

Figure 3. Coefficients for debate type (left panel) and bargaining leverage (right panel) with standardized pitch as the outcome. The predictors are whether the speech is given in a high-profile rather than a low-profile debate (left panel) and the policy bargaining leverage of the target party (right panel). Standard errors are clustered at the dyad level (speaker party $\leftrightarrow$ target party). Thick and thin error bars are the model-specific 90% and 95% confidence intervals, respectively. $Y$-axes are held fixed across the two panels to maximize comparability. (a) Electoral: high- vs. low-profile debates. (b) Policy: target party bargaining leverage.

Supplementary material: File

Rask and Hjorth supplementary material

Supplementary material: Link

Rask and Hjorth Dataset