Democratic backsliding is an incremental process (Bermeo 2016; Levitsky and Ziblatt 2018; Wunsch and Blanchard 2023), yet existing quantitative studies of how citizens respond to it have largely treated it as a static concept. Measuring voter behavior in a static setting, a rich empirical literature examines voters’ choice between democratic and undemocratic candidates, showing that partisanship, policy interests, and beliefs about politician competence dampen citizens’ willingness to vote against undemocratic candidates (for example, Carey et al. 2022; Frederiksen 2022, 2024; Graham and Svolik 2020).
These empirical studies, while yielding valuable insights into the general willingness of citizens to condone undemocratic behavior and the factors that shape such willingness, are somewhat disconnected from the theoretical concept of democratic backsliding. First, while backsliding occurs when elected officials erode democracy, the hypothetical candidates shown in most existing experiments are potential officeholders who have yet to become ‘erosion agents’ (Kneuer 2021). Thus, while these experiments probe the electability of political candidates who show tendencies to disrespect democratic norms, they fail to capture citizens’ willingness to expel democracy-transgressing incumbents that lies at the heart of citizens’ ability to safeguard democracy (Fearon 2011; Weingast 1997). Second, while democratic backsliding is an incremental process, these experiments capture a snapshot of voter preferences after presenting respondents with single-period transgressions. Thus, the existing empirical literature does not address a fundamental question about citizens’ willingness to remove the incumbent amid incremental, multi-period transgressions: if the incumbent subverts democracy step by step, will the public remove them from office promptly? In addition, which sequence of democratic transgressions is less likely to generate accountability and therefore more harmful to democracy from a public opinion perspective?
Theoretically, sequence matters because authoritarian-minded incumbents can strategize the ordering of democratic transgressions when subverting democracy. For example, the incumbent can start with relatively mild democratic transgressions to make citizens less alert that they are indeed undemocratic, and proceed with more severe transgressions in later periods to further erode democracy (Chiopris et al. 2025). I call this sequence the strategy of increasing severity, and by severity I refer to the harmfulness of an incumbent’s action to democracy as judged by the public.Footnote 1 Alternatively, the incumbent can start with relatively severe transgressions to lower citizens’ expectations of them, and proceed with less severe transgressions in later periods to beat this lowered standard while continuing to erode democracy incrementally (Grillo and Prato 2023). I call this sequence the strategy of decreasing severity. Does sequence matter in shaping voter behavior amid democratic backsliding? Empirically, we lack an answer to this question because existing quantitative studies tend to study how citizens respond to democratic backsliding in a static setting (for a review, see Supplementary Material Section A).
I devise a large-scale, preregistered survey experiment ($N = 4,234$) to test the impact of sequence on voters’ willingness to remove the incumbent as episodes of democratic backsliding unfold. Focusing on US state democracy, I randomize different sequences of democratic transgressions committed by governors, followed by measuring at which point American respondents are willing to remove them from office as they erode democracy step by step. Specifically, I manipulate whether and in what sequence the hypothetical governor – whose partisanship and policy positions match those of respondents – transgresses democracy, and elicit respondents’ vote choice in a recall election (that is, a political institution that allows voters to immediately remove an incumbent from office, which represents a strong form of voter punishment in the US setting) against the governor in multiple time periods. My design thus contrasts with existing experiments that compare two politicians in a static setting, allowing me to instead interrogate voter behavior regarding a single politician over multiple periods. By devising a novel dynamic experiment that captures the incremental nature of democratic backsliding, I offer the first empirical test on how different sequences of democratic transgressions shape voters’ reactions to backsliding incumbents.
Three important insights emerge from my dynamic approach. First, I find that incumbents who incrementally decreased the severity of democratic transgressions were held accountable by respondents more promptly, compared to incumbents who incrementally increased the severity with the same set of transgressions. This finding sheds light on the micro-foundations of the strategy of ‘stealth authoritarianism’ raised in the theoretical literature and illuminates why authoritarian-minded incumbents in the real world often refrain from vehemently challenging democracy immediately after winning an election. Second, however, the results also suggest that transgressing democracy in a sporadic sequence could be as effective a strategy for backsliding incumbents as incrementally increasing the severity of transgressions. Third, by allowing respondents to observe and revise their beliefs about the intentions of the governor, my experiment shows that – even when partisan and policy interests were at stake – most respondents were eventually willing to electorally remove the governor who incrementally transgressed democracy. While prominent empirical work on how citizens respond to undemocratic politicians in a static setting suggests a gloomy outlook for Americans’ willingness to defend democracy (Graham and Svolik 2020; Simonovits et al. 2022; cf. Holliday et al. 2024), my experiment – which recognizes democratic backsliding as an incremental and dynamic process – paints a more nuanced picture.
Overall, my study underscores the importance of studying democratic backsliding dynamically, introduces a new empirical approach to evaluating voter behavior amid incremental erosion of democracy, and shows a proof of concept that the timing of democratic transgressions matters in shaping political accountability. Because my experiment was situated in a US subnational contextFootnote 2 (whose institutional setting may differ from the others) and abstracted itself from the real-world environment (similar to many prominent static experiments), the methodological innovation put forward by this study does not aim to give the last word on the temporal dynamics in democratic backsliding. Rather, it invites this burgeoning and important scholarship to move away from static approaches to study democratic backsliding with greater conceptual precision and tighter theoretical mapping. By building on the dynamic approach introduced by this study, future research can unpack new and useful insights into how citizens react amid incremental and sequential transgressions of democracy.
Theoretical Background
Existing literature sheds important light on the extent to which the public is willing to condone democratic transgressions, as well as who is particularly willing to condone them. Many empirical investigations are conducted in the style of candidate-choice survey experiments, where respondents are presented with pairs of conjoint tables that vary the descriptions of hypothetical candidates. While the specific manipulations differ, most of them accord with the following logic: Candidate A, on the one hand, offers democratic benefits by adhering to democratic principles; Candidate B, on the other hand, offers partisan, policy, and/or other benefits. Unlike Candidate A, however, Candidate B has made – or has revealed a tendency to make – democratic transgressions. Support for Candidate B over Candidate A is then taken as evidence of citizens’ unwillingness to defend democracy. This is because respondents fail to punish undemocratic candidates by voting against them. Existing scholarship asserts that these individuals fail the ‘stress test’ (Svolik et al. 2023).
Based on this design that effectively captures how respondents make trade-offs between hypothetical politicians, the existing literature shows that partisanship and policy agreement are two major factors driving support for undemocratic candidates. Graham and Svolik (2020, 392), for example, find that ‘only a small fraction of Americans prioritize democratic principles in their electoral choices, and their tendency to do so is decreasing in several measures of polarization, including the strength of partisanship, policy extremism, and candidate platform divergence’. A series of vignette-based experiments further suggests that citizens are biased by their partisanship and policy interests when evaluating whether undemocratic candidates are indeed transgressing democracy (Krishnarajan 2023; Şaşmaz et al. 2022; Simonovits et al. 2022). While this line of scholarship provides a pessimistic account of citizens’ ability to safeguard democracy, some experiments offer a more optimistic account (Carey et al. 2022; Gidengil et al. 2022; Wunsch et al. 2025). Reconciling both accounts, Frederiksen (2024) clarifies that partisan and policy alignments do not remove citizens’ ability to discern undemocratic behavior, but the discernment itself is often insufficient to shift their voting decisions when the comparison is between an undemocratic candidate who offers partisan or policy benefits and a democratic candidate who does not offer such benefits.
These experiments have significantly advanced our knowledge of how partisanship and policy preferences color individuals’ support for undemocratic politicians, but they deviate from the concept of democratic backsliding in two important ways. First, while the theoretical literature defines the concept as ‘the incremental erosion of democratic institutions, rules and norms that results from the actions of duly elected governments’ (Haggard and Kaufman 2021, 1), respondents in most existing experiments are asked to evaluate political candidates – that is, aspiring officeholders – who have yet to become ‘erosion agents’ (Kneuer 2021). As respondents in such settings need not infer from the candidates’ past actions that these politicians will transgress democracy when elected, voting for them does not necessarily imply unwillingness to punish elected officials who engage in democratic backsliding. Second, by presenting respondents with single-period transgressions and measuring voter preferences in a static setting, existing experiments ignore the dynamic and incremental nature of democratic backsliding. The theoretical literature conceptualizes democratic backsliding as ‘a decline in the quality of democracy … occurring through a discontinuous series of incremental actions’ (Waldner and Lust 2018, 95). Because a democracy ‘dies’ when it is too late for the public to remove the incumbent by democratic means (Levitsky and Ziblatt 2018), citizens’ ability to remove undemocratic incumbents in a timely manner is key (Fearon 2011; Weingast 1997). Experiments with tighter mapping onto theory should therefore measure at which point the public is able to remove the elected official, not whether the public – at a given point – is willing to vote for a candidate who has yet to subvert democracy.
Understanding the public’s role in safeguarding democracy requires us to study voter behavior regarding a single elected politician across multiple periods. After all, episodes of democratic backsliding unfold incrementally. Whenever an incumbent takes an action (for example, changing the election law), citizens need to first assess whether it constitutes a democratic transgression before deciding whether to act against it. Because citizens are characterized by their inherent uncertainty about the incumbent’s true intentions (Chiopris et al. 2025), they may not be willing to remove the incumbent upon exposure to the first and only democratic transgression. Instead, they may choose to wait and see how the incumbent behaves in later periods until they can better pinpoint the incumbent’s intentions and make a more informed voting decision. Problematically, in a static experimental setting where individuals can only observe one-off actions taken by the politician and are instantly required to express their vote choice, failure to vote against the politician in the same period is interpreted by researchers as evidence of support for democratic backsliding. A dynamic setting, by contrast, allows individuals to update their beliefs about – and therefore their preferred actions against – the politician as they observe the politician’s new actions in future periods. Unless the incumbent has already laid the groundwork that makes it impossible for the public to remove them from office through free and fair elections (Luo and Przeworski 2023), pro-democracy citizens can reasonably choose to hold the incumbent accountable not in the very first period but in a later period where the incumbent’s authoritarian intentions have become clearer. Empirical scholarship that takes a static approach, however, tends to view these individuals as ‘anti-democratic’ citizens biased by their partisan or policy interests. It overlooks the reality that the same individuals may simply need more periods to discern the incumbent’s intentions – an important feature of democratic backsliding that can only be captured in a dynamic, multi-period setting.
Because existing quantitative studies fall short of addressing the incremental nature of democratic backsliding, whether the public can remove an undemocratic incumbent (that is, an elected official who uses their political power to reduce contestation or inclusiveness in a polity) promptly remains an open empirical question. This question is especially important in countries where voters are endowed with electoral opportunities to remove undemocratic incumbents from office. Voters in Turkey, for example, sanctioned the Justice and Development Party (AKP) in the 2019 Istanbul mayoral election following ‘incumbent-driven attempts to undermine democracy’ (Svolik 2023, 648). However, they were not able to oust Recep Tayyip Erdoğan in a later presidential election when given the opportunity to do so (Esen and Gumuscu 2023). Voters in Poland, on the other hand, failed to punish Andrzej Duda in the 2020 presidential election (Jacob 2025). Yet a record high turnout ousted the Law and Justice Party (PiS) from power in the following parliamentary election (Benson 2023). In the United States, numerous election-denying incumbents ran for re-election and were subjected to electoral sanctions by the public (Malzahn and Hall 2025; cf. Bartels and Carnes 2023). Where the institutional setting permits, voters in democracies often have the power to stop undemocratic incumbents from further eroding democracy; the question is whether they can utilize the institution promptly.
In addition to regular elections, recalls are another institutional mechanism by which voters can hold incumbents accountable. In twenty-five democracies, recalls exist at the local, regional, or even national level (Venice Commission 2019). For instance, nineteen American states have recall elections that allow voters to remove elected state officials – including governors who can exert strong influence over the electoral quality in their state – from office before their term ends. In the United States, removal of undemocratic governors would entail three parts: first, institutional possibilities of a recall; second, actual holding of a recall; and third, voter willingness to recall. My analysis focuses on the last part, as it directly taps into voter willingness to punish undemocratic incumbents amid democratic backsliding, allowing me to connect my findings to existing empirical scholarship that also focuses on measuring electoral backlash against anti-democratic politicians (see Supplementary Material Section A). Recall decisions also serve as a useful measure of citizens’ strong form of punishment for democratic transgressions, as a successful recall would immediately remove the undemocratic incumbent from office before their term ends.
Theoretical Expectations
Building on the empirical literature that makes general observations about voter judgment on undemocratic politicians, I begin with a general hypothesis that individuals will remove elected officials who transgress democracy more promptly than they will remove elected officials who do not transgress democracy.Footnote 3 Existing scholarship argues that citizens do value democracy (Carey et al. 2022), can identify undemocratic behavior (Aarslew 2023), and are generally not blinded by partisanship and policy preferences when making democratic evaluations (Frederiksen 2024).Footnote 4 Therefore, citizens, when given the electoral opportunity to remove the incumbent from office, should expel an undemocratic incumbent sooner than they would remove even an alternative incumbent who makes a series of politically controversial governance decisions – actions that are largely non-partisan and not inherently undemocratic. In a US subnational setting, these controversial governance actions, which could also provide citizens with legitimate reasons for a recall, can range from vetoing a bill and deferring state budgets to simply providing low-quality constituent services.
As a starting point, I compare an undemocratic incumbent with a democratic but politically controversial incumbent, because this comparison constitutes a harder test of citizens’ willingness to hold undemocratic incumbents accountable. If the comparator were instead politically uncontroversial, citizens’ timelier removal of the undemocratic incumbent would be relatively trivial.
H1: An incumbent will be more likely to be held accountable by voters if they transgress democracy step by step, compared to an incumbent who takes a sequence of politically controversial actions that do not fundamentally violate democratic principles.
Taking the dynamic and incremental nature of democratic backsliding further, I argue that the sequence of incumbent actions matters. An incumbent aspiring to subvert democracy while surviving in office can strategize not only the type of democratic transgressions they make but also the general sequence in which they make them. Of course, the menu of strategies for the incumbent is not infinite and can be bounded by institutional constraints. This is why authoritarian-minded incumbents typically aspire to lay groundwork actions (Clayton 2024) and change the constitution (Scheppele 2018) to ease the institutional constraints and facilitate future power grabs. My theory and experiment focus on piecemeal democratic transgressions that incumbents can already attempt, consistent with the type of transgressions typically studied in the empirical literature on democratic backsliding (Ahmed 2023). Because my framework emphasizes incremental transgressions, it does not require their severity to vary substantially – an important point that guided how I operationalized democratic transgressions in my experiment (see the next section).
Theoretically, I propose that the incremental steps by which the anti-democratic incumbent transgresses democracy play an important role in shaping how voters respond to them. While the permutations of democratic transgressions are many (Wunsch and Blanchard 2023), an incumbent has two general ways to subvert democracy step by step. The first is incrementally increasing the severity of democratic transgressions. The second is incrementally decreasing the severity.
The Logic of Increasing Severity
The first sequence of actions accords with the notion of ‘stealth’ in the theoretical literature (Varol 2015). When the incumbent comes to power by democratic means, they refrain from revealing their authoritarian tendency immediately. Rather, they play ‘a wolf in sheep’s clothing’ and take small, successive steps to incrementally consolidate power (Chiopris et al. 2025). For example, instead of making drastic changes to voting laws to completely tilt the electoral playing field early in their tenure, the incumbent may tweak the media system to reduce journalistic scrutiny or alter the campaign finance regulation to make fundraising easier for themselves and harder for their political opponents. The crux of this strategy is to reduce citizens’ likelihood of identifying their authoritarian intentions in early periods.
By making milder democratic transgressions at the beginning, the incumbent lays the groundwork for future power grabs while stifling citizens’ ability to identify their authoritarian intentions and expel them from office on time. One example in the US subnational context is introducing a bill that allows the state legislature to review the ballot counting process in the state, with the intent to reject unfavorable electoral outcomes in the future (Clayton 2024; Scheppele 2018). Status quo bias further highlights the prospect of incrementally increasing the severity of transgressions.Footnote 5 Because ‘individuals disproportionately stick with the status quo’ in their decision making (Samuelson and Zeckhauser 1988, 7), they may be less able to revise their evaluation of – and recall – the incumbent instantaneously. Indeed, multiple studies have found evidence of status quo bias in citizens’ formation of political attitudes (Arceneaux and Nicholson 2024; Haselswerdt and Bartels 2015; Jerit 2009). This psychological mechanism points to the theoretical prediction that slow, steady, and incremental increases in the severity of democratic transgressions can make each transgression difficult for the public to detect and react to promptly.
The Logic of Decreasing Severity
The second sequence of actions is the reverse of the first. For example, the incumbent starts off by directly tilting the electoral playing field or by bypassing the legislature to make unilateral decisions by decree – actions that are considered highly undemocratic by the public (Carey et al. 2019; Chu and Williamson 2025; Chu et al. 2024). Subsequently, they follow up with transgressions that are perceivably less undemocratic, such as making minor tweaks to media or campaign finance laws.
The logic of this strategy is to exploit human beings’ reliance on reference points to evaluate alternatives. Because evaluations are powerfully shaped by reference points, when individuals’ evaluation of the reference point changes, their evaluation of the existing alternative also changes (Kőszegi and Rabin 2006; Quattrone and Tversky 1988). Thus, if the referent becomes less favorable, the existing alternative will be subjected to a lower standard and, consequently, it appears more desirable (Kahneman 1992). When citizens evaluate an incumbent, one readily available reference point is their performance in the recent past (Olsen 2017). Analyzing a model that captures citizens’ reference-dependent preferences and allows the incumbent to first choose whether to transgress democracy and then how much to double down on the transgression, Grillo and Prato (2023, 71) show the theoretical premise of incrementally decreasing the severity of democratic transgressions: ‘By challenging norms of democracy, an incumbent can lower citizens’ expectations; by not doubling down on this challenge, he can then beat this lowered standard’. Strategic incumbents can therefore exploit individuals’ reliance on reference points to make evaluations, beginning with severe democratic transgressions and proceeding with milder ones afterward. As long as the incumbent can survive in early periods, the relatively mild transgressions they make in later periods will alert citizens to a lesser degree than the more severe transgressions they would otherwise commit if they did not transgress with the most severe actions first.
While the two general forms of action sequence delineated above each have logically coherent psychological foundations, which strategy better helps the undemocratic incumbent mitigate voter accountability is an open empirical question. Both sequences could plausibly yield political dividends for the undemocratic incumbent.
RQ: How do different sequences of democratic transgressions, which vary in their severity incrementally, affect voter accountability?
Experimental Design
To gain empirical leverage, I conducted a preregistered survey experiment on 4,234 American adults in January 2024. This sample did not include respondents who failed a pretreatment attention check. Partnering with PureSpectrum, I recruited respondents with sociodemographic characteristics mirroring the national benchmarks in sex, age, race, partisanship, and state of residence (see the sample characteristics in Table S2 and Figure S1 and information about the survey vendor in Supplementary Material Section C). Before entering the experimental module, respondents completed a pretreatment questionnaire that included items about their demographic background, dispositional characteristics (for example, partisan affect and anti-establishment orientations), partisanship, and policy preferences.
Set-up
After completing the pretreatment questionnaire, respondents read the definition of recall:
In the United States, a recall election is a procedure that allows voters to remove an elected official from office before their term ends. Historically, both Democratic and Republican governors have been recalled and removed from office.
Many states have recall elections. In these states, the governor could face more than one recall election when in office. That is, even if they survive an initial recall, they may still face other recall elections in the future.
After learning about the recall institution in the United States, respondents read a scenario about a hypothetical governor in their state. I focused on state governors because recent scholarship argues that due to federalism, governors are critical agents of democratic backsliding in the United States (Grumbach 2022). Many policies that transgress democracy were enacted not at the national level but at the state level (Mickey 2022). Under the leadership of former governors Pat McCrory and Scott Walker, for example, North Carolina and Wisconsin suffered substantial decreases in democratic performance (Grumbach 2023).
Next, I informed respondents that the setting was year 2026, where the elected governor and the respondents shared the same partisanship and policy preferences. By restricting respondents’ partisan and policy interests to be perfectly aligned with the governor’s, I offer a hard test of citizens’ willingness to recall the governor amid democratic backsliding. I included policy preferences that spanned four areas: education finance, taxation, immigration, and marijuana (Graham and Svolik 2020). I matched the partisan and policy interests between the governor and the respondents by using their answers to pretreatment questions (and for respondents identified as Independents, I randomized the partisanship of the governor). Such matching allowed me to not only provide a hard test but also effectively hold these two variables – major potential confounders of Americans’ willingness to punish undemocratic politicians according to the existing literature – constant in the experimental environment. If I did not specify the governor’s partisanship and policy preferences, Democrats would likely infer that the democracy-transgressing governor was Republican, and vice versa, especially given Americans’ widespread distrust of out-party members in upholding democratic norms (Braley et al. 2023). In addition, the partisan and policy alignments enabled me to better examine at which point Americans will withdraw their support for the incumbent.
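The matching rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the survey platform's actual logic: the three-category party coding, the dictionary layout of a respondent record, and the function names `governor_party` and `governor_profile` are all hypothetical.

```python
import random

# The four policy areas named in the text.
POLICY_AREAS = ["education finance", "taxation", "immigration", "marijuana"]

def governor_party(respondent_party, rng):
    """Return the hypothetical governor's party for one respondent.

    Assumption: party ID is coded as 'Democrat', 'Republican', or 'Independent'.
    """
    if respondent_party in ("Democrat", "Republican"):
        # Partisans see a co-partisan governor.
        return respondent_party
    # Independents: the governor's party is randomized.
    return rng.choice(["Democrat", "Republican"])

def governor_profile(respondent, rng):
    """Build a governor whose party and policy positions mirror the respondent's."""
    return {
        "party": governor_party(respondent["party"], rng),
        # Copy the respondent's own pretreatment answer in each policy area.
        "positions": {area: respondent["positions"][area] for area in POLICY_AREAS},
    }
```

Because the profile is derived entirely from pretreatment answers, partisanship and policy agreement are held fixed by construction for every respondent, which is what makes the design a hard test.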
Experimental Conditions
After reading the background of the governor, respondents entered the experimental module. Here, I developed a new experimental paradigm to study democratic backsliding. To explicitly capture the incremental nature of democratic backsliding, I measured at which point the respondent voted to recall the governor amid a series of incumbent actions. This unique design speaks to much of the formal literature that models citizen-incumbent interactions across multiple periods (Chiopris et al. 2025; Helmke et al. 2022; Luo and Przeworski 2023) and is novel in the existing empirical literature that largely ignores the incremental and citizen-incumbent dynamics of democratic backsliding (Druckman 2024). That said, I did not design the experiment to test specific game-theoretic models. As a first-cut empirical test, the institutional environment was held constant in my experiment; initial incumbent actions did not interfere with recall institutions.
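To make the outcome concrete, the following Python sketch shows one way the point at which a respondent first votes to recall could be derived from period-by-period vote choices. It assumes each respondent's answers are stored as an ordered list of booleans (True meaning a vote to recall); that data layout and the function names are illustrative assumptions, not the study's actual coding.

```python
from typing import List, Optional, Sequence

def first_recall_period(votes: Sequence[bool]) -> Optional[int]:
    """Return the 1-indexed period of the first recall vote, or None if never."""
    for period, voted_recall in enumerate(votes, start=1):
        if voted_recall:
            return period
    # The respondent never voted to recall within the observed periods.
    return None

def cumulative_recall_share(all_votes: Sequence[Sequence[bool]], n_periods: int) -> List[float]:
    """Share of respondents who have voted to recall by each period."""
    firsts = [first_recall_period(v) for v in all_votes]
    return [
        sum(1 for f in firsts if f is not None and f <= t) / len(firsts)
        for t in range(1, n_periods + 1)
    ]
```

For example, `first_recall_period([False, False, True, True])` returns `3`. Respondents who never vote to recall are returned as `None` rather than dropped, so they can be treated as censored observations in a time-to-event analysis.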
Respondents read six actions taken by the governor, each separated by three months according to the treatment vignette. The sequence of actions was randomized into one of four experimental conditions (see the covariate balance in Figures S2–S4 in Supplementary Material Section C):
1. Increasing Severity ($N = 1,061$): The governor makes six democratic transgressions one by one, each of which increases in its harmfulness to democracy compared to the previous one.
2. Decreasing Severity ($N = 1,046$): The governor makes six democratic transgressions one by one, each of which decreases in its harmfulness to democracy compared to the previous one.
3. Sporadic Severity ($N = 1,067$): The governor makes six democratic transgressions one by one, which do not follow a specific sequential pattern of harmfulness to democracy.
4. No Transgressions ($N = 1,060$): The governor takes six policy actions one by one, each of which is politically controversial but does not violate democratic principles (for example, vetoing a bill, deferring the new state budget, and providing low-quality constituent services). This condition serves as the baseline comparison for H1, which expected that democratic transgressions would increase voters’ willingness to recall in a timelier manner. The governor was fixed to be politically controversial because an uncontroversial governor would otherwise be unlikely to generate any willingness to recall, weakening my empirical test on Americans’ willingness to hold the undemocratic incumbent accountable.
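The three transgression conditions differ only in the order in which the same six actions appear. The Python sketch below illustrates that ordering logic using severity ranks (1 = most severe, 6 = least severe) in place of the action texts. How the sporadic order was actually generated is not specified here, so the rule used below (a random shuffle that is rejected if it happens to be strictly increasing or decreasing) is an assumption, as are the function names.

```python
import random

# Publicly perceived severity ranks: 1 = most severe, 6 = least severe.
SEVERITY_RANKS = [1, 2, 3, 4, 5, 6]

def is_monotone(order):
    """Return True if the ranks are strictly increasing or strictly decreasing."""
    pairs = list(zip(order, order[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)

def build_sequence(condition, rng):
    """Return the presentation order, as severity ranks, for one condition."""
    if condition == "increasing":
        # Least severe action first (rank 6), most severe last (rank 1).
        return sorted(SEVERITY_RANKS, reverse=True)
    if condition == "decreasing":
        # Most severe action first (rank 1), least severe last (rank 6).
        return sorted(SEVERITY_RANKS)
    if condition == "sporadic":
        # Assumption: any order without a monotone severity pattern qualifies.
        while True:
            order = rng.sample(SEVERITY_RANKS, k=len(SEVERITY_RANKS))
            if not is_monotone(order):
                return order
    raise ValueError(f"no transgression sequence for condition: {condition}")
```

The No Transgressions arm is deliberately left out of `build_sequence`, since its six actions are a separate set of controversial but democratic policy actions rather than a reordering of the transgressions.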
Table 1 shows the list of governor actions that I employed in my experiment to operationalize democratic transgressions. These statements of governor actions – which tap into piecemeal transgressions of democracy rather than egregious actions such as abolishing term limits and engaging in political violence against the opposition – were carefully pretested and selected from a broader list of statements in another national survey conducted on 1,568 respondents (also recruited by PureSpectrum to match national benchmarks in sex, age, and race) in November 2023. In my pretest, I presented statements of democratic transgressions to each respondent and asked them to rate the harmfulness of each governor action to democracy. I focus on subjective severity because while mass perceptions of democratic transgressions’ harmfulness – or evaluations of democracy in general – may not map tightly onto scholarly or expert judgments (Yeung Reference Yeung2023), individuals act upon their subjective beliefs. That is, political behavior is often driven not by objective reality, but by people’s perceptions of reality (Achen and Bartels Reference Achen and Bartels2016). Building on this premise and based on the pretest results reported in Supplementary Material Section D, I ranked – from the standpoint of the public – the severity of the democratic transgressions from most to least harmful to democracy:
1. (Most severe) Proposing to eliminate the bipartisan election commission so that the governor’s party can exercise stronger control over how elections are run
2. (Second most severe) Signing a redistricting plan that gave the governor’s party a few extra seats despite a decline in the polls
3. (Third most severe) Ruling by executive order as legislators from the opposing party did not pass legislation the governor favored
4. (Fourth most severe) Banning rallies held by extremists of the opposing party in the state capital
5. (Fifth most severe) Supervising law enforcement investigations of politicians and their associates
6. (Least severe) Barring election officials from accepting funds from non-profit, non-partisan organizations that help administer voting
Table 1. Undemocratic governor actions in the experiment and their mapping onto the concept of democratic transgressions

Each action maps closely onto Dahl’s (Reference Dahl1971) conceptualization of democracy that spans the procedural and liberal dimensions of democracy (column 2 in Table 1), which emphasize electoral fairness (items 1 and 2), checks and balances (item 3), civil liberties (item 4), rule of law (item 5), and electoral availability (item 6). They comprise the four general types of democratic transgressions analyzed by both theoretical and empirical literature on democratic backsliding (Ahmed Reference Ahmed2023; column 3 in Table 1): first, power-consolidating changes to democratic institutions (items 1, 2, and 6); second, violations of democratic norms (item 3); third, violations of democratic ideals (item 4); and fourth, violations of the law (item 5). These democratic transgressions also correspond to declines in the ‘basic predicates of democracy’ (Ginsburg and Huq Reference Ginsburg and Huq2018, 43) that characterize democratic backsliding (column 4 in Table 1).
Because some of these transgressions were examined in past experiments, I adapted their survey wording where possible to ensure that my operationalization of democratic transgressions can speak to existing empirical scholarship. Importantly, these transgressions are less bounded by institutional constraints and can therefore plausibly be committed by US governors. Former Democratic Governor Martin O’Malley, for example, said that his intent in drawing a redistricting map in Maryland was to ‘create a district where the people would be more likely to elect a Democrat than a Republican’ (Totenberg et al. Reference Totenberg, Montanaro and Parks2019). Many Republican states, on the other hand, ‘passed laws that ban or severely restrict the acceptance of private money to help local administrators in running elections’ with the backing of Republican governors (States United Democracy Center, Protect Democracy, and Law Forward 2021, 8n30).Footnote 6 These laws substantially undermined local election officials’ ability to run elections, especially amid the COVID-19 pandemic that made voting unprecedentedly costly to administer. In these real-world examples, the governors played an important role in subverting electoral fairness and availability at the state level, reducing public contestation and inclusiveness in order to consolidate their existing electoral advantage.
I clarify three additional design features before describing my outcome measure. First, I designed the Sporadic Severity condition to mimic a series of transgressions that did not follow a specific sequential pattern. The sequence of events shown to respondents in this condition was $3 \to 4 \to 2 \to 5 \to 1 \to 6$. Hence, there were no three consecutive governor actions that pointed to a specific direction of severity throughout the Sporadic Severity condition, which contrasted sharply with the Increasing Severity and Decreasing Severity conditions. An alternative choice of the comparison group would be to design a ‘Constant Severity’ condition (which future research should explore), but it would map less tightly onto a counterfactual incumbent who does not strategize the sequence of transgressions. Another fruitful way to operationalize the comparison group would be to fully randomize the transgression sequence, but it would loosen experimental control by inadvertently capturing permutations with other severity patterns.
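The claim that the Sporadic Severity sequence contains no three consecutive actions moving in one direction can be checked mechanically. The sketch below is illustrative only and is not part of the study's materials. It encodes each transgression by its severity rank from the list above (1 = most severe, 6 = least severe); the helper name `has_monotone_run` is hypothetical, and the orderings for the Increasing and Decreasing Severity conditions are assumed to run through the ranks in strict order.

```python
def has_monotone_run(ranks, length=3):
    """Return True if `ranks` contains `length` consecutive items that are
    strictly increasing or strictly decreasing."""
    for start in range(len(ranks) - length + 1):
        window = ranks[start:start + length]
        steps = [b - a for a, b in zip(window, window[1:])]
        if all(s > 0 for s in steps) or all(s < 0 for s in steps):
            return True
    return False


# Severity ranks (1 = most severe, 6 = least severe) in the order shown.
SEQUENCES = {
    "Increasing Severity": [6, 5, 4, 3, 2, 1],  # assumed: least to most severe
    "Decreasing Severity": [1, 2, 3, 4, 5, 6],  # assumed: most to least severe
    "Sporadic Severity": [3, 4, 2, 5, 1, 6],    # sequence stated in the text
}

for name, ranks in SEQUENCES.items():
    print(name, has_monotone_run(ranks))
```

Under this encoding, only the Sporadic Severity sequence alternates direction at every step, so no window of three actions trends toward either greater or lesser severity.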
Second, the validity of my design did not hinge on all respondents agreeing with the severity ranking. This is because my estimand of interest is the intention-to-treat (ITT) that compares hazard rates at the group – not individual – level (see Estimation Strategy below) and my pretest revealed that, on average, respondents subscribed to the severity ranking (Supplementary Material Section D).Footnote 7 This estimand connects closely to my theory because in real-world democratic backsliding, the sequence of democratic transgressions made by a given incumbent is standardized across the population they govern rather than tailor-made for different citizens living in the same country.
Third, I provided impartial descriptions of the governor’s actions across all experimental conditions. In the real world, politicians can use rhetorical strategies to justify their transgressions (Bessen Reference Bessen2024; Clayton et al. Reference Clayton, Davis, Nyhan, Porter, Ryan and Wood2021; Grossman et al. Reference Grossman, Kronick, Levendusky and Meredith2022; Stokes Reference Stokes2025). But in order to retain experimental control, I refrained from inserting language that the governor may use to justify their actions. To investigate the impact of elite rhetoric on citizens’ willingness and timeliness to safeguard democracy, future research can build on my experiment to develop an additional condition that includes politicians’ justifications for democratic transgressions.
Outcome Measure
Given my theoretical interest in examining voters’ timeliness of removing the elected official amid democratic backsliding, my main outcome measure is the number of periods until the respondent decided to recall the governor. After presenting the governor’s action in each period, I asked respondents: ‘Now suppose there is a recall election against the governor. Would you vote to recall the governor in the election?’ For example, if the respondent said yes in the first period, I code the outcome as 1, although they were still asked to evaluate the governor in later periods. If the respondent said no in the first three periods but yes in the fourth period, I code the outcome as 4. In the extreme case where the respondent said no in all six periods, I code the outcome as 7.Footnote 8
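The coding rule can be written out as a short function. This is a minimal sketch under stated assumptions rather than the author's actual processing code: the function names and the representation of answers as a list of booleans (True = would vote to recall) are hypothetical.

```python
def periods_until_recall(votes, n_periods=6):
    """Return the first period (1-based) in which the respondent voted to
    recall, or n_periods + 1 if they never did. Answers given after the
    first 'yes' do not affect the outcome."""
    if len(votes) != n_periods:
        raise ValueError(f"expected {n_periods} answers, got {len(votes)}")
    for period, vote in enumerate(votes, start=1):
        if vote:
            return period
    return n_periods + 1


def never_recalled(outcome, n_periods=6):
    """True when the respondent refused to recall in every period."""
    return outcome == n_periods + 1


# The three cases described in the text.
print(periods_until_recall([True, False, False, True, False, False]))
print(periods_until_recall([False, False, False, True, True, True]))
print(periods_until_recall([False] * 6))
```

The three calls correspond to the examples in the paragraph above: a yes in the first period, a first yes in the fourth period, and no in all six periods.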
Studying voter behavior in recall elections offers two major advantages. First, it maps onto my theoretical construct of interest (that is, voter accountability) because successful recalls would immediately remove governors from office before their term ends. As such, it measures respondents’ willingness to punish the incumbent in one of the most severe forms (vis-à-vis protests or supporting a petition for a recall election). Second, recalls are contextually relevant in the United States. Historically, both Democratic and Republican governors – Gray Davis and Lynn Frazier – have been recalled by the American public due to controversies over their governance. While recalls rarely succeeded in US history,Footnote 9 efforts to recall governors – as well as state legislators and local officials – are very common in states that provide such constitutional rights, especially in recent years (Neuman Reference Neuman2021). In California, there have been fifty-five attempts to recall the governor since 1913, with the latest one taking place in 2021 against Democratic Governor Gavin Newsom, who survived the recall election amid mass grievances over his policy making in the state. In Wisconsin, former Republican Governor Scott Walker faced a recall election in 2012 after signing the controversial Act 10 into law. While he also survived the recall, this example illustrates how a political action made by an incumbent governor can translate into a substantive electoral opportunity for the public to oust the governor.
A recent experiment conducted in Poland allowed respondents to retract support for the undemocratic in-party incumbent against an out-party challenger in the next election (Jacob Reference Jacob2025). Conceptually, my focus on recall as an outcome measure follows a similar logic by allowing voters to remove the incumbent after observing their democratic transgressions. However, my framework differs empirically because rather than consider the role of out-party candidates in the next election for the same position, I allow the incumbent to be punished by voters immediately or in multiple future periods within their current incumbency. While measuring vote choice in a future election also brings useful insights, measuring voter accountability using the recall institution allows me to better capture a political environment where public pressures are constantly present, which could materialize without electoral challenges from out-party politicians in the real world – for example, in the form of mass collective action against the incumbent.Footnote 10 My innovative dynamic design thus allows me to tap into the important question of at which point citizens will withdraw their support for the incumbent and decide to act against them.
Figure S6 in Supplementary Material Section A illustrates my experimental design. A pre-analysis plan is available at https://osf.io/kczh6, with the survey instrument presented in Appendix B of the pre-analysis plan.
Estimation Strategy
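Given my research design, the data I analyze are essentially right-censored survival data. Therefore, the Cox proportional hazards model is well-suited for my empirical analysis (Box-Steffensmeier and Jones Reference Box-Steffensmeier and Jones2004; for an experimental study that uses the same modeling strategy for such right-censored survival data, see Kertzer Reference Kertzer2017). I estimate the following preregistered equation:
$${h_i}\left( t \right) = {h_0}\left( t \right)\exp \left( {{\beta _1}{\rm{Increasing}}_i + {\beta _2}{\rm{Decreasing}}_i + {\beta _3}{\rm{Sporadic}}_i + {\bf{X}}_i^{\rm{'}}{\boldsymbol{\gamma }}} \right) \qquad (1)$$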
where $t$ is the number of periods until the governor is recalled. The baseline hazard function is ${h_0}\left( t \right)$, and the individual hazard function is ${h_i}\left( t \right)$. The dummy variables ${\rm{Increasing}}_i$, ${\rm{Decreasing}}_i$, and ${\rm{Sporadic}}_i$ indicate whether respondent $i$ was randomly assigned to the Increasing Severity, Decreasing Severity, and Sporadic Severity conditions, respectively. The baseline group, therefore, is the No Transgressions condition. To improve the precision of my estimates, I control for preregistered covariates – sex, age, race, education, partisanship, and self-reported ideology – which are represented by ${\bf{X}}_i^{\rm{'}}$ in Equation (1). Given the analytical set-up, I will find strong support for H1 if ${\hat \beta _1} > 0$, ${\hat \beta _2} > 0$, and ${\hat \beta _3} > 0$. Moreover, comparing ${\hat \beta _1}$ and ${\hat \beta _2}$ will answer the question of whether incrementally increasing or decreasing the severity of democratic transgressions is less conducive to voter accountability.
Results
Recall Decisions
Panel A of Figure 1 summarizes the data by visualizing the Kaplan–Meier survival curves. Each curve shows the cumulative probability of not recalling the governor in each of six periods under the experimental environment. The steeper the curve, the higher the percentage of respondents eventually voting to recall the governor. Panel B of Figure 1 reports the treatment effects estimated from the preregistered Cox model, with full information (for example, hazard ratios and coefficient estimates of covariates) reported in Table S3 in Supplementary Material Section F. I further show that the treatment effect estimates do not differ between states with recall institutions already in place and those without (Figure S7). In Supplementary Material Section G, I report descriptive statistics of recall decision and governor approval in each period (Figures S8–S9).

Figure 1. Survival curves by experimental condition and treatment effects estimated from the preregistered Cox model.
Note: in Panel A, Kaplan–Meier survival curves for each experimental condition are plotted, summarizing the proportion of respondents insisting on not recalling the governor for each of the six experimental periods. In Panel B, positive coefficients indicate a greater likelihood of voting to recall the governor, compared to the No Transgressions condition. The Cox model with covariates is preregistered; the one without is not. Full information on the regression estimates is available in Table S3 in Supplementary Material Section F. For exploratory analysis of heterogeneous treatment effects, see Table S4 in Supplementary Material Section F.
Compared to the No Transgressions condition, respondents in the Increasing Severity condition were 37 per cent more likely to recall the governor in any given period (${\widehat \beta _1} = 0.315$, $s.e. = 0.053$, 95 per cent confidence interval (CI) of adjusted hazard ratio (AHR) $\left[ {1.24,1.52} \right]$, $p < 0.0001$). Those in the Decreasing Severity condition were 81 per cent more likely to recall the governor in any given period (${\widehat \beta _2} = 0.594$, $s.e. = 0.053$, 95 per cent CI of AHR $\left[ {1.63,2.01} \right]$, $p < 0.0001$), and those in the Sporadic Severity condition were 43 per cent more likely to do so compared to the No Transgressions condition (${\widehat \beta _3} = 0.359$, $s.e. = 0.053$, 95 per cent CI of AHR $\left[ {1.29,1.59} \right]$, $p < 0.0001$).
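The percentages above follow from exponentiating the Cox coefficients. The sketch below shows the arithmetic, using the rounded coefficients and standard errors reported in the text. It assumes a Wald-type interval with a critical value of 1.96, which may not be exactly how the reported intervals were computed; the function name `hazard_ratio_summary` is hypothetical.

```python
import math


def hazard_ratio_summary(beta, se, z_crit=1.96):
    """Convert a Cox coefficient and its standard error into a hazard ratio,
    the implied percentage change in the hazard, and a Wald-type interval
    for the hazard ratio (assumed method)."""
    hazard_ratio = math.exp(beta)
    return {
        "hazard_ratio": hazard_ratio,
        "pct_change": 100 * (hazard_ratio - 1),
        "ci": (math.exp(beta - z_crit * se), math.exp(beta + z_crit * se)),
    }


# Rounded estimates as reported in the text (baseline: No Transgressions).
REPORTED = {
    "Increasing Severity": (0.315, 0.053),
    "Decreasing Severity": (0.594, 0.053),
    "Sporadic Severity": (0.359, 0.053),
}

for condition, (beta, se) in REPORTED.items():
    s = hazard_ratio_summary(beta, se)
    print(f"{condition}: HR = {s['hazard_ratio']:.2f}, "
          f"change = {s['pct_change']:.0f}%, "
          f"CI = [{s['ci'][0]:.2f}, {s['ci'][1]:.2f}]")
```

Because the inputs are rounded to three decimals, results computed this way can differ in the last digit from values computed on unrounded estimates.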
These results offer strong support for H1. Respondents were more willing to recall the governor when the governor transgressed democracy step by step, compared to when the governor took a sequence of politically controversial actions that did not violate democratic principles. This finding holds regardless of the sequence of transgressions. It suggests not only that respondents were able to distinguish between democratic transgressions and politically controversial actions, but also that they were more willing to punish the governor who engaged in democratic transgressions than to punish the controversial governor for competence or policy-related reasons. My results therefore support a less alarmist interpretation of findings from existing candidate-choice experiments that ‘citizens are not completely blinded by either partisanship or policy agreement when facing undemocratic politicians’ (Frederiksen Reference Frederiksen2024, 766).
But does the sequence of democratic transgressions matter, and how? Comparing the hazard rates across different sequences of transgressions, I find that the likelihood of recalling the governor is the highest in the Decreasing Severity condition. Respondents in this condition were 26 per cent more likely to recall the governor compared to those in the Sporadic Severity condition (${\widehat \beta _2} - {\widehat \beta _3} = 0.235$, $s.e. = 0.048$, 95 per cent CI of AHR $\left[ {1.15,1.39} \right]$, $p < 0.0001$) and 32 per cent more likely to do so compared to those in the Increasing Severity condition (${\widehat \beta _2} - {\widehat \beta _1} = 0.280$, $s.e. = 0.049$, 95 per cent CI of AHR $\left[ {1.20,1.45} \right]$, $p < 0.0001$). These results provide strong evidence that political survival is least likely in the Decreasing Severity condition. Moreover, respondents in the Increasing Severity condition were 4.4 per cent less likely to recall the governor in any given period compared to those in the Sporadic Severity condition (${\widehat \beta _1} - {\widehat \beta _3} = - 0.045$, $s.e. = 0.048$, 95 per cent CI of AHR $\left[ {0.87,1.05} \right]$, $p = 0.355$). The estimate, however, is not statistically significant. One firm conclusion that can be drawn from the data is that governors who start off by severely transgressing democracy are most likely to face voter backlash promptly. Another interesting insight is that transgressing democracy sporadically could be as fruitful a strategy for undemocratic incumbents as incrementally increasing the severity of transgressions.Footnote 11
These findings not only offer first-cut experimental evidence that the sequence of transgressions matters, but also suggest that – under the same set of democratic transgressions – an incumbent who incrementally decreases the severity of transgressions is likely to pose weaker threats to democracy than an incumbent who increases the severity step by step or an incumbent who transgresses democracy with a more random sequence. Although we have reason to believe that an undemocratic incumbent can maximize their chances of political survival by incrementally decreasing the severity of transgressions, the data are less consistent with this theoretical possibility.
One potential reason why the strategy of decreasing severity does not bear fruit is anchoring bias, a psychological phenomenon where individuals often rely heavily on the first piece of information they encounter – the ‘anchor’ – when making future judgments and decisions (Tversky and Kahneman Reference Tversky and Kahneman1974). Under the strategy of decreasing severity, the fact that the incumbent’s first strike is so undemocratic could make citizens unusually alert to the undemocratic tendency and authoritarian intentions of the incumbent. A complementary explanation is that some individuals judge the incumbent based on the accumulation of events. These individuals would hold the incumbent accountable if their level of ‘undemocraticness’ reached a certain threshold. Mechanically, an incumbent starting with the most severe transgressions would reach this threshold sooner. An observable implication from these explanations is that, in my experiment, before the governor could exploit reference-dependent preferences in later periods to make themselves look ‘democratic’ (Grillo and Prato Reference Grillo and Prato2023), many respondents had already recalled the governor in earlier periods.
To probe further, I turn to comparing the Kaplan–Meier survival curves – non-parametric statistics for survival data – between the experimental conditions. While this analysis is not preregistered, the survival curves provide a useful and transparent summary of the data, especially when the data suggest that the proportional hazards assumption is not appropriately met for the semiparametric Cox model (see Table S5 in Supplementary Material Section F). To facilitate comparisons of the survival curves non-parametrically, Figure 2 plots different pairs of survival curves across the experimental conditions. Take the red curve in the first panel of Figure 2. In the first period, the curve drops from 1.00 to 0.70, indicating that 30 per cent of respondents in the Increasing Severity condition already voted to recall the governor after seeing the first, least severe democratic transgression. In the second period, the curve further drops from 0.70 to 0.54, indicating that 16 per cent of respondents, after refusing to recall the governor in the first period, decided to recall after seeing the second transgression; only 54 per cent of respondents had not recalled the governor at this point. In addition to showing the survival curves for each experimental condition, I conduct log-rank tests to assess the null hypothesis of no difference between a given pair of survival curves. A low $p$-value from the test suggests that the survival curves statistically differ. These tests thus complement my analysis of the Cox model.
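How a survival curve of this kind is read can be illustrated with a small Kaplan–Meier computation. The sketch below is not the author's code, and the data are hypothetical: 100 invented respondents whose first two periods are chosen to echo the 1.00, 0.70, 0.54 drops described above, with the remaining counts made up. Outcomes follow the coding used in the experiment (first period with a recall vote, or 7 if the respondent never voted to recall).

```python
from collections import Counter


def kaplan_meier(outcomes, n_periods=6):
    """Kaplan-Meier survival estimates for discrete periods 1..n_periods.

    `outcomes` holds each respondent's first recall period; a value of
    n_periods + 1 means the respondent never recalled (censored after the
    final period)."""
    events = Counter(o for o in outcomes if o <= n_periods)
    at_risk = len(outcomes)
    survival = 1.0
    curve = []
    for period in range(1, n_periods + 1):
        recalls = events.get(period, 0)
        if at_risk > 0:
            # Multiply by the share of at-risk respondents who did not recall.
            survival *= 1 - recalls / at_risk
        curve.append(survival)
        at_risk -= recalls
    return curve


# Hypothetical data: number of respondents by outcome (7 = never recalled).
toy_counts = {1: 30, 2: 16, 3: 10, 4: 8, 5: 8, 6: 9, 7: 19}
toy_outcomes = [o for o, n in toy_counts.items() for _ in range(n)]

for period, s in enumerate(kaplan_meier(toy_outcomes), start=1):
    print(f"period {period}: share not yet recalling = {s:.2f}")
```

Because censoring in this design occurs only after the final period, the estimate at each period equals the raw share of respondents who have not yet voted to recall.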

Figure 2. Survival curves and log-rank tests for pairs of experimental conditions.
Note: shaded areas indicate 95 per cent confidence intervals. Log-rank tests formally compare the statistical difference between two Kaplan–Meier survival curves, with the null hypothesis being that the two curves are equivalent. This non-parametric analysis is not preregistered.
The analysis generates three insights. First, the probabilities of not recalling the governor are the highest in the No Transgressions condition and particularly low in early periods under the Decreasing Severity condition. This finding corroborates the main results from the Cox model associated with H1 and RQ: willingness to recall was substantially higher when the governor transgressed democracy, and recalls were timelier when the governor started with the most severe transgressions. Some respondents appeared able to recognize the initial big moves by the governor and recall correspondingly.
Second, on the flip side, even if the governor started with the most severe transgression, almost half of respondents (48 per cent) did not instantly recall the governor in the Decreasing Severity condition (see second panel in Figure 2). Yet among the 497 respondents who did not recall the governor immediately, 164 of them (33 per cent) already decided to recall in the next period when they saw another severe transgression. This empirical pattern suggests that temporal contexts matter in influencing voter behavior amid democratic backsliding. Voters need not punish the incumbent immediately following the first democratic transgression; rather, they can hold them accountable in the following period when the incumbent continues to act undemocratically.
Third, regardless of the sequence of democratic transgressions, the percentage of respondents insisting on not recalling the governor remained stable. This insight can be drawn by comparing the end nodes of the survival curves in the sixth period (Figure 2). Take the final panel, which plots the survival curves for the Increasing Severity and Decreasing Severity conditions. While the curves differ significantly in earlier periods (such that more respondents in the former condition did not recall the governor between the first and fifth periods), they converge at the same point in the final period (such that nearly equal percentages of respondents in both conditions insisted on not recalling the governor throughout all six periods).Footnote 12 These respondents – what I call ‘loyal supporters’ of the governor – accounted for 18 to 19 per cent of the sample in the Increasing Severity, Decreasing Severity, and Sporadic Severity conditions. These numbers contrast sharply with the percentage of loyal supporters in the No Transgressions condition, where 40 per cent of respondents refused to recall the governor in every single period. An exploratory two-proportion Z-test indicates that the difference in proportions, 21.2 percentage points, is statistically significant (${\chi ^2}\left( 1 \right) = 197$, 95 per cent CI $\left[ {18.0,24.5} \right]$ percentage points, $p < 0.0001$). Loyal supporters of the governor were much more prevalent where democratic transgressions were absent.
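The logic of a two-proportion test can be sketched as follows. The counts are deliberately round, hypothetical numbers (40 per cent versus 19 per cent of 1,000 respondents each), not the study's data, so the output will not reproduce the reported statistics. The function name is hypothetical, and the interval is a simple Wald interval; statistical software may additionally apply a continuity correction.

```python
import math


def two_proportion_test(x1, n1, x2, n2, z_crit=1.96):
    """Compare two proportions x1/n1 and x2/n2.

    Returns the difference, the z statistic based on the pooled proportion,
    its square (the Pearson chi-square with 1 degree of freedom when no
    continuity correction is used), and a Wald interval for the difference."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    pooled = (x1 + x2) / (n1 + n2)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = diff / se_pooled
    se_unpooled = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)
    return {"diff": diff, "z": z, "chi2": z * z, "ci": ci}


# Hypothetical counts: 400 of 1,000 versus 190 of 1,000 loyal supporters.
result = two_proportion_test(400, 1000, 190, 1000)
print(f"difference = {100 * result['diff']:.1f} percentage points")
print(f"chi-square(1) = {result['chi2']:.1f}")
print(f"95% CI = [{100 * result['ci'][0]:.1f}, {100 * result['ci'][1]:.1f}]")
```

Expressing the difference and its interval in percentage points keeps the comparison on the same scale as the shares of loyal supporters reported for each condition.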
Affective Polarization and Democratic Commitment in a Dynamic Environment
Who are these loyal supporters refusing to recall the in-party, policy-congruent governor in all periods? I conduct exploratory analyses by using pretreatment covariates to predict loyal supporters in each experimental condition. Across all conditions, I find that affective polarization – measured by using standard in- and out-party feeling thermometers – is positively correlated with loyal support for the governor (Figure 3). However, while the statistical correlation is generally robust, the substantive size is small: a one standard-deviation increase in affective polarization is only associated with a two to five percentage-point increase in the likelihood of not recalling the governor in all periods. This result appears to occupy a middle ground between notable claims in the literature about how affective polarization can undermine democratic norms (Iyengar et al. Reference Iyengar, Lelkes, Levendusky, Malhotra and Westwood2019; Kingzette et al. Reference Kingzette, Druckman, Klar, Krupnikov, Levendusky and Ryan2021; McCoy and Somer Reference McCoy and Somer2019; Pierson and Schickler Reference Pierson and Schickler2020) and more recent findings that cast doubt on the role of affective polarization in eroding such norms (Broockman et al. Reference Broockman, Kalla and Westwood2023; Holliday et al. Reference Holliday, Iyengar, Lelkes and Westwood2024; Voelkel et al. Reference Voelkel, Chu, Stagnaro, Mernyk, Redekopp, Pink, Druckman, Rand and Willer2023). Apart from affective polarization, I explored other dispositional characteristics – including self-reported ideology (Yeung and Quek Reference Yeung and Quek2025) and anti-establishment orientations (Uscinski et al. Reference Uscinski, Enders, Seelig, Klofstad, Funchion, Everett, Wuchty, Premaratne and Murthi2021) – and did not find consistently reliable predictors of loyal support for the governor (see Table S7 in Supplementary Material Section G).

Figure 3. Coefficient plot of estimated relationship between affective polarization and loyal support for the governor across experimental conditions.
Note: positive coefficients indicate a greater likelihood of refusing to vote to recall the governor in all periods. Because the measure of partisan affect is standardized and the dependent variable is binary, an estimate of 0.05, for example, means that a one standard-deviation increase in affective polarization predicts a five percentage-point increase in the likelihood of being a loyal supporter of the governor (that is, not voting to recall the governor in all periods). This analysis is exploratory. Full information on the regression estimates is available in Table S8 in Supplementary Material Section G.
I further analyze the role of partisanship in shaping respondents’ willingness to recall the governor in a dynamic setting. Figure 4 plots the survival curves by party identification for each experimental condition. While the survival curves between different political camps largely overlap, Democrats – across all experimental conditions where the governor transgressed democracy – voted to recall the governor slightly sooner than Republicans and Independents. Moreover, Independents were especially slow and reluctant to recall in the Decreasing Severity condition. While previous work finds that ‘[m]ost voters are partisans first and democrats only second’ and that ‘[s]upporters of both parties employ a partisan “double standard”’ (Graham and Svolik Reference Graham and Svolik2020, 393), my experiment cuts against this finding by showing that when policy interests are at stake, Independents can be even slower to respond to democratic backsliding than Democrats and Republicans.

Figure 4. Survival curves by experimental condition and party identification.
Note: shaded areas indicate 95 per cent confidence intervals. Log-rank tests formally compare the statistical difference between the three Kaplan–Meier survival curves in each experimental condition, with the null hypothesis being that the three curves are equivalent. This analysis is exploratory.
While this empirical discovery is important and invites follow-up research to unpack the mechanism, I use the available data to probe several explanations. First, I consider the possibility that Independents particularly preferred strong-man politics, such that they were more willing than Democrats and Republicans to prioritize policy benefits over democratic principles. Yet this explanation is less likely because preferences for strong-man politics – proxied by a measure of anti-establishment orientations – were not especially strong among Independents (Supplementary Material Section G.7). In Supplementary Material Section G.8, I further rule out a less theoretical but more mechanical explanation that Independents were simply less engaged in the survey – and consequently made recall decisions differently – by showing that they spent as much time as Democrats and Republicans did in the experiment (Figure S11). While the mechanism remains unclear and warrants further research, my results provide evidence that partisanship, when disentangled from policy substance, may not have a strong, independent effect on Americans’ support for democratic transgressions (Orr et al. Reference Orr, Fowler and Huber2023). They also speak to my finding on a relatively weak relationship between affective polarization and loyal support for the governor.
The bottom line is that while much of the existing literature suggests that voters are generally reluctant to punish their co-partisan elites for democratic transgressions, my experiment reveals considerable willingness among the American public to recall a co-partisan governor, even after relatively minor violations. This finding underscores that different research designs can yield drastically different conclusions about voters’ willingness to punish democratic transgressions (Frederiksen Reference Frederiksen2024, 778). Prominent candidate-choice experiments aim to capture how much voters take democracy into account when voting between in-party and out-party candidates (Supplementary Material Section A), with the implicit assumption that respondents in these settings infer from the candidates’ past actions that they will continue to undermine democracy after being elected. By contrast, my design directly informed respondents about the specific transgressions that the co-partisan incumbent had already committed; it also did not explicitly pit the incumbent against any other candidates. To the extent that electing an out-party candidate in the next election is not the voters’ only means to punish the co-partisan incumbent (as reflected by my design based on the recall institution), my study suggests that research on democratic backsliding would benefit immensely from understanding co-partisans’ backlash against undemocratic politicians using alternative reasonable approaches.
Generalizability, Limitations, and Extensions
How generalizable are the findings beyond the United States? While a definitive answer to this question is not possible unless replications are conducted in other contexts, one could speculate that because the psychological mechanisms (for example, anchoring and status quo bias) potentially driving the results are general, the impacts of transgression sequence on individual political attitudes and behaviors may not be United States-specific. Given the importance of understanding voter responses to democratic erosion, future research should extend my study to other electoral democracies.
Another threat to generalizability concerns the specific institutional setting that characterized my research design. In addition to the uniqueness of recall as a political institution, the empirical focus on subnational democratic backsliding may limit the generalizability of results to broader national settings. One could argue that because national politics has perceivably higher stakes than subnational politics, voters’ willingness to punish undemocratic incumbents – and how they react to different sequences of transgressions – may substantively change. While understanding subnational dynamics of democratic backsliding is extremely important in both American (Grumbach Reference Grumbach2022) and comparative (Michel Reference Michel2024) contexts, it would be useful for future research to transform my experiment into a national setting.
Although my experiment was motivated by the theoretical literature and generated new empirical insights, it was not designed to test a specific game-theoretic model; to do so would require clear specifications of the particular institutional arrangements baked into the model’s assumptions. Instead, I studied a general environment in which the prospect of removal is constantly present, which could be manifested not only in the form of recall (the measure in my experiment) but also in the form of collective action (for example, anti-incumbent protests; see Gamboa 2022). To the extent that recall decisions – which capture respondents’ willingness to strongly punish the incumbent – serve as a reasonable proxy for public opinion, the utility of my experimental paradigm and the new insights generated from it are not restricted by the uniqueness of recall as an institution. It remains an empirical question whether sequence matters in the same way for milder forms of punishment in the United States and for other forms of voter punishment that may be available in other institutional contexts.
Given this discussion, a natural extension of my experiment is to explore other political behaviors that also have ramifications for incumbent survival (for example, protest, signing petitions, and political donations). Future experiments may also add incumbents’ justifications for their actions and compare voter behavior in treatment and control groups where such rhetoric is or is not provided. This extension would illuminate how elite rhetoric dampens citizens’ aversion to democratic transgressions in a dynamic setting (Clayton et al. 2021; Stokes 2025).
An interesting observation to emerge from my pretest is that respondents’ average perceptions of the harmfulness of each democratic transgression did not vary substantially, with the least severe one rated at 50 and the most severe one rated at 68 on a 101-point scale. Reasonable skeptics may ask: why would varying the sequence of anti-democratic actions, which vary so little in mass perceptions of severity, generate a substantial treatment effect? While the lack of variation is partly a design feature given my focus on piecemeal democratic transgressions – instead of extreme actions (for example, staging a coup and launching violent attacks on political opponents) that would be seen as substantially more severe – and is partly attributable to random measurement error (Westwood et al. 2022) and some respondents rating extreme values for every transgression in the pretest, it may also be due to some individuals having drastically different evaluations of severity in the first place (see footnote 13). To overcome measurement challenges, a potentially fruitful approach is to use tailored experiments (Velez and Liu 2025), where the researcher measures respondents’ prior attitudes toward different transgressions and subsequently exposes them to action sequences that map onto their subjective perceptions of severity (see footnote 14). Measurement challenges notwithstanding, the substantial treatment effects uncovered by my experiment, which are essentially ITT estimates, provide even stronger evidence that sequence plays an important role in shaping voter accountability (see footnote 7).
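As a purely illustrative sketch of the tailoring idea described above, the snippet below orders a set of transgressions by a single respondent's own severity ratings rather than by sample averages. The function name, the transgression labels, and the example ratings are hypothetical; the sketch assumes ratings on the same 0 to 100 scale used in the pretest and is not the procedure used in this study or in the tailored-experiment literature.

```python
from typing import Dict, List


def tailored_sequence(ratings: Dict[str, float], order: str = "increasing") -> List[str]:
    """Order transgressions by one respondent's own severity ratings.

    ratings: hypothetical mapping from a transgression label to that
             respondent's rating on a 0-100 (101-point) scale.
    order:   "increasing" (least to most severe) or "decreasing".
    """
    if order not in ("increasing", "decreasing"):
        raise ValueError("order must be 'increasing' or 'decreasing'")
    for label, value in ratings.items():
        if not 0 <= value <= 100:
            raise ValueError(f"rating for {label!r} is outside the 0-100 scale")
    # sorted() is stable, so tied ratings keep their original relative order.
    return sorted(ratings, key=ratings.get, reverse=(order == "decreasing"))


# Hypothetical respondent; labels and values are made up for illustration.
example_ratings = {"action_a": 50, "action_b": 68, "action_c": 59}
print(tailored_sequence(example_ratings, "increasing"))
# ['action_a', 'action_c', 'action_b']
```

Under this approach, two respondents assigned to the same "increasing" condition could see different concrete orderings, because each ordering reflects that respondent's subjective ranking of severity.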
One potential critique of my experiment is that it lacks mundane realism. For instance, hypothetical rather than real politicians were used in the experiment. In reality, voters might develop psychological attachment to the incumbent and would consequently be less willing to recall them. As such, even though the incumbent’s partisanship and policy preferences were restricted to be the same as those of respondents, my experiment might provide a more generous estimate of Americans’ willingness to punish undemocratic politicians. While future research could build on my experiment by introducing real-world politicians, the use of hypothetical politicians in my experiment accords with existing experiments on democratic backsliding (Supplementary Material Section A). Juxtaposing my experiment with this literature, one takeaway is that, compared to prominent experiments relying on static approaches, my dynamic experiment offers a more nuanced – and perhaps less pessimistic – account of the American public’s willingness to punish undemocratic politicians.
Another potential concern about mundane realism is that my design did not incorporate periods of regular actions taken by the governor but instead presented the undemocratic governor as continuously transgressing democracy. To offer a first test of the impact of transgression sequence on vertical accountability, I simplified the setting by abstracting from unrelated actions that the incumbent could take simultaneously in the real world, while intending the design to capture sequential dynamics that would, in practice, have periods of non-action mixed in. The underlying assumption, in other words, is that voters can call to mind the incumbent’s prior behaviors, which appears realistic given that other elites in real-world politics would have incentives to make the incumbent’s democratic transgressions clear in campaigns. To relax this assumption, future experiments could introduce periods of non-transgression and investigate whether such noise would constrain the effects of transgression sequence on voter behavior.
Conclusion
Democratic backsliding, by definition, is an incremental process. While its dynamic aspect is a core feature in the theoretical literature, existing empirical scholarship has yet to develop corresponding research designs to capture the accumulative and sequential nature of democratic backsliding. Devising a new experimental paradigm to study voter behavior amid incremental democratic transgressions, I find that American respondents took the shortest period to recall governors who decreased the severity of transgressions step by step. This finding helps explain why authoritarian-minded incumbents in the real world often refrain from severely challenging democracy when they take office, even though this political strategy is theoretically rationalizable. Surprisingly, while I find that the strategy of increasing severity may be appealing for backsliding incumbents, the evidence also suggests that transgressing democracy sporadically could be as effective. By providing first-cut experimental evidence to show how different processes of democratic erosion may induce different levels of threat to democracy, my study underscores the importance of studying democratic backsliding dynamically.
This study also contributes to the empirical literature on democratic backsliding. Using a predominantly static approach, existing experiments that assess voters’ willingness to defend democracy have generated rich but mixed insights. Comparing vote choice between democratic and undemocratic candidates in a static setting, some studies find that voters are generally willing to punish the latter against the backdrop of partisan and policy interests (Carey et al. 2022; Gidengil et al. 2022; Wunsch et al. 2025). Using the same or a similarly static approach, other studies conclude that many citizens are significantly less willing to punish undemocratic candidates in the presence of partisan or policy benefits (Graham and Svolik 2020; Simonovits et al. 2022; Svolik et al. 2023). While these studies are informative and lay important foundations for deeper investigations into the role of partisanship and policy preferences in shaping public support for democratic backsliding, a fundamental gap in the literature is citizens’ willingness to punish the elected incumbent amid piecemeal democratic transgressions. Measuring voter behavior regarding a single politician in multiple periods, I show that even when the elected incumbent and the respondents share the same partisanship and policy interests, most individuals are willing to vote out the incumbent when the incumbent’s undemocratic nature has become clear to them. While an important line of work casts doubt on voters’ willingness to safeguard democracy in the shadow of polarization (for example, Iyengar et al. 2019; Kingzette et al. 2021; McCoy and Somer 2019), my study – by taking a dynamic approach that allows individuals to observe and revise their beliefs about the incumbent’s intentions sequentially – paints a more nuanced and less gloomy picture of Americans’ willingness to defend democracy.
Finally, this study makes a methodological contribution to the booming empirical scholarship on democratic backsliding. My experimental paradigm captures the incremental nature of democratic backsliding and maps onto the theoretical foundation of the concept. Future research would benefit from building on my experimental framework to study how citizens respond to democratic backsliding when it unfolds incrementally. By making simple adjustments to the outcome measure, vignette background, or sample selection, future scholarship can ask new and timely empirical questions about how voters respond to democratic backsliding, which have been impossible to tackle due to the constraint of previous static approaches. For example, at which point are individuals willing to take costly political actions to co-ordinate against the incumbent in the midst of democratic backsliding? How would the intertemporal dynamics change when the elected incumbent and the citizen do not share the same partisanship and/or policy interests in the first place? Are the experimental findings uncovered in this study generalizable to other democracies where voters may be less sensitive to incremental violations of democratic principles? Given the role of the public in safeguarding democracy and the importance of citizens’ co-ordination against undemocratic incumbents on time, this article, by introducing a new experimental paradigm to study these questions empirically, makes a timely contribution to the literature on democratic backsliding.
Supplementary material
To view supplementary material for this article, please visit https://doi.org/10.1017/S0007123425100847
Data availability statement
Replication data for this article can be found in Harvard Dataverse at: https://doi.org/10.7910/DVN/IK26V9.
Acknowledgments
I am grateful to Jamie Druckman for his mentorship and support for this work. I also thank Alejandra Aldridge, Haley Allen DeMarco, Steve Bai, Zenobia Chan, Chris Carter, Jen Gandhi, Nayun Kim, Gary Leung, Samuel Liu, Jiaqian Ni, Zac Peskowitz, Megan Turnbull, Kai Quek, Zoey Xu, Qixuan Yang, participants in SoWEPS-7, and audiences in the annual meetings of AAPOR, APSA, EPOVB, and IHS for feedback. Zeynep Somer-Topcu and three anonymous reviewers at BJPS provided valuable suggestions and advice. All errors are my own.
Financial support
Support for this research was provided by the American Political Science Association’s EPOVB Early-Career Fellowship, the Institute of Humane Studies’ Junior Fellowship and Humane Studies Fellowship (grant no. IHS017776), and Emory University’s Prudentis Awards and Professional Development Support Funds.
Competing interests
None.
Ethical standards
This research was conducted in accordance with the protocols approved by Emory University Institutional Review Board (IRB ID: STUDY00006785).

