
Evaluation of third-party punishment depends on its type and severity

Published online by Cambridge University Press:  20 November 2025

Olivia Seubert*
Affiliation:
Department of Psychology, University of Würzburg, Germany
Anne Böckler
Affiliation:
Department of Psychology, University of Würzburg, Germany
* Corresponding author: Olivia Seubert; Email: o.seubert@icloud.com

Abstract

Sacrificing one’s own resources to punish norm violators is often regarded as an altruistic act that promotes cooperation and fairness within social groups. However, recent studies highlight difficulties in interpreting third-party punishment as a prosocial and cooperative signal. Moving beyond the abstract, decontextualized settings typically employed in economic game paradigms, we aimed to better understand the appraisal of observed punishment and punishers in real-world situations. To this end, we created and validated 24 written vignettes of everyday-life scenarios depicting interactions between a perpetrator, a victim, and a punisher. Across two preregistered experiments, we systematically manipulated key aspects of third-party punishment: transgression type and punishment type (property-oriented, corporal, or psychological; Experiment 1; N = 48) and punishment severity (weak or strong; Experiment 2; N = 50). Participants rated punishment adequacy and the punisher’s warmth, competence, and suitability as an interaction partner, whether as a friend or team leader. Results indicated preferences for psychological punishments, punishments that aligned with transgression type, and less severe punishments. Our findings support the notion that punishment is an ambiguous issue and reveal important contextual factors that contribute to its evaluation as a useful social strategy.

Information

Type
Empirical Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of Society for Judgment and Decision Making and European Association for Decision Making

1. Introduction

Picture an impatient commuter pushing past someone to board the bus first. In response, another person in the queue openly rebukes the shover for their rudeness. As an observer, would you deem the intervenor’s response appropriate? How would you judge this person? In the depicted situation, an uninvolved third party invests their own resources (e.g., time and effort) to punish a perpetrator, despite not being directly affected. This type of behavior is referred to as third-party punishment (TPP; Eisenberg and Miller, 1987; Fehr and Fischbacher, 2004) and can benefit societies by increasing cooperation and conformity to social norms (e.g., Fehr and Gächter, 2002; Henrich et al., 2006; Spitzer et al., 2007). Countless studies, predominantly using economic game paradigms, demonstrate that approximately 50% of participants are willing to pay a personal price to inflict a reciprocal cost on an unfair social partner (for an overview, see Fehr and Fischbacher, 2004; Nowak and Sigmund, 2005; van Dijk and De Dreu, 2021).

But punishing is an ambivalent endeavor, and it is perceived as such. On the one hand, it can serve as a positive signal: Punishing a perpetrator, like compensating a victim, indicates empathic concern and compassion for the victim (Klimecki et al., 2016; Leliveld et al., 2012; Pfattheicher et al., 2019). Studies revealed that participants trust and reward third-party punishers more than individuals who do nothing (Barclay, 2006; Jordan et al., 2016; Raihani and Bshary, 2015; Vaish et al., 2016) and are more inclined to choose them as new social partners (Batistoni et al., 2022; Jordan et al., 2016; Kurzban et al., 2007). On the other hand, although sometimes conceptualized as an altruistic behavior, TPP can be detrimental and stigmatizing for punishers. Punishment is associated with anger and aggression (Eriksson et al., 2016; Lee and Warneken, 2020; Rodrigues et al., 2018) and spiteful motives like revenge and retaliation (Raihani and Bshary, 2019; van Doorn et al., 2018). Punishers induce fear in bystanders, who may conclude that they themselves risk harsh treatment in future interactions (Balafoutas et al., 2016; Delton and Krasnow, 2017; Fehr and Gächter, 2002). In sum, TPP can signal both cooperation and aggression and, accordingly, has ambivalent reputational benefits (Dhaliwal et al., 2021; Panchanathan and Boyd, 2004).

Research on TPP decisions and their evaluation typically relies on economic game paradigms, which document third-party responses in controlled, incentivized settings (for an overview, see Thielmann et al., 2021). This approach has yielded important insights into conditions under which punishment is viewed as an adequate intervention strategy. Characteristics of the witnessed norm violation can justify punishment, for instance, whether the perpetrator harmed the victim on purpose (Buckholtz et al., 2008) or how grave the transgression was (e.g., Civai et al., 2019). Third parties use information such as severity (i.e., coins taken by the perpetrator) to calibrate a fitting response, and punishment becomes more likely and acceptable as transgression severity increases, in both adults (Stallen et al., 2018) and children (Arini et al., 2021). However, the predominant use of decontextualized, anonymous interactions with minimal monetary penalties (‘for every cent you invest, the other loses three cents’) fails to capture the rich variety of punishment strategies available in everyday life and does not allow researchers to address how core features of punishment, such as its type, impact punishment evaluations (see, e.g., Molho et al., 2020; Raihani and Bshary, 2019). Punitive strategies encompass various behaviors, including verbal rebukes, social ostracism, or physical aggression, which may be (interpreted as) more adequate than financial sanctions in many contexts (Balliet et al., 2022; Molho et al., 2020). Moving beyond abstract paradigms, whose limited external validity and contextual flexibility challenge the generalizability of findings (e.g., discussed in Batistoni et al., 2022; Raihani et al., 2012), we aim to investigate which punishment strategies enhance or harm the punisher’s reputation and under what conditions specific punitive actions are viewed as useful tools for enforcing norms in more naturalistic settings.

1.1. Types of punishment

Monetary, property-oriented sanctions, as used in economic game studies, are a key and common strategy for prosecuting wrongdoers in real-life societies via legal institutions (Carlsmith et al., 2002; Guala, 2012; Schoenmakers et al., 2014), especially for minor offenses (Statistisches Bundesamt, 2022). Less attention has been devoted to the broad range of informal punitive responses that people employ in daily life settings. Psychological strategies can be observed from an early age: Toddlers express verbal protest upon witnessing property damage (Vaish et al., 2011), and adults confront perpetrators directly through rebukes and harsh criticism (Masclet et al., 2003; Wiessner, 2005), as indicated in our initial example. More indirect social tactics encompass gossiping (Feinberg et al., 2014; Giardini et al., 2021) and social exclusion (Beersma and Van Kleef, 2011; Dimitroff et al., 2020; Dunbar, 2004; Guala, 2012), with the latter leading to higher contributions in public goods experiments (Cinyabuguma et al., 2005) and reduced selfish behavior in both human (Turnbull, 1961) and animal groups (Carter, 2014; Krama et al., 2012). Next, justice restoration can extend to corporal punishment. Physical violence against individuals or groups, for instance whipping or branding, was acceptable across European society until the late eighteenth century (King, 2006; Straus, 1991). Even today, more than 28,000 people are under sentence of death worldwide (Death Penalty Information Center, 2023). In everyday life, corporal punishment persists as a prevalent strategy for addressing misbehavior, particularly in child rearing, with about 17% of adolescents globally reporting recent experiences of such disciplining at school or home (Elgar et al., 2018).

Which of these punishment types (property-oriented, corporal, or psychological) are considered appropriate, and when? Given its immediate and confrontational nature, corporal punishment as a direct violation of another’s body is often associated with aggressiveness and hostility (Franklin-Luther and Volk, 2022; Gershoff, 2002) and can trigger stronger moral outrage and anger in observers than non-corporal harm (Asp et al., 2019; Eriksson et al., 2016; Jackson et al., 2006). Accordingly, children perceived spanking as less fair compared to non-corporal punishments like withdrawing toys (Vittrup and Holden, 2010). In contrast, property-oriented punishment is less emotionally charged and may be considered less effective in inducing behavioral change (Larzelere and Kuhn, 2005; Nelissen and Mulder, 2013). Due to its formal and calculable character, though, financial punishment may communicate clear messages about norm violations and carry predictable external incentives against future misconduct through cost-benefit logic (Guala, 2012; Molho et al., 2022). A potent person-related alternative, avoiding both physical and property harm, is psychological punishment. It was advocated as the preferred strategy in everyday life (Elbla, 2012; Molho et al., 2020), promoted punishers’ perceived competence (Chen and Xu, 2020), and fostered cooperation more effectively than material sanctions (Nelissen and Mulder, 2013; Wu et al., 2016) by signaling strong communal condemnation (Beersma and Van Kleef, 2011; Feinberg et al., 2014). So far, systematic comparisons and relative preferences between these three punishment strategies are sparse, especially within a unified empirical framework (but see, e.g., Vittrup and Holden, 2010, and Eriksson et al., 2016, for comparisons between punishment types).

1.2. Proportionality of punishment

Another critical feature shaping punishment appropriateness may be the fit between punishment and prior transgression. A central perspective on what constitutes ‘adequate’ punishment is the concept of retribution (Carlsmith, 2006), according to which TPP should reflect the nature and severity of the harm inflicted so that the wrongdoer experiences consequences similar to their own misdeeds. With different punishment strategies available, the third party can choose response types that match or differ from the transgression. When both measures align, observers may consider this as restoring balance, contributing to a subjective sense of justice (Goodwin and Benforado, 2015; Hofmann et al., 2018), and fostering the belief that the punisher acted thoughtfully and in accordance with societal expectations (Aldrovandi et al., 2013; Arini et al., 2021). To date, it remains an open empirical question whether similarity between the type of transgression and the type of punishment enhances perceived adequacy and the legitimacy of the punitive act.

1.3. Severity of punishment

The concept of ‘eye for an eye’ extends to the severity of the transgression and subsequent punishment. Third parties often deploy punishment proportionately to the severity of the prior norm violation (Carlsmith et al., 2002; Fehr and Gächter, 2002; Heffner and FeldmanHall, 2019), enhancing judgments of adequacy, fairness, and morality among observers (Balafoutas et al., 2016; Carlsmith et al., 2002). However, when punishment severity clearly exceeded that of the transgressions, punishers were seen as impulsive, less trustworthy, or even vengeful (Brandt et al., 2003; Dhaliwal et al., 2021). Studies comparing strong versus weak punishment tend to show greater disapproval for harsher sanctions, whether corporal or property-oriented (Eriksson et al., 2016; Lee and Warneken, 2020; Liu et al., 2021; Solomon and Lee, 2025). In contrast, harsher sanctions could effectively serve deterrent functions by clearly signaling strong criticism and preventing future violations by the culprit or by others (Balliet et al., 2011; Delton and Krasnow, 2017). Given these contradictory findings, the evaluation of punishment severity may critically depend on the situation and on the punishment type. Given the formalized setups typically used to investigate severity effects, it remains unknown when, how, and why severity matters.

1.4. The current study

We aimed to systematically investigate how type and severity of punishment and their relation to the norm violation affect evaluations of punishment and punisher in a formalized manner while capturing the complexity of real-life social interactions. Three situational TPP features were manipulated: type of transgression, type of punishment (property-oriented, corporal, or psychological; Experiment 1), and severity of punishment (weak or strong; Experiment 2). Using a vignette approach (as in, e.g., Buckholtz et al., 2008; Eriksson et al., 2017; Lieberman and Linke, 2007; Martin et al., 2019), we created and validated a diverse set of written hypothetical scenarios. Each vignette followed the classic TPP structure, entailing a transgression between a perpetrator and a victim and a subsequent third-party intervention. All punishments in our studies were acts of peer-to-peer punishment carried out by ordinary third parties who had no formal authority or institutional role over the perpetrator, reflecting decentralized, informal enforcement of social norms as observed in everyday interpersonal contexts (e.g., Fehr and Fischbacher, 2004).

Both experiments measured the effects of the manipulations on perceived punishment adequacy (Behnke et al., 2020; Hopkins et al., 2022) and evaluations of the punisher’s warmth and competence, two fundamental dimensions people typically rely on when judging others (stereotype content model; Abele et al., 2021; Fiske et al., 2002). While warmth includes attributes like kindness, empathy, and trustworthiness, competence pertains to capability and efficiency. In terms of social approach, perceived warmth can foster inclinations to form friendships (Cuddy et al., 2008), and perceived competence can contribute to viewing someone as a suitable leader (Cuddy et al., 2011; Fiske et al., 2007). Correspondingly, we assessed participants’ hypothetical willingness to interact with the punisher as a friend or team leader, extending findings on reputational benefits of altruistic punishment (e.g., Barclay, 2006; Jordan et al., 2016; Santos et al., 2011).

2. Experiment 1

Experiment 1 manipulated the type of transgression toward the victim and the type of punishment subsequently administered by the third party. Both acts could take property-oriented, corporal, or psychological form (examples below). After each vignette, participants rated the punishment, the punisher, and their willingness to interact with the punisher. We expected more favorable ratings of punishment and punisher following psychological and corporal transgressions compared to property-oriented transgressions, as the former are more direct and likely to inflict more emotional or physical pain, thus warranting punishment (H1; Asp et al., 2019; Dimitroff et al., 2020; Jackson et al., 2006; Smetana and Ball, 2019). Regarding punishment types, the literature supports diverging hypotheses. For one, property-oriented punishment may elicit more positive evaluations than corporal and psychological strategies (H2a; Asp et al., 2019; Vittrup and Holden, 2010), given its common and formal use and its perception as less aggressive since it targets objects rather than the perpetrator directly (Eriksson et al., 2016). Alternatively, property-oriented punishment could receive more negative ratings than corporal and psychological punishment (H2b), as observers may view social sanctions as more effective in promoting cooperation, ensuring a safe environment, and enhancing the group’s net benefit (Nelissen and Mulder, 2013; Wu et al., 2016). Finally, we expected more favorable evaluations across all five rating dimensions when the punishment type (e.g., property-oriented) matches the preceding transgression type (e.g., also property-oriented; H3; Aldrovandi et al., 2013; Carlsmith, 2006; Hofmann et al., 2018).

2.1. Methods

We report how the sample size was determined, all manipulations, collected measures, and data exclusions (Simmons et al., 2011). The preregistration for the experiments (https://osf.io/s7b3d/overview) and the stimulus material (https://osf.io/fr9bt/overview) are available on the Open Science Framework.

2.1.1. Development of study material

We created 24 scenarios, each depicting interactions among a perpetrator, a victim, and a punisher across distinct social contexts (learning, work, everyday life, and social relationships; see Figure 1). Each context included six different TPP scenarios, for example, involving roles like teachers, consultants, or actors in work-related settings. We manipulated each of the 24 scenarios according to all experimental conditions (Experiment 1: all combinations of type of transgression and type of punishment; Experiment 2: all combinations of severity of punishment and type of transgression/punishment), resulting in nine (Experiment 1) or six (Experiment 2) vignettes per scenario (see Table S1 in the Supplementary Material for an exemplary vignette set; Footnote 1). To illustrate the design with one scenario, an everyday life vignette involving handball players, transgression type was manipulated as follows: property-oriented: taking a teammate’s headphones; corporal: pushing a teammate aside; and psychological: mocking a teammate’s cheap clothing. Punishment type was manipulated as follows: property-oriented: tearing off the perpetrator’s keychain; corporal: seizing the perpetrator by the arm; and psychological: giving the cold shoulder to the perpetrator during training. For the manipulation of punishment severity in Experiment 2, the punisher in the corporal punishment example either pinches the shover’s arm (weak punishment, original punishment downgraded) or pulls the shover back forcefully, causing them to fall (strong punishment, original punishment intensified).

Figure 1 Overview of experiments, setup, and procedure.

Note: In the upper right part of the figure, the two experiments and their independent variables (IVs) are displayed. Each trial involved participants reading a vignette depicting a transgression and subsequent punishment, always representing one specific condition. Participants’ task was to picture the scenario and then rate the punishment, the punisher, and their interaction tendency on rating scales (displayed below the questions). These ratings comprised the five dependent variables (DVs).

Each of the three norm violation categories (property-oriented, corporal, and psychological) included various subtypes of offenses, guided by previous vignette-based research (e.g., Buckholtz et al., 2008; Martin et al., 2019) and everyday-life observations. As such, our setup allowed us to explore whether the overall effects of the type of transgression/punishment on participants’ evaluations were, for instance, driven by specific behaviors. Different harm subtypes could trigger distinct emotional and moral responses, subsequently shaping participants’ judgments of punishment appropriateness.

All vignettes were developed and tested in a pilot experiment to validate their suitability as experimental stimuli. For that purpose, we deconstructed the vignettes into individual, decontextualized components displaying the transgressions and punishments separately (e.g., ‘a person pinches another person’s arm’). Thirty-nine participants (M = 22.9 years, 28 female and 11 male) rated the statements on a seven-point Likert scale ranging from not at all morally reprehensible, wrong, or bad to extremely morally reprehensible, wrong, or bad. This approach allowed us to assess whether the individual components of our scenarios carried comparable moral weight, independent of their narrative context. Based on the results, we adjusted the vignettes to ensure comparable perceived wrongness across transgression and punishment types and severities (see Text S1 in the Supplementary Material). Further, we created male and female versions while maintaining gender consistency within vignettes and featured various age groups (children, adolescents, young adults, adults, and seniors) with identical ages within vignettes. Additionally, after expert focus-group discussions, we excluded punishments related to extensive planning or illegal activities and those that could be viewed as serving the punisher’s self-interest. Our refined scenarios aimed to balance external validity and experimental control to mitigate potential confounds, providing a solid foundation for our investigations and ensuring that observed differences in rating scores in the later experiments could be attributed to variations in the independent variables (IVs).

2.1.2. Participants

As outlined in the preregistration (https://osf.io/s7b3d/overview), required sample sizes were calculated a priori (using G*Power; Faul et al., 2007) with α = .05, power 1 − β = .8, and an estimated medium effect size f = .25 (as a conventional benchmark in the absence of robust empirical precedents) for both main effects and the interaction, resulting in a total of N = 34 per experiment. To account for possible dropouts and achieve equal cell sizes for all vignettes (requiring a multiple of 12, see Task and procedure section), we added 14 participants for a final sample of N = 48 per experiment. The present study adheres to the ethical standards of the 1964 Declaration of Helsinki regarding participant treatment in research. Participants gave informed consent prior to starting the respective experiment.

The 48 participants of Experiment 1 were students who were recruited via the University of Würzburg’s recruitment platform and compensated with course credit, or volunteers who participated without financial compensation. All participants met the preregistered inclusion criteria, which required fluency in German and passing at least 75% (three out of four) of attention checks designed to ensure concentration and engagement in the task. Mean age was 23.4 years (SD = 5.9, range 19–57) with 11 participants identifying as male, 36 as female, and 1 as diverse.

2.1.3. Task and procedure

The experiment, programmed in PsychoPy (Peirce et al., 2019), was accessed online via a link shared either through the recruitment platform or directly with participants and required a computer or laptop with a keyboard and mouse/trackpad. After completing demographic questions concerning age and gender, participants were instructed to read short vignettes describing hypothetical interactions between three parties. They were advised to carefully read each vignette and picture the situation as an outside observer. The instructions emphasized that all transgressions and punishments were intentional and that punishments always targeted the perpetrator.

Two practice trials preceded the experimental trials. Each trial started with a written vignette, followed by participants responding to five questions on seven-point Likert scales (see Figure 1). Questions for vignettes describing interactions between children required slight adjustments. The five questions were: (1) How adequate/appropriate do you perceive the punishment? (2) How trustworthy/upright/benevolent do you perceive the punisher? (3) How dominant/competent do you perceive the punisher? (4) Would you like the punisher to be your friend? [child option: If you were a kid yourself, how much would you like to befriend the punisher?] (5) Would you like the punisher to be your team leader? [child option: If you were a kid yourself, how much would you like to be in a group led by the punisher?]. All scales ranged from not at all to very, except the first question (punishment adequacy), which ranged from too weak to too strong with adequate in the center. To ensure data quality, four attention checks were intermixed with the vignettes, requiring participants to press a specified pole of the scale (e.g., Please press [not at all / very]).

2.1.4. Design and data analysis

Each experimental condition (according to the two factors type of transgression and type of punishment, each with the levels property-oriented, corporal, and psychological; nine conditions in total) was presented four times. The 36 experimental trials covered all four situational contexts (learning, work-related, everyday life, and social relational). To avoid memory effects, participants were exposed to a maximum of two vignettes from each of the 24 scenarios. Half of the trials involved male, and the other half female characters. To counterbalance gender and conditions, participants were evenly assigned to each of the 12 versions of possible vignette combinations.
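The trial arithmetic of this design can be made concrete with a short sketch. The code is purely illustrative and is not the authors' experiment code; the names `TYPES`, `REPETITIONS`, and `build_conditions` are hypothetical. It enumerates the nine conditions of the 3 × 3 design, counts the trials per participant, and checks that 48 participants divide evenly across the 12 counterbalancing versions.

```python
from itertools import product

# Levels shared by both factors (taken from the design description).
TYPES = ("property-oriented", "corporal", "psychological")
REPETITIONS = 4      # each condition is presented four times
N_VERSIONS = 12      # counterbalancing versions of vignette combinations
N_PARTICIPANTS = 48  # sample size of Experiment 1


def build_conditions():
    """Return all (transgression, punishment) pairs of the 3 x 3 design."""
    return list(product(TYPES, TYPES))


conditions = build_conditions()
n_trials = len(conditions) * REPETITIONS

# Participants per version; a non-zero remainder would mean unequal cells.
per_version, remainder = divmod(N_PARTICIPANTS, N_VERSIONS)

# Conditions in which punishment type matches transgression type (H3).
matching = [c for c in conditions if c[0] == c[1]]

print(len(conditions), "conditions,", n_trials, "trials per participant")
print(per_version, "participants per version, remainder", remainder)
print(len(matching), "matching conditions")
```

Under these numbers, three of the nine conditions are "matching" cells, which is the subset relevant to the similarity hypothesis.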

Experiment 1 followed a 3 × 3 within-subjects design with the factors type of transgression (property-oriented, corporal, or psychological) and type of punishment (property-oriented, corporal, or psychological). Separate repeated measures analyses of variance (rmANOVA) were conducted for each of the five ratings. Greenhouse–Geisser corrections were applied if Mauchly’s test showed a violation of sphericity. In case of significant main effects or interactions, we computed paired t-tests to locate the origin of the effects, with Bonferroni-Holm adjustments for multiple comparisons. Additionally, we calculated correlations for all five average ratings of the dependent variables (DVs). We report partial eta squared (ηp²) and Cohen’s d as effect sizes.
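For readers unfamiliar with the Bonferroni-Holm procedure, the following minimal sketch shows the standard step-down adjustment in plain Python. It is not the authors' analysis code (which is linked from the Results section); the function name `holm_adjust` and the example p-values are hypothetical and chosen only for illustration.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order.

    The smallest p-value is multiplied by m, the next by m - 1, and so on;
    a running maximum keeps the adjusted values monotone, and each value
    is capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical p-values for three pairwise comparisons (not study data).
example = [0.01, 0.04, 0.03]
print([round(p, 3) for p in holm_adjust(example)])
```

Compared with a plain Bonferroni correction, which multiplies every p-value by the number of comparisons, the step-down scheme is less conservative while still controlling the family-wise error rate.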

Finally, mediation analyses clarified whether punishment type affected the participants’ tendency to interact with the punisher, mediated by the evaluation of the punisher. Perceived warmth was tested as a mediator for the willingness to befriend the punisher, and perceived competence as a mediator for the willingness to join a team led by the punisher (according to the stereotype content model; Abele et al., 2021; Fiske et al., 2002). Mediation models were calculated using a bootstrapping-based approach with 1000 resampling simulations (Preacher and Hayes, 2008).
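The general logic of a bootstrapped indirect effect can be sketched as follows. This is a simplified, hedged illustration only: it assumes a single-level model (X → M → Y) estimated by ordinary least squares and a percentile confidence interval, whereas the authors' actual models, which involve repeated measures, may be specified differently. All function names and the toy numbers are hypothetical and are not study data.

```python
import random


def _cross(a, b):
    """Sum of cross-products of a and b around their means."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))


def indirect_effect(x, m, y):
    """Return a * b, or None if the estimate is undefined.

    a: slope of M regressed on X.
    b: slope of Y on M, controlling for X.
    """
    sxx, smm, sxm = _cross(x, x), _cross(m, m), _cross(x, m)
    sxy, smy = _cross(x, y), _cross(m, y)
    denom = sxx * smm - sxm ** 2
    if abs(denom) <= 1e-9 * sxx * smm:  # no variance or collinear predictors
        return None
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / denom
    return a * b


def bootstrap_ci(x, m, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases
        est = indirect_effect([x[i] for i in idx],
                              [m[i] for i in idx],
                              [y[i] for i in idx])
        if est is not None:
            estimates.append(est)
    estimates.sort()
    lower = estimates[int((alpha / 2) * len(estimates))]
    upper = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lower, upper


# Toy numbers for illustration only (not study data).
x_toy = [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5]
m_toy = [1, 2, 2, 4, 5, 5, 0, 3, 3, 3, 4, 6]
y_toy = [2, 2, 3, 5, 5, 7, 1, 3, 4, 4, 6, 6]
print(indirect_effect(x_toy, m_toy, y_toy))
print(bootstrap_ci(x_toy, m_toy, y_toy, n_boot=1000, seed=1))
```

In this kind of analysis, mediation is typically inferred when the bootstrap interval for the indirect effect excludes zero.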

2.2. Results

The data and analysis code for all experiments are available on the Open Science Framework (https://osf.io/fr9bt/overview). For Experiment 1, rating means per condition and for all DVs are displayed in Figure 2 and reported in Table 1. Detailed pairwise comparisons can be found in Table 2.

Figure 2 Rating means per condition and DVs and correlations between DVs (Experiment 1).

Note: (A) Perceived adequacy of the punishment. Four on the rating scale indicates appropriate adequacy, with lower values indicating too weak punishment and higher values indicating too strong punishment. (B) Perceived warmth of the punisher. (C) Perceived competence of the punisher. (D) The correlation matrix displays Pearson correlations for all five ratings averaged across conditions. (E) The hypothetical willingness to befriend the punisher. (F) The hypothetical willingness to be part of a team led by the punisher. Error bars indicate standard errors. ns p ≥ .05, * p < .05, ** p < .01, *** p < .001. Detailed violin plots displaying participant-level dispersion are provided in Supplementary Figure S1.

Table 1 Rating means (M) and standard deviations (SD) for types of punishment by types of transgression (Experiment 1)

Table 2 Pairwise comparisons of ratings between types of punishment, separately for types of transgression (Experiment 1)

Note: Prop = property-oriented, corp = corporal, psych = psychological. Cohen’s ds (d) are displayed as effect sizes. * p < .05, ** p < .01, *** p < .001.

2.2.1. Adequacy of punishment

Contrary to H1, type of transgression did not systematically affect the perceived adequacy of the ensuing punishment (F(2,94) = 2.77, p = .068, ηp² = .06). However, we observed a main effect of punishment type (F(2,94) = 14.35, p < .001, ηp² = .23); that is, both property-oriented and corporal punishment were perceived as less adequate (i.e., too strong) than psychological punishment (t(47) ≥ 4.13, p ≤ .001, d ≥ 0.60; averaged across transgression types, see Table S2 in the Supplementary Material for details). The difference between property-oriented and corporal punishment was not significant (t < 1). Punishment and transgression type did not significantly interact (F < 1).

2.2.2. Warmth of the punisher

Evaluations of the punisher’s warmth did not vary between transgression types (F(2,94) = 3.07, p = .051, ηp² = .06). By contrast, perceived warmth depended on the administered punishment type (F(2,94) = 27.46, p < .001, ηp² = .37). In line with H2b, property-oriented punishers obtained lower ratings than corporal and psychological punishers (t(47) ≥ 4.10, p < .001, d ≥ 0.59; see Table S2 in the Supplementary Material). Furthermore, ratings for corporal punishers were lower than those for psychological punishers (t(47) = 3.12, p = .003, d = 0.45). As reflected in the significant interaction (F(4,188) = 7.48, p < .001, ηp² = .14), this pattern differed depending on the type of preceding transgression: Property-oriented punishers were perceived as less warm than corporal and psychological punishers, except when reacting to property-oriented transgressions (statistical values (t, p, d) are depicted in Table 2). Similarly, corporal punishers were perceived as less warm than psychological punishers, except when reacting to corporal transgressions. Psychological punishers received the highest evaluations when punishing psychological transgressions. Hence, supporting H3, warmth evaluations improved when the punitive act matched the transgression.

2.2.3. Competence of the punisher

Comparable to warmth ratings, we found no significant main effect of transgression type (F < 1), but a significant main effect of punishment type on the perceived competence of the punisher (F(2,94) = 15.32, p < .001, ηp² = .25). Mean ratings were lower for property-oriented punishers than for corporal and psychological punishers (t(47) ≥ 2.51, p ≤ .016, d ≥ 0.36; see Table S2 in the Supplementary Material). Corporal punishers were also rated as less competent than psychological punishers (t(47) = 2.90, p = .012, d = 0.42). Again, type of transgression interacted with type of punishment (F(4,188) = 11.49, p < .001, ηp² = .20, ε = .85; GG-corrected). Participants evaluated property-oriented punishers as least competent, except when addressing property-oriented transgressions (for statistical indices, see Table 2), again in line with H3. Similarly, they judged corporal punishers as significantly less competent than psychological punishers, except when responding to corporal transgressions. Evaluations of psychological punishers were highest after psychological transgressions.

2.2.4. Tendency to accept the punisher as a friend

Once more, type of transgression did not affect the punisher evaluation (F(2,94) = 1.05, p = .353, ηp² = .02), but type of punishment did (F(2,94) = 22.71, p < .001, ηp² = .33). We observed less willingness to befriend property-oriented punishers than corporal and psychological punishers (t(47) ≥ 2.90, p ≤ .006, d ≥ 0.42), as well as corporal compared to psychological punishers (t(47) = 3.75, p < .001, d = 0.54; see Table S2 in the Supplementary Material). These preferences again depended on transgression type, indicated by a significant interaction (F(4,188) = 8.10, p < .001, ηp² = .15). Confirming H3, property-oriented punishers were liked less as friends than those using corporal and psychological punishments, except when punishing property-oriented transgressions (see Table 2). Corporal punishers were liked less than psychological punishers, unless punishments followed corporal transgressions. The highest willingness to befriend a punisher was found when psychological punishers responded to psychological transgressions.

2.2.5. Tendency to accept the punisher as a team leader

In contrast to the other ratings, we found a small main effect of transgression type (F(2,94) = 4.26, p < .05, ηp² = .08). Punishers intervening after corporal compared to psychological transgressions were less preferred as team leaders (t(47) = 2.66, p = .033, d = 0.38). Type of punishment impacted the willingness to be part of a team led by the punisher (F(2,94) = 26.78, p < .001, ηp² = .36), with mean ratings being lower for property-oriented punishers than for corporal and psychological punishers (t(47) ≥ 3.12, p ≤ .003, d ≥ 0.45), and for corporal than psychological punishers (t(47) = 4.05, p < .001, d = 0.58; see Table S2 in the Supplementary Material). The significant interaction between punishment and transgression type (F(4,188) = 8.84, p < .001, ηp² = .16) revealed that participants liked property-oriented punishers less in the role of team leaders than corporal punishers addressing corporal transgressions and psychological punishers addressing either corporal or psychological transgressions (see Table 2). This aversion for property-oriented punishers was absent after property-oriented transgressions. Similarly, corporal punishers were less preferred as team leaders than psychological punishers, unless they reacted to corporal transgressions. Psychological punishers responding to psychological transgressions elicited the highest preference as team leaders.

2.2.6. Mediation analyses

When investigating how warmth and competence ratings related to interaction tendencies, we observed strong correlations between assigned warmth and the willingness to befriend the punisher (r(1726) = .76, p < .001) and between assigned competence and the willingness to accept the punisher as a team leader (r(1726) = .70, p < .001), supporting the idea that perceived warmth fosters the inclination to befriend someone, and perceived competence the willingness to accept someone as a leader (e.g., Cuddy et al., 2008, 2011; Fiske et al., 2002, 2007). It is noteworthy, however, that warmth also correlated with leader preference, and competence with friendship choice (see Figure 2D). Mediation analyses tested how punishment type (using property-oriented vs. psychological as the two most diverging types) influenced friendship or leadership choices through perceived warmth or competence. Results showed that perceived warmth rendered the direct effect of punishment type on the inclination to befriend the punisher non-significant, suggesting full mediation (see panel A of Figure 3). Perceived competence partially mediated the effect of punishment type on the willingness to be led by the punisher (see panel B of Figure 3).

Figure 3 Results of mediation analyses for the effect of type of punishment on interaction tendencies mediated by perceptions of the punisher (Experiment 1).

Note: prop = property-oriented, psych = psychological. c’ indicates the direct effect of type of punishment on the interaction tendency, c indicates the total effect c’ + a * b. (A) The perceived warmth of the punisher was revealed as a full mediator for the willingness to befriend the punisher. (B) The perceived competence of the punisher was revealed as a partial mediator for the willingness to accept the punisher as a team leader.

2.3. Discussion

A consistent pattern was observed across all DVs. While type of transgression played a minor role (Hypothesis 1), type of punishment clearly shaped perceived punishment adequacy, warmth, and competence attributed to the punisher, and the willingness to interact with the punisher as a friend or team leader (evaluations for property-oriented < corporal < psychological). Although we initially grouped corporal and psychological punishments as direct, person-related punitive measures, our findings reveal that observers evaluated them differently. Overall, results support Hypothesis 2b and align with the literature highlighting the acceptance and effectiveness of socially oriented punishments (Chen and Xu, 2020; Heffner and FeldmanHall, 2019; Kupfer and Tybur, 2023; Vittrup and Holden, 2010).

In line with Hypothesis 3, evaluations of less favored punishments improved when they followed congruent transgressions, in all DVs except the rating of punishment adequacy. Observers seem to possess an intuitive sense of the relative seriousness of different transgression types and support punishment interventions that fit these infractions (Sznycer and Patrick, 2020), judging them as especially competent, fair, and justified (Arini et al., 2021; Carlsmith et al., 2002; Hofmann et al., 2018).

Taken together, Experiment 1 results show that participants prioritized the punisher’s action over the perpetrator’s action in their evaluations, indicating that punishment characteristics fundamentally shape bystanders’ opinions. Moreover, participants considered punishment contextually in relation to the transgression rather than in isolation. To explore these dynamics further, Experiment 2 shifted the focus from matching punishment and transgression types to manipulating another central aspect of the punitive act: its severity (e.g., Batistoni et al., 2022; Liu et al., 2021; Solomon and Lee, 2025; Zhang and Qi, 2024). Critically, we also investigated how severity interacts with punishment type to uncover more complex patterns in observers’ judgments (see, e.g., Peterson, 2024).

3. Experiment 2

Experiment 2 investigated how punishment type and severity affect the evaluations of punishment and punisher. We included property-oriented, corporal, and psychological punishments that, unlike in Experiment 1, always aligned with transgression types. As a novel factor, we introduced punishments varying in severity (weak or strong), with both levels deviating equally from a medium-severity transgression. As before, participants rated the punishment (adequacy), the punisher (warmth and competence), and their willingness to interact with the punisher (as friend and team leader). We hypothesized that stronger punishment, potentially being perceived as irrational and impulsive (Dhaliwal et al., 2021; Lee and Warneken, 2020; Zhang and Qi, 2024), would elicit more negative ratings than weaker punishment (H4). Building on findings of Experiment 1, we expected property-oriented punishment to receive more negative evaluations than corporal or especially psychological punishment (H2b). Finally, we investigated whether severity ratings depend on the type of punishment. Given that severe corporal harm can be particularly detrimental and fear-inducing (Ripoll-Núñez and Rohner, 2006; Smetana and Ball, 2019), for instance in child upbringing contexts (Brown et al., 2018; Larzelere and Kuhn, 2005), we predicted that negative appraisal of severe punishment would be most pronounced for corporal punishment (H5; see Footnote 2).

3.1. Methods

3.1.1. Participants

We collected data from 50 participants, comprising students enrolled at the University of Würzburg and volunteers. Students received compensation in the form of course credit, and volunteers did not receive any compensation. Two participants were excluded for failing the preregistered attention checks (see Participants section of Experiment 1). The final sample (N = 48) consisted of 19 male, 28 female, and 1 diverse-gendered individual with a mean age of 25.5 years (SD = 6.4, range 20–60; two participants did not provide age information).

3.1.2. Task and procedure

See Experiment 1.

3.1.3. Design and data analysis

The design closely resembled that of Experiment 1, with changes to the within-subjects factors and modifications to the number of experimental trials. Type of transgression remained congruent with type of punishment and was no longer treated as a within-subjects factor. Instead, severity of punishment was introduced as a new within-subjects factor (see Figure 1). Each experimental condition (according to the two factors severity of punishment with the levels weak and strong; and type of punishment with the levels property-oriented, corporal, and psychological; six conditions in total) was presented four times. The 24 experimental trials covered all four situational contexts (learning, work-related, everyday life, and social relational), with participants encountering one vignette from each of the 24 scenarios. Counterbalancing was equivalent to Experiment 1.

The data were analyzed similarly to Experiment 1, employing rmANOVA for the five rating questions. Experiment 2 followed a 2 × 3 within-subjects design with the factors severity of punishment (weak vs. strong) and type of punishment (property-oriented, corporal, or psychological; aligned to the preceding transgression). Mediation models were the same as those in Experiment 1.

3.2. Results

Rating means per condition and for all DVs are displayed in Figure 4 and reported in Table 3.

Figure 4 Rating means per condition and DV and correlations between DVs (Experiment 2).

Note: (A) Perceived adequacy of the punishment. Four on the rating scale indicates appropriate adequacy, with lower values indicating too weak punishment and higher values indicating too strong punishment. (B) Perceived warmth of the punisher. (C) Perceived competence of the punisher. (D) The correlation matrix displays Pearson correlations for all five ratings averaged across conditions. (E) The hypothetical willingness to befriend the punisher. (F) The hypothetical willingness to be part of a team led by the punisher. Error bars indicate standard errors. ns p ≥ .05, * p < .05, ** p < .01, *** p < .001. Detailed violin plots displaying participant-level dispersion are provided in Supplementary Figure S2.

Table 3 Rating means (M) and standard deviations (SD) for types of punishment by severities of punishment (Experiment 2)

3.2.1. Adequacy of punishment

In line with H4, we observed a main effect of punishment severity (F(1,47) = 138.10, p < .001, ηp² = .75) such that strong punishment was rated as less adequate (i.e., too strong) than weak punishment. Type of punishment reached significance (F(2,94) = 12.62, p < .001, ηp² = .21), indicating that property-oriented and corporal punishment, which were perceived similarly (t < 1), were both considered less adequate (i.e., too strong) than psychological punishment (t(47) ≥ 4.02, p < .001, d ≥ 0.58; see Table S4 in the Supplementary Material). As reflected in the significant severity × punishment type interaction (F(2,94) = 5.86, p = .004, ηp² = .11), this pattern differed depending on punishment severity: Property-oriented punishments were only perceived as less adequate than psychological acts when severity was strong (t(47) = 5.07, p < .001, d = 0.73) but not when it was weak (t(47) = 1.44, p = .268, d = 0.21).

3.2.2. Warmth of the punisher

Evaluations of the punisher’s warmth differed depending on punishment severity (F(1,47) = 103.51, p < .001, ηp² = .69), demonstrating that those who punished strongly were perceived as less warm than those who punished weakly. Type of punishment also had a significant impact (F(2,94) = 31.95, p < .001, ηp² = .41, ε = .88; GG-corrected). Specifically, property-oriented and corporal punishers were rated lower than psychological punishers (t(47) ≥ 5.25, p < .001, d ≥ 0.76), while the difference between property-oriented and corporal punishers was not significant (t(47) = 1.90, p = .064, d = 0.27; see Table S4 in the Supplementary Material for details). Findings support H2b, especially regarding the contrast between property-oriented and psychological punishment. Contrary to expectations, we found no significant severity × punishment type interaction (F < 1).

3.2.3. Competence of the punisher

As for adequacy and warmth, we found a main effect of punishment severity on the perceived competence of the punisher (F(1,47) = 24.17, p < .001, ηp² = .34). Strong punishers were viewed as less competent than weak punishers. Additionally, type of punishment impacted competence evaluations (F(2,94) = 21.40, p < .001, ηp² = .31, ε = .88; GG-corrected). Pairwise comparisons (see Table S4 in the Supplementary Material) yielded significantly lower competence ratings for property-oriented than for corporal and psychological punishers (t(47) ≥ 2.09, p ≤ .042, d ≥ 0.30), and for corporal than for psychological punishers (t(47) = 4.39, p < .001, d = 0.63), further underscoring H2b. As for perceived warmth, the severity × punishment type interaction was not significant (F(2,94) = 1.27, p = .285, ηp² = .03).

3.2.4. Tendency to accept the punisher as a friend

Once more, severity of the punishment affected punisher evaluations (F(1,47) = 145.42, p < .001, ηp² = .76), as strong punishers were less preferred as friends than weak punishers. Participants took different punishing types into account when evaluating their willingness to befriend the punisher (F(2,94) = 34.85, p < .001, ηp² = .43). Property-oriented and corporal punishers were less likely to be sought as friends than psychological punishers (t(47) ≥ 6.53, p < .001, d ≥ 0.94; see Table S4 in the Supplementary Material). The difference between property-oriented and corporal punishers was not significant (t(47) = 1.80, p = .079, d = 0.26). This pattern was similar for both severity levels, as indicated by the absence of interaction (F(2,94) = 1.37, p = .259, ηp² = .03).

3.2.5. Tendency to accept the punisher as a team leader

The tendency to accept the punisher as a team leader varied with punishment severity (F(1,47) = 102.97, p < .001, ηp² = .69), being lower for strong punishers than for weak punishers. Additionally, we found a main effect of type of punishment (F(2,94) = 33.02, p < .001, ηp² = .41), with participants being less willing to be led by property-oriented or corporal than by psychological punishers (t(47) ≥ 5.98, p < .001, d ≥ 0.86; see Table S4 in the Supplementary Material). The difference between property-oriented and corporal punishers was not significant (t(47) = 1.66, p = .104, d = 0.24). As before and opposed to what we initially expected, we found no severity × punishment type interaction (F < 1).

3.2.6. Mediation analyses

Analogous to Experiment 1, we conducted two mediation analyses (Figure 5). When controlling for perceived warmth, participants still preferred psychological over property-oriented punishers as friends, albeit to a smaller degree, suggesting partial mediation (see panel A of Figure 5). Similarly, participants were more willing to be led by psychological than by property-oriented punishers, even when controlling for perceived competence (see panel B of Figure 5). In general, strong effects of warmth on friendship decisions and competence on leadership preferences emphasize the importance of these dimensions in guiding interaction inclinations, further underlined by robust correlations between warmth and friendship (r(1150) = .81, p < .001) and between competence and leadership (r(1150) = .67, p < .001; see Figure 4D). Note, however, that ascriptions of warmth also correlated with choosing punishers as leaders, and ascriptions of competence with choosing them as friends.
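As a reading aid for the correlation statistics, the reported degrees of freedom, r(1726) in Experiment 1 and r(1150) in Experiment 2, match what a Pearson correlation over individual trial ratings would yield (number of observations minus two). This is an inference from the reported numbers rather than something stated in the text, and the helper name below is hypothetical.

```python
def pearson_df(n_participants, n_trials):
    """Degrees of freedom (N - 2) for a Pearson correlation over trial-level ratings."""
    n_observations = n_participants * n_trials
    return n_observations - 2


# Experiment 1: 48 participants x 36 trials
print(pearson_df(48, 36))  # 1726
# Experiment 2: 48 participants x 24 trials
print(pearson_df(48, 24))  # 1150
```

Because each participant contributes many ratings, such observations are not independent, so the associated p-values are best read as descriptive.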

Figure 5 Results of mediation analyses for the effect of type of punishment on interaction tendencies mediated by perceptions of the punisher (Experiment 2).

Note: prop = property-oriented, psych = psychological. c’ denotes the direct effect of type of punishment on the interaction tendency, and c denotes the total effect c’ + a * b. (A) The perceived warmth of the punisher was revealed as a partial mediator for the willingness to befriend the punisher. (B) The perceived competence of the punisher was revealed as a partial mediator for the willingness to accept the punisher as a team leader.

3.3. Discussion

Third parties exerting strong compared to weak punishment were evaluated more negatively across all rating dimensions, aligning with Hypothesis 4 (e.g., Eriksson et al., 2016; Lee and Warneken, 2020; Liu et al., 2021). Our findings are in line with type-specific studies examining punishment severity. For instance, mild corporal punishment was seen as regrettable but tolerable, while severe corporal punishment was viewed as abusive and morally incompetent on the punisher’s part (Brown et al., 2018). Similarly, research contrasting weak vs. strong property-oriented punishment (taking a couple versus many items away) found that older children disapproved of harsh punishment and preferred moderate, lenient interventions (Solomon and Lee, 2025). For psychological punishment, strong responses (e.g., suspension instead of a warning for minor infractions) were perceived as excessive and unwarranted (Peterson, 2024), and harsh instead of mild verbal reprimands eroded trust in the punisher, particularly by diminishing perceived benevolence and integrity (Zhang and Qi, 2024). None of these studies had included different punishment types, leaving open any potential interactions of type and severity of punitive acts.

Notably, our study demonstrated that the difference between weak and strong punishment evaluations remained consistent across all three punishment types in almost all dependent measures (contrary to Hypothesis 5). An exploratory visual examination, however, suggested that harsh corporal punishment garnered the numerically strongest disapproval in learning and workplace settings (see Text S3 and Table S5 in the Supplementary Material). Though only a preliminary numerical trend, this pattern may reflect legal and ethical codes that especially take effect in professional environments (Peterson, 2024), where concerns about power abuse or abusive supervision are particularly prominent (Tepper, 2007).

Finally, we replicated Experiment 1 findings regarding type judgments, with psychological punishment preferred over property-oriented punishment (Hypothesis 2b). Corporal punishment was rated between these two types in weak punishment scenarios but received similarly low evaluations as property-oriented punishment in strong punishment scenarios, where the salience of the sanctions’ harshness may have overshadowed any further differentiation between types (except for the consistently positive perception of psychological sanctions).

4. General discussion

Punishment is a tricky business: While it stabilizes cooperation in groups, it can tarnish the punisher’s reputation or, in worst-case scenarios, trigger feuds. To better understand punishment appraisal, this study manipulated key characteristics of TPP and investigated their effect on the evaluation of punishment adequacy, the punisher’s traits in terms of warmth and competence, and the observer’s hypothetical willingness to engage with the punisher in the future.

All punishment characteristics implemented in our study shaped observers’ evaluations. Participants preferred psychological over corporal and property-oriented sanctions (Hypothesis 2b; Experiments 1 and 2; punishment type) and weaker over stronger punishments (Hypothesis 4; Experiment 2; punishment severity). Proportionality mattered for the ratings concerning the punisher’s warmth and competence, and the willingness to interact with the punisher in the future, with punishments aligning to the type of preceding transgression rated more positively (Hypothesis 3; Experiment 1; transgression type). Interestingly, punitive interventions were not perceived negatively per se. Beyond the experimental manipulations, they were generally viewed as appropriate, with punishers judged moderately warm and competent, and participants expressing some inclination to interact with them.

Across experiments, we tested two competing hypotheses regarding punishment type. First, property-oriented sanctions may receive better evaluations due to their established use in the legal system, their focus on targeting a person’s belongings rather than their body or psyche (Guala, 2012; Schoenmakers et al., 2014), and their perception as rational and predictable, offering clear incentives against future misconduct (Molho et al., 2022). The counterhypothesis conjectured that less institutionalized person-oriented corporal or psychological punishments would be judged superior in everyday interactions (Balliet et al., 2022; Molho et al., 2020; Nelissen and Mulder, 2013). Our findings consistently demonstrated a preference for psychological punishment, like verbal disapproval (even if formulated harshly) or suggestions for future comportment, which can promote trust in the third party’s behavior, demonstrate personal investment and consideration, and produce long-lasting effects (Kupfer and Tybur, 2023; Philippsen et al., 2023). Critically, while previous studies showed the efficacy of psychological sanctions when combined with financial penalties (Chen and Xu, 2020; Nelissen and Mulder, 2013), our findings suggest that they are equally valued as stand-alone punishment.
In addition, observers may view psychological punishment as most sustainable for group well-being, avoiding both the resource depletion entailed by financial punishment (Dreber et al., 2008; Wu et al., 2016) and the threats to the physical integrity of group members entailed by corporal punishment (Eriksson et al., 2016). The less favorable ratings of property-oriented sanctions may partly reflect expectations that especially these penalties should be imposed by legal agencies (Eriksson et al., 2013; Martin et al., 2019; Raihani and Bshary, 2019) rather than by unauthorized, equal-status peers as in our scenarios (Gordon et al., 2014; Guala, 2012; Schoenmakers et al., 2014). Our flexible vignette paradigm offers an ideal platform to test these dynamics further. Because punishment (as well as reward) is a backbone of sustaining large-scale cooperation, understanding how both public authority figures and lay third parties can gain a better reputation and legitimacy across different punishment types is crucial (Tyler et al., 2015). Importantly, our within-subjects design (in contrast to, e.g., Eriksson et al., 2016; Martin et al., 2019) enabled us to carve out differences between psychological, corporal, and property-oriented punishment while accounting for participants’ baseline attitudes. This design choice, along with the large sample size, supports the credibility of our results.

Research shows that participants infer punishers’ personality traits based on the perceived appropriateness of their actions. Those who administer fair and proportionate sanctions are ascribed higher trustworthiness, warmth, and competence (e.g., Barclay, 2006; Jordan et al., 2016; Raihani and Bshary, 2015), and less irrationality (Lee and Warneken, 2020; Liu et al., 2021). Our experiments extend these findings, revealing a preference for punishment types that match the preceding transgression type, judging these interventions as more appropriate and rating the respective punishers as especially warm and competent (Hofmann et al., 2018; Sznycer and Patrick, 2020). When punishments diverged from the transgression’s severity, milder sanctions were favored over harsher ones (Hypothesis 4). Notably, our study is among the first to investigate the impact of punishment severity across three distinct punishment types, indicating that both type and severity independently shape judgments. Severe punishments consistently led to greater disapproval, regardless of their specific form (contradicting Hypothesis 5).

Research on the signaling value of TPP suggests that intervening third parties can gain social benefits, like access to new social partners (e.g., Barclay, 2006; Jordan et al., 2016; Santos et al., 2011). Our findings contribute to this research by demonstrating (i) how core characteristics of punishment scenarios shape participants’ propensity for future interactions with punishers and (ii) how punisher evaluations drive these interactions. Participants showed the highest interest in interacting with punishers when punishments were psychological (particularly following psychological transgressions) and mild. Additionally, mediation analyses revealed that inclinations for future interactions were largely driven by punisher evaluations. Specifically, perceived warmth fully (Experiment 1) or partially (Experiment 2) mediated the effect of punishment type (property-oriented vs. psychological) on hypothetical friendship, suggesting that social sanctions conveyed warmth and benevolence, which in turn increased affiliation (Cuddy et al., 2008). Perceived competence partially mediated the effect on hypothetical leadership in both experiments. The fact that punishment type retained a direct impact on leadership acceptance beyond perceived competence implies that additional criteria matter: potentially detrimental decisions may carry greater weight in leadership than friendship contexts (Dong et al., 2022). Altogether, while punishment type still exhibited direct effects, punishers’ inferred personal qualities clearly shaped social interaction tendencies (in line with the stereotype content model, Cuddy et al., 2011; Fiske et al., 2007).
It should be noted that correlational analyses suggested a high overlap among the dimensions warmth, competence, potential friendship, and potential leadership. This likely reflects both methodological decisions and conceptual overlap. Methodologically, the similar response formats for the DVs and the close succession of the ratings may have encouraged participants to provide relatively consistent judgments. Furthermore, research indicates that warmth and competence are often co-attributed in moral evaluations, particularly when little information is available, as in our vignettes (Abele and Wojciszke, 2007; Fiske et al., 2007).
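
To make the mediation logic discussed above concrete (punishment type as predictor, perceived warmth as mediator, hypothetical friendship as outcome), the following is a minimal sketch of a percentile-bootstrap test of an indirect effect. It is not the analysis pipeline of the experiments: the data are simulated, the 0/1 coding of punishment type and all effect sizes are hypothetical, and observations are treated as independent, which a within-participant design would not justify without further modeling.

```python
import numpy as np


def ols_coefficients(y, X):
    """Least-squares coefficients for y ~ 1 + X (intercept first)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def indirect_effect(x, m, y):
    """Product of path a (x -> m) and path b (m -> y, controlling for x)."""
    a = ols_coefficients(m, x)[1]
    b = ols_coefficients(y, np.column_stack([x, m]))[2]
    return a * b


def bootstrap_ci(x, m, y, n_boot=2000, seed=0):
    """95% percentile bootstrap interval for the indirect effect."""
    rng = np.random.default_rng(seed)
    n = len(x)
    estimates = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        estimates[i] = indirect_effect(x[idx], m[idx], y[idx])
    return np.percentile(estimates, [2.5, 97.5])


# Simulated data only; codings and effect sizes are hypothetical.
rng = np.random.default_rng(42)
n = 200
x = rng.integers(0, 2, size=n).astype(float)  # 0 = property-oriented, 1 = psychological
m = 0.8 * x + rng.normal(size=n)              # simulated warmth rating
y = 0.6 * m + 0.1 * x + rng.normal(size=n)    # simulated friendship rating

ab = indirect_effect(x, m, y)
low, high = bootstrap_ci(x, m, y)
c_total = ols_coefficients(y, x)[1]
c_direct = ols_coefficients(y, np.column_stack([x, m]))[1]
print(f"indirect = {ab:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
print(f"total = {c_total:.3f}, direct = {c_direct:.3f}")
```

In this framing, an interval for the indirect effect that excludes zero alongside a direct effect near zero would correspond to what is usually called full mediation, whereas a remaining direct effect corresponds to partial mediation. With ordinary least squares, the total effect decomposes exactly into the direct plus the indirect effect.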

Building on vignette-based approaches (e.g., Gordon et al., 2014; Lee and Warneken, 2020), we employed an exceptionally wide range of scenarios with diverse punishment strategies. Realistic vignettes about familiar transgressions and punishments likely evoke stronger emotional processing (Martin et al., 2019) and are more relatable than abstract economic game settings involving monetary punishments in the range of a few cents (Guala, 2012). Evaluations were remarkably consistent: Preferences for psychological sanctions remained stable across learning, work, everyday life, and social relationship contexts involving all age groups. Within punishment types, we utilized a comprehensive range of prevalent strategies, such as reprimand, gossip, blame, public shaming, or social exclusion for psychological punishments (see, e.g., Balliet et al., 2022). Post-hoc inspections of individual vignette evaluations provided initial insights into which specific punitive strategies were (dis)favored (see Text S2 and Text S3 in the Supplementary Material for details and additional analyses). Although our analysis of subtypes was exploratory, future studies can adjust the repertoire of vignettes to systematically investigate differences within punishment types, e.g., by statistically comparing the evaluation of non-confrontational indirect vs. confrontational direct approaches.

4.1. Limitations

Participants assumed an observer’s perspective in scenarios delineated in written vignettes. Hypothetical responses cannot fully reflect real-life reactions to actual events (e.g., Carlsmith, 2006; Cui et al., 2019; FeldmanHall et al., 2012); for instance, third parties exhibit greater leniency toward harm in hypothetical situations due to the lack of personal involvement and consequences (Bostyn et al., 2018). However, our goal was not to measure participants’ own punitive behavior but rather to examine how observers evaluate punishment strategies enacted by others. Further, to foster imagery and immersion in the situation, we employed vignettes enriched with detailed social information (see Evans et al., 2015). We nonetheless acknowledge the constraint that judgments of hypothetical situations may differ from evaluations of real observations.

Next, we relied on two non-representative samples mainly composed of German neurotypical, often female students with an interest in psychological research, who may differ from the general population in decision-making and moral motivation (Cappelen et al., 2015). For instance, prior work suggests that women display differences in their sensitivity to harm or preference for prosocial punishment (Kamas and Preston, 2021). Additionally, support for certain sanctions can vary between individualistic and collectivistic cultures; for example, while physical confrontation and ostracism were more accepted in collectivistic, high power-distance societies, more emancipative societies favored gossip and disapproved of punishment more strongly (Eriksson et al., 2021). However, given that core moral values and social norms appear globally widespread with only minor regional variation (Alfano et al., 2024), the observed support for psychological, proportionate, and mild punishments is unlikely to stem solely from sample characteristics. Still, findings should be interpreted with caution, and future research needs to replicate results in larger and representative samples.

Although we instructed participants that punishments were not intended to benefit the third party, this impression may have still contributed to more negative ratings, especially for property-oriented interventions (e.g., Eriksson et al., 2016; Krasnow et al., 2016; Redhead et al., 2021). Future studies could explicitly manipulate benefits for the third party or include a control question asking whether participants interpreted the punishment as self-serving.

Additionally, our vignettes did not provide information on the effectiveness of the punishment or the reaction of the perpetrator to the punitive action. Given prior emphasis on the relevance of punishment outcomes (Funk et al., 2014), this aspect could be included and even manipulated in our vignettes in future studies. This would allow us to probe whether and to what extent the observer’s judgment of, or satisfaction with, the punitive action is enhanced, e.g., when the perpetrator demonstrates a change in attitude or behavior.

Finally, we did not include alternative action options for third parties, such as compensation (e.g., Batistoni et al., 2022; Heffner and FeldmanHall, 2019; Li et al., 2021) or non-acting (e.g., Dhaliwal et al., 2021; Martin et al., 2019). Our results imply that, in the absence of other options, punishing defectors is accepted as a default response to unfairness (Gromet and Darley, 2009). Future research could include a non-acting third party, allowing comparisons with those who choose not to intervene.

5. Conclusion

Considering the contentious public perception of third-party interventions (Dhaliwal et al., 2021; Raihani and Bshary, 2015), a closer approximation of real-world settings and a better understanding of the complex situational factors shaping evaluations of TPP are essential. In this series of experiments, we employed vignettes depicting various hypothetical yet realistic interactions. Taken together, psychological punishments, ranging from verbal reprimands to temporary exclusion from activities, were favored by external observers, while corporal and resource-based punitive measures lacked broad support.

Let us return to the initially introduced bystander witnessing an impatient commuter aggressively skipping the queue. Our bystander should carefully calibrate their intervention: To be seen as competent and warm, they should choose a similar response type or opt for the widely accepted psychological approach. Ideally, they should tailor the severity of their reaction to the seriousness of the transgression or choose a slightly milder response. Remaining impartial, the bystander should focus on communicating the norm violation, offering the perpetrator an opportunity to reflect and adjust future behavior.

Supplementary material

The supplementary material for this article can be found at http://doi.org/10.1017/jdm.2025.10021.

Disclosure of use of AI tools

During the preparation of this work, the authors used ChatGPT-5 (OpenAI, 2025) in order to improve the readability and language of the manuscript. The authors reviewed and edited the content as needed and take full responsibility for the content of the published article.

Data availability statement

The data presented in this study are openly available on the OSF at https://osf.io/fr9bt/overview.

Acknowledgments

We would like to thank our student research assistants, Maria Goldkin, Anne Kirsch, and Emil Stein, for their support in developing the study materials and assisting with data collection.

Funding statement

This research received no specific grant funding from any funding agency, commercial or not-for-profit sectors.

Competing interest

The authors declare no competing interests.

Footnotes

1 As noted in the preregistration, we conducted a third experiment examining the effects of social rank between the punisher and the perpetrator on third-party evaluations. The results of that experiment were quite complex and require follow-up studies to clarify the underlying social dynamics. To maintain thematic coherence, this manuscript focuses specifically on punishment–transgression dynamics (i.e., norm violation type and punishment type/severity), while the third experiment will be reported in future work.

2 In the preregistration, we grouped corporal and psychological punishments under the broader category of ‘person-related punishments’. Based on Experiment 1 results revealing distinct evaluation patterns between psychological and corporal punishment types and informed by theoretical literature, we refined the interaction hypothesis in Experiment 2 to focus specifically on corporal punishment rather than on both person-related punishment forms.

References

Abele, A. E., Ellemers, N., Fiske, S. T., Koch, A., & Yzerbyt, V. (2021). Navigating the social world: Toward an integrated framework for evaluating self, individuals, and groups. Psychological Review, 128(2), 290–314. https://doi.org/10.1037/rev0000262
Abele, A. E., & Wojciszke, B. (2007). Agency and communion from the perspective of self versus others. Journal of Personality and Social Psychology, 93(5), 751–763. https://doi.org/10.1037/0022-3514.93.5.751
Aldrovandi, S., Wood, A. M., & Brown, G. D. A. (2013). Sentencing, severity, and social norms: A rank-based model of contextual influence on judgments of crimes and punishments. Acta Psychologica, 144(3), 538–547. https://doi.org/10.1016/j.actpsy.2013.09.007
Alfano, M., Cheong, M., & Curry, O. S. (2024). Moral universals: A machine-reading analysis of 256 societies. Heliyon, 10(6), e25940. https://doi.org/10.1016/j.heliyon.2024.e25940
Arini, R. L., Wiggs, L., & Kenward, B. (2021). Moral duty and equalization concerns motivate children’s third-party punishment. Developmental Psychology, 57(8), 1325–1341. https://doi.org/10.1037/dev0001191
Asp, E. W., Gullickson, J. T., Warner, K. A., Koscik, T. R., Denburg, N. L., & Tranel, D. (2019). Soft on crime: Patients with ventromedial prefrontal cortex damage allocate reduced third-party punishment to violent criminals. Cortex, 119, 33–45. https://doi.org/10.1016/j.cortex.2019.03.024
Balafoutas, L., Nikiforakis, N., & Rockenbach, B. (2016). Altruistic punishment does not increase with the severity of norm violations in the field. Nature Communications, 7(1), 13327. https://doi.org/10.1038/ncomms13327
Balliet, D., Molho, C., Columbus, S., & Dores Cruz, T. D. (2022). Prosocial and punishment behaviors in everyday life. Current Opinion in Psychology, 43, 278–283. https://doi.org/10.1016/j.copsyc.2021.08.015
Balliet, D., Mulder, L. B., & Van Lange, P. A. M. (2011). Reward, punishment, and cooperation: A meta-analysis. Psychological Bulletin, 137(4), 594–615. https://doi.org/10.1037/a0023489
Barclay, P. (2006). Reputational benefits for altruistic punishment. Evolution and Human Behavior, 27(5), 325–344. https://doi.org/10.1016/j.evolhumbehav.2006.01.003
Batistoni, T., Barclay, P., & Raihani, N. J. (2022). Third-party punishers do not compete to be chosen as partners in an experimental game. Proceedings of The Royal Society B: Biological Sciences, 289(1966), 20211773. https://doi.org/10.1098/rspb.2021.1773
Beersma, B., & Van Kleef, G. A. (2011). How the grapevine keeps you in line: Gossip increases contributions to the group. Social Psychological and Personality Science, 2(6), 642–649. https://doi.org/10.1177/1948550611405073
Behnke, A., Strobel, A., & Armbruster, D. (2020). When the killing has been done: Exploring associations of personality with third-party judgment and punishment of homicides in moral dilemma scenarios. PLOS ONE, 15(6), e0235253. https://doi.org/10.1371/journal.pone.0235253
Bostyn, D. H., Sevenhant, S., & Roets, A. (2018). Of mice, men, and trolleys: Hypothetical judgment versus real-life behavior in trolley-style moral dilemmas. Psychological Science, 29(7), 1084–1093. https://doi.org/10.1177/0956797617752640
Brandt, H., Hauert, C., & Sigmund, K. (2003). Punishment and reputation in spatial public goods games. Proceedings of The Royal Society B: Biological Sciences, 270(1519), 1099–1104. https://doi.org/10.1098/rspb.2003.2336
Brown, A. S., Holden, G. W., & Ashraf, R. (2018). Spank, slap, or hit? How labels alter perceptions of child discipline. Psychology of Violence, 8(1), 1–9. https://doi.org/10.1037/vio0000080
Buckholtz, J. W., Asplund, C. L., Dux, P. E., Zald, D. H., Gore, J. C., Jones, O. D., & Marois, R. (2008). The neural correlates of third-party punishment. Neuron, 60(5), 930–940. https://doi.org/10.1016/j.neuron.2008.10.016
Cappelen, A. W., Nygaard, K., Sørensen, E. Ø., & Tungodden, B. (2015). Social preferences in the lab: A comparison of students and a representative population. The Scandinavian Journal of Economics, 117(4), 1306–1326. https://doi.org/10.1111/sjoe.12114
Carlsmith, K. M. (2006). The roles of retribution and utility in determining punishment. Journal of Experimental Social Psychology, 42(4), 437–451. https://doi.org/10.1016/j.jesp.2005.06.007
Carlsmith, K. M., Darley, J. M., & Robinson, P. H. (2002). Why do we punish? Deterrence and just deserts as motives for punishment. Journal of Personality and Social Psychology, 83(2), 284–299. https://doi.org/10.1037/0022-3514.83.2.284
Carter, G. (2014). The reciprocity controversy. Animal Behavior and Cognition, 1(3), 368–386. https://doi.org/10.12966/abc.08.11.2014
Chen, S., & Xu, Y. (2020). Warmth and competence: Impact of third-party punishment on punishers’ reputation. Acta Psychologica Sinica, 52(12), 1436–1451. https://doi.org/10.3724/SP.J.1041.2020.01436
Cinyabuguma, M., Page, T., & Putterman, L. (2005). Cooperation under the threat of expulsion in a public goods experiment. The Experimental Approaches to Public Economics, 89(8), 1421–1435. https://doi.org/10.1016/j.jpubeco.2004.05.011
Civai, C., Huijsmans, I., & Sanfey, A. G. (2019). Neurocognitive mechanisms of reactions to second- and third-party justice violations. Scientific Reports, 9(1), 9271. https://doi.org/10.1038/s41598-019-45725-8
Cuddy, A. J. C., Fiske, S. T., & Glick, P. (2008). Warmth and competence as universal dimensions of social perception: The stereotype content model and the BIAS map. Advances in Experimental Social Psychology, 40, 61–149. https://doi.org/10.1016/S0065-2601(07)00002-0
Cuddy, A. J. C., Glick, P., & Beninger, A. (2011). The dynamics of warmth and competence judgments, and their outcomes in organizations. Research in Organizational Behavior, 31, 73–98. https://doi.org/10.1016/j.riob.2011.10.004
Cui, F., Wang, C., Cao, Q., & Jiao, C. (2019). Social hierarchies in third-party punishment: A behavioral and ERP study. Biological Psychology, 146, 107722. https://doi.org/10.1016/j.biopsycho.2019.107722
Death Penalty Information Center. (2023). The death penalty in 2023: Year end report. Death Penalty Information Center. https://deathpenaltyinfo.org/facts-and-research/dpic-reports/dpic-year-end-reports/the-death-penalty-in-2023-year-end-report
Delton, A. W., & Krasnow, M. M. (2017). The psychology of deterrence explains why group membership matters for third-party punishment. Evolution and Human Behavior, 38(6), 734–743. https://doi.org/10.1016/j.evolhumbehav.2017.07.003
Dhaliwal, N. A., Patil, I., & Cushman, F. (2021). Reputational and cooperative benefits of third-party compensation. Organizational Behavior and Human Decision Processes, 164, 27–51. https://doi.org/10.1016/j.obhdp.2021.01.003
Dimitroff, S. J., Harrod, E. G., Smith, K. E., Faig, K. E., Decety, J., & Norman, G. J. (2020). Third-party punishment following observed social rejection. Emotion, 20(4), 713–720. https://doi.org/10.1037/emo0000607
Dong, M., Van Prooijen, J.-W., & Van Lange, P. A. M. (2022). Strategic exploitation by higher-status people incurs harsher third-party punishment. Social Psychology, 53(4), 209–220. https://doi.org/10.1027/1864-9335/a000493
Dreber, A., Rand, D. G., Fudenberg, D., & Nowak, M. A. (2008). Winners don’t punish. Nature, 452(7185), 348–351. https://doi.org/10.1038/nature06723
Dunbar, R. I. M. (2004). Gossip in evolutionary perspective. Review of General Psychology, 8(2), 100–110. https://doi.org/10.1037/1089-2680.8.2.100
Eisenberg, N., & Miller, P. A. (1987). Empathy, sympathy, and altruism: Empirical and conceptual links. New York, NY: Cambridge University Press.
Elbla, A. I. F. (2012). Is punishment (corporal or verbal) an effective means of discipline in schools?: Case study of two basic schools in Greater Khartoum/Sudan. Procedia - Social and Behavioral Sciences, 69, 1656–1663. https://doi.org/10.1016/j.sbspro.2012.12.112
Elgar, F. J., Donnelly, P. D., Michaelson, V., Gariépy, G., Riehm, K. E., Walsh, S. D., & Pickett, W. (2018). Corporal punishment bans and physical fighting in adolescents: An ecological study of 88 countries. BMJ Open, 8(9), e021616. https://doi.org/10.1136/bmjopen-2018-021616
Eriksson, K., Andersson, P. A., & Strimling, P. (2016). Moderators of the disapproval of peer punishment. Group Processes & Intergroup Relations, 19(2), 152–168. https://doi.org/10.1177/1368430215583519
Eriksson, K., Andersson, P. A., & Strimling, P. (2017). When is it appropriate to reprimand a norm violation? The roles of anger, behavioral consequences, violation severity, and social distance. Judgment and Decision Making, 12(4), 396–407. https://doi.org/10.1017/S1930297500006264
Eriksson, K., Strimling, P., & Ehn, M. (2013). Ubiquity and efficiency of restrictions on informal punishment rights. Journal of Evolutionary Psychology, 11(1), 17–34. https://doi.org/10.1556/JEP.11.2013.1.3
Eriksson, K., Strimling, P., Gelfand, M., Wu, J., Abernathy, J., Akotia, C. S., Aldashev, A., Andersson, P. A., Andrighetto, G., Anum, A., Arikan, G., Aycan, Z., Bagherian, F., Barrera, D., Basnight-Brown, D., Batkeyev, B., Belaus, A., Berezina, E., Björnstjerna, M., … Van Lange, P. A. M. (2021). Perceptions of the appropriate response to norm violation in 57 societies. Nature Communications, 12(1), 1481. https://doi.org/10.1038/s41467-021-21602-9
Evans, S. C., Roberts, M. C., Keeley, J. W., Blossom, J. B., Amaro, C. M., Garcia, A. M., Stough, C. O., Canter, K. S., Robles, R., & Reed, G. M. (2015). Vignette methodologies for studying clinicians’ decision-making: Validity, utility, and application in ICD-11 field studies. International Journal of Clinical and Health Psychology, 15(2), 160–170. https://doi.org/10.1016/j.ijchp.2014.12.001
Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175–191. https://doi.org/10.3758/BF03193146
Fehr, E., & Fischbacher, U. (2004). Third-party punishment and social norms. Evolution and Human Behavior, 25(2), 63–87. https://doi.org/10.1016/S1090-5138(04)00005-4
Fehr, E., & Gächter, S. (2002). Altruistic punishment in humans. Nature, 415, 137–140. https://doi.org/10.1038/415137a
Feinberg, M., Willer, R., & Schultz, M. (2014). Gossip and ostracism promote cooperation in groups. Psychological Science, 25(3), 656–664. https://doi.org/10.1177/0956797613510184
FeldmanHall, O., Dalgleish, T., Thompson, R., Evans, D., Schweizer, S., & Mobbs, D. (2012). Differential neural circuitry and self-interest in real vs hypothetical moral decisions. Social Cognitive and Affective Neuroscience, 7(7), 743–751. https://doi.org/10.1093/scan/nss069
Fiske, S. T., Cuddy, A. J. C., & Glick, P. (2007). Universal dimensions of social cognition: Warmth and competence. Trends in Cognitive Sciences, 11(2), 77–83. https://doi.org/10.1016/j.tics.2006.11.005
Fiske, S. T., Cuddy, A. J. C., Glick, P., & Xu, J. (2002). A model of (often mixed) stereotype content: Competence and warmth respectively follow from perceived status and competition. Journal of Personality and Social Psychology, 82(6), 878–902. https://doi.org/10.1037/0022-3514.82.6.878
Franklin-Luther, P., & Volk, A. A. (2022). The links between adult personality, parental discipline attitudes and harsh child punishment. Journal of Family Trauma, Child Custody & Child Development, 19(1), 3–23. https://doi.org/10.1080/26904586.2021.1957056
Funk, F., McGeer, V., & Gollwitzer, M. (2014). Get the message: Punishment is satisfying if the transgressor responds to its communicative intent. Personality and Social Psychology Bulletin, 40(8), 986–997. https://doi.org/10.1177/0146167214533130
Gershoff, E. T. (2002). Corporal punishment by parents and associated child behaviors and experiences: A meta-analytic and theoretical review. Psychological Bulletin, 128(4), 539–579. https://doi.org/10.1037/0033-2909.128.4.539
Giardini, F., Vilone, D., Sánchez, A., & Antonioni, A. (2021). Gossip and competitive altruism support cooperation in a Public Good game. Philosophical Transactions of the Royal Society B: Biological Sciences, 376(1838), 20200303. https://doi.org/10.1098/rstb.2020.0303
Goodwin, G. P., & Benforado, A. (2015). Judging the goring ox: Retribution directed toward animals. Cognitive Science, 39(3), 619–646. https://doi.org/10.1111/cogs.12175
Gordon, D. S., Madden, J. R., & Lea, S. E. G. (2014). Both loved and feared: Third party punishers are viewed as formidable and likeable, but these reputational benefits may only be open to dominant individuals. PLoS ONE, 9(10), e110045. https://doi.org/10.1371/journal.pone.0110045
Gromet, D. M., & Darley, J. M. (2009). Punishment and beyond: Achieving justice through the satisfaction of multiple goals. Law & Society Review, 43(1), 1–37. https://doi.org/10.1111/j.1540-5893.2009.00365.x
Guala, F. (2012). Reciprocity: Weak or strong? What punishment experiments do (and do not) demonstrate. Behavioral and Brain Sciences, 35(1), 1–15. https://doi.org/10.1017/S0140525X11000069
Heffner, J., & FeldmanHall, O. (2019). Why we don’t always punish: Preferences for non-punitive responses to moral violations. Scientific Reports, 9(1), 13219. https://doi.org/10.1038/s41598-019-49680-2
Henrich, J., McElreath, R., Barr, A., Ensminger, J., Barrett, C., Bolyanatz, A., Cardenas, J. C., Gurven, M., Gwako, E., Henrich, N., Lesorogol, C., Marlowe, F., Tracer, D., & Ziker, J. (2006). Costly punishment across human societies. Science, 312(5781), 1767–1770. https://doi.org/10.1126/science.1127333
Hofmann, W., Brandt, M. J., Wisneski, D. C., Rockenbach, B., & Skitka, L. J. (2018). Moral punishment in everyday life. Personality and Social Psychology Bulletin, 44(12), 1697–1711. https://doi.org/10.1177/0146167218775075
Hopkins, A., Dodd, S., Nolan, M., & Bartels, L. (2022). At the heart of sentencing: Exploring whether more compassionate delivery of sentencing remarks increases public concern for people who offend. Psychiatry, Psychology, and Law, 30(4), 459–485. https://doi.org/10.1080/13218719.2022.2040398
Jackson, P. L., Rainville, P., & Decety, J. (2006). To what extent do we share the pain of others? Insight from the neural bases of pain empathy. Pain, 125(1), 5–9. https://doi.org/10.1016/j.pain.2006.09.013
Jordan, J. J., Hoffman, M., Bloom, P., & Rand, D. G. (2016). Third-party punishment as a costly signal of trustworthiness. Nature, 530(7591), 473–476. https://doi.org/10.1038/nature16981
Kamas, L., & Preston, A. (2021). Empathy, gender, and prosocial behavior. Journal of Behavioral and Experimental Economics, 92, 101654. https://doi.org/10.1016/j.socec.2020.101654
King, P. (2006). Crime and Law in England, 1750–1840: Remaking Justice from the Margins (1st ed.). Cambridge University Press. https://doi.org/10.1017/CBO9780511495878
Klimecki, O. M., Mayer, S. V., Jusyte, A., Scheeff, J., & Schönenberg, M. (2016). Empathy promotes altruistic behavior in economic interactions. Scientific Reports, 6(1), 31961. https://doi.org/10.1038/srep31961
Krama, T., Vrublevska, J., Freeberg, T. M., Kullberg, C., Rantala, M. J., & Krams, I. (2012). You mob my owl, I’ll mob yours: Birds play tit-for-tat game. Scientific Reports, 2(1), 800. https://doi.org/10.1038/srep00800
Krasnow, M. M., Delton, A. W., Cosmides, L., & Tooby, J. (2016). Looking under the hood of third-party punishment reveals design for personal benefit. Psychological Science, 27(3), 405–418. https://doi.org/10.1177/0956797615624469
Kupfer, T. R., & Tybur, J. M. (2023). Third-party punishers who express emotions are trusted more. Proceedings of the Royal Society B: Biological Sciences, 290(2005), 20230916. https://doi.org/10.1098/rspb.2023.0916
Kurzban, R., DeScioli, P., & O’Brien, E. (2007). Audience effects on moralistic punishment. Evolution and Human Behavior, 28(2), 75–84. https://doi.org/10.1016/j.evolhumbehav.2006.06.001
Larzelere, R. E., & Kuhn, B. R. (2005). Comparing child outcomes of physical punishment and alternative disciplinary tactics: A meta-analysis. Clinical Child and Family Psychology Review, 8(1), 1–37. https://doi.org/10.1007/s10567-005-2340-z
Lee, Y., & Warneken, F. (2020). Children’s evaluations of third-party responses to unfairness: Children prefer helping over punishment. Cognition, 205, 104374. https://doi.org/10.1016/j.cognition.2020.104374
Leliveld, M. C., Van Dijk, E., & Beest, I. (2012). Punishing and compensating others at your own expense: The role of empathic concern on reactions to distributive injustice. European Journal of Social Psychology, 42(2), 135–140. https://doi.org/10.1002/ejsp.872
Li, Z., Hu, G., Xu, L., & Li, Q. (2021). Third-party punishment or compensation? It depends on the reputational benefits. Frontiers in Psychology, 12, 676064. https://doi.org/10.3389/fpsyg.2021.676064
Lieberman, D., & Linke, L. (2007). The effect of social category on third party punishment. Evolutionary Psychology, 5(2), 289–305. https://doi.org/10.1177/147470490700500203
Liu, X., Yang, X., & Wu, Z. (2021). To punish or to restore: How children evaluate victims’ responses to immorality. Frontiers in Psychology, 12, 696160. https://doi.org/10.3389/fpsyg.2021.696160
Martin, J. W., Jordan, J. J., Rand, D. G., & Cushman, F. (2019). When do we punish people who don’t? Cognition, 193, 104040. https://doi.org/10.1016/j.cognition.2019.104040
Masclet, D., Noussair, C., Tucker, S., & Villeval, M.-C. (2003). Monetary and nonmonetary punishment in the voluntary contributions mechanism. American Economic Review, 93(1), 366–380. https://doi.org/10.1257/000282803321455359
Molho, C., Twardawski, M., & Fan, L. (2022). What Motivates Direct and Indirect Punishment? Zeitschrift Für Psychologie, 230(2), 84–93. https://econtent.hogrefe.com/doi/10.1027/2151-2604/a000455
Molho, C., Tybur, J. M., Van Lange, P. A. M., & Balliet, D. (2020). Direct and indirect punishment of norm violations in daily life. Nature Communications, 11(1), 3432. https://doi.org/10.1038/s41467-020-17286-2
Nelissen, R. M. A., & Mulder, L. B. (2013). What makes a sanction “stick”? The effects of financial and social sanctions on norm compliance. Social Influence, 8(1), 70–80. https://doi.org/10.1080/15534510.2012.729493
Nowak, M. A., & Sigmund, K. (2005). Evolution of indirect reciprocity. Nature, 437(7063), 1291–1298. https://doi.org/10.1038/nature04131
Panchanathan, K., & Boyd, R. (2004). Indirect reciprocity can stabilize cooperation without the second-order free rider problem. Nature, 432(7016), 499–502. https://doi.org/10.1038/nature02978
Peirce, J., Gray, J. R., Simpson, S., MacAskill, M., Höchenberger, R., Sogo, H., Kastman, E., & Lindeløv, J. K. (2019). PsychoPy2: Experiments in behavior made easy. Behavior Research Methods, 51(1), 195–203. https://doi.org/10.3758/s13428-018-01193-y
Peterson, J. (2024). Observing coworkers’ violations and managers’ discipline: The effect of violation and punishment severity on coworkers. Journal of Leadership, Accountability and Ethics, 21(3). https://doi.org/10.33423/jlae.v21i3.7325 CrossRefGoogle Scholar
Pfattheicher, S., Sassenrath, C., & Keller, J. (2019). Compassion magnifies third-party punishment. Journal of Personality and Social Psychology, 117(1), 124141. https://doi.org/10.1037/pspi0000165 CrossRefGoogle ScholarPubMed
Philippsen, A., Mieth, L., Buchner, A., & Bell, R. (2023). Communicating emotions, but not expressing them privately, reduces moral punishment in a Prisoner’s Dilemma game. Scientific Reports, 13(1), 14693. https://doi.org/10.1038/s41598-023-41886-9 CrossRefGoogle ScholarPubMed
Preacher, K. J., & Hayes, A. F. (2008). Asymptotic and resampling strategies for assessing and comparing indirect effects in multiple mediator models. Behavior Research Methods, 40(3), 879891. https://doi.org/10.3758/BRM.40.3.879 CrossRefGoogle ScholarPubMed
Raihani, N. J., & Bshary, R. (2015). Third-party punishers are rewarded, but third-party helpers even more so. Evolution, 69(4), 9931003. https://doi.org/10.1111/evo.12637 CrossRefGoogle ScholarPubMed
Raihani, N. J., & Bshary, R. (2019). Punishment: One tool, many uses. Evolutionary Human Sciences, 1, e12. https://doi.org/10.1017/ehs.2019.12
Raihani, N. J., Thornton, A., & Bshary, R. (2012). Punishment and cooperation in nature. Trends in Ecology & Evolution, 27(5), 288–295. https://doi.org/10.1016/j.tree.2011.12.004
Redhead, D., Dhaliwal, N., & Cheng, J. T. (2021). Taking charge and stepping in: Individuals who punish are rewarded with prestige and dominance. Social and Personality Psychology Compass, 15(2), e12581. https://doi.org/10.1111/spc3.12581
Ripoll-Núñez, K. J., & Rohner, R. P. (2006). Corporal punishment in cross-cultural perspective: Directions for a research agenda. Cross-Cultural Research, 40(3), 220–249. https://doi.org/10.1177/1069397105284395
Rodrigues, J., Nagowski, N., Mussel, P., & Hewig, J. (2018). Altruistic punishment is connected to trait anger, not trait altruism, if compensation is available. Heliyon, 4(11), e00962. https://doi.org/10.1016/j.heliyon.2018.e00962
Santos, M. D., Rankin, D. J., & Wedekind, C. (2011). The evolution of punishment through reputation. Proceedings of the Royal Society B: Biological Sciences, 278(1704), 371–377. https://doi.org/10.1098/rspb.2010.1275
Schoenmakers, S., Hilbe, C., Blasius, B., & Traulsen, A. (2014). Sanctions as honest signals – The evolution of pool punishment by public sanctioning institutions. Journal of Theoretical Biology, 356, 36–46. https://doi.org/10.1016/j.jtbi.2014.04.019
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366. https://doi.org/10.1177/0956797611417632
Smetana, J. G., & Ball, C. L. (2019). Heterogeneity in children’s developing moral judgments about different types of harm. Developmental Psychology, 55(6), 1150–1163. https://doi.org/10.1037/dev0000718
Solomon, L. H., & Lee, Y. (2025). Not all punishment is equal: The effect of punishment severity on children’s social evaluations. Developmental Psychology, 61(2), 311–322. https://doi.org/10.1037/dev0001845
Spitzer, M., Fischbacher, U., Herrnberger, B., Grön, G., & Fehr, E. (2007). The neural signature of social norm compliance. Neuron, 56(1), 185–196. https://doi.org/10.1016/j.neuron.2007.09.011
Stallen, M., Rossi, F., Heijne, A., Smidts, A., De Dreu, C. K. W., & Sanfey, A. G. (2018). Neurobiological mechanisms of responding to injustice. The Journal of Neuroscience, 38(12), 2944–2954. https://doi.org/10.1523/JNEUROSCI.1242-17.2018
Statistisches Bundesamt. (2022, November 30). Pressemitteilung Nr. 501 vom 30. November 2022. Destatis. https://www.destatis.de/DE/Presse/Pressemitteilungen/2022/11/PD22_501_24311.html
Straus, M. A. (1991). Discipline and deviance: Physical punishment of children and violence and other crime in adulthood. Social Problems, 38(2), 133–154. https://doi.org/10.2307/800524
Sznycer, D., & Patrick, C. (2020). The origins of criminal law. Nature Human Behaviour, 4(5), 506–516. https://doi.org/10.1038/s41562-020-0827-8
Tepper, B. J. (2007). Abusive supervision in work organizations: Review, synthesis, and research agenda. Journal of Management, 33(3), 261–289. https://doi.org/10.1177/0149206307300812
Thielmann, I., Böhm, R., Ott, M., & Hilbig, B. E. (2021). Economic games: An introduction and guide for research. Collabra: Psychology, 7(1), 19004. https://doi.org/10.1525/collabra.19004
Turnbull, C. (1961). The forest people. Random House.
Tyler, T. R., Goff, P. A., & MacCoun, R. J. (2015). The impact of psychological science on policing in the United States: Procedural justice, legitimacy, and effective law enforcement. Psychological Science in the Public Interest, 16(3), 75–109. https://doi.org/10.1177/1529100615617791
Vaish, A., Herrmann, E., Markmann, C., & Tomasello, M. (2016). Preschoolers value those who sanction non-cooperators. Cognition, 153, 43–51. https://doi.org/10.1016/j.cognition.2016.04.011
Vaish, A., Missana, M., & Tomasello, M. (2011). Three-year-old children intervene in third-party moral transgressions. British Journal of Developmental Psychology, 29(1), 124–130. https://doi.org/10.1348/026151010X532888
van Dijk, E., & De Dreu, C. K. W. (2021). Experimental games and social decision making. Annual Review of Psychology, 72(1), 415–438. https://doi.org/10.1146/annurev-psych-081420-110718
van Doorn, J., Zeelenberg, M., Breugelmans, S. M., Berger, S., & Okimoto, T. G. (2018). Prosocial consequences of third-party anger. Theory and Decision, 84(4), 585–599. https://doi.org/10.1007/s11238-017-9652-6
Vittrup, B., & Holden, G. W. (2010). Children’s assessments of corporal punishment and other disciplinary practices: The role of age, race, SES, and exposure to spanking. Journal of Applied Developmental Psychology, 31(3), 211–220. https://doi.org/10.1016/j.appdev.2009.11.003
Wiessner, P. (2005). Norm enforcement among the Ju/’hoansi Bushmen: A case of strong reciprocity? Human Nature, 16(2), 115–145. https://doi.org/10.1007/s12110-005-1000-9
Wu, J., Balliet, D., & Van Lange, P. A. M. (2016). Gossip versus punishment: The efficiency of reputation to promote and maintain cooperation. Scientific Reports, 6(1), 23919. https://doi.org/10.1038/srep23919
Zhang, Z., & Qi, C. (2024). Teachers’ punishment intensity and student observer trust: A moderated mediation model. Behavioral Sciences, 14(6), 471. https://doi.org/10.3390/bs14060471

Figure 1 Overview of experiments, setup, and procedure. Note: The upper right part of the figure shows the two experiments and their independent variables (IVs). In each trial, participants read a vignette depicting a transgression and the subsequent punishment, always representing one specific condition. Participants’ task was to picture the scenario and then rate the punishment, the punisher, and their interaction tendency on rating scales (displayed below the questions). These ratings comprised the five dependent variables (DVs).

Figure 2 Rating means per condition and DV and correlations between DVs (Experiment 1). Note: (A) Perceived adequacy of the punishment. A rating of four indicates an adequate punishment; lower values indicate that the punishment was too weak and higher values that it was too strong. (B) Perceived warmth of the punisher. (C) Perceived competence of the punisher. (D) The correlation matrix displays Pearson correlations for all five ratings averaged across conditions. (E) The hypothetical willingness to befriend the punisher. (F) The hypothetical willingness to be part of a team led by the punisher. Error bars indicate standard errors. ns p ≥ .05, * p < .05, ** p < .01, *** p < .001. Detailed violin plots displaying participant-level dispersion are provided in Supplementary Figure S1.

Table 1 Rating means (M) and standard deviations (SD) for types of punishment by types of transgression (Experiment 1)

Table 2 Pairwise comparisons of ratings between types of punishment, separately for types of transgression (Experiment 1)

Figure 3 Results of mediation analyses for the effect of type of punishment on interaction tendencies mediated by perceptions of the punisher (Experiment 1). Note: prop = property-oriented, psych = psychological. c’ denotes the direct effect of type of punishment on the interaction tendency, and c denotes the total effect c’ + a * b. (A) The perceived warmth of the punisher fully mediated the effect on the willingness to befriend the punisher. (B) The perceived competence of the punisher partially mediated the effect on the willingness to accept the punisher as a team leader.

Figure 4 Rating means per condition and DV and correlations between DVs (Experiment 2). Note: (A) Perceived adequacy of the punishment. A rating of four indicates an adequate punishment; lower values indicate that the punishment was too weak and higher values that it was too strong. (B) Perceived warmth of the punisher. (C) Perceived competence of the punisher. (D) The correlation matrix displays Pearson correlations for all five ratings averaged across conditions. (E) The hypothetical willingness to befriend the punisher. (F) The hypothetical willingness to be part of a team led by the punisher. Error bars indicate standard errors. ns p ≥ .05, * p < .05, ** p < .01, *** p < .001. Detailed violin plots displaying participant-level dispersion are provided in Supplementary Figure S2.

Table 3 Rating means (M) and standard deviations (SD) for types of punishment by severities of punishment (Experiment 2)

Figure 5 Results of mediation analyses for the effect of type of punishment on interaction tendencies mediated by perceptions of the punisher (Experiment 2). Note: prop = property-oriented, psych = psychological. c’ denotes the direct effect of type of punishment on the interaction tendency, and c denotes the total effect c’ + a * b. (A) The perceived warmth of the punisher partially mediated the effect on the willingness to befriend the punisher. (B) The perceived competence of the punisher partially mediated the effect on the willingness to accept the punisher as a team leader.
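
The decomposition named in the mediation figure notes (total effect c = c’ + a * b) can be illustrated with a minimal sketch. It is not the authors' analysis: the data are simulated, the 0/1 coding of punishment type is a hypothetical choice, the helper names `ols` and `mediation_paths` are invented for illustration, and the repeated-measures structure of the experiments and any test of the indirect effect are ignored. It only shows that, for ordinary least squares fitted on the same sample, the total effect equals the direct effect plus the product of the two mediation paths.

```python
import numpy as np

def ols(y, *predictors):
    """Return OLS coefficients [intercept, b1, b2, ...] of y on the predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def mediation_paths(x, m, y):
    """Estimate the paths of a simple single-mediator model."""
    a = ols(m, x)[1]                 # path a: predictor -> mediator
    c_total = ols(y, x)[1]           # total effect c: predictor -> outcome
    _, c_direct, b = ols(y, x, m)    # direct effect c' and path b (mediator -> outcome)
    return {"a": a, "b": b, "c_direct": c_direct,
            "c_total": c_total, "indirect": a * b}

# Simulated data (assumption): x = punishment type (0 = prop, 1 = psych),
# m = perceived warmth, y = willingness to befriend the punisher.
rng = np.random.default_rng(0)
n = 200
x = rng.integers(0, 2, size=n).astype(float)
m = 0.8 * x + rng.normal(size=n)
y = 0.5 * m + 0.2 * x + rng.normal(size=n)

paths = mediation_paths(x, m, y)
# The OLS identity c = c' + a * b holds up to floating-point error.
assert np.isclose(paths["c_total"], paths["c_direct"] + paths["indirect"])
```

In this framing, full mediation corresponds to a direct effect c’ that is no longer distinguishable from zero once the mediator is included, and partial mediation to a c’ that is reduced but remains.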

Supplementary material: File

Seubert and Böckler supplementary material