
Moral Foundation Measurements Fail to Converge on Multilingual Party Manifestos

Published online by Cambridge University Press:  15 August 2025

Marvin Stecker
Affiliation:
Department of Government, University of Vienna, Vienna, Austria; Department of Communication, University of Vienna, Vienna, Austria
Frederic R. Hopp*
Affiliation:
Leibniz-Institute for Psychology (ZPID), Trier, Germany
Corresponding author: Frederic R. Hopp; E-mail: fhopp@leibniz-psychology.org

Abstract

Moralising language is a powerful rhetorical tool for signaling political identity, persuading audiences, and mobilising voters. The valid and reliable classification of moral language is therefore a critical objective for political scientists. Recent advances in automated text analysis have introduced myriad new strategies for measuring morality in language, but have often produced conflicting, inconclusive findings. We investigate whether this diversity of moral content analyses might partially explain inconclusive findings, using a large corpus of political manifestos in four different languages (N=810 manifestos). Our results show that, despite starting from the same framework of Moral Foundations Theory (MFT), different instruments and underlying methodologies lead to remarkably different results for extracting moral foundations. Reproducing a previous study on political parties’ ideology and their use of moral foundations, we find that different measurements can lead to opposite effect directions. We discuss the relevance of our findings for research at the intersection of politics and moral rhetoric using automated text analysis.

Information

Type
Article
Creative Commons
Creative Commons License: CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of The Society for Political Methodology

1 Introduction

The use of moralising language by political elites in Western democracies has steadily increased over the past few decades (Simonsen and Widmann 2023). Political candidates often appeal to moral values as an effective strategy to position themselves (Hackenburg, Brady, and Tsakiris 2023), persuade undecided voters (Miles 2016), and mobilize their voter base (Jung 2020). In view of moralising language’s motivational relevance for political mobilization, mounting efforts aim to extract and classify the latent moral values present in political elite communication (Bos and Minihold 2022; Lewis 2019; Wang and Inbar 2020).

A prolific theoretical framework for analysing morality in political corpora is Moral Foundations Theory (MFT; Graham et al. 2013), which (originally) proposed a set of five different foundations that are assumed to guide moral reasoning (Footnote 1): care/harm, fairness/cheating, loyalty/betrayal, authority/subversion, and sanctity/desecration. Previous studies have documented the prevalence of moral foundations from the formal manifestos of political parties (Bos and Minihold 2022) to more informal social media debates (Bayrak and Alper 2021), as well as with respect to political (Van Vliet 2021) and ethical issues (Clifford and Jerit 2013).

At the same time, the vast, ephemeral, and multilingual sphere in which political communication operates, coupled with the latent, context-dependent nature of moral foundations (Hopp and Weber 2021), presents several challenges for quantifying moral foundations in political messages. In efforts to resolve these challenges, numerous moral foundation measurement instruments have been introduced in recent years, often by different teams from disparate disciplines (Garten et al. 2018; Hopp et al. 2021; Preniqi et al. 2024). This proliferation of unstandardized tools and procedures for quantifying morality, coupled with the often opaque motivations behind the selection of particular instruments or methods, is becoming a growing concern for political science. First, there is growing uncertainty in choosing the appropriate method for examining a particular type of text or addressing a specific research question. Second, meta-analyses concerning the use and consequences of moral language become elusive, as results are based on different measurement instruments that may not converge. Indeed, Frimer (2020) and Hopp et al. (2021) show that measurement choices in how to operationalize moral foundations in text impact the interpretation of substantive research questions. Third, MFT’s postulated relationship between political ideology and the endorsement of particular moral foundations has largely been substantiated via behavioural (Hatemi, Crabtree, and Smith 2019; Kivikangas et al. 2021) and neurological measurements (Hopp et al. 2023), but findings are much less clear-cut when looking at political rhetoric (Kraft and Klemmensen 2024). It currently remains unknown to what extent these theoretical inconsistencies arise from the employment of different moral foundation measurements.

In view of these considerations, we herein take stock of the current tool-pile for operationalizing moral foundations in political corpora. After reviewing existing moral foundation measurement instruments and procedures, we systematically apply and compare them across a large corpus of political manifestos in four different languages (N=810 manifestos; n=961,455 sentences). Despite starting from the same framework of MFT, our findings show that different moral foundation instruments and underlying methodologies lead to remarkably different results for extracting moral foundations. We then contextualize these differences by reproducing a previous study on political parties’ ideology and their use of moral foundations, highlighting that different measurements can indeed lead to opposite effect directions. Finally, we discuss the relevance of our findings for research at the intersection of politics and moral rhetoric using automated text analysis and offer best practices for future research.

2 Measurement Strategies for Extracting Moral Foundations from Text

Researchers wishing to computationally quantify moral foundations in natural language are faced with an increasing and continuously advancing arsenal of measurement tools. While the true scope of existing strategies for extracting moral foundations is larger, we review the most widely used applications below.

Originally, moral foundations were detected via the Moral Foundations Dictionary (MFD; Graham, Haidt, and Nosek 2009), a document containing words that signal the upholding or violation of distinct moral foundations. Simply by counting how frequently words of the MFD appear in natural language, many insights have been gained linking moral foundations to societal phenomena (for an overview, see, e.g., Hopp et al. 2021, Supplemental Table 1). However, the MFD has been critiqued on both theoretical and methodological grounds: Because its words were manually and deliberately selected by a few domain experts, some argue that it may exclude broader moral communities and misrepresent moral judgement as a rational, rather than intuitive, process, contrary to what MFT dictates (Hopp and Weber 2021). Others (Garten et al. 2018) lamented the small size and context-agnostic word-count logic of the MFD, which limits its ability to detect moral foundations in smaller documents such as social media posts. Together, these critiques have resulted in several advancements:

First, by relying on a large crowd of human coders and an intuitive highlighting procedure, Hopp et al. (2021) created the extended MFD (eMFD). As a result, words in the eMFD reflect the moral signal of a large pool of individuals, and associations of words with moral foundations were derived from spontaneous reactions to contextualized natural language. The eMFD has quickly become a popular alternative to the MFD, with over 200 citations on Google Scholar to date.

Second, Garten et al. (2018), building on the work of Sagi and Dehghani (2014), suggest discarding word-count-based strategies and instead measuring the semantic similarity of textual documents to moral foundations. These Distributed Dictionary Representations (DDR) exploit the fact that words that are frequently mentioned in close proximity share a semantic basis, which can be captured by mapping words to vectors in a high-dimensional embedding and subsequently measuring the similarity of individual or aggregated word vectors (e.g., via cosine similarity). By computing the vector representation of textual documents and measuring their similarity to vectors representing moral foundations (e.g., via aggregated vectors of prototypical words such as care, harm, etc.), DDR captures the contextual representation of texts and bypasses the necessity for explicit word-based matches. DDR has become an established strategy for measuring moral foundations in text (Kennedy et al. 2023; Wang and Inbar 2020) and for pre-selecting documents for further manual annotation (Atari et al. 2023). Importantly, DDR has also been used to develop the MFD2 (Frimer et al. 2019), simply by extending the MFD’s original word list through semantically related terms.

Third, a recent approach analogous to DDR is Contextualized Concept Representations (CCR; Atari, Omrani, and Dehghani 2023), where moral foundations are not represented via vectors of aggregated seed words as in DDR, but instead via aggregated sentence-level representations taken from validated psychometric instruments such as the Moral Foundations Questionnaire (MFQ; Graham et al. 2011).

Finally, the advent of large language models (LLMs) has also introduced novel strategies for classifying moral foundations in textual documents. Put simply, LLMs are based on deep neural networks trained on millions of textual documents to infer the sequential nature of human language. Through a process termed fine-tuning, such pre-trained LLMs can be fed a smaller number of textual documents manually annotated for moral foundations (e.g., Hoover et al. 2020; Trager et al. 2022) to learn which word sequences reflect particular moral foundations. Mounting evidence demonstrates that fine-tuned LLMs reach state-of-the-art accuracy for the binary detection of moral foundations in textual documents (Preniqi et al. 2024), yet reservations remain concerning the (un)biased aggregation of training labels in view of morality’s highly subjective nature (Mokhberian et al. 2024).

In sum, the extraction of moral foundations from textual documents is an active and quickly advancing field with important applications within political science. At the same time, the proliferation of measurement strategies and underlying ontologies has resulted in a tool-pile that threatens the standardized quantification of moral foundations in political texts. While we sympathize with George Box’s famous notion that “all models are wrong, but some are useful”, we currently do not know whether text analysis methods for moral foundations are “wrong” in idiosyncratic ways, leading to inconsistent and potentially incomparable results. If, by contrast, all methods err in a similar direction, errors are more likely to be systematic and can be accounted for, providing a shared framework within which corrections and improvements can be made. Accordingly, we set out to illuminate the heterogeneity of results produced by different moral foundations text analysis methods.

3 Empirical Strategy

We analyze the divergence or convergence between different automated measurements of morality in political party communication. To this end, we follow the logic of multiverse analysis (Pipal, Song, and Boomgaarden 2022; Steegen et al. 2016), which draws attention to the effects that different operationalization and measurement strategies have on subsequent empirical analyses. Our approach is similar in motivation, as well as execution, to that of Chan et al. (2021), who show that different sentiment measurement dictionaries can lead to diverging substantive results. Our focus is expanded, however, to include more text analysis techniques now common to the social sciences beyond word-count approaches, and we aim to be sensitive to cultural and language-specific differences in the measurement devices.

3.1 Data

Our analysis is based on all annotated manifestos from the corpus of the Manifesto Project (Lehmann et al. 2024) available in English, Spanish, German, and Dutch. This leaves us with 810 manifestos by 324 parties in 29 countries. Tables 1 and 2 show the distribution of manifestos per language and country. We also utilize the validated English translation provided by the Manifesto Project for each of the manifestos (see Ivanusch and Regel 2024; Lehmann et al. 2024 for more details). Machine translations have now reached acceptable quality standards for research purposes (de Vries, Schoonvelde, and Schumacher 2018; Licht et al. 2024).

Table 1 Descriptives for each language.

Table 2 Descriptives for the top ten countries.

This language selection covers a diversity of countries and political systems, while also ensuring that the authors have adequate language skills in each of the languages. We combine manifesto scores with expert scores from the Chapel Hill Expert Survey (CHES; Jolly et al. 2022) and utilise the ParlGov dataset to establish government composition for each country (Döring and Manow 2024).

3.2 Research Questions

To begin with, we focus on the agreement between different moral foundations measurements. We score each sentence with the same instruments, focusing on the vices and virtues of each moral foundation. These scores are then averaged at the document level for further analysis (see the sketch after RQ1). To assess measurements that are not only comparable but identical, while taking into account variations within the political cultures of different countries, we first focus only on manifestos in English (whether original or translated).

RQ1: Do different measurements of moral foundations produce similar outcomes when scoring the same text in English?
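For illustration, a minimal sketch (not the authors’ published code) of the sentence-to-document aggregation step described above; the column names are hypothetical placeholders:

```python
# Illustrative aggregation step: sentence-level foundation scores are
# averaged per manifesto before further analysis.
import pandas as pd

sentence_scores = pd.DataFrame({
    "manifesto_id": ["m1", "m1", "m2"],
    "care_virtue": [0.10, 0.00, 0.25],
    "care_vice": [0.00, 0.05, 0.00],
})

# Document-level score: mean over all sentences of a manifesto.
doc_scores = sentence_scores.groupby("manifesto_id").mean()
print(doc_scores)
```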

While machine translations have advanced substantially over the last decade, we do not suggest that the dominance of English-language content analysis should continue (Baden et al. 2022). Rather, to highlight the diversity of measurement devices available, we compare the effects observed in RQ1 when using identical measurements to results obtained with comparable measurements. Each manifesto is therefore scored with a respective equivalent instrument available in its original language. This allows us to compare the manifestos scored in their English translation, as well as in their original language.

RQ2: How similar are scores by English-language moral foundations measurements to their counterparts in non-English languages when scoring the same text in different languages?

Lastly, to examine the downstream effects of choosing particular moral foundations extraction methods, we replicate a previously published study (Bos and Minihold 2022) on the relationship between political ideology and the use of moral foundations in political manifesto statements, again using the English-language translations. This example goes to the heart of the mixed results discussed above: we test whether the inconclusive research results might be (partially) explained by the use of different measurement devices. We specifically focus on the cultural GAL-TAN dimension (Footnote 2), rather than general left-right placements or economic policies, because these policy fields are expected to feature more moral rhetoric (Knill 2013; Mooney and Schuldt 2008), possibly attenuating party differences. We are interested in the similarity or difference we might find between different moral foundations operationalizations. Are differences related to the strength of effect sizes one might find (Type M errors), or even to the direction of relationships (Type S errors) (Gelman and Carlin 2014)?

RQ3: What effect do different measurements of moral foundations have on supporting or rejecting hypotheses on the relationship between a party’s ideology and its use of moral foundations in its manifestos?

3.3 Measurements

In our implementation, we compare the following text-based measurements of moral foundations (Table 3). Our focus lies on methods that might be described as off-the-shelf, insofar as they are readily available and do not require the annotation of a larger training corpus for use with supervised machine learning approaches (Footnote 3). These are also potentially the most “tempting” tools to be used without proper validation of their functioning: the creation of one’s own annotation corpus and the training of machine learning models necessarily require paying close attention to performance and validity metrics, while off-the-shelf tools might present themselves as appealing due to their widespread adoption and ease of implementation.

Table 3 Measurement Tools.

3.3.1 MFD: Moral Foundations Dictionary

The MFD was originally created by Graham et al. (2009). It contains 295 words for the virtues and vices of each moral foundation in English, developed by the authors through thesauruses and discussions amongst themselves. Each foundation is, therefore, associated with a list of words. The authors initially validated it on sermons by progressive or conservative church congregations in the United States. We use the original version of the dictionary by Graham et al. (2009), relying on the R package quanteda (Benoit et al. 2018) and the dictionary utilities provided in quanteda.dictionaries (Benoit 2023). Sentences are scored on each moral foundation on a range of 0 to 1, as the percentage of moral words in each foundation relative to the overall number of words in the sentence (see the sketch below). For the multilingual part, we use the translated dictionaries from Bos and Minihold (2022) for Dutch and German, and Carvalho and Guedes (2022) for the Spanish language, again applying them through quanteda.dictionaries. We remove stop words in each language using the NLTK stop words list (Bird, Klein, and Loper 2009).
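To make the word-count logic concrete, here is a minimal Python sketch of this scoring rule (the study itself uses quanteda in R; the dictionary excerpt is a toy assumption, not the actual MFD entries):

```python
# Word-count scoring sketch: share of dictionary hits per foundation
# among the (non-stop-word) tokens of a sentence.
from nltk.corpus import stopwords  # requires nltk.download("stopwords")

MFD = {  # toy excerpt; the real MFD contains 295 entries
    "care_virtue": {"safe", "peace", "compassion"},
    "care_vice": {"harm", "suffer", "war"},
}
STOP = set(stopwords.words("english"))

def score_sentence(sentence: str) -> dict:
    tokens = [t for t in sentence.lower().split() if t not in STOP]
    n = len(tokens) or 1
    return {cat: sum(t in words for t in tokens) / n
            for cat, words in MFD.items()}

print(score_sentence("We must protect the peace and prevent harm"))
```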

3.3.2 MFD2: Moral Foundations Dictionary 2.0

The second version of the MFD (MFD2) was developed by Frimer et al. (2019). To improve upon the original dictionary, they selected new “seed” words to define each moral foundation and used cosine-similarity scores, estimated with a (not further detailed) word2vec model, to refine the word lists. Overall, they expanded the dictionary to a total of 2103 words. For validation, they recruited a geographically diverse sample of crowd workers, asking them to write an essay about specific moral foundations. As with the MFD, we use the original dictionary provided by Frimer et al. (2019), using quanteda to apply it to our dataset. Sentences are scored equivalently to the MFD, also including stop word removal.
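For illustration, a hedged sketch of the expansion logic described above, using gensim’s word2vec interface; the embedding file and seed words are assumptions, since the exact model used by Frimer et al. is not documented:

```python
# Dictionary-expansion sketch: nearest neighbours of averaged seed
# vectors become candidate entries, to be vetted by hand afterwards.
from gensim.models import KeyedVectors

# Placeholder path to a pre-trained word2vec model in binary format.
vectors = KeyedVectors.load_word2vec_format("word2vec.bin", binary=True)

seeds = ["care", "compassion", "nurture"]  # toy care/virtue seeds
candidates = vectors.most_similar(positive=seeds, topn=20)
for word, similarity in candidates:
    print(f"{word}\t{similarity:.2f}")
```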

3.3.3 eMFD: extended Moral Foundations Dictionary

The eMFD was developed by Hopp et al. (2021). It also relies on word counts to score documents, yet it was developed via crowdsourced annotators who were given news texts and instructed to highlight portions of the texts they understood to be related to a given moral foundation. These extracted samples were then preprocessed into individual words, each assigned a normalised probability of indicating a given moral foundation, alongside sentiment scores to distinguish virtues and vices. The resulting dictionary comprises 3207 words, slightly more than the MFD2. The authors validated it on a different sample of news texts by scoring these in comparison to the MFD and MFD2, comparing partisan news sources, and using it to predict article sharing behaviours (Hopp et al. 2021). To our knowledge, no translation has yet been undertaken for the eMFD, meaning we can only apply it to English-language texts. We apply the eMFD through the scoring functions from eMFDscore (Hopp et al. 2021) integrated into spaCy (Montani et al. 2023), using all foundation scores for each word, with the vice and virtue dimensions split from each other. Stop words are removed as previously.
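A minimal sketch of the eMFD’s probabilistic scoring logic (the study uses the eMFDscore package; the dictionary entries and values below are illustrative assumptions, not actual eMFD probabilities):

```python
# eMFD-style scoring sketch: average the foundation probabilities of
# matched words, routing them to the vice or virtue pole by sentiment.
EMFD = {
    # word -> (foundation probabilities, sentiment in [-1, 1])
    "protect": ({"care": 0.62, "fairness": 0.11}, 0.8),
    "cheat":   ({"care": 0.05, "fairness": 0.71}, -0.9),
}

def score(tokens):
    hits = [EMFD[t] for t in tokens if t in EMFD]
    if not hits:
        return {}
    out = {}
    for probs, sentiment in hits:
        pole = "virtue" if sentiment >= 0 else "vice"
        for foundation, p in probs.items():
            key = f"{foundation}_{pole}"
            out[key] = out.get(key, 0.0) + p / len(hits)
    return out

print(score(["we", "protect", "families"]))
```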

3.3.4 DDR: Distributed Dictionary Representation

DDR relies on word-embedding approaches for its assessment of the presence of moral foundations rather than on word counts. DDR has been gaining traction in different social science applications, especially because of its conceptual flexibility and the increasing availability of pre-trained embeddings for many languages (see, for example: Daenekindt and Schaap 2022; Stoltz 2019; Voyer, Kline, and Danton 2022). It is based on ideas by Garten et al. (2018) and essentially averages the word vectors of a statement and calculates their cosine similarity to the word vectors of “seed” words (e.g., “finance” and “culture” to classify news stories), scoring statements between 0 and 1. This might make it attractive for researchers to employ because of a (deceptively) lower workload, since the requirements shift from having an as-exhaustive-as-possible word-count dictionary to specifying and validating far fewer seed words. In our application, we utilize the Python module gensim (Řehůřek and Sojka 2010) to calculate cosine similarities, using the multilingual embeddings from Wirsching et al. (2025) and the seed word list originally used by Garten et al. (2018), translated by the authors and validated with the help of native speakers.
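A minimal DDR sketch under stated assumptions (a gensim KeyedVectors file stands in for the Wirsching et al. embeddings; the seed words are illustrative):

```python
# DDR sketch: average the word vectors of a statement and compare
# them to the averaged vectors of concept "seed" words.
import numpy as np
from gensim.models import KeyedVectors

vectors = KeyedVectors.load("embeddings.kv")  # placeholder file name

def mean_vector(words):
    vecs = [vectors[w] for w in words if w in vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(vectors.vector_size)

def ddr_score(sentence, seeds):
    doc, concept = mean_vector(sentence.lower().split()), mean_vector(seeds)
    # raw cosine similarity; a rescaling to the 0-1 range used in the
    # paper would follow here
    return float(np.dot(doc, concept)
                 / (np.linalg.norm(doc) * np.linalg.norm(concept)))

print(ddr_score("They harmed vulnerable people", ["care", "harm", "kindness"]))
```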

3.3.5 CCR: Contextual Concept Representation

Contextual Concept Representation (CCR) is similar to DDR but relies on different indicators of moral foundations. It is based on ideas by Atari et al. (2023), who suggest using sentence embeddings based on the BERT transformer architecture (Vaswani et al. 2017; Devlin et al. 2019) instead of the static word embeddings, like word2vec, that form the basis of DDR. Using these semantically richer embeddings, the to-be-scored documents are compared not to individual seed words, but instead to, for example, items from psychometric scales. In this way, Atari et al. (2023) suggest bridging decades-old experimental research and advances in Natural Language Processing. In our case, we can rely on established measurement instruments to measure moral foundations in the text documents: We use the “Agreement” questions (Footnote 4) of the original MFQ (Graham et al. 2011), as well as its translations into German (Jockel et al. 2010), Dutch (van Leeuwen 2010), and Spanish (Bedregal and León 2008), to classify the virtue dimensions of the foundations. To calculate the vices, we rely on the Moral Foundations Vignettes: short experimental stimuli that describe violations of each particular foundation, e.g., a text describing how someone behaves immorally and unfairly in a game and therefore violates the “fairness” dimension. They are available in English (Clifford et al. 2015), Spanish (Aguiar, Corradi, and Aguilar 2023), and Dutch (Hopp et al. 2024); German translations have been completed by the authors.

This approach represents a slight divergence from the previous measurement setups, as our opposing vice and virtue poles are not measured equivalently from the same definitions; rather, we take two separate sources of reference materials to score the dimensions. We believe this is warranted, however, as we wanted to measure both opposing poles with genuinely distinct measurements (as in all other approaches). If we only used, e.g., the vignettes of moral violations, then we could only calculate the virtue scores as inversions of the vice scores. This would also mean we could not compare correlations across virtues and vices.

To calculate embeddings, we utilise Sentence-BERT for Python (Reimers and Gurevych 2019, 2020), with the monolingual “all-MiniLM-L6-v2” model for all English-translated texts and the multilingual “paraphrase-multilingual-MiniLM-L12-v2” model for all statements in their original language, and calculate cosine similarities in the range of 0 to 1.
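A minimal CCR sketch of this setup, using the named Sentence-BERT model; the two MFQ-style items below are paraphrased placeholders rather than the exact instrument wording:

```python
# CCR sketch: embed a sentence and compare it to the averaged
# embedding of questionnaire items representing a foundation.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Placeholder MFQ-style items for the Care (virtue) pole.
care_items = [
    "Compassion for those who are suffering is a crucial virtue.",
    "One of the worst things a person can do is hurt a defenseless animal.",
]
concept = model.encode(care_items).mean(axis=0)

sentence = model.encode("We will fund shelters for the homeless.")
print(float(util.cos_sim(sentence, concept)))
```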

3.3.6 MoralBERT

MoralBERT (Preniqi et al. 2024) is a recently developed series of ten pretrained language models (PLMs) fine-tuned for classifying each moral foundation’s polarity (vice/virtue) in text corpora. The basis for MoralBERT is a set of English social media corpora (Twitter, Reddit, Facebook) annotated by human coders for the presence of moral foundations. MoralBERT was then developed by training (fine-tuning) the popular PLM BERT to detect the presence/absence of each moral foundation’s polarity (vice versus virtue). Notably, the classification performance of MoralBERT outperformed several baseline models, including word-count and embedding-based approaches, as well as predictions generated by GPT-4.

MoralBERT is open-source and can be freely downloaded to score textual input documents for the presence of each moral foundation. Specifically, when scoring text documents with MoralBERT, a series of ten (independent) scores is returned (one for each moral foundation’s vice/virtue category), indicating the likelihood (0 to 1) that a text contains the respective moral foundation category. Accordingly, we applied MoralBERT to all English-translated texts of our sample.
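A hedged sketch of this scoring step using the Hugging Face transformers pipeline; the checkpoint path is a placeholder, and the actual model identifiers should be taken from the MoralBERT release:

```python
# MoralBERT-style scoring sketch: one binary classifier per
# foundation/pole category.
from transformers import pipeline

care_virtue_clf = pipeline(
    "text-classification",
    model="path/to/moralbert-care-virtue",  # placeholder checkpoint
)

result = care_virtue_clf("We must protect the most vulnerable among us.")
# Expected shape of the output: [{"label": ..., "score": ...}], where
# the score is read as the 0-1 likelihood described above.
print(result)
```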

4 Results

To investigate the different measurements of moral foundations, we proceed in three steps. First, to compare identical implementations, we measure agreement between all (original or machine-translated) English manifestos (RQ1). We then repeat this with similar measurements using all manifestos in their original language, before moving on to one example application for a substantive research question.

4.1 RQ1: Descriptive Results

First, we provide an overview of the descriptive results, beginning with each of the English measurement instruments. Figure 1 shows the distribution of morality scores for each measure, normalised with min-max scaling, for all moral foundations on the English-translated manifestos. We observe stark differences in the scores.

Figure 1 Distributions of all morality measurements applied to the corpus of English-translated manifestos (normalized).

As expected for count-based distributions, the dictionary word-count methods show peaks around zero, meaning that the majority of sentences receive no score because they contain none of the exact word matches required by the dictionary. Here, the MFD2, which contains more words than the original MFD, shows a more dispersed distribution. The eMFD scores, normalized, show a much higher mean and are also more normally distributed. The DDR measurement instead has the highest mean of all measurements, while the CCR and MoralBERT measurements show a wider distribution than the DDR.

Turning to the non-normalised scores, reported in detail in Table A.2 in the Supplementary Material, we find a similar picture for the word-count dictionaries (MFD: $M = .05$, $SD = .05$, $CV = .93$; MFD2: $M = .06$, $SD = .06$, $CV = .89$; eMFD: $M = .05$, $SD = .01$, $CV = .19$). The MFD and MFD2 show a greater variance than the eMFD, however. The DDR measurement, non-normalized, is higher in its mean ($M = .43$, $SD = .03$, $CV = .07$) than CCR ($M = .06$, $SD = .03$, $CV = .57$), but shows smaller variance, as measured by the coefficient of variation (CV). The BERT-based MoralBERT shows the lowest non-normalized mean score, but the highest overall variation ($M = .04$, $SD = .05$, $CV = 1.19$). Although these diverging frequency distributions may not directly speak to instruments’ potential measurement problems and may simply express the expected variation across different moral foundation measurement approaches, they provide useful information about the intensity and structure of the extracted moral signal. Hence, these comparisons can inform which method may lend itself more readily to particular downstream tasks. If the aim is to identify a potentially large corpus of generally morally relevant messages (e.g., for subsequent use in more fine-grained human annotation studies), the rate of false positives may be less of a concern than potentially missing texts due to an overall low or absent moral signal. Here, measurement approaches that extract, ceteris paribus, a higher moral signal (e.g., DDR) may be preferred. However, if researchers wish to use computed morality scores in inferential models, then evaluating the normality and cross-correlation of computed scores from different approaches should be given more weight. Thus, we next evaluated the correlation of computed moral scores within and across moral foundation measurement approaches.

Looking beyond density distributions, Figure 2 shows the Kendall’s $\tau$ correlation matrices for the different measurements (Footnote 5): within each measurement method, we correlate all different moral foundations with each other across vice and virtue poles. For the MFD and MFD2, we can barely observe any correlations between the different dimensions. The eMFD shows higher correlations, mostly stable across all different foundations, between 0.3 and 0.5. Notably, the eMFD is the only measurement instrument to show negative correlations between vice and virtue dimensions, meaning these poles are more symmetrical to each other than in the other instruments, where correlations are positive or non-existent.

Figure 2 Mean Kendall’s correlations between moral foundations within measurements.

More notable correlations are found within the DDR and CCR measurements. DDR correlations are large, mostly across the board, in the range of 0.6–0.9. CCR shows a larger spread in correlations, ranging from 0.2 to 0.8. The MoralBERT scores, lastly, are correlated to varying degrees and in varying directions: some show no relation to each other, and there is no clear distinction between Vices and Virtues in their cross-correlations, which range from -0.5 to 0.5. Together, these results complement the previous comparison of frequency distributions in meaningful ways. If researchers are interested in examining potential differences in the expression of moral foundations across political texts, then instruments and approaches that simultaneously capture a high moral signal, while still being able to distil this signal into its constituent moral domains, are preferable (e.g., MoralBERT). Analogously, if the aim is to uncover to what extent political texts reflect moral virtues or moral vices, then instruments should be chosen that draw clearer distinctions between these poles (e.g., eMFD and CCR). Finally, to obtain more stable model fits and reduce multicollinearity, scoring methods that yield normally distributed scores (cf. Figure 1) with lower internal correlations (e.g., eMFD and MoralBERT) present promising solutions.
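Computationally, each within-instrument matrix of this kind reduces to a Kendall’s $\tau$ correlation over the instrument’s foundation-pole columns; a self-contained sketch with toy random scores:

```python
# Within-measurement comparison sketch: Kendall's tau matrix across
# foundation/pole scores of a single instrument (toy random data).
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
doc_scores = pd.DataFrame(
    rng.random((50, 4)),
    columns=["care_virtue", "care_vice", "fairness_virtue", "fairness_vice"],
)

print(doc_scores.corr(method="kendall").round(2))
```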

To further investigate RQ1 beyond distributions of the individual measures, we turn towards correlations of the measurements on the English manifestos. Note that each manifesto has been scored with every measurement for every moral foundation, meaning we have ample data to compare the different methods.

Figure 3 shows the average Kendall’s $\tau$ of all foundations across the different measurement instruments we used (Footnote 6). Surprisingly, correlations between the measurement tools are overall very low, to the point where, for most pairs, no connection can be said to exist. Most notable is the correlation between the original MFD and MFD2, which reflects the extent to which the MFD2 is a direct extension of the MFD, expanding but keeping the original vocabulary. The eMFD, even though similar in functionality to word-count approaches, shows no relationship to either of them, demonstrating that different construction methods for building dictionaries of the same concept can yield very different results. Despite the high number of zero-annotations returned by the MFD, MFD2, and eMFD, there are no strong correlations between these measurements. DDR mostly shows no relation to any of the other methodologies, while CCR correlates slightly with MoralBERT and the eMFD. The eMFD is thus the word-count method most closely aligned with the more advanced embedding-based methods, DDR excepted.

Figure 3 Mean Kendall’s correlation between all foundations across measurements.

4.2 RQ2: Multilingual Variation

After inspecting the results of using the exact same measurements on English-translated texts, we turn toward the effect of using similar measurements in multiple languages. As we have discussed, all of our detailed approaches are applicable to multiple languages, either through the translation of original dictionaries or through the use of multilingual (aligned) embeddings. For RQ2, we computed Kendall’s $\tau$ after scoring the same manifesto in its English translation with the original English instrument and in its original language with the respective (translated) instrument (see the Methods section for details on each of the implementations; a sketch of the comparison follows below). Ideally, we should see high correlations, as this would indicate that the translated instruments score the same document similarly to the original English measurement. The resulting correlations are presented in Figure 4, separated by language, moral foundation, and dimension (Footnote 7).
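A minimal sketch of this paired comparison, assuming two vectors of document-level scores for the same manifestos (toy values):

```python
# RQ2 comparison sketch: Kendall's tau between scores of the same
# manifestos, once scored in English translation and once in the
# original language (toy values).
from scipy.stats import kendalltau

scores_english = [0.12, 0.30, 0.05, 0.22, 0.17]
scores_original = [0.10, 0.33, 0.08, 0.18, 0.21]

tau, p_value = kendalltau(scores_english, scores_original)
print(f"tau = {tau:.2f}, p = {p_value:.3f}")
```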

Figure 4 Kendall’s correlations between manifestos translated into English and scored using the original English instruments with manifestos scored in the original language using translated versions of the original instruments.

We note very different results for the different measurement instruments. For CCR, correlations are the most stable of all measurements across the board. Whether we use the English-language MFQ and vignettes or those in the original language of the document to be scored, we mostly see correlations around 0.6. The Virtue dimension, measured via similarity to the questionnaire, correlates better across languages overall than the Vice dimension, which is based on the vignettes of moral violations. The lowest correlation scores for CCR are all in the Vice dimension, particularly for Loyalty and Sanctity.

DDR results are similar, again with some variance. Most correlations are above 0.6, with no clear patterns across Vice or Virtue poles, or across languages. The Care foundation seems the hardest to capture well, with the biggest spread of all dimensions.

The MFD, lastly, shows the greatest variety but also the lowest correlations overall, with scores ranging from 0 to above 0.5. Again, very few clear patterns emerge between languages or Vice/Virtue dimensions, though the German dictionary in particular struggles with the Authority and Sanctity dimensions.

4.3 RQ3: Multiverse Analysis of Substantive Results

For RQ3, we investigate the impact that different measurement instruments have on answering substantive research questions, using the English-translated manifestos. That different measurement instruments correlate weakly with each other does not necessarily imply that they substantially impact the conclusions drawn in applied work.

Returning to our original motivation, we focus on the classic link between moral foundations and political ideology, which has produced inconclusive results so far. We mostly adhere to the study design of Bos and Minihold (2022), who asked whether a party’s classification on the GAL-TAN scale (Green-Alternative-Libertarian versus Traditional-Authoritarian-Nationalist) predicts the use of moral rhetoric in its party manifestos. They focused only on German and Dutch manifestos, whereas we additionally include (the English translations of) Spanish and English-language manifestos in our analysis.

Furthermore, Bos and Minihold (2022) used the original English and translated (German and Dutch) versions of the word-count-based MFD, which we also included, but we additionally employed the CCR and DDR measures. However, Bos and Minihold (2022) dichotomize the MFD’s returned word-count scores into present/not-present, whereas we score documents with continuous values. As in the original study, we use the party positioning scores by CHES as our main independent variable, with which we predict the morality score of a manifesto according to the different measurements. While Bos and Minihold (2022) also control for the sentiment of the manifesto sentences, due to the multilingual nature of our analysis, and to keep results more comparable, we have decided to omit sentiment measures as control variables. We also conduct our analysis only at the document level, rather than at the sentence level as Bos and Minihold (2022) do. The resulting dataset is necessarily smaller (manifestos: $N = 92$) than in our previous analyses, due to the limited availability of CHES expert scores. We have matched CHES scores to manifestos with a rolling window of 2 years (see Table A.3 in the Supplementary Material). We control for the year and for a party’s government participation during the compilation of the manifesto through ParlGov (Döring and Manow 2024), and add, where necessary, random effects for countries to account for the grouped nature of our data (see the sketch below). In a few cases (e.g., Spanish), using the original-language manifestos leaves only one country, in which case we ran a linear regression model without random effects. Tables A.10 to A.49 in the Supplementary Material detail each specification.
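A hedged sketch of one such specification with statsmodels (synthetic data; the variable names are assumptions about the analysis dataset, not the authors’ code):

```python
# Mixed-effects sketch: GAL-TAN position predicting a manifesto's
# morality score, with year and government participation as controls
# and a random intercept per country. All data below are synthetic.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
manifestos = pd.DataFrame({
    "care_virtue": rng.random(92),
    "galtan": rng.uniform(0, 10, 92),            # CHES 0-10 scale
    "year": rng.integers(2000, 2020, 92),
    "in_government": rng.integers(0, 2, 92),
    "country": rng.choice(["AT", "DE", "NL", "ES"], 92),
})

model = smf.mixedlm(
    "care_virtue ~ galtan + year + in_government",
    data=manifestos,
    groups=manifestos["country"],
)
print(model.fit().summary())
```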

Figure 5 shows the results of that replication. We ran multiple linear mixed regressions for each moral foundation, asking whether the party’s ideological placement on the GAL-TAN dimension is a significant (p < .05) predictor of the use of moral foundations in its manifesto statements. Considering the diversity of results and the lack of coherence discussed in the previous sections, it is perhaps not surprising that the employed measurement instruments yield different interpretations of the investigated relationship.

Figure 5 Results of multiple linear (mixed) regressions using all English translated manifestos, with standardized regression coefficients (excluding control variables) and Wald confidence intervals. Singular models are not shown.

The eMFD and DDR measurements mostly show no relationship between the ideology of a political party and the use of specific moral foundations in its manifesto. MoralBERT scores show a relationship for a few of the foundations, though not across the board. MFD and MFD2 are closely aligned (except for more singular model fits with the MFD2): on the Virtue dimension, the MFD predicted a relationship between ideology and moral foundations for three out of five moral foundations. The direction of the effects is also puzzling, with the MFD predicting the same direction for both the Virtue and the Vice dimensions of Fairness and Loyalty. CCR, lastly, only predicts positive relationships for two out of the five virtue foundations and one of the vice dimensions. On the Virtue side, CCR, MoralBERT, MFD, and MFD2 all predict a relationship for Fairness and Loyalty (although in opposite directions), while on the Vice side, this only holds for Loyalty. Depending on the measurements used and the threshold one applies, a researcher might be tempted to declare MFT rejected or, on the other hand, at least partially supported. These results point to many Type M errors, where some models show significant effects for GAL-TAN as a predictor while others do not, and, for a few foundations, also to Type S errors, where there are significant effects in opposing directions (for example, Fairness (Virtue and Vice) or Sanctity (Virtue)).

We have also extended our analysis to the multilingual context. What differences arise if one uses only the original-language manifestos and scores these with the different measurement instruments, e.g., investigating the relationship between the use of moral language and the GAL-TAN position of parties only in the Netherlands or in German-speaking countries? These results necessarily need to be interpreted with more caution than the analysis of the English-translated documents, because differences here could be caused either by the methodological implementation or by differences in political contexts between countries. Nevertheless, the original hypotheses of MFT expect similar patterns regardless of context and do not explain why Dutch parties should be systematically different from Spanish parties, for example, which is why we believe it is important to also consider this dimension.

Figure 6 displays this in exemplary fashion for the Care foundation; all other foundations can be found in the Supplementary Material. Using the translations of the MFD, for example, would lead one to suppose that there is a positive relationship between one of the Care poles and a party’s GAL-TAN position in Dutch- and German-speaking countries, but a negative one in English-speaking countries. DDR would mostly point to no relationship, with both point estimates and confidence intervals around 0. CCR, on the other hand, would propose a positive relationship for Dutch-speaking countries, but a negative trend for Spanish-speaking countries. These results show divergence not only between different measurement instruments but also within the same measurement instrument, depending on the linguistic context in which it is employed.

Figure 6 Results of multiple linear (mixed) regressions for the foundation Care, with standardised regression coefficients (excluding control variables) and Wald confidence intervals. Singular models are not shown.

5 Discussion

The reliable and valid analysis of moral information in political discourse is becoming an important methodological task for political science. Although numerous measurement instruments for moral content classification have proliferated over the last years, political scientists are now facing the challenge of selecting an appropriate method for advancing their research. In order to illuminate how the choice of a moral measurement instrument may impact the resulting findings, we have employed a wide range of automated moral content analysis procedures to test their level of convergence on a large variety of multilingual party manifestos from the Manifesto Project corpus (Lehmann et al. 2024).

Even though all measurement procedures aim to operationalize the well-established moral foundations as defined by MFT (Graham et al. 2009), simple descriptive statistics already highlighted that the results produced by English moral measurement instruments on English (original or translated) party manifestos differ noticeably. Whereas relying on word-count-based dictionaries or CCR (Atari et al. 2023) paints a political world mostly devoid of moral language, results from DDR (Garten et al. 2018) suggest a much higher presence of moral foundations in party communication. Although the overall magnitude of the extracted moral signal may be modulated by an instrument’s size (e.g., the number of words in a dictionary) or scoring procedure (e.g., word-count-based versus semantic similarity), the relationships across computed moral foundations could still have been stable across approaches. On the contrary, we found that correlations within the moral foundations vary greatly between the different content analysis procedures. The original moral foundations dictionaries show hardly any relationship between moral foundations (except for the eMFD), whereas for the more recent embedding-based approaches, correlations are either higher (CCR) or, in the case of DDR, so large that discrimination between the moral foundations becomes elusive. For MoralBERT, where each foundation is assessed by a different fine-tuned transformer model, correlations between foundations appear unsystematic and are either small or large, positive or negative.

Beyond these monolingual analyses, we showed that word-counts derived from published translations of the original MFD do not correlate equally well with their original English version applied to English (native or translated) manifestos. There are at least three possible reasons for this: (i) MFD translations may indeed detect language-specific moral nuances; (ii) translations of non-English party manifestos into English may be noisy; or (iii) translations and adaptations of the original English MFD into non-English languages may be imperfect. Although it has been argued that DDR remedies these multilingual challenges by only requiring a few seed words to be translated and by relying on a multilingual embedding (Kennedy et al. 2023), we found that DDR produced the lowest overall convergence when comparing the original English moral seed words and embedding with their multilingual counterparts, emphasising that DDR offers no reliable “off-the-shelf” solution for multilingual moral content classification. Rather, we showed that CCR was the most reliable multilingual procedure for moral foundation classification, as indexed by the highest convergence between scores computed from English concept representations on English (original or translated) manifestos and scores computed from the multilingual (native) representations on multilingual manifesto embeddings. We also note that in our comparison to the manual coding of “Traditional Morality” in the manifestos, CCR offers by far the closest match to the annotations by the Manifesto Project coders. This holds for both the Virtue and Vice dimensions. Although the different embeddings underlying DDR and CCR could be a contributing factor to their diverging results, we reason that the reliance on whole sentences for computing concept representations in CCR captures a more stable and reliable multilingual representation of moral foundations compared to DDR, which relies on a few seed words, where small idiosyncrasies in embeddings and translations are more likely to result in large downstream differences.

Lastly, we demonstrated that the choice of the moral foundations measurement procedure directly impacts the interpretation of substantive research questions relevant to political science. By examining the controversial association between a party’s political ideology and its use of moral foundations in political communication (Bos and Minihold 2022; Frimer 2020; Kraft and Klemmensen 2024), we found that different moral foundation measurements produce results that either substantiate or refute this relationship. Notably, some techniques even produced effects that pointed in opposite directions, reflecting errors of sign rather than magnitude (Gelman and Carlin 2014): for instance, word-count-based dictionaries suggested a negative relationship between a party’s GAL/TAN orientation and its use of fairness-related terms, while CCR suggested a positive relationship between a party’s GAL/TAN orientation and the appeal to fairness-related concepts. We note that others (Kraft and Klemmensen 2024) have reported similar findings, albeit using slightly different methods and focusing only on political communication in English. Nevertheless, these results underscore the current risk of producing irreconcilable conclusions that stem from the application of different moral foundation measurement procedures.

5.1 Best Practices for Measuring Moral Foundations in Political Texts

In view of our results, the question arises of which moral foundations measurement instrument or technique should be chosen for analysing multilingual and English-translated manifestos, or political corpora in general. Although moral foundations’ latent, subjective, and context-dependent nature complicates a “one-size-fits-all” solution, our experience and findings from this study suggest several paths forward.

First, as statistical language modelling advances, so do the techniques for parsing and detecting moral traces in human language. Hence, demonstrated relationships between appeals to moral foundations and ideology that were established using now-overhauled methods should be carefully benchmarked against state-of-the-art methods. For instance, we showed that MoralBERT (Preniqi et al. 2024), which is based on state-of-the-art language model fine-tuning, not only showed small correlations with extant popular moral foundations measurement instruments but also produced divergent conclusions concerning the relationship between ideology and appeals to moral foundations.

Second, as the development and validation of fine-tuned language models for multilingual contexts is a slow, costly, and computationally expensive process (Simonsen and Widmann 2023), researchers wishing to adapt existing moral foundations measurement instruments to their language or culture can harness techniques that overcome known limitations of word-count-based approaches while lending themselves more efficiently to multilingual applications. Here, we find that contextualized construct representations (CCR; Atari et al. 2023) offer a promising alternative. Not only does CCR capture contextualized information via sentence embeddings, and thereby move beyond the mean-pooled token representations computed via DDR (Garten et al. 2018), but its underlying operationalizations of moral foundations can be derived from translated and validated instruments such as the Moral Foundations Vignettes (Clifford et al. 2015). Indeed, moral scores derived from CCR showed the highest predictive accuracy for human annotations of “Traditional Morality” as coded in the Manifesto Project, demonstrating its general capacity to detect references to moral values. Researchers in multilingual contexts therefore ideally have two viable pathways for using computational text analysis methods. On the one hand, the availability of validated, high-quality English translations is continuously increasing, enabling (a) standardized datasets to be used across research studies and (b) researchers working in any language to utilise the wide breadth of English-language text analysis tools available. While enabling better comparability across studies, this would simultaneously reinforce the already English-centric nature of much of the development of computational text analysis methods (Baden et al. 2022). Therefore, we are also encouraged by the development of more multilingual methods, either through fine-tuned multilingual transformer models or through novel methods that enable researchers to incorporate existing multilingual materials. Nevertheless, the latter route still requires validation for each use case, because it cannot be assumed that, for example, the validation of the Spanish MFQ within CCR carries over to the Dutch-language MFQ (Licht and Lind 2023).

Third, our benchmarked descriptives of moral foundations measurement distributions, alongside their internal convergence and divergence, probed the general capacity of these models to (i) detect the presence of moral appeals and (ii) differentiate between the moral foundation categories. We have argued that measurement approaches that compute an overall higher moral signal may be preferable for applications that prioritize recall over precision. Particularly when aiming to collect morally relevant texts for further, more fine-grained human annotations or large-scale online studies, sampling a larger corpus of instances that may not uniformly be perceived as morally relevant by human coders is likely a better choice than missing texts with potentially more latent moral information. At the same time, quantifying the distribution and internal structure of different moral foundation measurement methods can inform which method may be more conducive to a particular research goal. As noted before, if researchers are interested in examining differences in the expression of moral foundations across political texts, then instruments that simultaneously capture a high moral signal, while still being able to distil this signal into its constituent moral domains, are preferable (e.g., MoralBERT). Analogously, if the aim is to uncover to what extent political texts reflect moral virtues or moral vices, then approaches should be chosen that draw clearer distinctions between these poles (e.g., eMFD and CCR). Finally, to obtain more stable model fits and reduce multicollinearity, scoring methods that yield normally distributed scores with lower internal correlations (e.g., eMFD and MoralBERT) present promising solutions.

Finally, although descriptives provide informative benchmarks, we argue that the future selection of moral foundations measurements in political science and the broader social sciences should prioritize prediction over explanation (Yarkoni and Westfall 2017). Specifically, establishing which moral foundations instruments or techniques can best predict politically relevant outcome variables, from persuasion (Simmons 2023) and voter mobilisation (Jung 2020) to the online sharing of political messages (Hopp et al. 2021), may reduce the extant measurement tool-pile via a solution-oriented approach (Watts 2017). Indeed, establishing for which political outcomes different models (dis)agree may not only provide insights into the contextual peculiarities of moral language in general, but also establish which morally relevant phenomena can be robustly predicted even by different moral foundations measurement approaches.

5.2 Limitations

Our study has limitations that affect the interpretation and generalization of our findings. First, we focused on a particular type and form of political communication, namely party manifestos. We chose party manifestos because they are relevant documents for investigating questions around political ideology, and because their formalized production process makes their style relatively comparable across parties and countries; research on more personal and informal forms of communication might produce different outcomes. Particularly because extant moral foundations dictionaries were validated on longer documents such as news articles or essays, it remains an open question to what extent our results generalize to other (longer) political documents, including speeches.

In addition, we deliberately focused on comparing the outcomes produced by the employed measurement tools and refrained from an extensive check of the statistical assumptions of each model fitted to the computed moral foundation scores (e.g., normality, independence of error terms). Given the nested structure of our data (manifestos within political parties) and the largely skewed distributions of moral foundation scores, we cannot speak to the statistical robustness of our models. Because extant work on the association between moralizing language and third variables is similarly light in its reporting of statistical assumptions, we strongly encourage future studies to be more transparent and rigorous in this respect.

Further, our comparison of the multilingual measurement instruments relies on the parallel texts provided by the Manifesto Project corpus. While we believe that machine translations generally offer reliable quality for political science applications (Licht et al. Reference Licht, Sczepanski, Laurer and Bekmuratovna2024), and the Manifesto Project itself provides detailed information on its translation and validation methodology (Ivanusch and Regel Reference Ivanusch and Regel2024), we cannot rule out that these translations influenced our results, and we encourage further work comparing translation methodologies.

Moreover, although we aimed to capture a broad spectrum of moral foundation measurements, the true universe of procedures and adjustable parameters is much larger. For instance, we relied only on the fastText embeddings for DDR, which are trained on Wikipedia data and might therefore be too far removed from the domain of the political manifestos we investigate. Garten et al. (Reference Garten, Hoover, Johnson, Boghrati, Iskiwitch and Dehghani2018) similarly reported that Wikipedia-trained embeddings performed worse than embeddings trained on a Google News corpus.
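For readers who wish to probe this sensitivity, the sketch below shows the core DDR computation with swappable embeddings. The gensim calls are standard, but the embedding file, seed words, and example text are placeholders.

```python
# Sketch of DDR (Garten et al. 2018): represent a dictionary and a document
# as mean-pooled word vectors and score them by cosine similarity. Swapping
# the embedding file (e.g., Wikipedia- vs. news-trained vectors) probes the
# domain sensitivity discussed above.
import numpy as np
from gensim.models import KeyedVectors

# Assumption: a word-embedding file in word2vec text format on disk
# (fastText .vec files load the same way).
vectors = KeyedVectors.load_word2vec_format("embeddings.vec")

def mean_vector(words, kv):
    """Mean-pool the vectors of all in-vocabulary words."""
    vecs = [kv[w] for w in words if w in kv.key_to_index]
    return np.mean(vecs, axis=0) if vecs else None

def ddr_score(doc_tokens, seed_words, kv):
    """Cosine similarity between document and dictionary centroids."""
    doc, concept = mean_vector(doc_tokens, kv), mean_vector(seed_words, kv)
    if doc is None or concept is None:
        return np.nan
    return float(doc @ concept / (np.linalg.norm(doc) * np.linalg.norm(concept)))

# Hypothetical seed words for the Care foundation.
care_seeds = ["care", "compassion", "protect", "suffering"]
print(ddr_score("we must protect the vulnerable".split(), care_seeds, vectors))
```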

5.3 Conclusion

Moral foundations undoubtedly continue to play some role in political communication (Simonsen and Widmann Reference Simonsen and Widmann2023, Reference Simonsen and Widmann2025). How we operationalize and measure moral foundations will be critical to understanding the form and extent of that role. To gain traction in the short term, we should at the very least carefully justify why a given moral foundations measurement instrument was chosen for a particular question, which underlying assumptions about the representation of moral foundations in a particular text informed this choice, and whether the method was adapted to better fit the target context (Hackenburg et al. Reference Hackenburg, Brady and Tsakiris2023). Although all instruments and methods we implemented are openly available, we still need more standardized software that easily allows scoring political texts with all available instruments and procedures (for existing efforts, see, e.g., https://github.com/medianeuroscience/emfdscore). We agree with Frimer (Reference Frimer2020) that this would facilitate the replication and re-analysis of existing results. As we move towards the multilingual and multicultural study of morality in political texts (Simonsen and Widmann Reference Simonsen and Widmann2023), we must take care not to unintentionally impose word lists or instruments developed by a few foreign domain experts when analyzing the moral rhetoric of local political entities. In the years to come, we need to develop more corpora annotated by appropriately trained crowds (Weber Reference Weber, Atteveldt and Peng2021) so that culturally aware methods can capture the moral sensibilities of the local populations directly affected by moralized political discourse. Rather than criticizing any particular moral foundations measurement instrument or presenting English moral foundations dictionaries as more authoritative than their multilingual siblings, we hope our study provides a comprehensive baseline and inspires a more methodologically reflective, transparent, and holistic approach to studying moral foundations in multilingual political communication.

Acknowledgements

We thank Bert Bakker, Isabella Rebasso, and Soo Jin Kim for their helpful comments on previous drafts, as well as the editor and three reviewers, whose suggestions improved the article. We also thank the participants of the Manifesto Conference 2023 and COMPTEXT 2024, where this work was presented, for their questions and comments.

Financial Support

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Data Availability Statement

Replication code for this article is available at Stecker and Hopp (Reference Stecker and Hopp2025). A preservation copy of the same code and data can also be accessed via Dataverse at https://doi.org/10.7910/DVN/FZVF5X.

Author Contributions

M.S.: Conceptualization, Data curation, Formal analysis, Software, Writing - Original Draft, Writing - Review & Editing

F.R.H.: Conceptualization, Validation, Writing - Original Draft, Writing - Review & Editing, Supervision, Project administration

Competing interest

The authors declare none.

Ethical Standards

This research relied on secondary data from the Manifesto Project.

Author Biographies

Marvin Stecker is a research associate and PhD student at the Department of Communication and the Department of Government at the University of Vienna, Austria. He uses computational methods to study political communication, with a focus on the meaning and salience of social identities, as well as methodological questions.

Frederic R. Hopp received his PhD in Communication from the University of California, Santa Barbara, USA. He is Junior (Assistant) Professor of Big Data in Psychology at the Leibniz Institute for Psychology in Trier, Germany, and at Trier University. His research examines how morality permeates human communication and how moralized messages are cognitively processed.

Supplementary Material

For supplementary material accompanying this paper, please visit https://doi.org/10.1017/pan.2025.10011.

Footnotes

Edited by: Daniel J. Hopkins and Brandon M. Stewart

1 The dimensionality of the framework varies somewhat; in this article, because of the measurement instruments involved, we focus on the originally proposed five dimensions.

2 Where one end stands for green, alternative, and libertarian policies, the other for traditionalist, authoritarian, and nationalist policies.

3 We have so far excluded any approach involving generative text models due to concerns over their running costs, opaque training procedures, and stochastic output (Abdurahman et al. Reference Abdurahman2024; Bail Reference Bail2024; Barrie, Palmer, and Spirling Reference Barrie, Palmer and Spirling2024). Most reviews of their use in automated text analysis have likewise concentrated on classification tasks rather than on scoring documents on a continuous scale, as required in our application.

4 We settled on the Judgement items because their wording clearly points to the Virtue dimension of each foundation. The relevance questions are more ambiguously worded, at least from a text analysis perspective, and variously point towards the vice or virtue poles.

5 Because of the skewed distributions of our data, we use Kendall’s τ as our default correlation measure. See Figure A.2 in the Supplementary Material for robustness tests using Spearman correlations.

6 See Figure A.3 in the Supplementary Material for robustness tests across all foundations using Spearman correlations.

7 See Figure A.4 in the Supplementary Material for alternate correlation measures derived using Spearman’s correlations, and see Tables A.4 and A.5 in the Supplementary Material for exact measures, including significance levels.

References

Abdurahman, S., et al. 2024. “Perils and Opportunities in Using Large Language Models in Psychological Research.” PNAS Nexus 3 (7): pgae245. https://doi.org/10.1093/pnasnexus/pgae245.
Aguiar, F., Corradi, G., and Aguilar, P. 2023. “Ageing and Disgust: Is Old Age Associated with Harsher Moral Judgements?” Current Psychology 42 (10): 8460–8470. https://doi.org/10.1007/s12144-022-03423-1.
Atari, M., et al. 2023. “The Paucity of Morality in Everyday Talk.” Scientific Reports 13 (1): 5967. https://doi.org/10.1038/s41598-023-32711-4.
Atari, M., Omrani, A., and Dehghani, M. 2023. “Contextualized Construct Representation: Leveraging Psychometric Scales to Advance Theory-Driven Text Analysis.” Preprint, PsyArXiv, February 24. https://doi.org/10.31234/osf.io/m93pd.
Baden, C., Pipal, C., Schoonvelde, M., and van der Velden, M. A. C. G. 2022. “Three Gaps in Computational Text Analysis Methods for Social Sciences: A Research Agenda.” Communication Methods and Measures 16 (1): 1–18. https://doi.org/10.1080/19312458.2021.2015574.
Bail, C. A. 2024. “Can Generative AI Improve Social Science?” Proceedings of the National Academy of Sciences 121 (21): e2314021121. https://doi.org/10.1073/pnas.2314021121.
Barrie, C., Palmer, A., and Spirling, A. 2024. “Replication for Language Models—Problems, Principles, and Best Practice for Political Science.” Pre-published. https://arthurspirling.org/documents/BarriePalmerSpirling_TrustMeBro.pdf.
Bayrak, F., and Alper, S. 2021. “A Tale of Two Hashtags: An Examination of Moral Content of Pro- and Anti-government Tweets in Turkey.” European Journal of Social Psychology 51 (3): 585–596. https://doi.org/10.1002/ejsp.2763.
Bedregal, P., and León, T. 2008. “Moral Foundations Questionnaire (Spanish Translation).” Accessed November 16, 2023. https://moralfoundations.org/questionnaires/.
Benoit, K., et al. 2018. “Quanteda: An R Package for the Quantitative Analysis of Textual Data.” Journal of Open Source Software 3 (30): 774. https://doi.org/10.21105/joss.00774.
Benoit, K. R. 2023. “Quanteda.Dictionaries.” https://github.com/kbenoit/quanteda.dictionaries.
Bird, S., Klein, E., and Loper, E. 2009. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. Sebastopol, CA: O’Reilly Media, Inc. https://www.nltk.org/book/.
Bos, L., and Minihold, S. 2022. “The Ideological Predictors of Moral Appeals by European Political Elites: An Exploration of the Use of Moral Rhetoric in Multiparty Systems.” Political Psychology 43 (1): 45–63. https://doi.org/10.1111/pops.12739.
Carvalho, F., and Guedes, G. 2022. “Dicionário de Fundamentos Morais Em Espanhol.” In Nuevas Ideas En Informática Educativa, Volumen 16, edited by Nussbaum, M., Infante, C., and Sánchez, J., 287–291. Chile: Universidad de Chile. https://www.tise.cl/Volumen16/Short%20Paper/TISE_2022_paper_11.pdf.
Chan, C.-H., et al. 2021. “Four Best Practices for Measuring News Sentiment Using ‘off-the-Shelf’ Dictionaries: A Large-Scale p-Hacking Experiment.” Computational Communication Research 3 (1): 1–27. https://doi.org/10.5117/CCR2021.1.001.CHAN.
Clifford, S., Iyengar, V., Cabeza, R., and Sinnott-Armstrong, W. 2015. “Moral Foundations Vignettes: A Standardized Stimulus Database of Scenarios Based on Moral Foundations Theory.” Behavior Research Methods 47 (4): 1178–1198. https://doi.org/10.3758/s13428-014-0551-2.
Clifford, S., and Jerit, J. 2013. “How Words Do the Work of Politics: Moral Foundations Theory and the Debate over Stem Cell Research.” The Journal of Politics 75 (3): 659–671. https://doi.org/10.1017/S0022381613000492.
Daenekindt, S., and Schaap, J. 2022. “Using Word Embedding Models to Capture Changing Media Discourses: A Study on the Role of Legitimacy, Gender and Genre in 24,000 Music Reviews, 1999–2021.” Journal of Computational Social Science 5 (2): 1615–1636. https://doi.org/10.1007/s42001-022-00182-8.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. 2019. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.” In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), edited by J. Burstein, C. Doran, and T. Solorio, 4171–4186. Minneapolis, MN: Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1423.
De Vries, E., Schoonvelde, M., and Schumacher, G. 2018. “No Longer Lost in Translation: Evidence That Google Translate Works for Comparative Bag-of-Words Text Applications.” Political Analysis 26 (4): 417–430. https://doi.org/10.1017/pan.2018.26.
Döring, H., and Manow, P. 2024. “ParlGov 2024 Release.” Accessed February 20, 2025. https://doi.org/10.7910/DVN/2VZ5ZC.
Frimer, J. A. 2020. “Do Liberals and Conservatives Use Different Moral Languages? Two Replications and Six Extensions of Graham, Haidt, and Nosek’s (2009) Moral Text Analysis.” Journal of Research in Personality 84: 103906. https://doi.org/10.1016/j.jrp.2019.103906.
Frimer, J. A., Boghrati, R., Haidt, J., Graham, J., and Dehghani, M. 2019. “Moral Foundations Dictionaries for Linguistic Analyses, 2.0.” Pre-published. https://doi.org/10.17605/OSF.IO/EZN37.
Garten, J., Hoover, J., Johnson, K. M., Boghrati, R., Iskiwitch, C., and Dehghani, M. 2018. “Dictionaries and Distributions: Combining Expert Knowledge and Large Scale Textual Data Content Analysis: Distributed Dictionary Representation.” Behavior Research Methods 50 (1): 344–361. https://doi.org/10.3758/s13428-017-0875-9.
Gelman, A., and Carlin, J. 2014. “Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors.” Perspectives on Psychological Science 9 (6): 641–651. https://doi.org/10.1177/1745691614551642.
Graham, J., et al. 2013. “Moral Foundations Theory.” In Advances in Experimental Social Psychology, vol. 47, edited by P. Devine and A. Plant, 55–130. Elsevier. https://doi.org/10.1016/B978-0-12-407236-7.00002-4.
Graham, J., Haidt, J., and Nosek, B. A. 2009. “Liberals and Conservatives Rely on Different Sets of Moral Foundations.” Journal of Personality and Social Psychology 96 (5): 1029–1046. https://doi.org/10.1037/a0015141.
Graham, J., Nosek, B. A., Haidt, J., Iyer, R., Koleva, S., and Ditto, P. H. 2011. “Mapping the Moral Domain.” Journal of Personality and Social Psychology 101 (2): 366–385. https://doi.org/10.1037/a0021847.
Hackenburg, K., Brady, W. J., and Tsakiris, M. 2023. “Mapping Moral Language on US Presidential Primary Campaigns Reveals Rhetorical Networks of Political Division and Unity.” PNAS Nexus 2 (6): pgad189. https://doi.org/10.1093/pnasnexus/pgad189.
Hatemi, P. K., Crabtree, C., and Smith, K. B. 2019. “Ideology Justifies Morality: Political Beliefs Predict Moral Foundations.” American Journal of Political Science 63 (4): 788–806. https://doi.org/10.1111/ajps.12448.
Hoover, J., et al. 2020. “Moral Foundations Twitter Corpus: A Collection of 35k Tweets Annotated for Moral Sentiment.” Social Psychological and Personality Science 11 (8): 1057–1071. https://doi.org/10.1177/1948550619876629.
Hopp, F. R., Amir, O., Fisher, J. T., Grafton, S., Sinnott-Armstrong, W., and Weber, R. 2023. “Moral Foundations Elicit Shared and Dissociable Cortical Activation Modulated by Political Ideology.” Nature Human Behaviour 7 (12): 2182–2198. https://doi.org/10.1038/s41562-023-01693-8.
Hopp, F. R., Fisher, J. T., Cornell, D., Huskey, R., and Weber, R. 2021. “The Extended Moral Foundations Dictionary (eMFD): Development and Applications of a Crowd-Sourced Approach to Extracting Moral Intuitions from Text.” Behavior Research Methods 53 (1): 232–246. https://doi.org/10.3758/s13428-020-01433-0.
Hopp, F. R., Jargow, B., Kouwen, E., and Bakker, B. N. 2024. “The Dutch Moral Foundations Stimulus Database: An Adaptation and Validation of Moral Vignettes and Sociomoral Images in a Dutch Sample.” Judgment and Decision Making 19. https://doi.org/10.1017/jdm.2024.5.
Hopp, F. R., and Weber, R. 2021. “Reflections on Extracting Moral Foundations from Media Content.” Communication Monographs 88 (3): 371–379. https://doi.org/10.1080/03637751.2021.1963513.
Ivanusch, C., and Regel, S. 2024. Manifesto Corpus Translation. Berlin: Wissenschaftszentrum Berlin für Sozialforschung (WZB) / Göttingen: Institut für Demokratieforschung (IfDem).
Jockel, S., Dogruel, L., Arendt, K., Stahl, H., and Bowmann, N. 2010. “Moral Foundations Questionnaire (German Translation).” Accessed November 16, 2023. https://moralfoundations.org/questionnaires/.
Jolly, S., et al. 2022. “Chapel Hill Expert Survey Trend File, 1999–2019.” Electoral Studies 75: 102420. https://doi.org/10.1016/j.electstud.2021.102420.
Jung, J.-H. 2020. “The Mobilizing Effect of Parties’ Moral Rhetoric.” American Journal of Political Science 64 (2): 341–355. https://doi.org/10.1111/ajps.12476.
Kennedy, B., et al. 2023. “The (Moral) Language of Hate.” PNAS Nexus 2 (7): pgad210. https://doi.org/10.1093/pnasnexus/pgad210.
Kivikangas, J. M., Fernández-Castilla, B., Järvelä, S., Ravaja, N., and Lönnqvist, J.-E. 2021. “Moral Foundations and Political Orientation: Systematic Review and Meta-Analysis.” Psychological Bulletin 147 (1): 55–94. https://doi.org/10.1037/bul0000308.
Knill, C. 2013. “The Study of Morality Policy: Analytical Implications from a Public Policy Perspective.” Journal of European Public Policy 20 (3): 309–317. https://doi.org/10.1080/13501763.2013.761494.
Kraft, P. W., and Klemmensen, R. 2024. “Lexical Ambiguity in Political Rhetoric: Why Morality Doesn’t Fit in a Bag of Words.” British Journal of Political Science 54 (1): 201–219. https://doi.org/10.1017/S000712342300008X.
Lehmann, P., et al. 2024. “Manifesto Corpus.” Accessed July 16, 2024. https://doi.org/10.25522/MANIFESTO.MPDS.2024A.
Lehmann, P., et al. 2024. Manifesto Project Dataset. Berlin: Wissenschaftszentrum Berlin für Sozialforschung (WZB) / Göttingen: Institut für Demokratieforschung (IfDem). Accessed June 24, 2025. https://doi.org/10.25522/MANIFESTO.MPDS.2024A.
Lewis, P. G. 2019. “Moral Foundations in the 2015–16 U.S. Presidential Primary Debates: The Positive and Negative Moral Vocabulary of Partisan Elites.” Social Sciences 8 (8): 233. https://doi.org/10.3390/socsci8080233.
Licht, H., and Lind, F. 2023. “Going Cross-Lingual: A Guide to Multilingual Text Analysis.” Computational Communication Research 5 (2): 1. https://doi.org/10.5117/CCR2023.2.3.LICH.
Licht, H., Sczepanski, R., Laurer, M., and Bekmuratovna, A. 2024. “No More Cost in Translation: Validating Open-Source Machine Translation for Quantitative Text Analysis.” Accessed October 9, 2024. https://doi.org/10.31219/osf.io/9trjs.
Miles, M. R. 2016. “Presidential Appeals to Moral Foundations: How Modern Presidents Persuade Cross-Ideologues.” Policy Studies Journal 44 (4): 471–490. https://doi.org/10.1111/psj.12151.
Mokhberian, N., Marmarelis, M., Hopp, F., Basile, V., Morstatter, F., and Lerman, K. 2024. “Capturing Perspectives of Crowdsourced Annotators in Subjective Learning Tasks.” In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), edited by Duh, K., Gomez, H., and Bethard, S., 7337–7349. Mexico City: Association for Computational Linguistics. https://doi.org/10.18653/v1/2024.naacl-long.407.
Montani, I., Honnibal, M., Boyd, A., Van Landeghem, S., and Peters, H. 2023. “spaCy: Industrial-Strength Natural Language Processing in Python.” Accessed April 30, 2024. https://doi.org/10.5281/ZENODO.1212303.
Mooney, C. Z., and Schuldt, R. G. 2008. “Does Morality Policy Exist? Testing a Basic Assumption.” Policy Studies Journal 36 (2): 199–218. https://doi.org/10.1111/j.1541-0072.2008.00262.x.
Pipal, C., Song, H., and Boomgaarden, H. G. 2022. “If You Have Choices, Why Not Choose (and Share) All of Them? A Multiverse Approach to Understanding News Engagement on Social Media.” Digital Journalism 11 (2): 255–275. https://doi.org/10.1080/21670811.2022.2036623.
Preniqi, V., Ghinassi, I., Ive, J., Saitis, C., and Kalimeri, K. 2024. “MoralBERT: A Fine-Tuned Language Model for Capturing Moral Values in Social Discussions.” In Proceedings of the 2024 International Conference on Information Technology for Social Good (GoodIT ’24), 433–442. New York, NY: Association for Computing Machinery. https://doi.org/10.1145/3677525.3678694.
Řehůřek, R., and Sojka, P. 2010. “Software Framework for Topic Modelling with Large Corpora.” In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, edited by R. Witte, H. Cunningham, J. Patrick, E. Beisswanger, E. Buyko, U. Hahn, K. Verspoor, and A. R. Coden, 45–50. Valletta: ELRA.
Reimers, N., and Gurevych, I. 2019. “Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks.” In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, edited by K. Inui, J. Jiang, V. Ng, and X. Wan. Association for Computational Linguistics. https://arxiv.org/abs/1908.10084.
Reimers, N., and Gurevych, I. 2020. “Making Monolingual Sentence Embeddings Multilingual Using Knowledge Distillation.” In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), edited by B. Webber, T. Cohn, Y. He, and Y. Liu, 4512–4525. Association for Computational Linguistics.
Sagi, E., and Dehghani, M. 2014. “Measuring Moral Rhetoric in Text.” Social Science Computer Review 32 (2): 132–144. https://doi.org/10.1177/0894439313506837.
Simmons, G. 2023. “Moral Mimicry: Large Language Models Produce Moral Rationalizations Tailored to Political Identity.” In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop), 282–297. Toronto, ON: Association for Computational Linguistics. Accessed February 26, 2025. https://doi.org/10.18653/v1/2023.acl-srw.40.
Simonsen, K. B., and Widmann, T. 2023. “The Politics of Right and Wrong: Moral Appeals in Political Communication over Six Decades in Ten Western Democracies.” Preprint, Open Science Framework, July 31. Accessed August 3, 2023. https://doi.org/10.31219/osf.io/m6qkg.
Simonsen, K. B., and Widmann, T. 2025. “When Do Political Parties Moralize? A Cross-National Study of the Use of Moral Language in Political Communication on Immigration.” British Journal of Political Science 55. https://doi.org/10.1017/s0007123425000122.
Stecker, M., and Hopp, F. 2025. “Replication Data for: Moral Foundation Measurements Fail to Converge on Multilingual Party Manifestos.” https://doi.org/10.7910/DVN/FZVF5X.
Steegen, S., Tuerlinckx, F., Gelman, A., and Vanpaemel, W. 2016. “Increasing Transparency Through a Multiverse Analysis.” Perspectives on Psychological Science 11 (5): 702–712. https://doi.org/10.1177/1745691616658637.
Stoltz, D. S. 2019. “Concept Mover’s Distance: Measuring Concept Engagement via Word Embeddings in Texts.” Journal of Computational Social Science 2: 293–313. https://doi.org/10.1007/s42001-019-00048-6.
Trager, J., et al. 2022. “The Moral Foundations Reddit Corpus.” Pre-published, August 17. Accessed August 21, 2024. arXiv:2208.05545 [cs]. http://arxiv.org/abs/2208.05545.
Van Vliet, L. 2021. “Moral Expressions in 280 Characters or Less: An Analysis of Politician Tweets Following the 2016 Brexit Referendum Vote.” Frontiers in Big Data 4: 699653. https://doi.org/10.3389/fdata.2021.699653.
Van Leeuwen, F. 2010. “Moral Foundations Questionnaire (Dutch Translation).” Accessed November 16, 2023. https://moralfoundations.org/questionnaires/.
Vaswani, A., et al. 2017. “Attention Is All You Need.” In Advances in Neural Information Processing Systems, vol. 30, edited by Guyon, I., et al. Long Beach, CA: Curran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf.
Voyer, A., Kline, Z. D., and Danton, M. 2022. “Symbols of Class: A Computational Analysis of Class Distinction-Making through Etiquette, 1922–2017.” Poetics 94: 101734. https://doi.org/10.1016/j.poetic.2022.101734.
Wang, S.-Y. N., and Inbar, Y. 2020. “Moral-Language Use by U.S. Political Elites.” Psychological Science 32 (1): 14–26. https://doi.org/10.1177/0956797620960397.
Watts, D. J. 2017. “Should Social Science Be More Solution-Oriented?” Nature Human Behaviour 1 (1): 0015. https://doi.org/10.1038/s41562-016-0015.
Weber, R. 2021. “Extracting Latent Moral Information from Text Narratives: Relevance, Challenges, and Solutions.” In Computational Methods for Communication Science, edited by W. van Atteveldt and T.-Q. Peng. New York, NY: Routledge, Taylor & Francis Group.
Wirsching, E. M., Rodriguez, P. L., Spirling, A., and Stewart, B. M. 2025. “Multilanguage Word Embeddings for Social Scientists: Estimation, Inference, and Validation Resources for 157 Languages.” Political Analysis 33 (2): 156–163. https://doi.org/10.1017/pan.2024.17.
Yarkoni, T., and Westfall, J. 2017. “Choosing Prediction Over Explanation in Psychology: Lessons From Machine Learning.” Perspectives on Psychological Science 12 (6): 1100–1122. https://doi.org/10.1177/1745691617693393.
Figures and Tables

Table 1 Descriptives for each language.

Table 2 Descriptives for the top ten countries.

Table 3 Measurement tools.

Figure 1 Distributions of all morality measurements applied to the corpus of English-translated manifestos (normalized).

Figure 2 Mean Kendall’s correlations between moral foundations within measurements.

Figure 3 Mean Kendall’s correlations between all foundations across measurements.

Figure 4 Kendall’s correlations between manifestos translated into English and scored using the original English instruments with manifestos scored in the original language using translated versions of the original instruments.

Figure 5 Results of multiple linear (mixed) regressions using all English-translated manifestos, with standardized regression coefficients (excluding control variables) and Wald confidence intervals. Singular models are not shown.

Figure 6 Results of multiple linear (mixed) regressions for the foundation Care, with standardized regression coefficients (excluding control variables) and Wald confidence intervals. Singular models are not shown.
