A close connection between public opinion and policy is considered a vital element of democracy. However, legislators cannot be responsive to all voters at all times with regard to the policies the latter favour. We argue that legislators use their speaking time in parliament to offer compensatory speech to their constituents who might oppose how they voted on a policy, in order to re‐establish themselves as responsive to the public's wishes. Leveraging the case of Brexit, we show that legislators pay more attention to constituents who might be dissatisfied with how they voted. Furthermore, their use of rhetorical responsiveness is contingent on the magnitude of the representational deficit they face vis‐à‐vis their constituency. Our findings attest to the central role of parliamentary speech in maintaining responsiveness. They also demonstrate that communicative responsiveness can substitute for policy responsiveness.
Debates about the European Union's democratic legitimacy put national parliaments into the spotlight. Do they enhance democratic accountability by offering visible debates and electoral choice about multilevel governance? To support such accountability, the salience of EU affairs in the plenary ought to be responsive to developments in EU governance, linked to decision‐making moments, and balanced between government and opposition. The recent literature discusses various partisan incentives that support or undermine these criteria, but analyses integrating these arguments are rare. This article provides a novel comparative perspective by studying the patterns of public EU emphasis in more than 2.5 million plenary speeches from the German Bundestag, the British House of Commons, the Dutch Tweede Kamer and the Spanish Congreso de los Diputados over a prolonged period from 1991 to 2015. It documents that parliamentary actors are by and large responsive to EU authority and its exercise, with intergovernmental moments of decision making in particular sparking plenary EU salience. But the salience of EU issues is mainly driven by government parties, decreases at election time and is negatively related to public Euroscepticism. The article concludes that national parliaments have only partially succeeded in enhancing EU accountability and suffer from an opposition deficit in particular.
The promises and pitfalls of automated (computer-assisted) and human-coding content analysis techniques applied to political science research have been extensively discussed in the scholarship on party politics and legislative studies. This study presents a similar comparative analysis outlining the pay-offs and trade-offs of these two methods of content analysis applied to research on EU lobbying. The empirical focus is on estimating interest groups’ positions based on their formally submitted policy position documents in the context of EU policymaking. We identify the defining characteristics of these documents and argue that the choice of a method of content analysis should be informed by a concern for addressing the specificities of the research topic covered, of the research question asked and of the data sources employed. We discuss the key analytical assumptions and methodological requirements of automated and human-coding text analysis and the degree to which they match the identified text characteristics. We critically assess the most relevant methodological challenges research designs face when these requirements need to be complied with and how these challenges might affect measurement validity. We also compare the two approaches in terms of their reliability and resource intensity. The article concludes with recommendations and issues for future research.
From the early use of TF-IDF to the high-dimensional outputs of deep learning, vector space embeddings of text, at a scale ranging from token to document, are at the heart of all machine analysis and generation of text. In this article, we present the first large-scale comparison of a sampling of such techniques on a range of classification tasks on a large corpus of current literature drawn from the well-known Books3 data set. Specifically, we compare TF-IDF, Doc2vec and several Transformer-based embeddings on a variety of text-specific tasks. Using industry-standard BISAC codes as a proxy for genre, we compare embeddings in their ability to preserve information about genre. We further compare these embeddings in their ability to encode inter- and intra-book similarity. All of these comparisons take place at the book “chunk” (1,024 tokens) level. We find Transformer-based (“neural”) embeddings to be best, in the sense of their ability to respect genre and authorship, although almost all embedding techniques produce sensible constructions of a “literary landscape” as embodied by the Books3 corpus. These experiments suggest the possibility of using deep learning embeddings not only for advances in generative AI, but also as a potential tool for book discovery and as an aid to various forms of more traditional comparative textual analysis.
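The TF-IDF baseline that the abstract names can be illustrated compactly. The sketch below is a minimal, from-scratch version on invented toy "chunks" (not Books3 data): it weights each term by its frequency in a document times the log inverse of its document frequency, then compares chunks by cosine similarity, which is how genre- or author-similarity would be measured in any of the compared embedding spaces.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Build TF-IDF vectors: term frequency times log inverse document frequency."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter()                      # document frequency of each term
    for tokens in tokenized:
        df.update(set(tokens))
    vocab = sorted(df)
    idf = {t: math.log(n / df[t]) for t in vocab}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append([tf[t] / len(tokens) * idf[t] for t in vocab])
    return vocab, vectors

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Three toy chunks: two from the same "genre", one from another.
docs = [
    "the detective followed the suspect through the fog",
    "the detective questioned the suspect at dawn",
    "the spaceship drifted silently past the red planet",
]
vocab, vecs = tf_idf_vectors(docs)
# The two crime chunks share weighted terms ("detective", "suspect"),
# so they sit closer together than either does to the third chunk.
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))  # → True
```

Note how "the", which appears in every document, gets an IDF of zero and contributes nothing, which is exactly the behaviour that makes TF-IDF a sensible, if shallow, baseline against neural embeddings.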
Supervised learning is increasingly used in social science research to quantify abstract concepts in textual data. However, a review of recent studies reveals inconsistencies in reporting practices and validation standards. To address this issue, we propose a framework that systematically outlines the process of transforming text into a quantitative measure, emphasizing key reporting decisions at each stage. Clear and comprehensive validation is crucial, enabling readers to critically evaluate both the methodology and the resulting measure. To illustrate our framework, we develop and validate a measure assessing the tone of questions posed to nominees during U.S. Senate confirmation hearings. This study contributes to the growing literature advocating for transparency and rigor in applying machine learning methods within computational social sciences.
Servitization is a key strategy for enhancing competitiveness in manufacturing, yet the managerial drivers behind this transformation remain underexplored. This study investigates the impact of top executives’ service cognition on servitization using a novel index derived from text-mined disclosures of Chinese listed manufacturing firms (2007–2020). Results show that executives’ service cognition significantly promotes servitization, even after controlling for endogeneity using instrumental variables and Heckman’s two-stage model. Mechanism analysis reveals that this cognitive orientation enhances human capital accumulation and R&D investment, which in turn drive higher service levels. Furthermore, the relationship is moderated by executive power concentration and regional internet penetration. Heterogeneity tests indicate stronger effects in high-tech industries, state-owned enterprises, and large firms. These findings highlight the critical role of executive cognition in shaping strategic transformation and offer practical implications for firms and policymakers aiming to foster servitization through leadership development and supportive digital infrastructure.
The sentiment expressed in a legislator’s speech is informative. However, extracting legislators’ sentiment requires human-annotated data. Instead, we propose exploiting closing debates on a bill in Japan, where legislators in effect label their speech as either pro or con. We utilize debate speeches as the training dataset, fine-tune a pretrained model, and calculate the sentiment scores of other speeches. We show that the more senior the opposition members are, the more negative their sentiment. Additionally, we show that opposition members become more negative as the next election approaches. We also demonstrate that legislators’ sentiments can be used to predict their behaviors by using the case in which government members rebelled in the historic vote of no confidence in 1993.
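The core trick in this abstract is that closing-debate speeches come pre-labelled: each speaker declares a stance, so no human annotation is needed. The fine-tuning itself would use a pretrained-model library, but the labelling step can be sketched in plain Python. Field names below (`text`, `declared_stance`) are invented for illustration, not taken from the study's data.

```python
# Sketch of the weak-labelling idea: in closing debates on a bill,
# each speaker declares support or opposition, so a stance label
# comes for free and can train a sentiment model downstream.

def label_debate_speeches(speeches):
    """Turn closing-debate speeches into (text, label) training pairs."""
    pairs = []
    for s in speeches:
        if s["declared_stance"] == "support":
            pairs.append((s["text"], 1))   # pro
        elif s["declared_stance"] == "oppose":
            pairs.append((s["text"], 0))   # con
        # speeches with no declared stance are excluded from training
    return pairs

speeches = [
    {"text": "I rise in favour of this bill ...", "declared_stance": "support"},
    {"text": "We firmly object to this legislation ...", "declared_stance": "oppose"},
    {"text": "A point of order, Mr Speaker ...", "declared_stance": None},
]
train = label_debate_speeches(speeches)
print(len(train))  # → 2 (the procedural speech carries no label)
```

A model fine-tuned on such pairs can then score any other speech on a pro/con axis, which is how the study derives sentiment for speeches outside the closing debates.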
Chapter 7 builds on students’ understanding of arrays and numeric and logical data types from Chapters 2 and 4, demonstrating how to use what they already know to manipulate text in MATLAB. Text in MATLAB comes in two forms: character arrays, in which text is stored in individual letters, numbers, symbols, and spaces; and strings, in which each element of text can store any number of those characters. Differences in the utility of these structures for different tasks are discussed, as is their interchangeability when providing inputs to other MATLAB functions. Once text is introduced, students learn to interface with MATLAB via input/output features, both in the console and in pop-up windows. Lastly, because MATLAB code is also text, students learn to run text as MATLAB code, as well as potential issues with doing so and workarounds to avoid those issues.
Propagandists discredit political ideas that rival their own. In China’s state-run media, one common technique is to place the phrase so-called, in English, or 所谓, in Chinese, before the idea to be discredited. In this research note we apply quantitative text analysis methods to over 45,000 Xinhua articles from 2003 to 2022 containing so-called or 所谓 to better understand the ideas the government wishes to discredit for different audiences. We find that perceived challenges to China’s sovereignty consistently draw usage of the term and that a theme of rising importance is political rivalry with the United States. When it comes to differences between internal and external propaganda, we find broad similarities, but differences in how the US is discredited and more emphasis on cooperation for foreign audiences. These findings inform scholarship on comparative authoritarian propaganda and Chinese propaganda specifically.
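The extraction step behind this research note can be sketched simply: find each occurrence of "so-called" or 所谓 and tally the phrase that follows it. The pattern and example sentences below are a deliberately simplified toy, assuming far less preprocessing than the actual study.

```python
import re
from collections import Counter

# Capture the phrase following "so-called" or 所谓, allowing Latin
# letters, CJK characters, spaces and hyphens in the target phrase.
PATTERN = re.compile(
    r'(?:so-called|所谓)[\s"“]*([A-Za-z\u4e00-\u9fff][\w\u4e00-\u9fff -]*)'
)

def discredited_targets(articles):
    """Tally the ideas that the corpus marks for discrediting."""
    counts = Counter()
    for text in articles:
        for match in PATTERN.findall(text):
            counts[match.strip().lower()] += 1
    return counts

articles = [
    'The so-called "independence" of the region is a fiction.',
    "官方驳斥了所谓制裁。",
]
print(discredited_targets(articles))
```

Aggregating such counts by year and by outlet (domestic versus external wire copy) is what lets the study track which targets rise and fall, and how internal and external propaganda differ.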
A critical challenge for biomedical investigators is the delay between research and its adoption, yet there are few tools that use bibliometrics and artificial intelligence to address this translational gap. We built a tool to quantify translation of clinical investigation using novel approaches to identify themes in published clinical trials from PubMed and their appearance in the natural language elements of the electronic health record (EHR).
Methods:
As a use case, we selected the translation of known health effects of exercise for heart disease, as found in published clinical trials, with the appearance of these themes in the EHR of heart disease patients seen in an emergency department (ED). We present a self-supervised framework that quantifies semantic similarity of themes within the EHR.
Results:
We found that 12.7% of the clinical trial abstracts dataset recommended aerobic exercise or strength training. Of the ED treatment plans, 19.2% related to heart disease. Of these heart-disease treatment plans, only 0.34% identified aerobic exercise or strength training. Treatment plans from the overall ED dataset mentioned aerobic exercise or strength training less than 5% of the time.
Conclusions:
Having access to publicly available clinical research and associated EHR data, including clinician notes and after-visit summaries, provided a unique opportunity to assess the adoption of clinical research in medical practice. This approach can be used for a variety of clinical conditions, and if assessed over time could measure implementation effectiveness of quality improvement strategies and clinical guidelines.
This paper studies the role of central bank communication for the monetary policy transmission mechanism using text analysis techniques. In doing so, we derive sentiment measures from European Central Bank (ECB) press conferences indicating a dovish or hawkish tone referring to interest rates, inflation, and unemployment. We provide strong evidence that our sentiment measures predict interbank interest rates, even after controlling for actual policy rate changes. We also find that our sentiment indicators offer predictive power for professionals’ expectations, the disagreement among them, and their uncertainty regarding future inflation as well as future interest rates. Policy communication shocks identified through sign restrictions based on our sentiment measure also have significant effects on real outcomes. Overall, our findings highlight the importance of the tone of central bank communication for the transmission mechanism of monetary policy, but also indicate the necessity of refinements of the communication policies implemented by the ECB to better anchor inflation expectations at the target level and to reduce uncertainty regarding the future path of monetary policy.
It is often argued that when legislators have personal vote-seeking incentives, parties are less unified because legislators need to build bonds of accountability with their voters. I argue that these effects depend on a legislator’s ability to cultivate a personal vote. When parties control access to the ballot and the resources candidates need to cultivate personal votes, they can condition a legislator’s access to these resources on loyalty to the party’s agenda. I test this theory by conducting a difference-in-differences analysis that leverages the staggered implementation of the 2014 Mexican Electoral Reform. This reform introduced the possibility of consecutive reelection for state legislators, increasing their incentives to cultivate personal votes. I study unity in position-taking and voting behaviour of Mexican state legislators from 2012 to 2018. To analyze position-taking, I apply correspondence analysis to a new dataset of over half a million legislative speeches in twenty states. To study voting, I analyze over 14,500 roll-call votes in fourteen states during the same period. Results show that reelection incentives increased intra-party unity, which has broad implications for countries introducing electoral reforms aiming to personalize politics.
A common challenge in studying Italian parliamentary discourse is the lack of accessible, machine-readable, and systematized parliamentary data. To address this, this article introduces the ItaParlCorpus dataset, a new, annotated, machine-readable collection of Italian parliamentary plenary speeches for the Camera dei Deputati, the lower house of Parliament, spanning from 1948 to 2022. This dataset encompasses 470 million words and 2.4 million speeches delivered by 5,830 unique speakers representing 77 different political parties. The files are designed for easy processing and analysis using widely used programming languages, and they include metadata such as speaker identification and party affiliation. This opens up opportunities for in-depth analyses on a variety of topics related to parliamentary behavior, elite rhetoric, and the salience of political themes, exploring how these vary across party families and over time.
Previous accounts have suggested a potential divergence between Xi Jinping and Li Keqiang in their approaches to economic governance. This study examines the policy orientations of the two leaders concerning state–market relations, providing empirical evidence for the recent manifestation of what insiders have termed the “dispute between north and south houses” (nanbeiyuan zhi zheng) and its economic implications. By applying semi-supervised machine learning methods to textual data, this study demonstrates that Li favoured market-oriented policies, whereas Xi displayed a pronounced preference for state-centric strategies. The findings notably indicate an initial divergence in policy orientation, which was followed by a considerable convergence during Xi's second term. Our analysis further reveals that Li's market-oriented rhetoric was particularly prominent during “Mass innovation week,” indicating a campaign-style policy mobilization. Moreover, the analysis identifies that the discursive differences between the two leaders are associated with a decline in firm-level investment, suggesting that disparities in policy orientation may engender political uncertainty. This study contributes to the extant literature on the impact of leadership dynamics on economic policy, the implications of mixed signals from the central leadership and the phenomenon of campaign-style mobilization in China.
We apply moral foundations theory (MFT) to explore how the public conceptualizes the first eight months of the conflict between Ukraine and the Russian Federation (Russia). Our analysis includes over 1.1 million English tweets related to the conflict over the first 36 weeks. We used linguistic inquiry word count (LIWC) and a moral foundations dictionary to identify tweets’ moral components (care, fairness, loyalty, authority, and sanctity) from the United States, pre- and post-Cold War NATO countries, Ukraine, and Russia. Following an initial spike at the beginning of the conflict, tweet volume declined and stabilized by week 10. The level of moral content varied significantly across the five regions and the five moral components. Tweets from the different regions included significantly different moral foundations to conceptualize the conflict. Across all regions, tweets were dominated by loyalty content, while fairness content was infrequent. Moral content over time was relatively stable, and variations were linked to reported conflict events.
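Dictionary-based scoring of the kind this study performs with LIWC can be sketched in a few lines: count what share of a tweet's tokens match each foundation's word list. The word lists below are tiny invented stand-ins, not the actual moral foundations dictionary used in the study.

```python
# Toy moral foundations dictionary (invented stand-in word lists).
MFD = {
    "care":      {"protect", "suffering", "harm", "care"},
    "fairness":  {"fair", "justice", "rights", "equal"},
    "loyalty":   {"ally", "betray", "nation", "solidarity"},
    "authority": {"order", "obey", "law", "leader"},
    "sanctity":  {"pure", "sacred", "disgust", "holy"},
}

def moral_profile(text):
    """Share of tokens matching each foundation's word list."""
    tokens = text.lower().split()
    if not tokens:
        return {f: 0.0 for f in MFD}
    return {f: sum(t.strip(".,!?") in words for t in tokens) / len(tokens)
            for f, words in MFD.items()}

tweet = "We stand in solidarity with our ally and will not betray the nation."
profile = moral_profile(tweet)
print(max(profile, key=profile.get))  # → loyalty
```

Averaging such profiles per region and per week is what produces the trends the abstract reports, such as the dominance of loyalty content across all five regions.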
Stylistics is the linguistic study of style in language. Now in its second edition, this book is an introduction to stylistics that locates it firmly within the traditions of linguistics. Organised to reflect the historical development of stylistics, it covers key principles such as foregrounding theory, as well as recent advances in cognitive and corpus stylistics. This edition has been fully revised to cover all the major developments in the field since the first edition, including extensive coverage of corpus stylistics, new sections on a range of topics, additional exercises and commentaries, updated further reading lists, and an entirely re-written final chapter on the disciplinary status of stylistics and its relationship to linguistics, plus a manifesto for the future of the field. Comprehensive in its coverage and assuming no prior knowledge of the subject, it is essential reading for students and researchers new to this fascinating area of language study.
Large language models are a powerful tool for conducting text analysis in political science, but using them to annotate text has several drawbacks, including high cost, limited reproducibility, and poor explainability. Traditional supervised text classifiers are fast and reproducible, but require expensive hand annotation, which is especially difficult for rare classes. This article proposes using LLMs to generate synthetic training data for training smaller, traditional supervised text models. Synthetic data can augment limited hand annotated data or be used on its own to train a classifier with good performance and greatly reduced cost. I provide a conceptual overview of text generation, guidance on when researchers should prefer different techniques for generating synthetic text, a discussion of ethics, a simple technique for improving the quality of synthetic text, and an illustration of its limitations. I demonstrate the usefulness of synthetic training through three validations: synthetic news articles describing police responses to communal violence in India for training an event detection system, a multilingual corpus of synthetic populist manifesto statements for training a sentence-level populism classifier, and synthetic tweets describing the fighting in Ukraine used to improve a named entity system.
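The augmentation pipeline this abstract describes has a simple shape: an LLM generates labelled examples from a per-class prompt, and the synthetic texts are pooled with the scarce gold annotations before training a small classifier. In the sketch below, `generate_with_llm` is a placeholder stub (an assumption for illustration); in practice it would wrap an actual LLM API call, and the example texts and labels are invented.

```python
import random

def generate_with_llm(prompt, n):
    """Placeholder for an LLM call returning n synthetic texts."""
    return [f"{prompt} (synthetic example {i})" for i in range(n)]

def build_training_set(hand_labelled, label_prompts, n_per_class):
    """Pool scarce gold annotations with LLM-generated examples per class."""
    data = list(hand_labelled)
    for label, prompt in label_prompts.items():
        for text in generate_with_llm(prompt, n_per_class):
            data.append((text, label))
    random.shuffle(data)
    return data

gold = [("Police dispersed the rioters in the old city.", "event"),
        ("The minister discussed budget priorities.", "no_event")]
prompts = {
    "event": "Write a news sentence describing police responding to communal violence",
    "no_event": "Write a routine political news sentence",
}
train = build_training_set(gold, prompts, n_per_class=5)
print(len(train))  # → 12: 2 gold + 10 synthetic
```

The key design point is that the expensive, slow model is called once to build the dataset, while the cheap, reproducible supervised classifier is what runs at scale.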
Content analysis is a valuable tool for analysing policy discourse, but annotation by humans is costly and time consuming. ChatGPT is a potentially valuable tool to partially automate content analysis for policy debates, largely replacing human annotators. We evaluate ChatGPT’s ability to classify documents using pre-defined argument descriptions, comparing its performance with human annotators for two policy debates: the Universal Basic Income debate on Dutch Twitter (2014–2016) and the pension reforms debate in German newspapers (1993–2001). We use the API (GPT-4 Turbo) and user interface version (GPT-4) and evaluate multiple performance metrics (accuracy, precision and recall). ChatGPT is highly reliable and accurate in classifying pre-defined arguments across datasets. However, precision and recall are much lower, and vary strongly between arguments. These results hold for both datasets, despite differences in language and media type. Moreover, the cut-off method proposed in this paper may aid researchers in navigating the trade-off between detection and noise. Overall, we do not (yet) recommend a blind application of ChatGPT to classify arguments in policy debates. Those interested in adopting this tool should manually validate bot classifications before using them in further analyses. At least for now, human annotators are here to stay.
Populist radical right (PRR) parties' attacks against prevailing historical interpretations have received much public attention because they question the foundations of countries' political orders. Yet, how prominent are such attacks and what characterizes their sentiment and content? This article proposes an integrated mixed-methods approach to investigate the prominence, sentiment, and interpretations of history in PRR politicians' parliamentary speeches. Studying the case of Germany, we conducted a quantitative analysis of national parliamentary speeches (2017–2021), combined with a qualitative analysis of all speeches made by Alternative for Germany (AfD) in 2017–2018. The AfD does not use historical markers more prominently but is distinctly less negative when speaking about history compared to its general political language. The collocation and qualitative analyses reveal the nuanced ways in which the AfD affirms and disavows various mnemonic traditions, underlining the PRR's complex engagement with established norms.
In its early days, the methods and theories of the digital humanities promised to reform our understanding of the canon, or, given a comprehensive archive of literature and the tools for analyzing all of it, even abolish it altogether. Although these earlier utopian hopes for digital archives and computational text analysis have proven to be ill-founded, the points of contact between the canon and the digital humanities have had a profound effect on both. From studies that test the formal properties of canonical literature to those that seek to explore the depths of newly available archives, the canon has remained an object of significant interest for scholars working in these burgeoning fields. This chapter explores the fraught relationship between the canon and computational analysis, arguing that, in the hands of cultural analytics, the canon has transformed from a prescriptive to a descriptive technology of literary study.