The rise of visually driven platforms like Instagram has reshaped how information is shared and understood. This study examines the role of social, cultural, and political (SCP) symbols in Instagram posts during Taiwan’s 2024 election, focusing on their influence in anti-misinformation efforts. Using large language models (LLMs)—GPT-4 Omni and Gemini Pro Vision—we analyzed thousands of posts to extract and classify symbolic elements, comparing model performance in consistency and interpretive depth. We evaluated how SCP symbols affect user engagement, perceptions of fairness, and content spread. Engagement was measured by likes, while diffusion patterns followed the SEIZ epidemiological model. Findings show that posts featuring SCP symbols consistently received more interaction, even when follower counts were equal. Although political content creators often had larger audiences, posts with cultural symbols drove the highest engagement, were perceived as more fair and trustworthy, and spread more rapidly across networks. Our results suggest that symbolic richness influences online interactions more than audience size. By integrating semiotic analysis, LLM-based interpretation, and diffusion modeling, this study offers a novel framework for understanding how symbolic communication shapes engagement on visual platforms. These insights can guide designers, policymakers, and strategists in developing culturally resonant, symbol-aware messaging to combat misinformation and promote credible narratives.
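As a concrete illustration of the diffusion component mentioned above, the sketch below integrates the SEIZ (Susceptible, Exposed, Infected, Skeptic) compartmental equations with standard numerical tools; the parameter values and population size are illustrative assumptions, not estimates reported in the study.

```python
# Minimal SEIZ (Susceptible-Exposed-Infected-Skeptic) diffusion sketch.
# All parameter values below are illustrative placeholders, not fitted values.
import numpy as np
from scipy.integrate import odeint

def seiz(y, t, N, beta, b, rho, eps, p, l):
    S, E, I, Z = y
    dS = -beta * S * I / N - b * S * Z / N                      # susceptibles meet adopters or skeptics
    dE = ((1 - p) * beta * S * I / N + (1 - l) * b * S * Z / N  # undecided (exposed) users
          - rho * E * I / N - eps * E)
    dI = p * beta * S * I / N + rho * E * I / N + eps * E       # users who adopt/share the content
    dZ = l * b * S * Z / N                                      # skeptics who ignore it
    return [dS, dE, dI, dZ]

N = 10_000
y0 = [N - 10, 0, 10, 0]                 # start with 10 initial sharers
t = np.linspace(0, 30, 300)             # 30 days
sol = odeint(seiz, y0, t, args=(N, 0.9, 0.3, 0.2, 0.1, 0.7, 0.5))
print("peak number of sharers:", int(sol[:, 2].max()))
```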
The capabilities of large language models (LLMs) have advanced to the point where entire textbooks can be queried using retrieval-augmented generation (RAG), enabling AI to integrate external, up-to-date information into its responses. This study evaluates the ability of two OpenAI models, GPT-3.5 Turbo and GPT-4 Turbo, to create and answer exam questions based on an undergraduate textbook. Fourteen exams were created, each with four true-false, four multiple-choice, and two short-answer questions derived from an open-source Pacific Studies textbook. Model performance was evaluated with and without access to the source material using text-similarity metrics such as ROUGE-1, cosine similarity, and word embeddings. Fifty-six exam scores were analyzed, revealing that RAG-assisted models significantly outperformed those relying solely on pre-trained knowledge. GPT-4 Turbo also consistently outperformed GPT-3.5 Turbo in accuracy and coherence, especially in short-answer responses. These findings demonstrate the potential of LLMs in automating exam generation while maintaining assessment quality. However, they also underscore the need for policy frameworks that promote fairness, transparency, and accessibility. Given regulatory considerations outlined in the European Union AI Act and the NIST AI Risk Management Framework, institutions using AI in education must establish governance protocols, bias mitigation strategies, and human oversight measures. The results of this study contribute to ongoing discussions on responsibly integrating AI in education, advocating for institutional policies that support AI-assisted assessment while preserving academic integrity. The empirical results suggest not only performance benefits but also actionable governance mechanisms, such as verifiable retrieval pipelines and oversight protocols, that can guide institutional policies.
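For readers unfamiliar with the text-similarity metrics named above, the following minimal sketch scores a candidate answer against a reference answer using ROUGE-1 F1 and bag-of-words cosine similarity; it is a simplified stand-in for the study's evaluation pipeline, and the example sentences are invented.

```python
# Illustrative scoring of a model answer against a reference answer with
# ROUGE-1 F1 and bag-of-words cosine similarity (not the paper's exact pipeline).
from collections import Counter
import math

def rouge1_f1(candidate: str, reference: str) -> float:
    c, r = Counter(candidate.lower().split()), Counter(reference.lower().split())
    overlap = sum((c & r).values())          # unigram overlap
    if not overlap:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

def cosine_sim(candidate: str, reference: str) -> float:
    c, r = Counter(candidate.lower().split()), Counter(reference.lower().split())
    dot = sum(c[w] * r[w] for w in c)
    norm = (math.sqrt(sum(v * v for v in c.values()))
            * math.sqrt(sum(v * v for v in r.values())))
    return dot / norm if norm else 0.0

ref = "The Pacific Islands were settled through successive waves of voyaging."
ans = "Settlement of the Pacific Islands occurred in successive voyaging waves."
print(round(rouge1_f1(ans, ref), 3), round(cosine_sim(ans, ref), 3))
```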
The rise of large language models (LLMs) has marked a substantial leap toward artificial general intelligence. However, the utilization of LLMs in the (re)insurance sector remains a challenging problem because of the gap between general capabilities and domain-specific requirements. Two prevalent methods for domain specialization of LLMs are prompt engineering and fine-tuning. In this study, we aim to evaluate the efficacy of LLMs, enhanced with prompt engineering and fine-tuning techniques, on quantitative reasoning tasks within the (re)insurance domain. It is found that (1) compared to prompt engineering, fine-tuning with a task-specific calculation dataset provides a remarkable leap in performance, even exceeding the performance of larger pre-trained LLMs; (2) when available task-specific calculation data are limited, supplementing LLMs with a domain-specific knowledge dataset is an effective alternative; and (3) enhanced reasoning capabilities, rather than mere computational skills, should be the primary focus for LLMs when tackling quantitative tasks. Moreover, the fine-tuned models demonstrate a consistent aptitude for common-sense reasoning and factual knowledge, as evidenced by their performance on public benchmarks. Overall, this study demonstrates the potential of LLMs to serve as powerful AI assistants and to solve quantitative reasoning tasks in the (re)insurance sector.
The rapid development of generative artificial intelligence (AI) systems, particularly those fuelled by increasingly advanced large language models (LLMs), has raised concerns among policymakers globally about their potential risks. In July 2023, Chinese regulators enacted the Interim Measures for the Management of Generative AI Services (“the Measures”). The Measures aim to mitigate various risks associated with public-facing generative AI services, particularly those concerning content safety and security. At the same time, Chinese regulators are seeking the further development and application of such technology across diverse industries. Tensions between these policy objectives are reflected in the provisions of the Measures that entail different types of obligations on generative AI service providers. Such tensions present significant challenges for the implementation of the regulation. As Beijing moves towards establishing a comprehensive legal framework for AI governance, legislators will need to further clarify and balance the responsibilities of diverse stakeholders.
The AI Act contains some specific provisions dealing with the possible use of artificial intelligence for discriminatory purposes or in discriminatory ways, in the context of the European Union. The AI Act also regulates generative AI models. However, these two respective sets of rules have little in common: provisions concerning non-discrimination tend not to cover generative AI, and generative AI rules tend not to cover discrimination. Based on this analysis, the chapter considers the current EU legal framework on discriminatory output of generative AI models, and concludes that expressions already prohibited by anti-discrimination law certainly remain prohibited after the approval of the AI Act, while discriminatory content not covered by EU non-discrimination legislation will remain lawful. For the moment, the AI Act has not brought any particularly relevant innovation on this specific matter, but the picture might change in the future.
This chapter deals with the use of Large Language Models (LLMs) in the legal sector from a comparative law perspective. It explores their advantages and risks; the pertinent question of whether the deployment of LLMs by non-lawyers can be classified as an unauthorized practice of law in the US and Germany; what lawyers, law firms, and legal departments need to consider when using LLMs under professional rules of conduct, especially the American Bar Association Model Rules of Professional Conduct and the Charter of Core Principles of the European Legal Profession of the Council of Bars and Law Societies of Europe; and, finally, how the recently published AI Act will affect the legal tech market, specifically the use of LLMs. A concluding section summarizes the main findings and points out open questions.
This study aims to explore the feasibility and accuracy of utilizing large language models (LLMs) to assess the risk of bias (ROB) in cohort studies. We conducted a pilot and feasibility study in 30 cohort studies randomly selected from the reference lists of published Cochrane reviews. We developed a structured prompt to guide ChatGPT-4o, Moonshot-v1-128k, and DeepSeek-V3 in assessing the ROB of each cohort twice. We used the ROB results assessed by three evidence-based medicine experts as the gold standard, and then evaluated the accuracy of the LLMs by calculating the correct assessment rate, sensitivity, specificity, and F1 scores at the overall and item-specific levels. The consistency of the overall and item-specific assessment results was evaluated using Cohen’s kappa (κ) and prevalence-adjusted bias-adjusted kappa. Efficiency was estimated by the mean assessment time required. The three LLMs showed distinct performance across the eight assessment items. Overall accuracy was comparable (80.8%–83.3%). Moonshot-v1-128k showed superior sensitivity in population selection (0.92 versus ChatGPT-4o’s 0.55, P < 0.001). In terms of F1 scores, Moonshot-v1-128k led in population selection (F1 = 0.80 versus ChatGPT-4o’s 0.67, P = 0.004). ChatGPT-4o demonstrated the highest consistency (mean κ = 96.5%), with perfect agreement (100%) in outcome confidence. ChatGPT-4o was 97.3% faster per article (32.8 seconds versus 20 minutes manually) and outperformed Moonshot-v1-128k and DeepSeek-V3 by 47–50% in processing speed. The efficient and accurate assessment of ROB in cohort studies by ChatGPT-4o, Moonshot-v1-128k, and DeepSeek-V3 highlights the potential of LLMs to enhance the systematic review process.
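The agreement statistics cited above can be computed from two rating runs as in the sketch below; the binary ratings are made-up placeholders, and the functions follow the standard definitions of Cohen's kappa and PABAK rather than the study's exact implementation.

```python
# Sketch of the agreement statistics named in the abstract: Cohen's kappa and
# prevalence-adjusted bias-adjusted kappa (PABAK) for two rating runs.
# The ratings below are synthetic placeholders, not study data.
from collections import Counter

def cohens_kappa(r1, r2):
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n               # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[k] * c2[k] for k in set(r1) | set(r2)) / (n * n)  # chance agreement
    return (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0

def pabak(r1, r2):
    # For binary ratings, PABAK reduces to 2 * observed agreement - 1.
    p_o = sum(a == b for a, b in zip(r1, r2)) / len(r1)
    return 2 * p_o - 1

run1 = ["low", "high", "low", "low", "high", "low", "low", "low"]
run2 = ["low", "high", "low", "high", "high", "low", "low", "low"]
print(round(cohens_kappa(run1, run2), 3), round(pabak(run1, run2), 3))
```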
Generative artificial intelligence has a long history but surged into global prominence with the introduction in 2017 of the transformer architecture for large language models. Based on deep learning with artificial neural networks, transformers revolutionised the field of generative AI for the production of natural language outputs. Today’s large language models, and other forms of generative artificial intelligence, now have unprecedented capability and versatility. The emergence of these highly capable forms of generative AI poses many legal issues and questions, including consequences for intellectual property, contracts and licences, liability, data protection, use in specific sectors, potential harms, and of course ethics, policy, and regulation of the technology. To support the discussion of these topics in this Handbook, this chapter gives a relatively non-technical introduction to the technology of modern artificial intelligence and generative AI.
The advent and momentum gained by Generative AI erupted into the EU regulatory scene, signalling a significant paradigm shift in the AI landscape. The AI Act has struggled to keep pace with the eruption and extraordinary popularity of Generative AI, but has managed to provide specific solutions designed for these models. Nonetheless, there are legal and regulatory implications of Generative AI that may exceed the proposed solutions. Understanding the paradigm shift that Generative AI is likely to bring will allow us to assess the sufficiency and adequacy of the measures adopted and to identify possible shortcomings and gaps in the current EU framework. Generative AI raises specific problems for compliance with AI Act obligations and for the application of liability rules, and these have to be acknowledged and properly addressed. Multimodality, emergence, scalability, or generality of tasks may not match the assumptions underlying the obligations and requirements laid down for AI systems. The chapter explores whether the current ecosystem of existing and still-to-be-adopted rules on AI systems fully and adequately addresses the distinctive features of Generative AI, with special consideration of the interaction between the AI Act and the liability rules provided for in the draft AILD and the revPLD.
The philosophy of linguistics reflects on multiple scientific disciplines aimed at the understanding of one of the most fundamental aspects of human existence, our ability to produce and understand natural language. Linguistics, viewed as a science, has a long history but it was the advent of the formal (and computational) revolution in cognitive science that established the field as both scientifically and philosophically appealing. In this Element, the topic will be approached as a means for understanding larger issues in the philosophy of science more generally.
Systematic reviews (SRs) synthesize evidence through a rigorous, labor-intensive, and costly process. To accelerate the title–abstract screening phase of SRs, several artificial intelligence (AI)-based semi-automated screening tools have been developed to reduce workload by prioritizing relevant records. However, their performance is primarily evaluated for SRs of intervention studies, which generally have well-structured abstracts. Here, we evaluate whether screening tool performance is equally effective for SRs of prognosis studies that have larger heterogeneity between abstracts. We conducted retrospective simulations on prognosis and intervention reviews using a screening tool (ASReview). We also evaluated the effects of review scope (i.e., breadth of the research question), number of (relevant) records, and modeling methods within the tool. Performance was assessed in terms of recall (i.e., sensitivity), precision at 95% recall (i.e., positive predictive value at 95% recall), and workload reduction (work saved over sampling at 95% recall [WSS@95%]). The WSS@95% was slightly worse for prognosis reviews (range: 0.324–0.597) than for intervention reviews (range: 0.613–0.895). The precision was higher for prognosis (range: 0.115–0.400) compared to intervention reviews (range: 0.024–0.057). These differences were primarily due to the larger number of relevant records in the prognosis reviews. The modeling methods and the scope of the prognosis review did not significantly impact tool performance. We conclude that the larger abstract heterogeneity of prognosis studies does not substantially affect the effectiveness of screening tools for SRs of prognosis. Further evaluation studies including a standardized evaluation framework are needed to enable prospective decisions on the reliable use of screening tools.
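The workload metric reported above, WSS@95%, can be derived from a tool's priority ranking as in the following sketch; the ranked labels are synthetic, and the function uses the common definition of work saved over sampling rather than ASReview's internal code.

```python
# Illustrative computation of work saved over sampling at 95% recall (WSS@95%)
# and precision at 95% recall from a screening tool's ranked record order.
# The ranking below is synthetic, not taken from the evaluated reviews.

def wss_at_recall(ranked_labels, target_recall=0.95):
    n = len(ranked_labels)
    n_relevant = sum(ranked_labels)
    needed = target_recall * n_relevant
    found = 0
    for i, label in enumerate(ranked_labels, start=1):
        found += label
        if found >= needed:
            screened = i                     # records screened to reach target recall
            break
    precision = found / screened             # precision at 95% recall
    wss = (n - screened) / n - (1 - target_recall)
    return wss, precision

# 1 = relevant record, 0 = irrelevant, ordered by the tool's priority ranking
ranking = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
print(wss_at_recall(ranking))
```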
This article proposes a new approach for measuring the quality of answers in political question-and-answer sessions. We assess the quality of an answer based on how easily and accurately it can be recognized among a random set of candidate answers given the question’s text. This measure reflects the answer’s relevance and depth of engagement with the question. Drawing a parallel with semantic search, we can implement this approach by training a language model on the corpus of observed questions and answers without additional human-labeled data. We showcase and validate our methodology within the context of the Question Period in the Canadian House of Commons. Our analysis reveals that while some answers only have a weak semantic connection to questions, suggesting some evasion or obfuscation, they are generally at least moderately relevant, far exceeding what we would expect from random replies. We also find meaningful correlations between the quality of answers and the party affiliation of the members of Parliament asking the questions.
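A minimal sketch of the recognition idea is given below: the actual answer is ranked among random candidate answers by its embedding similarity to the question. The pretrained bi-encoder is an assumed stand-in rather than the authors' model trained on the Question Period corpus, and the example texts are invented.

```python
# Sketch of the proposed quality measure: how easily the actual answer can be
# recognized among random candidate answers given the question text.
# The pretrained model below is an assumed stand-in, not the authors' model.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def answer_rank(question: str, true_answer: str, distractors: list[str]) -> int:
    candidates = [true_answer] + distractors
    q_emb = model.encode([question], normalize_embeddings=True)[0]
    c_emb = model.encode(candidates, normalize_embeddings=True)
    scores = c_emb @ q_emb                   # cosine similarities to the question
    # Rank 1 means the true answer is the candidate most similar to the question.
    return 1 + int((scores > scores[0]).sum())

print(answer_rank(
    "What is the government's plan to address housing affordability?",
    "We are investing in new affordable housing units across the country.",
    ["The honourable member should check the record.",
     "Our trade relationships remain strong."],
))
```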
Against the backdrop of the proliferation of large language model (LLM)-based Artificial Intelligence (AI) products such as ChatGPT and Gemini, and their increasing use in professional communication training, researchers, including applied linguists, have cautioned that these products (re)produce cultural stereotypes due to their training data. However, there is a limited understanding of how humans navigate the assumptions and biases present in the responses of these LLM-powered systems and the role humans play in perpetuating stereotypes during interactions with LLMs. In this article, we use Sequential-Categorial Analysis, which combines Conversation Analysis and Membership Categorization Analysis, to analyze simulated interactions between a human physiotherapist and three LLM-powered chatbot patients of Chinese, Australian, and Indian cultural backgrounds. Coupled with analysis of information elicited from the LLM chatbots and the human physiotherapist after each interaction, we demonstrate that users of LLM-powered systems are highly susceptible to becoming interactionally entrenched in culturally essentialized narratives. We use the concepts of interactional instinct and interactional entrenchment to argue that whilst human–AI interaction may be instinctively prosocial, LLM users need to develop Critical Interactional Competence for human–AI interaction through appropriate and targeted training and intervention, especially when LLM-powered tools are used in professional communication training programs.
Protest event analysis (PEA) is the core method for understanding spatial patterns and temporal dynamics of protest. We show how Large Language Models (LLMs) can be used to automate the classification of protest events, and of political event data more broadly, with levels of accuracy comparable to humans, while reducing necessary annotation time by several orders of magnitude. We propose a modular pipeline for the automation of PEA (PAPEA) based on fine-tuned LLMs and provide publicly available models and tools which can be easily adapted and extended. PAPEA makes it possible to go from newspaper articles to PEA datasets with high levels of precision without human intervention. A use case based on a large German news corpus illustrates the potential of PAPEA.
Recent studies highlight the potential of large language models (LLMs) in citation screening for systematic reviews; however, the efficiency of individual LLMs for this application remains unclear. This study aimed to compare accuracy, time-related efficiency, cost, and consistency across four LLMs—GPT-4o, Gemini 1.5 Pro, Claude 3.5 Sonnet, and Llama 3.3 70B—for literature screening tasks. The models screened for clinical questions from the Japanese Clinical Practice Guidelines for the Management of Sepsis and Septic Shock 2024. Sensitivity and specificity were calculated for each model based on conventional citation screening results for qualitative assessment. We also recorded the time and cost of screening and assessed consistency to verify reproducibility. A post hoc analysis explored whether integrating outputs from multiple models could enhance screening accuracy. GPT-4o and Llama 3.3 70B achieved high specificity but lower sensitivity, while Gemini 1.5 Pro and Claude 3.5 Sonnet exhibited higher sensitivity at the cost of lower specificity. Citation screening times and costs varied, with GPT-4o being the fastest and Llama 3.3 70B the most cost-effective. Consistency was comparable among the models. An ensemble approach combining model outputs improved sensitivity but increased the number of false positives, requiring additional review effort. Each model demonstrated distinct strengths, effectively streamlining citation screening by saving time and reducing workload. However, reviewing false positives remains a challenge. Combining models may enhance sensitivity, indicating the potential of LLMs to optimize systematic review workflows.
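The post hoc ensemble described above can be expressed as a simple inclusion rule across model votes, as in the sketch below; the records and votes are synthetic placeholders, and the "include if any model includes" rule is one plausible reading of the combination strategy.

```python
# Sketch of the ensemble idea: a record passes title/abstract screening if any
# model votes to include it, which raises sensitivity at the cost of more false
# positives that still need human review. Votes below are synthetic placeholders.

def ensemble_include(votes: dict[str, bool]) -> bool:
    # "Include if any model includes" (logical OR across models).
    return any(votes.values())

records = {
    "rec-001": {"gpt-4o": False, "gemini-1.5-pro": True,
                "claude-3.5-sonnet": True, "llama-3.3-70b": False},
    "rec-002": {"gpt-4o": False, "gemini-1.5-pro": False,
                "claude-3.5-sonnet": False, "llama-3.3-70b": False},
}

for rec_id, votes in records.items():
    decision = "include for full-text review" if ensemble_include(votes) else "exclude"
    print(rec_id, "->", decision)
```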
This chapter addresses how one could quantify and explore the impact of geopolitics on global businesses. Computational geopolitics is an attempt to integrate quantitative methods and geopolitical analysis to understand and predict trends. The explosive growth of data, improvements in computational power, and access to cloud computing have led to a proliferation of computational methods in analyzing geopolitics and its impact on companies. The chapter explores some tools and techniques used in computational geopolitics, including events-based approaches to measuring geopolitical tensions, textual approaches, and empirical approaches. In addition, it provides examples of ways in which analysts can quantify the impact of geopolitics on trade and foreign direct investment. It also introduces experimental methods to assess the effectiveness of companies’ strategic responses to geopolitical tensions. Large language models (LLMs) can be used for sentiment analysis, spotting trends, scenario building, risk assessment, and strategic recommendations. While these methods offer advances in quantifying the impact of geopolitics on global businesses, analysts should also be cautious about data quality and availability as well as the complexity of the phenomenon and the geopolitics of AI. The chapter concludes by pointing the reader to some widely used data sources for computational geopolitics.
Human language is increasingly written rather than just spoken, primarily due to the proliferation of digital technology in modern life. This trend has enabled the creation of generative artificial intelligence (AI) trained on corpora containing trillions of words extracted from text on the internet. However, current language theory inadequately addresses digital text communication’s unique characteristics and constraints. This paper systematically analyzes and synthesizes existing literature to map the theoretical landscape of digitized language. The evidence demonstrates that, parallel to spoken language, features of written communication are frequently correlated with the socially constructed demographic identities of writers, a phenomenon we refer to as “digital accents.” This conceptualization raises complex ontological questions about the nature of digital text and its relationship to social identity. The same line of questioning, in conjunction with recent research, shows how generative AI systematically fails to capture the breadth of expression observed in human writing, an outcome we call “homogeneity-by-design.” By approaching text-based language from this theoretical framework while acknowledging its inherent limitations, social scientists studying language can strengthen their critical analysis of AI systems and contribute meaningful insights to their development and improvement.
Background:
Biostatisticians increasingly use large language models (LLMs) to enhance efficiency, yet practical guidance on responsible integration is limited. This study explores current LLM usage, challenges, and training needs to support biostatisticians.
Methods:
A cross-sectional survey was conducted across three biostatistics units at two academic medical centers. The survey assessed LLM usage across three key professional activities: communication and leadership, clinical and domain knowledge, and quantitative expertise. Responses were analyzed using descriptive statistics, while free-text responses underwent thematic analysis.
Results:
Of 208 eligible biostatisticians (162 staff and 46 faculty), 69 (33.2%) responded. Among them, 44 (63.8%) reported using LLMs; of the 43 who answered the frequency question, 20 (46.5%) used them daily and 16 (37.2%) weekly. LLMs improved productivity in coding, writing, and literature review; however, 29 of 41 respondents (70.7%) reported significant errors, including incorrect code, statistical misinterpretations, and hallucinated functions. Key verification strategies included expertise, external validation, debugging, and manual inspection. Among 58 respondents providing training feedback, 44 (75.9%) requested case studies, 40 (69.0%) sought interactive tutorials, and 37 (63.8%) desired structured training.
Conclusions:
LLM usage is notable among respondents at two academic medical centers, though response patterns likely reflect early adopters. While LLMs enhance productivity, challenges like errors and reliability concerns highlight the need for verification strategies and systematic validation. The strong interest in training underscores the need for structured guidance. As an initial step, we propose eight core principles for responsible LLM integration, offering a preliminary framework for structured usage, validation, and ethical considerations.
The emergence of large language models (LLMs) has made it increasingly difficult to protect and enforce intellectual property (IP) rights in a digital landscape where content can be easily accessed and utilized without clear authorization. First, we explain why LLMs make it uniquely difficult to protect and enforce IP, creating a ‘tragedy of the commons.’ Second, drawing on theories of polycentric governance, we argue that non-fungible tokens (NFTs) could be effective tools for addressing the complexities of digital IP rights. Third, we provide an illustrative case study that shows how NFTs can facilitate dispute resolution of IP on the blockchain.
Audits of multilingual resources are reporting shockingly poor quality: “less than 50% … acceptable quality.” There is too much translationese in too many of our multilingual resources, e.g., Wikipedia, XNLI, FLORES, WordNet. We view translationese as a form of noise that makes it hard to generalize from a benchmark based on translation to a real task of interest that does not involve translation. Worse, too much of this translationese is in the “wrong” direction. Directionality matters. Professional translators translate from their weaker language into their stronger language. Unfortunately, many of our resources translate in the other direction, from a stronger (higher-resource) language into a weaker (lower-resource) language. In Wikipedia, for example, there is more translation out of English than into English. We recommend more investments in high-quality data, and less in translation, especially in the “wrong” direction.