Advances in natural language processing (NLP) and Big Data techniques have allowed us to learn about the human mind through one of its richest outputs – language. In this chapter, we introduce the field of computational linguistics and go through examples of how to find natural language and how to interpret the complexities that are present within it. The chapter discusses the major state-of-the-art methods being applied in NLP and how they can be applied to psychological questions, including statistical learning, N-gram models, word embedding models, large language models, topic modeling, and sentiment analysis. The chapter concludes with ethical discussions on the proliferation of chat “bots” that pervade our social networks, and the importance of balanced training sets for NLP models.
In the realm of data-to-text generation tasks, the use of large language models (LLMs) has become common practice, yielding fluent and coherent outputs. Existing literature highlights that the quality of in-context examples significantly influences the empirical performance of these models, making the efficient selection of high-quality examples crucial. We hypothesize that the quality of these examples is primarily determined by two properties: their similarity to the input data and their diversity from one another. Based on this insight, we introduce a novel approach, Double Clustering-based In-Context Example Selection, specifically designed for data-to-text generation tasks. Our method involves two distinct clustering stages. The first stage aims to maximize the similarity between the in-context examples and the input data. The second stage ensures diversity among the selected in-context examples. Additionally, we have developed a batched generation method to enhance the token usage efficiency of LLMs. Experimental results demonstrate that, compared to traditional methods of selecting in-context learning samples, our approach significantly improves both time efficiency and token utilization while maintaining accuracy.
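A minimal sketch of the two-stage idea described above (first filter candidates by similarity to the input, then cluster the remaining pool to enforce diversity) is given below; the embedding helper, pool size, and cluster count are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of a two-stage "similar, then diverse" in-context example selector.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

def select_examples(input_vec, candidate_vecs, pool_size=50, k=5):
    """Stage 1: keep the candidates most similar to the input.
    Stage 2: cluster that pool and take one example per cluster for diversity."""
    sims = cosine_similarity(input_vec.reshape(1, -1), candidate_vecs).ravel()
    pool_idx = np.argsort(sims)[::-1][:pool_size]                    # stage 1: similarity to the input
    pool = candidate_vecs[pool_idx]

    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pool)   # stage 2: diversity among examples
    chosen = []
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        if members.size == 0:
            continue
        best = members[np.argmax(sims[pool_idx][members])]           # most input-similar member of this cluster
        chosen.append(int(pool_idx[best]))
    return chosen  # indices into the original candidate set

# usage (embed() is an assumed embedding helper, not part of the paper):
# ids = select_examples(embed(record), np.stack([embed(c) for c in candidates]))
```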
Everyone is talking about bots. Much of the discussion has focused on downsides. It is too easy to use bots to cheat, but there are also many ways to use bots to improve your writing. Good writers use thesauruses. It is not cheating to use bots as a modern version of a thesaurus. It is also not cheating to use recommendation systems in a responsible way.
A critical step in systematic reviews involves the definition of a search strategy, with keywords and Boolean logic, to filter electronic databases. We hypothesize that it is possible to screen articles in electronic databases using large language models (LLMs) as an alternative to search equations. To investigate this matter, we compared two methods to identify randomized controlled trials (RCTs) in electronic databases: filtering databases using the Cochrane highly sensitive search and an assessment by an LLM.
We retrieved studies indexed in PubMed with a publication date between September 1 and September 30, 2024, using the sole keyword “diabetes.” We compared the performance of the Cochrane highly sensitive search and the assessment of all titles and abstracts extracted directly from the database by GPT-4o-mini to identify RCTs. The reference standard was the manual screening of retrieved articles by two independent reviewers.
The search retrieved 6377 records, of which 210 (3.5%) were primary reports of RCTs. The Cochrane highly sensitive search filtered 2197 records and missed one RCT (sensitivity 99.5%, 95% CI 97.4% to 100%; specificity 67.8%, 95% CI 66.6% to 68.9%). Assessment of all titles and abstracts from the electronic database by GPT filtered 1080 records and included all 210 primary reports of RCTs (sensitivity 100%, 95% CI 98.3% to 100%; specificity 85.9%, 95% CI 85.0% to 86.8%).
LLMs can screen all articles in electronic databases to identify RCTs as an alternative to the Cochrane highly sensitive search. This calls for the evaluation of LLMs as an alternative to rigid search strategies.
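For concreteness, the sensitivity and specificity figures above can be re-derived from the reported counts; the short sketch below does that arithmetic (the split of filtered records into RCTs and non-RCTs is derived from the text rather than reported directly).

```python
# Reproduce the reported sensitivity/specificity from the counts given above
# (6377 records, 210 RCTs; Cochrane filter kept 2197 and missed 1 RCT;
#  GPT-4o-mini kept 1080 and missed none).
def sens_spec(total, total_rcts, kept, missed_rcts):
    tp = total_rcts - missed_rcts          # RCTs correctly kept by the filter
    fp = kept - tp                         # non-RCTs kept by the filter
    tn = (total - total_rcts) - fp         # non-RCTs correctly excluded
    return tp / total_rcts, tn / (total - total_rcts)

print(sens_spec(6377, 210, 2197, 1))   # Cochrane: ~ (0.995, 0.678)
print(sens_spec(6377, 210, 1080, 0))   # GPT-4o-mini: ~ (1.0, 0.859)
```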
Informal caregivers such as family members or friends provide much care to people with physical or cognitive impairment. To address challenges in care, caregivers often seek information online via social media platforms for their health information wants (HIWs), the types of care-related information that caregivers wish to have. Some efforts have been made to use Artificial Intelligence (AI) to understand caregivers’ information behaviors on social media. In this chapter, we present achievements of research with a human–AI collaboration approach in identifying caregivers’ HIWs, focusing on dementia caregivers as one example. Through this collaboration, AI techniques such as large language models (LLMs) can be used to extract health-related domain knowledge for building classification models, while human experts can benefit from the help of AI to further understand caregivers’ HIWs. Our approach has implications for the caregiving of various groups. The outcomes of human–AI collaboration can provide smart interventions to help caregivers and patients.
Cadastral data reveal key information about the historical organization of cities but are often non-standardized due to diverse formats and human annotations, complicating large-scale analysis. As a case study, we explore Venice’s urban history during the critical period from 1740 to 1808, capturing the transition following the fall of the ancient Republic and the Ancien Régime. This era’s complex cadastral data, marked by its volume and lack of uniform structure, presents unique challenges that our approach adeptly navigates, enabling us to generate spatial queries that bridge past and present urban landscapes. We present a text-to-programs framework that leverages large language models to process natural language queries as executable code for analyzing historical cadastral records. Our methodology implements two complementary techniques: a SQL agent for handling structured queries about specific cadastral information, and a coding agent for complex analytical operations requiring custom data manipulation. We propose a taxonomy that classifies historical research questions based on their complexity and analytical requirements, mapping them to the most appropriate technical approach. This framework is supported by an investigation into the execution consistency of the system, alongside a qualitative analysis of the answers it produces. By ensuring interpretability and minimizing hallucination through verifiable program outputs, we demonstrate the system’s effectiveness in reconstructing past population information, property features, and spatiotemporal comparisons in Venice.
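As a rough illustration of the routing idea (structured lookups go to a SQL agent, open-ended analyses to a coding agent), the sketch below assumes a generic llm(prompt) callable and made-up prompt wording; it is not the paper's taxonomy or code.

```python
def route_query(question: str, llm) -> str:
    """Send structured lookups to the SQL agent and open-ended analyses to the coding agent."""
    decision = llm(
        "Classify the historical research question below as either 'SQL' "
        "(a lookup over structured cadastral fields) or 'CODE' (an analysis "
        "requiring custom data manipulation). Answer with one word.\n\n"
        f"Question: {question}"
    ).strip().upper()
    return "sql_agent" if decision.startswith("SQL") else "coding_agent"
```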
Codebooks—documents that operationalize concepts and outline annotation procedures—are used almost universally by social scientists when coding political texts. To code these texts automatically, researchers are increasingly turning to generative large language models (LLMs). However, there is limited empirical evidence on whether “off-the-shelf” LLMs faithfully follow real-world codebook operationalizations and measure complex political constructs with sufficient accuracy. To address this, we gather and curate three real-world political science codebooks—covering protest events, political violence, and manifestos—along with their unstructured texts and human-coded labels. We also propose a five-stage framework for codebook-LLM measurement: preparing a codebook for both humans and LLMs, testing LLMs’ basic capabilities on a codebook, evaluating zero-shot measurement accuracy (i.e., off-the-shelf performance), analyzing errors, and further (parameter-efficient) supervised training of LLMs. We provide an empirical demonstration of this framework using our three codebook datasets and several pre-trained open-weight LLMs with 7–12 billion parameters. We find current open-weight LLMs have limitations in following codebooks zero-shot, but that supervised instruction-tuning can substantially improve performance. Rather than suggesting the “best” LLM, our contribution lies in our codebook datasets, evaluation framework, and guidance for applied researchers who wish to implement their own codebook-LLM measurement projects.
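To make the zero-shot measurement stage concrete, a hedged sketch follows; the codebook text, label set, and llm(prompt) callable are placeholders rather than the authors' datasets or prompts.

```python
def zero_shot_code(document: str, codebook: str, labels: list[str], llm) -> str:
    """Ask the model for exactly one codebook label; off-codebook answers are flagged for error analysis."""
    prompt = (
        "You are annotating political texts according to the codebook below.\n\n"
        f"CODEBOOK:\n{codebook}\n\n"
        f"Allowed labels: {', '.join(labels)}\n\n"
        f"TEXT:\n{document}\n\n"
        "Return exactly one allowed label."
    )
    answer = llm(prompt).strip()
    return answer if answer in labels else "INVALID"
```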
In this article, we evaluate several large language models (LLMs) on a word-level translation alignment task between Ancient Greek and English. Comparing model performance to a human gold standard, we examine the performance of four different LLMs, two open-weight and two proprietary. We then take the best-performing model and generate examples of word-level alignments for further finetuning of the open-weight models. We observe significant improvement in the open-weight models after finetuning on this synthetic data. These findings suggest that open-weight models, though initially unable to perform a given task well, can be bolstered through finetuning to achieve impressive results. We believe that this work can help inform the development of more such tools in the digital classics and the computational humanities at large.
Large language models have shown promise for automating data extraction (DE) in systematic reviews (SRs), but most existing approaches require manual interaction. We developed an open-source system using GPT-4o to extract data automatically, with no human intervention during the extraction process. We developed the system on a dataset of 290 randomized controlled trials (RCTs) from a published SR about cognitive behavioral therapy for insomnia. We evaluated the system on two other datasets: 5 RCTs from an updated search for the same review and 10 RCTs used in a separate published study that had also evaluated automated DE. We developed the best-performing approach across all variables in the development dataset using GPT-4o. The performance in the updated-search dataset using o3 was 74.9% sensitivity, 76.7% specificity, 75.7% precision, 93.5% variable detection comprehensiveness, and 75.3% accuracy. In both datasets, accuracy was higher for string variables (e.g., country, study design, drug names, and outcome definitions) than for numeric variables. In the third external validation dataset, GPT-4o showed lower performance, with a mean accuracy of 84.4%, compared with the previous study. However, by adjusting our DE method while maintaining the same prompting technique, we achieved a mean accuracy of 96.3%, which was comparable to the previous manual extraction study. Our system shows potential for assisting the DE of string variables alongside a human reviewer. However, it cannot yet replace humans for numeric DE. Further evaluation across diverse review contexts is needed to establish broader applicability.
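A minimal sketch of the kind of structured extraction prompt involved is shown below, assuming a generic llm(prompt) callable and an illustrative variable list; it is not the published system's prompt or schema.

```python
import json

# Example fields only; the published system's variable set is larger and review-specific.
VARIABLES = ["country", "study_design", "drug_names", "sample_size"]

def extract(record_text: str, llm) -> dict:
    """Request a single JSON object; numeric fields in particular may still need a human check."""
    prompt = (
        "Extract the following variables from the trial report below and answer with a "
        f"single JSON object with keys {VARIABLES}. Use null when a value is not reported.\n\n"
        + record_text
    )
    try:
        return json.loads(llm(prompt))
    except json.JSONDecodeError:
        return {}
```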
We examine the arguments made by Onitiu and colleagues concerning the need to adopt a “backward-walking logic” to manage the risks arising from the use of Large Language Models (LLMs) adapted for a medical purpose. We examine what lessons can be learned from existing multi-use technologies and applied to specialized LLMs, notwithstanding their novelty, and explore the appropriate respective roles of device providers and regulators within the ecosystem of technological oversight.
Traditional design automation enables parameterized customization but struggles with adapting to abstract or context-based user requirements. Recent advances in integrating large language models with script-driven CAD kernels provide a novel framework for context-sensitive, natural-language-driven design processes. Here, we present augmented design automation, enhancing parametric workflows with a semantic layer to interpret and execute functional, constructional, and effective user requests. Using CadQuery, experiments on a sandal model demonstrate the system’s capability to generate diverse and meaningful design variations from abstract prompts. This approach overcomes traditional limitations, enabling flexible and user-centric product development. Future research should focus on addressing complex assemblies and exploring generative design capabilities to expand the potential of this approach.
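For readers unfamiliar with script-driven CAD kernels, the snippet below shows the sort of parametric CadQuery script a semantic layer could rewrite in response to an abstract request; the geometry and dimensions are invented for illustration and are not the study's sandal model.

```python
# Purely illustrative CadQuery parametric part, not the study's model.
import cadquery as cq

def sole(length=260.0, width=95.0, thickness=12.0, corner_radius=25.0):
    """Parametric slab whose parameters a semantic layer could adjust
    in response to requests like 'make it slimmer'."""
    return (
        cq.Workplane("XY")
        .box(length, width, thickness)
        .edges("|Z")
        .fillet(corner_radius)
    )

result = sole(width=85.0)  # e.g., an abstract request interpreted as a narrower sole
```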
Effective product development relies on creating a requirements document that defines the product’s technical specifications, yet traditional methods are labor-intensive and depend heavily on expert input. Large language models (LLMs) offer the potential for automation but struggle with limitations in prompt engineering and contextual sensitivity. To overcome these challenges, we developed ReqGPT, a domain-specific LLM fine-tuned on Mistral-7B-Instruct-v0.2 using 107 curated requirements lists. ReqGPT employs a standardized prompt to generate high-quality documents and demonstrated superior performance over GPT-4 and Mistral in multiple criteria based on ISO 29148. Our results underscore ReqGPT’s efficiency, accuracy, cost-effectiveness, and alignment with industry standards, making it an ideal choice for localized use and safeguarding data privacy in technical product development.
This study investigates the integration of Large Language Models (LLMs) with TRIZ (the Theory of Inventive Problem Solving) to improve problem solving and innovation in industrial product development. By combining the structured problem-solving framework of TRIZ with the capacity of LLMs to process large amounts of data and generate ideas, this hybrid approach seeks to overcome the limitations of traditional TRIZ and optimize solution generation. In a case study conducted in an industrial setting, the effectiveness of this integration was investigated by comparing team-generated solutions with those derived using LLMs and TRIZ-enhanced LLMs. The results show that while LLMs accelerate idea generation and provide practical solutions, the additional structure of TRIZ can yield unique insights, although this depends on the application context.
Combinatorial optimization (CO) is essential for improving efficiency and performance in engineering applications. Traditional algorithms based on pure mathematical reasoning are limited and cannot capture the contextual nuances relevant to optimization. This study explores the potential of Large Language Models (LLMs) in solving engineering CO problems by leveraging their reasoning power and contextual knowledge. We propose a novel LLM-based framework that integrates network topology and contextual domain knowledge to optimize the sequencing of the Design Structure Matrix (DSM), a common CO problem. Our experiments on various DSM cases demonstrate that the proposed method achieves faster convergence and higher solution quality than benchmark methods. Moreover, results show that incorporating contextual domain knowledge significantly improves performance regardless of the choice of LLM.
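As background on the optimization target, one common formulation of DSM sequencing is to reorder rows and columns so that as few dependency marks as possible fall on the feedback side of the diagonal; the toy sketch below computes that count for a candidate ordering (conventions for which side counts as feedback vary, and this is not the paper's code).

```python
# Tiny illustration of the DSM sequencing objective: count feedback marks for an ordering.
import numpy as np

def feedback_marks(dsm: np.ndarray, order: list[int]) -> int:
    m = dsm[np.ix_(order, order)]        # apply the candidate sequence to rows and columns
    return int(np.triu(m, k=1).sum())    # marks above the diagonal count as feedback here

dsm = np.array([[0, 1, 1],
                [0, 0, 1],
                [0, 0, 0]])
print(feedback_marks(dsm, [0, 1, 2]), feedback_marks(dsm, [2, 1, 0]))  # 3 feedback marks vs. 0 after re-sequencing
```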
Functional decomposition (FD) is essential for simplifying complex systems in engineering design but remains a resource-intensive task reliant on expert knowledge. Despite advances in artificial intelligence, the automation of FD remains underexplored. This study introduces the use of GPT-4o, enhanced with a proposed Monte Carlo tree search for functional decomposition (MCTS-FD) algorithm, to automate FD. The approach is evaluated qualitatively by comparing outputs with those of graduate engineering students and quantitatively by assessing metrics such as structural integrity and semantic accuracy. The results show that GPT-4o, enhanced by MCTS-FD, outperforms smaller models in error rates and graph connectivity, highlighting the potential of large language models to automate FD with human-like accuracy.
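The sketch below outlines a generic Monte Carlo tree search loop in the spirit of MCTS-FD, where an LLM proposes candidate decomposition steps and a scoring function rewards well-formed function graphs; propose_subfunctions() and score() are assumed helpers, and the actual MCTS-FD algorithm may differ.

```python
# Generic MCTS skeleton: tree nodes hold partial decompositions, expansion asks an
# LLM for candidate sub-functions, and a scoring function provides the reward.
import math, random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

def ucb(node, c=1.4):
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(math.log(node.parent.visits) / node.visits)

def mcts(root_state, propose_subfunctions, score, iterations=100):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        while node.children:                       # selection: follow highest-UCB children to a leaf
            node = max(node.children, key=ucb)
        for state in propose_subfunctions(node.state):   # expansion: LLM-proposed next steps
            node.children.append(Node(state, parent=node))
        leaf = random.choice(node.children) if node.children else node
        reward = score(leaf.state)                 # evaluation of the candidate decomposition
        while leaf is not None:                    # backpropagation
            leaf.visits += 1
            leaf.value += reward
            leaf = leaf.parent
    return max(root.children, key=lambda n: n.visits).state
```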
Need analysis is essential for organisations to design efficient knowledge management (KM) practices, especially in contexts where knowledge is a critical asset and evolving fast. This research explores the application of large language model (LLM)-based agents in automating need analysis for KM practices. A two-layered model using a Retrieval-Augmented Generation (RAG) architecture was developed and tested on datasets including interviews with managers and consultants. The system automates natural language processing (NLP) analysis, identifies stakeholder needs, and generates insights comparable to manual methods. Results demonstrate high efficiency and accuracy, with the model aligning with expert conclusions and offering actionable recommendations. This study highlights the potential of LLM-based systems to enhance KM processes, addressing challenges faced by non-technical professionals and optimising workflows.
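A minimal retrieval-augmented step of the kind described above might look like the following sketch, assuming embed() and llm() helpers; it is not the two-layered system itself.

```python
# Retrieve the interview excerpts most relevant to a question, then ask the LLM
# to summarize stakeholder needs from them.
import numpy as np

def rag_answer(question, chunks, embed, llm, k=5):
    q = embed(question)
    chunk_vecs = np.stack([embed(c) for c in chunks])
    scores = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q) + 1e-9)
    context = "\n\n".join(chunks[i] for i in np.argsort(scores)[::-1][:k])
    return llm(
        "Using only the interview excerpts below, list the knowledge-management "
        f"needs the stakeholders express.\n\n{context}\n\nQuestion: {question}"
    )
```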
The rise of visually driven platforms like Instagram has reshaped how information is shared and understood. This study examines the role of social, cultural, and political (SCP) symbols in Instagram posts during Taiwan’s 2024 election, focusing on their influence in anti-misinformation efforts. Using large language models (LLMs)—GPT-4 Omni and Gemini Pro Vision—we analyzed thousands of posts to extract and classify symbolic elements, comparing model performance in consistency and interpretive depth. We evaluated how SCP symbols affect user engagement, perceptions of fairness, and content spread. Engagement was measured by likes, while diffusion patterns were modeled using the SEIZ epidemiological model. Findings show that posts featuring SCP symbols consistently received more interaction, even when follower counts were equal. Although political content creators often had larger audiences, posts with cultural symbols drove the highest engagement, were perceived as more fair and trustworthy, and spread more rapidly across networks. Our results suggest that symbolic richness influences online interactions more than audience size. By integrating semiotic analysis, LLM-based interpretation, and diffusion modeling, this study offers a novel framework for understanding how symbolic communication shapes engagement on visual platforms. These insights can guide designers, policymakers, and strategists in developing culturally resonant, symbol-aware messaging to combat misinformation and promote credible narratives.
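For reference, one commonly used formulation of the SEIZ compartments (susceptible S, exposed E, infected/adopter I, skeptic Z, with total population N) is given below; the study's exact parameterization is not reproduced here and may differ.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta S \tfrac{I}{N} - b S \tfrac{Z}{N} \\
\frac{dE}{dt} &= (1-p)\,\beta S \tfrac{I}{N} + (1-l)\,b S \tfrac{Z}{N} - \rho E \tfrac{I}{N} - \varepsilon E \\
\frac{dI}{dt} &= p\,\beta S \tfrac{I}{N} + \rho E \tfrac{I}{N} + \varepsilon E \\
\frac{dZ}{dt} &= l\,b S \tfrac{Z}{N}
\end{aligned}
```

Here β and b are contact rates with infected and skeptic users, p and l are the probabilities of immediate adoption or skepticism upon contact, ρ is the exposed-infected contact rate, and ε the incubation rate; these symbol names follow common convention rather than the study's notation.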
The capabilities of large language models (LLMs) have advanced to the point where entire textbooks can be queried using retrieval-augmented generation (RAG), enabling AI to integrate external, up-to-date information into its responses. This study evaluates the ability of two OpenAI models, GPT-3.5 Turbo and GPT-4 Turbo, to create and answer exam questions based on an undergraduate textbook. Fourteen exams were created, each with four true-false, four multiple-choice, and two short-answer questions derived from an open-source Pacific Studies textbook. Model performance was evaluated with and without access to the source material using text-similarity metrics such as ROUGE-1, cosine similarity, and word embeddings. Fifty-six exam scores were analyzed, revealing that RAG-assisted models significantly outperformed those relying solely on pre-trained knowledge. GPT-4 Turbo also consistently outperformed GPT-3.5 Turbo in accuracy and coherence, especially in short-answer responses. These findings demonstrate the potential of LLMs to automate exam generation while maintaining assessment quality. However, they also underscore the need for policy frameworks that promote fairness, transparency, and accessibility. Given the regulatory considerations outlined in the European Union AI Act and the NIST AI Risk Management Framework, institutions using AI in education must establish governance protocols, bias mitigation strategies, and human oversight measures. The results of this study contribute to ongoing discussions on responsibly integrating AI in education, advocating for institutional policies that support AI-assisted assessment while preserving academic integrity. The empirical results suggest not only performance benefits but also actionable governance mechanisms, such as verifiable retrieval pipelines and oversight protocols, that can guide institutional policies.
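The two text-similarity scores named above can be illustrated with a from-scratch toy computation; the study's actual tooling and preprocessing are not reproduced here.

```python
# Toy ROUGE-1 recall and bag-of-words cosine similarity on short strings.
from collections import Counter
import math

def rouge1_recall(reference: str, candidate: str) -> float:
    ref, cand = Counter(reference.lower().split()), Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())          # clipped unigram matches
    return overlap / max(sum(ref.values()), 1)

def cosine(a: str, b: str) -> float:
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

print(rouge1_recall("the reef protects the coast", "the reef shields the coast"))  # 0.8
print(cosine("the reef protects the coast", "the reef shields the coast"))
```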
The rise of large language models (LLMs) has marked a substantial leap toward artificial general intelligence. However, the utilization of LLMs in the (re)insurance sector remains a challenging problem because of the gap between general capabilities and domain-specific requirements. Two prevalent methods for domain specialization of LLMs are prompt engineering and fine-tuning. In this study, we aim to evaluate the efficacy of LLMs, enhanced with prompt engineering and fine-tuning techniques, on quantitative reasoning tasks within the (re)insurance domain. We find that (1) compared to prompt engineering, fine-tuning with a task-specific calculation dataset provides a remarkable leap in performance, even exceeding the performance of larger pre-trained LLMs; (2) when available task-specific calculation data are limited, supplementing LLMs with a domain-specific knowledge dataset is an effective alternative; and (3) enhanced reasoning capabilities, surpassing mere computational skills, should be the primary focus for LLMs tackling quantitative tasks. Moreover, the fine-tuned models demonstrate a consistent aptitude for common-sense reasoning and factual knowledge, as evidenced by their performance on public benchmarks. Overall, this study demonstrates the potential of LLMs to serve as powerful AI assistants that solve quantitative reasoning tasks in the (re)insurance sector.
This chapter examines the use of Large Language Models (LLMs) in the legal sector from a comparative law perspective. It explores their advantages and risks; asks whether the deployment of LLMs by non-lawyers can be classified as an unauthorized practice of law in the US and Germany; considers what lawyers, law firms, and legal departments need to take into account when using LLMs under professional rules of conduct, notably the American Bar Association Model Rules of Professional Conduct and the Charter of Core Principles of the European Legal Profession of the Council of Bars and Law Societies of Europe; and, finally, discusses how the recently published AI Act will affect the legal tech market, specifically the use of LLMs. A concluding section summarizes the main findings and points out open questions.