Traditional design automation enables parameterized customization but struggles with adapting to abstract or context-based user requirements. Recent advances in integrating large language models with script-driven CAD kernels provide a novel framework for context-sensitive, natural-language-driven design processes. Here, we present augmented design automation, enhancing parametric workflows with a semantic layer to interpret and execute functional, constructional, and effective user requests. Using CadQuery, experiments on a sandal model demonstrate the system’s capability to generate diverse and meaningful design variations from abstract prompts. This approach overcomes traditional limitations, enabling flexible and user-centric product development. Future research should focus on addressing complex assemblies and exploring generative design capabilities to expand the potential of this approach.
Effective product development relies on creating a requirements document that defines the product’s technical specifications, yet traditional methods are labor-intensive and depend heavily on expert input. Large language models (LLMs) offer the potential for automation but struggle with limitations in prompt engineering and contextual sensitivity. To overcome these challenges, we developed ReqGPT, a domain-specific LLM fine-tuned on Mistral-7B-Instruct-v0.2 using 107 curated requirements lists. ReqGPT employs a standardized prompt to generate high-quality documents and demonstrated superior performance over GPT-4 and Mistral in multiple criteria based on ISO 29148. Our results underscore ReqGPT’s efficiency, accuracy, cost-effectiveness, and alignment with industry standards, making it an ideal choice for localized use and safeguarding data privacy in technical product development.
This study investigates the integration of Large Language Models (LLMs) with TRIZ, the Theory of Inventive Problem Solving, to improve problem solving and innovation in industrial product development. By combining the structured problem-solving framework of TRIZ with the capacity of LLMs to process large amounts of data and generate ideas, this hybrid approach seeks to overcome the limitations of traditional TRIZ and optimize solution generation. In a case study conducted in an industrial setting, the effectiveness of this integration was investigated by comparing team-generated solutions with those derived using LLMs and TRIZ-enhanced LLMs. The results show that while LLMs accelerate idea generation and provide practical solutions, the additional structure of TRIZ can provide unique insights, though this depends on the application context.
Combinatorial optimization (CO) is essential for improving efficiency and performance in engineering applications. Traditional algorithms based on pure mathematical reasoning are limited and incapable of capturing the contextual nuances relevant to optimization. This study explores the potential of Large Language Models (LLMs) in solving engineering CO problems by leveraging their reasoning power and contextual knowledge. We propose a novel LLM-based framework that integrates network topology and contextual domain knowledge to optimize the sequencing of the Design Structure Matrix (DSM), a common CO problem. Our experiments on various DSM cases demonstrate that the proposed method achieves faster convergence and higher solution quality than benchmark methods. Moreover, results show that incorporating contextual domain knowledge significantly improves performance regardless of the choice of LLM.
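As background, the DSM sequencing objective can be illustrated in miniature: reorder tasks so that as few dependencies as possible point "backwards" in the schedule (feedback marks). The 4-task matrix and brute-force search below are illustrative assumptions for a toy instance, not the LLM-based framework described in the abstract.

```python
import numpy as np
from itertools import permutations

# Hypothetical 4-task DSM: dsm[i][j] = 1 means task i depends on task j.
dsm = np.array([
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
])

def feedback_marks(dsm, order):
    """Count dependencies on tasks scheduled later (feedback loops)."""
    pos = {task: i for i, task in enumerate(order)}
    n = len(order)
    return sum(
        1
        for i in range(n) for j in range(n)
        if dsm[i, j] and pos[j] > pos[i]  # i depends on j, but j comes later
    )

# Brute-force over all orderings (feasible only for tiny matrices;
# real DSM sequencing uses heuristics or, here, LLM-guided search).
best = min(permutations(range(len(dsm))), key=lambda o: feedback_marks(dsm, o))
print(best, feedback_marks(dsm, best))
```

For this matrix the dependency chain is 3 → 1 → 0 → 2, so the unique feedback-free sequence schedules task 3 first.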
Functional decomposition (FD) is essential for simplifying complex systems in engineering design but remains a resource-intensive task reliant on expert knowledge. Despite advances in artificial intelligence, the automation of FD remains underexplored. This study introduces the use of GPT-4o, enhanced with a proposed Monte Carlo tree search for functional decomposition (MCTS-FD) algorithm, to automate FD. The approach is evaluated qualitatively by comparing outputs with those of graduate engineering students and quantitatively by assessing metrics such as structural integrity and semantic accuracy. The results show that GPT-4o, enhanced by MCTS-FD, outperforms smaller models in error rates and graph connectivity, highlighting the potential of large language models to automate FD with human-like accuracy.
Need analysis is essential for organisations to design efficient knowledge management (KM) practices, especially in contexts where knowledge is a critical asset and evolving fast. The research explores the application of large language model (LLM)-based agents in automating need analysis for KM practices. A two-layered model using a Retrieval-Augmented Generation (RAG) architecture was developed and tested on datasets including interviews with managers and consultants. The system automates natural language processing (NLP) analysis, identifies stakeholder needs, and generates insights comparable to manual methods. Results demonstrate high efficiency and accuracy, with the model aligning with expert conclusions and offering actionable recommendations. This study highlights the potential of LLM-based systems to enhance KM processes, addressing challenges faced by non-technical professionals and optimising workflows.
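The retrieval step of a RAG pipeline such as the one described can be sketched in miniature: score stored text chunks against a query and paste the top hits into the prompt. The chunks, query, and unigram-overlap scoring below are illustrative assumptions; production systems use dense embeddings and a vector store.

```python
from collections import Counter

def retrieve(query, chunks, k=2):
    """Toy RAG retrieval: rank chunks by unigram overlap with the query
    (a stand-in for embedding-based similarity search)."""
    q = Counter(query.lower().split())

    def score(chunk):
        return sum((q & Counter(chunk.lower().split())).values())

    return sorted(chunks, key=score, reverse=True)[:k]

# Hypothetical interview snippets standing in for the indexed corpus.
chunks = [
    "consultants need faster access to prior project reports",
    "the cafeteria menu changes weekly",
    "managers report that tacit knowledge is lost when staff leave",
]
ctx = retrieve("what knowledge needs do managers and consultants report", chunks)
prompt = "Answer using only this context:\n" + "\n".join(ctx)
print(prompt)
```

The second layer of such a system would then pass `prompt` to an LLM to extract stakeholder needs from the retrieved context.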
The rise of visually driven platforms like Instagram has reshaped how information is shared and understood. This study examines the role of social, cultural, and political (SCP) symbols in Instagram posts during Taiwan’s 2024 election, focusing on their influence in anti-misinformation efforts. Using large language models (LLMs)—GPT-4 Omni and Gemini Pro Vision—we analyzed thousands of posts to extract and classify symbolic elements, comparing model performance in consistency and interpretive depth. We evaluated how SCP symbols affect user engagement, perceptions of fairness, and content spread. Engagement was measured by likes, while diffusion patterns followed the SEIZ epidemiological model. Findings show that posts featuring SCP symbols consistently received more interaction, even when follower counts were equal. Although political content creators often had larger audiences, posts with cultural symbols drove the highest engagement, were perceived as more fair and trustworthy, and spread more rapidly across networks. Our results suggest that symbolic richness influences online interactions more than audience size. By integrating semiotic analysis, LLM-based interpretation, and diffusion modeling, this study offers a novel framework for understanding how symbolic communication shapes engagement on visual platforms. These insights can guide designers, policymakers, and strategists in developing culturally resonant, symbol-aware messaging to combat misinformation and promote credible narratives.
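The SEIZ diffusion model mentioned above partitions users into susceptible, exposed, infected (spreader), and skeptic compartments. The sketch below uses a commonly cited formulation of the SEIZ equations with invented rate parameters and a simple Euler integrator; it illustrates the model's mechanics, not the fitting procedure used in the study.

```python
def seiz_step(s, e, i, z, dt=0.1, beta=0.4, b=0.2, rho=0.3, eps=0.1, p=0.6, l=0.5):
    """One Euler step of the SEIZ rumor-diffusion model.

    S: susceptible, E: exposed, I: infected (spreaders), Z: skeptics.
    Parameters (contact and transition rates) are made up for illustration.
    """
    n = s + e + i + z
    ds = -beta * s * i / n - b * s * z / n
    de = (1 - p) * beta * s * i / n + (1 - l) * b * s * z / n - rho * e * i / n - eps * e
    di = p * beta * s * i / n + rho * e * i / n + eps * e
    dz = l * b * s * z / n
    return s + ds * dt, e + de * dt, i + di * dt, z + dz * dt

# Hypothetical population of 1000 users, 10 spreaders and 10 skeptics at t=0.
state = (980.0, 0.0, 10.0, 10.0)
for _ in range(100):
    state = seiz_step(*state)
print([round(x, 1) for x in state])
```

Because the four derivatives sum to zero, the total population is conserved at every step, which is a useful sanity check when fitting such a model to engagement data.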
The capabilities of large language models (LLMs) have advanced to the point where entire textbooks can be queried using retrieval-augmented generation (RAG), enabling AI to integrate external, up-to-date information into its responses. This study evaluates the ability of two OpenAI models, GPT-3.5 Turbo and GPT-4 Turbo, to create and answer exam questions based on an undergraduate textbook. Fourteen exams were created, each with four true-false, four multiple-choice, and two short-answer questions derived from an open-source Pacific Studies textbook. Model performance was evaluated with and without access to the source material using text-similarity metrics such as ROUGE-1, cosine similarity, and word embeddings. Fifty-six exam scores were analyzed, revealing that RAG-assisted models significantly outperformed those relying solely on pre-trained knowledge. GPT-4 Turbo also consistently outperformed GPT-3.5 Turbo in accuracy and coherence, especially in short-answer responses. These findings demonstrate the potential of LLMs in automating exam generation while maintaining assessment quality. However, they also underscore the need for policy frameworks that promote fairness, transparency, and accessibility. Given regulatory considerations outlined in the European Union AI Act and the NIST AI Risk Management Framework, institutions using AI in education must establish governance protocols, bias mitigation strategies, and human oversight measures. The results of this study contribute to ongoing discussions on responsibly integrating AI in education, advocating for institutional policies that support AI-assisted assessment while preserving academic integrity. The empirical results suggest not only performance benefits but also actionable governance mechanisms, such as verifiable retrieval pipelines and oversight protocols, that can guide institutional policies.
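The text-similarity metrics named above can be illustrated with minimal implementations: ROUGE-1 as unigram-overlap F1, and cosine similarity over bag-of-words counts. The study used word embeddings; the count-based version below is a simplified stand-in, and the example sentences are invented.

```python
from collections import Counter
import math

def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: unigram overlap between a model answer and a reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def cosine_sim(a: str, b: str) -> float:
    """Cosine similarity over bag-of-words counts (embeddings simplified away)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

ref = "the treaty was signed in 1898"
ans = "the treaty was signed in 1898 by both parties"
print(round(rouge1_f1(ans, ref), 3), round(cosine_sim(ans, ref), 3))
```

The answer here recovers the full reference (recall 1.0) but adds extra words, so precision and both scores fall below 1.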
The rise of large language models (LLMs) has marked a substantial leap toward artificial general intelligence. However, the utilization of LLMs in the (re)insurance sector remains a challenging problem because of the gap between general capabilities and domain-specific requirements. Two prevalent methods for domain specialization of LLMs are prompt engineering and fine-tuning. In this study, we aim to evaluate the efficacy of LLMs, enhanced with prompt engineering and fine-tuning techniques, on quantitative reasoning tasks within the (re)insurance domain. It is found that (1) compared to prompt engineering, fine-tuning with a task-specific calculation dataset provides a remarkable leap in performance, even exceeding the performance of larger pre-trained LLMs; (2) when available task-specific calculation data are limited, supplementing LLMs with a domain-specific knowledge dataset is an effective alternative; and (3) enhanced reasoning capabilities, rather than mere computational skills, should be the primary focus for LLMs tackling quantitative tasks. Moreover, the fine-tuned models demonstrate a consistent aptitude for common-sense reasoning and factual knowledge, as evidenced by their performance on public benchmarks. Overall, this study demonstrates the potential of LLMs to serve as AI assistants and to solve quantitative reasoning tasks in the (re)insurance sector.
The rapid development of generative artificial intelligence (AI) systems, particularly those fuelled by increasingly advanced large language models (LLMs), has raised concerns among policymakers globally about their potential risks. In July 2023, Chinese regulators enacted the Interim Measures for the Management of Generative AI Services (“the Measures”). The Measures aim to mitigate various risks associated with public-facing generative AI services, particularly those concerning content safety and security. At the same time, Chinese regulators are seeking the further development and application of such technology across diverse industries. Tensions between these policy objectives are reflected in the provisions of the Measures that impose different types of obligations on generative AI service providers. Such tensions present significant challenges for the implementation of the regulation. As Beijing moves towards establishing a comprehensive legal framework for AI governance, legislators will need to further clarify and balance the responsibilities of diverse stakeholders.
The AI Act contains some specific provisions dealing with the possible use of artificial intelligence for discriminatory purposes or in discriminatory ways in the context of the European Union. The AI Act also regulates generative AI models. However, these two respective sets of rules have little in common: provisions concerning non-discrimination tend not to cover generative AI, and generative AI rules tend not to cover discrimination. Based on this analysis, the chapter considers the current EU legal framework on discriminatory output of generative AI models, and concludes that those expressions that are already prohibited by anti-discrimination law certainly remain prohibited after the approval of the AI Act, while discriminatory content that is not covered by EU non-discrimination legislation will remain lawful. For the moment, the AI Act has not brought any particularly relevant innovation on this specific matter, but the picture might change in the future.
This chapter deals with the use of Large Language Models (LLMs) in the legal sector from a comparative law perspective. It explores their advantages and risks; the pertinent question of whether the deployment of LLMs by non-lawyers can be classified as an unauthorized practice of law in the US and Germany; what lawyers, law firms, and legal departments need to consider when using LLMs under professional rules of conduct, especially the American Bar Association Model Rules of Professional Conduct and the Charter of Core Principles of the European Legal Profession of the Council of Bars and Law Societies of Europe; and, finally, how the recently published AI Act will affect the legal tech market, specifically the use of LLMs. A concluding section summarizes the main findings and points out open questions.
This study aims to explore the feasibility and accuracy of utilizing large language models (LLMs) to assess the risk of bias (ROB) in cohort studies. We conducted a pilot and feasibility study on 30 cohort studies randomly selected from reference lists of published Cochrane reviews. We developed a structured prompt to guide ChatGPT-4o, Moonshot-v1-128k, and DeepSeek-V3 in assessing the ROB of each cohort study twice. We used the ROB results assessed by three evidence-based medicine experts as the gold standard, and then evaluated the accuracy of the LLMs by calculating the correct assessment rate, sensitivity, specificity, and F1 scores at the overall and item-specific levels. The consistency of the overall and item-specific assessment results was evaluated using Cohen’s kappa (κ) and the prevalence-adjusted bias-adjusted kappa (PABAK). Efficiency was estimated by the mean assessment time required. The three LLMs showed distinct performance across the eight assessment items. Overall accuracy was comparable (80.8%–83.3%). Moonshot-v1-128k showed superior sensitivity in population selection (0.92 versus ChatGPT-4o’s 0.55, P < 0.001). In terms of F1 scores, Moonshot-v1-128k led in population selection (F1 = 0.80 versus ChatGPT-4o’s 0.67, P = 0.004). ChatGPT-4o demonstrated the highest consistency (mean κ = 96.5%), with perfect agreement (100%) in outcome confidence. ChatGPT-4o was 97.3% faster per article (32.8 seconds versus 20 minutes manually) and outperformed Moonshot-v1-128k and DeepSeek-V3 by 47–50% in processing speed. The efficient and accurate assessment of ROB in cohort studies by these LLMs highlights the potential of LLMs to enhance the systematic review process.
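For reference, Cohen's kappa corrects observed agreement for chance agreement estimated from each rater's marginal distribution, while the prevalence-adjusted bias-adjusted kappa (PABAK) instead assumes uniform chance agreement of 1/k for k categories. The two-rater ratings below are invented for illustration, not data from the study.

```python
def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters over the same items."""
    assert len(r1) == len(r2)
    n = len(r1)
    po = sum(a == b for a, b in zip(r1, r2)) / n          # observed agreement
    cats = set(r1) | set(r2)
    pe = sum((r1.count(c) / n) * (r2.count(c) / n) for c in cats)  # chance agreement
    return (po - pe) / (1 - pe) if pe != 1 else 1.0

def pabak(r1, r2, k=2):
    """Prevalence-adjusted bias-adjusted kappa: chance agreement fixed at 1/k."""
    po = sum(a == b for a, b in zip(r1, r2)) / len(r1)
    return (k * po - 1) / (k - 1)

# Hypothetical ROB judgments: one LLM run versus the expert gold standard.
llm    = ["low", "low", "high", "low", "high", "low"]
expert = ["low", "low", "high", "high", "high", "low"]
print(round(cohens_kappa(llm, expert), 3), round(pabak(llm, expert), 3))
```

PABAK is often reported alongside kappa because, when one category dominates (e.g., most items judged low risk), kappa can be deceptively low even with high raw agreement.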
Generative artificial intelligence has a long history but surged into global prominence with the introduction in 2017 of the transformer architecture for large language models. Based on deep learning with artificial neural networks, transformers revolutionised the field of generative AI for the production of natural language outputs. Today’s large language models, and other forms of generative artificial intelligence, now have unprecedented capability and versatility. The emergence of these forms of highly capable generative AI poses many legal issues and questions, including consequences for intellectual property, contracts and licences, liability, data protection, use in specific sectors, potential harms, and of course ethics, policy, and regulation of the technology. To support the discussion of these topics in this Handbook, this chapter gives a relatively non-technical introduction to the technology of modern artificial intelligence and generative AI.
The advent and momentum of Generative AI erupted onto the EU regulatory scene, signalling a significant paradigm shift in the AI landscape. The AI Act has struggled to keep pace with the eruption and extraordinary popularity of Generative AI, but has managed to provide specific solutions designed for these models. Nonetheless, there are legal and regulatory implications of Generative AI that may exceed the proposed solutions. Understanding the paradigm shift that Generative AI is likely to bring will allow us to assess the sufficiency and adequacy of the measures adopted and to identify possible shortcomings and gaps in the current EU framework. Generative AI raises specific problems for compliance with AI Act obligations and for the application of liability rules, problems that have to be acknowledged and properly addressed. Multimodality, emergent behaviour, scalability, and generality of tasks may not match the assumptions underlying the obligations and requirements laid down for AI systems. The chapter explores whether the current ecosystem of existing and still-to-be-adopted rules on AI systems fully and adequately addresses the distinctive features of Generative AI, with special consideration of the interaction between the AI Act and the liability rules provided for in the draft AILD and the revPLD.
The philosophy of linguistics reflects on multiple scientific disciplines aimed at the understanding of one of the most fundamental aspects of human existence, our ability to produce and understand natural language. Linguistics, viewed as a science, has a long history but it was the advent of the formal (and computational) revolution in cognitive science that established the field as both scientifically and philosophically appealing. In this Element, the topic will be approached as a means for understanding larger issues in the philosophy of science more generally.
Systematic reviews (SRs) synthesize evidence through a rigorous, labor-intensive, and costly process. To accelerate the title–abstract screening phase of SRs, several artificial intelligence (AI)-based semi-automated screening tools have been developed to reduce workload by prioritizing relevant records. However, their performance is primarily evaluated for SRs of intervention studies, which generally have well-structured abstracts. Here, we evaluate whether screening tool performance is equally effective for SRs of prognosis studies that have larger heterogeneity between abstracts. We conducted retrospective simulations on prognosis and intervention reviews using a screening tool (ASReview). We also evaluated the effects of review scope (i.e., breadth of the research question), number of (relevant) records, and modeling methods within the tool. Performance was assessed in terms of recall (i.e., sensitivity), precision at 95% recall (i.e., positive predictive value at 95% recall), and workload reduction (work saved over sampling at 95% recall [WSS@95%]). The WSS@95% was slightly worse for prognosis reviews (range: 0.324–0.597) than for intervention reviews (range: 0.613–0.895). The precision was higher for prognosis (range: 0.115–0.400) compared to intervention reviews (range: 0.024–0.057). These differences were primarily due to the larger number of relevant records in the prognosis reviews. The modeling methods and the scope of the prognosis review did not significantly impact tool performance. We conclude that the larger abstract heterogeneity of prognosis studies does not substantially affect the effectiveness of screening tools for SRs of prognosis. Further evaluation studies including a standardized evaluation framework are needed to enable prospective decisions on the reliable use of screening tools.
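The headline metric above, work saved over sampling at 95% recall (WSS@95%), compares the share of records a reviewer can skip once 95% of relevant records have been found against the baseline of random screening. A minimal sketch, assuming a hypothetical ranked list of relevance labels rather than the ASReview simulation data:

```python
import math

def wss_at_recall(ranked_labels, target_recall=0.95):
    """Work saved over sampling, and precision, at a target recall level.

    ranked_labels: 1 = relevant, 0 = irrelevant, in the order the screening
    tool presents records to the reviewer.
    """
    n = len(ranked_labels)
    total_relevant = sum(ranked_labels)
    needed = math.ceil(target_recall * total_relevant)
    found = 0
    for i, y in enumerate(ranked_labels, start=1):
        found += y
        if found >= needed:
            # WSS@R = fraction of records skipped minus the (1 - R) baseline.
            wss = (n - i) / n - (1 - target_recall)
            precision = found / i
            return wss, precision
    return 0.0, total_relevant / n if n else 0.0

# Hypothetical tool ranking of 10 records, 4 of them relevant.
ranked = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]
print(wss_at_recall(ranked))
```

Here the reviewer reaches the recall threshold after 5 of 10 records, so half the screening workload is saved (0.45 after the baseline correction) at a precision of 0.8.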
This article proposes a new approach for measuring the quality of answers in political question-and-answer sessions. We assess the quality of an answer based on how easily and accurately it can be recognized among a random set of candidate answers given the question’s text. This measure reflects the answer’s relevance and depth of engagement with the question. Drawing a parallel with semantic search, we can implement this approach by training a language model on the corpus of observed questions and answers without additional human-labeled data. We showcase and validate our methodology within the context of the Question Period in the Canadian House of Commons. Our analysis reveals that while some answers only have a weak semantic connection to questions, suggesting some evasion or obfuscation, they are generally at least moderately relevant, far exceeding what we would expect from random replies. We also find meaningful correlations between the quality of answers and the party affiliation of the members of Parliament asking the questions.
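The core idea, recognizing the true answer among random candidate answers given the question's text, can be mocked up with bag-of-words similarity. The paper trains a language model for this; the similarity function, the invented Question Period lines, and the reciprocal-rank scoring below are simplifying assumptions.

```python
from collections import Counter
import math

def bow_cosine(a, b):
    """Bag-of-words cosine similarity (a toy stand-in for a trained model)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def answer_quality(question, true_answer, distractors):
    """Rank the true answer among random candidates by similarity to the
    question; score it by reciprocal rank (1.0 = easiest to recognize)."""
    candidates = [true_answer] + list(distractors)
    ranked = sorted(candidates, key=lambda c: bow_cosine(question, c), reverse=True)
    return 1.0 / (ranked.index(true_answer) + 1)

q = "what is the budget for highway repairs this year"
good = "the budget for highway repairs is forty million this year"
evasive = "the honourable member should look at our record"
off = "we are proud of our agricultural exports"
print(answer_quality(q, good, [evasive, off]))
```

A responsive answer is easy to pick out of the candidate set and scores 1.0, while an evasive or off-topic reply sinks in the ranking, mirroring the paper's notion of semantic connection between question and answer.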
Amid the proliferation of large language model (LLM)-based Artificial Intelligence (AI) products such as ChatGPT and Gemini, and their increasing use in professional communication training, researchers, including applied linguists, have cautioned that these products (re)produce cultural stereotypes due to their training data. However, there is a limited understanding of how humans navigate the assumptions and biases present in the responses of these LLM-powered systems and the role humans play in perpetuating stereotypes during interactions with LLMs. In this article, we use Sequential-Categorial Analysis, which combines Conversation Analysis and Membership Categorization Analysis, to analyze simulated interactions between a human physiotherapist and three LLM-powered chatbot patients of Chinese, Australian, and Indian cultural backgrounds. Coupled with analysis of information elicited from the LLM chatbots and the human physiotherapist after each interaction, we demonstrate that users of LLM-powered systems are highly susceptible to becoming interactionally entrenched in culturally essentialized narratives. We use the concepts of interactional instinct and interactional entrenchment to argue that whilst human–AI interaction may be instinctively prosocial, LLM users need to develop Critical Interactional Competence for human–AI interaction through appropriate and targeted training and intervention, especially when LLM-powered tools are used in professional communication training programs.
Protest event analysis (PEA) is the core method for understanding the spatial patterns and temporal dynamics of protest. We show how Large Language Models (LLMs) can be used to automate the classification of protest events, and of political event data more broadly, with levels of accuracy comparable to humans, while reducing the necessary annotation time by several orders of magnitude. We propose a modular pipeline for the automation of PEA (PAPEA) based on fine-tuned LLMs and provide publicly available models and tools which can be easily adapted and extended. PAPEA makes it possible to go from newspaper articles to PEA datasets with high levels of precision without human intervention. A use case based on a large German news corpus illustrates the potential of PAPEA.