
9 - Using Generative AI to Turn 19th-Century Library Catalogues into Data: Applications and Limitations

Published online by Cambridge University Press:  13 September 2025

Paul Gooding (University of Glasgow)
Melissa Terras (University of Edinburgh)

Summary

Introduction

In recent years, as digital textual analysis methods have become more user-friendly, and as the computing power needed to run these analyses has become more readily available, there has been a surge of interest in creating textual corpora – collections of digitised, machine-processable texts – and in performing what is sometimes referred to as ‘distant reading’ on them: analysing these corpora quantitatively in search of patterns (Underwood, 2017). In some examples of this sort of research, the corpus is well defined and clearly bounded (e.g. all published works by a particular author). In other cases, the composition of the corpus is more subjective; for example, an analysis of 240 crime novels in which one of four specified cities ‘play[s] a major role’ (Rauscher et al., 2013, 65). More problematic are cases where the composition of the underlying corpus is unknown, particularly the hundreds of published papers based on the Google Books Ngrams data (books.google.com/ngrams), which traces the frequency of word usage over time across the vast and largely undocumented collection of books that make up the Google Books database. As has been demonstrated (e.g. Pechenick, Danforth and Dodds, 2015), the Google Books corpus is skewed towards academic publications rather than popular works, which makes it unsafe to draw conclusions from it about changes in popular culture or popular language usage.

Information

Type: Chapter
Book: Library Catalogues as Data: Research, Practice and Usage, pp. 167-184
Publisher: Facet
Print publication year: 2025
