9 - Using Generative AI to Turn 19th-Century Library Catalogues into Data: Applications and Limitations

doi:10.29085/9781783306602.011

Published online by Cambridge University Press: 13 September 2025

Julia Bauder

Edited by Paul Gooding (University of Glasgow) and Melissa Terras (University of Edinburgh)

Introduction

In recent years, digital textual analysis methods have become more user-friendly, and the computing power needed to run these analyses has become more readily available. As a result, there has been a surge of interest in creating textual corpora – collections of digitised, machine-processable texts – and in performing what is sometimes called ‘distant reading’ on them: analysing these corpora quantitatively in search of patterns (Underwood, 2017). In some examples of this sort of research, the corpus is well-defined and clearly bounded (e.g. all published works by a particular author). In other cases, the composition of the corpus is more subjective; for example, an analysis of 240 crime novels in which one of four specified cities ‘play[s] a major role’ (Rauscher et al., 2013, 65). More problematic are studies in which the composition of the underlying corpus is unknown, most notably the hundreds of published papers based on the Google Books Ngrams data (books.google.com/ngrams). This dataset traces the frequency of word usage over time across the vast and largely undocumented collection of books that make up the Google Books database. As has been demonstrated (e.g. Pechenick, Danforth and Dodds, 2015), the Google Books corpus is skewed towards academic publications rather than popular works, which makes it dangerous to draw conclusions from it about changes in popular culture or popular language usage.

Information

Type: Chapter
In: Library Catalogues as Data: Research, Practice and Usage, pp. 167–184
Publisher: Facet
Print publication year: 2025