While indirect evidence suggests that literary production based on records of oral teaching (so-called reportationes) was already not uncommon in the early scholastic period, very few sources comment on the practice. This article details the design of a study applying stylometric techniques of authorship attribution to a collection developed from reportationes – Stephen Langton’s Quaestiones Theologiae – aiming to uncover layers of editorial work and thus validate some hypotheses regarding the collection’s formation. Following Camps, Clérice, and Pinche (2021), I discuss the implementation of an HTR pipeline and stylometric analysis based on the most frequent words, POS tags, and pseudo-affixes. The proposed study will offer two methodological gains relevant to computational research on the scholastic tradition: it will directly compare performance on manually composed and automatically extracted data, and it will test the validity of transformer-based OCR and automated transcription alignment for workflows applied to scholastic Latin corpora. If successful, this study will provide an easily reusable template for the exploratory analysis of collaborative literary production stemming from medieval universities.
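As a rough, hypothetical illustration of the most-frequent-words representation mentioned above (not the study's actual pipeline), the following Python sketch encodes short text segments as relative frequencies of the corpus-wide most frequent words and compares them with a simple distance measure; the sample texts and the top-10 cutoff are invented for the example.

```python
# Toy sketch of an MFW (most-frequent-words) stylometric representation.
from collections import Counter
import math

segments = {
    "witness_A": "dicendum est quod deus non est causa mali sed permittit malum propter bonum",
    "witness_B": "ad hoc dicendum quod non omne quod est in deo est deus ipse",
}

def tokenize(text):
    return [t for t in text.lower().split() if t.isalpha()]

# 1. Build the corpus-wide MFW list (top-10 for this toy example).
corpus_counts = Counter()
for text in segments.values():
    corpus_counts.update(tokenize(text))
mfw = [w for w, _ in corpus_counts.most_common(10)]

# 2. Represent each segment as a vector of relative MFW frequencies.
def mfw_vector(text):
    counts = Counter(tokenize(text))
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in mfw]

vectors = {name: mfw_vector(text) for name, text in segments.items()}

# 3. Compare segments; any stylometric distance (e.g., Burrows's Delta) could replace cosine.
def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0

print(cosine_distance(vectors["witness_A"], vectors["witness_B"]))
```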
This article introduces a strategy for the large-scale corpus analysis of music audio recordings, aimed at identifying long-term trends and testing hypotheses regarding the repertoire represented in a given corpus. Our approach centers on computing evolution curves (ECs), which map style-relevant features, such as musical complexity, onto historical timelines. Unlike traditional approaches that rely on sheet music, we use audio recordings, leveraging their widespread availability and the performance nuances they capture. We also emphasize the benefits of pitch-class features based on deep learning, which improve the robustness and accuracy of tonal complexity measures compared to traditional signal processing methods. Addressing the frequent lack of exact work dates (year of composition) in historical corpora, we propose a heuristic method that aligns works with timelines using composers’ life dates. This method effectively preserves historical trends with minimal deviation compared to using actual work dates, as validated against available metadata from the Carus Audio Corpus, which spans 450 years of choral and sacred music and contains 5,729 tracks with detailed metadata. We demonstrate the utility of our strategy through case studies of this corpus, showing how ECs provide insights into stylistic developments that confirm expectations from musicology, thus highlighting the potential of computational studies in this field. For example, we observe a steady increase in tonal complexity from the Renaissance through the Baroque period, stable complexity levels in the 19th and 20th centuries, and consistently higher complexity in minor-key works compared to major-key works. Our visualizations also reveal that vocal music was more complex than instrumental music in the 18th century, but less complex in the 20th century. Finally, we conduct comparative analyses of individual composers, exploring how historical and biographical contexts may have influenced their works. Our findings highlight the potential of this strategy for computational corpus studies in musicological research.
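To make the life-dates heuristic concrete, here is one hypothetical way an undated work could be placed on a timeline from its composer's life dates; the article's actual alignment rule may differ, and the assumed "productive period" starting at age 20 is an invention for illustration.

```python
# Hypothetical illustration only: placing an undated work on a timeline
# using the composer's life dates. The assumed productive-period bounds
# are not taken from the article.
def estimate_work_year(birth_year, death_year, start_age=20):
    """Assign an undated work to the midpoint of the composer's assumed
    productive period (from `start_age` until death)."""
    productive_start = birth_year + start_age
    return (productive_start + death_year) / 2

# Example: a work by a composer living 1685-1750 is placed near 1727.
print(estimate_work_year(1685, 1750))  # -> 1727.5
```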
Texts, whether literary or historical, exhibit structural and stylistic patterns shaped by their purpose, authorship and cultural context. Formulaic texts, which are characterized by repetition and constrained expression, tend to differ in their information content (as defined by Shannon) compared to more dynamic compositions. Identifying such patterns in historical documents, particularly multi-author texts like the Hebrew Bible, provides insights into their origins, purpose and transmission. This study aims to identify formulaic clusters (sections exhibiting systematic repetition and structural constraints) by analyzing recurring phrases, syntactic structures and stylistic markers. However, distinguishing formulaic from non-formulaic elements in an unsupervised manner presents a computational challenge, especially in high-dimensional and sample-poor data sets where patterns must be inferred without predefined labels.
To address this, we develop an information-theoretic algorithm leveraging weighted self-information distributions to detect structured patterns in text. Our approach directly models variations in sample-wise self-information to identify formulaicity. Because we extend classical discrete self-information measures with a continuous formulation based on differential self-information in multivariate Gaussian distributions, the method remains applicable across different types of textual representations, including neural embeddings under Gaussian priors.
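For reference, the classical discrete self-information of an outcome $x$ is $I(x) = -\log P(x)$. The continuous analogue invoked above, for a sample $x \in \mathbb{R}^d$ under a multivariate Gaussian $\mathcal{N}(\mu, \Sigma)$, is the negative log-density (the weighting scheme used in the article is not reproduced here):

$$
I(x) \;=\; -\log p(x) \;=\; \tfrac{1}{2}\,(x-\mu)^{\top}\Sigma^{-1}(x-\mu) \;+\; \tfrac{1}{2}\,\log\!\bigl((2\pi)^{d}\det\Sigma\bigr).
$$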
Applied to hypothesized authorial divisions in the Hebrew Bible, our approach successfully isolates stylistic layers, providing a quantitative framework for textual stratification. This method enhances our ability to analyze compositional patterns, offering deeper insights into the literary and cultural evolution of texts shaped by complex authorship and editorial processes.
This article presents a critical method for assessing bias in large historical datasets that we term the “Environmental Scan.” The Environmental Scan sheds new light on newspaper collections by linking newly available “reference metadata” gathered from historical sources to existing full-text and catalogue metadata. The rise of computational methods in history and the social sciences, in tandem with newly “datafied” source materials, challenges researchers to adapt their existing critical practices to the increasing scale and complexity of computational research. To help address this challenge, the Environmental Scan situates big historical datasets in much greater context, including estimating what materials are missing, thereby revealing the ways digital collections can be “oligoptic” in nature. Using the British Newspaper Archive (BNA) as a case study, we diagnose the biases and imbalances in the digitised Victorian press. We determine which voices are under- or over-represented in relation to the political composition of the collection as well as its content, and we trace the origins of these biases in the digitisation process. This article informs future interdisciplinary discussions about data bias and offers a conceptual model adaptable to diverse historical datasets. The Environmental Scan provides a more nuanced and accurate understanding of how newspaper data reflects past societies, making it a valuable tool for researchers.
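The basic comparison underlying such a representation check can be sketched as follows; this is a hypothetical illustration rather than the article's implementation, and the categories and counts are invented.

```python
# Illustrative sketch: comparing category shares in a digitised collection
# against "reference metadata" describing the full historical population of
# titles. All numbers and category labels below are invented.
reference = {"liberal": 620, "conservative": 540, "neutral/unknown": 910}   # all known titles
digitised = {"liberal": 240, "conservative": 110, "neutral/unknown": 300}   # titles in the collection

ref_total = sum(reference.values())
dig_total = sum(digitised.values())

for category in reference:
    ref_share = reference[category] / ref_total
    dig_share = digitised.get(category, 0) / dig_total
    # Ratio > 1 suggests over-representation in the digitised collection; < 1, under-representation.
    print(f"{category:>16}: reference {ref_share:.1%}, digitised {dig_share:.1%}, "
          f"ratio {dig_share / ref_share:.2f}")
```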
undate is an ambitious, in-progress effort to develop a pragmatic Python package for computation and analysis of temporal information in humanistic and cultural data, with a particular emphasis on uncertain, incomplete, and imprecise dates and with support for multiple calendars. The development of undate is grounded in domain-specific work on digital and computational humanities projects from multiple institutions, including Shakespeare and Company Project, Princeton Geniza Project, and Islamic Scientific Manuscript Initiative. With increasing support for different formats and calendars, undate aims to bridge technical gaps across different communities and methodologies. In this article, we describe the undate software package and the functionality of the core Undate and UndateInterval classes to work with dates and date intervals. We discuss why this software exists, how it expands on and generalizes prior work, how it compares to other approaches and tools, and its current limitations. We describe the development methodology used to create the software, our plans for active and continuing development, and the potential undate has to impact computational humanities research.
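A minimal sketch of the kind of usage the article describes is given below; the import path and constructor signatures are assumptions based on the package's documented pattern of building partially known dates from year/month/day components, and may differ between undate versions.

```python
# Assumed usage sketch for the undate package; verify against the version in use.
from undate import Undate, UndateInterval

november_2022 = Undate(2022, 11)   # month known, day unknown
year_only = Undate(2001)           # only the year is known
decade = UndateInterval(Undate(1990), Undate(2000))  # an interval between two (un)certain dates

print(november_2022, year_only, decade)
```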