
27 - Quantitative Methods and the History of English

from Part IV - Modelling the Record: Methods and Theories

Published online by Cambridge University Press:  18 October 2025

Merja Kytö
Affiliation:
Uppsala Universitet, Sweden
Erik Smitterberg
Affiliation:
Uppsala Universitet, Sweden

Summary

This chapter surveys the role of quantitative methods in diachronic studies of English, covering both past and present stages of the language. We begin with simple bivariate analyses, such as tracing frequency developments, then discuss inferential models and explain the advantages of multifactorial (including hierarchical) modelling. We also trace changing productivity, association strengths, lexical biases and collocational preferences within constructions. We then discuss methods for the exploratory analysis of multidimensional data and provide a brief overview of methods drawn from machine learning and other scientific disciplines (e.g. word embeddings and agent-based modelling). To illustrate our methodological points, we include small-scale hands-on analyses of the variation between the will and be going to constructions as markers of future-time reference, using data from the ARCHER and COHA corpora.
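As a rough illustration of the simplest kind of analysis mentioned above, tracing a frequency development, the following sketch computes the share of be going to among the two future variants per period, together with a normalised frequency per million words. The counts, period labels and function names are invented for illustration only; they are not taken from ARCHER, COHA or the chapter itself.

```python
from typing import Dict, Tuple

# Hypothetical toy counts per period: (will tokens, be going to tokens).
# These numbers are invented for illustration; they are NOT corpus results.
toy_counts: Dict[str, Tuple[int, int]] = {
    "1800-1849": (90, 10),
    "1850-1899": (80, 20),
    "1900-1949": (70, 30),
    "1950-1999": (60, 40),
}


def going_to_share(counts: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Return the proportion of be going to among both variants per period."""
    shares = {}
    for period, (will_n, going_to_n) in counts.items():
        total = will_n + going_to_n
        # Guard against empty periods rather than dividing by zero.
        shares[period] = going_to_n / total if total else float("nan")
    return shares


def per_million(hits: int, corpus_size: int) -> float:
    """Normalise a raw count to a frequency per million words."""
    return hits / corpus_size * 1_000_000


if __name__ == "__main__":
    for period, share in going_to_share(toy_counts).items():
        print(f"{period}: share of be going to = {share:.2f}")
```

A proportion of this kind only describes the variation between the two variants; whether a trend is statistically reliable, and which factors condition the choice, is a matter for the inferential and multifactorial models the chapter goes on to discuss.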

Information

Type: Chapter
In: The New Cambridge History of the English Language: Documentation, Sources of Data and Modelling, pp. 665-693
Publisher: Cambridge University Press
Print publication year: 2025


