Chapter 1 - Data-Intensive Approaches to English Linguistics

Published online by Cambridge University Press: 03 December 2025

Mikko Laitinen
Affiliation:
University of Eastern Finland
Paula Rautionaho
Affiliation:
University of Eastern Finland

Summary

This chapter defines data-intensive research in the context of the English language and explores its prospects. It argues that data intensiveness extends beyond a single digital method or the use of advanced statistical tools; rather, it encompasses a broader transformation and fuller integration of digital tools and methods throughout the research process. We also address the potential pitfalls of data fetishism and over-reliance on data, and we draw parallels with the digital transformation in another discipline, specifically biosciences, to illustrate the fundamental changes proposed as a result of digitalization. The lessons learned from other fields underscore the need for increased multi- and interdisciplinary collaboration and the development of broader digital infrastructures. This includes investments in enhanced computing power, robust data management processes, and a greater emphasis on replicability and transparency in reporting methods, data, and analytical techniques.

Information

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2025
