10 - Comparing Corpus-Driven and Corpus-Based Approaches to Diachronic Variation

Grammatical Changes in Late Modern and Present-Day English

from Part IV - Applications of Classification-Based Approaches

Published online by Cambridge University Press: 06 May 2022

Ole Schützler
Affiliation:
Universität Leipzig
Julia Schlüter
Affiliation:
Universität Bamberg

Summary

Focusing on grammatical changes in Late Modern and Present-Day English, the author applies a corpus-driven method to texts from two diachronic corpora, the Representative Corpus of Historical English Registers (ARCHER) and the Corpus of Historical American English (COHA). He compares his findings to those returned by more conventional corpus-based methods, which can be characterized as hypothesis-driven. To this end, the study employs automated profiling of large feature sets, such as word- and POS-based mono-, bi- and trigrams, chunks, syntactic dependency labels, and measures of constituent order and length. The derived feature profiles are combined in a supervised classification task, in which the texts are divided into earlier and later corpus subperiods, to reveal patterns of over- and underuse. Structures that are profiled as over- or under-represented in the diachronic subsections are then browsed for grammatical changes that may have been missed by previous research. According to the author, an advantage of such approaches is that they are theory-neutral and may generate novel hypotheses for investigation. These hypotheses may then serve as input to further corpus-based approaches.
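
The profiling and classification step described in the summary can be illustrated with a minimal sketch. It is not the chapter's actual pipeline: it uses only word-based mono-, bi- and trigrams (no POS tags, chunks or dependency labels), it substitutes the per-feature weights of a simple Naive Bayes classifier with add-one smoothing for whatever classifier the study uses, and the function names and toy sentences are invented for illustration.

```python
from collections import Counter
from math import log


def ngrams(tokens, n_max=3):
    """Yield all 1- to n_max-grams of a token list as space-joined strings."""
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def profile(texts, n_max=3):
    """Count n-gram features over all texts of one corpus subperiod."""
    counts = Counter()
    for text in texts:
        counts.update(ngrams(text.lower().split(), n_max))
    return counts


def log_odds(early_texts, late_texts, n_max=3):
    """Naive Bayes feature weights: log P(f | late) - log P(f | early).

    Positive values mark features overused in the later subperiod,
    negative values mark features overused in the earlier one.
    """
    early = profile(early_texts, n_max)
    late = profile(late_texts, n_max)
    vocab = set(early) | set(late)
    # Add-one smoothing so unseen features do not produce log(0).
    early_total = sum(early.values()) + len(vocab)
    late_total = sum(late.values()) + len(vocab)
    return {
        f: log((late[f] + 1) / late_total) - log((early[f] + 1) / early_total)
        for f in vocab
    }


# Invented toy data standing in for the earlier and later subperiods.
early_texts = ["he shall go to town", "she shall not be seen"]
late_texts = ["he is going to go to town", "she is going to be seen"]

weights = log_odds(early_texts, late_texts)
ranked = sorted(weights, key=weights.get)
print("overused in earlier subperiod:", ranked[:3])
print("overused in later subperiod:", ranked[-3:])
```

In a corpus-driven workflow of the kind the summary describes, the ranked features at either end of the list would be the ones to inspect manually for candidate grammatical changes.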

Type
Chapter
Information
Data and Methods in Corpus Linguistics: Comparative Approaches, pp. 291–322
Publisher: Cambridge University Press
Print publication year: 2022

Further Reading

Evert, Stefan. 2006. How Random Is a Corpus? The Library Metaphor. Zeitschrift für Anglistik und Amerikanistik 54(2). 177–90.
Grimmer, Justin, and Stewart, Brandon. 2013. Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts. Political Analysis 21(3). 267–97.
Jurafsky, Dan, and Martin, James H. 2009. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Upper Saddle River, NJ: Pearson Prentice Hall.
Schneider, Gerold, and Lauber, Max. 2019. Introduction to Statistics for Linguists. Pressbooks. https://dlf.uzh.ch/openbooks/statisticsforlinguists/
Sinclair, John McHardy, and Carter, Ronald. 2004. Trust the Text: Language, Corpus and Discourse. London: Routledge.
