Published online by Cambridge University Press: 01 January 2026
The German gender system is known for its complexity, and there is a persistent misconception that it is largely arbitrary and hence a challenge for the typology of gender systems. In response, we construct a database of more than 30,000 German nouns and show that a boosting tree model achieves a predictive success of 96%. More surprisingly still, the model reaches 87% when trained on just the 100 most frequent nouns. We thus demonstrate that the complex German system fits into a typologically well-known scheme, combining semantic and formal assignment principles. Beyond this specific problem, we show the value of statistical modeling for typologists and reflect on what exactly we can learn from these techniques.
Versions of this paper were read at the MultiGender Workshop ‘A multilingual approach to grammatical gender’, Centre for Advanced Study, Oslo 2020, at the 53rd annual meeting of the Societas Linguistica Europaea, Bucharest (online) 2020, at the 54th annual meeting of the Societas Linguistica Europaea, Athens (online) 2021, and at the International Symposium of Morphology (ISMo), Paris (online) 2021. We thank members of those audiences for their comments. We are grateful to colleagues who offered helpful suggestions, and to those who read and commented on drafts of the paper: Jenny Audring, Matthew Baerman, Laura Becker, Sacha Beniamine, Dunstan Brown, Johannes Dellert, Hans-Olav Enger, Sebastian Kürschner, Barbara Schlücker, and Anna Thornton. We thank Olivier Bonami for suggesting this collaboration, and Lisa Mack and Penny Everson for their help with the preparation of the manuscript. Our special thanks go to the editors of Language, Andries Coetzee and John Beavers, the co-editor Shelome Gooden, associate editor Titus von der Malsburg, and the two referees, Harald Baayen and an anonymous referee, for their sustained constructive engagement with the paper. Financial support from the ESRC (grant ES/R00837X/1 ‘Optimal categorization: The origin and nature of gender from a psycholinguistic perspective’), from the Emmy Noether project (grant 504155622 ‘Bayesian modelling of spatial typology’), and from the Centre for Advanced Study, Oslo, is gratefully acknowledged. This work received state funding managed by the Agence Nationale de la Recherche under the ‘Investissements d'Avenir’ programme, reference ANR-10-LABX-0083. It contributes to the IdEx Université de Paris, ANR-18-IDEX-0001.