Language models (LMs) can produce fluent, grammatical text. Nonetheless, some maintain that LMs don't really learn language, and that even if they did, this would be uninformative for the study of human learning and processing. On the other side, some have claimed that the success of LMs obviates the need for studying linguistic theory and structure. We argue that both extremes are wrong. LMs can contribute to fundamental questions about linguistic structure, language processing, and learning. They force us to rethink arguments and ways of thinking that have been foundational in linguistics. While they do not replace linguistic structure and theory, they serve as model systems and working proofs of concept for gradient, usage-based approaches to language. We offer an optimistic take on the relationship between language models and linguistics.
Understanding the mechanisms of major depressive disorder (MDD) improvement is a key challenge to determining effective personalized treatments.
Methods
To identify a data-driven pattern of clinical improvement in MDD and to quantify neural-to-symptom relationships according to antidepressant treatment, we performed a secondary analysis of the publicly available EMBARC dataset (Establishing Moderators and Biosignatures of Antidepressant Response in Clinical Care). In EMBARC, participants with MDD were treated with either sertraline or placebo for 8 weeks (Stage 1) and then switched to bupropion according to clinical response (Stage 2). We computed a univariate measure of clinical improvement through a principal component (PC) analysis of the changes in individual items of four clinical scales measuring depression, anxiety, suicidal ideation, and manic-like symptoms. We then investigated how initial clinical and neural factors predicted this measure during Stage 1 by fitting, for each brain parcel, a linear model relating resting-state global brain connectivity (GBC) to individual improvement scores.
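The dimensionality-reduction step can be sketched as follows. This is an illustrative toy, not the EMBARC analysis code; the array names, toy data, and the SVD route are assumptions made for the sketch:

```python
import numpy as np

def improvement_scores(item_changes):
    """Reduce per-item symptom changes (patients x items, pooled across
    scales) to one improvement score per patient: the projection of each
    patient's change profile onto the first principal component (PC1)."""
    X = item_changes - item_changes.mean(axis=0)      # center each item
    _, _, Vt = np.linalg.svd(X, full_matrices=False)  # PCA via SVD
    pc1 = Vt[0]                                       # PC1 loading vector
    return X @ pc1, pc1                               # scores, loadings

# toy data: 6 patients, 4 items whose changes share one common factor
rng = np.random.default_rng(0)
factor = rng.normal(size=(6, 1))
changes = factor + 0.1 * rng.normal(size=(6, 4))
scores, loadings = improvement_scores(changes)
```

Because the items' changes are driven by a single shared factor, the PC1 scores track that factor closely, which is the sense in which PC1 summarizes a shared pattern of improvement.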
Results
The first PC (PC1) was similar across treatment groups at Stages 1 and 2, suggesting a shared pattern of symptom improvement. Patients' PC1 scores differed significantly by treatment, whereas no between-group difference in response was evident on the Clinical Global Impressions Scale. Baseline GBC correlated with Stage 1 PC1 scores in the sertraline group but not in the placebo group.
Using data-driven reduction of symptom scales, we identified a common profile of symptom improvement with distinct intensity between sertraline and placebo.
Conclusions
Mapping from data-driven symptom improvement onto neural circuits revealed treatment-responsive neural profiles that may aid in optimal patient selection for future trials.
Recent evidence from cross-situational learning (CSL) studies has shown that adult learners can acquire words and grammar simultaneously when sentences of a novel language co-occur with the dynamic scenes to which they refer. Syntactic bootstrapping accounts suggest that grammatical knowledge may help scaffold vocabulary acquisition by constraining possible meanings; thus, for children, words and grammar may be acquired at different rates. Twenty children (ages 8 to 9) were exposed in a CSL study to an artificial language comprising nouns, verbs, and case markers occurring within a verb-final grammatical structure. Children acquired syntax (i.e., word order) effectively, but we found no evidence of vocabulary learning, whereas previous adult studies showed learning of both from similar input. Grammatical information may thus be available early for children, helping to constrain and support later vocabulary learning. We propose that gradual maturation of declarative memory systems may result in more effective vocabulary learning in adults.
Describe the challenges children face in learning language; understand key features of child language development; explain the strategies children use to learn sounds, words, and grammar.
Studies investigating phonological processing indicate that words with high regularity/consistency in pronunciation or high frequency positively impact reading speed and accuracy. Such effects of consistency and frequency have been demonstrated in Japanese kanji words and are known as consistency and frequency effects. Using a mixed-effects model analysis, this study reexamines the two effects in Chinese–Japanese second-language (L2) learners with two different L2 proficiency levels. The two effects are robustly replicated in oral reading tasks; in particular, the performance of intermediate learners is similar to that of Japanese semantic dementia patients, whose reading accuracy is affected by sensitivity to the statistical properties of words (i.e., reading consistency and lexical frequency). These results are explained by the interaction between semantic memory and word statistical properties. Moreover, the interaction highlights the important consequences of statistical learning underlying L2 phonological processing.
In today’s insurance market, numerous cyber insurance products provide bundled coverage for losses resulting from different cyber events, including data breaches and ransomware attacks. Every category of incident has its own specific coverage limit and deductible. Although this gives prospective cyber insurance buyers more flexibility in customizing the coverage and better manages the risk exposures of sellers, it complicates the decision-making process in determining the optimal amount of risks to retain and transfer for both parties. This article aims to build an economic foundation for these incident-specific cyber insurance products with a focus on how incident-specific indemnities should be designed for achieving Pareto optimality for both the insurance seller and the buyer. Real data on cyber incidents are used to illustrate the feasibility of this approach. Several implementation improvement methods for practicality are also discussed.
Connectionist networks consisting of large numbers of simple connected processing units implicitly or explicitly model aspects of human predictive behavior. Prediction in connectionist models can occur in different ways and with quite different connectionist architectures. Connectionist neural networks offer a useful playground and ‘hands-on way’ to explore prediction and to figure out what may be special about how the human mind predicts.
The suffixing bias in languages (the tendency to exploit suffixes more often than prefixes to express grammatical meanings) was identified a century ago, yet we still lack a clear account of why it emerged: did the bias arise because general cognitive mechanisms shape languages to be more easily processed by the available cognitive machinery, or is it speech-specific and determined by domain-specific mechanisms? We used statistical learning (SL) experiments to compare the processing of suffixed and prefixed sequences with linguistic and non-linguistic material. Although SL is not speech-specific, we observed the suffixing preference only with linguistic material, suggesting a language-specific origin. Moreover, morphological properties of participants' native languages (the existence of grammatical prefixes) modulated suffixing preferences in the SL experiments only with linguistic material, suggesting limited cross-domain transfer.
The present study examined whether length of bilingual experience and language ability contributed to cross-situational word learning (XSWL) in Spanish-English bilingual school-aged children. We contrasted performance in a high variability condition, where children were exposed to multiple speakers and exemplars simultaneously, to performance in a condition where children were exposed to no variability in either speakers or exemplars. Results revealed graded effects of bilingualism and language ability on XSWL under conditions of increased variability. Specifically, bilingualism bolstered learning when variability was present in the input but not when variability was absent in the input. Similarly, robust language abilities supported learning in the high variability condition. In contrast, children with weaker language skills learned more word-object associations in the no variability condition than in the high variability condition. Together, the results suggest that variation in the learner and variation in the input interact and modulate mechanisms of lexical learning in children.
In Chapter 13 we will discuss how to produce compression schemes that do not require a priori knowledge of the generative distribution. It turns out that designing a compression algorithm able to adapt to an unknown distribution is essentially equivalent to the problem of estimating an unknown distribution, which is a major topic of statistical learning. The plan for this chapter is as follows: (1) We will start by discussing the earliest example of a universal compression algorithm (due to Fitingof). It does not refer to probability distributions at all, yet it turns out to be asymptotically optimal simultaneously for all iid distributions and, with small modifications, for all finite-order Markov chains. (2) The next class of universal compressors is based on assuming that the true distribution belongs to a given class. These methods proceed by choosing a good model distribution serving as the minimax approximation to each distribution in the class. The compression algorithm for a single distribution is then designed as in previous chapters. (3) Finally, an entirely different idea is embodied by algorithms of the Lempel–Ziv type. These adapt automatically to the distribution of the source, without any prior assumptions.
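The adaptive idea behind (3) can be illustrated with a minimal LZ78-style parser, a sketch for intuition rather than an optimized implementation: the phrase dictionary is built on the fly from the data itself, so no distribution has to be specified in advance.

```python
def lz78_parse(s):
    """LZ78 parsing: split s into phrases, each being the longest
    previously seen phrase extended by one new symbol. Each output pair
    is (index of that previous phrase, new symbol)."""
    dictionary = {"": 0}
    phrases, phrase = [], ""
    for ch in s:
        if phrase + ch in dictionary:
            phrase += ch                      # keep extending a known phrase
        else:
            phrases.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:                                # leftover phrase already in dict
        phrases.append((dictionary[phrase], ""))
    return phrases

def lz78_decode(phrases):
    """Invert lz78_parse, rebuilding the same dictionary on the fly."""
    dictionary, out = [""], []
    for idx, ch in phrases:
        piece = dictionary[idx] + ch
        dictionary.append(piece)
        out.append(piece)
    return "".join(out)

lz78_parse("aaabba")  # → [(0, 'a'), (1, 'a'), (0, 'b'), (3, 'a')]
```

On a source with repetitive structure, phrases grow longer as parsing proceeds, which is how the scheme implicitly adapts to the source statistics.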
Prediction and classification are two very active areas in modern data analysis. In this paper, prediction with nonlinear optimal scaling transformations of the variables is reviewed and extended to the use of multiple additive components, much in the spirit of statistical learning techniques that are currently popular in, among other areas, data mining. In addition, a classification/clustering method is described that is particularly suitable for analyzing attribute-value data from systems biology (genomics, proteomics, and metabolomics) and is able to detect groups of objects that have similar values on small subsets of the attributes.
Classification and regression trees (CART) and their successors, bagging and random forests, are statistical learning tools that are receiving increasing attention. However, owing to the characteristics of censored data collection, standard CART algorithms are not immediately transferable to the context of survival analysis. Questions about the occurrence and timing of events arise throughout the psychological and behavioral sciences, especially in longitudinal studies. The predictive power and other key features of tree-based methods are promising in studies where event occurrence is the outcome of interest. This article reviews existing tree algorithms designed specifically for censored responses as well as recently developed survival ensemble methods, and introduces available computer software. Through simulations and a practical example, the merits and limitations of these methods are discussed, and suggestions are provided for practical use.
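The key ingredient that adapts CART to censored responses is a split criterion that respects censoring; a common choice is the log-rank statistic comparing the two candidate child nodes. The following is a minimal self-contained sketch of that statistic (illustrative only; actual survival-tree software uses faster and more general routines):

```python
def logrank_statistic(times, events, group):
    """Log-rank chi-square comparing survival between the two sides of a
    candidate split. times: follow-up times; events: 1 = event observed,
    0 = censored; group: 0/1 child-node membership."""
    data = list(zip(times, events, group))
    o_minus_e = variance = 0.0
    for t in sorted({ti for ti, e, _ in data if e}):    # distinct event times
        at_risk = [(e, g) for ti, e, g in data if ti >= t]
        n = len(at_risk)
        n1 = sum(g for _, g in at_risk)                 # at risk in group 1
        d = sum(e for ti, e, _ in data if ti == t and e)        # events at t
        d1 = sum(e for ti, e, g in data if ti == t and e and g)  # in group 1
        o_minus_e += d1 - d * n1 / n                    # observed - expected
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / variance if variance > 0 else 0.0
```

A survival tree would evaluate this statistic for every candidate split and choose the one with the largest value, i.e., the split that best separates the survival experience of the two child nodes.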
Statistical learning, that is, our ability to track and learn from distributional information in the environment, plays a fundamental role in language acquisition, yet little research has investigated this process in older language learners. In the present study, we address this gap by comparing the cross-situational learning of foreign words in younger and older adults. We also tested whether learning was affected by previous experience with multiple languages. We found that both age groups successfully learned the novel words after a short exposure period, confirming that statistical learning ability is preserved in late adulthood. However, the two groups differed in their learning trajectories, with the younger group outperforming the older group during the later stages of learning. Previous language experience did not predict learning outcomes. Given that implicit language learning mechanisms appear to be preserved across the lifespan, the present data provide crucial support for claims that language learning in older age could be leveraged as a targeted intervention to help build or maintain resilience to age-related cognitive decline.
Computational models allow researchers to formulate explicit theories of language acquisition, and to test these theories against natural language corpora. This chapter puts the problem of bilingual phonetic and phonological acquisition in a computational perspective. The main goal of the chapter is to show how computational modeling can be used to address crucial questions regarding bilingual phonetic and phonological acquisition, which would be difficult to address with other experimental methods. The chapter first provides a general introduction to computational modeling, using a simplified model of phonotactic learning as an example to illustrate the main methodological issues. The chapter then gives an overview of recent studies that have begun to address the computational modeling of bilingual phonetic and phonological acquisition, focusing on phonetic and phonological cues for bilingual input separation, bilingual phonology in computational models of speech comprehension, and computational models of L2 speech perception. The chapter concludes by discussing several key challenges in the development of computational models of bilingual phonetic and phonological acquisition.
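A simplified phonotactic learner of the kind used for illustration in such introductions can be sketched as a bigram model over segments. Everything here is an invented toy: the word list, the smoothing scheme, and the use of '#' as a word-edge symbol are all assumptions for the sketch, not a model from the chapter.

```python
import math
from collections import Counter

def train_bigram_phonotactics(words):
    """Toy bigram phonotactic model: learn segment-to-segment transition
    counts from a word list ('#' marks word edges) and score new forms by
    their average add-one-smoothed log transition probability."""
    bigrams, contexts = Counter(), Counter()
    segments = {"#"}
    for w in words:
        seq = "#" + w + "#"
        segments.update(seq)
        for a, b in zip(seq, seq[1:]):
            bigrams[a, b] += 1
            contexts[a] += 1
    v = len(segments)                          # smoothing denominator

    def score(word):
        seq = "#" + word + "#"
        lp = sum(math.log((bigrams[a, b] + 1) / (contexts[a] + v))
                 for a, b in zip(seq, seq[1:]))
        return lp / (len(seq) - 1)             # length-normalized

    return score

score = train_bigram_phonotactics(["ban", "ban", "nab", "bat"])
```

Trained this way, forms built from attested transitions receive higher scores than forms built from unattested ones, which is the basic behavior a phonotactic model must exhibit before questions about bilingual input separation can even be posed.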
Adults often encounter difficulty perceiving and processing the sounds of a second language (L2). To acquire word-meaning mappings, learners need to determine which phonological contrasts are relevant in the language. In this study, we examined the influence of phonology on non-native word learning, asking whether the language-relevant phonological contrasts could be acquired by abstracting over multiple experiences, and whether awareness of these contrasts was related to learning. We trained English- and Mandarin-native speakers on pseudowords via a cross-situational statistical learning (CSL) task. Learners were able to acquire the phonological contrasts across multiple situations, but similar-sounding words (i.e., minimal pairs) were harder to acquire, and words contrasting in a non-native suprasegmental feature (i.e., Mandarin lexical tone) were even harder for English speakers, even with extended exposure. Furthermore, awareness of the non-native phonology was not found to relate to learning.
Children typically produce high-frequency phonotactic sequences, such as the /st/ in “toaster,” more accurately than the lower frequency /mk/ in “tomcat.” This high-frequency advantage can be simulated experimentally with a statistical learning paradigm, and when 4-year-old children are familiarized with many examples of a sequence like /mk/, they generally produce it more accurately than if they are exposed to just a few examples. Here, we sought to expand our understanding of the high-frequency advantage, but surprisingly, we instead uncovered an exception. Twenty-nine children between 4 and 5 years of age completed a phonotactic statistical learning experiment, but they also completed a separate experiment focused on statistical learning of prosodic contours. The order of the experiments was randomized, with the phonotactic statistical learning experiment occurring first for half of the children. For the children who completed the phonotactic learning experiment first, the results were consistent with previous research and a high-frequency advantage. However, children who completed the phonotactic learning experiment second produced low-frequency sequences more accurately than high-frequency sequences. There is little precedent for the latter effect, but studies of multistream statistical learning may provide some context for unpacking and extending the result.
Chapter 9 focuses on the claim that the language input that children are exposed to is not rich enough to explain how they can construct a mental grammar. This leads to the poverty of the stimulus argument in support of the Innateness Hypothesis, which holds that if the input is insufficient, children must be born with an innate system that bridges the gap between the poor input and the richness of their knowledge of language. We will examine in detail in which ways the input could be called poor. We then turn to Chomsky’s Principles and Parameters model of language acquisition, paying attention to certain developments in this model that reduced the role of innate knowledge. Along the way we also introduce two additional arguments. The argument from convergence is based on the fact that all learners that grow up in the same speech community end up with (essentially) the same mental grammar despite having received different input. We also mention the argument from speed of acquisition, which is based on the fact that language acquisition is “fast,” no matter how you measure it. We then review alternative, more empiricist, approaches to language acquisition.
We compare two frameworks for the segmentation of words in child-directed speech, PHOCUS and MULTICUE. PHOCUS is driven by lexical recognition, whereas MULTICUE combines sub-lexical properties to make boundary decisions, representing differing views of speech processing. We replicate these frameworks, perform novel benchmarking, and confirm that both achieve competitive results. We then develop a new framework for segmentation, the DYnamic Programming MULTIple-cue framework (DYMULTI), which combines the strengths of PHOCUS and MULTICUE by considering both sub-lexical and lexical cues when making boundary decisions. DYMULTI achieves state-of-the-art results and outperforms PHOCUS and MULTICUE on 15 of 26 languages in a cross-lingual experiment. These results validate DYMULTI, a model built on psycholinguistic principles, as a robust model for speech segmentation and a contribution to our understanding of language acquisition.
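The dynamic-programming idea behind such boundary search can be illustrated with a minimal Viterbi-style segmenter over a unigram lexicon. This is a sketch of the general technique, not the DYMULTI implementation; the lexicon, log-probabilities, and unknown-word penalty are invented for the toy.

```python
import math

def segment(utterance, word_logprob, unknown_penalty=-10.0):
    """Dynamic-programming word segmentation: choose boundary positions
    that maximize the summed log-probability of the resulting words.
    Unknown substrings are penalized per character."""
    n = len(utterance)
    best = [0.0] + [-math.inf] * n   # best score for the prefix of length i
    back = [0] * (n + 1)             # backpointer: start of the last word
    for i in range(1, n + 1):
        for j in range(i):
            w = utterance[j:i]
            score = best[j] + word_logprob.get(w, unknown_penalty * len(w))
            if score > best[i]:
                best[i], back[i] = score, j
    # recover the best segmentation by following backpointers
    words, i = [], n
    while i > 0:
        words.append(utterance[back[i]:i])
        i = back[i]
    return words[::-1]

lex = {"the": -1.0, "dog": -2.0, "thedog": -9.0}
segment("thedog", lex)  # → ["the", "dog"]
```

A multiple-cue system in this spirit would add further terms (e.g., sub-lexical boundary scores) to the quantity being maximized, while the dynamic program over boundary positions stays the same.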
How much information do language users need to differentiate potentially absolute synonyms into near-synonyms? How consistent must the information be? We present two simple experiments designed to investigate this. After exposure to two novel verbs, participants generalized them to positive or negative contexts. In Experiment 1, there was a tendency across conditions for the verbs to become differentiated by context, even following inconsistent, random, or neutral information about context during exposure. While a subset of participants matched input probabilities, a high proportion did not. As a consequence, the overall pattern was of growth in differentiation that did not closely track input distributions. Rather, there were two main patterns: When each verb had been presented consistently in a positive or negative context, participants overwhelmingly specialized both verbs in their output. When this was not the case, the verbs tended to become partially differentiated, with one becoming specialized and the other remaining less specialized. Experiment 2 replicated and expanded on Experiment 1 with the addition of a pragmatic judgment task and neutral contexts at test. Its results were consistent with Experiment 1 in supporting the conclusion that quality of input may be more important than quantity in the differentiation of synonyms.
We examined how noun frequency and the typicality of surrounding linguistic context contribute to children’s real-time comprehension. Monolingual English-learning toddlers viewed pairs of pictures while hearing sentences with typical or atypical sentence frames (Look at the… vs. Examine the…), followed by nouns that were higher- or lower-frequency labels for a referent (horse vs. pony). Toddlers showed no significant differences in comprehension of nouns in typical and atypical sentence frames. However, they were less accurate in recognizing lower-frequency nouns, particularly among toddlers with smaller vocabularies. We conclude that toddlers can recognize nouns in diverse sentence contexts, but their representations develop gradually.