Language models (LMs) can produce fluent, grammatical text. Nonetheless, some maintain that LMs don't really learn language, and that even if they did, this would be uninformative for the study of human learning and processing. On the other side, some have claimed that the success of LMs obviates the need for studying linguistic theory and structure. We argue that both extremes are wrong. LMs can contribute to fundamental questions about linguistic structure, language processing, and learning. They force us to rethink arguments and ways of thinking that have been foundational in linguistics. While they do not replace linguistic structure and theory, they serve as model systems and working proofs of concept for gradient, usage-based approaches to language. We offer an optimistic take on the relationship between language models and linguistics.
Understanding the mechanisms of major depressive disorder (MDD) improvement is a key challenge to determining effective personalized treatments.
Methods
To identify a data-driven pattern of clinical improvement in MDD and to quantify neural-to-symptom relationships according to antidepressant treatment, we performed a secondary analysis of the publicly available EMBARC dataset (Establishing Moderators and Biosignatures of Antidepressant Response in Clinical Care). In EMBARC, participants with MDD were treated with either sertraline or placebo for 8 weeks (Stage 1) and then switched to bupropion according to clinical response (Stage 2). We computed a univariate measure of clinical improvement through a principal component (PC) analysis of the changes in individual items of four clinical scales measuring depression, anxiety, suicidal ideation, and manic-like symptoms. We then investigated how initial clinical and neural factors predicted this measure during Stage 1 by fitting, for each brain parcel, a linear model relating resting-state global brain connectivity (GBC) to individual improvement scores.
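The dimensionality-reduction step can be sketched as follows. This is an illustrative toy, not the EMBARC analysis code; the array names, toy data, and the SVD route are assumptions made for the sketch:

```python
import numpy as np

def improvement_scores(item_changes):
    """Reduce per-item symptom changes (patients x items, pooled across
    scales) to one improvement score per patient: the projection of each
    patient's change profile onto the first principal component (PC1)."""
    X = item_changes - item_changes.mean(axis=0)      # center each item
    _, _, Vt = np.linalg.svd(X, full_matrices=False)  # PCA via SVD
    pc1 = Vt[0]                                       # PC1 loading vector
    return X @ pc1, pc1                               # scores, loadings

# toy data: 6 patients, 4 items whose changes share one common factor
rng = np.random.default_rng(0)
factor = rng.normal(size=(6, 1))
changes = factor + 0.1 * rng.normal(size=(6, 4))
scores, loadings = improvement_scores(changes)
```

Because the items' changes are driven by a single shared factor, the PC1 scores track that factor closely, which is the sense in which PC1 summarizes a shared pattern of improvement.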
Results
The first PC (PC1) was similar across treatment groups at Stages 1 and 2, suggesting a shared pattern of symptom improvement. Patients' PC1 scores differed significantly by treatment, whereas no between-group difference in response was evident on the Clinical Global Impressions Scale. Baseline GBC correlated with Stage 1 PC1 scores in the sertraline group but not in the placebo group.
Using data-driven reduction of symptom scales, we identified a common profile of symptom improvement with distinct intensity between sertraline and placebo.
Conclusions
Mapping from data-driven symptom improvement onto neural circuits revealed treatment-responsive neural profiles that may aid in optimal patient selection for future trials.
Recent evidence from cross-situational learning (CSL) studies has shown that adult learners can acquire words and grammar simultaneously when sentences of a novel language co-occur with the dynamic scenes to which they refer. Syntactic bootstrapping accounts suggest that grammatical knowledge may help scaffold vocabulary acquisition by constraining possible meanings; thus, for children, words and grammar may be acquired at different rates. Twenty children (ages 8 to 9) were exposed in a CSL study to an artificial language comprising nouns, verbs, and case markers occurring within a verb-final grammatical structure. Children acquired syntax (i.e., word order) effectively, but we found no evidence of vocabulary learning, whereas previous adult studies showed learning of both from similar input. Grammatical information may thus be available early for children, helping to constrain and support later vocabulary learning. We propose that gradual maturation of declarative memory systems may result in more effective vocabulary learning in adults.
Describe the challenges children face in learning language; understand key features of child language development; explain the strategies children use to learn sounds, words, and grammar.
Studies investigating phonological processing indicate that words with high regularity/consistency in pronunciation or high frequency positively impact reading speed and accuracy. Such effects of consistency and frequency have been demonstrated in Japanese kanji words and are known as consistency and frequency effects. Using a mixed-effects model analysis, this study reexamines the two effects in Chinese–Japanese second-language (L2) learners with two different L2 proficiency levels. The two effects are robustly replicated in oral reading tasks; in particular, the performance of intermediate learners is similar to that of Japanese semantic dementia patients, whose reading accuracy is affected by sensitivity to the statistical properties of words (i.e., reading consistency and lexical frequency). These results are explained by the interaction between semantic memory and word statistical properties. Moreover, the interaction highlights the important consequences of statistical learning underlying L2 phonological processing.
In today’s insurance market, numerous cyber insurance products provide bundled coverage for losses resulting from different cyber events, including data breaches and ransomware attacks. Every category of incident has its own specific coverage limit and deductible. Although this gives prospective cyber insurance buyers more flexibility in customizing the coverage and better manages the risk exposures of sellers, it complicates the decision-making process in determining the optimal amount of risks to retain and transfer for both parties. This article aims to build an economic foundation for these incident-specific cyber insurance products with a focus on how incident-specific indemnities should be designed for achieving Pareto optimality for both the insurance seller and the buyer. Real data on cyber incidents are used to illustrate the feasibility of this approach. Several implementation improvement methods for practicality are also discussed.
Connectionist networks consisting of large numbers of simple connected processing units implicitly or explicitly model aspects of human predictive behavior. Prediction in connectionist models can occur in different ways and with quite different connectionist architectures. Connectionist neural networks offer a useful playground and ‘hands-on way’ to explore prediction and to figure out what may be special about how the human mind predicts.
The suffixing bias in languages (the tendency to exploit suffixes more often than prefixes to express grammatical meanings) was identified a century ago, yet we still lack a clear account of why it emerged: did the bias arise because general cognitive mechanisms shape languages to be more easily processed by the available cognitive machinery, or is it speech-specific and determined by domain-specific mechanisms? We used statistical learning (SL) experiments to compare the processing of suffixed and prefixed sequences with linguistic and non-linguistic material. Although SL is not speech-specific, we observed the suffixing preference only with linguistic material, suggesting a language-specific origin. Moreover, morphological properties of participants' native languages (the existence of grammatical prefixes) modulated suffixing preferences in the SL experiments only with linguistic material, suggesting limited cross-domain transfer.
The present study examined whether length of bilingual experience and language ability contributed to cross-situational word learning (XSWL) in Spanish-English bilingual school-aged children. We contrasted performance in a high variability condition, where children were exposed to multiple speakers and exemplars simultaneously, to performance in a condition where children were exposed to no variability in either speakers or exemplars. Results revealed graded effects of bilingualism and language ability on XSWL under conditions of increased variability. Specifically, bilingualism bolstered learning when variability was present in the input but not when variability was absent in the input. Similarly, robust language abilities supported learning in the high variability condition. In contrast, children with weaker language skills learned more word-object associations in the no variability condition than in the high variability condition. Together, the results suggest that variation in the learner and variation in the input interact and modulate mechanisms of lexical learning in children.
In Chapter 13 we will discuss how to produce compression schemes that do not require a priori knowledge of the generative distribution. It turns out that designing a compression algorithm able to adapt to an unknown distribution is essentially equivalent to the problem of estimating an unknown distribution, which is a major topic of statistical learning. The plan for this chapter is as follows: (1) We will start by discussing the earliest example of a universal compression algorithm (due to Fitingof). It does not refer to probability distributions at all, yet it turns out to be asymptotically optimal simultaneously for all iid distributions and, with small modifications, for all finite-order Markov chains. (2) The next class of universal compressors is based on assuming that the true distribution belongs to a given class. These methods proceed by choosing a good model distribution serving as the minimax approximation to each distribution in the class. The compression algorithm for a single distribution is then designed as in previous chapters. (3) Finally, an entirely different idea is embodied by algorithms of the Lempel–Ziv type. These adapt automatically to the distribution of the source, without any prior assumptions.
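The adaptive idea behind (3) can be illustrated with a minimal LZ78-style parser, a sketch for intuition rather than an optimized implementation: the phrase dictionary is built on the fly from the data itself, so no distribution has to be specified in advance.

```python
def lz78_parse(s):
    """LZ78 parsing: split s into phrases, each being the longest
    previously seen phrase extended by one new symbol. Each output pair
    is (index of that previous phrase, new symbol)."""
    dictionary = {"": 0}
    phrases, phrase = [], ""
    for ch in s:
        if phrase + ch in dictionary:
            phrase += ch                      # keep extending a known phrase
        else:
            phrases.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:                                # leftover phrase already in dict
        phrases.append((dictionary[phrase], ""))
    return phrases

def lz78_decode(phrases):
    """Invert lz78_parse, rebuilding the same dictionary on the fly."""
    dictionary, out = [""], []
    for idx, ch in phrases:
        piece = dictionary[idx] + ch
        dictionary.append(piece)
        out.append(piece)
    return "".join(out)

lz78_parse("aaabba")  # → [(0, 'a'), (1, 'a'), (0, 'b'), (3, 'a')]
```

On a source with repetitive structure, phrases grow longer as parsing proceeds, which is how the scheme implicitly adapts to the source statistics.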
Prediction and classification are two very active areas in modern data analysis. In this paper, prediction with nonlinear optimal scaling transformations of the variables is reviewed and extended to the use of multiple additive components, much in the spirit of statistical learning techniques that are currently popular in, among other areas, data mining. In addition, a classification/clustering method is described that is particularly suitable for analyzing attribute-value data from systems biology (genomics, proteomics, and metabolomics) and is able to detect groups of objects that have similar values on small subsets of the attributes.
Classification and regression trees (CART) and their successors, bagging and random forests, are statistical learning tools that are receiving increasing attention. However, owing to the characteristics of censored data collection, standard CART algorithms are not immediately transferable to the context of survival analysis. Questions about the occurrence and timing of events arise throughout the psychological and behavioral sciences, especially in longitudinal studies. The predictive power and other key features of tree-based methods are promising in studies where event occurrence is the outcome of interest. This article reviews existing tree algorithms designed specifically for censored responses as well as recently developed survival ensemble methods, and introduces available computer software. Through simulations and a practical example, the merits and limitations of these methods are discussed, and suggestions are provided for practical use.
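The key ingredient that adapts CART to censored responses is a split criterion that respects censoring; a common choice is the log-rank statistic comparing the two candidate child nodes. The following is a minimal self-contained sketch of that statistic (illustrative only; actual survival-tree software uses faster and more general routines):

```python
def logrank_statistic(times, events, group):
    """Log-rank chi-square comparing survival between the two sides of a
    candidate split. times: follow-up times; events: 1 = event observed,
    0 = censored; group: 0/1 child-node membership."""
    data = list(zip(times, events, group))
    o_minus_e = variance = 0.0
    for t in sorted({ti for ti, e, _ in data if e}):    # distinct event times
        at_risk = [(e, g) for ti, e, g in data if ti >= t]
        n = len(at_risk)
        n1 = sum(g for _, g in at_risk)                 # at risk in group 1
        d = sum(e for ti, e, _ in data if ti == t and e)        # events at t
        d1 = sum(e for ti, e, g in data if ti == t and e and g)  # in group 1
        o_minus_e += d1 - d * n1 / n                    # observed - expected
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / variance if variance > 0 else 0.0
```

A survival tree would evaluate this statistic for every candidate split and choose the one with the largest value, i.e., the split that best separates the survival experience of the two child nodes.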
Statistical learning, that is, our ability to track and learn from distributional information in the environment, plays a fundamental role in language acquisition, yet little research has investigated this process in older language learners. In the present study, we address this gap by comparing the cross-situational learning of foreign words in younger and older adults. We also tested whether learning was affected by previous experience with multiple languages. We found that both age groups successfully learned the novel words after a short exposure period, confirming that statistical learning ability is preserved in late adulthood. However, the two groups differed in their learning trajectories, with the younger group outperforming the older group during the later stages of learning. Previous language experience did not predict learning outcomes. Given that implicit language learning mechanisms appear to be preserved across the lifespan, the present data provide crucial support for claims that language learning in older age could be leveraged as a targeted intervention to help build or maintain resilience to age-related cognitive decline.
Computational models allow researchers to formulate explicit theories of language acquisition, and to test these theories against natural language corpora. This chapter puts the problem of bilingual phonetic and phonological acquisition in a computational perspective. The main goal of the chapter is to show how computational modeling can be used to address crucial questions regarding bilingual phonetic and phonological acquisition, which would be difficult to address with other experimental methods. The chapter first provides a general introduction to computational modeling, using a simplified model of phonotactic learning as an example to illustrate the main methodological issues. The chapter then gives an overview of recent studies that have begun to address the computational modeling of bilingual phonetic and phonological acquisition, focusing on phonetic and phonological cues for bilingual input separation, bilingual phonology in computational models of speech comprehension, and computational models of L2 speech perception. The chapter concludes by discussing several key challenges in the development of computational models of bilingual phonetic and phonological acquisition.
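A simplified phonotactic learner of the kind used for illustration in such introductions can be sketched as a bigram model over segments. Everything here is an invented toy: the word list, the smoothing scheme, and the use of '#' as a word-edge symbol are all assumptions for the sketch, not a model from the chapter.

```python
import math
from collections import Counter

def train_bigram_phonotactics(words):
    """Toy bigram phonotactic model: learn segment-to-segment transition
    counts from a word list ('#' marks word edges) and score new forms by
    their average add-one-smoothed log transition probability."""
    bigrams, contexts = Counter(), Counter()
    segments = {"#"}
    for w in words:
        seq = "#" + w + "#"
        segments.update(seq)
        for a, b in zip(seq, seq[1:]):
            bigrams[a, b] += 1
            contexts[a] += 1
    v = len(segments)                          # smoothing denominator

    def score(word):
        seq = "#" + word + "#"
        lp = sum(math.log((bigrams[a, b] + 1) / (contexts[a] + v))
                 for a, b in zip(seq, seq[1:]))
        return lp / (len(seq) - 1)             # length-normalized

    return score

score = train_bigram_phonotactics(["ban", "ban", "nab", "bat"])
```

Trained this way, forms built from attested transitions receive higher scores than forms built from unattested ones, which is the basic behavior a phonotactic model must exhibit before questions about bilingual input separation can even be posed.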
Adults often encounter difficulty perceiving and processing the sounds of a second language (L2). To acquire word-meaning mappings, learners need to determine which phonological contrasts are relevant in the language. In this study, we examined the influence of phonology on non-native word learning, asking whether the language-relevant phonological contrasts could be acquired by abstracting over multiple experiences, and whether awareness of these contrasts was related to learning. We trained English- and Mandarin-native speakers on pseudowords via a cross-situational statistical learning (CSL) task. Learners were able to acquire the phonological contrasts across multiple situations, but similar-sounding words (i.e., minimal pairs) were harder to acquire, and words contrasting in a non-native suprasegmental feature (i.e., Mandarin lexical tone) were even harder for English speakers, even with extended exposure. Furthermore, awareness of the non-native phonology was not found to relate to learning.
Children typically produce high-frequency phonotactic sequences, such as the /st/ in “toaster,” more accurately than the lower frequency /mk/ in “tomcat.” This high-frequency advantage can be simulated experimentally with a statistical learning paradigm, and when 4-year-old children are familiarized with many examples of a sequence like /mk/, they generally produce it more accurately than if they are exposed to just a few examples. Here, we sought to expand our understanding of the high-frequency advantage, but surprisingly, we instead uncovered an exception. Twenty-nine children between 4 and 5 years of age completed a phonotactic statistical learning experiment, but they also completed a separate experiment focused on statistical learning of prosodic contours. The order of the experiments was randomized, with the phonotactic statistical learning experiment occurring first for half of the children. For the children who completed the phonotactic learning experiment first, the results were consistent with previous research and a high-frequency advantage. However, children who completed the phonotactic learning experiment second produced low-frequency sequences more accurately than high-frequency sequences. There is little precedent for the latter effect, but studies of multistream statistical learning may provide some context for unpacking and extending the result.
Chapter 9 focuses on the claim that the language input that children are exposed to is not rich enough to explain how they can construct a mental grammar. This leads to the poverty of the stimulus argument in support of the Innateness Hypothesis, which holds that if the input is insufficient, children must be born with an innate system that bridges the gap between the poor input and the richness of their knowledge of language. We will examine in detail in which ways the input could be called poor. We then turn to Chomsky’s Principles and Parameters model of language acquisition, paying attention to certain developments in this model that reduced the role of innate knowledge. Along the way we also introduce two additional arguments. The argument from convergence is based on the fact that all learners that grow up in the same speech community end up with (essentially) the same mental grammar despite having received different input. We also mention the argument from speed of acquisition, which is based on the fact that language acquisition is “fast,” no matter how you measure it. We then review alternative, more empiricist, approaches to language acquisition.
We compare two frameworks for the segmentation of words in child-directed speech, PHOCUS and MULTICUE. PHOCUS is driven by lexical recognition, whereas MULTICUE combines sub-lexical properties to make boundary decisions, representing differing views of speech processing. We replicate these frameworks, perform novel benchmarking, and confirm that both achieve competitive results. We then develop a new framework for segmentation, the DYnamic Programming MULTIple-cue framework (DYMULTI), which combines the strengths of PHOCUS and MULTICUE by considering both sub-lexical and lexical cues when making boundary decisions. DYMULTI achieves state-of-the-art results and outperforms PHOCUS and MULTICUE on 15 of 26 languages in a cross-lingual experiment. These results validate DYMULTI, a model built on psycholinguistic principles, as a robust model for speech segmentation and a contribution to our understanding of language acquisition.
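The dynamic-programming idea behind such boundary search can be illustrated with a minimal Viterbi-style segmenter over a unigram lexicon. This is a sketch of the general technique, not the DYMULTI implementation; the lexicon, log-probabilities, and unknown-word penalty are invented for the toy.

```python
import math

def segment(utterance, word_logprob, unknown_penalty=-10.0):
    """Dynamic-programming word segmentation: choose boundary positions
    that maximize the summed log-probability of the resulting words.
    Unknown substrings are penalized per character."""
    n = len(utterance)
    best = [0.0] + [-math.inf] * n   # best score for the prefix of length i
    back = [0] * (n + 1)             # backpointer: start of the last word
    for i in range(1, n + 1):
        for j in range(i):
            w = utterance[j:i]
            score = best[j] + word_logprob.get(w, unknown_penalty * len(w))
            if score > best[i]:
                best[i], back[i] = score, j
    # recover the best segmentation by following backpointers
    words, i = [], n
    while i > 0:
        words.append(utterance[back[i]:i])
        i = back[i]
    return words[::-1]

lex = {"the": -1.0, "dog": -2.0, "thedog": -9.0}
segment("thedog", lex)  # → ["the", "dog"]
```

A multiple-cue system in this spirit would add further terms (e.g., sub-lexical boundary scores) to the quantity being maximized, while the dynamic program over boundary positions stays the same.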
How much information do language users need to differentiate potentially absolute synonyms into near-synonyms? How consistent must the information be? We present two simple experiments designed to investigate this. After exposure to two novel verbs, participants generalized them to positive or negative contexts. In Experiment 1, there was a tendency across conditions for the verbs to become differentiated by context, even following inconsistent, random, or neutral information about context during exposure. While a subset of participants matched input probabilities, a high proportion did not. As a consequence, the overall pattern was of growth in differentiation that did not closely track input distributions. Rather, there were two main patterns: When each verb had been presented consistently in a positive or negative context, participants overwhelmingly specialized both verbs in their output. When this was not the case, the verbs tended to become partially differentiated, with one becoming specialized and the other remaining less specialized. Experiment 2 replicated and expanded on Experiment 1 with the addition of a pragmatic judgment task and neutral contexts at test. Its results were consistent with Experiment 1 in supporting the conclusion that quality of input may be more important than quantity in the differentiation of synonyms.
We examined how noun frequency and the typicality of surrounding linguistic context contribute to children’s real-time comprehension. Monolingual English-learning toddlers viewed pairs of pictures while hearing sentences with typical or atypical sentence frames (Look at the… vs. Examine the…), followed by nouns that were higher- or lower-frequency labels for a referent (horse vs. pony). Toddlers showed no significant differences in comprehension of nouns in typical and atypical sentence frames. However, they were less accurate in recognizing lower-frequency nouns, particularly among toddlers with smaller vocabularies. We conclude that toddlers can recognize nouns in diverse sentence contexts, but their representations develop gradually.