
A cross-linguistic examination of young children’s everyday language experiences

Published online by Cambridge University Press:  24 September 2024

John Bunce*
Affiliation:
Department of Human Development and Women’s Studies, California State University, East Bay, Hayward, CA, USA; Department of Psychology, University of Manitoba, Winnipeg, MB, Canada
Melanie Soderstrom
Affiliation:
Department of Psychology, University of Manitoba, Winnipeg, MB, Canada
Elika Bergelson
Affiliation:
Department of Psychology, Harvard University, Cambridge, MA, USA
Celia Rosemberg
Affiliation:
Centro Interdisciplinario de Investigaciones en Psicología Matemática y Experimental - CONICET, Buenos Aires, Argentina
Alejandra Stein
Affiliation:
Centro Interdisciplinario de Investigaciones en Psicología Matemática y Experimental - CONICET, Buenos Aires, Argentina
Florencia Alam
Affiliation:
Centro Interdisciplinario de Investigaciones en Psicología Matemática y Experimental - CONICET, Buenos Aires, Argentina
Maia Julieta Migdalek
Affiliation:
Centro Interdisciplinario de Investigaciones en Psicología Matemática y Experimental - CONICET, Buenos Aires, Argentina
Marisa Casillas*
Affiliation:
Department of Comparative Human Development, University of Chicago, Chicago, IL, USA; Language Development Department, Max Planck Institute for Psycholinguistics, Nijmegen, GE, Netherlands
*
Corresponding authors: John Bunce and Marisa Casillas; Emails: john.bunce@csueastbay.edu; mcasillas@uchicago.edu

Abstract

We present an exploratory cross-linguistic analysis of the quantity of target-child-directed speech and adult-directed speech in North American English (US & Canadian), United Kingdom English, Argentinian Spanish, Tseltal (Tenejapa, Mayan), and Yélî Dnye (Rossel Island, Papuan), using annotations from 69 children aged 2–36 months. Using a novel methodological approach, our cross-linguistic and cross-cultural findings support prior work suggesting that target-child-directed speech quantities are stable across early development, while adult-directed speech decreases. A preponderance of speech from women was found to a similar degree across groups, with less target-child-directed speech from men and children in the North American samples than elsewhere. Consistently across groups, children also heard more adult-directed than target-child-directed speech. Finally, the numbers of talkers present in any given clip strongly impacted children’s moment-to-moment input quantities. These findings illustrate how the structure of home life impacts patterns of early language exposure across diverse developmental contexts.

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial licence (http://creativecommons.org/licenses/by-nc/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use.
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Introduction

Across human populations, children’s early language experiences vary substantially with respect to who talks to them, what is talked about, and what the children themselves are expected to contribute (e.g., Brown, 2011; Brown & Gaskins, 2014; Casillas et al., 2020; de León, 2011; Demuth & Mputhi, 1979; Gaskins, 2006; Ochs & Schieffelin, 1984; Pye, 1986; Rogoff et al., 2003; Shneidman & Goldin-Meadow, 2012; Vogt et al., 2015). For example, home pedagogical techniques, such as caregiver use of rhetorical questions and directly addressed instructions, are more common in some linguistic contexts than others (e.g., US versus Mayan groups; see, e.g., Gaskins, 1996; Rogoff et al., 2003; Shneidman et al., 2016).

Research today, primarily revolving around urban, Western contexts, situates child-directed speech (CDS) – more specifically, interactive speech produced by adult caregivers – as fundamental for early language development (e.g., Cartmill et al., 2013; Hoff, 2003; Ramírez-Esparza et al., 2014, 2017a, 2017b). Recent findings converge on the idea that so-called “high quality” (interactive, one-on-one) CDS is a consistent and robust predictor of children’s growing vocabulary (e.g., Ramírez-Esparza et al., 2014; Rowe, 2008). However, the focus of most research using CDS to predict vocabulary outcomes reflects the political and economic priorities of growing, urban societies – especially their need for a unified and literate workforce. These priorities may not generalize across understudied cultural-linguistic contexts, where other language phenomena (e.g., specific rhetorical practices) may prove more relevant (Ochs & Kremer-Sadlik, 2020; Sperry et al., 2015).

Recent cross-linguistic and cross-cultural work on typically developing children supports the idea that there is significant natural variation in children’s exposure to CDS. For example, Shneidman (2010; Shneidman & Goldin-Meadow, 2012) found an almost ten-fold difference in the proportion of child-directed speech in the linguistic environments of Chicago (US) and Yucatec Mayan (Mexico) children before age three. Scaff, Cristia, and colleagues found that Tsimane’-acquiring children (Bolivia) are directly spoken to infrequently, with recent estimates as low as approximately one half-minute per hour (Cristia et al., 2018; Scaff et al., 2023). Relatedly, Casillas and colleagues (Casillas et al., 2020, 2021) found surprisingly similar, and relatively infrequent, rates of directed input in two rural populations with substantially different approaches to child language socialization (Tseltal Mayan (Mexico) and Rossel Island Papuan (Papua New Guinea)). A recurrent theme across much of this work examining CDS in rural and developing populations has been the role of input from other children (e.g., siblings, cousins, and other peers; see also Alam et al., 2021; Cristia et al., 2023; Loukatou et al., 2022). Cristia (2023) pulls all these findings and more together into a systematic review, highlighting a consistent difference between urban and rural societies, with higher input rates in the former and lower input rates in the latter.

It is not yet understood how differences in CDS exposure play a role in how children process or learn language in their first few years. The emerging evidence on this topic in a cross-linguistic and cross-cultural context is complex. For example, Ramírez-Esparza et al. (2017b) found that CDS heard in a group context (as opposed to one-on-one interactions) was related to vocabulary development in US Spanish-English bilinguals but not monolinguals from the same population. Consistent with this view, studies of populations where caregiver CDS appears relatively rare have found that young children meet language development milestones at roughly the same rate as children growing up in contexts where adult CDS is reported to be very common (Casillas et al., 2020, 2021), though lexical development may be more sensitive (Ramírez-Esparza et al., 2017b; Shneidman & Goldin-Meadow, 2012).

In short, there is a great deal yet to learn about how language learning is supported by CDS and other sources of input. These other sources may include adult conversations that young children observe (passively or actively), CDS produced by other children, and multimodal and multiparty interactions (Alam et al., 2021; Casillas et al., 2020, 2021; Cristia et al., 2018, 2023; de León, 1998; de León & García-Sánchez, 2021; Hou, 2024; Loukatou et al., 2022; Scaff et al., 2023).

The present study takes a first step toward describing multiple sources of input – not just CDS – across a linguistically and culturally diverse sample of young children. Specifically, we examine how child age and cultural-linguistic group influence the quantities of directly addressed and overhearable adult speech that children encounter in five distinct settings. Before we dive into the methods and findings, we will set up the present study with a brief overview of relevant work on measuring children’s linguistic input: first we define ‘child-directed speech’ and ‘adult-directed speech’ as we use them here; then we review the major factors known to influence the quantities of each input source; finally, we describe prior approaches taken in estimating these input sources from daylong audio recordings.

What counts as “child-directed” input?

A great deal of prior work has contrasted child- and adult-directed speech, but what gets counted as “child-directed” varies from study to study. There are two basic approaches. In the first, these two terms (“CDS” and “ADS”) are used to denote the intended addressee, i.e., child vs. adult. In the second approach, these terms denote the speech register or other characteristics of the speech, regardless of actual addressee. That is, any speech that contains the prosodic, lexical, grammatical, and affective characteristics typically associated with speech to children is classified as child-directed speech, regardless of who was being spoken to. In the present study, we will measure linguistic input quantities based on the first approach: the utterance’s intended addressee (e.g., separating speech exclusively directed to the target child versus to another child versus to an adult, etc.).

While qualitative properties of different input types are also vital to consider when constructing comparative theories of child language development (e.g., Bornstein et al., 1992; Broesch et al., 2016; Brown, 2014; de León & García-Sánchez, 2021; Masek et al., 2021; Ochs & Schieffelin, 1984; Pye, 2017), input quantities are ideal for roughly comparing the linguistic material children encounter in their daily lives. Moreover, input quantity estimates that are centered specifically on directed vs. non-directed speech can capture some aspects of input “quality”. Addressees have an advantage in comprehending conversational talk addressed to them over talk addressed to others, precisely because the conversational talk in question is tailored specifically to the addressees’ immediate comprehension (Bell, 1984; Foushee et al., 2021; Schober & Clark, 1989). Thus, general (and likely universal) mechanisms of human coordination (Clark, 1996) predict that child-addressed speech is a referentially clearer linguistic signal for the child learner than adult-directed speech.

In the present study, we compare adult-directed speech (ADS) quantities and target-child directed speech (TCDS) quantities, the latter being speech addressed specifically to the child under study, rather than to another nearby child. These measures represent two qualitatively distinct sources of linguistic input; our present study could thus be described as measuring the quantity of two quality types.

Factors shaping input quantity

A broad spectrum of factors has been suggested to influence the quantity of CDS children encounter in their daily lives. Much less work has investigated factors influencing the quantity of ADS children encounter. We briefly summarize the primary factors examined in prior work, from the macro scale to the micro scale. These factors inform the present study’s analyses.

On the macro end of the spectrum, CDS quantities are thought to be influenced through group membership – for example, via socioeconomic group membership or via culturally held beliefs and practices around child rearing (e.g., for a recent review, see Rowe & Weisleder, 2020; for reviews regarding language socialization and culture, see Gaskins, 2006; Ochs & Schieffelin, 1984). For example, regarding socioeconomic group, meta-analyses of nearby adult talk in daylong audio data (Piot et al., 2022) and CDS in naturalistic, unstructured interaction data (Dailey & Bergelson, 2022) suggest a small but significant positive correlation of linguistic input quantity with socioeconomic status (but cf. Bergelson et al., 2023).

Regarding cultural group, some prior work found no evidence for differences in baseline TCDS rate between Tseltal- and Yélî Dnye-speaking children under age three, despite clear ethnographic evidence that adults in these two communities take very different approaches to talking to infants and young children (“non-child-centric” vs. “child-centric” input environments; Brown, 2011, 2014; Brown & Casillas, 2025; Casillas et al., 2020, 2021). In contrast, Shneidman and Goldin-Meadow (2012) did observe clear differences in US and Yucatec Mayan children’s input quantities, with the US children under age three hearing significantly more directed input. This evidence concords with Cristia’s (2023) characterization of the primary split in input quantities being in rural versus urban populations, rather than differences between individual cultural groups. Complementing this work on input quantity, studies of input quality consistently show clear cross-cultural variability in how often children are talked to, by whom, and what is talked about (e.g., de León, 2011; Demuth & Mputhi, 1979; Gaskins, 2006; Ochs & Schieffelin, 1984; Pye, 1986; Rogoff et al., 2003; Rosemberg et al., 2020, 2023; Stein et al., 2021; Vogt et al., 2015).

In the meso part of the spectrum, children’s age and available interactants may also shape input quantities. Regarding age, prior work does not consistently demonstrate evidence of change in CDS quantity with child age but does demonstrate age-related change for other input sources, including ADS and non-canonical CDS (Bergelson et al., 2019b; Casillas et al., 2020, 2021; Ramírez-Esparza et al., 2017b; see also Shneidman & Goldin-Meadow, 2012). Regarding available interactants, prior work points to a greater availability of CDS from adults compared to children – and, among adults, from women compared to men (e.g., Bergelson et al., 2019b; Shneidman & Goldin-Meadow, 2012). As noted above, however, a recurrent theme in work on rural populations is the presence of other children and hence the high prevalence of peer-produced CDS (Alam et al., 2021; Casillas et al., 2020, 2021; Cristia et al., 2023; Loukatou et al., 2022; Scaff et al., 2023).

Lastly, on the micro end of the spectrum, CDS and ADS rates fluctuate moment to moment given factors such as the ongoing activity (e.g., playing or eating), the number of potential interactants present, the physical condition of the target child and their surrounding family (e.g., sleeping/awake, stationary/in motion), and more. Soderstrom and colleagues (Soderstrom et al., 2018; Soderstrom & Wittebolle, 2013) found that linguistic input rates systematically varied depending on the activity context and number of adults present in Canadian daylong recordings (see also Casillas et al., 2020, 2021; Greenwood et al., 2011; Rosemberg et al., 2020, 2023). Although we will not have information about activity context in the present work, we will at least be able to account for the number of individual talkers present. When there are more talkers, there is more talk. That is, the presence of each additional talker increases competition for the conversational floor, and when four or more talkers are present, group conversations often split into smaller, simultaneous conversations, multiplying the amount of observable talk (conversational “schism”; see, e.g., Holler et al., 2021; Sacks et al., 1978). Minimally, the number of talkers present can be considered a nuisance variable to help explain fluctuations in CDS and ADS rate over the day. More informatively, however, the number of talkers may serve as a proxy for interactional contexts that involve denser family participation (e.g., in overlapping, co-present subgroups) versus contexts where smaller groups of individuals are on their own (Footnote 1).

Extracting CDS and ADS from daylong recordings

Ecologically valid estimates of speech input rates are now possible via long-format (e.g., daylong) recordings of children’s home language environments (e.g., LENA, Greenwood et al., 2011; Bergelson et al., 2019a; see Pisani et al., 2021 for a review). However, to date these recording systems cannot reliably and automatically differentiate between CDS and ADS across a variety of recording settings (for a promising start, see Bang et al., 2022). Studies that have leveraged daylong recordings have therefore relied on manual annotation to supplement any automated output, taking several different approaches. For example, Weisleder and Fernald (2013) manually classified 5-minute blocks of time as primarily child-directed or adult-directed, while Ramírez-Esparza and colleagues (Ramírez-Esparza et al., 2014, 2017a, 2017b) manually annotated speech-dense clips of audio as having: (1) speech addressed to the child; (2) speech containing the parentese register features of CDS versus ADS (independent of addressee); and (3) who was present as a conversational partner. Moving from the audio-clip level to the utterance level, Bergelson et al. (2019b) extracted individual utterances using LENA’s automated utterance annotations and then annotated them as child- or adult-directed, based on recognizable CDS and ADS register characteristics.

While these studies examine CDS and ADS in large and highly naturalistic datasets, they either take a very coarse perspective (e.g., examining 5-minute intervals), or tell us about input patterns during the day’s interactional peaks rather than illustrating patterns in children’s average language experiences over the course of a day. In order to extract a representative measure of linguistic input, i.e., how much language children encounter from different types of people in different types of interactional contexts across their day (including typical “down” time), we must take random or periodic samples of the language environment (Casillas & Cristia, 2019; see also Alam et al., 2021; Rosemberg et al., 2020, 2023; Stein et al., 2021) rather than only analyzing interactional peaks or estimating across time periods. To gather accurate and representative estimates of natural, at-home CDS and ADS in the present study, we therefore randomly sampled clips from daylong audio recordings and fully transcribed all hearable speech, annotating intended addressee for each utterance in each clip (Footnote 2).
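To make the utterance-level approach concrete, the following minimal sketch shows one way per-clip input rates could be derived from addressee-annotated utterances. It is an illustration only and not the authors' pipeline: the tuple layout `(onset_ms, offset_ms, addressee)`, the labels `"TCDS"`, `"ADS"`, and `"OCDS"`, the function name `minutes_per_hour`, and the choice of minutes of speech per hour as the unit are all assumptions.

```python
from collections import defaultdict


def minutes_per_hour(utterances, clip_minutes):
    """Return minutes of speech per hour for each addressee label in one clip.

    utterances: iterable of (onset_ms, offset_ms, addressee) tuples
                (a hypothetical layout chosen for this sketch).
    clip_minutes: duration of the annotated clip, in minutes.
    """
    total_ms = defaultdict(float)
    for onset_ms, offset_ms, addressee in utterances:
        # Sum utterance durations separately for each addressee label.
        total_ms[addressee] += offset_ms - onset_ms

    rates = {}
    for addressee, ms in total_ms.items():
        speech_minutes = ms / 60_000
        # Scale from "minutes per clip" up to "minutes per hour".
        rates[addressee] = speech_minutes * 60 / clip_minutes
    return rates


# Hypothetical two-minute clip: 6 s of TCDS, 12 s of ADS, 3 s of OCDS.
example_clip = [
    (1_000, 4_000, "TCDS"),
    (10_000, 22_000, "ADS"),
    (30_000, 33_000, "TCDS"),
    (50_000, 53_000, "OCDS"),
]
example_rates = minutes_per_hour(example_clip, clip_minutes=2)
```

Because every utterance in a randomly sampled clip is annotated, clips with no speech at all contribute a rate of zero, which is what lets averages over clips approximate a child's baseline experience rather than only the interactional peaks.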

The current work

We examine baseline rates of target-child-directed speech (TCDS) and adult-directed speech (ADS) in the daylong recordings of children growing up in five culturally and linguistically distinct groups: North American English (“NA English”; US & Canadian), United Kingdom English (“UK English”; England), Argentinian Spanish (“Arg. Spanish”; Argentina), Tseltal (Tenejapa, Mayan, Mexico), and Yélî Dnye (Rossel Island, Papuan, Papua New Guinea). As detailed below, some of these corpora include samples from multiple, distinct sub-populations (e.g., NA English includes both US and Canadian English), so we hereafter refer to each of these samples as “language groups” rather than “languages”. This unique metacorpus draws on seven pre-existing collections of daylong recordings (“corpora”) that were gathered by different research teams, with a variety of different recording devices (i.e., not all LENA), and for a range of different research purposes (Footnote 3). Our primary objective was to quantitatively measure the exposure of young children in these groups to two different sources of linguistic input – TCDS and ADS – and to examine several factors associated with variation in this exposure: age, language group, talker type, and number of talkers present.

To accomplish this goal, we defined a second, critical objective: to generate an audio sampling and annotation approach that could be fruitfully employed across recordings made in culturally and linguistically diverse populations (Soderstrom et al., 2021). As motivated above, our analyses focus on two distinct types of linguistic input: TCDS and ADS. Our annotation scheme additionally allows us to examine other types of input, e.g., OCDS (other-child-directed speech, i.e., speech directed to children other than the target child; see Figure 1). For the sake of simplicity, we report data for OCDS in the Supplementary Materials Section 1, where we combine it with TCDS to generate parallel analyses of all-CDS (TCDS + OCDS), with similar results to what is reported below.

Figure 1. Summary of clip selection and annotation method across corpora.

Exploratory hypotheses

Following from the findings summarized above, the specific aims of our analysis were to examine how TCDS and ADS varied across age, language group, talker type, and number of talkers in a highly naturalistic and culturally varied set of daylong recordings. To the prior literature, the present work adds an apples-to-apples comparative view on these effects, given that each of the included corpora recorded, sampled, and annotated children’s input in highly similar ways (Soderstrom et al., 2021). Before analysis, we established a specific set of exploratory hypotheses – with corresponding regression formulae – regarding TCDS and ADS. We term them “exploratory” here because each corpus includes data from only 9–10 children, which is large relative to prior comparable work, but still small overall (Table 1). These hypotheses were slightly different for TCDS and ADS, based on findings from prior work (see Tables 2 and 3 for detailed overviews).

Parentheses following the mean indicate the range across participants.

Table 2. Predictions for TCDS analysis. Asterisk indicates effects previously observed with daylong child language data (Casillas et al., 2020, 2021; Scaff et al., 2023)

The ‘Supported’ column reflects the extent to which each current finding aligns with its predicted outcome.

Table 3. Predictions for ADS analysis. Asterisk indicates effects previously observed with daylong child language data (Casillas et al., 2020, 2021; Scaff et al., 2023)

The ‘Supported’ column reflects the extent to which each current finding aligns with its predicted outcome.

Target-child-directed speech hypotheses

Based on the work cited above, we expected TCDS rate to vary across language groups (e.g., to be higher in more urban groups) and to come most often from women, but with a greater presence of other-child-produced TCDS in some groups (Tseltal, Yélî Dnye, Argentina). We did not expect any effects of child age on TCDS rate. We expected that the TCDS rate would be higher when more talkers were present, given the idea that more talkers produce more talk.

Adult-directed speech hypotheses

We expected ADS rate to vary across language groups (e.g., to be lower in more urban contexts), to decrease significantly with child age (especially in groups with high ADS rates early on, e.g., Yélî Dnye), and to come most often from women, but with greater contributions from children in some groups (Tseltal, Yélî Dnye, Argentina). We also expected that the ADS rate would be higher when more talkers were present.

We limited our hypotheses to simple effects and two-way interactions. We might anticipate other, more complex effects (e.g., the three-way interaction of age-language group-talker type on TCDS rate), but given the limited size of our metacorpus (N = 10 recordings per corpus maximum) we leave these effects to be tested in future, larger datasets.

The present paper is the first to bring together all these different factors known to influence TCDS and ADS and to examine their joint effects across multiple language groups (see also Bergelson et al. (2023) for a LENA-based, comparative mega-analysis of nearby adult input). We examine these factors in order to identify axes of consistency and variation across the multi-corpus sample. While similar analyses have been previously conducted on two individual corpora (Tseltal and Yélî Dnye; Casillas et al., 2020, 2021), the current study offers a first view into baseline TCDS and ADS rates at home in North American English, UK English, and Argentinian Spanish while simultaneously providing a comparative perspective on how each of the proposed factors – group, child age, and interactants – influences children’s input rates.

To reiterate, we do not attempt to examine all possible interactions, and instead take a hypothesis-driven approach to analysis (Footnote 4). Importantly, while we identify key points of theoretically relevant variation across the samples in this study, we do not argue that these language groups represent the full spectrum of diversity in linguistic input experiences, even within the specific populations we have sampled.

Methods

Metacorpus construction

We use the Analyzing Child Language Experiences around the World (ACLEW) metacorpus (Soderstrom et al., 2021) of long-form audio recordings of children’s everyday language environments, comprising seven corpora from five culturally and linguistically distinct groups, labeled here as: North American English (NA English), United Kingdom English (UK English), Argentinian Spanish (Arg. Spanish), Tseltal, and Yélî Dnye (Footnote 5). Each group is represented by a single corpus except North American English, for which we had access to three corpora. Recordings for each corpus were originally collected for the unique research purposes of the individual lab contributing the corpus, and therefore there is variation across corpora in the recruitment practices, recording equipment (i.e., not all LENA), recording duration, target child ages (see Supplementary Materials Section 3), and other demographic characteristics (see Table 1 for an overview).

Sampling technique

Our sampling and annotation scheme needed to be suitable for daylong recordings of different durations made with different recording devices, and for variable annotation situations (e.g., in a lab or in the field).

We selected a single day’s recording for 10 children from each corpus, except the McDivitt-Winnipeg corpus, from which we selected 9 recordings due to a sampling error (total recordings N = 69); this sample size per corpus reflects what was possible with the smallest corpora in our sample. We used a script to select recordings that were as balanced as possible within and across corpora in reported child gender (male/female), maternal education (below high school–advanced degree), and child age (0;2–3;0; see https://osf.io/pysth/ for details). The range of available ages was more limited in North American English compared to the other corpora, but our statistical approach accounts for this (also see the Supplementary Materials Section 3). Five of the included recordings overlap with those used in Bergelson et al. (2019b), and the same Tseltal and Yélî Dnye annotations have been analyzed somewhat differently in separate work (Casillas et al., 2020, 2021).

Each dataset and contributing lab came with a specific set of constraints on what was possible for manual annotation work (e.g., teams of undergraduate students versus individual collaborations with native speakers in remote field sites), so we settled on two basic techniques for sub-sampling and transcribing data from these long-format recordings. These methods for sampling and preparing clips for annotation are illustrated in Figure 1.

For North American English, UK English, and Argentinian Spanish (49 of the 69 recordings), we wrote a Python script to randomly pick start times for 15 two-minute clips from throughout the day of each recording, excluding any possibility of clip overlap. The script selected the start and stop times of each clip, as well as the start and stop times of an associated five-minute context period for each clip (see Figure 1, upper left). Thus each of the 15 clips per recording contained one minute of prior context, followed by two minutes of audio to be transcribed and annotated, followed by two more minutes of additional context. The start and stop times of the context and to-be-transcribed clips were then added automatically to a single ELAN (Wittenburg et al., 2006) audio annotation file that spanned the entire recording. This process resulted in 30 total minutes of annotation per recording.
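The sampling step described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's actual script (see https://osf.io/pysth/ for that). The function name `sample_clips`, the 16-hour recording length, and the choice to keep whole context windows (rather than only the two-minute clips) from overlapping are all assumptions made for the example.

```python
import random

CLIP_SEC = 120          # two minutes of audio to transcribe
PRE_CONTEXT_SEC = 60    # one minute of context before the clip
POST_CONTEXT_SEC = 120  # two minutes of context after the clip
WINDOW_SEC = PRE_CONTEXT_SEC + CLIP_SEC + POST_CONTEXT_SEC


def sample_clips(recording_sec, n_clips=15, seed=None):
    """Pick n_clips random windows (context + clip) that cannot overlap.

    Returns a list of dicts with boundaries in seconds, sorted by time.
    """
    slack = recording_sec - n_clips * WINDOW_SEC
    if slack < 0:
        raise ValueError("recording is too short for the requested clips")
    rng = random.Random(seed)
    # Draw sorted offsets into the unused time, then shift the i-th window
    # by i full windows so consecutive windows may touch but never overlap.
    offsets = sorted(rng.randint(0, slack) for _ in range(n_clips))
    clips = []
    for i, offset in enumerate(offsets):
        context_start = offset + i * WINDOW_SEC
        clip_start = context_start + PRE_CONTEXT_SEC
        clips.append({
            "context_start": context_start,
            "clip_start": clip_start,
            "clip_end": clip_start + CLIP_SEC,
            "context_end": context_start + WINDOW_SEC,
        })
    return clips


# Hypothetical 16-hour recording
clips = sample_clips(recording_sec=16 * 3600, seed=1)
minutes = sum(c["clip_end"] - c["clip_start"] for c in clips) / 60
print(len(clips), minutes)  # 15 clips, 30.0 minutes to annotate
```

Drawing sorted offsets into the leftover time avoids a rejection loop and guarantees a valid layout whenever the recording is long enough to hold all windows.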

The Tseltal and Yélî Dnye corpora (20 of the 69 recordings) used a similar method, except only 9 clips were randomly selected. However, the clips were longer than in the other corpora. Tseltal clips were 5 minutes long and Yélî Dnye clips were 2.5 minutes long, resulting in a total of 45 minutes and 22.5 minutes of annotation per recording for the Tseltal and Yélî Dnye corpora, respectively. The five-minute clips in Tseltal had no additional context; this length of clip already provides significant context. The 2.5-minute clips for Yélî Dnye were followed by an additional 2.5 minutes of recording context. Thus, the total context review period for annotation clips across all corpora was five minutes (Figure 1).

Minor deviations in the sampling process between corpora are not expected to have meaningful effects on the analyses: all clips are short and randomly selected from throughout the child’s waking day. These deviations arose because the Tseltal and Yélî Dnye datasets required significant contributions from native local speakers in each remote community sampled, and so the annotation workflow was adapted to suit the associated researcher’s fieldwork schedule.

The final clip collection therefore consists of 35.8 hours of transcribed and annotated recording time, of which 16.3 hours consists of communicative vocalizations. Given the constraints across corpora on transcription work hours, this was near the ceiling of manual annotation data we could generate. It was unknown in advance how many recording minutes would be needed to produce meaningful results. That said, Casillas et al. (2020, 2021) found that the present amount and distribution of recording minutes were sufficient to detect many of the effects predicted here. Their findings are especially promising for the current set of analyses, which includes a similar statistical approach and re-uses those two datasets (now with additional corpora for comparison). Recent studies (Cychosz et al., 2020; Marasli & Montag, 2023; Micheletti et al., 2020) have started building up a more general approach to sampling naturalistic behavior from daylong recordings, but a lack of prior knowledge about the distribution of different input densities from different types of talkers across these groups prevented us from being able to confidently peg our sampling technique to anticipated underlying effects. To counteract what we anticipated would be limited statistical power, we planned to only analyze effects for which we had strong a priori predictions (see an overview in Tables 2 and 3).
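The total annotated time can be recovered from the sampling design described above. The short check below uses only the recording counts, clip counts, and clip lengths given in the text; the dictionary layout and function name are illustrative.

```python
# (recordings, clips per recording, clip length in minutes), from the text
DESIGN = {
    "NA English, UK English, Arg. Spanish": (49, 15, 2.0),
    "Tseltal": (10, 9, 5.0),
    "Yélî Dnye": (10, 9, 2.5),
}


def annotation_totals(design):
    """Return (total number of clips, total annotated minutes)."""
    total_clips = sum(n_rec * n_clips for n_rec, n_clips, _ in design.values())
    total_minutes = sum(
        n_rec * n_clips * clip_min for n_rec, n_clips, clip_min in design.values()
    )
    return total_clips, total_minutes


total_clips, total_minutes = annotation_totals(DESIGN)
print(total_clips, total_minutes / 60)  # 915 clips, 35.75 hours
```

The 35.75 hours round to the 35.8 hours reported. Multiplying the 915 clips by the three talker types gives 2,745, which matches the N reported for the regression models below if each clip contributes one observation per talker type; that reading is an inference and is not stated explicitly in the text.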

Annotation technique

Each of the randomly selected segments was annotated using the ACLEW Annotation Scheme (https://osf.io/b2jep/, Casillas et al., 2017a; Soderstrom et al., 2021), an ELAN-based approach (Wittenburg et al., 2006). Each annotator undergoes a rigorous and independent training and testing process to ensure intra- and inter-lab consistency in coding. Annotators segmented and transcribed all hearable human communicative vocalizations in the samples, with a separate tier for each individual talker to allow for overlapping talk. Each tier was identified by the talker’s perceived age and gender category (adult/child/unknown and female/male/unknown; e.g., FA1 = female adult 1 in Figure 1). All utterances (except the target child’s) were also annotated for the intended addressee in seven categories – exclusively target child, non-target child, adult, mixed-age, animal, other, unknown – on the basis of any available contextual and interactional information within the audio recordings.

Annotator reliability was checked by the complete re-annotation of one minute from each recording by a new annotator. We then compared the original minute’s annotations to the re-coded minute’s annotations. A full reliability report is available at https://osf.io/pysth/, but to briefly summarize, error estimates for talker type annotations (e.g., disagreements about whether the talker is the target child or a different child) are far better than prior work has found between human and LENA (i.e., automated) annotations. Further, comprehensive kappa scores reflect moderate-to-substantial agreement (cross-corpus k range = 0.55–0.68) for talker types and fair-to-substantial agreement (cross-corpus k range = 0.32–0.64) for addressee, with wide variability in agreement between corpora. Despite CDS having some cross-linguistically recognizable features (e.g., Bornstein et al., 1992; Fernald et al., 1989; Hilton et al., 2022), we had expected somewhat lower reliability scores for addressee annotations because the reliability annotators were not always native speakers of the language of the file they were annotating; their annotation decisions were thus less informed by lexicosyntactic content than the (native-speaking) original annotators’. Most cases of disagreement arose when one annotator indicated silence or overlapping talk where the other annotator indicated talk from a single person – confusion between actual addressee categories was relatively low (see Supplementary Materials Section 8 for more details).
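For readers unfamiliar with the kappa statistic, a generic Cohen's kappa for two annotators can be sketched as follows. This is not the reliability pipeline used for the study (see the report at https://osf.io/pysth/). The per-time-slice label lists and the label codes are hypothetical, and the descriptive bands follow the conventional Landis and Koch (1977) cut-offs.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each annotator's marginal label frequencies
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def agreement_band(kappa):
    """Conventional descriptive label for a kappa value."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical addressee labels per time slice from two annotators
original = ["T", "T", "A", "A"]
recoded = ["T", "A", "A", "A"]
kappa = cohen_kappa(original, recoded)
print(kappa, agreement_band(kappa))
```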

Data analysis

All statistical analyses were conducted in R with the glmmTMB package (Brooks et al., 2017; R Core Team, 2019) and all figures were generated with ggplot2 (Wickham, 2016). Analysis scripts and raw anonymized data are available at https://osf.io/pysth/. Our two dependent measures were the rates of TCDS and ADS (both expressed in minutes per hour). We calculated TCDS and ADS input rate for each clip for each of three talker types: female adults (here “women”), male adults (here “men”), and children (here “children”, including both male and female children). All other utterances (e.g., language addressed to animals and language produced by electronic devices) were excluded. As motivated above, we designate TCDS versus ADS utterances based on who they were perceivably addressed to: ‘TCDS’ includes communicative utterances that were addressed exclusively to the target child (from an adult or another child). ‘ADS’ includes communicative utterances addressed to one or more adults (from an adult or from another child).
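To make the dependent measures concrete, the sketch below computes a minutes-per-hour rate per talker type for a single clip. It is a simplified illustration, not the analysis code at https://osf.io/pysth/. The utterance fields, the addressee codes `"T"` and `"A"`, and the decision to sum utterance durations without handling overlap within a talker type are all assumptions.

```python
TALKER_TYPES = ("woman", "man", "child")
REGISTER_BY_ADDRESSEE = {"T": "TCDS", "A": "ADS"}  # hypothetical addressee codes


def input_rates(utterances, clip_ms):
    """Return {(talker_type, register): minutes per hour} for one clip.

    Combinations with no speech are kept with a rate of 0.0.
    """
    speech_ms = {
        (talker, register): 0.0
        for talker in TALKER_TYPES
        for register in REGISTER_BY_ADDRESSEE.values()
    }
    for utt in utterances:
        register = REGISTER_BY_ADDRESSEE.get(utt["addressee"])
        if register is None or utt["talker_type"] not in TALKER_TYPES:
            continue  # e.g., speech to animals or from electronic devices
        speech_ms[(utt["talker_type"], register)] += utt["offset_ms"] - utt["onset_ms"]
    # Fraction of the clip filled with speech, scaled to minutes per hour
    return {key: ms * 60 / clip_ms for key, ms in speech_ms.items()}


# Hypothetical two-minute clip with three annotated utterances
clip = [
    {"talker_type": "woman", "addressee": "T", "onset_ms": 1000, "offset_ms": 4000},
    {"talker_type": "woman", "addressee": "T", "onset_ms": 9000, "offset_ms": 12000},
    {"talker_type": "man", "addressee": "A", "onset_ms": 20000, "offset_ms": 32000},
]
rates = input_rates(clip, clip_ms=120_000)
print(rates[("woman", "TCDS")], rates[("man", "ADS")], rates[("child", "TCDS")])
```

In this example, six seconds of woman-produced TCDS in a two-minute clip corresponds to 3 minutes per hour, and the child-produced TCDS rate is zero.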

TCDS and ADS input rates cannot be negative. In practice, they are modally zero or close to zero across clips. Given our random sampling technique, which can include periods of silence, many clips include no TCDS or ADS. These “down” times for input are part of the representative pattern of children’s language experience (see Footnote 6) but also present an analytical challenge: observed cases of 0 TCDS/ADS in many clips combined with a skewed non-negative distribution of > 0 TCDS/ADS in other clips. This distribution of TCDS/ADS across sampled clips cannot be modeled with the assumption of normality. We therefore used zero-inflated negative binomial mixed-effects regressions for our analyses. This regression type uses a two-model approach to overcome non-negative, overdispersed data with extra cases of zero – the case for the present data (Brooks et al., 2017; Smithson & Merkle, 2013).

The two model components constructed for the analyses of TCDS and ADS are: (1) a zero-inflation model (indicated by “ziformula” in the model formulae), which uses a logistic regression to model the likelihood of the presence of ‘zero’ cases in the data (e.g., answering questions like ‘are zero-TCDS clips less likely for older target children?’) and (2) a count model, which uses negative binomial regression to model how the non-zero rate of TCDS/ADS is influenced by the predictors of interest (e.g., answering questions like ‘is TCDS rate higher for older target children?’). The a priori predictions we laid out above can be applied to both model components, as shown in Tables 2 and 3.
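How the two components combine can be illustrated with a generic zero-inflated negative binomial distribution. The sketch below is a textbook formulation for integer counts and is not taken from the analysis scripts. The parameter names (`mu`, `theta`, `pi_zero`) are illustrative, and the log link for the count component and logit link for the zero-inflation component are assumed here as the usual defaults.

```python
import math


def nb_pmf(y, mu, theta):
    """Negative binomial pmf with mean mu and variance mu + mu**2 / theta."""
    log_p = (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y, mu, theta, pi_zero):
    """Zero-inflated pmf: extra zeros occur with probability pi_zero."""
    base = (1 - pi_zero) * nb_pmf(y, mu, theta)
    return pi_zero + base if y == 0 else base


def predict_zinb_mean(eta_count, eta_zero):
    """Expected value given the two linear predictors.

    eta_count is on the log scale (count component);
    eta_zero is on the logit scale (zero-inflation component).
    """
    mu = math.exp(eta_count)
    pi_zero = 1 / (1 + math.exp(-eta_zero))
    return (1 - pi_zero) * mu


# Illustrative parameter values only
print(zinb_pmf(0, mu=2.0, theta=1.0, pi_zero=0.3))
print(predict_zinb_mean(eta_count=0.0, eta_zero=0.0))  # 0.5
```

A zero can therefore arise in two ways: as an "extra" zero from the zero-inflation component or as an ordinary zero from the count component, which is why the two components answer different questions about the same data.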

The simple effects included in the models were target child age (centered and standardized from age in months), number of talkers present in that clip (centered and standardized from the unique number of talkers across all clips), talker type (woman versus man/child), and language sample (North American English versus UK English/Argentinian Spanish/Tseltal/Yélî Dnye). We only included interactions for which we had a strong a priori hypothesis and thus the models for TCDS and ADS differ slightly in their structure (see the Results for the regression formulae).
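Centering and standardizing a numeric predictor amounts to converting it to z-scores. A minimal sketch follows, assuming the sample standard deviation is used (as R's `scale()` does by default); the ages below are hypothetical.

```python
from statistics import mean, stdev


def standardize(values):
    """Center on the mean and divide by the sample standard deviation."""
    center, spread = mean(values), stdev(values)
    return [(value - center) / spread for value in values]


# Hypothetical child ages in months
ages_months = [2, 8, 14, 20, 26, 36]
age_z = standardize(ages_months)
print([round(z, 2) for z in age_z])
```

After this transformation, a regression coefficient for age describes the change associated with a one-standard-deviation difference in age rather than a one-month difference.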

We modeled language group and talker type as dummy-coded factorial variables, which limited our ability to make comparisons among language groups; e.g., if Tseltal were the reference level, the model outcomes for language group would give pairwise comparisons between Tseltal and all the other language groups, but not pairwise comparisons between other language groups, for example, between Argentinian Spanish and UK English. We selected ‘North American English’ and ‘women’ as our default reference levels for reporting model estimates below, given that North American English and linguistic input from female adults are the most well represented in (a) the current dataset and (b) prior work done on these populations. In addition, we were interested in establishing under-studied patterns that may be present in our dataset – effects that diverge from groups that are currently over-represented in the literature. Setting these levels as a reference gives us a first glimpse into the variation that has gone under-examined in past work. This analysis should not be understood as positioning North American English as a global default for understanding development.
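The consequence of dummy coding for which comparisons a model reports can be seen in a small sketch. The helper below is illustrative and is not how R builds its contrast matrices internally; the column naming is hypothetical.

```python
LANGUAGE_GROUPS = ["NA English", "UK English", "Arg. Spanish", "Tseltal", "Yélî Dnye"]


def dummy_code(value, levels, reference):
    """Return one 0/1 indicator per non-reference level."""
    if value not in levels or reference not in levels:
        raise ValueError("unknown level")
    return {
        f"{level} vs {reference}": int(value == level)
        for level in levels
        if level != reference
    }


print(dummy_code("Tseltal", LANGUAGE_GROUPS, reference="NA English"))
print(dummy_code("NA English", LANGUAGE_GROUPS, reference="NA English"))
```

With five groups there are four indicator columns, and each coefficient contrasts one group with the reference level. The reference group itself is coded as all zeros, so it is absorbed into the intercept. A comparison such as Argentinian Spanish versus UK English never appears as a coefficient unless the model is refit with one of those two groups as the reference.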

That said, pairwise comparisons between language groups may also be of interest to readers. For those curious about how the reported effects below are impacted by the selected reference level of language group, we include versions of our models with each language group as the reference level in the Supplementary Materials (i.e., four additional versions of the TCDS and ADS model each; Section 6). Here in the main text our results focus on models of TCDS and ADS with North American English as the reference level for language group, and women as the reference level for talker type.

Results

Descriptive statistics for observed TCDS and ADS rates by language group and talker type are shown in Table 4 and in Figure 2. A visual summary of statistical model outcomes from the count models of TCDS and ADS rate is shown in Figure 3. Further, marginal mean plots of model-predicted TCDS and ADS rates across age, language group, and talker type are available in Supplementary Materials Section 7. In Tables 2 and 3 we provide a high-level summary of which hypothesized outcomes were statistically supported in the regressions described below.

Table 4. Average input rates per clip across participants for each language group. Note that these descriptive statistics are raw rates and therefore reflect overall differences between corpora without controlling for, e.g., number of talkers present, which are accounted for in the statistical analyses

Parentheses following the mean indicate the median and range across participants in each group.

Figure 2. Mean by-recording rates (min/hr) of TCDS (above) and ADS (below) across language groups and talker types. For example, the upper-leftmost datapoint shows a recording with an average of 10 minutes per hour of TCDS from women talkers in North American English. The left-to-right order of language group within each of the six panels matches the order shown in the legend (NA English, UK English, Arg. Spanish, Tseltal, Yélî Dnye).

Figure 3. Coefficients with 95% confidence intervals from the count models of TCDS (left) and ADS (right) for all included fixed effects in the model with NA English and women set as the reference levels for language group and talker type, respectively. Intervals not overlapping with zero indicate significance. Color indicates population, ‘C’ and ‘M’ indicate effects related to child- and man-produced utterances, respectively. For example, both the left and the right panels show that both child- and man-produced input rates are significantly lower compared to the reference levels of woman-produced input. Note that the fixed effects included in each model are determined by the predictions laid out above separately for TCDS and ADS.

As a reminder, we report results from the models of TCDS and ADS with North American English as the reference level for language group and women as the reference group for talker type. Identical models with the full range of alternate reference levels for language group are available in the Supplementary Materials (Section 6). Unless otherwise noted, the significant effects reported below are qualitatively similar (i.e., significant in the same direction) in all alternate models.

Target-child-directed speech

On average, across all recordings, children were exposed to 3.66 minutes of TCDS per hour (median = 3.24), with substantial individual variation between children (range = 0–10.12). Our model of TCDS rate included target child age (numeric; standardized), talker type (factorial; woman/man/child), the number of talkers present in the clip (numeric; standardized), and language group (factorial; NA English/UK English/Argentinian Spanish/Tseltal/Yélî Dnye), with two additional two-way interactions (talker type by language group and child age by talker type) and random intercepts by child. The zero-inflation model component included child age and language group as predictors (N = 2745 clips, log-likelihood = −2,703.72, overdispersion estimate = 8.94; formula = TCDS.min.p.hr ~ child.age + talker.type + num.tlkrs.in.clip + lang.grp + talker.type:lang.grp + child.age:talker.type + (1 | child.id), ziformula = ~ child.age + lang.grp).

Effects of child age, talker type, and number of talkers present

As predicted, we found no evidence that TCDS changed with age (B = −0.03, SE = 0.09, z = −0.31, p = 0.76). TCDS rate was significantly lower for men compared to women (B = −2.03, SE = 0.19, z = −10.69, p < 0.001) and for children compared to women (B = −3.54, SE = 0.37, z = −9.64, p < 0.001). TCDS rate was also significantly higher when there were more talkers present (B = 0.33, SE = 0.04, z = 7.62, p < 0.001).

Effects relating to language group

The baseline rate of TCDS input in North American English was estimated to be significantly higher than Yélî Dnye (B = −0.95, SE = 0.32, z = −2.97, p < 0.01), with no evidence for difference in baseline TCDS rate between North American English and any other language group (all p’s ≥ 0.58). TCDS input rate from men varied between language groups: compared to North American English, the TCDS rate from men was significantly higher in both Argentinian Spanish (B = 0.70, SE = 0.31, z = 2.29, p = 0.02) and Yélî Dnye (B = 0.75, SE = 0.37, z = 2.02, p = 0.04). Similarly, TCDS from children varied between language groups: compared to North American English, TCDS rates from children were significantly higher in all other language groups (UK English: B = 1.10, SE = 0.51, z = 2.16, p = 0.03; Argentinian Spanish: B = 1.58, SE = 0.45, z = 3.48, p < 0.001; Tseltal: B = 1.91, SE = 0.49, z = 3.91, p < 0.001; Yélî Dnye: B = 2.81, SE = 0.46, z = 6.11, p < 0.001).
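The reported statistics are related in the usual way for Wald tests: z is the estimate divided by its standard error, and the p-value comes from the standard normal distribution. The sketch below assumes that convention and uses the rounded values printed above, so results can differ from the reported z-values in the second decimal place.

```python
import math


def wald_test(estimate, std_error):
    """Return (z, two-sided p) under a standard normal reference."""
    z = estimate / std_error
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Baseline TCDS difference for Yélî Dnye versus North American English
z, p = wald_test(-0.95, 0.32)
print(round(z, 2), p < 0.01)  # -2.97 True
```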

Interaction between child age and talker type

We found no evidence that TCDS from men changed with age relative to TCDS from women (B = −0.13, SE = 0.13, z = −1.01, p = 0.31). In contrast, TCDS from children increased with age relative to TCDS from women (B = 0.29, SE = 0.12, z = 2.35, p = 0.02).

The zero-inflation regression component did not suggest any additional evidence for effects of child age or language group (North American English versus other groups) on the likelihood of a clip containing zero TCDS (all p’s ≥ 0.27).

Adult-directed speech

On average, across all recordings, children were exposed to 10.08 minutes of ADS per hour (median = 8.34), again with considerable variation between children (range = 0–38.54). Our model of ADS rate included target child age (numeric; standardized), talker type (factorial; woman/man/child), number of talkers in the clip (numeric; standardized), and language group (factorial; NA English/UK English/Argentinian Spanish/Tseltal/Yélî Dnye), with two additional two-way interactions (talker type by language group and child age by language group) and random intercepts by child. The zero-inflation model component only included language group; we had planned to also include child age in the zero-inflation component, but its inclusion led to model non-convergence issues. Child age remained a predictor in the count model (N = 2745 clips, log-likelihood = −4,190.69, overdispersion estimate = 15.69; formula = ADS.min.p.hr ~ child.age + talker.type + num.tlkrs.in.clip + lang.grp + talker.type:lang.grp + child.age:lang.grp + (1 | child.id), ziformula = ~ lang.grp).

Effects of child age, talker type, and number of talkers present

ADS rate decreased significantly with age (B = −0.31, SE = 0.13, z = −2.45, p = 0.01), but this effect was non-significant in some alternate models with other reference levels (see Supplementary Materials Section 6) and so should be considered preliminary. We note that, across all alternate models, the estimate for an effect of age remained numerically negative. ADS rate was also significantly lower for men compared to women (B = −1.00, SE = 0.13, z = −7.77, p < 0.001) and for children compared to women (B = −0.89, SE = 0.12, z = −7.16, p < 0.001). This result, suggesting a difference between children and women, depends on which language group is chosen for the reference level: it is non-significant, though still numerically negative, when UK English is set as the reference level (see Supplementary Materials Section 6). As with TCDS, ADS was significantly higher when there were more talkers present (B = 0.71, SE = 0.03, z = 21.54, p < 0.001).

Effects relating to language group

There was no evidence for differences between baseline ADS input rate in North American English and any other language group (all p’s ≥ 0.22). There was also no evidence that the difference in women’s and men’s ADS input rates varied between North American English and any other language group (all p’s ≥ 0.14). In contrast, the difference in women’s and children’s ADS input rates was significantly smaller in UK English compared to North American English (B = 0.60, SE = 0.26, z = 2.31, p = 0.02). There was also no evidence that age-related change in ADS input rates varied between North American English and any other language group (all p’s ≥ 0.11).

The zero-inflation regression component, which included only language group as a predictor, did not suggest any additional evidence for effects of language group (North American English versus other groups) on the likelihood of a clip containing zero ADS (all p’s ≥ 0.99).

Readers interested in exploring further pairwise comparisons of TCDS and ADS effects between language groups (e.g., Tseltal versus UK English) are encouraged to view alternate versions of the models of TCDS and ADS in the Supplementary Materials (Section 6).

Discussion

We examined how two input sources, TCDS and ADS, vary in children’s early language environments, depending on child age, talker type, language group, and number of talkers present. Our data come from a metacorpus of 69 daylong recordings from children under three in five culturally and linguistically distinct groups. The present paper is the first to examine the joint effects of these factors across multiple language groups, shedding light on typical patterns in children’s early language experiences across these different contexts. This project also presented a successful model for sampling and annotating child language data in a unified manner across different labs. In this discussion we highlight four major findings: (1) minimal effects of age; (2) women’s input predominates, men’s is rare, and children’s varies between language groups; (3) more talkers leads to more talk; and (4) minimal evidence for baseline differences in TCDS and ADS input rates between language groups. While many of the predictions we made initially were supported, some were not (Tables 2 and 3). In what follows, we briefly discuss each of the four major findings highlighted, raising the most relevant implications of each.

Minimal effects of age

TCDS rate showed no significant change across this developmental period (0;2–3;0) while ADS rate significantly decreased with target child age. This result replicates prior findings on daylong TCDS and ADS in a subset of these groups (Bergelson, 2020; Bergelson et al., 2019b; Casillas et al., 2020, 2021). However, this significant effect of target child age on ADS rate should be taken as preliminary, given that alternate reference-level models do not always show this effect. The lack of evidence for an increase in TCDS rate with age, consistent with our predictions, may appear inconsistent with the findings reported in Ramírez-Esparza et al.’s (2017a) study. The substantial differences in their and our constructs (i.e., “parentese” register versus TCDS) and measurement approach (i.e., clip-by-clip classification versus utterance-based transcript analysis) unfortunately prevent direct comparison between these two studies, but future work may examine both approaches within the same corpus to get a more comprehensive view of how child age impacts the quantities of different input sources.

It is not yet clear what would lead to this decrease in ADS with target child age. One existing proposal is that children become able to wander away from adult conversation as they gain mobility and independence (Bergelson et al., 2019b). This proposal is consistent with our main result, but confirming it would require information beyond the current recordings, and we would moreover need to explain why the decrease in ADS is sensitive to which language group is selected as the reference.

The lack of evidence for an increase in TCDS during this early period, when we know that children experience immense growth in their linguistic knowledge and processing capacity, aligns with recent work reasoning that growth in early linguistic skills reflects children’s changing efficiency and sophistication in extracting relevant information from their ambient linguistic environments, as opposed to direct changes to their linguistic input (see Bergelson, 2020 for a review). Rather than attributing development to changes in the input, this theoretical approach looks instead to growth in children’s ability to engage in real-time language prediction and use of already acquired world and symbolic language knowledge (Bergelson, 2020; Meylan & Bergelson, 2022; Snedeker et al., 2007). To this account, the current findings add a preliminary but important cross-linguistic datapoint: this basic idea may hold across diverse linguistic contexts.

Women’s input predominates, men’s is rare, and children’s varies between language groups

Regarding the talkers producing children’s input, we found that women predominate in children’s language environments. The prevalence of woman-produced language over man- and child-produced language was evident for both TCDS and ADS. However, the extent to which women’s input predominates – especially for TCDS – varied. The rate of TCDS produced by men was significantly higher in Argentinian Spanish and Yélî Dnye compared to North American English. The rate of TCDS produced by children was significantly higher in all language groups compared to North American English, and TCDS produced by children increased more with age relative to TCDS produced by women. In contrast, we found very little evidence for change in the rates of ADS from different talkers across age or language group: only UK English showed a significant difference from North American English, with a significantly smaller gap between children and women’s ADS rates. We are cautious in interpreting this lack of evidence for age-related change within speaker types given our limited sample size.

One implication of these findings is that, across these different language groups, women’s input plays an outsize role in their children’s input, both in terms of directed and observable language. While there was very clear cross-linguistic variation in the contribution of different talker types, this central role for woman-produced linguistic input was clear across our dataset. We are far from the first researchers to make this observation for child language input (see, e.g., Bateson, 1979; Bergelson et al., 2019b; Bruner, 1983; Cooper & Aslin, 1989; Mannle et al., 1992), and talker-specific effects on early linguistic representations have been demonstrated previously in experimental tests of implicit language knowledge (e.g., Bergelson & Swingley, 2018; Hillairet de Boisferon et al., 2015; Houston & Jusczyk, 2000; Martin et al., 2015). However, our findings underscore how cross-linguistically pervasive these effects may be, urging further work on the talker-specific properties of infants’ early linguistic representations and the mechanisms by which these early representations become more robust to different talker types over time.

More talkers leads to more talk

As predicted, and consistent with prior work (Casillas et al., 2020, 2021; Sacks et al., 1978; Stein et al., 2021), we found that more talkers leads to more input, both for TCDS and ADS. This effect is due in part to the simple fact that, all else being equal, the presence of more talkers leads to more talk. The presence of more talkers increases competition for the floor (Holler et al., 2021; Sacks et al., 1978). When there are four or more individuals present, as is the average case in all but the English-speaking groups (see Supplementary Materials Section 2), there is an opportunity for interactants to break off into smaller conversations (e.g., two two-person conversations), potentially doubling the observable talk via overlapping conversation in the input (Sacks et al., 1978). Future work might selectively examine subsets of daylong data to more precisely characterize how interactionally driven factors such as the number of talkers present account for fluctuations in a child’s linguistic input rate, both within and across language groups. Doing so will likely require more transcribed interactions than the current dataset offers.

It is notable that multi-party talk and, in general, observable talk between others is abundant across these groups, with raw ADS estimates typically far outpacing individually addressed child input (i.e., TCDS). While a variety of language-learning processes may indeed benefit from early exposure to observable talk (e.g., syntactically complex prosodic structures in adult-adult conversation), ADS learning effects in infancy and early toddlerhood have gone largely unexamined (though see, e.g., Akhtar et al., 2001; Foushee et al., 2021; Oshima-Takane et al., 1996). Given the current results, it may be worthwhile to dig deeper into how observable talk and multi-party talk influence the very early stages of language learning (e.g., de León, 1998; de León & García-Sánchez, 2021), especially the early development of non-referential linguistic knowledge (e.g., regarding phonology, see Cristia, 2020; regarding conversational turn taking, see Dunn & Shatz, 1989). As we discuss in more detail below, an important addendum to this discussion is that systematic differences in multi-party interaction might also be understood as cultural – not solely situation-specific.

Minimal evidence for baseline differences in language group

Regarding effects of language group on baseline TCDS and ADS input rates, we had predicted that children acquiring Argentinian Spanish, Tseltal, and Yélî Dnye would encounter lower rates of TCDS and higher rates of ADS compared to North American English-acquiring children. By and large, our data show little evidence for these hypotheses. When it came to TCDS, we only observed one case where input rates differed: Yélî Dnye’s baseline TCDS rate was significantly lower than that of North American English. When it came to ADS, we found no evidence for differences in baseline ADS rate between North American English and other language groups. This set of results may come as a surprise, considering that the raw rates of TCDS and ADS clearly vary between language groups, in ways that often align with our original predictions (e.g., the raw ADS rate of Yélî Dnye is nearly two and a half times larger than that of North American English; Table 4). A reasonable question is then why our model estimates don’t reflect these differences. Beyond the concern of statistical power – which is relevant given our relatively small samples – it is essential to think again about where differences between groups could come from. In particular, it’s worth re-examining how to understand the effect of the number of talkers present: could this be group-specific behavior or not?

We observed that many of the apparent inconsistencies with mean overall TCDS and ADS rate come from systematic between-group differences in the number and composition of talkers. For example, Yélî Dnye children had an average of 6.06 talkers present in addition to the target child, while North American English children only had 1.81. So, even if the baseline rate of TCDS is significantly lower per talker in Yélî Dnye (as our model above suggests), there are many more talkers present in the Yélî children’s acoustic environment compared to the North American children. Consequently, the TCDS experienced by children in these two groups appears comparable overall (Table 4).
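The arithmetic behind this point can be sketched in a few lines. The snippet below is only an illustration, under the simplifying assumption (not the model used in this paper) that a child's total TCDS equals the number of talkers present multiplied by an average per-talker rate. It takes the two talker averages reported above and computes how much lower the Yélî Dnye per-talker rate would need to be for the two totals to match. The function and variable names are hypothetical.

```python
# Mean number of talkers present besides the target child (from the text above).
MEAN_TALKERS = {"Yélî Dnye": 6.06, "North American English": 1.81}


def break_even_rate_ratio(talkers_a: float, talkers_b: float) -> float:
    """Per-talker rate of group A relative to group B at which totals match.

    Assumes total input = talkers * per-talker rate (an illustrative
    simplification, not the zero-inflated models used in the paper).
    Solving talkers_a * r_a == talkers_b * r_b for r_a / r_b gives the result.
    """
    return talkers_b / talkers_a


ratio = break_even_rate_ratio(
    MEAN_TALKERS["Yélî Dnye"], MEAN_TALKERS["North American English"]
)
print(f"Break-even per-talker rate ratio: {ratio:.2f}")
```

Under this toy assumption, a Yélî Dnye per-talker rate of roughly 30% of the North American English rate would already produce the same total, which is why a lower per-talker baseline and comparable raw totals are not contradictory.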

We tested this idea in a post-hoc analysis where we removed number and type of talker from the regression models, only leaving child age and language group as predictors (Supplementary Materials Section 4; see Footnote 7). To the extent that number of talkers and talker type are correlated with language group, their associated variance will be incorporated with language group effects in these simpler models (Wurm & Fisicaro, 2014), giving us an interpretive view closer to that implied by the by-language-group averages in Table 4. The simpler models suggested no evidence for a difference in TCDS rate between North American English and the other groups, and suggested significantly higher ADS in Yélî Dnye (and a significantly lower likelihood of a zero-ADS clip) compared to North American English. In sum, the results for Yélî Dnye look very different depending on whether variance in the number of talkers (and thereby variance in the quantity of TCDS/ADS) can be attributed to the language group (i.e., in the “simple” models) or whether it’s pulled out as a separate, nuisance predictor (i.e., in the primary models). This point is important for two reasons.
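A toy example may make this contrast between the “simple” and primary models concrete. The sketch below uses invented, noise-free data and ordinary least squares, not the zero-inflated mixed models or the corpus data analyzed here, so it illustrates only the general logic of omitting a correlated predictor. All values and names are hypothetical: group B has more talkers present and a lower baseline (−3), while every talker contributes 1.5 units of input.

```python
from statistics import mean

# Toy data (hypothetical): input = 1.5 * talkers, minus 3 for group B, no noise.
talkers = {"A": [1, 2, 2, 3], "B": [4, 6, 6, 8]}
PER_TALKER, GROUP_B_SHIFT = 1.5, -3.0
input_rate = {
    "A": [PER_TALKER * t for t in talkers["A"]],
    "B": [PER_TALKER * t + GROUP_B_SHIFT for t in talkers["B"]],
}

# "Simple" model (group only): the group effect is the difference in raw means.
simple_effect = mean(input_rate["B"]) - mean(input_rate["A"])

# Model with group + talkers: by the Frisch-Waugh-Lovell theorem, the OLS
# slope for talkers is the pooled within-group slope, and the group effect
# is whatever mean difference remains once the talkers part is removed.
sxy = sxx = 0.0
for g in talkers:
    x_bar, y_bar = mean(talkers[g]), mean(input_rate[g])
    for x, y in zip(talkers[g], input_rate[g]):
        sxy += (x - x_bar) * (y - y_bar)
        sxx += (x - x_bar) ** 2
slope = sxy / sxx
adjusted_effect = simple_effect - slope * (mean(talkers["B"]) - mean(talkers["A"]))

print(f"group-only effect of B: {simple_effect:+.1f}")
print(f"talkers slope: {slope:.1f}; adjusted effect of B: {adjusted_effect:+.1f}")
```

In this toy case the group-only comparison makes group B look higher (+3.0), whereas the model that separates out the number of talkers recovers the lower baseline (−3.0). The same data therefore support opposite-looking group effects depending on where the talker-related variance is assigned.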

First, the same data can lead to very different conclusions depending on what variance is treated as group-specific vs. not. From an ethnographic perspective, it may be completely valid to consider features like number and composition of talkers a part of children’s specific cultural and linguistic milieu. The number of talkers present, after all, likely relates to cultural practices around childcare (e.g., alloparenting), household organization (e.g., multigenerational housing), and daily activities (e.g., food preparation routines; see Gaskins (1999) and Casillas (2023) for more discussion of these issues). Put differently, variation in the number of talkers present can signal group-specific routines, practices, and interactional contexts. However, our perspective is that, in order to understand how these factors might generally and cross-culturally influence children’s linguistic input, we need to analyze them as (partly) separate from culture. Doing so gives us a glimpse into how basic processes of conversational coordination and caregiving may shape children’s input in broadly similar ways across diverse human groups and thus give us insight into how children learn language so robustly across widely varied home environments.

Second, and complementarily, a lack of group effects (beyond differences in the prevalence of multiparty talk) does not imply that early language environments are cross-culturally and cross-linguistically similar. Our measures represent highly simplified quantifications of two sources of linguistic input – one designed specifically for the target child and one designed for adults – but capture nothing about the content of naturalistic input or its integration into children’s interactive or multi-modal experiences (e.g., Bergelson et al., 2019a; Broesch et al., 2016; de León, 1998; Kuchirko et al., 2018; Montag, 2020; Rosemberg et al., 2020, 2023; Rowe, 2012). The measure we use here, while a crucial starting point, is too coarse to support detailed conclusions regarding qualitative similarities or differences in children’s early language experiences. To draw such conclusions, we would need much more than the present data can offer, including at least: (1) detailed generative models of how much input children encounter, from whom, and under what conditions – to which the present study contributes; (2) an understanding of the content of that input and how it fluctuates under different conditions; and (3) documentation of the local cultural, institutional, social, economic, and material realities that may radically change the experienced linguistic input.

Finally, we note that our findings do not cleanly divide between so-called “WEIRD” and “non-WEIRD” (Henrich et al., 2010) groups. For example, Yélî Dnye and Tseltal – the two rural subsistence communities represented here – do not pattern together in our data, and neither do the two historically related urban post-industrial populations – North American and UK English (see Cristia, 2023 for further discussion). This highlights the importance of considering each population in its own right when making claims about cultural and linguistic similarities and differences. While ultimately we hope to pinpoint areas of similarity and systematic variation in language development across a wide variety of developmental contexts, it is far too early to make universalist claims about patterns in children’s real-world language experiences. The WEIRD or non-WEIRD distinction, while helpful to illustrate cultural biases in behavioral research, can also unfortunately reinforce those same biases by grouping together very distinct cultural groups in opposition to Western, primarily North American, groups (for further discussion relating specifically to infant research, see Singh et al., 2023).

Limitations

There were minor methodological variations in sampling and in transcribers and transcription due to the logistical constraints in doing annotation across different labs and language settings – this minor variation is inevitable in comparative work on naturalistic interaction. We carefully considered these minor deviations, and have no reason to believe that they impacted our findings in any meaningful way. Of greater concern, however, is whether our collection of annotated clips constitutes enough data to reveal true underlying effects. We sampled randomly over the course of the daylong recording to capture a representative sample of young children’s input, which often includes “down” time moments. Given the diversity of populations in our metacorpus, random sampling was also the most straightforward way to ensure that our sampling method itself did not introduce confounds across corpora (e.g., if we had picked high-vocal-activity segments only or otherwise activity-centered moments like “play”). However, the highly zero-inflated nature of children’s daily experiences (Mendoza & Fausey, 2021) challenges our statistical approach and interpretations.

Best estimates to date suggest that our sample size (22.5–45 minutes per recording) is reasonable for obtaining preliminary stable estimates (Cychosz et al., 2020; Micheletti et al., 2020). For example, Marasli and Montag (2023) examine estimated versus true word counts from daylong recordings using a variety of random sampling schemes, finding that a total of 30 minutes of randomly sampled 1–5-minute clips yields accurate average word count estimates, with varying but symmetrical rates of error depending on clip duration (shorter is better). Word count and utterance duration are highly correlated, and utterance duration directly corresponds to our measure of quantity (see DeAnda et al., 2016; see also Räsänen et al., 2021 for evidence from the specific corpora used here). Therefore we consider the currently sampled data as sufficient for an accurate approximation of input rates from daylong recordings. However, we leave it to future work to refine this assumption based on a greater diversity of daylong recording types.
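For readers unfamiliar with this kind of sampling, the general idea of drawing a fixed budget of short random clips from a long recording can be sketched as follows. This is a minimal illustration and not the procedure used for any corpus in this study: the recording length, clip length, and the slot-based scheme (which restricts clip onsets to multiples of the clip length so that clips cannot overlap) are all assumptions chosen for simplicity, and the function name is hypothetical.

```python
import random


def sample_clips(recording_min: int, clip_min: int, total_min: int, seed: int = 0):
    """Return sorted (start, end) minute offsets of non-overlapping random clips.

    The recording is cut into back-to-back slots of `clip_min` minutes and
    slots are drawn without replacement, so clips can never overlap.
    """
    n_clips = total_min // clip_min
    n_slots = recording_min // clip_min
    if n_clips > n_slots:
        raise ValueError("recording too short for the requested sample")
    rng = random.Random(seed)  # seeded so the draw is reproducible
    slots = sorted(rng.sample(range(n_slots), n_clips))
    return [(s * clip_min, (s + 1) * clip_min) for s in slots]


# Hypothetical settings: a 12-hour recording, 2-minute clips, 30 minutes total.
clips = sample_clips(recording_min=12 * 60, clip_min=2, total_min=30)
print(len(clips), "clips,", sum(end - start for start, end in clips), "minutes")
```

Because slots are drawn uniformly across the whole recording, quiet stretches are as likely to be sampled as busy ones, which is the property that makes such samples representative but also zero-heavy.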

Indeed, while we may have sampled sufficiently within recordings to create stable estimates for each child, the present analyses would be more powerful if done over more recordings, with a greater number of language groups, and/or with a more systematic or theory-driven selection of cultural or linguistic contexts to study. These are persistent problems in the field of developmental science (e.g., Kosie & Lew-Williams, 2022; Oakes, 2017; Singh et al., 2023) and so, as usual, any null effects should be taken as preliminary.

Importantly, we see the present study as an initial assessment of differences between these populations in children’s home linguistic experiences, and do not believe that any single study should be considered the final word in comparisons of this nature. Indeed, another weakness of the current study is a lack of deep incorporation of existing ethnographic and language socialization claims about these populations (Brown & Gaskins, 2014; Gaskins, 2006; Ochs & Schieffelin, 1984). What our findings do highlight is that specific facets of behavioral patterns (e.g., housing arrangements, child caregivers, etc.) are visible in quantitative measures of children’s language environment in ways that allow us to identify axes of cross-linguistic and cross-cultural variation that are relevant for developing generalizable theories of language learning. By looking deeper at the local context for each dataset, we would better understand variation within each. For example, the construct of “socioeconomic status” is so different between these communities that it is hard to imagine a meaningful way of directly comparing between groups. Instead, within-population analyses that take into account individual and collective power within social hierarchies and relevant local institutions seem much more likely to shed light on socioeconomic effects across these corpora (see Footnote 8). We thus strongly urge readers to exercise caution in generalizing our results (or those of other researchers) beyond the current data, to new populations.

Finally, the present set of analyses examines input without taking into account patterns of target child vocalization or turn-taking with the target child, and without examining how overlapping vocalizations would change the estimates presented here (e.g., Broesch et al., 2016; de León, 1998; Donnelly & Kidd, 2021; Elmlinger et al., 2023; Kuchirko et al., 2018; Scaff et al., 2023). Examinations of the target children’s vocalizations and their active interaction with other talkers are outside of the scope of the present paper but are an active area of work by the present author team. Determining which overlapping vocalizations may be seriously degraded in children’s perception of their input (Erickson & Newman, 2017; Hall et al., 2002) is also beyond the scope of the present paper, complicated by the varying types and levels of background noise, the time spent outdoors, and the activity contexts in which overlap is embedded (e.g., two simultaneous adult conversations versus simultaneous chanting of a phrase by three children playing a game). Surely excluding all overlapping talk would reduce the estimates presented here, but we are unconvinced that doing so would contribute much more to our understanding than the current data do. For research directly considering this issue of overlapping speech and its impact on input estimates, we point readers to work by Scaff et al. (2023).

Conclusion

Our findings revealed that, across a diverse set of cultural and linguistic contexts, the quantity of input directed to children during the first three years is both relatively low and stable across age. Overhearable adult-directed input is much more available, but our preliminary evidence suggests that it decreases across age. Language group also impacts who input is likely to come from, especially when it comes to directed input from other children, which is more common in some groups than others. That said, women’s input predominates overall. Finally, the number of talkers who are present matters a great deal for the amount of language encountered, both target-child directed and adult-directed. These results add to a growing body of work quantifying the outsize role women’s input plays in children’s early language exposure across varied cultural and linguistic groups. They also highlight the fact that children’s relative exposure to input from other talker types – especially language from other children – is an important and understudied aspect of their early linguistic input. Finally, they underscore the importance of understanding how other aspects of everyday life drive patterns in language exposure (e.g., the number of others present), opening up pathways for future work to more precisely pinpoint the nature of these differences and their relationship to early language development.

Supplementary material

The supplementary material for this article can be found at http://doi.org/10.1017/S030500092400028X.

Acknowledgements

This research was supported by the Social Sciences and Humanities Research Council of Canada (435-2015-0628, 869-2016-0003) and by the Natural Sciences and Engineering Research Council of Canada (501769-2016-RGPDD) to Melanie Soderstrom; by the National Endowment for the Humanities (HJ-253479-17), National Institutes of Health Grant DP5-OD019812, and National Science Foundation BCS-1844710 to Elika Bergelson; a CONICET grant, PIP 80/2015, and a MINCyT grant, PICT 3327/2014 to Celia Rosemberg; and an NWO Veni Innovational Scheme Grant (275-89-033) to Marisa Casillas. We thank Anne Warlaumont and Caroline Rowland for contributing their datasets to this project and for helpful feedback on this manuscript. Finally, we thank the families who participated in the recordings that made this research possible.

Footnotes

1 The typical number of talkers present may vary systematically between populations (see Supplementary Materials Section 2 and Brown, 2011, 2014; Gaskins, 2006; Rosemberg et al., 2020). Therefore any number-of-talkers measure may partly capture cultural differences – not just within-participant variation. We suspect that, even for children in talker-dense populations, variation in the number of talkers present impacts how much CDS and ADS is observable at a given moment. In the present work, we thus add this factor as separate from cultural-linguistic group in the statistical models.

2 Alternatively, one could comprehensively annotate and analyze children’s daylong input (Montag, 2020), but manually annotating input at the utterance level in this way is a many-years-long undertaking (e.g., at a representative work-to-recording ratio of 60 minutes:1 minute, that would be roughly 43,839 work hours for the current sample of 69 children; approximately 23 full-time work years).
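The figures in this footnote can be unpacked with a short calculation. The sketch below only rearranges the numbers quoted above; the derived quantities (total audio hours, average audio per child, hours per full-time work year) are implied values computed here for illustration and are not reported in the paper.

```python
# Figures quoted in the footnote above.
WORK_HOURS = 43_839      # total annotation hours
WORK_PER_AUDIO = 60      # 60 minutes of work per 1 minute of audio
N_CHILDREN = 69
WORK_YEARS = 23

audio_hours = WORK_HOURS / WORK_PER_AUDIO         # hours of audio implied
audio_hours_per_child = audio_hours / N_CHILDREN  # implied average per child
hours_per_work_year = WORK_HOURS / WORK_YEARS     # implied full-time work year

print(f"{audio_hours:.2f} h of audio in total")
print(f"{audio_hours_per_child:.2f} h of audio per child")
print(f"{hours_per_work_year:.0f} work hours per full-time year")
```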

3 While the data collectively represent a linguistically and culturally diverse array of early language experiences, one might still consider the present data to be a convenience sample—drawn from pre-existing, available recordings.

4 To simplify the structure of our results, in our analyses below, we treat the largest corpus—North American English—as the reference level for language group, and the most studied talker type—women—as the reference-level for talker type. We in no way imply with this decision that the North American English data are the global default or norm. Rather, we mean to highlight which results are likely to have gone under-reported in past work. The Supplementary Materials include alternate models with all language groups as reference level to allow for further inspection and full pairwise comparisons between language groups.

5 For more information on the caregiving and early language environments of children acquiring Argentinian Spanish, Tseltal, and Yélî Dnye, we refer readers to other work (Brown, 1998, 2011, 2014; Brown & Casillas, 2025; Rosemberg et al., 2020).

6 Note also that while it is a somewhat common practice to exclude naptime from consideration in analyses of longform audio recording (e.g., Bergelson, Amatuni, et al., 2019a; Ramírez-Esparza et al., 2014), naptime is not a culturally appropriate construct in some of our sampled populations.

7 I.e., formula = min.p.hr ~ child.age + lang.grp + (1 | child.id), ziformula = ~ child.age + lang.grp.

8 While we do have indicators of maternal education for the recordings used here, we are unconvinced that this indicator is similarly meaningful across such economically and culturally diverse populations and so we do not use it in our analyses.

References

Akhtar, N., Jipson, J., & Callanan, M. A. (2001). Learning words through overhearing. Child Development, 72(2), 416–430.
Alam, F., Ramírez, L., & Migdalek, M. (2021). Other children’s words in the linguistic environment of infants and young children from distinct social groups in Argentina (Las palabras de otros niños en el entorno lingüístico de bebés y niños pequeños de distintos grupos sociales de Argentina). Journal for the Study of Education and Development, 44(2), 269–302. https://doi.org/10.1080/02103702.2021.1888489
Bang, J. Y., Kachergis, G., Weisleder, A., & Marchman, V. A. (2022). An Automated Classifier for Child-Directed Speech from LENA Recordings. In Gong, Y. & Kpogo, F. (Eds.), Proceedings of the 46th annual Boston University Conference on Language Development (pp. 48–61). Somerville, MA: Cascadilla Press.
Bateson, M. C. (1979). The epigenesis of conversational interaction: A personal account of research development. In Bullowa, M. (Ed.), Before speech (pp. 63–79). New York, NY: Cambridge University Press.
Bell, A. (1984). Language style as audience design. Language in Society, 13(2), 145–204.
Bergelson, E. (2016). Bergelson SEEDLingS HomeBank corpus. Doi, 10, T5PK6D.
Bergelson, E. (2020). The comprehension boost in early word learning: Older infants are better learners. Child Development Perspectives, 14(3), 142–149.
Bergelson, E., Amatuni, A., Dailey, S., Koorathota, S., & Tor, S. (2019a). Day by day, hour by hour: Naturalistic language input to infants. Developmental Science, 22, e12715. https://doi.org/10.1111/desc.12715
Bergelson, E., Casillas, M., Soderstrom, M., Seidl, A., Warlaumont, A. S., & Amatuni, A. (2019b). What do North American babies hear? A large-scale cross-corpus analysis. Developmental Science, 22, e12724. https://doi.org/10.1111/desc.12724
Bergelson, E., Soderstrom, M., Schwarz, I.-C., Rowland, C. F., Ramirez-Esparza, N. R., Hamrick, L., Marklund, E., Kalashnikova, M., Guez, A., Casillas, M., Benetti, L., Alphen, P. van, & Cristia, A. (2023). Everyday language input and production in 1,001 children from six continents. Proceedings of the National Academy of Sciences, 120(52), e2300671120. https://doi.org/10.1073/pnas.2300671120
Bergelson, E., & Swingley, D. (2018). Young infants’ word comprehension given an unfamiliar talker or altered pronunciations. Child Development, 89(5), 1567–1576.
Bornstein, M. H., Tal, J., Rahn, C., Galperin, C. Z., Pecheux, M.-G., Lamour, M., & Tamis-LeMonda, C. S. (1992). Functional analysis of the contents of maternal speech to infants of 5 and 13 months in four cultures: Argentina, France, Japan, and the United States. Developmental Psychology, 28(4), 593.
Broesch, T., Rochat, P., Olah, K., Broesch, J., & Henrich, J. (2016). Similarities and differences in maternal responsiveness in three societies: Evidence from Fiji, Kenya, and the United States. Child Development, 87(3), 700–711. https://doi.org/10.1111/cdev.12501
Brooks, M. E., Kristensen, K., van Benthem, K. J., Magnusson, A., Berg, C. W., Nielsen, A., Skaug, H. J., Maechler, M., & Bolker, B. M. (2017). Modeling zero-inflated count data with glmmTMB. bioRxiv. https://doi.org/10.1101/132753
Brown, P. (1998). Conversational structure and language acquisition: The role of repetition in Tzeltal adult and child speech. Journal of Linguistic Anthropology, 8(2), 197–221. https://doi.org/10.1525/jlin.1998.8.2.197
Brown, P. (2011). The cultural organization of attention. In Duranti, A., Ochs, E., & Schieffelin, B. B. (Eds.), Handbook of Language Socialization (pp. 29–55). Malden, MA: Wiley-Blackwell.
Brown, P. (2014). The interactional context of language learning in Tzeltal. In Arnon, I., Casillas, M., Kurumada, C., & Estigarribia, B. (Eds.), Language in interaction: Studies in honor of Eve V. Clark (pp. 51–82). Amsterdam, NL: John Benjamins.
Brown, P., & Casillas, M. (2025). Childrearing through social interaction on Rossel Island, PNG. In Fentiman, A. J. & Goody, M. (Eds.), Esther Goody revisited: Exploring the legacy of an original inter-disciplinarian (pp. XX–XX). New York, NY: Berghahn.
Brown, P., & Gaskins, S. (2014). Language acquisition and language socialization. In Enfield, N. J., Kockelman, P., & Sidnell, J. (Eds.), Handbook of Linguistic Anthropology (pp. 187–226). Cambridge, UK: Cambridge University Press. https://doi.org/10.1017/CBO9781139342872.010
Bruner, J. (1983). Child’s talk: Learning how to use language. New York, NY: Norton.
Cartmill, E. A., Armstrong, B. F., Gleitman, L. R., Goldin-Meadow, S., Medina, T. N., & Trueswell, J. C. (2013). Quality of early parent input predicts child vocabulary 3 years later. Proceedings of the National Academy of Sciences, 110(28), 11278–11283. https://doi.org/10.1073/pnas.1309518110
Casillas, M. (2023). Learning language in vivo. Child Development Perspectives, 17(1), 10–17.
Casillas, M., Bergelson, E., Warlaumont, A. S., Cristia, A., Soderstrom, M., VanDam, M., & Sloetjes, H. (2017a). A new workflow for semi-automatized annotations: Tests with long-form naturalistic recordings of children’s language environments. In Lacerda, F., House, D., Heldner, M., Gustafson, J., Strömbergsson, S., & Włodarczak, M. (Eds.), Proceedings of the 18th annual conference of the international speech communication association (INTERSPEECH 2017) (pp. 2098–2102). Stockholm, Sweden. https://doi.org/10.21437/Interspeech.2017-1418
Casillas, M., Brown, P., & Levinson, S. C. (2017b). Tseltal and Yélî Dnye HomeBank corpora.
Casillas, M., Brown, P., & Levinson, S. C. (2020). Early language experience in a Tseltal Mayan village. Child Development, 91(5), 1819–1835.
Casillas, M., Brown, P., & Levinson, S. C. (2021). Early language experience in a Papuan community. Journal of Child Language, 48, 792–814.
Casillas, M., & Cristia, A. (2019). A step-by-step guide to collecting and analyzing long-format speech environment (LFSE) recordings. Collabra: Psychology, 5(1), 24. https://doi.org/10.1525/collabra.209
Clark, H. H. (1996). Using language. Cambridge University Press.
Cooper, R. P., & Aslin, R. N. (1989). The language environment of the young infant: Implications for early perceptual development. Canadian Journal of Psychology, 43(2), 247–265.
Cristia, A. (2020). Language input and outcome variation as a test of theory plausibility: The case of early phonological acquisition. Developmental Review, 57, 100914.
Cristia, A. (2023). A systematic review suggests marked differences in the prevalence of infant-directed vocalization across groups of populations. Developmental Science, 26(1), e13265.
Cristia, A., Ganesh, S., Casillas, M., & Ganapathy, S. (2018). Talker diarization in the wild: The case of child-centered daylong audio-recordings. https://doi.org/10.21437/Interspeech.2018-2078
Cristia, A., Gautheron, L., & Colleran, H. (2023). Vocal input and output among infants in a multilingual context: Evidence from long-form recordings in Vanuatu. Developmental Science, e13375.
Cychosz, M., Villanueva, A., & Weisleder, A. (2020). Efficient estimation of children’s language exposure in two bilingual communities.
Dailey, S., & Bergelson, E. (2022). Language input to infants of different socioeconomic statuses: A quantitative meta-analysis. Developmental Science, 25(3), e13192.
DeAnda, S., Bosch, L., Poulin-Dubois, D., Zesiger, P., & Friend, M. (2016). The language exposure assessment tool: Quantifying language exposure in infants and children. Journal of Speech, Language, and Hearing Research, 59(6), 1346–1356.
de León, L. (1998). The emergent participant: Interactive patterns in the socialization of Tzotzil (Mayan) infants. Journal of Linguistic Anthropology, 8(2), 131–161.
de León, L. (2011). Language socialization and multiparty participation frameworks. In Duranti, A., Ochs, E., & Schieffelin, B. B. (Eds.), Handbook of Language Socialization (pp. 81–111). Malden, MA: Wiley-Blackwell. https://doi.org/10.1002/9781444342901.ch4
de León, L., & García-Sánchez, I. M. (2021). Language Socialization at the Intersection of the Local and the Global: The Contested Trajectories of Input and Communicative Competence. Annual Review of Linguistics, 7, 421–448. https://doi.org/10.1146/annurev-linguistics-011619-030538
Demuth, K., & Mputhi, T. S. (1979). Introduction to Sesotho: An oral approach (with language tapes).
Donnelly, S., & Kidd, E. (2021). The longitudinal relationship between conversational turn-taking and vocabulary growth in early language development. Child Development, 92(2), 609–625.
Dunn, J., & Shatz, M. (1989). Becoming a conversationalist despite (or because of) having an older sibling. Child Development, 399–410.
Elmlinger, S. L., Goldstein, M. H., & Casillas, M. (2023). Immature vocalizations simplify the speech of Tseltal Mayan and US caregivers. Topics in Cognitive Science, 15(2), 315–328.
Erickson, L. C., & Newman, R. S. (2017). Influences of background noise on infants and children. Current Directions in Psychological Science, 26(5), 451–457.
Fernald, A., Taeschner, T., Dunn, J., Papousek, M., de Boysson-Bardies, B., & Fukui, I. (1989). A cross-language study of prosodic modifications in mothers’ and fathers’ speech to preverbal infants. Journal of Child Language, 16(3), 477–501.
Foushee, R., Srinivasan, M., & Xu, F. (2021). Self-directed learning by preschoolers in a naturalistic overhearing context. Cognition, 206, 104415.
Gaskins, S. (1996). How Mayan parental theories come into play. In Harkness, S. & Super, C. M. (Eds.), Parents’ cultural belief systems: Their origins, expressions, and consequences (pp. 1183). New York, NY: Guilford Press.
Gaskins, S. (1999). Children’s daily lives in a Mayan village: A case study of culturally constructed roles and activities. In Göncü, A. (Ed.), Children’s Engagement in the World: Sociocultural Perspectives (pp. 25–60). Oxford: Berg.
Gaskins, S. (2006). Cultural perspectives on infant–caregiver interaction. In Enfield, N. J. & Levinson, S. C. (Eds.), Roots of Human Sociality: Culture, Cognition and Interaction (pp. 279–298). Oxford: Berg.
Greenwood, C. R., Thiemann-Bourque, K., Walker, D., Buzhardt, J., & Gilkerson, J. (2011). Assessing children’s home language environments using automatic speech recognition technology. Communication Disorders Quarterly, 32(2), 83–92. https://doi.org/10.1177/1525740110367826
Hall, J. W. III, Grose, J. H., Buss, E., & Dev, M. B. (2002). Spondee recognition in a two-talker masker and a speech-shaped noise masker in adults and children. Ear and Hearing, 23(2), 159–165.
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). Beyond WEIRD: Towards a broad-based behavioral science. Behavioral and Brain Sciences, 33(2–3), 111–135. https://doi.org/10.1017/S0140525X10000725
Hillairet de Boisferon, A., Dupierrix, E., Quinn, P. C., Lœvenbruck, H., Lewkowicz, D. J., Lee, K., & Pascalis, O. (2015). Perception of multisensory gender coherence in 6-and 9-month-old infants. Infancy, 20(6), 661674.Google Scholar
Hilton, C. B., Moser, C. J., Bertolo, M., Lee-Rubin, H., Amir, D., Bainbridge, C. M., Simson, J., Knox, D., Glowacki, L., Alemu, E., Galbarczyk, A., Jasienska, G., Ross, C. T., Neff, M. B., Martin, A., Cirelli, L. K., Trehub, S. E., Song, J., Kim, M., Schachner, A., … Mehr, S. A. (2022). Acoustic regularities in infant-directed speech and song across culturesNature Human Behaviour6(11), 15451556. https://doi.org/10.1038/s41562-022-01410-xGoogle Scholar
Hoff, E. (2003). The specificity of environmental influence: Socioeconomic status affects early vocabulary development via maternal speech. Child Development, 74(5), 13681378. https://doi.org/10.3389/fpsyg.2015.01492Google Scholar
Holler, J., Alday, P. M., Decuyper, C., Geiger, M., Kendrick, K. H., & Meyer, A. S. (2021). Competition reduces response times in multiparty conversation. Frontiers in Psychology, 12.Google Scholar
Hou, L. (2024). Giving oranges and puppies: Children’s production of directional verbs in an emerging sign language from Oaxaca. First Language, 122. https://doi.org/10.1177/01427237231221886Google Scholar
Houston, D. M., & Jusczyk, P. W. (2000). The role of talker-specific information in word segmentation by infants. Journal of Experimental Psychology: Human Perception and Performance, 26(5), 1570.Google Scholar
Kosie, J., & Lew-Williams, C. (2022). Open science considerations for descriptive research in developmental science. Infant and Child Development, e2377. https://doi.org/10.1002/icd.2377Google Scholar
Kuchirko, Y., Tafuro, L., & Tamis-LeMonda, C. S. (2018). Becoming a communicative partner: Infant contingent responsiveness to maternal language and gestures. Infancy, 23(4), 558576. https://doi.org/10.1111/infa.12222Google Scholar
Loukatou, G., Scaff, C., Demuth, K., Cristia, A., & Havron, N. (2022). Child-directed and overheard input from different speakers in two distinct cultures. Journal of Child Language, 49(6), 11731192.Google Scholar
Mannle, S., Barton, M., & Tomasello, M. (1992). Two-year-olds’ conversations with their mothers and preschool-aged siblings. First Language, 12(34), 57–71.
Marasli, Z., & Montag, J. L. (2023). Optimizing random sampling of daylong audio. In Goldwater, M., Anggoro, F., Hayes, B., & Ong, D. (Eds.), Proceedings of the 44th Annual Meeting of the Cognitive Science Society.
Martin, A., Schatz, T., Versteegh, M., Miyazawa, K., Mazuka, R., Dupoux, E., & Cristia, A. (2015). Mothers speak less clearly to infants than to adults: A comprehensive test of the hyperarticulation hypothesis. Psychological Science, 26(3), 341–347.
Masek, L. R., Ramirez, A. G., McMillan, B. T., Hirsh-Pasek, K., & Golinkoff, R. M. (2021). Beyond counting words: A paradigm shift for the study of language acquisition. Child Development Perspectives, 15(4), 274–280.
McDivitt, K., & Soderstrom, M. (2016). McDivitt HomeBank corpus.
Mendoza, J. K., & Fausey, C. M. (2021). Everyday music in infancy. Developmental Science, 24(6), e13122.
Meylan, S. C., & Bergelson, E. (2022). Learning through processing: Toward an integrated approach to early word learning. Annual Review of Linguistics, 8(1), 77–99. https://doi.org/10.1146/annurev-linguistics-031220-011146
Micheletti, M., de Barbaro, K., Fellows, M. D., Hixon, J. G., Slatcher, R. B., & Pennebaker, J. W. (2020). Optimal sampling strategies for characterizing behavior and affect from ambulatory audio recordings. Journal of Family Psychology.
Montag, J. L. (2020). New insights from daylong audio transcripts of children’s language environments. In Denson, S., Mack, M., Xu, Y., & Armstrong, B. C. (Eds.), Proceedings of the 42nd Annual Meeting of the Cognitive Science Society (pp. 3005–3011).
Oakes, L. M. (2017). Sample size, statistical power, and false conclusions in infant looking-time research. Infancy, 22(4), 436–469. https://doi.org/10.1111/infa.12186
Ochs, E., & Kremer-Sadlik, T. (2020). Ethical blind spots in ethnographic and developmental approaches to the language gap debate. Langage Et Société, (2), 39–67.
Ochs, E., & Schieffelin, B. (1984). Language acquisition and socialization: Three developmental stories and their implications. In Shweder, R. A. & LeVine, R. A. (Eds.), Culture theory: Essays on mind, self, and emotion (pp. 276–322). Cambridge University Press.
Oshima-Takane, Y., Goodz, E., & Derevensky, J. L. (1996). Birth order effects on early language development: Do secondborn children learn from overheard speech? Child Development, 67(2), 621–634.
Piot, L., Havron, N., & Cristia, A. (2022). Socioeconomic status correlates with measures of Language Environment Analysis (LENA) system: A meta-analysis. Journal of Child Language, 49(5), 1037–1051.
Pisani, S., Gautheron, L., & Cristia, A. (2021). Long-form recordings: From A to Z. Retrieved from https://bookdown.org/alecristia/exelang-book/. DOI: 10.5281/zenodo.6685828
Pye, C. (1986). Quiché Mayan speech to children. Journal of Child Language, 13(1), 85–100. https://doi.org/10.1017/S0305000900000313
Pye, C. (2017). The Comparative Method of Language Acquisition Research. University of Chicago Press.
Ramírez-Esparza, N., García-Sierra, A., & Kuhl, P. K. (2014). Look who’s talking: Speech style and social context in language input to infants are linked to concurrent and future speech development. Developmental Science, 17, 880–891. https://doi.org/10.1111/desc.12172
Ramírez-Esparza, N., García-Sierra, A., & Kuhl, P. K. (2017a). Look who’s talking NOW! Parentese speech, social context, and language development across time. Frontiers in Psychology, 8, 1008. https://doi.org/10.3389/fpsyg.2017.01008
Ramírez-Esparza, N., García-Sierra, A., & Kuhl, P. K. (2017b). The impact of early social interactions on later language development in Spanish–English bilingual infants. Child Development, 88(4), 1216–1234.
Räsänen, O., Seshadri, S., Lavechin, M., Cristia, A., & Casillas, M. (2021). ALICE: An open-source tool for automatic measurement of phoneme, syllable, and word counts from child-centered daylong recordings. Behavior Research Methods, 53, 818–835.
R Core Team. (2019). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from https://www.R-project.org/
Rogoff, B., Paradise, R., Arauz, R. M., Correa-Chávez, M., & Angelillo, C. (2003). Firsthand learning through intent participation. Annual Review of Psychology, 54(1), 175–203. https://doi.org/10.1146/annurev.psych.54.101601.145118
Rosemberg, C. R., Alam, F., Audisio, C. P., Ramirez, M. L., Garber, L., & Migdalek, M. J. (2020). Nouns and verbs in the linguistic environment of Argentinian toddlers: Socioeconomic and context-related differences. First Language, 40(2), 192–217.
Rosemberg, C. R., Alam, F., Ramirez, M. L., & Ibañez, M. I. (2023). Activity Contexts and Child-Directed Speech in Socioeconomically Diverse Argentinian Households. International Journal of Early Childhood, 55, pp. 1–25. https://doi.org/10.1007/s13158-022-00345-8
Rosemberg, C. R., Alam, F., Stein, A., Migdalek, M. J., Menti, A., & Ojea, G. (2015). Los entornos lingüísticos de niños pequeños en Argentina/Language Environments of Young Argentinean Children. CONICET (http://hdl.handle.net/11336/202111).
Rowe, M. L. (2008). Child-directed speech: Relation to socioeconomic status, knowledge of child development and child vocabulary skill. Journal of Child Language, 35(1), 185–205. https://doi.org/10.1017/S0305000907008343
Rowe, M. L. (2012). A longitudinal investigation of the role of quantity and quality of child-directed speech in vocabulary development. Child Development, 83(5), 1762–1774.
Rowe, M. L., & Weisleder, A. (2020). Language development in context. Annual Review of Developmental Psychology, 2, 201–223.
Rowland, C. F., Bidgood, A., Durrant, S., Peter, M., & Pine, J. M. (2018). The language 0-5 project. Unpublished Manuscript, University of Liverpool. Doi, 10.
Sacks, H., Schegloff, E. A., & Jefferson, G. (1978). A simplest systematics for the organization of turn taking for conversation. In Studies in the organization of conversational interaction (pp. 7–55). Elsevier.
Scaff, C., Casillas, M., Stieglitz, J., & Cristia, A. (2023). Characterization of children’s verbal input in a forager-farmer population using long-form audio recordings and diverse input definitions. Infancy, EarlyView, 1–20.
Schober, M. F., & Clark, H. H. (1989). Understanding by addressees and overhearers. Cognitive Psychology, 21(2), 211–232.
Shneidman, L. A. (2010). Language Input and Acquisition in a Mayan Village (PhD thesis). The University of Chicago.
Shneidman, L. A., Gaskins, S., & Woodward, A. (2016). Child-directed teaching and social learning at 18 months of age: Evidence from Yucatec Mayan and US infants. Developmental Science, 19(3), 372–381.
Shneidman, L. A., & Goldin-Meadow, S. (2012). Language input and acquisition in a Mayan village: How important is directed speech? Developmental Science, 15(5), 659–673. https://doi.org/10.1111/j.1467-7687.2012.01168.x
Singh, L., Cristia, A., Karasik, L. B., Rajendra, S. J., & Oakes, L. M. (2023). Diversity and representation in infant research: Barriers and bridges toward a globalized science of infant development. Infancy.
Smithson, M., & Merkle, E. C. (2013). Generalized linear models for categorical and continuous limited dependent variables. New York: Chapman & Hall/CRC. https://doi.org/10.1201/b15694
Snedeker, J., Geren, J., & Shafto, C. L. (2007). Starting over: International adoption as a natural experiment in language development. Psychological Science, 18(1), 79–87.
Soderstrom, M., Casillas, M., Bergelson, E., Rosemberg, C., Warlaumont, A. S., & Bunce, J. (2021). Developing a cross-cultural annotation system and MetaCorpus for studying infants’ real world language experience. Collabra: Psychology.
Soderstrom, M., Grauer, E., Dufault, B., & McDivitt, K. (2018). Influences of number of adults and adult:child ratios on the quantity of adult language input across childcare settings. First Language, 38(6), 563–581.
Soderstrom, M., & Wittebolle, K. (2013). When do caregivers talk? The influences of activity and time of day on caregiver speech and child vocalizations in two childcare environments. PloS One, 8, e80646. https://doi.org/10.1371/journal.pone.0080646
Sperry, D., Miller, P., & Sperry, L. (2015). Is there really a word gap. American Anthropological Association Annual Meeting, Denver, CO.
Stein, A., Menti, A. B., & Rosemberg, C. R. (2021). Socioeconomic status differences in the linguistic environment: A study with Spanish-speaking populations in Argentina. Early Years, 43(1), 31–45. https://doi.org/10.1080/09575146.2021.1904383
Vogt, P., Mastin, J. D., & Schots, D. M. A. (2015). Communicative intentions of child-directed speech in three different learning environments: Observations from the Netherlands, and rural and urban Mozambique. First Language, 35(4–5), 341–358. https://doi.org/10.1177/0142723715596647
Warlaumont, A. S., Pretzer, G. M., Mendoza, S., & Walle, E. A. (2016). Warlaumont HomeBank corpus.
Weisleder, A., & Fernald, A. (2013). Talking to children matters: Early language experience strengthens processing and builds vocabulary. Psychological Science, 24(11), 2143–2152. https://doi.org/10.1177/0956797613488145
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag New York. Retrieved from https://ggplot2.tidyverse.org
Wittenburg, P., Brugman, H., Russel, A., Klassmann, A., & Sloetjes, H. (2006). ELAN: A professional framework for multimodality research. Proceedings of the Fifth International Conference on Language Resources and Evaluation, 1556–1559.
Wurm, L. H., & Fisicaro, S. A. (2014). What residualizing predictors in regression analyses does (and what it does not do). Journal of Memory and Language, 72, 37–48. https://doi.org/10.1016/j.jml.2013.12.003
Figure 1. Summary of clip selection and annotation method across corpora.

Table 1. Details for the corpora in the dataset (Bergelson, 2016; Casillas et al., 2017b; McDivitt & Soderstrom, 2016; Rosemberg et al., 2015; Rowland et al., 2018; Warlaumont et al., 2016)

Table 2. Predictions for TCDS analysis. Asterisk indicates effects previously observed with daylong child language data (Casillas et al., 2020, 2021; Scaff et al., 2023)

Table 3. Predictions for ADS analysis. Asterisk indicates effects previously observed with daylong child language data (Casillas et al., 2020, 2021; Scaff et al., 2023)

Table 4. Average input rates per clip across participants for each language group. Note that these descriptive statistics are raw rates and therefore reflect overall differences between corpora without controlling for, e.g., number of talkers present, which are accounted for in the statistical analyses

Figure 2. Mean by-recording rates (min/hr) of TCDS (above) and ADS (below) across language groups and talker types. For example, the upper-leftmost datapoint shows a recording with an average of 10 minutes per hour of TCDS from women talkers in North American English. The left-to-right order of language groups within each of the six panels matches the order shown in the legend (NA English, UK English, Arg. Spanish, Tseltal, Yélî Dnye).

Figure 3. Coefficients with 95% confidence intervals from the count models of TCDS (left) and ADS (right) for all included fixed effects in the model with NA English and women set as the reference levels for language group and talker type, respectively. Intervals not overlapping with zero indicate significance. Color indicates population, ‘C’ and ‘M’ indicate effects related to child- and man-produced utterances, respectively. For example, both the left and the right panels show that both child- and man-produced input rates are significantly lower compared to the reference levels of woman-produced input. Note that the fixed effects included in each model are determined by the predictions laid out above separately for TCDS and ADS.

Supplementary material: File

Bunce et al. supplementary material
