Mandarin-learning toddlers’ sensitivity to noun phrase word order: An investigation of an early bias for language universals

Published online by Cambridge University Press:  22 August 2025

Lean Luo
Affiliation:
Department of Foreign Languages and Literatures, Tsinghua University, Beijing, China
Xiaolu Yang*
Affiliation:
Department of Foreign Languages and Literatures, Tsinghua University, Beijing, China
Stella Christie
Affiliation:
Department of Psychological and Cognitive Sciences, Tsinghua University, Beijing, China; Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing, China
Rushen Shi
Affiliation:
Département de Psychologie, Université du Québec à Montréal, Montréal, QC, Canada
*Corresponding author: Xiaolu Yang; Email: xlyang@mail.tsinghua.edu.cn

Abstract

The current study probes Mandarin-learning toddlers’ sensitivity to two grammatical noun phrase orders differing in typological markedness. With three visual fixation experiments, we find that by age 2;6, children distinguish the cross-linguistically common order – but not the typologically rare one – from an ungrammatical order; however, their sensitivity to the two grammatical orders does not differ significantly. Further, we conduct a corpus analysis and demonstrate that for early acquisition, both grammatical orders are neither sufficiently nor consistently supported in the linguistic input. The sensitivity patterns and input profile outlined in our study constitute the first step of testing, in a natural language setting, a bias for typologically common ordering discussed in the artificial language learning literature. Although the findings remain inconclusive, they underscore the potential for future investigations in this direction.

摘要

本研究聚焦汉语中两种类型学标记性程度不同的名词短语形式, 考察幼儿对语序的敏感性。通过三项注视实验, 我们发现 2 岁 6 个月大的汉语儿童能够区分类型学普遍语序和不合法语序, 但尚未表现出辨别类型学罕见语序与不合法语序的能力; 然而, 两种合法语序的敏感性差异并不显著。此外, 语料库分析表明, 就早期习得而言, 两种合法语序的输入均缺乏充分且稳定的优势。本研究基于语序敏感性及语言输入特征, 为人造语言学习研究中提出的类型学普遍语序偏好提供了自然语言角度的实证基础。尽管结论尚不明确, 我们的研究发现表明沿此方向进一步探讨该议题具有重要意义。

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

1. Introduction

Recurrent linguistic patterns, also known as language universals, are often taken as signals of the underpinnings of human language and mind. On the one hand, in the theoretical tradition that conceives of language as a cognitive computational system that can be formally characterized (e.g., Chomsky, 1965), cross-linguistically frequent cases are often considered candidates for transparent instantiations of well-defined underlying representations whose components are in some way pre-specified in language learners’ minds. On the other hand, following the alternative approach that tries to reduce linguistic regularities to general-purpose mechanisms, one could argue that such patterns emerge from cognitive primitives and/or processing principles (e.g., Hawkins, 1983). Artificial language learning experiments in various domains have attested to the cognitive privilege of these patterns, demonstrating some learner-internal bias for different language universals (see Culbertson, 2023 for a systematic review).

The current study aims to advance our understanding of the role that the documented learning advantage of cross-linguistically common regularities plays in child language acquisition. Examining whether the relevant findings of artificial language learning studies generalize to real-world acquisition can shed further light on the process of language learning, in particular on whether it reflects internal biases towards universal patterns, as the artificial language learning research suggests. Further, extending the investigation of language universals to the domain of child language could provide an additional perspective for future studies aiming to evaluate the nature of these patterns.

We present here an investigation into Mandarin-learning toddlers’ sensitivity to two complex noun phrase orders differing in typological markedness, which is related to a well-discussed phenomenon dating back to Greenberg (1963). In the next section, we begin with the typological and learning asymmetries among different forms of noun phrase, and then discuss the properties and acquisition of the relevant system in Mandarin Chinese. By doing so, we will show that Mandarin provides an appropriate window to probe the potential bias for universal noun phrase forms from a first-language perspective. In what follows, we will outline the present study and report our empirical findings. We first establish toddlers’ sensitivity to the two orders around 2;6 via three visual fixation experiments, and then, with a corpus analysis, outline the distributional properties of the two orders in the linguistic input. Finally, we will discuss the implications of our study.

2. Background

One early typological generalization in Greenberg (1963), well known as Universal 20 (U20), points out that the noun phrase internal word orders attested in natural language have an uneven distribution:

(1) Universal 20 (Greenberg, 1963): “When any or all of the items (demonstrative, numeral, and descriptive adjective) precede the noun, they are always found in that order. If they follow, the order is either the same or its exact opposite.”

The most common interpretation of (1) in the literature is that complex noun phrases involving the aforementioned internal elements in natural language are strictly limited to three forms, namely DEM-NUM-ADJ-N, N-ADJ-NUM-DEM, and N-DEM-NUM-ADJ. Though this eventually turns out to be too strong with the availability of data from more languages (see Cinque, 2005; Hawkins, 1983; and references cited there), some recent large-scale typological examinations do reveal an evidently skewed distribution in the noun phrase domain. For example, Dryer (2018) investigates the basic noun phrase order of 576 languages and reports the most frequent two to be DEM-NUM-ADJ-N (182 languages) and its mirror image N-ADJ-NUM-DEM (113 languages), notably exceeding the numbers of the other 22 logically possible combinations (e.g., the third most frequent order is found in 67 languages). This uneven distributional pattern, particularly the clear dominance of some attested orders, invites inquiries into possible underlying factors in human cognition.
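
The arithmetic behind these figures can be checked with a short sketch. It enumerates the 4! = 24 logically possible orders of the four categories and computes the share of the sample taken up by the two most frequent orders. Only the counts cited above (182, 113, and the sample size of 576) come from the text; the variable and function names are illustrative assumptions.

```python
from itertools import permutations

CATEGORIES = ("DEM", "NUM", "ADJ", "N")

# All logically possible linear orders of the four categories (4! = 24).
all_orders = ["-".join(p) for p in permutations(CATEGORIES)]

# Counts cited in the text for Dryer's (2018) sample; no other counts are assumed.
SAMPLE_SIZE = 576
reported_counts = {
    "DEM-NUM-ADJ-N": 182,
    "N-ADJ-NUM-DEM": 113,
}


def share(count, total=SAMPLE_SIZE):
    """Proportion of the language sample that shows a given order."""
    return count / total


# Orders other than the two most frequent ones.
remaining_orders = [o for o in all_orders if o not in reported_counts]

print(len(all_orders), "possible orders;", len(remaining_orders), "others")
for order, count in reported_counts.items():
    print(f"{order}: {share(count):.1%} of the sample")
```

On these figures, the two most frequent orders together account for roughly half of the sample, while the remaining languages are spread over the other 22 logically possible orders.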

A series of artificial language experiments attests to the cognitively privileged status of the universal noun phrase forms. In Culbertson and Adger (2014), the initial study of the series, an artificial language was designed to examine a potential bias in the learning of noun phrase internal word order. In that language, adjectives, numerals, and demonstratives all came after nouns linearly. In one of their experiments, English-speaking adults, as learners of the artificial language, were exposed to bigram fragments such as N-ADJ and N-DEM in the training phase. Though the participants were informed about the relative positions between N and the other (in this case, postnominal) categories, they remained agnostic about the ordering among different postnominal items. However, when asked to make an inference about it, they overwhelmingly preferred the cross-linguistically frequent forms of ordering (e.g., N-ADJ-DEM) over the infrequent ones (e.g., N-DEM-ADJ). To minimize the interference of the L1 backgrounds of the participants, more experiments following a similar logic were conducted, and the general finding was replicated among non-English-speaking populations using different artificial stimuli (Martin et al., 2019; Martin et al., 2020; Martin et al., 2024). These results point to an internal privilege for the universal forms of noun phrase that is active in language learning, though its exact nature remains an open issue (see Culbertson et al., 2020). If this privilege underlies language acquisition in general, we should expect to find it at play during natural language acquisition as well. Moreover, clarifying the role of this universal bias in child acquisition might help make available a testing ground, complementary to artificial language learning, for different theories associated with the bias. To our knowledge, no prior study has attempted to examine the issue from a first language perspective.

The noun phrase system in Mandarin Chinese constitutes one natural language test candidate on the issue under discussion. Complex noun phrases containing DEM, NUM, ADJ, and N in Mandarin typically come in two forms (see (2)), a point that has long been discussed in the literature (e.g., Ding et al., 1961; Lu, 1998; Sio, 2006; Zhang, 2015; among others).

As shown in (2), a typical complex noun phrase in Mandarin contains a numeral obligatorily followed by a classifier (CL), and an adjective appearing with a modification marker DE. Though bearing language-specific forms, CL and DE are cross-linguistically relevant. Classifiers provide units for counting, a grammatical function generally argued to be present and instantiated differently across languages (e.g., Borer, 2005). Further, the marker DE is often conceived to signal the structure of phrasal (rather than word-internal) modification in Mandarin (e.g., Fan, 1958), which is exactly the type of combinatorial property relevant to the cross-linguistic generalizations discussed above. These give grounds for situating the two forms in (2) in a typological frame. Viewed through this lens, then, Order 1, with ADJ being close to N, resembles the most frequent form attested in Dryer (2018) (also one of the orders alluded to in Greenberg’s (1963) U20; see above), while Order 2, in which ADJ comes before DEM, NUM, and N, is unattested as a basic order in any language investigated in Dryer’s study.

Though the contexts in which the two orders occur are not fully equivalent, the exact meaning difference between them is subtle and somewhat controversial in the literature (see Sio, 2006 among others for more discussion). The broad generalization, as already noted in Ding et al. (1961), is that barring any prosodic emphasis, adjectives in distinct positions tend to serve different functions. This can be seen from the examples below (adapted from Ding et al., 1961).

In (3a), which has the form of Order 1, the adjective serves only an unmarked, descriptive role, and thus the whole sentence receives a normal literal interpretation. By contrast, (3b), bearing an Order-2 noun phrase, is most naturally interpreted with an implicature of the existence of additional pens (marked by ‘↝’), in line with the restrictive use of the adjective, i.e., to differentiate the red object from the other non-red ones. However, (3c) lacks a similar non-literal meaning even though its adjective occupies the same position as in (3b), which demonstrates the subtlety of the form-meaning mapping in the two orders.

To our knowledge, existing research on Mandarin-learning children’s complex noun phrase knowledge is too scarce to be informative about the presence or absence of any potential bias during development. So far, there is only one directly relevant study that we are aware of, namely Lee and Wu (2019). Specifically, the authors examined four- and five-year-olds’ knowledge of the meaning difference between two modified noun phrase forms that were similar in complexity to the ones in (2). Though they did not include the DEM category, whose absence would arguably make the meaning or usage difference even more evident (e.g., noun phrases with DE-marked modifiers preceding numerals are more appropriately used to express specificity, but the same constraint does not apply when the DE-marked modifiers are placed after numerals; see Zhang, 2006), they found that children had trouble mapping the two forms onto their respective meanings in an adult-like way. This sets an important empirical foundation for our current study in revealing the possible delayed mastery of form-meaning mapping in the domain we focus on. What remains unknown, however, is the development of sensitivity to the legitimacy (i.e., grammaticality) of different forms of noun phrase, which supposedly takes place at an earlier age (for a general discussion on this point, see Naigles, 2002).

All these combined, we reason that the two Mandarin orders in (2) are suitable yet under-explored cases to investigate the topic of our focus. On the one hand, they justifiably constitute typologically common and rare noun phrase instantiations, respectively. On the other hand, the two orderings in question differ primarily in their forms, rather than meaning or usage, which young toddlers presumably fail to fully acquire in any case. It is therefore possible to compare the early sensitivity to the two orders on a formal basis. This also aligns with the focus of previous typological and artificial language learning research, and thus allows for an attempt to integrate the insights therein into the discussion of child language.

3. The present study

The literature reviewed above has pointed to the general expectation that the cross-linguistically common Order 1 in Mandarin (DEM-NUM-CL-ADJ-DE-N) may enjoy some acquisition or processing advantage as compared to Order 2 (ADJ-DE-DEM-NUM-CL-N). The current study explores this possibility by decomposing the claim into two testable components, namely, toddlers’ early sensitivity to the two orders in question and the relevant input information available to them. Our investigation centres around 2;6, as this is the age at which Mandarin-acquiring children are reported to demonstrate a reliable command over (at least some) items of different noun phrase internal categories (e.g., Hao et al., 2008; Lee, 2010; Li et al., 2010; Miao et al., 2020), hence satisfying the prerequisite for testing children’s sensitivity to the combinations of these items. In the meantime, we have chosen the full forms of the two orders as our primary investigation targets, though some partial forms, in particular the ones with NUM omitted, might be more common in daily speech. The first reason for our choice is to ensure that the forms being investigated are comparable to the complex noun phrases discussed in the typological literature, in which NUM, for instance, constitutes a central grammatical category. Second, because omission (e.g., of NUM) is prevalent in colloquial speech, child learners may initially parse categories that are frequently adjacent as unanalysed chunks (e.g., DEM-CL), which could in turn prevent us from examining the potential bias that is based on the compositional relations among different elements. We therefore focus on the full forms of the two orders and only consider partial structures when they are relevant (e.g., as potentially useful information in the input for the acquisition or processing of the full forms).

We measured toddlers’ sensitivity to the two Mandarin orders in three experiments. This part of the study adopted a preferential looking paradigm commonly known as the visual fixation procedure (Cooper & Aslin, 1990), which uses children’s looking (i.e., listening) preference as a probe to examine their discrimination between different types of linguistic stimuli. In the standard design, no referential context is available during the experiment, thus allowing us to concentrate on the formal comparison of the orders being tested. The methodology has been shown to be effective for testing children’s grammatical knowledge at 2;6 or an even younger age across languages (e.g., Shi et al., 2020; Wang et al., 2024; Ying et al., 2022). Following prior research with toddlers aged 2;6 (e.g., Shi et al., 2020), we pre-determined our target participant number to be 24 in each experiment. Furthermore, we constructed an additional order as an unacceptability baseline to probe children’s sensitivity to the two grammatical orders. Specifically, we chose an exceptionally restricted order in Mandarin Chinese, namely the one in (4), to serve the purpose.

In this order, the adjective, along with the modification marker DE, is put between the demonstrative and the numeral. This position is often considered unnatural, or even ungrammatical for adjectives or other modifiers marked by DE (see Lu, 1998; Zhang, 2015; among others). This suggests the plausibility of using it as a reference point of unacceptability (which we will refer to as ungrammaticality for the sake of simplicity and consistency) in our experiments to test early knowledge of the other two productive orders (see Footnote 1).

For the investigation of the input, we made use of the data available in the Child Language Data Exchange System (CHILDES) and its corresponding analytical device, the Computerized Language Analysis (CLAN) program (MacWhinney, 2000). Operating on one dataset representative of the age range of our focus (i.e., before 2;6) and one larger dataset composed of all annotated mainland Mandarin corpora in CHILDES, we calculated the frequencies of different structures, including the two forms of complex noun phrase, as well as their sub-parts or sub-structures.
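
The logic of such frequency counts can be illustrated with a minimal sketch. It is not the CLAN procedure used in the study: it is a plain Python stand-in that assumes each utterance has already been reduced to a list of category labels. The toy utterances, the tag inventory, and the function name `count_pattern` are hypothetical.

```python
# Category sequences of the two grammatical orders, as given in the text.
ORDER_1 = ("DEM", "NUM", "CL", "ADJ", "DE", "N")
ORDER_2 = ("ADJ", "DE", "DEM", "NUM", "CL", "N")


def count_pattern(utterances, pattern):
    """Count contiguous occurrences of `pattern` across tagged utterances."""
    width = len(pattern)
    hits = 0
    for tags in utterances:
        # Slide a window of the pattern's width over the utterance.
        for i in range(len(tags) - width + 1):
            if tuple(tags[i:i + width]) == tuple(pattern):
                hits += 1
    return hits


# Hypothetical, hand-made tag sequences (not corpus data).
toy_utterances = [
    ["DEM", "NUM", "CL", "ADJ", "DE", "N"],
    ["V", "DEM", "NUM", "CL", "ADJ", "DE", "N", "SFP"],
    ["ADJ", "DE", "DEM", "NUM", "CL", "N"],
    ["DEM", "CL", "N"],
]

print("Order 1:", count_pattern(toy_utterances, ORDER_1))
print("Order 2:", count_pattern(toy_utterances, ORDER_2))
# Sub-structures are counted the same way, e.g. DEM directly followed by CL.
print("DEM-CL:", count_pattern(toy_utterances, ("DEM", "CL")))
```

The same function handles full forms and sub-structures alike, which mirrors the idea of tallying both the complete orders and their sub-parts. Real corpus annotation is considerably richer than these flat label lists.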

With these measurements, we examine the presence/absence of the internal bias for universal patterns. Specifically, if Order 1 is inherently favoured, we may expect evidence for an early sensitivity around 2;6 that is not reducible to an input effect, while the emergence of Order-2 sensitivity hinges on support from the input. On the contrary, if the acquisition of the two forms is of equal stature (i.e., no bias of any sort is at work at an early age), the development of sensitivity to both should be a transparent correspondence to the input information available to children.

4. The three experiments: Testing toddlers’ sensitivity to different orders

In Experiment 1, we examined children’s differentiation between the typologically common grammatical order (Order 1) and the ungrammatical baseline order (Order 3), and in Experiment 2, the one between the marked grammatical order (Order 2) and Order 3. In addition, we contrasted the two grammatical orders in Experiment 3. Significant differences in looking time would indicate their successful discrimination, hence the sensitivity to the grammaticality-ungrammaticality contrast, between the two test orders presented to them. As we show below, with all results taken together, we will have a fuller picture of toddlers’ sensitivity to the two orders.

4.1. Experiment 1

4.1.1. Participants

A total of 24 Mandarin-learning toddlers (mean age: 2;6;13; age range: 2;5;15–2;8;17; 12 females) with no reported hearing or language disorders participated in the experiment. Before the experiment, written informed consent was obtained from their parents or guardians. We also collected some basic background information about the participants, including residential area, parental education level, and additional dialectal exposure. For cases where relevant information was provided, all participants lived in or near Beijing, the capital city of China, and the parental education level (i.e., the higher of the two parents’ levels, following Leech et al., 2017) was university-level or above. They all received Mandarin as their predominant language input, with minimal to no exposure to other dialects.

Another 16 participants were tested but excluded from the analysis due to ceiling looking (3, i.e., never turned their heads away throughout the experiment), parental interference during test trials (6), fussiness or inattentiveness (5, judged by non-experimenters who were blind to the experiment), and failure to complete (2).

4.1.2. Materials

Speech stimuli used in both the familiarization and test phases (see Section 4.1.3 below) comprised common structures and items that are either among the most frequent in input speech (measured with the Tong Corpus in Deng & Yip, 2018) or reported in the Wordbank database (Frank et al., 2017) to be spontaneously produced by more than 90% of toddlers aged 2;6 (Hao et al., 2008).

Familiarization stimuli were a set of sentences without any complex noun phrases, designed to familiarize participants with the nouns and adjectives in the test sentences while revealing nothing about the ordering of complex noun phrase internal elements. The adjectives appeared in the predicative position following a common adverb. Two nouns (dangao “cake” and xiaoxiong “bear”), each paired with an adjective (piaoliang “pretty” and keai “cute”), and two adverbs (zhen “really” and hao “very”) were used, yielding four distinct sentences in total. Exclamations and sentence final particles (SFP) were added so that they would sound more like utterances in child-directed speech. See (5) for an example sentence.

Test stimuli contained the same nouns and adjectives that were used in the familiarization. Test items were complex noun phrases in different forms: the grammatical and typologically common Order 1, and the additionally-constructed ungrammatical Order 3. The demonstrative na “that,” numeral yi “one,” and the general classifier ge were also chosen for their commonness in daily speech, with an aim to lower the processing load so that children could focus on parsing the overall word order difference of the stimuli. With the adjective-noun pairings in the familiarization phase unchanged, two items were produced for each test order. See (6) for one example of each order type.

Audio stimuli were recorded in a child-directed manner by a female native Mandarin speaker with varying intonations across different utterances. Familiarization sentences were produced directly, whereas all test items underwent cross-splicing, i.e., putting together two segments that were respectively spliced out from two different utterances. For Order-3 stimuli, which are probably ungrammatical for many native speakers, cross-splicing served to avoid any unnatural intonation that might emerge if they were read out directly. For instance, to get the ungrammatical (6b), we had the speaker produce two grammatical phrases with comparable lengths (see (7)), and the first part of (7a) and the second part of (7b) were conjoined to form (6b). We also controlled for the (supra-)segmental factors in the transitional parts by keeping the ending vowels of the last words in the first pieces and the initial consonants and tones of the second identical. Order-1 items were created in a similar way for the sake of experimental control.

A lip-synched puppet was used as the visual stimulus in both familiarization and test, as if she was articulating different utterances. Additionally, an animal animation accompanied by a piece of joyful music would appear at certain points during the experiment, serving as an attention-getter.

4.1.3. Design

The experiment consisted of a familiarization phase and a test phase. The design and auditory stimuli are displayed in Table 1.

Table 1. Auditory stimuli and design of Experiment 1

In the familiarization phase, all participants heard the same set of simple sentences with bare nouns and adjectives in the predicative position. A one-second interval was added between two adjacent familiarization sentences. In the test phase, Order-1 and Order-3 trials were presented to the participants in alternation, with each trial repeated five times. Each test trial had a maximum of six utterance tokens, also with an inter-stimulus interval of 1 second, yielding a total length of 23 seconds. The length was maintained identically for both trial types. The grammaticality of the initial test trial was counterbalanced across participants.

4.1.4. Procedure

Participants were tested individually in a soundproof chamber, sitting on their parents’ laps with their heads facing a hidden camera, through which the experimenter in another room could observe them. The experimenter was blind to all stimuli. The visual and audio stimuli were both played via the TV in front of the children, and the parents wore headphones playing masking music and were instructed not to interfere during the process. The procedure was run with an in-house program that allowed automatic trial presentation and looking time calculation on the basis of participants’ looking behaviour, which was recorded online by the experimenter’s pressing of computer buttons. All trials were infant-controlled: A trial started when the toddler participants looked at the screen, and terminated when they looked away for more than two consecutive seconds.

At the beginning of the familiarization phase, an attention-getter was presented, and the experimenter initiated the trial when toddlers fixated on it. The attention-getter reappeared automatically if children looked away for more than two consecutive seconds, and a new familiarization trial started when they looked back at the screen. Familiarization stopped when the accumulated looking time reached the pre-set threshold, i.e., 20 seconds. The following trials in the test phase proceeded in a similar manner: Each test trial stopped when toddlers looked away for more than 2 seconds or when the maximum length of the trial (i.e., 23 seconds) was reached. The experiment lasted for about 5 minutes.
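
The infant-controlled stopping rule for test trials can be sketched as follows. This is an illustrative reconstruction, not the in-house program used in the study. It assumes that looking is coded as a sorted list of non-overlapping (start, end) intervals in seconds from trial onset, that the first look starts at 0, and that looking time is the sum of looks, with look-away time excluded. Only the 2-second look-away criterion, the 23-second maximum, and the 20-second familiarization threshold come from the text; the function names are hypothetical.

```python
LOOK_AWAY_LIMIT = 2.0    # continuous look-away (s) that terminates a trial
MAX_TRIAL_LENGTH = 23.0  # maximum length of a test trial (s)
FAMILIARIZATION_THRESHOLD = 20.0  # accumulated looking (s) that ends familiarization


def trial_looking_time(looks, look_away_limit=LOOK_AWAY_LIMIT,
                       max_length=MAX_TRIAL_LENGTH):
    """Sum the looking time of one trial under the infant-controlled rule.

    `looks` holds (start, end) pairs for periods of looking at the screen.
    The trial ends at the first look-away longer than `look_away_limit`
    or when `max_length` is reached, whichever comes first.
    """
    total = 0.0
    previous_end = None
    for start, end in looks:
        if start >= max_length:
            break  # the trial had already reached its maximum length
        if previous_end is not None and start - previous_end > look_away_limit:
            break  # the preceding look-away exceeded the criterion
        end = min(end, max_length)  # clip looks that run past the maximum
        total += end - start
        previous_end = end
    return total


def familiarization_complete(accumulated_looking,
                             threshold=FAMILIARIZATION_THRESHOLD):
    """True once accumulated looking reaches the familiarization threshold."""
    return accumulated_looking >= threshold


# Hypothetical coding of one trial: the 3-second gap before the last look ends it.
example_looks = [(0.0, 5.0), (6.0, 9.5), (12.5, 15.0)]
print(trial_looking_time(example_looks))
```

In the example, the first two looks are counted because the gap between them is only 1 second, while the third look is discarded because the trial would already have terminated during the preceding 3-second look-away.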

4.1.5. Predictions

If toddlers at this age had already developed robust sensitivity to Order 1, they should be able to systematically discriminate the grammatical Order 1 from the ungrammatical Order 3, showing a significant looking time difference between the two types of test trials. We did not make prior predictions on the preferential direction (familiarity or novelty preference) as we found no study that was similar enough to serve as a reference. In the literature, both directions of preference have been reported, and factors such as age, task design, and stimulus difficulty could all be potential contributors to the final pattern (Hunter & Ames, 1988; see also Cyr & Shi, 2013; Thiessen & Saffran, 2003). In our case, a systematic preference in either direction would be taken as evidence for well-established sensitivity. If, on the contrary, the sensitivity was not stably in place at our test age, we should expect toddlers’ unsystematic looking behaviour as a group, which would result in the absence of a significant looking time difference.

4.1.6. Results and discussion

Figure 1 displays participants’ average looking time in Order-1 and Order-3 trials. As is customary for this paradigm, the first trial for each order type was excluded. The looking time in the remaining trials was compiled separately by order type.

Figure 1. Looking time (i.e., listening time) distribution, medians (solid lines in the boxplots), and means (horizontal dashed lines) for each trial type in Experiment 1. The dots represent data points of each individual, with a line connecting the two contributed by the same participant.

Following common practice in the literature, statistical analyses were performed on each toddler’s mean looking time per trial type, which abstracts away from trial-by-trial variance in child participants’ looking behaviour (for recent studies endorsing similar analyses, see Shi et al., 2020; Wang et al., 2024; among others). The looking time difference between Order-1 and Order-3 trials, with the 95% bias-corrected and accelerated bootstrap confidence interval based on 1,000 resamples (BCa 95% CI) going from 0.07 to 3.94, was significant in a paired t-test (t(23) = 2.126, p = .044, two-tailed, Cohen’s d = .430). Overall, toddlers listened longer to the grammatical (Order-1) trials (Mean: 15.25 s; SE = .81) than to the ungrammatical (Order-3) trials (Mean: 13.31 s; SE = 1.01). This preferential pattern showed a degree of consistency among the individual data, with the majority (17 out of 24) of the children tested demonstrating a grammaticality preference. The overall systematicity, hence, allows us to extrapolate beyond the individual preferential variation, which might be produced by the minority of children whose word order sensitivity was distinct from the general level of this age group. The variation attested also falls within the norm of other studies making similar group-level generalizations with the same paradigm (for one with comparable individual preferences specified, see Babineau & Christophe, 2022).
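
The shape of this analysis (drop the first trial of each order type, average the remaining trials per participant, then compare the paired means) can be sketched with the Python standard library. The data below are invented for illustration and are not the study’s data. The effect size computed here is d_z (mean difference divided by the standard deviation of the differences), which is one common choice for paired designs; the text does not state which formula the authors used, so it need not reproduce the reported values. The p-value and the BCa bootstrap interval are omitted because they require a statistics package.

```python
import math
from statistics import mean, stdev


def participant_means(trials_by_order):
    """Drop the first trial of each order type and average the rest."""
    return {order: mean(times[1:]) for order, times in trials_by_order.items()}


def paired_t(x, y):
    """Return (t, degrees of freedom, d_z) for two paired samples."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    sd = stdev(diffs)                     # sample SD of the paired differences
    t = mean(diffs) / (sd / math.sqrt(n))
    d_z = mean(diffs) / sd                # assumed effect size formula (d_z)
    return t, n - 1, d_z


# Hypothetical looking times (s) for one child, five trials per order type.
one_child = {
    "order1": [18.0, 12.0, 14.0, 16.0, 10.0],
    "order3": [17.0, 9.0, 8.0, 11.0, 12.0],
}
print(participant_means(one_child))

# Hypothetical per-participant means for a tiny sample of four children.
order1_means = [3.0, 5.0, 7.0, 9.0]
order3_means = [1.0, 2.0, 3.0, 4.0]
t, df, d_z = paired_t(order1_means, order3_means)
print(f"t({df}) = {t:.3f}, d_z = {d_z:.3f}")
```

Averaging within participants before testing means that each child contributes exactly one pair of values, so the test has n - 1 degrees of freedom, matching the t(23) reported for 24 participants.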

A natural interpretation of the results would be that the sensitivity to the typologically common Order 1 is well-developed around 2;6, hence enabling toddlers of this age to demonstrate a systematic discrimination. As mentioned earlier, a systematic group preference can indicate grammatical sensitivity, regardless of its direction. The familiarity effect (i.e., a preference for the grammatical order) attested in the present study is in line with findings of some previous studies employing the same paradigm (e.g., Babineau & Christophe, 2022; Van Heugten & Shi, 2010; among others). The difficulty of grammatical stimulus encoding could be one potential cause of a familiarity preference (see also Shi et al., 2020 for a similar discussion). It is possible that, given our current experimental design, the processing of our specific complex noun phrases was not entirely undemanding for the children that we tested. What matters the most for the generalization of Order-1 sensitivity, nonetheless, is that the discrimination effect was stable enough in this age group to be captured. This alone, however, says little about the deeper question we are interested in, that is, whether the cross-linguistically more common order is privileged in development due to the bias for language universals. Therefore, we need to test children’s sensitivity to the typologically rarer Order 2 and make a comparison. Following the same logic as in Experiment 1, we relied on Order 3 to test toddlers’ Order-2 sensitivity in Experiment 2.

4.2. Experiment 2

4.2.1. Participants

Another 24 Mandarin-learning toddlers aged around 2;6 (mean age: 2;6;14; age range: 2;5;17–2;8;12; 14 females) participated in Experiment 2. As in Experiment 1, basic family information was collected, and their background was comparable to that of the children in the previous experiment. Another 13 participants were tested but excluded due to ceiling looking (1), parental interference during test trials (5), fussiness or inattentiveness (5, judged by non-experimenters who were blind to the experiment), failure to complete (1), and experimenter mis-operation (1).

4.2.2. Materials, design, and predictions

The speech stimuli and experimental design were nearly identical to those in Experiment 1, except that all Order-1 utterances were replaced with the Order-2 counterparts, which were comparable in intonation and length, and cross-spliced in a similar way. The design and stimuli used in Experiment 2 are shown in Table 2.

Table 2. Auditory stimuli and design of Experiment 2

Following the logic in Experiment 1, we should anticipate a significant discrimination if the sensitivity to Order 2 was as well developed as the sensitivity to Order 1; conversely, if the Order-2 sensitivity was not robust enough around 2;6, children’s looking preferences would not be systematic as a group, and no significant difference in looking time should be expected.

4.2.3. Results and discussion

Results were sorted and analysed in the manner specified in Section 4.1.6. The looking time difference for Order-2 versus Order-3 trials (BCa 95% CI [−2.01, 3.77]) was not significant (t(23) = .831, p = .415, two-tailed, Cohen’s d = .216). As Figure 2 shows, though the overall listening time of the grammatical (Order-2) trials (Mean: 11.14 s; SE = .75) was still longer than that of the ungrammatical (Order-3) trials (Mean: 10.32 s; SE = .80), only 13 out of 24 participants demonstrated this preference. The non-significant results, as well as the greater preferential variation, revealed that the processing of the contrast between the two orders for children at this age was largely unsystematic, thus preventing reliable generalizations at the group level. Therefore, the overall results are interpreted as the absence of evidence for children’s Order-2 sensitivity.
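As a reading aid, the reported effect size can be approximately recovered from the summary statistics above. The sketch below assumes that each SD equals SE × √n and that the standardizer is the root mean square of the two condition SDs; this formula is an inference from the reported numbers, not one stated in the text, and the function name is hypothetical.

```python
import math

def d_from_summary(mean_a, se_a, mean_b, se_b, n):
    """Standardized mean difference from condition means and standard errors.

    ASSUMPTIONS (not stated in the paper): SD = SE * sqrt(n) per condition,
    and the standardizer is the root mean square of the two SDs.
    """
    sd_a = se_a * math.sqrt(n)
    sd_b = se_b * math.sqrt(n)
    sd_avg = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / sd_avg

# Reported Experiment 2 values: Order-2 trials vs. Order-3 trials, n = 24.
d_exp2 = d_from_summary(11.14, 0.75, 10.32, 0.80, n=24)
print(f"d (Experiment 2) = {d_exp2:.3f}")
```

Under these assumptions the value comes out close to the reported .216, which suggests (but does not confirm) that the effect size was standardized by the averaged condition variability rather than by the SD of the difference scores.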

Figure 2. Looking time (i.e., listening time) distribution, medians (solid lines in the boxplots), and means (horizontal dashed lines) for each trial type in Experiment 2. The dots represent data points of each individual, with a line connecting the two contributed by the same participant.

To further compare the toddlers’ discrimination performance in the two experiments, we combined all data and conducted a linear mixed-effects analysis, using the lme4 package (Bates et al., 2015) in R. Each participant’s mean looking time in seconds for a trial was the dependent variable, with trial type (grammatical versus ungrammatical, sum-coded as 1 and −1), experiment (Exp. 1 versus Exp. 2, also coded as 1 and −1), and their interaction being the fixed effects. Random intercepts for trial type by participants were also included. Since each participant’s looking time of the same trial type was averaged (see justifications outlined in Section 4.1.6), the model need not include additional by-participant random slopes or by-item variance. As shown in Table 3, in the model that compared results of both experiments, trial type only displayed a weak marginal effect, which was arguably driven mainly by the systematic discrimination attested in Experiment 1. Further, the lack of interaction effect between trial type and experiment showed that there was no evidence for a fundamental difference between toddlers’ performance in the two experiments. The remaining effects were not of interest for the current study (see Footnote 2).
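To make the ±1 coding concrete, the following minimal sketch shows how the fixed-effect point estimates relate to the four cell means in a balanced 2 × 2 design, where they should coincide with simple contrasts of those means. It does not reproduce the lme4 analysis (standard errors and p-values require participant-level data). The Experiment 1 means are hypothetical placeholders; only the Experiment 2 means are taken from the text, and the function name is an assumption.

```python
def sum_coded_effects(cell_means):
    """Point estimates for a balanced 2 x 2 design with both factors coded +1/-1.

    cell_means maps (experiment, trial_type) -> mean looking time in seconds,
    with experiment in {"exp1", "exp2"} and trial_type in {"gram", "ungram"}.
    """
    g1 = cell_means[("exp1", "gram")]
    u1 = cell_means[("exp1", "ungram")]
    g2 = cell_means[("exp2", "gram")]
    u2 = cell_means[("exp2", "ungram")]
    return {
        "intercept": (g1 + u1 + g2 + u2) / 4,        # grand mean of the cells
        "trial_type": ((g1 + g2) - (u1 + u2)) / 4,   # half the average gram-ungram gap
        "experiment": ((g1 + u1) - (g2 + u2)) / 4,   # half the Exp. 1 vs. Exp. 2 gap
        "interaction": ((g1 - u1) - (g2 - u2)) / 4,  # difference between the two gaps
    }

cells = {
    ("exp1", "gram"): 12.0,     # HYPOTHETICAL placeholder, not a reported value
    ("exp1", "ungram"): 10.0,   # HYPOTHETICAL placeholder, not a reported value
    ("exp2", "gram"): 11.14,    # reported Order-2 mean (Experiment 2)
    ("exp2", "ungram"): 10.32,  # reported Order-3 mean (Experiment 2)
}
effects = sum_coded_effects(cells)
for name, value in effects.items():
    print(f"{name}: {value:.3f}")
```

Read this way, the interaction term asks whether the grammatical-versus-ungrammatical gap differs between the two experiments, which is the comparison of interest here.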

Table 3. The fixed-effects output of the linear mixed model for the combined dataset

Note: Significance is indicated in bold.

Therefore, although a systematic preference was attested in Experiment 1 but not in Experiment 2, the statistical comparison between the two grammatical orders did not reach significance. In other words, the Order-2 stimuli did not seem to be treated as items that were fundamentally different from the Order-1 items. Before we elaborate more on children’s sensitivity to the two orders, one potential concern needs addressing. Some research in the literature suggests that young infants and toddlers seem to be sensitive to frequent chunks in the input (e.g., Skarabela et al., 2021), leading to the possibility that our attested (non-)discrimination behaviour was due to some “cheating strategies” that children employed by tuning in to some prominent local strings. For instance, one frequent local string, NUM-CL-N, which is present in both Orders 2 and 3, but not in Order 1, might explain our results for now: Children discriminated orders with NUM-CL-N from those without (Experiment 1), but did not discriminate when both orders contained the NUM-CL-N string (Experiment 2). We addressed this concern (and also other similar ones based on other local strings) by testing children’s discrimination between Order 1 and Order 2 in Experiment 3, which also allowed us to further compare Order 1 and Order 2 in a more direct manner.

4.3. Experiment 3

4.3.1. Participants

Similar to the previous two experiments, participants of Experiment 3 were also 24 Mandarin-learning toddlers aged around 2;6 (mean age: 2;6;14; age range: 2;5;10–2;8;8; 12 females). According to the family information collected, their background was also similar to the participants in the previous experiments (see descriptions in Section 4.1.1). Another 10 participants’ data were dropped for ceiling looking (1), parental interference during test trials (2), fussiness or inattentiveness (2, judged by non-experimenters who were blind to the experiment), failure to complete (4), and experimenter mis-operation (1).

4.3.2. Materials, design, and predictions

All details remained identical to those of Experiments 1 and 2 except that the two test orders were replaced by Order 1 and Order 2. The speech stimuli of the two orders were the ones used in the previous experiments (see Tables 1 and 2), so that no extra confound was introduced.

If toddlers did base their (non-)differentiation behaviour on the presence or absence of prominent local strings that were absent in Order 1 and present in Orders 2 and 3 (such as NUM-CL-N), we should expect a significant discrimination between Orders 1 and 2 as well. If, on the contrary, children did fully represent and process the test noun phrases, we are more likely to have an overall non-discrimination result, in line with the lack of interaction effect in the comparison between the former experiments, because Order 2 might not be treated as entirely different from Order 1 by children aged 2;6.

4.3.3. Results and discussion

The looking time difference between the two orders (BCa 95% CI [−3.02, 1.14]) was non-significant, as the paired t-test indicated (t(23) = −.812, p = .425, two-tailed, Cohen’s d = .220). A mixed looking pattern was attested, with 10 participants looking longer in the Order-1 trials and 14 in the Order-2 trials (Order 1: Mean: 10.23 s; SE = .66; Order 2: Mean: 11.00 s; SE = .77, see Figure 3).
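As a descriptive aside, the 10-versus-14 split can be set against chance with an exact sign test. This is not an analysis reported in the study; the sketch below only illustrates how unsystematic such a split is under the assumption that each child is equally likely to prefer either order. The function name is hypothetical.

```python
from math import comb

def sign_test_two_sided(k, n):
    """Exact two-sided binomial p-value for k 'successes' out of n under p = 0.5."""
    extreme = max(k, n - k)  # use the larger side of the split
    tail = sum(comb(n, i) for i in range(extreme, n + 1))
    return min(1.0, 2 * tail / 2 ** n)

# Reported split in Experiment 3: 10 toddlers looked longer at Order 1, 14 at Order 2.
p_value = sign_test_two_sided(14, 24)
print(f"two-sided sign test p = {p_value:.3f}")
```

The same function applies to the 13-out-of-24 split reported for Experiment 2.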

Figure 3. Looking time (i.e., listening time) distribution, medians (solid lines in the boxplots), and means (horizontal dashed lines) for each trial type in Experiment 3. The dots represent data points of each individual, with a line connecting the two contributed by the same participant.

The results were not predicted by the local-string-based explanation outlined above. In fact, the most appropriate interpretation would be that participants’ looking behaviour reflected their processing of fully represented noun phrases, as no local string could systematically explain the patterns we observed (see Footnote 3). Furthermore, the non-significant looking difference between Orders 1 and 2 again suggested that the two orders might not be treated completely differently by the age group we tested.

Summarizing the results of all experiments, we can see that children’s sensitivity to the two grammatical orders does not fundamentally differ – as shown in Experiment 3 and the comparison between Experiments 1 and 2. Before we derive any implications for the learning bias towards universal noun phrase ordering (qualified for Order 1 but not Order 2; see Section 2) based on these results, we still need to factor in the linguistic input available to children. Next, we evaluate whether there is a clear asymmetry between the two orders in the input.

5. A corpus analysis of the input

5.1. Methods

5.1.1. The datasets

In an attempt to represent the linguistic input available to the child learner (particularly by 2;6), we conducted our analyses on two sets of input data. First, we analysed the input speech by the age we tested in our experiments (i.e., 2;6) from the Tong corpus (Deng & Yip, 2018), a representative longitudinal Mandarin corpus that includes the age range of our focus in CHILDES. In total, 12,049 input utterances were extracted. Next, to increase the reliability of the patterns, we conducted another investigation based on a much larger dataset, composed of all annotated mainland Mandarin corpora containing input speech in CHILDES, including AcadLang (Zhou, n.d.), Erbaugh (Erbaugh, 1992), LiReading (Li, n.d.), Tong (Deng & Yip, 2018), Zhou1 (Zhou, 2001), Zhou2 (Li & Zhou, 2004), Zhou3 (Zhang & Zhou, 2009), ZhouAssessment (Zhou & Zhang, n.d.), ZhouDinner (Li & Zhou, 2015), and ZhouNarratives (Li & Zhou, 2011). A combination of all these corpora (referred to as “all corpora” henceforth) yielded 226,759 input utterances in total.

5.1.2. Two types of evidence

We considered two types of information, namely direct and indirect evidence, considering both the specific items we tested in experiments and other items falling into the same linguistic categories. By direct evidence, we refer to the input signature that can provide unambiguous information about the target structure under discussion. For the two noun phrase orders in Mandarin, we counted both fully and partially complex noun phrases that can provide such information about the two forms. Consider the following examples in (8) extracted from the input speech in the Erbaugh corpus (Erbaugh, 1992) in CHILDES. Even though they are not the full noun phrase forms we tested in our experiments, they provide unambiguous information about the two grammatical orders, respectively. Specifically, (8a) indicates that DE-marked adjectives can occur after NUM-CL (Order 1), and (8b) signals the possibility of ADJ-DE preceding NUM-CL (Order 2) (see Footnote 4).

As for indirect evidence, we refer to the data from which the target structure can be inferred. We considered one potentially prominent source: n-gram frequencies. We counted the bigrams and trigrams that exclusively occur in one of the two grammatical orders we focus on, based on the hypothesis that higher frequencies of the fragments of a complex form would result in toddlers’ higher sensitivity to it. Although it is shown in the literature that Mandarin-learning children at or even younger than our test age could go beyond linear sequence tracking and process input stimuli based on their linguistic structures (e.g., Ying et al., 2022), statistics of this type were still factored in given their potential significance in children’s initial linguistic development (e.g., Ngon et al., 2013; Skarabela et al., 2021) and possible impact on processing even for adults (e.g., Tremblay et al., 2011). In other words, the more prominent n-grams of a particular order might lead to a sensitivity advantage at 2;6, either by providing better developmental support or by facilitating the recognition of relevant items. Therefore, the n-gram frequencies calculated here were the totalities of the occurrences of the two orders’ different sub-parts, regardless of their linguistic status (e.g., whether the n-grams themselves are meaningful pieces, whether they are NPs or not, what syntactic environment they occur in, etc.) or position (whether the pieces occur at the beginning or end of the noun phrase) (see Footnote 5). Consider the bigram DE-N and trigram ADJ-DE-N. First, though they only occur in Order 1, they do not necessarily point to the grammaticality of Order 1 as they provide no clue about the relative position of different prenominal categories, i.e., DEM, NUM-CL, and ADJ-DE, when all of them co-occur. However, if n-grams like these occur very frequently in the input, in a way that far exceeds other n-grams like CL-N and NUM-CL-N (i.e., those that only exist in Order 2), then we shall expect the sensitivity to Order 1 to develop prior to the sensitivity to Order 2 (similar logic applies to n-grams only occurring in Order 2). Following studies exploiting these two types of evidence in the literature (see also Koulaguina et al., 2019; Shi et al., 2020; Yang, 2004), our calculations were based on token frequencies, i.e., the total number of all items (see Footnote 6).

5.1.3. The search procedure

We made use of the tagging at the morphological tier of the transcripts in our analysis. Aside from the uniformly annotated ADJ, NUM, CL, and N, we tracked demonstratives via their tags “det” and “pro:dem,” and identified DE via “nom,” “cleft,” “poss,” and also via adjectives glossed as “adj-ish,” which indicated the presence of an unsegmented DE. For the initial search, we used the built-in COMBO command in the CLAN program, which allowed one to extract all combinations of the items with certain annotations. The search operated on all non-child tiers, essentially picking out not only the adult speech specifically directed to children but also the ones around them. After the search, we manually excluded the cases that were mistakenly included in a systematic way due to tagging errors. We also extracted instances with the items we used in the experiment for the calculation of item-specific chunks. Finally, we added up the overall frequencies of different structures.
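The category-sequence matching at the heart of this procedure can be illustrated outside CLAN. The sketch below is not the COMBO search itself: it assumes, hypothetically, that each utterance has already been reduced to (tag, gloss) pairs, that the tag spellings match those quoted above, and that an adjective with an unsegmented DE carries a gloss ending in "-ish". All names and the example utterance are illustrative.

```python
# HYPOTHETICAL tag-to-category mapping based on the tags quoted in the text;
# the actual annotation inventory of the corpora may differ.
TAG_TO_CATEGORY = {
    "det": "DEM", "pro:dem": "DEM",
    "num": "NUM", "cl": "CL", "adj": "ADJ", "n": "N",
    "nom": "DE", "cleft": "DE", "poss": "DE",
}

def to_categories(tokens):
    """Map (tag, gloss) pairs to a flat list of category labels."""
    categories = []
    for tag, gloss in tokens:
        if tag == "adj" and gloss.endswith("-ish"):
            # ASSUMPTION: this gloss shape marks an adjective plus unsegmented DE.
            categories.extend(["ADJ", "DE"])
        else:
            categories.append(TAG_TO_CATEGORY.get(tag, "OTHER"))
    return categories

def contains(sequence, pattern):
    """True if pattern occurs as a contiguous sub-sequence of sequence."""
    size = len(pattern)
    return any(sequence[i:i + size] == pattern
               for i in range(len(sequence) - size + 1))

# Invented example utterance: NUM-CL followed by a DE-marked adjective and a noun.
utterance = [("num", "one"), ("cl", "CL"), ("adj", "big"), ("nom", "DE"), ("n", "apple")]
print(contains(to_categories(utterance), ["NUM", "CL", "ADJ", "DE"]))
```

Hits produced this way would still need the manual screening for tagging errors described above.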

5.1.4. Results and discussion

The results are summarized in Table 4. Given the much smaller size of the Tong corpus dataset, item-specific statistics were too limited to be informative. Therefore, here we only report category-based results for the Tong corpus dataset, and include both item-specific and category-based results for the larger-scale “all-corpora” dataset.

Table 4. Frequencies of different structures in the input based on both the specific items used in experiments and the grammatical categories involved; total number of child-directed utterances: 12,049 (the Tong corpus) & 226,759 (all corpora)

Note: Fully complex NPs and partially complex ones without DEM/NUM (e.g., (8)) were both counted as instances of complex form. As for n-grams, an occurrence of each listed chunk was counted as one token of indirect evidence. For example, one DEM-NUM-CL-ADJ-DE-N form would contribute one instance of Order-1 direct evidence, two Order-1 bigrams (CL-ADJ and DE-N), and three Order-1 trigrams (NUM-CL-ADJ, CL-ADJ-DE, and ADJ-DE-N).
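The worked example in the note can be checked mechanically. The sketch below derives, for each grammatical order, the bigrams and trigrams that do not occur in the other order, using the category sequences given for Orders 1 and 2 in the text. The function names are illustrative.

```python
ORDER_1 = ["DEM", "NUM", "CL", "ADJ", "DE", "N"]
ORDER_2 = ["ADJ", "DE", "DEM", "NUM", "CL", "N"]

def ngrams(sequence, n):
    """Set of contiguous n-grams (as tuples) in a category sequence."""
    return {tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)}

def exclusive_ngrams(target, other, n):
    """n-grams found in target but not in other."""
    return ngrams(target, n) - ngrams(other, n)

for n in (2, 3):
    only_1 = sorted("-".join(g) for g in exclusive_ngrams(ORDER_1, ORDER_2, n))
    only_2 = sorted("-".join(g) for g in exclusive_ngrams(ORDER_2, ORDER_1, n))
    print(f"{n}-grams only in Order 1: {only_1}")
    print(f"{n}-grams only in Order 2: {only_2}")
```

For Order 1 this yields the two bigrams and three trigrams listed in the note; for Order 2 it yields CL-N and NUM-CL-N, mentioned in Section 5.1.2, along with the chunks that span the ADJ-DE and DEM boundary.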

For direct evidence, we did not find any instances of a noun phrase containing the exact same items we used in the experiment. As for the ones with other lexical items, the occurrences of both structures were still very rare. Though the absolute frequencies for Order 1 (17 and 246) were numerically higher than those for Order 2 (2 and 47), they each accounted for fewer than 0.2% of total utterances in both datasets (though the estimates might have slight deviations given that an utterance could in principle contain more than one complex noun phrase). These extremely sparse amounts of data could very well be indistinguishable from the negligible (but inevitable) noise in the input. We evaluate the role of these data with reference to other independent findings in the literature (for a similar idea, see Yang, 2004). Ordering sensitivity acquired before 2;6 is found to be supported by a much larger amount of direct evidence. For instance, for the form of English wh-questions in which the wh-words are displaced to the beginning of the sentences, reliable acquisition has been attested in toddlers aged 1;6 (Perkins & Lidz, 2021), and accordingly, around 25% unambiguous signatures are available in the input. Meanwhile, unambiguous evidence that is proportionally similar to the one we report here is either found to result in acquisition delay (see Yang, 2004 and the references cited there) or argued to be trivial in explaining children’s early knowledge (see Koulaguina et al., 2019; Shi et al., 2020). Therefore, we reason that direct evidence plays a limited role in accounting for the word order sensitivity we reported earlier.
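The proportions behind the 0.2% figure follow directly from the counts reported above. The short sketch below recomputes them, treating each instance as if it came from a distinct utterance (the same simplification acknowledged in the text). The variable names are illustrative.

```python
# Direct-evidence counts and utterance totals as reported in the text.
DIRECT_EVIDENCE = {
    "Tong": {"utterances": 12049, "Order 1": 17, "Order 2": 2},
    "all corpora": {"utterances": 226759, "Order 1": 246, "Order 2": 47},
}

def percent_of_utterances(count, total):
    """Share of utterances (in percent), assuming one instance per utterance."""
    return 100 * count / total

shares = {}
for dataset, counts in DIRECT_EVIDENCE.items():
    for order in ("Order 1", "Order 2"):
        shares[(dataset, order)] = percent_of_utterances(counts[order], counts["utterances"])
        print(f"{dataset}, {order}: {shares[(dataset, order)]:.3f}% of utterances")
```

For comparison, the roughly 25% figure cited for English wh-questions is two orders of magnitude larger than any of these shares.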

For indirect evidence, Order 2 seemed to be slightly better supported overall. For chunks with the specific items we used, there were numerically more Order-1 trigrams (22, as compared to 7 for Order 2), but substantially more Order-2 bigrams (224, as compared to 35 for Order 1). As for category-based data, we consistently found more Order-2 n-grams. As Table 4 shows, Order-2 n-grams were larger in number both in the Tong corpus (867 and 289, as compared to 638 and 203 for Order 1) and with all corpora combined (16,888 and 7,033, as compared to 16,669 and 4,991 for Order 1).
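The size of the Order-2 advantage is easier to judge as a ratio. The sketch below uses the counts quoted above and assumes that, in each category-based pair, the first number is the bigram total and the second the trigram total; the text does not state this mapping explicitly, so the labels should be read as an assumption.

```python
# (Order-1 count, Order-2 count).
# ASSUMPTION: in each reported category-based pair, the first figure is the
# bigram total and the second the trigram total.
CATEGORY_COUNTS = {
    ("Tong", "bigram"): (638, 867),
    ("Tong", "trigram"): (203, 289),
    ("all corpora", "bigram"): (16669, 16888),
    ("all corpora", "trigram"): (4991, 7033),
}
# Item-specific counts; the bigram/trigram labels here are stated in the text.
ITEM_COUNTS = {
    "bigram": (35, 224),
    "trigram": (22, 7),
}

def order2_to_order1_ratio(pair):
    """How many Order-2 tokens there are per Order-1 token."""
    order1, order2 = pair
    return order2 / order1

for key, pair in CATEGORY_COUNTS.items():
    print(key, f"{order2_to_order1_ratio(pair):.2f}")
for key, pair in ITEM_COUNTS.items():
    print(("item-specific", key), f"{order2_to_order1_ratio(pair):.2f}")
```

A ratio above 1 favours Order 2; on these counts only the item-specific trigrams fall below 1.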

Therefore, children’s linguistic input does not display a clear advantage for either order in terms of early acquisition: The direct evidence tentatively points to the typologically common Order 1, but the data are too sparse and the advantage is not consistent when indirect evidence is taken into account. In the next section, we will discuss the implications of this input pattern as well as our experimental findings.

6. General discussion

The current study delves into the potential bias for language universals from the perspective of child language, by probing Mandarin-learning toddlers’ sensitivity to two grammatical noun phrase internal word orders differing in typological markedness, and the input statistics available to child learners. In the first part of our study, we tested children’s discrimination between different orders around 2;6 with three experiments. In Experiment 1, the typologically common grammatical Order 1, namely DEM-NUM-CL-ADJ-DE-N, was found to be systematically distinguished from the ungrammatical baseline Order 3 (DEM-ADJ-DE-NUM-CL-N). By contrast, the typologically rare grammatical Order 2, i.e., ADJ-DE-DEM-NUM-CL-N, was not reliably differentiated from the ungrammatical Order 3 (Experiment 2), nor from the grammatical Order 1 (Experiment 3) by toddlers aged around 2;6. In the second part, we showed with a corpus analysis, considering both the complex noun phrases approximating the two orders and their partial forms, that neither grammatical order showed a clear advantage for early acquisition in terms of linguistic input frequency.

With respect to the matter the present study sets out to test, i.e., the potential bias in learning for universal word orders in the noun phrase, our current findings remain inconclusive. Though the insufficient and inconsistent data revealed in the input suggest that children’s early sensitivity may be supported by factors beyond exposure to different word orders, it remains unclear whether its development is guided by a learning bias for language universals. According to our present data, the cross-linguistically common Order 1 is not fundamentally different from the typologically rare Order 2 in acquisition: The difference of (non-)discrimination effects between Experiments 1 and 2 was not statistically significant, and the two orders were not differentiated in Experiment 3.

However, the present study lays the foundation for future investigations of the learning bias for universal word orders in natural language acquisition. First, our corpus results have justified that the two orders, when in their full forms, serve as ideal test cases for the matter, because neither of them has a clear frequency advantage in the input. In other words, any clear acquisition asymmetry found at early stages of development between the two forms in future study is more likely to stem from factors internal to young children – for example, their preferences in organizing different noun phrase internal items, as attested in adults. Conversely, the total absence of such an asymmetry would point to the possibility that the learning bias observed in adults may gradually emerge later in development. Second, our experimental data also show that the visual fixation paradigm is suitable for the investigation of the issue in early toddlerhood as well. When young children are presented with the full forms of the two orders, they do seem to process the ordering of the complex noun phrases, as no local strings could systematically explain children’s looking patterns in our experiments.

Though our current comparisons fail to provide evidence favouring an acquisition difference between the two grammatical orders, our results do not constitute strong evidence against an early bias for language universals either, especially because only Order 1, but not Order 2, is found to be discriminated from an ungrammatical order in our study. It is possible that the acquisition distinction between the two orders is not observed due to independent reasons. Replication of the current findings and/or further investigation targeting children’s Order-2 knowledge around this age could help further clarify the reliability of the null results in Experiments 2 and 3, as well as the potential acquisition asymmetry between the two orders. For instance, one possibility is that the acquisition of Order 2’s formal legitimacy is taking place around our test age (i.e., 30 months), so some children in this age group treat Order 2 as grammatical while others do not. Therefore, it is harder to capture a differentiation between Order 2 and the ungrammatical Order 3, or between Order 2 and the grammatical Order 1 at the group level (see Footnote 7). Studies testing more children at the same age (and hence increasing the statistical power), or those examining slightly younger children, provided that they have the necessary lexical knowledge of the noun phrase items, may help illuminate the issue.

Another issue relevant to the potential learning bias that our study begins to explore is the abstractness of children’s ordering knowledge. The learning bias found in previous adult artificial language learning studies (Culbertson & Adger, 2014; Martin et al., 2019; Martin et al., 2020; Martin et al., 2024) is arguably category-based. In other words, to fully align with the bias discussed in the literature, one should provide evidence for children’s access to an abstract representation of a noun phrase that supports the universal orders. Even though the input frequency involving the particular items tested in our experiments does not seem to be a strong predictor for children’s sensitivity (especially to Order 1), our experiments were only based on a limited number of items, and therefore, not much inference can be made on this matter. Research on syntactic categorization has provided some hints for the abstractness of nominal structure in early grammar. For instance, studies investigating Mandarin-learning infants and toddlers have reported children’s ability to recognize novel items occurring after the DEM-CL chunk zhege “this-CL” as nouns (Ying et al., 2022; Zhang et al., 2015). Such findings implicate young children’s sensitivity to the syntactic structure of the noun phrase, and thereby the accessibility of some abstract knowledge of nominal structure from an early age. Future research that reveals or disproves a generalized characterization of nominal structure supporting the universal orders in child grammar would contribute to the understanding of the potential learning bias.

To conclude, the current study constitutes the first step in integrating child language acquisition insights with the inquiry of language universals in the domain of noun phrase internal word order. Though the results do not provide clear support for the bias for universal linguistic patterns in early acquisition, our work lays the foundation for future research to further examine the bias from the perspective of first language acquisition. Investigations following this line may help develop a more comprehensive understanding about the essence of this bias and, by extension, the underpinnings of child language acquisition in general.

Acknowledgements

This study was supported by the National Social Science Fund of China (21BYY019) to Xiaolu YANG. We are grateful to all families that participated in the study. We also thank Prof. Xiaoshi HU, Prof. Ting XU, and Prof. Yue JI for their comments on the study at various stages, and members of the Language Acquisition Lab and the Child Cognition Center at Tsinghua University who helped with data collection, including Miao MIAO, Deming SHI, Ziqi WANG, Jiarui ZHANG, Yanting LI, and Wenjia TAN. Parts of this work were reported at HSP 2023, IACL-29, and ICTEAP-4. Comments and suggestions from the conference participants are appreciated. The authors thank the anonymous reviewers for their valuable comments and constructive suggestions, which have helped improve the quality of this paper.

Data availability statement

Experiment data and corpus analysis codes that support the current study are available via the Open Science Framework:

https://osf.io/aufn5/?view_only=183c0d4533e04efdb27d8213825d2e8e.

Competing interests

The authors declare none.

Footnotes

1 We did not use a more incontrovertibly ungrammatical ordering, e.g., putting ADJ with the marker DE between NUM and CL, as that might draw toddlers’ attention to the evident disruption of prominent local combinations such as NUM-CL, to the extent that they no longer bother representing or processing the ordering of the full noun phrase.

2 As suggested by one reviewer, we also carried out an exploratory analysis to determine whether additional (though minimal) dialectal input reported by parents/guardians or the residential areas of the families (i.e., different regions in or near Beijing, which may to some extent reflect family income differences) could predict toddlers’ performance, but found no significant effects (p’s > .05).

3 One reviewer raises the possibility of explaining the results in terms of linear overlap among the three orders. This will call for a more precise definition of overlap, and more importantly, independent empirical evidence with respect to the kind of overlap children might be sensitive to. For instance, if overlap is defined in terms of the position of different categories, the overlap between Orders 1 and 3 is more considerable than the one between Orders 1 and 2 – the former two both start with DEM and end with N, while the latter two overlap only in the final category N. However, the former pair was discriminated (Experiment 1), but the latter was not (Experiment 3). In other words, the possibility does not seem to be straightforwardly in line with our results, but more future research is needed for a comprehensive evaluation.

4 However, we excluded from our calculation partial forms that are assumed, empirically and/or theoretically, to be structurally distinct from the full forms, since that will obviate the possibility of them being useful direct evidence for the structures in question. First, we did not include noun phrases without overt nouns, as these forms are developmentally and distributionally different from the ones with overt nouns in early child productions (Packard, 1988; see also Ji, 2006; Lee, 2010), which is also in line with the theoretical research arguing against the analysis that treats noun phrases without overt nouns as simple elided cases of the full forms (e.g., Zhu, 1966; among others). Second, we also excluded forms involving DE-less modification, since they are typically word-internal, rather than phrasal, modification (see Section 2). According to existing naturalistic production studies, there is no strong evidence suggesting that we should reject this assumption for early child grammar. For instance, modification forms with DE are attested before 2;0 (e.g., Hu, 2007; Ji, 2006), indicating that children do not systematically replace them with the formally simpler DE-less ones, likely due to their distinct grammatical status.

5 One could, in principle, adjust the weights of different n-grams according to different factors and/or do a more complex evaluation, but doing so would inevitably introduce additional assumptions about the child learner, many of which need careful (but not uncontroversial) justifications. Further, it has been shown above that no single local string could systematically explain our experimental results, so we only present the cumulative frequencies here (see Table 4) and leave more complicated evaluations for future research.

6 Type frequency (i.e., the number of distinct items) is also an important factor in acquisition research, especially when children’s ability to generalize abstract rules is at issue (e.g., Koulaguina & Shi, 2019). However, our experiments did not directly test the abstractness of children’s noun phrase representation. We therefore focus only on token frequencies (following the studies cited in the text) and will leave the matter of type frequency for future research.

7 Even though the present study focuses on children’s perception of word order, we conducted a simple search among the child productions by 2;6 in the Tong corpus following the procedure of our input analysis (see Section 5). Seven instances of (partially) complex noun phrases in the form of Order 1 were identified before 2;6, but only one instance approximating Order 2 was found and it occurred right at 2;6. Though we cannot infer much from such a small sample, the finding is consistent with our interpretation that Order 1 is acquired before 2;6, while the sensitivity to Order 2 only begins to emerge around this age.

References

Babineau, M., & Christophe, A. (2022). Preverbal infants’ sensitivity to grammatical dependencies. Infancy, 27(4), 648–662. https://doi.org/10.1111/infa.12466
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01
Borer, H. (2005). In name only. Oxford University Press.
Chomsky, N. (1965). Aspects of the theory of syntax. The MIT Press.
Cinque, G. (2005). Deriving Greenberg’s universal 20 and its exceptions. Linguistic Inquiry, 36(3), 315–332. https://doi.org/10.1162/0024389054396917
Cooper, R. P., & Aslin, R. N. (1990). Preference for infant-directed speech in the first month after birth. Child Development, 61(5), 1584. https://doi.org/10.2307/1130766
Cyr, M., & Shi, R. (2013). Development of abstract grammatical categorization in infants. Child Development, 84(2), 617–629. https://doi.org/10.1111/j.1467-8624.2012.01869.x
Culbertson, J. (2023). Artificial language learning. In Sprouse, J. (Ed.), The Oxford handbook of experimental syntax (pp. 271–300). Oxford University Press.
Culbertson, J., & Adger, D. (2014). Language learners privilege structured meaning over surface frequency. Proceedings of the National Academy of Sciences, 111(16), 5842–5847.
Culbertson, J., Schouwstra, M., & Kirby, S. (2020). From the world to word order: Deriving biases in noun phrase order from statistical properties of the world. Language, 96(3), 696–717.
Deng, X., & Yip, V. (2018). A multimedia corpus of child Mandarin: The Tong corpus. Journal of Chinese Linguistics, 46(1), 69–92.
Ding, S., Lü, S., Li, R., Sun, D., Guan, X., Fu, J., Huang, S., & Chen, Z. (1961). Xiandai Hanyu yufa jianghua [Lectures on modern Chinese grammar]. The Commercial Press.
Dryer, M. S. (2018). On the order of demonstrative, numeral, adjective, and noun. Language, 94(4), 798–833.
Erbaugh, M. (1992). The acquisition of Mandarin. In Slobin, D. (Ed.), The crosslinguistic study of language acquisition (pp. 373–455). Lawrence Erlbaum.
Fan, J. (1958). Xing-ming zuhe jian de zi de yufa zuoyong [The grammatical function of de in adjective-noun constructions]. Zhongguo Yuwen [Studies of the Chinese Language], (3), 213–217.
Frank, M. C., Braginsky, M., Yurovsky, D., & Marchman, V. A. (2017). Wordbank: An open repository for developmental vocabulary data. Journal of Child Language, 44(3), 677–694.
Greenberg, J. H. (1963). Some universals of grammar with particular reference to the order of meaning elements. In Greenberg, J. H. (Ed.), Universals of language (pp. 73–113). The MIT Press.
Hao, M., Shu, H., Xing, A., & Li, P. (2008). Early vocabulary inventory for Mandarin Chinese. Behavior Research Methods, 40(3), 728–733. https://doi.org/10.3758/BRM.40.3.728
Hawkins, J. A. (1983). Word order universals. Academic Press.
Hu, Y. (2007). The (non)referential use of nominals in the early speech of Mandarin-speaking children [MA Thesis, Hunan University].
Hunter, M. A., & Ames, E. W. (1988). A multifactor model of infant preferences for novel and familiar stimuli. In Rovee-Collier, C. & Lipsitt, L. P. (Eds.), Advances in infancy research (Vol. 5, pp. 69–95). Ablex.
Ji, S. (2006). Children’s acquisition of the nominal ‘de’ in Mandarin [MA Thesis, Tsinghua University].
Koulaguina, E., Legendre, G., Barrière, I., & Nazzi, T. (2019). Towards abstract syntax at 24 months: Evidence from subject-verb agreement with conjoined subjects. Language Learning and Development, 15(2), 157–176. https://doi.org/10.1080/15475441.2019.1571417
Koulaguina, E., & Shi, R. (2019). Rule generalization from inconsistent input in early infancy. Language Acquisition, 26(4), 416–435. https://doi.org/10.1080/10489223.2019.1572148
Lee, T. H. (2010). Nominal structure in early child Mandarin. In Wilder, C. & Åfarli, T. (Eds.), Chinese matters: From grammar to first and second language acquisition (pp. 75–109). Tapir Academic Press.
Lee, T. H.-T., & Wu, Z. (2019). The acquisition of nominal structure, word order and referentiality in Chinese: Corpus and experimental findings on the numeral phrase. In Hu, J. & Pan, H. (Eds.), Interfaces in grammar (pp. 301–340). John Benjamins Publishing Company. https://doi.org/10.1075/lfab.15.11lee
Leech, K. A., Rowe, M. L., & Huang, Y. T. (2017). Variations in the recruitment of syntactic knowledge contribute to SES differences in syntactic development. Journal of Child Language, 44(4), 995–1009.
Li, H., & Zhou, J. (2015). Study on dinner table talk of preschool children family in Shanghai [Master’s Thesis, East China Normal University].
Li, L. (n.d.). CHILDES Mandarin Li shared reading corpus. https://childes.talkbank.org/access/Chinese/Mandarin/LiReading.html
Li, L., & Zhou, J. (2011). Preschool children’s development of reading comprehension of picture storybook: From a perspective of multimodel meaning making [Doctoral dissertation, East China Normal University].
Li, P., Huang, B., & Hsiao, Y. (2010). Learning that classifiers count: Mandarin-speaking children’s acquisition of sortal and mensural classifiers. Journal of East Asian Linguistics, 19(3), 207–230. https://doi.org/10.1007/s10831-010-9060-1
Li, X. Y., & Zhou, J. (2004). The effects of pragmatic skills of mothers with different education on children’s pragmatic development [Master’s Thesis, Nanjing Normal University].
Lu, B. (1998). Left-right asymmetries of word order variation: A functional explanation [Doctoral dissertation, University of Southern California].
MacWhinney, B. (2000). The CHILDES project: Tools for analyzing talk. Lawrence Erlbaum Associates.
Martin, A., Adger, D., Abels, K., Kanampiu, P., & Culbertson, J. (2024). A universal cognitive bias in word order: Evidence from speakers whose language goes against it. Psychological Science, 35(3), 1–21.
Martin, A., Holtz, A., Abels, K., Adger, D., & Culbertson, J. (2020). Experimental evidence for the influence of structure and meaning on linear order in the noun phrase. Glossa: A Journal of General Linguistics, 5(1), 1–21.
Martin, A., Ratitamkul, T., Abels, K., Adger, D., & Culbertson, J. (2019). Cross-linguistic evidence for cognitive universals in the noun phrase. Linguistics Vanguard, 5(1), 20180072.
Miao, M., Yang, X., & Shi, R. (2020). Mandarin-learning two-year-olds’ online processing of classifier-noun agreement. In Brown, M. M. & Kohut, A. (Eds.), Proceedings of the 44th annual Boston University conference on language development (pp. 390–401). Cascadilla Press.
Naigles, L. R. (2002). Form is easy, meaning is hard: Resolving a paradox in early child language. Cognition, 86(2), 157–199. https://doi.org/10.1016/S0010-0277(02)00177-4
Ngon, C., Martin, A., Dupoux, E., Cabrol, D., Dutat, M., & Peperkamp, S. (2013). (Non)words, (non)words, (non)words: Evidence for a protolexicon during the first year of life. Developmental Science, 16(1), 24–34. https://doi.org/10.1111/j.1467-7687.2012.01189.x
Packard, J. L. (1988). The first-language acquisition of prenominal modification with de in Mandarin. Journal of Chinese Linguistics, 16(1), 31–54.
Perkins, L., & Lidz, J. (2021). Eighteen-month-old infants represent nonlocal syntactic dependencies. Proceedings of the National Academy of Sciences, 118(41), e2026469118. https://doi.org/10.1073/pnas.2026469118
Shi, R., Legrand, C., & Brandenberger, A. (2020). Toddlers track hierarchical structure dependence. Language Acquisition, 27(4), 397–409. https://doi.org/10.1080/10489223.2020.1776010
Sio, J. U. (2006). Modification and reference in the Chinese nominal [Doctoral dissertation, Leiden University].
Skarabela, B., Ota, M., O’Connor, R., & Arnon, I. (2021). ‘Clap your hands’ or ‘take your hands’? One-year-olds distinguish between frequent and infrequent multiword phrases. Cognition, 211, 104612.
Thiessen, E. D., & Saffran, J. R. (2003). When cues collide: Use of stress and statistical cues to word boundaries by 7- to 9-month-old infants. Developmental Psychology, 39(4), 706–716.
Tremblay, A., Derwing, B., Libben, G., & Westbury, C. (2011). Processing advantages of lexical bundles: Evidence from self‐paced reading and sentence recall tasks. Language Learning, 61(2), 569–613.
Van Heugten, M., & Shi, R. (2010). Infants’ sensitivity to non-adjacent dependencies across phonological phrase boundaries. The Journal of the Acoustical Society of America, 128(5), EL223–EL228. https://doi.org/10.1121/1.3486197
Wang, Z., Yang, X., & Shi, R. (2024). Mandarin-learning 19-month-old toddlers’ sensitivity to word order cues that differentiate unaccusative and unergative verbs. Journal of Child Language, 51(2), 249–270. https://doi.org/10.1017/S0305000922000629
Yang, C. D. (2004). Universal grammar, statistics or both? Trends in Cognitive Sciences, 8(10), 451–456.
Ying, Y., Yang, X., & Shi, R. (2022). Toddlers use functional morphemes for backward syntactic categorization. First Language, 42(3), 448–465. https://doi.org/10.1177/01427237221079137
Zhang, L., & Zhou, J. (2009). The development of mean length of utterance in Mandarin-speaking children. In Zhou, J. (Ed.), The application and development of international corpus-based research methods (pp. 40–58). Education Science Publishing House.
Zhang, N. N. (2006). Representing specificity by the internal order of indefinites. Linguistics, 44(1), 1–21. https://doi.org/10.1515/LING.2006.001
Zhang, Z., Shi, R., & Li, A. (2015). Grammatical categorization in Mandarin-Chinese-learning infants. Language Acquisition, 22(1), 104–115.
Zhang, N. N. (2015). Nominal-internal phrasal movement in Mandarin Chinese. The Linguistic Review, 32(2), 375–425.
Zhou, J. (2001). Pragmatic development of Mandarin speaking young children: From 14 months to 32 months [Doctoral dissertation, The University of Hong Kong].
Zhou, J. (n.d.). CHILDES Mandarin academic language corpus. https://childes.talkbank.org/access/Chinese/Mandarin/AcadLang.html
Zhou, J., & Zhang, Y. (n.d.). CHILDES Mandarin Zhou assessment corpus. https://childes.talkbank.org/access/Chinese/Mandarin/ZhouAssessment.html
Zhu, D. (1966). Guanyu “shuo ‘de’” [About the paper “on ‘de’”]. Zhongguo Yuwen [Studies of the Chinese Language], 1, 37–47.

Table 1. Auditory stimuli and design of Experiment 1

Figure 1. Looking time (i.e., listening time) distribution, medians (solid lines in the boxplots), and means (horizontal dashed lines) for each trial type in Experiment 1. The dots represent data points of each individual, with a line connecting the two contributed by the same participant.

Table 2. Auditory stimuli and design of Experiment 2

Figure 2. Looking time (i.e., listening time) distribution, medians (solid lines in the boxplots), and means (horizontal dashed lines) for each trial type in Experiment 2. The dots represent data points of each individual, with a line connecting the two contributed by the same participant.

Table 3. The fixed-effects output of the linear mixed model for the combined dataset

Figure 3. Looking time (i.e., listening time) distribution, medians (solid lines in the boxplots), and means (horizontal dashed lines) for each trial type in Experiment 3. The dots represent data points of each individual, with a line connecting the two contributed by the same participant.

Table 4. Frequencies of different structures in the input based on both the specific items used in experiments and the grammatical categories involved; total number of child-directed utterances: 12,049 (the Tong corpus) & 226,759 (all corpora)