1. Introduction
Recurrent linguistic patterns, also known as language universals, are often taken as signals of the underpinnings of human language and mind. On the one hand, in the theoretical tradition that conceives of language as a cognitive computational system that can be formally characterized (e.g., Chomsky, 1965), cross-linguistically frequent patterns are often considered candidates for transparent instantiations of well-defined underlying representations whose components are in some way pre-specified in language learners’ minds. On the other hand, under the alternative approach that seeks to reduce linguistic regularities to general-purpose mechanisms, such patterns can be argued to emerge from cognitive primitives and/or processing principles (e.g., Hawkins, 1983). Artificial language learning experiments in various domains have attested to the cognitive privilege of these patterns, demonstrating learner-internal biases for different language universals (see Culbertson, 2023 for a systematic review).
The current study aims to advance our understanding of the role that the documented learning advantage of cross-linguistically common regularities plays in child language acquisition. Examining whether the findings of artificial language learning studies generalize to real-world acquisition can shed further light on whether the process of language learning reflects internal biases towards universal patterns, as the artificial language learning research suggests. Further, extending the investigation of language universals to the domain of child language could provide an additional perspective for future studies evaluating the nature of these patterns.
We present here an investigation into Mandarin-learning toddlers’ sensitivity to two complex noun phrase orders that differ in typological markedness, a contrast related to a well-discussed phenomenon dating back to Greenberg (1963). In the next section, we begin with the uneven typological distribution and learnability of different noun phrase orders, and then discuss the properties and acquisition of the relevant system in Mandarin Chinese. In doing so, we show that Mandarin provides an appropriate window for probing the potential bias for universal noun phrase forms from a first-language perspective. We then outline the present study and report our empirical findings. We first examine toddlers’ sensitivity to the two orders around 2;6 in three visual fixation experiments, and then use a corpus analysis to characterize the distributional properties of the two orders in the linguistic input. Finally, we discuss the implications of our study.
2. Background
One early typological generalization in Greenberg (1963), known as Universal 20 (U20), states that the noun phrase internal word orders attested in natural languages are unevenly distributed:

The most common interpretation of (1) in the literature is that complex noun phrases containing these elements are strictly limited to three orders in natural languages, namely DEM-NUM-ADJ-N, N-ADJ-NUM-DEM, and N-DEM-NUM-ADJ. Although this claim turned out to be too strong once data from more languages became available (see Cinque, 2005; Hawkins, 1983; and references cited there), some recent large-scale typological surveys do reveal a clearly skewed distribution in the noun phrase domain. For example, Dryer (2018) examines the basic noun phrase order of 576 languages and reports the two most frequent orders to be DEM-NUM-ADJ-N (182 languages) and its mirror image N-ADJ-NUM-DEM (113 languages), both well ahead of the remaining 22 logically possible orders (e.g., the third most frequent order is found in 67 languages). This uneven distribution, particularly the clear dominance of a few attested orders, invites inquiry into possible underlying factors in human cognition.
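The figure of 22 remaining orders follows from simple combinatorics: four categories admit 4! = 24 linear orders, and setting aside Dryer’s two most frequent orders leaves 22. The minimal Python sketch below enumerates them; the category labels are the ones used in the text, and the helper names are our own.

```python
from itertools import permutations
from typing import List

# The four noun phrase internal categories covered by U20 and Dryer (2018).
CATEGORIES = ("DEM", "NUM", "ADJ", "N")

def all_orders() -> List[str]:
    """Return every logically possible linear order of the four categories."""
    return ["-".join(p) for p in permutations(CATEGORIES)]

def mirror(order: str) -> str:
    """Return the mirror image of an order, e.g. DEM-NUM-ADJ-N -> N-ADJ-NUM-DEM."""
    return "-".join(reversed(order.split("-")))

# The two most frequent basic orders reported by Dryer (2018).
MOST_FREQUENT = {"DEM-NUM-ADJ-N", "N-ADJ-NUM-DEM"}

orders = all_orders()
remaining = [o for o in orders if o not in MOST_FREQUENT]

print(len(orders), len(remaining))  # 24 22
print(mirror("DEM-NUM-ADJ-N"))      # N-ADJ-NUM-DEM
```

The `mirror` helper makes explicit why the two most frequent orders are described as mirror images of each other.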
A series of artificial language learning experiments provides empirical evidence for the cognitively privileged status of the universal noun phrase orders. In Culbertson and Adger (2014), the first study in the series, an artificial language was designed to probe a potential bias in learning noun phrase internal word order. In that language, adjectives, numerals, and demonstratives all followed the noun. In one of their experiments, English-speaking adults learning the artificial language were exposed to two-word fragments such as N-ADJ and N-DEM during training. Although the participants were thus informed about the position of each modifier relative to N (in this case, postnominal), they received no information about the relative order among the postnominal items. However, when asked to make an inference about it, they overwhelmingly preferred the cross-linguistically frequent orders (e.g., N-ADJ-DEM) over infrequent ones (e.g., N-DEM-ADJ). To minimize interference from participants’ L1 backgrounds, further experiments following a similar logic were conducted, and the general finding was replicated with other, non-English-speaking populations using different artificial stimuli (Martin et al., 2019; Martin et al., 2020; Martin et al., 2024). These results point to an internal privilege for the universal noun phrase orders that is active in language learning, though its exact nature remains an open issue (see Culbertson et al., 2020). If this privilege underlies language acquisition in general, we should expect to find it at play during natural language acquisition as well.
Moreover, clarifying the role of this universal bias in child acquisition might provide a testing ground, complementary to artificial language learning, for the different theories associated with the bias. To our knowledge, no prior study has examined the issue from a first-language perspective.
The noun phrase system of Mandarin Chinese constitutes a natural language test case for the issue under discussion. Complex noun phrases containing DEM, NUM, ADJ, and N in Mandarin typically come in two forms (see (2)), a point that has long been discussed in the literature (e.g., Ding et al., 1961; Lu, 1998; Sio, 2006; Zhang, 2015; among others).

As shown in (2), a typical complex noun phrase in Mandarin contains a numeral obligatorily followed by a classifier (CL), and an adjective accompanied by the modification marker DE. Though language-specific in form, CL and DE are cross-linguistically relevant. Classifiers provide units for counting, a grammatical function generally argued to be present, though instantiated differently, across languages (e.g., Borer, 2005). Further, the marker DE is often taken to signal phrasal (rather than word-internal) modification in Mandarin (e.g., Fan, 1958), which is exactly the type of combinatorial property relevant to the cross-linguistic generalizations discussed above. These considerations justify situating the two forms in (2) within a typological frame. Viewed through this lens, Order 1, with ADJ closest to N, corresponds to the most frequent order attested in Dryer (2018), DEM-NUM-ADJ-N (also one of the orders alluded to in Greenberg’s (1963) U20; see above), while Order 2, in which ADJ precedes DEM, NUM, and N, is unattested as a basic order in any language in Dryer’s sample.
Although the two orders are not fully interchangeable across contexts, the exact meaning difference between them is subtle and somewhat controversial in the literature (see Sio, 2006, among others, for more discussion). The broad generalization, already noted in Ding et al. (1961), is that, barring prosodic emphasis, adjectives in the two positions tend to serve different functions. This can be seen in the examples below (adapted from Ding et al., 1961).


In (3a), which has the form of Order 1, the adjective serves a purely descriptive, unmarked role, and the sentence receives a plain literal interpretation. By contrast, (3b), which contains an Order-2 noun phrase, is most naturally interpreted with an implicature that additional pens exist (marked by ‘↝’), in line with a restrictive use of the adjective, i.e., to distinguish the red object from other, non-red ones. However, the absence of a similar non-literal meaning in (3c), despite its having the same adjective position as (3b), shows how subtle the form-meaning mapping of the two orders is.
To our knowledge, research on Mandarin-learning children’s knowledge of complex noun phrases is too scarce to be informative about the presence or absence of any potential bias during development. The only directly relevant study we are aware of is Lee and Wu (2019). The authors examined four- and five-year-olds’ knowledge of the meaning difference between two modified noun phrase forms of complexity comparable to those in (2). Although they did not include the DEM category, whose absence arguably makes the meaning or usage difference even more evident (e.g., noun phrases with DE-marked modifiers preceding numerals are more appropriately used to express specificity, whereas the same constraint does not apply when DE-marked modifiers follow numerals; see Zhang, 2006), they found that children had trouble mapping the two forms onto their respective meanings in an adult-like way. This finding lays an important empirical foundation for the current study by revealing a possibly delayed mastery of form-meaning mapping in the domain we focus on. What remains unknown, however, is the development of sensitivity to the legitimacy (i.e., grammaticality) of different noun phrase forms, which presumably emerges at an earlier age (for a general discussion of this point, see Naigles, 2002).
Taken together, these considerations suggest that the two Mandarin orders in (2) are suitable yet under-explored cases for investigating our topic. On the one hand, they can justifiably be treated as typologically common and rare noun phrase instantiations, respectively. On the other hand, the two orders differ primarily in form rather than in meaning or usage, which young toddlers presumably have not fully acquired in any case. It is therefore possible to compare early sensitivity to the two orders on a formal basis. This also aligns with the focus of previous typological and artificial language learning research, and thus allows us to integrate insights from that work into the discussion of child language.
3. The present study
The literature reviewed above points to the general expectation that the cross-linguistically common Order 1 in Mandarin (DEM-NUM-CL-ADJ-DE-N) may enjoy an acquisition or processing advantage over Order 2 (ADJ-DE-DEM-NUM-CL-N). The current study explores this possibility by decomposing the claim into two testable components, namely, toddlers’ early sensitivity to the two orders and the relevant input available to them. Our investigation centres on 2;6, the age at which Mandarin-acquiring children are reported to demonstrate reliable command of (at least some) items from the different noun phrase internal categories (e.g., Hao et al., 2008; Lee, 2010; Li et al., 2010; Miao et al., 2020), thus satisfying the prerequisite for testing children’s sensitivity to combinations of these items. We chose the full forms of the two orders as our primary targets, although some partial forms, in particular those without NUM, may be more common in daily speech. The first reason for this choice is to ensure that the forms investigated are comparable to the complex noun phrases discussed in the typological literature, in which NUM, for instance, is a central grammatical category. Second, because omission (e.g., of NUM) is prevalent in colloquial speech, child learners may initially parse frequently adjacent categories as unanalysed chunks (e.g., DEM-CL), which could prevent us from examining a potential bias based on the compositional relations among the elements. We therefore focus on the full forms of the two orders and consider partial structures only when they are relevant (e.g., as potentially useful input information for acquiring or processing the full forms).
We measured toddlers’ sensitivity to the two Mandarin orders in three experiments. This part of the study used a preferential looking paradigm commonly known as the visual fixation procedure (Cooper & Aslin, 1990), which uses children’s looking (i.e., listening) preference as a probe of their discrimination between different types of linguistic stimuli. In the standard design, no referential context is available during the experiment, which allows us to concentrate on the formal comparison of the orders tested. The method has proven effective for testing children’s grammatical knowledge at 2;6 or younger across languages (e.g., Shi et al., 2020; Wang et al., 2024; Ying et al., 2022). Following prior research with toddlers aged 2;6 (e.g., Shi et al., 2020), we predetermined a target sample size of 24 participants per experiment. Furthermore, we constructed an additional order to serve as an unacceptability baseline against which to probe children’s sensitivity to the two grammatical orders. Specifically, we chose an order that is highly restricted in Mandarin Chinese, namely the one in (4).

In this order, the adjective, together with the modification marker DE, is placed between the demonstrative and the numeral (DEM-ADJ-DE-NUM-CL-N). This position is often considered unnatural, or even ungrammatical, for adjectives and other DE-marked modifiers (see Lu, 1998; Zhang, 2015; among others). This makes it a plausible reference point of unacceptability (which we will refer to as ungrammaticality for simplicity and consistency) for testing early knowledge of the two productive orders.Footnote 1
To investigate the input, we used data from the Child Language Data Exchange System (CHILDES) and its accompanying analysis tool, the Computerized Language Analysis (CLAN) program (MacWhinney, 2000). Working with one dataset representative of the age range of interest (i.e., before 2;6) and a larger dataset comprising all annotated mainland Mandarin corpora in CHILDES, we calculated the frequencies of various structures, including the two complex noun phrase forms as well as their sub-parts or sub-structures.
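Once utterances have been reduced to sequences of category labels, counting a full form and its sub-structures amounts to counting contiguous n-grams. The Python sketch below assumes such a tagging step has already been carried out (it does not reproduce CLAN’s own commands), and the label inventory and the five-utterance toy corpus are invented for illustration, not drawn from CHILDES.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple

def count_ngrams(utterances: Iterable[Sequence[str]], n: int) -> Counter:
    """Count contiguous n-grams of category labels across all utterances."""
    counts: Counter = Counter()
    for utt in utterances:
        labels = tuple(utt)
        for i in range(len(labels) - n + 1):
            counts[labels[i:i + n]] += 1
    return counts

# Full forms of the two orders, encoded as category sequences.
ORDER_1: Tuple[str, ...] = ("DEM", "NUM", "CL", "ADJ", "DE", "N")
ORDER_2: Tuple[str, ...] = ("ADJ", "DE", "DEM", "NUM", "CL", "N")

# Invented toy input (hypothetical tags); "V" stands for any verb.
toy_corpus = [
    ("DEM", "NUM", "CL", "ADJ", "DE", "N"),
    ("DEM", "CL", "N"),
    ("NUM", "CL", "N"),
    ("ADJ", "DE", "DEM", "NUM", "CL", "N"),
    ("V", "NUM", "CL", "ADJ", "DE", "N"),
]

six = count_ngrams(toy_corpus, 6)
three = count_ngrams(toy_corpus, 3)
print(six[ORDER_1], six[ORDER_2])  # 1 1
print(three[("NUM", "CL", "N")])   # 2
```

In a real analysis, the toy list would be replaced by the tagged utterances from each dataset, and partial structures such as DEM-CL or NUM-CL-N can be read off the same counters.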
With these measurements, we examine whether an internal bias for universal patterns is present. Specifically, if Order 1 is inherently favoured, we may expect evidence of early sensitivity to it around 2;6 that cannot be reduced to an input effect, whereas the emergence of Order-2 sensitivity should hinge on support from the input. Conversely, if the two forms are acquired on an equal footing (i.e., no bias of any sort is at work at an early age), the development of sensitivity to both should transparently reflect the input available to children.
4. The three experiments: Testing toddlers’ sensitivity to different orders
In Experiment 1, we examined children’s discrimination between the typologically common grammatical order (Order 1) and the ungrammatical baseline order (Order 3), and in Experiment 2, between the typologically marked grammatical order (Order 2) and Order 3. In Experiment 3, we contrasted the two grammatical orders directly. A significant difference in looking time between the two test orders would indicate that children discriminated them, and hence, in Experiments 1 and 2, that they were sensitive to the grammaticality contrast. As we show below, the results taken together give a fuller picture of toddlers’ sensitivity to the two orders.
4.1. Experiment 1
4.1.1. Participants
A total of 24 Mandarin-learning toddlers (mean age: 2;6;13; age range: 2;5;15–2;8;17; 12 females) with no reported hearing or language disorders participated in the experiment. Before the experiment, written informed consent was obtained from their parents or guardians. We also collected basic background information about the participants, including residential area, parental education level, and additional dialectal exposure. Among the families that provided this information, all participants lived in or near Beijing, the capital city of China, and parental education level (i.e., the higher of the two parents’ levels, following Leech et al., 2017) was university level or above. All participants received Mandarin as their predominant language input, with minimal or no exposure to other dialects.
Another 16 participants were tested but excluded from the analysis due to ceiling looking (3, i.e., never turned their heads away throughout the experiment), parental interference during test trials (6), fussiness or inattentiveness (5, judged by non-experimenters who were blind to the experiment), and failure to complete (2).
4.1.2. Materials
The speech stimuli used in both the familiarization and test phases (see Section 4.1.3) consisted of common structures and of items that are either among the most frequent in input speech (measured with the Tong Corpus in Deng & Yip, 2018) or reported in the Wordbank database (Frank et al., 2017) to be spontaneously produced by more than 90% of toddlers aged 2;6 (Hao et al., 2008).
The familiarization stimuli were a set of sentences without complex noun phrases, designed to familiarize participants with the nouns and adjectives in the test sentences without revealing anything about the order of elements within complex noun phrases. The adjectives appeared in predicative position following a common adverb. Two nouns (dangao "cake" and xiaoxiong "bear"), each paired with an adjective (piaoliang "pretty" and keai "cute"), and two adverbs (zhen "really" and hao "very") were used, yielding four distinct sentences in total. Exclamations and sentence-final particles (SFPs) were added so that the sentences would sound more like child-directed speech. See (5) for an example sentence.

The test stimuli contained the same nouns and adjectives as the familiarization stimuli. Test items were complex noun phrases in two orders: the grammatical and typologically common Order 1, and the additionally constructed ungrammatical Order 3. The demonstrative na "that," the numeral yi "one," and the general classifier ge were also chosen for their frequency in daily speech, with the aim of lowering processing load so that children could focus on parsing the overall word order difference between stimuli. Keeping the adjective-noun pairings from the familiarization phase, we produced two items for each test order. See (6) for one example of each order type.

Audio stimuli were recorded in a child-directed manner by a female native speaker of Mandarin, with varying intonation across utterances. Familiarization sentences were recorded directly, whereas all test items were created by cross-splicing, i.e., joining two segments spliced out of two different utterances. For Order-3 stimuli, which are likely ungrammatical for many native speakers, cross-splicing served to avoid any unnatural intonation that might arise if they were read aloud directly. For instance, to create the ungrammatical (6b), we had the speaker produce two grammatical phrases of comparable length (see (7)), and joined the first part of (7a) to the second part of (7b). We also controlled for (supra-)segmental factors at the junction by keeping identical the final vowel of the last word in the first segment and the initial consonant and tone of the second segment. Order-1 items were created in the same way for the sake of experimental control.

A lip-synched puppet served as the visual stimulus in both familiarization and test, appearing to articulate the different utterances. In addition, an animal animation accompanied by a piece of cheerful music appeared at certain points during the experiment as an attention-getter.
4.1.3. Design
The experiment consisted of a familiarization phase and a test phase. The design and auditory stimuli are displayed in Table 1.
Table 1. Auditory stimuli and design of Experiment 1

In the familiarization phase, all participants heard the same set of simple sentences with bare nouns and predicative adjectives, separated by one-second intervals. In the test phase, Order-1 and Order-3 trials were presented in alternation, with each trial type presented five times. Each test trial contained a maximum of six utterance tokens, again separated by one-second inter-stimulus intervals, for a total length of 23 seconds; this length was identical for both trial types. The grammaticality of the first test trial was counterbalanced across participants.
4.1.4. Procedure
Participants were tested individually in a soundproof booth, sitting on a parent’s lap and facing a hidden camera through which the experimenter, in another room, could observe them. The experimenter was blind to the stimuli. The visual and auditory stimuli were presented on a TV screen in front of the child, and the parents wore headphones playing masking music and were instructed not to interfere. The procedure was run with an in-house program that presented trials automatically and calculated looking times from participants’ looking behaviour, which the experimenter coded online by pressing computer keys. All trials were infant-controlled: a trial started when the toddler looked at the screen and ended when the toddler looked away for more than two consecutive seconds.
At the beginning of the familiarization phase, an attention-getter was presented, and the experimenter initiated the first trial when the toddler fixated on it. The attention-getter reappeared automatically whenever the child looked away for more than two consecutive seconds, and a new familiarization trial started when the child looked back at the screen. Familiarization ended when accumulated looking time reached the preset threshold of 20 seconds. Test trials proceeded in a similar manner: each test trial ended when the toddler looked away for more than two consecutive seconds or when the maximum trial length (23 seconds) was reached. The experiment lasted about 5 minutes.
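The infant-controlled stopping rule can be expressed as a small function over a coded gaze record. This Python sketch is not the in-house program; it assumes the experimenter’s key presses have been converted into fixed-rate samples (10 per second here, an assumption), that the trial begins with the child looking, and that time spent looking away before the 2-second criterion is reached is not counted as looking time, a common convention the text does not spell out.

```python
from typing import Sequence

def trial_looking_time(gaze: Sequence[bool], sample_s: float = 0.1,
                       lookaway_s: float = 2.0, max_trial_s: float = 23.0) -> float:
    """Accumulated on-screen looking time (s) for one infant-controlled trial.

    gaze[i] is True when the child looks at the screen during sample i.
    The trial ends once the child has looked away for MORE than lookaway_s
    consecutive seconds, or when max_trial_s has elapsed.
    """
    max_samples = round(max_trial_s / sample_s)
    lookaway_limit = round(lookaway_s / sample_s)
    looking = 0
    away_run = 0
    for on_screen in gaze[:max_samples]:
        if on_screen:
            looking += 1
            away_run = 0  # a look back resets the look-away clock
        else:
            away_run += 1
            if away_run > lookaway_limit:  # strictly more than 2 s away
                break
    return round(looking * sample_s, 6)

print(trial_looking_time([True] * 50 + [False] * 21 + [True] * 100))  # 5.0
```

The familiarization criterion could be modelled the same way by summing `trial_looking_time` over successive trials until the total reaches 20 seconds.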
4.1.5. Predictions
If toddlers at this age had already developed robust sensitivity to Order 1, they should systematically discriminate the grammatical Order 1 from the ungrammatical Order 3, showing a significant looking time difference between the two types of test trials. We made no a priori prediction about the direction of preference (familiarity or novelty), as we found no sufficiently similar study to serve as a reference. Both directions of preference have been reported in the literature, and factors such as age, task design, and stimulus difficulty could all contribute to the final pattern (Hunter & Ames, 1988; see also Cyr & Shi, 2013; Thiessen & Saffran, 2003). In our case, a systematic preference in either direction would be taken as evidence of well-established sensitivity. If, instead, the sensitivity was not yet stably in place at the test age, toddlers’ looking behaviour should be unsystematic at the group level, resulting in no significant looking time difference.
4.1.6. Results and discussion
Figure 1 displays participants’ looking times in Order-1 and Order-3 trials. As is customary for this paradigm, the first trial of each order type was excluded, and looking times in the remaining trials were compiled separately by order type.
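The exclusion-and-averaging step can be sketched in Python as follows, assuming each participant’s trials are listed in presentation order as (participant, order type, looking time) records; the record layout, identifiers, and toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# (participant_id, order_type, looking_time_s), listed in presentation order.
Trial = Tuple[str, str, float]

def participant_means(trials: List[Trial]) -> Dict[str, Dict[str, float]]:
    """Mean looking time per participant and order type, first trial of each type dropped."""
    by_cell: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for pid, order_type, looking_time in trials:
        by_cell[(pid, order_type)].append(looking_time)
    result: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (pid, order_type), times in by_cell.items():
        kept = times[1:]  # exclude the first trial of this order type
        if kept:
            result[pid][order_type] = mean(kept)
    return dict(result)

# Toy data for one participant: five alternating trials per order type.
toy = [
    ("p01", "order1", 20.0), ("p01", "order3", 22.0),
    ("p01", "order1", 15.0), ("p01", "order3", 12.0),
    ("p01", "order1", 14.0), ("p01", "order3", 13.0),
    ("p01", "order1", 16.0), ("p01", "order3", 11.0),
    ("p01", "order1", 15.0), ("p01", "order3", 14.0),
]
print(participant_means(toy))  # {'p01': {'order1': 15.0, 'order3': 12.5}}
```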

Figure 1. Looking time (i.e., listening time) distribution, medians (solid lines in the boxplots), and means (horizontal dashed lines) for each trial type in Experiment 1. The dots represent data points of each individual, with a line connecting the two contributed by the same participant.
Following common practice in the literature, statistical analyses were performed on each toddler’s mean looking time per trial type, which abstracts away from the natural trial-to-trial variability in young children’s looking behaviour (for recent studies using similar analyses, see Shi et al., 2020; Wang et al., 2024; among others). Overall, toddlers listened longer to the grammatical (Order-1) trials (mean = 15.25 s; SE = .81) than to the ungrammatical (Order-3) trials (mean = 13.31 s; SE = 1.01). This looking time difference, whose 95% bias-corrected and accelerated bootstrap confidence interval based on 1,000 resamples (BCa 95% CI) ranged from 0.07 to 3.94, was significant in a paired t-test (t(23) = 2.126, p = .044, two-tailed, Cohen’s d = .430). The preference was fairly consistent across individuals, with the majority of the children tested (17 out of 24) showing a grammaticality preference. This group-level systematicity allows us to generalize beyond individual variation in preference, which may reflect the minority of children whose word order sensitivity differed from the general level of this age group. The variation observed also falls within the range reported in other studies that draw similar group-level generalizations with the same paradigm (for one that specifies comparable individual preferences, see Babineau & Christophe, 2022).
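As an illustration of the paired comparison, the Python sketch below computes the paired t statistic (df = n − 1, i.e., 23 for 24 toddlers) and two common variants of Cohen’s d from per-participant means. Which variant underlies the reported d values is not stated in the text, and the toy numbers are invented. The p-value additionally requires the t distribution, available for example through `scipy.stats.ttest_rel`, which this standard-library-only sketch deliberately avoids.

```python
from math import sqrt
from statistics import mean, stdev
from typing import List, Sequence

def _diffs(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Per-participant differences; both samples must be paired and of size >= 2."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 participants")
    return [x - y for x, y in zip(a, b)]

def paired_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Paired t statistic; degrees of freedom are len(a) - 1."""
    d = _diffs(a, b)
    return mean(d) / (stdev(d) / sqrt(len(d)))

def cohens_d_z(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean difference divided by the SD of the differences."""
    d = _diffs(a, b)
    return mean(d) / stdev(d)

def cohens_d_av(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean difference divided by the square root of the averaged condition variances."""
    _diffs(a, b)  # validate pairing
    return (mean(a) - mean(b)) / sqrt((stdev(a) ** 2 + stdev(b) ** 2) / 2)

# Toy per-participant means (s): grammatical vs. ungrammatical trials.
gram = [15.0, 16.0, 14.0, 17.0]
ungram = [13.0, 15.0, 14.0, 14.0]
print(round(paired_t(gram, ungram), 3))    # 2.324
print(round(cohens_d_z(gram, ungram), 3))  # 1.162
```

Note that d_z equals t / √n by construction, whereas d_av does not depend on the correlation between conditions.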
A natural interpretation of these results is that sensitivity to the typologically common Order 1 is well developed around 2;6, enabling toddlers of this age to discriminate it systematically. As mentioned earlier, a systematic group preference can indicate grammatical sensitivity regardless of its direction. The familiarity effect (i.e., a preference for the grammatical order) found in the present study is in line with findings of several previous studies using the same paradigm (e.g., Babineau & Christophe, 2022; Van Heugten & Shi, 2010; among others). Difficulty in encoding the grammatical stimuli could be one potential cause of a familiarity preference (see also Shi et al., 2020 for a similar discussion): given our experimental design, processing these particular complex noun phrases may not have been entirely effortless for the children tested. What matters most for the generalization of Order-1 sensitivity, however, is that the discrimination effect was stable enough to be captured for this age group. This result alone says little about the deeper question we are interested in, namely whether the cross-linguistically more common order is privileged in development owing to a bias for language universals. We therefore need to test children’s sensitivity to the typologically rarer Order 2 and compare the two. Following the logic of Experiment 1, Experiment 2 used Order 3 to test toddlers’ sensitivity to Order 2.
4.2. Experiment 2
4.2.1. Participants
Another 24 Mandarin-learning toddlers aged around 2;6 (mean age: 2;6;14; age range: 2;5;17–2;8;12; 14 females) participated in Experiment 2. As in Experiment 1, basic family information was collected, and the participants’ backgrounds were comparable to those of the children in Experiment 1. Another 13 participants were tested but excluded due to ceiling looking (1), parental interference during test trials (5), fussiness or inattentiveness (5, judged by non-experimenters who were blind to the experiment), failure to complete (1), and experimenter error (1).
4.2.2. Materials, design, and predictions
The speech stimuli and experimental design were nearly identical to those in Experiment 1, except that all Order-1 utterances were replaced with the Order-2 counterparts, which were comparable in intonation and length, and cross-spliced in a similar way. The design and stimuli used in Experiment 2 are shown in Table 2.
Table 2. Auditory stimuli and design of Experiment 2

Following the logic of Experiment 1, we should expect significant discrimination if sensitivity to Order 2 was as well developed as sensitivity to Order 1; conversely, if Order-2 sensitivity was not yet robust around 2;6, children’s looking preferences would not be systematic as a group, and no significant difference in looking time should be expected.
4.2.3. Results and discussion
Results were sorted and analysed as described in Section 4.1.6. The looking time difference between Order-2 and Order-3 trials (BCa 95% CI [−2.01, 3.77]) was not significant (t(23) = .831, p = .415, two-tailed, Cohen’s d = .216). As Figure 2 shows, although overall listening time was still longer for the grammatical (Order-2) trials (mean = 11.14 s; SE = .75) than for the ungrammatical (Order-3) trials (mean = 10.32 s; SE = .80), only 13 of the 24 participants showed this preference. The non-significant result, together with the greater variation in individual preferences, indicates that children of this age processed the contrast between the two orders largely unsystematically, which precludes reliable generalizations at the group level. We therefore interpret the overall results as an absence of evidence for Order-2 sensitivity.
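The BCa intervals reported in Sections 4.1.6 and 4.2.3 can be illustrated with a textbook bias-corrected and accelerated bootstrap: the bias correction comes from the proportion of resampled means below the observed mean, and the acceleration from a jackknife. This standard-library Python sketch is not the authors’ software; SciPy’s `scipy.stats.bootstrap` with `method='BCa'` is one library alternative. The seed and toy differences are assumptions, and intervals from different implementations or seeds will differ slightly.

```python
import random
from statistics import NormalDist, mean
from typing import List, Sequence, Tuple

def _quantile(sorted_vals: List[float], q: float) -> float:
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def bca_ci_mean(diffs: Sequence[float], n_resamples: int = 1000,
                level: float = 0.95, seed: int = 0) -> Tuple[float, float]:
    """BCa bootstrap CI for the mean of paired differences (needs >= 2 values)."""
    data = list(diffs)
    n = len(data)
    rng = random.Random(seed)
    theta_hat = mean(data)
    # Bootstrap distribution of the mean: resample participants with replacement.
    boots = sorted(mean(rng.choices(data, k=n)) for _ in range(n_resamples))
    norm = NormalDist()
    # Bias correction z0, with the proportion clamped so inv_cdf stays finite.
    prop = sum(b < theta_hat for b in boots) / n_resamples
    prop = min(max(prop, 0.5 / n_resamples), 1 - 0.5 / n_resamples)
    z0 = norm.inv_cdf(prop)
    # Acceleration from leave-one-out (jackknife) means.
    jack = [mean(data[:i] + data[i + 1:]) for i in range(n)]
    jbar = mean(jack)
    num = sum((jbar - j) ** 3 for j in jack)
    den = 6 * sum((jbar - j) ** 2 for j in jack) ** 1.5
    accel = num / den if den > 0 else 0.0
    alpha = (1 - level) / 2
    bounds = []
    for tail in (alpha, 1 - alpha):
        z = norm.inv_cdf(tail)
        adjusted = norm.cdf(z0 + (z0 + z) / (1 - accel * (z0 + z)))
        bounds.append(_quantile(boots, adjusted))
    return bounds[0], bounds[1]

toy_diffs = [2.5, 1.0, -0.5, 3.0, 1.5, 0.0, 2.0, 4.0, -1.0, 1.0]
print(bca_ci_mean(toy_diffs))
```

Here `n_resamples=1000` mirrors the number of resamples stated in the text; the remaining details are illustrative choices.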

Figure 2. Looking time (i.e., listening time) distribution, medians (solid lines in the boxplots), and means (horizontal dashed lines) for each trial type in Experiment 2. The dots represent data points of each individual, with a line connecting the two contributed by the same participant.
To further compare toddlers’ discrimination performance in the two experiments, we combined the data and conducted a linear mixed-effects analysis using the lme4 package (Bates et al., 2015) in R. Each participant’s mean looking time (in seconds) per trial type was the dependent variable, with trial type (grammatical versus ungrammatical, sum-coded as 1 and −1), experiment (Exp. 1 versus Exp. 2, also coded as 1 and −1), and their interaction as fixed effects. By-participant random intercepts were also included. Since each participant’s looking times were averaged within trial type (see the justification in Section 4.1.6), the model need not include by-participant random slopes or by-item variance. As shown in Table 3, in the model comparing the results of both experiments, trial type showed only a weak, marginal effect, arguably driven mainly by the systematic discrimination in Experiment 1. Further, the absence of an interaction between trial type and experiment indicates that there was no evidence of a fundamental difference between toddlers’ performance in the two experiments. The remaining effects were not of interest for the current study.Footnote 2
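The fixed-effects coding of the combined model can be sketched by building one row per participant and trial type. Under our reading of the random-effects description, the corresponding lme4 call would be `lmer(looking_time ~ trial_type * experiment + (1 | participant))`. The Python below only constructs the sum-coded predictors (it does not fit the model) and checks that, in a balanced design, the three predictor columns are centred and mutually orthogonal, so that each main effect is evaluated at the average of the other factor. Participant IDs and values are hypothetical.

```python
from typing import Dict, List, NamedTuple

TRIAL_TYPE_CODE = {"grammatical": 1, "ungrammatical": -1}
EXPERIMENT_CODE = {"exp1": 1, "exp2": -1}

class Row(NamedTuple):
    participant: str
    looking_time: float
    trial_type: int   # +1 grammatical, -1 ungrammatical
    experiment: int   # +1 Exp. 1, -1 Exp. 2
    interaction: int  # trial_type * experiment

def design_rows(means: Dict[str, Dict[str, float]],
                experiment_of: Dict[str, str]) -> List[Row]:
    """One row per participant x trial type, with sum-coded (+1/-1) predictors."""
    rows = []
    for pid, cells in means.items():
        exp = EXPERIMENT_CODE[experiment_of[pid]]
        for trial_type, looking_time in cells.items():
            tt = TRIAL_TYPE_CODE[trial_type]
            rows.append(Row(pid, looking_time, tt, exp, tt * exp))
    return rows

# Hypothetical per-participant means for two toddlers per experiment.
toy_means = {
    "a1": {"grammatical": 15.0, "ungrammatical": 13.0},
    "a2": {"grammatical": 16.0, "ungrammatical": 14.5},
    "b1": {"grammatical": 11.0, "ungrammatical": 10.5},
    "b2": {"grammatical": 12.0, "ungrammatical": 10.0},
}
toy_exp = {"a1": "exp1", "a2": "exp1", "b1": "exp2", "b2": "exp2"}
rows = design_rows(toy_means, toy_exp)

cols = ("trial_type", "experiment", "interaction")
for c in cols:
    print(c, sum(getattr(r, c) for r in rows))  # each column sums to 0
```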
Table 3. The fixed-effects output of the linear mixed model for the combined dataset

Note: Significance is indicated in bold.
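To make the ±1 coding concrete, the sketch below computes what each fixed effect estimates in terms of cell means. In R, a model matching the description above could be written roughly as `looking_time ~ trial_type * experiment + (1 | participant)`. That formula is our reading of the text, not quoted from the authors’ scripts. The sketch assumes a balanced design (each participant contributes one mean per trial type, and experiment varies between participants). Under that assumption, the model’s fixed-effect point estimates should coincide with these cell-mean contrasts. Standard errors and tests still require the mixed model itself.

```python
from statistics import fmean


def sum_coded_estimates(cells):
    """Fixed-effect point estimates for a balanced 2 x 2 design coded +1/-1.

    cells: dict mapping (trial_type, experiment) to a list of per-participant
    mean looking times, where trial_type is +1 (grammatical) or -1
    (ungrammatical) and experiment is +1 (Exp. 1) or -1 (Exp. 2).
    """
    m = {key: fmean(values) for key, values in cells.items()}
    return {
        "intercept": fmean(m.values()),
        # Half of the average grammatical-minus-ungrammatical difference.
        "trial_type": (m[1, 1] + m[1, -1] - m[-1, 1] - m[-1, -1]) / 4,
        # Half of the difference between the two experiments' overall means.
        "experiment": (m[1, 1] + m[-1, 1] - m[1, -1] - m[-1, -1]) / 4,
        # A quarter of the difference between the two experiments' effects.
        "trial_type:experiment": (m[1, 1] - m[-1, 1] - m[1, -1] + m[-1, -1]) / 4,
    }
```

With this coding, a non-significant interaction means there is no reliable difference between the grammatical-minus-ungrammatical effect in Experiment 1 and the corresponding effect in Experiment 2.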
In sum, although a systematic preference was attested in Experiment 1 but not in Experiment 2, the statistical comparison between the two experiments, and thus between the two grammatical orders, did not reach significance. In other words, the Order-2 stimuli did not appear to be treated as fundamentally different from the Order-1 stimuli. Before elaborating on children’s sensitivity to the two orders, we address one potential concern. Previous research suggests that infants and toddlers are sensitive to frequent chunks in the input (e.g., Skarabela et al., 2021). This raises the possibility that the observed (non-)discrimination reflected a “cheating strategy” in which children attended only to prominent local strings. For instance, the frequent local string NUM-CL-N, which occurs in Orders 2 and 3 but not in Order 1, could account for the results so far. Children would then discriminate orders with NUM-CL-N from those without it (Experiment 1), but would not discriminate two orders that both contain it (Experiment 2). We addressed this concern (and similar ones based on other local strings) by testing children’s discrimination between Order 1 and Order 2 in Experiment 3, which also allowed a more direct comparison of the two grammatical orders.
4.3. Experiment 3
4.3.1. Participants
As in the previous two experiments, participants in Experiment 3 were 24 Mandarin-learning toddlers aged around 2;6 (mean age: 2;6;14; age range: 2;5;10–2;8;8; 12 females). According to the family information collected, their background was similar to that of the participants in the previous experiments (see Section 4.1.1). Data from another 10 participants were excluded because of ceiling looking (1), parental interference during test trials (2), fussiness or inattentiveness (2, judged by non-experimenters who were blind to the experiment), failure to complete the experiment (4), and experimenter error (1).
4.3.2. Materials, design, and predictions
All details were identical to those of Experiments 1 and 2, except that the two test orders were Order 1 and Order 2. The speech stimuli for both orders were those used in the previous experiments (see Tables 1 and 2), so that no additional confound was introduced.
Suppose toddlers based their (non-)discrimination on the presence or absence of prominent local strings that distinguish Order 1 from Orders 2 and 3 (e.g., NUM-CL-N, which occurs in Orders 2 and 3 but not in Order 1). In that case, we should also observe significant discrimination between Orders 1 and 2. If, on the contrary, children fully represented and processed the test noun phrases, an overall non-discrimination result would be more likely. This would be in line with the absence of an interaction in the comparison of the previous experiments, because children aged 2;6 might not treat Order 2 as entirely different from Order 1.
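The logic of this prediction can be checked mechanically. The sketch below assumes, as a deliberate simplification, that a pure local-string strategy treats each order as a flat sequence of category labels and discriminates two orders only when exactly one of them contains a given contiguous string. It collects every local string compatible with the results of Experiments 1 and 2 (separating Order 1 from Order 3 but not Order 2 from Order 3). It then confirms that each such string would also separate Order 1 from Order 2. The function names are our own.

```python
ORDERS = {
    1: ("DEM", "NUM", "CL", "ADJ", "DE", "N"),   # grammatical, typologically common
    2: ("ADJ", "DE", "DEM", "NUM", "CL", "N"),   # grammatical, typologically rare
    3: ("DEM", "ADJ", "DE", "NUM", "CL", "N"),   # ungrammatical baseline
}


def contains(order, chunk):
    """True if chunk occurs as a contiguous substring of the given order."""
    seq, k = ORDERS[order], len(chunk)
    return any(seq[i:i + k] == chunk for i in range(len(seq) - k + 1))


def predicts_discrimination(chunk, a, b):
    """A pure local-string strategy separates a and b only if exactly one contains chunk."""
    return contains(a, chunk) != contains(b, chunk)


def all_chunks(max_len=5):
    """Every contiguous substring (length 2 to max_len) of any of the three orders."""
    return {seq[i:i + k] for seq in ORDERS.values()
            for k in range(2, max_len + 1) for i in range(len(seq) - k + 1)}


# Local strings compatible with Experiments 1 and 2.
compatible = sorted(c for c in all_chunks()
                    if predicts_discrimination(c, 1, 3)
                    and not predicts_discrimination(c, 2, 3))
```

Every string in `compatible` (NUM-CL-N, CL-N, DE-N, ADJ-DE-N, and others) also separates Order 1 from Order 2. A local-string account therefore predicts discrimination in Experiment 3, whatever the specific string.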
4.3.3. Results and discussion
A paired t-test showed that the looking time difference between the two orders (BCa 95% CI [−3.02, 1.14]) was not significant (t(23) = −.812, p = .425, two-tailed, Cohen’s d = .220). Looking patterns were mixed: 10 participants looked longer on Order-1 trials and 14 on Order-2 trials (Order 1: Mean: 10.23 s; SE = .66; Order 2: Mean: 11.00 s; SE = .77; see Figure 3).

Figure 3. Looking time (i.e., listening time) distribution, medians (solid lines in the boxplots), and means (horizontal dashed lines) for each trial type in Experiment 3. The dots represent data points of each individual, with a line connecting the two contributed by the same participant.
These results were not predicted by the local-string account outlined above. The most plausible interpretation is that participants’ looking behaviour reflected their processing of fully represented noun phrases, as no local string could systematically explain the observed patterns.Footnote 3 Furthermore, the non-significant looking difference between Orders 1 and 2 again suggests that children in the age group tested might not treat the two orders completely differently.
Taken together, the results of all experiments suggest that children’s sensitivity to the two grammatical orders does not differ fundamentally, as shown by Experiment 3 and by the comparison between Experiments 1 and 2. Before drawing implications for a learning bias towards universal noun phrase orders (a status that holds for Order 1 but not Order 2; see Section 2), we need to take into account the linguistic input available to children. We therefore next evaluate whether the input shows a clear asymmetry between the two orders.
5. A corpus analysis of the input
5.1. Methods
5.1.1. The datasets
To approximate the linguistic input available to the child learner (particularly by 2;6), we analysed two sets of input data. First, we analysed the input speech up to the age tested in our experiments (i.e., 2;6) from the Tong corpus (Deng & Yip, 2018), a representative longitudinal Mandarin corpus in CHILDES that covers the age range of interest. In total, 12,049 input utterances were extracted. Second, to increase the reliability of the patterns, we analysed a much larger dataset comprising all annotated mainland Mandarin corpora in CHILDES that contain input speech: AcadLang (Zhou, n.d.), Erbaugh (Erbaugh, 1992), LiReading (Li, n.d.), Tong (Deng & Yip, 2018), Zhou1 (Zhou, 2001), Zhou2 (Li & Zhou, 2004), Zhou3 (Zhang & Zhou, 2009), ZhouAssessment (Zhou & Zhang, n.d.), ZhouDinner (Li & Zhou, 2015), and ZhouNarratives (Li & Zhou, 2011). Combining all these corpora (henceforth “all corpora”) yielded 226,759 input utterances in total.
5.1.2. Two types of evidence
We considered two types of information, direct and indirect evidence, based both on the specific items tested in the experiments and on other items from the same linguistic categories. By direct evidence, we mean input that provides unambiguous information about the target structure. For the two noun phrase orders in Mandarin, we counted both fully and partially complex noun phrases that provide such information. Consider the examples in (8), extracted from the input speech in the Erbaugh corpus (Erbaugh, 1992) in CHILDES. Although they are not the full noun phrase forms tested in our experiments, they provide unambiguous information about the two grammatical orders: (8a) indicates that DE-marked adjectives can follow NUM-CL (Order 1), and (8b) indicates that ADJ-DE can precede NUM-CL (Order 2).Footnote 4

By indirect evidence, we mean data from which the target structure can be inferred. We considered one potentially prominent source: n-gram frequencies. We counted the bigrams and trigrams that occur exclusively in one of the two grammatical orders, on the hypothesis that more frequent fragments of a complex form would lead to greater sensitivity to that form in toddlers. Previous work has shown that Mandarin-learning children at or even younger than our test age can go beyond linear sequence tracking and process input based on its linguistic structure (e.g., Ying et al., 2022). We nevertheless included statistics of this type, given their potential importance in children’s early linguistic development (e.g., Ngon et al., 2013; Skarabela et al., 2021) and their possible impact on processing even in adults (e.g., Tremblay et al., 2011). In other words, more frequent n-grams of a particular order might confer a sensitivity advantage at 2;6, either by providing better developmental support or by facilitating the recognition of relevant items. The n-gram frequencies calculated here were therefore the total occurrences of each order’s sub-parts. They were counted regardless of the sub-parts’ linguistic status (e.g., whether the n-grams are meaningful units, whether they are NPs, or what syntactic environment they occur in) or position (whether they occur at the beginning or end of the noun phrase).Footnote 5 Consider the bigram DE-N and the trigram ADJ-DE-N. Although they occur only in Order 1, they do not by themselves indicate that Order 1 is grammatical, as they provide no clue about the relative position of the different prenominal categories (DEM, NUM-CL, and ADJ-DE) when all of them co-occur.
However, suppose such n-grams occur far more frequently in the input than n-grams such as CL-N and NUM-CL-N (i.e., those that occur only in Order 2). We would then expect sensitivity to Order 1 to develop before sensitivity to Order 2, and the same logic applies in reverse to n-grams occurring only in Order 2. Following previous studies that have exploited these two types of evidence (see also Koulaguina et al., 2019; Shi et al., 2020; Yang, 2004), we based our calculations on token frequencies, i.e., the total number of occurrences.Footnote 6
5.1.3. The search procedure
We relied on the tags on the morphological tier of the transcripts. In addition to the uniformly annotated ADJ, NUM, CL, and N, we identified demonstratives via the tags “det” and “pro:dem.” We identified DE via “nom,” “cleft,” and “poss,” and also via adjectives glossed as “adj-ish,” which indicate an unsegmented DE. For the initial search, we used the COMBO command built into the CLAN program, which extracts all combinations of items with the specified annotations. The search covered all non-child tiers, thus capturing not only adult speech directed specifically to children but also speech produced around them. After the search, we manually excluded cases that had been systematically misidentified because of tagging errors. We also extracted instances containing the items used in the experiments to calculate item-specific chunks. Finally, we summed the overall frequencies of the different structures.
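The category mapping described above can be sketched as a small function. It converts a morphological-tier line into the category labels used in this section and then searches for contiguous patterns. This is a hypothetical simplification. It assumes tokens of the form `pos|stem`, ignores the further structure that real %mor tokens can carry, and assumes (without confirmation from the source) that an adjective with an unsegmented DE can be recognised by a stem ending in `-ish`. It does not reproduce CLAN’s COMBO, and the example line is constructed, not taken from a transcript.

```python
# Hypothetical tag-to-category mapping, following the tags listed in the text.
TAG_TO_CATEGORY = {
    "adj": "ADJ", "num": "NUM", "cl": "CL", "n": "N",
    "det": "DEM", "pro:dem": "DEM",
    "nom": "DE", "cleft": "DE", "poss": "DE",
}


def mor_to_categories(mor_line):
    """Convert a simplified morphological-tier line ("pos|stem ...") into categories."""
    categories = []
    for token in mor_line.split():
        pos, _, stem = token.partition("|")
        if pos == "adj" and stem.endswith("-ish"):
            # Assumed encoding of an adjective with an unsegmented DE.
            categories += ["ADJ", "DE"]
        else:
            categories.append(TAG_TO_CATEGORY.get(pos, "OTHER"))
    return categories


def find_contiguous(categories, pattern):
    """Start indices at which pattern occurs contiguously in categories."""
    k = len(pattern)
    return [i for i in range(len(categories) - k + 1)
            if categories[i:i + k] == list(pattern)]


# Constructed example line, not taken from a CHILDES transcript.
cats = mor_to_categories("det|zhe4 num|yi1 cl|ge4 adj|da4 nom|de5 n|pingguo3 .")
```

After this conversion, a sequence such as `cats` can be passed to any n-gram counter. The manual exclusion of tagging errors described above has no counterpart in this sketch.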
5.1.4. Results and discussion
The results are summarized in Table 4. Given the much smaller size of the Tong corpus dataset, its item-specific counts were too sparse to be informative. We therefore report only category-based results for the Tong corpus dataset, and both item-specific and category-based results for the larger “all corpora” dataset.
Table 4. Frequencies of different structures in the input based on both the specific items used in experiments and the grammatical categories involved; total number of input utterances: 12,049 (the Tong corpus) & 226,759 (all corpora)

Note: Fully complex NPs and partially complex ones without DEM/NUM (e.g., (8)) were both counted as instances of complex form. As for n-grams, an occurrence of each listed chunk was counted as one token of indirect evidence. For example, one DEM-NUM-CL-ADJ-DE-N form would contribute one instance of Order-1 direct evidence, two Order-1 bigrams (CL-ADJ and DE-N), and three Order-1 trigrams (NUM-CL-ADJ, CL-ADJ-DE, and ADJ-DE-N).
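The counting rule in the note above can be reproduced with a short sketch. It first derives the bigrams and trigrams that are exclusive to each grammatical order, then counts their tokens in sequences of category labels. The sketch assumes utterances have already been converted into such sequences. It is an illustration of the rule, not the authors’ analysis code.

```python
from collections import Counter

ORDER_1 = ("DEM", "NUM", "CL", "ADJ", "DE", "N")
ORDER_2 = ("ADJ", "DE", "DEM", "NUM", "CL", "N")


def ngrams(seq, n):
    """Contiguous n-grams of a category sequence, as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def exclusive_ngrams(n):
    """n-grams found in exactly one of the two grammatical orders."""
    first, second = set(ngrams(ORDER_1, n)), set(ngrams(ORDER_2, n))
    return {"Order 1": first - second, "Order 2": second - first}


EXCLUSIVE = {n: exclusive_ngrams(n) for n in (2, 3)}


def count_indirect_evidence(category_sequences):
    """Token counts of order-exclusive bigrams and trigrams, keyed by (order, n)."""
    counts = Counter()
    for seq in category_sequences:
        for n, by_order in EXCLUSIVE.items():
            for gram in ngrams(seq, n):
                for order, grams in by_order.items():
                    if gram in grams:
                        counts[order, n] += 1
    return counts
```

The exclusive sets are CL-ADJ and DE-N (bigrams) plus NUM-CL-ADJ, CL-ADJ-DE, and ADJ-DE-N (trigrams) for Order 1, and DE-DEM and CL-N plus ADJ-DE-DEM, DE-DEM-NUM, and NUM-CL-N for Order 2. Passing the full Order-1 phrase to `count_indirect_evidence` yields two Order-1 bigrams and three Order-1 trigrams, as in the note.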
For direct evidence, we found no noun phrases containing exactly the items used in the experiments. Noun phrases of either structure with other lexical items were also very rare. The absolute frequencies for Order 1 (17 and 246) were numerically higher than those for Order 2 (2 and 47), but each accounted for less than 0.2% of all utterances in both datasets. (These estimates may deviate slightly, as an utterance could in principle contain more than one complex noun phrase.) Such sparse data may well be indistinguishable from the small but inevitable noise in the input. We evaluated the role of these data with reference to independent findings in the literature (for a similar approach, see Yang, 2004). Ordering sensitivity acquired before 2;6 has been found to be supported by much more direct evidence. For instance, in English wh-questions, the wh-word is displaced to the beginning of the sentence. Reliable acquisition of this form has been attested in toddlers aged 1;6 (Perkins & Lidz, 2021), and correspondingly, unambiguous signatures make up around 25% of the input. By contrast, unambiguous evidence in proportions similar to those reported here has either been associated with delayed acquisition (see Yang, 2004, and references therein) or been argued to play a negligible role in explaining children’s early knowledge (see Koulaguina et al., 2019; Shi et al., 2020). We therefore reason that direct evidence plays a limited role in accounting for the word order sensitivity reported above.
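The proportions behind the “less than 0.2%” figure can be checked directly from the counts and utterance totals given in the text. As the text notes, dividing by the number of utterances treats each complex noun phrase as falling in a separate utterance, so these are approximations.

```python
# Direct-evidence counts and utterance totals as reported in the text.
DIRECT_COUNTS = {
    ("Tong", "Order 1"): 17, ("Tong", "Order 2"): 2,
    ("All corpora", "Order 1"): 246, ("All corpora", "Order 2"): 47,
}
UTTERANCES = {"Tong": 12_049, "All corpora": 226_759}


def shares(counts, totals):
    """Each count as a proportion of the input utterances in its dataset."""
    return {key: k / totals[key[0]] for key, k in counts.items()}


for (dataset, order), p in shares(DIRECT_COUNTS, UTTERANCES).items():
    print(f"{dataset:<12} {order}: {p:.3%} of utterances")
```

The largest share, Order 1 in the Tong corpus, is about 0.14% of utterances. This is two orders of magnitude below the roughly 25% cited for English wh-questions.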
For indirect evidence, Order 2 appeared slightly better supported overall. Among chunks with the specific items used, Order-1 trigrams were numerically more frequent (22, compared with 7 for Order 2), but Order-2 bigrams were substantially more frequent (224, compared with 35 for Order 1). Category-based counts consistently favoured Order 2. As Table 4 shows, Order-2 n-grams outnumbered Order-1 n-grams both in the Tong corpus (867 and 289, compared with 638 and 203 for Order 1) and in all corpora combined (16,888 and 7,033, compared with 16,669 and 4,991 for Order 1).
In sum, children’s linguistic input does not clearly favour either order for early acquisition. The direct evidence tentatively points to the typologically common Order 1, but these data are too sparse, and the advantage does not hold once indirect evidence is taken into account. In the next section, we discuss the implications of this input pattern and of our experimental findings.
6. General discussion
The current study examines the potential bias for language universals from the perspective of child language. It does so by probing Mandarin-learning toddlers’ sensitivity to two grammatical noun phrase internal word orders that differ in typological markedness, and by analysing the input statistics available to child learners. In the first part of the study, we tested children’s discrimination between different orders around 2;6 in three experiments. In Experiment 1, toddlers systematically distinguished the typologically common grammatical Order 1 (DEM-NUM-CL-ADJ-DE-N) from the ungrammatical baseline Order 3 (DEM-ADJ-DE-NUM-CL-N). By contrast, toddlers aged around 2;6 did not reliably differentiate the typologically rare grammatical Order 2 (ADJ-DE-DEM-NUM-CL-N) from either the ungrammatical Order 3 (Experiment 2) or the grammatical Order 1 (Experiment 3). In the second part, a corpus analysis that considered both complex noun phrases approximating the two orders and their partial forms showed that neither grammatical order has a clear input-frequency advantage for early acquisition.
With respect to the question the present study set out to test, i.e., a potential learning bias for universal word orders in the noun phrase, our findings remain inconclusive. The sparse and inconsistent evidence in the input suggests that children’s early sensitivity may be supported by factors beyond exposure to the different word orders. However, it remains unclear whether its development is guided by a learning bias for language universals. According to our data, the cross-linguistically common Order 1 is not fundamentally different from the typologically rare Order 2 in acquisition: the difference in discrimination between Experiments 1 and 2 was not statistically significant, and the two orders were not differentiated in Experiment 3.
However, the present study lays a foundation for future investigations of the learning bias for universal word orders in natural language acquisition. First, our corpus results suggest that the two orders, in their full forms, are well suited as test cases, because neither has a clear frequency advantage in the input. Any clear acquisition asymmetry between the two forms found at early developmental stages in future studies is therefore more likely to stem from factors internal to young children. One such factor could be preferences in organizing noun phrase internal items, similar to those attested in adults. Conversely, a complete absence of such an asymmetry would suggest that the learning bias observed in adults emerges gradually, later in development. Second, our experimental data show that the visual fixation paradigm is also suitable for investigating the issue in early toddlerhood. When young children are presented with the full forms of the two orders, they do seem to process the ordering of the complex noun phrases, as no local string could systematically explain their looking patterns in our experiments.
Although our comparisons of the two grammatical orders provide no evidence for an acquisition difference between them, our results do not constitute strong evidence against an early bias for language universals either. This is especially so because only Order 1, not Order 2, was discriminated from an ungrammatical order in our study. An acquisition difference between the two orders may have gone undetected for independent reasons. Replications of the current findings and/or further investigations of children’s knowledge of Order 2 around this age could help clarify the reliability of the null results in Experiments 2 and 3, as well as the potential acquisition asymmetry between the two orders. For instance, children may be acquiring the grammaticality of Order 2 around our test age (i.e., 30 months), so that some children in this age group treat Order 2 as grammatical while others do not. This would make it harder to detect a group-level differentiation between Order 2 and the ungrammatical Order 3, or between Order 2 and the grammatical Order 1.Footnote 7 Further studies could help illuminate the issue. These might test more children at the same age, thereby increasing statistical power, or examine slightly younger children who already have the necessary lexical knowledge of the noun phrase items.
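To give a rough sense of the sample sizes involved, the sketch below applies the standard normal-approximation formula for a two-tailed paired t-test at 80% power and α = .05. Using the d = .216 from Experiment 2 is an assumption made purely for illustration. The sketch treats that value as a d_z-type effect and as the true population effect, neither of which the study establishes.

```python
import math
from statistics import NormalDist


def paired_sample_size(d_z, power=0.80, alpha=0.05):
    """Approximate number of participants for a two-tailed paired t-test.

    Uses the normal approximation n = ((z_{1-alpha/2} + z_{power}) / d_z) ** 2,
    which slightly underestimates the exact t-based requirement.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / d_z) ** 2)


print(paired_sample_size(0.216))
```

Under these assumptions, the requirement is several times the 24 children tested per experiment. An exact calculation based on the noncentral t distribution (e.g., with G*Power or R’s pwr package) would give a slightly larger number.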
Another issue relevant to the potential learning bias, which our study begins to explore, is the abstractness of children’s ordering knowledge. The learning bias found in previous artificial language learning studies with adults (Culbertson & Adger, 2014; Martin et al., 2019, 2020, 2024) is arguably category-based. In other words, to align fully with the bias discussed in the literature, one would need to show that children have access to an abstract representation of the noun phrase that supports the universal orders. Input frequencies involving the particular items tested in our experiments do not seem to be a strong predictor of children’s sensitivity (especially to Order 1). However, our experiments were based on a limited number of items, so little can be inferred on this matter. Research on syntactic categorization offers some hints about the abstractness of nominal structure in early grammar. For instance, studies of Mandarin-learning infants and toddlers have reported that children can recognize novel items occurring after the DEM-CL chunk zhege “this-CL” as nouns (Ying et al., 2022; Zhang et al., 2015). Such findings suggest that young children are sensitive to the syntactic structure of the noun phrase and thus have access to some abstract knowledge of nominal structure from an early age. Future research that reveals or rules out a generalized characterization of nominal structure supporting the universal orders in child grammar would contribute to understanding the potential learning bias.
To conclude, the current study constitutes a first step in integrating insights from child language acquisition with the inquiry into language universals in the domain of noun phrase internal word order. Although the results do not provide clear support for a bias towards universal linguistic patterns in early acquisition, our work lays a foundation for future research examining this bias from the perspective of first language acquisition. Investigations along these lines may lead to a more comprehensive understanding of the nature of this bias and, by extension, of the underpinnings of child language acquisition in general.
Acknowledgements
This study was supported by the National Social Science Fund of China (21BYY019) to Xiaolu YANG. We are grateful to all families who participated in the study. We also thank Prof. Xiaoshi HU, Prof. Ting XU, and Prof. Yue JI for their comments on the study at various stages, and members of the Language Acquisition Lab and the Child Cognition Center at Tsinghua University who helped with data collection, including Miao MIAO, Deming SHI, Ziqi WANG, Jiarui ZHANG, Yanting LI, and Wenjia TAN. Parts of this work were reported at HSP 2023, IACL-29, and ICTEAP-4. Comments and suggestions from the conference participants are appreciated. The authors thank the anonymous reviewers for their valuable comments and constructive suggestions, which have helped improve the quality of this paper.
Data availability statement
Experiment data and corpus analysis codes that support the current study are available via the Open Science Framework:
https://osf.io/aufn5/?view_only=183c0d4533e04efdb27d8213825d2e8e.
Competing interests
The authors declare none.