1. Introduction
If John only peeled three apples, did he also eat them? Or did he also peel three pears? Interpreting only-type focus expressions requires integration of knowledge across syntactic, semantic, and prosodic domains, which poses great challenges to monolingual preschoolers across languages (e.g., Notley et al., 2009; Zhou et al., 2012) and adult bilinguals with diverse language backgrounds (e.g., Ge et al., 2021) in experimental settings. However, less is known about how young bilinguals produce only-type focus in naturalistic settings. Bilinguals receive proportionally reduced input in each language compared to their monolingual counterparts (“input reduction”), which may impact language development in various linguistic domains to different extents (Paradis & Grüter, 2014). There is growing evidence that bilingual children’s development in the grammatical domain may be less affected by input reduction than in other linguistic domains, possibly because bilinguals could benefit from shared grammatical knowledge in their dual language systems (Korade et al., 2024). Given the complexity of only-type focus expressions and their reduced input, will bilinguals exhibit quantitatively and qualitatively different patterns compared with their monolingual peers? To what extent is their production of focus accounted for by how focus is provided in the parental input? This study aims to investigate these issues.
Below, we first compare key properties of only- and zhi(you)-focus expressions and review relevant acquisition studies with monolingual and bilingual populations, highlighting the acquisition challenges and potential accounts. Next, we present two corpus-based studies on bilingual children’s production and on parental input, respectively, and bring them together in the general discussion and conclusion sections.
2. Only/zhi(you)-focus and its acquisition by monolingual and bilingual children
2.1. Multi-domain knowledge in focus identification in only/zhi(you)-constructions
Only and zhi are focus particles (FP) in English and Mandarin, respectively, which express exclusive focus (or “restrictive focus”) by identifying a set of elements and negating their semantic alternatives. A syntactic constraint on focus association is that the FP must c-command the intended focus (Jackendoff, 1972); elements outside the c-command domain cannot be associated with the FP or receive a focus reading. In (1)a-a’, only/zhi c-commands the entire VP (throws apples), and is thus associated with the VP or with discrete elements inside the VP, including the verb (throws) and the object (apples). Association with the subject (John), however, is impossible, because the subject is outside the c-command domain of only. Within their focus domain, the different focus associations of the FPs lead to different interpretations of the sentences and correspondingly different truth conditions (König, 2002; Lee, 2019). For example, in (1), when the FPs are associated with the object, the sentences entail that John does not throw anything other than apples. However, when the FPs are associated with the verb, the sentences entail that John does not do anything to the apples other than throwing them.
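The c-command constraint can be made concrete with a small sketch. The Python fragment below is an illustration only, not part of the study: it assumes a simplified constituent structure in which only is adjoined to the VP, treats words as leaf nodes, and uses the common textbook definition of c-command (neither node dominates the other, and the first branching node above the first node dominates the second). The class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """A node in a simplified constituent tree (words are leaf nodes)."""
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        # Link each child back to this node so we can walk upwards.
        for child in self.children:
            child.parent = self


def dominates(a: Node, b: Node) -> bool:
    """True if a is a proper ancestor of b."""
    node = b.parent
    while node is not None:
        if node is a:
            return True
        node = node.parent
    return False


def c_commands(a: Node, b: Node) -> bool:
    """True if neither node dominates the other and the first
    branching node above a dominates b."""
    if a is b or dominates(a, b) or dominates(b, a):
        return False
    node = a.parent
    while node is not None and len(node.children) < 2:
        node = node.parent
    return node is not None and dominates(node, b)


# Assumed structure: [S John [VP only [VP throws apples]]]
john = Node("John")
only = Node("only")
throws = Node("throws")
apples = Node("apples")
inner_vp = Node("VP", [throws, apples])
outer_vp = Node("VP", [only, inner_vp])
sentence = Node("S", [john, outer_vp])

for target in (inner_vp, throws, apples, john):
    print(target.label, c_commands(only, target))
```

Under these assumptions, only c-commands the VP, the verb, and the object but not the subject, which mirrors the association options described for (1).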

Nevertheless, there are important differences between only and zhi in several respects. In terms of syntactic positioning, as in (2), only can occur flexibly in pre-subject, pre-lexical verb (pre-LexV) and pre/post-object positions, and can directly precede or follow an NP in subject- and verb-less environments. Zhi, however, is restricted to pre-auxiliary verb (pre-AuxV) and pre-LexV positions. The positioning of only and zhi is consistent with how adverbs are usually positioned in English and in Mandarin respectively (Bonami et al., 2004; Cui, 2012). Note that neither only nor zhi can connect a subject and an object without the main verb, as in (2)d, dʹ.
Interestingly, Mandarin has an FP zhiyou, which is formed by attaching zhi to a verbal element you (a stand-alone verb, meaning to have, exist or measure, Yuan et al., 2009; Xiong, 2016). In theoretical analyses, there has been a lack of consensus on whether zhiyou is derived in syntax, in which zhi selects a you-phrase (Lee, 2019), or functions as a single unit in the lexicon entering syntax directly (Hole, 2017). It is beyond the scope of this paper to settle this debate. We follow a recent proposal by Sun (2021), which provides robust evidence for distinct properties of zhiyou-constructions, such as their ability to take PP arguments, and the definiteness effect of post-zhiyou NPs. These features are not predicted by properties of the individual compositional elements zhi or you. In this light, our forthcoming analysis of the acquisition data will discuss the positioning of zhi and zhiyou separately. Separating zhiyou from zhi allows us to capture the complementary distribution of zhi and zhiyou in the preverbal domain: like only in English, zhiyou can appear in the pre-subject position and assign focus reading to subjects, which is not possible for zhi in Mandarin, as in (2)a-aʺ; however, zhiyou cannot appear in pre-AuxV or pre-LexV positions, which are positions routinely occupied by only and zhi. Additionally, zhiyou can function as the main verb in SVO sentences, as in (2)dʺ, and can appear in pre-NP positions in subject- and verb-less environments, as in (2)fʺ. Table 1 summarizes the syntactic differences between only, zhi and zhiyou.

Table 1. Syntactic position of only, zhi, and zhiyou in canonical (S)V(O) sequences. (Notes: F = intended focus, AuxV = auxiliary verb, LexV = lexical verb, V = verbal predicate in general)

Within the FP’s c-command domain, when there are multiple focus candidates, speakers consult knowledge of word order, prosodic prominence and the relevant discourse context to determine the focus. It is proposed that, cross-linguistically, the default position for new-information focus (hereafter “information focus”) is the most deeply embedded constituent on the recursive side of syntactic branching, aligning with the general position of main stress prosodically (Cinque, 1993; Xu, 2004). Applied to canonical SVO sentences in English and Mandarin, this means that the focus defaults to the rightmost position (the syntactic object), since both languages are right-branching in VPs. Prosodic prominence is a common strategy for marking focus in both languages. When syntax prevents the focal element from occurring in the default position, speakers have the option to shift the prosodic stress to the focal element (“stress shift”) to mark the intended narrow focus. In English, prosodic stress is typically realized through increased duration and mean pitch, regardless of whether the focus occurs in the default position (Breen et al., 2010; Zimmermann & Onea, 2011). Mandarin, being a topic-prominent analytic language with a more flexible word order and more null arguments than English (Xu, 2004; Li & Thompson, 1981), provides rich opportunities for the intended focus to appear in the utterance-final position. Like in English, Mandarin marks prosodic focus by raised mean pitch and longer duration, among other features (Chen et al., 2009).
However, the role of pitch in Mandarin focus marking is less prominent than that of duration, because the height and contour of pitch are used to encode lexical tone contrasts in Mandarin, leaving less space for speakers to manipulate pitch for focus purposes (Chen, 2018).
2.2. L1 Children’s non-adult-like performance in focus association
Identifying the intended focus in only-type focus structures is challenging for young children. Preschool children often associate pre-subject FP, as in (2)a, with elements in the VP (e.g., the object) rather than the subject, violating the syntactic constraint of focus association. This has been found in English- (Crain et al., 1994; Paterson et al., 2003), Mandarin- (Notley et al., 2009; Yang, 2002), and German-learning preschoolers (Müller et al., 2011) interpreting sentences like (2)a and (2)aʺ or their German counterparts. Accounts for children’s non-adult-like performance have considered syntactic misanalysis (Notley et al., 2009) and violation of the default information structure (Müller et al., 2011). Notley et al. (2009) proposed that, unlike in adult grammar where the pre-subject FP only c-commands the subject rather than the VP, in early grammar the pre-subject FP is probably analysed as a sentential adverb scoping over the entire utterance. Under this analysis, all elements linearly following the FP are potential focus candidates and elements in the default focus position (e.g., object in SVO languages) are more likely to be interpreted as the focus. Müller et al. (2011) speculated that because subject focus violates the default information structure, children may be biased towards the default focus interpretation when there is competition between non-canonical focus (subject focus) and the default focus due to their limited cognitive resources.
Despite their differences, both syntactic and information structural accounts highlight the bias towards the default focus position in children’s misinterpretations.
Children also differ from adults in focus-accentuation mapping when focus deviates from its default position and is marked by shifted stress. Gualmini et al. (2003) found that across conditions, English-speaking children (4;0–5;0) consistently associated pre-verbal FP in ditransitive only-utterances with the indirect object in the sentence-final default focus position, whether the accentuation was on the direct object or the indirect object. Importantly, when a discourse context which contradicted one of the readings was provided, children’s performance improved dramatically, indicating that for young children, insensitivity to prosodic stress in focus interpretation can be mitigated by contextual information. Nevertheless, Zhou et al. (2012) found that Mandarin-L1 children (4;0–5;0) showed adult-like fixation patterns in response to [zhiyou-modifier-noun] sequences with prosodic stress placed either on the modifier or the noun, but they dismissed prosodic stress as a cue in trial-final judgments and consistently associated the FP with the modifiers regardless of stress position, probably due to not-yet-automatic mapping between focus and phonology.
Additionally, studies on focus realization in canonical sentences without FPs reveal that preschool children aged 3–5 differ from older children and adults in the specific acoustic features used for focus marking in production. In a controlled production experiment juxtaposing narrow, broad, and contrastive focus types, Yang & Chen (2018) found that Mandarin preschool children rely on duration features rather than pitch-related features to realize focus, whereas older children and adults use both. For English children, the findings seem to vary across task designs and analyses: it was found that 4-year-old children make use of higher pitch (but not longer duration) on words carrying either new or discourse-prominent information in a picture description task (Wonnacott & Watson, 2008), but others reported increased duration (rather than pitch) on the final word to signal questions by 4-year-olds (Patel & Brayton, 2009). Still others discovered that toddlers manipulate duration (rather than intensity or pitch) to differentiate between given and new referents in their spontaneous interaction with an experimenter in a game (Thorson & Morgan, 2021). Taken together, these findings suggest that the development of prosodic focus production begins early in toddlerhood, but it is a gradual course subject to differences between acoustic features and between languages.
In sum, previous studies have experimented with comprehension by children in laboratory settings to investigate the multiple domains of knowledge (syntax, semantics, prosody, discourse context) involved in their interpretations of only-type focus. There is a lack of research on how the FP is produced by young children in naturalistic environments, and on the role of language input in the acquisition of such multi-domain knowledge. To the best of our knowledge, Hracs (2021) is the only study investigating input effects on FP acquisition. Based on only-utterances in spontaneous speech of English monolingual children (0;3–9;9) and their caretakers, she showed that the frequency of only in the input significantly predicts the frequency of only in the children’s output. Interestingly, distributional patterns of only are also similar between child-directed speech (CDS) and child-produced speech (CPS): only occurs in utterance-medial and preverbal positions and is associated with the utterance-final constituent most frequently. These results suggest close relations between the production of only in CDS and CPS in English, whether the relations are causal or not.
2.3. Focus identification in bilinguals and complex grammatical structures in early bilingual development
While evidence for the effects of input reduction in bilingual lexical development is robust (e.g., Thordardottir, 2014), its effects on bilinguals’ grammatical development are less clear (e.g., Blom, 2010; Paradis et al., 2014; Paradis et al., 2017; Unsworth, 2014). It is possible that complex linguistic properties that involve multi-domain knowledge are the locus of monolingual versus bilingual differences, instantiating non-facilitative cross-linguistic influence (CLI) under input reduction conditions. Several studies have reported CLI in both syntactic and semantic domains in focus-sensitive particles. Leray (2009), for example, found that German-French bilingual preschoolers (2;6–4;6) predominantly positioned aussi “also” in French in the post-verbal position, among other possible positions, probably because it is the only position that is also possible for auch, the German counterpart of aussi. At the semantic level, the bilinguals associated aussi with the subject and the object almost equally, mirroring the association options available for German post-verbal auch, whereas the French monolinguals preferred an association between post-verbal aussi and the object.
Non-facilitative CLI in focus structures at the prosodic level is evident in adult bilinguals. Andorno & Turco (2015) examined additive FPs anche and auch (Italian and German equivalents of also) produced in subject-focused contexts by German-L1 Italian-L2 speakers and Italian-L1 German-L2 speakers. They found that both groups of bilinguals adopted prosodic patterns in line with their L1s and partially discarded language-specific patterns of the L2s. Ge et al. (2021) found that Dutch learners of English, whose L1 shares a similar representation of focus-accentuation mapping with English, utilized prosodic stress to disambiguate focus association in only-utterances. However, Cantonese-L1 English learners did not distinguish between different accentuation patterns in the same task, possibly due to non-facilitative CLI from Cantonese, in which prosodic cues play a less prominent role in focus marking than in English and Dutch.
Nevertheless, CLI can manifest itself as accelerated acquisition (“positive transfer”). The French-English bilingual children (4;3–7;1) in Korade et al. (2024), for example, demonstrated better-than-monolingual performance in production tasks assessing complex structures in English (i.e., wh-questions, passive, bi/multi-clausal sentences), despite reduced exposure to English and lower vocabulary scores in English. Crucially, their production of complex syntax in French was a significant predictor of that in English. These findings suggest that the impact of reduced language exposure can be mitigated by cross-linguistic reinforcement due to cross-language similarities.
In sum, early bilingual grammatical development is conditioned by both input effects and CLI, resulting in developmental patterns different from those of their monolingual peers. Research with various bilingual populations has revealed how complex structures involving multi-domain knowledge are susceptible to non-facilitative CLI and how structural similarities between languages can mitigate input reduction through facilitative CLI. The production of cross-linguistic equivalents only and zhi(you) by young bilinguals offers an ideal test ground to investigate the effects of CLI in early bilingual development.
3. Study 1
3.1. Research questions and predictions
While the comprehension of only-type focus poses challenges to children across languages, it remains unclear how bilingual children produce only-type focus in context. Our first study examines only and zhi(you) produced by Mandarin–English bilingual preschoolers in naturalistic interactions and addresses the following questions:
1. Production rates of FP: Do Mandarin–English bilingual preschoolers produce only and zhi(you)-structures significantly less frequently than age-matched English and Mandarin monolinguals due to reduced input in respective languages?
2. Quality of FP-utterances: Do the bilingual children produce only and zhi(you) in a full spectrum of syntactic positions with diverse semantic associations as allowed by respective languages (described in Table 1)? Will only appear in syntactic positions that are not allowed in English, yet possible for zhi(you) in Mandarin due to CLI from zhi(you), and vice versa?
3. Positioning and prosodic marking of focus: Do the bilingual children favour utterance-final focus in their only- and zhi(you)-utterances? If not, do they mark non-utterance-final focus with prosodic prominence (stress shift)?
Predictions. Previous research shows that the multi-domain knowledge required for focus identification in only/zhi(you) develops further at the preschool age (e.g., Notley et al., 2009; Zhou et al., 2012). Assuming that target-like production of only/zhi(you)-focus necessarily hinges on large amounts of input and experience in respective languages, it is possible that Mandarin–English bilinguals, whose input in each language is substantially reduced compared to age-matched monolinguals, will be less productive than the monolinguals in these structures. However, since only and zhi(you) overlap in important syntactic and semantic properties, it is expected that they will be perceived as cross-linguistic equivalents by the bilinguals, opening up opportunities for (non-)facilitative CLI similar to adult and child bilinguals in the literature (Ge et al., 2021; Leray, 2009). On the one hand, if facilitative CLI serves to mitigate input reduction, the bilinguals will display monolingual-like or even increased frequency of only-focus. On the other hand, if CLI is non-facilitative, non-target-like production of only/zhi(you) will occur. Although, theoretically, only can be mapped onto zhi, zhiyou or zhi(you) in the dual-language representation of the young bilinguals, the strongest cross-linguistic correspondence is probably established between only and zhi in the utterance-medial preverbal positions, given that only occurs in utterance-medial preverbal positions most frequently in both child-directed and child-produced speech (Hracs, 2021), and that this position is routinely occupied by zhi, rather than zhiyou, in Mandarin. In that case, non-target-like use of only in the pre-auxiliary position due to influence of zhi should be most evident among possible transfer-induced patterns in the bilinguals.
Regarding focus placement, a preference for utterance-final focus is expected in both only- and zhi(you)-utterances since it is the default focus position shared by the two languages, a position to which monolingual 3-year-olds in respective languages are sensitive (Chen et al., 2019; Hracs, 2021). In terms of stress shift for focus marking, we predict that the bilingual children are more likely to shift the prosodic stress through longer duration rather than higher pitch, since duration is reported to develop earlier than pitch-related features in Mandarin preschoolers (Yang & Chen, 2018) and in some studies examining spontaneous production in English toddlers (Thorson & Morgan, 2021).
3.2. Methods
To study the production of only/zhi(you)-focus in Mandarin–English bilinguals, we analysed only- and zhi(you)-utterances produced by four Mandarin–English bilingual children (Alex, Charles, Gabriel, Madison) from our newly established Hong Kong Mandarin–English Child Corpus (HKMECC, Mai et al., 2024). The children were born to Mandarin-speaking parents and raised bilingually in Mandarin and English in Hong Kong, with no or very limited exposure to Cantonese. Their bilingual development was documented longitudinally through monthly video recording of them interacting with a trained research assistant at home for 24 months (30 minutes per language per month; starting age ranging from 2;10 to 4;5). The parents of the children are originally from mainland China, speaking Mandarin Chinese as their native language and highly proficient in English. At the time of recruitment, the children were attending an international kindergarten, with English as the medium of instruction and daily hour-long Mandarin lessons as part of the curriculum. All children remained in the same kindergarten and/or advanced to the same international primary school except for Gabriel, who attended a Cantonese–English bilingual kindergarten for 12 months during the observation period. Overall, the children had substantial exposure to Mandarin much earlier than to English, and they used mostly Mandarin at home and English at school, with minimal exposure to Cantonese outside the home in three out of four cases. Given this, we consider them Mandarin-L1 English-L2 sequential bilingual children who develop their languages in a one context-one language setting during the study period. Mean Length of Utterance in words (MLUw) indicates that Alex was balanced between Mandarin and English with comparable MLUw (p = .57), whereas Charles, Gabriel, and Madison had greater MLUw in Mandarin than in English (p < .007).
Short biographies of individual children are included in the Supplementary Materials.
The Beijing Child Mandarin Corpus (BJCMC, Mai et al., 2024) served as the Mandarin monolingual baseline. BJCMC sampled naturalistic adult–child play at 16 age points with 3-month intervals between 3;0 and 6;9. Adopting a cross-sectional design, we recruited 48 children (three children per age point) and recorded them playing or talking with a trained research assistant in 30-minute sessions during home visits. As an inclusion criterion, the Beijing children were raised in Mandarin-dominant households and educated in Mandarin-medium kindergartens or primary schools in Beijing, and their exposure to English and other varieties of Chinese (dialects) was minimal. The average MLUw of the 48 children is 4.59, which is higher than the average Mandarin MLUw across individual bilingual children in HKMECC.
Several existing corpora of English children in CHILDES were selected as the English monolingual baselines. Priority was given to the corpora and cases which match our bilingual children in terms of age span (2;10–6;4), type of study (naturalistic play), frequency of recording (monthly), and socioeconomic status (mid-high). Average MLUw of the three children is 4.65, which is higher than the average English MLUw across individual bilingual children in HKMECC. Table 2 presents the key features of our dataset.
Table 2. Bilingual and monolingual child speech samples selected for Study 1. (Notes: HKMECC = Hong Kong Mandarin–English Child Corpus; BJCMC = Beijing Child Mandarin Corpus; RA = Research Assistant; MOT = Mother; MLUw = Mean Length of Utterance in words; * = significant difference in MLUw between languages in the same child at the 0.01 level (paired-sample t-tests))

Transcription and coding. For HKMECC and BJCMC, child and adult speech were transcribed following CHAT standards in CLAN (MacWhinney, 2000) by trained bilingual research assistants, and manually checked and revised for speech-to-text accuracy by the first author and another postgraduate student in linguistics. We extracted utterances containing only, zhi and zhiyou and their surrounding contexts (five preceding and two subsequent utterances), excluding incomplete utterances and child utterances that are direct repetitions of the preceding adult utterances.
Three levels of codes were manually assigned to each utterance (Figure 1): (i) syntactic position of FP; (ii) semantic association of FP (i.e., grammatical function of the focus), and (iii) positional and prosodic realization of the focus. The first two levels were coded based on the categories in Table 1. The third level was implemented through several steps: for positional means, we differentiated focus expressions which appear at the utterance-final position and thus do not require special prosodic stress on the intended focus (i.e., [+UF]) and those that do not appear in the utterance-final position and require stress shift (i.e., [−UF]). Prosodic analysis was conducted only on the [−UF] utterances. In Praat (Boersma & Weenink, 2023), the utterances were manually segmented into words (for English) and syllables (for Mandarin), which were annotated as either focal or non-focal elements based on the intended meaning. Here, we chose duration and mean pitch over other acoustic features (e.g., pitch range) as the acoustic indicators of stress in our study, because these two measures have been most intensively studied across existing investigations of prosodic focus marking in child and adult speakers of Mandarin or English in both laboratory and naturalistic settings (Breen et al., 2010; Chen et al., 2009; Nadig & Shaw, 2015; Patel & Brayton, 2009), which allows us to connect our findings with the broadest literature. For each utterance, the average duration of the focal elements was calculated by dividing the total duration of the focal elements by the number of syllables of the focal elements. The average duration of non-focal elements was calculated in the same way. The total durations of focal and non-focal elements were obtained in Praat.
For pitch, mean pitch values of individual words in English and those of individual syllables in Mandarin were extracted in Praat, and averaged across words (for English) or syllables (for Mandarin) in the focal and non-focal elements in an utterance. We then conducted Wilcoxon signed-rank tests in R (version 4.4.1; R Core Team, 2024) to compare the focal and non-focal elements within sentences in terms of average duration and pitch, following existing studies on prosodic focus marking in Mandarin and English (Breen et al., 2010; Chen et al., 2009). More details about our coding scheme are provided in Supplementary Tables S1–S3.
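The duration measure and the paired comparison described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the analysis script used in the study, which was run in R: the function names are hypothetical, the input values are invented, and only the signed-rank sums are computed, not the p-value that a full Wilcoxon signed-rank test reports.

```python
from typing import List, Sequence, Tuple


def mean_duration_per_syllable(total_duration_s: float, n_syllables: int) -> float:
    """Total duration of a set of elements divided by their syllable count."""
    if n_syllables <= 0:
        raise ValueError("n_syllables must be positive")
    return total_duration_s / n_syllables


def signed_rank_sums(focal: Sequence[float],
                     non_focal: Sequence[float]) -> Tuple[float, float]:
    """Return (W+, W-): rank sums of positive and negative paired differences.

    Zero differences are dropped and tied absolute differences share
    their average rank.
    """
    if len(focal) != len(non_focal):
        raise ValueError("paired samples must have equal length")
    diffs = [f - n for f, n in zip(focal, non_focal) if f != n]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks: List[float] = [0.0] * len(diffs)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next absolute difference is tied.
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[start]])):
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Invented values for illustration only (not data from the study):
# (focal total s, focal syllables, non-focal total s, non-focal syllables)
utterances = [(0.60, 2, 0.90, 4), (0.35, 1, 0.80, 4), (0.50, 2, 0.90, 3)]
focal_means = [mean_duration_per_syllable(ft, fs) for ft, fs, _, _ in utterances]
non_focal_means = [mean_duration_per_syllable(nt, ns) for _, _, nt, ns in utterances]
w_plus, w_minus = signed_rank_sums(focal_means, non_focal_means)
print(w_plus, w_minus)
```

A larger W+ than W- indicates that focal elements tend to be longer per syllable than non-focal elements within the same utterance; the same pairing applies to mean pitch. In practice, a statistical package would supply the accompanying p-value.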

Figure 1. Flowchart of coding procedures in Study 1 and Study 2.
3.3. Results
Production rates of the FPs. We extracted a total of 171 only-utterances, 147 zhi-utterances, and 116 zhiyou-utterances from the HK bilingual children, 115 zhi-utterances and 50 zhiyou-utterances from the Mandarin monolingual children, and 104 only-utterances from the English monolingual children (total token number = 703). We divided our developmental data spanning over 43 months into three periods of similar duration: 2;10–4;0 (Period I: 15 months), 4;1–5;0 (Period II: 12 months), and 5;1–6;4 (Period III: 16 months).
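The division into periods can be checked with a short sketch. The Python fragment below is illustrative only and assumes ages written in the years;months notation used in this paper; the function names are hypothetical. It converts ages to months, assigns a recording age to a period, and confirms the period lengths and the token total reported above.

```python
from typing import Optional

# Period boundaries as reported in the text (inclusive on both ends).
PERIODS = [("I", "2;10", "4;0"), ("II", "4;1", "5;0"), ("III", "5;1", "6;4")]


def age_to_months(age: str) -> int:
    """Convert a years;months age such as '2;10' to months."""
    years, months = age.split(";")
    return int(years) * 12 + int(months)


def period_of(age: str) -> Optional[str]:
    """Return the period label for an age, or None if it falls outside."""
    months = age_to_months(age)
    for label, start, end in PERIODS:
        if age_to_months(start) <= months <= age_to_months(end):
            return label
    return None


def period_length(start: str, end: str) -> int:
    """Number of monthly age points from start to end, inclusive."""
    return age_to_months(end) - age_to_months(start) + 1


lengths = {label: period_length(start, end) for label, start, end in PERIODS}
print(lengths, sum(lengths.values()))

# Token counts as reported in the text.
tokens = [171, 147, 116, 115, 50, 104]
print(sum(tokens))
```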
Mean production rate of only-utterances increased considerably in the bilingual children (0.23%, 0.35%, and 1.07% in Periods I, II, and III respectively). Although we do not have monolingual data of Period III as a benchmark, the production rates of the bilinguals in the two earlier periods are already very close to those of the monolinguals (0.16% and 0.38% in Periods I and II respectively). Examination of individual children’s data revealed that all bilingual and monolingual children, except for Gabriel, produced only-utterances increasingly frequently between the two periods. Since Gabriel is the only bilingual child in this dataset who attended a bilingual, rather than English-medium, kindergarten (3;0–4;0, Period I), his exceptional production pattern is probably attributable to his unique language exposure among the HK bilinguals.
In contrast, the production rate of zhi(you)-utterances displays different age-related changes between the bilingual and monolingual children. The Mandarin monolinguals were equally productive in Periods I and III (1.07% and 1.06%), although less so in Period II (0.76%). The curious dip at Period II is probably due to individual variation arising from the cross-sectional design, or factors not investigated in this study awaiting future investigations. The bilingual children, nevertheless, exhibited age-related increase in zhi(you)-utterances. Although in Period I, they produced zhi(you)-utterances less frequently than the monolinguals (Gabriel and Charles, 0.42% and 0.14% respectively), their production rates in Periods II and III (M = 1.01% and 1.32% respectively) increased steeply and were very close to those of the monolinguals.
Figure 2 displays the production rates of the FP-utterances by individual bilingual children and their monolingual peers in respective languages across three periods. Gabriel’s only-utterances were not included in this figure due to his unique pattern of English exposure and production as discovered above. Individual data are presented in full in Supplementary Table S4. In sum, despite a reduced proportion of input in both languages, the bilingual children produced only/zhi(you)-utterances increasingly frequently with age between 2;10 and 6;4, matching their respective monolingual peers by age 5.

Figure 2. Production rates of the focus particles in English (left) and in Mandarin (right) (proportions = percentages of only/zhi(you)-utterances among total utterances in individual children in the longitudinal cases, or among children within the age range in the cross-sectional Beijing corpus; error bars indicate standard deviation).
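The production rate plotted in Figure 2 is a simple percentage, and the group mean and standard deviation follow from the per-child rates. The sketch below is a hedged Python illustration: the child labels and counts are invented, the function name is hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator underlies the error bars.

```python
from statistics import mean, stdev


def production_rate(fp_utterances: int, total_utterances: int) -> float:
    """Percentage of FP-utterances among all utterances."""
    if total_utterances <= 0:
        raise ValueError("total_utterances must be positive")
    return 100.0 * fp_utterances / total_utterances


# Invented counts for illustration only (not data from the study):
# child label -> (FP-utterances, total utterances) within one period
counts = {"child_A": (12, 1200), "child_B": (9, 1500), "child_C": (20, 1600)}

rates = {child: production_rate(fp, total) for child, (fp, total) in counts.items()}
mean_rate = mean(rates.values())
sd_rate = stdev(rates.values())  # sample SD; an assumption, see above

for child, rate in rates.items():
    print(f"{child}: {rate:.2f}%")
print(f"M = {mean_rate:.2f}%, SD = {sd_rate:.2f}")
```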
Syntactic position of the FPs. Given that Gabriel exhibited a different pattern from the other bilingual children in both English exposure and the production rates of only-utterances (only six tokens recorded), as described in the last section, we excluded Gabriel from the analysis of the quality of English FP from this point on. His zhi(you)-utterances in Mandarin (54 tokens), on the other hand, were retained. Figure 3 shows the mean percentages of only, zhi, and zhiyou in different positions in the bilingual and monolingual children (see Supplementary Table S5 for individual performance). Overall, the bilingual children matched their monolingual peers in respective languages regarding the syntactic distribution of the FPs. Among several syntactic positions available to only, the pre-lexical verb position (PreLexV) is the preferred position for only in both monolinguals and bilinguals (M = 37.02% and 42.96%, respectively, individual proportions ranging from 23.81% to 58.33%), consistent with previous findings with English children (Hracs, Reference Hracs2021). For the Mandarin FPs, the bilingual children, like their monolingual counterparts, produced zhi and zhiyou in complementary syntactic positions. Zhi appeared almost exclusively in the pre-auxiliary (PreAux, M = 55.93%) and pre-lexical verb (PreLexV, M = 42.68%) positions, while zhiyou appeared in pre-subject, pre-NP, and verb positions.

Figure 3. Mean percentages of only, zhi and zhiyou appearing in different syntactic positions in bilingual and monolingual children (percentages calculated by dividing the number of tokens of an FP in a given position by the total number of tokens of that FP).
Interestingly, the bilingual children produced 11 tokens of pre-auxiliary only (5 by Madison, 1 by Alex, 5 by Charles, shown in (3)a and (3)b), a structure rarely produced by the English monolinguals (one token in total). Among all only-utterances in which the FP only co-occurs with modal verbs (5 tokens by the monolinguals, 22 by the bilinguals), four out of five tokens by the monolingual children follow the [AuxV-only] sequence, as shown in (3)c, while only 50% of the only-utterances produced by the bilinguals appear in this sequence.

Semantic focus association. Figure 4 shows the grammatical functions of the intended focus in only-, zhi-, and zhiyou-utterances based on group means (individual data are provided in Supplementary Table S6). As expected, in more than 50% of the only- and zhi-utterances, the focus is (part of) the object (“Obj” in Figure 4) in both the monolinguals and the bilinguals. For zhiyou, it was commonly associated with the object in [S-zhiyou-O] sentences in which zhiyou behaves like the main verb (M = 36.60%, 28.00% in bilinguals and monolinguals respectively), or with the NPs in [zhiyou-NP] sequences (M = 34.07%, 40.00% in bilinguals and monolinguals respectively). We also found six instances of zhi associated with a subject focus produced by the monolinguals (1.74%; two tokens) and bilinguals (3.25%; three by Gabriel, one by Alex). All of them were grammatical [zhi-shi-SFVO] sequences, in which zhi precedes the copula shi “be.” Narrow focus on the verb was very rare in both languages.

Figure 4. Focus association of only-, zhi-, and zhiyou-utterances in bilingual and monolingual children (numbers showing percentages of tokens of a given focus position among total tokens of the relevant focus particle in the relevant group).
Positioning and prosodic marking of focus. For the only-utterances, 66 out of 152 tokens (43.42%) produced by the bilinguals had [−UF] focus. This proportion was stable across the three individual bilingual children (Min = 38.71%, Max = 45.45%, M = 42.70%). The English monolinguals displayed a different pattern, producing a higher proportion of [−UF] focus in their only-utterances (M = 51.33%). Since the zhi- and zhiyou-utterances are not expected to behave differently in this respect, we combined them in this analysis. Results show that both the bilingual and monolingual children produced larger proportions of [+UF] focus (M = 61.33% and 67.81% respectively) than [−UF] focus in their zhi(you)-utterances. Figure 5 illustrates the relative proportions of [−UF] and [+UF] focus in only/zhi(you)-utterances by the children. Individual data are provided in Supplementary Table S7.

Figure 5. Positional marking of focus in only- (left) and zhi(you)-utterances (right) by bilingual and monolingual children (UF = Utterance-Final; BL = bilingual, ML = monolingual; numbers in bars showing percentages of different positions of the focus in total only/zhi(you)-utterances by the relevant speaker).
As described in Figure 1, the prosodic analysis was conducted with the [−UF] utterances produced by the bilingual and Mandarin monolingual children (237 tokens). In the absence of audio files of the English children, we were unable to conduct the same analysis on the English monolingual controls. After removing 88 utterances with excessive noise and speaker overlap, a total of 149 utterances remained in the duration analysis (44 only-utterances and 67 zhi(you)-utterances by bilinguals; 38 zhi(you)-utterances by monolinguals). For the pitch analysis, utterances containing segment(s) for which Praat returned undefined pitch were further excluded, leaving a smaller set of utterances (107 tokens in total, including 35 only- and 41 zhi(you)-utterances by bilinguals; 31 zhi(you)-utterances by monolinguals). Figure 6 presents duration and pitch patterns in those analysable only/zhi(you)-utterances with [−UF] focus. Wilcoxon signed-rank tests showed that the duration of the focal part was significantly longer than that of the non-focal part in both languages of the bilinguals (English: Z = 5.12, p < .001, r = .77; Mandarin: Z = 3.63, p < .001, r = .44) and in the Mandarin monolinguals (Z = 2.94, p = .003, r = .48). However, no significant differences in pitch were found in either language or group (ps > .05). The numerical results are provided in Supplementary Table S8.

Figure 6. Mean duration (left) and average of mean pitch (right) of focus and non-focus parts of FP-utterances by children (BL = bilingual, ML = monolingual; numbers in bars indicate mean values; error bars indicate standard deviation; ** indicates p < .01 in Wilcoxon signed-rank tests).
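The effect sizes reported for the duration analysis can be related to the Z statistics with a short sketch. It assumes the common convention r = |Z| / √N, with N taken to be the number of utterances in each duration analysis (44, 67 and 38); the text does not state which formula or N was used, so this is an illustration under that assumption and not a description of the original computation.

```python
import math


def effect_size_r(z: float, n: int) -> float:
    """Effect size r = |Z| / sqrt(N) (assumed convention, see lead-in)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return abs(z) / math.sqrt(n)


# (Z, N) pairs as reported for the duration analysis in Study 1;
# treating the token counts as N is an assumption.
reported = {
    "bilingual English": (5.12, 44),
    "bilingual Mandarin": (3.63, 67),
    "monolingual Mandarin": (2.94, 38),
}

effect_sizes = {
    label: round(effect_size_r(z, n), 2) for label, (z, n) in reported.items()
}

for label, r in effect_sizes.items():
    print(f"{label}: r = {r:.2f}")
```

Under this assumption, the three values round to .77, .44 and .48, matching the effect sizes reported for the duration analysis.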
3.4. Summary
In Study 1, we analysed 703 only- and zhi(you)-utterances produced by bilingual and monolingual children in naturalistic adult–child play settings. Our results showed that the production rates of FP-utterances in bilingual children increased with age (Figure 2). Remarkably, despite reduced input and generally lower proficiency in each language (as indicated by MLUw shown in Table 2), their production rates of only were comparable with their monolingual counterparts at the earliest stage of development. Meanwhile, their production rates of zhi(you), despite initially lagging behind their monolingual peers, caught up by age 5. These findings answered our first research question and are consistent with our predictions that facilitative cross-linguistic influence may mitigate reduced input and exposure in bilingual children.
To answer the second question, we examined the quality of the focus-utterances in terms of syntactic positioning and the semantic association of the FP. The results (Figures 3, 4) showed that the bilinguals produced only and zhi(you) in a full spectrum of syntactic positions with diverse semantic associations as allowed by respective structures in specific languages. The proportions vary considerably across syntactic positions and focus association patterns, with only/zhi occupying the pre-verbal positions (PreAux and PreLexV) and semantically associated with the object in more than 50% of the tokens, while zhiyou displays a drastically different distributional pattern. We hypothesize that only and zhi are perceived as structural equivalents in the Mandarin–English bilingual grammar, which paves the way for cross-linguistic transfer between the two particles. In the bilinguals’ production data, there is strong evidence of cross-linguistic influence from zhi to only, as illustrated in (3). That said, not all properties of only and zhi are equally transferable to each other. Our data clearly show that, like the monolingual children, the bilingual children produced a good number of tokens of only in pre-subject, pre/post-object and pre/post-NP positions, which are impermissible positions for zhi in adult Mandarin, and we did not find them placing zhi in these positions. We will return to this issue in our general discussion.
Finally, regarding the focus-marking strategies examined in our third research question, as expected, in about half or more of the only- or zhi(you)-utterances, the intended focus appeared in the utterance-final position, and this pattern is found in the bilingual children and their English and Mandarin monolingual peers. Interestingly, this common preference for utterance-final focus seems to be strongest in Mandarin monolingual children (68%), and weakest in English monolingual children (49%), with both languages of the bilingual children falling in between (57% and 61% for only and zhi(you) respectively), indicating convergence of language use between languages in the bilinguals. For the [−UF] utterances, our acoustic analysis found that the bilinguals, like their Mandarin monolingual peers, used longer duration, instead of higher pitch, to indicate intended focus in both languages. The asymmetrical usage of the two acoustic features aligns with our prediction and with previous findings with Mandarin and some English children (Patel & Brayton, Reference Patel and Brayton2009; Thorson & Morgan, Reference Thorson and Morgan2021; Yang & Chen, Reference Yang and Chen2018).
Overall, the results of Study 1 have provided the first evidence of productive use of only/zhi(you)-focus by Mandarin–English bilingual preschoolers in naturalistic settings, indicating facilitative and non-facilitative influence between only and zhi. We have also found an expected strong tendency to disambiguate only-type focus identification through word order devices rather than by prosodic stress shifting in the production of the bilingual children, which is consistent with previous observations of adult Mandarin (e.g., Xu, Reference Xu2004). However, it is not clear to what extent the production patterns in children can be accounted for by the provision of focus in parental input during the early years. This motivated Study 2, which examines parental input directed to Mandarin-learning bilingual children.
4. Study 2
4.1. Research question and prediction
Study 2 followed up on the finding of focus marking patterns in the bilingual children in Study 1 and investigated how Mandarin-speaking parents of bilingual children provide zhi(you)-utterances in child-directed speech in context. Although a close input–outcome relation at the syntactic level has been observed with only in English in Hracs (Reference Hracs2021), how zhi(you)-focus is provided in the parental input directed to Mandarin-speaking children has not been investigated, whether in bilingual or monolingual contexts. In this study, we asked: Through what linguistic means is the intended focus of zhi(you)-utterances provided and disambiguated in the Mandarin input directed to Mandarin–English bilingual children? If the patterns found in the Mandarin–English bilingual children’s production have roots in the input they receive, we expect to find similar patterns of focus realization in the parental input. That is, zhi(you)-focus in Mandarin-speaking parents will be realized as [+UF] to a great extent, and prosodic prominence will not be employed extensively to indicate the focus.
4.2. Methods
We conducted an exhaustive search in CHILDES and included parental input data in the recordings of two Mandarin–English bilingual children in the US (Luna, Sophie) from the Child Heritage Chinese Corpus (CHCC, Mai & Yip, Reference Mai and Yip2017). The parents of Luna and Sophie are native speakers of Mandarin who grew up in China and arrived in the US in young adulthood. The children heard mostly Mandarin in the home context and English in the school context. We chose parental input in CHCC, rather than child-directed speech in HKMECC (details in Study 1), for this analysis because the adult interlocutors in HKMECC were research assistants rather than parents. Given the similarities between the CHCC and HKMECC children in terms of the parents’ background (Chinese-L1 first-generation immigrants; mid-high socioeconomic status) and the children’s bilingual acquisition contexts (Mandarin at home; English at school), input provided by the two parents in CHCC should approximate the input directed to the bilingual children in Hong Kong. We also included input provided by parents to monolingual children in mainland China as a reference (n = 2, Deng & Yip, Reference Deng and Yip2018; Zhang & Zhou, Reference Zhang and Zhou2009). Table 3 details the parental input data.
Table 3. Parental input in Mandarin selected for Study 2. (Notes: MOT = Mother)

We followed the same steps described in Figure 1 for Study 1 to extract and pre-process the zhi(you)-utterances in the parental input data, except that we did not remove repetitions from the parental input, as repetitions constitute part of the input directed to the child. For each target utterance, we coded positional marking and prosodic marking following the procedures described in Study 1. Additionally, since contextual information plays an important role in identifying the focus, especially for children (Gualmini et al., Reference Gualmini, Maciukaite and Crain2003; Zimmermann & Onea, Reference Zimmermann and Onea2011), we coded whether alternative sets of the intended focus have been explicitly instantiated in the surrounding discourse of the target utterances (i.e., within five preceding and two subsequent utterances).
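The discourse window used for coding alternative sets (five preceding and two subsequent utterances) can be sketched as a simple slicing operation over a transcript. The function name and the toy transcript are hypothetical; the judgement of whether an alternative set is explicitly instantiated would still be made by a human coder on the returned utterances.

```python
from typing import List, Sequence


def context_window(
    utterances: Sequence[str],
    target_index: int,
    preceding: int = 5,
    following: int = 2,
) -> List[str]:
    """Return the utterances surrounding the target, excluding the target.

    The window is clipped at the start and end of the transcript.
    """
    if not 0 <= target_index < len(utterances):
        raise IndexError("target_index out of range")
    start = max(0, target_index - preceding)
    end = min(len(utterances), target_index + following + 1)
    return [
        utterance
        for i, utterance in enumerate(utterances[start:end], start=start)
        if i != target_index
    ]


# Toy transcript with placeholder utterance labels (illustration only).
transcript = [f"u{i}" for i in range(10)]
window = context_window(transcript, target_index=6)
print(window)
```

For a target at position 6 in this toy transcript, the window contains utterances u1 to u5 and u7 to u8; near the edges of a recording the window is simply shorter.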
4.3. Results
Table 4 presents the descriptive statistics of 161 tokens of zhi(you) extracted from the parental input. Parents produced zhi(you) quite infrequently (M = 0.32%). To put this relative frequency in perspective, we searched for another well-studied FP, ye (the Mandarin equivalent of also), for comparison. Results show that zhi(you) appears considerably less frequently than ye (M = 1.81%) across the bilingual and monolingual cases.
Table 4. Zhi(you) in parental input to bilingual and monolingual children. (Notes: % = raw tokens of zhi(you) or ye (“also”) divided by total utterances)

As shown in Figure 7, the majority of the target utterances in parental input had [+UF] focus (M = 70.15%, SD = 9.38%), and cases with the [−UF] focus accounted for a smaller proportion (M = 29.85%, SD = 9.38%). Furthermore, the alternative sets were explicitly provided in over half of the cases across families (M = 59.09%, SD = 4.79%) (see detailed results in Supplementary Table S9). Among the 48 utterances with [−UF] focus produced by the parents of Luna, Sophie and Tong (Xue’s utterances were excluded from this analysis due to the absence of audio recordings), a total of 42 utterances were eligible for durational analysis and 32 for pitch analysis. Wilcoxon signed-rank tests showed that the focal parts have significantly longer duration (M = 182.33 ms, Z = 3.84, p < .001, r = .59) and higher pitch (M = 262.53 Hz, Z = 3.75, p < .001, r = .58) than the non-focal parts in Mandarin parental speech (duration: M = 142.15 ms; pitch: M = 217.74 Hz, detailed results in Supplementary Table S8).

Figure 7. Positional and contextual marking of focus in parental input in Mandarin (UF = Utterance-Final; CSA = Contrastive Set Available; numbers in bars showing percentages of different positions of the intended focus).
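For readers unfamiliar with the test, the Z statistic of a Wilcoxon signed-rank comparison between focal and non-focal parts can be sketched with the large-sample normal approximation. This is a simplified illustration on hypothetical duration values: it applies no continuity or tie correction, so statistical packages may return slightly different values, and it is not a description of the software or settings used in the study.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def wilcoxon_z(focal: Sequence[float], non_focal: Sequence[float]) -> Tuple[float, int]:
    """Return (Z, n) using the normal approximation, after dropping zero differences."""
    if len(focal) != len(non_focal):
        raise ValueError("paired samples must have equal length")
    diffs = [f - nf for f, nf in zip(focal, non_focal) if f != nf]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(rank for rank, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w_plus - mean_w) / sd_w, n


# Hypothetical per-utterance mean durations in ms (not data from the study).
focal_ms = [190, 175, 205, 160, 182, 198]
non_focal_ms = [150, 170, 160, 165, 140, 150]
z, n_pairs = wilcoxon_z(focal_ms, non_focal_ms)
print(f"Z = {z:.2f}, n = {n_pairs}")
```

A positive Z indicates that the focal parts tend to be longer than the paired non-focal parts; with so few hypothetical pairs the normal approximation is only indicative.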
4.4. Summary
Our results showed that Mandarin-L1 parents who addressed Mandarin–English bilingual children in Mandarin produced around four tokens of zhi(you)-utterances per 1000 utterances. This rate is not lower than that of parents of Mandarin monolingual children in China (2.5 tokens per 1000 utterances). The parents utilized a range of linguistic means to disambiguate the focus of zhi(you)-utterances, which include those found earlier in the bilingual children in Study 1 (i.e., rendering the intended focus in the utterance-final position and using longer duration to mark the [−UF] focus) and a prosodic means not yet employed consistently by the preschoolers (i.e., marking the [−UF] focus with higher pitch). The parents also drew quite frequently upon surrounding contextual information in discourse to scaffold focus meaning (around 60% across individual parents). Overall, the intended focus in the zhi(you)-utterances provided by the Mandarin-speaking parents is highly transparent, marked by multiple linguistic cues.
5. General discussion
5.1. Cross-linguistic influence and input reduction in focus production
Our investigation of only and zhi(you)-focus in four Mandarin–English bilingual children was motivated by similarities and differences between the FPs in Mandarin and English and the hypothesis that FPs are susceptible to input reduction and cross-linguistic influence (CLI) in early bilingual development. Although only/zhi(you)-focus structures are generally infrequent in the input, they are associated with the focus of the utterance (which is, by definition, the centre of attention) and moderate the propositional assertion of the utterance, carrying heavy semantic weight and important communicative functions. In terms of phonological and morphological forms, both only and zhi(you) are stand-alone words and appear in invariant forms across contexts in the respective languages. In this light, they qualify as salient and prominent elements in the input (Goldschneider & DeKeyser, Reference Goldschneider and DeKeyser2001), if one adopts a fine-grained and weighted view of input processing. Our analysis assumes that for young children, the acquisition of the multi-domain knowledge associated with only/zhi(you)-focus relies on significant amounts of experience demonstrating relevant usages of the FPs in meaningful contexts, likely moderated by cognitive factors, whether they are innate or shaped by the environment during the early years. If this analysis is on the right track, then bilingual children face the challenge of input reduction in the acquisition of only/zhi(you)-focus, which should lead to slower development than in monolingual children, unless it is mitigated by positive CLI from the other language.
While the underlying mechanisms and conditions of CLI remain topics in current theoretical and empirical debates, it is generally accepted that bilinguals develop shared syntactic representations of similar structures across two languages, with robust evidence from cross-language priming studies: in bilingual production, once the intended meaning has been prepared at the conceptual level, relevant linguistic structures sharing syntactic representations in the two languages are co-activated (e.g., Bosch & Unsworth, Reference Bosch and Unsworth2021; Unsworth, Reference Unsworth2023; Vasilyeva et al., Reference Vasilyeva, Waterfall, Gámez, Gómez, Bowers and Shimpi2010).
Cross-linguistic influence plays both facilitative and non-facilitative roles in driving bilingual development. Both roles are evident in our data. On the positive side, we found that the bilinguals who were consistently exposed to large amounts of English input produced only- and zhi(you)-utterances increasingly frequently with time. Their production rate of only was comparable to that of monolinguals at the earliest stages of development; and the rate of zhi(you), despite some delay initially, soon caught up. They produced only and zhi(you) in a full spectrum of syntactic positions with diverse semantic associations as allowed by respective structures in specific languages. Since the bilingual children did not have as much input as the monolingual controls, and there is no reason to suspect that the bilingual children in our corpus had access to higher quality input than typical monolingual children, we have accounted for the faster-than-predicted development of the FPs in the bilingual children by facilitative cross-linguistic mapping between zhi and only in the bilingual grammar. Given the nature of our data, it is possible that some children had mastered the multi-domain knowledge of the FP structures, but avoided using them. However, this does not undermine our comparison between the bilinguals and the monolinguals, because both the bilingual and monolingual FP-utterances were extracted from naturalistic adult–child play interactions.
Here, we further discuss the mapping among zhi, zhiyou in Mandarin and only in English. Among all possible cross-linguistic correspondences between zhi and zhiyou on the one hand and only on the other (Table 1), we predicted the strongest CLI effects between zhi and only in the utterance-medial preverbal positions, in contrast with the others. This prediction is confirmed by the strong preference for this position in both zhi and only (Figures 3, 4), as well as the non-target-like [only-AuxV] pattern in (3) in three bilingual children. Notice that this prediction was made based on both syntactic availability (i.e., both only and zhi can appear preverbally), and distributional patterns in usage (i.e., preverbal only with an object focus interpretation constitutes the majority of the only-utterances in CDS and CPS in Hracs (Reference Hracs2021)). Predictions based on syntactic availability alone would not have correctly differentiated CLI effects among different positions. Our study, thus, highlights the importance of predicting CLI based on both syntactic options and usage distributions.
Comparing our findings with previous ones, on the one hand, our study agrees with the study of French–English bilingual children in Korade et al. (Reference Korade, Nicoladis and Charest2024), which reported bilingual advantages in complex syntax production, as our Mandarin–English bilinguals produced focus particles with comparable frequency and quality to monolingual peers. On the other hand, our Mandarin–English bilingual children performed better in attuning to the language-specific properties of the exclusive FPs than German–French bilingual children in Germany in producing additive FPs. In the latter, their production of the FP in French was very much constrained by the corresponding FP in German (Leray, Reference Leray, Dufter and Jacob2009). Although our study and Leray (Reference Leray, Dufter and Jacob2009) overlap in terms of the age of the children (age 2–4) and the nature of the data (naturalistic production in adult–child interactions), there are important differences in the type of bilingualism investigated: our children achieved bilingualism in a one context-one language setting in a multilingual society, whereas the children in Leray (Reference Leray, Dufter and Jacob2009) were raised following the one parent–one language principle in a German-dominant society. Those differences may have led to the different language dominance patterns and focus acquisition outcomes between the two groups of bilingual children. An advantage of our children is that in our case, the L1 FPs (zhi and zhiyou) and the L2 FP (only) constitute a “many-to-one” relation in terms of syntactic position and semantic function, which is easier to compute in L1-to-L2 transfer than those “one-to-many” cases, in which different functions of an L1 item are distributed across various items in the L2. This may also have contributed to the faster development of only in our bilingual children.
5.2. The impact of lexical tone and sentential prosody on prosodic stress production: follow-up analyses
As an exploratory study to provide the first bird’s-eye view of multi-domain exclusive focus marking in bilingual children and their parents during naturalistic interactions, we conducted relatively coarse-grained analyses of prosodic focus marking, concentrating on duration and mean pitch measures. Due to the nature of our data, we were unable to control for Mandarin lexical tone in our analysis of zhi(you)-utterances, nor did we consider the impact of sentential prosody or distinguish between pre- and post-focal strings among the non-focal elements when comparing them with the focal elements. Nevertheless, here we present two follow-up analyses to shed more light on the impact of lexical tone and sentential prosody on our findings and to motivate future research.
In the first analysis, we evaluated whether the longer duration of focal parts in zhi(you)-sentences found across groups in our study is attributable to larger proportions of a “longer” tone, and whether the absence of raised pitch in the children and the presence of the same effects in the parental input are associated with different distributions of a “higher” tone across the two samples. We found that in the 147 zhi(you)-utterances in the durational analysis, Tone 4, which is a falling tone with a shorter duration (Wu et al., Reference Wu, Adda-Decker and Lamel2023), was (one of) the most frequent tonal category across parental input and both groups of children (above 30%). This suggests that our findings regarding duration could not have been an artifact of a distributional bias towards the inherently “longer” tones. We also found that among the 104 sentences in the pitch analysis, the four lexical tones showed a balanced distribution in the focal parts in the parental input (between 23% and 26%). However, Tone 1, which is associated with higher mean pitch, was the least frequent across groups in the children (9.38%). To further investigate this, we selected subsets of zhi(you)-utterances from the children, ensuring balanced distribution (25%) of the four tones in the focal parts of the selected utterances and reanalysed the utterances for mean pitch. The results revealed that neither group of children produced focus with a statistically significant increase in mean pitch (ps > .30). Overall, this follow-up analysis supports our findings in Studies 1 and 2. More details of this analysis are in Supplementary Tables S10 and S11.
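The tone-balanced reanalysis described above can be illustrated with a minimal sketch of one possible selection procedure: downsampling each tone category to the size of the rarest one, so that each tone contributes 25% of the subset. The text does not specify how the subsets were drawn, so the random downsampling, the function name, and the token labels are all assumptions made for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def balanced_subset(
    tokens: Sequence[Tuple[str, int]], seed: int = 0
) -> Dict[int, List[str]]:
    """Select equally many utterances per lexical tone (1-4).

    Each token is (utterance_id, tone_of_focal_part). Every tone is
    downsampled at random to the size of the least frequent tone.
    """
    by_tone: Dict[int, List[str]] = defaultdict(list)
    for utterance_id, tone in tokens:
        by_tone[tone].append(utterance_id)
    if set(by_tone) != {1, 2, 3, 4}:
        raise ValueError("all four tones must be represented")
    per_tone = min(len(ids) for ids in by_tone.values())
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    return {
        tone: sorted(rng.sample(ids, per_tone))
        for tone, ids in sorted(by_tone.items())
    }


# Hypothetical tone labels for 17 utterances (not data from the study).
tone_counts = {1: 2, 2: 5, 3: 4, 4: 6}
toy_tokens = [
    (f"t{tone}_{i}", tone) for tone, count in tone_counts.items() for i in range(count)
]
subset = balanced_subset(toy_tokens)
print({tone: len(ids) for tone, ids in subset.items()})
```

In this toy example Tone 1 is the rarest category with two tokens, so two utterances are retained per tone and each tone makes up a quarter of the subset.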
In the second follow-up analysis, we conducted a finer-grained examination and assessed the extent to which the pre/post-focus status of the non-focal elements had obscured our findings, since syllables may become lengthened and mean pitch may decrease as the utterance naturally unfolds (Vaissière, Reference Vaissière1983). Results of non-parametric Friedman’s ANOVA and follow-up pairwise Conover’s tests comparing pre-focal, focal, and post-focal parts revealed two crucial patterns. (i) For duration, across languages and groups of speakers, the focal elements are not shorter than the post-focal elements (ps > .8), suggesting that duration is manipulated on the focal parts on top of sentential prosody to shift the stress. (ii) For mean pitch, the focal elements are not lower than the pre-focal elements only in the parents (p = .67), revealing that the parents manipulate pitch to increase the prosodic prominence of the focal parts. The bilingual children, meanwhile, produced the focal elements with lower mean pitch than the pre-focal elements (ps < .01), adhering to general sentential prosody without raising the mean pitch of the focal parts, and the mean pitch difference between pre-focal and focal elements was marginally significant in the monolingual children (p = .09). Further studies should explore whether the use of pitch for prosodic focus marking is subject to bilingual versus monolingual differences in Mandarin-learning children. Details are presented in Supplementary Figure S1.
Put together, our follow-up analyses show that the findings from our coarser-grained analyses are not affected by the natural distribution of tone and sentential prosody, at least for the speech samples in this study. Since this is a first attempt to analyse multi-domain focus marking in a low-frequency FP-construction using naturalistic data, we await future replications (e.g., including a sufficiently large number of analysable FP-tokens such that lexical tone and pre/post-focal status can be entered as predictors in linear regression models) and discussions on standard data analysis procedures for similar studies.
5.3. Development of focus expression: age, parental input, or both?
Both comprehension and production of focus require multi-domain linguistic knowledge. For focus-sensitive quantification involving only, previous studies have found that some knowledge of the FP is accessible to children at a young age. This includes early mastery of default focus position in both English and Mandarin monolingual children at age 3 (e.g., Crain et al., Reference Crain, Ni and Conway1994; Yang, Reference Yang2002), as well as the ability to activate the alternative set of the intended focus at 5 years old (e.g., Notley et al., Reference Notley, Zhou, Crain and Thornton2009). Our analysis of Mandarin input in Study 2 suggests that this focus knowledge in young children aligns with how focus is provided and disambiguated in the parental input. That is, in the majority of the zhi(you)-utterances, the intended focus is placed in the default focus position, and over half of the zhi(you)-focus instances were accompanied by discourse contexts in which the alternative set of the intended focus is provided explicitly in the surrounding context. A consequence is that for children, there exist few contexts in which understanding focus necessarily hinges upon complex syntactic computation of the construction, or integration of shifted stress. Although child-internal factors such as immature syntactic knowledge (Notley et al., Reference Notley, Zhou, Crain and Thornton2009), limited cognitive ability, and children’s not-yet-automatic mapping between focus and phonology (Zhou et al., Reference Zhou, Su, Crain, Gao and Zhan2012) may also play a role, our investigation of zhi(you)-focus in the parental input provides an additional account for the less developed syntactic computation and prosodic stress integration in focus structures among preschoolers.
Within the multiple domains of knowledge required for focus identification, the integration of prosodic prominence to indicate intended focus is not fully fledged in our bilingual or Mandarin monolingual preschoolers. Our findings showed that when shifted stress is needed, the preschoolers prosodically mark the intended focus through increased duration, rather than raised pitch. This is particularly interesting when we take into consideration the presence of both greater duration and raised pitch in the parental input. Clearly, prosodic marking in only/zhi(you)-focus develops particularly slowly compared to other aspects of focus knowledge and displays discrepancies from patterns in the input until a much later age. The development of adult-like competence in prosody for communication is a gradual process, despite its early onset (Chen et al., Reference Chen, Esteve-Gibert, Prieto, Redford, Gussenhoven and Chen2020).
A natural question is how integration of focus-related prosodic knowledge becomes available later in life and automatic in adulthood. One possibility is child-internal: recent findings using functional near-infrared spectroscopy neuroimaging techniques revealed that compared with children (age 6–12), adults showed stronger activity in the left prefrontal regions (the regions specialized for phonological and morphosyntactic processes) when judging English sentences with different prosodic patterns (Pasquinelli et al., Reference Pasquinelli, Tessier, Karas, Hu and Kovelman2023). We postulate that the ability of focus-accentuation mapping may be part of the biological programming of language development and necessarily hinges on age-related specialization of language networks. Additionally, age-related physiological changes such as gradual lengthening of the vocal tract and pharyngeal cavity, lowering of the larynx, and widening of the gap between the velopharynx and epiglottis are also likely to be responsible for the increasingly reliable control of prosody (Chen et al., Reference Chen, Esteve-Gibert, Prieto, Redford, Gussenhoven and Chen2020; Ito, Reference Ito, Prieto and Esteve-Gibert2018; Patel & Brayton, Reference Patel and Brayton2009).
Another possibility is child-external and input-driven: since child-directed speech exhibits a simplified pattern, with shorter utterances and simpler sentence frames adapted to the child interlocutor’s age and proficiency (van Dijk et al., 2013), parents of older children may produce FPs more frequently and with greater structural complexity, so that appropriate focus identification requires intensive computation of syntactic relations and shifted stress. Such age-related changes were documented for English child-directed speech by Hracs (2021). However, due to the scarcity of systematic audio data of naturalistic Mandarin input directed to preschool-age children, we leave this question regarding Mandarin child-directed speech for future investigation.
6. Conclusions
Although this study is limited in the number of children and analysable tokens of the target structure, it is one of the first to analyse the spontaneous production of the focus particles (FPs) only and zhi(you) by bilingual children, together with the relevant parental input, with reference to matched monolingual baselines. One innovation is the multi-domain analysis we performed in order to understand only-type focus marking, which had not previously been conducted within a single set of production data for either English-speaking or Mandarin-speaking children.
A key finding of this study is the monolingual-like productivity of the FPs in both languages of the bilingual children, despite their limited experience with each language, as well as the Mandarin-to-English transfer attested in the most frequent sequence among all possible sequences involving only or zhi. This finding enriches our understanding of the selective nature of cross-linguistic influence (CLI) and of the factors conditioning transfer when both languages are still developing. Recent studies, including our own, have observed that CLI at the grammatical level can play an important role in mitigating the effects of reduced input in each language for bilingual children. Future research can continue to investigate how to accurately predict and effectively nurture facilitative CLI effects in preschool-age children.
Another important finding is that Mandarin-speaking parents of bilingual children, like those of age-matched monolingual children, draw upon multiple linguistic means to mark the intended focus, including the utterance-final default focus position, contrastive referents in the discourse context, and prosodic prominence. These cues render the task of focus interpretation and disambiguation simpler and more straightforward for young children than one might presume. We suspect that the relatively late development of complex syntactic and prosodic computation and integration in FP-utterances reported in previous experimental studies may be closely related to how focus is provided in the input directed to young children, as well as to child-internal neurocognitive processes. Our study provides a basis for future experimental studies that seek to disentangle these factors, and contributes a critical yet long-overdue step towards solving the focus puzzle.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S030500092510024X.
Acknowledgements
We thank the participating families for their trust and support. The findings of this article were discussed in a number of internal seminars at City University of Hong Kong and The Chinese University of Hong Kong, and presented at several venues, including the 14th International Symposium on Bilingualism (ISB14), the 10th International Conference on Formal Linguistics (ICFL-10), and the XVIth International Congress for the Study of Child Language (IASCL2024). We are tremendously grateful for the intellectual support and technical assistance from our mentors and colleagues, especially Virginia Yip, Stephen Matthews, Boping Yuan, Jackie Lai, Jiangling Zhou, Mengyao Shang, Yuqing Liang, Xuening Zhang, Jiaqi Nie, Ranee Cheng, Yue Chen, Yue Cao, Yang Zhao, and Shanshan Yan. We are deeply indebted to Peggy Mok, Karen So, Zhenting Liu, and an anonymous reviewer for sharing their expertise in the acoustic analysis of prosodic stress, and to Yenan Sun for consultations on focus semantics. All remaining errors are our own.
Financial support
This research was supported by grants from the Research Grants Council, HKSAR (Project numbers: 14615820, 21604522).
Competing interests
The authors declare none.