
How detailed do measures of bilingual language experience need to be? A cost–benefit analysis using the Q-BEx questionnaire

Published online by Cambridge University Press:  12 September 2025

Cécile De Cat*
Affiliation:
School of Languages, Cultures and Societies, University of Leeds, Leeds, UK
Arief Gusnanto
Affiliation:
School of Mathematics, University of Leeds, Leeds, UK
Draško Kašćelan
Affiliation:
School of Health and Social Care, University of Essex, Colchester, UK
Philippe Prévost
Affiliation:
UMR Inserm U 1253, University of Tours, Tours, France
Ludovica Serratrice
Affiliation:
PCLS, University of Reading, Reading, UK
Laurie Tuller
Affiliation:
UMR Inserm U 1253, University of Tours, Tours, France
Sharon Unsworth
Affiliation:
Centre for Language Studies, Radboud University, Nijmegen, The Netherlands
*
Corresponding author: Cécile De Cat; Email: c.decat@leeds.ac.uk

Abstract

What is the optimal level of questionnaire detail required to measure bilingual language experience? This empirical evaluation compares alternative measures of language exposure of varying cost (i.e., questionnaire detail) in terms of their performance as predictors of oral language outcomes. The alternative measures were derived from Q-BEx questionnaire data collected from a diverse sample of 121 heritage bilinguals (5–9 years of age) growing up in France, the Netherlands and the UK. Outcome data consisted of morphosyntax and vocabulary measures (in the societal language) and parental estimates of oral proficiency (in the heritage language). Statistical modelling exploited information-theoretic and cross-validation approaches to identify the optimal language exposure measure. Optimal cost–benefit was achieved with cumulative exposure (for the societal language) and current exposure in the home (for the heritage language). The greatest level of questionnaire detail did not yield more reliable predictors of language outcomes.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Highlights

  • We compare alternative measures of language exposure as predictors of language proficiency.

  • We identify the optimal ones using information-theoretic and cross-validation methods.

  • Cumulative exposure yields the most reliable predictor of proficiency in the societal language.

  • Current exposure in the home yields the optimal predictor of proficiency in the heritage language.

  • The greatest level of questionnaire detail did not yield more informative predictors.

1. Introduction

Bilingualism is a multi-faceted phenomenon, manifested through a myriad of individual differences in terms of age of onset, contexts of language exposure and use, quantity of exposure and use, language outcomes, attitudes and changes over the lifetime. Bilingualism research spans many scientific disciplines, but even within each discipline, the tools used to document and quantify bilingualism vary widely (e.g., Kašćelan et al., Reference Kašćelan, Prévost, Serratrice, Tuller, Unsworth and De Cat2021). In recent years, there have been many calls for greater comparability of methods to measure and document bilingualism (e.g., Byers-Heinlein et al., Reference Byers-Heinlein, Esposito, Winsler, Marian, Castro and Luk2019; Rocha-Hidalgo & Barr, Reference Rocha-Hidalgo and Barr2022). This is essential not only to improve research replicability but also to facilitate exchange of information across sectors (notably with education professionals and speech and language therapists). The first step towards achieving comparability and replicability is to use the same tools (e.g., questionnaires) when gathering information which will subsequently be used to derive relevant variables (e.g., amount of exposure). Journals have started to publish methodological reviews to inform that debate (see, e.g., Luk & Esposito, Reference Luk and Esposito2020). However, the focus tends to be on the qualitative evaluation of existing questionnaires (e.g., in terms of content overlap – Dass et al., Reference Dass, Smirnova-Godoy, McColl, Grundy, Luk and Anderson2024).

This paper addresses a related, but hitherto ignored methodological question: What is the optimal level of questionnaire detail required to derive empirically adequate measures of bilingualism? This question arises from two distinct problems: the quantity problem and the quality problem. It is notoriously difficult to get participants to fill in questionnaires, and completion time is often invoked as a major hurdle. This quantity problem not only results in missing data but also in potential bias regarding which participants actually complete the questionnaire. It may be, for example, that those who have the time and resources for such a task tend to come from more privileged backgrounds, resulting in sampling bias. The quality problem rests on the assumption that a greater level of questionnaire detail leads to greater precision of the estimates it can generate: the belief is that asking more questions, or more precise questions, will yield more informative and reliable predictors of the outcomes of interest. However, that assumption remains untested.

To the extent that more questionnaire detail increases the quality of the information gathered, the quantity and quality problems overlap. Let us consider the concrete example of estimating a bilingual child’s current exposure to their two languages. This can be informed by questions asking for global estimates (1), estimates by context (2-a), estimates for typical weeks versus during holidays (2-b) or estimates by (group of) interlocutor(s) (2-c).

  (1) How often do people talk to the child in each language overall?

  (2) How often do people talk to the child in each language

    a. at home/at school/in the local community?

    b. during a typical week/during the weekend/during holidays?

    c. depending on whether they are caregivers/siblings/extended family/peers/adults in the community?

The estimates obtained from (2) could also be fine-tuned, by weighting them according to the amount of time the child spends in each context and/or with each (group of) interlocutor(s). This could, in turn, be estimated both for typical weeks and for holidays.
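To make the weighting concrete, here is a minimal sketch in R (the language of the study's analysis scripts) of the arithmetic for a single hypothetical child: each context's exposure proportion is weighted by the share of the child's time spent in that context. The column names and values are invented for illustration and do not come from the Q-BEx scoring scripts.

```r
# Hypothetical per-context estimates for one child: the proportion of input
# in the societal language (SL) in each context, and the share of the
# child's waking time spent in that context.
exposure <- data.frame(
  context    = c("home", "school", "community"),
  prop_SL    = c(0.30, 0.95, 0.60),  # proportion of SL input in that context
  time_share = c(0.50, 0.35, 0.15)   # proportion of time spent in that context
)

# Unweighted estimate: a simple average across contexts
unweighted_SL <- mean(exposure$prop_SL)

# Weighted estimate: each context counts in proportion to the time spent there
weighted_SL <- sum(exposure$prop_SL * exposure$time_share)

round(c(unweighted = unweighted_SL, weighted = weighted_SL), 2)
#> unweighted   weighted
#>       0.62       0.57
```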

In all cases, and even if this is not required explicitly, the questionnaire respondent is asked to recall what happens over a period of time (e.g., a typical week, the current year) and estimate the relevant average (whether global or by context). This is a complex cognitive task in terms of memory and numeracy. If recall or quantification is not accurate, there may be a large error margin in the data.

One approach to the quality problem could be to perform a psychometric evaluation of the trade-off between accuracy and error at different levels of question precision. This would require asking the same respondents to answer a set of questions about the same aspect(s) of the child’s language experience (e.g., current language exposure), but with different levels of precision, as illustrated above. This could be repeated to compare responses within participants and estimate variations across sessions. By comparing the consistency of responses across questions and across sessions, we could obtain estimates of the level of error at each level of precision. Ideally this would be compared to data collected objectively by recording all the language interactions with the child (similar to the approach adopted by Verhagen et al., Reference Verhagen, Boom, Thieme, Kuiken, Keydeniers, Aalberse and Andringa2024).

Here, we adopt an information-theoretic approach. Our aim is to evaluate the impact of different levels of questionnaire detail when documenting bilingual children’s language experience. We therefore perform a ‘cost–benefit analysis’ to address the quantity and quality problems, by comparing the informativity of alternative measures of language exposure when used as predictors of language outcomes (within the same participants). These alternative measures vary in the amount of questionnaire detail and in the type of questionnaire information used to derive them. The trade-off to be considered is between the cost of obtaining these measures (based on the amount of respondent time required) and the benefit of these measures (based on how informative they are as predictors of language outcomes). Our aim is to make research-informed recommendations regarding the optimal level of questionnaire detail required to obtain informative and reliable measures of language experience for bilingual children.

This study focuses on Q-BEx (De Cat et al., Reference De Cat, Kašćelan, Prévost, Serratrice, Tuller and Unsworth2022), a customisable online questionnaire to document the language experience of bilingual and trilingual children, currently available in 28 languages (with another 6 forthcoming – www.q-bex.org). Its design was informed by an international, cross-sector, Delphi consensus survey (De Cat et al., Reference De Cat, Kašćelan, Prévost, Serratrice, Tuller, Unsworth and consortium2023) to determine the aspects of bilingual experience it should document. The goal of Q-BEx is to enable maximal comparability across languages, children, and countries, in order to gain a better understanding of the role of language experience in the language development of bilingual children across these varying contexts. The design of Q-BEx is informed by the state of the art not only in terms of coverage, but also in its design features. The formulation of the questions and the formatting of the interface were based on recommendations from the psychometric literature (e.g., DeVellis, Reference DeVellis2017). This aimed to minimise the cognitive effort required from respondents, and to maximise the clarity and accessibility of the questions. For example, to answer questions probing the proportion of exposure to each language, respondents used sliders to adjust the sections of a pie chart, in which each language was represented in a different colour (see Figure 1). This avoided asking for explicit calculations, and it allowed the respondent to record their estimates precisely, while ensuring that the sum of proportions did not exceed 100% (representing the child’s total language exposure). For a detailed description of the Q-BEx design (including the quality-control procedures implemented during the design process), see De Cat et al. (Reference De Cat, Kašćelan, Prévost, Serratrice, Tuller and Unsworth2022).

Figure 1. Intuitive estimates of language exposure.

A key feature of Q-BEx is the possibility to implement the questionnaire components selectively, which results in substantial differences in completion time (from the respondents’ point of view). The shortest version can take between 5 and 10 minutes to complete; the longest version can take up to an hour (or more, if the child’s language experience varies substantially across contexts or over time).

The choice of language outcome measures and language exposure measures is presented and justified in the next section. As explained below, our information-theoretic investigations consist of two sets of analyses. First, we adopt the method outlined in Burnham and Anderson (Reference Burnham and Anderson2003) to identify the most informative and parsimonious model out of a set of candidates. Second, we adopt a cross-validation approach (Desmarais & Harden, Reference Desmarais and Harden2014) to better take sampling effects into account and inform recommendations for other datasets. The implications regarding the optimal level of questionnaire detail are considered in the discussion, including specific recommendations for the customisation of Q-BEx.

2. Methods

The data informing this investigation were collected as part of a large-scale study aiming to validate the Q-BEx questionnaire. The design features of the questionnaire are presented after the Participants section. The language outcome measures are presented subsequently.

The study was approved by the ethics committee of each participating university: Tours (Comité d’éthique de la recherche Tours-Poitiers: 2021-09-04), Leeds (the Faculty of Arts, Humanities and Cultures Research Ethics Committee: 21–006 Amd2), and Nijmegen (Ethics Assessment Committee of the Faculty of Arts and the Faculty of Philosophy, Theology and Religious Studies: 2021–9263). Written informed consent was obtained from the parents of each child participant.

2.1. Participants

Participants were bilingual (n = 88) and trilingual (n = 29) children growing up in France (n = 15), the Netherlands (n = 56) or the United Kingdom (n = 46), aged between 5 and 9 years old (mean = 83 months, sd = 12). This dataset excludes children who did not have a full set of measures as required by our analyses. Recruitment targeted schools in deprived vs privileged areas and relied on existing participant databases and contacts from outreach activities. In France, assistants helped the parents fill in the questionnaire if they had no computer equipment or limited literacy. The original sample included children recruited via speech and language therapy clinics in France. These children were excluded from the present study. In the Netherlands, children with a known history of speech and language therapy (SLT) were excluded from recruitment. In the UK, no exclusion criterion was applied (i.e., the SLT status of children was unknown).

Children’s language experience background was documented using the full version of the Q-BEx questionnaire. The only exception was that in France, the language mixing module was not implemented. Just under half of the participants (n = 54) had been exposed to the Societal Language (SL: either French, Dutch, or English) from birth. For the rest (n = 63), SL exposure started between 1 and 84 months of age.

As shown in the left pane of Figure 2, the amount of cumulative exposure to the SL varies substantially, both within the sequential and the simultaneous bilinguals. Relatedly, in the simultaneous bilinguals, the correlation between (biological) age and cumulative SL exposure is relatively weak (r = 0.36, p = 0.01).

Figure 2. Correlation between onset of exposure and cumulative exposure in each language (left: SL; right: HL).

All children were exposed to a Heritage Language (HL) from birth. In this study, we only consider children’s strongest HL in the case of trilinguals. We use the term ‘bilingual’ to refer to both bilingual and trilingual children, and the term HL to designate the child’s strongest or unique HL. Across our sample, these HLs were: Afrikaans, Arabic, Bangla, Bulgarian, Chinese, Creole, Czech, English, French, Frisian, German, Greek, Gujarati, Hindi, Hindko, Italian, Japanese, Kurdish, Latvian, Lithuanian, Lingala, Manouche, Mirpuri, Polish, Portuguese, Punjabi, Romanian, Russian, Spanish, Telugu, Turkish, Ukrainian and Urdu. The number of participants per HL per country is given in Table 6 in the Supplementary Materials. Another aspect of the diversity of our sample can be seen in the right pane of Figure 2, which shows the correlation between onset of SL exposure and cumulative experience in the (main) HL.

Two thirds (66%) of participants grew up in homes where the highest level of caregiver education was university level. A sixth (15%) grew up in homes where caregivers had no post-secondary education.

In sum, our sample features substantial variability in terms of age of onset of bilingualism, languages experienced in the home and in the society, and the amount of experience in the HL and the SL. There is also variability in the highest education level achieved by caregivers across their languages.

2.2. The Q-BEx questionnaire

We asked children’s caregivers to complete the full version of Q-BEx, including all components from all modules (i.e., background information, risk factors, language exposure and use, language proficiency, richness, attitudes, language mixing).Footnote 1 It took between 15 and 130 minutes to complete (35 minutes on average).Footnote 2

In this study, we focus on the Q-BEx module with the highest impact on questionnaire completion time, that is, language exposure and use.Footnote 3 However, we also consider information from other Q-BEx modules, when they are relevant covariates alongside language exposure, in models predicting language outcomes (as explained in the section about language outcome measures). Table 7 in the Supplementary Materials lists these covariates and identifies the Q-BEx module from which the relevant information was elicited.

2.3. Alternative estimates of language exposure

Table 1 lists the alternative estimates of language exposure considered in this study. For each, it specifies how the estimate is calculated and which questionnaire information it is based on. Whenever they apply to both languages, the estimates are calculated in the same way for both the SL and the HL. However, some estimates are only considered with respect to the HL: we assume that exposure in the home and the number of interlocutors are likely to be reliable predictors for the HL but not for the SL. Indeed, for many heritage bilinguals, the home is where most of their HL experience takes place, and the number of HL interlocutors can be very small. HL exposure during the holidays seems to be particularly important for HL acquisition and maintenance (as shown, for example, by Kubota & Rothman, Reference Kubota and Rothman2024). It is currently unclear whether the same is true of the SL.

Table 1. Questionnaire detail and calculation for each of the alternative measures of language exposure

Both for current exposure estimates and for cumulative estimates, the order of presentation in the table (from top to bottom) broadly aligns with a decrease in the level of questionnaire detail required. This corresponds to the ‘cost’ of collecting that information in terms of respondent time. This cost can be interpreted as the ‘complexity’ of the measure, in terms of information quantity. Although we administered the longest version of the questionnaire, the alternative measures we compare are proxies for what would be obtained with a shorter version of the questionnaire.

In summary, we compared six estimates of SL exposure (as alternative predictors of morphosyntax and vocabulary outcomes) and eight estimates of HL exposure (as alternative predictors of HL outcomes). Based on questionnaire completion time, the most costly estimate for each language was current exposure adjusted (i.e., weighted) for the amount of time the child spends with each (group of) interlocutor(s) in each context. We considered four alternative estimates of current exposure: (i) the weighted estimates (defined above), (ii) their nonweighted counterpart, (iii) a global estimate (calculated as an average by context)Footnote 4 and (iv) a global estimate of exposure during holidays. In the case of the HL, we additionally considered (v) a global estimate of current exposure in the home. Furthermore, for both languages, we included (vi) an estimate of cumulative exposure (consisting in the sum of global estimates for each respondent-defined period in the child’s life). We also considered (vii) the age of SL onset as a coarse estimate of cumulative exposure to each language (indicating the length of SL exposure, or the length of any monolingual exposure to the HL). Finally, (viii) the number of HL interlocutors was considered as a coarse approximation for HL exposure.
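As an illustration of the contrast between current and cumulative estimates, the sketch below (in R, with invented period data) shows one common way of computing cumulative exposure: each respondent-defined period contributes its global exposure proportion weighted by the length of that period. The exact Q-BEx calculation may differ in its details; this sketch is only meant to make the logic of the measure concrete.

```r
# Hypothetical respondent-defined periods for one child (ages in months);
# the child is currently 84 months old.
periods <- data.frame(
  start_month = c(0, 30, 48),
  end_month   = c(30, 48, 84),
  prop_SL     = c(0.10, 0.40, 0.65)  # global SL exposure during each period
)

# Length of each period, in years
periods$duration_yrs <- (periods$end_month - periods$start_month) / 12

# Cumulative SL exposure, in years of full-time-equivalent exposure
cumulative_SL_yrs <- sum(periods$prop_SL * periods$duration_yrs)

# The same quantity expressed as a proportion of the child's life so far
cumulative_SL_prop <- cumulative_SL_yrs / (max(periods$end_month) / 12)

round(c(years = cumulative_SL_yrs, proportion = cumulative_SL_prop), 2)
#> years = 2.8, proportion = 0.4
```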

2.4. Language outcome measures

Objective measures of proficiency in the SL (French, Dutch or English) were obtained for morphosyntax and vocabularyFootnote 5, as there is ample evidence that they are affected by language exposure (e.g., Hoff, Reference Hoff2003; Thordardottir, Reference Thordardottir2011; Verhagen et al., Reference Verhagen, Boom, Thieme, Kuiken, Keydeniers, Aalberse and Andringa2024).

Morphosyntactic abilities in the SL were measured using the LITMUS Sentence Repetition Task (SRT; Marinis & Armon-Lotem, Reference Marinis, Armon-Lotem, Armon-Lotem, de Jong and Meir2015). The reliability of this methodology is well established (see, e.g., the scoping review by Rujas et al., Reference Rujas, Mariscal, Murillo and Lázaro2021). Repeating a sentence taps into underlying linguistic knowledge in order to parse the stimulus sentence and then reconstruct its structure and meaning. If the morphosyntactic properties featured in the sentence have not been acquired or are too complex to compute, performance is negatively affected (Marinis & Armon-Lotem, Reference Marinis, Armon-Lotem, Armon-Lotem, de Jong and Meir2015; Polišenská et al., Reference Polišenská, Chiat and Roy2015).

In the French version, the SRT consisted of 16 sentences varying in complexity, from less complex (short sentences in the present simple) to more complex (object relative clauses). In the English and the Dutch versions, the SRT consisted of 30 sentences. They were ordered in two blocks (of 16 and 14 items, respectively) of identical complexity, so that the first block would be as close as possible to the French version. No significant difference in accuracy was observed between the two blocks (see Prévost et al. Reference Prévost, Gusnanto, Kašćelan, Serratrice, Tuller, Unsworth and De Catunder review). In all three versions, the sentences were presented auditorily using headphones in a fixed order. Response accuracy was scored in three different ways, yielding the three scores used in the analyses below: identical repetitions, target(-structure) repetitions and grammatical attempts.

Each of these three outcome measures was transformed into a percentage (i.e., the proportion of accurate repetitions out of the total number of items).

Vocabulary breadth, which corresponds to a receptive measure of vocabulary size, was assessed using the receptive Peabody Picture Vocabulary Test and its adaptations: the BPVS-3 for English (Dunn et al., Reference Dunn, Dunn, Sewell, Styles, Brzyska, Shamsan and Burge2009), the EVIP for French (Dunn et al., Reference Dunn, Thoriault-Whalen and Dunn1993) and the PPVT-III-NL for Dutch (Dunn et al., Reference Dunn, Dunn and Schlichting2005). In this task, children are presented with four pictures; they hear a single word and are asked to point to the corresponding picture. Administration followed the instructions in the manual, starting at the age-appropriate starting set and moving up (and if necessary, down) until the required number of errors was reached. The outcome measure is the total number of correct responses (i.e., the raw score).Footnote 6

Receptive vocabulary depth, which corresponds to how well words are known, was assessed using the Word Classes sub-test of the CELF-5 in English (Semel et al., Reference Semel, Wiig and Secord2017) and French (Wiig et al., Reference Wiig, Semel and Wayne2019), and of the CELF-4 for Dutch (Kort et al., Reference Kort, Schittekatte and Compaan2008). In this task, children hear words and are asked to indicate which words belong together. As the task progresses, the number of words from which children need to make a selection increases from three to four, and visual support in the form of pictures is removed. Administration followed the instructions in the manual until children reached the end or failed to provide a correct response to the required number of consecutive items (four in English, five in Dutch and French). The proportion of correct responses out of the total number of items answered was included as the dependent variable in the analyses.

Subjective measures of proficiency outcomes in the HL were derived from parental estimates (based on questions from the Q-BEx module on language proficiency). The questions asked how well the child speaks and how well they understand the HL for their age. We did not include reading and writing abilities, as many children had not learned to read yet. Answer options were 'hardly at all', 'not very well', 'pretty well' and 'very well'.

2.5. Cognitive measures

As control variables for the models predicting language outcomes, we collected measures of non-verbal intelligence, short-term memory and working memory. Non-verbal intelligence was assessed with the Matrices task from the WISC–V (Wechsler, Reference Wechsler2014) or the WPPSI (Wechsler, Reference Wechsler2013) (the latter for children under 6 years of age).

Short-term memory was assessed through Forward Digit Recall (FDR); working memory was assessed through Backward Digit Recall (BDR). Both tasks were administered through Psychopy, according to the protocol described in Hill et al. (Reference Hill, Shire, Allen, Crossley, Wood, Mason and Waterman2021). Children were presented auditorily with sequences of numbers (through headphones) and asked to repeat these numbers either in the same order (in the FDR task) or in reverse order (in the BDR task). The length of the sequence increased by one digit after 4 trials, starting with 3 digits in the first block of the FDR task, and 2 digits in the first block of the BDR task. The maximum sequence length was 6 digits in the FDR task and 5 digits in the BDR task.

2.6. Modelling procedures

We adopted a predictive modelling approach throughout this study. This determines the baseline assumptions for the two sets of analyses reported below. Predictive modelling aims to identify the smallest set of predictor variables required for a model to achieve the highest level of generalisability in relation to an outcome variable. It seeks to maximise the variance explained by the model, while avoiding over-fitting the data. It does not aim to interpret the individual effect of a particular variable. Our quest for the optimal estimate of language exposure (as a predictor of language proficiency) is therefore situated in the context of a previously identified optimal set of predictor variables. In other words, the control variables included in each model were determined by the optimal predictive model we identified in previous work for each of the outcome variables (in the same group of children): We refer the reader to Prévost et al. (Reference Prévost, Gusnanto, Kašćelan, Serratrice, Tuller, Unsworth and De Catunder review) for SL morphosyntax outcomes, Serratrice et al. (Reference Serratrice, Gusnanto, Kašćelan, Prévost, Tuller, Unsworth and De Catunder review) for SL vocabulary outcomes, and De Cat et al. (Reference De Cat, Gusnanto, Kašćelan, Prévost, Serratrice, Tuller and Unsworthin preparation) for outcomes of oral proficiency in the HL. The optimal set of control variables retained in each model is listed in the Supplementary Materials. Across models, this included the following aspects: SL (coinciding with country of residence), parental education, cognitive measures, phonological competence, attitudes towards the SL, early language development milestones and concerns, diversity and richness of the language environment and experience. The type of regression depended on the distribution of the outcome variable. We used beta regressions for percent scores (SRT and vocabulary depth), linear regression for numeric scores (vocabulary breadth) and cumulative link models for ordinal scores (parental estimates of HL proficiency).
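To make these model families concrete, here is a minimal sketch in R of how each type of outcome can be modelled, using the betareg and ordinal packages with simulated data and invented variable names (exposure, age, pareduc). It illustrates the mapping from outcome distribution to regression type, not the study's actual model specifications or control variables.

```r
library(betareg)  # beta regression for proportions strictly between 0 and 1
library(ordinal)  # cumulative link models for ordinal outcomes

# Simulated stand-in data: one row per child
set.seed(1)
n   <- 120
dat <- data.frame(
  exposure = runif(n),                      # a language exposure measure
  age      = rnorm(n, 83, 12),              # age in months
  pareduc  = factor(sample(c("secondary", "university"), n, replace = TRUE))
)
dat$srt_prop  <- plogis(-1 + 3 * dat$exposure + rnorm(n, 0, 0.5))  # in (0, 1)
dat$vocab_raw <- 50 + 40 * dat$exposure + rnorm(n, 0, 10)          # raw score
dat$hl_speaking <- factor(
  sample(c("hardly at all", "not very well", "pretty well", "very well"),
         n, replace = TRUE),
  levels = c("hardly at all", "not very well", "pretty well", "very well"),
  ordered = TRUE
)

# Percent scores (e.g., SRT accuracy, vocabulary depth): beta regression
m_srt <- betareg(srt_prop ~ exposure + age + pareduc, data = dat)

# Numeric scores (vocabulary breadth): linear regression
m_vocab <- lm(vocab_raw ~ exposure + age + pareduc, data = dat)

# Ordinal scores (parental estimates of HL proficiency): cumulative link model
m_hl <- clm(hl_speaking ~ exposure + age + pareduc, data = dat)
```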

For the first set of analyses, we use the information-theoretic approach to model selection proposed by Burnham and Anderson (Reference Burnham and Anderson2003). Rather than selecting a single ‘best’ model, this approach considers a set of models, and makes inferences to determine the relative support for each model in the set. It thereby acknowledges the unavoidable uncertainty in model selection. Akaike weights, derived from the AIC, are calculated for each model, balancing model fit against model complexity to identify the most parsimonious model. The weights add up to 1 (i.e., 100%) across the models in the set. A model’s weight can be interpreted as the probability (expressed as a percentage) that a given model is the best in the set, as will be explained in detail in the Results section. This method provides a rigorous framework to compare alternative predictors within an otherwise identical model (including the same set of covariates), by focusing on the strength of the evidence rather than arbitrary significance thresholds. In that framework, differences in model parsimony can be interpreted as differences in the parsimony of the alternative predictors (as everything else is held constant).
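A minimal sketch of the Akaike-weight computation (in R, with invented AIC values rather than the study's results) is given below; the formula follows Burnham and Anderson (2003).

```r
# Akaike weights from a vector of AIC values (one per candidate model).
akaike_weights <- function(aic) {
  dAIC <- aic - min(aic)   # difference from the best (lowest-AIC) model
  w    <- exp(-dAIC / 2)   # relative likelihood of each candidate
  w / sum(w)               # normalise so that the weights sum to 1
}

# Illustrative AIC values for three models that differ only in the
# language exposure predictor (hypothetical numbers, not study results)
aic <- c(onset = 812.4, cumulative = 814.1, current = 819.7)

round(akaike_weights(aic), 3)
#>      onset cumulative    current
#>      0.688      0.294      0.018
```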

For each language outcome measure, a series of models is fitted, each treating that measure as the outcome variable and including one of the alternative measures of language exposure as predictor, together with a set of control variables. The control variables included in each model were determined by the optimal predictive model we identified in previous work for the outcome variable in question (in the same group of children), as explained under Modelling procedures.

For the second set of analyses, we use cross-validation (Desmarais & Harden, Reference Desmarais and Harden2014) to assess the true predictive performance of each alternative measure of language exposure (in relation to language outcomes) and estimate the risk of over-fitting. The data are split (by participants) into an estimation set and a validation set (with the same response variable and predictor variables in both sets). We employ k = 5-fold cross-validation, where observations are randomly split into 5 different groups; in each fold, 4 of the groups (80%) are assigned as the estimation set and the remaining one (20%) as the validation set. Model fitting is then performed (5 times, once per fold) on the estimation set to obtain parameter estimates for each of the language experience measures. The parameter estimates obtained from the estimation set are then used to make a ‘new sample’ prediction in the validation set using the same predictors.

In the validation set, therefore, we have a ‘new sample’ prediction for the outcome variable, in addition to the observed outcome variable itself. We take the square root of the average squared difference between the two, that is, the root mean square error (RMSE) of prediction. RMSE quantifies the predictive ability of the model: the lower the RMSE, the better the predictive ability. This allows us to assess the extent to which the model would generalise to an independent dataset including different children.
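The sketch below illustrates this procedure in R, using simulated data and a plain linear model as a stand-in for the study's regressions (variable names are invented): children are randomly assigned to five folds, the model is fitted on the estimation set of each fold, and the RMSE of the ‘new sample’ predictions is computed on the corresponding validation set.

```r
set.seed(42)

# Simulated stand-in data: one row per child, with an outcome, an exposure
# measure and a control variable (names are invented for illustration)
n   <- 120
dat <- data.frame(exposure = runif(n), age = rnorm(n, 83, 12))
dat$outcome <- 10 + 5 * dat$exposure + 0.1 * dat$age + rnorm(n)

# Randomly assign each child to one of k = 5 folds
k    <- 5
fold <- sample(rep(1:k, length.out = n))

# For each fold: fit on the estimation set (80%), predict the validation set (20%)
rmse_per_fold <- sapply(1:k, function(i) {
  est  <- dat[fold != i, ]                 # estimation set
  val  <- dat[fold == i, ]                 # validation set
  fit  <- lm(outcome ~ exposure + age, data = est)
  pred <- predict(fit, newdata = val)      # 'new sample' prediction
  sqrt(mean((val$outcome - pred)^2))       # RMSE of prediction in this fold
})

mean(rmse_per_fold)  # mean RMSE across the five folds
```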

For each language outcome measure, cross-validations are performed on models using, in turn, each of the alternative measures of language experience as well as the same control variables as in the first set of analyses. In addition to the mean RMSE, we calculate the range of RMSE across the five different folds to assess whether the calculation of RMSE is stable across folds. Furthermore, to measure the uncertainty in the calculation of RMSE, we calculate a confidence interval for the RMSE using the bootstrap method (Efron & Tibshirani, Reference Efron and Tibshirani1986, Reference Efron and Tibshirani1994). The bootstrap is implemented by resampling observations in the estimation set with replacement to obtain bootstrapped RMSEs in the validation set. We then calculate the 95% confidence interval from the 2.5th and 97.5th percentiles of the distribution of bootstrapped RMSEs.
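Continuing the simulated example above, the sketch below shows how the range of RMSE across folds and a bootstrap confidence interval for the RMSE can be obtained: observations in the estimation set are resampled with replacement, the model is refitted on each resample, and the distribution of validation-set RMSEs supplies the 2.5th and 97.5th percentiles. Again, this is a schematic illustration under invented variable names, not the study's actual script.

```r
# Stability of the RMSE across folds: difference between largest and smallest
diff(range(rmse_per_fold))

# Bootstrap confidence interval for the RMSE of one fold (here, fold 1)
est <- dat[fold != 1, ]   # estimation set for fold 1
val <- dat[fold == 1, ]   # validation set for fold 1

boot_rmse <- replicate(2000, {
  # Resample the estimation set with replacement and refit the model
  bs  <- est[sample(nrow(est), replace = TRUE), ]
  fit <- lm(outcome ~ exposure + age, data = bs)
  # RMSE of prediction in the (fixed) validation set
  sqrt(mean((val$outcome - predict(fit, newdata = val))^2))
})

# 95% confidence interval from the 2.5th and 97.5th percentiles
quantile(boot_rmse, c(0.025, 0.975))
```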

According to the cross-validation analysis, the optimal model (i.e., the one containing the optimal estimate of language exposure) is the one that has the lowest RMSE and that shows the greatest stability of RMSE across folds. Furthermore, the confidence intervals indicate RMSE precision and allow comparison between models.Footnote 7 Overlapping CIs across models would suggest that improvements are modest.

When evaluating cost/benefit overall (across the two sets of analyses), the optimal measure of language experience (as predictor of language outcome) should be the one that requires the least questionnaire detail while yielding the best predictive performance, i.e., the smallest error size across folds.

3. Results

As a preliminary observation, we note that the alternative estimates of language experience are all correlated, for both the SL (Figure 5 in the Supplementary Materials) and the HL (Figure 6 in the Supplementary Materials). Most of the correlations are strong (ranging from r = .52 to r = .81), apart from those with the number of interlocutors in the HL matrix. All correlations are significant in both matrices (p < .001).

3.1. Information-theoretic approach

3.1.1. Predicting outcomes in the SL

The information-theoretic approach aims to guide the choice between alternative language exposure measures as predictors of sentence repetition scores in the SL. The summary of results is presented in Table 2 (for target repetitions). Similar findings were obtained for identical repetitions and grammatical attempts (see Tables 9 and 10 in the Supplementary Materials). In each table, the ‘Model’ column specifies the language exposure variable included in the model; the ‘weight’ column specifies the probability that the model in question is the best in the set and the ‘dAIC’ column specifies the difference in AIC between the model in question and the best one in the set.

Table 2. Comparison of models with different language exposure predictors of target repetitions (SRT)

For all three outcome measures from the sentence repetition task, Age of SL Onset is uncontroversially the most informative predictor: in each table, the model including it has the highest possible probability of being the best one in the set.

The model comparisons focusing on vocabulary as outcome measures (each involving a different language exposure predictor, as above) are summarised in the Supplementary Materials, in Table 11 for vocabulary breadth and Table 12 for vocabulary depth.

Here, the models including Age of SL Onset as predictor no longer have the highest probability of being the best in the set: rather, cumulative exposure to the SL is superior in both models, albeit only very marginally so in the models where vocabulary depth is the outcome measure.

3.1.2. Predicting outcomes in the HL

Table 3 summarises the model comparison for alternative measures of exposure to the HL, as predictors of speaking proficiency in the HL.

Table 3. Comparison of models predicting oral proficiency in the HL

Here, the model with the highest probability of being the best one in the set is the one with Current HL Exposure in the Home as predictor.

3.1.3. Sensitivity to sampling effects

The strong preference for Age of SL Onset as predictor of morphosyntax outcomes in the SL is surprising, as it only correlates with SL outcomes in the sequential bilinguals and cannot account for the variability of outcomes in the simultaneous bilinguals (illustrated in Figure 3). The impact of Age of SL Onset might be due to the presence of children with very late onset of SL exposure in our sample.

Figure 3. Age of onset of SL exposure.

As a first exploration of how variations in length of SL exposure might have affected the results, we refitted the models predicting the identical repetition score, excluding children with a late onset of exposure to the SL. We repeated the procedure three times, excluding first the children with onset of SL exposure above 4 years of age (48 months), then at or above 3 years and 6 months of age (42 months), then at or above 3 years of age (36 months), as shown in Figure 4. These three thresholds have been proposed in the literature as cutoff points to distinguish late bilinguals from early bilinguals (see Schulz & Grimm, Reference Schulz and Grimm2019 for a review). The model comparisons at each iteration are shown in Tables 13–15 in the Supplementary Materials. We found that the probability of the model with Age of SL Onset being the best in the set decreased as the sample was restricted to children with lower ages of onset. If children with an Age of SL Onset above 36 months are excluded, Cumulative Exposure has a greater likelihood of being the most informative predictor than Age of SL Onset.

Figure 4. Grouping by Age of SL onset.

This suggests that model selection (from among variants each including a different language exposure predictor) is prone to sampling effects. We therefore turn to cross-validation: a method of informing model selection that is more robust to overfitting.

3.2. Cross-validation approach

We start with the cross-validation analyses comparing models predicting SL outcomes. The results for morphosyntax (based on the accurate repetitions of target structures in the sentence repetition task) are summarised in Table 4. The results for the other measures of SL outcomes can be found in the Supplementary Materials: see Tables 16 and 17 for the other morphosyntax outcomes and Tables 18 and 19 for vocabulary outcomes.

Table 4. Cross-validation comparisons with target repetitions as outcome measure

For each alternative measure of language exposure (used in turn as predictor in the model), listed in the first column, we report the root mean square error (RMSE) between predicted and actual values, averaged across the five folds of data (in the column ‘Mean error’). We also report the range of RMSE across folds, to estimate the stability of the model across subsets of data,Footnote 8 as well as the confidence interval boundaries (at 2.5% and 97.5%, respectively). In each table, the ‘Baseline model’ does not include any of the alternative measures of language exposure. Note that the baseline model includes all the other control variables and that some of these reflect other aspects of language exposure (a point we come back to in the Discussion).

The patterns of results observed in models predicting SL proficiency are as follows:Footnote 9 (1) Age of SL Onset yields the lowest mean error for most outcome measures (with the exception of Vocabulary breadth). (2) Cumulative Exposure tends to show smaller ranges across outcomes, suggesting more stable predictions. (3) The CIs overlap across models, suggesting that the improvement of better performing models is only modest. In particular, there is always CI overlap between models with Age of SL Onset and those with Cumulative Exposure. Note that in each analysis, the CIs of the baseline model overlap with those of the other models. This is because the baseline model includes control variables that are associated with the amount of language exposure, such as estimates of the quality of language exposure. Here, we adopt the conservative approach of focusing on the impact of language exposure while controlling for the impact of these other predictors.

We now turn to the cross-validation analyses comparing models predicting HL outcomes.Footnote 10 As seen in Table 20 in the Supplementary Materials, Current Home Exposure is the strongest predictor, yielding the minimum error. The range for this variable is relatively low. It is closely followed by Exposure in the Current Period. Age of SL onset yields a much higher mean error. The weighted measure of current exposure does not fare better than its non-weighted alternative. As observed for SL outcomes, the CIs overlap, indicating modest differences between the alternative predictors.

4. Discussion

We performed a cost–benefit analysis to assess the optimal degree of questionnaire detail required to obtain informative predictors of language outcomes in bilingual children. Optimality was defined in terms of quantity as the minimum amount of information required to obtain the most informative predictor, and in terms of quality as the balance between precision and error at higher levels of questionnaire detail. We evaluated this through an empirical comparison of alternative predictors derived from language exposure estimates requiring a different amount of questionnaire-based information.

The alternative predictors were informed by Q-BEx, a customisable online questionnaire which can be implemented at various levels of detail (and hence questionnaire completion time). This study was based on data from 121 bilingual or trilingual children between the ages of 5 and 9, obtained with the longest version of the questionnaire. Using different subsets of measures (simulating implementation of the questionnaire at different levels of detail), we generated six alternative estimates of exposure to the Societal Language (SL) and eight alternative estimates of exposure to the Heritage Language (HL). These alternative estimates of language experience differed not only in levels of granularity but also in the type of information they are based on. For each language, all the estimates of language experience showed a consistent pattern of significant inter-correlations, corroborating their treatment as alternative measures of the same latent variable.

The informativity of alternative predictors was assessed according to information theory, in two sets of analyses. The first one followed the approach of Burnham and Anderson (Reference Burnham and Anderson2003), which aims to determine the likelihood that a particular model is the best-fitting among a set of alternatives. It can also determine how closely competitor models perform, compared to the optimal model. Age of SL onset far outperformed the alternative (language exposure) predictors of sentence repetition accuracy in the SL. Cumulative exposure clearly outperformed the other predictors of SL vocabulary breadth. In the case of SL vocabulary depth, the performance of all models was quite close, with Cumulative exposure coming out as most informative (closely followed by Age of SL onset). Among language experience predictors of HL outcomes, current measures were the most informative. Current exposure in the home yielded the best fit, with the non-weighted estimate of current exposure a close contender. These results are summarised in Table 5.

Table 5. Most informative predictors of language outcomes, based on AIC comparisons

We conclude from this first set of analyses that outcomes in the SL are best predicted by language exposure estimates that take into account the child’s experience over their lifetime. By contrast, outcomes in the HL are best predicted by current estimates. Both for SL and for HL, the most informative estimate of language exposure is not costly to obtain (as it requires little questionnaire information): Age of SL onset (as predictor of SL morphosyntax), or Current HL exposure in the home (as predictor of HL outcomes).Footnote 11 However, Age of SL onset cannot predict variability among simultaneous bilinguals (given that it is zero for all of them). The informativity of Age of SL onset as predictor of sentence repetition accuracy does indeed diminish (in favour of Cumulative exposure), as ‘late SL starters’ get progressively excluded from the sample. In other words, the extent to which Age of SL onset is the best predictor depends on the extent to which it varies amongst participants.

To objectively assess the impact of sampling effects, we carried out a second set of analyses, using a cross-validation approach (Desmarais & Harden, Reference Desmarais and Harden2014). This approach asks whether the predictions obtained from a randomly defined (80%) subset of the data generalise to the remainder (20%) of the data. For each of the alternative measures of language exposure, the procedure was repeated five times (in five folds), using different sub-samples. The magnitude of the error between the values predicted by the model (fitted on the estimation set) and the values observed in the validation set revealed the extent to which the predictor of interest is affected by sampling effects.

The cross-validation analysis revealed that, in this sample (characterised by a particular age range and by variability in Age of SL onset), Age of SL onset was the most effective estimate of SL exposure in terms of cost–benefit. This was the case even though the sample includes simultaneous bilinguals (for whom Age of SL onset is not an informative predictor, as it is identical for all these children), because a late Age of SL onset has such a strong (negative) effect on SL outcomes for a (relatively small) group of children in the population sample. Cumulative SL exposure is less prone to variability in prediction error across the data subsets (i.e., folds). Cumulative SL exposure is informative for more children within the sample (compared with Age of SL Onset): it can vary in simultaneous bilinguals and is strongly associated with Age of SL onset in sequential bilinguals.Footnote 12

When it comes to HL outcomes, the cross-validation analysis identifies Current exposure in the home as the optimal predictor. This also happens to be the simplest direct estimate of current HL exposure (based on interactions in the home, to the exclusion of other contexts). Age of SL onset (which is equivalent to the length of any monolingual period in the HL) had middling performance, suggesting it was not among the strongest predictors. Contrary to the findings of Kubota and Rothman (Reference Kubota and Rothman2024), HL exposure during the holidays performed relatively poorly. We interpret this as indicating that the reliability of this predictor of HL outcomes is strongly affected by the properties of the sample. For example, we speculate that factors affecting whether the family can holiday in a country where the HL is spoken may act as a moderator.

The answer to the quantity problem needs to be nuanced: the simplest measures can go a long way, depending on the properties of the population sample. The highly informative value of Age of SL onset in our sample is likely to be due to the magnitude of its effect in ‘late’ bilinguals in this age group (as they will not have accumulated much SL experience at the point of testing): if ‘late-starters’ are removed, Age of SL Onset is superseded by Cumulative Exposure. We conclude that Age of SL Onset is only informative to the extent that it is a reliable proxy for cumulative exposure (two highly correlated measures in our sample). A direct measure of cumulative exposure is therefore a ‘safer’ option, because it is less susceptible to sampling effects.

Various studies have debated whether language proficiency is best predicted by current or by cumulative estimates of exposure (see, e.g., Cohen, Reference Cohen2016; Unsworth, Reference Unsworth2013). Our results suggest that their effect is mediated by language status: in heritage bilinguals (who are exposed to their HL from birth but tend to become dominant in the SL, which is the school language), current exposure (especially in the home) is more informative as a predictor of HL outcomes, but cumulative is more informative as a predictor of SL outcomes. A limitation of our study is that it only included bilingual children growing up in countries where the education system was monolingual and delivered exclusively in the SL. Further research will be required to investigate whether home exposure to the HL remains the most informative predictor in children who are educated bilingually, or in the HL rather than the SL. Another question for further investigation is to determine what level of cumulative exposure in the SL is indicative of SL proficiency within the range found in monolingual children. We report on such an analysis in De Cat, Gusnanto, et al. (Reference De Cat, Gusnanto, Kašćelan, Prévost, Serratrice, Tuller and Unsworthunder review).

In answer to the quality problem, this investigation shows that greater questionnaire detail does not necessarily result in more precise and reliable measures. In particular, asking respondents to report how much time the child spends with each interlocutor (or group of interlocutors) in each context does not afford a more informative predictor of language outcomes, in spite of the greater precision it is designed to achieve in the estimates of current exposure to each language. This does not mean that the detailed information gathered in Q-BEx and other questionnaires is never useful. In the context of a research study, its usefulness will depend on the specific research questions asked. However, when all that is needed is a more general bilingualism measure, or when the questionnaire is used in an educational or clinical setting, including these additional questions might not be worth the time investment. Further research will be required to ascertain whether the reliability of the weighted estimates increases if the questionnaire is administered via an interview. There is some preliminary evidence that this might be the case (Verhoeven et al., Reference Verhoeven, van Witteloostuijn, Oudgenoeg-Paz and Blom2024). In our sample, where the Q-BEx questionnaire was self-administered in the Netherlands and the UK, we found a higher rate of unlikely answers to the timetabling questions (which are part of the weighted exposure estimate) than in the French sample, where Q-BEx was administered as an interview with some of the parents (see Kašćelan et al., Reference Kašćelan, Prévost, Serratrice, Tuller, Unsworth and De Catin preparation for more details). Finally, we note that, across both sets of analyses, the differences in predictive ability among alternative measures of language exposure were relatively modest (as shown, for example, by the overlapping confidence intervals in the cross-validation). This is because the baseline model already contains predictors that account for a substantial amount of variability in the data. Our comparisons were based on differences in a single additional predictor, using a relatively modest sample size.

5. Conclusion

The cost–benefit analysis presented here objectively assessed the informativity of language exposure estimates as predictors of language proficiency in bilingual and trilingual children.

In terms of recommendations for the customisation of Q-BEx when aiming to obtain reliable predictors of language proficiency, the results of this validation study suggest that the most effective implementation of the language exposure and use module, in terms of cost–benefit, is to include (i) the short version of the current estimates submodule, which consists of 8 questions (one about the child’s language exposure and one about their language use in each of the following contexts: home, school/daycare, community and holidays), as well as (ii) the cumulative estimates submodule. We hope the protocol developed for this study will be used by others to extend the investigation to other age groups and to bilinguals growing up in different contexts.

Beyond the validation of Q-BEx, this study shows that, in heritage bilinguals in their first few years of schooling, proficiency in the Societal Language is best predicted by cumulative estimates of language exposure, while proficiency in the Heritage Language is best predicted by current estimates, especially in the home.

Supplementary material

The supplementary material for this article can be found at http://doi.org/10.1017/S1366728925100497.

Data availability statement

The data and R scripts that support the findings of this study are available via the OSF at https://osf.io/7v2yw/?view_only=b657a039e8824a56be1a662499c63cf0 (for the data) and https://osf.io/qajbd/?view_only=604e696272674001991f4d8f87b0a967 (for the script).

Author contribution

De Cat: Conceptualisation, Methodology, Formal analysis, Data curation, Writing – Original Draft, Project administration, Funding acquisition; Gusnanto: Methodology, Formal analysis, Writing – Review & Editing; Kašćelan: Data collection, Writing – Review & Editing; Prévost: Project administration, Writing – Review & Editing; Serratrice: Project administration, Writing – Review & Editing; Tuller: Project administration, Writing – Review & Editing; Unsworth: Project administration, Writing – Review & Editing.

Funding statement

This project was supported by a grant from the Economic and Social Research Council (ES/S010998/1). We also gratefully acknowledge funding from the Harmonious Bilingualism Network. A big thank you to the many schools that enabled us to test children on their premises, to the parents and caregivers who patiently completed a very long questionnaire, and to the children who took part in the testing. We are grateful to the research assistants who contributed to the data collection and/or coding: Marieke van den Akker, Lotte Andersen, Zina Tamiatto, Laurent Lombard and Laure Mallevays.

Competing interests

The authors declare none.

Footnotes

This research article was awarded Open Data and Open Materials badges for transparent practices. See the Data Availability Statement for details.

1 In the French arm of the study, the language mixing module was not included.

2 This excludes two outliers, who took 183 and 213 minutes respectively, and may therefore have had interruptions while completing the questionnaire.

3 Here, we limit ourselves to language exposure. It is strongly correlated with language use: In our population sample, the correlation between SL exposure and use is r = .93 for current estimates and r = .79 for cumulative estimates. Consequently, we expect that the results will be similar for the two dimensions. For an investigation of the impact of “passivity of language experience” (which we define as the extent to which the child uses the language less than they are exposed to it) on heritage language outcomes, see De Cat, Gusnanto, et al. (Reference De Cat, Gusnanto, Kašćelan, Prévost, Serratrice, Tuller and Unsworthunder review).

4 This global estimate can now be obtained from one question per context, if the short version of the language exposure and use module is implemented. This option wasn’t available at the time of our data collection. We calculated this estimate by averaging by context.

5 We also assessed phonology, with the LITMUS Quasi-Universal Non-Word Repetition task (Marinis & Armon-Lotem, Reference Marinis, Armon-Lotem, Armon-Lotem, de Jong and Meir2015). We do not include this as an outcome measure in this study, as typically-developing children in this age range are expected to perform at or close to ceiling (given that the task is quasi-universal, i.e., designed not to be dependent on knowledge of a particular language), and the children suspected of atypical development were excluded from the analyses below. For an investigation of NWR outcomes in our entire sample, see De Cat, Tuller, et al. (Reference De Cat, Tuller, Kašćelan, Prévost, Serratrice and Unsworthunder review).

6 Standard scores are inaccurate for bilingual children given that they are not adjusted for reduced experience in the SL.

7 The comparison of CIs can only be done across models predicting the same outcome variable.

8 The range of errors was calculated by subtracting the smallest RMSE value from the largest RMSE obtained across folds for a given model. Please note that RMSE values can only be interpreted relative to each other, and not in an absolute sense. However, if the model prediction error (i.e., the RMSE) is smaller than the natural spread of our data (i.e., the SD of the outcome measure), the model has good predictive value. This was the case in all our models.

9 Note that the Mean error cannot be compared across tables, as they are based on data measured on different scales, analysed with different types of statistical models.

10 Four of the control variables in the baseline models were too unevenly distributed to meet the requirements for cross-validation sampling. As all these control variables represented aspects of the “richness” of the language experience, we replaced them with the composite variable for the Richness of the language experience, which is automatically calculated by Q-BEx; see Unsworth et al. (Reference Unsworth, Gusnanto, Kašćelan, Prévost, Serratrice, Tuller and De Catunder review) for a validation of that composite Richness variable. We also performed an alternative cross-validation analysis on simplified alternatives of the original control variables (reducing them each from five levels to two or three) instead of using the Richness composite measure. The results for that analysis are provided in the Supplementary Materials. The two approaches yielded highly consistent results.

11 In this study, we calculated the current HL exposure in the home as the average across all interlocutors in the home. It is now possible to elicit this information from a single question, using the short version of the Language Experience module in Q-BEx.

12 Another advantage of the Cumulative Estimates sub-module of Q-BEx is that it provides data to cross-check the answers to the Onset of Exposure questions, which were sometimes misinterpreted by respondents in our study: some parents seemed to assume that language exposure does not count until the child starts producing language, and responded that exposure to any language (HL or SL) started when their child was 1 (or sometimes 2) years of age. See Kašćelan et al. (in preparation) for an explanation of how we corrected these implausible responses based on information provided via the Cumulative Exposure sub-module.
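Such a cross-check could, in principle, be implemented along the lines of the sketch below; the field names, values and flagging rule are assumptions, and Kašćelan et al. (in preparation) describe the procedure actually used.

```python
# Illustrative sketch: flagging onset-of-exposure answers that conflict with
# the cumulative exposure record. Field names, values and the rule are assumptions.
respondents = [
    # reported onset (years) vs. earliest age (years) at which the cumulative
    # sub-module records non-zero SL exposure
    {"id": "p01", "reported_sl_onset": 2.0, "first_sl_exposure": 0.0},
    {"id": "p02", "reported_sl_onset": 0.0, "first_sl_exposure": 0.0},
    {"id": "p03", "reported_sl_onset": 1.0, "first_sl_exposure": 1.0},
]

for r in respondents:
    # Implausible if the cumulative record shows exposure before the reported onset.
    r["onset_implausible"] = r["first_sl_exposure"] < r["reported_sl_onset"]
    print(r["id"], r["onset_implausible"])
```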

References

Burnham, K., & Anderson, D. (2003). Model selection and multimodel inference: A practical information-theoretic approach. Springer Science & Business Media.
Byers-Heinlein, K., Esposito, E. G., Winsler, A., Marian, V., Castro, D. C., & Luk, G. (2019). The case for measuring and reporting bilingualism in developmental research. Collabra: Psychology, 5(1), 37. https://doi.org/10.1525/collabra.233
Cohen, C. (2016). Relating input factors and dual language proficiency in French–English bilingual children. International Journal of Bilingual Education and Bilingualism, 19(3), 296–313. https://doi.org/10.1080/13670050.2014.982506
Dass, R., Smirnova-Godoy, I., McColl, O., Grundy, J. G., Luk, G., & Anderson, J. A. E. (2024). A content overlap analysis of bilingualism questionnaires: Considering diversity. Bilingualism: Language and Cognition, 27(4), 744–750. https://doi.org/10.1017/S1366728923000767
De Cat, C., Gusnanto, A., Kašćelan, D., Prévost, P., Serratrice, L., Tuller, L., & Unsworth, S. (under review). A data-driven approach to the issue of “catching up with monolinguals”.
De Cat, C., Gusnanto, A., Kašćelan, D., Prévost, P., Serratrice, L., Tuller, L., & Unsworth, S. (in preparation). Individual differences in heritage language outcomes in 5- to 9-year-olds.
De Cat, C., Kašćelan, D., Prévost, P., Serratrice, L., Tuller, L., & Unsworth, S. (2022). Quantifying Bilingual EXperience (Q-BEx): Questionnaire manual and documentation. https://doi.org/10.17605/OSF.IO/V7EC8
De Cat, C., Kašćelan, D., Prévost, P., Serratrice, L., Tuller, L., Unsworth, S., & the Q-BEx Consortium. (2023). How to quantify bilingual experience? Findings from a Delphi consensus survey. Bilingualism: Language and Cognition, 26(1), 112–124. https://doi.org/10.1017/S1366728922000359
De Cat, C., Tuller, L., Kašćelan, D., Prévost, P., Serratrice, L., & Unsworth, S. (under review). Using Q-BEx to identify risk for language impairment in bilingual children.
Desmarais, B., & Harden, J. (2014). An unbiased model comparison test using cross-validation. Quality & Quantity, 48(4), 2155–2173. https://doi.org/10.1007/s11135-013-9884-7
DeVellis, R. F. (2017). Scale development: Theory and applications (4th ed.). Sage Publications.
Dunn, L., Dunn, L., & Schlichting, L. (2005). Peabody picture vocabulary test-III-NL. Harcourt Test Publishers.
Dunn, L., Dunn, L., Sewell, J., Styles, B., Brzyska, B., Shamsan, Y., & Burge, B. (2009). The British picture vocabulary scale. GL Assessment Limited.
Dunn, L., Thériault-Whalen, C., & Dunn, L. (1993). Échelle de vocabulaire en images Peabody (EVIP). Psycan.
Efron, B., & Tibshirani, R. (1986). Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Science, 1(1), 54–75. http://www.jstor.org/stable/2245500
Efron, B., & Tibshirani, R. J. (1994). An introduction to the bootstrap. Chapman and Hall/CRC. https://doi.org/10.1201/9780429246593
Hill, L. J., Shire, K. A., Allen, R. J., Crossley, K., Wood, M. L., Mason, D., & Waterman, A. H. (2021). Large-scale assessment of 7-11-year-olds’ cognitive and sensorimotor function within the Born in Bradford longitudinal birth cohort study. Wellcome Open Research, 6, 53. https://doi.org/10.12688/wellcomeopenres.16429.2
Hoff, E. (2003). The specificity of environmental influence: Socioeconomic status affects early vocabulary development via maternal speech. Child Development, 74(5), 1368–1378. https://doi.org/10.1111/1467-8624.00612
Kašćelan, D., Prévost, P., Serratrice, L., Tuller, L., Unsworth, S., & De Cat, C. (2021). A review of questionnaires quantifying bilingual experience in children: Do they document the same constructs? Bilingualism: Language and Cognition, 25(1), 29–41. https://doi.org/10.1017/S1366728921000390
Kašćelan, D., Prévost, P., Serratrice, L., Tuller, L., Unsworth, S., & De Cat, C. (in preparation). Putting the Q-BEx questionnaire to a quality test: Questionnaire design and quality checks.
Kort, W., Schittekatte, M., & Compaan, E. (2008). CELF-4-NL: Clinical evaluation of language fundamentals – vierde editie. Pearson Assessment and Information B.V.
Kubota, M., & Rothman, J. (2024). Modeling individual differences in vocabulary development: A large-scale study on Japanese heritage speakers. Child Development, 96(1), 325–340. https://doi.org/10.1111/cdev.14168
Luk, G., & Esposito, A. G. (2020). BLC mini-series: Tools to document bilingual experiences. Bilingualism: Language and Cognition, 23(5), 927–928. https://doi.org/10.1017/S1366728920000632
Marinis, T., & Armon-Lotem, S. (2015). Sentence repetition. In Armon-Lotem, S., de Jong, J., & Meir, N. (Eds.), Methods for assessing multilingual children: Disentangling bilingualism from language impairment. Multilingual Matters.
Polišenská, K., Chiat, S., & Roy, P. (2015). Sentence repetition: What does the task measure? International Journal of Language & Communication Disorders, 50(1), 106–118. https://doi.org/10.1111/1460-6984.12126
Prévost, P., Gusnanto, A., Kašćelan, D., Serratrice, L., Tuller, L., Unsworth, S., & De Cat, C. (under review). Predicting accuracy on the LITMUS sentence repetition task.
Rocha-Hidalgo, J., & Barr, R. (2022). Defining bilingualism in infancy and toddlerhood: A scoping review. International Journal of Bilingualism, 27(3), 253–274. https://doi.org/10.1177/13670069211069067
Rujas, I., Mariscal, S., Murillo, E., & Lázaro, M. (2021). Sentence repetition tasks to detect and prevent language difficulties: A scoping review. Children, 8(7), 578. https://doi.org/10.3390/children8070578
Schulz, P., & Grimm, A. (2019). The age factor revisited: Timing in acquisition interacts with age of onset in bilingual acquisition. Frontiers in Psychology, 9, 2732. https://doi.org/10.3389/fpsyg.2018.02732
Semel, E., Wiig, E., & Secord, W. (2017). Clinical evaluation of language fundamentals (CELF-5) (5th ed.). Pearson.
Serratrice, L., Gusnanto, A., Kašćelan, D., Prévost, P., Tuller, L., Unsworth, S., & De Cat, C. (under review). Predictors of vocabulary breadth and depth in the societal language of multilingual children in three European countries.
Thordardottir, E. (2011). The relationship between bilingual exposure and vocabulary development. International Journal of Bilingualism, 15(4), 426–445. https://doi.org/10.1177/1367006911403202
Unsworth, S. (2013). Assessing the role of current and cumulative exposure in simultaneous bilingual acquisition: The case of Dutch gender. Bilingualism: Language and Cognition, 16, 86–110. https://doi.org/10.1017/S1366728912000284
Unsworth, S., Gusnanto, A., Kašćelan, D., Prévost, P., Serratrice, L., Tuller, L., & De Cat, C. (under review). Unpacking the richness of language experience as a predictor of bilingual children’s language proficiency.
Verhagen, J., Boom, J., Thieme, A.-M., Kuiken, F., Keydeniers, D., Aalberse, S., & Andringa, S. (2024). Relationships between bilingual exposure at ECEC and vocabulary growth in a linguistically diverse sample of preschoolers. Journal of Applied Developmental Psychology, 93, 101657. https://doi.org/10.1016/j.appdev.2024.101657
Verhoeven, E., van Witteloostuijn, M., Oudgenoeg-Paz, O., & Blom, E. (2024). Comparing different methods that measure bilingual children’s language environment: A closer look at audio recordings and questionnaires. Languages, 9(7), 231. https://doi.org/10.3390/languages9070231
Wechsler, D. (2013). Wechsler Preschool and Primary Scale of Intelligence (WPPSI-IV). Pearson.
Wechsler, D. (2014). Wechsler Intelligence Scale for Children (WISC-V): Technical and interpretive manual. Pearson.
Wiig, E., Semel, E., & Wayne, A. (2019). CELF 5 – Batterie d’évaluation des fonctions langagières et de communication (adaptation française ECPA). Pearson.
Figure 1. Intuitive estimates of language exposure.
Figure 2. Correlation between onset of exposure and cumulative exposure in each language (left: SL; right: HL).
Table 1. Questionnaire detail and calculation for each of the alternative measures of language exposure.
Table 2. Comparison of models with different language exposure predictors of target repetitions (SRT).
Table 3. Comparison of models predicting oral proficiency in the HL.
Figure 3. Age of onset of SL exposure.
Figure 4. Grouping by Age of SL onset.
Table 4. Cross-validation comparisons with target repetitions as outcome measure.
Table 5. Most informative predictors of language outcomes, based on AIC comparisons.

Supplementary material: De Cat et al. supplementary material (File, 110.2 KB).