
On the impact of exposure to different languages on Theory of Mind in neurotypical and autistic children

Published online by Cambridge University Press:  01 August 2025

Franziska Baumeister*
Affiliation:
Autism, Bilingualism, Cognitive and Communicative Development Research Group (ABCCD), Faculty of Science and Medicine, University of Fribourg, Fribourg, Switzerland
Pauline Wolfer
Affiliation:
Autism, Bilingualism, Cognitive and Communicative Development Research Group (ABCCD), Faculty of Science and Medicine, University of Fribourg, Fribourg, Switzerland
Ehsan Solaimani
Affiliation:
Department of Language and Linguistic Science, University of York (https://ror.org/04m01e293), York, UK
Moritz M. Daum
Affiliation:
Developmental Psychology: Infancy and Childhood, Department of Psychology, University of Zurich, Zurich, Switzerland; Jacobs Center for Productive Youth Development, University of Zurich, Zurich, Switzerland
Stephanie Durrleman
Affiliation:
Autism, Bilingualism, Cognitive and Communicative Development Research Group (ABCCD), Faculty of Science and Medicine, University of Fribourg, Fribourg, Switzerland
* Corresponding author: Franziska Baumeister; Email: franziska.baumeister@unifr.ch

Abstract

Exposure to multiple languages may support the development of Theory of Mind (ToM) in neurotypical (NT) and autistic children. However, previous research mainly applied group comparisons between monolingual and bilingual children, and the underlying mechanism of the observed difference remains unclear. The present study, therefore, sheds light on the effect of bilingualism on ToM in both NT and autistic children by measuring language experiences with a continuous operationalization. We measure ToM with a behavioral, linguistically simple tablet-based task, allowing inclusive assessment in autistic children. Analyses revealed no difference between monolingual and bilingual NT and autistic children. However, more balanced exposure to different languages within contexts positively predicted first-order false belief understanding in NT children but not autistic children. Mediation analysis showed that the impact in NT children was a direct effect and not mediated via other cognitive skills.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial licence (http://creativecommons.org/licenses/by-nc/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Highlights

  • Prior studies on the effect of bilingualism on ToM focus on group comparisons.

  • Few studies examine the impact of bilingualism on ToM in autistic children.

  • More balanced exposure to languages in the same context boosts ToM in NT children.

  • This effect of language exposure is direct, not mediated by Executive Functioning.

  • No impact of language exposure on ToM in autistic children was found.

1. Introduction

Bilingualism, defined as the use of, and exposure to, two or more languages (de Bruin, 2019; Grosjean, 1982), is characterized by variability with respect to the age of first exposure to a second language (L2), the amount of L2 exposure, L2 use, L2 proficiency and switching habits (de Bruin, 2019). It is a prevalent phenomenon affecting over half of the European population: The European Commission’s report “Europeans and their languages” (2024) states that 59% of Europeans can engage in conversations in at least one language in addition to their native tongue. Similar figures are estimated for the global population (Bialystok, 2018).

From the early nineteenth century to the 1960s, there was a widespread belief that bilingualism had a negative impact on intellectual development (Wei, 2000). However, in the 1960s, a shift occurred as positive cognitive effects were observed in different measures, such as intelligence tests (Peal & Lambert, 1962). Since 1990, research has explored various areas potentially impacted by bilingualism, such as executive functioning (Bialystok, 2011), metalinguistic awareness (MLA; Adesope et al., 2010) and Theory of Mind (ToM; Schroeder, 2018). ToM refers to the ability to understand and infer mental states such as desires, beliefs and intentions. It involves recognizing that these mental states can differ between oneself and others and anticipating resulting behaviors (Wimmer & Perner, 1983). While some studies suggest that bilingualism may positively affect ToM in neurotypical (NT; Schroeder, 2018) and autistic children (Peristeri et al., 2021), others do not report such ToM advantages associated with bilingualism (e.g., Dahlgren et al., 2017). Understanding the potential impact of growing up with multiple languages on children’s development is important, especially for families with children with developmental disorders (DD), including autism spectrum disorder (ASD; see Footnote 1), a neurodevelopmental disorder characterized by difficulties in social communication (World Health Organization, 2019). Indeed, families considering bilingual upbringing for their children may be concerned that exposing them to more than one language might cause cognitive and language development delays (Davis et al., 2021).
At the same time, and in light of research that fails to identify the impacts of bilingualism, reviews call for a more detailed examination of the heterogeneous and complex bilingual experience (Białecka et al., 2023; Feng et al., 2023). In this study, we examine the interplay between bilingualism and ToM in NT and autistic children by comparing monolingual and bilingual children and investigating the effect and potential mediators of the balance of exposure to different languages in bilingual children.

1.1. Theory of Mind in neurotypical and autistic children

It has been suggested that ToM is closely linked to other cognitive abilities, such as executive functions (EF), which comprise cognitive control abilities like inhibitory control, monitoring and planning (Joseph & Tager-Flusberg, 2004), and linguistic abilities like syntactic comprehension (Durrleman et al., 2017). Moreover, ToM predicts reading ability (Jacobs & Paris, 1987). In narratives, ToM is essential for analyzing and interpreting the mental states of characters (Pelletier & Astington, 2004). As a result, ToM fosters peer collaboration, enhances social interactions and reduces misunderstandings (Slaughter et al., 2015).

ToM develops sequentially during early and middle childhood and can notably be assessed through tasks measuring children’s understanding of others’ desires, beliefs, knowledge and emotions (Wellman & Liu, 2004). For instance, the ToM scale by Wellman and Liu (2004) consists of seven subtasks reflecting this developmental progression: (1) diverse desires: recognizing that individuals may have different desires, developing around age 2; (2) diverse beliefs: understanding differing beliefs, around age 3; (3) knowledge access: judging what someone else knows, around ages 3–4; (4) contents false belief: understanding false beliefs about container contents, around ages 4–5; (5) explicit false belief: predicting behavior based on false beliefs, around ages 4–5; (6) belief emotion: judging emotions based on false beliefs, around ages 4–5; and (7) real-apparent emotion: recognizing discrepancies between felt and displayed emotions, developing between ages 5–6. Higher-order reasoning, such as understanding second-order false beliefs (e.g., someone’s belief about another person’s belief), typically develops later, around ages 7–8 (Perner & Wimmer, 1985).

Difficulties in the domain of social cognition, including the initiation and maintenance of social interactions, are the primary characteristics of ASD (World Health Organization, 2019). Moreover, due to the diverse nature of the condition, autistic children may experience delays in language development across various linguistic aspects, such as morphosyntactic or syntactic skills (Naigles, 2021; Schaeffer et al., 2023; Silleresi, 2023). ToM difficulties were once believed to be the core “deficit” in children with ASD. Baron-Cohen’s (1997) theory of “mindblindness” suggested that autistic children face fundamental challenges in understanding and attributing mental states, emphasizing a normative perspective from the NT viewpoint. Tager-Flusberg’s (2007) developmental model of ToM introduced a more nuanced perspective, distinguishing between basic social-perceptual and complex social-cognitive components while still acknowledging inherent ToM deficits in autism. However, recent reflections highlight that these traditional perspectives may perpetuate outdated assumptions, often disregarding important individual differences, in that difficulties in ToM can be part of the criteria leading to an ASD diagnosis but are not the sole or primary criterion (Marocchini, 2023).

1.2. Evidence for the impact of bilingualism on ToM in neurotypical and autistic children

Research suggests that bilingualism, that is, the exposure to and use of multiple languages, may provide opportunities for developing skills relevant to ToM (Schroeder, 2018). However, some studies (e.g., Dahlgren et al., 2017) have failed to replicate the effects found in behavioral tasks. Other studies find better performance of bilinguals only in single ToM subtasks. For example, Białecka et al. (2023) found positive effects in a ToM justification question but not in the related behavioral ToM task. Three different accounts have been proposed for how bilingualism may affect ToM in children, as illustrated in Figure 1 (Baumeister et al., 2025).

Figure 1. Graphical explanation of the hypothesized direct and indirect impact of bilingualism on ToM (Baumeister et al., 2025).

The first account (1) argues that bilingualism has a direct effect (Díaz, 2021; Tiv et al., 2021). This Competence account claims that ToM development takes place during early and middle childhood when children make use of information and cues in their surroundings to build their understanding of the mind and to reason that others may have subjective, and thus potentially false, perspectives about reality (Brown et al., 1996). Bilingual children naturally encounter rich ToM learning experiences in which they develop sensitivity to communicative situations with interlocutors speaking different languages at different levels and on different occasions (Fan et al., 2015). This, in turn, fosters the understanding that other people communicate in different languages and hold varied perspectives and mental states (e.g., Díaz, 2021). Following Díaz’s hypothesis, frequent exposure to different languages enhances learning opportunities. One way to operationalize this hypothesis is to calculate a score comprising the overall percentages of exposure to the different languages in an individual’s daily life, leading to a so-called exposure across contexts score. However, such a score may not precisely indicate whether individuals are exposed to different languages within the same contexts (such as at home, at school, in the community or during holidays), as an individual could also exclusively use one language within a single context at a time. Therefore, another way of operationalizing the exposure to different languages is by calculating a more fine-grained exposure within contexts score, which takes into account the percentages of time individuals are exposed to different languages within specific contexts.

The second account (2) is proposed by Goetz (2003) and suggests that improved ToM among bilinguals may be explained indirectly via EF (Goetz, 2003; Kovács, 2009). This so-called Performance account claims that EF is necessary to resolve ToM tasks because these tasks usually require holding a series of information in short-term memory and updating it, while inhibiting another person’s perspective (Carlson et al., 2001; Moses, 2001). The link between bilingualism and EF is based on the claim that bilinguals, whose languages are always active in their brain (Kroll et al., 2014), continuously train their inhibitory control abilities when inhibiting a nontarget language in a given moment (Bialystok et al., 2012; Bialystok & Craik, 2022). However, this finding is under considerable debate, as not all studies have found a clear effect of bilingualism on EF (e.g., Gathercole et al., 2014; Laine & Lehtonen, 2018). Despite this, combining the Performance account with research suggesting that bilingualism boosts EF (Bialystok et al., 2012; Bialystok & Craik, 2022) leads to the hypothesis that EF may be a mediator of the effect of bilingualism on ToM.

Finally, the third account (3) claims that metalinguistic awareness (MLA), the ability to reflect upon language as an object of thought (Thomas, 1992), is a potential predictor of ToM benefits in bilingual children (Goetz, 2003; Kovács, 2009). With respect to the link between MLA and ToM, Doherty (1998) argues that both MLA and ToM require the understanding that situations or things can be conceived in different ways. The link between bilingualism and MLA is based on the claim that proficiency in two languages enhances meta-representational abilities (Doherty, 2000): Using two languages for communication fosters metalinguistic and metacognitive skills, including the ability to reason about one’s own and others’ thought processes, which enhance the development of ToM (Farhadian et al., 2010; Kovács, 2009).

In bilingualism research in NT and autistic children, two studies hypothesized and tested potential mediators of a bilingual effect and investigated their potential contribution to the impact of bilingualism on ToM. Peristeri et al. (2021) tested MLA and EF and assessed their potential role with the help of mediation analysis to better understand the underlying mechanism by which bilingualism influences ToM in autistic children. They found evidence for a direct effect that is not mediated via MLA or EF. Huang et al. (2023) found evidence for a ToM advantage in NT bilinguals mediated through EF. However, these studies only showcase one example for NT and one example for autistic children and would thus require replication studies.

1.3. Methodological claims about research on the impact of bilingualism on ToM

Given that not all studies find a positive effect when investigating the impact of bilingualism on cognition (Paap et al., 2015), it was noted that the unclear picture might stem from variability with respect to the operationalization of bilingualism on the one hand and the very measurement of it on the other. Because bilingualism is a complex and heterogeneous phenomenon, factors such as the amount of exposure, use, proficiency, age of first exposure and language environments play a crucial role in shaping bilingual experiences (de Bruin, 2019). Despite this, many studies compare monolinguals and bilinguals without considering the diverse nuances within the bilingual population. The bilingual participants are, for example, characterized through their contexts of language use and exposure (e.g., Sudo & Matsui, 2021), their amount of L2 exposure (e.g., Yow & Markman, 2015), their age of first exposure to the L2 (e.g., Nguyen & Astington, 2014), their L2 proficiency (e.g., Gordon, 2016) or their amount of L2 use (e.g., Greenberg et al., 2013). Only recently have researchers investigated the effect of bilingualism operationalized through continuous measures such as the balance of exposure on ToM, and they have shown that more balanced exposure to different languages predicts better ToM outcomes (Dicataldo & Roch, 2020; Huang et al., 2023; Yow & Li, 2024). While Dicataldo and Roch (2020) did not report an effect of L2 exposure on ToM, Huang et al. (2023) found that the balance of language exposure predicted ToM in bilinguals.
Similarly, Yow and Li (2024) operationalized the balance of language exposure with the help of entropy scores (Gullifer & Titone, 2020) and found that exposure entropy predicted perspective taking. While the effects of bilingualism on ToM are often modest and context-dependent (Yu et al., 2022), these more recent studies suggest that the cognitive and social opportunities created by diverse linguistic exposure, rather than bilingualism per se, may be central to understanding ToM development.

Moreover, Feng et al. (2023) and Baumeister et al. (2025) highlight that studies not only differ with respect to the measurement of bilingualism but also in the ToM tasks employed (Białecka et al., 2023; Feng et al., 2023; Ziatabar Ahmadi et al., 2015). Indeed, 72% of the 53 ToM studies incorporated in Feng et al.’s systematic review used ToM tasks which included verbal instructions (i.e., verbal ToM tasks), while only 8% employed nonverbal tasks and 21% used a combination of both. In terms of responses, 74% required verbal responses, 13% a mix of verbal and nonverbal answers and the remaining 13% involved nonverbal responses (Feng et al., 2023). This reliance on verbal measures may be problematic when assessing children who struggle with language comprehension or production, such as those with ASD (Naigles, 2021; Schaeffer et al., 2023; Silleresi, 2023). Consequently, it has been suggested that ToM tasks that place less emphasis on linguistic abilities are preferred to those that rely primarily on verbal responses because nonverbal measures allow comparability across populations (Ziatabar Ahmadi et al., 2015).

1.4. Present study

The present study investigates the impact of bilingualism on ToM in both NT and autistic children and examines potential mediators by using a linguistically simple tablet-based ToM measure accessible for children with ASD. We address three key research questions (RQ):

RQ1: Is there a difference in ToM between NT and autistic children (1A), monolingual and bilingual NT children (1B) and monolingual and bilingual autistic children (1C)?

To answer RQ1, we compared NT and autistic children (1A), monolingual and bilingual NT children (1B) and monolingual and bilingual autistic children (1C) on ToM. Given potential ToM difficulties in autistic children, we hypothesized that NT children would perform better than autistic children (Marocchini, 2023; Tager-Flusberg, 2007). Based on previous research reporting a bilingual advantage, we predicted that bilingual NT children would perform better than monolingual NT children (Schroeder, 2018) and bilingual autistic children better than monolingual autistic children (Peristeri et al., 2021).

RQ2: Is there an impact of exposure to different languages across contexts in bilingual NT (2A) and bilingual autistic children (2B)? Is there an impact of exposure to different languages within contexts in bilingual NT (2C) and bilingual autistic children (2D)?

For RQ2, we operationalized Díaz’s (2021) hypothesis of a direct effect of bilingualism on ToM, namely the claim that contact with individuals speaking different languages helps to understand that individuals can also have different perspectives, by measuring the extent to which a child is exposed to different languages, both across and within contexts. “Across contexts” refers to the extent to which different languages are heard independently of their context. We predicted that children with more balanced exposure to different languages would perform better than children with less balanced exposure to different languages. However, since this operationalization may overlook how much different languages are used within single contexts, we further analyzed the extent of exposure to languages “within contexts.” This variable refers to the extent to which different languages are heard by an individual within specific contexts, such as their home, school, the community and during holidays. We predicted that children with more balanced exposure within contexts would exhibit better performance in the ToM task compared to children with less balanced exposure within contexts, as being highly exposed to different languages within the same contexts may place a higher demand on constantly monitoring which language is being heard.

RQ3: Is the effect of language exposure a direct effect on ToM, or is it mediated via either EF or MLA?

To address RQ3, we explored whether the impact of exposure to different languages across and within contexts is direct or mediated via EF or MLA.

2. Methods

2.1. Participants

The sample consisted of 354 children living in Canada, France, Germany, Scotland, Spain, Switzerland and the United States: 69 monolingual NT (Mage = 7;4), 158 bilingual NT (Mage = 8;1), 26 monolingual autistic (Mage = 8;4) and 101 bilingual autistic (Mage = 8;6) children (see Table 1 and Footnote 2). Monolingual children are those whose caregivers did not report exposure to an additional language; bilinguals have been exposed to a second language. The participants were recruited through primary school contacts and autism associations, participant databases of previous projects, Facebook, psychologists, speech and language therapists and the official recruitment platform “BuildClinical” in the United States. The parents of all participants gave written informed consent, and the children received a gift card for participating in the project (CHF 35/EUR 35/CAD 60/USD 35).

Table 1. Participant overview

Note: PPVT = Peabody Picture Vocabulary Test (Dunn et al., 1986, 1993; Dunn & Dunn, 2007; Lenhard et al., 2015; Stella et al., 2000); RPM = Raven’s Progressive Matrices (Raven et al., 2018); SCQ = Social Communication Questionnaire (Lord et al., 2003); SES = Socioeconomic status (highest parental education).

All NT children included in the final data set reported no history of language or cognitive delays or impairments and no history of a diagnosis of ASD. All autistic children included in the final data set had an official diagnosis of ASD, assessed with either the Autism Diagnostic Observation Schedule – 2nd Edition (ADOS-2; Lord et al., 2003) or another standardized ASD diagnosis tool, such as the Autism Diagnostic Interview – Revised (ADI-R; Lord et al., 1994). All children were tested in their most proficient language, which was either English, French, German, Italian or Spanish. In cases where their most proficient language was not one of the testing languages in which the study had been conceptualized, they were assessed in their second-best language (see Footnote 3).

2.2. Materials

2.2.1. Bilingualism measurement

We used the online parental Quantifying Bilingual Experiences (Q-BEx) questionnaire (De Cat et al., 2022) to define the monolingual group (children exposed to only one language throughout their life) and the bilingual group (children exposed to at least one additional language throughout their life) for RQ1, and to calculate the amount of exposure to different languages across and within contexts for RQ2. The caregivers took 20 to 45 minutes to complete the questionnaire in their preferred language (English, French, German, Italian or Spanish) to report on various language experiences (e.g., amount of use, exposure, proficiency) of their children in up to three languages.

To address RQ1, we were specifically interested in the number of languages the participants were exposed to in their lives (one or more). As reported by their parents, children exposed to, or having used, only one language throughout their lives were considered “monolinguals”; children exposed to, or having used, more than one language were defined as “bilinguals.” For RQ2, we calculated two separate entropy scores to capture the diversity of language exposure: the balance of exposure across contexts entropy score and the balance of exposure within contexts entropy score. An entropy score between 0 and 1.56 quantifies the uncertainty associated with hearing one of the child’s languages in a given situation (Gullifer & Titone, 2020), calculated by using the formula (Shannon, 1948)

$$ H = -\sum_{i=1}^{n} p_i \log_2\left(p_i\right) $$

  • The balance of exposure across contexts entropy score quantifies the uncertainty associated with hearing one of up to three languages (monolinguals, bilinguals, trilinguals) at any given moment, without accounting for specific contexts. Here, $p_i$ represents the proportion of exposure to language $i$ across all contexts combined, and $n$ is the number of languages.

  • The balance of exposure within contexts entropy score quantifies the uncertainty of hearing a particular language within specific contexts (home, school, community, holidays). Separate entropy scores were calculated for each context, and these were weighted by the proportion of time spent in each context to compute a combined within-contexts entropy score.

For example, as illustrated in Table 2, a child exposed to German at home (100% of the time) and French at school (100% of the time), and spending equal time in both contexts, would have an across-contexts entropy score of 1 reflecting the balance of language exposure but a within-contexts entropy score of 0, as no two languages are used in the same context. The exposure across contexts entropy score, therefore, reflects the uncertainty of hearing a particular language at a given moment but without considering the context. A child spending 30% of his time at home, where German is used as the only language, 20% of his time in the community, where German is also used as the only language, and 50% of his time in school, where French is the single language being used (Example 1), will have an entropy score for the amount of exposure across contexts of 1, reflecting a high balance of exposure across contexts. The same score of 1 would be obtained by a child being exposed to both German and French 50% of the time within each context (Example 2); however, the number of ToM learning opportunities would be much higher in Example 2 because the child needs to be attentive to different languages potentially being spoken in more instances, as different languages are used within various contexts. Therefore, the balance of exposure within contexts entropy score reflects the uncertainty of hearing a particular language at a given moment, considering the context. Example 1 results in a balance of exposure within contexts entropy score of 0 because the uncertainty of hearing a particular language at a specific moment is 0. On the contrary, Example 2 results in an exposure within contexts entropy score of 1, reflecting a higher uncertainty of hearing a particular language at any given time (see Footnote 4).
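The two examples above can be reproduced with a short sketch. It is illustrative only: the function names and the dictionary layout are hypothetical, and it assumes that the across-contexts proportions are obtained by weighting each context's language shares by the time spent in that context, which is one reading of the description above rather than the authors' published scoring code.

```python
import math


def entropy(proportions):
    """Shannon entropy in bits; languages with zero exposure contribute nothing."""
    return sum(-p * math.log2(p) for p in proportions if p > 0)


def across_contexts_entropy(time_share, exposure):
    """Entropy of overall language proportions, pooling all contexts.

    Assumption: overall proportion of a language is the time-weighted
    sum of its share in each context.
    """
    totals = {}
    for context, share in time_share.items():
        for language, p in exposure[context].items():
            totals[language] = totals.get(language, 0.0) + share * p
    return entropy(totals.values())


def within_contexts_entropy(time_share, exposure):
    """Per-context entropies, weighted by the proportion of time in each context."""
    return sum(
        share * entropy(exposure[context].values())
        for context, share in time_share.items()
    )


# Proportion of time spent in each context (same for both examples).
time_share = {"home": 0.3, "community": 0.2, "school": 0.5}

# Example 1: one language per context.
example_1 = {
    "home": {"German": 1.0},
    "community": {"German": 1.0},
    "school": {"French": 1.0},
}

# Example 2: both languages heard half of the time in every context.
example_2 = {
    context: {"German": 0.5, "French": 0.5} for context in time_share
}

for label, exposure in (("Example 1", example_1), ("Example 2", example_2)):
    across = across_contexts_entropy(time_share, exposure)
    within = within_contexts_entropy(time_share, exposure)
    print(label, round(across, 3), round(within, 3))
```

Under these assumptions, both examples yield an across-contexts score of about 1, while the within-contexts score is 0 for Example 1 and about 1 for Example 2, matching the values described in the text.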

Table 2. Demonstration of entropy score calculations

2.2.2. Social Communication Questionnaire (SCQ)

The parental Social Communication Questionnaire (SCQ; Rutter et al., 2003) was administered using the online platform Gorilla Experiment Builder. It assesses the severity of ASD symptoms and is considered appropriate for confirming an autism diagnosis in children between 4 and 18 years old (Allen et al., 2007; Berument et al., 1999). Scores range between 0 and 40, with higher scores indicating a stronger severity of autistic symptoms. NT children in our study had significantly lower scores, below 16, than autistic children.

2.2.3. Socioeconomic status (SES)

The educational level of the caregivers was used as a measure of the participants’ socioeconomic status, as research has demonstrated its validity as a reflection of family income (Hauser & Warren, 1997). The higher value between the two caregivers’ answers on a 5-point Likert scale, ranging from elementary school (1) to a university degree (5), was used as a proxy for SES.

2.2.4. Theory of Mind

The ABCCD ToM measure (Baumeister et al., 2024) was used as a behavioral tool to assess the social cognitive ability of children in understanding diverse desires and attributing first- and second-order false beliefs. This gamified task was implemented on a tablet and involved a linguistically simple measure of three ToM abilities, as shown in Table 3: diverse desires (Block 1), first-order false beliefs (Block 2) and second-order false beliefs (Block 3). Each block had eight items, including two practice items, two control items and four test items. The practice items familiarized the participants with the task, the control items ensured that the participants understood the task, and the test items measured the intended construct. In the test items of Block 1, participants were tested in their ability to differentiate between their own desire for an object and someone else’s desire, which differed from the participant’s desire. In the test items of Block 2, participants were asked to attribute a false belief to a character who did not witness a change in a scenario. Finally, Block 3 tested the ability to attribute a belief to a character with a false belief about another character’s belief.

Table 3. Overview blocks Theory of Mind task

For example, one test item in Block 2 (see https://osf.io/qdta8) introduced the clown and the acrobat playing “Hide and Seek” in a room with a chair, a bed and a sofa. While the clown is counting, the acrobat moves behind the bed, which the clown sees because he turns around. After the clown turns back again, the acrobat moves to a different hiding place, which the clown does not see. At the end, the participants had to click on one of three possible endings, each representing where the clown would then search for the acrobat: behind the chair (response choice 1), the bed (response choice 2) or the sofa (response choice 3). The correct response would involve choosing the bed (where the acrobat hid while the clown was watching) as the logical next place for the clown to look. Choosing the chair or the sofa would constitute incorrect responses.

Correct responses, scored as “1,” are interpreted as the participants being able to take into consideration another person’s perspective, that is, in Block 1, another person’s desire; in Block 2, the perspective of a person who has not seen a change; and in Block 3, the perspective of a person who does not know that another person has seen a change in the scene. Incorrect responses, scored as “0,” are interpreted as the participant attributing their own perspective to another person. Oddball responses were those that did not align with anyone’s perspective and were also scored “0.” A block was “passed” only if at least one control and one test item were answered correctly. If a block was not passed, the subsequent block was not displayed to the participant.
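The pass rule and the stopping rule described above can be illustrated with a minimal Python sketch. The function names and the data layout (lists of 0/1 item scores per block) are hypothetical and chosen for illustration only; they are not taken from the ABCCD task implementation.

```python
from typing import List, Sequence, Tuple


def block_passed(control_scores: Sequence[int], test_scores: Sequence[int]) -> bool:
    """A block counts as passed if at least one control item AND
    at least one test item were scored 1 (correct)."""
    return any(s == 1 for s in control_scores) and any(s == 1 for s in test_scores)


def administer_blocks(blocks: List[Tuple[Sequence[int], Sequence[int]]]) -> List[bool]:
    """Walk through the blocks in order (hypothetical layout: one
    (control_scores, test_scores) pair per block). Stop after the first
    block that is not passed, because later blocks are not displayed."""
    outcomes = []
    for control_scores, test_scores in blocks:
        passed = block_passed(control_scores, test_scores)
        outcomes.append(passed)
        if not passed:
            break
    return outcomes


# Illustrative (made-up) scores: Block 1 passed, Block 2 failed on the
# control items, so Block 3 is never shown.
example_blocks = [
    ([1, 1], [1, 1, 1, 1]),
    ([0, 0], [1, 0, 0, 0]),
    ([1, 1], [1, 1, 1, 1]),
]
example_outcomes = administer_blocks(example_blocks)
```

In this sketch, practice items are left out because the text does not assign them a role in the pass criterion.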

2.2.5. Peabody Picture Vocabulary Test (PPVT)

Proficiency in the language of testing was assessed using the standardized receptive Peabody Picture Vocabulary Test (PPVT) in the respective languages (Spanish: Dunn et al., 1986; French: Dunn et al., 1993; English: Dunn & Dunn, 2007; German: Lenhard et al., 2015; Italian: Stella et al., 2000). The participants were shown a series of panels with increasing difficulty, each with four pictures, while hearing a pre-recorded spoken word, and were required to select the corresponding picture. Raw scores were transformed into standardized z-scores (M = 0, SD = 1).

2.2.6. Raven’s Colored Progressive Matrices

Nonverbal cognitive abilities were assessed via Pearson’s “Q-Global” platform using the standardized short and digitalized version of Raven’s Colored Progressive Matrices (RPM; Raven et al., 2018). The participants were presented with a series of images featuring a pattern with a missing piece, which the participants were required to identify. Raw scores were transformed into standardized scores (M = 100, SD = 15).

2.2.7. Frog Matrices Task (FMT)

Visuospatial short-term memory (STM) and visual working memory (WM) were assessed through the Frog Matrices Task (FMT; Morales et al., 2013). Spans were determined by the longest sequence in which participants were able to recall the sequential (STM) and reverse (WM) order of frog placements, ranging from 0 to 6.

2.2.8. Simon task

Interference inhibition was measured using the Simon task (Bialystok et al., 2004; Simon, 1969) in which blue and red stimuli appeared on the top right or top left of a tablet screen, above a red button on the right and a blue button on the left. The Simon effect was calculated as the difference between reaction times in the incongruent (i.e., trials in which the color of the stimulus does not align with the button’s color) and congruent conditions (i.e., trials in which the color aligns).
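As a rough illustration of this difference score, the following Python sketch computes a Simon effect from a list of trials. It assumes that the per-condition reaction times are summarized by their mean and that all supplied trials are included; the text does not specify the aggregation or any trial exclusion, so both are assumptions, and the trial data are invented.

```python
from statistics import mean
from typing import Iterable, Tuple


def simon_effect(trials: Iterable[Tuple[str, float]]) -> float:
    """Return mean RT (ms) on incongruent trials minus mean RT on
    congruent trials. Each trial is a (condition, rt_ms) pair.
    Using the mean is an assumption made for this sketch."""
    trials = list(trials)
    congruent = [rt for condition, rt in trials if condition == "congruent"]
    incongruent = [rt for condition, rt in trials if condition == "incongruent"]
    if not congruent or not incongruent:
        raise ValueError("need at least one trial in each condition")
    return mean(incongruent) - mean(congruent)


# Invented reaction times for illustration only.
example_trials = [
    ("congruent", 500),
    ("congruent", 540),
    ("incongruent", 600),
    ("incongruent", 620),
]
example_effect = simon_effect(example_trials)  # 610 - 520 = 90 ms
```

A larger positive value indicates a larger cost of incongruent trials.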

2.2.9. Dimensional Change Card Sorting Task (DCCS)

The Dimensional Change Card Sorting Task (DCCS) was used to assess children’s ability to switch between different sorting criteria (Zelazo, 2006). A stimulus was presented in the upper part in the middle of a tablet with a button with a blue rabbit on the bottom left and a button with a red boat on the bottom right. In Block 1, participants were asked to sort six randomly presented stimuli according to their color. In Block 2, the participants were asked to sort six stimuli now according to their shape. In Block 3, 16 stimuli were presented either with a black border around the stimulus or without, that is, eight with a border and eight without. When participants saw a black border around the stimulus, they were asked to sort the stimulus following the color rule; when the stimulus was presented without a border, they were asked to sort it following the shape rule. For correct responses, a score of “1” was assigned; for incorrect responses or absence of answer, a score of “0.” The total score ranged between 0 and 16, indicating the children’s “switching” abilities.

2.2.10. Grammatical Judgment Task (GJT)

Metalinguistic awareness was assessed by a Grammatical Judgment task (GJT; Wolfer et al., 2024). Participants judged the acceptability of eight grammatically correct and eight grammatically incorrect sentences. We applied the formula of Rice et al. (1999), which accounts for a potential “yes bias,” to calculate an MLA score:

$$ \text{MLA-score} = 0.5 + \frac{(a-b)(1+a-b)}{4a(1-b)} $$

with “a” as the proportion (between 0 and 1) of correct responses to grammatically correct sentences and “b” as the proportion of incorrect responses to grammatically incorrect sentences.
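A worked example may help to read the formula. The Python sketch below evaluates it for an invented response pattern: 7 of 8 grammatical sentences accepted (a = 0.875) and 2 of 8 ungrammatical sentences wrongly accepted (b = 0.25). The function name is hypothetical, and the inputs are entered as proportions rather than percentages, which is what the formula requires.

```python
def mla_score(a: float, b: float) -> float:
    """Evaluate 0.5 + (a - b)(1 + a - b) / (4a(1 - b)).

    a: proportion of correct responses to grammatical sentences.
    b: proportion of incorrect responses to ungrammatical sentences.
    The expression is undefined for a == 0 or b == 1 (division by zero),
    so those inputs are rejected in this sketch.
    """
    if not 0.0 < a <= 1.0:
        raise ValueError("a must be in (0, 1]")
    if not 0.0 <= b < 1.0:
        raise ValueError("b must be in [0, 1)")
    return 0.5 + ((a - b) * (1.0 + a - b)) / (4.0 * a * (1.0 - b))


# Invented example: 7/8 grammatical sentences accepted, 2/8 ungrammatical accepted.
example_score = mla_score(7 / 8, 2 / 8)

# Reference points: perfect discrimination and no discrimination.
perfect = mla_score(1.0, 0.0)   # (1 * 2) / 4 = 0.5, so the score is 1.0
chance = mla_score(0.5, 0.5)    # numerator is 0, so the score is 0.5
```

For the invented pattern, the numerator is 0.625 × 1.625 = 1.015625 and the denominator is 4 × 0.875 × 0.75 = 2.625, giving a score of about 0.89.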

2.3. Testing procedure

The caregivers of the participants were requested to fill out online questionnaires through the online experiment platform “Gorilla Experiment Builder” (Anwyl-Irvine et al., 2020), which lasted around 60 minutes. These questionnaires included a background questionnaire regarding the children’s personal background, the Q-BEx (De Cat et al., 2022) focusing on the children’s language experiences, the SCQ (Lord et al., 2003) and the SWAN (Swanson et al., 2012). Based on the information gathered from these questionnaires, a decision was made regarding the inclusion or exclusion of children in the study. Children diagnosed with ASD were eligible to participate in the “autistic” group if they had an official diagnosis provided by a clinician; children without ASD or any other neurodevelopmental disorder were eligible to participate in the “neurotypical” group. Each child participated in person in two or three sessions in their homes or schools, lasting 1 to 1.5 hours each, during which the participants completed the tasks on the tablet.

2.4. Analysis

2.4.1. Research question 1

To address RQ1, we fitted a binomial generalized linear mixed-effects model with a logit link function in R (R Development Core Team, 2021; version 4.3.3) using the lme4 package (Bates et al., 2015; version 1.1-35.5). Response accuracy was the binary dependent variable (correct, incorrect), and items and participants were entered as random intercepts. We first created a “base” model, including the following fixed effects: sum-coded (−0.5, +0.5) effects of language group (bilingual, monolingual), sum-coded (−0.5, +0.5) effects of diagnostic group (neurotypical, autistic), a treatment-coded effect of ToM block (Block 1, Block 2, Block 3, with Block 2 as baseline) and their three-way interaction. We then added each covariate separately and examined whether the newly added variable improved the model’s goodness of fit using likelihood ratio tests. We retained only those variables that significantly contributed to the model fit. The covariates added were: chronological age, nonverbal reasoning (standardized IQ score), proficiency in the language of testing (PPVT z-score), autism severity (SCQ score; as in Tager-Flusberg, 2007), socioeconomic status (parental educational level; as in Devine & Hughes, 2018), measures of EF and MLA (Kovács, 2009), as well as their interactions with diagnostic group (NT versus autistic). All numeric fixed effects were scaled. Missing data in the covariates were imputed using multiple imputation (Lee & Simpson, 2014). Potential multicollinearity between the covariates was assessed by inspecting the variance inflation factor (VIF), with values below 5 considered acceptable.
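The logic of the stepwise likelihood ratio test can be sketched in Python as follows. This is not the authors' R code: the log-likelihood values are invented, the test is restricted to the case where the extended model adds exactly one parameter (1 degree of freedom), and the 0.05 significance level is an assumption, since the text only states that variables had to contribute significantly.

```python
import math
from typing import Tuple


def likelihood_ratio_test_1df(loglik_base: float, loglik_extended: float) -> Tuple[float, float]:
    """Compare two nested models that differ by ONE parameter.

    The statistic is 2 * (logLik_extended - logLik_base). Under the null
    hypothesis it follows a chi-square distribution with 1 df, whose upper
    tail probability equals erfc(sqrt(x / 2)).
    """
    statistic = max(2.0 * (loglik_extended - loglik_base), 0.0)
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


def retain_covariate(loglik_base: float, loglik_extended: float, alpha: float = 0.05) -> bool:
    """Keep the covariate only if it significantly improves fit.
    alpha = 0.05 is an assumed threshold, not stated in the text."""
    _, p_value = likelihood_ratio_test_1df(loglik_base, loglik_extended)
    return p_value < alpha


# Invented log-likelihoods for a base model and two candidate extensions.
stat_a, p_a = likelihood_ratio_test_1df(-1250.0, -1246.0)  # statistic = 8.0
stat_b, p_b = likelihood_ratio_test_1df(-1250.0, -1249.5)  # statistic = 1.0
keep_a = retain_covariate(-1250.0, -1246.0)
keep_b = retain_covariate(-1250.0, -1249.5)
```

When a covariate and its interaction with diagnostic group are added together, the test has more than one degree of freedom and the 1-df shortcut used here no longer applies.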

2.4.2. Research question 2

To investigate RQ2, we applied an approach similar to that described for RQ1. This involved creating two sets of models, one for the analysis of balance of exposure across contexts and another for the analysis of balance of exposure within contexts. The “base” models for each set included the fixed effects of diagnostic group, block and either balance of exposure across contexts (set 1) or balance of exposure within contexts (set 2). All other fixed effects and random intercepts were added in the same stepwise approach as for RQ1.

2.4.3. Research question 3

To investigate the causal paths of the potential effect of bilingual exposure, that is, whether the effects of balance of exposure on ToM are direct or mediated via EF or MLA, we applied a set of mediation analyses, using the mediation package (Tingley et al., 2014). In these models, only the retained variables resulting from RQ2 were included.

For full details on the statistical models, please refer to: https://osf.io/cxe5v/?view_only=b7075a79cd704752b334ff9e9658cf79.

3. Results

3.1. Between-group comparisons (RQ1)

The model providing the best fit included the fixed and random effects from the “base” model, consisting of language group, diagnostic group, block and participants and items as random intercepts; furthermore, the model included the covariates age, proficiency in the language of testing, nonverbal IQ, working memory and switching. As shown in Table 4 and Figure 2, there was a significant effect of diagnostic group, with higher accuracy for the NT than the autistic children in Block 2. Accuracy was significantly higher in Block 1 than in Block 2 and higher in Block 2 than in Block 3. Besides, older children were more accurate than younger children (age), children with higher language proficiency performed better than children with a lower proficiency (proficiency in the language of testing), children with higher nonverbal IQ scores displayed better performance than children with lower IQ scores, children with better working memory scores had better performance than those with lower scores (working memory) and children with better switching abilities had better performance than those with lower scores (switching).

Table 4. Result of the logistic mixed effects model (RQ1)

Figure 2. Predicted probabilities of correct answers in different blocks for NT and autistic monolingual and bilingual children. ASD = Autistic children; NT = Neurotypical children.

As the interaction between diagnostic group and the difference between Block 1 and Block 2 was not significant, this suggests that NT children performed better than autistic children not only on Block 2 but also on Block 1. Furthermore, post hoc analyses of the significant interaction between diagnostic group and the difference between Block 2 and Block 3 showed that the difference between Blocks 2 and 3 was significant in NT children (Estimate = −1.013, SE = 0.256, z-score = −3.962, p < .001). By contrast, autistic children did not display a significant difference between Blocks 2 and 3 (Estimate = −0.169, SE = 0.334, z-score = −0.504, p = .614).
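For readers who want to relate the reported estimates, standard errors, z-scores and p-values to one another, the following Python sketch recomputes a z statistic and a two-sided p-value from the rounded estimates above. It assumes that the reported z-scores are Wald statistics (estimate divided by standard error) evaluated against a standard normal distribution; small discrepancies from the reported values are expected because the inputs are rounded to three decimals.

```python
import math
from typing import Tuple


def wald_test(estimate: float, standard_error: float) -> Tuple[float, float]:
    """Return the Wald z statistic and its two-sided p-value
    under a standard normal reference distribution."""
    z = estimate / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Rounded values as reported for the Block 2 vs. Block 3 contrast.
z_nt, p_nt = wald_test(-1.013, 0.256)      # NT children
z_asd, p_asd = wald_test(-0.169, 0.334)    # autistic children
```

Under these assumptions, the recomputed values agree with the reported ones up to rounding: a clearly significant contrast in NT children and a non-significant one in autistic children.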

Neither the effect of language group nor the interactions between language group, block and diagnostic group were significant. This suggests that there is not enough evidence to conclude a difference between monolingual and bilingual NT children on the one hand, and between monolingual and bilingual autistic children on the other.

Overall, the analysis of RQ1 showed that NT children performed significantly better than autistic children (1A), but that there was no difference between monolingual and bilingual NT children (1B) and no difference between monolingual and bilingual autistic children (1C).

3.2. Impact of exposure (RQ2)

For RQ2, we created two sets of models in a similar stepwise manner as in RQ1. The first set of models included balance of exposure to languages across contexts, diagnostic group, block, their three-way interaction, age, proficiency in the language of testing, working memory and switching as fixed effects. Socioeconomic status, autism severity, nonverbal IQ, inhibitory control captured by the Simon effect, and MLA did not improve model fit and were consequently not retained in the analyses. Participants and items were entered as random intercepts. The effect of balance of exposure to languages across contexts was not significant. As shown in Table 5, the effects for diagnostic group, the difference between Blocks 1 and 2, the difference between Blocks 2 and 3, age, proficiency in the language of testing, working memory and switching were significant. Because a significant interaction was found between diagnostic group and the difference between Blocks 2 and 3, we pursued follow-up analyses for NT and autistic children in Block 2 and Block 3 separately. In NT children, the difference between Block 2 and 3 was significant, with performance on Block 2 being significantly better than performance on Block 3 (Estimate = −1.011, SE = 0.290, z-value = −3.487, p < .001). In autistic children, however, no significant effect of the difference between Blocks 2 and 3 was found (Estimate = −0.097, SE = 0.320, z-value = −0.304, p = .761). Furthermore, whereas NT children performed significantly better on Block 2 than autistic children (Estimate = −1.260, SE = 0.499, z-value = −2.525, p = .012), there was no significant difference in performance between NT and autistic children on Block 3 (Estimate = 0.158, SE = 0.663, z-value = 0.238, p = .812). This suggests that Block 3 was particularly difficult for both NT and autistic children.

Table 5. Result of the logistic mixed effects binomial model (RQ2) – balance of exposure across contexts

The second set of models included balance of exposure to languages within contexts, diagnostic group, block, their three-way interaction, age, proficiency in the language of testing, working memory and switching as fixed effects, and participants and items as random intercepts. As for the first set of models, socioeconomic status, autism severity, nonverbal IQ, inhibitory control and MLA did not improve model fit and were consequently not retained in the analyses. As shown in Table 6, NT children performed significantly better than autistic children; performance was better on Block 1 than on Block 2 and on Block 2 than on Block 3; older children outperformed younger children; children with higher language proficiency, better working memory and higher switching abilities outperformed their peers with lower language proficiency, lower working memory and lower switching abilities, respectively. Follow-up analyses of the significant interaction between diagnostic group and the difference between Block 2 and 3 revealed, similarly to the model including balance of exposure across contexts, that the difference between Blocks 2 and 3 was significant in NT children, with performance on Block 2 being significantly better than performance on Block 3 (Estimate = −0.803, SE = 0.287, z-value = −2.795, p = .005). In autistic children, however, there was no significant effect of the difference between Blocks 2 and 3 (Estimate = −0.124, SE = 0.320, z-value = −0.391, p = .696). Besides, NT children performed significantly better on Block 2 than autistic children (Estimate = −1.222, SE = 0.502, z-value = −2.432, p = .015), and there was no significant difference in performance on Block 3 between NT and autistic children (Estimate = −0.166, SE = 0.665, z-value = −0.249, p = .803).

Table 6. Result of the logistic mixed effects model (RQ2) – balance of exposure within contexts

As shown in Figure 3, follow-up analyses of the significant three-way interaction between balance of exposure within contexts, diagnostic group and the difference between Blocks 2 and 3 showed that there was a significant effect of balance of exposure within contexts in NT children in Block 2, with children with higher entropy scores within contexts understanding first-order false beliefs better than those with lower entropy scores (Estimate = 0.765, SE = 0.323, z-score = 2.365, p = .018). There was no significant effect of balance of exposure within contexts on Block 3 in NT children (Estimate = −0.083, SE = 0.399, z-score = −0.209, p = .834). In autistic children, the effect was significant neither on Block 2 (Estimate = −0.016, SE = 0.316, z-score = −0.050, p = .960) nor on Block 3 (Estimate = 1.095, SE = 0.579, z-score = 1.892, p = .059).
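Because the model uses a logit link, the estimates above are on the log-odds scale. As an aid to interpretation, the Python sketch below converts the Block 2 estimate for NT children into an odds ratio. The Wald-type 95% interval is an illustrative addition that is not reported in the text, and reading the estimate as "per one standard deviation of the entropy score" assumes that this predictor was scaled like the other numeric fixed effects.

```python
import math
from typing import Tuple


def odds_ratio_with_wald_ci(estimate: float, standard_error: float,
                            z_crit: float = 1.96) -> Tuple[float, float, float]:
    """Exponentiate a log-odds estimate and the bounds of a
    Wald-type interval (estimate +/- z_crit * SE)."""
    lower = math.exp(estimate - z_crit * standard_error)
    upper = math.exp(estimate + z_crit * standard_error)
    return math.exp(estimate), lower, upper


# Reported effect of balance of exposure within contexts, NT children, Block 2.
or_block2, or_lower, or_upper = odds_ratio_with_wald_ci(0.765, 0.323)
```

Under these assumptions, an estimate of 0.765 corresponds to an odds ratio of roughly 2.15, meaning that the odds of a correct first-order false belief response about double for each unit increase in the (scaled) predictor.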

Figure 3. Effect of balance of exposure within contexts on the predicted probability of an accurate response. ASD = Autistic children; NT = Neurotypical children, orange line = Block 1 (diverse desires), green line = Block 2 (first-order false beliefs), violet line = Block 3 (second-order false beliefs).

In sum, analyses showed that the balance of exposure to different languages across contexts did not have an impact on performance on ToM tasks in NT and autistic children. However, NT children with a higher balance of exposure within contexts were shown to perform significantly better on Block 2 (first-order false beliefs) than NT children with less balanced exposure to different languages (2A, 2C). No significant effects were detected in autistic children (2B, 2D).

3.3. Mediation analyses (RQ3)

Given the significant effect of balance of exposure within contexts on first-order false belief understanding in NT children identified in RQ2, mediation analysis was conducted to explore the underlying causal paths. Based on the findings from RQ2, switching was selected as the EF measure for the mediation model, as it consistently emerged as a significant predictor of ToM performance. Inhibitory control and metalinguistic awareness were not included in the mediation analysis, because they did not significantly improve model fit in RQ2.

The mediation analysis included two models: (a) a mediator model predicting switching based on balance of exposure within contexts and age, and (b) an outcome model identical to that used in RQ2. As shown in Figure 4, balance of exposure within contexts had a direct and positive impact on ToM (Average direct effect Estimate = 0.049, CI = [0.015, 0.080], p = .008), while no indirect effects were observed through EF. This suggests that the relationship between balance of exposure within contexts and first-order false belief understanding in NT children operates independently of switching.

Figure 4. Tests of ACME and ADE for Block 2 in NT children. ACME = Average Causal Mediation Effect, ADE = Average Direct Effect; NT = Neurotypical children.

4. Discussion

Autism spectrum disorder (ASD) can involve challenges with ToM, while bilingualism has been claimed to improve ToM abilities in neurotypical and autistic children (see Schroeder, 2018 for a review on NT children; e.g., Baumeister et al., 2025; Peristeri et al., 2021 on autistic children). However, there is limited research on how bilingualism impacts ToM in autistic children, and, for both autistic and NT children, there is a need to address methodological issues of previous studies regarding how bilingual experiences and ToM are measured. More specifically, this study moves beyond categorical (monolingual and bilingual) group comparisons, examining instead the diverse nature of bilingual experiences (Luk, 2022) and using ToM tasks that are more inclusive and valid (Beaudoin et al., 2020). The present study aimed to further elucidate the effects of bilingualism on ToM in both NT and autistic children by measuring language experiences in a fine-grained manner using the Q-BEx questionnaire and ToM via a novel task minimizing verbal and executive demands. We addressed three research questions: First, we compared NT and autistic children (1A), monolingual and bilingual NT (1B) and monolingual and bilingual autistic children (1C). Second, we explored the impact of balance of exposure to languages across and within contexts in both bilingual NT and bilingual autistic children (RQ2). Third, we inspected with mediation analysis whether the effect of balance of exposure to languages within contexts on first-order false belief understanding in NT children (found in RQ2) was direct or mediated via EF (RQ3).

4.1. Discussion of research question 1

NT children performed better than autistic children (1A), which aligns with previous research indicating that autistic children have difficulties in ToM (e.g., Tager-Flusberg, 2007). Additionally, monolingual and bilingual NT (1B) and autistic (1C) children did not differ. The absence of a detectable bilingualism effect on ToM in our study may have two explanations. First, the absence of a group effect may indicate the absence of an effect overall. This would align with null findings reported in other studies and meta-analyses conducted on ToM or on EF (e.g., Gathercole et al., 2014; Lehtonen et al., 2023), two fields where research has increasingly questioned the existence of a broad, robust effect. Given the close relationship between ToM and EF (Carlson & Moses, 2001), it is conceivable that any bilingual advantage in ToM may similarly be limited or context-dependent, rather than a general phenomenon. However, this explanation is unlikely given the substantial number of studies highlighting an impact of bilingualism on ToM development (Baumeister et al., 2025; Schroeder, 2018). Furthermore, this lack of effect does not seem to stem from limitations of sample size, as the current sample was relatively large. It also cannot be explained by having overlooked theoretically important predictors, as we assessed the potential role of a range of these, such as age and proficiency in the language of testing.

An alternative and potentially more promising explanation for the lack of group differences between monolinguals and bilinguals might relate to how monolingual–bilingual groups were defined in this study versus other studies. We defined bilinguals as children who, according to their parents, were exposed to or used more than one language throughout their lives, regardless of the amount of exposure, use or proficiency. This contrasts with the characteristics of the bilingual group reported in other studies, where effects emerged: For example, Nguyen and Astington (2014) defined bilinguals as those with exposure to a second language for at least 30% of the time. Similarly, sample descriptions in other studies, such as in Gordon (2016) and Fan et al. (2015) for NT children, or Peristeri et al. (2021) for autistic children, imply that their participants had close to equally balanced exposure to different languages. Peristeri et al. (2021), for example, assessed the difference in ToM between Albanian–Greek bilingual and Greek monolingual children with ASD. The bilingual children were reported to have exposure to both languages from birth in their homes, while using Greek mainly outside the home. To examine whether our definition of bilingualism may have influenced the results, we conducted an exploratory post hoc analysis using an alternative criterion for group classification. Specifically, we reclassified participants based on the balance of their language exposure, defining “bilinguals” as those with a relatively balanced exposure to different languages (defined by an exposure entropy score across contexts >1.2, which could result from exposure to an L1 for 60% of the time, an L2 for 35% of the time and an L3 for 5% of the time). Among the (smaller) group of autistic children, no significant difference was found between monolinguals and bilinguals; however, a close to significant difference emerged in the (larger) group of NT children in first-order false beliefs (Estimate = −0.624, SE = 0.334, z-value = −1.868, p = .062). This suggests that children with balanced exposure to different languages may be the bilinguals who benefit most, and where group effects may arise, as seen in RQ2. The findings furthermore highlight the need for a more nuanced definition of bilingualism and a deeper exploration of how the diverse nature of bilingualism may influence ToM, as addressed in RQ2.
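To make the entropy criterion more concrete, the Python sketch below computes an entropy score for the 60%/35%/5% exposure pattern mentioned above. It assumes that the score is Shannon entropy with a base-2 logarithm over the proportions of exposure to each language; the exact computation is not given in this section, so this is an assumption, and the function name is hypothetical.

```python
import math
from typing import Sequence


def exposure_entropy(proportions: Sequence[float]) -> float:
    """Shannon entropy (in bits) of a set of language-exposure proportions.

    Proportions must sum to 1. Languages with zero exposure contribute
    nothing. Base 2 is an assumption made for this sketch.
    """
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return -sum(p * math.log2(p) for p in proportions if p > 0)


# Exposure pattern used as an example in the text: L1 60%, L2 35%, L3 5%.
example_entropy = exposure_entropy([0.60, 0.35, 0.05])

# Reference points under the same assumption.
monolingual = exposure_entropy([1.0])             # no uncertainty: 0 bits
balanced_two = exposure_entropy([0.5, 0.5])       # two equally used languages: 1 bit
```

Under the base-2 assumption, the 60/35/5 pattern yields approximately 1.19, that is, a value at the stated cutoff of 1.2 after rounding, whereas a perfectly balanced two-language exposure yields exactly 1. A score above 1.2 therefore seems to require substantial exposure to more than two languages under this reading.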

4.2. Discussion of research question 2

There was a significant effect of balance of exposure within contexts on first-order false belief reasoning in NT children, whereas this effect was not shown in autistic children. No effect was detected on other ToM subdomains, nor was any effect detected for the balance of exposure across contexts.

In NT children, a more balanced exposure within contexts predicted first-order false belief understanding: This is consistent with the hypothesis suggesting that the more balanced the exposure children have to different languages within contexts, the more they constantly monitor ongoing communications in different languages, thereby improving their ToM abilities (Diaz et al., 2021; Yow & Li, 2024). This effect was, however, only shown for the extent of balance of exposure to languages within contexts but not across contexts. As hypothesized earlier, the understanding that other individuals may have different perspectives may become more apparent when children have the possibility to make this experience not only at the “borders” of contexts (that is, when switching from home to school, or from school into the community), but within the same contexts: Only the constant monitoring imposed by greater balance of exposure to languages within contexts would thus trigger a bilingual effect on first-order false belief understanding.

The differential effects of balance of exposure within contexts observed across the ToM subtasks warrant further exploration. While balance of exposure within contexts significantly predicted better first-order false belief understanding (Block 2), this effect was not observed for diverse desires understanding (Block 1) nor second-order false belief understanding (Block 3). All blocks were shown to be reliable in a validation study with children between 4 and 10 years old (Baumeister et al., 2024). Nevertheless, for Block 1, performance was near ceiling for most participants in the current study, suggesting that the task may have been too easy for the tested age range, thereby limiting its sensitivity to detect subtle differences related to language exposure. Conversely, the second-order false belief task in Block 3, which assesses a more advanced ToM ability, may not reveal advantages from multilingual exposure in this age group. It is also possible that Block 3 was generally too difficult for all children, regardless of bilingualism experience, as evidenced by the lack of difference between NT and ASD groups in this task, despite NT children typically outperforming ASD children on ToM tasks, as observed in Blocks 1 and 2 of the present study. Developmentally, second-order false beliefs emerge later than first-order false beliefs, and other factors, such as general cognitive development, may play a more prominent role at this stage (Carlson & Moses, 2001). The first-order false belief task, by contrast, may strike a balance between task difficulty and developmental appropriateness, making it more sensitive to individual differences in language exposure.

The lack of an effect of balance of exposure within contexts on ToM in autistic children compared to NT children contrasts with Peristeri et al. (2021). The authors found better performance by bilingual autistic children in comparison to monolingual autistic children. This difference in the findings may have different sources: One possibility is that a potential effect of bilingualism could have been masked due to the different sample sizes in the monolingual autistic group in our study (monolingual N = 26, bilingual N = 101) in comparison to Peristeri et al.’s study (monolingual N = 60, bilingual N = 43). Another explanation is that the positive effect of bilingualism may not yet manifest in four- to eleven-year-old children, which was the age range of participants in this study. Indeed, this contrasts with Peristeri et al.’s (2021) participants, who were older, ranging between 6;9 and 15;6 years (M = 11;5). A third possible explanation is that autistic children may not take advantage of the social contacts they are surrounded by in the same manner as NT children, due to potential ToM-related difficulties (Marocchini, 2023; Tager-Flusberg, 2007). This could be especially the case in children who may lack an interest in social interactions in general (Carter et al., 2005). Therefore, the better performance reported by Peristeri et al. (2021) could be due to lower autism severity in the autistic children who were characterized as “high functioning.” Intriguingly, autism severity, as measured with the SCQ in the present study, did not have a significant effect on ToM performance. This can be explained by the fact that the SCQ does not only include social interaction measures, but also indicators of repetitive behaviors. While these behavioral manifestations represent another type of potential difficulties in autism, they do not necessarily interfere with an interest in social interactions. A fourth explanation could be linked to the bilingual characteristics within the autistic sample, mentioned in the discussion of RQ1: Because the sample of Peristeri et al. (2021) consisted of children having more “balanced” exposure to Greek and a second language than the participants in this study, this may indicate that the benefits of bilingualism through exposure may require more balanced exposure to different languages.

In sum, we argue that although an effect was predicted based on previous research, the absence of an effect of balance of bilingual exposure in autistic children may be explained by differences in the study designs and participant characteristics. Therefore, further research is required with larger cohorts.

4.3. Discussion of research question 3

Following the finding in RQ2 that balance of exposure within contexts contributes to first-order false belief understanding in NT children, we further investigated whether this effect was direct or mediated via EF (see Footnote 7). Analyses revealed a significant direct effect of the balance of exposure to languages that was not mediated via EF. This is consistent with the view that being exposed to different languages within specific contexts enhances ToM in and of itself, independently of EF. Therefore, our findings provide additional support for Díaz’s (2021) hypothesis, namely that bilingualism may directly enhance ToM: children exposed to people speaking different languages within a context infer that these people may also have different perspectives, which leads to better performance in ToM.
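To make the distinction between a direct and a mediated effect concrete, the following Python sketch decomposes a total effect into a direct path and an indirect path through a mediator, using ordinary least squares throughout. It is a simplified illustration under stated assumptions, not the analysis reported here: the function names (`ols`, `mediation_effects`), the purely linear models and the toy data are all hypothetical, and the study itself used a dedicated R package with a generalized linear model for the ToM outcome.

```python
def ols(X, y):
    """Least-squares coefficients for y ~ X (X must already contain an intercept column)."""
    k = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        piv = A[col][col]
        A[col] = [v / piv for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def mediation_effects(x, m, y):
    """Product-of-coefficients decomposition for predictor x, mediator m, outcome y."""
    a = ols([[1.0, xi] for xi in x], m)[1]  # path x -> m
    _, direct, b = ols([[1.0, xi, mi] for xi, mi in zip(x, m)], y)  # x -> y given m, and m -> y
    total = ols([[1.0, xi] for xi in x], y)[1]  # x -> y ignoring m
    return {"direct": direct, "indirect": a * b, "total": total}


# Hypothetical toy data: x = balance of exposure, m = an EF score, y = a ToM score.
# y is built without noise, so the fitted direct effect recovers the coefficient 2.0.
x = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85, 0.30, 0.60]
m = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
y = [1.0 + 2.0 * xi + 0.5 * mi for xi, mi in zip(x, m)]

effects = mediation_effects(x, m, y)
print(effects)
```

In linear models the total effect equals the sum of the direct and indirect effects, so a direct effect that stays close to the total effect, as reported for NT children here, indicates that little of the association runs through the mediator.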

5. Conclusion

Due to the increasing number of bilingual neurotypical and autistic children, it is necessary to better understand the effects of bilingualism on cognitive abilities, including ToM. Whereas there is substantial research on the effects of bilingualism on ToM in NT children, mostly applying group comparisons between monolingual and bilingual children and reporting bilingual advantages, the understanding of the impact of bilingualism on ToM in children with ASD is still limited. Additionally, for both autistic and NT children, it is not clear which aspects of bilingual experience impact ToM. The current study showed that both monolingual and bilingual children with ASD, in comparison to their NT peers, show persistent challenges with ToM. As ToM is crucial for social communication and interaction (Slaughter et al., 2015), ways of enhancing ToM could be useful, and bilingualism may present one such “natural” source. Group comparisons between monolingual and bilingual NT children, as well as between monolingual and bilingual autistic children, showed no significant differences between the language groups. However, within NT bilingual children, the extent of balance of exposure to different languages within contexts significantly predicted first-order false belief understanding via a direct effect. Contrary to what was previously hypothesized (Kovács, 2009), this effect was not mediated by EF, potentially because the ToM measure used relies less on EF. This finding furthermore highlights the complexity and heterogeneity of bilingual participants’ characteristics, which are missed when applying group comparisons. In autistic children, however, no effects of balance of exposure on ToM were found.

While this study emphasizes the role of balance of language exposure for ToM, it is limited by the fact that the sample came from households with a high socioeconomic status, which may introduce bias into the investigation (Peristeri et al., 2022). Additionally, 16 participants were not tested in their most proficient language due to limitations in the availability of testing materials, which were restricted to five languages (English, French, German, Italian and Spanish). In these cases, participants were assessed in their second-best language, as required by the testing protocol. While this potential confound was controlled by incorporating a measure of participants’ proficiency in the language of testing, it remains possible that this factor influenced performance. Furthermore, the balance of exposure to languages across and within contexts was calculated based on detailed estimations of the “current” exposure to these languages, that is, the relative amount of time children were exposed to interactions in different languages in the last 12 months. It therefore does not capture an individual’s language diversity over their lifetime. Including a cumulative measure that considers the languages heard within specific contexts across the entire lifespan would allow an even more “realistic” measurement of the bilingual experience as a predictor. However, such a score cannot be calculated from the current version of the Q-BEx questionnaire and would thus require the creation of a new parental questionnaire.

Among the participants included in this study, 79 were trilinguals. Exposure to three languages across or within various contexts may present an even richer and more diverse experience than exposure to only two. To capture this complexity, measures such as the entropy score are particularly valuable, as they quantify not only the balance but also the diversity of exposure across and within contexts (see Footnote 8).
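As a rough illustration of how an entropy score reflects both balance and diversity, the following Python sketch computes Shannon entropy over a child’s relative exposure to each language. The function name, the use of base-2 logarithms and the example proportions are illustrative assumptions, not the calculation used in this study, which is documented in the Supplementary Materials.

```python
import math


def language_entropy(proportions):
    """Shannon entropy (in bits) of relative exposure to each language.

    `proportions` holds one non-negative value per language; values are
    rescaled to sum to 1, and languages with zero exposure contribute nothing.
    """
    total = sum(proportions)
    if total <= 0:
        raise ValueError("at least one language must have positive exposure")
    probs = [p / total for p in proportions if p > 0]
    return sum(-p * math.log2(p) for p in probs)


# Hypothetical exposure profiles within one context (e.g., at home).
monolingual = language_entropy([1.0])
unbalanced_bilingual = language_entropy([0.8, 0.2])
balanced_bilingual = language_entropy([0.5, 0.5])
balanced_trilingual = language_entropy([1 / 3, 1 / 3, 1 / 3])

print(monolingual, unbalanced_bilingual, balanced_bilingual, balanced_trilingual)
```

Under these assumptions, entropy is zero for exposure to a single language, rises as exposure to two languages becomes more balanced, and rises further when a third language is added in equal proportion, which is why the score can distinguish trilingual from bilingual profiles.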

Furthermore, other bilingual factors such as age of onset may also interact with exposure to shape cognitive outcomes. Future studies could investigate how the timing and duration of bilingual exposure influence the development of ToM, offering a more comprehensive understanding of the effects of bilingualism. Similarly, including L2 proficiency, assessed with an objective measure such as the PPVT, could allow further insights into the potential influence of this dimension of bilingualism.

A further limitation concerns the complexity of Block 3 in the ToM task, which involved linguistically demanding questions (e.g., “Now what does the clown think will happen first?”). While this level of complexity is common in second-order false belief tasks, it may have impacted participants’ comprehension and performance. Future research could include a measure of complex sentence comprehension, such as the Test for Reception of Grammar (TROG; Bishop, 2003), to control for both the comprehension of complex sentences and verbal short-term memory. Alternatively, future research could consider simplified versions of these tasks, such as those proposed by Marinis et al. (2023), who omitted a test question, to ensure accessibility across diverse language and cognitive profiles.

To conclude, our findings suggest that the continuous measure of bilingual experience, specifically the balance of exposure within contexts, was a significant predictor of first-order false belief understanding in NT children. In contrast, a binary classification of language background (bilingual versus monolingual) did not emerge as a significant predictor. This discrepancy aligns with other work questioning the utility of categorical definitions of bilingualism (e.g., Huang et al., 2023; Yow & Li, 2024), particularly in research exploring nuanced cognitive and developmental outcomes such as ToM. The reliance on binary classifications in past research may have oversimplified the diverse experiences of language users and failed to capture critical variations in linguistic exposure and usage. Cognitive advantages, such as those observed here, emerged through a continuous measure of bilingual experience (e.g., balance of exposure) rather than a traditional binary classification, highlighting that overlooking essential nuances in linguistic diversity may obscure important effects. This aligns with recent theoretical accounts (e.g., Díaz, 2021), which emphasize that the cognitive advantages attributed to bilingualism are likely rooted in dynamic and context-dependent interactions with diverse linguistic exposure. For instance, children with more balanced multilingual exposure are believed to engage in continuous monitoring and adaptive communication, enhancing their ability to infer and navigate others’ mental states.

Although binary classifications may remain useful in certain contexts (e.g., educational settings or broad demographic analyses), we argue that future research should prioritize continuous measures, especially for studying cognitive processes shaped by linguistic diversity. Continuous measures offer a more precise lens for examining how specific aspects of bilingualism (e.g., balance of exposure in ToM) shape cognitive development. Moreover, these measures may reveal that ToM advantages stem not from a single form of bilingualism but rather from a spectrum of experiences shaped by diverse linguistic exposure. In summary, our findings reinforce the need to move from categorical to continuous frameworks in bilingualism research (Kremin & Byers-Heinlein, 2021).

Supplementary material

The supplementary material for this article can be found at http://doi.org/10.1017/S136672892510031X.

Data availability statement

The data that support the findings of this study and the data analysis scripts are openly available in OSF, under: https://osf.io/cxe5v/?view_only=b7075a79cd704752b334ff9e9658cf79

Acknowledgments

We warmly thank all participants who took part in this study and all autism centers, teachers and clinicians for their support in reaching out to participants. We also thank all our collaborators and students who supported the data collection process at various sites.

Author contribution

FB: Conceptualization, Data curation, Methodology, Formal analysis, Investigation, Project administration, Visualization, Writing – Original Draft.

PW: Methodology, Data curation, Investigation, Project administration, Writing – Review & Editing.

ES: Conceptualization, Formal analysis, Writing – Review & Editing.

MMD: Supervision, Writing – Review & Editing.

SD: Conceptualization, Methodology, Project administration, Writing – Review & Editing, Supervision, Funding Acquisition.

Competing interests

This study was approved by the Swiss Association of Research Ethics Committees, Swissethics (Project ID-2022-00878), and the Institutional Review Board of the University of Connecticut, USA. All caregivers provided informed consent to their children’s participation before inclusion in the study.

Footnotes

This research article was awarded Open Data and Open Materials badges for transparent practices. See the Data Availability Statement for details.

1 We will use both person-first and identity-first language interchangeably when referring to individuals diagnosed with ASD, to acknowledge the diversity of preferences among the autistic community (Bottema-Beutel et al., 2021; Buijsman et al., 2023; Vivanti, 2020).

2 Although the developmental period for ToM in NT children is considered to span early and middle childhood (e.g., Wellman & Liu, 2004), we also included older children in this study. Studies involving autistic children often include older participants (e.g., Baldimtsi et al., 2020; Shojaeian et al., 2022) to account for potential developmental delays in ToM acquisition, which may result in a later emergence of skills typically observed in younger NT children. Including older children in our study was essential to examine ToM development comprehensively across both NT and autistic groups. Furthermore, while the studies included in Schroeder’s (2018) meta-analysis predominantly focused on younger children, there is growing recognition that ToM development continues throughout middle childhood, particularly for more advanced ToM abilities such as second-order false beliefs. Consequently, studies such as Greenberg et al. (2013), Tsimpli et al. (2017), Meir and Novogrodsky (2019), Andreou et al. (2020), Buac and Kaushanskaya (2020), Peristeri et al. (2021) and Listanti et al. (2023) also included children above preschool age. Similarly, by examining a broader age range, our study captures this later developmental trajectory, providing valuable insights into ToM development in diverse populations.

3 A detailed overview of the bilingual children’s language profiles can be found in the Supplementary Materials.

4 For a detailed overview of the calculations, see the Supplementary Materials and the R data preparation code under: https://osf.io/cxe5v/?view_only=b7075a79cd704752b334ff9e9658cf79.

5 As mixed-effects models are not supported by the current version of the package, we fitted a generalized linear model for the outcome model and a linear model for the mediation model.

6 We also conducted a post hoc power analysis to assess whether the sample size of the monolingual autistic subgroup in RQ1 was large enough to detect the effects tested. This involved using details of the fitted models, such as R-squared and the number of predictors, to estimate power retrospectively. The estimated power with 26 participants was 0.85, which we consider appropriate.

7 Since metalinguistic awareness (MLA) did not contribute significantly to model fit in RQ2, MLA was not retained in the regression model. Its possible mediating effect was therefore not assessed either.

8 In light of this, we included a fixed effect of multilingualism in our models for RQ2 (bilinguals versus trilinguals) and ran the models again. The results did not show significant improvements, as assessed by likelihood ratio tests, and as such, this factor was removed from the final models.

References

Adesope, O. O., Lavin, T., Thompson, T., & Ungerleider, C. (2010). A systematic review and meta-analysis of the cognitive correlates of bilingualism. Review of Educational Research, 80(2), 207–245. https://doi.org/10.3102/0034654310368803
Allen, C., Silove, P., Williams, K., & Hutchins, . (2007). Validity of the social communication questionnaire in assessing risk of autism in preschool children with developmental problems. Journal of Autism and Developmental Disorders, 37, 1272–1278. https://doi.org/10.1007/s10803-006-0279-7
Andreou, M., Tsimpli, I. M., Durrleman, S., & Peristeri, E. (2020). Theory of mind, executive functions, and syntax in bilingual children with autism spectrum disorder. Languages, 5(4), 67. https://doi.org/10.3390/languages5040067
Anwyl-Irvine, A. L., Massonnié, J., Flitton, A., Kirkham, N., & Evershed, J. K. (2020). Gorilla in our midst: An online behavioral experiment builder. Behavior Research Methods, 52(1), 388–407. https://doi.org/10.3758/s13428-019-01237-x
Baldimtsi, E., Peristeri, E., Tsimpli, I. M., & Durrleman, S. (2020). The impact of bilingualism on theory of mind and executive functions in TD and ASD. In Proceedings of the 44th annual Boston University conference on language development (pp. 79). Somerville, MA: Cascadilla Press.
Baron-Cohen, S. (1997). Mindblindness: An essay on autism and theory of mind. MIT Press.
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01
Baumeister, F., Wolfer, P., Sahbaz, S., Rudelli, N., Capallera, M., Daum, M. M., Samson, A. C., Corrigan, G., Naigles, L. R., & Durrleman, S. (2024). Measuring theory of mind: A preliminary analysis of a novel linguistically simple and tablet-based measure for children. Frontiers in Developmental Psychology, 2, 1–19. https://doi.org/10.3389/fdpys.2024.1445406
Baumeister, F., Bagioka, D. V., Rivoletti, L., & Durrleman, S. (2025). The impact of bilingualism on theory of mind in children with and without developmental disorders: A scoping review. Developmental Review, 75, 101186. https://doi.org/10.1016/j.dr.2025.101186
Beaudoin, C., Leblanc, E., Gagner, C., & Beauchamp, M. L. H. (2020). Systematic review and inventory of theory of mind measures for young children. Frontiers in Psychology, 10. https://doi.org/10.3389/fpsyg.2019.02905
Berument, S. K., Rutter, M., Lord, C., Pickles, A., & Bailey, A. (1999). Autism screening questionnaire: Diagnostic validity. The British Journal of Psychiatry: the Journal of Mental Science, 175, 444–451. https://doi.org/10.1192/bjp.175.5.444
Białecka, M., Wodniecka, Z., Muszyńska, K., Szpak, M., & Haman, E. (2023). Both L1 and L2 proficiency impact ToM reasoning in children aged 4 to 6. Painting a more nuanced picture of the relation between bilingualism and ToM (pp. 1–19). Bilingualism: Language and Cognition. https://doi.org/10.1017/S1366728923000652
Bialystok, E. (2011). Reshaping the mind: The benefits of bilingualism. Canadian Journal of Experimental Psychology/Revue Canadienne de Psychologie Expérimentale, 65(4), 229–235. https://doi.org/10.1037/a0025406
Bialystok, E. (2018). Bilingual education for young children: Review of the effects and consequences. International Journal of Bilingual Education and Bilingualism, 21(6), 666–679. https://doi.org/10.1080/13670050.2016.1203859
Bialystok, E., & Craik, F. I. M. (2022). How does bilingualism modify cognitive function? Attention to the mechanism. Psychonomic Bulletin & Review, 29, 1246–1269. https://doi.org/10.3758/s13423-022-02057-5
Bialystok, E., Craik, F. I. M., Klein, R., & Viswanathan, M. (2004). Bilingualism, aging, and cognitive control: Evidence from the Simon task. Psychology and Aging, 19(2), 290–303. https://doi.org/10.1037/0882-7974.19.2.290
Bialystok, E., Craik, F. I. M., & Luk, G. (2012). Bilingualism: Consequences for mind and brain. Trends in Cognitive Sciences, 16(4), 240–250. https://doi.org/10.1016/j.tics.2012.03.001
Bishop, D. V. (2003). Test for reception of grammar, version 2. London: Addison-Wesley Professional.
Bottema-Beutel, K., Kapp, S. K., Lester, J. N., Sasson, N. J., & Hand, B. N. (2021). Avoiding ableist language: Suggestions for autism researchers. Autism in Adulthood, 3(1), 18–29. https://doi.org/10.1089/aut.2020.0014
Brown, J. R., Donelan-McCall, N., & Dunn, J. (1996). Why talk about mental states? The significance of children’s conversations with friends, siblings, and mothers. Society for Research in Child Development, 67(3), 836–849.
Buac, M., & Kaushanskaya, M. (2020). Predictors of theory of mind performance in bilingual and monolingual children. International Journal of Bilingualism, 24(2), 339–359. https://doi.org/10.1177/1367006919826866
Buijsman, R., Begeer, S., & Scheeren, A. M. (2023). ‘Autistic person’ or ‘person with autism’? Person-first language preference in Dutch adults with autism and parents. Autism, 27(3), 788–795. https://doi.org/10.1177/13623613221117914
Carlson, S. M., & Moses, L. J. (2001). Individual differences in inhibitory control and children’s theory of mind. Child Development, 72(4), 1032–1053. https://doi.org/10.1111/1467-8624.00333
Carter, A. S., Davis, N. O., Klin, A., & Volkmar, F. R. (2005). Social development in autism. In Handbook of autism and pervasive developmental disorders (Vol. 1, pp. 312–334). John Wiley & Sons, Inc. https://doi.org/10.1002/9780470939345.ch11
Dahlgren, S. O., Almén, H., & Dahlgren Sandberg, A. (2017). Theory of mind and executive functions in young bilingual children. The Journal of Genetic Psychology, 178(5), 303–307. https://doi.org/10.1080/00221325.2017.1361376
Davis, R., Fletcher-Watson, S., & Digard, B. G. (2021). Autistic people’s access to bilingualism and additional language learning: Identifying the barriers and facilitators for equal opportunities. Frontiers in Psychology, 12, 4074. https://doi.org/10.3389/fpsyg.2021.741182
de Bruin, A. (2019). Not all bilinguals are the same: A call for more detailed assessments and descriptions of bilingual experiences. Behavioral Sciences, 9(3), 33. https://doi.org/10.3390/bs9030033
De Cat, C., Kašćelan, D., Prevost, P., Serratrice, L., Tuller, L., & Unsworth, S. (2022). Quantifying bilingual EXperience (Q-BEx): Questionnaire manual and documentation. https://osf.io/v7ec8/
Devine, R. T., & Hughes, C. (2018). Family correlates of false belief understanding in early childhood: A meta-analysis. Child Development, 89(3), 971–987. https://doi.org/10.1111/cdev.12682
Díaz, V. (2021). Minds in action: Evidence that linguistic diversity helps children build a theory of mind. Bilingualism: Language and Cognition, 25(1), 1–11. https://doi.org/10.1017/S1366728921000109
Diaz, V., Borjas, M., & Farrar, M. J. (2021). Is there an association between executive function and receptive vocabulary in bilingual children? A longitudinal examination. Children (Basel, Switzerland), 8(1), 44. https://doi.org/10.3390/children8010044
Dicataldo, R., & Roch, M. (2020). Are the effects of variation in quantity of daily bilingual exposure and socioeconomic status on language and cognitive abilities independent in preschool children? International Journal of Environmental Research and Public Health, 17(12), 4570. https://doi.org/10.3390/ijerph17124570
Doherty, M. (1998). Metalinguistic awareness and theory of mind: Just two words for the same thing? Cognitive Development, 13(3), 279–305. https://doi.org/10.1016/S0885-2014(98)90012-0
Doherty, M. (2000). Children’s understanding of homonymy: Metalinguistic awareness and false belief. Journal of Child Language, 27(2), 367–392. https://doi.org/10.1017/S0305000900004153
Dunn, L. M., & Dunn, D. M. (2007). PPVT-4: Peabody picture vocabulary test. Pearson Assessments.
Dunn, L. M., Dunn, L. M., & Thériault-Whalen, C. M. (1993). Echelle de vocabulaire en images Peabody: EVIP. PSYCAN.
Dunn, L. M., Lugo, D., Padilla, E., & Dunn, L. (1986). Test de vocabulario en imágenes Peabody. American Guidance Service.
Durrleman, S., Burnel, M., & Reboul, A. (2017). Theory of mind in SLI revisited: Links with syntax, comparisons with ASD. International Journal of Language & Communication Disorders, 52(6), 816–830. https://doi.org/10.1111/1460-6984.12317
European Commission. (2024). Special Eurobarometer 540: Europeans and their languages. Directorate-General for Education, Youth, Sport and Culture. https://data.europa.eu/doi/10.2766/28257
Fan, S. P., Liberman, Z., Keysar, B., & Kinzler, K. D. (2015). The exposure advantage: Early exposure to a multilingual environment promotes effective communication. Psychological Science, 26(7), 1090–1097. https://doi.org/10.1177/0956797615574699
Farhadian, M., Abdullah, R., Mansor, M., Redzuan, M., Gazanizadand, N., & Kumar, V. (2010). Theory of mind in bilingual and monolingual preschool children. Journal of Psychology, 1(1). https://doi.org/10.1080/09764224.2010.11885444
Feng, J., Cho, S., & Luk, G. (2023). Assessing theory of mind in bilinguals: A scoping review on tasks and study designs [preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/vrnej
Gathercole, V. C. M., Thomas, E., Kennedy, I., Prys, C., Young, N., Viñas-Guasch, N., Roberts, E., Hughes, E., & Jones, L. (2014). Does language dominance affect cognitive performance in bilinguals? Lifespan evidence from preschoolers through older adults on card sorting, Simon, and metalinguistic tasks. Frontiers in Psychology, 5, 11. https://doi.org/10.3389/fpsyg.2014.00011
Goetz, P. J. (2003). The effects of bilingualism on theory of mind development. Bilingualism: Language and Cognition, 6(1), 1–15. https://doi.org/10.1017/S1366728903001007
Gordon, K. R. (2016). High proficiency across two languages is related to better mental state reasoning for bilingual children. Journal of Child Language, 43(2), 407–424. https://doi.org/10.1017/S0305000915000276
Greenberg, A., Bellana, B., & Bialystok, E. (2013). Perspective-taking ability in bilingual children: Extending advantages in executive control to spatial reasoning. Cognitive Development, 28(1), 41–50. https://doi.org/10.1016/j.cogdev.2012.10.002
Grosjean, F. (1982). Life with two languages: An introduction to bilingualism. Harvard University Press.
Gullifer, J. W., & Titone, D. (2020). Characterizing the social diversity of bilingualism using language entropy. Bilingualism: Language and Cognition, 23(2), 283–294. https://doi.org/10.1017/S1366728919000026
Hauser, R. M., & Warren, J. R. (1997). Socioeconomic indexes for occupations: A review, update, and critique. Sociological Methodology, 27(1), 177–298. https://doi.org/10.1111/1467-9531.271028
Huang, R., Baker, E. R., & Wang, T. (2023). Early bilingualism enhances theory of mind in children from low-income households via executive function skills. Cognitive Development, 68, 101389. https://doi.org/10.1016/j.cogdev.2023.101389
Jacobs, J., & Paris, S. (1987). Children’s metacognition about reading: Issues in definition, measurement, and instruction. Educational Psychologist, 22(3), 255–278. https://doi.org/10.1207/s15326985ep2203&4_4
Joseph, R. M., & Tager-Flusberg, H. (2004). The relationship of theory of mind and executive functions to symptom type and severity in children with autism. Development and Psychopathology, 16(1), 137–155. https://doi.org/10.1017/s095457940404444x
Kovács, Á. M. (2009). Early bilingualism enhances mechanisms of false-belief reasoning. Developmental Science, 12(1), 48–54. https://doi.org/10.1111/j.1467-7687.2008.00742.x
Kremin, L. V., & Byers-Heinlein, K. (2021). Why not both? Rethinking categorical and continuous approaches to bilingualism. The International Journal of Bilingualism, 25(6), 1560–1575. https://doi.org/10.1177/13670069211031986
Kroll, J. F., Bobb, S. C., & Hoshino, N. (2014). Two languages in mind: Bilingualism as a tool to investigate language, cognition, and the brain. Current Directions in Psychological Science, 23(3), 159–163. https://doi.org/10.1177/0963721414528511
Laine, M., & Lehtonen, M. (2018). Cognitive consequences of bilingualism: Where to go from here? Language, Cognition and Neuroscience, 33(9), 1205–1212. https://doi.org/10.1080/23273798.2018.1462498
Lee, K. J., & Simpson, J. A. (2014). Introduction to multiple imputation for dealing with missing data. Respirology, 19(2), 162–167. https://doi.org/10.1111/resp.12226
Lehtonen, M., Fyndanis, V., & Jylkkä, J. (2023). The relationship between bilingual language use and executive functions. Nature Reviews Psychology, 2(6), 1–14. https://doi.org/10.1038/s44159-023-00178-9
Lenhard, A., Lenhard, W., Segerer, R., & Suggate, S. (2015). Peabody picture vocabulary test-4. Deutsche Fassung. Pearson Assessment.
Listanti, A., Torregrossa, J., Eisenbeiss, S., & Bongartz, C. (2023). Home literacy exposure in the heritage language enhances theory-of-mind development: A study on Greek-Italian bilingual children. In Proceedings of the 47th annual Boston University conference on language development (pp. 505–518).
Lord, C., Rutter, M., DiLavore, P. C., & Risi, S. (2003). Autism diagnostic observation schedule: ADOS. Western Psychological Services Los.
Lord, C., Rutter, M., & Le Couteur, A. (1994). Autism diagnostic interview-revised: A revised version of a diagnostic interview for caregivers of individuals with possible pervasive developmental disorders. Journal of Autism and Developmental Disorders, 24(5), 659–685. https://doi.org/10.1007/BF02172145
Luk, G. (2022). Justice and equity for whom? Reframing research on the “bilingual (dis)advantage”. Applied Psycholinguistics, 44(3), 1–15. https://doi.org/10.1017/S0142716422000339
Marinis, T., Andreou, M., Bagioka, D. V., Baumeister, F., Bongartz, C., Czypionka, A., Golegos, A., Peristeri, E., Skrimpa, V., Durrleman, S., & Terzi, A. (2023). Development and validation of a task battery for verbal and non-verbal first- and second-order theory of mind. Frontiers in Language Sciences, 1, 1–17. https://doi.org/10.3389/flang.2022.1052095
Marocchini, E. (2023). Impairment or difference? The case of Theory of Mind abilities and pragmatic competence in the Autism Spectrum. Applied Psycholinguistics, 44(3), 1–19. https://doi.org/10.1017/S0142716423000024
Meir, N., & Novogrodsky, R. (2019). Prerequisites of third-person pronoun use in monolingual and bilingual children with autism and typical language development. Frontiers in Psychology, 10, 2289. https://doi.org/10.3389/fpsyg.2019.02289
Morales, J., Calvo, A., & Bialystok, E. (2013). Working memory development in monolingual and bilingual children. Journal of Experimental Child Psychology, 114(2), 187–202. https://doi.org/10.1016/j.jecp.2012.09.002
Moses, L. J. (2001). Executive accounts of theory-of-mind development. Child Development, 72(3), 688–690. https://doi.org/10.1111/1467-8624.00306
Naigles, L. R. (2021). It takes all kinds (of information) to learn a language: Investigating the language comprehension of typical children and children with autism. Current Directions in Psychological Science, 30(1), 11–18. https://doi.org/10.1177/0963721420969404
Nguyen, T.-K., & Astington, J. W. (2014). Reassessing the bilingual advantage in theory of mind and its cognitive underpinnings. Bilingualism: Language and Cognition, 17(2), 396–409. https://doi.org/10.1017/S1366728913000394
Paap, K. R., Johnson, H. A., & Sawi, O. (2015). Bilingual advantages in executive functioning either do not exist or are restricted to very specific and undetermined circumstances. Cortex, 69, 265–278. https://doi.org/10.1016/j.cortex.2015.04.014
Peal, E., & Lambert, W. E. (1962). The relation of bilingualism to intelligence. Psychological Monographs: General and Applied, 76(27), 1–23. https://doi.org/10.1037/h0093840
Pelletier, J., & Astington, J. W. (2004). Action, consciousness and theory of mind: Children’s ability to coordinate story characters’ actions and thoughts. Early Education and Development, 15(1), 5–22. https://doi.org/10.1207/s15566935eed1501_1
Perner, J., & Wimmer, H. (1985). “John thinks that Mary thinks that…”: Attribution of second-order beliefs by 5- to 10-year-old children. Journal of Experimental Child Psychology, 39(3), 437–471. https://doi.org/10.1016/0022-0965(85)90051-7
Peristeri, E., Baldimtsi, E., Vogelzang, M., Tsimpli, I. M., & Durrleman, S. (2021). The cognitive benefits of bilingualism in autism spectrum disorder: Is theory of mind boosted and by which underlying factors? Autism Research, 14(8), 1695–1709. https://doi.org/10.1002/aur.2542
Peristeri, E., Silleresi, S., & Tsimpli, I. M. (2022). Bilingualism effects on cognition in autistic children are not all-or-nothing: The role of socioeconomic status in intellectual skills in bilingual autistic children. Autism, 26(8), 13623613221075097. https://doi.org/10.1177/13623613221075097
Raven, J. C., Rust, J., Chan, F., & Zhou, X. (2018). Raven’s Progressive Matrices 2, Clinical Edition. Pearson.
R Development Core Team. (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing. Vienna, Austria. https://www.R-project.org/
Rice, M. L., Wexler, K., & Redmond, S. M. (1999). Grammaticality judgments of an extended optional infinitive grammar. Journal of Speech, Language, and Hearing Research, 42(4), 943961. https://doi.org/10.1044/jslhr.4204.943.CrossRefGoogle ScholarPubMed
Rutter, M., Bailey, A., Berument, S. K., & Lord, C. (2003). Social communication questionnaire: Manual. Western Psychological Services.Google Scholar
Schaeffer, J., Abd El-Raziq, M., Castroviejo, E., Durrleman, S., Ferré, S., Grama, I., Hendriks, P., Kissine, M., Manenti, M., Marinis, T., Meir, N., Novogrodsky, R., Perovic, A., Panzeri, F., Silleresi, S., Sukenik, N., Vicente, A., Zebib, R., Prévost, P., & Tuller, L. (2023). Language in autism: Domains, profiles and co-occurring conditions. Journal of Neural Transmission, 130(3). https://doi.org/10.1007/s00702-023-02592-y.CrossRefGoogle ScholarPubMed
Schroeder, S. R. (2018). Do bilinguals have an advantage in theory of mind? A meta-analysis. Frontiers in Communication, 3, 36. https://doi.org/10.3389/fcomm.2018.00036.CrossRefGoogle Scholar
Shannon, C. E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27, 379–423. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x
Shojaeian, N., Li, Z., Kaurav, R. P. S., & Salem, A. A. M. S. (2022). Theory of mind among Swedish children with ASD, Down syndrome and typically developing group. Journal of Autism and Developmental Disorders, 52(11). https://doi.org/10.1007/s10803-021-05366-1
Silleresi, S. (2023). Developmental profiles in autism spectrum disorder. John Benjamins Publishing Company. https://doi.org/10.1075/lald.68
Simon, J. R. (1969). Reactions toward the source of stimulation. Journal of Experimental Psychology, 81(1), 174–176. https://doi.org/10.1037/h0027448
Slaughter, V., Imuta, K., Peterson, C. C., & Henry, J. D. (2015). Meta-analysis of theory of mind and peer popularity in the preschool and early school years. Child Development, 86(4), 1159–1174. https://doi.org/10.1111/cdev.12372
Stella, G., Pizzoli, C., & Tressoldi, P. (2000). Il Peabody test – Test di vocabolario ricettivo. Omega Edizione.
Sudo, M., & Matsui, T. (2021). School readiness in language-minority dual language learners in Japan: Language, executive function, and theory of mind. The Journal of Genetic Psychology, 182(6), 375–390. https://doi.org/10.1080/00221325.2021.1930994
Swanson, J. M., Schuck, S., Porter, M. M., Carlson, C., Hartman, C. A., Sergeant, J. A., Clevenger, W., Wasdell, M., McCleary, R., Lakes, K., & Wigal, T. (2012). Categorical and dimensional definitions and evaluations of symptoms of ADHD: History of the SNAP and the SWAN rating scales. The International Journal of Educational and Psychological Assessment, 10(1), 51–70.
Tager-Flusberg, H. (2007). Evaluating the theory-of-mind hypothesis of autism. Current Directions in Psychological Science, 16(6), 311–315. https://doi.org/10.1111/j.1467-8721.2007.00527.x
Thomas, J. (1992). Metalinguistic awareness in second- and third-language learning. In Harris, R. J. (Ed.), Advances in psychology (Vol. 83, pp. 531–545). North-Holland. https://doi.org/10.1016/S0166-4115(08)61515-0
Tingley, D., Yamamoto, T., Hirose, K., Keele, L., & Imai, K. (2014). Mediation: R package for causal mediation analysis. Journal of Statistical Software, 59(5), 1–38. https://doi.org/10.18637/jss.v059.i05
Tiv, M., O’Regan, E., & Titone, D. (2021). In a bilingual state of mind: Investigating the continuous relationship between bilingual language experience and mentalizing. Bilingualism: Language and Cognition, 24(5), 918–931. https://doi.org/10.1017/S1366728921000225
Tsimpli, I. M., Peristeri, E., & Andreou, M. (2017). Object clitic production in monolingual and bilingual children with Specific Language Impairment: A comparison between elicited production and narratives. Linguistic Approaches to Bilingualism, 7(3–4), 394–430. https://doi.org/10.1075/lab.15025.tsi
Vivanti, G. (2020). Ask the editor: What is the most appropriate way to talk about individuals with a diagnosis of autism? Journal of Autism and Developmental Disorders, 50(2), 691–693. https://doi.org/10.1007/s10803-019-04280-x
Wei, L. (2000). The bilingualism reader. Routledge-Taylor & Francis Group.
Wellman, H. M., & Liu, D. (2004). Scaling of theory-of-mind tasks. Child Development, 75(2), 523–541. https://doi.org/10.1111/j.1467-8624.2004.00691.x
Wimmer, H., & Perner, J. (1983). Beliefs about beliefs: Representation and constraining function of wrong beliefs in young children’s understanding of deception. Cognition, 13(1), 103–128. https://doi.org/10.1016/0010-0277(83)90004-5
Wolfer, P., Baumeister, F., Rudelli, N., Corrigan, G., Naigles, L. R., & Durrleman, S. (2024). Exploring metalinguistic awareness in school-aged autistic children: Insights from grammatical judgment. Journal of Autism and Developmental Disorders. https://doi.org/10.1007/s10803-024-06569-y
World Health Organization. (2019). International statistical classification of diseases and related health problems (11th ed.).
Yow, W. Q., & Li, X. (2024). Role of bilingual experience in children’s context-sensitive selective trust strategies. Bilingualism: Language and Cognition, 27(1), 95–106. https://doi.org/10.1017/S1366728923000433
Yow, W. Q., & Markman, E. M. (2015). A bilingual advantage in how children integrate multiple cues to understand a speaker’s referential intent. Bilingualism: Language and Cognition, 18(3), 391–399. https://doi.org/10.1017/S1366728914000133
Yu, Y.-T., Lin, C.-H., Li, H.-J., Tsai, C.-H., & Chen, K.-L. (2022). Different mediators of applied theory-of-mind competence in children with autism spectrum disorder. Research in Developmental Disabilities, 130, 104335. https://doi.org/10.1016/j.ridd.2022.104335
Zelazo, P. (2006). The Dimensional Change Card Sort (DCCS): A method of assessing executive function in children. Nature Protocols, 1, 297–301. https://doi.org/10.1038/nprot.2006.46
Ziatabar Ahmadi, S. Z., Jalaie, S., & Ashayeri, H. (2015). Validity and reliability of published comprehensive theory of mind tests for normal preschool children: A systematic review. Iranian Journal of Psychiatry, 10(4), 214–224.
Figure 1. Graphical explanation of the hypothesized direct and indirect impact of bilingualism on ToM (Baumeister et al., 2025).

Table 1. Participant overview

Table 2. Demonstration of entropy score calculations

Table 3. Overview blocks Theory of Mind task

Table 4. Result of the logistic mixed effects model (RQ1)

Figure 2. Predicted probabilities of correct answers in different blocks for NT and autistic monolingual and bilingual children. ASD = Autistic children; NT = Neurotypical children.

Table 5. Result of the logistic mixed effects binomial model (RQ2) – balance of exposure across contexts

Table 6. Result of the logistic mixed effects model (RQ2) – balance of exposure within contexts

Figure 3. Effect of balance of exposure within contexts on the predicted probability of an accurate response. ASD = Autistic children; NT = Neurotypical children; orange line = Block 1 (diverse desires); green line = Block 2 (first-order false beliefs); violet line = Block 3 (second-order false beliefs).

Figure 4. Tests of ACME and ADE for Block 2 in NT children. ACME = Average Causal Mediation Effect; ADE = Average Direct Effect; NT = Neurotypical children.

Supplementary material: File

Baumeister et al. supplementary material
