Highlights
• Executive function skills predict visuospatial perspective-taking (VPT).
• Bilingual and monolingual adults did not differ in VPT.
• Neither a continuous measure of bilingualism nor cultural orientation predicts VPT.
1. Introduction
Bilingual speakers have been reported to outperform their monolingual peers in various perspective-taking tasks, such as false-belief tasks (see overview in Rubio-Fernández, 2017), spatial reasoning tasks (Greenberg et al., 2013) and referential communication tasks (Navarro & Conway, 2021; Wu & Keysar, 2007). The present study investigated whether bilingual and monolingual adults differ in two aspects of visuospatial perspective-taking (VPT), namely, in identifying whether someone sees something and the way in which they see it. Because differences in perspective-taking have been related to differences in culture and executive function, the study also investigated whether bilingualism affects VPT independently of differences in these two factors.
1.1. Visuospatial perspective-taking and bilingualism
Perspective-taking is a multifaceted socio-cognitive mechanism. It is a crucial capacity to represent the mental state of another person, which helps to infer others’ mental states, such as beliefs, knowledge or attitudes. In turn, this allows us to adjust corresponding utterances and actions to achieve highly efficient social interaction (Apperly, 2011; Baron-Cohen et al., 1985). One aspect of perspective-taking is visuospatial perspective-taking (VPT), which refers to the ability to imagine and mentally manipulate objects or scenes from different perspectives, such as self and other. It involves the ability to mentally transform images, understand spatial relationships and navigate through environments using mental imagery (Flavell, 1977). VPT has been divided into two levels (Flavell et al., 1981). Level-1 VPT requires judgements about what is visible to others; level-2 VPT requires judgements about how the same object appears to different viewers. Whereas level-1 VPT can be solved by judging whether an object falls in a viewer’s line of sight, level-2 VPT commonly involves more effortful mental transformation, such as simulated spatial rotation (Surtees et al., 2016). The present study explored the importance of bilingualism for both level-1 and level-2 VPT.
Bilingual speakers have often been found to outperform their monolingual peers in various aspects of perspective-taking such as false-belief reasoning or referential communication (Díaz, 2022; Feng et al., 2024; Greenberg et al., 2013; but see Ryskin et al., 2014; Wang et al., 2019 for null results). For example, bilinguals have outperformed monolinguals in a communicative perspective-taking task, the director task (Agostini et al., 2025; Fan et al., 2015; Navarro et al., 2022; Navarro & Conway, 2021). In this task, a director instructs participants to move objects in a grid. While participants see all objects, the director has visual access to only some of them. Participants must take the director’s visual perspective into account to move the correct objects. In this task, bilinguals are less affected by their own perspective than monolinguals.
Bilingual VPT has been studied in adults mainly by means of communicative tasks (Navarro et al., 2022; Navarro & Conway, 2021; Ryskin et al., 2014; Wu & Keysar, 2007). The disadvantage of communicative tasks is their verbal component. Bilingual and monolingual participants may have different levels of fluency in the experimental language. This can lead to subtle differences in language retrieval and processing (Gollan et al., 2005). For instance, Ryskin et al. (2014) found that bilingual participants interpreted spatial instructions more slowly than monolingual participants. To avoid the confounding effects of language proficiency, the present study tested bilingual VPT in tasks that do not involve communication and make minimal demands on language.
1.2. Cultural orientation in VPT
Perspective-taking performance appears sensitive to cultural differences and experiences (Kessler et al., 2014; Wu et al., 2013; Wu & Keysar, 2007). For instance, Wu and Keysar (2007) asked Westerners (i.e., American English speakers) and East Asians (i.e., Mandarin Chinese speakers) to participate in the director task. Eye movements showed that both groups first focused on egocentric distractors, but East Asians were quicker in suppressing this egocentric tendency (Wu et al., 2013; Wu & Keysar, 2007). Similarly, Kessler et al. (2014) utilised both a level-1 and a level-2 VPT task in which East Asians (i.e., Chinese) and Westerners (i.e., British) judged the position of an object relative to another person’s perspective. Again, East Asians were faster than Westerners in both tasks. Furthermore, East Asians revealed an other-centred bias, while the Westerners showed a self-centred bias.
These results suggest that cultural orientation affects the degree of the egocentric bias. Triandis (1996, 2001) argued that compared to people growing up in an individualistic cultural society, people growing up in a collectivistic cultural society tend to define themselves as part of a group and give priority to ingroup goals. The concept of self/other is, therefore, modulated by an individual’s cultural background. Similarly, Markus and Kitayama (1991) argued that most Western cultures emphasise the importance of self-orientation, which promotes the construal of independence. In contrast, most Eastern cultures emphasise community benefits and paying attention to others, facilitating the construal of interdependence. In interpersonal contexts, individuals with an interdependence construal are therefore more concerned with the needs of others than with their own.
While East Asians and Westerners likely share the same underlying perspective-taking processes (Kessler et al., 2014; Wu et al., 2013), participants from collectivist cultures might be better at suppressing their egocentric bias. This can explain why East Asians are prone to less or no egocentric bias in VPT tasks. Conversely, it explains why Westerners (i.e., members of individualist cultures) who are more self-oriented seem more attuned to their own perspectives when the task requires them to take others’ perspectives.
While Wu and Keysar (2007), Wu and Thierry (2013) and Kessler et al. (2014) explain the superior performance of East Asians over Westerners as a cultural difference, their participant groups likely also differed in terms of bilingualism. In Wu and Keysar (2007), East Asians grew up in China, but had immigrated to the United States, and in Kessler et al. (2014) they were partly Chinese speakers who were living in the United Kingdom. As a group, East Asians were, therefore, very likely bilingual, while the Westerners were probably functionally monolingual. Since bilinguals appear to be less egocentric than monolinguals (Rubio-Fernandez & Glucksberg, 2012), Chinese participants might have had a performance advantage due to their cultural orientation, their bilingualism, or both.
Interestingly, some research on the related concept of Theory of Mind, that is, the ability to distinguish one’s own mental states from those of others, failed to find an advantage of interdependent/Eastern participants over independent/Western participants in suppressing an egocentric tendency in a false-belief task (Bradford et al., 2018). Importantly, Eastern (Chinese) participants were tested in China and were, therefore, much less likely to be bilingual, while Westerners were tested in the United Kingdom. The groups did not differ in task performance, despite differences in self-rated degree of collectivism.
Similar results were found by Wang et al. (2019) in a computer-based version of the director task and a level-1 VPT task. Both tasks captured interference from self-perspective when making judgements about the other’s perspective (i.e., egocentric bias) and interference from the other’s perspective when making judgements about the self (i.e., altercentric bias). Data of Taiwanese (i.e., East Asian) and British (i.e., Western) participants were collected in their respective home countries. East Asian (interdependent) and Western (independent) participants displayed similar egocentric and altercentric biases on both tasks. These findings suggest that cultural background had little effect on perspective-taking but only warrant weak inferences about the effects of cultural orientation or bilingualism, since these were not directly measured.
It is often assumed that all participants with an Eastern background are collectivist and those with a Western background are individualist. However, this is an oversimplification. For instance, the Western countries of Poland and Portugal are collectivist, while the Eastern countries of Singapore and South Korea are individualist (Krassner et al., 2017). Also, there are individual differences in cultural orientation within a country (Lomas et al., 2023). In the present study, we, therefore, directly assessed participants’ cultural orientation.
1.3. Acculturation in VPT
Cultural orientations are not necessarily stable. They can change when participants are immersed in a different culture (Doucerain et al., 2023). This is particularly relevant for bilingual speakers who are more likely than monolingual speakers to be exposed to customs, ideas and perspectives from different cultures. Thus, bilinguals do not only speak two languages, they are also bicultural (Grosjean, 2015). It is known that people are better at representing the mental states of those close to them (Aron et al., 1991; Aron & Aron, 1986), and cultural familiarity facilitates mindreading (Perez-Zapata et al., 2016). Bilinguals, that is, individuals who are familiar with two cultures, might, therefore, be better at representing the mental states of people from two cultures. This, in turn, might train their perspective-taking abilities more generally.
But not everybody who is exposed to different cultures also identifies with these cultures to the same degree. The degree to which bilinguals orient towards their two cultures might affect their cognitive functioning, as shown for creative thinking (Tadmor et al., 2012) and cognitive flexibility (Spiegler & Leyendecker, 2017). Importantly for the present study, bilinguals who identify with both of their cultures, thus those with a higher level of acculturation, might be more likely to pay attention to others’ perspectives in social interaction, which might lead them to be more sensitive to others’ perspectives and take others’ perspectives more efficiently. They might also be more accustomed to switching between different (cultural) perspectives and might therefore perform better in VPT. In the present study, we, therefore, measured the degree of acculturation as another potential factor for VPT.
1.4. Executive function in VPT
Executive functions, including working memory, inhibition and switching, affect the degree to which adults adopt perspectives (Long et al., 2018; Qureshi et al., 2020; Wardlow, 2013). Because taking self and other perspectives are competing processes (Segal, 2025), VPT is especially closely associated with inhibition and switching. Inhibition enables individuals to filter out distracting information and focus on relevant cues, to switch between different perspectives and to overcome perceptual biases that can interfere with accurate perspective-taking (Qureshi et al., 2020). Similarly, switching enables individuals to consider multiple viewpoints and integrate information from different perspectives (Long et al., 2018; Wardlow, 2013). Both are, therefore, important facilitating cognitive mechanisms for VPT. Bilinguals often exhibit superior executive function skills compared to monolinguals (Bialystok, 2009; Bialystok et al., 2012; Green & Abutalebi, 2013; see recent meta-analyses in Donnelly et al., 2019; Grundy, 2020; Lehtonen et al., 2018; Lowe et al., 2021). It has been argued that this is due to their usage of two languages, which requires that bilinguals constantly monitor, select and suppress one of their languages (e.g., Kaushanskaya & Marian, 2007; Kroll et al., 2015).
This suggests that bilinguals exercise executive function skills, including switching and inhibition, more frequently than monolinguals (e.g., Green & Abutalebi, 2013). In other words, the bilingual advantage in inhibition and switching might lead to better perspective-taking. Furthermore, executive function has been found to vary across cultures and might vary due to biculturalism (Xie et al., 2022). Being bicultural forces individuals to see the world from different perspectives and requires a high level of mental flexibility (Kharkhurin, 2010). This might lead bilinguals to be better at VPT. Therefore, executive function is a potential confounding variable for the relationships of bilingualism and acculturation with VPT. In the present study, inhibition and switching skills were therefore tested as control variables.
1.5. The current study
The main aim of the present study was to investigate the role of bilingualism in adults’ VPT ability. In contrast to previous adult VPT studies, we applied non-verbal VPT paradigms. Furthermore, the present study is the first to study VPT differences due to bilingualism in both level-1 and level-2 VPT tasks. We incorporated tasks with a consistent design (Samson et al., 2010; Surtees et al., 2016) which enabled a systematic examination of VPT.
VPT is affected by participants’ priority of self-perspective or other-perspective. Priority of self-perspective should lead to stronger egocentric and weaker altercentric effects, whereas the reverse should be true for the priority of the other-perspective (Bukowski & Samson, 2017). We predicted that priority for the other-perspective would be greater in bilinguals, resulting in weaker egocentric effects and stronger altercentric effects. If taking others’ perspectives is more automatic for bilinguals than monolinguals due to more frequent considerations of others’ perspectives, they might be more affected by the other-perspective when making a self-perspective decision, which would be evident in slower responses for self-perspective decisions.
The second aim of the study was to distinguish the effects of bilingualism on VPT from the effects of culture. Previous studies on the impact of culture on VPT did not clearly distinguish between the effects of culture and the effects of bilingualism. We tested the effects of cultural orientation (collectivist versus individualist) and acculturation. For cultural orientation, we compared VPT of British monolinguals (B-Mono) with that of two bilingual groups, namely, European bilinguals (EU-Bil) and East Asian bilinguals (EA-Bil). EU-Bil participants incorporated two Western cultures, and EA-Bil participants combined an Eastern and a Western culture. We anticipated EA-Bils to be collectivist, while we expected EU-Bils and B-Monos to be individualist. This allowed us to test whether bilingualism affects VPT independently of cultural orientation. More specifically, we predicted that a collectivist background (i.e., Eastern culture) leads to lower egocentrism and greater altercentrism than an individualist background (i.e., Western culture). To confirm the language status of the participants (bilingualism versus monolingualism), we utilised the Language and Social Background Questionnaire (LSBQ). To confirm participants’ cultural orientations, we used the Culture Orientation Scale (Triandis & Gelfand, 1998), which measures traits of individualism and collectivism. Furthermore, we expected that those who highly identify with and integrate both heritage and host cultures, in other words, those with a higher level of acculturation, would show lower egocentrism and higher altercentrism. We measured acculturation by means of the Vancouver Index of Acculturation (Ryder et al., 2000).
The third aim of the study was to test for effects of executive function (EF) on VPT, and whether these might explain any bilingual effects. We expected participants with greater executive skills to show lower egocentric and altercentric effects (Long et al., 2018). As bilinguals have been found to have enhanced EF skills, they might show reduced egocentric and altercentric effects due to these skills. To help disentangle the effects of bilingualism and EF, and as VPT has been related to both inhibition and switching, we included measures of both in the form of the Flanker task (Zhou & Krott, 2018) and the colour–shape task (Stasenko et al., 2017), respectively.
Note that altercentric effects have not been observed in all studies or under all conditions (Del Sette et al., 2022; Rubio-Fernandez et al., 2022; Surtees et al., 2016). We implemented a blocked design, which reduces the likelihood that any altercentric effects would be exaggerated by transfer between self and other conditions. There is also debate about whether these effects reflect automatic processing of another person’s perspective (e.g., Butterfill & Apperly, 2013; Heyes, 2014). The present research was not designed to test for variation in automaticity, and so cannot illuminate this debate.
We approached the data in two ways. We first focused on bilingualism as a categorical variable, represented in the three participant groups. Second, we took an individual differences approach by treating the degree of bilingualism as a continuous variable. We did this because our group of monolingual participants unexpectedly scored higher in collectivism than the East Asian bilingual group. An individual differences approach does not make assumptions about the cultural orientation of particular groups. Also, with an individual differences approach, we accounted for the notion that culture and bilingualism are not categorical variables (e.g., Grosjean & Li, 2013) and we were able to test whether the degree of bilingualism predicted VPT performance independently of, and in addition to, any effects of cultural orientation, acculturation and executive function abilities. To our knowledge, the present study is the first to disentangle the effect of acculturation and bilingualism on VPT.
2. Methods
2.1. Participants
We performed a power analysis using G*Power version 3.1.9.7 (Faul et al., 2007) for a one-way ANOVA with group as the independent variable (EA-Bil, EU-Bil and B-Mono). For a medium effect size of f = 0.25, α = 0.05 and power = 0.8, the minimum sample size is 159 participants (53 participants per group). Since we anticipated attrition due to a two-session design, we aimed for 60 participants per group.
We also performed a power analysis for a hierarchical regression with six predictors in total (two control variables: age and socioeconomic status; three theoretical variables: acculturation, individualism and collectivism; and one tested predictor: bilingualism), a medium effect size f² = 0.15, α = 0.05 and power = 0.8. This led to a minimum sample size of 55 participants.
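The quantities behind these two power analyses can be checked with a few lines of arithmetic. The sketch below does not reproduce the G*Power calculation itself; it only derives the per-group sample size, the degrees of freedom and the noncentrality parameter from the values given above. It assumes the convention λ = f² × N for F tests, and all function and variable names are illustrative.

```python
import math


def eta_squared_from_f(f):
    """Convert Cohen's f into the proportion of variance explained (eta squared)."""
    return f ** 2 / (1 + f ** 2)


def noncentrality(f_squared, n_total):
    """Noncentrality parameter, assuming the convention lambda = f^2 * N."""
    return f_squared * n_total


# One-way ANOVA with three groups (values taken from the text).
anova_n, anova_groups, anova_f = 159, 3, 0.25
per_group = math.ceil(anova_n / anova_groups)
anova_df = (anova_groups - 1, anova_n - anova_groups)
anova_lambda = noncentrality(anova_f ** 2, anova_n)

# Hierarchical regression: six predictors in total, one of them tested.
reg_n, reg_predictors, reg_tested, reg_f2 = 55, 6, 1, 0.15
reg_df = (reg_tested, reg_n - reg_predictors - 1)
reg_lambda = noncentrality(reg_f2, reg_n)

print(per_group, anova_df, anova_lambda)
print(reg_df, reg_lambda)
print(round(eta_squared_from_f(anova_f), 4))
```

Under these assumptions, f = 0.25 corresponds to roughly 6% of variance explained by group.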
A total of 184 young adults aged between 17 and 35 (141 females, mean age = 20.5 years, SD = 2.8) were given course credits or payment in return for their participation. Of those, 63 were East-Asian bilinguals (EA-Bil), 61 European bilinguals (EU-Bil) and 60 British English monolinguals who had grown up in the United Kingdom (B-Mono) (for additional demographic details, see Table S1 in the Supplementary Material). All participants had normal or corrected-to-normal vision. The study was approved by the STEM Research Ethics Committee of the University of Birmingham (ERN_17–1812). In addition, the study complies with the Helsinki Declaration of 1975, as revised in 2008.
2.2. Measures
2.2.1. Language and social background questionnaire (LSBQ)
The LSBQ (Anderson et al., 2018) tests participants’ social background, language history, language use in the community and language switching, and has good reliability and validity. It provides measures of socioeconomic status (SES) and degree of bilingualism. SES is calculated as the average of the parents’ education scores, rated on a 5-point Likert scale ranging from 1 (“No high school diploma”) to 5 (“Graduate or professional degree”).
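The SES score described above amounts to a simple mean. A minimal sketch follows; the function name is hypothetical, and accepting a list of ratings (so that a single available rating can still be scored) is an assumption, not something the questionnaire documentation is known to prescribe.

```python
def ses_score(parent_education):
    """Mean of the parents' education ratings, each on the 1-5 scale.

    `parent_education` is a list of ratings; taking a list rather than
    exactly two values is an assumption made for this illustration.
    """
    if not parent_education:
        raise ValueError("at least one education rating is required")
    for rating in parent_education:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating out of range 1-5: {rating}")
    return sum(parent_education) / len(parent_education)


print(ses_score([3, 5]))
```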
2.2.2. Culture orientation
The Culture Orientation Scale (COS) (Triandis & Gelfand, 1998) is a 16-item questionnaire measuring the inclination to think, feel or act in a culturally determined way. The questionnaire distinguishes between four subscales, each containing four items: Horizontal Individualism (HI) (e.g., “I’d rather depend on myself than others”), Horizontal Collectivism (HC) (e.g., “If a co-worker gets a prize, I would feel proud”), Vertical Individualism (VI) (e.g., “It is important that I do my job better than others”), Vertical Collectivism (VC) (e.g., “Parents and children must stay together as much as possible”). The horizontal axis distinguishes between orientation towards equality and similarity among individuals, while the vertical axis reflects orientation towards hierarchy and the importance of social status and independence. Items are presented in a mixed order, with responses given on a 9-point Likert scale, ranging from “never or definitely no” to “always or definitely yes.” A higher sub-score represents a higher orientation tendency towards collectivism or individualism. The scale has shown good reliability and validity (Triandis, 2001). In our sample, Cronbach’s alpha was 0.72 for collectivism and 0.68 for individualism.
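As an illustration of how such a questionnaire can be scored and its internal consistency checked, the sketch below computes subscale means and Cronbach's alpha using the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The item-to-subscale mapping is hypothetical (the real questionnaire mixes the item order), and combining the horizontal and vertical items into a single mean per orientation is an assumption about the scoring.

```python
from statistics import mean, variance

# Hypothetical item positions per subscale; the real COS presents items in mixed order.
SUBSCALE_ITEMS = {
    "HI": (0, 1, 2, 3),
    "VI": (4, 5, 6, 7),
    "HC": (8, 9, 10, 11),
    "VC": (12, 13, 14, 15),
}


def score_cos(responses):
    """Return subscale means plus individualism and collectivism composites."""
    if len(responses) != 16 or any(not 1 <= r <= 9 for r in responses):
        raise ValueError("expected 16 responses on a 1-9 scale")
    scores = {name: mean(responses[i] for i in items)
              for name, items in SUBSCALE_ITEMS.items()}
    # Assumption: composites are means over the eight items of each orientation.
    scores["individualism"] = mean(
        responses[i] for i in SUBSCALE_ITEMS["HI"] + SUBSCALE_ITEMS["VI"])
    scores["collectivism"] = mean(
        responses[i] for i in SUBSCALE_ITEMS["HC"] + SUBSCALE_ITEMS["VC"])
    return scores


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of participants' item-response lists."""
    k = len(rows[0])
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


example = score_cos([9] * 8 + [1] * 8)
print(example["individualism"], example["collectivism"])
```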
2.2.3. Acculturation
The Vancouver Index of Acculturation (VIA) (Ryder et al., 2000) is a questionnaire with 20 items. It assesses the degree of identification with a heritage culture and a host culture. It measures several aspects, including values, social activities, marriage and entertainment (e.g., “I often participate in my heritage culture traditions”). Each item is rated on a 9-point Likert scale, ranging from “strongly disagree” to “strongly agree.” Heritage and host culture items are presented in turn. A higher score represents higher identification with a culture. The questionnaire has good reliability and validity (Testa et al., 2019). In the present sample, Cronbach’s alpha for the entire scale was 0.81, with 0.85 for the heritage dimension and 0.88 for the host dimension.
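Because heritage and host items alternate, the two dimension scores can be obtained by splitting the responses by position. The sketch below assumes that the first item is a heritage item and that each dimension is scored as the mean of its ten items; both are assumptions for illustration, and the function name is hypothetical.

```python
def score_via(responses):
    """Split 20 alternating responses into heritage and host dimension means."""
    if len(responses) != 20 or any(not 1 <= r <= 9 for r in responses):
        raise ValueError("expected 20 responses on a 1-9 scale")
    heritage = responses[0::2]  # assumed: items 1, 3, 5, ... are heritage items
    host = responses[1::2]      # assumed: items 2, 4, 6, ... are host items
    return {
        "heritage": sum(heritage) / len(heritage),
        "host": sum(host) / len(host),
    }


print(score_via([9, 1] * 10))
```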
2.2.4. Level-1 VPT task
Procedure and materials of the level-1 VPT task were identical to those of Samson et al. (2010) (see left panel of Figure 1). Stimuli showed a view into a room. Red dots were displayed on the left and/or the right wall. A human avatar in the room’s centre faced the left or the right wall. In the congruent condition (50% of trials), dots appeared only on the wall that the avatar faced, meaning that the participant and the avatar saw the same number of dots. In the incongruent condition (50% of trials), the dots were presented on both the left and right walls, meaning the participant and the avatar saw different numbers of dots. Avatars were matched to the gender (male, female) and ethnicity of a participant by manipulating physical features like hairstyle and face colour. Thus, EA-Bil participants saw an East Asian avatar, while EU-Bil and B-Mono participants saw a Western avatar.

Figure 1. Example stimuli of level-1 congruent and incongruent conditions and example trial sequence of the VPT task (left panel). Example stimuli of level-2 congruent and incongruent conditions and example trial sequence (right panel).
We separated the self and other perspectives into separate tests. In the self-perspective task, participants were instructed to judge from their own perspective. If they spontaneously adopted the avatar’s perspective, an altercentric interference would be observed. In the other-perspective task, participants were instructed to judge from the avatar’s perspective. If they spontaneously took the self-perspective, an egocentric interference would be observed. Altercentric and egocentric interferences were calculated as differences between congruent and incongruent conditions.
Each trial began with a fixation cross, lasting for 750 ms, followed by a 500-ms blank screen. Next, following Samson et al.’s (2010) design, a cue presented for 750 ms told the participants whether to focus on their own perspective (“YOU”) or on the avatar’s perspective (“HE”/“SHE”). After a further 500 ms, an integer between 0 and 3 was presented for 750 ms, indicating the number of dots for verification. Finally, the stimulus picture was presented. Participants had 2000 ms to press one of two mouse buttons to judge whether the preceding number corresponded to the number of dots in the picture with the relevant perspective.
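The trial sequence above can be written down as an ordered list of events, from which onset times follow by accumulation. This is only a sketch of the timeline as described; the event labels are illustrative, and treating the 500-ms gap after the cue as a separate interval event is an assumption.

```python
# Level-1 VPT trial: (event label, duration in ms), in presentation order.
LEVEL1_TRIAL = [
    ("fixation cross", 750),
    ("blank screen", 500),
    ("perspective cue", 750),
    ("interval", 500),
    ("number to verify", 750),
    ("stimulus picture", 2000),  # duration here is the response window
]


def onset_times(events):
    """Return each event's onset (ms from trial start) and the total duration."""
    onsets = {}
    elapsed = 0
    for label, duration in events:
        onsets[label] = elapsed
        elapsed += duration
    return onsets, elapsed


onsets, total = onset_times(LEVEL1_TRIAL)
print(onsets["stimulus picture"], total)
```

With these durations the stimulus picture appears 3250 ms after trial onset, and a trial lasts at most 5250 ms.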
For both self-perspective and other-perspective conditions, there were 48 matching (“yes” response) trials, that is, 24 congruent (same perspectives) and 24 incongruent trials (different perspectives), and 48 mismatching (“no” response) trials. In addition, there were eight practice trials and four fillers without any dots. Trials were divided into two blocks. The order of trials in each block was pseudo-randomised, and the block presentation sequence was counterbalanced across participants. The self-perspective version of the task tested altercentric bias, while the other-perspective version tested egocentric bias. Both level-1 and level-2 VPT tasks had a 3 (groups: EA-Bil, EU-Bil and B-Mono) × 2 (perspective: self and other) × 2 (congruency: congruent and incongruent) mixed design, with group as between-participant factor, and perspective and congruency as within-participant factors.
2.2.5. Level-2 VPT task
The procedure and materials of the level-2 VPT task were identical to the blocked design of Surtees et al. (2016) (see right panel of Figure 1). The stimuli showed a cartoon avatar standing next to a table. The avatar faced randomly to the left or right side of the table. The number “6” or “9” was placed on the table or stuck to the wall, meaning that the number was seen either lying flat or standing upright. Importantly, in the case of an upright number, the perspectives of the avatar and the participant were the same (congruent condition), while for a flat-lying number, the avatar’s perspective was different from that of the participant (incongruent condition). Because the cartoon avatar did not have a gender or any racial features, it did not need to be matched with the participant. Note that Surtees et al. (2016) found that the orientation of the number did not affect RTs or error rates in the absence of the avatar.
Judgements in the congruent condition would be easier than in the incongruent condition if participants spontaneously take both perspectives into account. Therefore, we predicted that participants would respond more slowly and make more errors in the incongruent than in the congruent condition. We split self and other perspectives into separate tests. In the self-perspective task, participants were instructed to judge from their own perspective. If they spontaneously adopted the avatar’s perspective, an altercentric interference would be observed. In the other perspective task, participants were instructed to judge from the avatar’s perspective. If they spontaneously took their own perspective, an egocentric interference would be observed. Altercentric and egocentric interferences were calculated as the difference in errors and RTs between the congruent and incongruent conditions (incongruent minus congruent).
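The interference scores defined above are simple difference scores. A minimal sketch, using invented response times purely for illustration: the same function yields altercentric interference when applied to the self-perspective test and egocentric interference when applied to the other-perspective test.

```python
from statistics import mean


def interference(congruent, incongruent):
    """Interference score: mean of incongruent trials minus mean of congruent trials."""
    return mean(incongruent) - mean(congruent)


# Invented RTs in ms for one participant; not data from the study.
self_task = {"congruent": [600, 620], "incongruent": [650, 670]}
other_task = {"congruent": [640, 660], "incongruent": [720, 740]}

altercentric = interference(self_task["congruent"], self_task["incongruent"])
egocentric = interference(other_task["congruent"], other_task["incongruent"])
print(altercentric, egocentric)
```

The same function applies to error rates if each list holds 0/1 error codes instead of RTs.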
Each trial started with a fixation cross for 750 ms, followed by a 500-ms blank screen and a cue, presented for 750 ms. For the self-perspective test, the cue was “YOU,” while for the other-perspective test, it was “HE.” The cue told the participant to decide from their own perspective or the avatar’s perspective. After a 500-ms break, the number “6” or “9” appeared for 750 ms. Finally, participants saw the stimulus and indicated using a keyboard whether the number matched the picture (“d” key) or not (“k” key). For both self-perspective and other-perspective tests, the number matched the stimulus in 50% of the trials.
Each test contained 10 practice trials, 48 mismatching trials and 48 matching (“yes” response) trials, namely, 24 congruent and 24 incongruent trials, distributed over two blocks. The order of the trials within a block was pseudo-randomised, and the block presentation sequence was counterbalanced across participants.
2.2.6. Flanker task (Inhibition)
We adopted the procedure by Zhou and Krott (2018) for the Flanker paradigm (Eriksen & Eriksen, 1974) (see left panel of Figure 2). Stimuli were rows of arrows. Each arrow had a visual angle of about 0.55 degrees, and the distance between the arrows was about 0.06 degrees. Each trial started with a fixation cross for 1500 ms, followed by a stimulus, which remained until a response was made or until a timeout of 2000 ms. Stimuli were followed by an inter-trial interval of 150 ms. Participants judged the direction of the central arrow by pressing a left or right button on a Cedrus RB-834 response pad. In congruent trials, the central arrow and the flanking arrows pointed in the same direction, whereas in incongruent trials, they pointed in opposite directions. To increase the task’s difficulty, 75% of the trials were congruent and 25% were incongruent. Twenty-four practice trials were followed by five blocks of 96 randomised trials each, thus 360 congruent trials and 120 incongruent trials.
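The trial counts follow directly from the block structure: five blocks of 96 trials at a 75/25 split give 360 congruent and 120 incongruent trials. The sketch below builds such a trial list. It assumes that the 75/25 ratio holds within every block, which the description does not state, and the function name and seed are illustrative.

```python
import random


def build_flanker_blocks(n_blocks=5, trials_per_block=96, p_congruent=0.75, seed=0):
    """Return a list of blocks, each a shuffled list of trial-type labels."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    n_congruent = round(trials_per_block * p_congruent)
    blocks = []
    for _ in range(n_blocks):
        # Assumption: every block contains the same 75/25 split.
        block = (["congruent"] * n_congruent
                 + ["incongruent"] * (trials_per_block - n_congruent))
        rng.shuffle(block)
        blocks.append(block)
    return blocks


blocks = build_flanker_blocks()
all_trials = [trial for block in blocks for trial in block]
print(all_trials.count("congruent"), all_trials.count("incongruent"))
```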

Figure 2. Example trial sequence of the Flanker task (left panel) and example trial sequence of the colour–shape task (right panel).
2.2.7. Colour–shape task (Switching)
This task was based on Stasenko et al. (Reference Stasenko, Matt and Gollan2017) (see right panel of Figure 2). Stimuli showed the outline of a black triangle or circle superimposed on either a red or green rectangle (measuring 640 × 480 pixels). Participants judged stimuli on a colour (red versus green) or shape (triangle versus circle) dimension, indicated by a preceding cue. The cue was either a horizontal rainbow-coloured bar, signalling a colour decision, or a horizontal line-up of four black outlines of geometric shapes (i.e., rhombus, triangle, circle and square), signalling a shape decision. Cue images measured 320 × 108 pixels. Responses were given by a right (red/triangle) or left button (green/circle) on a Cedrus RB-834 response pad. Each trial started with a fixation cross for a random duration of 500–1500 ms, followed by a colour or shape cue for 250 ms. After a blank screen of 100 ms, a central fixation cross appeared for a jittered duration of 1000–2000 ms. Next, the stimulus remained on screen until a button was pressed or 2000 ms had elapsed. Finally, a blank screen of 100 ms followed. Apart from 24 practice trials and 16 fillers, each participant responded to 96 stay and 96 switch trials, randomly presented over four blocks. Participants were able to take breaks between blocks.
2.3. Procedure
Tasks were divided into two 1-hour sessions, with 2–5 days between sessions. In session 1, participants first filled in the questionnaires (LSBQ, COS and VIA). They then completed the Flanker task, followed by two self-perspective VPT tasks: a level-1 VPT task and a level-2 VPT task. In session 2, participants started with the two other-perspective VPT tasks, again a level-1 VPT task and a level-2 VPT task, and finished with the colour–shape task. Presenting self-perspective VPT tasks always in the first session and other-perspective VPT tasks in the second session allowed for the detection of a VPT effect independent of task and attention switching effects (Samson et al., Reference Samson, Apperly, Braithwaite, Andrews and Scott2010; Surtees et al., Reference Surtees, Samson and Apperly2016). In each session, the order of task levels (level-1 vs level-2) was counterbalanced across participants.
2.4. Data pre-processing
2.4.1. Bilingualism index
The bilingualism index was determined with the LSBQ bilingualism score calculator (Anderson et al., Reference Anderson, Chung-Fat-Yim, Bellana, Luk and Bialystok2018). It categorises participants into three categories: monolingual (<−3.13), bilingual (>1.23) and ambiguous (<1.23 and >−3.13). Some of the measures feeding into the score had missing data, especially for “religion,” “praying,” “partner,” and “work” (not more than 15% per measure). We filled in the missing data by using the “mice” package in R (Buuren & Groothuis-Oudshoorn, Reference Buuren and Groothuis-Oudshoorn2011), apart from the removal of one EA-Bil participant whose missing data exceeded 60%.
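The three-way categorisation can be illustrated with a simple threshold function. This is a minimal Python sketch and not the LSBQ calculator itself: the function name `classify_lsbq` is hypothetical, and placing scores that fall exactly on a cut-off in the ambiguous category is an assumption made here, because the boundaries are reported as strict inequalities.

```python
def classify_lsbq(score: float) -> str:
    """Map an LSBQ composite score to one of the three reported categories."""
    if score < -3.13:
        return "monolingual"
    if score > 1.23:
        return "bilingual"
    # Assumption: scores between (or exactly on) the cut-offs count as ambiguous.
    return "ambiguous"


if __name__ == "__main__":
    for value in (-5.0, 0.0, 4.2):  # illustrative scores, not study data
        print(value, classify_lsbq(value))
```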
2.4.2. Collectivism and individualism indexes
Culture orientation subscales were created by averaging the answers of the four items from each dimension of the COS. Because the two individualism subscales HI and VI (r = 0.21, p < 0.01) and the two collectivism subscales HC and VC (r = 0.33, p < 0.001) were correlated, we averaged HC and VC scores to calculate a collectivism index and averaged HI and VI scores to calculate an individualism index. A higher index represented higher collectivist or individualist orientation.
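The two-stage averaging can be sketched as follows. This is an illustrative Python sketch only: the function name `cos_indices`, the dictionary layout and the example ratings are hypothetical and are not taken from the study data.

```python
from statistics import mean


def cos_indices(responses):
    """Compute individualism and collectivism indices from COS item ratings.

    `responses` maps each subscale label ("HI", "VI", "HC", "VC") to the
    list of its four item ratings for one participant.
    """
    # Stage 1: average the four items within each subscale.
    subscale = {label: mean(items) for label, items in responses.items()}
    # Stage 2: average the two subscales belonging to each orientation.
    return {
        "individualism": mean([subscale["HI"], subscale["VI"]]),
        "collectivism": mean([subscale["HC"], subscale["VC"]]),
    }


if __name__ == "__main__":
    example = {  # made-up ratings for illustration
        "HI": [5, 5, 5, 5],
        "VI": [3, 3, 3, 3],
        "HC": [7, 7, 7, 7],
        "VC": [9, 9, 9, 9],
    }
    print(cos_indices(example))
```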
2.4.3. Acculturation index
The acculturation index was generated from the VIA questionnaire. Information about heritage culture and additional demographic details are available in Table S1 in the Supplementary Material. Indices of heritage and host culture (i.e., British) were determined by averaging scores of the corresponding 10 items. Twenty-four B-Mono participants stated that they only had a heritage but no host culture. We coded their host culture score as “0.” All participants identified highly with at least one culture. Thus, the acculturation index was calculated as the absolute value of the difference between heritage and host culture indexes. It, therefore, represented the extent to which participants identified with both cultures, with a lower score representing balanced identification with two cultures, and a higher score reflecting a tendency to identify closely with only one culture.
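As a worked illustration of this calculation, the following Python sketch computes the index from two sets of item ratings. The function name `acculturation_index` and the example ratings are hypothetical; representing a missing host culture by `None` is an assumption of the sketch that mirrors the coding of the host score as 0.

```python
from statistics import mean


def acculturation_index(heritage_items, host_items=None):
    """Absolute difference between mean heritage and mean host culture scores.

    Lower values indicate balanced identification with both cultures;
    higher values indicate identification with mainly one culture.
    """
    heritage = mean(heritage_items)
    # Participants reporting no host culture receive a host score of 0.
    host = 0.0 if host_items is None else mean(host_items)
    return abs(heritage - host)


if __name__ == "__main__":
    balanced = acculturation_index([8] * 10, [8] * 10)  # made-up ratings
    single = acculturation_index([9] * 10)              # no host culture
    print(balanced, single)
```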
2.4.4. VPT tasks
We determined response time outliers of the VPT tasks by visual inspection (Howitt & Cramer, Reference Howitt and Cramer2008; Wilcox, Reference Wilcox2012). We further excluded responses with RT > 3 SD (Miller, Reference Miller1991; Ratcliff, Reference Ratcliff1993). Note that the findings are consistent for a 2.5 SD cut-off.
For level-1 VPT self-perspective, we removed RTs < 300 ms (3.5%) and one EU-Bil and one NB-Mono whose accuracy rates were more than 3 SDs below the mean of all participants (1.05%). We excluded incorrect response trials (6.45%) and RTs > 3 SDs above the mean of a particular participant (0.91%). For level-2 VPT self-perspective, we removed RTs < 300 ms (0.18%) and two EA-Bils, three EU-Bils and one NB-Mono whose accuracy rates were more than 3 SDs below the mean of all participants (3.78%). We excluded incorrect response trials (6.76%) and RTs > 3 SDs above the mean (1.32%). For level-1 VPT other-perspective, we removed RTs < 300 ms (3.07%) and one EU-Bil and two NB-Monos whose accuracy rates were more than 3 SDs below the mean of all participants (1.43%). We excluded incorrect response trials (5.47%) and RTs > 3 SDs above the mean (0.78%). For level-2 VPT other-perspective, we removed RTs < 300 ms (0.10%) and one EA-Bil, one EU-Bil and three NB-Monos whose accuracy rates were more than 3 SDs below the mean of all participants (2.71%). We excluded incorrect response trials (7.10%) and RTs > 3 SDs above the mean (1.14%).
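The sequence of exclusions can be sketched in code. The following Python sketch is illustrative only and is not the pre-processing script used in the study: the function name `clean_trials` and the trial record layout are hypothetical, and computing accuracy after the removal of fast responses is an assumption, since the order of these two steps is not fully specified.

```python
from statistics import mean, stdev


def clean_trials(trials, min_rt=300.0, sd_cutoff=3.0):
    """Apply the four exclusion steps to a list of trial records.

    Each trial is a dict with keys 'subject', 'rt' (in ms) and 'correct' (bool).
    """
    # Step 1: drop anticipatory responses (RT below min_rt).
    kept = [t for t in trials if t["rt"] >= min_rt]

    # Step 2: drop participants whose accuracy is more than sd_cutoff SDs
    # below the mean accuracy of all participants.
    by_subject = {}
    for t in kept:
        by_subject.setdefault(t["subject"], []).append(t)
    accuracy = {
        s: mean([1.0 if t["correct"] else 0.0 for t in ts])
        for s, ts in by_subject.items()
    }
    if len(accuracy) > 1:  # an SD needs at least two participants
        rates = list(accuracy.values())
        floor = mean(rates) - sd_cutoff * stdev(rates)
        by_subject = {s: ts for s, ts in by_subject.items() if accuracy[s] >= floor}

    cleaned = []
    for ts in by_subject.values():
        # Step 3: keep correct responses only.
        correct = [t for t in ts if t["correct"]]
        # Step 4: drop slow responses relative to this participant's own mean.
        if len(correct) > 1:
            rts = [t["rt"] for t in correct]
            ceiling = mean(rts) + sd_cutoff * stdev(rts)
            correct = [t for t in correct if t["rt"] <= ceiling]
        cleaned.extend(correct)
    return cleaned
```

Changing `sd_cutoff` to 2.5 would correspond to the alternative cut-off mentioned above.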
2.4.5. Flanker task (Inhibition)
For the Flanker task, we removed RTs < 300 ms (1.12%) and three EU-Bils and one NB-Mono whose accuracy rates were more than 3 SDs below the mean of all participants (2.12%). We also removed incorrect response trials (1.63%) and RTs > 3 SDs above the mean (1.35%).
2.4.6. Colour–shape task (Switching)
For the colour–shape task, we removed RTs < 300 ms (6.07%) and three EA-Bils, two EU-Bils and two NB-Monos whose accuracy rates were more than 3 SDs below the mean of all participants (2.94%). We excluded incorrect response trials (5.41%) and RTs > 3 SDs above the mean (1.61%).
3. Results
Analyses were conducted in four steps. In Step 1, we confirmed task effects for all tasks. In Step 2, we created composite scores for egocentric interference, altercentric interference, inhibition interference and switching interference as a preparation for the analyses in Steps 3 and 4. In Step 3, we conducted group comparisons for level-1 and level-2 egocentric and altercentric VPT interference. In Step 4, we conducted correlation analyses and hierarchical regressions to investigate individual differences in factors determining VPT.
3.1. Step 1: Confirmation of task effects
To check for replication of common task effects, we fitted linear mixed effect models to RTs and generalised linear mixed effect models to error rates using the lme4 package in R version 4.2.2 (R Core Team, 2021). RT data were transformed using the inverse function to obtain more normally distributed data (Lo & Andrews, Reference Lo and Andrews2015). All fixed effects were factorised and deviation-coded, and trial numbers were scaled and added as a fixed factor to account for speed-up effects. Whenever the model did not converge, we simplified the model by firstly removing random slopes, then random intercepts, then fixed interaction effects and finally fixed main effects until the model converged (Barr et al., Reference Barr, Levy, Scheepers and Tily2013). The drop1 function from the lmerTest package (Kuznetsova et al., Reference Kuznetsova, Brockhoff and Christensen2017) was used to identify the best-fitting model (see Supplementary Material). Main effects and interactions were determined by ANOVA comparison between the models with and without the effect (Winter, Reference Winter2019). Simple effects in interactions were checked by using the emmeans function in the emmeans package (Winter, Reference Winter2019).
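The three preparatory transformations mentioned above (inverse-transforming RTs, deviation-coding factors and scaling trial numbers) can be illustrated with a short Python sketch. It is not the R code used for the analyses. The helper names are hypothetical, and two details are assumptions: the inverse is taken as 1000/RT, and deviation coding uses the values -0.5 and +0.5.

```python
from statistics import mean, stdev


def inverse_rt(rt_ms):
    """Inverse-transform a response time (assumed form: 1000 / RT in ms)."""
    return 1000.0 / rt_ms


def deviation_code(level, levels):
    """Code a two-level factor as -0.5 (first level) or +0.5 (second level)."""
    first, second = levels
    if level == first:
        return -0.5
    if level == second:
        return 0.5
    raise ValueError(f"unknown level: {level!r}")


def scale(values):
    """Centre values on their mean and divide by their sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


if __name__ == "__main__":
    print(inverse_rt(500.0))
    print(deviation_code("incongruent", ("congruent", "incongruent")))
    print(scale([1, 2, 3, 4, 5]))  # scaled trial numbers
```

With this form of the transform, larger values correspond to faster responses, which reverses the sign of effects relative to raw RTs.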
3.1.1. VPT tasks
The fixed effect structure of the model for VPT tasks contained task level (level-1, level-2), perspective (self, other) and congruency (congruent, incongruent) as well as their interactions. The maximum models for RTs and accuracy rates included intercepts from both random effects, slopes for within-subject factors (1 + tasklevel × perspective × congruency | subject) and slopes for within-stimulus factors (1 + perspective | trial).
The results of the VPT tasks are shown in Table 1 and Figure 3. They generally confirm previous findings. Participants were significantly faster in the level-1 task than in the level-2 task. Participants also responded significantly faster and more accurately in the self-oriented condition than in the other-oriented condition, as well as in the congruent condition than in the incongruent condition.
Table 1. Results for the fixed mixed effect model for response times and accuracy in VPT tasks


Figure 3. Comparisons of response times (left) and error rates (right) for congruent and incongruent conditions in self-perspective and other-perspective level-1 and level-2 VPT tasks. Note. The difference between incongruent and congruent trials indicates the egocentric and altercentric interference in VPT task. Note: *p < .05. **p < .01. ***p < .001. NS = nonsignificant.
Importantly, we found a significant perspective × congruency interaction rather than a perspective × congruency × task interaction, for both RTs, χ2 = 10.38, p = .001, and accuracy, χ2 = 142.25, p < .001. Replicating previous findings, participants responded faster, z = 5.99, p < .001, and more accurately, z = 6.76, p < .001, in the congruent condition than in the incongruent condition in the other-perspective condition, reflecting egocentric interference. For self-perspective, no significant difference between the congruent and incongruent conditions was found, for either RTs, z = 1.049, p = .290, or accuracy, z = 1.35, p = .177. Thus, we were unable to replicate the congruency effect in level-1 self-VPT (Samson et al., Reference Samson, Apperly, Braithwaite, Andrews and Scott2010; Surtees et al., Reference Surtees, Samson and Apperly2016).
3.1.2. Flanker and colour–shape tasks
The models for the Flanker and colour–shape tasks contained the fixed effect congruency (congruent versus incongruent) or switching (stay versus switch) respectively. Participants and trials were added as random effects. The interaction of congruency by participant was added as a random slope.
Results of both tasks (see Table 2 and Figure 4) confirmed previous findings (Stasenko et al., Reference Stasenko, Matt and Gollan2017; Zhou & Krott, Reference Zhou and Krott2018). In the Flanker task, participants responded significantly faster and more accurately in the congruent than in the incongruent condition. In the colour–shape task, participants responded significantly faster and more accurately in the stay condition than in the switch condition.
Table 2. Results for the fixed mixed effect model for response times and accuracy in Flanker task and colour–shape task


Figure 4. Comparisons of response times and error rates for congruent and incongruent conditions in the Flanker task (left) and for stay and switch conditions in the colour–shape task (right). Note. The difference between incongruent and congruent trials indicates the inhibition interference in the Flanker task. Note: *p < .05. **p < .01. ***p < .001. NS = nonsignificant.
3.2. Step 2: Dimensionality reduction of dependent variables
We created composite scores for dependent variables in our tasks to increase the reliability of our results and to simplify our analyses (Rushton et al., Reference Rushton, Brainerd and Pressley1983).
3.2.1. Egocentric index
In the case of the VPT tasks, we were interested in egocentric and altercentric interference. Each interference measure was calculated in three steps. First, we built a composite score for each condition (congruent and incongruent) by summing inverted standardised RT and standardised accuracy (Bruyer & Brysbaert, Reference Bruyer and Brysbaert2011). Second, we built a regression model by regressing the incongruent condition on the congruent condition. Finally, the residuals of the model were taken as the interference index (DeGutis et al., Reference DeGutis, Wilmer, Mercado and Cohan2013):
Interference index = composite(incongruent) − [b0 + b1 × composite(congruent)],
where b0 and b1 are the intercept and slope of the fitted regression.
Note that a higher interference index reflects a better performance, that is, less egocentric/altercentric bias. This is different from how interference is typically calculated, namely as the difference between the incongruent and congruent conditions (Bruyer & Brysbaert, Reference Bruyer and Brysbaert2011; DeGutis et al., Reference DeGutis, Wilmer, Mercado and Cohan2013).
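The three steps can be sketched in code as follows. This Python sketch is illustrative and is not the analysis script used in the study. The function names are hypothetical, and reading "inverted standardised RT" as a sign-flipped z-score (so that higher values mean faster responses) is an assumption.

```python
from statistics import mean, stdev


def zscores(values):
    """Standardise values across participants."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def composite(mean_rts, accuracies):
    """Composite per participant: inverted z(RT) plus z(accuracy).

    Assumption: 'inverted' means the sign of the RT z-score is flipped.
    """
    return [-z_rt + z_acc
            for z_rt, z_acc in zip(zscores(mean_rts), zscores(accuracies))]


def residual_index(congruent, incongruent):
    """Residuals from an ordinary least-squares regression of the
    incongruent composite on the congruent composite."""
    mx, my = mean(congruent), mean(incongruent)
    sxx = sum((x - mx) ** 2 for x in congruent)
    sxy = sum((x - mx) * (y - my) for x, y in zip(congruent, incongruent))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(congruent, incongruent)]
```

A participant with a positive residual performed better in the incongruent condition than predicted from their congruent performance, which is why a higher index reflects less interference.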
3.2.2. Inhibition index and switching index
The same approach as for the VPT tasks was applied to calculate an inhibition index in the Flanker task and a switching index in the colour–shape task. This meant that a higher inhibition index indicated a smaller inhibition cost in the Flanker task, and a higher switching index corresponded to a smaller switching cost in the colour–shape task.
3.3. Step 3: Results – group comparisons
We compared the three participant groups on all measures with one-way ANOVAs (see Table 3). Additional analyses comparing the raw RTs and error rates between the three groups can be found in the Supplementary Material.
Table 3. EA-Bil, EU-Bil and B-Mono group comparison

Note: Overall p values indicate significance level (*p < .05. **p < .01. ***p < .001).
The three participant groups differed significantly in terms of age, bilingualism, collectivism and inhibition index. EA-Bils were older than EU-Bils and B-Monos, t = 3.37, p < .01. In terms of bilingualism, both bilingual groups (EA-Bils and EU-Bils) scored higher than B-Monos, EA-Bils versus B-Monos: t = 27.32, p < .001; EU-Bils versus B-Monos: t = 22.67, p < .001. However, EA-Bils scored higher than EU-Bils (t = 4.58, p < .001), indicating a higher level of bilingualism for the East-Asian than the European group. In terms of collectivism, EU-Bils and B-Monos scored higher than EA-Bils (EA-Bils versus EU-Bils: t = 4.14, p < .001; EA-Bils versus B-Monos: t = 6.26, p < .001), indicating that the European groups were more collectivist than the East-Asian group. This went against our expectation that EA-Bils would be the most collectivist.
In terms of the Flanker task, EA-Bils had a higher inhibition index than B-Monos (t = 2.43, p < .05), while the EU-Bil group did not differ from the other two groups. This suggests that EA-Bils spent less effort on inhibition than B-Monos. In terms of acculturation, B-Monos scored higher than EA-Bils (t = 6.50, p < .001) and EU-Bils (t = 7.43, p < .001), while the two bilingual groups did not differ. As expected, the monolingual group was less integrated into two cultures than both bilingual groups. Finally, there was no difference among the three groups in terms of switching ability, socioeconomic status or individualism. Most importantly, the groups did not differ in terms of egocentric or altercentric bias in VPT (see Figure 5), in either level-1 or level-2 VPT.

Figure 5. Group comparison (B-Mono: British monolinguals, EA-Bil: East Asian bilinguals, EU-Bil: European bilinguals) of the egocentric index in level-1 and level-2 VPT tasks.
In sum, while the groups differed in expected ways (see bilingualism, acculturation), we also found that the East Asian group unexpectedly scored lowest on collectivism. Importantly, we did not find any group differences in egocentric or altercentric bias in VPT.
3.4. Step 4: Results – correlation and regression analyses
We next conducted individual difference analyses in the form of correlation and regression analyses. We first investigated whether continuous measures of bilingualism, acculturation, individualism, collectivism and executive functions, as well as the control variables age and SES, were related to perspective-taking skills in our sample. We then conducted hierarchical regression analyses to examine how bilingualism influences perspective-taking independently of executive functions and cultural factors, controlling for age and SES. Given the lack of altercentric biases in the tasks and no altercentric group differences, we focused here on egocentric interference.
3.4.1. Correlation analyses
Table 4 shows the correlation coefficients of the relationships between level-1 and level-2 egocentric biases with age, SES and the indices of inhibition, switching, collectivism, individualism, acculturation and bilingualism. We found significant relationships only for the switching index, which was positively correlated with level-1 and level-2 egocentric indices. In other words, participants who were better switchers showed a smaller egocentric cost when judging someone else’s perspective. The egocentric bias did not correlate with any of the other measures, including bilingualism. Results of the full correlation matrix, also for the three participant groups separately, can be found in Tables S3 and S4 in the Supplementary Materials.
Table 4. Correlation coefficients for relationships of bilingualism, collectivism, individualism, acculturation, inhibition and switching indexes with level-1 and level-2 Egocentric indexes

Note: *p < .05. **p < .01. ***p < .001.
3.4.2. Regression analyses
We next investigated whether bilingualism was associated with egocentric bias in VPT independently of a relationship with executive function and cultural differences. We conducted hierarchical multiple regression analyses for both level-1 and level-2 egocentric biases with age and SES as control variables (Step 1), executive function (inhibition and switching) (Step 2) and cross-cultural difference (individualism, collectivism and acculturation) as theoretical control variables (Step 3) and bilingualism as a predictor of interest (Step 4) (Agresti & Kateri, Reference Agresti and Kateri2021). All continuous variables were standardised. We conducted analyses in R version 4.2.2 (R Core Team, 2021) using the lm function from the stats package. The vif function from the car package (Fox & Weisberg, Reference Fox and Weisberg2018) confirmed that collinearity between predictors was very low (all VIFs ~1). Results are presented in Table 5.
Table 5. Results for hierarchical multiple regression for the level-1 and level-2 egocentric indexes

Note: Sample size for level-1 VPT is 160 and level-2 VPT is 159. *p < .05. **p < .01. ***p < .001.
For level-1 egocentric bias, age and SES accounted for 0.8% variance in Step 1. A further 4.7% was explained by EF predictors in Step 2, with switching being a significant predictor, β = 0.27, p < .01. Together, variables entered in Step 2 explained 5.5% of the overall variance, F (2, 155) = 3.82, p < .05. In Step 3, a further 1.5% was explained by culture measures. But none of the culture factors were individually significant, and model comparison revealed that the addition of culture factors did not improve model fit. In Step 4, a further 1.8% was explained by bilingualism. But the effect of bilingualism was not significant, and model comparison showed no improvement in model fit.
For level-2 egocentric bias, age and SES accounted for 2.6% variance in Step 1. In Step 2, a further 9.7% of level-2 egocentric bias was explained by EF predictors, with switching being a significant predictor, β = 0.35, p < .001. Together, variables entered in Step 2 explained 12.3% of the overall variance, F (2, 154) = 8.45, p < .001. In Step 3, a further 1.7% was explained by culture. But none of the culture factors were individually significant, and model comparison revealed that the addition of cultural factors did not improve model fit. In Step 4, a further 0.9% was explained by bilingualism. But the effect of bilingualism was not significant, and model comparison showed no improvement in model fit.
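The F statistics reported for Step 2 are consistent with a standard test of the change in explained variance between nested models. The following Python sketch illustrates that calculation from the rounded R² values given above. It is a worked check under stated assumptions, not the analysis code: the function name `f_change` is hypothetical, and the numbers of predictors (two in Step 1, four in Step 2) are inferred from the description of the steps.

```python
def f_change(r2_reduced, r2_full, n, k_full, k_added):
    """F test for the increase in R-squared when predictors are added.

    r2_reduced: R-squared of the smaller model
    r2_full:    R-squared of the larger model
    n:          sample size
    k_full:     number of predictors in the larger model
    k_added:    number of predictors added in this step
    """
    df1 = k_added
    df2 = n - k_full - 1
    f = ((r2_full - r2_reduced) / df1) / ((1.0 - r2_full) / df2)
    return f, df1, df2


if __name__ == "__main__":
    # Level-1: R-squared rises from .008 to .055 with n = 160.
    print(f_change(0.008, 0.055, n=160, k_full=4, k_added=2))
    # Level-2: R-squared rises from .026 to .123 with n = 159.
    print(f_change(0.026, 0.123, n=159, k_full=4, k_added=2))
```

The sketch reproduces the reported degrees of freedom, (2, 155) and (2, 154). The resulting F values are close to, but not identical with, the reported 3.82 and 8.45, presumably because the R² inputs used here are rounded.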
To summarise, participants with smaller switching costs showed less egocentric interference in the level-1 and level-2 tasks. Neither bilingualism nor culture predicted egocentric interference suppression.
Although bilingualism and acculturation were not directly correlated with egocentric bias, we wondered whether they might indirectly affect egocentric bias via switching. However, bilingualism was not correlated with either switching ability or VPT (apart from a significant correlation of bilingualism and switching ability within the EA-Bil group; see Tables S3 and S4 in the Supplementary Materials). Similarly, acculturation was not correlated with either switching skills or VPT. Therefore, the hypotheses that bilingualism or acculturation might be indirectly related to VPT via switching were rejected.
4. Discussion
The main aim of the present study was to investigate the benefit of bilingualism on adults’ VPT ability, extending a previous investigation in children (Greenberg et al., Reference Greenberg, Bellana and Bialystok2013). Furthermore, we tested for the first time the effect of bilingualism on both level-1 and level-2 VPT and in terms of both egocentric and altercentric tendencies. The second aim of the study was to distinguish the effects of bilingualism on VPT from any effects of culture and EF. To address these aims, we compared VPT performance of British monolingual speakers with that of two bilingual groups, namely, a group of European (Western) bilinguals and a group of Chinese-English (i.e., East Asian) bilinguals. In addition, we conducted an individual differences analysis to test whether participants’ degree of bilingualism predicted VPT performance independently of cultural orientation, acculturation, inhibition and switching ability.
Comparing the three participant groups on their VPT performance did not reveal any effects of bilingualism or culture. The groups revealed similar levels of egocentrism in both tasks, and none of the groups displayed altercentric interference on either perspective-taking task. These findings were supported by the individual differences analyses, which confirmed that bilingualism, collectivism, individualism and acculturation did not predict egocentric interference in VPT. However, switching ability predicted both level-1 and level-2 VPT egocentric interference, with better switching ability being related to better suppression of egocentric interference. Thus, our results showed that EF, but not bilingualism or cultural differences, predicts VPT performance. These findings are important for approaches to cultural and bilingual differences in VPT in the following ways.
4.1. Bilingual and monolingual VPT performance did not differ
Our findings suggest that bilingualism does not improve visual perspective-taking abilities in young adults. This is in line with some previous studies (Ryskin et al., Reference Ryskin, Brown-Schmidt, Canseco-Gonzalez, Yiu and Nguyen2014; Wang et al., Reference Wang, Tseng, Juan, Frisson and Apperly2019), but contrasts with findings in communicative perspective-taking found with the director task (Navarro et al., Reference Navarro, DeLuca and Rossi2022; Navarro & Conway, Reference Navarro and Conway2021; Wu & Keysar, Reference Wu and Keysar2007). We consider three possible explanations for this result.
First, a bilingual perspective-taking advantage in adults might not be consistent across populations and tasks. Rubio-Fernandez and Glucksberg (Reference Rubio-Fernandez and Glucksberg2012) found a bilingual advantage in terms of reduced egocentric bias in a false-belief task, while Bradford et al. (Reference Bradford, Jentzsch, Gomez, Chen, Zhang and Su2018) did not. Similarly, some studies (Navarro et al., Reference Navarro, DeLuca and Rossi2022; Navarro & Conway, Reference Navarro and Conway2021; Wu & Keysar, Reference Wu and Keysar2007), but not all (Ryskin et al., Reference Ryskin, Brown-Schmidt, Canseco-Gonzalez, Yiu and Nguyen2014; Wang et al., Reference Wang, Tseng, Juan, Frisson and Apperly2019), reported a bilingual benefit using a referential communication task. Interestingly, Wang et al. (Reference Wang, Tseng, Juan, Frisson and Apperly2019) used the same level-1 VPT task as we did, and did not find a bilingual advantage either.
Second, monolingual and bilingual participants may differ in other ways besides their language experience. Our East-Asian bilinguals were significantly older than both European bilinguals and monolinguals. While age can predict egocentric and altercentric performance in level-1 and level-2 VPT (Martin et al., Reference Martin, Perceval, Davies, Su, Huang and Meinzer2019), age was not correlated with egocentric interference in our sample (Tables S3 and S4 in the Supplementary Materials). Furthermore, SES is usually considered a potential confounding factor in bilingualism research (Rivera Mindt et al., Reference Rivera Mindt, Arentoft, Kubo Germano, D’Aquila, Scheiner, Pizzirusso, Sandoval and Gollan2008). However, groups did not differ in SES. We, therefore, do not have any evidence of other factors masking a bilingual VPT benefit.
Third, while the visual perspective-taking advantage in bilingual children seems quite robust (Feng et al., Reference Feng, Cho and Luk2024; Schroeder, Reference Schroeder2018), it may not necessarily extend to young adults. This parallels the situation for executive functions, where a bilingual advantage seems more robust for children than adults. It has been argued that young adults’ executive function skills are mature and young adults are at their peak performance, leaving less room to detect the influence of bilingualism (e.g., Poarch & Krott, Reference Poarch and Krott2019). Similarly, young adults might be at the peak of their visual perspective-taking abilities, meaning that bilingual advantages might not always be detected. Alternatively, it could be that bilingualism supports precocious construction of perspective-taking abilities, but not the ability to use these abilities in their mature form (Apperly et al., Reference Apperly, Samson and Humphreys2009).
Interestingly, we failed to observe a benefit for bilinguals in switching but observed a bilingual advantage in inhibition for East Asian bilinguals (not European bilinguals). These mixed results are consistent with previous findings (Paap et al., Reference Paap, Johnson and Sawi2015, Reference Paap, Myuz, Anders, Bockelman, Mikulinsky and Sawi2017; Zhou & Krott, Reference Zhou and Krott2016) and indicate that, even though there is a difference in basic cognitive functioning between bilinguals and monolinguals, this may not translate into consistent advantages in higher-level cognition, especially in young adults.
4.2. Cultural differences did not predict VPT performance
The current study shows for the first time that people from collectivist cultures experience as much egocentric bias as people from individualist cultures, in both level-1 and level-2 VPT tasks. This finding is in line with the results of Wang et al. (Reference Wang, Tseng, Juan, Frisson and Apperly2019) for level-1 perspective-taking and Bradford et al. (Reference Bradford, Jentzsch, Gomez, Chen, Zhang and Su2018) for false belief reasoning, but appears to be inconsistent with those of Wu and co-workers (Wu et al., Reference Wu, Barr, Gann and Keysar2013; Wu & Keysar, Reference Wu and Keysar2007). A possible explanation is that the latter used a referential communication task, while we used the VPT task. However, Wang and co-workers found that culture did not influence performance in either a referential communication task or the level-1 dot VPT task, meaning that task type cannot fully explain the differences. Furthermore, our finding confirmed the null result by Wang and co-workers and extended it to a level-2 number VPT task.
Importantly, though, and against our expectations, East-Asian bilinguals in our study were not more collectivist than European bilinguals or British monolinguals. Instead, they were less collectivist than the other two groups. East-Asian bilinguals also did not differ from the other two groups in terms of individualism. This result does not align with the assumption of stronger collectivist traits in East-Asian participants than in Western participants (Markus & Kitayama, Reference Markus and Kitayama1991; Triandis & Gelfand, Reference Triandis and Gelfand1998). We can think of three possible reasons for our finding. First, the East-Asian participants in our study left their country to live abroad. They might be more focussed on themselves than East-Asians who stay home. Note that Chinese students who choose to study overseas are from a higher social status, with a higher level of international perspective (Wang & Crawford, Reference Wang and Crawford2020). Second, the assumption of Eastern cultures being more collectivist and less individualist than Western cultures might be wrong. In line with this, Lomas et al. (Reference Lomas, Diego-Rosell, Shiba, Standridge, Lee, Case, Lai and VanderWeele2023) found in the 2020 Gallup World Poll, which involved more than 100,000 participants in 116 countries, that Easterners were more self-centred than Westerners. Third, the Culture Orientation Scale might not reflect the type of collectivism and individualism that is relevant for VPT. For instance, Bradford et al. (Reference Bradford, Jentzsch, Gomez, Chen, Zhang and Su2018), using the AICS (Shulruf et al., Reference Shulruf, Hattie and Dixon2007), found that a Chinese group scored higher in terms of collectivism than a Western group and the two groups scored similarly in terms of individualism. Thus, a different measure of cultural orientation than the one used here might significantly predict VPT.
Furthermore, our results are inconsistent with the culture orientation framework of Triandis (Triandis, Reference Triandis1996) and Markus and Kitayama (Markus & Kitayama, Reference Markus and Kitayama1991). They argue that collectivism and the interdependent self lead to less egocentric bias and more altercentric tendency. We found that East-Asian bilinguals scored lower in terms of collectivism than the two European groups, but they did not show higher egocentric interference than the other two groups.
Finally, the reason why culture and bilingualism did not predict VPT performance might be that VPT ability becomes more stable and more similar across participants in adulthood (Surtees & Apperly, Reference Surtees and Apperly2012). Interestingly, none of our factors reflecting differences in experience and socialisation predicted VPT. This is different from infancy, childhood and adolescence, where it is quite common to observe various factors influencing VPT, such as neural, cognitive, psychological and social factors (Lecce & Devine, Reference Lecce, Devine, Ferguson and Bradford2021).
4.3. Switching index predicted egocentric bias
According to Kessler et al. (Reference Kessler, Cao, O’Shea and Wang2014) and Wu et al. (Reference Wu, Barr, Gann and Keysar2013), when participants are asked to take another person’s perspective, they start with an egocentric perspective, but then suppress it and switch to the other’s perspective. We found that switching ability predicted egocentric interference suppression in both level-1 and level-2 VPT, while inhibition ability predicted neither. Since it is generally assumed that inhibition is a component of switching, it does not seem to be the inhibition aspect of switching that is predictive of egocentric interference. Instead, it might be cognitive flexibility, that is, the skill to switch smoothly from one perspective to the other, that is important in egocentric suppression in VPT.
4.4. No significant altercentric interference
Altercentric interference is interference from another’s visual perspective when reporting one’s own. Given previous results using the same paradigms as here (Surtees et al., 2016), we expected to find altercentric interference for level-1, but possibly not for level-2 VPT. However, we found no evidence at either level. Although altercentric effects have been replicated, some studies have found that an altercentric bias can be decreased or absent (Capozzi et al., 2014; Gardner et al., 2018; O’Grady et al., 2020), especially when “self” and “other” trials are presented in separate blocks (Del Sette et al., 2022; Rubio-Fernandez et al., 2022; though see Samson et al., 2010, Experiment 3, and Surtees et al., 2016, Experiment 1). We cannot rule out that altercentric interference might have been observed if our tasks had employed more human-like avatars, with this possibility applying most clearly to our level-2 condition, where the avatar was least realistic. However, Surtees et al. (2016) found that the same avatar stimuli used in the level-2 condition of the present study led to level-1 but not level-2 altercentric interference, suggesting that the nature of the avatar is an insufficient explanation for the absence of level-2 altercentric effects. Whatever the reason for our current findings, the absence of any altercentric effects makes it difficult to interpret the absence of any modulation of these effects by either bilingualism or biculturalism.
4.5. Limitations
Our study has various limitations. Firstly, our self-selected sample of university students pursuing a psychology degree is not necessarily representative of the general population of young adults (Hanel & Vione, 2016). For instance, psychology students may be familiar with common experimental procedures and may unintentionally change their behaviour to match their understanding of the purpose of the study (Khatamian Far, 2018).
Another limitation of this study is the predominance of female participants. Gender has been shown to be one of the factors influencing performance in VPT tasks (Eisenberg & Lennon, 1983; Samuel et al., 2023). For instance, Kessler et al. (2014) and Kessler and Wang (2012) found that, compared with males, adult females show a stronger effect of body posture manipulation when making left/right judgements from alternative perspectives in VPT tasks. At the same time, females tended to be more empathetic. Interestingly, bilingual females have been found to report a similar degree of perspective-taking as monolingual females on the Interpersonal Reactivity Index, while bilingual males reported a higher degree of perspective-taking than monolingual males (Tarighat & Krott, 2021). Even though this pattern was observed in a population living in Iran, that is, in a collectivist culture, and in a self-reported measure of non-visual perspective-taking, it raises the question of whether bilingualism might affect VPT more strongly in males.
5. Conclusion
The aim of the present study was to explore a bilingual advantage for adults in VPT and its independence from executive function and cultural differences. Monolingual and bilingual participants showed similar egocentric biases in both level-1 and level-2 VPT tasks. No group showed an altercentric bias. While the groups did not show the expected differences in cultural orientation, individual difference analyses confirmed the results and showed that only switching ability predicted egocentric interference in both tasks. These results support the idea of core similarities in VPT mechanisms across language experience and culture. They also suggest that the bilingual advantage in perspective-taking found in children does not consistently apply to young adults. Thus, the VPT abilities of young adults appear to be less affected by socialisation, including culture and bilingualism, than those of children.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S1366728925100825.
Data availability statement
The code, materials and data, as well as the analysis code, that support the findings of this study are openly available in an OSF repository at https://osf.io/8m4u9/.
Acknowledgements
We would like to thank Dr. Andrew Surtees, Prof. Klaus Kessler and the anonymous reviewers for their insightful comments on an earlier version of the manuscript. We would also like to thank the participants for taking part in our study.
Competing interests
The authors declare none.


