1. Introduction
Human cooperation hinges on a common interpretation of our shared reality. A collective belief in the utility of money, for example, motivates global commerce; religious dedication motivates more than half the world’s standards for food, drink and dress; and the general promotion of intolerance motivates world wars. Though all progress requires converging on a shared belief system, reaching a mutual understanding of the world is a peculiar challenge. Given our increasingly global society full of difference, common assessment will minimally require cognitive and emotional malleability.
Problematically, however, when beliefs are challenged, particularly those integral to our identity, we experience emotional discomfort stemming from the tension between our convictions and conflicting data, a phenomenon supported by fMRI evidence and research on persuasion and emotion (Ahluwalia, 2000; Jacks & Devine, 2000; Kaplan et al., 2016; Munro, 2010; Pomerantz et al., 1995; Zuwerink & Devine, 1996). To mitigate this discomfort, people employ various strategies to resist persuasion, such as avoiding challenging information, disputing its content or source, or engaging in biased processing that prioritizes personal goals. Importantly, the strength of identity-based beliefs correlates with greater resistance to change (Damasio, 2010; Fransen et al., 2015; Frijda, 2008; Lazarus, 1991; Panksepp & Biven, 2012; Petty et al., 2004; Ringold, 2002; Scherer, 2005; Zajonc, 1980).
In one prominent study examining the neural correlates of resistance to political ideology, an inherently emotional and identity-threatening subject, Kaplan et al. (2016) used behavioral and fMRI data to assess resistance to persuasion. The authors showed that defending personal political beliefs against criticism is a form of internally directed cognition implicating the default mode network (DMN) and other related regions. In their study, participants read political and nonpolitical statements with which they strongly agreed. After each statement, participants read five challenges in order to examine their willingness to update beliefs in the face of counterevidence. Comparing nonpolitical beliefs to political ones postchallenge showed an activation of the orbitofrontal cortex and the dorsolateral prefrontal cortex, which are central to cognitive flexibility and important for changing one’s beliefs. The authors also found that activity in the insular cortex and the amygdala, regions associated with emotions and feelings, correlated with resistance to belief change. They concluded that participants were more emotionally resistant to change when arguments against a strongly held political belief were being made; and, immediately following counterarguments, participants were faster to give a belief rating, suggesting that confronting belief conflict urges individuals to quickly eliminate the discomfort associated with it.
In this study, though not principally concerned with the neurological machinery underlying belief change, we focus on the extent to which bilingualism might impact Kaplan and colleagues’ original results using their methodology. In their study, heightened emotional reactions stemmed from confronting identity-threatening information, which in turn led to lesser belief change in political compared to nonpolitical contexts. In growing work on bilingualism and emotion, however, bilinguals using their non-native language are often found to have muted emotional reactions to inherently evocative stimuli (Caldwell-Harris & Ayçiçeği-Dinn, 2009; Harris et al., 2003; Thoma et al., 2023). Here, we examine whether the known reduced emotional sensitivity in bilingualism also applies to personally relevant, identity-threatening values, including what effect it might have on one’s cognitive flexibility.
2. Rationale
Language is a medium through which ideas are either believed or dismissed, and the language one uses to evaluate information impacts decision-making. For instance, in their foreign language, bilinguals rely on reasoning strategies that differ from those activated when making the same decisions in their native language, a phenomenon known as the foreign language effect (FLe). The FLe has also been extended to moral decision making, showing through dual systems theory approaches that the emotional reactivity associated with morally salient decisions is reduced in the second language (L2) compared to the first (L1) (Brouwer, 2021; Hayakawa et al., 2017; for a more thorough review see Čavar & Tytus, 2018; Cipolletti et al., 2016; Corey et al., 2017; Costa, Foucart, Hayakawa, et al., 2014a; Costa, Foucart, Arnon, et al., 2014b; Geipel et al., 2015; He et al., 2021; Keysar et al., 2012; Miller et al., 2021; Romero-Rivas et al., 2022; Wong & Ng, 2018).
Understood within such a model, there are two systems guiding decisions: System 1, our intuitive, emotional and fast processing; and System 2, our slower, supervisory and more deliberate reasoning (Alter et al., 2007; Greene & Haidt, 2002; Kahneman, 2011; see Frankish, 2010 for review). Accordingly, FLe outcomes are thought to have roots in either burdened processing or reduced emotion. With a surge in recent assessment, FLe data offer an increasingly fine-grained perspective on the role of the many factors pertinent to bilingualism, and its methodological considerations, that influence the role of the L2 in decision making.
Two important findings in this research stand out: (1) when drawing a dichotomy between utilitarian and deontological choice (see Footnote 1) on a moral judgment task, a very common assessment in FLe work, dissociating between the two strategies is crucial and (2) the more emotionally salient the judgment under consideration, the more the FLe emerges. To the first point, Hayakawa et al. (2017) argued that while utilitarian and deontological choice are conceptually distinct, the methodologies built to test them are not always designed to tease them apart. Through a process dissociation technique adopted from Conway and Gawronski (2013), Hayakawa and colleagues showed that the emotional processing of System 1 is indeed reduced in the L2 (see also Białek et al., 2020). To the second point, Costa, Foucart, Hayakawa, et al. (2014a) showed that in contexts in which emotions were less salient, the FLe did not emerge because it did not have to; it was only in the most emotionally salient contexts that the FLe affected decisions. Such a result is corroborated in work on outcome biases, probability matching, logical reasoning, and judgment of base-rate neglect (Mækelæ & Pfuhl, 2019; Vives et al., 2021). Importantly for this study, then, tasks that build emotional salience into the design are more likely to elicit emotional reactions and show emergence of the FLe. The question here is whether or not the FLe is robust enough to shield against the negative emotions stemming from being challenged ideologically.
While FLe results are generally reliable, they are also known to be affected by various factors related to bilingualism and the methods used to examine it, such as age of participant and type of moral dilemma (Mills & Nicoladis, 2020), L2 proficiency (Čavar & Tytus, 2018; Costa, Foucart, Hayakawa, et al., 2014a) and type of bilingual (Wong & Ng, 2018). Thus, the FLe is context-dependent (Brouwer, 2021). A great deal of FLe research, however, relies on the Trolley Problem and other similar tasks that probe hypothetical moral quandaries, betting strategies, superstition (Hadjichristidis et al., 2019), framing or general heuristic biases. As a result, the general tenets of the FLe are not frequently assessed in and applied to other domains (e.g., Costa, Foucart, Hayakawa, et al., 2014a; Díaz-Lago & Matute, 2019). Budding work does show, however, that the L2 is more likely to lead to risky decisions or bets (e.g., Hadjichristidis et al., 2017; Keysar et al., 2012; Xing, 2021) and a minimized perception of dishonesty (Alempaki et al., 2021) and crime (Woumans et al., 2020). Consequently, less is known about the emotional buffer provided by foreign language processing (e.g., He et al., 2021; Liu et al., 2021, 2022; Zheng et al., 2020) and its role in lessening visceral reactions from confronting counterevidence to strongly held social identity values.
In light of the above, it is of both theoretical and practical interest to assess the boundaries of the FLe, determining how the effect is applicable to other aspects of a bilingual’s life that are personally relevant (Miller et al., 2021). We capitalize on the current social backdrop in the US to assess the role of the FLe on resistance to persuasion in the face of political counterevidence among bilinguals, extending Kaplan et al.’s design.
3. Methods
3.1. Screening
Data were collected using Qualtrics and Prolific, and participants completed the surveys in either their L1 or L2. A questionnaire ensured participants were sufficiently liberal and political to participate. In answering the question “Do you consider yourself a political person?” answers ranged from 1 (not at all) to 5 (very much) and participants were included only if they answered with a 4 or higher. For the question “Which of the following describes your political self-identification?” participants gave responses ranging from 1 (strongly liberal) to 7 (strongly conservative) and only those who answered with a 1 or 2 were included. All participants reported living the majority of their life in the United States, speaking English as an L1, identifying as politically liberal, and having strongly held political and nonpolitical beliefs of interest.
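The two scale-based inclusion rules described above can be expressed as a single check. The following Python sketch is illustrative only: the function and argument names are hypothetical and do not come from the study's materials, and the remaining criteria (US residence, English as an L1) are not modeled.

```python
def passes_screening(political_engagement: int, ideology: int) -> bool:
    """Return True if a respondent meets both scale-based inclusion rules.

    political_engagement: 1 (not at all) to 5 (very much); included if >= 4.
    ideology: 1 (strongly liberal) to 7 (strongly conservative); included if <= 2.
    """
    if not 1 <= political_engagement <= 5:
        raise ValueError("political_engagement must be between 1 and 5")
    if not 1 <= ideology <= 7:
        raise ValueError("ideology must be between 1 and 7")
    return political_engagement >= 4 and ideology <= 2


# Hypothetical respondents: (political engagement, ideology)
respondents = [(5, 1), (4, 2), (3, 1), (5, 4)]
included = [r for r in respondents if passes_screening(*r)]
```

Under these rules, only the first two hypothetical respondents would be retained.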
It is important to note that the above screening was consistent with Kaplan et al.’s design for two reasons. First, central to research on persuasion and emotion is the claim that resistance to belief change stems from being challenged ideologically, especially when one’s beliefs are integral to their identity (Ahluwalia, 2000; Jacks & Devine, 2000; Pomerantz et al., 1995). When individuals encounter evidence challenging their views, they feel emotional discomfort arising from the tension between the value they place on their convictions and the doubt introduced by conflicting data (Munro, 2010; Zuwerink & Devine, 1996). It can be reasonably assumed, therefore, that an individual who self-reports political indifference will have weaker views than an individual who reports strong political concern. As our primary question is whether emotions arising from the challenging of cherished values can be mitigated by bilingualism, we avoid testing individuals who report weak political interest and, therefore, weak emotional connection to the stimuli.
Second, as a first step toward better understanding the relationship between belief, emotion, and persuasion among those most likely to be found on US college campuses, we examine only Liberals, who outnumber Conservatives more than 2 to 1 (Foundation for Individual Rights in Education, 2024, 2025; Pew Research Center, 2019). While Conservatives and Liberals have been shown to process emotions differently (Hibbing et al., 2014; Jost et al., 2003), sometimes relying on different intuitions for some judgments (Graham et al., 2009; Haidt, 2012), a recent meta-analysis shows that political bias is bipartisan (Ditto et al., 2019). As such, we have no a priori reason to assume that Liberals are more or less susceptible to belief change than Conservatives, as resistance to identity-threatening information does not respect party lines. Though our data speak only to a subset of the population, they apply more generally to any individual holding strong beliefs, political partisanship notwithstanding (Fransen et al., 2015).
3.2. Stimuli and procedure
The tasks involved two surveys, the first containing nine political statements (e.g., “Abortion should be legal in the United States”) and 14 nonpolitical statements (e.g., “Fluoride helps to prevent tooth decay”) which were rated on a 1 (strongly disagree)–7 (strongly agree) Likert scale. Sentences did not differ in number of words, letters or Flesch reading ease. Only those participants who rated at least eight political statements and eight nonpolitical statements with a “5” or higher participated in the second survey.
The second survey repeated all political and nonpolitical statements participants had previously rated with a “5” or higher except that each statement was followed by five challenges. At the end of each challenge set, participants provided a second rating for their belief on the same Likert scale from the first survey. The difference between the first and the second rating was taken as a measure of belief change following counterevidence.
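The eligibility rule and the belief change measure can be sketched in Python as follows. This is a minimal illustration, not the study's code: the data layout and function names are hypothetical, and the sign convention (first rating minus second, so that positive values indicate weakened belief) is an assumption, since the text specifies only that the difference between the two ratings was used.

```python
THRESHOLD = 5          # minimum first-survey rating for a "strongly held" belief
MIN_PER_CONDITION = 8  # minimum number of strongly held statements per condition


def strongly_held(first_ratings):
    """Keep only statements rated at or above THRESHOLD in the first survey.

    first_ratings maps a statement id to a (condition, rating) pair, where
    condition is "political" or "nonpolitical" and rating is on the 1-7 scale.
    """
    return {sid: (cond, r) for sid, (cond, r) in first_ratings.items() if r >= THRESHOLD}


def eligible_for_second_survey(first_ratings):
    """True if at least MIN_PER_CONDITION statements per condition are strongly held."""
    held = strongly_held(first_ratings)
    n_political = sum(1 for cond, _ in held.values() if cond == "political")
    n_nonpolitical = sum(1 for cond, _ in held.values() if cond == "nonpolitical")
    return n_political >= MIN_PER_CONDITION and n_nonpolitical >= MIN_PER_CONDITION


def belief_change(first_rating, second_rating):
    """Assumed convention: positive values mean the belief weakened after the challenges."""
    return first_rating - second_rating


# Hypothetical participant: 9 political statements rated 6,
# 8 nonpolitical statements rated 7 and 6 nonpolitical statements rated 3.
example = {f"p{i}": ("political", 6) for i in range(9)}
example.update({f"n{i}": ("nonpolitical", 7) for i in range(8)})
example.update({f"n{i}": ("nonpolitical", 3) for i in range(8, 14)})
```

In this hypothetical case the participant qualifies, and only the 17 strongly held statements would reappear in the second survey.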
The materials for Spanish were translated and back translated by three native speakers to ensure consistency in meaning and structure. There were no significant differences in number of characters between English (M = 69.09, SD = 26.84) and Spanish (M = 82.30, SD = 32.44) or in number of words between English (M = 11.17, SD = 4.69) and Spanish (M = 12.96, SD = 5.37) (ps > .05); however, there was a significant difference in number of syllables such that Spanish stimuli (M = 29.17, SD = 11.40) had more syllables than English stimuli (M = 19.57, SD = 7.50). This is not surprising, however, as Spanish contains more syllables per word than English, though both languages share a similar information rate (Coupé et al., 2019).
Importantly, L2 participants did not read English translations of the stimuli in conjunction with the Spanish counterparts, as this would expose each participant to two of the same stimulus and, therefore, complicate our ability to disentangle the effects of the L2 from those of the L1.
3.3. Participants
Multiple power analyses were conducted to determine an adequate sample size. Our primary analysis used a cumulative link mixed effects model (CLMM) to improve statistical power over a traditional analysis of variance (ANOVA) used by Kaplan and colleagues, as it accounts for variance from random effects and respects ordinal data (Bates et al., 2014; Christensen, 2019). Because available options for power analyses of these models (e.g., Green & MacLeod, 2016) require specific and somewhat arbitrary assumptions about the random effects variance structure of the data, which is difficult to estimate pre hoc, we first performed a power analysis based on a more straightforward ANOVA design from Kaplan et al. (2016). This was done using the Superpower package for the R scripting language (Lakens & Caldwell, 2021). Because this analysis is based on ANOVA instead of a mixed effects model, it is likely conservative, underestimating the power of the mixed effects analysis to detect a given effect. Results showed that a sample size of 50 per group (for a total N = 150) is sufficient to detect a Group × Condition interaction with effect size ηp² = 0.065 at 80% power and alpha = 0.05 (Cohen, 1988), a larger sample than Kaplan and colleagues’ original sample of 40.
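For readers who think in terms of Cohen's f, the partial eta squared reported above can be converted with the standard relation given by Cohen (1988). The resulting value is an illustrative conversion and is not reported by the study itself:

```latex
f = \sqrt{\frac{\eta_p^2}{1 - \eta_p^2}}
  = \sqrt{\frac{0.065}{1 - 0.065}}
  \approx 0.26
```

By Cohen's conventional benchmarks, this corresponds to a medium effect.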
To further corroborate the ANOVA-based analysis, a simulation-based analysis estimated the sample size needed to detect a three-way interaction (Condition × Language × Rating Type) in a CLMM with an ordinal outcome (Rating, 1–7), targeting ≥80% power at α = 0.05. The CLMM, fitted using the ordinal package in R, included fixed effects for condition (A, B; ±0.5), language (L1, L2; ±0.5), rating type (R1, R2; ±0.5) and their interaction, with random intercepts for participants (SD = 1.0) and items (SD = 0.5). Thresholds were −3, −2, −1, 0, 1, 2, with a logistic link. Using the simr package, we simulated a fully crossed design (50 participants, 16 items; 6,400 observations) for three interaction effect sizes: β = −0.5 (small), −1.0 (medium), −2.0 (large). Power was the proportion of 1,000 simulations per condition with p < 0.05 for the interaction, with tryCatch handling convergence failures. Results show that power was 81.6% for a small effect, 100% for medium and 100% for large, with 1,000/1,000 valid simulations per effect size. These results indicate the design is adequately powered for small effects and robust for medium/large effects. Based on further analytical approximation, increasing the sample to 60 participants could raise power to ~88% for small effects.
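To make the simulated outcome scale concrete, the Python sketch below converts the six thresholds into probabilities for the seven rating categories. It assumes the cumulative logit parameterization logit P(Y ≤ j) = θj − η, where η is the linear predictor (fixed plus random effects); the function names are hypothetical, and this is not the simulation code used in the study.

```python
import math

THRESHOLDS = [-3, -2, -1, 0, 1, 2]  # theta_1 ... theta_6 from the simulation setup


def logistic(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probabilities(eta, thresholds=THRESHOLDS):
    """Return P(Y = k) for k = 1..7, assuming logit P(Y <= j) = theta_j - eta."""
    cumulative = [logistic(t - eta) for t in thresholds] + [1.0]
    probabilities = []
    previous = 0.0
    for c in cumulative:
        probabilities.append(c - previous)  # difference of adjacent cumulative probabilities
        previous = c
    return probabilities


baseline = category_probabilities(0.0)  # linear predictor of zero
shifted = category_probabilities(1.0)   # a higher predictor favors higher ratings
```

Under this parameterization, raising η moves probability mass toward the upper end of the 1–7 scale, which is why negative coefficients correspond to lower ratings.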
We thus tested 203 participants from the following groups: English-speaking monolinguals (N = 70; Mage = 32.84, SD = 9.92, range = 19–65) and two groups of bilinguals who did the tasks in either their L1 (N = 67; Mage = 34.87, SD = 10.60, range = 20–63; MAoA = 13.27, SD = 6.71) or their L2 (N = 66; Mage = 28.66, SD = 8.12, range = 18–60; MAoA = 10.41, SD = 6.70). We found a significant difference in age between groups, F(2, 196) = 7.055, p = 0.001, such that the BilingualL1 Group was older than the BilingualL2 Group, t(196) = 3.691, p < 0.001, and the Monolingual Group was older than the BilingualL2 Group, t(196) = −2.495, p = 0.036. There was no difference in age of L2 acquisition between the bilinguals, and English was their first and self-reported dominant language.
Proficiency was first assessed in the BilingualL1 and BilingualL2 groups by self-report. We followed up with an adapted version of the DELE, “Diploma of Spanish as a Foreign Language,” a commonly used proficiency measure for Spanish (see Duffield & White, 1999, for validation). The DELE is issued by the Ministry of Education, Culture and Sport of Spain, and our version comprised 50 questions in three sections: a cloze test, a vocabulary test and a multiple-choice grammar test. One point is given for each correct answer and 0 for incorrect answers. The BilingualL1 Group had an average score of 34.26 (SD = 8.47, range = 18–50); the BilingualL2 Group had an average score of 38.13 (SD = 6.76, range = 16–50), both ranging from low beginner to near native. The difference in DELE proficiency scores between the two groups was not significant: t(132) = −1.60, p = 0.111.
Self-reported and directly tested proficiency correlated in the following ways: In the BilingualL1 Group, we found a positive and significant correlation between DELE scores and self-reported speaking proficiency, r(53) = .28, p < 0.03, a positive but nonsignificant correlation between DELE scores and self-reported understanding proficiency, r(53) = .30, p = 0.76, and a positive but nonsignificant correlation between DELE scores and self-reported reading and writing proficiency, r(53) = .33, p = 0.73. For the BilingualL2 Group, we found positive but nonsignificant correlations between DELE scores and self-reported speaking proficiency, r(48) = .14, p = 0.319, self-reported listening proficiency, r(48) = .20, p = 0.166 and self-reported reading/writing proficiency, r(48) = .06, p = 0.664 (see Footnote 2).
While assessing proficiency in an L2 is central to studies examining grammatical processing, it is unclear what effect traditional proficiency measures should have on emotional processing, especially considering that recent meta-analyses report that language experience, including proficiency, is not always a reliable predictor on FLe tasks (Del Maschio et al., 2022). This finding may stem from different types of proficiency tasks being employed across studies or an unclear understanding of the relationship between grammatical proficiency and emotional resonance in foreign language processing. Therefore, our bilinguals answered three exploratory questions on a 7-point Likert scale: “How closely do you identify with your second language?”, “How do you feel when using your second language?” and “How meaningful is your second language to you?” Though exploratory, these questions served as first steps toward understanding the role of L2 emotional connection on belief maintenance. There were no differences between the BilingualL1 and BilingualL2 groups’ self-reported L2 identity scores (see Table 1).
Table 1. Descriptive statistics for bilinguals’ proficiency and emotional connection to L2

3.4. Research Questions and Predictions
The following research questions guided the present study:
RQ1: How do monolinguals treat counterevidence for nonpolitical trials compared to political ones?
RQ2: How does the L2 affect belief change after counterevidence?
RQ3: How does the L2 affect reading and response times after counterevidence?
Prediction 1: Kaplan et al. (2016) showed that speakers using the native language were less open to changing their political views than their nonpolitical views following counterevidence. We have no reason to suspect that our native language users, whether monolingual or bilingual, will be different from those of Kaplan et al.’s original study.
Prediction 2: The existing body of FLe work shows that bilingualism, generally speaking, reduces the emotional resonance of inherently evocative stimuli. If the FLe plays a role here, we predict that the L2 will—minimally—lead to similar belief change across conditions, while the L1 will lead to lesser belief change for the political trials compared to the nonpolitical trials.
Prediction 3: Given previous research on reduced emotions in a second language context and bilingual language processing more generally, we predict that the foreign language will lead to slower reading and response times across conditions than the native language.
4. Analyses and results
4.1. Transparency and openness
All materials were borrowed from Kaplan et al. (2016), with analytical procedures performed on the basis of their original design. One important difference, however, is our use of a CLMM to account for random effects and ordinal data in the belief change analysis. Additionally, we performed exploratory analyses on the effect of L2 emotional connection given our additional population of bilinguals. Deidentified data and R scripts are available at https://osf.io/sc5nr/.
4.2. Belief change
According to some contemporary research on bilingualism, comparing monolinguals to bilinguals can be problematic for a number of conceptual and analytical reasons (De Houwer, 2023; Rothman et al., 2023). Given that our primary interest is whether bilingualism affects cognitive flexibility, and that our monolingual group was recruited in order to corroborate findings from Kaplan et al. (2016), we analyzed the monolinguals separately from the bilinguals to avoid reducing the signal-to-noise ratio in the modeling procedure. Importantly, analyzing the monolinguals separately can help confirm the validity of using CLMM within the bilingual groups, acting as a baseline for methodological reliability (see Footnote 3).
4.2.1. Monolinguals
A series of CLMMs were compared to determine the optimal random effects structure (Barr et al., 2013) using the ordinal package (Christensen, 2024) in R. Three candidate models were evaluated: a simpler random intercepts structure (Model 1), a model with full random slopes and intercepts (Model 2), and a further simplified slopes model (Model 3). Likelihood ratio tests were used for model comparisons.
Model 2, which incorporated full random slopes for Condition and Rating Type by Participant and a random intercept for Statement, yielded the lowest AIC (3249.47). Specifically, the likelihood ratio test comparing Model 2 with Model 1 indicated a significant improvement in model fit: χ² = 24.56, df = 2, p < 0.001. In addition, Model 2 performed significantly better than Model 3: χ² = 12.34, df = 1, p < 0.001. Based on these comparisons, Model 2 was selected as the best fitting model. The model revealed the following effects and interactions:
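As a cross-check on the two likelihood ratio tests above, the p-values can be recovered from the reported χ² statistics using closed-form chi-square survival functions for 1 and 2 degrees of freedom. The Python sketch below uses only the standard library; the function names are hypothetical and the snippet is not part of the study's R scripts.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


# Reported likelihood ratio statistics
p_model2_vs_model1 = chi2_sf_df2(24.56)  # df = 2
p_model2_vs_model3 = chi2_sf_df1(12.34)  # df = 1
```

Both values fall below 0.001, in line with the reported thresholds.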
- Condition: β = 1.14, SE = 0.47, z = 2.45, p = 0.014; OR = 3.13, 95% CI [1.26, 7.78], indicating that the odds of a higher rating were 3.13 times greater in the Political condition than in the nonpolitical condition.
- Rating Type: β = −2.78, SE = 0.25, z = −11.32, p < 0.001; OR = 0.06, 95% CI [0.04, 0.10], reflecting that a shift from the first to the second rating decreased the odds of a higher rating by over 90%.
- Condition × Rating Type interaction: β = 1.67, SE = 0.25, z = 6.549, p < 0.001; OR = 5.30, 95% CI [3.22, 8.73]. The decline in ratings from the first to the second rating was less pronounced in the Political condition compared to the nonpolitical condition (see Figure 1).
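The odds ratios above follow from exponentiating the log-odds coefficients, and a Wald-type interval can be approximated from the coefficient and its standard error. The Python sketch below illustrates this conversion; the function name is hypothetical, and because it starts from the rounded β and SE values reported in the text, its output can differ from the published intervals in the second decimal place.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def odds_ratio_with_ci(beta, se, z=Z_95):
    """Return (odds ratio, lower bound, upper bound) for a log-odds coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Coefficients as reported for the monolingual model (rounded values)
reported = {
    "Condition": (1.14, 0.47),
    "Rating Type": (-2.78, 0.25),
    "Condition x Rating Type": (1.67, 0.25),
}
converted = {name: odds_ratio_with_ci(b, se) for name, (b, se) in reported.items()}

# Percentage reduction in the odds of a higher rating at the second rating
percent_reduction = (1.0 - converted["Rating Type"][0]) * 100.0
```

The Rating Type odds ratio of about 0.06 is what underlies the statement that the odds of a higher rating dropped by over 90%.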

Figure 1. Predicted belief change modulated by rating and political status.
The monolingual analysis validates the use of CLMM to analyze our ordinal data, and it corroborates the results of Kaplan et al. (2016): Monolinguals are less cognitively flexible in the Political condition compared to the nonpolitical condition.
4.2.2. Bilinguals
A CLMM with a logit link was used to examine the effects of condition (political versus nonpolitical), rating type (second versus first) and language (L1 versus L2), along with their interactions, on ordinal ratings of belief change. Identical to the monolingual group, the analysis here included maximal random slopes for condition and rating type by participant and random intercepts for statement.
Main effects
- Condition: β = 1.17, SE = 0.44, z = 2.65, p = 0.008; OR = 3.22, 95% CI [1.36, 7.65]. For L1 speakers, the odds of a higher rating were approximately 3.22 times greater in the political condition than in the nonpolitical condition.
- Rating type: β = −2.55, SE = 0.23, z = −11.21, p < 0.001; OR = 0.08, 95% CI [0.05, 0.12]. A shift from the first to the second rating decreased the odds of a higher rating by over 92%.
Interactions
- Condition × Rating Type: β = 1.58, SE = 0.26, z = 6.11, p < 0.001; OR = 4.84, 95% CI [2.92, 8.01]. The decline in ratings from the first to the second rating was less pronounced in the political condition compared to the nonpolitical condition for L1 speakers.
- Condition × Language: β = 1.82, SE = 0.65, z = 2.77, p = 0.005; OR = 6.16, 95% CI [1.70, 22.24]. For L2 speakers, the effect of higher baseline ratings in the political condition was more pronounced than for the L1 speakers.
- Rating Type × Language: β = −1.02, SE = 0.31, z = −3.32, p < 0.001; OR = 0.36, 95% CI [0.20, 0.66]. The decline from the first to the second rating was larger for L2 speakers.
- Condition × Rating Type × Language: β = −2.28, SE = 0.43, z = −5.34, p < 0.001; OR = 0.10, 95% CI [0.04, 0.24]. For L2 speakers, the decline in ratings from the first to the second rating in the political condition was greater relative to L1 speakers (see Figure 2).

Figure 2. Predicted belief change modulated by rating, language and political status.
These results reveal that while the baseline effects for political condition and the shift from first to second rating are similar to the monolingual analysis, they are significantly moderated by language group. For L1 speakers, political statements increase the odds of a higher rating substantially, and the drop in belief for second ratings is substantial but partially offset in the political condition. For L2 speakers, although the main effect of language alone is negligible, the interactions reveal that the critical effect (i.e., the modulation of the first-to-second rating change by Political content) is markedly different from that of L1 speakers, which we take as a meaningful foreign language effect.
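To see how the interaction terms combine, the following Python sketch adds up the reported coefficients to obtain the implied first-to-second rating shift (on the log-odds scale) in each Condition × Language cell. It rests on a loudly stated assumption: that the model uses treatment (dummy) coding with nonpolitical, first rating and L1 as reference levels, which the per-term descriptions above suggest but the text does not state outright. The dictionary keys and function name are hypothetical.

```python
# Reported bilingual-model coefficients that involve Rating Type (log-odds scale).
# ASSUMPTION: treatment coding with nonpolitical, first rating and L1 as references.
COEF = {
    "rating": -2.55,
    "condition_x_rating": 1.58,
    "rating_x_language": -1.02,
    "condition_x_rating_x_language": -2.28,
}


def rating_shift(political, l2):
    """Implied change in log-odds from the first to the second rating for one cell."""
    p = int(political)
    l = int(l2)
    shift = (
        COEF["rating"]
        + p * COEF["condition_x_rating"]
        + l * COEF["rating_x_language"]
        + p * l * COEF["condition_x_rating_x_language"]
    )
    return round(shift, 2)


shifts = {
    ("nonpolitical", "L1"): rating_shift(False, False),
    ("political", "L1"): rating_shift(True, False),
    ("nonpolitical", "L2"): rating_shift(False, True),
    ("political", "L2"): rating_shift(True, True),
}
```

Under that coding assumption, the implied shift for political items is much smaller than for nonpolitical items in the L1 but not in the L2, which is the pattern the three-way interaction describes. A different contrast coding would change these cell values.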
4.2.3. L2 proficiency
A separate CLMM was used to examine the influence of condition, rating type, and DELE score on belief ratings among the L2 group. To account for variability across participants and items, the model included random intercepts and random slopes for both condition and rating type at the participant and stimulus levels. The model revealed only one main effect:
- Rating Type: β = −3.23, SE = 1.58, z = −2.048, p = 0.04; OR = 0.04, 95% CI [0.002, 0.87]. The second rating was generally lower than the first.
These results indicate that DELE proficiency does not predict belief change in the L2 group; below, we examine whether emotional connection to the L2 is important to consider.
4.3. An exploration of self-rated L2 identity and meaningfulness
To explore the effects of L2 emotional connection, we first determined whether our three subjective measures (Identity, Feeling, and Meaningfulness) could be represented by a single underlying construct for use in a CLMM. To do so, we conducted a factor analysis on the three variables to assess their underlying structure.
The Kaiser-Meyer-Olkin measure verified the sampling adequacy, and Bartlett’s test of sphericity was significant (χ² = 1843.95, df = 3, p < 0.001), indicating that correlations between items were sufficiently large. The analysis revealed a single factor with an eigenvalue of 2.42, explaining 71.3% of the variance. The second and third eigenvalues were substantially smaller (0.35 and 0.23, respectively), supporting a one-factor solution. All three variables loaded strongly on this factor: Identity: 0.91, Feeling: 0.82, Meaningfulness: 0.80. The three measures also demonstrated high internal consistency (Cronbach’s α = 0.88) and strong interitem correlations (average r = 0.71). Based on these results, we created a composite variable by averaging the scores from the three measures. This approach was chosen over factor scores for several reasons:
1. The factor loadings were relatively balanced across the three measures
2. All measures used the same scale (1–7)
3. An average score maintains the original scale, facilitating interpretation
4. The high internal consistency (α = 0.88) supports simple averaging as an appropriate method
Our composite variable, representing participants’ overall emotional connection to the L2, was then used as a predictor in the CLMM fitted with the Laplace approximation.
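As an illustration of how the summary statistics above fit together, the following minimal sketch recomputes the share of variance from the three reported loadings, a standardized Cronbach's α from the reported average interitem correlation, and a composite score by simple averaging. It assumes that the reported 71.3% corresponds to the mean squared loading and that α can be approximated by the standardized formula; the function names and the example participant are hypothetical and are not taken from the analysis scripts.

```python
# Summary values reported in the text (rounded to two decimals).
loadings = {"Identity": 0.91, "Feeling": 0.82, "Meaningfulness": 0.80}
mean_interitem_r = 0.71


def variance_explained(factor_loadings):
    """Share of total variance captured by one factor: the mean squared loading."""
    squared = [value ** 2 for value in factor_loadings.values()]
    return sum(squared) / len(squared)


def standardized_alpha(k, mean_r):
    """Standardized Cronbach's alpha from k items and their mean correlation."""
    return (k * mean_r) / (1 + (k - 1) * mean_r)


def composite_score(identity, feeling, meaningfulness):
    """Unweighted mean of the three 1-7 ratings, so the result stays on 1-7."""
    return (identity + feeling + meaningfulness) / 3


share = variance_explained(loadings)
alpha = standardized_alpha(len(loadings), mean_interitem_r)
example = composite_score(6, 5, 7)  # hypothetical participant, illustration only

print(f"variance explained by the factor: {share:.3f}")
print(f"standardized alpha: {alpha:.3f}")
print(f"composite for ratings (6, 5, 7): {example:.2f}")
```

Under these assumptions, both recomputed values agree with the reported figures to within rounding, and the averaged composite remains interpretable on the original 1–7 scale.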
The dependent variable was modeled as a function of the fixed effects of condition (political versus nonpolitical), RatingType (First versus Second rating) and a composite measure of emotional connection (composite), along with their interactions. Participant-level variability was accounted for by including a random intercept for Participant. A comparison with alternative models indicated that including the RatingType × Composite interaction significantly improved fit relative to a model with only main effects: χ2(1) = 9.40, p = 0.002. The model showed the following main effects and interactions:
- Condition: (ConditionPolitical) β = 1.82, SE = 0.14, z = 12.71, p < 0.001; OR = 6.18, 95% CI [4.67, 8.19]. Political belief is more highly rated than nonpolitical belief.
- Rating Type: (RatingTypeSecondRating) β = −4.96, SE = 0.55, z = −9.09, p < 0.001; OR = 0.01, 95% CI [0.003, 0.03]. Second ratings are lower than first ratings.
- Interaction (RatingType × Composite): β = 0.29, SE = 0.10, z = 3.01, p = 0.003; OR = 1.34, 95% CI [1.11, 1.62]. Second ratings for political items increase as a function of increased L2 emotional connection (see Figure 3).

Figure 3. Predicted belief change modulated by composite score and rating type.
The main effect of Composite was not significant: β = −0.14, SE = 0.14, z = −1.03, p = 0.30; OR = 0.87, 95% CI [0.66, 1.14].
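For readers who wish to relate the log-odds estimates to the odds ratios reported for this model, the sketch below exponentiates each β and its Wald interval, and computes the p-value of the one-degree-of-freedom likelihood ratio test. It assumes a normal-approximation interval (β ± 1.96 × SE); the helper names are hypothetical, and because the inputs are the rounded values from the text, the output matches the reported figures only approximately.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def odds_ratio_ci(beta, se, z=Z_95):
    """Exponentiate a log-odds estimate and the bounds of its Wald interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def lrt_p_value_df1(chi_square):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2.0))


# (beta, SE) pairs as reported for the composite model.
effects = {
    "Condition (political)": (1.82, 0.14),
    "RatingType (second)": (-4.96, 0.55),
    "RatingType x Composite": (0.29, 0.10),
    "Composite": (-0.14, 0.14),
}

for name, (beta, se) in effects.items():
    odds, low, high = odds_ratio_ci(beta, se)
    print(f"{name}: OR = {odds:.3f}, 95% CI [{low:.3f}, {high:.3f}]")

# Model comparison reported in the text: chi-square(1) = 9.40.
print(f"LRT p-value: {lrt_p_value_df1(9.40):.4f}")
```

An odds ratio of about 1.34 for the interaction is what underlies the reading that each one-unit increase in the composite raises the odds of a higher second rating by roughly a third.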
In sum, political content is robustly associated with higher ratings, with the odds of a higher rating increasing more than six-fold compared to nonpolitical content; however, second belief ratings for political conditions rise as a function of increased emotional connection to the L2. Unlike proficiency, L2 emotional connection matters: more L2 emotion, less FLe.
4.4. Reading and response times
We finally ran three reading/decision time analyses on (1) the time spent reading the statement to be rated, (2) the time spent providing the second scalar rating for the statement and (3) the time spent reading each challenge statement. The surveys were built on Qualtrics such that each statement, challenge and rating appeared on a separate page, so each reading or decision time was the interval between the participant first arriving at a page and moving forward in the survey. We used these crude measures as reading and rating times.
We removed responses slower than 30 seconds, which most likely reflected task interruptions. This led to the removal of 4 trials out of an original 7,759 in the Monolingual data, 2 trials out of 7,499 in the BilingualL1 data and no trials from the 6,849 in our BilingualL2 data. “Too fast” responses were not removed, as doing so may be not only arbitrary but also less effective for survey data than it is for language processing-specific designs (Conrad et al., Reference Conrad, Couper, Tourangeau and Zhang2017; Greszki et al., Reference Greszki, Meyer and Schoen2015; Zhang & Conrad, Reference Zhang and Conrad2014).
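The trimming rule can be sketched as a one-sided filter. The example below is illustrative only: the sample durations are invented, the function names are hypothetical, and the only values taken from the text are the 30-second cutoff and the trial counts.

```python
CUTOFF_SECONDS = 30.0  # upper limit stated in the text


def trim_slow(durations, cutoff=CUTOFF_SECONDS):
    """Keep durations at or below the cutoff; no lower bound is applied."""
    return [d for d in durations if d <= cutoff]


def percent_removed(removed, total):
    """Removed trials as a percentage of the original trial count."""
    return 100.0 * removed / total


# Hypothetical page durations in seconds, for illustration only.
sample = [0.4, 3.2, 5.9, 31.5, 12.0, 47.8]
kept = trim_slow(sample)
print(f"kept {len(kept)} of {len(sample)} hypothetical trials")

# Trial counts reported in the text: (removed, original total).
reported = {
    "Monolingual": (4, 7759),
    "BilingualL1": (2, 7499),
    "BilingualL2": (0, 6849),
}
for group, (removed, total) in reported.items():
    print(f"{group}: {percent_removed(removed, total):.3f}% of trials removed")
```

Because very fast responses are retained, the rule is asymmetric by design, and in every group the excluded trials amount to well under a tenth of a percent of the data.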
4.5. Monolinguals
4.5.1. Time reading challenges
Linear mixed-effects models using lme4 and lmerTest packages (Kuznetsova et al., Reference Kuznetsova, Brockhoff and Christensen2017) were fitted to examine the effect of statement type (Political versus nonpolitical) on challenge reading times. The model was specified with statement type as a fixed effect, with random intercepts for both participants and statement items and a random slope for statement type by participant. Model selection via likelihood ratio tests confirmed that this random effects structure provided the most parsimonious fit to the data.
The final model was fitted using maximum likelihood estimation (AIC = 19033.33, BIC = 19077.95, log-likelihood = −9509.67). Fixed effects analysis revealed that the average challenge reading time for nonpolitical statements was 4.16 seconds (SE = 0.25, 95% CI [3.67, 4.64]). For political statements, the average challenge reading time was 4.34 seconds (SE = 0.26, 95% CI [3.83, 4.85]), representing a nonsignificant increase of 0.18 seconds: β = 0.18, SE = 0.26, t(31.60) = 0.86, p = 0.40.
Variance components analysis indicated individual differences among participants (variance = 2.56, SD = 1.60) and some variability among statement items (variance = 0.18, SD = 0.43), with residual variance of 4.38 (SD = 2.09). The random slope for statement type by participant also showed variability (variance = 0.47, SD = 0.69), indicating that the effect of statement type varied across participants.
These results suggest that while participants exhibited variability in their overall challenge reading speeds and in how they responded to different statement types, there was no reliable difference in challenge reading times between Political and nonpolitical statements in this monolingual sample.
4.5.2. Time reading statements
We fitted linear mixed-effects models to the data specified with statement type (political versus nonpolitical) as a fixed effect, with random intercepts for both participants and statement items to account for by-subject and by-item variability. Model selection via likelihood ratio tests confirmed that this random effects structure provided the most parsimonious fit to the data, as more complex models with random slopes did not significantly improve model fit.
The final model was fitted using maximum likelihood estimation (AIC = 3856.95, BIC = 3880.77, log-likelihood = −1923.47). Fixed effects analysis revealed that the average reading time for nonpolitical statements was 3.12 seconds (SE = 0.31, 95% CI [2.49, 3.76]). For political statements, the average reading time was 3.00 seconds (SE = 0.36, 95% CI [2.28, 3.72]), representing a nonsignificant decrease of 0.12 seconds compared to nonpolitical statements: β = −0.12, SE = 0.39, t(21.28) = −0.32, p = 0.75. Variance components analysis indicated substantial individual differences among participants (variance = 1.74, SD = 1.32) and some variability among statement items (variance = 0.71, SD = 0.84), with residual variance of 4.14 (SD = 2.03).
These results suggest that while participants exhibited considerable variability in their overall reading speeds, there was no reliable difference in reading times between political and nonpolitical statements in this monolingual sample.
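The fit statistics reported for this model are linked by the standard definitions AIC = 2k − 2·logLik and BIC = k·ln(n) − 2·logLik, where k is the number of estimated parameters and n the number of observations. The sketch below inverts both definitions using the reported values. The function names are hypothetical, and the implied n is sensitive to rounding in the inputs, so it should be read as an illustration of the formulas and not as a reported sample size.

```python
import math


def implied_parameters(aic, log_lik):
    """Solve AIC = 2k - 2*logLik for k, the number of estimated parameters."""
    return (aic + 2.0 * log_lik) / 2.0


def implied_observations(bic, log_lik, k):
    """Solve BIC = k*ln(n) - 2*logLik for n, the number of observations."""
    return math.exp((bic + 2.0 * log_lik) / k)


# Fit statistics reported for the monolingual statement-reading model.
aic, bic, log_lik = 3856.95, 3880.77, -1923.47

k = implied_parameters(aic, log_lik)
n = implied_observations(bic, log_lik, round(k))

print(f"implied number of parameters: {k:.2f}")
print(f"implied number of observations (approximate): {n:.0f}")
```

A value of k close to 5 is what would be expected for the structure described: two fixed effects (intercept and statement type), two random-intercept variances and one residual variance.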
4.5.3. Time rating beliefs
A linear mixed-effects model was fitted to examine the effect of statement type (political versus nonpolitical) on rating times for monolingual participants. The fixed effect of statement type was entered alongside random effects for participants and statement items. After comparing models with different random effects structures, the optimal model included random intercepts for both participants and statement items and a random slope for statement type by participant. Likelihood ratio tests confirmed that this structure provided a parsimonious fit over simpler configurations.
The final model’s estimated marginal means were 4.89 seconds for nonpolitical statements (95% CI [4.30, 5.47]) and 4.14 seconds for political statements (95% CI [3.36, 4.92]): β = −0.75, SE = 0.32, t(24.3) = −2.35, p = 0.03. Variance component analysis revealed appreciable differences among participants, with the participant intercept variance estimated at about 2.22 (SD = 1.49). Variability among statement items was lower, with a variance near 0.24 (SD = 0.49), although the random slope for statement type by participant also exhibited notable variability. This indicates that overall rating times differ substantially across individuals and that the effect of statement type also varies appreciably between participants.
These results indicate that monolingual participants differentiate between statement types in terms of their rating times, with political statements being rated slightly faster than nonpolitical ones, though the magnitude of this difference remains modest (see Figure 4).

Figure 4. Estimated marginal means for rating time across conditions for monolinguals.
Importantly, Kaplan et al. (Reference Kaplan, Gimbel and Harris2016) found this same effect. The authors argued that more quickly rating or reading content with which agreement was already established demonstrated that ideologically consistent information was more readily digested, in spite of evidence to revise the belief. Such is a common strategy among those resisting persuasion and is a symptom of confirmation bias (Kappes et al., Reference Kappes, Harvey, Lohrenz, Montague and Sharot2020).
4.6. Bilinguals
4.6.1. Time reading challenges
We fitted linear mixed-effects models specified with statement type (Political versus nonpolitical), language group (L1 versus L2) and their interaction as fixed effects, with random intercepts for both participants and statement items to account for by-subject and by-item variability. Model selection via likelihood ratio tests confirmed that this random effects structure provided the most parsimonious fit to the data, as more complex models with random slopes did not significantly improve model fit.
The final model was fitted using maximum likelihood estimation (AIC = 38422.57, BIC = 38485.99, log-likelihood = −19202.29). Fixed effects analysis revealed that L1 speakers spent an average of 3.78 seconds (SE = 0.24, 95% CI [3.30, 4.25]) reading challenges to nonpolitical statements and 4.10 seconds (SE = 0.28, 95% CI [3.56, 4.63]) reading challenges to political statements. L2 speakers spent an average of 5.67 seconds (SE = 0.24, 95% CI [5.20, 6.13]) reading challenges to nonpolitical statements and 5.79 seconds (SE = 0.26, 95% CI [5.27, 6.30]) reading challenges to political statements.
The model revealed a significant main effect of language group (β = 1.89, SE = 0.26, t = 7.36, p < 0.001), with L2 speakers taking longer to read challenges than L1 speakers. The main effect of statement type was not significant (β = 0.32, SE = 0.20, t = 1.59, p = 0.116), nor was the interaction between language group and statement type (β = −0.20, SE = 0.26, t = −0.76, p = 0.45).
Pairwise comparisons were conducted to further examine the effects within and between groups. Within language groups, the difference between nonpolitical and political statements was not statistically significant for either L1 speakers (difference = −0.32 seconds, SE = 0.20, z = −1.59, p = 0.112) or L2 speakers (difference = −0.12 seconds, SE = 0.17, z = −0.69, p = 0.49). Between language groups, L2 speakers took significantly longer than L1 speakers to read both nonpolitical challenges (difference = −1.89 seconds, SE = 0.26, z = −7.36, p < 0.001) and political challenges (difference = −1.69 seconds, SE = 0.29, z = −5.90, p < 0.001). Variance components analysis indicated substantial individual differences among participants (variance = 3.7, SD = 1.92) and some variability among statement items (variance = 0.15, SD = 0.39), with residual variance of 5.03 (SD = 2.24). These results suggest that while L2 speakers consistently took longer to process challenges than L1 speakers, the political nature of the statements did not significantly affect reading times for either language group, despite a numerical trend toward slower reading of challenges to political content.
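The cell means and pairwise differences for challenge reading times follow directly from the fixed-effect estimates. The sketch below reconstructs them under the assumption of treatment (dummy) coding with L1 nonpolitical as the reference cell, an assumption inferred from the fact that the reported L1 nonpolitical mean serves as the baseline for the other estimates. The variable and function names are hypothetical.

```python
# Fixed-effect estimates (seconds) as reported for challenge reading times.
INTERCEPT = 3.78       # L1, nonpolitical (assumed reference cell)
B_POLITICAL = 0.32     # statement type: political
B_L2 = 1.89            # language group: L2
B_INTERACTION = -0.20  # political x L2


def cell_mean(is_l2, is_political):
    """Predicted mean for one cell under treatment (dummy) coding."""
    return (INTERCEPT
            + B_POLITICAL * is_political
            + B_L2 * is_l2
            + B_INTERACTION * is_l2 * is_political)


means = {
    ("L1", "nonpolitical"): cell_mean(0, 0),
    ("L1", "political"): cell_mean(0, 1),
    ("L2", "nonpolitical"): cell_mean(1, 0),
    ("L2", "political"): cell_mean(1, 1),
}

# Differences in the direction used in the text:
# nonpolitical minus political within groups, L1 minus L2 between groups.
contrasts = {
    "L1: nonpolitical - political":
        means[("L1", "nonpolitical")] - means[("L1", "political")],
    "L2: nonpolitical - political":
        means[("L2", "nonpolitical")] - means[("L2", "political")],
    "nonpolitical: L1 - L2":
        means[("L1", "nonpolitical")] - means[("L2", "nonpolitical")],
    "political: L1 - L2":
        means[("L1", "political")] - means[("L2", "political")],
}

for label, value in {**means, **contrasts}.items():
    print(f"{label}: {value:.2f}")
```

The negative signs of the within-group differences reflect the direction of subtraction: in both groups, challenges to political statements were read numerically more slowly than challenges to nonpolitical statements.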
4.6.2. Time reading statements
We fitted linear mixed-effects models specified with statement type (Political versus nonpolitical), language group (L1 versus L2) and their interaction as fixed effects, with random intercepts for both participants and statement items to account for by-subject and by-item variability. Model selection via likelihood ratio tests confirmed that this random effects structure provided the most parsimonious fit to the data, as more complex models with random slopes did not significantly improve model fit.
The final model was fitted using maximum likelihood estimation (AIC = 7906.6, BIC = 7944.7, log-likelihood = −3946.3). Fixed effects analysis revealed that L1 speakers spent an average of 3.18 seconds (SE = 0.35, 95% CI [2.49, 3.87]) reading nonpolitical statements and 3.10 seconds (SE = 0.40, 95% CI [2.31, 3.89]) reading Political statements. L2 speakers spent an average of 4.64 seconds (SE = 0.32, 95% CI [4.00, 5.28]) reading nonpolitical statements and 4.43 seconds (SE = 0.34, 95% CI [3.76, 5.10]) reading political statements.
The model revealed a significant main effect of language group (β = 1.46, SE = 0.45, t = 3.25, p < 0.01), with L2 speakers taking longer to read statements than L1 speakers. The main effect of statement type was not significant (β = −0.08, SE = 0.42, t = −0.19, p = 0.85), nor was the interaction between language group and statement type (β = −0.13, SE = 0.55, t = −0.24, p = 0.81).
Pairwise comparisons were conducted to further examine the effects within and between groups. Within language groups, the difference between nonpolitical and political statements was not statistically significant for either L1 speakers (difference = 0.08 seconds, SE = 0.43, t(53.5) = 0.19, p = 0.85) or L2 speakers (difference = 0.21 seconds, SE = 0.36, t(63.3) = 0.59, p = 0.56). Between language groups, L2 speakers took significantly longer than L1 speakers to read both nonpolitical statements (difference = 1.46 seconds, SE = 0.46, t(130) = 3.18, p < 0.01) and political statements (difference = 1.33 seconds, SE = 0.51, t(102) = 2.63, p < 0.01). Variance components analysis indicated substantial individual differences among participants (variance = 2.74, SD = 1.66) and some variability among statement items (variance = 0.83, SD = 0.91), with residual variance of 5.04 (SD = 2.24). Again, while L2 speakers consistently took longer to process statements than L1 speakers, the political nature of the statements did not significantly affect reading times for either language group.
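To make the variance components easier to compare, the sketch below converts each reported variance to its standard deviation and expresses it as a share of the summed variance. The shares are an added descriptive summary and are not statistics reported in this study; the function names are hypothetical.

```python
import math

# Variance components reported for the bilingual statement-reading model.
variances = {"participant": 2.74, "item": 0.83, "residual": 5.04}


def to_sd(variance):
    """Standard deviation is the square root of the variance."""
    return math.sqrt(variance)


def variance_shares(components):
    """Each component as a proportion of the summed variance."""
    total = sum(components.values())
    return {name: value / total for name, value in components.items()}


sds = {name: to_sd(value) for name, value in variances.items()}
shares = variance_shares(variances)

for name in variances:
    print(f"{name}: SD = {sds[name]:.2f}, share of variance = {shares[name]:.1%}")
```

On these figures, differences between participants account for roughly a third of the summed variance and differences between items for about a tenth, which is consistent with the description of substantial by-participant and more limited by-item variability.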
4.6.3. Time rating beliefs
We fitted linear mixed-effects models specified with statement type (political versus nonpolitical), language group (L1 versus L2), and their interaction as fixed effects, with random intercepts for both participants and statement items to account for by-subject and by-item variability. Model selection via likelihood ratio tests confirmed that this random effects structure provided the most parsimonious fit to the data, as more complex models with random slopes did not significantly improve model fit (all p > 0.82).
The final model was fitted using maximum likelihood estimation (AIC = 8186.5, BIC = 8224.6, log-likelihood = −4086.3). Fixed effects analysis revealed that L1 speakers spent an average of 4.52 seconds (SE = 0.32, 95% CI [3.89, 5.15]) rating nonpolitical statements and 3.93 seconds (SE = 0.35, 95% CI [3.23, 4.63]) rating political statements. L2 speakers spent an average of 5.75 seconds (SE = 0.30, 95% CI [5.16, 6.35]) rating nonpolitical statements and 5.32 seconds (SE = 0.31, 95% CI [4.71, 5.93]) rating political statements.
The model revealed a significant main effect of language group (β = 1.23, SE = 0.41, t = 2.97, p < 0.01), with L2 speakers taking longer to rate statements than L1 speakers. The main effect of statement type was not significant (β = −0.59, SE = 0.36, t = −1.64, p = 0.12), nor was the interaction between language group and statement type (β = 0.16, SE = 0.47, t = 0.33, p = 0.82) (see Figure 5).

Figure 5. Estimated marginal means for rating time across conditions for bilinguals.
Pairwise comparisons were conducted to further examine the effects within and between groups. Within language groups, the difference between nonpolitical and political statements was not statistically significant for either L1 speakers (difference = 0.59 seconds, SE = 0.37, t(49.7) = 1.59, p = 0.12) or L2 speakers (difference = 0.44 seconds, SE = 0.31, t(64.0) = 1.38, p = 0.17). Between language groups, L2 speakers took significantly longer than L1 speakers to rate both nonpolitical statements (difference = 1.23 seconds, SE = 0.42, t(143) = 2.91, p < 0.01) and political statements (difference = 1.39 seconds, SE = 0.46, t(109) = 3.04, p < 0.01). Variance components analysis indicated substantial individual differences among participants (variance = 2.54, SD = 1.59) and some variability among statement items (variance = 0.55, SD = 0.74), with residual variance of 6.14 (SD = 2.48). Again, the political nature of the statements did not significantly affect rating times for either language group.
Recall that the monolingual group rated political content faster than nonpolitical content, a potential consequence of confirmation bias. Here, neither the L1 nor the L2 bilingual group showed such an effect.
5. Discussion
5.1. Belief change
Individuals rely on many strategies that delimit the effectiveness of persuasion (Knowles & Linn, Reference Knowles and Linn2004). Some models of belief maintenance predict specific behaviors as a result of conflict between already established views and new but challenging information (e.g., cognitive dissonance). These behaviors include discrediting the source of competing information, doubling down on an established view, an erosion of belief in the methods used to find information, selectively exposing oneself only to congenial information, and overconfidence in one’s own view (e.g., Burris et al., Reference Burris, Harmon-Jones and Tarpley1997; Jacks & Cameron, Reference Jacks and Cameron2003; Munro, Reference Munro2010; Zuwerink & Devine, Reference Zuwerink and Devine1996). Whatever the strategy, beliefs related to social identity are especially resistant to change (Cohen-Scali, Reference Cohen-Scali2003; Conover, Reference Conover1988; Fleming & Petty, Reference Fleming and Petty1999; Unsworth & Fielding, Reference Unsworth and Fielding2014).
Our data show that both monolinguals’ and bilingual L1 participants’ political beliefs are less amenable to change than their nonpolitical beliefs, indicating that counterevidence is most effective at changing beliefs that are not personally threatening. In this regard, we corroborated Kaplan et al. (Reference Kaplan, Gimbel and Harris2016). In contrast, our L2 data show a meaningful increase in belief change following counterevidence for both trial types. While the political condition generally elicited higher ratings than the nonpolitical condition among this group, the effect was significantly modulated by RatingType (first or second). That is, the BilingualL2 group had higher first ratings of the political trials compared to L1 and Monolingual speakers, and the effect of political condition differed between first and second ratings, with a pronounced drop after the second rating as shown by the Condition × RatingType interaction. Most importantly, the three-way interaction between Condition, Language and RatingType showed a differential effect of the political condition between first and second ratings, which was greater for L2 speakers compared to Monolinguals and BilingualL1 speakers. This suggests that our L2 speakers’ sensitivity to the political condition changes differently across rating occasions compared to the other two groups. We interpret these results as an FLe among BilingualL2 speakers, who may be exhibiting greater cognitive flexibility in the face of counterevidence than their monolingual counterparts and their bilingual peers using the L1.
Though we only tested political Liberals with strong views, the study’s design strategically targets highly political individuals to probe the role of bilingualism in modulating emotional responses to belief-challenging stimuli. Our focus on strong ideological convictions is justified by the literature showing that such beliefs trigger strong emotional resistance when confronted with conflicting evidence, a phenomenon evidenced by both self-reports and neural activation patterns (Ahluwalia, Reference Ahluwalia2000; Kaplan et al., Reference Kaplan, Gimbel and Harris2016; Munro, Reference Munro2010; Zuwerink & Devine, Reference Zuwerink and Devine1996). By excluding politically indifferent individuals, we ensure robust emotional engagement with the stimuli, which is critical for testing whether bilingualism tempers the distress of ideological threats. Additionally, our emphasis on Liberals aligns with their predominance on US college campuses, where they outnumber Conservatives by over 2:1. Despite noted differences in emotional processing across party lines (Graham et al., Reference Graham, Haidt and Nosek2009; Hibbing et al., Reference Hibbing, Smith and Alford2014), recent meta-analyses suggest bipartisan susceptibility to bias (Ditto et al., Reference Ditto, Liu, Clark, Wojcik, Chen, Grady and Zinger2019), implying our findings on Liberals may extend to any individual whose firmly held beliefs are targeted, regardless of political leanings (Fransen et al., Reference Fransen, Smit and Verlegh2015).
5.2. An exploration of L2 identity and proficiency
While proficiency is commonly tested to determine one’s grammatical competence, here we question what effect it should have on responses to a hypothetical moral quandary (FLe), where the outcome is argued to be linked primarily to emotion and not grammar. In such experiments, grammatical proficiency is more of a safety check than an independent variable, meant to ensure that participants are competent enough in the language to read and answer questions in it; it is less of a causal link between emotional responses and the language in question. Though some FLe research does in fact find a link between proficiency and responses, there is often no explicit connection drawn between the two. Recall that the original proposal put forward in FLe work for why the L2 differed in emotional judgement from the L1 on some tasks was increased psycho-emotional distance from—and not necessarily proficiency in—the non-native language.
While considerable research examines the role of emotions on individual outcomes in second language acquisition (SLA) (see Dewaele & Li, Reference Dewaele and Li2020 for a review), there was, at the time of data collection, no explicit measure of how emotionally connected an individual was to the L2. To this end, in addition to measuring proficiency explicitly with a widely used task in Spanish, we asked participants to self-report their emotional connection to their non-native language through three questions: how closely do you identify with your L2? (Identity); how much do you feel like yourself when using your L2? (Feeling); and how meaningful is your L2 to you? (Meaningfulness). The general premise of these questions is that the more an individual connects with the L2, the more meaningful it is to them, and the more meaningful it is to them, the more they feel like themselves while using it. If such is the case, the less the FLe might emerge, which would lead to less belief change across emotional conditions.
Indeed, such is what we found. Increased emotional connection to the L2 led to significantly less political belief change. While the composite measure of L2 connection did not exert a significant main effect alone, its interaction with RatingType suggests that L2 connection has a compensatory impact on the second rating. For each unit increase in L2 connection, the odds of a higher second belief rating increase by 34% relative to the first. These effects persist even when accounting for substantial participant-level variance. In other words, the more emotionally connected one is to their L2, the more they “feel” in that language and, as a result, the less they change their beliefs across conditions. Not only does the FLe show up more generally, but it also increases as emotional connection to the L2 decreases.
While there was no established task at the time of testing that assessed one’s emotional connection to a non-native language, Toivo et al. (Reference Toivo, Scheepers and DeWaele2024) later created the Reduced Emotional Resonance in LX survey to fill this gap. While our initial attempt to capture this construct pre-dates the above authors’ survey, leaving us unable to use it on our dataset, our findings corroborate those of Toivo and colleagues: systematically accounting for L2 emotional connection is an important step when assessing the FLe. Given the now established reliability of the RER-LX, we encourage future research on the FLe to use it when appropriate. Likewise, it would be a worthwhile endeavor to explore the many reasons one’s emotional connection to their L2 could vary across individuals.
5.3. Reading and response times
Across all three time-oriented analyses, one common theme stands out: The L2 group spends more time reading all statement types and their associated challenges, as well as rating their belief across conditions. On the one hand, such a result is not surprising given the costs associated with L2 processing (Grüter & Rohde, Reference Grüter and Rohde2021; Hopp, Reference Hopp2016; Portin & Laine, Reference Portin and Laine2001). On the other hand, more time reading and rating also reliably led to greater belief change for the L2 group than the other two, particularly for political items.
In Kaplan et al. (Reference Kaplan, Gimbel and Harris2016), the finding that participants quickly rated or read content with which they had previously agreed was argued to mean that ideologically consistent information was more readily digested. This is a common strategy among those resisting persuasion and is analogous to confirmation bias. That the L2 may attenuate this behavior – albeit slightly – is not unimportant. Any mechanism that either enables or facilitates cognitive flexibility is useful to a society wishing to converge on shared values.
It might be argued that non-native language processing meaningfully slowed down the automatic response to dismiss conflicting views, which led to slower, perhaps more careful reading and rating times of evocative stimuli. If this is true, L2 processing also led to greater openness to change than the native language – proficiency notwithstanding. Interestingly, our participants, fairly homogenous in (high) proficiency, showed no effect of grammatical competence on belief change. While our proficiency measures suggest adequate L2 comprehension, we cannot rule out that lower confidence in L2 processing may have contributed to the FLe. Slower reading and rating times in the L2 group may reflect increased caution or uncertainty, potentially leading to more deliberate evaluation of counterevidence and greater belief change. Future studies could include self-reported confidence measures post-task to disentangle proficiency from comprehension certainty. Notwithstanding, given uniformity in reading and response patterns across the L2 group and across conditions, we assume that comprehension was sufficient for this group to understand the task. The fact that DELE scores did not interact significantly with either pre/postchallenge belief changes or timing suggests that the slowdown and flexibility were unrelated to grammatical competence. Whatever its underlying driver, slower reading and rating in the L2 also reliably led to increased belief change.
6. Conclusion
Research on the Foreign Language effect shows that bilinguals make emotional decisions differently in the L2 compared to the L1. Implications of this research are often discussed in the context of a growing bi- and multilingual world in which important decisions are affected by L2 processing; however, to date, much of the research has made use of tasks that, in one form or another, examine hypothetical moral quandaries. These include pitting utilitarian and deontological concerns against one another, and/or assessing economic decisions for which few real-life implications are at stake. While informative, such results are not surrogates for the types of decisions made by the average bilingual in their personal life. Consequently, we examined whether L2 processing impacts cognitive flexibility among bilinguals reading strong counterevidence to specific sociopolitical views they are likely to encounter in their daily life.
We corroborated Kaplan et al. (Reference Kaplan, Gimbel and Harris2016), who found that individuals show bias in evaluating political views but not nonpolitical ones. We also found a generalized foreign language effect in our BilingualL2 group. The L2 led to greater belief change at the group level, as well as significant belief change even on questions of political import. Importantly, we found that grammatical proficiency was not a meaningful predictor of belief change or, therefore, cognitive flexibility. Rather, the strength of one’s emotional connection to their L2 predicted meaningful belief change. The results align with previous research on L2 emotional processing and lend credence to the growing claims that bilingualism affords a variety of meaningful benefits.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S1366728925100473.
Data availability statement
The data that support the findings of this study are openly available at https://osf.io/sc5nr/.
Acknowledgments
We acknowledge the use of Grok for the cross-checking of analysis scripts and corroboration of R model outputs.
Competing interests
The author(s) declare none.