Introduction
We are pleased by the interest that our article (Al-Hoorie, Hiver, & In’nami, Reference Al-Hoorie, Hiver and In’nami2024) has generated, and we are grateful for the scholars who have taken the time to read it and write thoughtful commentaries on it. To date, our article has received a total of nine responses, seven in Studies in Second Language Acquisition (Liu & Henry, Reference Liu and Henry2025; McClelland & Larson-Hall, Reference McClelland and Larson-Hall2025; Nagle, Reference Nagle2025; Oga-Baldwin, Reference Oga-Baldwin2024; Oga-Baldwin et al., Reference Oga-Baldwin, Al-Hoorie and Hiver2025; Papi & Teimouri, Reference Papi and Teimouri2024; Vitta et al., Reference Vitta, Leeming and Nicklin2025), one in Applied Linguistics (Henry & Liu, Reference Henry and Liu2024), and one in Vocabulary Learning and Instruction (Vitta, Reference Vitta2024) using our paper as a cautionary tale for second language (L2) vocabulary research. We find it noteworthy that most of these commentaries agree with our assessment of a validation crisis in the L2 Motivational Self System (L2MSS; Dörnyei, Reference Dörnyei2005, Reference Dörnyei, Dörnyei and Ushioda2009) and with our call to stop using the problematic measures widely used in its tradition without first addressing the various validity concerns raised in our paper and in the literature more generally. Perhaps the closest parallel in recent history to the present discussion is what came to be known as The Modern Language Journal debate that occurred around 30 years ago (Dörnyei, Reference Dörnyei1994a, Reference Dörnyei1994b; Gardner & Tremblay, Reference Gardner and Tremblay1994a, Reference Gardner and Tremblay1994b; Oxford, Reference Oxford1994; Oxford & Shearin, Reference Oxford and Shearin1994), where scholars considered the merits of a shift toward more cognitive-situated perspectives of L2 motivation.
We reiterate our position that it is time to move away from the L2MSS. Two full decades have passed and the L2MSS has not delivered on its initial, explicitly stated promise, which was to inform practice (e.g., Dörnyei, Reference Dörnyei2005, p. 116; Reference Dörnyei and Ushioda2009, p. 32) and which was the original justification for the move away from the socioeducational model (Gardner, Reference Gardner, Giles and Clair1979, Reference Gardner1985, Reference Gardner2010). Most of the research conducted has been correlational/observational and has failed to seriously engage with the practical concerns of everyday teachers in the classroom. To meaningfully inform practice, research findings should demonstrate meaningful effect sizes and provide causal (rather than merely correlational) evidence. These findings should also yield insights that are not already intuitive or easily discoverable through typical classroom experience (Al-Hoorie et al., Reference Al-Hoorie, Hiver, Kim and De Costa2021).
Even though the L2MSS did not satisfy these important criteria, boldly overstated claims regarding it proliferated in the field. It has been claimed that it stimulated more qualitative research than the socioeducational model (Boo et al., Reference Boo, Dörnyei and Ryan2015), suggesting that notions of intercultural relations and the affective potential of identifying with another linguistic community were limiting the scope of qualitative investigation. It has also been claimed that it could explain “an exceptionally high figure” of the variance of L2 motivation (Dörnyei, Reference Dörnyei, Dörnyei and Ushioda2009, p. 31), though such an exceptionally high (bivariate) correlation is itself a signal of possibly spurious associations (see Al-Hoorie & Hiver, Reference Al-Hoorie and Hiver2025). Claims about the validity of the model and its scales also spread unrestrained throughout the field (see Al-Hoorie, Hiver, et al., Reference Al-Hoorie, Hiver and In’nami2024). A primary factor leading to this state of affairs is the ambitious focus on the self, which caused more confusion than integrative motivation ever could (MacIntyre et al., Reference MacIntyre, Mackinnon, Clément, Dörnyei and Ushioda2009), leading some to raise the question of whether it was ultimately worth the trouble (MacIntyre, Reference MacIntyre, Al-Hoorie and Szabó2022). This theory has also fostered lazy theorizing, where every other day a new “self” is introduced into the literature without the barest nod toward the various considerations of validity or any effort to identify a more precise psychological descriptor—or to link it to one. The result has been a kind of psychological polytheism, when in reality there is one unified agentic self that manages competing internal desires and pressures (Bandura, Reference Bandura1997, p. 26).
While our critique may appear harsh to some, we argue that it is necessary to explicitly articulate the shortcomings of a dominant theory in the field—particularly its central emphasis on the self—which we maintain has ultimately misdirected the trajectory of research. The fatal attraction of this model, as McClelland and Larson-Hall (Reference McClelland and Larson-Hall2025) frame it, lies in its deceptive simplicity and apparent intuitiveness: People become what they idealize and envision. And as Oga-Baldwin (Reference Oga-Baldwin2024) rightly pointed out, an overhaul is due, “lest we send future researchers down the garden path toward disappointment and dead-ends” (p. 10).
How did we get here?
Not unlike various other strands of second language research (Kostromitina et al., Reference Kostromitina, Sudina and Baghlaf2025; Sudina, Reference Sudina2021, Reference Sudina2023b), research into the L2MSS was not preceded by any serious validation efforts. Validity was simply taken for granted with declarations that “it goes beyond logical, intellectual arguments when justifying the validity of the various future-oriented self types” (Dörnyei, Reference Dörnyei, Dörnyei and Ushioda2009, p. 15). In the first official, albeit exploratory, anthology compiling studies on this theory (Dörnyei & Ushioda, Reference Dörnyei and Ushioda2009), models were often contradictory, reliability estimates were sometimes omitted, and even well-established, validated scales for variables such as integrativeness were abridged or misapplied (see Claro, Reference Claro, Al-Hoorie and MacIntyre2020), resulting in potentially misleading findings—a critique that Dörnyei (Reference Dörnyei, Al-Hoorie and MacIntyre2020) himself later acknowledged.
In our own research, one initial spark of our skepticism about this theory was a meta-analysis (Al-Hoorie, Reference Al-Hoorie2018) conducted while its author was still a PhD student under Prof. Zoltán Dörnyei. Even though that study was a side project and not part of his PhD thesis (and was published after his graduation), the tone was modest and diplomatic for obvious reasons. That meta-analysis raised several important issues, one of which was the notable decline in effect sizes depending on the specific research design employed and the variables included. For instance, the ideal L2 self had a strong correlation with intended effort (r = .61), but this magnitude dropped substantially when achievement was the outcome variable (r = .20). It dropped even further after adjusting for potential publication bias (r = .103, ns). Note that this is the bivariate correlation, and so the actual causal relationship, if any, is expected to be smaller (see Al-Hoorie & Hiver, Reference Al-Hoorie, Hiver, Al-Hoorie and Szabó2022, Reference Al-Hoorie and Hiver2025, for a discussion).
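To put these magnitudes in perspective, a quick back-of-the-envelope computation (our own illustration here, not part of the meta-analysis itself) converts each reported correlation into the proportion of variance shared between the two variables (r²):

```python
# Illustrative only: the correlations below are the values reported above;
# squaring them shows how little variance the two variables actually share.
correlations = {
    "ideal L2 self with intended effort": 0.61,
    "ideal L2 self with achievement": 0.20,
    "ideal L2 self with achievement (publication-bias adjusted)": 0.103,
}

for label, r in correlations.items():
    print(f"{label}: r = {r:.3f}, shared variance = {r ** 2:.1%}")
```

Even taken at face value, the correlation with achievement corresponds to roughly 4% shared variance, and only about 1% after the publication-bias adjustment.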
Another interesting observation was that the relationship between intended effort and achievement was non-significant (r = .116), which casts doubt on the motivation → behavior → outcome chain that was a central argument for the theory (Dörnyei, Reference Dörnyei2005, p. 73). This is to be expected because of the broad and vague nature of the intended effort construct (“I would like to spend lots of time studying [the L2]”). Most people studying a second language would probably agree with a statement like this (“Yes, of course I would like to spend a lot of time!”), but this intention will not necessarily translate into actual time and effort expended (Hiver & Wu, Reference Hiver, Wu, Lambert, Aubrey and Bui2023). Effective intentions specify the when, where, and how of the intended behavior (Gollwitzer, Reference Gollwitzer1999) in order to mobilize action. Or as we previously argued, “it is rather mundane to find out that those who self-report looking forward to something also self-report a desire to spend a lot of time doing it” (Hiver & Al-Hoorie, Reference Hiver and Al-Hoorie2020, p. 85). In other words, if the intended effort construct or the items used to measure it are unrelated to actual achievement, what justifies its continued use as a criterion variable, particularly in the absence of any compelling theoretical rationale?
Another observation in that meta-analysis was the significant drop in the magnitude of the correlation between the L2 learning experience and intended effort when comparing studies that employed a factor-analytic procedure and those that did not. It was as if studies employing a factor-analytic procedure removed problematic items (e.g., showing high cross-loadings) and cleaned up their scales, which resulted in weaker correlations. Thus, it appeared to us that both the scales used within this theory and its primary criterion variable (intended effort) are problematic. These findings suggested that there are serious questions about the L2MSS that warrant closer examination.
Our next step to further investigate these curious findings was to replicate a landmark study on this theory (You et al., Reference You, Dörnyei and Csizér2016). You et al. (Reference You, Dörnyei and Csizér2016) conducted a large-scale study and published a highly cited paper, despite signs of various methodological issues. Our replication project (Hiver & Al-Hoorie, Reference Hiver and Al-Hoorie2020) was initiated (and preregistered) while the authors were still at the University of Nottingham, though the manuscript was submitted for publication only after both authors had graduated. The results of this replication showed the same troubling pattern: The measurement model exhibited discriminant validity issues that required us to drop some of the scales that were integral to the original research setup and model. The model by You et al. (Reference You, Dörnyei and Csizér2016) ultimately failed to replicate.
To continue this line of research, we felt that the next logical step was to focus specifically on the measures used in this tradition and examine their validity. We therefore collected a number of scales in widespread use and conducted a scale validation study (Al-Hoorie, Hiver, et al., Reference Al-Hoorie, Hiver and In’nami2024), and indeed, the results turned out to be bleak. In fact, the results were so problematic that we opted not to publish that paper out of respect for our former supervisor’s standing and body of work. Another consideration that discouraged us from moving forward with its publication was the private backlash that our failed replication triggered. We ultimately decided to proceed with publishing this paper only after the unfortunate and untimely passing of Prof. Zoltán Dörnyei.
As the next step, we decided to zoom in on one aspect emerging from our validation study (Al-Hoorie, Hiver, et al., Reference Al-Hoorie, Hiver and In’nami2024), namely the lack of discriminant validity between the ideal L2 self and linguistic self-confidence. This was a particularly concerning finding, considering that the ideal L2 self is the most important and most celebrated aspect of the theory. Without this keystone element, the theory would basically collapse. At the same time, linguistic self-confidence and its various manifestations (e.g., self-efficacy, perceived competence, ability beliefs) are well-established constructs in different theoretical traditions. Could it be that the standard ideal L2 self scale items used in the L2MSS reflect beliefs in one’s ability rather than an actual–ideal discrepancy as originally intended? We therefore conducted a two-study project involving four samples from different parts of the world and used both quantitative and qualitative analyses (Al-Hoorie, McClelland, et al., Reference Al-Hoorie, McClelland, Resnik, Hiver and Botes2024). Our results indeed showed that responses to ideal L2 self scale items reflect present beliefs in ability rather than a purported future image of oneself.
Our next paper in this line of research was inspired by a concerning observation by Oga-Baldwin (Reference Oga-Baldwin2024), who detected a close parallel between the L2 learning experience items and intrinsic motivation items. Again, intrinsic motivation has been a well-established construct for decades, and if the L2 learning experience (commonly considered the best predictor in the theory) were to fail the discriminant validity test, this would dismantle the theory (or what remains of it) even further. However, this time we decided not to conduct a conventional validation study, simply because the parallel between the items was painfully obvious (see Oga-Baldwin, Reference Oga-Baldwin2024, Table 1). For example, the item “I like the atmosphere of my [L2] class” is supposed to represent the L2 learning experience, while “I study [L2] because I like my [L2] class” has been a standard intrinsic motivation item for decades. Instead of a conventional validation study, whose results would most likely affirm the lack of discriminant validity already repeatedly shown, we decided to adopt a different approach. We presented these items to experienced L2 motivation researchers and asked them, in an open-ended manner, to indicate the construct(s) under which each item falls. Not one of our respondents, who were blind to the purpose of the study, thought of the L2 learning experience as a possible construct for these items. Instead, intrinsic motivation and its elements (e.g., enjoyment, curiosity) showed up repeatedly in the responses. These findings led us to conclude that the L2 learning experience is problematic enough to fail rigorous validation analyses and ambiguous enough to run counter to the professional intuition of active, published researchers in the field.
To make a long story short, construct validity refers to whether a given measure truly captures the construct it is intended to assess. Over several extended projects, our research has demonstrated that, in the case of both the ideal L2 self and the L2 learning experience, the measures commonly used do not tap into the intended constructs but rather align with other, pre-existing constructs. This constitutes a textbook example of a Type III error (Kimball, Reference Kimball1957)—that is, arriving at the correct answer to the wrong question (see Al-Hoorie, McClelland, et al., Reference Al-Hoorie, McClelland, Resnik, Hiver and Botes2024).
Why did we get here?
It is no secret that the L2MSS is a motley, hybridized version of two distinct theories: self-discrepancy theory and possible selves theory (Al-Hoorie & Al Shlowiy, Reference Al-Hoorie and Al Shlowiy2020). It borrows labels from self-discrepancy theory (ideal and ought) in a superficial manner, while the scales that constitute its empirical basis are heavily derived from possible selves theory and its focus on vision. The actual self, which is necessary as a benchmark against which ideal and ought self-guides can be compared, is completely absent, a case of a missing person (Thorsen et al., Reference Thorsen, Henry and Cliffordson2020). As Henry and Liu (Reference Henry and Liu2024) pointed out, this forced marriage between two distinct theories is the result of a poor grasp of the original constructs: “While the crisis might be manifested in jangle fallacy problems at the measurement level, its roots lie at the construct level” (p. 744; see also Henry & Liu, Reference Henry and Liu2023).
The perpetuation of this theory despite its obvious flaws boils down to a lack of falsifiable standards to evaluate it against. It is not clear what findings would lead advocates of this theory to question it or discontinue using it. Low correlations? Non-significant correlations? Bivariate correlations emerging from observational (vs. experimental) data on their own do not usually provide an adequate evidential basis because researchers (and practitioners) are typically more interested in the underlying causal mechanism, not just the correlations (Al-Hoorie & Hiver, Reference Al-Hoorie and Hiver2025).
To demonstrate the lack of falsifiable standards in the L2MSS, consider the ought-to L2 self as a case in point. From the start, Ushioda and Dörnyei (Reference Ushioda, Dörnyei, Dörnyei and Ushioda2009) explicitly stated that “A basic hypothesis is that… [the ought-to L2 self] will serve as a powerful motivator to learn the language because of our psychological desire to reduce the discrepancy between our current and possible future selves” (p. 4). From an empirical perspective, even if we accept correlations as good evidence, the ought-to L2 self has shown hardly any correlation with L2 achievement (r = –.048, ns) (Al-Hoorie, Reference Al-Hoorie2018). It did show a significant correlation with intended effort (r = .379), but intended effort itself had little to do with achievement as mentioned above. From a theoretical perspective, as well, it is fundamentally wrong to consider such external incentives underlying the ought-to L2 self as a positive motivational force that can sustain long-term motivation (Ryan & Deci, Reference Ryan and Deci2017). One is therefore left to wonder what empirical or theoretical evidence there is to justify the continued use of the ought-to L2 self as a meaningful construct. This lack of falsifiability is not unique to the ought-to self but symptomatic of the broader theoretical tradition it belongs to.
The consequences of being here
We acknowledge that our critique may carry a heavy psychological toll for those invested in the L2MSS. In our own early work, we invested heavily in adopting and pursuing this tradition before a number of empirical red flags forced us to reassess, course-correct, and look beyond it from a more critical perspective. As Liu and Henry (Reference Liu and Henry2025) remark, confronting the possibility that a dominant theory may be flawed can trigger a spectrum of emotional responses—from denial and defensiveness to anxiety, frustration, and even despair. These reactions are not trivial. For many researchers, theoretical commitments are deeply tied to professional identity, and criticism of a framework can feel like criticism of oneself.
One particularly relevant psychological phenomenon here is cognitive dissonance (Festinger, Reference Festinger1957), which refers to the mental discomfort experienced when one is faced with evidence that conflicts with previously held beliefs, values, or actions. In the context of the current discussion, cognitive dissonance may arise when researchers realize that the theory they have long supported intellectually—and perhaps even built their professional careers around—may not be as empirically robust as once assumed. This dissonance can lead to significant internal tension, prompting either a defensive rejection of the critique or a difficult process of reassessment of one’s academic path.
In light of this, we can better understand the emotional tone of Papi and Teimouri’s (Reference Papi and Teimouri2024) response, as well as their use of various rhetorical strategies aimed at preserving the status quo and reassuring readers that there is hardly any cause for concern. For example, they try to explain away our findings through sibling and parent constructs, and this rationalization is presented as if it had always been common sense, when these happy family relations were never hypothesized by the theory a priori. This is representative of the shifting standards that make the L2MSS effectively unfalsifiable. Ideally, such attempts to explain unfavorable results should be advanced as hypotheses involving risky predictions (Popper, Reference Popper2014) for future research to test. The riskier the prediction, the stronger the support.
This also applies to the other points raised by Papi and Teimouri (Reference Papi and Teimouri2024). Suddenly, using a neutral midpoint in the questionnaire response scale calls into question the validity of the results. One has to wonder how fragile a theory must be if one additional response option can have such a damaging effect, especially in light of psychometric research showing that the width of the response scale does not have a major impact on important indicators such as means, standard deviations, and skewness (Felix, Reference Felix2011). Somehow, our participants were also considered unique: our sample of middle and high school students in South Korea—by all accounts a fairly typical setting for compulsory L2 learning—was treated as one to which the theory, for some reason, does not apply, even though a major argument for the L2MSS is its global nature. One has to wonder why Korean learners would not count but learners from a next-door country like China (You & Dörnyei, Reference You and Dörnyei2016; You et al., Reference You, Dörnyei and Csizér2016) would. Inexplicably, using a collection of scales that had not previously been administered together is portrayed as problematic—even though combining such measures is precisely the aim of validation research when there are concerns about the validity of individual scales.
We consider all of these to be questionable research practices falling under the rubric of critiquing after the results are known (or CARKing; Nosek & Lakens, Reference Nosek and Lakens2014), practices that undermine the integrity of theory development by allowing researchers to retrofit explanations to convenient patterns emerging in the data. The problem is compounded when scholars are miscited, as Vitta et al. (Reference Vitta, Leeming and Nicklin2025) document transparently and in detail.
Papi and Teimouri’s (Reference Papi and Teimouri2024) arguments also contain a number of fallacies. They lament our call to pause the use of questionable measures by appealing to authority, stating that our work “went so far as to call for the abandonment of the entire L2MSS research tradition, which happens to be the most prominent one in L2 motivation research” (p. 1). They also use a straw man fallacy as they repeatedly claim that our “conclusions are based on the common but misguided assumption that measures and constructs are synonymous” (p. 8), when in fact we went to great pains to explicitly state that “When two constructs fail discriminant validity testing, three possible interpretations arise… A final possibility is that the two constructs are unique but the measures developed and used to assess them are still unable to tap into their uniqueness” (pp. 309–310), that “the actual items of this scale do not reflect any actual–ideal discrepancy as they explicitly refer to imagining oneself in the future using the language competently” (p. 322, emphasis added), that “A measure might be intended to reflect a certain construct, but in reality it might not live up to that expectation and tap into a different construct instead” (Al-Hoorie, McClelland, et al., Reference Al-Hoorie, McClelland, Resnik, Hiver and Botes2024, p. 3), and that “When researchers investigate the role of actual–ideal discrepancies in language learning but resort to the Ideal L2 Self scale as an operationalisation, they attribute their findings to the role of the ideal L2 self. In reality, however, our findings suggest that this operationalisation actually reflects the role of ability beliefs” (p. 15, emphasis added).
One analytical point in Papi and Teimouri’s (Reference Papi and Teimouri2024) response merits closer examination. They “engaged in some hair-splitting, angels-on-pinheads reanalysis” (Oga-Baldwin, Reference Oga-Baldwin2024, p. 2) to demonstrate that our data do not point toward discriminant validity problems. However, as Vitta et al. (Reference Vitta, Leeming and Nicklin2025) explain in detail, regression and semipartial correlation are inferential procedures that are not appropriate for addressing issues of discriminant validity. A semipartial correlation measures the unique contribution of a predictor variable to the outcome after removing the shared variance between that predictor and other predictors. However, it does not remove shared variance between those predictors and the outcome variable. Specialized factor analytic procedures are superior in this context because they assess the underlying structure of the constructs and account for shared variance among observed variables. Factor analytic procedures therefore offer a more rigorous test of discriminant validity than semipartial correlations. For these reasons, we reassure Papi and Teimouri that their concerns are unwarranted.
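To make the distinction concrete, the sketch below uses simulated data and placeholder variable names (our own illustration, not Papi and Teimouri’s reanalysis or our study data) to contrast a semipartial correlation, which answers a question about incremental prediction of an outcome, with the heterotrait-monotrait (HTMT) ratio, one index often reported alongside factor-analytic procedures to flag discriminant validity problems between two scales:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Simulate two four-item "scales" that in truth tap the same latent factor
# (a built-in discriminant validity failure), plus an outcome variable.
latent = rng.normal(size=n)
ideal_self = np.column_stack([latent + rng.normal(scale=0.6, size=n) for _ in range(4)])
confidence = np.column_stack([latent + rng.normal(scale=0.6, size=n) for _ in range(4)])
outcome = 0.3 * latent + rng.normal(size=n)

x1 = ideal_self.mean(axis=1)   # scale score for the first "construct"
x2 = confidence.mean(axis=1)   # scale score for the second "construct"

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

# Semipartial correlation of x1 with the outcome: residualize x1 on x2,
# then correlate that residual with the (unadjusted) outcome.
slope, intercept = np.polyfit(x2, x1, 1)
x1_residual = x1 - (slope * x2 + intercept)
semipartial = corr(x1_residual, outcome)

# HTMT ratio: mean between-scale item correlation divided by the geometric
# mean of the mean within-scale item correlations; values near 1 indicate
# that the two scales are empirically indistinguishable.
def mean_offdiagonal(block):
    return block[np.triu_indices_from(block, k=1)].mean()

R = np.corrcoef(np.column_stack([ideal_self, confidence]), rowvar=False)
between = R[:4, 4:].mean()
within_1 = mean_offdiagonal(R[:4, :4])
within_2 = mean_offdiagonal(R[4:, 4:])
htmt = between / np.sqrt(within_1 * within_2)

print(f"Semipartial correlation with the outcome: {semipartial:.2f}")
print(f"HTMT ratio for the two scales: {htmt:.2f}")  # close to 1.0 here
```

Because the two simulated scales are driven by the same latent factor, the HTMT value approaches 1 and flags the overlap, whereas the semipartial correlation only speaks to incremental prediction of the outcome and says nothing about whether the two scales are distinguishable from each other.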
Where do we go from here?
Our critique of the L2MSS has highlighted several conceptual and methodological shortcomings, underscoring the need to reorient the field toward more conceptually grounded, empirically robust, and socially attuned frameworks. In this spirit, we offer several directions for future research that can help bolster the field of L2 motivation and restore its credibility.
A fundamental limitation of the L2MSS is its overemphasis on the individual self as the primary unit of analysis. The model portrays motivation as a largely internal phenomenon, neglecting the relational, social, and political structures that shape learners’ trajectories. Motivation does not reside solely in mental representations of future selves; it is co-constructed in interaction with peers, teachers, institutions, and broader sociocultural forces. Future theories must address this imbalance by re-integrating the social embeddedness of motivation and acknowledging, for instance, how community belonging, linguistic ideologies, and power asymmetries shape what learners imagine and strive for.
The L2MSS also offers a static and oversimplified view of identity. Its constructs (the ideal and ought-to selves) generally assume stability and coherence, which is why the standard measurement of these constructs relies on one-shot self-report questionnaires. Yet, research in multilingualism, identity, and poststructuralist perspectives on second language acquisition has repeatedly shown that learners’ self-concepts are fluid, shifting across time and contexts (Darvin & Norton, Reference Darvin and Norton2023). Future theories should embrace more dynamic conceptions of identity that can account for intra-individual variation, identity negotiation, and the temporality of motivation. Complex dynamic systems theory (Larsen-Freeman & Cameron, Reference Larsen-Freeman and Cameron2008) offers one promising avenue here, treating motivation as a process emergent from non-linear interactions over time.
Another, related direction involves viewing motivation not as a mental state or future projection, but as something that emerges within specific learning practices. This practice-based perspective understands motivation as embedded in activity—what learners do, how they engage, and how meaning and affect emerge through learning activity in context. Such a shift would open the door to studying motivation as something observable and analyzable in real time, in contrast to abstract and atemporal self-report measures.
The field must also do more to integrate robust theoretical insights from neighboring disciplines (Hiver & Al-Hoorie, Reference Hiver and Al-Hoorie2022; Hiver et al., Reference Hiver, Al-Hoorie and Larsen-Freeman2022), particularly those with a stronger track record of empirical validation. Self-Determination Theory (Ryan & Deci, Reference Ryan and Deci2017), for example, offers a well-supported account of how autonomy, competence, and relatedness interact to shape motivation. Likewise, insights from the learning sciences, sociocultural theory, and sociology of education can enrich our understanding of motivational dynamics. This kind of transdisciplinary integration would ensure theoretical precision while helping the field avoid the circularity and ambiguity that have plagued the L2MSS.
Furthermore, the L2MSS was formulated in a pre-digital learning era. Since then, the widespread emergence of mobile apps, online language communities, AI-mediated learning, and personalized digital tools has fundamentally transformed the language learning landscape. These new environments not only shape learners’ goals and experiences but also introduce novel motivational dynamics, such as algorithmic feedback, virtual identities, and self- or goal-tracking. A contemporary theory of motivation must account for how such digital technologies mediate engagement, identity, and persistence (see, e.g., Dao, Reference Dao2024).
A final area of concern involves methodology. In terms of measurement, even basic validation procedures have not been systematically implemented, let alone more advanced ones like invariance testing (Nagle, Reference Nagle2025). In fact, when we implemented invariance testing in our replication (Hiver & Al-Hoorie, Reference Hiver and Al-Hoorie2020), we were questioned by a reviewer about this procedure and its meaning and value. We hope that Nagle’s (Reference Nagle2025) response stimulates more awareness and interest in useful procedures such as invariance testing (for a primer, see Sudina, Reference Sudina2023a) and mediation analysis (for an introduction, see Jia & Hui, Reference Jia and Hui2025). In terms of research design, much of the L2MSS literature has relied on observational, cross-sectional questionnaire data. Moving forward, researchers must adopt more innovative and rigorous methodological approaches that uncover the underlying causal mechanisms. These may include longitudinal case studies, digital ethnography, multimodal analysis, and participatory research methods that can better capture the complexity and dynamism of motivational processes in real-world settings.
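As one small illustration of such procedures, the sketch below (simulated data and placeholder variable names; not drawn from any of the studies discussed here) runs a percentile-bootstrap test of the indirect effect in a simple X → M → Y mediation model:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300

# Simulated data: a predictor, a mediator, and an outcome.
x = rng.normal(size=n)                       # e.g., an instructional condition
m = 0.5 * x + rng.normal(size=n)             # mediator
y = 0.4 * m + 0.1 * x + rng.normal(size=n)   # outcome

def indirect_effect(x, m, y):
    # a path: slope of M regressed on X.
    a = np.polyfit(x, m, 1)[0]
    # b path: coefficient of M when Y is regressed on M and X together.
    design = np.column_stack([np.ones_like(x), m, x])
    b = np.linalg.lstsq(design, y, rcond=None)[0][1]
    return a * b

# Percentile bootstrap: resample cases with replacement and recompute a*b.
boot_estimates = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)
    boot_estimates.append(indirect_effect(x[idx], m[idx], y[idx]))

low, high = np.percentile(boot_estimates, [2.5, 97.5])
print(f"Indirect effect = {indirect_effect(x, m, y):.3f}, "
      f"95% CI [{low:.3f}, {high:.3f}]")
```

If the bootstrap interval excludes zero, the indirect effect is taken as evidence consistent with mediation; the same logic extends to latent-variable models estimated in standard SEM software.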
Conclusion
In the title, we described the past two decades of research as “lost” because the stated aim of introducing the L2MSS and moving away from the socioeducational tradition was to inform practice. This clearly has not materialized. Instead, the L2MSS appears to have encouraged a lax approach to validity, with new scales and constructs often introduced with little theoretical grounding.
The L2MSS was born out of intellectual debate and theoretical differences between the two L2 motivation giants, Zoltán Dörnyei and Robert Gardner. It is certainly no surprise that major figures can shape the direction of research in their fields. When a superstar scientist passes away, research into alternative, “foreign” ideas tends to flourish. As Azoulay et al. (Reference Azoulay, Fons-Rosen and Graff Zivin2019) put it, “The loss of an elite scientist central to the field appears to signal to those on the outside that the cost/benefit calculations on the avant garde ideas they might bring to the table has changed, thus encouraging them to engage” (p. 2918). The findings by Azoulay et al. (Reference Azoulay, Fons-Rosen and Graff Zivin2019) are in line with the notion that the adoption as well as the abandonment of certain research paradigms does not necessarily rely solely on an evidentiary basis but also on political will (Kuhn, Reference Kuhn2012; Oga-Baldwin, Reference Oga-Baldwin2024). These issues are exacerbated by a publish-or-perish environment that incentivizes risk-averse research, prioritizes speed over thorough and adequately validated findings, and leads researchers to avoid challenging existing paradigms or publishing negative results.
In light of the various concerns about the measures used in the L2MSS tradition, we restate our position that the use of these problematic measures should be suspended, a call echoed by most commentators on our article. We also call for an end to the fractionation of the self into an ever-growing catalogue of “selves” that lack philosophical, theoretical, and empirical validity.
We recognize that such a pause effectively means a pause on the L2MSS itself. Developing measures that are faithful to the theory requires deconstructing the existing theory and rebuilding it from scratch. This also requires specifying testable hypotheses in advance to avoid the persistent issue of shifting standards in this tradition, where researchers easily cop out and reinterpret unfavorable results in a way that is consistent with their preferred theory. Alternatively, researchers may opt for a more established paradigm that has potential for expansion (e.g., Al-Hoorie et al., Reference Al-Hoorie, Oga-Baldwin, Hiver and Vitta2025; McClelland & Larson-Hall, Reference McClelland and Larson-Hall2025; Oga-Baldwin et al., Reference Oga-Baldwin, Fryer and Larson-Hall2019).
Competing interests
The authors declare that they have no conflict of interest.