1. Introduction
Over the years, alternation studies have come to represent ‘a sizeable segment of the quantitative studies executed within construction-grammar’ (Pijpops 2020: 283). Many of these have consisted in specifying the unique functional profiles of variants to show that the choice between them is not free of constraints, as was often assumed in generative approaches to grammar (see Leclercq & Morin 2025 for an overview). A typical output of studies of this type has been the identification of meaning differences between the alternatives: one famous example is the ditransitive/to-dative alternation (e.g. Peter sent John a letter vs Peter sent a letter to John, Diessel 2019a: 70–1), delineated in terms of a partial and holistic reading of the event for each variant, respectively.
The systematic identification of meaning differences for the establishment of constructional alternations is all the more significant as it sheds light on several principles given centre stage in Construction Grammar theorising. Firstly, meaning distinctions between alternatives are explained by the cognitive principle of No Synonymy, whereby ‘[i]f two constructions are syntactically distinct, they must be semantically or pragmatically distinct’ (Goldberg 1995: 67). No Synonymy is an isomorphic pressure (Haiman 1980) which guarantees that alternatives are associated with different meanings, as a corollary of a more general usage-based cognitive assumption known as optimal expressivity, viz. ‘the efficiency trade-off between communicative pressures for linguistic informativeness and cognitive pressures for linguistic simplicity’ (Leclercq et al. 2025; see also Kirby et al. 2015). Secondly, isomorphism is completed and bolstered by the usage principle of statistical preemption, or the disposition ‘not to use a formulation if an alternative formulation with the same function is consistently witnessed’ (Boyd & Goldberg 2011: 55). Together, these two principles predict that pairs of competing constructions must exhibit a degree of meaning variation: their paradigmatic relationship¹ of course entails some amount of interchangeability (see Ma et al.’s ‘Principle of Optionality’, forthcoming), but otherwise, synonymy avoidance falls out of basic cognitive pressures to simultaneously make meaning construction maximally expressive and maximally economical.
One open question regarding interchangeability and distinction in alternations is the granularity at which these processes operate, in other words, how subtle the formal and functional differences between alternatives can be. This topic has received increasing attention in recent construction-based work on three scores.
First, from a meaning-based perspective, there have been discussions and disagreements about apparent cases of (near-)synonymous constructional alternatives. The most notable type of such cases is that of allostructions, defined as ‘(truth-)semantically equivalent but formally distinct manifestations of a more abstractly represented construction’ (Cappelle 2009: 187), e.g. pick up the book [V Prt OBJ] vs pick the book up [V OBJ Prt]. Based on the definition above, several researchers have argued that allostructions directly contradict isomorphism (e.g. Laporte et al. 2021). Leclercq & Morin (2023), however, offer the reminder that synonymy avoidance as conceived by Goldberg includes pragmatic distinctions (the avoidance of ‘P-synonymy’), and argue that an explicit interface between semantic and pragmatic information (Leclercq 2020) is a desirable revision for models of the meaning of constructions (see also Leclercq & Morin 2025).
Second, some authors have argued that synonymy avoidance is incompatible with the existence of sociolinguistic variables, so-called ‘different ways of saying the same thing’ (Labov 1972: 323; see Uhrig’s 2015 principle of no variation; Ma et al. forthcoming). Responding to these concerns, Morin (2023, forthcoming a,b) argues for the importance of incorporating social information in models of constructional meaning, in order to more explicitly capture differences in ‘stylistic aspects, such as register’ (Goldberg 1995: 67; Levshina & Lorenz 2022), as well as sociocultural categories, including but not limited to region (Morin et al. 2024). Overall, Leclercq’s and Morin’s proposals for solving the aforementioned issues have since converged into a model of the meaning of constructions that explicitly identifies semantic, pragmatic and social dimensions. This is reflected in what they call the Principle of No Equivalence (Leclercq & Morin 2023: 10), quoted below.
The Principle of No Equivalence: If two competing constructions differ in form (i.e. phonologically, morpho-syntactically or even orthographically), they must be semantically, pragmatically and/or socially distinct.
Third, from a form-based perspective, recent research has examined pairs of expressions that are formally very similar and where one is traditionally considered to be a contraction of the other, investigating whether these are indeed distinct constructions, and therefore alternatives. Examples of this are contractions such as gonna, wanna, gotta, which have been shown to differ from their source forms with correlated differences in social meaning (associations of informality) (Lorenz 2020; Levshina & Lorenz 2022), and even, in the case of gonna, differences in semantic meaning (Lorenz 2013a,b,c). It has also been suggested that these contractions follow a general reduction pattern of a more abstract V-to-Vinf construction, where lateral relations mutually reinforce each other through analogy (Lorenz & Tizón-Couto 2024).
This article attempts to unify these perspectives through an investigation of the modal constructions going to and gonna.² Using data from the British English part of the LiveJournal corpus (Speelman & Glynn 2012), we employ a combination of collocation-based and profile-based methods to examine potential differences in meaning between these forms, and whether these are primarily semantic, pragmatic and/or social. First, distinctive collexeme analysis (Gries & Stefanowitsch 2004) is used to assess which lexical verbs are most associated with either modal construction. Second, a Behavioural Profile approach in the sense of Gries (2010: 325ff.)³ is used to (i) compare a range of functional features assumed to be relevant for the production of these forms, and (ii) determine which of the three dimensions of constructional meaning is most important.
The goal of this study is to address the following two research questions:
1. First, if gonna is becoming emancipated from going to, as suggested by Lorenz (2013b) (see section 2.2 for discussion), we would expect to find significant differences between the two variants in terms of their semantico-pragmatic meaning.
2. Second, we are interested in testing whether the social meaning (informality feature) associated with gonna is an intralinguistic rather than an extralinguistic predictor, that is, a part of constructional meaning, and not just ‘different ways of saying the same thing’ (Labov 1972: 323). This is achieved by testing the impact of formality within one larger register, rather than comparing different registers.
We here underline the role of social meaning as an intralinguistic predictor: this means that speakers not only use the full and contracted forms in different situational contexts, but that the associations of (in)formality are part of the variants’ meaning, regardless of the situational context. To assess this, we attempt to tease out register differences within a single text type (informal personal online narratives). Overall, the confirmation of either question 1 or 2 would lend support to the principle of No Equivalence. Moreover, while the present study is entirely synchronic, it may shed some light on the question of whether competing variants are bound to change (Fleischman 1982), or whether they can achieve ‘allostructional stability’ (Nesset & Janda 2023): in the former case, we would expect there to be incipient differences in semantico-pragmatic meaning; in the latter, the association with social meaning may restrict such a development. In this regard, we explore how social meaning may fit in with other proposed factors of conventionalisation, such as frequency, coalescence, semantic reanalysis and favourable phonology (Lorenz & Tizón-Couto 2020).
The rest of the article is structured as follows. In section 2, we present in more detail the principle of No Equivalence and the three-dimensional model of constructional meaning that it relies on, as well as processing factors on constructional use that are beyond its scope. We use this model to revisit previous studies of contracted modal constructions and the future alternation, especially the meaning-based arguments put forward to consider the former distinct constructions. In section 3, we describe the design and methods used for our study of the relationship between going to and gonna, including a description of the corpus, the operationalisation of the variables, the blind annotation framework and the statistical models chosen. In section 4, we report the results of the corpus study, especially the significance of ‘informal register’ as a predictor for the choice of gonna over going to, to the exclusion of other meaning variables. In section 5, we discuss and conclude on the relevance of these results for the validation of the hypothesis that gonna is a distinct construction from going to on the one hand, and for the validation of social meaning as a primary source of equivalence avoidance on the other hand, as well as some possible developments of these issues in future Construction Grammar research.
2. A three-dimensional model of meaning for alternations
2.1. The principle of No Equivalence
As mentioned earlier, the principle of No Equivalence is based on three dimensions of constructional meaning (Leclercq & Morin 2023, 2025). These have been explicitly defined in response to several issues raised about the internal consistency of Construction Grammar tenets and the explanatory power of the theory when faced with empirical facts. The following paragraphs briefly unpack each dimension for the sake of clarity in later sections.
Firstly, No Equivalence predicts that two alternative constructions can exhibit semantic differences. This is the most typical kind of difference already modelled in Goldberg’s (1995) principle of No Synonymy. More specifically, semantic meaning in this framework is identified as truth-conditional or propositional meaning. For example, in the modal domain, Leclercq & Morin (2023) suggest that the modal constructions can and could exhibit at least two major semantic differences: one relating to temporal location (‘present’ time sphere vs ‘past’ time sphere), and the other relating to the type of possibility (‘root’ vs ‘epistemic’, Coates 1995).
Secondly, No Equivalence predicts that two alternative constructions can exhibit differences in pragmatic meaning. These were also initially modelled by the principle of No Synonymy, but their scope was rather broad to the extent that they included any aspect of non-truth-conditional meaning, such as ‘particulars of information structure, including topic and focus, and additionally stylistic aspects of the construction such as register’ (Goldberg 1995: 67). For the sake of explicitly highlighting stylistic aspects as ‘social meaning’ (Morin 2023) discussed below, the present framework narrows down pragmatic meaning to ‘utterance-focused features of meaning such as presuppositions, implicatures, illocutionary acts, speaker attitudes and information structure’ (Leclercq et al. 2025). One notable example in the modal domain is the alternation between can/could and be able to, the latter conveying a specific implicature of actualisation (Leclercq & Depraetere 2022) despite a shared semantic meaning of modal possibility.
Thirdly and finally, No Equivalence predicts that two alternative constructions can exhibit differences in social meaning. This is defined as ‘information drawn about speakers and their communicative practices’ (Leclercq et al. 2025; Hall-Lew et al. 2021), and it includes at least two important subtypes according to Morin (2023, forthcoming a,b). The first subtype, ‘interactional’ social meaning, focuses on the communicative practices themselves. It includes a particularly notable source of synonymy avoidance cited by Goldberg (1995) and under consideration in this study, that of ‘register’ (Biber & Conrad 2009). Register in Biber & Conrad’s sense is defined as a functional relationship between linguistic features and situational context, and since it is a theoretical construct that cannot be directly observed (Li et al. 2023: 427), it can be operationalised either in terms of the situational context or the linguistic features associated with that context. Variationist corpus studies have typically chosen the former approach, operationalising formal and informal registers as different (sub)corpora (Engel & Szmrecsanyi 2022; Li et al. 2023; Szmrecsanyi & Engel 2023). In this study, however, we opt for the latter approach and operationalise register in terms of the linguistic features in the context surrounding each instance of the two modal constructions under scrutiny (section 3). One notable example of a constructional alternation driven by register is the variable realisation of complementisers (e.g. I think that/Ø Taylor Swift is a great singer, Gadanidis et al. 2021), which had initially been levelled against the principle of No Synonymy (Kinsey et al. 2007). The second subtype, ‘sociocultural’ social meaning, encompasses the range of supralocal and sublocal social categories that have been shown to stratify inherent variability in language use (Eckert 2012), e.g. the regional and ethnic meaning of double modals in American English (Morin & Grieve 2024), but is not the primary focus of this study.
Importantly, Leclercq et al. (2025) posit No Equivalence as a principle applying to entrenched and conventional constructions, i.e. stored units of the mind and normalised units in the community (Ellis et al. 2009; Blumenthal-Dramé 2012; Divjak & Caldwell-Harris 2015; Schmid 2020). It is thus compatible with external pressures of language use, which are modelled beyond its scope and result in what is called ‘good-enough production’ in language (Goldberg & Ferreira 2022). Perhaps the most notable of these pressures include factors of communicative efficiency (Levshina 2022) and language processing speed (Christiansen & Chater 2016), such as in the form of the Complexity Principle (Rohdenburg 1996) or information density optimisation (Levy & Jaeger 2007) in the grammatical domain. Factors of this kind have been shown to be relevant for understanding the relationship between contracted forms of modal constructions and their full forms: for example, Levshina & Lorenz (2022) find an effect of meaning predictability on the use of shorter and less effortful constructions such as wanna compared to want to.⁴ However, they find that this effect is limited, and that stylistic and social differences are more important for understanding the alternation. We return to this issue in the following sections, but for the moment we want to underline that processing factors are complementary rather than contradictory with respect to No Equivalence. In fact, in this paper, we operationalise a range of processing factors to get as full a picture as possible of explanatory factors for variation between going to and gonna.
2.2. Meaning differences in modal alternations: going to and gonna
Having delineated a three-dimensional model of constructional meaning from external pressures of language use, we now turn to briefly reviewing previous research on alternations that are most relevant for the present study, and we revisit them in light of our classification to get a clearer sense of the types of meaning differences at play in their variation, starting with research on contracted modal forms. As introduced earlier, a notable research question in Construction Grammar studies has been whether such forms can be considered distinct constructions. A noteworthy set of case studies has focused on contracted forms of modal and semi-modal constructions, especially in the context of their diachronic emergence (Krug 2000, 2001; Lorenz 2020; Azorin 2025), reassessing the general claim that contractions are simply ‘colloquial pronunciation variants that are semantically identical to their full form’ (Daugs 2022: 221). Nesselhauf (2014), for instance, uses corpus data to argue that the reduced form ’ll has evolved into an independent construction from the full form will, based on the specific meaning of ‘spontaneous intention’. Similarly, Flach (2021) shows that will and ’ll have very different adverbial collocations, suggesting that this meaning divergence is driven by collocational preferences. According to Daugs (2021), the contracted forms can’t, won’t and ’d exhibit preferences for specific modality types across the spectrum of ‘epistemic’, ‘deontic’ and ‘dynamic’, making them distinct constructions from their full forms cannot, will not and would.
The case of contracted modal constructions joins a seemingly more general pattern of abbreviation in constructions, which correlate with spoken and informal linguistic registers (Rickford et al. 1995; Lorenz 2013a,b,c; Hilpert et al. 2021, 2023a,b; Lorenz 2020; Lorenz & Tizón-Couto 2024; Marttinen Larsson 2024), and which, following usage-based cognitive principles, would entrench and conventionalise this register-sensitive information in our linguistic associations (Schmid 2020).
In the case of going to and gonna, Berglund (2005: 166) states that the two forms are distinguished by non-standardness (in fact an intersection of the social categories of age/sex and register), as gonna is almost exclusively found in spoken data and in speech-like contexts in writing and is preferred by younger male speakers. She argues, however, ‘that [the pattern according to which] going to and gonna are indeed variant forms of one expression is obvious when the collocational patterns are examined’ (2005: 166), though it must be pointed out that the collocational profiles she reports are raw frequencies. Alternatively, Lorenz (2013c: 45–6) claims that, in spoken American English, be gonna is undergoing a process towards lexical independence from going to on the basis of the contracted forms’ increasing frequency in spoken language, the levelling of register differences, and an incipient semantic divergence between the two forms where gonna appears to convey ‘predictive’ meaning significantly more often than going to. This emancipation of gonna would mean that the variants are separating into two constructions.
These somewhat conflicting accounts present two hypotheses that can be tested empirically. If going to and gonna are variants of the same construction, as suggested by Berglund (2005), we would expect their collocational profiles to largely overlap, while there would be no significant differences in terms of their meaning. Alternatively, if going to and gonna are in the process of becoming different constructions, as put forward by Lorenz (2013a,b,c), we expect to see contrasting collocational profiles in addition to divergent meanings. Given that these two expressions can be considered part of a family of constructions (Bergs 2010), we will in the following consider the contrast between the two most salient members of the family, namely will and going to.
2.3. The English future alternation: will and going to
As we saw in section 2.1, the three-dimensional model of constructional meaning includes semantic, pragmatic and social subtypes. Starting with the semantic level, the only features of will and going to that can be said to convey truth-conditional meaning relate to temporal reference (present vs future). Uses with present time reference can in turn be subdivided into modal (epistemic and deontic) and temporal (habitual and generic) meanings, while uses with future time reference are always temporal, but can be said to communicate intentions/plans or predictions.
The latter distinction is a notable one in the literature on tense and it has long been assumed that will is mostly used to express a ‘prediction-based future’ (Dahl 2000: 310) referring neutrally to the future (Wekker 1976: 39; Comrie 1985: 44), while going to typically expresses an ‘intention-based future’ (Dahl 2000: 310) or ‘future fulfilment of present intention’ (Leech 1971: 54; Quirk et al. 1985: 214).⁵ The difference between prediction and intention is illustrated by the examples in (1a) and (1b), respectively.
Corpus-based research has confirmed this assumption, by empirically showing that the collexemes of will generally exhibit low degrees of agentivity and dynamicity, as opposed to the collexemes of going to (Gries & Stefanowitsch 2004), while the use of first person as an index of intention and willingness has continued to increase with the form going to in British and Canadian English (Tagliamonte et al. 2014; Denis & Tagliamonte 2018). Berglund & Williams (2007: 116) find that ‘going to is found in a more predominantly intentional use than gonna’ in their sample from the British National Corpus, suggesting ongoing emancipation of the contracted form in British English.
Other features are clearly not truth-functionally distinct, as they express subtle differences in degree of temporal proximity and speaker certainty, or separate presuppositions. Starting with temporal proximity, going to is said to be more common than will ‘if the event referred to is in the near or immediate future’ (Wekker 1976: 132) and to have ‘a stronger focus on matrix time’ (Bergs 2010: 218). In contrast, will ‘shows a lesser degree of immediacy’ (Huddleston & Pullum et al. 2002: 211). Corpus studies on the English future alternation have not provided clear answers regarding the role of temporal proximity. Torres Cacoullos & Walker (2009: 331) provide a fine-grained temporal annotation, but do not find a straightforward linear correlation between degree of temporal proximity and choice of future marker. Denis & Tagliamonte (2018: 416) find that distal temporal reference (occurring farther than a day into the future) favours going to, while Glynn & Mikkelsen’s (2024: 13) results show that this context favours will. According to Szmrecsanyi & Engel (2023: 99), hodiernal temporal reference (taking place on the same day) is a significant predictor of going to, but only in certain registers.
As for speaker certainty, going to is often considered to convey an assumption that is taken for granted as part of future reality rather than a judgment about the future (Joos 1964: 22), suggesting that it expresses a higher degree of certainty than does will. Similarly, Gries & Stefanowitsch (2004: 113–14) note that in English students’ grammars, going to expresses a greater certainty on the part of the speaker than will when talking about future events not involving oneself. However, this is contradicted by Leech’s (1971: 65) proposed hierarchy of certainty in future expressions, with the present futurate > will and shall > going to representing a cline from more to less certain, thus placing will above going to in the certainty hierarchy. To our knowledge, the only corpus study of the English future alternation that has tackled the issue of speaker certainty has yielded inconclusive results (Glynn & Mikkelsen 2024: 13).
Overall, the factors of temporal proximity and speaker certainty can be seen as related, or even as superficial manifestations of the same underlying difference according to which going to is described as expressing ‘future culmination of present cause’ (Leech 1971: 54; Quirk et al. 1985: 214) or ‘the implication that all conditions for the future event have been met’ (Wekker 1976: 127), while will is said to be dependent on future contingencies (Binnick 1972: 3). Thus, while sentence (2a) is seen as ‘elliptical’ unless accompanied by an if-clause expressing a contingency, sentence (2b) is interpreted as unaffected by any such contingencies:
Within a relevance-theoretic account, will and going to impose future and present processing contexts respectively (Haegeman 1989; Nicolle 1997). From the point of view of cognitive grammar, Brisard (2001: 252) schematically characterises will as ‘non-given and non-present’ and going to as ‘non-given and present’. In his analysis, ‘givenness’ conveys that the information is construed as both familiar and shared, while the notion of ‘presence’ conveys the idea that the statement can be checked through perception or other means of verification. Similarly, Celle & Lansari (2009: 107–8) describe will as detached from the present of the utterance and going to as portraying the future event as intrinsically linked to the present of the utterance.
Turning to the third dimension, the social meaning level of constructional distinction would capture the widely shared assumption that going to is more frequent in informal registers, while will is more frequent in formal registers (Wekker 1976; Berglund 2005; Engel & Szmrecsanyi 2022; Szmrecsanyi & Engel 2023). However, and importantly for the wider implications of this article, it is unclear whether this register variation is considered to be an internal part of the meaning of the alternatives. Notably, in the two most recent studies of this list, register is posited as an external constraint on variation between the forms, in keeping with the operationalisation of the sociolinguistic variable as a set of truth-conditional (semantic) equivalents (‘different ways of saying the same thing’). A crucial point in this article is to consider a different perspective, namely, that register should also be conceived as part of the experiential meaning of constructions (Leclercq & Morin 2025) and, therefore, that it is a distinct source of isomorphism and constructionhood.
Finally, regarding potential processing factors affecting the use of either variant, it has been shown that subordinate clauses and contexts of interrogation and negation (i.e. syntactically complex environments) favour the longer form going to, while contexts of apodosis favour the shorter will (Szmrecsanyi 2003; Torres Cacoullos & Walker 2009; Denis & Tagliamonte 2018; Mikkelsen & Hartmann 2022). In all these studies, however, the full and contracted forms are treated together. Once ’ll and gonna are treated as separate from will and going to, the overall picture changes: negation and interrogation are associated with the shorter form gonna in British English, while the preference for going to and gonna in contexts of subordination may be better explained by the strong association between main clause and the short form ’ll (Mikkelsen & Hartmann 2024).
3. Data and methods
Our empirical study is carried out on a dataset extracted from the British part of the LiveJournal Corpus (Speelman & Glynn 2012), which consists of informal personal online narratives spanning a ten-year period (2002–12), contains over 10 million words, and is subdivided into 2,145 blogs, representing different speakers (blog authors). This corpus was preferred over so-called ‘balanced’ corpora because ‘no corpus can ever hope to be representative of a language’ (Glynn 2010: 11). Moreover, it is precisely the focused nature of this corpus, restricted to a single textual genre, which allows us to test the independence of social meaning from register understood as situational variety.
All instances of going to and gonna following the verb be (as shown by Azorin 2025, copula-deletion is distinctly associated with gonna, and for this reason such cases were omitted) were extracted with the lexical verb (the infinitival complement), resulting in a sample of 8,331 occurrences. This full sample was submitted to a distinctive collexeme analysis (Gries & Stefanowitsch 2004) using the R-script Coll.Analysis 4.1 (Gries 2024). A subsample of 500 occurrences was then randomly selected and manually annotated for a series of usage features described below. The relative impact of each of these features on the choice of the outcome (going to vs gonna), as well as potential interactions, were tested with binary mixed-effects logistic regression using the lme4 package (Bates et al. 2015). Regression modelling can be described as a family of techniques that are used to detect correlations between dependent variables and independent or explanatory variables. However, given that these explanatory variables are merely operationalisations of hypothetical mental categories, we can say that regression modelling is used to infer, rather than determine, causal relationships between these mental categories and linguistic forms. We use the term Behavioural Profile approach in the sense of Gries (2010: 325ff.) to refer to the pairing of manual annotation and multifactorial statistics commonly employed in alternation studies where the outcome or predicted dependent variable is the grammatical choice of the speaker, and the independent variables used to predict that choice are a set of hypotheses (Denis & Tagliamonte 2018; Laporte et al. 2021; Mikkelsen & Hartmann 2022; Tizón-Couto 2022; Nesset & Janda 2023; Szmrecsanyi & Engel 2023). In this study, the independent or explanatory variables correspond to the three meaning dimensions outlined above, as well as processing factors further described below. In addition to these fixed effects (the independent variables), mixed-effects modelling also covers random effects designed to capture variables that are open-ended and unbalanced. Specifically, lexical verb is included as a random effect to measure potential lexical effects on the choice of the variant.
Yet other factors that could potentially impact the results of the study were controlled in other ways. Since corpora are naturally skewed with regard to the number of words per speaker, interspeaker variation (idiolect) was controlled for by extracting only one token per speaker per construction. It is important to keep in mind that going to and gonna are unevenly distributed across the corpus: there are 6,373 instances of going to used by 1,082 different speakers, as against 1,958 instances of gonna used by 502 different speakers. It follows that the present study ignores style in the sense of Labov (1972), i.e. intraspeaker variation. Social metadata such as age and region within the UK is also not available for the corpus and thus not included. During the annotation process, the two forms (going to/gonna) were hidden from the annotator to minimise the effects of annotator bias (see table 1). The fully annotated corpus data is made available in a repository referred to in the Appendix. We now turn to describing the annotation process of the semantic, pragmatic, social and processing variables.
Table 1. Hidden items with context to left and right

First, we operationalised semantic meaning as a single variable (in initial caps) with several values or levels (in inverted commas): Communicative Function (‘future intention’, ‘future prediction’, ‘modal’, ‘habitual/generic’). Due to the scarcity of epistemic/deontic modal and habitual/generic readings in the forms found in the corpus, these values were combined. Examples of an intention (3), a prediction (4), a (deontic) modal (5) and a generic (6) are provided below (blog pseudonym in parentheses).
Second, we operationalised pragmatic meaning in three different variables:
(i) Temporal Proximity (‘proximal’ vs ‘distal’ vs ‘non-future’)
(ii) Speaker Certainty (‘strong’ vs ‘weak’ vs ‘non-committing’)
(iii) Contingency (‘present’ vs ‘future’ vs ‘non-contingent’)
Examples (7) and (8) illustrate the difference between ‘proximal’ and ‘distal’, (9) and (10) exemplify ‘strong’ and ‘weak’ certainty, while (11) and (12) provide examples of ‘present’ and ‘future’ contingencies. In (11) the event (buy a new coat) is presented as a future consequence of a present circumstance (Its so cold that), and thus as inevitable, while in (12) the event (be a complete bitch) is presented as depending on an implicit if-clause (if I do not have a bubble bath and a glass of wine), and thus as something that can be avoided.
Third, we operationalised social meaning (register) in two different variables:
(i) Topic of Discourse (‘personal’ vs ‘appraisal’ vs ‘dialogue/fiction’)
(ii) Degree of Formality (‘formal’ vs ‘informal’)
In most examples, the blog authors are talking about themselves and their own personal life, as illustrated in examples (3)–(12). On occasion, however, the author engages with a topic that is not exclusive to their own life or that of close friends and relatives. For instance, in example (13), the topic is a new single from the band Muse. Such examples were annotated as instances of ‘appraisal’. The third type of topic concerns instances of narration and dialogue (fan fiction), such as (14), where Sam and Kurt are fictional characters. Such examples were annotated as instances of ‘dialogue/fiction’. Finally, for the feature Degree of Formality, examples like (15) were annotated as ‘informal’ relative to examples such as (16) based on the surrounding context.

Importantly, the annotation framework was applied to a larger span of context than is shown in these examples (up to 500 characters), and the actual constructions were hidden from the annotators, as exemplified in table 1, which includes examples (15) and (16).
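The blinding step can be pictured with a short sketch. It is only an illustration under stated assumptions, not the procedure used to build table 1: the function names, the mask string and the reading of ‘up to 500 characters’ as a window on each side of the target are all hypothetical choices.

```python
import re

# Hypothetical pattern for the two variants (case-insensitive).
VARIANT = re.compile(r"\b(going to|gonna)\b", re.IGNORECASE)


def mask_item(text, start, end, window=500, mask="____"):
    """Replace the target span text[start:end] with a mask and keep
    up to `window` characters of context on each side (an assumption)."""
    left = text[max(0, start - window):start]
    right = text[end:end + window]
    return f"{left}{mask}{right}"


def masked_items(text, window=500):
    """Return one masked item per occurrence of either variant.
    Other occurrences inside the context window are left untouched
    in this sketch; only the target itself is hidden."""
    return [
        mask_item(text, m.start(), m.end(), window)
        for m in VARIANT.finditer(text)
    ]
```

Applied to a blog entry, `masked_items(entry)` would yield one item per target, each showing the annotator the surrounding context but not the form that was actually used.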
Clause Type (‘independent’, ‘dependent’) and Sentence Type (‘affirmative’, ‘negative’, ‘interrogative’) were added as processing variables. The first feature concerns whether or not a given instance is syntactically embedded in another clause, as in (10), (11) and (12) above. The second feature is illustrated for each subtype in (3), (6) and (17), respectively.
Finally, the framework includes a processing-related variable of Recency, or repetition in discourse, i.e. mention of the same construction in the preceding context (within the same blog entry): this means that the variable Recency of a given construction (target) is annotated as ‘gonna’ if the previous modal is gonna (prime), as ‘going’ if the previous modal is going to (prime) and as ‘no mention’ if there is no previous modal within the same blog entry.
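The Recency coding rule lends itself to a small sketch. The version below is a simplification and not the annotation script used in the study: the function name is hypothetical, and the pattern matches every string going to or gonna without checking that it is the future construction (so motion uses such as going to London would wrongly count as primes).

```python
import re

# Hypothetical pattern; it does not distinguish future from motion uses.
VARIANT = re.compile(r"\b(going to|gonna)\b", re.IGNORECASE)


def recency(entry, target_start):
    """Label a target by the closest preceding variant in the same entry.

    Returns 'gonna', 'going' or 'no mention', mirroring the three levels
    of the Recency variable."""
    previous = None
    # Only the text before the target can contain a prime.
    for match in VARIANT.finditer(entry[:target_start]):
        previous = match.group(1).lower()  # keep the last one found
    if previous is None:
        return "no mention"
    return "gonna" if previous == "gonna" else "going"
```

In this sketch only the nearest preceding form counts as the prime, which is one possible reading of ‘the previous modal’.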
In the case of Communicative Function, a bottom-up frame-based approach was followed, while Speaker Certainty, Temporal Proximity and Degree of Formality were annotated using nine-point Likert scales, which were then reduced to three or two points (see Mikkelsen forthcoming for a more thorough explanation of the operationalisation and annotation process). The quality of the blind annotations by the authors (one native and one non-native speaker of English) was assessed by measuring inter-annotator agreement. We found substantial agreement between the two annotators, as shown in table 2 below, where values above 0.80 indicate ‘strong’ and above 0.90 ‘almost perfect’ agreement (McHugh 2012: 279). In case of disagreement between the two annotators, the instances were discussed and, if agreement could not be reached, the native speaker’s assessment was given preference, following Berglund & Williams (2007: 109).
Table 2. Inter-annotator agreement, measured with Cohen’s kappa
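For readers unfamiliar with the measure, Cohen’s kappa compares the observed proportion of agreement with the agreement expected by chance from each annotator’s label frequencies. The following minimal sketch uses a hypothetical function name and invented labels; it does not reproduce the values reported in table 2.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' label proportions.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)
```

With four invented items on which the annotators agree three times, for example `cohen_kappa(["formal", "formal", "informal", "informal"], ["formal", "informal", "informal", "informal"])`, observed agreement is 0.75 and chance agreement 0.5, so kappa is 0.5.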

Given the literature discussed in sections 2.2 and 2.3, we would expect gonna to correlate with informal contexts by contrast with going to, both in terms of Topic of Discourse and Degree of Formality. Regarding the semantic and pragmatic factors, if gonna is used more in its predictive sense than earlier, we might expect that it is moving towards the semantico-pragmatic space of will, associated with predictions, distal temporal proximity, low speaker certainty and future contingency. As for processing factors, we can expect that syntactically complex contexts such as interrogation, negation and subordination should correlate with going to. In addition to these main predictors, as mentioned above, a random variable of ‘lexical verb’ (the infinitival complement occurring in the constructional slot gonna/going to V) is included in the model.
4. Results
We start by reporting the results of the distinctive collexeme analysis, produced using the R-script Coll.analysis 4.1 (Gries 2024), which compares expected and observed frequencies and ranks the collexemes according to different association measures. Here we have chosen the log likelihood ratio (LLR). Table 3 shows the 25 verbs most associated with each form. Note that we do not provide a cut-off point for significance, as the calculation of p-values assumes that each item is an independent observation, and the nature of corpus data violates this assumption (see Schmid & Küchenhoff 2013: 537ff. for discussion). While the LLR values in this table cannot be interpreted in absolute terms, they are generally quite low, indicating that most verbs are relatively evenly distributed between the two forms. The notable exception is gonna go, which appears to be especially chunked compared to going to go.
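As a rough illustration of the association measure, the log likelihood ratio for a single verb can be computed from a 2×2 table of observed frequencies (verb vs all other verbs, in each of the two constructions). The sketch below is not the Coll.analysis script; the function name and any verb counts passed to it are hypothetical, and only the construction totals (1,958 tokens of gonna and 6,373 of going to) come from the sample described in the methods section.

```python
import math


def log_likelihood_ratio(verb_in_a, verb_in_b, total_a, total_b):
    """G-squared for one verb across two constructions A and B.

    The 2x2 table crosses (this verb / all other verbs) with
    (construction A / construction B)."""
    observed = [
        [verb_in_a, verb_in_b],
        [total_a - verb_in_a, total_b - verb_in_b],
    ]
    n = total_a + total_b
    row_sums = [sum(row) for row in observed]
    col_sums = [total_a, total_b]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            e = row_sums[i] * col_sums[j] / n  # expected under independence
            if o > 0:  # empty cells contribute nothing to the sum
                g2 += o * math.log(o / e)
    return 2 * g2
```

A verb whose tokens are split between the two constructions in proportion to the construction totals receives a value of 0; the further the split departs from that proportion, the larger the value, which is why low values across table 3 point to an even distribution.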
Table 3. Most distinctive collexemes of going to/gonna (Log likelihood ratio)

The collexemes suggest that going to is more associated with intention verbs, such as ‘see’ (which is here used primarily in the sense of ‘try’), ‘try’ and ‘attempt’, while gonna shows a preference for cognition verbs such as ‘realise’, ‘hate’, ‘scar’ (here, emotionally), ‘haunt’ and ‘miss’. This may be interpreted as an effect of the intention sense being stronger with going to and the prediction sense with gonna. On the other hand, when we interpret the collexemes not in terms of their semantics, but cluster them according to their social meaning potential, a different picture emerges. Here, we see that certain collexemes of gonna are associated with the personal sphere such as ‘hang (out)’, ‘pick (up)’, ‘drink’ and ‘cum’, while others convey strong (negative or positive) emotions such as ‘love’, ‘hate’, ‘haunt’, ‘scar’, ‘rock’ and ‘miss’.Footnote 6 Comparatively more formal verbs, such as ‘become’, ‘attempt’, ‘provide’ and ‘affect’, are associated with going to, suggesting that the most important factor explaining the variation is the social dimension of meaning. However, as pointed out above, the LLR values are low for most verbs; moreover, many of the verbs are very rare in the sample. For this reason, we need to compare the results of the collexeme analysis with the more qualitative profile-based approach.
Moving on to the Behavioural Profile analysis of the study, the results of the regression model are presented in table 4, while the effect plots of the significant fixed effects are presented in figure 1. This figure should be read as follows: the numbers on the y-axis indicate probabilities (rather than odds), while the values on the x-axis represent the variable levels and the yellow whiskers indicate confidence intervals at the 95 per cent level. Following recommendations by Tizón-Couto & Lorenz (2021), we here report the full model, including the predictors that are not significantly associated with either going to or gonna. The variables are ranked in order of relative importance, as indicated by the effect size, and a positive number means that the variable correlates with gonna, while a negative number means that the variable correlates with going to. The two most important predictors according to this measure are Recency (mention of gonna in the preceding context) and Topic of Discourse (appraisal, that is, speakers engaging in topics that go outside of their own personal sphere). However, the relevant values of both predictors are infrequent in the sample (see table 5), which means that their effects are uncertain: there are only 52 instances in which one of the forms is mentioned in the preceding discourse (Recency) and merely 26 instances of ‘appraisal’ in the sample. Thus, while Recency is a strong predictor of the choice of form (a mention of gonna in the preceding discourse leads to a repetition of the form in 87 per cent (20/23) of the cases; for going to this figure is 79 per cent, or 23/29), we must recall that this feature is absent from a majority of the items (n=448) in the sample.
When taking into consideration the high degree of uncertainty of the variables Topic of Discourse and Recency, as indicated by the large whiskers in figure 1, it seems that the single most important feature is Register (‘informal’), that is, Degree of Formality understood as the perceived formality based on a close examination of the linguistic context (table 1). Moreover, this variable was the only one that performed significantly better than baseline (C=0.67) when included as a single predictor in a separate regression model. Finally, contexts of subordination (ClauseType ‘dependent’) are associated with the longer form going to, as expected, though the small effect size and the large whiskers in figure 1 mean that we cannot reject the null hypothesis (i.e. that clause type plays no role in the choice of forms). The other processing variable, namely the type of sentence, does not achieve statistical significance. Similarly, the semantico-pragmatic variables Communicative Function, Contingency, Speaker Certainty and Temporal Proximity do not significantly contribute to the model.
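To make the relation between model coefficients (log-odds) and the probabilities plotted in an effect plot concrete, the sketch below applies the inverse logit to a linear predictor. It assumes a random effect of zero, and any intercept or coefficient passed to it is a placeholder rather than an estimate from table 4; the function names are hypothetical. The helper `repetition_rate` simply recomputes the raw Recency proportions cited above.

```python
import math


def inv_logit(log_odds):
    """Convert log-odds to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-log_odds))


def predicted_probability(intercept, coefficient=0.0):
    """Probability of gonna for a level with the given coefficient,
    relative to the reference level (coefficient 0)."""
    return inv_logit(intercept + coefficient)


def repetition_rate(repeated, primed):
    """Share of primed targets that repeat the primed form."""
    return repeated / primed
```

A positive coefficient raises the predicted probability of gonna relative to the reference level, and a negative one lowers it. The raw proportions `repetition_rate(20, 23)` and `repetition_rate(23, 29)` correspond to the 87 and 79 per cent reported for gonna and going to.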
Table 4. Mixed-effects binary logistic regression model of speaker choice for going to vs gonna

Note. Positive coefficients predict gonna. Significance codes: *** = p < 0.001; ** = p < 0.01; * = p < 0.05; NS = p > 0.05.

Figure 1. Effect plot of fixed effects showing the predicted probability of gonna for each level on the y-axis
Table 5. Tokens per variable/level for the full dataset

Examining the model statistics on the right side of table 4, we see that the overall predictive power of the model, as expressed by the index of concordance (C-score), is 0.76, or ‘acceptable discrimination’ (Hosmer et al. 2013: 177). This number should be interpreted relative to the similarity of the things compared, and when taking into account that the two forms are typically described as synonyms, acceptable discrimination is, in reality, quite good. The small difference between the marginal and conditional R², which respectively express the proportion of variance explained by the fixed effects and by the combination of fixed and random effects, indicates that lexical effects play a minor role in the choice of form. The highest of the VIFs (‘variance inflation factors’) is above 9, which may point to collinearity, i.e. highly correlated predictors. Since some researchers warn against using VIFs to make decisions about model selection (Winter 2019: 114), we examined the possible correlation between variables using a chi-squared test of independence. This test revealed significant associations between Communicative Function and Speaker Certainty (p<0.001), as intentions and plans are correlated with ‘strong’ speaker certainty and predictions with ‘non-committing’ speaker certainty, and between Communicative Function and Sentence Type (p<0.001), as contexts of interrogation are preferred with predictions and contexts of negation with non-future uses. However, we must recall that none of these variables contributes significantly to the model. Moreover, we also used backward selection (i.e. removing one variable at a time from the full model) to obtain a minimal model containing only the predictors that achieved significance. This minimal model was not significantly different from the full model (based on the anova function), performed the same in terms of predictive accuracy (C=0.76) and had very low VIFs (all around 1), suggesting that the significant predictors are not strongly correlated with one another.
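The index of concordance can be read as the proportion of (gonna, going to) pairs of observations in which the model assigns the higher predicted probability of gonna to the observation that actually contains gonna, with ties counting as half. A minimal sketch of this pairwise definition follows; the function name and any input values are hypothetical, and the coding of gonna as 1 and going to as 0 is an assumption made for illustration.

```python
def concordance_index(probabilities, outcomes):
    """C-score from predicted probabilities and binary outcomes (1 or 0)."""
    positives = [p for p, y in zip(probabilities, outcomes) if y == 1]
    negatives = [p for p, y in zip(probabilities, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes must be present")
    score = 0.0
    # Compare every positive observation with every negative one.
    for p in positives:
        for q in negatives:
            if p > q:
                score += 1.0   # concordant pair
            elif p == q:
                score += 0.5   # tied pair
    return score / (len(positives) * len(negatives))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which is the scale on which the reported C-scores are interpreted.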
As mentioned above, the model points to a very slight contribution from lexical effects and none of the semantic/pragmatic variables showed any preference for either of the two forms. We can thus conclude that the two forms display no significant differences in meaning as it is traditionally understood. However, if one adopts a broader definition of meaning, such as the one presented in section 2, then the observed register differences point to a distinction in social meaning. Overall, then, the results of the regression modelling, just like those of the collexeme analysis, seem to suggest that the social dimension of meaning is the driving factor in the choice between going to and gonna.
5. Discussion
The results of both the distinctive collexeme analysis and the Behavioural Profile analysis converge to highlight social meaning as a key driver of the alternation between going to and gonna. The low Pearson residuals and the minimal difference between marginal and conditional R² values suggest that this alternation is not tied to specific verbs but rather to sensitivity to broader linguistic and situational contexts. One particularly strong predictor is repetition in discourse, which, while rare in the sample, exhibits high predictive accuracy. This pattern is not merely a processing effect but can also be interpreted as a register effect: speakers tend to maintain stylistic coherence within stretches of discourse, shifting register only when situational demands justify it. For instance, in (18), the narrator employs going to in a descriptive mode, while a character’s dialogue shifts to the more colloquial gonna. This alternation underscores a key finding: gonna functions as a distinct construction, separate from going to, due to its association with register-driven social meaning.

Overall, our findings offer new perspectives on how register meaning operates within Construction Grammar. As mentioned in section 2.1, variationist studies have tended to operationalise register as situational context, often using (sub)corpora as proxies (e.g. formal versus informal, written versus spoken) (Engel & Szmrecsanyi 2022; Li et al. 2023; Szmrecsanyi & Engel 2023). By contrast, our approach examines a single register (blogs, representing informal written discourse) and operationalises register in two ways: Topic of Discourse (e.g. personal, appraisal, or fiction) and Degree of Formality, measured through the linguistic features in the surrounding context of each instance. Importantly, we find that Degree of Formality operates independently of both broader register categories (written informal) and narrowly defined situational contexts.Footnote 7
This distinction has significant theoretical implications. It suggests that knowledge of register extends beyond understanding which contexts suit a given construction. Instead, the construction itself carries inherent social associations, regardless of situational context. For gonna, these social associations are tied to its status as a colloquial form. Our results thus support a growing body of evidence within Construction Grammar that contracted verbal forms, particularly modal constructions, constitute distinct constructions with unique meanings (Lorenz 2013a,b,c; Nesselhauf 2014; Flach 2021; Daugs 2021, 2022; Levshina & Lorenz 2022). Moreover, our results challenge the notion that colloquial variants are peripheral. Instead, the ‘colloquial’ nature of gonna is central to its meaning as a construction, emphasising that register-based variation can act as a crucial mechanism for the meaning differentiation of formally similar constructions. The ability of gonna to convey register-specific social meaning underlines its distinctiveness and reinforces the idea that colloquial forms deserve attention not as deviations or subforms, but as constructions in their own right.
Of course, the findings of our study emphasise social meaning as a key distinguishing factor, but future research building on this contribution would need to take into account other processing factors, which were not included in our models mainly due to the nature of the corpus data. For example, there are reasons to believe that going to and gonna are associated with a family of constructions centred on the V-to-Vinf schema (Tizón-Couto & Lorenz 2018; Lorenz & Tizón-Couto 2020, 2024), whose realisations have been shown to be affected by phonological, frequency-related and inferential factors (Lorenz & Tizón-Couto 2020: 93–7). Another crucial point to keep in mind is that the scope of the Principle of No Equivalence is specifically the ‘construction’, in its technical sense as an entrenched and conventionalised unit in the constructional network (Schmid 2015, 2020; Diessel 2019b). We consider it safe to assume that gonna meets these two criteria of constructionhood in light of the previous research we have reviewed,Footnote 8 but we would need to test this assumption more directly in a future study, as it would allow us to distinguish register-specific constructions from realisations that only sporadically occur in different registers.
One interesting example is the reduced variant havda (for have to), which has been shown by Lorenz & Tizón-Couto (2024: 19) to be avoided in professional speech: Lorenz & Tizón-Couto (2020) point out that it is unclear whether this variant is sufficiently entrenched and conventionalised to be considered a construction; we would therefore remain cautious in stating that it is accounted for by No Equivalence. Similar remarks hold for the realised variants try to Vinf versus try and Vinf in different native varieties of English, which are also influenced by register and formality effects (Tizón-Couto 2022): No Equivalence would apply to these variants only if they are considered an ‘alternation’ in its technical sense, i.e. a paradigm involving two entrenched and conventionalised constructions. Ascertaining whether these two essential preconditions to the principle are met is ultimately an empirical question.
6. Conclusion
This study has mostly focused on a synchronic approach to the alternation between going to and gonna, but we believe the findings have implications for future diachronic analyses of the two forms, as well as diachronic pathways of contracted modal and semi-modal forms in English grammar in general. Notably, Lorenz & Tizón-Couto (2020: 99) propose a framework of conditions for a reduced contraction of the V-to-Vinf schema to become conventionalised, which includes frequency, formal coalescence, semantic reanalysis and favourable phonology – a path from ‘contractionhood’ to ‘constructionhood’. We believe our study allows us to support this framework and even enrich it, by isolating ‘social meaning’ as a unique source of reanalysis from the inferential to the constructional domain through register associations. Moreover, it is possible that this specialisation in the social domain contributes to keeping semantic specialisations at bay. Future case studies of V-to-Vinf contracted variants may help clarify this specific process and provide refined explanations of the emergence of contracted modal constructions through language change.
Finally, future research could further deepen our understanding of register and social meaning in constructional variation by incorporating additional social variables, such as speaker sex, age and region, as well as phonological variables, such as intonation and prosody. Unfortunately, the LiveJournal corpus used in this study is a corpus of written texts lacking social metadata, limiting our analysis to broader linguistic and situational predictors. Expanding the scope to include different corpora (e.g. spoken vs written) with richer metadata could enable the development of more robust models. Additionally, the use of experimental approaches, such as elicitation tasks, would yield further insights by allowing us to compare the effect of the two forms in identical contexts, thus testing the independence of our results from contextual effects. The triangulation of different approaches would allow us to test whether the patterns observed here extend to other constructions or reflect broader trends in register sensitivity and grammaticalisation in a Construction Grammar framework.
Appendix
The dataset and code files for the statistical analysis of the corpus can be found at the following OSF database: https://osf.io/a8fcw/?view_only=167ea17f8db6480787709423fd18d5f5



