
Breaking free from the be going to / gonna dichotomy? A study of variation in an emerging English modal

Published online by Cambridge University Press:  20 November 2025

Leela Azorin*
Affiliation:
ICAR (CNRS UMR 5191), Ecole Normale Superieure de Lyon, France; Aix Marseille Univ, CNRS, LPL, Aix-en-Provence, France

Abstract

This article explores the variation surrounding the semi-modals be going to and gonna. While gonna is frequently mentioned alongside be going to, it remains under-described in traditional grammars and academic literature. However, recent studies within Construction Grammar suggest that gonna may represent an independent construction, prompting a reconsideration of other variants within the be going to / gonna paradigm such as gon and imma, which appear to derive directly from gonna and no longer from be going to. In light of recent work, what have traditionally been regarded as mere ‘phonetic realizations’ or ‘orthographic variants’ may in fact play a more significant role in the formation and definition of constructions, raising questions about the structure of constructional networks. This article analyzes the immediate syntactic environment of the variants to account for both the variation of forms and the status of such forms. The study is conducted using two corpora that are particularly prone to showing linguistic innovations and language change: a spontaneous spoken corpus and a web corpus. Findings indicate that shorter variants often involve elision of be and that gonna is more grammaticalized than going to, based on the types of verbs they precede.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

1. Introduction

Be going to is a verbal periphrasis that can sometimes be reduced to gonna:

can be written or said as follows:

Although both be going to and its reduction gonna have been extensively studied in the literature, researchers have not consistently examined the variation between the two. Gonna is typically regarded as an informal variant of be going to (Leech 2004: 58), but relatively few studies have addressed the structural or grammatical differences between them. This article seeks to investigate these distinctions from a morphosyntactic perspective. Moreover, some authors have identified additional variants within the be going to / gonna paradigm that are neither be going to nor gonna, but these have seldom been studied in written contexts. In spoken contexts, Lorenz (2012a: 75) cites forms such as [ɡoɪnə], [ɡɒndə], and even [nə] or [ɡɒ] as potential realizations of the construction. This article examines the immediate syntactic environment of the contracted variants of be going to / gonna with regard to their ‘mother’ form (be going to). Although many studies have examined be going to diachronically (Perez 1990; Danchev & Kytö 1994; Poplack & Tagliamonte 1999; Disney 2009; Catasso 2012, among others), the analysis in this article is strictly synchronic.

The analyses conducted are carried out on two different types of corpora: a web corpus and a spoken corpus. This will allow for a more comprehensive investigation of variation across mediums in the twenty-first century. The present study addresses the following three research questions. What are the possible variants within the be going to / gonna paradigm? What might explain the choice of one variant over another from a morphosyntactic perspective? How can we account for these variants from a theoretical viewpoint?

The aim of this article is to identify the different variants and examine the main morphosyntactic criteria that can account for this variation. For the time being, semantic differences that may be at play within this paradigm of forms are intentionally left aside.

The article is structured as follows. First, section 2 provides an overview of the literature on be going to and gonna in traditional grammars as well as papers written within a grammaticalization framework. Some recent works within the Construction Grammar framework are also discussed. Then, section 3 presents the two corpora chosen for my analysis, as well as my attempt at a typology of forms belonging to the be going to / gonna paradigm found in the corpora. The methodology used for my analysis is also introduced in this section: three morphosyntactic criteria and one syntactico-semantic criterion were chosen in an attempt to better describe the different variants within the be going to / gonna paradigm. Section 4 displays the results from these analyses. Finally, section 5 discusses these results and their potential theoretical implications.

2. State of the art

The be going to periphrasis has been widely studied in the literature (Berglund 2000a, 2000b, 2005; Collins 2009; Krug 2001; Mair 1997; Szmrecsanyi 2003, among others), especially in relation to the theory of grammaticalization (Hopper & Traugott 2003; Krug 2000; Poplack & Tagliamonte 1999). Grammaticalization can be defined as ‘how lexical items and constructions come in certain linguistic contexts to serve grammatical functions or how grammatical items develop new grammatical functions’ (Hopper & Traugott 2003: 1). This process is typically gradual and involves several changes such as phonological weakening, semantic bleaching and syntactic simplification. Crucially, grammaticalization is not limited to formal properties but also includes changes in the frequency, context and function of the forms involved.

In this context, be going to is the result of a process of grammaticalization whereby the lexical verb go, having originally a meaning of movement (‘to travel or move to another place’; see footnote 1), has been grammaticalized and has undergone a semantic bleaching or a shift in meaning, its literal meaning of motion evolving into a metaphorical motion through time (Bybee, Perkins & Pagliuca 1994: 130). Be going to has progressively become grammaticalized and is now associated with the notions of futurity and intention (Quirk et al. 1985: 214):

Example (2) expresses the subject’s intention to perform a future action.

In that sense, gonna seems to represent a further step in the grammaticalization process of be going to, as the result of a process of univerbation (‘the union of two (or, rarely, more) syntagmatically adjacent word forms into one’, Lehmann 2020: 3). From be going to (three words), phonological weakening and syntactic simplification have resulted in univerbation with the form gonna. It is interesting to see that although be going to has been extensively debated and studied, gonna has very rarely been studied by itself.

2.1. Be going to and gonna in traditional grammars

A crucial question regarding be going to and gonna is that of their status. How can we categorize them? Depending on the level of analysis, two perspectives may explain the grammatical status of such forms.

First, these forms are described as belonging to the ‘modal’ domain: Huddleston & Pullum et al., in their grammar, speak of ‘modal idioms’ (Huddleston & Pullum et al. 2002: 1227) to refer to be going to, but also to other forms such as have to, got to and want to. In other linguistic works, they are referred to as ‘quasi-modals’ (Collins 2009), ‘emerging modals’ (Krug 2001), ‘semi-modals’ (Leech 2012) or even ‘periphrastic modals’ (Biber et al. 2021: 484). All these terms highlight the fact that the forms belong to the modal realm, while not exactly corresponding to typical modal auxiliaries such as will, must, can, etc. This may be due to their periphrastic structure, as well as to the fact that be going is followed by to-infinitives, whereas modals are typically followed by bare infinitives. Semantically speaking, be going to refers to an event occurring in the future: ‘future as outcome of present circumstances’ (Palmer 1988: 146). More specifically, the form expresses prediction, either based on the speaker’s intention (or ‘future fulfilment of present intention’ (Quirk et al. 1985: 214)) or inference (or ‘future fulfilment of present cause’ (Quirk et al. 1985: 214)), respectively represented in examples (3) and (4):

As be going to and gonna are modal-like forms that express futurity, they are often said to compete with the modal will. It is therefore no surprise that they should belong to the modal realm, since modality is defined by the expression of possibility or necessity and the expression of the ‘attitude of the speaker’ (Lyons 1968: 306).

Secondly, from a syntactic viewpoint, be going to and gonna are described as resembling auxiliaries. In their grammar, Quirk et al. call be going to a ‘semi-auxiliary’ (Quirk et al. 1985: 137), in an intermediate position in their auxiliary verb – main verb scale, in-between ‘modal idioms’ (such as have better, have got to) and catenative verbs (appear to, seem to). This analysis seems to apply to gonna as well, since gonna is referred to as a ‘non-standard spelling’ of be going to in Quirk et al.’s grammar (1985: 898). Hopper & Traugott, for their part, go as far as categorizing be going to as an ‘auxiliary’ (Hopper & Traugott 2003: 69).

These two approaches complement one another in helping us understand these forms: Quirk et al. define the class of semi-auxiliaries as ‘verb idioms which express modal or aspectual meaning [and which are introduced by one of the primary verbs have and be]’ (Quirk et al. 1985: 143, emphasis added). Both auxiliary status and modality, as scalar concepts, seem to help us categorize the verbal forms under analysis. Moreover, the two are often linked – especially when referring to the future. As Quirk et al. write: ‘Futurity, modality, and aspect are closely interrelated, and this is reflected in the fact that future time is rendered by means of modal auxiliaries, by semi-auxiliaries, or by the simple present or present progressive forms’ (Quirk et al. 1985: 213).

Although terminology may vary, two key dimensions help refine our understanding of be going to and gonna: their evolving syntactic role as auxiliaries and their association with modal meaning. Thus, it may be said, as a basis for my analysis, that be going to, and therefore gonna, lean towards being emergent modal auxiliaries.

In traditional grammars, be going to receives only limited attention, and gonna is typically mentioned only in passing, often as a ‘non-standard spelling’ (Huddleston & Pullum et al. 2002: 1616; Quirk et al. 1985: 898). The form is never analyzed independently – unsurprisingly given that be going to itself is only briefly examined. Over the past two decades, however, linguists have started to analyze the potential independence gonna may be taking from be going to, especially in the theories of grammaticalization and, more recently, Construction Grammar.

2.2. Be going to and gonna in previous research: grammaticalization and Construction Grammar

The verbal periphrasis be going to is not a recent development. Assigning a precise date of origin is difficult, but the construction can already be found in Shakespeare’s works, although its use with a future-oriented meaning remained relatively rare at the time. Recent studies (e.g. Petré & Van de Velde 2018) have shed light on the early stages of be going to’s grammatical development, suggesting that the initial phase of its grammaticalization most likely took place and stabilized between 1620 and 1640. According to Petré & Van de Velde (2018), earlier attestations are considered non-conventional usages, rather than clear evidence of an established construction.

Although the form is not new, its use has increased over time, particularly in recent decades. Figure 1 shows the frequencies of be going to and gonna in the Corpus of Historical American English (COHA, Davies 2010), and we can note that COHA’s first occurrence of gonna dates back to 1917, in Augustus Thomas’ play The Copperhead.

Figure 1. Occurrences of going to V and gonna in COHA, from 1890 to 2010 (see footnote 2)

As shown in figure 1, the use of be going to increased steadily in the twentieth century, notably from the 1930s onwards. As for gonna, determining its first appearance is even more difficult, most likely because the form is considered non-standard and therefore avoided in transcriptions and writings, as Col & Duchet (2000) suggest: ‘L’apparition de gonna est difficile à dater, du fait de l’évitement graphique dont cette forme semble être victime’ (‘gonna’s first appearance is hard to date, due to the graphic avoidance of which this form seems to be a victim’; my translation). Petré & Van de Velde (2018) cite 1806 as the earliest attestation of gonna, referencing the Oxford English Dictionary (OED). However, the form remained infrequent throughout the nineteenth century and only began to rise in frequency during the twentieth century. Figure 1 displays the increase of both forms in American English, with a marked rise from the 1930s for be going to, and from the 1970s for gonna, at which point the number of gonna occurrences per decade begins to rival that of be going to. This trend aligns with the more widespread use of semi-modals in American English in general, which tend to replace their modal auxiliary counterparts, as evidenced by Daugs (2017).

This rise in frequency is crucial: an increase in usage frequency can lead to morphological reduction, a phenomenon Lorenz identifies as a ‘symptom of advanced grammaticalization’ (Lorenz 2013: 133). Krug considers phonological variants such as wanna and gonna to represent ‘different stages in the evolution of new auxiliaries’ (Krug 2001: 314). Accordingly, the more reduced gonna becomes, the more it may be perceived as auxiliary-like, as part of its process of auxiliation. This hypothesis will be tested against my corpora in section 4, in order to assess which variant exhibits more auxiliary-like behavior. Furthermore, as several authors point out, grammaticalization is unidirectional (Haspelmath 1999, 2004; Hopper & Traugott 2003: 99): it always progresses towards more grammaticalization. However, gonna itself is only minimally addressed in such works. As Col & Duchet (2000) note, when gonna is mentioned, it is typically regarded merely as a contracted or reduced form of be going to (‘elle est considérée, dans le meilleur des cas, comme la forme réduite de (be) going to’, ‘in the best case scenario, it is considered to be a reduced form of (be) going to’; my translation). It is treated as a ‘non-standard spelling’ (Huddleston & Pullum et al. 2002: 1616; Quirk et al. 1985: 898) or an ‘informal variant’ (Leech 2004: 58).

Nonetheless, some authors point out that, with the weakening of infinitival to and its ‘incorporation into the preceding word’ (Collins 2009: 18) in gonna, the motion meaning that could be found with be going to (as in I’m going to London) has entirely disappeared: *I’m gonna London is ungrammatical. As Krug (2011) notes, be going to is thus semantically closer to the lexical verb go, while gonna will tend to be more abstract. This implies that the to-contraction is not merely phonological but also morphosyntactic in nature (Pullum 1997). Furthermore, research in the 2000s began to suggest that gonna might be evolving into a morphosyntactically and semantically independent form, distinct from be going to (Col & Duchet 2000, 2001; Krug 2000). This idea – the potential autonomy of gonna – is a prominent issue examined in the present study and will be tested in section 4.

More recently, other authors have advocated for a potential independence of gonna, including Machová (2015), Lorenz (2012a, 2012b, 2013, 2020), Lorenz & Tizón-Couto (2016, 2020, 2024) and Daugs (2017, 2021). Among other findings, Machová notably observes that the auxiliary be can be dropped with gonna, especially with first- and second-person plurals in American English. She hypothesizes that the form may acquire additional operator properties in the future. Lorenz, Tizón-Couto and Daugs, in particular, approach emergent periphrastic modals through the lens of Construction Grammar. Within this framework, constructions are defined as ‘form-meaning pairing[s] organized in a network’ (Traugott & Trousdale 2013: 1), cognitively stored in the speaker’s mind. This model assumes a ‘lexicon-syntax continuum,’ rejecting a strict boundary between lexical items and syntactic rules (Hoffmann & Trousdale 2013: 1). In this framework, Lorenz (2012a: 192) identifies multiple variants for different emergent modals. Lorenz & Tizón-Couto (2024) argue that once a ‘pronunciation’ variant reaches a high frequency compared to other variants, it is stored cognitively as an independent lexical item. This implies that speakers do not derive gonna from be going to on the spot but rather access gonna directly from their mental lexicon as an independent form.
The example of such contractions offers valuable insights into how the human mind ‘processes and stores linguistic experiences’ (Lorenz & Tizón-Couto 2024: 2). This view is further developed by Daugs (2017, 2021), who also adopts a Construction Grammar perspective to argue that contractions should be distinguished from phonological reductions. In the case of gonna, the form does not merely represent a pronunciation variant of the same variable but rather reflects a true lexical choice on the part of the speaker, as the contractions are institutionalized. Daugs advocates for variants – contracted forms in both his analysis and mine – to be perceived as constructions in their own right and not mere equivalents of the mother form. Although Construction Grammar involves a form–meaning pairing, the modal contractions are distinct from their full counterparts in terms of morphology, distribution and collocational patterns. In this respect, Construction Grammar offers a productive framework for examining the transition from be going to to gonna and the choice between them, as well as the relationships established between the two forms and with other forms that exist in-between and beyond this network. While ‘meaning’ is a core dimension of constructions, this article focuses exclusively on the formal dimension of the pairings. This focus represents the first step of the analysis, and further research will shed light on the semantic-pragmatic component. The relationship between form and meaning – preliminarily explored elsewhere in this special issue by Mikkelsen & Morin (2025) – falls outside the scope of the present article.

3. Data and methodology

The corpora examined in this study point to the existence of a wider range of forms falling between and extending beyond be going to and gonna, highlighting a more complex and varied network than previously assumed. I first present the corpora and outline the range of variants identified. The analysis then focuses on identifying criteria that may account for the observed variation and help explain the choice of one variant over another.

3.1. The corpora

To better understand the heterogeneous network at work, all forms resembling be going to and gonna were extracted from two different corpora. The selection of forms was based on semantic and orthographic or phonological resemblance, depending on the corpus. The analyses are based on two different mediums: a written corpus and a spoken corpus. Since gonna is said to result first and foremost from a phonological reduction, it seemed worthwhile to investigate potential new variants used in place of gonna in spoken corpora. Moreover, previous studies indicate that be going to is especially frequent in conversation (Daugs 2017; Biber et al. 2021), which motivated the choice of a spoken corpus as one of the datasets. This also applies to gonna, but the form has rarely been studied by itself. The literature states that the proportion of be going to (and gonna) is far higher in spoken corpora than in written ones. One possible explanation is what Leech refers to as the ‘prestige barrier’ (2013: 110) – that is, the fact that the use of semi-modals might be slowed down by ‘the taboo surrounding the use of highly colloquial forms in written (especially printed) texts.’ For this reason, a web corpus was selected as the second dataset, rather than a traditional written corpus. Online writing tends to be more spontaneous, allowing users to break free from the norms of traditional grammar (McCulloch 2019). As Tagliamonte puts it: ‘variation in language is most readily observed in the vernacular of everyday life’ (2011: 2). In this light, comparing a spoken corpus with a web corpus offers a fruitful approach to better understand the spectrum of variations with be going to and gonna.

The written corpus is a thematic Twitter corpus consisting of geotagged American tweets related to Climate Change (Climate Change Tweets Ids, Littman & Wrubel 2019; hereafter referred to as CCTweets). It was collected between September 2017 and May 2019 by a research team at the George Washington University. After filtering out tweets not geotagged in the US, the corpus comprises approximately 1,300,000 words across more than 55,000 tweets. Despite its thematic focus, the corpus retains many of the typical features of computer-mediated communication, and Web 2.0 in particular (Herring 2012), such as unconventional typography, emojis, hyperlinks and other genre-specific markers. In this context, the communicative genre appears to take precedence over the thematic constraints of the corpus. This is clearly illustrated by example (5), which combines an emoji with unconventional typography – the use of capital letters for emphasis – within a single sentence. Such features reflect the user’s attempt to convey emotion, highlight key points and engage the audience in a manner typical of Web 2.0 discourse:

This type of corpus was selected because it reflects language as it is used online, in everyday contexts, by a wide and diverse range of speakers, offering valuable insight into ongoing linguistic innovation. However, I acknowledge certain limitations inherent in this dataset: the thematic focus and the limited number of characters per tweet may bias my study. These biases will be taken into account in my discussion of the results.

The spoken corpus is the Santa Barbara Corpus of Spoken American English (SBC, Du Bois et al. 2000–5). It consists of recordings of naturally occurring conversations between two or more people all over the United States in the 1990s and contains 249,000 words. Both corpora are openly accessible for academic research under Creative Commons licenses (BY-ND 3.0 US and 1.0 Universal).

A first analysis (table 1) presents raw and normalized frequencies of going to and gonna in the two corpora in comparison with the total number of words. For comparison, the raw and normalized frequencies in the Corpus of Contemporary American English (COCA, Davies 2008–) were also added. Notably, normalized frequencies in COCA – which has included a web section since 2020, covering data from 2012 to 2019 – and CCTweets are similar, whereas the results differ markedly for the spoken corpus. This suggests that CCTweets, despite being a web-based corpus, may exhibit more traditional usage patterns than initially expected. Nonetheless, a marked decrease in the frequency of going to is observed in CCTweets compared to COCA. In contrast, within the SBC, be going to occurs much less frequently, while gonna is prominently represented in conversation.
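The normalization behind such a comparison can be sketched as follows. This is only a minimal illustration: the corpus sizes are the approximate word counts reported above, whereas the raw counts are hypothetical placeholders and not the figures of table 1; the function name `per_million` is likewise an assumption.

```python
def per_million(raw_count: int, corpus_size: int) -> float:
    """Return a raw frequency normalized per million words (pmw)."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count * 1_000_000 / corpus_size


# Approximate corpus sizes (in words) as reported in the article.
CORPUS_SIZES = {"CCTweets": 1_300_000, "SBC": 249_000}

# HYPOTHETICAL raw counts, for illustration only (not the article's data).
example_counts = {
    "CCTweets": {"going to": 650, "gonna": 130},
    "SBC": {"going to": 50, "gonna": 249},
}

for corpus, counts in example_counts.items():
    for form, raw in counts.items():
        pmw = per_million(raw, CORPUS_SIZES[corpus])
        print(f"{corpus:9s} {form:9s} raw={raw:4d} pmw={pmw:8.1f}")
```

Normalizing per million words is what makes corpora of very different sizes (here roughly 1.3 million versus 249,000 words) comparable at all; raw counts alone would overstate the larger corpus.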

Table 1. going to V and gonna in CCTweets, the SBC and COCA

These two forms – going to and gonna, the more institutionalized ones – are not the only ones that can be found in my corpora, as the next subsection will demonstrate.

3.2. Presentation of the data

The transcription of the SBC only gives two potential transcriptions: going to and gonna. However, additional variants do exist, as confirmed by my web corpus. Table 2 illustrates the presence of other variants such as gunna, gon and imma, among others.

Table 2. Typology of variants in the be going to / gonna paradigm in the SBC and CCTweets (normalized frequencies per million words)

Two important observations should be made regarding this typology. First, identifying such variants in the Twitter corpus is particularly challenging, as it requires prior knowledge of their possible orthographic realizations in order to retrieve them automatically from such a large dataset. Second, categorizing forms in the spontaneous spoken corpus presents its own difficulties: the SBC transcriptions only include the forms going to and gonna (see footnote 3), which means the analysis initially had to rely on auditory inspection. Every occurrence of going to or gonna in the transcriptions of the sixty recordings was carefully listened to and categorized in the typology, and the categorization was double-checked when needed (see footnote 4), but I acknowledge that some degree of subjectivity may remain. Moreover, establishing a typology that would encompass both mediums proves to be complex, as each corpus presents its own set of issues – orthographic variation in the written corpus and phonological variation in the spoken corpus. For this reason, table 2 provides two distinct typologies: phonological variants for the SBC and orthographic variants for CCTweets.

Table 2 presents an attempt at a typology of the different variants identified in the two corpora, along with an example for each variant. While the specific forms differ slightly across datasets, eight distinct variants were observed in each corpus. The traditional semantic values typically associated with going to and gonna – such as intention and inference – remain observable in the examples. Although the two corpora differ markedly in nature, several forms are attested in both: goin(g) to / [ˈɡoʊɪŋ/n tʊ/ə]; gon(’) / [ɡɔn] / [ɡən] / [gəm] and Ima, though the phonological realizations and orthographic representations may vary. Other forms, however, are exclusive to one of the two mediums. In the SBC, we find variants such as [ˈɡoʊjnə]; [gə]; [nə] / [na] and [ˈa(ɪ)məna] / [a(ɪ)mənə]. Conversely, Twitter allows for a variety of orthographic – and morphological – representations of ‘Ima’: I’ma; Imma; I’mma; Ima. It also features interesting orthographic variants of gonna: gunna and gone.
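To illustrate why prior knowledge of the spellings is needed, the retrieval of the orthographic variants from tweets can be sketched with regular expressions. The patterns below are an assumption built from the spellings listed above, not the queries actually used in the study, and the sample tweets are invented. Two limits of such a search are worth noting: gone is indistinguishable from the past participle of go, and going to also matches motion uses (going to London), so hits for both would need manual checking.

```python
import re
from collections import Counter

# Illustrative patterns (assumption), one per orthographic variant.
VARIANT_PATTERNS = {
    "goin(g) to": r"\bgoin(?:g|['’])? to\b",  # going to, goin to, goin' to
    "gonna": r"\bgonna\b",
    "gunna": r"\bgunna\b",
    "gon(')": r"\bgon['’]?(?!\w)",            # gon, gon' (not gonna, gone)
    "gone": r"\bgone\b",                      # ambiguous: also past participle
    "I(')m(m)a": r"\bI['’]?m{1,2}a\b",        # Ima, I'ma, Imma, I'mma
}
COMPILED = {
    label: re.compile(pattern, re.IGNORECASE)
    for label, pattern in VARIANT_PATTERNS.items()
}


def count_variants(texts):
    """Count the matches of each variant pattern across a list of texts."""
    counts = Counter()
    for text in texts:
        for label, pattern in COMPILED.items():
            counts[label] += len(pattern.findall(text))
    return counts


# Invented sample tweets, for illustration only.
sample = [
    "I'm gonna recycle more",
    "Imma plant a tree",
    "we goin to lose the coast",
    "They gon' regret it",
    "It is gunna get hotter",
]
print(count_variants(sample))
```

A search of this kind only finds spellings that were anticipated in advance; unforeseen variants remain invisible, which is the retrieval difficulty noted for the Twitter corpus.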

Taken together, both typologies appear to map out a continuum, with be going to at one end of the spectrum and, at the opposite extreme, the most reduced form found in both corpora: Ima. This contraction is particularly striking: the only morpheme/phoneme left of gonna in this word is <a> / [a]; [ə]. In addition, the variant exhibits what I have termed a hypercontraction, involving the first-person singular pronoun I, conjugated be (am or its cliticized form ’m), and gonna, all compressed into a single unit: I(’m)ma. Although my data is synchronic, the SBC provides insight into the possible trajectory of change of these variants with an interesting continuum of forms from the phrase I’m gonna: [aɪm ˈɡɑnə/a] → [am ɡənə] → [amənə] → [amə/a]. I believe this may illustrate how synchronic variation can serve as a window onto ongoing language change. It is important to note that while such reduction may partly result from increased speech rate – especially in elements with lower semantic content such as grammatical constructions – speech rate alone cannot fully account for these forms. Indeed, the extreme phonetic reduction found in spoken data also surfaces orthographically in the written corpus (‘Ima’), suggesting a more entrenched, cognitively stored form rather than a fleeting pronunciation effect.

3.3. Defining criteria

Section 3.2 established the broad range of forms belonging to the be going to / gonna paradigm, which appears to constitute what Lorenz & Tizón-Couto (2024: 29) describe as a ‘network of variants’. The SBC presents different realization variants, which could appear to be merely phonetic variants conditioned by a specific phonetic context. Yet many of these same forms, with slight orthographic adaptations, also surface in the CCTweets corpus. This raises a key question: if some of these phonetic variants recur in written form, could this indicate that certain forms are becoming conventionalized?

For the purposes of this study, I decided to investigate the immediate syntactic environment surrounding the variants (N-2; N+2). Three morphosyntactic criteria and one syntactico-semantic criterion were selected for analysis. While specifically semantic or pragmatic criteria may provide valuable insights into the potential differences between the variants – particularly regarding meanings such as intention or prediction (see Mikkelsen & Morin 2025) – it appears that the semantic differences are too subtle to be captured solely by an analysis distinguishing between intention and prediction or inference, and that such an analysis is prone to subjective interpretation. For this reason, I chose to begin with a distributional analysis, which depends less on interpretation than semantic-pragmatic analyses do. Nonetheless, some of the selected criteria, once analyzed, may allow us to make assumptions about meaning.

Drawing on previous work by Gesuato & Facchinetti (2011), Col & Duchet (2000) and Berglund (2000a, 2005), four criteria were selected:

  • The type of subject

  • The presence of be

  • The presence of negation

  • The verb following the variant

The final criterion, although more semantically related, was retained because it falls within the immediate syntactic context of the variants. It also offers preliminary semantic insight into potential differences between the various forms. These four criteria have previously been used to distinguish among grammatical structures and explore whether distributional patterns reflect the growing autonomy of certain forms within a paradigm.
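The N-2; N+2 window described above can be illustrated with a minimal sketch. It is not the AntConc procedure used in the study: the tokenizer, the function name `context_windows` and the sample sentence are all assumptions made for illustration only.

```python
import re

def context_windows(text, variants, span=2):
    """Return (left, hit, right) triples: `span` tokens on each side of every variant token."""
    # Naive tokenizer (an assumption): runs of letters, keeping straight apostrophes.
    tokens = re.findall(r"[A-Za-z']+", text)
    targets = {v.lower() for v in variants}
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() in targets:
            left = tokens[max(0, i - span):i]      # N-2 .. N-1
            right = tokens[i + 1:i + 1 + span]     # N+1 .. N+2
            hits.append((left, tok, right))
    return hits

# Invented sentence, not taken from the SBC or CCTweets.
sample = "we are all just gonna leave now"
windows = context_windows(sample, ["gonna", "gon", "imma"])
```

In this invented sentence the subject and be both fall outside the two-token window, which is the kind of case a fixed window cannot capture on its own.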

Other possible criteria – such as the presence of adverbs or the type of clause – were not included, as my priority was to analyze the closest syntactic environment of the variants, as well as elements likely to be consistently present across all instances. Additionally, while factors such as phonetic context or speech rate are undoubtedly relevant and insightful, they apply exclusively to the spoken corpus. Given my aim of identifying common criteria that could reliably be examined across both corpora, the analysis was deliberately limited to elements that allowed for systematic comparison. This approach also helps support the hypothesis that some variants may have become conventionalized to some degree. For the analysis of the SBC, all transcriptions were manually corrected so that the variant under investigation corresponded to the acoustically identified form rather than simply be going to or gonna. Analyses for both corpora were first conducted using the concordance software AntConc (Anthony 2021) and subsequently manually verified to ensure that no relevant occurrences (e.g. presence of subject, presence of be) were overlooked due to the N-2/N+2 window limitation. For instance, in (6), both the subject and the auxiliary be occur outside the immediate window surrounding gonna and were therefore retrieved through manual inspection:

4. Results

The following section presents the results found in the SBC and CCTweets for the four selected criteria. The implications of these findings will be discussed in section 5.

4.1. Type of subject

The first criterion analyzed is the type of subject, as displayed in figure 2.

Figure 2. Bar plots showing the distribution of subject per variant in the SBC and CCTweets (logarithmic scale on the y-axis)

Since certain variants occur far more frequently than others, the y-axis in figure 2 is plotted on a logarithmic scale. This approach improves readability while preserving the overall distribution patterns across both high- and low-frequency variants. As expected, figure 2 highlights a striking difference between the two corpora regarding the most frequent subject: I in the SBC, and a third-person subject in CCTweets. This is unsurprising, given that the SBC is a spontaneous spoken corpus, and I is the most used pronoun in communication. The variants resulting from a contraction of I’m gonna all have I as the pronoun, since it is morphologically attached to the variant. No other subject can be present with this hypercontraction in CCTweets. It seems that these variants are morphosyntactically constrained: from the occurrences, we see that it can only happen with the pronoun I, as it is integrated into this new lexeme in written form. It should be noted that while the hypercontraction Im(’m)a is found as such in CCTweets, similar forms can be identified phonetically in the SBC, albeit with very low frequency. These include realizations such as [jərnə] and [wɪrnə], corresponding to you’re gonna and we’re gonna, respectively:

It is worth noting that this form has become fixed in writing with I and not with any other pronoun: *your(n)a or *he(’)s(n)a do not appear in CCTweets. This may be due to I being one of the most frequent pronouns used with be going to and gonna, as the form is linked to intention, which is typically expressed by first-person subjects (Azorin & Lansari 2025: 106). Similarly, the monosyllabic forms [ɡə] and [nə/a] also tend to co-occur with I, suggesting that reduction is facilitated by the high frequency of this pronoun in conversational contexts.

Another noteworthy finding is the absence of any pronoun or subject in some cases: this occurs 9 times in the SBC, and 62 times in CCTweets (raw frequencies). The more frequent absence of a subject on Twitter is not surprising, as computer-mediated communication often elides the subject for efficiency and speed (Herring 2012). This tendency is particularly relevant for tweets, which are subject to a character constraint (280 characters at the time of collection). In most of these cases, the subject is implied, and is very often the first-person singular pronoun:

Interestingly, in both corpora, the more marginal variants almost always include an overt subject – possibly because their deviation from conventional forms is such that further ellipsis might risk rendering the utterance ungrammatical or less interpretable.

4.2. Presence of be

Previous work has shown that be can sometimes be elided with going to or gonna, showing the processes of grammaticalization and auxiliation at play with this paradigm (Machová 2015; Col & Duchet 2000). This phenomenon has been tested in the two corpora as well. Relative frequencies were calculated to compare the internal distribution of forms of be (full, contracted and elided) across variants, irrespective of their overall frequency, as shown in figure 3. This allows for a clearer view of form preference within each variant.
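As a minimal sketch of this per-variant computation, each variant's tokens are counted by form of be and then converted to percentages of that variant's own total, so that rare and frequent variants become comparable. The function name and the toy annotations below are invented for illustration and are not figures from the SBC or CCTweets.

```python
from collections import Counter

def relative_frequencies(observations):
    """Map each variant to the percentage of its tokens per form of be."""
    counts = {}
    for variant, be_form in observations:
        counts.setdefault(variant, Counter())[be_form] += 1
    shares = {}
    for variant, counter in counts.items():
        total = sum(counter.values())  # denominator is the variant's own token count
        shares[variant] = {form: 100 * n / total for form, n in counter.items()}
    return shares

# Invented toy annotations: (variant, form of be).
toy = (
    [("gonna", "contracted")] * 6
    + [("gonna", "elided")] * 2
    + [("going to", "full")] * 3
    + [("going to", "contracted")] * 1
)
shares = relative_frequencies(toy)
```

Because each variant is normalized against its own total, the percentages for any one variant sum to 100 regardless of how often that variant occurs.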

Figure 3. Presence, absence and form of be in both corpora, per variant (relative frequencies, %; see footnote 5)

The results suggest that the different variants exhibit distinct morphosyntactic behaviors. Although be is predominantly present in both corpora, some patterns seem to emerge depending on the variant. In general, the be going to variant shows a preference for the full form of be, while gonna shows a clear leaning towards the contracted form. As for the elision of be, the proportion of elided forms is higher in the most radically contracted variants, such as gon and its phonetic realizations or the monosyllables [ɡə] and [nə]. In the SBC, the variant [ɡɒn] (contrary to [ɡən] and [ɡəm]) consistently appears without be, a pattern mirrored in CCTweets, where most occurrences of gon(’) also involve the elision of be:

This variant, a more contracted form of gonna, seems to tend towards auxiliation and is the most auxiliary-like variant in that respect.

As for gonna and [ɡənə] / [ɡəna], the proportion of elided forms is noteworthy, as there is a clear distributional difference between these forms and going to or its assimilated variants. This suggests that gonna is not merely an orthographic or phonetic variant, but also a morphosyntactic one.

4.3. Presence of negation

A third criterion tested across the two corpora is the presence of negation, as this may occur in the immediate syntactic environment of the forms, in-between be and the variant. The analysis aims to determine whether negation is more likely to co-occur with certain forms than others. Figure 4 summarizes the relative frequencies of negated forms in both datasets as well as the overall negation rate per corpus for the be going to / gonna paradigm.

Figure 4. Presence of negation in both corpora, per variant (relative frequencies, %)

Only the variants that co-occurred with negation were included in figure 4: four forms for CCTweets and four for the SBC. The results do not reveal any consistent pattern across variants, but rather highlight a difference between the two corpora: overall, negation is more frequent in the Twitter corpus, where it accounts for 11 percent of all occurrences within the going to / gonna paradigm, compared to 9 percent in the SBC. While the raw frequencies remain relatively low, the SBC results reveal a noteworthy trend: the more phonologically reduced the variant, the more likely it is to appear with negation. This tendency is not observed in CCTweets, where no clear correlation emerges between contraction and negation.

A particularly noteworthy observation concerns the hypercontraction Ima/Imma, which is never found in combination with negation. This is morphosyntactically unsurprising as the natural place for the negation is between ’m and the variant, and the contracted form of this variant does not allow for such negation (*Imnota, *Imnta; see footnote 6). If the auxiliation were complete, we would expect the negation to come after Imma, in a negation pattern like *Imma not. However, such forms are not attested in my corpora either.

4.4. Verb following going to and gonna

Having examined the left-hand context and internal characteristics of the variants themselves (e.g. presence of be, possible negation), we now turn to the immediate right-hand context – namely, the element that directly follows the variant. This can provide a first glance at the semantic and pragmatic differences between variants. Given the raw frequency distribution of the various variants, the analysis focuses exclusively on the two most frequent ones in both corpora: going to and gonna. For the spoken corpus, the original transcriptions have therefore been used. Out of a total of 345 distinct verbs, the twenty most frequent across both corpora are displayed in table 3 and graphically represented in figure 5.

Table 3. Distribution of verbs with going to and gonna across both corpora (normalized frequencies per 1,000 words)

Figure 5. Bar plots showing the distribution of verbs for going to and gonna across both corpora (normalized frequencies per 1,000 words)
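The normalization named in the captions of table 3 and figure 5 amounts to scaling a raw count by the size of a base and multiplying by 1,000. A minimal sketch follows. The function name and the numbers are invented, and the base is left as a parameter because the exact denominator used in the study is not restated here.

```python
def per_thousand(raw_count, base_size):
    """Scale a raw count to a rate per 1,000 units of the chosen base."""
    if base_size <= 0:
        raise ValueError("base_size must be positive")
    return 1000 * raw_count / base_size

# Invented figures: 12 occurrences of a verb against a base of 4,800 units.
rate = per_thousand(12, 4800)
```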

Berglund (2005: 143) states that the ten verbs most frequently co-occurring with going to and gonna among the most common infinitives in the BNC are be, do, have, get, go, take, give, say, put and come. Nearly all of these verbs are also attested in my corpora, with the exception of give.

One particularly noteworthy observation concerns the absence of a verb (‘NO_VERB’ in table 3 and figure 5; LLR = 2.7; log odds ratio = 0.5 in figure 6). This phenomenon occurs in both corpora, but more so in the SBC. One hypothesis would be that, in a more spontaneous as well as communicative setting, gonna can carry a metadiscursive role (Azorin 2025) and, as a metadiscursive form, would not need any verb following it:

Figure 6. Plot showing the results of the distinctive collexeme analysis for the most frequent verbs with going to and gonna

In this context, gonna appears to function as a device for reclaiming the conversational floor even when the speaker has not yet fully formulated what they want to say, thereby contributing to the discursive planning of speech. Although it may resemble a false start, the overall frequencies of such syntactic constructions suggest that gonna may assume a metadiscursive role in turn-taking. Moreover, the absence of a following verb occurs primarily with gonna rather than going to, which may point to a pragmatic difference between the two variants. However, this difference does not reach statistical significance in my corpora and would require further analysis on larger datasets. These remarks are thus preliminary.

To deepen the investigation, I conducted a distinctive collexeme analysis, following Gries’ (2024) methodology. This statistical procedure yields a table listing, among other elements, preference for one form over the other (here: be going to vs. gonna), log-likelihood ratios (LLR), log odds ratios and Pearson’s residual scores. For the sake of clarity, a summary of the findings is presented in figure 6 as a plot. Only verbs meeting a minimum frequency threshold (≥ 5 tokens across both variants) and exhibiting sufficiently high LLR values (LLR > 3.85, p < 0.05) are displayed, ensuring that the association with either gonna or going to is statistically significant.

The horizontal axis represents the log-transformed co-occurrence frequency of each verb with the two variants combined (gonna + going to), while the vertical axis shows the log odds ratio, with positive values indicating a preference for gonna and negative values a preference for going to. A dashed line at zero marks the neutral point. Labels are provided only for verbs showing statistically meaningful preferences. The plot reveals clear lexical preferences: some verbs are strongly biased toward gonna (e.g. read, blow, let, come, go), while others cluster more with be going to (e.g. get, end, kill). From my datasets, the differences between the variants are not pronounced but remain statistically significant.
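The two measures plotted in figure 6 can be illustrated for a single verb with a 2×2 contingency table. The sketch below is not Gries’ script: the cell layout, the function names, the 0.5 correction for empty cells and the toy counts are all assumptions made for illustration. Only the thresholds (≥ 5 tokens, LLR > 3.85) are taken from the text above.

```python
import math

def collexeme_stats(a, b, c, d):
    """a: verb + gonna, b: verb + going to, c: other verbs + gonna, d: other verbs + going to."""
    n = a + b + c + d
    observed = [a, b, c, d]
    # Expected counts under independence: row total * column total / grand total.
    expected = [
        (a + b) * (a + c) / n,
        (a + b) * (b + d) / n,
        (c + d) * (a + c) / n,
        (c + d) * (b + d) / n,
    ]
    # Log-likelihood ratio (G2); empty cells contribute nothing to the sum.
    llr = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    # Assumed correction: add 0.5 to every cell when any cell is empty.
    if 0 in observed:
        a, b, c, d = (x + 0.5 for x in observed)
    # Positive values indicate a preference for gonna, negative for going to.
    log_odds = math.log((a * d) / (b * c))
    return llr, log_odds

def is_displayed(a, b, llr, min_tokens=5, min_llr=3.85):
    """Apply the two display thresholds stated in the text."""
    return (a + b) >= min_tokens and llr > min_llr

# Invented counts for a hypothetical verb.
llr, log_odds = collexeme_stats(30, 10, 470, 490)
shown = is_displayed(30, 10, llr)
```

With these invented counts the verb occurs more often with gonna than expected, so the log odds ratio is positive. Swapping the two columns gives the same LLR with the opposite sign on the log odds ratio.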

Of particular interest is the preference for gonna with the motion verbs go and come. Since going to stems from the motion verb go as part of its grammaticalization pathway, it may seem cognitively odd to associate be going to – even in its grammaticalized form – with motion verbs, especially given that going to + place is still in use in English. The preference of these verbs for gonna may indicate that this variant is more grammaticalized than going to, to the point that any residual sense of motion has been lost. In contrast, going to may still evoke a cognitive association with physical movement. Since gonna functions exclusively as a modal form, speakers may prefer it in contexts where they need a motion verb and a modal meaning, thereby avoiding potential ambiguity or redundancy for the co-speaker. The preference for gonna with the verbs read, blow and let is also noteworthy, but warrants further investigation using larger datasets. Although these verbs show a clear tendency to prefer gonna in my data, they remain infrequent overall, which limits the robustness of any conclusions.

Finally, two results from table 3 and figures 5 and 6 may appear surprising considering the traditional frequent verbs in English: the overall frequencies of die and kill – respectively 50 and 40 occurrences per 1,000 words. Almost all these occurrences are found exclusively in CCTweets. This reflects a bias inherent to this corpus, which is thematic in nature. Thus, the focus on climate change in this corpus likely explains the frequent appearance of tweets referring to global warming and apocalyptic scenarios, as in:

Therefore, these results are highly specific to my corpus and should probably not be interpreted as evidence that die and kill are among the most frequent verbs co-occurring with going to and gonna more broadly. In terms of meaning, examples (13) and (14) illustrate inference rather than intention. This may suggest that the CCTweets corpus contains a higher proportion of inference-related structures compared to those expressing intention. While this observation could be further tested using a random sample of occurrences, such an investigation falls outside the scope of this article.

5. Discussion: variation and its theoretical and cognitive implications

In this section, I discuss the possible conclusions that can be drawn from the results presented in section 4. I consider the status that some of the variants displayed in this paper may hold within the framework of cognitive linguistics theories and their role as emergent modal auxiliaries. Based on my findings, it appears that the status of gonna needs to be distinguished from that of the other variants.

5.1. The status of gonna

Results from my corpora show that gonna – both in its written form and its phonetic realizations – remains one of the most frequent forms within the paradigm, alongside going to. It is the only variant able to ‘compete’ with going to in terms of language change.

My results have several implications. Firstly, the proportion of contracted be with gonna and the more contracted variants, as compared to be going to, can be interpreted in two different ways. On the one hand, one can surmise that formality is at play here: gonna is traditionally associated with more informal settings and contexts (Leech 2004) and the contracted form of be is therefore expected as it contributes to an overall ‘informal harmony’ within the structure. However, it is worth noting that be going to was initially also associated with very informal settings (Biber et al. 2021) and the fact that full forms of be are found with it may consequently be harder to explain. On the other hand, I hypothesize that, rather than mainly being a matter of formality, the contracted form of be associated with gonna is indicative of processes of grammaticalization and auxiliation. This would also explain why, the closer we get to more contracted variants of be going to / gonna, the more likely be is to be elided. The further contraction of gonna into forms such as gon(’) is a sign of the auxiliation process at play here. If gonna or other contracted forms are to be considered modal auxiliaries, the presence of the auxiliary be becomes redundant and its systematic elision can be expected.

Secondly, the absence of a verb following gonna (three times more frequent with gonna than with going to) may highlight a pragmatic difference between going to and gonna, particularly in spoken conversation. Gonna may have acquired a metadiscursive role, being used as a way to keep the conversation flowing or regain the floor (Azorin 2025).

The evidence gathered for gonna raises the question of its status as an emergent modal auxiliary. Quirk et al. (1985: 137) classify it as a ‘semi-auxiliary’, but the scale may need to be revisited. Gonna, in comparison to going to, has acquired syntactic characteristics that make it more akin to modal auxiliaries such as must or can. With the elision of be and the removal of to, gonna may become an auxiliary in its own right. Consistent with this, and similarly to true modal auxiliaries, gonna shows no subject agreement when be is absent: forms such as *gonnas do not appear in my data. Its univerbation may thus steer it away from the category of ‘periphrastic modals’ (Biber et al. 2021: 484) to make it a full-fledged modal auxiliary. From a morphosyntactic perspective, my findings strongly support this trajectory for gonna.

Finally, my results – in line with recent works – indicate that gonna truly seems to distance itself from be going to, not only morphologically but also syntactically, resulting in new variants developing out of gonna, not going to. One such example is Im(m)a, a contraction of I’m gonna, and not I’m going to. Although the form is not fully stabilized yet, as evidenced by variations at both the orthographic and phonetic levels, it is noteworthy that this form is created out of gonna, which is independent enough to generate variants of its own. The emergence of such forms confirms that gonna has begun to function as an independent construction. This shift is not merely formal; it also signals a syntactic reorganization that could, over time, warrant the recognition of gonna as a construction in its own right within the framework of Construction Grammar. Whether a semantic distinction also exists between going to and gonna (see footnote 7) remains an open question. A sociolinguistic and pragmatic approach – though beyond the scope of this article – would also offer valuable insights into potential differences between going to and gonna. If we consider the network of forms associated with gonna within Construction Grammar, new nodes appear to have emerged, connecting it to its variants – gon/ima/imma/[a(ɪ)mənə] in my study. While gonna still maintains a link to going to, its mother form, the link may no longer be strictly hierarchical (i.e. vertical link), but rather horizontal, suggesting that gonna may now be seen as another possibility (‘lexical choice’) for going to, instead of a simplified spelling and production of going to.

5.2. Other variants

The morphosyntactic analyses revealed differences in distribution not only for be going to and gonna, but also for more marginal forms. The raw frequencies for these more marginal forms do not always allow for statistically significant results, but it should be noted that these forms nevertheless exist. They are attested in a web corpus and/or a spoken corpus, whereas they occur very rarely, if at all, in more traditional corpora such as COCA. In the SBC, these more marginal forms account for 32.2 percent of all forms within the paradigm. While these forms may be partly genre-related, they are likely not exclusively so. As Hundt (2004) notes, patterns which are viewed as genre-specific can spread and have an ‘impact on the systemic possibilities and the overall spread of a grammaticalizing pattern’. Moreover, the contracted forms seem to come from spoken language first, and then spread into written production, maybe starting with more loosely normed productions (such as Twitter here). This explains why many orthographic variants are often labelled ‘phonetic respellings’. However, the emergence of innovative spellings in digital platforms, such as Twitter or online forums, challenges the adequacy of this label. The comparison between these spoken and written modes of communication offers unique insight into how new linguistic forms emerge, stabilize and sometimes become conventionalized. As argued by Budts & Petré (2016) and Bybee & Hopper (2001), these marginalized forms may, in time, become conventionalized. Even though Biber et al. (2021) argue that such reductions are governed by conventions up to a certain point, and that one should therefore rarely come across gona or gunna for example, my corpora do present such occurrences. The very existence of these forms indicates that speakers are indeed using them for various purposes, whether it be style or register, or for sociolinguistic reasons (age, social class, ethnicity, etc.).

With respect to the morphosyntactic criteria, the analysis of subjects highlights one morphosyntactically constrained variant or variants in the forms of ‘Ima’ and [amənə]. It raises the question as to why such variation appears specifically with the pronoun I, rather than with others. At present, frequency appears to be the most plausible explanation. Also, the absence of an explicit subject is noteworthy in light of the grammaticalization process affecting these forms. However, this observation must be contextualized with respect to the corpora investigated: a spontaneous spoken corpus and a web corpus – Twitter in particular, which obeys strict rules such as character limits (280 characters at the time of data collection). Further research into subject distribution would be worthwhile, especially in relation to semantic parameters. Given that I is the most frequently used pronoun in the SBC, it would be useful to explore whether it is systematically associated with intention. Conversely, is there a pronoun which favors the inference meaning? A deeper semantic analysis of pronoun distribution across the different variants is therefore warranted. As for the absence of be, the co-occurrence of this phenomenon with very contracted variants derived from gonna is particularly interesting in relation to the ongoing process of auxiliation.

One theoretical issue that emerges from the paradigm of forms concerns the categorization of these more marginal variants, or their definition. Lorenz distinguishes ‘variation’, of which gonna would be an instance, from ‘reduction’ such as [amənə]. According to Lorenz, reductions are shaped by how someone speaks, considering situational parameters such as speech rate or preceding phonetic element, while variations are determined by who speaks, taking into consideration social factors (Lorenz 2013: 146). This distinction is conceptually useful, but in actual speech, these parameters often interact in complex ways, making it difficult to categorically separate variants into purely ‘variations’ or ‘reductions’. The typology presented in this article does not allow for such separation, as the more marginal forms also seem to exhibit specific distributional patterns.

A final remark concerns the notion of dichotomy, as introduced in the title of this article. While the aim of this study was to challenge and move beyond the traditional be going to / gonna dichotomy, the findings suggest that this binary opposition remains highly relevant. On the one hand, the presence of a variety of marginal and innovative variants demonstrates that the paradigm is more complex and dynamic than a mere two-form system. On the other hand, the fact that these less frequent variants typically derive from either be going to or gonna reinforces the idea that these two forms continue to operate as central anchors within the paradigm. This is further evidenced by the frequency distributions observed: the analysis of verb types was restricted to be going to and gonna alone, not only for methodological consistency, but also because of insufficient frequencies for the other variants. Thus, rather than fully breaking free from the dichotomy, the analysis points to a structured system where binary oppositions coexist with, and may even give rise to, more gradual forms of variation. This gradient variation manifests in a continuum of forms that seem to differ in terms of phonetic reduction, morphological structure and degree of grammaticalization.

6. Conclusion

To conclude, this study has shown that going to and gonna form the core of a broader network of related variants. While many of the other forms may appear marginal, they nonetheless exist and are connected to the ‘mother’ forms (see footnote 8), which are now taken to be both be going to and gonna. The morphosyntactic analysis presented here constitutes a first step towards understanding the choice of one variant over another. At this stage, it appears that the absence of be may partly account for the choice of one variant (gonna, gon’) over others such as going to, or even Im(m)a.

The typology established in this study provides a useful overview of the range of available variants. However, the challenge of categorizing these forms in the spoken corpus calls into question the stability and boundaries of the variations. The presence of these marginalized forms in corpora such as the SBC and CCTweets suggests that these forms are gaining acceptance in certain contexts. Although the present study does not directly test for social meanings, the fact that such variants are actively used by speakers points to their potential role in expressing social identity and sociolinguistic factors such as age, class or ethnicity. This hypothesis would need to be explored in further research. These forms are therefore not mere linguistic curiosities; they serve as indicators of how language is evolving in real time, particularly in the context of global, digital communication. In this respect, the choice of corpora appears fruitful: despite their differences, both corpora reveal similar forms and continua of variation, and both highlight the emergence of innovative linguistic patterns.

For the be going to / gonna paradigm – as well as for other contracted forms – there exists a clear relationship between spoken and written data: indeed, structures such as wanna or gonna first emerged as contractions in rapid speech and were later institutionalized (Bolinger 1981). This pattern holds for almost all the variants found in the written corpus: gon, I’ma, imma. These variants originate in spoken language and are then transferred, sometimes in modified form, to written formats. Therefore, the cross-comparison between the two mediums is essential to analyze language change.

This study also confirms the independence that gonna is gaining with respect to be going to. The form has begun to develop variants of its own and is distinct from be going to in terms of distribution as well as syntactic properties – as shown by the contraction and absence of be in particular. Nevertheless, the criteria examined here represent only a starting point in describing gonna and more contracted variants. Further research, particularly incorporating semantic-pragmatic criteria as well as sociolinguistic and contextual variables, will be necessary to fully clarify their status within cognitive and usage-based frameworks such as Construction Grammar.

Footnotes

1 Definition taken from the Cambridge Dictionary (online version): https://dictionary.cambridge.org/dictionary/english/go

2 In the analyses based on COHA (figure 1, section 2.2) and COCA (table 1, section 3.1), the query going to V has been used to distinguish the grammaticalized form of going to from its lexical counterpart (going to + noun). Such a distinction is unnecessary in the case of gonna, which can be retrieved as a standalone form.

3 It should be noted that some variants also seem to be mislabeled in the transcriptions given.

4 All annotations were carried out by the author. In cases of uncertainty, the relevant items were double-checked and discussed with another specialist to ensure consistency and reliability.

5 Variants similar to Ima were counted, all in the ‘contracted be’ category.

6 As one of the reviewers suggested, the absence of such forms might be linked to the need to avoid incomprehensibility or possible constructional confusion: upon hearing a form like Imnota, speakers might be likely to misinterpret it as a reduced form of I am not to + V (e.g. I am not to attend the meeting), depending on the prosodic context.

7 For a more detailed discussion, see Mikkelsen & Morin (2025).

8 As suggested by one of the reviewers, the so-called ‘mother’ forms might be viewed as prototypes in the sense of exemplar theory (Pierrehumbert 2001). Gonna, for instance, may be stored as such in the mental lexicon, with other, less prototypical forms clustering around it. See Tizón-Couto & Lorenz (2018) for an attempt in this direction with variants of have to.

References

Anthony, Laurence. 2021. Antconc. Tokyo: Waseda University. https://laurenceanthony.net/software
Azorin, Leela. 2025. Gonna et le métadiscours dans l’oral spontané. In Patrukhina, Liubov & Bosbach, Jeanne Vigneron (eds.), Linguistique et didactique de l’oral spontané en Europe francophone: Perspectives croisées des cultures scientifiques de l’allemand, du français et de l’anglais (special issue of Cahiers d’études germaniques 89), 239–56.
Azorin, Leela & Lansari, Laure. 2025. How progressive is gonna be Ving? In Carlucci, Alessandro & Nykiel, Jerzy (eds.), The progressive revisited: Historical and quantitative studies in Germanic and Romance languages (Studies in Language Companion Series 236), 98–125. Amsterdam: John Benjamins. doi:10.1075/slcs.236.04azo
Berglund, Ylva. 2000a. ‘You’re gonna, you’re not going to’: A corpus-based study of colligation and collocation patterns of the (BE) going to construction in Present-day spoken British English. In Lewandowska-Tomaszczyk, Barbara & Melia, Patrick James (eds.), PALC ’99: Practical applications in language corpora, 161–92. Frankfurt am Main: Peter Lang.
Berglund, Ylva. 2000b. Gonna and going to in the spoken component of the British National Corpus. In Mair, Christian & Hundt, Marianne (eds.), Corpus linguistics and linguistic theory – Papers from the twentieth International Conference on English Language Research on Computerized Corpora (ICAME 20), 35–49. Amsterdam and Atlanta, GA: Rodopi.
Berglund, Ylva. 2005. Expressions of future in present-day English: A corpus-based approach (Acta Universitatis Upsaliensis 126). Uppsala: Uppsala Universitet.Google Scholar
Biber, Douglas, Johansson, Stig, Leech, Geoffrey N., Conrad, Susan & Finegan, Edward. 2021. Grammar of spoken and written English, 2nd edn. Amsterdam: John Benjamins. https://doi.org/10.1075/z.232
Bolinger, Dwight. 1981. Consonance, dissonance, and grammaticality: The case of wanna. Language & Communication 1(2–3), 189–206. https://doi.org/10.1016/0271-5309(81)90012-4
Budts, Sara & Petré, Peter. 2016. Reading the intentions of be going to. On the subjectification of future markers. Folia Linguistica 37(1), 1–32.
Bybee, Joan L. & Hopper, Paul J. (eds.). 2001. Frequency and the emergence of linguistic structure. Amsterdam: John Benjamins.
Bybee, Joan, Perkins, Revere & Pagliuca, William. 1994. The evolution of grammar: Tense, aspect, and modality in the languages of the world. Chicago: University of Chicago Press.
Catasso, Nicholas. 2012. Where is going to going to go? A generative proposal between diachrony and synchrony. International Journal of Linguistics 4(1), 90–129. https://doi.org/10.5296/ijl.v4i1.1413
Col, Gilles & Duchet, Jean-Louis. 2000. Éléments pour une définition des valeurs de gonna en anglais, à partir du corpus électronique COLT. Les Cahiers FoReLL – Formes et Représentations en Linguistique et Littérature (Complexité Syntaxique et Sémantique: Études de Corpus) 14, 167–86.
Col, Gilles & Duchet, Jean-Louis. 2001. Forme non stable et grammaticalisation: Le cas de gonna en anglais contemporain. In Col, Gilles & Roulland, Daniel (eds.), Grammaticalisation 2 – Concepts et cas (Travaux linguistiques du CERLICO 14). Rennes: Presses universitaires de Rennes.
Collins, Peter. 2009. Modals and quasi-modals in English. Leiden: Rodopi. https://doi.org/10.1163/9789042029095
Danchev, Andrei & Kytö, Merja. 1994. The construction be going to + infinitive in Early Modern English. In Kastovsky, Dieter (ed.), Studies in Early Modern English, 59–78. Berlin: De Gruyter Mouton. https://doi.org/10.1515/9783110879599.59
Daugs, Robert. 2017. On the development of modals and semi-modals in American English in the 19th and 20th centuries. Studies in variation, contacts and change in English. Research Unit for Variation, Contacts and Change in English (VARIENG), University of Helsinki 19. https://varieng.helsinki.fi/series/volumes/19/daugs/
Daugs, Robert. 2021. Contractions, constructions and constructional change: Investigating the constructionhood of English modal contractions from a diachronic perspective. In Hilpert, Martin, Cappelle, Bert & Depraetere, Ilse (eds.), Modality and diachronic Construction Grammar, 12–52. Amsterdam: John Benjamins.
Davies, Mark. 2008–. The Corpus of Contemporary American English. www.english-corpora.org/coca/ (accessed 1 September 2024).
Davies, Mark. 2010. The Corpus of Historical American English. www.english-corpora.org/coha/ (accessed 1 September 2024).
Disney, Stephen J. 2009. The grammaticalisation of ‘be going to’. Newcastle Working Papers in Linguistics 15, 63–81.
Du Bois, John W., Chafe, Wallace L., Meyer, Charles, Thompson, Sandra A., Englebretson, Robert & Martey, Nii. 2000–5. Santa Barbara Corpus of Spoken American English, parts 1–4. Philadelphia, PA: Linguistic Data Consortium. www.linguistics.ucsb.edu/research/santa-barbara-corpus (accessed 1 September 2024).
Gesuato, Sara & Facchinetti, Roberta. 2011. GOING TO V vs GOING TO BE V-ing: Two equivalent patterns? ICAME Journal 35, 59–94.
Gries, Stefan Thomas. 2024. Coll.analysis 4.1. A script for R to compute perform collostructional analyses. www.stgries.info/teaching/groningen/index.html
Haspelmath, Martin. 1999. Why is grammaticalization irreversible? Linguistics 37(6), 1043–68. https://doi.org/10.1515/ling.37.6.1043
Haspelmath, Martin. 2004. On directionality in language change with particular reference to grammaticalization. In Fischer, Olga, Norde, Muriel & Perridon, Harry (eds.), Up and down the cline – The nature of grammaticalization, 17–44. Amsterdam: John Benjamins. https://doi.org/10.1075/tsl.59.03has
Herring, Susan C. 2012. Grammar and electronic communication. In Chapelle, Carol A. (ed.), The encyclopedia of applied linguistics. Hoboken, NJ: Wiley-Blackwell.
Hoffmann, Thomas & Trousdale, Graeme. 2013. Construction Grammar: Introduction. In Hoffmann, Thomas & Trousdale, Graeme (eds.), The Oxford handbook of Construction Grammar, 1–12. Oxford: Oxford University Press.
Hopper, Paul J. & Traugott, Elizabeth Closs. 2003. Grammaticalization, 2nd edn. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9781139165525
Huddleston, Rodney & Pullum, Geoffrey K. et al. 2002. The Cambridge grammar of the English language. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781316423530
Hundt, Marianne. 2004. Animacy, agentivity, and the spread of the progressive in Modern English. English Language and Linguistics 8(1), 47–69. https://doi.org/10.1017/S1360674304001248
Krug, Manfred G. 2000. Emerging English modals: A corpus-based study of grammaticalization. Berlin: De Gruyter Mouton. https://doi.org/10.1515/9783110820980
Krug, Manfred G. 2001. Frequency, iconicity, categorization: Evidence from emerging modals. In Bybee & Hopper (eds.), 309–36.
Krug, Manfred G. 2011. Auxiliaries and grammaticalization. In Heine, Bernd & Narrog, Heiko (eds.), The Oxford handbook of grammaticalization, 547–58. Oxford: Oxford University Press.
Leech, Geoffrey. 2004. Meaning and the English verb. London: Pearson Education.
Leech, Geoffrey. 2012. Modality on the move: The English modal auxiliaries 1961–1992. In Facchinetti, Roberta, Palmer, Frank & Krug, Manfred (eds.), Modality in contemporary English, 223–40. Berlin: De Gruyter Mouton.
Leech, Geoffrey. 2013. Where have all the modals gone? An essay on the declining frequency of core modal auxiliaries in recent standard English. In Marín-Arrese, Juana I., Carretero, Marta, Hita, Jorge Arús & van der Auwera, Johan (eds.), English modality: Core, periphery and evidentiality, 95–116. Berlin: De Gruyter Mouton. https://doi.org/10.1515/9783110286328.95
Littman, Justin & Wrubel, Laura. 2019. Climate Change Tweets Ids. Harvard Dataverse. https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/5QCCUU (accessed 1 September 2024).
Lorenz, David. 2012a. Contractions of English semi-modals: The emancipating effect of frequency. PhD Dissertation, Albert-Ludwigs-Universität Freiburg.
Lorenz, David. 2012b. The perception of gonna and gotta – A study of emancipation in progress. In Botinis, Antonis (ed.), ExLing 2012: Proceedings of the 5th tutorial and research workshop on Experimental Linguistics, 77–80. Athens: ExLing Society.
Lorenz, David. 2013. Is gonna a word? From reduction to emancipation. In Hasselgård, Hilde, Ebeling, Jarle & Ebeling, Signe Oksefjell (eds.), Corpus perspectives on patterns of lexis, 133–52. Amsterdam: John Benjamins. https://doi.org/10.1075/scl.57.11lor
Lorenz, David. 2020. Converging variations and the emergence of horizontal links: To-contraction in American English. In Sommerer, Lotte & Smirnova, Elena (eds.), Nodes and networks in diachronic Construction Grammar, 244–74. Amsterdam: John Benjamins.
Lorenz, David & Tizón-Couto, David. 2016. Perception of reduced words: Chunking and predictability. In Botinis, Antonis (ed.), ExLing 2016: Proceedings of 7th tutorial and research workshop on Experimental Linguistics, 99–102. Athens, Greece: ExLing Society.
Lorenz, David & Tizón-Couto, David. 2020. Not just frequency, not just modality: Production and perception of English semi-modals. In Hohaus, Pascal & Schulze, Rainer (eds.), Re-assessing modalising expressions: Categories, co-text, and context (Studies in Language Companion Series 216), 79–108. Amsterdam: John Benjamins. https://doi.org/10.1075/slcs.216.04lor
Lorenz, David & Tizón-Couto, David. 2024. Coalescence and contraction of V-to-Vinf sequences in American English – Evidence from spoken language. Corpus Linguistics and Linguistic Theory 20(1), 1–36. https://doi.org/10.1515/cllt-2015-0067
Lyons, John. 1968. Introduction to theoretical linguistics. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9781139165570
Machová, Dagmar. 2015. The degree of grammaticalization of gotta, gonna, wanna and better: A corpus study. Topics in Linguistics 15(1).
Mair, Christian. 1997. The spread of the going to-future in written English: A corpus-based investigation into language change in progress. In Hickey, Raymond & Puppel, Stanisłav (eds.), Language history and linguistic modelling: A Festschrift for Jacek Fisiak on his 60th birthday, vol. I: Language history, 1537–44. Berlin and New York: Mouton de Gruyter.
McCulloch, Gretchen. 2019. Because Internet: Understanding the new rules of language. New York: Riverhead Books.
Mikkelsen, Olaf & Morin, Cameron. 2025. Register as a source of non-equivalent contracted constructions: going to and gonna in British English. English Language and Linguistics (special issue, Cameron Morin, Agnes Celle & Alessandro Basile (eds.), Cognitive approaches to variation and change in the English modal domain) 29(3). https://doi.org/10.1017/S1360674325100373
Palmer, Frank R. 1988. The English verb. London and New York: Longman.
Perez, Aveline. 1990. Time in motion: Grammaticalisation of the be going to construction in English. La Trobe University Working Papers in Linguistics 3, 49–64.
Petré, Peter & Van de Velde, Freek. 2018. The real-time dynamics of the individual and the community in grammaticalization. Language 94(4), 867–901.
Pierrehumbert, Janet B. 2001. Exemplar dynamics: Word frequency, lenition and contrast. In Bybee & Hopper (eds.), 137–58.
Poplack, Shana & Tagliamonte, Sali. 1999. The grammaticization of going to in (African American) English. Language Variation and Change 11(3), 315–42. https://doi.org/10.1017/S0954394599113048
Pullum, Geoffrey K. 1997. The morpholexical nature of English to-contraction. Language 73(1), 79–102. https://doi.org/10.2307/416594
Quirk, Randolph, Greenbaum, Sidney, Leech, Geoffrey & Svartvik, Jan. 1985. A comprehensive grammar of the English language. London and New York: Longman.
Szmrecsanyi, Benedikt. 2003. Be going to versus will/shall: Does syntax matter? Journal of English Linguistics 31(4), 295–323. https://doi.org/10.1177/0075424203257830
Tagliamonte, Sali A. 2011. Variationist sociolinguistics: Change, observation, interpretation. Oxford: John Wiley & Sons.
Tizón-Couto, David & Lorenz, David. 2018. Realisations and variants of have to: What corpora can tell us about usage-based experience. Corpora 13(3), 371–92. https://doi.org/10.3366/cor.2018.0154
Traugott, Elizabeth Closs & Trousdale, Graeme. 2013. Constructionalization and constructional changes. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199679898.001.0001

Figure 1. Occurrences of going to V and gonna in COHA, from 1890 to 2010 (see note 2)

Table 1. going to V and gonna in CCTweets, the SBC and COCA

Table 2. Typology of variants in the be going to / gonna paradigm in the SBC and CCTweets (normalized frequencies per million words)

Figure 2. Bar plots showing the distribution of subject per variant in the SBC and CCTweets (logarithmic scale on the y-axis)

Figure 3. Presence, absence and form of be in both corpora, per variant (relative frequencies, %) (see note 5)

Figure 4. Presence of negation in both corpora, per variant (relative frequencies, %)

Table 3. Distribution of verbs with going to and gonna across both corpora (normalized frequencies per 1,000 words)

Figure 5. Bar plots showing the distribution of verbs for going to and gonna across both corpora (normalized frequencies per 1,000 words)

Figure 6. Plot showing the results of the distinctive collexeme analysis for the most frequent verbs with going to and gonna