Paper Highlights
• Introduces the “Author Fluency Task” (AFT) to assess print exposure
• AFT is more correlated with L2 vocabulary than the Author Recognition Test (ART)
• Print exposure measures vary in utility across L1/L2 groups
• L2 performance only partially explained by verbal fluency skill
• Recommends AFT for measuring L2 print exposure
1. Background
1.1. Assessment of L2 print exposure
Print exposure predicts individual differences in component skills of reading in L1 speakers (Mol & Bus, 2011; Moore & Gordon, 2015), yet the amount an individual reads for pleasure is almost necessarily a function of the pleasure derived from reading. This leads to the “Matthew effect”: a growing gap between the rich and poor in reading skills (Cunningham & Stanovich, 1998; Mol & Bus, 2011; Stanovich, 1986). An adept reader may find the practice both enjoyable and rewarding, reaping the benefits of increased exposure. Less proficient readers, however, may find reading more frustrating than gratifying, and avoid picking up books in their free time – consequently, their skills may stagnate, making reading even less enjoyable.
Reading difficulties (and a concomitant skill gap) may be further compounded in L2, as learners often struggle with more obscure vocabulary than they encounter in daily speech. Surmounting this difficulty is crucial for second language acquisition (SLA), however, because a significant portion of L2 vocabulary is acquired incidentally through reading (Restrepo-Ramos, 2015). Naturally, learners need significant language exposure to reach their full potential in L2, meaning researchers require precise psychometric instruments to quantify L2 speakers’ exposure to print.
The Author Recognition Test (ART; Stanovich & West, 1989) is the standard test of print exposure, in which participants are asked to select authors from a checklist of names. Foils (i.e., fake names) are usually included to discourage guessing. As a proxy measure for reading experience, ART is well-validated in L1 populations as a predictor of individual differences in a variety of reading skills (McCarron, 2026), including vocabulary (Dąbrowska, 2018), word recognition (Chateau & Jared, 2000), spelling (Stanovich & West, 1989), reading frequency (Acheson et al., 2008; Moore & Gordon, 2015), sentence processing (Acheson et al., 2008), oral language skills (Acheson et al., 2008; Mol & Bus, 2011), reading comprehension, and academic achievement (Mol & Bus, 2011).
Despite this, ART faces concerns about its reliability and validity in L2 populations (McCarron & Kuperman, 2021; Vermeiren & Brysbaert, 2023). Essentially, L2 speakers generally know very few authors on ART – the question is whether they do not read enough in L2, or if they are simply not reading the kinds of authors on ART. This distinction is not trivial, as there is substantially more variation in the quantity and kind of L2 compared to L1 exposure (Flege, 2008, 2019; Gullifer & Titone, 2020), meaning the selection of authors who are representative of L1 reading experience may not index the same latent variable in L2. If so, it may be that a valid and reliable measure of reading experience in L2 must acknowledge and exploit this variation (in the parlance of computer programmers) as a “feature, not a bug”.
Alternatively, it may be that extensive L2 reading for pleasure does not materially contribute to second language proficiency. Given the vital role of reading for L1 proficiency, this may seem unlikely. However, due to the preponderance of language input online, one might wonder whether Internet text exposure is a better predictor of language skill than print exposure for most speakers. In fact, recent L1 research suggests the opposite – when comparing effects of print exposure, years of postsecondary education, reading attitudes and website exposure in a study of fluent Norwegian speakers, Strømsø (2024) found only print exposure predicted reading comprehension scores. Moreover, a high degree of online exposure negated the positive effects of print exposure for participants with a high degree of both. Although not yet replicated in L2, this finding demonstrates that book reading is a uniquely beneficial kind of exposure.
There are reasons to expect book reading to be a better source of language input compared to the Internet. Online texts found on social media and discussion forums tend to have more in common with spoken rather than written language, being more conversational and informal (Johns et al., 2020; Snow, 2010), and there are critical distinctions between the two modalities. Compared to speech, written language features greater lexical density and diversity (Berman & Nir, 2010; Roland et al., 2007), as well as longer sentences, and correspondingly, more complex syntax (Biber, 1988), including more passive constructions (Dąbrowska & Street, 2006) and relative clauses (Roland et al., 2007). This follows from the disembodied nature of text, which must construct context and meaning ex nihilo, whereas spoken language can create meaning through reciprocity and shared context (Clark, 2020; Snow, 2010). Corpus studies of children’s books have also revealed they contain more relative clauses, more complex syntax and greater lexical density and diversity compared to both child-directed speech (Dawson et al., 2021; Hsiao et al., 2022; Nation et al., 2022) and adult television transcripts (Cunningham & Stanovich, 1998). Books are thus not only qualitatively different from other sources of input – they also distinguish themselves in the very early stages of language development.
1.2. A semantic fluency measure of L2 print exposure
The primary advantage of proxy measures like ART is that they avoid potential social desirability biases associated with self-report measures such as reading surveys (West et al., 1993). Yet a standard ART does not indicate whether recognising an author’s name reflects personal reading experience or general reading exposure (“primary versus secondary print knowledge”; Martin-Chang & Gould, 2008). Additionally, some research suggests ART reflects general cultural knowledge rather than reading experience specifically (Moore & Gordon, 2015; Vermeiren et al., 2022). Ideally, researchers would use a test that measures the latter rather than the former, to the extent that these concepts can be extricated.
The fundamental assumption of ART is that knowledge of author names offers a reliable proxy for print exposure. But because L2 speakers have different kinds of cultural exposure to their target language, and consequently may encounter different authors when reading in L2, a potential alternative might be to simply ask second language speakers to name L2 authors who come to mind. Such measures of semantic fluency (SF) involve listing as many items as possible from a given category in a set time, with one point for each unique and valid item. SF tasks are often used in estimating the advancement of neurodegenerative diseases such as Alzheimer’s disease and other forms of dementia (Macoir et al., 2006; Troyer et al., 1997, 1998). However, SF tasks have also been used in L2 studies, where bilinguals typically generate fewer category items and proper names than monolinguals (e.g., Gollan et al., 2002). This undoubtedly relates partially to the speed of lexical access, but also to fewer encounters with L2 exemplars; naturally, the two concepts are interrelated. Yet this group difference is not deterministic, as evidence shows that proficient bilingual adults can perform equivalently to monolinguals on SF tasks (Friesen et al., 2015). An “Author Fluency Task” (AFT) would thus rely on the assumption that individuals with greater L2 print exposure can also access more author names extemporaneously, consistent with the “principle of likely need” (Jones et al., 2017).
What is the benefit of developing a new proxy measure of print exposure, as opposed to simply creating an ART for L2 populations? Although an L2 ART may be more reliable than one that has been validated with L1 speakers, it would nevertheless require calibration for each L2 population evaluated, and the scores would not be directly comparable between groups. Another advantage of AFT over ART is that it might level the playing field for L2 speakers, providing all participants equal time to demonstrate their print knowledge, whereas L2 speakers may not have encountered as many authors on ART. Author fluency and recognition also rely on very different skills, with AFT requiring an extensive search of explicit memory surrounding reading experience, and ART arguably a less demanding task, given that it requires participants only to identify familiar author names as opposed to retrieving them independently. Comparatively, AFT might be more difficult, but tasks that target productive rather than receptive use of language are more useful for advanced learners of English (Webb & Kagimoto, 2009), our primary population of interest. Granted, just like recognising an author on ART, naming an author does not necessarily reflect personal reading experience. Nevertheless, there is reason to expect that author names that are recalled could be more indicative of primary print exposure than those that are recognised. Recognition tasks like ART have been argued to reflect “marginal knowledge”, or information that is stored in memory but inaccessible unless presented (Berger et al., 1999; Cantor et al., 2015), suggesting it is not deeply encoded. Semantic fluency, in contrast, primarily indexes the semantic organisation of memory (Lehtinen et al., 2023), and by necessity, this requires a substantial body of well-integrated information.
To evaluate AFT as a measure of L2 print exposure – and to compare with ART – it would need to be validated using outcome measures for L2 vocabulary that are often acquired through extensive reading experience. For this reason, we decided to use measures of formulaic language.
1.3. Formulaic (and functional) language in L2
Some vocabulary is especially difficult for L2 speakers to acquire and use naturally. Discourse connectives, which link ideas across sentence clauses, are one such example. They are often associated with written language (in particular, academic writing; Biber, 2006) and may be composed of either single (“consequently”, “nevertheless”) or multiple words (“as long as”, “in addition”, “on the other hand”, etc.). In the latter case, such connectives are clear examples of formulaic language, typically defined as expressions comprising at least two words that are processed as a single unit (Wray, 2002, 2006). We contend, however, that single-word connectives may also be considered formulaic in the sense that they encode a set of “operating instructions” for interpreting the coherence relations linking separate clauses (Andersson, 2016; Li et al., 2017). Many connectives also blur the line between single and multi-word items, as they often begin life as lexical bundles but become lexicalised as single words over time due to their frequent co-occurrence and entrenchment (e.g., “indeed”, “furthermore”, “moreover”, “nevertheless”, etc.). We argue that this highlights how the distinction between single and multi-word processing is largely arbitrary, as put forth by “single-system” models of language (e.g., Arnon & Christiansen, 2017; Bybee, 2007). Therefore, we consider connectives to be formulaic language under a broader, usage-based perspective that emphasises their pragmatic function as linguistic “prefabs” (Bybee, 2006). Connectives, like other constructions, are “partially schematic – that is, they have positions that can be filled by a variety of words or phrases” (Bybee, 2010, p. 25).
For connectives, these “open positions” are clauses that must fulfil the conditions of their coherence relations. For example, “whereas” requires a contrastive clause, e.g., “whereas x [statement], y [contrast]”, and “although” requires a concession relation, e.g., “although x [statement], y [concession]”. Similar to other kinds of formulaic language, connectives require speakers to “chunk” together words or phrases into meaningful sequences, and this is often developed through substantial implicit learning (Ellis, 1996).
Because they lack a strict lexical definition (Van Silfhout et al., 2015; Zufferey et al., 2015), connectives can be difficult to acquire through explicit teaching. One explicit approach for teaching L2 connectives is simply to provide an approximate L1 equivalent; yet the closest translation in L1 may not necessarily encode the same relations in L2 in all cases (Zufferey & Gygax, 2017). This can be problematic for second language speakers who often filter L2 through the lens of L1 – particularly during early stages of acquisition – relying on an inconsistent equivalence between L1 and L2 vocabulary items (Ringbom, 2016). Consequently, connectives pose a serious challenge, with even very advanced L2 speakers often struggling to understand how and when to use them (Wetzel et al., 2020).
Similarly, word collocations (e.g., “weak tea” preferred over “feeble tea”) are another obstacle for L2 learners. Although these can be learned through explicit instruction, they are more challenging than learning single words (Peters, 2014, 2016), and their virtually endless number means they are likely an inefficient use of targeted language instruction, which largely focuses on individual words (Schmitt, 2010). However, collocations can be acquired incidentally through statistical learning from input, both in L1 and L2 (Pellicer-Sánchez, 2017; Sonbul & Schmitt, 2013; Webb et al., 2013), and the more language input one receives, the more these associations are formed. Accordingly, L2 speakers process L2 collocations more slowly than L1 speakers (Siyanova & Schmitt, 2008) and use fewer collocations in L2, which also tend to be congruent with how words are paired in their L1 (Granger, 1998).
Whereas the importance of selecting the correct connective may be clear, the significance of collocation knowledge may be less evident. After all, what difference is there between “raise prices” and “lift prices”? If “raise” and “lift” are essentially synonymous, surely either one will serve the same purpose. But not all word pairings are created equal, and set phrases are subject to certain preferential selection constraints. Indeed, although speakers of a language may correctly infer the meaning of a novel expression, formulaic language is processed more quickly and accurately (Ellis et al., 2008; Hallin & Van Lancker Sidtis, 2017). Therefore, researchers have posited that this preference for formulaic language functions to ease processing burdens between communicators (Wray, 2002).
What counts as “formulaic”, however, is largely (though not solely) a matter of frequency of occurrence (Siyanova-Chanturia et al., 2011), and L2 learners are even more sensitive to frequency for formulaic language compared to natives (Ellis et al., 2008). Corpus studies reveal that written and spoken language are also distinct in their use of formulaic language; certain collocations are more common in writing, whereas others appear more frequently in speech (Gablasova et al., 2017; Shin, 2007), connective frequencies vary by modality and register (Andersson & Sundberg, 2021) and connective use is more varied in writing (Tskhovrebova et al., 2022). Given that L2 learners have comparatively less exposure, and are more likely to interpret formulaic language in L2 serially (i.e., word-by-word) rather than processing it in meaningful “chunks” as in L1 (Conklin & Schmitt, 2012), connectives and collocations present a significant hurdle. Accordingly, L2 writing and speech are often characterised by an overreliance on certain connectives (Wetzel et al., 2020), and feature less formulaic language in general (Granger, 1998; Pérez-Llantada, 2014). Formulaic expressions, however, are often less about the meaning of individual words than understanding how words relate to each other. As J.R. Firth put it, echoing Wittgenstein, “you shall know a word by the company it keeps” (1957, p. 11).
Although we classify both connectives and collocations as formulaic language, we reiterate that there are critical distinctions between the two which are important for interpreting our results. If collocations are a more canonical example of formulaic language, connectives are perhaps more “functional” than formulaic. This is due to the kinds of meanings they convey. Connectives encode procedural meaning (Blakemore, 2002) and guide inferences between clauses, whereas collocations encode conceptual meaning, reflecting learned associations between co-occurring words. Connectives are further complicated by their polyfunctionality, as a particular connective may perform a different role depending on semantic or pragmatic context (as in the French “en effet”, which can convey causal or confirmational coherence relations; Zufferey & Gygax, 2017). In addition, temporal prepositions such as “since” or “while” – already challenging for many L2 speakers – can double as discourse connectives; compare, for example, “since he had surgery, he hasn’t come hiking” and “since he had surgery, he can’t come hiking”. This polyfunctionality requires speakers to distinguish subtle gradations in relational meaning that are not required for collocations, and which span separate clauses. However, both connectives and collocations require substantial experience to master, and print exposure likely helps speakers attune to the statistical regularities that inform their use. For these reasons, we use both in this study as a validation of AFT.
1.4. Contributions of L1- and L2-specific skills for L2 learning
Although the importance of L1 input is well-accepted, the degree of influence of L1- versus L2-specific skills in SLA remains a matter of debate. Language transfer theories (Baker et al., 2011; Cummins, 1979; Sparks et al., 2012) posit that greater L1 proficiency affords a proportionate degree of linguistic knowledge in L2, and while there is considerable evidence for this (Sparks, 2023), some have argued it is limited to more general language skills such as phonology and pragmatics rather than syntax and vocabulary (Verhoeven, 1994). For our present discussion, the role of L1 print exposure is particularly relevant, and there is evidence of its influence on L2 reading skills, including decoding and comprehension (Sparks et al., 2012). One study showed that while L1 German print exposure (as measured by a German ART) predicted L2 French connectives knowledge, a French ART did not (Wetzel et al., 2020). Although this may be attributable to language interdependence, we contend that the findings are expected for this population of adolescent beginner L2 speakers, who generally have little L2 exposure – as the authors point out, their participants knew very few of the second-language authors on ART. Since the effect of print exposure is cumulative over a lifetime, a more interesting case might be to compare L1 and L2 print exposure measures in an older, more proficient L2 population. This is what we endeavoured to do in the present study.
1.5. Present study
This study received ethics approval [reference R77364/RE002] and was pre-registered (https://osf.io/nsduz/). We tested whether L1 French/L2 English print exposure (assessed by AFT and ART in both languages) is associated with individual differences in knowledge of English connectives and collocations, even when accounting for a standard proficiency measure in both languages. Our research questions were intended to assess the utility of AFT as a novel measure of print exposure:
1) Does an L2 AFT outperform ART as a predictor of L2 vocabulary knowledge? Does either measure explain additional variance not accounted for by proficiency?
2) Do AFT/ART perform differently by vocabulary measure (collocations versus connectives)?
3) Does L1 or L2 print exposure better predict performance on L2 vocabulary tasks?
For our L2 English cohort, we hypothesised that:
1) L1/L2 LexTALE scores would both positively predict connectives and collocations scores.
2) L2 (but not L1) AFT scores would positively predict connectives scores when controlling for LexTALE.
3) L1/L2 ART (but not AFT) scores would positively predict collocations scores when controlling for LexTALE.
For comparison, we hypothesised the same pattern for our L1 English cohort; i.e., that the English ART would predict collocation scores, and AFT would predict connectives. Essentially, we predicted that L2 print exposure, measured by AFT, would reliably predict connectives scores when controlling for LexTALE, but only ART would predict additional variance for collocations scores. Our rationale was that recognising authors might recruit similar skills as those required for recognising collocations. In contrast, we posited that the L2 English AFT, reflecting explicit memory of L2 reading experience, would be associated with English connectives, which require careful consideration to evaluate their functions. The L1 French AFT, however, which reflects L1 reading, was not anticipated to predict L2 vocabulary. In this way, we aimed to determine how L1 and L2 print exposure variously contribute to L2 language skills.
2. Methods
2.1. Participants
Prior to data collection, power analysis was carried out using G*Power (Faul et al., 2007). For 0.8 power to detect a small effect size of .15 at a .05 alpha error probability, we obtained a recommended sample size of n = 55. Sixty L1 French/L2 English participants (M age = 31.13, 32 women) were recruited through Prolific (2024) to complete a single session on the online experimental research platform Gorilla (Anwyl-Irvine et al., 2020). Participants who provided informed consent and completed the study were reimbursed £6.67 each. Recruitment was limited to participants between 18 and 75 years old who spoke French natively, currently lived in France, spoke and read English fluently at an intermediate-to-advanced level and had normal or corrected-to-normal vision. Self-rated L2 proficiency (0–5 Likert scale) was high, with the majority scoring themselves between 3 (“Professional Working Proficiency”) and 4 (“Full Professional Proficiency”; see Supplementary Figure S2).
We also recruited 60 L1 English speakers (M age = 39.42, 37 women) through Prolific. Selection criteria mirrored those of the L2 group, but with native English speakers living in the United Kingdom. Below, we primarily restrict our analyses to the L2 cohort, permitting us to compare the relative contributions of L1 and L2 measures. However, we also include models from L1 English speakers to illustrate the differential predictions made by our print exposure measures.
2.1.1. Procedure
Participants completed 1) the demographics questionnaire, followed by 2) the English and French AFT, 3) the English and French LexTALEs and ARTs, 4) the connectives and collocations tasks and 5) a motivation survey. Task order was counterbalanced within steps 2, 3 and 4 due to task similarities. The L1 participant procedure was identical, excluding French tasks.
2.2. Materials
2.2.1. Author fluency task
For the English AFT, participants listed as many author names as possible in three minutes. Instructions asked participants to provide names of authors who had been published in English and who were known primarily for their writing. Participants typed names into a text field. Due to the reportedly difficult nature of the task, names were scored leniently by the first author. Each name was verified using an online database of over 6,000,000 author names (Internet Archive, 2022) and Google to determine possible misspellings, which were corrected. Validated author names were rated 1, non-authors −1 and indeterminate names 0. The coded ratings were then summed for each participant’s list of names. For example, a hypothetical participant listing “J.R.R. Tolkien, Margaret Atwood, Kurt Vonnegut, Conan O’Brien, J. Smith” (3 authors, 1 non-author, 1 indeterminate) would receive a score of 3 − 1 + 0 = 2. Selection statistics are shown in Supplementary Tables S7/S8, and group differences in score distributions are visualised in Supplementary Figure S4.
The French AFT was identical in procedure and scoring, but with French instructions. Accordingly, participants were asked to provide names of authors who had been published in French. Selection statistics are shown in Supplementary Table S9.
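The AFT scoring rule described above can be sketched in a few lines of Python. This is only an illustration, not the authors' scoring procedure: in the study, names were verified and coded by hand, so the `RATINGS` lookup table, the function name `aft_score`, the case-insensitive matching, the single counting of repeated names and the treatment of unlisted names as indeterminate are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical coded ratings, following the scheme in the text:
# validated author = 1, non-author = -1, indeterminate = 0.
RATINGS: Dict[str, int] = {
    "j.r.r. tolkien": 1,
    "margaret atwood": 1,
    "kurt vonnegut": 1,
    "conan o'brien": -1,
    "j. smith": 0,
}


def aft_score(names: List[str], ratings: Dict[str, int]) -> int:
    """Sum the coded ratings over one participant's list of names."""
    seen = set()
    total = 0
    for name in names:
        key = name.strip().lower()
        if key in seen:
            # Assumption: a repeated name is counted only once.
            continue
        seen.add(key)
        # Assumption: a name missing from the lookup is indeterminate (0).
        total += ratings.get(key, 0)
    return total


example = [
    "J.R.R. Tolkien",
    "Margaret Atwood",
    "Kurt Vonnegut",
    "Conan O'Brien",
    "J. Smith",
]
print(aft_score(example, RATINGS))  # 3 - 1 + 0 = 2
```

The example list reproduces the hypothetical participant from the text (three authors, one non-author, one indeterminate name).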
2.2.2. Author recognition test
The English ART was taken from Vermeiren et al. (2022), featuring 60 author names and 30 foil names. Names were presented one at a time in random order, and participants indicated with keyboard responses whether each was an author or not. Correct author selections increased scores by 1 point, incorrect selections decreased scores by 1 and no penalty was incurred for not indicating an existing author. A full list of author names, mean response times and selection statistics is found in Supplementary Tables S10/S11, and these are illustrated in Supplementary Figures S6/S7. Group differences in score distributions are visualised in Supplementary Figure S5.
The French ART followed the same procedure and scoring logic, but participants were provided instructions in French. This version was taken from Zufferey and Gygax (2020), and features 40 author names and 40 foil names (Supplementary Table S12).
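The ART scoring logic (one point per correctly selected author, one point deducted per selected foil, no penalty for a missed author) might be expressed as follows. This is a minimal sketch under stated assumptions: the function name `art_score`, the data layout and the placeholder names are hypothetical and are not taken from the actual test materials.

```python
from typing import Dict, Set


def art_score(selected: Dict[str, bool], authors: Set[str]) -> int:
    """Hits minus false alarms; unselected names never change the score."""
    hits = 0
    false_alarms = 0
    for name, chose_author in selected.items():
        if not chose_author:
            continue  # no penalty for failing to indicate a real author
        if name in authors:
            hits += 1
        else:
            false_alarms += 1
    return hits - false_alarms


# Placeholder stimuli for illustration only.
AUTHORS = {"Margaret Atwood", "Kurt Vonnegut", "J.R.R. Tolkien"}
responses = {
    "Margaret Atwood": True,   # hit
    "Kurt Vonnegut": True,     # hit
    "J.R.R. Tolkien": False,   # miss, no penalty
    "Foil Name One": True,     # false alarm
    "Foil Name Two": False,    # correct rejection
}
art_example = art_score(responses, AUTHORS)
print(art_example)  # 2 hits - 1 false alarm = 1
```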
2.2.3. LexTALE
The English LexTALE (Lemhöfer & Broersma, 2012) is a lexical decision task containing 40 words, 20 non-words and 3 filler words. Items were presented in random order, and participants responded with keyboard presses to indicate whether each was an English word or not. Scores were calculated as the percentage of correct selections for words and non-words out of the total (Supplementary Table S13). Supplementary Figure S8 visualises group score differences.
The French LexTALE (Brysbaert, 2013) followed the same procedure and scoring logic, but contained 56 French words and 28 non-words. Instructions were provided in French, and scores were calculated as the percentage of correct selections out of the total (Supplementary Table S14).
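As a rough illustration of the percentage-correct scoring described for both LexTALE versions, the sketch below counts a response as correct when a word is accepted or a non-word is rejected. The function name `percent_correct`, the data layout and the four placeholder items are hypothetical, and the sketch assumes that filler items have already been removed before scoring.

```python
from typing import Dict


def percent_correct(responses: Dict[str, bool], is_word: Dict[str, bool]) -> float:
    """Percentage of items where the yes/no response matches lexical status."""
    correct = sum(
        1 for item, said_word in responses.items() if said_word == is_word[item]
    )
    return 100.0 * correct / len(responses)


# Placeholder items for illustration only (not LexTALE stimuli).
IS_WORD = {"table": True, "river": True, "blorf": False, "snerp": False}
answers = {"table": True, "river": False, "blorf": False, "snerp": False}
lextale_example = percent_correct(answers, IS_WORD)
print(lextale_example)  # 3 of 4 correct = 75.0
```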
2.2.4. Discourse connectives task
This task was adapted and translated to English from the original version in Wetzel et al. (2020), which was presented in French to L1 German learners of French. This is a sentence cloze task that asks participants to complete a coherent sentence by selecting the appropriate connective from six options. For example:

Each connective falls into one of six coherence relations denoting the logical relationships specified by each connective, e.g., “whereas” encodes a “contrast” relation. For each sentence, competitors were selected from each of the other relations. High and low frequency connectives were selected using the corpus English Web 2020 (“enTenTen20”) in corpus software SketchEngine (Kilgarriff et al., 2014). The full list of stimuli (Supplementary Table S1) and corpus frequency statistics (Supplementary Table S2) can be found in the Supplementary Materials.
2.2.5. Collocations task
The Words That Go Together task was used to assess knowledge of English collocations (Dąbrowska, 2014). Participants read a list of five word-pair phrases and were instructed to select the one that was most familiar or natural. Accuracy scores were calculated as percentages of correct selections. The full list of stimuli is provided in Supplementary Table S3.
2.2.6. Semantic fluency
After reviewing the initial findings, we conducted an additional analysis (pre-registered in an update) using a test of general semantic fluency in English. This followed the same format as AFT, but with three different categories of items: “animals”, “grocery items” and “public figures” (i.e., famous people, including celebrities, politicians, etc.). Of the original 60 L2 participants, 48 returned two months later. Participants were given one minute for each category, for a total of three minutes, equivalent to AFT. Unlike AFT, participants were unable to complete the task early, which may have increased the number of items provided. Items were scored by the first author and calculated as the sum of unique and valid items per category. Score distributions by sub-task are visualised in Supplementary Figure S11.
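The per-category scoring of the semantic fluency task (the sum of unique and valid items in each category) could be sketched as below. In the study, validity was judged by a human rater; here, hypothetical sets of accepted items stand in for that judgement, and the function name `fluency_scores` and the example responses are assumptions made for illustration.

```python
from typing import Dict, List, Set


def fluency_scores(
    listed: Dict[str, List[str]], valid: Dict[str, Set[str]]
) -> Dict[str, int]:
    """Count unique, valid items per category (case-insensitive)."""
    scores = {}
    for category, items in listed.items():
        unique = {item.strip().lower() for item in items}
        scores[category] = len(unique & valid.get(category, set()))
    return scores


# Hypothetical accepted items and one participant's responses.
VALID = {
    "animals": {"dog", "cat", "horse"},
    "grocery items": {"milk", "bread", "eggs"},
}
listed = {
    "animals": ["dog", "Cat", "dog", "table"],  # one repeat, one invalid item
    "grocery items": ["milk", "bread"],
}
sf_example = fluency_scores(listed, VALID)
print(sf_example)                # {'animals': 2, 'grocery items': 2}
print(sum(sf_example.values()))  # 4
```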
2.2.7. Additional variables
Details on additional variables, including motivation (Supplementary Table S4) and demographics, can be found in the Supplementary Materials. Supplementary Table S6 provides summary statistics for the motivation measure, and Supplementary Figures S1, S2 and S3 visualise demographic information of L2 participants, including age of acquisition, self-reported proficiency and ratings of perceived importance of reading for learning English.
3. Results
Summary statistics and sample sizes per task are presented in Table 1. L1 participants’ scores exceeded those of L2 participants, most notably for LexTALE and collocations. Outliers were identified as scores falling below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR within their cohort on each task. While not pre-registered, this step was taken due to very low scores for some tasks. Outliers on each task were removed, leading to slightly lower sample sizes for some measures. Correlations for all measures in the L2 group are shown in Table 2, and an analogous table for L1 speakers is provided in Supplementary Table S5. Author name selection statistics are also provided for AFT (Supplementary Tables S7/S8/S9) and ART (Supplementary Tables S10/S11/S12).
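The exclusion rule above can be sketched as follows. This is a minimal illustration, assuming quartiles are computed by linear interpolation (Python's "inclusive" method, which should correspond to the default of R's `quantile()`); the quantile method actually used is not stated here, `remove_outliers` is a hypothetical helper name, and the scores are toy values.

```python
import statistics


def remove_outliers(scores):
    """Drop scores below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    # Assumption: linear-interpolation quartiles ("inclusive" method).
    q1, _median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Keep the original order of the retained scores.
    return [s for s in scores if lower_fence <= s <= upper_fence]


# Toy scores for one cohort on one task; not drawn from the study.
cohort_scores = [50, 52, 54, 56, 58, 60, 62, 5]
kept = remove_outliers(cohort_scores)
```

In the study the rule was applied within each cohort and task separately, so a helper of this kind would be called once per cohort-by-task subset.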
Table 1. Summary statistics for each task by cohort. Mann–Whitney U test p-values were Bonferroni-corrected for multiple comparisons. Bold p-values indicate p < .05.

Table 2. Spearman correlation matrix for all measures, L2 English cohort. Significant correlations are in bold; * = p < .05, ** = p < .01, *** = p < .001.

Analysis was performed in R (version 2023.12.1, R Core Team, 2024). Generalised linear mixed effects models (GLMER) were constructed using the package lme4 (Bates et al., 2015), p-values were extracted using the package lmerTest (Kuznetsova et al., 2017) and model assumptions of overdispersion, normality and outliers were checked using the package DHARMa (Hartig, 2022). To counter problems with multicollinearity, continuous predictors were standardised before being entered into GLMERs, and we iteratively compared model performance with likelihood ratio tests using the maximal effects structure justified by the design (Barr et al., 2013).
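Standardisation of a continuous predictor can be read as z-scoring: subtracting the mean and dividing by the standard deviation. The sketch below is illustrative only. It is written in Python rather than the R used for the analysis, it assumes the sample standard deviation (n − 1 denominator), and `standardise` is a hypothetical helper name.

```python
import statistics


def standardise(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: n - 1 denominator
    return [(v - mean) / sd for v in values]


# Toy predictor values; not drawn from the study.
z_scores = standardise([1.0, 2.0, 3.0])
```

After this transformation, a model coefficient describes the change in the outcome per 1 SD increase in the predictor, which is how the AFT effects are reported in the results.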
3.1. Connectives
We begin by describing performance by connective type and language group before considering models that demonstrate the relative strengths of each predictor for both language groups. Group differences in score distributions are illustrated in Supplementary Figure S9. Scores for each connective, by coherence relation, frequency and language group, are presented in Table 3, and performance by coherence relation and group is illustrated in Figure 1. Higher-frequency connectives, unsurprisingly, were responded to more accurately than lower-frequency alternatives. A notable exception was “indeed”, where performance was poorer than for even the lowest-frequency connective. This may be because we were specifically interested in its use as a subordinating conjunction, which could not be uniquely captured with our search terms: although “indeed” is very common in the corpus, the frequency of its use as a connective is substantially lower than that of its alternative uses (see Footnote 1). Curiously, L2 speakers outperformed L1 participants on “indeed”, the sole exception of its kind. This could be due to familiarity with the French connective “en effet”, which functions similarly to “indeed” (albeit with important differences; Zufferey & Gygax, 2017), yet it is unclear why English natives struggle with this high-frequency connective.
Table 3. Accuracy scores as percentages per connective, by frequency (high/low) and cohort


Figure 1. Percentage of correct answers by language group and coherence relation.
For L2 English speakers, comparisons favoured a linear regression model with only the English AFT as a predictor of connectives scores over one with ART alone, as indicated by a significant Vuong test (z = 2.54, p < .01) and a lower AIC (ΔAIC = −17.76). Our most comprehensive model (F(2, 57) = 39.86, p < .001, Adj-R2 = .57) showed effects for both LexTALE (F(1, 57) = 73.61, p < .001) and AFT (F(1, 57) = 6.11, p < .05). The English ART did not significantly predict connectives when considering either of the other variables. The contributions of each L2 predictor are illustrated in the standardised partial residuals presented in Figure 2A, and a model with LexTALE and AFT as predictors is provided in Supplementary Table S16.
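The ΔAIC comparison can be made concrete with a short sketch. It assumes the standard definition AIC = 2k − 2 ln L and the sign convention implied by the text (AIC of the AFT model minus AIC of the ART model, so negative values favour AFT). The log-likelihoods are toy values rather than the study's, the helper names are hypothetical, and the Vuong test is not reproduced here.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 * ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def delta_aic(aic_model_a, aic_model_b):
    """AIC difference; negative values favour model A."""
    return aic_model_a - aic_model_b


# Toy log-likelihoods for two single-predictor models (illustration only).
aic_aft = aic(log_likelihood=-100.0, n_params=3)
aic_art = aic(log_likelihood=-105.0, n_params=3)
difference = delta_aic(aic_aft, aic_art)
```

Because both toy models have the same number of parameters, the difference reflects fit alone; with unequal parameter counts, the 2k term would penalise the larger model.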

Figure 2. (A) Partial effects of L2 predictors on L2 English connectives accuracy. (B) Partial effects of L1 predictors on L1 English connectives accuracy.
Performance was also evaluated using L1 French measures. A comparison of regression models with the French AFT or the French ART as the sole predictor favoured the AFT model (ΔAIC = −5.20). The best-fitting model (F(1, 58) = 9.40, p < .01, Adj-R2: .13) identified the French AFT (AFT-FR) as a significant predictor (Supplementary Table S17). However, this model did not satisfy the assumption of normally distributed residuals (Shapiro–Wilk p < .01), and attempts to address this issue through data transformation and robust regression methods were unsuccessful. This model’s findings are thus interpreted with caution, but the explanatory power of L1 print exposure appears modest. For comparison across the French LexTALE, ART and AFT, Supplementary Figure S12 shows standardised partial residual plots from a model with all predictors.
For L1 English speakers, separate regression models showed a marginal effect of AFT on connectives (Adj-R2: .05, p = .05), whereas ART performed modestly (F(1,55) = 7.65, Adj-R2: .11, p < .01). Figure 2B shows standardised partial residual plots from a model with all predictors. To illustrate the differences in the two print exposure predictors across language groups, we constructed exploratory GLMER models. Our final model included fixed effects of AFT, ART, connective frequency and coherence relation, and their interactions with language group, as well as random intercepts for participants and items (Marginal-R2 = .16, Conditional-R2 = .34; Table 4). Contrasts were dummy-coded, with the baseline set to “low” for connective frequency, “addition” for coherence relation and “L1” for group. Main effects for all coherence relations were significant (ORs = 2.59–5.74), though confidence intervals varied widely when comparing across groups. There was also a significant negative interaction for the L2 group for all coherence relations except for “concession”. The main effect of frequency was non-significant, but interacted with language group such that L2 speakers showed significantly increased odds in the high frequency condition compared to L1 speakers (OR = 1.64, p < .001). Main effects for ART and AFT were also non-significant, but there was a significant interaction between AFT and language group, such that AFT predicted increased odds ratios in L2 (OR = 2.12, p < .001). Thus, for each 1 SD increase in AFT (5.32 author names in L2), the odds of correct selections increased by 112% for L2 compared to L1 speakers. Fixed effects are visualised in Figure 3.
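The conversion from an odds ratio to a percentage change in odds, used in the interpretation above, follows from (OR − 1) × 100. The sketch below is a minimal illustration with hypothetical helper names. It also shows the usual relation between a logistic model coefficient and its odds ratio (OR = exp(β)), on the assumption that the reported ORs are exponentiated GLMER coefficients.

```python
import math


def odds_ratio_from_log_odds(beta):
    """Exponentiate a logistic-model coefficient to get an odds ratio."""
    return math.exp(beta)


def percent_change_in_odds(odds_ratio):
    """Percentage change in odds implied by an odds ratio."""
    return (odds_ratio - 1.0) * 100.0


# Odds ratios reported in the text for the interactions with language group.
aft_by_group = percent_change_in_odds(2.12)        # about 112% higher odds
frequency_by_group = percent_change_in_odds(1.64)  # about 64% higher odds
```

An OR below 1 yields a negative value, that is, a percentage decrease in odds.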
Table 4. Fixed effects and their interactions with language group, and random effects of participant/item on odds of correct connectives selections. Bold p-values indicate p < .05.


Figure 3. Effects of predictors by language group on connectives accuracy.
3.2. Collocations
For the collocations task, L2 trial accuracy ranged from as low as 6.67% for “refuse an application” (due to competition from “deny an application”) to as high as 80% for “fair share”. Detailed statistics on the full list of items by language group are provided in Supplementary Table S15, and group differences in score distributions are illustrated in Supplementary Figure S10.
A comparison of non-nested linear regression models with the English AFT and ART as separate predictors favoured the model with AFT (ΔAIC = −10.92). Our best-fitting model (F(2, 57) = 63.39, p < .001, Adj-R2: .68) included both LexTALE (F(1, 57) = 123.46, p < .001) and AFT, although the effect of AFT was marginal (F(1, 57) = 3.32, p = .07; Supplementary Table S18). ART was not significant when accounting for either of the other variables. To illustrate the differential contributions of each predictor, standardised partial residual plots from a model with all predictors are shown in Figure 4A.

Figure 4. (A) Partial effects of L2 predictors on L2 English collocations accuracy. (B) Partial effects of L1 predictors on L1 English collocations accuracy.
Using L1 French predictors, separate linear regression models showed the French AFT (F(1, 58) = 6.79, p < .05, Adj-R2: .09) (Supplementary Table S19) and ART (F(1, 58) = 5.75, p < .05, Adj-R2: .07) each modestly predicted collocations scores, with negligible differences in model fit (ΔAIC = −0.97), indicating limited explanatory power for L1 print exposure. The French LexTALE was not associated with L2 collocations scores. For comparison across the French LexTALE, ART and AFT, Supplementary Figure S13 shows standardised partial residual plots from a model with all predictors.
For L1 English speakers, individual regression models predicting collocations scores showed a null effect of AFT, but a significant albeit small effect of ART (F(1, 55) = 7.65, p < .01, Adj-R2: .11). As with the connectives task, the English ART was a better predictor than AFT, the opposite of the pattern found for L2 speakers. However, our best model included LexTALE alone (F(2, 55) = 10.78, p < .001, Adj-R2: .15). For comparison with L2, we provide residual plots from a model including all predictors in Figure 4B.
We also constructed an exploratory GLMER predicting the odds of correct collocation selections. Our final model included fixed effects of AFT, ART and collocation frequency (as a continuous measure, using values from Dąbrowska, 2014), and their interactions with language group, with random intercepts for participants and items (Marginal-R2: .15, Conditional-R2: .35; Table 5). Significant main effects were found for ART (OR = 1.55, p = .001) and language group (OR = 0.24, p < .001), but AFT and frequency were non-significant. However, there were significant interactions with language group, with AFT predicting increased odds ratios in L2 compared to L1 (OR = 1.67, p < .01), translating into 67% higher odds per 1 SD in AFT score; and for frequency and language group, predicting increased odds for higher-frequency collocations in L2 compared to L1 speakers (OR = 1.17, p < .05). ART also marginally predicted lower odds in L2 compared to L1 (OR = 0.73, p = .08). Fixed effects are visualised in Figure 5.
Table 5. Fixed effects and their interactions with language group, and random effects of participant/item on odds of correct collocations selections. Bold p-values indicate p < .05.


Figure 5. Effects of predictors by language group on collocations accuracy.
3.3. Mediating effects of semantic fluency
To evaluate whether general verbal fluency could account for the effect of AFT, we re-recruited participants for a test of semantic fluency with three different item categories: “animals”, “grocery items” and “public figures”. Some participants interpreted the instructions incorrectly, providing names of French supermarket chains instead of grocery items, and categories of public figures (e.g., actor, musician) instead of proper names, but we opted to keep these observations. We removed one participant who entered all items in French. Below, we compare both a combined measure with the sum of all scores and the individual subtasks.
A regression model showed that AFT (F(1, 44) = 5.81, p < .05) and the semantic fluency (SF) sum score (F(1, 44) = 31.46, p < .001) co-predicted L2 connectives (F(2, 44) = 18.64, p < .001, Adj-R2: .43). For L2 collocations, only AFT predicted the outcome (F(1, 44) = 28.43, p < .001, model Adj-R2: .38), whereas SF was non-significant.
Analysis by subtask revealed divergent outcomes. A model predicting connectives with AFT and animal naming (F(2, 44) = 22.15, p < .001, Adj-R2: .48) showed effects of AFT (F(1, 44) = 31.16, p < .001) and animals (F(1, 44) = 13.15, p < .001); another model comparing AFT and groceries (F(2, 44) = 15.50, p < .001, Adj-R2: .39) showed effects of AFT (F(1, 44) = 26.46, p < .001) and groceries (F(1, 44) = 4.54, p < .05); and a model with AFT and public figures (F(2, 44) = 12.57, p < .001, Adj-R2: .33) showed an effect of AFT (F(1, 44) = 24.39, p < .001) but a null effect for public figures. Analogous models predicting collocations from AFT and animals, groceries and public figures only showed effects of AFT (F(1, 44) = 27.21–28.85), all ps < .001.
4. General discussion
We sought to validate a semantic fluency task for author names in L2 (AFT) as a measure of print exposure using outcome measures of formulaic vocabulary, and to determine the relative contributions of L1 and L2 print exposure for L2 vocabulary knowledge. We hypothesised that when controlling for LexTALE, ART would predict collocations, whereas AFT would predict connectives. The rationale for this was that the two print exposure measures might reflect the different kinds of memory required for each task. That is, evaluating the correct use of connectives requires not only word recognition but also knowledge of their function; conversely, evaluating collocations is a far more automatic process – either you know which words tend to co-occur more than others, or you do not. In fact, however, we found that AFT was more strongly correlated than ART with both L2 connectives and collocations.
The finding that L2 AFT scores predict significant additional variance beyond LexTALE for connectives scores, and marginally so for collocations, further underscores the importance of reading for acquiring L2 vocabulary. Moreover, this was not the case for ART. Given the high variability in L2 exposure and proficiency, and the restrictive nature of ART, this is not entirely surprising. That an open-ended measure like AFT performs well in this regard, however, even when accounting for L2 proficiency, is the primary contribution of the present research. Second language research is replete with discussions about how to access L2 learners’ “cultural capital” (Bourdieu, 1986; Tunmer et al., 2006), yet when evaluating the role of print exposure in these populations, researchers have not always acknowledged that the language experiences of L2 speakers rarely mirror those of English natives. Consequently, an effective and reliable proxy measure of L2 print exposure may not be the same as one used for L1. This is precisely what we demonstrate, with interaction models showing that ART is most effective in L1, and AFT in L2.
Furthermore, measures of L2 proficiency and print exposure outperformed analogous L1 measures as predictors of L2 vocabulary. Logically, one’s degree of exposure to a particular language should explain more about vocabulary knowledge in that language, compared with exposure to another. Yet L1 experience is generally considered fundamental, laying the groundwork for learning additional languages (Sparks et al., 2012). Again, our connectives measure is an adapted version of a task from a study in which L2 proficiency was predicted by an L1, but not an L2, ART (Wetzel et al., 2020). We maintain that this was due to limited L2 exposure, which makes ART unlikely to be useful for L2 beginners. Granted, L1 proficiency is undoubtedly a limiting factor for L2 novices, but the question of what distinguishes advanced L2 speakers is a separate one. Once a speaker becomes relatively proficient in a target language, it follows that more extensive and naturalistic L2 exposure becomes critical. However, we acknowledge that our participants were also older than those recruited by Wetzel and colleagues, and since print exposure increases with age, the effects of age and print exposure are difficult to disentangle. Similarly, years of exposure to L2 and the age of acquisition also influence print exposure and, consequently, proficiency. Most likely, both age and print exposure are implicated to some degree in explaining our results.
Despite the criticisms surrounding ART, it is interesting to note that author recognition still correlated with L2 vocabulary in our study – although considerably less than AFT. While ART may index L2 print exposure in advanced L2 speakers such as these, its overlap with proficiency measures might lead researchers to infer null effects for print exposure when controlling for other tasks. However, we acknowledge that AFT is unlikely to be useful for novice learners either, given their limited L2 reading experience.
Additionally, we found that a semantic fluency (SF) aggregate measure also correlated positively with connectives and collocations (Table 2). For predicting connectives scores, this SF measure accounted for some, but not all, of the variance explained by AFT. For collocations, SF scores were non-significant when paired with AFT. This also varied by outcome measure and by SF subtask. For connectives, “animals” and “groceries” remained significant when controlling for AFT, but “public figures” became non-significant. This distinction is likely because lexical access pathways are distinct for common and proper nouns (Proverbio et al., 2001; Semenza, 2009). Semantic fluency tasks for authors and for celebrities both require recall of proper nouns, and we observe the expected outcome that author names are more informative than public figures, as the former index reading experience whereas the latter reflect general cultural exposure. Conversely, no semantic fluency subtask predicted collocations when paired with AFT. These divergent outcomes suggest that semantic fluency, or something associated with it, plays a larger role in the processing of connectives, and print exposure is more important for acquiring collocations. If the variance explained by AFT in L2 were due to differences in fluency alone, we would expect two things. First, we would also observe some effect of semantic fluency for the collocations task when paired with AFT. Second, a similar effect for AFT would likely also be seen in the L1 English population. But in fact, we see a sort of “inverted picture”, where ART is the better predictor for L1 speakers, and AFT outperforms in L2.
It is possible that the role of semantic fluency is simply stronger in L2 than in L1, given the wider range of L2 skill generally, and research demonstrating that L1 and L2 speakers are primarily differentiated by fluency rather than comprehension (Kuperman et al., 2023; Siegelman et al., 2024). Yet it is unclear why such variance would not have been sufficiently captured by LexTALE, which also measures lexical access. We contend that the limiting factor for AFT is familiarity with authors rather than verbal fluency generally, as indicated by the null effect of public figure naming when paired with AFT; consequently, AFT is a reliable proxy for reading experience. Thus, we argue that AFT reflects the additional engagement with reading required to become highly proficient in L2.
Before concluding, we note some limitations to our study. First, we calculated AFT scores using one point for each author, with no weighting for authors who are perceived to be more (or less) valuable to the reader. Perhaps more popular author names are more likely to represent general cultural knowledge rather than personal reading experience – after all, one need not have read any of Stephen King’s or Jane Austen’s books for them to come readily to mind when thinking of authors, and they may be associated with Hollywood adaptations of their works rather than the original material. Developing weights for author names is a complex and delicate issue, but one that bears consideration.
Although LexTALE was a robust predictor of vocabulary knowledge, it also has limitations. LexTALE is a word recognition measure, and word knowledge is a multidimensional construct, with depth of word knowledge and meaning a better metric than knowledge or recognition of form (Jeon & Yamashita, 2022). As a lexical decision task, LexTALE only indexes knowledge of word form (and correspondingly, processing speed). Moreover, some evidence suggests that although the LexTALE is a robust measure of vocabulary knowledge, it may not be reliable as a global proficiency measure in L2 (Puig-Mayenco et al., 2023). Thus, a more sensitive measure may be required to separate the effects of L2 proficiency, semantic fluency and L2 print exposure.
The study may also have benefited from a larger sample size, as online studies generally require more observations due to increased variability in testing conditions (Rodd, 2024). To ensure these findings are robust, AFT will need to be replicated in greater numbers, and participants should complete the task on multiple occasions to determine its test–retest reliability. Although its reliability is likely comparable to other semantic fluency tasks, this metric is an important dimension of a test’s utility and would provide additional insight into its use across language groups. AFT will also require replication in diverse language populations, since our findings may be partially related to the close linguistic distance between English and French. However, we suspect this is unlikely to completely explain the results, since the similarities between these two languages might be expected to instead diminish the importance of L2 reading experience. Similarly, it is possible our findings may apply primarily to English L2 speakers due to the global spread and influence of English, which means many English authors benefit from the language’s broad reach and market dominance. Future studies will determine if these findings generalise well to other target languages, although we expect AFT will be most effective in languages with a similar culture of readership to English.
L2 learners may also know many English authors, but their personal experience with them could be primarily through L1 translations. This first iteration of AFT did not ask participants which of the L2 authors they provided had been read in an L1 translation, and this could be an important modification for subsequent studies. We attempted to mitigate this issue by instructing participants to only name L2 authors who had been published in English. An alternative phrasing for these instructions could have asked participants to instead name authors they had read personally, but we considered this to be too restrictive, especially given that this restriction was not present for ART. Instead, we determined it was more important to allow participants to name whichever authors came most readily to mind when they thought about L2 reading generally. Undoubtedly, some of their actual encounters with these authors will have been through translation. Yet we assert that although an increase in author names may not directly reflect primary print exposure, just as for ART, it nevertheless indicates increased familiarity with authors associated with the target language. Additionally, as our participants self-reported having intermediate or greater proficiency in English, we suspect that they would have ample opportunity and interest to explore these works as they originally appeared, rather than reading translations. Therefore, we reason that the potential impact of reading translations of these works is unlikely to explain the robust correlations between the L2 AFT and L2 connectives/collocations scores.
Finally, this study reinforces previous findings that indicate that explicit recall tasks more accurately assess L2 language proficiency. Unlike self-report surveys or the ART, a semantic fluency task for author names allows second language learners to demonstrate their print knowledge directly, while reducing concerns about social desirability bias or guessing. As a practical and intuitive measure, the AFT offers a useful alternative or complement to existing print exposure assessments, and may help to refine our understanding of how reading experience contributes to second language learning.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S136672892510045X.
Data availability statement
The data that support the findings of this study are openly available on OSF at https://osf.io/q62mt/.
Acknowledgements
Thank you to Hui Zhu for contributing to the French–English translations of the connectives task derived from Wetzel et al. (2020).
Competing interests
The authors declare none.