Introduction
Vocabulary knowledge is vital to the successful acquisition of a second language (L2). Typically, L2 learners begin by intentionally learning vocabulary from dictionaries, textbooks, or teacher instruction (Hulstijn, 2001). Once they have acquired the most commonly used words, they transition to becoming independent learners, relying primarily on incidental vocabulary acquisition through reading (Nagy et al., 1987). Many studies have examined how reading in context promotes the incidental learning of L2 vocabulary (e.g., Webb, 2008). However, this body of research has largely overlooked the acquisition of affix knowledge, a critical component of word knowledge. Affix knowledge has been found to correlate with L2 learners’ overall vocabulary knowledge (Schmitt & Meara, 1997; Mochizuki & Aizawa, 2000; Sasao & Webb, 2017; Laufer et al., 2021; Snoder & Laufer, 2022) and their ability to produce derivatives (Iwaizumi & Webb, 2023). As with individual words, L2 learners need to acquire numerous affixes to achieve language mastery, and these affixes vary in frequency; even those that occur less frequently are necessary for English proficiency (Bauer & Nation, 1993). As in lexical acquisition, L2 learners can be expected to master the most frequently used affixes first through intentional learning and then to rely on incidental means for the low-frequency ones. It has been suggested that high-frequency affixes are often explicitly taught in L2 classrooms (Nation, 2022). However, the extent to which L2 learners can acquire knowledge of lower-frequency affixes incidentally through reading remains unclear.
Eye-tracking studies investigating incidental vocabulary learning have examined how L2 learners process novel words during reading (Godfroid et al., 2013; Pellicer-Sánchez, 2016; Mohamed, 2018; Elgort et al., 2018; Pellicer-Sánchez, 2021). These studies consistently indicate that learning gains are positively predicted by the duration of eye fixations on novel words, suggesting that greater attention facilitates more substantial learning. However, little is known about whether this association between reading time and learning generalizes to affix knowledge. Given this research gap, the current study investigates whether L2 learners can incidentally acquire affix knowledge through reading. Specifically, the study assesses gains in three types of affix knowledge (form recognition, meaning recall, and meaning recognition) using pre-tests and post-tests conducted before and after reading. Eye tracking, which has been applied to explore the effects of morphemic unit properties (e.g., suffix length and the informativeness difference between base words and suffixes) on the processing and reading of derived words (Kuperman et al., 2010), was employed to examine the relationship between the processing of affixed words during reading and incidental learning. Furthermore, affix characteristics and learners’ L2 proficiency were analyzed to determine their contributions to learning gains. If the results show that L2 learners can incidentally learn novel affixes through reading, this would have significant implications for both L2 learning and teaching practices.
Background literature
Incidental L2 vocabulary learning through reading
Incidental learning commonly refers to the process by which language learners acquire knowledge of a language unintentionally through meaning-focused activities like reading or listening, whereas intentional learning involves a deliberate and effortful focus on language form (e.g., Hulstijn, 2001). In practice, however, it is often difficult to determine the moment-to-moment nature of learning (Webb, 2020), and incidental learning is therefore often operationalized as either (a) learning from engagement with meaning-focused tasks (e.g., Ellis, 1999; Chen & Truscott, 2010) or (b) learning prior to an unannounced post-test (e.g., Hulstijn, 2001). In the field of L2 acquisition, many studies have explored incidental vocabulary learning (e.g., Webb, 2008; Elgort & Warren, 2014; Pellicer-Sánchez, 2016). Though some researchers have reported that the vocabulary gains from incidental learning during reading are modest (Hulstijn et al., 1996), this approach is still widely regarded as an important complement to intentional vocabulary learning (Schmitt & Carter, 2000; Wu, 2009; Hu, 2013). Drawing on Nation’s (2013) classification of word knowledge, studies on L2 incidental vocabulary learning typically assess learners’ gains using tests of word form recognition, meaning recall, and meaning recognition (e.g., Pellicer-Sánchez, 2016).
Although the potential benefits of incidental learning through reading are influenced by various factors, including individual learner differences, learning materials, activity types, and methodological control for prior knowledge (Webb et al., 2023), the number of repeated exposures to a new word remains the most commonly researched factor (Horst et al., 1998; Waring & Takaki, 2003; Pellicer-Sánchez & Schmitt, 2010; Pellicer-Sánchez, 2016). Studies have found that the chances of learning a new word increase with repeated exposures, with the number required for incidental word learning ranging from as few as two to more than 10. However, the benefit of additional exposures appears to decline beyond 20 exposures (Uchihara et al., 2019). While there is no consensus on the exact number, research by Horst et al. (1998), Waring and Takaki (2003), and Pellicer-Sánchez (2016) suggests that a minimum of eight exposures is necessary for learners to recognize or recall the meaning of new words.
Reading context is also crucial in investigating the incidental learning of words through reading (e.g., Webb, 2008). Previous research has examined various aspects of context, including differences in length, genre, authenticity, and semantic diversity (Pellicer-Sánchez & Schmitt, 2010; Godfroid et al., 2013; Elgort & Warren, 2014; Pellicer-Sánchez, 2016; Johns et al., 2016; Joseph & Nation, 2018). A core concept related to reading context, contextual richness, refers to the extent of information a context provides about the meaning of an unknown word (Zahar et al., 2001; Webb, 2008). Some scholars (Zahar et al., 2001; Webb, 2008) have developed a 4-point scale to measure the richness of a word’s context. However, research findings on the effect of contextual richness on incidental word learning are inconsistent. While some studies suggest that contextual richness is more influential than exposure frequency (Webb, 2008), others have found the opposite (Zahar et al., 2001; Tekmen & Daloğlu, 2006; Reynolds, 2020).
In sum, incidental learning during reading can contribute to multiple aspects of vocabulary knowledge, which are influenced by factors like exposure frequency and contextual richness. Thus, studies on incidental vocabulary learning should account for these factors and employ diverse tests to assess different dimensions of vocabulary knowledge.
L2 Learning and processing of derivational affixes
English words can be modified by a range of inflectional and derivational morphemes, with derivational affixes often changing a word’s meaning and/or part of speech. For instance, adding the derivational prefix un- and suffix -ness to happy creates unhappiness, altering both its meaning and grammatical category.
Previous research has explored various factors that may influence the learning and processing of L2 affixes. One aspect is the distinction between neutral and non-neutral affixes regarding phonological changes. Non-neutral affixes alter either stress placement or vowel quality in the base word, as the suffix -ese does in Japanese. In contrast, neutral affixes, such as -hood in childhood, leave the base word’s stress and vowel quality unchanged. This distinction potentially influences acquisition, as neutral affixes are generally learned earlier and more easily by L1 learners (Tyler & Nagy, 1989) and likely by L2 learners as well (Friedline, 2011). Another aspect is affix productivity, which also appears to influence the processing and acquisition of derived words (e.g., Bertram et al., 1999; Bertram et al., 2000). For example, native Finnish children better recognize and understand words with highly productive suffixes (Bertram et al., 2000). Similarly, affix frequency is considered critical for developing morphological knowledge in children (e.g., Beyersmann et al., 2012; Rastle & Davis, 2008), and has been examined for its role in nonword priming among native French speakers (Beyersmann et al., 2015). Furthermore, affix position (prefix or suffix) may affect L1 readers’ processing, potentially stemming from left-to-right parsing during reading (Beauvillain, 1996; Kwantes & Mewhort, 1999; Mousikou & Schroeder, 2019). The part of speech of derived words also presumably impacts learning. Schmitt and Zimmerman (2002) found that nouns or verbs were easier to produce compared to adjectives or adverbs, although Iwaizumi and Webb (2023) found that producing adverb derivatives was the easiest.
Learners’ language proficiency also influences L2 affix learning and processing. Research has shown a positive correlation between L2 proficiency and affix knowledge (Mochizuki & Aizawa, 2000; Laufer et al., 2021; Snoder & Laufer, 2022; Laufer, 2023), with advanced learners more likely to possess richer knowledge of derivational morphemes. Their broader vocabulary and derivational affix knowledge may enable them to acquire derived words more systematically (Iwaizumi & Webb, 2023). This indicates that advanced L2 learners tend to use decompositional processing for morphologically derived words, analyzing words into their constituent morphemes (Diependaele et al., 2011; Li et al., 2017). Furthermore, learners’ morphological awareness may also impact L2 affix learning by enhancing their ability to recognize, reflect on, and manipulate the morphemic structure of words while reading (Carlisle, 1995). Several morphological awareness measures have been employed to assess learners’ knowledge of derivational morphology (e.g., Mahony, 1994; Mahony et al., 2000; Friedline, 2011). According to Tyler and Nagy (1989), knowledge of derivational morphology encompasses syntactic, relational, and distributional components. Syntactic knowledge helps learners recognize that derivational suffixes represent the syntactic category of words (Tyler & Nagy, 1989, p. 649). Relational knowledge involves understanding that words sharing the same base have related lexical meanings, while distributional knowledge pertains to comprehending the selectional constraints on affixes (Tyler & Nagy, 1989).
To investigate the ongoing debate about L2 morphological processing (i.e., whole-word retrieval vs. morphological decomposition), researchers often use pseudowords to explore how L2 learners process derivational morphology (e.g., Casalis et al., 2015; Li et al., 2017; Amenta et al., 2025). Pseudowords are novel linguistic forms created by manipulating constituent morphemes (i.e., base words and affixes) for experimental purposes (e.g., proudy, foodle; see further examples below). These pronounceable, word-like constructs conform to the phonological and orthographic rules of a given language but lack semantic meaning (Ziegler et al., 1997). Novel-derived words (or affixed pseudowords) are a specific subtype formed by combining a real base word with a real affix (e.g., animalful; Amenta et al., 2025), and are therefore interpretable to some extent. Although recent studies have shown that both novel-derived words (e.g., animalful) and uninterpretable pseudowords (e.g., animalfil) may elicit semantic activation (Marelli & Baroni, 2015; Gatti et al., 2023; de Varda et al., 2024), pseudowords remain a valuable tool for examining L2 morphological processing, as any morphological effects observed with pseudowords likely reflect processes that are engaged when whole-word access is unavailable (Li et al., 2017).
For example, Casalis et al. (2015) created four types of pseudowords by orthogonally manipulating the base word and suffix status: (1) a real base word with a real suffix (e.g., proudy), (2) a real base word with a non-existent suffix (e.g., foodle), (3) a non-existent base word with a real suffix (e.g., slinny), and (4) a non-existent base word with a non-existent suffix (e.g., birtle). French-speaking L2 learners of English performed a lexical decision task, in which affixed pseudowords (e.g., proudy) were the hardest to reject, followed by pseudowords with either a real base word or a real suffix, with fully non-existent forms being the easiest to dismiss. Similarly, Li et al. (2017) tested Chinese-English bilinguals using affixed pseudowords (e.g., animalful) and their non-affixed counterparts (e.g., animalfil). Participants found affixed pseudowords more difficult to reject, indicating increased morphological processing. Recently, Amenta et al. (2025) examined native Dutch learners of Italian using a lexical decision task with three types of Italian pseudowords: (1) a real base word with a real suffix (e.g., stelloso: “with many stars”), (2) a real base word with a non-existent suffix (e.g., calzeccia from calz- “sock”), and (3) a non-existent base word with a real suffix (e.g., tencapabile with -abile “-able”). They also found that affixed pseudowords were the hardest to reject. Collectively, these findings suggest that L2 learners are sensitive to derivational morphology, likely due to their reliance on morphological decomposition in the absence of whole-word lexical entries (Li et al., 2017). This process appears to involve activating individual morphemes in lexical memory and evaluating the plausibility of their combination (Li et al., 2017), supporting the view that L2 learners engage in morpheme-level analysis when processing unknown derivations.
Understanding both the form and meaning of affixes is essential for developing affix knowledge (e.g., Bauer & Nation, 1993). Various factors influence an L2 learner’s processing and acquisition of this knowledge, including the properties of the affix itself and individual learner differences. Therefore, research on affix learning should address both language-level factors, such as affix neutrality, and individual-level factors, including L2 proficiency and morphological awareness. Given the advantages of pseudowords in investigating L2 morphological processing, this study employs pseudowords as target stimuli.
Incidental L2 vocabulary learning and eye tracking
Eye fixation measures are commonly used in L2 incidental vocabulary learning research to examine language processing and learning during reading, and are primarily interpreted as indicators of attention (e.g., Godfroid et al., 2013; Pellicer-Sánchez, 2016; Elgort et al., 2018; Godfroid et al., 2018; Mohamed, 2018; Pellicer-Sánchez et al., 2021). One of the earliest studies was conducted by Godfroid et al. (2013), in which 28 Dutch-speaking EFL learners with similar English proficiency were recruited to read paragraphs embedded with 12 targeted pseudowords. The researchers found that learners fixated longer on unknown pseudowords than on known controls, even though both were matched for length and syllable count. Furthermore, the study revealed a positive correlation between the total reading time participants spent fixating on pseudowords and their scores on an unexpected offline recognition test.
Similarly, Pellicer-Sánchez (2016) used eye fixation data to compare the incidental vocabulary learning processes of L1 and L2 English speakers. Twenty-three advanced L2 learners and 25 native speakers read a story embedded with six targeted pseudowords and six known controls, each appearing eight times. The pseudowords and control words were matched for the number of letters and syllables. Eye movement measures showed that L1 readers spent less time processing the pseudowords than L2 readers. Furthermore, both groups showed a decrease in fixation time for pseudowords and control words with increased exposure. After eight exposures, fixation times did not significantly differ between pseudowords and control words for either group, supporting previous findings that eight exposures to a novel word may be necessary for initial learning to occur (e.g., Horst et al., 1998; Waring & Takaki, 2003). Unannounced offline meaning recall tests revealed a significant positive relationship between total reading time and test scores for both groups.
Mohamed (2018) investigated the processing of 20 unknown pseudowords and 20 control words by 42 advanced L2 learners of English while reading a graded-reader novel. All pseudowords were matched with control counterparts for length and syllable count. Similar to previous studies, Mohamed found that as the occurrences of both pseudowords and control words increased, total reading time spent on these words decreased. Moreover, the results indicated that total reading time positively predicted all three types of vocabulary knowledge. However, exposure frequency had varying impacts on each type of knowledge: form recognition was most affected, followed by meaning recognition, and finally, meaning recall.
In the three aforementioned studies, total reading time on target (pseudo)words strongly and positively predicted increased accuracy on offline vocabulary tests. Each study accounted for L2 proficiency and word-level factors, such as word length. However, none of the studies reported on contextual richness. Previous reading studies have shown that an informative context increases the likelihood of participants inferring the meaning of a target word (Webb, 2008). If so, contextual richness may influence eye fixation patterns: when a word’s meaning is inferred before fixation, the time spent fixating on the word may decrease, or the word may even be skipped entirely. Therefore, it is crucial to include ratings of contextual richness in studies using eye-tracking data.
The present study
Previous research has shown that English affix knowledge significantly contributes to the success of L2 English learning. However, few studies have investigated whether L2 learners can establish knowledge of English affixes incidentally through reading. To address this gap, the present study employed eye tracking to examine the relationship between the processing of affixed words while reading and incidental learning outcomes. Given the numerous factors that potentially affect affix processing and learning, this study focused on a specific type of English affix: neutral derivational noun suffixes. Learning outcomes were assessed using offline pre-tests and post-tests measuring three types of affix knowledge: suffix form recognition, suffix meaning recognition, and suffix meaning recall. By integrating online eye movement data with offline assessment, the study aimed to provide a clearer understanding of how L2 learners acquire English derivational noun suffixes incidentally while reading. It also analyzed the effects of stimulus-related variables (suffix length, suffix productivity, suffix frequency, and contextual richness) and learner-related variables (vocabulary knowledge and morphological awareness for derivations) on suffix processing and learning. The following research questions were investigated:
1. What aspects of suffix knowledge (i.e., form or meaning) can be incidentally learned by Chinese EFL learners through reading?
2. To what extent do attentional eye fixation measures (i.e., total reading time), language-level characteristics (i.e., suffix length, suffix productivity, suffix frequency, and contextual richness), and learners’ language proficiency (i.e., vocabulary knowledge and morphological awareness for derivations) predict learning gains in suffix knowledge through reading?
Given the potential for L2 vocabulary knowledge to be acquired incidentally through reading (Webb, 2008; Pellicer-Sánchez, 2016; Webb et al., 2023), we formulated the following hypotheses:
For Research Question 1, we hypothesized that both suffix form and suffix meaning would be incidentally learned by L2 learners, as reflected in increased accuracy on post-tests compared to pre-tests, across measures of form recognition, meaning recall, and meaning recognition.
For Research Question 2, we hypothesized that longer reading times, indicative of greater attention to targeted base words and suffixes, would predict learning gains. This prediction aligns with prior findings on the positive relationship between increased reading times and incidental vocabulary learning (e.g., Godfroid et al., 2013). Additionally, we expected that stimulus-related factors, including suffix length, suffix productivity, suffix frequency, and contextual richness, would influence the processing and learning of morphologically derived words. Specifically, we anticipated that more informative contexts would facilitate learning, while suffix characteristics like length, productivity, and frequency would also play significant roles. Finally, we predicted that learner-related factors, such as higher levels of morphological awareness for derivations, would lead to greater learning gains due to a better understanding of the morphemic structure of complex words (Carlisle, 1995). Given the close relationship between language proficiency and morphological knowledge (Kraut, 2015), we also hypothesized that vocabulary knowledge, as a measure of language proficiency, would positively impact learning gains.
Methodology
Participants
A total of 52 Chinese EFL learners pursuing non-English-related master’s degrees at a university in Macau were recruited for this study. Of these, 11 participants were excluded, primarily due to technical or calibration issues with the eye-tracking apparatus. Moreover, data from one participant were removed due to low accuracy on true/false reading comprehension questions (below 70%). Therefore, data from 40 participants (8 males; age range: 21–29, M = 23.8, SD = 1.522) were analyzed. Participants’ English proficiency, based on IELTS (International English Language Testing System) or CET-6 (College English Test-6) scores, was estimated at the B2 level (high intermediate) according to the Common European Framework of Reference for Languages (CEFR; Council of Europe, 2001). Results from the updated Vocabulary Levels Test (Webb et al., 2017) confirmed that all participants had receptive knowledge of the most frequent 2,000 English word families. Each participant provided written informed consent before participation and received monetary compensation (approximately USD 15). All participants had normal or corrected-to-normal vision.
Materials
In preparation for the main reading study, targeted pseudowords were created by combining pseudo base words with actual English neutral derivational suffixes. Three-sentence reading contexts were then developed to embed each targeted pseudoword. Pilot testing was conducted to ensure the unfamiliarity of the suffixes to participants in the main study and to gauge the contextual richness of the reading materials. Details of this process are outlined in the following subsections.
Targeted suffixes
To identify potential targeted suffixes, we consulted four resources and compiled a pool of 25 potential English derivational noun suffixes. These suffixes were selected based on examples listed in the following references: (1) The Word Part Levels Test (Sasao & Webb, 2017); (2) infrequent English affixes from Bauer and Nation’s (1993) Level 5 word family list; (3) English Word-Stress (Fudge, 1984); and (4) English Suffixes, Stress-assignment Properties, Productivity, Selection and Combinatorial Processes (Trevian, 2015).
To prepare for the formal study, we created form recognition, meaning recall, and meaning recognition tests for the 25 potential noun suffixes, following a format similar to that of the main study. These tests were administered sequentially (in the order of form recognition, meaning recall, and meaning recognition) to 15 pilot participants with backgrounds equivalent to those of the main study participants. Suffixes recognized by the pilot participants were excluded as potential targets. To ensure accurate eye movement data collection, suffixes of only one or two letters were also excluded, because evidence indicates that larger interest areas are more likely to be fixated and less likely to be skipped in eye-tracking studies (Conklin & Pellicer-Sánchez, 2016). After applying these criteria, eight suffixes remained, four of which were four letters long and four of which were three letters long. The full set of 25 potential suffixes and the final eight targeted suffixes are listed in Table 1.
Table 1. Twenty-five potential suffixes and the eight targeted suffixes

Each of the eight selected suffixes represented multiple lexical meanings. For example, the suffix “-ster” can denote (1) “a person linked with an activity” (e.g., dopester, fraudster, gamester, gangster, etc.) or (2) “a conveyance on which or by which people or things are transported” (e.g., dragster, speedster, sportster, etc.) (Trevian, 2015, p. 134). Drawing on the aforementioned resources, one designated meaning was selected for each suffix, ensuring no overlap in meaning across the targeted suffixes. This approach was guided by two considerations: first, the study assesses participants’ learning of suffix meaning knowledge, including meaning recall and meaning recognition; second, using a single, designated meaning minimizes participant confusion. Details about the eight targeted suffixes and their associated information are provided in Appendix A.
Affix productivity and frequency are critical factors in affix learning and processing (Bertram et al., 2000; Beyersmann et al., 2015). Using the Corpus of Contemporary American English (COCA), we calculated the total token counts for each targeted suffix to determine suffix frequency. Affix productivity was calculated by comparing the number of hapax legomena (types that occur only once in the corpus) with the total number of tokens (i.e., frequency) (Baayen, 1994). The productivity and frequency values for each targeted suffix are presented in Table 2.
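The productivity calculation described above can be illustrated with a minimal Python sketch. It assumes that the comparison of hapax legomena with tokens is a simple ratio (hapaxes divided by tokens), which is how this measure is usually stated; the function name and the toy word list are hypothetical and are not drawn from COCA or from the study's data.

```python
from collections import Counter


def suffix_productivity(tokens):
    """Return (hapax_count, token_count, productivity) for one suffix.

    `tokens` is every corpus occurrence of a word carrying the suffix.
    Productivity is assumed here to be hapax_count / token_count.
    """
    type_counts = Counter(tokens)
    token_count = sum(type_counts.values())
    # A hapax legomenon is a type that occurs exactly once.
    hapax_count = sum(1 for count in type_counts.values() if count == 1)
    productivity = hapax_count / token_count if token_count else 0.0
    return hapax_count, token_count, productivity


# Hypothetical toy sample of -ster tokens (not real corpus counts).
sample = [
    "gangster", "gangster", "gangster", "gangster",
    "fraudster", "fraudster", "speedster", "dragster",
]
hapaxes, frequency, productivity = suffix_productivity(sample)
print(hapaxes, frequency, productivity)
```

In this toy sample two types occur once among eight tokens, so a suffix with many one-off formations relative to its overall frequency receives a higher value.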
Table 2. Number of hapax legomena, frequency, and productivity of all the targeted suffixes

Pseudo base words
To conduct the experiment, pseudowords composed of pseudo base words and targeted suffixes were used. Since pseudowords lack lexical meaning, this design prevented participants from inferring suffix meanings from whole words. Moreover, using pseudo base words allowed each occurrence of a suffix to pair with a different pseudo base, thereby promoting the learning of suffixes rather than entire pseudowords.
Previous research suggests that eight exposures may be sufficient for L2 learners to incidentally learn novel words (Pellicer-Sánchez, 2016). Therefore, each targeted suffix was paired with eight unique pseudo base words. These pseudo base words were monosyllabic and five letters in length. The choice of five-letter pseudo base words was informed by eye-tracking research, which indicates that words approximately eight letters or longer are more likely to be fixated (Daneman & Carpenter, 1983). As the targeted suffixes in this study are at least three letters long, the pseudo base words required a minimum of five letters to maintain the desired word length. Furthermore, all pseudo base words were monosyllabic, as some suffixes could only attach to one-syllable bases.
Monosyllabic nonwords limited to five letters were generated using the ARC Nonword Database (Rastle et al., 2002). To control for orthographic frequency effects, pseudo base words were selected based on bigram frequency and OLD20 (Orthographic Levenshtein distance) values (e.g., Mousikou & Schroeder, 2019). Bigram frequencies were obtained from Medler and Binder’s Online Orthographic Database (Medler & Binder, 2005). The frequency of the bigram straddling the pseudo base word and suffix was also considered, as it influences readers’ ability to decompose morphologically complex words (Hay, 2002, 2003). The OLD20 value (Yarkoni et al., 2008) was calculated as the average edit distance (insertion, deletion, and substitution) of the 20 nearest neighbors in the lexicon, with higher OLD20 values indicating sparser orthographic neighborhoods (Mousikou & Schroeder, 2019). Wuggy (Keuleers & Brysbaert, 2010) was used to obtain OLD20 values for the generated pseudo base words.
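To make the OLD20 definition concrete, the following Python sketch computes the mean Levenshtein distance from an item to its n nearest neighbors in a word list. It is only an illustration of the definition given above, not the procedure implemented in Wuggy; the function names and the tiny lexicon are hypothetical, and a real calculation would use a full reference lexicon with n = 20.

```python
def levenshtein(a, b):
    """Edit distance counting insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def old_n(item, lexicon, n=20):
    """Mean edit distance from `item` to its `n` closest lexicon entries."""
    distances = sorted(levenshtein(item, word) for word in lexicon if word != item)
    nearest = distances[:n]
    if not nearest:
        raise ValueError("lexicon contains no comparison words")
    return sum(nearest) / len(nearest)


# Hypothetical miniature lexicon; n is reduced because the list is tiny.
toy_lexicon = ["cat", "hat", "bats", "dog"]
print(old_n("bat", toy_lexicon, n=2))
```

A pseudo base word whose nearest real words are all several edits away would receive a higher value, corresponding to the sparser orthographic neighborhood described above.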
Eight pseudo base words were randomly selected for each targeted suffix, referencing bigram frequencies and OLD20 values, resulting in a total of 64 targeted pseudowords. To ensure comparability across the eight suffix groups, three one-way Analyses of Variance were performed to examine potential differences in bigram frequency within the base word, bigram frequency at the boundary between the pseudo base word and suffix, and OLD20 values. Results showed no significant differences in bigram frequency within the base word (F (7,56) = 0.044, p = 1.000), bigram frequency at the boundary (F (7,56) = 0.069, p = 0.999), or OLD20 values (F (7,56) = 0.848, p = 0.552). These findings suggest that the pseudo base words in each suffix group are comparable across these key orthographic measures.
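The comparability check above relies on one-way ANOVAs across eight suffix groups of eight items each, which yields the reported degrees of freedom (7, 56). The sketch below shows how the F statistic and its degrees of freedom are obtained. The function name `one_way_anova` and the numbers in the example are hypothetical; the study's actual bigram and OLD20 values are not reproduced here.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical data: three groups of three observations.
f_stat, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f_stat, df_b, df_w)
```

With eight groups of eight items, the same function returns degrees of freedom of 7 and 56, matching those reported in the text.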
Since the pseudo bases lack semantic meaning, it was necessary to assign meanings consistent with their assigned suffixes. After finalizing the forms of 64 pseudowords, each was given a meaning that reflected its suffix. For instance, the suffix -ster, which conveys the meaning “a conveyance/vehicle,” was used in the pseudoword wroulster (composed of wroul and -ster), which was assigned the meaning “tram” to align with the suffix’s meaning.
Reading contexts
Three-sentence contexts were created for each targeted pseudoword. Each context consisted of a 12-word first sentence and an 8-word last sentence framing the second sentence, which contained the targeted pseudoword. The second sentence was 22 words long, with the pseudoword consistently appearing as the fifth word. A total of eight neutral sentence frames were used to separate the processing of the targeted pseudoword/context from sentence integration and wrap-up during the eye tracking experiment (Lowell & Morris, Reference Lowell and Morris2017). These 64 three-sentence contexts formed the reading materials for this study. To ensure readability, all sentences were written using the first 2,000 most frequent English word families from the British National Corpus/Corpus of Contemporary American English word lists (Nation, Reference Nation2012). This was confirmed using the Range for texts function on the Compleat Lexical Tutor website (Cobb, Reference Cobbn.d.). Each three-sentence context underwent independent review and revision by two native English speakers with backgrounds in English language education.
Given that the study focused on incidental suffix learning through reading, it was essential to measure the informativeness of the reading contexts. For this purpose, 15 graduate students with similar demographic backgrounds to those of the main study participants rated the contextual richness of the second sentences. The raters used Webb’s (Reference Webb2007) scale for contextual richness, excluding the misleading context option, as none of the contexts were intended to mislead participants. Thus, only the following three ratings were used:
1. It is unlikely that the exact meaning of the target word can be inferred. However, information in the context may lead to partial knowledge of the target word’s meaning.
2. Information in the context may make it possible to infer the meaning of the target word. However, there are a number of choices. Participants may gain partial knowledge.
3. Participants have a good chance of inferring the meaning correctly. There are few meanings that are logical apart from the correct meaning. Participants should gain at least partial knowledge (Webb, Reference Webb2007, p. 53).
The raters were provided with the meanings of the targeted suffixes and the second sentences containing the pseudowords in the fifth word position. Targeted suffixes were bolded, and the informative parts of the context were underlined. Raters rated the extent to which the underlined portions supported inference of the suffix meanings. Task details are provided in Appendix E. A mean contextual richness score was calculated for each context by averaging the raters’ scores. These scores for each targeted suffix are presented in Table 3.
Table 3. Contextual richness of all the targeted suffixes

Note: This table shows the mean contextual richness (i.e., informativeness) for each suffix’s three-sentence reading context.
Furthermore, 48 filler contexts were created following the same structure and vocabulary constraints. Like the experimental materials, these fillers were composed using the 2,000 most frequent word families (Nation, Reference Nation2012). To ensure participants paid attention during the reading process, 24 true/false comprehension questions were included, following established methods for assessing reader engagement (e.g., Pellicer-Sánchez, Reference Pellicer-Sánchez2016; Godfroid et al., Reference Godfroid, Ahn, Choi, Ballard, Cui, Johnston, Lee, Sarkar and Yoon2018).
Offline tests
Suffix tests
To evaluate participants’ suffix knowledge acquired through reading, a pre-test and post-test design was employed, with the tests administered one week apart. Both tests consisted of three components: form recognition, meaning recall, and meaning recognition, each assessing a different aspect of suffix knowledge. These three suffix tests were divided into two parts. Part one, administered first, assessed form recognition and meaning recall. Form recognition was evaluated using a multiple-choice format similar to the Word Part Levels Test (Sasao & Webb, Reference Sasao and Webb2017), where participants were required to select the real English suffix from four options. For meaning recall, a blank space was provided to the right side of the multiple-choice item, where participants were instructed to write the Chinese translation of the selected suffix. In part two, meaning recognition was assessed using a similar multiple-choice format. Each item presented four options matched for part of speech and length, requiring participants to select the correct meaning of the suffix. The test was divided into two parts, as the multiple-choice options in part two could aid participants in answering the meaning recall section of part one. To mitigate the risk of a practice effect, the pre-tests included 17 distractors in addition to the eight targeted suffixes. In the post-test, only knowledge of the eight targeted suffixes was assessed, as no delayed post-test was arranged. Each correct response was assigned one point, with a maximum total score of 24. Pre-test and post-test materials are provided in Appendices B and C, respectively.
Language proficiency tests
To be included, participants were required to demonstrate mastery of the first 2,000 English word families as measured by the Updated Vocabulary Levels Test (Webb et al., Reference Webb, Sasao and Ballance2017). This criterion ensured participants could comprehend the study stimuli. Language proficiency, often measured by vocabulary knowledge, has been suggested to be closely related to morphological knowledge (Kraut, Reference Kraut2015). Therefore, we administered the LexTALE test (Lemhöfer & Broersma, Reference Lemhöfer and Broersma2012) to assess participants’ L2 proficiency.
Participants’ knowledge of the three aspects of derivational morphology proposed by Tyler and Nagy (Reference Tyler and Nagy1989), namely syntactic knowledge, relational knowledge, and distributional knowledge, was also evaluated using separate tests. Syntactic knowledge was assessed using an adapted version of the affix-choice nonword task developed by Mahony (Reference Mahony1994). This cloze-like task required participants to select one out of four suffixed nonword options to fill a sentence gap. Each option utilized the same nonword base but with different real suffixes, only one of which was syntactically compatible with the sentence. The task included 24 items, with a maximum score of 24.
Relational knowledge was assessed using a morphological relatedness test adapted from Mahony et al. (Reference Mahony, Singson and Mann2000). Participants were instructed to determine whether 20 word pairs were morphologically related, with a maximum score of 20.
Distributional knowledge was assessed using a task adapted from Study 1 in Friedline (Reference Friedline2011). This test comprised 30 items that tested correct and incorrect ordering of derivational morphemes (e.g., hopenessful vs. playfulness). Participants rated the “realness” of the words on a six-point scale, with a maximum score of 30. The test materials for these assessments are provided in Appendix D. Participants’ language proficiency, morphological awareness, and related demographic information are summarized in Table 4.
Table 4. Participant information

Note: Maximum morphological awareness score is 74. LexTALE score is used as a proxy for language proficiency, with a maximum score of 100.
Procedures
Potential participants completed an online background questionnaire through Qualtrics (www.qualtrics.com). Eligibility criteria included being non-English majors, Chinese-speaking learners of English, and postgraduate students. Qualified participants were then invited to the lab, where they completed a consent form, the Updated Vocabulary Levels Test (Webb et al., Reference Webb, Sasao and Ballance2017), and the pre-test. Those who demonstrated mastery of the most frequent 2,000 English word families were invited back one week later to complete the eye-tracking portion of the study. Immediately following the eye movement data collection, an unexpected post-test was administered, after which the morphological awareness test and the LexTALE (Lemhöfer & Broersma, Reference Lemhöfer and Broersma2012) were administered.
Eye movement data were collected using an EyeLink Portable Duo eye tracker (SR Research Ltd.; https://www.srresearch.com/) at a sampling rate of 1,000 Hz. Participants were instructed to place their chins on a chin rest while seated 65 cm from the monitor. Stimuli were presented in Courier New font, size 24, and double-spaced, with each letter averaging 19 pixels on a 13-inch monitor, resulting in a visual angle of approximately 0.47 degrees. Participants read binocularly, but only movements of the right eye were tracked and recorded. The experiment consisted of 113 trials, including 1 practice trial, 64 stimulus trials, and 48 filler trials, evenly divided into three sections. Prior to each section, a 9-point calibration was conducted, with short breaks provided between sections to allow participants to rest. A 1-point drift correction was performed before each trial to determine if further calibration was necessary.
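The visual angle reported above follows from the standard geometric relation between stimulus size and viewing distance. The sketch below illustrates that relation only. The physical width of a letter is not reported in the text, so it is back-calculated here from the stated 0.47 degrees and 65 cm; the function names are hypothetical.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (degrees) subtended by an object of a given size."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


def size_for_angle_cm(angle_deg: float, distance_cm: float) -> float:
    """Physical size (cm) that subtends a given visual angle."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


# Back-calculated letter width implied by 0.47 degrees at 65 cm
# (an inference from the reported figures, not a value from the study).
letter_width_cm = size_for_angle_cm(0.47, 65)
print(round(letter_width_cm, 3))

# Converting that width back recovers the reported angle.
print(round(visual_angle_deg(letter_width_cm, 65), 2))
```

Under these figures, one letter corresponds to roughly half a centimeter on screen at the 65 cm viewing distance.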
Participants first viewed an introductory screen containing instructions indicating they would read English paragraphs on the screen, some followed by true/false comprehension questions. A practice trial familiarized participants with the experiment procedure. Targeted pseudowords were never positioned as the first or last word on a line. Participants could press the spacebar or respond to true/false questions using two designated keyboard buttons to advance. Participants could not access previous screens.
Each three-sentence reading context included three areas of interest (AOIs). The height of each AOI was set to 74 pixels, corresponding to the default double-spaced text in EyeLink software, while widths varied according to AOI length. The first AOI comprised the five-letter pseudo base word, with an area of 7,696 pixels. The second AOI included the targeted suffix, with an area of 4,884 pixels for three-letter suffixes and 6,290 pixels for four-letter suffixes. The third AOI encompassed the informative context following the pseudowords, with variable widths due to the context’s length.
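The AOI sizes reported above are each exactly divisible by the 74-pixel AOI height, which is consistent with reading them as areas (height multiplied by width). The short sketch below makes that arithmetic explicit. Treating the figures as areas is an assumption drawn from the numbers, and the dictionary labels and function name are hypothetical.

```python
AOI_HEIGHT_PX = 74  # AOI height stated in the text

# Reported AOI sizes, interpreted here (as an assumption) as areas in pixels.
reported_sizes_px = {
    "pseudo base word (5 letters)": 7696,
    "suffix (3 letters)": 4884,
    "suffix (4 letters)": 6290,
}


def implied_width_px(area_px: int, height_px: int = AOI_HEIGHT_PX) -> int:
    """Width implied by an area and a fixed height; fails if not a whole number."""
    width, remainder = divmod(area_px, height_px)
    if remainder:
        raise ValueError("area is not a whole multiple of the AOI height")
    return width


for label, area in reported_sizes_px.items():
    print(label, implied_width_px(area))
```

On this reading, the implied widths are 104, 66, and 85 pixels, respectively.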
Data preparation and analysis
For the offline form recognition, meaning recall, and meaning recognition tests, incorrect responses were coded as 0 and correct responses as 1. The meaning recall test was independently scored by two research assistants. Interrater reliability was perfect for the pre-test (Cohen’s kappa = 1.000, p < .001), given that there was only one correct response to score, and high for the post-test (Cohen’s kappa = .954, p < .001). Raw eye-tracking data were pre-processed using EyeLink Data Viewer software (SR Research). Fixation patterns for each trial were reviewed, and vertical drift was manually adjusted as necessary. Fixations shorter than 80 ms were merged with adjacent fixations within 0.5 degrees of visual angle. Subsequently, fixations shorter than 80 ms or longer than 1200 ms were deleted. The eye fixation measures analyzed included total reading times for three AOIs: pseudo base words, targeted suffixes, and informative contexts.
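The two-step fixation cleaning described above (merging short fixations with nearby neighbors, then discarding remaining outliers) can be sketched as follows. This is a simplified illustration and not the EyeLink Data Viewer algorithm: the `Fixation` class, the use of horizontal position only, and the rule of keeping the longer fixation's position after a merge are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    duration_ms: float
    x_deg: float  # horizontal position in degrees of visual angle (assumed unit)


def clean_fixations(fixations, short_ms=80, long_ms=1200, merge_deg=0.5):
    """Merge short fixations into adjacent ones, then drop outliers."""
    merged = []
    for fix in fixations:
        prev = merged[-1] if merged else None
        is_near = prev is not None and abs(fix.x_deg - prev.x_deg) <= merge_deg
        either_short = prev is not None and (
            fix.duration_ms < short_ms or prev.duration_ms < short_ms
        )
        if is_near and either_short:
            # Pool the durations; keep the position of the longer fixation.
            longer = fix if fix.duration_ms > prev.duration_ms else prev
            merged[-1] = Fixation(prev.duration_ms + fix.duration_ms, longer.x_deg)
        else:
            merged.append(fix)
    # Second step: delete fixations that are still too short or too long.
    return [f for f in merged if short_ms <= f.duration_ms <= long_ms]


# Hypothetical trial: the 50 ms fixation is merged, the 40 ms and 1500 ms ones are dropped.
trial = [Fixation(200, 1.0), Fixation(50, 1.2), Fixation(40, 5.0),
         Fixation(1500, 8.0), Fixation(300, 10.0)]
print([f.duration_ms for f in clean_fixations(trial)])  # [250, 300]
```

Total reading time for an AOI would then be the sum of the retained fixation durations that fall within that AOI.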
Statistical analyses were conducted using R (R Core Team, 2022). To answer Research Question 1, two paired-samples t-tests were performed to compare pre-test and post-test scores on form recognition and meaning recognition. A Wilcoxon Signed-Rank Test was conducted for meaning recall due to the nonnormality of the data.
To address Research Question 2, three mixed-effects logistic regression models were built to identify factors predicting participants’ accuracy on the three post-tests. These models were implemented using functions from the lme4 package (version 1.1.21; Bates et al., Reference Bates, Mächler, Bolker and Walker2015). For all models except the one for meaning recall, participants’ pre-test responses were included as a covariate. Including pre-test scores for meaning recall resulted in a nearly singular Hessian, as participants had only one correct response on the pre-test, rendering this variable uninformative. Suffixes were coded as 0 for those with three letters and 1 for those with four letters. Continuous numerical variables were standardized as z-scores. These included total reading times for pseudo base words, targeted suffixes, and informative contexts; contextual variables such as richness ratings and length; participant scores on vocabulary knowledge and morphological awareness tests; and suffix productivity and frequency. Informative context length, defined as the total number of letters, spaces, and punctuation marks, was treated as a control variable in the models.
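The predictor coding described above can be illustrated with a brief sketch. It assumes that z-scores were computed with the sample standard deviation (the default behavior of R's `scale()`); the text does not state which variant was used. The function names and example reading times are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values to mean 0 and (sample) standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def code_suffix_length(n_letters: int) -> int:
    """Dummy-code suffix length: 0 for three-letter, 1 for four-letter suffixes."""
    return {3: 0, 4: 1}[n_letters]


# Hypothetical total reading times (ms) on a suffix AOI.
reading_times_ms = [310.0, 420.0, 530.0]
print(z_scores(reading_times_ms))   # [-1.0, 0.0, 1.0]
print(code_suffix_length(4))        # 1
```

Standardizing in this way means that each regression coefficient for a continuous predictor expresses the change in log-odds associated with a one standard deviation increase in that predictor.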
Results
Suffix learning
To answer the first research question, we examined whether any of the three aspects of learners’ suffix knowledge improved significantly after the treatment. Pre-test and post-test scores were calculated for each participant, and paired-samples t-tests and a Wilcoxon Signed-Rank Test were conducted to determine whether the observed increases were statistically significant. The alpha level was adjusted to 0.05/3 to account for the three comparisons (Al-Hoorie & Vitta, Reference Al-Hoorie and Vitta2019). For form recognition, a paired-samples t-test showed that post-test scores (M = 5.400, SD = 2.023) were significantly higher than pre-test scores (M = 3.175, SD = 1.647), t (39) = −6.022, p < .001, Cohen’s d = 1.213. Similarly, for meaning recognition, post-test scores (M = 5.225, SD = 1.941) were significantly higher than pre-test scores (M = 3.450, SD = 1.280), t (39) = −5.211, p < .001, Cohen’s d = 1.102. For meaning recall, the Wilcoxon Signed-Rank Test indicated a significant improvement in post-test scores (M = 1.675, SD = 1.789) compared to pre-test scores (M = 0.025, SD = 0.158), Z = −4.572, p < .001, Cohen’s d = 1.686. Cohen’s d was calculated as Cohen’s d_av for within-subject comparisons (Cumming, Reference Cumming2012). Figure 1 shows a comparison of pre-test and post-test scores for each aspect of suffix knowledge.
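The adjusted alpha level and the d_av effect size used above can be computed as in the following sketch. It assumes the common definition of d_av as the mean difference divided by the average of the two standard deviations; the function name is hypothetical, and the inputs are the descriptive statistics reported in the text.

```python
def cohens_d_av(mean_pre: float, sd_pre: float,
                mean_post: float, sd_post: float) -> float:
    """Within-subject effect size: mean difference over the average of the two SDs."""
    return (mean_post - mean_pre) / ((sd_pre + sd_post) / 2)


# Bonferroni-adjusted alpha for three comparisons.
adjusted_alpha = 0.05 / 3
print(round(adjusted_alpha, 4))  # 0.0167

# Descriptive statistics reported for form recognition and meaning recognition.
print(round(cohens_d_av(3.175, 1.647, 5.400, 2.023), 3))  # 1.213
print(round(cohens_d_av(3.450, 1.280, 5.225, 1.941), 3))  # 1.102
```

Applied to the reported means and standard deviations, this formula reproduces the effect sizes given for form recognition and meaning recognition.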

Figure 1. Suffix pre-test and post-test scores (max. score = 8). Bars within the box plots show medians and gray points show means.
Effects of predictor variables
Descriptive statistics for eye fixation measures, i.e., total reading times on pseudo base word, suffix, and context, are presented in Table 5.
Table 5. Descriptive statistics for eye fixation measures on base word, suffix, and context

To address the second research question, three multilevel logistic regression models were constructed, one for each suffix knowledge test: form recognition, meaning recall, and meaning recognition. These models examined the fixed effects of reading time fixation measures, learners’ L2 proficiency (vocabulary knowledge and morphological awareness), and language-level factors (suffix length, suffix productivity, suffix frequency, and contextual richness) on the incidental learning of targeted suffixes (see Footnote 1). The models initially included maximal random effects structures following the recommendations of Barr et al. (Reference Barr, Levy, Scheepers and Tily2013). However, the random effects required simplification to avoid either convergence issues or singular fits. Specifically, the form recognition model included a random intercept for participant and an uncorrelated random slope for morphological awareness. The meaning recall model included random intercepts for both participant and suffix items, while the meaning recognition model included only a random intercept for participant. The results of these three models are presented in Table 6.
Table 6. Effects of eye fixation measures, participant’s language proficiency, and language-level factors on suffix knowledge learning gains

Note: Vocabulary knowledge, morphological awareness, total reading time (base word), total reading time (suffix), total reading time (context), contextual richness, suffix productivity, suffix frequency, and context length were all standardized as z scores. Total reading time (base word), total reading time (suffix), and total reading time (context) are given in ms. Values of p in bold face are significant at the .05 level of alpha.
a Morphological awareness.
For form recognition and meaning recall, total reading times on base words and suffixes, respectively, positively predicted post-test response accuracy (form recognition: b = 0.999, SE = 0.458, p = 0.029, OR = 2.715 and b = 0.774, SE = 0.360, p = 0.032, OR = 2.169; meaning recall: b = 1.328, SE = 0.405, p = .001, OR = 3.772 and b = 0.592, SE = 0.263, p = 0.024, OR = 1.808). The odds ratios indicate that, for each 1 SD increase in total reading time on base words and on suffixes, the odds of a correct response were multiplied by 2.715 and 2.169, respectively, on the form recognition post-test, and by 3.772 and 1.808 on the meaning recall post-test. For meaning recognition, only total reading time on suffixes significantly predicted more accurate post-test responses (b = 0.677, SE = 0.258, p = 0.009, OR = 1.968).
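The relationship between the reported coefficients, odds ratios, and p values can be illustrated with a short sketch. It assumes that the p values are two-sided Wald z tests, which is what lme4 reports by default for logistic mixed models; this is not stated explicitly in the text. The function names are hypothetical, and small discrepancies from the reported values are expected because the inputs are rounded.

```python
import math


def odds_ratio(b: float) -> float:
    """Odds ratio for a one-unit (here, 1 SD) increase in the predictor."""
    return math.exp(b)


def wald_p_value(b: float, se: float) -> float:
    """Two-sided p value for a Wald z test of the coefficient b."""
    z = abs(b / se)
    return math.erfc(z / math.sqrt(2))


# Form recognition, total reading time on base words (values from the text).
b, se = 0.999, 0.458
print(round(odds_ratio(b), 2))        # close to the reported OR of 2.715
print(round(wald_p_value(b, se), 3))  # close to the reported p of 0.029
```

Because the predictors were standardized, an odds ratio above 1 means that the odds of a correct response increase with each standard deviation of additional reading time, and an odds ratio below 1 means that they decrease.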
In contrast, total reading time on contexts negatively impacted response accuracy across all post-tests (form recognition: b = −0.794, SE = 0.310, p = 0.011, OR = 0.452; meaning recall: b = −1.359, SE = 0.384, p < .001, OR = 0.257; and meaning recognition: b = −0.908, SE = 0.229, p < .001, OR = 0.403). For language-level factors, responses to four-letter suffixes were significantly more accurate than responses to three-letter suffixes, particularly for form recognition (b = 1.742, SE = 0.755, p = 0.021, OR = 5.706). Suffix productivity also positively influenced meaning recall scores (b = 1.811, SE = 0.598, p = 0.002, OR = 6.117). However, contextual richness and suffix frequency did not significantly affect performance on any test. For language proficiency, better morphological awareness significantly improved meaning recall accuracy (b = 0.517, SE = 0.250, p = 0.038, OR = 1.677), while richer vocabulary knowledge positively impacted meaning recognition scores (b = 0.349, SE = 0.175, p = 0.046, OR = 1.417).
Discussion
To investigate whether Chinese EFL learners could make incidental learning gains in three aspects of suffix knowledge, we examined pre-test and post-test accuracy for eight neutral derivational noun suffixes presented within three-sentence contexts. To gain a more comprehensive understanding of how learners’ attention to morphological structure may contribute to learning gains, we also collected reading time data using eye tracking. These reading times, along with variables reflecting participants’ L2 proficiency and language-level factors, were analyzed to identify predictors of accuracy on the three offline tests. Research Question 1 addressed the possibility of incidental learning of suffixes during reading, and the data showed that high-intermediate Chinese EFL learners could incidentally learn both form and meaning of suffixes through reading. Research Question 2 investigated the factors that potentially contributed to incidental learning across the three aspects of suffix knowledge (i.e., suffix form recognition, meaning recognition, and meaning recall). Our results indicated that attentional eye fixation measures, language-level features, and learners’ language proficiency had an impact on learners’ gains in suffix knowledge.
Suffix learning
Research Question 1 aimed to investigate whether participants could make learning gains through reading pseudowords presented within a context that provided insight into the meaning of each word and, subsequently, the meaning of the corresponding suffix. We compared pre-test and post-test scores for form recognition, meaning recall, and meaning recognition—three common measures of word knowledge used in incidental vocabulary learning research. Statistical analyses revealed significant improvements in the Chinese EFL learners’ knowledge of both the form and meaning of the targeted suffixes after the reading session. Our findings regarding the incidental learning of L2 suffixes parallel findings on the incidental learning of L2 vocabulary in multiple ways. First, they suggest that L2 reading can also support the incidental learning of affixes, aligning with studies showing that reading can foster the incidental learning of L2 vocabulary.
Additionally, the significant learning gains observed in this study indicate that eight repeated exposures to the suffixes were sufficient for incidental learning. This finding aligns with previous L2 vocabulary studies, which have found that eight exposures to a new word are generally sufficient for incidental learning (Horst et al., Reference Horst, Cobb and Meara1998; Waring & Takaki, Reference Waring and Takaki2003; Pellicer-Sánchez, Reference Pellicer-Sánchez2016). However, unlike traditional vocabulary studies where learners encounter the same word multiple times, participants in this study encountered eight unique words, each sharing the same suffix. This design emphasizes the suffix itself rather than the entire pseudoword incorporating the suffix, more closely reflecting natural learning contexts where learners are likely to encounter different words that share a common suffix. The learning gains observed for suffixes suggest that, as with words in previous studies (e.g., Pellicer-Sánchez, Reference Pellicer-Sánchez2016), a frequency threshold of eight exposures is sufficient for learning suffixes. Moreover, the learning gains suggest that the high-intermediate Chinese EFL learners in this study successfully decomposed the morphologically complex words encountered during reading into pseudo base words and targeted suffixes. Thus, this study supports the view that intermediate to advanced L2 learners can effectively decompose morphologically complex words into their bases and derivational affixes (Diependaele et al., Reference Diependaele, Duñabeitia, Morris and Keuleers2011; Li et al., Reference Li, Taft and Xu2017; Iwaizumi & Webb, Reference Iwaizumi and Webb2023).
Similar to findings from previous research on incidental L2 vocabulary learning (Schmitt, Reference Schmitt2008, Reference Schmitt2010; Nation, Reference Nation2013; Mohamed, Reference Mohamed2018), this study found that among the three types of word knowledge, form recognition was the easiest for learners, followed by meaning recognition, and finally meaning recall. In this study, the greatest learning gains were observed in suffix form recognition, with an average increase of 2.22 correct answers on the eight-item post-test. This was followed by suffix meaning recognition, which showed an average increase of 1.78 correct answers, and suffix meaning recall, which had the smallest increase of 1.65 correct answers. These findings suggest that the learning difficulty pattern observed in vocabulary acquisition also applies to the acquisition of derivational suffixes.
Effects of predictor variables
Attentional fixation measures
The attentional fixation measures used in this study included total reading times on pseudo base words, targeted suffixes, and informative contexts. Two of these measures—total reading time on base words and suffixes—captured the time participants spent fixating on targeted pseudowords. Previous research on incidental vocabulary learning has consistently shown that total reading time on novel words significantly and positively predicts learning success for those words (Godfroid et al., Reference Godfroid, Boers and Housen2013; Pellicer-Sánchez, Reference Pellicer-Sánchez2016; Mohamed, Reference Mohamed2018). In alignment with these findings, our study revealed that total reading time on targeted novel suffixes significantly and positively predicted learning gains across all three suffix knowledge tests. This positive relationship between attention to suffixes and suffix knowledge learning supports theoretical accounts of L2 acquisition, such as Schmidt’s noticing hypothesis (1994, 1995, and 2001), which underscores the critical role of attention in language learning.
Additionally, our L2 learners’ attention to both constituent morphemes (i.e., pseudo base words and suffixes) within the pseudowords provides insight into how morphological information was applied during sentential reading. The positive relationships between attention to suffixes and suffix learning gains, and between attention to base words and suffix learning gains, suggest that both suffixes and base words were effectively exploited as cues to support suffix learning when learners encountered unknown morphologically complex words. This finding contrasts with prior research that has tended to emphasize the prominent role of either component morpheme (i.e., suffix effect vs. embedded stem activation account; e.g., Beyersmann et al., Reference Beyersmann, Casalis, Ziegler and Grainger2015; Beyersmann et al., Reference Beyersmann, Cavalli, Casalis and Colé2016; Grainger & Beyersmann, Reference Grainger, Beyersman and Ross2017; Amenta et al., Reference Amenta, Foppolo and Badan2025). In previous behavioral lexical decision paradigms, where (pseudo)words are presented in isolation, the relative contribution of each morpheme has been investigated primarily through comparisons of participants’ rejection accuracy and reaction times across different types of pseudowords (e.g., Casalis et al., Reference Casalis, Commissaire and Duncan2015; Li et al., Reference Li, Taft and Xu2017; Amenta et al., Reference Amenta, Foppolo and Badan2025). Under such time-constrained conditions, participants are required to make rapid lexical decisions. For example, Amenta et al. (Reference Amenta, Foppolo and Badan2025) reported that higher-proficiency participants rejected pseudowords with a real base word and a non-existent suffix more quickly than those with a non-existent base word and a real suffix (a condition that parallels our pseudoword design), suggesting a reliance on suffixes as morphological cues. Similarly, Casalis et al. (Reference Casalis, Commissaire and Duncan2015) and Amenta et al. (Reference Amenta, Foppolo and Badan2025) found that words containing real suffixes were more likely to be recognized than those without, regardless of language proficiency. Research with native speakers has also demonstrated suffix effects through priming paradigms; for instance, Duñabeitia et al. (Reference Duñabeitia, Perea and Carreiras2008) used word pairs such as baker–WALKER to illustrate suffix priming. Regarding the role of base words in morphological processing and visual word recognition, studies have shown that low-proficiency learners are more likely to recognize words containing real base words or to reject pseudowords with non-existent bases (Amenta et al., Reference Amenta, Foppolo and Badan2025; Casalis et al., Reference Casalis, Commissaire and Duncan2015). Such effects have also been documented among native speakers in priming experiments using pairs like adaptable–ADAPTER (Rastle et al., Reference Rastle, Davis, Marslen-Wilson and Tyler2000). Collectively, these findings suggest that low-proficiency learners are particularly sensitive to embedded base words as morphological cues.
However, findings from lexical decision tasks may not fully reflect learners’ comprehensive attention allocation or processing strategies when engaging with morphologically complex words. Rather, under time pressure, learners may prioritize the most accessible or diagnostic cues to make quick decisions. In contrast, our study employed a self-paced silent reading task within sentential contexts, where participants were likely motivated to derive meaning from the complex words in context. This may have encouraged dual attention to both suffixes and base words. The consistent appearance of a shared suffix across several novel words may have further prompted learners to recognize its form and infer its meaning. Unlike the case of novel-derived word processing, where attention to both morphemes is often attributed to morphemic interference (Taft & Forster, Reference Taft and Forster1975), our findings suggest that attention was directed toward the integration of morphemes for meaning construction. Thus, attention to both suffixes and pseudo base words indicates that higher-proficiency L2 learners are sensitive to constituent morphemes even when encountering unknown words in context. This supports the view that morphological decomposition occurs during natural reading (Amenta et al., Reference Amenta, Marelli and Crepaldi2015) and is consistent with findings from priming and lexical decision tasks that have demonstrated morphological decomposition in L2 learners (Diependaele et al., Reference Diependaele, Duñabeitia, Morris and Keuleers2011; Duñabeitia et al., Reference Duñabeitia, Dimitropoulou, Morris and Diependaele2013; Casalis et al., Reference Casalis, Commissaire and Duncan2015; Zhang et al., Reference Zhang, Liang, Yao, Hu and Chen2016; Li et al., Reference Li, Taft and Xu2017; Amenta et al., Reference Amenta, Foppolo and Badan2025).
In addition, the relationship between total reading time on base words and suffix form recognition performance can potentially be explained by parafoveal preview (Schotter et al., Reference Schotter, Angele and Rayner2012), whereby readers gather information about upcoming words or morphemes while fixating on the current one, as has been demonstrated in compound word processing (Hyönä et al., Reference Hyönä, Bertram and Pollatsek2004). This mechanism may also underlie why total reading time on base words predicts meaning recall accuracy, suggesting that suffix meaning can be inferred from broader contextual engagement.
In contrast to the positive relationship observed between reading times on targeted words and learning outcomes, we found a negative relationship between reading times on contexts and all three measures of suffix knowledge. The underlying reasons for this finding are less well understood, as fixation-based measures on contextual regions have rarely been the focus of previous eye-tracking studies on incidental vocabulary learning. One possible explanation lies in the larger size of the context regions as AOIs. Fixation durations on these broader regions likely encompass not only attention to the contextual input but also the cognitive effort required to process and comprehend that input. This blend of attention and processing demands may help explain the observed negative correlation. Additionally, the role of language proficiency suggested by the influence of vocabulary knowledge provides a further possible interpretation: faster readers, who are often more proficient, may have required less time to extract relevant information from context and were thus better positioned to consolidate morphological knowledge. In this view, shorter reading times on context may reflect more efficient processing rather than reduced engagement.
Language-level factors
For the orthographic length of suffix, we found that long (i.e., four-letter) suffixes were more likely to be successfully learned than short (i.e., three-letter) suffixes in the form recognition test. This advantage may result from differences in phonological patterns between the two types of suffixes. Phonologically, a syllable typically consists of a vowel nucleus, an optional consonant onset preceding the vowel, and an optional consonant coda following it. All the long suffixes in our study began with a consonant onset, whereas three of the four targeted short suffixes began with a vowel nucleus. Consequently, the vowel nuclei in these short suffixes may have integrated with the consonant codas of their preceding pseudo bases’ syllables when the pseudowords were pronounced. This integration likely made it more challenging for learners to decompose the pseudo base words from the nucleus-initial short suffixes during reading, adversely affecting learning gains for both form and meaning knowledge. Another factor influencing the learning of short suffixes could be their phonological properties. Three nucleus-initial short suffixes are considered non-neutral in some contexts (Fudge, Reference Fudge1984), potentially increasing their learning difficulty for L2 learners (Tyler & Nagy, Reference Tyler and Nagy1989; Friedline, Reference Friedline2011). However, during the suffix selection phase, we followed Trevian’s (Reference Trevian2015) suggestion that examples exhibiting non-neutral properties in short suffixes are exceptional cases. Therefore, we classified these short suffixes as neutral. Nevertheless, the findings may reflect the “mixed” nature (i.e., sometimes neutral, sometimes non-neutral) of these short suffixes. 
An alternative explanation is that long suffixes, being perceptually salient (Laudanna & Burani, Reference Laudanna, Burani and Feldman1995; Bertram et al., Reference Bertram, Laine and Karvinen1999), prompt readers to adopt a decomposition strategy during morphological processing, where stems and suffixes are processed separately. This separate processing likely draws learners’ attention to the long suffixes, resulting in improved recognition of their forms (Kuperman et al., Reference Kuperman, Bertram and Baayen2010). Amenta et al. (Reference Amenta, Marelli and Crepaldi2015) provided additional insight using eye fixation data, observing that the first fixation location in derived words typically falls on the stem. Long suffixes, positioned in the periphery of vision, may therefore be perceived with reduced visual acuity, which can hinder comprehensive morphological and early semantic processing. However, this incomplete early processing may redirect readers’ attention specifically to the long suffixes, thereby facilitating their form recognition.
In relation to the two key suffix properties relevant to morphological processing, previous research involving both L1 speakers (e.g., Bertram et al., Reference Bertram, Laine and Karvinen1999; Bertram et al., Reference Bertram, Schreuder and Baayen2000; Lázaro et al., Reference Lázaro, Sainz and Illera2015; Sánchez-Gutiérrez et al., Reference Sánchez-Gutiérrez, Mailhot, Deacon and Wilson2018) and L2 learners (e.g., Burani & Thornton, Reference Burani, Thornton, Baayen and Schreuder2003; Silva & Clahsen, Reference Silva and Clahsen2008; Dal Maso & Giraudo, Reference Dal Maso and Giraudo2014) has shown facilitatory effects of affix productivity and affix frequency through various experimental paradigms. Specifically, more productive and more frequent affixes have been associated with enhanced word recognition and, in some cases, greater difficulty in rejecting pseudowords. In our study, suffix productivity significantly predicted L2 learners’ success in meaning recall, suggesting that the production-based task is particularly sensitive to variation at the suffix level. In contrast, suffix frequency did not predict any learning outcomes. These findings highlight the beneficial role of suffix productivity, especially in the most cognitively demanding dimension of suffix knowledge: meaning recall. Notably, suffix productivity only predicted performance on the production-based suffix knowledge task. This pattern aligns with prior findings from L1 English-speaking children, who used highly productive suffixes more effectively than less productive ones in elicited production tasks (Windsor & Hwang, Reference Windsor and Hwang1999). Conversely, Schmidtke et al. (Reference Schmidtke, Rahmanian and Moro2022) found that L2 learners made greater gains with less productive suffixes in a written suffix production task. However, their participants received eight months of language exposure, whereas learners in our study had a considerably shorter learning duration. The discrepancy between the two studies likely reflects this difference in exposure time, which may modulate the role of suffix productivity during learning.
Regarding suffix frequency, its lack of predictive value in our study may stem from our experimental design. During the selection process, suffixes already recognized by most pilot participants were excluded, ensuring that target suffixes were low in frequency. Consequently, the limited variability in suffix frequency likely reduced its potential impact on learning outcomes.
Similarly, contextual richness did not have a significant impact on any of the suffix post-tests. This was somewhat unexpected, as contextual richness is often associated with gains in word meaning knowledge (Webb, Reference Webb2008). However, this result may be explained by the design of our study. Rated on a 1-to-3 scale, all contexts were informative as intended, with an average rating of 2.31 (SD = 0.08, range = 2.18–2.38; see Table 3). The lack of variation in contextual richness within the study may account for its limited influence on suffix knowledge gains. Notably, previous eye-tracking studies on incidental vocabulary learning have not examined the contextual richness of individual reading contexts (e.g., Godfroid et al., Reference Godfroid, Boers and Housen2013; Pellicer-Sánchez, Reference Pellicer-Sánchez2016; Mohamed, Reference Mohamed2018). Future research with a broader range of contextual richness will be necessary to clarify its potential relationship with the learning of the three aspects of suffix knowledge.
Learners’ language proficiency
Our results provide evidence that learners’ vocabulary knowledge predicts incidental learning gains in suffix knowledge, though this effect was observed in only one of the three post-tests. Similar to previous research showing a positive relationship between language proficiency and word meaning recognition (e.g., Tekmen & Daloğlu, Reference Tekmen and Daloǧlu2006), our research found that language proficiency, as measured by vocabulary knowledge, positively and significantly predicted suffix meaning recognition success. Language proficiency has been discussed as a key factor influencing L2 learners’ morphological processing strategies. Highly proficient learners are more likely to decompose morphologically complex words, similar to native speakers, whereas less proficient learners tend to rely on whole word recognition (e.g., Silva & Clahsen, Reference Silva and Clahsen2008; Li et al., Reference Li, Taft and Xu2017). The learning gains in suffix meaning recognition observed among our high-intermediate proficiency learners are consistent with a decomposition-based strategy, as successful parsing of complex words into base words and suffixes is a prerequisite for suffix learning. However, it is noteworthy that language proficiency predicted gains only in suffix meaning knowledge, not in form-related knowledge. This selective effect may be attributed to the nature of our experimental context, which involved reading for comprehension, a task that naturally emphasizes semantic processing over form-focused learning.
Nevertheless, the relationship between vocabulary knowledge and suffix learning gains should not be considered unidirectional or entirely direct. As with findings from research on the correlation between affix knowledge and L2 vocabulary (Mochizuki & Aizawa, Reference Mochizuki and Aizawa2000), our results indicate that the learning gains in affixes and vocabulary knowledge likely have a mutually beneficial relationship. Additionally, vocabulary knowledge, as a measure of language proficiency, is strongly correlated with learners’ reading comprehension (e.g., Laufer, Reference Laufer, Arnaud and Béjoint1992). Improvements in reading comprehension can enhance learners’ ability to understand sentence contexts, thereby facilitating the inference and learning of an unknown suffix’s meaning. Thus, reading comprehension may potentially mediate the relationship between suffix meaning learning gains and L2 vocabulary knowledge. Interestingly, vocabulary knowledge only significantly predicted suffix meaning recognition, not suffix meaning recall. This discrepancy may be attributed to the greater task difficulty associated with suffix meaning recall, which may require higher-frequency encounters with a given suffix for L2 learners to pick up some productive knowledge of it.
Morphological awareness of derivations positively and significantly predicted learning gains only in suffix meaning recall. Morphological awareness comes into play when learners encounter morphologically complex words during reading, enabling them to recognize the morphemic structure of these words (Carlisle, Reference Carlisle and Feldman1995). However, this recognition of morphemic structure provides access only to the components of the word and may not necessarily reflect whether learners have fully learned the suffixes. Nevertheless, our study found a positive and significant relationship between morphological awareness and suffix meaning recall. As previously noted, the suffix meaning recall test is more difficult than the other two suffix knowledge tests since it measures productive knowledge, which requires test takers to provide the meaning of the suffix. The significant relationship between morphological awareness and learners’ performance on this more demanding test may indicate that suffix meaning recall draws on learners’ morphological awareness to a greater degree than recognition does.
Limitations and implications
The first limitation of the present study is the inability to fully control for participants’ prior knowledge of the targeted suffixes. Although we used piloting to identify low-frequency suffixes that participants were unlikely to know, it is still possible that some participants had prior knowledge or partial familiarity with these suffixes. Another limitation lies in the ambiguity regarding whether the length of the suffix or another factor, such as the potential mixed neutral/non-neutral distinction, explains the greater incidental learning observed for four-letter suffixes. Finally, the study does not provide insight into whether participants will retain the knowledge demonstrated in the post-tests, as no delayed post-test was included to assess retention. Thus, future research should incorporate delayed post-tests to evaluate both suffix learning and retention. Additionally, future studies could benefit from incorporating derivational suffixes that form words of other parts of speech, non-neutral suffixes, and/or prefixes.
For L2 teaching, the findings of the present study align with previous research suggesting that reading in an L2 is beneficial for language learning. Furthermore, our study demonstrates that reading in an L2 supports the learning of unfamiliar suffixes in a manner comparable to its effect on learning mid-to-low frequency words. Thus, incidental learning during reading can complement explicit teaching, which typically targets high-frequency suffixes. Based on these findings, educational institutions and publishers may consider developing graded readers that systematically enhance learners’ suffix knowledge, following the established affix difficulty order (Bauer & Nation, Reference Bauer and Nation1993; Sasao & Webb, Reference Sasao and Webb2017). However, as there are few studies on the magnitude of incidental suffix learning gains, it is challenging to make direct comparisons or draw definitive conclusions. Even so, the role of incidental learning alongside the explicit teaching of high-frequency affixes appears much like its role in vocabulary acquisition (e.g., Schmitt & Carter, Reference Schmitt and Carter2000; Wu, Reference Wu2009; Hu, Reference Hu2013).
Conclusion
This study builds upon previous research on incidental vocabulary acquisition through reading by specifically examining English neutral derivational noun suffixes. The significant improvements in accuracy on the offline post-tests indicate that Chinese EFL learners can effectively acquire both the form and meaning of suffixes through incidental learning during reading. These findings highlight that L2 readers are capable of developing sensitivity to the morphological structures of a second language. Furthermore, incidental learning while reading proves to be a valuable supplement to intentional learning, particularly for English affixes that occur at lower frequencies. Additionally, the investigation into attentional fixation measures, learners’ language proficiency, and language-level features demonstrates that these factors play a role in influencing suffix learning gains.
Replication package
Data availability: the data that support the findings of this study are openly available in Open Science Framework at https://osf.io/xz49r/.
Acknowledgements
This research was supported by the University of Macau under research grant number MYRG-GRG2023-00112-FED.