French is not so easy to decode: a pilot study
Learning to decode printed words is an important part of learning to read (e.g., Castles et al., 2018), and it is more difficult in some writing systems than in others. According to the dual route model (Coltheart et al., 2001; Ziegler et al., 2014), words can be read either by the sublexical phonological procedure or by the lexical procedure. In the sublexical phonological procedure, small written units (letters, graphemes, syllables) are converted into their phonological counterparts, whereas in the lexical procedure, written words are recognized as a whole and linked to their oral form and meaning stored in the internal lexicon. The sublexical phonological procedure predominates at the beginning of learning to read, and its proper development is crucial. Indeed, its efficiency strongly supports the acquisition of reading and spelling since orthographic representations of words are gradually acquired as they are successfully decoded (Martinet et al., 2004; Share, 1995). The ease with which children identify words is also the most important predictor of written comprehension up to the middle of primary school, the identification of written words being the sine qua non of reading comprehension (Share, 1995; Vellutino et al., 2007). Moreover, the main source of difficulties in learning to read is problems in the sublexical procedure (Gentaz et al., 2015; Spencer et al., 2014), and difficulties in decoding are more frequent in less consistent orthographic systems (Seymour et al., 2003; Ziegler & Goswami, 2005).
Taken together, these observations suggest that the grapheme-to-phoneme conversion rules that are most likely to cause challenges in a particular orthographic system should be precisely identified.
The French orthographic system has many particularities, which have led it to be described as “the least Latinate of all Romance languages” (Posner, Reference Posner1996, p. 245, our translation). With 130 graphemes for about 36 phonemes, it is an asymmetric orthographic system, being easier in the grapheme-to-phoneme direction than in the phoneme-to-grapheme one (Jaffré & Fayol, Reference Jaffré, Fayol, Joshi and Aaron2005). Even so, it is one of the most difficult writing systems to read after English. In a cross-language study, Seymour et al. (Reference Seymour, Aro and Erskine2003) recorded the percentage of words correctly read by pupils after 1 year of reading instruction in 13 orthographies and ranked French as a language of intermediate difficulty (alongside Danish and Portuguese), even though the items were frequent words that did not comprise the various inconsistencies of this orthographic system.
One of the main specificities of the French orthographic system is its high number of multiletter graphemes (i.e., groups of letters corresponding to one phoneme). Most of them are composed of two letters (“ou,” “ch,” “oi,” “ai,” “au,” “an,” “in,” “on,” etc.) and some of three letters (“eau,” “ain,” or “ein”). Some of them are highly cohesive, meaning that they always correspond to a grapheme unit in print (Chetail, Reference Chetail2020). For example, “au” systematically corresponds to a single grapheme and thus to one phoneme (/ɔ/). However, a considerable amount of French multiletter graphemes are ambiguous or low cohesive (Commissaire et al., Reference Commissaire, Besse, Demont and Casalis2018). For example, “an” can correspond either to one grapheme (i.e., /ɑ̃/ such as in “rang,” /rɑ̃/—“rank”; “blanc,” /blɑ̃/—“white”) or to two simple graphemes when followed by a vowel or the duplicated “n” (i.e., /a/-/n/ such as in “canard,” /kanar/—“duck”; “banir,” /banir/—“to ban”; “année,” /ane/—“year”). A few words are composed of two or even three of such ambiguous sequences of letters (“banane,” /banan/—“banana”; “cinema,” /sinema/—“cinema”; “inanimé,” /inanime/—“inanimate”). Such multiletter graphemes thus represent difficulties of orthographic segmentation into graphemes beyond those of converting graphemes into phonemes.
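The segmentation problem described above for “an,” “on,” and “in” can be illustrated with a minimal sketch. It encodes only the simplified rule stated in this paragraph (two simple graphemes when the sequence is followed by a vowel or a doubled “n,” one multiletter grapheme otherwise). The function name and the vowel set are assumptions made for illustration, not part of the study, and real French words include cases this rule does not cover.

```python
# Hypothetical helper: a simplified sketch, not a full French grapheme parser.
VOWELS = set("aeiouyàâéèêëîïôûù")  # assumed set of vowel letters

def nasal_sequences(word):
    """List each 'an', 'on' or 'in' in word with its start index and segmentation."""
    word = word.lower()
    results = []
    for i in range(len(word) - 1):
        pair = word[i:i + 2]
        if pair not in ("an", "on", "in"):
            continue
        # Letter following the pair ("" when the pair ends the word).
        nxt = word[i + 2] if i + 2 < len(word) else ""
        if nxt in VOWELS or nxt == "n":
            reading = "two graphemes"   # e.g. /a/-/n/ in "canard"
        else:
            reading = "one grapheme"    # e.g. the nasal vowel in "rang"
        results.append((pair, i, reading))
    return results

print(nasal_sequences("rang"))    # [('an', 1, 'one grapheme')]
print(nasal_sequences("canard"))  # [('an', 1, 'two graphemes')]
print(nasal_sequences("banane"))  # two entries, both 'two graphemes'
```

The same letter pair receives a different segmentation depending only on the letter that follows it, which is the ambiguity a reader has to resolve before any grapheme-to-phoneme conversion.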
French also has a considerable number of contextual graphemes, i.e., graphemes that correspond to different phonemes according to the surrounding graphemes. Among the most frequent are “c” and “g,” which are also contextual in other languages such as English, Italian, or Spanish. In French, they respectively correspond to /k/ and /g/ when followed by “a,” “o,” “u,” “r,” or “l” (such as in “cave,” /kav/—“cellar”; clou, /klu/—“nail”; “goût,” /gu/—“taste”; “glace,” /glas/—“ice cream”) and to /s/ and /ʒ/ when followed by “e,” “i,” or “y” (such as in “cil,” /sil/—“lash”; “cerf,” /sϵr/—“deer”; “gilet,” /ʒilϵ/—“vest”; “geler,” /ʒəle/—“freeze”). The letter “s” is another contextual grapheme in French, corresponding to /z/ between two vowels (such as in “visage,” /vizaʒ/—“face”) but to /s/ at the beginning of words or between a vowel and a consonant (such as in “soupe,” /sup/—“soup”; “veste,” /vϵst/—“jacket”). Less often presented as such, the grapheme “e” inside words is also contextual (Afonso et al., Reference Afonso, Álvarez and Kandel2015; Léon et al., Reference Léon, Léon, Léon and Thomas2009). It makes /ϵ/ in almost all cases where it is followed by two consonants (“veste,” /vϵst/—“jacket”; “ferme,” /fϵrm/—“a farm”; “persil,” /pϵrsil/—“parsley”). On the opposite side, it makes /ə/ when followed by a single consonant (“devoir,” /dəvwar/—“homework”; “melon,” /məlɔ̃/—“melon”; “cerise,” /səriz/—“cherry”).
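The contextual rules described above for “c,” “g,” and “s” can be sketched as a small function. This is an illustration under stated assumptions only: the function name, the letter sets (which add accented variants of the vowels named in the text), and the decision to return `None` for contexts the paragraph does not describe are all hypothetical choices. The grapheme “e” and word-final “s” (usually silent) are deliberately left out.

```python
VOWELS = set("aeiouyàâéèêëîïôûù")   # assumed set of vowel letters
FRONT = set("eéèêëiîïy")            # contexts giving /s/ and /ʒ/
BACK = set("aàâoôuûùrl")            # contexts giving /k/ and /g/

def contextual_phoneme(word, i):
    """Expected phoneme for the contextual letter word[i] ('c', 'g' or 's')."""
    word = word.lower()
    letter = word[i]
    prev = word[i - 1] if i > 0 else ""
    nxt = word[i + 1] if i + 1 < len(word) else ""
    if letter in ("c", "g"):
        if nxt in FRONT:
            return "s" if letter == "c" else "ʒ"
        if nxt in BACK:
            return "k" if letter == "c" else "g"
        return None  # context not covered by the description above (e.g. "ch")
    if letter == "s":
        # /z/ only between two vowels; /s/ word-initially or next to a consonant.
        return "z" if prev in VOWELS and nxt in VOWELS else "s"
    raise ValueError("only 'c', 'g' and 's' are handled in this sketch")

print(contextual_phoneme("cave", 0))    # k
print(contextual_phoneme("gilet", 0))   # ʒ
print(contextual_phoneme("visage", 2))  # z
print(contextual_phoneme("veste", 2))   # s
```

Every branch needs at least one neighboring letter, so a reader who converts the letter in isolation cannot apply these rules.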
An additional challenge with French is its final consonants, which are largely unstable (Perry et al., Reference Perry, Ziegler and Zorzi2014). Some final consonants are mostly sounding (“l,” 99.2%, “cheval,” /ʃəval/—“horse”; “c,” 92%, “lac,” /lak/—“lake”; Peereman et al., Reference Peereman, Lété and Sprenger-Charolles2007), while others are almost always silent (“t,” 99.5%, “bout,” /bu/—“piece”; “p,” 98.8%, “trop,” /tro/—“too much”; “d,” 98.7%, “tard,” /tar/—“late”; “s,” 96.1%, “souris,” /suri/—“mouse”). Many silent final consonants have a function at the derivational level (“plat,” “plateau”—“flat,” “tray”), and there can also be two final silent letters at this word level (“temps”—“time”; “instinct”—“instinct”). One to three silent final letters also often appear at the inflectional level (“il fait”—“he does”; “des fourmis”—“ants”; “tu prends”—“you take,” “ils trouvent”—“they find”). This is due to the fact that French inflectional morphology is predominantly silent. Indeed, while written French resembles other Romance languages, spoken French has undergone its own evolution, characterized in particular by an influence from Germanic languages (Barra-Jover, Reference Barra-Jover2009), making it a particularly complex written language to acquire morphosyntactically (Ågren, Reference Ågren2016).
The consequences of these specificities on decoding performance are not sufficiently documented. First, for multiletter graphemes, results are not consistent as regards the automaticity with which they are processed. On the one hand, studies showed that English and French adults were slower to detect a target letter in a word when the target letter was embedded in a multiletter grapheme (e.g., “U” in “LOUPE”) than when it corresponded to a single-letter grapheme (e.g., “U” in “CHUTE”; Rastle & Coltheart, 1998; Rey et al., 2000), showing that such graphemes are processed as perceptual units. Among French children from Grades 1 to 4, Sprenger-Charolles et al. (2005) showed that items with two-letter graphemes (ch, ou, on, etc.) did not generate more mistakes or longer latencies than items with one-letter graphemes only, leading the authors to conclude that multiletter graphemes are processed as perceptual units from early on. On the other hand, using five experimental tasks among French adults, Chetail (2020) found no reliable grapheme effect, supporting the claim that graphemes are not perceptual units in skilled visual word recognition. For their part, Spinelli et al. (2012) highlighted that graphemic effects depend on grapheme cohesion. In a letter decision task, they found that A was detected faster in weakly cohesive complex graphemes (e.g., AN) than in strongly cohesive ones (e.g., AU) (see also Commissaire & Casalis, 2018).
Furthermore, studies highlighted that expert readers are significantly slower in reading pseudowords containing multiletter graphemes than only simple ones, an effect known as the graphemic complexity effect (Joubert & Roch Lecours, Reference Joubert and Roch Lecours2000; Rastle & Coltheart, Reference Rastle and Coltheart1998; Rey et al., Reference Rey, Jacobs, Schmidt-Weigand and Ziegler1998; Rey & Schiller, Reference Rey and Schiller2005). Results are therefore inconsistent on the ease with which multiletter graphemes are processed, and data are above all lacking regarding the possible decoding challenge that low-cohesive multiletter graphemes, which are prominent in French, might represent.
Regarding contextual graphemes, most studies have been conducted in English on the reading of adjacent vowels since their spelling-sound relations are particularly complex in that orthography (Kessler & Treiman, 2001; Treiman et al., 2006). A few studies in English nevertheless investigated the use of context for the consonants “c” and “g” (Treiman & Kessler, 2019; Treiman et al., 2007). Treiman and Kessler (2019) found that readers from early elementary school to university were not as influenced by the context as would be expected given the contextual effects in the English vocabulary and concluded that the use of context developed slowly. For example, for “c” followed by “e” or “i,” the percentage of front pronunciations (i.e., /s/) increased from 17% for students from first- to third-grade level to 79% for students from a post-high school level. The contextual consonants “c” and “g” are, however, not as consistent in English as they are in French, even though the alternation of pronunciation of these graphemes came from French, borrowed in the medieval period (Emerson, 1997). Indeed, in English, the letter “g” before “e” or “i” is usually pronounced /ɡ/ at the beginning of non-Latinate words (e.g., get, give) and /dʒ/ in Latinate vocabulary (e.g., genetic, gingivitis) (Treiman & Kessler, 2019), while it is always pronounced /ʒ/ before these letters in French. Despite this higher consistency, Sprenger-Charolles et al. (2005) found among French children that words containing “c” and “g” were read less accurately than words without any contextual grapheme up to Grade 4.
Similarly, Alegria and Mousty (Reference Alegria, Mousty, Brown and Ellis1994) showed that in the direction of writing, the contextual letters “g” and “s” caused many difficulties for second graders with and without reading difficulties. In Italian, in which the spelling-sound relations for these graphemes are similar to French, it has been shown that 3rd and 5th graders read low-frequency words containing “c” or “g” more slowly and less accurately than words containing simple rules (see also Burani et al., Reference Burani, Barca and Ellis2006). Together, these observations show that the decoding challenges generated by French contextual graphemes need further study, especially those for which we found no data, such as “s” or “e.”
Finally, few data are available for final consonants. Royer et al. (Reference Royer, Spinelli and Ferrand2005) showed that expert French readers detected a silent final consonant more quickly (“t” in “chat”) than letters embedded in multiletter graphemes (“i” in “quai”), suggesting that silent letters are processed as single-letter graphemes rather than being bonded to their preceding vowel. In their connectionist model of reading developed for French, Perry et al. (Reference Perry, Ziegler and Zorzi2014) reported that inconsistencies with silent letters were hard to process, that the activation produced by words with these properties was higher, and that the model produced slower-than-average reaction times in these cases. They also found that the adult participants mainly pronounced the silent consonants, a tendency that was faithfully reproduced by their model. Since this propensity was found on monosyllabic pseudowords and since monosyllabic words are particularly inconsistent in French (Ziegler et al., Reference Ziegler, Jacobs and Stone1996), the decoding challenges that final consonants might represent should be further studied on other types of items.
The current study
Therefore, to find out whether these inconsistent and insufficiently studied French graphemes cause decoding challenges to typical readers, adults were required to read aloud pseudowords containing the ambiguous adjacent letters “an,” “on,” and “in”; the contextual graphemes “g,” “s,” and “e”; and the final consonants “d,” “p,” “s,” and “t,” as well as matched control pseudowords without them. Comparing the performance of adults reading these two kinds of pseudowords allowed us to test to what extent the conversion rules of the inconsistent graphemes are automatized. Indeed, automatized GPC rules for the inconsistent scenarios should result in targeted pseudowords read as accurately and as quickly as control pseudowords without inconsistent graphemes. Our hypothesis is, however, that when adults have to convert graphemes into phonemes, pseudowords containing inconsistent correspondences will give rise to more unexpected answers and to longer latencies than control pseudowords.
Method
Participants
This pilot study was planned to be conducted among students from our university, but this was not possible because of the COVID pandemic. Therefore, adults were recruited in the authors’ neighborhood, in strict compliance with the hygiene rules in force. Twenty-seven participants (20 women and 7 men) took part in the study. They were 35 years old on average (SD = 12.7; min = 16 years old; max = 57 years old). They were all native French speakers from the French-speaking part of Switzerland who had learned to read in French, reported no history of learning disorders, and had completed or were about to complete a tertiary level of education. They received a 15-franc (14 GBP) voucher to thank them for their participation.
Material
Sixty pseudowords containing the graphemes of interest were created. The targeted pseudowords were matched to sixty control pseudowords that were similar in every way, except that the graphemes under study were replaced by stable graphemes corresponding to a phoneme of similar pronunciation length. Beyond that constraint, sequences of letters surrounding the graphemes of interest were replaced by sequences of letters of as comparable frequency as possible according to the Manulex infra database (Peereman et al., Reference Peereman, Lété and Sprenger-Charolles2007; see Appendix 1). The frequency of the sequences of letters in the target and control pseudowords was controlled because sequences of letters make the occurrence of a given letter (or grapheme) more or less predictable (Pacton et al., Reference Pacton, Fayol and Perruchet2005). All pseudowords were composed of three syllables. The position of the graphemes of interest varied between the beginning, middle, and end of the pseudoword. The positions were the same in each pair of target and control pseudowords, and in all categories of graphemes, the number of each possible position was the same.
More precisely:
- To test the possible difficulty generated by the graphemes “in,” “on,” or “an” corresponding to /ϵ̃/, /ɔ̃/, or /ɑ̃/ (when followed by a consonant), six pseudowords were created. In the control pseudowords, the graphemes “in,” “on,” or “an” were replaced by a stable multiletter grapheme corresponding to a vocalic phoneme (“au” or “ou”) by choosing sequences of three letters whose frequency was as similar as possible. For example, the pseudoword “bonfivule” was matched to “baufivule,” and “létonvir” to “létauvir.”
- The criteria were the same for the sequences of letters “in,” “on,” or “an” corresponding to /i/-/n/, /ɔ/-/n/, or /a/-/n/ (when followed by a vowel). In the control pseudowords, the sequences of letters “in,” “on,” or “an” were replaced by a stable sequence of letters corresponding to one vocalic and one long phoneme (“ir,” “aj,” “il,” “of,” “or,” “ar”) by choosing sequences of three letters whose frequency was as similar as possible. For example, the pseudoword “binavule” was matched to “biravule,” and “boutanore” to “boutajore.”
- To test the possible difficulty generated by the subcase of two consecutive sequences of letters, such as “in,” “on,” “an,” “im,” “om,” or “am,” meant to be respectively read /i/-/n/, /ɔ/-/n/, /a/-/n/, /i/-/m/, /ɔ/-/m/, and /a/-/m/ (when followed by a vowel), three pseudowords were created. In the control pseudowords, the sequences of letters were replaced by two stable sequences of letters corresponding to one vocalic and one long phoneme (“or,” “av,” “il,” “ur,” “az,” “ir”) by choosing sequences of letters that were as frequent as possible. For example, the pseudoword “binumate” was matched to “bilurate,” and “lomanube” to “loravube.”
- To test the possible difficulty generated by the grapheme “s” inside a word corresponding to /s/ (between one vowel and one consonant), six pseudowords were created. In the control pseudowords, the grapheme “s” was replaced by a stable grapheme corresponding to a long phoneme (“r,” “l”) by choosing sequences of three letters around them whose frequency was as similar as possible (see Appendix 1). For example, the pseudoword “misbudole” was matched to “mirbudole,” and “nafuspé” to “nafulpé.”
- The criteria were the same to test the possible difficulty generated by the grapheme “s” corresponding to /z/ (between two vowels). For example, the pseudoword “posido” was matched to “porido,” and “bilusore” to “bilufore.”
- To test the possible difficulty generated by the grapheme “e” inside a word corresponding to /ϵ/ (when in the middle of a syllable, generally followed by two consonants), six pseudowords were created. The grapheme “e” was in the first syllable in half of the items and in the second syllable in the other half of the items. In the control pseudowords, the grapheme “e” was replaced by a stable grapheme corresponding to a vocalic phoneme (“a,” “i,” or “o”) by choosing sequences of three letters around them whose frequency was as similar as possible. For example, the pseudoword “neltari” was matched to “noltari,” and “chudelpa” to “chudilpa.”
- The criteria were the same to test the possible difficulty generated by the grapheme “e” corresponding to /ə/ (at the end of an initial or median syllable, generally followed by a single consonant). For example, the pseudoword “refapile” was matched to “rofapile,” and “chadelu” to “chadilu.” For some of the targeted items (buvedire, chadelu, vateril), the “e” could also be silent (schwa), in which case such answers were also considered correct.
- To test the possible difficulty generated by the grapheme “g” corresponding to /g/ (when followed by “a,” “o,” “u,” “r,” or “l”), six pseudowords were created. In the control pseudowords, the grapheme “g” was replaced by a stable grapheme corresponding to an occlusive phoneme (“t,” “b,” “p,” or “d”) by choosing sequences of three letters around them whose frequency was as similar as possible. For example, the pseudoword “mogalou” was matched to “mobalou,” and “tolugar” to “tolupar.”
- The criteria were the same to test the possible difficulty generated by the grapheme “g” corresponding to /ʒ/ (followed by “e” or “i”). In the control pseudowords, the grapheme “g” was replaced by a stable grapheme corresponding to a long phoneme (“f,” “r,” “v”). For example, the pseudoword “bégilor” was matched to “béfilor,” and “tapogé” to “tapofé.”
- To test the possible difficulty generated by the final consonants “d,” “p,” “s,” and “t,” which are most often silent in French, nine pseudowords were created. In the control pseudowords, the final consonants were replaced by the final vowel “e” (always silent in French) by choosing final sequences of letters that were as frequent as possible. For example, the pseudoword “firédoup” was matched to “firédoue,” and “nujélard” to “nujélare.”
Procedure
The task was created using the E-Prime 3 software. The items were presented in a pseudorandomized order. More precisely, eight lists were created with the pseudowords in a different order, ensuring that the two pseudowords of the same pair were separated by at least five items to minimize priming effects (Stark & McClelland, 2000). The pseudowords were displayed individually for 500 ms in the middle of a 15-inch computer screen in lowercase Arial font, size 40. They were preceded by a “+” fixation point displayed for 500 ms in the center of the screen and followed by a mask composed of eight capital “X,” covering the place occupied by the pseudowords, displayed for 1,500 ms. The experiment began with five practice items. Participants were individually tested in a quiet room, comfortably seated on a chair approximately 40 cm from the screen. They were told that meaningless words would be presented to them and were required to read them aloud, as quickly and as accurately as possible, as if they were French words. The experiment lasted approximately six minutes per participant. Answers were considered correct if the pseudowords were read as expected according to the absolute or largely dominant grapheme-to-phoneme correspondence (GPC) rules. Interjudge agreement was computed on a third of the data (nine participants) and reached 96.3%. Latencies (Perfetti, 1985) were recorded in ms with a voice key (Cedrus SV-1) for the pseudowords that were read as expected.
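The list constraint described above (the two members of a pair kept apart) can be sketched with simple rejection sampling. This is not the authors' E-Prime procedure, which the text does not detail. The function name, the seed handling, and the reading of “separated by at least five items” as at least five intervening items are assumptions made for illustration.

```python
import random

def make_list(pairs, min_gap=5, seed=None, max_tries=100_000):
    """Shuffle all items until every (target, control) pair has at least
    min_gap other items between its two members (assumed interpretation)."""
    rng = random.Random(seed)
    items = [item for pair in pairs for item in pair]
    for _ in range(max_tries):
        rng.shuffle(items)
        position = {item: idx for idx, item in enumerate(items)}
        # A position difference greater than min_gap leaves >= min_gap items between.
        if all(abs(position[t] - position[c]) > min_gap for t, c in pairs):
            return list(items)
    raise RuntimeError("no valid order found; relax min_gap or raise max_tries")

# Placeholder labels standing in for the 60 target/control pairs of the study.
pairs = [(f"target{i}", f"control{i}") for i in range(60)]
# Eight lists, one per seed, mirroring the eight orders mentioned in the text.
lists = [make_list(pairs, seed=s) for s in range(8)]
```

Rejection sampling is kept here for readability. With 120 items only a small fraction of random orders satisfies the constraint, so a constructive algorithm would be preferable for much longer lists.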
Results
Since most of the distributions were not Gaussian, Wilcoxon tests (for paired samples) were computed on the accuracy and reaction time (RT) scores for each scenario of the inconsistent graphemes. Results are displayed in Table 1, and all data are available at https://osf.io/4a7st/. Regarding accuracy, among the significant differences (with Bonferroni correction) between the pseudowords with inconsistent graphemes and the control pseudowords, the error rates were the highest for the grapheme “e” corresponding to /ə/ (“refapile,” “chadelu”; p < .001, 87% of unexpected answers relative to 5% for the control pseudowords); for the final consonants meant to be silent (“firédoup,” “nujélard”; p < .001, 50% of unexpected answers relative to 5% for the control pseudowords); for the grapheme “s” corresponding to /z/ (“posido,” “bilusore”; p < .001, 36% of unexpected answers relative to 12% for the control pseudowords); and for the graphemes “an,” “on,” and “in” corresponding to /ɑ̃/, /ɔ̃/, and /ϵ̃/ (“bonfivule,” “néfantile”; p < .001, 26% of unexpected answers relative to 6% for the control pseudowords). A trend toward significance was found for the grapheme “g” corresponding to /ʒ/ (“bégilor,” “tapogé”; p = .009, 19% of unexpected answers relative to 4% for the control pseudowords). 
Conversely, the differences were not significant for the grapheme “s” corresponding to /s/ (“misbudole,” “nafuspé”; p = .04, 20% of unexpected answers relative to 12% for the control pseudowords); for the sequences of letters “an,” “on,” and “in” followed by a vowel (“binavule,” “futinope”; p = .018, 14% of unexpected answers relative to 8% for the control pseudowords); for the two following sequences of letters such as “in,” “on,” “an,” “im,” “om,” and “am” (“binumate,” “lomanube”; p = .157, 16% of unexpected answers relative to 9% for the control pseudowords); for the grapheme “e” corresponding to /ϵ/ (“neltari,” “chudelpa”; p = .856, 9% of unexpected answers relative to 9% for the control pseudowords); and for the grapheme “g” corresponding to /g/ (“mogalou,” “tolugar”; p = .873, 9% of unexpected answers relative to 8% for the control pseudowords).
Table 1. Mean accuracy and RT for each grapheme-to-phoneme correspondence (GPC) scenario and its control pseudowords, and differences between them

Note: Mean accuracy (maximum = 1 pt) and RTs (in s) scores (and SD) for pseudowords with inconsistent graphemes as well as for matched control pseudowords, and differences between each pair of pseudowords. With Bonferroni correction, differences are significant at p < .005 and tendentially significant at p < .01.
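The decision rule stated in the note can be made explicit with a short sketch that applies the two thresholds to the p-values reported for accuracy. The thresholds are taken from the note and the p-values from the text. Representing “p < .001” by 0.001, the scenario labels, and the function name are assumptions made for illustration.

```python
# Thresholds from the note; .005 is consistent with .05 divided by ten scenarios.
ALPHA_CORRECTED = 0.005
TREND_THRESHOLD = 0.01

def classify(p):
    """Label a p-value according to the thresholds given in the table note."""
    if p < ALPHA_CORRECTED:
        return "significant"
    if p < TREND_THRESHOLD:
        return "trend"
    return "not significant"

# Accuracy p-values as reported in the text ("p < .001" entered as 0.001).
P_VALUES = {
    "e -> schwa": 0.001,
    "silent final consonants": 0.001,
    "s -> /z/": 0.001,
    "an/on/in as nasal vowels": 0.001,
    "g before e/i": 0.009,
    "s -> /s/": 0.04,
    "an/on/in + vowel": 0.018,
    "two consecutive sequences": 0.157,
    "e -> open e": 0.856,
    "g before a/o/u/r/l": 0.873,
}

for scenario, p in P_VALUES.items():
    print(f"{scenario}: {classify(p)}")
```

Under these thresholds, p = .018 and p = .04 fall in the non-significant group even though both are below the uncorrected .05 level.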
Mean RTs were calculated for each scenario on the basis of the participants’ RTs for correct answers. Only a trend toward significance was found for the following sequences of “on,” “in,” and “an” letters. Because latencies are more sensitive to the beginning of items (Sambai et al., Reference Sambai, Coltheart and Uno2018), and some targeted graphemes were at the end of the items in the current study, we checked whether differences were significant when considering only the latencies on pseudowords with inconsistencies at the beginning of the items, but this was not the case.
Discussion
The current pilot study was aimed at testing whether French graphemes following inconsistent conversion rules, such as the ambiguous sequences of letters “in,” “on,” and “an”; the contextual graphemes “g,” “s,” and “e”; and the final consonants “d,” “p,” “s,” and “t,” generated decoding challenges among expert readers.
In line with our hypothesis, many pseudowords with inconsistent graphemes generated more unexpected answers than matched pseudowords without them (Table 1). The scenarios that generated significant differences were, from the highest to the lowest rate of unexpected answers, the grapheme “e” corresponding to /ə/; the final consonants meant to be silent; the grapheme “s” corresponding to /z/; and the graphemes “an,” “on,” and “in” corresponding to /ɑ̃/, /ɔ̃/, and /ϵ̃/. A trend toward significance was also found for the grapheme “g” corresponding to /ʒ/. Conversely, there was no significant difference for the grapheme “s” corresponding to /s/, for the sequences of letters “an,” “on,” and “in” followed by a vowel, for the grapheme “e” corresponding to /ϵ/, for the grapheme “g” corresponding to /g/, nor for two following sequences of letters such as “in,” “on,” and “an.” Differences were also not significant for RTs.
Thus, the ambiguous sequences of letters “in,” “on,” and “an” generated more unexpected answers than their matched pseudowords only when they formed the low-cohesive complex graphemes corresponding to /ϵ̃/, /ɔ̃/, and /ɑ̃/, not when they were two simple adjacent graphemes. Nor were there significantly more unexpected answers when two such sequences of simple graphemes followed each other (“binumate,” “lomanube”). Therefore, for these ambiguous adjacent letters that are prominent in French, it is when they appear as low-cohesive complex graphemes that they represent a decoding challenge, not when they correspond to two simple graphemes.
The current results also highlighted that the three contextual graphemes under study (“s,” “e,” and a trend toward significance for “g”) represent decoding challenges. The rate of unexpected answers seems to be explained by dominant rules (Alegria & Mousty, 1996), since the scenarios that generated more unexpected answers were the less frequent GPCs. Indeed, the fact that the grapheme “e” corresponding to /ə/ generated more unexpected answers than the grapheme “e” corresponding to /ϵ/ is in line with the fact that this grapheme more often corresponds to /ϵ/ than to /ə/; the fact that the grapheme “s” corresponding to /z/ gave rise to more unexpected answers than the grapheme “s” corresponding to /s/ is in line with the fact that this grapheme more often corresponds to /s/ than to /z/; and the fact that the grapheme “g” tended to generate more unexpected answers when corresponding to /ʒ/ than to /g/ is in line with the fact that this grapheme more often corresponds to /g/ than to /ʒ/. However, the percentages of unexpected answers across the scenarios of the same grapheme are not proportional to their respective frequency. For example, the difference in the rate of unexpected answers between the pseudowords with “s” corresponding to /z/ and their control pseudowords (36% vs. 12%, i.e., 24 percentage points) is three times higher than the same difference for the pseudowords with “s” corresponding to /s/ (20% vs. 12%, i.e., 8 percentage points). Yet, the grapheme “s” is in total almost six times more frequently associated with the phoneme /s/ than with /z/ (Peereman et al., 2007). This lack of proportionality between the percentage of unexpected answers and the frequency of the respective GPC might be explained by the fact that inside words (the place where the graphemes of interest were placed in the current study), the grapheme “s” more often corresponds to /z/ than to /s/.
The same applies to the grapheme “g”: relative to the control pseudowords, its correspondence to the phoneme /ʒ/ gives rise to far more additional unexpected answers (15 percentage points) than its correspondence to the phoneme /g/ (1 percentage point), while this grapheme is in total associated only 1.2 times more often with the phoneme /g/ than with /ʒ/. This might be explained by the fact that inside words, “g” more often corresponds to /ʒ/ than to /g/. Therefore, it appears that the grapheme-to-phoneme conversions selected by readers depend first on the global GPC rule frequencies and are secondarily moderated by the GPC rule frequencies specific to the position of the graphemes in the words.
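The comparisons made above for “s” and “g” rest on simple differences between target and control error rates. The sketch below recomputes them from the percentages of unexpected answers reported in the Results section. The dictionary layout and scenario labels are illustrative assumptions, and the input values are the rounded percentages given in the text.

```python
# (target %, control %) of unexpected answers, as reported in the Results section.
UNEXPECTED = {
    "e -> /ə/": (87, 5),
    "silent final consonants": (50, 5),
    "s -> /z/": (36, 12),
    "an/on/in -> nasal vowel": (26, 6),
    "g -> /ʒ/": (19, 4),
    "s -> /s/": (20, 12),
    "an/on/in + vowel": (14, 8),
    "two consecutive sequences": (16, 9),
    "e -> /ϵ/": (9, 9),
    "g -> /g/": (9, 8),
}

# Additional unexpected answers relative to controls, in percentage points.
excess = {name: target - control for name, (target, control) in UNEXPECTED.items()}

print(excess["s -> /z/"], excess["s -> /s/"])   # 24 8
print(excess["s -> /z/"] / excess["s -> /s/"])  # 3.0
print(excess["g -> /ʒ/"], excess["g -> /g/"])   # 15 1
```

Expressing the gaps in percentage points makes it clear that the comparison concerns the extra errors attributable to the inconsistent grapheme, not the raw accuracy of each condition.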
Lastly, the pseudowords with final consonants also generated significantly more unexpected answers than their matched pseudowords (50% versus 5%). The final consonants meant to be silent were pronounced in half of the cases, whereas they are silent in this place of the words in more than 95% of cases (Peereman et al., Reference Peereman, Lété and Sprenger-Charolles2007). This result is similar to the 57% of pronounced final consonants found by Perry et al. (Reference Perry, Ziegler and Zorzi2014) on monosyllabic pseudowords and thus extends these results to three-syllable pseudowords. The pseudowords with final consonants that generated the most unexpected answers were, in descending order (see Appendix 1): roudélop (93% of unexpected answers); chépuroid (82% of unexpected answers); vapurit (78% of unexpected answers); léfadis (70% of unexpected answers); julavort (44% of unexpected answers); péfutors (30% of unexpected answers); firédoup (22% of unexpected answers); péluraut (19% of unexpected answers); and nujélard (11% of unexpected answers). So, since the same final consonant can generate very different rates of pronunciation (for example, 93% for the “p” in “roudélop” but 22% for the “p” in “firédoup”), it is not just certain final graphemes that generate more pronunciation than others, but apparently sequences of letters that do (see below).
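The item-level rates listed above can be checked against the overall figure and grouped by final consonant. This is a minimal sketch using only the percentages given in this paragraph. The variable and function names are illustrative assumptions, and the overall mean is computed from rounded item percentages.

```python
# Percentage of unexpected answers (final consonant pronounced) per item,
# as listed in the paragraph above.
ITEM_RATES = {
    "roudélop": 93, "chépuroid": 82, "vapurit": 78, "léfadis": 70,
    "julavort": 44, "péfutors": 30, "firédoup": 22, "péluraut": 19,
    "nujélard": 11,
}

def by_final_letter(item_rates):
    """Group the item rates by the last letter of each pseudoword."""
    groups = {}
    for item, rate in item_rates.items():
        groups.setdefault(item[-1], []).append(rate)
    return groups

overall = sum(ITEM_RATES.values()) / len(ITEM_RATES)
groups = by_final_letter(ITEM_RATES)
spread = {letter: max(rates) - min(rates) for letter, rates in groups.items()}

print(round(overall))  # 50
print(groups)          # {'p': [93, 22], 'd': [82, 11], 't': [78, 44, 19], 's': [70, 30]}
print(spread)          # {'p': 71, 'd': 71, 't': 59, 's': 40}
```

The mean of the nine items is close to the 50% reported for the category, while the within-letter spread (40 to 71 percentage points) shows how much the rates differ between items ending in the same consonant.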
Contrary to our hypothesis, the latencies between the targeted and control pseudowords were not significantly different. This might be explained by the fact that latencies are overall not variable enough among adults reading pseudowords in French. Differences in latencies have been found in word reading (Hogaboam & Perfetti, Reference Hogaboam and Perfetti1978), pseudowords in English (Perfetti & Hogaboam, Reference Perfetti and Hogaboam1975), among young learners (Sprenger-Charolles et al., Reference Sprenger-Charolles, Colé, Lacert and Serniclaes2000), between different languages (Paulesu et al., Reference Paulesu, Démonet, Fazio, McCrory, Chanoine, Brunswick and Frith2001), and between dyslexic and control individuals (Sprenger-Charolles et al., Reference Sprenger-Charolles, Colé, Lacert and Serniclaes2000), but latencies might not be a sensitive measure in a design such as ours. It may also be that the small number of items used per category in the present study did not allow us to achieve sufficient statistical power to capture any differences at this level.
While the main aim of this study was to describe the extent to which certain features of the French orthographic system represent decoding challenges, some theoretical implications can be inferred regarding the nature of the sublexical route. In the literature, although nobody would question the special status of graphemes in alphabetic scripts, a common issue is how and when the graphemic stage takes place during the mapping between letters and phonemes (Chetail, Reference Chetail2020). According to the CDP+ (e.g., Perry et al., Reference Perry, Ziegler and Zorzi2007) and BIA (Diependaele et al., Reference Diependaele, Ziegler and Grainger2010) models, a letter string is first parsed and segmented into grapheme units before any phonological sublexical activation. Conversely, according to the DRC models (e.g., Coltheart et al., Reference Coltheart, Rastle, Perry, Langdon and Ziegler2001), the nonlexical route proceeds letter by letter, not grapheme by grapheme. Letter strings are analyzed serially from left to right, and the system looks for a grapheme-to-phoneme rule to determine the phonemes corresponding to the letters, starting with those furthest to the left. Overall, our results suggest that the mechanisms at work partly depend on the GPC scenarios considered and on their position in the words, and that interindividual differences exist. For example, the adjacent letters “an,” “on,” and “in” generated 26% of unexpected answers when they formed low-cohesive complex graphemes and 14% of unexpected answers when they were two simple consecutive letters, which supports the view that these sequences of letters are a little more likely to be read letter by letter than as a larger (graphemic) unit.
This observation is, however, limited to these sequences of letters since the low percentage of unexpected answers (6%) on pseudowords containing the highly cohesive complex graphemes “ou” or “au” suggests that those are quite spontaneously clustered into graphemes. The results regarding contextual graphemes indicate that in a considerable proportion of cases, they are converted into a phoneme without taking into account the surrounding letters. For example, for the grapheme “s,” the participants read it /s/ rather than /z/ in more than 30% of the cases, that is, without taking into account the surrounding vowels. As developed above, the position in the (pseudo)word may also come into play when it comes to attributing one phoneme rather than another to these contextual graphemes, a position on which the GPCs’ frequency rules precisely depend (Peereman et al., Reference Peereman, Lété and Sprenger-Charolles2007). This assignment of a phoneme without taking into account the surrounding letters was especially true for the grapheme “e,” which was read /ɛ/ instead of /ə/ in more than 80% of the cases. The GPC rule for this grapheme is, however, less systematic than for the graphemes “s,” “c,” and “g.” Indeed, it corresponds to /ɛ/ in most of the cases where it is followed by two consonants and to /ə/ when followed by a single consonant, but there are some exceptions, such as when the grapheme “e” is part of the prefix “re” (semantic value of repetition), as in “reprendre” (/rəprɑ̃dr/, “take again”), or when it is followed by two “s” (dessin, /dɛsɛ̃/, “drawing”; dessus, /dəsy/, “top”). In short, contextual graphemes do not seem to be analyzed automatically in relation to the surrounding letters and are often associated with the phoneme most frequently linked to the isolated letter, depending in part on its position in the word.
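The contrast between a context-free and a contextual reading of the grapheme “s” can be illustrated with a minimal sketch. It is a deliberate simplification for illustration only, not an implementation of the DRC or CDP+ models nor of the procedure used in this study: the function names and the vowel set are assumptions, and the rule covers only the case of a single “s” between two vowels.

```python
# Illustrative assumption: a small set of French vowel letters.
VOWELS = set("aeiouyéèêàâîôû")

def read_s_context_free(word: str) -> list[str]:
    """Assign /s/ to every 's', ignoring the neighbouring letters."""
    return ["s" for ch in word if ch == "s"]

def read_s_contextual(word: str) -> list[str]:
    """Assign /z/ to an 's' standing between two vowels, /s/ otherwise."""
    phonemes = []
    for i, ch in enumerate(word):
        if ch != "s":
            continue
        prev_is_vowel = i > 0 and word[i - 1] in VOWELS
        next_is_vowel = i + 1 < len(word) and word[i + 1] in VOWELS
        phonemes.append("z" if prev_is_vowel and next_is_vowel else "s")
    return phonemes

# Target pseudoword from Appendix 1, with 's' meant to be read /z/.
print(read_s_context_free("nisoulor"))  # context-free reading
print(read_s_contextual("nisoulor"))    # contextual reading
```

Under these assumptions, an unexpected /s/ answer on “nisoulor” corresponds to the output of the context-free function, whereas the expected /z/ answer requires the neighbouring vowels to be inspected.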
The theoretical implications regarding the final consonant are quite ambivalent. On the one hand, the fact that final consonants were pronounced in 50% of the cases supports the idea that they are read as separate letters rather than being attached to their preceding vowel. In other words, the end of “roudélop” seems to be read “l.o.p” rather than “l.op,” as was also suggested by the Perry et al. (Reference Perry, Ziegler and Zorzi2014) study. On the other hand, these letters were not pronounced in 50% of the cases, and, as developed above, the same final consonant generated very different rates of pronunciation (77% for the “t” in “vapurit” but 18% in “péluraut”). This suggests that the preceding letters still come into play when deciding whether or not to pronounce the final consonant. Thus, while our results for low-cohesive multiletter graphemes, contextual graphemes, and final consonants globally show that the isolated letter level is quite important, it also appears that the nature of the sublexical route partly depends on the sequences of letters in question and on interindividual differences.
Some limitations of this study must be considered. First, as mentioned earlier, the number of items per category and the number of participants were rather low, and more significant differences might have emerged with more items and/or more participants providing more statistical power. The significant differences found here under these conditions suggest, however, that the scenarios which gave rise to significantly more unexpected answers compared to their matched pseudowords are less automatized than the scenarios that did not. Second, to match the targeted pseudowords to the control pseudowords, the sequences of letters that differed between them could not always occur at precisely the same frequency when considering the stability and the pronunciation length of the control grapheme (this was especially the case for the categories s meant to be read /z/, e meant to be read /ə/, and the final consonant). However, when the frequencies could not be matched, we introduced as far as possible higher frequencies for the sequences of letters of the target pseudowords, so that any disadvantages in the scenarios of interest would not be attributable to a lower frequency. Frequency does not actually appear to be decisive for the reading performance observed here, since the percentage of unexpected answers was not aligned with it. For example, the frequency of the three letters (“iso”) in the targeted item “nisoulor” was 1601.46 for a mean percentage of expected answers of 44%, while the frequency of the three letters (“ivo”) in the control item “nivoulor” was 48.38 for a mean percentage of expected answers of 78% (see Appendix 1). Moreover, asking participants about their bilingual or mother-tongue status would have provided a better description of the sample and allowed us to take into account any effects of these variables on their decoding performance (Papastefanou et al., Reference Papastefanou, Marinis and Powell2021).
Measuring their reading performance through an additional reading task would also have allowed us to take their general reading level into account. Finally, a few of the created pseudowords should be slightly modified in light of certain unexpected recurring responses. For example, the pseudoword “chépuroid,” whose ending is the same as the very frequent and common word “froid” (“cold”), was sometimes read /ʃepyrɔid/ instead of the expected answer /ʃepyrwa/, possibly under the influence of a word that is much less frequent but of more similar length, “astéroïde” (“asteroid”). In this case, the pronunciation of the final consonant “d” might have been prompted by such an influence.
In sum, this study highlights the fairly large number of inconsistent French graphemes causing decoding difficulties for expert readers, suggesting that the difficulty of decoding French should not be underestimated. Such inconsistent graphemes might also represent particular challenges in the acquisition phase or for individuals facing reading disorders, although this possibility should be verified by further studies. Indeed, because the decoding challenges faced here by expert readers can be linked to dominant grapheme-to-phoneme rules, and beginning and struggling readers are less sensitive to frequency effects (Alegria & Mousty, Reference Alegria and Mousty1996; Mousty & Leybaert, Reference Mousty and Leybaert1999; Sprenger-Charolles et al., Reference Sprenger-Charolles, Siegel, Béchennec and Serniclaes2003), it would be worthwhile to identify the extent to which the various inconsistent grapheme-to-phoneme associations generate difficulties for them. The current results highlight that the grapheme-to-phoneme conversion rules of the inconsistent graphemes studied here are not easy to automate, which suggests that these rules may deserve emphasis during instruction. In particular, the identification of the precise scenarios that generate decoding challenges among struggling readers could be pursued, which could lead to additional knowledge on effective ways of dealing with such targeted difficulties.
Replication package
Replication data and materials for this article can be found at https://osf.io/4a7st/.
Competing interests
The authors declare none.
Appendix 1. Frequencies of the three letters surrounding the grapheme-to-phoneme correspondences (GPCs), mean RTs, and mean percentage of expected answers for each target and control pseudoword

Note: Frequencies of the three letters surrounding the GPCs, mean RTs on expected answers (in ms), and mean percentage of expected answers for each target and control pseudoword are displayed. Frequencies for the last category (“Two in, on, an…”) are not reported since they are only available for up to three adjacent letters (Peereman et al., Reference Peereman, Lété and Sprenger-Charolles2007).