1. Introduction
Natural languages exhibit great diversity at all levels of linguistic organisation. Despite the large variation in grammatical, lexical and phonetic features, certain properties are present in all or most languages. These include fundamental design features like productivity and duality of patterning (Hockett, 1960), universal or near-universal properties like recursion (Chomsky, 1965) and statistical regularities like those concerning constituent order (Greenberg, 1963), word length (Zipf, 1935), and common frequency distributions across word classes (Ramscar, 2021). The last few decades have seen an explosion of research seeking to explain these universal or cross-linguistically frequent patterns as adaptations by languages themselves to constraints on human cognition and communicative interaction (Christiansen & Chater, 2008; Evans & Levinson, 2009).
1.1. Biases in learning shape languages
Cognitive biases, in particular learning biases, have been identified as possible sources of common typological patterns (Culbertson et al., 2012; Hudson Kam & Newport, 2005, 2009). A useful test case for studying the relationship between individual cognition and language structure is linguistic regularisation. Typically, linguistic variation is conditioned on grammatical or social context (Givón, 1985), and speakers have implicit awareness of the constraints that guide variation. Grammatically conditioned variation is often deterministic. For example, the properties of stem-final phonemes fully predict the pronunciation of regular past tense markers in English, as in rocked (the -ed suffix is realised as a voiceless stop), rigged (voiced stop) and vetted (syllabic realisation). Social variation, on the other hand, is often probabilistic in nature: variants have social meaning, such that some variants are more likely to be produced in certain social environments, by certain language users, or by language users adopting a particular style to signal a particular social meaning (Eckert, 2012; Labov, 2006; Shuy et al., 1967). It is often claimed that completely unpredictable or ‘free’ variation is very rare in natural languages.
Although the rules conditioning natural-language variation can be complex, potentially making such variation difficult to acquire, learners (both adults and children) show a bias against unpredictable variation. Artificial language learning paradigms, in which participants attempt to learn experimenter-designed miniature languages, have been very useful in investigating this phenomenon. In a pioneering series of studies, Hudson Kam and Newport (2005, 2009) taught child and adult participants an artificial language featuring variation in markers serving an identical grammatical function (i.e. the variation was not conditioned on any context). Child learners tended to regularise, that is, to eliminate the variation completely by using only one preferred variant (often the most frequent) and dropping the alternative markers. Adults were more likely to retain the variation by matching the frequency of each marker to their input, a behaviour known as probability matching. However, with increased complexity (that is, when there were many markers performing the same function), adults also regularised. Regularisation has been demonstrated at different linguistic levels: in morphology (Fedzechkina et al., 2017; Hudson Kam & Newport, 2005, 2009; Perfors, 2012, 2016), in word order (Culbertson et al., 2012; Fedzechkina et al., 2017; Fehér et al., 2016) and in the lexicon (Ferdinand et al., 2019; Reali & Griffiths, 2009). Saldaña et al. (2021) showed that the strength of regularisation is comparable across these levels; that is, all other things being equal, morphological variation does not regularise more rapidly than syntactic variation.
Regularisation can also be a gradual process occurring across generations: iterated learning experiments (where one learner’s language output is used as input to another learner whose output then becomes input to a third learner in a chain of transmission) have shown that small initial biases for regularisation can be amplified by successive generations of learners to eventually lead to fully regular systems (Reali & Griffiths, 2009). Smith and Wonnacott (2010) used an iterated learning paradigm to show that systems of conditioned variation can also emerge gradually. In an experiment involving an artificial language featuring variation in plural markers, over multiple generations of learning and reproduction, variation was maintained (both plural markers persisted in the language) but became conditioned on the linguistic context, producing a pattern of variation analogous to conditioning in natural languages. Specifically, initially unpredictable plural markers gradually became associated with particular nouns. This lexical conditioning of variation provides a means by which languages may become more predictable without losing variation.
Other studies have used artificial language learning paradigms to look at how learners respond to the presence of conditioned variation in their input and probed in more detail the kinds of conditioned variation that can be spontaneously introduced in learning. Several studies (e.g. Hudson Kam & Newport, 2009; Wonnacott, 2011) showed that adult participants are able to accurately learn even quite complex systems of lexically conditioned variation. Samara et al. (2017) showed that both children and adults can acquire socially conditioned variation (determined by the gender of the speaker), although children seem to struggle to reproduce partially conditioned variation (probabilistic rather than deterministic conditioning of variation). Another type of conditioning found in natural languages is category-based conditioning, where variation is conditioned on more abstract semantic properties of words, such as noun animacy (e.g. in Bantu and Basque) or gender (e.g. in French and German).
Category-based conditioned variation is particularly interesting because it potentially provides a simple, predictable and presumably relatively easily learnable system of conditioning, especially when contrasted with complex lexically conditioned patterns of variation. In artificial language studies, adults and older children can successfully learn semantically conditioned variation (Brown et al., 2022; Ferman & Karni, 2010), although again, younger children do not perform as well when the conditioning is probabilistic rather than consistent (Brown et al., 2022; Hudson Kam, 2015; Schwab et al., 2018). Surprisingly, while the tendency of adult participants to spontaneously introduce lexical conditioning is well documented, studies looking at semantic conditioning find relatively little tendency to introduce it. In a study by Brown et al. (2022), only some participants introduced semantic conditioning spontaneously, only early in learning, and only when the semantic cues were particularly salient (i.e. when they encountered many nouns in each semantic category, making lexical identity less prominent). It therefore seems that a category-conditioned system of variation is unlikely to be produced by learning alone (at least in a single step). Our aim in the current study is to find out whether interaction can facilitate the emergence of this kind of conditioned variation.
1.2. Biases in interaction shape languages
So far, we have focused on studies that involve individual learners learning a language in isolation. While such studies have provided several important advances in our understanding of the causes of typological regularities, they cannot provide the full picture. Since languages are learned and transmitted in a communicative context, interaction must play a fundamental role in mediating between individual cognition and language structure. Language is produced in communication, and children learn from observations of linguistic data produced during interaction to serve in-the-moment communicative goals. Therefore, the way language changes over long timescales must reflect biases in both individual and communicative mechanisms.
Artificial language learning paradigms have also been used to explore biases in communicative scenarios: participants undergo an initial training stage on a target language, then play a communication game using that artificial language, alternating between producing and interpreting descriptions in the language. During this interaction, participants can change artificial languages in ways that reflect their cognitive and communicative biases (Fedzechkina & Jaeger, 2020; Fehér et al., 2016, 2019; Kanwal et al., 2017; Kirby et al., 2015), so these paradigms allow us to study the effects of both learning and interaction. In some cases, dyadic interaction has been shown to produce linguistic systems similar to those that emerge more gradually in multi-learner iterated learning chains. For instance, Winters et al. (2018) and Raviv et al. (2019) showed that compositional languages, previously shown to evolve slowly through inter-generational transmission (Kirby et al., 2008, 2014), can evolve quickly through interaction.
Referential communication tasks in artificial languages also provide tools to investigate interactive mechanisms underlying long-term changes in linguistic behaviour. One such mechanism is priming. During linguistic interaction, interlocutors tend to match each other’s linguistic choices via a process known as alignment or accommodation (Coupland, 2010; Pickering & Garrod, 2004). Priming, the low-level mechanism responsible for alignment, is the repetition of a partner’s previously used linguistic variant (e.g. a word, grammatical construction or pronunciation). Linguistic alignment is an integral aspect of social communication: priming has been demonstrated in natural dialogue (Levelt & Kelter, 1982; Schenkein, 1980; Weiner & Labov, 1983), in corpora (Gries, 2005) and in experiments (e.g. Branigan et al., 2000, 2007). Priming occurs at different levels of linguistic representation: phonetic (e.g. Giles et al., 1991), lexical (e.g. Brennan, 1996; Garrod & Anderson, 1987), semantic (e.g. Garrod & Anderson, 1987; Garrod & Clark, 1993) and structural (e.g. Bock, 1986; Gries, 2005).
Priming has been found to enhance communicative success (Pickering & Garrod, 2006), and it is influenced by social factors: people show more behavioural and linguistic alignment when they perceive their interlocutors favourably, while divergence from an interlocutor can be a sign of disaffiliation (Balcetis & Dale, 2005; Bourhis et al., 1979; Doise et al., 1976; Giles et al., 1973; Giles & Powesland, 1975).
Connecting the literature on priming to the regularisation of variation, interaction has previously been shown to lead to the regularisation of unpredictable variation (Fehér et al., 2016). The effect is especially strong when people with variable language interact with regular users of the language (Fehér et al., 2019). Fehér et al. (2016) contrasted several interactive scenarios to tease apart the interactive mechanisms involved in this regularisation process (in that case, regularisation of variable word order). In the Singles condition, participants interacted with a computer partner who produced both possible word orders with equal frequency; in the Dyad condition, a pair of human participants interacted with each other; and in the Pseudodyad condition, participants were told that they would interact with another participant (as in the Dyad condition), but in fact had a computer partner (as in the Singles condition). Regularisation occurred in Dyads and Pseudodyads but not in Singles, where priming by the variable partner prevented participants from regularising their language. Interestingly, Pseudodyads diverged from their partners’ variable use, possibly due to higher-order communicative intentions based on their belief that a partner should prefer a regular language.
1.3. The current study
In the study we report here, we used these same three interactive conditions and the same measures (of regularisation, conditioning and priming) as in Fehér et al. (2016) to explore the interactive mechanisms that may contribute to category-based semantic conditioning. We trained participants on a miniature artificial language, which provided descriptions for scenes featuring objects drawn from one or two semantic categories (animals and/or vehicles). The training language had variable and unpredictable plural marking, providing two equivalent means to indicate plurality. After participants were trained on the target language, they were asked to recall it by producing descriptions for images, giving us a pre-interaction snapshot of how they marked plurals. After this recall phase, they used the language to play a communication game over networked computers with another participant or a computer partner, allowing us to see how their plural marking changed in interaction.
Based on our earlier findings (Fehér et al., 2016, 2019), we expected that learners would reduce the unpredictable variation present in their input language during interaction, particularly in the Dyad condition, where shared regularisation biases would reinforce each other. If higher-level communicative intentions (i.e. a desire to make oneself more predictable for a human interlocutor) play a role in this process, participants in the Pseudodyad condition should regularise to the same extent. Participants in the Singles condition, interacting with a variable computer partner, would not regularise to the same extent, owing to their partner’s use of both markers and the absence of any pressure to behave predictably for a human partner. Contrasting different forms of interaction (Singles, Dyads and Pseudodyads) therefore gives us insights into the interactive processes that we expect to lead to a reduction in unpredictable variation during interaction.
Our design also allows us to study the inherent tension between the elimination of variation (regularisation) and the conditioning of variation. Our main manipulation here is to provide objects from either one or two semantic categories, which creates an important distinction: when all objects belong to one semantic category (one-category condition), predictable marker use can only be achieved by eliminating variation entirely (that is, using a single marker exclusively) or by conditioning marker usage lexically (that is, consistently using a particular marker with each noun). In contrast, in the two-category condition, conditioning on semantic category (that is, consistently using a particular marker with each category of noun, animals or vehicles) becomes possible. Our main prediction here was that the presence of two semantic categories would lead to the retention of variation by allowing users to condition the use of plural markers on semantic category. There are two main reasons for this hypothesis. One is that interaction may amplify the weak individual tendencies for semantic conditioning seen in Brown et al. (2022). The other is that conditioning the language on semantic category would provide interlocutors with the simplest possible shared linguistic system. This hypothesis follows Freyd’s shareability account (Freyd, 1983). Freyd argued that interlocutors, needing to align, take advantage of the similarity between features of items (such as semantic category) to establish communicative conventions. This may be desirable because it would help participants achieve a predictable, unambiguous language that could facilitate quick, efficient exchanges, helping them succeed in the communication game.
2. Methods
2.1. Participants
We recruited 124 participants from the student population at the University of Edinburgh. Participants attended our lab in person and were randomly assigned to one of three interactive conditions (see below): Singles (33 participants), Dyads (64 participants) and Pseudodyads (27 participants).
2.2. Stimuli and structure of the input language
We designed a semi-artificial language with English nouns and novel verbs and plural markers. We used English nouns to facilitate learning, because we needed to present a large number of nouns in our study, and participants struggle to memorise more than a few new words in an experimental session. Moreover, we were interested not in the acquisition of new words but in how markers are used in different semantic contexts. As visual stimuli, we used cartoon images drawn from two semantic categories, animals and vehicles. We manipulated the number of semantic categories presented to participants: in the one-category condition, each participant/pair was assigned to a given semantic class (either animals or vehicles) and encountered all 16 objects comprising that category (for animals: cow, dog, elephant, fox, giraffe, hamster, hedgehog, hippo, kangaroo, panda, pig, rabbit, sheep, squirrel, tiger, zebra; for vehicles: ambulance, bike, boat, bus, car, digger, submarine, helicopter, plane, rocket, scooter, tank, tractor, train, truck, van). For participants/pairs in the two-category condition, we selected 8 animals and 8 vehicles at random from these lists, yielding a set of 16 distinct objects. For each participant/pair we selected 2 plural markers at random from a list of 8 (bup, dak, jeb, kem, pag, tid, wib, yav) and one verb at random from a list of 5 possibilities (glim, norg, frab, gund, shen), which was used to indicate motion. We originally had two verbs referring to two distinct motions, and fewer nouns per category, but after piloting we increased the number of nouns and reduced the number of verbs to one, to reduce the number of training trials required to learn the language. During training, for images featuring two animals/vehicles, participants saw labels consisting of a verb, a noun and one of the two plural markers.
Both plural markers occurred equally frequently during training (both in terms of overall frequency and in terms of their frequency of occurrence with each noun). Figure 1 shows two of the images presented to participants with the corresponding training labels.
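The per-participant (or per-pair) randomisation described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes Python's standard `random` module, and the helper name `make_language` and its dictionary return format are hypothetical. The word lists are copied from the text.

```python
import random

# Word lists as given in Section 2.2.
ANIMALS = ["cow", "dog", "elephant", "fox", "giraffe", "hamster", "hedgehog", "hippo",
           "kangaroo", "panda", "pig", "rabbit", "sheep", "squirrel", "tiger", "zebra"]
VEHICLES = ["ambulance", "bike", "boat", "bus", "car", "digger", "submarine", "helicopter",
            "plane", "rocket", "scooter", "tank", "tractor", "train", "truck", "van"]
MARKERS = ["bup", "dak", "jeb", "kem", "pag", "tid", "wib", "yav"]
VERBS = ["glim", "norg", "frab", "gund", "shen"]

def make_language(n_categories, rng):
    """Sample one participant/pair's lexicon (hypothetical helper)."""
    if n_categories == 1:
        # One-category condition: all 16 objects from one randomly chosen category.
        nouns = list(rng.choice([ANIMALS, VEHICLES]))
    elif n_categories == 2:
        # Two-category condition: 8 animals and 8 vehicles drawn at random.
        nouns = rng.sample(ANIMALS, 8) + rng.sample(VEHICLES, 8)
    else:
        raise ValueError("n_categories must be 1 or 2")
    return {
        "nouns": nouns,
        "markers": rng.sample(MARKERS, 2),  # two distinct plural markers
        "verb": rng.choice(VERBS),          # a single motion verb
    }

lang = make_language(2, random.Random(42))
print(lang["markers"], lang["verb"])
```

Sampling without replacement (`rng.sample`) guarantees that the two plural markers differ and that the 16 objects in the two-category condition are distinct.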

Figure 1. Example visual and linguistic stimuli. We presented images of vehicles and/or animals either singly or in pairs, along with their descriptions in the semi-artificial language. The word ‘frab’ in the example describes motion (indicated by the straight arrows in the visual stimuli). We used English nouns followed by either no marker for singular images or one of two randomly selected plural markers for plural images.
2.3. Experimental procedure
The experimental procedure involved five phases. Critically, the first four phases were identical across conditions; only the final phase (Phase 5, Interaction) differed between conditions.
Phase 1, Noun training: participants were shown each object along with its corresponding English noun. We presented each object twice, totalling 32 trials. This phase familiarised participants with the images and the corresponding nouns to be used in the experiment.
Phase 2, Noun testing: participants were shown each object in random order and were asked to type in the corresponding noun. Each noun was presented twice, yielding a total of 32 trials.
Phase 3, Sentence training: participants were presented with scenes (one or two objects of the same type, plus an arrow indicating motion) and their descriptions. This phase consisted of 96 trials, organised in 2 blocks of 48 trials. Each block featured 16 trials presenting singulars (each object presented once in the singular) and 32 trials presenting plurals (each object presented once with each of the two plural markers). The order of presentation was randomised within each block.
Phase 4, Recall: participants were presented with scenes and were asked to produce their full descriptions by typing. Participants completed 96 recall trials, organised into 2 blocks of 48 trials featuring production of 16 singulars and 32 plurals, as in Phase 3, meaning that participants ultimately produced descriptions for each noun in its plural form 4 times.
Phase 5, Interaction: all participants played a communication game in which they took turns describing objects for their partner (sender role) and selecting objects based on their partner’s descriptions (receiver role). When sending a message, participants were presented with a scene (singular or plural) and were prompted to type a description so that their partner could identify it. When acting as receiver, the participant received the label their partner produced and was prompted to select the appropriate picture from an array of eight images: the correct image and seven distractors. The distractors consisted of the target object with the incorrect number (that is, if the target picture showed a single object, this distractor showed the same object in the plural, and vice versa) and three other objects, each shown in both singular and plural form. Both participants then received feedback indicating whether the receiver had correctly selected the image the sender was describing. Participants completed 72 such communication trials split into two blocks of 36 trials. In each block, they acted as sender three times (once for the singular and twice for the plural) for each of 12 randomly selected nouns (out of the 16 nouns in their training set; we tested on a subset to reduce experiment duration and participant fatigue). Therefore, participants ultimately produced descriptions for each of the 12 nouns in its plural form 4 times. The order of images was randomised within a block, and participants were equally likely to send or receive on the first trial, with the roles alternating for the remainder of the block.
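As a rough illustration of the trial structure in Phases 3 to 5, the sketch below builds one sentence-training block, a sender schedule for the interaction phase and an eight-image receiver array. It is a minimal sketch under stated assumptions: images are represented as hypothetical `(noun, is_plural)` tuples, the helper names are illustrative only, and the same 12-noun subset is assumed to be used in both interaction blocks (as implied by the four plural productions per noun).

```python
import random

def training_block(nouns, markers, rng):
    """One Phase 3 block: each noun once in the singular and once with each plural marker."""
    trials = [(noun, None) for noun in nouns]                           # 16 singulars, no marker
    trials += [(noun, marker) for noun in nouns for marker in markers]  # 32 plurals
    rng.shuffle(trials)  # order randomised within the block
    return trials

def sender_schedule(nouns, rng, n_nouns=12, n_blocks=2):
    """Phase 5 sender trials: per block, each selected noun once singular and twice plural."""
    chosen = rng.sample(nouns, n_nouns)  # assumption: same subset in every block
    blocks = []
    for _ in range(n_blocks):
        trials = [(noun, plural) for noun in chosen for plural in (False, True, True)]
        rng.shuffle(trials)
        blocks.append(trials)
    return blocks

def receiver_array(target_noun, target_plural, nouns, rng):
    """Eight images: target, target with the wrong number, 3 other objects in both numbers."""
    others = rng.sample([n for n in nouns if n != target_noun], 3)
    array = [(target_noun, target_plural), (target_noun, not target_plural)]
    for noun in others:
        array += [(noun, False), (noun, True)]
    rng.shuffle(array)
    return array

rng = random.Random(1)
nouns = [f"noun{i}" for i in range(16)]
block = training_block(nouns, ["bup", "dak"], rng)
schedule = sender_schedule(nouns, rng)
array = receiver_array("noun0", True, nouns, rng)
print(len(block), [len(b) for b in schedule], len(array))  # 48 [36, 36] 8
```

The recall phase (Phase 4) shares the 16-singular, 32-plural structure of `training_block`, except that the participant, rather than the schedule, chooses the marker.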
2.4. Experimental conditions
The three conditions differed only in the interactive testing phase. Participants in the Single condition interacted with a computer partner whose marker use (when acting as sender during interaction) was identical to the training input, using the two plural markers equally often for each noun. When acting as receiver, this computer partner evaluated the trained descriptions for all the images in its receiver array and simply selected the image whose description was closest (according to Levenshtein distance) to the description provided by the (human) sender. If two or more descriptions were equally close, the computer receiver chose randomly among them. Participants in the Single condition knew they were interacting with a computer partner. In the Dyad condition, participants interacted over a computer network with another participant seated in another booth. Participants in the Dyad condition arrived in the lab in pairs, were briefed simultaneously, were informed that they would be doing the interactive task with the other participant, and their training was synchronised such that they reached the interactive phase at roughly the same time. Participants in the Pseudodyad condition also arrived in the lab in pairs and were told that they would be interacting with the other participant (instructions were identical to those in the Dyad condition), but unbeknownst to them, they performed the interactive game with a computer agent who behaved in the same way as the computer partner in the Single condition (that is, marker use identical to the training input, distance-based evaluation when acting as receiver). In other words, Pseudodyads believed they were in the Dyad condition, but their actual experience of their partner’s language use was identical to that of participants in the Single condition.
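The computer partner's receiver behaviour can be sketched as below. This is a minimal sketch, not the authors' implementation: it assumes a standard character-level Levenshtein distance and, because a plural image has two trained descriptions (one per marker), it scores each image by its closest trained description. That scoring rule, the function names and the example description strings (including their word order) are assumptions made for illustration.

```python
import random

def levenshtein(a, b):
    """Edit distance where insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution (free if equal)
        prev = curr
    return prev[-1]

def computer_receiver(message, array_descriptions, rng):
    """Pick the image whose trained description is closest to the sender's message.

    array_descriptions maps each image id to the descriptions the computer was
    trained on for that image (assumed: one per marker for plural images).
    """
    scores = {image: min(levenshtein(message, d) for d in descriptions)
              for image, descriptions in array_descriptions.items()}
    best = min(scores.values())
    candidates = [image for image, score in scores.items() if score == best]
    return rng.choice(candidates)  # random choice among equally close images

array = {
    "dog_plural": ["frab dog bup", "frab dog dak"],
    "dog_single": ["frab dog"],
    "cow_plural": ["frab cow bup", "frab cow dak"],
}
print(computer_receiver("frab dog dak", array, random.Random(0)))  # dog_plural
```

Because the computer matches on surface strings rather than on an interpretation of the message, a human sender who regularises to one marker is still understood, since the closest trained description for the target image contains that marker.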
To make the interaction more realistic and plausible for the participants in the Pseudodyad condition, we built in a delay of 7–11 seconds when the partner was producing a description, and 3–8 seconds when the computer was the receiver. We added such delays to all phases of the experiment, so that participants in both the Pseudodyad and the Dyad conditions could find themselves waiting for their partner to catch up. Based on post-experiment debriefing, we have no reason to believe that participants in the Pseudodyad condition ever suspected they were not actually interacting with a human partner.
2.5. Data analysis
We analysed participant productions in both the recall and interaction phases. Our analyses concern participants’ use of the two marker words: do they regularise (that is, use only one marker word), do they condition marker use (e.g. on the semantic category of the noun they are marking) or do they behave variably and are they primed by their partner’s marker choices?
In all three conditions and for both testing phases (interaction and recall), we tested for (1) regularisation, which would be evidenced by an overuse of one of the markers (an output proportion significantly different from the input proportion); (2) conditioning of marker use on semantic category (measured by mutual information between marker choice and semantic class); (3) conditioning of marker use on lexical item (measured by mutual information between marker choice and noun); and (4) priming (a higher likelihood of using a particular marker when that marker was used by the partner on the previous trial).
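Measures (2) to (4) can be made concrete with a short sketch. The text does not specify the mutual information estimator, so the plug-in estimate in bits below is an assumption: passing `(marker, category)` pairs gives measure (2) and `(marker, noun)` pairs gives measure (3). With two markers, the maximum is 1 bit, reached when the context fully predicts the marker. The priming function is only a descriptive stand-in (the study itself tests priming with logistic mixed-effects models); `priming_effect` and its input format are hypothetical.

```python
from collections import Counter
from math import log2

def mutual_information(pairs):
    """Plug-in estimate of I(X; Y) in bits from a list of (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    # I(X;Y) = sum over (x, y) of p(x,y) * log2(p(x,y) / (p(x) * p(y)))
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in joint.items())

def priming_effect(trials, marker):
    """P(own == marker | partner just used marker) - P(own == marker | partner used the other).

    trials: (partner_previous_marker, own_marker) pairs for plural productions.
    A positive value means the participant tends to repeat the partner's marker.
    """
    after_same = [own == marker for prev, own in trials if prev == marker]
    after_other = [own == marker for prev, own in trials if prev != marker]
    if not after_same or not after_other:
        raise ValueError("need trials following each of the partner's markers")
    return sum(after_same) / len(after_same) - sum(after_other) / len(after_other)

# Markers perfectly predicted by semantic category: 1 bit of mutual information.
by_category = [("bup", "animal"), ("dak", "vehicle")] * 8
print(mutual_information(by_category))  # 1.0
```

Note that a fully regular participant (one marker only) scores 0 bits on both conditioning measures, so mutual information separates conditioning from regularisation.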
For our analyses of regularisation and priming, we used logistic mixed-effects models run in R (R Core Team, 2023) using the glmer function of the lme4 package (Bates, 2015). We used the performance package (Lüdecke, 2021) to check for adherence to regression assumptions. For all models reported here, unless otherwise specified, the random-effects structure is maximal and consists of by-participant and by-pair random intercepts and random slopes for fixed effects that varied within pairs (that is, experiment phase); the random effect of participant was nested within the pair random effect, capturing the non-independence of pairs of participants in the Dyad condition.
For our analyses of mutual information, linear regression was not suitable because the data were heteroscedastic. We therefore followed Keogh et al. (2024) in using a permutation-based approach to evaluating statistical significance: for each comparison, a distribution of expected values under the null hypothesis was obtained by permuting the data 10,000 times (e.g. the permutation to assess effects of condition involves randomly re-assigning participants across conditions to remove any dependence in the data between condition and mutual information), and the observed value was then compared to this null distribution. We obtained a z-score for the observed value (higher z-scores indicate values that are unlikely under the null hypothesis) and a p-value given by the proportion of permuted data sets yielding a value as extreme as or more extreme than the observed value. Note that since dyads contain two individuals but participants in the Single and Pseudodyad conditions participate individually, prior to permutation we replaced each pair of participants in a dyad with a single value obtained by averaging the values from the two participants. An alternative approach involving down-sampling (selecting one participant at random per dyad) produces similar results.
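The permutation procedure can be sketched for a single two-group comparison. This is a minimal sketch under stated assumptions: the test statistic (difference in group means), the two-sided counting rule and the function names are illustrative choices, since the text does not state the exact statistic used; the dyad averaging step follows the description above, and the example values are made up.

```python
import random
from statistics import fmean, stdev

def collapse_dyads(records):
    """Average the two members of each dyad so every unit contributes one value.

    records: (unit_id, condition, value) tuples; dyad partners share a unit_id.
    """
    units = {}
    for unit_id, condition, value in records:
        units.setdefault(unit_id, (condition, []))[1].append(value)
    return [(condition, fmean(values)) for condition, values in units.values()]

def permutation_test(units, group_a, group_b, n_perm=10_000, seed=0):
    """Compare mean(group_a) - mean(group_b) against a label-permutation null."""
    labels = [condition for condition, _ in units]
    values = [value for _, value in units]

    def statistic(lbls):
        a = [v for l, v in zip(lbls, values) if l == group_a]
        b = [v for l, v in zip(lbls, values) if l == group_b]
        return fmean(a) - fmean(b)

    observed = statistic(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # randomly re-assign units to conditions
        null.append(statistic(shuffled))
    z = (observed - fmean(null)) / stdev(null)
    # two-sided p: proportion of permuted statistics at least as extreme as observed
    p = sum(abs(x) >= abs(observed) for x in null) / n_perm
    return observed, z, p

records = [
    ("pair1", "Dyad", 0.9), ("pair1", "Dyad", 0.7), ("pair2", "Dyad", 0.8), ("pair2", "Dyad", 0.8),
    ("pair3", "Dyad", 0.95), ("pair3", "Dyad", 0.85), ("pair4", "Dyad", 0.7), ("pair4", "Dyad", 0.9),
    ("pair5", "Dyad", 1.0), ("pair5", "Dyad", 0.8),
    ("s1", "Single", 0.1), ("s2", "Single", 0.2), ("s3", "Single", 0.15),
    ("s4", "Single", 0.05), ("s5", "Single", 0.1),
]
units = collapse_dyads(records)
print(permutation_test(units, "Dyad", "Single"))
```

Collapsing dyads before shuffling keeps each permuted data set at one value per independent unit, which is what makes the label exchange valid under the null hypothesis.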
We identified and excluded trials where an ungrammatical description was produced (e.g. a plural marker omitted for scenes involving a plural, the verb or noun omitted); this resulted in the removal of roughly 1% of trials. Data and analysis code are available at https://osf.io/4axu3/?view_only=f5b4ebe0a4de49b8a2097f4774168277. Plots were produced in ggplot2 (Wickham, 2009).
3. Results
3.1. Regularisation
Although each participant was trained on two equally frequent plural markers, for plotting and analysis purposes, we identified the marker a participant used more frequently (independently for the recall and interaction phases) and classified that as the majority marker for that participant at that phase. For the very small number of participants who produced both markers with exactly equal frequency, we picked one of the markers randomly as the majority marker. Full regularisation means the complete elimination of one marker and would be indicated by exclusive usage of the majority marker (proportion = 1). A deviation from equal usage of the two markers (that is, proportions greater than 0.5) indicates regularisation, with larger departures from 0.5 indicating greater regularisation.
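The majority-marker measure can be sketched in a few lines. This is an illustration under stated assumptions: `majority_marker` is a hypothetical helper operating on the list of plural markers one participant produced in one phase, with exact ties broken at random as described above. Because the majority marker is by definition the more frequent one, the resulting proportion lies between 0.5 and 1.

```python
import random
from collections import Counter

def majority_marker(markers, rng):
    """Return (majority marker, its proportion) for one participant in one phase.

    markers: the plural markers the participant produced; exact ties are broken at random.
    """
    counts = Counter(markers)
    top = max(counts.values())
    tied = sorted(m for m, c in counts.items() if c == top)
    choice = rng.choice(tied)
    return choice, counts[choice] / len(markers)

print(majority_marker(["bup", "bup", "bup", "dak"], random.Random(0)))  # ('bup', 0.75)
```

Sorting the tied markers before the random choice makes the result reproducible for a given seed, independent of the order in which markers were produced.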
Figure 2 shows the proportion of majority marker use in participants’ productions in the recall and interaction phases. We analysed these data using mixed-effects logistic regression, with each production classified according to whether the participant used their majority marker or not. This model included fixed effects of experiment phase (sum coded, such that positive coefficient estimates indicate more use of the majority marker in the interaction than in the recall phase), number of categories (sum coded; positive coefficients indicate more use of the majority marker in the two-category condition) and experimental condition (Single, Pseudodyad or Dyad; we used Helmert coding for this three-level factor, such that the first predictor compares the Single and Pseudodyad conditions, and the second predictor compares the combined Single and Pseudodyad conditions to the Dyad condition). This combination of predictor coding means that the model intercept represents the grand mean over all manipulations and phases, and the coefficient for each fixed effect estimates a main effect (that is, the effect of that predictor collapsing over all levels of the other fixed effects). The random effects consisted of by-participant/pair random intercepts and slopes for experiment phase.
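The contrast scheme described above can be written out explicitly. This sketch assumes R's default scalings (`contr.helmert(3)` for condition, ±1 sum coding for the two-level factors); the text does not state whether ±1 or ±0.5 was used, and the predictor names are hypothetical. Interaction predictors are formed as products of the main-effect codes; one example is shown.

```python
# Helmert contrasts for the three-level condition factor (the pattern of R's contr.helmert(3)).
HELMERT = {
    "Single": (-1, -1),
    "Pseudodyad": (1, -1),
    "Dyad": (0, 2),
}
# Two-level factors, sum coded with the level named in the text coded positive.
PHASE = {"recall": -1, "interaction": 1}
CATEGORIES = {"one": -1, "two": 1}

def design_row(condition, phase, n_categories):
    """Fixed-effect predictors for one production (hypothetical column names)."""
    single_vs_pseudo, rest_vs_dyad = HELMERT[condition]
    return {
        "intercept": 1,
        "single_vs_pseudodyad": single_vs_pseudo,
        "singles_pseudodyads_vs_dyad": rest_vs_dyad,
        "phase": PHASE[phase],
        "categories": CATEGORIES[n_categories],
        # interaction predictors are products of the main-effect codes
        "phase:singles_pseudodyads_vs_dyad": PHASE[phase] * rest_vs_dyad,
    }

# Each contrast column sums to zero and the two columns are orthogonal,
# which is why the intercept corresponds to the grand mean (on the log-odds scale).
col1 = [c[0] for c in HELMERT.values()]
col2 = [c[1] for c in HELMERT.values()]
print(sum(col1), sum(col2), sum(a * b for a, b in zip(col1, col2)))  # 0 0 0
```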

Figure 2. Proportion of majority marker produced by participants during recall and interaction. Participants in Dyads increase their use of their majority marker in interaction, particularly in the One Category condition. Coloured points give means for individual participants, black diamonds plus error bars indicate group means plus bootstrapped 95% CIs.
As Figure 2 suggests, most participants produced a mix of both markers, but more participants regularised fully (that is, produced only the majority marker) in the One Category Dyads condition during the interaction phase. The model shows that participants regularised more (that is, produced their majority marker more often) during interaction than in recall (as indicated by a significant effect of phase: b = 0.17, SE = 0.04, p < .001), and Dyads regularised more often than Singles/Pseudodyads (b = 0.18, SE = 0.04, p < .001). This difference between Dyads and the other conditions was also indicated by a condition x phase interaction effect in our model (b = 0.10, SE = 0.03, p < .001), but the difference was attenuated in Dyads in the Two Category condition, who regularised less in the interaction phase than One Category Dyads (as indicated by a condition x phase x number of categories effect: b = −0.06, SE = 0.03, p = .020).
We conducted follow-up analyses to explore whether the regularising effect of interaction was limited to Dyads or happened in all three conditions and was simply more pronounced in Dyads. This was achieved by re-running the same model using treatment coding of the Single/Pseudodyad/Dyad factor, taking each condition in turn as the reference level. This analysis found no evidence that participants in the Single and Pseudodyad conditions were more regular in the interaction phase than in recall (that is, the effect of phase for these conditions is n.s.: Single, b = 0.06, SE = 0.07, p = .916; Pseudodyads, b = 0.08, SE = 0.08, p = .311), but the effect of phase was significant for Dyads (b = 0.36, SE = 0.06, p < .001). We also verified that, even though Two Category Dyads regularised less than One Category Dyads during the interaction phase, they slightly increased their use of the majority marker (evaluated using treatment coding, this time of condition and experimental phase: for Two Category Dyads, the effect of experiment phase is marginal: b = 0.17, SE = 0.09, p = .055). Finally, we observed that participants in the Dyad condition tended to align with their partner in which marker was their majority marker: only 7 of 32 pairs had participants who used different majority markers, and even in those cases one partner used the other partner’s preferred marker on average 38% of the time (that is, it was only narrowly not the majority marker for both participants).
3.2 Conditioning of variation
The previous analyses show that Dyads regularised during interaction, and that One Category Dyads regularised more than Two Category Dyads, as indicated by the condition x phase interaction and several cases of full regularisation during interaction in that condition. One possible explanation for why Two Category Dyads are more likely to retain variability in their marker use in interaction is that they condition that variation on the semantic category of the objects, tending to use one marker to mark plurality for animals and the other marker for vehicles. We can evaluate this by measuring the mutual information of object category and a participant’s plural marker choice: mutual information (in bits) is 1 if a participant perfectly conditions their marker choice on object category (using one marker exclusively for animals and the other marker exclusively for vehicles), and mutual information is 0 if a participant uses the markers with the same frequency in both categories (e.g. using marker 1 on 0%, 50%, 70% or 100% of trials in both categories).
Mutual information is given by:
$$\mathrm{MI} = H(M) - H(M \mid C)$$
where H(M) is the entropy (in bits) of a participant’s marker distribution and H(M|C) is the conditional entropy of a participant’s marker use given the category of the object. H(M|C) = 0 if a participant only uses a single marker or uses a separate marker for each category of referent; H(M|C) = 1 if a participant uses both markers equally frequently with both categories. MI is therefore 1 only if the participant is highly variable overall (H(M) = 1) and their marker choice is predictable based on the category they are marking (H(M|C) = 0). We can also use MI to evaluate lexical (that is, noun-based) conditioning, by calculating the MI between markers and specific nouns (rather than noun categories) in a participant’s productions.
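The measure can be computed directly from a participant’s productions. The sketch below is a minimal illustration, assuming productions are stored as parallel lists of marker labels and context labels (categories for category-based MI, or nouns for lexical MI); the function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of the empirical distribution of `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(markers, contexts):
    """H(M|C): marker entropy within each context, weighted by context frequency."""
    n = len(markers)
    by_context = {}
    for m, c in zip(markers, contexts):
        by_context.setdefault(c, []).append(m)
    return sum(len(ms) / n * entropy(ms) for ms in by_context.values())

def mutual_information(markers, contexts):
    """MI = H(M) - H(M|C), in bits."""
    return entropy(markers) - conditional_entropy(markers, contexts)

cats = ["animal"] * 8 + ["vehicle"] * 8
print(mutual_information(["A"] * 8 + ["B"] * 8, cats))  # 1.0: perfect category conditioning
print(mutual_information(["A", "B"] * 8, cats))         # 0.0: same 50/50 split in both categories
```

Passing nouns rather than categories as `contexts` gives the lexical MI.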
We use Monte Carlo techniques to establish, for each participant, whether a given level of MI associated with a particular distribution of plural marking is likely to arise by a chance assignment of markers to categories/nouns, or whether that level of conditioning represents non-random alignment of markers and categories/nouns. For the set of markers produced by each participant in recall or interaction, we generate 1,000 random permutations which use the same proportion of the various markers but randomly shuffle the assignment of markers to categories/nouns; we then measure the (category-based or lexical) MI of those randomisations and compare the resulting distribution to the MI of the participant’s actual output. A participant’s output is classified as significantly non-random if it has higher MI than 95% of the randomisations, yielding a one-tailed test at a threshold of p = .05. Since category-based and lexical conditioning are related (e.g. an assignment of markers which yields category-based MI of 1 will also have lexical MI of 1), we control for lexical conditioning when assessing the significance of category-based conditioning, and vice versa. This is straightforward for assessing lexical conditioning: we simply randomise the assignment of markers to nouns within each category, yielding a permutation with the same category-based MI but potentially different lexical MI. To assess category-based conditioning while controlling for lexical conditioning, we shuffle the assignment of nouns to categories (e.g. we might switch the categories of elephant and car, treating elephant as a member of the vehicle category and car as a member of the animal category), yielding a randomisation where lexical MI is unchanged but category-based MI may differ. Similar information-theoretic measures of conditioning and Monte Carlo statistical techniques are introduced in Smith and Wonnacott (Reference Smith and Wonnacott2010) and used in e.g. Samara et al. (Reference Samara, Smith, Brown and Wonnacott2017).
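The two randomisation schemes can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the authors' code: productions are parallel lists of markers and nouns, `category_of` maps each noun to its category, the function names and toy data are hypothetical, and a small tolerance guards the "strictly higher MI" comparison against floating-point ties.

```python
import math
import random
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (bits) of the empirical distribution of `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def mutual_information(markers, contexts):
    """MI = H(M) - H(M|C), in bits."""
    n = len(markers)
    groups = defaultdict(list)
    for m, c in zip(markers, contexts):
        groups[c].append(m)
    h_cond = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(markers) - h_cond

def fraction_below(observed, null_values, tol=1e-12):
    """Share of randomisations whose MI is strictly below the observed MI."""
    return sum(v < observed - tol for v in null_values) / len(null_values)

def lexical_test(markers, nouns, category_of, n_perm=1000, rng=random):
    """Shuffle markers across productions within each category.

    Marker counts per category (hence category-based MI) are unchanged;
    only the pairing of markers with individual nouns is randomised.
    """
    observed = mutual_information(markers, nouns)
    idx_by_cat = defaultdict(list)
    for i, noun in enumerate(nouns):
        idx_by_cat[category_of[noun]].append(i)
    null = []
    for _ in range(n_perm):
        perm = list(markers)
        for idx in idx_by_cat.values():
            pool = [markers[i] for i in idx]
            rng.shuffle(pool)
            for i, m in zip(idx, pool):
                perm[i] = m
        null.append(mutual_information(perm, nouns))
    return fraction_below(observed, null)

def category_test(markers, nouns, category_of, n_perm=1000, rng=random):
    """Shuffle which nouns belong to which category (e.g. swap elephant and car).

    The marker-noun pairing (hence lexical MI) is unchanged.
    """
    observed = mutual_information(markers, [category_of[n] for n in nouns])
    vocab = sorted(category_of)
    labels = [category_of[n] for n in vocab]
    null = []
    for _ in range(n_perm):
        rng.shuffle(labels)
        relabel = dict(zip(vocab, labels))
        null.append(mutual_information(markers, [relabel[n] for n in nouns]))
    return fraction_below(observed, null)

# Hypothetical output: each noun always takes the same marker,
# but both markers are equally frequent within each category.
category_of = {"cow": "animal", "pig": "animal", "dog": "animal", "cat": "animal",
               "car": "vehicle", "bus": "vehicle", "van": "vehicle", "truck": "vehicle"}
marker_for = {"cow": "A", "pig": "B", "dog": "A", "cat": "B",
              "car": "A", "bus": "B", "van": "A", "truck": "B"}
nouns = [n for n in category_of for _ in range(4)]
markers = [marker_for[n] for n in nouns]

rng = random.Random(1)
lex = lexical_test(markers, nouns, category_of, rng=rng)
cat = category_test(markers, nouns, category_of, rng=rng)
print(lex >= 0.95, cat >= 0.95)  # True False: lexical, not category-based, conditioning
```

An output is flagged as significantly conditioned when its MI exceeds that of at least 95% of the randomisations, which corresponds to the one-tailed p = .05 threshold in the text.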
Figure 3 shows the category-based mutual information of participants’ productions in recall and interaction (for Two Category conditions only; this measure is not applicable in One Category conditions). We analysed this data using the permutation-based approach from Keogh et al. (Reference Keogh, Kirby and Culbertson2024), with the fixed effects of Condition, Phase, and their interaction (dropping the number of categories predictor). As is clear from Figure 3, and contrary to our expectations, there is very little category-based conditioning in general (mutual information values are around 0 for most participants, very few participants have levels of mutual information greater than expected by chance), and there is no reliable increase in conditioning in the interaction phase (no effect of condition, phase or their interaction, all p > .282).

Figure 3. Mutual information between marker choice and object category (animal or vehicle). Mutual information is low throughout, indicating very little category-based conditioning, and it does not increase in interaction. Points with a solid outline indicate participants whose observed level of Mutual Information is unlikely to arise by chance (as assessed by Monte Carlo methods); there are very few such participants.
If variability in marker usage in the Two Category Dyad condition is not protected from regularisation due to category-based conditioning, what else might explain the difference between One Category and Two Category Dyads in their degree of regularisation during interaction? One possibility is that Two Category Dyads are conditioning their marker use on the noun they are marking (e.g. using one marker with some nouns and the other marker with other nouns). This kind of lexical conditioning is possible in both One Category and Two Category conditions, but it might be easier to produce in the Two Category conditions because each category is more sparsely represented (e.g. participants in the Two Category condition only have to keep 8 animals distinct from one another, rather than 16). This intuition is confirmed by an analysis of average similarity between all nouns encountered by each participant, as evaluated by cosine similarity between word embeddings of those nouns, where the embeddings are derived from the British National Corpus and taken from http://vectors.nlpl.eu/repository/. Average similarity between nouns is 0.39 in the One Category condition (SD = 0.02), and 0.29 in the Two Category condition (SD = 0.01); in line with our intuition, nouns drawn from two categories are less similar to one another, that is, more distinct from one another, than nouns drawn from one category.
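The average-similarity measure can be sketched as below. This assumes the noun embeddings have already been loaded as rows of a NumPy array (loading from the NLPL repository is not shown) and that similarity is averaged over all unordered pairs of distinct nouns; both the averaging scheme and the toy 2-D vectors are illustrative assumptions.

```python
from itertools import combinations

import numpy as np

def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all unordered pairs of noun embeddings."""
    X = np.asarray(vectors, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalise each row
    sims = [X[i] @ X[j] for i, j in combinations(range(len(X)), 2)]
    return float(np.mean(sims))

# Toy vectors: two orthogonal nouns plus one lying between them
print(round(mean_pairwise_cosine([[1, 0], [0, 1], [1, 1]]), 4))  # 0.4714
```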
As noted above, mutual information can also be used to evaluate lexical conditioning, by calculating the mutual information between markers and specific nouns in a participant’s productions. As can be seen in Figure 4, lexical conditioning is much more prevalent than semantic category-based conditioning. Of particular interest is the effect of interaction on lexical conditioning. Lexical conditioning decreases in the interaction phase in the Single condition, where participants are interacting with a partner whose productions exhibit no lexical conditioning (that is, their partner produces a language with lexical mutual information of 0), and in the One Category Dyads, where many participants become fully regular (which eliminates any possibility of lexical conditioning). Interestingly, however, Two Category Dyads show no such reduction in lexical conditioning during interaction. These impressions are confirmed by a permutation-based analysis of lexical mutual information values, with fixed effects of Condition, Phase, Number of Categories and their interactions. This analysis indicates that lexical mutual information is generally lower in the interaction phase (as indicated by a significant effect of phase: z = 2.03, p = .043), except in Two Category Dyads where it remains level (as indicated by a three-way interaction between condition, number of categories and phase: z = −2.42, p = .012). This three-way interaction becomes non-significant if we remove participants who are fully regular (three-way interaction z = −1.72, p = .090) or highly regular (three-way interaction z = −1.70, p = .092) from the analysis. This suggests that the difference between One Category and Two Category Dyads in their lexical conditioning behaviour may be largely due to the individuals in the One Category condition who became fully or highly regular during the interaction phase.
This model also indicates that Pseudodyads show less of a decrease in lexical conditioning in the interaction phase than Singles (condition x phase interaction for the Singles-Pseudodyads factor, z = 2.19, p = .028). This seems to be driven largely by the anomalous behaviour of the One Category Pseudodyads: unlike participants in other conditions who interact with a partner whose productions exhibit no lexical conditioning, they do not show a reduction in conditioning, yet they also do not become highly regular in the way the One Category Dyads do.

Figure 4. Mutual information between marker choice and individual nouns (e.g. truck, cow). Mutual information is quite high throughout, indicating high levels of lexically based conditioning that tend to decrease during interaction except in One Category Pseudodyads and Two Category Dyads. Plotting conventions as in previous figures. Points with a solid outline indicate participants with observed levels of Mutual Information unlikely to arise by chance, see footnote 2.
The difference in lexical conditioning between One and Two Category Dyads suggests that Two Category Dyads preserve variation in the interaction phase by retaining lexical (rather than developing category-based) conditioning. We will next examine whether moment-to-moment alignment between partners during interaction (that is, priming) can explain this difference.
3.3 Priming
Our data allow us to investigate the extent to which participants are influenced by their partner’s plural marker choice during interaction – do we see priming, that is, the tendency to copy a partner’s last choice of marker, and if so, does this differ depending on whether participants are (or believe themselves to be) interacting with another person?
To evaluate the extent to which a participant copies their partner’s marker choice, we arbitrarily designated the two markers for each pair as Marker 1 and Marker 2. Unlike in regularisation, all that matters for this analysis is marker identity, so this designation does not depend on frequency of production. Figure 5 plots how often participants produced Marker 2 based on whether their partner produced Marker 1 or Marker 2 the last time they produced a marker. We ran a mixed-effects logistic regression on this data, predicting marker choice from fixed effects of partner’s last production (centred, higher coefficient estimates indicate more use of Marker 2 after the partner produced Marker 2), number of categories (sum coded as in previous analyses) and condition (Helmert coded as in previous analyses), with by-participant random intercepts and by-pair random intercepts and random slopes for partner’s last production. This model shows a clear priming effect (as indicated by a positive effect of partner’s last marker, b = 0.78, SE = 0.10, p < .001) and an interaction suggesting stronger priming in Dyads than in Singles/Pseudodyads (as indicated by a condition x partner’s last production interaction, b = 0.18, SE = 0.07, p = .010).
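The descriptive proportions plotted in Figure 5 can be derived from an interaction log. This sketch assumes each pair's log is a time-ordered list of (speaker, marker) tuples with markers coded 1 or 2 and exactly two speakers (the participant and their partner, human or computer); the function name and log format are hypothetical, and the mixed-effects model itself is not reproduced.

```python
def priming_table(trials):
    """Proportion of Marker 2 productions given the partner's last marker."""
    last_marker = {}                 # speaker -> most recent marker they produced
    counts = {1: [0, 0], 2: [0, 0]}  # partner's last marker -> [n Marker 2, n trials]
    for speaker, marker in trials:
        partner = [m for s, m in last_marker.items() if s != speaker]
        if partner:                  # skip trials before the partner has produced anything
            prime = partner[0]
            counts[prime][0] += marker == 2
            counts[prime][1] += 1
        last_marker[speaker] = marker
    return {prime: (k / n if n else float("nan")) for prime, (k, n) in counts.items()}

log = [("p1", 1), ("p2", 1), ("p1", 2), ("p2", 2), ("p1", 2), ("p2", 1), ("p1", 1)]
print(priming_table(log))  # Marker 2 rate: 1/3 after a Marker 1 prime, 2/3 after a Marker 2 prime
```

Priming shows up as a higher Marker 2 rate after a Marker 2 prime than after a Marker 1 prime.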

Figure 5. Participant’s marker choice as a function of their partner’s last produced marker. We see priming (Marker 2 more likely to be produced when the partner just produced Marker 2) in all conditions, but especially in Dyads where the effect of the partner’s last marker choice is particularly pronounced. Plotting conventions as in previous figures.
We were also curious whether such priming effects were boosted by category or lexical overlap between prime and target, in other words, if the priming effect was stronger within semantic categories (i.e. if your partner was producing a given marker for an animal, are you more strongly primed if you are also producing for an animal rather than a vehicle?), or stronger within lexical items (i.e. if your partner was producing a given marker for a particular animal, say elephant, are you more strongly primed if you are also producing for that referent?). The presence of a category-based priming boost could explain a reduced tendency to regularise in Two Category Dyads, and it may also provide a potential mechanism by which category-based conditioning could develop (by reciprocal priming). Figure 6 shows how often participants in Two Category conditions produced Marker 2 based on whether their partner produced Marker 1 or Marker 2 the last time they produced a marker, split by whether the noun in the partner’s last production was from a different or the same category. There is no obvious priming boost for same-category primes, and a model run on this data including same/different category as a predictor did not find evidence of this facilitating effect: the interaction between partner’s last production and same/different category was not significant (b = 0.22, SE = 0.17, p = .204). In contrast, we see a large facilitating effect of whether the prime was for the same or a different noun (Figure 7): in an equivalent analysis run on data from both the One and Two Category conditions, including a factor coding same/different noun, we see a large positive interaction effect between partner’s last production and same/different noun (b = 1.07, SE = 0.32, p < .001). This does not interact with the Single/Pseudodyad factor (p > .087). 
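The same/different-noun split in Figure 7 only requires also tracking which noun the partner's last production was for. In this hypothetical sketch each log entry is a (speaker, noun, marker) tuple; substituting each noun's category for the noun itself would give the same/different-category split of Figure 6.

```python
from collections import defaultdict

def lexical_boost_table(trials):
    """Marker 2 rate keyed by (partner's last marker, same or different noun)."""
    last = {}                            # speaker -> (noun, marker) of latest production
    counts = defaultdict(lambda: [0, 0])  # key -> [n Marker 2, n trials]
    for speaker, noun, marker in trials:
        prime = next((v for s, v in last.items() if s != speaker), None)
        if prime is not None:
            prime_noun, prime_marker = prime
            key = (prime_marker, "same" if prime_noun == noun else "different")
            counts[key][0] += marker == 2
            counts[key][1] += 1
        last[speaker] = (noun, marker)
    return {key: k / n for key, (k, n) in counts.items()}

log = [("p1", "cow", 2), ("p2", "cow", 2), ("p1", "car", 1),
       ("p2", "cow", 1), ("p1", "bus", 1)]
print(lexical_boost_table(log))
```

A lexical boost would appear as a larger gap between Marker 1 and Marker 2 primes in the "same" cells than in the "different" cells.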
This suggests that the low-level interactive mechanics of priming do not explain the difference in regularisation tendencies between One Category and Two Category Dyads, but they indicate a potential mechanism by which interaction can reinforce lexical conditioning: participants are much more likely to reproduce their partner’s lexically based conditioning, due to this priming effect. We also tested whether a more fine-grained measure of similarity between nouns affected priming, using cosine similarity between word embeddings for the partner’s last noun and the target noun. A regression model on this data indicated a positive interaction between embedding similarity and partner’s last production as predictors (b = 0.20, SE = 0.07, p = .004). However, this effect seems to be driven largely by the lexical boost from similarity = 1 items (that is, where the participant is producing for the same nouns as their partner); removing similarity = 1 trials from the analysis leads to no interaction (b = 0.05, SE = 0.07, p = .533), suggesting that fine-grained similarity, at least as evaluated with these embeddings, does not influence the strength of priming.

Figure 6. Participant’s marker choice as a function of their partner’s last produced marker, split by whether the partner’s prime featured a noun from the same or different category. We see priming (Marker 2 more likely to be produced when the partner just produced Marker 2) in all cases, but not more so when the prime is from the same category. Plotting conventions as in previous figures.

Figure 7. Participant’s marker choice as a function of their partner’s last produced marker, split by whether the partner’s prime featured the same or a different noun. NB: trials featuring a same-noun prime are less common than trials featuring a different-noun prime. There is a clear lexical boost, with greater priming when the partner was producing for the same noun. This effect appears to be considerably stronger in Dyads, but this is not supported in the statistical analysis. Plotting conventions as in previous figures.
4. Discussion
Our study tested whether interaction would lead to the emergence of conditioned variation. Our participants learned an artificial language with unpredictably varying plural markers, used to describe images drawn from one or two semantic categories. Our main question was whether participants who were given a language with two semantic categories would spontaneously condition marker use on these categories and whether we would find increased conditioning resulting from interaction when compared with individual recall. Previous research had demonstrated that learners readily acquire conditioned variation (Wonnacott, Reference Wonnacott2011) and that they can exploit semantic (Brown et al., Reference Brown, Smith, Samara and Wonnacott2022) or social (Samara et al., Reference Samara, Smith, Brown and Wonnacott2017) conditioning cues. Brown et al. (Reference Brown, Smith, Samara and Wonnacott2022) found relatively few circumstances in which participants spontaneously introduced semantic conditioning during learning and recall. However, the condition in their experiment which most closely matched our recall phase did produce a few participants who spontaneously conditioned variation on semantic category. In our study, very few participants did so to a greater extent than would be expected by chance. The reason for this difference is not clear, particularly given the close alignment in the number of nouns used and the number of training trials.
We did not find any evidence that interaction increases semantic conditioning. Other work has shown that interaction leads to a reduction in unpredictable variation, typically via regularisation (e.g. Fehér et al., Reference Fehér, Wonnacott and Smith2016, Reference Fehér, Ritt and Smith2019), and indeed we saw regularisation in the One Category Dyad condition here. Conditioning on semantic categories would provide an alternative way to render variation predictable, and we expected that conditioning on category would be preferable to conditioning lexically because it is the simplest and most easily learnable option on which interlocutors could converge. Furthermore, we expected that if reciprocal interaction amplifies individual biases for predictability (as it does for regularisation), even quite weak conditioning tendencies in recall might have stronger effects in interaction. One possible explanation for why this did not happen is that the individual biases for semantic conditioning were not strong enough to be amplified in interaction, consistent with the very low levels of category-based conditioning in the pre-interaction recall test.
Variability can also be rendered predictable by conditioning marker use on individual nouns (lexical conditioning). Although this type of conditioning places more demands on memory than category-based conditioning, we found widespread lexical conditioning during recall, matching results elsewhere showing spontaneous lexical conditioning in adults (Saldaña et al., Reference Saldaña, Smith, Kirby and Culbertson2021; Smith et al., Reference Smith, Perfors, Fehér, Samara, Swoboda and Wonnacott2017; Smith & Wonnacott, Reference Smith and Wonnacott2010; Wonnacott et al., Reference Wonnacott, Newport and Tanenhaus2008, Reference Wonnacott, Brown and Nation2017). Lexical conditioning usually decreased in the interaction phase, although the causes for this may have been different across conditions. In the Singles and Pseudodyad conditions, participants interacted with a highly variable computer partner who provided additional evidence that variation was not lexically conditioned, and who resolutely failed to adopt any lexical conditioning modelled by the human participant. In One Category Dyads, the drop in lexical conditioning in interaction may be due to a mismatch between interlocutors’ idiosyncratic preferences as to which marker should go with which lexical item (see also Smith et al., Reference Smith, Perfors, Fehér, Samara, Swoboda and Wonnacott2017). Surprisingly, however, Two Category Dyads did not decrease their lexical conditioning in the same way: Two Category Dyads were able to retain lexical conditioning in interaction, avoiding full regularisation but also ignoring the easier option of conditioning on semantic category.
One possible explanation for the prevalence of spontaneous lexical, but not category-based, conditioning during recall is that the evidence against category-based conditioning is stronger in the training data participants received – we are grateful to an anonymous reviewer for making this observation. By design, during training participants receive direct evidence that variation is not conditioned on lexical items or semantic categories; however, the evidence for the absence of category-based conditioning is stronger than the evidence for the absence of conditioning on each individual lexical item, simply because the learners get more data at the category level than the lexical level: participants encountered each noun only 4 times during sentence training (twice with each marker) but encountered each category 32 times during training (16 times with each marker). This account of course assumes that learners are not sensitive to the higher-order generalisation that lexical items are variable, whereas prior work using paradigms similar to ours has shown that both adult and child learners are in fact sensitive to such generalisations over lexical patterns (Wonnacott, Reference Wonnacott2011; Wonnacott et al., Reference Wonnacott, Newport and Tanenhaus2008). However, it may be that the category-based evidence is stronger or more accessible than the generalisation about the variability of lexical items, a possibility that future work could explore.
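The difference in the amount of evidence can be made concrete with a back-of-the-envelope calculation. This is an illustration rather than an analysis from the study: it treats each training exposure as an independent binomial sample, a simplifying assumption that does not model how learners actually integrate evidence, and computes the standard error of an observed 50/50 marker split at the noun level and at the category level.

```python
import math

def se_of_proportion(p, n):
    """Binomial standard error of an observed proportion p over n exposures."""
    return math.sqrt(p * (1 - p) / n)

# Exposures during sentence training, as reported above:
per_noun = 4        # each noun seen twice with each marker
per_category = 32   # each category seen 16 times with each marker

print(round(se_of_proportion(0.5, per_noun), 3))      # 0.25
print(round(se_of_proportion(0.5, per_category), 3))  # 0.088
```

With 32 rather than 4 exposures the standard error is smaller by a factor of √8 ≈ 2.8, so under this simplification an even split of markers is far better attested for each category than for each individual noun.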
In contrast with our earlier findings (Fehér et al., Reference Fehér, Wonnacott and Smith2016), we did not observe regularisation in Pseudodyads, participants who thought they were interacting with a human but actually interacted with a variable computer partner. In the previous study, Pseudodyads increased their regularisation to the same extent as Dyads in the interaction phase, which could have been due to higher-order communicative intentions (since priming should have driven them towards less regularisation). The discrepancy between the findings could be because of the difference in the variable linguistic element: the 2016 study involved word order variation whereas this study features morphological variation. Although Saldaña et al. (Reference Saldaña, Smith, Kirby and Culbertson2021) did not observe different levels of regularisation in word order and morphological variation, it is possible that different types of variation evoke biases of different strengths: the preferred word order in the 2016 study was less effortful to produce and more similar to the most frequent word order in English (the participants’ first language), whereas the current study looked at variation between two well-matched markers. This would mean that in the earlier study, Pseudodyads’ attempts to align with their partner involved an overproduction of the variant that they believed their partner should also prefer, whereas the current study did not feature obviously preferred markers. Despite the discrepancy in the assumed strength of the biases, the recall phase in both studies revealed a similar level of bias towards one of the two possible variants: the preferred structure was used in about 60% of the recall trials. Although the direction of the preference was random in the current study, dyadic interaction did lead to an increase in regularisation of about 20 percentage points. This suggests that when real participants interact with each other, they are able to negotiate and converge on a shared linguistic system.
To explain the high levels of regularisation in One Category Dyads, we explored priming as a possible mechanism. Priming is a low-level interactive mechanism that, due to the reciprocal nature of interaction, can potentially drive paired participants to produce increasingly regular output. We found evidence for priming across the board, with higher levels in Dyads than in Pseudodyads and Singles, but the size of the priming effect did not differ between One Category and Two Category conditions, so priming cannot, in itself, explain the difference in regularisation between One and Two Category Dyads. There was no evidence for increased priming within a given semantic category, which could have facilitated semantic conditioning. We did, however, find evidence for a lexical boost (Pickering & Branigan, Reference Pickering and Branigan1998), that is, increased priming when producing for the same noun as the partner, which offers a plausible mechanism by which lexical conditioning could increase during interaction. However, there was no robust evidence that the size of the lexical boost differed between One Category and Two Category conditions, which would be required to explain greater lexical conditioning in the Two Category condition. We therefore conclude that priming alone does not explain the differences in regularisation and conditioning we see in our two Dyad conditions.
To conclude, we have shown that interaction did not lead to the emergence of semantic conditioning, and in most cases, it also reduced the amount of lexical conditioning present in individual recall. In the Single and Pseudodyad conditions participants interacted with a partner who did not have any conditioning in their language; the priming we found in these conditions would have pushed participants to reduce their lexical conditioning. It is less clear why interaction in Dyads did not produce category-based conditioning, but it may be that any incipient tendency to condition marker choice based on category is overridden by the interplay of other mechanisms such as lexical conditioning and priming. If people have strong tendencies to be primed, for instance, they will tend to repeat their partner’s marker use, even in cases where that violates an emerging category-based conditioning system (e.g. copying a partner’s last marker choice for an animal referent when you are producing a marker for a vehicle), preventing category-based conditioning from getting off the ground. Stronger priming within than across categories might resolve this issue, but in fact we found no evidence of an effect of category on priming.
There was also a strong tendency for regularisation in interaction, but only in One Category Dyads. The mere presence of two semantic categories seemed to be enough to prevent regularisation. One possible explanation for this is that simply having two semantic categories in the input made participants aware of the variation in the referent space, and therefore inclined to preserve similar variation in the markers. They may even have understood that marker variation could be based on semantic category, but simply failed to converge on such a system with their partner, for reasons reviewed above. Another possibility is that participants in the One Category condition may have found it harder to remember which marker was associated with which noun, due to the higher level of similarity between all nouns in that condition (that is, all referents are animals or all referents are vehicles), making the Two Category condition better suited to retaining lexical conditioning. Whether this is a factor in lexical conditioning would be possible to test experimentally by asking whether people find it easier to learn a lexically conditioned system where the conditioning items are more or less distinct from one another.
The evolution of semantic conditioning is likely the result of several competing learning and interactive mechanisms. We found that interaction alone does not result in category-based conditioning and that people show a preference for lexical conditioning and have strong tendencies for priming even when communicating in an artificial language. Further studies are necessary to determine how semantic conditioning can evolve spontaneously. Conditioning of any type may not be particularly useful unless people are asked to generalise rules to novel items, so future studies should include this manipulation. Cross-generational transmission may also be necessary for category-based conditioning rules to be established in a linguistic system, a possibility that transmission chain experiments could test. Interaction may be necessary but is not sufficient on its own to lead to category-based conditioning.
Acknowledgements
This work was supported by a Newton International Fellowship, awarded to O.F., and by the Economic and Social Research Council (Grant No. ES/K006339, held by K.S.). This project has also received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 681942, held by K.S.
Competing interests
The author(s) declare none.