1. Introduction
Several studies have shown that human communication is multimodal, involving nonverbally conveyed messages, referred to as visual cues, together with verbally conveyed messages. It has also been demonstrated that these co-speech visual cues are grammaticalized, expressing several semantico-pragmatic functions (Gibbon, 2009; Zellers, House & Alexanderson, 2016, inter alia). Moreover, visual cues and prosody are intertwined systems in multimodal communication, with a relevant role for sociopragmatics (Brown & Prieto, 2021). This also applies to signed languages: nonmanuals (facial expressions, head movements and body postures) in American Sign Language (ASL) and Israeli Sign Language are grammaticalized (Sandler, 2005) and are part of the intonational system of these signed languages (Dachkovsky & Sandler, 2009). However, the relative weight of visual cues in conveying prosodic meaning, in comparison with other cues, remains unexplored. The goal of the current study is to examine the relative weight of visual cues, focusing on nonmanuals (in sign language) and co-speech visual cues (in the spoken modality) produced to convey interrogativity in Portuguese.
By comparing interrogatives of thirty-five geographically and genetically different signed languages around the world, Zeshan (2004) observed that the range of crosslinguistic variation is large for some parameters, such as the structure of question-word paradigms. Interestingly, however, the use of nonmanuals in questions exhibits more similarities (or less variability) across signed languages. Namely, head movements, together with eyebrow movements, have been reported to play a prosodic role as question markers. These nonmanuals were also shown to vary in form across question types, with yes-no questions being typically produced with eyebrow raising and head forward and down, and wh-questions being more frequently produced with eyebrow lowering combined with head forward or raised (Dachkovsky & Sandler, 2009; Dachkovsky, Healy & Sandler, 2013; Pfau & Quer, 2010; Sandler, 2005; Zeshan, 2004). In contrast with questions, statements do not seem to be marked by nonmanuals, as reported for ASL (Valli, Lucas, Mulrooney & Villanueva, 2011).
Unlike most signed languages, in Portuguese Sign Language (LGP) questions are mainly produced with eyebrow lowering (63%; Cruz, Swerts & Frota, 2019), and only the head movement changes between yes-no questions (head falling) and wh-questions (head raising). Statements, as observed for other signed languages, are predominantly not produced with any specific nonmanual marker (50%; Cruz, Swerts & Frota, 2019). When present, the most frequent nonmanual in statements is the head falling movement (35%; Cruz, Swerts & Frota, 2019). Despite the differences between LGP and other signed languages, a similar combination of nonmanuals was found, showing that the underlying system is similar: the same nonmanuals are used in a combinatory way (Cruz, Swerts & Frota, 2019). Although the relative weight of these nonmanuals in conveying prosodic meaning is largely unexplored, Cruz, Swerts and Frota (2019) proposed treating eyebrow lowering as a question marker in LGP, because questions are mostly signaled by this nonmanual, and the head movement as a marker of question type, because it varies across question types. However, the authors also reported the production of some yes-no questions with the head falling movement as a single nonmanual, leading to a hypothetical ambiguity with statements produced with the same nonmanual. These productions are important test cases for exploring the relative weight of nonmanuals in conveying prosodic meaning.
In the spoken modality, several studies have shown that co-speech visual cues are organized into a system sharing several features with the prosody of spoken language (Krahmer & Swerts, 2009; Loehr, 2012; Mol, Krahmer, Maes & Swerts, 2012; Swerts & Krahmer, 2008). Eyebrow and head movements, as well as the hands, are the most studied co-speech visual cues from a prosodic point of view. They have been shown to play a complementary role in the spoken modality, although co-speech visual cues are weaker than auditory cues for the distinction across sentence types (Crespo-Sendra, Kaland, Swerts & Prieto, 2013; Granström & House, 2004; House, 2002). Eyebrow movements have been described as a question marker (Cavé et al., 1996; Purson et al., 1999), also playing a relevant role in the perception of focus (Krahmer, Ruttkay, Swerts & Wesselink, 2002), prominence (Krahmer & Swerts, 2007; Swerts & Krahmer, 2004, 2006, 2008), and the distinction across sentence types and pragmatic meanings (Borràs-Comes & Prieto, 2011; Borràs-Comes, Kaland, Prieto & Swerts, 2014; Crespo-Sendra, Kaland, Swerts & Prieto, 2013). Head movements, like eyebrow movements, may vary in form and function (Kousidis, Malisz, Wagner & Schlangen, 2013). They have been described as modality markers for uncertainty, they signal discourse structure and lexical repairs, and they may function as deictics or as backchannel giving/requesting devices (McClave, 2000; Poggi, D’Errico & Vincze, 2010). Although reported to pattern ‘in predictable ways’ (McClave, 2000: 856), it has been challenging to identify kinematic parameters that allow a systematic description of head gesture patterns, due to their dynamicity, variability and multidimensionality (Wagner, Malisz & Kopp, 2014).
In the spoken modality of European Portuguese (EP), eyebrow raising has been shown to be the dominant visual cue for questions across varieties, and thus a question marker, together with the nuclear pitch accent (Cruz, Swerts & Frota, 2015). In combination with eyebrow raising, the head falling movement also occurs in yes-no questions (56% in the standard variety; Cruz, Swerts & Frota, 2015). Because the visual cue characterizing the production of statements is the head falling movement alone (67%), Cruz, Swerts and Frota (2015) hypothesized that speakers could be sensitive to visual information when identifying sentence types across varieties, especially in the absence of tonal contrast or in the presence of audiovisual mismatches. However, this hypothesis was not confirmed, as Cruz, Swerts and Frota (2017) showed that intonational cues are more relevant than visual cues for identifying sentence types. Nevertheless, the authors also concluded that visual cues might play a role in structural or linguistic marking, as participants’ reaction times were shorter in the audiovisual condition than in the audio-only condition. This also suggests that processing two types of cues – auditory and visual – at the same time does not necessarily entail a heavier cognitive process. In fact, visual cues might reduce the cognitive load (Goldin-Meadow, Nusbaum, Kelly & Wagner, 2001). This has recently been confirmed by Holler, Kendrick and Levinson (2018) and Holler and Levinson (2019), who showed that responses to questions accompanied by manual and/or head movements are faster than responses to questions conveyed auditorily only. Thus, in line with the proposal of an integrated system in language comprehension (Kelly, Özyürek & Maris, 2010), visual cues induce facilitation in processing, thus enhancing comprehension (Kita & Emmorey, 2023) and thinking (Kita, Alibali & Chu, 2017), besides speaking (see Clough & Duff, 2020, for a review of the role of gesture in communication and cognition).
It thus seems clear that visual cues are crucial for communication, performing similar functions in different language modalities, although in different forms or degrees. This might be the case for eyebrow and head movements, which are both relevant for conveying sentence type and pragmatic meaning in LGP and in the spoken modality of Portuguese. As previously described, eyebrow movements are present in the production of yes-no questions, assuming different forms in the two modalities (eyebrow lowering in LGP; eyebrow raising in the spoken modality). Head movements are also present in yes-no questions, assuming the same form in both modalities (head falling movement), and may be present in the production of statements (also head falling in both modalities).
2. Research questions and hypotheses
Since yes-no questions in LGP may be produced, albeit less frequently, with the head falling movement only (i.e., without the eyebrow movement typically considered a question marker), a natural context for a hypothetical ambiguity with statements arises, as statements are characterized by the occurrence of the same nonmanual cue. However, communication among deaf signers is not disrupted by this apparent overlap. We thus hypothesize that the amplitude of the head falling movement might differ between the two sentence types. If confirmed, this would mean that, in the absence of the eyebrow movement, the head assumes the role of question marker, reinforcing the dynamicity and multidimensionality of the head gesture (as suggested by Wagner, Malisz & Kopp, 2014). Additionally, it would show that, in signed languages, the interplay of nonmanuals is a complex and dynamic mechanism, subject to a relative weighting across nonmanuals used in combination.
As in LGP, in the spoken modality of EP the head falling movement also occurs in statements and yes-no questions as the single visual cue (17%; Cruz, Swerts & Frota, 2015). However, unlike in LGP, in the spoken modality this does not lead to a hypothetical ambiguity, as the auditory (intonational) cues are not the same: yes-no questions are produced with a falling-rising melody, whereas statements are characterized by a falling melody. For this reason, and also considering that speakers do not seem to rely on visual cues to identify sentence types, we do not expect to find a different amplitude of head movements between sentence types. If confirmed, this would allow us to conclude that the visual cues used to convey prosodic meanings are grammaticalized differently in each modality, which might impact communication between signers and speakers.
To sum up, we hypothesize that, in LGP, the amplitude of the head falling movement is larger in yes-no questions produced with the head falling movement only (i.e., without the eyebrow movement) than in statements, which are characterized by the occurrence of the same nonmanual. If confirmed, this would support the interpretation that the amplitude of the head falling movement is a question marker in LGP in the absence of eyebrow movement.
In the spoken modality of EP, we do not expect to find a different amplitude of head movement between the two sentence types (given that no ambiguity arises, due to the presence of contrasting spoken intonational cues). Confirmation of this hypothesis would allow us to conclude that the visual cues used to convey prosodic meanings are grammaticalized differently in each modality.
In order to test these hypotheses, we conducted a kinematic analysis of the vertical displacement of head movements in statements and yes-no questions, both in LGP and in the spoken modality of Portuguese, following the methodological procedures detailed below.
3. Methods
3.1. Participants
Five native signers of LGP, all deaf women¹, aged between 20 and 50 years, participated in the study. Four of them reported congenital deafness, while the fifth reported deafness acquired at the age of 6 months. They were videotaped in a quiet room of Associação Portuguesa de Surdos (APS, Portuguese Association of Deaf People) while performing an adapted version of the Discourse Completion Task (Félix-Brasdefer, 2009; Billmyer & Varghese, 2000) for LGP (Cruz, Martins & Frota, 2017). Participants interacted with an interpreter of LGP, also a linguist, who works on a daily basis in the deaf association where data collection took place. This ensured spontaneity in the task, as all participants were used to interacting with each other and were in their natural environment. The sentences were elicited as follows: the interpreter of LGP presented several communicative contexts one by one, some of them supported by printed images, and asked the signers to produce what they would sign in each communicative context as if they were there. A Canon Legria video camera (model HF G25, AVCHD) was positioned in front of the participant, at a fixed distance across recordings, and the participant sat diagonally across from the interpreter. This setting was the same for all recordings.
Additionally, five native speakers of Standard European Portuguese (SEP), all women, aged between 20 and 45 years, were videotaped in a sound-attenuated laboratory with a JVC video camera (model GY-HM11E), while performing the Discourse Completion Task, adapted for Portuguese within the project InAPoP – Interactive Atlas of the Prosody of Portuguese (Frota, coord., 2012–2015).
The current analysis was based on a sample of the collected data, as detailed below.
3.2. Dataset
The sample for analysis covers neutral statements (i.e., broad focus statements uttered as all-new utterances) and neutral yes-no questions (i.e., information-seeking questions where the information asked is new to the speaker). For LGP, we considered both the questions produced with the two most frequent nonmanuals (head falling movement, together with eyebrow lowering – Figure 1), and the potentially ambiguous yes-no questions, i.e., those produced with a head falling movement only (Figure 2).
Some yes-no questions produced with the two visual cues may also show a head forward movement as the head moves downwards. We ran an informal pilot perception study with native signers, who were asked to watch videos of yes-no questions with and without the head forward movement, and to choose one out of four possible answers: (i) the signer wants to know something, without any previous related knowledge; (ii) the signer wants to know something, with previous related knowledge; (iii) the signer already knows the answer to the question, but wants to confirm it; (iv) other situation – please specify and explain why. This pilot revealed that the head forward movement is not interpreted as a question marker or as pragmatically relevant information. For this reason, we considered these productions as conveying two visual cues for interrogativity.
Both statements and questions are produced with a falling head movement in spoken Portuguese (Figure 3). However, the potential ambiguity between statements and questions with one visual cue that characterizes LGP does not occur in the spoken modality: the auditory cues differ between sentence types (namely, the nuclear contour) and were shown to be more relevant than visual cues for perceiving questions (Cruz, Swerts & Frota, 2017).
The most frequent production of a yes-no question in the spoken modality is illustrated in Figure 4, with the falling-rising intonational cues accompanied by the visual cues (head falling movement together with eyebrow raising).
All these sentences were previously annotated in ELAN, version 6.0 (Wittenburg, Brugman, Russel, Klassmann & Sloetjes, 2006; Max Planck Institute for Psycholinguistics, The Language Archive, 2020), with detailed information on head and eyebrow movements time-aligned with intonational cues (for the spoken modality; see Cruz, Swerts & Frota, 2015, for further details) or with manuals (for LGP; see Cruz, Swerts & Frota, 2019, for details). FACS – the Facial Action Coding System (Ekman, Friesen & Hager, 2002) – was used to annotate the visual cues, and the whole movement of each visual cue was taken as an event. After this annotation, done by a linguist trained in audiovisual annotation and in LGP, the utterances were selected for the kinematic analysis by filtering the sentence types and the labels of interest for the current study, which related to the type and number of visual cues involved (e.g., yes-no questions with both visual cues or with the head movement alone), as sketched below. In total, 101 utterances were considered, with the following distribution: (i) 26 sentences from LGP, including 9 statements, 13 yes-no questions produced with both nonmanuals, and 4 yes-no questions produced with the head only; (ii) 75 sentences from the spoken modality, comprising 18 statements with the common nuclear contour H+L* L% (Frota, 2002), 39 yes-no questions produced with the most frequent nuclear contour H+L* LH% (Frota, 2002) and both visual cues (head falling movement together with eyebrow raising), and 18 yes-no questions with the same melodic pattern but involving the head movement only.
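The selection step can be illustrated with a minimal R sketch, assuming a tabular export of the ELAN annotations (the file and column names below are hypothetical):

    # Sketch: filter the annotated utterances by sentence type and by the
    # visual-cue labels of interest (hypothetical export and column names).
    ann <- read.csv("elan_annotations_export.csv")
    sel <- subset(ann, sentence_type %in% c("statement", "yn_question") &
                       visual_cues %in% c("head_only", "head_plus_eyebrows"))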
All 101 utterances were labeled in ELAN by one additional annotator, naïve to audiovisual annotation and to LGP, who performed a blind annotation, i.e., he only had access to two tiers (one for the eyebrows and another for the head movements) with blank intervals time-aligned with the video. For the LGP data, no information was provided about glosses, sentence type or the meaning being produced. However, in order to avoid biased annotations, a third tier was added to the annotator’s grids to call his attention to nonmanuals conveying lexical information. For instance, the adverb ONTEM (‘yesterday’) is manually articulated with the pointing finger moving backwards above the shoulder and a head up movement, which plays a role in the lexical/morphological component of the grammar and is thus unrelated to the prosody of statements and questions.
To control for variability in the labels used, the ELAN grids included a controlled vocabulary containing a set of labels for the eyebrows (neutral, raising, lowering) and another set for the head (neutral, up-down, down-up, left-right, tilt left, tilt right, forward, backward). The naïve annotator only had to double-click on each time interval and choose the label that best described the eyebrow and head movement occurring in its time span. At the end, he was asked to revise all his annotations to ensure internal consistency in the labelling task, both within and across modalities.
Cohen’s kappas were then computed, using SPSS (version 26) (IBM Corp., 2019), to generate pairwise inter-rater agreement scores. Following Landis and Koch’s interpretation of kappa (Landis & Koch, 1977), an overall substantial agreement was found between the trained linguist annotator and the naïve annotator for the eyebrows (Cohen’s κ = .78) and an overall slight agreement for the head (Cohen’s κ = .15). When considering the modality annotated, we observed a moderate agreement for the eyebrows in LGP (Cohen’s κ = .52) and an almost perfect agreement for the same visual cue in the spoken modality (Cohen’s κ = .84). For head movements, a slight agreement was found both in LGP (Cohen’s κ = .11) and in SEP (Cohen’s κ = .10). This clearly shows that (i) annotating LGP is, in general, harder, as expected, since the naïve annotator does not know LGP; and (ii) annotating head movements is harder than annotating eyebrows, because the set of labels available in the controlled vocabulary for the head was larger, covering rotation possibilities in the x and y axes, and because the eyebrows are not as physiologically dynamic as the head. For these reasons, we decided not to exclude the sentences with mismatching labels from the dataset.
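For illustration, the same pairwise agreement can be computed in R (a minimal sketch with the irr package; we used SPSS, and the labels below are hypothetical):

    # Minimal sketch: pairwise Cohen's kappa between the trained and the naive
    # annotator, with hypothetical labels from the controlled vocabulary.
    library(irr)

    trained <- c("raising", "lowering", "neutral", "raising", "lowering")
    naive   <- c("raising", "lowering", "neutral", "lowering", "lowering")

    kappa2(cbind(trained, naive))  # unweighted kappa for nominal labels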
3.3. Analysis procedures
All 101 utterances were prepared for the extraction of the respective kinematic information according to the procedure detailed in section 3.3.1. The kinematic measurements were then subjected to statistical analysis, as described in section 3.3.2.
3.3.1. Kinematics
Using Kinovea (version 0.9.4), a free 2D motion analysis software package, the vertical displacement of the head was tracked along the time series (ms) and extracted in pixels (px). For each video file, we added a marker to the participant’s chin (Figure 5) and inspected the video frame by frame to manually correct the marker location when needed (for instance, when the manuals partially overlap the head, which results in a tracking loss).
Since we are dealing with productions of varying length in number of syllables and words, we automatically extracted the time-normalized vertical displacement (sketched below) to control for duration differences across sentences. This allowed us to merge the kinematic data of each sentence type per modality and to observe and compare the head peaks along the time series across sentence types, also considering the type of visual (and auditory) cues present in the signal, as described in sections 4.1, 4.2, and 4.3.
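The normalization step can be illustrated as follows (a minimal R sketch; the column names are hypothetical placeholders for the raw Kinovea exports):

    # Sketch: resample one utterance's head trajectory onto a common grid of
    # 100 time-normalized samples, using linear interpolation (base R approx()).
    normalize_trajectory <- function(time_ms, y_px, n = 100) {
      approx(x = time_ms, y = y_px, n = n)$y
    }

    # Hypothetical usage, where 'traj' holds one utterance's tracked trajectory:
    # traj_norm <- normalize_trajectory(traj$time_ms, traj$y_px)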
3.3.2. Statistics
The statistical analysis was similar for both data samples – LGP and the spoken modality of Portuguese. Namely, Generalized Additive Models (GAMs) were run for each sample (Wood, 2017), as they allow the head vertical displacement (in pixels) to be compared over time by modelling non-linear dependencies of this dependent variable via smooth functions, with sentence type as predictor (statement, yes-no question with two visual cues, and yes-no question with one visual cue). For each modality, we ran a GAM (with fixed predictors only) and a Generalized Additive Mixed Model (GAMM) with participant as a random factor. All models were initially fitted using fREML (fast restricted maximum likelihood), the default smoothing parameter estimation method. However, in order to prevent oversmoothing, model fits were also checked using the function gam.check(). Since the k-index was not smaller than 1 in any model, the default basis dimension (k = 10) was kept, and no smoothing adjustments were needed. Then, to compare the kinematics of the head movement between modalities over time, we ran another GAMM with Modality (spoken versus signed) as a predictor, in addition to Sentence type. Because the mixed models run for each sample exhibited a better fit than the models without the random factor, participant was again included as a random factor (and only a GAMM was run). Additionally, the interaction Sentence type*Modality was also modeled over time, as we were interested in whether the head vertical displacement differs across sentence types depending on the modality.
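The per-modality model specification can be sketched as follows (a minimal sketch with mgcv, not our exact scripts; the column names are illustrative):

    # Sketch of the per-modality models. Illustrative column names: 'displ' =
    # head vertical displacement (px), 'time' = normalized time, 'stype' =
    # sentence type (factor), 'participant' = signer/speaker (factor).
    library(mgcv)

    dat$stype <- relevel(factor(dat$stype), ref = "ynq_two_cues")

    # GAM with fixed predictors only: one smooth of time per sentence type.
    # bam() is used here because fREML estimation is its default in mgcv.
    m1 <- bam(displ ~ stype + s(time, by = stype),
              data = dat, method = "fREML")

    # GAMM: adds participant as a random factor via a random-effect smooth
    m2 <- bam(displ ~ stype + s(time, by = stype) + s(participant, bs = "re"),
              data = dat, method = "fREML")

    gam.check(m2)  # inspect the k-index to guard against oversmoothing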
The dependent variable corresponds to a total of 7500 measurements for the spoken modality (i.e., 100 measurements per sentence along the time series normalized to 1 s) and 2600 measurements for LGP.
All statistical procedures were run in R (version 4.4.1; R Core Team, 2024), using the package mgcv (Wood, 2011, 2017) for modelling and the package itsadug (van Rij et al., 2017) to plot the model results.
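For instance, continuing the sketch above, the fitted smooths per sentence type can be plotted as follows (rm.ranef removes the participant random effect from the predictions):

    # Sketch: plot the fitted smooths per sentence type from the GAMM above
    library(itsadug)
    plot_smooth(m2, view = "time", plot_all = "stype", rm.ranef = TRUE)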
4. Results
The results are presented first for LGP, and then for the spoken modality.
4.1. Portuguese Sign Language (LGP)
Figure 6 shows head vertical displacement in LGP along the normalized time series, merged per sentence type, comparing statements and yes-no questions. We observe that statements (blue line) are less marked by head movement, ranging from 4.54 to −4.59 px, than yes-no questions (gray and orange lines), which display a wider degree of displacement.
Within yes-no questions, those produced with the head as the single visual cue show the widest amplitude, ranging from 8.45 to −17.47 px (orange line), whereas yes-no questions produced with head and eyebrow movements present a smaller amplitude, ranging from 3.04 to −15.86 px (gray line). This suggests that head amplitude plays a relevant role in conveying interrogativity, and that it varies with the number of nonmanuals in the signal, along the lines of our initial hypothesis.
To confirm whether head amplitude plays such a relevant role, a GAM was run with head vertical displacement as the dependent variable and sentence type as predictor, with yes-no question with two visual cues as the reference level (Model 1). We also ran Model 2 (GAMM), with participant as a random factor. Model 2 showed a better fit than Model 1 (fREML = 9489.7 versus fREML = 9703.5), explaining 35.6% of the deviance (versus 23.2% in Model 1). For this reason, we report the results from Model 2, plotted in Figure 7. We observed that head vertical displacement differs across sentence types (statement versus reference level: β = 6.74, SE = 0.44, t = 15.44, p < .001; yes-no question with one visual cue versus reference level: β = −2.22, SE = 0.60, t = −3.71, p < .001).
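The model comparison and contrasts reported here can be obtained along these lines (a sketch continuing from the modelling sketch in section 3.3.2):

    # Sketch: compare the fREML scores of the two models, then read off the
    # parametric contrasts against the reference level (yes-no question with
    # two visual cues) and the deviance explained.
    library(itsadug)
    compareML(m1, m2)  # lower fREML score indicates the better-fitting model
    summary(m2)        # parametric coefficients give the sentence-type contrasts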
As clearly shown in Figure 7, the head falling movement as the single visual cue for yes-no questions (blue curve) is considerably more pronounced than the same visual cue in statements (green curve), showing that these utterances are not ambiguous in production. The vertical head movement that co-occurred with eyebrow lowering in yes-no questions (red curve) is also different from the head movement produced in statements (green curve), indicating that interrogativity is conveyed by a larger degree of head displacement. We may also observe in Figure 7 that both curves start at a similar height, but the head in yes-no questions produced with eyebrow lowering moves downwards before 200 ms and attains a larger amplitude. When comparing the curves of the two interrogatives (with two visual cues, red, and one visual cue, blue), two salient features emerge. First, when only the head is used, its falling movement is more pronounced than in yes-no questions produced with eyebrows, thus showing a larger amplitude. Second, there is also a difference at the beginning of the head movement: in yes-no questions with one visual cue, it starts at a lower position (on the vertical axis), which leads to a short rising movement (see the time-window of the first 200 ms), contrasting with the falling movement of the head in yes-no questions with two visual cues, which starts before 200 ms, as mentioned above.
Moreover, the shape of the curves differs across sentence types. Statements clearly contrast with questions: we observe an initial peak, followed by a falling movement and a final rising movement corresponding to the retraction phase (McNeill, 1992). In questions, the shape of the movement is more complex and also differs depending on the number of nonmanuals in the signal: yes-no questions produced with two visual cues (more complex) also present more complexity in the head movement – an initial peak followed by two (lower) peaks, resembling a bimodal distribution², and then the retraction; yes-no questions with one visual cue (less complex) show a head movement with less complexity – an initial peak followed by one lower peak, and then the retraction.
These findings confirm that the amplitude of the head movement plays a relevant role in conveying interrogativity by distinguishing yes-no questions from statements, and that it varies with the number of nonmanuals in the signal.
Finally, we also observe that the highest displacement values of the head movement are attained in the same time-window (500–850 ms), independently of the sentence type. Given that nonmanuals are part of the intonational system of signed languages (Dachkovsky & Sandler, 2009), we may hypothesize that the head is a cue for the prosodic nucleus. However, this requires further analysis to check the exact time-alignment of the head movements with the manuals, which is not possible in the current study, as sentences were time-normalized.
4.2. Spoken modality of Portuguese
Figure 8 shows head vertical displacement in the spoken modality along the normalized time series, per sentence type. As in LGP, statements (blue line) are less marked by head movement, ranging from 0.07 to −5.85 px, than yes-no questions (gray and orange lines), which show a wider degree of head displacement.
Also as in LGP, in the spoken modality yes-no questions produced with the head as the single visual cue exhibit the widest amplitude, ranging from 0.24 to −16.59 px (orange line), whereas yes-no questions produced with head and eyebrow movements present a smaller amplitude, ranging from 0.04 to −10.16 px (gray line). Again, head amplitude seems to play a relevant role in conveying interrogativity, especially when it is the single visual cue, and even in the presence of distinct auditory cues, despite the fact that visual cues were previously found to be less relevant than auditory ones for perceiving questions (Cruz, Swerts & Frota, 2017).
To confirm whether head amplitude plays such a relevant role, a GAM was run with head vertical displacement as the dependent variable and sentence type as predictor, with yes-no question with two visual cues as the reference level (Model 1). We also ran Model 2 (GAMM), with participant as a random factor. Model 2 showed a better fit than Model 1 (fREML = 29607 versus fREML = 30330), explaining 26% of the deviance (versus 9.83% in Model 1). We thus report the results from Model 2, plotted in Figure 9. As in LGP, in the spoken modality head vertical displacement differs across sentence types (statement versus reference level: β = 3.08, SE = 0.38, t = 8.17, p < .001; yes-no question with one visual cue versus reference level: β = −6.09, SE = 0.42, t = −14.6, p < .001).
As in LGP, in the spoken modality the head falling movement as the single visual cue for yes-no questions (blue curve) is more pronounced than the same visual cue in statements (green curve), suggesting that these productions are not visually ambiguous, although the auditory cues already distinguish the sentence types. In yes-no questions, against our hypothesis and in line with what was described in section 4.1 for LGP, the amplitude of the head falling movement is larger when a single visual cue is available (blue curve) than when both visual cues are present (red curve).
Unlike in LGP, however, in the spoken modality all curves start at a similar height and the head moves downwards in all sentence types. In yes-no questions with one visual cue, there is a clear difference between LGP and the spoken modality: in the latter, the falling head movement starts from the very beginning of the production, contrasting with the delayed falling movement in the same sentence type in LGP.
A comparison of the shape of the curves in the spoken modality with those observed in LGP (Figure 7) reveals that, unlike in LGP, in the spoken modality there is no shape variation: all sentence types exhibit an initial peak, followed by a falling movement and a final rising movement corresponding to the retraction phase (McNeill, 1992). This leads us to hypothesize that, because visual information is less relevant than auditory information for speakers, it is less distinctive than in LGP. Nevertheless, the highest displacement values of the head movement in the spoken modality are attained in the same time-window as in LGP (500–850 ms), independently of the sentence type. In the spoken modality, this is probably time-aligned with the nuclear prosodic word. If so, this would be in line with our hypothesis that the head, in LGP, is a cue for the prosodic nucleus. However, as mentioned above, this requires further analysis to check the exact time-alignment of the head movements with speech.
These findings lead us to conclude that the degree of head vertical displacement is similar in LGP and in the spoken modality, which goes against our initial hypothesis that the magnitude of the head movement would be smaller in the spoken modality, given the presence of contrasting auditory cues across sentence types. The difference across modalities seems to lie in the complexity of the shape of the head movement – more complex in LGP than in the spoken modality – which seems to be related to the weight of visual cues in the signal. Namely, in the spoken modality visual cues are not as relevant as auditory cues for conveying prosodic meaning, whereas they are critical in the signed modality.
4.3. LGP versus Spoken modality of Portuguese
Figure 10 depicts the kinematics of head movement per sentence type across modalities.
To examine the degree of similarity in the kinematics of the head movement between modalities (with both modalities showing a wider amplitude when the eyebrows are not involved), we ran a GAMM with Modality (spoken versus signed) and Sentence type as predictors, with the spoken modality as the reference level. Because the mixed models run for each sample had exhibited a better fit than the models without the random factor, participant was again included as a random factor (Model 1). Then, the interaction Sentence type*Modality was also modeled over time, in addition to Modality and Sentence type as predictors (Model 2), as we were interested in whether the head vertical displacement differs across sentence types depending on the modality. Model 2, with the interaction term, showed a better fit than Model 1 (fREML = 39913 versus fREML = 40051), explaining 17.3% of the deviance (versus 15% in Model 1). We thus report the results of Model 2. We found that head vertical displacement differs between modalities (β = −2.03, SE = 0.42, t = −15.52, p < .001), and that the interaction Sentence type*Modality shows a significant effect. This means that the difference in head vertical displacement between statements and yes-no questions with two visual cues is not the same across modalities (β = 2.68, SE = 0.67, t = 4.00, p < .001). Similarly, the contrast between yes-no questions with one visual cue and with two visual cues differs between LGP and the spoken modality (β = 14.00, SE = 0.84, t = 16.59, p < .001).
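This cross-modality model can be sketched as follows (continuing the illustrative naming from section 3.3.2, with an added 'modality' column):

    # Sketch of the cross-modality GAMM: parametric Sentence type x Modality
    # interaction, one smooth of time per condition, and participant as a
    # random factor (illustrative column names, as in the earlier sketches).
    library(mgcv)

    dat$modality <- relevel(factor(dat$modality), ref = "spoken")
    dat$cond <- interaction(dat$stype, dat$modality)  # Sentence type x Modality

    m_int <- bam(displ ~ stype * modality + s(time, by = cond) +
                   s(participant, bs = "re"),
                 data = dat, method = "fREML")
    summary(m_int)  # interaction terms test whether sentence-type contrasts
                    # in displacement differ across modalities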
As shown in Figure 10, head amplitude is larger in LGP than in the spoken modality of Portuguese. This finding, together with the significantly different kinematics per sentence type across modalities, strongly suggests that, although both modalities make use of head movement to convey interrogativity in phonologically relevant ways for both grammars, head movement displays realizational differences, being more prominent in LGP.
5. Discussion
The main goal of this study was to examine whether (and how) the amplitude of head movements conveys interrogativity, both in Portuguese Sign Language (LGP) and in the spoken modality of European Portuguese. In light of findings from previous work (Cruz, Swerts & Frota, 2019), we hypothesized that the amplitude of the head falling movement in LGP would be larger in yes-no questions produced without eyebrow movement than in statements, thus functioning as a disambiguating cue and taking on the role of question marker. Also in light of previous findings (Cruz, Swerts & Frota, 2015, 2017), we hypothesized that in the spoken modality no difference in the amplitude of head movements between statements and yes-no questions would be observed, given the contrasting auditory (intonational) cues across sentence types.
In LGP, we found that the head falling movement as the single visual cue for yes-no questions was significantly more pronounced than the same visual cue in statements, showing that these utterances were not ambiguous in production and that interrogativity does seem to be conveyed by a larger degree of head displacement. Importantly, this feature was not limited to potentially ambiguous sentences in LGP. When eyebrow lowering and the head movement co-occurred, we also found a significantly wider head amplitude than in statements. These findings suggest that the presence of eyebrow movement might not be as important as the head movement in conveying interrogativity in LGP, probably because eyebrow movement does not vary across question types in LGP, unlike in the majority of signed languages (Dachkovsky & Sandler, 2009; Dachkovsky, Healy & Sandler, 2013; Pfau & Quer, 2010; Sandler, 2005; Zeshan, 2004). Moreover, a smaller head displacement was found in yes-no questions with eyebrow lowering than in yes-no questions without the eyebrow lowering cue. Therefore, our hypothesis that in the absence of the eyebrow cue the head movement would take on the role of question marker seems to be confirmed, which reinforces the dynamicity and multidimensionality of the head gesture (Wagner, Malisz & Kopp, 2014) and shows that the combination of nonmanuals is a complex and dynamic mechanism, subject to a relative weighting across nonmanuals.
This dynamicity of the head movement in LGP can also be observed in its shape over time, especially in yes-no questions produced with the eyebrow lowering cue. The more complex the nonmanuals, in terms of the number of cues, the more complex the head shape over time. This leads to the interpretation that the head plays a primary role in conveying interrogativity in LGP.
Similarly to LGP, in the spoken modality the head falling movement as the single visual cue for yes-no questions was significantly more pronounced than the same visual cue in statements and in yes-no questions produced with two visual cues (eyebrow raising and head movement). This was unexpected, given previous perception findings showing that speakers rely on the different intonational cues (falling pitch in statements; falling-rising pitch in yes-no questions) to identify sentence types (Cruz, Swerts & Frota, 2017). Thus, the hypothesis that head movement would not play a role in the spoken modality seems not to be confirmed. However, our findings related to the shape of the head movement over time, namely the fact that it is always falling, independently of the sentence type, suggest that head movement is less relevant for conveying interrogativity in the spoken modality than in LGP, indicating that the visual cues used to convey interrogativity might be grammaticalized differently across modalities. These different grammaticalization paths of visual cues in spoken and signed languages are, from our point of view, supported by the biology of each language modality: visual cues in signed languages are vital for communication and fully visible, thus also being recognized faster than words (Emmorey, 2023), whereas in the spoken modality this crucial role is played by the vocal articulators and the related biological codes (Gussenhoven, 2002, 2004).
The current study has three important limitations. Our dataset was produced by a small sample of participants, and thus any generalization of the present findings to wider populations should be made with caution. In addition, given the use of normalized time series, it was not possible to relate the semantic-pragmatic content of speech/signs, beyond sentence type, to the head movement trajectories. Finally, our choice of analysing head movement in the y axis only (because we were focused on its vertical displacement) can also be seen as a limitation, as the head is physically dynamic. Future research is needed to address these shortcomings.
6. Conclusion
By and large, the present findings seem to support the view that gestural features of language and prosody are interconnected or integrated in the marking of semantic and pragmatic meanings (Brown & Prieto, 2021). How these two communication vectors are time-aligned so as to embrace the holistic nature of sociopragmatics is left for future research (Cruz & Frota, in progress). Also of relevance is the investigation of how head and eyebrow movements are time-coordinated with each other (Cruz & Frota, 2023), and whether, similarly to what we observed for the head movement, the amplitude of the eyebrow movement is semantico-pragmatically relevant.
Last but not least, and in line with Sandler (2005), the comparison of visual cues in the two modalities allowed us to suggest that both systems use the degree of head movement to convey interrogativity, albeit with differences in its grammaticalization and realization. Even in the spoken modality, in which auditory cues are preferred over visual cues for identifying sentence types, the head movement varies, suggesting that this feature might be a cross-modality linguistic property with a relevant role in interaction.
These findings also challenge traditional language processing models, which focus mainly on verbal language. Although this was not the focus of the current study, in line with Kelly, Özyürek and Maris (2010), inter alia, we speculate that the amplitude of head movement might be relevant for faster and more accurate identification of sentence types, thus enhancing comprehension. Future studies should address this possibility. Importantly, how these cognitive processes apply and evolve in a communicative context involving both modalities remains an open question, as it is not yet known whether the processing of the interaction between language and gesture differs between spoken and signed languages (Kita & Emmorey, 2023).
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/langcog.2024.63.
Acknowledgments
This research was supported by Fundação para a Ciência e a Tecnologia (UIDB/00214/2020). We thank all the signers and speakers for their collaboration, as well as Associação Portuguesa de Surdos (APS) and the Lisbon Baby Lab, where video recordings took place. Special thanks are also due to the anonymous reviewers for their precious comments and questions, as well as to Pedro Jordão for the blind annotations of the data, and Chao Zhou for helping with the statistical analysis.
Competing interest
The authors declare none.