The distributional representation of a lexical item is typically a vector representing its co-occurrences with linguistic contexts. This chapter introduces the basic notions to construct distributional semantic representations from corpora. We present (i) the major types of linguistic contexts used to characterize the distributional properties of lexical items (e.g., window-based and syntactic collocates and documents), (ii) their representation with co-occurrence matrices, whose rows are labeled with lexemes and columns with contexts, (iii) mathematical methods to weight the importance of contexts (e.g., Pointwise Mutual Information and entropy), (iv) the distinction between high-dimensional explicit vectors and low-dimensional embeddings with latent dimensions, (v) dimensionality reduction methods to generate embeddings from the original co-occurrence matrix (e.g., Singular Value Decomposition), and (vi) vector similarity measures (e.g., cosine similarity).
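As a hedged illustration of this pipeline, the following sketch builds a toy word-by-context co-occurrence matrix, weights it with Positive PMI, reduces it with truncated SVD, and compares the resulting embeddings with cosine similarity. The words, contexts, and counts are invented for exposition and are not taken from the chapter.

```python
# Minimal sketch of the pipeline described above: co-occurrence counts ->
# PPMI weighting -> SVD-based embeddings -> cosine similarity.
# The toy words, contexts, and counts are invented for illustration.
import numpy as np

words = ["dog", "cat", "car"]
contexts = ["bark", "pet", "drive", "road"]
# Co-occurrence matrix: rows are lexemes, columns are contexts.
C = np.array([[8.0, 6.0, 0.0, 1.0],
              [1.0, 7.0, 0.0, 0.0],
              [0.0, 0.0, 9.0, 5.0]])

# Positive Pointwise Mutual Information weighting.
total = C.sum()
p_w = C.sum(axis=1, keepdims=True) / total   # P(word)
p_c = C.sum(axis=0, keepdims=True) / total   # P(context)
p_wc = C / total                             # P(word, context)
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log2(p_wc / (p_w * p_c))
ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)

# Low-dimensional embeddings via truncated Singular Value Decomposition.
U, S, Vt = np.linalg.svd(ppmi, full_matrices=False)
k = 2
embeddings = U[:, :k] * S[:k]                # one k-dimensional vector per word

# Cosine similarity between distributional vectors.
def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(embeddings[0], embeddings[1]))  # dog vs. cat: high
print(cosine(embeddings[0], embeddings[2]))  # dog vs. car: low
```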
This chapter contains a synoptic view of the different types and generations of distributional semantic models (DSMs), including the distinction between static and contextual models. Part II then focuses on static DSMs, since they remain the best-known and most widely studied family of models, and they learn context-independent distributional representations that are useful for several linguistic and cognitive tasks.
Neural machine translation is not neutral. The increased linguistic fluency and naturalness that are the hallmark of neural machine translation sometimes run the risk of trans-creation, which bends the true meaning of the source text to accommodate the conventionalized, preferred use and interpretation of concepts, terms, and expressions in the target language and cultural system. This chapter explores the cultural and linguistic bias of neural machine translation of English educational resources on mental health and well-being, highlighting the urgent need to develop and redesign machine translation systems to produce more neutral and balanced outputs for global end users, especially people from vulnerable social backgrounds.
Access to healthcare profoundly impacts the health and quality of life of Deaf people. Automatic translation tools are crucial in improving communication between Deaf patients and their healthcare providers. The aim of this chapter is to present the pipeline used to create the Swiss-French Sign Language (LSF-CH) version of BabelDr, a speech-enabled fixed-phrase translator initially conceived to improve communication in emergency settings between doctors and allophone patients (Bouillon et al., 2021). To do so, we explain how we ported BabelDr to LSF-CH using both human and avatar videos. We first describe the creation of a reference corpus consisting of video translations produced by human translators, and then present a second corpus of videos generated with a virtual human. Finally, we report the findings of a questionnaire on Deaf users’ perspectives on the use of signing avatars in the medical context. We show that, although respondents prefer human videos, the use of automatic technologies associated with virtual characters is not without interest to the target audience and can be useful to them in the medical context.
Distributional semantics develops theories and methods to represent the meaning of natural language expressions, with vectors encoding their statistical distribution in linguistic contexts. It is at once a theoretical model to express meaning, a practical methodology to construct semantic representations, a computational framework for acquiring meaning from language data, and a cognitive hypothesis about the role of language usage in shaping meaning. This book aims to build a common understanding of the theoretical and methodological foundations of distributional semantics. Beginning with its historical origins, the text exemplifies how the distributional approach is implemented in distributional semantic models. The main types of computational models, including modern deep learning ones, are described and evaluated, demonstrating how various types of semantic issues are addressed by those models. Open problems and challenges are also analyzed. Students and researchers in natural language processing, artificial intelligence, and cognitive science will appreciate this book.
Digital health translation is an important application of machine translation and multilingual technologies, and there is a growing need for accessibility in digital health translation design for disadvantaged communities. This book addresses that need by highlighting state-of-the-art research on the design and evaluation of assistive translation tools, along with systems to facilitate cross-cultural and cross-lingual communication in health and medical settings. Using case studies as examples, the principles of designing assistive health communication tools are illustrated. These are (1) detectability of errors to boost user confidence by health professionals; (2) customizability for health and medical domains; (3) inclusivity of translation modalities to serve people with disabilities; and (4) equality of accessibility standards for localised multilingual websites of health content. This book will appeal to readers from natural language processing, computer science, linguistics, translation studies, public health, media, and communication studies. This title is available as open access on Cambridge Core.
Space and time representation in language is important in linguistics and cognitive science research, as well as artificial intelligence applications like conversational robots and navigation systems. This book is the first for linguists and computer scientists that shows how to do model-theoretic semantics for temporal or spatial information in natural language, based on annotation structures. The book covers the entire cycle of developing a specification for annotation and the implementation of the model over the appropriate corpus for linguistic annotation. Its representation language is a type-theoretic, first-order logic in shallow semantics. Each interpretation model is delimited by a set of definitions of logical predicates used in semantic representations (e.g., past) or measuring expressions (e.g., counts or k). The counting function is then defined as a set and its cardinality, involving a universal quantification in a model. This definition then delineates a set of admissible models for interpretation.
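As an illustrative rendering of how such a model-delimiting definition can look, a counting predicate may be defined set-theoretically as the cardinality of a set of satisfying entities. The notation below is an assumption for exposition, not quoted from the book.

```latex
% Hedged sketch: a counting expression defined as the cardinality of a set
% inside an interpretation model M. Notation is an illustrative assumption,
% not the book's own definition.
\[
  \mathrm{count}_{M}(P) \;=\; \bigl|\,\{\, x \in D_{M} \mid M \models P(x) \,\}\,\bigr|
\]
% A measuring condition such as "at least k" then constrains which models
% are admissible for interpreting the annotated expression:
\[
  M \models \mathrm{count}(P) \geq k
  \quad\text{iff}\quad
  \mathrm{count}_{M}(P) \geq k .
\]
```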
Temporal prepositions trigger various temporal relations over events and times. In this chapter, I categorize such temporal relators into five types: (i) anchoring (at, in), (ii) ordering (before, after), (iii) metric (for, in), (iv) bounding (from -- till), and (v) orienting (time interval + before, after). These temporal relators are analyzed with respect to the tripartite temporal configurations <E,R,T>, where E is a set of eventualities, R is a set of temporal relators, and T is a set of associated temporal structures, which subsume metric structures. Temporal relators combine with temporal expressions to form temporal adjuncts, either simple or complex. Complex temporal adjuncts introduce time intervals as nonconsuming tags in annotation, while relating eventualities to temporal structures. Each temporal relator r in R combines with a temporal structure t in T as its argument to form a temporal adjunct, while relating an eventuality e in E of various aspectual types, such as state, process, or transition, to an appropriate temporal structure t in T. This chapter clarifies such temporal relations by annotating and interpreting event and temporal base structures and their relations.
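As a hedged illustration of the tripartite configuration, an anchoring relator such as at relates an eventuality in E to a temporal structure in T, which can be rendered in Neo-Davidsonian shallow semantics roughly as below. The sentence, predicate names, and formula are assumptions for exposition, not the chapter’s own notation.

```latex
% Hedged sketch: the anchoring relator "at" relating an eventuality e in E
% to a temporal structure t in T, for a sentence like "Mia arrived at noon".
% Predicate names are illustrative assumptions, not the chapter's notation.
\[
  \exists e\, \exists t\,
    [\, \mathrm{arrive}(e) \wedge \mathrm{agent}(e, \mathrm{mia})
      \wedge \mathrm{noon}(t) \wedge \tau(e) \subseteq t \,]
\]
```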
In this chapter, I explain how TimeML, a specification language for the annotation of event-associated temporal expressions in language, was normalized as an ISO international standard, known as ISO-TimeML, with some modifications. ISO’s working group developed TimeML into an ISO standard on event-associated temporal annotation by making four modifications to TimeML: (i) abstract specification of the annotation scheme, (ii) adoption of standoff annotation, (iii) merging of the two tags <EVENT/> and <MAKEINSTANCE/> into a single tag <EVENT/>, and (iv) treating duration (e.g., two hours) as measurement. Following Bunt’s (2010) proposal for the construction of semantic annotation schemes and his subsequent work, I then formalize ISO-TimeML by presenting a metamodel for its design, an abstract syntax for formulating its specification language in set-theoretic terms, and an XML-based concrete syntax. I also analyze the base structures in the annotation structures of the normalized TimeML as consisting of two substructures, anchoring and content structures.
This chapter formulates an annotation-based semantics (ABS) for the annotation and interpretation of temporal and spatial information in language. It consists of two modules, one for representation and another for interpretation. The representation module consists of a type-theoretic first-order logic with a small set of merge operators. The theory of types is based on an extended list of basic types, which treats eventualities and points of time and space as basic types besides the two basic types e for entities and t for truth-values. These types extend Neo-Davidsonian semantics to all types of objects, including paths and vectors (trajectories) triggered by motions. The merge operators in ABS allow the compositional process of combining the semantic representations of base structures into those of the link structures that combine them, without depending on complex lambda operations. ABS adopts shallow semantics to represent complex structures of eventuality or quantification with simple logical predicates defined as part of admissible interpretation models.
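To make the role of the merge operators concrete, here is a hedged sketch of how the semantic forms of an event base structure and a time base structure might be combined through a temporal link, with existential closure applied at the end rather than lambda abstraction. The operator symbol, predicate names, and link notation are assumptions for illustration, not the book’s own formulas.

```latex
% Hedged sketch of ABS-style composition: the semantic forms of an event
% base structure e1 ("left") and a time base structure t1 ("in 2020") are
% merged through a temporal link without lambda abstraction; existential
% closure over e1 and t1 is applied at the end. Operator and predicate
% names are illustrative assumptions, not the book's own notation.
\begin{align*}
  \sigma(e_1) &= \mathrm{leave}(e_1) \wedge \mathrm{past}(e_1)\\
  \sigma(t_1) &= \mathrm{year}(t_1) \wedge \mathrm{value}(t_1, 2020)\\
  \sigma(\langle e_1, t_1, \mathrm{isIncluded}\rangle)
      &= \sigma(e_1) \oplus \sigma(t_1) \oplus [\,\tau(e_1) \subseteq t_1\,]
\end{align*}
```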
The task of semantics for annotation is twofold. First, semantics validates the construction of a syntax for the generation of well-formed annotation structures. Second, semantics provides a formalism for interpreting those annotation structures that are generated by the syntax. In this chapter, my main concern is to present a general view of what kind of semantics is needed to interpret annotation structures and to lay the groundwork for constructing an interpretation scheme for temporal and spatial annotation. This semantics follows the ordinary steps of model-theoretic formal semantics, such as Montague semantics. It goes through an intermediate step of representing semantic content or denotations in logical forms and then interprets them with respect to a model with truth definitions.
Data can be segmented into minimal units. Such a process is called base segmentation. In this chapter, I discuss three types of base segmentation of language data, depending on its three media types: phoneme segmentation, image segmentation, and text segmentation. These minimal units can then be grouped into larger units. Base-segmented text, for instance, undergoes tokenization, annotated segmentation such as word segmentation, and chunking with POS-tagging. The semantic annotation of language data, whether written, spoken, or visualized, requires the target data to be segmented and preferably annotated with appropriate morpho-syntactic information.
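As a hedged illustration of base segmentation and tokenization for text, the following sketch splits a short string into sentence units and then into word tokens using plain regular expressions. The patterns and the example sentence are deliberate simplifications for exposition, not the segmentation procedure discussed in the chapter.

```python
# Minimal sketch of base segmentation for text: splitting a raw string into
# sentence units and then into word tokens. The regular expressions are
# deliberate simplifications for exposition, not a full tokenizer.
import re

raw = "Mia left Boston in 2020. She arrived in Geneva two days later."

# Sentence-level segmentation on sentence-final punctuation.
sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", raw) if s.strip()]

# Word-level tokenization: words, numbers, or single punctuation marks.
tokens = [re.findall(r"\w+|[^\w\s]", s) for s in sentences]

for sent, toks in zip(sentences, tokens):
    print(sent)
    print(toks)
```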
This chapter shows how ISO-Space evolved from MITRE’s SpatialML. Both present a specification language for the annotation of spatial information about geographical names and landmarks, as well as directional information involving orientations. Unlike SpatialML, ISO-Space extended its scope to motions and motion-triggered dynamic paths. ISO-Space also generalized distance measures to measures of other types and dimensions, so that spatial annotation can be integrated with other semantic annotation schemes such as temporal annotation (e.g., ISO-TimeML). I also discuss how spatial relators, called signals, are enriched with fine-grained specifications, especially those related to directional or orientational configurations involving frames of reference. SpatialML is a compact and very simple annotation scheme that is easily mappable to other geospatial annotation schemes. In contrast, ISO-Space is more expressive and complex than SpatialML, meeting the semantic needs of interpreting complex spatial language and the computational needs of applications envisioned in the coming years.
In this chapter, I discuss how the abstract syntax for a semantic annotation scheme is modeled on a formal grammar of language. I design a semantic annotation scheme as consisting of three components: a (nonempty) set of annotation structures, a syntax, and a semantics. The metatheoretic syntax formally defines, or generates, annotation structures, each of which consists of base structures and link structures. The semantics interprets annotation structures, while validating the formulation of the syntax. I also discuss how the specification of attribute-value assignments determines the well-formedness of annotation structures and their substructures, anchoring and content (feature) structures. This chapter focuses on the formulation of an abstract syntax.
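As a hedged set-theoretic sketch of this three-component view, a scheme and its annotation structures might be written roughly as below. The symbols are assumptions for illustration, not the chapter’s own formulation.

```latex
% Hedged sketch of the three-component view of a semantic annotation scheme,
% and of an annotation structure as base structures plus link structures.
% The symbols are illustrative assumptions, not the chapter's notation.
\[
  \mathrm{Scheme} = \langle \mathcal{A},\ \mathrm{Syn},\ \mathrm{Sem} \rangle,
  \qquad
  a = \langle B, L \rangle \in \mathcal{A},
\]
\[
  \text{where each base structure } b \in B \text{ is a pair }
  \langle \mathrm{anchor}(b),\ \mathrm{content}(b) \rangle
  \text{ and each link } l \in L \text{ relates structures in } B .
\]
```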
Viewing ontology as a science of things, this chapter treats times as real objects in the world. Such a view of the ontology of times, called temporal ontology, conforms to Neo-Davidsonian semantics and to type-theoretic semantics, which treats time points as one of the basic types that include individual objects, events, and spatial points. It is thus designed to provide a sound basis for the development of a semantics for the annotation and interpretation of event-based temporal information in language. In this chapter, I first introduce the OWL-Time ontology, which classifies temporal entities into instants and intervals. I then introduce an interval temporal calculus with 13 base relations over time points and intervals. I also discuss how eventualities are temporalized so as to be treated as denoting time intervals. Eventualities are then temporally related to times. To apply the notions of time points and intervals to the interpretation of the tenses and aspects of language, especially the progressive aspect and the present perfect aspect, I define the notion of neighborhood and apply it to the definition of the present perfect as denoting the neighborhood of the present moment.
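One plausible way to render such a neighborhood-based reading of the present perfect is sketched below; the specific interval shape and the formula are assumptions for illustration, not the chapter’s own definition.

```latex
% Hedged sketch of a neighborhood-based reading of the present perfect:
% the relevant time is an interval surrounding the present moment n.
% The interval shape and the formula are illustrative assumptions, not the
% chapter's own definition.
\[
  \mathrm{neighborhood}(n) \;=\; (n - \epsilon,\ n + \epsilon],
  \qquad
  \textit{Mia has arrived} \;\leadsto\;
  \exists e\, [\mathrm{arrive}(e) \wedge \tau(e) \subseteq \mathrm{neighborhood}(n)] .
\]
```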