The storage and retrieval of scientific texts were early applications of computers, and by the early 1960s, schemes for automatic indexing and abstracting had emerged (e.g., Doyle, 1965; Luhn, 1957, 1958; O'Connor, 1964; Tasman, 1957). As online systems emerged in the 1960s and 1970s, more databases and new search features were created to give professional intermediaries more power in searching for information. Searching in online systems was complex, and so intermediaries created systematic strategies for eliciting users' needs; selecting terms, synonyms, and morphological variants appropriate to the need and the system; using Boolean operators to formulate precise queries; restricting those queries to specific database fields; forming intermediate sets of results; manipulating those sets; and selecting appropriate display formats. The strategies and tactics that professional intermediaries use are meant to maximize retrieval effectiveness while minimizing online costs. These strategies are goal oriented and systematic and are termed analytical strategies. In this chapter, we describe several analytical strategies to illustrate how electronic environments have changed information seeking by allowing searchers to systematically manipulate large sets of potentially relevant documents. These strategies in turn influenced subsequent designs of online systems. Next we look at studies of novice users working with various online systems, showing how difficult analytical strategies are to learn and apply, and the need for electronic systems that support informal information-seeking strategies for end users.
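The set-oriented flavour of these analytical strategies can be pictured with a small sketch. The following Python fragment is purely illustrative (the miniature document collection, the field names, and the queries are invented, not drawn from the text); it mimics how an intermediary might form intermediate result sets and combine them with Boolean operators.

```python
# Minimal sketch of Boolean, set-oriented searching over an invented collection.

documents = {
    1: {"title": "Automatic indexing of scientific texts", "year": 1962},
    2: {"title": "Abstracting by computer", "year": 1958},
    3: {"title": "Manual cataloguing practice", "year": 1965},
}

def search(term, field="title"):
    """Return the set of document ids whose field contains the term."""
    return {doc_id for doc_id, doc in documents.items()
            if term.lower() in str(doc[field]).lower()}

# Form intermediate sets, then manipulate them with Boolean operators.
s1 = search("indexing")
s2 = search("abstracting")
s3 = search("manual")

result = (s1 | s2) - s3    # (indexing OR abstracting) NOT manual
print(sorted(result))      # -> [1, 2]
```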
Marco Polo had the opportunity of acquiring a knowledge, either by his own observation or what he collected from others, of so many things, until his time unknown.
The Travels of Marco Polo
The laws of behavior yield to the energy of the individual.
Emerson, Essays, Second Series: Manners
In contrast with the formal, analytical strategies developed by professional intermediaries, information seekers also use a variety of informal, heuristic strategies. These informal, interactive strategies are clustered together under the term browsing strategies. In general, browsing is an approach to information seeking that is informal and opportunistic and depends heavily on the information environment. Four browsing strategies are distinguished in this chapter: scanning, observing, navigating, and monitoring. The term browsing reflects the general behavior that people exhibit as they seek information by using one of these strategies.
Browsing is a natural and effective approach to many types of information-seeking problems. It is natural because it coordinates human physical, emotive, and cognitive resources in the same way that humans monitor the physical world and search for physical objects. It can be effective because environments, and particularly human-created environments, are generally organized and highly redundant; this is especially true of information environments designed according to organizational principles. Browsing is particularly effective for information problems that are ill defined or interdisciplinary and when the goal of information seeking is to gather overview information about a topic or to keep abreast of developments in a field.
One of the fundamental problems of lexical semantics is the fact that what C. Ruhl (1989) calls the ‘perceived meaning’ of a word can vary so greatly from one context to another. In this chapter I want to survey the ways in which the contribution the same grammatical word makes to the meaning of a larger unit may differ in different contexts. There are two main sources of explanatory hypotheses for contextual variations in word meaning: lexical semantics and pragmatics. While there are probably no contexts where each of these is not involved in some way, their relative contributions can vary. For instance, in the following examples the difference between 1 and 2 in respect of the interpretation of the word teacher (i.e., “male teacher” and “female teacher”, respectively) can be accounted for entirely by differential contextual enrichment of a single lexical meaning for teacher (in other words, pragmatically):
1. The teacher stroked his beard.
2. Our maths teacher is on maternity leave.
The only involvement of lexical semantics here is that the specification of the meaning of teacher must somehow make it clear that although it is unspecified for sex, it is, unlike, say, chair, specifiable for sex. Examples 3 and 4 exemplify a slightly different type of contextual enrichment, in that the extra specificity in context is of a meronymous rather than a hyponymous type:
3. John washed the car.
4. The mechanic lubricated the car.
The different actions performed on the car allow us to infer that the agents were occupied with different parts of the car in each case.
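The earlier point that teacher is unspecified, yet specifiable, for sex can be pictured as an underspecified lexical entry whose open value is filled in pragmatically. The sketch below is only an illustration of that idea, not the author's formalism; the feature names are invented.

```python
# Illustrative only: an underspecified feature that context may enrich.

teacher = {"category": "noun", "human": True, "sex": None}   # specifiable, left unspecified
chair   = {"category": "noun", "human": False}                # not specifiable for sex at all

def enrich(entry, feature, value):
    """Pragmatic enrichment: fill in a feature the lexicon leaves open."""
    if feature not in entry:
        raise ValueError("feature not specifiable for this entry")
    enriched = dict(entry)
    enriched[feature] = value
    return enriched

# "The teacher stroked his beard."  -> context supplies the value.
print(enrich(teacher, "sex", "male")["sex"])   # male
```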
This chapter builds on the title and theme of Apresjan's 1974 paper, ‘Regular Polysemy’. Apresjan was concerned merely to define the phenomenon and identify where it occurred. Here, we shall explore how it can be exploited.
Regular polysemy occurs where two or more words each have two senses, and all the words exhibit the same relationship between the two senses. The phenomenon is also called ‘sense extension’ (Copestake & Briscoe, 1991), ‘semantic transfer rules’ (Leech, 1981), ‘lexical implication rules’ (Ostler & Atkins, 1991), or simply ‘lexical rules’. An example, taken directly from a dictionary (Longman Dictionary of Contemporary English, hereafter LDOCE) is:
gin (a glass of) a colourless strong alcoholic drink …
martini (a glass of) an alcoholic drink …
In each case, two senses are referred to, one with the ‘bracketed optional part’ included in the definition and the other with it omitted; the relation between the two is the same in both cases.
Recent work on lexical description has stressed the need for the structure of a lexical knowledge base (LKB) to reflect the structure of the lexicon (Atkins & Levin, 1991) and for the LKB to incorporate productive rules, so the rule-bound ways in which words may be used are captured without the lexicon needing to list all options for all words (Boguraev & Levin, 1990). These arguments suggest that generalizations regarding regular polysemy should be expressed in the LKB, and that the formalism in which the LKB is written should be such that, once the generalization is stated, the specific cases follow as consequences of the inference rules of the formalism.
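A rough way to picture how a single stated generalization can yield the specific cases is as a productive lexical rule applied to every entry of the right type. The Python sketch below is only illustrative of the idea, not the LKB formalism; the entries are simplified from the LDOCE-style definitions above.

```python
# Illustrative sketch: one "drink -> portion of drink" rule generates the
# second sense for every qualifying entry, instead of listing it per word.

lexicon = {
    "gin":     {"type": "drink", "gloss": "a colourless strong alcoholic drink"},
    "martini": {"type": "drink", "gloss": "an alcoholic drink"},
}

def portion_sense(entry):
    """Regular-polysemy rule: a drink word also denotes a glass of that drink."""
    if entry["type"] != "drink":
        return None
    return {"type": "portion", "gloss": "a glass of " + entry["gloss"]}

for word, entry in lexicon.items():
    senses = [entry, portion_sense(entry)]
    print(word, [s["gloss"] for s in senses if s])
```

In an LKB proper, the same generalization would be stated once as a lexical rule over typed feature structures, with the specific senses following from the inference rules of the formalism.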
“We want the ring. Without the ring there can be no wedding. May we break the finger?”
“Old age,” said Sir Benjamin, breathing deeply and slowly, “is a time when deafness brings its blessings. I didn't hear what you said then. You have another chance.”
“May we break the finger?” asked Ambrose again. “We could do it with a hammer.”
“I thought, sir,” said Sir Benjamin, “that those were the words. The world is all before you. By God, my hiccoughs have gone, and no wonder. As for ‘break’, ‘break’ is a trull of a word, it will take in everything. Waves, dawns, news, wind, hearts, banks, maidenheads. But never dream of tucking into the same predicate my statues as object and that loose-favoured verb. That would be a most reprehensible solecism.”
The Eve of Saint Venus, Anthony Burgess
Introduction
We are interested in how a verb's description changes as it evolves into different usages. In order to examine this issue we are attempting to combine a LEXICAL ANALYSIS of a particular verb with a CONCEPTUAL ANALYSIS. LEXICAL ANALYSIS is aimed at producing a LEXICAL DEFINITION which functions as a linguistic paraphrase. This should ideally be a lexical decomposition composed of terms with simpler, “more primitive” meanings than the term being defined (Mel'čuk 1988; Wierzbicka 1984). In contrast, a conceptual analysis produces a CONCEPTUAL DESCRIPTION of the lexical item.
Criteria for building a conceptual description are linked to arguments borrowed from “real-world” considerations or expressed in terms of denotations.
A major motivation for the introduction of default inheritance mechanisms into theories of lexical organization has been to account for the prevalence of the family of phenomena variously described as blocking (Aronoff, 1976:43), the elsewhere condition (Kiparsky, 1973), or preemption by synonymy (Clark & Clark, 1979:798). In Copestake and Briscoe (1991) we argued that productive processes of sense extension also undergo the same process, suggesting that an integrated account of lexical semantic and morphological processes must allow for blocking. In this chapter, we review extant accounts which follow from theories of lexical organization based on default inheritance, such as Paradigmatic Morphology (Calder, 1989), DATR (Evans & Gazdar, 1989), ELU (Russell et al., 1991, in press), Word Grammar (Hudson, 1990; Fraser & Hudson, 1992), or the LKB (Copestake 1992; this volume; Copestake et al., in press). We argue that these theories fail to capture the full complexity of even the simplest cases of blocking and sketch a more adequate framework, based on a non-monotonic logic that incorporates more powerful mechanisms for resolving conflict among defeasible knowledge resources (Common-sense Entailment, Asher & Morreau, 1991). Finally, we explore the similarities and differences between various phenomena which have been intuitively felt to be cases of blocking within this formal framework, and discuss the manner in which such processes might interact with more general interpretative strategies during language comprehension. Our presentation is necessarily brief and rather informal; we are primarily concerned to point out the potential advantages of using a more expressive default logic for remedying some of the inadequacies of current theories of lexical description.
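To make blocking concrete, here is a toy sketch (mine, not the authors') in which a productive sense-extension rule is suppressed whenever a lexicalized synonym already exists; the animal-to-meat extension and the pig/pork pair are standard illustrations from this literature rather than examples taken from the chapter itself.

```python
# Toy sketch of preemption by synonymy: a productive sense extension is
# blocked when the lexicon already lists a word for the derived sense.

lexicon = {
    "pig":     {"type": "animal"},
    "pork":    {"type": "meat", "source": "pig"},   # lexicalized meat sense
    "chicken": {"type": "animal"},
}

def meat_sense(word):
    """Animal-to-meat extension, suppressed by an existing lexicalized synonym."""
    if lexicon[word]["type"] != "animal":
        return None
    blocked = any(e.get("type") == "meat" and e.get("source") == word
                  for e in lexicon.values())
    return None if blocked else {"type": "meat", "source": word}

print(meat_sense("pig"))       # None: blocked by "pork"
print(meat_sense("chicken"))   # {'type': 'meat', 'source': 'chicken'}
```

The chapter's point is precisely that such all-or-nothing suppression is too crude; the framework it sketches treats the block as defeasible within a more expressive default logic.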
The work reported here is part of research on the ACQUILEX project which is aimed at the eventual development of a theoretically motivated, but comprehensive and computationally tractable, multilingual lexical knowledge base (LKB) usable for natural language processing, lexicography and other applications. One of the goals of the ACQUILEX project was to demonstrate the feasibility of building an LKB by acquiring a substantial portion of the information semi-automatically from machine readable dictionaries (MRDs). We have paid particular attention to lexical semantic information. Our work therefore attempts to integrate several strands of research:
• Linguistic theories of the lexicon and lexical semantics. In this chapter we will concentrate on the lexical semantics of nominals where our treatment is broadly based on that of Pustejovsky (1991), and in particular on his concepts of the generative lexicon and of qualia structure.
• Knowledge representation techniques. The formal lexical representation language (LRL) used in the ACQUILEX LKB system is based on typed feature structures similar to those of Carpenter (1990, 1992), augmented with default inheritance and lexical rules. Our lexicons can thus be highly structured, hierarchical and generative.
• Lexicography and computational lexicography. The work reported here makes extensive use of the Longman Dictionary of Contemporary English (LDOCE; Procter, 1978). MRDs do not just provide data about individual lexical items; our theories of the lexicon have been developed and refined by considering the implicit organization of dictionaries and the insights of lexicographers.
In this chapter we will show how these strands can be combined in developing an appropriate representation for group nouns in the LRL, and in extracting the requisite information automatically from MRDs.
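As a very rough illustration of the kind of structured entry involved, the sketch below encodes a group noun with a nested, qualia-like block as plain Python dictionaries; the attribute names are invented for exposition and do not reproduce the ACQUILEX LRL types.

```python
# Illustrative only: a nested feature structure for a group noun such as
# "committee", with a qualia-like block and a slot for the member type.

committee = {
    "orth": "committee",
    "cat": "noun",
    "semantics": {
        "individuation": "group",      # denotes a collection
        "member_type": "human",        # what the group is made up of
        "qualia": {
            "constitutive": "set of members",
            "telic": "deliberate, decide",
            "agentive": "appointed or elected",
        },
    },
}

def members_of(entry):
    """Follow a path through the feature structure."""
    return entry["semantics"]["member_type"]

print(members_of(committee))   # human
```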
Lexical semantics is still in a rather early stage of development. This explains why there are relatively few elaborated systems for representing its various aspects. A number of semantic aspects can be straightforwardly represented by means of feature-value pairs and by means of typed feature structures. Others, such as the Qualia Structure or the lexical semantic relations, are more difficult to represent. The difficulty is twofold: (1) there is first the need to define appropriate models to represent the various levels of semantic information, including their associated possible inference systems and their properties (e.g., transitivity, monotonicity, etc.), and (2) there is the need to develop complex algorithms that allow these models to be exploited as efficiently as possible. The first chapter of this section tackles the first point and the second one addresses some algorithmic problems.
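As one small example of the inference properties mentioned above, hyponymy is normally taken to be transitive; the sketch below, over an invented miniature hierarchy, computes the consequences that such an inference system would license.

```python
# Illustrative sketch: exploiting the transitivity of hyponymy (is-a links).

hypernym = {"spaniel": "dog", "dog": "animal", "animal": "entity"}

def is_a(word, candidate):
    """Transitive is-a check over direct hypernym links."""
    while word in hypernym:
        word = hypernym[word]
        if word == candidate:
            return True
    return False

print(is_a("spaniel", "animal"))   # True, via dog
print(is_a("dog", "spaniel"))      # False: the relation is not symmetric
```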
“Introducing Lexlog”, by J. Jayez, presents a set of specifications for constructing explicit representations for lexical objects in restricted domains. Lexlog offers two types of functions: control functions, which formalize representations and update them in a controlled way, and expression functions, which express different semantic operators and pair these operators with syntactic operators, for example, the trees of the Tree Adjoining Grammar framework. The discussion ends with a detailed presentation of the implementation in Prolog.
The last chapter, “Constraint propagation techniques for lexical semantics descriptions,” by Patrick Saint-Dizier, addresses the problem of propagating large feature structures in parse trees. The motivation is basically to avoid computing intermediate results that later turn out to be useless.
Another major current issue in lexical semantics is the definition and construction of real-size lexical databases that will be used by parsers and generators in conjunction with a grammatical system. Word meaning, terminological knowledge representation and extraction of knowledge from machine readable dictionaries are the main topics addressed. They represent the backbone of lexical semantic knowledge base construction.
The first chapter, “Lexical semantics and terminological knowledge representation” by Gerrit Burkert, shows the practical and formal inadequacies of semantic networks for representing knowledge, and the advantages of using a term subsumption language. The chapter first shows how several aspects of word meaning can be adequately described using a term subsumption language. Then, some extensions are proposed that make the system more suitable for lexical semantics. The formal aspects are strongly motivated by several examples drawn from an in-depth study of terminological knowledge extraction, which is a rather challenging area for lexical semantics.
“Word meaning between lexical and conceptual structure”, by Peter Gerstl, presents a method and a system for introducing world knowledge or domain-dependent knowledge into a lexicon. The meaning of a word is derived from general lexical information on the one hand and from ontological knowledge on the other. The notion of semantic scope is explored on an empirical basis by systematically analyzing the influences involved in natural language expressions. This component has been integrated into the Lilog system developed at IBM Stuttgart.
One of the most difficult areas for research in machine translation (MT) is the representation of meanings in the lexicon. The lexicon plays a central role in any MT system, regardless of the theoretical foundations upon which the system is based. However, it is only recently that MT researchers have begun to focus more specifically on issues that concern the lexicon, e.g., cross-linguistic variations that arise during the mapping between lexical items in the source and target languages.
The traditional approach to constructing dictionaries for MT has been to massage on-line dictionaries that are primarily intended for human consumption. Given that most natural language applications have focused primarily on syntactic information that can be extracted from the lexicon, these methods have constituted a reasonable first-pass approach to the problem. However, it is now widely accepted that MT requires language-independent conceptual information in order to successfully process a wide range of phenomena in more than one language. Thus, the task of constructing lexical entries has become a much more difficult problem as researchers endeavor to extend the concept base to support more phenomena and additional languages.
This chapter describes how parameterization of the lexicon allows an MT system to account for a number of cross-linguistic variations, called divergences, during translation. There are many cases in which the natural translation of one language into another results in a very different form than that of the original. These divergences make the straightforward transfer from source structures into target structures impractical.
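A commonly cited example of such a divergence (offered here for illustration, not necessarily the example used in the chapter) is English like versus Spanish gustar, where experiencer and theme swap grammatical roles. The sketch below shows, in a deliberately simplified way, how a lexical parameter could drive the two mappings from an interlingual predicate; the entry format is invented.

```python
# Simplified, illustrative sketch of a thematic divergence in transfer:
# English "I like music"  vs.  Spanish "me gusta la música"
# (the experiencer surfaces as subject in English but not in Spanish).

lexicon = {
    ("like", "en"):   {"subject": "experiencer", "object": "theme"},
    ("gustar", "es"): {"subject": "theme", "object": "experiencer"},
}

def realize(verb, lang, experiencer, theme):
    """Map interlingual roles onto language-specific grammatical functions."""
    entry = lexicon[(verb, lang)]
    roles = {"experiencer": experiencer, "theme": theme}
    return {"verb": verb,
            "subject": roles[entry["subject"]],
            "object": roles[entry["object"]]}

print(realize("like", "en", experiencer="I", theme="music"))
print(realize("gustar", "es", experiencer="yo", theme="la música"))
```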
This volume on computational lexical semantics emerged from a workshop on lexical semantics issues organized in Toulouse, France, in January 1992. The chapters presented here are extended versions of the original texts.
Lexical semantics is now becoming a major research area in computational linguistics, and it is playing an increasingly central role in various types of applications involving natural language parsers as well as generators.
Lexical semantics covers a wide spectrum of problems from different disciplines, from psycholinguistics to knowledge representation and computer architecture, which makes this field relatively difficult to perceive as a whole. The goal of this volume is to present the state of the art in lexical semantics from a computational linguistics point of view and from a range of perspectives: psycholinguistics, linguistics (formal and applied), computational linguistics, and application development. The following points are particularly developed in this volume:
• psycholinguistics: mental lexicons, access to lexical items, form of lexical items, links between concepts and words, and lexicalizing operations;
• linguistics and formal aspects of lexical semantics: lexical semantic relations, prototypes, conceptual representations, event structure, argument structure, and lexical redundancy;
• knowledge representation: systems of rules, treatment of type coercion, aspects of inheritance, and relations between linguistics and world knowledge;
• applications: creation and maintenance of large-size lexicons, the role of the lexicon in parsing and generation, lexical knowledge bases, and acquisition of lexical data;
• operational aspects: processing models and architecture of lexical systems.
Lexical semantics offers a large variety of uses in natural language processing, and it clearly allows for more refined treatments. One of the main problems is to identify exactly the lexical semantic resources needed to solve a particular problem. Another main difficulty is to know how best to organize this knowledge in order to keep the system reasonably efficient and maintainable; this is particularly crucial for a number of large-scale applications. This volume contains two chapters that explore the application of lexical semantics to natural language generation and to machine translation with an interlingua representation.
The first chapter of this section, “Lexical functions of the Explanatory Combinatorial Dictionary for lexicalization in text generation”, by Margarita Alonso Ramos et al., applies Mel'čuk's framework to natural language generation. It shows that lexicalization, i.e., the mapping between a concept (or a combination of concepts) and its linguistic realization, cannot really be carried out correctly without making reference to a lexicon that takes into account the diversity of lexico-semantic relations. This approach views lexicalization both as a local process (lexicalization is resolved within a restricted phrase) and as a more global one, taking into account the ‘contextual effects’ of a certain lexicalization with respect to the others in a sentence or in a text. Paradigmatic lexical functions are shown to be well adapted to treating lexicalization in the context of a text, whereas syntagmatic ones operate at the sentence or proposition level.
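For readers unfamiliar with Mel'čuk's lexical functions, the fragment below lists a few standard textbook values (they are not drawn from the chapter) and shows how a generator might consult them when lexicalizing: Magn and Oper1 are syntagmatic functions returning collocates, while Syn and S0 are paradigmatic, returning alternative or derived lexicalizations.

```python
# Illustrative values of a few Mel'čuk-style lexical functions
# (standard textbook examples, not drawn from the chapter).

lexical_functions = {
    # syntagmatic: return collocates to be combined with the keyword
    ("Magn", "smoker"):     "heavy",     # Magn = intensifier
    ("Oper1", "attention"): "pay",       # Oper1 = support verb for the agent
    # paradigmatic: return alternative or derived lexicalizations
    ("Syn", "car"):         "automobile",
    ("S0", "decide"):       "decision",  # S0 = nominalization
}

def apply_lf(function, keyword):
    """Look up the value of a lexical function for a given keyword."""
    return lexical_functions.get((function, keyword), "?")

# A generator choosing words might consult the dictionary like this:
print(apply_lf("Magn", "smoker"), "smoker")          # heavy smoker
print(apply_lf("Oper1", "attention"), "attention")   # pay attention
```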
Language comprehension and interpretation can be set within a general cognitive science perspective that encompasses both human minds and intelligent machine systems. This comprehensive view, however, must necessarily entail the search for compatibility between various types of concepts or models subsuming classes of facts obtained through specific methods, and hence belonging to various scientific domains, from artificial intelligence to cognitive psychology, and ranging over linguistics, logic, the neurosciences, and others.
No sub-domain of cognitive science is more fascinating for the exploration of such functional identities between artificial and natural processing or representation and none deserves more to be comparatively worked out than language comprehension or interpretation. The purpose of this chapter is to highlight some of these identities, but also some differences, in particular as concerns lexical semantics.
A large number of concepts are obviously shared by computational semantics and cognitive psychology. I will approach them in this chapter mainly in the form of relational properties, belonging to lexical-semantic, or lexical-conceptual, units, and will classify them according to whether they can be attributed, or not, to both mental and machine representations. For the sake of simplicity I will often restrict this comparison to consideration of lexical units that are expressed by nouns in most natural languages and which, as a rule, denote classes of objects or individuals. But other kinds of lexical-conceptual units, expressed in these natural languages by verbs, adjectivals, prepositions, function words, etc., and which denote events or actions, properties, various sorts of relations, etc., could also be submitted to this type of analysis and comparison.
The basic elements of most recent Natural Language Understanding (NLU) systems are a syntactic parser, which is used to determine sentence structure, and a semantic lexicon, whose purpose is to access the system's factual knowledge from the natural language input. In this regard, the semantic lexicon plays the key role of relating words to world knowledge. But the semantic lexicon is also used in solving some specifically linguistic issues when recovering sentence structure, and should contain linguistic knowledge. In this chapter we discuss the issue of lexical content in terms of linguistic and world knowledge, through the so-called dictionary–encyclopedia controversy. To illustrate the discussion we will describe the lexical semantics approach adopted in our NLU program processing sentences from medical records (Zweigenbaum and Cavazza, 1990). This program is a small-scale but fully implemented prototype adopting a broad view of NLU, from syntactic analysis to complex domain inferences through model-based reasoning (Grishman and Ksiezyk, 1990). The dictionary–encyclopedia controversy opposes two extreme conceptions of word definitions: according to the dictionary approach a word is described in terms of linguistic elements only, without recourse to world knowledge, whereas an encyclopedic definition includes
an indication of the different species or different stages of the object or process denoted by the word, the main types of behavior of this object or process,… (Mel'čuk and Zholkovsky, 1988).
This point has been discussed by many authors including Katz and Fodor (1963), Eco (1984), Wierzbicka (1985) and Taylor (1989).
Recent work in Computational Linguistics shows the central role played by the lexicon in language processing, and in particular by the lexical semantics component. Lexicons tend no longer to be a mere enumeration of feature-value pairs but tend to exhibit intelligent behavior. This is the case, for example, for generative lexicons (Pustejovsky, 1991), which contain, besides feature structures, a number of rules for creating new (partial) definitions of word-senses, such as rules for conflation and type coercion. As a result, the size of lexical entries describing word-senses has substantially increased. These lexical entries become very hard to use directly in a natural language parser or generator because their size and complexity allow a priori little flexibility.
Most natural language systems consider a lexical entry as an indivisible whole which is percolated up the parse/generation tree. Access to features and feature values at the grammar rule level is realized by more or less complex procedures (Shieber, 1986; Johnson, 1990; Günthner, 1988). The complexity of real natural language processing systems makes such an approach very inefficient and not necessarily linguistically adequate. In this chapter, we propose a dynamic treatment of features in grammars, embedded within a Constraint Logic Programming framework (noted as CLP hereafter), which permits us to access a feature-value pair associated with a certain word-sense directly in the lexicon, and only when this feature is explicitly required by the grammar, for example, to make a check. More precisely, this dynamic treatment of features will make use of both constraint propagation techniques embedded within CLP and CLP resolution mechanisms.
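The general idea of fetching a feature value from the lexicon only when a grammar rule actually asks for it can be sketched, in a much simplified form, with lazy lookups. The code below is a Python approximation of that idea, not the CLP mechanism proposed in the chapter; the lexicon contents and class names are invented.

```python
# Simplified illustration of demand-driven feature access: the parser carries
# only a word-sense identifier and fetches features when a rule checks them.

LEXICON = {
    "bank_1": {"cat": "noun", "sem_type": "institution", "countable": True},
    "bank_2": {"cat": "noun", "sem_type": "location", "countable": True},
}

class LazyEntry:
    """Stands in for a full feature structure; resolves features on demand."""
    def __init__(self, sense_id):
        self.sense_id = sense_id
        self._cache = {}

    def feature(self, name):
        if name not in self._cache:          # fetch only when explicitly required
            self._cache[name] = LEXICON[self.sense_id][name]
        return self._cache[name]

# A grammar rule that only needs the category never forces the other features.
entry = LazyEntry("bank_1")
if entry.feature("cat") == "noun":
    print("noun rule applies; sem_type still unfetched:",
          "sem_type" not in entry._cache)
```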