With the emergence of broad-coverage parsers, quantitative evaluation of parsers becomes increasingly important. We propose a dependency-based method for evaluating broad-coverage parsers that offers more meaningful performance measures than previous approaches. We also present a structural pattern-matching mechanism that can be used to eliminate inconsequential differences among different parse trees. Previous evaluation methods have only evaluated the overall performance of parsers; the dependency-based method can also evaluate parsers with respect to different kinds of grammatical relationships or different types of lexical categories. An algorithm for transforming constituency trees into dependency trees is presented, which makes the evaluation method applicable to both constituency grammars and dependency grammars.
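The constituency-to-dependency conversion mentioned above can be sketched with a toy head-percolation scheme. This is not the paper's algorithm: the head table, the tree encoding, and the example sentence are all invented for illustration. Each phrase selects a head child, and the lexical head of every non-head child is attached as a dependent of the phrase's head word.

```python
# Toy head-percolation rules: phrase label -> preferred head children.
# These rules and the tree encoding are illustrative assumptions only.
HEAD_RULES = {
    "S": ["VP"], "VP": ["VBD", "VB"], "NP": ["NN", "NNS"], "PP": ["IN"],
}

def lexical_head(tree):
    """Return the head word of a (label, children) tree; leaves are (tag, word)."""
    label, children = tree
    if isinstance(children, str):                       # leaf: (POS tag, word)
        return children
    prefs = HEAD_RULES.get(label, [])
    head_child = next((c for c in children if c[0] in prefs), children[-1])
    return lexical_head(head_child)

def dependencies(tree, deps=None):
    """Collect (dependent, head) word pairs from every non-head child."""
    if deps is None:
        deps = []
    label, children = tree
    if isinstance(children, str):
        return deps
    head = lexical_head(tree)
    for child in children:
        child_head = lexical_head(child)
        if child_head != head:
            deps.append((child_head, head))
        dependencies(child, deps)
    return deps

# "the cat sat on the mat"
tree = ("S", [("NP", [("DT", "the"), ("NN", "cat")]),
              ("VP", [("VBD", "sat"),
                      ("PP", [("IN", "on"),
                              ("NP", [("DT", "the"), ("NN", "mat")])])])])
print(dependencies(tree))
# → [('cat', 'sat'), ('the', 'cat'), ('on', 'sat'), ('mat', 'on'), ('the', 'mat')]
```

Once both parser output and gold trees are reduced to such (dependent, head) pairs, accuracy can be computed per grammatical relationship or per lexical category, which is the kind of fine-grained evaluation the abstract describes.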
This paper describes two experiments: the first explores the amount of information relevant to sense disambiguation contained in the part-of-speech field of entries in a Machine Readable Dictionary (MRD); the second, more practical, experiment attempts sense disambiguation of all content words in a text, assigning MRD homographs as sense tags using only part-of-speech information. We have implemented a simple sense tagger which successfully tags 94% of words using this method. A plan to extend this work and implement an improved sense tagger is included.
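The core idea of POS-driven homograph tagging can be sketched as follows. The dictionary entries here are invented stand-ins, not the paper's MRD: a word receives the homograph whose part of speech matches the tag a POS tagger assigned to it in context.

```python
# Hypothetical machine-readable dictionary: word -> [(homograph id, POS)].
# The entries are toy data for illustration only.
MRD = {
    "bank": [("bank_1", "NOUN"), ("bank_2", "VERB")],
    "fish": [("fish_1", "NOUN"), ("fish_2", "VERB")],
}

def tag_senses(tagged_words):
    """Map (word, POS) pairs to homograph ids; None when no homograph matches."""
    senses = []
    for word, pos in tagged_words:
        match = next((h for h, p in MRD.get(word, []) if p == pos), None)
        senses.append((word, match))
    return senses

print(tag_senses([("bank", "VERB"), ("fish", "NOUN")]))
# → [('bank', 'bank_2'), ('fish', 'fish_1')]
```

The scheme disambiguates only between homographs that differ in part of speech; homographs sharing a POS remain ambiguous, which bounds what any tagger of this kind can achieve.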
In this paper, we present results of a project that investigated the application of lexicon based text retrieval techniques to Alternative and Augmentative Communication (AAC). As a practical outcome of this research, a communication aid based on message retrieval by key words was designed, implemented and evaluated. The message retrieval module in the system uses a large semantic lexicon, derived from the WordNet database, for query expansion. Trials have been carried out with the device to evaluate whether the approach is suitable for AAC, and to determine the semantic relations that lead to efficient message retrieval. The first part of this paper describes the background of the project and highlights the retrieval requirements for a communication aid, which differ considerably from the requirements in standard text retrieval. We then present the overall design of the WordKeys communication aid and describe the tasks of its sub-modules. We summarise trials that have been carried out to determine the effect of semantic query expansion on the success of message retrieval. Evaluation results show that information about word frequency can solve problems that occurred in the semantic query expansion because of taxonomies that have too many intermediate steps between closely related words. Finally, a user evaluation with the improved system showed that full text retrieval is an effective approach to message access in a communication aid.
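The interaction between semantic query expansion and word frequency described above can be sketched with a toy lexicon. The relations and frequency counts below are invented, not WordKeys data: each keyword is expanded with semantically related words, but expansion terms below a frequency threshold are dropped, approximating the frequency-based filtering the evaluation found helpful.

```python
# Hypothetical semantic relations derived from a WordNet-like resource,
# and invented corpus frequencies; both are illustrative assumptions.
LEXICON = {
    "car": ["vehicle", "automobile", "motorcar"],
    "eat": ["consume", "dine", "ingest"],
}
FREQ = {"vehicle": 120, "automobile": 40, "motorcar": 2,
        "consume": 55, "dine": 30, "ingest": 4}

def expand(query, min_freq=10):
    """Expand query keywords with related words frequent enough to be useful."""
    expanded = set(query)
    for word in query:
        expanded.update(w for w in LEXICON.get(word, [])
                        if FREQ.get(w, 0) >= min_freq)
    return expanded

print(sorted(expand(["car"])))
# → ['automobile', 'car', 'vehicle']
```

The rare term "motorcar" is filtered out, illustrating how frequency information can suppress expansion terms reached only through long taxonomy chains between otherwise closely related words.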
Augmentative and Alternative Communication (AAC) is the field of study concerned with providing devices and techniques to augment the communicative ability of a person whose disability makes it difficult to speak or otherwise communicate in an understandable fashion. For several years, we have been applying natural language processing techniques to the field of AAC to develop intelligent communication aids that attempt to provide linguistically correct output while increasing communication rate. Previous effort has resulted in a research prototype called Compansion that expands telegraphic input. In this paper we describe that research prototype and introduce the Intelligent Parser Generator (IPG). IPG is intended to be a practical embodiment of the research prototype aimed at a group of users who have cognitive impairments that affect their linguistic ability. We describe both the theoretical underpinnings of Compansion and the practical considerations in developing a usable system for this population of users.
This article focuses on the need for technological aid for agrammatics, and presents a system designed to meet this need. The field of Augmentative and Alternative Communication (AAC) explores ways to allow people with speech or language disabilities to communicate. The use of computers and natural language processing techniques offers a range of new possibilities in this direction. Yet AAC addresses mainly speech deficits, not linguistic disabilities. A model of aided AAC interfaces with a place for natural language processing is presented. The PVI system, described in this contribution, makes use of such advanced techniques. It has been developed at Thomson-CSF for the use of children with cerebral palsy. It presents a customizable interface that helps disabled users compose sequences of icons displayed on a computer screen. A semantic parser, using lexical semantics information, is used to determine the best case assignments for predicative icons in the sequence. It maximizes a global value, the ‘semantic harmony’ of the sequence. The resulting conceptual graph is fed to a natural language generation module which uses Tree Adjoining Grammars (TAG) to generate French sentences. Evaluation by users demonstrates the system's strengths and limitations, and points the way to future developments.
Alternative and Augmentative Communication (AAC) for people with speech and language disorders is an interesting and challenging application field for research in Natural Language Processing. Further advances in the development of AAC systems require robust language processing techniques and versatile linguistic knowledge bases. NLP research can also benefit from studying the techniques used in this field and from the user-centred methodologies used to develop and evaluate AAC systems. Until recently, however, apart from some exceptions, there was little scientific exchange between the two research areas. This paper aims to make a contribution to closing this gap. We will argue that current interest in language use, as shown by the large amount of research on comprehensive dictionaries and on corpora processing, makes the results of NLP research more relevant to AAC. We will also show that the increasing interest of AAC researchers in NLP is having positive results. To situate research on communication aids, the first half of this paper gives an overview of the AAC research field. The second half is dedicated to an overview of research prototype systems and commercially available communication aids that specifically involve more advanced language processing techniques.
Non-speaking people often rely on AAC (Augmentative and Alternative Communication) devices to assist them to communicate. These AAC devices are slow to operate, however, and as a result conversations can be very difficult and frequently break down. This is especially the case when the conversation partner is unfamiliar with this method of communication, and is a big obstacle to many people when they wish to conduct simple everyday transactions. A way of improving the performance of AAC devices by using scripts is discussed. A prototype system to test this idea was constructed, and a preliminary experiment performed with promising results. A practical AAC device which incorporates scripts was then developed, and is described.
Clustering of a translation memory is proposed to make the retrieval of similar translation examples more efficient, while a second contribution is a metric of text similarity which is based on both surface structure and content. Tests on the two proposed techniques are run on part of the CELEX database. The results reported indicate that clustering the translation memory yields a significant gain in retrieval response time, while the deterioration in retrieval accuracy can be considered negligible. The text similarity metric proposed is evaluated by a human expert and found to be compatible with the human perception of text similarity.
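The two ideas can be sketched together: a similarity score mixing surface structure with content, and cluster representatives used to prune the search. This is an invented illustration, not the paper's metric or clustering method; a query is compared against one representative per cluster first, then only against the members of the closest cluster.

```python
def surface_sim(a, b):
    """Length-normalised edit-distance similarity between two strings."""
    m, n = len(a), len(b)
    if max(m, n) == 0:
        return 1.0
    d = [[i + j if i * j == 0 else 0 for j in range(n + 1)] for i in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return 1 - d[m][n] / max(m, n)

def content_sim(a, b):
    """Jaccard overlap of word sets: a crude stand-in for content similarity."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 1.0

def similarity(a, b, w=0.5):
    """Weighted mix of surface and content similarity."""
    return w * surface_sim(a, b) + (1 - w) * content_sim(a, b)

def retrieve(query, clusters):
    """clusters: list of (representative, members); search the best cluster only."""
    rep, members = max(clusters, key=lambda c: similarity(query, c[0]))
    return max(members, key=lambda m: similarity(query, m))

clusters = [("open the door", ["open the door please", "shut the door"]),
            ("good morning", ["good morning everyone"])]
print(retrieve("open door", clusters))
```

Because only one cluster's members are scored in full, retrieval time grows with the number of clusters plus the size of one cluster rather than with the whole memory, at the cost of occasionally missing the globally best match.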
Case systems abound in natural language processing. Almost any attempt to recognize and uniformly represent relationships within a clause – a unit at the centre of any linguistic system that goes beyond word level statistics – must be based on semantic roles drawn from a small, closed set. The set of roles describing relationships between a verb and its arguments within a clause is a case system. What is required of such a case system? How does a natural language practitioner build a system that is complete and detailed yet practical and natural? This paper chronicles the construction of a case system from its origin in English marker words to its successful application in the analysis of English text.
We present a lexical platform that has been developed for the Spanish language. It achieves portability across computer systems and efficiency in terms of both speed and lexical coverage. A model for the full treatment of Spanish inflectional morphology for verbs, nouns and adjectives is presented. This model permits word formation based solely on morpheme concatenation, driven by a feature-based unification grammar. The run-time lexicon is a collection of allomorphs for both stems and endings. Although not tested, it should also be suitable for other Romance and highly inflected languages. A formalism is also described for encoding a lemma-based lexical source, well suited for expressing linguistic generalizations: inheritance classes, lemma encoding, morpho-graphemic allomorphy rules and limited type-checking. From this source base, we can automatically generate an allomorph-indexed dictionary adequate for efficient retrieval and processing. A set of software tools has been implemented around this formalism: lexical base augmenting aids, lexical compilers to build run-time dictionaries and access libraries for them, feature manipulation libraries, unification and pseudo-unification modules, morphological processors, a parsing system, etc. Software interfaces among the different modules and tools are cleanly defined to ease software integration and tool combination in a flexible way. Directions for accessing our e-mail and web demonstration prototypes are also provided. Some figures are given, showing the lexical coverage of our platform compared to some popular spelling checkers.
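Word formation by pure morpheme concatenation over an allomorph-indexed lexicon can be sketched as follows. The Spanish stems, endings, and features below are toy data, not the platform's lexicon: a word is accepted when it splits into one stem allomorph plus one ending allomorph whose features unify.

```python
# Toy allomorph lexicon (illustrative assumptions, not the platform's data).
STEMS = {   # stem allomorph -> (lemma, features required of the ending)
    "habl": ("hablar", {"conj": "ar"}),
    "cant": ("cantar", {"conj": "ar"}),
}
ENDINGS = {  # ending allomorph -> features supplied
    "o":  {"conj": "ar", "person": 1, "number": "sg"},
    "as": {"conj": "ar", "person": 2, "number": "sg"},
}

def analyse(word):
    """Return (lemma, features) for every stem+ending split whose features unify."""
    analyses = []
    for i in range(1, len(word)):
        stem, ending = word[:i], word[i:]
        if stem in STEMS and ending in ENDINGS:
            lemma, required = STEMS[stem]
            feats = ENDINGS[ending]
            # Very simplified stand-in for feature-structure unification:
            # every feature the stem requires must match the ending's value.
            if all(feats.get(k) == v for k, v in required.items()):
                analyses.append((lemma, feats))
    return analyses

print(analyse("hablas"))
# → [('hablar', {'conj': 'ar', 'person': 2, 'number': 'sg'})]
```

Indexing the run-time dictionary by allomorph, as the abstract describes, means lookup at analysis time is a plain string match on stem and ending; all irregularity is compiled out of the lemma-based source in advance.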
In this paper, we introduce a method to represent phrase structure grammars for building a large annotated corpus of Korean syntactic trees. Korean differs from English in word order and word composition. As a result of our study, it turned out that the differences are significant enough to induce meaningful changes in the tree annotation scheme for Korean with respect to the schemes for English. A tree annotation scheme defines the grammar formalism to be assumed, the categories to be used, and the rules to determine correct parses for unsettled issues in parse construction. Korean is partially free in word order, and essential components such as subjects and objects of a sentence can be omitted with greater freedom than in English. We propose a restricted representation of phrase structure grammar to handle the characteristics of Korean more efficiently. The proposed representation is shown by means of an extensive experiment to gain improvements in parsing time as well as grammar size. We also describe the system, named Teb, a software environment developed to build a tree-annotated corpus of Korean containing more than one million units.
This paper presents a new type of nonlinear discourse structure found to be very common in free English texts. This structure reflects nonlinear presentation of the information and knowledge conveyed by the texts. It is argued that such nonlinearity is representationally and informationally advantageous because it allows one to create smaller, more compact texts. The paper presents a heuristics-based, relatively domain-independent algorithm for computing this new text structure. The paper discusses the algorithm's good quantitative and qualitative performance, and presents the results of extensive tests on a large volume of free English texts.
Natural language interfaces require dialogue models that allow for robust, habitable and efficient interaction. This paper presents such a model for dialogue management in natural language interfaces. The model is based on empirical studies of human-computer interaction in various simple service applications. It is shown that for applications belonging to this class the dialogue can be handled using fairly simple means. The interaction can be modeled in a dialogue grammar with information on the functional role of an utterance as conveyed in the linguistic structure. Focusing is handled using dialogue objects recorded in a dialogue tree representing the constituents of the dialogue. The dialogue objects in the dialogue tree can be accessed by the various modules for interpretation, generation and background system access. Focused entities are modeled as objects or sets of objects, together with related domain concept information: properties of the domain objects. A simple copying principle, where a new dialogue object's focal parameters are instantiated with information from the preceding dialogue object, accounts for most context-dependent utterances. The action to be carried out by the interface is determined on the basis of how the objects and related properties are specified; this in turn depends on information presented in the user utterance, context information from the dialogue tree and information in the domain model. The use of dialogue objects facilitates customization to the sublanguage utilized in a specific application. The framework has been applied successfully to various background systems and interaction modalities. The paper presents results from customizing the dialogue manager to three typed-interaction applications, together with results from applying the model to two applications using spoken interaction.
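The copying principle for context-dependent utterances can be sketched with an invented representation (the class, field names, and flight example below are illustrative assumptions, not the paper's implementation): a new dialogue object's focal parameters start as copies from the preceding dialogue object and are overwritten only by what the new utterance specifies.

```python
import copy

class DialogueObject:
    """A node in the dialogue tree holding the focal parameters in focus."""
    def __init__(self, objects=None, properties=None):
        self.objects = objects or []        # focused domain objects
        self.properties = properties or {}  # constraints on those objects

def next_dialogue_object(previous, utterance_info):
    """Copy focal parameters from the previous object, then apply the utterance."""
    new = DialogueObject(copy.deepcopy(previous.objects),
                         copy.deepcopy(previous.properties))
    if "objects" in utterance_info:         # utterance shifts focus to new objects
        new.objects = utterance_info["objects"]
    new.properties.update(utterance_info.get("properties", {}))
    return new

# User: "Show flights to Paris."  Follow-up: "Only the morning ones."
d1 = DialogueObject(["flight"], {"destination": "Paris"})
d2 = next_dialogue_object(d1, {"properties": {"time": "morning"}})
print(d2.objects, d2.properties)
# → ['flight'] {'destination': 'Paris', 'time': 'morning'}
```

The elliptical follow-up never mentions flights or Paris, yet the new dialogue object carries both forward; only the newly specified time constraint is added, which is how this mechanism resolves most context-dependent utterances without full anaphora resolution.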
This special issue presents the state-of-the-art in implemented, general-purpose Natural Language Processing (NLP) systems that use nontrivial Knowledge Representation and Reasoning (KRR). These systems use full-scale implementations of traditional KRR techniques as well as some newer knowledge-related processing mechanisms that have been developed specifically to meet the needs of natural language processing. The papers cover a wide range of natural language inputs, knowledge and formalisms, application domains and processing tasks, illustrating the key role that knowledge representation plays in all types of NLP systems.