We describe SkillSum, a Natural Language Generation (NLG) system that generates a personalised feedback report for someone who has just completed a screening assessment of their basic literacy and numeracy skills. Because many SkillSum users have limited literacy, the generated reports must be easily comprehended by people with limited reading skills; this is the most novel aspect of SkillSum, and the focus of this paper. We used two approaches to maximise readability. First, for determining content and structure (document planning), we did not explicitly model readability, but rather followed a pragmatic approach of repeatedly revising content and structure following pilot experiments and interviews with domain experts. Second, for choosing linguistic expressions (microplanning), we attempted to formulate explicitly the choices that enhanced readability, using a constraints approach and preference rules; our constraints were based on corpus analysis and our preference rules were based on psycholinguistic findings. Evaluation of the SkillSum system was twofold: it compared the usefulness of NLG technology to that of canned text output, and it assessed the effectiveness of the readability model. Results showed that NLG was more effective than canned text at enhancing users' knowledge of their skills, and also suggested that the empirical ‘revise based on experiments and interviews’ approach contributed substantially to readability, as did our explicit psycholinguistically inspired models of readability choices.
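As a rough illustration of the constraint-plus-preference microplanning scheme described above, the sketch below filters candidate phrasings by hard constraints and ranks the survivors with preference rules. The candidate sentences, the constraint threshold and the rule weights are invented for illustration; they are not SkillSum's actual rules.

```python
# Toy constraint-plus-preference microplanning sketch.
# Candidates, constraints and weights are invented, not SkillSum's own.

CANDIDATES = [
    {"text": "You got 14 questions right.", "words": 5, "common_words": True, "passive": False},
    {"text": "14 questions were answered correctly by you.", "words": 7, "common_words": True, "passive": True},
    {"text": "Your performance was satisfactory.", "words": 4, "common_words": False, "passive": False},
]

def satisfies_constraints(cand):
    """Hard constraint (e.g. derived from corpus analysis): keep sentences short."""
    return cand["words"] <= 12

def preference_score(cand):
    """Preference rules (e.g. psycholinguistically motivated): favour common words, avoid passives."""
    score = 0
    score += 2 if cand["common_words"] else 0
    score += 1 if not cand["passive"] else 0
    return score

def choose_phrasing(candidates):
    viable = [c for c in candidates if satisfies_constraints(c)]
    return max(viable, key=preference_score)["text"]

if __name__ == "__main__":
    print(choose_phrasing(CANDIDATES))  # -> "You got 14 questions right."
```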
The limits on predictability and refinement of English structural annotation are examined by comparing independent annotations, by experienced analysts using the same detailed published guidelines, of a common sample of written texts. Three conclusions emerge. First, while it is not easy to define watertight boundaries between the categories of a comprehensive structural annotation scheme, limits on inter-annotator agreement are in practice set more by the difficulty of conforming to a well-defined scheme than by the difficulty of making a scheme well defined. Secondly, although usage is often structurally ambiguous, commonly the alternative analyses are logical distinctions without a practical difference – which raises questions about the role of grammar in human linguistic behaviour. Finally, one specific area of annotation is strikingly more problematic than any other area examined, though this area (classifying the functions of clause-constituents) seems a particularly significant one for human language use. These findings should be of interest both to computational linguists and to students of language as an aspect of human cognition.
Finite-state technology is considered the preferred model for representing the phonology and morphology of natural languages. The attractiveness of this technology for natural language processing stems from four sources: modularity of the design, due to the closure properties of regular languages and relations; the compact representation that is achieved through minimization; efficiency, which is a result of linear recognition time with finite-state devices; and reversibility, resulting from the declarative nature of such devices. However, when wide-coverage morphological grammars are considered, finite-state technology does not scale up well, and the benefits of this technology can be overshadowed by the limitations it imposes as a programming environment for language processing. This paper investigates the strengths and weaknesses of existing technology, focusing on various aspects of large-scale grammar development. Using a real-world case study, we compare a finite-state implementation with an equivalent Java program with respect to ease of development, modularity, maintainability of the code, and space and time efficiency. We identify two main problems, abstraction and incremental development, which are currently not addressed sufficiently well by finite-state technology, and which we believe should be the focus of future research and development.
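As a toy illustration of the linear-time recognition property mentioned above (not the paper's case study, which compares a full finite-state grammar with a Java program), the sketch below walks a hand-built finite-state automaton over a word, so recognition cost grows only with word length. The stems, suffixes and feature labels are invented.

```python
# Minimal finite-state sketch: a hand-built automaton accepting a few
# noun stems plus an optional plural "s". Toy data, not the paper's grammar.

def build_automaton(stems, suffixes):
    """Build a trie-shaped automaton; accepting states carry a feature label."""
    transitions = {0: {}}
    accepting = {}
    next_state = 1

    def add(path, label):
        nonlocal next_state
        state = 0
        for ch in path:
            if ch not in transitions[state]:
                transitions[state][ch] = next_state
                transitions[next_state] = {}
                next_state += 1
            state = transitions[state][ch]
        accepting[state] = label

    for stem in stems:
        for suffix, label in suffixes:
            add(stem + suffix, label)
    return transitions, accepting

def recognise(word, transitions, accepting):
    """Linear-time recognition: exactly one transition per input character."""
    state = 0
    for ch in word:
        state = transitions[state].get(ch)
        if state is None:
            return None
    return accepting.get(state)

if __name__ == "__main__":
    trans, acc = build_automaton(
        stems=["cat", "dog"],
        suffixes=[("", "N+sg"), ("s", "N+pl")],
    )
    print(recognise("dogs", trans, acc))  # -> "N+pl"
    print(recognise("dig", trans, acc))   # -> None
```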
The semantic annotation of texts with senses from a computational lexicon is a complex and often subjective task. As a matter of fact, the fine granularity of the WordNet sense inventory [Fellbaum, Christiane (ed.). 1998. WordNet: An Electronic Lexical Database. MIT Press], a de facto standard within the research community, is one of the main causes of a low inter-tagger agreement ranging between 70% and 80% and the disappointing performance of automated fine-grained disambiguation systems (around 65% state of the art in the Senseval-3 English all-words task). In order to improve the performance of both manual and automated sense taggers, we must either change the sense inventory (e.g. by adopting a new dictionary or clustering WordNet senses) or resolve the disagreements between annotators by dealing with the fineness of sense distinctions. The former approach is not viable in the short term, as wide-coverage resources are not publicly available and no large-scale reliable clustering of WordNet senses has been released to date. The latter approach requires the ability to distinguish between subtle or misleading sense distinctions. In this paper, we propose the use of structural semantic interconnections – a specific kind of lexical chain – for the adjudication of disputed sense assignments to words in context. The approach relies on the exploitation of the lexicon structure as a support to smooth possible divergences between sense annotators and foster coherent choices. We perform a twofold experimental evaluation of the approach applied to manual annotations from the SemCor corpus, and automatic annotations from the Senseval-3 English all-words competition. Both sets of experiments and results are entirely novel: structural adjudication allows us to improve the state-of-the-art performance in all-words disambiguation by 3.3 points (achieving a 68.5% F1-score) and attains figures around 80% precision and 60% recall in the adjudication of disagreements from human annotators.
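A rough approximation of the adjudication idea can be sketched with NLTK's WordNet interface: when annotators disagree, prefer the disputed sense that is most strongly connected to the senses of surrounding content words. WordNet path similarity is used here as a crude stand-in for the paper's structural semantic interconnections, and the example words are invented.

```python
# Crude stand-in for structural adjudication: choose the disputed sense that is
# best connected (here via WordNet path similarity) to the context senses.
# Requires: pip install nltk; nltk.download("wordnet")
from nltk.corpus import wordnet as wn

def connectivity(sense, context_words):
    """Sum, over context words, the best path similarity to any of their senses."""
    total = 0.0
    for word in context_words:
        sims = [sense.path_similarity(s) or 0.0 for s in wn.synsets(word)]
        if sims:
            total += max(sims)
    return total

def adjudicate(disputed_senses, context_words):
    """Return whichever disputed sense is better connected to the context."""
    return max(disputed_senses, key=lambda s: connectivity(s, context_words))

if __name__ == "__main__":
    # Two annotators disagree on "bank" in a sentence about rivers (invented example).
    senses = [wn.synset("bank.n.01"),   # sloping land beside water
              wn.synset("bank.n.02")]   # financial institution
    context = ["river", "water", "shore"]
    print(adjudicate(senses, context))  # expected: Synset('bank.n.01')
```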
Two important recent trends in natural language generation are (i) probabilistic techniques and (ii) comprehensive approaches that move away from traditional strictly modular and sequential models. This paper reports experiments in which pCRU – a generation framework that combines probabilistic generation methodology with a comprehensive model of the generation space – was used to semi-automatically create five different versions of a weather forecast generator. The generators were evaluated in terms of output quality, development time and computational efficiency against (i) human forecasters, (ii) a traditional handcrafted pipelined NLG system and (iii) a HALogen-style statistical generator. The most striking result is that despite acquiring all decision-making abilities automatically, the best pCRU generators produce outputs of high enough quality to be scored more highly by human judges than forecasts written by experts.
Feature unification in parsing has previously used either inefficient Prolog programs, or LISP programs implementing early pre-WAM Prolog models of unification involving searches of binding lists, and the copying of rules to generate edges: features within rules and edges have traditionally been expressed as lists or functions, with clarity being preferred to speed of processing. As a result, parsing takes about 0·5 seconds for a 7-word sentence. Our earlier work produced an optimised chart parser for a non-unification context-free-grammar that achieved 5 ms parses, with high-ambiguity sentences involving hundreds of edges, using the grammar and sentences from Tomita's work on shift-reduce parsing with multiple stack branches. A parallel logic card design resulted that would speed this by a further factor of at least 17. The current paper extends this parser to treat a much more complex unification grammar with structures, using extensive indexing of rules and edges and the optimisations of top-down filtering and look-ahead, to demonstrate where unification occurs during parsing. Unification in parsing is distinguished from that in Prolog, and four alternative schemes for storing features and performing unification are considered, including the traditional binding-list method and three other methods optimised for speed for which overall unification times are calculated. Parallelisation of unification using cheap logic hardware is considered, and estimates show that unification will negligibly increase the parse time of our parallel parser card. Preliminary results are reported from a prototype serial parser that uses the fourth most efficient unification method, and achieves 7 ms for 7-word sentences, and under 1 s for a 36-word 360-way ambiguous sentence with 10,000 edges, on a conventional workstation.
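The unification step at the heart of such a parser can be sketched, for simple nested feature structures, as recursive merging with failure on atomic clashes. The code below is only an illustration of the operation performed when an edge is combined with a rule constituent; it does not reconstruct any of the four storage schemes the paper compares, and it omits variables and reentrancy.

```python
# Simplified feature-structure unification of the kind performed when an
# active edge is combined with a rule constituent during chart parsing.
# Feature structures are plain nested dicts; variables and reentrancy are omitted.

FAIL = None

def unify(fs1, fs2):
    """Return the most general feature structure subsuming both, or None on clash."""
    result = dict(fs1)
    for feat, val2 in fs2.items():
        if feat not in result:
            result[feat] = val2
            continue
        val1 = result[feat]
        if isinstance(val1, dict) and isinstance(val2, dict):
            sub = unify(val1, val2)
            if sub is FAIL:
                return FAIL
            result[feat] = sub
        elif val1 != val2:
            return FAIL            # atomic value clash, e.g. sg vs pl
    return result

if __name__ == "__main__":
    rule_np  = {"cat": "NP", "agr": {"num": "sg"}}
    edge_ok  = {"cat": "NP", "agr": {"num": "sg", "per": 3}}
    edge_bad = {"cat": "NP", "agr": {"num": "pl"}}
    print(unify(rule_np, edge_ok))   # -> merged structure
    print(unify(rule_np, edge_bad))  # -> None (number clash)
```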
Artificial languages for person-machine communication seldom display the most characteristic properties of natural languages, such as the use of anaphoric or other referring expressions, or ellipsis. This paper argues that good use could be made of such devices in artificial languages, and proposes a mechanism for the resolution of ellipsis and anaphora in them using finite state transduction techniques. This yields an interpretation system with many desirable properties: it is easily implementable, efficient, incremental and reversible.
Linguists in general, and computational linguists in particular, do well to employ finite state devices wherever possible. They are theoretically appealing because they are computationally weak and best understood from a mathematical point of view. They are computationally appealing because they make for simple, elegant, and highly efficient implementations. In this paper, I hope I have shown how they can be applied to a problem… which seems initially to require heavier machinery.
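The flavour of the proposal can be conveyed with a small sketch over an invented slot-based command language (not the paper's language): an elliptical follow-up command is expanded by copying unspecified slots from the previous fully specified one, a transduction simple enough to be realised with finite-state machinery.

```python
# Toy sketch of ellipsis resolution in an invented slot-based command language:
# slots left unspecified in a follow-up command are filled from the previous
# command, in the spirit of a finite-state transduction.

def parse(command):
    """Parse 'key=value' pairs, e.g. 'action=print file=a.txt printer=lp1'."""
    return dict(token.split("=", 1) for token in command.split())

def resolve_ellipsis(previous, elliptical):
    """Copy any slot left unspecified in the elliptical command from the previous one."""
    resolved = dict(previous)
    resolved.update(parse(elliptical))
    return resolved

if __name__ == "__main__":
    first = parse("action=print file=report.txt printer=lp1")
    follow_up = "file=summary.txt"   # elliptical: "now summary.txt"
    print(resolve_ellipsis(first, follow_up))
    # -> {'action': 'print', 'file': 'summary.txt', 'printer': 'lp1'}
```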
Text-to-speech systems are currently designed to work on complete sentences and paragraphs, thereby allowing front end processors access to large amounts of linguistic context. Problems with this design arise when applications require text to be synthesized in near real time, as it is being typed. How does the system decide which incoming words should be collected and synthesized as a group when prior and subsequent word groups are unknown? We describe a rule-based parser that uses a three cell buffer and phrasing rules to identify break points for incoming text. Words up to the break point are synthesized as new text is moved into the buffer; no hierarchical structure is built beyond the lexical level. The parser was developed for use in a system that synthesizes written telecommunications by Deaf and hard of hearing people. These are texts written entirely in upper case, with little or no punctuation, and using a nonstandard variety of English (e.g. WHEN DO I WILL CALL BACK YOU). The parser performed well in a three month field trial utilizing tens of thousands of texts. Laboratory tests indicate that the parser exhibited a low error rate when compared with a human reader.
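A skeletal version of such a buffer-and-rules break-point finder is sketched below. The single phrasing rule used here (break before a conjunction or wh-word, or whenever the three-cell buffer is full) is invented for illustration and is not the rule set of the deployed system.

```python
# Skeletal three-cell buffer for choosing synthesis break points in
# incrementally typed text. The phrasing rule and word list are invented.

BREAK_BEFORE = {"AND", "BUT", "OR", "WHEN", "WHERE", "WHAT", "WHY", "HOW"}
BUFFER_SIZE = 3

def break_points(words):
    """Yield word groups to hand to the synthesiser as text streams in."""
    buffer = []
    for word in words:
        if buffer and (word.upper() in BREAK_BEFORE or len(buffer) == BUFFER_SIZE):
            yield buffer
            buffer = []
        buffer.append(word)
    if buffer:
        yield buffer

if __name__ == "__main__":
    text = "WHEN DO I WILL CALL BACK YOU AND WHAT TIME"
    for group in break_points(text.split()):
        print(" ".join(group))   # prints groups such as "WHEN DO I", "WILL CALL BACK", ...
```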
This paper identifies some linguistic properties of technical terminology, and uses them to formulate an algorithm for identifying technical terms in running text. The grammatical properties discussed are preferred phrase structures: technical terms consist mostly of noun phrases containing adjectives, nouns, and occasionally prepositions; rarely do terms contain verbs, adverbs, or conjunctions. The discourse properties are patterns of repetition that distinguish noun phrases that are technical terms, especially those multi-word phrases that constitute a substantial majority of all technical vocabulary, from other types of noun phrase.
The paper presents a terminology identification algorithm that is motivated by these linguistic properties. An implementation of the algorithm is described; it recovers a high proportion of the technical terms in a text, and a high proportion of the recovered strings are valid technical terms. The algorithm proves to be effective regardless of the domain of the text to which it is applied.
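The grammatical filter described above can be approximated as a regular expression over part-of-speech tag sequences (adjective/noun sequences ending in a noun) combined with a repetition threshold. The tag pattern, threshold and tiny tagged corpus below are illustrative, not the paper's exact implementation.

```python
# Sketch of a terminology identifier: candidate multi-word terms are noun
# phrases whose tag sequence is adjectives/nouns ending in a noun, kept only
# if they repeat in the text. The tagged input and threshold are illustrative.
import re
from collections import Counter

TAG_PATTERN = re.compile(r"[AN]*N")   # adjective/noun tags ending in a noun
MIN_FREQ = 2

def candidate_terms(tagged_sentences, max_len=4):
    """Collect word n-grams (length 2..max_len) whose tag sequence matches the pattern."""
    counts = Counter()
    for sent in tagged_sentences:
        for i in range(len(sent)):
            for j in range(i + 2, min(i + max_len, len(sent)) + 1):
                words, tags = zip(*sent[i:j])
                if TAG_PATTERN.fullmatch("".join(tags)):
                    counts[" ".join(words)] += 1
    return [term for term, freq in counts.items() if freq >= MIN_FREQ]

if __name__ == "__main__":
    # Tiny POS-tagged corpus: A = adjective, N = noun, D = determiner.
    sents = [
        [("central", "A"), ("processing", "N"), ("unit", "N")],
        [("the", "D"), ("central", "A"), ("processing", "N"), ("unit", "N"), ("speed", "N")],
    ]
    print(candidate_terms(sents))
    # -> ['central processing', 'central processing unit', 'processing unit']
```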
This contribution focuses on a dialogue model built around an intelligent working memory that aims to facilitate robust human-machine dialogue in written natural language. The model has been designed as the core of an information-seeking dialogue application. A distinctive feature of this project is its reliance on the interpretation and behaviour capabilities afforded by pragmatic knowledge. Within this framework, the dialogue model acts as a kind of ‘forum’ for various facets, embodied by different models drawn from both intentional and structural approaches to conversation. The approach is based on the assumption that multiple expertise is the key to flexibility and robustness, and that an intelligent memory which keeps track of all events and links them together from as many angles as necessary is crucial for managing that multiple expertise. This idea is developed by presenting an intelligent dialogue history that complements the wide coverage of the co-operating models: it is no longer a simple chronological record, but a communication area common to all processes. We illustrate the approach with examples drawn from collected corpora.
Inside parsing is a best-parse method based on the Inside algorithm, which is often used in estimating the probabilistic parameters of stochastic context-free grammars. It gives a best parse in O(N³G³) time, where N is the input size and G is the grammar size. The Earley algorithm can be made to return best parses with the same complexity in N.
Through experiments, we show that Inside parsing can be more efficient than Earley parsing with a sufficiently large grammar and sufficiently short input sentences. For instance, Inside parsing is better with sentences of 16 or fewer words for a grammar containing 429 states. In practice, parsing can be made efficient by employing the two methods selectively.
The redundancy of the Inside algorithm can be reduced by top-down filtering using the chart produced by the Earley algorithm, which is useful in training the probabilistic parameters of a grammar. Extensive experiments on the Penn Treebank corpus show that the efficiency of the Inside computation can be improved by up to 55%.
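For a grammar in Chomsky normal form, the best-parse computation underlying Inside parsing can be sketched as a Viterbi-style CKY pass, which makes the O(N³G³) behaviour visible: three nested span loops over the input, multiplied by the rule combinations considered at each split. The toy grammar and lexicon below are invented and are not the grammars used in the paper's experiments.

```python
# Viterbi-style CKY sketch of best-parse computation over a CNF PCFG,
# the dynamic programme underlying Inside-based best parsing.
# The grammar and lexicon are invented toy data.
from collections import defaultdict

# Binary rules: (B, C) -> list of (A, prob) meaning A -> B C with probability prob
BINARY = {
    ("NP", "VP"): [("S", 1.0)],
    ("Det", "N"): [("NP", 0.7)],
}
# Lexical rules: word -> list of (A, prob)
LEXICON = {
    "the": [("Det", 1.0)],
    "dog": [("N", 0.6), ("NP", 0.3)],
    "barks": [("VP", 1.0)],
}

def best_parse_prob(words, start="S"):
    n = len(words)
    chart = defaultdict(dict)            # chart[(i, j)][A] = best probability for A over words[i:j]
    for i, w in enumerate(words):
        for cat, p in LEXICON.get(w, []):
            chart[(i, i + 1)][cat] = max(chart[(i, i + 1)].get(cat, 0.0), p)
    for span in range(2, n + 1):                     # O(N) span lengths
        for i in range(n - span + 1):                # O(N) start positions
            j = i + span
            for k in range(i + 1, j):                # O(N) split points
                for b, pb in chart[(i, k)].items():  # grammar-size factors
                    for c, pc in chart[(k, j)].items():
                        for a, pr in BINARY.get((b, c), []):
                            p = pr * pb * pc
                            if p > chart[(i, j)].get(a, 0.0):
                                chart[(i, j)][a] = p
    return chart[(0, n)].get(start, 0.0)

if __name__ == "__main__":
    print(best_parse_prob("the dog barks".split()))  # -> 0.42 (S -> NP VP with NP -> Det N)
```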
Morphological analysis, which is at the heart of the processing of natural language, requires computationally effective morphological processors. In this paper an approach to the organization of an inflectional morphological model and its application to the Russian language is described. The main objective of our morphological processor is not the classification of word constituents, but rather an efficient computational recognition of morpho-syntactic features of words and the generation of words according to requested morpho-syntactic features. Another major concern that the processor aims to address is the ease of extending the lexicon. The templated word-paradigm model used in the system has an engineering flavour: paradigm formation rules are of a bottom-up (word-specific) nature rather than general observations about the language, and word formation units are segments of words rather than proper morphemes. This approach allows us to handle uniformly both general cases and exceptions, and requires extremely simple data structures and control mechanisms which can be easily implemented as finite-state automata. The morphological processor described in this paper is fully implemented for a substantial subset of Russian (more than 1,500,000 word-tokens – 95,000 word paradigms) and provides an extensive list of morpho-syntactic features together with stress positions for words utilized in its lexicon. Special dictionary management tools were built for browsing, debugging and extension of the lexicon. The actual implementation was done in C and C++, and the system is available for the MS-DOS, MS-Windows and UNIX platforms.
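The word-paradigm organisation can be illustrated with a toy sketch: each lexicon entry points to a paradigm template, a table mapping word-final segments to morpho-syntactic feature bundles, which supports both generation and analysis. The Russian endings and feature labels shown are a tiny invented fragment, not the processor's actual tables.

```python
# Toy word-paradigm sketch: stems point to paradigm templates that map
# endings to morpho-syntactic features. The fragment below is invented
# and covers only a few forms of one Russian noun type.

PARADIGMS = {
    "noun_fem_a": {          # e.g. nouns like "книга" (book)
        "а": "N,fem,sg,nom",
        "и": "N,fem,sg,gen",
        "е": "N,fem,sg,dat",
        "у": "N,fem,sg,acc",
    },
}

LEXICON = {
    "книг": "noun_fem_a",    # stem -> paradigm template
}

def generate(stem, features):
    """Produce the word form carrying the requested feature bundle."""
    paradigm = PARADIGMS[LEXICON[stem]]
    for ending, feats in paradigm.items():
        if feats == features:
            return stem + ending
    return None

def analyse(word):
    """Return (stem, features) pairs consistent with the lexicon."""
    results = []
    for stem, template in LEXICON.items():
        if word.startswith(stem):
            feats = PARADIGMS[template].get(word[len(stem):])
            if feats:
                results.append((stem, feats))
    return results

if __name__ == "__main__":
    print(generate("книг", "N,fem,sg,gen"))  # -> "книги"
    print(analyse("книгу"))                  # -> [("книг", "N,fem,sg,acc")]
```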