With the accelerating pace of work being done in the field of software engineering, it is becoming increasingly difficult to see the range of available techniques against a common background. Individual techniques are accessible throughout the literature but generally each is described in isolation. Comparison is difficult unless one has a strong idea of what one is looking for.
Our purpose in writing this book has therefore been to bring together, within a single framework, a variety of methods from all stages of software development and, in this way, to assist software engineers in particular in finding and evaluating the techniques appropriate to their own working environments. This should be regarded principally therefore as a book of signposts, leading the reader out into the field with a map, a compass and some reference points. By combining pointers to a selection of modern development practices with practical hints and checklists, all within one framework, we hope to fill the need for a vade-mecum to the software developer.
Managers with responsibility for software production will also, we hope, find much to stimulate their ideas about what modern practices have to offer and how their departments' efficiency and effectiveness could benefit. Students of computing science will be able to complement their more theoretical work by seeing the software development process as a day-to-day activity, a process that has managerial and sociological as much as purely technical aspects.
The transformation of your client's ideas about his proposed system into the physical, delivered system is a long one. If this transformation is to be at all manageable, it needs to be tackled as a sequence of stages, each bringing you closer to your destination.
The simplest model of the software development process is a purely sequential one with each succeeding stage following on in serial fashion from the previous one. Such a model could serve for a small one-man project but a better approximation to reality is contained in the so-called ‘cascade’ or ‘waterfall’ model suggested in figure 2.1. This recognises that it is generally necessary – even desirable – for the stages to overlap and even repeat.
It is customary at this point to talk of the software development ‘lifecycle’ and to draw a diagram along the lines of figure 2.2. This custom is based on the premise that the original development of a system from scratch has the same underlying sequence of events as the enhancement or correction of a system: inception, definition, design, implementation and acceptance. Superficially this is true but the similarity is slight and the generalisation at times unhelpful.
In this book we prefer to talk about the ‘development path’ and ‘maintenance cycles’ and to draw the ‘b-diagram’ along the lines of figure 2.3. This reflects reality more closely. The development path, starting from nothing more than a desire for a system, stands alone.
Given the task of developing a software system, how does one go about it? To start the building of a system of a thousand or maybe a million delivered lines of source code is a daunting (if exciting) prospect. No one should begin without a very clear idea about how the development is to be undertaken and how the quality of its output is to be assessed.
Turning this around we can say that anyone undertaking software development, on no matter what scale, must be strongly advised to establish a methodology for that development – one or more techniques that, by careful integration and control, will bring order and direction to the production process.
To paraphrase Freeman (1982): every software development organisation already has some methodology for building software systems. However, while some software is developed according to modern practices of software development, most of it is still built in an ad hoc way. So it is best to talk about software development techniques and methodologies in terms of changing current practices, replacing them with new techniques that improve the process of software development and the quality of the resulting products.
There is of course a dilemma here for us as system developers: technique X may offer us potential gains such as reduced development time, reduced costs, and increased reliability and quality, but in adopting it we also incur risks arising from the fact that we may be unfamiliar with technique X, that it might not be suitable for the system we have to develop, and that it might not suit our staff and their skills.
In this research, I have shown how principles of discourse structure, discourse coherency, and relevancy criterion can be used for the task of generating natural language text. Each of these areas is particularly important since the generation of text and not simply the generation of single sentences was studied. In this section, the principles and their incorporation as part of the text generation method are reviewed. Some limitations of the generation method are then discussed and finally, some possibilities for future research are presented.
Discourse structure
A central thesis of this research is based on the observation that descriptions of the same information may vary from one telling to the next. This indicates that information need not be described in the same way in which it is stored. For the generation process, this means that production of text does not simply trace the knowledge representation. Instead, standard principles for communication are used to organize a text. These rhetorical techniques are used to guide the generation process in the TEXT system. Since different rhetorical techniques are associated with different discourse purposes, it was shown that different descriptions of the same information can be produced depending upon the discourse situation. By incorporating commonly used techniques for communication into its answers, the system is able to convey information more effectively.
Relevancy criterion
It was pointed out that, when speaking, people are able to ignore large bodies of knowledge and focus on information that is relevant to the current discourse purpose. By constraining focus of attention to relevant information, a generation system is able to determine more efficiently what should be said next.
Focusing is a prevalent phenomenon in all types of naturally occurring discourse. Everyone, consciously or unconsciously, centers their attention on various concepts or objects throughout the process of reading, writing, speaking, or listening. In all these modalities, the focusing phenomena occur at many levels of discourse. For example, we expect a book to concern itself with a single theme or subject; chapters are given headings, indicating that the material included within is related to the given heading; paragraphs are organized around topics; and sentences are related in some way to preceding and succeeding sentences. In conversation, comments such as “Stick to the subject …”, “Going back to what you were saying before …”, or “Let's change the subject …” all indicate that people are aware that the conversation centers on specific ideas and that there are conventions for changing the focus of attention.
The use of focusing makes for ease of processing on the part of participants in a conversation. When interpreting utterances, knowledge that the discourse is about a particular topic eliminates certain possible interpretations from consideration. Grosz (77) discusses this in light of the interpretation of definite referring expressions. She notes that although a word may have multiple meanings, its use in an appropriate context will rarely bring to mind any meaning but the relevant one. Focusing also facilitates the interpretation of anaphoric, and in particular, pronominal, references (see Sidner 79). When the coherence provided by focusing is missing from discourse, readers and hearers may have difficulty in determining what a pronoun refers to. When speaking or writing, the process of focusing constrains the set of possibilities for what to say next.
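A minimal sketch can make this constraint concrete. The preference ordering used below (shift to something just introduced, otherwise keep the current focus, otherwise return to an earlier focus) is one plausible set of focus constraints for generation; the data structures and function names are illustrative and are not the TEXT system's own Lisp code.

from dataclasses import dataclass

@dataclass
class Proposition:
    focus: str       # the entity this proposition is about
    content: str     # what would be said

def choose_next(candidates, current_focus, potential_focus_list, focus_stack):
    # 1. Prefer shifting focus to an item just introduced into the discourse.
    for p in candidates:
        if p.focus in potential_focus_list:
            return p
    # 2. Otherwise continue talking about the current focus.
    for p in candidates:
        if p.focus == current_focus:
            return p
    # 3. Otherwise return to a focus from earlier in the discourse.
    for p in candidates:
        if p.focus in focus_stack:
            return p
    # If no candidate satisfies a focus constraint, fall back to the first one.
    return candidates[0] if candidates else None

Under this ordering, a generator that has just mentioned several new items will pick one of them up before drifting back to an earlier topic, which is one way of keeping the resulting discourse coherent.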
There are two major aspects of computer-based text generation: 1) determining the content and textual shape of what is to be said; and 2) transforming that message into natural language. Emphasis in this research has been on a computational solution to the questions of what to say and how to organize it effectively. A generation method was developed and implemented in a system called TEXT that uses principles of discourse structure, discourse coherency, and relevancy criterion. In this book, the theoretical basis of the generation method and the use of the theory within the computer system TEXT are described.
The main theoretical results have been on the effect of discourse structure and focus constraints on the generation process. A computational treatment of rhetorical devices has been developed which is used to guide the generation process. Previous work on focus of attention has been extended for the task of generation to provide constraints on what to say next. The use of these two interacting mechanisms constitutes a departure from earlier generation systems. The approach taken here is that the generation process should not simply trace the knowledge representation to produce text. Instead, communicative strategies people are familiar with are used to effectively convey information. This means that the same information may be described in different ways on different occasions.
The main features of the generation method developed for the TEXT strategic component include 1) selection of relevant information for the answer, 2) the pairing of rhetorical techniques for communication (such as analogy) with discourse purposes (for example, providing definitions) and 3) a focusing mechanism.
A portion of an Office of Naval Research (ONR) database was used to test the TEXT system. The portion used contains information about military vehicles and weapons. The ONR database was selected for TEXT in part because of its availability (it had been in use previously in a research project jointly with the Wharton School of the University of Pennsylvania) and in part because of its complex structure. Even using only a portion of the database provided a domain complex enough to allow for an interesting set of questions and answers.
As discussed in Chapter One, TEXT accepts three kinds of questions as input. These are:
What is a <e>?
What do you know about <e>?
What is the difference between <e1> and <e2>?
where <e>, <e1>, and <e2> represent any entity in the database. Since the TEXT system does not include a facility for interpreting English questions, the user must phrase his questions in the functional notation shown below, which corresponds to the three classes of questions.
(definition <e>)
(information <e>)
(difference <e1> <e2>)
Note that the system only handles questions about objects in the database. Although the system can include information about relations when relevant to a question about a particular object, it cannot answer questions about relations themselves.
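For illustration only, the sketch below shows one way input in this notation could be read and dispatched; it is not TEXT's own input handling, and the entity names in the usage comment are placeholders rather than guaranteed database contents.

def parse_question(text):
    # Strip the surrounding parentheses and split into tokens.
    tokens = text.strip("() \n").split()
    kind, entities = tokens[0], tokens[1:]
    if kind not in ("definition", "information", "difference"):
        raise ValueError("unknown question type: " + kind)
    if kind == "difference" and len(entities) != 2:
        raise ValueError("a difference question names two entities")
    return kind, entities

# For example, parse_question("(difference OCEAN-ESCORT CRUISER)") yields
# ("difference", ["OCEAN-ESCORT", "CRUISER"]).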
System components
The TEXT system consists of six major components: a schema selector, a relevant knowledge selector, the schema filler, the focusing mechanism, a dictionary interface, and a tactical component.
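The flow of information between these components can be pictured with a schematic sketch. Every name, data shape, and toy fact below is an illustrative placeholder rather than the TEXT system's Lisp code; the schema names are borrowed only to suggest the kind of pairing involved.

def select_relevant_knowledge(entity, knowledge_base):
    # Relevant knowledge selector: keep only facts about the questioned entity.
    return [fact for fact in knowledge_base if fact[0] == entity]

def select_schema(question_type):
    # Schema selector: pair the discourse purpose with a rhetorical strategy.
    return {"definition": "identification",
            "information": "attributive",
            "difference": "compare-and-contrast"}[question_type]

def fill_schema(schema, relevant_facts):
    # Schema filler and focusing mechanism: instantiate the schema with
    # relevant facts; focus constraints would order them.
    return [(schema, fact) for fact in relevant_facts]

def realize(message):
    # Dictionary interface and tactical component: map the message into words
    # and then into English sentences.
    return " ".join("The {0}'s {1} is {2}.".format(*fact) for _, fact in message)

knowledge_base = [("whisky", "class", "submarine"), ("whisky", "propulsion", "diesel")]
facts = select_relevant_knowledge("whisky", knowledge_base)
print(realize(fill_schema(select_schema("information"), facts)))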
The approach I have taken towards text generation is based on two fundamental hypotheses about the production of text: 1) that how information is stored in memory and how a person describes that information need not be the same and 2) that people have preconceived notions about the ways in which descriptions can be achieved.
I assume that information is not described in exactly the same way it is organized in memory. Rather, such descriptions reflect one or more principles of text organization. It is not uncommon for a person to repeat himself and talk about the same thing on different occasions. Rarely, however, will he repeat himself exactly. He may describe aspects of the subject which he omitted on first telling or he may, on the other hand, describe things from a different perspective, giving the text a new emphasis. Chafe (79) has performed a series of experiments which he claims support the notion that the speaker decides as he is talking what material should go into a sentence. These experiments show that the distribution of semantic constituents among sentences often varies significantly from one version of a narrative to another.
The second hypothesis central to this research is that people have preconceived ideas about the means with which particular communicative tasks can be achieved as well as the ways in which these means can be integrated to form a text. In other words, people generally follow standard patterns of discourse structure. For example, they commonly begin a narrative by describing the setting (the scene, the characters, or the time-frame).
In the process of producing discourse, speakers and writers must decide what it is that they want to say and how to present it effectively. They are capable of disregarding information in their large body of knowledge about the world which is not specific to the task at hand and they manage to integrate pertinent information into a coherent unit. They determine how to appropriately start the discourse, how to order its elements, and how to close it. These decisions are all part of the process of deciding what to say and when to say it. Speakers and writers must also determine what words to use and how to group them into sentences. In order for a system to generate text, it, too, must be able to make these kinds of decisions.
In this work, a computational solution is sought to the problems of deciding what to say and how to organize it effectively. What principles of discourse can be applied to this task? How can they be specified so that they can be used in a computational process? A computational perspective can aid our understanding of how discourse is produced by demanding a precise specification of the process. If we want to build a system that can perform these tasks, our theory of production must be detailed and accurate. Conversely, if a system is to produce discourse effectively, determining both content and textual shape, then the development and application of principles of discourse structure, discourse coherency, and relevancy criterion are essential to its success.
The TEXT system was implemented in CMU lisp (an extension of Franz Lisp) on a VAX 11/780. The TEXT system source code occupies a total of 1176 K of memory with the following breakdown:
Knowledge base and accessing functions (not including database and database interface functions): 442K
Strategic component: 573K
Tactical component: 145K
The system, including the knowledge base, was loaded into memory in its entirety when the TEXT system was used. Only the database remains on disk. No space problems were encountered during implementation, with one exception: the particular Lisp implementation available does not allow for resetting the size of the recursive name stack. This meant that certain functions which were originally written recursively had to be rewritten iteratively, since the name stack was not large enough to handle them.
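The kind of rewriting that this limitation forced can be illustrated in the abstract: a deeply recursive traversal recast with an explicit stack, so that depth no longer depends on the interpreter's own name stack. The sketch below is generic and is not code from the TEXT system.

def collect_recursive(node):
    # Pre-order traversal written recursively; very deep structures exhaust
    # the interpreter's stack.
    labels = [node["label"]]
    for child in node.get("children", []):
        labels.extend(collect_recursive(child))
    return labels

def collect_iterative(node):
    # The same traversal with an explicit stack, immune to stack-depth limits.
    labels, stack = [], [node]
    while stack:
        current = stack.pop()
        labels.append(current["label"])
        stack.extend(reversed(current.get("children", [])))
    return labels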
Processing speed is another question altogether. Currently the response time of the TEXT system is far from being acceptable for practical use. The bulk of the processing time, however, is used by the tactical component. Since it was not the focal point of this dissertation, no major effort was made to speed up this component. To answer a typical question posed to the TEXT system, the strategic component (including dictionary interface) uses 3290 CPU seconds, an elapsed time of approximately one and a half minutes, while the tactical component uses 43845 CPU seconds, an elapsed time of approximately 20 minutes. Times vary for different questions. These statistics were obtained when using the system in a shared environment. An improvement in speed could be achieved by using a dedicated system. It should be noted, furthermore, that the strategic component is not compiled, while the tactical component is.
Tracking the discourse history involves remembering what has been said in a single session with a user and using that information when generating additional responses. The discourse history can be used to avoid repetition within a single session and, more importantly, to provide responses that contrast with previous answers. Although the maintenance of a discourse history record was not implemented in the TEXT system, an analysis was made of the effects such a history could have on succeeding questions, and of the information that would need to be recorded in order to achieve those effects. In the following sections some examples from each class of questions that the system handles are examined to show how they would be affected by the various kinds of discourse history records that could be maintained.
Possible discourse history records
Several different discourse history types, each containing a different amount of information, are possible. One history type could simply note that a particular question was asked and an answer provided, by maintaining a list of questions. On the other hand, the system could record both the question asked and the actual answer provided in its history. The answer itself could be maintained in any of a number of ways. The history could record the structure and information content of the answer (for TEXT, this would be the instantiated schema). Another possibility would be to record some representation of the surface form of the answer, whether its syntactic structure or the actual text.
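As an illustration, the record types just described might be captured in a structure along the following lines; the names and fields are mine, not TEXT's, and a real record could keep any subset of them.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class HistoryEntry:
    question: str                               # e.g. "(information SHIP)" (placeholder)
    instantiated_schema: Optional[list] = None  # structure and information content of the answer
    surface_answer: Optional[str] = None        # syntactic structure or the actual text

@dataclass
class DiscourseHistory:
    entries: List[HistoryEntry] = field(default_factory=list)

    def record(self, entry: HistoryEntry) -> None:
        self.entries.append(entry)

    def already_asked(self, question: str) -> bool:
        # Enough to avoid repetition; contrastive answers would also need the
        # instantiated schema or the surface form.
        return any(e.question == question for e in self.entries)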
Interest in the generation of natural language is beginning to grow as more systems are developed which require the capability for sophisticated communication with their users. This chapter provides an overview of the development of research in natural language generation. Other areas of research, such as linguistic research on discourse structure, are also relevant to this work, but are overviewed in the pertinent chapters.
The earliest generation systems relied on the use of stored text and templates to communicate with the user. The use of stored text requires the system designer to enumerate all questions the system must be able to answer and write out the answers to these questions by hand so that they can be stored as a whole and retrieved when needed. Templates allow a little more flexibility. Templates are English phrases constructed by the designer with slots which can be instantiated with different words and phrases depending upon the context. Templates may be combined and instantiated in a variety of ways to produce different answers. One main problem with templates is that the juxtaposition of complete English phrases frequently results in awkward or illegal text. A considerable amount of time must be spent by the designer experimenting with different combinations to avoid this problem.
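A small sketch, with an invented template, shows both the flexibility of slot filling and the awkward juxtapositions it can produce; nothing here is drawn from an actual early generation system.

TEMPLATE = "The {entity} has a {attribute} of {value}."

def instantiate(entity, attribute, value):
    # Fill the designer's slots with context-dependent words and phrases.
    return TEMPLATE.format(entity=entity, attribute=attribute, value=value)

# instantiate("ship", "length", "350 feet") reads acceptably, but
# instantiate("ship", "propulsion", "steam turbines") yields the awkward
# "The ship has a propulsion of steam turbines.", the juxtaposition problem
# noted above.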
Both of these methods require a significant amount of hand-encoding, are limited to handling anticipated questions, and cannot be extended in any significant way. They are useful, however, for situations in which only a very limited range of generation is required, particularly because the system can be as eloquent as its designer.
The paragraphs from the first topic group in the Introduction to Working (Terkel 72) are reproduced below.
INTRODUCTION
This book, being about work, is, by its very nature, about violence – to the spirit as well as to the body. It is about ulcers as well as accidents, about shouting matches as well as fistfights, about nervous breakdowns as well as kicking the dog around. It is, above all (or beneath all), about daily humiliations. To survive the day is triumph enough for the walking wounded among the great many of us.
The scars, psychic as well as physical, brought home to the supper table and the TV set, may have touched, malignantly, the soul of our society. More or less. (“More or less,” that most ambiguous of phrases, pervades many of the conversations that comprise this book, reflecting, perhaps, an ambiguity of attitude toward The Job. Something more than Orwellian acceptance, something less than Luddite sabotage. Often the two impulses are fused in the same person.)
It is about a search, too, for daily meaning as well as daily bread, for recognition as well as cash, for astonishment rather than torpor; in short, for a sort of life rather than a Monday through Friday sort of dying. Perhaps immortality, too, is part of the quest. To be remembered was the wish, spoken and unspoken, of the heroes and heroines of this book.