The Cambridge Companion to the African American Novel presents new essays covering the one-hundred-and-fifty-year history of the African American novel. Experts in the field from the US and Europe address some of the major issues in the genre: passing, the Protest novel, the Blues novel, and womanism, among others. The essays are full of fresh insights for students into the symbolic, aesthetic, and political function of canonical and non-canonical fiction. Chapters examine works by Ralph Ellison, Leon Forrest, Toni Morrison, Ishmael Reed, Alice Walker, John Edgar Wideman, and many others. They reflect a range of critical methods intended to prompt new and experienced readers to consider the African American novel as a cultural and literary act of extraordinary significance. This volume, including a chronology and guide to further reading, is an important resource for students and teachers alike.
The goal of this chapter is to show that even complex recursive NLP tasks such as parsing (assigning syntactic structure to sentences using a grammar, a lexicon and a search algorithm) can be redefined as a set of cascaded classification problems with separate classifiers for tagging, chunk boundary detection, chunk labeling, relation finding, etc. In such an approach, input vectors represent a focus item and its surrounding context, and output classes represent either a label of the focus (e.g., part of speech tag, constituent label, type of grammatical relation) or a segmentation label (e.g., start or end of a constituent). In this chapter, we show how a shallow parser can be constructed as a cascade of MBLP-classifiers and introduce software that can be used for the development of memory-based taggers and chunkers.
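To make this instance representation concrete, the following sketch (illustrative Python, not the software introduced later in this chapter) shows one common encoding: a part-of-speech-tagged sentence is recoded as fixed-width windowed instances for a chunk segmentation step, with IOB-style output classes (B = begin of chunk, I = inside, O = outside).

```python
# A minimal sketch of windowed instance creation for chunking.
# Each instance pairs a focus token and its local context with an
# IOB-style segmentation label.

def make_instances(tokens, tags, labels, width=2):
    """Turn one tagged sentence into windowed classification instances.

    tokens/tags: words and their part-of-speech tags
    labels:      gold IOB chunk labels, one per token
    width:       number of context positions on each side of the focus
    """
    pad = ["_"] * width
    toks = pad + tokens + pad
    tgs = pad + tags + pad
    instances = []
    for i, label in enumerate(labels):
        # Feature vector: word window plus tag window around the focus.
        window = toks[i : i + 2 * width + 1] + tgs[i : i + 2 * width + 1]
        instances.append((window, label))
    return instances

sentence = ["The", "cat", "sat", "on", "the", "mat"]
postags  = ["DT", "NN", "VBD", "IN", "DT", "NN"]
chunks   = ["B-NP", "I-NP", "B-VP", "B-PP", "B-NP", "I-NP"]

for features, label in make_instances(sentence, postags, chunks):
    print(features, "->", label)
```

Each classifier in the cascade (tagging, boundary detection, labeling, relation finding) can consume instances built in this same windowed fashion, differing only in the features included and the class set predicted.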
Although in principle full parsing could be achieved in this modular, classification-based way (see section 5.5), the approach is better suited to shallow parsing. Partial or shallow parsing, as opposed to full parsing, recovers only a limited amount of syntactic information from natural language sentences. Shallow parsing is especially useful in applications such as information retrieval, question answering, and information extraction, where large volumes of often ungrammatical text have to be analyzed in an efficient and robust way; for these applications a complete syntactic analysis may provide too much or too little information.
An MBLP system as introduced in the previous chapters has two components: a learning component which is memory-based, and a performance component which is similarity-based. The learning component is memory-based as it involves storing examples in memory (also called the instance base or case base) without abstraction, selection, or restructuring. In the performance component of an MBLP system the stored examples are used as a basis for mapping input to output; input instances are classified by assigning them an output label. During classification, a previously unseen test instance is presented to the system. The class of this instance is determined on the basis of an extrapolation from the most similar example(s) in memory. There are different ways in which this approach can be operationalized. The goal of this chapter is twofold: to provide a clear definition of the operationalizations we have found to work well for NLP tasks, and to provide an introduction to TIMBL, a software package implementing all algorithms and metrics discussed in this book. The emphasis on hands-on use of software in a book such as this deserves some justification. Although our aims are mainly theoretical – to show, by argumentation and experiment, that MBLP has the right bias for solving NLP tasks – we believe that the strengths and limitations of any algorithm can only be understood in sufficient depth by experimenting with that algorithm.
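The toy implementation below, given purely as an illustration, makes the two components concrete under the simplest operationalization: an overlap (mismatch-counting) distance and a majority vote among the k nearest neighbors. TIMBL adds feature weighting, distance weighting, and other metrics on top of this basic scheme.

```python
# A deliberately small sketch of a memory-based learner:
# learning = storage of examples; performance = similarity-based
# extrapolation from the stored examples.
from collections import Counter

class MemoryBasedLearner:
    def __init__(self, k=1):
        self.k = k
        self.memory = []          # the instance base: stored examples

    def learn(self, instances):
        # Learning component: store examples without abstraction,
        # selection, or restructuring.
        self.memory.extend(instances)

    def classify(self, features):
        # Performance component: find the k most similar stored
        # examples under the overlap metric and vote on the class.
        def distance(example):
            stored, _ = example
            return sum(a != b for a, b in zip(stored, features))
        nearest = sorted(self.memory, key=distance)[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]
```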
This book presents a simple and efficient approach to solving natural language processing problems. The approach is based on the combination of two powerful techniques: the efficient storage of solved examples of the problem, and similarity-based reasoning on the basis of these stored examples to solve new ones.
Natural language processing (NLP) is concerned with the knowledge representation and problem solving algorithms involved in learning, producing, and understanding language. Language technology, or language engineering, uses the formalisms and theories developed within NLP in applications ranging from spelling error correction to machine translation and automatic extraction of knowledge from text.
Although the origins of NLP are both logical and statistical, as in other disciplines of artificial intelligence, historically the knowledge-based approach has dominated the field. This has resulted in an emphasis on logical semantics for meaning representation, on the development of grammar formalisms (especially lexicalist unification grammars), and on the design of associated parsing methods and lexical representation and organization methods. Well-known textbooks such as Gazdar and Mellish (1989) and Allen (1995) provide an overview of this ‘rationalist’ or ‘deductive’ approach.
The approach in this book is firmly rooted in the alternative empirical (inductive) approach. From the early 1990s onwards, empirical methods based on statistics derived from corpora have been adopted widely in the field. There were several reasons for this.
This book is a reflection of about twelve years of work on memory-based language processing. It reflects on the central topic from three perspectives. First, it describes the influences from linguistics, artificial intelligence, and psycholinguistics on the foundations of memory-based models of language processing. Second, it highlights applications of memory-based learning to processing tasks in phonology and morphology, and in shallow parsing. Third, it ventures an answer to the question of why memory-based learning fills a unique role in the larger field of machine learning of natural language: it is the only approach that does not abstract away from its training examples. In addition, we provide tutorial information on the use of TIMBL, a software package for memory-based learning, and an associated suite of software tools for memory-based language processing.
For us, the direct inspiration for starting to experiment with extensions of the k-nearest neighbor classifier to language processing problems was the successful application of the approach by Stanfill and Waltz to grapheme-to-phoneme conversion in the eighties. During the past decade we have been fortunate to have expanded our work with a great team of fellow researchers and students on memory-based language processing in two locations: the ILK (Induction of Linguistic Knowledge) research group at Tilburg University, and CNTS (Center for Dutch Language and Speech) at the University of Antwerp. Our own first implementations of memory-based learning were soon superseded by well-coded software systems by Peter Berck, Jakub Zavrel, Bertjan Busser, and Ko van der Sloot.
As argued in chapter 1, if a natural language processing task is formulated as either a disambiguation task or a segmentation task, it can be presented as a classification task to a memory-based learner, as well as to any other machine learning algorithm capable of learning from labeled examples. In this chapter as well as in the next we provide examples of how we formulate tasks in an MBLP framework. We start with one disambiguation and one segmentation task operating at the phonological and morphological levels, respectively.
A non-trivial portion of the complexity of natural languages is determined at the phonological and morphological levels, where phonemes and morphemes come together to form words. A language's phoneme inventory is based on many individual observations in which changing one particular speech sound of a spoken word into another changes the meaning of the word. A morpheme is usually identified as a string of phonemes carrying meaning on its own; a special class of morphemes, affixes, does not carry meaning on its own, but instead adds or changes some aspect of meaning when attached to a morpheme or string of morphemes.
One major problem of natural language processing in the phonological and morphological domains is that many sequences of phonemes and morphemes have highly ambiguous surface written forms; this is especially so in alphabetic writing systems, where the relation between letters and phonemes is itself ambiguous.
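As a simplified sketch of the disambiguation task at the phonological level (not the book's exact encoding), grapheme-to-phoneme conversion can be recast as classification by making each letter of a word the focus of a fixed-width window and taking that letter's aligned phoneme as the class, with a null class for letters that do not map to a phoneme of their own:

```python
# Illustrative encoding of grapheme-to-phoneme conversion as
# classification: one instance per letter, class = aligned phoneme,
# '-' = a letter that is not pronounced separately.
def letter_windows(word, phonemes, width=3):
    padded = "_" * width + word + "_" * width
    return [(list(padded[i : i + 2 * width + 1]), phonemes[i])
            for i in range(len(word))]

# 'booking': the double 'o' maps to one phoneme, as does 'ng'.
for features, cls in letter_windows("booking",
                                    ["b", "u", "-", "k", "I", "N", "-"]):
    print(features, "->", cls)
```

Morphological segmentation can be encoded analogously, with segmentation classes (e.g., marking whether a morpheme boundary starts at the focus letter) instead of phoneme classes.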
The concepts of abstraction and generalization are tightly coupled to Ockham's razor, a medieval scientific principle which is still regarded in many branches of modern science as fundamentally true. The principle is commonly quoted as “entia non sunt multiplicanda praeter necessitatem”, or, freely translated in the imperative form, delete all elements in a theory that are not necessary. The goal of its application is to maximize economy and generality: it favors small theories over large ones when they have the same expressive power. The latter can be read as ‘having the same generalization accuracy’, which, as we have exemplified in the previous chapters, can be estimated through validation tests with held-out material.
A twentieth-century incarnation of Ockham's razor is the minimal description length (MDL) principle (Rissanen, 1983), coined in the context of computational learning theory. It has been used as the leading principle in the design of decision tree induction algorithms such as C4.5 (Quinlan, 1993) and rule induction algorithms such as RIPPER (Cohen, 1995). The goal of these algorithms is to find a compact representation of the classification information in the given learning material that at the same time generalizes well to unseen material. C4.5 uses decision trees; RIPPER uses ordered lists of rules to meet that end.
In contrast, memory-based learning is not minimal – its description length is equal to the amount of memory it takes to store the learning examples. Keeping all learning examples in memory is anything but economical.
Memory-Based Language Processing, MBLP, is based on the idea that learning and processing are two sides of the same coin. Learning is the storage of examples in memory, and processing is similarity-based reasoning with these stored examples. Although we have developed a specific operationalization of these ideas, they have been around for a long time. In this chapter we provide an overview of similar ideas in linguistics, psychology, and computer science, and end with a discussion of the crucial lesson learned from this literature, namely, that generalization from experience to new decisions is possible without the creation of abstract representations such as rules.
Inspirations from linguistics
While the rise of Chomskyan linguistics in the 1960s is considered a turning point in the development of linguistic theory, it is mostly before this time that we find explicit and sometimes adamant arguments for the use of memory and analogy that explain both the acquisition and the processing of linguistic knowledge in humans. We compress this into a brief review of thoughts and arguments voiced by the likes of Ferdinand de Saussure, Leonard Bloomfield, John Rupert Firth, Michael Halliday, Zellig Harris, and Royal Skousen, and we point to related ideas in psychology and cognitive linguistics.
This chapter describes two complementary extensions to memory-based learning: a search method for optimizing parameter settings, and methods for reducing the near-sightedness of the standard memory-based learner to its own contextual decisions in sequence processing tasks. Both extend the core algorithm as discussed so far. Both methods have a wider applicability than just memory-based learning, and can be combined with any classification-based supervised learning algorithm.
First, in section 7.1 we introduce a search method for finding optimal algorithmic parameter settings. No universal rules of thumb exist for setting parameters such as the k in the k-NN classification rule, or the feature weighting metric, or the distance weighting metric. They also interact in unpredictable ways. Yet, parameter settings do matter; they can seriously change generalization performance on unseen data. We show that applying heuristic search methods in an experimental wrapping environment (in which a training set is further divided into training and validation sets) can produce good parameter settings automatically.
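The sketch below illustrates the wrapping setup in its most naive form: the original training set is split into a training part and a validation part, candidate settings are enumerated exhaustively (the heuristic search methods of this section sample this space more cleverly), and the best-scoring setting is kept. The function names and parameter grid are illustrative only.

```python
# A bare-bones wrapper for parameter selection: try candidate settings
# on a held-out validation part of the training data and keep the best.
from itertools import product

def wrapper_search(train, validate, build, param_grid):
    """train/validate: labeled instance lists; build: callable returning
    a trained classifier for one parameter setting (hypothetical helper);
    param_grid: dict mapping parameter names to candidate values."""
    best_score, best_params = -1.0, None
    names = list(param_grid)
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        clf = build(train, **params)
        # Generalization estimate: accuracy on the validation split.
        correct = sum(clf.classify(f) == y for f, y in validate)
        score = correct / len(validate)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score
```

With the toy learner sketched earlier, `param_grid` might range over values of k; in a realistic setting it would also include the feature weighting and distance weighting metrics, whose interactions motivate automated search.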
Second, in section 7.2 we describe two technical solutions to the problem of “sequence near-sightedness”, from which many machine-learning classifiers and stochastic models suffer when they predict class symbols without coordinating one prediction with another. When such a classifier performs a natural language sequence task, producing output class symbol by class symbol, it cannot prevent itself from generating output sequences that are impossible or invalid, because information about the output sequence being generated is not available to the learner.
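One family of remedies, sketched below in illustrative form on top of the toy learner above, feeds a classifier's own previous decisions back into the feature vectors of subsequent instances, so that information about the output sequence being generated does become available to the learner; during training, the gold-standard labels of the preceding positions play this role.

```python
# Illustrative feedback loop: the classifier's previous output
# decisions are appended to the features of the next instance.
def tag_with_feedback(clf, windows, history=1):
    """windows: per-token feature vectors of one sentence, in order.
    At test time the classifier's own (possibly noisy) predictions
    are fed back; training instances would carry gold labels instead."""
    outputs = []
    for features in windows:
        prev = outputs[-history:]
        prev = ["_"] * (history - len(prev)) + prev  # pad sentence start
        outputs.append(clf.classify(features + prev))
    return outputs
```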
It follows then that the reason why I – and many others who have written of such events – are compelled to look back in sorrow is because we cannot look ahead.
(“The Greatest Sorrow,” The Imam and the Indian: 317)
The Circle of Reason
Vomited out of their native soil years ago in another carnage, and dumped hundreds of miles away, they had no anger left. Their only passion was memory… Lalpukur could fight no war because it was damned to a hell of longing.
(The Circle of Reason: 59)
The Story
When he is eight, “Alu” comes to sleepy Lalpukur from Calcutta to live with his uncle Balaram and aunt Toru-debi. He had been given his nickname by his phrenologist uncle, since his large head looked something like a potato and portended an interesting future – at least, so his uncle thought. His parents had recently died in a car accident. Even though Balaram and his brother had been long estranged, Balaram and Toru-debi decide to take in Alu and raise him, since they had no children of their own. Alu soon displays an amazing ability to pick up various languages. Yet, in one of the many paradoxes that run through the novel, he rarely speaks at all. When at fourteen the boy stops attending school, Balaram, the supposed scientist, surprises everyone by encouraging the boy to take up weaving. Alu begins by taking lessons from Shombhu-Debnath, a master weaver. Alu seems a gifted child, just as Balaram had predicted: not only is he good at languages (which he doesn't use), but now he also surpasses his teacher in weaving.
Considering the book's title, how might reason be seen as a central topic in the book? What are its strengths and appeal in today's world, and its limitations as portrayed in the novel? Circularity also plays a large thematic role, contrasted with Bhudeb Roy's strict linear logic. Which appears to be more powerful? More helpful?
How is chance a factor in the plot? Does it have “significance,” or is it merely serendipity?
How does traditional Indian literature play a part in the book? How does Ghosh use connotations from this earlier writing to enhance his themes?
Demonstrate the use of magic realism in the book. How helpful would you say these devices are in support of Ghosh's themes? How would you compare them to Salman Rushdie's writing, or to any of the Central and South American writers like Gabriel Garcia Marquez who frequently use similar techniques?
Is Jyoti Das a necessary character?
How would you characterise the portrayal of women in this book?
Discuss the clash of traditional and modern value systems in the book.
Does the direction of the flight/quest – westward – have any implications for the theme of this book? What is being sought by the various characters? How successful is each in this quest?
Is the book hopeful? Humanistic?
Does the book show evidence of being a first novel?
It seemed uncanny that I had never known all those years that in defiance of the enforcers of History, a small remnant of Bomma's world had survived, not far from where I had been living.
(In An Antique Land: 342)
The Story
Ghosh begins his account in Lataifa, the little Egyptian village where he stationed himself as an Oxford University graduate student in anthropology. Doctor Aly Issa, a professor at the University of Alexandria, has brought Ghosh to the home of Abu-'Ali, and it is there that he rents a room during his stay in Egypt. Ghosh does not especially relish living there, since Abu-'Ali, in his mid-fifties, is a somewhat overbearing small businessman. In fact, Ghosh describes him as “profoundly unlovable” (23), but recognises him as someone who prompts a rather fearful respect from the villagers. After a while, Dr. Issa arranges for Ghosh to move out of Lataifa to Nashawy, a larger town.
Another of the major players in the village is Shaikh Musa, also in his mid-fifties, who runs a government-subsidised shop retailing essential commodities at controlled prices. Ahmed and Jabir are his sons; Sakkina, their age, is Shaikh Musa's second wife (as Ghosh awkwardly discovers). She is the daughter of Ustaz (“teacher”) Mustapha, and is Abu-'Ali's great grand-niece. The names begin to proliferate, and the reader begins to experience the disorientation that must have been Ghosh's as well. As things develop, Mustapha – and a good many other people – seems to be interested in converting Ghosh to Islam.
Weaving is Reason, which makes the world mad and makes it human. … It is a technique for laying a cross-thread … between parallel long threads …, so that they lock the weft in place.
(The Circle of Reason, 58, 74)
It is when we think of the world the aesthetic of indifference might bring into being that we recognize the urgency of remembering the stories we have not written
(“The Ghosts of Mrs. Gandhi,” The Imam and the Indian: 62).
If the first quote reminds us of the joy Amitav Ghosh takes in telling his interlocking stories, the second striking quote underscores the sense of vocation that he brings to the task. We must note, first of all, that writing is a career he chose after, or in the course of, an academic career as a trained anthropologist with a doctorate from a good school. If his novels and essays show strong evidence of that anthropological training – in their careful observation of their characters, surroundings, and history; their implied comparative sweep of cultures and eras; their implied philosophical investigation of what it means to be a human being – they just as strongly show the novelist's delight in narrative, in character development, in themes and symbols and the other stylistic devices that might seem extraneous to strict academic investigation. In short, Ghosh hitches his wagon to imagination, and especially to stories. The second quote, though, forcefully suggests that he retains the anthropologist's dedication to “remembering” stories that otherwise slip from consciousness and from recorded history.
Dancing in Cambodia, At Large in Burma (1998); Countdown (1999); Ghosh's afterword to Chan Chao's book of photography, Burma: Something Went Wrong (2000); and The Imam and the Indian: Prose Pieces (2002)
This chapter considers Amitav Ghosh's non-fictional writing. Viewed together, the collected essays demonstrate his chief concerns:
The nuclearisation of the subcontinent (Countdown; The Ghat of the Only World)
The current political crisis in Burma and Cambodia (Dancing in Cambodia, At Large in Burma; The Global Reservation; Burma: Something Went Wrong)
The maintenance of cultural heritage (Dancing in Cambodia; The Hunger of Stones; The Human Comedy in Cairo)
Pre-European commerce between India and Africa (The Slave of MS. H.6)
Fundamentalism (The Imam and the Indian; An Egyptian in Baghdad; The Fundamentalist Challenge)
Anthropology and Economics in local communities (Categories of Labour and the Orientation of the Fellah Economy; The Relations of Envy in an Egyptian Village)
The Diaspora (The Imam and the Indian; Tibetan Dinner; The Diaspora in Indian Culture; The March of the Novel Through History)
Viewed more broadly, many of these pieces share the author's abiding concern for the impact of broad historical movements on individuals caught up in events beyond their control, for the importance of connections between the past and the present, and for the desirability of finding avenues of communication that obviate nationalistic manias.
It is unusual for a novelist to produce as rich a body of essays as Amitav Ghosh has.