No one can tell you how to do it. The technique must be learned the way I did it, by failures.
John Steinbeck, Travels with Charley
Similarly, a responsive informant could answer questions involving non-terminals, or instead of responding ‘No’ could give the closest valid string.
Jim Horning (Horning, 1969)
There are several situations where the learning algorithm can actively interact with its environment. Instead of using only the data it is given, the algorithm may be able to perform tests, create new strings, and find out how far it may be from the solution. The mathematical setting for this is called active learning, where queries are made to an Oracle.
In this chapter we cover positive and negative aspects of this important paradigm in grammatical inference, but also in machine learning, with again a special focus on the case of learning deterministic finite automata.
About learning with queries
In Section 7.5 we introduced the model of learning from queries (or active learning) in order to produce negative results (which could then also apply to situations where we have less control over the examples) and also to find new inference algorithms in a more helpful but credible learning setting.
Why learn with queries?
Active learning is a paradigm that was first introduced for theoretical reasons, but which can today, for a number of reasons, also be considered a pragmatic approach.
Errors using inadequate data are much less than those using no data at all.
Charles Babbage
It is a capital mistake to theorise before one has data.
Sir Arthur Conan Doyle, A Scandal in Bohemia
Strings are a very natural way to encode information: they appear directly with linguistic data (they will then be words or sentences), or with biological data. Computer scientists have for a long time organised information into tree-like data structures. It is reasonable, therefore, that trees arise in a context where the data has been preprocessed. Typical examples are the parse trees of a program or the parse trees of natural language sentences.
Graphs will appear in settings where the information is more complex: images will be encoded into graphs, and first-order logical formulae also require graphs when one wants to attach a semantics to them.
Grammatical inference is a task where the goal is to learn or infer a grammar (or some device that can generate, recognise or describe strings) for a language, from all sorts of information about this language.
Grammatical inference consists of finding the grammar or automaton for a language of which we are given an indirect presentation through strings, sequences, trees, terms or graphs.
Since grammatical inference is characterised at least as much by the data from which we are asked to learn as by the sort of result it produces, we turn to presenting some possible examples of data.
The mapping between knowledge representation and natural language is fast becoming a focal point of both knowledge engineering (KE) and computational linguistics (CL). Ontologies have a special role to play in this interface. They are essential stepping stones (a) from natural language to knowledge representation and manipulation and (b) from formal theories of knowledge to their application in (natural language) processing. Moreover, the emergence of the Semantic Web initiative presents a unique opportunity to bring research results in this area to real-world applications, at the leading edge of human-language technology. An essential and perhaps foundational aspect of the mapping between knowledge representation and natural language is the interface between ontologies and lexical resources. On the one hand, their integration includes, but is not restricted to, the use of ontologies (a) as language-independent structures of multilingual computational lexicons and (b) as powerful tools for improving the performance of existing lexical resources on various natural language processing (NLP) tasks such as word-sense disambiguation. On the other hand, lexical resources constitute a formidable source of information for generating ontological knowledge both at foundational and domain levels.
This volume aims to be an essential general reference book on the interface between ontology and lexical resources. Given the fast developments in this new research direction, we introduce a general framework and a terminology that accommodate both ontological and lexical perspectives.
On two occasions I have been asked [by members of Parliament], ‘Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?’ I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
Charles Babbage
“Was it all inevitable, John?” Reeve was pushing his fingers across the floor of the cell, seated on his haunches. I was lying on the mattress. “Yes,” I said. “I think it was. Certainly, it's written that way. The end of the book is there before the beginning's hardly started.”
Ian Rankin, Knots and Crosses
When finishing this manuscript, the author decided that a certain number of things had been left implicit in the text, and could perhaps be written out clearly in some place where this would not affect the mathematical reading of the rest.
Let us discuss these points briefly here.
About convergence
Let us suppose, for the sake of argument, that the task we were developing algorithms for was the construction of random number generators. Suppose now that we had constructed such a generator which, given a seed s, would return an endless series of numbers. Some of the questions we might be faced with are:
It is no coincidence that in no known language does the phrase ‘As pretty as an airport’ appear.
Douglas Adams
Learning languages requires, for the process to be of any practical value, agreement on a representation of these languages. We turn to formal language theory to provide us with such meaningful representations, and adapt these classical definitions to the particular task of grammatical inference only when needed.
Automata and finite state machines
Automata are finite state machines used to recognise strings. They correspond to a simplified and limited version of Turing machines: a string is written on the input tape, the string is then read from left to right and, at each step, the next state of the system is chosen depending only on the previous state and the letter or symbol being read. The fact that this is the only information that can be used to parse the string means that the system accepts only a limited class of languages, called the regular languages. The recognition procedure can be made deterministic by allowing only one action at each step (that is, one for each state and each symbol). These deterministic machines (called deterministic finite automata) are usually easier and more pleasant to manipulate, because parsing with them is much more convenient and economical, and also because a number of theoretical results only apply to them.
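As a small illustration of this parsing process, here is a minimal sketch in Python (the class, method names and example language are ours, not the book's): a DFA over the alphabet {a, b} that accepts exactly the strings containing an even number of a's.

```python
# Minimal DFA sketch (illustrative only): states, alphabet, transition function,
# initial state and set of accepting states.
class DFA:
    def __init__(self, states, alphabet, delta, initial, finals):
        self.states = states      # finite set of states
        self.alphabet = alphabet  # finite alphabet
        self.delta = delta        # transition function: (state, symbol) -> state
        self.initial = initial    # initial state
        self.finals = finals      # set of accepting states

    def accepts(self, string):
        """Read the string from left to right; the next state depends only on
        the current state and the symbol being read."""
        state = self.initial
        for symbol in string:
            if (state, symbol) not in self.delta:
                return False      # no transition defined: reject
            state = self.delta[(state, symbol)]
        return state in self.finals

# Two states: 0 = 'even number of a's seen so far', 1 = 'odd number'.
even_a = DFA(
    states={0, 1},
    alphabet={'a', 'b'},
    delta={(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 0, (1, 'b'): 1},
    initial=0,
    finals={0},
)
assert even_a.accepts("abab") and not even_a.accepts("ab")
```

Determinism is visible in the fact that delta is a function: for each state and each symbol there is at most one possible move.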
The biscuit tree. This remarkable vegetable production has never yet been described or delineated.
Edward Lear, Flora Nonsensica
One must always generalise.
Carl Jacobi
Formal language theory has been developed and studied consistently over the past 50 years. Because of their importance in so many fields, strings and tools to manipulate them have been studied with special care, leading to the specific topic of stringology. Usual definitions and results can therefore be found in several textbooks.
We revisit these objects here from a pragmatic point of view: grammatical inference is about learning grammars and automata which are then supposed to be used by programs to deal with strings. Their advantage is that we can parse with them, compare them, compute distances… We are therefore primarily interested in studying how the strings are organised: knowing that a string is in a language (or, perhaps more importantly, out of the language) is not enough. We will also want to know how it belongs or why it does not belong. Other questions might be about finding close strings or building a kernel that takes their properties into account. The goal is therefore to organise the strings, to put some topology over them.
Notations
We start by introducing here some general notations used throughout the book.
In a definition, if_def is a definition 'if'.
The main mathematical objects used in this book are letters and strings.
Among the more interesting remaining theoretical questions are: inference in the presence of noise, general strategies for interactive presentation and the inference of systems with semantics.
Jerome Feldman (Feldman, 1972)
Simplicity does not need to be simple, but the complex, tightened and synthesised.
Alfred Jarry
We describe algorithm LSTAR, introduced by Dana Angluin, which has inspired several variants and adaptations to other classes of languages.
The minimally adequate teacher
A minimally adequate teacher (MAT) is an Oracle that can answer membership queries and strong equivalence queries. In Section 9.2 we analysed the case where one wants to learn with less.
The main algorithm that works in this setting is called LSTAR. Its general idea, sketched in code after the list below, is to:
• find a consistent observation table (representing a DFA),
• submit it as an equivalence query,
• use the counter-example to update the table,
• submit membership queries to make the table closed and complete,
• iterate until the Oracle, upon an equivalence query, tells us that the correct language has been reached.
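As a rough, illustrative sketch of this loop (our code, not the book's pseudo-code), the following Python program runs the idea against a toy teacher for the language of strings over {a, b} with an even number of a's. The class and method names (Teacher, member, equivalent, lstar) and the target language are ours; the teacher answers membership queries exactly and simulates equivalence queries by brute-force search over short strings, and counter-examples are handled by adding all their suffixes as new columns, a standard variant of Angluin's original prefix handling.

```python
from itertools import product

class Teacher:
    """Toy minimally adequate teacher for a fixed regular language
    (strings over {a, b} with an even number of a's) -- illustration only."""
    def __init__(self, alphabet):
        self.alphabet = alphabet

    def member(self, w):                      # membership query
        return w.count('a') % 2 == 0

    def equivalent(self, accepts):            # (approximate) equivalence query
        for n in range(7):                    # brute-force search for a counter-example
            for tup in product(self.alphabet, repeat=n):
                w = ''.join(tup)
                if accepts(w) != self.member(w):
                    return w                  # a string on which the hypothesis is wrong
        return None                           # no counter-example found: accept the hypothesis

def lstar(teacher, alphabet):
    S, E = {''}, {''}                         # row labels (prefixes) and column labels (suffixes)
    T = {}                                    # observation table: T[u + e] = membership of u + e

    def fill():                               # ask membership queries for every missing cell
        for u in S | {s + a for s in S for a in alphabet}:
            for e in E:
                if u + e not in T:
                    T[u + e] = teacher.member(u + e)

    def row(u):
        return tuple(T[u + e] for e in sorted(E))

    while True:
        fill()
        # Closure: every row of the S.Sigma part must also appear as a row of S.
        unclosed = next((s + a for s in S for a in alphabet
                         if row(s + a) not in {row(t) for t in S}), None)
        if unclosed is not None:
            S.add(unclosed)                   # promote the missing row to a new state
            continue

        def accepts(w):                       # the closed table represents a DFA
            state = ''
            for c in w:
                state = next(t for t in S if row(t) == row(state + c))
            return T[state]

        cex = teacher.equivalent(accepts)     # equivalence query
        if cex is None:
            return accepts                    # the Oracle says the correct language is reached
        E.update(cex[i:] for i in range(len(cex) + 1))   # add suffixes of the counter-example

alphabet = ('a', 'b')
recogniser = lstar(Teacher(alphabet), alphabet)
print(recogniser('abab'), recogniser('ab'))   # expected: True False
```

Real implementations also maintain consistency of the table and handle counter-examples more parsimoniously; this sketch keeps only what is needed to show the query-and-answer loop between learner and Oracle.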
The observation table we use is analogous to that described in Section 12.3, so we will use the same formalism here.
An observation table
An observation table is a specific tabular representation of an automaton. An example is given in Table 13.1(a).
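As a purely illustrative example (ours, not the book's Table 13.1), take the language of strings over {a, b} containing an even number of a's, with row labels S = {ε, a} and the single column label E = {ε}. Writing 1 when the concatenation of a row label and a column label belongs to the language and 0 otherwise, the table is:

        ε
  ε     1
  a     0
  b     1
  aa    1
  ab    0

The first two rows are indexed by S, the remaining ones by the elements of S·Σ not already in S. The two distinct row values, (1) and (0), correspond to the two states of the smallest DFA for the language, and the table is closed because every row of the S·Σ part already appears in the S part.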
Chance has a very nasty temper and a great fondness for practical jokes.
Arturo Pérez-Reverte
All knowledge degenerates into probability.
David Hume, A Treatise of Human Nature, 1740
If we suppose that the data have been obtained through sampling, this means that there is (or at least that we believe in) an underlying probability distribution over the strings. In most cases we do not have a description of this distribution; we describe three plausible learning settings.
The first possibility is that the data are sampled according to an unknown distribution, and that whatever we learn from, the data will be measured with respect to this distribution. This corresponds to the well-known PAC-learning setting (probably approximately correct).
The second possibility is that the data are sampled according to a distribution itself defined by a grammar or an automaton. The goal will now no longer be to classify strings but to learn this distribution. The quality of the learning process can then be measured either while accepting a small error (most of the time, since a particular sampling can have been completely corrupted!), or in the limit, with probability one. One can even hope for a combination of both these criteria.
There are other possible related settings that we only mention briefly here: an important one concerns the case where the distribution in the PAC model is computable, without being generated by a grammar or an automaton. The problem remains a classification question for which we have only restricted the class of admissible distributions.
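To make the first of these settings concrete: in the PAC criterion (stated here in our own notation), for any ε, δ > 0 the learner must, from a sample of polynomial size drawn independently according to the unknown distribution D, return a hypothesis H such that

$\Pr\big[\Pr_{x \sim D}[H(x) \neq L(x)] \le \epsilon\big] \ge 1 - \delta,$

where L is the target language and the outer probability is taken over the random draw of the learning sample.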
If your experiment needs statistics, you ought to have done a better experiment.
Ernest Rutherford
‘I think you're begging the question,’ said Haydock, ‘and I can see looming ahead one of those terrible exercises in probability where six men have white hats and six men have black hats and you have to work it out by mathematics how likely it is that the hats will get mixed up and in what proportion. If you start thinking about things like that, you would go round the bend. Let me assure you of that!’
Instead of defining a language as a set of strings, there are good reasons to consider the seemingly more complex idea of defining a distribution over strings. The distribution can be regular, in which case the strings are then generated by a probabilistic regular grammar or a probabilistic finite automaton. We are also interested in the special case where the automaton is deterministic.
Once distributions are defined, distances between the distributions (and thereby between the syntactic objects that represent them) can be defined, and in some cases they can be conveniently computed.
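A typical example (in notation of ours, not necessarily the book's) is the distance for the L1 norm between two distributions D and D' over Σ*:

$d_{1}(\mathcal{D}, \mathcal{D}') = \sum_{w \in \Sigma^{*}} \big|\Pr_{\mathcal{D}}(w) - \Pr_{\mathcal{D}'}(w)\big|,$

with analogous definitions for the L2 and L∞ norms; the Kullback-Leibler divergence is a standard non-symmetric alternative.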
Distributions over strings
Given a finite alphabet Σ, the set Σ* of all strings over Σ is enumerable, and a distribution can therefore be defined over it.
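More precisely, writing Pr_D(w) for the probability assigned to a string w, a distribution (or stochastic language) D over Σ* is a function into [0, 1] such that

$\sum_{w \in \Sigma^{*}} \Pr_{\mathcal{D}}(w) = 1.$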
A semantically insightful way to describe events of change is in terms of their preconditions and effects. For example, if a car accelerates, the value of the ‘speed’ attribute as applied to the car's motion is higher in its effect than it had been in the precondition of the acceleration event; and if the importance of a certain political theory increases, the value of the modality ‘saliency’ scoping over that political theory is higher in its effect than it had been in the precondition of the increase event. Consider the following examples of change events drawn from the Wall Street Journal corpus covering 1987, 1988, 1989:
Other food stocks rallied in response to the offer for Kraft. Gerber Products shot up 4⅜ to 57¾, CPC International rose 2¼ to 55½, General Mills gained 2⅛ to 54½, Borden rose 1¾ to 56½, Quaker Oats went up 1⅜ to 56⅜ and H.J. Heinz went up 1⅜ to 47¾.
This development follows recent proposals from Saudi Arabia, Kuwait, the United Arab Emirates and Qatar that the current OPEC production ceiling of 16.6 million barrels a day should be increased by 900,000 barrels a day.
In 1985, 3.9 million women were enrolled in four-year schools. Their number increased by 49,000 in 1986.
Interco shot up 4 to 71¾ after a Delaware judge barred its poison pill defense against the Rales group's hostile $74 offer.
The establishment of mappings with precise semantics between formal ontologies constitutes one of the central tasks in ontology research. This requirement primarily affects ontologies and their reciprocal understandability in shared environments such as the Semantic Web, while also providing a solid infrastructure for the interoperability of ontolex resources.
Suppose that a naive agent, human or artificial, wants to know the meaning of the word ‘thesaurus’. A query submitted to WordNet returns the gloss: a book containing a classified list of synonyms. Navigating through the upward hierarchy, our agent might discover that a {book} is an {artefact}, and then a {physical object}. This result is trivial, indeed. But what about the ‘content’ of the book? Does it make sense to refer to contents as mere physical objects? To us, as human beings, this is obviously not the case. However, we know this is ‘obvious’ because of our (relatively) huge background world knowledge. Is there a conceptual model that can help a naive agent (e.g. a personal software agent that needs to be trained) to shape this knowledge? SUMOwn, Cycwn and DOLCEwn (respectively, the integration of SUMO, Cyc and DOLCE with WordNet) can be exploited for this task. For example, SUMOwn represents book as a ContentBearingPhysical, namely something physical that ‘contains’ some information about a given topic; Cycwn gives a similar conceptualization by linking thesaurus to Object-type entities, those collections of things that are partially like a physical object.
Understanding is compression, comprehension is compression!
Greg Chaitin (Chaitin, 2007)
I understand. You are talking about a game in which the rules are not the starting line, but the finishing point. Right?
Arturo Pérez-Reverte, El pintor de batallas
‘Learning from an informant’ is the setting in which the data consists of labelled strings, each label indicating whether or not the string belongs to the target language.
Of all the issues which grammatical inference scientists have worked on, this is probably the one on which most energy has been spent over the years. Algorithms have been proposed, competitions have been launched, theoretical results have been given. On the one hand, the problem has been proved to be on a par with mighty theoretical computer science questions arising from combinatorics, number theory and cryptography; on the other hand, cunning heuristics and techniques employing ideas from artificial intelligence and language theory have been devised.
There would be a point in presenting this theme with a special focus on the class of context-free grammars, in the hope that the theory for the particular class of finite automata would follow, but the history and the techniques tell us otherwise. The main focus is therefore going to be on the simpler yet sufficiently rich question of learning deterministic finite automata from positive and negative examples.