A neural network self-organizes if learning proceeds without any evaluation of the relevance of the output states. The input states are the only data provided, and during the learning session no attention is paid to the performance of the network. How information is embedded in the system obviously depends on the learning algorithm, but it also depends on the structure of the input data and on architectural constraints.
The latter point is of paramount importance. In the first chapter we saw that the central nervous system is highly structured, that the topologies of the signals conveyed by the sensory tracts are somehow preserved in the primary areas of the cortex, and that different parts of the cortex process well-defined types of information. A comprehensive theory of neural networks must account for the architecture of the networks. Up to now this has hardly been the case, since only two types of structure have been distinguished: fully connected networks and feedforward layered systems. In reality the structures themselves are the result of the interplay between a genetically determined gross architecture (the sprouting of neuronal contacts towards defined regions of the system, for example) and the modification of this crude design by learning and experience (the pruning of contacts). The topology of the networks, the functional significance of their structures and the form of the learning rules are therefore closely intertwined entities. There is as yet no global theory explaining why the structure of the CNS is the one we observe and how its different parts cooperate to produce such an efficient system, but there have been some attempts to explain at least the simplest functional organizations, those of the primary sensory areas in particular.
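To make the interplay between input structure, architecture and unsupervised learning slightly more concrete, the sketch below shows a Kohonen-style topology-preserving map in which a chain of units organizes itself without any evaluation of its output. It is an illustrative reconstruction, not a model taken from the text; the network size, learning rate and neighbourhood width are arbitrary choices.

```python
# A minimal sketch of topology-preserving self-organization in the spirit of
# Kohonen's self-organizing map: a 1-D chain of units learns, from input samples
# alone, to map 2-D inputs so that neighbouring units respond to neighbouring
# stimuli. All sizes and parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_units = 20                               # units along a 1-D "cortical" chain
w = rng.uniform(0.0, 1.0, (n_units, 2))    # synaptic weight vectors for 2-D inputs

def train(w, n_steps=5000, eta=0.1, sigma=3.0):
    for _ in range(n_steps):
        x = rng.uniform(0.0, 1.0, 2)                        # unsupervised input sample
        winner = np.argmin(np.linalg.norm(w - x, axis=1))   # best-matching unit
        # Neighbourhood function on the chain: nearby units learn together,
        # which is what preserves the topology of the input signals.
        d = np.arange(n_units) - winner
        h = np.exp(-d**2 / (2.0 * sigma**2))
        w += eta * h[:, None] * (x - w)                     # Hebbian-like update, no error signal
    return w

w = train(w)
```

After training, plotting the rows of w in the input plane would show them ordered along the chain, the hallmark of a topology-preserving map.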
This text started with a description of the organization of the human central nervous system and it ends with a description of the architecture of neurocomputers. An inattentive reader would conclude that the latter is an implementation of the former, which obviously cannot be true. The only claim is that a small but significant step towards the understanding of the processes of cognition has been made in recent years. The most important issue is probably that recent advances have made it more and more conspicuous that real neural networks can be treated as physical systems. Theories can be built and predictions can be compared with experimental observations. This methodology brings the neurosciences at large closer and closer to the classical ‘hard’ sciences such as physics or chemistry. The text strives to explain some of the progress in the domain, and we have seen how productive the imagination of theoreticians is.
For some biologists, however, the time of theorizing about neural nets has not come yet owing to our current lack of knowledge in the field. The question is: are the models we have introduced in the text really biologically relevant? This is the issue I would like to address in this last chapter. Many considerations are inspired by the remarks which G. Toulouse gathered in the concluding address he gave at the Bat-Sheva seminar held in Jerusalem in May 1988.
Clearly, any neuronal dynamics can always be implemented on classical computers, and we may therefore wonder why it is interesting to build dedicated neuronal machines. The answer is two-fold:
Owing to the inherent parallelism of neuronal dynamics, the time gained by using dedicated machines rather than conventional ones can be considerable, making it possible to solve problems which are out of the reach of the most powerful serial computers.
It is perhaps even more important to become aware that dedicated machines compel one to think differently about the problems one has to solve. Programming a neurocomputer does not mean writing a linear series of instructions, step by step. Instead, one is forced to think more globally in terms of phase space, to figure out an energy landscape and to determine an expression for this energy. Z. Pylyshyn made this point clear enough in the following statement (quoted by D. Waltz):
‘What is typically overlooked (when we use a computational system as a cognitive model) is the extent to which the class of algorithms that can even be considered is conditioned by the assumptions we make regarding what basic operations are possible, how they may interact, how operations are sequenced, what data structures are possible and so on. Such assumptions are an intrinsic part of our choice of descriptive formalism.’
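To make the ‘energy landscape’ way of programming slightly more concrete, here is a minimal Hopfield-style sketch, offered only as an illustration: the network size, the stored patterns and the Hebbian prescription for the couplings are arbitrary choices, not anything prescribed by the text.

```python
# A minimal sketch of "programming by energy landscape": memories are embedded
# with a Hebbian rule, and the asynchronous dynamics never increases the energy
#   E(s) = -1/2 * sum_ij J_ij s_i s_j .
# Sizes and names are illustrative.
import numpy as np

rng = np.random.default_rng(1)
N = 100
patterns = rng.choice([-1, 1], size=(3, N))     # states to be stored

J = (patterns.T @ patterns) / N                 # Hebbian couplings
np.fill_diagonal(J, 0.0)

def energy(s):
    return -0.5 * s @ J @ s

def relax(s, sweeps=10):
    """Asynchronous threshold updates; the energy never increases."""
    s = s.copy()
    for _ in range(sweeps):
        for i in rng.permutation(N):
            s[i] = 1 if J[i] @ s >= 0 else -1
    return s

noisy = patterns[0].copy()
noisy[:10] *= -1                                # corrupt 10 bits
recalled = relax(noisy)
print(energy(noisy), energy(recalled))          # energy decreases towards a stored minimum
```

The ‘program’ here is nothing but the choice of the couplings J, i.e. of the energy function; the dynamics then does the rest.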
Mind has always been a mystery and it is fair to say that it is still one. Religions settle this irritating question by assuming that mind is non-material: it is merely linked to the body for the duration of a life, a link that death breaks. It must be realized that this metaphysical attitude pervaded even the theorization of natural phenomena: to ‘explain’ why a stone falls and a balloon filled with hot air tends to rise, Aristotle, in the fourth century BC, assumed that stones house a principle (a sort of mind) which makes them fall and that balloons embed the opposite principle which makes them rise. Similarly Kepler, at the turn of the seventeenth century, thought that the planets were maintained on their elliptical tracks by some immaterial spirits. To cite a last example, chemists were convinced for quite a while that organic molecules could never be synthesized, since their synthesis required the action of a vital principle. Archimedes, about a century after Aristotle, Newton, a century after Kepler, and Wöhler, who carried out the first synthesis of urea using only mineral materials, disproved these prejudices and, at least for positivists, there is no reason why mind should be kept outside the realm of experimental observation and logical reasoning.
We find in Descartes the first modern approach to mind.
Neural networks are at the crossroads of several disciplines and the putative range of their applications is immense. The exploration of the possibilities is just beginning. Some domains, such as pattern recognition, which seemed particularly suited to these systems, still resist analysis. On the other hand, neural networks have proved to be a convenient tool for tackling combinatorial optimization problems, a domain to which at first sight they had no application. This indicates how difficult it is to foresee the main lines of the developments yet to come. All that can be done now is to give a series of examples, which we will strive to arrange in a logical order, although the link between the various topics is sometimes tenuous. Most of the applications we shall present were put forward before the fall of 1988.
Domains of application of neural networks
Neural networks can be used in different contexts:
For the modeling of simple biological structures whose functions are known. The study of central pattern generators is an example.
For the modeling of higher functions of central nervous systems, in particular of properties such as memory, attention, etc., which experimental psychology strives to quantify. Two strategies may be considered. The first consists in explaining the function of a given neural formation (as far as the function is well understood) by taking all available data on its actual structure into account. This strategy has been put forward by Marr in his theory of the cerebellum. The other strategy consists in looking for the minimal constraints that a neuronal architecture has to obey in order to account for some psychophysical property. The structure is now a consequence of the theory. If the search has been successful, it is tempting to identify the theoretical construction with biological structures which display the same organization.
This text is the result of two complementary experiences which I had in 1987 and 1988. The first was the opportunity, which I owe to Claude Godrèche, of delivering, in a pleasant seaside resort in Brittany, a series of lectures on the theory of neural networks. Claude encouraged me to write the proceedings in the form of a pedagogical book, a text which could be useful to the many people who are interested in the field. The second was a one-year sabbatical which I spent at the Hebrew University of Jerusalem on a research program on spin glasses and neural networks. The program was initiated by the Institute for Advanced Studies and organized by a team of distinguished physicists and biologists, namely Moshe Abeles, Hanoch Gutfreund, Haim Sompolinsky and Daniel Amit. Throughout the year, the Institute welcomed a number of researchers who each shed a different light on a multi-faceted subject. The result is this introduction to the modeling of neural networks.
First of all, it is an introduction. Indeed, the field evolves so fast that it is already impossible to encompass its various aspects within a single account.
It is also an introduction in the sense that it adopts a particular perspective, one which rests on the fundamental hypothesis that the information processed by nervous systems is encoded in the individual neuronal activities. This is the most widely accepted point of view. However, other assumptions have been suggested.
This chapter is not a course on neurobiology. As stated in the title, it is intended to gather a few facts relevant to neural modeling, for the benefit of those not acquainted with biology. The material presented has been selected on the following grounds. First, it consists of the neurobiological data that form the basic building blocks of the model. Second, it comprises a number of observations which have been the subject of theoretical investigation. Finally, it strives to delimit the current status of research in this field by giving an insight into the huge complexity of central nervous systems.
Three approaches to the study of the functioning of central nervous systems
Let us assume that we have a very complicated machine of unknown origin and that our goal is to understand its functioning. Probably the first thing we do is to observe its structure. In general this analysis reveals a hierarchical organization comprising a number of levels of decreasing complexity: units belonging to a given rank are made of simpler units of lower rank and so on, till we arrive at the last level of the hierarchy, which is a collection of indivisible parts.
The next step is to bring to light what the units are made for, how their presence manifests itself in the machine and how their absence damages its properties. This study is first carried out on pieces of the lowest order, because the functions of these components are bound to be simpler than those of items of higher rank.
Autoregressive data modelling using the least-squares linear prediction method is generalized for multichannel time series. A recursive algorithm is obtained for the formation of the system of multichannel normal equations which determine the least-squares solution of the multichannel linear prediction problem. Solution of these multichannel normal equations is accomplished by the Cholesky factorization method. The corresponding multichannel Maximum Entropy spectrum derived from these least-squares estimates of the autoregressive model parameters is compared to that obtained using parameters estimated by a multichannel generalization of Burg's algorithm. Numerical experiments have shown that the multichannel spectrum obtained by the least-squares method provides for more accurate frequency determination for truncated sinusoids in the presence of additive white noise.
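As a rough illustration of the least-squares approach summarized above (and not the recursive algorithm of the paper itself), the sketch below fits a multichannel autoregressive model by forming the normal equations of the forward linear-prediction problem and solving them with a Cholesky factorization. The function and variable names are illustrative.

```python
# A sketch of least-squares estimation of a multichannel autoregressive model
#   x[t] = A_1 x[t-1] + ... + A_p x[t-p] + e[t],   x[t] in R^m,
# in which the normal equations are solved via Cholesky factorization.
# Forward-prediction errors only; this is not the paper's recursive algorithm.
import numpy as np

def mvar_least_squares(x, p):
    """x: (N, m) multichannel series; returns A with shape (p, m, m) and the error covariance."""
    N, m = x.shape
    # Design matrix: row t holds [x[t-1], x[t-2], ..., x[t-p]]
    X = np.hstack([x[p - k:N - k] for k in range(1, p + 1)])   # shape (N-p, m*p)
    Y = x[p:]                                                  # shape (N-p, m)
    # Normal equations (X^T X) B = X^T Y, solved through the Cholesky factor L
    G = X.T @ X
    L = np.linalg.cholesky(G)
    Z = np.linalg.solve(L, X.T @ Y)        # solve the lower-triangular system L Z = X^T Y
    B = np.linalg.solve(L.T, Z)            # solve the upper-triangular system L^T B = Z
    A = np.stack([B[k * m:(k + 1) * m].T for k in range(p)])   # the A_k matrices
    resid = Y - X @ B
    sigma = resid.T @ resid / (N - p)      # prediction-error covariance
    return A, sigma
```

The corresponding maximum entropy spectrum would then be formed from A and sigma in the usual way; that step is omitted here.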
INTRODUCTION
Multi-channel generalizations of Burg's now-classical algorithm for the modelling of data as an autoregressive sequence, and therefore for the estimation of its equivalent maximum entropy spectrum, have been obtained independently by several authors (Jones, Nuttall, Strand, Morf et al., Tyraskis, and Tyraskis and Jensen). For single-channel data, Ulrych and Clayton have also introduced an alternative procedure, commonly described as ‘the exact least-squares method’, for the estimation of the autoregressive data model parameters, from which a spectrum can be directly obtained. This method has been further developed and extended, and efficient recursive computational algorithms have been provided by Barrodale and Erickson and by Marple. The exact least-squares method has been demonstrated to allow much improved spectral resolution and accuracy compared to Burg's algorithm for single-channel time series, although Burg's algorithm requires somewhat less computational time and storage.
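For reference, here is a compact textbook-style reconstruction of the single-channel Burg recursion mentioned above, in which each reflection coefficient minimizes the summed forward and backward prediction-error power. It is only a sketch; it does not reproduce the multichannel generalizations or the exact least-squares algorithms discussed in this paper, and the names and conventions are illustrative.

```python
# Single-channel Burg estimation of AR parameters: at each order the reflection
# coefficient minimizes the sum of forward and backward prediction-error powers,
# and the prediction-error filter is updated by the Levinson recursion.
import numpy as np

def burg_ar(x, p):
    """Return the prediction-error filter a[1..p] and the error power for series x."""
    x = np.asarray(x, dtype=float)
    f = x.copy()                      # forward prediction errors
    b = x.copy()                      # backward prediction errors
    a = np.zeros(0)
    P = np.mean(x * x)                # zero-order error power
    for m in range(1, p + 1):
        fm, bm = f[m:], b[m - 1:-1]
        k = -2.0 * (fm @ bm) / (fm @ fm + bm @ bm)              # reflection coefficient
        a = np.append(a, 0.0) + k * np.append(a[::-1], 1.0)     # Levinson update of the filter
        f[m:], b[m:] = fm + k * bm, bm + k * fm                 # update error sequences
        P *= 1.0 - k * k
    return a, P
```

The maximum entropy spectrum then follows from a and P in the usual way.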
In the first “published” reference on maximum entropy spectra, an enormously influential and seminal symposium reprint, Burg (1967) announced his new method based upon exactly known, error-free autocorrelation samples. Burg showed that if the first n samples were indeed the beginning of a legitimate autocorrelation function (ACF), then the next sample (n+1) was restricted to lie in a very small range. If that (n+1)th sample were chosen to be in the center of the allowed range, then sample number (n+2) would in turn have the greatest freedom to be chosen, again within a small range. In the same paper, Burg also showed that the extrapolation to the center point of the permissible range corresponds to a maximum entropy situation in which the available data were fully utilized, while no unwarranted assumptions were made about unavailable data. In fact the unmeasured data were to be as random as possible, subject to the constraint that the power spectral density produce ACF values in agreement with the known, exact ACF.
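The admissible range Burg describes can be made explicit with the standard Levinson-Durbin recursion: given exact ACF samples r[0..n], the next sample must keep the Toeplitz autocorrelation matrix non-negative definite, which confines it to an interval whose center is the autoregressive (maximum entropy) extension. The sketch below is a reconstruction under these textbook conventions, not Burg's original derivation.

```python
# Given exact ACF samples r[0..n], compute the interval of admissible values for
# r[n+1] and its center, which is the maximum entropy (autoregressive) extension.
# Conventions: prediction-error filter [1, a_1, ..., a_n] with error power P.
import numpy as np

def levinson(r):
    """Levinson-Durbin recursion: filter a[1..n] and error power P from r[0..n]."""
    n = len(r) - 1
    a = np.zeros(0)
    P = r[0]
    for m in range(1, n + 1):
        k = -(r[m] + a @ r[m - 1:0:-1]) / P                  # reflection coefficient
        a = np.append(a, 0.0) + k * np.append(a[::-1], 1.0)  # filter update
        P *= 1.0 - k * k                                     # prediction-error power
    return a, P

def next_acf_range(r):
    """Return (lower, center, upper) for the admissible next ACF sample r[n+1]."""
    r = np.asarray(r, dtype=float)
    a, P = levinson(r)
    center = -(a @ r[:0:-1])     # autoregressive extrapolation of the ACF
    return center - P, center, center + P

print(next_acf_range([1.0, 0.5, 0.25]))   # (-0.625, 0.125, 0.875)
```

For r = [1, 0.5, 0.25], an AR(1)-like ACF, the center 0.125 simply continues the geometric decay, as expected.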
Some time later it was recognized that exactly known ACF values rarely, if ever, occur in practice: the ACF is usually estimated from a few samples of the time series, or from some other experimental arrangement, and is therefore subject to measurement error. Thus the concept of exact matching of the given ACF values was weakened to approximate matching, up to the error variance.
By
Stephen F. Gull, Mullard Radio Astronomy Observatory, Cavendish Laboratory, Cambridge CB3 0HE, U.K.
John Fielden, Maximum Entropy Data Consultants Ltd., 33 North End, Meldreth, Royston SG8 6NR, U.K.