By
Ramarao Inguva, Physics Department, University of Wyoming, Laramie, Wyoming 82071, and Department of Physics, University of Albuquerque, Albuquerque, New Mexico,
James Baker-Jarvis, Laramie Projects Office, Laramie, Wyoming 82070
Using the principle of maximum entropy, we outline a procedure for studying some aspects of the inverse scattering problem. As applications we study (a) the quantum mechanical inverse scattering problem in the Born approximation, (b) the electromagnetic inverse problem, and (c) solutions of the Marchenko equation of inverse scattering.
INTRODUCTION
Following the pioneering work of Jaynes [1] on the information-theoretic approach to statistical mechanics, there have been several novel applications of the principle of maximum entropy in areas such as image processing and geophysical data analysis [2]. Of particular importance to the present work is a paper on time series analysis in which Jaynes [3] derived Burg's [4] spectral method from the principle of maximum entropy. The main goal of this paper is to demonstrate the applicability of the maximum entropy method to the generalized inverse problem. In Section 2 we develop a formulation for tackling generalized inverse problems that is suitable when the available information is either incomplete or noisy. The efficacy of this formulation is demonstrated in Section 3 through its application to the inverse problem of quantum mechanical scattering theory. Two further examples follow in Sections 4 and 5, where we present solutions to the inverse problem associated with the Marchenko integral equation of scattering theory and to the electromagnetic inverse problem. Finally, in Section 6 we make some concluding remarks.
The ultimate problem in exploration seismology is the reconstruction, by inference, of the structure of portions of the earth's crust to depths of several tens of kilometers from seismic data recorded on the earth's surface. These measurements record the arrival of wave fields reflected back to the surface by variations in acoustic impedance within the interior. The data are complicated by multiple travel paths and by conversion of wave modes (compressional or shear) at boundaries between homogeneous layers, and are corrupted by additive noise.
Over a long period of development, various procedures have been conceived which, when applied sequentially to seismic data, attempt to reconstruct accurately the earth model most likely to have generated the observed data. This process is hampered not only by the complications mentioned above but also by the severely band-limited nature of seismic data (typically 5 to 100 Hz, with most energy concentrated around 30 Hz), which limits resolution and makes it difficult to apply certain operators to the data in an attempt to improve resolution.
Most inverse procedures applied to seismic data today are deterministic, with derivations based on the “convolution model” to be introduced later. Recently, however, some new approaches to the inversion of seismic data have been suggested. In contrast to previous methods, these do not rely on operators applied directly to the data; instead they directly estimate an earth model that would generate data consistent with the observed data (which may still require some processing).
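To make the “convolution model” concrete, here is a minimal sketch in Python (my illustration; the Ricker wavelet, reflectivity values, and noise level are assumptions, not data from any paper in this volume) of a seismic trace synthesized as a band-limited wavelet convolved with a sparse reflectivity series plus additive noise:

```python
import numpy as np

def ricker(f0, dt, length=0.128):
    """Ricker (Mexican-hat) wavelet with peak frequency f0 in Hz."""
    t = np.arange(-length / 2, length / 2, dt)
    a = (np.pi * f0 * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

dt = 0.004                                   # 4 ms sampling interval
n = 500                                      # 2 s trace
r = np.zeros(n)                              # sparse reflectivity series
r[[60, 150, 270, 400]] = [0.8, -0.5, 0.6, -0.3]

w = ricker(30.0, dt)                         # energy concentrated near 30 Hz
trace = np.convolve(r, w, mode="same")       # the convolution model
trace += 0.05 * np.random.default_rng(0).standard_normal(n)  # additive noise
```

Deterministic inversion tries to undo the convolution; the newer approaches instead search for a reflectivity model whose synthetic trace is consistent with the observed one.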
We consider three aspects of the maximum entropy formalism [1–3]. Our purpose is to dispel the three most common objections raised against the rationale and results of the approach. To do so we restrict the scope of the formalism: we consider only experiments that can be repeated N times, where N is not necessarily large.
(a) Consistent inference: The probabilities determined using the maximum entropy formalism are shown to have the interpretation of the mean frequency. Their value is independent of the number, N, of repetitions of the experiment. What very much does depend on N is the variance of the frequency. The larger N is, the smaller is the variance and the less likely are the actual, observed, frequencies to deviate from the mean. Here (following [4]), we shall show that the maximum entropy formalism does have the stated consistency property. Elsewhere [5, 6] we have shown that it is the only algorithm with that property. In Ref. 6 there are additional arguments which are also based on the need for consistency of predictions in reproducible experiments.
The maximum entropy approach dates at least as far back as Boltzmann [7]. He showed that, in the N → ∞ limit, the maximum entropy formalism determines the most probable frequencies. Ever since, the approach has been plagued by the criticism that it is only valid in the N → ∞ limit. The present [4–6] results should put an end to such arguments.
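To see the N-dependence concretely, the following sketch (my illustration, not from the paper) draws N outcomes from a fixed maximum-entropy distribution and confirms that the observed frequencies scatter about the maxent probabilities with a variance shrinking like 1/N:

```python
import numpy as np

rng = np.random.default_rng(1)
# A maxent distribution over six die faces (the Brandeis-dice solution
# for a mean of 4.5, rounded); any fixed distribution would do.
p = np.array([0.054, 0.079, 0.114, 0.165, 0.240, 0.348])

for N in (10, 100, 10_000):
    freqs = rng.multinomial(N, p, size=5_000) / N        # observed frequencies
    # Empirical variance per face versus the multinomial value p(1 - p)/N:
    print(N, freqs.var(axis=0).round(6), (p * (1 - p) / N).round(6))
```

The mean frequency equals p for every N; only the spread about it depends on N.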
MAXENT (the MAXimum ENTropy principle) is a general method of statistical inference derived from and intrinsic to statistical mechanics. The probabilities it produces are “logical probabilities” – measures of the logical relationship between hypothesis and evidence. We consider the significance and applications of assigning probabilities to such “logical probabilities”. The probability of a “logical probability” is shown to be the probability of the evidence used for the “logical probability”. This suggests a hierarchy of logics, with “evidences” defined as sets of probabilities on the preceding “logic”. Applications to reliability theory are described. We also clarify the meaning of MAXENT and examine arguments in a recent article in which temperature fluctuations are introduced in thermal physics.
INTRODUCTION
A method fundamental to statistical physics is the maximization of entropy. In recent years, this method has been recognized as a general procedure for statistical inference, based on the fact that “entropy” is essentially a measure of information uncertainty [1]. The probabilities one obtains using MAXENT (as the “Maximum Entropy Principle” is now called) have a natural interpretation which has not been generally recognized, even by advocates of the procedure. This is the “degree of belief” (DOB) interpretation [2] – that “probability” is a measure of the logical relationship between two propositions: p(H | E) expresses a (normalized) “degree of belief” (DOB) in the relationship of hypothesis H to evidence E. Indeed, MAXENT asserts precisely the (statistical) consequences of assumed evidence, since it is based on the idea that one should choose the probability assignment that maximizes “uncertainty” consistent with the evidence.
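As a minimal illustration of “maximizing uncertainty consistent with the evidence” (a generic sketch of the technique, not of any particular paper here), the script below finds the maximum-entropy distribution over die faces constrained to a given mean, using the exponential form p_k ∝ exp(−λk) that maximum entropy dictates and solving for λ numerically:

```python
import numpy as np
from scipy.optimize import brentq

k = np.arange(1, 7)            # die faces
target_mean = 4.5              # the "evidence": an observed average

def mean_for(lam):
    w = np.exp(-lam * k)       # maxent solution is exponential in the constraint
    return (k * w).sum() / w.sum()

lam = brentq(lambda l: mean_for(l) - target_mean, -5.0, 5.0)
p = np.exp(-lam * k)
p /= p.sum()
print(p.round(4))              # ~ [0.0544 0.0788 0.1142 0.1654 0.2398 0.3475]
```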
The Fourth Workshop on Maximum Entropy and Bayesian Methods in Applied Statistics was held at the University of Calgary in Calgary, Alberta, August 5–8, 1984. The workshop continued a three-year tradition of workshops begun at the University of Wyoming in Laramie, attended by a small number of researchers who welcomed the opportunity to meet and exchange ideas and opinions on these topics. From small beginnings, the workshop has continued to grow despite the lack of any real official organization or basis for funding, and there always seems to be great interest in “doing it again next year.”
This volume represents the proceedings of the fourth workshop and includes one additional invited paper which was not presented at the workshop but which we are pleased to include in this volume (Ellis, Gohberg, Lay). The fourth workshop also made a point of scheduling several exceptional tutorial lectures by some of our noted colleagues, Ed Jaynes, John Burg, John Shore, and John Skilling. These tutorial lectures were not all written up for publication and we especially regret that the outstanding lectures by John Burg and John Shore must go unrecorded.
The depth and scope of the papers included in this volume attest, I believe, to the growing awareness of the importance of maximum entropy and Bayesian methods in the pure and applied sciences and perhaps serve to indicate that much remains to be done and many avenues are yet to be explored.
The maximum entropy approach to inversion appears at its best when conclusions are to be drawn from very limited information. An example is the estimation of the density profile of the earth (assumed to be spherically symmetric) on the basis of only its mean density and its relative moment of inertia. With conventional methods giving rather unsatisfactory results, the maximum entropy method provides a density profile which agrees surprisingly well with the one presently considered to be the best.
INTRODUCTION
Inverse problems in geophysics frequently confront us with one of two extreme situations. While we generally have a large number of unknowns to estimate, we may also have a tremendous amount of data – actually, more data than we can reasonably process – and more or less elaborate data reduction schemes are employed to reduce the wealth of data to a more manageable size. Of course, this reduction is performed in a way which improves the quality of the retained data in some sense (e.g., increases the signal-to-noise ratio).
At other times we may still have large numbers of unknowns to contend with but very few data. In fact, the data may be so inadequate that any attempt at estimating the unknowns appears bound to fail. It is this situation which I now want to address by means of an example.
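A toy numerical version of this example may help fix ideas. The sketch below is my own discretization, not the author's formulation: it maximizes a relative entropy over the mass fractions of thin spherical shells, with the uniform-density earth as prior, subject to the observed relative moment of inertia (the mean density is absorbed into the normalization):

```python
import numpy as np
from scipy.optimize import brentq

n = 200
x = (np.arange(n) + 0.5) / n      # shell radii r/R
u = x**2 / (x**2).sum()           # mass fractions for uniform density (the prior)
target = 0.3307                   # Earth's relative moment of inertia I/(M R^2)

# A thin shell of mass m at radius r has I = (2/3) m r^2, so the constraint
# reads (2/3) * sum(m_i * x_i^2) = target (it equals 0.4 for uniform density).
def moment(lam):
    m = u * np.exp(-lam * x**2)
    m /= m.sum()
    return (2.0 / 3.0) * (m * x**2).sum()

lam = brentq(lambda l: moment(l) - target, 0.0, 50.0)
m = u * np.exp(-lam * x**2)
m /= m.sum()
rho_rel = m / u                   # density relative to the mean density
print(rho_rel[0], rho_rel[-1])    # enhanced at the centre, depleted at the surface
```

Even with only one constraint beyond normalization, the profile rises toward the centre, qualitatively as the accepted earth models do.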
Unknown prior probabilities can be treated as intervening variables in the determination of a posterior distribution. In essence this involves determining the minimally informative information system with a given likelihood matrix.
Some of the consequences of this approach are non-intuitive. In particular, the computed prior is not invariant for different sample sizes in random sampling with unknown prior.
GENERALITIES
The role of prior probabilities in inductive inference has been a lively issue since the posthumous publication of the works of Thomas Bayes at the close of the 18th century. Attitudes on the topic have ranged all the way from complete rejection of the notion of prior probabilities (Fisher, 1949) to an insistence by contemporary Bayesians that they are essential (de Finetti, 1975). A careful examination of some of the basics is contained in a seminal paper by E.T. Jaynes, the title of which in part suggested the title of the present essay (Jaynes, 1968).
The theorem of Bayes, around which the controversy swirls, is itself non-controversial. It is, in fact, hardly more than a statement of the law of the product for probabilities, plus the commutativity of the logical product. Equally straightforward is the fact that situations can be found for which representation by Bayes' theorem is unassailable. The classic classroom two-urn experiment is neatly tailored for this purpose. Thus, the issue is not so much a conceptual one, involving the “epistemological status” of prior probabilities, as it is a practical one.
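For concreteness, here is the two-urn computation written out as a short script (the urn compositions are illustrative); Bayes' theorem enters exactly as the product rule rearranged:

```python
from fractions import Fraction

# Two urns chosen with equal prior probability:
#   urn A holds 3 red, 1 white; urn B holds 1 red, 3 white.
prior = {"A": Fraction(1, 2), "B": Fraction(1, 2)}
likelihood_red = {"A": Fraction(3, 4), "B": Fraction(1, 4)}

# One ball is drawn and it is red.
evidence = sum(prior[u] * likelihood_red[u] for u in prior)
posterior = {u: prior[u] * likelihood_red[u] / evidence for u in prior}
print(posterior)   # {'A': Fraction(3, 4), 'B': Fraction(1, 4)}
```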
By
Bruce R. Musicus, Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Mass. 02139,
Rodney W. Johnson, Computer Science and Systems Branch, Information Technology Division, Naval Research Laboratory, Washington, D.C. 20375
A new relative-entropy method is presented for estimating the power spectral density matrix for multichannel data, given correlation values for linear combinations of the channels, and given an initial estimate of the spectral density matrix. A derivation of the method from the relative-entropy principle is given. The basic approach is similar in spirit to the Multisignal Relative-Entropy Spectrum Analysis of Johnson and Shore, but the results differ significantly because the present method does not arbitrarily require the final distributions of the various channels to be independent. For the special case of separately estimating the spectra of a signal and noise, given the correlations of their sum, Multichannel Relative-Entropy Spectrum Analysis turns into a two-stage procedure. First a smooth power spectrum model is fitted to the correlations of the signal plus noise. Then final estimates of the spectra and cross spectra are obtained through linear filtering. For the special case where p uniformly spaced correlations are known, and where the initial estimate of the signal-plus-noise spectrum is all-pole with order p or less, this method fits a standard Maximum Entropy autoregressive spectrum to the noisy correlations, then linearly filters to calculate the signal and noise spectra and cross spectra. An illustrative numerical example is given.
INTRODUCTION
We examine the problem of estimating power spectra and cross spectra for multiple signals, given selected correlations of various linear combinations of the signals, and given an initial estimate of the spectral density matrix.
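The two-stage special case described in the abstract can be caricatured as follows. This is only a rough sketch of the flavor under my own assumed priors; the Wiener-style split below is a standard device and not the authors' exact derivation:

```python
import numpy as np

# Frequency grid and assumed initial (prior) spectra for signal and noise.
f = np.linspace(0.0, 0.5, 256)
P_s = 1.0 / (1.0 + (f / 0.1) ** 2)   # assumed smooth low-pass signal prior
P_n = 0.2 * np.ones_like(f)          # assumed flat noise prior

# Stage 1 (stand-in): a smooth total-spectrum model fitted to the noisy
# correlations.  Here a bump on the prior total stands in for a maximum
# entropy autoregressive fit.
P_tot = (P_s + P_n) * (1.0 + 0.5 * np.exp(-((f - 0.15) / 0.03) ** 2))

# Stage 2: split the fitted total between signal and noise by linear filtering.
H = P_s / (P_s + P_n)
S_hat = H * P_tot                    # final signal spectrum estimate
N_hat = (1.0 - H) * P_tot            # final noise spectrum estimate
```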
In seeking to express efficiently the dependence of one variable upon another, physicists have developed a repertoire of familiar functions which have well-defined properties and can be used in combination with each other to describe more complicated ones. They are the basic building blocks of practical mathematical analysis. In these two chapters we shall discuss the most elementary ones – the exponential, logarithmic and trigonometric (or circular) functions. Later, we shall see that further ‘special functions’ can be useful and that there is no end to this process. The skilled applied mathematician, like the skilled linguist, acquires a wide and powerful vocabulary. Usually, when we speak of an ‘exact’ solution to a physical problem we mean that a solution can be expressed in terms of familiar functions. Thus the availability of such a solution depends upon the range of one's vocabulary of functions.
Probably the most widely discussed, if not always identified, function in or out of physics is the exponential. Radioactive decay – ‘the half-life of ⁴²K is 12.4 hours’, i.e. at the end of 12.4 hours the initial activity has reduced to one-half. Exponential growth – ‘invest your surplus cash in X holdings at 20% (taxable)’. Sometimes threatening – ‘if unchecked, the world demand for energy will double every ten years’. Let us look more closely at the everyday example of bank interest.
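As a preview of that example, a quick computation (with illustrative figures; my sketch, not the book's worked example) shows compounding at 20% per annum approaching the exponential limit (1 + r/n)^n → e^r as the compounding interval shrinks:

```python
import math

r = 0.20                              # 20% annual rate, as in the example above
for n in (1, 4, 12, 365, 10_000):     # yearly, quarterly, monthly, daily, ...
    print(f"{n:>6} compoundings: {(1.0 + r / n) ** n:.6f}")
print(f"   limit exp(r) : {math.exp(r):.6f}")
```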
The non-mathematical New Yorker has no difficulty in locating places – he uses Cartesian coordinates. Numbered streets run east–west and avenues run north–south. If he lives in a city apartment, his floor provides the third coordinate. For the physicist or chemist, atoms in molecules in space are locatable with a similar grid system. Three directions or axes marked out in blocks or length units can provide the unambiguous address of any chosen atom or event. Since the axes, unlike the New York roadways, are fictional, there is some freedom of choice, and this has mathematical consequences. Although any set of axes (mutually perpendicular) will do, if there are other physically significant directions already present, such as the direction of gravity, it generally pays to use them. The house numbers, so to speak, can be counted in either direction so that, having chosen ‘zero’, both positive and negative locations may appear. The use of such coordinates was first introduced by the seventeenth-century French philosopher and scientist René Descartes.
In one dimension (in which we never depart from a straight line), all positions may be measured as displacements (in a chosen direction) from a fixed point, the origin, and are generally denoted by x. So x is continuous and can be positive or negative. Two points x₁ and x₂ have a separation |x₂ − x₁| (always positive), and the displacement of x₂ from x₁ is x₂ − x₁ (which may be positive or negative).
The movement of a solid object by a force is practical dynamics in action. Consider for example a ship's anchor being dragged by its cable and digging into the sea-bed. How does the anchor move, and in response to what forces? There is the tension in the cable, the reaction and resistance of the sea-bed material, the weight of the anchor, and so on. (One of the authors can claim some expertise in this matter, having been involved in a court case which centred upon it.) The forces are vectors (chapter 4) – they can be resolved in any convenient direction, and they can be added according to the vector rule to give a resultant. There is a theorem of dynamics which states that the centre of mass of the anchor moves in response to this force, as if all the mass of the anchor were concentrated there. But the various forces act at different points. The cable tension acts at the end of the shank, the soil/sand reaction at the fluke-centre, and so on. Accordingly, they can also rotate the anchor.
What is the appropriate combination of forces with which to consider this aspect of the problem? The answer is the total torque or moment of the forces about some chosen point. Another theorem of dynamics prescribes the rotational motion of the body about the point, in response to the torque.
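A compact illustration of resolving forces and taking moments (with made-up numbers, not the actual values of the anchor problem): sum the force vectors for the resultant, and sum the cross products of position with force for the total torque about a chosen point.

```python
import numpy as np

# Forces (N) and their points of application (m): e.g. cable tension at the
# shank end and soil reaction at the fluke centre -- the values are invented.
forces = np.array([[500.0, 200.0, 0.0],
                   [-150.0, -300.0, 0.0]])
points = np.array([[1.2, 0.0, 0.0],
                   [-0.4, 0.1, 0.0]])
about = np.zeros(3)                  # take moments about the origin

resultant = forces.sum(axis=0)                           # vector sum
torque = np.cross(points - about, forces).sum(axis=0)    # total moment
print(resultant, torque)
```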