Segmentation of speech signals based on fractal dimension
Computer speech recognition is an important subject that has been studied for many years. Until relatively recently, classical mathematics and signal processing techniques have played a major role in the development of speech recognition systems. This includes the use of frequency-time analysis, the Wigner transform, applications of wavelets and a wide range of artificial neural network paradigms. Relatively little attention has been paid to the application of random scaling fractals to speech recognition. The fractal characterization of speech waveforms was first reported by Pickover and Al Khorasani [1], who investigated the self-affinity and fractal dimension for human speech in general. They found a fractal dimension of 1.66 using Hurst analysis (see e.g. [2]). In the present chapter, we investigate the use of fractal-dimension segmentation for feature extraction and recognition of isolated words. We shall start with a few preliminaries that relate to speech recognition techniques in general.
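As an illustration of the Hurst analysis mentioned above, the rescaled-range (R/S) method estimates the Hurst exponent H, from which the fractal dimension of a signal follows as D = 2 − H. The following sketch shows the idea; the chunk sizes and the averaging scheme are illustrative choices, not taken from the text:

```python
import numpy as np

def hurst_rs(x, min_chunk=8):
    """Estimate the Hurst exponent H of a signal by rescaled-range (R/S)
    analysis; the fractal dimension then follows as D = 2 - H."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    sizes, rs = [], []
    size = min_chunk
    while size <= n // 2:
        vals = []
        # split the signal into non-overlapping chunks of the current size
        for start in range(0, n - size + 1, size):
            chunk = x[start:start + size]
            dev = np.cumsum(chunk - chunk.mean())   # cumulative deviation
            r = dev.max() - dev.min()               # range
            s = chunk.std()                         # standard deviation
            if s > 0:
                vals.append(r / s)
        sizes.append(size)
        rs.append(np.mean(vals))
        size *= 2
    # H is the slope of log(R/S) against log(chunk size)
    h = np.polyfit(np.log(sizes), np.log(rs), 1)[0]
    return h, 2.0 - h
```

For white noise the estimate should lie near H = 0.5 (D = 1.5); small-sample bias in R/S analysis typically pushes it slightly higher.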
Speech recognition techniques
Speech recognition systems are based on digitizing an appropriate waveform from which useful data is then extracted using appropriate pre-processing techniques. After that, the data is processed to obtain a signature or representation of the speech signal. This signature is ideally a highly compressed form of the original data that represents the speech signal uniquely and unambiguously. The signature is then matched against some that have been created previously (templates) by averaging a set of such signatures for a particular word.
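A minimal sketch of the template-matching stage described above, assuming (purely for illustration) that each signature is a fixed-length feature vector and that matching is by Euclidean distance; neither choice is prescribed by the text:

```python
import numpy as np

def make_template(signatures):
    """Average a set of signature vectors for one word into a template."""
    return np.mean(np.asarray(signatures, dtype=float), axis=0)

def classify(signature, templates):
    """Return the word whose template is nearest (in the Euclidean sense)
    to the given signature."""
    signature = np.asarray(signature, dtype=float)
    return min(templates, key=lambda w: np.linalg.norm(signature - templates[w]))

# Toy two-dimensional signatures for two hypothetical words
templates = {
    "yes": make_template([[0.9, 0.1], [1.1, 0.0]]),
    "no":  make_template([[0.1, 0.9], [0.0, 1.1]]),
}
print(classify([1.0, 0.05], templates))  # → yes
```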
Developing mathematical models to simulate and analyse noise has an important role in digital signal and image processing. Computer generated noise is routinely used to test the robustness of different types of algorithm; it is used for data encryption and even to enhance or amplify signals through ‘stochastic resonance’. Accurate statistical models for noise (e.g. the probability density function or the characteristic function) are particularly important in image restoration using Bayesian estimation [1], maximum-entropy methods for signal and image reconstruction [2] and in the image segmentation of coherent images in which ‘speckle’ (arguably a special type of noise, i.e. coherent Gaussian noise) is a prominent feature [3]. The noise characteristics of a given imaging system often dictate the type of filters that are used to process and analyse the data. Noise simulation is also important in the synthesis of images used in computer graphics and computer animation systems, in which fractal noise has a special place (e.g. [4, 5]).
The application of fractal geometry for modelling naturally occurring signals and images is well known. This is due to the fact that the ‘statistics’ and spectral characteristics of random scaling fractals are consistent with many objects found in nature, a characteristic that is expressed in the term ‘statistical self-affinity’. This term refers to random processes whose statistics are scale invariant. A random scaling fractal (RSF) signal is one whose PDF remains the same irrespective of the scale over which the signal is sampled.
In this chapter we investigate the use of fractal geometry for segmenting digital signals and images. A method of texture segmentation is introduced that is based on the fractal dimension. Using this approach, variations in texture across a signal or image can be characterized in terms of variations in the fractal dimension. By analysing the spatial fluctuations in fractal dimension obtained using a conventional moving-window approach, a digital signal or image can be texture segmented; this is the principle of fractal-dimension segmentation (FDS). In this book, we apply this form of texture segmentation to isolated speech signals.
An overview of methods for computing the fractal dimension is presented, focusing on an approach that makes use of the characteristic power spectral density function (PSDF) of a random scaling fractal (RSF) signal. A more general model for the PSDF of a stochastic signal is then introduced and discussed with reference to texture segmentation.
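A sketch of the PSDF-based approach, assuming the standard RSF model P(k) ∝ 1/|k|^β together with the signal-dimension relation D = (5 − β)/2 (so that 1 ≤ D ≤ 2); a log-log least-squares fit to the power spectrum then yields an estimate of D:

```python
import numpy as np

def fractal_dimension_psdf(x, fs=1.0):
    """Estimate the fractal dimension of a signal from the slope of its
    log-log power spectral density, assuming P(k) ~ 1/|k|^beta and the
    relation D = (5 - beta) / 2 for a fractal signal."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    psd = np.abs(np.fft.rfft(x - x.mean())) ** 2
    k = np.fft.rfftfreq(n, d=1.0 / fs)
    # exclude the DC term before taking logarithms
    logk, logp = np.log(k[1:]), np.log(psd[1:])
    beta = -np.polyfit(logk, logp, 1)[0]   # spectral exponent
    return (5.0 - beta) / 2.0
```

In a moving-window FDS scheme, this estimator would be applied to successive windows of the signal, the sequence of dimension estimates forming the segmentation feature.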
We shall apply fractal-dimension segmentation to a number of different speech signals and discuss the results for isolated words and the components (e.g. fricatives) from which these words are composed. In particular, it will be shown that by pre-filtering speech signals with a low-pass filter of the form 1/k, they can be classified into fractal dimensions that lie within the correct range, i.e. [1, 2]. This provides confidence in the approach to speech segmentation considered in this book and, in principle, allows a template-matching scheme to be designed that is based exclusively on FDS.
Modern information security manifests itself in many ways, according to the situation and its requirements. It deals with such concepts as confidentiality, data integrity, access control, identification, authentication and authorization. Practical applications, closely related to information security, are private messaging, electronic money, online services and many others.
Cryptography is the study of mathematical techniques related to aspects of information security. The word is derived from the Greek kryptos, meaning hidden. Cryptography is closely related to the disciplines of cryptanalysis and cryptology. In simple terms, cryptanalysis is the art of breaking cryptosystems, i.e. retrieving the original message without knowledge of the proper key, or forging an electronic signature. Cryptology is the mathematics, such as number theory, and the application of formulas and algorithms that underpin cryptography and cryptanalysis.
Cryptology is a branch of mathematical science describing an ideal world. It is the only instrument that allows the application of strict mathematical methods to design a cryptosystem and estimate its theoretical security. However, real security deals with complex systems involving human beings from the real world. Mathematical strength in a cryptographic algorithm is a necessary but not sufficient requirement for a system to be acceptably secure.
Moreover, even in the ideal mathematical world, the cryptographic security of an object can be checked only by proving its resistance to various kinds of known attack. Such practical security does not imply that the system is secure: other, unknown, types of attack may occur.
Speech is and will remain perhaps the most desirable medium of communication between humans. There are several ways of characterizing the communications potential of speech. One highly quantitative approach is in terms of information theory. According to information theory, speech can be represented in terms of its message content, or information. An alternative way of characterizing speech is in terms of the signal carrying the message information, that is the acoustic waveform [1].
The widespread application of speech processing technology required that touch-tone telephones be readily available. The first touch-tone (dual-tone multifrequency, DTMF) telephone was demonstrated at Bell Laboratories in 1958, and deployment in the business and consumer world started in the early 1960s. Since DTMF service was introduced to the commercial and consumer world less than 40 years ago, it can be seen that voice processing has a relatively short history.
Research in speech processing by computer has traditionally been focused on a number of somewhat separable, but overlapping, problem areas. One of these is isolated word recognition, where the signal to be recognized consists of a single word or phrase, delimited by silence, to be identified as a unit without characterization of its internal structure. For this kind of problem, certain traditional pattern recognition techniques can be applied directly.
The computational background to digital signal processing (DSP) involves a number of techniques of numerical analysis. Those techniques which are of particular value are:
solutions to linear systems of equations
finite difference analysis
numerical integration
A large number of DSP algorithms can be written in terms of a matrix equation or a set of matrix equations. Hence, computational methods in linear algebra are an important aspect of the subject. Many DSP algorithms can be classified in terms of a digital filter. Two important classes of digital filter are used in DSP, as follows.
Convolution filters are nonrecursive filters. They use linear processes that operate on the data directly.
Fourier filters operate on data obtained by computing the discrete Fourier transform of a signal. This is accomplished using the fast Fourier transform algorithm.
Digital filters
Digital filters fall into two main categories:
real-space filters
Fourier-space filters
Real-space filters
Real-space filters are based on some form of ‘moving window’ principle. A sample of data from a given element of the signal is processed, giving (typically) a single output value. The window is then moved on to the next element of the signal and the process repeated. A common real-space filter is the finite impulse response (FIR) filter.
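The two filter classes can be illustrated together: a moving-window FIR filter applied directly in real space, and the same filter applied via the Fourier route. The specific window, a five-point moving average, is chosen purely for illustration:

```python
import numpy as np

def fir_filter(x, h):
    """Real-space FIR filter: slide the window of coefficients h over the
    signal and sum the products at each position ('same'-length output)."""
    return np.convolve(x, h, mode="same")

# Five-point moving-average window as a simple FIR example
h = np.ones(5) / 5.0
rng = np.random.default_rng(1)
x = np.sin(np.linspace(0, 4 * np.pi, 200)) + 0.3 * rng.standard_normal(200)
y = fir_filter(x, h)

# The same result via the Fourier route: multiply the DFTs and invert,
# zero-padding to full linear-convolution length to avoid circular
# wrap-around, then take the centred 'same'-length portion.
n = len(x) + len(h) - 1
y_fft = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(h, n), n)[2:2 + len(x)]
print(np.allclose(y, y_fft))  # True
```

In practice the Fourier route becomes the faster of the two as the window length grows, which is the usual motivation for Fourier filters.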
The bootstrap genesis is generally attributed to Bradley Efron. In 1977 he wrote the famous Rietz Lecture on the estimation of sampling distributions based on observed data (Efron, 1979a). Since then, a number of outstanding and nowadays considered classical statistical texts have been written on the topic (Efron, 1982; Hall, 1992; Efron and Tibshirani, 1993; Shao and Tu, 1995), complemented by other interesting monographic exposés (LePage and Billard, 1992; Mammen, 1992; Davison and Hinkley, 1997; Manly, 1997; Barbe and Bertail, 1995; Chernick, 1999).
Efron and Tibshirani (1993) state in the Preface of their book: ‘Our goal in this book is to arm scientists and engineers, as well as statisticians, with computational techniques that they can use to analyze and understand complicated data sets.’ We share the view that Efron and Tibshirani (1993) have written an outstanding book which, unlike other texts on the bootstrap, is more accessible to an engineer. Many colleagues and graduate students of ours prefer to use this text as the major source of knowledge on the bootstrap. We believe, however, that the readership of Efron and Tibshirani (1993) is more likely to be researchers and (post-)graduate students in mathematical statistics than engineers.
To the best of our knowledge there are currently no books or monographs on the bootstrap written for electrical engineers, particularly for signal processing practitioners.
Detection of signals in interference is a key area in signal processing applications such as radar, sonar and telecommunications. The theory of signal detection has been extensively covered in the literature. Many textbooks exist, including the classic by Van Trees (2001a) and his later additions to the series (Van Trees, 2001b, 2002a,b), the text on radar detection by DiFranco and Rubin (1980), and several texts on estimation and detection (Scharf, 1991; Poor, 1994; Kay, 1993, 1998). Signal detection theory is well established when the interference is Gaussian. However, methods for detection in the non-Gaussian case are often cumbersome and in many cases non-optimal.
Signal detection is formulated as a test of a hypothesis (Lehmann, 1991). To cover signal detection, we first need to introduce some concepts of hypothesis testing. This is followed by an exposition of bootstrap based hypothesis testing. In the second part of the chapter, we provide details on bootstrap detection of signals in Gaussian and non-Gaussian noise and show how bootstrap detection alleviates the restrictions imposed by classical detectors.
Principles of hypothesis testing
As the term suggests, in hypothesis testing one wishes to decide whether or not some formulated hypothesis is correct. The choice is between two decisions: accepting or rejecting the hypothesis.
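A minimal sketch of a bootstrap test of this kind: here we test a hypothesis about the mean by resampling data recentred so that the null hypothesis holds; the studentised statistic and the two-sided alternative are illustrative choices, not taken from the text:

```python
import numpy as np

def bootstrap_mean_test(x, mu0=0.0, n_boot=2000, alpha=0.05, seed=0):
    """Bootstrap test of H0: mean(x) == mu0 against a two-sided alternative.
    The data are recentred so that H0 holds, resampled with replacement to
    approximate the null distribution of the studentised statistic, and the
    observed statistic is compared with that distribution."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    t_obs = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))
    centred = x - x.mean() + mu0            # impose the null hypothesis
    t_boot = np.empty(n_boot)
    for b in range(n_boot):
        xb = rng.choice(centred, size=n, replace=True)
        t_boot[b] = (xb.mean() - mu0) / (xb.std(ddof=1) / np.sqrt(n))
    p_value = np.mean(np.abs(t_boot) >= abs(t_obs))
    return p_value, p_value < alpha         # reject H0 if p < alpha
```

The appeal of the bootstrap here is that the null distribution of the statistic is obtained from the data themselves, without assuming Gaussianity.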
Chapters 3 and 4 dealt with the fundamentals of bootstrap based detection and model selection, respectively. In this chapter, we provide some interesting applications of the theory covered in those chapters to real world problems. We report only on some of the problems we have worked on over recent years. These selected problems could be solved using classical techniques only by making strong assumptions that may not be valid; they are also analytically intractable.
The applications include a wide range of signal processing problems. We first report on results for optimal vibration sensor placement on spark ignition engines to detect knock. We show how the bootstrap can be used to estimate distributions of complicated statistics. Then we discuss a passive acoustic emission problem where we estimate confidence intervals for an aircraft's flight parameters. This is followed by the important problem of civilian landmine detection. We suggest an approach to detect buried landmines using a ground penetrating radar. We continue with another radar application concerning noise floor estimation in high frequency over-the-horizon radar. The chapter concludes with the estimation of the optimal model for corneal elevation in the human eye.
Optimal sensor placement for knock detection
This application illustrates the concepts discussed in Sections 3.2 and 2.2 of hypothesis testing and variance stabilisation, respectively.
Signal processing has become a core discipline in engineering research and education. Many modern engineering problems rely on signal processing tools. This could be either for filtering the acquired measurements in order to extract and interpret information or for making a decision as to the presence or absence of a signal of interest. Generally speaking, statistical signal processing is the area of signal processing where mathematical statistics is used to solve signal processing problems. Nowadays, however, it is difficult to find an application of signal processing where tools from statistics are not used. A statistician would call the area of statistical signal processing time series analysis.
In most statistical signal processing applications where a certain parameter is of interest there is a need to provide a rigorous statistical performance analysis for parameter estimators. An example of this could be finding the accuracy of an estimator of the range of a flying aircraft in radar. These estimators are usually computed based on a finite number of measurements, also called a sample. Consider, for example, a typical radar scenario, in which we aim to ascertain whether the received signal contains information about a possible target or is merely interference. The decision in this case, based on calculating the so-called test statistic, has to be supported with statistical measures, namely the probability of detection and the probability of false alarm.
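These two probabilities are naturally estimated by Monte Carlo simulation. The following sketch does so for a deliberately simple energy detector in unit-variance Gaussian noise; the detector, the threshold and the sinusoidal target model are illustrative assumptions, not the detectors discussed in this book:

```python
import numpy as np

def detector_performance(snr_db=3.0, n=64, threshold=1.3, trials=20000, seed=0):
    """Monte Carlo estimate of the probability of false alarm (Pfa) and the
    probability of detection (Pd) for a simple energy detector in
    unit-variance Gaussian noise: decide 'signal present' when the mean
    squared sample exceeds the threshold."""
    rng = np.random.default_rng(seed)
    # sinusoid amplitude giving the requested SNR (signal power / noise power)
    amp = np.sqrt(2.0) * 10.0 ** (snr_db / 20.0)
    t = np.arange(n)
    s = amp * np.sin(2 * np.pi * 0.1 * t)       # assumed target waveform
    h0 = rng.standard_normal((trials, n))       # noise only
    h1 = h0 + s                                 # signal plus noise
    pfa = np.mean((h0 ** 2).mean(axis=1) > threshold)
    pd = np.mean((h1 ** 2).mean(axis=1) > threshold)
    return pfa, pd
```

Sweeping the threshold and plotting Pd against Pfa traces out the detector's receiver operating characteristic.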