Machine learning refers to the design of computer algorithms for gaining new knowledge, improving existing knowledge, and making predictions or decisions based on empirical data. Applications of machine learning include speech recognition [164, 275], image recognition [60, 110], medical diagnosis [309], language understanding [50], biological sequence analysis [85], and many other fields. The most important requirement for an algorithm in machine learning is its ability to make accurate predictions or correct decisions when presented with instances or data not seen before.
Classification of data is a common task in machine learning. It consists of finding a function z = G(y) that assigns to each data sample y its class label z. If the range of the function is discrete, it is called a classifier; otherwise it is called a regression function. For each class label z, we can define the acceptance region A_z such that y ∈ A_z if and only if z = G(y). An error occurs if the classifier assigns a wrong class to y. The probability of classification error

ε(G) = P[Z ≠ G(Y)]

is called the generalization error in machine learning, where Z denotes the actual class to which the observation variable Y belongs. The classifier that minimizes the generalization error is called the Bayes classifier, and the minimized ε(G) is called the Bayes error. In practical applications, we generally do not know the probability distribution of (Y, Z).
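To make these definitions concrete, here is a minimal Python sketch (an illustration added here, not taken from the text) that implements the Bayes classifier for two equally likely classes with known Gaussian class-conditional densities N(−1, 1) and N(+1, 1), and estimates the generalization error ε(G) by Monte Carlo; in this symmetric case the Bayes classifier reduces to thresholding y at 0, and the Bayes error is P[N(0, 1) > 1] ≈ 0.1587.

```python
import numpy as np

rng = np.random.default_rng(0)

# Bayes classifier for two equally likely classes with class-conditional
# densities N(-1, 1) (class 0) and N(+1, 1) (class 1).  Comparing the
# posteriors reduces to thresholding y at 0, so the acceptance regions
# are A_0 = (-inf, 0] and A_1 = (0, inf).
def bayes_classifier(y):
    return (y > 0.0).astype(int)

# Monte Carlo estimate of the generalization error eps(G) = P[G(Y) != Z].
n = 100_000
z = rng.integers(0, 2, size=n)                 # actual class labels Z
y = rng.normal(loc=2.0 * z - 1.0, scale=1.0)   # observations Y given Z
eps_hat = np.mean(bayes_classifier(y) != z)
print(eps_hat)   # close to the Bayes error ~ 0.1587
```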
In this chapter we will study statistical methods to estimate parameters and procedures to test the goodness of fit of a model to the experimental data. We are primarily concerned with computational algorithms for these methods and procedures. The expectation-maximization (EM) algorithm for maximum-likelihood estimation is discussed in detail.
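Since the EM algorithm is treated in detail later, the following minimal sketch may help fix ideas: one-dimensional data drawn from a two-component Gaussian mixture, with the E-step computing posterior responsibilities and the M-step re-estimating the weight, means, and variances. The data, initialization, and component count are illustrative assumptions, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data from a two-component Gaussian mixture (latent labels unseen).
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.5, 700)])

w, mu, var = 0.5, np.array([-1.0, 1.0]), np.array([1.0, 1.0])  # initial guesses

for _ in range(200):
    # E-step: posterior responsibility of component 1 for each sample.
    p0 = (1 - w) * np.exp(-(x - mu[0])**2 / (2 * var[0])) / np.sqrt(2 * np.pi * var[0])
    p1 = w * np.exp(-(x - mu[1])**2 / (2 * var[1])) / np.sqrt(2 * np.pi * var[1])
    r = p1 / (p0 + p1)

    # M-step: maximize the expected complete-data log-likelihood.
    w = r.mean()
    mu = np.array([((1 - r) * x).sum() / (1 - r).sum(),
                   (r * x).sum() / r.sum()])
    var = np.array([((1 - r) * (x - mu[0])**2).sum() / (1 - r).sum(),
                    (r * (x - mu[1])**2).sum() / r.sum()])

print(w, mu, var)   # close to the generating values 0.7, (-2, 3), (1, 2.25)
```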
Classical numerical methods for estimation
As we stated earlier, it is often the case that a maximum-likelihood estimate (MLE) cannot be found analytically. Thus, numerical methods for computing the MLE are important. Finding the maximum of a likelihood function is an optimization problem, and there are a number of optimization algorithms and software packages. In this and the following sections we will discuss several important methods that are pertinent to maximization of a likelihood function: the method of moments, the minimum χ² method, the minimum Kullback–Leibler divergence method, and the Newton–Raphson algorithm. In Section 19.2 we give a full account of the EM algorithm, because of its rather recent development and its increasing applications in signal processing and other science and engineering fields.
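As a foretaste of the Newton–Raphson algorithm applied to likelihood maximization, the sketch below fits a logistic regression model, for which no closed-form MLE exists; the synthetic data, the two-parameter model, and the iteration budget are illustrative choices made here.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data from a logistic model: P[Z = 1 | y] = 1 / (1 + exp(-(b0 + b1*y))).
n = 2000
y = rng.normal(size=n)
X = np.column_stack([np.ones(n), y])          # design matrix with intercept
true_beta = np.array([0.5, -1.5])
z = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))

beta = np.zeros(2)                            # initial guess
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    grad = X.T @ (z - p)                      # gradient of the log-likelihood
    hess = -X.T @ (X * (p * (1 - p))[:, None])  # Hessian (negative definite)
    step = np.linalg.solve(hess, grad)
    beta = beta - step                        # Newton-Raphson update
    if np.max(np.abs(step)) < 1e-10:          # stop when the update is tiny
        break

print(beta)   # close to true_beta
```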
Method of moments
This method is typically used to estimate unknown parameters of a distribution function by equating the sample mean, sample variance, and other higher moments calculated from data to the corresponding moments expressed in the parameters of interest.
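For example, a gamma distribution with shape k and scale θ has E[Y] = kθ and Var[Y] = kθ², so equating the sample mean ȳ and sample variance s² to these expressions gives θ̂ = s²/ȳ and k̂ = ȳ²/s². A minimal sketch (the gamma example is an illustration chosen here, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(3)

# Method of moments for a gamma distribution with shape k and scale theta:
#   E[Y] = k * theta,  Var[Y] = k * theta**2
# => theta_hat = s2 / ybar,  k_hat = ybar**2 / s2.
y = rng.gamma(shape=2.5, scale=1.8, size=10_000)

ybar, s2 = y.mean(), y.var()
theta_hat = s2 / ybar
k_hat = ybar**2 / s2

print(k_hat, theta_hat)   # close to the true values (2.5, 1.8)
```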
Although it is convenient for conceptual and theoretical purposes to think of signals as general functions of time, in practice they are usually acquired, processed, stored, and transmitted as discrete and finite time samples. We need to study this sampling process carefully to determine to what extent a sampling or discretization allows us to reconstruct the original information in the signal. Furthermore, real signals such as speech or images are not arbitrary functions; depending on the type of signal, they have special structure. No one would confuse the output of a random number generator with human speech. It is also important to understand the extent to which we can compress the basic information in the signal, to minimize storage space and transmission time.
Shannon sampling is one approach to these issues. In that approach we model real signals as functions f(t) in L2(ℝ) that are bandlimited. Thus if the frequency support of the Fourier transform f̂(ω) is contained in the interval [-Ω, Ω] and we sample the signal at discrete time intervals with equal spacing less than π/Ω, i.e., faster than the Nyquist rate, we can reconstruct the original signal exactly from the discrete samples. This method will work provided hardware exists to sample the signal at the required rate. Increasingly this is a problem, because modern technologies can generate signals of higher bandwidth than existing hardware can sample.
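The sketch below illustrates this reconstruction numerically: a test signal bandlimited to 5 Hz is sampled above the Nyquist rate and rebuilt by sinc interpolation. The finite sample window is a practical compromise made here, so the reconstruction is exact only up to a truncation error that shrinks as the window grows.

```python
import numpy as np

# Bandlimited test signal: highest frequency 5 Hz, i.e., Omega = 2*pi*5 rad/s.
f = lambda t: np.sin(2 * np.pi * 3 * t) + 0.5 * np.cos(2 * np.pi * 5 * t)

T = 0.05                      # sampling interval, below the Nyquist interval pi/Omega = 0.1 s
n = np.arange(-400, 401)      # finite window of samples (a truncation of the ideal sum)
samples = f(n * T)

# Shannon reconstruction: f(t) = sum_n f(nT) * sinc((t - nT) / T),
# where numpy's sinc is the normalized sinc(x) = sin(pi*x) / (pi*x).
t = np.linspace(-1.0, 1.0, 500)
f_rec = np.array([np.sum(samples * np.sinc((ti - n * T) / T)) for ti in t])

print(np.max(np.abs(f_rec - f(t))))   # tiny away from the window edges
```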
Traditionally, wireless and optical fiber networks have been designed separately from each other. Wireless networks are aimed at meeting specific service requirements while coping with particular transmission impairments and optimizing the utilization of the system resources to ensure cost-effectiveness and satisfaction for the user. In optical networks, on the other hand, research efforts have focused rather on cost reduction, simplicity, and future-proofness against legacy and emerging services and applications by means of optical transparency. Wireless and optical access networks can be thought of as complementary. Optical fiber does not go everywhere, but where it does go, it provides a huge amount of available bandwidth. Wireless access networks, on the other hand, potentially go almost everywhere, but provide a highly bandwidth-constrained transmission channel susceptible to a variety of impairments.
Future broadband access networks not only have to provide access to information when we need it, where we need it, and in whatever format we need it, but also, and arguably more importantly, have to bridge the digital divide and offer simplicity and user-friendliness based on open standards in order to stimulate the design of new applications and services. Toward this end, future broadband access networks must leverage both optical and wireless technologies and converge them seamlessly, giving rise to fiber-wireless (FiWi) access networks (Aissa and Maier [2007]). FiWi access networks are instrumental in strengthening our information society while avoiding its digital divide.
The purpose of this chapter is to introduce key structural concepts that are needed for theoretical transform analysis and are part of the common language of modern signal processing and computer vision. One of the great insights of this approach is the recognition that natural abstractions which occur in analysis, algebra, and geometry help to unify the study of the principal objects which occur in modern signal processing. Everything in this book takes place in a vector space, a linear space of objects closed under associative, distributive, and commutative laws. The vector spaces we study include vectors in Euclidean and complex space and spaces of functions such as polynomials, integrable functions, approximation spaces such as wavelets and images, spaces of bounded linear operators, and compression operators (infinite dimensional). We also need geometrical concepts such as distance and shortest (perpendicular) distance, and sparsity. This chapter first introduces the important concepts of vector space and subspace, which allow the general ideas of linear independence, span, and basis to be defined. Span tells us, for example, that a linear space may be generated from a smaller collection of its members by linear combinations. Thereafter, we discuss Riemann integrals and introduce the notions of a normed linear space and a metric space. Metric spaces are spaces, nonlinear in general, in which a notion of distance, and hence of limit, makes sense. Normed spaces are generalizations of “absolute value” spaces.
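For concreteness, the standard axioms that a norm ‖·‖ on a vector space V must satisfy, together with the metric it induces, are summarized below (these are the usual textbook definitions, stated here for reference):

```latex
\begin{align*}
&\|x\| \ge 0, \qquad \|x\| = 0 \iff x = 0          && \text{(positive definiteness)} \\
&\|\alpha x\| = |\alpha|\,\|x\|                     && \text{(homogeneity)} \\
&\|x + y\| \le \|x\| + \|y\|                        && \text{(triangle inequality)} \\
&d(x, y) := \|x - y\|                               && \text{(induced metric on } V\text{)}
\end{align*}
```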
Fiber-wireless (FiWi) access networks may be viewed as the endgame of broadband access. FiWi access networks aim at leveraging the respective strengths of emerging next-generation optical fiber and wireless access technologies and smartly merging them into future-proof broadband solutions. Currently, many research efforts in industry, academia, and various standardization bodies focus on the design and development of next-generation broadband access networks, ranging from short-term evolutionary next-generation passive optical networks with coexistence requirements with installed fiber infrastructures, so-called NG-PON1, to mid-term revolutionary disruptive optical access network architectures without any coexistence requirements, also known as NG-PON2, all the way to 4G mobile WiMAX and cellular Long Term Evolution (LTE) radio access networks. To deliver peak data rates of up to 200 Mb/s per user and realize what some people refer to as the vision of complete fixed-mobile convergence (Ali et al. [2010]), it is crucial to replace today's legacy circuit-switched wireline and microwave backhaul technologies with integrated FiWi broadband access networks. To unleash the full potential of FiWi access networks, emerging optical and wireless access network technologies have to be truly integrated at the physical, data link, network, and/or service layers, instead of simply mixing and matching them.
In this chapter we will discuss some important inequalities used in probability and statistics and their applications, including the Cauchy–Schwarz inequality, Jensen's inequality, and the Markov and Chebyshev inequalities. We then discuss Chernoff's bounds, followed by an introduction to large deviation theory.
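As a quick numerical preview (an illustration added here, not from the text), the Markov and Chebyshev inequalities can be checked empirically. For X ~ Exp(1), both E[X] and Var[X] equal 1, so Markov gives P[X ≥ a] ≤ 1/a and Chebyshev gives P[|X − 1| ≥ a] ≤ 1/a²:

```python
import numpy as np

rng = np.random.default_rng(6)

# Empirical tails of X ~ Exp(1) versus the Markov and Chebyshev bounds.
x = rng.exponential(scale=1.0, size=1_000_000)   # E[X] = Var[X] = 1
for a in (2.0, 3.0, 5.0):
    markov_emp = np.mean(x >= a)                 # P[X >= a] <= E[X]/a
    cheb_emp = np.mean(np.abs(x - 1.0) >= a)     # P[|X - E[X]| >= a] <= Var[X]/a^2
    print(f"a={a}: {markov_emp:.4f} <= {1/a:.4f}, {cheb_emp:.4f} <= {1/a**2:.4f}")
```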
Inequalities frequently used in probability theory
Cauchy–Schwarz inequality
The Cauchy–Schwarz inequality is perhaps the most frequently used inequality in many branches of mathematics, including linear algebra, analysis, and probability theory. In engineering applications, a matched filter and correlation receiver are derived from this inequality. Since the Cauchy–Schwarz inequality holds for a general inner product space, we briefly review its properties and in particular the notion of orthogonality. We assume that the reader is familiar with the notion of field and vector space (e.g., see Birkhoff and MacLane [28] and Hoffman and Kunze [153]). Briefly stated, a field is an algebraic structure with notions of addition, subtraction, multiplication, and division, satisfying certain axioms. The most commonly used fields are the field of real numbers, the field of complex numbers, and the field of rational numbers, but there is also a finite field, known as a Galois field. Any field may be used as the scalars for a vector space.
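The following sketch (an illustration added here) checks the Cauchy–Schwarz inequality |⟨x, y⟩| ≤ ‖x‖‖y‖ numerically in ℝⁿ and shows the matched-filter consequence: among all filters h, the output signal-to-noise ratio (h⊤s)²/(h⊤h) for a known signal s in white noise is maximized when h is proportional to s.

```python
import numpy as np

rng = np.random.default_rng(4)

# Cauchy-Schwarz in the inner product space R^n: |<x, y>| <= ||x|| * ||y||,
# with equality iff x and y are linearly dependent.
x, y = rng.normal(size=100), rng.normal(size=100)
assert abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y)

# Matched filter: Cauchy-Schwarz gives (h @ s)**2 <= (h @ h) * (s @ s),
# with equality at h = c * s, so SNR(h) = (h @ s)**2 / (h @ h) is
# maximized by the matched filter h = s.
s = rng.normal(size=100)
snr = lambda h: (h @ s)**2 / (h @ h)
h_random = rng.normal(size=100)
print(snr(h_random) <= snr(s))   # True: no filter beats the matched filter
```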
The estimation of a random variable or process by observing other random variables or processes is an important problem in communications, signal processing, and other science and engineering applications. In Chapter 18 we considered a partially observable RV X = (Y, Z), where the unobservable part Z is called a latent variable. In this chapter we study the problem of estimating the unobserved part using samples of the observed part. In Chapter 18 we also considered the problem of estimating RVs, called random parameters, using the maximum a posteriori probability (MAP) estimation procedure. When the prior distribution of the random parameter is unknown, we normally assume a uniform distribution, and then the MAP estimate reduces to the maximum-likelihood estimate (MLE) (see Section 18.1.2); if the prior density is not uniform, the MLE is not optimal and does not possess the nice properties described in Section 18.1.2. If an estimator θ̂ = T(X) has a Gaussian distribution N(µ, Σ), its log-likelihood function is, up to an additive constant, −(1/2)(t − µ)⊤Σ⁻¹(t − µ), and the MLE is obtained by minimizing this quadratic form. If the covariance matrix Σ is diagonal, the MLE becomes what is called a minimum weighted square error (MWSE) estimate. If all the diagonal terms of Σ are equal, the MWSE becomes the minimum mean square error (MMSE) estimate. Thus, the MMSE estimator is optimal only under the assumptions cited above.
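The linear-Gaussian sketch below (an illustration constructed here, not the book's worked example) makes the point concrete: with observations t = Aθ + noise and diagonal noise covariance Σ, maximizing the Gaussian log-likelihood is exactly the minimum weighted square error problem, whose solution weights each observation by its inverse variance.

```python
import numpy as np

rng = np.random.default_rng(5)

# Linear-Gaussian model: t = A @ theta + noise, noise ~ N(0, Sigma) with
# diagonal Sigma.  The log-likelihood is, up to a constant,
# -0.5 * (t - A @ theta).T @ inv(Sigma) @ (t - A @ theta), so the MLE is
# the minimum weighted square error (MWSE) estimate; if all diagonal
# entries of Sigma were equal, the weights would drop out.
n = 200
A = np.column_stack([np.ones(n), rng.normal(size=n)])
theta_true = np.array([2.0, -0.7])
sigma2 = rng.uniform(0.1, 2.0, size=n)            # diagonal of Sigma
t = A @ theta_true + rng.normal(scale=np.sqrt(sigma2))

W = np.diag(1.0 / sigma2)                         # inverse covariance weights
theta_mwse = np.linalg.solve(A.T @ W @ A, A.T @ W @ t)
print(theta_mwse)   # close to theta_true
```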
Ethernet passive optical network (EPON) has gained a great amount of interest both in industry and academia as a cost-effective solution for broadband access networks, as illustrated by the formation of several forums and working groups, including the EPON forum and the Ethernet in the First Mile (EFM) alliance. EPON carries data encapsulated in Ethernet frames, which makes it easy to carry IP packets and eases the interoperability with installed Ethernet local area networks (LANs). EPON represents the convergence of low-cost Ethernet equipment [switches, network interface cards (NICs)] and low-cost fiber architectures. Furthermore, given the fact that more than 90% of today's data traffic originates from and terminates in Ethernet LANs, EPON appears to be a natural candidate for future first-mile solutions.
The main standardization body behind EPON is the IEEE 802.3ah task force. This task force developed the so-called multipoint control protocol (MPCP), which arbitrates the channel access between the central office (CO) and subscribers. MPCP is used for dynamically assigning the upstream bandwidth (subscriber to service provider), which is the key challenge in the access protocol design for EPON. Note that MPCP does not specify any particular dynamic bandwidth allocation (DBA) algorithm. Instead, it is intended to facilitate the implementation of DBA algorithms.
To understand the importance of dynamic bandwidth allocation in EPON, note that the traffic on the individual links in the access network is quite bursty.
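To illustrate what a DBA policy might look like, here is a minimal sketch of a "limited service" grant rule in the spirit of IPACT. It is an illustration constructed here, not part of MPCP or the IEEE 802.3ah standard: each optical network unit (ONU) reports its queue backlog, and the optical line terminal (OLT) grants the requested bytes capped at a fixed per-cycle maximum, so a single bursty subscriber cannot monopolize the upstream channel.

```python
# Illustrative "limited service" DBA grant rule (IPACT-style); the cap
# W_MAX and the ONU names are assumptions made for this sketch.
W_MAX = 15_000  # per-ONU grant limit per polling cycle, in bytes

def limited_service_grants(reports: dict[str, int]) -> dict[str, int]:
    """Map each ONU's reported backlog (bytes) to its upstream grant."""
    return {onu: min(backlog, W_MAX) for onu, backlog in reports.items()}

# Bursty traffic: ONU-2's large backlog is capped, while lightly loaded
# ONUs receive exactly what they requested.
reports = {"onu-1": 4_000, "onu-2": 90_000, "onu-3": 0}
print(limited_service_grants(reports))
# {'onu-1': 4000, 'onu-2': 15000, 'onu-3': 0}
```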