The ability to understand and solve high-dimensional inference problems is essential for modern data science. This chapter examines high-dimensional inference problems through the lens of information theory and focuses on the standard linear model as a canonical example that is both rich enough to be practically useful and simple enough to be studied rigorously. In particular, this model can exhibit phase transitions where an arbitrarily small change in the model parameters can induce large changes in the quality of estimates. For this model, the performance of optimal inference can be studied using the replica method from statistical physics but, until recently, it was not known whether the resulting formulas were actually correct. In this chapter, we present a tutorial description of the standard linear model and its connection to information theory. We also describe the replica prediction for this model and outline the authors’ recent proof that it is exact.
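For orientation, one common way to write the standard linear model and the quantity that measures the quality of optimal estimates is sketched below. The symbols (Y, A, X, W, m, n, and the noise variance) are notational assumptions made here and are not taken from the abstract.

```latex
% Observations Y are noisy linear measurements of an unknown signal X.
Y = A X + W, \qquad A \in \mathbb{R}^{m \times n}, \quad W \sim \mathcal{N}(0, \sigma^2 I_m)

% Per-component minimum mean-squared error of the optimal (conditional-mean) estimator.
\mathrm{MMSE} = \frac{1}{n}\, \mathbb{E}\!\left[ \bigl\lVert X - \mathbb{E}[X \mid Y, A] \bigr\rVert^2 \right]
```

In this notation, a phase transition corresponds to a jump in the MMSE as a parameter such as the measurement ratio m/n or the noise variance crosses a threshold.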
Graph Signal Processing (GSP) is a general theory whose goal is to provide tools for graph signal analysis that directly generalize Digital Signal Processing (DSP). The goal of this chapter is to understand the graph-spectral properties of phasor measurement unit (PMU) signals, which are typically explained through the linear generative model using graph filters. Are PMU data a graph signal that obeys the linear generative model prevalent in the literature? If so, what kind of graph-filter structure and excitation justifies the properties discussed already? Can we derive new strategies to sense and process these data based on GSP? By putting the link between PMU data and GSP on the right footing, we can determine to what extent GSP tools are useful and specify how the basic equations can be used to gain theoretical insights that support the observations.
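The linear generative model mentioned in the abstract above is usually written as a graph filter applied to an excitation: x = (h_0 I + h_1 S + h_2 S^2 + ...) w, where S is a graph shift operator such as the adjacency matrix. The following is a minimal sketch on a toy graph. The four-node path graph, the filter taps, and the impulse excitation are illustrative assumptions, not values from the chapter.

```python
def mat_vec(S, x):
    """Multiply the graph shift operator S (list of rows) by the signal x."""
    return [sum(s * v for s, v in zip(row, x)) for row in S]


def graph_filter(S, taps, excitation):
    """Return sum_k taps[k] * S^k * excitation (a polynomial graph filter)."""
    out = [0.0] * len(excitation)
    shifted = list(excitation)  # holds S^k * excitation, starting at k = 0
    for h in taps:
        out = [o + h * s for o, s in zip(out, shifted)]
        shifted = mat_vec(S, shifted)
    return out


# Hypothetical toy graph: path 0 - 1 - 2 - 3, with the adjacency matrix as shift.
S = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
taps = [1.0, 0.5, 0.25]             # assumed filter coefficients h_0, h_1, h_2
excitation = [1.0, 0.0, 0.0, 0.0]   # impulse injected at node 0

signal = graph_filter(S, taps, excitation)
print(signal)  # [1.25, 0.5, 0.25, 0.0]
```

Each application of S spreads the excitation one hop further, so a filter with three taps can only reach nodes within two hops of the impulse.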
This chapter focuses on critical infrastructures in the power grid, which often rely on Industrial Control Systems (ICS) to operate and are exposed to vulnerabilities ranging from physical damage to the injection of information that appears to be consistent with industrial control protocols. As a result, infiltration of the firewalls protecting the perimeter of the control network becomes a significant threat. The goal of this chapter is to review identification and intrusion detection algorithms for protecting the power grid, based on knowledge of the expected behavior of the system.
Processing, storing, and communicating information that originates as an analog phenomenon involve conversion of the information to bits. This conversion can be described by the combined effect of sampling and quantization. The digital representation in this procedure is achieved by first sampling the analog signal so as to represent it by a set of discrete-time samples and then quantizing these samples to a finite number of bits. Traditionally, these two operations are considered separately. The sampler is designed to minimize information loss due to sampling based on prior assumptions about the continuous-time input. The quantizer is designed to represent the samples as accurately as possible, subject to the constraint on the number of bits that can be used in the representation. The goal of this chapter is to revisit this paradigm by considering the joint effect of these two operations and to illuminate the dependence between them.
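The traditional two-stage pipeline described in the abstract above (sample first, then quantize) can be sketched as follows. This is a minimal illustration of the separate design, not the joint analysis the chapter develops. The test tone, the sampling rate, and the uniform mid-point quantizer on [-1, 1] are all assumptions chosen for the example.

```python
import math


def sample(signal, rate_hz, duration_s):
    """Uniformly sample a continuous-time signal given as a function of time."""
    n = int(rate_hz * duration_s)
    return [signal(k / rate_hz) for k in range(n)]


def quantize(samples, bits, lo=-1.0, hi=1.0):
    """Uniform scalar quantizer with 2**bits cells on [lo, hi]."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    out = []
    for x in samples:
        # Cell index, clipped so that x == hi falls in the last cell.
        idx = min(levels - 1, max(0, int((x - lo) / step)))
        out.append(lo + (idx + 0.5) * step)  # reconstruct at the cell mid-point
    return out


def mse(a, b):
    """Mean-squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def tone(t):
    """Hypothetical analog input: a 5 Hz sinusoid with unit amplitude."""
    return math.sin(2 * math.pi * 5.0 * t)


samples = sample(tone, rate_hz=100.0, duration_s=1.0)
errors = {bits: mse(samples, quantize(samples, bits)) for bits in (2, 4, 8)}
for bits, err in errors.items():
    print(f"{bits} bits per sample: quantization MSE = {err:.2e}")
```

For a fixed total bit budget, raising the sampling rate leaves fewer bits per sample, which is one simple way to see why the two operations interact.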
With rapid developments in hardware storage, precision instrument manufacturing, and economic globalization, data in various forms have become ubiquitous in human life. This enormous amount of data can be a double-edged sword. While it provides the possibility of modeling the world with higher fidelity and greater flexibility, improper modeling choices can lead to false discoveries, misleading conclusions, and poor predictions. Typical data-mining, machine-learning, and statistical-inference procedures learn from and make predictions on data by fitting parametric or non-parametric models. However, no model is universally suitable for all datasets and goals. Therefore, a crucial step in data analysis is to consider a set of postulated candidate models and learning methods (the model class) and select the most appropriate one. We provide integrated discussions of the fundamental limits of inference and prediction based on model-selection principles from modern data analysis. In particular, we introduce two recent advances in model selection, one concerning a new information criterion and the other concerning modeling procedure selection.
The purpose of this chapter is to set the stage for the book and for the upcoming chapters. We first overview classical information-theoretic problems and solutions. We then discuss emerging applications of information-theoretic methods in various data-science problems and, where applicable, refer the reader to related chapters in the book. Throughout this chapter, we highlight the perspectives, tools, and methods that play important roles in classic information-theoretic paradigms and in emerging areas of data science. Table 1.1 provides a summary of the different topics covered in this chapter and highlights the different chapters that can be read as a follow-up to these topics.
The electric power system is evolving toward a massively distributed infrastructure with millions of controllable nodes. Its future operational landscape will be markedly different from existing operations, in which power generation is concentrated at a few large fossil-fuel power plants, use of renewable generation and storage is relatively rare, and loads typically operate in open-loop fashion. This chapter provides an overview of the technical developments that aim to leverage advances in optimization and control to develop distributed control frameworks for next-generation power systems that ensure stability, preserve reliability, and meet economic objectives and customer preferences.
Approximate computation methods with provable performance guarantees are becoming important and relevant tools in practice. In this chapter we focus on sketching methods designed to reduce data dimensionality in computationally intensive tasks. Sketching can often provide better space, time, and communication complexity trade-offs by sacrificing minimal accuracy. This chapter discusses the role of information theory in sketching methods for solving large-scale statistical estimation and optimization problems. We investigate fundamental lower bounds on the performance of sketching. By exploring these lower bounds, we obtain interesting trade-offs in computation and accuracy. We employ Fano’s inequality and metric entropy to understand fundamental lower bounds on the accuracy of sketching, which is parallel to the information-theoretic techniques used in statistical minimax theory.
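To make the idea of sketching in the abstract above concrete, the following minimal sketch applies a Gaussian random projection to a one-parameter least-squares problem and compares the sketched solution with the exact one. The problem sizes, the noise level, the true slope of 3, and the choice of a dense Gaussian sketch are illustrative assumptions. The chapter may analyze other sketching matrices.

```python
import random


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))


def gaussian_sketch(vectors, m, rng):
    """Apply one shared m x n Gaussian sketch (entries N(0, 1/m)) to each vector."""
    n = len(vectors[0])
    scale = 1.0 / m ** 0.5
    sketched = [[0.0] * m for _ in vectors]
    for i in range(m):
        row = [rng.gauss(0.0, scale) for _ in range(n)]  # i-th row of the sketch
        for v, out in zip(vectors, sketched):
            out[i] = dot(row, v)
    return sketched


def fit_slope(a, b):
    """Least-squares solution of min_w ||a * w - b||^2 for a scalar w."""
    return dot(a, b) / dot(a, a)


rng = random.Random(0)
n, m = 1000, 100                       # original and sketched dimensions
a = [rng.gauss(0.0, 1.0) for _ in range(n)]
b = [3.0 * x + rng.gauss(0.0, 0.1) for x in a]

w_exact = fit_slope(a, b)              # uses all n rows
sa, sb = gaussian_sketch([a, b], m, rng)
w_sketch = fit_slope(sa, sb)           # uses only m sketched rows

print(f"exact slope:    {w_exact:.4f}")
print(f"sketched slope: {w_sketch:.4f}")
```

The sketched problem is ten times smaller, and the price is a small additional error in the estimate. Lower bounds of the kind discussed in the chapter describe how small that error can be made for a given sketch dimension m.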
Studies of prosumer decision making in the smart grid have focused on a single decision within the framework of expected utility theory (EUT) and behavioral theories such as Prospect Theory. This chapter studies prosumer decision making in a more natural market situation in which a prosumer has to decide whether to make a sale of solar energy units generated at her home every day or hold (store) the energy units in anticipation of a future sale at a better price. Specifically, it proposes a new behavioral model that extends EUT to take into account bounded horizons (in terms of the number of days) that prosumers implicitly impose on their decision making in arriving at “hold” or “sell” decisions of energy units. The new behavioral model assumes that humans make decisions that will affect their lives within a bounded horizon regardless of how far into the future their units may be sold. Modeling the utility of the prosumer using parameters such as the offered price on a day, the available energy units on a day, and the probabilities of the forecast prices, both traditional EUT and the proposed behavioral model with bounded horizons are fit to prosumer data.