The characterization of how routers, computers, and physical links interconnect with each other in the global Internet is a very difficult task, owing to several key features of network development. The first is the Internet's size and continued growth. The Internet is growing exponentially and its size has already increased by five orders of magnitude since its birth. In other words, the Internet is a large-scale object whose global properties cannot, in general, be inferred from local ones. A second difficulty is the intrinsic heterogeneity of the Internet; that is, it is composed of networks engineered with great technical and administrative diversity. The different networking technologies are merged together by the TCP/IP architecture, which, while providing connectivity, does not imply uniform behavior. Moreover, networks range from small local campuses to large transcontinental backbone providers. This difference in size is reflected in different administrative policies that make routing through the Internet a highly unpredictable and heterogeneous phenomenon. Also very important is the fact that the Internet is a self-organizing system, whose properties cannot be traced back to any blueprint or chart. It evolves and drastically changes over time according to evolutionary principles dictated by the interplay between cooperation (the network has to work efficiently) and competition (providers wish to earn money). This means that routers and links are added by competing entities according to local economic and technical constraints, leading to a very intricate physical structure that does not comply with any globally optimized plan.
While the scope of this book is to look at the Internet as a self-organizing complex system and to study its large-scale properties by using a statistical approach, a general knowledge of how the Internet works is necessary to identify the main elements forming the Internet, as well as their respective interactions. To give a physical analogy, if we want to understand the properties of a certain material we first have to know the atomic elements it is composed of, and how they interact. Similarly, it is impossible to approach the Internet without first having some hint of what a router is and how it communicates with its peers.
In this chapter we provide a brief description of the different elements, both hardware and software, at the basis of the Internet's functioning, to allow non-expert readers to familiarize themselves with the general mechanisms that make it possible to transfer data from host to host. These mechanisms will turn out to be very relevant for understanding problems related to measurement infrastructures and other communication processes taking place in the Internet. Our exploration of the Internet's workings will be necessarily non-technical and, needless to say, the expert reader can freely skip this chapter and use it later on for convenient reference.
Physical description
In providing a general picture of the Internet, the starting point is the concept of a computer network. A network is any set of computers – usually referred to as hosts – connected in such a way that each one of them can interoperate with all the others.
The Internet is composed of thousands of different elements – both at the hardware and the software level – which are naturally susceptible to errors, malfunctions, or other kinds of failures, such as power outages, hardware problems, or software errors (Paxson, 1997; Labovitz, Ahuja, and Jahanian, 1999). Needless to say, the Internet is also subject to malicious attacks. The most common of these are denial-of-service attacks, which encompass a broad set of attacks against a diversity of Internet services, such as the consumption of limited resources or the physical destruction of network components (C.E.R. Team, 2001). Given so many opportunities for errors and failures, it is sometimes surprising that the Internet functions at all.
The design of a computer network resilient to local failures (either random malfunctions or intentional attacks) was indeed one of the main motivations for the original study of distributed networks by Paul Baran (1964). Considering the worst possible scenario of an enemy attack directed at the nodes of a nationwide computer network, Baran analyzed the “survivability” (defined as the average fraction of surviving nodes capable of communicating with any other surviving node) of the various network designs available at the time. His conclusion was that the optimal network, from the survivability point of view, was a mesh-like graph with a sufficient amount of redundancy in the paths between vertices. Even in the case of a severe enemy strike depleting a large number of components, such a network topology would ensure connectivity among the surviving computers by diverting communications along the ensemble of alternative paths.
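As a rough illustration of the kind of comparison Baran carried out, the sketch below estimates a simplified survivability measure, namely the average fraction of surviving nodes that end up in the largest connected component, for a fragile star topology versus a redundant mesh-like graph under random node removal. It assumes the networkx library is available; the graph sizes and the 20% removal fraction are arbitrary illustrative choices, not values taken from Baran's study.

```python
# Toy estimate of "survivability" under random node removal: the average
# fraction of surviving nodes that can still reach one another (measured here
# as the relative size of the largest connected component among survivors).
import random
import networkx as nx

def survivability(G, removal_fraction, trials=200):
    n_remove = int(removal_fraction * G.number_of_nodes())
    total = 0.0
    for _ in range(trials):
        H = G.copy()
        H.remove_nodes_from(random.sample(list(H.nodes()), n_remove))
        largest = max(nx.connected_components(H), key=len)
        total += len(largest) / H.number_of_nodes()
    return total / trials

# A centralized star (one hub) versus a mesh-like graph with redundant paths.
star = nx.star_graph(99)                    # 100 nodes, all traffic through the hub
mesh = nx.random_regular_graph(4, 100)      # 100 nodes, degree 4, many alternative paths
for name, G in (("star", star), ("mesh", mesh)):
    print(f"{name}: survivability after 20% random removal = {survivability(G, 0.2):.2f}")
```

Removing the star's single hub disconnects all the remaining nodes and drags its average down, whereas the mesh keeps almost all survivors mutually reachable; under a targeted strike on the hub the contrast would be far starker, which is the essence of Baran's argument for redundancy.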
In previous chapters we have presented an X-ray picture of the Internet's topological skeleton and the consequences that the revealed structure has for several properties of this network. There is, however, another dimension to add to the Internet picture, which refers to the quantitative characterization of the actual flow of data and the transmission capacity of the various elements forming the Internet. Physical links are highly heterogeneous in their data transmission capacity. Also, the traffic load is highly variable on different connections and at different times; that is, a full characterization of the network cannot leave aside the interplay between the topological structure, the detailed physical properties of each element, and the traffic generated by users.
The measurement and modeling of link traffic and performance in LANs and WANs has been a major area of activity in recent years. These studies have pointed out the failure of the Poisson traffic model, providing empirical evidence for traffic self-similarity. The modeling of these properties finds a natural framework in the vast mathematical theory of self-similar stochastic processes, and intense research activity has focused both on understanding the origin of the scale-free nature of traffic and on its implications at the performance level. Measurements of link traffic imply the accumulation of very long time series of traffic loads and usually refer to a specific link or to very small networks.
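As a concrete, hedged illustration of what such analyses involve (not a reproduction of any specific study cited here), the sketch below applies the standard aggregated-variance estimator of the Hurst parameter H to a synthetic traffic series: for a self-similar series the variance of block averages of size m scales as m^(2H-2), so memoryless Poisson traffic yields H close to 0.5, while measured LAN and WAN traces typically give H well above 0.5. NumPy is assumed; the arrival rate and block sizes are arbitrary.

```python
# Aggregated-variance estimate of the Hurst parameter H from a traffic series:
# fit the slope of log Var(X^(m)) versus log m, where X^(m) is the series of
# block averages over non-overlapping blocks of size m; slope = 2H - 2.
import numpy as np

def hurst_aggregated_variance(x, block_sizes):
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = len(x) // m
        blocks = x[: n_blocks * m].reshape(n_blocks, m).mean(axis=1)
        log_m.append(np.log(m))
        log_var.append(np.log(blocks.var()))
    slope, _ = np.polyfit(log_m, log_var, 1)
    return 1.0 + slope / 2.0

rng = np.random.default_rng(0)
poisson_counts = rng.poisson(lam=10.0, size=2 ** 16).astype(float)  # memoryless arrivals
H = hurst_aggregated_variance(poisson_counts, [2 ** k for k in range(2, 10)])
print(f"Poisson traffic, estimated H ~ {H:.2f}")   # close to 0.5 (no long-range correlations)
```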
The problem of searching in complex networks is a very relevant issue, with a large number of practical applications. The most obvious example of this problem is the World Wide Web: an immense data depot that, however, acquires practical interest and value only if it is possible to locate the particular information one is looking for. Probably the first context in which searching in complex networks was considered is the sociological one. In a now famous experiment, Milgram (1967) randomly selected people in a town in the Midwest and asked them to send a letter to a target person living in Boston. The rule was that letters could not be sent directly, but had to be passed from person to person, only between people known on a first-name basis, until they reached the target. The unexpected result of this experiment was that the letters that reached the target traveled through short chains of acquaintances, with an average length of only six intermediate hops. The conclusion that can be drawn from this result is not only that short paths exist between the vertices of a social network (the small-world property), but also that these paths can be efficiently found in order to transmit information. The very same scenario turns out to be very relevant for technological networks such as the Internet.
In this chapter we will review the main searching strategies that can be applied in order to retrieve information from the vertices of a network, focusing especially on the Internet and the virtual networks hosted by it.
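To make the comparison concrete, the toy sketch below contrasts two generic strategies on a synthetic scale-free graph: flooding, in which the query is broadcast to all neighbours as in unstructured P2P systems, and a single random walker. It is only an illustration of the coverage-versus-traffic trade-off discussed here, not an implementation of any particular protocol; networkx is assumed, and the graph model and sizes are arbitrary.

```python
# Two generic search strategies on a network, with a rough cost measure for each:
# flooding counts how many vertices the broadcast has reached when the target is
# processed; the random walk counts the number of hops until the target is hit.
import random
from collections import deque
import networkx as nx

def flooding_search(G, source, target):
    visited, queue = {source}, deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            return len(visited)          # vertices contacted by the flood so far
        for w in G.neighbors(v):
            if w not in visited:
                visited.add(w)
                queue.append(w)
    return len(visited)

def random_walk_search(G, source, target, max_steps=100_000):
    v, steps = source, 0
    while v != target and steps < max_steps:
        v = random.choice(list(G.neighbors(v)))
        steps += 1
    return steps

G = nx.barabasi_albert_graph(1000, 3)    # heterogeneous toy graph with hubs
src, dst = 0, 999
print("flooding, vertices contacted:", flooding_search(G, src, dst))
print("random walk, hops taken:     ", random_walk_search(G, src, dst))
```

Flooding finds the target in few hops but contacts a large fraction of the network, while the random walker generates little traffic per step but may wander for a long time; this tension is the starting point of most practical search strategies.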
The Internet is such a rapidly evolving system that the effort put into its study may at times appear as exhausting and futile as Sisyphus' task. Indeed, one cannot but ask the question: would the picture and results obtained today still be valid for the Internet of tomorrow?
The scientific community has only just begun to partially understand the complex properties of present Internet maps, while new and more ambitious mapping projects are being developed and new kinds of networks are making their appearance, posing new theoretical and experimental challenges. For instance, P2P and ad hoc networks define a new class of dynamical systems for which new types of monitoring infrastructures and different modeling frameworks need to be developed.
Another rapidly changing aspect of the Internet relates to traffic load and performance. Traffic in the Internet is steadily increasing, driven by users' demands, along with the available bandwidth of communication lines. Unfortunately, we have not yet achieved a proper theoretical understanding of the interplay and mutual feedback between topological features and traffic load. This makes it very difficult to forecast whether the existing infrastructures, as well as new networks joining the Internet, will reorganize in view of this interplay, providing a different global picture of the Internet.
As if all that were not enough, new technical improvements, at the software, hardware, and protocol levels, are constantly changing many of the mechanisms that rule the fundamental processes that make the Internet work at the “microscopic” level.
In the formulation of statistical mechanics we are concerned with the theoretical prediction of the thermodynamic properties of a physical system containing a large number of particles, such as electrons, atoms, and molecules, by means of statistical averages performed over an appropriately prepared statistical sample. There are many different ways to prepare the sample. J. Willard Gibbs (1901) coined the name ensemble for the statistical sample, and three different ensembles are introduced: the microcanonical, canonical, and grand canonical ensembles.
We can choose as the physical system any thermodynamic system, such as a single-component gas, liquid, or solid, as well as a mixture of many components, as long as the system is in a state of thermodynamic equilibrium. In order to establish a fundamental principle of statistical mechanics, however, we naturally choose as simple a system as possible, such as a one-component dilute gas made up of structureless monatomic molecules. We then extend the formulation step by step to more complicated systems. In this chapter, the formulation of the three Gibbs ensembles will be developed.
The microcanonical ensemble is a collection of identical replicas of a given physical system, here a gas made up of noninteracting structureless particles. Firstly, the system is assumed to be contained in a box of volume V, the number of particles is equal to N, and the total energy lies in a narrow range between E and E + dE.
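For reference, and using standard textbook notation not spelled out above (k_B for Boltzmann's constant, beta = 1/(k_B T), mu for the chemical potential), the basic quantities conventionally attached to the three Gibbs ensembles are:

```latex
% Standard definitions of the three Gibbs ensembles (conventional notation).
\begin{align*}
  S(E,V,N)     &= k_{\mathrm B}\ln \Omega(E,V,N),
     && \text{microcanonical: } \Omega \text{ counts the microstates with energy in } [E,\,E+\mathrm{d}E],\\
  Z(T,V,N)     &= \sum_i e^{-\beta E_i}, \qquad F = -k_{\mathrm B}T\ln Z,
     && \text{canonical,}\\
  \Xi(T,V,\mu) &= \sum_{N=0}^{\infty} e^{\beta\mu N}\, Z(T,V,N), \qquad
     \Omega_{\mathrm G} = -k_{\mathrm B}T\ln \Xi,
     && \text{grand canonical.}
\end{align*}
```

The microcanonical counting of states in the shell between E and E + dE is exactly the construction described in the paragraph above; the canonical and grand canonical ensembles relax the constraints on the energy and on the particle number, respectively.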