In order to make sense of the way in which users control parallel computers, we shall have to adopt a rather wider definition of the idea of programming than is usual. This is principally because of the existence of systems which are trained rather than programmed. Later in this chapter I shall attempt to show the equivalence between the two approaches, but it is probably best to begin by considering how the conventional idea of programming is applied to parallel systems.
Parallel programming
There are three factors which we should take into account in order to arrive at a proper understanding of the differences between one type of parallel language and another and to appreciate where the use of each type is appropriate. These are whether the parallelism is hidden or explicit, which paradigm is employed and the level of the language. Although there is, inevitably, a certain amount of overlap between these factors, I shall treat them as though they were independent.
The embodiment of parallelism
There is really only one fundamental choice to be made here – should the parallelism embodied in a language be implicit or explicit? That is, should the parallelism be hidden from the programmers, or should it be specified by them? The first alternative is usually achieved by allowing a program to use data types which themselves comprise multiple data entities.
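To make the distinction concrete, here is a minimal Python sketch. `ParallelVector` is a hypothetical data type comprising many data entities: the programmer adds two whole vectors and never mentions processors, so any parallelism is hidden inside the type (this sketch simply performs the additions in sequence). `explicit_add`, also a hypothetical name, shows the explicit alternative, where the programmer decides how the work is divided among workers.

```python
from concurrent.futures import ThreadPoolExecutor


class ParallelVector:
    """Hypothetical multi-entity data type (implicit parallelism).

    Programs see only whole-vector operations; how the element-wise work
    is spread across processors is the implementation's business. This
    sketch performs the additions sequentially.
    """

    def __init__(self, items):
        self.items = list(items)

    def __add__(self, other):
        return ParallelVector(a + b for a, b in zip(self.items, other.items))


def explicit_add(xs, ys, workers=4):
    """Explicit parallelism: the programmer splits the work into chunks."""
    n = len(xs)
    size = max(1, (n + workers - 1) // workers)  # ceiling division, at least 1
    chunks = [(i, min(i + size, n)) for i in range(0, n, size)]

    def add_chunk(bounds):
        lo, hi = bounds
        return [xs[i] + ys[i] for i in range(lo, hi)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(add_chunk, chunks)  # results arrive in chunk order
    return [v for part in parts for v in part]
```

Both styles compute the same result; for example `(ParallelVector([1, 2, 3]) + ParallelVector([10, 20, 30])).items` and `explicit_add([1, 2, 3], [10, 20, 30])` each give `[11, 22, 33]`. The difference lies only in who is responsible for the division of labour.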
The purpose of this book has been to introduce the reader to the subject of parallel computing. In attempting to make the subject digestible, it is inevitable that a great deal of advanced material has been omitted. The reader whose interest has been kindled is directed to the bibliography, which follows this chapter, as a starting point for continued studies. In particular, a more rigorous treatment of advanced computer architectures is available in Advanced Computer Architectures by Hwang.
It should also be noted that parallel computing is a field in which progress occurs at a prodigious rate. So much work is being done that some startling new insight or technique is sure to be announced just after this book goes to press. In what follows, the reader should bear in mind the developing nature of the field. Nevertheless, a great deal of material has been presented here, and it is worthwhile making some attempt to summarise it in a structured manner.
I have attempted to make clear in the preceding chapters that understanding parallel computing is an hierarchical process. A valuable degree of insight into the subject can be obtained even at the level considered in the first chapter, where three basic classes or types of approach were identified. Thereafter, each stage of the process should augment the understanding already achieved. It is up to each reader to decide on the level of detail required.
As was indicated in Chapter 1, there is a prima facie case for supposing that parallel computers can be both more powerful and more cost-effective than serial machines. The case rests upon the twin supports of an increased amount of computing power and, just as importantly, improved structure in terms of mapping to specific classes of problems and in terms of such parameters as processor-to-memory bandwidth.
This chapter concerns what is probably the most contentious area of the field of parallel computing – how to quantify the performance of these allegedly superior machines. There are at least two significant reasons why this should be difficult. First, parallel computers, of whatever sort, are attempts to map structures more closely to some particular type of data or problem. This immediately invites the question – on what set of data and problems should their performance be measured? Should it be only the set for which a particular system was designed, in which case how can one machine be compared with another, or should a wider range of tasks be used, with the immediate corollary – which set? Contrast this with the accepted view of the general-purpose serial computer, where a few convenient acronyms such as MIPS and MFLOPS (see Section 6.1.3) purport to tell the whole story. (That they evidently do not do so casts an interesting sidelight on our own problem.)
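As a reminder of how simple these single-number measures are, the sketch below assumes the conventional definitions, in which MIPS and MFLOPS are just counts of instructions or floating-point operations divided by execution time in microseconds. The function names are illustrative. The arithmetic shows why such figures cannot tell the whole story: the same run can be quoted very differently depending on which operations are counted and which program is chosen.

```python
def mips(instruction_count, seconds):
    """Millions of instructions executed per second."""
    return instruction_count / (seconds * 1e6)


def mflops(flop_count, seconds):
    """Millions of floating-point operations executed per second."""
    return flop_count / (seconds * 1e6)


# Hypothetical run: 10**9 instructions, of which 10**8 are floating-point
# operations, completed in 2 seconds.
print(mips(1e9, 2))    # 500.0
print(mflops(1e8, 2))  # 50.0
```

The same two seconds of work rate 500 MIPS but only 50 MFLOPS, and neither figure says anything about how well the machine suits a different class of problem.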
The second reason concerns the economic performance, or cost-effectiveness, of parallel systems.
Up to this point we have considered the subject of parallel computing as a series of almost separate facets – paradigms, languages, processor design, etc. – each of which can be examined in isolation. Of course, in arriving at the design of any real system, be it commercial or prototype, this approach is very far from the true manner of proceeding. In real system design, the watchword is usually compromise – compromise between specific and general applicability, compromise between user friendliness and efficiency in programming, compromise between the demands of high performance and low cost. No imaginary design exercise can do justice to the complexity or difficulty of this process, because the constraints in imaginary exercises are too flexible. In order to see how the various design facets interact, we need to examine real cases. This is the purpose of this chapter.
In it, I present a series of short articles concerning specific machines, each of which has been contributed by an author intimately concerned with the system in question. The brevity has been occasioned by the limited space available in a book of this sort, but it has had the beneficial effect of ensuring that the authors have concentrated on those areas which they consider to be important. Again, because of limited space I have curtailed the number of these contributions, although this has meant that some aspects of the subject remain unconsidered.
One of the main goals in preparing this book for publication was to preserve the thesis, as much as possible, in the form that it was originally submitted. With this in mind, we have restricted ourselves to making only very minor changes to the body of the thesis, for example, correcting typographical errors.
On the other hand, we have continued to work with the ideas presented here, to find new applications and to investigate some of the areas identified as topics for further research. In this short chapter, we comment briefly on some examples of this, illustrating both the progress that has been made and some of the new opportunities for further work that have been exposed.
We should emphasize once again that this is the only chapter that was not included as part of the original thesis.
Constructor classes
The initial ideas for a system of constructor classes sketched in Section 9.2 have been developed in (Jones, 1993b), and full support for these ideas is now included in the standard Gofer distribution (versions 2.28 and later). Relative to the work described here, the two main technical extensions in the system of constructor classes are:
The use of kind inference to determine suitable kinds for all the user-defined type constructors appearing in a given program.
The extension of the unification algorithm to ensure that it calculates only kind-preserving substitutions. This is necessary to ensure soundness and is dealt with by ensuring that constructor variables are only ever bound to constructors of the corresponding kind. Fortunately, this has a very simple and efficient implementation.
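The kind check in the second point can be pictured with a small Python sketch. The representation here is a toy assumption, not Gofer's actual data structures: a kind is either `*` or an arrow between kinds, and `bind` (a hypothetical helper) extends a substitution only when the constructor variable and the constructor have the same kind. A full unifier would also handle type applications and the occurs check, which are omitted.

```python
from dataclasses import dataclass
from typing import Union

STAR = "*"  # the kind of ordinary types


@dataclass(frozen=True)
class KArrow:
    """The kind k1 -> k2 of a constructor taking an argument of kind k1."""
    arg: "Kind"
    res: "Kind"


Kind = Union[str, KArrow]


@dataclass(frozen=True)
class TCon:
    name: str
    kind: Kind


@dataclass(frozen=True)
class TVar:
    name: str
    kind: Kind


def bind(var, con, subst):
    """Extend subst with var := con, but only if the kinds agree.

    This single comparison is what keeps every substitution kind-preserving.
    """
    if var.kind != con.kind:
        raise TypeError(
            f"kind mismatch: {var.name} :: {var.kind} vs {con.name} :: {con.kind}"
        )
    new = dict(subst)
    new[var] = con
    return new
```

For a constructor variable `f` of kind `* -> *`, binding it to a list constructor of kind `* -> *` succeeds, while binding it to `Int` of kind `*` raises a `TypeError`.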
While the results of the preceding chapter provide a satisfactory treatment of type inference with qualified types, we have not yet made any attempt to discuss the semantics or evaluation of overloaded terms. For example, given a generic equality operator (==) of type ∀a.Eq a ⇒ a → a → Bool and integer-valued expressions E and F, we can determine that the expression E == F has type Bool in any environment which satisfies Eq Int. However, this information is not sufficient to determine the value of E == F; this is only possible if we are also provided with the value of the equality operator which makes Int an instance of Eq.
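A small Python sketch makes the point of the paragraph above concrete. Evidence for `Eq Int` is modelled, as an assumption, by a dictionary holding the equality method, and `generic_eq` is a hypothetical translation of (==) that takes that evidence as an explicit parameter. `EQ_INT_MOD2` is a deliberately odd, hypothetical instance included only to show that the type Bool does not fix the value.

```python
# Evidence for Eq Int: a dictionary holding the equality method.
EQ_INT = {"eq": lambda x, y: x == y}

# Hypothetical alternative evidence for Eq Int, comparing integers modulo 2.
# It exists only to show that the value depends on the evidence supplied.
EQ_INT_MOD2 = {"eq": lambda x, y: x % 2 == y % 2}


def generic_eq(evidence, x, y):
    """Translation of (==) :: forall a. Eq a => a -> a -> Bool.

    The Eq a constraint has become an explicit evidence parameter.
    """
    return evidence["eq"](x, y)


E, F = 3, 5
print(generic_eq(EQ_INT, E, F))       # False
print(generic_eq(EQ_INT_MOD2, E, F))  # True
```

Both calls have type Bool, yet they produce different values, which is exactly why the evidence must be supplied before E == F can be evaluated.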
Our aim in the next two chapters is to present a general approach to the semantics and implementation of objects with qualified types based on the concept of evidence. The essential idea is that an object of type π ⇒ σ can only be used if we are also supplied with suitable evidence that the predicate π does indeed hold. In this chapter we concentrate on the role of evidence for the systems of predicates described in Chapter 2 and then, in the following chapter, extend the results of Chapter 3 to give a semantics for OML.
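The idea that an object of type π ⇒ σ needs evidence before it can be used can be sketched in Python as a function from evidence to a value of type σ. In this illustration, which assumes dictionaries as the form of evidence, `elem` stands for a function of type ∀a. Eq a ⇒ a → [a] → Bool: supplying evidence for `Eq a` (evidence application) yields an ordinary function. `eq_list` builds evidence for `Eq [a]` from evidence for `Eq a`. All names are hypothetical.

```python
from typing import Any, Callable, Dict, List

Evidence = Dict[str, Callable[..., Any]]


def elem(ev: Evidence) -> Callable[[Any, List[Any]], bool]:
    """elem :: forall a. Eq a => a -> [a] -> Bool.

    Evidence abstraction: the Eq a evidence is taken first, and only then
    is a value of type a -> [a] -> Bool produced.
    """
    def go(x, xs):
        return any(ev["eq"](x, y) for y in xs)
    return go


# Evidence that Int is an instance of Eq.
eq_int: Evidence = {"eq": lambda x, y: x == y}


def eq_list(ev: Evidence) -> Evidence:
    """Evidence for Eq [a], constructed from evidence for Eq a."""
    def eq(xs, ys):
        return len(xs) == len(ys) and all(ev["eq"](x, y) for x, y in zip(xs, ys))
    return {"eq": eq}


print(elem(eq_int)(2, [1, 2, 3]))                    # True
print(elem(eq_list(eq_int))([1, 2], [[1], [1, 2]]))  # True
```

Without an argument for `ev`, `elem` cannot be applied to any data at all, mirroring the requirement that the predicate be shown to hold first.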
As an introduction, Section 4.1 describes some simple techniques used in the implementation of particular forms of overloading and shows why these methods are unsuitable for the more general systems considered in this thesis.
The Shorter Oxford English Dictionary defines the word paradigm as meaning pattern or example, but it is used here in its generally accepted sense in this field, where it is taken to imply a fundamental technique or key idea. This chapter, therefore, is concerned with describing the fundamental ideas behind the implementation of parallel computation.
Two matters need to be dealt with before we begin. First, the reader should avoid confusion between the basic approaches set out in Chapter 1 and the paradigms described here. In the final chapter of this book, I develop a taxonomy of parallel computing systems, i.e. a structured analysis of systems in which each succeeding stage is based on increasingly detailed properties. In this taxonomy, the first two levels of differentiation are on the basis of the three approaches of the first chapter, whereas the third level is based on the paradigms described here. This is shown in Figure 2.1.
Next, there is the whole subject of optical computing. In one sense, an optical component, such as a lens, is a data parallel computer of dedicated functionality (and formidable power). There is certainly an overlap between the functions of such components and those of, say, an image processing parallel computer of the conventional sort. A lens can perform a Fourier transform (a kind of frequency analysis) on an image, literally at the speed of light, whereas a conventional computer requires many cycles of operation to achieve the same result.
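To give a feel for the "many cycles of operation", the Python sketch below computes a one-dimensional discrete Fourier transform directly from its definition and counts the complex multiply-adds. It is a direct transform chosen for clarity, not a fast Fourier transform; an FFT reduces the count to the order of N log N, but a two-dimensional image transform still needs such a pass over every row and every column.

```python
import cmath


def naive_dft(samples):
    """Direct discrete Fourier transform, X_k = sum_t x_t exp(-2*pi*i*k*t/N).

    Returns the spectrum together with the number of complex multiply-adds,
    which grows as N squared on a serial machine.
    """
    n = len(samples)
    spectrum = []
    ops = 0
    for k in range(n):
        total = 0j
        for t, x in enumerate(samples):
            total += x * cmath.exp(-2j * cmath.pi * k * t / n)
            ops += 1
        spectrum.append(total)
    return spectrum, ops


spectrum, ops = naive_dft([1, 0, 0, 0])
print(ops)  # 16
```

An impulse transforms to a flat spectrum of ones, and even this four-point transform takes sixteen sequential multiply-adds.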
Before attempting to understand the complexities of the subject of parallel computing, the intending user or student ought, perhaps, to ask why such an exotic approach is necessary. After all, ordinary, serial, computers are in successful and widespread use in every area of society in industrially developed nations, and obtaining a sufficient understanding of their use and operation is no simple task. It might even be argued that, since the only reason for using two computers in place of one is that the single device is insufficiently powerful, a better approach is to increase the power (presumably by technological improvements) of the single machine.
As is usually the case, such a simplistic approach to the problem conceals a number of significant points. There are many application areas where the available power of ‘ordinary’ computers is insufficient to obtain the desired results. In the area of computer vision, for example, this insufficiency is related to the amount of time available for computation, results being required at a rate suitable for, perhaps, autonomous vehicle guidance. In the case of weather forecasting, existing models, running on single computers, are certainly able to produce results. Unfortunately, these are somewhat lacking in accuracy, and improvements here depend on significant extensions to the scope of the computer modelling involved.
This chapter describes GTC, an alternative approach to the use of type classes that avoids the problems associated with context reduction, while retaining much of the flexibility of HTC. In addition, GTC benefits from a remarkably clean and efficient implementation that does not require sophisticated compile-time analysis or transformation. As in the previous chapter, we concentrate more on implementation details than on formal properties of GTC.
An early description of GTC was distributed to the Haskell mailing list in February 1991 and subsequently used as a basis for Gofer, a small experimental system based on Haskell and described in (Jones, 1991c). The two languages are indeed very close, and many programs that are written with one system in mind can be used with the other with little or no change. On the other hand, the underlying type systems are slightly different: using explicit type signature declarations, it is possible to construct examples that are well typed in one but not in the other.
Section 8.1 describes the basic principles of GTC and its relationship to HTC. The only significant differences between the two systems are in the methods used to simplify the context part of an inferred type. While HTC relies on the use of context reduction, GTC adopts a weaker form of simplification that does not make use of the information provided in instance declarations.
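The difference between the two kinds of simplification can be sketched in Python. This is a toy model that captures only what is stated here: HTC-style context reduction consults instance declarations, while the GTC-style simplification does not. Predicates are pairs of a class name and a type; a type is either a type-variable name or a tuple headed by a constructor. The instance table and both function names are hypothetical, and the treatment of superclasses is deliberately left out.

```python
# Hypothetical instance declarations: (class, constructor) -> function giving
# the context required for that instance, e.g. instance Eq a => Eq [a].
INSTANCES = {
    ("Eq", "Int"): lambda args: [],
    ("Eq", "List"): lambda args: [("Eq", args[0])],
}


def gtc_simplify(context):
    """Weaker simplification: drop duplicate predicates, ignore instances."""
    result = []
    for pred in context:
        if pred not in result:
            result.append(pred)
    return result


def htc_reduce(context):
    """Context reduction: rewrite predicates using instance declarations
    until only predicates on type variables remain."""
    result = []
    work = list(context)
    while work:
        cls, ty = work.pop(0)
        if isinstance(ty, tuple):  # type headed by a constructor
            head, args = ty[0], ty[1:]
            rule = INSTANCES.get((cls, head))
            if rule is None:
                raise TypeError(f"no instance for {cls} {head}")
            work.extend(rule(args))
        elif (cls, ty) not in result:
            result.append((cls, ty))
    return result


ctx = [("Eq", ("List", "a")), ("Eq", "a"), ("Eq", ("List", "a"))]
print(gtc_simplify(ctx))  # [('Eq', ('List', 'a')), ('Eq', 'a')]
print(htc_reduce(ctx))    # [('Eq', 'a')]
```

With the context `Eq [a], Eq a`, HTC-style reduction leaves just `Eq a`, whereas the GTC-style simplification keeps `Eq [a]` in the inferred type because it never looks at the instance for lists.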
Section 8.2 describes the implementation of dictionaries used in the current version of Gofer. As an alternative to the treatment of dictionaries as tuples of values in the previous chapter, we give a representation which guarantees that the translation of each member function definition requires at most one dictionary parameter.
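One way such a representation can work is sketched below in Python; the layout is an assumption for illustration rather than Gofer's actual one. Each dictionary stores, alongside its methods, the dictionaries required by its instance context. For an instance such as (Eq a, Eq b) ⇒ Eq (a, b), a direct translation would give the equality member two dictionary parameters, one for each component; here the member receives only its own dictionary (`self`) and fetches the other two from a slot inside it. All function names are hypothetical.

```python
from typing import Any, Dict

Dictionary = Dict[str, Any]


def make_eq_int() -> Dictionary:
    # Dictionary for Eq Int: its context is empty.
    return {"context": [], "eq": lambda self, x, y: x == y}


def eq_pair_member(self: Dictionary, p, q) -> bool:
    """Member for instance (Eq a, Eq b) => Eq (a, b).

    Exactly one dictionary parameter (self); the Eq a and Eq b evidence
    is stored inside it rather than passed separately.
    """
    da, db = self["context"]
    return da["eq"](da, p[0], q[0]) and db["eq"](db, p[1], q[1])


def make_eq_pair(da: Dictionary, db: Dictionary) -> Dictionary:
    # Constructing the Eq (a, b) dictionary captures the component dictionaries.
    return {"context": [da, db], "eq": eq_pair_member}


def eq(d: Dictionary, x, y) -> bool:
    # Method selection passes the dictionary itself as the single argument.
    return d["eq"](d, x, y)


d = make_eq_pair(make_eq_int(), make_eq_int())
print(eq(d, (1, 2), (1, 2)))  # True
print(eq(d, (1, 2), (1, 3)))  # False
```

Nesting works the same way: a dictionary for pairs of pairs stores two pair dictionaries, and each member still receives just one dictionary.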
The principal aim of this chapter is to show how the concept of evidence can be used to give a semantics for OML programs with implicit overloading.
Outline of chapter
We begin by describing a version of the polymorphic λ-calculus called OP that includes the constructs for evidence application and abstraction described in the previous chapter (Section 5.1). One of the main uses of OP is as the target of a translation from OML, with the semantics of each OML term being defined by those of its translation. In Section 5.2 we show how the OML typing derivations for a term E can be interpreted as OP derivations for terms with explicit overloading, each of which is a potential translation for E. It is immediate from this construction that every well-typed OML term has a translation and that all translations obtained in this way are well-typed in OP.
Given that each OML typing typically has many distinct derivations, it follows that there will also be many distinct translations for a given term, and it is not clear which should be chosen to represent the original term. The OP term corresponding to the derivation produced by the type inference algorithm in Section 3.4 gives one possible choice, but it seems rather unnatural to base a definition of semantics on any particular type inference algorithm. A better approach is to show that any two translations of a term are semantically equivalent, so that an implementation is free to use whichever translation is more convenient in a particular situation while retaining the same, well-defined semantics.
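As a toy illustration of two translations of one term, consider `let f = λx. x == x in f 3`, with evidence modelled (as an assumption) by a Python dictionary. One derivation gives `f` the type ∀a. Eq a ⇒ a → Bool, so its translation abstracts over evidence and the use site applies the evidence for `Eq Int`; another gives `f` the type Int → Bool, so the evidence is inserted directly into its body. The function names are hypothetical, and checking a few inputs is of course no substitute for the general equivalence result.

```python
from typing import Any, Callable, Dict

Evidence = Dict[str, Callable[[Any, Any], bool]]

# Evidence for Eq Int.
EQ_INT: Evidence = {"eq": lambda x, y: x == y}


def translation_polymorphic(n: int) -> bool:
    # f :: forall a. Eq a => a -> Bool, so f abstracts over its evidence ...
    def f(ev: Evidence):
        def body(x):
            return ev["eq"](x, x)
        return body
    # ... and the use site performs the evidence application.
    return f(EQ_INT)(n)


def translation_monomorphic(n: int) -> bool:
    # f :: Int -> Bool, so the evidence appears directly in its body.
    def f(x):
        return EQ_INT["eq"](x, x)
    return f(n)


print(translation_polymorphic(3), translation_monomorphic(3))  # True True
```

Both translations compute the same value, which is the property that leaves an implementation free to choose whichever is more convenient.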
Whilst the two issues of control (what are all these parallel processors going to do?) and programming (how is the user to tell them all what to do?) are perhaps the most difficult conceptual aspects of parallel computing, the question of the connections between all the many system components is probably the hardest technical problem. There are two main levels at which the problem must be considered. At the first level, the connections between major system components, such as CPU, controller, data input and output devices, must be given careful consideration to avoid introducing bottlenecks which might destroy hoped-for performance. At the second level, connections within these major components, particularly between processing units and their associated memories, will be the major controlling factor of overall system performance. Within this area, an important conceptual difference exists between two alternative approaches to inter-processor communication. Individual pairs of elements may be either externally synchronised, in which case a controller ensures that if data is input at one end of a line it is simultaneously accepted at the other, or unsynchronised. In this latter case, the donating element signals that data is available, but the data is not transferred until the receiving element is ready to accept it.
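The unsynchronised scheme amounts to a handshake between one donating element and one receiving element. The Python sketch below models such a link with a condition variable: the donor signals that data is available, the transfer completes only when the receiver is ready to take it, and the donor waits for that acknowledgement before sending again. The class name is hypothetical, the sketch assumes a single donor and a single receiver, and the externally synchronised alternative, in which a controller forces simultaneous transfer, is not modelled.

```python
import threading


class HandshakeLink:
    """One-slot link between a donating and a receiving element."""

    def __init__(self):
        self._cond = threading.Condition()
        self._item = None
        self._full = False

    def send(self, value):
        with self._cond:
            while self._full:              # previous item not yet accepted
                self._cond.wait()
            self._item, self._full = value, True
            self._cond.notify_all()        # signal: data is available
            while self._full:              # wait until the receiver takes it
                self._cond.wait()

    def receive(self):
        with self._cond:
            while not self._full:          # wait for the donor's signal
                self._cond.wait()
            value, self._item, self._full = self._item, None, False
            self._cond.notify_all()        # acknowledge the transfer
            return value


link = HandshakeLink()
donor = threading.Thread(target=lambda: [link.send(i) for i in range(5)])
donor.start()
print([link.receive() for _ in range(5)])  # [0, 1, 2, 3, 4]
donor.join()
```

Neither side needs a shared clock: the donor may signal early and simply waits, which is the essential difference from the externally synchronised case.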
In addition, further technical problems are present. The first concerns the position and purpose of memory in the general structure.