“But if thought corrupts language, language can also corrupt thought.”
George Orwell
Automated Interoperability Tools
In human cultures, membership in a community often implies sharing a common language. This holds as much for computer programming languages as for natural human tongues. Certain languages dominate certain research communities. As a result, the modern drive towards multidisciplinary studies inevitably leads to mixed-language development. As noted in the preface, survey evidence suggests that approximately 85% of high-performance computing users write in some flavor of C/C++/C#, whereas 60% write in Fortran. These data imply that at least 45% write in both Fortran and C/C++/C#, so the union of these two language families likely contains the most important pairings in mixed-language scientific programming.
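The figure of at least 45% is simply the inclusion-exclusion principle applied to the survey percentages. Writing C for the set of users who program in C/C++/C# and F for those who program in Fortran, and noting that their union cannot exceed 100% of respondents,

\[ |C \cap F| \;=\; |C| + |F| - |C \cup F| \;\ge\; 85\% + 60\% - 100\% \;=\; 45\%. \]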
In many settings, the language version matters as much as the language identity. For example, interfacing other languages with object-oriented Fortran 2003 poses a much broader set of challenges than does interfacing with procedural Fortran 77 or even object-based Fortran 95. It also matters whether one language invokes code in a second language, the second invokes code in the first, or both. As suggested by Rasmussen et al. (2006), one can account for invocation directionality by considering ordered pairs of languages, where the order determines which language is the caller and which is the callee.
Consider a set {A,B,C, …} of N dialects, where each member is a language or language version distinct from each other member.
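Because the caller and callee roles are distinguished, the number of interoperability scenarios implied by this setup is the number of ordered pairs that can be drawn from the set:

\[ \text{number of (caller, callee) pairs} \;=\; N(N-1). \]

Four distinct dialects, for example, yield twelve directed pairings to consider.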
“Never be afraid to try something new. Remember, amateurs built the ark. Professionals built the Titanic.”
Miss Piggy
The Problem
While the abstract calculus and strategy patterns apply to the integration of a single physics abstraction, our chief concern lies in linking multiple abstractions. This poses at least two significant software design problems. The first involves how to facilitate interabstraction communication. The GoF addressed interabstraction communication with the mediator pattern. When N objects interact, a software architect can reduce the N(N−1) associations between the objects to 2N associations by employing a mediator.
The mediator association count stems from the requirements that the Mediator know each communicating party and those parties know the Mediator. For example, in a mediator implementation presented by Gamma et al. (1995), the sender passes a reference to itself to the mediator. The sender must be aware of the mediator in order to know where to send the message. Likewise, the mediator must be aware of the sender in order to invoke methods on the sender via the passed reference. Figure 8.1 illustrates the associations in an atmospheric boundary layer model, wherein the air, ground, and cloud ADTs might solve equation sets for the airflow, ground transpiration, and discrete droplet motion, respectively.
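Evaluating the two expressions quoted above shows how the savings scale with the number of communicating abstractions:

\[ N(N-1)\ \text{direct associations} \quad\text{versus}\quad 2N\ \text{associations via a mediator}, \]

so the mediator reduces the count whenever N exceeds 3; for N = 5 the comparison is 20 versus 10, and for N = 10 it is 90 versus 20.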
A second and conceptually more challenging problem concerns how one assembles quantities of an inherently global nature – that is, information that can only be determined with simultaneous knowledge of the implementation details of each of the single-physics abstractions.
“All professions are conspiracies against the laity.”
George Bernard Shaw
The Problem
The context of abstract calculus is the construction of numerical software that approximates various differential and integral forms. Two pairs of conflicting forces arise in this context. In the first pair, the low-level nature of the mathematical constructs provided by mainstream programming languages constrains the design of most scientific programs. A desire for syntax and semantics that naturally represents the much richer mathematical language of scientists and engineers opposes this constraint.
The C++ language contains native scalar and one-dimensional array variables. The C++ Standard Template Library (STL) extends these with vector containers that offer convenient properties such as automatic memory management, including sizing and resizing upon assignment. Fortran 2003 provides similar capabilities with its multidimensional allocatable array construct. It also provides numerous useful intrinsic procedures for determining array properties, including size, shape, and the maximum and minimum elements, as well as intrinsic procedures and operators for combining arrays into sums, matrix-vector products, and other derived information. It is common in scientific and engineering work to build up from these native constructs a set of array classes with a variety of additional useful methods (Barton and Nackman 1994; Heroux et al. 2005). Nonetheless, the resulting objects model very low-level mathematical entities in the sense that one typically arrives at these entities only after fairly involved derivations from, and approximations to, much higher-level constructs.
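As a minimal, hedged illustration of the Fortran 2003 capabilities just listed (the program and variable names below are ours, not an example drawn from the text), the following fragment exercises allocatable arrays and a few of the array intrinsics:

program array_intrinsics_sketch
  implicit none
  real, allocatable :: a(:,:), x(:), b(:)

  allocate(a(3,3), x(3))
  call random_number(a)            ! fill the matrix with pseudorandom values
  call random_number(x)

  print *, 'size  = ', size(a)     ! total number of elements
  print *, 'shape = ', shape(a)    ! extent along each dimension
  print *, 'max   = ', maxval(a)   ! largest element
  print *, 'min   = ', minval(a)   ! smallest element
  print *, 'sum   = ', sum(a)      ! sum of all elements

  b = matmul(a, x)                 ! matrix-vector product; b is allocated on assignment
  print *, 'A*x   = ', b
end program array_intrinsics_sketch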
This appendix summarizes the Unified Modeling Language (UML) diagrammatic notation employed throughout this book along with the associated terminology and brief definitions of each term. We consider the elements that appear in the five types of UML diagrams used in the body of the current text: use case, class, object, package, and sequence diagrams. At the end of the appendix, we give a brief discussion of Object Constraint Language (OCL), a declarative language for describing rules for UML models.
Use Case Diagrams
A use case is a description of a system's behavior as it responds to an outside request or input. It captures at a high level who does what for the system being modeled. Use cases describe behavior, focusing on the role of each element in the system rather than on how each element does its work.
A use case diagram models relationships between use cases and external requests, thus rendering a visual overview of system functionality. Figure B.1 reexamines the fin heat conductor analyzer diagram from Figure 2.6, adding notations to identify the elements of the use case diagram.
Use case diagrams commonly contain the following elements:
Actors: people or external systems that interact with the system being modeled. Actors live outside the system and are the users of the system. Typically actors interact with the system through use cases. In UML, actors are drawn as stick figures. In the fin analyzer system example, system architect, thermal analyst, and numerical analyst are actors.
“When sorrows come, they come not single spies but in battalions.”
William Shakespeare
Toward a Scalable Abstract Calculus
The canonical contexts sketched in Section 4.3 and employed throughout Part II were intentionally low-complexity problems. Such problems provided venues for fleshing out complete software solutions from their high-level architectural design through their implementation in source code. As demonstrated by the analyses in Chapter 3, however, the issues addressed by OOA, OOD, and OOP grow more important as a software package's complexity grows. Complexity growth inevitably arises when multiple subdisciplines converge into multiphysics models. The attendant increase in the scientific complexity inevitably taxes the hardware resources of any platform employed. Thus, leading-edge research in multiphysics applications must ultimately address how best to exploit the available computing platform.
Recent trends in processor architecture make it clear that fully exploiting the available hardware on even the most modest of computing platforms necessitates mastering parallelism. Even laptop computers now contain multicore processors, and the highest-end machines contain hundreds of thousands of cores. The process of getting a code to run efficiently on parallel computers is referred to as getting a code to scale, and code designs that facilitate scaling are termed scalable. The fundamental performance question posed by this chapter is whether one can construct a scalable abstract calculus. The Sundance project (Long 2004) has already answered this question in the affirmative for C++.
Whereas code reuse played an important role in Part I of this text, design reuse plays an equally important role in Part II. The effort put into thinking abstractly about software structure and behavior pays off in high-level designs that prove useful independent of the application and implementation language. Patterns comprise reusable elements of successful designs.
The software community typically uses the terms “design patterns” and “object-oriented design patterns” interchangeably. This stems from the expressiveness of OOP languages in describing the relationships and interactions between ADTs. Patterns can improve a code's structure and readability and reduce its development costs by encouraging reuse.
Software design patterns comprise four elements (Gamma et al. 1995):
The pattern name: a handle that describes a design problem, its solution, and consequences in a word or two.
The problem: a description of when to apply the pattern and within what context.
The solution: the elements that constitute the design, the relationships between these elements, their responsibilities, and their collaborations.
The consequences: the results and trade-offs of applying the pattern.
Although there have been suggestions to include additional information in identifying a pattern, such as sample code and known uses that validate the pattern as a proven solution, authors generally agree that the problem, the solution, and the consequences (elements 2–4 above) enumerate the three essential factors in each pattern.
“Memory is a crazy woman [who] hoards colored rags and throws away food.”
Austin O'Malley
The Problem
Large software development efforts typically require a degree of consistency across the project to ensure that each developer follows practices consistent with the critical goals of the project. In high-performance computing (HPC), for example, Amdahl's law (Chapter 1) suggests that scaling up to the tens or hundreds of thousands of processor cores available on leadership-class machines requires that every dusty corner of the code make efficient use of the available cores. Otherwise, whichever fraction of the code speeds up more slowly with increasing numbers of cores eventually determines the overall speedup of the code.
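In its standard form (Chapter 1 may state it with different symbols), Amdahl's law bounds the speedup S attainable on n cores when a fraction p of the work parallelizes perfectly and the remaining 1 − p runs serially:

\[ S(n) \;=\; \frac{1}{(1-p) + p/n} \;\le\; \frac{1}{1-p}, \]

so even a code that is 99% parallel can never speed up by more than a factor of 100, however many cores are available.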
Another form of consistency proves useful when one desires some universal way to reference objects in a project. Doing so facilitates manipulating an object without knowledge of its identity. The manipulated object could equally well be an instance of any class in the project.
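One way to obtain such universal referencing in Fortran 2003, sketched below with illustrative names rather than the book's, is to give every class in the project a common abstract ancestor:

module object_module
  implicit none
  private
  public :: object

  ! Universal parent: every class in the project extends this type, so a
  ! class(object) dummy argument or pointer can refer to any project object.
  type, abstract :: object
  end type object
end module object_module

module field_module
  use object_module, only : object
  implicit none
  private
  public :: field

  type, extends(object) :: field    ! one of arbitrarily many extensions
    real, allocatable :: values(:)
  end type field
end module field_module

A procedure whose dummy argument is declared class(object) can then accept a field, or an instance of any other extension, without knowing its concrete identity.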
In HPC, communicating efficiently between local memories on distributed processors represents one of the most challenging problems. One might desire to ensure consistency in communication practices across the project. In these contexts, two broad requirements drive the desire to impose a degree of uniformity across a design: one stemming from a need for consistent functionality, and the other stemming from a need for consistent referencing.
Opposing these forces is the desire to avoid overconstraining the design. In the worst-case scenario, imposing too much uniformity stifles creativity and freezes counterproductive elements into the design.
This book is about software design. We use “design” here in the sense that it applies to machines, electrical circuits, chemical plants, and buildings. At least two differences between designing scientific software and designing these other systems seem apparent:
The extent to which one communicates a system's structure by representing schematically its component parts and their interrelationships,
The extent to which such schematics can be analyzed to evaluate suitability and prevent failures.
Schematic representations greatly simplify communications between developers. Analyses of these schematics can potentially settle long-standing debates about which systems will wear well over time as user requirements evolve and usage varies.
This book does not chiefly concern high-performance computing. While most current discussions of scientific programming focus on scalable performance, we unabashedly set out along the nearly orthogonal axis of scalable design. We analyze how the structure of a package determines its developmental complexity according to such measures as bug search times and documentation information content. We also present arguments for why these issues impact solution cost and time more than does scalable performance.
We firmly believe that science is not a horse race. The greatest scientific impact does not necessarily go to the swiftest code. Groundbreaking results often occur at the intersection of multiple disciplines, where the basic science is so poorly understood that important insights can be gleaned from creative simulations of modest size.
“Software abstractions should resemble blackboard abstractions.”
Kevin Long
Abstract Data Type Calculus
A desire for code reuse motivated most of Chapter 2. Using encapsulation and information hiding, we wrapped legacy, structured programs in an object-oriented superstructure without exposing the legacy interfaces. We used aggregation and composition to incorporate the resulting classes into components inside new classes. We also employed inheritance to erect class hierarchies. Finally, we employed dynamic polymorphism to enable child instances to respond to the type-bound procedure invocations written for their parents.
Much of the code in Chapters 1–2 proved amenable to reuse in modeling heat conduction, but few of the object-oriented abstractions presented would find any use in nonthermal calculations. Even to the extent that the heat equation models other phenomena, such as Fickian diffusion, calling a procedure named heat_for() to solve the diffusion equation would obfuscate its purpose. This problem could be addressed with window dressing – creating a diffusion-oriented interface that delegates all procedure invocations to the conductor class. Nonetheless, neither the original conductor class nor its diffusion counterpart would likely prove useful in any simulation that solves a governing equation that is not formally equivalent to the heat equation. This raises the question: “When reusing codes, what classes might we best construct from them?”
Most developers would agree on the benefits of breaking a large problem into smaller problems, but choosing those smaller problems poses a quandary without a unique solution.
“Believe those who are seeking the truth. Doubt those who find it.”
Andre Gide
Nomenclature
Chapter 1 introduced the main pillars of object-orientation: encapsulation, information hiding, polymorphism, and inheritance. The current chapter provides more detailed definitions and demonstrations of these concepts in Fortran 2003 along with a complexity analysis. As noted in the preface, we primarily target three audiences: Fortran 95 programmers unfamiliar with OOP, Fortran 95 programmers who emulate OOP, and C++ programmers familiar with OOP. In reading this chapter, the first audience will learn the basic OOP concepts in a familiar syntax. The second audience will find that many of the emulation techniques that have been suggested in the literature can be converted quite easily to employ the intrinsic OOP constructs of Fortran 2003. The third audience will benefit from the exposure to OOP in a language other than their native tongue. All will hopefully benefit from the complexity analysis at the end of the chapter.
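For readers who want a first taste of the syntax before the detailed treatment, the hedged sketch below (our illustrative names, not the chapter's examples) touches each pillar: a private component for encapsulation and information hiding, extends for inheritance, and an overridden type-bound procedure for polymorphism.

module speaker_module
  implicit none
  private                               ! information hiding: only listed names are public
  public :: speaker, loud_speaker

  type :: speaker
    real, private :: volume = 1.0       ! encapsulated state
  contains
    procedure :: speak                  ! type-bound procedure
  end type speaker

  type, extends(speaker) :: loud_speaker    ! inheritance
  contains
    procedure :: speak => shout             ! polymorphic override
  end type loud_speaker

contains

  subroutine speak(this)
    class(speaker), intent(in) :: this
    print *, 'hello'
  end subroutine speak

  subroutine shout(this)
    class(loud_speaker), intent(in) :: this
    print *, 'HELLO'
  end subroutine shout

end module speaker_module

A call such as s%speak() on a class(speaker) variable dispatches at run time to whichever implementation matches the dynamic type, which is the dynamic polymorphism referred to above.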
We hope that using the newest features of Fortran gives this book lasting value. Operating at the bleeding edge, however, presents some short-term limitations. As of January 2011, only two compilers implemented the full Fortran 2003 standard:
The IBM XL Fortran compiler,
The Cray Compiler Environment.
However, it appears that the Numerical Algorithms Group (NAG), Gnu Fortran (gfortran), and Intel compilers are advancing rapidly enough that they are likely to have the features required to compile the code in this book by the time of publication (Chivers and Sleightholme 2010).
“However beautiful the strategy, you should occasionally look at the results.”
Winston Churchill
The Problem
This chapter introduces the GoF strategy pattern along with a Fortran-specific enabling pattern: surrogate. In scientific programming, a context for the strategy pattern arises when the choice of numerical algorithms must evolve dynamically. Multiphysics modeling, for example, typically implies multi-numerics modeling. As the physics changes, so must the numerical methods.
A problem arises when the software does not separate its expression of the physics from its expression of the discrete algorithms. In the Lorenz system, for example, equation (4.5) expresses the physics, whereas equation (4.6) expresses the discretization. Consider a concrete ADT that solves the same equations as the abstractions presented in Section 6.2 and extends the integrand ADT but requires a new time integration algorithm. The extended type must overload the integrate name. This is a simple task when only one algorithm in one particular ADT changes. When numerous algorithms exist, however, one faces the dilemma of either putting all possible algorithms in the parent type or leaving each extended type to implement its own algorithms. Section 7.3 explains the adverse impact the first option has on code maintainability. The second option could lead to redundant (and possibly inconsistent) implementations.
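The general shape of the GoF strategy in Fortran 2003 appears in the hedged sketch below; the type names, the simplified interface, and the placeholder right-hand side are ours, and the book's own solution, which builds on the surrogate pattern introduced in this chapter, differs in detail. The point is only that the time integration algorithm lives in its own extensible hierarchy rather than inside the physics abstraction.

module strategy_module
  implicit none
  private
  public :: strategy

  ! Abstract time-integration strategy: concrete schemes extend this type
  ! and implement the deferred 'integrate' binding.
  type, abstract :: strategy
  contains
    procedure(integrate_interface), deferred :: integrate
  end type strategy

  abstract interface
    subroutine integrate_interface(this, state, dt)
      import :: strategy
      class(strategy), intent(in)    :: this
      real,            intent(inout) :: state(:)
      real,            intent(in)    :: dt
    end subroutine integrate_interface
  end interface
end module strategy_module

module explicit_euler_module
  use strategy_module, only : strategy
  implicit none
  private
  public :: explicit_euler

  type, extends(strategy) :: explicit_euler
  contains
    procedure :: integrate     ! overrides the deferred binding
  end type explicit_euler
contains
  subroutine integrate(this, state, dt)
    class(explicit_euler), intent(in)    :: this
    real,                  intent(inout) :: state(:)
    real,                  intent(in)    :: dt
    state = state + dt*state   ! placeholder right-hand side; a real scheme would
                               ! obtain the time derivative from the integrand
  end subroutine integrate
end module explicit_euler_module

module physics_context_module
  use strategy_module, only : strategy
  implicit none
  private
  public :: physics_context

  ! Context: the physics abstraction owns a pluggable strategy and delegates
  ! to it, so swapping algorithms never touches the physics itself.
  type :: physics_context
    class(strategy), allocatable :: quadrature
    real,            allocatable :: state(:)
  contains
    procedure :: advance
  end type physics_context
contains
  subroutine advance(this, dt)
    class(physics_context), intent(inout) :: this
    real,                   intent(in)    :: dt
    call this%quadrature%integrate(this%state, dt)
  end subroutine advance
end module physics_context_module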
The Lorenz equation solver described in Chapter 6 uses explicit Euler time advancement. That solver updates the solution vector at each time step without explicitly storing a corresponding time coordinate.
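For concreteness, a standalone, hedged sketch of explicit Euler applied to the Lorenz equations follows; the parameter values and step count are illustrative and need not match the Chapter 6 solver, and, as noted above, no time coordinate is stored alongside the solution vector.

program lorenz_euler_sketch
  implicit none
  real, parameter :: sigma = 10., rho = 28., beta = 8./3.   ! classical Lorenz parameters
  real, parameter :: dt = 0.01                              ! time step
  real    :: v(3) = [1., 1., 1.]    ! solution vector (x, y, z); no time coordinate kept
  integer :: step

  do step = 1, 1000
    v = v + dt*d_dt(v)              ! explicit Euler update
  end do
  print *, 'final state = ', v

contains

  pure function d_dt(v) result(dv)
    real, intent(in) :: v(3)
    real :: dv(3)
    dv = [ sigma*(v(2) - v(1)),       &   ! dx/dt
           v(1)*(rho - v(3)) - v(2),  &   ! dy/dt
           v(1)*v(2) - beta*v(3) ]        ! dz/dt
  end function d_dt

end program lorenz_euler_sketch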