The preceding papers have been sufficiently diverse to expose (at least) two apparent dichotomies in quantitative epidemiology. These concern the kinds of models needed to solve epidemiological problems: simple versus complex models, and dynamic versus statistical models. A minimum of three papers is needed to highlight the issues: this session has done it with just four.
Those who prefer to begin with complex (Habbema) or simple (Agur, Medley) dynamic models take different views of how many parameters and variables are needed to make these models useful. Crudely, the former hold that model utility is a monotonically increasing function of the number of parameters and variables, whilst the latter believe that the slope of this graph will usually be negative. The two schools have different approaches because they have different aims. The view of the complex modellers is epitomized by the statement (Habbema et al. 1992):
Ideally, epidemiological modelling should serve as a scientific framework for studying many aspects of disease control: choice of control policy, prediction, planning, operational decision making, data analysis, evaluation and surveillance.
The effort to deal with ‘many aspects’ is characteristic, but so too is the attempt to construct a model which is sufficiently detailed to predict the absolute (Habbema et al. 1992):
14 years of full-scale vector control will be sufficient to reduce the risk of [onchocerciasis] recrudescence to less than 1% even in the most afflicted areas.
It is clear, however, that the more robust policy statements will emerge when detailed models of this kind are used in a comparative way – to choose the best among available options.
These papers deal with different aspects of understanding how hosts cope with the diversity of antigenic challenges that pathogens provide. These comments examine common threads underlying three of the presentations and describe some recent work on the more general problem of the diversity of pathogens that any host population can sustain.
The work of Martin Nowak and his colleagues at Oxford, Imperial College and Amsterdam University epitomizes the challenges that mathematical models present to empirical epidemiologists. As with other recent work on the mathematics of the immune system and infection with HIV (McLean 1993), the work suggests alternative interpretations of epidemiological data and has stimulated the collection (and analysis) of data not normally collected by immunologists and clinicians. At the heart of the Nowak model is the interaction between the diversity of HIV quasi-species in individual patients and the ability of the host to produce a sufficient diversity of antigens to cope with this. At a crucial level of quasi-species diversity, the diversity threshold, the immune system is overwhelmed, CD4 counts decline precipitously and the patient succumbs to full-blown AIDS. The length of time until this occurs is dependent upon a variety of factors, but most importantly upon the replication and mutation rate of the virus, and upon the host's ability to mount an efficient and diverse immune response. My main questions are about this diversity threshold; the diversity levels presented in the model seem much higher than the diversity levels observed in individually infected HIV/AIDS patients.
Models for vector-transmitted infections are always based on several homogeneity assumptions, even when some aspects of heterogeneity are incorporated in the model. Usually most of these assumptions are not stated explicitly, and among these is the implicit assumption that susceptible and infective hosts are bitten homogeneously by insect vectors. Some experiments and a field study documented in the literature (Baylis and Nambiro (1993) and references therein) seem to indicate the possibility that vectors have a feeding preference for infected hosts. On the other hand, if infective human hosts are especially protected against mosquito bites, for example through the use of bed nets, susceptible hosts could on average be bitten more frequently.
The model
In order to investigate effects when biting of insect vectors is non-homogeneous between susceptible and infective hosts, we consider a generalization of the Ross malaria model (which is a type of model applicable not only to malaria transmission), allowing both for increased or decreased attractiveness of infective hosts, but disregarding other kinds of heterogeneity which have e.g. been dealt with by Hasibeder and Dye (1988). We characterize the degree of heterogeneity between susceptible and infective hosts by one single parameter and, unlike the analysis in Kingsolver (1987), we start at the level of individuals:
Let π1(X) denote the proportion of adequate contacts (bites or blood meals facilitating transmission) of infective vectors which they have with susceptible hosts, and π2(X) the proportion of adequate contacts of uninfected vectors which they have with infective hosts.
The importance of CD4 T-cells in AIDS and HIV infection has long been recognized. Measurements of the CD4 number, obtained from the peripheral blood, give an indication of the amount of immune suppression, with lower numbers indicating more severe immune deficiency. They have been shown to be of great prognostic significance for predicting clinical outcomes (Fahey et al. 1990); they are useful in patient care for monitoring an individual's health; they are used in epidemiological studies and in some countries they are used in determining the availability of health care resources and even are incorporated into the definition of AIDS. CD4 T-cell numbers are also used in determining the eligibility criteria and as stratification variables in randomised clinical trials.
In a previous paper (Taylor et al. 1994) we considered various statistical models which attempted to describe the variation in the patterns of decline of CD4 T-cell numbers in HIV infected subjects. These models were fitted to data from the Los Angeles portion of the Multicenter AIDS cohort study (MACS). One of the aims in this analysis is to investigate whether individuals maintain a fixed rate of decline of CD4 after allowing for the variability of the measurements, that is whether a subject who is following a certain path in their CD4 measurements will remain on that path in the future. One possibility is that individuals do maintain a fixed slope indefinitely, another possibility is that the future slope is unrelated to the past slope. We develop a family of models in which these two scenarios are special cases.
Classifications for identifying AIDS cases can take many forms depending often on the use for which the data were assembled. These can include geographical, gender, behavioural, racial and risk factor classifications, with or without further subgroupings within these broader classes. Of interest herein is the modelling of the number of cases over time by traditional autoregressive moving average time series models for purposes of short term forecasting. One question then to be answered is whether or not some or all of these classifications can be grouped homogeneously.
Our attention is focussed on reported AIDS cases for the United States as reported by the Centers for Disease Control (CDC 1992), using those cases meeting the CDC definition of AIDS. The observed data values refer to the month and year in which the AIDS disease was first diagnosed. Cases diagnosed before 1982 have been recorded as cumulative totals through December 1981. Cases diagnosed from January 1982 through June 1991 are recorded as the number of cases in a given month. In this study, patients are classified according to specific CDC classifications, viz., homosexual males, bisexual males, heterosexual males, intravenous (IV) drug use and male homosexual/bisexual contact, IV drug use (female and heterosexual males), haemophilia/coagulation disorder, recipient of transfusion of blood products or tissue, white males, black males, hispanic males, total males, white females, black females, hispanic females, and total females. Thus, the aim of the analysis is to consider which of these patient classifications can be identified by a common time series model. To achieve this, a time series model is fitted to each classification. Then, groups of these classifications are proposed.
A major activity of both academic and governmental epidemiologists involves ascertaining the environmental contaminations, personal behaviors, biological factors, and other risk factors whose control will lead to the control of disease risks. Almost always the data analytic models used in this task are consistent with linear models of the causal process leading to disease. One of their basic assumptions is thus that the outcomes in one study subject are independent of the outcomes in other study subjects.
Infection transmission between humans is inconsistent with these linear model forms. Some particular consequences of this inconsistency explain why epidemiologists have been so unsuccessful in defining the modes of transmission of many infectious agents. Model inconsistencies also explain the poor performance of epidemiological methods in determining the relative importance of different factors contributing to infection risk at both an individual and a population level. It will be demonstrated how most of the effect of risk factors which increase transmission risk will not be detected by the usual analytic methods of epidemiology. This inadequacy of standard methods occurs even when contact patterns are random. It will be further demonstrated how non-random contact patterns can create additional difficulties for the detection of secondary risk factors.
Standard analytic models in epidemiology assume that causal actions occur directly on individuals. Thus their basic parameters are unlike those in transmission models which relate to interactions between individuals. The paradigm jump from cause acting directly on individuals to cause acting on interactions between individuals is a big one for epidemiologists. The paradigm jump would be facilitated if a data collection and analysis framework existed based on transmission models. Some approaches to developing that framework will be discussed.
The spread of many infectious diseases occurs in a diverse population, so that it is desirable to include heterogeneity in the formulation of the epidemiological model in order to improve its predictive and explanatory power and its applicability. Often the heterogeneous population is divided into subpopulations or groups, each of which is homogeneous in the sense that the group members have similar characteristics. This division into groups can be based not only on disease-related factors such as mode of transmission, latent period, infectious period, genetic susceptibility or resistance, and amount of vaccination or chemotherapy, but also on social, cultural, economic, demographic or geographic factors. For example, the mixing behavior may depend on the age of the individuals. If any of the epidemiological characteristics are gender dependent, then groups of men and women may be necessary.
The transmission of sexually transmitted diseases (STDs) often occurs in a very heterogeneous population. People with many different sexual partners have many more opportunities to be infected and to infect others than people who have fewer partners. Thus for STDs it is often necessary to divide the population on the basis of the amount of sexual activity. Frequently, the epidemiological characteristics of STDs are different for men and women. For example, the probability of transmission per partner of gonorrhea from male to female is greater than that from female to male. Moreover, the fraction of women with gonorrhea who are asymptomatic is larger than the fraction for men (Hethcote and Yorke 1984).
There is a vast and well-developed theory for the spread of infectious diseases in randomly mixing populations and in populations structured by age. In general we have the following situation. There is an uninfected stationary state. For certain parameter specifications, i.e. in the case of low transmission rate, fast recovery, or high mortality, this uninfected state is stable. If, say, the transmission rate is increased then there is a bifurcation, the uninfected state loses its stability, and a stable infected state comes into existence. The conditions for stability of the uninfected situation or for the bifurcation, respectively, are usually formulated as threshold conditions. In simple cases the threshold condition is given in such a way that certain quantities describing transmission of the disease, multiplication of parasites etc. exceed other terms describing recovery, death of infected etc. In many cases the quotient of these terms is formed, and the threshold condition assumes the form that a certain quantity is compared to the number one. This quantity has often been interpreted as the basic reproduction number R0, i.e. as the number of cases that will be produced by one ‘typical’ infected individual in a totally susceptible population during its infected period. From this interpretation it is obvious that the condition for stability should be the inequality R0 < 1. It should be underlined that this interpretation of R0 is not a definition as long as the notion of a typical individual or the underlying averaging procedure has not been specified. Until recently this had been done only for some simple cases of randomly mixing populations.
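The quotient form of the threshold condition can be illustrated with a minimal sketch. It assumes the simplest randomly mixing case, in which transmission is described by a single rate and removal by recovery plus mortality; the names `beta`, `gamma` and `mu` are illustrative assumptions and do not come from the chapter.

```python
def basic_reproduction_number(beta, gamma, mu):
    """Quotient of the transmission term and the removal terms.

    Assumed (hypothetical) parameters for a simple randomly mixing model:
      beta  - transmission rate per infected individual
      gamma - recovery rate
      mu    - mortality rate of infected individuals
    """
    removal = gamma + mu
    if removal <= 0:
        raise ValueError("recovery plus mortality rate must be positive")
    return beta / removal


def uninfected_state_is_stable(beta, gamma, mu):
    """Threshold condition: the uninfected state is stable when R0 < 1."""
    return basic_reproduction_number(beta, gamma, mu) < 1.0
```

In this sketch a low transmission rate, fast recovery or high mortality each lowers the quotient, which matches the qualitative statement above. It does not address the averaging procedure needed to define a typical individual in a structured population.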
Cervical cancer is the second commonest female cancer worldwide. Over the last few years, the evidence that sexually transmitted human papillomavirus (HPV) infection is involved in the development of most cases of cervical cancer has become virtually conclusive (Howley 1991, Schiffman 1992). HPV is therefore among the most important targets for practical cancer prevention. Screening for HPV is cheap and reliable, and animal studies suggest that vaccination may prove effective both in curing established infection and in preventing re-infection (Campo et al. 1993).
Against this background, formal modelling of the transmission and persistence of HPV infection would be a useful next step towards understanding the epidemic. The data required for preliminary simple models are beginning to emerge from case-control and prospective studies, but it is already clear that the natural history of infection is complex and heterogeneous. Statistical models of the natural history of various chronic diseases are reviewed by Gore (this volume). As for HIV, HPV susceptibility and infectiousness may be significantly influenced by other sexually transmitted infections as well as by genetic and immunological host factors.
The aim of this paper is to summarise the evidence that HPV is the cause of most cervical cancers, and to indicate what data should now be collected to elucidate its transmission dynamics. There have been rapid advances over the last few years in HPV diagnostic methods and our understanding of the relationship between HPV, dysplasia and malignancy. Much of the material summarised below is described in the recent IARC Scientific Report entitled The Epidemiology of HPV and Cervical Cancer (IARC 1992).
There has been considerable recent interest in expanding traditional mass action models for disease transmission to include selectivity and clustering in the contact process. One approach has been to stratify the population according to one or more population characteristics and then to model the effect of these characteristics on contact patterns. The contact rate between two members with known attributes is described by a mixing matrix or kernel function. The usual approach is to consider a parameterized family of mixing matrices and ask how important epidemiological outcomes are affected by these parameters.
In contrast, we have taken the contact network as the primary unit of observation. Networks are modeled as weighted graphs (static and undirected in the studies reported here). Because a complete description of a network entails a very large amount of information, our goal was to find summary statistics that were effective predictors of the speed at which a disease would propagate through the network. The approach was to generate random networks, compute summary statistics, simulate a disease spreading through the network, and then examine the relationship between the statistics and epidemiologically significant outcomes. These simulation studies are preliminary, indicating the direction of ongoing research, and ask more questions than they answer.
Networks were generated using two different probability models, producing clustering by different mechanisms. The first model assumes that spatial proximity is a major consideration in network formation. Each individual in the population is assigned a random location in a square region and is assigned a circular territory of radius r in which it seeks contacts.
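A minimal sketch of the spatial mechanism might look as follows. The linking rule is an assumption: the text says only that each individual seeks contacts within its territory, so here two individuals are joined whenever one lies within distance `r` of the other. The sketch is unweighted, and the function names are hypothetical.

```python
import math
import random


def spatial_network(n, r, seed=None):
    """Place n individuals uniformly at random in the unit square and
    link every pair whose distance is at most r (assumed linking rule)."""
    rng = random.Random(seed)
    positions = [(rng.random(), rng.random()) for _ in range(n)]
    neighbours = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) <= r:
                # Undirected network: record the link in both directions.
                neighbours[i].add(j)
                neighbours[j].add(i)
    return positions, neighbours


def mean_degree(neighbours):
    """One simple summary statistic: average number of contacts."""
    return sum(len(v) for v in neighbours.values()) / len(neighbours)
```

Increasing `r` raises the mean degree, and links concentrate among nearby individuals, which is how this sketch produces clustering.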
Partner studies of heterosexual transmission of HIV have observed transmissions after relatively few sexual contacts and couples who have remained discordant, with respect to HIV, whilst considered to be at high risk over prolonged periods, suggesting huge variation between individuals in whether a contact seroconverts. This paper is based on a longitudinal partner study which aims to identify behavioural and biological factors which influence heterosexual transmission of HIV. The index case (the first infected) is defined as a patient who is HIV positive whilst the contact partner is a person of the opposite sex who has had a sexual relationship with the index case. In Edinburgh, from October 1987 to the beginning of June 1992, one hundred and twenty couples have been recruited where the contact's risk of infection is only through heterosexual intercourse with his/her index case. At recruitment, 24 couples (20%) were concordant with respect to HIV and since recruitment one contact has seroconverted. At the initial interview the contact is asked about her/his past sexual practices and contraceptive use and counselled about safer sex. Follow-up interviews of negative contacts take place to reassess these behavioural data and their HIV status. Biological data on the index is also available as the majority of cases are in clinical care.
Three factors which might influence heterosexual transmission of HIV are to be assessed:
behavioural aspects of the couple,
infectivity of the index,
susceptibility of contact.
Behavioural data are required to confirm that the virus has an ‘opportunity’ to transmit. Virological and immunological factors form the basis for assessing the infectivity of the index and susceptibility of the contact.
Conventional deterministic models of infection spread through populations aggregate individuals into compartments and study the dynamics of the resulting simplified system (Anderson and May 1991, Hethcote and Van Ark 1992). In this paper we explore whether knowledge of contact networks at an individual level can add to our epidemiological understanding in the particular setting of STDs. In the case of STDs the limited number and well defined nature of sexual contacts between people allows the description of the networks along which an STD can spread (Klovdahl et al. 1992, 1994). To this end a simple model describing the sexual behaviour of individuals is developed which generates sexual partner networks. The spread of a sexually transmitted disease (STD) through the population is simulated, and the characteristics of the network are related to the resultant spread of the STD. The model constructed contains many assumptions about the mechanisms controlling the sexual partnership formation behaviour, which are varied to generate a large range of possible networks. A central aim of this work is the development of the model as a tool to assist in the analysis of behavioural data. From simulations the parameters which are most influential in STD epidemiology can be identified. Samples can be taken from this network in a way which mirrors methods of sampling used in behavioural research.
The model
Individuals within the population, which can be varied in size, are treated as discrete entities with particular characteristics related to their sexual behaviour. Currently these include: sex; desired number of sexual partners per unit of time; desired duration of sexual partnerships; and a preference function for choosing sexual partners on the basis of their desired number of partners.
In small communities there are usually only few infectious individuals. If they contact too few susceptibles, this might lead to local extinction. On the other hand, if they contact too many susceptibles, they give rise to too many secondary infections. This reduces the number of susceptibles, which may eventually also lead to extinction. In order for long term persistence of the infection to be likely, the population must exceed a minimum size. If there is a long infectious period (e.g. for leprosy, tuberculosis and HIV it lasts for years), the infection can persist even in small populations. High contact rates cause a better ‘exploitation’ of the population, but they also bear the risk of causing large epidemics which in turn can cause local extinction. Human birth and death rates define the population turnover and therefore also influence the persistence of infectious diseases. It is the aim of this study to determine the minimum population size that is necessary for the persistence of polio virus infection by using stochastic simulations.
Methods
The computer models are stochastic. The sequence of epidemiological events is generated by a Markov process. The type of the event (birth, death of a susceptible, infection, loss of infectivity) is assigned according to a multinomial distribution which depends on the state of the population (number of susceptible and infectious individuals; see Appendix for details). If the event is a birth, the number of susceptibles is increased by one. If it is an infection, the number of susceptibles is decreased by one and the number of infectives is increased by one.
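The event scheme described above can be sketched as follows. The chapter's Appendix is not reproduced here, so the rate expressions, the exponential waiting times and all parameter names (`births`, `mu`, `beta`, `gamma`) are assumptions made for illustration only.

```python
import random


def simulate(S, I, births, mu, beta, gamma, t_max, seed=None):
    """Event-driven stochastic simulation; returns (time, S, I).

    Assumed rates (hypothetical, not taken from the Appendix):
      birth                births       (constant)
      death of susceptible mu * S
      infection            beta * S * I
      loss of infectivity  gamma * I
    """
    rng = random.Random(seed)
    t = 0.0
    while I > 0:
        rates = {
            "birth": births,
            "death_of_susceptible": mu * S,
            "infection": beta * S * I,
            "loss_of_infectivity": gamma * I,
        }
        total = sum(rates.values())
        if total <= 0:
            break  # no event can occur any more
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_max:
            return t_max, S, I  # infection persisted until t_max
        # The event type is drawn with probability proportional to its
        # rate, which depends on the current state of the population.
        event = rng.choices(list(rates), weights=list(rates.values()))[0]
        if event == "birth":
            S += 1
        elif event == "death_of_susceptible":
            S -= 1
        elif event == "infection":
            S -= 1
            I += 1
        else:  # loss of infectivity
            I -= 1
    return t, S, I
```

Repeating such runs for different initial population sizes and recording how often `I` is still positive at `t_max` would give a rough estimate of the persistence probability.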
There was a fascinating dichotomy presented this morning by the papers of Nowak and Taylor. This dichotomy has been given several names during this meeting, and my favourite is the distinction between thought experiments to understand the processes that generate observed patterns, and the analysis of real experimental data. These two papers are essentially addressing the same subject: the pattern of CD4 counts over time, and it appears to me that both approaches would benefit from consideration of the other. On one hand, Taylor explains much of the variability in the observed counts as being derived from an underlying stochastic process, whereas it may well be due to a highly non-linear process changing on a time-scale faster than the sampling interval. On the other hand, Nowak does not use his model to produce predictions of CD4 numbers which may actually be testable by comparison with such data.
There is a general problem here with the use of deterministic models, i.e. those that produce a single value or set of single value results for each time point without any measure of variability. Differential equations are an invaluable tool for mathematical descriptions of disease processes, but suffer from the fact that data-derived estimates are required for the processes embedded in the equation system, for example density dependent transmission. There are methods available for fitting equations directly to observations of the system over time, but these tend to regard the variability in data as some form of random error, and the fitting involves simple reduction of the average difference between observation and model.
The incubation period of AIDS is a key characteristic in understanding the HIV and AIDS epidemic, both clinically and epidemiologically. The incubation period distribution (IPD) provides information about the probability of progression to AIDS as it changes with time since infection with HIV. The IPD also provides the link between the HIV infection rate and the occurrence of AIDS cases over time, and is an essential feature in back calculation procedures (Brookmeyer and Gail 1988). Knowledge of the IPD creates the opportunity to make more reliable projections (Hendriks et al. 1992), which are necessary for health-care planning.
Cohort study data relating to development of AIDS are inevitably incomplete in dates of seroconversion (infection) or development of AIDS, or both. This incompleteness has inspired a variety of approaches. We used a multiple imputation procedure, with four related models, each covering different assumptions, to investigate the sensitivity of the estimated IPD regarding the imputation method. The imputation procedure was used to provide the unobserved interval between seroconversion and enrolment for those individuals who were already HIV infected at enrolment. We can exclude observations relating to individuals who received antiviral and/or prophylactic treatment designed to delay the onset of AIDS. The results obtained from data such as these will be valuable in future to aid the understanding of the effects of new therapies on the evolution of the AIDS epidemic.
The IPD was estimated using data available at February 1990 from all homosexual and bisexual men with HIV seropositive blood samples (n = 348; aged 25–45 years), who were part of a larger cohort study in Amsterdam.
The lifespan of T lymphocytes is of particular interest because of their central role in immunological memory. Is the recall of a vaccination or early infection, which may be demonstrated clinically up to 50 years after antigen exposure, retained by a long-lived cell, or its progeny? Using the observation that T lymphocyte expression of isoforms of CD45 corresponds with their ability to respond to recall antigens, we have investigated the lifespan of both CD45RO (the subset containing responders, or ‘memory’ cells) and CD45RA (the unresponsive, or ‘naive’ subset) lymphocytes in a group of patients after radiotherapy (Michie et al. 1992). We have found a rapid loss of unstable chromosomes (which result in cell death in mitosis) from the CD45RO but not the CD45RA pool. Immunological memory therefore apparently resides in a population with a more rapid rate of division. The survival curves for the two populations are best described by a model in which there is also reversion in vivo from the CD45RO to the CD45RA phenotype. Expression of CD45RO in T cells may therefore be reversible. Further data showing survival curves of T lymphocytes with stable radiation damage (passed to one daughter cell during mitosis) is also considered. These curves show very little loss of such cells. The difference between the two populations (stable and unstable damage) allows an estimate of their proliferation rates and death rates. These parameter estimates may be of interest to people modelling the dynamics of the immune response as they give some rough indicators of the timescales on which T lymphocytes turn over (McLean and Michie 1993).