
“On Contemporary Mortality Models for Actuarial Use I: practice and II: principles” by Professor Angus Macdonald and Dr Stephen Richards

Published online by Cambridge University Press:  02 September 2025


Type
Sessional Meeting Discussion
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Institute and Faculty of Actuaries, 2025. Published by Cambridge University Press on behalf of The Institute and Faculty of Actuaries

The Moderator (Mr K. Kaufhold): I would like to welcome you to today’s sessional meeting of the IFoA on “Contemporary Mortality Models for Actuarial Use.”

My name is Kai Kaufhold. I am a German actuary and a guest moderator. It is my honour to host this session with Professor Angus Macdonald and Dr Stephen Richards. Angus (Macdonald) is Professor Emeritus of Heriot-Watt University in Edinburgh, and Stephen Richards, also from Edinburgh, runs his own consulting and software firm, Longevitas. My reason for being your moderator is that Stephen (Richards) and I have known each other for more than 15 years. I am an avid user of his contemporary mortality models. Besides providing a very elegant solution to the problem of graduating mortality tables, the main attraction for me is that they reduce the amount of work the actuary must do, because they are much simpler than many other models.

Our first speaker, Angus Macdonald, was Stephen (Richards)’s PhD supervisor. Essentially, what you are seeing today is three generations of users of these models.

Prof A. S. Macdonald, F.F.A. (introducing the paper): Thank you very much, Kai (Kaufhold). It is my pleasure to begin our two presentations on contemporary mortality models.

What do we mean by continuous-time mortality models? We can try to begin answering that with discrete-time models. Discrete-time models are quite complicated. When we break them down to their elements, we arrive at what we call a Bernoulli “atom,” the simplest possible thing in all of probability. Having broken things down, we then build them up again into likelihoods. The tool with which we do that is the product integral. I will finish with examples that show how this applies to data.

A good place to start is the classic paper by Forfar, McCutcheon and Wilkie, “On Graduation by Mathematical Formulae” (Forfar et al., 1988). That paper introduced two types of models – q-type models that are parameterised by the one-year probability of death, $q_x$, and µ-type models that are parameterised by the hazard rate or the force of mortality, which we conventionally denote by $\mu_x$. For all practical purposes, the central rate of mortality, $m_x$, can be regarded as the same as $\mu_x$.

The q-type models lead naturally to the binomial distribution and the µ-type models to the Poisson distribution, but both have their flaws. The binomial models assume that we have a number of people exposed for a whole year, but very often that is not what we observe, because in reality, there are people joining and leaving during the year. This means that the observations clearly do not lead directly to a binomial distribution. There are also problems with the Poisson distribution. We usually have a limited number of individuals in the study and that means that the probability of observing more deaths than we have participants must be zero. So that cannot be a Poisson distribution because that gives positive probability to any positive number of deaths.

Fixing the Binomial model tends to lead us into further and more difficult problems, whereas when we try to fix the Poisson model, we find something rather enlightening. The framework for this is the family tree shown in Figure 1.

Figure 1. Family tree for the Bernoulli trial.

It is a partial picture, because it shows only some branches of a larger tree, but it is based on the Bernoulli trial as the very simplest concept in all of probability. It has two main branches. At the bottom, we have the discrete branch, the q-type models, based on the Bernoulli distribution with probability $q_x$. When we add everything up from the individual to the collective, we end up with a binomial distribution.

The upper main branch is the Bernoulli trial with the instantaneous probability $\mu_x\,dx$, and that branches out in several directions, which we will come to later. These are the branches we are interested in, and we will describe them as continuous-time models.

To deal with the lower (discrete-time) branch of that family tree first, we suppose we have a finite number of lives. For each life, we have an indicator of whether that life has died or survived – the random variable $d_i$, equal to one if the $i$th life dies and zero if the life survives.

Out of that we can create the likelihood of all the observations, which is of a familiar binomial form shown in Equation (7) in Macdonald and Richards (2025).

However, this is not as simple as it looks. The discrete-time model is complicated. For example, the probability of surviving for one year is a highly composite event. To survive one year, we first have to survive half a year. To survive half a year, we first must survive a quarter of a year, and so on. You will recognise the Zeno paradox here, as shown below:
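In symbols, the decomposition runs as follows (an illustrative reconstruction of the displayed identity in standard actuarial notation, rather than the paper's exact equation):

$${}_1p_x = {}_{\frac{1}{2}}p_x \times {}_{\frac{1}{2}}p_{x+\frac{1}{2}} = {}_{\frac{1}{4}}p_x \times {}_{\frac{1}{4}}p_{x+\frac{1}{4}} \times {}_{\frac{1}{2}}p_{x+\frac{1}{2}} = \cdots$$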

In fact, the event of surviving for one year is not simple but is infinitely composite, which we can express by saying survival happens from moment to moment. If we look at the death probabilities, their representations can be even more complicated.

The discrete-time model therefore leads us to consider complicated events. We want to break them down and look at simpler events, which are based on what we call the Bernoulli atom or the infinitesimal Bernoulli trial. For that we go back to the fundamental idea behind the hazard rate or the force of mortality, which is that the probability of dying during a very small time $dt$, conditional on being alive at some time $t$, is the hazard rate $\mu_t$ multiplied by $dt$ plus some small $o(dt)$ term that will disappear, leaving us with approximately $\mu_t\,dt$.

We redefine the notation for convenience. The death indicator $d_i$ we redefine as $dN_i(t)$, which translates as the change in the number of lives that have been observed to die by time $t$. It is equal to one if a death is so observed and zero otherwise, as in Section 4.3 of Macdonald and Richards (2025).

We can write down the likelihood of the infinitesimal Bernoulli trial as the survival probability if we do not observe a death and the death probability if we do observe a death as follows:

$$P\left[\text{Observation in } dt\right] = \left(1 - \mu_t\,dt\right)^{1 - \Delta N_i(t)}\left(\mu_t\,dt\right)^{\Delta N_i(t)}.$$

Having broken down the likelihood of the complicated or composite event into its simplest atoms, we now look at how we can build it back up again. The tool we use to do that is called the product integral.

We begin with the identity that expresses the limit of a product of terms that are very close to 1 and getting closer and closer to 1. It turns out that limit is the exponential:

$$\lim_{n \to \infty}\left(1 + \frac{1}{n}\right)^{n} = e \quad \text{or} \quad \lim_{n \to \infty}\left(1 - \frac{1}{n}\right)^{n} = e^{-1}.$$

The next step is to find a link between limits of products and the limits of sums or ordinary integrals, via the exponential function:

$$\prod_{s \in (0,\,t)}\left(1 + f(s)\,ds\right) = \exp\left(\int_0^t f(s)\,ds\right)$$

Applying this with the function $f(s)$ equal to the negative hazard rate, we get a product of survival probabilities over infinitesimal intervals that is more familiarly denoted by ${}_tp_x$:

$$\prod_{s \in (0,\,t]}\left(1 - \mu_{x+s}\,ds\right) = \exp\left(-\int_0^t \mu_{x+s}\,ds\right) = {}_tp_x.$$

To apply this we simply multiply the infinitesimal Bernoulli probabilities over the interval of observation, which is here written as the interval $(a_i, b_i]$ for the $i$th life. That is the product integral.
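As a quick numerical check of the identity above, the sketch below (a minimal illustration with a made-up Gompertz hazard, not code from the paper) compares a discretised product of one-step survival probabilities with the exponential of the integrated hazard:

```python
import numpy as np

# Illustrative Gompertz hazard with made-up parameters (not from the paper).
alpha, beta = -10.0, 0.1

def mu(age):
    return np.exp(alpha + beta * age)

x, t, n = 70.0, 10.0, 100_000        # survival from age 70 for 10 years, n sub-intervals
ds = t / n
ages = x + ds * np.arange(n)

# Discretised product integral: product of (1 - mu ds) over small steps.
product_integral = np.prod(1.0 - mu(ages) * ds)

# Closed form: exp(-integrated hazard), using the Gompertz integral.
integrated_hazard = (np.exp(alpha + beta * (x + t)) - np.exp(alpha + beta * x)) / beta
closed_form = np.exp(-integrated_hazard)

print(product_integral, closed_form)  # the two values agree to several decimal places
```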

Figure 2. The Binomial/Bernoulli family tree and the true Poisson model (FMW = Forfar, McCutcheon and Wilkie).

Conveniently, we can break this product-integral likelihood into two parts. We have the contribution from all the survival probabilities, that is, the exponential of minus the integrated hazard rate. Then we only really need to worry about the death probability at the end of the interval, because the survival part already accounts for the life being observed up to that point; that is, the death probability if we observed a death at time $b_i$. The product integral thus reduces to a Poisson-type likelihood. That is fundamentally where the Poisson-type model comes from.

To bring in data, we define a process $Y^i$ for the $i$th life, which is simply 1 if the life is alive and under observation just before time $t$, and 0 otherwise:

$$Y^i(t) = \begin{cases} 1 & \text{if alive and under observation at time } t^-, \\ 0 & \text{otherwise.} \end{cases}$$

If you multiply anything else by that process, then $Y^i$ acts as a switch. It switches the other item on or off, depending on whether the life is under observation or not at time $t$. We apply that to the hazard rate. The product of $Y^i$ times the hazard rate turns the ordinary hazard function into something stochastic and tailored to the life history of the $i$th life:

$$Y^i(t)\,\mu_t = \begin{cases} \mu_t & \text{if alive and under observation at time } t^-, \\ 0 & \text{otherwise.} \end{cases}$$

We plug that stochastic hazard rate into our Bernoulli likelihoods and take the product integral as we have before:

$$P\left[\text{Observation}_i \text{ in } dt\right] = \left(1 - Y^i(t)\,\mu_t\,dt\right)^{1 - \Delta N_i(t)}\left(Y^i(t)\,\mu_t\,dt\right)^{\Delta N_i(t)}.$$

Then product-integration takes us from the micro-scale of the Bernoulli atom to the universal Poisson-type likelihood:

$$\begin{aligned} L_i = P\left[\text{Observation}_i\right] &= \prod_{(0,\,\infty]}\left(1 - Y^i(t)\,\mu_t\,dt\right)^{1 - \Delta N_i(t)}\left(Y^i(t)\,\mu_t\,dt\right)^{\Delta N_i(t)} \\ &= \underbrace{\exp\left(-\int_0^\infty Y^i(t)\,\mu_t\,dt\right)}_{P[\text{Survival}]}\;\underbrace{\left(Y^i(b_i)\,\mu_{b_i}\,dt\right)^{\Delta N_i(b_i)}}_{P[\text{Death}]}. \end{aligned}$$

I will conclude by going through some examples. The simplest likelihood for $M$ lives is shown in Equation (35) in Macdonald and Richards (2025). It is the likelihood we would get from using individual lifetime data, otherwise called complete observed lifetimes or sometimes the survival model. That is the top branch of the upper part of our Bernoulli family tree, the complete lifetimes, shown in Figure 1.
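A minimal sketch of how a likelihood of this form might be maximised, assuming a Gompertz hazard $\mu_x = e^{\alpha + \beta x}$ and a few made-up records (entry age $a_i$, exit age $b_i$, death indicator $d_i$); it illustrates the general approach rather than the authors' implementation or Equation (35) verbatim:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical records: entry age a_i, exit age b_i, death indicator d_i.
a = np.array([65.0, 70.2, 61.5, 72.0, 68.3])
b = np.array([78.4, 75.0, 81.1, 72.9, 90.0])
d = np.array([1.0,  0.0,  1.0,  1.0,  0.0])   # 1 = death observed, 0 = right-censored

def neg_log_lik(theta):
    """Negative log-likelihood for a Gompertz hazard mu(x) = exp(alpha + beta*x)."""
    alpha, beta = theta
    # Integrated hazard over (a_i, b_i] has a closed form under Gompertz.
    H = (np.exp(alpha + beta * b) - np.exp(alpha + beta * a)) / beta
    return -(np.sum(d * (alpha + beta * b)) - np.sum(H))

fit = minimize(neg_log_lik, x0=np.array([-10.0, 0.1]), method="Nelder-Mead")
print("alpha, beta:", fit.x)
```

The log-likelihood maximised here is $\sum_i \big(d_i \log \mu_{b_i} - \int_{a_i}^{b_i}\mu_s\,ds\big)$, i.e. the survival and death parts described earlier.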

To split the observations into age groups, say individual years of age, we define a stochastic switch $Y_x^i(t)$ for each age interval from age $x$ to age $x+1$. It becomes the indicator of the life being alive and under observation on the age interval $x$ to $x+1$. We assume a constant hazard rate on each age interval. As well as multiplying over all the lives, we must also multiply over all the age intervals. The likelihood is shown in Equation (37) in Macdonald and Richards (2025).

Apart from assuming piecewise-constant hazard rates, this is the same likelihood as we had before. We call this the pseudo-Poisson model for grouped data because it “looks” like a Poisson model: it has a Poisson-type likelihood, but there is no Poisson random variable.
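Under the piecewise-constant assumption, the maximum-likelihood estimate at each age is simply deaths divided by central exposure, $\hat\mu_x = D_x/E^c_x$. A minimal sketch of that calculation from individual records (hypothetical data; the simple age-splitting loop is an illustration, not the paper's code):

```python
import numpy as np
from collections import defaultdict

# Hypothetical records: (entry age, exit age, death indicator).
records = [(65.0, 78.4, 1), (70.2, 75.0, 0), (61.5, 81.1, 1), (72.0, 72.9, 1)]

exposure = defaultdict(float)   # central exposed-to-risk at each integer age x
deaths = defaultdict(int)       # observed deaths at each integer age x

for a, b, d in records:
    for x in range(int(np.floor(a)), int(np.floor(b)) + 1):
        # Time spent by this life in the age interval [x, x + 1).
        exposure[x] += max(0.0, min(b, x + 1) - max(a, x))
    if d == 1:
        deaths[int(np.floor(b))] += 1

for x in sorted(exposure):
    if exposure[x] > 0:
        print(x, deaths[x] / exposure[x])    # hat{mu}_x = D_x / E_x
```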

This is the model actuaries are almost invariably using when they say they have age-grouped data and are assuming a Poisson random variable for the number of deaths. This is the middle branch of the upper part of the Bernoulli family tree shown in Figure 1.

For completeness, the third example is of a true Poisson model. It is the same as the last (age-grouped) model except that we constrain the observations so that the total exposure at each age is a predetermined constant. The likelihood is shown in Equation (40) in Macdonald and Richards (2025).

We never see this in practice. A hypothetical example would be to assume that everybody who dies is immediately replaced by an identical individual. This does not correspond to any real situation that actuaries would see, but it does result in true Poisson distributions for age-grouped data. It is the bottom branch of the upper part of our family tree shown in Figure 1.

That takes us back to where we started – the Poisson and the binomial, the two models dealt with in the classic paper by Forfar, McCutcheon and Wilkie (Forfar et al., 1988), as illustrated in Figure 2.

Building on this, we have obtained a family of continuous-time models constructed out of Bernoulli “atoms” by product integration.

It includes not just the three models in the Bernoulli family tree shown in Figure 1, but also multiple-decrement models, multiple-state models, models with covariates, Cox models, and others such as the non-parametric and Kaplan-Meier models that Stephen (Richards) will discuss.

A good reference for this subject is the book “Statistical Models Based on Counting Processes” by Andersen et al. (1993).

Dr S. J. Richards, F.F.A.: Angus (Macdonald) has shown us the principles and theory. I will look at some of the practical applications of this work. In the first paper, we present six major areas of benefit (Richards and Macdonald, 2025). To keep things concise, I will look at just four, specifically:

  • The opportunities for improved communication and data quality checking from using continuous-time models.

  • Why continuous-time models are a better match to the day-to-day business reality that actuaries often face.

  • How continuous-time models are better at modelling rapid changes in risk level.

  • How continuous-time models can provide better and more timely management information.

We start with data quality. At the end of his introduction, Angus (Macdonald) mentioned Kaplan and Meier (1958). This happens to be the eleventh most cited academic paper (Van Noorden et al., 2014) and almost every medical trial will have some variant of the Kaplan-Meier survival curve plotted. The Kaplan-Meier estimate of the survival probability from age $x$ to age $x+t$ is:

$${}_t\hat{p}_x = \prod_{t_i \le t}\left(1 - \frac{d_{x+t_i}}{l_{x+t_i^-}}\right).$$

The Kaplan-Meier estimate is a discretised version of the product integral Angus detailed in his introduction. The discretisation is over the distinct ages at death. We find the ages at which people die – the set $\{x+t_i\}$ – count the number of deaths that took place at each death age, and count the number of lives who were alive immediately prior to that age. The Kaplan-Meier estimate is a step function, as is clear from Richards and Macdonald (2025, Figure A1), which shows survival curves for males and females in a home reversion portfolio. It is a data set of modest size with fewer than 200 events, so the step nature of the Kaplan-Meier estimate is clear. However, most data sets that actuaries deal with have a much larger number of deaths, which has the effect of making the Kaplan-Meier estimate visually rather smooth. This is shown for a Dutch pension scheme of around 15,000 members in Figure 3. The Kaplan-Meier steps are barely visible because there is a death for almost every single day of age after 60.
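A minimal sketch of this calculation for left-truncated, right-censored records (hypothetical data; production work would normally use an established survival-analysis package):

```python
import numpy as np

# Hypothetical records: entry age, exit age, death indicator (1 = death, 0 = censored).
entry = np.array([60.0, 62.5, 61.0, 65.0, 60.0, 63.2])
exit_ = np.array([70.1, 66.0, 75.3, 81.0, 68.4, 66.0])
died  = np.array([1,    0,    1,    1,    1,    0])

surv = 1.0
for t in np.unique(exit_[died == 1]):             # distinct ages at death
    at_risk = np.sum((entry < t) & (exit_ >= t))  # alive and under observation just before t
    d_t = np.sum((exit_ == t) & (died == 1))      # deaths at age t
    surv *= 1.0 - d_t / at_risk
    print(f"age {t:.1f}: Kaplan-Meier survival estimate = {surv:.3f}")
```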

Figure 3. Kaplan-Meier estimates of survival curves for males and females in the Dutch pension scheme.

In Figure 3, as for the Scottish pension scheme in Richards and Macdonald (2025, Figure 12(a)), we have a clear and consistent difference in survival rates for males and females in both cases. A rough guide is that if there are at least 500 deaths for each sex, the Kaplan-Meier estimates should show a clear and consistent gap.

Kaplan-Meier curves are a very useful way of communicating with non-specialists, in contrast to stating percentages of standard tables. They also have a particular application for actuaries working with pension schemes and annuity portfolios. In Figure 4, we see the Kaplan-Meier estimates for a UK pension scheme that was seeking a longevity swap in July 2024. This data set was received from a reinsurer, which in turn received it from an insurer, which had originally obtained the data from the ceding pension scheme.

Figure 4. Kaplan-Meier estimates for males and females in the UK pension scheme seeking a longevity swap in July 2024.

Figure 4 indicates anomalies in the data. Depending on your view, it either shows a higher survival probability for males up to about age 70, or that males and females have no net difference in survival rates over a 15-year period post-retirement. These features are most unlikely and indicate that the data are corrupted, a fact that had not been spotted by the reinsurer. The Kaplan-Meier estimates revealed this data-quality problem in mere minutes. I strongly recommend that actuaries use Kaplan-Meier estimates not only for communication, but also for data-quality checking. The corruption that lies behind Figure 4 is surprisingly common; see also Richards and Macdonald (2025, Figure 12(b)).

We now consider modelling. A binomial mortality model based around $q_x$ is essentially like a coin toss, where the only permitted events are death or survival. The problem is that in the real world observations can be interrupted. Figure 3(a) in Richards and Macdonald (2025) shows the number of in-force annuities at each date for a UK annuity portfolio. At some point in late 2013, the insurer carried out a Part VII transfer of liabilities, where around 60,000 annuities were bought by another insurer. The issue for a binomial model is that, of the lives that were in-force on 1 January 2013, some died during the year, some survived to the end of the year, but 60,000 simply left observation. The binomial model is not a good match to a trial with three outcomes.

This kind of interrupted observation is increasingly common. Besides Part VII transfers, where insurers buy and sell portfolios, pension schemes transfer administration to a new administrator and insurers migrate from one administration system to another. There is also commutation of pensions or annuities in payment as Inland Revenue rules permit very small pensions or annuities to be commuted into a single (taxable) lump sum. This means that a pension or annuity that starts observation can cease for a reason other than death.

Binomial models are not a good fit for any of these situations, but survival models almost effortlessly handle interrupted observations by treating them as right-censored records. This means that early exits during the year are treated in the same way as survivors, just with an earlier censoring date.

A binomial model is also not a good fit if you have lives joining part-way through the year. The binomial model assumes that all lives were known at the start of the year – there is no facility for adding more observations with a shorter exposure period. However, the addition of new entrants to a portfolio is routine. For the UK insurer in Richards and Macdonald (2025, Figure 3(a)) we see a continual increase over time in the number of in-force annuities because the portfolio is open to new business, with new annuities being set up on every working day and new annuities being set up for surviving spouses on death. Pension schemes and annuity portfolios have a lot in common with medical trials with continuous recruitment: new retirees, new annuities being set up and new surviving spouses’ benefits. We also have a continual process of withdrawal: commutation of trivial benefits, transfers to a different insurer or transfers to a different administration system.

Binomial models are not a good match to this continual process of addition and withdrawal. Survival models, however, are a particularly good match. Two key concepts are right-censoring and left-truncation. Left-truncation occurs because lives only become known to the portfolio well into adult life (there are no policies written to newborns). Right-censoring is where all that we can say about someone is that they were alive at the last point of observation. Figure 1 of Richards and Macdonald (2025) illustrates various combinations of these. In Cases A and B, the annuity is set up on the administration system when it commences at some adult age (left-truncation) and then an extract is taken for analysis. The indicator variable $d$ specifies whether the life is alive or dead at the last point of observation. These are the two cases that a binomial model can handle. What it cannot handle is Case C, where there is an early exit from observation, such as commutation or transfer out: $d = 0$, but the life was not observed all the way to the extract date. Death has not been observed, yet the censoring point did not take place at the extract date either.

We also have Case D, where new individuals become known to the portfolio after the annuity has already commenced. This can happen if the insurer has taken on a block of business and knows nothing about the mortality of such individuals before the moment they are transferred onto the administration system. The information from when the pensioner retired to the date when the insurer starts being able to track them is lost – a form of additional left-truncation. There is also Case E, where a death is observed shortly after the transfer in. A binomial model that permits two events is not a great match for this constellation of five possible cases.

Besides being a better match to business realities, continuous-time models can also model rapid changes in risk. Figure 6 of Richards and Macdonald (2025) shows the level of mortality over time after allowing for age, gender and pension size. We can see a lot of rich detail in Figure 6: winter peaks in mortality and summer troughs. We also have a spike in mortality levels in spring 2020 due to the arrival of COVID-19. The trough in the summer of 2020 was also unusually deep, suggesting a degree of harvesting, i.e. the mortality of frail individuals had been brought forward in time.

Figure 5. Estimated proportion of deaths reported for two annuity portfolios where the extract was taken in June 2020.

Continuous-time models give far greater insight into what is happening over time. A $q_x$ model based on calendar years would just have six crude observations and would be missing all the rich detail that the figure offers.

Lastly, we look at management information and what continuous-time models can do for us there. Figure 15 in Richards and Macdonald (2025) shows three panels for the aggregate mortality hazard on a daily basis for a French portfolio of annuities in payment. The insurer writes specialist Additional Voluntary Contributions for academics. The vertical dashed lines in Panels (a)-(c) mark 1 April 2020 in each case. Panel (a) is based on an extract taken in June 2020 and there is no sign of the COVID shock. The reason for this is delays in reporting of deaths to the insurer – variously known as incurred-but-not-reported (IBNR) or, more accurately, occurred-but-not-yet-reported (OBNR) deaths.

This OBNR problem is highly portfolio specific. Figure 5 shows the proportion of deaths that has been reported to two insurers. The horizontal axis is reversed, so 0 is the date of extract. As we move to the left, we move back in time from the date of extract. We can see that if we go about half a year back in time, almost all deaths have been reported to the UK insurer. But as one gets closer to the extract date, the proportion of deaths reported falls off and will obviously be zero on the date of extract itself.

In contrast, the French portfolio has a far more pronounced delay in the reporting of deaths. For the UK insurer, most of the deaths are reported within a quarter of a year. However, for the French insurer, at least 40% of the deaths are not reported within a quarter of a year. This is a portfolio-specific problem, but it does give rise to some opportunities. For example, we could estimate the delay function in Figure 5, then use it to “gross up” the estimate of current mortality to get an estimate of what has happened but has not yet been reported. This kind of thing is commonly done by economists, who call it “nowcasting,” i.e., trying to get the best available picture of what has happened but has not necessarily been reported. We do this for the French portfolio in Panel (b) of Figure 15 in Richards and Macdonald (2025), which reveals the coming tidal wave of COVID-19 deaths that are about to be reported to the insurer. Panel (b) gives the insurer early warning of the volume of deaths that are probably going to be reported, thus allowing them to make appropriate staffing and other decisions.
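A minimal sketch of the grossing-up step (the reporting fractions and death counts below are placeholders, not data from the portfolios discussed here):

```python
import numpy as np

# Hypothetical estimated proportion of deaths reported, by months since occurrence
# (month 0 = month of the extract), and deaths reported so far for those months.
reported_fraction = np.array([0.15, 0.45, 0.70, 0.85, 0.93, 0.98])
reported_deaths   = np.array([12,   40,   71,   90,   101,  108])

# Nowcast: gross up each month's reported count by its estimated reporting fraction.
nowcast = reported_deaths / reported_fraction
print(np.round(nowcast))   # estimate of deaths that have occurred, reported or not
```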

An obvious question is how accurate the nowcast in Panel (b) was. We took a much later extract from August 2021 and calculated the same statistic, as shown in Panel (c). Comparing this with Panel (b) allows us to assess the accuracy of the nowcast. We can see that the nowcast, while not exactly accurate, was nevertheless a workable early warning for the French insurer. The peaks are similar, although slightly different in timing. As an early warning system, this was clearly an improvement on the state of ignorance based on just the reported mortality in Panel (a).

To conclude, using continuous-time methods, actuaries can get better communication tools and better data-quality checks. Continuous-time models are a better match to the reality of the business processes of pension schemes and annuity portfolios. They can model rapid changes in risk level to give greater insight, and more timely management information, than can be obtained using annual $q_x$-based methods.

Moderator (starting the Q&A session): Thank you, Stephen (Richards) and Angus (Macdonald), for your presentations. Do you have a preferred error measure for this family of models to evaluate relative performance?

Prof Macdonald: This family of models fits into completely standard maximum likelihood estimation methods. If you mean by “preferred error measure” how to choose between members of a family of models, then an information criterion such as the Akaike or Bayesian Information Criterion would usually be used.

Dr Richards: I would add that in practical work I make extensive use of Akaike’s Information Criterion (AIC) for choosing between models. The exception is when fitting a model with a lot of parameters, such as one using splines, where I find it preferable to use the Bayesian Information Criterion for choosing between models.
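For reference, the two criteria are the standard ones:

$$\mathrm{AIC} = 2k - 2\log\hat{L}, \qquad \mathrm{BIC} = k\log n - 2\log\hat{L},$$

where $k$ is the number of parameters, $n$ the number of observations and $\hat{L}$ the maximised likelihood; the BIC penalises extra parameters more heavily for large $n$.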

Questioner: Do you see actuaries moving from $q_x$- to $\mu_x$-type models anytime soon?

Dr Richards: I have perhaps a slightly selective view of the market – all of our clients use $\mu_x$ because our tools use $\mu_x$ to fit their models, and therefore I know plenty of insurance companies and reinsurers who use $\mu_x$ for their modelling. I know of one UK insurer that has $\mu_x$-based formulae in its annuity quotation system. So $\mu_x$ is used not just for backroom actuarial work, but also for production systems. However, it does remain the case that there are still plenty of valuation systems and pricing systems that are $q_x$-based.

Questioner: On a related note, would you like to comment on the implications of your methods for published official tables? It is frustrating that $q_x$ values are generally tabulated only at integer ages, forcing the user to make arbitrary interpolation assumptions. In the past, the CMIB published the graduation formula, but regrettably that is no longer the case.

Dr Richards: All the CMI’s graduation work is done using $\mu_x$ and has been for many decades. In the past, the CMI published tables of $q_x$, but also made a point of publishing the mortality formulae and the continuous-time parameters so that readers could generate $\mu_x$ at any age they liked. Unfortunately, this useful habit has fallen by the wayside in recent years. Hopefully, the CMI can change back again.

Prof A. J. G. Cairns, F.F.A.: Is it easy to work out consistent AICs or BICs if you are considering two models for discrete time versus continuous time?

Dr Richards: The golden rule for comparing information criteria is that they must be based on the same data; not just the same lives, but the same observed time periods. One problem is that discrete-time models often need to discard data. If you end up trimming your exposure period so that there is an integer number of years of exposure, you have changed the data, and the information criteria are then not comparable between discrete-time and continuous-time models. One way to make them comparable would be to use one of the methodologies listed in the appendices of Richards and Macdonald (2025): select a formula for including fractional years of exposure in a $q_x$-type model. Under such circumstances the AICs should be comparable, as the data would then be the same.

Mr R. J. Steele, F.I.A.: How would you describe the strengths and weaknesses of parametric models against random survival forests, which also build on Kaplan-Meier and allow continuous time, censoring, exogenous risk factors and so on?

Dr Richards: I cannot answer that one. I am not familiar with random survival forests.

Moderator: They use the Kaplan-Meier curves, which you discussed in your talk, as well as continuous time or virtually continuous time mortality models. I am hoping that someone will soon publish a nice package that allows us to do random survival forests with the parametric models that you use.

Prof Macdonald: I can perhaps not answer the specific question but make a comment. The jury is still out on the use of machine-learning techniques versus parametric models. One of the chief problems, in my view, is the interpretability of the results. Machine-learning methods are sometimes a black box where you do not quite know what is going on or what the properties are of the model that is being suggested as the best fit. The position is quite fluid with models currently being developed and tested to overcome that problem. But at the moment I do not think there is a good answer to that question for mortality data. My preference is still to use parametric models where the modeller has more control over what answer is selected.

Dr Richards: I would like to add to that from the practitioner point of view. I think it is important for the actuary or modeller to work through the addition or deletion of risk factors on a step-by-step basis. It is quite common for some risk factors to be corrupted. A good example results from insurers avoiding mailing the recently deceased and upsetting any surviving spouse. Some administration systems set the address of the recently deceased to be that of the head office to ensure that no mail can go out and upset the recently bereaved. Of course, this means that you cannot use postcode as a risk factor, as the postcode data would be corrupted. Such issues can only really be detected if the analyst fits a risk factor, looks at the strength of the effect and decides if it is a credible effect and has the right sign. Often, if you have a corrupted risk factor, it manifests as being excessively predictive. This sort of nuance can only be found if the analyst includes risk factors in a stepwise manner and checks if the results are sensible. If you just feed data into an automated process, you can miss data corruptions and include risk factors that are not really risk factors at all. I am in favour of the actuarial analyst fitting the model on a step-by-step basis and sense-checking as they go, because data quality is a perennial problem in experience investigations. I am therefore also in favour of parametric survival models that are interpretable. The analyst can check to see if each parameter estimate is sensible. If not, it may be indicative of some kind of data error.

Prof Cairns: Stephen (Richards), the slide where you illustrated the harvesting effect of COVID also shows, on the left-hand side, the well-known seasonality of deaths. Does moving from a $q_x$ world to a $\mu_x$ world force us to model seasonality explicitly? If so, who is going to pay for the service?

Dr Richards: You can detect seasonality in many actuarial portfolios, as covered in Richards et al. (2020). It is a surprisingly strong risk factor, second only to age and sex in terms of explaining variation. However, actuaries do not need to fit seasonal terms if they do not want to. The nature of long-term business is that seasonal fluctuations tend to cancel each other out. It is a good example of a risk factor that is very significant, demographically speaking, but not necessarily significant actuarially speaking. In many cases, modelling seasonality is optional. I know of some insurers that occasionally use seasonal models if they are looking to do short-term forecasting for staffing levels. If you have a winter peak in mortality, you need to staff up accordingly, say by encouraging servicing staff to take their holidays in the summer and not the winter.
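One common way to represent such a seasonal effect (a sketch broadly in the spirit of Richards et al. (2020), not their exact parameterisation) is a periodic multiplier on the baseline hazard:

$$\mu_{x,\tau} = \mu_x \exp\!\big(\zeta \cos\big(2\pi(\tau - \phi)\big)\big),$$

where $\tau$ is the time of year in years, $\zeta$ controls the amplitude of the seasonal swing and $\phi$ locates the winter peak.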

Moderator: I agree that you do not need seasonality for regular mortality projections. But if you want to investigate something that is particularly bad at a certain point in time, you need to model the seasonal variation to find the baseline. A good example is the COVID spike that you showed. The bottom of the COVID spike is the baseline for April rather than what the average during the year might have been.

We have now reached the end of the session. I would like to thank Angus (Macdonald) and Stephen (Richards) for a great talk.

Footnotes

[IFoA Sessional Meeting, Thursday 24 October 2024]

References

Andersen, P. K., Borgan, Ø., Gill, R. D., & Keiding, N. (1993). Statistical Models Based on Counting Processes. Springer Series in Statistics. New York: Springer-Verlag.
Forfar, D. O., McCutcheon, J. J., & Wilkie, A. D. (1988). On graduation by mathematical formula. Journal of the Institute of Actuaries, 115(1), 1–149.
Kaplan, E. L., & Meier, P. (1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 53(282), 457–481.
Macdonald, A. S., & Richards, S. J. (2025). On contemporary mortality models for actuarial use II: Principles. British Actuarial Journal, 30, e19. doi: 10.1017/S1357321725000133
Richards, S. J., Ramonat, S. J., Vesper, G. T., & Kleinow, T. (2020). Modelling seasonal mortality with individual data. Scandinavian Actuarial Journal, 2020, 864–878.
Richards, S. J., & Macdonald, A. S. (2025). On contemporary mortality models for actuarial use I: Practice. British Actuarial Journal, 30, e18. doi: 10.1017/S1357321725000121
Van Noorden, R., Maher, B., & Nuzzo, R. (2014). The top 100 papers. Nature, 514, 550–553.