1. Introduction
Understanding mortality by cause is key to informing medical research decisions and planning social services (Alai et al., Reference Alai, Arnold and Sherris2015; Kjaergaard et al., Reference Kjaergaard, Ergemen, Kallestrup-Lamb, Oeppen and Lindahl-Jacobsen2019). It is also important in assessing mortality rates and longevity risk for life insurers, as causal factors can drive the best estimate of mortality and morbidity assumptions for the purposes of reserving and pricing. Analyzing and modeling cause-of-death data present two main challenges: the need to account for inherent dependencies between various causes of deaths and the need to produce forecasts by causes coherent with aggregate mortality forecasts.
Traditional methods for mortality modeling and forecasting, including the Lee-Carter (LC) model (Lee & Carter, Reference Lee and Carter1992) or variations thereof and the age-period-cohort model (Holford, Reference Holford1983; Renshaw & Haberman, Reference Schnaubelt2006), among others, generally do not account for dependencies between competing causes of death. As such, over the past two decades, considerable progress has been made on joint models for multiple causes of death, which capture between-cause dependencies. Arnold and Sherris (Reference Arnold and Sherris2013) applied vector error correction models to cause-of-death mortality rates to quantify the dependence between competing risks and subsequently found an improvement in forecasts compared to methods that do not allow for such dependencies. Alai et al. (Reference Alai, Arnold and Sherris2015) formulated a multinomial logistic model across several causes of death to investigate the effects of improvement and elimination of mortality due to cancer. Li et al. (Reference Li, Li, Lu and Panagiotelis2019) adopted a forecast reconciliation approach to ensure coherence in cause-specific mortality rates, while Li and Lu (Reference Li and Lee2019) introduced hierarchical Archimedean copulas to capture dependence between competing risks in causes of death. More recently, Zhang et al. (2023) developed a predictive approach for cause-of-death mortality modeling that jointly models various causes, ages, and years using a penalized tensor decomposition.
The majority of literature on modeling mortality by cause, including those mentioned above, treats cause-specific life-table deaths as non-compositional, that is, through modeling age-specific mortality rates rather than age distributions of death. Although these methods enable modeling dependencies between mortality rates for different causes, a more direct approach is to forecast the cause-specific death distribution, where the dependence is explicitly incorporated by capturing relativities between deaths of one cause and another. Indeed, cause-of-death data are fundamentally compositional, as deaths have been recorded and attributed to various causes for analysis in globally used medical classifications for epidemiology, health management, and understanding mortality experience (World Health Organization, 1992).
With this in mind, an alternative approach, known as compositional data analysis (CODA), has arisen in the actuarial science literature, which aims to model the cause-specific death distribution directly and produce mortality forecasts arising from the composition of the distribution itself. The idea of CODA dates back to the seminal work of Aitchison (Reference Aitchison1982) for analyzing data that arise as a vector of observations where the elements sum to a constant value and, therefore, only contain relative information. In the context of mortality by cause, the compositional sum constraint translates to avoided deaths from one cause, leading to increased deaths from other causes.
When analyzing compositional data, methods that ignore the compositional constraint and apply standard multivariate data analysis to the raw observations (we refer to such approaches as “raw data analysis” or RDA) can encounter potential issues with coherence when it comes to aggregated mortality forecasts. An alternative approach in CODA is to transform the compositional data from the simplex, subject to the unit sum constraint, to the unconstrained real space before applying standard multivariate data analysis and forecasting. Then, the results are transformed into the compositional space for interpretation and inference. Within this latter approach, log-ratio transformations are by far the most widely used to transform compositional data due to their various attractive compositional properties (see Aitchison, Reference Aitchison1982, for details). The first to propose such a “log-ratio analysis” or LRA for forecasting mortality rates was Oeppen (2008), who applied an LC mortality model to log-ratio transformed death compositions to forecast cause-specific mortality. Oeppen (2008) used centered log-ratio (CLR) transformation (see Section 2 for details) and found that capturing dependencies between subgroups via LRA and the CODA framework improved the overall forecast while assuming independence between causes tended to produce pessimistic results; that is, expected deaths tend to be overstated. Kjaergaard et al. (Reference Kjaergaard, Ergemen, Kallestrup-Lamb, Oeppen and Lindahl-Jacobsen2019) further extended this approach by developing two new LRA models for cause-specific deaths, adding cause-specific weights to age and time subgroups, and decomposing joint and individual variation between causes of death to improve forecast accuracy further. Other notable works include that of Bergeron-Boucher et al. (Reference Bergeron-Boucher, Canudas-Romo, Oeppen and Vaupel2017), who applied CODA to produce age-coherent forecasts for mortality; Bergeron-Boucher et al. (Reference Bergeron-Boucher, Strozza, Simonacci and Oeppen2022), who used LRA to model healthy life expectancy; and Kjaergaard et al. (Reference Kjaergaard, Ergemen, Bergeron-Boucher, Oeppen and Kallestrup-Lamb2020), who produced longevity forecasts by socioeconomic group using LRA.
While the aforementioned works use LRA to address some of the issues with analyzing compositional data (relative to RDA), one outstanding challenge with LRA-based modeling is the presence of zero counts/values (Bergeron-Boucher et al., Reference Bergeron-Boucher, Canudas-Romo, Oeppen and Vaupel2017; Kjaergaard et al., Reference Kjaergaard, Ergemen, Kallestrup-Lamb, Oeppen and Lindahl-Jacobsen2019, Reference Kjaergaard, Ergemen, Bergeron-Boucher, Oeppen and Kallestrup-Lamb2020). Specifically, compositional data with zero values can be interpreted as lying on a boundary of the simplex. So, naively applying a log-ratio transformation to such data results in one or more transformed values taking $\pm \infty$. In the context of mortality by cause, zero death counts in subcategories of the composition arise commonly for new and emerging or granular causes of death at certain ages and at older ages where exposure is limited. Since the existence and treatment of zeros may lead to differences in the overall inference and forecasts, as mentioned above, this could have consequences on our understanding of longevity risk and mortality improvements, along with associated financial implications (Basel Committee on Banking Supervision, 2013).
In the literature, the problem of zeros when using LRA has often been addressed in an ad hoc manner by omitting, aggregating, or adding small arbitrary values to zero values (Martin-Fernandez et al., 2003). For instance, Kjaergaard et al. (Reference Kjaergaard, Ergemen, Kallestrup-Lamb, Oeppen and Lindahl-Jacobsen2019) explored imputing half of the minimum observed death count, a method initially used by Bergeron-Boucher et al. (Reference Bergeron-Boucher, Canudas-Romo, Oeppen and Vaupel2017). Alternatively, Kjaergaard et al. (Reference Kjaergaard, Ergemen, Kallestrup-Lamb, Oeppen and Lindahl-Jacobsen2019) noted that Hyndman et al. (Reference Hyndman, Booth and Yasmeen2013) imputed death rates based on information from nearby years for the same age group using linear interpolation. None of these methods is ideal; furthermore, Greenacre (Reference Greenacre2021) compared four different algorithms to substitute zeros and showed the resulting conclusions could be susceptible to the technique of zero substitution. More recently, Greenacre (Reference Greenacre2024) introduced the $\chi$-power transformation to address the problem of zeros in compositional data by combining the chi-squared distance in correspondence analysis with the Box-Cox power transformation.
In this article, we propose a novel approach to modeling mortality by cause with zero values using a modification of LRA. We introduce a compositional power transformation known as the $\alpha$-transformation (Tsagris et al., 2011), which addresses the challenges presented by zero values in the setting of CODA in a more statistically principled manner compared to the aforementioned ad hoc techniques. The $\alpha$-transformation, which maps compositional data to remove their unit sum constraint, is a generalized Box-Cox power transformation that includes both RDA and LRA as special cases but more broadly involves a tuning parameter $\alpha \in (0,1]$. This parameter can be calibrated in a data-driven manner to enable more flexibility in producing forecasts compared to standard LRA when there are zero values in the data. While the $\alpha$-transformation has been applied to CODA for geology and biology, among other fields (Tsagris & Stewart, Reference Tsagris and Stewart2020), to our knowledge, this paper is the first to examine its use in forecasting mortality by age and cause.
We apply the $\alpha$-transformation to two datasets: 16 years of cause-of-death data from England and Wales data and 43 years of cause-of-death data from the USA. In both applications, we disaggregate for cardiovascular causes such that there are data with zero counts in one or more subgroups. We couple the $\alpha$-transformation with the LC mortality model for multivariate analysis and forecasting (similar to those of Kjaergaard et al., Reference Kjaergaard, Ergemen, Kallestrup-Lamb, Oeppen and Lindahl-Jacobsen2019; Oeppen, 2008), and compare results with several LRA and RDA approaches where ad hoc methods are used to deal with zero values. The results across both applications demonstrate that the $\alpha$-transformation generally improves mortality forecast by cause, while having the added benefit of being able to analyze compositional data with zero counts in a rigorous yet data-driven manner. The $\alpha$-transformation is shown to address the key issue of zero counts in mortality data, generalizing the log-ratio transformation to a broader class of transformations and providing additional flexibility and improved performance when forecasting mortality by cause using CODA-based techniques.
The remainder of this paper is structured as follows: Section 2 reviews several key ideas, including the Lee-Carter (LC) mortality model and LRA. Section 3 introduces the $\alpha$-transformation for mortality by cause data. Section 4 applies the proposed methodology to forecast mortality on cause-of-death data from England and Wales and the USA, while Section 5 offers some concluding remarks.
2. Review of key concepts
We review three foundational concepts for understanding how the $\alpha$-transformation can be applied to cause-of-death mortality modeling, namely, compositional data (Section 2.1), log-ratio analysis or LRA (Section 2.2), and the LC mortality model (Section 2.3).
2.1 Compositional data
Cause-specific mortality can be represented by actual death counts per combination of year, age group, and cause. Specifically, let $D_{t, u, c}$ denote the actual death count for year $t = 1, 2, \dots, T$, age group $u = 1, 2, \dots, U$, and cause $c = 1, 2, \dots, C$, and define $D_t = \,\sum _{u=1}^U \sum _{c=1}^C D_{t, u, c}$ as the total deaths across all age bands and cause groups for year $t$. Then we can calculate $d_{t, u, c} = D_{t, u, c}/D_{t}$ such that for a given year, the vector $\boldsymbol{d}_t = (d_{t,1,1},d_{t,1,2},\ldots, d_{t,1,C},d_{t,2,1},d_{t,2,2}, \ldots, d_{t,2,C}, d_{t,u,1},d_{t,u,2}, \ldots, d_{t,U,C})$ represents the density distribution of deaths by age group and cause. The densities in $\boldsymbol{d}_t$ are ordered such that the cause runs faster than age. Moreover, the compositional vector satisfies $\sum _{u=1}^U \sum _{c=1}^C d_{t, u, c} = 1$. Moreover, by stacking the $\boldsymbol{d}_t$’s as row vectors on top of each other, we can form the $T \times UC$ compositional matrix $\mathbf {D}$ of death densities
Due to the sum-to-one constraint, only $UC-1$ elements are needed to uniquely determine each vector $\boldsymbol{d}_t$. Statistically then, the sample space for compositional cause-of-death mortality data is a simplex: for all $t = 1, \dots T$,
2.2 Log-ratio analysis
A common approach to analyzing compositional data is to employ the log-ratio transformations class, which seeks to transform the data from the simplex back to an unconstrained real space before building a statistical model for analysis. The two most common types of transformations within LRA are the CLR and isometric log-ratio (ILR) transformations, which we consider in this paper. Importantly, the CLR and ILR are used for analyzing compositional data without zero values.
The CLR transformation is defined by dividing all the values in the compositional vector by their geometric mean before applying the natural log transformation. For row $t$ in (1), the CLR for each element is given by
The CLR transformation is symmetric relative to the compositional parts and has the same number of components as the number of parts in the original composition. We can express the CLR-transformed vector as $\boldsymbol{w}(\boldsymbol{d}_t) = (w(d_{t,1,1}),w(d_{t,1,2}),\ldots, w(d_{t,U,C}))$, noting distances between any two elements of this vector remain the same when measured in the simplex and the real space, thus making the CLR particularly useful for analysis (Grifoll et al., Reference Grifoll, Ortego and Egozcue2019). While each element is no longer constrained to be nonnegative (in principle, they can take any real number), the entire vector remains constrained since the elements must sum to zero by the construction of (2).
To further remove this constraint, the ILR left matrix multiplies the CLR-transformed vector by a Helmert sub-matrix and has been promoted as the more theoretically correct method (especially to contrast groups of elements) in CODA (Greenacre & Grunsky, Reference Greenacre and Grunsky2019). The Helmert sub-matrix is an orthonormal $(UC-1) \times UC$ matrix formed by deleting the first row of the Helmert orthogonal matrix (see Greenacre, Reference Greenacre2021, and Tsagris & Stewart, Reference Tsagris, Preston and Wood2022, for technical details). If we denote this Helmert sub-matrix as $\boldsymbol{H}$, then the ILR-transformed vector is defined as
and is no longer subject to any constraint. That is $\boldsymbol{z}(\boldsymbol{d}_t) \in \mathcal {R}^{UC-1}$, and all of its elements can take any real value.
The CLR and ILR aim to transform compositional data into real unconstrained space. On the other hand, as both these transformations are based on taking logarithms, then such methods will not work if one or more of the actual death counts, and subsequently one or more of the $d_{t,u,c}$’s, are exactly zero in value. This is the motivating problem for our subsequent developments as, in practice, many datasets of death counts tend to include zeros for some cause and age combinations.
2.3 The Lee-Carter model for compositional data
We describe a modification of the LC model introduced by Oeppen (2008) for compositional data. We refer to this model as the LC-CODA model, and its construction can be summarized in the following steps.
(I) Center each row of $\boldsymbol{D}$ in (1) by taking the inverse perturbation of the geometric mean from each row of death densities. This results in a matrix of centered death densities, denoted here as $\widetilde {\boldsymbol{D}}$.
-
(II) Apply the CLR transformation to each row of $\widetilde {\boldsymbol{d}}$, mapping the vector of $UC$-compositions for a given year $t$ from the simplex to a $UC$-dimensional Euclidean subspace.
-
(III) Fit and forecast the transformed data using the LC model. Note other more sophisticated models are possible here (e.g., Bergeron-Boucher et al., Reference Bergeron-Boucher, Canudas-Romo, Oeppen and Vaupel2017; Kjaergaard et al., Reference Kjaergaard, Ergemen, Kallestrup-Lamb, Oeppen and Lindahl-Jacobsen2019, Reference Kjaergaard, Ergemen, Bergeron-Boucher, Oeppen and Kallestrup-Lamb2020), and this step and all our developments can be modified to employ such approaches. For simplicity, though, we focus on the LC model.
-
(IV) Back-transform the estimated death densities to the simplex by inverting the CLR transformation and performing a compositional perturbation to the geometric mean for each row estimate to obtain the final forecasted compositional results.
We elaborate each of the steps above in detail. Consider the matrix of compositional death densities in (1), and compute $\mathbf {g}$ as the $UC$-vector, the elements of which are given by the column-wise geometric mean of $\mathbf {D}$, that is, $\mathbf {g} = ((\prod _{t=1}^T d_{t, 1, 1})^{1/T}, (\prod _{t=1}^T d_{t, 1, 2})^{1/T}, \dots, (\prod _{t=1}^T d_{t, U, C})^{1/T} )$. Next, define the perturbation operation and its inverse as follows (Aitchison, Reference Aitchison1982). For two vectors of compositions $X = (x_1, x_2, \dots, x_n)$ and $Y = (y_1, y_2, \dots, y_n)$, all of the elements of which are nonzero, we have
where the operator $C$($\cdot$) “closes” the row, that is, normalizes by dividing each entry by the sum of all entries.
In Step (I) of fitting the LC-CODA model, we apply a centering process to construct a matrix of centered death densities, $\widetilde {\boldsymbol{D}}$, where the $t$th row of $\widetilde {\boldsymbol{D}}$ for $t = 1,\ldots, T$ is computed as
Note the elements in $\mathbf {g}$ can be considered analogues of the age- and cause-specific average mortality over time in a standard LC model.
In Step (II), we apply the CLR transformation to obtain the vector $\boldsymbol{w}(\widetilde {\boldsymbol{d}}_t) = (w(\widetilde {d}_{t,1,1}), w(\widetilde {d}_{t,1,2}),\ldots, w(\widetilde {d}_{t, U, C}))$ for $t=1,\ldots, T$, where to be clear the elements are computed using (2) except replacing $d_{t,u,c}$ with $\widetilde {d}_{t,u,c} = d_{t,u,c}/(\prod _{t=1}^T d_{t,u,c})^{1/T}$. Let $\boldsymbol{w}(\widetilde {\boldsymbol{D}})$ denote the resulting $T \times UC$ matrix formed by stacking the $\boldsymbol{w}(\widetilde {\boldsymbol{d}}_t)$’s as row vectors on top of one another.
In Step (III), we apply the singular value decomposition to $\boldsymbol{w}(\widetilde {\boldsymbol{D}})$ and estimate the LC mortality model analogous to how it is done for the non-compositional setting. We provide details of this in Appendix A.1, but to summarize, we fit a model of the form
where $b_{u, c}$ denotes age- and cause-specific coefficients that vary over time, $k_{t, c}$ denotes factors of time-varying indices for the level of mortality, and $\epsilon _{t, u, c}$ denotes a residual error term. Note that a mean/intercept term is omitted from (5) due to centering from the geometric mean in Step (I). For forecasting, we can adopt a similar approach to Kjaergaard et al. (Reference Kjaergaard, Ergemen, Kallestrup-Lamb, Oeppen and Lindahl-Jacobsen2019) and Zhang et al. (Reference Zhang, Huang, Hui and Haberman2023), among others, who applied time-series methods such as random walk with drift to $k_{t, c}$, and substitute forecasted values of these back in to (5).
Finally, in Step (IV), after obtaining forecasted values of $w(\widetilde {d}_{t, u, c})$, we can apply an inverse CLR transformation followed by a perturbation operation to obtain the actual forecasted death density distribution. In detail, suppose that at a future time $T' \gt T$, the predicted value of the time factor for cause $c$ is given by $\widehat {k}_{T',c}$, while the estimated age- and cause-specific coefficients from Step (III) are given by $\widehat {b}_{u,c}$. Then a vector of forecasted centered death densities is given by $\widetilde {\boldsymbol{d}}_{T^\prime} = (\widetilde {d}_{T',1,1},\widetilde {d}_{T',1,2},\ldots, \widetilde {d}_{T',U,C})$ where $\widetilde {d}_{T',u,c} = w^{-1}(\widehat {b}_{u,c}\widehat {k}_{T',c})$ and $w^{-1}(\cdot )$ denotes the inverse CLR transformation. The corresponding vector for forecasted death densities from the LC-CODA model is then given by $\widehat {\boldsymbol{d}}_{T^\prime} = \widetilde {\boldsymbol{d}}_{T^\prime} \oplus \boldsymbol{g}$.
Compared to modeling mortality rates independently, one key element of using the LC-CODA model is that death counts are naturally redistributed through compositional constraints. As mortality changes over time, if some deaths do not occur at a specific age band and cause, they are naturally shifted toward a different age band and cause group. This maintains subcompostional coherence with the total number of deaths per year as given by the initial life table and ensures the disaggregated death forecasts will be coherent with the overall aggregated mortality forecast (Oeppen, 2008). In the context of compositional data, subcompositional coherence refers to the property that relationships between parts of a composition are unaffected by forming subcompositions, such that results and summary statistics based on the subcomposition are the same as the composition (Greenacre, Reference Greenacre2021). On the other hand, due to its reliance on the CLR transformation, the LC-CODA model is unable to handle zero values in the raw densities $d_{t,u,c}$, and these would need to be omitted, aggregated, or replaced with an arbitrarily small value before step (I).
We present a more detailed exposition of LC-CODA in Appendix A.1, which we use in the application of the $\alpha$-transformation and log-ratio transformations in this paper.
3. Mortality by cause using the $\alpha$-transformation
Motivated by the challenges of applying LRA to cause-of-death mortality modeling where there are one or more zero values in the death densities, we propose using the $\alpha$-transformation before applying the LC-CODA model for forecasting.
The $\alpha$-transformation can be viewed as a Box-Cox transformation applied to the ratios of components, where $\alpha \in (0,1]$ is a tuning parameter that is tuned to handle compositional challenges in the data with zeros (Tsagris et al., 2011). In detail, let $w^\alpha (x)$ represent the Box-Cox transform of a random variable $x$ (Box & Cox, Reference Box and Cox1964),
and recall the matrix of centered death densities $\widetilde {\boldsymbol{D}}$ in (4). For row $t = 1,\ldots, T$, the $\alpha$-transformation is then defined as
where $\boldsymbol{H}$ is the Helmert sub-matrix defined as part of the ILR transformation in (3) and $\boldsymbol{w}^\alpha (\widetilde {\boldsymbol{d}}_t)$ denotes the vector where the Box-Cox transformation is applied to each element of $\widetilde {\boldsymbol{d}}_t$. That is, $w^\alpha (\widetilde {d}_{t,u,c}) = \ln (\widetilde {d}_{t,u,c})$ if $\alpha = 0$, otherwise $w^\alpha (\widetilde {d}_{t,u,c}) = (UC)(\widetilde {d}_{t,u,c}^\alpha -1)/\alpha$ for $\alpha \neq 0$. Note when $\alpha = 0$, the transformation reduces to the ILR transformation defined in (3). If there is no left matrix multiplication by the Helmert sub-matrix $H$, then we obtain the CLR in (2). Critically, when $\alpha$ is restricted to be greater than zero, the transformed values are well defined even when the raw death densities $\widetilde {d}_{t,u,c} = 0$. This differs from both the ILR and CLR, neither of which can be computed for zero values.
The corresponding sample space of the $\alpha$-transformation is known as the $\alpha$ space, which we denote as $\mathbb {A}^{UC-1}_\alpha$ and is given by
It is not difficult to see that, similar to the ILR transformation, the vectors in $\mathbb {A}^{UC-1}_\alpha$ are not subject to the zero-sum constraint. As $\alpha \rightarrow 0$, then $\mathbb {A}^{UC-1}_\alpha$ tends to the $(UC-1)$ dimensional real space $\mathcal {R}^{UC-1}$; this is again consistent with the ILR, except now zero values of death densities can be handled provided $\alpha \ne 0$ (Tsagris & Stewart, Reference Tsagris, Preston and Wood2022). On the other hand, when $\alpha = 1$, the $\alpha$-transformation is equivalent to RDA, that is, the same as applying standard multivariate analysis ignoring the compositional constraint. While $\alpha$ is often determined using a data-driven approach through maximum likelihood estimation (Tsagris et al., 2011), for strong forecasting performance, in Section 4.1, we discuss an alternative method based on minimizing out-of-sample prediction accuracy.
To construct the LC model in conjunction with the $\alpha$-transformation, we can apply similar steps to those discussed in Section 2.3, except that Step (II) is modified to Step (IIa) where we apply the $\alpha$-transformation instead of the CLR, and Step (IV) is modified to Step (IVa) where the transformation back to the simplex requires inverting the $\alpha$-transformation to obtain the final forecast. With regard to the latter, after forecasting the factors $k_{t,c}$ in a similar manner to Section 2.3, the forecast result derived based on the $\alpha$-transformed data needs to be mapped back to the compositional simplex.
In detail, at future time $T^{'}\gt T$ and for $\alpha \gt 0$, let $\widehat {\boldsymbol{z}}^\alpha (\widetilde {\boldsymbol{d}}_{T^{'}}) = (\widehat {z}^{\alpha }(\widetilde {d}_{T',1,1})$,$\ldots, \widehat {z}^{\alpha }(\widetilde {d}_{T',U,C}))$ denote the vector of forecasted $\alpha$-transformed centered death densities, where $\widehat {z}^\alpha (\widetilde {d}_{T',u,c}) = \widehat {b}_{u,c}\widehat {k}_{T',c}$. Then, the vector of the corresponding inverse $\alpha$-transformed values is given by $ v^\alpha (\widetilde {\boldsymbol{d}}_{T^\prime}) = \alpha \boldsymbol{H}^\top \widehat {\boldsymbol{z}}^\alpha (\widetilde {\boldsymbol{d}}_{T^\prime}) + 1$.
Afterward, the forecast vector of death densities at time $T'$ is given by
and $\widehat {\boldsymbol{d}}_{T^\prime} = \widetilde {\boldsymbol{d}}_{T^\prime} \oplus \boldsymbol{g}$.
To conclude, we remark that as long as the forecasted data $\widehat {\boldsymbol{z}}^\alpha (\widetilde {\boldsymbol{d}}_{T^\prime})$ lies inside $\mathbb {A}_\alpha ^{UC-1}$ defined by the original data, then it can be mapped back to the simplex for inference. In some cases during the process of forecasting, for example, for long-term forecasts when $T' \gg T$, it is possible one or more values of $\widehat {\boldsymbol{z}}^\alpha (\widetilde {\boldsymbol{d}}_{T^\prime})$ are less than $-1/\alpha$ and lie outside the $\alpha$-space. This indicates the corresponding forecasts are at or crossing the boundary of the simplex. In such cases, to ensure the inverse $\alpha$-transformation is possible, we choose to set corresponding elements of $\widehat {\boldsymbol{z}}^\alpha (\widetilde {\boldsymbol{d}}_{T^\prime})$ equal to the boundary value of $-1/\alpha$ (see, e.g., Tsagris et al., 2011, for a similar treatment).
4. Application to the Human Cause-of-Death database
We illustrate an application of the $\alpha$-transformation coupled with an LC model to cause-of-death counts and life-table deaths for two datasets from England and Wales and the USA as part of the Human Cause-of-Death Data series (HCD, 2024). England and Wales were selected as there have been relatively minimal fluctuations in cause composition during the available data period, while the USA was selected to assess the performance of the proposed $\alpha$-transformation for a larger dataset spanning more historical years. Disaggregated causes of death within the cardiovascular causes were selected since cardiovascular disease has been steadily decreasing over the past few decades but remains the second-largest cause of death in the UK (British Heart Foundation, 2023; National Institute for Health and Care Excellence, 2023; Raleigh et al., Reference Renshaw and Haberman2022). Data on the complete list of causes of death were obtained, containing 103 causes at the “long” level for England and Wales and 206 causes for US death counts. We treat males and females as separate data sources and perform analysis separately by gender; this is consistent with treatment in earlier CODA literature (e.g., Kjaergaard et al., Reference Kjaergaard, Ergemen, Kallestrup-Lamb, Oeppen and Lindahl-Jacobsen2019; Oeppen, 2008).
To perform analysis and forecasting, we aggregated based on age bands and selected causes. We constructed nine age bands: ages 0–24, ages 25–34, ages 35–44, ages 45–54, ages 55–64, ages 65–74, ages 75–84, ages 85–95, and ages over 95. There was an additional age band for the US data comparison, namely, ages 90–99 and then ages over 100. This additional age band was possible due to the availability of the granular death count data from the Human Cause-of-Death Data series (2024) for the USA. Note the age band 0–24 is not a homogeneous group relative to the other age bands, but the reason for aggregating at these ages is twofold: first, for application to life insurance, analysis is typically performed for working age groups; and second, by aggregating across 0–24, there is greater credibility in death counts. We leave the assessment of the variation of deaths by cause at younger ages as an avenue for future investigation.
Turning to causes, for England and Wales’ death counts, we aggregated death counts by cause into 11 causes as per the HCD shortlist, with only the cardiovascular causes disaggregated to the “long” list level. For the US death counts, we ensured the same International Classification of Diseases-10 (ICD-10) causes of death were used for comparison. These same cardiovascular causes were mapped to 12 causes as per the HCD “long” list level for the US data. Cardiovascular causes were selected as cardiovascular disease causes of death have steadily decreased over the data period, as introduced at the start of this section. All other causes of death were grouped and aggregated for analysis. The selected cardiovascular causes of death for both datasets are shown in Table 1.
In the disaggregated data for cardiovascular deaths, zero death counts were present across most causes in the disaggregated cardiovascular death category over the available period (2001–2016 for England and Wales and 1979–2021 for the USA,) and when split by age band and across both genders. For example, for England and Wales male data, rheumatic heart disease had zero counts for ages less than 20 (and also for ages 20–30 in 2010) for 2002, 2006, 2010, and 2012–2015. Also, for males, cardiac arrest death counts were zero for ages 40–50 in the year 2004. Similarly, for US male data, acute rheumatic deaths had zero counts for ages less than 20 in 1998, 2002, 2004, 2006–2009, 2014–2016, 2018–2019, and 2021. The same cause had zero counts for ages up to 40 in 2007 and across other older bands in the available years.
In total, for the England and Wales death counts, of the ten cardiovascular causes of death, six had one or more zero counts across both genders in the data: rheumatic heart disease, essential hypertension, hypertensive disease, acute myocardial infarction, cardiac arrest, and heart failure. Not surprisingly, zero death counts for most causes tended to be more prevalent in some years at younger ages (below 50). Similarly, for US death counts, five of the total 11 cardiovascular causes had zero counts across genders and age bands: acute and chronic rheumatic, hypertension, and hypertensive (both heart and renal). Figures 1 and 2 present aggregated death counts across all ages from 2001 to 2016 for England and Wales and from 1979 to 2021 for US deaths. As observed, the number of deaths for some causes is small, even when aggregated across all ages. With the above in mind, we anticipate forecast performance will improve by explicitly working with actual death counts, that is, including zero values, compared with the standard approach of excluding zeros or replacing them with an arbitrarily small amount.
4.1 Tuning $\alpha$ parameter
To predict cause-of-death data with zero death counts, we proposed selecting an optimal value of $\alpha$ based on out-of-sample forecast accuracy as assessed via an expanding window cross-validation approach. Specifically, for the England and Wales data, as the available data only spanned 16 years, we adopted a simple fourfold expanding window. For the US data, as there were 43 years of data, we adopted a tenfold expanding window. On England and Wales deaths, this meant the first fold consists of the years 2001–2008 for training and 2009–2012 for validation, the second fold consisted of 2001–2009 for training and 2010–2012 for validation (i.e., the training window was increased by one year) and so on. In each fold, the $\alpha$-transformation coupled with the LC model as detailed in Section 3 was fitted to the training set, and forecasts were made to the validation set. The years 2013–2016 were held out from all four folds as a test set. Analogously, for the US data, the first fold consisted of the years 1979–2001 for training and 2002–2011 for validation, the second fold consisted of 1979–2002 for training and 2003–2011 for validation, and so on. The years 2012–2021 were held out from all folds as a test set. We remark that as the compositional cause-of-death data exhibits a natural time-series dependence, then an expanding window (or forward chaining) cross-validation method was adopted to tune $\alpha$; we refer to Racine (Reference Raleigh, Jefferies and Wellings2000) and Schnaubelt (Reference Shahraz, Bhalla, Lozano, Bartels and Murray2019) for more details around cross-validation in the context of time-series analysis.
For both datasets, we selected $\alpha$ based on minimizing either the average root mean square error (RMSE) or average mean absolute error (MAE) across the four validations sets:
where $\text {observed}_{t,u,c}$ generically denotes the death count for age band $u$, cause $c$ and the $t$th year in the validation set, $\text {predicted}_{t,u,c}$ denotes the corresponding predicted death count, and $T_k$ denotes the number of years in the $k$th validation fold.
Both RMSE and MAE are widely used in model evaluation to measure forecast accuracy (Chai & Draxler, Reference Chai and Draxler2014; Hodson, Reference Hodson2022).
Full results from applying the above cross-validation approach are provided in Appendix A.2. Overall, the optimal $\alpha$ determined using the above cross-validation approach was $0.1$ and $0.8$ for males and females, respectively, when applied to England and Wales cause-of-death data. On the other hand, optimizing $\alpha$ on the US data yielded values of $0.7$ and $0.9$, respectively, for males and females. In three of the four cases for optimizing $\alpha$, the minimum RMSE and MAE produced the same results. Interestingly, the optimal $\alpha$ chosen for the US female data was $1.0$ when using RMSE as the criteria: since the $\alpha$-transformation here converges to RDA, this suggests the compositional constraint impacted the analysis to a lesser extent for this setting. On the other hand, since using MAE produced both lower RMSE and MAE in the validation sets compared with the optimal $\alpha$ determined using RMSE, then we decided to choose the optimal $\alpha$ as $0.9$ for the US female data.
4.2 Results: England and Wales data
Using the values of $\alpha$ tuned in Section 4.1, we produced mortality forecasts of proportions of deaths by cause for the test set (England and Wales years 2016–2020 and US years 2012–2021) using the $\alpha$-transformation coupled with the LC model. We compared this with several LRA methods in the literature for addressing zeros counts, including the CLR and ILR transformations where zeros were omitted from the data and the CLR and ILR with all zeros replaced by 0.25 or 0.5 before modeling. These additional methods were coupled with an LC model for forecasting and comparison.
Table 2 summarizes the performance of females and males. Aside from the optimal values of $\alpha$, we also considered values $\alpha = 0.5$, $\alpha = 0.7$, $\alpha = 0.9$, and $\alpha = 1$, the latter equivalent to RDA, that is, ignoring the compositional constraint. The $\alpha$-transformation, on the whole, tended to produce better forecasting accuracy for the US dataset compared with the CLR and ILR plus either ad hoc method of handling zero values. Improvements in the forecast were more evident when assessing MAE across both genders, although even with RMSE, the $\alpha$-transformation was the second-best performer. Visually, Fig. 3 corroborates the results for males and females, where the $\alpha$-transformation better fits the observed data when compared with the corresponding CLR and ILR transformations.
The results in Fig. 4 are consistent with the broader observations that overall mortality experienced due to cardiovascular causes in the UK has been improving since the 1960s (British Heart Foundation, 2023; NHS, Reference Oeppen2023; Office for National Statistics Reference Park, Choi and Lee2021), although forecasts suggest that an expected decline in the major cardiovascular causes (myocardial infarction and pulmonary heart disease) will be offset by forecast increases in the “other heart” cause category. Again, results from the $\alpha$-transformation follow the observed data over time more closely compared to CLR and ILR with zeros removed. Moreover, the standard LRA approaches, where a value of 0.25 or 0.5 was added to the zeros, tended to forecast higher proportions for causes with the lowest proportion of deaths (in this case, cardiac arrest), which is offset by lower forecast proportions across all other causes (results shown in Appendix A.3). This result is consistent with the fact that these approaches arbitrarily introduce small death counts where there are none.
4.3 Results: US data
For the larger US cause-of-death dataset, Fig. 5 shows the movement in actual proportion of deaths for each cause over the historical data for US death counts, in a similar way to Fig. 3.
The $\alpha$-transformation results followed the observed data over time more closely compared to CLR and ILR with zeros removed. This is shown in Table 3 and Fig. 6. Moreover, the standard LRA approaches, where a value of 0.25 or 0.5 was added to the zeros, tended to forecast higher proportions for causes with the lowest proportion of deaths (in this case, cardiac arrest), which is offset by lower forecast proportions across all other causes (results shown in Appendix A.3). This result was consistent with the arbitrary introduction of a small death count where none existed. More importantly, compared with England and Wales data, the forecast performance using the $\alpha$-transformation was even further improved in the application. Indeed, it suggests that, with a larger volume of data available, our proposed approach can exhibit greater forecasting performance compared to existing log-ratio transformation approaches.
In summary, the point forecast results across both applications suggested that the $\alpha$-transformation, a generalization of the log-ratio transformation to a broader class of transformations, was an effective way to address zero counts in compositional data, especially compared to ad hoc methods of adding small death counts. In Appendix A.4, we performed a sensitivity analysis to assess how much the performance of the two applications depended on the precise $\alpha$ value chosen. Overall, results showed that forecasting performance was largely unaffected when the value of $\alpha$ changed within the tolerance of 0.1 that we employed when tuning this parameter in Section 4.1.
4.4 Interval forecasts
To further understand the projected deaths using the $\alpha$-transformation, we used interval forecasts to quantify the uncertainty around the point forecast and a further source of (probabilistic) comparison between different methods across both applications of the HCD data (i.e. England and Wales and US death counts). Briefly, the interval forecasts were produced by adapting the proposed method of Shang and Haberman (Reference Tang, Wu, Yang and Tian2020) for use with the CLR, ILR, and $\alpha$-transformations and involved the following steps.
(I) Transform the compositional data into the real space using the three methods explored (CLR, ILR, and the proposed $\alpha$-transformation). Construct the point forecast as per Sections 4.2 and 4.3.
(II) Bootstrap (sample with replacement) the forecast component scores (i.e., $b_{u, c}$ or the age- and cause-specific coefficients that vary over time) and the model fit errors (i.e., $\epsilon _{t, u, c}$) in equation (5). By doing this a large number of times and then taking the empirical quantiles (here, 90% intervals are shown), upper and lower bounds for the interval forecast in real space are produced.
(III) Transform the interval forecast from the real space to the simplex for inference using the corresponding inverse CLR, ILR, or $\alpha$-transformations. Finally, add back the geometric mean as per the original point estimate approach discussed per equation (5).
Results for the interval forecasts for both applications are presented in Figs. 7 and 8, where the $\alpha$ parameters used in producing interval forecasts were optimized via the interval score approach of Shang and Haberman (Reference Tang, Wu, Yang and Tian2020). Overall, the results across CLR, ILR, and the $\alpha$-transformation were largely consistent with the corresponding point forecast results shown previously in Figs. 4 and 6. Nevertheless, the interval forecast offers an additional view of uncertainty around the point forecast and reflects the possible extents to which the composition of mortality across different causes could change into the future based on each model.
4.5 Alternative approaches and future directions
In this section, we consider an alternative approach to the $\alpha$-transformation for forecasting mortality by cause. Specifically, we consider the multinomial logistic regression (MLR) of Alai et al. (Reference Alai, Arnold and Sherris2015) and compare its forecast performance to the $\alpha$-transformation.
The MLR model is often used to detect factors significantly influencing a response with several competing outcomes. In the literature, numerous applications of the MLR model have been undertaken in cause-of-death analysis over the past three decades. For example, Eberstein et al. (Reference Eberstein, Nam and Hummer1990) used eight categorical and continuous independent variables, including marital status, education, and birth weight, to model five infant cause-specific mortality rates. Lawn et al. (Reference Lawn, Wilczynska-Ketende and Cousens2006) applied MLR to model the distribution of neonatal deaths in countries with poor data (see Johnson et al., Reference Johnson, Liu, Fischer-Walker and Black2010, for related work). Shahraz et al. (Reference Shang and Haberman2013) employed MLR to redistribute unknown or ill-defined deaths, while Park et al. (Reference Racine2006) used it to account for the impact of the tenth revision of the ICD.
For illustrative purposes, we applied the MLR model to US male cause-of-death counts only, disaggregated for cardiovascular causes as per the application in Section 4.3. The forecast performance from applying MLR was assessed using the sum of the squared residual errors. Based on this, we found that the single and simple MLR performed best when compared against the quadratic and cubic MLR. We present results for these in Figs. 9 and 10, which are analogous to those presented earlier in Figs. 5 and 6. Note in assessing the fits, the problem of zeros was still present in the actual death rates by cause; we handled this by adding a 0.01 death count before calculating mortality rates and taking logarithms.
To compare with the forecast performance using CODA methods and shown in Table 3, we calculated the equivalent RMSE and MAE (scaled by 100) for the MLR application to US male death counts. In this application, the simple MLR produced RMSE and MAE of 2.289 and 1.126, whereas the single MLR produced RMSE and MAE of 2.040 and 1.038. This is substantially higher than the errors of 0.2877 and 0.1299 when we apply the CODA method using an $\alpha$-transformation. We conjecture similar results would also arise for the case of the US female cause-of-death count data, as well as the England and Wales data. Overall, the comparison indicates that, perhaps not surprisingly, CODA approaches perform better when forecasting using compositional data.
Beyond the MLR model, another method to address the problem of zeros in composition data is applying the Dirichlet distribution. This idea has previously been explored by Tsagris and Stewart (Reference Tsagris and Stewart2018) and Graziani and Nigri (Reference Graziani and Nigri2023) in modifying the log-likelihood of the Dirichlet distribution. Such approaches have been applied in other fields, including biology and chromosome detection Tang et al. (Reference Tsagris and Stewart2022). Further exploration of the Dirichlet composition distribution in understanding mortality by cause would further the understanding of mortality forecasting by cause. Finally, a forecast reconciliation approach can be adopted to ensure forecast coherence instead of applying CODA (Li et al. Reference Li, Li, Lu and Panagiotelis2019). Such approaches address the potential problems arising when subpopulation mortality forecasts do not sum up to the aggregate forecast, and they could be considered an alternative approach where there are few zeros in subgroups.
5. Conclusion
In this paper, we have introduced the $\alpha$-transformation, coupled with an LC model for mortality modeling, as a statistical method to handle cause-of-death compositional data with zero values. Using an expanding window cross-validation approach to select $\alpha$, we presented two applications to death counts by cause, disaggregated for cardiovascular causes in England and Wales data from 2001 to 2016 and on US data from 1979 to 2021. Forecasts using the $\alpha$-transformation tend to perform better than those produced using standard log-ratio transformations and is particularly evident in the application to US death counts by cause, having more years of historical data.
We tested a single model (LC) in the compositional framework and focused on heart-related causes of death, where the dataset includes zero counts for several years and age bands. Mortality forecasting by cause may be further enhanced by combining the $\alpha$-transformation with variations of the LC model, for example, a model which decomposes cause-specific variation into joint and individual variation (Kjaergaard et al., Reference Kjaergaard, Ergemen, Kallestrup-Lamb, Oeppen and Lindahl-Jacobsen2019), or using nonparametric techniques such as smoothers or tensor decompositions (Zhang et al., Reference Zhang, Huang, Hui and Haberman2023). Also, rather than adding a small death count or removing zeros entirely, other approaches could be compared against the $\alpha$-transformation, including “borrowing” from a neighboring age (for the same cause) or smoothing over similar causes (for the same age), along with other imputation methods (Lubbe et al., Reference Martin-Fernandez, Barcelo-Vidal and Pawlowsky-Glahn2021). We leave such investigations as avenues for future study.
One feature of the death counts by cause for both England and Wales and the USA, which is true of many other cause-of-death datasets in other countries, is that zero counts of death for multiple causes tend to occur across consecutive years and/or adjacent age groups. In other settings with fewer or no zeros count, and where the occurrence of the zeros is more sporadic, simpler approaches, such as adding a small value to enable LRA may have fewer implications on the analysis and conclusions relative to using the $\alpha$-transformation (Tsagris & Stewart, Reference Tsagris, Preston and Wood2022). Conversely, for older ages and emerging causes with only recent data (including COVID-19), reflecting true zeros in the data in a statistically more rigorous and data-driven manner, as the $\alpha$-transformation does, is expected to produce more accurate forecasts.
While CODA is useful in capturing dependencies between causes arising due to the compositional nature of the data, other dependencies, such as comorbidities, can arise irrespective of how the data are treated. An important avenue of future research is how methods such as $\alpha$-transformation could be coupled with techniques that can account for such dependencies. Indeed, an essential application of CODA for life insurers is to enhance the understanding of morbidity and mortality risks. CODA can also be used to investigate the risk implications across different subgroups of insured lives and exposures, and we anticipate the $\alpha$-transformation will play a useful role in modeling compositional data arising from these other settings. Finally, the results from the application suggest that while the aggregate cardiovascular death counts are expected to reduce, some granular causes of death within the cardiovascular cause are expected to increase, particularly for males across England and Wales. Analysis using US death counts indicate slight decreases across all granular cardiovascular causes. These findings should be further investigated, along with other causes.
Acknowledgments
The authors are grateful for the comments from the participants at the Insurance Data Science conference in 2023 and the Conference in Celebration of David Wilkie’s 90th birthday in 2024.
Funding statement
The second author was supported by an Australian Research Council Discovery Project, DP230102250, and the third author was supported by an Australian Research Council Discovery Project, DP230101908.
Data availability statement
The data and code that support the findings of this study are openly available at https://github.com/zm-dong/coda_cause_mortality.
A. Supplementary Information and Results
A.1 The Lee-Carter model for modeling mortality
This paper applies the LC mortality model after LRA and the $\alpha$-transformation (Lee & Carter, Reference Lee and Carter1992). For completeness, this appendix provides a brief review of the LC model for analysis of non-compositional data, that is, RDA.
Treating causes independently, the LC model fits and predicts central mortality rates by expressing the log mortality rate as a linear function of a time factor with age parameters. For cause $c$, let $m_{t, u, c}$ denote the central death rate for age $u$ in year $t$, which we compute as $m_{t, u, c} = d_{t, u, c}/L_{t, u}$, where the denominator $L_{t, u}$ is the exposure of person-years lived at age $u$. The LC model is then defined as
where $\mu _{u, c}$ represents an age- and cause-specific average mortality over time, $b_{u, c}$ denotes the age- and cause-specific coefficients that vary over time; $k_{t, c}$ denotes a factor of time-varying indices for the level of mortality, and the $\epsilon _{t, u, c}$ denote residual error terms. The model is typically fitted by applying a singular value decomposition to a $U \times T$ matrix the elements of which are given by $\ln (m_{t, u, c})$, after subtracting the average mortality rate over time for a given cause. After fitting, mortality forecasting is performed by modeling the estimated time factors $k_{t, i}$ as an autoregressive integrated moving average time series. The common choice is a simple random walk with drift. We refer the reader to Lee and Carter (Reference Lee and Carter1992) for more details regarding parameter estimation of the LC model.
The LC model is commonly used for national forecasts, with its primary advantages including its simplicity, ability to deal with uncertainty, and low requirement for subjective judgment (Bergeron-Boucher & Kjærgaard, Reference Bergeron-Boucher and Kjærgaard2022). With its simplicity comes a number of limitations, and consequently many variations of LC exist to improve its performance. Among many others, examples include Renshaw and Haberman (Reference Renshaw and Haberman2003), which generalized the LC model to include more than one factor; the Cairns et al. (Reference Cairns, Blake and Dowd2006) model, which is a popular alternative that models the probability of survival rather than the $\log _{10}$ mortality rates; the Lee and Miller (Reference Lee and Miller2001) and Booth et al. (Reference Booth, Maindonald and Smith2002) models, both of which aim to improve the forecasting performance of the LC model (Booth et al., Reference Booth, Tickle and Smith2005); and Li and Lee (Reference Lubbe, Filzmoser and Templ2005) and Gao and Shi (Reference Gao and Shi2021), who apply coherence in the context of mortality modeling and age-coherent extensions of LC, respectively.
A.2 Additional results for the application to the HCD database
Table A.1 shows the results from cross-validation for England and Wales’s cause-of-death data. Based on cross-validation, we determined the optimal $\alpha$ value is 0.1 for males and 0.8 for females. This was then applied to produce the results in Section 4.2.
Table A.2 similarly shows the results from cross-validation for US death counts by cause to determine the optimal $\alpha$. Based on cross-validation, we determined the optimal $\alpha$ value is 0.7 for males and 0.9 for females. This was applied to produce the results in Section 4.3.
A.3 Additional results comparing forecast performance using CLR and ILR transformations with different techniques to replace zero counts
We further compared the performance of CLR and ILR forecasts when zero counts are replaced by 0.25 or 0.5 for both England and Wales and US death counts by cause of death. This was applied for both male and female death counts on both sets of data for completeness. These results are included in Sections 4.2 and 4.3. For England and Wales, Figs. A.1 and A.2 show the visualizations of the forecasts when different zero replacement approaches are used. The forecast and trends change and are sensitive to the method of zero replacement. The $\alpha$-transformation presents a statistical approach that removes this sensitivity.
Similarly, for US death counts, Figs. A.3 and A.4 show visualizations of the forecasts when different zero replacement approaches are used. It is worth noting that longer term trends are still impacted by different approaches to replace zeros, despite the US dataset having a longer history compared to the England and Wales death counts by cause.
A.4 Sensitivity analysis of the choice of $\alpha$
A sensible question to ask then is how sensitive were the results to the particular chosen values of $\alpha$ as long as we were within this tolerance range. Based on additional testing, we found that results remain largely unaffected when $\alpha$ was specified within 0.1.
The optimal $\alpha$ for England and Wales death counts (Section 4.2) is 0.1 for males, resulting in RMSE and MAE of 0.1818 and 0.1046, respectively. For $\alpha = 0.09$, the resulting RMSE and MAE are 0.1832 and 0.1055. For $\alpha = 0.11$, the resulting RMSE and MAE are 0.1806 and 0.1037. Here, the results improve when $\alpha = 0.11$, compared to specifying $\alpha$ to the nearest 0.1. However, the resulting inferences around mortality forecasts by cause are unchanged.
We perform a similar exercise on the optimal alphas for US data, where there is a longer history of death counts. For example, the optimal $\alpha$ for US females (Section 4.3) is 0.9, resulting in RMSE and MAE of 0.2516 and 0.1238, respectively. For $\alpha = 0.91$, the resulting RMSE and MAE are 0.2528 and 0.1243. For $\alpha = 0.89$, the resulting RMSE and MAE are 0.2504 and 0.1233, an improvement to the selected optimal $\alpha = 0.90$. Moreover, the resulting inferences from the forecast were largely unchanged in terms of shape and trend in the forecast of cause-specific mortality.