Collated mortality insights

These are the collated mortality insights from all my blog articles.

Insight 1. Always allow for overdispersion

If you don’t allow for overdispersion then you will underestimate uncertainty and overfit models.

[Original article]

Insight 2. Experience data is ‘measurable‘

Provided we use measures, we’ll always get the same answer regardless of how an experience dataset is partitioned.

In particular, there is no need

for experience time periods to be contiguous¹ – the sole requirement is that elements of the experience datasets do not intersect, or
to track individuals across experience datasets relating to different time periods².

[Original article]

Insight 3. The continuous time definitions of A and E are canonical

The continuous time definitions of \(A\) and \(E\) are measures and the canonical definitions of actual and expected deaths.

Other definitions can lead to confusion – usually over \(\text{E}\) vs true expectation – and spurious complexity.

[Original article]

Insight 4. The expected value of A−E is zero

If \(\mu\) is the true mortality then the expected value of \(\text{A}f-\text{E}f\) is zero, i.e.

\[\mathbb{E}\big(\text{A}f-\text{E}f\big)=0\]

for any variable \(f\) (even if \(f\) was used to fit the mortality in question), and
for any subset of the experience data (provided the choice of subset does not depend on E2R information).

[Original article]

Insight 5. The same machinery that defines A−E can be used to estimate its uncertainty

If \(\mu\) is the true mortality then, before allowing for overdispersion, the variance of \(\text{A}f-\text{E}f\) equals the expected value of \(\text{E}f^2\), i.e.

\[\text{Var}\big(\text{A}f-\text{E}f\big)=\mathbb{E}\big(\text{E}f^2\big)\]

Allowing for overdispersion \(\Omega\), this becomes

\[\text{Var}\big(\text{A}f-\text{E}f\big)=\Omega\,\mathbb{E}\big(\text{E}f^2\big)\]

Caveat: \(f\) is an ad hoc reallocation of log-likelihood; it is not relevance. For the version of this insight that does take account of relevance, see Insight 17.

[Original article]

Insight 6. A/E variance increases with concentration

\(\sqrt{\text{E}w^2} / \text{E}w\), where \(w\ge0\) is a useful and recurring measure of effective concentration in relation to mortality uncertainty. It implies that the more concentrated the experience data (in some sense) then the greater the variance of observed mortality.

Using unweighted variance without adjustment to estimate weighted statistics will likely understate risk.

[Original article]

Insight 7. Log-likelihood can be defined directly in terms of the \(\text{A}\) and \(\text{E}\) operators

The log-likelihood written in terms of the \(\text{A}\) and \(\text{E}\) operators is

\[L=\text{A}w\log\mu-\text{E}w\]

where \(w\ge0\) is the weight variable.

(This is before allowing for overdispersion.)

[Original article]

Insight 8. Proportional hazards models are probably all you need for mortality modelling

The proportional hazards model

\[\mu(\beta) = \mu^\text{ref}\exp\Big(\beta^\text{T}X\Big)\]

is

highly tractable, and
sufficiently powerful to cope with almost all practical mortality modelling problems.

[Original article]

Insight 9. An estimate of the variance of the fitted parameters for a proportional hazards mortality model is available in closed form for any ad hoc log-likelihood weight

\[\text{Var}\big(\hat\beta\big)\mathrel{\hat=} \Omega\,\mathbf{I}^{-1}\mathbf{J}\mathbf{I}^{-1}\]

where \(\hat\beta\) is the maximum likelihood estimator of the covariate weights, \(X\) is the vector of covariates, \(w\ge0\) is the log-likelihood weight, \(\mathbf{I}=\text{E}wXX^\text{T}\), \(\mathbf{J}=\text{E}w^2XX^\text{T}\) and \(\Omega\) is overdispersion

Caveat: \(w\) is an ad hoc reallocation of log-likelihood; it is not relevance. For the version of this insight that does take account of relevance, see Insight 17.

[Original article]

Insight 10. A penalised log-likelihood for a proportional hazards mortality model is available in closed form for any ad hoc log-likelihood weight

\[L_\text{P}= L(\hat\beta)-\text{tr}\big(\mathbf{J}\mathbf{I}^{-1}\big)\]

where \(\hat\beta\) is the maximum likelihood estimator of the covariate weights, \(X\) is the vector of covariates, \(L\) is the log-likelihood (which has already been adjusted for overdispersion), \(w\ge0\) is the log-likelihood weight, \(\mathbf{I}=\text{E}wXX^\text{T}\) and \(\mathbf{J}=\text{E}w^2XX^\text{T}\).

Caveat: \(w\) is an ad hoc reallocation of log-likelihood; it is not relevance. For the version of this insight that does take account of relevance, see Insight 17.

[Original article]

Insight 11. Adjusting globally for overdispersion is reasonable and straightforward

If \(\Omega\) is global overdispersion then:

A standard method for allowing for overdispersion is to scale log-likelihood by \(\Omega^{-1}\) and variances by \(\Omega\).
Suitable default values for mortality experience data are \(2\le\Omega\le3\).
Use the same \(\Omega\) for all candidate models being tested, including when \(\Omega\) is being estimated from the experience data at hand.

[Original article]

Insight 12. Rating factors must be coherent

In order for a function of information associated with individuals to be valid as a rating factor, it must be coherent, which means:

No foreknowledge of death
Correspondence between exits and survivors
Comparability between individuals
Comparability by time

[Original article]

Insight 13. Take care when using pension as a rating factor

Be wary of phrases like ‘just use pension as a covariate’ because it trivialises the problems involved in making pension a coherent rating factor:

Pensions for individuals in different pension plans are not directly comparable. For general pension plan mortality models consider using leave-one out cross validation to understand this risk and/or using an alternative approach.
Pensions as at date of exit need careful adjustment to be consistent with pensions of survivors (which can be non trivial for UK DB plans).
Pensions for actives require additional consideration in relation to potential future accrual.
Consideration needs to be given to whether or how to adjust pensions for inflation (typically since retirement). This is more of an issue in pension systems where indexation of pensions in payment is less common (e.g. the USA).
Do not assume that longevity always increases with benefit amount.

[Original article]

Insight 14. The bulk of pension plan mortality variation can be captured on a monotonic one dimensional scale

Modelling base mortality for UK DB pension plans can be reasonably reduced to modelling a single parameter for each of male pensioners, female retirees and female dependants, i.e.

\[\mu_{it}(\beta)= \mu_{it}^\text{ref} \exp\big(\beta\psi_x\big)\]

where \(x\) is age as a function of birth date from individual data \(i\) and time \(t\), \(\mu_{it}^\text{ref}\) is a common base mortality and \(\psi_x\) is a common (non-negative) pattern of mortality variation by age that tends to zero at high ages.

[Original article]

Insight 15. Weighted log-likelihood automatically estimates liabilities correctly for single scalar parameter models when provided with relevance

If (a) a mortality model has a single scalar parameter and (b) relevance is provided then maximising log-likelihood weighted by

\[w_{it}=\sum_{j\in\text{Val}} r_{it}^{jt_0} \, I_j^{-1} v'_j\]

automatically results in the best estimate of the present value of liabilities.

In the above,

\(r_{it}^{jt_0}\) is the relevance of the log-likelihood of the E2R of individual \(i\) at time \(t\) to individual \(k\) in the valuation data as at the valuation date \(t_0\),
\(I_j\) is the relevant information matrix for valuation individual \(j\), and
\(v'_j\) is derivative of liability value for valuation individual \(j\) with respect to the model parameter \(\beta\).

For further definitions, see article body.

[Original article]

Insight 16. A different weight is required to determine uncertainty in the presence of relevance

The log-likelihood weight to determine uncertainty that corresponds to the best estimate weight in Insight 15 is

\[u_{it}=\sum_{j\in\text{Val}} \sqrt{r_{it}^{jt_0}} \, I_j^{-1} v'_j\]

[Original article]

Insight 17. Always allow for time-based relevance

Allowing for time-based relevance, e.g. using

\[r_s^t=\exp\!\big(\!-\phi\,\big|t-s\big|\big)\]

where \(s\) and \(t\) are dates (measured in years) and \(\phi>0\), is to be preferred in all mortality modelling contexts because

it automatically allows for the decay in relevance as time elapses, and
compared with fixed windows, leaves models less sensitive to the falling away of more historical data.

If relevance is purely time-based then this can be accomplished simply by scaling the experience data.

[Original article]

Insight 18. Use relevance for calibrating and selecting DB pensioner base mortality models

Using the weights by \(w\) and \(u\) as defined in Insights 15 and 16 respectively to calibrate and select DB pensioner base mortality models

takes explicit account of liability impact, and
defaults to sensible results regardless of the quantum of experience data available.

The following Insights need to be restated to accommodate relevance:

Insight 5 (allowing for overdispersion \(\Omega\)) becomes

\[\text{Var}\big(\text{A}w-\text{E}w\big)=\Omega\,\mathbb{E}\big(\text{E}u^2\big)\]

The equations for Insights 9 and Insights 10 are unchanged as

\[\begin{aligned}
\text{Var}\big(\hat\beta\big)\mathrel{\hat=} \Omega\,\mathbf{I}^{-1}\mathbf{J}\mathbf{I}^{-1}
\\[1em]
L_\text{P}= L(\hat\beta)-\text{tr}\big(\mathbf{J}\mathbf{I}^{-1}\big)
\end{aligned}\]

But \(\mathbf{J}\) is redefined as \(\text{E}u^2XX^\text{T}\), i.e. weighted by \(u^2\) rather than \(w^2\).

[Original article]

Insight 19. Prefer amounts-weighted to lives-weighted log-likelihood

For DB pensioner mortality analysis, prefer statistics weighted by pension amount over lives-weighted.

When there is a lot of experience data it won’t matter; when there is only a little it will mitigate biased liability value estimates.

[Original article]

An obvious example is excluding mortality experience from the height of the COVID-19 pandemic, potentially resulting in non-contiguous data from before and after the excluded time period. ↩
Tracking individuals across experience datasets for different time periods may however be a very sensible data check. ↩