Collated mortality insights

These are the collated mortality insights from all my blog articles.

Insight 1. Always allow for overdispersion

If you don’t allow for overdispersion then you will underestimate uncertainty and overfit models.

[Original article]

Insight 2. Experience data is ‘measurable’

Provided we use measures, we’ll always get the same answer regardless of how an experience dataset is partitioned.

In particular, there is no need

  • for experience time periods to be contiguous¹ – the sole requirement is that elements of the experience datasets do not intersect, or
  • to track individuals across experience datasets relating to different time periods².
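
As a minimal illustration of this additivity, the sketch below simulates a hypothetical experience dataset (all values invented) and checks that the totals of \(\text{A}\) and \(\text{E}\) are unchanged by an arbitrary partition of the records:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical experience dataset: one record per life, with invented values.
n = 10_000
exposure = rng.uniform(0.1, 1.0, n)          # years exposed to risk
mu = 0.02 * np.exp(rng.normal(0, 0.3, n))    # assumed hazard per record
deaths = rng.poisson(mu * exposure)          # simulated deaths

A = deaths.sum()              # actual deaths -- a measure over the records
E = (mu * exposure).sum()     # expected deaths -- also a measure

# Partition the records arbitrarily into 7 non-intersecting chunks.
parts = np.array_split(rng.permutation(n), 7)
A_parts = sum(deaths[p].sum() for p in parts)
E_parts = sum((mu[p] * exposure[p]).sum() for p in parts)

assert A_parts == A               # the partition is irrelevant to the totals
assert np.isclose(E_parts, E)
```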

[Original article]

Insight 3. The continuous time definitions of A and E are canonical

The continuous-time definitions of \(\text{A}\) and \(\text{E}\) are measures and are the canonical definitions of actual and expected deaths.

Other definitions can lead to confusion – usually between the operator \(\text{E}\) and true statistical expectation \(\mathbb{E}\) – and spurious complexity.
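
For concreteness, here is a minimal sketch of those continuous-time definitions, assuming the usual survival-analysis set-up: \(\text{A}f\) sums \(f\) over the observed deaths and \(\text{E}f\) integrates \(f\mu\) over each life's period of exposure. The three lives, the constant hazard and \(f=1\) are purely illustrative:

```python
import numpy as np
from scipy.integrate import quad

# Each life i is observed over [entry_i, exit_i]; died_i flags a death at exit_i.
entry = np.array([0.0, 0.0, 0.5])
exit_ = np.array([1.0, 0.7, 1.0])
died  = np.array([False, True, False])

mu = lambda t: 0.02    # hazard, taken constant here for simplicity
f  = lambda t: 1.0     # any variable of interest; f = 1 recovers plain A and E

# A f: sum f over the observed deaths.
Af = sum(f(t) for t, d in zip(exit_, died) if d)

# E f: integrate f * mu over each life's period of exposure.
Ef = sum(quad(lambda t: f(t) * mu(t), a, b)[0] for a, b in zip(entry, exit_))

print(Af)   # 1 actual death
print(Ef)   # 0.02 * 2.2 years of exposure = 0.044
```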

[Original article]

Insight 4. The expected value of A-E is zero

If \(\mu\) is the true mortality then the expected value of \(\text{A}f-\text{E}f\) is zero, i.e.

\[\mathbb{E}\big(\text{A}f-\text{E}f\big)=0\]
  • for any variable \(f\) (even if \(f\) was used to fit the mortality in question), and
  • for any subset of the experience data (provided the choice of subset does not depend on E2R information).
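
The sketch below checks this numerically under a piecewise-constant hazard, where each record contributes \(\text{Poisson}(\mu\times\text{exposure})\) deaths; the data, hazards and the variable \(f\) are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

n = 5_000
exposure = rng.uniform(0.1, 1.0, n)
mu = 0.03 * np.exp(rng.normal(0, 0.4, n))   # the *true* hazard per record
f = rng.normal(1.0, 2.0, n)                 # any variable attached to each record

Ef = (f * mu * exposure).sum()              # E f is fixed given mu and f

diffs = []
for _ in range(2_000):
    deaths = rng.poisson(mu * exposure)     # simulate the experience
    diffs.append((f * deaths).sum() - Ef)   # A f - E f

print(np.mean(diffs))   # close to zero relative to the spread below
print(np.std(diffs))
```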

[Original article]

Insight 5. The same machinery that defines A-E can be used to estimate its uncertainty

If \(\mu\) is the true mortality then, before allowing for overdispersion, the variance of \(\text{A}f-\text{E}f\) equals the expected value of \(\text{E}f^2\), i.e.

\[\text{Var}\big(\text{A}f-\text{E}f\big)=\mathbb{E}\big(\text{E}f^2\big)\]

Allowing for overdispersion \(\Omega\), this becomes

\[\text{Var}\big(\text{A}f-\text{E}f\big)=\Omega\,\mathbb{E}\big(\text{E}f^2\big)\]
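
Continuing the simulation style of insight 4, the sketch below (invented data, no overdispersion, so \(\Omega=1\)) checks that the simulated variance of \(\text{A}f-\text{E}f\) matches \(\text{E}f^2\):

```python
import numpy as np

rng = np.random.default_rng(3)

n = 5_000
exposure = rng.uniform(0.1, 1.0, n)
mu = 0.03 * np.exp(rng.normal(0, 0.4, n))   # the true hazard per record
f = rng.normal(1.0, 2.0, n)

Ef  = (f * mu * exposure).sum()
Ef2 = (f**2 * mu * exposure).sum()          # E f^2 -- the predicted variance

sims = [(f * rng.poisson(mu * exposure)).sum() - Ef for _ in range(4_000)]
print(np.var(sims), Ef2)                    # agree up to simulation noise
```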

[Original article]

Insight 6. A/E variance increases with concentration

\(\sqrt{\text{E}w^2} / \text{E}w\), where \(w\ge0\) is the weight variable, is a useful and recurring measure of effective concentration in relation to mortality uncertainty. It implies that the more concentrated the experience data is (in some sense), the greater the variance of observed mortality.

Using an unweighted variance, without adjustment, to estimate the variance of weighted statistics will likely understate risk.
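
To illustrate, the sketch below computes \(\sqrt{\text{E}w^2}/\text{E}w\) for a lives-weighted book (\(w=1\)) and for a hypothetical amounts-weighted book with a skewed distribution of benefit amounts; all values are invented:

```python
import numpy as np

rng = np.random.default_rng(4)

n = 20_000
exposure = rng.uniform(0.5, 1.0, n)
mu = 0.02                                    # constant hazard, for simplicity

def concentration(w):
    """sqrt(E w^2) / E w for a weight variable w >= 0."""
    Ew  = (w * mu * exposure).sum()
    Ew2 = (w**2 * mu * exposure).sum()
    return np.sqrt(Ew2) / Ew

print(concentration(np.ones(n)))                # lives-weighted: w = 1
print(concentration(rng.lognormal(0, 1.5, n)))  # skewed amounts: far more concentrated
```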

[Original article]

Insight 7. Log-likelihood can be defined directly in terms of the \(\text{A}\) and \(\text{E}\) operators

The log-likelihood written in terms of the \(\text{A}\) and \(\text{E}\) operators is

\[L=\text{A}\big(w\log\mu\big)-\text{E}w\]

where \(w\ge0\) is the weight variable.

(This is before allowing for overdispersion.)
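
As a sketch, the code below evaluates this log-likelihood for a constant hazard \(\mu\) with an ad hoc weight \(w\) per record (all data invented), and confirms that it is maximised at the \(w\)-weighted occurrence-exposure rate:

```python
import numpy as np

rng = np.random.default_rng(5)

n = 10_000
exposure = rng.uniform(0.1, 1.0, n)
w = rng.lognormal(0, 0.5, n)                 # log-likelihood weight, w >= 0
true_mu = 0.025
deaths = rng.poisson(true_mu * exposure)

def log_likelihood(mu):
    Aw_log_mu = (w * deaths * np.log(mu)).sum()   # A applied to w * log(mu)
    Ew        = (w * mu * exposure).sum()         # E applied to w
    return Aw_log_mu - Ew

mu_hat = (w * deaths).sum() / (w * exposure).sum()   # w-weighted occurrence-exposure rate
grid = np.linspace(0.01, 0.05, 401)
print(mu_hat, grid[np.argmax([log_likelihood(m) for m in grid])])   # agree to grid resolution
```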

[Original article]

Insight 8. Proportional hazards models are probably all you need for mortality modelling

The proportional hazards model

\[\mu(\beta) = \mu^\text{ref}\exp\Big(\beta^\text{T}X\Big)\]

is

  • highly tractable, and
  • sufficiently powerful to cope with almost all practical mortality modelling problems.
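
The sketch below fits a toy proportional hazards model by maximum likelihood, using a Poisson-style log-likelihood equivalent to insight 7 with \(w=1\); the reference hazard, the single covariate and the true \(\beta\) are all invented for illustration:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)

n = 20_000
X = np.column_stack([np.ones(n), rng.integers(0, 2, n)])   # intercept + one factor
beta_true = np.array([-0.1, 0.3])
exposure = rng.uniform(0.2, 1.0, n)
mu_ref = 0.02                                              # reference hazard
deaths = rng.poisson(mu_ref * np.exp(X @ beta_true) * exposure)

def neg_log_lik(beta):
    mu = mu_ref * np.exp(X @ beta)                 # mu(beta) = mu_ref * exp(beta'X)
    return -(deaths * np.log(mu) - mu * exposure).sum()

fit = minimize(neg_log_lik, x0=np.zeros(2), method="BFGS")
print(fit.x, beta_true)     # the fitted beta approximately recovers the true values
```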

[Original article]

Insight 9. An estimate of the variance of the fitted parameters for a proportional hazards mortality model is available in closed form for any ad hoc log-likelihood weight

\[\text{Var}\big(\hat\beta\big)\mathrel{\hat=} \Omega\,\mathbf{I}^{-1}\mathbf{J}\mathbf{I}^{-1}\]

where \(\hat\beta\) is the maximum likelihood estimator of the covariate weights, \(X\) is the vector of covariates, \(w\ge0\) is the log-likelihood weight, \(\mathbf{I}=\text{E}wXX^\text{T}\), \(\mathbf{J}=\text{E}w^2XX^\text{T}\) and \(\Omega\) is the overdispersion.

Caveat: \(w\) is an ad hoc reallocation of log-likelihood; it is not relevance.
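
A minimal sketch of the formula, assuming per-record arrays of covariates \(X\), weights \(w\), fitted hazards and exposures are available from the model fit (the names below are placeholders, not the original article's code):

```python
import numpy as np

def sandwich_variance(X, w, mu_hat, exposure, omega):
    """Omega * I^-1 J I^-1 with I = E(w X X') and J = E(w^2 X X')."""
    e = w * mu_hat * exposure                # per-record contribution under E
    I = (X * e[:, None]).T @ X               # E(w   X X')
    J = (X * (w * e)[:, None]).T @ X         # E(w^2 X X')
    I_inv = np.linalg.inv(I)
    return omega * I_inv @ J @ I_inv
```

Note that with \(w=1\) everywhere \(\mathbf{J}=\mathbf{I}\), so the formula reduces to \(\Omega\,\mathbf{I}^{-1}\), the usual inverse-information variance scaled for overdispersion.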

[Original article]

Insight 10. A penalised log-likelihood for a proportional hazards mortality model is available in closed form for any ad hoc log-likelihood weight

\[L_\text{P}= L(\hat\beta)-\text{tr}\big(\mathbf{J}\mathbf{I}^{-1}\big)\]

where \(\hat\beta\) is the maximum likelihood estimator of the covariate weights, \(X\) is the vector of covariates, \(L\) is the log-likelihood (which has already been adjusted for overdispersion), \(w\ge0\) is the log-likelihood weight, \(\mathbf{I}=\text{E}wXX^\text{T}\) and \(\mathbf{J}=\text{E}w^2XX^\text{T}\).

Caveat: \(w\) is an ad hoc reallocation of log-likelihood; it is not relevance.
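
A corresponding sketch, taking \(\mathbf{I}\), \(\mathbf{J}\) and the maximised, overdispersion-adjusted log-likelihood as given from the model fit:

```python
import numpy as np

def penalised_log_likelihood(L_hat, I, J):
    """L_P = L(beta_hat) - tr(J I^-1)."""
    return L_hat - np.trace(J @ np.linalg.inv(I))
```

With \(w=1\) everywhere, \(\text{tr}\big(\mathbf{J}\mathbf{I}^{-1}\big)\) is simply the number of fitted parameters, i.e. an AIC-style penalty.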

[Original article]

Insight 11. Adjusting globally for overdispersion is reasonable and straightforward

If \(\Omega\) is global overdispersion then:

  1. A standard method for allowing for overdispersion is to scale log-likelihood by \(\Omega^{-1}\) and variances by \(\Omega\).

  2. Suitable default values for mortality experience data are \(2\le\Omega\le3\).

  3. Use the same \(\Omega\) for all candidate models being tested, including when \(\Omega\) is being estimated from the experience data at hand.
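
As a sketch of the mechanics (the value of \(\Omega\) below is just one choice from the suggested range):

```python
OMEGA = 2.5   # global overdispersion, applied identically to every candidate model

def adjusted_log_likelihood(raw_log_lik, omega=OMEGA):
    return raw_log_lik / omega          # scale log-likelihood by Omega^-1

def adjusted_variance(raw_variance, omega=OMEGA):
    return omega * raw_variance         # scale variances by Omega
```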

[Original article]


  1. An obvious example is excluding mortality experience from the height of the COVID-19 pandemic, potentially resulting in non-contiguous data from before and after the excluded time period. 

  2. Tracking individuals across experience datasets for different time periods may however be a very sensible data check.