
Mortality: Measures matter

This is the first in a series of articles outlining mortality experience analysis fundamentals, by which I mean estimating the underlying mortality for individuals in a defined group (e.g. members of a pension plan) from mortality experience data.

This will be fairly technical, but I’ll aim

  • to be concise,
  • to pull out the key insights, including what does and doesn’t matter in practice, and
  • to map concepts one-to-one to the process of actually carrying out a mortality experience analysis or calibrating and selecting mortality models.

A lot of this is reasonably well known, but it is not always available in one place or easily accessible, and sometimes key points are omitted. It won’t be all plain sailing, and there’ll be a few potentially contentious and maybe even surprising points along the way.

In this first article, I’ll set out the foundations.

Articles in this series

  1. Measures matter
  2. A over E

Definitions

An exposed-to-risk (E2R) for an individual comprises

  • an exposure period \([\nu,\tau)\)2,3 throughout which the individual was alive, and
  • an indicator \(\delta\), which is \(1\) if the individual died at \(\tau\) and \(0\) otherwise.

I’ll write an E2R as \(\varepsilon=(\nu,\tau,\delta)\).

For each individual we also have a set of facts, \(i\), that are known at the time of the analysis and are time-invariant.

If a fact relates to something that would have happened except that the individual died, e.g. pension in payment in 2025 for an individual who died in 2024, then we assume this fact is determined as if the individual had lived. This is critical for unbiased analysis and is sometimes referred to as ‘the principle of correspondence’1.

An experience dataset \(\mathscr{E}\) comprises pairs of facts and E2Rs, i.e. \(\{(i,\varepsilon)\}\), such that no two E2Rs for the same individual overlap in time4.

A variable is a real-valued function \(f(i,t)\) of facts \(i\) and time \(t\)5. A variable is not random – by assumption, the sole source of stochasticity is whether individuals die or not, which is embedded in \(\varepsilon\).

A mortality, \(\mu\), is a strictly positive variable6 that specifies the probability of an individual dying over an infinitesimal time interval \(\text{d}t\) as \(\mu(i,t)\,\text{d}t\), i.e. an independent7 Bernoulli trial.
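To make these definitions concrete, the objects above might be represented along the following lines. This is a minimal sketch, assuming Python; the names are purely illustrative rather than a prescribed implementation.

    # A rough sketch (illustrative only) of the objects defined above.
    from dataclasses import dataclass
    from typing import Any, Callable, Mapping

    @dataclass(frozen=True)
    class E2R:
        """Exposed-to-risk: alive throughout [nu, tau); delta = 1 if death at tau."""
        nu: float    # start of exposure period
        tau: float   # end of exposure period
        delta: int   # 1 if the individual died at tau, else 0

    # Facts i are time-invariant and known at the time of the analysis.
    Facts = Mapping[str, Any]

    # An experience dataset: pairs (i, epsilon), with no overlapping E2Rs per individual.
    ExperienceData = list[tuple[Facts, E2R]]

    # A variable f(i, t) and a mortality mu(i, t) are ordinary deterministic
    # real-valued functions of facts and time -- nothing random lives here.
    Variable = Callable[[Facts, float], float]
    Mortality = Callable[[Facts, float], float]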

What is random?

We will assume that the mortality \(\mu\) at hand is itself completely deterministic and the sole source of random variation is whether or not individuals die according to \(\mu\).

Don’t lose sight of the fact that this is a convenient fiction to make the problem tractable and that, in practice, this is never true for mortality data, not least because \(\mu\) itself is also stochastic. This manifests as observed variances being materially higher than would be predicted by a fitted Poisson distribution, and is known as overdispersion.

Insight 1. Always allow for overdispersion

If you don’t allow for overdispersion then you will underestimate uncertainty and overfit models.


The good news is that we can fix things up to allow for overdispersion, which we’ll get to in a later article. In the meantime, whenever you come across references to mortality variance or uncertainty (in this blog or anywhere else), a little voice in your head should be saying ‘remember to allow for overdispersion’.
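As a toy illustration of why that little voice matters (my own sketch, assuming mortality uncertainty enters as a gamma-distributed multiplier on \(\mu\); the proper treatment comes later in the series): letting the mortality itself vary inflates the variance of death counts well beyond what a Poisson model with the same mean predicts.

    # Toy overdispersion demo: Poisson deaths vs Poisson deaths with an uncertain
    # mortality level (gamma multiplier with mean 1 and variance sigma^2).
    import numpy as np

    rng = np.random.default_rng(0)
    expected = 100.0   # 'expected deaths' under the fitted mortality
    sigma = 0.1        # assumed 10% uncertainty in the level of mu

    poisson_only = rng.poisson(expected, size=100_000)
    multiplier = rng.gamma(1 / sigma**2, sigma**2, size=100_000)
    mixed = rng.poisson(expected * multiplier)

    print(poisson_only.var())  # ~100: variance equals the mean
    print(mixed.var())         # ~200: E + sigma^2 * E^2, i.e. overdispersed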

The measures that matter

We can picture experience data as comprising infinitesimals \(\text{d}(i,\varepsilon)\) that can be added up in a couple of different ways. The mathematical approach to this is to define measures on the data. The pay-off is that, provided we use a measure, we can add up functions over experience data any way we like and we’re guaranteed to end up with the same answer.8

Insight 2. Experience data is ‘measurable’

Provided we use measures, we’ll always get the same answer regardless of how an experience dataset is partitioned.

In particular, there is no need

  • for experience time periods to be contiguous9 – the sole requirement is that elements of the experience datasets do not intersect, or
  • to track individuals across experience datasets relating to different time periods10.


I suggested above that there are two things we want to add up. If you’re a practitioner, you’ve already come across them (or at least an approximation to them) and, as we’ll see over this series, pretty much every useful aspect of mortality experience analysis can be expressed directly in terms of them:

  1. Actual deaths is the sum of \(f\) over deaths at time of death:

    \[\text{A}f=\sum_{(i,\varepsilon)\in \mathscr{E}}\delta_\varepsilon f(i,\tau_\varepsilon)\]

    Note that the variable, \(f\), is evaluated at time of death.

  2. Expected deaths11 with respect to mortality \(\mu\) is the integral of \(\mu\) times \(f\) over all exposure periods:

    \[\text{E} f=\sum_{(i,\varepsilon)\in \mathscr{E}}\int_{\nu_\varepsilon}^{\tau_\varepsilon}\!\mu(i,t)f(i,t)\,\text{d}t\]
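Taken literally, the definitions of \(\text{A}f\) and \(\text{E}f\) amount to very little code. Below is a rough sketch building on the illustrative structures above, using a simple trapezoidal rule for the integral; a real implementation would want a more careful numerical integration scheme.

    # Af: sum of delta * f over deaths, with f evaluated at the time of death.
    def actual_deaths(data, f):
        return sum(e.delta * f(i, e.tau) for i, e in data)

    # Ef: integral of mu * f over every exposure period (simple trapezoidal rule).
    def expected_deaths(data, f, mu, steps=1000):
        total = 0.0
        for i, e in data:
            h = (e.tau - e.nu) / steps
            g = [mu(i, e.nu + k * h) * f(i, e.nu + k * h) for k in range(steps + 1)]
            total += h * (0.5 * g[0] + sum(g[1:-1]) + 0.5 * g[-1])
        return total

Note that, because \(\text{A}\) and \(\text{E}\) are measures, splitting any E2R at an intermediate time (with \(\delta\) attached to the sub-period that ends at \(\tau\)) leaves both totals unchanged, up to numerical integration error – which is Insight 2 in code form.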

‘Expected’ is not expectation

\(\text{E}f\) is a random variable (like \(\text{A}f\)), and so describing it as ‘expected’ deaths can give rise to confusion. But the practice is so entrenched that using a different term would be even more confusing.

The terminology arises because ‘expected deaths’ is traditionally an estimate of the expected number of deaths over short but finite time intervals, e.g. years. The definition of \(\text{E}\) here is the same except that it is defined on a grid of infinitesimal time intervals, which discards the complexity of determining survival during finite intervals and makes this the canonical definition.
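To make the contrast concrete (this is just the standard survival identity): for an individual alive at \(t\), the probability of dying within a finite interval \([t,t+h)\) involves the survival function, whereas the corresponding contribution to \(\text{E}f\) (with \(f\equiv1\)) keeps only the first-order term:

\[1-\exp\!\left(-\int_t^{t+h}\!\mu(i,s)\,\text{d}s\right)\;\approx\;\int_t^{t+h}\!\mu(i,s)\,\text{d}s\quad\text{for small }h.\]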

For the avoidance of doubt, I’ll always use \(\mathbb{E}\) to refer to true expectation.

On notation and terminology:

  • The dataset \(\mathscr{E}\) and, for \(\text{E}\), the mortality \(\mu\) are typically implicit from context – it is rare that we need additional notation to make them explicit. But, for the avoidance of doubt, \(\text{E}\) always implies a background mortality \(\mu\).
  • There is a multitude of notations for integrating using measures (see e.g. here), of which \(\int \!f(x)\,\text{M}(\text{d}x)\) and \(\int \!f\,\text{dM}\) are common. But the simplest is \(\text{M}f\), which is what I’ll use.

Although I’ve emphasised that \(\text{A}\) and \(\text{E}\) are measures over experience data, I confess that I don’t use this terminology day-to-day12. In fact, I’ve hardly heard anyone mention ‘measures’ in connection with experience analysis, which may explain why their importance seems to be overlooked and why some practitioners end up confused over the meaning of ‘expected’ (see box out), or whether individuals need to be tracked throughout the experience data.

Insight 3. The continuous-time definitions of A and E are canonical

The continuous-time definitions of \(\text{A}\) and \(\text{E}\) are measures and the canonical definitions of actual and expected deaths.

Other definitions can lead to confusion – usually over \(\text{E}\) vs true expectation – and spurious complexity.


Why include f?

References to actual and expected deaths in mortality analyses are often written simply as \(A\) and \(E\), so why do we have a variable in our definitions?

A simple justification14 is that real-world mortality work requires weighted statistics13:

  1. It is standard to analyse actual and expected deaths weighted by benefit amount (‘amounts-weighted’) as well as unweighted (‘lives-weighted’). So including \(f\) means that we have this requirement covered.

  2. We may15 want to weight data by its relevance (also known as reliability or importance).
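As a concrete illustration of point 1, the standard weightings are just particular choices of \(f\); in the sketch below the fact name annual_pension is hypothetical and stands for whatever benefit amount the dataset records.

    # 'Lives-weighted': every individual counts equally.
    def lives(i, t):
        return 1.0

    # 'Amounts-weighted': weight by benefit amount (fact name is illustrative).
    def amounts(i, t):
        return i["annual_pension"]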

The fundamental reason, though, is that, as noted above, pretty much every useful aspect of mortality experience analysis can be expressed directly in terms of \(\text{A}f\) and \(\text{E}f\).

In the next article I’ll review the properties of \(\text{A}f\) and \(\text{E}f\) and their role in A/E analysis.


  1. From a technical point of view, this principle also means mortality models can’t cheat simply by looking at the data. 

  2. The notation \([\nu,\tau)\) means the interval \(\{t\in\mathbb{R} \mid \nu\le t<\tau\}\).

  3. I’ll take it as a given that we should work in (some representation of) continuous time if at all possible. Otherwise we’d be (a) throwing away data and (b) creating additional cognitive load and potentially biased or even plain wrong results by having to make assumptions about averages. 

  4. Easy to stipulate in theory, but data de-duplication is an essential and sometimes non-trivial part of real world mortality experience analysis. 

  5. We need to place some conditions on the dependence of a variable \(f(i,t)\) on time \(t\).

    The most general is that \(f(i,t)\) is left-continuous with right limits in \(t\), left continuity being required so that the value at death is consistent with the value in the immediately preceding exposure. But this level of generality is impractical for an actual implementation.

    A more useful real-world condition is that \(f(i,t)\) is smooth in \(t\) at the scale of numerical integration. We’ll leave smoothness undefined for now, other than to state that, as a minimum, it implies absolute continuity in \(t\).

  6. An implementation would also need a mortality to specify a terminal date or age by individual (because mortality tables stop), but we don’t need that for this exposition. 

  7. For the avoidance of doubt, we assume that these Bernoulli trials are independent by time and by individual. 

  8. The freedom to partition experience data may also present opportunities to run calculations in parallel. (I suggest that your mortality experience calculations should be running in parallel at least somewhere along the line.) 

  9. An obvious example is excluding mortality experience from the height of the COVID-19 pandemic, potentially resulting in non-contiguous data from before and after the excluded time period. 

  10. Tracking individuals across experience datasets for different time periods may however be a very sensible data check. 

  11. We could alternatively define an ‘exposure measure’ as

    \[\text{S} f=\sum_{(i,\varepsilon)\in \mathscr{E}}\int_{\nu_\varepsilon}^{\tau_\varepsilon}\!f(i,t)\,\text{d}t\]

    While this is a simpler definition, it would

    • clutter up our notation because we’d end up writing \(\text{S}\mu f\) everywhere instead of \(\text{E}f\), and
    • obscure the symmetry between \(\text{A}\) and \(\text{E}\).

  12. I think I usually describe \(A\) and \(E\) as ‘linear operators’. 

  13. Note that \(E\) can no longer serve as an estimate of its own variance when dealing with weighted statistics. 

  14. Another potential justification is that it is mathematically convenient to use \(f\in \{0,1\}\) as an indicator of dataset membership by time. Unfortunately, this is an implementation nightmare in its full generality, and so has limited real-world value. There are better approaches to achieving this in practice. 

  15. And in a future article we will.