Mortality: A over E
Why ‘A over E’?
‘A over E’ literally refers to ‘actual’ deaths divided by ‘expected’ deaths, a measure of how experience data compares with a given mortality.
In practice, ‘A over E’ is often interpreted as meaning the whole statistical caboodle, which is how I’ll use it here.
In the previous article we defined experience data, variables and mortality with respect to that data, and the \(\text{A}\) (actual) and \(\text{E}\) (expected) deaths operators.
In this article we’ll put \(\text{A}\) and \(\text{E}\) to work.
The ‘expectation’ result
When the mortality \(\mu\) is the true mortality then for any variable \(f\) it is easy to show that
\[\mathbb{E}\big(\text{A}f-\text{E}f\big)=0\tag{1}\]
where \(\mathbb{E}\) is true expectation (allowing for probabilities of survival etc).
It is worth emphasising that this is true
- for any weight, even if that weight was used to fit the mortality in question, and
- for any subset of the experience data, provided the choice of subset does not depend on E2R information.
Insight 4. The expected value of A−E is zero
If \(\mu\) is the true mortality then the expected value of \(\text{A}f-\text{E}f\) is zero
- for any variable \(f\) (even if \(f\) was used to fit the mortality in question), and
- for any subset of the experience data (provided the choice of subset does not depend on E2R information).
This bears repeating: it doesn’t matter how you calibrated your mortality model, if your model works then the expected value of \(\text{A}f-\text{E}f\), whether it be weighted by lives1, by amounts or by any other variable you or I may choose, is zero.
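As a concrete illustration (my own sketch, not from the articles themselves), the \(\text{A}\) and \(\text{E}\) operators can be written in a few lines of Python. The record layout and the numbers are hypothetical, and the hazard is assumed constant over each record's exposure so that expected deaths are simply \(\mu\) × exposure:

```python
from dataclasses import dataclass

@dataclass
class Record:
    """One exposure record (hypothetical layout)."""
    exposure: float   # years exposed to risk
    died: int         # 1 if the record ends in a death, else 0
    mu: float         # assumed (constant) hazard over the exposure
    amount: float     # benefit amount, used as an example weight

def A(records, f):
    """Actual deaths, weighted by the variable f."""
    return sum(f(r) * r.died for r in records)

def E(records, f):
    """Expected deaths, weighted by f, assuming expected deaths = mu * exposure."""
    return sum(f(r) * r.mu * r.exposure for r in records)

records = [
    Record(exposure=1.0, died=0, mu=0.010, amount=5_000.0),
    Record(exposure=0.5, died=1, mu=0.020, amount=20_000.0),
    Record(exposure=1.0, died=0, mu=0.015, amount=10_000.0),
]

lives = lambda r: 1.0          # unweighted
amounts = lambda r: r.amount   # amounts-weighted

print(A(records, lives) - E(records, lives))      # lives-weighted A - E
print(A(records, amounts) - E(records, amounts))  # amounts-weighted A - E
```

The point of the sketch is that the same two operators serve every weighting: only the variable `f` changes.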
The ‘variance’ result
Observations are noisy and so we’d also like to calculate the variance to assess how close the observed result is to \(0\).
When the mortality \(\mu\) is the true mortality then for any variable \(f\) we can show that
\[\text{Var}\big(\text{A}f-\text{E}f\big)=\mathbb{E}\big(\text{E}f^2\big) \tag{2}\]
Some observations:
- This is a generalisation of the lives-weighted (or unweighted) result1 that the variance of A−E is E.
- If you have the machinery in place to calculate \(\text{A}f\) and \(\text{E}f\) then you are also immediately in a position to estimate the variance of \(\text{A}f-\text{E}f\), because it's just \(\text{E}g\) where \(g=f^2\).
- This does not allow for overdispersion and so is typically an underestimate of the variance. I'll ignore this for the time being on the basis that I'll cover overdispersion later in this series.
Insight 5. The same machinery that defines A−E can be used to estimate its uncertainty
If \(\mu\) is the true mortality then the variance of \(\text{A}f-\text{E}f\) equals the expected value of \(\text{E}f^2\).
(This is before allowing for overdispersion.)
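A minimal sketch of Insight 5 in Python, assuming the per-record expected deaths \(\mu\) × exposure have already been computed (the record layout and values are invented for illustration):

```python
# Each record: (death indicator d, expected deaths e = mu * exposure, weight f)
records = [(0, 0.010, 5_000.0), (1, 0.010, 20_000.0), (0, 0.015, 10_000.0)]

Af  = sum(d * f     for d, e, f in records)  # actual weighted deaths
Ef  = sum(e * f     for d, e, f in records)  # expected weighted deaths
Ef2 = sum(e * f * f for d, e, f in records)  # E applied to g = f^2

# Estimated variance and standard deviation of Af - Ef (before overdispersion)
var_AmE = Ef2
sd_AmE = var_AmE ** 0.5
print(Af - Ef, sd_AmE)
```

Note that `Ef2` is computed by exactly the same summation as `Ef`; no new machinery is required.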
Familiarity with the lives-weighted case can lead practitioners astray in relation to weighted statistics:
- An unadjusted lives-weighted variance estimate should not be used to estimate the variance of a weighted A/E because it will always understate it2.
- Hard-coding the assumption that all statistics are lives-weighted into your system will mean that it struggles when weighted results are required in the future.
- When experience data is provided in the form of grouped deaths and exposures, it is reasonably common for this data to be provided weighted both by lives and by benefit amount. But what is almost always overlooked3 is that the amounts-weighted grouped experience data should in addition include exposure data weighted by amount squared. In other words, grouped weighted data should comprise \(\text{A}f\), \(\text{E}f\) and \(\text{E}f^2\), as opposed to just \(\text{A}f\) and \(\text{E}f\).
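To make the last point concrete, here is a sketch (with invented field names and values) of grouping individual records by age band while carrying all three amounts-weighted columns, not just the usual two:

```python
from collections import defaultdict

# Individual records: (age band, death indicator, expected deaths, amount)
records = [
    ("60-64", 1, 0.012, 20_000.0),
    ("60-64", 0, 0.010,  5_000.0),
    ("65-69", 0, 0.018, 10_000.0),
    ("65-69", 1, 0.020, 15_000.0),
]

# Amounts-weighted grouped data needs three columns: [Af, Ef, Ef2]
grouped = defaultdict(lambda: [0.0, 0.0, 0.0])
for band, d, e, w in records:
    grouped[band][0] += d * w        # Af: actual weighted deaths
    grouped[band][1] += e * w        # Ef: expected weighted deaths
    grouped[band][2] += e * w * w    # Ef2: needed later to estimate variance

for band, (Af, Ef, Ef2) in sorted(grouped.items()):
    print(band, Af, Ef, Ef2)
```

Dropping the `Ef2` column at the grouping stage loses information that cannot be recovered from `Af` and `Ef` alone.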
Diagnostics
With a mean and a variance to hand, we are in a position to take a first stab at A/E diagnostics, i.e. residuals and confidence intervals.
The Pearson residual is (actual − expected) / (estimated standard deviation). In our case, actual is \(\text{A}f-\text{E}f\), expected is \(0\), and estimated standard deviation is \(\sqrt{\text{E}f^2}\), and so the Pearson residual is
\[r_\text{P}=\frac{\text{A}f-\text{E}f}{\sqrt{\text{E}f^2}}\sim N\!\left(0,1\right)\]
Another obvious A/E diagnostic is the literal one – appeal to the central limit theorem and the low variance of \(\text{E}f\) to assume
\[\frac{\text{A}f}{\text{E}f}\sim N\!\left(1,\;\frac{\text{E}f^2}{(\text{E}f)^2}\right)\]
In other words, review \(\text{A}f/\text{E}f\) and compare its difference from \(1\) with (a multiple of) \(\pm\sqrt{\text{E}f^2} / \text{E}f\).
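Putting the two diagnostics together in a short sketch (the weighted totals below are illustrative, not real experience data):

```python
import math

# Illustrative weighted totals for one subset of experience data
Af, Ef, Ef2 = 1_040_000.0, 1_000_000.0, 90_000_000.0

# Pearson residual: (Af - Ef) / sqrt(Ef2), approximately N(0, 1)
r_p = (Af - Ef) / math.sqrt(Ef2)

# A/E with an approximate 95% band around 1
a_over_e = Af / Ef
half_width = 1.96 * math.sqrt(Ef2) / Ef
lo, hi = 1 - half_width, 1 + half_width
print(r_p, a_over_e, lo, hi)
```

With these numbers A/E sits well outside the band, and the Pearson residual tells the same story on the standard-normal scale.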
Insight 6. A/E variance increases with concentration
\(\sqrt{\text{E}w^2}/\text{E}w\), where \(w\ge0\) is a weight, is a useful and recurring measure of effective concentration in relation to mortality uncertainty. It implies that the more concentrated the experience data (in some sense), the greater the variance of observed mortality.
Using unweighted variance without adjustment to estimate weighted statistics will likely understate risk.
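The concentration measure \(\sqrt{\text{E}w^2}/\text{E}w\) can be illustrated numerically; the two portfolios below are invented for the purpose, identical except that one has a single dominant amount:

```python
import math

def concentration(expecteds, weights):
    """sqrt(E w^2) / E w, from per-record expected deaths and weights."""
    Ew  = sum(e * w     for e, w in zip(expecteds, weights))
    Ew2 = sum(e * w * w for e, w in zip(expecteds, weights))
    return math.sqrt(Ew2) / Ew

e = [0.01] * 100                 # 100 records, 0.01 expected deaths each

flat  = [1.0] * 100              # evenly spread amounts
spiky = [100.0] + [1.0] * 99     # one dominant amount

print(concentration(e, flat))    # diversified portfolio
print(concentration(e, spiky))   # concentrated portfolio: larger value
```

The concentrated portfolio produces a markedly larger value, i.e. a wider confidence interval for its amounts-weighted A/E despite similar totals.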
The above diagnostics are fine in practice, but they have some nagging drawbacks:
- We’ve relied on \(\text{E}f\gg\sqrt{\text{E}f^2}\) and the central limit theorem, both of which will break down for datasets with fewer or more concentrated weighted deaths.
- The implied confidence intervals can include negative values!
We can do (a bit) better and so we’ll revisit A/E diagnostics in due course. But in order to do that we’ll need to define the log likelihood, which will be the subject of my next article.
1. I’ll interpret lives-weighted as meaning \(f\in\{0,1\}\), which is a little more general than unweighted, which is \(f=1\). In both cases \(f^2=f\) and hence \(\text{E}f^2=\text{E}f\). ↩↩
2. Provided \(f\) exhibits some variation over the experience data, it is a mathematical truth that \(\text{E}f^2\cdot\text{E}1\gt (\text{E}f)^2\). ↩
3. By parties that I think should know better. ↩