### Do We Know What We Are Talking About When Measuring Accuracy in Forecasting?

While facilitating the CPDF professional development workshops on *Smarter Forecasting and Planning* for supply chain practitioners around the world, I would ask the participants to give me their definitions of forecast error and forecast accuracy. I encountered plenty of misunderstanding and disagreement, even among participants from the same company. This led me to write several articles on LinkedIn: (1) **Taming Uncertainty: How You Can Mitigate the Effect of Large Forecast Errors on Forecasting Performance**, (2) **DATA Quality Matters: Are Your Planners and Forecasters Adequately Prepared?**, (3) **In the Land of the APEs, Is the Mean APE (MAPE) a King or a Myth?**, and (4) **The Myth of the MAPE . . . and how to avoid it**.

Outliers in forecast errors and other sources of unusual data values should never be ignored in the accuracy measurement process. Even the simplest measure of forecast accuracy, the bias, can be misleading when it is calculated as the *average forecast error* ME (the arithmetic mean of Actual (A) minus Forecast (F)). An otherwise unbiased pattern of performance can be distorted by just a single unusual value.
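To make the last point concrete, here is a minimal sketch in Python. The numbers, the flat forecast of 100, and the names `forecast_errors`, `actuals`, and `forecasts` are all made up for illustration; they do not come from any real data set. The median is included only as a comparison to show how differently two location statistics react to one unusual value.

```python
from statistics import mean, median

# Hypothetical monthly actuals against a flat forecast (illustrative only).
actuals = [102, 98, 101, 99, 100, 103, 97, 100]
forecasts = [100] * len(actuals)


def forecast_errors(actual, forecast):
    """Forecast error FE = Actual (A) minus Forecast (F), period by period."""
    return [a - f for a, f in zip(actual, forecast)]


errors_clean = forecast_errors(actuals, forecasts)
me_clean = mean(errors_clean)          # 0: no bias in this pattern

# Replace the last actual with a single unusual value.
actuals_outlier = actuals[:-1] + [180]
errors_outlier = forecast_errors(actuals_outlier, forecasts)
me_outlier = mean(errors_outlier)      # 10: the ME now suggests a bias
med_outlier = median(errors_outlier)   # 0.5: the median barely moves
```

Seven of the eight errors are unchanged, yet the ME moves from 0 to 10 while the median of the errors moves only to 0.5.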

I will now lay a foundation for forecast accuracy measurement that could help practitioners acquire a common basis for measuring forecasting performance. It all starts with the *forecast error*. Forecast errors come up in practice in a number of ways. You can, for example,

- create lead time demand forecasts for inventory planning.
- create multiple forecasts for a single period in the future based on a rolling forecast horizon, as in an annual budget or production cycle.
- create forecasts for multiple periods in an 18-month operations planning horizon.

I find the waterfall charts derived from holdout sample testing particularly useful when analyzing forecasting performance on spreadsheets with historical data. I use months for illustrative purposes, but it would work equally well with other types of data periodicities.

### A Simple Measurement Model for Forecast Bias: Phase I

If we want to measure forecast accuracy for a *single* forecast, we would say that the forecast error FE = Actual (A) *minus* Forecast (F) is the common, acceptable way to do that. If I shoot an arrow at a target, then we could measure the distance from the bull's-eye to the arrow as a measure of accuracy. It is only a measurement, so the exact difference is subject to some measurement error.

We can view this process as a black box, in which the measurement error enters the box from the right and the observed difference, or bias, is the outcome of a translation by an unknown constant, commonly referred to as a parameter in a model. This is a black box with one knob to twiddle. The model can be represented by an equation: FE = β + ɛ.

Here **β** is an unknown constant (the knob) and ɛ is the measurement error, which has a known or assumed distribution. Because we must be very sensitive to outliers in realistic demand forecasting applications, we will have to shy away from the normal distribution. Rather, we will assume a family of distributions, known as the *exponential family*. It contains many familiar distributions, including the normal (Gaussian) distribution. However, our approach is *algorithmic, driven by data* within the context of applications. This is not a conventional *data-model*. Our motivation comes from looking at data first within the context of a particular problem or application, rather than following the more conventional approach of assuming a specific data-model with normally distributed errors to describe a *data generation* process.

I start with a simple, almost trivial example, but do so intentionally, because my aim is not estimation but rather to use the model to uncover information conveyed by the operation of this black box in a practical environment. What can we learn about the measurement process given THIS black box model and the data? This black box model says that the output FE is a translation of an input ɛ shifted by an amount β. It is called a *location model*.

Lest you think I am just making this up, this black box model and its generalizations were worked out over four decades ago and can be found in a book by D.A.S. Fraser, entitled **Inference and Linear Models**, and also in a number of his peer-reviewed journal articles. It is not mainstream statistical modeling and certainly not related to Bayesian inference. There are, for instance, no “priors” involved, which are so essential to the Bayesian approach.

In practice, we would have multiple measurements **FE** = {FE1, FE2, FE3, . . ., FEn} of observed forecast errors given by the output of the location model black box:

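$$
\begin{aligned}
FE_1 &= \beta + \varepsilon_1 \\
FE_2 &= \beta + \varepsilon_2 \\
FE_3 &= \beta + \varepsilon_3 \\
&\;\;\vdots \\
FE_n &= \beta + \varepsilon_n
\end{aligned}
$$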

where **ɛ** = {ɛ1, ɛ2, ɛ3, . . ., ɛn} are now *n* realizations of measurement errors from an assumed distribution in the exponential family. The black box has been twiddled *n* times with the same constant, but unknown, parameter **β**. The question is again: what information can we uncover about the black box process? This is where it gets interesting, and perhaps a bit unfamiliar for those who have been through a classical course on inferential statistics.
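For readers who like to see the black box run, here is a minimal simulation sketch in Python. The function name `simulate_forecast_errors`, the value β = 5, and the choice of Laplace-distributed errors (built as the difference of two exponential draws) are my assumptions for illustration only; any assumed error distribution could be substituted.

```python
import random


def simulate_forecast_errors(beta, n, seed=0):
    """Twiddle the black box n times: FE_i = beta + eps_i.

    The eps_i are Laplace(0, 1) draws, generated as the difference of two
    independent exponential draws. This heavier-tailed stand-in for the
    normal distribution is an illustrative assumption.
    """
    rng = random.Random(seed)
    eps = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]
    fe = [beta + e for e in eps]
    return fe, eps


# In a simulation we can look inside the box and see both beta and eps.
# In practice only the output fe is observed.
fe, eps = simulate_forecast_errors(beta=5.0, n=12)
```

Every simulated forecast error differs from its measurement error by exactly the same amount β, which is what "the same constant, but unknown, parameter" means in this model.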

If you could watch the innards of the black box, you would discover that there is information now about the unknown, *realized* measurement errors, which will guide us to the next step, namely a parsing of the measurement error distribution into two components: a marginal distribution for the *observed* component and a conditional distribution for the remaining measurement error distribution. I will not bore you with the details here because, as I said, the derivations are documented in the literature. Because the results do not lend themselves to nice theoretical answers (except for the normal distribution), they can be dealt with empirically nowadays with fast, cheap computing, real data and plenty of storage. In other words, it has become a practical approach. With modern computing power, we can show what we *should* do, not necessarily what we *could* do based on the mathematics of normal distribution theory.

So, what is the insight or information gleaned from the observed output data and the design of the black box? If you pick a *location statistic* m(.) like the arithmetic mean, median, or even the smallest value (first order statistic), you can make a calculation that gives you some observable information about the measurement errors and the measurement error distribution. Let’s calculate this metric m(**FE**) from the output of the black box. Then the black box process shows that

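$$
\begin{aligned}
FE_1 - m(\mathbf{FE}) &= \beta + \varepsilon_1 - m(\beta + \boldsymbol{\varepsilon}) = \beta + \varepsilon_1 - \beta - m(\boldsymbol{\varepsilon}) = \varepsilon_1 - m(\boldsymbol{\varepsilon}) \\
FE_2 - m(\mathbf{FE}) &= \beta + \varepsilon_2 - m(\beta + \boldsymbol{\varepsilon}) = \varepsilon_2 - m(\boldsymbol{\varepsilon}) \\
FE_3 - m(\mathbf{FE}) &= \varepsilon_3 - m(\boldsymbol{\varepsilon}) \\
&\;\;\vdots \\
FE_n - m(\mathbf{FE}) &= \varepsilon_n - m(\boldsymbol{\varepsilon})
\end{aligned}
$$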

The *left-hand* side of each equation can be calculated given the data, so the *right-hand* side is information about the measurement error and its distribution that is now realized. What is known we can condition on. So, we can derive a conditional distribution given the known error component and a marginal distribution for the now realized or *observed* error component. We do not need to go into details here.
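The identity can also be checked numerically. The following is a minimal sketch, assuming the median as the location statistic m(.), β = 5, and Laplace-type errors; all three choices and the variable names are mine, made for illustration. Inside a simulation we know β and the errors, so we can compare the two sides directly.

```python
import random
from statistics import median

rng = random.Random(42)

beta = 5.0                                                          # hidden in practice
eps = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(8)]  # hidden too
fe = [beta + e for e in eps]                                        # observed output

# Left-hand side: computable from the observed forecast errors alone.
left = [x - median(fe) for x in fe]

# Right-hand side: a function of the measurement errors only (no beta).
right = [e - median(eps) for e in eps]

# The two sides agree up to floating-point rounding.
max_gap = max(abs(l - r) for l, r in zip(left, right))
```

Because `left` needs only the data, the corresponding quantities on the right, which involve only the measurement errors, are now known as well, even though β and the individual errors remain hidden.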

To convince yourself that the above equations work, think of m(.) as the arithmetic mean or the median: as you add a constant amount to each value, the mean or median is shifted by the same constant amount. In other words, m(a + **x**) = a + m(**x**), where a is a constant and **x** = (x1, x2, x3, . . ., xn).
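A quick way to test whether a candidate statistic has this shift property is to try it on a few numbers. The helper `is_location_equivariant` and the sample values below are hypothetical, written only for this check. The standard deviation is included as a contrast: it measures spread, so adding a constant leaves it unchanged rather than shifting it.

```python
from statistics import mean, median, pstdev


def is_location_equivariant(m, x, a, tol=1e-12):
    """Return True if m(a + x) equals a + m(x) for this sample and shift."""
    shifted = [a + xi for xi in x]
    return abs(m(shifted) - (a + m(x))) < tol


x = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0]   # arbitrary illustrative values
a = 2.5                                # arbitrary constant shift

results = {
    "mean": is_location_equivariant(mean, x, a),
    "median": is_location_equivariant(median, x, a),
    "min": is_location_equivariant(min, x, a),
    "pstdev": is_location_equivariant(pstdev, x, a),
}
```

Here the mean, the median, and the smallest value pass the check, while the standard deviation does not, so it is not a location statistic in the sense used above.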

### The Structured Inference Base (SIB) Approach (Phase I)

The useful information we can derive from this analysis is a *decomposition* of the measurement error distribution. The final analysis will yield a (conditional) *posterior distribution* for the unknown bias parameter **β** from which we can derive unique confidence intervals and related inferences.

The simple location model is an application of a *Structured Inference Base* (**SIB**) approach that can be generalized to a wide range of applications, not just for accuracy measurement. In a future article I will set up a black box for *bias* and *dispersion*, known as a *location-scale* model, in a forecast accuracy measurement process.