Why the MAPE May Not Give the Accuracy Results You Expect
During the CPDF® professional development workshops on Smarter Forecasting and Planning for supply chain practitioners, I often discuss the measurement of forecast error and forecast accuracy with the participants. I have encountered plenty of misunderstanding and disagreement, even among participants from the same company. This led me to write several articles on my Profile; the most recent ones cover modeling bias and how to reduce bias in forecasts. Others on this topic are entitled: (1) Taming Uncertainty: How You Can Mitigate the Effect of Large Forecast Errors on Forecasting Performance, (2) DATA Quality Matters: Are Your Planners and Forecasters Adequately Prepared? (3) In the Land of the APEs, Is the Mean APE (MAPE) a King or a Myth? and (4) The Myth of the MAPE . . . and how to avoid it.
Outliers in forecast errors and other sources of unusual data values should never be ignored in the accuracy measurement process. Even a common metric of forecast accuracy, the MAPE, can become misleading. (The MAPE is the arithmetic mean of the absolute percentage errors, that is, of the absolute value of Actual (A) minus Forecast (F) divided by the Actual (A).) An otherwise acceptable-looking measure of performance can be distorted by just a single unusual value in the numbers, and even more so by a lack of normality (Gaussianity) in the underlying numbers. The arithmetic mean can only be trusted when the data are normally (Gaussian) distributed; it becomes a very poor measure of central tendency with even slightly non-Gaussian data. This has been well documented but, apparently, not widely embraced by the forecasting community.
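To make the point about a single unusual value concrete, here is a minimal sketch. The twelve actuals and forecasts are made-up numbers for illustration only (they are not from any workshop data set), and the helper name ape and the median APE comparison are my own choices for this example.

```python
import numpy as np

def ape(actual, forecast):
    """Absolute percentage errors |A - F| / |A|, expressed in percent."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.abs(actual - forecast) / np.abs(actual)

# Hypothetical data: twelve periods; period 10 has an unusually low actual.
actual   = np.array([100, 110,  95, 105, 120,  98, 102, 115, 108,   5, 112, 101])
forecast = np.array([ 98, 105,  99, 110, 115, 101, 100, 118, 104,  20, 109, 103])

apes = ape(actual, forecast)

mape = apes.mean()                         # arithmetic mean of the APEs
mdape = np.median(apes)                    # median of the APEs, for comparison
mape_without = np.delete(apes, 9).mean()   # MAPE with the unusual period removed

print(f"MAPE (all periods):        {mape:.1f}%")
print(f"MAPE (unusual one removed): {mape_without:.1f}%")
print(f"Median APE (all periods):  {mdape:.1f}%")
```

In this made-up example, eleven of the twelve periods have single-digit percentage errors, yet the one unusual period pulls the MAPE far above them, while the median APE barely moves.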
After my previous LinkedIn article on bias in forecasting, I will now introduce an algorithmic model for accuracy measurement. It all starts with the forecast error. There are a number of ways forecast errors come up in practice. You can, for example,
- create lead time demand forecasts for inventory planning.
- create multiple forecasts for a single period in the future based on a rolling forecast horizon, as in an annual budget or production cycle.
- create forecasts for multiple periods in an 18-month planning horizon.
I find the waterfall charts derived from hold-out sample testing particularly useful when analyzing forecasting performance on spreadsheets with real-world historical data.
A Measurement Model for Bias and Dispersion
If we want to measure forecast accuracy for a single forecast, we would say that the forecast error FE = Actual (A) minus Forecast (F) is the common, accepted way to do that. If we shoot an arrow at a target, then we could measure the distance from the bull's-eye to the arrow as a measure of accuracy. It is only a measurement, so the exact difference is subject to some measurement error. We can view this process as a black box, in which the measurement error € enters the box from the right and the observed forecast error FE comes out. Inside the box, the measurement error is scaled by an unknown constant σ, the dispersion, and then translated by an unknown constant β, the bias; these constants are commonly referred to as the parameters of the model.
This is a black box model that can be represented by an equation:
FE = β + σ€,
where β and σ are unknown constants (the knobs) and € is the measurement error, which has a known or assumed distribution. If you are aware of outliers (even just a few) or unusual variation in real-world data in a forecasting application, you will have to shy away from the normal distribution. Rather, we will assume a family of distributions, known as the exponential family, for the measurement error distribution. The exponential family contains many familiar distributions, including the normal (Gaussian) distribution. However, our approach is fundamentally algorithmic and driven by data, rather than a conventional data-model approach. My motivation comes from examining the data first within the context of a particular problem or application, rather than following the more conventional approach of assuming a specific data model with normally distributed errors as the data generation process.
What We Can Learn About Forecast Accuracy Measurement with This Model and the Data
This measurement model is known as a location-scale model. The black box model shows that the output FE is a translation of an input € shifted by an amount β, where the measurement errors are scaled by a positive constant σ. The location-scale measurement model and its generalizations were worked out over four decades ago and can be found in a book by D.A.S. Fraser, entitled Inference and Linear Models. There are also a number of peer-reviewed journal papers on statistical inference on the subject. It is not mainstream statistical modeling and certainly not related to Bayesian inference. There are, in fact, no “priors” involved, which are so essential to the Bayesian approach. Also, not relying on normal distribution assumptions is key, as the approach achieves unique estimates and confidence bound procedures for the parameters without them.
In practice, we have multiple measurements FE = {FE1, FE2, FE3, . . ., FEn} of observed forecast errors, illustrated by the output of the location-scale black box model:
FE1 = β + σ €1,
FE2 = β + σ €2,
FE3 = β + σ €3,
. . .
FEn = β + σ €n,
where € = {€1, €2, €3, . . ., €n} are now n realizations of measurement errors from an assumed distribution in the exponential family. The black box has been twiddled n times with the same unknown constants β and σ. The question now is what information we can uncover about the black box process. This is where it gets interesting, and perhaps a bit strange for those who have been through a classical statistics course on statistical inference.
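The n-fold twiddling of the black box can be sketched in a few lines. This is only an illustration under stated assumptions: the values of beta, sigma and n are hypothetical, and the Laplace distribution is my own stand-in for a heavier-tailed, non-normal measurement error; the model itself does not prescribe it.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical knob settings; in practice beta and sigma are unknown.
beta = 5.0    # bias (location)
sigma = 2.0   # dispersion (scale), must be positive
n = 8         # number of observed forecast errors

# Assumption for illustration: measurement errors drawn from a standard
# Laplace distribution, which has heavier tails than the normal.
eps = rng.laplace(loc=0.0, scale=1.0, size=n)

# The black box: every FE_i uses the same beta and sigma.
fe = beta + sigma * eps

for i, (e, f) in enumerate(zip(eps, fe), start=1):
    print(f"FE{i} = {beta} + {sigma} * ({e:+.3f}) = {f:+.3f}")
```

In a real application only the fe values are observed; the simulation simply makes visible what the black box hides.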
If you could explore the innards of the black box like a detective, you would discover that, based on the observed data, there is information about the unknown, realized measurement errors. This revelation will guide us to the next important step, namely a parsing or decomposition of the measurement error distribution into two components: a marginal distribution for the observed components and a conditional (conditioned on the observed components) distribution for the remaining unknown measurement error distribution.
I will not bore you with the details because, as a graduate student, I struggled along with my peers with the derivations. However, they are documented in books and the peer-reviewed academic literature. Because the results do not lend themselves to nice theoretical answers (except for the normal distribution), they have not seen much daylight in practice until recently, now that the inferential modeling can be handled in today’s empirically rich and fast computing environment, as in bootstrapping, machine learning, neural nets, and AI. In other words, it has become a practical approach today, one that I could not pursue in grad school. With modern computing power, we can now show what we should do with the data, not necessarily what we could do based on the mathematics of normal distribution theory and asymptotics.
So, what is the insight or information gleaned from the observed output data and the structure of the black box? If you now select a location metric (such as the arithmetic mean, the median, or the smallest value, which is the first order statistic) and a scale metric (such as the standard deviation or the range), you can make a calculation that gives you some known information about the measurement errors and the measurement error distribution (this is like a negative feedback loop from FE to € below the black box). Let’s call these metrics m(FE) and s(FE). Then the black box process reveals (with substitution and some manipulation) that
[FE1 – m(FE)] / s(FE) = [β + σ €1 – m(β + σ €)] / s(σ €)
= [β + σ €1 – β – m(σ €)] / [σ s(€)]
= σ [€1 – m(€)] / [σ s(€)]
= [€1 – m(€)] / s(€)
[FE2 – m(FE)] / s(FE) = [€2 – m(€)] / s(€)
[FE3 – m(FE)] / s(FE) = [€3 – m(€)] / s(€)
. . .
[FEn – m(FE)] / s(FE) = [€n – m(€)] / s(€)
The left-hand side of each equation can be calculated from the data, so the right-hand side is information about a realized measurement error and its distribution. Because we can condition on what is known, we can derive a conditional distribution given the known error component and a marginal distribution for the realized error component. This can be done and will lead to practical inferences about the parameters; we do not need to go further into the details at this time.
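You can verify these equations numerically. The sketch below simulates a black box with hypothetical values of beta and sigma and Laplace-distributed errors (my assumptions, for illustration only), and then compares the left-hand side computed from FE with the right-hand side computed from € for three pairs of location and scale metrics.

```python
import numpy as np

def standardize(x, m, s):
    """Return [x_i - m(x)] / s(x) for a location metric m and a scale metric s."""
    return (x - m(x)) / s(x)

rng = np.random.default_rng(seed=0)

# Hypothetical black box settings; any beta and any positive sigma will do.
beta, sigma = -3.0, 4.5
eps = rng.laplace(size=10)      # unobserved measurement errors (assumed Laplace)
fe = beta + sigma * eps         # observed forecast errors

# Pairs of (location metric, scale metric) of the kind named in the text.
pairs = {
    "mean / standard deviation": (np.mean, np.std),
    "median / range": (np.median, np.ptp),
    "smallest value / range": (np.min, np.ptp),
}

results = {}
for name, (m, s) in pairs.items():
    lhs = standardize(fe, m, s)    # computable from the data alone
    rhs = standardize(eps, m, s)   # depends only on the measurement errors
    results[name] = bool(np.allclose(lhs, rhs))
    print(f"{name}: left-hand side equals right-hand side -> {results[name]}")
```

The agreement does not depend on the particular beta and sigma chosen, which is the sense in which the data reveal something about the realized measurement errors.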
To convince yourself that the above equations work, think of the metric m(.) as the arithmetic mean or the median; then, as you add a constant amount to each value of FE, the mean or median is shifted by the same constant amount. In other words, m(a + x) = a + m(x), where a is a constant and x = (x1, x2, x3, …, xn). Likewise, think of the metric s(.) as a standard deviation; then s(c x) = c s(x), where c is a positive constant and x = (x1, x2, x3, …, xn). The derivation also uses two companion properties that these metrics share: the location metric scales with the data, m(c x) = c m(x), and the scale metric is unaffected by a shift, s(a + x) = s(x). The particular choice of the two metrics is not so important in the analysis, as long as they have the above properties. The MAPE is not an m(.) or s(.) metric.
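A quick way to screen a candidate metric is to test these properties on a few numbers. In the sketch below, the function names, the sample values and the constants a and c are all hypothetical. The checks use the combined forms that the derivation relies on, m(a + c x) = a + c m(x) and s(a + c x) = c s(x) for positive c. As a contrast of my own choosing (it is not the MAPE), the mean of the absolute values fails both checks. Keep in mind that a single numerical check can refute a property but cannot prove it.

```python
import numpy as np

def is_location_metric(m, x, a=7.0, c=3.0):
    """Check m(a + c*x) == a + c*m(x) on one example, with c > 0."""
    return bool(np.isclose(m(a + c * x), a + c * m(x)))

def is_scale_metric(s, x, a=7.0, c=3.0):
    """Check s(a + c*x) == c*s(x) on one example, with c > 0."""
    return bool(np.isclose(s(a + c * x), c * s(x)))

def mean_abs(v):
    """Mean of absolute values: a contrast example, not the MAPE."""
    return float(np.mean(np.abs(v)))

# Hypothetical forecast errors used only to exercise the checks.
x = np.array([-4.0, -1.5, 0.5, 2.0, 9.0])

location_checks = {
    "mean": is_location_metric(np.mean, x),
    "median": is_location_metric(np.median, x),
    "smallest value": is_location_metric(np.min, x),
    "mean of absolute values": is_location_metric(mean_abs, x),
}
scale_checks = {
    "standard deviation": is_scale_metric(np.std, x),
    "range": is_scale_metric(np.ptp, x),
    "mean of absolute values": is_scale_metric(mean_abs, x),
}

print("location metric?", location_checks)
print("scale metric?   ", scale_checks)
```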
A Structured Inference Base Approach (Phase I)
The useful information we can derive from this analysis is a decomposition of the measurement error distribution. The analysis will yield a (conditional) posterior distribution for the unknown parameters β and σ from which we can derive unique confidence intervals and related practical inferences.
The location-scale model is an application of a Structured Inference Base (SIB) approach that can be generalized to a wide range of applications, not just for accuracy measurement. In a future article I will elaborate on the details of the SIB process for forecasting intermittent demand.
