How You Can Improve Forecasting Performance with the MAPE by Finding and Fixing Data Quality Issues in Your Demand History
In a recent article entitled Improving Forecasting Performance with Intermittent Data – A New Way of Scoring Performance, I gave some insight into why the MAPE (Mean Absolute Percentage Error) can be a seriously flawed accuracy measure. Aside from intermittency, when demand has zeros and the APEs (Absolute Percentage Errors) are undefined, there are often unusual values in the APEs that distort the average. I showed that a remedy is to use the MdAPE (Median Absolute Percentage Error) or the HBB TAPE (Typical Absolute Percentage Error) as a more typical summary than the arithmetic mean. The HBB TAPE metric was introduced in my LinkedIn post and is also described in my book Change & Chance Embraced (p. 97), which deals with smarter, agile demand forecasting practices for the supply chain. I have also used it in facilitating CPDF professional development workshops (see the CPDF Workshop Manual, p. 188, available on Amazon), and my articles are also available on my website blog.
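For readers who want to follow along outside a spreadsheet, here is a minimal sketch in Python of how a MAPE and an MdAPE can be computed from actuals and forecasts. The function names and the convention of masking zero-demand periods are my own choices for illustration, and the HBB TAPE metric is not shown here.

```python
# Minimal sketch of the two summary measures discussed in this article
# (not the article's spreadsheet formulas). APEs are undefined where the
# actual demand is zero, so those periods are masked out in this sketch.
import numpy as np

def absolute_percentage_errors(actuals, forecasts):
    actuals = np.asarray(actuals, dtype=float)
    forecasts = np.asarray(forecasts, dtype=float)
    nonzero = actuals != 0
    return 100.0 * np.abs(actuals[nonzero] - forecasts[nonzero]) / actuals[nonzero]

def mape(actuals, forecasts):
    # Arithmetic mean of the APEs ("Gaussian arithmetic")
    return float(np.mean(absolute_percentage_errors(actuals, forecasts)))

def mdape(actuals, forecasts):
    # Median of the APEs: resistant to one or two unusual values
    return float(np.median(absolute_percentage_errors(actuals, forecasts)))
```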
In this article, I show how the difference between the MAPE and MdAPE can lead to insights into why the MAPE has been so misleading as an accuracy measure. The MAPE appears to be commonly provided in demand planning software without reservation.
In this spreadsheet example, the twelve months starting September 2016 (row 11) were taken as a holdout sample (a test data set, held out from model fitting) for evaluating three forecasting methods: (1) the previous year’s twelve-month actuals (Year-1), (2) a trend/seasonal exponential smoothing model ETS(AAM), and (3) a twelve-month average of the previous year (MAVG-12). The twelve-month forecasts over the holdout sample are called Forecast Profiles (FP). The results (left panel, row 34) show that the three MAPEs for the twelve months are about the same, around 50%: not great, but probably typical at a SKU-location level.
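As a rough guide to how the three forecast profiles could be generated in code rather than a spreadsheet, here is a hedged Python sketch. The synthetic 24-month history, the variable names, and the use of statsmodels’ Holt-Winters ExponentialSmoothing class as a stand-in for the spreadsheet’s ETS(AAM) model are my assumptions, not the article’s actual data or software.

```python
# Illustrative sketch of the three forecast profiles over a 12-month holdout.
# The 24-month history below is synthetic; only the structure of the three
# methods follows the article.
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

rng = np.random.default_rng(42)
months = np.arange(24)
history = (1000 + 15 * months
           + 300 * np.sin(2 * np.pi * (months - 3) / 12)
           + rng.normal(0, 40, size=24))          # strictly positive, seasonal

last_year = history[-12:]

# (1) Year-1: repeat the previous year's actuals as next year's forecast
fp_year1 = last_year.copy()

# (2) ETS(AAM): additive trend with multiplicative seasonality
#     (Holt-Winters here as a stand-in for the spreadsheet's ETS model)
ets_fit = ExponentialSmoothing(history, trend="add", seasonal="mul",
                               seasonal_periods=12).fit()
fp_ets = ets_fit.forecast(12)

# (3) MAVG-12: the previous year's average, held level for twelve months
fp_mavg12 = np.full(12, last_year.mean())
```

Each profile can then be scored against the holdout actuals with the mape() and mdape() helpers sketched earlier.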
However, when you calculate the MdAPEs (left panel, row 35) for these data, you get a strikingly different and better picture of the summary performance of the three forecast profiles, Year-1, ETS(AAM), and MAVG-12: Year-1 (41%), ETS(AAM) (25%), and MAVG-12 (32%).
Using Gaussian Arithmetic is Not Always a Best Practice
What’s the story here? As they say, the information is in the DATA. There may be an outlying or unusual value in the APEs that distorts averaging with the arithmetic mean (“Gaussian arithmetic”). Is the simple average MAVG-12 really more accurate than the Year-1 forecast profile and only slightly less accurate than the ETS model forecast profile?
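A tiny, made-up numerical example shows the mechanism: a single unusual APE is enough to pull the arithmetic mean well away from the typical error, while the median barely moves.

```python
# Made-up APEs: eleven values near 30% plus a single unusual value of 320%.
import numpy as np

apes = np.array([25, 28, 30, 27, 32, 29, 31, 26, 33, 30, 28, 320], dtype=float)
print(round(np.mean(apes), 1))    # 53.2  -> the one outlier drags the MAPE up
print(round(np.median(apes), 1))  # 29.5  -> the MdAPE stays near the typical error
```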
The takeaway here is that you need to ALWAYS be able to examine the underlying data for anomalies that commonly distort or disguise results. Many software systems may not always give you that flexibility, however.
When you examine the historical data more closely with an ETS model, you can see that the data are seasonal with a seasonal peak in December (not surprisingly) and a seasonal trough in July.
Now, the peak month in the 2016 holdout data in the spreadsheet (cell F11) does not appear in December (= 49786), as expected, but rather in the following month, January 2017 (= 75069). This is not credible, so for the sake of model integrity and stability in performance measurement, the two data values are switched in the spreadsheet, and the MAPEs and MdAPEs are recalculated.
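In code, the fix itself is nothing more than swapping the two suspect values before re-scoring. The sketch below is illustrative: only the December (49786) and January (75069) figures come from the article, and the other ten holdout months are invented.

```python
import numpy as np

# Holdout actuals, Sep 2016 .. Aug 2017. Only the December (49786) and
# January (75069) figures come from the article; the rest are invented.
actuals = np.array([41000, 43500, 46000, 49786, 75069, 38000,
                    36500, 40000, 42000, 39500, 30000, 33000], dtype=float)

# The January peak is not credible, so move it back into December.
fixed = actuals.copy()
fixed[3], fixed[4] = fixed[4], fixed[3]   # positions of Dec 2016 and Jan 2017
```

The MAPEs and MdAPEs can then be recomputed against the fixed series with the same helpers as before.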
I want to capture how the MAPE changes as the holdout year progresses. So, I calculate a rolling MAPE starting in period 5 (using periods 1-5, then 1-6, and so on) and continuing to the end of the holdout period. The chart to the right of the table (top panel) summarizes the results graphically. There is no clear distinction between the performance of a level profile (MAVG-12) and the ETS(AAM) profile in the unadjusted MAPEs. However, with the December anomaly found and fixed, the MdAPEs make it clear that the ETS(AAM) forecast profile outperformed the other two during this period (lower panel). The MdAPE is outlier-resistant and gives a better picture of typical accuracy, which the MAPE cannot do when there are outliers and unusual values (even one or two!).
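One way to produce the rolling summaries, assuming the mape() and mdape() helpers sketched earlier and 12-month arrays of holdout actuals and forecasts:

```python
# Rolling summaries over an expanding window: period 5 uses months 1-5,
# period 6 uses months 1-6, and so on through the full holdout year.
def rolling_scores(actuals, forecast, start=5):
    rows = []
    for k in range(start, len(actuals) + 1):
        rows.append((k,
                     mape(actuals[:k], forecast[:k]),
                     mdape(actuals[:k], forecast[:k])))
    return rows   # list of (period, rolling MAPE, rolling MdAPE)
```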
Another takeaway: It should not be a best practice for demand planners to rely on MAPEs alone.
As I pointed out before, this forecast profile analysis with MAPEs is not appropriate for intermittent lead-time demand data. In a follow-up article, I will again use a more appropriate information-theoretic approach to forecast accuracy measurement, which is valid for both regular and intermittent demand forecasting. Moreover, you will see that the approach allows for direct comparisons between the forecasting performance at the SKU level and any corresponding summary level (product family, brand, etc.), because of the use of comparable alphabet profiles.