Scrap Your Moving Average Forecast! Improve Forecasting Performance with a NaïveLT Benchmark for Leadtime Demand Forecasting
Assessing the forecasting performance of lead-time demand forecasts has been a topic here before. A lead-time forecast is commonly used in practice for intermittent and regular demand forecasting applications, such as when
- planning product mix based on future patterns of demand at the item, product group and store level
- setting safety stock levels for SKUs at multiple locations
- evaluating accuracy in S&OP and annual budget planning meetings
- validating performance standards in forecasting competitions
In contrast to multiple one-step-ahead or short-term forecasts, a lead-time forecast is a multi-step-ahead forecast over a fixed horizon. Producing one has always been a difficult and challenging task for demand planners and managers.
In several previous articles, I used a dataset (shown below) to assess forecasting performance with the test or holdout data (in italics) in the row for year 2016.
For a twelve-month holdout sample (Sep 2016 – Aug 2017), I have created three forecasts by using (1) judgment, (2) a method and (3) a statistical model. For the judgment forecast, I used the previous year's actuals (Sep 2015 – Aug 2016) as the forecast for the holdout sample year. This forecast profile is labeled Year-1; it is also known as the Naive12 method. For the method forecast, I use a level point forecast (MAVG_12), which is simply the average of the previous 12 months of history repeated over the forecast horizon. The model forecast is based on the State Space forecasting model ETS (A,A,M), an exponential smoothing model with a local level and multiplicative seasonal forecast profile, as described in Chapter 8 of my book Change&Chance Embraced.
View Lead-time Forecasts as Forecast Profiles Rather Than a Sequence of Point Forecasts
For lead-time forecasts, we can assess the performance of forecast profiles and create objective measures of accuracy and skill performance with information theoretic concepts for performance measurement.
The actuals and the lead-time forecasts are coded, or transformed, into an Actual Alphabet Profile (AAP) and Forecast Alphabet Profiles (FAPs) by dividing the lead-time Total into each component of the respective profile. Coding yields positive fractions (weights) that sum to one and therefore have the same properties as a discrete probability distribution. You can see that coding does not change the pattern of the original profile.
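The coding step can be sketched in a few lines. The monthly values below are illustrative, not the article's dataset; any positive lead-time series works the same way.

```python
# Hypothetical 12-month lead-time actuals (illustrative values only).
actuals = [44, 52, 61, 73, 49, 38, 35, 33, 37, 41, 46, 50]

# Code the series into an alphabet profile: divide each period's value
# by the lead-time Total.
total = sum(actuals)
aap = [x / total for x in actuals]

# The weights are positive fractions that sum to one, like a discrete
# probability distribution, and the rescaling preserves the original
# pattern: the peak period of the actuals is still the peak of the AAP.
assert abs(sum(aap) - 1.0) < 1e-9
```

A FAP is produced the same way from a sequence of lead-time point forecasts.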
Data Quality Matters Because Bad Data Will Beat a Good Forecast Every Time
It is not only a best practice but should be a necessary requirement for demand planners to first explore and improve the quality of the data prior to forecast modeling. In the italicized row for year 2016, there appear to be unusual values in period 4 (December) and period 5 (January). You can also see this in the graph, where the AAP peaks one period after the FAPs.
By switching Dec 2016 (= 49786) with Jan 2017 (= 73069), the seasonal pattern in the holdout sample becomes more consistent with its history. A preliminary analysis of variance of the seasonal and trend variation over these four years suggests that making the adjustment is important: the seasonal contribution to the total variation increases from 48% to 68%, more consistent with the seasonality in the historical data and with what might be expected in future years.
Introducing the NaïveLT Benchmark Forecast
A NaïveLT benchmark can serve as a better benchmark than either the Year-1 naïve judgment forecast or the MAVG-12 constant-level method. The NaïveLT benchmark is the forecast to beat: we can use it to measure the effectiveness of the forecasting process by showing how much the method, model or judgmental forecast contributes to the overall forecasting process.
The forecast alphabet profile (FAP) for the NaïveLT benchmark is a very straightforward calculation: it is simply the average of the individual FAPs in the history, in this case Year-1 naïve, Year-2 naïve and Year-3 naïve. They have the same pattern as the lead-time forecast profiles in the Tier chart below. Note that creating the FAPs does not alter the pattern in the Tier chart; it only rescales the profiles. To determine how effective the NaïveLT benchmark is, we need to calculate the Profile Accuracy D(a|f) and the L-Skill score and evaluate which is the best contributor to overall forecasting performance.
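The averaging step above can be sketched as follows. The three yearly series are illustrative placeholders, not the article's data; the point is that each year is first coded into a FAP and the benchmark is the period-by-period mean of those FAPs.

```python
# Three years of hypothetical 12-month history (illustrative values).
year1 = [44, 52, 61, 73, 49, 38, 35, 33, 37, 41, 46, 50]
year2 = [40, 50, 58, 70, 47, 36, 34, 31, 35, 40, 44, 48]
year3 = [42, 51, 60, 75, 50, 39, 36, 34, 36, 42, 45, 49]

def to_fap(series):
    """Rescale a lead-time series into a forecast alphabet profile."""
    total = sum(series)
    return [x / total for x in series]

faps = [to_fap(y) for y in (year1, year2, year3)]

# The NaiveLT benchmark FAP is the period-by-period average of the
# yearly FAPs; an average of profiles that each sum to one also sums to one.
naive_lt = [sum(col) / len(col) for col in zip(*faps)]
assert abs(sum(naive_lt) - 1.0) < 1e-9
```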
Assessing the Performance of Forecast Profiles
A forecast profile error (FPE) is measured by the difference between the corresponding components of a forecast alphabet profile and the actual alphabet profile.
Profile Miss. A Profile Miss can be interpreted as a measure of ignorance about the forecast profile errors (FPE). The closer to zero the better, and the sign indicates over- or under-forecasting. The units are known as ‘nats’ (for natural logarithms) or ‘bits’ when logarithms to base 2 are used (e.g., in climatology applications); I prefer nats for lead-time demand forecasting. Thus, a forecast Profile Miss measures how much a forecast profile (or alphabet pattern) differs from a profile of actuals over a fixed horizon. For alphabet profiles, there is a measure H(a) that gives the information about the actual alphabet profile (AAP) and a measure H(f) that gives the information about a forecast alphabet profile (FAP). Hence, FAP Miss = H(f) – H(a):
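As a sketch, and under the assumption that H(·) is the Shannon information measure in nats (the source does not spell out the formula), the FAP Miss could be computed like this. The two profiles are illustrative placeholders.

```python
import math

def entropy_nats(profile):
    """Shannon information measure -sum(p ln p), in nats, of an alphabet profile.
    Assumed interpretation of H(.) -- not confirmed by the article."""
    return -sum(p * math.log(p) for p in profile if p > 0)

# Illustrative AAP and FAP: positive weights that each sum to one.
aap = [0.10, 0.15, 0.20, 0.25, 0.10, 0.20]
fap = [0.12, 0.14, 0.18, 0.26, 0.12, 0.18]

# FAP Miss = H(f) - H(a); closer to zero is better, and the sign
# indicates over- or under-forecasting.
profile_miss = entropy_nats(fap) - entropy_nats(aap)
```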
Profile Accuracy. The accuracy of a forecast profile can be measured by a ‘distance’ between a forecast alphabet profile (FAP) and the actual alphabet profile (AAP), given by the Kullback-Leibler divergence measure D(a|f). The D(a|f) measure is non-negative and equals zero if and only if a(i) = f(i) for i = 1, 2, . . . , m. When D(a|f) = 0, the alphabet profiles overlap, which we regard as 100% accuracy.
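The Kullback-Leibler divergence has a standard closed form, D(a|f) = Σ a(i) ln(a(i)/f(i)), which a short sketch can verify has the two properties just stated (non-negativity, and zero only for overlapping profiles). The profiles below are illustrative.

```python
import math

def profile_accuracy(aap, fap):
    """Kullback-Leibler divergence D(a|f) = sum a(i) ln(a(i)/f(i)), in nats."""
    return sum(a * math.log(a / f) for a, f in zip(aap, fap) if a > 0)

# Illustrative AAP and FAP: positive weights that each sum to one.
aap = [0.10, 0.15, 0.20, 0.25, 0.10, 0.20]
fap = [0.12, 0.14, 0.18, 0.26, 0.12, 0.18]

d_af = profile_accuracy(aap, fap)
assert d_af >= 0                        # D(a|f) is non-negative
assert profile_accuracy(aap, aap) == 0  # zero only when the profiles overlap
```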
L-Skill score. The Profile Accuracy measure D(a|f) can be decomposed into two components: (1) a forecast Profile Miss and (2) a forecast Profile Relative Skill measure. Thus, Profile Miss = Profile Relative Skill + Profile Accuracy. This means that accurately hitting a target involves both skill at aiming and a score for how far from the target the darts strike the dartboard. You won’t win many medals just by accurately hitting any spot on the dartboard, and becoming more accurate does not necessarily improve your standing in the process. You also need to improve your skill score in order to get closer to the real target.
The Relative Skill measure is greater than zero in absolute value and excludes zero unless the forecast profile errors are zero. The smaller in absolute value, the better the relative skill. When a FAP is constant, as with the Croston methods, SES and MAVG-12 point forecasts, the relative skill is always equal to zero, meaning that with constant-level methods a zero forecast Profile Miss is not possible. That is one good reason to scrap the MAVG-12 forecast, or to use it only as a benchmark.
For the NaïveLT benchmark, we calculate Profile Accuracy = 0.052, which is better than MAVG-12 and almost as good as the ETS(A,A,M) model. The Profile Relative Skill = –0.014 is also better than MAVG-12, though not as effective as the ETS(A,A,M) model.
The Levenbach L-Skill score is the ratio of the Profile Relative Skill measure to the Profile Accuracy. For the NaïveLT benchmark this is –0.264, which is better than the ETS(A,A,M) model on the unadjusted data, but not after the data quality issue is resolved. This makes the NaïveLT a good benchmark for lead-time forecasting, as it requires no modeling assumptions. And data quality matters!
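Putting the three measures together, a minimal sketch of the full calculation might look like this. It assumes the decomposition stated above (Miss = Relative Skill + Accuracy) and the Shannon interpretation of H(·) in nats; the profiles are illustrative, so the numbers will not reproduce the article's 0.052 or –0.264.

```python
import math

def entropy_nats(p):
    """Assumed Shannon measure -sum(p ln p), in nats."""
    return -sum(x * math.log(x) for x in p if x > 0)

def kl_div(a, f):
    """Kullback-Leibler divergence D(a|f) in nats."""
    return sum(x * math.log(x / y) for x, y in zip(a, f) if x > 0)

# Illustrative AAP and FAP (positive weights summing to one).
aap = [0.10, 0.15, 0.20, 0.25, 0.10, 0.20]
fap = [0.12, 0.14, 0.18, 0.26, 0.12, 0.18]

profile_miss = entropy_nats(fap) - entropy_nats(aap)
accuracy = kl_div(aap, fap)               # Profile Accuracy D(a|f)
relative_skill = profile_miss - accuracy  # from Miss = Skill + Accuracy
l_skill = relative_skill / accuracy       # the L-Skill score (a ratio)
```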
How the NaïveLT Benchmark Performs
In my practice, I have often seen demand planners start a forecast by taking last year’s actuals, applying a growth percentage to the Total, and prorating the adjusted Total back to the individual periods. Sound familiar? Here are the results.
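That planner heuristic can be sketched in a few lines. The series and the 5% growth target are illustrative assumptions, not figures from the article.

```python
# Hypothetical last-year actuals and an assumed 5% growth target.
last_year = [44, 52, 61, 73, 49, 38, 35, 33, 37, 41, 46, 50]
growth = 0.05

# Grow the Total, then prorate it back to periods using last year's
# alphabet profile (each period's share of last year's Total).
new_total = sum(last_year) * (1 + growth)
profile = [x / sum(last_year) for x in last_year]
forecast = [w * new_total for w in profile]

# Each period is simply scaled by the same growth factor, so the FAP --
# and hence the forecast pattern -- is identical to last year's.
```

Because the pattern is unchanged, this heuristic behaves like the Year-1 naïve profile in the profile-accuracy framework; only the level moves.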
Hans Levenbach, PhD, is Owner/CEO of Delphus, Inc. and Executive Director of the CPDF Professional Development Training and Certification Programs. Dr. Hans is the author of a new book, Change&Chance Embraced, on demand forecasting in the supply chain, and creates and conducts hands-on Professional Development Workshops on Demand Forecasting and Planning for multi-national supply chain companies worldwide. Hans is a Past President, Treasurer and former member of the Board of Directors of the International Institute of Forecasters.