The New L-Skill Score: An Objective Way to Assess the Effectiveness and Accuracy of a Lead-time Demand Forecasting Process
Consumer demand-driven historical data are characterized to a large extent by seasonal patterns (consumer habits: economics) and trends (consumer demographics). For the sample data, this can be readily shown using the ANOVA: Two-Factor Without Replication option in the Excel Data Analysis Add-in. These data have an unusual value in January 2016, identified in a previous article and on my website blog. When that value is adjusted, it has a significant impact on the variation attributed to seasonality (consumer habits), while the unexplained (residual) variation is reduced.
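The two-factor ANOVA mentioned above can be sketched in Python with NumPy. This is a minimal illustration, not the Excel add-in itself. The function name `anova_two_factor_no_replication` is hypothetical, rows are assumed to be years and columns months, and the demand numbers are synthetic rather than the article's sample data. Removing a single injected spike shows how one unusual value inflates the residual variation and masks the seasonal (month) effect.

```python
import numpy as np

def anova_two_factor_no_replication(table):
    """Two-factor ANOVA without replication (rows = years, columns = months).

    Hypothetical helper: returns sums of squares and F ratios in the spirit
    of Excel's 'Anova: Two-Factor Without Replication' output.
    """
    x = np.asarray(table, dtype=float)
    r, c = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1, keepdims=True)        # one mean per year
    col_means = x.mean(axis=0, keepdims=True)        # one mean per month
    ss_rows = c * ((row_means - grand) ** 2).sum()   # year-to-year (trend) variation
    ss_cols = r * ((col_means - grand) ** 2).sum()   # month-to-month (seasonal) variation
    residuals = x - row_means - col_means + grand
    ss_error = (residuals ** 2).sum()                # unexplained (residual) variation
    ss_total = ((x - grand) ** 2).sum()
    ms_error = ss_error / ((r - 1) * (c - 1))
    return {
        "ss_rows": ss_rows,
        "ss_cols": ss_cols,
        "ss_error": ss_error,
        "ss_total": ss_total,
        "F_rows": ss_rows / (r - 1) / ms_error,
        "F_cols": ss_cols / (c - 1) / ms_error,
    }

# Synthetic 3 years x 12 months: level + growth + seasonality + small noise.
rng = np.random.default_rng(0)
months = np.arange(12)
clean = (100
         + np.array([[0.0], [10.0], [20.0]])        # hypothetical year-over-year growth
         + 20 * np.sin(2 * np.pi * months / 12)     # hypothetical seasonal pattern
         + rng.normal(0, 1, size=(3, 12)))          # irregular noise
spiked = clean.copy()
spiked[1, 0] += 100   # hypothetical one-off unusual value; column 0 plays "January"

# "adjusted" here simply removes the spike; in practice it would be replaced
# by a reasonable estimate identified during exploratory data analysis.
for label, data in (("with unusual value", spiked), ("adjusted", clean)):
    res = anova_two_factor_no_replication(data)
    print(f"{label}: F(months) = {res['F_cols']:.1f}, SS(error) = {res['ss_error']:.1f}")
```

With the spike in place, most of the seasonal signal is absorbed into the error term, so the month F ratio collapses.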
Demand forecasting in today’s disrupted consumer demand-driven supply chain environment has become an extremely challenging discipline. When demand occurs sporadically, a situation that already existed in pre-‘new normal’ times, the challenge becomes even greater. Intermittent data (also known as sporadic demand) arise when a product experiences periods of zero demand. This has become more common during the pandemic supply chain disruptions, especially in the retail industry. The Levenbach L-Skill score, introduced here, is applicable to ‘regular’ as well as intermittent lead-time demand forecasting processes.
First Takeaway: Embrace change & chance by improving data quality through exploratory data analysis (EDA) as an essential preliminary step in improving forecasting performance.
An Objective Approach to Lead-time Demand Forecasting Performance Evaluation
When assessing forecasting performance, standard measures of forecast accuracy can be distorted because they rely on normality (Gaussianity) assumptions that are not robust, even in a ‘regular’ forecasting environment. In many situations, a single outlier or a few unusual values among the numbers making up the accuracy measure can make the result unrepresentative and misleading. The arithmetic mean can ordinarily be trusted as a representative or typical value only when the data are normally distributed. It can become a very poor measure of central tendency even for slightly non-Gaussian data. This is not widely realized and is commonly ignored in practice. Its impact on forecasting best practices needs to be more widely recognized among demand planners and forecasting practitioners.
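A small numerical illustration of the point about the arithmetic mean follows, using hypothetical monthly data in which one month has a very small actual. The median absolute percentage error (MdAPE) appears only as one familiar outlier-resistant summary. It is not the measure proposed in this article.

```python
import statistics

def absolute_percentage_errors(actuals, forecasts):
    """APE in percent for each period (assumes no zero actuals)."""
    return [100 * abs(a - f) / abs(a) for a, f in zip(actuals, forecasts)]

# Hypothetical twelve months: eleven forecasts are 10% off, one month has a tiny actual.
actuals = [100] * 11 + [10]
forecasts = [110] * 11 + [60]

apes = absolute_percentage_errors(actuals, forecasts)
mape = statistics.mean(apes)      # arithmetic mean: pulled up by the single 500% APE
mdape = statistics.median(apes)   # median: a simple outlier-resistant alternative
print(f"MAPE = {mape:.1f}%, MdAPE = {mdape:.1f}%")   # MAPE = 50.8%, MdAPE = 10.0%
```

One unusual month is enough to move the "typical" error from 10% to about 51%.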
Imagine a situation in which you need to hit a target on a dartboard with twelve darts, each one representing a month of the year. Your darts may end up clustered tightly in one spot, but three rings away from the center. Your partner throws twelve darts that all strike within the first ring but are somewhat scattered around the center of the dartboard. Who is the best dart thrower in this contest? It turns out that it depends not only on how far the darts land from the center but also on the precision of the thrower.
In several recent articles on my LinkedIn profile and the Delphus website blog, I laid out an information-theoretic approach to lead-time demand forecasting performance evaluation for both intermittent and regular demand. In the spreadsheet example, the twelve months starting September 2016 were held out as a test (holdout) sample. The preceding data were used to forecast the holdout period with three methods: (1) the previous year’s twelve monthly actuals (Year-1) as a forecast of the holdout data, (2) a trend/seasonal exponential smoothing model ETS(A,A,M), and (3) the twelve-month average of the previous year (MAVG-12). The twelve monthly point forecasts of the holdout sample are called the Forecast Profile (FP). Previous results show that the Mean Absolute Percentage Errors (MAPEs) of the three methods are about the same, around 50%. That is not great, but probably typical, especially at the SKU-location level.
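The two simplest benchmark methods can be sketched as follows. This is an illustrative sketch only: the function names and data are hypothetical. ETS(A,A,M) is omitted because it requires fitting a model, for example with a statistics library, rather than applying a one-line rule.

```python
import numpy as np

def year_minus_1(history, horizon=12):
    """Year-1 method: last year's twelve actuals are used as the forecast profile."""
    return np.asarray(history[-12:], dtype=float)[:horizon]

def mavg_12(history, horizon=12):
    """MAVG-12 method: last year's twelve-month average, repeated over the horizon."""
    return np.full(horizon, np.mean(history[-12:]), dtype=float)

def mape(actual, forecast):
    """Mean absolute percentage error in percent (assumes no zero actuals)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100 * np.mean(np.abs(actual - forecast) / np.abs(actual))

# Hypothetical data: the year before the holdout, and the twelve holdout months.
previous_year = [12, 15, 20, 30, 45, 60, 58, 40, 28, 20, 14, 10]
holdout = [14, 16, 25, 33, 40, 66, 61, 45, 25, 22, 15, 12]

for name, fp in (("Year-1", year_minus_1(previous_year)),
                 ("MAVG-12", mavg_12(previous_year))):
    print(f"{name}: MAPE = {mape(holdout, fp):.1f}%")
```

Note that MAVG-12 produces a flat forecast profile. This matters for the L-Skill score discussed below.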
We now want to examine the information-theoretic formulation in more detail in order to derive a skill score (how precisely we can get the darts to land in the same spot on the dartboard). We can then use it to measure the effectiveness of the forecasting process, or of the forecaster, in terms of how much the method, model or forecaster contributes to the benefit of the forecasting process!
Creating an Alphabet Profile
For a given Forecast Profile FP, the Forecast Alphabet Profile FAP is a set of positive weights that sum to one. That is, a Forecast Alphabet Profile is a set of m positive fractions FAP = [f(1), f(2), ..., f(m)], where each element f(i) is defined by
$$ f(i) = \frac{FP(i)}{\sum_{j=1}^{m} FP(j)}, \quad i = 1, \dots, m, $$
with FP(i) the i-th point forecast in the Forecast Profile.
In other words, we simply divide each point forecast by the sum of the forecasts over the horizon (here m = 12). This will give us fractions whose sum equals one.
Likewise, the Actual Alphabet Profile is AAP = [a(1), a(2), ..., a(m)], where a(i) is each actual value divided by the lead-time total of the actuals, just as f(i) is each forecast value divided by the lead-time total of the forecasts. When you code a forecast profile FP into the corresponding alphabet profile FAP, the forecast pattern does not change, because every point forecast is scaled by the same constant.
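Coding a profile into an alphabet profile takes a single division by the lead-time total. Here is a minimal NumPy sketch, with a hypothetical helper name `alphabet_profile` and made-up numbers:

```python
import numpy as np

def alphabet_profile(values):
    """Divide each value by the lead-time total so the fractions sum to one."""
    v = np.asarray(values, dtype=float)
    total = v.sum()
    if total <= 0:
        raise ValueError("lead-time total must be positive")
    return v / total

fp = [10, 20, 30, 40]           # hypothetical forecast profile (m = 4 for brevity)
actual = [12, 18, 35, 35]       # hypothetical actuals over the same horizon
fap = alphabet_profile(fp)      # fractions 0.1, 0.2, 0.3, 0.4
aap = alphabet_profile(actual)  # fractions 0.12, 0.18, 0.35, 0.35
print(fap, fap.sum())
```

Multiplying the FAP back by the lead-time total recovers the original FP, which is why the pattern is unchanged.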
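For profile forecasting performance, we use a measure of information H, which has a number of interpretations in different application areas such as climatology and machine learning. The information about the actual alphabet profile (AAP) is H(AAP), written H(a), and the information about a forecast alphabet profile (FAP) is H(FAP), written H(f). Both are entropy measures. There is also a measure of information H(a|f) that links the two profiles. It is the cross-entropy, in which the information in each forecast fraction is weighted by the corresponding actual fraction:
$$ H(a) = -\sum_{i=1}^{m} a(i)\log a(i), \quad H(f) = -\sum_{i=1}^{m} f(i)\log f(i), \quad H(a|f) = -\sum_{i=1}^{m} a(i)\log f(i). $$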
Accuracy of the forecast profile was previously defined as the Kullback-Leibler divergence D(a|f) = H(a|f) − H(a) = Σ a(i) log[a(i)/f(i)]. This quantity is never negative and is zero only when the two profiles coincide. If we rewrite
$$ D(a|f) = [H(a|f) - H(f)] + [H(f) - H(a)], $$
we obtain a decomposition of profile accuracy into two components: (1) the Forecast Profile Error FPE = H(f) − H(a), and (2) a skill score.
I define the Levenbach L-Skill score = H(f) − H(a|f), the negative of the first bracketed term in the expression for D(a|f).
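As shown in the diagram, the information about the FAP gets transformed into the information about the AAP by the relation
$$ H(a) = H(f) - \text{L-Skill} - D(a|f). $$
Subtracting the L-Skill score from H(f) gives H(a|f), and subtracting D(a|f) from H(a|f) gives H(a).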
Another way of looking at this is: Forecast Profile Error (Bias) = Profile Accuracy + L-Skill score, that is, FPE = D(a|f) + L-Skill. In words, accurately hitting the bullseye requires both skill at aiming and a measure of how far from the bullseye the darts strike the board.
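The entropy measures and the identity FPE = D(a|f) + L-Skill can be checked numerically. This sketch assumes natural logarithms (the identities hold for any log base) and strictly positive values. Months with zero demand, as in intermittent data, would need separate handling that is not shown here. The helper names and data are hypothetical.

```python
import numpy as np

def alphabet_profile(values):
    """Hypothetical helper: divide each value by the lead-time total."""
    v = np.asarray(values, dtype=float)
    return v / v.sum()

def profile_measures(actuals, forecasts):
    """Entropy-based profile measures (natural log; all values assumed > 0)."""
    a = alphabet_profile(actuals)
    f = alphabet_profile(forecasts)
    h_a = -np.sum(a * np.log(a))    # H(a): entropy of the actual alphabet profile
    h_f = -np.sum(f * np.log(f))    # H(f): entropy of the forecast alphabet profile
    h_af = -np.sum(a * np.log(f))   # H(a|f): cross-entropy of actuals coded with forecasts
    return {
        "H(a)": h_a,
        "H(f)": h_f,
        "H(a|f)": h_af,
        "D(a|f)": h_af - h_a,       # profile accuracy (Kullback-Leibler divergence)
        "FPE": h_f - h_a,           # Forecast Profile Error
        "L-Skill": h_f - h_af,      # L-Skill score
    }

# Hypothetical twelve-month actuals and forecast profile.
actuals = [14, 16, 25, 33, 40, 66, 61, 45, 25, 22, 15, 12]
forecast = [12, 15, 20, 30, 45, 60, 58, 40, 28, 20, 14, 10]
m = profile_measures(actuals, forecast)
for name, value in m.items():
    print(f"{name:8s} {value: .4f}")

# FPE = D(a|f) + L-Skill holds by construction.
print(np.isclose(m["FPE"], m["D(a|f)"] + m["L-Skill"]))
```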
The L-Skill score can be positive or negative, since L-Skill = Σ [a(i) − f(i)] log f(i). The smaller, in absolute value, the better the L-Skill score. When a FAP is constant, as with the Croston methods, SES and MAVG-12 point forecasts, every f(i) = 1/m. Then H(f) = H(a|f) = log m, and the L-Skill score = 0. With those methods, the Forecast Profile Error reduces to D(a|f). An unbiased profile (a Profile Bias of zero) is therefore not possible unless the actuals are themselves perfectly flat.
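Writing L-Skill = H(f) − H(a|f) in the cross-entropy form gives L-Skill = Σ [a(i) − f(i)] log f(i). From this form, the claim about constant forecast profiles can be verified directly. The short sketch below assumes natural logarithms and profiles that already sum to one. The second and third calls use made-up two-period profiles only to show that the score can take either sign.

```python
import numpy as np

def l_skill(a, f):
    """L-Skill = H(f) - H(a|f) = sum_i (a_i - f_i) * ln f_i, for profiles summing to one."""
    a = np.asarray(a, dtype=float)
    f = np.asarray(f, dtype=float)
    return float(np.sum((a - f) * np.log(f)))

flat = np.full(12, 1 / 12)   # constant FAP, as produced by MAVG-12, SES or Croston
a = np.array([14, 16, 25, 33, 40, 66, 61, 45, 25, 22, 15, 12], dtype=float)
a /= a.sum()                 # hypothetical actual alphabet profile

print(l_skill(a, flat))                    # zero up to floating-point rounding
print(l_skill([0.5, 0.5], [0.9, 0.1]))     # negative (about -0.879)
print(l_skill([0.95, 0.05], [0.9, 0.1]))   # positive (about 0.110)
```

For the flat profile, log f(i) is the same constant for every i, so the sum collapses to log(1/m) × (Σ a(i) − Σ f(i)) = 0.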
For monitoring forecasts on an ongoing basis, it might be useful to create an L-Skill score index, defined by 100 − |L-Skill score|. We can then track the paths of the methods, models and judgmental overrides used in the forecasting process over time, along with the Forecast Profile Accuracy (FPA) index introduced in a previous article and blog. In a forthcoming article, I will embrace change and chance again by using the prediction limits for the most effective exponential smoothing model (ESM) to create zones for excellent, good, adequate, poor and unacceptable L-Skill scores.
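The index itself is simple arithmetic. Below is a hypothetical sketch of tracking it per forecast cycle. It assumes the raw L-Skill scores are already computed, for instance in natural-log units, in which case the index stays close to 100 unless the scores are rescaled. The method labels and scores are invented for illustration.

```python
def l_skill_index(score):
    """L-Skill score index: 100 - |L-Skill score|."""
    return 100 - abs(score)

# Hypothetical L-Skill scores for three forecast cycles per method (invented numbers).
history = {
    "method A": [-0.12, -0.08, -0.15],
    "method B": [-0.05, 0.03, -0.02],
    "judgmental override": [0.20, 0.10, 0.25],
}
for method, scores in history.items():
    print(method, [round(l_skill_index(s), 2) for s in scores])
```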
In the context of a multistep-ahead forecasting process with a fixed horizon, we can use the L-Skill score to assess the contribution of a method, model or forecaster to the performance of the forecasting process. In theory, statistical forecasting models are designed to be unbiased, but that property may not hold in practice, particularly for ‘fixed horizon’ lead-time demand forecasting. Moreover, multiple one-step-ahead forecasts have little practical value in lead-time demand forecasting, because the lead-time is the ‘frozen’ time window in which operational changes usually cannot be made.
Try it out on some of your own data, see for yourself what biases and performance issues you have in your lead-time demand forecasts, and share your comments with me in the meantime. I think the results depend on the context and application, so be as specific as you can.
Hans Levenbach, PhD, is Executive Director, CPDF Professional Development Training and Certification Programs. Dr. Levenbach is the author of a new book (Change&Chance Embraced) on Demand Forecasting in the Supply Chain. He created and conducts hands-on Professional Development Workshops on Demand Forecasting and Planning for multi-national supply chain companies worldwide. Hans is a Past President, Treasurer and former member of the Board of Directors of the International Institute of Forecasters.
Embracing Change and Chance. The examination would not be complete without considering the performance of the forecasting process or forecaster, as well as the bounds for the new Levenbach L-Skill score measure of performance. This topic is included in my latest book revision, available on Amazon.