### Do You Need Something More Credible than the Croston Method for Forecasting Items with Intermittent Demand?

**Data quality in forecast errors** and other sources of unusual data values should never be ignored in demand forecast modeling and **accuracy measurement**, especially when forecasting intermittent demand. **Intermittent demand** or ID (also known as **sporadic demand**) comes about when a product experiences several periods of zero **demand**. Often in these situations, when **demand** occurs it is small, and sometimes highly variable in size. There are a number of ways in which intermittent demand comes up in practice, for example, when

· creating lead time demand forecasts for inventory planning.

· creating multiple forecasts of low volume items for a single period in the future, based on a rolling forecast horizon, as in an annual budget or production cycle.

· creating forecasts for multiple periods in an 18-month business planning horizon.

Before making any statistical or probability assumptions for a forecasting problem, we should first thoroughly **explore the underlying historical data** patterns within the context of a particular application. The spreadsheet table shows the intermittent product demand series for a SKU and location in a population health forecasting application at a NYC area hospital from Jan 2015 to Apr 2018. We are asked to make a forecast through Dec 2019. Similar data may occur in the retail industry at doors (stores) in different regions of the country.

The first step is to explore the nature of the interdemand intervals and the distribution of nonzero demand sizes in relation to interdemand interval duration. Starting from Feb 2015, we examine the dependence relationship between intervals and demand size. I define a "**L**ag-time **Z**ero **I**nterval" (LZI) as the interval duration *preceding* a demand size. The results are shown below. There are three types of LZI interval durations in these data. For example, the first LZI interval has one zero preceding the demand size 211. The next LZI interval has two zeros preceding the demand value 458. The next one has none, and so on. If the process had started in Jan 2015 with a new product introduction, for instance, we could add demand size 390 to the one-zero LZI bucket. However, this should not make a material difference in an ongoing intermittent demand forecasting process.
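The LZI bookkeeping described above is easy to automate. Here is a minimal sketch in Python; the helper name `lzi_buckets` and the short demand series are illustrative stand-ins, not the article's spreadsheet data.

```python
def lzi_buckets(demand):
    """Pair each nonzero demand size with the number of zero periods
    immediately preceding it (its Lag-time Zero Interval, LZI)."""
    pairs = []
    zeros = 0
    for d in demand:
        if d == 0:
            zeros += 1          # extend the current run of zero-demand periods
        else:
            pairs.append((zeros, d))
            zeros = 0           # reset the run after a nonzero demand
    return pairs

# Toy series echoing the article: 211 preceded by one zero, 458 by two zeros.
series = [390, 0, 211, 0, 0, 458, 175]
print(lzi_buckets(series))  # [(0, 390), (1, 211), (2, 458), (0, 175)]
```

Grouping the resulting pairs by their LZI duration then gives the conditional demand-size distributions that the rest of the article builds on.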

Treating the data in this way differs from a Croston-based method in that our approach does not assume independence between intervals and demand sizes.

### A Structured Inference Base (SIB) Model for Forecasting Intermittent Demand

Similar to my previous LinkedIn article on bias measurement in forecasting, I will introduce a SIB algorithmic model for forecasting intermittent demand sizes based on their dependence on an LZI distribution.

A *location-scale* measurement error model can be viewed as a black box model. The nonzero demand size ID in an intermittent demand history can be represented as the output ID = β + σ ɛ, in which β and σ are unknown constants and the input ɛ is a measurement error with a known or assumed distribution. This equation can be rewritten as ID = β (1 + σ/β ɛ).

If Ln refers to the natural logarithm, then the logarithm of the demand sizes can be transformed into a *location* measurement error model, known as a *simple measurement model*. The location parameter is the unknown constant Ln β in the equation Ln ID = Ln β + {Ln (1 + σ/β ɛ)}. The term inside the curly brackets represents the measurement error in this model. Thus, Ln ID = β* + ɛ* is a SIB location measurement model with error ɛ* = Ln (1 + λ ɛ) and location parameter β* = Ln β. We will call σ/β a *shape parameter* λ for the ɛ* distribution.
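A quick numerical check of the identity Ln ID = β* + ɛ*, sketched under assumed values (β = 300, σ = 60, and a single realized error of 0.8; all three are illustrative, not taken from the article's data):

```python
import math

# Log-scale location model with assumed values:
# β = 300, σ = 60, so λ = σ/β = 0.2 and β* = Ln β.
beta, sigma = 300.0, 60.0
lam = sigma / beta                  # shape parameter λ = σ/β
beta_star = math.log(beta)          # location parameter β* = Ln β

eps = 0.8                           # one realized measurement error (illustrative)
ID = beta * (1 + lam * eps)         # ID = β(1 + λε), identical to β + σε
eps_star = math.log(1 + lam * eps)  # ε* = Ln(1 + λε)

# The two forms of the model agree: Ln ID = β* + ε*.
assert abs(math.log(ID) - (beta_star + eps_star)) < 1e-12
assert abs(ID - (beta + sigma * eps)) < 1e-12
```

The check confirms that the multiplicative rewrite ID = β(1 + λɛ) and the additive log-scale form carry exactly the same information.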

In practice, when there are more interdemand interval durations, the error variable ɛ* may need to be represented by a multivariate measurement error for each possible LZI duration in the data, but for now we will assume ɛ*(τ) will depend on a typical LZI, represented by a single constant τ.

Keeping in mind the pervasive presence of outliers and non-normal variation in the real-world intermittent demand forecasting environment, we will have to shy away from the normal distribution. Rather, we will assume a *flexible family* of distributions for ɛ*, known as the **exponential family**. It contains many familiar distributions, including the normal (Gaussian) distribution, as well as ones with thicker tails and skewness. There are also some technical reasons, besides its flexibility, for selecting the **exponential family**.

### The SIB Model Approach: Phase I

The SIB approach is **algorithmic and data-driven**, in contrast to a conventional *data-model with normality assumptions*. The simple measurement model is known as a *location* model because of its structure. What can be learned about the measurement process given *this* black box model and the data?

The black box model Ln ID = β* + ɛ*(τ, λ) shows that the output Ln ID results from an input measurement error ɛ*(τ, λ) shifted by a constant amount β*, in which the *conditioned* measurement error distribution depends on a fixed shape parameter λ and a typical lagtime interdemand interval τ.

This simple measurement model and its generalizations were worked out over four decades ago and can be found in a book by D.A.S. Fraser, entitled **Inference and Linear Models**, and also in a number of Fraser’s academic journal articles dealing with statistical inference and likelihood methods. (*Statistical inference* refers to the theory, methods, and practice of forming judgments about the parameters of a population and the reliability of statistical relationships.)

Starting with the SIB location model, we can analyze the model for demand size ID as follows:

In practice, we have multiple measurements {ID1, ID2, ID3, . . . , IDn} of observed demand sizes over a time horizon, as shown in the spreadsheet example with n = 22. The output of the location measurement model is

Ln ID1 = β* + ɛ*1(τ, λ)

Ln ID2 = β* + ɛ*2(τ, λ)

Ln ID3 = β* + ɛ*3(τ, λ)

. . .

Ln IDn = β* + ɛ*n(τ, λ)

where ɛ*(τ, λ) = {ɛ*1(τ, λ), ɛ*2(τ, λ), ɛ*3(τ, λ), . . . , ɛ*n(τ, λ)} are now *n realizations* of measurement errors from an assumed distribution in the exponential family with fixed shape parameter λ and lagtime interdemand interval τ.

The question now is: what information can we uncover about the black box process? This is where it gets interesting, and it may appear unfamiliar to those who have been through a statistics course on inferential methods.

If you could explore the innards of the black box as a detective, you would discover that, based on the data, there is now information about the unknown, but *realized* measurement errors ɛ*(τ, λ). This revelation will guide us to the next important SIB modeling phase, namely a decomposition of the measurement error distribution into two components: (1) a marginal distribution for the *observed* components with fixed (τ, λ) and (2) a conditional distribution (based on the observed components) for the remaining unknown measurement error distribution, which does not depend on (τ, λ). So, what are these observed components of the error distribution that we could expose?

This is the insight or essential information gleaned from the structure of the black box and the recorded output data. If we select a *location metric* like the arithmetic mean, median, or smallest value (first order statistic), we can make a calculation that provides observable information about the measurement process. Let's call this location metric m (**ID***), where ID* = Ln ID. Then the black box process reveals (after substitution and some simple manipulation, and leaving out the (τ, λ) notation) that

ID*1 – m (**ID***) = β* + ɛ*1 – m (β* + **ɛ***)

= β* + ɛ*1 – β* – m (**ɛ***)

= ɛ*1 – m (**ɛ***)

ID*2 – m (**ID***) = ɛ*2 – m (**ɛ***)

ID*3 – m (**ID***) = ɛ*3 – m (**ɛ***)

. . .

ID*n – m (**ID***) = ɛ*n – m (**ɛ***)

Notice that the *left-hand side* of each equation can be calculated from the data, so the *right-hand side* is information about a realized measurement error ɛ* and its distribution. What is known we can condition on, so we can derive a conditional distribution given the *known* error component and a marginal distribution for the known error component with fixed shape parameter λ and lagtime interdemand interval τ. We do not need to go further into the details at this time. This is doable, albeit computationally intensive. It is somewhat like what a gambler can do knowing the odds in a blackjack game: you can make calculations and inferences from what you observe in the dealt cards. We do the same thing here with our black "inference game" box.
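The pivotal step above can be verified numerically. In this sketch the errors are synthetic (a uniform distribution, purely for convenience) and n = 22, as in the spreadsheet; the point is that the observable left-hand sides match the error-only right-hand sides exactly:

```python
import math
import random
import statistics

# Synthetic check that ID*_i - m(ID*) = ε*_i - m(ε*): the left side is
# computable from the data alone, while the right side involves only the
# realized errors, so these residuals carry information about the error
# distribution but not about the location parameter β*.
random.seed(7)
beta_star = math.log(300.0)                                 # assumed β* = Ln β
eps_star = [random.uniform(-0.3, 0.3) for _ in range(22)]   # n = 22 realized errors
ID_star = [beta_star + e for e in eps_star]                 # Ln ID_i = β* + ε*_i

m = statistics.mean              # any translation-equivariant location metric works
lhs = [x - m(ID_star) for x in ID_star]                     # observable residuals
rhs = [e - m(eps_star) for e in eps_star]                   # error residuals
assert all(abs(a - b) < 1e-9 for a, b in zip(lhs, rhs))
```

Swapping `statistics.mean` for `statistics.median` or `min` leaves the assertion intact, which anticipates the point about the choice of metric below.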

To convince yourself that the above equations work, think of the metric m (.) as the arithmetic mean or the median; as you add a constant amount to each value, the mean or median is shifted by the same constant amount. In other words, m (*a* + **x**) = *a* + m (**x**), where *a* is a constant and **x** = (x1, x2, x3, . . . , xn) are the data. As it turns out, the choice of the metric is not important in the inferential procedure, as long as m (.) has the above property. Some metrics may turn out to be easier to process in the computations than others, depending on the choice of the error distribution.
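A three-line check of the translation-equivariance property m (*a* + **x**) = *a* + m (**x**) for the mean, median, and minimum (the data values here are arbitrary):

```python
import statistics

# Translation equivariance: shifting every data value by a shifts the metric by a.
x = [2.0, 5.0, 9.0, 4.0]
a = 3.0
shifted = [a + v for v in x]

assert abs(statistics.mean(shifted) - (a + statistics.mean(x))) < 1e-12
assert abs(statistics.median(shifted) - (a + statistics.median(x))) < 1e-12
assert min(shifted) == a + min(x)   # the first order statistic qualifies too
```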

The useful information we can derive from this analysis is a decomposition of the measurement error distribution. The analysis will yield a (conditional) posterior distribution for the unknown parameter β* from which we can derive unique confidence bounds for the unknown constant and related likelihood inferences for the (τ, λ) constants.

I will not bore you with the mathematical details here because, as a graduate student, I struggled with the derivations along with my peers. It was all math back then. Now, as data scientists, we can compute using real data.

The SIB model is well documented in two of D.A.S. Fraser's books along with his peer-reviewed academic journal articles. Because the results do not lend themselves to simple theoretical formulae (except for the normal distribution), they have not seen much daylight in practice until data science came into its own. I was trained as a Data Scientist early in my career, but back then it was called Applied Statistics. Nowadays, inferential modeling can be handled in an empirically rich, fast computing environment. In other words, SIB modeling is a practical approach that was not possible in my grad school days. With modern computing power, we can now begin to show what we *should* be doing for intermittent demand forecasting, not necessarily what we *could* do based on the mathematics of unrealistic normality assumptions.

In a practical forecasting environment, we first project the LZI distribution with an urn model, sampling from the empirical (or an assumed) LZI distribution. Each sampled LZI value is then followed by a projected ID* drawn from the distribution conditioned on that LZI value, that is, the conditioned distribution for ɛ*(τ, λ) in the model. In practice, this means that forecasters need to maintain quality source data, scrubbed and updated; that is, the LZI and ID* distributions, as shown in the spreadsheet.
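The two-stage projection can be sketched as a small Monte Carlo routine. Everything here is illustrative: the `history` table stands in for the spreadsheet's LZI and ID* distributions, and sampling is from the empirical buckets rather than from a fitted exponential-family model.

```python
import random

# Demand sizes observed after each LZI duration (illustrative stand-ins).
history = {0: [175.0, 198.0, 220.0],      # sizes that followed zero-length LZIs
           1: [211.0, 240.0],             # sizes that followed one-zero LZIs
           2: [458.0, 410.0]}             # sizes that followed two-zero LZIs

def project(history, n_periods):
    """Simulate one future demand path: draw an LZI duration from the
    empirical urn (weighted by observed frequency), emit that many zeros,
    then draw a demand size conditioned on the sampled LZI."""
    path = []
    while len(path) < n_periods:
        lzi = random.choice([k for k, sizes in history.items() for _ in sizes])
        path.extend([0.0] * min(lzi, n_periods - len(path)))
        if len(path) < n_periods:
            path.append(random.choice(history[lzi]))
    return path

random.seed(42)
print(project(history, 12))  # one simulated 12-month path of zeros and sizes
```

Repeating `project` many times yields an empirical predictive distribution of lead time demand, from which inventory planners can read off quantiles rather than a single point forecast.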

The simple measurement model is an application of a *Structured Inference Base* (SIB) approach that can be generalized to a wide range of applications. In a future article I will elaborate on the details of the SIB process for forecasting and making inferences for the unknown constants in a repair parts inventory problem.