Forecasting with Intermittent Demand – A New Approach
Before making any modeling assumptions, we should first consider an application and examine the underlying historical data. The spreadsheet table shows the demand series for a SKU and location in a population health forecasting application at a NYC-area hospital from Jan 2015 to Apr 2018. The next 20 months (May 2018 through Dec 2019) are used as a test period, so that the forecasts can be compared with the actuals for this period. We are asked to make a forecast for the year 2020. Similar data may be seen in hospital admissions during virus pandemics, in retail sales at doors (stores) in different regions of the country, and, more commonly, in spare parts inventory management applications.
Intermittent demand (ID), also known as sporadic demand, arises when a product experiences several periods of zero demand. In these situations, demand is often small when it does occur, and it is sometimes highly variable in size. Intermittent demand forecasting arises in practice when
- creating lead time demand forecasts for inventory planning.
- creating multiple forecasts of low volume items for a particular period in the future based on a rolling forecast horizon, as in an annual budget or production cycle.
- creating forecasts for multiple periods in an 18-month business planning horizon.
Step 1: Explore the nature of the interdemand intervals and their relation to the distribution of nonzero demand sizes, as it may be surmised that after longer interdemand intervals, demand sizes could be increasing or decreasing in size.
Starting from Feb 2015, we want to examine the relationship between intervals and demand sizes. I will define a “lag-time” zero interval (LZI) as the number of zero-demand periods immediately preceding a demand size. The results are shown below. In this dataset there are three LZI durations. For example, the first LZI has one zero preceding the demand size 211, the next has two zeros preceding the demand size 458, the next has none, and so on. If the process had started in Jan 2015 with a new product introduction, for instance, we could add the demand size 390 to the single-zero LZI bucket. However, this should not make a material difference in an ongoing intermittent demand forecasting process.
Examining the data in this way differs from Croston-based methods, which treat intervals and demand sizes as independent. In the SIB model, we do not assume that intervals and demand sizes are independent.
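The Step 1 pairing of each demand size with its preceding run of zeros can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation. The function name `lzi_pairs` is hypothetical, and the five-month input is only the fragment implied by the text, starting in Feb 2015: one zero before 211, then two zeros before 458.

```python
from typing import List, Tuple

def lzi_pairs(series: List[float]) -> List[Tuple[int, float]]:
    """Hypothetical helper: pair each nonzero demand with its lag-time
    zero interval (LZI), the count of zero months immediately before it."""
    pairs: List[Tuple[int, float]] = []
    zeros = 0
    for value in series:
        if value == 0:
            zeros += 1                    # extend the current run of zeros
        else:
            pairs.append((zeros, value))  # a demand closes the interval
            zeros = 0
    return pairs  # trailing zeros without a closing demand are ignored

# Fragment implied by the text, Feb 2015 to Jun 2015
print(lzi_pairs([0, 211, 0, 0, 458]))  # [(1, 211), (2, 458)]
```

Ignoring trailing zeros is a design choice. An unfinished interval has no demand size attached yet, so it cannot contribute an (LZI, size) pair.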
A Structured Inference Base (SIB) Model for Forecasting Intermittent Demand
In a previous LinkedIn article on bias measurement in forecasting, I introduced a simple measurement model. It has the same structure as the algorithmic model I introduced in an earlier article for forecasting intermittent demand sizes that depend on an LZI distribution.
If Ln denotes the natural logarithm, then the logarithm of the demand sizes can be described by a location measurement error model, also known as a simple measurement model. It is called a location model because each observation is a fixed location plus a measurement error: the location parameter is the unknown constant β* = Ln β in the equation Ln ID = β* + ɛ*. This is a SIB location measurement model. In practice, when there are more interdemand interval durations, the error ɛ* may need to be represented by a multivariate measurement error covering each LZI duration in the data. For now, we assume that ɛ* = ɛ*(τ) depends on a typical LZI, represented by a single constant τ.
The SIB model approach is algorithmic and data-driven, in contrast to a conventional data model with normality assumptions. According to Leo Breiman, there are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown.
Keeping in mind the pervasive presence of outliers and non-normal (non-Gaussian), thick-tailed variation in real-world intermittent demand data, we will shy away from the normality assumptions of what I refer to as a “Gaussian arithmetic culture.” That is, we will not rely on arithmetic means, variances, and coefficients of variation (CVs) to summarize the data. Rather, we will assume a flexible family of distributions for ɛ*, known as the exponential family. It contains many familiar distributions, including the normal (Gaussian) distribution, as well as ones with thicker tails and skewness. Besides this flexibility, there are also some technical reasons for selecting the exponential family.
The black-box model Ln ID = β* + ɛ*(τ, λ) shows that the output Ln ID is the input measurement error ɛ*(τ, λ) translated (shifted) by a constant amount β*. The conditional measurement error distribution depends on a fixed shape parameter λ and a typical “lag-time” interdemand interval τ.
Forecasting Intermittent Demand: A Case Example
Starting with the SIB location model, we can continue to analyze the model for forecasting demand size ID as follows:
Step 2: Determine the interdemand interval distribution. Each demand size is preceded by an interval of zero or more zero-demand periods. Adjacent demand sizes with no zeros between them are separated by a zero-length lag-time interval, LZI_0. In this dataset, there are three LZI durations with the following counts: LZI_0 has 10 occurrences, and LZI_1 and LZI_2 each have 6 occurrences. Over the 22 observed intervals, the empirical frequency distribution of LZI is therefore 10/22 ≈ 0.45 for LZI_0, and 6/22 ≈ 0.27 each for LZI_1 and LZI_2.
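The empirical LZI distribution in Step 2 is just the relative frequency of each interval duration. The sketch below computes it with exact fractions from the counts reported in the text. The function name `lzi_distribution` is a hypothetical illustration.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict, Iterable

def lzi_distribution(lzis: Iterable[int]) -> Dict[int, Fraction]:
    """Hypothetical helper: relative frequency of each LZI duration."""
    counts = Counter(lzis)
    total = sum(counts.values())
    return {k: Fraction(n, total) for k, n in sorted(counts.items())}

# Counts from the text: 10 x LZI_0, 6 x LZI_1, 6 x LZI_2
observed = [0] * 10 + [1] * 6 + [2] * 6
for lzi, p in lzi_distribution(observed).items():
    print(f"LZI_{lzi}: {p} = {float(p):.3f}")
# LZI_0: 5/11 = 0.455
# LZI_1: 3/11 = 0.273
# LZI_2: 3/11 = 0.273
```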
Step 3: Each interdemand interval LZI is followed by a demand size ID. Our measurement model is expressed in terms of ID* = Ln ID, so we display the distribution of the demand sizes ID* by LZI in a box-and-whisker plot, or box plot. A box plot depicts groups of numerical data through their quartiles. Lines extending vertically from the boxes (whiskers) indicate variability outside the upper and lower quartiles, hence the term box-and-whisker plot. The three distributions, one for each interval size, are not the same. The middle one has thicker tails than the first one, and the third box plot depicts a skewed distribution.
Step 4: Create point forecasts for ID* by interval size. Unlike in the Croston method, multistep forecasts are not assumed to have a level (flat) forecast profile. In our example, we can start with candidate projections such as an overall mean, a median, the last actual (naïve), or an average of the three most recent demand sizes. A typical projection can then be obtained by averaging these candidates, as an averaged projection can often outperform an individual one. Exponentiate the typical projection to obtain a typical ID = exp(ID*) for each interval size LZI.
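The quartiles behind each box in Step 3 can be computed by grouping the (LZI, size) pairs and working on the log scale. This is a minimal sketch: the helper name and the sample pairs are hypothetical. Whisker conventions vary between plotting tools, so only the quartiles are returned.

```python
import math
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple

def log_quartiles_by_lzi(pairs: List[Tuple[int, float]]) -> Dict[int, List[float]]:
    """Hypothetical helper: group demand sizes by LZI, take natural logs
    (ID* = Ln ID), and return [Q1, median, Q3] for each group."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for lzi, size in pairs:
        groups[lzi].append(math.log(size))  # work on the log scale
    # statistics.quantiles uses the 'exclusive' method by default;
    # give each group several values for meaningful quartiles
    return {lzi: statistics.quantiles(vals, n=4) for lzi, vals in sorted(groups.items())}

# Hypothetical pairs, for illustration only (not the hospital data)
pairs = [(0, 120), (0, 300), (0, 550), (1, 90), (1, 214), (1, 400)]
for lzi, (q1, med, q3) in log_quartiles_by_lzi(pairs).items():
    print(f"LZI_{lzi}: Q1={q1:.2f} median={med:.2f} Q3={q3:.2f}")
```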
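Step 4 can be sketched as follows. Four candidate projections of ID* are averaged with equal weights, and the result is then exponentiated. The equal weighting and the function name `typical_size` are assumptions for illustration. The text only says that an averaged projection often outperforms an individual one.

```python
import math
import statistics
from typing import List

def typical_size(sizes: List[float]) -> float:
    """Hypothetical helper: average four simple projections of ID* = Ln ID
    for one LZI group, then back-transform with ID = exp(ID*).
    `sizes` must be nonempty and in time order."""
    logs = [math.log(s) for s in sizes]      # ID* values
    candidates = [
        statistics.mean(logs),               # overall mean
        statistics.median(logs),             # overall median
        logs[-1],                            # naive: last actual
        statistics.mean(logs[-3:]),          # mean of the three most recent
    ]
    return math.exp(statistics.mean(candidates))  # equal-weight average
```

Applied separately to the demand sizes following LZI_0, LZI_1 and LZI_2, this yields one typical size per interval duration. Because the averaging happens on the log scale, an overall-mean candidate corresponds to a geometric mean on the demand scale. This is consistent with avoiding arithmetic means of raw, thick-tailed sizes.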
Step 5: Create enough interval projections to cover the lead time or planning horizon. By sampling the LZI distribution as an urn model, we obtain multistep interdemand intervals. Each projected interdemand interval is followed by the projected demand size associated with that interval. The urn in this example contains 22 colored balls, one for each observed interval, with each color associated with an interval duration [ten blue balls for LZI_0, six orange balls for LZI_1, and six grey balls for LZI_2]. In the sampling process, a ball is drawn from the urn, its color noted, the ball replaced in the urn, the urn shaken, and then another ball drawn, and so on, until the forecast horizon (20 months) is covered. These are all programmable steps.
Step 6: Create a forecast profile as needed. For example, consider a sampled sequence of interval sizes covering May 2018 through Dec 2019: {LZI_0, LZI_2, LZI_0, LZI_1, LZI_0, LZI_2, LZI_1, LZI_0, LZI_2, LZI_0, LZI_1}. Each interval is followed by its typical demand size (here 550 after LZI_0, 214 after LZI_1, and 195 after LZI_2). In this example, the projected intermittent demand profile for the 20 months starting in May 2018 is therefore {550, 0, 0, 195, 550, 0, 214, 550, 0, 0, 195, 0, 214, 550, 0, 0, 195, 550, 0, 214}.
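The urn sampling in Step 5 is sampling with replacement from the observed LZI values. The sketch assumes that an interval LZI_k spans k zero months plus one demand month, so it covers k + 1 months. The helper name and the fixed seed are illustrative choices, not part of the original method.

```python
import random
from typing import List

def sample_intervals(urn: List[int], horizon: int, seed: int = 0) -> List[int]:
    """Hypothetical helper: draw LZI durations with replacement until the
    sampled intervals cover `horizon` months (LZI_k covers k + 1 months)."""
    rng = random.Random(seed)      # seeded so a run can be reproduced
    drawn: List[int] = []
    months = 0
    while months < horizon:
        lzi = rng.choice(urn)      # draw a ball, note its color, put it back
        drawn.append(lzi)
        months += lzi + 1
    return drawn

# Urn from the text: 10 balls for LZI_0, 6 for LZI_1, 6 for LZI_2
urn = [0] * 10 + [1] * 6 + [2] * 6
intervals = sample_intervals(urn, horizon=20)
print(intervals)
```

The last draw can overshoot the horizon by a month or two. The resulting profile is then simply truncated to 20 months. Repeating the sampling with different seeds gives a set of alternative profiles rather than a single path.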
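Expanding a sampled interval sequence into a monthly profile is mechanical. For each LZI_k, emit k zeros followed by the typical size for that interval. In the sketch below, the function name is hypothetical. The typical sizes 550, 214 and 195 are those used in the Step 6 example, and the sequence is the one listed there.

```python
from typing import Dict, List

def demand_profile(intervals: List[int], typical: Dict[int, float],
                   horizon: int) -> List[float]:
    """Hypothetical helper: expand sampled LZIs into a month-by-month
    profile (k zeros, then the typical size for LZI_k), cut to `horizon`."""
    profile: List[float] = []
    for lzi in intervals:
        profile.extend([0] * lzi)      # zero-demand months
        profile.append(typical[lzi])   # demand month
    return profile[:horizon]

# Sampled sequence and typical sizes from the Step 6 example
seq = [0, 2, 0, 1, 0, 2, 1, 0, 2, 0, 1]
sizes = {0: 550, 1: 214, 2: 195}
print(demand_profile(seq, sizes, horizon=20))
# [550, 0, 0, 195, 550, 0, 214, 550, 0, 0, 195, 0, 214, 550, 0, 0, 195, 550, 0, 214]
```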
The SIB model also allows for an assessment of uncertainty in terms of confidence bounds on the location parameter β*, a posterior distribution for β*, and likelihood analyses for the shape parameter λ and the typical “lag-time” interdemand interval τ in the measurement error ɛ*(τ, λ). But fully embracing change (ID*) and chance (ɛ*) in intermittent data is a topic for another time.