
Abstract

Accurate demand forecasting is critical to supply chain management, but many uncertain market factors make it a major challenge. During the current COVID-19 outbreak in particular, the shortage of certain types of medical consumables has become a global problem. Forecasting the intermittent demand for medical consumables with a short life cycle brings new challenges, such as demand that occurs randomly among many time periods with zero demand. In this research, a seasonal adjustment method is introduced to deal with seasonal influences, and a dynamic neural network model with an optimized model selection procedure and an appropriate model selection criterion is introduced as the main forecasting model. In addition, to reduce the impact of zero demand, we add input nodes to the neural network by preprocessing the original input data. Lastly, a modified error measurement method is proposed for performance evaluation. Experimental results show that the proposed forecasting framework is superior to other intermittent demand models.

Keywords

intermittent demand forecasting, dynamic neural network, optimized model selection, accuracy measurement, short life cycle

Introduction

Demand can be smooth, intermittent, lumpy, erratic or slow-moving.¹ In the past few decades, many researchers have proposed different indicators to categorize demand types. The two most popular indicators are the Average inter-Demand Interval (ADI) and the Coefficient of Variation (CV).¹ Many methods have been proposed to forecast different demand patterns.^2–4 Generally, if the average number of time periods between two non-zero demands is 1.25 times the inventory review period, the demand can be considered intermittent.⁵ Traditional forecasting techniques, such as simple exponential smoothing or simple moving average, are inappropriate for intermittent demand.⁶ This paper focuses on new methods for forecasting the intermittent demand for medical consumables with a short life cycle, which may help supply chain decisions during the COVID-19 epidemic.

Intermittent demand is usually associated with engineering spares, service parts and high-priced capital goods, such as heavy machinery.⁷ In recent years, research on customized short-life medical consumables has attracted attention from scientists all over the world. The main characteristics of these medical consumables are a short life cycle, a large proportion of zero daily demands, various product styles and a long replenishment lead time. These characteristics are very similar to those of the fast fashion industry¹ in certain aspects, and they bring many challenges to supply chain management around the world. Demand forecasting is of great importance in controlling the inventory of such products, but the intermittent and slow-moving nature of demand makes forecasting particularly difficult.

Motivated by a real business analytics project, we propose a new method to forecast the intermittent demand. Our study arises from a manufacturer of medical consumables that operates a warehouse and a couple of retail stores in Asia. Since the demand for medical consumables is usually erratic, we introduce the techniques of dropping outliers, seasonal adjustment and aggregation to preprocess historical data. In addition, a new forecast accuracy measurement is proposed specifically for the zero demand records and a dynamic neural network is designed to handle the erratic and unstable demand data.

The remainder of this study is structured as follows. In Section 2, the overview of relevant research and existing literature is provided. Section 3 introduces the main forecasting models, accuracy measurements and competing intermittent demand forecasting methods. In Section 4, the experimental and computational results are reported, including descriptive analysis of real data set, data preprocessing, experiment implementation and results analysis. Section 5 concludes this study with some remarks and future research directions.

Literature review

In this section, we mainly review previous work on data aggregation, demand forecasting, model optimization and selection.

Data aggregation

The main challenge of intermittent demand forecasting lies in the large proportion of zero demands. An intuitive way to deal with such a pattern is to aggregate demands. The data from the company in this study reveal that most product demand is intermittent. Therefore, it is very difficult to forecast directly from the original sales data of each individual store in the point-of-sale (POS) system. Data aggregation is usually conducted before applying forecasting methods.^8,9 An important decision is the choice of the dimensions and levels at which the sales data are aggregated. There are three possible dimensions for aggregation, namely the time dimension, the product hierarchical classification (i.e. product tree)^8–10 dimension and the sales channel dimension. In the time dimension, the aggregation level can be day,^11,12 week,¹³ month^9,14–16 or year.^17,18 In the product hierarchical classification dimension, the aggregation level can be SKU,¹⁵ article (a group of SKUs under the same product number but with different sizes and colors),¹⁹ product family/group (e.g. shoes, apparel, or accessories),^11,13,14,18 attribute values (such as the color being red or black, or the size being large, small, or middle),^13,17,19 or the assortment of the overall supply chain.¹⁶ In the sales channel dimension, the aggregation level can be store,¹² all stores in the supply chain,^13–18 or a set of stores in a region/city that is a part of the supply chain.¹¹ The aggregated data sets for different levels in each dimension often differ in smoothness, intermittence, lumpiness and slow movement. For example, the daily sales data sets for all SKUs and all stores are very likely to be smooth. Different aggregation levels in each dimension may require different forecasting methods and have different influences on forecasting accuracy.⁸ In Table 1, we summarize the aggregation levels in existing literature on short life cycle product sales forecasts.

Table 1.

Summary of the aggregation levels in the three dimensions.

Reference	Aggregation levels in time dimension	Aggregation levels in product hierarchical classification dimension	Aggregation levels in distribution channel dimension
Choi et al.¹⁹	N/A	Color	N/A
Winters¹⁴	Month, Bi-monthly	Product family, such as cooking utensils, paint and cellars	All stores in the supply chain or Geographical area
Sun et al.¹⁵	Month	SKU	All stores in the supply chain, all stores in Hong Kong
Xia et al.¹⁶	Month	All SKUs in the supply chain	The whole supply chain
Au et al.¹¹	Day	Product family/group (T-shirt and jeans)	Region with less than 10 shops
Choi et al.¹⁷	Year	Aggregated by attribute level of color	The whole supply chain
Ren et al.¹³	Week	One is at the product group/family level (such as T-shirt, dress, bag, pants, accessory, or belt), the other is at the color level (such as red, black, or blue)	The whole supply chain
Choi et al.¹²	Day	N/A	Store
Wong and Guo¹⁸	Month, Quarter, Year	Product family/group (skirt, jacket, coats, or pants)	All stores in the supply chain or a city
Our study	Day	SKU	All stores in the supply chain

N/A = Not Available (not clearly given in the reference paper).

Nikolopoulos et al.²⁰ proposed a framework, called ADIDA, to optimize the aggregation level. ADIDA is an empirical framework, not a theoretical framework, which only considers the aggregation of time dimension. The time dimension has to consider the lead time.²¹ In the dimension of product hierarchical classification, the products themselves and the manufacturing procedure should be considered, so that the aggregation improves the manufacturing efficiency. For instance, if the aggregation level in the product hierarchical classification dimension is a color level, and the painting procedure precedes the manufacturing procedure, the aggregation does not make sense because managers must determine the quantity of each color during the manufacturing process. Lastly, in the distribution channel dimension, three levels can be considered, namely store, region and supply chain. The selection of distribution channel dimension also needs to consider the manufacturing procedure. In short, data aggregation is a systematic work, and various aspects should be considered. Syntetos et al.²² established a supply chain structure, which includes four dimensions for supply chain forecasting, namely product dimension, location dimension, time dimension and echelon dimension. Figure 1 depicts the three dimensions of data aggregation, where each small block represents one choice of data aggregation. In this study, we select SKU-Day-Chain level to aggregate the demand according to the discussion with managers.

Figure 1.

Data aggregation structure.
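As an illustration of the SKU-Day-Chain aggregation level chosen in this study, the following minimal sketch sums store-level sales into one total per SKU and day. The record layout `(sku, day, store, quantity)`, the function name and the sample values are hypothetical and are not taken from the company's POS system.

```python
from collections import defaultdict

def aggregate_sku_day_chain(records):
    """Sum store-level sales into one total per (SKU, day) across all stores."""
    totals = defaultdict(int)
    for sku, day, store, quantity in records:
        # The store dimension is collapsed: only SKU and day remain as keys.
        totals[(sku, day)] += quantity
    return dict(totals)

# Hypothetical POS records: (sku, day, store, quantity)
pos_records = [
    ("SKU-A", "2012-01-01", "store-1", 1),
    ("SKU-A", "2012-01-01", "store-2", 0),
    ("SKU-A", "2012-01-02", "store-1", 2),
    ("SKU-B", "2012-01-01", "store-2", 3),
    ("SKU-A", "2012-01-01", "store-3", 2),
]
chain_level = aggregate_sku_day_chain(pos_records)
```

Aggregating along another dimension (for example week instead of day) would only change how the dictionary key is built.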

Demand forecasting

Most of the previous studies dealing with intermittent demand have focused on inventory management and employed appropriate forecasting methods to predict future demands. It should be noted that these forecasting methods are based on the assumption that future demand follows a standard probability distribution, such as the Poisson or Negative Binomial distribution. However, real industrial data often indicate that demand does not follow those standard distributions. Thus, these assumption-based forecasting methods have performed poorly on real industrial data sets.^7,20 Various models have been applied to intermittent demand forecasting. For example, single exponential smoothing and simple moving average are often employed in practice to handle intermittent demands. However, Croston²³ demonstrated that traditional forecasting methods, such as single exponential smoothing, did not perform well for intermittent demand, and proposed a more suitable method, now known as Croston’s method. The experiments showed that Croston’s method outperforms traditional forecasting methods for intermittent demand. Later studies^5,24 modified the original Croston’s method to improve the forecasting accuracy.

In recent years, specially designed consumables have become more and more popular. One of the most typical characteristics of such products is the short life cycle, which makes historical sales data very limited. Due to the short life cycle, inventory management is very crucial, and the demand forecast of medical consumables with short life cycles is also very difficult. To address this challenge, extensive research has been conducted on the sales forecasts of such products in the existing literature.^1,25 Numerous sales forecasting methods have been proposed, such as moving averages and exponential smoothing,²⁶ Holt-Winters exponential smoothing,¹⁴ autoregression, artificial neural networks, extreme learning machines,^15,16 evolutionary neural networks,¹¹ the hierarchical Bayesian approach,²⁷ and hybrid model.¹² More detailed and extensive reviews on these forecasting methods can be found in Nenni et al.,¹ Thomassey,⁹ and Liu et al.²⁵ Liu et al. divided sales forecasting methods into three groups, namely statistical forecasting methods, advanced statistical learning forecasting methods and hybrid forecasting methods. The advantages and disadvantages of each type of forecasting method are discussed in Liu et al.²⁵ The statistical forecasting methods are simple and fast, but they cannot produce good results because they cannot handle irregular patterns. Although advanced statistical learning forecasting methods have a stronger ability to identify irregular patterns than statistical forecasting methods, they are time-consuming and require sufficient data. Therefore, they are inappropriate for such products because usually there are a large number of different products for short life cycle medical consumables, and future demand needs to be predicted promptly. In terms of calculation speed, hybrid forecasting methods perform the worst because they include not only advanced statistical learning forecasting procedure, but also other processes.

It should be noted that due to several factors such as short life cycle, short selling seasons and long replenishment lead times, the sales data sets of specially designed consumables are usually highly unstable and limited in size.^1,8,9,19 The concepts of intermittent data and such product sales data are not equivalent because the data may not be intermittent. Many previous works have been established to classify demand patterns.^28–30 As shown in Figure 2, we employ the method in Syntetos et al.,²⁹ which uses two indicators, namely ADI and CV, to recognize intermittent demands.

Figure 2.

Classification of demand patterns.
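The two-indicator classification can be sketched as follows. This is an illustration under stated assumptions rather than the exact scheme of Figure 2: ADI is computed as the number of periods divided by the number of non-zero demands, the squared coefficient of variation is computed on non-zero demand sizes with the population standard deviation, and the cut-offs 1.32 and 0.49 are the values commonly quoted for this scheme, not values given in this paper.

```python
from statistics import mean, pstdev

def classify_demand(series, adi_cut=1.32, cv2_cut=0.49):
    """Return (ADI, CV^2, label) for one demand series.

    The cut-off values are assumptions (commonly cited defaults).
    """
    nonzero = [x for x in series if x > 0]
    if not nonzero:
        return None, None, "no demand"
    adi = len(series) / len(nonzero)                  # average inter-demand interval
    cv2 = (pstdev(nonzero) / mean(nonzero)) ** 2      # squared CV of demand sizes
    if adi < adi_cut:
        label = "smooth" if cv2 < cv2_cut else "erratic"
    else:
        label = "intermittent" if cv2 < cv2_cut else "lumpy"
    return adi, cv2, label

# Hypothetical daily demand series
print(classify_demand([0, 0, 2, 0, 0, 2, 0, 2, 0, 0])[2])
print(classify_demand([0, 0, 1, 0, 0, 10, 0, 0, 1, 0])[2])
```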

Model selection

Some researchers have investigated how to optimize the forecasting model of intermittent demand.^31,32 Another research direction similar to model specification in statistics is model selection. The purpose of model specification and selection is to find the most suitable model for a given data set. Some methods for selecting intermittent demand models based on data characteristics have been proposed by Kostenko and Hyndman²⁸ and Heinecke et al.³⁰ Furthermore, Kourentzes³³ investigated whether different accuracy measures can facilitate and automate the model selection for intermittent demand data.

There is little research on optimizing the forecasting model for short life cycle products. Au et al.¹¹ proposed a modified evolutionary neural network to conduct forecasting for a fashion retailer, which is faster than the original evolutionary neural network. However, the model in Au et al.¹¹ is still time-consuming and may not converge to the optimal result.

In brief, model selection is very important for forecasting the demand for such short life cycle products because of the considerable irregularity in their sales data. In our research, we propose a new forecasting accuracy measure that allows the model to deal with zero demand, and introduce a dynamic neural network, the minimum description length neural network (MDL-NN), as the core part of the forecasting model. MDL, originally used for data compression,^34,35 is a technique from algorithmic information theory. It states that the best hypothesis for a given data set is the one that results in maximum data compression. One seeks to minimize the sum of two lengths: the length of an effective description of the model, and the length of an effective description of the data when coded with the model.³⁵ In time series forecasting, the description length of a model is the sum of the model description length and the model prediction error. MDL-NN searches for the optimal model size (i.e. the number of neurons in the neural network), and avoids overfitting or underfitting by minimizing the description length of the model together with that of its prediction error.³⁴

Methodologies

Our proposed demand forecasting model

In this section, our proposed demand forecasting model is introduced. The model consists of three parts: the first part is the basic forecasting model; the second part is the criterion for the optimal model, including when the model selection procedure should be stopped; and the last part is the optimized model selection procedure. Since the model selection problem is NP-hard, we cannot search all candidate models, so the optimized model selection procedure is used to accelerate the search for the optimal model. Lastly, we propose a framework for the demand forecasting model employed in our research.

Basic forecasting model

A neural network was employed as a flexible model in our study. The activation functions are radial basis functions; because the network is designed to be flexible, both linear and nonlinear radial basis functions are initially included in the model. Without loss of generality, let the $d$ previous daily sales records, denoted by $z_{i - 1} = (x_{i - 1}, x_{i - 2}, x_{i - 3}, \dots, x_{i - d})$, be the input data for forecasting the sales on day $i$. In addition, to reduce the high ratio of sparse data, we add some estimators to the input data: let $ϑ_{q, i} = (x_{i - (q + 1)} + x_{i - q} + \dots + x_{i - 1}) / (q + 1)$, where $q = 1, 2, \dots, d - 1$.

Then the new input data is $Z_{i} = (x_{i - 1}, x_{i - 2}, x_{i - 3}, \dots, x_{i - d}; ϑ_{1, i}, ϑ_{2, i}, \dots, ϑ_{d - 1, i})$ and $y_{i}$ is the output of the neural network. $ω = \{Z_{i}, y_{i}\}_{i = 1}^{n}$ is the final data set for the forecasting model. The mathematical expression of the three-layer neural network can be written as:

$$y_{i} = f(Z_{i}; Λ_{k}) = φ\left(λ_{0} + \sum_{l = 1}^{d} β_{l} \cdot x_{l} + \sum_{s = 1}^{d - 1} α_{s} \cdot ϑ_{s, i} + \sum_{j = 1}^{m} λ_{j}\, ϕ(Z_{i - 1} \cdot c_{j} - r_{j})\right), \quad i = d + 1, d + 2, \dots, N$$

(1)

where $Λ_{k} = (β_{1}, β_{2}, \dots, β_{d}; α_{1}, α_{2}, \dots, α_{d - 1}; λ_{0}, λ_{1}, λ_{2}, \dots, λ_{p})$ , the vector $c_{j} \in R^{2 d - 1}$ is the center of the $j$ -th basis function and $r_{j} > 0$ is known as the radius. The activation functions $ϕ$ and function $φ$ are:

$$ϕ(x) = \tanh(x) = \frac{e^{2x} - 1}{e^{2x} + 1} \quad \text{and} \quad φ(x) = x.$$

(2)
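The input construction and equations (1) and (2) can be sketched in a few lines of Python. This is a hedged illustration, not the authors' implementation: the function names, the 0-based indexing and the sample parameters are hypothetical, the linear terms are read as acting on the lagged sales $x_{i-l}$, and the hidden neurons are evaluated on the full input vector $Z_{i}$ so that its length matches $c_{j} \in R^{2d-1}$.

```python
import math

def build_inputs(x, i, d):
    """Return Z_i for 0-based index i: d lagged sales, then d-1 running means."""
    lags = [x[i - l] for l in range(1, d + 1)]                    # x_{i-1}, ..., x_{i-d}
    means = [sum(lags[: q + 1]) / (q + 1) for q in range(1, d)]   # theta_{q,i}, q = 1..d-1
    return lags + means

def forward(z, d, lam0, beta, alpha, hidden):
    """Evaluate the network output with phi = tanh and an identity output function.

    hidden is a list of (weight, center, radius) triples, one per neuron.
    """
    linear = sum(b * v for b, v in zip(beta, z[:d]))       # terms in beta_l
    smooth = sum(a * v for a, v in zip(alpha, z[d:]))      # terms in alpha_s
    nonlinear = sum(
        lam * math.tanh(sum(cv * zv for cv, zv in zip(c, z)) - r)
        for lam, c, r in hidden
    )
    return lam0 + linear + smooth + nonlinear

# Hypothetical daily sales; forecast day index 6 from the 3 previous days.
sales = [0, 1, 0, 2, 0, 0, 3]
Z_example = build_inputs(sales, i=6, d=3)
y_linear_only = forward(Z_example, 3, 0.5, [0.0, 0.0, 1.0], [0.0, 0.0], hidden=[])
y_with_neuron = forward(
    Z_example, 3, 0.5, [0.0, 0.0, 1.0], [0.0, 0.0],
    hidden=[(1.0, [0.1] * 5, 0.0)],
)
```

The running means give the network non-zero inputs even when most of the individual lags are zero, which is the stated purpose of the added input nodes.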

Criterion of optimal model

To select the optimal number of neurons, the minimum description length (MDL) is adopted as the criterion for choosing the optimal neural network. The number of neurons in the hidden layer determines the complexity of the neural network, and selecting the most appropriate number of neurons is a challenging and important task when optimizing the network architecture. According to the bias-variance trade-off, if the model is too simple and includes very few parameters relative to the data, it will have a large bias. This is called underfitting, and the model cannot completely capture the data patterns.³⁶ If, on the other hand, the model includes too many parameters relative to the data, it will suffer from a large variance. This is called overfitting: the model performs well on the training data but poorly on the test data. Therefore, we employ a criterion for selecting the optimal model that avoids both overfitting and underfitting. The original expression of MDL³⁷ is as follows:

$$S_{k}(ω) = L(ω \mid \hat{Λ}) + \left(\frac{1}{2} + \ln γ\right) \cdot k - \sum_{j = 1}^{k} \ln \hat{δ}_{j}$$

(3)

where $L$ is the negative log-likelihood. However, computing $\hat{δ}$ in equation (3) requires the second derivative of $L(ω \mid \hat{Λ})$, and a first derivative is also needed to obtain the optimal $\hat{δ}$, so this procedure is complex and time-consuming. Thus, a simpler version of MDL,³⁸ called sMDL in our study, is employed as the model selection criterion:

$$S_{k}^{'}(ω) = L(ω \mid \hat{Λ}) + \frac{k}{2} \cdot \ln(N)$$

(4)

The above criterion is the same as Schwarz’s criterion³⁹ in form, but not in scope or content.³⁸
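To make equation (4) concrete, the sketch below evaluates sMDL for a fitted model. The paper does not state the error distribution, so the negative log-likelihood here assumes independent Gaussian errors with the variance set to its maximum-likelihood estimate; the function name and the sample numbers are hypothetical.

```python
import math

def smdl(actual, fitted, k):
    """sMDL = L + (k / 2) * ln(N), with L a Gaussian negative log-likelihood.

    The Gaussian error model is an assumption made for this illustration.
    """
    n = len(actual)
    sse = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    mse = max(sse / n, 1e-12)                       # guard against log(0)
    neg_log_lik = 0.5 * n * (math.log(2 * math.pi * mse) + 1)
    return neg_log_lik + 0.5 * k * math.log(n)

# Hypothetical data: the same fit quality costs more with more parameters.
actual = [1, 0, 2, 0, 1, 3]
fitted = [0.8, 0.1, 1.7, 0.2, 1.1, 2.6]
score_small = smdl(actual, fitted, k=2)
score_large = smdl(actual, fitted, k=5)
```

A larger model is preferred only if its likelihood gain outweighs the penalty of $\frac{1}{2}\ln(N)$ per additional parameter.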

Optimized model selection procedure

In this section, the optimized model selection procedure is introduced, which adopts sMDL to select the optimal neural network. As mentioned before, the search for the optimal model is an NP-hard problem, so an effective search algorithm should be employed. The main algorithm searches for the optimal number of neurons by iteratively adding and removing neurons until the model reaches the sMDL criterion. Thus, there is no need to set the number of neurons in the hidden layer in advance. The detailed procedure for selecting the optimal nodes in the neural network model is as follows:

Step 1: Let $Φ^{(0)} = \emptyset$ be the empty matrix and let $k = 0$, where $Φ^{(k)}$ is the matrix consisting of the evaluations of the $k$ selected neurons.

Step 2: Compute the weights $Λ_{k}$ such that $‖e‖^{2}$, with $e = y - Λ_{k} Φ^{(k)}$, is minimal. Initially, $Λ_{k}$ is empty and $e = y$.

Step 3: To find optimal neurons in the hidden layer, generate a set of candidate nonlinear neurons $Θ^{(k)} \subseteq \{ϕ(Z \cdot c - r) \mid c \in R^{2 d - 1}, r \in R\}$, where a set of candidate centers $c$ and a set of radii $r$ are selected.

Step 4: To select the best candidate neuron, find the basis function $θ$ for which $|\sum_{i} θ(y_{i}) e_{i}|$ is maximal.

Step 5: Let $Φ^{(k + 1)} = \begin{bmatrix} Φ^{(k)} \\ θ(Y) \end{bmatrix}$, where $Y = \{y_{i}\}_{i = d + 1}^{N}$, and then compute the weights $Λ_{k}$ such that $‖e‖^{2}$, with $e = y - Λ_{k} Φ^{(k + 1)}$, is minimal.

Step 6: Add the new neuron to the hidden layer. If there is a neuron that fits the current error worse than the new neuron, that neuron should be abandoned; namely, find $h$ $(1 \leq h \leq k)$ such that $|\sum_{i} ϕ_{h, i} e_{i}| < |\sum_{i} ϕ_{g, i} e_{i}|$ for all $g$ $(1 \leq g \leq k)$.

Step 7: If $h$ is the new neuron, then set $k = k + 1$; otherwise, remove neuron $h$.

Step 8: Recompute the weights $Λ_{k}$; the model accuracy is $‖e‖^{2}$ with $e = y - Λ_{k} Φ^{(k)}$.

Step 9: If the minimal sMDL in equation (4) is reached, stop; otherwise, go to Step 3.
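The loop above can be sketched with NumPy as follows. This is a simplified illustration and not the authors' implementation: it only adds neurons (Steps 1 to 5, 8 and 9) and omits the removal of Steps 6 and 7, candidate centers and radii are drawn from a standard normal distribution, an intercept column is always kept, the likelihood term assumes Gaussian errors, and the search stops as soon as sMDL no longer decreases. All names and the synthetic data are hypothetical.

```python
import numpy as np

def smdl_score(residual, k):
    """Gaussian sMDL up to an additive constant: (N/2) ln(MSE) + (k/2) ln(N)."""
    n = residual.size
    mse = max(float(np.mean(residual ** 2)), 1e-12)
    return 0.5 * n * np.log(mse) + 0.5 * k * np.log(n)

def select_neurons(Z, y, n_candidates=200, max_neurons=20, seed=0):
    """Greedy forward selection of tanh neurons, stopped by sMDL."""
    rng = np.random.default_rng(seed)
    n, p = Z.shape
    Phi = np.ones((n, 1))                                   # intercept column only
    weights, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    e = y - Phi @ weights                                   # current error
    initial = best = smdl_score(e, Phi.shape[1])
    neurons = []
    for _ in range(max_neurons):
        C = rng.normal(size=(n_candidates, p))              # candidate centers
        R = rng.normal(size=n_candidates)                   # candidate radii
        Theta = np.tanh(Z @ C.T - R)                        # shape (n, n_candidates)
        j = int(np.argmax(np.abs(Theta.T @ e)))             # best match to the error
        trial = np.column_stack([Phi, Theta[:, j]])
        w, *_ = np.linalg.lstsq(trial, y, rcond=None)       # refit all weights
        e_trial = y - trial @ w
        score = smdl_score(e_trial, trial.shape[1])
        if score >= best:                                   # sMDL stopped decreasing
            break
        Phi, weights, e, best = trial, w, e_trial, score
        neurons.append((C[j], R[j]))
    return neurons, weights, initial, best

# Synthetic example: one tanh neuron plus a little noise.
demo_rng = np.random.default_rng(1)
Z_demo = demo_rng.normal(size=(200, 3))
y_demo = 2.0 * np.tanh(Z_demo @ np.array([1.0, -0.5, 0.3]) - 0.2)
y_demo = y_demo + 0.05 * demo_rng.normal(size=200)
neurons, weights, score_before, score_after = select_neurons(Z_demo, y_demo)
```

Because the number of neurons is decided by the stopping rule, the hidden layer size does not need to be fixed in advance, which mirrors the motivation given for the procedure.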

The framework of demand forecasting

The framework of our demand forecasting method is presented in Figure 3. The detailed steps are described as follows:

Step 1: Aggregate data based on the original data set;

Step 2: Preprocess data;

Step 3: Select the optimal neural network for the sales time series of each product;

Step 4: Conduct forecasting for each time series;

Step 5: Calculate forecasting accuracy and significance test of performance for different models.

Figure 3.

The framework of our demand forecasting method.

Forecasting accuracy measurements

To evaluate the forecasting methods, we use some common measurements, including the mean absolute percentage error (MAPE), the symmetric mean absolute percentage error (SMAPE) and the mean absolute error (MAE). Although several studies have proposed new measures specifically for evaluating forecasting methods for intermittent demand,^40,41 very few of them paid attention to the large proportion of zero values. Prestwich et al.⁴² proposed some mean-based measurements for intermittent demand. The most widely used measure is MAPE, whose mathematical expression is:

$$\mathrm{MAPE} = \frac{1}{N} \sum_{i = 1}^{N} \left| \frac{x_{i} - y_{i}}{x_{i}} \right| \times 100\%$$

(5)

where $N$ is the number of data points to be forecasted, $x_{i}$ is the actual sales and $y_{i}$ is the forecasted value. However, when the actual sales $x_{i}$ is zero, the MAPE value becomes infinite and thus meaningless. In our data sets, zero sales records occur frequently during the life cycles of many SKUs. Handling the zero sales records in the evaluation process is very important and challenging, but it has not attracted attention in the existing literature. Too many zero sales records also cause serious problems for model training. For example, if 90% of all sales records are zero, the model will learn too much zero information because of the imbalanced data.⁴³ As a result, the trained model will tend to identify every demand as zero. To deal with this issue, we propose a new accuracy measure to improve the generality of the method. This measurement is a modified version of MAPE, and we call it RMAPE:

$$\mathrm{RMAPE} = \frac{1}{N} \sum_{i = 1}^{N} ρ(x_{i}, y_{i})$$

(6)

where

$$ρ(x_{i}, y_{i}) = \begin{cases} \left| \dfrac{x_{i} - y_{i}}{x_{i}} \right| \times 100\%, & x_{i} > 0 \\ \left| \dfrac{x_{i} - y_{i}}{x_{i}^{m}} \right| \times 100\%, & x_{i} = 0 \end{cases}$$

(7)

where $x_{i}^{m} = \frac{1}{N} \sum_{i = 1}^{N} x_{i}$. Using RMAPE to guide the training process can strengthen the generalization ability of the model. RMAPE allows small forecasting errors for zero demand, and thus the model is not forced to learn from too much zero demand information. Then, when predicting the testing set, the model will not tend to recognize every point as zero, which enhances its ability to predict non-zero demand.
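Equations (6) and (7) translate directly into a short function. The sketch below is illustrative: the function name and sample values are hypothetical, and it raises an error when every actual value is zero, a case the definition does not cover because the mean $x_{i}^{m}$ would then be zero.

```python
def rmape(actual, forecast):
    """RMAPE in percent: zero actuals are scaled by the mean actual value."""
    n = len(actual)
    mean_actual = sum(actual) / n
    if mean_actual == 0:
        raise ValueError("RMAPE is undefined when all actual values are zero")
    total = 0.0
    for x, y in zip(actual, forecast):
        denominator = x if x > 0 else mean_actual   # case split of equation (7)
        total += abs(x - y) / denominator * 100.0
    return total / n

# Hypothetical series with two zero-demand days.
example_rmape = rmape([0, 2, 0, 4], [1, 2, 0, 3])
```

In this example the forecast of 1 on a zero-demand day contributes $1 / 1.5 \times 100\%$ instead of an infinite term, so the overall value stays finite.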

In addition to the proposed RMAPE, we use two other measures, namely SMAPE and MAE, which also address the issue of zero sales records. We replace SMAPE with RSMAPE to eliminate the effect of zeros. These two measures are defined as:

$$\mathrm{RSMAPE} = \frac{1}{N} \sum_{i = 1}^{N} μ(x_{i}, y_{i})$$

(8)

where

$$μ(x_{i}, y_{i}) = \begin{cases} \left| \dfrac{x_{i} - y_{i}}{(x_{i}^{m} + y_{i}) / 2} \right| \times 100\%, & x_{i} = 0 \text{ and } y_{i} = 0 \\ \left| \dfrac{x_{i} - y_{i}}{(x_{i} + y_{i}) / 2} \right| \times 100\%, & \text{otherwise} \end{cases}$$

(9)

$$\mathrm{MAE} = \frac{1}{N} \sum_{i = 1}^{N} | x_{i} - y_{i} |$$

(10)

where $x_{i}^{m} = \frac{1}{N} \sum_{i = 1}^{N} x_{i}$. The RSMAPE imposes a larger penalty on underestimated values than on overestimated values. This property is well suited to sales forecasting. Although underestimation does not increase inventory cost, it results not only in lost sales and reduced revenue, but also in lower customer satisfaction, whereas demand overestimation only increases the inventory cost. The MAE does not have this property. However, in our setting the sample values are small, so a large MAPE value does not always indicate that the model is ineffective, and MAE is more appropriate than the other two measurements for judging the effectiveness of a model.
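A minimal sketch of equations (8) to (10) is given below, assuming non-negative actual and forecasted values so that the denominators stay positive. The function names and sample values are hypothetical.

```python
def rsmape(actual, forecast):
    """RSMAPE in percent, following the case split of equation (9)."""
    n = len(actual)
    mean_actual = sum(actual) / n
    if mean_actual == 0:
        raise ValueError("RSMAPE is undefined when all actual values are zero")
    total = 0.0
    for x, y in zip(actual, forecast):
        base = mean_actual if (x == 0 and y == 0) else x
        total += abs(x - y) / ((base + y) / 2) * 100.0
    return total / n

def mae(actual, forecast):
    """Mean absolute error, equation (10)."""
    return sum(abs(x - y) for x, y in zip(actual, forecast)) / len(actual)

# Hypothetical series with two zero-demand days.
example_rsmape = rsmape([0, 2, 0, 4], [1, 2, 0, 3])
example_mae = mae([0, 2, 0, 4], [1, 2, 0, 3])

# Asymmetry: the same absolute error of 5 units is penalized more
# when the forecast is too low than when it is too high.
under = rsmape([10], [5])
over = rsmape([10], [15])
```

With an actual demand of 10, a forecast of 5 is divided by 7.5 while a forecast of 15 is divided by 12.5, which is the source of the heavier penalty on underestimation.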

Competing methods

Six other widely used forecasting methods for the demand of short life cycle products, namely the gray model (GM), extreme learning machine (ELM), support vector machine (SVM), Markov regime switching (MS), Syntetos and Boylan’s method (S.B.) and the neural network in equation (1) without the optimized model selection procedure (NN), are applied to our data sets for comparative analyses. The GM was first introduced by Deng⁴⁴ and has been used for demand forecasting.^17,45 This method has two parameters $M$ and $N$ (so the GM with $M$ and $N$ is denoted by $GM(M, N)$), where $M$ represents the order of the differential equations and $N$ indicates the number of variables. Here we only use $GM(1, 1)$ as a benchmarking method. The ELM was proposed by Huang et al.⁶ and used by Sun et al.¹⁵ to examine the relationship between sales and other factors such as color and design. SVM is a popular nonlinear forecasting technique which has been successfully applied in many fields.⁴⁷ Markov regime switching was proposed by Hamilton⁴⁸ and has been widely applied in economics and finance. Syntetos and Boylan’s method is a common method for intermittent demand forecasting.²⁴
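For readers unfamiliar with the intermittent demand benchmark, the sketch below shows Croston's method with the bias correction factor $(1 - α/2)$ commonly attributed to Syntetos and Boylan. The smoothing constant, the initialization from the first non-zero demand and the function name are assumptions made for illustration; the paper does not report the settings used in its experiments.

```python
def croston_sba(demand, alpha=0.1):
    """One-step-ahead forecast: (1 - alpha/2) * smoothed size / smoothed interval."""
    size = None        # smoothed non-zero demand size
    interval = None    # smoothed number of periods between non-zero demands
    gap = 1            # periods since the last non-zero demand
    for d in demand:
        if d > 0:
            if size is None:
                # Initialize from the first non-zero demand (an assumption).
                size, interval = float(d), float(gap)
            else:
                size = alpha * d + (1 - alpha) * size
                interval = alpha * gap + (1 - alpha) * interval
            gap = 1
        else:
            gap += 1
    if size is None:
        return 0.0     # no demand observed at all
    return (1 - alpha / 2) * size / interval

# Hypothetical series: 3 units every third day.
example_forecast = croston_sba([0, 0, 3, 0, 0, 3, 0, 0, 3], alpha=0.1)
```

The estimates are updated only in periods with non-zero demand, which is what distinguishes this family of methods from simple exponential smoothing.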

Numerical investigation

In this section, a numerical investigation is conducted. We first describe a data set from a real medical consumables retailer, and then report detailed information on data preprocessing, experiment implementation, result analysis and significance tests.

Data

Our data set comes from a point of sale (POS) system from the retail stores of a manufacturer of medical consumables under a non-disclosure agreement. The time horizon ranges from January 1, 2012 to July 20, 2014. We present some descriptive statistics on the data set in Table 2.

Table 2.

Descriptive statistics of all data.

	Total	Max	Min	Average	Variance	=0	=1	=2	>2	>10	>50
Count	48,088,695	95	0	0.0994	0.3964	44,173,548	3,365,236	384,303	165,608	1647	25
Ratio	100%					91.86%	7.00%	0.80%	0.34%

“=0” means the number of records whose value is equal to 0; “>2” means the number of records whose value is greater than 2.

Table 2 shows that 91.86% of the total sales records have zero demand. Applying the demand classification in Figure 2, we show the distribution of demand patterns in our data set in Figure 4. This figure shows that the majority of the demands in the data set are intermittent. Thus, we aggregate the daily sales records of each SKU at the supply chain level. After aggregation, each record represents the total sales of a certain SKU in all retail stores. The descriptive statistics of the aggregated data are presented in Table 3.

Figure 4.

The distribution of demand patterns in the data set.

Table 3.

Descriptive statistics of the aggregated data at the supply chain level.

	Total	Max	Min	Average	Variance	=0	=1	=2	>2	>10	>50	>100
Count	7,723,023	571	0	0.6186	2.2296	5,702,674	1,238,702	365,052	416,595	62,927	1089	61
Ratio	100%					73.84%	16.04%	4.73%	5.39%

“=0” means the number of records whose value is equal to 0; “>2” means the number of records whose value is greater than 2.

For the aggregated data, we need to pay attention to the following three aspects. First, around three-quarters of the sales records are still zero, and thus the data is still sparse. Second, the maximum demand of 571 implies the existence of outliers. Third, the fact that many records have large demand sizes may be caused by seasonal factors or sales promotions. Therefore, we drop outliers and conduct seasonal adjustments before forecasting the demand.

Data preprocessing

The data preprocessing includes handling outliers and deleting records with missing values. For simplicity, Tukey’s Boxplot⁴⁹ is adopted to determine whether a certain value is an outlier, and each outlier is replaced by the mean of its preceding and following values. Then, we employ the multiplicative seasonal decomposition approach,⁵⁰ which is the most common way to conduct seasonal adjustment. Lastly, we normalize the sales time series for each SKU separately using the unity-based normalization method.
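The outlier and normalization steps can be sketched as follows; seasonal adjustment is not covered. The sketch makes several assumptions that the text does not specify: the usual 1.5 × IQR fences for Tukey's rule, quartiles from Python's `statistics.quantiles` with its default method, the single available neighbor at the series boundaries, and min-max scaling to $[0, 1]$ as the unity-based normalization. Function names and data are hypothetical.

```python
from statistics import quantiles

def tukey_fences(values, k=1.5):
    """Lower and upper fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def replace_outliers(values):
    """Replace each outlier by the mean of its neighboring original values."""
    low, high = tukey_fences(values)
    cleaned = list(values)
    for i, v in enumerate(values):
        if v < low or v > high:
            neighbours = [values[j] for j in (i - 1, i + 1) if 0 <= j < len(values)]
            cleaned[i] = sum(neighbours) / len(neighbours)
    return cleaned

def min_max_normalize(values):
    """Scale a series to the range [0, 1]; a constant series maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical aggregated daily sales with one spike.
series = [1, 2, 1, 2, 50, 2, 1, 2, 1]
cleaned = replace_outliers(series)
normalized = min_max_normalize(cleaned)
```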

Implementations

Before applying the seven methods discussed in Section 3, we divide the data sets into training and test data sets. The first 66% of the data is the training data set, and the remaining 34% is the test data set. Since the values forecasted by these methods are decimal numbers, we round them to the nearest integer. When calculating the three performance indicators, we consider both the forecasted decimal values and the rounded integer values. Lastly, an encompassing test is employed to further compare the capabilities of the different models.
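The split and the rounding step can be illustrated with a short sketch. The function names are hypothetical, and two details are assumptions: ties are rounded half up (Python's built-in `round` rounds halves to the nearest even integer instead), and negative forecasts are clipped to zero.

```python
import math

def chronological_split(series, train_ratio=0.66):
    """Keep the time order: the first part trains, the remainder tests."""
    cut = round(len(series) * train_ratio)
    return series[:cut], series[cut:]

def round_half_up(value):
    """Round a forecast to the nearest non-negative integer, halves upward."""
    return max(0, math.floor(value + 0.5))

# Hypothetical series of 100 daily observations.
train, test = chronological_split(list(range(100)))
rounded = [round_half_up(v) for v in [0.2, 0.5, 1.49, 2.5]]
```

For demand sizes of 0, 1 or 2, a decimal forecast of 0.49 and one of 0.51 lead to different integer orders, which is why both formats are evaluated.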

Results and analysis

To our knowledge, no other studies have discussed the influence of different result formats on the accuracy of forecasting methods. The demand forecasted by advanced statistical learning and statistical methods is commonly presented as a decimal number. However, only integer quantities of SKUs can be ordered or delivered, so in practice the decimal numbers need to be rounded to the nearest integer. When the demand is large, the difference between decimal and integer numbers has little effect on the result. But this difference matters for short life cycle products, as the demand size of such products is usually very small (e.g. most sales quantities are 0, 1 or 2). Through experiments, we demonstrate that different result formats can lead to different accuracies and thus affect the selection of the appropriate forecasting method. The computational results produced by the seven methods are shown in Table 4. Columns 2-4 display the RMAPE, RSMAPE and MAE, respectively, associated with the forecasted decimal results. Columns 5-7 report the corresponding accuracies associated with the integer results. Figure 5 presents the results in Table 4 graphically. According to RMAPE and MAE, the best model is sMDL-NN. In our problem, since a large share of demand is zero, RMAPE and SMAPE cannot give managers an intuitive judgment based on the results. Thus, MAE is the most suitable measurement in this problem, as it directly provides results on which managers can base their decisions.

Table 4.

Performance for different models.

Model	Double			Integer
Model	RMAPE(%)	RSMAPE(%)	MAE	RMAPE(%)	RSMAPE(%)	MAE
S.B.	65.54	70.52	0.9781	65.07	69.94	0.9583
GM	62.27	63.04	0.9070	61.28	63.86	0.8961
SVM	70.49	134.16	1.0975	73.62	146.99	1.1217
MS	99.68	173.01	2.8984	99.68	173.01	2.8964
ELM	75.63	146.93	1.1334	75.76	147.63	1.1350
NN	60.85	46.21	1.6092	56.25	46.26	1.7960
sMDL-NN	45.62	60.58	0.8297	39.51	52.14	0.7993

The bold number means the best performance in corresponding column.
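The best value in each column of Table 4 can be checked directly from the reported numbers. The following Python sketch transcribes the table and selects the model with the smallest value per column, assuming that lower is better for all three error measures; the variable names are illustrative.

```python
# Values transcribed from Table 4 (columns 2-4: decimal, columns 5-7: integer).
columns = [
    "RMAPE double", "RSMAPE double", "MAE double",
    "RMAPE integer", "RSMAPE integer", "MAE integer",
]
table4 = {
    "S.B.":    [65.54, 70.52, 0.9781, 65.07, 69.94, 0.9583],
    "GM":      [62.27, 63.04, 0.9070, 61.28, 63.86, 0.8961],
    "SVM":     [70.49, 134.16, 1.0975, 73.62, 146.99, 1.1217],
    "MS":      [99.68, 173.01, 2.8984, 99.68, 173.01, 2.8964],
    "ELM":     [75.63, 146.93, 1.1334, 75.76, 147.63, 1.1350],
    "NN":      [60.85, 46.21, 1.6092, 56.25, 46.26, 1.7960],
    "sMDL-NN": [45.62, 60.58, 0.8297, 39.51, 52.14, 0.7993],
}

# For each column, pick the model whose error measure is smallest.
best = {
    col: min(table4, key=lambda model: table4[model][i])
    for i, col in enumerate(columns)
}

for col in columns:
    model = best[col]
    print(f"{col}: {model} ({table4[model][columns.index(col)]})")
```

On these numbers sMDL-NN is best on RMAPE and MAE in both formats, while NN is best on RSMAPE in both formats.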

Figure 5.

Performance for different models.

Encompassing test

In this study, to further illustrate the ability of the different models in Table 4, we introduce the encompassing test^51,52 to determine whether forecast encompassing occurs between models. According to the Chong-Hendry forecast encompassing test, if model M1 encompasses model M2 in forecasting performance, then M1 can explain information in the error associated with M2, while M2 cannot explain information in the error associated with M1, which indicates the dominance of M1 and the redundancy of M2. Based on the forecasting errors in Table 4 and the procedures in,^51,52 we give the results of the encompassing test in Table 5.

Table 5.

Encompassing test for different models.

	S.B.	GM	SVM	MS	ELM	NN	sMDL-NN
S.B.		~	*	**	*	~	~
GM	***		*	**	*	~	~
SVM	~	~		**	~	~	~
MS	~	~	~		~	~	~
ELM	~	~	~	***		~	~
NN	~	~	~	~	~		~
sMDL-NN	**	~	**	***	***	~

***p < 0.01, **p < 0.05, *p < 0.1, ~not significant.

The label in each cell of Table 5 indicates whether the model in the corresponding row encompasses the model in the corresponding column. For instance, GM encompasses S.B., but S.B. does not encompass GM. In Table 5, our proposed model sMDL-NN shows the strongest encompassing ability: it encompasses four of the six competing models (S.B., SVM, MS and ELM) at the 5% level or better. However, no model encompasses all the others.
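The pairwise logic behind Table 5 can be sketched in Python under stated assumptions. The sketch uses one common formulation of the Chong-Hendry idea, regressing one model's forecast error on the other model's forecast without an intercept and testing whether the slope is zero; this is not necessarily the exact specification used in this study. The function names, the 5% threshold, the normal approximation to the t statistic and the data are all hypothetical.

```python
import math


def slope_test(errors, forecasts):
    """Regress errors on forecasts (no intercept); return (slope, p_value).

    Assumption: the two-sided p-value uses a normal approximation to the
    t statistic so that only the standard library is needed.
    """
    n = len(errors)
    sxx = sum(f * f for f in forecasts)
    slope = sum(f * e for f, e in zip(forecasts, errors)) / sxx
    rss = sum((e - slope * f) ** 2 for f, e in zip(forecasts, errors))
    se = math.sqrt(rss / (n - 1) / sxx)
    if se == 0.0:
        # Degenerate perfect fit: treat a non-zero slope as significant.
        return slope, (1.0 if slope == 0.0 else 0.0)
    t_stat = slope / se
    return slope, math.erfc(abs(t_stat) / math.sqrt(2.0))


def encompasses(actual, f1, f2, alpha=0.05):
    """True if forecasts f1 encompass forecasts f2.

    f1 must explain the error of f2 (significant slope), while f2 must not
    explain the error of f1 (insignificant slope).
    """
    e1 = [a - f for a, f in zip(actual, f1)]
    e2 = [a - f for a, f in zip(actual, f2)]
    _, p_f1_on_e2 = slope_test(e2, f1)
    _, p_f2_on_e1 = slope_test(e1, f2)
    return p_f1_on_e2 < alpha and p_f2_on_e1 >= alpha


# Hypothetical data: M1 tracks demand closely, M2 is a constant forecast.
actual = [0, 3, 0, 4, 0, 5, 0, 4]
forecast_m1 = [0.1, 2.9, 0.1, 3.9, 0.1, 4.9, 0.1, 3.9]
forecast_m2 = [1.0] * len(actual)

m1_encompasses_m2 = encompasses(actual, forecast_m1, forecast_m2)
m2_encompasses_m1 = encompasses(actual, forecast_m2, forecast_m1)
print(m1_encompasses_m2, m2_encompasses_m1)
```

Running the check in both directions for every pair of models yields an asymmetric matrix of the same shape as Table 5, with each cell reporting the significance of the row model's forecasts in explaining the column model's errors.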

Conclusions

For the intermittent demand forecasting problem of medical consumables with a short life cycle, this study provides a dynamic neural network model that uses an optimized model selection procedure to select the optimal structure of the neural network. One critical issue is avoiding underfitting and overfitting, especially with limited historical data, which is essentially a model selection problem. An appropriate model selection criterion is introduced to determine the optimal model. To tackle the problem of erratic demand, outlier removal, seasonal adjustment and aggregation techniques are introduced. In addition, a new forecasting accuracy estimator is proposed to improve the generalization capability on zero-demand data.

Six benchmark methods are also applied, namely Syntetos and Boylan’s method (S.B.), GM, SVM, MS, ELM and NN. Our experimental results show that the proposed method outperforms the others on RMAPE and MAE (Table 4). Our findings suggest that, given the complexity of sales data, managers should consider model selection, and sMDL-NN is a suitable candidate model for this purpose.

The numerical investigation is conducted at the supply chain level. Experimental results and encompassing tests indicate that our proposed sMDL-NN model is superior. The results under the proposed RMAPE are consistent with those under MAE.

Another important finding of our study is that the format of the forecasted sales values, decimal or integer, can lead to significantly different performance in small-demand problems. Although forecasting methods commonly produce decimal sales values, in reality only integers can be used because we cannot order or deliver part of an SKU.

Although this is a classical problem, it has little influence when sales volumes are large. However, in the sale of short life cycle products, demand is usually very small, and forecasts in different formats may then lead to different decisions. We find that the format of the values should not be ignored, and our experimental results show the gap between the different formats.

Intermittent demand management is a very complicated issue. Further research could integrate the forecasted results into inventory, logistics distribution and pricing operations, such as markdown decisions, in order to develop models and algorithms that solve practical problems.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research is supported by the National Natural Science Foundation of China (No. 71531009).

ORCID iD

Peipei Liu

References

1. Nenni, Giustiniano, Pirolo. Demand forecasting in the fashion industry: a review. International Journal of Engineering Business Management 2013; 5: 37.

2. Basallo-Triana, Rodríguez-Sarasty, Benitez-Restrepo HD. Analogue-based demand forecasting of short life-cycle products: a regression approach and a comprehensive assessment. Int J Prod Res 2017; 55(8): 2336–2350.

3. Chatfield, Hayya JC. All-zero forecasts for lumpy demand: a factorial study. Int J Prod Res 2007; 45(4): 935–950.

4. Huang, Chang, Chou YC. Demand forecasting and smoothing capacity planning for products with high random demand volatility. Int J Prod Res 2008; 46(12): 3223–3239.

5. Johnston, Boylan JE. Forecasting for items with intermittent demand. Journal of the Operational Research Society 1996; 47(1): 113–121.

6. Teunter, Syntetos, Babai MZ. Intermittent demand: linking forecasting to inventory obsolescence. European Journal of Operational Research 2011; 214(3): 606–615.

7. Willemain, Smart, Schwarz HF. A new approach to forecasting intermittent demand for service parts inventories. International Journal of Forecasting 2004; 20(3): 375–387.

8. Thomassey. Sales forecasts in clothing industry: the key success factor of the supply chain management. International Journal of Production Economics 2010; 128(2): 470–483.

9. Thomassey. Sales forecasting in apparel and fashion industry: a review. In: Intelligent fashion forecasting systems: models and applications. Berlin, Heidelberg: Springer, 2014, pp. 9–27.

10. Chong, Tang CS. A modeling framework for category assortment planning. Manufacturing & Service Operations Management 2001; 3(3): 191–210.

11. Choi. Fashion retail forecasting by evolutionary neural networks. Int J Prod Econ 2008; 114(2): 615–630.

12. Choi, KF. A hybrid SARIMA wavelet transform method for sales forecasting. Decis Support Syst 2011; 51(1): 130–140.

13. Ren, Choi, Liu. Fashion sales forecasting with a panel data-based particle-filter model. IEEE Transactions on Systems, Man, and Cybernetics: Systems 2014; 45(3): 411–421.

14. Winters PR. Forecasting sales by exponentially weighted moving averages. Management Science 1960; 6(3): 324–342.

15. Sun, Choi, et al. Sales forecasting using extreme learning machine with applications in fashion retailing. Decis Support Syst 2008; 46(1): 411–419.

16. Xia, Zhang, Weng, et al. Fashion retailing forecasting based on extreme learning machine with adaptive metrics of inputs. Knowledge-Based Systems 2012; 36: 253–259.

17. Choi, Hui, et al. Color trend forecasting of fashionable products with very few historical data. IEEE Trans Syst Man Cybern C (Applications and Reviews) 2011; 42(6): 1003–1010.

18. Wong, Guo ZX. A hybrid intelligent model for medium-term sales forecasting in fashion retail supply chains using extreme learning machine and harmony search algorithm. International Journal of Production Economics 2010; 128(2): 614–624.

19. Choi, Hui, Liu, et al. Fast fashion sales forecasting with limited data and time. Decis Support Syst 2014; 59: 84–92.

20. Nikolopoulos, Syntetos, Boylan, et al. An aggregate–disaggregate intermittent demand approach (ADIDA) to forecasting: an empirical proposition and analysis. Journal of the Operational Research Society 2011; 62(3): 544–554.

21. Rostami-Tabar, Babai, Syntetos, et al. A note on the forecast performance of temporal aggregation. Naval Research Logistics (NRL) 2014; 61(7): 489–500.

22. Syntetos, Babai, Boylan, et al. Supply chain forecasting: theory, practice, their gap and the future. European Journal of Operational Research 2016; 252(1): 1–26.

23. Croston JD. Forecasting and stock control for intermittent demands. Journal of the Operational Research Society 1972; 23(3): 289–303.

24. Syntetos, Boylan JE. On the bias of intermittent demand estimates. International Journal of Production Economics 2001; 71(1–3): 457–466.

25. Liu, Ren, Choi, et al. Sales forecasting for fashion retailing service industry: a review. Mathematical Problems in Engineering 2013; 2013.

26. Harrison PJ. Exponential smoothing and short-term sales forecasting. Management Science 1967; 13(11): 821–842.

27. Yelland, Dong. Forecasting demand for fashion goods: a hierarchical Bayesian approach. In: Intelligent fashion forecasting systems: models and applications. Berlin, Heidelberg: Springer, 2014, pp. 71–94.

28. Kostenko, Hyndman RJ. A note on the categorization of demand patterns. Journal of the Operational Research Society 2006; 57(10): 1256–1257.

29. Syntetos, Boylan, Croston JD. On the categorization of demand patterns. Journal of the Operational Research Society 2005; 56(5): 495–503.

30. Heinecke, Syntetos, Wang. Forecasting-based SKU classification. International Journal of Production Economics 2013; 143(2): 455–462.

31. Petropoulos, Nikolopoulos, Spithourakis, et al. Empirical heuristics for improving intermittent demand forecasting. Industrial Management & Data Systems 2013.

32. Eaves AHC, Kingsman. Forecasting for the ordering and stock-holding of spare parts. Journal of the Operational Research Society 2004; 55(4): 431–437.

33. Kourentzes. On intermittent demand model optimisation and selection. International Journal of Production Economics 2014; 156: 180–190.

34. Small, Tse CK. Minimum description length neural networks for time series prediction. Physical Review E 2002; 66(6): 066701.

35. Rissanen. Modeling by shortest data description. Automatica 1978; 14(5): 465–471.

36. Burnham, Anderson. Model selection and multimodel inference: a practical information-theoretic approach. 2nd ed. New York: Springer, 2002, p. 488.

37. Judd, Mees. On selecting models for nonlinear time series. Physica D: Nonlinear Phenomena 1995; 82(4): 426–444.

38. Rissanen. Stochastic complexity and modeling. The Annals of Statistics 1986: 1080–1100.

39. Schwarz. Estimating the dimension of a model. The Annals of Statistics 1978; 6(2): 461–464.

40. Syntetos, Boylan. The accuracy of intermittent demand estimates. International Journal of Forecasting 2005; 21(2): 303–314.

41. Syntetos, Boylan JE. On the variance of intermittent demand estimates. International Journal of Production Economics 2010; 128(2): 546–555.

42. Prestwich, Rossi, Armagan Tarim, et al. Mean-based error measures for intermittent demand forecasting. Int J Prod Res; 52(22): 6782–6791.

43. Imbalanced learning: foundations, algorithms, and applications. Hoboken: John Wiley & Sons, 2013.

44. Julong. Introduction to grey system theory. The Journal of Grey System 1989; 1(1): 1–24.

45. Hsu, Chen CY. Applications of improved grey prediction model for power demand forecasting. Energy Conversion and Management 2003; 44(14): 2241–2249.

46. Huang, Zhu, Siew. Extreme learning machine: a new learning scheme of feedforward neural networks. In: 2004 IEEE international joint conference on neural networks (IEEE Cat. No. 04CH37541), Budapest, Hungary, 25–29 July 2004, pp. 985–990. IEEE.

47. Vapnik VN. An overview of statistical learning theory. IEEE Transactions on Neural Networks 1999; 10(5): 988–999.

48. Hamilton. A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica: Journal of the Econometric Society 1989: 357–384.

49. Tukey JW. Exploratory data analysis. Reading, Mass.: Addison-Wesley, 1977.

50. Wheelwright, Makridakis, Hyndman. Forecasting: methods and applications. New York: John Wiley & Sons, 1998.

51. Fair, Shiller. The informational content of ex ante forecasts. The Review of Economics and Statistics 1989: 325–331.

52. Chong, Hendry DF. Econometric evaluation of linear macro-economic models. The Review of Economic Studies 1986; 53(4): 671–690.

Intermittent demand forecasting for medical consumables with short life cycle using a dynamic neural network during the COVID-19 epidemic
