Abstract
Demand forecasting in the biomedical area is becoming more important because of radical changes in the macroeconomic environment and consumption trends. Moreover, the need for big data analysis of data from wireless sensor networks and social media is increasing, because such data capture not only rapidly changing environmental conditions such as fine dust concentration but also the responses of potential customers, both of which are expected to affect the demand for a medicine. Therefore, this study proposes demand forecasting models based on data from wireless sensor networks and topic modeling of buzzwords in blog documents. First, we analyzed the topics of blog documents that describe the symptoms of diseases related to selected medicines. Thereafter, we extracted topic trends for a selected period and constructed demand forecasting models that combine topic trends, environmental data from wireless sensor networks, and time-series sales data. The experimental results show that topic trends about medicines significantly affect the performance of demand forecasting for these medicines.
1. Introduction
Effective data analysis and prediction of future situations are expected to give organizations strategic direction and to inform their decision-making processes [1]. Furthermore, demand for the analysis and utilization of big data in the medicine industry is increasing as the quantity of related data grows rapidly. Social media [1–3] and wireless sensor networks [4, 5] are good examples of such data sources. Environmental monitoring is one of the important categories of sensor network applications [4], and environmental data such as fine dust concentration are expected to affect the demand for medicines. However, existing studies about demand forecasting with social media are insufficient, in spite of increasing demand, because most have focused on simple data analyses. In particular, few studies forecast the demand for medicines using exogenous variables such as data from wireless sensor networks and social media, even though it is plausible that changes in environmental data and trends in social media affect the demand for medicines [6].
Recently, social network services, such as Facebook and Twitter, have become indispensable to the online activities of people, and they produce a huge amount of data on individuals' interests in everyday life [7]. Because social media data can be acquired in real time and reflect the interests of people, they are expected to be effective exogenous variables of demand forecasting for medicine [6, 7]. Environmental data such as fine dust concentration are also expected to be exogenous variables because there are close relations between environment and human health.
Most existing studies tend to forecast demand with patterns of time-series data of medicine consumption and sales [8]. However, demand is affected by many exogenous factors. For example, it is known that the number of people diagnosed with retinopathy is increasing because people are increasingly using various types of small displays, such as smartphones. The objective of our research is to propose a demand forecasting model by incorporating environmental data from sensor networks and topic trend analysis [9] into a VARX model [10, 11]. Topic trends are produced from the results of topic modeling [12], which analyzes the social media documents that describe the symptoms of certain diseases related to selected medicines. We expect that our proposed model will overcome the limits of using only sales data in demand forecasting for medicines.
This paper is organized as follows. Section 2 reviews previous work on demand forecasting, vector autoregressive models with exogenous variables, and topic trend analysis. Section 3 describes the demand forecasting procedure based on topic trend analysis and presents the forecasting results. Lastly, Section 4 presents our conclusions.
2. Related Studies
2.1. Demand Forecasting
There is a variety of opinions on categorizing forecasting techniques. One of the prevalent opinions involves categorizing techniques into qualitative and quantitative methods. The latter are categorized again into time-series analysis and causal forecasting methods [13, 14].
Qualitative forecasting techniques are applied mainly to long-term forecasting. Market potential is affected by external environmental changes; thus, demand is predicted based on the subjective judgment of experts.
Time-series analysis techniques for demand forecasting determine the pattern of time-dependent demand in the past and predict future demand by extending the pattern. The basic assumption of the techniques is market stability, in which future demand patterns are assumed to persist as they have in the past. This scheme of prediction is useful for short-term prediction and makes an accurate prediction possible with relatively little data available. A moving average, exponential smoothing, the method of least squares, and the Box–Jenkins method are examples of time-series analysis.
Causal forecasting methods identify environmental factors that affect demand and identify causal relationships between these factors in order to predict future demand. Regression analysis, econometric models, input-output analysis, and the leading indicator method are examples of causal forecasting methods.
There are many studies about demand forecasting in various domains. For example, Lo et al. [15] suggest a demand forecasting method for the liquid crystal display monitor market and Fildes et al. [16] positively analyze the modification of forecasting models. Chen et al. [17] suggest a demand forecasting model for fresh foods. Smith and Mentzer [18] study the relation of forecasting precision and logistics outcomes. Most research about demand forecasting uses exponential smoothing, time-series models, growth curve models, and trend analysis but does not use exogenous variables from social media, especially in the medicine industry [15].
2.2. Vector Autoregressive Models with Exogenous Variables
In this study, an autoregressive model with exogenous variables (ARX) is applied, using data related to demand as the exogenous variables. ARX is a quantitative, time-series prediction method that includes exogenous variables. The vector autoregressive (VAR) model is widely used in empirical research in macroeconomics, in particular as a prediction model [19]. We use VAR models with exogenous variables (VARX) [20] together with the results of topic trend analysis [9] for demand forecasting. VARX models are widely employed for factor analysis and forecasting with more than one endogenous variable and more than one exogenous variable [10, 11]. A VARX model can simultaneously analyze the impact of every variable on the other variables, and thus it is highly adaptable to structural changes. At the same time, it can analyze the impact of exogenous variables on endogenous variables.
The following shows the VARX model used in our research:
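For orientation, the general textbook form of a VARX(p, q) model is given below. The symbols are assumed notation and may differ from the authors' own: y_t is the vector of endogenous variables (here, sales), x_t is the vector of exogenous variables (here, topic trends), c is a constant vector, A_i and B_j are coefficient matrices, and ε_t is a white-noise error vector.

```latex
\[
  \mathbf{y}_t \;=\; \mathbf{c}
  \;+\; \sum_{i=1}^{p} A_i \,\mathbf{y}_{t-i}
  \;+\; \sum_{j=0}^{q} B_j \,\mathbf{x}_{t-j}
  \;+\; \boldsymbol{\varepsilon}_t
\]
```

Setting every B_j to zero recovers a pure VAR(p) model, which corresponds to a model without exogenous variables.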
2.3. Topic Modeling and Topic Trend Analysis
Topic modeling is a useful tool for knowledge discovery in the field of text mining [12]. Its main purpose is to discover valuable patterns in a huge collection of documents. A pattern is a vector of words, called a topic, and topic modeling produces a statistical set of these topics. Highly related words are grouped into a topic, which is a probability distribution over the words in the documents. The basic idea of topic modeling is that particular words can be expected to appear more frequently in a document if the document is about a particular topic. Therefore, we can statistically infer and extract topics, including their probabilities, from the series of words in a document. Latent Dirichlet allocation (LDA) [21], the most representative topic modeling algorithm, is adopted in our research. In LDA, each document is assumed to be a mixture of topics. The only observable variables are the words in the documents. LDA extracts the latent variables, namely the topic distribution of each document and the word distribution of each topic, through statistical inference based on the Dirichlet-multinomial distribution.
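The generative assumption behind LDA can be summarized as follows. The notation is assumed for illustration and does not appear in the original text: θ_d is the topic distribution of document d, φ_k is the word distribution of topic k, z_{d,n} is the latent topic of the n-th word in document d, w_{d,n} is the observed word, and α and β are Dirichlet hyperparameters.

```latex
\begin{align*}
  \phi_k   &\sim \mathrm{Dirichlet}(\beta),  & k &= 1, \dots, K \\
  \theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1, \dots, D \\
  z_{d,n} \mid \theta_d &\sim \mathrm{Multinomial}(\theta_d) \\
  w_{d,n} \mid z_{d,n}, \phi &\sim \mathrm{Multinomial}(\phi_{z_{d,n}})
\end{align*}
```

Only w_{d,n} is observed; inference recovers θ_d (the per-document topic weights) and φ_k (the per-topic word weights).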
Topic trends describe how topics change over time. Matsubara et al. [22], in a study that predicts web-click behavior, point out that LDA does not directly provide topic trends. We therefore propose a method that obtains topic trends by conducting LDA for each predefined unit period.
Several studies have applied topic modeling and topic trends in various domains. Bolelli et al. [23] proposed a model for mining distinct topics in the documents of a digital library. They discovered topics in each time segment of the document collection and tried to show that the results could effectively detect the evolution of the topics over time. Martie et al. [24] examined topic trends in Android bug reports. They observed bug discussion trends in the public issue tracker in order to analyze the development of the Android open source project. The analysis results can be used in resource allocation.
3. Demand Forecasting Procedure through Topic Trend Analysis
The first step in the procedure of our method involves document collection and refinement, the second step is topic trend analysis through LDA, and the final step is demand forecasting through VARX models. We conduct experiments that compare the results of an autoregressive model with only time-series sales data and a VARX model that includes topic trends in order to confirm the effect of topic trends on demand forecasting.
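The comparison between the two kinds of model can be sketched as below. This is a deliberately simplified, single-equation version with one lag, fitted by ordinary least squares; it is not the fastVAR implementation used in the experiments, and all function names are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; regressors may be collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    width = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(width)]
           for i in range(width)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(width)]
    return solve_linear_system(xtx, xty)


def fit_arx(sales, exog=None):
    """Fit y_t = c + a*y_{t-1}, plus b*x_t when `exog` is given.

    Returns [c, a] for the pure autoregressive model or [c, a, b] for the
    model with one exogenous variable.
    """
    rows = []
    for t in range(1, len(sales)):
        row = [1.0, sales[t - 1]]
        if exog is not None:
            row.append(exog[t])
        rows.append(row)
    return fit_least_squares(rows, sales[1:])
```

Calling `fit_arx(sales)` gives the baseline autoregressive coefficients, and `fit_arx(sales, exog=topic_trend)` adds the exogenous term, so the two fits can be compared on the same sales series.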
3.1. Document Collection and Refinement
We gather sales data from a pharmaceutical company, which consist of monthly sales of four medicines for 44 months ranging from January 2010 to August 2013. The four selected medicines are denoted as AD, AX, UT, and TF. The next step is gathering documents from Naver blogs (http://section.blog.naver.com/), which is among the most popular Korean multiblog sites. We extract the names, symptoms, and indications of the four medicines as keywords in order to search for documents containing these keywords in the stipulated period. Furthermore, we collect fine dust concentration data ranging from January 2010 to August 2013 from the Korean Statistical Information Service (http://kosis.kr/statHtml/statHtml.do?orgId=106&tblId=DT_106N_03_0200045&vw_cd=MT_ZTITLE&list_id=E1_106_A001&seqNo=&lang_mode=ko&language=kor&obj_var_id=&itm_id=&conn_path=E1) as the environmental data collected from sensor networks. We use a morphological analyzer (KLT version 2.0) [25] to refine the documents and eliminate stop words. The results of the refinement step are modified documents composed of only meaningful nouns.
3.2. Topic Trend Analysis through LDA
The purpose of topic modeling in this research is to extract topics that are useful in demand forecasting for medicines. We use the MALLET package [26], a Java-based topic modeling tool. We produce 100 topics for each medicine as a result of topic modeling. Thereafter, we determine the five most meaningful topics with the highest weights for each medicine by clustering the 100 topics. Table 1 shows the selected topics for the medicine AD. Tables 2, 3, and 4 show the topics for AX, UT, and TF, respectively.
Table 1: Selected topics for AD.
Table 2: Selected topics for AX.
Table 3: Selected topics for UT.
Table 4: Selected topics for TF.
We conduct topic trend analysis by averaging topic weights over the documents of the same month. Therefore, the results of the analysis are the time-series vectors for the period of 44 months (see Tables 1, 2, 3, and 4).
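The monthly averaging step can be sketched as follows. The input layout (a month label plus a list of per-topic weights for each document) and the function name are assumptions made for illustration; the actual MALLET output format is not described here.

```python
from collections import defaultdict


def topic_trends(documents, num_topics):
    """Average topic weights over all documents of the same month.

    `documents` is an iterable of (month, weights) pairs, where `month` is a
    sortable label such as "2010-01" and `weights` has `num_topics` entries.
    Returns a dict mapping each month, in sorted order, to its mean weights.
    """
    sums = defaultdict(lambda: [0.0] * num_topics)
    counts = defaultdict(int)
    for month, weights in documents:
        if len(weights) != num_topics:
            raise ValueError("each document needs one weight per topic")
        for k, weight in enumerate(weights):
            sums[month][k] += weight
        counts[month] += 1
    return {m: [s / counts[m] for s in sums[m]] for m in sorted(sums)}


# Hypothetical topic weights for three blog documents and two topics.
example_docs = [
    ("2010-01", [0.2, 0.8]),
    ("2010-01", [0.4, 0.6]),
    ("2010-02", [1.0, 0.0]),
]
trends = topic_trends(example_docs, num_topics=2)
```

Reading the k-th entry across months yields the time series for topic k, which is the form needed for an exogenous variable.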
3.3. Demand Forecasting through VARX
We use the fastVAR package for R [27] to forecast the demand for the four medicines with time-series sales data and topic trends over 44 months. In detail, we set the sales data ranging from January 2010 to August 2013 as endogenous variables and the six-month moving averages of the topic trends over the same period as exogenous variables. A pretest using all exogenous variables, namely monthly fine dust concentration and the topic trends, showed that fine dust concentration does not significantly affect demand, so it was excluded from the main experiments.
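The six-month moving average of a topic trend can be sketched as below. A trailing window is assumed here, since the text does not state whether the window is trailing or centered, and the function name is hypothetical.

```python
def trailing_moving_average(series, window=6):
    """Mean of the current value and the preceding `window - 1` values.

    The output is shorter than the input by `window - 1` entries because the
    first full window only becomes available at index `window - 1`.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


# A 44-month series yields 39 smoothed values with a six-month window.
smoothed_length = len(trailing_moving_average(list(range(44)), window=6))
```

Under this reading, the first smoothed value is only available once six months of data have accumulated, which shortens the usable sample at the start of the period.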
Moreover, we conduct multicollinearity analysis and eliminate exogenous variables that have similar patterns. We construct two models to test the effect of exogenous variables. The first (Model 1) is a pure autoregressive model without exogenous variables, and the second (Model 2) is our suggested VARX model [10]. Figures 1, 2, 3, and 4 show the graphs of forecasting results for the medicines AD, AX, UT, and TF, respectively. Each graph shows the change of sales over time for three cases: the actual values, the forecasts of Model 1, and the forecasts of Model 2. The forecasting results start six months after January 2010 because six months of data are required for the models.
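One simple way to screen out exogenous variables with similar patterns is a pairwise correlation filter, sketched below. The text does not specify which multicollinearity diagnostic was used, so the Pearson correlation, the greedy selection order, and the threshold of 0.9 are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no linear pattern
    return cov / math.sqrt(var_x * var_y)


def drop_collinear(variables, threshold=0.9):
    """Greedily keep variables whose |r| with every kept one is <= threshold.

    `variables` maps a name to its series; insertion order decides which of
    two highly correlated variables survives.
    """
    kept = {}
    for name, series in variables.items():
        if all(abs(pearson(series, other)) <= threshold
               for other in kept.values()):
            kept[name] = series
    return kept


# Hypothetical topic trends: topic_b nearly duplicates topic_a.
candidates = {
    "topic_a": [1, 2, 3, 4],
    "topic_b": [2, 4, 6, 8.1],
    "topic_c": [4, 1, 3, 2],
}
selected = drop_collinear(candidates)
```

A variance inflation factor analysis would be a common alternative when more than two variables are jointly collinear.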

Figure 1: Forecasting result for AD medicine.
Figure 2: Forecasting result for AX medicine.
Figure 3: Forecasting result for UT medicine.
Figure 4: Forecasting result for TF medicine.
Table 5 shows the prediction error rates of the four medicines for each model. The error rates for two medicines (AD, AX) are almost the same under both models, which suggests that demand forecasting for these medicines is not affected by the responses in social media. However, the error rates for the other two medicines (UT, TF) are significantly reduced by adding topic trends to the model. We believe that this is because demand for UT and TF is more sensitive to changes in environment and season, while demand for AD and AX is not. AD and AX are medicines for hypertension and stomach ulcer, respectively, and UT and TF are for prostate and eye disease, respectively. The average error rate of Model 2 is lower than that of Model 1. These results show that topic trends enhance the performance of demand forecasting, as we expected.
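The definition of the error rate is not given in the text. One common choice for comparing forecasts across products with different sales volumes is the mean absolute percentage error (MAPE), sketched below as an assumption; the function name and the numbers in the usage lines are hypothetical and are not results from this study.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumes all actual values are nonzero, which holds for positive sales.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and equally long")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical values, for illustration only.
actual_sales = [100, 200]
model_1_error = mape(actual_sales, [120, 160])
model_2_error = mape(actual_sales, [110, 180])
```

With such a measure, a lower value for Model 2 than for Model 1 on the same months indicates that the exogenous variables improved the forecast.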
Table 5: Error rates of forecasting results.
4. Conclusion
The responses of customers in social media are expected to be a good signal for demand forecasting. We suggested a demand forecasting method that includes such customer responses in time-series analysis of sales. Topic modeling was used to capture the topics of documents from blogs and topic trends were calculated for each month. The trends were added in VARX models as exogenous variables to establish the effects of topic trends on sales prediction. The results showed that topic trend analysis enhanced the performance of demand forecasting for medicines.
There are two aspects of our contribution. The first is the theoretical contribution. We expect our model to contribute to building new types of demand forecasting models by using the responses of customers that appeared in social media. The model can be applied to various domains that require analysis of social factors from social media. The other aspect is the practical implication. As mentioned before, the performance of demand forecasting for medicines was enhanced through the topic trend analysis suggested in our research.
One limitation of our research is that we used only one environmental variable, fine dust concentration, and it did not significantly affect the demand for the selected medicines. We suspect that this result depends closely on the characteristics of the medicines. If we had used sales data for a medicine related to a respiratory disease, the results might have been different. In future research, we plan to collect more diverse environmental data from sensor networks and sales data of various medicines in order to build more generalized models.
Footnotes
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
This work was supported by the Ajou University research fund of 2012-2013.
