Abstract

In today's competitive market environment, it is vital for companies to gain insight about competitors' new product launches. Past studies have demonstrated the predictive value of prerelease online search traffic (PROST) for new product forecasting. Relying on these findings and the public availability of PROST, we investigate its usefulness for estimating sales of competing products. We propose a model for predicting the success of competitors' product launches, based on own past product sales data and competitor's prerelease Google Trends. We find that PROST increases predictive accuracy by more than 18% compared to models that only use internally available sales data and product characteristics of video game sales. We conclude that this inexpensive source of competitive intelligence can be helpful when managing the marketing mix and planning new product releases.

Keywords

competitive intelligence, Google Trends, market analysis, new product forecasting

INTRODUCTION

In today's competitive market environment, predicting competitors' actions is a key priority for senior management (Crayon SCIP, 2022). At the same time, decision makers often choose to not react to competitor activities (Steenkamp et al., 2005), partially because of the high cost of obtaining predictive competitive intelligence (CI) and the uncertainty around such information (Montgomery et al., 2005). While the Internet and Big Data have facilitated data collection and observation of competitors' actions (Calof & Wright, 2008; Feng & Shanthikumar, 2018; Fleisher, 2008; Teo & Choo, 2001) from sources like user‐generated content, this research is predominantly descriptive (e.g., Gutt et al., 2019; Netzer et al., 2012; Xu et al., 2011) or the models lack decision metrics, such as sales (Silva et al., 2019). Moreover, the literature on leading indicators either focuses on user‐generated content (e.g., Boone et al., 2018; Cui et al., 2018; Lau et al., 2018), clickstream traffic (e.g., Huang et al., 2014), or incorporates sources such as weather information (Steinker et al., 2017) and economic indicators (Sagaert et al., 2018) to predict demand.

One of the main drivers of disruption is new product introductions (Palacios Fenech & Tellis, 2016; Peres et al., 2010) and insights on potential success are crucial to determine market share and defensive strategies (e.g., Kumar et al., 2020; Roberts et al., 2005). Predicting the success of new products prior to launch is a challenging task (Goodwin et al., 2013; Trusov et al., 2013), but recent studies suggest that prerelease buzz (PRB) information can substantially improve new product forecasting (e.g., Kim et al., 2015; Schaer et al., 2019b; Xiong & Bharadwaj, 2014). PRB represents the aggregated anticipated interest of consumers toward a new product (Houston et al., 2018) and has been collected, for example, in the forms of online search traffic information (e.g., Schaer et al., 2019b; Tian et al., 2014), blogs (e.g., Dhar & Chang, 2009; Kim et al., 2015), or microblogs (e.g., Asur & Huberman, 2010; Gelper et al., 2015). It differs from postrelease signals where both pre‐ and postconsumption behavior occurs (Houston et al., 2018). Nonetheless, there are several examples that show it is more difficult to extract a clear leading signal from postrelease information (for an overview, see Schaer et al., 2019a).

Companies can easily obtain PRB information for competitor products. This is particularly true for prerelease online search traffic (PROST), which is available through Google Trends or Baidu. Not only is this information freely accessible, but the platforms also report historic data, which simplifies data collection; PROST is also a frequently used data source in the literature (for an overview, see Schaer et al., 2019b). This makes it attractive to apply PROST information to new product forecasting for CI, so as to infer the success of competitors' products. Predicting competitors' success can provide vital insights to help allocate resources (Kumar et al., 2020) and time new product releases (Schoenherr & Swink, 2015; Sun & Kumar, 2020).

Studies that include PRB information have focused on own product sales only (Schaer et al., 2019b) or they have estimated and predicted new product sales from a data set that includes products from multiple brands (e.g., Divakaran et al., 2017; Onishi & Manchanda, 2012). While the latter typically provides better estimates, it does not reflect a real‐world environment in which sales information is only available for own brands. This questions the realism and feasibility of the reported value of these estimates. In the analysis that follows, we use the word “brand” as a synonym for companies, publishers, or producers. We distinguish between internal and competitor information; brands typically have unconstrained access to their own sales history and marketing activities, but external competitor insights are constrained to only what is publicly available. Assuming some degree of homogeneity among competing products, that is, similar product characteristics, a key question is whether competitor PROST can be used together with internal sales data to infer competitor success.

To understand the potential of PROST, in this study, we focus on video games that have a short life cycle and exhibit intense competition. We find that competitor PROST information improves predictions of competitors' new video games market potential by more than 18% compared to models without PROST information. Furthermore, splitting the data by brands is as effective as data‐driven clustering, which supports our homogeneity assumption for the video games market.

Ranjan and Foropon (2021) find that organizations are rarely tapping into Big Data for CI. Market analysts often track competitors' activities in a nonsystematic fashion, based on informal processes relying on unstructured judgment. Feedback that the authors received from the data science team at 2K Games, a video game publisher with more than $3bn revenue, confirms this. The team indicated the need to better understand the competitive market dynamics to support managers' decisions on own product launches and to counteract by adjusting marketing parameters like price and ad spend. Moreover, the improved predictive capabilities could help acquire and retain players and achieve key company goals, like determining return on investment (ROI). Our proposed approach supports this by directly linking PROST with sales. Furthermore, contrary to models without PROST, the predicted values may be incorporated into marketing‐mix models (Luan & Sudhir, 2010).

In this paper, we first review and examine how the literature on new product forecasting with PRB information has addressed competition. Next, in Section 3, we describe our approach for forecasting the market potential of competitors' products using own sales and competitors' PROST. Section 4 evaluates the predictive performance for video game sales using PROST from Google Trends. Finally, Section 5 discusses the findings and practical managerial implications, with conclusions following in Section 6. Our primary contribution is to demonstrate the value of PROST in fulfilling companies' key needs. When companies use appropriate methods, like PROST, they can more accurately understand the competition and predict sales, giving them a competitive edge.

PREDICTING NEW PRODUCTS WITH PRB AND COMPETITOR INFORMATION

The efficiency of predicting the success of a new product using PRB information has been well researched, as Table 1 illustrates. The majority of scholars focused on box office sales; others looked at the sales of music albums (Dhar & Chang, 2009; Hann et al., 2011), alpine skis (Mülbacher et al., 2011), and video games (Schaer et al., 2019b; Xiong & Bharadwaj, 2014). In their studies, they used a variety of PRB sources including forums (e.g., Craig et al., 2015; Liu, 2006), blogs (e.g., Divakaran et al., 2017; Onishi & Manchanda, 2012), Twitter (e.g., Asur & Huberman, 2010; Gelper et al., 2015), and Facebook (Ding et al., 2017; Kim et al., 2017). Another major PRB source is online search traffic available through Google Trends (e.g., Kim, 2021; Kim & Hanssens, 2017; Kulkarni et al., 2012) or Baidu (Tian et al., 2014). Studies that forecast with online information, including PRB, stem from a broad range of disciplines, and many therefore lack adherence to well‐established forecasting principles, as highlighted by Schaer et al. (2019a). Most notably they lack hold‐out‐sample validation (column 3, Table 1). Moreover, researchers routinely benchmark their PRB models against same‐model families but omit testing with a naïve model, making it difficult to compare across studies. Nevertheless, we conclude that PRB substantially improves prerelease estimation when including PRB as volume or valence.

TABLE 1

Summary of literature on forecasting with prerelease buzz (PRB)

Study	Buzz predictor	Estimation/hold‐out	Company control	# series	Model	Target variable	Buzz measure	Horizon
This study	GTD	i / c	b / ‐	240	FR	Video game	Vol.	PLC
Kim (2021)	GTD	x / ‐	‐ / ‐	154	LR	Box office	Vol.	1 w
Schaer et al. (2019b)	GTD	i / i	‐ / ‐	255	ALFC	Video game	Vol.	PLC
Kulkarni et al. (2012)	GTD	x / x	‐ / m	61	LR	Box office	Vol.	1 w
Goel et al. (2010)	GTD	x / ‐	‐ / ‐	106	LR	Video games	Vol.	1 m
	GTD	x / ‐	‐ / ‐	119	LR	Box office	Vol.	1 w
	GTD	x / ‐	‐ / ‐	307	LR	Music album	Vol.	1 w
Kim and Hanssens (2017)	GTD; BLG	x / x	‐ / ‐	137	LR	Box office	Vol.; Vol.	1 w
Tian et al. (2014)	BAU	x / x	b / m	92	ML	Box office	Vol.	1 d
Xiong and Bharadwaj (2014)	GTD; BLG; FOM	x / x	‐ / m	681	FR	Video games	Vol.	3 w
Dhar and Chang (2009)	BLG	x / ‐	‐ / ‐	108	LR	Music album	Vol.	3 w
Kim et al. (2015)	BLG	x / x	‐ / ‐	212	ML	Box office	Vol.	1 w
Divakaran et al. (2017)	BLG	x / x	‐ / m	373	PR	Box office	Vol.	1 w
Onishi and Manchanda (2012)	BLG	x / x	b / ‐	1729	SEM	Box office	Vol.; Val.	1 d
	BLG	x / x	b / ‐	90	SEM	Cell phone service	Vol.; Val.	1 d
Craig et al. (2015)	FOM	x / ‐	‐ / ‐	62	LR	Box office	Vol.	1 w
Mülbacher et al. (2011)	FOM	x / ‐	‐ / ‐	10	LogR	Skis	Vol.; Val.	1 y
Liu (2006)	FOM	x / ‐	b / m	40	LR	Box office	Vol.; Val.	1 w
Wang et al. (2010)	FOM	x / x	‐ / ‐	51	BC	Box office	Vol.	2 w
Asur and Huberman (2010)	TWR	x / ‐	‐ / ‐	24	LR	Box office	Vol.; Val.	1 d
Gelper et al. (2015)	TWR	x / x	‐ / ‐	106	LR	Box office	Vol.; Val.	1 d
Ding et al. (2017)	FBK	x / ‐	‐ / ‐	64	LR	Box office	Vol.	1 w
Hann et al. (2011)	P2P	x / x	‐ / ‐	172	FR	Music album	‐	1 w
Kim et al. (2017)	SNS	x / x	‐ / m	175	ML	Box office	Vol.; Val.	PLC
Foutz and Jank (2010)	VSX	x / x	b / ‐	262	LR	Box office	‐	1 w

Abbreviations: BAU, Baidu; BLG, Blog; FBK, Facebook; FOM, Forum; GTD, Google Trends; P2P, Peer‐to‐Peer Network; SNS, Social Network Services; TWR, Twitter; VSX, Virtual Stock Exchange; x, cross‐brands; i, intrabrand; c, competitor; ‐, no hold‐out evaluation; b, brand variable; m, market variables; ALFC, analogy life cycle curves; BC, Bass curve; FR, functional regression; LogR, logistic regression; LR, linear regression; ML, machine learning; SEM, structural equation model; Vol., volume‐based PRB measure; Val., valence‐based PRB measure; d, daily; w, weekly; m, monthly; PLC, product life cycle.

In Table 1 the column estimation/hold‐out shows that most studies estimate and evaluate their models across multiple brands. While such a cross‐brand approach has the advantage of evaluating the effects of PRB on a richer data set, it is somewhat impractical in terms of operational decision support, as it does not reflect what might be seen in practice, where companies typically only have access to sales data of their own products. Any additional competitor information would need to be sourced, typically via market research agencies that may come at a hefty price. Alternatively, PRB is available at a relatively low cost and has been shown to provide significant accuracy improvements, even with a smaller intrabrand sample (Schaer et al., 2019b). However, its potential remains unexplored for competitor products. Furthermore, other brand‐related variables are not straightforward to use for intrabrand‐based estimation (Dhar & Chang, 2009; Foutz & Jank, 2010; Onishi & Manchanda, 2012) and their predictive information has been questioned (Foutz & Jank, 2010). This possibly explains why only a limited number of studies have included market information, as the column company control depicts.

The general scholarly view is that, in addition to own brand strength, the success of a company's entertainment product largely depends on its competition (see Hennig‐Thurau & Houston, 2019). However, there are mixed findings regarding the relevance of this variable. For example, studies that measured competition by the number of competitors' products released during the same period found an insignificant competition effect on sales (Divakaran et al., 2017; Kulkarni et al., 2012; Liu, 2006; Xiong & Bharadwaj, 2014). Contrary to this finding, Kim et al. (2017) reported substantial gains in forecast accuracy when including a broader set of competition variables, such as the number of seats and screens for top movies. However, none of these studies used any competitor PRB information.

Research suggests that PRB is impacted by advertising expenditure, genre (Xiong & Bharadwaj, 2014), and whether it refers to a sequential or nonsequential product (Craig et al., 2015; Kim, 2021). For movies, Divakaran et al. (2017) reported significant effects from the cast's star power. Xiong and Bharadwaj (2014) noted that for video games there is a brand effect, but it is only significant in the early prerelease phase and vanishes closer to the release as more details about the video game emerge. This illustrates that PRB is a wide measure that carries various informative dimensions, which are difficult to measure directly for competitors.

Another stream of CI literature investigated how mining user‐generated content can identify competitors (Abrahams et al., 2013; Li & Netessine, 2012) or compared customer reviews (Xu et al., 2011), interactions (Chau & Xu, 2012; Netzer et al., 2012), product ratings (Gutt et al., 2019), prices (Carta et al., 2019), and brand reputations (Rust et al., 2021; Silva et al., 2019). Other studies, such as He et al. (2013, 2015), developed benchmarks to assess own and competitors' social media performance. However, we are unaware of specific research for the prerelease phase. Gopinath et al. (2013) analyzed blog market coverage of movie studios, but not in a predictive context.

This motivated our investigation into the extent to which PRB contains predictive value for competitor's products, with the objective of further enhancing CI insights. For this study, we focus on PROST information, as it is one of the most frequently used data sources. Nonetheless, it is easy to apply the same methodology to other PRB sources.

Scenario (i) in Figure 1 illustrates the standard approach used in most studies that investigate new product forecasting with PRB: All brands and their corresponding products are represented randomly in the training and test sample. This is unsuitable for our research, since we are interested in forecasting competitor sales from own‐brand information only (intrabrand), which is the more realistic setting. We propose to split the data set according to scenario (ii), and formulate our first research question:

Research Question 1

Does PROST improve the predictions of a competitor's new product sales, compared to non‐PROST models?

FIGURE 1

Different ways to split a data set

While this will allow us to investigate the impact of PROST, it does not demonstrate its efficiency compared to cross‐brand estimation. Therefore, we propose a second research question that investigates the information loss when restricting the training to intrabrand data:

Research Question 2

Do PROST models with cross‐brand information outperform intrabrand PROST competitor models?

To answer this question, we alter the sampling methodology to the case shown in scenario (iii) in Figure 1, where public sales from multiple brands are available to assess predictions for a single competitor brand, increasing the training sample over the previous case. As we noted, in many cases, this may be infeasible in practice.
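The intrabrand and cross‐brand splits can be sketched as follows. This is a minimal illustration, assuming each product is a dictionary with a `brand` key and that scenario (iii) trains on every brand except the competitor under evaluation; the function name and the data layout are hypothetical and not taken from the study.

```python
def split_by_scenario(products, own_brand, target_brand, scenario):
    """Return (train, test) lists for the brand-based splits.

    Assumed reading of the scenarios:
      "ii"  : train on the own brand only, test on one competitor brand
      "iii" : train on all brands except the competitor, test on that competitor
    """
    test = [p for p in products if p["brand"] == target_brand]
    if scenario == "ii":
        train = [p for p in products if p["brand"] == own_brand]
    elif scenario == "iii":
        train = [p for p in products if p["brand"] != target_brand]
    else:
        raise ValueError("scenario must be 'ii' or 'iii'")
    return train, test


# Hypothetical toy data: three brands, four products.
products = [
    {"brand": "A", "title": "a1"},
    {"brand": "A", "title": "a2"},
    {"brand": "B", "title": "b1"},
    {"brand": "C", "title": "c1"},
]
train_ii, test_ii = split_by_scenario(products, "A", "C", "ii")
train_iii, test_iii = split_by_scenario(products, "A", "C", "iii")
```

The test set is identical in both cases, so any difference in accuracy comes from the size and composition of the training sample.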

Strictly speaking, restricting the data set from the cross‐brand data to just intrabrand data is a crude form of clustering, that is, an a priori segmentation (see Morwitz & Schmittlein, 1992, for a discussion on forecasting with segmentation). This only works when the different brand products are homogeneous enough so that the internal products act as analogies for the competition. Using analogies is a common forecasting approach (e.g., Hu et al., 2019; Martínez‐de Albéniz et al., 2020). Research from the retail industry suggests there is often little brand segmentation (Hammon et al., 1996). If the products are indeed homogeneous, then splitting by brands should make little difference to forecasting performance, other than limiting the sample size, as indicated in Research Question 2. However, if the products differ, an alternative approach is to use data‐driven clustering and to split the data set into more homogeneous subsamples, or product segments, instead of brands, using the same training set as scenario (iii). Our third research question investigates the value of using product segments instead of brands.

Research Question 3

Are predictions by brands superior to using product segments?

In summary, our first research question investigates the efficacy of PROST in inferring competitors' sales; the other research questions explore the efficiency of the proposed model and the conditions under which it performs well, taking into account the limited data availability in practice. Moreover, as suggested by Schaer et al. (2019a), we provide comparisons with findings from previous literature and other benchmarks.

PREDICTING THE MARKET POTENTIAL OF COMPETITORS' NEW PRODUCTS

To answer the outlined research questions, we use functional data analysis (FDA), a common way to predict new product sales (e.g., Foutz & Jank, 2010; Hann et al., 2011; Xiong & Bharadwaj, 2014). In contrast to classical regression, FDA does not regress the raw inputs directly on the target variable, but instead, for each observation, compresses vectors of values (curves) into scalars (Ramsay & Silverman, 2005). An advantage of this representation is that it allows us to mix variables of different lengths, that is, time series–based PROST information and product characteristics. This is achieved by reducing the vectors across time into scalars that summarize their characteristics. This is detailed in the next subsection. Additionally, by using FDA, we eliminate the time dimension, which supports our aim to facilitate estimation using small sample sizes, as is often the case for new products. We introduce the two models outlined in Figure 2 to forecast sales. The first is a regression‐based model that can be applied to both intrabrand and cross‐brand data, abbreviated with iRg and xRg, respectively. The second is a two‐staged model, based on segmentation, using clustering and classification. Although, in principle, we can cluster intrabrand data, the available sample size of own products might limit its applicability in practice. Therefore, we consider the segmentation model only for the cross‐brand data, which we use as a benchmark, labelled as xCl.

FIGURE 2

Different approaches to obtain competitor forecasts from common PROST features and product characteristics. The clustering for intrabrand data is subject to a sufficient sample

Functional regression

To derive a feature‐based time‐series representation, we consider a variety of parametric and nonparametric methods (for a general overview on time‐series dimensionality reduction methods, see Fulcher, 2018). Specifically, instead of considering all past PROST up to period t as a vector of length t, we use a single value, total PROST volume. In the context of functional regression, we regress this on the cumulative sales $Y_{j,T+h}$ for a new product j at time $T+h$, where h is the desired forecast horizon and T is the product release date. Note that by using cumulative sales, the last observation within a product life cycle automatically reflects the total sales (or market potential within the product adoption context; Bass, 1969). The model is:

$$Y_{j,T+h} = f_{h}(\mathrm{PROST}_{j,T-l}, \mathrm{ProdChar}_{j}) + \varepsilon_{j}, \tag{1}$$

where the target $Y_{j,T+h}$ for a given horizon h is a function of the variables of PROST up to l steps before release and product characteristics, and $\varepsilon_{j} \sim N(0, \sigma_{j}^{2})$. These features are reduced to scalars, as we discuss below. The function $f_{h}(\cdot)$ stands in to summarize the full model in Equation (6).

The most common ways researchers reduce the PRB time dimension are summing its volume over a certain period (e.g., Gelper et al., 2015; Tian et al., 2014; Wang et al., 2010) or capturing its adoption dynamics with diffusion model parameters (Kulkarni et al., 2012) and functional principal components (FPCs; Foutz & Jank, 2010; Hann et al., 2011; Xiong & Bharadwaj, 2014). Although Xiong and Bharadwaj (2014) and Foutz and Jank (2010) directly compared FPC against volume‐based PRB models, they did not include PRB valence; valence and volume are complementary, as they summarize PRB information in different ways. Since our goal is to maximize predictive power, we use all information types and let the model determine the influential ones. In addition to these measures, we also quantify the velocity and trend, which are defined below.

PROSTvolume

We define the per‐period PROST as $g_{j,t}$ and its cumulative as $G_{j,t}$. When measuring PROST volume, two relevant aspects researchers should consider are the number of periods over which PROST is calculated and the time before release, that is, the window length w and lead time l to product release date T. We measure the volume as follows:

$$\mathrm{PROSTvolume}_{j} = \sum_{k=1}^{w} g_{j,T-l-k}. \tag{2}$$

Characterizing PROST by its volume has the disadvantage that any dynamics are lost. The same volume may follow very different trajectories, which can affect product adoption (see Xiong & Bharadwaj, 2014). The subsequent variables capture these dynamics in different ways.
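The volume measure can be sketched in a few lines. This is a minimal illustration, assuming the per‐period PROST of one product is stored in a list `g` indexed by week, with index `T` marking the release week; the function name and the toy numbers are hypothetical.

```python
def prost_volume(g, T, l, w):
    """Sum per-period PROST over the w periods that end l + 1 periods before release.

    Implements sum_{k=1}^{w} g[T - l - k]; g[T - l] itself is not included.
    """
    if T - l - w < 0:
        raise ValueError("window reaches before the start of the series")
    return sum(g[T - l - k] for k in range(1, w + 1))


# Toy series: release at index 7, lead time of 1 week, window of 3 weeks.
g = [1, 2, 3, 4, 5, 6, 7, 8]
volume = prost_volume(g, T=7, l=1, w=3)  # g[5] + g[4] + g[3] = 15
```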

PROSTtrend

One way to assess the dynamics of PROST is to fit a linear trend through the observed PROST up to the point $g_{j,T-l}$. Once this slope parameter is estimated, we can use it as the PROSTtrend measure.
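A least‐squares slope over equally spaced periods has a closed form, which the following sketch computes. It assumes the time index runs 0, 1, 2, and so on over the observed window; the function name is hypothetical.

```python
def prost_trend(values):
    """Ordinary least squares slope of values regressed on time index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    return sxy / sxx


slope = prost_trend([1, 3, 5, 7])  # perfectly linear series with slope 2
```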

PROSTvelocity

A simple way to measure the adoption speed is to count the number of periods it takes to reach a certain percentage of the PROST adoption over $n = 1, \dots, w$ periods:

$$\begin{aligned} r_{j,n} &= n \, \mathbb{1}\left\{\frac{g_{j,T-l-w+n}}{G_{j,T-l}} \geq \tau\right\}, \\ \mathrm{PROSTvelocity}_{j} &= (r_{j,n})_{1}^{+}, \end{aligned} \tag{3}$$

where τ is a threshold that can be any number between 0 and 1, and $\mathbb{1}\{\cdot\}$ is an indicator function that takes the value 1 when its condition is satisfied and zero otherwise. Thus $r_{j,n}$ is either 0 or the position n at which the condition is satisfied. For PROSTvelocity we then take the first nonzero element of $r_{j,n}$; the index n records the position in increasing order.
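A minimal sketch of the velocity measure follows. It assumes that the ratio is taken on the cumulative PROST within the window, which is one reading of "reach a certain percentage of the PROST adoption"; the function name and the toy window are hypothetical.

```python
def prost_velocity(window, tau):
    """Return the first 1-based period n at which the cumulative share reaches tau.

    window: per-period PROST for the w periods before T - l, oldest first.
    Assumption: the share is cumulative PROST divided by the window total.
    """
    total = sum(window)
    if total <= 0:
        raise ValueError("window contains no search traffic")
    running = 0.0
    for n, value in enumerate(window, start=1):
        running += value
        if running / total >= tau:
            return n
    return len(window)  # safeguard against floating-point shortfall


window = [1, 1, 2, 4, 8]  # total of 16
quartiles = [prost_velocity(window, tau) for tau in (0.25, 0.5, 0.75)]
```

Because buzz in this toy window accelerates toward release, the thresholds are reached late, at periods 3, 4, and 5.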

PROSTadoption

We parameterize the increasing buzz through diffusion curves (e.g., Hann et al., 2011; Kulkarni et al., 2012). The best‐known diffusion curve is Bass (1969), which describes the word‐of‐mouth process through innovators and imitators, captured by coefficients p and q, respectively. First, we fit a Bass curve to each product's $G_{j,t}$:

$$G_{j,t} = \frac{1 - \exp(-(p_{j} + q_{j})t)}{1 + \frac{q_{j}}{p_{j}} \exp(-(p_{j} + q_{j})t)} + \varepsilon_{j,t}. \tag{4}$$

Then, we use the estimated $\hat{p}_{j}$ and $\hat{q}_{j}$ coefficients as our measure to describe PROSTadoption:

$$\mathrm{PROSTadoption}_{j} = (\hat{p}_{j}, \hat{q}_{j}). \tag{5}$$

We experimented with other diffusion curves, namely, the Weibull, recommended by the literature for modeling buzz adoption (Kulkarni et al., 2012), and the Gompertz curve with its Gamma‐shifted variant (Bemmaor, 1994) that has been shown to have good forecasting performance (Meade & Islam, 2006). However, compared to the Bass curve, those had an overall worse fit on PROST or required an additional model parameter without significant increases in accuracy. More details on their performances are provided in the Supporting Information.
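To make the role of p and q concrete, the sketch below evaluates the cumulative Bass curve and recovers the coefficients by a coarse grid search on squared error. The grid search is a deliberately simple stand‐in chosen for illustration, not the estimation procedure of the study; the function names, the grids, and the synthetic curve are hypothetical.

```python
import math


def bass_cdf(t, p, q):
    """Cumulative Bass adoption share at time t for innovation p and imitation q."""
    e = math.exp(-(p + q) * t)
    return (1.0 - e) / (1.0 + (q / p) * e)


def fit_bass_grid(cumulative_share, p_grid, q_grid):
    """Pick the (p, q) pair on the grid with the smallest sum of squared errors.

    cumulative_share[i] is assumed to be the observed share at period i + 1.
    """
    best = None
    for p in p_grid:
        for q in q_grid:
            sse = sum(
                (y - bass_cdf(t, p, q)) ** 2
                for t, y in enumerate(cumulative_share, start=1)
            )
            if best is None or sse < best[0]:
                best = (sse, p, q)
    return best[1], best[2]


# Synthetic curve generated from known coefficients, then recovered from the grid.
shares = [bass_cdf(t, 0.03, 0.4) for t in range(1, 21)]
p_hat, q_hat = fit_bass_grid(shares, [0.01, 0.03, 0.05], [0.2, 0.4, 0.6])
```

A larger q relative to p produces the slow start and late acceleration that is typical of buzz building toward a release date.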

PROST FPCs

FPC analysis has gained popularity for predicting new product adoption (e.g., Fan‐Osuala et al., 2018; Sood et al., 2009). Studies that use FPC for forecasting with PRB report better predictive performance against diffusion curves (Hann et al., 2011) and volume‐based models (Foutz & Jank, 2010; Xiong & Bharadwaj, 2014). The idea is to characterize and identify unique shapes of all PROST adoption curves through a principal components analysis. It is recommended to first reduce the noise by smoothing the raw shape by using smoothing splines (Ramsay & Silverman, 2005).

We decompose the smoothed curves into principal components and use the resulting values as our PROSTfpc measure, that is, we include for each product a vector of individual principal component scores. FPC requires that all PROST curves are of the same length and may therefore trim some data; however, the FPC will still take dynamics, such as slow adoption, into account.
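For orientation, the scores can be read against the standard functional principal component (Karhunen-Loève) expansion. This is the textbook form and not necessarily the exact estimator used in the study; the symbols $\tilde{g}_{j}$, $\mu$, $\phi_{e}$, and $\xi_{j,e}$ are introduced here only for illustration.

```latex
\tilde{g}_{j}(t) \approx \mu(t) + \sum_{e=1}^{E} \xi_{j,e}\,\phi_{e}(t),
\qquad
\xi_{j,e} = \int \bigl(\tilde{g}_{j}(t) - \mu(t)\bigr)\,\phi_{e}(t)\,dt
```

Here $\tilde{g}_{j}$ denotes the smoothed PROST curve of product j, $\mu$ the mean curve across products, and $\phi_{e}$ the e‐th eigenfunction. Under this reading, the score $\xi_{j,e}$ is the e‐th entry of the PROSTfpc vector for product j.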

Product characteristics

In addition to PROST information, we consider product characteristics related to competitors' products that are available before release. Since we aim to evaluate our research questions using video game sales data, we consider information that is typically available for entertainment products. These include information such as the Genre (PCTgenre; Kim & Hanssens, 2017; Xiong & Bharadwaj, 2014), the Sequel number (PCTsequel; Craig et al., 2015; Foutz & Jank, 2010; Hann et al., 2011; Liu, 2006; Xiong & Bharadwaj, 2014), and information about the Release time (PCTrelease; Kim & Hanssens, 2017; Xiong & Bharadwaj, 2014). In this research, we focus on freely available sources, and therefore, although extant studies have included marketing information such as ad spend (e.g., Kim & Hanssens, 2017; Xiong & Bharadwaj, 2014) or competition (e.g., Gopinath et al., 2013; Kim et al., 2017), we exclude these variables because only companies with marketing intelligence databases may access this information. These variables are also at an aggregation level that limits usefulness for predicting individual product launches.

The full model in Equation (1) can be written as:

$$\begin{aligned} Y_{j,T+h} ={} & \alpha_{0} + \alpha_{1}\,\mathrm{PROSTvolume}_{j} + \alpha_{2}\,\mathrm{PROSTtrend}_{j} \\ & + \sum_{m=1}^{M} \alpha_{m+2}\,\mathrm{PROSTvelocity}_{j,m} \\ & + \sum_{d=1}^{2} \alpha_{M+d+2}\,\mathrm{PROSTadoption}_{j,d} \\ & + \sum_{e=1}^{E} \alpha_{M+e+4}\,\mathrm{PROSTfpc}_{j,e} \\ & + \sum_{z=1}^{Z} \alpha_{M+E+z+4}\,\mathrm{PCTgenre}_{j,z} \\ & + \alpha_{M+E+Z+5}\,\mathrm{PCTrelease}_{j} \\ & + \alpha_{M+E+Z+6}\,\mathrm{PCTsequel}_{j} + \varepsilon_{j}, \end{aligned} \tag{6}$$

for $j = 1$ to I for intrabrand estimation and $j = 1$ to X for cross‐brand estimation, where M, E, and Z denote the total number of variables within each variable category, as defined in the preceding sections. M and E are defined in Section 4.2.1, which provides the specifics on PROST data for our evaluation. In Section 4.2.2 we detail Z, which specifies the product characteristics. PROSTadoption has two coefficients, see Equation (5). We use the logarithmic version of all raw inputs (sales, PROST, and product characteristics) as suggested by Schaer et al. (2019b).
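The coefficient indexing of the full model can be checked mechanically. The sketch below lays out the regressors in the order of the model equation and maps each to its coefficient index; the function and variable names are hypothetical, and the example uses the values M = 4, E = 4, and Z = 11 reported in the feature estimation section.

```python
def coefficient_layout(M, E, Z):
    """Map each regressor name to its alpha index, in the order of the full model."""
    names = ["intercept", "PROSTvolume", "PROSTtrend"]
    names += [f"PROSTvelocity_{m}" for m in range(1, M + 1)]
    names += [f"PROSTadoption_{d}" for d in (1, 2)]  # Bass p and q
    names += [f"PROSTfpc_{e}" for e in range(1, E + 1)]
    names += [f"PCTgenre_{z}" for z in range(1, Z + 1)]
    names += ["PCTrelease", "PCTsequel"]
    return {name: index for index, name in enumerate(names)}


layout = coefficient_layout(M=4, E=4, Z=11)
n_coefficients = len(layout)  # M + E + Z + 7 = 26, including the intercept
```

With 26 coefficients, the model is sizeable relative to the number of titles that a single brand typically has available for training.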

Functional clustering

For our second forecasting approach, we use product segmentation to obtain and use homogeneous subgroups. We construct segments using clustering, which is a common approach to predict new product sales (recent examples include Baardman et al., 2017; Basallo‐Triana et al., 2017; Hu et al., 2019), though we are not aware of applications to PRB data. We use functional clustering where functions are used to reduce the time dimensionality, similarly to FDA (see Goia et al., 2010; Sood et al., 2009, for applications to short time series). We obtain k‐clusters and then train a multiclass classification algorithm to predict the corresponding cluster of a new product. The products in this cluster are then used to forecast sales.

By including additional postrelease information, we create richer classifier inputs, leading to better performance. For this study, we include product reviews (labelled as PCTreview; see Chintagunta & Lee, 2012; Dellarocas et al., 2007; Zhu & Zhang, 2010) and dynamics of sales to enrich our data set. Additionally, we include sales volume (SLSsales), the time it takes until a certain percentage of total sales is reached (SLSvelocity), and the Bass adoption curve parameters (SLSadoption), as introduced in Section 3.1. These additional inputs are only used for clustering. The classification algorithm only uses data that are available prelaunch, as in Equation (6).
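The two‐stage idea can be illustrated with a small sketch. It assumes k‐means for the clustering step, a nearest‐centroid rule on prelaunch features for the classification step, and the cluster's mean sales as the forecast; all three are illustrative choices and all names are hypothetical, since the text does not fix these details.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vec(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def kmeans(rows, k, iters=50):
    """Plain k-means with deterministic initialisation from the first k rows."""
    centroids = [list(r) for r in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(r, centroids[c])) for r in rows]
        updated = []
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            updated.append(mean_vec(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels


def two_stage_forecast(full_features, prelaunch_features, sales, new_prelaunch, k):
    """Cluster on all features, classify on prelaunch features, forecast by cluster mean."""
    labels = kmeans(full_features, k)
    pre_centroids, cluster_sales = {}, {}
    for c in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        pre_centroids[c] = mean_vec([prelaunch_features[i] for i in idx])
        cluster_sales[c] = sum(sales[i] for i in idx) / len(idx)
    cluster = min(pre_centroids, key=lambda c: sq_dist(new_prelaunch, pre_centroids[c]))
    return cluster, cluster_sales[cluster]


# Toy data: the second column stands in for a postrelease input used only for clustering.
full = [[1.0, 1.0], [1.2, 0.9], [8.0, 8.0], [8.1, 7.9]]
pre = [[1.0], [1.2], [8.0], [8.1]]
cluster, forecast = two_stage_forecast(full, pre, [10, 12, 100, 110], [7.5], k=2)
```

The new product is assigned using only its prelaunch feature, which mirrors the constraint that postrelease inputs are unavailable at forecast time.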

PREDICTING THE SUCCESS OF NEW VIDEO GAMES RELEASES

We empirically investigate the value of PROST (Research Question 1), the comparison of cross‐brand and intrabrand models (Research Question 2), and the value of predicting by brands (Research Question 3) in the video game industry, a highly competitive multibillion dollar market (Tripp et al., 2020). It is common for consumers to actively discuss prerelease games on online platforms, such as blogs or social media websites. This is further fueled by advertising activities (Marchand & Hennig‐Thurau, 2013). Furthermore, the relative homogeneity of products and short life cycles is helpful for our research. Although the literature on PROST predominantly focuses on forecasting opening sales, Schaer et al. (2019b) showed that Google Trends contains valuable information to predict complete life cycle sales. Because one goal of CI is to meet a long‐term strategic focus, we evaluate both horizons, opening and total sales achieved by end‐of‐life (EoL).

Data

Our data set consists of weekly physical video game sales data from VGChartz (http://www.vgchartz.com; also used, for example, by Marchand et al., 2017; Xiong & Bharadwaj, 2014). We aggregate sales across console platforms for 240 games that represent 23 well‐known brands such as EA, Ubisoft, and Take‐Two. Brands have on average 10.4 games (median 4), ranging from 1 to 61 titles.

In some instances, the sales history of a game spans several years, with its tail only capturing a few sales. In these cases, we truncate the sales time series when the growth rate of the cumulative sales becomes less than 0.05% per week. With this treatment, EoL is reached on average 40 weeks after launch. This is considered our total sales target.
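The truncation rule can be sketched as below. It assumes the growth rate is the week's sales divided by cumulative sales up to the previous week, and that the first week falling below the threshold is itself dropped; both are assumptions, and the function name is hypothetical.

```python
def truncate_at_eol(weekly_sales, min_growth=0.0005):
    """Cut the series at the first week whose cumulative growth is below 0.05%."""
    cumulative = 0.0
    for week, sales in enumerate(weekly_sales):
        previous = cumulative
        cumulative += sales
        if previous > 0 and sales / previous < min_growth:
            return list(weekly_sales[:week])
    return list(weekly_sales)


# Week 5 adds 0.5 units on top of 1601 already sold (about 0.03%), so it is cut.
kept = truncate_at_eol([1000, 500, 100, 1, 0.5, 0.1])
```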

To represent PROST, we collect information from Google Trends for each video game in our sample. Figure 3 illustrates the available signal before release and the subsequent adoption of sales. For each game, we download its topic popularity, using the Google Knowledge Graph entity. This method combines linguistic and semantic–related keywords into one search query, which leads to a more robust search traffic coverage (see discussion by Schaer et al., 2019a; Siliverstovs & Wochner, 2018). If there is no Google Knowledge Graph entity, then we use the video game title as the keyword.

FIGURE 3

Typical video game sales pattern and the prerelease buzz of search traffic

Since Google Trends is peak scaled (values between 0 and 100), it makes individual data queries noncomparable. Although Google allows retrieving search popularity for up to five keywords per request, downloading multiple keywords becomes complicated, as new keywords might have a higher volume, which requires rescaling. To overcome this, we use the same scaling procedure as proposed by Schaer et al. (2019b). They use the neutral scaling keyword “marker” to scale each video game's popularity accordingly.
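The idea of anchoring each query to a common keyword can be sketched as follows. This assumes that every request returns the game series together with the marker series on the same 0 to 100 scale, and that dividing by the marker's mean level makes games comparable; it illustrates the general principle only and is not the exact procedure of Schaer et al. (2019b). The function name and the numbers are hypothetical.

```python
def rescale_with_marker(game_series, marker_series):
    """Express a game's search popularity in units of the marker keyword's mean level."""
    marker_level = sum(marker_series) / len(marker_series)
    if marker_level <= 0:
        raise ValueError("marker keyword has no measurable traffic in this request")
    return [value / marker_level for value in game_series]


# Game A peaks at 100 while the marker sits at 10: A is far more popular than the marker.
game_a = rescale_with_marker([100, 50], [10, 10])
# Game B peaks at only 20 while the marker sits at 100: B is far less popular.
game_b = rescale_with_marker([20, 10], [100, 100])
```

After rescaling, both games are expressed relative to the same reference, so their volumes can be compared even though they came from separate requests.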

Feature estimation

PROST data

Several scholars have shown that the predictive power of buzz increases toward release (e.g., Kim & Hanssens, 2017; Xiong & Bharadwaj, 2014). Therefore, the decision lead time becomes a trade‐off between maximizing forecast performance and the management's operational requirements. For this experiment, we investigate lead times l of 1, 4, and 8 weeks. For a further discussion about the lead time properties of PROST, and its application to video games, we refer the reader to Schaer et al. (2019b) and Xiong and Bharadwaj (2014).

For PROSTfpc we set the window length w to 26 weeks (182 days) similar to Xiong and Bharadwaj (2014). For all other measures, we use a flexible window length w for each product, dependent on when search traffic becomes available. To avoid any spurious start of PROST, we limit the maximum window length to 40 weeks and require two consecutive observations. The week when PROST becomes available marks our 0% entry for PROSTvelocity, with further inputs measured at 25%, 50%, and 75% of the observed PROST adoption.

All analysis and model estimation is carried out in the statistical programming language R (R Core Team, 2019). We estimate the PROSTtrend coefficient with ordinary least squares. The PROSTadoption coefficients for the Bass curve are estimated using nonlinear least squares on the per‐period adoption with the Hooke–Jeeves optimization algorithm, as implemented in the diffusion package for R (Schaer & Kourentzes, 2021). All PROST curves are smoothed with b‐splines, using the Akaike information criterion to determine the smoothing parameter λ, as implemented in the cobs package for R (Ng & Maechler, 2020). For PROSTfpc we include the first four PCs, as this provided the best predictive performance (similar methodology to Hann et al., 2011; Xiong & Bharadwaj, 2014). For Equation (6), $M = 4$ and $E = 4$.

Product characteristics data

We encode the video games for genre (PCTgenre; $Z = 11$ in Equation (6)) and November release (PCTrelease), since November is the month with the most releases. We experimented with encoding months individually, but this led to worse predictive performance. For each video game, we also indicate the sequel number (PCTsequel). For the segmentation, we also include the review scores (PCTreviews) from MetaCritic and IGN, available in a Kaggle data set repository.

Sales data

Similarly to the PROSTvelocity defined in Equation (3), we measure the SLSvelocity of sales as the number of weeks it takes to reach 25%, 50%, 75%, and 95% of the overall adoption. We use the diffusion package (Schaer & Kourentzes, 2021) in R to estimate the parameters of the Bass curve for the inputs of SLSadoption. The last two inputs included in the model are the opening week and total sales (SLSsales).
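The velocity inputs can be sketched as follows. Equation (3) is not reproduced in this section, so the sketch relies only on the verbal definition above: the number of weeks until the cumulative series first reaches a given share of its overall total. The function name and the example series are hypothetical.

```python
def adoption_velocity(weekly_values, shares=(0.25, 0.50, 0.75, 0.95)):
    """Weeks needed until cumulative adoption first reaches each share.

    Weeks are counted from 1; the overall adoption is taken as the sum of
    the supplied series (an assumption of this sketch).
    """
    total = float(sum(weekly_values))
    if total <= 0:
        raise ValueError("series must contain positive adoption")
    targets = sorted(shares)
    weeks_to_share = {}
    cumulative = 0.0
    next_target = 0
    for week, value in enumerate(weekly_values, start=1):
        cumulative += value
        # Several thresholds can be crossed within the same week.
        while next_target < len(targets) and cumulative >= targets[next_target] * total:
            weeks_to_share[targets[next_target]] = week
            next_target += 1
    return weeks_to_share


# Hypothetical weekly sales of one title.
velocity = adoption_velocity([40, 30, 15, 12, 2, 1])
```

The same function could be applied to a search traffic series with the shares 25%, 50%, and 75% to mimic the PROSTvelocity inputs.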

Predictive algorithm

Functional regression

To estimate the market potential of competitors' new products with PROST information, compressed into scalars, we use Random Forest (RF) (Breiman, 2001). This machine learning method is an ensemble technique that uses bootstrapping to build a large number of decision trees and then aggregates their predictions, averaging them for regression and taking the majority vote for classification. We opt for RF instead of linear regression, as the former can capture flexible relationships between the inputs and the target beyond simple linear ones. An additional motivation to use RF is that it performs well in both regression and classification, simplifying modeling. The RF algorithm is available through the caret package for R (Kuhn et al., 2021).

We train a PROST model for both the intrabrand and cross‐brand scenarios, labelled hereafter as iRgPROST and xRgPROST, respectively. In the cross‐brand case, we use a 10‐fold cross‐validation approach and tune the number of variables sampled at each split via a grid search tracking the root mean squared error (RMSE). The restrictive intrabrand sample size requires the use of leave‐one‐out cross‐validation. We considered alternative algorithms, such as gradient boosting in the form of XGBoost (Chen & Guestrin, 2016), and sparse regression in the form of Ridge and LASSO (Hastie et al., 2015). However, the RF results consistently performed best. We note that Ridge and LASSO are strictly linear models, while XGBoost is more sensitive to hyperparameter tuning than RF. The detailed results are available in the Supporting Information.

Segmentation

Since our data set contains both continuous and categorical data, we use the Gower similarity coefficient to create the distance matrix (Gower, 1971). We avoid transforming categories into binary variables as this leads to information loss (Xu & Wunsch, 2009). For the clustering, we use the Partitioning Around Medoids algorithm, as suggested by Kaufman and Rousseeuw (2005) and implemented in the cluster package for R (Maechler et al., 2018). In contrast to k‐means, k‐medoids is not dependent on having squared Euclidean distances and is suitable to use with the Gower distance (Hastie et al., 2008). Note that the segmentation uses the same data for training and prediction as xRgPROST. As mentioned in Section 3.2, the postrelease information for the distance matrix creation is only based on the training sample.
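For readers unfamiliar with the Gower coefficient, the following sketch shows how a distance between two games with mixed features could be computed. It uses the standard definition (range‐normalized absolute difference for numeric features, simple mismatch for categorical features, averaged with equal weights); the feature names, ranges, and values are hypothetical and do not come from the data set.

```python
def gower_distance(a, b, numeric_ranges):
    """Gower distance between two records given as feature -> value dicts.

    numeric_ranges maps each numeric feature to its observed range
    (max - min) in the training sample; all other features are treated
    as categorical. Equal feature weights are an assumption of this sketch.
    """
    total = 0.0
    for feature in a:
        if feature in numeric_ranges:
            value_range = numeric_ranges[feature]
            # Numeric: absolute difference scaled to [0, 1] by the range.
            total += abs(a[feature] - b[feature]) / value_range if value_range > 0 else 0.0
        else:
            # Categorical: 0 if the categories match, 1 otherwise.
            total += 0.0 if a[feature] == b[feature] else 1.0
    return total / len(a)


# Hypothetical games described by one categorical and two numeric features.
game_a = {"genre": "Sports", "sequel": 3, "opening_sales": 1.0}
game_b = {"genre": "Shooter", "sequel": 1, "opening_sales": 0.5}
ranges = {"sequel": 4, "opening_sales": 2.0}
distance = gower_distance(game_a, game_b, ranges)
```

Because categorical features enter through a mismatch indicator, no binary encoding of genre is needed before computing the distance matrix.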

Once the clusters are determined, the second step is to train a classifier that can allocate new prelaunch information of a new product to a cluster. The restricted sample size within clusters makes it challenging to run the proposed regression on clusters. Instead, we directly predict total sales $\hat{Y}_{j, T + h}$ from the median of all video games sales observed at $T + h$ within a cluster. Note that in the case of $h = EoL$, we take the last available observation, as described in Section 4.1.
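The forecasting step of the segmentation approach can be sketched in a few lines. The sketch assumes that cluster labels for the training games and a predicted cluster for the new title are already available (the clustering and the classifier themselves are not shown); the function name and the sales figures are hypothetical.

```python
from statistics import median


def cluster_median_forecast(train_sales, train_clusters, predicted_cluster):
    """Forecast sales of a new title as the median sales of its cluster.

    train_sales and train_clusters are aligned lists for the training games;
    predicted_cluster is the label assigned to the new title by a classifier.
    """
    members = [sales for sales, cluster in zip(train_sales, train_clusters)
               if cluster == predicted_cluster]
    if not members:
        raise ValueError("no training games in the predicted cluster")
    return median(members)


# Hypothetical sales (in millions of units) and cluster labels.
sales = [1.2, 0.8, 3.0, 2.5, 0.9]
clusters = [0, 0, 1, 1, 0]
forecast_cluster_0 = cluster_median_forecast(sales, clusters, 0)
forecast_cluster_1 = cluster_median_forecast(sales, clusters, 1)
```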

There are a variety of measures that help in selecting the optimal number of clusters, such as the Gap statistic (Tibshirani et al., 2001) or the Jump method (Sugar & James, 2003). While these measures rely solely on cluster characteristics, our two‐stage process of clustering and forecasting has the advantage that it allows measuring the clustering quality directly on the target variable. More specifically, we select the number of clusters with the smallest mean squared error (MSE) on sales using 10‐fold cross‐validation, considering up to 30 clusters. We refer to this model as xClPROST.

Benchmark models

To assess the predictive value of PROST, we introduce two types of benchmarks that use both intra‐ and cross‐brand data. The first is based on regression that uses no PROST information and draws upon observed sales and product characteristics (iRgPCT & xRgPCT). The iRgPCT model reflects how companies would forecast competitors' sales without PROST information available. The second is a naïve model where we calculate the median of the entire training sample (xMdSLS & iMdSLS). Despite being trivial to implement, such parsimonious benchmarks are good forecasting practice and often hard to beat (Ord et al., 2017). For convenience, we summarize all included features of the different forecasting models in Table 2. The first two columns list and describe the input features. The other columns indicate their inclusion in the different forecasting models.

TABLE 2

Overview of features included for different prediction models

		PROST models			Benchmark models
Feature	Input	RgPROST	ClPROST (Clust.)	ClPROST (Class.)	RgPCT	MdSLS
PROST data based
PROSTadoption	Bass param.	x	x	x	‐	‐
PROSTfpc	FPC scores	x	x	x	‐	‐
PROSTtrend	Trend slope	x	x	x	‐	‐
PROSTvelocity	Time to % adop.	x	x	x	‐	‐
PROSTvolume	Total sum	x	x	x	‐	‐
Product characteristics data based
PCTgenre	Genre cat.	x	x	x	x	‐
PCTrelease	Release month	x	x	x	x	‐
PCTreviews	Review scores	‐	x	‐	‐	‐
PCTsequel	Sequel #	x	x	x	x	‐
Sales data based
SLSadoption	Bass param.	‐	x	‐	‐	‐
SLSsales	Observed sales	‐	x	‐	‐	x
SLSvelocity	Time to % adop.	‐	x	‐	‐	‐

Performance evaluation

Our research design is based on scenarios (ii) and (iii) with intra‐ and cross‐brand estimation, as illustrated in Figure 1. More specifically, we generate individual product forecasts $\hat{Y}_{j, T + h}$ for competitors' titles j. We can construct a total of 23 different out‐of‐sample sets of various sizes. Each out‐of‐sample set is individually predicted by the model introduced in Equation (6), either based on the full training sample (cross‐brand case) or as an individual brand (intrabrand case). For the latter, we only consider brands with at least six games, as the estimation becomes very unreliable with fewer data. This allows testing the intrabrand case for 10 brands, as listed in Table 3.

TABLE 3

Brands represented within the training and test samples

Training & testing		Testing only
Brand	# Games	Brand	# Games
Capcom	16	Level 5	1
Nintendo	18	Codemasters	2
Ubisoft	30	Bethesda Softworks	3
Electronic Arts	61	Eidos Interactive	1
Take‐Two Interactive	24	Square Enix	1
Activision	23	Valve	1
Microsoft Game Studios	13	Spike	2
Sega	8	Konami Digital Entertainment	6
Sony	7	MTV Games	1
THQ	14	Deep Silver	1
		Namco Bandai Games	2
		From Software	1
		WB Games	4

We have a total of 2186 out‐of‐sample predictions for each horizon, from different training samples for each of the previously outlined models. Note that we retrain the models for every forecast horizon. We measure the forecast accuracy using the geometric mean relative absolute error (GMRAE) (Armstrong & Collopy, 1992):

$$\begin{aligned} \mathrm{GMRAE}_{i, T + h} &= \sqrt[n]{\prod_{i = 1}^{n} \left(\frac{\mathrm{AE}_{i, r}}{\mathrm{AE}_{i, b}}\right)}, \\ \mathrm{AE}_{i, T + h} &= |\hat{y}_{i, T + h} - y_{i, T + h}|, \end{aligned}$$

where r is the candidate model and b is a benchmark forecast, which is the intrabrand model without PROST information (iRgPCT), for each series i and horizon $h \in \{1, EoL\}$. The GMRAE is an intuitive scale‐independent error metric, with errors smaller than 1 meaning that the candidate model outperforms the selected benchmark by (1 − GMRAE) × 100% (Ord et al., 2017). The favorable statistical properties in terms of asymmetry and robustness make the GMRAE ideal for new product forecasts, as these often observe large errors. Metrics based on percentages or squared errors will be influenced by the direction of the error or outliers, respectively (Armstrong & Collopy, 1992; Fildes, 1992).
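As a small numerical illustration of the metric, the sketch below computes the GMRAE for a candidate and a benchmark forecast. The geometric mean is evaluated on the log scale for numerical stability, and series with a zero absolute error are rejected, which is an assumption of this sketch rather than a rule stated in the text. All numbers are hypothetical.

```python
import math


def gmrae(actuals, candidate, benchmark):
    """Geometric mean of the ratios AE_candidate / AE_benchmark across series."""
    log_sum = 0.0
    for y, y_hat_r, y_hat_b in zip(actuals, candidate, benchmark):
        ae_r = abs(y_hat_r - y)
        ae_b = abs(y_hat_b - y)
        if ae_r == 0 or ae_b == 0:
            # A zero error makes the ratio 0 or undefined; handling is left open.
            raise ValueError("zero absolute error is not handled in this sketch")
        log_sum += math.log(ae_r / ae_b)
    return math.exp(log_sum / len(actuals))


# Hypothetical sales and forecasts for two titles.
score = gmrae(actuals=[100, 200], candidate=[110, 180], benchmark=[120, 160])
improvement_pct = (1 - score) * 100
```

Here the candidate halves the absolute error of the benchmark on both titles, so the GMRAE is 0.5 and the improvement is 50%.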

Results

Table 4 presents the predictive performance of the different forecasting models against the benchmark iRgPCT for the opening week ($h = 1$) and total sales ($h = EoL$) forecasts. The top three rows show the intrabrand models, while the bottom four rows highlight the cross‐brand ones. The best performing models for each category are highlighted in bold. In both intrabrand and cross‐brand scenarios, the PROST‐augmented model performs substantially better than the benchmark. This finding is consistent across the different lead time (l) scenarios of 1, 4, and 8 weeks and there is only a marginal drop in performance.

TABLE 4

GMRAE performance of models against iRgPCT (product characteristics only)

Model type			$h = 1$			$h = E o L$
s	e	v	$l = 1$	$l = 4$	$l = 8$	$l = 1$	$l = 4$	$l = 8$
i	Rg	PROST	0.795	0.790	0.812	0.795	0.824	0.825
i	Rg	PCT	1.000	1.000	1.000	1.000	1.000	1.000
i	Md	SLS	1.099	1.099	1.099	1.008	1.008	1.008
x	Rg	PROST	0.601	0.663	0.655	0.598	0.614	0.613
x	Cl	PROST	1.033	1.010	1.589	0.897	0.865	0.989
x	Rg	PCT	0.906	0.906	0.906	0.849	0.849	0.849
x	Md	SLS	0.945	0.945	0.945	0.854	0.854	0.854

Note: Column s describes the data sample being intra‐ (i) or cross‐brand (x) based. Column e indicates the estimation method: regression (Rg), segmentation (Cl), and medians (Md). Column v notes the inputs of the model using PROST or only product characteristics (PCT) and sales (SLS). Column h indicates the forecast horizon and l the lead time.

Within the intrabrand model category, iRgPROST outperforms iRgPCT on average by nearly 20% for $h = 1$ and 15% for $h = EoL$ across lead times. iMdSLS is outperformed by more than 28% for the opening and 19% for the total sales. This illustrates that PROST offers more predictive value for competitor products than competitor product characteristics and internal sales data alone. To further confirm this finding, we test whether differences between models are significant and not due to randomness. We use a nonparametric Friedman test followed by post hoc Nemenyi tests, as GMRAE errors are nonnormally distributed (Hollander et al., 2014), as implemented in the tsutils v.0.9.0 package for R (Kourentzes, 2019). The Friedman test results in a p‐value of less than 0.001 for both forecast horizons. This indicates that at least one set of results is statistically different. We then proceed with the post hoc Nemenyi test to identify subgroups. Figure 4 shows the mean ranks of the different models with the whisker being the critical Nemenyi distance at 0.193 for the lead time of 1 week. Two models are considered statistically different if their intervals do not overlap. We highlight the iRgPROST model with a gray bar, indicating that for both forecast horizons, it significantly outperforms the intrabrand benchmarks and confirming Research Question 1. Those findings are consistent across lead times except for $l = 8$, where the xClPROST is no longer statistically different from iRgPROST.
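The reported critical distance can be checked with the standard Nemenyi formula, $CD = q_{\alpha} \sqrt{k(k + 1) / (6N)}$. The sketch below assumes that all seven models in Table 4 are ranked jointly over the 2186 out‐of‐sample predictions and uses the commonly tabulated critical values for the 5% level; whether tsutils computes the distance in exactly this way is not confirmed by the text.

```python
import math

# Commonly tabulated Nemenyi critical values q_alpha at the 5% level
# (Studentized range statistic divided by sqrt(2)), indexed by number of models.
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850, 7: 2.949}


def nemenyi_critical_distance(n_models, n_observations, q_table=Q_ALPHA_005):
    """Critical difference in mean ranks for the post hoc Nemenyi test."""
    q_alpha = q_table[n_models]
    return q_alpha * math.sqrt(n_models * (n_models + 1) / (6.0 * n_observations))


# Assumed setting: 7 models, 2186 out-of-sample predictions.
critical_distance = nemenyi_critical_distance(7, 2186)
```

Under these assumptions the result rounds to 0.193, in line with the value shown for the whiskers in Figure 4.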

FIGURE 4

Nemenyi test results at 5% significance level for $l = 1$

Compared to cross‐brand data, iRgPROST provides better forecast accuracy than xRgPCT and xMdSLS, even though the improvements are less impressive than those over the intrabrand benchmarks, with gains of at least 6% and 1.7% for the opening and EoL horizon, respectively. In the latter case, the differences are no longer statistically significant. If we compare the RgPROST‐based models, we see that the cross‐brand model significantly outperforms its intrabrand counterpart by more than 19%. It appears that the richer cross‐brand training can generate substantially more accurate predictions than just intrabrand PROST, which confirms Research Question 2. However, this does not hold for the clustering approach, which falls significantly short against both regression PROST models. This again supports our Research Question 3. We attribute this to the relative homogeneity of the video games sector.

In addition to the relative error performance, the first two columns in Table 5 show the percentage of cases in which a model performs best, and the last two columns show the percentage of cases in which each model outperforms the iRgPCT benchmark. Within the intrabrand category, iRgPROST is most often the best choice over any of its category benchmarks. When compared to cross‐brand candidates, iRgPROST remains the model that most often achieves the second best forecast accuracy. The drop between the 1‐ and 8‐week lead times is only 1% for iRgPROST. Finally, the share of cases in which iRgPROST outperforms iRgPCT is the highest within its category and is similar to that of the cross‐brand models.

TABLE 5

Percentage best overall (per group) and percentage better than iRgPCT at $l = 1$

Model type			% best overall (% best in group)		% better than benchmark
s	e	v	$h = 1$	$h = E o L$	$h = 1$	$h = E o L$
i	Rg	PROST	13% (45%)	14% (45%)	66%	58%
i	Rg	PCT	12% (28%)	10% (28%)	0%	0%
i	Md	SLS	7% (27%)	10% (27%)	50%	48%
x	Rg	PROST	34% (46%)	30% (46%)	68%	67%
x	Cl	PROST	10% (20%)	14% (20%)	51%	54%
x	Rg	PCT	12% (16%)	12% (16%)	57%	59%
x	Md	SLS	9% (17%)	10% (17%)	57%	59%

Note: Column s describes the data sample being intra‐ (i) or cross‐brand (x) based. Column e indicates the estimation method: regression (Rg), segmentation (Cl), and medians (Md). Column v notes the inputs of the model using prerelease online search traffic (PROST) or only product characteristics (PCT) and sales (SLS).
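The "% better than benchmark" columns can be read as a win rate over individual forecasts. The sketch below shows one plausible way to compute such a share; it assumes that a model counts as better only when its absolute error is strictly smaller than that of the benchmark, which the text does not specify. The function name and the numbers are hypothetical.

```python
def percent_better(actuals, candidate, benchmark):
    """Share (in %) of series where the candidate's absolute error is smaller.

    Ties are counted as "not better" (an assumption of this sketch).
    """
    wins = sum(
        1
        for y, y_hat_r, y_hat_b in zip(actuals, candidate, benchmark)
        if abs(y_hat_r - y) < abs(y_hat_b - y)
    )
    return 100.0 * wins / len(actuals)


# Hypothetical sales and forecasts for four titles.
share = percent_better(
    actuals=[100, 200, 300, 400],
    candidate=[110, 180, 330, 400],
    benchmark=[120, 190, 310, 420],
)
```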

We include multiple inputs into the regression model and let it flexibly identify the most influential ones. We tested more restricted models, with prefiltering variables based on their PROST feature category, for example, only using inputs from PROSTvelocity, PROSTfpc. We find that all of them provide predictive value, and, in some cases, they can provide slightly better performance. However, the approach that includes all variables has the overall most stable performance and greatly simplifies the modeling process. More details are available in the Supporting Information. The results are also consistent for MAPE and MAE but offer little interpretability on cumulative data.

DISCUSSION

In this study, we set out three research questions to test the suitability of PROST for predicting the success of competitors' new products. We propose a model that is able to generate sales forecasts and empirically test its performance using global video games sales. We find support for Research Question 1; PROST significantly improves competitors' new product predictions over models that use only internal sales and product characteristics. The literature review highlighted that very little research is concerned with using competitor user‐generated content, like PROST, in a predictive context. Our findings suggest that companies may find great value in linking their internal sales data with competitor‐related online content.

Beyond the proposed competitor PROST model, in our third question, we look at an alternative that can consider potential heterogeneity in a market using segmentation. Our findings indicate that the video games market is fairly homogeneous, and although this model can outperform benchmarks, it is inferior to the regression‐based model. This supports Research Question 3: Separating by brands is better than product segmentation in the case of video games sales. Nevertheless, this insight may not hold in other markets, and further research could reinforce this work with additional empirical evidence. Researchers may pay particular attention to brands that observe a distinctive reputation in terms of product quality or associated lifestyle. For example, Apple products might be perceived differently in terms of PRB and actual product adoption compared to less dominant tech brands.

Arguably, our benchmark model iRgPCT is limited by not considering all possible product categories. We argue that any cross‐brand data set is based on information that is not readily available to all organizations. With that in mind, the performance against xRgPCT is useful to assess whether PROST can overcome those restrictions, but is otherwise not practical. We find that using PROST outperforms xRgPCT on opening sales and matches its performance on total sales, demonstrating the efficacy of the proposed model. All models, however, fall short against the xRgPROST model. This is expected, as it can draw from a much richer data set than the intrabrand models, and it confirms Research Question 2. Nonetheless, the practical usefulness of xRgPROST is questionable, and it exemplifies the gap between research and practice.

One interesting aspect to investigate further is the performance at the brand level. Table 6 reports the GMRAE of the iRgPROST model against individual benchmark models. The results show that iRgPROST outperforms iRgPCT, except in the case of Nintendo for $h = 1$ and Sega when $h = EoL$. In most cases, iRgPROST also outperforms xRgPCT but trails against the xRgPROST. Upon visual examination across all brands, we cannot identify a systematic trend that would indicate that intrabrand errors decrease when sample sizes increase (see the Supporting Information).

TABLE 6

GMRAE performance of iRgPROST against various benchmarks at the brand level

	$h = 1$			$h = E o L$
Brand	iRgPCT	xRgPCT	xRgPROST	iRgPCT	xRgPCT	xRgPROST
Activision (23)	0.596	0.802	1.185	0.706	1.363	1.966
Capcom (16)	0.782	0.787	1.149	0.748	0.724	0.991
Electronic Arts (61)	0.759	0.729	1.151	0.612	0.688	1.052
Microsoft Game Studios (13)	0.841	1.576	2.377	0.753	1.015	1.394
Nintendo (18)	1.007	0.845	1.229	0.755	0.994	1.319
Sega (8)	1.018	0.815	1.252	1.116	1.127	1.642
Sony Computer Ent. (7)	0.594	1.003	1.487	0.692	0.997	1.392
Take‐Two Interactive (24)	0.656	0.760	1.217	0.735	0.775	1.080
THQ (14)	0.956	0.898	1.346	0.944	0.852	1.264
Ubisoft (30)	0.870	0.739	1.109	0.969	0.951	1.405

Note: Cases where iRgPROST outperforms the benchmarks are highlighted in bold.

It is worth noting that the forecasting performance of iRgPROST is similar for the two different forecast horizons. Similar observations have been reported, where PROST not only provides value for predictions close to release, but also contains predictive information for long‐term forecasts (Schaer et al., 2019b). We also find that PROST is relatively consistent for the tested lead times, confirming findings by Schaer et al. (2019b) and Xiong and Bharadwaj (2014).

Managerial implications

Our study shows that PROST can provide valuable insights not only for a company's own product launches, but also for competitor launches. The lead time provided by PROST enables firms to counteract or support their own sales, for example, with marketing activities. As such, it can also provide insights when planning a product release, by monitoring competing products that might impact the release success. Moreover, the predicted market potential can be incorporated into market share models (e.g., Chen & Steckel, 2012; Du et al., 2007; Kumar et al., 2020; Luan & Sudhir, 2010; Zheng et al., 2012). Linking PROST to first week and life cycle sales provides a simple‐to‐interpret measure that allows contextualizing competitive pressures. Crayon SCIP (2022) indicates that competitors' revenue and sales are some of the most important key performance indicators for managers.

As Figure 3 illustrates, most video game sales are accrued in the first few weeks after launch. Therefore, having an accurate understanding of competitors' launch sales, either at the entry weeks or as cumulative to the EoL, is valuable information to manage competitive forces. A publisher can use these forecasts to inform their own launch decisions in order to either maximize sales, or use a game launch to increase competitive pressure. Similarly, this information can be paired with associated marketing instruments, such as pricing, advertising, or promotions, for both new and existing products. However, due to their short life cycles, the competition is often not on price but on quality and matching emerging consumer tastes. Therefore, research suggests that managing the release timing is more important (Calantone et al., 2010; Engelstätter & Ward, 2018). Another way to retain an active player base is to provide feature updates (Hyeong et al., 2020) and engage with the gaming community. Knowing the likely market success of competing products in advance allows for better deploying those countermeasures, depending on the publisher's objective, and avoids wasteful activities. The Director of Data Science and Analytics at 2K Games indicated the importance of a systematic model‐based approach to support such decisions, something that, across the industry, is currently done ad hoc and primarily using human judgment. He highlighted that the team at 2K Games was interested in the ideas discussed here and saw potential expansions, for example, including Twitch or Reddit. We draw parallels on these from other sectors in Section 5.2.

PROST is freely available and can readily be implemented into predictive frameworks, as outlined in this research. Preannouncing new products, igniting the PROST signal, is common practice, especially in competitive markets (e.g., for software, see Bayus et al., 2001), where the benefit of being able to choose the desired launch date upfront and attract customer attention outweighs the risk of direct imitators (Bhaskaran & Ramachandran, 2011; Su & Rao, 2010). In the video game industry, as our analysis shows, such a PROST signal usually becomes available well in advance, and the forecasting model can be updated once new information arrives. This is relevant to practitioners as consumer preference often changes during the prerelease phase (Meeran et al., 2017), and getting timely CI information is one of the biggest struggles brands face (Crayon SCIP, 2022). One of the main benefits of PROST is that it adds value even when the own product sample size is relatively small. This is a particularly effective strategy for minor publishers and independent developers, who lack the resources and historical data of established major publishers. Although this process can be fully automated, we would expect it to be operationalized in a supervised setting and used to supplement analysts' judgment‐based forecasts.

Moreover, PROST includes information for the near‐term opening sales but also provides insights on the strategic horizon by explaining some of the overall market potential. This also creates the opportunity for new research avenues with direct impact on practice as the gaming sector evolves. 2K Games indicated that, for instance, it is now common practice to foster a community around games on online platforms, such as Discord or Reddit, with exclusive prerelease availability of titles to associated or independent streamers on platforms such as YouTube and Twitch. This not only provides the potential for richer PRB information, but also for managing it (Dost et al., 2019). Finally, our communication with the publisher makes it apparent that the proposed model enables further research into how to best translate new competition insights into optimal marketing and launch strategy responses.

Model extensions

In this study, we focus on measuring the future market potential of competitors' products and provide initial insights on the suitability of PROST. A natural next step is to formulate a framework that also measures the competitive impact on own products. Recent research suggests that incorporating CI not only improves short‐term forecast accuracy (Huang et al., 2014; Li et al., 2019), but also helps identify analogies for new product forecasting (Baardman et al., 2017). In these cases, PROST can be a more cost‐effective alternative to external market research services.

Search traffic information has, for example, been incorporated into a market response model that measures impact on sales (e.g., Du & Kamakura, 2012; Du et al., 2015) or advertising (e.g., Hu et al., 2014). An extension into this area of new products (for a prerelease application with survey data, see Roberts et al., 2005) would have two potential implications: first, to improve the marketing mix (for an application without PROST, see Luan & Sudhir, 2010), and second, to better manage product release timing. For example, postponing a release is common practice in the movie industry (Einav, 2010). There is limited work to help identify when the loss in revenue due to competitors' launches outweighs the cost of postponement that can impact stock value (Einav & Ravid, 2009) and brand trust (Herm, 2013). An alternative is to look directly into the impact on sales and associated decisions, such as to improve inventory management and dynamic pricing (e.g., Huang et al., 2014; Martínez‐de Albéniz et al., 2020). However, we would expect this to be of more importance for the postrelease phase. All of those directions would generally profit from a deeper focus on the assessment of confidence and risk that is involved when forecasting with user‐generated content.

While our research suggests PROST models are useful over naïve ones, it remains an open question how PRB compares to commercially available market data. As Kumar et al. (2020) argue, such data sources quickly become costly, especially if tracked over time. Therefore, many companies heavily rely on human judgment for their models, remaining relatively immature in their competitor analysis (see Crayon SCIP, 2022; Ranjan & Foropon, 2021). This is further amplified by the judgmental processes used for new products (Kahn & Chase, 2018). We see an interesting avenue of future research in how this new kind of marketing intelligence can integrate into decision‐making processes, hopefully mitigating biases involved with new products (Belvedere & Goodwin, 2017; Markovitch et al., 2014).

Our modeling approach is based on a set of well‐established methods to reduce the dimensionality of PROST. While our results are in favor of a data‐driven variable selection, there is room to experiment further with different time‐series characterizations (e.g., Lubba et al., 2019). It is also possible to expand our framework to a more granular level, for example, using regional PROST for better planning of the distribution. However, depending on the product, search traffic information might be limited (Schaer et al., 2019a). A way forward for future researchers might be to use a top‐down approach and split global search interest to regional sales observation. Moreover, researchers may include other types of PRB sources, in particular those involving unstructured data, from platforms such as Reddit and video content (see Balducci & Marinova, 2018). Although video games are a specific type of entertainment product, we anticipate our findings to hold for other products such as electronics and outdoor gear, as long as they exhibit a similarly active online community.

We show that forecasts of first‐week (FW) and EoL sales have similar performance. Although our research is at the product level and tracks consumer interest, research focused on stock returns indicates that investors change their beliefs during the prerelease phase, based on the expectation that any preannouncements can have long‐term effects (Sorescu et al., 2007). Similarly, we conjecture that depending on the prerelease strategy, the adoption dynamic might be different, and, in turn, may affect the market potential. There is a lack of research that investigates PRB adoption dynamics and its link to sales. This might also partially explain why we found little value in clustering video games sales.

Clustering might prove valuable on different data sets that observe greater heterogeneity. Our cluster sizes were very small, and therefore we could not use RFs. A data set with a large number of products might profit from experimenting with the combination of more advanced algorithms that also utilize within‐cluster information. This would not only permit researchers to combine the clustering with the regression modeling, but also to explore the intrabrand segmentation. That said, there is room for researchers to explore more advanced clustering methods for prerelease forecasting, such as those presented by Hu et al. (2019) and Baardman et al. (2017).

Finally, our work has focused on the development of the methods that use PROST to enhance CI. We discuss some of the decision‐making advantages this can provide, yet this research is limited in that it does not present observations from a company case investigation. Targeted future research would not only enable identifying direct benefits, for instance, revenue impact based on better CI, but also track potential changes in the decision‐making process.

CONCLUSIONS

Although research on CI is well established and frequently draws from user‐generated content, there is limited focus on its predictive value for new products. Moreover, it can miss the connection to decision variables, such as sales. Our study contributes to the growing literature on using Big Data for CI (e.g., Choi et al., 2018, for an overview in applications in operations management) by providing insights into the predictive value of publicly available PROST information of competitor products, linking it directly to sales. While most of the literature evaluates the proposed approaches on randomized cross‐brand data sets, we limit the training of our forecasting model to internal‐brand information and competitor PROST. This is both more difficult and realistic. Our results suggest PROST provides valuable insights for companies to better understand their competitive standing.

Shorter life cycles have led to increased competition on product launches, making it important to have insights on future market developments (Calantone et al., 2010). As such, companies have increased their CI efforts, yet they struggle to acquire data in a timely fashion (Crayon SCIP, 2022). In comparison to other market intelligence, PROST is an inexpensive source of information that reflects consumer interest (Houston et al., 2018). Moreover, it has the advantage that it is available over time, which allows capturing its dynamics and changes in consumer preference. We show that in a relatively homogeneous market setting, only a few internal sales observations are required to produce valuable insights into the competitors' market potential. This information can be taken into consideration when managing the marketing mix and can support new product release planning.

ACKNOWLEDGMENTS

We would like to thank the editor and three anonymous reviewers for their comments. We are grateful for the valuable feedback received from Florian Dost, Nigel Meade, Catherine Owsik, and Doug Thomas, as well as for the operational insights provided by the 2K Games data science team.

ORCID

Oliver Schaer

Nikolaos Kourentzes

Robert Fildes

References

Abrahams, A. S.; Jiao; Fan; Wang, G. A.; Zhang (2013). What's buzzing in the blizzard of buzz? Automotive component isolation in social media postings. Decision Support Systems, 55(4), 871–882.

Armstrong; Collopy (1992). Error measures for generalizing about forecasting methods: Empirical comparisons. International Journal of Forecasting, 8(1), 69–80.

Asur; Huberman (2010). Predicting the future with social media. In Hoeber; Huang, X. J. (Eds.), IEEE/WIC/ACM international conference on web intelligence and intelligent agent technology (Vol. 1, pp. 492–499). IEEE.

Baardman; Levin; Perakis; Singhvi (2017). Leveraging comparables for new product sales forecasting. https://ssrn.com/abstract=3086237

Balducci; Marinova (2018). Unstructured data in marketing. Journal of the Academy of Marketing Science, 46(4), 557–590.

Basallo‐Triana, M. J.; Rodríguez‐Sarasty, J. A.; Benitez‐Restrepo, H. D. (2017). Analogue‐based demand forecasting of short life‐cycle products: A regression approach and a comprehensive assessment. International Journal of Production Research, 55(8), 2336–2350.

Bass, F. M. (1969). A new product growth for model consumer durables. Management Science, 15(5), 215–227.

Bayus, B. L.; Jain; Rao, A. G. (2001). Truth or consequences: An analysis of vaporware and new product announcements. Journal of Marketing Research, 38(1), 3–13.

Belvedere; Goodwin (2017). The influence of product involvement and emotion on short‐term product demand forecasting. International Journal of Forecasting, 33(3), 652–661.

Bemmaor, A. C. (1994). Modeling the diffusion of new durable goods: Word‐of‐mouth effect versus consumer heterogeneity. In Laurent; Lilien, G. L.; Pras (Eds.), Research traditions in marketing (pp. 201–229). Springer.

Bhaskaran, S. R.; Ramachandran (2011). Managing technology selection and development risk in competitive environments. Production and Operations Management, 20(4), 541–555.

Boone; Ganeshan; Hicks, R. L.; Sanders, N. R. (2018). Can Google Trends improve your sales forecast? Production and Operations Management, 27(10), 1770–1774.

Breiman (2001). Random forests. Machine Learning, 45(1), 5–32.

Calantone, R. J.; Yeniyurt; Townsend, J. D.; Schmidt, J. B. (2010). The effects of competition in short product life‐cycle markets: The case of motion pictures. Journal of Product Innovation Management, 27(3), 349–361.

Calof, J. L.; Wright (2008). Competitive intelligence: A practitioner, academic and inter‐disciplinary perspective. European Journal of Marketing, 42(7/8), 717–730.

Carta; Medda; Pili; Reforgiato Recupero; Saia (2019). Forecasting e‐commerce products prices by combining an autoregressive integrated moving average (ARIMA) model and Google Trends data. Future Internet, 11(1), 1–19.

Chau (2012). Business intelligence in blogs: Understanding consumer interactions and communities. MIS Quarterly, 36(4), 1189–1216.

Chen; Guestrin (2016). Xgboost: A scalable tree boosting system. In Krishnapuram; Shah; Smola; Aggarwal; Shen; Rastogi (Eds.), Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 785–794). Association for Computing Machinery, New York, NY, United States.

Chen; Steckel, J. H. (2012). Modeling credit card share of wallet: Solving the incomplete information problem. Journal of Marketing Research, 49(5), 655–669.

Chintagunta, P. K.; Lee (2012). A pre‐diffusion growth model of intentions and purchase. Journal of the Academy of Marketing Science, 40(1), 137–154.

Choi, T.‐M.; Wallace, S. W.; Wang (2018). Big data analytics in operations management. Production and Operations Management, 27(10), 1868–1883.

Craig, C. S.; Greene, W. H.; Versaci (2015). E‐word of mouth: Early predictor of audience engagement. Journal of Advertising Research, 55(1), 62–72.

Crayon SCIP (2022). State of competitive intelligence. https://www.crayon.co/state-of-competitive-intelligence

Cui; Gallino; Moreno; Zhang, D. J. (2018). The operational value of social media information. Production and Operations Management, 27(10), 1749–1769.

Dellarocas; Zhang; Awad, N. F. (2007). Exploring the value of online product reviews in forecasting sales: The case of motion pictures. Journal of Interactive Marketing, 21(4), 23–45.

Dhar; Chang, E. A. (2009). Does chatter matter? The impact of user‐generated content on music sales. Journal of Interactive Marketing, 23(4), 300–307.

Ding; Cheng, H. K.; Duan; Jin (2017). The power of the “like” button: The impact of social media on box office. Decision Support Systems, 94, 77–84.

Divakaran, P. K. P.; Palmer; Søndergaard, H. A.; Matkovskyy (2017). Pre‐launch prediction of market performance for short lifecycle products using online community data. Journal of Interactive Marketing, 38, 12–28.

Dost; Phieler; Haenlein; Libai (2019). Seeding as part of the marketing mix: Word‐of‐mouth program interactions for fast‐moving consumer goods. Journal of Marketing, 83(2), 62–81.

Du, R. Y.; Damangir (2015). Leveraging trends in online searches for product features in market response modeling. Journal of Marketing, 79(1), 29–43.

Du, R. Y.; Kamakura, W. A. (2012). Quantitative trendspotting. Journal of Marketing Research, 49(4), 514–536.

Du, R. Y.; Kamakura, W. A.; Mela, C. F. (2007). Size and share of customer wallet. Journal of Marketing, 71(2), 94–113.

Einav (2010). Not all rivals look alike: Estimating an equilibrium model of the release date timing game. Economic Inquiry, 48(2), 369–390.

Einav; Ravid, S. A. (2009). Stock market response to changes in movies' opening dates. Journal of Cultural Economics, 33(4), 311–319.

Engelstätter; Ward, M. R. (2018). Strategic timing of entry: Evidence from video games. Journal of Cultural Economics, 42(1), 1–22.

Fan‐Osuala; Zantedeschi; Jank (2018). Using past contribution patterns to forecast fundraising outcomes in crowdfunding. International Journal of Forecasting, 34(1), 30–44.

Feng; Shanthikumar, J. G. (2018). How research in production and operations management may evolve in the era of big data. Production and Operations Management, 27(9), 1670–1684.

Fildes (1992). The evaluation of extrapolative forecasting methods. International Journal of Forecasting, 8(1), 81–98.

Fleisher, C. S. (2008). Using open source data in developing competitive and marketing intelligence. European Journal of Marketing, 42(7/8), 852–866.

Foutz, N. Z.; Jank (2010). Research note—Prerelease demand forecasting for motion pictures using functional shape analysis of virtual stock markets. Marketing Science, 29(3), 568–579.

Fulcher, B. D. (2018). Feature‐based time‐series analysis. In Dong; Liu (Eds.), Feature engineering for machine learning and data analytics (pp. 87–143). CRC Press.

Gelper; Peres; Eliashberg (2015). Pre‐release word‐of‐mouth dynamics: The role of spikes. Working Paper (pp. 1–40). https://goo.gl/csebdR

Goel; Hofman, J. M.; Lahaie; Pennock, D. M.; Watts, D. J. (2010). Predicting consumer behavior with web search. Proceedings of the National Academy of Sciences, 107(41), 17486–17490.

Goia; May; Fusai (2010). Functional clustering and linear regression for peak load forecasting. International Journal of Forecasting, 26(4), 700–711.

Goodwin; Dyussekeneva; Meeran (2013). The use of analogies in forecasting the annual sales of new electronics products. IMA Journal of Management Mathematics, 24(4), 407.

Gopinath; Chintagunta, P. K.; Venkataraman (2013). Blogs, advertising, and local‐market movie box office performance. Management Science, 59(12), 2635–2654.

Gower, J. C. (1971). A general coefficient of similarity and some of its properties. Biometrics, 27(4), 857–871.

Gutt; Herrmann; Rahman, M. S. (2019). Crowd‐driven competitive intelligence: Understanding the relationship between local market competition and online rating distributions. Information Systems Research, 30(3), 980–994.

Hammon; Ehrenberg; Goodhardt (1996). Market segmentation for competitive brands. European Journal of Marketing, 30(1212), 39–49.

Hann, I.‐H.; James (2011). Forecasting the sales of music albums: A functional data analysis of demand and supply side p2p data. Working Paper. https://goo.gl/goWrKN

Hastie; Tibshirani; Friedman (2008). The elements of statistical learning. Data mining, inference, and prediction. 2nd ed. Springer.

Hastie; Tibshirani; Wainwright (2015). Statistical learning with sparsity. The lasso and generalizations. Monographs on Statistics and Applied Probability 143. CRC Press.

He; Yan; Akula; Shen (2015). A novel social media competitive analytics framework with sentiment benchmarks. Information & Management, 52(7), 801–812.

He; Zha (2013). Social media competitive analysis and text mining: A case study in the pizza industry. International Journal of Information Management, 33(3), 464–472.

Hennig‐Thurau; Houston, M. B. (2019). Entertainment science. Data analytics and practical theory for movies, games, books, and music. Springer.

Herm (2013). When things go wrong, don't rely on committed consumers: Effects of delayed product launches on brand trust. Journal of Product Innovation Management, 30(1), 70–81.

Hollander; Wolfe, D. A.; Chicken (2014). Nonparametric statistical methods. 3rd ed. Wiley.

Houston, M. B.; Kupfer, A.‐K.; Hennig‐Thurau; Spann (2018). Pre‐release consumer buzz. Journal of the Academy of Marketing Science, 46(2), 338–360.

Hu; Acimovic; Erize; Thomas, D. J.; Van Mieghem, J. A. (2019). Forecasting new product life cycle curves: Practical approach and empirical analysis. Manufacturing & Service Operations Management, 21(1), 66–85.

Hu; Du, R. Y.; Damangir (2014). Decomposing the impact of advertising: Augmenting sales with online search data. Journal of Marketing Research, 51(3), 300–319.

Huang; Fildes; Soopramanien (2014). The value of competitive information in forecasting FMCG retail product sales and the variable selection problem. European Journal of Operational Research, 237(2), 738–748.

Hyeong, J. H.; Choi, K. J.; Lee, J. Y.; Pyo, T.‐H. (2020). For whom does a game update? Players' status‐contingent gameplay on online games before and after an update. Decision Support Systems, 139, 113423.

Kahn, K. B.; Chase, C. W. (2018). The state of new‐product forecasting. Foresight: The International Journal of Applied Forecasting, 3(51), 24–31.

Kaufman; Rousseeuw, P. J. (2005). Finding groups in data. An introduction to cluster analysis. 2nd ed. John Wiley.

Kim (2021). Do online searches influence sales or merely predict them? The case of motion pictures. European Journal of Marketing, 55(2), 337–362.

Kim; Hanssens, D. M. (2017). Advertising and word‐of‐mouth effects on pre‐launch consumer interest and initial sales of experience products. Journal of Interactive Marketing, 37, 57–74.

Kim; Hong; Kang (2015). Box office forecasting using machine learning algorithms based on SNS data. International Journal of Forecasting, 31(2), 364–390.

Kim; Hong; Kang (2017). Box office forecasting considering competitive environment and word‐of‐mouth in social networks: A case study of Korean film market. Computational Intelligence and Neuroscience, 2017, 1–16. https://doi.org/10.1155/2017/4315419

Kourentzes (2019). tsutils: Time series exploration, modelling and forecasting. R package version 0.9.0. https://cran.r-project.org/package=tsutils

Kuhn; Wing; Weston; Williams; Keefer; Engelhardt; Cooper; Mayer; Kenkel; Benesty; Lescarbeau; Ziem; Scrucca; Tang; Candan; Hunt (2021). caret: classification and regression training. R package version 6.0-86. https://cran.r-project.org/package=caret

Kulkarni; Kannan; Moe (2012). Using online search data to forecast new product sales. Decision Support Systems, 52(3), 604–611.

Kumar; Saboo, A. R.; Agarwal; Kumar (2020). Generating competitive intelligence with limited information: A case of the multimedia industry. Production and Operations Management, 29(1), 192–213.

Lau, R. Y. K.; Zhang (2018). Parallel aspect‐oriented sentiment analysis for sales forecasting with big data. Production and Operations Management, 27(10), 1775–1794.

Li; Netessine (2012). Who are my competitors? Let the customer decide. https://ssrn.com/abstract=2147638

Li; Fok; Franses, P. H. (2019). Forecasting own brand sales: Does incorporating competition help? Erasmus School of Economics, Econometric Institute Research Papers, EI2019‐35, 1–28. http://hdl.handle.net/1765/123417

Liu (2006). Word of mouth for movies: Its dynamics and impact on box office revenue. Journal of Marketing, 70(3), 74–89.

Luan, Y. J.; Sudhir (2010). Forecasting marketing‐mix responsiveness for new products. Journal of Marketing Research, 47(3), 444–457.

Lubba, C. H.; Sethi, S. S.; Knaute; Schultz, S. R.; Fulcher, B. D.; Jones, N. S. (2019). catch22: Canonical time‐series characteristics. Selected through highly comparative time‐series analysis. Data Mining and Knowledge Discovery, 33, 1821–1852.

Maechler; Rousseeuw; Struyf; Hubert; Hornik (2018). cluster: Cluster analysis basics and extensions. R package version 2.0.7-1. https://cran.r-project.org/package=cluster

Marchand; Hennig‐Thurau (2013). Value creation in the video game industry: Industry economics, consumer benefits, and research opportunities. Journal of Interactive Marketing, 27(3), 141–157.

Marchand; Hennig‐Thurau; Wiertz (2017). Not all digital word of mouth is created equal: Understanding the respective impact of consumer reviews and microblogs on new product success. International Journal of Research in Marketing, 34(2), 336–354.

Markovitch, D. G.; Steckel, J. H.; Michaut; Philip; Tracy, W. M. (2014). Behavioral reasons for new product failure: Does overconfidence induce overforecasts? Journal of Product Innovation Management, 32(5), 825–841.

Martínez‐de Albéniz; Planas; Nasini (2020). Using clickstream data to improve flash sales effectiveness. Production and Operations Management, 29(11), 2508–2531.

Meade; Islam (2006). Modelling and forecasting the diffusion of innovation—A 25‐year review. International Journal of Forecasting, 22(3), 519–545.

Meeran; Jahanbin; Goodwin; Neto, J. Q. F. (2017). When do changes in consumer preferences make forecasts from choice‐based conjoint models unreliable? European Journal of Operational Research, 258(2), 512–524.

Montgomery, D. B.; Moore, M. C.; Urbany, J. E. (2005). Reasoning about competitive reactions: Evidence from executives. Marketing Science, 24(1), 138–149.

Morwitz, V. G.; Schmittlein (1992). Using segmentation to improve sales forecasts based on purchase intent: Which “intenders” actually buy? Journal of Marketing Research, 29(4), 391–405.

Mülbacher; Füller; Huber (2011). Online forum discussion‐based forecasting of new product market performance. Marketing ZFP, 33(3), 221–234.

Netzer; Feldman; Goldenberg; Fresko (2012). Mine your own business: Market‐structure surveillance through text mining. Marketing Science, 31(3), 521–543.

Ng, P. T.; Maechler (2020). COBS–constrained b‐splines (sparse matrix based). R package version 1.3-4. https://CRAN.R-project.org/package=cobs

Onishi; Manchanda (2012). Marketing activity, blogging and sales. International Journal of Research in Marketing, 29(3), 221–234.

Ord, J. K.; Fildes; Kourentzes (2017). Principles of business forecasting. 2nd ed. Wessex.

Palacios Fenech; Tellis, G. J. (2016). The dive and disruption of successful current products: Measures, global patterns, and predictive model. Journal of Product Innovation Management, 33(1), 53–68.

Peres; Muller; Mahajan (2010). Innovation diffusion and new product growth models: A critical review and research directions. International Journal of Research in Marketing, 27(2), 91–106.

R Core Team (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/

Ramsay; Silverman, B. W. (2005). Functional data analysis. 2nd ed. Springer.

Ranjan; Foropon (2021). Big data analytics in building the competitive intelligence of organizations. International Journal of Information Management, 56, 102231.

Roberts, J. H.; Nelson, C. J.; Morrison, P. D. (2005). A prelaunch diffusion model for evaluating market defense strategies. Marketing Science, 24(1), 150–164.

Rust, R. T.; Rand; Huang, M.‐H.; Stephen, A. T.; Brooks; Chabuk (2021). Real‐time brand reputation tracking using social media. Journal of Marketing, 85(4), 21–43.

Sagaert, Y. R.; Aghezzaf, E.‐H.; Kourentzes; Desmet (2018). Tactical sales forecasting using a very large set of macroeconomic indicators. European Journal of Operational Research, 264(2), 558–569.

Schaer; Kourentzes (2021). diffusion: Forecast the diffusion of new products. R package version 0.3.2. https://cran.r-project.org/package=diffusion

Schaer; Kourentzes; Fildes (2019a). Demand forecasting with user‐generated online information. International Journal of Forecasting, 35(1), 197–212.

Schaer; Kourentzes; Fildes (2019b). Estimating the market potential with pre‐release buzz. https://ssrn.com/abstract=3325878

Schoenherr; Swink (2015). The roles of supply chain intelligence and adaptability in new product launch success. Decision Sciences, 46(5), 901–936.

Siliverstovs; Wochner, D. S. (2018). Google Trends and reality: Do the proportions match? Appraising the informational value of online search behavior: Evidence from Swiss tourism regions. Journal of Economic Behavior & Organization, 145, 1–23.

Silva, E. S.; Hassani; Madsen, D. Ø.; Gee (2019). Googling fashion: Forecasting fashion consumer behaviour using Google Trends. Social Sciences, 8(4), 1–23.

Sood; James, G. M.; Tellis, G. J. (2009). Functional regression: A new model for predicting market penetration of new products. Marketing Science, 28(1), 36–51.

Sorescu; Shankar; Kushwaha (2007). New product preannouncements and shareholder value: Don't make promises you can't keep. Journal of Marketing Research, 44(3), 468–489.

Steenkamp, J.‐B. E. M.; Nijs, V. R.; Hanssens, D. M.; Dekimpe, M. G. (2005). Competitive reactions to advertising and promotion attacks. Marketing Science, 24(1), 35–54.

Steinker; Hoberg; Thonemann, U. W. (2017). The value of weather information for e‐commerce operations. Production and Operations Management, 26(10), 1854–1874.

Su; Rao, V. R. (2010). New product preannouncement as a signaling strategy: An audience‐specific review and analysis. Journal of Product Innovation Management, 27(5), 658–672.

Sugar, C. A.; James, G. M. (2003). Finding the number of clusters in a dataset. Journal of the American Statistical Association, 98(463), 750–763.

Sun; Kumar (2020). A manufacturer's new product preannouncement decision and the supplier's response. Production and Operations Management, 29(10), 2289–2306.

Teo, T. S.; Choo, W. Y. (2001). Assessing the impact of using the internet for competitive intelligence. Information & Management, 39(1), 67–83.

Tian, C. H.; Wang, W. T.; Huang, F. C.; Dong; Huang (2014). Pre‐release sales forecasting: A model‐driven context feature extraction approach. IBM Journal of Research and Development, 58(5/6), 8:1–8:13.

Tibshirani; Walther; Hastie (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B, 63(2), 411–423.

Tripp; Grueber; Simkins; Yetter (2020). Video games in the 21st century: The 2020 economic impact report. The Entertainment Software Association, pp. 1–54.

Trusov; Rand; Joshi, Y. V. (2013). Improving prelaunch diffusion forecasts: Using synthetic networks as simulated priors. Journal of Marketing Research, 50(6), 675–690.

Wang; Zhang; Zhu (2010). Why do moviegoers go to the theater? The role of prerelease media publicity and online word of mouth in driving moviegoing behavior. Journal of Interactive Advertising, 11(1), 50–62.

Xiong; Bharadwaj (2014). Prerelease buzz evolution patterns and new product performance. Marketing Science, 33(3), 401–421.

Xu; Liao, S. S.; Song (2011). Mining comparative opinions from customer reviews for competitive intelligence. Decision Support Systems, 50(4), 743–754.

Xu; Wunsch, D. C. (2009). Clustering. John Wiley & Sons, Inc.

Zheng, Z. E.; Fader; Padmanabhan (2012). From business intelligence to competitive intelligence: Inferring competitive measures using augmented site‐centric data. Information Systems Research, 23(3‐part‐1), 698–720.

Zhu; Zhang, X. M. (2010). Impact of online consumer reviews on sales: The moderating role of product and consumer characteristics. Journal of Marketing, 74(2), 133–148.

Predictive competitive intelligence with prerelease online search traffic
