Sage Journals: Discover world-class research

Abstract

Online reviews are a new source for the valuable voice of customers. By identifying the customer’s opinion, designers can comprehend the important features of a product to satisfy customer demand, thus enhancing the market competitiveness of the product. Customers have opinions on multiple aspects of products hidden in reviews, and sentiment divergence may exist. Moreover, there is a gap between customer requirements and the product’s system requirements. How to effectively analyze a large number of reviews to extract the aspect-level customer opinion, and thus determine the most important product engineering characteristics in design, is the critical challenge for market-driven design. A systematic requirement analysis framework is proposed in this work. First, a convolutional neural network and sentiment analysis are used for opinion mining of online reviews. Then, based on fuzzy logic, the customer sentiment divergence (which is quantified by controversy indexes) and the average sentiment of a requirement are used to determine the degree of satisfaction. Finally, based on the product’s quality function deployment matrix, the satisfaction and frequency of the customer requirements are used to estimate the importance of the product’s engineering characteristics, which identifies the focus of product design. A case study of a hair dryer is given to demonstrate the effectiveness of the proposed methods.

Keywords

Requirement analysis, opinion mining, customer satisfaction, engineering characteristics, online reviews

Introduction

Customer satisfaction of features and expected use scenarios of a product determine whether a customer will purchase a product.¹ Therefore, developers should have methods for detecting and responding to market feedback. By capturing and analyzing customer requirements, the important features of a product, including the functionality, structure, and behavioral characteristics, can be determined.

With the development of e-commerce, online reviews have become an easy-to-access source for customer requirements.^1,2 However, compared to traditional requirement analysis, there are many challenges in analyzing online reviews: (1) customers’ attitudes toward various aspects of a product are implied in comments and expressed in natural language, rather than given as an explicit rating of each individual aspect; (2) e-commerce platforms usually use the average rating of a product to measure customer satisfaction, which exacerbates the controversy of the opinions on each aspect; and (3) customer opinions on product features cannot be directly used to improve product design and should be translated to the importance of product engineering characteristics.

Based on the above analysis, an online review-based framework for customer requirement analysis is proposed to help designers determine the focus of product design. First, the customer-concerned product features and the corresponding customer opinions are extracted from the comment texts in online reviews based on a convolutional neural network (CNN) and sentiment analysis. Second, the customers’ sentiment divergence for each aspect of the product is quantified. Then, the average sentiment score is considered to determine the satisfaction of this aspect by fuzzy logic reasoning. Finally, the importance of each engineering characteristic is calculated.

The article is organized as follows: relevant research studies are reviewed in section “Related work.” Section “Method overview” presents a framework for online review–based requirement analysis. Sections “Customer requirement extraction of online reviews,” “Fuzzy logic-based customer satisfaction analysis,” and “Evaluation of the importance of engineering characteristics” are the foundation of this work, that is, automated customer opinion mining from online reviews, fuzzy logic-based customer satisfaction analysis, and determination of the importance of engineering characteristics. The experiments on the proposed method and the requirement analysis of the hair dryer product are shown in section “Experiments and analysis” to demonstrate the proposed framework. In section “Conclusion,” conclusions are drawn and future work is discussed.

Related work

Customer-driven product design

The voice of the customer (VOC) is widely used to capture customer expectations, preferences, and aversions.^3–5 Traditionally, market research methods, such as customer interviews, market questionnaires, and surveys, have been used to collect data and elicit customer requirements.^2,6 Under the guidance of an expert, a group of typical users is gathered together to freely discuss product-related issues in accordance with a predetermined process. Moreover, several methods, such as the analytic hierarchy process,⁷ fuzzy linguistic model,⁸ and fuzzy expert system,⁹ are used to find customers’ satisfaction with different kinds of requirements.

However, it is hard to obtain deep demands of customers and generate novel product concepts using this method.^10,11 With the development of e-commerce, the online comments of products provide a new source for capturing customer requirements. Therefore, a new requirement analysis based on massive online comments should be developed.

Opinion mining

Current research on opinion mining of online reviews focuses on feature extraction and sentiment analysis.^12–15 Jin et al.¹⁶ gave a complete review of recent advances in information mining from big consumer opinion data for product design. Aspect-level sentiment analysis¹⁷ is used to summarize customers’ attitudes of different attributes or features (which are referred to as “aspects”). Hu and Liu^18,19 used the apriori algorithm to mine the words of each product and predict whether a customer’s opinion on a specific feature is positive or negative. Based on dependency grammar, Qiu et al.²⁰ proposed an unsupervised method for extracting feature and opinion words through the double propagation of information between opinion words.²¹ Jain et al.²² employed CNN and long short-term memory architectures for extracting sentiment present in social media data and customer reviews with effective training solutions. Wu et al.²³ used CNN to directly map each review sentence into predefined aspects.

Based on sentiment analysis and analogy reasoning, Zhou et al.²⁴ proposed a model to obtain potential requirements of customers implicated in use cases. Yang et al.²⁵ combined local context information and global context information to rank features based on the rating and frequency of each feature. Jiang et al.²⁶ used both frequencies and sentiment scores to determine the importance of product features and predict their future importance based on fuzzy time series. However, the sentiment divergences between different customers are ignored in all the above methods. Xu et al.²⁷ explored the controversy of aspects in reviews by the distribution of sentiment scores, and quantitative measures were used to evaluate customer sentiment divergence.

Customer needs translation

Quality function deployment (QFD)²⁸ is widely used for translating customer requirements into system requirements. Here, each element indicates the weighted relationship between a customer requirement and an engineering characteristic.^29,30 Fuzzy set theory is usually introduced to address the vagueness of the description of customer requirements;³¹ thus, the importance of customer requirements can be determined.³² Then, computer-aided approaches are used to determine the weight between customer requirements and engineering characteristics^33,34 for customer-driven product design processes.

To convert customer preferences into the importance of engineering characteristics, Kang and Tucker³⁵ used the latent Dirichlet allocation (LDA)-based method to extract the user-desired product features implied in reviews and the engineering characteristics hidden in the patent documents. Moreover, the PageRank algorithm³⁶ was used to match a customer requirement term to corresponding engineering characteristics. However, the topic vectors that are extracted by LDA are difficult to understand and difficult to use to directly represent a product feature.

By employing machine learning techniques, Wang and Zhang³⁷ trained support vector machine (SVM)-based and decision tree–based classification models. A product can be recognized if it meets a specific customer requirement according to the technical parameters. However, it cannot address the direct conversion between individual customer requirements and engineering features. Based on the identified helpful online reviews,³⁸ Jin et al.^39,40 used a statistical language model to label the feature words in online reviews to indicate the engineering features in QFD. However, the product features and keywords in the comment sentences were all marked manually, which increases the workload and may introduce unnecessary subjectiveness.

Method overview

Based on the above analysis, a comprehensive analysis framework is proposed to effectively extract the customer opinions contained in online reviews, as shown in Figure 1. Here, the customers’ attitudes and the sentiment divergence of their opinions are fully considered to evaluate their satisfaction of each aspect to help measure the importance of product engineering characteristics to determine the focus in design. The proposed comprehensive requirement analysis process based on online reviews is as follows:

Customer requirements extraction. First, online reviews are collected, and the natural language processing technique is used to clean the obtained data. Then, the customer requirements hidden in each review are obtained with the opinion mining method. Finally, each customer’s aspect-level sentiment of the product is measured.

Customer satisfaction analysis. First, the average customer’s sentiment score for each requirement is calculated. Then, the distribution and the variance in the scores are analyzed, and the controversy indexes are used to quantify customer sentiment divergences. Finally, the sentiment score and controversy indexes of each customer requirement are used to evaluate the customer’s satisfaction for each requirement by applying the fuzzy logic method.

Importance determination of engineering characteristics. Based on the relationship between customer requirements and product engineering characteristics established in QFD, the weighted summation of the satisfaction of associated customer requirements is calculated for each engineering characteristic. Then, the importance of engineering characteristics for product design is estimated and ranked.

Figure 1.

Customer requirement analysis framework.

Customer requirement extraction of online reviews

Customers can express their attitudes freely in comment texts, and thus, the number of reviews is often very large. It is cumbersome to manually analyze the customer requirements from online reviews. Here, an approach for automated extraction of hidden customer opinions is proposed based on natural language processing and text analysis techniques.

Problem analysis

Generally, the comments in an online review consist of short texts. Text classification methods can be used to label comment sentences with customer requirements based on the implicit semantic features of these sentences. Here, TextCNN⁴¹ is used. By fine-tuning a small number of hyperparameters, the model works well for benchmark problems of different applications. One advantage of using CNN for text classification is that it can automatically obtain the feature representation of the text, and thus, complicated manual feature engineering is avoided.

Note that the customer rating can only reflect a customer’s overall attitude toward the product. As shown in Figure 2, the customer’s rating for the product is two points, which is a negative opinion. However, the customer is only dissatisfied with the heating effect of the hair dryer and has no clear bias toward other aspects. In this section, automated aspect-level sentiment analysis is used to mine customer sentiment for each aspect to improve the accuracy of customer opinion analysis.

Figure 2.

An instance of a customer’s comments.

TextCNN-based text classification

Since each comment may express customer requirements of multiple aspects of a product, the automated annotation of reviews can be recognized as a problem of multilabel classification. To simplify the classification problem, an independent classifier is constructed for each of the customer requirement labels here. Therefore, the multilabel classification problem is simplified to multiple binary classifications.
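The decomposition described above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the function name `to_binary_datasets`, the label names, and the example sentences are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def to_binary_datasets(
    sentences: List[Tuple[str, Set[str]]], labels: List[str]
) -> Dict[str, List[Tuple[str, int]]]:
    """Build one binary data set (1 = label present, 0 = absent) per requirement."""
    return {
        label: [(text, int(label in tags)) for text, tags in sentences]
        for label in labels
    }


# Hypothetical annotated sentences, invented for illustration only.
annotated = [
    ("It is light but very loud", {"weight", "noise"}),
    ("Great price", {"price"}),
    ("Arrived on Tuesday", set()),
]
datasets = to_binary_datasets(annotated, ["weight", "noise", "price"])
print([y for _, y in datasets["noise"]])
```

Each entry of `datasets` can then be passed to its own binary classifier, so a sentence that mentions several requirements is simply a positive sample in several data sets.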

Training strategy

For each binary classification model for a certain requirement label, the corresponding training set and verification set are prepared. Then, these models are trained independently to adjust the model parameters and determine whether the model has converged.

Note that the obtained samples annotated with a certain requirement label are extremely unbalanced during the manual annotation process. The number of negative samples is usually much larger than that of positive samples. The resampling strategy is introduced in each epoch of model training to equalize the negative and positive samples. Moreover, the 10-fold cross-validation method is applied for training a TextCNN model. In each training epoch, the whole data set for this epoch is divided into 10 subdata sets, nine of which are selected for model training, and the last is used for verification. The minibatch training protocol is used here. The pseudocode for model training is shown in Algorithm 1.


Algorithm 1. The pseudocode for TextCNN model training.

Input: reviews_data_set, max_epoch, batch_size, validation_frequency, stop_precision
Output: best_params

current_step = 0
for i in range(max_epoch):
    # Resample in every epoch so that positive and negative samples are balanced
    pos_reviews, neg_reviews = separate(reviews_data_set)
    data_set = shuffle(pos_reviews + random_select(neg_reviews, len(pos_reviews)))
    n_batches = len(data_set) / batch_size
    n_train_batches = int(round(n_batches * 0.9))
    # Hold out the samples that follow the training batches for validation
    test_set = data_set[n_train_batches * batch_size:]
    for minibatch_index in range(n_train_batches):
        current_batch_data = generate_batch(minibatch_index, batch_size, data_set)
        update TextCNN model with current_batch_data
        if current_step % validation_frequency == 0:
            validate model with test_set
            if test_precision > stop_precision:
                return current model parameters as best_params
        current_step = current_step + 1
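The per-epoch resampling and the 9:1 split in Algorithm 1 can be made concrete with a short runnable sketch. It covers only the data preparation, not the model update, and it assumes that negatives are undersampled without replacement and that the held-out part consists of the samples after the last training batch; the name `balanced_epoch_split` and the toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[str, int]  # (sentence, 1 = positive / 0 = negative)


def balanced_epoch_split(
    samples: Sequence[Sample], batch_size: int, rng: random.Random
) -> Tuple[List[List[Sample]], List[Sample]]:
    """Undersample negatives, shuffle, and split into training batches and a held-out set."""
    pos = [s for s in samples if s[1] == 1]
    neg = [s for s in samples if s[1] == 0]
    # Draw as many negatives as there are positives (fewer if negatives are scarce).
    data = pos + rng.sample(neg, min(len(pos), len(neg)))
    rng.shuffle(data)
    n_batches = len(data) // batch_size
    n_train_batches = int(round(n_batches * 0.9))
    cut = n_train_batches * batch_size
    batches = [data[i:i + batch_size] for i in range(0, cut, batch_size)]
    return batches, data[cut:]


# Toy data: 50 positive and 500 negative sentences.
toy = [("pos %d" % i, 1) for i in range(50)] + [("neg %d" % i, 0) for i in range(500)]
batches, held_out = balanced_epoch_split(toy, batch_size=10, rng=random.Random(0))
print(len(batches), len(held_out))
```

Because the negatives are redrawn in every epoch, the classifier eventually sees most of the negative samples even though each epoch is balanced.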

Model parameter setting

At the input layer of a TextCNN model, the input text should be encoded as word vectors. A dictionary for the corpus (all the collected review data) is established, the encoding of each word is randomly initialized, and the word vectors are then dynamically updated through backpropagation during the model training process. The parameters used for model training in this study are as follows. The input consists of 128-dimensional word vectors. The convolution layers are created with the region sizes h = [1,2,3,4,5] and 128 feature maps for each region size. The dropout probability during training is 0.5 (50% of the neurons are randomly chosen and their outputs are set to 0) to prevent overfitting.
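To make these settings concrete, the following sketch computes the tensor sizes they imply. It assumes unpadded ("valid") convolutions with stride 1 followed by max-over-time pooling, as in the standard TextCNN design; the padding scheme is not stated in the text, and the function name `textcnn_shapes` is hypothetical. The sentence length 93 is the longest sentence reported in Table 3.

```python
EMBEDDING_DIM = 128             # dimension of each word vector
REGION_SIZES = [1, 2, 3, 4, 5]  # convolution window heights h
FEATURE_MAPS = 128              # feature maps per region size


def textcnn_shapes(sentence_length: int) -> dict:
    """Return the layer sizes implied by the stated hyperparameters."""
    # A window of height h slides over the sentence: n - h + 1 positions.
    conv_lengths = {h: sentence_length - h + 1 for h in REGION_SIZES}
    # Max-over-time pooling keeps one value per feature map.
    pooled_size = len(REGION_SIZES) * FEATURE_MAPS
    # Each filter has h * EMBEDDING_DIM weights plus one bias.
    conv_params = sum(
        h * EMBEDDING_DIM * FEATURE_MAPS + FEATURE_MAPS for h in REGION_SIZES
    )
    return {
        "conv_lengths": conv_lengths,
        "pooled_size": pooled_size,
        "conv_params": conv_params,
    }


shapes = textcnn_shapes(93)
print(shapes["pooled_size"], shapes["conv_params"])
```

Under these assumptions, every sentence is reduced to a 640-dimensional feature vector before the dropout and output layers, regardless of its length.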

Sentiment analysis

Sentiment analysis refers to the use of natural language processing, text analysis, and computational linguistics to systematically determine the attitude with respect to some topics. Here, the opinions expressed by the comment sentences refer to the positive or negative sentiment toward some aspects of the product. The sentiment dictionary SentiWordNet⁴² is adopted to calculate the sentiment score of each comment. In SentiWordNet, each word is associated with its synsets, and each synset can be viewed as an interpretation of the semantics of the word in a different context. Moreover, according to the statistics of sentiments in massive documents, SentiWordNet gives the estimated degrees of positivity, negativity, and neutrality for each synset. Therefore, for each meaningful word in a comment sentence, its sentiment can be determined by searching the synsets in SentiWordNet. After the comment sentences are annotated and the sentiment score of each word is calculated, the sentiment score of a comment sentence is calculated as follows

score_{obj} = 1 - \left(\sum score_{pos} + \sum score_{neg}\right)

(1)

Here, $score_{pos}$ and $score_{neg}$ are the positivity and negativity of the contained meaningful words such as nouns, verbs, adjectives, and adverbs.

After that, all the sentiment scores of the sentences in a review are integrated to obtain the aspect-level sentiment score of that review. If all sentences in the review are annotated with only one requirement r, the rating of the review is used as the sentiment score of r. Otherwise, for each requirement $r_{i}$ mentioned by this review, all the sentences that have been annotated with the label $r_{i}$ are selected, and the mean of the sentiment scores of these sentences is calculated as $\overline{score}_{r_{i}}$. Finally, $\overline{score}_{r_{i}}$ is normalized to a value in {1, 2, 3, 4, 5}.
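The aggregation rule above can be sketched as follows. The text does not specify how sentence scores are normalized to the five-point scale, so the sketch assumes, purely for illustration, that each sentence carries a polarity in [-1, 1] that is mapped linearly onto 1 to 5 and rounded; the names `to_five_point` and `aspect_scores` and the example values are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Set, Tuple


def to_five_point(polarity: float) -> int:
    """Assumed normalization: map a polarity in [-1, 1] linearly onto 1..5."""
    clipped = max(-1.0, min(1.0, polarity))
    return int(round(2 * clipped + 3))


def aspect_scores(
    sentences: List[Tuple[float, Set[str]]], rating: int
) -> Dict[str, int]:
    """Aspect-level scores of one review from its labeled sentence polarities."""
    labels = set().union(*(tags for _, tags in sentences))
    if len(labels) == 1:
        # Only one requirement is mentioned: use the review rating directly.
        return {next(iter(labels)): rating}
    return {
        r: to_five_point(mean(s for s, tags in sentences if r in tags))
        for r in sorted(labels)
    }


# Hypothetical review with a two-star rating and three labeled sentences.
review = [(-0.75, {"noise"}), (0.5, {"weight"}), (-0.25, {"noise"})]
print(aspect_scores(review, rating=2))
```

In this example the overall rating is negative, yet the aspect "weight" still receives a positive score, which is the effect the aspect-level analysis is meant to capture.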

By analyzing the sentiment scores of all the requirements mentioned in each review, the average sentiment score of each requirement can be obtained. Moreover, by analyzing the distribution of sentiment scores from individual customers, it can be evaluated whether the customer group has aspect-level sentiment divergences.

Fuzzy logic-based customer satisfaction analysis

Based on the opinion mining of online reviews, the average sentiment score of each aspect can be obtained to measure the satisfaction of each customer requirement. However, the customers may have sentiment divergences of an aspect. Here, the term “controversy” is introduced to quantify the customer’s sentiment divergences. A fuzzy logic approach is used to derive the satisfaction with each customer requirement, and a fuzzy logic reasoning framework is given, as shown in Figure 3. Here, the hard controversy and soft controversy are considered first, and a series of fuzzy rules are used to determine the controversy of requirements. Then, 20 fuzzy rules are used to determine the satisfaction of each requirement.

Figure 3.

The fuzzy reasoning framework of requirement satisfactions.

The quantitative measure of controversy

Amendola et al.⁴³ proposed two kinds of controversy to study the evolving perception of controversial movies: hard controversy and soft controversy. Two indexes, H and S, were defined as their normalized measures. Here, the two indexes are adapted. H measures the degree of sentiment difference among customer reviews, whereas S measures how evenly the sentiment scores of the reviews are distributed. They are defined as follows

H_{i} = \frac{1}{C_{H}} {[\sum_{r_{m} \in R_{i}} {(r_{m} - \bar{r_{i}})}^{2}]}^{1 / 2}

(2)

S_{i} = 1 - \frac{1}{c_{s}} {[\sum_{m = 1}^{M} {(p_{i, m} - \frac{1}{M})}^{2}]}^{1 / 2}

(3)

Here, H_i measures the hard controversy of the ith customer requirement; $\bar{r_{i}} = (\sum_{r_{m} \in R_{i}} r_{m}) / N_{i}$ is the average sentiment score of all the reviews for customer requirement i; $N_{i}$ is the number of reviews for customer requirement i; and $C_{H}$ is a normalization factor. In a completely polarized distribution (if half the scores are 1 and half are 5), the largest value $C_{H} = (5 - 1) / 2 = 2$ is obtained. Thus, H is the normalized standard deviation of sentiment scores and $H = 1$ in the completely polarized distribution.

S_i is based on the square root of a $\chi^{2}$ statistic relative to the flat distribution for the sentiment scores of the ith customer requirement. $p_{i, m}$ is the proportion of the scores of requirement i whose values lie in the range (m − 1, m + 1), that is, the proportion of scores equal to m. $c_{s} = \sqrt{1 - (1 / M)}$ is a normalization factor with M = 5 (the highest score). Therefore, if the sentiment scores are spread evenly across the whole range, S_i = 1. If all the sentiment scores gather in a single value, S_i = 0.
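A small numerical sketch may help to check the two indexes against the limiting cases described above. It follows the text's description of H as the normalized standard deviation, which implies dividing the sum of squares by $N_{i}$ before taking the root (an assumption of this sketch), and it assumes integer scores from 1 to 5. The function names are hypothetical.

```python
from math import sqrt
from typing import Sequence

M = 5        # highest score
C_H = 2.0    # (5 - 1) / 2, standard deviation of a fully polarized distribution


def hard_controversy(scores: Sequence[int]) -> float:
    """H: population standard deviation of the scores divided by C_H."""
    n = len(scores)
    avg = sum(scores) / n
    return sqrt(sum((r - avg) ** 2 for r in scores) / n) / C_H


def soft_controversy(scores: Sequence[int]) -> float:
    """S: 1 minus the normalized distance of the score histogram from the flat one."""
    n = len(scores)
    p = [sum(1 for r in scores if r == m) / n for m in range(1, M + 1)]
    c_s = sqrt(1 - 1 / M)
    return 1 - sqrt(sum((p_m - 1 / M) ** 2 for p_m in p)) / c_s


print(hard_controversy([1, 1, 5, 5]))       # fully polarized scores
print(soft_controversy([1, 2, 3, 4, 5]))    # evenly spread scores
print(soft_controversy([4, 4, 4]))          # all scores identical
```

The three calls reproduce the limiting cases in the text: H reaches 1 for a fully polarized distribution, S reaches 1 for a flat distribution, and S drops to 0 (up to rounding error) when all scores coincide.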

Satisfaction determination based on fuzzy logic

As mentioned above, H and S are used to reflect the degree of controversy of the sentiment scores. However, there is a contradiction between human fuzzy language habits and precise mathematical quantities. In this section, a fuzzy logic approach is proposed to determine the satisfaction degree of a customer requirement based on its controversy indexes and sentiment score.

The inference of controversy based on fuzzy logic

The lower the value of H, the more the scores are concentrated around the mean. Similarly, S = 0 if all the sentiment scores are concentrated in a single value, and the higher the value of S, the more uniform the distribution of scores. Here, three fuzzy sets are defined for the value of H according to the conclusion of Amendola et al.:⁴³ low (L), medium (M), and high (H). Their membership functions, shown in Figure 4, are trapezoidal, triangular, and trapezoidal fuzzy functions, respectively. The values of the three segmentation points are 0.4, 0.6, and 0.8. Similarly, the fuzzy sets of S are defined as low (L), medium (M), and high (H). The membership functions are shown in Figure 5, where the values of the three segmentation points are 0.6, 0.7, and 0.8, respectively.

Figure 4.

Membership functions of H.

Figure 5.

Membership functions of S.
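The fuzzification of H and S can be sketched as follows. Since Figures 4 and 5 are not reproduced here, the exact shapes are an assumption: the low set is taken as a left-shoulder trapezoid that falls from 1 at the first segmentation point to 0 at the second, the medium set as a triangle peaking at the second point, and the high set as a right-shoulder trapezoid that reaches 1 at the third point. The function name `fuzzify` is hypothetical.

```python
from typing import Dict

H_POINTS = (0.4, 0.6, 0.8)  # segmentation points stated for H
S_POINTS = (0.6, 0.7, 0.8)  # segmentation points stated for S


def fuzzify(x: float, a: float, b: float, c: float) -> Dict[str, float]:
    """Membership degrees of x in the assumed low/medium/high fuzzy sets."""
    low = 1.0 if x <= a else max(0.0, (b - x) / (b - a))
    high = 1.0 if x >= c else max(0.0, (x - b) / (c - b))
    if x <= a or x >= c:
        medium = 0.0
    elif x <= b:
        medium = (x - a) / (b - a)
    else:
        medium = (c - x) / (c - b)
    return {"L": low, "M": medium, "H": high}


print(fuzzify(0.5, *H_POINTS))   # halfway between low and medium
print(fuzzify(0.75, *S_POINTS))  # halfway between medium and high
```

With these shapes, a value between two segmentation points belongs partly to two neighboring sets, which is what allows several rules to fire at once during inference.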

The value of the output linguistic variable, the controversiality, is also divided into four fuzzy sets: low (L), medium (M), high (H), and very high (VH), and the corresponding membership function is shown in Figure 6. After the value of H and the value of S are converted into the membership degrees of corresponding linguistic variables, these membership degrees, together with the linguistic variables, are regarded as the antecedent parts for fuzzy inference.

Figure 6.

Membership functions of controversiality.

To comprehensively consider the indexes of hard controversy and soft controversy to determine the degree of the controversiality of a customer requirement, nine fuzzy rules are defined in the form of “IF H = X AND S = Y, THEN controversiality = Z,” as shown in Table 1.

Table 1.

The fuzzy rules for the determination of controversiality.

S (rows) \ H (columns)	L	M	H
L	L	M	VH
M	L	M	VH
H	M	H	H

VH: very high.
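The rule base in Table 1 can be evaluated with a few lines of code. The sketch below assumes the common min operator for AND and the max operator for combining rules with the same consequent; the text does not state which operators are used, and the name `infer_controversiality` and the input degrees are hypothetical.

```python
from typing import Dict

# (H set, S set) -> controversiality set, transcribed from Table 1.
RULES = {
    ("L", "L"): "L", ("M", "L"): "M", ("H", "L"): "VH",
    ("L", "M"): "L", ("M", "M"): "M", ("H", "M"): "VH",
    ("L", "H"): "M", ("M", "H"): "H", ("H", "H"): "H",
}


def infer_controversiality(
    h_mu: Dict[str, float], s_mu: Dict[str, float]
) -> Dict[str, float]:
    """Firing strength of each controversiality set (min for AND, max to combine)."""
    out = {"L": 0.0, "M": 0.0, "H": 0.0, "VH": 0.0}
    for (h_set, s_set), z in RULES.items():
        out[z] = max(out[z], min(h_mu[h_set], s_mu[s_set]))
    return out


# Hypothetical membership degrees of H and S for one requirement.
h_mu = {"L": 0.5, "M": 0.5, "H": 0.0}
s_mu = {"L": 0.0, "M": 0.25, "H": 0.75}
print(infer_controversiality(h_mu, s_mu))
```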

The inference of satisfaction

Using the average sentiment score and controversiality of a customer requirement as inputs, the customers’ satisfaction of this requirement can be determined by fuzzy logic. Since the sentiment scores are normalized to [1,5], the corresponding fuzzy sets are defined with very negative (VN), negative (N), neutral (M), positive (P), and very positive (VP). Their membership functions are shown in Figure 7. For the output variables, based on the experience of marketing experts, the fuzzy sets of satisfaction are set to extremely not satisfied (ENS), not satisfied (NS), satisfied (S), very satisfied (VS), and extremely satisfied (ES). Their membership functions are shown in Figure 8.

Figure 7.

Membership functions of sentiment.

Figure 8.

Membership functions of satisfaction.

In this work, “controversy” is introduced as a factor for the determination of customers’ degree of satisfaction with a certain requirement. If a customer requirement has a high average sentiment score with a single polarity distribution of sentiment scores, the customers have a consensus of positive comments. Otherwise, when there is no consensus in customers’ attitudes toward a requirement, even if it has a high average sentiment score, customers are not fully satisfied with this requirement. Based on the above analysis, 20 fuzzy rules are used to determine the satisfaction of a customer requirement, as shown in Table 2.

Table 2.

The fuzzy rules for the determination of satisfaction.

Controversiality (rows) \ Sentiment (columns)	VN	N	M	P	VP
L	ENS	NS	S	VS	ES
M	ENS	NS	S	VS	VS
H	ENS	NS	NS	S	S
VH	ENS	ENS	ENS	NS	S

VN: very negative; VP: very positive; ENS: extremely not satisfied; VS: very satisfied; ES: extremely satisfied; VH: very high.
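To show how the 20 rules in Table 2 can yield a satisfaction value in [0, 1], the following sketch applies min for AND, max for combining rules, and a weighted average for defuzzification. These operators, and in particular the representative values assigned to the five satisfaction sets, are assumptions made for illustration only; the membership functions of Figure 8 are not reproduced here. The name `satisfaction` is hypothetical.

```python
from typing import Dict

SENTIMENT_SETS = ["VN", "N", "M", "P", "VP"]
# Rows of Table 2: controversiality -> consequents in the order of SENTIMENT_SETS.
TABLE_2 = {
    "L": ["ENS", "NS", "S", "VS", "ES"],
    "M": ["ENS", "NS", "S", "VS", "VS"],
    "H": ["ENS", "NS", "NS", "S", "S"],
    "VH": ["ENS", "ENS", "ENS", "NS", "S"],
}
# Assumed representative value of each satisfaction set on [0, 1].
CENTERS = {"ENS": 0.0, "NS": 0.25, "S": 0.5, "VS": 0.75, "ES": 1.0}


def satisfaction(sent_mu: Dict[str, float], contr_mu: Dict[str, float]) -> float:
    """Crisp satisfaction from fuzzy sentiment and controversiality degrees."""
    strength = {name: 0.0 for name in CENTERS}
    for contr_set, row in TABLE_2.items():
        for sent_set, out in zip(SENTIMENT_SETS, row):
            fired = min(sent_mu.get(sent_set, 0.0), contr_mu.get(contr_set, 0.0))
            strength[out] = max(strength[out], fired)
    total = sum(strength.values())
    if total == 0:
        raise ValueError("no rule fired for the given membership degrees")
    return sum(CENTERS[name] * v for name, v in strength.items()) / total


print(satisfaction({"VP": 1.0}, {"L": 1.0}))   # consensus on a very positive score
print(satisfaction({"VP": 1.0}, {"VH": 1.0}))  # same score, very high controversy
```

The two calls illustrate the role of controversy: the same very positive average sentiment leads to a lower satisfaction value when the opinions are strongly divided.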

Evaluation of the importance of engineering characteristics

During the product design process, designers often develop a QFD matrix to interpret the correspondence between the requirements and the product’s engineering characteristics. According to statistics, 80% of product design is an adaptive design in which there are existing product design documents, including the QFD matrix. Therefore, based on a product’s QFD matrix, the importance of various customer requirements can be translated into that of all engineering characteristics.

The relationship between 21 customer requirements and 26 engineering characteristics of hair dryer products is given in Ulrich and Eppinger.³ Here, 12 types of customer requirements are selected and integrated for analysis. Moreover, 21 corresponding engineering characteristics and the relations between requirements and these engineering characteristics are listed, as shown in Figure 9. For example, when designers want to evaluate the importance of the engineering characteristic “supply front airflow” in product design, the frequency of mention and customer satisfaction of the four requirements of “reliability,” “price,” “noise,” and “weight” need to be considered. In addition, the weights of these four requirements to “supply front airflow” are 9, 4, 9, and 9.

Figure 9.

The QFD matrix of hair dryer product.

The frequency and sentiment scores of each customer requirement mentioned in the online reviews imply the importance of this requirement for product development, while the product QFD matrix reflects the product’s engineering characteristics that are closely related to the customer requirements. If an engineering characteristic of the product is exactly associated with the requirements that are of high concern by customers with low satisfaction, the improvement of this characteristic can effectively meet the market’s demands. Based on the above analysis, the evaluation of the importance of a product’s engineering characteristics is proposed to improve product design efficiency. The calculation formula is as follows

Importance_{i} = \sum_{r \in R_{i}} (1 - Satisfaction_{r}) \times Freq_{r} \times Weight_{r, i}

(4)

Here, $Importance_{i}$ is the quantitative indicator of the importance of the ith engineering characteristic, and $R_{i}$ is the set of customer requirements related to this characteristic in the QFD matrix. $Satisfaction_{r}$ refers to the degree of customer satisfaction with requirement r, which is determined by fuzzy logic and lies in the range [0,1]. The larger the value, the higher the customer satisfaction. There is a negative correlation between the importance of a requirement and its satisfaction. $Freq_{r}$ is the frequency of the customer requirement r mentioned in all reviews, and $Weight_{r, i}$ indicates the weight of the relationship between the requirement r and the engineering characteristic i in the QFD matrix. Therefore, evaluation of the importance of each product’s engineering characteristic requires checking all the related customer requirements with their weights in the QFD matrix. By calculating and ranking the importance of all engineering characteristics, designers can identify the focus of product design.
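Equation (4) can be illustrated for the engineering characteristic “supply front airflow.” The QFD weights 9, 4, 9, and 9 are those quoted above, whereas the satisfaction and frequency values below are invented for illustration and are not results of this study; the function name `importance` is hypothetical.

```python
from typing import Dict, Tuple


def importance(related: Dict[str, Tuple[float, float, float]]) -> float:
    """Sum of (1 - satisfaction) * frequency * weight over the related requirements."""
    return sum((1 - sat) * freq * weight for sat, freq, weight in related.values())


# requirement -> (satisfaction in [0, 1], mention frequency, QFD weight)
supply_front_airflow = {
    "reliability": (0.5, 0.25, 9),
    "price": (0.75, 0.5, 4),
    "noise": (0.25, 0.125, 9),
    "weight": (0.5, 0.125, 9),
}
print(importance(supply_front_airflow))
```

In this made-up example, "noise" contributes more than "weight" although both have the same frequency and QFD weight, because its satisfaction is lower.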

Experiments and analysis

In this section, hair dryers are chosen as research objects, and online reviews of multiple popular hair dryers from e-commerce platforms are collected. All of the reviews are manually annotated with customer requirement labels for training the classifiers for automated review annotations. Finally, the online reviews of a specific hair dryer are used to verify the proposed method for assessing the importance of each engineering characteristic in the product design based on customer requirements analysis.

Online review collection and annotation

All the customer reviews are crawled from Amazon (www.amazon.us). Using “hair dryer” as the search keyword, Amazon automatically sorts and displays the products of different brands and models according to their sales and ratings. Based on the ranking results and the number of customer reviews, four best-selling products from 4 well-known brands are selected: Revlon infrared hair dryer, Remington hair dryer with ionic + ceramic + tourmaline technology, Conair mid-size styler hair dryer, and Conair ionic ceramic hair dryer. The scale of the customer review data is listed in Table 3, which shows the number of reviews obtained for each product, the number of valid sentences after text cleaning and segmentation (sentences that contain only one word or only illegal characters are removed), the maximum sentence length of the valid sentences (the number of words in the longest sentence), and the average sentence length.

Table 3.

The review data of four hair dryer products.

Product name	Revlon	Remington	Conair mid-size	Conair ionic ceramic
Rate	4.3/5	4.2/5	4.2/5	4.3/5
ID	B00GZG4KSG	B00MMRFUG8	B001LQY1X4	B00132ZG3U
Number of reviews	470	819	1276	1982
Number of valid sentences	996	2012	2887	6050
Maximum length	93	69	77	77
Average length	14	11	10	12

After that, all the reviews of the four products listed in Table 3 are labeled by two graduate students from industrial design. During the annotation process, all the comment sentences are first labeled by the two annotators separately. If a comment sentence has different annotation results, the annotators negotiate with each other to determine the final labels of that sentence. The reviews of the four products are labeled by the 12 selected types of customer requirements. The annotation results are shown in Figure 10. The y-axis shows the number of sentences labeled to the corresponding customer requirements for each product.

Figure 10.

The manual annotation results for the four hair dryer products.

Performance analysis of the automated annotation

Based on the selected CNN architecture and parameter settings, the TensorFlow framework is used to construct the binary classification models for the automated annotation of hair dryer reviews with the 12 customer requirement labels. Two experiments are designed to validate the performance of the models and verify our hypothesis: “the classifiers that are trained with the review data set of some products belonging to a specific category can be used to label the customer reviews of the same type of product.”

Experiment 1

Customer reviews of a single hair dryer product are used to develop the classification model, and customer reviews of other hair dryer products are used to test the automated annotation performance. For the review data set shown in Table 3, product B00132ZG3U has the largest number of samples (6050 valid sentences). Therefore, it is selected as the training set, while the reviews of the other three products are used as the test set. The experimental results are shown in Table 4. Although the binary discriminant classifiers of the 12 customer requirements exhibit more than 93% accuracy for the test set, there are still some cases where the precision and recall are 0. The low precision and recall are due to the imbalance of positive and negative samples: even if a classifier labels all positive samples in the test set as negative samples, it can still achieve high accuracy. Moreover, in the review data of a single product, the customers may describe a particular requirement with a limited vocabulary. For example, the word “purple” is extensively used to comment on the appearance of the B00132ZG3U. However, the feature does not appear in the reviews of the other three products that were used as the test set. The singularity of the training set may lead to overfitting of the classifiers.

Table 4.

Performance of the classifier trained by the reviews of a single product.

Training set		Test set
B00132ZG3U		B00GZG4KSG			B00MMRFUG8			B001LQY1X4
Discriminative model	Time (s)	Acc.	Pre.	Rec.	Acc.	Pre.	Rec.	Acc.	Pre.	Rec.
Safety	30	0.980	0.778	0.28	0.986	0.5	0.276	0.988	0.842	0.34
Efficiency	61	0.930	0.771	0.679	0.963	0.676	0.754	0.964	0.538	0.788
Reliability	32	0.972	0	0	0.986	0	0	0.987	0	0
Appearance	33	0.978	0	0	0.992	1	0.414	0.997	1	0.1
Price	34	0.964	0.863	0.734	0.967	0.647	0.688	0.938	0.694	0.690
Noise	36	0.974	0.565	0.448	0.976	0.688	0.603	0.988	0.585	0.558
One-hand use	38	0.963	0.103	0.667	0.981	0.660	0.574	0.981	0.117	0.778
Temperature setting	117	0.971	0.639	0.590	0.968	0.775	0.769	0.952	0.715	0.456
Speed setting	41	0.984	0.636	0.368	0.980	0.639	0.767	0.978	0.239	0.630
Hand grip	43	0.979	1	0.16	0.995	0	0	0.993	1	0.269
Weight	46	0.978	0.763	0.849	0.976	0.789	0.71	0.982	0.790	0.832
Size	48	0.991	0.533	0.8	0.980	0.472	0.436	0.948	0.742	0.570

Experiment 2

Combining the customer review corpora of different products increases the diversity of the training data. As a result, the trained discriminant classifiers can perform better when labeling reviews of the same type of product. In this experiment, two sets of classifiers are trained. First, the reviews of B00132ZG3U, B00MMRFUG8, and B001LQY1X4 are used as the training set, while B00GZG4KSG’s customer reviews are used as the test set. Then, the reviews of B00132ZG3U, B00MMRFUG8, and B00GZG4KSG are used as the training set, while B001LQY1X4’s customer reviews are used as the test set. The experimental results are shown in Table 5. Compared with Table 4, the precision and recall improve for many of the classifiers, most notably for “reliability,” “appearance,” and “hand grip,” for which several entries in Table 4 were 0. It can be concluded that the low precision and recall caused by overfitting can be alleviated by using larger and more diverse training data.

Table 5.

Performance of the classifier trained by the reviews of multiple products.

Training set	B00132ZG3U, B00MMRFUG8, and B001LQY1X4				B00132ZG3U, B00MMRFUG8, and B00GZG4KSG
Test set	B00GZG4KSG				B001LQY1X4
Discriminative model	Time (s)	Acc.	Pre.	Rec.	Time (s)	Acc.	Pre.	Rec.
Safety	30	0.985	0.857	0.480	30	0.981	0.437	0.596
Efficiency	62	0.934	0.804	0.672	158	0.976	0.610	0.847
Reliability	32	0.969	0.440	0.393	31	0.985	0.379	0.297
Appearance	33	0.982	1	0.143	33	0.995	0.3	0.3
Price	297	0.967	0.859	0.777	35	0.947	0.736	0.741
Noise	36	0.973	0.533	0.552	71	0.990	0.683	0.651
One-hand use	36	0.944	0.083	0.833	74	0.976	0.093	0.777
Temperature setting	305	0.970	0.596	0.718	156	0.954	0.689	0.585
Speed setting	40	0.979	0.450	0.474	82	0.974	0.236	0.777
Hand grip	42	0.982	0.733	0.440	43	0.992	0.536	0.577
Weight	44	0.962	0.590	0.925	89	0.986	0.799	0.939
Size	138	0.991	0.533	0.800	49	0.944	0.707	0.540
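The hold-out scheme of Experiment 2 (train on three products, test on the fourth) can be sketched as follows. The sketch assumes that the labeled sentences are stored as `(product_id, sentence, labels)` tuples; the function name `split_by_product` and the toy records are hypothetical.

```python
def split_by_product(records, test_product):
    """Train on every product except `test_product`; test on that product only."""
    train = [r for r in records if r[0] != test_product]
    test = [r for r in records if r[0] == test_product]
    return train, test


# Toy records, one per product (illustrative only).
records = [
    ("B00132ZG3U", "Love the purple color.", {"appearance"}),
    ("B00GZG4KSG", "Dries my hair fast.", {"efficiency"}),
    ("B00MMRFUG8", "A bit heavy for me.", {"weight"}),
    ("B001LQY1X4", "Way too loud.", {"noise"}),
]

# The two held-out products used as test sets in Table 5.
for held_out in ("B00GZG4KSG", "B001LQY1X4"):
    train, test = split_by_product(records, held_out)
    train_ids = sorted({r[0] for r in train})
    print("test:", held_out, "train:", train_ids)
```

Splitting by product ID rather than by random sentence keeps all reviews of the test product unseen during training, which is what the hypothesis about same-category products requires.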

By comparing the results of the two experiments, it can be concluded that classifiers trained with the review data set of some products belonging to a specific category can be used to label the customer reviews of the same type of product. Therefore, if a sufficient number of customer reviews of typical and popular products in a certain category are collected and manually labeled as training data, classifiers can be trained for the automated requirement type annotation of customer reviews of other products in the same category. Because the classification model generalizes to reviews of products belonging to the same category, the cumbersome manual annotation can be reduced.

Case study of customer requirement analysis

In this section, the Revlon hair dryer (B000A3I2X4) is analyzed step by step based on its online reviews using the proposed method. First, each comment sentence of the reviews is labeled with the corresponding requirement types for the subsequent customer opinion mining and customer satisfaction estimation. The scale of the customer reviews of the hair dryer is shown in Table 6. There are 1341 reviews with 2006 valid sentences, whose average and maximum lengths are 14 and 83 words, respectively.

Table 6.

Customer reviews of the Revlon quick dry lightweight hair dryer.

Rating	Number of reviews	Number of valid sentences	Maximum length (words)	Average length (words)
3.9/5	1341	2006	83	14

To obtain the highest possible accuracy for the CNN short text classifier, it is necessary to provide as much training data as possible. Therefore, the manually labeled customer reviews of the B00GZG4KSG, B00MMRFUG8, B001LQY1X4, and B00132ZG3U are used as training sets to train 12 binary discriminant classifiers for the selected 12 customer requirements.

The customer reviews of the Revlon hair dryer are automatically annotated by the trained classifiers. The frequencies with which the 12 customer requirements are mentioned in the collected 2006 comment sentences are shown in Table 7. The four most frequently mentioned items are “price” (305 times), “weight” (295 times), “temperature setting” (215 times), and “efficiency” (163 times). The customer concerns reflected in the classification results are therefore consistent with the positioning of this product, as indicated by its name, “Revlon quick dry lightweight hair dryer.”

Table 7.

The frequencies of the 12 customer requirements mentioned in the reviews.

Customer needs	Number of sentences	Frequency (%)
Safety	75	6.06
Efficiency	163	13.2
Reliability	87	7.03
Appearance	30	2.42
Price	305	24.6
Noise	79	6.38
One-hand use	138	11.1
Temperature setting	215	17.4
Speed setting	106	8.56
Hand grip	20	1.62
Weight	295	23.8
Size	87	7.03

Then, sentiment analysis and controversy calculation are performed on the review data corresponding to each of the customer requirements. The average sentiment score, hard controversy, and soft controversy of the 12 customer requirements of the hair dryer are compared in Figure 11. Among all the requirement items, “appearance” has a very high sentiment score with relatively low soft and hard controversy, which indicates that customers are highly satisfied with the appearance of the hair dryer. By contrast, “reliability” has a low sentiment score and a high degree of controversy, which means that customers are not satisfied with the reliability of this product.

Figure 11.

Customers’ opinions of the 12 requirements.

Using the soft/hard controversy and average sentiment score of each requirement item as input, the degree of customer satisfaction with each requirement can be evaluated based on the fuzzy logic framework proposed in section “Satisfaction determination based on fuzzy logic.” The average sentiment score, the soft/hard controversy, the overall controversy inferred by fuzzy logic, and the evaluated satisfaction for each customer requirement are shown in Table 8. “Appearance” has the highest satisfaction with a value of 0.823, and “reliability” has the lowest satisfaction with a value of 0.086.

Table 8.

The obtained satisfaction of 12 requirements based on fuzzy logic.

Customer needs	Average sentiment score	Hard controversy (H)	Soft controversy (S)	Overall controversy	Satisfaction
Safety	2.83	0.8	0.764	0.792	0.121
Efficiency	3.42	0.837	0.709	0.866	0.207
Reliability	2.72	0.854	0.722	0.851	0.086
Appearance	4.73	0.412	0.204	0.193	0.823
Price	4.24	0.646	0.318	0.595	0.75
Noise	2.86	0.788	0.793	0.691	0.266
One-hand use	3.46	0.781	0.681	0.818	0.213
Temperature setting	3.36	0.796	0.738	0.82	0.196
Speed setting	3.6	0.757	0.721	0.769	0.319
Hand grip	4.3	0.55	0.414	0.396	0.751
Weight	3.74	0.75	0.613	0.758	0.349
Size	3.51	0.795	0.666	0.848	0.218
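As a quick check on Table 8, the sketch below loads the overall controversy and satisfaction columns, picks out the requirements with the highest and lowest satisfaction, and computes the Pearson correlation between the two columns. The correlation is only a descriptive aid added here; it is not part of the fuzzy inference itself.

```python
from statistics import mean

# (overall controversy, satisfaction) per requirement, copied from Table 8.
TABLE8 = {
    "safety": (0.792, 0.121),
    "efficiency": (0.866, 0.207),
    "reliability": (0.851, 0.086),
    "appearance": (0.193, 0.823),
    "price": (0.595, 0.75),
    "noise": (0.691, 0.266),
    "one-hand use": (0.818, 0.213),
    "temperature setting": (0.82, 0.196),
    "speed setting": (0.769, 0.319),
    "hand grip": (0.396, 0.751),
    "weight": (0.758, 0.349),
    "size": (0.848, 0.218),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


best = max(TABLE8, key=lambda k: TABLE8[k][1])
worst = min(TABLE8, key=lambda k: TABLE8[k][1])
r = pearson([v[0] for v in TABLE8.values()], [v[1] for v in TABLE8.values()])

print("highest satisfaction:", best)   # appearance
print("lowest satisfaction:", worst)   # reliability
print("correlation(controversy, satisfaction):", round(r, 2))
```

The correlation comes out negative, which matches the reading of the table: requirements with high controversy tend to receive low satisfaction.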

Then, based on the relationships between customer requirements and engineering characteristics in the QFD matrix shown in Figure 9, the importance of all engineering characteristics can be calculated with the formula given in section “Evaluation of the importance of engineering characteristics.” The importance of an engineering characteristic reflects how much attention the corresponding aspect of the product needs in the design process to maximize customer satisfaction. The importance of the 12 engineering characteristics of the Revlon hair dryer is ranked from high to low, as shown in Figure 12. Compared to other features, “heats front air flow” has the highest priority in the design process. In addition, “offers foldability,” “allows control of a burst,” and “supplies front air flow” also have very high importance. Therefore, if Revlon intends to develop an upgrade to its quick dry lightweight hair dryer, it should attach great importance to these four features to enhance the customer experience and meet the user’s needs.
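The general shape of this calculation can be sketched as a weighted sum over the QFD matrix. The sketch is not the formula of this paper: the weighting `frequency * (1 - satisfaction)` is an assumption made for illustration, and the relationship strengths below are hypothetical placeholders for the values of the QFD matrix in Figure 9. Only the frequencies (Table 7) and satisfaction values (Table 8) are taken from the case study.

```python
# Frequency (%) from Table 7 and satisfaction from Table 8 for three requirements.
requirements = {
    "efficiency": {"frequency": 13.2, "satisfaction": 0.207},
    "temperature setting": {"frequency": 17.4, "satisfaction": 0.196},
    "size": {"frequency": 7.03, "satisfaction": 0.218},
}

# HYPOTHETICAL relationship strengths (9 strong, 3 medium, 1 weak, 0 none).
# The real values are those of the QFD matrix in Figure 9.
relationships = {
    "heats front air flow": {"efficiency": 9, "temperature setting": 9, "size": 0},
    "supplies front air flow": {"efficiency": 9, "temperature setting": 1, "size": 0},
    "offers foldability": {"efficiency": 0, "temperature setting": 0, "size": 9},
}


def requirement_weight(frequency, satisfaction):
    # ASSUMED weighting: frequently mentioned, poorly satisfied requirements
    # count more. This is an illustration, not the paper's formula.
    return frequency * (1.0 - satisfaction)


def ec_importance(requirements, relationships):
    """Normalized weighted sum of relationship strengths per characteristic."""
    weights = {
        cr: requirement_weight(v["frequency"], v["satisfaction"])
        for cr, v in requirements.items()
    }
    raw = {
        ec: sum(weights[cr] * strength for cr, strength in row.items())
        for ec, row in relationships.items()
    }
    total = sum(raw.values())
    return {ec: value / total for ec, value in raw.items()}


importance = ec_importance(requirements, relationships)
for ec, value in sorted(importance.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{ec}: {value:.3f}")
```

With these placeholder strengths, a characteristic linked strongly to several frequently mentioned, poorly satisfied requirements rises to the top of the ranking, which is the behavior the weighted-sum form is meant to produce.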

Figure 12.

The rank of the importance of engineering characteristics based on customer requirement analysis.

To verify the effectiveness of this approach, we designed a questionnaire on the importance of the 21 engineering characteristics and surveyed 12 people who use this kind of hair dryer. As before, the importance is quantified by a score from 0 to 5. The returned results are tabulated in Table 9 (only the seven most important engineering characteristics are given). Table 9 shows that the proposed method is quite effective, even though there are some differences between the values for the seven most important engineering characteristics.

Table 9.

Review comments directly collected from real users.

	Heats front air flow	Offers foldability	Allows control of a burst	Supplies front air flow	Allows power control	Draws air in	Contains rear air flow
Total score	51	32	33	35	28	14	18
Averaged score	4.25	2.67	2.75	2.92	2.33	1.17	1.5
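The averaged scores in Table 9 follow directly from the total scores and the 12 respondents. The short check below recomputes them and ranks the characteristics by questionnaire score; the variable names are illustrative.

```python
RESPONDENTS = 12  # number of surveyed users reported in the text

# Total questionnaire scores copied from Table 9.
total_scores = {
    "heats front air flow": 51,
    "offers foldability": 32,
    "allows control of a burst": 33,
    "supplies front air flow": 35,
    "allows power control": 28,
    "draws air in": 14,
    "contains rear air flow": 18,
}

# Average score per characteristic, rounded as in Table 9.
averages = {
    name: round(total / RESPONDENTS, 2) for name, total in total_scores.items()
}
ranking = sorted(averages, key=averages.get, reverse=True)

print(averages["heats front air flow"])  # 4.25
print(ranking[:4])
# ['heats front air flow', 'supplies front air flow',
#  'allows control of a burst', 'offers foldability']
```

The four characteristics with the highest questionnaire scores are the same four that the text singles out from Figure 12, with “heats front air flow” first in both cases, although the order of the other three differs between the two lists as named.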

Conclusion

An automated customer opinion mining and analysis framework is proposed in this work. The customers’ attitudes and sentiment divergences of each aspect of the product are considered to determine the satisfaction of different customer requirements by fuzzy logic inference. The main contributions are as follows:

An aspect-level opinion mining method for the automated analysis of online reviews is proposed that can process a large number of online reviews efficiently and thus avoid time-consuming and labor-intensive manual analysis.

A fuzzy logic system is developed to infer the degree of satisfaction of each aspect of a product in which the hard and soft controversy indexes are also considered for measuring the sentiment divergences among customers. It helps to find the product attributes with negative opinions or a polarized distribution of opinions.

A systematic customer requirement analysis framework for market-driven product design is proposed. The importance of a product’s engineering characteristics is determined using the product’s QFD matrix through which the customer’s attitudes and preferences are used to guide the design.

A series of experiments is conducted with real hair dryer products and their online reviews. The experimental results show that the importance of different engineering characteristics can be analyzed based on customer requirements, which is useful for improving product design.

In the future, we will look for benchmark data sets to compare the opinion mining methods proposed in this work with other text classification or text annotation methods. We also intend to apply machine learning-based sentiment polarity classification to the data set used in this work. Finding the most suitable opinion mining method to improve the effectiveness of online review-based requirement analysis is thus a focus of future work. Moreover, mapping the comment text directly to the corresponding engineering characteristics of the product is another interesting topic, because it would allow customer satisfaction with the product’s engineering characteristics to be analyzed directly.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The authors appreciate the support from the National Key Technology Support Program (2018YFB1700901/3), the National Science Foundation of China (61672247, 61772247, and 61873236), and the public service projects of Zhejiang province (LGG18E050006).

ORCID ID

Yusheng Liu

