Data Mining for the Internet of Things: Literature Review and Challenges

Abstract

The massive data generated by the Internet of Things (IoT) are considered to be of high business value, and data mining algorithms can be applied to IoT to extract hidden information from these data. In this paper, we give a systematic review of data mining from the knowledge view, technique view, and application view, covering classification, clustering, association analysis, time series analysis, and outlier analysis. The latest application cases are also surveyed. As more and more devices are connected to IoT, large volumes of data must be analyzed, and the latest algorithms must be adapted to big data. We review these algorithms and discuss challenges and open research issues. Finally, a suggested big data mining system is proposed.

1. Introduction

The Internet of Things (IoT) and its relevant technologies can seamlessly integrate classical networks with networked instruments and devices. Since it appeared, IoT has played an essential role, covering everything from traditional equipment to general household objects [1], and it has attracted the attention of researchers from academia, industry, and government in recent years. There is a great vision that all things can be easily controlled and monitored, can be identified automatically by other things, can communicate with each other through the internet, and can even make decisions by themselves [2]. To make IoT smarter, many analysis technologies have been introduced into IoT; one of the most valuable is data mining.

Data mining involves discovering novel, interesting, and potentially useful patterns from large data sets and applying algorithms to the extraction of hidden information. Many other terms are used for data mining, for example, knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, and information harvesting [3]. The objective of any data mining process is to build an efficient predictive or descriptive model of a large amount of data that not only best fits or explains it, but is also able to generalize to new data [4]. Based on a broad view of data mining functionality, data mining is the process of discovering interesting knowledge from large amounts of data stored in either databases, data warehouses, or other information repositories.

On the basis of the definitions of data mining and of data mining functions, a typical data mining process includes the following steps (see Figure 1).

(i) Data preparation: prepare the data for mining. It includes 3 substeps: integrate data from various data sources and clean the noise from the data; extract some parts of the data into the data mining system; and preprocess the data to facilitate the data mining.

(ii) Data mining: apply algorithms to the data to find patterns and evaluate the patterns of discovered knowledge.

(iii) Data presentation: visualize the data and represent the mined knowledge to the user.

Figure 1

The data mining overview.
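The three steps listed above (data preparation, data mining, data presentation) can be sketched as a minimal pipeline. This is only an illustration: the function names, the toy sensor readings, and the choice of "most frequent value" as the mined pattern are all assumptions made for the example, not part of the surveyed process.

```python
from collections import Counter

def prepare(sources):
    """Data preparation: integrate sources, drop noisy (None) readings, round values."""
    merged = [r for src in sources for r in src]      # integrate data sources
    cleaned = [r for r in merged if r is not None]    # clean noise (missing readings)
    return [round(r) for r in cleaned]                # preprocess for mining

def mine(values):
    """Data mining: find the most frequent value as a trivial 'pattern'."""
    return Counter(values).most_common(1)[0]

def present(pattern):
    """Data presentation: render the mined pattern for the user."""
    value, count = pattern
    return f"most frequent reading: {value} (seen {count} times)"

# Hypothetical readings from two temperature sensors.
sensor_a = [20.1, 20.4, None, 21.6]
sensor_b = [19.8, None, 22.2]
report = present(mine(prepare([sensor_a, sensor_b])))
print(report)
```

In a real system each step would be far richer, but the data flow between the three stages stays the same.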

Data mining can be viewed from multiple dimensions.

(i) In the knowledge view, or data mining functions view, it includes characterization, discrimination, classification, clustering, association analysis, time series analysis, and outlier analysis.

(ii) In the utilized techniques view, it includes machine learning, statistics, pattern recognition, big data, support vector machine, rough set, neural networks, and evolutionary algorithms.

(iii) In the application view, it includes industry, telecommunication, banking, fraud analysis, biodata mining, stock market analysis, text mining, web mining, social network, and e-commerce [3].

A variety of studies focusing on the knowledge view, technique view, and application view can be found in the literature. However, no previous effort has been made to review the different views of data mining in a systematic way, especially now that big data [5–7], mobile internet, and the Internet of Things [8–10] are growing rapidly and some data mining researchers are shifting their attention from data mining to big data. Many kinds of data can be mined, for example, database data (relational databases, NoSQL databases), data warehouses, data streams, spatiotemporal data, time series, sequences, text and web data, multimedia [11], graphs, the World Wide Web, Internet of Things data [12–14], and legacy system logs. Motivated by this, in this paper we attempt to make a comprehensive survey of the important recent developments of data mining research. This survey focuses on the knowledge view, utilized techniques view, and application view of data mining. Our main contribution is that we select some well-known algorithms and study their strengths and limitations.

The contribution of this paper includes 3 parts. First, we propose a novel way to review data mining from the knowledge view, technique view, and application view. Second, we discuss the new characteristics of big data and analyze the challenges. Third, we propose a suggested big data mining system, which is valuable for readers who want to construct a big data mining system with open source technologies.

The rest of the paper is organized as follows. In Section 2 we survey the main data mining functions from the knowledge view and technology view, including classification, clustering, association analysis, time series analysis, and outlier analysis, and introduce the techniques that support these functions. In Section 3 we review data mining applications in e-commerce, industry, health care, and public service and discuss which knowledge and technology can be applied to them. In Section 4, IoT and big data are discussed comprehensively: the new technologies for mining big data for IoT are surveyed, the challenges of the big data era are reviewed, and a new big data mining system architecture for IoT is proposed. In Section 5 we give a conclusion.

2. Data Mining Functionalities

Data mining functionalities include classification, clustering, association analysis, time series analysis, and outlier analysis.

(i) Classification is the process of finding a set of models or functions that describe and distinguish data classes or concepts, for the purpose of predicting the class of objects whose class label is unknown.

(ii) Clustering analyzes data objects without consulting a known class model.

(iii) Association analysis is the discovery of association rules displaying attribute-value conditions that frequently occur together in a given set of data.

(iv) Time series analysis comprises methods and techniques for analyzing time series data in order to extract meaningful statistics and other characteristics of the data.

(v) Outlier analysis identifies data objects that deviate markedly from the general behavior or model of the rest of the data.

2.1. Classification

Classification is important for management of decision making. Given an object, assigning it to one of predefined target categories or classes is called classification. The goal of classification is to accurately predict the target class for each case in the data [15]. For example, a classification model could be used to identify loan applicants as low, medium, or high credit risks [16].

There are many methods to classify the data, including decision tree induction, frame-based or rule-based expert systems, hierarchical classification, neural networks, Bayesian network, and support vector machines (see Figure 2).

(i) A decision tree is a flow-chart-like tree structure in which internal nodes are drawn as rectangles and leaf nodes as ovals. Every internal node has two or more child nodes and contains a split, which tests the value of an expression of the attributes. Arcs from an internal node to its children are labeled with the distinct outcomes of the test. Each leaf node has a class label associated with it. Iterative Dichotomiser 3 (ID3) is a simple decision tree learning algorithm [17]. The C4.5 algorithm is an improved version of ID3 that uses gain ratio as its splitting criterion [18]. ID3 selects splits by information gain and handles only categorical attributes, whereas C4.5 also handles continuous attributes and missing values and prunes the resulting tree. SLIQ (Supervised Learning In Quest) handles large data sets with ease and with lower time complexity [19, 20]. SPRINT (Scalable Parallelizable Induction of Decision Tree algorithm) is also fast and highly scalable, and it places no storage constraint on larger data sets [21]. Further improvements have also been reported [22, 23]. Classification and Regression Trees (CART) is a nonparametric decision tree algorithm that produces either classification or regression trees, depending on whether the response variable is categorical or continuous. CHAID (chi-squared automatic interaction detector) and its later improvements [24] focus on dividing a data set into exclusive and exhaustive segments that differ with respect to the response variable.

(ii) The KNN (K-Nearest Neighbor) algorithm is derived from the Nearest Neighbor algorithm, which is designed to find the nearest point to the observed object. The main idea of the KNN algorithm is to find the K nearest points [25]. There are many improvements on the traditional KNN algorithm, such as the Wavelet Based K-Nearest Neighbor Partial Distance Search (WKPDS) algorithm [26], the Equal-Average Nearest Neighbor Search (ENNS) algorithm [27], the Equal-Average Equal-Norm Nearest Neighbor code word Search (EENNS) algorithm [28], the Equal-Average Equal-Variance Equal-Norm Nearest Neighbor Search (EEENNS) algorithm [29], and other improvements [30].

(iii) Bayesian networks are directed acyclic graphs whose nodes represent random variables in the Bayesian sense. Edges represent conditional dependencies; nodes which are not connected represent variables which are conditionally independent of each other. Classifiers based on Bayesian networks have many strengths, such as model interpretability and accommodation of complex data and classification problem settings [31]. The research includes naïve Bayes [32, 33], selective naïve Bayes [34], seminaïve Bayes [35], one-dependence Bayesian classifiers [36, 37], K-dependence Bayesian classifiers [38], Bayesian network-augmented naïve Bayes [39], unrestricted Bayesian classifiers [40], and Bayesian multinets [41].

(iv) A Support Vector Machine (SVM) is a supervised learning model, based on statistical learning theory, with associated learning algorithms that analyze data and recognize patterns. SVM produces a binary classifier, the so-called optimal separating hyperplane, through an extremely nonlinear mapping of the input vectors into a high-dimensional feature space [32]. SVM is widely used in text classification [33, 42], marketing, pattern recognition, and medical diagnosis [43]. Much further research has been done, including GSVM (granular support vector machines) [44–46], FSVM (fuzzy support vector machines) [47–49], TWSVMs (twin support vector machines) [50–52], VaR-SVM (value-at-risk support vector machines) [53], and RSVM (ranking support vector machines) [54].

Figure 2

The research structure of classification.
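To make the gain ratio criterion mentioned for C4.5 concrete, the following sketch computes information gain (the criterion commonly associated with ID3) and gain ratio for two candidate attributes. The toy loan data and all function names are assumptions made for illustration; this is not an implementation of either full algorithm.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_by(values, labels):
    """Group the class labels by the attribute value of each record."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return list(groups.values())

def information_gain(values, labels):
    """Entropy reduction obtained by splitting on the attribute."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in split_by(values, labels))
    return entropy(labels) - remainder

def gain_ratio(values, labels):
    """Information gain normalized by the split information of the attribute."""
    split_info = entropy(values)  # entropy of the partition sizes
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0

# Hypothetical loan applicants: both attributes separate the classes perfectly.
risk = ["high", "high", "low", "low"]
income = ["low", "low", "high", "high"]   # 2 distinct values
applicant_id = ["a", "b", "c", "d"]        # 4 distinct values (an identifier)

print(information_gain(income, risk), information_gain(applicant_id, risk))
print(gain_ratio(income, risk), gain_ratio(applicant_id, risk))
```

Both attributes have the same information gain here, but the gain ratio of the identifier-like attribute is halved, which illustrates why normalizing by split information discourages splits on many-valued attributes.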

2.2. Clustering

Clustering algorithms [55] divide data into meaningful groups (see Figure 3) so that patterns in the same group are similar in some sense and patterns in different groups are dissimilar in the same sense. Searching for clusters involves unsupervised learning [56]. In information retrieval, for example, a search engine clusters billions of web pages into different groups, such as news, reviews, videos, and audio. One straightforward example of a clustering problem is dividing points into different groups [16].

(i) Hierarchical clustering methods combine data objects into subgroups; those subgroups merge into larger, higher-level groups, and so forth, forming a hierarchy tree. Hierarchical clustering methods fall into two categories: agglomerative (bottom-up) and divisive (top-down) approaches. Agglomerative clustering starts with one-point clusters and recursively merges two or more of the clusters. Divisive clustering, in contrast, is a top-down strategy; it starts with a single cluster containing all data points and recursively splits that cluster into appropriate subclusters [57, 58]. CURE (Clustering Using Representatives) [59, 60] and SVD (Singular Value Decomposition) [61] are typical examples.

(ii) Partitioning algorithms discover clusters either by iteratively relocating points between subsets or by identifying areas heavily populated with data. Related research includes SNOB [62], MCLUST [63], k-medoids, and k-means [64, 65]. Density-based partitioning methods attempt to discover densely connected components of low-dimensional data, typically spatial data. Related research includes DBSCAN (Density Based Spatial Clustering of Applications with Noise) [66, 67]. Grid-based partitioning algorithms use hierarchical agglomeration as one phase of processing, performing space segmentation and then aggregating appropriate segments; related research includes BANG [68].

(iii) To handle categorical data, researchers recast data clustering as preclustering of items or categorical attribute values; typical research includes ROCK [69].

(iv) Scalable clustering research addresses the scalability problems of computing time and memory requirements; examples include DIGNET [70] and BIRCH [71].

(v) High-dimensional data clustering methods are designed to handle data with hundreds of attributes; examples include DFT [72] and MAFIA [73].

Figure 3

The research structure of clustering.
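As an illustration of a partitioning method that iteratively relocates points between subsets, the sketch below implements a bare-bones k-means loop. It is a simplified example under stated assumptions: the initial centroids are simply the first k points (real implementations use better seeding), the iteration count is fixed, and the sample points are invented.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=10):
    """Return (centroids, clusters) after a fixed number of relocation rounds."""
    centroids = [points[i] for i in range(k)]  # naive seeding: first k points (assumption)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical 2D points forming two well-separated groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (8.0, 9.0), (2.0, 1.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

With these points the two groups around (1, 1) and (9, 9) are recovered after the first round; on less separated data the result depends on the seeding.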

2.3. Association Analysis

Association rule mining [74] focuses on market basket analysis or transaction data analysis. It targets the discovery of rules showing attribute-value associations that occur frequently, and it also helps in generating more general and qualitative knowledge, which in turn helps in decision making [75]. The research structure of association analysis is shown in Figure 4.

(i) In the first category of association analysis algorithms, the data are processed sequentially. Apriori-based algorithms have been used to discover intratransaction associations and then discover associations; many extension algorithms exist. According to the data record format, they fall into 2 types: Horizontal Database Format Algorithms and Vertical Database Format Algorithms; typical algorithms include MSPS [76] and LAPIN-SPAM [77]. Pattern growth algorithms are more complex but can be faster on large volumes of data. The typical algorithm is FP-Growth [78].

(ii) In some areas, the data are a flow of events, and the problem is therefore to discover event patterns that frequently occur together. These algorithms divide into 2 groups: event-based algorithms and event-oriented algorithms; the typical algorithm is PROWL [79, 80].

(iii) Some algorithms have been developed to take advantage of distributed parallel computer systems, for example, Par-CSP [81].

Figure 4

The research structure of association analysis.
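The level-wise idea behind Apriori-style frequent itemset mining on market basket data can be sketched as follows. This is a minimal illustration, not one of the cited algorithms: the function name, the use of an absolute support count (`min_count`), and the toy baskets are assumptions.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: support count} for all itemsets seen in >= min_count transactions."""
    transactions = [frozenset(t) for t in transactions]

    def count(itemset):
        return sum(itemset <= t for t in transactions)

    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items if count(frozenset([i])) >= min_count}
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = count(itemset)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = {c for c in candidates if count(c) >= min_count}
    return frequent

# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diaper", "beer"},
    {"milk", "diaper", "beer"},
    {"bread", "milk", "diaper"},
    {"bread", "milk", "beer"},
]
frequent = apriori(baskets, min_count=3)
for itemset, support in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

Association rules such as "bread implies milk" would then be derived from the frequent itemsets by comparing their support counts; that step is omitted here.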

2.4. Time Series Analysis

A time series is a collection of temporal data objects; the characteristics of time series data include large data size, high dimensionality, and continuous updating. Commonly, a time series task relies on 3 components: representation, similarity measures, and indexing (see Figure 5) [82, 83].

(i) One of the major reasons for time series representation is to reduce the dimension. Representations divide into three categories: model based, non-data-adaptive, and data adaptive. Model based representations seek the parameters of an underlying model for a representation; important research works include ARMA [84] and time series bitmaps [85]. In non-data-adaptive representations, the parameters of the transformation remain the same for every time series regardless of its nature; related research includes DFT [86], wavelet functions [87], and PAA [72]. In data adaptive representations, the parameters of a transformation change according to the data available; related works include data adaptive versions of DFT [88] and PAA [89] and indexable PLA [90].

(ii) The similarity measure in time series analysis is typically carried out in an approximate manner; research directions include subsequence matching [91] and full sequence matching [92].

(iii) The indexing of time series is closely associated with the representation and similarity measure components; research topics include SAMs (Spatial Access Methods) and TS-Tree [93].

Figure 5

The research structure of time series analysis.
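As a small example of a non-data-adaptive representation, the sketch below reduces the dimension of a series with piecewise aggregate approximation (PAA), which replaces each equal-width segment by its mean. The function name and sample readings are assumptions, and the sketch only handles series whose length is divisible by the number of segments.

```python
def paa(series, segments):
    """Piecewise aggregate approximation: the mean of each equal-width segment."""
    n = len(series)
    if n % segments != 0:
        raise ValueError("this sketch assumes len(series) is divisible by segments")
    width = n // segments
    return [sum(series[i:i + width]) / width for i in range(0, n, width)]

# Hypothetical sensor readings: 8 values reduced to 2 and to 4 dimensions.
readings = [1.0, 3.0, 2.0, 4.0, 10.0, 12.0, 11.0, 13.0]
print(paa(readings, 2))
print(paa(readings, 4))
```

The segment boundaries are fixed in advance and do not depend on the data, which is what makes the representation non-data-adaptive.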

2.5. Other Analysis

Outlier detection refers to the problem of finding patterns in data that are very different from the rest of the data based on appropriate metrics. Such a pattern often contains useful information regarding abnormal behavior of the system described by the data. Distance-based algorithms calculate the distances among objects in the data with geometric interpretation. Density-based algorithms estimate the density distribution of the input space and then identify outliers as those lying in low density. Rough sets based algorithms introduce rough sets or fuzzy rough sets to identify outliers [94].
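A distance-based approach of the kind described above can be sketched by flagging points whose k-th nearest neighbor is farther away than a threshold. The function name, the values of `k` and `threshold`, and the sample points are assumptions chosen for illustration; the sketch requires Python 3.8 or later for `math.dist`.

```python
import math

def knn_distance_outliers(points, k, threshold):
    """Flag points whose distance to their k-th nearest neighbor exceeds threshold."""
    outliers = []
    for i, p in enumerate(points):
        # Distances from p to every other point, in ascending order.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        if dists[k - 1] > threshold:
            outliers.append(p)
    return outliers

# Hypothetical 2D readings: four clustered points and one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (8.0, 8.0)]
outliers = knn_distance_outliers(points, k=2, threshold=3.0)
print(outliers)
```

This brute-force version compares every pair of points, so it is only suitable for small data sets.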

3. Data Mining Applications

3.1. Data Mining in e-Commerce

Data mining enables businesses to understand the patterns hidden inside past purchase transactions, thus helping them plan and launch new marketing campaigns in a prompt and cost-effective way [95]. E-commerce is one of the most promising domains for data mining because data records, including customer data, product data, and users’ action log data, are plentiful; IT teams have rich data mining skills; and return on investment can be measured. Researchers leverage association analysis and clustering to provide insight into which product combinations were purchased, which encourages customers to purchase related products that they may have missed or overlooked. Users’ behaviors are monitored and analyzed to find similarities and patterns in Web surfing behavior so that the Web can be more successful in meeting user needs [96]. A complementary method of identifying potentially interesting content, called collaborative filtering or recommender systems [97–99], uses data on the preferences of a set of users; it leverages correlations between users and other similarity metrics to identify and cluster similar user profiles for the purpose of recommending informational items to users. Recommender systems have also been extended to social networks [100], education [101], academic libraries [102], and tourism [103].
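The collaborative filtering idea described above can be illustrated with a small user-based sketch: items are scored for a user by the ratings of other users, weighted by cosine similarity between rating profiles. The ratings, user names, and function names are all hypothetical, and cosine similarity is only one of many possible similarity metrics.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse rating profiles (dicts item -> rating)."""
    shared = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in shared)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def recommend(ratings, user, top_n=1):
    """Rank items the user has not rated by similarity-weighted ratings of other users."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical purchase ratings on a 1-5 scale.
ratings = {
    "alice": {"laptop": 5, "mouse": 4},
    "bob": {"laptop": 5, "mouse": 5, "keyboard": 5},
    "carol": {"novel": 5, "mouse": 1, "bookmark": 4},
}
suggestion = recommend(ratings, "alice")
print(suggestion)
```

Here the profile of "bob" is much closer to that of "alice" than the profile of "carol" is, so the item only "bob" has rated is ranked first.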

3.2. Data Mining in Industry

Data mining can greatly benefit industries such as retail, banking, and telecommunications; classification and clustering can be applied in this area [104].

One of the key success factors of insurance organizations and banks is the assessment of borrowers’ credit worthiness in advance during the credit evaluation process. Credit scoring becomes more and more important and several data mining methods are applied for credit scoring problem [105–107].

Retailers collect customer information, related transactions information, and product information to significantly improve accuracy of product demand forecasting, assortment optimization, product recommendation, and ranking across retailers and manufacturers [108, 109]. Researchers leverage SVM [110], support vector regression [111], or Bass model [112] to forecast the products’ demand.

3.3. Data Mining in Health Care

In health care, data mining is becoming increasingly popular, if not increasingly essential [113–118]. Heterogeneous medical data are generated day by day in various health care organizations, including data from payers and medicine providers, pharmaceutical information, prescription information, doctors’ notes, and clinical records. These data can be used for clinical text mining, predictive modeling [119], survival analysis, patient similarity analysis [120], and clustering, in order to improve care treatment [121] and reduce waste. In the health care area, association analysis, clustering, and outlier analysis can be applied [122, 123].

Treatment record data can be mined to explore ways to cut costs and deliver better medicine [124, 125]. Data mining can also be used to identify and understand high-cost patients [126], and it can be applied to the mass of data generated by millions of prescriptions, operations, and treatment courses to identify unusual patterns and uncover fraud [127, 128].

3.4. Data Mining in City Governance

In the public service area, data mining can be used to discover public needs, improve service performance, and support decision making with automated systems to decrease risks. Classification, clustering, and time series analysis can be developed to solve problems in this area.

E-government improves the quality of government service, cost savings, wider political participation, and more effective policies and programs [129, 130], and it has also been proposed as a solution for increasing citizen communication with government agencies and, ultimately, political trust [131]. A city incident information management system can integrate data mining methods to provide a comprehensive assessment of the impact of natural disasters on agricultural production, rank disaster-affected areas objectively, and assist governments in disaster preparation and resource allocation [132].

By using data analytics, researchers can predict which residents are likely to move away from the city [133], and it helps to infer which factors of city life and city services lead to a resident's decision to leave the city [134].

A major challenge for government and law enforcement is how to quickly analyze the growing volumes of crime data [135]. Researchers have introduced spatial data mining techniques to find association rules between crime hot spots and the spatial landscape [136]; other researchers leverage an enhanced k-means clustering algorithm to discover crime patterns and use semisupervised learning techniques for knowledge discovery and to help increase predictive accuracy [137]. Data mining can also be used to detect criminal identity deceptions by analyzing personal information such as name, address, date of birth, and social-security number [138] and to uncover previously unknown structural patterns in criminal networks [139].

In transport systems, data mining can be used for map refinement according to GPS traces [140–142], and, based on multiple users’ GPS trajectories, researchers discover interesting locations and classical travel sequences for location recommendation and travel recommendation [143].

3.5. Summary

The data mining application and most popular data mining functionalities can be summarized in Table 1.

Table 1

The data mining application and most popular data mining functionalities.

Application	Classification	Clustering	Association analysis	Time series analysis	Outlier analysis
e-commerce		✓	✓
Industry	✓	✓	✓
Health care		✓	✓		✓
City governance	✓	✓	✓	✓

4. Challenges and Open Research Issues in IoT and Big Data Era

With the rapid development of IoT, big data, and cloud computing, the most fundamental challenge is to explore the large volumes of data and extract useful information or knowledge for future actions [144]. The data in the IoT era can be considered big data; their key characteristics are as follows.

(i) Large volumes of data to read and write: the amount of data can reach TB (terabytes), or even PB (petabytes) and ZB (zettabytes), so we need to explore fast and effective mechanisms.

(ii) Heterogeneous data sources and data types to integrate: in the big data era, the data sources are diverse; for example, we need to integrate sensor data [145–147], camera data, social media data, and so on, and all these data differ in format: byte, binary, string, number, and so forth. We need to communicate with different types of devices and different systems and also need to extract data from web pages.

(iii) Complex knowledge to extract: the knowledge is deeply hidden in large volumes of data and is not straightforward, so we need to analyze the properties of the data and find the associations between different data.
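The integration of heterogeneous data types mentioned above can be illustrated with a small normalization step that converts readings arriving as numbers, strings, or raw bytes into a common numeric form. The function name, the sample readings, and in particular the byte layout (a 4-byte big-endian IEEE 754 float) are assumptions made for the example; real devices define their own encodings.

```python
import struct

def normalize(reading):
    """Convert a reading given as a number, a string, or raw bytes to a float."""
    if isinstance(reading, (int, float)):
        return float(reading)
    if isinstance(reading, str):
        return float(reading.strip())
    if isinstance(reading, (bytes, bytearray)):
        # Assumption: the device sends a 4-byte big-endian IEEE 754 float.
        return struct.unpack(">f", reading)[0]
    raise TypeError(f"unsupported reading type: {type(reading).__name__}")

# Hypothetical readings from three devices that use different formats.
raw = [21, "22.5 ", struct.pack(">f", 23.0)]
values = [normalize(r) for r in raw]
print(values)
```

Once all readings share one type, the mining steps downstream no longer need to know which device produced them.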

4.1. Challenges

IoT and big data bring many challenges: the quantity of data is large but the quality is low; the data come from different data sources and inherently possess a great many different types and representation forms; and the data are heterogeneous: structured, semistructured, and even entirely unstructured. We analyze the challenges in the areas of data extraction, data mining algorithms, and data mining systems. The challenges are summarized below.

(i) The first challenge is to access and extract large-scale data from different data storage locations. We need to deal with the variety, heterogeneity, and noise of the data; it is a big challenge to find faults in the data and even harder to correct them. In the data mining algorithms area, how to adapt traditional algorithms to the big data environment is a big challenge.

(ii) The second challenge is how to mine uncertain and incomplete data for big data applications. In data mining systems, an effective and secure solution for sharing data between different applications and systems is one of the most important challenges, since sensitive information, such as banking transactions and medical records, is a matter of concern.

4.2. Open Research Issues

In the big data era, there are some open research issues, including data checking, parallel programming models, and big data mining frameworks.

(i) There is much research on finding errors hidden in data, such as [148]. Data cleaning, filtering, and reduction mechanisms have also been introduced.

(ii) Parallel programming models have been introduced to data mining, and some algorithms have been adapted to them. Researchers have expanded existing data mining methods in many ways, including the efficiency improvement of single-source knowledge discovery methods, the design of data mining mechanisms from a multisource perspective, and the study of dynamic data mining methods and the analysis of stream data [149]. For example, parallel association rule mining [150, 151] and the parallel k-means algorithm based on the Hadoop platform are good practice. However, some algorithms have still not been adapted to parallel platforms, which constrains the application of data mining technology to big data platforms. This is a challenge for data mining researchers and also a promising direction.

(iii) The most important work for a big data mining system is to develop an efficient framework to support big data mining. In the big data mining framework, we need to consider the security of data, privacy, the data sharing mechanism, the growth of data size, and so forth. A well-designed data mining framework for big data is a very important direction and a big challenge.
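To show how an algorithm such as k-means fits a parallel programming model, the sketch below expresses one k-means iteration as a map phase and a reduce phase. It is plain Python that only mimics the MapReduce structure; it does not use the Hadoop API, and the function names, data splits, and starting centroids are assumptions.

```python
from collections import defaultdict

def map_phase(points, centroids):
    """Emit (nearest centroid index, point); each mapper could process one data split."""
    for p in points:
        idx = min(
            range(len(centroids)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
        )
        yield idx, p

def reduce_phase(pairs, centroids):
    """Average the points grouped under each key to obtain the updated centroids."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    return [
        tuple(sum(c) / len(groups[i]) for c in zip(*groups[i])) if groups[i] else centroids[i]
        for i in range(len(centroids))
    ]

# Hypothetical data already divided into two splits, plus starting centroids.
splits = [[(1.0, 1.0), (9.0, 9.0)], [(1.0, 3.0), (7.0, 9.0)]]
centroids = [(0.0, 0.0), (10.0, 10.0)]

# The map calls are independent per split, so they could run on different machines.
pairs = [pair for split in splits for pair in map_phase(split, centroids)]
new_centroids = reduce_phase(pairs, centroids)
print(new_centroids)
```

The assignment step parallelizes naturally because each point is handled independently; only the reduce step needs the grouped results, and the whole job is repeated once per k-means iteration.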

4.3. Recent Works of Big Data Mining System for IoT

In the data mining system area, many large companies, such as Facebook, Yahoo, and Twitter, benefit from and contribute to open source projects. Big data mining infrastructure includes the following.

(i) The Apache Mahout project implements a wide range of machine learning and data mining algorithms [152].

(ii) The R Project is a programming language and software environment designed for statistical computing and visualization [153].

(iii) The MOA project performs data mining in real time [154], and the SAMOA project [155] integrates MOA with Storm and S4.

(iv) Pegasus is a petascale graph mining library for the Hadoop platform [156].

Some researchers from IoT area also proposed big data mining system architectures for IoT, and these systems focus on the integration with devices and data mining technologies [157]. Figure 6 shows an architecture for the support of social network and cloud computing in IoT. They integrated the big data and KDD into the extraction, management and mining, and interpretation layers. The extraction layer maps onto the perception layer. Different from the traditional KDD, the extraction layer of the proposed framework also takes into consideration the behavior of agents for its devices [2].

Figure 6

Big data mining system for IoT.

4.4. Suggested System Architecture for IoT

Based on the survey of big data mining systems and IoT systems, we suggest a system architecture for IoT and big data mining. The system includes 5 layers, as shown in Figure 7.

(i) Devices: many IoT devices, such as sensors, RFID, cameras, and other devices, can be integrated into this system to perceive the world and generate data continuously.

(ii) Raw data: in the big data mining system, structured data, semistructured data, and unstructured data can be integrated.

(iii) Data gather: real-time data and batch data can be supported, and all data can be parsed, analyzed, and merged.

(iv) Data processing: many open source solutions are integrated, including Hadoop, HDFS, Storm, and Oozie.

(v) Service: data mining functions will be provided as services.

(vi) Security/privacy/standard: security, privacy, and standards are very important to a big data mining system. Security and privacy protect the data from unauthorized access and privacy disclosure. Big data mining system standards make data integration, sharing, and mining more open to third-party developers.

Figure 7

The suggested big data mining system.

5. Conclusions

The Internet of Things concept arises from the need to manage, automate, and explore all devices, instruments, and sensors in the world. In order to make wise decisions both for people and for the things in IoT, data mining technologies are integrated with IoT technologies for decision-making support and system optimization. Data mining involves discovering novel, interesting, and potentially useful patterns from data and applying algorithms to the extraction of hidden information. In this paper, we survey data mining from 3 different views: knowledge view, technique view, and application view. In the knowledge view, we review classification, clustering, association analysis, time series analysis, and outlier analysis. In the application view, we review typical data mining applications, including e-commerce, industry, health care, and public service. The technique view is discussed together with the knowledge view and application view. Big data is now a hot topic for data mining and IoT, so we also discuss the new characteristics of big data and analyze the challenges in the areas of data extraction, data mining algorithms, and data mining systems. Based on the survey of current research, a suggested big data mining system is proposed.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is partially supported by the National Natural Science Foundation of China (Grant nos. 61100066, 61262013, 61472283, and 61103185), the Open Fund of Guangdong Province Key Laboratory of Precision Equipment and Manufacturing Technology (no. PEMT1303), the Fok Ying-Tong Education Foundation, China (Grant no. 142006), and the Fundamental Research Funds for the Central Universities (Grant no. 2013KJ034). This project is also sponsored by the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry.

References

1. Jing, Vasilakos A. V., Wan, Qiu. Security of the internet of things: perspectives and challenges. Wireless Networks, 2014, 20(8), 2481–2501. doi:10.1007/s11276-014-0761-7

2. Tsai C.-W., Lai C.-F., Vasilakos A. V. Future internet of things: open issues and challenges. Wireless Networks, 2014, 20(8), 2201–2217. doi:10.1007/s11276-014-0731-0. Scopus 2-s2.0-84901572822

3. Jiawei, Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2011.

4. Mukhopadhyay, Maulik, Bandyopadhyay, Coello C. A. C. A survey of multiobjective evolutionary algorithms for data mining: part I. IEEE Transactions on Evolutionary Computation, 2014, 18(1), 4–19. doi:10.1109/tevc.2013.2290086. Scopus 2-s2.0-84893833215

5. Zhang, Chen, Mao, Leung. CAP: crowd activity prediction based on big data analysis. IEEE Network, 2014, 28(4), 52–57. doi:10.1109/mnet.2014.6863132

6. Chen, Mao, Liu. Big data: a survey. Mobile Networks and Applications, 2014, 19(2), 171–209. doi:10.1007/s11036-013-0489-0. Scopus 2-s2.0-84898796363

7. Chen, Mao, Zhang, Leung. Big Data: Related Technologies, Challenges and Future Prospects. SpringerBriefs in Computer Science. Springer, 2014.

8. Wan, Zhang, Sun, Lin, Zou, Cai. VCMIA: a novel architecture for integrating vehicular cyber-physical systems and mobile cloud computing. Mobile Networks and Applications, 2014, 19(2), 153–160. doi:10.1007/s11036-014-0499-6. Scopus 2-s2.0-84898828128

9. Rong X. H., Chen, Deng S. L. A large-scale device collaboration mechanism. Journal of Computer Research and Development, 2011, 48(9), 1589–1596. Scopus 2-s2.0-80053997391

10. Chen, Rong X.-H., Deng S.-L. A survey of device collaboration technology and system software. Acta Electronica Sinica, 2011, 39(2), 440–447. Scopus 2-s2.0-79955052781

11. Zhou, Chen, Zheng, Cui. Green multimedia communications over Internet of Things. Proceedings of the IEEE International Conference on Communications (ICC ′12), June 2012, Ottawa, Canada, 1948–1952. doi:10.1109/icc.2012.6363909. Scopus 2-s2.0-84871967365

12. Deng, Zhang J. W., Rong X. H., Chen. A model of large-scale Device Collaboration system based on PI-Calculus for green communication. Telecommunication Systems, 2013, 52(2), 1313–1326. doi:10.1007/s11235-011-9643-9. Scopus 2-s2.0-84879603230

13. Deng, Zhang J. W., Rong X. H., Chen. Modeling the large-scale device control system based on PI-Calculus. Advanced Science Letters, 2011, 4(6-7), 2374–2379. doi:10.1166/asl.2011.1398

2-s2.0-80051542301

14.

Zhang

Deng

Wan

Yan

Rong

Chen

A novel multimedia device ability matching technique for ubiquitous computing environments

EURASIP Journal on Wireless Communications and Networking 2013 2013 1, article 181 12

10.1186/1687-1499-2013-181

2-s2.0-84894120909

15.

Kesavaraj

Sukumaran

A study on classification techniques in data mining

Proceedings of the 4th International Conference on Computing, Communications and Networking Technologies (ICCCNT ′13)

July 2013

1 7

16.

Song

Analysis and acceleration of data mining algorithms on high performance reconfigurable computing platforms [Ph.D. thesis] 2011

Iowa State University

17.

Quinlan

J. R.

Induction of decision trees

Machine Learning 1986 1 1 81 106

10.1007/BF00116251

2-s2.0-33744584654

18.

Quinlan

J. R.

C4. 5: Programs for Machine Learning 1993 1

Morgan Kaufmann

19.

Mehta

Agrawal

Rissanen

SLIQ: A Fast Scalable Classifier for Data Mining 1996

Berlin, Germany

Springer

20.

Chandra

Varghese

P. P.

Fuzzy SLIQ decision tree algorithm

IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 2008 38 5 1294 1301

10.1109/tsmcb.2008.923529

2-s2.0-52349105458

21.

Shafer

Agrawal

Mehta

SPRINT: a scalable parallel classifier for data mining

Proceedings of 22nd International Conference on Very Large Data Bases

1996

544 555

22.

Polat

Güneş

A novel hybrid intelligent method based on C4.5 decision tree classifier and one-against-all approach for multi-class classification problems

Expert Systems with Applications 2009 36 2 1587 1592

10.1016/j.eswa.2007.11.051

2-s2.0-56349133338

23.

Ranka

Singh

CLOUDS: a decision tree classifier for large datasets

Proceedings of the 4th Knowledge Discovery and Data Mining Conference

1998

2 8

24.

van Diepen

Franses

P. H.

Evaluating chi-squared automatic interaction detection

Information Systems 2006 31 8 814 831

10.1016/j.is.2005.03.002

2-s2.0-33748678906

25.

Larose

D. T.

k-nearest neighbor algorithm

Discovering Knowledge in Data: An Introduction to Data Mining 2005

John Wiley & Sons

90 106

26.

Hwang

W.-J.

Wen

K.-W.

Fast kNN classification algorithm based on partial distance search

Electronics Letters 1998 34 21 2062 2063

10.1049/el:19981427

2-s2.0-3743053629

27.

Jeng-Shyang

Yu-Long

Sheng-He

Fast k-nearest neighbors classification algorithm

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences 2004 87 4 961 963

28.

Pan

J.-S.

Z.-M.

Sun

S.-H.

An efficient encoding algorithm for vector quantization based on subvector technique

IEEE Transactions on Image Processing 2003 12 3 265 270

10.1109/TIP.2003.810587

2-s2.0-0038192328

29.

Z.-M.

Sun

S.-H.

Equal-average equal-variance equal-norm nearest neighbor search algorithm for vector quantization

IEICE Transactions on Information and Systems 2003 86 3 660 663

2-s2.0-0038719254

30.

Tang

L. L.

Pan

J. S.

Guo

Chu

S. C.

Roddick

J. F.

A novel approach on behavior of sleepy lizards based on K-nearest neighbor algorithm

Social Networks: A Framework of Computational Intelligence 2014 526

Cham, Switzerland

Springer

287 311 Studies in Computational Intelligence

10.1007/978-3-319-02993-1_13

31.

Bielza

Larrañaga

Discrete bayesian network classifiers: a survey

ACM Computing Surveys 2014 47 1, article 5

10.1145/2576868

32.

Maron

M. E.

Kuhns

J. L.

On relevance, probabilistic indexing and information retrieval

Journal of the ACM 1960 7 3 216 244

10.1145/321033.321035

33.

Minsky

Steps toward artificial intelligence

Proceedings of the IRE 1961 49 1 8 30

MR0134428

34.

Langley

Sage

Induction of selective Bayesian classifiers

Proceedings of the 10th International Conference on Uncertainty in Artificial Intelligence

1994

399 406

35.

Kononenko

Semi-naive Bayesian classifier

Machine Learning—EWSL-91 1991 482

Berlin, Germany

Springer

206 219 Lecture Notes in Artificial Intelligence

10.1007/BFb0017015

MR1101397

36.

Zheng

Webb

G. I.

Tree Augmented Naive Bayes 2010

Berlin, Germany

Springer

37.

Jiang

Zhang

Cai

Learning tree augmented naive bayes for ranking

Proceedings of the 10th International Conference on Database Systems for Advanced Applications (DASFAA ′05)

2005

688 698

38.

Sahami

Learning limited dependence Bayesian classifiers

Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining

August 1996

Portland, Ore, USA

335 338

39.

Friedman

Learning belief networks in the presence of missing values and hidden variables

Proceedings of the 14th International Conference on Machine Learning

1997

125 133

40.

Lei

Ding

X. Q.

Wang

S. J.

Visual tracker using sequential Bayesian learning: discriminative, generative, and hybrid

IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 2008 38 6 1578 1591

10.1109/tsmcb.2008.928226

2-s2.0-57049142065

41.

Geiger

Heckerman

Knowledge representation and inference in similarity networks and Bayesian multinets

Artificial Intelligence 1996 82 1-2 45 74

10.1016/0004-3702(95)00014-3

MR1391056

2-s2.0-0030125397

42.

Joachims

Text categorization with support vector machines: learning with many relevant features

Machine Learning: ECML-98 1998 1398

Berlin, Germany

Springer

137 142

10.1007/bfb0026683

43.

Yingxin

Xiaogang

Feature selection for cancer classification based on support vector machine

Journal of Computer Research and Development 2005 42 10 1796 1801

44.

Tang

Jin

Sun

Zhang

Y.-Q.

Granular support vector machines for medical binary classification problems

Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB ′04)

October 2004

73 78

2-s2.0-17044427050

45.

Guo

H.-S.

Wang

W.-J.

Men

C.-Q.

A novel learning model-kernel granular support vector machine

Proceedings of the International Conference on Machine Learning and Cybernetics

July 2009

930 935

10.1109/icmlc.2009.5212413

2-s2.0-70350716755

46.

Lian

Huang

Wang

Long

Study on a GA-based SVM decision-tree multi-classification strategy

Acta Electronica Sinica 2008 36 8 1502 1507

47.

Lin

C.-F.

Wang

S.-D.

Fuzzy support vector machines

IEEE Transactions on Neural Networks 2002 13 2 464 471

10.1109/72.991432

2-s2.0-0036505650

48.

Huang

H.-P.

Liu

Y.-H.

Fuzzy support vector machines for pattern recognition and data mining

International Journal of Fuzzy Systems 2002 4 3 826 835

MR1933593

49.

Yan

W.-Y.

Multi-class fuzzy support vector machine based on dismissing margin

Proceedings of the International Conference on Machine Learning and Cybernetics

July 2009

1139 1144

10.1109/icmlc.2009.5212368

2-s2.0-70350708096

50.

Tian

Shi

Robust twin support vector machine for pattern classification

Pattern Recognition 2013 46 1 305 316

10.1016/j.patcog.2012.06.019

2-s2.0-84866023118

51.

Khemchandani

Chandra

Twin support vector machines for pattern classification

IEEE Transactions on Pattern Analysis and Machine Intelligence 2007 29 5 905 910

10.1109/TPAMI.2007.1068

2-s2.0-34047225880

52.

Tian

Shi

Structural twin support vector machine for classification

Knowledge-Based Systems 2013 43 74 81

10.1016/j.knosys.2013.01.008

2-s2.0-84875271568

53.

Tsyurmasto

Zabarankin

Uryasev

Value-at-risk support vector machine: stability to outliers

Journal of Combinatorial Optimization 2014 28 1 218 232

10.1007/s10878-013-9678-9

MR3215108

2-s2.0-84888793792

54.

Herbrich

Graepel

Obermayer

Large margin rank boundaries for ordinal regression

Advances in Neural Information Processing Systems 1999

MIT Press

115 132

55.

Jain

A. K.

Dubes

R. C.

Algorithms for Clustering Data 1988

Englewood Cliffs, NJ, USA

Prentice Hall

56.

Ansari

Chetlur

Prabhu

Kini

G. N.

Hegde

Hyder

An overview of clustering analysis techniques used in data mining

International Journal of Emerging Technology and Advanced Engineering 2013 3 12 284 286

57.

Srivastava

Shah

Valia

Swaminarayan

Data mining using hierarchical agglomerative clustering algorithm in distributed cloud computing environment

International Journal of Computer Theory and Engineering 2013 5 3 520 522

10.7763/ijcte.2013.v5.741

58.

Berkhin

A survey of clustering data mining techniques

Grouping Multidimensional Data 2006

Berlin, Germany

Springer

25 71

10.1007/3-540-28349-8_2

59.

Guha

Rastogi

Shim

CURE: an efficient clustering algorithm for large databases

ACM SIGMOD Record 1998 27 2 73 84

10.1145/276305.276312

60.

Guha

Rastogi

Shim

CURE: an efficient clustering algorithm for large databases

Information Systems 2001 26 1 35 58

10.1016/s0306-4379(01)00008-4

2-s2.0-0035279319

61.

Berry

M. W.

Browne

Understanding Search Engines: Mathematical Modeling and Text Retrieval 2005 17

SIAM

62.

Wallace

C. S.

Dowe

D. L.

Intrinsic classification by MML-the Snob program

Proceedings of the 7th Australian Joint Conference on Artificial Intelligence

1994

World Scientific

37 44

63.

Fraley

Raftery

A. E.

MCLUST version 3: an R package for normal mixture modeling and model-based clustering

DTIC Document 2006

64.

Broder

Garcia-Pueyo

Josifovski

Vassilvitskii

Venkatesan

Scalable K-Means by ranked retrieval

Proceedings of the 7th ACM International Conference on Web Search and Data Mining

Feburary 2014

233 242

65.

Wang

An efficient K-means clustering algorithm on MapReduce

Proceedings of the 19th International Conference on Database Systems for Advanced Applications (DASFAA ′14), Bali, Indonesia, April 2014 2014 8421

Springer International Publishing

357 371 Lecture Notes in Computer Science

10.1007/978-3-319-05810-8_24

66.

Agrawal

Soni

Sharma

Agrawal

Modification of density based spatial clustering algorithm for large database using naive's bayes' theorem

Proceedings of the 4th International Conference on Communication Systems and Network Technologies (CSNT ′14)

April 2014

Bhopal, India

419 423

10.1109/csnt.2014.89

67.

Ester

Kriegel

Sander

A density-based algorithm for discovering clusters in large spatial databases with noise

Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD ′96)

1996

Portland, Ore, USA

226 231

68.

Schikuta

Erhart

The BANG-clustering system: grid-based data analysis

Advances in Intelligent Data Analysis Reasoning about Data 1997 1280

Berlin, Germany

Springer

513 524 Lecture Notes in Computer Science

10.1007/bfb0052867

69.

Guha

Rastogi

Shim

ROCK: a robust clustering algorithm for categorical attributes

Proceedings of the 15th International Conference on Data Engineering (ICD ′99)

March 1999

512 521

2-s2.0-0032652570

70.

Thomopoulos

S. C. A.

Bougoulias

D. K.

Wann

C.-D.

Dignet: an unsupervised-learning clustering algorithm for clustering and data fusion

IEEE Transactions on Aerospace and Electronic Systems 1995 31 1 21 38

10.1109/7.366289

2-s2.0-0029221045

71.

Zhang

Ramakrishnan

Livny

BIRCH: a new data clustering algorithm and its applications

Data Mining and Knowledge Discovery 1997 1 2 141 182

10.1023/a:1009783824328

2-s2.0-21944442892

72.

Keogh

Chakrabarti

Pazzani

Mehrotra

Dimensionality reduction for fast similarity search in large time series databases

Knowledge and Information Systems 2001 3 3 263 286

73.

Nagesh

H. S.

Goil

Choudhary

A. N.

Adaptive grids for clustering massive data sets

Proceedings of the 1st SIAM International Conference on Data Mining (SDM ′01)

April 2001

Chicago, Ill, USA

1 17

74.

Agrawal

Imieliński

Swami

Mining association rules between sets of items in large databases

Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD ′93)

1993

207 216

10.1145/170035.170072

75.

Gosain

Bhugra

A comprehensive survey of association rules on quantitative data in data mining

Proceedings of the IEEE Conference on Information & Communication Technologies (ICT ′13)

April 2013

JeJu Island, Republic of Korea

1003 1008

10.1109/cict.2013.6558244

2-s2.0-84881620516

76.

Luo

Chung

S. M.

Efficient mining of maximal sequential patterns using multiple samples

Proceedings of the 5th SIAM International Conference on Data Mining (SDM ′05)

April 2005

415 426

2-s2.0-79959931599

77.

Yang

Kitsuregawa

LAPIN-SPAM: an improved algorithm for mining sequential pattern

Proceedings of the 21st International Conference on Data Engineering Workshops

April 2005

1222

10.1109/icde.2005.235

2-s2.0-33947172673

78.

Han

Pei

Mining frequent patterns by pattern-growth: methodology and implications

ACM SIGKDD Explorations Newsletter 2000 2 2 14 20

10.1145/380995.381002

79.

Huang

Chang

Lin

Prowl: an efficient frequent continuity mining algorithm on event sequences

Data Warehousing and Knowledge Discovery 2004 3181

Berlin, Germany

Springer

351 360 Lecture Notes in Computer Science

10.1007/978-3-540-30076-2_35

80.

Huang

K. Y.

Chang

C. H.

Efficient mining of frequent episodes from complex sequences

Information Systems 2008 33 1 96 114

10.1016/j.is.2007.07.003

2-s2.0-35748947774

81.

Cong

Han

Padua

Parallel mining of closed sequential patterns

Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ′05)

August 2005

562 567

10.1145/1081870.1081937

2-s2.0-32344453032

82.

T.-C.

A review on time series data mining

Engineering Applications of Artificial Intelligence 2011 24 1 164 181

10.1016/j.engappai.2010.09.007

2-s2.0-78649672225

83.

Esling

Agon

Time-series data mining

ACM Computing Surveys 2012 45 1, article 12 34

10.1145/2379776.2379788

2-s2.0-84871210043

84.

Kalpakis

Gada

Puttagunta

Distance measures for effective clustering of ARIMA time-series

Proceedings of the IEEE International Conference on Data Mining (ICDM ′01)

December 2001

San Jose, Calif, USA

273 280

2-s2.0-78149299418

10.1109/ICDM.2001.989529

85.

Kumar

Lolla

V. N.

Keogh

Lonardi

Ratanamahatana

C. A.

Wei

Time-series bitmaps: a practical visualization tool for working with large time series databases

Proceedings of the 5th SIAM International Conference on Data Mining (SDM ′05)

April 2005

531 535

2-s2.0-84880090937

86.

Chan

F. K.-P.

A. W.-C.

Haar wavelets for efficient similarity search of time-series: with and without time warping

IEEE Transactions on Knowledge and Data Engineering 2003 15 3 686 705

10.1109/tkde.2003.1198399

2-s2.0-0038294452

87.

Shasha

D. E.

Zhu

High Performance Discovery in Time Series: Techniques and Case Studies 2004

Springer

10.1007/978-1-4757-4046-2

MR2079948

88.

Vlachos

Gunopulos

Das

Indexing time-series under conditions of noise

Data Mining in Time Series Databases 2004 57

World Scientific

67 100 Series in Machine Perception and Artificial Intelligence

10.1142/9789812565402_0004

89.

Megalooikonomou

Wang

A dimensionality reduction technique for efficient similarity analysis of time series databases

Proceedings of the 13th ACM International Conference on Information and Knowledge Management (CIKM ′04)

November 2004

Washington, DC, USA

160 161

2-s2.0-18744416014

10.1145/1031171.1031203

90.

Chen

Lian

Liu

J. X.

Indexable PLA for efficient similarity search

Proceedings of the 33rd International Conference on Very Large Data Bases

September 2007

Vienna, Austria

435 446

91.

Dong

X. L.

C. K.

Wang

Z. O.

Research on shape-based time series similarity measure

Proceedings of the International Conference on Machine Learning and Cybernetics

August 2006

1253 1258

10.1109/icmlc.2006.258648

2-s2.0-33947269937

92.

Megalooikonomou

Wang

Faloutsos

A multiresolution symbolic representation of time series

Proceedings of the 21st International Conference on Data Engineering (ICDE ′05)

April 2005

668 679

10.1109/icde.2005.10

2-s2.0-28444432990

93.

Assent

Krieger

Afschari

Seidl

The TS-tree: efficient time series search and retrieval

Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology (EDBT ′08)

2008

252 263

94.

Gogoi

Bhattacharyya

D. K.

Borah

Kalita

J. K.

A survey of outlier detection methods in network anomaly identification

The Computer Journal 2011 54 4 570 588

10.1093/comjnl/bxr026

2-s2.0-79953811849

95.

Mishra

Padhy

Panigrahi

The survey of data mining applications and feature scope

Asian Journal of Computer Science & Information Technology 2013 2, article 4

96.

Heer

Chi

E. H.

Identification of web user traffic composition using multi-modal clustering and information scent

Proceedings of the Workshop on Web Mining, SIAM Conference on Data Mining

2001

51 58

97.

Resnick

Varian

H. R.

Recommender systems

Communications of the ACM 1997 40 3 56 58

2-s2.0-0031104254

98.

Breese

J. S.

Heckerman

Kadie

Empirical analysis of predictive algorithms for collaborative filtering

Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI ′98)

1998

43 52

99.

Nikolay

Anindya

Panagiotis

G. I.

Deriving the pricing power of product features by mining consumer reviews

Management Science 2011 57 8 1485 1509

10.1287/mnsc.1110.1370

2-s2.0-80051648956

100.

Guy

Tutorial on social recommender systems

Proceedings of the 23rd International World Wide Web Conference (WWW ′14)

2014

Seoul, Republic of Korea

195 196

101.

Konstan

J. A.

Walker

J. D.

Brooks

D. C.

Brown

Ekstrand

M. D.

Teaching recommender systems at large scale: evaluation and lessons learned from a hybrid MOOC

Proceedings of the 1st ACM Conference on Learning @ Scale Conference (L@S ′14)

March 2014

61 70

10.1145/2556325.2566244

2-s2.0-84899679149

102.

Tejeda-Lorente

Bernabé-Moreno

Porcel

Herrera-Viedma

Integrating quality criteria in a fuzzy linguistic recommender system for digital libraries

Procedia Computer Science 2014 31 1036 1043

10.1016/j.procs.2014.05.357

103.

Gavalas

Konstantopoulos

Mastakas

Pantziou

Mobile recommender systems in tourism

Journal of Network and Computer Applications 2014 39 1 319 333

10.1016/j.jnca.2013.04.006

2-s2.0-84893733531

104.

Elgendy

Elragal

Big data analytics: a literature review paper

Advances in Data Mining. Applications and Theoretical Aspects 2014 8557

Cham, Switzerland

Springer

214 227 Lecture Notes in Computer Science

10.1007/978-3-319-08976-8_16

105.

Koh

H. C.

Tan

W. C.

Goh

C. P.

A two-step method to construct credit scoring models with data mining techniques

International Journal of Business and Information 2006 1 1 96 118

106.

Hsieh

N. C.

Hung

L. P.

A data driven ensemble classifier for credit scoring analysis

Expert Systems with Applications 2010 37 1 534 545

10.1016/j.eswa.2009.05.059

2-s2.0-70349580621

107.

Kambal

Osman

Taha

Mohammed

Credit scoring using data mining techniques with particular reference to Sudanese banks

Proceedings of the 1st IEEE International Conference on Computing, Electrical and Electronics Engineering (ICCEEE ′13)

August 2013

378 383

10.1109/icceee.2013.6633966

2-s2.0-84889569998

108.

Liu

Wan

Zhou

Cloud manufacturing service system for industrial-cluster-oriented application

Journal of Internet Technology 2014 15 3 373 380

109.

Maaß

Spruit

de Waal

Improving short-term demand forecasting for short-lifecycle consumer products with data mining techniques

Decision Analytics 2014 1 1 1 17

110.

X. F.

Leung

S. C. H.

Zhang

J. L.

Lai

K. K.

Demand forecasting of perishable farm products using support vector machine

International Journal of Systems Science 2013 44 3 556 567

10.1080/00207721.2011.617888

2-s2.0-84870610658

111.

C.-J.

Wang

Y.-W.

Combining independent component analysis and growing hierarchical self-organizing maps with support vector regression in product demand forecasting

International Journal of Production Economics 2010 128 2 603 613

10.1016/j.ijpe.2010.07.004

2-s2.0-78049295934

112.

Lee

Kim

S. G.

Park

H.-W.

Kang

Pre-launch new product demand forecasting using the Bass model: a statistical and machine learning-based approach

Technological Forecasting and Social Change 2013 86 49 64

10.1016/j.techfore.2013.08.020

2-s2.0-84883881196

113.

Chen

Gonzalez

Leung

Zhang

A 2G-RFID-based e-healthcare system

IEEE Wireless Communications 2010 17 1 37 43

10.1109/mwc.2010.5416348

2-s2.0-77649143550

114.

Liu

Wan

Zhang

E-healthcare supported by big data

ZTE Communications 2014 12 3 46 52

115.

Chen

Wang

Mau

D. O.

Song

Enabling comfortable sports therapy for patient: a novel lightweight durable and portable ECG monitoring system

Proceedings of the IEEE 15th International Conference on e-Health Networking, Applications and Services (Healthcom ′13)

October 2013

Lisbon, Portugal

IEEE

271 273

10.1109/healthcom.2013.6720681

2-s2.0-84894166007

116.

Liu

Wang

Wan

Xiong

Zeng

Towards key issues of disaster aid based on Wireless Body Area Networks

KSII Transactions on Internet and Information Systems 2013 7 5 1014 1035

10.3837/tiis.2013.05.005

2-s2.0-84878463160

117.

Chen

NDNC-BAN: supporting rich media healthcare services via named data networking in cloud-assisted wireless body area networks

Information Sciences 2014 284 10 142 156

10.1016/j.ins.2014.06.023

118.

Chen

Mau

D. O.

Wang

The virtue of sharing: efficient content delivery in wireless body area networks for ubiquitous healthcare

Proceedings of the IEEE 15th International Conference on e-Health Networking, Applications & Services (Healthcom '13)

October 2013

Lisbon, Portugal

669 673

10.1109/HealthCom.2013.6720760

119.

Wan

Zou

Ullah

Lai

C.-F.

Zhou

Wang

Cloud-enabled wireless body area networks for pervasive healthcare

IEEE Network 2013 27 5 56 61

10.1109/MNET.2013.6616116

2-s2.0-84885589116

120.

Duan

Street

W. N.

Healthcare information systems: data mining methods in the creation of a clinical recommender system

Enterprise Information Systems 2011 5 2 169 181

10.1080/17517575.2010.541287

2-s2.0-79952445361

121.

Schuerenberg

B. K.

An information excavation. Las Vegas payer uses data mining software to improve HEDIS reporting and provider profiling

Health Data Management 2003 11 6 80 82

2-s2.0-0043267751

122.

Sun

Reddy

C. K.

Big data analytics for healthcare

Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 2013

Chicago, Ill, USA

1525

10.1145/2487575.2506178

123.

Kincade

Data mining: digging for healthcare gold

Insurance & Technology 1998 23 2 2 7

124.

Bellazzi

Zupan

Predictive data mining in clinical medicine: current issues and guidelines

International Journal of Medical Informatics 2008 77 2 81 97

10.1016/j.ijmedinf.2006.11.006

2-s2.0-37249089420

125.

Liu

Pan

Wang

Lin

Shen

Yang

Luo

Cao

Component analysis of Chinese medicine and advances in fuming-washing therapy for knee osteoarthritis via unsupervised data mining methods

Journal of Traditional Chinese Medicine 2013 33 5 686 691

10.1016/s0254-6272(14)60043-1

2-s2.0-84887756703

126.

Silver

Sakata

H. C.

Herman

Dolins

S. B.

O'Shea

M. J.

Case study: how to apply data mining techniques in a healthcare data warehouse

Journal of Healthcare Information Management 2001 15 2 155 164

2-s2.0-0035380848

127.

Koh

H. C.

Tan

Data mining applications in healthcare

Journal of Healthcare Information Management 2011 19 2 65

128.

Thornton

Mueller

R. M.

Schoutsen

van Hillegersberg

Predicting healthcare fraud in medicaid: a multidimensional data model and analysis techniques for fraud detection

Procedia Technology 2013 9 1252 1264

129.

Helbig

Gil-García

J. R.

Ferro

Understanding the complexity of electronic government: implications from the digital divide literature

Government Information Quarterly 2009 26 1 89 97

10.1016/j.giq.2008.05.004

2-s2.0-56849130203

130.

Wan

Zou

Zhou

M2M communications for smart city: an event-based architecture

Proceedings of the IEEE 12th International Conference on Computer and Information Technology (CIT ′12)

October 2012

Chengdu, China

895 900

10.1109/cit.2012.188

2-s2.0-84872352579

131.

Chadwick

May

Interaction between states and citizens in the age of the internet: ‘e-government’ in the United States, Britain, and the European Union

Governance 2003 16 2 271 300

10.1111/1468-0491.00216

2-s2.0-0037787917

132.

Peng

Zhang

Tang

An incident information management framework based on data integration, data mining, and multi-criteria decision making

Decision Support Systems 2011 51 2 316 327

10.1016/j.dss.2010.11.025

2-s2.0-79953753719

133.

Sullivan

Mitra

Community issues in American metropolitan cities: a data mining case study

Journal of Cases on Information Technology 2014 16 1 23 39

10.4018/jcit.2014010103

134.

Chen

Towards smart city: M2M communications with software agent intelligence

Multimedia Tools and Applications 2013 67 1 167 178

10.1007/s11042-012-1013-4

2-s2.0-84881146585

135.

Chen

Chung

J. J.

Wang

Qin

Chau

Crime data mining: a general framework and some examples

Computer 2004 37 4 50 56

10.1109/mc.2004.1297301

2-s2.0-1942500388

136.

Huang

A study of the application of data mining on the spatial landscape allocation of crime hot spots

Geo-Informatics in Resource Management and Sustainable Ecosystem 2013 398

Berlin, Germany

Springer

1274 286 Communications in Computer and Information Science

10.1007/978-3-642-45025-9_29

137.

Shyam

V. N.

Crime pattern detection using data mining

Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology Workshops (WI-IAT ′06)

December 2006

Hong Kong

41 44

10.1109/WI-IATW.2006.55

138.

Wang

Chen

Atabakhsh

Automatically detecting deceptive criminal identities

Communications of the ACM 2004 47 3 70 76

10.1145/971617.971618

2-s2.0-1942443495

139.

Chen

Chung

Qin

Chau

J. J.

Wang

Zheng

Atabakhsh

Crime data mining: an overview and case studies

Proceedings of the Annual National Conference on Digital Government Research

2003

1 5

140.

Cao

Cong

Jensen

C. S.

Mining significant semantic locations from GPS data

Proceedings of the VLDB Endowment 2010 3 1-2 1009 1020

141.

Wan

Zhang

Zhao

Yang

L. T.

Lloret

Context-aware vehicular cyber-physical systems with cloud support: architecture, challenges, and solutions

IEEE Communications Magazine 2014 52 8 106 113

10.1109/mcom.2014.6871677

142.

Schroedl

Wagstaff

Rogers

Langley

Wilson

Mining GPS traces for map refinement

Data Mining and Knowledge Discovery 2004 9 1 59 87

10.1023/b:dami.0000026904.74892.89

MR2055555

2-s2.0-3543072870

143.

Zheng

Zhang

Xie

Mining interesting locations and travel sequences from GPS trajectories

Proceedings of 18th International Conference on World Wide Web

2009

791 800

144.

Chen

Huang

Zhu

A survey of mass data mining based on cloud-computing

Proceedings of the International Conference on Anti-Counterfeiting, Security and Identification (ASID ′12)

August 2012

1 4

10.1109/icasid.2012.6325353

2-s2.0-84870624640

145.

Sun

Han

Yan

P. S.

Mining knowledge from interconnected data: a heterogeneous information network analysis approach

Proceedings of the VLDB Endowment

2012

2022 2023

146.

Chen

Yang

L. T.

Kwon

Zhou

Itinerary planning for energy-efficient agent communications in wireless sensor networks

IEEE Transactions on Vehicular Technology 2011 60 7 3290 3299

10.1109/TVT.2011.2134116

2-s2.0-80052854703

147.

Zhang

Wan

Liu

Guan

Liang

A taxonomy of agent technologies for ubiquitous computing environments

KSII Transactions on Internet and Information Systems 2012 6 2 547 565

10.3837/tiis.2012.02.006

2-s2.0-84862173924

148.

Chen

Leung

V. C. M.

Mao

Directional controlled fusion in wireless sensor networks

Mobile Networks and Applications 2009 14 2 220 229

10.1007/s11036-008-0133-6

2-s2.0-62249146158

149.

Zhu

G.-Q.

Ding

Data mining with big data

IEEE Transactions on Knowledge and Data Engineering 2014 26 1 97 107

10.1109/TKDE.2013.109

2-s2.0-84890419941

150.

Zhang

Synthesizing high-frequency rules from different data sources

IEEE Transactions on Knowledge and Data Engineering 2003 15 2 353 367

10.1109/TKDE.2003.1185839

2-s2.0-0037339975

151.

Huang

Zhang

A logical framework for identifying quality knowledge from different data sources

Decision Support Systems 2006 42 3 1673 1683

10.1016/j.dss.2006.02.012

2-s2.0-33750493303

152.

Owen

Anil

Dunning

Friedman

Mahout in Action 2011

Manning

153.

R Development Core Team R: A Language, and Environment for Statistical Computing 2012

Vienna, Austria

R Foundation for Statistical Computing

154.

Bifet

Holmes

Kirkby

Pfahringer

Moa: massive online analysis

The Journal of Machine Learning Research 2010 11 1601 1604

2-s2.0-77953527363

155.

de Francisci Morales

SAMOA: a platform for mining big data streams

Proceedings of the 22nd International Conference on World Wide Web (WWW ′13)

May 2013

777 778

2-s2.0-84893053113

156.

Kang

Chau

D. H.

Faloutsos

Pegasus: mining billion-scale graphs in the cloud

Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ′12)

March 2012

Kyoto, Japan

5341 5344

10.1109/icassp.2012.6289127

2-s2.0-84867619547

157.

da Silva

W. M.

Alvaro

Tomas

G. H. R. P.

Afonso

R. A.

Dias

K. L.

Garcia

V. C.

Smart cities software architectures: a survey

Proceedings of the 28th Annual ACM Symposium on Applied Computing (SAC ′13)

March 2013

1722 1727

10.1145/2480362.2480688

2-s2.0-84877940180