Big Data in Future Sensing

Abstract

With the exponentially increasing number of data-generating devices, such as computers, smartphones, tablets, and sensors, the vast amount of data to be processed, that is, “big data,” has become an important concern. The utilization of big data is transforming science, engineering, medicine, healthcare, finance, business, and ultimately society as a whole. This special issue provides a leading forum for disseminating the latest research, development, and applications of big data in future sensing. It solicited high-quality original research papers (including surveys and reviews) on the sensing aspects of big data, with emphasis on the 5Vs (Volume, Velocity, Variety, Value, and Veracity): big data science and foundations, big data infrastructure, big data management, big data searching and mining, big data privacy/security, and big data applications in sensing. The result is a collection of nine outstanding articles submitted by investigators representing fifteen institutions across Asia, Europe, and North America.

T. Zhu et al. summarized the state-of-the-art technologies, as well as the research opportunities and challenges, in big data sensing in “Emergent Technologies in Big Data Sensing: A Survey.” Specifically, they highlight various big data techniques (e.g., platform development, data processing, and security and privacy-preserving techniques). F. Zou et al. proposed a malware detection approach based on DNS graph mining in “Detecting Malware Based on DNS Graph Mining.” They evaluated the approach on a real-world dataset collected from campus DNS servers over three months and built a DNS graph consisting of 19,340,820 vertices and 24,277,564 edges. On this graph, their approach achieves a true positive rate of 80.63% with a false positive rate of 0.023%; at a false positive rate of 1.20%, the true positive rate rises to 95.66%.
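Both reported figures are ratios of confusion-matrix counts, and the two operating points presumably correspond to different detection thresholds. The sketch below is not taken from the paper: it computes the true and false positive rates at two thresholds on a small, made-up set of detection scores (the scores, labels, thresholds, and the helper names `rates` and `sweep` are all hypothetical) to show how a looser threshold buys a higher true positive rate at the cost of more false positives.

```python
def rates(tp, fn, fp, tn):
    """Return (true positive rate, false positive rate) from confusion-matrix counts."""
    tpr = tp / (tp + fn)  # share of malicious items correctly flagged
    fpr = fp / (fp + tn)  # share of benign items wrongly flagged
    return tpr, fpr


def sweep(scores, labels, thresholds):
    """Flag items whose score >= threshold; return (threshold, TPR, FPR) per threshold.

    labels: 1 for malicious, 0 for benign (hypothetical ground truth).
    """
    results = []
    for t in thresholds:
        tp = fn = fp = tn = 0
        for s, y in zip(scores, labels):
            flagged = s >= t
            if y == 1:
                tp += int(flagged)
                fn += int(not flagged)
            else:
                fp += int(flagged)
                tn += int(not flagged)
        results.append((t, *rates(tp, fn, fp, tn)))
    return results


# Hypothetical detection scores for four malicious and four benign domains.
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
for t, tpr, fpr in sweep(scores, labels, [0.75, 0.35]):
    print(f"threshold={t}: TPR={tpr:.2f}, FPR={fpr:.2f}")
```

With these toy values, the stricter threshold of 0.75 gives a TPR of 0.50 and an FPR of 0.00, while the looser threshold of 0.35 gives a TPR of 1.00 and an FPR of 0.25, the same direction of trade-off as the two operating points reported above.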

L. Jiang et al. proposed a heterogeneous data integration model based on retrospective audit, which locates the original data source and matches the data, in “A Multisource Retrospective Audit Method for Data Quality Optimization and Evaluation.” To improve the quality of the integrated data, they propose a retrospective audit model and associative audit rules that fix incomplete and incorrect data from multiple heterogeneous data sources. The model consists of four modules: original heterogeneous data, data structure, data processing, and data retrospective audit. Assessment criteria such as redundancy, sparsity, and accuracy are defined to evaluate the quality of the optimized data.

L. Chen et al. proposed three algorithms to solve the large-scale virtual machine (VM) placement problem in “MTAD: A Multitarget Heuristic Algorithm for Virtual Machine Placement.” First, they propose a physical machine (PM) classification algorithm; by analyzing its pseudo time complexity, they identify an important factor affecting efficiency, namely the number of physical hosts, and improve running efficiency by reducing that number. Second, they present a VM placement optimization model based on a multitarget heuristic algorithm: they use matrix transformation to obtain the positive and negative vectors of three goals, and they map VMs to hosts by comparing each candidate's distances to these vectors. Finally, they design a concurrent VM classification algorithm using the K-means method to address the placement efficiency problem for large-scale virtual serial requests. L. Cai et al. proposed an integrated model for big data visualization and algorithm analysis, which treats big data processing and data visualization as a whole, in “Big Data Visualization Collaborative Filtering Algorithm Based on RHadoop.” The model uses Hadoop 1.x for data storage and R as the programming environment. They also designed and implemented a parallel collaborative filtering algorithm using the model.
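Ranking candidates by their distance to a "positive" (ideal) and a "negative" (anti-ideal) vector is the idea behind the standard TOPSIS technique. The Python sketch below uses TOPSIS as a stand-in to choose a host for one VM; it is not the MTAD model, and the three goals (free CPU cores, free memory, power draw), their weights, and the host values are hypothetical.

```python
import math


def topsis(matrix, weights, benefit):
    """Score alternatives (rows) by relative closeness to the ideal point.

    matrix  -- one row per candidate host, one positive value per goal
    weights -- relative importance of each goal
    benefit -- True if larger is better for that goal, False if smaller is better
    Returns one score in [0, 1] per row; higher means closer to the ideal.
    """
    n_goals = len(weights)
    # Vector-normalize each goal column so goals with different units are comparable.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_goals)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_goals)]
                for row in matrix]
    columns = list(zip(*weighted))
    # Positive (ideal) and negative (anti-ideal) vectors, goal by goal.
    positive = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    negative = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    scores = []
    for row in weighted:
        d_pos = math.dist(row, positive)
        d_neg = math.dist(row, negative)
        total = d_pos + d_neg
        # If every candidate is identical, the two vectors coincide; treat as a tie.
        scores.append(d_neg / total if total else 0.5)
    return scores


# Hypothetical hosts: (free CPU cores, free memory in GB, power draw in W).
hosts = [[4, 8, 200], [8, 16, 300], [2, 4, 150]]
scores = topsis(hosts, weights=[0.4, 0.4, 0.2], benefit=[True, True, False])
best = max(range(len(hosts)), key=lambda k: scores[k])
print("place VM on host", best)
```

Here the second host (index 1), which has the most free capacity, scores highest despite its higher power draw, so the VM would be mapped to it. MTAD's actual goals, weighting, and matrix transformation may differ from this generic formulation.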

Because security protection in integrated networks is fragile, worms can propagate through them and undermine the stability and integrity of data. Worms in such integrated networks, called BD-worms, constitute one of the major threats to data security. A BD-worm spreads by transferring files while regulating its computing resource consumption (CRC) to avoid attracting users' attention. To address this issue, S. He et al. presented a BD-worm simulation model, covering the infection process, connection probability, opening probability, host defense, and CRC, in “Worms Propagation Modeling and Analysis in Big Data Environment.” M.-Y. Kang and J.-S. Nam designed and implemented a scheme for efficient massive content dissemination in a sensor smart network system, modeled as a graph, in “Efficient Massive Contents Distribution Strategy for P2P Using Sensor Smart Network.” The proposed system retrieves content information with minimal traffic overhead and simplifies the calculation of the optimal route compared with conventional schemes.
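To make the listed factors concrete, the sketch below runs a simple discrete-time, SIR-style (susceptible, infected, removed) mean-field model. It is a hypothetical stand-in, not S. He et al.'s model: `p_connect`, `p_open`, and `p_defend` play the roles of connection probability, opening probability, and host defense, and regulated CRC is approximated by capping the number of files each infected host sends per step (`files_per_step`).

```python
def simulate_worm(n_hosts, initial_infected, steps,
                  p_connect, p_open, p_defend, files_per_step):
    """Mean-field SIR-style worm spread; returns a list of (S, I, R) per step.

    All parameters are hypothetical stand-ins:
      p_connect      probability a sent file actually reaches its target
      p_open         probability the recipient opens the infected file
      p_defend       per-step probability an infected host is cleaned (host defense)
      files_per_step files each infected host sends per step (CRC throttle)
    """
    s = float(n_hosts - initial_infected)  # susceptible hosts
    i = float(initial_infected)            # infected hosts
    r = 0.0                                # cleaned, now immune hosts
    history = [(s, i, r)]
    # Chance that one file, sent to a uniformly random other host,
    # lands on a specific host and infects it.
    per_file = p_connect * p_open / (n_hosts - 1)
    for _ in range(steps):
        total_files = files_per_step * i
        # Probability a susceptible host is infected by at least one file.
        p_infect = 1.0 - (1.0 - per_file) ** total_files
        new_infections = s * p_infect
        cleaned = p_defend * i
        s -= new_infections
        i += new_infections - cleaned
        r += cleaned
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    for k in (1, 5):
        s, i, r = simulate_worm(1000, 1, 10, 0.8, 0.5, 0.05, k)[-1]
        print(f"files_per_step={k}: ever infected ~ {i + r:.1f} of 1000")
```

In this toy model, a smaller `files_per_step` (a more resource-frugal worm) reaches far fewer hosts within the same number of steps, illustrating that regulating resource consumption costs the worm propagation speed.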

N. Mishra et al. proposed a Cognitive Oriented Internet of Things (IoT) Big-Data Framework and implementation architecture in “A Cognitive Adopted Framework for IoT Big-Data Management and Knowledge Discovery Prospective.” The framework contains an IoT big-data layering architecture together with data organization and knowledge exploration subsystems for effective data management and knowledge discovery, and it is well suited to large-scale industrial automation applications. Y.-J. Du et al. developed a generalized digraph spectral clustering method in “Digraph Spectral Clustering with Applications in Distributed Sensor Validation.” Unlike traditional methods, which are developed for undirected graphs, the proposed method takes network circulation into account when clustering the sensors. Extensive simulation results demonstrate that the proposed method outperforms the traditional spectral clustering method, increasing the bad detection ratio from 19% to 41%.

Collectively, these authors highlight both the promise and the challenges of the emerging field of big data. In summary, this special issue provides a snapshot of the current status of big data research with a specific focus on sensing. We hope this publication will trigger more discussions and new research directions in the big data arena.

Ting Zhu Qingquan Zhang Sheng Xiao Yu Gu Ping Yi Yanhua Li