Abstract
The industrial internet is becoming increasingly important for IoT big-data management and knowledge discovery in large scale industrial automation applications. Several diverse technologies, such as the IoT (Internet of Things), computational intelligence, machine type communication, big-data, and sensor technology, can be combined to improve the data management and knowledge discovery efficiency of large scale automation applications. In this work, we propose a Cognitive Oriented IoT Big-data Framework (COIB-framework) along with an implementation architecture, an IoT big-data layering architecture, and a data organization and knowledge exploration subsystem for effective data management and knowledge discovery that is well suited to large scale industrial automation applications. The discussion and analysis show that the proposed framework and architectures offer a reasonable solution for implementing IoT big-data based smart industrial applications.
1. Introduction
An IoT object may be defined as any tangible thing associated with a real world entity, such as a person, machine, or animal, that has a unique identification and a self-directed capability to transfer data over the network. To construct an IoT object, three tiny components, namely, a sensing chip, an electromagnetic coil, and a regulating capacitor, are embedded into a very small container that can easily be placed into any real world entity associated with the IoT application; however, the technical configuration depends on the types of entities and applications. More specifically, we can define an IoT object as a sensor, an RFID device, or any smart object that has internet connectivity over a physical IP and can transmit data to the network autonomously without any human intervention [1]. The IoT technology is integrated with the big-data approach for the purpose of regulating smart industrial automation applications [2]. In a large scale industrial automation application, thousands of automated machines are fitted with IoT chips to construct the IoT objects, and the networks of trillions of such IoT objects may constitute a large scale industrial IoT environment, which produces huge volumes of structured, semistructured, and unstructured IoT big-data in real time [3]. Structured data have precise layouts, whereas semistructured and unstructured data are in log format, which brings high data redundancy, inconsistency, anomalies, incompleteness, and so forth, and poses a hazard to data management and knowledge discovery. The IoT big-data management system may be defined as a distributed computing system that deals especially with such semistructured and unstructured IoT big-data [4].
Owing to their high elasticity, flexibility, and dynamicity, IoT objects are penetrating every real-time monitoring application area, such as business, science, industry, health care, agriculture, animal farming, transportation, and many more [5]. In almost all IoT applications, a huge amount of data is dumped into storage, and these data are required for data analysis, modelling, information transformation, knowledge production, and decision generation. In upcoming large scale industrial automation, the number of IoT objects may reach trillions, generating thousands of exabytes of data. For example, the IoT objects used in a large scale industrial electrical power grid infrastructure generate log data at the rate of one terabyte per day if data are extracted every 5–15 minutes [6]. Such IoT big-data need a suitable analytic framework to produce knowledge for measuring operational efficiency, load distribution, machine maintenance, and so on in industrial automation applications. Digital technology has developed tremendously to scale the storage of IoT big-data; however, the speed of data management and knowledge discovery may not have improved significantly enough to keep pace in real time. Real-time IoT big-data management and knowledge discovery, which the COIB-framework targets, therefore remains a challenging issue for knowledge mining researchers. For effective data management and knowledge discovery, the COIB-framework needs suitable data management and analytic tools to transform large scale heterogeneous agile data streams into actionable knowledge, and the architecture for this purpose is presented in Figure 1.

Knowledge depository system architecture.
In a heterogeneous knowledge depository framework, the IoT big-data sources are associated with different frameworks that have different logical schemas, each of which defines its own data structure. The frameworks are heterogeneous because they use different models, names, scales, structures, and levels of abstraction. It is therefore tedious for an IoT big-data management and knowledge discovery tool to synchronise data access so as to produce knowledge of tactical value within a specific time scale. The IoT big-data frameworks also use different formats and structures for data storage, which may lead to data integration and transformation hazards. As a result, we need a comprehensive COIB-framework that integrates heterogeneous IoT big-data sources in such a way that the higher layer can extract the right data and metadata for better data management and knowledge discovery.
The remainder of this paper is organized as follows. In Section 2, we discuss related studies on IoT big-data management and knowledge discovery. In Section 3, we propose a COIB-framework for IoT big-data management and knowledge discovery aimed at regulating large scale automation applications. In Section 4, we present the implementation architectures, data organization, and analysis so as to examine the feasibility of the COIB-framework in real-time industrial automation applications. Finally, we conclude this paper in Section 5.
2. Related Work
IoT big-data management and knowledge discovery is a key research challenge for real-time industrial automation applications. We therefore review some existing systems, models, and frameworks that have been implemented from the perspective of IoT big-data management and knowledge discovery. IoT big-data management includes several managerial activities, such as data collection, integration, cleaning, storage, processing, analysis, and visualization, that have been implemented through various systems, models, and frameworks [7–10]. The work in [11] gives a brief research direction for data management and knowledge discovery on an IoT big-data management platform, in which three activities are mainly involved, that is, data association, inference, and knowledge discovery. The work in [12] emphasizes a cognitive IoT framework for effective decision making and knowledge discovery. To enhance the decision making process, the work in [13] emphasizes designing an information management framework that performs data collection and information extraction operations in real time so as to meet the needs of the applications. Also, some ontology based modelling mechanisms have been suggested to enhance knowledge discovery from IoT big-data [14].
Traditional IoT big-data research uses noncognitive frameworks to perform numerous data management activities; however, data management and knowledge discovery need a cognitive framework to perform self-regulated operations in emergencies without human intervention. In industrial automation applications, a cognitive data management and knowledge discovery framework has a significant role in self-regulated monitoring operations. We therefore propose a COIB-framework for effective data management and knowledge discovery within the industrial automation environment.
3. COIB-Framework Organization
To implement the COIB-framework, we must ensure that the automation environment has high industrial internet connectivity among the networked IoT objects that are embedded in different machines so as to regulate the machines' functional and operational efficiency. The functional aspects of the COIB-framework organization are presented in Figure 2. In a large scale parallel and distributed environment, each IoT segment has independent control over application monitoring. The entire industrial automation environment may therefore be logically divided into several IoT segments so as to be compatible with standard network configuration management requirements [15]. The IoT segments produce IoT raw-data streams and act as the raw-data sources for their corresponding segments. These IoT big-data streams are highly unstructured with respect to name, scale, abstraction level, and so forth, and exhibit high data inconsistency, redundancy, incompleteness, and other anomalies. The big-data aggregators therefore perform data fusion operations over these large IoT big-data streams. The data fusion operation may use some standard data semantics to eliminate the anomalies and produce cleaned data under total quality management.

COIB-framework organization.
The IoT big-data classifiers split the cleaned data into multiple clusters based on data behaviour, characteristics, and domain so that the data are easy to access, understand, and use. In an industrial automation application, the data domains include operational data, production data, status data, maintenance data, and so forth. The HBase storage is responsible for scaling and storing these data clusters across its multiple storage nodes. In an HBase system, tables can be organised much like those of a relational database: each table contains rows and columns, and each table must have an element defined as the primary key (the row key). For better supervision and access, the master node holds control over the storage nodes. The storage nodes store the actual data clusters, whereas the master node stores the metadata, that is, all access paths to the storage nodes. In this way, the IoT big-data can be scaled effectively and efficiently in an HBase system. IoT big-data analysis is an important aspect of addressing data management and knowledge discovery issues through cognitive and computational intelligence tools. For an industrial automation environment, several standard cognitive data analytic tools can be used to produce the cognitive decisions, plans, and actuations that control the whole automation environment. Thus, for the COIB-framework to be implemented successfully in an automation environment, IoT big-data generation, aggregation, classification, storage, and analysis must be synchronised in real time.
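The division of labour between the master node (metadata only) and the storage nodes (actual data clusters) can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the class names `MasterNode` and `StorageNode` and the hash-based placement rule are hypothetical, and a real HBase deployment assigns data to nodes differently.

```python
import zlib
from typing import Dict, List


class StorageNode:
    """Holds the actual data clusters (hypothetical in-memory stand-in)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.clusters: Dict[str, list] = {}

    def append(self, cluster: str, records: list) -> None:
        self.clusters.setdefault(cluster, []).extend(records)

    def read(self, cluster: str) -> list:
        return list(self.clusters.get(cluster, []))


class MasterNode:
    """Holds only metadata: the access path from a cluster to its node."""

    def __init__(self, nodes: List[StorageNode]) -> None:
        self.nodes = nodes
        self.access_paths: Dict[str, StorageNode] = {}

    def locate(self, cluster: str) -> StorageNode:
        # Assumed placement rule: hash the cluster name once, then remember
        # the resulting access path for all later requests.
        if cluster not in self.access_paths:
            index = zlib.crc32(cluster.encode("utf-8")) % len(self.nodes)
            self.access_paths[cluster] = self.nodes[index]
        return self.access_paths[cluster]
```

A client would call `master.locate("production")` to obtain the storage node and then read from or append to that node directly, so the master never handles the data itself.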
3.1. Functional Analysis of COIB-Framework
Here, four important functions of the COIB-framework are discussed: (i) IoT big-data aggregation, (ii) IoT big-data classification, (iii) IoT big-data storage, and (iv) IoT big-data analysis.
3.1.1. IoT Big-Data Aggregation
Aggregation encompasses the fusion of large scale data streams according to some standard data semantics. At each periodic time interval, the data aggregator checks the data availability in each individual IoT raw-data source; if data are available, it proceeds with the data integration process, aiming to achieve data integrity, consistency, completeness, and, overall, total quality management. Data quality assessment is an important aspect of total quality management and in turn leads to the desired quality of big-data analysis results. For a real-time monitoring application, once the data streams have been successfully aggregated or fused in a periodic interval, the subsequent functional processes must be executed within the critical timeline for effective data management and knowledge discovery.
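One periodic aggregation pass, in which each raw-data source is checked for available data and the available records are fused, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the record fields (`machine_id`, `event_type`, `timestamp`, `value`), the function name `aggregate`, and the choice of dropping incomplete and duplicate records as the "data semantics" are all assumptions.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical record layout; the paper does not specify one.
REQUIRED_FIELDS = ("machine_id", "event_type", "timestamp", "value")

Record = Dict[str, Optional[object]]


def aggregate(sources: Iterable[List[Record]]) -> List[Record]:
    """Fuse the records available in each raw-data source for one interval.

    Incomplete records (a required field missing or None) are dropped and
    exact duplicates are kept only once, so the fused output is complete
    and free of redundancy.
    """
    fused: List[Record] = []
    seen = set()
    for source in sources:
        if not source:              # no data available from this source
            continue
        for record in source:
            if any(record.get(f) is None for f in REQUIRED_FIELDS):
                continue            # incomplete record
            key = tuple(record[f] for f in REQUIRED_FIELDS)
            if key in seen:
                continue            # redundant record
            seen.add(key)
            fused.append(record)
    # Order by time so that later steps see one consistent stream.
    fused.sort(key=lambda r: r["timestamp"])
    return fused
```

In a deployment this function would be invoked once per interval by a scheduler, and its output handed to the classification step.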
3.1.2. IoT Big-Data Classification
Real-time classification and clustering of agile data streams are challenging issues, especially in an IoT big-data environment. The data classifier initiates the classification process to sort the fused data into multiple data groups such that each group contains homogeneous data. For an automation application, the fused data can be classified into multiple groups by event type, such as machine status data, functional data, inventory data, production data, product quality data, and many more. A nonlinear classification process based on a multilayer perceptron may fit the large scale IoT big-data classification problem. For time critical big-data classification and clustering, a simple “If…then…else” classifier can be used to sort the aggregated data into multiple data clusters. For clarity, let us design a classifier based on five clusters, as described in Algorithm 1. In Algorithm 1, we assume that aggregated IoT big-data streams =
(1) Input-agile heterogeneous data streams.
(2) Process-executes an
(3) Output-storage of data streams into the respective data clusters such that each cluster stores the identical data of a specific event type.
(4)
(5) //Check periodic aggregated data streams (
(6) While (availability of
(7) {
(8)     If (
(9)     Then move_data_streams (
(10)    Else if (
(11)    Then move_data_streams (
(12)    Else if (
(13)    Then move_data_streams (
(14)    Else if (
(15)    Then move_data_streams (
(16)    Else
(17)    move_data_streams (
(18) }
(19) Set (availability of
(20) //Check the move operations
(21) If (completed_move = true)
(22) Then goto Step (2); //for next periodic
(23) Else continue the move operations;
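The if/then/else chain of Algorithm 1 can be sketched in Python as follows. The conditions of Algorithm 1 are not recoverable from the text, so this sketch assumes that each record carries an `event_type` field and borrows the five event types named earlier in this section; the field name, the label strings, and the cluster names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def classify(record: dict) -> str:
    """Return the cluster name for one aggregated record.

    The tested labels are assumptions; Algorithm 1's actual conditions
    are not available in the source text.
    """
    event_type = record.get("event_type")
    if event_type == "machine_status":
        return "status_cluster"
    elif event_type == "functional":
        return "functional_cluster"
    elif event_type == "inventory":
        return "inventory_cluster"
    elif event_type == "production":
        return "production_cluster"
    else:
        # Fifth branch: everything else, assumed here to be quality data.
        return "quality_cluster"


def move_data_streams(stream: Iterable[dict]) -> Dict[str, List[dict]]:
    """Move every record of one aggregated stream into its cluster."""
    clusters: Dict[str, List[dict]] = defaultdict(list)
    for record in stream:
        clusters[classify(record)].append(record)
    return dict(clusters)
```

Because each record needs at most four comparisons, the cost per record is constant, which is the property that makes this kind of classifier attractive for time critical streams.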
3.1.3. IoT Big-Data Storage
Current technology is well equipped for large scale data storage in the IoT big-data environment. The HBase database management system takes responsibility for IoT big-data storage across storage nodes that are headed by a control node. Depending on the application, the entire IoT space can be divided into nonintersecting segments such that an HBase system can be implemented in each segment, ensuring distributed control over the large scale IoT big-database. The control node of one HBase system must communicate with the control node of another HBase system for effective data distribution and load balancing. Algorithm 1 forms the data clusters based on event types, and those data clusters can be effectively managed by the HBase system for monitoring the automation applications. A standard IoT big-data storage engineering approach for future big-data sensing and storage in large scale IoT based automation applications is described in the following steps.
Step 1. Recognize the total IoT space.
Step 2. Identify the nonintersecting logical IoT segments.
Step 3. Observe the heterogeneous distributed data sources in each segment.
Step 4. Consider the data source of each segment as an IoT big-raw data source.
Step 5. Set and configure the middleware between the physical sensing layer and the actual storage layer.
Step 6. Set and configure a storage management tool (HBase).
Step 7. Check the storage incompatibilities.
Step 8. Normalize the storage incompatibilities.
Step 9. Implement Algorithm 1 to place the data streams into the data clusters of the corresponding storage servers.
In each IoT segment, a network of millions of smart IoT objects may exist to regulate the large scale automation application. The storage must therefore be configured carefully to ensure effective synchronization between the architectural layers.
3.1.4. IoT Big-Data Analysis
Data analysis covers the data modelling, visualization, and presentation of published data and knowledge in accordance with a standard IoT big-data management and knowledge discovery strategy. The main aim of the analysis is to turn IoT big-data into knowledge, actions, and decisions in real time. A cognitive computational intelligence tool acts as a catalyst for data management and knowledge discovery, producing the cognitive decisions, plans, and actuations for large scale industrial automation applications. We propose a fuzzy-neuro-GA based real-time data analysis approach that can be applied to future IoT big-data management and knowledge discovery for large scale industrial automation applications. The detailed approach is described in Algorithm 2.
(1) Input-identical data clusters based on event types.
(2) Process-executes a machine learning system.
(3) Output-generates cognitive decisions, plans, and actuations.
(4) Validate the signals and event data.
(5) Extract the identical event features.
(6) Prepare training data sets.
(7) Use a GA (genetic algorithm) based optimization technique to extract selective event features from the training data sets.
(8) Set and configure an NN (neural network) architecture that fits the problem.
(9) Set and configure fuzzy weights from the event features.
(10) Train the NN-structure with the fuzzy weights.
(11) When training is over, test the NN-structure with the selective event features, which are treated as data sets.
(12) Trace the cognitive outputs.
(13) Diagnose those cognitive outputs to generate cognitive decisions, plans, and actuations so as to regulate the machine operations for the automation application.
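Step (7) of Algorithm 2, the GA based selection of event features, can be illustrated with a small pure-Python sketch. It covers only that one step; the neural network and fuzzy weights of steps (8) to (11) are not shown. The fitness measure used here (training accuracy of a nearest-centroid classifier on the selected features) is a stand-in chosen for brevity, not the authors' fitness function, and all names and parameter values are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]   # (feature vector, class label)


def fitness(mask: Sequence[int], data: List[Sample]) -> float:
    """Accuracy of a nearest-centroid classifier using only masked features."""
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0                      # an empty feature set is useless
    labels = sorted({y for _, y in data})
    centroids = {}
    for label in labels:
        rows = [x for x, y in data if y == label]
        centroids[label] = [sum(r[i] for r in rows) / len(rows) for i in idx]
    correct = 0
    for x, y in data:
        dist = {
            label: sum((x[i] - c) ** 2 for i, c in zip(idx, centroids[label]))
            for label in labels
        }
        if min(dist, key=dist.get) == y:
            correct += 1
    return correct / len(data)


def ga_select(data: List[Sample], n_features: int, pop_size: int = 12,
              generations: int = 20, seed: int = 0) -> List[int]:
    """Evolve a 0/1 feature mask; requires n_features >= 2."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, data), reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection
        children = [parents[0][:]]              # elitism: keep the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with an assumed rate of 10% per bit.
            child = [1 - bit if rng.random() < 0.1 else bit for bit in child]
            children.append(child)
        pop = children
    return max(pop, key=lambda m: fitness(m, data))
```

The returned mask marks which event features would be passed on to the network in the later steps; because the best mask of each generation is always kept, the best fitness never decreases from one generation to the next.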
4. Analysis and Discussion
This section presents a theoretical analysis of the proposed COIB-framework, which accommodates heterogeneous IoT big-data sources so as to produce higher knowledge abstractions that can be further analysed for industrial strategic, tactical, and operational decisions. Operational decisions have a key role in industrial automation applications, whereas strategic and tactical decisions relate to the planning of industrial applications. The COIB-framework mainly targets IoT big-data management and knowledge discovery, so a detailed review from that perspective is presented in Table 1. Researchers have analysed numerous systems, models, and frameworks for the overall big-data management and knowledge discovery activities over the IoT environment. The review reveals that the major real-time data mining and knowledge discovery activities can also be carried out over the IoT big-database so as to generate the metadata, information, and explicit knowledge for regulating automation applications. All big-data management and knowledge discovery activities of Table 1 are integrated into a unified knowledge discovery strategy, in which three diverse areas, namely, data mining, data warehousing, and machine intelligence, are combined to achieve a new perspective on IoT big-data based smart industrial automation applications. Almost all automation applications are business-critical, safety-critical, or mission-critical, and their failure leads to huge losses for business, society, and social property. Timely data management and knowledge discovery therefore have a key role in minimising those losses by resolving the threats and challenges anticipated in the applications.
Comparison of various works based on the system/model/framework from the perspective of big-data management and knowledge discovery over the IoT platform.
4.1. Implementation Architecture
The implementation architecture shows how the COIB-framework can be implemented in large scale industrial automation applications; it is shown in Figure 3. The data center is responsible for executing the data aggregation, classification, and storage operations. In each center, dedicated powerful servers can be used to carry out the respective operations. The knowledge production center accesses the precise data and generates explicit knowledge using a cognitive computational intelligence (CI) tool.

Implementation architecture.
The actuation center receives that knowledge, identifies the important actions, prioritizes those actions using an action queue, and sends them to the regulatory center for immediate action. Based on the actions, the regulatory center automatically generates regulatory instructions and transfers them to the IoT based microcontroller objects, which act on the operations. As IoT big-data management mainly comprises the data management and knowledge discovery processes, we focus on the four subsystems (see Figure 4) that are to be managed under the COIB-framework. The management of these four subsystems plays a vital role in improving the efficiency of data management and knowledge discovery.

IoT big-data management subsystem.
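The action queue with which the actuation center prioritizes actions before passing them to the regulatory center can be sketched with Python's standard `heapq` module. The class name `ActionQueue`, the convention that a lower number means a more urgent action, and the example action strings are assumptions made for illustration.

```python
import heapq
import itertools
from typing import List, Tuple


class ActionQueue:
    """Priority queue of actions; a lower number is more urgent (assumed)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        # The counter breaks ties so that equally urgent actions leave
        # the queue in the order in which they arrived.
        self._counter = itertools.count()

    def push(self, priority: int, action: str) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), action))

    def pop(self) -> str:
        """Remove and return the most urgent action."""
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

For example, after pushing "schedule maintenance" with priority 2 and "stop machine" with priority 0, the first call to `pop()` returns "stop machine", which the regulatory center would then turn into an instruction.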
The large scale industrial automation application needs a complete architecture to implement the COIB-framework, so we discuss the overall IoT big-data layering architecture with a view to wide implementation.
An overall architectural representation is presented in Figure 5.

Overall IoT big-data layering architecture.
In this pyramidal architecture, the knowledge processing layer is introduced specifically for effective knowledge production, which can be exploited in the application layer to manage various IoT applications. Effective knowledge production depends on the performance of cognitive tools such as fuzzy ontology, data semantics, neural computing, and so forth. In automation applications, the IoT network uses the industrial internet to integrate the multifaceted physical machines with networked IoT objects and intelligent tools so as to regulate the automated operations [25]. The industrial internet integrates various diverse technologies, such as machine to machine communication, machine learning, IoT, big-data, cloud, and computational intelligence, for industrial automation applications. The IoT middleware is mainly responsible for data processing, management, and communication, in addition to handling the data perturbations between the architectural layers [26].
4.2. Data Organization
In an IoT big-data system, bulk amounts of data are organized in NoSQL databases [27]. IoT big-data form a spatiotemporal database that depends on time and location. An IoT big-data set has a large number of rows and comparatively few columns, so a column oriented data depository can greatly improve the performance of IoT big-data in terms of data access and query processing. Heterogeneous IoT big-data cannot be stored efficiently in a relational database, so an IoT big-data cell (NoSQL database) may be used to overcome the storage limitations and constraints of relational databases [16, 28]. Data organization is an important aspect of large scale industrial automation applications. To implement the IoT big-data system in large scale automation applications, a data organization subsystem is presented in Figure 6. In this framework, we assume that N machines are connected to the IoT big-data system and regulated under the COIB-framework. In each machine, P IoT objects are embedded in such a way that each IoT object is responsible for sensing and sending the data of n events to the COIB-framework, where

Data organization subsystem.
To support the database implementation, an HBase schema for the above data organization framework is described in Box 1. In this logical schema, we assume that events data arrive at the COIB-framework from the IoT big-data environment at n time intervals.
Rowkey M/c-id {
    Column family (IoT object-1) {
        Column
        Column
        ⋮
        Column
    }
    Column family (IoT object-2) {
        Column
        Column
        ⋮
        Column
    }
    ⋮
    Column family (IoT object-P) {
        Column
        Column
        ⋮
        Column
    }
}
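The logical layout of Box 1 (one row per machine, one column family per IoT object, one column per event observation) can be mimicked with nested dictionaries. This is an in-memory sketch and does not use any HBase client API. The column names of Box 1 are not recoverable from the text, so the naming scheme `event:tN` and the helper names `new_table`, `put`, and `get_family` are hypothetical.

```python
from collections import defaultdict
from typing import Dict

# row key (machine id) -> column family (IoT object) -> column -> value
Table = Dict[str, Dict[str, Dict[str, float]]]


def new_table() -> Table:
    """Create an empty table whose rows and families appear on first write."""
    return defaultdict(lambda: defaultdict(dict))


def put(table: Table, machine_id: str, iot_object: str,
        event: str, t: int, value: float) -> None:
    """Store one event value observed at time interval t."""
    # Assumed column naming: "<event>:t<interval>".
    table[machine_id][iot_object][f"{event}:t{t}"] = value


def get_family(table: Table, machine_id: str, iot_object: str) -> Dict[str, float]:
    """Return all columns of one IoT object on one machine."""
    return dict(table[machine_id][iot_object])
```

Keying rows by machine keeps all observations of one machine together, which suits queries that follow a single machine over time.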
4.3. Knowledge Exploration
Knowledge organization from IoT big-data needs a database and a knowledge base to infer cognitive outputs such as decisions, plans, and actuations for monitoring time critical automation applications [29]. The IoT big-database is an organised collection of facts stored in a standard NoSQL framework, as illustrated in Box 1. The problem now is to design a framework for an IoT knowledge base that consists of predefined sets of rules to impart cognitive judgement and decision making and so achieve the desired intelligence. An IoT knowledge base framework inherits the features of the natural language processing model, conceptual dependency model, logical inference model, and predicate model to construct a knowledge base information system (KBIS). The KBIS consists of rule-based, mathematical-based, statistical-based, textual-based, case-reasoning-based, and structure-based frameworks that incorporate human historical experience, analytical skills, logical reasoning, and innovative ideas, and it ensures an effective knowledge management system. We highlight an IoT knowledge exploration subsystem, as described in Figure 7. The framework consists of four knowledge management activities: knowledge acquisition, knowledge storage, knowledge dissemination, and knowledge application. Knowledge acquisition implements a knowledge work station for knowledge discovery within the expert knowledge networks.

IoT knowledge exploration subsystem.
Knowledge storage performs the document management of the IoT knowledge database within the operational framework of the pre-connected expert system. Knowledge dissemination uses search engines to spread the knowledge into the time critical automation application, and knowledge application includes the planned management of enterprise applications through explicit and tactical knowledge in order to achieve the core competencies of IoT big-data based smart industrial applications.
4.4. Statistical Analysis
From the perspective of future IoT big-data management and knowledge discovery, the platform of sensors, RFID devices, wearable intelligent devices, and other smart technologies can evolve into a large scale IoT big-data environment. For analysis purposes, we use some real-time data sets of a specific event type, which are further transformed into fuzzified event data sets (EDS) in the range of
Load distribution study of MCs.
Figure 8 describes the load distribution analysis with respect to the event instances for an industrial automation application. In Figure 9, the MCs' load data are plotted on the x-axis to analyse the reliability of the MCs with respect to the set of event instances through the survivor function with a confidence level of 95%. Based on the reliability values of an individual MC, the MC's MTTF (mean time to failure) can be estimated so that probable failures and production disruptions can be avoided without human intervention. In a large scale IoT big-data environment, enormous sets of event instances are generated at regular time intervals for analysis and real-time decision making. These sets of event instances are highly unstructured, ambiguous, and not even adequate for discovering knowledge of strategic value. In this situation, machine learning tools can greatly help to turn such huge sets of event instances into explicit knowledge for effective business uses such as production planning, scheduling, decision making, strategy building, and many more.

Load distribution analysis of MCs.

Reliability analysis of MCs.
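The link between a survivor function and the MTTF can be made concrete with a small sketch. It assumes a complete sample of observed failure times (no censoring) and computes the empirical survivor function and the area under it, which equals the MTTF; it does not reproduce the paper's load-based analysis or its 95% confidence bounds, and the function names are hypothetical.

```python
from typing import Sequence


def survivor(failure_times: Sequence[float], t: float) -> float:
    """Empirical survivor function: fraction of units still working after t."""
    return sum(1 for f in failure_times if f > t) / len(failure_times)


def mttf(failure_times: Sequence[float]) -> float:
    """Mean time to failure as the area under the empirical survivor curve."""
    times = sorted(failure_times)
    area, previous = 0.0, 0.0
    for t in times:
        # The survivor curve is constant between consecutive failure times.
        area += survivor(times, previous) * (t - previous)
        previous = t
    return area
```

For failure times of 100, 200, 300, and 400 hours, `survivor(..., 250)` is 0.5 and `mttf(...)` is 250 hours, the same as the sample mean, as expected for complete data.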
4.5. Computational Analysis
This phase discusses the computational analysis of the COIB-framework by mapping the possible functions onto a neural network platform to minimise the framework's error. Here, we implement a computational machine learning approach based on Algorithm 2, in which the neurofuzzy mechanism is more effective in dealing with the tolerance and uncertainties that time-critical applications actually face [31, 32]. The machine learning approach aims at training and testing the COIB-framework to ensure functional and operational accuracy within the large scale automation environment. We use a fitness parameter θ that mimics the behavioural aspects of the genetic algorithm in order to classify the data sets into multiple generic data clusters. A higher fitness value θ associated with an event dataset implies higher quality, so to classify the quality of event datasets, we can define three fuzzy membership grades through varying θ values in the range of
Now the selective features can be extracted based on the θ value associated with an event dataset and used for COIB-framework learning. The fuzzy weights are also configured from the selected EDS, ensuring better tolerance and precision. Here, more than one thousand event instances are assumed in between the time intervals
The event instances are randomly allocated into five data samples such that 70% of the event instances are used for training, 15% for validation, and the remaining 15% for testing. In this analysis, we obtain zero error or minimum error

Error computation on event instances.
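The random 70%/15%/15% allocation of event instances to training, validation, and testing can be sketched as follows. Only the split itself is shown; the grouping into five data samples is not modelled, and the function name and the fixed seed are assumptions for reproducibility.

```python
import random
from typing import List, Sequence, Tuple


def split_instances(instances: Sequence, seed: int = 0) -> Tuple[List, List, List]:
    """Randomly split instances 70% / 15% / 15% (train / validation / test)."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding at the boundaries.
    n_train = len(shuffled) * 70 // 100
    n_val = len(shuffled) * 15 // 100
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # the remainder
    return train, validation, test
```

With one thousand event instances this yields 700 training, 150 validation, and 150 test instances, and every instance lands in exactly one subset.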
5. Conclusions
In this paper, we propose a COIB-framework for effective data management and knowledge discovery over IoT big-data. We also propose an implementation architecture along with the overall IoT big-data layering architecture to support the implementation feasibility of the COIB-framework in large scale industrial automation environments. The COIB-framework follows the principles of data centric architecture to incorporate large big-data streams by efficiently considering the data access paths, and it may be adequate for large spatiotemporal query management. A pyramidal IoT big-data management system is proposed to highlight the important subsystems, ranging from IoT object management to real world application management. Finally, data organization and knowledge exploration subsystems are presented to implement machine level big-data stream management. The integration of all these architectures and frameworks gives a real-time platform for IoT big-data management and knowledge discovery in large scale automation applications. We add a new case in point for the functional analysis, in which the event data sets are normalized and transformed into standard fuzzified data sets having more than one thousand activity instances in the range of
In future work, we would like to propose an IoT big-data engineering and reengineering framework for IoT big-data applications. Future issues in IoT big-data management may include a time schedule based data transformation strategy, the mining and analysis of large data streams generated by trillions of IoT objects employed in heterogeneous applications, and the security and data perturbation issues relating to IoT big-data management subsystems.
Footnotes
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
The financial support provided by the Ministry of Science and Technology, China, through Grants MOST 103-2221-E-182-003, MOST 103-2511-S-182-053, and CMRPD3B0033 of Chang Gung University is gratefully acknowledged.
