Abstract
Massive sensing data are generated continuously in the Internet of Things. How to organize and how to query the big sensing data are big challenges for intelligent applications. This paper studies the organization of big sensing data with the event-linked network (ELN) model, where events are regarded as primary units for organizing data and links are used to represent the semantic associations among events. Several different types of queries on the event-linked network are also explored, which differ from queries on traditional relational databases. We use a smart home instance to show the effectiveness and efficiency of the organization and query approaches based on the event-linked network.
1. Introduction
We are living in an era of emerging big data. Data are flooding in at rates never seen before: doubling every 18 months. This is a result of the rapidly growing Internet of Things and mobile computing, which give greater access to consumer data from public, proprietary, and purchased sources, as well as to new information gathered from web communities and newly deployed smart assets [1, 2]. Because of their diversity and complexity, how to organize these data has always been a primary challenge for data scientists [3–11].
Many different data models have been developed to organize data in the past decades. The development of the data models can be divided into three phases.
(i) Earlier data models included the network model, the relational model, the entity set model, and the Entity-Relationship (ER) model. The network model separates entities from relationships and so provides a more natural view of data, but it makes data independence difficult to achieve [12]. Relational database systems remain the most widely used in industry because they are easy to use, although they cannot fully reflect the semantics of the data [13, 14]. The entity set model, based on set theory, also ensures data independence, but its way of viewing values may be inconvenient for some people [15]. The Entity-Relationship (ER) model is a conceptual model [16] which consists of three elements: entities, attributes, and relationships. It is considered the most popular tool for conceptual model design because of its simplicity and clarity of structure [17].
(ii) With the rapid development of the Internet since the 1990s, the use of relational database systems has grown exponentially. Meanwhile, two major challenges exist for the emerging contents on the Internet: (1) how to organize large volumes of semistructured content and (2) how to make digital content understandable to machines by representing the semantic meaning of the data. A series of semantic technologies such as XML (http://www.w3.org/XML/), SPARQL (http://www.w3.org/TR/sparql11-query/), and OWL (http://www.w3.org/Submission/2006/10/) came into being. The main characteristic of these technologies is that they annotate elements with markup labels that state their meaning [18]. The semantic link network was proposed as a semantic data model for managing web resources, in which nodes can be any type of resource and edges can be any semantic relation [19–22]. Schema theory provides the basis for normalized management of semantic link networks [19]. These techniques are well suited to querying, browsing, and reasoning over semantic data. However, they have not shaken the dominance of the relational model in industry, and they do not remove the need for information extraction in the Internet of Things. Integrating, organizing, and interpreting data are still big challenges in achieving the vision of the Internet of Things [23].
(iii) The emergence of Web 2.0 is a milestone in the development of information and communication technology. Volumes of diverse data are flooding in at an unprecedented rate, especially unstructured content produced by the various IoT technologies. The primary challenge is to find reasonable models for organizing these big data. The traditional relational model alone is not adequate for this task [16]. Several NoSQL models, such as the key-value model, the document store model, column family stores, and graph databases, have been proposed to model big data.
The key-value model organizes data with a simple (key, value) structure, like a traditional dictionary. Though its query efficiency is higher than that of traditional models [24], it is hard to use in practice because of its uninterrupted arrays and isolated values, which lack relationships between datasets [25]. The document store model encapsulates key-value pairs in self-contained documents. It is more suitable for handling complex data such as nested content and performs well in query, integration, and schema migration [25].
Column family stores are inspired by Google's Bigtable, which organizes an arbitrary number of key-value pairs within rows [26]. They are more suitable for applications dealing with huge amounts of data stored on very large clusters [25]. Graph databases organize object data and relation data as nodes and edges, and they make it easy to describe complicated constraints on the schema-defined range of keys and values. They are widely used in location-based services, knowledge representation, path finding in navigation systems, and recommendation systems [25].
Traditional relational databases and semantic data models cannot meet the requirements of the large volumes of semistructured and unstructured big sensing data generated by sensing devices in the Internet of Things. NoSQL models can model and manage these sensing data successfully. But, as unstructured data is gathered at unprecedented levels, the analysis rather than the modeling and storage of this data becomes the challenge. The raw big sensing data stored in a NoSQL database are scattered and cannot be used directly, for two reasons. First, most of them cannot be interpreted by the model; the additional features have to be handled in the application logic. Second, the overwhelming majority of the big data are of little value, while only a drop in the bucket is valuable, and that drop corresponds to the occurrence of important events.
In light of this, a reasonable semantic data model is necessary to organize the event information mined and extracted from the raw big sensing data. The event-linked network (ELN) model proposed in [19] can regulate events and their internal semantic relations efficiently. This paper introduces the conceptions, the organizing model, and the querying of event-linked networks and explores a practical sensing scenario, the smart home. Its reasoning ability enables the ELN model to discover potentially useful semantic links between events and useful patterns. The contributions of this paper include (1) an efficient method to extract events and their semantic links from raw complex data; (2) several types of queries on the event-linked network; (3) a practical case study to demonstrate the proposed methods.
2. Conceptions: Event-Linked Network
The event-linked network model involves three kinds of elements: events, event-links between events, and reasoning rules among event-links. In this section, we introduce the conceptions of the model and show how to construct an event-linked network.
2.1. Event and Event Type
Always, the primary information of an event about When, Who, and What is required to be recorded, and sometimes Where is also needed. Therefore an event can be regarded as a 3-tuple
There are various types of events. An event type can be regarded as a set of formatted events. Event types are used to regularize the events extracted from massive and heterogeneous data. An event type
2.2. Event-Link and Event-Link Type
An event-link
An event-link type
2.3. Reasoning Rule
Links between events weave the isolated events into a connected network which is more useful than a set of isolated events and provides a global view of situation with the internal relationships among events. More useful evolving patterns would be more easily mined from the network, and potential relations between events can be deduced based on the existing links according to some reasoning rules. For example, we can find the inherent reasons of some illness event by the transitivity of the
A reasoning rule is a production rule with the form of (1)
For example, the rule
2.4. Event-Linked Network and Schema
Big data in the Internet of Things contains millions of events and their link information. For a given domain or application, the events and their links can be extracted according to the defined event types and link types. Such a set of events, together with the corresponding set of links, forms a network.
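As a minimal sketch of this idea, the Python fragment below keeps events and typed links in one container. The class and field names (`Event`, `Link`, `ELN`, `when`, `who`, `what`, `where`) are illustrative assumptions that follow the When/Who/What/Where description in Section 2.1; they are not the paper's formal notation, and the sample values are made up.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Event:
    event_id: str
    event_type: str
    when: str                     # timestamp of the event
    who: str
    what: str
    where: Optional[str] = None   # optional, as in Section 2.1


@dataclass(frozen=True)
class Link:
    source: str       # event_id of the source event
    target: str       # event_id of the target event
    link_type: str


@dataclass
class ELN:
    events: dict = field(default_factory=dict)   # event_id -> Event
    links: set = field(default_factory=set)      # set of Link

    def add_event(self, event: Event) -> None:
        self.events[event.event_id] = event

    def add_link(self, source: str, target: str, link_type: str) -> None:
        # A link is only meaningful between events already in the network.
        if source not in self.events or target not in self.events:
            raise KeyError("both endpoints must be events of the network")
        self.links.add(Link(source, target, link_type))

    def neighbours(self, event_id: str) -> set:
        # Events reachable over one link, in either direction.
        outgoing = {l.target for l in self.links if l.source == event_id}
        incoming = {l.source for l in self.links if l.target == event_id}
        return outgoing | incoming


# Made-up sample values, for illustration only.
eln = ELN()
eln.add_event(Event("e1", "Toileting", "08:05:00", "subject", "toilet flush", "Bathroom"))
eln.add_event(Event("e2", "Toileting", "12:40:00", "subject", "toilet flush", "Bathroom"))
eln.add_link("e1", "e2", "sameTypeOf")
```

Storing links as separate typed records, rather than as attributes of events, keeps the same event set reusable under different link types.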
An event-linked network
A set of link types can be defined according to the domain or the application. For two certain events, the semantic links between them can be determined by their inherent properties. Experts would work out a set of useful event types for a given application and a set of link types as well as a set of reasoning rules based on the link types. These well-defined event types, link types, and the reasoning rules construct a domain- or application-dependent event schema. Indeed, an event schema is a domain-dependent knowledge base to differentiate the critical and sensitive information from the massive and heterogeneous data.
A schema of event-linked network
ELN schemas may vary across application scenarios and play the most important role in organizing the data in the Internet of Things. Traditionally, the event schema is defined by domain experts or mined from volumes of historical information.
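The reasoning rules that a schema bundles with its link types can be applied mechanically once links are stored as typed triples. The sketch below assumes rules of the composition form "a link of type α followed by a link of type β implies a link of type γ", encoded in a hypothetical dictionary `RULES`. The concrete rule shown, transitivity of `causeOf`, is an illustrative assumption rather than a rule taken from the schema.

```python
def infer_links(links, rules):
    """Apply composition rules until no new link can be derived.

    links: set of (source, link_type, target) triples.
    rules: dict mapping (alpha, beta) -> gamma, read as
           "x -alpha-> y and y -beta-> z implies x -gamma-> z".
    """
    known = set(links)
    changed = True
    while changed:
        derived = set()
        for (x, alpha, y) in known:
            for (y2, beta, z) in known:
                if y == y2 and x != z and (alpha, beta) in rules:
                    derived.add((x, rules[(alpha, beta)], z))
        new = derived - known
        known |= new
        changed = bool(new)
    return known


# Hypothetical rule set: causeOf composed with causeOf yields causeOf.
RULES = {("causeOf", "causeOf"): "causeOf"}
links = {
    ("e1", "causeOf", "e2"),
    ("e2", "causeOf", "e3"),
    ("e3", "causeOf", "e4"),
}
closed = infer_links(links, RULES)
```

The loop stops because the set of possible triples over a finite event set is finite, so each pass either adds at least one new link or ends the iteration.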
2.5. Constructing Event-Linked Network
In the Internet of Things, trillions of sensing data are generated by smart devices distributed in many scenarios such as e-Health and smart home. In these application scenarios, event extracting agents extract various meaningful events from the sensing data. These events are more useful and understandable than raw data and help users grasp the key points for problem solving. The events derived from raw data are transmitted to the event-linked network and organized so that users or intelligent agents can access them easily.
How to extract the event information from the raw data and how to organise or query them are two key challenges herein. Reference [27] proposes an effective approach to extract events from numerous sensing data leveraging predefined event schemas. This paper focuses mainly on the latter challenge: event query.
3. Querying on ELN
Querying on ELN aims at finding a specific instance for a given ELN schema
3.1. Select
Select querying operation is to find a subinstance for a specific instance S based on some conditions. The select querying operation can be defined as
Let
F is a logic expression
(1)
For example, we can use the following expression to select an instance from S where the event type is either
(2)
For example, we can use the following expression to select an instance from S where the link type is either
(3)
For example, we can use the following expression to select an instance from S where the event is either
(4)
For example, we can use the following expression to select an instance from S where time of having supper is between 19:00:00 and 22:00:00 and locationID is 2:
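As an illustration only, the select operation can be mimicked in a few lines of Python. The sketch assumes a simplified instance made of an event dictionary and a set of `(source, link_type, target)` triples, and it assumes that a selected subinstance keeps only links whose two endpoints are both selected. The names `Event`, `select`, `event_pred`, and `link_types` are hypothetical, and the sample data are made up.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Event:
    event_id: str
    event_type: str
    start: time
    location_id: int


def select(events, links, event_pred=None, link_types=None):
    """Return the sub-instance (events, links) that satisfies the conditions.

    events: dict event_id -> Event
    links: set of (source_id, link_type, target_id)
    event_pred: optional predicate on Event
    link_types: optional set of admissible link types
    """
    kept = {i: e for i, e in events.items()
            if event_pred is None or event_pred(e)}
    kept_links = {
        (src, ltype, dst) for (src, ltype, dst) in links
        if src in kept and dst in kept
        and (link_types is None or ltype in link_types)
    }
    if link_types is not None:
        # With a link-type condition, keep only events that still carry a link.
        touched = {src for (src, _, _) in kept_links}
        touched |= {dst for (_, _, dst) in kept_links}
        kept = {i: e for i, e in kept.items() if i in touched}
    return kept, kept_links


# Made-up sample instance.
events = {
    "e1": Event("e1", "Having supper", time(19, 30), 2),
    "e2": Event("e2", "Having supper", time(20, 15), 2),
    "e3": Event("e3", "Having supper", time(23, 0), 2),
    "e4": Event("e4", "Having supper", time(19, 45), 5),
}
links = {
    ("e1", "sameTypeOf", "e2"),
    ("e2", "sameTypeOf", "e3"),
    ("e1", "sameTypeOf", "e4"),
}

# Supper between 19:00:00 and 22:00:00 at locationID 2.
sub_events, sub_links = select(
    events, links,
    event_pred=lambda e: time(19, 0) <= e.start <= time(22, 0) and e.location_id == 2,
)
```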
3.2. Intersection
Intersection querying operation is to find the common instance between two specific instances
Let
For example, we can use the following expression to find the instance whose time is later than 19:00:00 in S and earlier than 22:00:00 in S:
3.3. Union
Union querying operation is to find the instance which exists either in a specific instance
Let
For example, we can use the following expression to find the instance whose time is later than 19:00:00 in S or later than 22:00:00 in S:
3.4. Subtract
Subtract querying operation is to find the instance which exists in a specific instance
Let
For example, we can use the following expression to find the instance whose time is later than 19:00:00 but earlier than 22:00:00:
It is worth noting that select querying can be transformed into intersection querying, union querying, and subtract querying, but not vice versa. For example, let
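One way to see the relation between select and the set-style operations is the Python sketch below. It assumes an instance is a pair `(events, links)`, with `events` mapping an event identifier to its start time and `links` a set of `(source, link_type, target)` triples, and it assumes that every operation keeps only links whose endpoints both survive. All function names and sample values are hypothetical.

```python
from datetime import time


def induced_links(event_ids, links):
    # Keep only links whose two endpoints are both in event_ids.
    return {(s, t, d) for (s, t, d) in links if s in event_ids and d in event_ids}


def select(instance, pred):
    events, links = instance
    kept = {i: v for i, v in events.items() if pred(v)}
    return kept, induced_links(kept, links)


def intersection(a, b):
    events = {i: v for i, v in a[0].items() if i in b[0]}
    return events, a[1] & b[1]


def union(a, b):
    return {**a[0], **b[0]}, a[1] | b[1]


def subtract(a, b):
    events = {i: v for i, v in a[0].items() if i not in b[0]}
    return events, induced_links(events, a[1] - b[1])


# Made-up instance S: four events identified by start time.
S = (
    {"e1": time(18, 30), "e2": time(19, 30), "e3": time(21, 0), "e4": time(22, 30)},
    {("e1", "sameTypeOf", "e2"), ("e2", "sameTypeOf", "e3"), ("e3", "sameTypeOf", "e4")},
)

later19 = select(S, lambda t: t > time(19, 0))
earlier22 = select(S, lambda t: t < time(22, 0))
between = select(S, lambda t: time(19, 0) < t < time(22, 0))

# A conjunctive select equals the intersection of two simpler selects.
same_as_intersection = intersection(later19, earlier22) == between
# The same result is obtained by subtracting the events at or after 22:00:00.
same_as_subtract = subtract(later19, select(S, lambda t: t >= time(22, 0))) == between
```

On this sample both comparisons hold, which matches the direction of the transformation stated above: a select with a compound condition is rewritten as set operations over selects with simpler conditions.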
4. Case Study: Event-Linked Network of Smart Home
4.1. Scenario
A smart home is an intelligent agent that perceives the state of the resident and the physical environment using various kinds of sensors, such as temperature sensors, motion sensors, light controls, and door state sensors. These sensors can capture massive detailed data about an individual's activities, environment settings, and inhabitants' characteristics. In this section, we illustrate how to construct the web of events with the proposed models by using the open datasets from the House_n (http://architecture.mit.edu/house_n/) smart home project at MIT, which uses a set of small, simple state-change sensors. The sensors are designed to be “tape on and forget” devices that can be quickly and ubiquitously installed in home environments.
Two datasets (http://courses.media.mit.edu/2004fall/mas622j/04.projects/home/) were collected for two different subjects, each of whom lived alone in a one-bedroom apartment. Herein, we use only the dataset of one subject, a 30-year-old professional woman who spent her free time at home. In total, 77 state-change sensors were installed in the bathroom, bedroom, kitchen, living room, porch, and other household locations. The sensors were left unattended, collecting data for 16 days in the apartment. During the study, the subject used the context-aware ESM to create a detailed record of her activities as a sampling dataset.
4.2. Event Types
Event extraction (i.e., activity recognition) discovers and recognizes event (activity) information from the detailed raw data in the Internet of Things by using methods such as machine learning, data mining, and statistical methods [19, 20, 27]. Our schema depends on the sampling dataset. First, we work out the event schema of the smart home, which involves a series of event types: Bathing, Cleaning, Doing laundry, Going out to work, Dressing, Preparing a beverage, Preparing a snack, Preparing breakfast, Preparing lunch, Preparing dinner, Toileting, and Washing dishes. A duration threshold for each event type is necessary to identify an event. However, different individuals could have different durations for each event type; that is, an individual might have his/her personal event schema in a smart home. Thus, a personalized event schema should be defined according to the individual's annotated historical sample dataset. Herein, we define the duration threshold of each event type as the minimum duration in the sample dataset retrieved through a specified survey.
The event types are listed as follows.
(1) Bathing: ((duration ≥ 1 min, 55 sec) ∧ (locationName = Bathroom) ∧ (objectName = Shower faucet ∨ objectName = Sink faucet-hot ∨ objectName = Sink faucet-cold), (date, start_time, end_time, duration, location, Bathing)).
Description. Bathing is a type of events which occurred in the bathroom (monitored by motion sensors) and the duration of either of the objects (including shower faucet or sink faucet (hot) or sink faucet (cold)) is monitored for no less than 1 min 55 sec. Once an event of Bathing occurs, events information including date, start_time, end_time, duration, location, and Bathing should be recorded.
(2) Cleaning: ((duration ≥ 3 min, 22 sec) ∧ (locationName = Kitchen) ∧ (objectName = Garbage_disposal), (date, start_time, end_time, duration, location, Cleaning)).
Description. Cleaning is a type of events which occurred in the kitchen (monitored by motion sensors for no less than 3 min 22 sec). The monitored object is garbage disposal. Once a Cleaning event occurs, events information including date, start_time, end_time, duration, location, and Cleaning should be recorded.
(3) Doing laundry: ((duration ≥ 43 sec) ∧ (locationName = Kitchen) ∧ (objectName = Laundry Dryer), (date, start_time, end_time, duration, location, Doing_laundry)).
Description. Doing laundry is a type of events which occurred in the kitchen (monitored by motion sensors for no less than 43 sec) and the monitored object is laundry dryer. Once a Doing laundry event occurs, events information including date, start_time, end_time, duration, location, and Doing_laundry should be recorded.
(4) Going out to work: ((duration ≥ 1 min 16 sec) ∧ (locationName = Foyer) ∧ (objectName = Door), (date, start_time, end_time, duration, location, Going_out_to_work)).
Description. Going out to work is a type of events which occurred in the foyer (monitored by door sensors for no less than 1 min 16 sec) and door is the main monitored object. Once a Going out to work event occurs, events information including date, start_time, end_time, duration, location, and Going_out_to_work should be recorded.
(5) Dressing: ((duration ≥ 1 min 3 sec) ∧ (locationName = Bedroom) ∧ (objectName = Jewelry box ∨ objectName = Drawer ∨ objectName = Light switch), (date, start_time, end_time, duration, location, Dressing)).
Description. Dressing is a type of events which occurred in the bedroom (monitored by motion sensors for no less than 1 min 3 sec) and monitored objects include jewelry box, drawer, and light switch. Once a Dressing event occurs, events information including date, start_time, end_time, duration, location, and Dressing should be recorded.
(6) Preparing a beverage: ((duration ≥ 30 sec) ∧ (locationName = Kitchen) ∧ (objectName = Coffee machine), (date, start_time, end_time, duration, location, Preparing_a_beverage)).
Description. Preparing a beverage is a type of events which occurred in the kitchen (monitored by motion sensors for no less than 30 sec) and coffee machine is the main monitored object. Once a Preparing a beverage event occurs, events information including date, start_time, end_time, duration, location, and Preparing_a_beverage should be recorded.
(7) Preparing a snack: ((duration ≥ 39 sec) ∧ (locationName = Kitchen) ∧ (objectName = Refrigerator ∨ objectName = Cereal), (date, start_time, end_time, duration, location, Preparing_a_snack)).
Description. Preparing a snack is a type of events which occurred in the kitchen (monitored by motion sensors for no less than 39 sec) and the monitored objects include refrigerator and cereal. Once a Preparing a snack event occurs, events information including date, start_time, end_time, duration, location, and Preparing_a_snack should be recorded.
(8) Preparing breakfast: ((duration ≥ 1 min 59 sec) ∧ (locationName = Kitchen) ∧ (objectName = Light switch ∨ objectName = Microwave ∨ objectName = Oven ∨ objectName = Toaster ∨ objectName = Burner), (date, start_time, end_time, duration, location, Preparing_breakfast)).
Description. Preparing breakfast is a type of events which occurred in the kitchen (monitored by motion sensors for no less than 1 min 59 sec) and the monitored objects include light switch, microwave, oven, toaster, and burner. Once a Preparing breakfast event occurs, events information including date, start_time, end_time, duration, location, and Preparing_breakfast should be recorded.
(9) Preparing lunch: ((duration ≥ 7 min, 52 sec) ∧ (locationName = Kitchen) ∧ (objectName = Light switch ∨ objectName = Microwave ∨ objectName = Oven ∨ objectName = Toaster ∨ objectName = Burner), (date, start_time, end_time, duration, location, Preparing_lunch)).
Description. Preparing lunch is a type of events which occurred in the kitchen (monitored by motion sensors for no less than 7 min 52 sec) and the monitored objects include light switch, microwave, oven, toaster, and burner. Once a Preparing lunch event occurs, events information including date, start_time, end_time, duration, location, and Preparing_lunch should be recorded.
(10) Preparing dinner: ((duration ≥ 7 min 16 sec) ∧ (locationName = Kitchen) ∧ (objectName = Light switch ∨ objectName = Microwave ∨ objectName = Oven ∨ objectName = Toaster ∨ objectName = Burner), (date, start_time, end_time, duration, location, Preparing_dinner)).
Description. Preparing dinner is a type of events which occurred in the kitchen (monitored by motion sensors for no less than 7 min 16 sec) and the monitored objects include light switch, microwave, oven, toaster, and burner. Once a Preparing dinner event occurs, events information including date, start_time, end_time, duration, location, and Preparing_dinner should be recorded.
(11) Toileting: ((duration ≥ 24 sec) ∧ (locationName = Bathroom) ∧ (objectName = Toilet Flush), (date, start_time, end_time, duration, location, Toileting)).
Description. Toileting is a type of events which occurred in the bathroom (monitored by motion sensors for no less than 24 sec) and the monitored object is toilet flush. Once a Toileting event occurs, events information including date, start_time, end_time, duration, location, and Toileting should be recorded.
(12) Washing dishes: ((duration ≥ 1 min 36 sec) ∧ (locationName = Kitchen) ∧ (objectName = Dishwasher), (date, start_time, end_time, duration, location, Washing_dishes)).
Description. Washing dishes is a type of events which occurred in the kitchen (monitored by motion sensors for no less than 1 min 36 sec) and the monitored object is dishwasher. Once a Washing dishes event occurs, events information including date, start_time, end_time, duration, location, and Washing_dishes should be recorded.
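Each event type in the list above pairs a filter (duration threshold, location, monitored objects) with a record format. A minimal Python sketch of the filter part is given below for three of the twelve types. The class `EventType`, the function `classify`, and the flat record shape `(duration, location, object)` are assumptions made for illustration and do not reproduce the stored procedures used in the case study.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class EventType:
    name: str
    min_duration: timedelta
    location: str
    objects: frozenset

    def matches(self, duration, location, obj):
        # The filter is a conjunction: threshold, location, and object must all hold.
        return (duration >= self.min_duration
                and location == self.location
                and obj in self.objects)


# Thresholds, locations, and objects copied from event types (1), (11), and (12).
EVENT_TYPES = [
    EventType("Bathing", timedelta(minutes=1, seconds=55), "Bathroom",
              frozenset({"Shower faucet", "Sink faucet-hot", "Sink faucet-cold"})),
    EventType("Toileting", timedelta(seconds=24), "Bathroom",
              frozenset({"Toilet Flush"})),
    EventType("Washing dishes", timedelta(minutes=1, seconds=36), "Kitchen",
              frozenset({"Dishwasher"})),
]


def classify(duration, location, obj):
    """Return the names of all event types whose filter the record satisfies."""
    return [t.name for t in EVENT_TYPES if t.matches(duration, location, obj)]


long_shower = classify(timedelta(minutes=2), "Bathroom", "Shower faucet")
short_flush = classify(timedelta(seconds=10), "Bathroom", "Toilet Flush")
```

Here `long_shower` contains only Bathing, while `short_flush` is empty because 10 sec is below the 24 sec threshold of Toileting.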
4.3. Event-Link Types
Then we can work out possible link types among the above event types. Herein, for the consideration of simplification, only five link types between any two event types are defined according to the actual data. Link types and their filters are listed as follows.
(1) d-succeeding. The semantic link d-succeeding from an event
(2) co-occur. The semantic link co-occur between two events
(3) overlap. The semantic link overlap between two events
(4) sameTypeOf. There exists a semantic link sameTypeOf between two events if they are with the same event type. We can easily find the sameTypeOf links between events according to their types.
(5) causeOf. The semantic link causeOf from an event
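Link extraction amounts to testing each pair of events against the filter of a link type. The sketch below covers only sameTypeOf, whose filter is fully stated above, and adds a hypothetical interval test, `intervals_intersect`, as one possible building block for time-based link types. That helper is an assumption, and the actual filters for co-occur and overlap may differ from it.

```python
from itertools import combinations


def same_type_links(events):
    """events: dict event_id -> event_type.

    Returns one (a, "sameTypeOf", b) triple per unordered pair of
    events that share an event type.
    """
    return {
        (a, "sameTypeOf", b)
        for a, b in combinations(sorted(events), 2)
        if events[a] == events[b]
    }


def intervals_intersect(start_a, end_a, start_b, end_b):
    # Hypothetical helper: True when two closed time intervals share a moment.
    return start_a <= end_b and start_b <= end_a


# Made-up sample events.
sample = {"e1": "Toileting", "e2": "Bathing", "e3": "Toileting"}
found = same_type_links(sample)
```

Sorting the identifiers before pairing them gives each unordered pair a single canonical triple, so a symmetric link is not stored twice.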
4.4. Instance of Event-Linked Network for Smart Home
(1) Data Preprocessing. We develop a group of stored procedures in SQL Server 2008 to process the data. From the dataset downloaded from http://courses.media.mit.edu/2004fall/mas622j/04.projects/home/ , which records the daily life of a professional woman over 16 days in her apartment, it is easy to generate a dataset with a strict format of (date, week, start_time, end_time, duration, sensorID, locationID, objectID) as shown in Table 1.
Table 1: The dataset after preprocessing.
(2) Event Extracting and Presentation. In SQL Server 2008, we have developed a stored procedure to extract the event information according to the above defined event types. A total of 217 events of 9 types have been extracted from the dataset as shown in Table 2.
Table 2: Extracted events.
We have developed a Java procedure to extract the four kinds of link information according to the above defined link types. To represent the link information of the 217 extracted events, we use the social network analysis tool NetDraw (http://www.analytictech.com/Netdraw/netdraw.htm) for drawing graphs of link information, as shown in Figure 1. Different colors of nodes reflect different event types. Figure 1 shows a total of five different link types: d-succeeding, co-occur, overlap, sameTypeOf, and causeOf.

Figure 1: The event-linked network for smart home.
4.5. Querying Example
In our smart home scenario,
(1) Select
Find the ELN where event type is Preparing dinner. We can use the following expression to complete this querying:
There are 15 events and 18 links in the result as shown in Figure 2. Find the ELN where event type is Washing dishes and link type is d-succeeding. We can use the following expression to complete this querying:
There are 39 events and 30 links in the result as shown in Figure 3. Find the ELN where event is $e_{174}$ or $e_{176}$. We can use the following expression to complete this querying:
There are 9 events and 8 links in the result as shown in Figure 4. Find the ELN where time is between 19:00:00 and 22:00:00 and link type is co-occur. We can use the following expression to complete this querying:
There are 7 events and 7 links in the result as shown in Figure 5.

Figure 2: Example of finding the ELN where event type is Preparing dinner.

Figure 3: Example of finding the ELN where event type is Washing dishes and link type is d-succeeding.

Figure 4: Example of finding the ELN where event is $e_{174}$ or $e_{176}$.

Figure 5: Example of finding the ELN where time is between 19:00:00 and 22:00:00 and link type is co-occur.
(2) Intersection. Using

Example-
(3) Union. Using

Example-
(4) Subtract. Using

Example-
5. Conclusion
Sensing techniques have greatly promoted the emergence and development of the Internet of Things. Massive sensing data that are closely related to our life are generated continuously. How to organize and how to query the big sensing data are big challenges for intelligent applications. This paper studies the organization of big sensing data with the event-linked network model. We also propose a query mechanism on the event-linked network which differs from that of the traditional relational database. A smart home instance is developed to show the effectiveness and efficiency of the organization and query approaches based on the event-linked network. This work is useful for organizing and querying the sensing data in the Internet of Things. In future work, we would like to address storage schemes, indexing mechanisms, and efficient solutions for queries on large-scale event-linked networks.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This research is sponsored by the National Natural Science Foundation of China (61371185, 61003225, 61171014) and the ISTIC Research Foundation Projects XK2014-6.
