Matching State Estimation Scheme for Content-Based Sensor Search in the Web of Things

Abstract

More recently, an increasing number of object-attached sensors are publishing their real-time state on the Internet by using state-of-the-art Web technologies, which make the sensor search service extremely important for the Web of Things (WoT). However, the existing issues that the sensor search service is facing bring huge challenges to the design of matching state estimation scheme. In this paper, an architecture of high-efficiency content-based sensor search system is depicted to provide a prototype system for sensor search. And then a matching state estimation scheme is proposed in detail, including a sensor state prediction approach to accurately estimate future sensor readings and a match estimating and verifying approach to effectively classify and verify candidate sensors, in order to enhance the performance of our search system. Simulation results show that our matching state estimation scheme dramatically reduces the communication overhead of search system and achieves excellent performance in terms of recall ratio and precision ratio.

1. Introduction

Nowadays, a growing number of sensors and sensor networks have been connected to the Internet and World Wide Web, such that the real-world entities associated with sensors can be observed by using a standard Web browser [1]. The novel Web-based services, like pachube.com [2], Xively [3], and Microsoft SenseWeb [4], offer application programming interfaces (APIs) for publishing structured sensor data on the Web instantaneously, such as the moisture and temperature information of plant growth environment. Moreover, real-world objects are beginning to publish real-time data, with respect to their state, on the Web. One of the representative cases is Bicing [5], which is a public bicycle-sharing system in Barcelona. Bicing enables users to obtain the number of available bicycles in real time via the Web to facilitate travel. The WoT, which is based on the REST principle, extends the scope of traditional document-centric Web to a generic interface of the real world by providing each real-world entity with a Web representation which can be accessed via lightweight APIs [6, 7].

The sensor search service in the new fashioned WoT has become equally important just as the search service for documents on the conventional Web [8]. However, classical search engines, such as Baidu, Yahoo, and Google, are primarily utilized to search for static or slowly varying unstructured data like Web pages, pdf, doc files, and multimedia [9, 10]. Conversely, the search objects in the WoT are massive sensors whose state is rapidly changing [11]. Additionally, whenever a classical search engine responds to a search query, it needs to access all available objects, which leads to tremendous communication overhead. It is impractical because of the scarce energy resources in sensor networks and frequent communications will result in the rapid death of sensors [12, 13].

A further enormous challenge for sensor search in the WoT is that the majority of existing search systems lack the support of the search for sensors that meet a given query range. It is evident that the output information of most sensors in the WoT is in the form of numerical value; thus the value-based search mode becomes extremely important for the WoT [14]. However, sensor measurements always slightly fluctuate over time as the state of monitored entities usually changes quickly and marginally. Hence, searching for sensors in real time based on a specific value is of limited benefit. Much research must be pursued in the direction of novel value-based search modes.

From those preceding discussions, we conclude that conventional document-centric Web search engines are not directly applicable to the WoT and novel efficient content-based sensor search systems are eagerly to be proposed. The remaining parts of this paper are organized as follows. Section 2 gives an overview of the related work in the WoT. Section 3 presents an architecture of the proposed MSE sensor search. Section 4 proposes a sensor state prediction approach. Section 5 designs a match estimating and verifying approach. Section 6 discusses the performance evaluation. Section 7 finally concludes the paper.

2. Related Work

As a key research focus, the search engine technology has been studied for many years. Existing search engine techniques, like the keyword-based and catalogue-based search engines, are relatively mature by being used in multiple kinds of commercial applications [15]. However, the search systems mentioned above concentrate more on the search for documents (Web pages, videos, pictures, blogs, etc.) [16, 17], rather than object-attached sensors. Nowadays, numerous sensors have already been connected to the WoT; thus the requirements to search for real-world entities according to their perceived real-time state are explosively growing, which fuel an explosive growth in the sensor search technology [18]. However, the existing related research, especially effective matching state estimation scheme, is very limited and the development of sensor search technique in the WoT is still in its infancy [19].

Until recently, several sensor search prototype systems have been proposed, such as Snoogle [20], MAX [21], and Microsearch [22]. Their core ideas are that some textual descriptions related to real-world entities are stored in their associated sensors in the form of keywords. Hence, the preceding search systems are based on keyword match and only support the search for pseudostatic metadata rather than dynamic sensor measurements. In [23] a social-aware distributed Cyber-Physical Human-Centric Search (SCPS) engine is presented, which leverages the social network formed by handheld devices to search for real-world entities. SCPS locates human-carried entities according to uses' routine movement models. Once exceptional events that deviate from the routine model occur, SCPS will integrate Bayesian network into social network model to estimate the locations of deflected objects. However, the research issue of SCPS focuses on locating human-centric entities; the same as aforementioned prototype systems, SCPS cannot be directly used for the search pattern with a given measurement.

In order to search for real-world entities based on their observed measurements, Dyser [24], a novel sensor search engine, is introduced. Dyser provides each sensor and entity with a Web page, which is identified by a URL and accessed by using HTTP. Dyser consists of a resolver, which processes search queries, an index, which stores indexed metadata of sensors and entities, and an indexer, which periodically crawls sensor and entity pages. However, Dyser assumes that sensor readings always exhibit a high degree of periodicity. Moreover, the sensor output content in Dyser is not a raw sensor measurement but high-level state acquired by the means of data fusion. Nevertheless, there are numerous low-cost sensors in the WoT with relatively low data fusion capability and the assumption of periodicity also constrains its application area to particular scenarios.

In response to above problems, a content-based sensor search engine (CSS) is proposed in [25]. In CSS, an architecture to realize content-based search is presented, including a number of sensor gateways. Each gateway manages one or more sensor networks. The prediction model database is also included, which is associated with a sensor gateway. The light-weight prediction model, based on fuzzy logic, is applied to estimate the probability that a sensor matches with a given search range, instead of a concrete value. Sensor gateways in CSS only need to access sensors with high matching probabilities; thus CSS supports the search for sensors whose state satisfies a given query range at the cost of low communication overhead. However, the prediction model in CSS is time-independent, implying that once the query range is given the matching probability of a sensor in the same period is constant. CSS thereby ignores the fact that sensor output values are always changing, which leads to a relatively low prediction accuracy of matching probability and a correspondingly high communication overhead. Besides, CSS overlooks the search needs to sensors whose future state will meet the query requirements. Furthermore, CSS lacks the support for returning all the sensors that meet the search needs to the user.

To solve these existing problems, we propose the matching state estimation scheme for sensor search in the Web of Things (MSE). The main contributions of this paper include the following: (1)

An architecture of MSE is designed to efficiently search for sensors in the WoT. We depict the architecture of MSE, including the main modules and the functions of each module. We also propose two types of search modes, the reliable MSE search (R-MSE) and the proactive MSE search (P-MSE), to fully satisfy the search needs which present strong randomness and uncertainty with respect to the query time and the number of returned search results. To the best of our knowledge, this is the first research paper that summarizes the search modes in the WoT according to the query time and the number of sought results.

(2)

A sensor state prediction approach is proposed to estimate the short-term state of sensors in order to find candidate ones. The sensor state prediction approach is the core algorithm of our prediction model, which is based on raw sensor data and without any assumptions regarding periodicity in sensor measurements. Besides, this prediction approach can better exploit the temporal correlations between the sensor data and accurately perceive the variation trend of future sensor readings.

(3)

A match estimating and verifying approach is presented to estimate the matching state of candidate sensors and efficiently verify their virtual state, so as to ensure the reliability of search results. The proposed approach contains a sensor classifying method, which classifies sensors into three categories based on their predicted state to find all match sensors, a sensor ranking method, which assesses the matching probabilities of match sensors and sorts them in an effective way, and a sensor verifying method to check match sensors with minimal communication overhead.

3. Architecture of MSE Sensor Search

In this part we depict the architecture of our MSE sensor search system and present the main components, encompassing the functions of each component. Then we propose two kinds of search modes, which have never been fully considered in the preceding research. Furthermore, we, respectively, design the interactive processes between these components in MSE system for the two search modes.

The MSE system consists of four types of modules: client, gateway, prediction model database (PMD), and wireless sensor networks (WSNs). The user submits a sensor search query via the client. WoT gateways distribute across the Internet in a two-tiered hierarchy, which are responsible for further issuing the search query and managing one or more WSNs. Each PMD is associated with a WoT gateway and stores prediction models of each sensor. Real-world entities are observed by their embedded sensors in WSN. The architecture of MSE system is plotted as Figure 1.

Figure 1

The architecture of MSE search system.

As is described above, WoT gateways present a two-tiered structure, the upper layer gateway named global gateway which is responsible for responding to the search query and issuing the query to appropriate lower gateway, according to some distinguishing information provided by the user, such as geographical locations like “ETH Zurich,” and the type of desired sensors as “Temperature.” The lower gateway, called local gateway, crawls the prediction models of sensors periodically and stores them to its related PMD.

When executing a search query, classical search engines are accustomed to searching for the desired targets by accessing all available objects, which spend considerable time and cause vast communication overhead. Hence, the traditional search pattern results in poor search efficiency. In the MSE search system, we establish a prediction model based on raw data for each sensor to improve the search performance. The prediction model is built in a distributed manner to predict the sensor output value at the query time and estimate the matching probability of a sensor with the given search range. Thus MSE only needs to communicate with sensors that have relatively high matching probabilities instead.

3.1. R-MSE Sensor Search

The user performs a content-based sensor search query via the client. Above, the state of monitored real-word objects always changes; thus the measurements of attached sensors usually fluctuate over time. Therefore, searching for sensors based on a given specific value is of limited use. We define the search query as $q = ([a, b], t_{q}, (R_{q}))$ instead, which denotes that the user desires to search for sensors whose readings belong to the range $[a, b]$ at the query time $t_{q}$ . $R_{q}$ constrains the number of search results that need to be returned to the user. It also can be unspecified, which implies that the search system should return all match sensors. $t_{q}$ is a random value within the range $[0, ω]$ , where ω is the crawling period of local gateway.

Depending on the query time $t_{q}$ and the number of search results $R_{q}$ we, respectively, propose the corresponding search processes. Provided that the user desires to search for sensors whose current measurements are within the range $[a, b]$ , which we call the current search, and the number of sought sensors $R_{q}$ is given, then we design the R-MSE sensor search which contains the verification process to ensure the reliability of search results as shown in Figure 2.

Figure 2

The interaction process of R-MSE.

At first, sensors, respectively, establish their own prediction models based on the sensor state prediction approach which will be proposed below. Then the local gateway periodically crawls the prediction models of each managed sensor and stores these models in its relevant PMD. When a global search query arrives, the global gateway issues a local search command to a suited local gateway after resolving the differentiated items submitted by the user. The local gateway initiates a matching query to its associated PMD to classify and sort candidate sensors. After that, the local gateway forms a local ranking list (LRL), which presents match sensors in descending order based on their matching probabilities, and sends to its upper global gateway. Then the global gateway reorders the sensors in these lists and publishes a verification command to the corresponding local gateway to further selectively verify the actual status of the chosen sensor. Afterwards, the local gateway transmits the verification results to the global gateway which will return the final search results to the user in the end.

3.2. P-MSE Sensor Search

The user may be also interested in searching for the sensor whose future state will belong to the query range $[a, b]$ , which is called the future search. Besides, specifying the option $R_{q}$ for each search process could place a large burden on the user. In some cases, the user also needs to obtain all match sensors. Therefore, we design the P-MSE sensor search to address the above issues. P-MSE provides users with the ability of taking actions in advance while facing the natural disasters and extreme weather, or the foresight of defining their own application-specific search, and largely mitigates the user's burden, instead of offering the reliability of search results due to the reason that we cannot verify the future state of sensors beforehand. The search process of P-MSE is plotted below.

As is depicted in Figure 3, we remove the verification operation from the P-MSE search process. Instead, the global gateway returns the top- $R_{q}$ sensors in the submitted LRLs as the final search results. Note that P-MSE will return all match sensors to the user when performing the current search without defining the parameter $R_{q}$ , exclusive of the verification process, because checking all match sensors will cause huge communication overhead in comparison with the search mode with specified results.

Figure 3

The interaction process of future search.

4. Sensor State Prediction Approach

The search objects in the WoT are massive object-attached sensors whose state may change rapidly. In order to guarantee the timeliness and reliability of search results, we present a sensor state prediction approach in this section to estimate the variation trend of sensor state. When a search query arrives, these candidate sensors will be selectively accessed according to their predicted state. Thus the communication overhead of the search system can be obviously reduced.

In the vast majority of sensor search applications, it is impractical to obtain the complete information about the perceived objects, due to the limited perception and storage capacity of sensors. Sensor search systems whose partial information is unknown can be considered as the Grey system [26]. Therefore, we estimate the sensor state by using the identical dimension grey recurrence dynamical GM(1,1) model, which is proposed in [27]. The prediction approach is dynamically updated by removing the stale data from the sensor state sequence and adding the new data which reflect the latest tendency, so as to improve the precision of our prediction approach.

We define the original sensor state sequence of senor s as

\begin{matrix} x^{(0)} = \{x^{(0)} (1), x^{(0)} (2), \dots, x^{(0)} (n)\}, \end{matrix}

(1)

where

x^{(0)} (i)

represents the sensor state value at the time i and n denotes the embedding dimension, which means the amount of sensor data required for a single prediction process. In order to reduce the influence of random disturbance on the data prediction process and acquire the temporal correlations between the sensor data, we carry out the accumulated generating operation on the original sensor state sequence

x^{(0)}

as follows:

\begin{matrix} x^{(1)} (k) = \sum_{i = 1}^{k} x^{(0)} (i) (k = 1,2, \dots, n), \\ x^{(1)} = \{x^{(1)} (1), x^{(1)} (2), \dots, x^{(1)} (n)\} . \end{matrix}

(2)

Then we establish the albinism differential equation of $x^{(1)}$ to exploit the regularity and temporal redundancy between the sensor data in the accumulative sequence $x^{(1)}$ :

\begin{matrix} \frac{d x^{(1)}}{d t} + a x^{(1)} = u, \end{matrix}

(3)

where a is a developing coefficient which reflects the increasing rate of the accumulative sequence

x^{(1)}

. u means the grey action that indicates the changing relationships between the data in

x^{(1)}

. For facilitating the computation, we disperse (3) to gain its difference equation as

\begin{matrix} \frac{Δ x}{Δ t} = \frac{x^{(1)} (k + 1) - x^{(1)} (k)}{k + 1 - k} = x^{(1)} (k + 1) - x^{(1)} (k) = x^{(0)} (k + 1) . \end{matrix}

(4)

Partial samples of observed real-world entity might be lost because of the component damage or system failure. Hence, we need to fill the imperfect original sequence to reduce the impact of missing samples on the data prediction process. The data filling method is designed as

\begin{matrix} z^{(1)} (k + 1) = 0.5 x^{(1)} (k + 1) + 0.5 x^{(1)} (k) . \end{matrix}

(5)

Then (3) can be further transformed as

\begin{matrix} x^{(0)} (k + 1) + a z^{(1)} (k + 1) = u, k = 1,2, \dots, n - 1 . \end{matrix}

(6)

We, respectively, construct the sensor state accumulative matrix B and the constant term vector Y as follows:

\begin{matrix} B = [\begin{bmatrix} - 0.5 [x^{(1)} (1) + x^{(1)} (2)] & 1 \\ ⋮ & ⋮ \\ - 0.5 [x^{(1)} (n - 1) + x^{(1)} (n)] & 1 \end{bmatrix}], \\ Y = {[x^{(0)} (2), x^{(0)} (3), \dots, x^{(0)} (n)]}^{T} . \end{matrix}

(7)

By adopting the least square method, we estimate the developing coefficient a and the grey action u as

\begin{matrix} [\begin{bmatrix} a \\ u \end{bmatrix}] = {(B^{T} B)}^{- 1} B^{T} Y . \end{matrix}

(8)

The discrete solution of albinism differential equation can be deduced via substituting the estimated values $\hat{a}$ and $\hat{u}$ into (3):

\begin{matrix} x^{(1)} (k + 1) = [x^{(0)} (1) - \frac{\hat{u}}{\hat{a}}] e^{- \hat{a} k} + \frac{\hat{u}}{\hat{a}}, k = 0,1, 2, \dots . \end{matrix}

(9)

After conducting the inverse accumulated generating operation on the discrete solution mentioned above, we obtain the prediction function of the original sensor state sequence as follows:

\begin{matrix} {\hat{x}}^{(0)} (k + 1) = (e^{- a} - 1) [x^{(0)} (1) - \frac{\hat{u}}{\hat{a}}] e^{- \hat{a} k}, k = 1,2, \dots, \end{matrix}

(10)

where

{\hat{x}}^{(0)} (k + 1)

is the predicted value of sensor s at the time

k + 1

. Hence, we have the predicted value

{\hat{x}}^{(0)} (n + 1)

at the time

n + 1

and then update the original sensor sequence

x^{(0)}

by replacing the outdated sensor state value

x^{(0)} (1)

with the predicted value

{\hat{x}}^{(0)} (n + 1)

, so as to reflect the latest trend of the sensor data and enhance the precision of our prediction function. The updated sensor sequence

x^{{(0)}^{'}}

is described as

\begin{matrix} x^{{(0)}^{'}} = \{x^{(0)} (2), \dots, x^{(0)} (n), {\hat{x}}^{(0)} (n + 1)\} . \end{matrix}

(11)

Similarly, the sensor reading ${\hat{x}}^{(0)} (n + 2)$ at the time $n + 2$ can be calculated following (2)–(10) above. Finally, we obtain the predicted value set of sensor s during the next period below:

\begin{matrix} E S_{P}^{S} = \{{\hat{x}}^{(0)} (n + 1), {\hat{x}}^{(0)} (n + 2), \dots, {\hat{x}}^{(0)} (ω - 1)\}, \end{matrix}

(12)

where

{\hat{x}}^{(0)} (i)

i = n + 1, n + 2, \dots, ω - 1

denotes the predicted sensor reading at the time i. When the local gateway crawls the prediction models of its managed sensors, each sensor reports its predicted value set

E S_{P}^{S}

to its associated local gateway.

5. Match Estimating and Verifying Approach

Estimating the matching state of the sensor whose predicted reading locates around the query boundary is a tough issue, because of the inevitable and uncertain prediction error caused by ambient noise. Besides, a unified measure for evaluating the matching degree of a sensor with the given search content still lacks. Furthermore, verifying every candidate sensor will result in tremendous communication overhead. Therefore, in this section we propose a match estimating and verifying approach to assess the matching state of sensors and verify these sensors with relatively low communication overhead, so as to improve the performance of our search system.

5.1. Sensor Classifying and Ranking Method

When processing the search query without defining the number of sought sensors, we adopt the P-MSE scheme so that the global gateway returns all match sensors to the user, which seems to be straightforward based on the predicted values estimated by their prediction models. However, the prediction error is inevitable because of the impact of random disturbance on the sensor readings. Hence, assessing the matching state of sensors, especially the border ones whose predicted state values are around the search boundary, becomes intricate. Here we propose an adjustable sensor classifying method to address the preceding problem.

The error threshold $e_{γ}$ , which is the maximum difference between the predicted value and the query boundary, is taken into account to classify candidate sensors. The classifying algorithm is designed as

\begin{matrix} E_{s} = \{\begin{cases} PM, & if {\hat{x}}^{(0)} (i) \in (a + e_{γ}, b - e_{γ}) \\ BM, & if {\hat{x}}^{(0)} (i) \in [a - e_{γ}, a + e_{γ}] \cup [b - e_{γ}, b + e_{γ}] \\ NM, & if {\hat{x}}^{(0)} (i) \in (- \infty, a - e_{γ}) \cup (b + e_{γ}, + \infty) . \end{cases} \end{matrix}

(13)

According to the error threshold and query range, we divided the marching state of sensors into three categories. Supposing that a sensor whose predicted value is within the range $(a + e_{γ}, b - e_{γ})$ , we call it a positive match (PM) sensor, which means the sensor owns great matching probability with the search query. A sensor whose predicted output locates in the query scope $[a - e_{γ}, a + e_{γ}] \cup [b - e_{γ}, b + e_{γ}]$ is named as a border match (BM) sensor, indicating a limited matching probability. The remaining ones are negative match (NM) sensors, which imply the matching probabilities of this kind of sensors are extremely low.

After that, the global gateway only returns PM and BM sensors to the user as the search results when responding to the search query with $R_{q}$ unspecified. Note that the error threshold can be dynamically adjusted to adapt to the demand of the search results, which will be demonstrated below.

Obviously, based on the above classifying method, we can easily find that the sensor is a PM or BM sensor. However, distinguishing a more suitable sensor from homogeneous ones is unintuitive. Besides, the user-friendly search results should present the most suitable sensor on the top. Therefore, we further design a sensor ranking method to sort match sensors in an effective way.

Firstly, we map one-dimensional predicted sensor reading into the vector space. Denote ${\hat{x}}^{(0)} (i)$ to be the predicted value of the sensor s at the query time i and then we transform ${\hat{x}}^{(0)} (i)$ to the vector form as follows:

\begin{matrix} \vec{{\hat{x}}^{(0)} (i)} = [|{\hat{x}}^{(0)} {(i)}_{a}|, |{\hat{x}}^{(0)} {(i)}_{b}|], \end{matrix}

(14)

where

|{\hat{x}}^{(0)} (i)_{a}|

denotes the distance between

{\hat{x}}^{(0)} (i)

and the lower bound of query range a. Similarly, we can define the distance

|{\hat{x}}^{(0)} (i)_{b}|

. Thus

|{\hat{x}}^{(0)} (i)_{a}|

and

|{\hat{x}}^{(0)} (i)_{b}|

are calculated as

\begin{matrix} |{\hat{x}}^{(0)} {(i)}_{a}| = |{\hat{x}}^{(0)} (i) - a|, \\ |{\hat{x}}^{(0)} {(i)}_{b}| = |{\hat{x}}^{(0)} (i) - b| . \end{matrix}

(15)

Essentially, we cannot obtain the random error in advance; hence the prediction error is inevitable and unpredictable. However, the fact we confirm is that with the same prediction error the closer the predicted value is to the midpoint of query range, the higher the matching probability of the sensor is. Accordingly, we define the criterion vector as

\begin{matrix} \vec{N_{a}^{b}} = [\frac{(a + b)}{2}, \frac{(a + b)}{2}], \end{matrix}

(16)

where

\vec{N_{a}^{b}}

owns the highest matching probability. Therefore, the matching probability of a given sensor s can be deduced based on the similarity between its predicted vector and the criterion vector, as the following equation describes:

\begin{matrix} P_{m} = \frac{\vec{{\hat{x}}^{(0)} (i)} \cdot \vec{N_{a}^{b}}}{|\vec{{\hat{x}}^{(0)} (i)}| |\vec{N_{a}^{b}}|} = \frac{\sqrt{2} {[|{\hat{x}}^{(0)} {(i)}_{a}|, |{\hat{x}}^{(0)} {(i)}_{b}|]}^{T} [(a + b) / 2, (a + b) / 2]}{(a + b) \sqrt{{|{\hat{x}}^{(0)} {(i)}_{a}|}^{2} + {|{\hat{x}}^{(0)} {(i)}_{b}|}^{2}}} = \frac{(|{\hat{x}}^{(0)} (i) - a| + |{\hat{x}}^{(0)} (i) - b|)}{\sqrt{4 {\hat{x}}^{{(0)}^{2}} (i) - 4 (a + b) {\hat{x}}^{(0)} (i) + 2 a^{2} + 2 b^{2}}} . \end{matrix}

(17)

According to the match probability estimated above, the local gateway sorts its managed match sensors in descending order. Afterwards, it forms a local ranking list $LRL = \{P_{n}, \dots, P_{2}, P_{1}\}$ , $P_{n} > \dots > P_{2} > P_{1}$ and forwards the LRL to its related global gateway for conducting further reordering and verifying processes.

5.2. Ordered Sensor Verifying Method

The optimal search results should exhibit the optimum sought sensors to the user; hence we need to reorder the sensors in the submitted LRLs. Furthermore, R-MSE involves the verifying process to guarantee the reliability of search results, but checking all match sensors will result in tremendous communication overhead. Thus we propose an ordered sensor verifying method to reorder locally optimal results and selectively verify sensors that have high matching probabilities to reduce the communication overhead of the verifying process.

As above, after receiving LRLs, the global gateway will merge these lists and form a global ordered list $GO L_{q}$ . The top-ranked sensors in the list $GO L_{q}$ , $GO L_{q} = ⋃_{k = 1}^{δ} GO L_{q}^{k}$ , will be returned or verified first. The reorder algorithm is

\begin{matrix} \begin{matrix} GO L_{q}^{1} & = MaxPro \{ħ (LR L_{1}), ħ (LR L_{2}), \dots, ħ (LR L_{n})\} \\ GO L_{q}^{2} & = MaxPro \{GO L_{q}^{1} ⊄ \{ħ^{'} (LR L_{1}), ħ^{'} (LR L_{2}), \dots, ħ^{'} (LR L_{n})\}\} \\ ⋮ & ⋮ \\ GO L_{q}^{δ} & = MaxPro \{⋃_{k = 1}^{δ - 1} GO L_{q}^{k} ⊄ \{ħ^{'} (LR L_{1}), ħ^{'} (LR L_{2}), \dots, ħ^{'} (LR L_{n})\}\}, \end{matrix} \end{matrix}

(18)

where

ħ (LR L_{i})

denotes the head of the list

LR L_{i}

and

ħ^{'} (LR L_{i})

means the current head of

LR L_{i}

after excluding the sensor

GO L_{q}^{k}

if the sensor

GO L_{q}^{k}

belongs to the list

LR L_{i}

. Otherwise,

ħ^{'} (LR L_{i})

equals

ħ (LR L_{i})

. The function

MaxPro \{\}

aims to find the sensor that has the maximum matching probability. δ is the number of sensors that need to be returned or verified. Note that if

R_{q}

is undefined, or

R_{q}

is given butthe search mode being performed is the future search, in both cases, we adopt the P-MSE scheme to search for match sensors. Then δ equals the number of sensors that belong to PM and BM, or just equals the given

R_{q}

. Finally, the global gateway returns the sensors in the list

GO L_{q}

to the user, including their respective matching probabilities.

Otherwise, supposing that $R_{q}$ has been specified already and the search mode being executed is the current search, then we perform the R-MSE to search for sought sensors. In this case, the relationship between δ and $R_{q}$ is

\begin{matrix} R_{q} = \sum_{k = 1}^{δ} e (GO L_{q}^{k}), \\ e (GO L_{q}^{k}) = \{\begin{cases} 1, & GO L_{q}^{k} \in [a, b] \\ 0, & GO L_{q}^{k} \notin [a, b], \end{cases} \end{matrix}

(19)

where

e (GO L_{q}^{k})

represents the verification result of the sensor k. If the actual reading of sensor k is within the search range, that is,

e (GO L_{q}^{k})

equals 1, then the sensor k is added to the final search result list

L_{R_{q}}

. Otherwise, if

e (GO L_{q}^{k})

equals 0, the global gateway will exclude the sensor k and continue executing the verifying process.

After deducing the sensor k that needs to be verified, the global gateway will issue a verification command to the local gateway that is responsible for managing the sensor k to check its current actual state. This verifying process lasts until enough match sensors are found. At last, the global gateway returns the final sought sensor list $L_{R_{q}}$ to the user. The pseudocode of MSE scheme is described as shown in Algorithm 1.

Algorithm 1: MSE scheme pseudocode.

(1) Builds the prediction model $E S_{P}^{S}$ ;

(2) Periodically crawls the model;

(3) Stores the model to the PMD;

(4) while (search $q = ([a, b], t_{q}, (R_{q}))$ arrives) do

(5) Accesses PMD;

(6) Specifies error threshold $e_{γ}$ ;

(7) Classifies sensors into PM, BM, and NM;

(8) Converts to vector space ${\hat{x}}^{(0)} (i) \to \vec{{\hat{x}}^{(0)} (i)}$ ;

(9) Acquires the matching probability $P_{m}$ ;

(10) Sorts match sensors $P_{n} > \dots > P_{2} > P_{1}$ ;

(11) return LRLs

(12) Reorders sensors in these lists;

(13) Forms the global ordered list $G O L_{q}$ ;

(14) if ( $R_{q}$ is null ∥ is future search)

(15) Performs P-MSE search:

(16) return all sensors in the list $G O L_{q}$

(17) else

(18) Executes R-MSE search:

(19) Verifies sensors in $G O L_{q}$ :

(20) for $G O L_{q}^{k} \in G O L_{q}$ , $k = 1, \dots, δ$

(21) if ( $G O L_{q}^{k} \in [a, b]$ )

(22) Adds $G O L_{q}^{k}$ to the final list $L_{R_{q}}$ ;

(23) else

(24) continue

(25) end if

(26) end for

(27) return search result list $L_{R_{q}}$

(28) end if

(29) end while

(30) end algorithm

6. Performance Evaluation

In this section we use 2 real-world datasets, IntelLab [28] and NOAA [29], to evaluate the performance of our MSE system. The crawling period of IntelLab and NOAA, respectively, is 10 minutes and 60 hours. The search range $[a, b]$ is randomized within $[x_{Min}, x_{Max}]$ , where $x_{Min}$ is the minimum value of sensor readings in the dataset and $x_{Max}$ is the maximum. The query time $t_{q}$ is randomly distributed within $[0, ω]$ . Simulations are conducted in software environment Matlab R2014a. Each simulation result runs for 20 times.

6.1. R-MSE Simulation Evaluation

In this experiment, we introduce the communication overhead to examine the performance of R-MSE and CSS search engine mentioned in [25], due to the reason that the returned search results are always accurate. The ordered verifying approach, included in R-MSE, needs to communicate with match sensors depending on the global ordered list $GO L_{q}$ , which takes a vast proportion in the communication overhead of search system. Therefore, the entire communication overhead is mainly related with the validity of global ordered list.

The optimal global ordered list should rank all match sensors on the top so as to reduce the communication overhead of search system by minimizing the number of sensors that need to be verified. In order to assess the validity of global ordered list, we introduce a normalized metric $C O_{q}$ [25], which is proportional to the number of sensors that should be verified. For the sensor at rank i, we use $e (i)$ to denote the ranking error, which means the number of nonmatch sensors that rank higher than match ones at rank i. Accordingly, we define the normalized metric that essentially represents the communication overhead as follows:

\begin{matrix} C O_{q} = \frac{\sum_{i = 1}^{δ} e (i)}{R_{q} [δ - R_{q}]} . \end{matrix}

(20)

Note that if the sensor at rank i is a nonmatch sensor, then $e (i)$ equals 0. We, respectively, adopt IntelLab and NOAA to assess the communication overhead of R-MSE and aforementioned CSS system. Evaluation results with different query ranges and simulated days are plotted below.

As is illustrated in Figures 4 and 5, the communication overhead of both R-MSE and CSS presents downward trend when the query range increases. Compared with CSS, R-MSE, respectively, reduces the communication overhead about 21.1% and 20.9% for IntelLab and NOAA. And the variances of R-MSE are a little smaller than CSS. Because the number of match sensors will increase with the query range growing, the effect of random error on the matching state estimation will be reduced. Moreover, the accuracy of our prediction model is much higher than that of CSS, and our ordered verification approach efficiently check sensors after ranking sought sensors on the top. However, the low-accuracy prediction model constructed by CSS ranks many nonmatch sensors ahead of match ones, which generates tremendous communication overhead during the verification process. Therefore, R-MSE can achieve better performance in terms of communication overhead with different query ranges than CSS.

Figure 4

Communication overhead with different query ranges for IntelLab dataset.

Figure 5

Communication overhead with different query ranges for NOAA dataset.

Figures 6 and 7 plot the communication overhead of R-MSE and CSS with different simulated days for IntelLab and NOAA datasets. We see that the communication overhead is reduced about 23.6% and 24.3% by R-MSE severally for Intel and NOAA in comparison with CSS. The variances of R-MSE are much smaller than those of CSS. That is because our state prediction approach can accurately estimate the future readings of sensors and the proposed match estimating and verifying approach effectively assesses the matching probabilities of sensors and then efficiently verifies match sensors based on their matching probabilities. However, the prediction model in CSS is built based on sensor measurements in the previous two periods only, without fully considering the temporal correlations between these sensor data, leading to poor estimation accuracy of matching probability. Thus CSS ranks many nonmatch sensors on the top, which results in vast communication overhead when executing the verification operation. Besides, the number of nonmatch sensors found by CSS may fluctuate. Therefore, compared with CSS, R-MSE locates more suitable sensors at the cost of relatively low communication overhead. And the communication overheads of CSS obtained in several simulations are spread out; then the variances of CSS are larger than R-MSE.

Figure 6

Communication overhead with different simulated days for IntelLab dataset.

Figure 7

Communication overhead with different simulated days for NOAA dataset.

6.2. P-MSE Simulation Evaluation

In order to objectively and exactly assess the search performance of P-MSE, we introduce two typical metrics, the recall ratio and precision ratio, to evaluate our search scheme. However, due to the reason that the CSS search system lacks the support for the search mode with the number of sought sensors undefined, thus we develop an improved CSS, which is called CSS-A, to constrain the number of returned search results by setting a matching probability threshold. More specifically, CSS-A only returns the sensor whose matching probability exceeds the defined threshold. Besides, as is mentioned above, the error threshold has a direct effect on the search performance of P-MSE. Thus in this section we, respectively, evaluate the performance of P-MSE and CSS-A with different error and matching probability thresholds.

As is depicted in Figures 8 and 9, the recall ratio of P-MSE is steadily improved with the error threshold increasing. Conversely, the recall ratio of CSS-A becomes increasingly low when enlarging the matching probability threshold. The variances of the two algorithms are very small and almost equivalent due to the concentrated simulation results. We also observe that the recall ratio of P-MSE is about 34.1% and 59.9% higher than that of CSS-A, respectively, for Intel and NOAA. This is because the growing error threshold increases the number of match sensors; thus more match sensors will be returned to the user. However, the prediction model in CSS-A system is time-independent, which neglects the dynamic characteristics of sensor data, resulting in large random errors while predicting the matching probabilities of sensors. Though enlarging the matching probability threshold seemingly contributes to eliminating more sensors that have low matching probabilities, CSS-A indeed excludes many match sensors due to its low-accuracy matching probability estimation method. Therefore, with an increase in matching probability threshold the recall ratio of CSS-A decreases and that of P-MSE is much higher than CSS-A.

Figure 8

Recall ratio with different error (matching probability) thresholds for IntelLab dataset.

Figure 9

Recall ratio with different error (matching probability) thresholds for NOAA dataset.

Figures 10 and 11 compare the precision ratio of P-MSE and CSS-A with different error (matching probability) thresholds for IntelLab and NOAA datasets. We find that with an increase in error (matching probability) threshold the precision ratio of the two schemes increasingly descends. P-MSE severally enhances the precision ratio about 13.1% and 18.8% for IntelLab and NOAA than CSS-A. The variances of P-MSE are relatively smaller than CSS-A. That is because the growing error threshold increases not only the number of returned match sensors but also nonmatch ones, which raises the proportion of nonmatch sensors to all the returned ones. Thus the precision ratio of P-MSE is slowly decreased. However, the precision of prediction model built by CSS-A is relatively low; hence CSS-A removes a growing number of match sensors while returning more and more nonmatch ones with the matching probability threshold increasing, which result in a drastic downtrend and a poor performance of precision ratio than P-MSE. Moreover, because of the low-precision prediction model, the number of nonmatch sensors returned by CSS-A may be very different, which result in the relatively unstable precision ratio in several simulations. Thus the variances of CSS-A are a little larger than P-MSE.

Figure 10

The effect of error (matching probability) threshold on the precision ratio of IntelLab dataset.

Figure 11

The effect of error (matching probability) threshold on the precision ratio of NOAA dataset.

7. Conclusions

In recent years, massive sensors are being connected to the Internet and accessed by using a standard Web browser. However, the research on sensor search in the WoT is still very scarce. In this paper, we propose a matching state estimation scheme for sensor search in the WoT, which includes these contributions: an architecture of MSE sensor search, a sensor state prediction approach, and a match estimating and verifying approach. The architecture of MSE scheme depicts the prototype of content-based sensor search and presents two kinds of search mode in the WoT. The sensor state prediction approach is designed to build our prediction model to estimate the future sensor readings. The match estimating and verifying approach aims at efficiently classifying and verifying candidate sensors to enhance the performance of our search system. Evaluation results demonstrate the validity of MSE scheme in the area of sensor search.

So far, the research on content-based sensor search in the WoT is just at the stage of prototype designing. The applicable matching state estimation scheme is still very rare. As future work, we plan to improve the practicability of our scheme and apply to the practical sensor search service.

Footnotes

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors wish to appreciate the support from National Natural Science Foundation of China (61170275), Beijing Higher Education Young Elite Teacher Project, Technology Project of Guangdong Province, and Beijing Key Laboratory of Work Safety Intelligent Monitoring.

References

Sheng

Z. G.

Yang

S. S.

Y. F.

Vasilakos

A. V.

McCann

J. A.

Leung

K. K.

A survey on the ietf protocol suite for the internet of things: standards, challenges, and opportunities

IEEE Wireless Communications 2013 20 6 91 98

10.1109/mwc.2013.6704479

2-s2.0-84892614267

http://www.pachube.com

https://xively.com/

Kansal

Nath

Liu

Zhao

SenseWeb: an infrastructure for shared sensing

IEEE Multimedia 2007 14 4 8 13

10.1109/mmul.2007.82

2-s2.0-36248949044

http://www.bicing.com/

Wang

Zhou

Zhao

A survey from the perspective of evolutionary process in the internet of things

International Journal of Distributed Sensor Networks 2015 2015 9

462752

10.1155/2015/462752

Mrissa

Médini

Jamont

Le Sommer

Laplace

An avatar architecture for the web of things

IEEE Internet Computing 2015 19 2 30 38

10.1109/mic.2015.19

Maekawa

Yanagisawa

Sakurai

Kishino

Kamei

Okadome

Context-aware web search in ubiquitous sensor environments

ACM Transactions on Internet Technology 2012 11 3, article 12

10.1145/2078316.2078320

2-s2.0-84857846214

Guo

K. H.

J. H.

Duan

G. H.

DHSR: a novel semantic retrieval approach for ubiquitous multimedia

Wireless Personal Communications 2014 76 4 779 793

10.1007/s11277-013-1327-1

2-s2.0-84901775256

10.

Modi

Bhandari

Desai

Shah

Smart search engine using artificial intelligence

Proceedings of the International Conference and Workshop on Emerging Trends in Technology (ICWET 2011)

February 2011

Mumbai, India

ACM

707 710

10.1145/1980022.1980174

2-s2.0-79958701451

11.

Guinard

Trifa

Wilde

A resource oriented architecture for the web of things

Proceedings of the 2nd International Internet of Things Conference (IoT ’10)

December 2010

Tokyo, Japan

IEEE

1 8

10.1109/iot.2010.5678452

2-s2.0-78751686031

12.

Cai

X. L.

Duan

Y. L.

Yang

C. L.

Bee-sensor-C: an energy-efficient and scalable multipath routing protocol for wireless sensor networks

International Journal of Distributed Sensor Networks 2015 2015 14

976127

10.1155/2015/976127

13.

D. P.

Wang

H. G.

Wang

C. G.

Wang

R. Y.

A hierarchical packet forwarding mechanism for energy harvesting wireless sensor networks

IEEE Communications Magazine 2015 53 8 92 98

10.1109/mcom.2015.7180514

14.

Ding

Z. M.

Chen

Z. K.

Yang

IoT-SVKSearch: a real-time multimodal search engine mechanism for the internet of things

International Journal of Communication Systems 2014 27 6 871 897

10.1002/dac.2647

2-s2.0-84902506604

15.

Tamine-Lechani

Boughanem

Daoud

Evaluation of contextual information retrieval effectiveness: overview of issues and research

Knowledge and Information Systems 2010 24 1 1 34

10.1007/s10115-009-0231-1

2-s2.0-77953915372

16.

Vaamonde

S. R.

Torresani

Fitzgibbon

A. W.

What can pictures tell us about web pages? Improving document search using images

IEEE Transactions on Pattern Analysis and Machine Intelligence 2015 37 6 1274 1285

17.

Y. F.

Wang

Huang

A relation-based search engine in Semantic Web

IEEE Transactions on Knowledge and Data Engineering 2007 19 2 273 281

10.1109/tkde.2007.18

2-s2.0-33846236275

18.

Elahi

B. M.

Römer

Ostermaier

Fahrmair

Kellerer

Sensor ranking: a primitive for efficient content-based sensor search

Proceedings of the International Conference on Information Processing in Sensor Networks (IPSN ’09)

April 2009

San Francisco, Calif, USA

IEEE

217 228

2-s2.0-71049185040

19.

Römer

Ostermaier

Mattern

Fahrmair

Kellerer

Real-time search for real-world entities: a survey

Proceedings of the IEEE 2010 98 11 1887 1902

10.1109/jproc.2010.2062470

2-s2.0-77958150494

20.

Wang

H. D.

Tan

C. C.

Snoogle: a search engine for pervasive environments

IEEE Transactions on Parallel and Distributed Systems 2010 21 8 1188 1202

10.1109/tpds.2009.145

2-s2.0-77954312066

21.

Yap

K.-K.

Srinivasan

Motani

MAX: wide area human-centric search of the physical world

ACM Transactions on Sensor Networks 2008 4 4 26 34

10.1145/1387663.1387672

2-s2.0-51849152754

22.

Tan

C. C.

Sheng

Wang

Microsearch: a search engine for embedded devices used in pervasive computing

Transactions on Embedded Computing Systems 2010 9 4, article 43

10.1145/1721695.1721709

2-s2.0-77950835325

23.

Shen

Liu

Chen

Liu

Moyer

SCPS: a social-aware distributed cyber-physical human-centric search engine

IEEE Transactions on Computers 2015 64 2 518 532

10.1109/tc.2013.211

2-s2.0-84921334095

24.

Ostermaier

Römery

Mattern

Fahrmairz

Kellerer

A real-time search engine for the web of things

Proceedings of the 2nd International Internet of Things Conference (IoT ’10)

December 2010

Tokyo, Japan

IEEE

1 8

10.1109/iot.2010.5678450

2-s2.0-78751700704

25.

Truong

Römer

Content-based sensor search for the Web of Things

Proceedings of the IEEE Global Communications Conference (GLOBECOM ’13)

December 2013

Atlanta, Ga, USA

2654 2660

10.1109/glocom.2013.6831475

2-s2.0-84904126433

26.

Wang

Song

Q. B.

MacDonell

Shepperd

Shen

J. Y.

Integrate the GM(1,1) and verhulst models to predict software stage effort

IEEE Transactions on Systems, Man and Cybernetics—Part C: Applications and Reviews 2009 39 6 647 658

10.1109/tsmcc.2009.2020690

2-s2.0-77955085642

27.

Deng

J. L.

Grey System Theory Course 2005

Wuhan, China

Huazhong University of Science and Technology Press

28.

http://db.csail.mit.edu/labdata/labdata.html

29.

http://tidesandcurrents.noaa.gov/gmap3