Abstract
More recently, an increasing number of object-attached sensors are publishing their real-time state on the Internet by using state-of-the-art Web technologies, which make the sensor search service extremely important for the Web of Things (WoT). However, the existing issues that the sensor search service is facing bring huge challenges to the design of matching state estimation scheme. In this paper, an architecture of high-efficiency content-based sensor search system is depicted to provide a prototype system for sensor search. And then a matching state estimation scheme is proposed in detail, including a sensor state prediction approach to accurately estimate future sensor readings and a match estimating and verifying approach to effectively classify and verify candidate sensors, in order to enhance the performance of our search system. Simulation results show that our matching state estimation scheme dramatically reduces the communication overhead of search system and achieves excellent performance in terms of recall ratio and precision ratio.
1. Introduction
Nowadays, a growing number of sensors and sensor networks have been connected to the Internet and World Wide Web, such that the real-world entities associated with sensors can be observed by using a standard Web browser [1]. The novel Web-based services, like pachube.com [2], Xively [3], and Microsoft SenseWeb [4], offer application programming interfaces (APIs) for publishing structured sensor data on the Web instantaneously, such as the moisture and temperature information of plant growth environment. Moreover, real-world objects are beginning to publish real-time data, with respect to their state, on the Web. One of the representative cases is Bicing [5], which is a public bicycle-sharing system in Barcelona. Bicing enables users to obtain the number of available bicycles in real time via the Web to facilitate travel. The WoT, which is based on the REST principle, extends the scope of traditional document-centric Web to a generic interface of the real world by providing each real-world entity with a Web representation which can be accessed via lightweight APIs [6, 7].
The sensor search service in the new fashioned WoT has become equally important just as the search service for documents on the conventional Web [8]. However, classical search engines, such as Baidu, Yahoo, and Google, are primarily utilized to search for static or slowly varying unstructured data like Web pages, pdf, doc files, and multimedia [9, 10]. Conversely, the search objects in the WoT are massive sensors whose state is rapidly changing [11]. Additionally, whenever a classical search engine responds to a search query, it needs to access all available objects, which leads to tremendous communication overhead. It is impractical because of the scarce energy resources in sensor networks and frequent communications will result in the rapid death of sensors [12, 13].
A further enormous challenge for sensor search in the WoT is that the majority of existing search systems lack the support of the search for sensors that meet a given query range. It is evident that the output information of most sensors in the WoT is in the form of numerical value; thus the value-based search mode becomes extremely important for the WoT [14]. However, sensor measurements always slightly fluctuate over time as the state of monitored entities usually changes quickly and marginally. Hence, searching for sensors in real time based on a specific value is of limited benefit. Much research must be pursued in the direction of novel value-based search modes.
From those preceding discussions, we conclude that conventional document-centric Web search engines are not directly applicable to the WoT and novel efficient content-based sensor search systems are eagerly to be proposed. The remaining parts of this paper are organized as follows. Section 2 gives an overview of the related work in the WoT. Section 3 presents an architecture of the proposed MSE sensor search. Section 4 proposes a sensor state prediction approach. Section 5 designs a match estimating and verifying approach. Section 6 discusses the performance evaluation. Section 7 finally concludes the paper.
2. Related Work
As a key research focus, the search engine technology has been studied for many years. Existing search engine techniques, like the keyword-based and catalogue-based search engines, are relatively mature by being used in multiple kinds of commercial applications [15]. However, the search systems mentioned above concentrate more on the search for documents (Web pages, videos, pictures, blogs, etc.) [16, 17], rather than object-attached sensors. Nowadays, numerous sensors have already been connected to the WoT; thus the requirements to search for real-world entities according to their perceived real-time state are explosively growing, which fuel an explosive growth in the sensor search technology [18]. However, the existing related research, especially effective matching state estimation scheme, is very limited and the development of sensor search technique in the WoT is still in its infancy [19].
Until recently, several sensor search prototype systems have been proposed, such as Snoogle [20], MAX [21], and Microsearch [22]. Their core ideas are that some textual descriptions related to real-world entities are stored in their associated sensors in the form of keywords. Hence, the preceding search systems are based on keyword match and only support the search for pseudostatic metadata rather than dynamic sensor measurements. In [23] a social-aware distributed Cyber-Physical Human-Centric Search (SCPS) engine is presented, which leverages the social network formed by handheld devices to search for real-world entities. SCPS locates human-carried entities according to uses' routine movement models. Once exceptional events that deviate from the routine model occur, SCPS will integrate Bayesian network into social network model to estimate the locations of deflected objects. However, the research issue of SCPS focuses on locating human-centric entities; the same as aforementioned prototype systems, SCPS cannot be directly used for the search pattern with a given measurement.
In order to search for real-world entities based on their observed measurements, Dyser [24], a novel sensor search engine, is introduced. Dyser provides each sensor and entity with a Web page, which is identified by a URL and accessed by using HTTP. Dyser consists of a resolver, which processes search queries, an index, which stores indexed metadata of sensors and entities, and an indexer, which periodically crawls sensor and entity pages. However, Dyser assumes that sensor readings always exhibit a high degree of periodicity. Moreover, the sensor output content in Dyser is not a raw sensor measurement but high-level state acquired by the means of data fusion. Nevertheless, there are numerous low-cost sensors in the WoT with relatively low data fusion capability and the assumption of periodicity also constrains its application area to particular scenarios.
In response to above problems, a content-based sensor search engine (CSS) is proposed in [25]. In CSS, an architecture to realize content-based search is presented, including a number of sensor gateways. Each gateway manages one or more sensor networks. The prediction model database is also included, which is associated with a sensor gateway. The light-weight prediction model, based on fuzzy logic, is applied to estimate the probability that a sensor matches with a given search range, instead of a concrete value. Sensor gateways in CSS only need to access sensors with high matching probabilities; thus CSS supports the search for sensors whose state satisfies a given query range at the cost of low communication overhead. However, the prediction model in CSS is time-independent, implying that once the query range is given the matching probability of a sensor in the same period is constant. CSS thereby ignores the fact that sensor output values are always changing, which leads to a relatively low prediction accuracy of matching probability and a correspondingly high communication overhead. Besides, CSS overlooks the search needs to sensors whose future state will meet the query requirements. Furthermore, CSS lacks the support for returning all the sensors that meet the search needs to the user.
To solve these existing problems, we propose the matching state estimation scheme for sensor search in the Web of Things (MSE). The main contributions of this paper include the following:
An architecture of MSE is designed to efficiently search for sensors in the WoT. We depict the architecture of MSE, including the main modules and the functions of each module. We also propose two types of search modes, the reliable MSE search (R-MSE) and the proactive MSE search (P-MSE), to fully satisfy the search needs which present strong randomness and uncertainty with respect to the query time and the number of returned search results. To the best of our knowledge, this is the first research paper that summarizes the search modes in the WoT according to the query time and the number of sought results. A sensor state prediction approach is proposed to estimate the short-term state of sensors in order to find candidate ones. The sensor state prediction approach is the core algorithm of our prediction model, which is based on raw sensor data and without any assumptions regarding periodicity in sensor measurements. Besides, this prediction approach can better exploit the temporal correlations between the sensor data and accurately perceive the variation trend of future sensor readings. A match estimating and verifying approach is presented to estimate the matching state of candidate sensors and efficiently verify their virtual state, so as to ensure the reliability of search results. The proposed approach contains a sensor classifying method, which classifies sensors into three categories based on their predicted state to find all match sensors, a sensor ranking method, which assesses the matching probabilities of match sensors and sorts them in an effective way, and a sensor verifying method to check match sensors with minimal communication overhead.
3. Architecture of MSE Sensor Search
In this part we depict the architecture of our MSE sensor search system and present the main components, encompassing the functions of each component. Then we propose two kinds of search modes, which have never been fully considered in the preceding research. Furthermore, we, respectively, design the interactive processes between these components in MSE system for the two search modes.
The MSE system consists of four types of modules: client, gateway, prediction model database (PMD), and wireless sensor networks (WSNs). The user submits a sensor search query via the client. WoT gateways distribute across the Internet in a two-tiered hierarchy, which are responsible for further issuing the search query and managing one or more WSNs. Each PMD is associated with a WoT gateway and stores prediction models of each sensor. Real-world entities are observed by their embedded sensors in WSN. The architecture of MSE system is plotted as Figure 1.

The architecture of MSE search system.
As is described above, WoT gateways present a two-tiered structure, the upper layer gateway named global gateway which is responsible for responding to the search query and issuing the query to appropriate lower gateway, according to some distinguishing information provided by the user, such as geographical locations like “ETH Zurich,” and the type of desired sensors as “Temperature.” The lower gateway, called local gateway, crawls the prediction models of sensors periodically and stores them to its related PMD.
When executing a search query, classical search engines are accustomed to searching for the desired targets by accessing all available objects, which spend considerable time and cause vast communication overhead. Hence, the traditional search pattern results in poor search efficiency. In the MSE search system, we establish a prediction model based on raw data for each sensor to improve the search performance. The prediction model is built in a distributed manner to predict the sensor output value at the query time and estimate the matching probability of a sensor with the given search range. Thus MSE only needs to communicate with sensors that have relatively high matching probabilities instead.
3.1. R-MSE Sensor Search
The user performs a content-based sensor search query via the client. Above, the state of monitored real-word objects always changes; thus the measurements of attached sensors usually fluctuate over time. Therefore, searching for sensors based on a given specific value is of limited use. We define the search query as
Depending on the query time

The interaction process of R-MSE.
At first, sensors, respectively, establish their own prediction models based on the sensor state prediction approach which will be proposed below. Then the local gateway periodically crawls the prediction models of each managed sensor and stores these models in its relevant PMD. When a global search query arrives, the global gateway issues a local search command to a suited local gateway after resolving the differentiated items submitted by the user. The local gateway initiates a matching query to its associated PMD to classify and sort candidate sensors. After that, the local gateway forms a local ranking list (LRL), which presents match sensors in descending order based on their matching probabilities, and sends to its upper global gateway. Then the global gateway reorders the sensors in these lists and publishes a verification command to the corresponding local gateway to further selectively verify the actual status of the chosen sensor. Afterwards, the local gateway transmits the verification results to the global gateway which will return the final search results to the user in the end.
3.2. P-MSE Sensor Search
The user may be also interested in searching for the sensor whose future state will belong to the query range
As is depicted in Figure 3, we remove the verification operation from the P-MSE search process. Instead, the global gateway returns the top-

The interaction process of future search.
4. Sensor State Prediction Approach
The search objects in the WoT are massive object-attached sensors whose state may change rapidly. In order to guarantee the timeliness and reliability of search results, we present a sensor state prediction approach in this section to estimate the variation trend of sensor state. When a search query arrives, these candidate sensors will be selectively accessed according to their predicted state. Thus the communication overhead of the search system can be obviously reduced.
In the vast majority of sensor search applications, it is impractical to obtain the complete information about the perceived objects, due to the limited perception and storage capacity of sensors. Sensor search systems whose partial information is unknown can be considered as the Grey system [26]. Therefore, we estimate the sensor state by using the identical dimension grey recurrence dynamical GM(1,1) model
We define the original sensor state sequence of senor s as
Then we establish the albinism differential equation of
Partial samples of observed real-world entity might be lost because of the component damage or system failure. Hence, we need to fill the imperfect original sequence to reduce the impact of missing samples on the data prediction process. The data filling method is designed as
Then (3) can be further transformed as
We, respectively, construct the sensor state accumulative matrix B and the constant term vector Y as follows:
By adopting the least square method, we estimate the developing coefficient a and the grey action u as
The discrete solution of albinism differential equation can be deduced via substituting the estimated values
After conducting the inverse accumulated generating operation on the discrete solution mentioned above, we obtain the prediction function of the original sensor state sequence as follows:
Similarly, the sensor reading
5. Match Estimating and Verifying Approach
Estimating the matching state of the sensor whose predicted reading locates around the query boundary is a tough issue, because of the inevitable and uncertain prediction error caused by ambient noise. Besides, a unified measure for evaluating the matching degree of a sensor with the given search content still lacks. Furthermore, verifying every candidate sensor will result in tremendous communication overhead. Therefore, in this section we propose a match estimating and verifying approach to assess the matching state of sensors and verify these sensors with relatively low communication overhead, so as to improve the performance of our search system.
5.1. Sensor Classifying and Ranking Method
When processing the search query without defining the number of sought sensors, we adopt the P-MSE scheme so that the global gateway returns all match sensors to the user, which seems to be straightforward based on the predicted values estimated by their prediction models. However, the prediction error is inevitable because of the impact of random disturbance on the sensor readings. Hence, assessing the matching state of sensors, especially the border ones whose predicted state values are around the search boundary, becomes intricate. Here we propose an adjustable sensor classifying method to address the preceding problem.
The error threshold
According to the error threshold and query range, we divided the marching state of sensors into three categories. Supposing that a sensor whose predicted value is within the range
After that, the global gateway only returns PM and BM sensors to the user as the search results when responding to the search query with
Obviously, based on the above classifying method, we can easily find that the sensor is a PM or BM sensor. However, distinguishing a more suitable sensor from homogeneous ones is unintuitive. Besides, the user-friendly search results should present the most suitable sensor on the top. Therefore, we further design a sensor ranking method to sort match sensors in an effective way.
Firstly, we map one-dimensional predicted sensor reading into the vector space. Denote
Essentially, we cannot obtain the random error in advance; hence the prediction error is inevitable and unpredictable. However, the fact we confirm is that with the same prediction error the closer the predicted value is to the midpoint of query range, the higher the matching probability of the sensor is. Accordingly, we define the criterion vector as
According to the match probability estimated above, the local gateway sorts its managed match sensors in descending order. Afterwards, it forms a local ranking list
5.2. Ordered Sensor Verifying Method
The optimal search results should exhibit the optimum sought sensors to the user; hence we need to reorder the sensors in the submitted LRLs. Furthermore, R-MSE involves the verifying process to guarantee the reliability of search results, but checking all match sensors will result in tremendous communication overhead. Thus we propose an ordered sensor verifying method to reorder locally optimal results and selectively verify sensors that have high matching probabilities to reduce the communication overhead of the verifying process.
As above, after receiving LRLs, the global gateway will merge these lists and form a global ordered list
Otherwise, supposing that
After deducing the sensor k that needs to be verified, the global gateway will issue a verification command to the local gateway that is responsible for managing the sensor k to check its current actual state. This verifying process lasts until enough match sensors are found. At last, the global gateway returns the final sought sensor list
(1) Builds the prediction model (2) Periodically crawls the model; (3) Stores the model to the PMD; (4) (5) Accesses PMD; (6) Specifies error threshold (7) Classifies sensors into (8) Converts to vector space (9) Acquires the matching probability (10) Sorts match sensors (11) (12) Reorders sensors in these lists; (13) Forms the global ordered list (14) (15) Performs P-MSE search: (16) (17) (18) Executes R-MSE search: (19) Verifies sensors in (20) (21) (22) Adds (23) (24) (25) (26) (27) (28) (29) (30)
6. Performance Evaluation
In this section we use 2 real-world datasets, IntelLab [28] and NOAA [29], to evaluate the performance of our MSE system. The crawling period of IntelLab and NOAA, respectively, is 10 minutes and 60 hours. The search range
6.1. R-MSE Simulation Evaluation
In this experiment, we introduce the communication overhead to examine the performance of R-MSE and CSS search engine mentioned in [25], due to the reason that the returned search results are always accurate. The ordered verifying approach, included in R-MSE, needs to communicate with match sensors depending on the global ordered list
The optimal global ordered list should rank all match sensors on the top so as to reduce the communication overhead of search system by minimizing the number of sensors that need to be verified. In order to assess the validity of global ordered list, we introduce a normalized metric
Note that if the sensor at rank i is a nonmatch sensor, then
As is illustrated in Figures 4 and 5, the communication overhead of both R-MSE and CSS presents downward trend when the query range increases. Compared with CSS, R-MSE, respectively, reduces the communication overhead about 21.1% and 20.9% for IntelLab and NOAA. And the variances of R-MSE are a little smaller than CSS. Because the number of match sensors will increase with the query range growing, the effect of random error on the matching state estimation will be reduced. Moreover, the accuracy of our prediction model is much higher than that of CSS, and our ordered verification approach efficiently check sensors after ranking sought sensors on the top. However, the low-accuracy prediction model constructed by CSS ranks many nonmatch sensors ahead of match ones, which generates tremendous communication overhead during the verification process. Therefore, R-MSE can achieve better performance in terms of communication overhead with different query ranges than CSS.

Communication overhead with different query ranges for IntelLab dataset.

Communication overhead with different query ranges for NOAA dataset.
Figures 6 and 7 plot the communication overhead of R-MSE and CSS with different simulated days for IntelLab and NOAA datasets. We see that the communication overhead is reduced about 23.6% and 24.3% by R-MSE severally for Intel and NOAA in comparison with CSS. The variances of R-MSE are much smaller than those of CSS. That is because our state prediction approach can accurately estimate the future readings of sensors and the proposed match estimating and verifying approach effectively assesses the matching probabilities of sensors and then efficiently verifies match sensors based on their matching probabilities. However, the prediction model in CSS is built based on sensor measurements in the previous two periods only, without fully considering the temporal correlations between these sensor data, leading to poor estimation accuracy of matching probability. Thus CSS ranks many nonmatch sensors on the top, which results in vast communication overhead when executing the verification operation. Besides, the number of nonmatch sensors found by CSS may fluctuate. Therefore, compared with CSS, R-MSE locates more suitable sensors at the cost of relatively low communication overhead. And the communication overheads of CSS obtained in several simulations are spread out; then the variances of CSS are larger than R-MSE.

Communication overhead with different simulated days for IntelLab dataset.

Communication overhead with different simulated days for NOAA dataset.
6.2. P-MSE Simulation Evaluation
In order to objectively and exactly assess the search performance of P-MSE, we introduce two typical metrics, the recall ratio and precision ratio, to evaluate our search scheme. However, due to the reason that the CSS search system lacks the support for the search mode with the number of sought sensors undefined, thus we develop an improved CSS, which is called CSS-A, to constrain the number of returned search results by setting a matching probability threshold. More specifically, CSS-A only returns the sensor whose matching probability exceeds the defined threshold. Besides, as is mentioned above, the error threshold has a direct effect on the search performance of P-MSE. Thus in this section we, respectively, evaluate the performance of P-MSE and CSS-A with different error and matching probability thresholds.
As is depicted in Figures 8 and 9, the recall ratio of P-MSE is steadily improved with the error threshold increasing. Conversely, the recall ratio of CSS-A becomes increasingly low when enlarging the matching probability threshold. The variances of the two algorithms are very small and almost equivalent due to the concentrated simulation results. We also observe that the recall ratio of P-MSE is about 34.1% and 59.9% higher than that of CSS-A, respectively, for Intel and NOAA. This is because the growing error threshold increases the number of match sensors; thus more match sensors will be returned to the user. However, the prediction model in CSS-A system is time-independent, which neglects the dynamic characteristics of sensor data, resulting in large random errors while predicting the matching probabilities of sensors. Though enlarging the matching probability threshold seemingly contributes to eliminating more sensors that have low matching probabilities, CSS-A indeed excludes many match sensors due to its low-accuracy matching probability estimation method. Therefore, with an increase in matching probability threshold the recall ratio of CSS-A decreases and that of P-MSE is much higher than CSS-A.

Recall ratio with different error (matching probability) thresholds for IntelLab dataset.

Recall ratio with different error (matching probability) thresholds for NOAA dataset.
Figures 10 and 11 compare the precision ratio of P-MSE and CSS-A with different error (matching probability) thresholds for IntelLab and NOAA datasets. We find that with an increase in error (matching probability) threshold the precision ratio of the two schemes increasingly descends. P-MSE severally enhances the precision ratio about 13.1% and 18.8% for IntelLab and NOAA than CSS-A. The variances of P-MSE are relatively smaller than CSS-A. That is because the growing error threshold increases not only the number of returned match sensors but also nonmatch ones, which raises the proportion of nonmatch sensors to all the returned ones. Thus the precision ratio of P-MSE is slowly decreased. However, the precision of prediction model built by CSS-A is relatively low; hence CSS-A removes a growing number of match sensors while returning more and more nonmatch ones with the matching probability threshold increasing, which result in a drastic downtrend and a poor performance of precision ratio than P-MSE. Moreover, because of the low-precision prediction model, the number of nonmatch sensors returned by CSS-A may be very different, which result in the relatively unstable precision ratio in several simulations. Thus the variances of CSS-A are a little larger than P-MSE.

The effect of error (matching probability) threshold on the precision ratio of IntelLab dataset.

The effect of error (matching probability) threshold on the precision ratio of NOAA dataset.
7. Conclusions
In recent years, massive sensors are being connected to the Internet and accessed by using a standard Web browser. However, the research on sensor search in the WoT is still very scarce. In this paper, we propose a matching state estimation scheme for sensor search in the WoT, which includes these contributions: an architecture of MSE sensor search, a sensor state prediction approach, and a match estimating and verifying approach. The architecture of MSE scheme depicts the prototype of content-based sensor search and presents two kinds of search mode in the WoT. The sensor state prediction approach is designed to build our prediction model to estimate the future sensor readings. The match estimating and verifying approach aims at efficiently classifying and verifying candidate sensors to enhance the performance of our search system. Evaluation results demonstrate the validity of MSE scheme in the area of sensor search.
So far, the research on content-based sensor search in the WoT is just at the stage of prototype designing. The applicable matching state estimation scheme is still very rare. As future work, we plan to improve the practicability of our scheme and apply to the practical sensor search service.
Footnotes
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
The authors wish to appreciate the support from National Natural Science Foundation of China (61170275), Beijing Higher Education Young Elite Teacher Project, Technology Project of Guangdong Province, and Beijing Key Laboratory of Work Safety Intelligent Monitoring.
