Abstract
The self-organizing nature of sensor networks, their autonomous operation and potential architectural alternatives make them suitable for different data-centric applications. Their wider acceptance seems to be rising on the horizon. In this article, we present an overview of the current state of the art in the field of wireless sensor networks. We also present various open research issues and provide an insight about the latest developments that need to be explored in greater depth that could possibly make this emerging technological area more useful than ever.
Keywords
Introduction
Recent advances in hardware, sensor, and wireless networking technologies are enabling large-scale deployment of superior data acquisition systems with adjustable resolutions, by instrumenting the physical world with numerous networked microsensors, and such systems are called wireless sensor networks. Unlike traditional sensors that sense physical parameters in a passive manner, microsensors in a sensor network are a part of a full-fledged computer called a sensor node and have capability to process the sensor readings or share them with their neighbors. Each sensor node consists of a variety of different sensors, an embedded processor, memory, a low power radio, and a tiny battery. What makes sensor nodes so attractive is their miniaturization, low cost, low power radio and autonomous ad hoc connectivity; eliminating the need for any human intervention. Thus based on locally sensed parameters, a sensor network provides a global view of the monitored area. These sensor could be deployed permanently at pre-specified locations in a given area, provided it is easily accessible. Sensors could also be deployed in an unknown territory and possibly hazardous environment using low-flying airplanes or unmanned aerial vehicles (UAVs) or unmanned ground vehicle (UGVs).
Research on sensor networks was primarily initiated for tactical surveillance in military applications, by throwing sensors in an inhospitable terrain with the aid of an unmanned vehicle or a low flying aircraft; thereby enabling soldiers to have an enhanced awareness of possible presence of chemical and biological agents. But now, commercial interest in this field is growing at a faster pace by sprinkling thousands of tiny sensors for surveillance of roads and fields, or placing them in buildings and bridges to monitor structural health, or installing them in industrial facilities to manage energy, inventory and manufacturing processes. This is also due to the fact new types of sensors are commercially available and could be broadly classified as pressure, temperature, light, biological, chemical, strain, fatigue, tilt sensors, and so on. These have also enhanced their potential use in commercial applications such as remote monitoring of complex machinery and processes; monitoring of health for highways, bridges and other civil infrastructures; environmental monitoring of ecosystems, toxic spills, radiation level monitoring in nuclear plants and wild fire monitoring of forests. Specifications of these sensors can be obtained from various manufacturers including [1].
Having adequate density of sensors can help overcome the problem of line of sight and environmental effects as few sensors will always be located close to the event of interest. Distributed sensors can aggregate complex data to provide rich, multi-dimensional pictures of the environment which a single complex sensor, working alone cannot provide. This also ensures higher signal to noise ratio (SNR) by combining information from sources with different spatial perspectives. A sensor network can be said to be the nervous system of our engineering network world, extracting and transmitting useful and timely information reliably for efficient decision support and quick corrective actions. Tremendous capabilities and associated underlying advantages of this technology motivated system developers to make dreams of such networks an affordable reality.
Although challenges faced by a wireless sensor network are usually application specific, there are few technical issues that all applications share. For example, generic nodes with limited battery power can be deployed to constitute a mobile ad hoc network (MANET) [2] by placing them at random in unattended, highly dynamic environments. But a real challenge is to get data from sensor networks as their sheer numbers and density of these failure prone sensor nodes makes it difficult to accumulate data [3]. This puts a severe demand on the protocols, the software, and the algorithms that are required to maintain connectivity among sensors and support successful retrieval of information while conserving their energy as much as feasible.
In this article, we describe the state of the art in the field of wireless sensor networks, their unique characteristics, design challenges, and important topics of ongoing research as well as future directions. We present a comparison of the existing network architectural alternatives and discuss their applications. We also illustrate the current trends in protocol and algorithm design for sensor network, highlighting main achievements of existing proposals for routing, design of multiple access schemes, power control strategies, application-driven design, distributed databases, network management, and middleware for sensor networks.
Building Blocks
There are many underlying characteristics of wireless sensor networks and the most important ones are considered here.
Exploiting redundancy: Sensor nodes are expected to run out of energy because of limited battery power. They are also prone to failure because of random placement of nodes in a harsh environment, presence of unexpected obstacles, unreliable nature of the wireless link, and nodes in a neighborhood might not be able to reach each other occasionally. Redundancy and device density are exploited to ensure full coverage of the intended terrain and allow the network to self reconfigure when communication links are disrupted. Data-Centric Routing: In the data-centric paradigm, sensor nodes need not be identified by a unique ID. The querying unit usually called a sink, identifies multiple responding sensors based on the required and collected data. The query is not addressed to any specific sensor, but is based on the data observed by the sensor and hence it is termed as the data-centric approach. Moreover, as nodes in a locality sense the same phenomenon, the traditional end-to-end routing used in MANETs cannot be applied to sensor networks for query processing. Data-centric routing is eminently suited to perform operations such as data aggregation in sensor networks. Data Aggregation: Data implosion and overlap are common phenomenon in a sensor network as nodes in the proximity usually hold similar data. Energy is therefore wasted when the same data value from multiple sources is individually routed to the sink. It is desirable to process as much data locally as possible so as to reduce the number of bits transmitted in the air, particularly over a long distance. Transmitting 1 Kb of data a distance of 100 m [4] costs the same amount of energy as executing 300 million instructions on a general purpose processor with a modest computing device rate of 100 million instructions per second (MIPS). Localized Algorithms: These can be implemented in a distributed manner and are therefore attractive in environments where the delay and communication overhead associated with collection/dissemination of global information can adversely affect the overall performance. Local processing and collaboration among sensor nodes are encouraged for filtering and combining readings from sensor nodes in a neighborhood.
Architectural Alternatives
It is important to study the network organization for data communication in a sensor network as it affects the way nodes configure themselves for data communication and thus corresponding impact on MAC layer design and security issues need to be looked at very carefully. There are two main architectural alternatives: hierarchical and flat network organization. Hierarchical networks have been used extensively in MANETs to eliminate the overhead involved in updating the network about the global topology. Flat networks rely on popular techniques of data centric routing paradigm such as directed diffusion [5] that exploits tree based structure for data gathering and aggregation. Although the hierarchical network architecture is energy efficient for collecting and aggregating data from the entire sensor network or all nodes within a large target region, a flat network architecture is more suitable for transferring data between certain source-destination pairs separated by a large number of hops using knowledge of their relative locations. We now differentiate between hierarchical and flat sensor networks and describe protocols recently developed for these two different modes of data communication.
Hierarchical Network Architecture
One way of minimizing data transmission over a large number of hops is to employ a hierarchical organization of the network by creating clusters. In such a scheme, although all nodes typically function both as switches/routers, one node in each cluster is designated as the cluster head (CH), and traffic between nodes of different clusters must always be routed through their respective CHs or via gateway nodes that are responsible for maintaining connectivity among neighboring CHs. Formation of the clusters in a distributed way is desirable so that global knowledge of exact location of each node is not required. Thus, clusters are formed and maintained based on local information.
The number of tiers within a hierarchical network can vary according to the number of nodes as shown in Fig. 1(a). A proactive clustering algorithm for sensor networks, called LEACH is one of the initial data gathering protocols introduced by MIT's researchers, Heinzelman et. al. [6]. Each cluster has a CH that periodically collects data from its cluster members, aggregates and sends the data to an upper level CH in a proactive mode. The CH is responsible for data aggregation and communication with the rest of the network, whereas the cluster members sleep unless they have to communicate with the CH. A new CH is elected in a neighborhood periodically as CHs dissipate energy faster compared to cluster members. Manjeshwar and Agrawal [7] introduced a reactive protocol TEEN (

Comparison of hierarchical and flat networks. Part (a) shows two-tier sensor network architecture, part (b) shows a flat sensor network that uses directed diffusion for routing.
In a flat network architecture as illustrated in Fig. 1(b), all nodes are equal and connections are set up between nodes that are within each other's radio range, although constrained by connectivity conditions and available resources. Route discovery can be carried out in flat networks using reactive flooding that does not require global maintenance of network topology. During flooding, each node broadcasts the data packets till all the nodes in the network receive the packet, therefore it is energy exhaustive as nodes could receive multiple or duplicate copies of the same data packet as they have common neighbors sensing similar data due to high density of nodes. Gossiping [9] is a relatively less energy consuming variant of flooding where the packet is forwarded to a few randomly selected nodes based on some prior information. Intanagonwiwat et al. [5] have introduced a data dissemination paradigm for sensor networks based on a flat topology called directed diffusion. As we can observe from Fig. 1(b), the query is propagated toward the source nodes along multiple paths shown by dashed lines. The arcs show how the query is directed toward the event of interest similar to a ripple effect. A smaller set of paths can be reinforced (shown by dark lines in Fig. 1(b)), among a large number of paths initially explored to form the routing backbone. In a flat topology, data may be aggregated at intermediate nodes along the routing backbone forming virtual clusters in a locality, similar to the hierarchical topology. Rumor routing [10] is another routing protocol for flat networks that looks at disseminating queries without relying on traditional methods like flooding across the network. Data routes are generated corresponding to each event so that a given query may be randomly routed till it intersects the data route generated by the event being monitored. The rumor routing algorithm uses a set of long-lived agents that create paths that are directed toward the events they encounter. On the other hand, there are protocols like SPIN, which base their communication decisions on application specific knowledge of the data and the knowledge of resources available to them. A family of protocols named as SPIN (Sensor Protocol for Information via Negotiation) has been defined [11] that limits the communication energy spent in flooding and gossiping. The basic strategy is to let nodes name their data using high level data descriptiors, called meta data so that transmission of redundant data throughout the network could be eliminated.
After going through the route discovery phase, the communication paths in a flat topology are usually set up in the form of routing trees with the root of the tree at the sink. A single network flow is assumed in [12], where a single data sink attempts to gather information from a number of data sources in the sensor network. Similar to clustering used in hierarchical networks, reverse multicast trees is a popular data gathering model for flat networks where data is collected from sources in proximity, aggregated at the parent nodes till it reaches the final aggregation point, which could also be the rout node. With a data aggregation tree, a lower marginal energy cost is required in connecting additional sources to the sink along the shortest distance of the source to the aggregation tree. This is in contrast to the address centric (“end-to-end” routing) approach, wherein, the cost of adding a source is its distance to the sink. If the tree structure of the multicast tree is bushy (few large clusters), then potentially more data can be aggregated (many small clusters to cover the same area); but more energy and time will be required in exchanging data from a large number of clusters. Data aggregation algorithms have also been developed for fault-tolerant feature extraction by Krishnamachari et al. [13]. The objective of feature extraction is to identify regions with a distinguishable characteristic, such as the value of certain sensed parameter exceeding a threshold. Once the feature has been identified, it is disambiguated from faulty sensor readings, and sensors that belong to a region with that feature self-organize to form a cluster. Data is aggregated within the cluster to save resources and a clustering algorithm is responsible for intra-cluster routing. Feature extraction can be used for detailed monitoring of that regions using additional sensors.
A flat and a hierarchical network organization can be easily compared and a summary is given in Table 1. We now provide a brief overview of different kinds of routing protocols developed for sensor networks in the recent past. We summarize their relative advantages and disadvantages and indicate the types of applications that might benefit from them.
Comparison between Hierarchical and Flat Sensor Networks
Comparison between Hierarchical and Flat Sensor Networks
In many cases, sensor data are useful if the location of their source is known. Motivated by the fact that data source/responder to queries may often be identified by its geographical location in the sensor field, Yu et al. [14] have designed an energy efficient routing algorithm that propagates a query to the appropriate geographical region without flooding. The basic idea is to route packets in a manner such that the next hop is geographically closer to the destination. The decisions are completely local as information about immediate neighbors is adequate. Yu et al. [14] have proposed Geographic and Energy Aware Routing (GEAR) algorithm that selects a neighbor based on residual energy level to route a packet toward the target region and recursive geographic forwarding to disseminate the packet inside the desired region. This is a distributed protocol that does not require route discovery and maintenance, provided the location of the target region is known. The neighbor selection can be adapted to account for other metrics such as link quality or any other parameter of interest to the user.
Thus, geographic routing protocols lead to drastic performance improvement provided location information is exact. Son et al. [15] study the effect of location errors on geographic routing that is even more significant if sensor networks consists of mobile nodes. Their study is based on freshness of location information, the speed of mobile nodes, and the mobility pattern of mobile nodes. They identify two main problems caused by mobility-induced errors, the first caused by the mobility of neighbor nodes and asymmetric communication links and the second caused by the movement of the destination node.
Energy-Aware Routing
An important objective of energy-aware routing is to maximize the system lifetime. Energy Efficient routes are computed between a source-sink pair based on the energy available at intermediate nodes and the transmission energy required to reach them. Possible steps that consume energy while routing are topology discovery overhead, routing protocol overhead, actual transmission of data, and idle radio listening energy. Besides the actual workload, size density and fluctuations in the topology are other factors that affect energy consumption. It may be noted that the sensors are not mobile. But the topology may still change due to failure of nodes or insufficient energy in maintaining existing links. In directed diffusion during the path establishment phase, the data is routed along multiple paths and energy metrics are collected along the way. Some popular energy metrics are minimum transmission energy, which usually yields the shortest route from the source to the sink, and a path with maximal residual energy path is chosen. Lastly, max-min routing is another metric where the route whose bottleneck node (node experiencing substantially large amount of traffic) has the maximum available energy is preferred.
Probabilistic routing for obtaining uniform energy consumption using destination initiated reactive protocol like directed diffusion protocol, has been explored where instead of maintaining one optimal path, a set of good paths is retained [16] and probability of selecting a path is based on the amount of energy consumed by each path. Kannan et al. [17] have introduced a sensor-centric approach for data routing, where sensors behave as rational players in an N-player routing game in order to tradeoff their own resource consumption with network wide objectives. An additional constraint is the possibility of sensor failures. They show that computing optimal routing is in an N-player routing game is NP-Hard. Therefore they develop a game theoretic metric [18] called path weakness to measure the qualitative performance of different routing mechanisms and derive analytical results on computing paths with bounded weakness.
Multipath Routing
One advantage of the flat networks is the ease of creating multiple paths between communicating nodes, thereby alleviating congestion and providing robustness in the presence of failures. Route selection can also be made according to the traffic requirements where time critical data may be sent over low delay paths, whereas periodic data may be routed through higher latency paths. Multipath routing is used to spread the traffic uniformly over an increasing number of nodes; achieve improved load balancing and enhanced resilience to node failure. Traffic spreading avoids penalizing few nodes connecting the source directly to the destination in a manner that maximizes the time at which the first node runs out of its battery power. Analytical results by Chang and Tassiulas [19] prove that in order to improve the overall lifetime of the network, the protocol should maximize the residual energy of every node while routing, rather than minimizing the total energy consumed in routing. Jain et al. [20] use a distributed multiple path routing protocol to establish multiple routes of varying lengths and available energy in a large symmetrical area bounded by a given source-sink pair. It makes use of location awareness at each node to minimize the overhead involved in setting up multiple paths. Traffic is then split on paths proportional to their residual energy using a localized traffic scheduling algorithm.
Multiple path routing is also actively used in sensor networks to provide robustness to node or link failure though maintenance of disjoint multiple paths so that alternate paths may be substituted on failure of the primary path. Partially disjoint alternate paths or braided paths [21] is another variation of multiple path routing used to maintain back-up routes. Sometimes duplicate copies of data is simultaneously routed through multiple routes to improve the likelihood of essential information being routed to the destination. Computation of parallel multiple routes has also been proposed [20] as a mechanism to provide Quality of Service in sensor networks.
Leveraging Node Mobility for Routing
Smart Tag [22] is a protocol that exploits the mobility of a few sensor nodes to save energy consumed in multihop data routing. In case the sensor network has some mobile nodes, a node that is moving away from a location of an event may choose to pass on the responsibility of monitoring to a closeby node as it drifts away. The idea is to buffer the information at static nodes and transfer it to the mobile nodes when they are in the proximity, so that the mobile nodes may disseminate information while they move around in the network. If the observer's or the sink's motion can be predicted, large power savings can be obtained over a network with node mobility. Chakrabarti et al. [23] have proposed a queuing formulation to model data collection by the mobile observer [24] accurately over the region of interest. The observer in their model is assumed to traverse the same path repeatedly, the data is pulled by the observer by waking up the nodes when it is close to them. Since nodes only transmit when the observer is close to them, the power requirements are significantly reduced. In a real time setting mobility can be used to facilitate network connectivity and saving communication energy spent between sensor nodes and data repository. Rao et al. [25] propose a communal-mission-oriented mobility to decide where to move a node so that it can perform either of the two tasks, that is, local scans (checking for uncovered regions) or data forwarding better. They have developed distributed mobility algorithms to support high priority tracking traffic (detailed information on tracked targets) by moving scanning nodes to route latency critical tracking traffic.
MAC Layer Design in Sensor Networks
The role of the MAC (media access control) layer in a shared communication medium, is to schedule transmission of data for nodes in a neighborhood so as to avoid interference, and ensure fair utilization of the available channels. The primary design issues handled at the MAC layer in a conventional wireless network are latency, throughput, bandwidth utilization and fairness. In the case of sensor networks, energy efficiency and scalability dominate the design objectives. Fairness is not considered as important as compared to energy efficiency, as long as all nodes in the network get to contribute data and serve a common application. The data traffic may be low for long periods with brief periods of intense or bursty traffic. At each of the nodes, there is traffic originating out of the node and traffic being routed through the node because most of the nodes are both data sources and routers. They have little or no dedicated carrier sensing or collision detection and they have no specific protocol stacks that could specify the design of their media access protocol. The main sources of energy consumption at the MAC layer are collision, overhearing, control packet overheads and idle listening. Therefore, MAC layer design for sensor networks is really challenging as in such applications majority of the nodes are not active for most of the time and the nodes waste their limited energy, listening and waiting to receive possible transmissions in the network.
The MAC layer design should be distributed because network wide synchronization for calculating a schedule would itself be an energy intensive procedure in sensor networks. Existing solutions to channel access methods in ad hoc networks can be divided into two categories: contention based and organized methods. In a contention based scheme, all nodes ready to transmit their data, contest for the medium and whosoever succeeds in getting the channel sends its data. Contention based schemes are not suitable for sensor networks because of the radio receivers must continuously monitor the access channel and resources are wasted whenever a collision occurs. The organized methods of channel access attempt to determine connectivity between nodes first and then assign channel slots in a hierarchical manner by forming clusters and selecting cluster heads. Organized methods such as Time Division Multiple Access (TDMA) are not as scalable as the contention based scheme although they are energy efficient as nodes sleep unless they are scheduled to transmit and collision, overhearing is naturally avoided. Allocating TDMA schedules within each cluster requires coordination and additional memory to store neighbors' schedules. PAMAS (Power Aware Media Access Scheme) [26] is a TDMA-based protocol that avoids overhearing by powering off the radio when no transmission or reception is taking place at a node. S-MAC (Sensor-MAC) is an improvement over PAMAS proposed by Ye et al. [27] that minimizes idle-listening as nodes sleep periodically. It forms virtual clusters to auto-synchronize on sleep schedules and unlike PAMAS, it only uses inchannel signaling to switch off radios. S-MAC achieves more than 50% savings in energy compared to IEEE 802.11 MAC protocol. The effect of clock drift can be significant if time slots are small, therefore, time slots are fairly large in S-MAC such that each slot consists of an active and sleeping part. Latency and throughput is traded with energy efficiency in S-MAC and they depend on the slot size as well as the activation period in each slot. The duty cycle of a node determines how long a node can be in sleep state before it is wakes up. S-MAC is a fixed duty cycle protocol and may not provide an optimal solution under varying traffic conditions and energy consumption patterns. An improvement over S-MAC, DE-MAC (Distributed Energy-aware MAC) [28] adapts the sleep duration in a slot such that it is inversely proportional to their residual energy level.
SMACS (Self-Organizing Medium Access Control for Sensor Networks) is another distributed infrastructure building protocol designed by Pottie et al. [4] that enables nodes to discover their neighbors in a flat network, and form virtual clusters to synchronize their sleep wake up schedule without the need of a global master. SMACS assumes that the nodes are able to tune the carrier frequency to different bands and define a channel as a pair of time intervals, similar to slots in a TDMA schedule. Similar to S-MAC, it uses message passing to minimize contention latency and control overhead but in order to reduce the likelihood of collisions, each link operates on a different frequency. This frequency band is chosen at random from a large pool of possible choices when the links are formed. FDMA or CDMA can be used to avoid interference between adjacent links. Its advantage is that non-synchronous scheduled communication enables the nodes to form links-on-the-fly, although bandwidth utilization is low because a node can talk to only one neighbor at a time slot.
Heidmann et al. [29] propose a MAC design for sensor networks, where Synchronization (SYNC), Request to Send (RTS), and Clear to Send (CTS) packets are exchanged prior to data transmission among neighbors for synchronization and collision avoidance. They use message passing for transmitting large data in bursts, using a single acknowledgement for several data burst to minimize control packet overhead. Srivastava et al. [30] have proposed a new technique, called Sparse Topology and Energy Management (STEM) to improve the network lifetime by employing a dual radio at the physical layer with different power levels. To wake up the sleeping neighbors they emulate a paging channel, on their secondary radios operating on a lower duty cycle that wakes up the main radio for data transmission. Although the wake up radio consumes much less energy, an additional component need to be incorporated at each node. Enz et al. [31] have introduced WiseMAC, a low power MAC protocol designed for low-duty-cycle wireless sensor networks. In order to achieve the lowest possible power consumption for hybrid networks, they have concurrently designed the radio and the protocol so that they do not have to consider limitations of existing hardware while designing protocols. They improve on important parameters of radio that affect higher protocol layers. For example, they use a form of preamble sampling at the MAC layer, and achieve large saving in energy consumption by reducing the wake-up time. The IEEE 802.15.4 standard [32] that specifies the physical and MAC layer for low rate wireless PANs or low power sensor networks was released in May 2003, and it is now being promoted by the ZigBee Alliance Inc. [33]. The objective of the ZigBee Alliance (a non-profit industry organization) is to assure broad adoption and interoperability of sensor networking solutions and products developed by different companies.
A scalable MAC address assignment scheme has been proposed by Heidmann et al. [29] to minimize the MAC header, which introduces a significant overhead, considering a small packet payload. Their algorithm is purely distributed and uses an encoded representation of the address exploiting spatial address reuse. Elson et al. [34] propose a design philosophy where small, randomly selected, ephemeral, probabilistically unique transaction identifiers are used in place of globally unique identifiers, where transaction is any computation during which some state must be maintained by the nodes involved in it.
Power Management and Topology Control
A network topology is described by set of neighbors for each node that are used for communication in a network. A node can control its radio range or its set of neighbors by imposing restrictions on the transmitting power level. It is really challenging to determine the right power level for each node so that the data can be routed optimally. The transmit power also depends on the level of desired reliability. A very small power level could leave the network disconnected, where as a very high power level has added cost of maintaining a large number of neighbors list, excessive interference, and provides less opportunity for spatial reuse [3]. The process of topology discovery is infrequent in sensor networks as the sensors are usually static. But based on the workload, the topology maintenance overhead may be significant as periodic exchange of beacon signals are required for this purpose. Increasing the network density can result in higher accuracy, but only if the data traffic is kept below a threshold to minimize wastage of energy due to frequent collisions. Sensors on the average get depleted faster if deployed with higher density. Thus, with increased density, lifetime may drop, even though the total initial energy for the network is much higher. Thus, intelligent management of the node density and transmit and receive power is desirable for a sensor network protocol design. It may be noted that in sensor networks, idle listening is almost as costly as actually receiving data. So, a better strategy is to use only a subset of sensor nodes, known as the coordinators made responsible for forwarding packets, so that most of the nodes can sleep. This implies a process of distributed decision making, in terms of whether a node should be entrusted with forwarding or not. Integrating topology control with a coordinator selection algorithm is an open issue, as coordinator nodes are always awake and constitute a connected multihop backbone. One of the two conflicting objectives here is to minimize the number of nodes that need to be coordinators while minimizing their power levels. The process of switching of a node into either state takes some finite time and resources.
It is very complex problem to adjust the duty cycle according to the traffic load so as to minimize energy wastage due to idle listening time. Algorithms that reduce the system energy consumption without significantly diminishing the connectivity by turning off nodes based on the connectivity level among neighbors have been proposed in literature. Xu et al. [35] have proposed a scheme called Geographic Adaptive Fidelity (GAF) where nodes turn off their radios to conserve energy when they are not involved in data communication. Node density is leveraged to increase the time that radio is switched off. GAF uses geographic location information to divide the area into fixed size squares called grids. Within each grid, it keeps only one node awake for packet forwarding. This node scheduling scheme turns off nodes from the communication perspective, without taking the systems-sensing coverage area into account. Simply turning off nodes that do not participate in data forwarding may lead to formation of blind spots. Inherent to power-awareness is the adaptability to changing environmental conditions and resources. At the same time power management should provide adequate versatility to prioritize either system lifetime or the output quality according to the user requirements. Adaptive Self-Configuring sEnsor Network Topologies, (ASCENT) [36] focuses on how to decide which nodes should join the routing infrastructure so that the network could adapt to a wide variety of environmental dynamics or terrain conditions. In ASCENT, a node signals and reduces its duty cycle when high message loss is detected, letting additional nodes in the region to join the network so that the messages could be relayed correctly. Other existing approach rely on making nodes alternate between the sleep and awake states, based on a probability proportional to the network density. This enables the network to have constant number of nodes that are awake, independent of its size.
Application Driven Design
The conventional data flow model in a sensor network is a multiple-source single-sink model. This follows from the typical data querying applications where a remote user (sink) collect data from an event observed by several source nodes in a locality or a closed boundary, which may be referred to as the target region. The query is initiated by the sink to the target regions and is flooded within the boundary of the entire target region. Potential for large scale data monitoring through sensor networks has been realized where parameters of interest are compared for extended durations over different geographical locations in space. Examples of such applications include monitoring contaminant or pollution level in the atmosphere [37], contour mapping for weather monitoring, determining direction of flow of smoke clouds and so on. Figure 2, illustrates how the flow of sensor data can be shaped to improve the efficiency of data monitoring task at hand. The data flow diagram in Fig. 2(a) shows the bird's eye view of the sensor field, with data streams emerging from more than one target regions and being merged at intermediate nodes in the network into a single output stream directed toward the sink. Observe that the area of a target region could be much smaller as compared to the area of the entire sensor field or its distance from other target regions in the field. Fig. 2(b), zooms into target region

Part (a) shows data flow in the second level of computation hierarchy to process data generated at target regions using the proposed in-network query processing architecture. Part (b) shows data flow at the first level of computation hierarchy to collect data from multiple sources in the target region at a final aggregation point.
Another useful way of categorizing such spatio-temporal applications is based on the type of sensor network installation. Jain et al. [38] have broadly classify spatio-temporal application into two classes, namely infrastructure-based monitoring and field-based monitoring, each influencing placement of nodes and thus the network organization as per their inherent physical factors.
Sensor nodes are The layout (blueprint) of the infrastructure (like machine or a freeway) where the sensor nodes are to be placed is known. As nodes may have been manually placed while manufacturing or embedded in the structure at desired monitoring points, location of sensors may be known to the user. It might be possible to have a renewable source of power to some of the nodes. The communication topology should adapt to the physical limitations (shape or surface area where nodes are placed) of the anticipated infrastructure.
Two important inferences from the latter observations are firstly, placement of nodes is influenced by the underlying structure or the building being monitored, which in turn affects the network topology. Secondly, knowledge of node locations or availability of some areas that are easily accessible can be exploited to provide intermittent resource-rich devices that can act as data loggers or data processors or gateway nodes. For example, Intel [39] has developed a heterogeneous network that improves scalability of wireless sensor networks for a number of monitoring applications. They overlay one 802.11-based mesh network on a sensor network analogous to a highway overlaid on a roadway system. Data is collected from local sensor nodes and transmitted across the network through the faster, reliable 802.11 network. The network lifetime is therefore enhanced by offloading the communication overhead to the high-end 802.11 nodes. This network is capable of self-re-organization in case any 802.11 node fails.
Similarly, computation intensive tasks may be offloaded from low power sensor motes to high-performance nodes in an infrastructure-based application such as leakage detection in a water distribution system. Adapting to the physical layout of a water distribution system, a high power node called a query processor (QP) may be placed at each junction point of pipes to process the data arriving from two or more sections of a drainage system. A self-learning, distributed algorithm can enable nodes to switch among various needed roles of data collection, processing or forwarding among themselves so that the network continues retrieving data in spite of failure of a few nodes. Human intervention would be deemed necessary only when a majority of the nodes in an area malfunction or run out of the battery power.
Similarly, a sensor network for home automation is heterogeneous in terms of the energy capacity of sensors [40]. Although some in-home sensors are directly connected to mains, others are powered by batteries of variable capacities. Besides there are some fixed communication paths in the topology as position of certain sensors relative to others is fixed. Ma et al. [40] propose a Resource Oriented Protocol (ROP) that determines optimal network architecture based on available resources and characteristics of different sensors. By utilizing lifetime of those nodes with limitless battery, they reduce energy consumption of energy-constrained sensors. Thus, having a heterogeneous sensor network organization introduces a lot more flexibility in employing more sophisticated and interesting data processing algorithms [39], [41].
Cayirci et al. [42] have developed a sensor network architecture to manage relief operation whenever there is a large-scale disaster. Sensor nodes are randomly deployed before a disaster occurs and more powerful central nodes are installed close to emergency operation centers and airports. The central nodes could be used to communicate with the sensor nodes that detect presence of victims as well as communicate with a central database. Embernet [43] has developed a mesh networking architecture for sensor network applications that could route traffic when many kinds of interferences and harsh conditions are present such as water treatment plants, heat control, and power stations. The Embernet network allows executing application-specific sensing and control software at the nodes and communicates using TCP/IP in the networks. Some other infrastructure-based applications that may influence query processing and networking design to enable a real-time decision support using a sensor network are as follows:
Su et al. [45] investigate the application of sensor networks to the film industry. They augment the film and video footage with sensor data by deploying unobtrusive sensors on both the performers and in the studio. Sensor data such as light intensity, color temperature, and location are collected and synchronized with each video frame, which is later viewed by editors in synchronization with the video playback to achieve seamless integration between computer graphics and real world photography. The second category of applications that involve spatio-temporal queries is field-based monitoring, which is described in what follows.
Field-Based Monitoring
A given space area is monitored using a dense network of
Agrawal et al. [37] propose use of sensor networks for environmental monitoring to determine the composition of the land fill gas (LFG) to ensure its commercial quality. Unlike existing labor-intensive techniques that perform monitoring periodically, sensors can be installed to continuously monitor the landfill and its neighborhood as the nature of gas emissions is unpredictable. Similarly, air quality monitoring can be performed in real time without the need to install and handle bulky and expensive existing field equipment. They suggest a Geographical Information System (GIS) should be integrated with the sensor network to combine various levels of data, such as local geography, demographics, point sources, highways, sensor locations, and emission data.
Another fertile area of application driven design is to advocate cross-layer design where the application layer has the flexibility to intercept the routing or the MAC layer to process the data in a manner that could affect optimality of its current route. A good example is to use a data aggregation algorithm at a local cluster head, which might purge data packets and eliminate redundancy. Although the nodes can be preprogrammed to determine the best route, provision of cross-layer design mechanism helps adapting to a wider range of application demands such as service differentiation, data-centric routing, or in-network processing and storage. It can therefore lead to better system efficiency by employing simple tools such as network filters to intercept data or execute run-time optimization techniques. Cross-layer design has been explored in ad hoc networks for interaction of power control mechanism with the MAC layer design [49] and has a good scope in sensor networks as application demands may vary during the network lifetime depending on the results at a given time.
Collaborative Signal and Information Processing
Collaboration among sensors can be used to aggregate multitude of sensor data in order to detect, classify, and track moving phenomenon. The phenomenon might be observable from a large number of sensors along its path with varying signal-to-noise levels. Besides information extraction, collaborative signal processing minimizes bandwidth consumption and provides resilience to node failure. Applications such as target tracking that are widely used in several commercial and military surveillance operations, such as battlefield situational awareness, highway traffic monitoring, fluid leakage or dispersion monitoring, and campus security rely heavily on collaborative signal processing.
A central issue of sensor collaboration is the selection of the sensors themselves, that is, to determine which sensors to invoke for sensing during target tracking to save maximum energy and enhance data accuracy. Ramanathan [50] proposes a location-centric approach for collaborative target detection as applications typically require collaboration among devices in the area of interest rather than an arbitrarily specified set of sensors. Motivated by the relevance of geographical proximity, Liu et al. [51] have also proposed a distributed group management protocol to form collaborative groups of sensors that coordinate their behavior using geographically limited message passing. Besides location, motion prediction, or knowledge of degree of change in the values of parameters under observation are other factors that contribute to the sensor selection. Zhao and his group [52], [53] have introduced information-driven dynamic sensor collaboration that builds on directed diffusion by allowing local nodes to make routing decisions and dynamically optimize information gain for a given cost of resource consumption. Information gain or utility is measured by predictions made based on tracking the past history and is used to invoke sensors for future processing. The resource cost considers both network spatial configuration and communication cost. Brooks et al. [54] also use data to guide sensor communication for entity tracking and local collaborations to provide estimates for parameters such as velocity, position, and entity type at points along the track. After entity detection and classification, association algorithms are used to determine which state estimates refer to the same object. Sequences of state estimates form tracks for future course of entities and allocate sensors to continue tracking the entity.
Liu et al. [55] further demonstrate effectiveness of information-driven approach for target tracking by using two types of sensors, acoustic amplitude sensors, and direction-of-arrival sensors, which are small microphone arrays. Based on the physics of sound attenuation the acoustic amplitude sensors estimate the distance of the vehicle by measuring sound amplitude at each microphone. The orientation of the vehicle is determined from the direction of sound using beamforming techniques.
Mostly clustering schemes in sensor networks have been proposed for routing, data gathering, and band-width reuse. Ghen et al [56] propose a dynamic clustering algorithm for acoustic target tracking in sensor networks. The cluster architecture facilitates collaborative information processing, but static clusters prevent sharing of information among sensors belonging to different clusters, which affects the accuracy of estimating target locations. In the dynamic clustering approach, a cluster is only formed in the area of high event concentration and sensors are invited to become members of that cluster. Redundant data is successfully reduced by a single cluster and sensors may support different clusters at different times. They use Voronoi diagram to select a cluster head closest to the target and create a backbone of sparsely placed cluster heads.
Data Handling and Query Processing
Efficient data handling systems enable scientists to observe, analyze, and query distributed sensor data at multiple resolutions, while exploiting spatio-temporal correlation. Recently, there has been a growing emphasis on building a scalable data management layer in sensor networks that performs in-network processing. In-network query processing is an attempt to combine, compare, and process data generated at source nodes within the network to eliminate long-haul shipment of sensor data to the sink for offline query evaluation. So far, there has been extensive research on the development of data aggregation algorithms to combine, fuse, and compress data into a smaller number of bytes near the data sources. This has helped reduce data volume originating from geographical region in the network, while the need to process aggregated data has recently risen for the applications that demand complex query evaluation over geographically separated distributed data. Spatio-temporal queries (refer to the data model described in an earlier section) are a good example of the types of queries needed by these applications. Similar to the relational database model, continuously arriving data is represented by streams, and queries may be represented by SQL or in the form of query trees. The query evaluation is performed within the network and not at the sink, as partial computations are merged at appropriate locations/nodes in the network [57].
Unlike traditional databases, [58] query optimization in sensor networks is conducted with the objective of minimizing data transfer between two query processors separated by several hops in the network. As continuous query processing techniques [59] developed for the Internet are not directly applicable to the case of power stringent sensor networks, query optimization is geared towards minimizing computational and communication overheads. For example, Madden et al. [60] have proposed the use of a semantic routing tree that eliminates data forwarding at query processors and computed value of aggregated function is below a threshold. TinyDB [61], a query processing system developed by Berkeley researchers uses an SQL interface to accept user queries and a database model is imposed on the network to evaluate query results that does not require the user to write explicit software to process data. The SQL constructs are optimized to make in-network query processing energy efficient and handle continuous data streams by including clauses to specify query lifetime, window intervals, and data arrival rate. A prototype for distributed data management infrastructure called Cougar [62] has been designed and implemented at Cornell to develop in-network aggregation, integration of routing with query processing, a probabilistic data model, and query optimization techniques. Intel [39] is developing the next generation motes that will have higher resources compared to the Berkeley mica motes for the half size of the mica mote. They propose a heterogeneous network by embedding higher processing power on selected nodes in a network of large number of inexpensive nodes to enable in-network processing.
Another interesting data flow model is the multiple sink, single source model where users in different parts of the network pose similar queries at identical data sources. This has opened a new area of research, which is multi-query optimization that minimizes query processing overhead when queries posed by different users overlap and there is a significant scope for reusing partial computations. Besides in-network processing, researchers are also looking at the possibility of in-network data storage. The existing distributed storage systems cannot be used for sensor networks as there are fundamental differences in the cost models, nature of data, and intended use of the sensor networks. Ratnasamy et al. [63] proposed data-centric storage as one possible approach to wide-area data dissemination and identified those circumstances where data-centric storage may be relevant. They introduced a sensor network storage architecture that leverages the look up complexity of Distributed Hash Tables (DHT) for event storage. Data centric storage implies that a particular node that stores a given data object, is determined by the object's name. Therefore, all data with the same general name will be stored at the same senor node. Thus, queries for data with a particular name, can be sent directly to the node storing these named data, thereby avoiding the query flooding typically expected in data centric storage.
Middleware for Wireless Sensor Networks
As wireless sensor networks are being increasingly deployed for a large variety of applications, the need to develop appropriate middleware that could support and coordinate concurrent applications on sensor networks is being emphasized. Traditional distributed middleware such as DCOM or CORBA are not suitable for sensor networks as they are memory and computation intensive. Yu et al. [64] describe integration of application knowledge into services as an important principle for middleware design. Easily implementable, lightweight middleware architecture such as Sensorware [65], Milan (Middleware Linking Applications and Networks) [61] have been recently proposed in literature. Some important objectives of sensor network middleware as identified by Romer et al. [67] are mechanisms for formulating complex high-level sensing tasks, coordinating task distribution and assignment among sensors, merging individual sensor readings into useful high-level results and effectively dealing with the heterogeneity of sensor nodes. TinyOS [68], written in C-link language, supports variable length packets and data link level synchronization and could provide rough estimates of network activity. TinyDB [61] employs SQL-like queries and can be used for numerous applications. But, there are several limitations including size of registers, processing speed, and the way interrupts are handled and it is an active area of ongoing research. One can potentially expect increased adoption of sensor-based micro-web servers.
Sensor Management and Qos Issues
After the sensor network is deployed in an unpredictable environment, a high-level management protocol can help notify users of resource depletion or abnormal activities. Limited energy and bandwidth resources in sensor networks make extracting states from each individual node infeasible. Estrin et al. [69] have proposed an in-network aggregation approach of network states to construct abstracted scans of sensor network health. The objective is to build an efficient monitoring infrastructure for sensor networks, where the scans describe the geographical distribution of network resources or activity of a sensor filed. Using application-level knowledge to optimize the network protocol is another goal of sensor management protocols. Application layer management protocols aim at seamlessly integrating sensor networks with the Internet. Importance of supporting QoS for sensor networks is also gradually being realized in order to have the flexibility of discriminating among the type of data that the sensors are reporting [70]. Priority-based data handling is used to reduce the reporting delay and increase the chance of reception of crucial sensor data.
Conclusion
Sensor networks is an important application area [72] that affects a wide range of potential monitoring applications. Its success can be judged by the fact that the University of Washington [71] has decided to integrate wireless sensor networks in an undergraduate embedded systems course to expose students to this emerging technology as the core of their computer engineering curriculum. Sensor Networks seem to be very useful for programmable applications where it is desirable to have self-organizing characteristics, coupled with sensed attribute and easy aggregation of data while minimizing energy consumption. The existing wireless technologies like Bluetooth, HomeRF, or 802.11 are not suitable for such networks because of the sheer number of sensor nodes involved, severe energy constraints, and scalability issues. A lot of progress has been witnessed in recent years on information retrieval [73] from sensor networks and the future seems to be very bright [74]. But unless sensor node hardware is deployed for an increasing number of applications, the real challenges faced in networking and coordinating sensors for actual implementations will be unknown to users. Further work is necessary in the areas of application driven design, topology control, and appropriate security and adequate privacy in wireless sensor networks.
