A Fault-Tolerant Model of Wireless Sensor-Actuator Network

Abstract

In a wireless sensor and actuator network (WSAN), a group of sensor nodes, actuators, and actuation devices is geographically distributed and linked by wireless networks. Sensor nodes gather information for an event occurring in the physical world and send the sensed values to actuator nodes. Actuator nodes perform appropriate actions on actuation devices on receipt of sensed values from sensor nodes. Sensor nodes are a low cost, low-powered device with limited energy, computation, and wireless communication capabilities. Sensor nodes may suffer from arbitrary faults. Furthermore, wireless communications are unreliable due to collision and noise of a wireless channel and shortage of power of sensor nodes. Reliable efficient communication among sensor nodes, actuator nodes, and actuation devices is required in the presence of node and network faults. In order to realize the reliability and efficiency, we newly propose a multi-actuator/multi-sensor (MAMS) model where each sensor node sends sensed values to multiple actuator nodes and each actuator node receives sensed values from multiple sensor nodes for each event occurring in an event area. Even if some number of sensor nodes and actuator nodes are faulty and messages sent by sensor nodes and actuator nodes are lost, a required action can be performed on actuation devices. Actuator nodes are required to receive some messages from some number of sensor nodes to make a decision on what actions to be performed on actuation devices. Here, multiple actuator nodes may perform actions on receipt of sensed values. Multiple redundant executions of an action on each device have to be prevented and conflicting actions on each actuation device from multiple actuators have to be serialized. In this paper, we discuss a method to reliably and non-redundantly perform actions. On the other hand, there is one primary actuator node and multiple backup actuator nodes in the passive model. Only the primary actuator node performs actions. In this paper, we present a semi-passive coordination (SPC) model where not only primary but also backup actuator nodes receive messages from sensor nodes. Only one actuator node performs an action on an actuation device and a backup actuator node takes over the primary one if the primary gets faulty. In addition, multiple actuator nodes issue actions to each actuation device. We discuss how to resolve redundant execution of an action on an actuation device and serialize conflicting actions from multiple actuator nodes.

Keywords

Wireless Sensor and Actuator Network (WSAN)Fault-Tolerant Model Semi-Passive Coordination Protocol Redundant Action Execution

1. Introduction

A wireless sensor and actuator network (WSAN) is a collection of sensor nodes, actuator nodes, and actuation devices linked by wireless networks to perform distributed sensoring and acting tasks [1, 2, 16]. Sensor nodes gather information about the physical world. Actuator nodes are capable of making a decision on actions for sensed values gathered by sensor nodes and perform the actions on actuation devices. WSAN is used in sensoring applications like microclimate control, home automation, environmental monitoring, and target tracking [1,2]. WSAN is one of the most significant technologies to realize ubiquitous computing systems [21]. There are many discussions on how to reliably and efficiently broadcast messages among sensor nodes and actuator nodes [5,16].

Sensor nodes are low-cost, a low-powered device which are equipped with limited energy, computation, and wireless communication capabilities. Sensor nodes may stop, even malfunction [14] due to the out-of-charge and fluctuation of observed phenomena in the physical world. Multiple sensor nodes send and receive messages in one wireless broadcast channel. If more than one sensor node simultaneously send messages, the messages collide in the channel and are lost. Thus, the wireless communication between sensor nodes and actuator nodes is less reliable, i.e. messages may be lost due to noise and collision. We discuss how to make a WSAN tolerant of faults of sensor nodes and wireless communication links. If some event occurs in an event area of the physical world, sensor nodes gather values showing physical phenomena of the event and send the sensed values of the event to actuator nodes. In our multi-actuator/multi-sensor (MAMS) model, each sensor node sends sensed values to multiple actuators and an actuator receives sensed values for a same event from multiple sensor nodes in order to be tolerant of faults of sensor nodes and wireless networks. Even if some sensor nodes are faulty and messages are lost in the wireless link, each actuator node can receive proper sensed values from the other proper sensor nodes. For example, if sensor nodes are arbitrarily faulty [14], an actuator node takes a kind of majority-based decision on sensed values from multiple sensor nodes. If an actuator node is faulty, another operational actuator node can perform actions on actuation devices. Multiple actuator nodes must cooperate to receive every sensed value from sensor nodes, make a decision on actions, and perform the actions on actuation devices. There are principally a pair of methods for coordinating replicated processes, active and passive [22] replications which are discussed in distributed systems. In the active coordination model, every actuator node receives the same messages from sensor nodes and perform the same actions on actuation devices. Here, multiple actuator nodes issue the same action to an actuation device. If the action is an update type, the action should be performed only once on the actuation device. We discuss conflicting relations among actuator nodes on actuation devices. We discuss methods to prevent multiple executions of an action and to serialize conflicting actions from multiple actuator nodes on an actuation device. We discuss how to prevent the redundant execution of each action on an actuation device.

On the other hand, there is one primary actuator node and the others are backup actuator nodes in the passive coordination model. Only the primary actuator node makes the decision on an action and performs the action on an actuation device. The primary actuator node regularly sends the local state to every backup actuator node. Each backup actuator node changes their state to the primary actuator's state. In the semi-passive coordination protocol, not only the primary actuator node but also every backup actuator node receives messages with sensed values from the sensor nodes. Here, there is one primary actuator node which makes a decision on actions to be performed and performs the actions. If the primary actuator node is faulty, one of the backup actuator nodes takes over the primary actuator node. In this paper, we discuss a semi-passive coordination (SPC) protocol for coordinating multiple actuator nodes.

In section 2, we present a multi-actuator/multi-sensor (MAMS) model for realizing a fault-tolerant wireless sensor and actor network (WSAN). In section 3, we discuss how to make sensor nodes and sensor-actuator communications fault-tolerant in the presence of node faults and message loss. In section 4, we discuss a semi-passive coordination way on how to realize reliable sensor-actuator communications. In section 5, we discuss how actuator nodes reliably and non-redundantly perform actions on actuation devices.

2. System Model

2.1. Sensor Nodes, Actuator Nodes, and Actuation Devices

A wireless sensor and actuator network (WSAN) W is composed of three types of nodes, sensor nodes, actuator nodes, and actuation devices interconnected with a wireless channel [1, 2, 16]. A sensor node gathers values about the physical world like air temperature and sends the sensed values to actuator nodes. An actuator node makes a decision on what actions to be performed on actuation devices based on the sensed values from the sensor nodes and then performs the actions on the actuation devices. Let S be a set of sensor nodes, A be a set of actuator nodes, and O be a set of actuation devices in a WSAN W, i.e. W = 〈S, A, O〉.

Phenomena in the physical world is changed on occurrence of an event. Each event is characterized in terms of attributes like temperature and acceleration. Let Ω be a set of all the attributes to be sensed by sensor nodes.

Let Ω(e) show a scheme of an event e which is a subset of the attributes of the event e(Ω(e) ⊆ Ω). An event e is represented as a collection of values of attributes. Let e.a show an attribute a of an event e. For example, a tuple of values 〈…, 15[°C], N35°69'11.74” E139^°22'19.33”, …〉 shows an event e of a scheme Ω(e) = 〈…, temperature, location, …〉 where the event e occurs in Tokyo Denki University, Hatoyama, Japan. The value e.temperature is 15[°C] and e.location is N35°69'11.74” E139°22'19.33”. If an event occurs in some location of the physical world, sensor nodes in some distance from the location gather values of some attributes of the event.

A WSAN W is partitioned into event areas, W₁, …, W_m (m  1). An event area W_i is a geographical unit of a WSAN W. Each event area W_i is composed of sensor nodes, actuator nodes, and actuation devices. Let S_i, A_i, and O_i show sets of sensor nodes {s_il, …, s_ili}, actuator nodes {a_il, …, a_imi}, and actuation devices {o_il, …, o_ini,} in an event area W_i, respectively, i.e. S_i ⊆ S, A_i ⊆ A, and O_i ⊆ O.

A sensor node s_ik gathers information on physical phenomena, i.e. properties of an event e occurring in an event area W_i. A type of a sensor node is a subset of the attributes, i.e. type of information on which the sensor node can gather. Let Ω (s_ik) be a type of a sensor node s_ik, i.e. a collection of attributes (Ω(s_ik) ⊆ Ω). For example, a voice sensor node v can gather voice information, Ω(v) = {voice}, and a temperature sensor t can gather information on temperature, Ω(t) = {temperature}. e[s_ik] shows values for an event e which a sensor node s_ik gathers, i.e. {e.a | a ∊ Ω.(s_ik)}. Multiple sensor nodes sense a same event e in an event area. For a pair of sensor nodes s_ik and s_ih in an event area W_i, sensed values e[s_ik] and e[s_ih] may not be the same, e[s_ik] ≠ e[s_ih], e.g. due to sensor error and different sampling intervals. A sensor node s_ik gathers values of an event e different from another sensor node s_ih even if the sensor nodes s_ik and s_ih are a same type, i.e. Ω(s_ik) = Ω(s_ih). Then, the sensor node s_ik sends the sensed value e[s_ik] of the event e to actuator nodes in an event area W_i.

An actuator node a_is receives sensed values from sensor nodes in an event area W_i. Then, the actuator node a_is makes a decision on what actions to be performed on what actuation devices. Then, the actuator node a_is performs the actions on the actuation devices in W_i.

Actuation devices which act to the physical world like robot arms and air conditioners are modeled to be objects [9] in this paper. An object is an encapsulation of a state and methods for manipulating the state. Actuation of a device is modeled to be execution of methods on the actuation device object. An actuator node issues a method to an actuation device object. On receipt of a method, the method is performed on the device object. For example, cooler air is ventilated by the air conditioner object ac on receipt of a method (turn) down.

Figure 1 shows the relations of sensor nodes, actuator nodes, and actuation device objects.

Figure 1.

Sensor nodes, actuator nodes, and actuation device objects.

2.2. A Wireless Communication Channel

There are multiple types of nodes, sensors nodes, actuators, and device objects in each event area W_i. A node sends a message in a broadcast channel of a wireless network. A message sent by a node can be received by nodes in an event area W_i depending on the radio field intensity. Let C(c) be a set of nodes which can receive a message sent by a node c in W_i. C(c) is named collision area of a node c. C(c₁) = C(c₂) does not always hold for every pair of nodes c₁ and c₂. Every node in an event area W_i communicates with the others through a wireless channel. If multiple nodes send messages at the same time, the messages are lost due to the collision. In order to resolve the collision in the channel, some synchronization protocol like CSMA[13], CSMA/CD[11] or CSMA/CD[12] is used. For example, CSMA is used in the sensor node MICA2 with TinyOS[20] because of the simplicity.

2.3. Multi-Actuator/Multi-Sensor (MAMS) Model

In the single-actuator (SA) model [1], there are one actuator node and multiple sensor nodes in an event area. If an event occurs in an event area W_i, sensor nodes sense values of the event like temperature. Sensor nodes in an event area send the sensed values to one actuator node. Here, the actuator node may be a single point of failure. On the other hand, multiple sensor nodes and multiple actuator nodes in an event area are interconnected in the multi-actuator (MA) model [1]. Each sensor node sends sensed values to one actuator node but some pair of sensor nodes in an event area may send sensed values of an event to different actuator nodes. If messages with sensed values from a sensor node to an actuator node are lost or some number of sensor nodes are faulty, the sensed values of the event obtained by the sensor node cannot be delivered to the actuator node. In order for at least one actuator node to make a decision, more number of sensor nodes are required to send sensed values to the actuator node.

In this paper, we take a multi-actuator/multi-sensor (MAMS) model [Fig. 2] [18] to realize the reliable sensor-actuator communication as shown in Fig. 2. Suppose a sensor node s sends sensed values to an actuator node a. Here, the actuator node a is referred to as parent of the sensor node s and the sensor node s is in turn a child of the actuator node a. Let AA(a_it) and AS (a_it) be sets of actuator nodes and sensor nodes, respectively, which can receive a message sent by an actuator node a_it in an event area W_i. C(a_it) = AA(a_it) ∪ AS(a_it). Let SA(s_ik) and SS(s_ik) be sets of sensor nodes and actuator nodes, respectively, which can receive a message sent by a sensor node s_ik in W_i. Here, C(s_ik) = SA(s_ik) ∪ SS(s_ik).

Figure 2.

MAMS model.

Figure 3.

Collision.

Let AE(e) (⊆ A_i) be a subset of the actuator nodes which can receive sensed values sent by the sensor nodes, i.e. {a_it|a_it ∊ SA(s_ik) and s_ik ∊ SE(e)} in an event area W_i.

Each actuator node a_it in the set AE(e) may receive sensed values from multiple sensor nodes for an event e occurring in an event area W_i. An actuator node a_it can deliver a message to every node, i.e. actuator and sensor nodes in the event area W_i while a sensor node s_ik can deliver a message to only a subset of the nodes. The more number of messages each actuator node sends, the more number of messages may be lost due to collision. In this paper, we make the following assumptions on communications among actuator nodes and sensor nodes in an event area W_i:

Each actuator node a_it can directly deliver a message to every actuator and sensor node in an event area W_i, i.e. AA(a_i) = A_i and AS(a_i) = S_i.

Each sensor node s_ik can deliver a message to only subsets of the actuator nodes and sensor nodes in W_i, i.e. SA(s_ik) ⊆ A_i and SS(s_ik) ⊆ S_i.

From the assumptions 1 and 2, each actuator node a_it receives every pair of messages m₁ and m₂ sent by actuator nodes and sensor nodes in the same order if the actuator node a_it receives both the messages m₁ and m₂. The assumption 2 means that even if an actuator node a_it receives a messages m from a sensor node s_ik, another actuator node a_iu may not receive the message m.

In sensor-actuator communications, each actuator node a_it has to make the following decisions:

The actuator node a_it makes a decision on a value v_ik from sensed values sent by the child sensor nodes. Then, one value v is taken from a set of the values v_il, …, v_im, where each value v_it is taken by an actuator node a_it (t = 1, …, m_i). For example, a majority value v in the set v_i1, …, v_{im
_i} is taken if values are a discrete type. The average value of the values may be taken for a continuous attribute.

The actuator node a_it makes a decision on which method op to be performed on which actuation device for the sensed value v.

Every actuator node a_it has to make an agreement on one discrete value in the domain for a discrete attribute.

On the other hand, a continuous attribute takes a continuous value like temperature. Even if a pair of sensor nodes s_ik and s_ih are proper, the sensor nodes s_ik and s_ih may send different continuous values to actuator nodes. The actuator node a_it obtains a value, e.g. average value and median of the sensed values.

2.4. Object-Based Actions

An actuator node performs actions through an actuation device object in an event area on receipt of sensed values from sensor nodes. An action is modeled to be the execution of a method on an actuation device object. On receipt of a method issued by an actuator node, the method is performed on an actuation device object. An object may be composed of subobjects. For example, a robot arm is composed of multiple component objects like motors and servo controllers [15]. Thus, an object is hierarchically structured in a part-of relation [9]. In a method issued to an object, methods on the subobjects are furthermore invoked. Thus, methods are invoked in a nested manner [6]. There are two types of objects, semi-active and passive types of objects. On receipt of a request of a method op from an actuator node, the method op is performed on an actuator device object o. Hence, suppose a method op' on another actuation device object o' is furthermore invoked in the method up. If the actuation device object o is semi-active, the actuation device object o directly issues the request op' to the actuation device object o'. Thus, a semi-active object can issue methods to other actuation device objects [Fig. 4]. A requesting entity of the method op is not conscious of how methods are invoked in the method op. On the other hand, a method op issued is only performed on a passive actuation device object o and the actuation device object o does not issue a method to another actuation device object. If a method op' is to be issued in the method op, the actuation device object o informs the actuator node [Fig. 5]. Then, the actuator node issues a request op to the actuation device object o'. A passive object can perform methods but does not issue methods to other actuation device objects.

Figure 4.

Semi-active object.

Figure 5.

Passive object.

Let op(s) show the result obtained by performing a method op on state s of an actuation device object in the world. Here, let op₁ and op₂ be a pair of methods of an actuation device object o. Let op₁ ∘ op₂ show a serial execution of methods op₁ and op₂ on an actuation device object o. Let op₁ ‖ op₂ denote the parallel execution of methods op₁ and op₂ on an actuation device object o. A method op₁ is referred to as equivalent with another method op₂ on an actuation device object o (op₁ ≡ op₂) if the result obtained by performing the method op₁ on every state of the actuation device object o is the same as op₂. Ø shows a null method which does nothing on the actuation device object o. There are the following relations among a pair of methods op₁ and op₂ of an actuation device object o:

A pair of the methods op₁ and op₂ conflict with one another on an actuation device object o if and only if (iff) op₁ ∘ op₂(s) ≠ op₂ ∘ op₁(s) for some state s, i.e. the result obtained by performing the methods op₁ and op₂ depends on the computation order. That is, op₁ ∘ op₂ ≢ op₂ ∘ op₁.

A pair of methods op₁ and op₂ are compatible (do not conflict) with one another iff op₁ ∘ op₂(s) = op₂ ∘ op₁(s) for every state s, i.e. op₁ ∘ op₂ ≡ op₂ ∘ op₁.

A method op₂ absorbs another method op₁ iff op₁ ∘ op₂(s) = op₂(s) for every state s, i.e. op₁ ∘ op₂ ≡ op₁.

A method op₂ compensates another method op₁ iff op₁ ∘ op₂(s) = s for every state s, i.e. op₁ ∘ op₂ ≡ Ø. Here, op₂ is referred to as compensating method of op₁, denoted by ${\tilde{o p}}_{1}$ .

A method op₁ is idempotent if op₁(s) = op₁ ∘ op₁(s) for every state s, i.e. op₁ ≡ op₁ ∘ op₁.

Suppose an air-conditioner object ac supports five types of methods on, off, up, down, and temp. The airconditioner object ac is turned on, off, up, and down by the methods on, off, up, and down, respectively. In addition, the current temperature is obtained by a temp method. The method temp conflicts with all the other methods. A pair of the methods on and off conflict with the other methods up, down, and temp while a pair of the methods up and down are compatible with one another. The method off absorbs the other methods temp, on, up, and down. The methods on, off, and temp are idempotent. The method up can be compensated by the method down and vice versa, i.e. $u p \equiv \tilde{d o w n}$ and $d o w n \equiv \tilde{u p}$ . The method off compensates the method $o n, o f f \equiv \tilde{o n}$ Suppose an air-conditioner object ac is turned off if the temperature is 25°C. Then, the object ac is turned on. However, the temperature may not be 25°C. Thus, the method off does not always compensate the method on. The effect done by a method op can be removed by performing a compensating method $\tilde{o p}$ on an object o. Garcia-Molina et al. discuss Saga using the compensation of methods [8]. Yasuzawa et al. [23] discuss unresolvable deadlocks to occur by performing compensating methods in object-based systems where objects are hierarchically structured. In database systems, a transaction can be compensated by some transaction [7, 10]. However, there are uncompensatable methods on an actuation device object. For example, a method print-out cannot be compensated since physical papers are used out in a printer object prt. If a missile is launched, the launch method cannot be compensated.

An actuator node may issue a sequence of methods to objects. For example, a door object is required to be locked after closed. Thus, some multiple methods are required to be sequentially performed in a specified sequence. Suppose a key-lock method is issued after a door is closed. Here, suppose the door cannot be locked due to some trouble. In one case, an application is satisfied if the door is closed and does not care if the door is not locked. Here, it is considered to be all right because the door is closed. In another case, the closed door is required to be locked in another application. Here, all the actions performed have to be undone. That is, the door object is opened again. In the latter case, a sequence of close and lock methods should be atomically performed. A sequence of methods to be atomically performed is referred to as transaction [10]. Each method of close and key-lock is considered to be a transaction in the former case but a sequence of close and key-lock methods is not atomic, i.e. not a transaction. A transaction is required to abort if some method in the transaction cannot be successfully performed. Suppose a transaction is required to be undone after performing a method up on the air-conditioner object ac. In order to undo the transaction, a compensating method down of the method up is performed on the air-conditioner object ac, i.e. down ≡ $d o w n \equiv \tilde{u p}$ [8, 23]. On the other hand, a method like print-out cannot be compensated. Hence, transactions cannot issue uncompenasatable methods.

There are two types of actuation device objects with respect to how many methods can be concurrently performed, sequential and concurrent ones. In a sequential device object, at most one method can be performed at a time. For example, an actuation device with a single actuator node is a sequential object. On the other hand, an actuation device object is referred to as concurrent object if multiple methods can be in parallel performed on the object. A robot is an example of a concurrent object since arms and vehicles can be in parallel move.

Another point of actions on an actuation device object is that each execution of some method is not assumed to be atomic. It takes a longer time to perform a method on an actuation device object compared with the computation speed of an actuator node. For example, it takes some seconds to print-out papers on a printer device object while millions of instructions in a transaction can be performed in a computer for one second. Hence, some method is not atomically performed, i.e. the execution of a method may be interrupted although each instruction cannot be interrupted in a computer. An atomic method should be compensated even if the method is partially performed due to some fault.

Actuation device objects are furthermore classified to stateful and stateless ones. State of an object is changed by performing a method. In a stateful device object, a state can be stored in the stable memory, i.e. the checkpoint can be taken. The object can be recovered by restoring the state most recently checkpointed. On the other hand, the state cannot be stored in the stable storage. The object has to be recovered by performing compensating methods of methods performed.

3. Coordination Protocols of Multiple Actuators

3.1. Active and Semi-Passive Coordination Protocols

We discuss how multiple actuator nodes a_i1, …, a_im, (m_i  1) cooperate to make a decision on a value v for a collection of sensed values v_i1, …, v_ili from sensor nodes s_i1, …, s_il, and perform a method for the value v on an actuation device in an event area W_i. There are active [22], passive [22], semi-passive [4], and semi-active [22] coordination ways of multiple replica processes. In the active coordination of multiple actuator nodes, every actuator node a_it receives sensed values from the sensor nodes. Each actuator node a_it makes a same decision on a value from the sensed values received from the sensor nodes and performs a method for the value on an actuation device as shown in Fig. 6. First, an actuator node a_it receives a sensed value v_ik from a sensor node s_ik. The actuator node a_it stores a pair 〈v_ik, s_ik〉 of the sensed value v_ik and the sensor node s_ik in the buffer B_it. If the actuator node a_it receives sensed values from more than some number of sensor nodes, the actuator node a_it sends a list of the pairs in the buffer B_it to every other actuator nodes. Every actuator node a_it waits for lists of values from other actuator nodes. If the actuator node a_it receives the lists from more than some number of actuator nodes, the actuator node a_it takes a value v from a set of the values received from other actuator nodes and from sensor nodes. Then, the actuator node a_it makes a decision on a method op to be performed on an actuation device. Multiple actuator nodes issue the same methods to an actuation device. We have to resolve the redundant executions of the same method on the actuation device object as discussed later.

Figure 6.

Active coordination.

Figure 7.

Semi-passive coordination.

In the passive coordination way, one actuator node plays a role of a coordinator. Let a_i1 be the primary actuator node. The other actuator nodes a_i2, …,a_{im
_i} are backup actuator nodes in an event area W_i. On receipt of a sensed value v_ik from a sensor node s_ik, a backup actuator node a_it forwards the value v_ik to the primary actuator node a_it since the primary actuator node a_i1 may not receive the value v_ik from the sensor node s_ik due to communication faults (i = 1,…, l_i). Then, the primary actuator node a_i1 makes a decision on a value from the sensed values collected by the backup actuator nodes a_i2,…,a_{im
_i}. The primary actuator node a_i1 performs methods for the sensed value on actuation devices. If the primary actuator node a_i1 is faulty, one of the backup actuator nodes takes over the primary actuator node. In the active way, every actuator node receives the same sensed values from sensor nodes and performs same methods on actuation devices. As discussed in the paper [18], multiple redundant actions on an actuation device object have to be resolved. Some synchronization mechanisms like timestamp ordering (TO) scheduler [3], and locking protocol [7] have to be supported. In the passive and semi-passive ways, every backup actuator node neither makes a decision nor performs methods. On the other hand, each backup actuator node makes a decision on what methods to be performed in what order but does not perform the methods in the semi-active way. In the semi-active coordination and semi-passive coordination ways, every backup actuator node has to receive the same sensed values. In the semi-passive coordination way, no redundant execution of a method occurs since only a primary actuator node issues the method to an actuation device object. In addition, a backup actuator node can more easily take on the primary actuator node since every backup actuator node receives the same values as the primary actuator node makes a decision on. The semi-passive coordination way implies higher availability than the passive one.

3.2. Semi-Passive Coordination (SPC) Protocol

Suppose there is one primary actuator node a_ip and backup actuator nodes a_i1, …,a_{im
_i} (m_i 1). If an event occurs, a sensor node senses the event and sends a sensed value to actuator nodes in a wireless broadcast channel. An actuator node a_it receives a sensed value v_k from a sensor node s_ik in an event area W_i. Due to collisions and noise in wireless channels, some actuator node may not receive a sensed value v_k from a sensor node s_ik. On receipt of a sensed value v_k from a sensor node s_ik, an actuator node a_it sends a message m with the value v_k and the identifier of the sensor node s_ik to every other actuator node a_iu. Here, the actuator node a_it includes a sequence of sensed value and sensor node identifier in the message m, which a_it has so far received. How many pairs of sensed value and source sensor node are included in a message depends on the requirement of realtime communication. The higher the degree of realtime communication is required, the less number of values are included in a message to reduce the size of the message.

The primary actuator node, say a_ip receives messages from backup actuator nodes. A pair 〈v_k, s_ik〉 of a sensed value v_k and a sensor node s_ik which sends the sensed value v_k are carried by a message m. Here, if a pair 〈v_k, s_ik〉 in a message m have not been received by the actuator node a_ip, the pair 〈v_k, s_ik〉 in the message m is stored in the receipt buffer RB_ip. The receipt buffer RB_ip shows a set V_ip of values sensed by sensor nodes for an event. If the primary actuator node a_ip receives messages from more than na (l_i) sensor nodes, the primary actuator node a_ip broadcasts the set V_ip of the sensed values to all the backup actuator nodes [Fig. 8]. Here, f is a maximum number of sensor nodes which stop by fault. Then, na = (l_i — f) and l_i > f/3. Even if the faulty sensor nodes do not send sensed values, the majority in the sensed values are received by one proper actuator node.

Figure 8.

Broadcast phase.

The primary actuator node a_ip makes a decision on a value v from a set V_ip of the sensed values received from sensor nodes in the buffer RB_ip [Fig. 9]. If values are ones of a discrete attribute, the primary actuator node a_ip takes a majority value in the set V_ip. On the other hand, the primary actuator node a_ip first applies a function filter on the set V_ip of the values if values are a continuous type like temperature. Some sensor node might send an incorrect value due to the sensor error. The function filter removes values which are out of two-sided α% confidence interval in the set V_ip. α = 5[%] in this paper. Then, an average value v is taken by a function average. That is, a value v is obtained by average(filter(V_ip)).

Figure 9.

Decision phase.

Next, the primary actuator node a_ip sends an update message with the internal state, i.e. the value v obtained from sensed values to all the backup actuator nodes a_i1, …, a_{im
_i} [Fig. 10]. On receipt of the update message from the primary actuator node a_ip, a backup actuator node a_it stores the value v and sends an acknowledgment ACK message to the primary actuator node a_ip. If the primary actuator node a_ip receives ACK messages from more than a half of the backup actuator nodes, the primary actuator node a_ip sends a decided message to all the backup actuator nodes. On receipt of a decided message from the primary actuator node a_ip, each backup actuator node a_it updates the state by using the internal state stored. The primary actuator node a_ip makes a decision on a method op to be performed on an actuation device from the value v and then performs the method op on the actuation device.

Figure 10.

Update phase.

We introduce the following variables and functions to explain the semi-passive coordination (SPC) protocol:

Variables:

sens = a set of sensed values which an actuator node receives from sensor nodes.

recv = a set of values which an actuator node receives from actuator nodes.

ack = a set of acknowledgment messages which are received from backup actuator nodes.

store = an update value stored in the actuator node.

Functions:

receive S(V) = receive a set of values from sensor nodes in a set V.

receive A(V) = receive a set of values from an actuator node in a set V.

broadcast (V) = broadcast a set V of sensed values which have been received.

calculate (V) = calculate internal valuev from the set V of sensed values, i.e. average(filter(V)).

update(v) = update the internal value of a backup actuator node with a value v.

decide_op(v) = decide on a method op from a value v.

action(op) = perform a method op on an actuation device.

suspect() = record a suspected actuator node in a suspicion list.

send_nack() = send a negative acknowledgment message to every suspected actuator node.

We show the semi-passive coordination (SPC) protocol for actuator nodes a_i1, …, a_{im
_i}, as follows.

[Semi-passive coordination protocol]

Each actuator node a_ij performs the following steps:

Initially,

sens ← recv ← Ø; k ← 0;

store ← 0;

On receiveS(sens_ik) from a sensor node S_ik,

sens ← sens ∪ sens_ik; recv ← recv ∪ sens; if |sens|  na,

broadcast(sens); sens ← Ø;

On receiveA(recv_ip) from an actuator node a_ip,

recv ← recv ∪ recv_ip;

/* primary only */

if a_ij is a primary actuator node,

if |recv| l_i/2,

inter_val ← calculate(recv);

if update(inter_val) = true,

op ← decide_op(inter_val); action(op);

k ← k + 1;

when timeout_flag = true,

send_nack();

suspect();

if store.sequence >k,

if update(store) = true,

op ← decide_op(store);

action(op);

inter_val ← store; k ← k + 1;

when receiveA(inter_val_i) from actuator node a_i,

send_ack(a_i);

if inter_val_i.sequence >k;

store ← inter_val_i;

when receiveA(decide_op),

inter_val ← store; k ← k + 1;

[Update]

Initially,

ack ← Ø; broadcast(v);

when receiveA(ack_i) from actuator node a_i,

ack ← ack ∪ ack_i;

if |ack| m_i/2,

return true;

when receiveA(nack_i) from actuator node a_i,

return false;

Figure 11 shows the cooperation among a primary actuator node a_ip and two backup actuator nodes a_i1 and a_i2 according to the semi-passive coordination (SPC) protocol. Here, no fault occurs in actuator nodes. On the other hand, in Fig. 12, the primary actuator node a_ip gets faulty. Here, the primary actuator node a_ip calculates the internal value, i.e. the sensed value and sends an update message with the value to each of the backup actuator nodes a_i1 and a_i2. Suppose the update messages are delayed, i.e. the backup actuator nodes do not receive the update message from the primary actuator node a_ip. The backup actuator nodes a_i1 and a_i2 suspect the primary actuator node a_ip to be faulty due to time out. Then, each backup actuator node a_is considers the primary actuator node a_ip to be faulty and sends a nack message to the primary another a_ip. Then, one of the backup actuator nodes, say a_i1 which has the smallest identifier, takes over the primary actuator node by selection algorithm [17]. The new primary actuator node a_i1 calculates an internal value v from a set of received sensed values and sends an update message with the value v to the backup actuator nodes, say a_i2. If the new primary actuator node a_i1 can receive ack messages from more than a half of proper backup actuator nodes, the new primary actuator node a_i1 takes a method op and sends a decided message with op and performs the method op on an actuation device.

Figure 11.

Semi-passive coordination.

Figure 12.

Example: timeout.

4. Actuator Nodes

4.1. Non-Redundant Execution of a Method

In the active coordination (AC) protocol, every actuator node a_it issues a method op to an actuation device object o_ij. On receipt of sensed values of an event e from sensor nodes, each actuator nodes a_is makes a decision on what methods to be performed on what actuation device objects and then performs the methods on the actuation device objects in an event area W_i. In the multi-actuator/multi-sensor (MAMS) model, each of the multiple actuator nodes a_i1, …, a_im, receives sensed values of an event e from multiple sensor nodes S_i1, …, S_{il
_i}, in the event area W_i. Suppose that a method op is decided to be performed on an actuation device object o by the actuator nodes in the event area W_i for an event e. If each actuator nodes a_is makes a decision on the method op and performs the method op, the method op is m_i (1) times performed on the actuation device object o. Here, the state of the actuation device object o may get inconsistent. For example, suppose each of the two actuator nodes a_is and a_it receives sensed values of an event ε on the temperature. Each of the actuator nodes a_is and a_it makes a decision on warming the air with 2°C degree. Then, each of the actuator nodes a_is and a_it issues a method up(2[°C]) to the air-conditioner device object ac. The temperature is as a result increased by 4°C because a pair of up(2[°C]) methods are performed on the actuation device object ac. Here, the method up(2[°C]) should be performed only once on the air-conditioner device object ac for the event e even if multiple actuator nodes receive the sensed information of the event e from sensors. In the MAMS model, the redundant invocations of a method by multiple actuator nodes have to be resolved.

Next, suppose each of the actuator nodes a_is and a_it issues a method temp to an air-conditioner device object ac. Here, the state of the air-conditioner device object ac is not changed. Thus, even if multiple actuator nodes issue an idempotent method such as temp to an actuation device object, the state of the actuation device object does not get inconsistent. Following the examples, we classify methods supported by actuation device objects into the following types of methods with respect to whether or not the state of an actuation device object is changed.

Change method.

Non-change method.

Through a change type of a method op, the state of an actuation device object in an event area is changed, i.e. op(s) ≠ s for some state s. The method up is a change type on an air-conditioner device object ac. On the other hand, the state of an actuation device object is not changed by a non-change method, i.e. op(s) = s for every state s. The method temp is a non-change method. For each event e, a change type method op can be performed only once even if multiple instances of the method op are issued to the actuation device object by multiple actuator nodes. A non-change method can be performed multiple times on an actuation device object, i.e. idempotent. However, the performance of the actuation device object may be degraded if more number of methods are performed on the actuation device object.

There are following approaches to realizing the unique execution of a method on an actuation device object:

Actuator-side approach.

Object-side approach.

In the actuator-side approach, only one method is issued to an actuation device object from multiple actuator nodes. On receipt of a method, the method is just performed on an actuation device object. In one way, the actuator nodes cooperate with each other to make a consensus on which the actuator node issues a method to an actuation device object o. Then, only the selected actuator node issues the method while the other actuator nodes do not issue the method to the actuation device object o. It takes time to exchange messages among the actuator nodes, e.g. it takes three rounds if the two-phase commitment protocol [19] is taken.

In the Object-side approach, an actuation device object o takes a method op only once even if each of the multiple actuator nodes sends the method to the actuation device object o. Each actuator node issues a method to an actuation device object independently of other actuator nodes. Each of the actuator nodes issues a method op to an actuation device object o. Here, each instance of the method op to be taken for an event should be uniquely identified. On receipt of a method op from an actuator node, the identifier id of an instance of the method op is checked on an actuation device object o. If the method with the identifier id has not yet been performed on the actuation device object o, the method op is performed on the actuation device object o. Then, the identifier id is recorded in the log. If the identifier id of the method op is found in the log, the method op is not performed since another instance of the method op has been already performed on the actuation device object o. In a WSAN, realtime communication is in nature required to deliver sensed values to actuator nodes and methods to actuation device objects. Hence, we take the object-side approach by giving the unique identifier to each method since no communication among the actuator nodes is required. Each instance op_is of a method op issued by an actuator node a_is is identified in a pair of method type op and global time t_is when the actuator node a_is receives an event. Here, a pair of identifiers 〈op, t_is〉 and 〈op', t_it〉 of instances op_is and op_it, respectively, are referred to as temporarily equivalent (op_is ≅ op_it) iff op = op' and |t_is — t_it|  α_i. A method op_it is considered to be redundant or an actuation device object o, i.e. op_it is rejected if a method op_is has been performed on the actuation device object o such that op_is ≅ op_it. On each actuation device object, each method can be only once performed even if multiple actuator nodes send instances of the method to the actuation device object.

4.2. Synchronization of Multiple Actuator Nodes

Next, suppose a pair of events e₁ and e₂ occur in an event area W_i. An actuator node a_is receives sensed values of the event e₁ and another actuator node a_it receives sensed values of the event e₂ from sensors. A pair of methods op₁₁ and op₁₂ are to be performed on an actuation device object o₁ for the events e₁ and e₂, respectively. A pair of methods op₂₁ and op₂₂ are to be performed on an actuation device object o₂ for the events e₁ and e₂, respectively. The actuator node a₁ issues a pair of the methods op₁₁ and op₁₂ to the actuation device objects o₁ and o₂, respectively. The other actuator node a₂ issues a pair of methods op₂₁ and op₂₂ to the actuation device objects o₁ and o₂, respectively. A pair of the methods op₁₁ and op₂₁ are performed on the actuation device object o₁ and a pair of the methods op₁₂ and op₂₂ are performed on the other actuation device object o₂. Here, suppose a pair of the methods op₁₁ and op₂₁ conflict on the actuation device object o₁ as well as a pair of the methods op₁₂ and op₂₂ conflict on the actuation device object o₂ as shown in Fig. 13. Here, the serializability [3] is required, i.e. the method op₁₁ is performed on the actuation device object o₁ before op₂₁ iff op₁₂ is performed on o₂ before the actuation device object op₂₂.

Figure 13.

Serializability.

In order to realize the serializability, an actuation device object has to be locked before each method is performed on the actuation device object in one way [3,7]. A method op is kept waiting if an actuation device object is locked by another method conflicting with the method op. In another way, method requests are timestamped in each actuator node [3]. Here, each actuator node issues methods to a timestamp ordering (TO) scheduler. Conflicting methods are ordered in the timestamp order by the TO scheduler. Then, the TO scheduler outputs methods to an actuation device object. Here, every pair of conflicting methods issued by different actuator nodes are performed on an actuation device object in the timestamp order. Actuation device objects are classified into a pair of types: synchronous and non-synchronous types of actuation device objects. A synchronous object supports some synchronization mechanisms, i.e. a locking protocol or time-stamp ordering (TO) scheduler. One type of synchronous object is a locking type which supports the strict two-phase locking (2PL) protocol [7]. Here, lock and unlock methods are additionally supported by the actuation device object. An actuator node issues a lock request to an actuation device object before issuing a method. The method is performed on the actuation device object only if the object is locked. Otherwise, the method is kept waiting for release of the lock.

If every method in a transaction is successfully performed on the actuation device object, the transaction commits. If some method is not successfully performed, the transaction aborts. In one way to abort a transaction, every method op performed is compensated by performing the compensating method $\tilde{o p}$ of the method op. If the transaction commits or aborts, every actuation device object held by the transaction is released. This is the strict two phase locking protocol [3]. Deadlocks may occur in every locking protocol.

Another synchronization type of an object is a TO (timestamp ordering) [3] object. An actuator node a_is issues a method op to an actuation device object o. Here, the method op is given timestamp ts(op) by the actuator node a_is. Here, the method op is sent to the TO scheduler of the actuation device object o. In the TO scheduler, every pair of conflicting methods op₁ and op₂ are ordered in the timestamp order. That is, the method op₁ precedes the method op₂ if ts(op₂) < ts(op₂) and the method op₁ conflicts with the method op₂. Then, the TO scheduler delivers conflicting methods to the object in the timestamp order. Here, the method op₁ is delivered before the method op₂ since op₁ precedes op₂ in the TO scheduler. In order to prevent deadlocks, we take TO objects in this paper. Each actuation device object takes methods from multiple actuator nodes and conflicting methods from multiple actuator nodes and performed on the actuation device object in the timestamp order.

An actuation device object normally does not support any synchronization mechanism such as the locking protocol and time-stamp ordering (TO) scheduler. Methods are just performed on a non-synchronous actuation device object in a receipt sequence of the methods. Hence, multiple actuator nodes are required to issue synchronously conflicting methods with each other. Here, an actuator node is assumed to communicate with sensor nodes and the other actuator nodes by using the wireless medium in an event area W_i. An actuator node a_is sends a request message of a method op with the timestamp ts(op) to an actuation device object by broadcasting the message in an event area W_i. Each actuator node a_is receives the messages from other actuator nodes. Every operational actuator node sends the same method op as discussed before. Every actuator node selects one actuator node whose timestamp is the minimum. Then, the selected actuator node whose time-stamp is the minimum sends the method op to the actuation device object. If actuator nodes are assumed to be arbitrarily faulty, each actuator node takes the majority-based decision in a way similar to in the sensors.

5. Concluding Remarks

In this paper, we discussed how to make a wireless sensor and actuator network (WSAN) fault-tolerant. A WSAN is decomposed into event areas. Each event area is composed of sensor nodes, actuator nodes, and actuation device objects. Each sensor node communicates with multiple actuator nodes and each actuator node receives sensed values from multiple sensor nodes by using the wireless communication medium. This is the multi-actuator/multi-sensor (MAMS) model which we proposed to make WSAN fault-tolerant in this paper. An actuator node makes a decision on what method, i.e. action to be performed on which actuation device object in an event area and then issues the method to the object. Sensor nodes are less reliable and may be arbitrarily faulty, due to low-energy, low-cost devices. In addition, messages sent by sensor nodes might be lost due to collision and nose in a wireless channel. We discussed semi-passive coordination (SPC) and active coordination (AC) protocol for coordinating multiple actuator nodes. In addition, we discussed how to resolve redundant execution of a method on an actuation device object and serialize conflicting methods from multiple actuator nodes.

Footnotes

Acknowledgment

This research is partially supported by Research Institute for Science and Technology [Q06J-07] and Academic Frontier Research and Development Center [18-J-6], Tokyo Denki University.

References

Akyildiz

I. F.

and Kasimoglu

I. H.

. Wireless sensor and actor networks: research challenges. Ad Hoc Networks Journal (Elsevier), 2:351–367, 2004.

Akyildiz

I. F.

Sankarasubramaniam

, and Cayirci

. Wireless sensor networks: a survey. Computer Networks Journal (Elsevier), 38:393–422, 2002.

Bernstein

P. A.

Hadzilacos

, and Goodman

. Concurrency Control and Recovery in Database Systems. Addison-Wesley, 1987.

Defago

Schiper

, and Sergent

. Semi-passive replication. In Proceedings of the 17th IEEE Symposium on Reliable Distributed Systems (SRDS-17), pp. 43–50, 1998.

Durresi

and Paruchuri

. Geometric broadcast protocol for heterogeneous sensor networks. Proc. of 19th IEEE International Conf. on Advanced Information Networking and Applications (AINA2005), 1:343–348, 2005.

Eliot

and Moss

. Nested Transactions: An Approach to Reliable Distributed Computing. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, 1981.

Eswaren

K. P.

Gray

J. N.

Lode

, and Traiger

I. L.

. The notion of consistency and predicate locks in database systems. Communications of the ACM, 19(11):624–637, 1976.

Garcia-Molina

and Salem

. Sagas. In Proceedings of the 1987 ACM SIGMOD international conference on Management of data, pp. 249–259, 1987.

Goldberg

and Robson

. Smalltalk-80: The Interactive Programming Environment. Addison-Wesley, 1983.

10.

Gray

and Reuter

. Transaction Processing: Concepts and Techniques. Morgan Kaufmann, 1993.

11.

IEEE . IEEE Std 802.11b – Wireless LAN Medium Access Control (MAC) and Physical layer (PHY) Specifications: High Speed Physical Layer (PHY) in the 2.4 GHz Band. IEEE, 1999.

12.

IEEE . Carrier Sense Multiple Access with Collision Detection (CSMA/CD) Access Method and Physical Layer Specifications. IEEE, 2000.

13.

Lam

S. S.

. A carrier sense multiple access protocol for local networks. Technical report, Austin, TX, USA, 1979.

14.

Lamport

Shostak

, and Pease

. The byzantine generals problem. ACM Transactions on Programming Languages and Systems, 4(3):382–401, 1982.

15.

Lynxmotion. http://www.lynxmotion.com/.

16.

Melodia

Pompili

Gungor

V. C.

, and Akyildiz

I. F.

. A distributed coordination framework for wireless sensor and actor networks. Proceedings of the 6th ACM International Symposium on Mobile Ad Hoc Networking and Computing. 1:99–110, 2005.

17.

Nakano

and Olariu

. Uniform leader election protocols for radio networks. IEEE Transactions on Parallel and Distriuted Systems, 13(5):516–526, 2002.

18.

Ozaki

Watanabe

Itaya

Hayashibara

Enokido

, and Takizawa

. A fault-tolerant model of wireless sensor-actor network. In Proceedings of the 9th International Symposium on Object and Component-Oriented Read-Time Distributed Computing (ISORC 2006), pp. 186–193. IEEE Computer Society, 2006.

19.

Skeen

and Stonebraker

. A formal model of crash recovery in distributed systems. IEEE Trans. Software Engineering, 9(3), 1983.

20.

TinyOS. http://www.tinyos.net.

21.

Weiser

. Hot topics: Ubiquitous computing. IEEE Computer, pp. 71–72, 1993.

22.

Wiesmann

Pedone

Schiper

Kemme

, and Alonso

. Understanding replication in databases and distributed systems. In Proceedings of 20th International Conference on Distributed Computing Systems (ICDCS'2000), pages 264–274, Taipei, Taiwan, R.O.C., Apr. 2000. IEEE Computer Society Technical Commitee on Distributed Processing.

23.

Yasuzawa

Takizawa

, and Ouchi

. Resolution of parallel deadlock by partial abortion. Proc. of the 1st International Symposium on Computer Communications (ISCOM). pp. 708–711, 1991.