Energy-Efficient Monitoring in Software Defined Wireless Sensor Networks Using Reinforcement Learning: A Prototype

Abstract

Software defined wireless networks (SDWNs) present an innovative framework for virtualized network control and flexible architecture design of wireless sensor networks (WSNs). However, the decoupled control and data planes and the logically centralized control of SDWNs may cause high energy consumption and resource waste during system operation, which hinders their application in WSNs. In this paper, we propose a software defined WSN (SDWSN) prototype that improves the energy efficiency and adaptability of WSNs for environmental monitoring applications. The prototype takes into account the energy, radio resource, and computational constraints of WSNs, as well as the value redundancy and distributed nature of data flows in the periodic transmissions of monitoring applications. In particular, we design reinforcement learning based mechanisms that perform value-redundancy filtering and load-balancing routing according to the values and the distribution of data flows, respectively, in order to improve the energy efficiency of WSNs and their self-adaptability to environmental changes. The optimal matching rules in the flow table are designed to curb the control signaling overhead and to balance the distribution of data flows, enabling in-network fusion in the data plane with guaranteed quality of service (QoS). Experiment results show that the proposed SDWSN prototype can effectively improve the energy efficiency and self-adaptability of environmental monitoring WSNs while guaranteeing QoS.

1. Introduction

Wireless sensor networks (WSNs) are application-oriented, information-centric networks characterized by limited energy and constrained radio resources [1]. One typical application of WSNs is environmental monitoring, where data-gathering based monitoring tasks are executed by nodes with heterogeneous sensing and programmable functions. Each node in a WSN can be equipped with multiple sensors for different sensing purposes, for example, temperature, humidity, light, and vibration. In WSNs, the time-varying wireless communication environment and random interference may lead to unreliable communication links, while nodes switching on and off due to energy constraints can cause unpredictable topology changes. Both effects make it difficult to guarantee reliable and adaptive data gathering for monitoring applications.

Software defined wireless networks (SDWNs) enable programmable network control and virtualization of network equipment by decoupling the control plane from the data plane [2]. The logical centralization and simplified abstraction of the control plane can improve scalability and multitasking efficiency [3]. Combining the SDWN-based architecture with WSNs, that is, a software defined wireless sensor network (SDWSN), would bring the following advantages:

(i) SDWN-based abstraction of the network control plane can effectively reduce the cost of WSN expansion and operation.

(ii) SDWN-based virtualization of network equipment and programmable control of common hardware and software enable flexible task configuration, high resource utilization, and simplified network management in WSNs.

However, realizing the above advantages of SDWSN for monitoring applications is not without difficulties. The decoupled control-data structure of SDWNs relies on cross-plane control traffic, which may result in excessive communication overhead and transmission delay. In SDWSN, although different virtual networks can work together on top of the same physical infrastructure, the centralized control plane may incur high energy costs to collect the information needed for a global view, and the multiple virtual networks may compete for common physical network resources. If a large number of flows simultaneously request a switch to forward data, network congestion or even a crash may occur. Furthermore, energy- and resource-constrained WSNs might not have sufficient network resources to realize the dynamic resource allocation and QoS guarantees of SDWNs. Therefore, energy and resource utilization in SDWSN must be carefully designed for resource-constrained, application-oriented WSNs.

Most existing works on SDWSN focus on providing QoS guarantees or optimizing network management for monitoring applications. The software defined information centric network (SDN-IC) [4] floods the network with packets so as to leave reverse path information at routers, but this method causes frequent packet duplication and heavy communication loads, which increase both end-to-end delay and energy consumption. The resource allocation in a software-driven wide-area network (SWAN) was optimized by an agent-based traffic engineering scheme [5], which requires excessive information exchange between the controller and switches to track changes in network topology and traffic distribution. As network density increases, SWAN would suffer from large overhead caused by collisions among candidate relays contending for the medium. The software defined vehicular ad hoc network (SDV) [6] uses network virtualization to allocate network traffic in a programmable fashion, where surveillance packets are delivered following a position-aided data-gathering mechanism with greedy perimeter stateless routing (GPSR) [7] in case of controller failure. However, the SDV controller needs to gather and maintain a large amount of information for transmission power control, which is impractical for large-scale monitoring WSNs. In [8], the energy consumption of a multitask SDWSN was minimized for monitoring applications with guaranteed quality of sensing by solving a mixed integer linear programming problem, at a high computational complexity. In [9, 10], load-balancing routing algorithms for WSNs construct an optimal routing tree by minimizing the total weight of routing paths, where the path weights are modeled as a function of energy consumption. However, none of these works adequately considers the application-oriented features of flows and in-network data fusion in complex and dynamic monitoring environments for SDWSN, which significantly limits their energy efficiency and environmental adaptability.

In this paper, we develop an energy-efficient cognitive SDWSN prototype for environmental monitoring applications, where the computationally intensive management of data fusion and data routing is centralized in the control plane, while the low-complexity execution of algorithms is carried out in the data plane. A cognitive mechanism based on reinforcement learning (RL) [11] is embedded in the control plane for information processing: the interactions (in terms of reward or punishment) between agents and the environment are used to enhance policy decision making and to improve the self-adaptability of the energy-saving mechanisms in dynamic environments. In particular, we propose to mine the application-specific value redundancy of flows in periodic transmissions of monitoring data using a time series forecasting model based on the autoregressive moving average (ARMA) [12]. We design RL based mechanisms to perform value-redundancy filtering and load-balancing routing according to the values and distribution of flows, respectively, in order to improve the energy efficiency of WSNs and their self-adaptability to environmental changes. Furthermore, the actions of the control plane are mapped to low-complexity vector calculations and rule matching in the switch's flow table. The rules in the flow table are designed to curb the control signaling overhead and to balance the distribution of data flows, enabling in-network fusion in the data plane.

The novel aspects of the proposed energy-efficient SDWSN prototype are as follows:

(i) energy saving with guaranteed QoS is achieved by mining the application-specific value redundancy and distribution of data flows in SDWSN, taking into account the inherent constraints of WSNs in terms of energy, radio resources, and computational capabilities;

(ii) the RL based mechanisms for value-redundancy filtering and load-balancing routing can adapt to the varying environment and network status, thus improving the self-adaptability of SDWSN for monitoring applications.

The rest of the paper is organized as follows. Section 2 describes the cognitive SDWSN prototype and its functional architecture. Section 3 presents a specific implementation of the proposed prototype. In Section 4, the performance of the proposed SDWSN prototype in terms of energy efficiency and self-adaptability is evaluated through experiments in comparison with existing WSN approaches for monitoring. Finally, conclusions are drawn in Section 5.

2. Functional Architecture of SDWSN Prototype

In this section, we propose a cognitive SDWSN prototype in which RL is incorporated into network information processing to jointly account for the energy- and resource-constrained nature of WSNs, the complex features of monitoring applications, and the dynamic nature of WSN deployment environments.

As shown in Figure 1, the fundamental functionalities of the SDWSN prototype include an information QoS setting module, the cognitive information middleware (CIM), and an information processing module. Following the design principles of SDWNs, the application plane of the SDWSN prototype is designed to meet the QoS requirements of monitoring applications and is supported by the sensor hardware. The application plane interacts with the control plane through an application programming interface (API). The functionalities of the data plane are dynamically configured using the Over-the-Air Programming (OTAP) technique [13], which can run multiple tasks simultaneously with QoS and can reduce the energy consumption of online task scheduling.

Figure 1: Cognitive SDWSN prototype.

The data plane is abstracted into a weighted directed graph $G = (V, L, E)$, which forms a reverse multicast tree, with V being the set of vertices, L the set of links, and E the set of link weights. Each node in the vertex set V maintains a hierarchical cluster, where a switch acts as the cluster leader and the other programmable nodes become cluster members constituting the monitoring information generating (IG) module. The data flows generated by members of the same cluster form a programmable set of packets that share certain properties, since the packets of a flow are handled by matching the "Field" and "Rules" in the switch's flow table (see Figure 1) and by applying the "Action" set to execute the preset policy. The weight of each link indicates the status of flow property distribution and frequency bandwidth allocation during a given data-gathering round and is defined as a function of link bandwidth utilization (BWU) (see Section 3).

In the control plane, CIM is the part of the controller that performs adaptive mining of network information using machine learning schemes with QoS guarantees. The information mapping (IM) module in CIM is responsible for preprocessing the information received from the data plane (i.e., information mining). It has two main duties: evaluating online the value of the current monitoring data flow using an ARMA model, and building a flow distribution map and a network interconnection map in G. The information Q-learning (IQ) module uses the results of the IM module to produce optimal strategies, which the policy defining (PD) module then transforms into a set of policies to be inserted into the switch's flow table for a lightweight implementation in the data plane.

Routing decisions are made by CIM in the controller and then translated into rules and actions deployed in flow tables. APIs are used to configure flow tables for routing, in conjunction with a floodless service discovery mechanism. The Sensor OpenFlow (SOF) [14] channel, which is part of the operating system, is used to establish an end-to-end connection between the controller and a switch. SOF also supports queries on packet streams and automatically splits queries between the data plane and the control plane, thus avoiding extra query traffic in the data plane. Value matching and path matching are designed by CIM and executed by lightweight flow-table actions in an on-demand energy-saving mode, which reduces the amount of information exchanged between the operating system and the data plane. After data-gathering routing has been established, data packets can be forwarded and processed in the data plane. Subsequent packets in a flow are forwarded in the data plane according to the routing configured in the flow tables, without further participation of the control plane. This reduces the data-gathering traffic in the data plane and the control overhead in the control plane. The MAC layer features (part of the operating system) are logically partitioned into two modules: the lower MAC module, which depends on the proprietary Hardware Abstraction Layer (HAL) and controls time-critical functions to achieve value-redundancy fusion in the data plane based on service differentiation access control; and the upper MAC module, which is responsible for delay-tolerant control plane functions.

The proposed cognitive SDWSN prototype takes into account the inherent constraints of WSNs in terms of energy, radio resources, and computational capabilities. Energy saving is achieved through the value-redundancy data fusion and load-balancing data routing techniques designed in CIM. By using machine learning schemes for energy saving in the control plane and lightweight execution through a flow table at each switch in the data plane, intelligence and controllability can be achieved at all stages of the information operation chain in SDWSN. Introducing low-complexity in-network processing in the data plane facilitates centralized QoS management in the control plane, thus reducing the total overhead of cross-plane communications. Moreover, the IM module enables low-complexity numerical operations on the flow entries by mapping CIM's outputs to constant parameter vectors.

With the proposed SDWSN prototype, programmability and resource reuse in the data plane can be improved through OTAP. Introducing a data fusion mechanism into the data plane reduces the overhead of cross-plane control signaling and also improves the controllability of packet routing and the efficiency of resource utilization.

3. Design of RL Based Energy-Saving Mechanisms

In this section, we discuss the implementation of the proposed SDWSN prototype for environmental monitoring applications, focusing on the RL based energy-saving mechanisms.

3.1. Design of Energy-Saving Mechanisms in Control Plane

In event-detection based monitoring applications, the periodic transmissions of monitoring data usually have low duty cycles and high time-domain correlation, which results in data value redundancy. In the following, we exploit this value redundancy to save transmission energy.

RL is an agent-based learning approach that uses trial and error to find reward-maximizing behavior in a dynamic environment. RL can adapt to dynamic environments with relatively low complexity, which makes it well suited to WSNs that have limited resources and operate in unpredictable environments. Therefore, we design the energy-saving policy Γ (see (4)) in the control plane based on RL. The key element of Γ is the use of the contention window (CW) in QoS-aware media access control (MAC), which exploits the concept of service differentiation MAC [15]. CW introduces a MAC back-off counter, called Failed Times (FT), which counts the number of failures before the node wins the contention; this avoids long occupancy of the medium caused by a large CW. The threshold of FT sets the retry limit in MAC and can be regarded as the CW size. According to Γ, we propose the value-redundancy filtering mechanism $π_{1}$ and the load-balancing routing mechanism $π_{2}$, where $π_{1}$ performs online estimation of flow values and implements an optimal in-network fusion strategy, and $π_{2}$ performs online analysis of flow distribution and adaptive optimization of path weights.
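The role of the FT counter can be pictured as a retry loop. The following Python sketch is a simplified model, not the MAC implementation of [15]: it assumes that contention outcomes are supplied as a sequence of booleans (True when the node wins the medium) and that a packet is dropped once FT exceeds its threshold. The function name `attempt_transmission` is hypothetical.

```python
from typing import Iterable, Tuple


def attempt_transmission(outcomes: Iterable[bool], ft_threshold: int) -> Tuple[bool, int]:
    """Simulate one packet's contention attempts with a Failed Times (FT) counter.

    outcomes: successive contention results (True = node won the medium).
    ft_threshold: retry limit; per the text it plays the role of the CW size.
    Returns (sent, ft): whether the packet was sent and the final FT count.
    """
    ft = 0
    for won in outcomes:
        if won:
            return True, ft       # transmitted after `ft` failed attempts
        ft += 1                   # one more failed contention
        if ft > ft_threshold:
            return False, ft      # retry limit exceeded: packet dropped
    return False, ft              # sketch only: ran out of contention opportunities


print(attempt_transmission([False, False, True], ft_threshold=3))  # (True, 2)
print(attempt_transmission([False] * 5, ft_threshold=2))           # (False, 3)
```

A smaller threshold drops a repeatedly losing packet sooner, which is the lever that later sections tie to the flow value.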

During a specific data-gathering round $B$ ($B \in Z^{+}$), the monitoring data flow generated by node $i$ can be modeled as a finite time series $X = \{x_{t}^{i}\}$, $t \in T_{B}$, $T_{B} = \{j, j + 1, \dots, j + m - 1\}$, $i \in V$, $m \in Z^{+}$, where $j = (B - 1) m + 1$ denotes the first sampling time instant of round $B$ and $m$ denotes the learning queue length. The time span of each data-gathering round is $m T_{s}$, where $T_{s}$ is the sampling period.
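The round indexing can be checked with a small worked example. The Python helpers below are hypothetical names introduced only to illustrate $j = (B - 1) m + 1$ and the round span $m T_{s}$; sample indices are 1-based, as in the text.

```python
def round_indices(B: int, m: int) -> range:
    """Return the 1-based sampling instants T_B = {j, ..., j + m - 1} of round B."""
    if B < 1 or m < 1:
        raise ValueError("B and m must be positive integers")
    j = (B - 1) * m + 1           # first sampling instant of round B
    return range(j, j + m)


def round_duration(m: int, Ts: float) -> float:
    """Time span of one data-gathering round, m * Ts."""
    return m * Ts


print(list(round_indices(3, 4)))  # [9, 10, 11, 12]
print(round_duration(4, 0.5))     # 2.0
```

With $m = 4$, round $B = 3$ covers samples 9 to 12, so consecutive rounds tile the sample sequence without overlap.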

Since the ARMA model captures the statistical characteristics of a time series, it can be used to mine the value redundancy of sampled data and to evaluate the value of data flows in real time. We therefore adopt ARMA to predict the value ${\hat{x}}_{t}^{i}$ and calculate the corresponding prediction error ${‖x_{t}^{i} - {\hat{x}}_{t}^{i}‖}_{p}$, where ${‖\cdot‖}_{p}$ denotes the normalized Frobenius norm [16], for example, ${‖A‖}_{p = 2} = [\mathrm{Tr} (A^{*} A)]^{1 / 2}$. Note that if important events occur, the distribution of monitoring data flows among nodes becomes uneven. According to information entropy theory [17], and assuming no external interference, greater variations in data flow values indicate a larger average amount of information in the data flows and a higher probability that important events are occurring. Thus, we define the value factor $F_{B}^{i}$ to estimate the underlying value of data flow X generated by node $i$ in the Bth data-gathering round as follows:

$$
\begin{matrix}
F_{B}^{i} = δ_{t}^{i} \cdot λ_{t}^{i} \\
\text{s.t.}\ δ_{t}^{i} = \frac{\sum_{t = j}^{j + m - 1} {‖x_{t}^{i} - {\hat{x}}_{t}^{i}‖}_{p}}{m}; \\
λ_{t}^{i} = {‖{(\sum_{t} o_{t}^{i} - k_{th})}^{+}‖}_{p}; \\
o_{t}^{i} = \begin{cases} 1, & {‖x_{t}^{i} - {\hat{x}}_{t}^{i}‖}_{p} > δ_{t}^{i} \\ 0, & \text{otherwise}; \end{cases} \\
{(\cdot)}^{+} = \max (\cdot, 0); \\
t \in T_{B},\ i \in V,\ \forall x_{t}^{i} \in X,\ j \in Z^{+},\ k_{th} \in Z^{+},\ p = 2,
\end{matrix}
\tag{1}
$$

where $δ_{t}^{i}$ is the mean of the prediction errors, $λ_{t}^{i}$ is an anti-interference factor designed to avoid misjudging values because of environmental disturbance, and $o_{t}^{i} = 1$ if the prediction error at time instant $t$ is larger than $δ_{t}^{i}$; otherwise, $o_{t}^{i} = 0$. If the number of significant prediction errors in data flow X reaches the threshold $k_{th}$, that is, $\sum_{t} o_{t}^{i} \geq k_{th}$, then X is considered to be of high value; otherwise, those significant prediction errors are attributed to external interference. The anti-interference capability increases with the value of $k_{th}$.
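Equation (1) can be evaluated directly once predictions are available. The Python sketch below assumes scalar samples, so the $p = 2$ norm reduces to an absolute value, and it substitutes a least-squares AR(1) one-step predictor for the ARMA model purely for illustration. The names `ar1_one_step` and `value_factor` are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence


def ar1_one_step(x: Sequence[float]) -> List[float]:
    """In-sample one-step-ahead predictions from a least-squares AR(1) fit
    x_t ~ c + phi * x_{t-1}. A simplified stand-in for an ARMA model; the
    first sample has no predecessor and is 'predicted' as itself."""
    if len(x) < 3:
        return list(x)
    prev, curr = x[:-1], x[1:]
    mp, mc = fmean(prev), fmean(curr)
    var = sum((p - mp) ** 2 for p in prev)
    cov = sum((p - mp) * (c - mc) for p, c in zip(prev, curr))
    phi = 0.0 if var == 0 else cov / var
    c0 = mc - phi * mp
    return [x[0]] + [c0 + phi * p for p in prev]


def value_factor(x: Sequence[float], x_hat: Sequence[float], k_th: int) -> float:
    """Value factor F = delta * lambda from (1), for scalar samples."""
    if len(x) != len(x_hat) or not x:
        raise ValueError("x and x_hat must be non-empty and of equal length")
    errors = [abs(a - b) for a, b in zip(x, x_hat)]    # |x_t - x_hat_t|
    delta = sum(errors) / len(errors)                  # mean prediction error
    significant = sum(1 for e in errors if e > delta)  # sum of o_t
    lam = max(significant - k_th, 0)                   # (sum o_t - k_th)^+
    return delta * lam


x = [0, 0, 0, 0, 10, 10, 10, 0]
print(value_factor(x, [0] * 8, k_th=1))  # 7.5
```

Here the mean error is 3.75 and three samples exceed it, so with $k_{th} = 1$ the factor is $3.75 \times 2$. Note that, as (1) is written, the factor is zero when the count of significant errors equals $k_{th}$ exactly; only counts above the threshold yield a positive value.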

After the value of data flow X has been predicted, the historical forwarding record of X during the Bth data-gathering round is extracted from the link statistics in the counter field of the switch's flow table, in order to analyze the link state and calculate the BWU. Accordingly, the real-time state element for RL is obtained as

$$
\begin{matrix}
θ_{B}^{i} = {‖\frac{F_{B}^{i}}{e_{B}^{k_{i}}}‖}_{p} \\
\text{s.t.}\ e_{B}^{k_{i}} = \frac{N_{B}^{k_{i}}}{l_{B}^{k_{i}}}; \\
l_{B}^{k_{i}} = \frac{1}{|O^{(i)}|} \sum_{h} l_{B}^{\langle i, h \rangle}; \\
\forall i \in V,\ k_{i} = \langle i, h \rangle \in L,\ h \in O^{(i)} \subset V,\ t \in T_{B},\ e_{B}^{k_{i}} \in E,\ p = 2.
\end{matrix}
\tag{2}
$$

In (2), $θ_{B}^{i}$ reflects the local status of flow value and flow allocation used for resource utilization in SDWSN, where $e_{B}^{k_{i}}$ is the BWU and $N_{B}^{k_{i}}$ denotes the traffic throughput of node $i$ on link $k_{i}$ in the Bth data-gathering round, which can be calculated as the increment in the number of "transmitted bytes" from node $i$ on link $k_{i}$ during that round. $l_{B}^{k_{i}}$ indicates the mean bandwidth of the links in the set $\{\langle i, h \rangle\}_{h}$, where node $h$ belongs to $O^{(i)}$, the neighborhood of node $i$, whose size is $|O^{(i)}|$. Then, the threshold of FT in CW is set as an inversely proportional function of $θ_{B}^{i}$ using the service differentiation based MAC retransmission protocol [15]. Therefore, the value of data flow X and its historical forwarding records can be mapped to the corresponding probability of channel access in the MAC layer. The adaptive optimization of CW can be formulated as an average reward Markov decision process (ARMDP) [18]. Accordingly, we design an ARMDP based RL mechanism that optimizes CW in order to find the optimal energy-efficient policy. The RL mechanism is executed by CIM, which adaptively adjusts the channel access probability to inhibit the transmission of value-redundant loads and to balance the transmission loads among nodes while guaranteeing QoS.
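The BWU and the state element in (2) reduce to a few divisions. The Python sketch below assumes a scalar value factor, flow-table byte counters sampled at the start and end of round $B$, and link bandwidths expressed in the same units as the byte counts per round. The function names are hypothetical.

```python
from typing import Sequence


def throughput(bytes_at_start: int, bytes_at_end: int) -> int:
    """N: increment of the 'transmitted bytes' counter over the round."""
    if bytes_at_end < bytes_at_start:
        raise ValueError("counter must be non-decreasing within a round")
    return bytes_at_end - bytes_at_start


def bwu(n_bytes: float, neighbor_bandwidths: Sequence[float]) -> float:
    """e = N / l, with l the mean bandwidth of links <i, h>, h in O(i)."""
    if not neighbor_bandwidths:
        raise ValueError("node i needs at least one neighbor")
    l_mean = sum(neighbor_bandwidths) / len(neighbor_bandwidths)
    return n_bytes / l_mean


def state_element(value_factor: float, e: float) -> float:
    """theta = |F / e| for scalar F and e (the p = 2 norm of a scalar)."""
    return abs(value_factor / e)


n = throughput(1200, 1700)    # 500 bytes sent in this round
e = bwu(n, [1000, 3000])      # 500 / 2000 = 0.25
print(state_element(7.5, e))  # 30.0
```

For non-negative scalars, θ coincides with the per-round reward $r_{B} = F_{B}^{i} / e_{B}^{k_{i}}$ that appears in (4).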

In the RL mechanism, the dynamic environment is characterized by a 4-tuple $(S, A, P, R)$, where S is the set of network states, updated in each data-gathering round and consisting of the value factors $F_{B}^{i}$ and BWUs $e_{B}^{k_{i}}$ stored in the status table embedded in CIM; that is,

$$
s_{B} = (F_{B}^{i}, e_{B}^{k_{i}}) \in S,\ \forall i \in V,\ k_{i} = \langle i, h \rangle \in L,\ h \in O^{(i)} \subset V,\ e_{B}^{k_{i}} \in E,\ B \in Z^{+},
\tag{3}
$$

thereby providing a real-time observation of the environment for Q-learning; A is the action set produced by the PD module and injected into the "Match Field" of the flow table in the switch (see Figure 1); $P : S \times A \times S \to [0, 1]$ denotes the state transition probability; and the reward function $R : S \times A \to \mathbb{R}$ gives the environmental reward for the corresponding action in terms of improving energy efficiency. CIM accumulates the global reward by maximizing the local reward in each data-gathering round. For a given QoS, the local reward calculated by (4) increases with a higher value $F_{B}^{i}$ of the data flow and a lower BWU $e_{B}^{k_{i}}$. Conversely, if the current value of the data flow is low and/or the bandwidth is overutilized, the RL mechanism generates zero or negative reward as punishment.

The CW optimization strategy is shown in Algorithm 1, where Γ optimizes the CW size $|W^{i}|$ by maximizing the accumulated local reward and iteratively amending the criteria of action evaluation:

$$
\begin{matrix}
Γ : Ψ_{B} = \begin{cases} Ψ_{B - 1} + ζ, & Q_{B} (s_{B}, a_{B}) \geq Q_{B} (s_{B - 1}, a_{B - 1}) + ε \\ Ψ_{B - 1} - ζ, & Q_{B} (s_{B}, a_{B}) < Q_{B} (s_{B - 1}, a_{B - 1}) - ε \end{cases} \\
\text{s.t.}\ Ψ_{1} = θ_{1}^{i}; \\
Q_{B} (s_{B - 1}, a_{B - 1}) = Q_{B - 1} (s_{B - 1}, a_{B - 1}) + λ \cdot [r_{B} + γ \cdot \max_{a_{B}} Q_{B - 1} (s_{B}, a_{B}) - Q_{B - 1} (s_{B - 1}, a_{B - 1})]; \\
r_{B} = \frac{F_{B}^{i}}{e_{B}^{k_{i}}}; \\
\exists ε > 0,\ \exists ζ \geq 0,\ λ \in (0,1],\ γ \in (0,1],\ B \in Z^{+},\ s_{B} \in S,\ r_{B} \in R,\ a_{B} \in A,
\end{matrix}
\tag{4}
$$

where λ and γ denote the learning factor and the discount factor, respectively; $s_{B}$ and $a_{B}$ are the state and the action in the Bth data-gathering round; and ζ and ε are the tuning step size for updating $Ψ_{B}$ and the tolerance applied to changes of the Q function [19], respectively.
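The two updates in (4) can be sketched as follows. This Python snippet assumes a tabular Q stored in a dict keyed by (state, action), with unseen entries treated as 0. It also assumes that Ψ is left unchanged when the change in Q lies within ±ε, a case that (4) does not specify. The names `q_update` and `update_psi` are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Tuple

QTable = Dict[Tuple[Hashable, Hashable], float]


def q_update(Q: QTable, s_prev: Hashable, a_prev: Hashable, r: float,
             s: Hashable, actions: Iterable[Hashable],
             lr: float, gamma: float) -> float:
    """Q(s_prev, a_prev) += lr * [r + gamma * max_a Q(s, a) - Q(s_prev, a_prev)]."""
    best_next = max((Q.get((s, a), 0.0) for a in actions), default=0.0)
    old = Q.get((s_prev, a_prev), 0.0)
    Q[(s_prev, a_prev)] = old + lr * (r + gamma * best_next - old)
    return Q[(s_prev, a_prev)]


def update_psi(psi_prev: float, q_now: float, q_prev: float,
               eps: float, zeta: float) -> float:
    """Threshold update Gamma: move Psi by zeta when Q changes by more than eps."""
    if q_now >= q_prev + eps:
        return psi_prev + zeta
    if q_now < q_prev - eps:
        return psi_prev - zeta
    return psi_prev  # assumption: (4) leaves this case unspecified


Q: QTable = {("s1", "keep"): 3.0, ("s0", "inc"): 1.0}
print(q_update(Q, "s0", "inc", r=2.0, s="s1", actions=["inc", "keep", "dec"],
               lr=0.5, gamma=0.4))  # about 2.1
```

The learning factor 0.5 and discount factor 0.4 in the call match Table 1; initializing $Ψ_{1} = θ_{1}^{i}$ is left to the caller.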

Algorithm 1: Contention window optimization strategy based on RL.

① Initialize the Q table entry for each (s, a) pair and the reward r:
   set parameters $λ, γ, s, a, t = 0$, $Q (s_{B}, a_{B}) = 0$, $B = 1$;
② Perform the following steps in each round:
   Do
      Calculate the QoS factor of the flow;
      Observe the current state $s_{B} \in S$ according to (3);
      While (the current run < iteration threshold)
         Adjust Γ with the given Q according to (4);
         Select action $a_{B}$ using the state transition probability in (6);
         Adjust the window size according to (5);
         Maximize the per-round local reward $r_{B}$;
         Observe the new state $s_{B + 1}$;
         Learn (s, a, r);
         Execute the iterative update of Q;
         Select the next action $a_{B + 1}$;
         Update the state $s_{B} \leftarrow s_{B + 1}$;
      End while
      Get the reward from the current run;
      $B \leftarrow B + 1$;
      If B is lower than the iteration threshold, go to Step ②;
   End do
③ Obtain the optimal policy Γ based on $Q (s, a)$ and the historical behavior rewards, and write it into the flow table.

The status table in CIM tracks how the operational environment evolves over time, as new states are mined and new actions are discovered. Policy Γ needs to be updated continually to match state-event pairs with the optimal actions. The RL mechanism uses a random strategy to fully explore the state space at the beginning and adopts a greedy strategy later on to ensure convergence. According to the Γ output by CIM, the controller executes the optimal action with the locally maximal Q-value as follows:

$$
\begin{matrix}
\text{Action: } a_{B} = \begin{cases} |W_{B}^{i}| \leftarrow |W_{B}^{i}| + Δ |W^{i}|, & θ_{B}^{i} < {‖Ψ_{B}‖}_{p} \\ |W_{B}^{i}| \leftarrow |W_{B}^{i}|, & θ_{B}^{i} = {‖Ψ_{B}‖}_{p} \\ |W_{B}^{i}| \leftarrow |W_{B}^{i}| - Δ |W^{i}|, & θ_{B}^{i} > {‖Ψ_{B}‖}_{p} \end{cases} \\
\text{s.t.}\ Δ |W^{i}| = \frac{1}{θ_{B}^{i} \cdot ϖ_{QoS}}; \\
ϖ_{QoS} \in (0,1],\ p = 2,
\end{matrix}
\tag{5}
$$

where $Δ |W^{i}|$ is the adjustment step of the CW size. After the Bth iteration, $Ψ_{B}$ is obtained by comparing the Q values of the Bth and the $(B - 1)$th rounds, and the QoS factor $ϖ_{QoS}$ represents the QoS requirement of a given monitoring application. The value of $ϖ_{QoS}$ can be adjusted by the information QoS setting module in the application plane and then fed into the "Match Rules" of the switch's flow table (see Figure 1) via the API to realize programmable network control. Accordingly, node $i$ adjusts the CW size by executing action $a_{B}$ based on the following transfer function:

$$
p (a_{B}) = \frac{χ_{Γ}^{s_{B}} (a_{B})}{\sum_{a_{B} \in A} χ_{Γ}^{s_{B}} (a_{B})},\quad p (a_{B}) \in P,\ a_{B} \in A,\ s_{B} \in S,
\tag{6}
$$

where $χ_{Γ}^{s_{B}} (a_{B}) = Q_{B} (s_{B}, a_{B}) / [\sum_{B} Q_{B} (s_{B}, a_{B})]$ denotes the state assessment of action $a_{B}$ under state $s_{B}$ using Γ. Thus, the CW size $|W^{i}|$ can be adaptively adjusted according to the dynamic environment status and the application QoS, with an adjustment amplitude $Δ |W^{i}|$ that is inversely proportional to $θ_{B}^{i}$. According to the MAC protocol for differentiated services [20], the probability of packet forwarding is inversely proportional to $|W^{i}|$.
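The CW action in (5) and the action distribution in (6) can be sketched in Python as below. Two assumptions are ours: the normalizing sum inside χ is taken to be common to all actions, so it cancels and $p(a)$ reduces to $Q(s, a)$ divided by the sum over actions; and Q-values are assumed non-negative, with a uniform fallback when they are all zero. The function names are hypothetical.

```python
import random
from typing import Dict, Hashable


def adjust_cw(cw: float, theta: float, psi_norm: float, w_qos: float) -> float:
    """Apply action (5): grow, keep, or shrink |W| by 1 / (theta * w_qos)."""
    if not 0 < w_qos <= 1:
        raise ValueError("w_qos must lie in (0, 1]")
    if theta <= 0:
        raise ValueError("theta must be positive")
    step = 1.0 / (theta * w_qos)
    if theta < psi_norm:
        return cw + step      # low value or busy link: larger CW, lower access probability
    if theta > psi_norm:
        return cw - step      # high value or lightly used link: smaller CW
    return cw


def action_probabilities(q_row: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """p(a) per (6), assuming non-negative Q-values for the current state."""
    if not q_row:
        raise ValueError("need at least one action")
    if any(q < 0 for q in q_row.values()):
        raise ValueError("this sketch assumes non-negative Q-values")
    total = sum(q_row.values())
    if total == 0:
        return {a: 1.0 / len(q_row) for a in q_row}
    return {a: q / total for a, q in q_row.items()}


probs = action_probabilities({"inc": 1.0, "keep": 3.0, "dec": 0.0})
print(probs)  # {'inc': 0.25, 'keep': 0.75, 'dec': 0.0}
action = random.choices(list(probs), weights=list(probs.values()))[0]
print(adjust_cw(10.0, theta=2.0, psi_norm=4.0, w_qos=0.5))  # 11.0
```

Equation (5) does not bound $|W^{i}|$; a practical MAC would clamp the window to its permitted range.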

Based on the above analysis, we design $π_{1}$ and $π_{2}$ as follows:

(i) $π_{1}$ reduces the total energy consumption in SDWSN by inhibiting the transmission of value-redundant loads. In $π_{1}$, $F_{B}^{i}$ is used as an important element to control $|W^{i}|$. According to (5), a low-value flow is configured with a long medium access delay caused by the correspondingly large number of retries set by the MAC back-off counter, which leads to a low forwarding probability. The low-value flow is discarded when the medium access delay exceeds a preset FT threshold. Therefore, the probability of traffic forwarding can be adaptively controlled according to the flow value. Suppressing the transmission of low-value flows greatly decreases the amount of in-network data-gathering traffic and thus saves energy.

(ii) $π_{2}$ balances the energy distribution across SDWSN by minimizing the total weight of routing paths in G, that is, $\min \sum_{k_{i} \in \mathrm{Tree}_{B}^{\mathrm{optim}}} e_{B}^{k_{i}}$ with $\mathrm{Tree}_{B}^{\mathrm{optim}} \subset L$, to construct an optimal routing tree $\mathrm{Tree}_{B}^{\mathrm{optim}} = \{k_{i}\}$, $k_{i} \in L$, $i \in V$, where the average link BWU $e_{B}^{k_{i}}$ is used as the link weight for controlling the CW size. According to (2) and (5), a link with a higher BWU is configured with a larger CW, which leads to a lower probability of traffic forwarding. Therefore, the distribution of network traffic can be balanced by adaptively adjusting the link weights, thereby optimizing route selection in SDWSN.
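One way to realize $π_{2}$ is a shortest-path tree toward the sink with BWU values as link weights: because every node then uses its minimum-weight path, the sum of path weights over all nodes is also minimized. This construction is our assumption, since the text does not name an algorithm. The sketch below uses Dijkstra's algorithm with the standard-library `heapq` module; `routing_tree` is a hypothetical name.

```python
import heapq
from typing import Dict, Hashable, List, Tuple

Link = Tuple[Hashable, Hashable]


def routing_tree(links: Dict[Link, float], sink: Hashable):
    """Shortest-path tree toward `sink` with BWU values as link weights.

    links maps a directed link (i, h) to its weight e (BWU) for traffic i -> h.
    Node ids should be mutually comparable (e.g., all strings), because
    heap ties are broken by comparing them.
    Returns (next_hop, cost): each node's parent on its minimum-weight path
    to the sink, and the total weight of that path.
    """
    reverse: Dict[Hashable, List[Tuple[Hashable, float]]] = {}
    for (i, h), w in links.items():
        if w < 0:
            raise ValueError("BWU weights must be non-negative")
        reverse.setdefault(h, []).append((i, w))
    cost = {sink: 0.0}
    next_hop = {}
    heap = [(0.0, sink)]
    while heap:
        d, h = heapq.heappop(heap)
        if d > cost[h]:
            continue                   # stale heap entry
        for i, w in reverse.get(h, []):
            nd = d + w
            if nd < cost.get(i, float("inf")):
                cost[i] = nd
                next_hop[i] = h        # i forwards to h on its best path
                heapq.heappush(heap, (nd, i))
    return next_hop, cost


links = {("a", "bs"): 0.9, ("a", "b"): 0.2, ("b", "bs"): 0.3, ("c", "b"): 0.1}
next_hop, cost = routing_tree(links, "bs")
print(next_hop)  # {'a': 'b', 'b': 'bs', 'c': 'b'}
```

Node a avoids its heavily used direct link to the sink (BWU 0.9) and relays through b (total 0.5), which illustrates the load-balancing effect described for $π_{2}$.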

4. Implementation of Flow-Table Based Policy in Data Plane

SDWSNs are characterized by decoupled control and data planes. Although the energy-efficient mechanisms $π_{1}$ and $π_{2}$ require high computational complexity in the control plane, they are executed by the information processing module of the data plane at low computational complexity. Based on Γ in (4), $π_{1}$ and $π_{2}$ are mapped to the parameter vector ψ of the value-redundancy filter and the path matrix Π of the load-balancing routing mechanism, respectively. The value-redundancy filtering parameter vector is $ψ = ({‖Ψ_{B}‖}_{p}, ϖ_{QoS})_{2 \times 1}$, where $Ψ_{B}$ is the value-redundancy filter threshold calculated in (4) and used in (5) to curb the CW size. Π consists of the identities of the nodes belonging to the optimal routing tree $\mathrm{Tree}_{B}^{\mathrm{optim}}$ found by $π_{2}$. In the data plane, implementing $π_{1}$ and $π_{2}$ involves only lightweight vector products and numerical comparisons, both of which are low-complexity matching operations, following the corresponding rules in the "Match Field" (i.e., Rule (1) to Rule (4) in Figure 2) of the flow table. Figure 2 shows the detailed implementation of the flow table at each switch. The flow table contains a prioritized list of rules that instruct the corresponding actions. In particular, task scheduling has the highest priority, followed by value matching (i.e., value-redundancy filtering), while path matching (i.e., load-balancing routing) has the lowest priority. Each input flow is matched against the prioritized rules. When multiple rules match an input flow, the rule with the highest priority is selected to execute the corresponding action set. If no rule matches an input flow, the switch requests the controller to update its flow table, and the default action set automatically forwards the flow to CIM in the controller for developing new energy-efficient policies.
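The prioritized matching described above can be sketched as follows. The flow fields (`type`, `theta`, `dst`), action names, and threshold value are hypothetical placeholders chosen for illustration; only the priority order (task scheduling, then value matching, then path matching) and the fall-back to the controller come from the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Flow = Dict[str, object]


@dataclass
class Rule:
    priority: int                  # higher number = checked first
    name: str
    match: Callable[[Flow], bool]
    action: str


def lookup(table: List[Rule], flow: Flow) -> str:
    """Return the action of the highest-priority matching rule, or the default."""
    for rule in sorted(table, key=lambda r: r.priority, reverse=True):
        if rule.match(flow):
            return rule.action
    return "send_to_controller"    # default action set: forward the flow to CIM


PSI_NORM = 4.0                     # hypothetical value-redundancy threshold
table = [
    Rule(1, "path matching", lambda f: f.get("dst") == "bs", "forward_next_hop"),
    Rule(3, "task scheduling", lambda f: f.get("type") == "task_config", "reconfigure_task"),
    Rule(2, "value matching", lambda f: float(f.get("theta", PSI_NORM)) < PSI_NORM, "enlarge_cw"),
]

print(lookup(table, {"type": "task_config", "theta": 1.0}))  # reconfigure_task
print(lookup(table, {"theta": 1.0, "dst": "bs"}))            # enlarge_cw
print(lookup(table, {"theta": 9.0, "dst": "bs"}))            # forward_next_hop
print(lookup(table, {"dst": "unknown"}))                      # send_to_controller
```

Sorting on every lookup keeps the sketch short; a switch would store the rules already ordered by priority.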

Figure 2: Implementation process of flow table.

When the current flow matching process ends and the next flow arrives, the newly arrived flow is considered redundant if it has the same contents as the previous one. In this case, the flow table does not need to be updated for value-redundancy filtering or for the forwarding path. Therefore, cross-plane communications and task reconfiguration can be greatly reduced, which improves the energy efficiency of application-oriented SDWSN. When the real-time status (e.g., QoS and throughput) of SDWSN changes notably, the value-redundancy filtering parameters and routing paths can be dynamically adjusted through ψ and Π, thus improving the environmental adaptability of SDWSN.

In the proposed SDWSN-RL prototype, the control traffic from the controller to the data plane (i.e., downstream traffic) consists of Packet-Out, Modify-State (configuration), and Read-State (request or query) messages; the control traffic from the data plane to the controller (i.e., upstream traffic) consists of Packet-In and Read-State (reply or report) messages. The control traffic flow is as follows. When a source host generates a query message, the controller responds with a reply message if the source and destination hosts are on the same island; otherwise, the controller drops the query message. When the network status changes, a Packet-In event is triggered by a request message in the data plane. Each switch sends a reply message containing its status to the controller via a secure channel supported by SOF. Modify-State configuration messages are likewise exchanged between the controller and switches via the secure channel. A Packet-Out message is generated by the PD module and sent to a switch to validate an entry in its flow table; if no response is returned within a specified time, the potentially invalid entry is deleted. The number of Packet-In/Out messages for handling requests grows with the number of switches in the network.
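The validation of flow entries via Packet-Out amounts to a timeout check. The following Python sketch assumes the controller records when each validation probe was sent and which entries have replied; the data layout and the function name `expire_unconfirmed` are hypothetical, and no real OpenFlow or SOF API is used.

```python
from typing import Dict, Hashable, List, Set


def expire_unconfirmed(table: Dict[Hashable, str],
                       probe_sent_at: Dict[Hashable, float],
                       replied: Set[Hashable],
                       now: float,
                       timeout: float) -> List[Hashable]:
    """Delete flow entries whose validation probe got no reply within `timeout`.

    table: flow entries keyed by a hypothetical entry id.
    probe_sent_at: time at which each validation probe was sent, per entry id.
    replied: entry ids for which a reply has arrived.
    Returns the ids of the deleted entries.
    """
    deleted = []
    for entry_id, sent in list(probe_sent_at.items()):
        if entry_id in replied:
            del probe_sent_at[entry_id]    # entry validated, stop tracking it
        elif now - sent > timeout:
            table.pop(entry_id, None)      # potentially invalid entry removed
            del probe_sent_at[entry_id]
            deleted.append(entry_id)
    return deleted                         # entries still within timeout stay pending


table = {1: "rule-1", 2: "rule-2", 3: "rule-3"}
probes = {1: 0.0, 2: 0.0, 3: 4.0}
print(expire_unconfirmed(table, probes, replied={1}, now=5.0, timeout=2.0))  # [2]
print(table)  # {1: 'rule-1', 3: 'rule-3'}
```

Entry 3 was probed only 1 time unit ago, so it remains pending rather than being deleted.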

5. Experiment Results

We perform experiments to evaluate the performance of the proposed RL-based SDWSN prototype (SDWSN-RL) for environmental monitoring applications. The network simulator NS2 [21] is used to build the experiment environment, and the parameter values used in the experiment setup are given in Table 1. We adopt the event radius (ER) model [22] to simulate the impulsive traffic triggered by temporally and spatially correlated monitoring events in a disk area. Following the ER model, the monitoring area of the SDWSN is divided into an event gathering region, a data relaying region, and a decision making region; the first two regions belong to the data plane and the third belongs to the control plane. The monitoring center, that is, the BS, is placed at the top right corner of the monitoring area at coordinates (128 m, 162 m). The event center is located at (48 m, 82 m) inside the event gathering region. Event arrivals follow a Poisson process in the time domain. All experiment results in Figures 3–8 include the energy consumption of both data-gathering and control traffic: we consider the energy consumption $E_{data-gathering}$ for data-gathering (including data fusion and processing) and the energy consumption $E_{control-traffic}$ for control overhead (including the control traffic for configuration, request-query, and reply-report). The energy consumption evaluated in Figures 3, 4, 6, and 7 and the network lifetime evaluated in Figure 5 both account for $E_{data-gathering}$ and $E_{control-traffic}$.
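The event-driven traffic pattern can be approximated as follows. This sketch is not the NS2 setup used in the paper: it assumes that every node within the event radius of the fixed event center produces one report per event, uses the 80 m event area radius from Table 1 as that radius, and treats the arrival rate as events per second. Exponential inter-event times are the standard way to generate a Poisson arrival process. The function name, node layout, and seeds are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def event_reports(nodes: Sequence[Tuple[float, float]], center: Tuple[float, float],
                  event_radius: float, rate: float, duration: float,
                  seed: int = 0) -> List[Tuple[float, int]]:
    """Return (time, node index) report pairs for Poisson events at `center`.

    Inter-event times are exponential with mean 1/rate, which yields a Poisson
    arrival process; every node within `event_radius` of the center reports
    each event.
    """
    rng = random.Random(seed)
    cx, cy = center
    sensing = [i for i, (x, y) in enumerate(nodes)
               if math.hypot(x - cx, y - cy) <= event_radius]
    reports: List[Tuple[float, int]] = []
    t = rng.expovariate(rate)
    while t < duration:
        reports.extend((t, i) for i in sensing)
        t += rng.expovariate(rate)
    return reports


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical random deployment of 90 nodes in a 200 m x 200 m square.
    nodes = [(rng.uniform(0, 200), rng.uniform(0, 200)) for _ in range(90)]
    reports = event_reports(nodes, center=(48.0, 82.0), event_radius=80.0,
                            rate=10.0, duration=5.0)
    print(len(reports), "reports")
```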

Table 1

Experiment parameters.

Monitoring network deployment		RL algorithm parameters		Communication parameters
Application	ER model	Learning factor λ	0.5	MAC	IEEE 802.11e EDCF
Monitoring area radius	150 m	Discount factor γ	0.4	MAC frame	272 bits
Event area radius	80 m	QoS factor $ϖ_{QoS}$	$(0, 1]$	Frame interval	50 μs
Min. detecting radius	10 m	Anti-interference threshold $k_{th}$	$[1, \infty)$	Payload	1500 bits
Max. detecting radius	50 m	Tuning step ε	0.002	Carrier sense	30
Base station coordinates	(128 m, 162 m)	Tuning step ζ	0.018	PHY layer rate	46~512 kbit/s
Poisson arrival rate	5~15 packets/s	Round $B$	1~1800	Channel coding	16 bit
Number of nodes	90~580	CW initial size	10	PHY frame	128 bits
Deployment type	Random	Maximum back-off stage	7	Energy model	First-order
Antenna	Omnidirectional	FT upper limit	15	Transmit power	34.2 mW
Path loss exponent	2.4	$m$	$[10, 80]$	Receive power	22.1 mW
Network architecture	Homogeneous	Iteration threshold	2500	Sensitivity	20 dBm

Figure 3

Energy consumption of WSN with and without software defined architecture.

Figure 4

Comparison of load-balancing performance.

Figure 5

Network lifetime of SDWSN.

Figure 6

Comparison of node-level remaining energy.

Figure 7

Comparison of energy consumption per bit.

Figure 8

Comparison of control traffic cost.

Figure 3 compares the energy consumption rate of SDWSN-RL with that of a WSN without SDN (called NonSD-WSN-RL). NonSD-WSN-RL differs from SDWSN-RL mainly in that it has no SDN architecture or SOF support. In NonSD-WSN-RL, each switch uses hybrid energy-efficient distributed clustering routing (HEED) with a back-pressure mode [23]: it periodically computes a utility based on current queue gradients and decides the next hop for each flow accordingly. The energy consumption rate is defined as the ratio of the energy consumption of SDWSN-RL (or NonSD-WSN-RL) to that of single-hop communication (without clustering or aggregation). Figure 3 shows that the average energy consumption rate of SDWSN-RL is much lower than that of NonSD-WSN-RL for each considered node density, and that the energy consumption rate of NonSD-WSN-RL grows faster with the number of data-gathering rounds. Furthermore, when the node density increases from $ρ_{1} = 0.7$ to $ρ_{2} = 1.2 ρ_{1}$, the energy consumption rate of NonSD-WSN-RL increases much more than that of SDWSN-RL. This is because the SDN-based controller can obtain real-time statistics on granular control and link status (which are not available in non-SDN networks) for use in energy-efficient routing. Routing decisions made by CIM in SDWSN-RL are translated into rules and lightweight actions in flow tables to realize a floodless service discovery mechanism, which limits the number of control messages. Without SDN support, routing tables are created (or reconstructed) collaboratively from local exchanges of neighborhood information, which may require many iterations to converge.
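The back-pressure decision used by NonSD-WSN-RL can be illustrated with the classic queue-differential rule. This sketch assumes a single traffic class destined for the BS, weights each neighbor by (own backlog minus neighbor backlog) times link rate, holds the packet when no neighbor has a positive weight, and omits the HEED clustering layer entirely; all names are hypothetical.

```python
from typing import Dict, Optional


def backpressure_next_hop(own_queue: float, neighbor_queues: Dict[str, float],
                          link_rates: Dict[str, float]) -> Optional[str]:
    """Pick the neighbor with the largest positive (queue differential x link rate).

    Returns None when no neighbor has a smaller backlog, i.e. the packet is held.
    """
    best, best_weight = None, 0.0
    for nbr, q in neighbor_queues.items():
        weight = (own_queue - q) * link_rates.get(nbr, 0.0)
        if weight > best_weight:
            best, best_weight = nbr, weight
    return best


if __name__ == "__main__":
    queues = {"a": 10, "b": 4, "c": 15}
    rates = {"a": 1.0, "b": 0.5, "c": 1.0}
    print(backpressure_next_hop(12, queues, rates))
```

Because every node must learn its neighbors' backlogs before each decision, this rule depends on the frequent local exchanges that the paragraph identifies as a source of overhead.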

Figure 4 shows the normalized link BWU of 10 randomly selected links. For comparison with SDWSN-RL, we include three classic data-gathering schemes, SDN-IC, SDV+GPRS, and SWAN, which are content-centric, position-aided, and agent-based, respectively. We calculate the normalized variance of BWU, $D_{B}$, over the 10 randomly selected links during data-gathering round $B$ as follows:

$$\begin{aligned} D_{B} &= {\lVert \operatorname{var}(Y) \rVert}_{p} \\ \text{s.t.}\quad \operatorname{var}(Y) &= E\{{[Y - E(Y)]}^{2}\}; \\ Y &= {\{e_{B}^{k_{i}}\}}_{i}; \\ & \forall i \in V,\ k_{i} \in L,\ p = 2, \end{aligned} \tag{7}$$

where the $10 \times 1$ vector $Y$ contains the BWU of the 10 randomly selected links. Our calculations using the results in Figure 4 show that SDWSN-RL reduces the normalized variance of BWU by 8.9%, 12.8%, and 6.7% compared to SDN-IC, SDV+GPRS, and SWAN, respectively, thus achieving better load balancing in SDWSN. The gap between the lowest and highest BWU across the 10 links is 0.1357 for SDWSN-RL, the smallest among the five schemes. These results show that, by optimizing the weight of each link in the routing tree, SDWSN-RL achieves a more balanced flow distribution than the other four schemes.
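Equation (7) and the comparison figures quoted above reduce to a few lines of arithmetic. This sketch interprets var(Y) as the population variance of the 10 BWU entries (a scalar, so its 2-norm is its absolute value) and measures the percentage reduction relative to the competing scheme; both interpretations are assumptions, and any sample values fed to it are made up rather than read from Figure 4.

```python
from statistics import pvariance
from typing import Sequence


def bwu_variance(bwu: Sequence[float]) -> float:
    """D_B of Eq. (7) with p = 2, reading var(Y) as the scalar population
    variance of the link BWU values, so its 2-norm equals its absolute value."""
    return abs(pvariance(bwu))


def bwu_gap(bwu: Sequence[float]) -> float:
    """Spread between the most and least utilized links."""
    return max(bwu) - min(bwu)


def relative_reduction(d_other: float, d_ours: float) -> float:
    """Fractional reduction of D_B relative to a competing scheme."""
    return (d_other - d_ours) / d_other


if __name__ == "__main__":
    ours = [0.42, 0.47, 0.51, 0.45, 0.49, 0.44, 0.50, 0.46, 0.48, 0.43]    # hypothetical
    other = [0.30, 0.55, 0.62, 0.41, 0.58, 0.35, 0.60, 0.44, 0.52, 0.39]   # hypothetical
    print(bwu_variance(ours), bwu_gap(ours))
    print(relative_reduction(bwu_variance(other), bwu_variance(ours)))
```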

Figure 4 also includes NonSD-WSN-RL in the load-balancing comparison. The load-balancing performance of NonSD-WSN-RL is much worse than that of SDWSN-RL and the other SD-based schemes, because the load-balancing routing mechanism in SDWSN-RL uses global network information to construct optimal routing paths centrally. More specifically, SOF in SDWSN-RL provides a lightweight control protocol between the central controller and the switches in the data plane. The controller uses the information in flow tables to calculate load-balancing routes among all switches and sends the flow tables back to the switches to indicate the next hop towards each destination. SOF exposes simple APIs at the switches through which the controller programs them, providing a flexible lookup mode for deploying routing protocols. The SDN controller can thus obtain information about granular control, network topology, and link statistics for centralized load-balancing routing, whereas such information is unavailable or difficult to obtain in traditional WSNs without SDN. NonSD-WSN-RL relies on distributed neighbor discovery, which is inefficient for load balancing. Moreover, frequent next-hop neighbor discovery and data packet forwarding based on distributed communication would cause control traffic to rise sharply as node density increases.
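The centralized step, in which the controller uses global link statistics to compute routes and pushes next hops to the switches, can be sketched with a load-weighted shortest-path tree rooted at the sink. This is a stand-in technique (Dijkstra with a hypothetical link cost of 1 + utilization), not the RL-based mechanism $π_{2}$ of the paper; the cost function and names are assumptions.

```python
import heapq
from typing import Dict, List, Tuple


def load_aware_next_hops(links: Dict[Tuple[str, str], float], sink: str) -> Dict[str, str]:
    """Controller-side sketch: Dijkstra from the sink over undirected links whose
    cost is 1 + load (load = normalized bandwidth utilization), returning each
    switch's next hop toward the sink."""
    adj: Dict[str, List[Tuple[str, float]]] = {}
    for (u, v), load in links.items():
        cost = 1.0 + load  # heavily loaded links look "longer" and are avoided
        adj.setdefault(u, []).append((v, cost))
        adj.setdefault(v, []).append((u, cost))
    dist = {sink: 0.0}
    next_hop: Dict[str, str] = {}
    heap = [(0.0, sink)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, c in adj.get(u, []):
            nd = d + c
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                next_hop[v] = u  # v forwards toward the sink via u
                heapq.heappush(heap, (nd, v))
    return next_hop


if __name__ == "__main__":
    links = {("BS", "a"): 0.9, ("BS", "b"): 0.1, ("a", "c"): 0.1, ("b", "c"): 0.1}
    print(load_aware_next_hops(links, "BS"))  # flow-table next hops per switch
```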

Figure 5 plots the average survival rate of nodes versus the number of data-gathering rounds for the four considered schemes. The survival rate of nodes in a network can be used to evaluate the total energy consumption of a data-gathering mechanism [24]. The lifetime of the SDWSN is defined as the duration of normal network operation (e.g., data-gathering) during which the node survival rate stays above a threshold ${SR}_{th} \in [0.4, 0.9]$. In Figure 5, we set ${SR}_{th} = 0.45$ and denote the network lifetimes achieved by SWAN, SDN-IC, SDV+GPRS, and SDWSN-RL as R1, R2, R3, and R4, respectively. We can see that R1 < R2 < R3 < R4, with SDWSN-RL achieving the highest average node survival rate among the four schemes. This is mainly because the value-redundancy filtering and load-balancing routing of SDWSN-RL effectively inhibit the transmission of low-value flows and balance the load across nodes, giving the SDWSN a longer lifetime than the other three schemes.
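The lifetime definition translates directly into a scan over the per-round survival-rate trace. This sketch assumes the trace starts at round 1 and treats a survival rate exactly equal to the threshold as still acceptable; the function name is hypothetical.

```python
from typing import Sequence


def network_lifetime(survival: Sequence[float], sr_th: float = 0.45) -> int:
    """Rounds of normal operation before the node survival rate first drops below sr_th.

    survival[k] is the fraction of nodes still alive after round k + 1.
    """
    for k, sr in enumerate(survival):
        if sr < sr_th:
            return k          # rounds 1..k kept the survival rate at or above sr_th
    return len(survival)     # threshold never crossed within the trace


if __name__ == "__main__":
    trace = [1.0, 0.98, 0.9, 0.7, 0.5, 0.44, 0.3]  # hypothetical survival-rate trace
    print(network_lifetime(trace))
```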

Figure 6 shows the remaining energy of each node after 80 data-gathering rounds, normalized by its initial energy (fixed at 2 mJ for all nodes), for 36 nodes randomly selected in an annular area centered at (48 m, 82 m) (i.e., the event center) with inner and outer radii of 10 m and 20 m, respectively. For almost all of the selected nodes, SDWSN-RL achieves the highest residual energy among the four considered schemes, which effectively prolongs the lifetime of the SDWSN. Under each scheme, the normalized residual energy varies across nodes: a node retains more (less) energy when it has forwarded fewer (more) data flows. Compared with the three existing schemes, SDWSN-RL distributes energy consumption more evenly across the network nodes. This is mainly due to the proposed load-balancing routing mechanism, which uses global network information to construct optimal routing paths centrally.
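The node selection and normalization behind Figure 6 can be sketched as follows, using the 2 mJ initial energy and the 10 m to 20 m annulus around the event center given above. Node positions and residual energies are hypothetical inputs; a random subset of the selected nodes (e.g., 36 of them) could then be drawn with `random.sample`.

```python
import math
from typing import Dict, List, Tuple

E0_MJ = 2.0  # initial energy per node, as stated in the text


def nodes_in_annulus(positions: Dict[int, Tuple[float, float]],
                     center: Tuple[float, float], r_in: float, r_out: float) -> List[int]:
    """Return ids of nodes whose distance to `center` lies in [r_in, r_out]."""
    cx, cy = center
    return [n for n, (x, y) in positions.items()
            if r_in <= math.hypot(x - cx, y - cy) <= r_out]


def normalized_residual(residual_mj: Dict[int, float], nodes: List[int]) -> Dict[int, float]:
    """Residual energy of each selected node divided by its initial energy."""
    return {n: residual_mj[n] / E0_MJ for n in nodes}


if __name__ == "__main__":
    positions = {1: (48.0, 97.0), 2: (48.0, 82.0), 3: (70.0, 82.0)}  # hypothetical
    selected = nodes_in_annulus(positions, (48.0, 82.0), 10.0, 20.0)
    print(normalized_residual({1: 1.5, 2: 0.4, 3: 1.9}, selected))
```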

Figure 7 plots the average energy consumed by the four considered schemes to forward a single bit of data (mJ/bit) while meeting the same QoS requirement, versus the number of sensor nodes deployed in the SDWSN. The energy consumption per bit of SDWSN-RL increases with the number of sensor nodes much more slowly than that of the three existing schemes, so SDWSN-RL consumes much less energy per bit than the existing schemes when many sensor nodes are deployed. This is because the application-oriented in-network fusion in the data plane of SDWSN-RL inhibits the transmission of value-redundant flows, and the flow value determined by (1) is not notably affected by the number of traffic sources (i.e., nodes in the event gathering area). In contrast, the other three schemes generate excessive traffic loads, owing to the large amount of local information exchanged to execute distributed algorithms in the data plane and the large control overhead of cross-layer interaction, which degrades energy efficiency, especially in SDWSNs with many sensor nodes.
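For a sense of scale, the Table 1 radio parameters give a rough lower bound on per-hop energy per payload bit. This sketch assumes that the 272-bit MAC frame and 128-bit PHY frame entries are per-packet header overheads and counts only transmit plus receive energy at the PHY rate, ignoring idle listening, back-off, retransmissions, and control traffic. It is a simplification, not the paper's energy model, and the function name is hypothetical.

```python
def energy_per_bit_mj(p_tx_mw: float, p_rx_mw: float, rate_kbps: float,
                      payload_bits: int = 1500, mac_bits: int = 272,
                      phy_bits: int = 128, hops: int = 1) -> float:
    """Transmit-plus-receive energy (mJ) per delivered payload bit over `hops` hops."""
    # Every payload bit carries a share of the header bits on the air.
    overhead = (payload_bits + mac_bits + phy_bits) / payload_bits
    # Power (W) divided by bit rate (bit/s) gives energy per on-air bit (J/bit).
    joules_per_bit = (p_tx_mw + p_rx_mw) * 1e-3 / (rate_kbps * 1e3)
    return joules_per_bit * overhead * hops * 1e3  # convert J to mJ


if __name__ == "__main__":
    for rate in (46, 512):  # PHY rate range from Table 1, in kbit/s
        print(rate, "kbit/s:", energy_per_bit_mj(34.2, 22.1, rate), "mJ/bit per hop")
```

Under these assumptions the per-bit cost scales linearly with hop count, which is why routes that avoid unnecessary relaying matter as the network grows.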

Figure 8 compares the control traffic cost of the proposed SDWSN-RL with that of the other three schemes (SWAN, SDV+GPRS, and SDN-IC). The control traffic cost is defined as the ratio of control overhead to network throughput, where the network throughput is the rate of successful bit delivery from the IG module to the monitoring center. Figure 8 plots the normalized control traffic cost versus the time interval between two successive updates of the parameter $r_{update}$, which is defined as

$$\begin{aligned} r_{update} &= {\lVert r_{topology} \rVert}_{p} \cdot {\lVert r_{event} \rVert}_{p} \\ \text{s.t.}\quad r_{topology} &= \Delta\, {density}_{nodes}; \\ r_{event} &= \Delta\, {rate}_{event\ arrive}; \\ & r_{topology} \in [0.05, 0.1],\ r_{event} \in [5, 15],\ p = 2, \end{aligned} \tag{8}$$

where $r_{topology}$ denotes the change in network node density and $r_{event}$ represents the variation in the Poisson event arrival rate. Since $r_{topology}$ and $r_{event}$ are the two major factors influencing the network status, the update interval of $r_{update}$ indicates how frequently the network status changes. When the status of a monitoring WSN changes frequently (e.g., when data traffic is redistributed because of changes in topology or event arrival rate), the control overhead increases due to frequent reconfiguration.
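Equation (8) and the control traffic cost can be computed as follows. This sketch assumes $r_{topology}$ and $r_{event}$ are given as vectors of recorded changes (for scalars, the 2-norm is just the absolute value) and that control overhead and throughput are measured in bits over the same interval; the function names are hypothetical.

```python
import math
from typing import Sequence


def norm2(v: Sequence[float]) -> float:
    """Euclidean (p = 2) norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def r_update(r_topology: Sequence[float], r_event: Sequence[float]) -> float:
    # Eq. (8) with p = 2: product of the 2-norms of the two change vectors.
    return norm2(r_topology) * norm2(r_event)


def control_traffic_cost(control_bits: float, delivered_bits: float) -> float:
    # Ratio of control overhead to network throughput over the same interval.
    return control_bits / delivered_bits


if __name__ == "__main__":
    print(r_update([0.06], [10.0]))            # scalar changes within the stated ranges
    print(control_traffic_cost(2e4, 1e6))      # hypothetical bit counts
```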

Figure 8 shows that the control traffic cost decreases as the update interval of $r_{update}$ increases for all the considered schemes, and that the control traffic cost of SDWSN-RL is much lower than that of the other three schemes for all considered update intervals. This is because value-redundancy filtering in SDWSN-RL inhibits the transmission of value-redundant loads, thereby avoiding control traffic between CIM and IM when changes in network topology and/or event arrival rate do not significantly affect the value of the monitoring data. Routing decisions are made by CIM in the controller and then translated into rules and actions deployed in flow tables. Network APIs are used to configure flow tables for routing, in conjunction with a floodless service discovery mechanism. Once the controller has configured routing, data packets are forwarded and processed in the data plane: subsequent packets of a flow are forwarded according to the configured flow-table entries without further participation from the control plane. Such reductions in control traffic free up radio resources for more data packets to be successfully delivered, which further lowers the control traffic cost. In contrast, the other three schemes (i.e., SWAN, SDN-IC, and SDV+GPRS) adopt broadcast-based service discovery mechanisms, in which distributed nodes collaboratively create the routing table; as node density or event arrival rate increases, the number of control messages grows while network throughput declines.
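The point that only the first packet of a flow involves the control plane can be illustrated by counting Packet-In messages at a single switch. This sketch assumes the controller's routing decision is available as a destination-to-next-hop map and that one Packet-In installs the entry for the whole flow; the function and node names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def forward_packets(packets: Iterable[Tuple[str, str]],
                    routes: Dict[str, str]) -> Tuple[int, List[Tuple[str, str]]]:
    """packets: (flow_id, destination) pairs; routes: controller's destination -> next hop.

    Returns the number of Packet-In messages and the (flow_id, next_hop) forwarding log.
    """
    flow_table: Dict[str, str] = {}
    packet_ins = 0
    log: List[Tuple[str, str]] = []
    for flow_id, dst in packets:
        if flow_id not in flow_table:
            packet_ins += 1                    # table miss: ask the controller once
            flow_table[flow_id] = routes[dst]  # controller installs the entry
        log.append((flow_id, flow_table[flow_id]))  # later packets hit the table
    return packet_ins, log


if __name__ == "__main__":
    pkts = [("f1", "BS")] * 5 + [("f2", "BS")] * 3
    n, log = forward_packets(pkts, {"BS": "n7"})
    print(n, "Packet-In messages for", len(log), "packets")
```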

Since, for a given control overhead, throughput is inversely proportional to the control traffic cost, the results in Figure 8 also indicate that SDWSN-RL achieves the highest throughput among the four schemes considered. This is because it significantly reduces local control message exchanges, freeing up radio resources for more data packets to be successfully delivered. The other three schemes (SWAN, SDN-IC, and SDV+GPRS) use broadcast-based service discovery mechanisms and distributed protocols, in which switches must wait for the wireless medium to become free before sending their packets, and many data packets may be dropped while waiting, which limits throughput. Moreover, broadcast-based service discovery requires exchanging high volumes of control messages and incurs high packet processing overhead.

6. Conclusion

In this paper, we have proposed an SOF-based SDWSN prototype for improving the energy efficiency and adaptability of WSNs in environmental monitoring applications, taking into account the inherent constraints of WSNs in terms of energy, radio resources, and computational capabilities, and the distributed data flows of monitoring applications. Experiment results have shown that the proposed SDWSN prototype can greatly improve energy efficiency by effectively inhibiting the transmission of value-redundant loads, reducing the amount of cross-plane communication, and enhancing the load balance in SDWSN.

In our future work, we will improve the scalability of control-plane mechanisms using decentralized coordination to overcome the bottleneck of a single logical controller and develop an adaptive anti-interference mechanism to improve the robustness of SDWSN for diverse monitoring applications in wireless environments with severe interference.

Footnotes

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

1. Sun, L. U. Beaumont, Osborne, "Intrusion detection techniques in mobile ad hoc and wireless sensor networks," IEEE Transactions on Wireless Communications, vol. 14, no. 5, pp. 56-63, 2007.

2. M. R. Sama, L. M. Contreras, Kaippallimalil, Akiyoshi, Qian, "Software-defined control of the virtualized mobile packet core," IEEE Communications Magazine, vol. 53, no. 2, pp. 107-115, 2015, doi: 10.1109/MCOM.2015.7045398.

3. Cui, Xiao, Liao, Stojmenovic, "Data centers as software defined networks: traffic redundancy elimination with wireless cards at routers," IEEE Journal on Selected Areas in Communications, vol. 31, no. 12, pp. 2658-2672, 2013, doi: 10.1109/jsac.2013.131207.

4. M. F. Bari, S. R. Chowdhury, Ahmed, Boutaba, Mathieu, "A survey of naming and routing in information-centric networks," IEEE Communications Magazine, vol. 50, no. 12, pp. 44-53, 2012, doi: 10.1109/mcom.2012.6384450.

5. C.-Y. Hong, Kandula, Zhang, Gill, Nanduri, Wattenhofer, "Achieving high utilization with software-driven WAN," in Proceedings of the ACM Conference on SIGCOMM, Hong Kong, August 2013, ACM, pp. 15-26, doi: 10.1145/2486001.2486012.

6. Gerla, Ongaro, R. L. Gomes, Cerqueira, "Towards software-defined VANET: architecture and services," in Proceedings of the 13th Annual Mediterranean Ad Hoc Networking Workshop (MED-HOC-NET ′14), Piran, Slovenia, June 2014, IEEE, pp. 103-110, doi: 10.1109/MedHocNet.2014.6849111.

7. Karp, H. T. Kung, "GPSR: Greedy Perimeter Stateless Routing for wireless networks," in Proceedings of the 6th Annual International Conference on Mobile Computing and Networking (MOBICOM ′00), August 2000, pp. 243-254.

8. Zeng, Guo, Miyazaki, Xiang, "Energy minimization in multi-task software-defined sensor networks," IEEE Transactions on Computers, 2015, doi: 10.1109/TC.2015.2389802.

9. Zhao, Wang, Zhang, "Adaptive and secure load-balancing routing protocol for service-oriented wireless sensor networks," IEEE Systems Journal, vol. 8, no. 3, pp. 858-867, 2014, doi: 10.1109/JSYST.2013.2260626.

10. Petrioli, Nati, Casari, Zorzi, Basagni, "ALBA-R: load-balancing geographic routing around connectivity holes in wireless sensor networks," IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 3, pp. 529-539, 2014, doi: 10.1109/tpds.2013.60.

11. Girgin, Polat, Alhajj, "Positive impact of state similarity on reinforcement learning performance," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 37, no. 5, pp. 1256-1270, 2007, doi: 10.1109/tsmcb.2007.899419.

12. Wang, Che, Sun, Liang, "ARMA model identification using particle swarm optimization algorithm," in Proceedings of the International Conference on Computer Science and Information Technology (ICCSIT ′08), Singapore, September 2008, pp. 223-227, doi: 10.1109/iccsit.2008.60.

13. Y.-F. Lee, C.-C. Shen, "A transaction-based approach to over-the-air programming in wireless sensor networks," in Proceedings of the International Symposium on Communications and Information Technologies (ISCIT ′07), Sydney, Australia, October 2007, pp. 1377-1382, doi: 10.1109/iscit.2007.4392231.

14. Luo, H.-P. Tan, T. Q. S. Quek, "Sensor OpenFlow: enabling software-defined wireless sensor networks," IEEE Communications Letters, vol. 16, no. 11, pp. 1896-1899, 2012, doi: 10.1109/lcomm.2012.092812.121712.

15. Zhu, Chlamtac, "Performance analysis for IEEE 802.11e EDCF service differentiation," IEEE Transactions on Wireless Communications, vol. 4, no. 4, pp. 1779-1788, 2005, doi: 10.1109/TWC.2005.847113.

16. C. D. Meyer, Matrix Analysis and Applied Linear Algebra, Society for Industrial & Applied Mathematics, 2000, section 5.2.

17. Wang, Yao, Estrin, "Information-theoretic approaches for sensor selection and placement in sensor networks for target localization and tracking," Journal of Communications and Networks, vol. 7, no. 4, pp. 438-449, 2005, doi: 10.1109/JCN.2005.6387986.

18. S. P. Meyn, "The policy iteration algorithm for average reward Markov decision processes with general state space," IEEE Transactions on Automatic Control, vol. 42, no. 12, pp. 1663-1680, 1997, doi: 10.1109/9.650016.

19. Galindo-Serrano, Giupponi, "Distributed Q-learning for aggregated interference control in cognitive radio networks," IEEE Transactions on Vehicular Technology, vol. 59, no. 4, pp. 1823-1834, 2010, doi: 10.1109/TVT.2010.2043124.

20. R. S. Sudhaakar, Yoon, Zhao, Qiao, "A novel QoS-aware MAC scheme using optimal retransmission for wireless networks," IEEE Transactions on Wireless Communications, vol. 8, no. 5, pp. 2230-2235, 2009, doi: 10.1109/TWC.2009.080294.

21. Zheng, Zhao, Chen, "Design and implementation of switches in network simulator (ns2)," in Proceedings of the 1st International Conference on Innovative Computing, Information and Control (ICICIC ′06), Beijing, China, September 2006, pp. 721-724, doi: 10.1109/icicic.2006.65.

22. E. F. Nakamura, E. L. Souza, "Towards a flexible event-detection model for wireless sensor networks," in Proceedings of the 15th IEEE Symposium on Computers and Communications (ISCC ′10), June 2010, pp. 459-462, doi: 10.1109/iscc.2010.5546517.

23. Ying, Shakkottai, Reddy, Liu, "On combining shortest-path and back-pressure routing over multihop wireless networks," IEEE/ACM Transactions on Networking, vol. 19, no. 3, pp. 841-854, 2011, doi: 10.1109/TNET.2010.2094204.

24. Lee, Guan, "Distributed algorithms for network lifetime maximization in wireless visual sensor networks," IEEE Transactions on Circuits and Systems for Video Technology, vol. 19, no. 5, pp. 704-718, 2009, doi: 10.1109/TCSVT.2009.2017411.