Abstract
We present a Markovian decision process (MDP) framework for multihop wireless sensor networks (MHWSNs) to bound the network performance of both energy constrained (EC) networks and energy harvesting (EH) networks, both with and without relay cooperation. The model provides the fundamental performance limit that a MHWSN can theoretically achieve, under the general constraints from medium access control, routing, and energy harvesting. We observe that the analyses for EC and EH networks fall into two branches of MDP theory, which are finite-horizon process and infinite-horizon process, respectively. The performance metrics for EC and EH networks are different. For EC networks, an appropriate metric is the network lifetime; for EH networks, an appropriate metric is, for example, the network throughput. To efficiently solve the models with high dimension, for the EC networks, we propose a novel computational algorithm by taking advantage of the stochastic shortest path structure of the problem; for the EH networks, we propose a dual linear programming based algorithm by considering the sparsity of the transition matrix. Under the unified MDP framework, numerical results demonstrate the advantages of cooperation for the optimal performance, in both EC and EH networks.
1. Introduction
This paper considers modeling the optimal performance of multihop wireless sensor networks (MHWSNs), which can be energy constrained (EC) networks or energy harvesting (EH) networks, both with and without relay cooperation. The network performance of a MHWSN is a complex function of sensors' harvested energy, traffic volume, routing protocol, and medium access (MAC) technique. An analytical approach that is suitable for WSN traffic and that derives the optimum would serve as a valuable benchmark for heuristics MAC and routing protocols. However, such a multihop framework has not been presented. In this paper, we explore the optimum by presenting a unified Markov Decision Process (MDP) analysis that can analyze both EC and EH networks. We observe that the treatments for both networks fall into two branches of MDP theory, the finite-horizon process and the infinite-horizon process, respectively.
Existing analyses have mainly focused on a single node [1], a single link [2], or a single-hop network [3]. Departing from the literature, this paper addresses the general multihop network with neither assumptions on node distribution nor assumptions on total traffic pattern, motivated by the following reasons. First, since the communication range of a single node is limited, single-hop deployment [3] cannot always meet the requirements of many applications, such as environmental and structural monitoring, security and defense, and wireless body area networks [4]. Second, the traffic pattern of nodes in a MHWSN is unbalanced, invalidating the homogeneous or deterministic traffic assumptions, which are widely assumed for wireless ad hoc or mesh networks. In particular, some nodes near the Sink consume their energy faster because they must use their energy to relay other nodes' data. In EC networks, this unbalance creates the well-known “energy hole” problem [5]. In EH networks, the impact on energy consumption is still nontrivial. Moreover, the extent to which the traffic is unbalanced is not known a priori and is usually determined by a routing protocol. Therefore, towards quantifying the network performance, careful consideration should be taken to capture the peculiarity of the traffic pattern. Third, because the network's energy consumption rate is closely related to nodes' interaction in various aspects including channel access, routing, and cooperation, an optimal solution obtained by focusing only on a point-to-point link does not imply the global optimum for the network-wide objective. In fact, quantifying the latter problem is fairly challenging.
On another track, cooperative transmission (CT) is a mixture of communication protocol and physical (PHY) layer combining scheme that allows spatially separated nodes to collaborate to transmit a single message by forming a virtual multiple-input-single-output (VMISO) array. A receiver, gaining a signal-to-noise ratio (SNR) advantage of 10 dB to 20 dB through PHY combining [6], can decode the message at an extended range [7]. CT range extension has been experimentally demonstrated in [8], and it shows significant impact on Layer Two and Layer Three of a WSN by eliminating the “energy hole” in EC network [9, 10] and by providing better Quality of Service in EH networks [11]. Time-division CT (TDCT) is one type of cooperation, in which the source node multicasts the packet, and the cooperators that decode the packet forward it in orthogonal time slot [12]. Our previous work OSC-MAC [13] shows notable lifetime improvement over state-of-the-art MAC protocols by using TDCT to balance energy. Yet, it is noted that the performance gap between the CT MAC protocol [9, 10] with fixed routes and the theoretically optimal value is unknown. Also note that the analysis in [11], among others, assumes perfect link scheduling, which contrasts with the practical situation wherein link activities are subject to MAC constraints and packet collisions occur. All the above discussions have further motivated our analysis to comprehensively consider energy harvesting, routing, MAC, and cooperation, from the network perspective. Our previous work [14] models the lifetime of battery based multihop sensor networks, which, however, does not apply to energy unconstrained networks (e.g., through energy harvesting). This paper extends the model in [14] to provide an analysis framework that apply to both energy constrained (EC) networks and energy harvesting (EH) networks.
The rest of the paper is organized as follows. Section 2 presents the related work. Section 3 describes the system model and assumptions. An MDP formulation is detailed in Section 4. The computational method and numerical evaluations for non-CT and CT networks are presented in Sections 5 and 6, respectively. Finally, Section 7 concludes the paper.
2. Related Work
In this section, we describe existing works related to modeling for wireless sensor networks in terms of the scope of modeling, the assumptions, and the technical tools. Then we highlight the challenges solved and contributions made in this work in comparison.
Existing analyses are mainly limited to a single node, a single link, or a single-hop network [15]. For example, [1] focuses on energy management of a single energy harvesting (EH) node for achieving maximum throughput and minimizing delay, using an MDP [16] model. While it is assumed in [1] that the characteristic of traffic and energy is stationary (same as in this work), it also assumes infinite packet queue and energy queue, which is not practical. In [17], a method based on stochastic timed automata and statistical model checking is proposed to model a WSN protocol called timing-sync protocol, which however focuses only on node behavior. In [18], an MDP model is used to model the sensor processor (CPU) energy and Petri nets are used to model an energy constrained (EC) sensor node. In [2], an MDP model is presented for EH tags in a point-to-point link, assuming a time-slotted system and a stationary energy harvesting model. Transmission of EH nodes in a single link and in the context of a fading channel is tackled in [19]. In [3], an MDP model is used to formulate the lifetime of a single-hop EC network, which considers sensor scheduling in a scenario where only a fraction of sensors collect information and communicate directly with a one-hop Sink. In [20], considering nodes that are equipped with a hybrid energy storage system, the authors provide an MDP model for a single-hop EH network. The major limitations of these works are their inability to analyze the performance of a network that extends beyond one-hop deployment and the lack of analysis for cooperative transmission.
For multihop networks, the past analyses have exploited various approaches. The optimal lifetime of a multihop WSN is derived in [21] using linear programming (LP) formulation based on traffic balance conditions. However, the analyses in [21] are limited to only the routing layer. In [22], a model based on mixed integer linear programming is proposed to identify energy-efficient route from the source sensor to the Sink; yet, it considers only energy minimization and does not produce network-wide objective optimization. In [23], the authors provide a Lyapunov analysis for multihop EH WSNs, which considers the dynamics of energy and queue, to maximize the utility. However, [23] is based on queue stability, which is not suitable for low-traffic WSNs. Reference [24] studies the upper bound on the lifetime of data gathering sensor networks, while [25] derives the upper bound on the lifetime of large-scale networks using the theory of coverage process. Both [24, 25] are based on the assumption that the data sources are deployed with a particular probability density function or process; thus they are unable to capture the inhomogeneous traffic characteristic in a multihop network. We note that all the aforementioned works fail to consider MAC constraints, as they assume perfect link scheduling and thus ignore packet losses and retransmissions, which contrasts the practical situation wherein link activities are subject to MAC constraints and packet collisions occur. Moreover, none of these works, except for [21], considers cooperative transmission.
For cooperative networks, the existing analyses are typically limited to single-hop networks. In [26], the authors consider the lifetime maximization in an amplify-and-forward cooperative network and model the energy dissipation of nodes as a Markov chain. In [27], the authors develop a general probability model to study the outage performance of the best-relay selection with adaptive decode-and-forward cooperative network. Relatively less works have focused on analysis of multihop cooperative networks. The LP model in [28] has been recently extended to multihop CT networks [29] by considering all single-input-single-output (SISO) and virtual multiple-input-single-output (VMISO) links. Again, none of these works has considered MAC layer link constraints, and therefore the bounds that they provide are optimistic. Moreover, because the interference range has significant effect on the performance of the MAC, and the interference range of CT could differ from that of a SISO transmission, a complete evaluation of multihop cooperative networks is hard to obtain without the inclusion of MAC constraints.
The challenges in modeling a multihop WSN reside mainly in the scope of modeling and the computational complexity. In this paper, we report a unified modeling framework with reasonable complexity that provides the fundamental performance limit that a MHWSN can theoretically achieve. Our model applies to both energy constrained networks and energy harvesting networks, and it captures the general constraints from MAC, routing, and energy dynamics. We further present two computational methods taking the advantage of the characteristics of EC networks and EH networks, respectively.
3. System Model and Assumptions
Each sensor node has two queues, the packet queue where data packets are stored and the energy queue where energy is stored. Both the packet queue and the energy queue have limited capacity. We assume that no dynamic transmit power control technique is used; that is, all the nodes use the same transmit power level, which means CT is used exclusively for range extension purpose instead of power reduction.
3.1. Time Slots in the System
Unlike [30] where the time slot is varied and is defined as the interval between two random events on a node, that is, between instances of energy arrivals, we consider a mini-time-slotted system as shown in Figure 2, where the slot duration is constant and slots are normalized to integral units
In addition, we remark that the time slots in this paper are different and more general than those in the link scheduling literature. In the latter case, the time is divided into slots equal to the length of packet transmission time, and all nodes are synchronized to slot boundaries. Our model does not have such a constraint, so the packet length can span multiple slots and nodes are not necessarily synchronized to slot boundaries.
3.2. Topology and Link Models
3.2.1. Topology Model
The multihop network topology is modeled as a directed graph

An illustration of interference model and VMISO link. Node v is the receiver of Node u. Node u's hidden nodes in the gray area will interfere with v's reception.

A timeline illustration of the decision process model for the network. The decision epochs are at the beginning of each minislot.
3.2.2. Interference Model
A node has three ranges associated with it: the transmission range (
3.2.3. Medium Access Control Model
We assume that carrier-sensing-medium-access (CSMA) is performed before a node attempts to transmit. Further, similar to [35], it is assumed that (i) a node cannot transmit and receive at the same time; (ii) a node can transmit if none of its neighbors (in
3.3. Traffic and Energy Models
3.3.1. Traffic Model
The transmission duration of a link
The number of exogenous (self-generated) packets at Node i of commodity d (destined to Sink d),
3.3.2. Energy Harvesting Model
A realistic power consumption model for a sensor node has been studied in [37], which decouples the power usage into baseband DSP circuit, the RF front-end circuit for transmitting and receiving, the power amplifier (PA), and low noise amplifier (LNA). A similar circuit-level analysis is also performed in [38] for energy modeling in cooperative multiple-input-multiple-output (MIMO) sensor networks. In [39], six different statistical models have been used to fit empirical datasets of a solar-powered energy harvesting sensor node. Our recent work [40] provides an energy model for energy harvesting node considering harvesting and leakage in a supercapacitor.
While we are aware of more complex energy models, for simplicity, in this paper we assume a simpler model. However, we note that it should be straightforward to incorporate complex energy consumption and harvesting models into our framework. In this study, the energy harvested at Node i during any slot t is modeled as a binary RV, which is denoted by
4. Markov Decision Process Formulation
In this section, we describe the Markov Decision Process formulation for the EC and EH networks. We model the network state space by a tuple consisting of the transmission set in the network considering the MAC constraints, the queuing level, and the energy level for each node. Next we specify the action space at each time slot, which affects the network state. Then we analyze the system transition dynamics by decoupling the system into the tuple components with condition and then combining them based on the chain rule.
4.1. Energy Constrained (EC) Networks
4.1.1. Network State Space
The network state is defined as
The transmission set
To find
4.1.2. Decision Epochs and Action Space
A decision epoch corresponds to the beginning of a time slot. The set of decision epochs are denoted by
4.1.3. State Transition Dynamics
We first characterize the state transition of each component.
(
1) Transmission Set Dynamics. Let action
Define
(
2) Queue Length Dynamics. Considering that EC networks are most suitable for light traffic applications, such as monitoring, we assume that at most one node self-generates a packet during a slot. Let
Then the queue evolution can be described by the following cases.
Case 1 ((
):
).
This can happen for two reasons: (
Case 2 ((
):
,
).
This case includes two possibilities: (
Case 3 ((
):
,
).
This happens because some link
Case 4 ((
):
,
,
).
This case includes two possibilities: (
Case 5 ((
):
,
,
).
This occurs because the SISO link
Case 6 ((
):
,
,
).
This occurs because the SISO link
(
3) Energy Evolution Dynamics. The process
(
4) System Dynamics. The transition matrix of the system states
Theorem 1.
The transition kernel of the EC system,
Proof.
According to the chain rule,
4.1.4. Expected Total Rewards
During a time slot, the system obtains a reward
4.1.5. MDP Formulation
A transmission policy is a series of decision rules
The optimal lifetime
4.2. Energy Harvesting Networks
The link set dynamics in the EH network are the same as in the EC network. In this section, we give a more general expression for the queue system, relaxing the low-traffic assumption in the previous sections. EH networks may provide better support for high traffic load than EC networks [43].
In a uniform form, the evolutions of
4.2.1. Queue Length Dynamics
In (20),
Let
Then considering exogenous packet arrivals, the queue status is updated as follows:
4.2.2. Energy Evolutions
Using analysis similar to that for the queue in Section 4.2.1, denote
Then, considering the influence of energy harvesting, we get
4.2.3. System Dynamics
Similar to the EC network, the transition matrix of the system states
Theorem 2.
The transition kernel of the EH system,
4.3. Performance Criterion and Reward Function
Let
The definition of the reward function,
Since the state space and the action space are finite, with a finite reward function, (26) can be expressed as
4.4. Optimality Equations
The optimality equation for the expected total discounted reward criteria given an initial state, a.k.a. Bellman's equation, is given as follows:
5. Computational Methods
In this section, we present the offline computational algorithms for the EC and EH networks. The methods seek to find the global optimum and do not require any information exchange among the nodes.
5.1. Energy Constrained Networks
Considering that the total energies are nonincreasing, we group the states according to the sum energy (in decreasing order) and solve the stages backwards. The transmission sets are computed using the Bron-Kerbosch algorithm [42], which is widely used and considered as one of the fastest. In each stage, the computation of the energy and queue state spaces is similar to the Bin-Ball problem (the Bin-Ball problem refers to enumerating the ways of allocating
For an integer m (stage), the set of states
For each
Find transmission link set using Bron-Kerbosch alg.; Solve Bin-ball problem to obtain Solve MILP for Subproblem-1 to obtain
5.2. Energy Harvesting Networks
The following theorem provides the basis for a linear programming (LP) approach to solve the MDP problem. The proof can be found in [16].
Theorem 3.
Suppose there exists
From the observation in Theorem 3, the primal LP is constructed as follows.
Primal LP. Consider the following:
However, it is more informative to solve this model using its dual LP [16], which has less rows in the constraints matrix.
Dual LP. Consider the following:
Note that the primal LP has
5.3. Computation Complexity of Large-Scale MDP
It has been shown that MDP is solvable in polynomial time by successive approximation methods such as value iteration, without the existence of a known strongly polynomial time algorithm [45]. Notoriously, when dealing with large-scale MDP, the computation of iterations may appear prohibitive. In contrast, computation based on large-scale linear programming is fast solvable with the state-of-the-art LP solver, in polynomial time depending only on the size of A (the constraint matrix in
In this paper, we do not target finding the optimal policy. However, we note that standard policy iteration algorithms and their various modifications can be applied offline to obtain the optimal stationary policies, at the cost of iterations over the whole state space. In addition, policies obtained from this network model would require a network centralizer to conduct the execution, which is less appealing than a distributed approximate optimal policy and which is out of the scope of this paper.
6. Numerical Evaluations
We present some numerical results on the optimal lifetime for the EC network and the expected total discounted reward for the EH network, under the 2-hop funnel topology (Nodes A, B, C, and D and the Sink), as in Figure 1. Besides the computational limitation inherent to MDP, the motivation for choosing a small network is as follows. It is observed from our previous work [10] that under duty-cycling, a large network is typically reduced into isolated small networks in the time domain, to reduce interference and collisions. This observation renders the possibility to analyze a small tree topology towards the whole network.
6.1. Energy Constrained Networks
In this section, we evaluate the optimal lifetime of the EH network and impacts from certain parameters. Note that the funnel topology (with the existence of the bottleneck Node C) captures the essence of the energy hole problem.
Figure 3 depicts the optimal lifetime for non-CT and CT networks. We observe the lifetimes are linearly increasing with the battery capacity. We also observe that the performance of CT network is significantly higher than that of the non-CT network. For example, with battery capacity of

Optimal lifetime for non-CT network and CT network, as battery capacity varies.
Figure 4 shows the computed lifetime for different packet arrival rates (PARs). The PAR is normalized with the packet length, that is, the number of new exogenous packets per node during a packet duration. The PARs where the curves start to become flat represent the PAR thresholds, beyond which the numerical results are accurate. Figure 4 demonstrates that our model is accurate for a very large range of arrival rates (<0.4). For example, if a node generates a packet every 100 seconds and the packet transmission time is 100 ms, then the PAR is 0.001, well below

Lifetime for non-CT network and CT network, as the normalized packet arrival rate (PAR) varies.
6.2. Energy Harvesting Networks
In this section, we evaluate the expected total discounted reward and the impacts from certain parameters. The reward function in (27) is used for plotting Figures 5, 6, and 7, while (28) is used for Figure 8.

The effect of discount factor λ.

The effect of energy harvesting rate γ.

The effect of packet arrival rate α.

The effect of weight factor
Figure 5 shows the expected total discounted reward of non-CT and CT networks, versus different discount factors λ. The decision-making horizon is indicated by
Figure 6 depicts the expected total discounted reward of non-CT and CT networks versus energy harvesting rate γ, with
Figure 7 shows the expected total discounted reward of non-CT and CT networks versus exogenous packet arrival rate α, with
Figure 8 gives an example that the calculated benchmark of performance is highly affected by the choice of reward function. The network designer can choose a reward function according to a particular QoS requirement. The x-axis is the preference weighting factor
In Figure 9, we evaluate the effect of the CT overhead on the expected total discounted reward of the CT network, with different preference weighting factors

The effect of CT overhead rate. Weight factors
7. Conclusions
Energy harvesting (EH) and energy constrained (EC) multihop wireless sensor networks are modeled under a unified Markovian decision process framework. The model extends the literature by encompassing, from the network interaction point of view, energy harvesting, routing, MAC, and cooperative transmission. The performance of EC and EH networks is characterized by finite-horizon and infinite-horizon problems, respectively. The EC model captures the traffic imbalance due to the “energy hole,” which is a known bottleneck problem limiting network lifetime. In the EH network, the expected total discounted reward model allows one to formulate different performance metrics of interest. The model of EC networks is solved by mixed integer linear programing, capitalizing on the stochastic shortest path nature of the problem, while the model of EH networks is solved by a dual linear programming approach. Numerical results evaluate the sensitivity of several network parameters on the optimal performance of non-CT and CT networks and show the nontrivial impacts of cooperative transmission.
Footnotes
Disclosure
Mary Ann Weitnauer was formerly Mary Ann Ingram.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
The authors gratefully acknowledge support for this research from the US National Science Foundation (NSF) under Grant CNS-1017984.
