
Abstract

This article investigates the data persistence problem in the planetary surface network of interplanetary Internet using the distributed raptor codes. In order to improve the lifetime of space information and space nodes’ energy efficiency in planetary surface network, we propose an efficient data persistence strategy based on raptor codes and probabilistic broadcasting. Unlike most existing data persistence strategies where the random walks are used to disseminate source packets, the probabilistic broadcasting mechanism is employed in the proposed strategy to reduce the data dissemination cost by exploiting the broadcast property of wireless networks. The decoding performance and data dissemination cost are analyzed. Simulation results validate that the proposed strategy consumes the least data dissemination cost while achieving a better decoding performance compared with other representative strategies.

Keywords

Planetary surface network, data persistence, probabilistic broadcasting, fountain codes, raptor codes

Introduction

The interplanetary Internet (IPN) was proposed early in this century to meet the requirements of different deep space missions. It is composed of a backbone network, external networks, and planetary networks (PNs). As shown in Mukherjee and Ramamurthy¹ and Akyildiz et al.,² the PN includes a planetary orbiter network (PON) and a planetary surface network (PSN), where the PSN is composed of a large number of space nodes such as rovers, astronauts, and landers. In order to accomplish collaborative tasks, space nodes in the PSN autonomously communicate with each other.³ Unlike traditional wireless networks, the network status of the PSN changes dynamically due to the wide range and uneven distribution of space nodes.⁴ As proposed in our prior research,⁵ we divided the PSN into a series of hierarchical autonomous system (AS) networks based on the property of space nodes. In this way, the complex PSN is decoupled into small-scale quasi-static networks, which makes the control easier to carry out.

Nowadays, deep space information is increasing exponentially. According to the space database of the Union of Concerned Scientists (UCS), the amount of deep space information in 2025 will be 33 times that in 2015.⁶ However, space nodes in the PSN are resource limited, and sometimes they may even fail due to the hazardous environments of deep space. Therefore, prolonging the lifetime of deep space information in the PSN is challenging. The best approach is to disseminate it with redundancy across the entire network, so that the original data can be reconstructed by querying a few space nodes even though some are inaccessible. This is known as the data persistence or distributed data storage problem in wireless communication networks.⁷

The data persistence problem has been widely studied in wireless networks. Replication is the simplest form of redundancy, but it induces too much storage overhead. Compared with replication, coding redundancy techniques can significantly improve the storage efficiency. Traditional erasure codes provide a potential solution and have been applied in many practical storage systems.⁸⁻¹⁰ To achieve a better trade-off between the repair bandwidth and storage cost, Dimakis et al. introduced network coding¹¹ into storage systems and proposed regenerating codes.¹²,¹³ Fountain codes¹⁴ are a class of rateless codes that can generate an unlimited number of encoded packets with low complexity, which makes them suitable for data persistence in wireless networks.

We consider the data persistence problem in the PSN, and the main objective is to increase the reliability of the AS network by prolonging the lifetime of data generated by space nodes. In addition, we aim to reduce the energy consumption during the data dissemination process. In our proposed strategy, redundancy packets are obtained based on Raptor codes¹⁵ and original data are disseminated among the AS network by probabilistic broadcasting (PBcast).¹⁶ We refer to our strategy as data persistence based on raptor codes and probabilistic broadcasting (DPRPB). To the best of our knowledge, DPRPB is the first strategy that exploits the broadcast property of wireless communications to improve the energy efficiency. Detailed data dissemination and encoding procedures are given. Theoretical analysis and extensive simulations validate that, compared with other representative strategies, the proposed DPRPB strategy reduces the cost of data dissemination while improving the decoding performance.

The rest of this article is organized as follows. In section “Related work and preliminaries,” we briefly review the related work and provide the basic knowledge about AS networks of PSN, Raptor codes, and PBcast. In section “Network model and problem description,” we define the network model and describe the data persistence problem in AS network. In section “Proposed DPRPB strategy,” we propose the DPRPB strategy to improve the energy efficiency and data persistence. Then, the validity of DPRPB strategy and data dissemination cost are analyzed in section “Data dissemination cost and decoding performance analysis.” In section “Simulation results and discussions,” simulation results and discussion are presented. Finally, section “Conclusion” concludes this article.

Related work and preliminaries

In this section, we briefly introduce the necessary background to design our strategy.

Related work

Redundant storage with coding has been widely studied in data persistence problems to ensure storage reliability and efficiency. In particular, a file of size M is divided into k fragments of equal size $M / k$ . These are then encoded into n encoded fragments and disseminated among network nodes. The original file can be successfully reconstructed with high probability (w.h.p.) from any $k (1 + ε)$ collected encoded fragments, where $ε \geq 0$ is the decoding overhead. Fountain codes provide a potential solution to reduce the storage complexity due to their rateless property. Dimakis et al.¹⁷ proposed a decentralized fountain codes–based strategy for network storage, in which the encoded packets are generated according to a greedy routing algorithm. Kamra et al.¹⁸ proposed growth codes, which have a rateless property similar to that of fountain codes, and improved the partial decoding performance.

Existing research on fountain codes–based data persistence strategies in wireless networks mainly focuses on two aspects: the process of disseminating source packets and the encoding algorithm.¹⁹⁻²⁴ In Lin et al.,¹⁹ random walks²⁵ were first used to exchange source packets between storage nodes. A random walk can be modeled as a Markov process in which the next visited node is randomly chosen from all neighbors of the current node. Inspired by Lin et al.,¹⁹ most existing strategies use random walks to disseminate source packets since they do not need any routing table. However, they all ignore the inherent broadcast nature of wireless communications. As shown in Wang et al.,²⁶ the data dissemination cost includes both data transmissions and data receptions. It is our intuition that the data dissemination cost can be significantly reduced through broadcasting. To the best of our knowledge, only Liang et al.²⁴ use overhearing to improve the data persistence, and they do not consider the data dissemination cost. In this article, an efficient variation of broadcast, probabilistic broadcasting (PBcast), is employed for packet dissemination.

The encoding process is another issue in data persistence problems. In Lin et al.¹⁹ and Ye et al.,²⁰ the encoding procedure was carried out only after all storage nodes had converged to the steady-state distribution, that is, after all random walks had terminated. Each node then randomly XORs (exclusive-ORs) some of the packets that stopped on it according to its code degree. Therefore, all stopped packets must be temporarily saved in the corresponding node until the encoding procedure runs, which induces excessive storage overhead. To reduce the overhead, the encoding procedure in previous studies²¹⁻²³ is done on the fly, that is, during the process of data dissemination. Specifically, each source packet launches a random walk, and it is considered for XORing only at its first visit to a node. Therefore, each packet should visit all nodes at least once so that the code degrees can be fulfilled with some probability. However, this results in an uneven degree distribution since the encoding procedure is done regardless of whether the code degree is fulfilled. In the proposed DPRPB strategy, the encoding procedure is performed upon each visit of a source packet unless the code degree has been fulfilled or the packet has already been XORed. As a result, the desired degree distribution is fulfilled w.h.p. while the encoding is still performed on the fly.

AS networks of PSN

The PSN is a part of the IPN, and it is a self-organizing network that contains various space nodes. As shown in Figure 1, space nodes are spread over a wide range and are unevenly distributed. At the same time, they work in different regions and are either static (e.g. sensors) or mobile (e.g. landers). A unified management method is not efficient for the PSN because it requires too many control messages. Hence, as shown in Figure 2, the PSN is decoupled into a series of AS networks. Each AS network is composed of similar space nodes and can be modeled as a homogeneous network.⁵ In this way, the complex PSN is divided into small-scale quasi-static networks. An independent management scheme can be utilized in each AS network, and only neighboring AS networks exchange control information to keep the whole PSN connected. This makes the control easier to carry out.

Figure 1.

The architecture of PSN on Mars.

Figure 2.

The PSN is decoupled into a series of AS networks based on the property of space nodes.

Fountain codes and raptor codes

Fountain codes are a class of rateless codes that can generate an unlimited number of encoded packets from a finite number of source packets. Luby transform (LT) codes²⁷ are the first practical fountain codes and have been applied in many fields such as satellite broadcast and cooperative communications. As an extension of LT codes, Raptor codes concatenate a pre-code with LT codes. This relaxes the constraint that all source packets should be recovered from the LT-coding part, which achieves linear encoding and decoding complexity. In particular, each encoded packet is generated by XORing d randomly chosen source packets, where d is the code degree and is independently drawn from the degree distribution function.
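As a minimal illustration of how a single LT-encoded packet is formed, the following Python sketch XORs d distinct source packets chosen uniformly at random. The function name `lt_encode_packet`, the toy 4-byte packets, and the fixed degree are assumptions made for illustration only; in practice the degree would be drawn from the degree distribution.

```python
import random

def lt_encode_packet(source_packets, degree, rng):
    """XOR `degree` distinct source packets chosen uniformly at random.

    Returns the sorted indices of the chosen packets and the encoded payload.
    """
    chosen = rng.sample(range(len(source_packets)), degree)
    encoded = bytes(len(source_packets[0]))          # all-zero payload
    for idx in chosen:
        encoded = bytes(a ^ b for a, b in zip(encoded, source_packets[idx]))
    return sorted(chosen), encoded

rng = random.Random(0)
packets = [bytes([i] * 4) for i in range(1, 6)]      # five toy 4-byte source packets
neighbors, encoded = lt_encode_packet(packets, degree=3, rng=rng)
```

The decoder needs to know which source packets were combined, which is why the sketch returns the chosen indices alongside the payload.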

The degree distribution function is the key factor that affects the decoding performance of fountain codes. Let the pre-code rate of Raptor codes be $r = (1 + ε / 2) / (1 + ε)$ , where $ε > 0$ is a system parameter. Then, the degree distribution of LT-coding part of Raptor codes is

$$
Ω_{D} (d) = \begin{cases} \dfrac{μ}{1 + μ}, & d = 1 \\ \dfrac{1}{1 + μ} \times \dfrac{1}{d (d - 1)}, & d = 2, \dots, D \\ \dfrac{1}{1 + μ} \times \dfrac{1}{D}, & d = D + 1 \end{cases}
$$

(1)

where $D = 4 (1 + ε) / ε$ and $μ = (ε / 2) + (ε / 2)^{2}$ . For Raptor codes with $Ω_{D} (d)$ , all k source packets can be successfully recovered w.h.p. from any $k (1 + ε)$ encoded packets using the belief propagation (BP) algorithm. Both the encoding and decoding complexities are $O (\log (1 + ε))$ . Without loss of generality, we use random low-density parity-check (LDPC) codes as the pre-code in this article.
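To make the degree distribution of equation (1) concrete, the sketch below tabulates $Ω_{D} (d)$ for a given ε and draws a code degree from it. Rounding D up to an integer with a ceiling is an assumption of this sketch (the text gives $D = 4 (1 + ε) / ε$ without rounding), and the function names are hypothetical.

```python
import math
import random

def raptor_lt_degree_pmf(eps):
    """Return (D, pmf) with pmf[d] = Omega_D(d) for d = 1, ..., D + 1."""
    D = math.ceil(4 * (1 + eps) / eps)      # assumption: round D up to an integer
    mu = eps / 2 + (eps / 2) ** 2
    pmf = {1: mu / (1 + mu)}
    for d in range(2, D + 1):
        pmf[d] = 1 / ((1 + mu) * d * (d - 1))
    pmf[D + 1] = 1 / ((1 + mu) * D)
    return D, pmf

def sample_degree(pmf, rng):
    """Draw one code degree according to the tabulated distribution."""
    degrees = list(pmf)
    return rng.choices(degrees, weights=[pmf[d] for d in degrees], k=1)[0]

D, pmf = raptor_lt_degree_pmf(eps=0.5)
total = sum(pmf.values())                   # the masses should sum to 1
degree = sample_degree(pmf, random.Random(0))
```

The masses sum to one because $\sum_{d = 2}^{D} 1 / (d (d - 1)) = 1 - 1 / D$ , so the last two cases of equation (1) together contribute $1 / (1 + μ)$ .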

Probabilistic broadcasting: PBcast

As in previous studies,¹⁹⁻²⁴ random walks have been used in most existing data persistence strategies for packet dissemination. The length is set to the network cover time to satisfy overall coverage, where the cover time of a random walk is defined as the minimal length that ensures a source packet visits the entire network. According to Zhong et al.,²⁵ w.h.p., the cover time of a connected network with n nodes is $Θ (n \ln n)$ . During the data dissemination process, hundreds of thousands of steps are therefore required, which consumes too much energy. In this article, we use the PBcast mechanism for data dissemination by exploiting the inherent broadcast property of wireless communications.

PBcast is an efficient and reliable variation of flooding, the simplest form of broadcast. It can effectively overcome the well-known broadcast storm problem of flooding.²⁸ During the PBcast process, each node that receives a source packet for the first time forwards (rebroadcasts) it with some probability p. To ensure that most nodes receive a particular packet w.h.p., p should be larger than the threshold $p^{th}$ . As given in Rahnavard et al.,¹⁶ $p^{th}$ can be calculated by

$$
p^{th} \approx \frac{1.44 V}{n R^{2}}
$$

(2)

where V is the area of network and R is the transmission radius.
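The PBcast rule and the threshold of equation (2) can be sketched in Python as follows. This is a simplified model under stated assumptions: the network is given as an adjacency dictionary, there is no packet loss, and the toy 6-node ring topology is hypothetical. The names `pbcast` and `pbcast_threshold` are illustrative only.

```python
import random
from collections import deque

def pbcast(adjacency, source, p, rng):
    """Simulate the PBcast of one packet.

    The source always transmits; every other node rebroadcasts with
    probability p, and only on its first reception of the packet.
    Returns the set of nodes holding the packet and the number of broadcasts.
    """
    received = {source}
    transmissions = 0
    queue = deque([source])                 # nodes that have decided to transmit
    while queue:
        node = queue.popleft()
        transmissions += 1
        for neighbor in adjacency[node]:
            if neighbor not in received:    # first reception at this neighbor
                received.add(neighbor)
                if rng.random() < p:
                    queue.append(neighbor)
    return received, transmissions

def pbcast_threshold(area, n, radius):
    """Equation (2): approximate forwarding-probability threshold."""
    return 1.44 * area / (n * radius ** 2)

# Toy 6-node ring, for illustration only.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
flood_received, flood_tx = pbcast(ring, source=0, p=1.0, rng=random.Random(1))
silent_received, silent_tx = pbcast(ring, source=0, p=0.0, rng=random.Random(1))
```

With p = 1 the procedure degenerates to flooding (every node transmits once), while with p = 0 only the source transmits and only its direct neighbors receive the packet.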

Network model and problem description

Network model

Considering that the properties of space nodes in the same AS network of the PSN are similar, each AS network can be expressed as a homogeneous network. In this article, we consider the data persistence problem in an AS network. Without loss of generality, we model the AS network as an undirected random simple graph $G (n, R)$ as shown in Figure 3. There are n space nodes randomly and uniformly distributed in the graph $G (n, R)$ . Each space node has the same maximal transmission radius R and can directly exchange information with any node within its transmission radius. In particular, two nodes are called neighbors if their Euclidean distance is less than R. The number of neighbors of a node is defined as the node degree, and the average node degree of all space nodes is called the density of the network. Each space node is assigned a unique ID based on its property, for example, its media access control (MAC) address. In order to ensure the network connectivity, R should satisfy $R^{2} \geq bV \log n / n$ , where b is a positive constant.²⁹ Let the node IDs be $ID_{i} = i$ , $1 \leq i \leq n$ . We do not make any assumption about the routing table since the proposed strategy is fully distributed.
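A graph of this kind can be generated with a few lines of Python. The sketch below is illustrative only: it assumes a square region, takes the logarithm in the connectivity condition to be the natural logarithm, and optionally measures distances with wrap-around (a torus), none of which is fixed by the model description above. All function names are hypothetical.

```python
import math
import random

def connectivity_radius(n, area, b):
    """Smallest R with R^2 >= b * V * log(n) / n (natural log assumed)."""
    return math.sqrt(b * area * math.log(n) / n)

def random_geometric_graph(n, side, radius, rng, torus=True):
    """Place n nodes uniformly in a side x side square and link nodes closer than radius."""
    points = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]
    adjacency = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            dx = abs(points[i][0] - points[j][0])
            dy = abs(points[i][1] - points[j][1])
            if torus:                       # wrap-around distance
                dx, dy = min(dx, side - dx), min(dy, side - dy)
            if math.hypot(dx, dy) < radius:
                adjacency[i].append(j)
                adjacency[j].append(i)
    return points, adjacency

rng = random.Random(42)
n, side, b = 100, 10.0, 2
R = connectivity_radius(n, side * side, b)
_, adjacency = random_geometric_graph(n, side, R, rng)
density = sum(len(nb) for nb in adjacency.values()) / n   # average node degree
```

The pairwise distance check is quadratic in n, which is adequate for networks of a few hundred nodes.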

Figure 3.

Example of AS network.

Problem description

For ease of the theoretical analysis and to have a fair comparison with other algorithms proposed in the literature, it is assumed that each space node has limited memory and can save only one packet when serving as a storage node. We assume that there are k ( $k \ll n$ ) data nodes which are chosen randomly from all n nodes, and each data node generates some information periodically. The objective of our strategy is to recover all original source data by querying a few storage nodes in a decentralized manner. Then, our data persistence problem can be described as follows: considering a graph $G (n, R)$ with n space nodes distributed uniformly at random in the region V without any global information or deterministic routing table, how can data nodes efficiently disseminate their source packets to the AS network so as to improve the data persistence? To achieve this, we propose the DPRPB strategy using Raptor codes and PBcast. The data dissemination and encoding processes are discussed in section “Proposed DPRPB strategy.”

Proposed DPRPB strategy

In this section, we present our data persistence strategy, that is, DPRPB in the AS network. We first describe the proposed DPRPB strategy generally; then, we discuss the choices of two parameters b and p.

Description of the DPRPB strategy

We aim to have all n nodes store an encoded packet corresponding to the k source packets in a decentralized way. The DPRPB strategy is divided into four phases: the initialization phase, the pre-coding phase, the LT-coding phase, and the storage phase. PBcast is used in phase II and phase III for data dissemination. There are three groups of nodes in the AS network: besides the data nodes and storage nodes as described in section “Network model,” there are m ( $k < m < n$ ) pre-coding nodes that temporarily store the output packets of the pre-coding phase, which we term the “pre-coding packets.” The pre-coding packets are also referred to as the “source packets” of the LT-coding phase, and the number of pre-coding nodes is based on the pre-code rate. To achieve the desired code degree, each encoded packet should XOR some source packets which are randomly chosen from all source packets. Therefore, it is crucial for each source packet to visit the entire network at least once in the pre-coding phase and LT-coding phase. This is ensured by the choice of the forwarding probability p of PBcast. The torus convention³⁰ is used in the proposed DPRPB strategy to avoid the edge effect, that is, the node degree of space nodes that are far away from the network center being smaller than the density of the network. The detailed phases of the DPRPB strategy are given as follows:

1. Phase I. Initialization phase: We assume that the communication in the AS network is slotted and synchronized. At the beginning of the DPRPB strategy, each data node $s, s = 1, \dots, k$ generates its source information $x_{s}$ , then forms its initial source packet $P_{s}$ by adding its node ID to the packet header, that is, $P_{s} = [ID_{s}, x_{s}]$ . Then, each pre-coding node $v, v = 1, \dots, m$ generates its pre-code degree $d_{p} (v)$ according to the LDPC degree distribution $Ω_{L}$ . After that, each storage node $u, u = 1, \dots, n$ forms the initial structure of its storage packet as shown in Figure 4. The fields are as follows: (1) $Q_{u}$ : a queue that temporarily records the visited packets without repetition; (2) $ns (u)$ : the number of packets that node u has XORed; (3) $S (u)$ : the set of packets that node u has XORed, that is, $ns (u) = | S (u) |$ ; (4) $ID_{u}$ : the ID of node u, used to break ties; and (5) $y_{u}$ : the data stored in node u.

2. Phase II. Pre-coding phase: Then, in the second phase, that is, the pre-coding phase, data nodes $s, s = 1, \dots, k$ disseminate their source packets to the AS network according to PBcast. The main objective of the pre-coding phase is to have m pre-coding nodes temporarily store a pre-coding packet corresponding to the k source packets. In each round, when a node $u, u = 1, \dots, n$ receives a source packet for the first time, it will forward the source packet to its neighbors $N (u)$ with probability p. When a pre-coding node $v, v = 1, \dots, m$ receives a source packet $P_{s}$ , it will run the pre-coding procedure as described in Algorithm 1. If the pre-code degree $d_{p} (v)$ is not fulfilled and $P_{s}$ has not been XORed at node v previously, that is, $ns (v) < d_{p} (v)$ and $P_{s} \notin S (v)$ , the pre-coding node v will XOR the packet $P_{s}$ with its current data, that is, $y_{v} = y_{v} \oplus x_{s}$ .

Figure 4.

Packet structure of storage node u.

Algorithm 1.
Pre-coding procedure
Input: visited packets $P_{s}, s = 1, \dots, k$ for node $v, v = 1, \dots, m$
Output: pre-coding packet $Y_{v}$
Begin:
    if $ns (v) < d_{p} (v)$ then
        if $P_{s} \notin S (v)$ then
            $S (v) = S (v) \cup P_{s}$ ;
            $y_{v} = y_{v} \oplus x_{s}$ ;
            $ns (v) = ns (v) + 1$ ;
        end
    else
        continue
    end
    return $Y_{v}$ ;
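Algorithm 1 can be rendered as a short Python sketch. The class `CodingNode`, the modelling of payloads as integers (so that XOR is the `^` operator), and the toy packet values are assumptions made for illustration; they are not part of the strategy's specification.

```python
from dataclasses import dataclass, field

@dataclass
class CodingNode:
    target_degree: int                               # d_p(v)
    combined_ids: set = field(default_factory=set)   # S(v)
    payload: int = 0                                 # y_v, modelled as an integer bit-vector

    @property
    def ns(self):
        """Number of packets XORed so far, ns(v) = |S(v)|."""
        return len(self.combined_ids)

def pre_code(node, packet_id, packet_data):
    """XOR the packet into the node if its degree is unfulfilled and the packet is new."""
    if node.ns < node.target_degree and packet_id not in node.combined_ids:
        node.combined_ids.add(packet_id)
        node.payload ^= packet_data
        return True
    return False

v = CodingNode(target_degree=2)
results = [
    pre_code(v, 1, 0b1010),   # accepted: first packet
    pre_code(v, 1, 0b1010),   # rejected: already XORed at this node
    pre_code(v, 2, 0b0110),   # accepted: degree now fulfilled
    pre_code(v, 3, 0b1111),   # rejected: degree already fulfilled
]
```

Returning a Boolean makes it easy to count how many visits actually contributed to the pre-coding packet.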

Because the probability p is small, each PBcast ends after the source packets have been forwarded a few times. Therefore, we can set a timer beyond which the pre-coding phase is considered finished. During the PBcast process, we do not consider packet loss because it can be resolved at the lower layer.

3. Phase III. LT-coding phase: At the end of the pre-coding phase, each pre-coding node $v, v = 1, \dots, m$ stores a pre-coding packet, and it is also the “source packet” of the LT-coding phase. Without loss of generality, the corresponding notation is also denoted by $P_{v}, v = 1, \dots, m$ . The objective of the LT-coding phase is to have all n storage nodes store an encoded packet corresponding to the m pre-coding packets. At the beginning of LT-coding phase, each storage node $u, u = 1, \dots, n$ generates its LT code degree $d_{c} (u)$ according to the degree distribution $Ω_{D}$ . Again, m pre-coding packets are disseminated to the AS network according to the PBcast mechanism. In each round, when a node $u, u = 1, \dots, n$ receives a pre-coding packet for the first time, it will forward the pre-coding packet to its neighbors $N (u)$ with probability p. Specifically, when a pre-coding packet $P_{v}, v = 1, \dots, m$ visits storage node $u, u = 1, \dots, n$ , it will run the LT-coding procedure as described in Algorithm 2.

Algorithm 2.
LT-coding procedure
Input: visited packets $P_{v}, v = 1, \dots, m$ for node $u, u = 1, \dots, n$
Output: storage packet $Y_{u}$
Begin:
    for $v = 1$ to $v = m$ do
        if $P_{v}$ is the first visited packet then
            temporarily store $P_{v}$ with probability 1;
        else if $P_{v}$ is the second visited packet then
            run the XORing procedure for the first visited packet;
            run the XORing procedure for the second visited packet;
        else
            run the XORing procedure for $P_{v}$ ;
        end
    end
    return $Y_{u}$ ;

Here, we discuss the LT-coding procedure, that is, how each storage node u in the DPRPB strategy makes its decision about the visited pre-coding packets $P_{v}, v = 1, \dots, m$ and XORs them with its current content. The main objective is to increase the probability that each storage node fulfills its LT code degree. If it is the first pre-coding packet that visits node u, this packet is temporarily stored in its memory. After the second pre-coding packet visits (whether or not it is the same as the first visited packet), storage node u performs the XORing procedure (as described in Algorithm 3) for the first visited packet. In particular, it runs a Bernoulli trial and accepts the pre-coding packet with probability $d_{c} (u) / m$ . To run the Bernoulli trial, node u randomly generates a positive number (variable coin in Algorithm 3) between 0 and 1. If $coin \leq d_{c} (u) / m$ , node u XORs the visited packet with its current content. Otherwise, node u runs the same XORing procedure for the second visited pre-coding packet. For the other visited pre-coding packets (after the second visited packet), node u makes its decision as follows.

Algorithm 3.
XORing procedure
Input: visited packets $P_{v}, v = 1, \dots, m$ for node $u, u = 1, \dots, n$
Output: storage packet $Y_{u}$
Begin:
    if $ns (u) < d_{c} (u)$ then
        if $P_{v} \notin S (u)$ then
            $coin = rand (1)$ ;
            if $coin \leq d_{c} (u) / m$ then
                $S (u) = S (u) \cup P_{v}$ ;
                $y_{u} = y_{u} \oplus x_{v}$ ;
                $ns (u) = ns (u) + 1$ ;
            end
        end
    else
        continue
    end
    return $Y_{u}$ ;
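Algorithms 2 and 3 can be combined into one small Python class, sketched below under a few assumptions: payloads are modelled as integers, each toy pre-coding packet carries a single distinct bit, and the handling of the first two visits follows Algorithm 2 as listed (the XORing procedure is run for the first and then for the second visited packet). The class and method names are hypothetical.

```python
import random

class StorageNode:
    def __init__(self, degree, m, rng):
        self.degree = degree      # LT code degree d_c(u)
        self.m = m                # number of pre-coding packets
        self.rng = rng
        self.xored = set()        # S(u)
        self.payload = 0          # y_u
        self.pending = None       # first visited packet, held until the second visit
        self.visits = 0

    def _xor_step(self, pid, data):
        """XORing procedure: accept with probability d_c(u)/m if still allowed."""
        if len(self.xored) < self.degree and pid not in self.xored:
            if self.rng.random() <= self.degree / self.m:
                self.xored.add(pid)
                self.payload ^= data

    def visit(self, pid, data):
        """LT-coding procedure, called on every received or overheard packet."""
        self.visits += 1
        if self.visits == 1:
            self.pending = (pid, data)          # store temporarily
        elif self.visits == 2:
            self._xor_step(*self.pending)       # first visited packet
            self.pending = None
            self._xor_step(pid, data)           # second visited packet
        else:
            self._xor_step(pid, data)

node = StorageNode(degree=2, m=10, rng=random.Random(7))
for _ in range(50):                             # repeated visits of 10 toy packets
    for pid in range(10):
        node.visit(pid, 1 << pid)
```

Whatever the random outcomes, the number of XORed packets cannot exceed the code degree, because the degree check precedes every Bernoulli trial.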

When a pre-coding packet $P_{v}, v = 1, \dots, m$ visits, node u checks the value of $ns (u)$ . If $ns (u) < d_{c} (u)$ and $P_{v} \notin S (u)$ , that is, the number of pre-coding packets that node u has XORed is less than the LT code degree and $P_{v}$ has not been XORed previously, node u runs the LT-coding procedure and XORs $P_{v}$ with the current content with probability $d_{c} (u) / m$ . This guarantees that the number of XORed pre-coding packets never exceeds the code degree. In previous studies,²¹⁻²³ however, nodes run the LT-coding procedure only at the first visit of a pre-coding packet, and the procedure continues even after the code degree has been fulfilled. As a result, nodes with lower code degree might not XOR enough pre-coding packets and vice versa. In our proposed DPRPB strategy, each packet runs the encoding procedure at every visit. According to the simulation results, this increases the probability that each node fulfills its code degree. We assume that each pre-coding node considers its own pre-coding packet as the first visited packet and runs the same LT-coding procedure for other visited pre-coding packets. Similarly, because the probability p is small, this phase terminates after the pre-coding packets have been forwarded a few times.

In a slight abuse of notation, we use the term “visit” to refer to a pre-coding packet being either received or overheard by a node u in the LT-coding phase; node u forwards the packet (with probability p) only when it receives this packet for the first time. The overhearing property increases the number of LT-coding procedures in each node and thus further increases the probability that each node fulfills its code degree.

Figure 5 shows an AS network with $n = 6$ nodes, where node 2 is chosen as the pre-coding node. At the beginning of the LT-coding phase, node 2 broadcasts its packet $P_{2}$ to nodes 1, 4, and 5 with probability 1. After running the LT-coding procedure, those receiving nodes independently choose themselves as intermediate nodes (with probability p) to forward packet $P_{2}$ . Assume that node 4 is chosen to forward packet $P_{2}$ to nodes 3 and 6. Since nodes 1 and 2 are also within node 4’s transmission range, they overhear the packet $P_{2}$ and run the LT-coding procedure if $P_{2}$ has not been XORed. To rerun the LT-coding procedure, they can read the packet $P_{2}$ from their queues $Q_{1}$ and $Q_{2}$ , thus consuming less energy.

Figure 5.

The overhearing property increases the number of LT-coding procedures.

4. Phase IV. Storage phase: During the LT-coding phase, each pre-coding packet $P_{v}, v = 1, \dots, m$ visits all nodes in the AS network at least once. After node $u, u = 1, \dots, n$ runs the LT-coding procedure and makes its decision for all received packets, it will finish the encoding process and store the current packet $Y_{u}$ in the storage phase. To recover the original k source packets, a user node (either fixed or mobile) queries a few nodes and collects their storage packets. The BP algorithm can be used to decode the original data.
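The BP decoder mentioned in the storage phase can be illustrated, for the LT-coding part only, by the following peeling sketch. It assumes each collected packet is available as a pair (set of combined source IDs, XOR value), models payloads as integers, and leaves out the LDPC pre-code that a full Raptor decoder would also use. The function name `peel_decode` and the toy values are hypothetical.

```python
def peel_decode(encoded, num_sources):
    """Belief-propagation (peeling) decoder over XOR equations.

    encoded: list of (source_id_set, value), where value is the XOR of those sources.
    Returns a dict source_id -> value for every source that could be recovered.
    """
    equations = [[set(ids), value] for ids, value in encoded]
    recovered = {}
    progress = True
    while progress and len(recovered) < num_sources:
        progress = False
        for eq in equations:
            ids, value = eq
            # Remove the contribution of sources that are already known.
            for sid in [s for s in ids if s in recovered]:
                ids.discard(sid)
                value ^= recovered[sid]
            eq[1] = value
            if len(ids) == 1:               # degree-one equation releases a source
                recovered[ids.pop()] = value
                progress = True
    return recovered

x = {0: 5, 1: 9, 2: 12}                     # toy source values
collected = [({0}, x[0]), ({0, 1}, x[0] ^ x[1]), ({1, 2}, x[1] ^ x[2])]
decoded = peel_decode(collected, num_sources=3)
stuck = peel_decode([({0, 1}, x[0] ^ x[1])], num_sources=2)
```

Decoding stalls when no degree-one equation remains, which is why the degree distribution must supply enough low-degree packets.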

The pseudo-code of DPRPB strategy is described in Algorithm 4.

Algorithm 4.
DPRPB strategy
Input: source packets $x_{s}, s = 1, \dots, k$ , positive constant p
Output: storage packets $Y_{u}, u = 1, \dots, n$
Begin:
    /* Phase I: Initialization phase */
    for all data nodes $s, s = 1, \dots, k$ do
        $P_{s} = [ID_{s}, x_{s}]$ ;
    end
    for all pre-coding nodes $v, v = 1, \dots, m$ do
        $d_{p} (v) \leftarrow Ω_{L}$ ;
    end
    for all storage nodes $u, u = 1, \dots, n$ do
        $Y_{u} = [Q_{u}, ns (u), S (u), ID_{u}, y_{u}]$ ;
        $Q_{u}, S (u), y_{u} = []$ ;
    end
    /* Phase II: Pre-coding phase */
    k source packets are disseminated according to PBcast;
    while (there exists a node that forwards a source packet) do
        for $s = 1$ to $s = k$ do
            for each pre-coding node $v, v = 1, \dots, m$ that receives $P_{s}$ do
                if $P_{s}$ visits node v for the first time then
                    $Q_{v} = Q_{v} \cup P_{s}$ ;
                end
                run the pre-coding procedure (Algorithm 1);
            end
        end
    end
    /* Phase III: LT-coding phase */
    for all storage nodes $u, u = 1, \dots, n$ do
        $d_{c} (u) \leftarrow Ω_{D}$ ;
    end
    m pre-coding packets are disseminated according to PBcast;
    while (there exists a node that forwards a pre-coding packet) do
        for $v = 1$ to $v = m$ do
            for each node $u, u = 1, \dots, n$ that receives or overhears $P_{v}$ do
                if $P_{v}$ visits node u for the first time then
                    $Q_{u} = Q_{u} \cup P_{v}$ ;
                end
                run the LT-coding procedure (Algorithm 2);
            end
        end
    end
    /* Phase IV: Storage phase */
    for all storage nodes $u, u = 1, \dots, n$ do
        return $Y_{u}$ ;
    end

Selection of the parameters b and p

As stated above, R should satisfy $R^{2} \geq bV \log n / n$ to ensure the network connectivity, where b is a positive constant. Given a graph $G (n, R)$ , transmission radius R reflects the average node degree and further influences the number of links in the AS network. Therefore, an appropriate parameter b will consume less communication energy while keeping the network connected w.h.p.

During the PBcast process of phase II and phase III, each node that receives a packet for the first time will forward it to its neighbors with probability p. In order to ensure each packet visits all nodes at least once, the forwarding probability p should satisfy $p > p^{th} \approx 0.25$ according to equation (2). However, a bigger parameter p might increase the number of data transmissions and receptions of the PBcast, which results in higher data dissemination cost. Therefore, an appropriate parameter p will consume less energy while guaranteeing the reliability of PBcast, that is, each packet visits the entire network at least once w.h.p.

We investigate the appropriate parameters through Monte Carlo simulations. We consider two network sizes, that is, $n = 100$ and $n = 300$ space nodes randomly distributed in the region $V = 10 \times 10\ \mathrm{km}^{2}$ . Figure 6 shows the fraction of nodes (F) receiving a particular packet versus the probability p, where the parameter b is varied from 1 to 3. The fraction increases as the probability p gets larger and converges to 1. When $p = p^{th} \approx 0.25$ , the packet can visit most nodes ( $F > 0.8$ ) in the AS network. The number of transmissions $N_{t}$ and receptions $N_{r}$ of PBcast when $n = 100$ are shown in Figure 7(a) and (b), respectively. The number of transmissions $N_{t}$ increases almost linearly with the probability p, whereas the number of receptions $N_{r}$ increases as p gets larger and converges once $F = 1$ . This is expected since all nodes have received the packet at that point.

Figure 6.

The fraction of storage nodes receiving a source packet (F) versus p: (a) $n = 100$ and (b) $n = 300$ .

Figure 7.

Total number of transmissions $N_{t}$ and receptions $N_{r}$ versus p ( $n = 100$ ): (a) $N_{t}$ and (b) $N_{r}$ .

There exists an optimal configuration of parameters p and b. For instance, we should select $p = 0.6$ when $b = 1$ , while $p = 0.3$ suffices to achieve the reliability of PBcast when $b = 2$ , as shown in Figure 6(a). Recall that the average node degree is also influenced by parameter b. Table 1 shows the average node degree and the number of transmissions versus the optimal configuration $(b, p)$ . It is evident that to ensure the reliability of PBcast, the probability p should be at least 0.3, while a larger parameter b leads to a denser network. In order to reduce the energy consumption while keeping the network connectivity, we set $(b, p) = (2, 0.3)$ in the subsequent simulations.

Table 1.

Average node degree versus optimal configuration of parameters in the AS network.

$(b, p)$	$N_{t}$	Average node degree
(1, 0.6)	60	16.3
(1.5, 0.5)	50	23.2
(2, 0.3)	30	29.5
(2.5, 0.3)	30	35.1
(3, 0.3)	30	40.4

AS: autonomous system.

Data dissemination cost and decoding performance analysis

Two metrics are chosen to evaluate the efficacy and efficiency of the proposed DPRPB strategy: the probability of successful decoding and the data dissemination cost. In this section, we first derive the expression for the data dissemination cost and then evaluate the decoding performance.

Data dissemination cost

The process of data dissemination contains both data transmission and data reception. Based on Wang et al.,²⁶ data transmission and data reception cost nearly identical energy in wireless networks. Assuming that each data transmission or data reception costs one unit of energy, the data dissemination cost in the DPRPB strategy can be described as the total number of data transmissions and data receptions.

During the DPRPB strategy, all source packets and pre-coding packets should visit the entire AS network at least once in the pre-coding phase and LT-coding phase, respectively. During the PBcast process, each packet is received by every node except the data node (or pre-coding node) that originated it. Therefore, the number of receptions $N_{r}$ is

$$
\begin{aligned} N_{r} &= N_{r}^{II} + N_{r}^{III} \\ &= k (n - 1) + m (n - 1) \\ &= (k + m) (n - 1) \end{aligned}
$$

(3)

where $N_{r}^{II}$ and $N_{r}^{III}$ denote the number of receptions in phase II and phase III, respectively.

At the beginning of the PBcast, data nodes (pre-coding nodes) broadcast their source packets (pre-coding packets) with probability 1. During the data dissemination process, each node that receives a packet for the first time forwards it with probability p. Therefore, the number of transmissions $N_{t}$ is

\begin{aligned} N_{t} &= N_{t}^{II} + N_{t}^{III} \\ &= k + k (n - 1) p + m + m (n - 1) p \\ &= k + m + (k + m) (n - 1) p \end{aligned}

(4)

where $N_{t}^{II}$ and $N_{t}^{III}$ denote the number of transmissions in phase II and phase III, respectively.
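The transmission count in equation (4) can be checked numerically. The following Python sketch simulates a single PBcast on an arbitrary connectivity graph. The function names `pbcast` and `complete_graph`, the adjacency-dictionary layout, and the fully connected test topology are illustrative assumptions and are not taken from the original simulator. On a fully connected graph every node is reached by the first broadcast, so the expected number of transmissions per packet is $1 + (n - 1) p$ , which is the per-packet form of equation (4).

```python
import random
from collections import deque


def pbcast(adj, source, p, rng):
    """Simulate one probabilistic broadcast (PBcast) from `source`.

    adj: dict mapping each node to an iterable of its neighbours (assumed layout).
    Returns (number of transmissions, number of nodes reached besides the source).
    """
    received = {source}
    pending = deque([source])          # the source broadcasts with probability 1
    transmissions = 0
    while pending:
        u = pending.popleft()
        transmissions += 1
        for v in adj[u]:
            if v not in received:      # first reception of this packet
                received.add(v)
                if rng.random() < p:   # forward with probability p
                    pending.append(v)
    return transmissions, len(received) - 1


def complete_graph(n):
    """Hypothetical helper: every node hears every other node."""
    return {u: [v for v in range(n) if v != u] for u in range(n)}


if __name__ == "__main__":
    rng = random.Random(1)
    n, p, runs = 100, 0.3, 2000
    adj = complete_graph(n)
    mean_tx = sum(pbcast(adj, 0, p, rng)[0] for _ in range(runs)) / runs
    # Empirical mean next to the per-packet value 1 + (n - 1) p.
    print(mean_tx, 1 + (n - 1) * p)
```

On sparser topologies some nodes may never be reached when p is small, so the simulated counts can fall below the values of equations (3) and (4), which assume full coverage.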

The data dissemination cost $N_{tot}$ can be calculated as follows

\begin{aligned} N_{tot} &= N_{t} + N_{r} \\ &= k + m + (k + m) (n - 1) p + (k + m) (n - 1) \\ &= (k + m) (1 + (n - 1) (1 + p)) \end{aligned}

(5)

Figure 7 shows the number of data transmissions and data receptions for disseminating a particular packet, which agrees with our expressions.
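As a worked check of equations (3) to (5), the short Python sketch below evaluates the three quantities and compares the sum with the closed form of equation (5). The function names are illustrative, and the value $m = 12$ in the usage lines is a hypothetical choice, since the number of pre-coding packets is not fixed in this section.

```python
import math


def dissemination_cost(k, m, n, p):
    """Evaluate equations (3)-(5); returns (N_t, N_r, N_tot)."""
    n_r = (k + m) * (n - 1)                    # equation (3)
    n_t = (k + m) + (k + m) * (n - 1) * p      # equation (4)
    return n_t, n_r, n_t + n_r                 # equation (5), first line


def closed_form_total(k, m, n, p):
    """Last line of equation (5)."""
    return (k + m) * (1 + (n - 1) * (1 + p))


if __name__ == "__main__":
    k, m, n, p = 10, 12, 100, 0.3              # m = 12 is a hypothetical value
    n_t, n_r, n_tot = dissemination_cost(k, m, n, p)
    print(n_t, n_r, n_tot)
    print(math.isclose(n_tot, closed_form_total(k, m, n, p)))
```

Setting `m = 0` isolates the phase II terms. With $k = 10$ , $n = 100$ , and $p = 0.3$ this gives 307 transmissions and 990 receptions, which appear to coincide with the DPRPB entries for $n = 100$ in Table 2.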

Successful decoding probability

Here, we discuss the decoding performance of the proposed DPRPB strategy. Recall that the decoding performance is determined by the final degree distribution. If each node fulfills its code degree, all k source packets can be recovered from any $k (1 + ε)$ encoded packets with high probability.

Assume that pre-coding packet j visits node u $c_{j} (u)$ times during the LT-coding phase of the DPRPB strategy. The probability that packet j is not XORed into the encoded packet $Y_{u}$ is

P_{j}^{R} (u) = \prod_{i = 1}^{c_{j} (u)} \left( 1 - \frac{d_{c} (u)}{m} \times \mathrm{sgn} \left( d_{c} (u) - n s_{i} (u) \right) \right)

(6)

where $n s_{i} (u)$ is the number of pre-coding packets that node u has already XORed at the $i$th visit of packet j, and $\mathrm{sgn} (x)$ is the sign function, defined as

\mathrm{sgn} (x) = \begin{cases} 1, & x > 0 \\ 0, & x \leq 0 \end{cases}

(7)

The sign function enforces the constraint that a node which has fulfilled its code degree does not XOR any further visiting packets. Therefore, during the encoding process of the DPRPB strategy, the number of XORed packets never exceeds the code degree. Relaxing the constraint of the sign function, we have

P_{j}^{R} (u) = \prod_{i = 1}^{c_{j} (u)} \left( 1 - \frac{d_{c} (u)}{m} \right) = {\left( 1 - \frac{d_{c} (u)}{m} \right)}^{c_{j} (u)}

(8)

Therefore, at the end of the DPRPB strategy, the probability that packet j is XORed into $Y_{u}$ is

\begin{matrix} P_{j}^{X} (u) = 1 - P_{j}^{R} (u) = 1 - {(1 - \frac{d_{c} (u)}{m})}^{c_{j} (u)} \end{matrix}

(9)
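Equations (8) and (9) can be illustrated with a small numerical sketch. The Python code below evaluates the relaxed closed form and compares it with a Monte Carlo estimate in which every visit is accepted independently with probability $d_{c} (u) / m$ . The function names and the example values are assumptions made for illustration, and the sketch deliberately ignores the sign-function constraint, exactly as the relaxation does.

```python
import random


def prob_xored(d_c, m, visits):
    """Equation (9): relaxed probability that a packet is XORed after `visits` visits."""
    return 1.0 - (1.0 - d_c / m) ** visits


def simulate_xored(d_c, m, visits, trials, rng):
    """Monte Carlo estimate under the relaxed model.

    Each visit is accepted independently with probability d_c / m;
    the cap imposed by the sign function is not modelled.
    """
    hits = 0
    for _ in range(trials):
        if any(rng.random() < d_c / m for _ in range(visits)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(7)
    d_c, m, visits = 2, 10, 3                  # hypothetical example values
    print(prob_xored(d_c, m, visits))          # closed form, equation (9)
    print(simulate_xored(d_c, m, visits, 20000, rng))
```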

During the DPRPB strategy, all pre-coding packets are disseminated using independent PBcast processes. Therefore, at the end of the DPRPB strategy, the probability that exactly $d_{c} (u)$ packets are XORed into $Y_{u}$ is

\begin{aligned} \Pr (n s (u) = d_{c} (u)) &= Ω_{D} (d) \times \prod_{i = 1}^{d_{c} (u)} P_{j_{i}}^{X} (u) \\ &= Ω_{D} (d) \times \prod_{i = 1}^{d_{c} (u)} \left( 1 - {\left( 1 - \frac{d_{c} (u)}{m} \right)}^{c_{j_{i}} (u)} \right) \end{aligned}

(10)

where $j_{i}$ , $i = 1, \dots, d_{c} (u)$ , denotes the $i$th XORed pre-coding packet. At the end of the DPRPB strategy, the expected number of visits of packet j to node u is given as follows

\begin{aligned} E [c_{j} (u)] &= \frac{T \times d_{c} (u)}{\sum d} \\ &= \frac{T \times d_{c} (u)}{\sum_{v = 1}^{n} d_{c} (v)} \end{aligned}

(11)

where T is the number of transmissions in phase III of the DPRPB strategy, and $\sum d = \sum_{v = 1}^{n} d_{c} (v)$ is the total number of XORed pre-coding packets in the AS network. Note that there are $\binom{m}{d_{c} (u)}$ combinations for choosing $d_{c} (u)$ packets from all m packets. We have

\Pr (n s (u) = d_{c} (u)) = Ω_{D} (d) \times \binom{m}{d_{c} (u)} \times {\left( 1 - {\left( 1 - \frac{d_{c} (u)}{m} \right)}^{T \times \frac{d_{c} (u)}{\sum d}} \right)}^{d_{c} (u)}

(12)

It should be mentioned that equation (12) relaxes the constraint of the sign function. The constraint is inactive as long as the number of pre-coding packets that node u has XORed is less than $d_{c} (u)$ . Therefore, equation (12) is an upper bound on the probability that node u XORs $d_{c} (u)$ pre-coding packets.
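Equations (11) and (12) can be evaluated directly once the degree-distribution value $Ω_{D} (d)$ is known. The Python sketch below takes that value as an input, because the distribution of equation (1) is defined outside this section. The function names and the numbers in the usage lines are hypothetical and serve only to show how the terms combine.

```python
from math import comb


def expected_visits(T, d_c_u, sum_d):
    """Equation (11): expected number of visits of a packet to node u."""
    return T * d_c_u / sum_d


def degree_fulfilment_bound(omega_d, m, d_c_u, T, sum_d):
    """Equation (12): relaxed bound on Pr(ns(u) = d_c(u)).

    omega_d is the degree-distribution value for the code degree of node u,
    supplied by the caller (assumed to come from equation (1)).
    """
    visits = expected_visits(T, d_c_u, sum_d)
    p_x = 1.0 - (1.0 - d_c_u / m) ** visits    # equation (9) with c_j(u) = E[c_j(u)]
    return omega_d * comb(m, d_c_u) * p_x ** d_c_u


if __name__ == "__main__":
    # Hypothetical values: omega_d = 0.5, m = 10, d_c(u) = 2, T = 100, sum_d = 200.
    print(expected_visits(100, 2, 200))
    print(degree_fulfilment_bound(0.5, 10, 2, 100, 200))
```

The sketch returns the raw value of the expression without clipping it to the interval [0, 1].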

Simulation results and discussions

In this section, several Monte Carlo simulations are performed to evaluate the efficacy and efficiency of the proposed DPRPB strategy. We compare it with the representative exact decentralized fountain codes (EDFC) strategy¹⁹ and the raptor codes–based distributed storage (RCDS) strategy.²¹ EDFC is the first data persistence strategy proposed on the basis of decentralized fountain codes, while RCDS runs an on-the-fly encoding procedure, as our strategy does. The simulations are carried out in MATLAB R2011b.

For our simulations, we consider two network sizes, $n = 100$ and $n = 300$ , with space nodes randomly distributed in a region of area $V = 10 \times 10\ \mathrm{km}^{2}$ , of which $k = 0.1 n$ are data nodes. All nodes have the same transmission radius R, where $R^{2} = 2 V \log n / n$ ensures the connectivity of the AS network. For all three strategies, the parameter $ε$ of the degree distribution in equation (1) is set to $ε = 1$ . For the EDFC strategy, the length of each random walk is set to 200; for the RCDS strategy, the parameter C, which determines the length of the random walk ( $Cn \ln n$ ), is set to $C = 10$ . In the proposed DPRPB strategy, the forwarding probability is set to $p = 0.3$ . The BP algorithm is used for decoding in all strategies. The two performance metrics are the successful decoding probability and the data dissemination cost. To average out the random effect of the network topology, each simulation is repeated 1000 times and the mean value of each metric is reported.
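The topology described above can be sketched in a few lines of Python. This is not the MATLAB code used for the reported results. It is a minimal illustration that assumes a natural logarithm in $R^{2} = 2 V \log n / n$ , uniform node placement in a square, and a unit-disk link model, and all function names are hypothetical.

```python
import math
import random


def transmission_radius(area, n):
    """R from R^2 = 2 V log(n) / n (natural logarithm assumed)."""
    return math.sqrt(2.0 * area * math.log(n) / n)


def random_topology(n, side, rng):
    """Place n nodes uniformly in a side x side square and link nodes within R."""
    radius = transmission_radius(side * side, n)
    pts = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]
    adj = {u: [] for u in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if math.dist(pts[u], pts[v]) <= radius:
                adj[u].append(v)
                adj[v].append(u)
    return pts, adj


def average_degree(adj):
    """Mean number of neighbours per node."""
    return sum(len(nb) for nb in adj.values()) / len(adj)


if __name__ == "__main__":
    rng = random.Random(0)
    n, side = 100, 10.0                        # 10 km x 10 km region
    pts, adj = random_topology(n, side, rng)
    data_nodes = rng.sample(range(n), n // 10) # k = 0.1 n data nodes
    print(transmission_radius(side * side, n), average_degree(adj), len(data_nodes))
```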

Successful decoding probability

We compare the decoding performance of the proposed DPRPB strategy with that of the other two representative strategies. To have a fair comparison, we evaluate the successful decoding probability $\Pr$ versus the decoding ratio $η$ . A decoding attempt is successful when all k source packets are recovered. The successful decoding probability ( $\Pr$ ) is defined as the ratio of the number of successful decodings ( $M_{s}$ ) to the number of Monte Carlo simulations (M), that is, $\Pr = M_{s} / M$ . The decoding ratio ( $η$ ) is defined as the ratio of the number of queried nodes (h) to the number of data nodes (k), that is, $η = h / k$ .
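The two definitions above translate into a simple estimation loop. The Python sketch below pairs a generic BP (peeling) decoder for XOR-coded packets with a Monte Carlo estimate of $\Pr = M_{s} / M$ at a given $η$ . It is a simplified illustration rather than the simulator used here: packets are represented only by the index sets of the source packets they combine, the pre-code stage is omitted, and `toy_packet` is a hypothetical placeholder that does not implement the degree distribution of equation (1).

```python
import random


def peel_decode(encoded, k):
    """Return True if BP (peeling) recovers all k source indices.

    encoded: iterable of sets, each holding the source indices XORed in a packet.
    """
    packets = [set(s) for s in encoded]
    recovered = set()
    ripple = [next(iter(s)) for s in packets if len(s) == 1]
    while ripple:
        j = ripple.pop()
        if j in recovered:
            continue
        recovered.add(j)
        for s in packets:
            if j in s and len(s) > 1:
                s.discard(j)                   # subtract the recovered packet
                if len(s) == 1:
                    ripple.append(next(iter(s)))
    return len(recovered) == k


def toy_packet(k, rng):
    """Hypothetical encoded packet: degree 1, 2, or 3 chosen uniformly."""
    return set(rng.sample(range(k), rng.choice([1, 2, 3])))


def success_probability(make_packet, k, eta, trials, rng):
    """Estimate Pr = M_s / M when h = round(eta * k) nodes are queried."""
    h = round(eta * k)
    successes = 0
    for _ in range(trials):
        packets = [make_packet(k, rng) for _ in range(h)]
        if peel_decode(packets, k):
            successes += 1
    return successes / trials


if __name__ == "__main__":
    rng = random.Random(3)
    for eta in (1.2, 2.0, 3.0):
        print(eta, success_probability(toy_packet, 10, eta, 1000, rng))
```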

Figure 8 shows the decoding performance of the proposed DPRPB strategy in comparison with the EDFC and RCDS strategies as a function of the decoding ratio $η$ for $n = 100$ and $n = 300$ , respectively. It is clear from the plots that the proposed DPRPB strategy performs better, especially when fewer nodes are queried (e.g. $η \leq 2$ ), while the EDFC and RCDS strategies have nearly the same decoding performance. Notice that the differences between the curves decrease as $η$ grows, and the curves almost converge when enough nodes are queried. For instance, when $n = 100$ (Figure 8(a)), the largest difference is 0.55 at $η = 1.2$ but only 0.2 at $η = 2.2$ . This is because, when more nodes are queried, the probability of fulfilling the degree distribution increases, which results in better decoding performance.

Figure 8.

Successful decoding probability versus decoding ratio $η$ : (a) $n = 100$ and (b) $n = 300$ .

The performance improvement of the DPRPB strategy stems from its encoding process, in which each node runs the encoding procedure at every visit of a pre-coding packet. As a result, the final degree distribution of the proposed DPRPB strategy is much closer to the desired distribution, as shown in Figure 9, which improves the decoding performance compared with the EDFC and RCDS strategies. The poorer decoding probability of the EDFC and RCDS strategies is caused by their irregular degree distributions. In the EDFC strategy, the encoding procedure is performed after each random walk terminates, and packets are considered for acceptance only at their last visited nodes. Because of the shorter random walks, nodes that are far away from the data nodes might not XOR enough packets. In the RCDS strategy, nodes run the encoding procedure only at a packet's first visit, regardless of the code degree. Therefore, nodes with a low code degree might XOR fewer packets than expected and vice versa. This results in poor decoding performance when few nodes are queried.

Figure 9.

Final degree distribution of the DPRPB strategy ( $n = 100$ ).

Data dissemination cost

We compare the data dissemination cost of the proposed DPRPB strategy with that of the other two representative strategies. Table 2 lists the number of transmissions $N_{t}$ , the number of receptions $N_{r}$ , and the data dissemination cost $N_{tot}$ of the three data persistence strategies (the proposed DPRPB, EDFC, and RCDS) for $n = 100$ and $n = 300$ , respectively. The results show that the proposed DPRPB strategy consumes the least energy: for $n = 100$ , its $N_{tot}$ is roughly 70 times lower than that of RCDS and roughly 680 times lower than that of EDFC. This benefit comes from the PBcast data dissemination process, which exploits the broadcast property of wireless communications. Despite the shorter random walks in the EDFC strategy, its data dissemination cost is higher than that of the RCDS strategy, because data nodes must launch multiple random walks to reach the equilibrium distribution. In the RCDS strategy, by contrast, each node launches one random walk whose length is set to the cover time. As a result, the data dissemination cost of the EDFC strategy is much higher than that of the RCDS strategy and of our proposed DPRPB strategy.

Table 2.

Total data dissemination costs in each strategy.

Strategy	$N_{t}$ ( $n = 100$ )	$N_{r}$ ( $n = 100$ )	$N_{tot}$ ( $n = 100$ )	$N_{t}$ ( $n = 300$ )	$N_{r}$ ( $n = 300$ )	$N_{tot}$ ( $n = 300$ )
DPRPB	3.07e2	9.90e2	1.29e3	2.72e3	8.97e3	1.16e4
EDFC	4.39e5	4.39e5	8.78e5	1.92e7	1.92e7	3.84e7
RCDS	4.60e4	4.60e4	9.20e4	5.12e5	5.12e5	1.02e6

DPRPB: data persistence based on raptor codes and probabilistic broadcasting; EDFC: exact decentralized fountain codes; RCDS: raptor codes–based distributed storage.

Note that during the LT-coding process of the DPRPB strategy, each node must temporarily maintain a queue Q to record the visited packets. Without further measures, Q holds all m pre-coding packets after the encoding phase. In order to reduce the cache memory while preserving the decoding performance, a node should discard a visited packet once its code degree has been fulfilled or the packet has been XORed. Overall, the proposed DPRPB strategy significantly reduces the data dissemination cost while improving the decoding performance.

Conclusion

We investigated the data persistence problem in the AS network of the PSN and proposed an efficient DPRPB strategy based on raptor codes and PBcast. Unlike most existing strategies, DPRPB uses PBcast to disseminate packets, exploiting the broadcast property of wireless communications. The theoretical analysis derives an expression for the data dissemination cost and shows that the DPRPB strategy achieves a better decoding performance. Simulation results validate the analysis and show that the proposed DPRPB strategy reduces the data dissemination cost significantly in comparison with the representative EDFC and RCDS strategies. In addition, the decoding performance is also improved, which is significant for the resource-limited PSN.

Footnotes

Academic Editor: Paul Mitchell

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by NSF of China under Grant Nos 91338201, 91438109, and 61401507.

References

1. Mukherjee, Ramamurthy. Communication technologies and architectures for space network and interplanetary Internet. IEEE Commun Surv Tutor 2013; 15(2): 881–897.

2. Akyildiz, Akan, Chen. The state of the art in interplanetary internet. IEEE Commun Mag 2004; 42(7): 108–118.

3. Alena, Gilbaugh, Glass. Communication system architecture for planetary exploration. IEEE Aero El Sys Mag 2001; 16(11): 4–11.

4. Rodrigues, Oliveira, Alvarez. Space wireless sensor networks for planetary exploration: node and network architectures. In: Proceedings of the NASA/ESA conference on adaptive hardware and systems, Leicester, 14–17 July 2014, pp.180–187. New York: IEEE.

5. Zhang G-X, Gou. Delay minimization topology control in planetary surface network: an autonomous systems approach. Int J Distrib Sens N 2015; 2015: 1–13.

6. Union of Concerned Scientists. UCS Satellite Database, http://www.ucsusa.org/nuclear_weapons_and_global_security/solutions/space-weapons/ucs-satellite-database.html (accessed 3 October 2016).

7. Leong, Dimakis, Tracey. Distributed storage allocations. IEEE T Inform Theory 2012; 58(7): 4733–4752.

8. Sun W-D, Wang Y-J, Xiao X-Q. Tree-structured parallel regeneration for multiple data losses in distributed storage systems based on erasure codes. China Commun 2013; 10(4): 113–125.

9. Plank, Thomason MG. An exploration of non-asymptotic low-density parity check erasure codes for wide-storage applications. Parallel Process Lett 2007; 17: 103–123.

10. Dimakis, Prabhakaran, Ramchandran. Decentralized erasure codes for distributed networked storage. IEEE T Inform Theory 2006; 52(6): 2809–2816.

11. Ahlswede, Cai, S-Y. Network information flow. IEEE T Inform Theory 2000; 46(4): 1204–1216.

12. Dimakis, Godfrey, Y-N. Network coding for distributed storage systems. IEEE T Inform Theory 2010; 56(9): 4539–4551.

13. Rashmi, Shah, Kumar PV. Optimal exact-regenerating codes for distributed storage at the MSR and MBR points via a product-matrix construction. IEEE T Inform Theory 2011; 57(8): 5227–5239.

14. Puducheri. Fountain codes. IEEE Proc Commun 2005; 152(6): 1062–1068.

15. Shokrollahi. Raptor codes. IEEE T Inform Theory 2006; 52(6): 2551–2567.

16. Rahnavard, Bellambi, Tao. CRBcast: a reliable and energy-efficient broadcast scheme for wireless sensor networks using rateless codes. IEEE T Wirel Commun 2008; 7(12): 5390–5400.

17. Dimakis, Prabhakaran, Ramchandran. Distributed fountain codes for networked storage. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing, Toulouse, 14–19 May 2006. New York: IEEE.

18. Kamra, Misra, Feldman. Growth codes: maximizing sensor network data persistence. In: Proceedings of the 2006 conference on applications, technologies, architectures, and protocols for computer communications (SIGCOMM), Pisa, 11–15 September 2006, pp.255–266. New York: ACM.

19. Lin Y-F, Liang B-C. Data persistence in large-scale sensor networks with decentralized fountain codes. In: Proceedings of the IEEE international conference on computer communications, Anchorage, AK, 6–12 May 2007, pp.1658–1666. New York: IEEE.

20. X-C, Chen W-T. LT codes based distributed coding for efficient distributed storage in wireless sensor networks. In: Proceedings of the IFIP networking conference, Toulouse, 20–22 May 2015, pp.1–9. New York: IEEE.

21. Kong Z-N, Aly, Soljanin. Decentralized coding algorithms for distributed storage in wireless sensor networks. IEEE J Sel Area Comm 2010; 28(2): 261–267.

22. Aly, Kong Z-N, Soljanin. Raptor codes based distributed storage algorithms for wireless sensor networks. In: Proceedings of the IEEE international symposium on information theory, Toronto, ON, Canada, 6–11 July 2008, pp.2051–2055. New York: IEEE.

23. Jafarizadeh, Jamalipour. Data persistency in wireless sensor networks using distributed Luby transform codes. IEEE Sens J 2013; 13(12): 4880–4890.

24. Liang J-B, Wang J-X, Chen. An overhearing-based scheme for improving data persistence in wireless sensor networks. In: Proceedings of the IEEE international conference on communications (ICC), Cape Town, South Africa, 23–27 May 2010, pp.1–5. New York: IEEE.

25. Zhong, Shen, Seiferas. The convergence-guaranteed random walk and its applications in peer-to-peer networks. IEEE T Comput 2008; 57(5): 619–633.

26. Wang, Hempstead, Yang. A realistic power consumption model for wireless sensor network devices. In: Proceedings of the 3rd annual IEEE communications society on sensor and ad hoc communications and networks, Reston, VA, 28 September 2006, pp.286–295. New York: IEEE.

27. Luby. LT codes. In: Proceedings of the 43rd annual IEEE symposium on foundations of computer science (FOCS), Vancouver, BC, Canada, 19 November 2002. New York: IEEE.

28. Liang Y-B, Lai L-F, Poor. A broadcast approach for fading wiretap channels. IEEE T Inform Theory 2014; 60(2): 842–858.

29. Gupta, Kumar PR. The capacity of wireless networks. IEEE T Inform Theory 2000; 46(2): 388–404.

30. Hall. Introduction to the theory of coverage process. New York: John Wiley & Sons, 1988.

Data persistence in planetary surface network using raptor codes and probabilistic broadcasting
