Abstract
This article investigates the data persistence problem in the planetary surface network of interplanetary Internet using the distributed raptor codes. In order to improve the lifetime of space information and space nodes’ energy efficiency in planetary surface network, we propose an efficient data persistence strategy based on raptor codes and probabilistic broadcasting. Unlike most existing data persistence strategies where the random walks are used to disseminate source packets, the probabilistic broadcasting mechanism is employed in the proposed strategy to reduce the data dissemination cost by exploiting the broadcast property of wireless networks. The decoding performance and data dissemination cost are analyzed. Simulation results validate that the proposed strategy consumes the least data dissemination cost while achieving a better decoding performance compared with other representative strategies.
Keywords
Introduction
The interplanetary Internet (IPN) is proposed to face the requirements of different deep space missions early in this century. It is composed of a backbone network, external networks, and planetary networks (PNs). As shown in Mukherjee and Ramamurthy 1 and Akyildiz et al., 2 the PN includes a planetary orbiter network (PON) and a planetary surface network (PSN), where the PSN is composed of a large number of space nodes such as rovers, astronauts, and landers. In order to accomplish collaborative tasks, space nodes in the PSN autonomously communicate to each other. 3 Unlike traditional wireless networks, the network status of PSN changes dynamically due to the wide range and uneven distribution of space nodes. 4 As proposed in our prior research, 5 we divided the PSN into a series of hierarchical autonomous system (AS) networks based on the property of space nodes. In this way, the complex PSN is decoupled into small-scale quasi-static networks, which makes the control easier to carry out.
Nowadays, deep space information is increasing exponentially. According to the space database of Union of Concerned Scientists (UCS), the amount of deep space information in 2025 will be 33 times than that in 2015. 6 However, space nodes in the PSN are resource limited, and sometimes, they may even be out of function due to hazardous environments of deep space. Therefore, how to prolong the lifetime of deep space information in PSN is challenging. The best approach is disseminating it with redundancy among the entire network, and the original data can be reconstructed by querying a few space nodes even though some are accessible. This is known as the data persistence or distributed data storage problem in wireless communication networks. 7
The data persistence problem has been widely studied in wireless networks. Replication is the simplest redundancy manner, but it induces too much storage overheads. Compared with replication, coding redundancy techniques can significantly improve the storage efficiency. Traditional erasure codes provide a potential solution and have been applied in many practical storage systems.8–10 To achieve a better trade-off between the repair bandwidth and storage cost, Dimakis et al. introduced the network coding 11 into storage systems and proposed the regenerating codes.12,13 Fountain codes 14 are a class of rateless codes that can generate unlimited number of encoded packets with low complexity, which is suitable for data persistence in wireless networks.
We consider the data persistence problem in the PSN, and the main objective is to increase the reliability of the AS network by prolonging the lifetime of data generated by space nodes. In addition, we aim to reduce the energy consumption among the data dissemination process. In our proposed strategy, redundancy packets are obtained based on Raptor codes 15 and original data are disseminated among the AS network using the probabilistic broadcasting (PBcast) 16 manner. We refer to our strategy as data persistence based on raptor codes and probabilistic broadcasting (DPRPB). To the best of our knowledge, DPRPB is the first strategy that exploits the broadcast property of wireless communications to improve the energy efficiency. Detailed data dissemination and encoding procedures are given. Theoretical analysis and extensive simulations validate that compared with other representative strategies, the proposed DPRPB strategy reduces the cost of data dissemination, while improving the decoding performance.
The rest of this article is organized as follows. In section “Related work and preliminaries,” we briefly review the related work and provide the basic knowledge about AS networks of PSN, Raptor codes, and PBcast. In section “Network model and problem description,” we define the network model and describe the data persistence problem in AS network. In section “Proposed DPRPB strategy,” we propose the DPRPB strategy to improve the energy efficiency and data persistence. Then, the validity of DPRPB strategy and data dissemination cost are analyzed in section “Data dissemination cost and decoding performance analysis.” In section “Simulation results and discussions,” simulation results and discussion are presented. Finally, section “Conclusion” concludes this article.
Related work and preliminaries
In this section, we briefly introduce the necessary background to design our strategy.
Related work
Redundant storage with coding has been widely studied in the data persistence problems to ensure storage reliability and efficiency. In particular, a file of size M is divided into k fragments (with equal size
Existing researches about fountain codes–based data persistence strategies in wireless networks mainly focus on two aspects: the process of disseminating source packets and the encoding algorithm.19–24 In Lin et al., 19 random walks 25 are first used to exchange source packets between storage nodes. It can be modeled as a Markov process, and the next visited node is randomly chosen from all neighbors of the current node. Inspired by Lin et al., 19 most existing strategies use random walks to disseminate source packets since it does not need any routing table. However, they all ignore the inherent broadcast nature of wireless communications. As shown in Wang et al., 26 the data dissemination cost includes both data transmissions and data receptions. It is our intuition that the data dissemination cost can be significantly reduced through broadcasting. To the best of our knowledge, only Liang et al. 24 uses overhearing to improve the data persistence, while the data dissemination cost is not considered. In this article, an efficient variation in broadcast, probabilistic broadcasting (PBcast), is employed for packet dissemination.
The encoding process is another issue in data persistence problems. At first, encoding procedure was carried out in Lin et al. 19 and Ye et al. 20 only after all storage nodes have converged to the steady-state distribution, that is, all random walks were terminated. Then, each node randomly exclusive-or (XOR) some packets stopped on it according to its code degree. Therefore, all stopped packets should be temporarily saved in the corresponding node until running the encoding procedure. This induces excessive storage overhead. To reduce the overhead, encoding procedure in previous studies21–23 is done on the fly, that is, during the process of data dissemination. Specifically, each source packet launches a random walk, and only at its first visit to a node, it is considered for XORing. Therefore, each packet should visit all nodes at least once to fulfill its code degree with probability. However, this result in an uneven degree distribution since the encoding procedure is done regardless of whether the code degree is fulfilled. In the proposed DPRPB strategy, the encoding procedure is performed upon source packets’ each visit unless the code degree has fulfilled or the packet has been XORed. As a result, the desired degree distribution is fulfilled w.h.p. and at the same time it is performed on the fly.
AS networks of PSN
The PSN is a part of IPN, and it is a self-organizing network that contains various space nodes. As shown in Figure 1, space nodes are located in a wide range and they have uneven distribution. At the same time, they work in different regions with either static (e.g. sensors) or mobile (e.g. landers) statuses. The unified management method is not efficient for the PSN because it necessitates too much control messages. Hence, as shown in Figure 2, the PSN is decoupled into a series of AS networks. Each AS network is composed of similar space nodes and can be modeled as a homogeneous network. 5 In this way, the complex PSN is divided into small-scale quasi-static networks. Independent manage scheme can be utilized in each AS network, and only neighboring AS networks exchange the control information to keep connectivity of the whole PSN. This makes the control easier to carry out.

The architecture of PSN on Mars.

The PSN is decoupled into a series of AS networks based on the property of space nodes.
Fountain codes and raptor codes
Fountain codes are a class of rateless codes that can generate unlimited number of encoded packets from a finite number of source packets. Luby transform (LT) codes 27 are the first practical fountain codes and have been applied in many fields such as satellite broadcast and cooperative communications. As an extension of LT codes, Raptor codes concatenate a pre-code with LT codes. This relaxes the constraint that all source packets should be recovered from the LT-coding part, which achieves the linear encoding and decoding complexity. In particular, encoded packets are generated by randomly XORing d source packets, where d is the code degree and is independently drawn from the degree distribution function.
The degree distribution function is the key factor that affects the decoding performance of fountain codes. Let the pre-code rate of Raptor codes be
where
Probabilistic broadcasting: PBcast
As in previous studies,19–24 random walk has been used in most existing data persistence strategies for packet dissemination. The length is set to the network cover time to satisfy overall coverage, where the cover time of a random walk is defined as the minimal length that ensure a source packet visiting the entire network. According to Zhong et al.,
25
w.h.p., the cover time of a connected network with n nodes is
PBcast is an efficient and reliable variation of flooding, the simplest form of broadcast. It can effectively overcome the well-known broadcast storm problem of flooding.
28
During the PBcast process, each node that receives a source packet for the first time will forward (rebroadcast) it with some probability p. W.h.p., the parameter p should be larger than the threshold
where V is the area of network and R is the transmission radius.
Network model and problem description
Network model
Considering that the properties of space nodes in the same AS network of PSN are similar, each AS network can be expressed as a homogeneous network. In this article, we consider the data persistence problem in an AS network. Without loss of generality, we model the AS network as an undirected random simple graph

Example of AS network.
Problem description
For ease of the theoretical analysis and to have a fair comparison with other algorithms proposed in the literature, it is assumed that each space node has limited memory to save only one packet when served as a storage node. We assume that there are
Proposed DPRPB strategy
In this section, we present our data persistence strategy, that is, DPRPB in the AS network. We first describe the proposed DPRPB strategy generally; then, we discuss the choices of two parameters b and p.
Description of the DPRPB strategy
We aim to have all n nodes store an encoded packet corresponding to the k source packets in a decentralized way. The DPRPB strategy is divided into four phases: the initialization phase, the pre-coding phase, the LT-coding phase, and the storage phase. PBcast is used in phase II and phase III for data dissemination. There are three groups of nodes in the AS network: besides the data nodes and storage nodes as described in section “Network model,” there are
1. Phase I. Initialization phase: We assume that the communication in the AS network is slotted and synchronized. At the beginning of the DPRPB strategy, each data node
2. Phase II. Pre-coding phase: Then, in the second phase, that is, the pre-coding phase, data nodes

Packet structure of storage node u.
Due to the small probability p, each PBcast ends after source packets forward few times. Therefore, we can set a timer and beyond which the pre-coding phase can be considered as finished. During the PBcast process, we do not consider the packet loss because it can be resolved at the lower layer.
3. Phase III. LT-coding phase: At the end of the pre-coding phase, each pre-coding node
Here, we discuss the LT-coding procedure, that is, how in DPRPB strategy, each storage node u made its decision about the visited pre-coding packets
Upon a pre-coding packet
In a slight abuse of notation, we use the term “visit”, which refers to the pre-coding packet either received or overheard by a node u in the LT-coding phase, and node u forwards the packet (with probability p) only when it receives this packet for the first time. The overhearing property will increase the number of LT-coding procedures in each node, thus further increases the probability that each node fulfills its code degree.
Figure 5 shows an AS network with

The overhearing property increases the number of LT-coding procedures.
4. Phase IV. Storage phase: During the LT-coding phase, each pre-coding packet
The pseudo-code of DPRPB strategy is described in Algorithm 4.
Selection of the parameters b and p
As stated above, R should satisfy
During the PBcast process of phase II and phase III, each node that receives a packet for the first time will forward it to its neighbors with probability p. In order to ensure each packet visits all nodes at least once, the forwarding probability p should satisfy
We investigate the appropriate parameters through Monte Carlo simulations. We consider two network sizes, that is,

The fraction of storage nodes receiving a source packet (F) versus p: (a)

Total number of transmissions
There exists an optimal configuration of parameters p and b. For instance, we should select
Average node degree versus optimal configuration of parameters in the AS network.
AS: autonomous system.
Data dissemination cost and decoding performance analysis
Two metrics are chosen to evaluate the efficacy and efficiency of the proposed DPRPB strategy: the probability of successfully decoding and the data dissemination cost. In this section, we first investigate the expression for data dissemination cost and then evaluate the decoding performance.
Data dissemination cost
The process of data dissemination contains both data transmission and data reception. Based on Wang et al., 26 data transmission and data reception cost nearly identical energy in wireless networks. Assuming that each data transmission or data reception costs unit of energy, the data dissemination cost in the DPRPB strategy can be described as the total number of data transmissions and data receptions.
During the DPRPB strategy, all source packets and pre-coding packets should visit the entire AS network at least once in the pre-coding phase and LT-coding phase, respectively. All nodes except the data nodes and pre-coding nodes would receive them during the PBcast process. Therefore, the number of receptions
where
At the beginning of the PBcast, data nodes (pre-code nodes) broadcast their source packets (pre-coding packets) with probability 1. During the data dissemination process, each node that receives a packet for the first time would forward it with probability p. Therefore, the number of transmissions
where
The data dissemination cost
Figure 7 has shown the number of data transmissions and data receptions for disseminating a particular packet, which coincides with our expressions.
Successful decoding probability
Here, we discuss the decoding performance of the proposed DPRPB strategy. Recall that the decoding performance is affected by the final degree distribution. If each node fulfills its code degree, all k source packets can be successfully recovered from any
Assuming that pre-coding packet j visits node u
where
The sign function is the constraint on a node that has fulfilled its code degree and would not XOR any other visited packets. Therefore, during the encoding process of the DPRPB strategy, the number of XORed packets would not exceed the code degree. Relaxing the constraint of sign function, we have
Therefore, at the end of the DPRPB strategy, the probability that packet j was XORed by
During the DPRPB strategy, all pre-coding packets are disseminated using independent PBcast process. Therefore, at the end of the DPRPB strategy, the probability that just
where
where T is the number of transmissions in phase III of the DPRPB strategy, and
It should be mentioned that equation (12) relaxes the constraint of sign function. The constraint does not work if the number of pre-coding packets that node u has XORed is less than
Simulation results and discussions
In this section, several Monte Carlo simulations are performed to evaluate the efficacy and efficiency of the proposed DPRPB strategy. We compare it with the representative exact decentralized fountain codes (EDFC) strategy 19 and raptor codes–based distributed storage (RCDS) strategy. 21 The EDFC is the first proposed data persistence strategy using decentralized fountain codes, while RCDS runs the on-the-fly encoding procedure as our strategy. These simulations are carried out through the MATLAB R2011b simulator.
For our simulations, we consider two network sizes with
Successful decoding probability
We compare the decoding performance of the proposed DPRPB strategy with other two representative strategies. To have a fair comparison, we evaluate the successful decoding probability
Figure 8 shows the decoding performance of the proposed DPRPB strategy in comparison with the EDFC and RCDS strategies in terms of the decoding ratio

Successful decoding probability versus decoding ratio
The performance improvement of the DPRPB strategy is benefited from the encoding process, while each node runs the encoding procedure at pre-coding packets’ every visit. As a result, the final degree distribution of the proposed DPRPB strategy is much closer to the desired distribution as shown in Figure 9. This improves the decoding performance compared to EDFC and RCDS strategies. Poor decoding probability of the EDFC and RCDS strategies is deteriorated by their irregular degree distributions. The encoding procedure in the EDFC strategy performs after each random walk terminates, and packets are considered accepting only at their last visited nodes. Due to the shorter length of random walks, nodes that far away from data nodes might not XOR enough packets. In the RCDS strategy, nodes run the encoding procedure only at packets’ first visit, regardless of the code degree. Therefore, nodes with low code degree might XOR less packets than expected and vice versa. This results in poor decoding performance when queries few nodes.

Final degree distribution of the DPRPB strategy (
Data dissemination cost
We compare the data dissemination cost of the proposed DPRPB strategy with other two representative strategies. Table 2 shows the number of transmissions
Total data dissemination costs in each strategy.
DPRPB: data persistence based on raptor codes and probabilistic broadcasting; EDFC: exact decentralized fountain codes; RCDS: raptor codes–based distributed storage.
It is noticed that during the LT-coding process of the DPRPB strategy, each node should temporarily maintain a queue Q to record the visited packets. The queue Q obviously has all m pre-coding packets after the encoding phase. In order to reduce the cache memory while ensuring decoding performance, nodes should discard a visited packet after the code degree has fulfilled or it has been XORed. In general, the proposed DPRPB strategy significantly reduces the data dissemination cost, while improving the decoding performance.
Conclusion
We investigated the data persistence problem in the AS network of PSN, and an efficient DPRPB strategy was proposed based on Raptor codes and PBcast. Unlike most existing strategies, DPRPB uses the PBcast to disseminate packets by exploiting the broadcast property of wireless communications. Theoretical analysis derives the expression of data dissemination cost and proves that the DPRPB strategy achieves a better decoding performance. Simulation results validate the analysis and show that the proposed DPRPB strategy reduces the data dissemination cost significantly in comparison with representative EDFC and RCDS strategies. In addition, the decoding performance is also increased, which is significant for the resource-limited PSN.
Footnotes
Academic Editor: Paul Mitchell
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by NSF of China under Grant Nos 91338201, 91438109, and 61401507.
