Abstract
This article investigates the distributed data storage problem with compressed sensing in the space information network. Since there exists a performance-energy trade-off, most existing strategies focus only on improving the compressed sensing construction performance or reducing the energy consumption, respectively. In order to achieve a better balance, a novel and efficient strategy, referred to as distributed storage strategy based on compressed sensing, is proposed in this article. Unlike other strategies which require source packets visiting the entire network, the proposed strategy is a “one-hop” method since information exchange is only performed between neighbors. Therefore, the compressed sensing measurement matrix depends heavily on the degree of each space node. We prove that the proposed strategy guarantees the compressed sensing reconstruction performance under both sparse orthonormal basis and dense orthonormal basis. Simulation results validate that, compared with the representative CStorage strategy and compressive data persistence strategy, the proposed strategy consumes the least energy and computational overheads, while almost without sacrificing the compressed sensing reconstruction performance.
Introduction
The space information network (SIN) was proposed to solve the problems that different space systems’ built separately and inconvenience in cooperation. It can provide communication, navigation, and remote sensing service simultaneously using various space platforms, for example, satellites, aerial vehicles, high-altitude platforms (HAPs), and terrestrial terminals.1,2 Unlike traditional wireless networks, the network status of SIN was changing dynamically due to the distinguishing characteristics such as self-organizing and large scale. 3 In our prior work, 4 we divided the SIN into a series of hierarchical autonomous system (AS) networks based on the property of space nodes. In this way, the complex SIN was decoupled into quasi-static sub-networks, which makes the control easier to carry out. Each AS network can be modeled as a homogeneous network similar to the terrestrial wireless networks, while the difference is that the nodes in the SIN are bandwidth-limited and energy-limited, these limitations can be jointly described as resource-limited.
In recent years, space information is increasing significantly. According to the satellite database of union of concerned scientists (UCS), the amount of space information in 2020 will be 44 times than that in 2010. 5 However, space nodes in the SIN are resource-limited, and sometimes they may be invisible for a long time due to the shadowing effect. Therefore, how to increase the lifetime of information in the SIN is challenging. Distributed storage is an effective solution, and it adds redundancy into storage systems by combining network technology. 6 This way the original information can be reconstructed by querying a small subset of space nodes even if some nodes are invisible.
The distributed data storage problem has been widely studied in the wireless networks, and information is stored with redundancy among the entire network. At first, traditional erasure codes like low-density parity-check (LDPC) codes 7 and fountain codes 8 are adopted in a decentralized manner to improve the storage reliability.9–12 To reduce the repair bandwidth, Dimakis et al. 13 and Lin et al. 14 proposed the regenerating codes by introducing network coding into storage systems. On the other hand, compressed sensing (CS) theory15–17 has been applied in a variety of wireless distributed storage strategies by exploiting the correlation between typical information.18–23 As a result, both the decoding ratio and data dissemination cost are greatly reduced for the same order of erasure codes. As presented in Wang et al. 24 and Huleihel et al., 25 communication signals between space nodes are also compressible due to the spatial correlation. Therefore, it is significant if the CS theory can be utilized in the SIN.
We investigate the distributed data storage problem in the SIN, and the main objective is to improve the energy efficiency of the AS network such as data dissemination cost and computational overheads. Since there exists a trade-off between the reconstruction performance and the energy consumption, we also aim to make our strategy sacrificing less performance. In this article, storage packet is generated based on the CS theory and the data dissemination process is performed according to the “one-hop” broadcasting mechanism, that is, each node only broadcasts its own source packet once. Thus, information is only exchanged between neighboring nodes. We denote our strategy by distributed storage strategy based on compressed sensing (DSSCS). Detailed theoretical analysis and extensive simulations are provided to evaluate the proposed strategy compared with other representative strategies; the results show that the DSSCS strategy guarantees the CS reconstruction performance under both sparse orthonormal basis (e.g. canonical basis) and dense orthonormal basis (e.g. discrete cosine transform (DCT) basis), while improving the energy efficiency.
The remainder of this article is organized as follows. In section “Background and related work,” we briefly review the AS networks of SIN, CS theory, and the related work about CS-based distributed storage strategies in wireless networks. In section “Network model and problem description,” we define the network model and describe the problem. In section “Proposed DSSCS strategy,” we propose the DSSCS strategy to achieve the balance between CS reconstruction performance and energy efficiency. Then, the validity of DSSCS strategy and data dissemination cost are analyzed in section “Theoretical analysis.” In section “Simulation results and discussions,” simulation results and discussions are presented. Finally, we make conclusion in section “Conclusion.”
Background and related work
In this section, we briefly introduce the necessary background to design our strategy.
AS networks of SIN
The SIN is a space-based information infrastructure that contains different kinds of space nodes. As shown in Figure 1, these nodes are located in different altitude of space orbit and work within various circumstances with either dynamic (e.g. low earth orbit satellites) or static (e.g. grid topology on the earth) statuses. The unified management approach is not efficient for SIN because it induces lots of control messages, which consume excessive energy and network capacities. Therefore, as shown in Figure 2, we decouple the SIN into a series of hierarchical AS networks, each AS network contains a collection of similar space nodes and can be modeled as a homogeneous network. 4 For example, AS-1 contains the satellites, AS-2 contains the airplanes, AS-3 contains the HAPs, and AS-4 contains the ground terminates. In addition, individual AS network can be further divided into sub-AS networks if necessary. Independent manage strategy can be adopted in each AS (sub-AS) network. This will decouple the complex SIN into quasi-static sub-networks, which makes the control easier to carry out.

The architecture of SIN.

The whole SIN is divided into a series of AS networks based on the property of space nodes.
CS
CS theory15–17 states that sparse or compressible information can be successfully reconstructed at a sub-Nyquist sampling rate with high probability (WHP). In particular, suppose an N-dimensional vector
where
where
where
Related work
CS theory has been widely studied in the distributed storage of wireless networks, which improves both storage reliability and energy efficiency for the same order of erasure codes. To the best of our knowledge, the works by Talari and Rahnavard
18
and Liu et al.
19
are the most representative strategies based on CS for distributed storage in wireless networks. However, they focus on different performance metrics since there exists a trade-off between the CS reconstruction performance and the energy efficiency. Aiming at reducing the data dissemination cost, Talari and Rahnavard
18
propose the CStorage strategy for distributed data storage in wireless sensor networks. During the data dissemination process, only
On the other hand, Liu et al.
19
propose the compressive data persistence (CDP) strategy for improving the CS reconstruction performance on different types of signals. Random walk
27
is employed for disseminating packets throughout the entire network since it does not need any location information and can be used in arbitrary topology. During the encoding process, each node launches r random walks and the length is set to the cover time to reach the equilibrium distribution, where the cover time is defined as the expected length that ensures a source packet visiting all nodes at least once. As a result, each column of the measurement matrix has almost r non-zero elements and they prove that
Since space nodes in the SIN are resource-limited, it is significant if less energy and computational overheads are consumed. Inspired by Talari and Rahnavard
18
and Liu et al.,
19
we propose a novel and simple, but efficient, DSSCS strategy for AS network to achieve a better balance between the CS reconstruction performance and energy efficiency. In our strategy, nodes only exchange information with its direct neighbors. Hence, the measurement matrix
Haupt et al.
20
first propose a decentralized CS-based strategy for networked storage, the random gossiping is employed for disseminating source packets. As a result, the measurement matrix
Yang et al. 22 propose the compressive network coding–based distributed data storage (CNCDS) strategy by introducing network coding into wireless storage systems. Although the storage reliability is improved, similar to Talari and Rahnavard, 18 they ignore the universality of different orthonormal basis. Gong et al. 23 propose the spatiotemporal compressive network coding (ST-CNC) strategy by exploiting the spatial and temporal correlation simultaneously, while rigorous proof is not provided and the signal model is not suitable for the SIN.
Network model and problem description
Network model
In this article, we mainly focus on the distributed data storage problem in an AS network. Suppose that the AS network consists of N similar space nodes distributed uniformly at random with omnidirectional antennas, and they have identical transmission radius R. Then, the AS network can be modeled as an undirected random geometric graph

Example of AS network.
In our AS model, we don’t make any assumption about the routing table since the proposed strategy is fully distributed. Each space node only knows the local information, which can be acquired from its neighbors. The global information (e.g. the network size) is not available for each space node, which is different from most existing strategies, for example, Talari and Rahnavard 18 and Liu et al. 19 For convenience to our theoretical analysis, we also assume that the link is symmetric and obstacle-free, and there is a link (either direct or indirect) between any two space nodes. Then, we provide the definition of node degree.
Definition 1: node degree
Denote by
For an AS network with N nodes uniformly and independently distributed in the region V, the probability that two nodes are neighbors, that is, the link probability
Then, the density of network d can be expressed as
Therefore, each node in the AS network almost has the same order of node degree
Problem description
As presented above, we mainly focus on the distributed data storage problem in an AS network, where N similar space nodes are distributed uniformly at random in the region V. Each node is responsible for generating, relaying, and storing information. For convenience to the theoretical analysis and fair comparison with other strategies, each node i has a limited memory and generates one symbol
Proposed DSSCS strategy
In this section, we present our strategy, DSSCS, for distributed data storage in the AS network. The proposed DSSCS strategy is divided into four phases: initialization, CS encoding, storage, and reconstruction phase. We aiming that every space node in the AS network stores one CS measurement in a decentralized manner, that is, the encoding process is down without unified management. In order to reduce the data dissemination cost and without sacrificing the CS reconstruction performance, each node broadcasts its own source packet only once. Therefore, the measurement matrix is obtained based on the local information of space nodes. The detailed descriptions of the DSSCS strategy are given below.
Phase 1: initialization phase
We assume that the transmission in the AS network is synchronized and slotted. At the beginning, that is, the initialization phase, each node
Phase 2: CS encoding phase
In the CS encoding phase, every node disseminates its source packet according to the “one-hop” broadcasting mechanism, that is, each node broadcasts its source packet only once. It is a special case of PBcast with the forward probability
The measurement of each node depends on the source packets it receives. We assume that node i receives three packets during the CS encoding phase:
The above equation can be expressed as the matrix form
The final packet structure stored in node i is shown in Figure 4, which consists of the measurement coefficient set

Storage packet structure of node i.
During the CS encoding phase, each node independently runs the CS encoding procedure and updates its own storage packet as shown in equation (7). Therefore, the encoding process can be expressed as
where
Here, we discuss the choice of measurement coefficient
When N is sufficiently large
For convenience to our theoretical analysis and without sacrificing the CS reconstruction performance, we set the measurement coefficient
Hence, the measurement matrix
According to equation (6), each node has the same order of node degree

The bipartite graph between N nodes and N source packets.
Phase 3: storage phase
During the CS encoding phase, each node
Phase 4: reconstruction phase
To reconstruct the source vector
where
The pseudo-code of DSSCS strategy is illustrated in Algorithm 1.
Theoretical analysis
In this section, we first prove that the proposed DSSCS strategy guarantees CS reconstruction performance and then investigate the expression for data dissemination cost.
Proof of CS reconstruction performance
As presented above, the measurement matrix
Definition 2: isotropy property
For an N-dimensional signal vector
Definition 3: incoherence property
Let
where
Candes and Plan
30
proved that, when the matrix
Theorem 1
For an N-dimensional signal vector
Proof of Theorem 1
Let
Then, we have
In addition, the deviation of
It is evident from Theorem 1 that the isotropy property is mainly based on the measurement matrix
Theorem 2
For an N-dimensional signal vector
Before proving the correctness of Theorem 2, the following lemma is first provided.
Lemma 1 (Conditional incoherence property 1)
Let
Proof of Lemma 1
Since
Based on equation (13), we have
Based on the Chebyshev inequality, 31 we have
where
Hence, Lemma 1 holds.
Then, we prove the correctness of Theorem 2, it is a special case of Lemma 1.
Proof of Theorem 2
Since
Therefore, the matrix
Next, we consider that the signal vector
Theorem 3
For an N-dimensional signal vector
Before proving the correctness of Theorem 3, the following lemma is first provided.
Lemma 2 (Conditional incoherence property 2)
Let
Proof of Lemma 2
Based on equation (15), the incoherence measure
Since
where
Hence, Lemma 2 holds.
Then, we prove the correctness of Theorem 3.
Proof of Theorem 3
The elements of the DCT orthonormal basis are defined as follows
Since
Data dissemination cost
In this subsection, we investigate the expression for data dissemination cost of the proposed DSSCS strategy. As presented above, data dissemination is the most consuming process and it is proportional to the number of data transmissions and data receptions. According to Mobius et al.,
28
in wireless communications, almost identical energy is consumed for data transmission and data reception. We assume that each transmission/reception consumes unit energy, thus the data dissemination cost
Let N be the number of space nodes in the AS network. During the data dissemination process, every node broadcasts its own source packet only once. Therefore, the data dissemination cost of the proposed DSSCS strategy is
where
We should notice that the data dissemination cost is only related to the number of space nodes in the AS network. However, in the CStorage strategy
18
and the CDP strategy,
19
each source packet should visit the entire network at least once, and this is down by choosing appropriate parameters such as the forward probability p or the length of random walk t. Table 1 shows the data dissemination cost for all three strategies, where
Total data dissemination cost in each strategy.
DSSCS: distributed storage strategy based on compressed sensing; CDP: compressive data persistence.
Simulation results and discussions
In this section, we first investigate the appropriate parameters of the proposed DSSCS strategy through extensive simulations. Then, to evaluate the effectiveness of DSSCS strategy, we compare it with the CStorage strategy 18 and the CDP strategy 19 using various performance metrics.
Simulation parameters and performance metrics
We consider an AS network consists of N similar space nodes distributed uniformly at random in a normalized
To overcome the effect of orthonormal basis, we use both sparse basis and dense basis in the following simulations. Without loss of generality, we consider signals are sparse in the canonical basis
Three metrics are chosen to evaluate the CS reconstruction performance and efficiency of those strategies, that is, the successful CS reconstruction probability
Performance on different parameters
In this subsection, we present simulations to investigate appropriate parameters of the proposed DSSCS strategy. Since the measurement matrix
N. The number of space nodes in the AS network.
C. The positive constant that affects the transmission radius R.
S. The sparsity of the signal vector.
First, we investigate how the number of space nodes N affects the CS reconstruction performance of the DSSCS strategy. We fix the transmission radius
The reconstruction probability increases as
There exists an intersection point between different curves for both signals, for example,
The density of AS network increases as N gets larger, which also induces higher data dissemination cost and computational overheads. In addition, the ratio between d and N decreases with increasing number N, which results in a relative sparse measurement matrix. In order to ensure the CS reconstruction performance and without loss of generality, we set

The successful reconstruction probability
Total data dissemination cost and computational overheads in terms of N.
AS: autonomous system.
Next, we investigate the relationship between the CS performance and the transmission radius R. We fix the number of space nodes and sparsity at

The successful reconstruction probability
Total data dissemination cost and computational overheads in terms of parameter C.
AS: autonomous system.
Finally, we investigate the effect of the sparsity S on the performance of DSSCS strategy. We set

The successful reconstruction probability
Comparison with other strategies
To evaluate the effectiveness of the proposed DSSCS strategy, in this subsection, we compare it with the CStorage strategy
18
and the CDP strategy.
19
As presented above, these two strategies are the most representative strategies while they focus on different performance metrics.
Figure 9 shows the successful reconstruction probability of the proposed DSSCS strategy in comparison with the CStorage strategy and the CDP strategy for signals are sparse in canonical basis

The successful reconstruction probability
We also notice that the CS reconstruction performance of the DSSCS strategy is slightly worse than the CDP strategy in canonical basis especially when
The data dissemination cost and computational overheads of three strategies are shown in Figures 10 and 11, respectively. The data dissemination cost is expressed as dB since the values differ several orders of magnitude. It is evident from the results that the proposed DSSCS strategy consumes the least energy and computational overheads. This is benefited from the process of data dissemination. In addition, we notice that the consumption (overheads) of the DSSCS strategy and CDP strategy are independent of the querying ratio

The data dissemination cost of three strategies (DSSCS, CStorage, and CDP).

The computational overheads of three strategies (DSSCS, CStorage, and CDP).
Conclusion
We investigated the distributed data storage problem in the AS network of SIN, a simple but efficient DSSCS strategy was proposed based on the CS theory. Compared with most existing strategies that either use broadcast or random walk to disseminate source symbols, DSSCS utilizes the local information through “one-hop” broadcasting mechanism. In this way, not only the dissemination cost and computational overheads are reduced significantly but also guarantees the CS reconstruction performance under both sparse and dense orthonormal basis. Our simulation results validated the theoretical analysis and effectiveness of the DSSCS strategy.
Although the assumptions presented in the network model have been widely used in existing distributed data storage strategies, some of them may not be practical in reality. Our future works will mainly focus on the storage strategy under more practical conditions. In addition, the proposed strategy is a general method and may be extended to other networks such as wireless sensor networks and interplanetary Internet (IPN). We will also consider these issues in the near future.
Footnotes
Academic Editor: Jaime Lloret
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
This work was supported by NSF of China under Grant Nos 91338201, 91438109, and 61401507.
