Abstract
A sensor array produces lots of bits of data every sample period, which may cause a heavy burden on the long-distance wireless data transmission, especially in the scenarios of wireless sensor networks. A lossy but error-bounded sensor array data compression algorithm is proposed, in which the major part of the sensor array data are approximated by the Catmull-Rom spline curve and the residual errors are quantized and encoded with Huffman coding. The performance of this algorithm has been evaluated with a real data set from the wireless hydrologic monitoring system deployed in Qinhuangdao Port of China. The results show that the algorithm performs well for the hydrologic sensor array data.
Introduction
Hydrologic data have been employed and making considerable contributions to many applications, such as helping scientists and researchers understand the hydrologic processes and formulate models for hydrologic prediction,1,2 offering valuable information for mariners to navigate across the ocean, 3 and assisting managers at the coastal ports to guide the vessels. There have been far more similar applications. As a result, it is of great significance to collect the hydrologic data to provide essential support to these wide range of situations.
In recent years, wireless sensor networks 4 offer a superb solution for the long-term, long-distance, and automatic hydrologic data collecting. Wireless sensor networks, which consist of wireless sensor nodes with the capability of sensing, data processing, and wireless communicating, have been extensively applied in various fields. 5 Wireless sensor networks promise an advantage over the traditional sensing methods in many ways. Besides, they also present a preferable choice in risk-associated applications in the inaccessible regions, such as habitat monitoring, environmental observation, underwater acoustic sensing,6,7 and hydrologic monitoring. 8 In fact, hydrologic monitoring systems based on wireless sensor networks have already been developed and deployed in different cases, such as the prototype network of meteorological and hydrological sensors in the Yosemite National Park in the United States, 9 the flash-flood alerting system with wireless sensor networks for monitoring the environment and tracking natural disasters in the Andean region of Venezuela, 10 and the remote monitoring system for heavy metal ions in water. 11 In Qinhuangdao Port of China, a sea route monitoring system has also been deployed, whose major function is to provide underwater flow velocity data to facilitate the safe navigation of the to-and-fro ships. 12
A sensor array is a group of sensors usually deployed in a certain geometric pattern to sense multiple parameters. Sensor arrays can also be used to record underwater hydrologic conditions at different depths. The wireless hydrologic monitoring system in Qinhuangdao Port of China makes use of sensor arrays to collect underwater flow velocity data to guarantee the safety of the sea routes, as depicted in Figure 1. The sensor array consists of a series of equally spaced sensors. The sensors are installed to a vertical steel cable, and the steel cable is suspended from a buoy. The sensors in the array record flow velocity at different depths and then the data collected will be transferred to the controller on the buoy, which will forward the data further to the shore station through wireless communications, artificial satellites for long-distance communications in particular.

A wireless hydrologic monitoring system.
Data compression is one of the most important schemes to decrease the size of the transmitted data. It leads to a reduction in the required wireless communication, which is the main power consumer in wireless sensor networks. 13 The sensor array produces lots of bits of data every sample period, which causes a heavy burden on the long-distance wireless data transmission in the hydrologic monitoring systems. All the data should be transferred in a real-time and error-bounded way to support the accurate information publishing for the vessels. Therefore, a compression algorithm for the data of the sensor array is in urgent need to lower the number of bits in data transmissions so as to save transmission cost and also extend the working time of the buoy equipped with batteries.
This article aims to propose an error-bounded low-complexity data compression algorithm for the sensor array in the wireless hydrologic monitoring system. To be specific, our contribution lies mainly in the following aspects. We have introduced a Catmull-Rom Spline curve Approximation algorithm with Error Compensation (CSA/EC) based on the pre-calculated Moore–Penrose pseudoinverse matrix for the sensor array data with a fixed number of equally spaced control points. Note that the algorithm does not depend on the historical data of sensors to ensure the receiver decompresses data in a stateless manner. Then, we have presented a data compression framework, in which the major part of the sensor array data are represented by the Catmull-Rom spline curve and the residual errors are quantized and encoded with Huffman coding. Finally, we have evaluated the performance of our algorithm and other error-bounded compression algorithms with a real data set. The result shows that our algorithm performs well for the hydrologic sensor array data.
The rest of this article is organized as follows: Section “Related works” introduces related works. Section “Preliminary” illustrates the mathematical model. Section “Description of CSA/EC algorithm” proposes our data compression algorithm. Section “Experimental results” presents experimental results. Finally, section “Conclusion” concludes this article.
Related works
Huffman coding 14 is an entropy variable-length encoding algorithm used for lossless data compression. Huffman coding adopts a specific method for choosing the representation for each symbol based on the frequency of occurrence of the symbol. A symbol that has a very high probability of occurrence generates a code with very few bits, while a symbol with a low probability generates a code with a larger number of bits. Codes are stored in a codebook to encode and decode. Huffman coding is one of the most efficient compression methods of individual source symbols when the actual symbol frequencies agree with those used to create the code. However, if the data of all sensors in the sensor array are regarded as a vector symbol, the codebook will be too large to construct or store in the buoy controllers; if regarded as a collection of individual scalar symbols, the data correlation of adjacent sensors is totally ignored. Thus, Huffman coding is not an ideal approach for data compression of the hydrologic sensor array.
Lossy compression methods ought to be taken into consideration as regards to the data compression of the hydrologic sensor array since the data here are not required to be exactly accurate. Lossy compression methods usually produce good compression ratio in general, for instance, JPEG (Joint Photographic Experts Group), 15 discrete cosine transform (DCT), 16 and vector quantization (VQ). 17 If the sensor array data are treated as a multi-dimensional vector, some dimensionality reduction methods may be able to apply in our situation. Principal Component with Vector Quantization (PCVQ) 12 compression method has been proposed for the hydrologic data. However, although the average data error decreases with an increase in the size of the codebook, the maximum error is not bounded. Since the maximum errors of these lossy compression methods are not guaranteed to be bounded, it may lead to inaccurate results in the worst cases.
As a basic lossy but error-bounded compression method, quantization is the procedure for reducing the number of bits needed to store an integer value by reducing the precision of the integer. Quantization is usually for scalar values. An adaptive piecewise constant approximation (APCA) algorithm 18 was designed to apply to time-series data set but it is also suitable for the space-series data, such as the data from sensors at different depths in our case. APCA algorithm ignores certain fluctuation in the data. A segment grows until deviation exceeds the error threshold and then the segment can be represented by a single value. The maximum error by quantization and APCA algorithms can be bounded to a certain constant, which will play a pivotal role in some specific data analyses.
We summarize all the compression algorithms mentioned above in Table 1.
The comparison of mentioned compression methods.
JPEG: Joint Photographic Experts Group; DCT: discrete cosine transform; VQ: vector quantization; APCA: adaptive piecewise constant approximation.
In addition, there are kinds of curves that are represented with just a small number of parameters, that is, control points for lossy data compression. Data can be regenerated using control points through interpolation at the receiver.19–21 Utilize spline interpolation to reconstruct the physical world. Quadratic Bézier curve and Catmull-Rom spline were introduced to compress video data.22,23 Reducing the data errors caused by the compression, 20 acquires control points through adjusting the sensing frequency and 23 selects control points according to the maximum allowed error. The original versions of the related data compression methods with splines are usually not error-bounded, or not for real-time applications. But they laid a foundation to construct the enhanced algorithm introduced in this article.
Preliminary
Catmull-Rom spline
Catmull-Rom spline
24
is a family of the
The

A segment of Catmull-Rom spline.
Now that the segment of the Catmull-Rom spline is cubic, and the
Equation (1) is restricted by
By combining equations (1) and (2), we always have
where
Representing sensor array data
Consider that there are N sensors
At a given moment, let the value of sensor
Now, let us approximately represent data

Representing sensor array data with the Catmull-Rom spline curve.
Let M denote the number of control points of the Catmull-Rom spline and
Obviously, there are
Let
In other words,
In addition, we place two additional virtual control points for the first and the last segments of the spline, as illustrated in Figure 4.

Additional virtual control points.
The virtual control points satisfy
If the
where ⌈·⌉ is used to denote the ceil function
Now, we calculate the reconstructed data by the tunable parameters
From equation (3), the interpolated value
Now, define
Thus, equation (10) can be written as
Furthermore, equation (12) leads to
where
and
Hence, we solve the optimal parameters
where ∥·∥ is used to denote the matrix norm
Let
Note that
Description of CSA/EC algorithm
A novel sensor array compression algorithm is introduced in this article, abbreviated to CSA/EC. The framework of the algorithm is illustrated in Figure 5.

Framework of the data compression algorithm based on Catmull-Rom spline approximation.
There are two parts in Figure 5. The left part depicts the procedures in the transmitter and the right part depicts those in the receiver. The sensor number N and the control point number M are given constants.
In the transmitter, the sensor array generates data
The approximate value of the sensor array data by the spline parameter
Note that
Quantization is applied to
The receiver reconstructs the approximate value
CSA/EC is depicted in Algorithm 1. When a group of data are produced by the sensor array, it will be compressed and the compressed data can be transmitted to the satellite or other wireless channels.
In our compression algorithm, the data errors are only derived from the quantization of the residual error. Thus, the maximum error is bounded by the quantization step size. It is worthy to pay attention to the fact that the computational delay of CSA/EC is constant. It is a low-complexity compression algorithm and is easy to be implemented in the embedded controllers.
The decompression procedure of CSA/EC is depicted in Algorithm 2. Note that the decompression algorithm is also of low complexity.
Experimental results
We conduct experiments to evaluate the effectiveness of the proposed algorithm based on a practical data set collected by a sensor array in Qinhuangdao Port during the period from September 2012 to December 2012. The data set is demonstrated in Figure 6.

Visualization of the data used in the experiments.
There are

Time series of sensor values from 12:06:54 p.m. on 15 December 2012 to 05:06:54 a.m. on 16 December 2012.
Each subgraph in Figure 7 depicts a time series which contains the sensing values of 100 sample periods generated by a given sensor. For simplicity of presentation, we only list the subgraphs corresponding to the sensors labeled No. 1, No. 5, No. 10, and No. 15. Here, the sensors are indexed from top to bottom. The beginning time of the segment is 12:06:54 p.m. on 15 December 2012 and the ending time is 05:06:54 a.m. on 16 December 2012.
All the algorithms mentioned in this section are implemented with MATLAB. The following experiments were conducted on a desktop computer with 8 GB memory and an Intel i7 processor (2.5 GHz). But it is worth noting that the proposed CSA/EC algorithm is not time consuming and we can easily implement it on an embedded controller.
Figure 8 shows the result of interpolating sensor array data with the Catmull-Rom spline at one sample moment. Comparing the original data with the interpolated data with different numbers of control points, we conclude that the interpolated data on the spline curves reflect the major fluctuation of the original data. The increase in the number of control points often, but not always, leads to the decrease in the residual errors. Theoretically, with the same number of control points as the sensors in the array, we will obtain a perfect Catmull-Rom spline which is identical to the original data but without any data reduction.

Original data and interpolated data.
As previously mentioned, the interpolated values by the Catmull-Rom spline only reflect the major part of the sensor array data. However, the residual errors are concentrated to zero, as depicted in Figure 9.

The distribution of the residual errors.
As the values of original sensor data are sparsely distributed from −4096 to 4095, it is observed that the residual errors range approximately from −600 to 600, and actually 70% of them are limited from −128 to 128. Thus, for the residual errors, Huffman coding is efficient and the size of the codebook is small.
Now, we compare our algorithm with other error-bounded lossy data compression algorithms. The goal of our algorithm is to markedly reduce the transmitted data. To measure the data reduction performance of compression algorithms, we define compression ratio as
where
All the packet formats to store the compressed data of different algorithms are shown in Figure 10.

Packet format to represent the compressed data.
Quantization is a basic lossy compression method. The values of each sensor are quantized and transferred one by one in a packet. If the maximum error bound
APCA is another error-bounded lossy data compression method to be compared. In APCA, the compressed data are represented as
ACSA (Adaptive Catmull-Rom Spline curve Approximation) is a simple error-bounded lossy data compression with spline function inspired by Chakrabarti et al.,
18
Reise et al.,
19
and Li et al.
20
In the algorithm of ACSA, the control points are selected as few as possible constrained by the maximum error bound. Initially, it selects the minimal number of control points to construct the spline. If the residual error of a point is beyond the maximum error bound, the point will be added as a control point. The aforementioned procedure is executed iteratively until all the residual errors are smaller than the maximum error bound. The compressed data can also be represented as
In our algorithm, that is, CSA/EC, the compressed data consist of two parts. The former is a fixed-length record to represent the Catmull-Rom spline parameters. The latter is a variable-length record to represent the residual errors. In our experiments, the number of control points of the Catmull-Rom spline is set to 4. The quantization step size of
Figures 11 and 12 show the sizes of the compressed data by different methods in detail for 100 samples randomly chosen from the real data set with the maximum error bounds set to 8 and 32, respectively. The performance of APCA and ACSA has similar results because both of them dynamically select the control points. CSA/EC has a relatively small size of compressed data compared to other algorithms with the maximum error bound

The compressed data sizes of 100 sample periods (

The compressed data sizes of 100 sample periods (
Using the complete data set from September 2012 to December 2012, we obtain a more convincing results. Figure 13 presents comparison of average compression ratios of CSA/EC and other lossy error-bounded algorithms with different maximum error bounds. All these algorithms achieve higher compression ratios with an increase in the set maximum error bound. CSA/EC has an outstanding advantage over other algorithms. The details of CSA/EC and other algorithms in terms of compressed data size and compression ratio with different maximum error bounds on our data set are listed in Table 2.

The compression ratios of different maximum error bounds.
The comparison of CSA/EC and other algorithms.
CSA/EC: Catmull-Rom Spline curve Approximation algorithm with Error Compensation; APCA: adaptive piecewise constant approximation; ACSA: Adaptive Catmull-Rom Spline curve Approximation
The experimental results indicate that the CSA/EC algorithm is obviously superior to other algorithms. The sensor array data are saved 41.3% of the original size when
Conclusion
Sensor arrays in a certain geometric pattern are often used to record conditions. Data compression is of great importance to data collecting with wireless sensor networks since the sensor array produces lots of bits of data every sample period. We propose an error-bounded lossy sensor array data compression algorithm, in which the major part of the sensor array data are represented by the Catmull-Rom spline curve and the residual errors are quantized and encoded with Huffman coding. The algorithm is also of low complexity and a stateless, which can be easily implemented in the embedded micro-controller in a wireless sensor network.
We have evaluated the performance of our algorithm and other error-bounded compression algorithms with a real data set from the wireless hydrologic monitoring system deployed in Qinhuangdao Port of China. The result shows that our algorithm performs well for the hydrologic sensor array data. Furthermore, our algorithm is not limited to underwater hydrologic monitoring. It is suitable for any sensor array in a linear geometric pattern as long as the major part of the data are possibly represented by a spline and the residual errors are convergent enough.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by National Natural Science Foundation of China (61370191), MoE (Ministry of Education) and China Mobile Limited Joint Foundation (No. MCM20160103), the Fundamental Research Funds for the Central Universities of China (FRF-BR-15-081a), and the Foundation of Beijing Engineering and Technology Center for Convergence Networks and Ubiquitous Services.
