Abstract
Imaging is an important means to explore the ocean for underwater robotics. The diffuse attenuation coefficient of light in water is one of the most important optical properties of seawater. This article presents a model-based method to analyze the causes of distortion of underwater images. We built a platform for underwater image acquisition and target recognition. The model coefficients were calibrated with images captured underwater and in air. Experiments were carried out to verify the designed algorithm and the transmission error model. The experiments show that the presented method works well in improving the accuracy of feature extraction and recognition of underwater targets.
Introduction
In recent years, demand for subsea inspection has increased with the growing number of activities in the deep ocean. 1 Underwater robots have been increasingly deployed for numerous missions related to oceanography, hydrography, and coastal and inland water monitoring. 2 Many researchers have investigated underwater robotics from different perspectives. Some work focuses on tracking control and navigation of an autonomous underwater vehicle (AUV). 3 Other work addresses formation control of a group of AUVs. 4 Still other studies focus on task assignment and path planning. 5 Moreover, the development of underwater robotics has greatly contributed to target search, undersea mine clearance, and pipeline inspection. 6
Imaging is regarded as a promising method for sensing the underwater environment, 7 and the camera has become an indispensable sensor on underwater robots. 8 Through the efforts of many scholars, underwater optics now plays an important role in target recognition, map creation, underwater positioning, AUV homing and docking, 9 and many other fields.
Although significant progress has been achieved in pattern recognition on land, 10 computer vision, and graph matching, 11 underwater optical vision still faces great challenges. It is difficult to obtain high-quality images of underwater objects, mainly because of the wavelength-dependent absorption and scattering of light as it passes through water. These effects distort the spectral energy distribution in underwater images and cause the images to be dark and hazy. 8,12 –16 Some research has sought to remove the effects of the water column on images. 17,18 Other studies are devoted to modeling the underwater imaging process, calibrating the optical properties of water, and correcting underwater images. 8 The attenuation of light has been modeled using the Beer–Lambert law in most studies, 18 –21 but different models consider different specific factors; see, for example, the study by Guo et al. 8
Another problem is feature recognition. Several studies have proposed target recognition systems and algorithms. In 1986, John F Canny introduced the Canny edge detector, which uses a multistage algorithm to detect a wide range of edges in images. 22 Image difference methods have been used by Lipton, 23 Dubuisson, 24 and others to detect moving targets in images. Latecki et al. used discrete curve evolution to obtain simplified shape information. 25 Hao et al. applied the epsilon method to shape description and set a variety of scale parameters to obtain the contour feature parameters. 26 The experiment designed by Kauppinen and Seppänen showed that the Fourier-based descriptor had the best shape recognition ability among the methods compared. 27
In this article, we construct a preliminary error model for underwater images that accounts for the major error factors. We have also designed an algorithm for co-segmentation and feature recognition. Using a library of Fourier descriptors and compensating for distortion based on the model, the recognition rate is greater than 88% in most experimental groups.
We take the underwater formal target as the object of study. Visual detection and feature recognition in underwater images are urgently needed in marine engineering for detecting submarine cables, pipes, and other underwater targets. Man-made objects in the ocean usually have regular shape features.
The structure of this article is as follows. The underwater optical signal transmission channel error model is investigated in the second section. The method for image segmentation and feature recognition is presented, along with detailed algorithms, in the third section. The experimental setup is described in the fourth section, followed by experiments and results in the fifth section. The sixth section presents our conclusions.
Underwater optical signal transmission channel error model
Light attenuates severely in the underwater environment, so underwater robots need to carry an artificial light source for image acquisition. The key issues are improving the quality of underwater images and extracting useful information in a complex underwater environment. Many factors restrict the development of underwater vision, such as the scattering and attenuation of light, the degradation of underwater image quality, and information loss.
The error model is constructed by studying the influence of various factors on losses in the transmission of the underwater optical signal. With the model, image distortions can be compensated and image quality improved. The resulting recognition accuracy of the underwater target detection algorithm serves as an important reference in underwater image processing.
Underwater optical signals are inevitably associated with loss and noise in the process of reaching the acquisition device (see Figure 1). In addition, due to the complicated imaging environment of the ocean, the acquired image is degraded by color distortion and decreased contrast.

Figure 1. The process of underwater optical signal transmission. The yellow arrow and green arrow represent the light captured by the camera. The orange arrow indicates natural light and the blue arrow indicates artificial light. The error model considers several major factors in the loss of light in water: absorption, scattering, refraction, and lens distortion caused by the camera on the AUV. The model structure is derived from the basic properties of light propagation in water. AUV: autonomous underwater vehicle.
To construct an error model for underwater optical signal transmission, we must quantify and regularize the factors that cause loss of image information. The error models are based on the optical loss analysis of the Beer–Lambert law. However, different models focus on different influencing factors. 21 Based on a comprehensive review of the literature on this topic, we make the following assumption. Using a comparison of the distortion of the same target information underwater and in air, the influence coefficient of each factor in the model is obtained. We consider factors such as water turbidity, the distribution of light, the chemical composition of water, and other physical properties.
Current research on underwater optical image processing shows that the absorption of light, which follows the Beer–Lambert law, is the most important cause of image distortion in water. The scattering of light in water is another important cause of optical signal loss, so the scattering of the light from the target must be considered. Water flow associated with internal waves intensifies the scattering (a positive correlation), so the flow within the water body, the flow rate, vortices, and similar effects must also be considered. The camera matters as well, especially the focal length f, the object distance, and the viewing angle θ.
L(x, y) is a matrix that consists of all points of the target imaged in air. A pixel matrix M on the image plane corresponds to the point matrix L on the object, and the pixel lattice M(i, j) describes the entire image information. A point (x, y) on the target corresponds to the pixel (i, j) in the image. The pixel lattices for the target viewed in air and for the same target underwater are not identical. Each point in the set L can be represented as a graph vertex i with an appearance descriptor, such as a Fourier descriptor, computed around (x, y), so that point correspondence is well defined and can be solved effectively by graph matching. When the image of the underwater target is acquired, a new matrix L′(x, y) and a new pixel lattice M′(i, j) are obtained. The distortion error is calculated by comparing M(i, j) and M′(i, j). If the target is simplified as a planar shape of n × n pixels, then the error of the planar shape is an n × n matrix, denoted Error.
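The comparison of M(i, j) and M′(i, j) can be pictured with a minimal Python sketch. It is not the authors' definition of Error: it assumes, purely for illustration, that both lattices are binary masks of equal size and that the error is their element-wise difference. The names `error_matrix`, `distortion_ratio`, and the 3 × 3 sample masks are hypothetical.

```python
def error_matrix(m_air, m_water):
    """Element-wise difference M' - M of two equally sized pixel lattices (assumed form)."""
    same_rows = len(m_air) == len(m_water)
    if not same_rows or any(len(a) != len(w) for a, w in zip(m_air, m_water)):
        raise ValueError("lattices must have the same shape")
    return [[w - a for a, w in zip(row_a, row_w)]
            for row_a, row_w in zip(m_air, m_water)]


def distortion_ratio(error):
    """Fraction of pixels whose value differs between the two lattices."""
    total = sum(len(row) for row in error)
    changed = sum(1 for row in error for e in row if e != 0)
    return changed / total


# Hypothetical 3 x 3 binary masks (1 = target pixel, 0 = background).
M_AIR = [[0, 1, 0],
         [1, 1, 1],
         [0, 1, 0]]
M_WATER = [[0, 1, 1],
           [1, 1, 1],
           [0, 0, 0]]

ERROR = error_matrix(M_AIR, M_WATER)
print(ERROR)
print(distortion_ratio(ERROR))
```

A nonzero entry marks a pixel that belongs to the target in one lattice but not in the other, so the ratio gives a crude single-number summary of the distortion.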
The model structure is derived from first principles and represents the absorption, scattering, and refraction by water and the optical properties of the image sensor. Based on the above analysis of the variability of the optical parameters of water, equation (2) is simplified as
where T is the transmittance of light, T = I/Io (I: received light; Io: initial light), which follows the Beer–Lambert law and depends on the turbidity of the water (Tu).
G represents reflection and refraction, which are related to the wavelength of light λ and the fluidity of the water w. T and G cannot be obtained separately; only their sum c = T + G is available, given by
where di is the distance between the target and the camera, and Q represents the distortion of light from the target to the camera (related to the camera focal length f, the acquisition distance D, and the observation angle θ). Denote the coordinates of point P as P(xp, yp) and the object distance as zp. The image point, denoted P′(x′p, y′p), and the image distance, denoted z′, are given by 28 (Snell’s law)
and view angle θ is given by
L(x, y) is the target information in air, and α is the random error, which follows a normal distribution.
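The Beer–Lambert behavior behind the transmittance T = I/Io can be illustrated with a short Python sketch. It assumes the textbook exponential form T = exp(−k d) with a single attenuation coefficient k (in 1/m) and path length d (in m). The symbol k and the numerical values are assumptions for illustration and are not the calibrated coefficients of the model.

```python
import math


def transmittance(k, d):
    """Beer-Lambert transmittance T = I / Io = exp(-k * d).

    k: attenuation coefficient in 1/m (assumed constant along the path)
    d: path length between target and camera in m
    """
    if k < 0 or d < 0:
        raise ValueError("k and d must be non-negative")
    return math.exp(-k * d)


def received_intensity(i0, k, d):
    """Light intensity reaching the camera for an initial intensity i0."""
    return i0 * transmittance(k, d)


# Hypothetical coefficient; the distances match the 40 cm and 80 cm test depths.
K_ASSUMED = 0.5
for distance in (0.4, 0.8):
    print(distance, transmittance(K_ASSUMED, distance))
```

Doubling the path length squares the transmittance, which is why the loss grows quickly with distance even in moderately turbid water.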
An experiment was set up to verify the influence of depth and wavelength on the acquisition and recognition of the target. The Fourier descriptors for the same target at different depths are calculated and compared with those in the air/shallow water group. By calculating the similarity of the shapes of these sets of targets, the relationship between distortion and depth is obtained. Similarly, the relationship between distortion and the wavelength of light is obtained by comparing the target shapes of objects of different colors with those of the air/shallow water group under the same conditions.
Algorithm analysis
Image segmentation is one of the most important methods for target extraction. It is based on image attributes such as color, gray level, spatial distribution, and geometric features.
However, because of interference from the water environment and the complexity of the background, threshold segmentation alone does not give satisfactory results. 29 Adding hue, saturation, value (HSV) color space segmentation greatly improves the accuracy of segmentation. HSV is an alternative representation of the RGB color model. It was designed in the 1970s by computer graphics researchers (such as Smith 15 ) to align more closely with the way human vision perceives color attributes.
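Why HSV information helps can be seen in a minimal Python sketch (the article's own implementation is in MATLAB). It assumes that a pixel is accepted only when it passes both a gray-level threshold and a hue/saturation test; this AND rule, the function names, the luma weights, and the sample colors are assumptions for illustration.

```python
import colorsys


def to_gray(pixel):
    """Gray level of an 8-bit RGB pixel using the common 0.299/0.587/0.114 luma weights."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def to_hsv(pixel):
    """HSV components of an 8-bit RGB pixel, each in the range [0, 1]."""
    r, g, b = (channel / 255.0 for channel in pixel)
    return colorsys.rgb_to_hsv(r, g, b)


def is_target(pixel, gray_threshold, hue_range, min_saturation):
    """Accept a pixel only if it passes the gray test AND the HSV test (assumed rule)."""
    hue, saturation, _ = to_hsv(pixel)
    passes_gray = to_gray(pixel) >= gray_threshold
    passes_hsv = hue_range[0] <= hue <= hue_range[1] and saturation >= min_saturation
    return passes_gray and passes_hsv


# Hypothetical colors: a red target and a blue background with similar gray levels.
RED_TARGET = (200, 30, 30)
BLUE_BACKGROUND = (30, 60, 200)

for name, pixel in (("target", RED_TARGET), ("background", BLUE_BACKGROUND)):
    print(name, round(to_gray(pixel), 2), is_target(pixel, 50, (0.0, 0.05), 0.5))
```

Both sample pixels pass the gray threshold, so a gray-only rule cannot separate them, while the hue test rejects the blue background. The hue range here does not wrap around 1.0, which a fuller treatment of red hues would need.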
After segmentation of the target area, a Fourier descriptor is used for feature detection and shape recognition. The basic idea of this method 30 is to extract the contour of the object as a closed curve L, which acts as a point trajectory.
The shape similarity is calculated from the normalized descriptors d(k). The Euclidean distance between normalized Fourier descriptors is used as an index of the similarities and differences between shapes.
where di and dj are the normalized Fourier descriptors from two groups.
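The descriptor and distance computation can be sketched in Python as follows. This is one common normalization, not necessarily the one used in the article: the contour points are treated as complex numbers, the zero-frequency term is dropped (translation), only magnitudes are kept (rotation and starting point), and every magnitude is divided by that of the first harmonic (scale). The contour is assumed to be traversed counter-clockwise, and all function names are hypothetical.

```python
import cmath
import math


def fourier_descriptors(contour, n_harmonics=8):
    """Normalized Fourier descriptors of a closed contour given as (x, y) points."""
    z = [complex(x, y) for x, y in contour]
    n = len(z)
    if n <= 2 * n_harmonics:
        raise ValueError("contour needs more than 2 * n_harmonics points")

    def coefficient(k):
        # Discrete Fourier coefficient of the complex contour at frequency k.
        return sum(p * cmath.exp(-2j * math.pi * k * t / n)
                   for t, p in enumerate(z)) / n

    scale = abs(coefficient(1))  # dominant term for a counter-clockwise contour
    if scale == 0:
        raise ValueError("first harmonic is zero; cannot normalize")
    harmonics = [k for k in range(-n_harmonics, n_harmonics + 1) if k not in (0, 1)]
    return [abs(coefficient(k)) / scale for k in harmonics]


def shape_distance(d_i, d_j):
    """Euclidean distance between two normalized descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(d_i, d_j)))


def circle_contour(cx, cy, radius, n):
    """Counter-clockwise circle sampled at n points."""
    return [(cx + radius * math.cos(2 * math.pi * t / n),
             cy + radius * math.sin(2 * math.pi * t / n)) for t in range(n)]


def square_contour(side, points_per_side):
    """Counter-clockwise square sampled uniformly along its perimeter."""
    step = side / points_per_side
    steps = range(points_per_side)
    bottom = [(i * step, 0.0) for i in steps]
    right = [(side, i * step) for i in steps]
    top = [(side - i * step, side) for i in steps]
    left = [(0.0, side - i * step) for i in steps]
    return bottom + right + top + left


circle = fourier_descriptors(circle_contour(0.0, 0.0, 1.0, 64))
moved_circle = fourier_descriptors(circle_contour(5.0, -3.0, 2.5, 64))
square = fourier_descriptors(square_contour(2.0, 16))

print(shape_distance(circle, moved_circle))  # near zero: same shape
print(shape_distance(circle, square))        # clearly larger: different shape
```

A shifted and rescaled circle yields a distance of practically zero, while a square differs from a circle mainly through its higher harmonics, which is what makes the distance usable as a similarity index.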
The algorithm process is shown in Figure 2 (co-segmentation) and Figure 3 (feature recognition based on Fourier descriptors).

Figure 2. Co-segmentation algorithm based on gray level threshold segmentation and HSV color segmentation. The algorithm separates the target from the background and captures the target more completely. HSV: hue, saturation, value.

Figure 3. The feature recognition algorithm for underwater formal targets. A Fourier descriptor library is set up and shape similarity is calculated using the Euclidean distance. All Euclidean distances and Fourier descriptors are collected in a characteristic database, from which the target shape characteristics can be extracted. After the target object is characterized and its shape identified, the shape similarity can also be calculated. This allows the shape to be determined in the presence of underwater image distortion.
The co-segmentation algorithm aims to segment one object out of the underwater image. It starts with the gray-threshold image and an initial seed located inside the boundary of the region. Pixels adjacent to the region are then iteratively merged with it. During the process, two criteria control the boundary. The first compares the image data with the gray threshold. The second uses HSV color information to detect the optimal region boundary and stop the growth.
In the process of image region merging, 31 a merging operation that reconstructs these regions is required. Merging is based on the criterion of a low average gray gradient at the boundary between two adjacent regions. Other criteria based on color can be used if some a priori knowledge about the HSV color of the regions is available.
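The seeded growth with two stopping criteria can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the authors' implementation: the image is given as two small grids (gray level and hue), neighbors are 4-connected, the gray criterion is a fixed threshold, and the color criterion is a hue tolerance relative to the seed that ignores hue wrap-around. The function name and sample data are hypothetical.

```python
from collections import deque


def grow_region(gray, hue, seed, gray_threshold, hue_tolerance):
    """Grow a region from seed, merging neighbors that pass both criteria."""
    rows, cols = len(gray), len(gray[0])
    seed_hue = hue[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if not inside or (nr, nc) in region:
                continue
            # Criterion 1: gray level compared with the threshold.
            passes_gray = gray[nr][nc] >= gray_threshold
            # Criterion 2: hue close to the seed hue; otherwise growth stops here.
            passes_hue = abs(hue[nr][nc] - seed_hue) <= hue_tolerance
            if passes_gray and passes_hue:
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


# Hypothetical 4 x 4 image: a bright reddish block plus one bright bluish pixel.
GRAY = [[10, 10, 10, 10],
        [10, 200, 200, 10],
        [10, 200, 180, 190],
        [10, 10, 10, 10]]
HUE = [[0.60, 0.60, 0.60, 0.60],
       [0.60, 0.00, 0.02, 0.60],
       [0.60, 0.01, 0.00, 0.60],
       [0.60, 0.60, 0.60, 0.60]]

REGION = grow_region(GRAY, HUE, seed=(1, 1), gray_threshold=100, hue_tolerance=0.05)
print(sorted(REGION))
```

The pixel at (2, 3) is bright enough to pass the gray test but is rejected by the hue test, which shows how the color criterion stops the growth at the region boundary.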
In this article, a large number of underwater images are acquired and analyzed. The Fourier descriptors of target images from many hydrological environments are extracted and used to construct a feature library. For each set of descriptors, the Euclidean distance to the Fourier descriptors of the same target in the air group and in the shallow water group is calculated, and an error library is constructed from the resulting distortion. The error here is a composite one. The error library is built with the control variable method: one factor is changed between groups while all other conditions are kept the same, which gives a single-parameter sub-library. For any underwater target image, the normalized Fourier descriptors are calculated and compared with the entries of the feature library, using the Euclidean distance to measure the degree of similarity. The library entry with the smallest Euclidean distance is taken as the most similar set of descriptors, and the target is considered to be in the hydrological environment corresponding to that set. In this way, a series of mathematical features relative to the shallow water group and the air group is obtained for the target from the feature library.
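The library lookup described above amounts to a nearest-neighbor search, sketched here in Python. The library layout (a dictionary from environment label to descriptor vector), the labels, and all numerical values are hypothetical and serve only to show the selection of the smallest Euclidean distance.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_entry(library, descriptor):
    """Return (label, distance) of the library entry nearest to the descriptor."""
    label = min(library, key=lambda name: euclidean(library[name], descriptor))
    return label, euclidean(library[label], descriptor)


# Hypothetical feature library: one normalized descriptor vector per environment.
LIBRARY = {
    "air": [0.10, 0.02, 0.01],
    "shallow water": [0.12, 0.03, 0.01],
    "80 cm underwater": [0.18, 0.06, 0.02],
}
QUERY = [0.17, 0.05, 0.02]  # descriptors of a newly acquired target image

LABEL, DISTANCE = closest_entry(LIBRARY, QUERY)
print(LABEL, round(DISTANCE, 4))
```

In a single-parameter sub-library, the labels would differ in only one factor (for example depth), so the selected label indicates the most likely value of that factor.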
Experimental validation
To verify the validity of the recognition algorithms and the error model, an experiment was performed with the setup shown in Figure 4. The setup comprises a mobile phone, a water tank, a water pump, and several regular-shape targets for testing.

Figure 4. The experimental setup consists of a mobile phone, a water tank, a water pump, and regular-shape targets as shown in (a) and (b). The mobile phone used to acquire images is placed in a waterproof box (c). Different regular targets are placed in the water. (a) Sketch of the experimental setup; (b) actual experimental setup; (c) image acquisition using the phone.
The mobile phone (MX metal, Meizu, China) is placed in a square transparent waterproof box. A uniform artificial light source is used to acquire images of underwater targets. The MX metal camera has 13 megapixels. The actual focal length of the phone lens is 4.068 mm.
The waterproof tank (L 100 cm × W 100 cm × H 100 cm) is constructed from blue waterproof cloth and a metal frame. The depth of the water is 0.8 m. The water in the filled tank is not very clean and contains impurities, which is similar to a real underwater environment.
The water pump operates at 50 Hz with a power of 230 W. The flow rate is 4 m³/h and the pipe diameter is 10 mm. The pump is used to simulate flow interference.
The test targets fall into two categories. The first consists of standard regular-shape models produced by a 3-D printer: a triangle (equilateral, side length 10 cm, thickness 5 cm) and a rectangle (10 cm long, 6 cm wide, and 5 cm high). Because a sphere could not be printed to the required standard, there is no standard spherical model. The second category consists of objects from daily life, such as rectangular plates, bricks, and lunch boxes, and a spherical model made from a water-filled balloon. There are spherical water balloons in five colors (red, green, blue, yellow, and purple), triangular objects in green and red, and rectangular objects in red and green.
Two methods are used for image acquisition. The first is continuous shooting mode, which gives a series of images. The second is to record a video and extract frames for target recognition. The algorithm runs on MATLAB [version R2014a (8.3.0.532)].
Figures 5 and 6 show the process of co-segmentation and image region merging. Figure 5 shows experimental results for targets in air with a yellow background. After the experiments in air, underwater experiments were performed for different colors, shapes, and depths; some of the results are shown in Figure 6. The accuracy of segmentation and recognition is affected by the size of the light spot, which can easily be misinterpreted as a target. In practice, a spot smaller than 1/4 of the target area can be removed as noise. If the spot is too large, it will be misjudged as part of the target.
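The 1/4-area rule for light spots can be sketched in Python as a connected-component filter. The sketch assumes a binary mask, 4-connectivity, and that the largest component is the target; these choices and the function names are assumptions for illustration rather than details given in the article.

```python
from collections import deque


def connected_components(mask):
    """Return the 4-connected components of a binary mask as lists of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = set()
    components = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or (r, c) in seen:
                continue
            component = []
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                cr, cc = queue.popleft()
                component.append((cr, cc))
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and mask[nr][nc] == 1 and (nr, nc) not in seen:
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            components.append(component)
    return components


def remove_small_spots(mask, ratio=0.25):
    """Drop components smaller than ratio times the largest (assumed target) area."""
    cleaned = [[0] * len(row) for row in mask]
    components = connected_components(mask)
    if not components:
        return cleaned
    target_area = max(len(component) for component in components)
    for component in components:
        if len(component) >= ratio * target_area:
            for r, c in component:
                cleaned[r][c] = 1
    return cleaned


# Hypothetical mask: a 2 x 3 target and a one-pixel light spot at (2, 5).
MASK = [[0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0, 0],
        [0, 1, 1, 1, 0, 1],
        [0, 0, 0, 0, 0, 0]]

CLEANED = remove_small_spots(MASK)
for row in CLEANED:
    print(row)
```

A spot that touches the target merges into the same component and cannot be removed this way, which matches the observation that a large spot is misjudged as part of the target.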

Figure 5. Results in air. Each group shows the original image, the gray image, the grayscale distribution histogram, the binary image, and the boundary of the target segmentation. The results of segmentation in the rectangular group are analyzed. From (a) green triangle in air and (b) green rectangle in air, we can see that the multi-threshold segmentation algorithm improves the segmentation results and gives the best target extraction. As can be seen from (c) red triangle in air and (d) red rectangle in air, the shadow produced by the target itself has a very large effect on shape recognition. In the following test results, the identified shape area is larger than the actual shape area.

Figure 6. Part of the results from the underwater groups. Each group shows the original image, the gray image, image segmentation based on the gray level threshold, the HSV color space, and the boundary of target co-segmentation. (a) Red balloon 40 cm underwater and (b) red balloon 80 cm underwater. The blue balloon group is the most difficult one in which to identify the target. It can be seen in (c) blue balloon 40 cm underwater and (d) blue balloon 80 cm underwater that the difference between the blue ball and the surrounding environment is very small. However, with the co-segmentation algorithm, the segmented target area is very close to the actual area of the target.
Results and discussion
In each situation (different colors, shapes, and depths), a long video was recorded, and 25 frames were extracted from it for processing by the algorithm, with the identification accuracy recorded. For example, for the red balloon in air, 25 frames are taken from the video for identification. When the target is changed, the acquisition steps above are repeated. The experimental results are summarized in Table 1.
According to existing research, the visibility of sea water varies between 0.3 m and 27 m, depending on the water properties and the light. In the deep sea, artificial light is the only light source for target search, and visibility is less than 1 m when the water is not clear. Our targets are also very small on the scale of the ocean. Considering the real water environment, an experimental distance of 80 cm is therefore acceptable and in line with practical applications. To compare the results more intuitively, the data are shown as line graphs in Figure 7.

Figure 7. Line graph of recognition accuracy. Data from Table 1.
Table 1 and Figure 7 show that the recognition rate in air is the highest among all groups, because the target in air is the least distorted. However, the result for the yellow balloon group at 40 cm underwater is higher than that for the air group. This is because the background of the air group is yellow, close to the color of the yellow target balloon. The green rectangle and the green triangle clearly have a higher recognition rate underwater than the green balloons. This is because a triangle has three corners and a rectangle has four, whereas the boundary of a balloon is a curve, which is more easily distorted and misidentified. The recognition rate of blue balloons in shallow water and at 40 cm and 80 cm against a blue background was significantly lower than for the other color groups. In this case, the differences in the color feature and the grayscale feature are small, and the joint segmentation algorithm cannot function normally. Similarly, the recognition accuracy for green balloons is relatively low. A circle places higher requirements on the distribution of feature points, and if the segmentation result is not smooth enough, it can easily be mistaken for other shapes such as rectangles and triangles.
Table 1. Recognition accuracy of underwater regular targets.
The algorithm can be used to construct a preliminary underwater image reconstruction and feature extraction system. It is designed to inspect the functioning and accuracy of the model under laboratory conditions.
Comparing the tested target with the corresponding target in the library, the target shape distortion ranges from 2.1% to 6.2%, with an average of 3.6%. When the degree of distortion is less than 4%, the environment in which the target is measured is taken to be the same as the corresponding environment in the library. Figure 8 shows an example library of Fourier descriptors. In this example, the degree of distortion is 6–13.5% compared with the air group and 3.8–8.1% compared with the shallow water group.

Figure 8. (a) A library of Fourier descriptors; (b) Euclidean distance of the normalized Fourier descriptor.
Our error model accounts for factors that influence the target segmentation threshold and the Fourier characteristics, and provides the corresponding error correction. It is used to reconstruct the target shape. The result for the target image collected in air is compared with that obtained at a depth of 80 cm underwater. The geometrical characteristics of the target suffer an average distortion of 11.5% compared with the air group and 6.5% compared with the blue shallow water group. Combined with the average distortion, the stack parameter model revised the rules of feature extraction, resulting in 4.8% and thus meeting the original goals. Apart from some of the blue and green results, the shape recognition accuracies were above 88%. The results differ between groups, mainly because they depend on the angle and acquisition of light, which makes it hard to repeat the acquisition experiments exactly. There are also certain differences between the targets of the groups.
The error correction is carried out according to the average degree of distortion of the target. Compared with laboratory conditions, the optical properties of seawater under real environmental conditions change with time and location. Because of these limitations, the underwater environment simulated in the laboratory has several disadvantages: (1) the hydrological environment is quite different from the real marine environment, and the color of the underwater background is only blue; (2) the lighting requirements are demanding (the light can be harsh, reflective, and unevenly distributed), which increases the recognition error rate.
Conclusions
In conclusion, we built a preliminary underwater optical signal transmission error model. The algorithm was designed for visual detection and feature recognition of underwater formal targets. Three basic shapes were chosen to develop the detection and recognition algorithm: a circle, a triangle, and a rectangle. The co-segmentation algorithm, based on gray level threshold and HSV color, was used to separate the target from the background. Fourier descriptors were used to describe shape features and calculate the distortion error. The error matrix represents the combined factors in the transmission error model. Distortion error involves underwater image enhancement and dehazing using model-based fusion for corrosion estimation. Experimental results support the validity and reliability of the model and the algorithm.
Future work will focus on two parts. First, to adapt better to the underwater environment, an extended library based on more diverse hydrological environments will be established to improve the practicality and applicability of the model. Second, the algorithm will be deployed for target search, homing, and docking for an AUV. Furthermore, we are interested in one of the current hot spots, simultaneous localization and mapping (SLAM). An underwater obstacle detection and recognition algorithm will be implemented as a support algorithm for SLAM solutions in the underwater environment.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by National Key Research and Development Program (2016YFC-0300801), the Natural Science Foundation of China (No. 51679213), in part by the UK Royal Society International Exchanges 2017 Cost Share, China, under Grant IEC\NSFC\170405 and the Fundamental Research Funds for the Central Universities (29200).
