Abstract
Many 3D reconstruction tasks for microscopic scenes arise in industrial detection, where objects must be reconstructed in real time and surface information must be obtained quickly. This demand is difficult to meet in microscopic scenarios because the microscope's depth of field is shallow: any part of the object's surface that lies off the focal plane is blurred. Under a video microscope, the images captured frame by frame are therefore mostly defocused. In conventional 3D reconstruction, a single 2D image or a few 2D images are used for geometric-optical calculation, and an affine transformation recovers the 3D information of the object to complete the reconstruction. A defocused image, however, lacks the information needed for such an affine transformation; its complete information can only be restored from a whole single-view defocus image sequence. Recovering 3D information from a defocus image sequence is therefore harder than in ordinary scenes, and real-time performance is more difficult to guarantee. In this paper, the surface reconstruction process based on point-cloud data is studied. A Delaunay triangulation method based on plane projection and a synthesis algorithm is used to complete surface fitting. Finally, a 3D reconstruction experiment on the collected image sequence is completed. The experimental results show that the reconstructed surface conforms to the surface contour information of the selected object.
Keywords
Introduction
Human beings perceive the three-dimensional structure of the surrounding world through an intuitive visual system. Visual information is the most important means by which we perceive other people, objects, society, and the world.1–3 Information obtained through our eyes accounts for about 80% of the external information we acquire. Since the advent of the computer, image recognition algorithms have allowed researchers to identify people and objects in pictures and even to interpret facial expressions with machine learning algorithms.4,5
Researchers in computer vision have long been developing algorithms for recovering 3D information from images.3,4,6,7 With this technology, a partial 3D model of an environment can be computed accurately from thousands of photos. However, recognizing and classifying images remains difficult.2,5 The reason is that machine vision research is an inverse problem: shape, texture, and color distribution must be reconstructed from images alone. Models from physics, statistics, and other disciplines are therefore used to establish the relevant mathematical models and recover the unknowns from incomplete information to solve the target problem.4,6,8–21
Three-dimensional reconstruction is essential for computer vision.2,4–6,9,14,22 It uses image information collected by visual equipment to calculate 3D information. At present, the technique is applied in many fields, such as navigation, remote sensing, and industrial detection. In the 1970s, Marr proposed the first theoretical framework of machine vision, which was widely accepted and followed by researchers in related fields; he became known as the father of computer vision. 1 Over the next 20 years vision theory developed rapidly, and many algorithms from that period remain the foundation of today's visual theory and have been developed and applied in various directions. In the 1980s, Ikeuchi and Horn 23 put forward the shape-from-shading method: a reflectance lighting model is built from the shading equation, and the 3D structure is then derived from the recovered surface normals. As the method is an under-constrained problem, other constraints must be added to solve it. Vogel et al. 24 proposed a three-dimensional reconstruction of the light field of real pictures based on a non-Lambertian model. Barranco et al. 25 used optical-flow tracking to match image feature points and then performed an affine transformation to obtain the camera parameters and the 3D coordinates of the target object. This was the first complete image-based three-dimensional reconstruction model, but its accuracy was not high and it saw relatively little application. In 2003, Schmitt proposed a high-quality 3D reconstruction method 3 that used laser scanning to obtain texture and contour information for modeling; the error of the recovered 3D structure was 0.25%. In 2012, Microsoft Research showed that the depth information of a scene could be obtained through multiple depth sensors and that the computer could reconstruct a 3D model of the scene in real time under multi-angle scanning. This has become the most popular method for 3D reconstruction at present.
The 3D reconstruction of microscopic scenes studied in this paper has many application requirements in industrial detection and medicine.4,10,26–29 However, because the depth of field of a microscopic scene is short, only a small part of each image is sharp. A single image therefore carries far too little information for 3D reconstruction, while a long sequence of continuous frames carries enough information but suffers from low computational efficiency. For this reason, work in this field has so far been limited to relative estimation of depth from two defocused images.
In this paper, the depth estimation results for defocused images from a video microscope are processed further. The 3D point cloud of the object surface is obtained by transformation. The point cloud is then downsampled, filtered, and denoised to obtain the sparse, smooth data suited to surface fitting. The pre-processed 3D point cloud is fitted by the Delaunay triangulation method. The simulation results show that the reconstructed sample surface conforms to the surface profile information of the selected object.
Data
The image data used in this study were taken with a monocular microscope (magnification 0.5 × (0.7–4.5), lens radius 35 mm, F-number 4) and a Basler industrial camera (acA640-120uc). All experiments in this article were carried out under the Windows 10 operating system. Images were acquired under the microscope with Basler's Pylon Viewer, simulations were run in Matlab 2016a, and a small number of pre-processing steps were completed with Microsoft Visual Studio 2013 and OpenCV.
Point cloud data pre-processing and down-sampling
The point cloud data are generated from the depth estimation result: the plane position of each image pixel gives the X and Y values of a point, and the estimated depth gives its Z value. Generating point cloud data from these spatial coordinates completes the three-dimensional description of the target object. Point cloud data 30 are generally noisy because of camera calibration error, 31 matching error, and deviations introduced while computing the point cloud. All of these noise points bias the result: if they are not processed, they make the 3D reconstruction difficult to estimate, for example by producing non-existent peaks that significantly reduce the reconstruction quality. The point cloud must therefore be cleaned. Secondly, the number of points also affects surface reconstruction: an ordinary 640 × 480 depth map can exhaust computer memory and stall the surface fitting. 7 In general, the data should be downsampled before use, both so that the computation can proceed normally and to improve efficiency.
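As a minimal sketch of this pixel-to-point mapping (written against OpenCV, which the experiments already use; the function name and the uniform pixel scale are illustrative assumptions rather than the paper's implementation):

```cpp
#include <opencv2/core.hpp>
#include <vector>

struct Point3 { float x, y, z; };

// Convert a single-channel depth map into a 3D point cloud:
// pixel column -> X, pixel row -> Y, estimated depth -> Z.
std::vector<Point3> depthToPointCloud(const cv::Mat& depth, float scale = 1.0f)
{
    CV_Assert(depth.type() == CV_32FC1);
    std::vector<Point3> cloud;
    cloud.reserve(static_cast<size_t>(depth.total()));
    for (int i = 0; i < depth.rows; ++i) {
        const float* row = depth.ptr<float>(i);
        for (int j = 0; j < depth.cols; ++j) {
            // `scale` maps pixel units to the depth unit (assumed known
            // from the microscope calibration).
            cloud.push_back({ j * scale, i * scale, row[j] });
        }
    }
    return cloud;
}
```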
Point cloud data contain many types of noise with different structures, so the appropriate filtering method differs from case to case. Common point cloud denoising methods include the Gaussian filter, 31 median filter, 32 and mean filter. 33 By smoothing, these methods suppress abnormal points so that they no longer distort the 3D reconstruction. However, the smoothing cannot be matched exactly to the positions of the noise points; it is applied to the whole image and therefore also affects noise-free points, so in some cases features are lost in the denoising process. In the microscopic scenes of this paper the target object occupies most of the field, and the double-pass filtering method 34 is selected to filter the abnormal points in the region.
A two-dimensional grayscale image is used. Suppose the depth map is a
Where,
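The filter equation itself is elided above. As a hedged stand-in, an edge-preserving bilateral filter applied to the depth map, treated as a grayscale image, captures the intended behavior of suppressing abnormal points without smearing depth edges; the parameter values below are illustrative:

```cpp
#include <opencv2/imgproc.hpp>

// Edge-preserving smoothing of a depth map treated as a grayscale image.
// cv::bilateralFilter weights neighbours by both spatial distance and
// depth difference, so outlier spikes are suppressed while depth steps
// at object boundaries are preserved.
cv::Mat smoothDepth(const cv::Mat& depth32f)
{
    cv::Mat out;
    cv::bilateralFilter(depth32f, out,
                        /*d=*/9,             // neighbourhood diameter (pixels)
                        /*sigmaColor=*/25.0, // depth-difference scale
                        /*sigmaSpace=*/9.0); // spatial scale
    return out;
}
```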
Downsampling of point cloud data generally simplifies the data at an appropriate sampling rate, ensuring that most of its characteristics are preserved while keeping the subsequent computation smooth. Considering that most observed objects in the micro scene have smooth surfaces, the point cloud is downsampled with a grid-based selection method.
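A minimal sketch of such grid-based selection, keeping one representative point per grid cell (the cell size and the key-packing scheme are our illustrative choices):

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Point3 { float x, y, z; };

// Keep the first point that falls into each cubic grid cell of side `cell`.
std::vector<Point3> gridDownsample(const std::vector<Point3>& cloud, float cell)
{
    std::unordered_map<std::uint64_t, Point3> cells;
    for (const Point3& p : cloud) {
        // Quantize each coordinate to a cell index and pack the three
        // indices into one key (simple packed hash; adequate for the
        // bounded scenes considered here).
        auto q = [cell](float v) {
            return static_cast<std::uint64_t>(
                static_cast<std::int64_t>(v / cell) & 0x1FFFFF); // 21 bits each
        };
        std::uint64_t key = (q(p.x) << 42) | (q(p.y) << 21) | q(p.z);
        cells.emplace(key, p); // emplace keeps the first point per cell
    }
    std::vector<Point3> out;
    out.reserve(cells.size());
    for (const auto& kv : cells) out.push_back(kv.second);
    return out;
}
```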
Method
Improved image fusion method based on sparse decomposition
Like other machine learning algorithms, the dictionary learning step in the sparse representation method is computationally expensive. When the image resolution is high or the number of images is large, the training time may be several or even dozens of times the total of all other image fusion steps, which limits the speed of the algorithm. To address this, we consider parallel computation, both hardware-based (for example, CUDA) and software-based multi-scale fusion. Among transform-based multi-scale fusion algorithms, the Discrete Wavelet Transform (DWT) is representative. The key is the choice of fusion rules: different rules should be selected for different decomposition levels, and the simplest average rule is often used for the low-frequency signal to ensure visual clarity. Inspired by this, we use the DWT algorithm to decompose each source image into a low-frequency and a high-frequency signal. The high-frequency components reflect the structural information, and a sparse representation (SR) fusion rule is used to obtain the high-frequency fusion coefficients. The low-frequency component is divided into fixed-size blocks whose sharpness indices are calculated, and the block with the largest sharpness index is fused into the low-frequency coefficient. Finally, the fused image is obtained by the inverse discrete wavelet transform (IDWT). Much as a computer uses multithreading, this method decomposes the information-rich high-frequency signal at multiple scales and fuses the sparser representations scale by scale, which reduces both the dictionary training time and the image fusion time.
Here we use the DWT algorithm. In dictionary learning, each pair of images in the target image sequence is decomposed by the DWT, and the components of each level are then sparsely coded. Because the multi-scale decomposition reduces the dimension and information content of the coefficients at each scale before sparse decomposition, the amount of iterative computation drops significantly, and the coefficient approximations at different scales can be run in parallel. Under this kind of decomposition the time spent training dictionaries is effectively reduced, and the time required for image fusion falls correspondingly.
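A single-level sketch of this decompose-fuse-reconstruct pipeline is given below. It is a simplification under stated assumptions: one Haar level only, a max-absolute rule standing in for the SR coding of the high-frequency bands, and local detail energy standing in for the sharpness index of the low-frequency blocks:

```cpp
#include <opencv2/core.hpp>
#include <cmath>

// Fuse two registered float images with a one-level Haar DWT, fusing
// coefficients per 2x2 block and inverting in place.
cv::Mat haarFuse(const cv::Mat& A, const cv::Mat& B)
{
    CV_Assert(A.type() == CV_32FC1 && A.size() == B.size());
    CV_Assert(A.rows % 2 == 0 && A.cols % 2 == 0);
    cv::Mat F(A.size(), CV_32FC1);
    for (int i = 0; i < A.rows; i += 2) {
        for (int j = 0; j < A.cols; j += 2) {
            auto coeffs = [&](const cv::Mat& M, float c[4]) {
                float a = M.at<float>(i, j),     b = M.at<float>(i, j + 1);
                float p = M.at<float>(i + 1, j), q = M.at<float>(i + 1, j + 1);
                c[0] = (a + b + p + q) / 2;  // LL (low frequency)
                c[1] = (a - b + p - q) / 2;  // LH
                c[2] = (a + b - p - q) / 2;  // HL
                c[3] = (a - b - p + q) / 2;  // HH
            };
            float ca[4], cb[4], cf[4];
            coeffs(A, ca); coeffs(B, cb);
            // High-frequency bands: max-absolute selection (SR stand-in).
            for (int k = 1; k < 4; ++k)
                cf[k] = std::fabs(ca[k]) >= std::fabs(cb[k]) ? ca[k] : cb[k];
            // Low-frequency band: pick the source with more detail energy
            // (stand-in for the block sharpness index).
            float ea = ca[1]*ca[1] + ca[2]*ca[2] + ca[3]*ca[3];
            float eb = cb[1]*cb[1] + cb[2]*cb[2] + cb[3]*cb[3];
            cf[0] = ea >= eb ? ca[0] : cb[0];
            // Inverse Haar transform of the fused coefficients.
            F.at<float>(i, j)         = (cf[0] + cf[1] + cf[2] + cf[3]) / 2;
            F.at<float>(i, j + 1)     = (cf[0] - cf[1] + cf[2] - cf[3]) / 2;
            F.at<float>(i + 1, j)     = (cf[0] + cf[1] - cf[2] - cf[3]) / 2;
            F.at<float>(i + 1, j + 1) = (cf[0] - cf[1] - cf[2] + cf[3]) / 2;
        }
    }
    return F;
}
```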
Modeling depth estimation method based on geometric constraints
We classify the problem as a Depth from Defocus problem. From the differences between the focal planes and the imaging positions, a blur-parameter difference model and a spatial constraint are derived. The model then establishes the relationship between depth information and blur information, and the complete spatial information of the target object is estimated iteratively.
Geometric derivation of the depth estimation model
The geometric principle of real-aperture imaging is shown in Figure 1. There are three positional relations between the focal plane and the imaging plane:

The geometric structure of aperture imaging.
When
When
When
From (3) and (4), the radius of the dispersion circle, namely the fuzzy degree evaluation parameter, can be expressed as
Since the radius of the dispersion circle has a specific relationship with the camera, in consideration of this point, let
Where the relevant parameters of the camera are set as u, and the function is denoted by
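The equations referenced above are elided in this version. For reference, the classical thin-lens geometry that this kind of model builds on relates the blur-circle radius to the object distance as follows (our notation, not necessarily the paper's equations (3) and (4)):

```latex
% Classical real-aperture blur model (reference notation, not the paper's).
% f: focal length, D: aperture diameter, u: object distance,
% s: lens-to-sensor distance, r: radius of the circle of confusion.
\begin{align}
  \frac{1}{f} &= \frac{1}{u} + \frac{1}{v}
    && \text{(thin-lens law; $v$ is the in-focus image distance)} \\
  r &= \frac{D s}{2}\left|\frac{1}{f} - \frac{1}{u} - \frac{1}{s}\right|
    && \text{(blur radius when the sensor sits at $s \neq v$)}
\end{align}
```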
Firstly, the two defocused images were studied, and the generated defocused images
Where u1 and u2 represent the parameters of the camera during shooting the image
Given the different blur degrees of the two images at different positions, the following difference function is used to simulate:
The approximation method adopts the least square method:
Set two PSF functions as
The parameters
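The PSF definitions are likewise elided. In classical Depth-from-Defocus work the point spread function is commonly modeled as a Gaussian whose spread is proportional to the blur-circle radius; we record that common form here only as a hedged reference:

```latex
% Common Gaussian PSF model used in Depth-from-Defocus (reference form).
% sigma_i = k r_i links the PSF spread of image i to its blur-circle
% radius r_i; k is a camera-dependent constant.
\begin{align}
  h_i(x, y) &= \frac{1}{2\pi\sigma_i^{2}}
      \exp\!\left(-\frac{x^{2} + y^{2}}{2\sigma_i^{2}}\right),
  \qquad \sigma_i = k\, r_i, \quad i = 1, 2 \\
  g_i &= h_i * g_0
  \qquad \text{($g_0$: sharp image, $g_i$: defocused observation, $*$: convolution)}
\end{align}
```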
The height of the observed object is different in different positions, which results in different degrees of blur in each region of the two images. It can be expressed as
In
Where,
In
Where,
Construct the functional extremum function as:
Similarly, the least square method is also used to simulate:
Establish the relationship between
After the operation, we can get:
Finally, the mapping between
As for the position relationship between the focal plane and the imaging plane of the image taken by the camera, according to the actual aperture imaging principle, the four positions are as follows:
When
We obtain that
Similarly, when
Similarly, when
When
As can be seen from the above,
As can be seen from Figure 1, all observed defocused images are satisfied
From equations (3) and (19), we obtain the optimization problem’s objective function and constraints, respectively.
The depth estimation model is as follows:
An improved depth estimation algorithm for defocused images
The basic steps of the method are as follows:
According to the four imaging-geometry cases described in the previous section, use the camera parameters of the two acquired defocused images to determine
The interval
For the
For the left and right points of
The depth information is estimated according to equation (28).
The purpose is to determine a relatively accurate 3D model of the target object by depth estimation over multiple defocused images shot by an electronic video microscope. In this section, therefore, an estimation model is designed to estimate depth information with a multi-frame defocus image model. The steps are as follows:
(1) The sequence of collected images is numbered as
(2) For the estimated results
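Because the symbols in these steps are elided, the estimator itself cannot be reproduced here. As a simplified, clearly hypothetical baseline only (not the paper's model), a per-pixel focus measure taken across the frame sequence already yields a coarse depth index for such a sequence:

```cpp
#include <opencv2/imgproc.hpp>
#include <vector>

// Coarse depth-from-focus baseline (illustrative, NOT the paper's model):
// for every pixel, pick the frame index whose local Laplacian response is
// strongest; with a known focal step per frame this index maps to depth.
cv::Mat focusIndexMap(const std::vector<cv::Mat>& frames)
{
    CV_Assert(!frames.empty());
    cv::Mat best(frames[0].size(), CV_32FC1, cv::Scalar(0));
    cv::Mat index(frames[0].size(), CV_32FC1, cv::Scalar(0));
    for (int k = 0; k < static_cast<int>(frames.size()); ++k) {
        cv::Mat lap, measure;
        cv::Laplacian(frames[k], lap, CV_32F, 3);   // second-derivative response
        cv::boxFilter(lap.mul(lap), measure, CV_32F, cv::Size(9, 9));
        cv::Mat sharper = measure > best;           // mask where frame k wins
        measure.copyTo(best, sharper);
        index.setTo(static_cast<float>(k), sharper);
    }
    return index; // frame index of maximum sharpness per pixel
}
```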
Surface reconstruction
The surface reconstruction process subdivides the pre-processed point cloud data into surface patches and fits them with small elements. Much as in the classical circle-cutting method, a smooth circle can be fitted with a polygon whose edges are short and numerous: when the number of edges is large enough, the polygon matches the smooth circle observed by the human eye. The surface implied by point cloud data is fitted with similar elements: when the subdivision is fine enough and the elements numerous enough, the fitted surface matches the real object surface as observed by the human eye. Because of the special property of triangles that three points determine a plane, the point cloud can be turned into a surface by joining triangles over the three-dimensional coordinates.
There are two kinds of triangulation for 3D point cloud data: indirect and direct methods. The projection (indirect) method triangulates the plane projection of the points and then maps the triangulation back to the 3D coordinates. Its rationale is that, in going from plane to surface, the triangle connection relationship is preserved one-to-one, while triangulating a 2D point set is far less complex than a 3D triangulation algorithm. The direct method triangulates the 3D point cloud itself; because of the various constraints involved, the algorithm is complex, but the result is better than the projection method. In the application scenes of this paper most target objects have few abrupt jumps, so the plane projection method is selected for fitting, and the direct method is reserved for the few scenes with more jump points.
The Delaunay triangulation method used in this paper is introduced below. The principles of Delaunay triangulation are: no four points are con-circular, the minimum angle is maximized, edges do not intersect, and every face is a triangle. The non-cocircularity principle guarantees the uniqueness of the partition. The minimum-angle maximization principle ensures that the triangles are not too long and thin, which would degrade the surface fitting. All formed faces must be triangles. As shown in Figure 2, the circumcircle of any triangle in a Delaunay triangulation contains no other points. As shown in Figure 3, in the convex quadrilateral formed by two adjacent triangles, the smallest of the six interior angles does not increase when the diagonal is swapped.
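In practice these principles reduce to the standard empty-circumcircle test; a minimal sketch of that determinant predicate follows (the function name is ours):

```cpp
#include <array>

// Standard in-circle predicate for Delaunay triangulation: returns true
// when point d lies strictly inside the circumcircle of triangle (a, b, c),
// assuming a, b, c are in counter-clockwise order. Points are {x, y}.
bool inCircumcircle(const std::array<double, 2>& a,
                    const std::array<double, 2>& b,
                    const std::array<double, 2>& c,
                    const std::array<double, 2>& d)
{
    const double ax = a[0] - d[0], ay = a[1] - d[1];
    const double bx = b[0] - d[0], by = b[1] - d[1];
    const double cx = c[0] - d[0], cy = c[1] - d[1];
    // 3x3 determinant of the points lifted onto a paraboloid;
    // positive means "d is inside" for counter-clockwise (a, b, c).
    const double det =
        (ax * ax + ay * ay) * (bx * cy - by * cx)
      - (bx * bx + by * by) * (ax * cy - ay * cx)
      + (cx * cx + cy * cy) * (ax * by - ay * bx);
    return det > 0.0;
}
```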

Schematic diagram of Delaunay triangulation criteria.

Maximum and minimum angle characteristics of Delaunay triangulation.
The Delaunay method is usually implemented with the incremental algorithm or the divide-and-conquer algorithm. The former, also known as point-by-point insertion, initializes a small triangulated region and adds points one by one according to the dividing principle until all points are included; it has low complexity and low memory consumption but poor timeliness. The latter recursively splits the point set into two subsets and triangulates them, which consumes more memory but has better timeliness. Drawing on previous methods and combining the advantages of the two, we first divide the point set into piles until each set holds fewer elements than a preset value, and then build the triangular mesh by point-by-point insertion (a sketch of the recursive split is given after Figure 4). The steps are as follows:
(1) The two-dimensional point set is divided into two piles; the loop of steps (2)–(3) is carried out until the number of elements in each pile is less than the set value, so that step (5) can execute;
(2) Add new points to each subset, respectively (Figure 4);
(3) Merge the subsets along the bottom line and top line of the convex hulls of the two subsets;
(4) According to the results obtained from step (3), perform step (6);
(5) Perform triangulation with the incremental algorithm to obtain the segmentation results;
(6) The final triangulation result is obtained.

Schematic diagram of the node addition method in Step (2): (a) new node P, (b) decide how to connect P to the other nodes, (c) remove side AB, and (d) form triangles.
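As noted above, a minimal sketch of the divide phase follows; `triangulateByInsertion` is a hypothetical placeholder for the point-by-point insertion of step (5), and the convex-hull merge of step (3) is elided:

```cpp
#include <algorithm>
#include <vector>

struct Pt { double x, y; };

// Hypothetical placeholder: each small pile would be triangulated by
// point-by-point insertion, and the partial meshes merged along the
// convex-hull bottom/top lines, as in steps (3) and (5).
void triangulateByInsertion(std::vector<Pt>& pile) { /* elided */ }

// Divide phase: recursively split the point set at the median x or y
// coordinate (alternating axes) until each pile is below `threshold`.
void dividePiles(std::vector<Pt>& pts, std::size_t threshold, bool byX = true)
{
    if (pts.size() <= threshold) {
        triangulateByInsertion(pts);
        return;
    }
    auto mid = pts.begin() + pts.size() / 2;
    std::nth_element(pts.begin(), mid, pts.end(),
                     [byX](const Pt& a, const Pt& b) {
                         return byX ? a.x < b.x : a.y < b.y;
                     });
    std::vector<Pt> left(pts.begin(), mid), right(mid, pts.end());
    dividePiles(left, threshold, !byX);   // recurse on each pile
    dividePiles(right, threshold, !byX);
}
```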
Experiments and results
Gradient testing experiment and result
The testing experiment for the proposed Geometric derivation depth estimation model using the Gradient dataset (

Defocusing image sequence with an even drop of depth.
The depth estimation graph of the paper image is shown in Figure 6. The comparison between the actual depth and the estimated depth is shown in Figure 7. The run time and RMS results of the testing experiment with different K values (the number of images in the sequence) are shown in Table 1.

Depth estimation result of the paper dataset.

Geometric derivation method depth estimation quantitative comparison.
Comparison of results of depth estimation methods based on Geometric derivation.
Surface reconstruction
Appropriate sequence defocused images and appropriate regions were selected for image registration, denoising, and other pre-processing. Here, image sequence

Defocusing image sequence after pre-processing with registration.
The depth estimation of the defocused image has good operation efficiency. The depth map of the corresponding full focus image is estimated, as shown in Figure 9.

Defocusing image depth estimation results.
According to the pre-processing method of point cloud data described, the three-dimensional point cloud is obtained after depth information processing. The distribution of the point cloud is shown in Figure 10.

Three-dimensional point cloud generated by depth information.
3D reconstruction results of all perspectives are shown in Figures 11 to 13.

3D reconstruction results from perspective 1.

3D reconstruction results from perspective 2.

3D reconstruction results from perspective 3.
Discussion
The results of 3D reconstruction are consistent with the original object surface, but the top is partially flattened. Several experiments showed that the defocused image sequence was not complete enough: during shooting, every point on the object's surface, from the highest to the lowest, must pass through the focal plane at least once. As shown in Figure 13, the problem arises when the images are captured manually: the dispersion circle has a minimum radius below which the human eye cannot detect blur, so a region that looks sharp to the eye may not truly be in focus, and in this case the focal plane did not actually start from the highest point of the surface.
We therefore improved the acquisition procedure: collection begins with the peak area blurred, brings it gradually into focus and then lets it blur again, and continues in this way until the lowest point comes into focus and then blurs in turn. With this procedure, the fitted surface matches the original walnut surface everywhere except at the top, and the surface texture is recovered more accurately. If the computer's capacity permits, omitting the downsampling of the 3D point cloud would give a finer surface subdivision; in theory a better-fitting surface could then be obtained and applied in the medical and industrial fields.
Conclusion
The improved image fusion method based on sparse decomposition alleviates the long training time of dictionary learning and sparse approximation, and in this application it yields considerably more accurate texture information. This paper mainly introduces the process of 3D reconstruction from 3D point clouds in microscopic scenes. First, the depth map is used to compute the corresponding three-dimensional point cloud. Second, the point cloud data are pre-processed: noise and distortion points are removed by appropriate filtering, and the point cloud is smoothed. Third, the processed 3D point cloud is downsampled to reduce the computational burden of surface reconstruction. The pre-processed 3D point cloud data are then triangulated with a Delaunay method based on plane projection and a synthesis algorithm. Finally, the 3D reconstruction of the micro scene is realized. The experimental results show that the reconstructed surface conforms to the surface contour information of the selected object.
However, during 3D reconstruction we found that some groups of data, such as those with reflection or occlusion, are not suitable for micro 3D reconstruction. In future work we will study thoroughly which conditions are unsuitable for micro 3D reconstruction and devise and experimentally verify strategies for these situations.
Acknowledgements
The authors express their sincere appreciation and profound gratitude to research assistants Botao Ma, Peng Wu, and Xuan Zhang for their help and support in collecting and sorting the data.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Sichuan Science and Technology Program (2021YFQ0003, 2019YJ0189).
