
Abstract

Vision-guided telerobots are often used to execute tasks such as grasping and classification in various environments, which may contain unfamiliar objects that are not in the matching library. Hence, it is necessary to create new templates dynamically for the unfamiliar objects. However, this procedure is inconvenient for traditional template matching algorithms. In this article, a novel map-based normalized cross correlation algorithm is proposed. Map-based normalized cross correlation consists of two phases. In the learning phase, it dynamically creates a new template and map by the superpixel-based GrabCut method, which distinguishes it from previous template matching algorithms. In the matching phase, a map-based similarity evaluation is designed to determine the position and rotation angle of the object, where the map is used to eliminate the interference of the background. Various experiments demonstrate that the superpixel-based GrabCut method is more robust against noise than the traditional GrabCut algorithm and can separate the object from a texture-rich background with fewer iterations and less time. Additionally, the map-based normalized cross correlation algorithm locates objects in texture-rich images more accurately than the polar transformation and image pyramids normalized cross correlation algorithm, especially for irregularly shaped objects.

Keywords

Vision-guided telerobot, template matching, normalized cross correlation, superpixel, polar transformation

Introduction

A telerobot is a special type of robot that is widely applied in space stations, undersea detection, telemedicine facilities, and other remote control devices.^1,2 To improve the efficiency of the telerobot and reduce the workload of the operator, some smart systems have been designed and applied to tasks including force feedback teleoperation,^3–5 autonomous obstacle avoidance, and so on.^6,7

Vision-based positioning, an important machine vision technology, is increasingly applied in telerobot systems.^8–10 It enables a telerobot to grasp accurately and classify efficiently in an automatic manner. Pixel-based template matching is one of the most popular methods to determine the target position and rotation angle.^11–13

In the past decades, various pixel-based template matching algorithms have been investigated. These algorithms work as follows: given a template image $T$ and a target image $I$ , find the best match for $T$ in the image $I$ that has the minimum distortion or the maximum correlation.¹⁴ Global search (GS), as a fundamental technique, is time consuming.¹⁵ There are also other score functions, such as the sum of squared differences (SSD),¹⁶ the sum of absolute differences (SAD),¹⁷ the sequential similarity detection algorithm (SSDA),¹⁸ partial distortion elimination (PDE),¹⁹ and sorting-based partial distortion elimination (SB-PDE).²⁰ They use the absolute or Euclidean distance as the similarity measure, which results in poor matching accuracy and poor robustness against noise. To address this, the normalized cross correlation (NCC) algorithm,²¹ which compensates for both additive and multiplicative variations under uniform illumination changes,²² was proposed. To improve the computation efficiency, pyramid methods were proposed, which have been widely used in pixel-based template matching.^23,24

In addition, if the object in the target image is rotated with respect to the template, the polar transformation algorithm is used to reduce the computing load.^25–27 When a polar transformation is performed, a circular template is needed. However, there is an issue when a circular template is selected for an elongated or irregularly shaped object such as a bolt, rivet, or chip. If the incircle region is selected, the template will not contain all of the pixels belonging to the object. If the excircle region is selected, some pixels outside the object will be contained. Both situations lead to inaccurate matching. To solve this problem, some researchers proposed the region-based normalized cross correlation (RB-NCC) algorithm,²⁸ but this algorithm is not robust against texture-rich backgrounds and does not perform well, since it works with a rectangular bounding box based on the binary image.

Moreover, most traditional template matching algorithms work as depicted in Figure 1: the template image needs to be created manually from a reference image in the offline mode. This procedure is inconvenient, especially for irregularly shaped objects, so it is better to establish a library that contains all of the objects before matching. In fact, it is difficult to ensure that all objects are included in advance for a telerobot that often works in unfamiliar environments. Consequently, traditional template matching algorithms are not flexible enough for the telerobot. However, if massive tasks such as grasping and classification are carried out without the assistance of template matching algorithms, the workload of the operator increases considerably.

Figure 1.

Procedure for traditional template matching algorithms.

In this article, a novel template matching algorithm named map-based normalized cross correlation (MB-NCC) is proposed. Our contributions can be summarized as follows: (1) The MB-NCC algorithm creates a new template for an unfamiliar object dynamically, which distinguishes it from traditional template matching algorithms. It can take over massive tasks such as grasping and classification from the operator and thus reduce the operator's workload. (2) The SB-GrabCut method, which combines superpixels with the GrabCut algorithm,^29–34 is applied to create the template. It can separate the object from a texture-rich background with fewer iterations and less time than the traditional GrabCut algorithm. (3) A map is used in the similarity evaluation to eliminate the interference of the background, which makes MB-NCC more robust against noise.

The remainder of this article is organized as follows. The overview of MB-NCC algorithm is presented in section “The overview of MB-NCC algorithm.” Then, the approach of MB-NCC is introduced in section “The approach of MB-NCC.” Implementation details of MB-NCC are described in section “The implementation of MB-NCC” and relevant experiments are provided in section “Experimental results and discussions.” Finally, we conclude the article in section “Conclusion.”

The overview of MB-NCC algorithm

In this section, the overview of our MB-NCC algorithm is described. Figure 2 depicts the brief procedure for MB-NCC algorithm. MB-NCC is summarized into two phases, and both of them work in the online mode:

In the learning phase, the SB-GrabCut method is used to create the map and the template image. The polar-transformed template image $P_{T}$ and the map $P_{M}$ are dynamically created based on the excircle region of target image $T_{0}$ .

In the matching phase, the target image $I$ is separated into a set of sub-images $S_{I} : \{I_{1}, I_{2}, \dots, I_{k}, \dots, I_{K}\}$ , indexed by $k$ . An image pyramid is built for each sub-image $I_{k}$ . Finally, by applying the map-based similarity evaluation on the different layers of the image pyramids, the position and rotation angle of the object are determined.

Figure 2.

Brief procedure for our algorithm.

The approach of MB-NCC

According to the two phases as described in section “The overview of MB-NCC algorithm,” the approach of MB-NCC is divided into two parts: (1) the creation of map and template by SB-GrabCut method and (2) the map-based similarity evaluation.

SB-GrabCut

As mentioned in section “Introduction,” when polar transformation is performed, a circular template for the object will be used. For a square or round object, the circular template includes most of its pixels as shown in Figure 3(a) and (b). But when the shape of the object is elongated or irregular, there is an issue for the circular template selection. If the incircle region is selected as shown in Figure 3(c), the circular template cannot contain all of the pixels in object. And if the excircle region is selected as shown in Figure 3(d), then some pixels out of the object will also be contained. Both of these situations lead to inaccurate matching. In traditional template matching algorithms, such as the NCC algorithm using polar transformation and image pyramids, the creation of template image is finished in the offline mode. In this case, the useless pixels in circular template can be eliminated manually. However, this procedure is inconvenient for the dynamic creation of template. To overcome the shortcomings of the above method, a novel SB-GrabCut method is applied to create template based on the excircle region for the unfamiliar object by separating the object from background.

Figure 3.

Different shapes of objects: (a) circular template for a circular object, (b) circular template for a square object, (c) circular template based on the incircle region of the irregularly shaped object, and (d) circular template based on the excircle region of the irregularly shaped object.

SB-GrabCut is the most crucial component of MB-NCC. Briefly, SB-GrabCut separates the foreground, such as the object, from the background and yields the mask of the foreground. In MB-NCC, this mask is used to create the map and a template without background. SB-GrabCut is composed of two procedures: superpixel-based simplification and GrabCut-based segmentation.

Superpixel-based simplification works by gathering similar, neighboring pixels into the same block and then averaging them. Compared with the traditional K-means algorithm, our method simplifies the image based on local pixel blocks, which ensures that the pixels in the same region are more similar and the boundaries between different regions are smoother.

Considering the superpixel transformation as a histogram intersection and histogram similarity problem,³⁵ a novel function is proposed to evaluate the similarity between two histograms as described in equation (1), which is composed of two terms. $Int (A, B)$ is the histogram intersection distance and $Sim (A, B)$ is calculated from the weighted averages of the two histograms. When the weighted averages of the two histograms are identical, $Sim (A, B)$ takes on its peak value. In other cases, the function is lower. $γ$ is a constant that weighs the influence of each term. The combination of the two terms ensures that $H (A, B)$ reflects both the level of histogram intersection and the similarity of the values in the two histograms, which makes the boundaries smoother

H (A, B) = Int (A, B) + γ \cdot Sim (A, B)

(1)

Sim (A, B) = e^{- {| \bar{A} - \bar{B} |}_{abs}}

(2)

Let $M$ be the number of pixels in the image $I_{sub}$ which contains only one object. $I_{sub}$ is divided into some regular blocks, and all of the pixels are mapped to a superpixels’ set $S_{sub}$ which consists of $Q$ blocks without intersection

I_{sub} : \{1, \dots, m, \dots, M\} \to S_{sub} : \{B_{1}, \dots, B_{q}, \dots, B_{Q}\}

(3)

Let $H$ be the histogram of $B_{q}$ and $δ (\cdot)$ be the decision function, which returns 1 only when $B_{q} (x) \in H_{j}$ . $| B_{q} |$ is the total number of pixels in $B_{q}$ . $Int (g_{B_{q 1}}, g_{B_{q 2}})$ and $Sim (g_{B_{q 1}}, g_{B_{q 2}})$ can be calculated with $g_{B_{q}}$ , which is the normalized histogram in $B_{q}$

g_{B_{q}} (j) = \frac{1}{| B_{q} |} \sum_{x \in B_{q}} δ (B_{q} (x) \in H_{j})

(4)

Int (g_{B_{q 1}}, g_{B_{q 2}}) = \sum_{j} \min \{g_{B_{q 1}} (j), g_{B_{q 2}} (j)\}

(5)

Sim (g_{B_{q 1}}, g_{B_{q 2}}) = e^{- {| \sum_{j} j \cdot (g_{B_{q 1}} (j) - g_{B_{q 2}} (j)) |}_{abs}}

(6)
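Equations (1) and (4)–(6) can be sketched in a few lines of Python. This is only an illustrative sketch: the number of bins, the 256 grey levels, the value $γ = 1$ , and all function names are assumptions that do not come from the article.

```python
import math

def normalized_histogram(block, bins=8, levels=256):
    """Equation (4): fraction of the block's pixels that fall in each bin H_j.
    `bins` and `levels` are illustrative assumptions."""
    hist = [0.0] * bins
    for value in block:
        hist[min(value * bins // levels, bins - 1)] += 1.0
    return [count / len(block) for count in hist]

def intersection(g1, g2):
    """Equation (5): histogram intersection (1.0 for identical histograms)."""
    return sum(min(a, b) for a, b in zip(g1, g2))

def mean_similarity(g1, g2):
    """Equation (6): exp(-|difference of the weighted averages|).
    Bins are indexed from 0 here; the difference of the averages does not
    depend on the index origin because both histograms sum to 1."""
    diff = sum(j * (a - b) for j, (a, b) in enumerate(zip(g1, g2)))
    return math.exp(-abs(diff))

def similarity(g1, g2, gamma=1.0):
    """Equation (1): H = Int + gamma * Sim (gamma = 1 is an assumption)."""
    return intersection(g1, g2) + gamma * mean_similarity(g1, g2)

dark = normalized_histogram([10, 12, 20, 25])
also_dark = normalized_histogram([11, 15, 22, 30])
bright = normalized_histogram([200, 210, 220, 250])
print(similarity(dark, also_dark), similarity(dark, bright))
```

With these toy blocks, the two dark blocks share a single bin and reach the maximum of $1 + γ$ , whereas the dark and bright blocks have no overlapping bins and a large gap between their weighted averages, so $H$ is close to 0.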

Regarding $B_{q}$ as a parent block, every parent block can be divided into four sub-blocks as shown in Figure 4. However, some sub-blocks may be more similar to a block adjacent to their parent block than to the parent itself, so they should be assigned to a new parent. Let $B_{q}^{c}$ be a candidate sub-block to be moved from its parent block $B_{q}$ to the neighbor block $B_{k}$ . The similarity between $B_{q}^{c}$ and $B_{k}$ is evaluated by function $H$ . If $H (g_{B_{q}^{c}}, g_{B_{k}}) > H (g_{B_{q}^{c}}, g_{B_{q} \setminus B_{q}^{c}})$ , $B_{q}^{c}$ will be assigned to $B_{k}$ . Generally, there are different ways to define the neighborhoods. To make the updated boundaries smoother, the classical 4-neighbors mask is used, which can avoid splitting the block or creating sharp boundaries between the blocks as shown in Figure 5(a). Similarly, the sub-blocks are divided into quadrants and reassigned repeatedly until all of them have been divided into quadruples of pixels. At that time, all of the sub-blocks have been reassigned to a suitable parent and $B_{q}$ becomes a mature superpixel. Finally, the image is simplified by assigning to each pixel the average value of the superpixel to which it belongs.
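The final averaging step can be illustrated as follows. The sketch assumes that the superpixels are already available as an integer label map of the same size as the image; the function name and the toy data are hypothetical.

```python
def simplify(image, labels):
    """Replace every pixel by the mean value of the superpixel it belongs to.

    `image` and `labels` are equally sized lists of rows; `labels` holds one
    superpixel index per pixel (an assumed representation)."""
    totals, counts = {}, {}
    for row_px, row_lb in zip(image, labels):
        for value, label in zip(row_px, row_lb):
            totals[label] = totals.get(label, 0.0) + value
            counts[label] = counts.get(label, 0) + 1
    means = {label: totals[label] / counts[label] for label in totals}
    return [[means[label] for label in row_lb] for row_lb in labels]

image = [[10, 14, 200],
         [12, 16, 210]]
labels = [[0, 0, 1],
          [0, 0, 1]]
simplified = simplify(image, labels)
print(simplified)  # [[13.0, 13.0, 205.0], [13.0, 13.0, 205.0]]
```

The texture inside each superpixel disappears, while the step between the two superpixels is preserved.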

Figure 4.

The segmentation of partitioned image.

Figure 5.

Superpixels with different boundaries: (a) superpixels with sharp boundaries and (b) superpixels with smooth boundaries.

Figure 6 shows the effectiveness of superpixel-based simplification. It can be seen that similar pixels are gathered into the same superpixels. The simplification reduces the textures but clearly retains the boundaries between the object and background, which benefits the segmentation by the GrabCut algorithm.

Figure 6.

The effectiveness of superpixel-based simplification.

In the next procedure, the GrabCut algorithm is used to separate the object from the background. GrabCut is an image segmentation method based on the Gaussian mixture model (GMM). It relies on a Gibbs energy function defined as follows, which consists of one data term and one smoothness term

E (\underline{α}, k, \underline{β}, z) = U (\underline{α}, k, \underline{β}, z) + V (\underline{α}, z)

(7)

In the above equation, the image is considered as an array $z = \{z_{1}, \dots, z_{n}, \dots, z_{N}\}$ and the segmentation of the image is expressed as an array of “opacity” values $\underline{α} = \{α_{1}, \dots, α_{n}, \dots, α_{N}\}$ ( $0 \leq α_{n} \leq 1$ ), with 0 for background and 1 for foreground. Vector $k = \{k_{1}, \dots, k_{n}, \dots, k_{N}\}$ , with $k_{n} \in \{1, \dots, K\}$ , is introduced to deal with the GMM tractably. $\underline{β}$ is the parameter which represents foreground and background histogram distribution. The data term $U$ is defined taking account of the GMM, and the smoothness term $V$ is defined based on Euclidean distance in color space.

In our SB-GrabCut method, all of the pixels are divided into two sets $T_{b}$ and $T_{pf}$ , which contain the “background” and “possibly foreground” pixels, respectively. Briefly, GrabCut works in two phases. In the initialization phase, the pixels are assigned to the set $T_{pf}$ or $T_{b}$ . Two GMMs, corresponding to the foreground and background, are initialized based on the sets $T_{pf}$ and $T_{b}$ . In the iteration phase, a unique GMM component, from the background or the foreground model, is assigned to each pixel. Meanwhile, the GMM parameters are updated from data $z$ . As the iteration proceeds, each pixel is reassigned, and the sets $T_{pf}$ and $T_{b}$ are updated correspondingly. According to the definition of the energy function $E$ , when pixels from both the foreground and background are assigned to the same set, the segmentation is incorrect, and the data term $U$ and the smoothness term $V$ become higher. Conversely, when the pixels are assigned to the correct sets, the energy function $E$ becomes lower. The total energy $E$ is minimized by the min-cut algorithm. As a result, the sets $T_{b}$ and $T_{pf}$ are updated. Then, the pixels in the set $T_{pf}$ can be considered as the foreground.

Because the superpixel-based simplification makes the difference between the object and the background clearer, the value of the energy function $E$ becomes more sensitive to incorrect segmentation, and it becomes easier to reach the minimum energy $E$ . Hence, our SB-GrabCut method can separate the foreground from the background with fewer iterations and higher accuracy than the traditional GrabCut algorithm.

The map-based similarity evaluation

As mentioned in section “Introduction,” NCC is often used in the template matching. It is described as follows

S_{NCC} = \frac{\sum [(T - \bar{T}) \cdot (I - \bar{I})]}{\sqrt{\sum {(T - \bar{T})}^{2}} \cdot \sqrt{\sum {(I - \bar{I})}^{2}}}

(8)

$T$ is the template image and $I$ is the target image. When the template image and the target image exactly match, the NCC similarity score takes on a maximum value of 1. Otherwise, the score gets smaller until decreasing to 0, which means that the two images are completely uncorrelated.
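A minimal sketch of equation (8) is given below, operating on two flattened grey-level patches of equal size. The function name, the toy values, and the convention of returning 0 for a patch with zero variance are assumptions made for illustration.

```python
import math

def ncc(template, image):
    """Equation (8) for two equally sized, flattened grey-level patches."""
    t_mean = sum(template) / len(template)
    i_mean = sum(image) / len(image)
    num = sum((t - t_mean) * (i - i_mean) for t, i in zip(template, image))
    den = (math.sqrt(sum((t - t_mean) ** 2 for t in template))
           * math.sqrt(sum((i - i_mean) ** 2 for i in image)))
    # A constant patch has zero variance; 0.0 is returned by convention here.
    return num / den if den > 0 else 0.0

template = [10, 20, 30, 40]
brighter = [2 * t + 5 for t in template]  # gain 2 and offset 5
print(ncc(template, brighter))
```

Because the mean is subtracted and the result is divided by the norms, the gain and the offset cancel and the score stays at 1 (up to rounding), which is the compensation for additive and multiplicative illumination changes mentioned in section “Introduction.”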

However, there is an issue when NCC is used for matching polar-transformed images. If the polar-transformed target image is based on the excircle region of an irregularly shaped object as mentioned in section “SB-GrabCut,” it will contain both the object and the background, so the similarity evaluation using NCC will be disturbed by the background. In our MB-NCC, the map-based similarity evaluation is used to solve this issue as formulated in equation (9), where the map $M$ is used to restrict the region of matching. The map $M$ is a binary image with 1 for the foreground, which corresponds to the region of the object in the template $T$ created by the SB-GrabCut method. With the use of map $M$ , only the pixels of the target image $I$ that correspond to 1 in the map are taken into account by MB-NCC, which effectively avoids the interference of the background

\begin{matrix} S_{MB - NCC} = \frac{\sum [(T - \bar{T}) \cdot (I \cdot M - \overline{I \cdot M})]}{\sqrt{\sum {(T - \bar{T})}^{2}} \cdot \sqrt{\sum {(I \cdot M - \overline{I \cdot M})}^{2}}} \\ = \frac{\sum_{M (x, y) = 1} [(T (x, y) - \frac{\sum T}{\sum M}) \cdot (I (x, y) - \frac{\sum I \cdot M}{\sum M})]}{\sqrt{\sum_{M (x, y) = 1} {(T (x, y) - \frac{\sum T}{\sum M})}^{2}} \cdot \sqrt{\sum_{M (x, y) = 1} {(I (x, y) - \frac{\sum I \cdot M}{\sum M})}^{2}}} \end{matrix}

(9)
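The second form of equation (9) can be sketched as an NCC that only visits the pixels where $M (x, y) = 1$ . The function name, the flattened representation, and the toy values are assumptions for illustration.

```python
import math

def mb_ncc(template, image, mask):
    """Equation (9): NCC restricted to the pixels where the map M equals 1.
    All three arguments are flattened images of equal length."""
    t = [tv for tv, m in zip(template, mask) if m == 1]
    i = [iv for iv, m in zip(image, mask) if m == 1]
    if not t:
        return 0.0  # empty map: no pixel to compare (assumed convention)
    t_mean = sum(t) / len(t)   # equals sum(T) / sum(M) since T = I_sub * M
    i_mean = sum(i) / len(i)   # equals sum(I * M) / sum(M)
    num = sum((a - t_mean) * (b - i_mean) for a, b in zip(t, i))
    den = (math.sqrt(sum((a - t_mean) ** 2 for a in t))
           * math.sqrt(sum((b - i_mean) ** 2 for b in i)))
    return num / den if den > 0 else 0.0

mask = [1, 1, 1, 1, 0, 0]
template = [10, 20, 30, 40, 0, 0]     # background already zeroed by the map
target_a = [10, 20, 30, 40, 250, 3]   # same object, cluttered background
target_b = [10, 20, 30, 40, 0, 99]    # same object, different background
print(mb_ncc(template, target_a, mask), mb_ncc(template, target_b, mask))
```

Both targets yield the same score because the two background pixels never enter the sums, which is the behavior the map is meant to provide.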

The implementation of MB-NCC

As mentioned in section “The overview of MB-NCC algorithm,” MB-NCC algorithm is separated into two phases. The details of each step are provided in the following sections.

Learning phase: creation of map-based template

Generally, the learning phase is executed in the following four steps as described in Figure 7. In this phase, the operator controls the telerobot to grasp an object and put it aside, and then the polar-transformed template $P_{T}$ and map $P_{M}$ are created for the object by the SB-GrabCut method.

Figure 7.

Procedure for learning phase.

Mark the approximate region of object

An object in the current image $I_{cur}$ is selected and put aside by the telerobot in manual mode. When the motion is finished, another image $I_{end}$ , which no longer contains the object, is obtained. Two same-sized sub-images $I_{sub}$ and $I'_{sub}$ are cropped from these two images. Then, binarization, erosion, and dilation are performed to get an approximate region mask $I_{b}$ with 0 for background and 1 for foreground.
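One possible reading of this step is sketched below. It assumes that the binarization is a threshold on the absolute difference of the two crops, that the threshold is 30 grey levels, and that erosion and dilation each use a 3×3 neighborhood once; none of these choices is stated in the article.

```python
def binarize_difference(before, after, threshold=30):
    """1 where the two crops differ strongly, i.e. where the object used to be."""
    return [[1 if abs(b - a) > threshold else 0 for b, a in zip(rb, ra)]
            for rb, ra in zip(before, after)]

def _morph(mask, keep):
    """Apply `keep` (all -> erosion, any -> dilation) to each 3x3 neighborhood.
    Neighbors outside the image are simply ignored."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [mask[yy][xx]
                      for yy in range(y - 1, y + 2)
                      for xx in range(x - 1, x + 2)
                      if 0 <= yy < h and 0 <= xx < w]
            out[y][x] = 1 if keep(window) else 0
    return out

def approximate_region(before, after, threshold=30):
    """Binarization, then erosion, then dilation, giving a rough mask I_b."""
    binary = binarize_difference(before, after, threshold)
    return _morph(_morph(binary, all), any)

after = [[50] * 5 for _ in range(5)]   # crop without the object
before = [row[:] for row in after]     # crop with the object
for y in range(1, 4):
    for x in range(1, 4):
        before[y][x] = 200             # a 3x3 object
before[0][4] = 200                     # one isolated noisy pixel
rough_mask = approximate_region(before, after)
for row in rough_mask:
    print(row)
```

In this toy case the erosion removes the isolated pixel and the dilation restores the 3×3 object region.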

Optimize the region of object with superpixels

Superpixel-based simplification is applied to $I_{sub}$ . Since the superpixels and the simplified image retain the detailed boundaries between the object and background, the optimized mask $I_{sb}$ can be obtained by referring to them, so that the region mask follows the shape of the object more closely. If one of the pixels in $B_{q}$ corresponds to the foreground in $I_{b}$ , all of the pixels in superpixel $B_{q}$ will be regarded as foreground

I_{sb} (B_{q}) = \begin{cases} 1, & \text{if } \exists x \in B_{q} : I_{b} (x) = 1 \\ 0, & \text{otherwise} \end{cases}

(10)
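The rule of equation (10) can be sketched as follows, assuming again that the superpixels are given as an integer label map; the names and the toy data are hypothetical.

```python
def optimize_mask(rough_mask, labels):
    """Equation (10): a superpixel is foreground if any of its pixels is
    foreground in the rough mask I_b."""
    foreground = {label
                  for row_m, row_l in zip(rough_mask, labels)
                  for m, label in zip(row_m, row_l) if m == 1}
    return [[1 if label in foreground else 0 for label in row_l]
            for row_l in labels]

labels = [[0, 0, 1, 1],
          [0, 0, 1, 1],
          [2, 2, 3, 3]]
rough = [[0, 0, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 0]]
optimized = optimize_mask(rough, labels)
print(optimized)  # [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0]]
```

A single foreground pixel is enough to mark the whole superpixel with label 1, so the optimized mask inherits the superpixel boundaries.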

Acquire the map $M$ and template image $T$

In this step, SB-GrabCut is used for the separation of the object and background in $I_{sub}$ . In the initialization phase, the pixels that correspond to the foreground region in $I_{sb}$ are set as $T_{pf}$ , and others are set as $T_{b}$ . When the iteration is finished, $T_{pf}$ is used to create the binary map $M$ and the template image $T$

\begin{matrix} M (x, y) = \begin{cases} 1, & \text{if } (x, y) \in T_{pf} \\ 0, & \text{otherwise} \end{cases} \\ T = I_{sub} \cdot M \end{matrix}

(11)

Acquire polar-transformed image $P_{M}$ and $P_{T}$

In this step, polar transformation is performed for the map $M$ and template image $T$ based on the excircle regions of the objects. Then, polar-transformed images $P_{M}$ and $P_{T}$ are acquired for the matching.

Taking the map $M$ as an example, a pixel in the polar image $P_{M} (x_{P}, y_{P})$ is found in $M (x_{M}, y_{M})$ , and the mapping $P_{M} (x_{P}, y_{P}) \to M (x_{M}, y_{M})$ is derived as equation (12), where the constants $n_{r}$ and $n_{θ}$ denote the numbers of samples in the radial and the angular directions, respectively. The parameter $n_{r}$ should be less than or equal to $R_{\max}$ , and the parameter $n_{θ}$ should be less than or equal to $2 π \cdot R_{\max}$ . Larger values of $n_{r}$ or $n_{θ}$ will result in a higher resolution of the polar images

\begin{matrix} x_{M} = round (R_{\max} + R \cdot \cos θ) \\ y_{M} = round (R_{\max} + R \cdot \sin θ) \\ θ = (2 π / n_{θ}) \cdot x_{P}, x_{P} \in (0, n_{θ}] \\ R = (R_{\max} / n_{r}) \cdot y_{P}, y_{P} \in (0, n_{r}] \end{matrix}

(12)
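The mapping of equation (12) can be sketched with nearest-neighbor sampling as follows. The sketch assumes a square input of side $2 R_{\max} + 1$ centered on the object, and it stores the angular samples as columns and the radial samples as rows; the function name is hypothetical.

```python
import math

def polar_transform(image, n_r, n_theta):
    """Sample a square image of side 2 * R_max + 1 on a polar grid.

    Returns n_r rows (radius) by n_theta columns (angle). Python's round()
    rounds halves to the nearest even integer, which may differ from the
    rounding used by the authors."""
    r_max = (len(image) - 1) // 2
    polar = [[0] * n_theta for _ in range(n_r)]
    for y_p in range(1, n_r + 1):
        radius = (r_max / n_r) * y_p
        for x_p in range(1, n_theta + 1):
            theta = (2 * math.pi / n_theta) * x_p
            x_m = round(r_max + radius * math.cos(theta))
            y_m = round(r_max + radius * math.sin(theta))
            polar[y_p - 1][x_p - 1] = image[y_m][x_m]
    return polar

# 5x5 test image (R_max = 2) whose pixel value encodes its position: 10*y + x
image = [[10 * y + x for x in range(5)] for y in range(5)]
polar = polar_transform(image, n_r=2, n_theta=4)
print(polar)  # [[32, 21, 12, 23], [42, 20, 2, 24]]
```

Each row of the result walks around one circle of the input, so a rotation of the object becomes a circular shift of the columns.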

Matching phase: matching with MB-NCC

In this phase, the target image $I$ is segmented into a set of sub-images $S_{I} : \{I_{1}, I_{2}, \dots, I_{k}, \dots, I_{K}\}$ , which overlap each other to avoid missed matches. Then, a pyramid transformation is performed for $I_{k}$ . By sliding the window, the region under the window is transformed from Cartesian coordinates to polar coordinates by equation (12) to obtain the polar-transformed image $P_{k}$ . Regarding $P_{k}$ as a set of vectors $p_{r}$ , it can be described as $P_{k} = [p_{1}, p_{2} \dots p_{r} \dots p_{R}]$ . For the convenience of calculation, $P_{k}$ is extended to $P_{2 k}$ . Next, a successive part of $P_{2 k}$ , starting from $p_{r}$ and ending with $p_{r + R}$ , is selected for the matching

\begin{matrix} P_{2 k} = [P_{k}, P_{k}] = [p_{1}, p_{2} \dots p_{r} \dots p_{r + R} \dots p_{2 R}] \\ P_{2 k}^{r} = [p_{r}, p_{r + 1} \dots p_{r + R}], (1 \leq r \leq R) \end{matrix}

(13)

The correlation between $P_{T}$ and $P_{2 k}^{r}$ is calculated according to the MB-NCC similarity score formulated in equation (9). With $r$ increasing from 1 to $R$ , the rotation angle varies from 0 to $2 π$ . The searching algorithm checks each sub-image $I_{k}$ with MB-NCC similarity score. The position and angle of the object in the target image are obtained eventually according to the peak value of score and the coordinate of sub-image $I_{k}$ .
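The rotation search can be sketched as a circular shift over the angular columns of the polar image. The sketch below is an illustration under several assumptions: the polar images are lists of rows whose columns are the angular samples, offsets are 0-based, each window spans exactly as many columns as the template, and the function names are hypothetical.

```python
import math

def masked_ncc(template, window, mask):
    """MB-NCC score of equation (9) on flattened images.
    Assumes the mask contains at least one 1."""
    t = [a for a, m in zip(template, mask) if m == 1]
    w = [b for b, m in zip(window, mask) if m == 1]
    t_mean, w_mean = sum(t) / len(t), sum(w) / len(w)
    num = sum((a - t_mean) * (b - w_mean) for a, b in zip(t, w))
    den = math.sqrt(sum((a - t_mean) ** 2 for a in t)
                    * sum((b - w_mean) ** 2 for b in w))
    return num / den if den > 0 else 0.0

def best_rotation(p_t, p_m, p_k):
    """Slide a window over P_2k = [P_k, P_k] and keep the best-scoring offset.
    Returns (score, angular offset in radians)."""
    n_theta = len(p_k[0])
    doubled = [row + row for row in p_k]
    flat_t = [v for row in p_t for v in row]
    flat_m = [v for row in p_m for v in row]
    best_score, best_shift = -2.0, 0
    for r in range(n_theta):
        window = [v for row in doubled for v in row[r:r + n_theta]]
        score = masked_ncc(flat_t, window, flat_m)
        if score > best_score:
            best_score, best_shift = score, r
    return best_score, 2 * math.pi * best_shift / n_theta

p_t = [[10, 80, 30, 60, 20, 90],
       [5, 15, 70, 40, 25, 55]]
p_m = [[1, 1, 1, 1, 1, 0],
       [1, 1, 1, 1, 0, 0]]
# Target: the template circularly shifted by two angular samples.
p_k = [row[-2:] + row[:-2] for row in p_t]
score, angle = best_rotation(p_t, p_m, p_k)
print(score, angle)
```

With six angular samples, the shift of two columns is recovered as an offset of $2 π / 3$ ; the sign convention that relates this offset to the physical rotation direction depends on the camera setup and is not fixed by this sketch.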

Experimental results and discussions

In this section, various experiments using different images were conducted to evaluate the effectiveness of the proposed template matching algorithm. The telerobot system is depicted in Figure 8. It is composed of a 6-degree-of-freedom (DOF) robot with a force feedback holder and a multi-axis force sensor, two cameras, a PHANTOM 7-DOF force feedback controller, and a PC. Camera1, which is perpendicular to the platform, is used to take the target image, and Camera2 is used to monitor the robot. The force feedback controller is connected to the PC. In the learning phase of MB-NCC, it is used to control the telerobot to grasp the object from which the template and map are created.

Figure 8.

The telerobot system for experiments: (a) 6-DOF robot, platform, and cameras and (b) 7-DOF controller and PC.

All experiments were done using a PC with an Intel Core i5-4460 CPU operating at 3.20 GHz with 8 GB of memory and running on the Windows 7 Professional 64-bit OS. We do not use any GPU or dedicated hardware.

Effectiveness test for SB-GrabCut method

We compare the performance of our SB-GrabCut method with the traditional GrabCut algorithm when they are applied to crop the objects from texture-rich background. We try to crop objects and create maps by our SB-GrabCut method and the traditional GrabCut algorithm as shown in Figure 9.

Figure 9.

Sample image and some different methods: (a) a sample image, (b) SB-GrabCut initialized by superpixels’ region mask $I_{sb}$ , (c) GrabCut initialized by the region mask $I_{b}$ , (d) GrabCut initialized by a rectangular region. Pixels in the green rectangle are set as $T_{pf}$ , and others are set as $T_{b}$ and (e) GrabCut initialized by manual scribbles. Blue scribbles mark background and red scribbles mark foreground.

Six objects are cropped from different images and the results are presented in Figure 10. Additionally, the standard map $M_{std}$ is created by cropping the objects manually to give a reference on the evaluation of the methods. The maps created by our SB-GrabCut method contain most of the boundaries between the object and background, which ensures that there are no obvious differences between them and the standard maps. Conversely, the maps created by some other methods cannot keep concordance with the shape of objects as shown in Figure 9(c) and (d).

Figure 10.

Objects and their maps created by different methods: (a) objects with different backgrounds and (b)–(e) maps of the objects created by the methods described in Figure 9(b)–(e).

Considering that the map is a binary image, we take the absolute difference between the maps to evaluate the accuracy of the created map as described in equation (14), where $| M_{std} |$ means the total number of pixels in $M_{std}$

Accuracy = \frac{\sum {| M - M_{std} |}_{abs}}{| M_{std} |} \times 100 \%

(14)
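Equation (14), taken exactly as written, can be computed as below; the function name and the toy maps are hypothetical. As written, the expression is the percentage of pixels in which $M$ and $M_{std}$ disagree, so it evaluates to 0% for identical maps.

```python
def map_difference_percent(m, m_std):
    """Equation (14) as written: disagreeing pixels over |M_std|, in percent."""
    flat = [v for row in m for v in row]
    flat_std = [v for row in m_std for v in row]
    return 100.0 * sum(abs(a - b) for a, b in zip(flat, flat_std)) / len(flat_std)

m_std = [[0, 1, 1, 0],
         [0, 1, 1, 0]]
m = [[0, 1, 1, 0],
     [0, 1, 0, 0]]   # one of eight pixels differs
print(map_difference_percent(m, m_std))  # 12.5
```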

The relationships between the accuracy and the number of iterations, and between the accuracy and the time consumption of the different methods, are recorded in Figure 11. We conclude that our SB-GrabCut method achieves higher accuracy with fewer iterations and less time. In the contrast experiments with the six objects, our SB-GrabCut method creates the maps within two iterations, which takes less than 10 s. The reason is that our SB-GrabCut method has simplified the target image. This process reduces the texture but retains the boundaries between the object and background. In other words, the boundaries are more obvious than before, and it becomes easier to minimize the total energy $E$ as the iteration proceeds.

Figure 11.

Results of compared tests for different methods about the creation of map: (a) the relationship between the iteration times and the accuracy of created maps and (b) the relationship between the time consumptions of different methods and the accuracy of created maps.

It is worth mentioning that although the maps created by the GrabCut algorithm with manual scribbles are more similar to the standard maps, the process takes much more time than the others. Most of the time is spent selecting the objects and drawing the manual scribbles. Moreover, the scribbles need to be redrawn after each iteration to rectify the segmentation, and many iterations are needed to get the best result, which is inconvenient for the dynamic creation of templates.

Accuracy test for MB-NCC and PP-NCC algorithm

We compare the performance of the polar transformation and image pyramids normalized cross correlation (PP-NCC) algorithm with that of our proposed MB-NCC algorithm when they are applied to the matching of square object, elongated object, and irregularly shaped object.

Firstly, some contrast experiments between PP-NCC using the circular template based on the excircle or incircle region and our MB-NCC using the map-based template are performed. As shown in Figure 12, three kinds of objects are sought in several different images using the two algorithms. The results are recorded in Figure 13. For the images shown in Figure 12(a)–(f), which contain the same square object in different backgrounds, there is no obvious difference between the results of the two algorithms. For the square object, PP-NCC can work with the template based on the incircle region since it contains most of the pixels of the object, which avoids the interference of the background during matching. Nevertheless, the effectiveness of the two algorithms differs for the images in Figure 12(g)–(r), where a good matching algorithm should still return a high similarity score. In detail, when the shape of the object is complex, PP-NCC becomes unreliable, but our MB-NCC still finds the object accurately in texture-rich images.

Figure 12.

Three kinds of objects in different backgrounds: (a)–(f) a square object in different backgrounds, (g)–(l) an elongated object in different backgrounds, and (m)–(r) an irregularly shaped object in different backgrounds.

Figure 13.

Results of tests for the objects shown in Figure 12.

Then, the images in Figure 14(a) and (b) are rotated from 0° to 200° and the similarity scores are calculated to evaluate the tolerance of rotating deviation. Figure 15 shows that the similarity score calculated by MB-NCC decreases more sharply than that of PP-NCC when a rotating deviation occurs. PP-NCC, on the other hand, does not behave steadily: there is another peak value when the rotating deviation reaches 180°. This means that MB-NCC can achieve a higher accuracy in the estimation of the rotation angle.

Figure 14.

A texture-rich target image containing one object.

Figure 15.

Result of rotating deviation test for Figure 14(a) and (b) by PP-NCC and MB-NCC: (a) the result corresponding to Figure 14(a) and (b) the result corresponding to Figure 14(b).

Finally, a contrast experiment aiming to evaluate the robustness of MB-NCC is implemented. As shown in Figure 14(c), there is a texture-rich image that contains one object near a piece of wastepaper. The two algorithms are used to seek the object. The maximum similarity score for every coordinate in the target image is recorded as the rotation angle ranges from 0° to 360°. It can be seen from Figure 16(a) that there are two peak similarity scores. Moreover, the higher peak value corresponds to the coordinate of the wastepaper. This means that PP-NCC will return an incorrect position when matching in a texture-rich image. In contrast, it is easy to obtain the accurate coordinate of the object from the similarity scores of MB-NCC. As shown in Figure 16(b), there is only one peak similarity score as the coordinate moves. The peak value is distinctly higher than the others and the change of the similarity score is clearly sharper than that of PP-NCC, both of which are beneficial to the matching.

Figure 16.

Similarity scores obtained by PP-NCC and MB-NCC: (a) similarity scores obtained by PP-NCC with the circular template based on excircle region and (b) similarity scores obtained by MB-NCC with the map-based template.

In summary, our MB-NCC is more reliable than the traditional PP-NCC, especially when it is used for the matching in texture-rich image. The reason is that our MB-NCC algorithm can create a map-based template and eliminate the interference of the background effectively. However, when PP-NCC algorithm is used, the contribution of the background pixels to the matching remains substantial. In practice, a similarity score threshold is established before the matching. Since the similarity score acquired by MB-NCC can distinguish the object from background more effectively, it becomes easy to establish a suitable threshold.

Conclusion

We proposed a novel MB-NCC algorithm. Different from previous template matching algorithms, our MB-NCC algorithm creates a new template and map for the unfamiliar object dynamically. The procedure of MB-NCC consists of two phases. In the learning phase, an SB-GrabCut method is applied to create the template image and map by separating the object from the background. In the matching phase, a map-based similarity evaluation is designed to determine the position and rotation angle of the object, where the map is used to eliminate the interference of the background. The experimental results demonstrate that our SB-GrabCut method is more robust against noise than the traditional GrabCut algorithm. It can separate the object from a texture-rich background with fewer iterations and less time. Moreover, our MB-NCC algorithm can locate objects in texture-rich images more effectively than the PP-NCC algorithm, especially for irregularly shaped objects.

When our MB-NCC is used for vision-guided telerobot, it can replace operator to finish the massive tasks, such as grasping and classification, and release the workload of the operator. Since MB-NCC proposed in this article is robust against noise and immune to the interference of the background, it could be in wider application.

Footnotes

Acknowledgements

The authors sincerely thank the editor and all the anonymous reviewers for their valuable comments and suggestions.

Academic Editor: Chenguang Yang

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Key Research and Development Program of China under grant number 2016YFB1001301, the Natural Science Foundation of China under grant number 61325018, and the Technique Support Project of Jiangsu Province under grant number BK2014132.

References

1. Sheridan. Teleoperation, telerobotics and telepresence: a progress report. Control Eng Pract 1995; 3: 205–214.
2. Ding, Hou, Xue, et al. Coordinated motion control model of a six-wheeled rocker lunar rover. Adv Mech Eng 2016; 8: 1–17.
3. Song, et al. Real-time obstacle avoidance for telerobotic systems based on equipotential surface. Int J Adv Robot Syst 2012; 9: 71.
4. Fong, Rochlis Zumbado, Currie, et al. Space telerobotics: unique challenges to human–robot collaboration in space. Rev Hum Factor Erg 2013; 9: 6–56.
5. Song. A 7 DOF force feedback hand controller measurement and control system. Measure Control Technol 2013; 32: 70–79.
6. Bhatia, Uchiyama. A VR-human interface for assisting human input in path planning for telerobots. Presence: Teleop Virt 1999; 8: 332–354.
7. Cao, Wang, Feng, et al. Study on real-time 3D monitoring of telerobot movement. Chin J Sci Instrum 2010; 31: 727–735.
8. Al-Mouhamed, Toker, Al-Harthy. A 3-D vision-based man-machine interface for hand-controlled telerobot. IEEE T Ind Electron 2005; 52: 306–319.
9. Brem, Nandhakumar. A machine vision system for enhancing the teleoperation of an industrial robot. Mach Vision Appl 1994; 7: 187–198.
10. Qureshi, Terzopoulos. Intelligent perception and control for space robotics. Mach Vision Appl 2008; 19: 141–161.
11. Hel-Or, David. Matching by tone mapping: photometric invariant template matching. IEEE T Pattern Anal 2014; 36: 317–330.
12. Di Stefano, Mattoccia. Fast template matching using bounded partial correlation. Mach Vision Appl 2003; 13: 213–221.
13. Mattoccia, Tombari, Stefano. Fast full-search equivalent template matching by enhanced bounded correlation. IEEE T Image Process 2008; 17: 528–538.
14. Mahmood, Khan. Correlation-coefficient-based fast template matching through partial elimination. IEEE T Image Process 2012; 21: 2099–2108.
15. Brunig, Niehsen. Fast full-search block matching. IEEE T Circ Syst Vid 2001; 11: 241–247.
16. Nickels, Hutchinson. Estimating uncertainty in SSD-based feature tracking. Image Vision Comput 2002; 20: 47–58.
17. Ouyang, Tombari, Mattoccia, et al. Performance evaluation of full search equivalent pattern matching algorithms. IEEE T Pattern Anal 2012; 34: 127–143.
18. Barnea, Silverman. A class of algorithms for fast digital image registration. IEEE T Comput 1972; 100: 179–186.
19. Bei, Gray. An improvement of the minimum distortion encoding algorithm for vector quantization. IEEE T Commun 1985; 33: 1132–1133.
20. Choi, Jeong. New sorting-based partial distortion elimination algorithm for fast optimal motion estimation. IEEE T Consum Electr 2009; 55: 2335–2340.
21. Wei, Lai. Fast template matching based on normalized cross correlation with adaptive multilevel winner update. IEEE T Image Process 2008; 17: 2227–2235.
22. Luo, Konofagou. A fast normalized cross-correlation calculation method for motion estimation. IEEE T Ultrason Ferr 2010; 57: 1347–1357.
23. Adelson, Anderson, Bergen, et al. Pyramid methods in image processing. RCA Eng 1984; 29: 33–41.
24. Rodrigues, Bhise. Steganography using error-correcting code in image pyramid. Int J Global Technol Initiat 2015; 4: B146–B152.
25. Gong, Liu, Zheng, et al. Flexible multiple-image encryption algorithm based on log-polar transform and double random phase encoding technique. J Mod Optic 2013; 60: 1074–1082.
26. Sarvaiya, Patnaik, Kothari. Image registration using log polar transform and phase correlation to recover higher scale. J Pattern Recogn Res 2012; 7: 90–105.
27. Traver, Bernardino. A review of log-polar imaging for visual perception in robotics. Robot Auton Syst 2010; 58: 378–398.
28. Zhang, Yang, Yin. A region-based normalized cross correlation algorithm for the vision-based positioning of elongated IC chips. IEEE T Semiconduct M 2015; 28: 345–352.
29. Zhang, Verma. Superpixel-based class-semantic texton occurrences for natural roadside vegetation segmentation. Mach Vision Appl 2017; 28: 293–311.
30. Van, Boix, Roig, et al. Seeds: superpixels extracted via energy-driven sampling. Int J Comput Vision 2015; 111: 298–314.
31. Achanta, Shaji, Smith, et al. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE T Pattern Anal 2012; 34: 2274–2282.
32. Kim, Hong. A new graph cut-based multiple active contour algorithm without initial contours and seed points. Mach Vision Appl 2008; 19: 181–193.
33. Rother, Kolmogorov, Blake. GrabCut: interactive foreground extraction using iterated graph cuts. ACM T Graphic 2004; 23: 309–314.
34. Khattab, Ebied, Hussein, et al. Color image segmentation based on different color space models using automatic GrabCut. Sci World J 2014; 2014: 126025.
35. Lee, Xin, Westland. Evaluation of image similarity by histogram intersection. Color Res Appl 2005; 30: 265–274.

A map-based normalized cross correlation algorithm using dynamic template for vision-guided telerobot
