Abstract
Aligning natural images captured in unconstrained environments is a challenging task. Although existing feature-based methods handle complex deformation well, they may fail in low-textured regions where features are insufficient, while pixel-based approaches may fail when colors change. In this paper, a parametric chamfer alignment method based on a mesh warping model is proposed. The warped positions of mesh vertices are treated as parameters and estimated by optimizing an objective function that measures the chamfer distance between edges and the smoothness of the warp. To account for differences in edge sharpness, edges are detected through K-means clustering and weights are assigned to the different levels of edge points. In addition, after the warping model is initialized by feature-based alignment, a growing technique for registering the vertices is presented. Experiments show that the proposed method outperforms several state-of-the-art methods on real data and can be applied to stitching images of ceramic sanitary ware.
Introduction
Image alignment is a fundamental computer vision problem: finding a transformation that aligns two images captured from different views or at different times. It has been widely applied in visual measurement, such as pose detection, 1 precise positioning,2,3 and dimensional measurement. 4 Alignment of real-world images remains challenging due to factors such as non-rigid deformation, illumination change, noise, and low texture.
Feature-based alignment approaches rely on feature correspondences to estimate a transformation model. SIFT 5 is a classic feature-point descriptor. Global parametric models, such as homography, are appropriate when images are captured from a fixed viewpoint or the scene is planar. To handle more complex deformation, locally varying models6–8 and mesh-based content-preserving warps9–13 have been proposed in recent years. However, these methods are sensitive to the quality of the matched points: matching errors or insufficient features in low-textured regions may cause misalignment. Although line segment matching 11 can remedy this limitation to a certain extent, some weak features are still missed.
To address this problem, a mesh-based photometric alignment method 14 was proposed. It optimizes the mesh deformation model by minimizing pixel intensity differences and achieves better alignment quality than feature-based methods, especially for low-textured images. However, it is not robust to color variation, which may be caused by lighting conditions, camera exposure, viewing angle, or different modalities in unconstrained environments.
In this paper, we present an edge-based image alignment method, since edges are salient structures that are stable under illumination changes. The underlying edge registration technique is chamfer matching. 15 The geometric transformation parameters are optimized by minimizing an objective function formulated on the distance transform image. A global parametric model and a mesh warping model are adopted for simple and complex deformation, respectively. In addition, edge points are clustered into levels according to their sharpness, and weights are assigned to them in the cost function. The main contributions of this paper are as follows:
(1) A mesh-based chamfer alignment method, which is robust to images with low texture and color variation, is proposed;
(2) A cluster-based edge detection approach, which weights the contribution of different edge points in the objective function, is introduced;
(3) An optimization scheme for mesh warping, combined with feature-based alignment, is presented.
The rest of the paper is organized as follows. Section “Related works” reviews related work. Section “Parametric chamfer alignment” revisits parametric chamfer alignment solved by iterative optimization. Section “Mesh-based chamfer alignment” describes the proposed mesh-based chamfer alignment method in detail. Section “The experimental verification” gives the experimental results and comparisons with other approaches. The last section concludes the paper.
Related works
Image alignment approaches are mainly categorized into feature-based and pixel-based algorithms. A thorough review of early works can be found in Zitová and Flusser. 16 Feature-based methods estimate the transformation model from feature correspondences. Point features, such as SIFT, 5 and global models are the most common. To align images of multi-plane scenes, locally varying models were introduced. Gao et al. 6 proposed a dual-homography model for scenes containing a distant plane and a ground plane. Zaragoza et al. 8 presented an as-projective-as-possible (APAP) warp, which computes homography matrices for all local cells. Mesh-based warping models can also deal with complex transformation. Liu et al. 9 proposed the content-preserving warp (CPW), which computes the warps of grid vertices by solving a linear least-squares problem. However, the alignment accuracy of these methods depends on the quality and quantity of corresponding feature points. Mismatched pairs or insufficient features in low-textured regions may cause misalignment. To reduce the effect of mismatches, Zhang et al. 12 introduced an outlier rejection approach that fits local homographies and computes the residual errors. Li et al. 17 proposed a robust elastic warping (REW) method, which adopts the thin plate spline model and refines features iteratively based on a probabilistic model. Moreover, Guo et al. 18 introduced a grid-based feature tracker so that points cover the image densely and uniformly. To compensate for the shortage of keypoint features in low-texture scenes, Li et al. 11 proposed a dual-feature warping (DFW) model, which matches line segments and extends the mesh-based model by incorporating a line alignment term. Some recent methods12–14,19 also adopted a line-preserving constraint to achieve better results. However, this term relies on the line matching result and cannot ensure the alignment of curved lines.
Pixel-based methods find the motion parameters of images by measuring the photometric error of overlapping pixels. Full search, also called template matching, is a straightforward technique. The Sum of Squared Differences (SSD) and the Normalized Cross-Correlation (NCC) are the most widely used measures. Korman et al. 20 proposed a fast matching algorithm for affine transformation. Hel-Or et al. 21 introduced a matching scheme based on tone mapping, which is robust to noise and photometric variance. These methods can obtain the global optimum but are not suitable for complex deformation, since the computational cost increases exponentially with the number of parameters. Alternatively, the photometric error can be regarded as the objective function, and parameter optimization methods, such as Gauss-Newton and Levenberg-Marquardt, 22 can be used to locate a minimum. The classic approach is the Lucas-Kanade optical flow algorithm. 23 Tian and Narasimhan 24 proposed a data-driven iterative algorithm that converges to the global optimum for non-rigid distortions. Lin et al. 14 introduced a mesh-based photometric alignment method, which shows higher accuracy than feature-based approaches, especially for low-textured images. However, it assumes color consistency between images and thus suffers from illumination changes and noise. To improve the robustness to color variation, Chen et al. 19 proposed a local color mapping model for each mesh. However, it involves more parameters and makes the optimization more difficult.
Moreover, in recent years, some transformation parameter estimation approaches based on convolutional neural networks have been proposed to handle low texture and illumination change. DeTone et al. 25 and Nguyen et al. 26 presented supervised and unsupervised learning methods, respectively, to estimate global homography models. Ye et al. 27 introduced a deep meshflow model, but it suffers from high training cost.
Chamfer matching is a classic edge-based alignment method, first proposed by Barrow et al. 15 It is insensitive to small deformation and background disturbance, and thus has been used for object detection and recognition.28–30 In this paper, to tackle the challenges of aligning images captured in unconstrained environments with complex transformation, low texture, and color variation, we propose a mesh-based chamfer alignment method.
Parametric chamfer alignment
Chamfer matching 15 is a classical image alignment technique between edge maps. Let
where
The chamfer distance function can be efficiently computed through a distance transform image, which assigns each pixel as the distance to its nearest edge point in
Chamfer matching can tolerate small deformations, but is still sensitive to outliers in
The nonlinear least-squares problem can be solved via the Gauss-Newton method. Given an initial vector
where
where
Translation, Euclidean, affine, and projective transformations are typical global models. For complex deformation, parametric chamfer alignment is extended to mesh warping, which is introduced in the following section.
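As an illustration of the idea in this section, the following is a minimal sketch of parametric chamfer alignment for the simplest global model, a pure translation. It is not the authors' implementation. It assumes an unweighted sum of squared distance-transform values as the cost, a brute-force distance transform, numerical gradients of the distance transform, and a damped Gauss-Newton update that is accepted only when the cost decreases. The function names (`distance_transform`, `sample`, `chamfer_cost`, `align_translation`), the `damping` value, and the toy edge sets are all hypothetical.

```python
import math

def distance_transform(edges, width, height):
    """Brute-force Euclidean distance transform of a set of edge points."""
    return [[min(math.hypot(x - ex, y - ey) for ex, ey in edges)
             for x in range(width)] for y in range(height)]

def sample(dt, x, y):
    """Bilinearly interpolated lookup, clamped to the image border."""
    h, w = len(dt), len(dt[0])
    x = min(max(x, 0.0), w - 1.001)
    y = min(max(y, 0.0), h - 1.001)
    x0, y0 = int(x), int(y)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * dt[y0][x0] + ax * dt[y0][x0 + 1]
    bottom = (1 - ax) * dt[y0 + 1][x0] + ax * dt[y0 + 1][x0 + 1]
    return (1 - ay) * top + ay * bottom

def chamfer_cost(dt, points, p):
    """Sum of squared distance-transform values at the translated points."""
    return sum(sample(dt, x + p[0], y + p[1]) ** 2 for x, y in points)

def align_translation(dt, points, p=(0.0, 0.0), iters=30, damping=1e-3):
    """Damped Gauss-Newton on a translation p (assumed setup, see lead-in)."""
    px, py = p
    cost = chamfer_cost(dt, points, (px, py))
    for _ in range(iters):
        a11 = a12 = a22 = b1 = b2 = 0.0
        for x, y in points:
            u, v = x + px, y + py
            r = sample(dt, u, v)  # residual: distance to the nearest target edge
            gx = (sample(dt, u + 1, v) - sample(dt, u - 1, v)) / 2
            gy = (sample(dt, u, v + 1) - sample(dt, u, v - 1)) / 2
            a11 += gx * gx
            a12 += gx * gy
            a22 += gy * gy        # accumulates J^T J
            b1 += gx * r
            b2 += gy * r          # accumulates J^T r
        a11 += damping
        a22 += damping
        det = a11 * a22 - a12 * a12
        dx = -(a22 * b1 - a12 * b2) / det
        dy = -(a11 * b2 - a12 * b1) / det
        new_cost = chamfer_cost(dt, points, (px + dx, py + dy))
        if new_cost >= cost:      # the step did not reduce the cost: stop
            break
        px, py, cost = px + dx, py + dy, new_cost
    return (px, py), cost

# Toy data: the target edges form a square outline, and the reference edges
# are the same outline shifted by (-1, -1), so the true translation is (1, 1).
target = ([(x, 10) for x in range(10, 23)] + [(x, 22) for x in range(10, 23)]
          + [(10, y) for y in range(11, 22)] + [(22, y) for y in range(11, 22)])
reference = [(x - 1, y - 1) for x, y in target]
dt = distance_transform(target, 32, 32)
initial_cost = chamfer_cost(dt, reference, (0.0, 0.0))
p_hat, final_cost = align_translation(dt, reference)
```

Because the distance transform is built once, each cost evaluation is only a lookup per edge point. The accept-if-better guard is a design choice of this sketch: the distance transform is not smooth at the edges themselves, so an unguarded Gauss-Newton step can overshoot near the optimum.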
Mesh-based chamfer alignment
Clustered edge detection
Classical edge detection methods, such as the Canny detector, 32 usually first compute the gradient magnitude and then decide edges using thresholds. Low thresholds detect plenty of edges but are sensitive to noise and irrelevant features in the image. Conversely, high thresholds may miss weak edges. In addition, the appropriate thresholds depend to some extent on image intensity. Rather than selecting thresholds, all image pixels are partitioned into several levels according to their gradient magnitudes, where a higher level signifies sharper edges. The unsupervised

Figure 1. Edge detection and distance transform on images of the temple 6 database: (a) reference image, (b) target image, (c) weighted edge map of (a), and (d) inverse-color distance transform image of (b).
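The clustering step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' detector: gradient magnitudes come from central differences, K-means is run on the scalar magnitudes with evenly spaced initial centers, and the number of levels (`k=3`) and the weight assigned to each level (`level / (k - 1)`) are hypothetical choices, since the values used in the paper are not given in this text.

```python
import math

def gradient_magnitude(img):
    """Central-difference gradient magnitude; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2
            gy = (img[y + 1][x] - img[y - 1][x]) / 2
            mag[y][x] = math.hypot(gx, gy)
    return mag

def kmeans_1d(values, k, iters=50):
    """Lloyd's algorithm on scalars (k >= 2), evenly spaced initial centers."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        updated = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return centers

def edge_levels(img, k=3):
    """Level 0 is the flattest cluster; level k - 1 holds the sharpest edges."""
    mag = gradient_magnitude(img)
    centers = sorted(kmeans_1d([m for row in mag for m in row], k))
    return [[min(range(k), key=lambda c: abs(m - centers[c])) for m in row]
            for row in mag]

# Toy image: a weak step edge (0 -> 60) and a strong step edge (60 -> 200).
row = [0, 0, 0, 60, 60, 60, 60, 200, 200, 200]
image = [list(row) for _ in range(5)]
levels = edge_levels(image, k=3)
# Hypothetical weighting: sharper levels contribute more to the cost.
weights = [[lvl / 2 for lvl in r] for r in levels]
```

In this toy image the weak edge lands in the middle level and the strong edge in the top level, so no threshold has to be tuned to keep the weak edge.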
The distance transform image is also constructed on the clustered edge points. Initially, for each level of edge points, the distance transform
where
Mesh-based alignment
A mesh warping model is adopted for image alignment with complex deformation. Let the reference image
which is to be solved and guides the warping of the target image.

Figure 2. Point mapping interpolation.
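The point mapping can be illustrated with a short sketch. It assumes bilinear interpolation over the four vertices of the enclosing square mesh cell, which is the usual choice in mesh-based warps; the exact interpolation weights used in the paper are not recoverable from this text. The names `bilinear_weights` and `warp_point` and the vertex ordering (top-left, top-right, bottom-left, bottom-right) are hypothetical.

```python
def bilinear_weights(point, cell_origin, cell_size):
    """Weights of the cell's four vertices (TL, TR, BL, BR) for a point inside it."""
    u = (point[0] - cell_origin[0]) / cell_size
    v = (point[1] - cell_origin[1]) / cell_size
    return [(1 - u) * (1 - v), u * (1 - v), (1 - u) * v, u * v]

def warp_point(point, cell_origin, cell_size, warped_vertices):
    """Map a point by combining the warped vertex positions with fixed weights."""
    w = bilinear_weights(point, cell_origin, cell_size)
    x = sum(wi * vx for wi, (vx, _) in zip(w, warped_vertices))
    y = sum(wi * vy for wi, (_, vy) in zip(w, warped_vertices))
    return (x, y)

# Cell (0,0)-(10,10) whose four vertices have all been translated by (2, 3).
warped_cell = [(2, 3), (12, 3), (2, 13), (12, 13)]
center = warp_point((5, 5), (0, 0), 10, warped_cell)
off_center_weights = bilinear_weights((2.5, 7.5), (0, 0), 10)
```

The weights depend only on the point's position in the unwarped cell, so the warped position is linear in the vertex positions, which is what makes the vertices convenient optimization variables.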
The interpolation weights
where
where
where
where
where
where the increment is constrained in an interval
Combination with feature points
Since parametric chamfer alignment converges to a local optimum, the initial warping model determines the accuracy. Because feature detection and matching are robust in textured regions, the warped positions of vertices in our alignment process are initialized through a feature-based method and refined by chamfer matching. SIFT 5 is used to establish feature matches, and RANSAC 34 is then adopted to remove outliers. Next, the model is solved as an optimization problem whose alignment term is defined by the correspondence error and whose smoothness term is the one in (10).
Since the initial estimates of vertices surrounded by more feature points are likely to be more accurate, a growing technique is proposed for chamfer registration of the vertices. The major steps are as follows:
Step 1: Initialization. Each vertex is marked if at least 4 feature points lie in its related (4 nearest) meshes. Vertices in the convex hull of these marked ones contribute to the initial optimization variables in the alignment term
Step 2: Growing. This step expands the optimized vertices by dilating with
Step 3: If no additional vertices can be added, stop the algorithm. Otherwise, go to Step 2.
Algorithm 1 is used for optimization in each step. The procedure is illustrated in Figure 3. This technique ensures that accurate alignment propagates from textured areas to low-textured areas.
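The order in which vertices enter the optimization can be sketched as follows. This only reproduces the bookkeeping of Steps 1 to 3; the chamfer refinement itself is left as a comment. Several details are assumptions: the dilation uses a square window with a hypothetical `radius` of one vertex (the structuring element is not recoverable from this text), vertices on the hull boundary count as inside, and a degenerate hull falls back to the marked vertices themselves. All function names are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def initial_vertices(features, cols, rows, cell, min_features=4):
    """Step 1: mark well-supported vertices, then take all vertices in their hull."""
    counts = {}
    for x, y in features:
        c = (min(int(x // cell), cols - 1), min(int(y // cell), rows - 1))
        counts[c] = counts.get(c, 0) + 1
    grid = [(i, j) for j in range(rows + 1) for i in range(cols + 1)]
    marked = [(i, j) for i, j in grid
              if sum(counts.get((i - a, j - b), 0)
                     for a in (0, 1) for b in (0, 1)) >= min_features]
    hull = convex_hull(marked)
    if len(hull) < 3:  # degenerate hull: keep only the marked vertices
        return set(marked)
    n = len(hull)
    return {v for v in grid
            if all(cross(hull[k], hull[(k + 1) % n], v) >= 0 for k in range(n))}

def grow(active, cols, rows, radius=1):
    """Step 2: dilate the active vertex set by a square window."""
    grown = set(active)
    for i, j in active:
        for di in range(-radius, radius + 1):
            for dj in range(-radius, radius + 1):
                if 0 <= i + di <= cols and 0 <= j + dj <= rows:
                    grown.add((i + di, j + dj))
    return grown

def growing_schedule(features, cols, rows, cell, radius=1):
    """Return the successive sets of vertices to be optimized."""
    active = initial_vertices(features, cols, rows, cell)
    stages = [active]
    while active:
        # A full implementation would refine the active vertices here.
        grown = grow(active, cols, rows, radius)
        if grown == active:  # Step 3: no additional vertices can be added
            break
        stages.append(grown)
        active = grown
    return stages

# 5 x 5 cells of size 10; four features in cell (0, 0) and four in cell (2, 2).
features = [(2, 2), (3, 7), (7, 3), (8, 8), (22, 22), (23, 27), (27, 23), (28, 28)]
stages = growing_schedule(features, cols=5, rows=5, cell=10)
stage_sizes = [len(s) for s in stages]
```

In this toy layout the first stage contains the eight marked vertices plus two more that lie on the hull boundary, and two dilations then cover the whole 6 x 6 vertex grid.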

Figure 3. Growing procedure for the alignment of mesh vertices. Feature points are marked in blue. Vertices with at least 4 surrounding feature points are marked in yellow, and their convex hull is outlined in red. Green points denote the current optimization vertices.
The experimental verification
Parameter setting
All the parameters of the proposed mesh-based chamfer alignment (MChA) method are fixed in the experiment. The parameter
Quantitative evaluation
MChA is evaluated on real data, including image pairs and video sequences, as shown in Figure 4. Image pairs are collected from publicly available databases6,8,17,35,36 and are challenging because of multiple planes, repetitive patterns, and low-textured ground. Videos are collected from public datasets14,37 and YouTube (http://www.youtube.com/) and are categorized into regular videos, low-textured videos, and videos with color changes.

Figure 4. Dataset for quantitative evaluation: (a) image pairs and (b) videos (01-06 are regular videos, 07-09 are low-textured videos, and 10-12 are color-change videos).
MChA is compared with three state-of-the-art image alignment methods, APAP, 8 CPW, 9 and REW. 17 The alignment accuracy of two images or frames is evaluated using the measurement in Li et al. 11 The RMSE of one minus normalized cross correlation (NCC) over a neighborhood of
where
For each video, the dataset comprises all pairs of frames that are 5 frames apart, and the average RMSE is computed. For APAP and CPW, the size of each mesh or cell is set to the same value as in MChA. The RMSE results of the different methods on image pairs and videos are shown in Tables 1 and 2, respectively. MChA achieves significantly better alignment than the other three methods on image pairs and outperforms REW and CPW on most videos. Since the view differences between image pairs are larger than those between video frames, this indicates that MChA is superior to methods based on corresponding feature points for large deformation. Moreover, MChA performs better in the cases with low texture and color variation. However, REW is more robust than MChA on scenes with complicated texture (04-06), where excessively dense edges can have a negative effect on alignment.
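The accuracy measure can be sketched as follows. The exact formula is not recoverable from this text, so the sketch assumes the common form: for every pixel in the overlap, compute the NCC between the local patches of the reference and the warped target, and take the root mean square of one minus NCC. The patch half-width `half=1` (a 3 x 3 neighborhood), the choice to skip constant patches, the treatment of the whole image as the overlapping region, and the function names are hypothetical.

```python
import math

def ncc(a, b):
    """Normalized cross-correlation of two patches; None if a patch is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else None

def patch(img, cx, cy, half):
    """Flattened square patch of side 2 * half + 1 centered at (cx, cy)."""
    return [img[y][x] for y in range(cy - half, cy + half + 1)
            for x in range(cx - half, cx + half + 1)]

def ncc_rmse(ref, warped, half=1):
    """RMSE of (1 - NCC) over all interior pixels of two aligned images."""
    h, w = len(ref), len(ref[0])
    errors = []
    for cy in range(half, h - half):
        for cx in range(half, w - half):
            score = ncc(patch(ref, cx, cy, half), patch(warped, cx, cy, half))
            if score is not None:  # skip textureless patches
                errors.append((1.0 - score) ** 2)
    return math.sqrt(sum(errors) / len(errors)) if errors else 0.0

# Toy check: NCC ignores gain and bias but penalizes inverted structure.
ref = [[(7 * x + 13 * y + x * y) % 17 for x in range(8)] for y in range(8)]
brighter = [[2 * v + 10 for v in r] for r in ref]
inverted = [[255 - v for v in r] for r in ref]
same_score = ncc_rmse(ref, ref)
brighter_score = ncc_rmse(ref, brighter)
inverted_score = ncc_rmse(ref, inverted)
```

Because NCC is invariant to a linear change of intensity, this measure scores structural agreement rather than raw color agreement, which suits the color-change videos in the dataset.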
Table 1. RMSE scores on image pairs. Scores of the best methods are marked in bold.
Table 2. RMSE scores on videos. Scores of the best methods are marked in bold.
The implementations of APAP, CPW, and MChA are programmed in C++ and run on an Intel i7 7700HQ CPU with 8 GB of RAM. The runtime of MChA for an image pair with
Qualitative evaluation
To evaluate the alignment quality of MChA on low-textured images, we compare it with three methods: APAP, 8 REW, 17 and DFW. 11 APAP and REW rely on feature point correspondences, while DFW is based on dual features, namely points and line segments. Figure 5 shows the image stitching results on 4 databases. 11 The results of APAP and REW contain some misalignments, especially in regions with lines. DFW improves the quality but still has some fine errors. MChA resolves the misalignment in these cases.

Figure 5. Results on low-textured image stitching: (a) APAP, (b) REW, (c) DFW, and (d) MChA.
The image pairs for evaluating MChA on scenes with color variations are collected from Lin et al. 14 and the website, as shown in Figure 6(a) and (b). For comparison, the feature-based alignment method used for the initial estimation of mesh warping in MChA (see section “Combination with feature points”), named MFA, is evaluated. In addition, a pixel-based approach, named MPhA, which is similar to the method in Lin et al., 14 is also used for comparison. MPhA replaces the alignment term of MChA with the photometric error of pixels in the overlapping region
where

Figure 6. Comparison on color-variation image stitching: (a) reference images, (b) target images, (c) results of MFA, (d) results of MPhA, and (e) results of MChA.
In addition, we evaluate the proposed method on images of ceramic sanitary ware, which are challenging because of complex curved surfaces, low texture, and light spots. Figure 7 presents three examples. The feature-point-based methods, APAP and MFA, suffer from some edge misalignments. MChA shows better results and can be applied to synthesize multi-view images of sanitary ware for subsequent defect inspection.

Figure 7. Results of aligning ceramic sanitary ware images. Top to bottom: reference images, target images, results of APAP, MFA, and MChA.
Conclusion
A mesh-based parametric chamfer alignment method is proposed in this paper. Edges with different sharpness are detected and divided into levels through
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by National Key R&D Program of China (No.2018YFB1308400).
