Abstract
Shape recognition is a classically difficult problem because of the affine transformation between two shapes. The current study proposes an affine parameter estimation method for shape recognition based on a genetic algorithm (GA). The contributions of this study are focused on the extraction of affine-invariant features, the individual encoding scheme, and the fitness function construction policy for a GA. First, the affine-invariant characteristics of the centroid distance ratios (CDRs) of any two opposite contour points to the barycentre are analysed. Using different intervals along the azimuth angle, the different numbers of CDRs of two candidate shapes are computed as representations of the shapes, respectively. Then, the CDRs are selected based on predesigned affine parameters to construct the fitness function. After that, a GA is used to search for the affine parameters with optimal matching between candidate shapes, which serve as actual descriptions of the affine transformation between the shapes. Finally, the CDRs are resampled based on the estimated parameters to evaluate the similarity of the shapes for classification. The experimental results demonstrate the robust performance of the proposed method in shape recognition with translation, scaling, rotation and distortion.
Keywords
Introduction
Shape recognition is one of the most challenging tasks for pattern recognition with affine transformation among different images. The fundamental work of shape recognition is the extraction of affine invariant features that fit the variations in viewpoints. Researchers have proposed many shape descriptors in the past decades, including the Fourier descriptor [1, 2]. Other image processing approaches, such as principal component analysis (PCA) [3, 4], independent component analysis (ICA) [5, 6], Wavelet Transformation [7, 8], and Neural network [9, 10], have been used to build representations of shapes. Mei and Androutsos [4] proposed two affine-invariant shape descriptors, namely, the ICA-Fourier and the PCA-Fourier descriptors, which gave satisfying performances in shape-based silhouette image retrieval. Bala and Cetin [8] developed an affine-invariant function for object recognition based on the decimated wavelet transform, which decreased computational complexity without decreasing recognition performance. The blind source separation (BSS) [11] is an efficient algorithm because of its valuable characteristics in extracting affine invariants from object boundaries.
More research has recently focused on multiple-resolution techniques to improve accuracy and robustness in response to the influences of noise and occlusion [12–14]. Thourn, Kitjaidure and Kondo proposed a new multi-resolution technique in the spatial domain called the multi-level of barycentre contour [12], which can reduce noises caused by deformation and distortion. However, the multi-level of barycentre contour still contains some defects. First, since the triangular area of two adjacent boundary points and the barycentre are used as a shape descriptor for matching, errors occur when the corresponding points are not found in both images with high distortions. In addition, the multi-level of barycentre contour might not work properly if the contour is broken because of occlusion. Lihua Zhang, Wenli Xu and Cheng Chang proposed an affine point pattern matching method based on a genetic algorithm [15], which was dependent on the two point sets corresponding to two affine transformed images. Since the proper matching points are difficult to extract, this method is limited with respect to its real applications.
Considering that lots of effective contour extraction algorithms can be applied, the shape of the target is more likely to be achieved than the point set. Thus, our research focuses on the shape identification. The basic idea is: Firstly, the affine transformation parameters between two shapes are estimated. Then, the matching points in the two shapes are directly obtained according to the transform parameters. Finally, the shapes are classified based on the degree of similarity between the two matching point sets. Nevertheless, estimating the affine parameters among the shapes remains a challenge. T. IZUMI et al. proposed a method using a Karhunen–Loeve (KL) expansion, spatial correlation, and a correction equation based on Taylor's expansion of the transform to address the problem described above [16]. Trace transform is also used to estimate the parameters between two images affinely distorted from each other [17].
In our study, the affine parameters are estimated based on a genetic algorithm (GA). The centroid distance ratios (CDRs) of any two opposite contour points to the barycentre are obtained as the representation of the shapes, which are invariant to the affine transformation, to construct the fitness function for the GA. The parameters derived from the iteration results of the GA are regarded as real affine parameters for building the corresponding point pairs. Furthermore, the same method of similarity evaluation used for the fitness function is used for shape classification.
The remainder of this paper is organized as follows: Section 2 analyses the proposed extraction method of affine-invariant features. Section 3 introduces the details of the affine parameter estimation based on GA, which consists of the constraint condition designation and the fitness function construction. Section 4 presents the experimental results of the proposed algorithm. Finally, section 5 draws some conclusions and discusses future works.
Affine invariant features extraction
In this section, the characteristics and the definitions of the parameters of affine transformation are discussed. Then, the concept of CDRs is introduced, and the affine invariant property of CDRs is demonstrated. Finally, the fact that CDRs corresponding to two candidate shapes can be resampled using the affine parameters to construct matching pairs for similarity evaluation is analysed.
Affine transformation and the definitions of its parameters
The general affine transformation can be mathematically represented as follows:
where
where Sx and Sy are the scaling parameters, shx and shy are the shear parameters, and θ is the rotation angle. Thus, a1, a2, a3, and a4 can be expressed as follows:
Figure 1 shows two bird shapes that were defined as being affinely related with each other. Points P0: [
x
0, y0]
T
and

Two contour points chosen for CDR computation
where N is the number of contour points and [x
k
, y
k
]
T
represents the coordinates of any contour point. Contour points
Similarly, distance
Suppose points P1 and P2 are located opposite to the barycentre P0, and these three points are located on a straight line. According to the properties of affine transformation mentioned in section 2.1, points
Equation (7) shows that the CDR derived from two opposite contour points to the barycentre will be invariant to affine transformation, and thus, it can be adopted as a feature of shape representation.
Although CDRs are invariant to affine transformation, some issues need to be further analysed for real applications. As mentioned above, the equivalent relations of equation (7) are constructed under the following constraints:
Points P1 and P2 are located opposite the barycentre P0.
Points
The original point of the coordinate system can be translated to the barycentre P0 to address the first problem. Thus, the azimuth angles of P1 and P2 will have a difference of 180°. This property can be applied to construct all pairs of corresponding points for computing CDRs. On the other hand, the second problem is difficult to solve because the affine parameters between the candidate shapes cannot be obtained directly.
Fortunately, the equivalent relation of CDRs expressed in equation (6) can be applied to estimate the affine parameters, as shown in Figure 2.

Affine parameter estimation method based on GA
First, multiple contour points along the azimuth angle of the first shape are selected and then the CDRs are computed. More CDRs related to the contour points in the second shape are computed with minor angle intervals to match the selected contour points with the azimuth angles of the affine-transformed points. Then, a GA is adopted to find the optimal affine transformation parameters. For a group of given affine parameters, the affine-transformed coordinates related to the selected contour points from the first shape are computed and their azimuth angles are obtained to select the CDRs of the corresponding points in the second shape. Thereafter, the correlation value of the two sets of CDRs is obtained with a matching strategy referred to as the degree of fitness of the GA. The correlation value can be used to modify the affine parameters for the next GA generation until the iteration is terminated. Finally, the affine parameters with optimal matching to the two shapes are regarded as the real affine parameters between the two shapes.
As mentioned in section 2.3, the contour points for CDR computation in the first shape can be chosen arbitrarily. Nevertheless, the contour points for CDR computation in the second shape should be determined according to the original points and the affine parameters. Symbols α and

Definition of the azimuth angles of the contour points
Considering an arbitrary pair of contour points Pi and
According to equation (1),
Therefore, an azimuth angle-based approach is obtained for matching point construction. For any contour point P
i
: [xi, yi]
T
in the first shape, the azimuth angle α
i
can be obtained using equation (8). Given a group of affine parameters sx, sy, shx, shy, and θ, the linear transformation coefficients of matrix

CDR-α and CDR-α curves for two affine-related bird shapes
As can be seen in Figure 4, α and
The average coordinate of the points in a neighbouring region with a prior-defined angle range in place of the position of each contour point was calculated to reduce the impacts of noises and break caused by partial occlusion.
Selecting 60 appropriate points from the total of 480 points in an affine-transformed shape to match with those found on the original shape was a challenging task. This task, which is one of the major findings in this study, will be discussed in the next section.
This section describes the affine parameter estimation method using the shape features of CDRs based on a GA. The similarity assessment strategy is also introduced for shape classification.
GAs and their application in this study
In the 1970s, Holland viewed GAs as an algorithmic concept based on the Darwinian theory of “survival of the fittest.” A GA is a global optimization random searching algorithm that can be classified as a guided random search evolution algorithm that uses probability to guide its search. GAs do not depend on the specific field in question, rather, they are iterative algorithms that depend on the generation-by-generation development of possible solutions, with selection schemes permitting the elimination of bad solutions and the replication of good ones that can be modified. A genetic search process is composed of three stages, namely, selection, crossover and mutation. GAs are widely used for pattern recognition applications, such as handwriting recognition [18], road recognition [19], license plate recognition [20], face recognition [21], and blood cell recognition, [22] among others, because of their excellent characteristics.
In the current study, a GA was used to search for the optimal solution of affine parameters. Some preliminary studies should be conducted for encoding initial population generation, and calculation of the degree of fitness. If S is defined as an individual of a population that consists of affine parameters, then f(S) is defined as the fitness function that represents the degree of similarity between two groups of CDRs. This study can be converted to a problem of searching for an individual S, which allows f(S) to have the maximum value.
Individual coding
Considering the affine parameters of sx, sy, shx, shy, and θ in equation (2), s x can be set to a constant value of 1 without any influence to the value of CDRs in this study. Then, sy, sh x , shy, and θ can be regarded as the genes of an individual. The individual S can be expressed as follows:
where each S can build a one-to-one corresponding relationship between the contour points of two shapes. Thus, the degree of matching can be determined using the fitness function f(S).
The fitness function is used to measure each individual's degree of fitness for finding the optimal solution, and it is the unique rule in the selection stage of GA. As mentioned above, 60 and 480 CDRs were obtained for two shapes, respectively. Suppose the CDR sequences of two shapes are denoted as o[i], i = 0,1,2,…,59 and q[j], j = 0,1,2,…, 479, which can be expressed as follows:
Since each individual S can determine a unique relationship between α and
where n[S,i] ∊ {0,1,…479} represents the index number of the CDRs of the second shape corresponding to the index number of i of the first shape. The fitness function f(S) can be defined as follows:
where (o(i)-q(n[S,i]))/(o(i)+q(n[S,i])) is defined as the relative difference between the CDRs of the two shapes. Since the size of the shape may be changed under the affine transformation, the relative difference is more accurate for describing the similarities of the two shapes than the absolute difference of o(i)-q(n[S,i]). According to equation (13), o(i)-q(n[S,i]) will reach zero if the individual S is close to the actual parameters of the affine transformation between the two shapes. Thus, the value of f(S) can reach a maximum value of 1. f(S) can be used as the fitness function for the GA.
According to equation (10), the individual S consists of four factors. The floating point representation (FPR) is used for individual encoding. Moreover, the value range of each factor is designated based on its physical meaning as follows: sy ∊ [0.25,4], shx ∊ [−0.5,0.5], shy ∊ [−0.5,0.5], and θ ∊ [0,360°]. In the proposed method, the population size was set to 80 and the generation was set to 100. Russian roulette was used to select the individuals for the next generation. The results of the Russian roulette depend on the fitness probability, which is defined as p(i) as follows:
The number of duplicates of the same individual for the next generation should be restricted in the early period of the generation to avoid the premature convergence problem caused by high selection pressure. In this study, the upper limit value of the number of choices of the same individual was designated as four for the first 40 generations. In addition, no special requests were encountered for the crossover and mutation probabilities. The value ranges of the crossover and mutation probabilities are usually defined as [0.5, 0.99] and [0.001, 0.1], which are determined based on the experimental results. They were designated as 0.8 and 0.08 in the current study, respectively. Moreover, the two-point crossover and the uniform mutation operators were used in the proposed method. The implementation procedures of the proposed method consist of the following steps:
Based on an edge detection algorithm, such as a Sobel, Canny or Prewitt operator, the contour points of the two shapes are extracted. All the coordinates are recorded for the subsequent operations.
For each designated azimuth angle, the average coordinates of all the contour points in the adjacent area with the predefined angle range are computed.
The 60 CDRs for the first shape and the 480 CDRs for the second shape are calculated using equations (5) and (6), respectively.
The initial population consisting of 80 individuals is created.
The fitness value of f(St) for each individual of Si is computed using equations (11)–(13). If the maximum of the fitness values exceeds 0.6 (it will be explained in section 4), the iteration can be terminated, and the affine parameters of the optimal solution are obtained.
The fitness probability is obtained using equation (14), and 80 individuals are selected for a new population.
Using the crossover probability, the selected gene of a given individual is exchanged with the same gene of another individual selected randomly.
Using the mutation probability, the selected gene is replaced with a random value within its value range.
A new generation of the population is obtained. If the generation number is less than 100, go back to step 5).
The individual with the maximum degree of fitness from the last generation is chosen as the optimal solution. If the two shapes are affinely related, the solution obtained from the steps above will be approximated to the real affine parameters. Therefore, the fitness value will be greater compared with those derived from two unrelated images. The value of f(S) can be regarded as the coefficient of the similarity evaluation for shape classification.
The proposed algorithm was evaluated on a number of natural and synthetic images, including a rabbit, a plane, a bat, a butterfly, and various shapes of birds, among others (see Figure 5). The affine-related shapes of the natural and synthetic images were obtained using image processing software. The operations consisted of translation, scaling rotation, distortion, and the combination of all.

Some shapes tested in the experiments
The CDRs for both shapes were calculated. As presented in section 2.4, the CDR sequences of the first shape, named as o[i], consists of 60 samples with an angle interval of 3°, whereas the q[j] of the second shape consists of 480 samples with an angle interval of 0.75°. In addition, an angle range of 1.5° was designated for computing the average coordinates of the points. With this method, the impacts of noise and errors of edge detection can be reduced. Meanwhile, The CDR of the breakpoint can still be obtained if its corresponding angle range is less than 1.5°. Thus, our method will be effective for a small partial occlusion.
The experiments focused on three issues to test the reliability of the performance of the proposed algorithm. First, since the processing methods used for the two candidate shapes were different from each other, the two affinely related shapes should be alternately regarded as the first and second shapes for conformance testing. Table 1 shows the experimental results on four groups of candidate shapes, each with three CDR curves.
The first curve from array o[i] was built according to the first shape. The other two curves were derived from the second shape. The second curve from array q[j] consisted of 480 samples, whereas the third curve included 60 samples selected from q[j], according to the optimal solution of affine parameters. The vertical and horizontal axes of the CDR curves represent the CDR value and the azimuth angle, respectively. As can be seen in Table 1, the first and third curves were similar, indicating that the two shapes were affinely related. The f(S) value in group I was similar to that in group II, and a similar result was obtained in groups
and
. The results demonstrate that the proposed method is robust, regardless of which of the two affine-related shapes is designated as the first shape.
Some experimental results of the conformance testing
The second issue verified in the experiments was the differences in the fitness values between the affinely related and unrelated shapes. The unique factor involved was the evaluation of the similarity of the shapes.
As can be seen in Table 2, the maximal fitness value of all the unrelated shapes was 0.5741, whereas the minimal fitness value of the affine-related shapes was 0.6469. Thus, the fitness value of f(S) can be applied to shape discrimination.
Some experimental results on affinely related and unrelated shapes
Multiple affinely related shapes were used to compute the fitness value and to test the capability of the proposed recognition method on affinely related shapes. Table 3 presents experimental results on some affinely related butterfly shapes. During the experiment, the shapes in the first row were regarded as the first shapes, while those in the first column were treated as the second shapes.
Some experimental results on affine-related shapes
As can be seen in Table 3, the fitness values were between 0.6 and 1, and the minimal value was 0.6217, indicating the higher similarity between the affinely related shapes than between the unrelated shapes. Thus, the f(S) value can be used as the evaluation factor for shape classification.
According to the test results of the above experiments, it can be seen that the minimal fitness values of affinely related shapes are greater than 0.6, while the maximal fitness value of unrelated shapes are less than 0.6. Thus, the threshold was set to 0.6 in our method. Of course, this threshold is an empirical value. It looks reasonable in our experiments, but, it should be modified to adjust the trade-off between the false alarm probability and the probability of a miss in a real application.
The computational complexity of the proposed method is mainly dependent on the convergence speed. If the two shapes are affinely related, the iteration process may be terminated early. The time taken to obtain the optimal solution for discriminating between two shapes is usually about three seconds. Two approaches can be used for future studies to reduce time consumption. First, the crossover and mutation operators may be improved to accelerate the convergence speed. For example, the operators can be adjusted to the fitness value and to the characteristics of the coding method of the individuals. Second, the CDRs of some common shapes can be stored in a database as standard samples, which can be directly applied in the current study. Most of the proposed approaches in the literature were critically dependent on the one-by-one coordinate sequences of the contour points. Obtaining coordinate sequences is challenging because the contours of affinely related shapes may be influenced by break and noise. Compared to the methods presented in the literature, the main advantage of the proposed approach is its robustness for break and noise using shapes other than the affine transformation. Moreover, the total time consumed for completing the calculations was also competitive.
The current study proposed a new method for shape recognition that is invariant to affine transformation. The proposed method consists of two main steps, namely, extracting affine invariant features called CDRs and estimating the affine parameters based on the GA. Compared with other methods found in the literature, the advantages of the proposed method include not having to pre-process the impacts of noise, partial occlusion and a variant starting point. Furthermore, the proposed method does not rely on a one-by-one coordinate sequence of the contour points. In the experiments in the current study, 480 CDRs for the second shape were calculated to provide the corresponding points determined using the 60 points in the first shape and the estimated affine parameters.
The main disadvantage of the proposed method is its failure to work properly on central symmetry shapes because the CDRs are always approximately equal to 1. For example, the proposed method cannot distinguish between circles and rectangles, among others, and thus, further investigations are necessary. Considering most natural shapes are complicated and irregular, the proposed method is valuable for real applications.
Footnotes
6.
This work is supported by the innovative research projects of colleges and universities in Chongqing (12A19369).
