Abstract
Unimodal biometric systems (based on a single modality such as the face or fingerprint) have to contend with various problems, such as variations in illumination, skin condition, environment and capture device. Multimodal biometric systems have therefore been used to overcome the limitations of unimodal biometrics and to provide highly accurate recognition. In this paper, we propose a new multimodal biometric system based on score-level fusion of face recognition and the recognition of both irises.
Our study has the following novel features. First, the proposed device acquires images of the face and both irises simultaneously. It consists of a face camera, two iris cameras, near-infrared illuminators and cold mirrors. Second, fast and accurate iris detection is achieved with two circular edge detections, which are performed in the iris image on the basis of the size of the iris detected in the face image. Third, the recognition accuracy is enhanced by combining the scores for the face and both irises using a support vector machine. The experimental results show that the equal error rate of the proposed method is 0.131%, which is lower than that of face or iris recognition alone and of other fusion methods.
1. Introduction
Biometrics is one of the most widely used approaches for the identification of an individual using physiological or behavioural characteristics such as the face, iris, finger vein or gait [1]. Biometric systems are advantageous because they do not require a person to carry cards or remember information, unlike conventional authentication systems based on smart cards or passwords. Possession-based authentication systems have the disadvantage that keys and tokens can be shared, misplaced, duplicated, lost or stolen, whereas biometric systems avoid these problems [2]. Thus, biometrics has been adopted in many applications. However, unimodal biometric systems (based on a single modality such as the face or fingerprint) face several problems, such as illumination variation, skin condition and the environment, and device variations [3].
For example, the performance of face recognition is easily degraded by variations in facial pose, expression and illumination. Iris recognition performance is greatly affected by large areas of near-infrared (NIR) light reflection that hide the iris, by dense eyelashes and by defocusing of the input image. To overcome the limitations of unimodal biometrics, much attention has been paid to multimodal biometrics [4]. Multimodal biometrics aims to identify the individual based on two or more human physiological or behavioural characteristics. The key problem of multimodal biometrics is the method used to combine multiple features from each modality to produce better recognition results.
Many studies and algorithms have been proposed for multimodal biometric fusion [5, 6]. The fusion of multimodal biometric system information can be performed at three different levels, i.e., feature level, matching score level and decision level. A popular method is to fuse at the matching score level because it easily facilitates the combination of scores from different matching systems such as face and fingerprint recognition systems [5]. Thus, we focus on fusion at the matching score level to integrate matching scores for the face and both irises in this study. There have been many previous studies of multimodal biometric systems that combine the face with a palmprint or the face with the iris, etc. [4].
Previous multimodal biometric systems based on face and iris recognition have used the face and a single iris because of the high accuracy of iris recognition and the convenience of face recognition [7–9]. However, there has been little research on the combination of the face and both irises because of the increased system complexity. The two irises of a single person are known to be as different as those of different people [10], so we propose a new multimodal biometric method that combines information from the face and both irises to achieve higher accuracy. In addition, in the experiments of these previous studies, the face and iris data were assembled by combining two different open databases of face and iris, so that a face and an iris from different sources were treated as belonging to the same person [7–9]. This relies on the assumption that the face and iris are perfectly uncorrelated. However, intensive statistical analyses are required to confirm this assumption, so we performed our experiments with data on the face and both irises that were actually acquired from the same persons instead of from two different open databases.
To capture images of the face and both irises simultaneously, our proposed device consists of a face camera, two iris cameras, NIR illuminators and cold mirrors. Rapid and accurate iris detection is achieved with two circular edge detections (CEDs), which are performed in the iris image on the basis of the size of the iris detected in the face image. The accuracy is enhanced by combining the three matching scores for the face and both irises using a support vector machine (SVM). The remainder of the paper is organized as follows. Section 2 presents the proposed system and methods. Sections 3 and 4 provide the experimental results and conclusions, respectively.
2. Proposed Multimodal Biometric System
2.1. The proposed capture device

Proposed image capture device
Figure 1 shows our proposed device for capturing images of the face and both irises simultaneously. It consists of a face camera, two iris cameras, cold mirrors and a near-infrared (NIR) illuminator (including 36 NIR light emitting diodes [LEDs] with a wavelength of 880 nm). We used three universal serial bus (USB) cameras (Webcam C600 made by Logitech Corp [11]) to capture an image containing 1600 × 1200 pixels at a speed of 30 frames/s. One USB camera was used as the face camera, whereas the others were used as iris cameras with a fixed focus zoom lens. To reduce the processing time during iris recognition, the size of the captured iris image was also reduced to 800 × 600 pixels. The use of a fixed focus zoom lens meant our device provided the requisite resolution for the iris image. The average diameter of the iris captured by our proposed device was 180–280 pixels with a Z-distance operating range of 25–40 cm. The Z-distance is the distance between the camera lens and a user's eye. According to ISO/IEC 19794-6, an iris image where the iris has a diameter > 200 pixels is considered to be “good” quality, 150–200 pixels is “acceptable,” while 100–150 pixels is “marginal” [12]. Therefore, our iris image satisfied the requirement for “acceptable” and “good” quality in terms of the iris diameter.
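The diameter thresholds quoted above can be written as a small lookup. The following is a minimal sketch, not part of the proposed system: the function name is hypothetical, the handling of the exact boundary values (150 and 200 pixels) is an assumption, and the label returned below 100 pixels is a placeholder because the text gives no label for that range.

```python
def iris_quality(diameter_px: float) -> str:
    """Map an iris diameter in pixels to the quality label quoted in the text."""
    if diameter_px > 200:
        return "good"
    if diameter_px >= 150:
        return "acceptable"
    if diameter_px >= 100:
        return "marginal"
    # The text gives no label below 100 pixels; this name is a placeholder.
    return "below marginal"


# The two ends of the diameter range reported for the proposed device.
for diameter in (180, 280):
    print(diameter, iris_quality(diameter))
```

Under these assumptions, the lower end of the reported range (180 pixels) falls in the "acceptable" band and the upper end (280 pixels) in the "good" band, which matches the statement above.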
2.2. Overview of proposed method
An overview of our proposed method is shown in Figure 2. Face recognition proceeds as follows. First, the face and eye regions are detected by AdaBoost and rapid eye detection [13, 14]. Second, size normalization is conducted to eliminate size variations in the detected facial region, while the illumination is normalized using the Retinex algorithm [15]. Third, facial features are extracted from the normalized facial image using principal component analysis (PCA). Finally, the matching score is calculated as the Euclidean distance and provided as an input to the SVM. During iris recognition, the iris region is segmented using integer-based CED together with an eyelid/eyelash detection method [16–18]. Iris codes are generated from the segmented iris region. The matching score is calculated as the Hamming distance and is also used as an SVM input. These procedures are performed for both the left and right iris images captured using the proposed device. The matching scores for the face and both irises are used as SVM inputs and the final authentication is carried out based on the output of the SVM.

Overview of our proposed method
2.3. Face recognition method
First, we use the AdaBoost algorithm to detect face regions, as shown in Figure 3 [13]. Next, the two eyes are found by rapid eye detection [14], as shown in Figure 3, and the detected facial region is redefined based on the positions of the two eyes [19]. The size of the facial region varies with the Z-distance, so the redefined facial region is normalized to 32 × 32 pixels. Furthermore, there are illumination variations in the facial region, which degrade the performance of face recognition. To address this problem, Retinex preprocessing is used for illumination normalization [15]. Figure 4 shows the results of the size and illumination normalization. Numerous face recognition techniques have been proposed in previous studies, such as principal component analysis (PCA) [20], linear discriminant analysis (LDA) [21], and local binary pattern (LBP) [22]. In this study, we used PCA, which is a popular technique for feature selection and dimensionality reduction, because it is good at representing face images. Finally, the matching score for face recognition was calculated based on the Euclidean distance between the input facial feature vector and the enrolled template feature vector.

The detected face and eye region using our proposed method

The facial region after size and illumination normalization
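The final matching step of Section 2.3 (projection onto a PCA basis followed by a Euclidean distance) can be illustrated as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the mean vector and basis are toy values rather than eigenvectors learned from training faces, and a 4-pixel vector stands in for the flattened 32 × 32 image.

```python
import math


def project(pixels, mean, basis):
    """Project a flattened face image onto the PCA basis vectors."""
    centred = [p - m for p, m in zip(pixels, mean)]
    return [sum(c * b for c, b in zip(centred, vec)) for vec in basis]


def euclidean_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy data (assumed): a 4-pixel "image" and two orthonormal basis vectors.
mean = [0.5, 0.5, 0.5, 0.5]
basis = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]

enrolled = project([0.9, 0.1, 0.9, 0.1], mean, basis)
probe = project([0.8, 0.2, 0.8, 0.2], mean, basis)

# A smaller distance means the probe is closer to the enrolled template.
face_score = euclidean_distance(enrolled, probe)
print(face_score)
```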
2.4. Iris recognition method
The captured iris images include the iris, pupil, sclera, eyelids and eyelashes. To isolate iris regions from the images, we performed two CEDs [16]. Integro-differential values are computed between the inner and outer boundaries of the iris (and pupil) while changing the radius value and the centre position of the iris [16]. In our device, the Z-distance operating range is 25–40 cm, while a zoom lens with a fixed zoom factor is used rather than a variable zoom lens to reduce the size of the system and to avoid the use of additional motors when operating a zoom lens. Thus, the average iris diameter in the captured image is 180–280 pixels. Without prior knowledge of the iris size, the radius search range of the two CEDs has to be wide enough to cover all of these diameters (180–280 pixels), which can degrade iris recognition performance in terms of both processing time and iris region detection accuracy.
We propose the following methods to overcome this problem. After calibration, we can obtain the relationship between the iris sizes in the face image and iris image. As mentioned in section 2.3, both eyes are found using rapid eye detection and we determine the size of an iris in the image captured by the face camera. Based on this relationship, we can estimate the size of the iris in the image captured by the iris camera, which is used as the radius searching range of the two CEDs. This can reduce the processing time and iris detection errors. In the next step, the eyelids are detected using eyelid detecting masks and a parabolic Hough transform [17]. In addition, eyelash masks are used to detect the eyelashes [18]. Figure 5 shows examples of the detected iris, eyelid and eyelash regions. Finally, iris codes are extracted from the segmented region using a 1-D Gabor filter [23]. Next, the matching score for iris recognition is calculated based on the Hamming distance (HD) between enrolled iris codes and the input ones.
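The radius estimate described above can be sketched as follows. The text does not give the form of the calibrated relationship, so this sketch assumes a simple linear scale factor between the two cameras. The function name, the scale value and the tolerance margin are all hypothetical.

```python
def iris_radius_range(face_iris_radius_px, scale, margin=0.1):
    """Estimate the CED radius search range in iris-camera pixels.

    face_iris_radius_px: iris radius measured in the face-camera image.
    scale: assumed linear calibration factor between the two cameras.
    margin: assumed relative tolerance around the predicted radius.
    """
    expected = face_iris_radius_px * scale
    low = int(expected * (1.0 - margin))
    high = int(expected * (1.0 + margin)) + 1
    return low, high


# Hypothetical values: a 10-pixel iris radius in the face image and a
# calibration factor of 11.5 predict a radius of about 115 pixels.
low, high = iris_radius_range(10, 11.5)
print(low, high)
```

Restricting the two CEDs to such a narrow interval, instead of the full range implied by diameters of 180–280 pixels, is what reduces the number of candidate circles to evaluate.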

Examples of the detected iris, eyelid and eyelash regions. (a) Original images. (b) Localized regions
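The HD-based matching score can be illustrated with a short sketch. It assumes that bits covered by the detected eyelids and eyelashes are excluded through validity masks, which is common practice but is not spelled out in the text. Rotation compensation by shifting the codes is omitted, and the function name and toy codes are hypothetical.

```python
def hamming_distance(code_a, code_b, mask_a, mask_b):
    """Fraction of differing bits among the bits that are valid in both codes."""
    valid = 0
    differing = 0
    for a, b, ma, mb in zip(code_a, code_b, mask_a, mask_b):
        if ma and mb:  # skip bits marked as occluded in either code
            valid += 1
            differing += int(a != b)
    if valid == 0:
        raise ValueError("no valid bits to compare")
    return differing / valid


# Toy 8-bit codes (assumed); real iris codes are far longer.
enrolled_code = [1, 0, 1, 1, 0, 0, 1, 0]
input_code = [1, 1, 1, 0, 0, 0, 0, 0]
enrolled_mask = [1, 1, 1, 1, 1, 1, 1, 1]
input_mask = [1, 1, 1, 1, 1, 1, 0, 1]  # one bit assumed occluded

iris_score = hamming_distance(enrolled_code, input_code, enrolled_mask, input_mask)
print(iris_score)
```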
2.5. Combining the three scores for the recognition of the face and both irises
We normalize the range of the calculated HD and ED to 0–1 using MIN-MAX scaling for the SVM, because the matching scores are generated with different distance measures. The three normalized scores are used as the three input values for the SVM. In general, the choice of kernel function and its parameters affects the performance of SVM classification. In this study, we compared the performances of a linear kernel, polynomial kernel, radial basis function (RBF) kernel and sigmoid kernel using the LIBSVM program [24]. The optimal kernel and kernel parameters were determined by five-fold cross-validation using the training database. The RBF kernel with a gamma value of 8 was found to be optimal for the SVM. We labelled authentic (genuine) and imposter data as “1” and “-1”, respectively, for SVM training. The optimal threshold for the SVM output was chosen as the value at which the minimum equal error rate (EER) of recognition was obtained. The false acceptance rate (FAR) is the error rate of accepting false data (imposter) as genuine. The false rejection rate (FRR) is the error rate of rejecting a genuine person as an imposter. The EER is the error rate when the FAR is almost the same as the FRR, which has been widely used to represent the accuracy of conventional biometric systems [1, 7, 23].
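The quantities used in this section can be illustrated with a short sketch. It is not the LIBSVM pipeline itself. It shows MIN-MAX scaling, the RBF kernel in the form LIBSVM uses, exp(-gamma·‖u − v‖²), with the reported gamma of 8, and a simple threshold sweep that estimates the EER from classifier outputs. The function names, the normalization bounds and the toy output values are assumptions.

```python
import math


def min_max(scores, lo, hi):
    """Scale scores to 0-1 given assumed bounds (e.g. from the training data)."""
    return [(s - lo) / (hi - lo) for s in scores]


def rbf_kernel(u, v, gamma=8.0):
    """RBF kernel exp(-gamma * ||u - v||^2) between two score vectors."""
    squared = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared)


def far_frr(genuine, imposter, threshold):
    """FAR and FRR when outputs >= threshold are accepted as genuine."""
    far = sum(s >= threshold for s in imposter) / len(imposter)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, imposter):
    """Sweep observed outputs as thresholds; return (EER estimate, threshold)."""
    best = None
    for t in sorted(set(genuine) | set(imposter)):
        far, frr = far_frr(genuine, imposter, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy classifier outputs (assumed): higher means more likely genuine.
genuine_out = [0.9, 0.8, 0.7, 0.2]
imposter_out = [-0.9, -0.8, -0.6, 0.3]
eer, threshold = equal_error_rate(genuine_out, imposter_out)
print(eer, threshold)
```

With only a few samples, the FAR and FRR rarely coincide exactly, which is why the sketch reports the threshold at which they are closest.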
3. Experimental Results
To measure the accuracy of our proposed system, we acquired images of the face and both irises using the proposed device with a Z-distance range of 25–40 cm. The ground-truth Z-distance was measured using a laser rangefinder (BOSCH DLE 70 Professional) [25]. Figure 6 shows the images captured with different Z-distances. The database contained a total of 3,450 images (1,150 face images, 1,150 left iris images and 1,150 right iris images) from 30 people.

Examples of images captured of the face and both irises
The SVM required a training database to determine the optimal classifier, so half of the collected images (1,725 images: 575 face images, 575 left iris images and 575 right iris images) were used for training and the remaining images were used for testing. With the training database, the number of genuine (positive) samples is 10,774 and the number of imposter (negative) samples is 154,251 in the case of face recognition. The numbers of positive (10,774) and negative (154,251) samples are the same for left and right iris recognition.
We used the testing database to evaluate the performance of our system, as shown in Tables 1–3 and Figure 7. In the first experiment, shown in Table 1, the accuracies of face and iris recognition were measured based on the EER. The accuracy of iris recognition was only a little better than that of face recognition. This was because the iris camera in our system used a fixed-focus lens rather than a variable-focus lens, and hence, unfocused images were often obtained at a Z-distance of 25 or 40 cm, which degraded the iris recognition performance.
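One reading that is consistent with these sample counts is that every training image of a modality is compared with every other training image of that modality. This is an inference from the numbers rather than something stated explicitly in the text. The arithmetic can be checked as follows.

```python
from math import comb

images_per_modality = 575   # training images per modality (from the text)
genuine_pairs = 10_774      # positive samples reported in the text
imposter_pairs = 154_251    # negative samples reported in the text

# Number of unordered pairs among 575 images: 575 * 574 / 2.
total_pairs = comb(images_per_modality, 2)
print(total_pairs)  # 165025

# The reported genuine and imposter counts sum to exactly this total.
assert genuine_pairs + imposter_pairs == total_pairs
```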
Accuracy of the single recognition method (unit: [%])
Accuracy of combination methods using SVMs with different kernels (unit: [%])
In the next experiment, the performance of the combination methods was compared when using different types of SVM kernels, as shown in Table 2. In all cases, the proposed method using the three matching scores had the best performance.
Figure 7 shows 3-D distributions of the testing data based on the three matching scores for the recognition of the face, left iris and right iris. Figure 7 confirms that the authentic and imposter data could be separated with low false acceptance and rejection error rates. Table 3 shows the EERs for each combination with conventional fusion methods. With the “Min” and “Max” rules, the final score was determined by selecting the minimum and maximum input scores, respectively [5]. As shown in Table 3, the accuracy of our proposed method was better than that of the other fusion and combination methods.
In the case of iris recognition, iris images that are severely blurred (optically or by motion), or whose iris regions are extremely occluded by eyelids, eyelashes, hair or specular reflections, can affect the performance of the proposed system. In addition, severely off-angle iris images (captured when a user gazes at a position far from the iris camera) can also affect the performance. In the case of face recognition, face images with severe rotation (pan or tilt) of the face or extreme facial expressions (surprise, grimace, etc.) can affect the system performance.
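The “Min” and “Max” rules mentioned above reduce the three normalized matching scores to a single score, as in the following minimal sketch. The function names and the toy score values are hypothetical, and the scores are assumed to be already normalized to the range 0–1.

```python
def min_rule(scores):
    """Fused score under the Min rule: the smallest input score."""
    return min(scores)


def max_rule(scores):
    """Fused score under the Max rule: the largest input score."""
    return max(scores)


# Toy normalized scores (assumed) for the face, left iris and right iris.
scores = [0.42, 0.18, 0.25]
print(min_rule(scores), max_rule(scores))
```

Unlike the SVM, these rules use only one of the three scores for each decision, so the information in the other two is discarded.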

Result of SVM classification using the RBF kernel with the testing database
Comparison of the EERs using conventional fusion methods (unit: [%])
4. Conclusion
In this paper, we proposed a new multimodal biometric system that combines the recognition of the face and both irises to enhance performance based on an SVM. The proposed device captures images of the face and both irises simultaneously. The experimental results showed that the proposed system performs better than face or iris recognition in isolation, as well as the other combination methods. In the future, we plan to investigate a method for combining more modalities such as the face, both irises and a finger vein.
5. Acknowledgments
This research was supported by the Public Welfare and Safety research program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (No. 2011-0020976).
