Face Recognition under Illumination Variation Using Shadow Compensation and Pixel Selection

Abstract

We propose a robust face recognition method under illumination variation. By using shadow compensation methods, we restore the image of a facial image taken under arbitrary illumination into an image that is similar to the image taken with frontal illumination. Then we apply a pixel selection method to these restored images in order to reduce the noise components, which can interfere with the extraction of discriminant features for face recognition. The experimental results for the CMU-PIE, Yale B and Multi-PIE databases show that the proposed method results in the improvement of recognition performance under illumination variation.

Keywords

Shadow compensation Fourier analysis feature extraction discriminant analysis pixel selection noise reduction face recognition

1. Introduction

Face recognition is used to identify individuals from facial images by using face databases containing people's identities. Compared to other types of biometric recognition, such as fingerprint, retinal or iris recognition, face recognition is less invasive and does not require a subject to be in proximity to or in contact with a sensor, which makes it widely applicable in user identification, e-commerce, access control, surveillance and human-computer interaction.

Numerous methods have been developed for face recognition over the last few decades. Most of them are capable of achieving high recognition rates in well-controlled environments. However, the accuracy of face recognition can deteriorate significantly when the images used for enrolment (gallery) and recognition (probe) are taken under different environmental conditions.

Illumination variation is one of the challenging problems to be solved for robust face recognition systems. The changes induced by illumination, such as cast shadows or attached shadows, can be larger than the innate differences between individuals. The methods used to solve the problems caused by illumination variation can be roughly classified into three categories [1], [2], [3]: face modelling, invariant feature extraction and pre-processing and normalization. Face model-based methods construct a generative 3D face model that can be used to render facial images under various illumination conditions. Most of these methods require several images of the subject using light from different directions or 3D shape information for training [4], [5]. The methods belonging to the second category extract features, such as edge maps [6], gradientfaces [7], Gabor features [8] or local binary patterns (LBP) [9], which are invariant to illumination variation. In pre-processing and normalization, various image processing techniques are used to compensate for the illumination variation [10], [12], [11], [13]. These methods [10], [12], [13] modify a facial image taken under arbitrary illumination conditions into an image that is similar to one taken under frontal illumination. The method in [13] restored a frontal illumination image from an image captured under arbitrary lighting conditions by using the ratio-image between a test image and a reference image, and iterated the process to obtain a visually better restored image. The quotient image (QI) [14], which is defined as the ratio between a test image and a linear combination of three noncoplanar illumination images, depends only on the albedo information and is free of illumination. The method in [11] used the ratio between a test image and its smoothed image to obtain a self-quotient image (SQI).

From human perception, facial images with frontal illumination have a good visual effect and are natural for the investigation of discriminant information for face recognition. In addition, the compensated images can be applied to well-known appearance-based face recognition methods such as Fisherface [15], DCV [16] or ERE [17]. In this paper, we focus on the methods belonging to the third category for handling illumination variation in face recognition.

We first present two state of the art methods [12], [19] for shadow compensation, which are simple and give good recognition performance compared to other methods. The method in [12] decomposes a facial image into magnitude components and phase components in the frequency domain using the Fourier transform. The facial image that was deteriorated by shadow was restored using the auxiliary magnitude, which compensates for the magnitude components distorted due to the illumination variation. In [19], the illumination variation was compensated for by adding an adequate average difference image depending on the light direction, under the assumption that the shadow on a facial image created by a similar light direction has common characteristics. However, the resultant compensated images of both the above methods still have many noisy pixels, which are likely to interfere in extracting discriminant features for face recognition. Thus, we can expect to obtain better discriminant features by eliminating noisy pixels in the compensated images, which will lead to improved face recognition performance. Even though various pixel selection methods can be used to remove noisy pixels [20], [21], [18], we modify the pixel selection method in [20] based on discriminant analysis and apply it to remove those noisy pixels. Experimental results show that the proposed method improves recognition performance in the presence of illumination variations.

The rest of this paper is organized as follows. Section 2 and 3 present how to compensate for the shadow in a facial image and how to remove noisy pixels in the compensated images, respectively. Section IV consists of the experimental results, followed by the conclusion in Section V.

2. Shadow Compensation Methods

2.1 Shadow Compensation Based on Fourier Analysis (SCFA)

In [12], a shadow compensation method was performed using Fourier analysis to deal with illumination variation. The image signal in the spatial domain can be represented in the frequency domain using the Fourier transform [21]. Let us denote an image of M × N pixels as I(x, y) ∈ R ^M×N . The Fourier transform of I(x, y) is generally complex valued and thus it can be represented in terms of magnitude and phase as F_I(u, v) = |F_I(u, v)| e^j|_I(u,v). The magnitude |F_I(u, v)| describes the basic frequency content of a signal providing information about the relative magnitudes of the complex exponentials that make up I(x, y). The phase angle Ø_I(u, v), on the other hand, provides information concerning the relative phases of these exponentials. In a facial image, some of the most discriminant information is contained in the edges and regions of high contrast. Thus, the phase components, which have the structural information of the image, are very important in face recognition [12].

The shadows in a facial image due to illumination variations change the magnitude components of the image in the frequency domain. In contrast, the phase components are less prone to the effects of illumination variations [12]. In order to restore a facial image that had been deteriorated by shadows, the method in [12] compensated for the distorted magnitude components using the auxiliary magnitude |F_{I_Aux}(u,v)|, which represents the characteristic of the magnitude for general human faces. If letting an arbitrary facial image with shadows and its Fourier representation be I_A and |F_{I_A}(u,v)|e^{j∅_{I_A^(u,v)}}, respectively, the restored image I_Re can be obtained from the compensated magnitude |F_C(u, v)| and its own phase ∅_{I_A}(u,v) as follows:

I_{R e} (x, y) = \frac{1}{M N} \sum_{u = 0}^{M - 1} \sum_{v = 0}^{N - 1} F_{c} (u, v) e^{j 2 π (u x / M + v y / N)}

(1)

where |F_c(u,v)| = 1/2(|F_{I_A}(u,v)|+|F_{I_Aux}(u,v)|), ∅_C(u,v) = ∅_{I_A}(u,v),x = M – 1 and y = 0,1,..n – 1.

Figures 1(a) and 1(b) show the raw images and their restored images using (1), respectively.

Figure 1.

Examples of the images in which shadows are compensated. (a) Raw images. (b) Shadow compensated images using (1). (c) Shadow compensated images using (2).

2.2 Shadow Compensation Using Weighted Average Difference (SCWAD)

Generally, human faces are similar in shape in that they are comprised of two eyes and two eyebrows, a nose and a mouth [10]. Each of these components makes a shadow on the face, thereby showing distinctive characteristics depending on the direction of light. If a facial image taken under frontal illumination is set as a reference image for a given facial image that is taken under an arbitrary illumination, the information about the shadow characteristics of the given facial image is contained in the difference image between the facial image and the reference image. Thus, if the light direction of a given facial image can be estimated, we can compensate for the shadow in the facial image using the average difference images D_A^l, which is an average of the individual difference images for the same light direction.

First, L light direction categories {C_I|l = 1,..,L} are defined from the left side to the right side. For a given facial image I_A, the binary image B(I_A) of I_A is produced in order to estimate the light direction. Then, the distances between B(I_A) and the binary images in each category C_l are calculated using a classification rule [19]. From the three nearest distances (dist^NN, i = 1,2,3) and their corresponding categories C_l, i = 1,2,3, the weight ω_{I_i}, is defined as

ω_{l_{i}} = \frac{{d i s t}_{(4 - i)}^{N N}}{\sum_{i = 1}^{3} {d i s t}_{i}^{N N}},

The value ω_{I_i}, indicates how well the shadow of the given facial image can be characterized by D_A^l_i. Finally, the shadow compensated image, I^C of I_A, is obtained with the average differences as follows.

I^{C} (x, y) = I_{A} (x, y) + \sum_{i = 1}^{3} ω_{l_{i}} D_{A}^{(l_{i})} (x, y)

(2)

Figure 1(c) shows the images compensated using (2).

3. Pixel Selection Based on Discriminant Analysis

It is shown in the compensated images of Figs. 1(b) and 1(c) that noise components, which were generated in the process of compensating magnitude components or adding average difference images, are distributed over a whole facial image. This is because the average operations in the compensation procedures dilute the unique features of an individual in a facial image. In addition, the remaining cast shadows after shadow compensation can affect the extraction of discriminant features for face recognition. Therefore, in order to improve the performance of face recognition using these compensated images, we modify the pixel selection method based on discriminant analysis [20] and apply it to distinguish informative pixels from all pixels in the image. By eliminating the noisy pixels that have less discriminative information, we can expect to extract better discriminant features for face recognition, which will result in improved recognition performance.

The projection matrix W to be used for pixel selection is obtained using DCV [16]. The k-th image sample I_k can be represented as a vector x_k = O_L(I_k), where O_L(·) is the lexicographic ordering operator that transforms a matrix into a vector [23]. When the training set contains N samples and c classes, each of which has N_i samples, between(S_B)- and within(S_W)-class scatter matrices are defined as

\begin{array}{l} S_{B} = \frac{1}{N} \sum_{i = 1}^{c} N_{i} (μ^{i} - μ) (μ^{i} - μ)^{T} \\ S_{W} = \frac{1}{N} \sum_{i = 1}^{c} \sum_{x_{k} \in {c_{i}}} (x_{k} {- μ}^{i}) (x_{k} {- μ}^{i})^{T} \end{array}

where μ is the total mean of the whole training set, μⁱ is the mean of the class c_i, and x _k is a sample belonging to class c_i. Then, the objective function of DCV is defined as follows:

W = \begin{matrix} \arg m a x \\ W \end{matrix} \frac{| W^{T} S_{B} W |}{| W^{T} S_{W} W |}

The projection matrix W consists of projection vectors w _i (= [ω_i1,..,ω_in] ^T ) s (W = [w₁, w₂,..,w _m ]). The discriminate feature vector y _k = [y_k1,…,y_km] ^T is obtained from x _k as

y_{k} = W^{T} x_{k} \in R^{m}, k = 1, 2, \dots, N .

(3)

Each feature y_ki(k) in (3) is computed by projecting x_k onto w_i(y_ki=w_i^Tx_k). Since each feature is expressed by a linear combination of individual pixels x_kl s, l = 1,.., n, of which coefficients are ω_ils, the value |ω_il| indicates how much the l-th pixel x_kl contributes to constructing the i-th feature y_kl. For example, if |ω_i1| is larger than |ω_i2|, x_k1 can be regarded as a more important pixel for extracting a discriminant feature than x_k2. Therefore, |ω_il| of each w _i can be used as a measurement of discriminative information in the l-th pixel x_kl. Without defining an order vector separately [20], using |ω_il| as a measurement makes the pixel selection process more simple, which might offer advantages in terms of computation when dealing with high-dimensional data such as face images. From W, which consists of m projection vectors w _i s, i = 1,..,m, m different measurements can be produced. Since a projection vector with a larger eigenvalue has more discriminative power, these measurements are merged as a single measure vector M = [M₁,.., M_n] ^T , based on the eigenvalue λ_i of each w _i as follows.

M_{l} = \sum_{i} α_{i} |ω_{i l}|, α_{i} = \frac{λ_{i}}{\sum_{j} λ_{j}}

Finally, n_s pixels corresponding to larger M_l s are selected to use in extracting the final features for face recognition.

The pixel selection method can be summarized as follows: 1)

Obtain the projection vectors w _i s (i = 1,..,m) using DCV.

Produce the measure vector M using w _i s, (i = 1,..,m) and their eigenvalues λ_is, (i = 1,..,m).

Select ns pixels based on M_ls, (l = 1,..,m).

Extract the final features with only the selected pixels for face recognition.

4. Experimental Results

In order to show the effectiveness of the proposed method, we have applied it to the CMU-PIE [24], Yale B [4] and Multi-PIE databases [26]. All the facial images were cropped in proportion to the distance between the eyes, which were manually detected, and were downscaled to a size of 120(pixels) x 100(pixels). The measure vector M for pixel selection was made with the images from the FERET database [25] that reflects general facial information well and has a large number of subjects. The features for recognition were extracted from these downscaled images using DCV [16] and one nearest neighbour rule was used with the l₂ norm as a classifier.

The CMU-PIE database contains images of 68 subjects with 21 illumination variations. Among them, we selected the images of only 65 subjects, because the images of the other subjects had some defects or did not include all types of illumination variations. Three images for each subject, which were taken with left side, right side and frontal illumination, were used for training, while the other 18 images not used for training were tested. Among the test images, one image under frontal illumination was used as a gallery image and the other 17 images were used as probe images.

Figure 2(a) shows the recognition rates for different number of features. Performing only histogram equalization (HIST) gave 27.9% better recognition rate than that of the raw images (RAW). The recognition rates for the shadow compensated images (SCFA [12] and SCWAD [19]) increased by 4.0% and 0.3%, respectively. As shown in Fig. 3(a), the SCFA-PS images gave the best recognition rates of 97.0%. Table 1 shows the comparisons of the proposed methods (SCFA-PS and SCWAD-PS) and various methods that deal with illumination variation for face recognition. The SCFA-PS and SCWAD-PS images are obtained by selecting 7000 and 8000 pixels, which were only 58.3% and 66.7% of the total pixels, from the SCFA and SCWAD images, respectively, using the pixel selection method. As can be seen in Table I, the proposed methods were better than other methods, showing that SCFA-PS and SCWAD-PS gave recognition rates of 97.0% and 93.6%, which were 1.0% and 1.3% better than SCFA and SCWAD respectively.

Table 1.

Recognition rates for different methods on the cmu-pie database (%).

No. Feature (*)	16 (69)	24 (89)	32 (109)	40 (129)	48 (149)	56 (169)	64 (189)
Eigenfaces w/o 1st 2 [15]	71.0	71.9	72.3	72.0	72.0	71.9	71.9
FLD [15]	66.6	71.6	74.2	75.8	76.4	76.6	76.7
LBP [9]	82.6	84.7	85.3	86.2	86.2	85.9	85.0
SQI [11]	60.5	66.8	70.9	73.9	74.8	74.7	76.1
SCAD [10]	80.9	85.3	86.2	87.1	88.2	89.0	89.7
SCFA [12]	86.7	91.1	92.9	94.7	94.9	95.6	96.0
SCWAD [19]	81.4	86.3	87.9	89.1	90.0	91.1	92.3
SCFA-PS	89.3	91.7	94.1	96.0	96.3	97.0	97.0
SCWAD-PS	81.3	88.1	89.3	91.0	91.6	92.9	93.6

: No. Feature of Eigenfaces w/o 1st 2

Figure 2.

Recognition rates on the CMU-PIE and Yale B databases.

The Yale B database contains images of 10 individuals in nine poses and with 64 illuminations per pose. We used 45 facial images for each subject in the frontal pose (YaleB/Pose00) which were further subdivided into four subsets (Subset i, i=1,2,3,4) depending on the direction of light as in [4]. The index of the subset increases as the light source moves away from the front in taking the pictures. The images in Subset 1 were selected as a training set and the other images in Subsets 2, 3 and 4 were used as the test set.

Figure 2(b) shows the recognition rates for a different number of features which are based on the RAW, HIST, SCFA and SCWAD images. Similar to the cases of the CMU-PIE database, the histogram equalization alone gave 4.9% improvement in recognition rates compared to the RAW image, and the SCFA and SCWAD images additionally gave 3.8% and 9.3% increases, respectively. The SCFA-PS images gave the best recognition rates of 97.1 %.

In Table 2, the recognition performances of the proposed methods are compared with those of various methods. The SCFA-PS and SCWAD-PS images are obtained by selecting 2000 and 9000 pixels, which were only 16.7% and 75.3% of the total pixels, from the SCFA and SCWAD images, respectively. As can be seen in Table 2, the proposed methods gave higher recognition rates compared to the other methods, showing that SCFA-PS and SCWAD-PS gave the recognition rates of 95.3% and 97.1%, which were 5.5% and 1.8% better than SCFA and SCWAD, respectively. In particular, the performance of SCWAD-PS images is most notable for Subset 4, where images are severely affected by shadows.

Table 2.

Recognition rates for the yale b database (%).

Data	Subset 1	Subset 2	Subset 3	Subset 4	All
Eigenfaces w/o 1st 2 [15]	100	100	80.0	29.2	74.9
FLD [15]	100	100	91.4	67.5	88.7
SQI [11]	100	95.8	96.4	76.7	91.6
SCAD [10]	100	100	95.7	86.7	95.1
SCFA [12]	100	100	94.3	68.3	89.8
SCWAD [19]	100	100	95.7	87.5	95.3
SCFA-PS	100	100	97.9	85.0	95.3
SCWAD-PS	100	100	97.9	91.7	97.1

The Multi-PIE database is recorded during four sessions over the course of six months. It contains a total 755,370 images from 337 different subjects. Individual session attendance varied between a minimum of 203 and a maximum 249 subjects. We used the 4,233 images of 249 subjects with 17 illumination variations from the first session. The recognition rates for the Multi-PIE database are shown in Table 3. The both SCFA-PS and SCWAD-PS images are obtained by selecting 7000 pixels, which were only 58.3% of the total pixels, from the SCFA and SCWAD images, respectively. As in the previous two databases, the proposed method also gave better recognition rates for the Multi-PIE database, which is the larger database, than the other methods. The recognition rates of SCFA-PS and SCWAD-PS were improved by 3.3% and 1.1% compared to SCFA and SCWAD, respectively.

Table 3.

Recognition rates for the multi-pie database (%).

No. Feature (*)	53 (55)	60 (95)	67 (135)	74 (175)	81 (215)	88 (255)	95 (295)
Eigenfaces w/o 1st 2 [15]	42.0	46.1	47.9	49.0	49.8	50.2	50.6
FLD [15]	81.3	82.2	82.7	83.0	83.4	83.5	83.6
SQI [11]	74.9	77.0	78.3	79.1	79.9	81.1	81.4
SCAD [10]	85.7	86.4	87.6	88.2	88.8	89.3	89.7
SCFA [12]	84.4	85.5	86.0	86.6	87.1	87.8	88.3
SCWAD [19]	83.1	84.5	86.0	86.7	87.7	88.7	89.5
SCFA-PS	87.8	88.5	89.5	89.9	90.6	91.1	91.6
SCWAD-PS	85.7	86.6	87.7	88.8	89.5	90.1	90.6

: No. Feature of Eigenfaces w/o 1st 2

5. Conclusions

Shadows caused by illumination variation can change the appearance of a face and degrade the face recognition performance. In order to alleviate the influence of shadow on face recognition, we compensated for the illumination variation by restoring the facial image deteriorated by shadow to the image similar to an image under frontal illumination using state of the art methods [12], [19]. In addition, we improved the face recognition performance by applying a pixel selection method based on discriminant analysis, which removed noisy pixels that were generated in the shadow compensation process. The experimental results for the CMU-PIE, Yale B and Multi-PIE databases showed that the proposed method improved the face recognition performance in the presence of illumination variation.

References

Javier

and Julio

(2008) Illumination compensation and normalization in eigenspace-based face recognition: A comparative study of different pre-processing approaches. Int. Pattern Recognition Letters, vol. 29, no. 14, pp. 1966–1979.

Zou

Kittler

, and Messer

(2007) Illumination invariant face recognition: A survey. Int. IEEE Int'l Conf. on Biomet.: Theory, Appli., and Sys., pp. 1–8.

Choi

S.-I

(2010) Face Recognition Based on 2D Images Under Various Conditions, Ph.D. thesis, Seoul National University.

Georghiades

A.S.

and Belhumeur

P.N.

(2001) From few to many: Illumination cone models for face recognition under variable lighting and pose. Int. IEEE Trans. Pattern Anal. and Mach. Intell., vol. 23, no. 2, pp. 643–660.

Ramamoorthi

(2002) Analytic PCA construction for theoretical analysis of lighting variability in images of a Lambertian object. Int. IEEE Trans. Pattern Anal. and Mach. Intell., vol. 24, no. 10, pp. 1322–1333.

Adini

Moses

and Ullman

(1997) Face recognition: The problem of compensating for changes in illumination direction. Int. IEEE Trans.Pattern Anal. and Mach. Intell., vol. 19, no. 7, pp. 721–732.

Zhang

Tang

Y.Y.

Fang

Shang

and Liu

(2009) Face Recognition Under Varying Illumination Using Gradientfaces. Int. IEEE Trans. Image Processing, vol. 18, no. 11, pp. 2599–2606.

Liu

and Wechsler

(2002) Gabor feature based classification using the enhanced fisher linear discriminant model for face recognition. Int. IEEE Trans. Image Processing, vol. 11, no. 4, pp. 467–476.

Ahonen

Hadid

and Pietikäinen

(2006) Face description with local binary patterns: Application to face recognition. Int. IEEE Trans. Pattern Anal. and Mach. Intell., vol. 28, no. 12, pp. 2037–2041.

10.

Choi

S.-I

Kim

and Choi

C.-H

(2007) Shadow compensation in 2D images for face recognition. Int. Pattern Recognition, vol. 40, no. 7, pp. 2118–2125.

11.

Wang

, and Wang

(2004) Face recognition under varying lighting conditions using self quotient image. in Proc. IEEE Int'l Conf. on Auto. Face and Gest. Recog. pp. 819–824.

12.

Choi

S.-I

and Jeong

G.-M

(2011) Shadow compensation using Fourier analysis with application to face recognition. Int. IEEE Signal Processing Letters, vol. 18, no. 1, pp. 23–26.

13.

Shen

L.S.

Liu

D.H.

and Lam

K.M.

(2005) Illumination invariant face recognition. Int. Pattern Recognition, vol. 38, no. 10, pp. 1705–1716.

14.

Shashua

and Riklin-Raviv

(2001) The quotient image: Class based re-rendering and recognition with varying illuminations. Int. IEEE Trans. Pattern Anal. and Mach. Intell., vol. 23, no. 2, pp. 129–139.

15.

Belhumeur

P.N.

Hespanha

J.P.

and Kriegman

D.J.

(1997) Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection. Int. IEEE Trans. Pattern Anal. and Mach. Intell., vol. 19, no. 7, pp. 711–720.

16.

Cevikalp

Neamtu

Wilkes

and Barkana

(2005) Discriminative common vectors for face recognition. Int. IEEE Trans. Pattern Anal. And Mach. Intell., vol. 27, no. 1, pp. 4–13.

17.

Jiang

Mandal

and Kot

(2008) Eigenfeature regularization and extraction in face recognition. Int. IEEE Trans. Pattern Anal. and Mach. Intell., vol. 30, no. 3, pp. 1–12.

18.

Chow

T.W.S.

and Huang

(2005) Estimating optimal feature subsets using efficient estimation of high-dimensional mutual information. Int. IEEE Trans. Neural Networks, vol. 16, no. 1, pp. 213–224.

19.

Choi

S.-I

Choi

C.-H

and Kwak

(2011) Face recognition based on 2D images under illumination and pose variations. Int. Pattern Recognition Letters, vol. 32, pp. 561–571.

20.

Choi

S.-I

Choi

C.-H

and Jeong

G.-M

, and Kwak

(2011) Pixel selection based on discriminant features with application to face recognition. Int. Pattern Recognition Letters, vol. 33, pp. 1083–1092.

21.

Gonzales

R.C.

and Woods

R.E.

(2002) Digital Image Processing, 2nd ed. Prentice Hall, NJ USA.

22.

Oppenheim

A.V.

Willsky

A.S.

and Nawab

S.H.

(1996) Signal and Systems, 2nd ed. Prentice Hall, NJ, USA, pp. 303–304, 425–426.

23.

Kim

Choi

S.-I

Turk

and Choi

C.-H

(2012) A new biased discriminant analysis using composite vectors for eye detection. Int. IEEE Trans. Systems, Man, and Cybernetics- Part B, vol. 42.

24.

Sim

Baker

and Bsat

(2003) The CMU Pose, Illumination, and Expression Database. Int. IEEE Trans. Pattern Anal. and Mach. Intell., vol. 25, no. 12, pp. 1615–1618.

25.

National Institute of Standards and Technology, The Color FERET database, http://www.nist.gov/humanid/-colorferet.

26.

Gross

Matthews

Cohn

J.F.

Kanade

, and Baker

, “Multi-PIE,” Image and Vision Computing, vol. 28, pp. 807–813, 2010.