Image background reconstruction by Gaussian mixture based model reinforced with temporal-spatial confidence

Abstract

Background reconstruction from an image sequence is an important topic in image processing. However, most existing background reconstruction algorithms perform worse than expected on complex images. The Gaussian mixture model is frequently used to represent image features and to reconstruct the background of complex images. We propose a Gaussian mixture-based background reconstruction algorithm that evaluates both temporal and spatial confidence values to obtain the most reliable models, which are then used to assess whether a pixel belongs to the background or the foreground. During processing, Sarsa(λ) is used to adapt automatically through interaction with the image so as to maximize the reinforced temporal–spatial confidence. To obtain better reconstruction results, a series of pre-processing methods (shadow detection and removal, sunshine change relief, and sudden noise detection and removal) is applied before background reconstruction to remove interference from shadow, daylight change, and sudden noise. The test results show that the proposed algorithms work well in background reconstruction.

Keywords

Image processing, background reconstruction, Gaussian mixture model, reinforcement learning, Sarsa(λ)

Introduction

In computer vision, moving object detection and segmentation is an important problem with wide application in areas such as video surveillance, traffic monitoring and image compression.¹ There are usually three types of methods for motion detection: the optical flow method, the adjacent frame difference method and the background subtraction method.²

The optical flow method³ can detect and track moving objects without prior knowledge of the background area; however, it is computationally costly, sensitive to noise, and requires advanced hardware. The adjacent frame difference method⁴ is basically a kind of background subtraction that uses the previous frame as the background model rather than building a model with some algorithm. Its benefit is that it is highly efficient and adapts to dynamic changes of the environment, thus achieving real-time motion detection. However, the resulting segmentation of moving targets is incomplete. The background subtraction method⁵ obtains the moving target by subtracting the background reference frame from the current frame and then binarizing the result with a threshold. It is one of the simplest and most effective ways to detect objects and avoids the drawbacks of the adjacent frame difference method. One of its simplest implementations is to select in advance a background image without any moving target, and then subtract this pre-defined background image from the current frame. However, the background environment often changes, for example through illumination changes caused by sunshine at different times of day, seasonal changes, and offsets of the camera position, so the background image must be updated adaptively. A classical approach to this problem is the time-averaged background image, an approximation of the background obtained by averaging a sequence of images over a period of time. This method tends to incorrectly mix moving objects into the background image, a phenomenon called blending.
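As a minimal sketch of background subtraction with a time-averaged background, the following Python fragment may help make the blending effect concrete. The function names, the blending rate and the threshold are illustrative assumptions and are not taken from the cited methods.

```python
import numpy as np

def update_background(background, frame, rate=0.05):
    """Blend the new frame into the running-average background (rate is assumed)."""
    return (1.0 - rate) * background + rate * frame

def subtract_background(background, frame, threshold=30.0):
    """Binarize the absolute difference between the frame and the background."""
    return np.abs(frame.astype(float) - background) > threshold

# Synthetic 4x4 grayscale scene: flat background of 100 with one bright object pixel.
background = np.full((4, 4), 100.0)
frame = background.copy()
frame[1, 2] = 200.0

mask = subtract_background(background, frame)      # True only at the object pixel
background = update_background(background, frame)  # the object leaks into the average
```

After the update, the background value at the object pixel has moved away from 100 toward the object value, which is the blending behaviour described above.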

In the real world, image sequences suffer from shadow, daylight change, and sudden noise. We present a shadow detection and removal algorithm that identifies shadow by comparing the value of a candidate shadow pixel with that of the background pixel in HSV space, under the assumption that a pixel can be regarded as a shadow point if its hue and saturation values are both smaller than predefined thresholds. Sunshine change is another key aspect that affects background reconstruction.

In this work, we propose a median filtering algorithm with an adaptive filtering window that changes the window scale according to real-world circumstances so as to fit sunlight change, thus relieving the interference. Sudden noise, despite its short duration, always appears unexpectedly, is very difficult to remove, and heavily disturbs background reconstruction. To remove sudden noise, we propose an adaptive sudden noise elimination algorithm based on the premise that sudden noise usually lasts only for a few adjacent image frames, and we perform a series of subtractions to remove it effectively. In particular, we propose a Gaussian mixture model-based background reconstruction with temporal and spatial confidence evaluation. The algorithm sets up multiple Gaussian distribution models for each pixel in the image and computes the corresponding confidence value for each Gaussian distribution model. Each distribution in the model corresponds to a confidence level, and the distributions are sorted in descending order of confidence, which yields the reliable distribution models. We use these models to assess whether a pixel is a foreground pixel or a background pixel. Reinforcement learning optimizes a policy by interacting with the environment. As a reinforcement learning algorithm, Sarsa(λ) can obtain experience by interacting with the environment at minimal cost. We therefore use a Sarsa(λ)-based algorithm that adapts automatically through interaction with the image during processing so as to maximize the reinforced temporal–spatial confidence. The test results show that the proposed algorithms work well in background reconstruction.

Related work

In recent years, considerable effort has been devoted to adaptively updating the background image in the background subtraction method. These efforts can be divided into two categories: reconstructing the background image with a background model whose parameters are adjusted adaptively, and constituting the background image by selecting pixels from previous images in accordance with certain assumptions.

Ridder et al.⁶ applied a Kalman filter to each pixel of the background image so that the system also works when the lighting changes. Friedman and Russell⁷ regarded the gray value of a pixel as a weighted combination of three Gaussian distributions,⁸ corresponding respectively to background, foreground, and shadow, and used the EM algorithm to estimate the model parameters.

As the background is usually complicated, a single Gaussian distribution is not enough to represent the background pixels. Stauffer and Grimson⁹ utilized a Gaussian mixture model to represent the distribution of the background. KaewTraKulPong and Bowden¹⁰ found that this approach required a lot of computing time and could not detect the shadow of moving objects, and proposed an improved adaptive mixture model that reduces the amount of computation and achieves shadow detection.

Elgammal¹¹ argued that a mixture of only a few Gaussians still does not represent the distribution of the background accurately, while increasing the number of Gaussian distributions reduces motion detection sensitivity and increases the amount of computation at the same time. A non-parametric kernel density estimation algorithm was presented to improve motion detection sensitivity; however, it still needed a great amount of computation.

Magee¹² noted that some previous works modeled only the background with Gaussian mixtures and not the foreground, and developed a number of foreground models to achieve more effective motion detection. The model used by Xia et al.¹³ was based on a Gaussian distribution for each pixel. Toyama et al.¹⁴ reconstructed the background image from pixel-level, region-level and frame-level features of the image and used adaptive Wiener filtering for background updating. However, these methods have a few drawbacks. First, the model must be initialized, and initialization generally assumes that no moving foreground objects are present in the scene, which is very difficult to guarantee in practice. Second, the established model sometimes fails to represent the actual background image fully, causing moving objects to blend into the background unexpectedly. Gloyer et al.¹⁵ proposed a median method, which takes the median of a pixel's gray values over a sequence of images as the background gray value of that pixel. However, if the background is visible at a pixel for less than 50% of the observed time, the median method tends to return erroneous results. Kornprobst et al.,¹⁶ assuming that the background image is the most frequently observed one, proposed a partial differential equation-based background reconstruction algorithm.

Gaussian-based background reconstruction

The Gaussian model uses the Gaussian probability density function (the normal distribution curve) to quantify an object, decomposing it into a number of models, each formed by a Gaussian probability density function. The image histogram reflects how frequently each gray value appears in the image and can be regarded as an estimate of the gray-level probability density. If the target area is larger than the background area of the image, and the gray values of the background differ from those of the objects, then the image histogram takes a bimodal shape with a valley between two peaks, where one peak corresponds to the target object and the other peak corresponds to the central gray value of the background. Complex images are typically multimodal, and a multimodal histogram can be seen as the superposition of several Gaussian distributions.
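For reference, the superposition of Gaussian distributions mentioned above is commonly written in the following standard mixture form. The symbols are assumptions chosen to match the weight, mean value and standard deviation used in Algorithm 1: K is the number of components, and ω_k, μ_k and σ_k are the weight, mean and standard deviation of component k.

```latex
p(x) = \sum_{k=1}^{K} \omega_k \, \mathcal{N}(x \mid \mu_k, \sigma_k^2),
\qquad \sum_{k=1}^{K} \omega_k = 1,
\qquad
\mathcal{N}(x \mid \mu, \sigma^2)
  = \frac{1}{\sqrt{2\pi}\,\sigma}
    \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```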

Assuming that the background is stationary, the foreground is any object with meaningful motion. The basic idea of reconstruction modeling is to extract the foreground from the current frame so that the background estimate approaches the true current background, which means updating the background as a weighted average of the current frame and the current background. However, this simple scheme cannot deal with sudden light changes or other influences from the external environment. The Gaussian mixture model is one of the most effective methods for background reconstruction. It uses K Gaussian models to represent the features of each pixel in the image and updates the model with each new frame. During background reconstruction, a pixel that matches the Gaussian mixture model is regarded as a background pixel; otherwise it is a foreground pixel. The process of background reconstruction with the Gaussian mixture model is shown in Algorithm 1.
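The matching and update step for a single pixel can be sketched as follows. This is a simplified illustration in the spirit of adaptive mixture models rather than the exact procedure of Algorithm 1: the 2.5σ matching rule, the learning rate `lr`, the cumulative-weight rule for choosing the first M kernels, and the values used for a replacement kernel are all assumptions.

```python
import numpy as np

def classify_and_update(x, mu, sigma, w, lr=0.05, match_k=2.5, T=0.7):
    """Classify gray value x for one pixel and update its K kernels in place.

    mu, sigma, w are length-K float arrays (mean, standard deviation, weight).
    Returns True when x is judged to be background.
    """
    order = np.argsort(-(w / sigma))             # most reliable kernels first
    # Assumed rule: the leading kernels whose cumulative weight reaches T
    # are treated as the background kernels.
    m = int(np.searchsorted(np.cumsum(w[order]), T)) + 1
    matches = np.abs(x - mu) <= match_k * sigma
    hit = next((k for k in order if matches[k]), None)

    if hit is None:
        # No kernel explains x: replace the least reliable kernel (assumed values).
        worst = order[-1]
        mu[worst], sigma[worst], w[worst] = x, 30.0, 0.05
        is_background = False
    else:
        is_background = bool(hit in order[:m])
        w *= (1.0 - lr)                          # decay all weights
        w[hit] += lr                             # reinforce the matched kernel
        mu[hit] += lr * (x - mu[hit])
        var = (1.0 - lr) * sigma[hit] ** 2 + lr * (x - mu[hit]) ** 2
        sigma[hit] = np.sqrt(var)
    w /= w.sum()                                 # keep the weights normalized
    return is_background

mu = np.array([100.0, 180.0, 30.0])
sigma = np.array([5.0, 20.0, 20.0])
w = np.array([0.6, 0.3, 0.1])
first = classify_and_update(102.0, mu, sigma, w)   # close to the dominant kernel
second = classify_and_update(250.0, mu, sigma, w)  # explained by no kernel
```

In this toy run the first value matches the dominant kernel and is treated as background, while the second value matches nothing, is treated as foreground, and seeds a new kernel.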

Reinforcement learning

Reinforcement learning¹⁷ provides a framework for learning directly from interaction in the process of achieving goals. In the reinforcement learning framework, there are five essential elements: agent, environment, state, reward and action. An agent is an entity that has cognitive skills, the ability to solve problems, and the capability to communicate. Through the agent, it is possible to set up a control model that is anthropomorphic. As a result, we can control the behavior of the system and unify other control units under a single description. The agent is therefore the de facto learner and decision maker, interacting with the environment. The agent decides which action to choose; the environment responds to the action, presents a new situation to the agent, and returns a reward. The reinforcement learning framework¹⁸ is shown in Figure 1.

Figure 1.

Reinforcement-learning framework where the agent chooses an action; the environment generates new scenes, as well as a reward, to the agent as response to the action.

The agent interacts with the environment at each step of a discrete-time sequence. At each time step t, the agent observes the environment state s_t ∈ S, where S is the set of all possible states, and chooses an action a_t ∈ A(s_t) under some policy, where A(s_t) is the set of all actions available in state s_t. By taking the action, the agent receives a reward r_{t+1} ∈ R and reaches a new state s_{t+1}. The ultimate goal of the agent is to maximize the sum of rewards over the long term. The mapping from states to action selection is the agent's policy π_t.

Temporal difference (TD) learning is an important algorithm that can learn directly from raw experience without a model of the environment.¹⁸ Moreover, the estimates learned by temporal difference are updated based in part on other learned estimates, without waiting for the final outcome. These two characteristics make temporal difference learning particularly efficient and suitable for incremental learning. Given some experience following policy π, temporal difference learning updates the estimate V of V^π,¹⁹ as

V(s_t) \leftarrow V(s_t) + \alpha [R_t - V(s_t)]

(1)

where R_t is the actual return after time step t, and α is a step size parameter. Temporal difference learning updates V at step t + 1 using the observed reward r_{t+1} and the estimate V(s_{t+1}).

Let Q^π(s, a) be the value of taking action a in state s under policy π. Q^π(s, a)²⁰ can be defined as
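A small numerical illustration of this update may be useful. The sketch below assumes the common one-step target, in which the return is replaced by the observed reward plus the discounted estimate of the next state; the state names, rewards, step size and discount factor are hypothetical.

```python
def td0_update(V, s, r_next, s_next, alpha=0.1, gamma=0.9):
    """One TD step: move V[s] toward the target r_next + gamma * V[s_next]."""
    target = r_next + gamma * V[s_next]
    V[s] += alpha * (target - V[s])
    return V

# Two hypothetical states with initial value estimates.
V = {"A": 0.0, "B": 1.0}
td0_update(V, "A", r_next=0.5, s_next="B")
# V["A"] moves from 0.0 toward the target 0.5 + 0.9 * 1.0 = 1.4 by a factor alpha.
```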

Q^{\pi}(s, a) = E_{\pi}\{R_t \mid s_t = s, a_t = a\} = E_{\pi}\{\sum_{k = 0}^{\infty} \gamma^k r_{t + k + 1} \mid s_t = s, a_t = a\}.

(2)

Sarsa (state-action-reward-state-action) is a class of on-policy TD algorithms, and Sarsa(λ) is a variant of Sarsa that uses eligibility traces.¹⁷ The update of Q^π(s, a)¹⁷ is

Q_{t+1}(s, a) = Q_t(s, a) + \alpha \delta_t e_t(s, a)

(3)

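where

\delta_t = r_{t+1} + \gamma Q_t(s_{t+1}, a_{t+1}) - Q_t(s_t, a_t)

and e_t(s, a) is the accumulating eligibility trace of the pair (s, a),

e_t(s, a) = \begin{cases} \gamma \lambda e_{t-1}(s, a) + 1 & \text{if } s = s_t \text{ and } a = a_t \\ \gamma \lambda e_{t-1}(s, a) & \text{otherwise.} \end{cases}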
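A tabular version of one Sarsa(λ) update with accumulating eligibility traces can be sketched as below. The table shapes and the transition are hypothetical; the default learning rate, discount rate and λ are the values reported in the experiments of this paper.

```python
import numpy as np

def sarsa_lambda_step(Q, E, s, a, r, s_next, a_next,
                      alpha=0.6, gamma=0.95, lam=0.5):
    """Apply one Sarsa(lambda) update in place to value table Q and trace table E."""
    delta = r + gamma * Q[s_next, a_next] - Q[s, a]   # TD error
    E[s, a] += 1.0                                    # mark the visited pair
    Q += alpha * delta * E                            # update every pair by its trace
    E *= gamma * lam                                  # decay all traces
    return delta

# Hypothetical problem with 2 states and 3 actions.
Q = np.zeros((2, 3))
E = np.zeros((2, 3))
delta = sarsa_lambda_step(Q, E, s=0, a=1, r=1.0, s_next=1, a_next=2)
```

Because every state-action pair is updated in proportion to its trace, a single reward also adjusts the values of recently visited pairs, not only the most recent one.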

Image pre-processing

Shadow elimination

The shadow of a moving foreground object causes incorrect detection of the moving object,²⁰ which in turn leads to incorrect reconstruction of the background. It is therefore necessary to identify and remove the shadows of moving objects.

As shadow is generally dark, we can identify it by comparing the value of a candidate shadow pixel with that of the background pixel. In HSV space, if the hue and saturation differences between the candidate pixel and the background pixel are both smaller than predefined thresholds, the pixel can be regarded as a shadow point. The decision process is shown in Algorithm 2.

Algorithm 1.

Background reconstruction with Gaussian mixture model.

Input: image pixels

Output: reconstructed background of the image

1: for each pixel in the image

2: initialize mean value, standard deviation and weight value

3: end for

4: get N (N > 200) frames of image

5: for each pixel in the first N images

6: obtain mean value, standard deviation and weight value with online EM algorithm

7: end for

8: M = arg min(ω/σ > T)

9: for each pixel from N + 1 images

10: sort descending the Gaussian kernels by ω/σ

11: choose the first M Gaussian kernels

12: if the pixel matches one of the first M kernels then set it as background pixel

13: Update background image

14: end for

15: return reconstructed background of the image

In Algorithm 2, the value parameter V in the shadowed area is generally less than that of the background area, and the parameter β is a value to measure the strength of brightness where the brightness of the light is inversely proportional to the value of β.
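The shadow test of Algorithm 2 can be sketched with NumPy as follows. The sketch assumes HSV images stored as floating-point arrays with all channels scaled to [0, 1]; the threshold values are placeholders, and the small constant guarding the division is an added safeguard.

```python
import numpy as np

def shadow_mask(hsv_new, hsv_model, beta=0.4, gamma=0.9, t_s=0.1, t_h=0.1):
    """Return a boolean mask that is True where a pixel is classified as shadow.

    hsv_new and hsv_model are float arrays of shape (H, W, 3), channels in [0, 1].
    """
    h_new, s_new, v_new = np.moveaxis(hsv_new, -1, 0)
    h_mod, s_mod, v_mod = np.moveaxis(hsv_model, -1, 0)
    ratio = v_new / np.maximum(v_mod, 1e-6)      # guard against division by zero
    return (
        (beta <= ratio) & (ratio <= gamma)       # darker than the model, but not too dark
        & ((s_new - s_mod) <= t_s)               # saturation condition
        & (np.abs(h_new - h_mod) <= t_h)         # hue condition
    )

# 2x2 background model with constant hue, saturation and value.
model = np.tile(np.array([0.3, 0.5, 0.8]), (2, 2, 1))
new = model.copy()
new[0, 0, 2] = 0.5   # moderately darker pixel: shadow candidate
new[1, 1, 2] = 0.1   # much darker pixel: below the beta bound
mask = shadow_mask(new, model)
```

Unchanged pixels have a brightness ratio of 1, which exceeds γ, so only the moderately darkened pixel is flagged.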

Sunshine change relieving

The sunshine change, which is unavoidable, negatively affects the background. Obviously, a fully dark scene is very different from a sunlit one. The basic idea is to use median filtering. However, as the median filtering window has to be predefined, conventional median filtering cannot change the window scale accordingly.

To solve this problem, we put forward an adaptive median filtering method that fits sunlight change so as to relieve the interference. Its fundamental idea is to change the median filtering window scale automatically according to real-world circumstances. The main processing steps are given in Algorithm 3.
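One possible reading of Algorithm 3 for a single pixel is sketched below. Several details are assumptions: the frames are taken as one stacked array, the per-frame window minima and maxima are combined by a harmonic mean, the window grows by one pixel per step, and a small `eps` is added so that zero gray values do not cause a division by zero.

```python
import numpy as np

def harmonic_mean(values, eps=1e-6):
    """Harmonic mean; eps keeps zero gray values from dividing by zero."""
    values = np.asarray(values, dtype=float) + eps
    return len(values) / np.sum(1.0 / values)

def relieve_pixel(frames, x, y, start=1, max_radius=None):
    """Filter pixel (x, y) of the last frame with a window that grows when needed.

    frames: array of shape (F, H, W) holding the grayscale frames considered.
    """
    _, height, width = frames.shape
    gs_xy = float(frames[-1, y, x])
    if max_radius is None:
        max_radius = max(height, width)
    radius = start
    while radius <= max_radius:
        win = frames[:, max(0, y - radius):y + radius + 1,
                     max(0, x - radius):x + radius + 1]
        gs_min = harmonic_mean(win.min(axis=(1, 2)))   # from per-frame minima
        gs_max = harmonic_mean(win.max(axis=(1, 2)))   # from per-frame maxima
        gs_med = (gs_min + gs_max) / 2.0
        if gs_min < gs_med < gs_max:
            return gs_xy if gs_min < gs_xy < gs_max else gs_med
        radius += 1                                    # flat window: expand it
    return gs_xy                                       # window outgrew the image

frames = np.full((3, 5, 5), 100.0)
frames[:, 1, 1] = 80.0        # a darker neighbour in every frame
frames[-1, 2, 2] = 250.0      # a bright outlier in the last frame
filtered = relieve_pixel(frames, x=2, y=2)
flat = relieve_pixel(np.full((3, 5, 5), 100.0), x=2, y=2)
```

In the first call the outlier lies outside the estimated range and is replaced by the midpoint value; in the second call the window never contains any variation, so the original gray value is returned.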

Sudden noise elimination

Sudden noise, e.g. a flying bird or falling leaves, usually exists only for a short time but heavily disturbs background reconstruction. It is also very difficult to remove because it always appears unexpectedly. To remove sudden noise, we propose an adaptive sudden noise elimination algorithm, which is based on the premise that sudden noise usually lasts only a very short time and therefore should not be regarded as background.

Initially, we get the previous image frame and the background image frame of the current image. Then we compute the difference between the current image and the current background, which carries information about the current background and the noise. As the interval between two consecutive images is usually very short, the sudden noise occurs in both the previous image and the current image. Therefore, sudden noise can be removed by subtraction, so that the difference image contains only foreground objects. After that we obtain the foreground objects and the background image from the corresponding pixels. The processing steps are given in Algorithm 4.
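The two differences used by Algorithm 4 can be sketched as follows. The sketch assumes that each absolute difference is binarized with a threshold before the comparison, which the listing does not state explicitly; the threshold value and function name are illustrative.

```python
import numpy as np

def classify_foreground(current, previous, background, threshold=25):
    """Mark a pixel as foreground only when it differs from both the previous
    frame and the background image by more than the threshold."""
    c = current.astype(np.int32)                 # avoid uint8 wrap-around
    diff_prev = np.abs(c - previous.astype(np.int32)) > threshold
    diff_bg = np.abs(c - background.astype(np.int32)) > threshold
    return diff_prev & diff_bg

background = np.full((3, 3), 50, dtype=np.uint8)
previous = background.copy()
previous[0, 0] = 200          # disturbance already present in the previous frame
current = background.copy()
current[0, 0] = 200           # the same disturbance in the current frame
current[2, 2] = 180           # a change that appears only in the current frame
mask = classify_foreground(current, previous, background)
```

Here the pixel that is identical in the previous and current frames is not marked, while the pixel that differs from both the previous frame and the background is marked as foreground.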

Algorithm 2.

Shadow detection and removing.

Input: image

Output: image after shadow elimination

1: set saturation threshold T_s

2: set hue threshold T_h

3: set parameter γ to a value less than 1

4: set strength of brightness β

5: for each pixel in the image

6: get H, S and V values

7: if

\beta \leq \frac{V_{new}(pixel)}{V_{model}(pixel)} \leq \gamma

and

(S_{new}(pixel) - S_{model}(pixel)) \leq T_s

and

|H_{new}(pixel) - H_{model}(pixel)| \leq T_h

then

S(pixel) = 1

8: else

9: S(pixel) = 0

10: end if

11: end for

12: Update background image

13: end for

14: return image after shadow elimination

Algorithm 3.

Sunshine change relieving.

Input: image

Output: image after sunshine relieving

1: get current working window W

2: for each pixel (x,y) in W

3: get grayscale gs_xy of pixel (x,y)

4: set upper threshold of grayscale threshold_u for current window

5: set low threshold of grayscale threshold_l for current window

6: get the first m frames of image in W

7: get the last n frames of image in W

8: get minimum grayscale of image by

gs_{min} = \frac{1}{\frac{1}{m + n} \sum_{i = 1}^{m + n} \frac{1}{gs_{min, i}}}

9: get maximum grayscale of image by

gs_{max} = \frac{1}{\frac{1}{m + n} \sum_{i = 1}^{m + n} \frac{1}{gs_{max, i}}}

10: get median value by

gs_{med} = \frac{gs_{min} + gs_{max}}{2}

11: if gs_min < gs_med < gs_max then

12: if gs_min < gs_xy < gs_max then

13: return gs_xy

14: else

15: return gs_med

16: else

17: expand W

18: if W is larger than input image then

19: return gs_xy

20: end if

21: end if

22: end for

23: return image after sunshine relieving

Algorithm 4.

Sudden noise detection and removing.

Input: image sequences with sudden noise

Output: image after sudden noise elimination

1: get current image I_c

2: get previous image I_p

3: get background image I_b

4: for each pixel(x, y) in image

5: get difference between current image and previous image by

Diff_1(x, y) = |I_c(x, y) - I_p(x, y)|

6: get difference between current image and background image by

Diff_2(x, y) = |I_c(x, y) - I_b(x, y)|

7: if Diff_1(x, y) = 1 and Diff_2(x, y) = 1 then

8: pixel(x, y) is foreground

9: else

10: pixel(x, y) is background

11: end if

12: end for

13: return image after sudden noise elimination

Algorithm 5.

Sarsa(λ)-based background reconstruction.

Input: image sequence

Output: reconstructed background image

1: for all s, a

2: initialize confidence (s, a) arbitrarily

3: e(s, a) = 0

4: end for

5: for each episode

6: initialize s, a

7: take action a, and observe r, s’

8: select a’ from s’ with maximal confidence

9: δ←r + γ confidence (s’, a’)- confidence (s, a)

10: e(s,a) ←e(s,a) + 1

11: for all s, a:

12: confidence (s,a) ←confidence (s,a) + α δ e(s,a)

13: e(s,a) ←γ λ e(s,a)

14: end for

15: s ← s’

16: a ←a’

17: end for

18: return reconstructed background

Reinforced temporal and spatial confidence

In a sequence of images, pixels with a steady value are the most likely to be background pixels. However, if foreground objects move very slowly or remain temporarily stationary, pixels in a steady state often belong to the foreground.

In this work, the reconstruction algorithm restores background images from past observations, using several algorithms for background updates. In the first case, the background is updated regularly: at specified time intervals, the last several images of the sequence are processed with Algorithm 3 to reconstruct the background, which suits slowly varying scenes. In the second case, the background is updated with a mutation strategy: if the number of pixels in the current frame that differ from the reference image exceeds a threshold value, the background is reconstructed with Algorithm 4. For most other cases, another approach is needed to obtain the background.

Some research shows that integrating various kinds of information is more conducive to a reasonable reconstruction; that is, the classification of pixels should be optimized with respect to both temporal and spatial features. This has been theoretically proven by Milan Sonka, whose experimental results showed that integrating comprehensive information yields more reasonable results than methods using only a single kind of information.

In this work, we propose a novel background reconstruction algorithm that integrates the temporal and spatial information of the image and combines it with the related confidence values.

The algorithm sets up multiple Gaussian distribution models for each pixel in the image and computes the corresponding confidence value for each Gaussian distribution model. Each distribution in the model corresponds to a confidence level, and the distributions are sorted in descending order of confidence. We thereby obtain the top p reliable distribution models and the last q unreliable distribution models. During processing, although the reliable distribution models are initially arranged ahead of the unreliable ones, an unreliable distribution model can be rearranged and become a reliable one by accumulating matches between pixels and the distribution model; conversely, a reliable distribution model can become an unreliable one through mismatches between pixels and the model. When a new image arrives, the algorithm sequentially checks whether the current pixel value matches each related Gaussian distribution, so as to determine whether the current pixel is a background pixel, a foreground pixel, or an unsettled pixel. After that, based on the multi-Gaussian models, we either create a new distribution or update the current distribution. Meanwhile, each pixel in the current image that is determined to be a background pixel is used to generate the background image. To increase the adaptability of the model, the algorithm also removes outdated distributions in a timely manner. We also take advantage of reinforcement learning so that the algorithm can adapt automatically during the interaction.

In the reinforcement learning framework, the action defines the behavior of the learning agent. In our algorithm, the actions are defined in Table 1.

Table 1.

Actions and corresponding value.

Value	Action
1	Set the current pixel as background pixel
2	Set the current pixel as foreground pixel
3	Set the current pixel as unsettled pixel

In our work, the agent makes decisions according to different conditions and returns an action sequence so as to obtain maximal confidence. Furthermore, the model gives the optimal decision for a given state. Here, we use temporal and spatial feature confidence to evaluate whether a pixel is a background pixel, as

confidence = \omega_1 confidence_1 + \omega_2 confidence_2

(5)

where ω₁ and ω₂ are the weight values, confidence₁ is the confidence value returned by the image sequence, representing the temporal part of the confidence, and confidence₂ is the confidence value returned by the current image, denoting the spatial part of the confidence, which is represented by pixel position. The process of Sarsa(λ)-based background reconstruction is shown in Algorithm 5.
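As a small worked example of the weighted confidence, the sketch below combines hypothetical temporal and spatial confidence values for the three actions of Table 1 and then selects the action with maximal confidence, as in step 8 of Algorithm 5. The weights 0.6 and 0.4 are the values used in the experiments; the confidence values themselves, and the assumption of one confidence value per action, are illustrative.

```python
import numpy as np

ACTIONS = {1: "background", 2: "foreground", 3: "unsettled"}  # values from Table 1

def combined_confidence(temporal, spatial, w1=0.6, w2=0.4):
    """Weighted sum of the temporal and spatial confidence values."""
    temporal = np.asarray(temporal, dtype=float)
    spatial = np.asarray(spatial, dtype=float)
    return w1 * temporal + w2 * spatial

def choose_action(confidences):
    """Return the action value (1, 2 or 3) whose confidence is largest."""
    return int(np.argmax(confidences)) + 1

# Hypothetical confidence values for actions 1, 2 and 3 at one pixel.
temporal = [0.9, 0.2, 0.1]
spatial = [0.5, 0.4, 0.3]
combined = combined_confidence(temporal, spatial)
action = choose_action(combined)
label = ACTIONS[action]
```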

Experiment and results

To evaluate the performance of the model comprehensively, we set up two different scenarios. In the first scenario, candidate objects are small. In the second scenario, candidate objects are large. We used Algorithms 1–4 to pre-process the images and Algorithm 5 to reconstruct the background image. We also used Sarsa(λ) to learn a controller with learning rate = 0.6, discount rate = 0.95, and λ = 0.5 over 1000 episodes. The weighting values were chosen as ω₁ = 0.6 and ω₂ = 0.4.

We obtained a sequence of images from a traffic surveillance video in which the objects are large and the image is blurred. Figure 2 shows that, despite some noise, the above procedure can restore the background. A few noisy pixels remain in the result.

Figure 2.

The left image is a snapshot from a traffic-monitoring video and the right image is the reconstructed background.

We obtained another sequence of images from the same surveillance camera where moving objects are cleared. Figure 3 shows that the above procedure is able to restore the background successfully.

Figure 3.

The left image is a snapshot from a traffic-monitoring video and the right image is the reconstructed background.

Conclusion

Background reconstruction is a fundamental and key technology in image processing. It is easily contaminated by noise such as shadow, daylight change, and sudden noise, and many existing background reconstruction algorithms do not perform well. In this work, we derived a Gaussian mixture model-based background reconstruction algorithm. The algorithm, which uses Sarsa(λ) to adapt automatically through interaction with the image during processing, uses temporal and spatial confidence to set up the most reliable models for determining whether a pixel of the image is a background pixel or a foreground pixel.

To achieve better background reconstruction, a series of algorithms is proposed to clean the image in the pre-processing stage. The test results show that the proposed procedure successfully recovers the background.