Variational Inference of Kalman Filter and Its Application in Wireless Sensor Networks

Abstract

An improved Kalman filter algorithm based on variational inference (VIKF) is proposed. With the variational method, the joint posterior distribution of the states is approximated by a product of independent posterior distributions. To avoid high-dimensional integrals, these factors are obtained by minimizing the Kullback-Leibler divergence. The algorithm consists of two steps, the predict step and the update step, and the update step contains an iterative process that refines the posterior distribution. To verify the effectiveness of the proposed algorithm, VIKF is applied to state estimation in a discrete linear state space and to tracking problems in wireless sensor networks. Simulation results show that the variational approximation is effective and reliable for the linear state space, especially with time-varying non-Gaussian noise.

1. Introduction

The Kalman filter (KF), also known as linear quadratic estimation (LQE), is one of the most widely used linear state estimation methods and has numerous applications [1], including guidance, navigation, and control of vehicles [2–5]. The Kalman filter is also widely applied in time series analysis in fields such as signal processing and econometrics [6, 7]. Nonlinear methods, such as the extended Kalman filter (EKF) [2] and the unscented Kalman filter (UKF) [8], also build on the Kalman filter, combining it with the linearization of nonlinear functions or the approximation of the probability density distribution, respectively. Practical implementation of the Kalman filter is often difficult because good estimates of the noise covariance matrices are hard to obtain. Extensive research has been carried out to estimate these covariances from data, such as the autocovariance least-squares (ALS) method [9, 10], the modified Bryson-Frazier smoother [11], and the minimum-variance smoother [12, 13].

Variational inference for the Kalman filter in the linear state space model has also been considered in several works [14–16]. However, the authors of [14] only used the limited memory BFGS method to decrease the computational burden; neither factorization nor the Kullback-Leibler (KL) divergence [15] (the expectation of the joint distribution taken while the distributions of some variables are held fixed) was considered. The work in [16] did apply the variational approximation and obtained the posterior distributions of the state variables and the measurement noise. However, in the update step of the Variational Bayesian approximation Adaptive Kalman Filter (VBAKF), the initial values of the states and variances are used in all iterative steps; thus, the iterations quickly reach a “steady state.” Sometimes the algorithm performs at most 2 iterations. That is, only two effective iterative loops are performed in the update step of the Kalman filter, so the performance of this algorithm degrades in cases such as non-Gaussian and high-dimensional linear state space models. In this paper, we rederive the parameter expressions for the Kalman filter by using the variational inference method (we call it VIKF in this paper), which forms a genuine iterative process in the update step of the Kalman filter.

Localization and tracking are key tasks in a variety of wireless sensor network (WSN) applications [17–19], where the Kalman filter and its extensions are widely used. In this paper, we also consider the application of VIKF to WSN tracking problems, especially with time-varying and non-Gaussian noise.

This paper is organized as follows. We first introduce the linear state space model; the derivation of the Kalman filter by variational inference (VIKF) follows, and the iterative process of VIKF is presented. In the simulation section, the effectiveness of the proposed method is demonstrated by examples: we compare VIKF with the standard KF and VBAKF for high-dimensional discrete state spaces and for tracking in WSN with non-Gaussian noise.

2. The Linear State Space Model

Linear state space models are a widely used class of models in control, economics, and other fields. The most general state-space representation of a discrete-time linear system with K samples, n state variables, and m outputs is written in the following form:

\begin{matrix} X_{k} = M_{k} X_{k - 1} + ε_{k}^{x}, \end{matrix}

(1)

\begin{matrix} Y_{k} = N_{k} X_{k} + ε_{k}^{y}, \end{matrix}

(2)

where $X_{k}$ is an $n \times 1$ state vector of the system at time k, $k = 1, 2, \dots, K$ , and $M_{k}$ is the state transition matrix at time k. $ε_{k}^{x}$ is the corresponding prediction error vector of $X_{k}$ , with $ε_{k}^{x}$ ~ $N (0, Q_{k})$ . $ε_{k}^{x}$ is assumed to be independent and identically distributed (i.i.d.), so $Q_{k}$ is a diagonal matrix. $Y_{k}$ is the observation vector, and we assume that there are m observations, so $N_{k}$ is an $m \times n$ linear observation operator matrix. $ε_{k}^{y}$ is an $m \times 1$ observation error vector and is also assumed to be i.i.d.; therefore, $ε_{k}^{y}$ ~ $N (0, Γ_{k})$ with $Γ_{k} = diag (σ_{k, 1}^{2}, \dots, σ_{k, m}^{2})$ . Time is indexed by k.

In general, we assume that the initial state has a Gaussian prior distribution $X_{0}$ ~ $N (μ_{0}, P_{0})$ , and $M_{k}$ , $Q_{k}$ , $N_{k}$ , $μ_{0}$ , and $P_{0}$ are all known. In tracking problems, $X_{k}$ contains the position and velocity of the moving target, $X_{k} = {[p_{x}, p_{y}, s_{x}, s_{y}]}^{T}$ , where $(p_{x}, p_{y})$ is the two-dimensional position and $(s_{x}, s_{y})$ is the two-dimensional velocity.
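The model (1) and (2) can be simulated directly. The following is a minimal sketch, not the simulation code used in this paper: it assumes time-invariant matrices, Gaussian noise, and a constant-velocity transition matrix with sampling interval `dt` for the tracking state $[p_{x}, p_{y}, s_{x}, s_{y}]^{T}$. The function name `simulate_linear_ssm`, the choice of observing only the positions, and all numerical values are illustrative assumptions.

```python
import numpy as np

def simulate_linear_ssm(M, N, Q, Gamma, mu0, P0, K, rng):
    """Draw one trajectory from equations (1) and (2) with Gaussian noise."""
    n, m = M.shape[0], N.shape[0]
    X = np.zeros((K, n))
    Y = np.zeros((K, m))
    x = rng.multivariate_normal(mu0, P0)  # X_0 ~ N(mu_0, P_0)
    for k in range(K):
        x = M @ x + rng.multivariate_normal(np.zeros(n), Q)      # equation (1)
        y = N @ x + rng.multivariate_normal(np.zeros(m), Gamma)  # equation (2)
        X[k], Y[k] = x, y
    return X, Y

dt = 1.0  # assumed sampling interval
# Assumed constant-velocity model for the state [p_x, p_y, s_x, s_y]^T.
M = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
# Assumed observation operator: only the two positions are measured.
N = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
Q = 0.01 * np.eye(4)          # diagonal process noise covariance (illustrative)
Gamma = np.diag([0.5, 0.5])   # diagonal observation noise covariance (illustrative)

rng = np.random.default_rng(0)
X, Y = simulate_linear_ssm(M, N, Q, Gamma, np.zeros(4), np.eye(4), K=50, rng=rng)
```

Each row of `X` is one state $X_{k}$ and each row of `Y` is the matching observation $Y_{k}$.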

3. Variational Inference for the Linear Space Model

Variational inference techniques have been extensively studied to solve problems in various fields [20–23]. There are two steps in the optimization for the standard Kalman filter [24]. The first step is to predict the state $X_{k}$ and $Γ_{k}$ from the last state $X_{k - 1}$ and the observations $Y_{1 : k - 1}$ . The second step is to update $X_{k}$ and $Γ_{k}$ using information from new observations $Y_{k}$ .

3.1. Predict Step

This step calculates $X_{k}$ and $Γ_{k}$ from $Y_{1 : k - 1}$ . Here we assume that $p (X_{k}, Γ_{k} ∣ Y_{1 : k - 1})$ can be decomposed as

\begin{matrix} p (X_{k}, Γ_{k} ∣ Y_{1 : k - 1}) = p (X_{k} ∣ Y_{1 : k - 1}) p (Γ_{k} ∣ Y_{1 : k - 1}) . \end{matrix}

(3)

According to (1), we know that $p (X_{k} ∣ Y_{1 : k - 1})$ is a Gaussian distribution:

\begin{matrix} p (X_{k} ∣ Y_{1 : k - 1}) ~ N (μ_{k}^{'}, P_{k}^{'}) \end{matrix}

(4)

with

\begin{matrix} μ_{k}^{'} = M_{k} μ_{k - 1}, \\ P_{k}^{'} = M_{k} P_{k - 1} M_{k}^{T} + Q_{k}, \end{matrix}

(5)

where $P_{k - 1}$ is the covariance of the distribution of the last state $X_{k - 1}$ .

$p (Γ_{k} ∣ Y_{1 : k - 1})$ is the distribution of the observation error variances given the past observations. We assume that $p (Γ_{k} ∣ Y_{1 : k - 1})$ equals $p (Γ_{k - 1} ∣ Y_{1 : k - 1})$ temporarily in the predict step; it is corrected in the update step. We also assume that this distribution is an inverse-Gamma distribution. The inverse-Gamma distribution is often used in Bayesian inference, where it arises as the marginal posterior distribution for the unknown variance of a normal distribution.

$p (Γ_{k} ∣ Y_{1 : k - 1})$ can be decomposed as

\begin{matrix} p (Γ_{k} ∣ Y_{1 : k - 1}) = \prod_{i = 1}^{m} IG (α_{k, i}^{'}, β_{k, i}^{'}), \end{matrix}

(6)

where $α_{k, i}^{'}$ and $β_{k, i}^{'}$ are the shape and scale parameters of the inverse-Gamma distribution $IG (\cdot)$ , and we set

\begin{matrix} α_{k, i}^{'} = α_{k - 1, i}, \\ β_{k, i}^{'} = β_{k - 1, i} . \end{matrix}

(7)
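The predict step of (5) and (7) amounts to a few matrix operations. The sketch below is one possible way to write it, assuming NumPy arrays; the function name `vikf_predict` and the example values are hypothetical.

```python
import numpy as np

def vikf_predict(mu_prev, P_prev, alpha_prev, beta_prev, M, Q):
    """Predict step: propagate the state moments and carry over the IG parameters."""
    mu_pred = M @ mu_prev              # equation (5), mean
    P_pred = M @ P_prev @ M.T + Q      # equation (5), covariance
    alpha_pred = alpha_prev.copy()     # equation (7)
    beta_pred = beta_prev.copy()       # equation (7)
    return mu_pred, P_pred, alpha_pred, beta_pred

# Illustrative two-state example (position and velocity, unit time step).
M = np.array([[1.0, 1.0],
              [0.0, 1.0]])
Q = 0.1 * np.eye(2)
mu_pred, P_pred, alpha_pred, beta_pred = vikf_predict(
    np.array([0.0, 1.0]), np.eye(2), np.array([1.0]), np.array([1.0]), M, Q)
```

In this example the predicted mean is $[1, 1]^{T}$, and the predicted covariance is $M M^{T} + Q$ because the previous covariance is the identity.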

3.2. Update Step

This step updates $X_{k}$ and $Γ_{k}$ based on the new observation $Y_{k}$ . According to Bayes' rule,

\begin{matrix} p (X_{k}, Γ_{k} ∣ Y_{1 : k}) \propto p (Y_{k}, X_{k}, Γ_{k} ∣ Y_{1 : k - 1}) . \end{matrix}

(8)

We assume that $p (X_{k}, Γ_{k} ∣ Y_{1 : k})$ can be factored as

\begin{matrix} p (X_{k}, Γ_{k} ∣ Y_{1 : k}) = q (X_{k}) q (Γ_{k}) . \end{matrix}

(9)

The variational method approximates each factor by minimizing the Kullback-Leibler (KL) divergence between the factored approximation and the true joint posterior. Each factor is obtained by taking the expectation of the logarithm of the joint distribution with respect to all other unknown parameters. So we have

\begin{matrix} \ln q_{X_{k}} (X_{k}) = E_{Γ_{k}} (\ln p (Y_{k}, X_{k}, Γ_{k} ∣ Y_{1 : k - 1})) + const ., \end{matrix}

(10)

\begin{matrix} \ln q_{Γ_{k}} (Γ_{k}) = E_{X_{k}} (\ln p (Y_{k}, X_{k}, Γ_{k} ∣ Y_{1 : k - 1})) + const . \end{matrix}

(11)

Using the known priors, we calculate (10):

\begin{array}{rl} \ln q_{X_{k}} (X_{k}) & = E_{Γ_{k}} (\ln p (Y_{k}, X_{k}, Γ_{k} ∣ Y_{1 : k - 1})) + const . \\ & = E_{Γ_{k}} (\ln [p (Y_{k} ∣ X_{k}, Γ_{k}) * p (X_{k} ∣ Y_{1 : k - 1}) * p (Γ_{k} ∣ Y_{1 : k - 1})]) + const . \\ & = E_{Γ_{k}} (\ln [p (Y_{k} ∣ X_{k}, Γ_{k}) * p (X_{k} ∣ Y_{1 : k - 1})]) + const ., \end{array}

(12)

where $E_{Γ_{k}} (\ln p (Γ_{k} ∣ Y_{1 : k - 1}))$ does not depend on $X_{k}$ and is absorbed into the constant. In (12), $p (Y_{k} ∣ X_{k}, Γ_{k})$ is a Gaussian distribution (see (2)), so we have

\begin{matrix} p (Y_{k} ∣ X_{k}, Γ_{k}) = N (Y_{k} ∣ N_{k} X_{k}, Γ_{k}) . \end{matrix}

(13)

From the predict step, we know that $p (X_{k} ∣ Y_{1 : k - 1})$ is also a Gaussian distribution $N (μ_{k}^{'}, P_{k}^{'})$ (see (4) and (5)). The product of two Gaussian functions is still a Gaussian, so

\begin{matrix} q_{X_{k}} (X_{k}) = N (X_{k} ∣ μ_{k}, P_{k}) . \end{matrix}

(14)

Based on (4) and (13), we get the analytical expressions of $μ_{k}$ and $P_{k}$ . Because the variational method usually requires an iterative process to achieve convergence, we express them as

\begin{matrix} μ_{k}^{(s + 1)} = μ_{k}^{(s)} + P_{k}^{(s)} N_{k}^{T} {(N_{k} {(P_{k}^{(s)})}^{- 1} N_{k}^{T} + Γ^{'})}^{- 1} (Y_{k} - N_{k} μ_{k}^{(s)}), \end{matrix}

(15)

\begin{matrix} P_{k}^{(s + 1)} = P_{k}^{(s)} - P_{k}^{(s)} N_{k}^{T} {(N_{k} {(P_{k}^{(s)})}^{- 1} N_{k}^{T} + Γ^{'})}^{- 1} N_{k} P_{k}^{(s)}, \end{matrix}

(16)

where the superscript $(s)$ denotes the $s$ th iteration and $Γ^{'} = diag (β_{k, 1} / α_{k, 1}, \dots, β_{k, m} / α_{k, m})$ ; the parameters $β_{k, i}$ and $α_{k, i}$ , $i = 1, \dots, m$ , are computed later. At the beginning of the iteration, we use the results of the predict step and set $μ_{k}^{(0)} = μ_{k}^{'}$ and $P_{k}^{(0)} = P_{k}^{'}$ .

Similar results also appeared in [16], where the algorithm was called VBAKF. In the iteration process, VBAKF computes $μ_{k}^{(s + 1)}$ and $P_{k}^{(s + 1)}$ from $μ_{k}^{(0)}$ and $P_{k}^{(0)}$ , not from $μ_{k}^{(s)}$ and $P_{k}^{(s)}$ ; therefore, VBAKF cannot form a sufficient iteration during the update process. The maximum number of effective iterations of VBAKF is small, sometimes only 2, so it may not reach the optimal solution of the variational method.
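A single pass of the state update can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: the innovation covariance is written in the standard Kalman form $N_{k} P N_{k}^{T} + Γ^{'}$, which is an assumption of this sketch, since (15) and (16) as printed place an inverse on $P_{k}^{(s)}$ inside that term. The function name `state_update` and the example values are hypothetical.

```python
import numpy as np

def state_update(mu, P, Y, N, gamma_diag):
    """One pass of the mean/covariance update for a diagonal noise covariance.

    gamma_diag holds the diagonal of Gamma' = diag(beta_i / alpha_i).
    """
    # Innovation covariance in the standard Kalman form (assumption, see lead-in).
    S = N @ P @ N.T + np.diag(gamma_diag)
    G = P @ N.T @ np.linalg.inv(S)     # gain
    mu_new = mu + G @ (Y - N @ mu)     # mean update, cf. (15)
    P_new = P - G @ N @ P              # covariance update, cf. (16)
    return mu_new, P_new

# Scalar illustration: prior N(0, 1), unit noise variance, observation 2.
mu_new, P_new = state_update(np.array([0.0]), np.array([[1.0]]),
                             np.array([2.0]), np.array([[1.0]]),
                             np.array([1.0]))
```

Passing the outputs `mu_new` and `P_new` back in as `mu` and `P` gives the VIKF-style iteration described above, whereas always passing the predicted values $μ_{k}^{(0)}$ and $P_{k}^{(0)}$ corresponds to the behaviour attributed to VBAKF.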

Similarly, we calculate $\ln q_{Γ_{k}} (Γ_{k}) = E_{X_{k}} (\ln p (Y_{k}, X_{k}, Γ_{k} ∣ Y_{1 : k - 1})) + const .$ as

\begin{array}{rl} \ln q_{Γ_{k}} (Γ_{k}) & = E_{X_{k}} (\ln p (Y_{k}, X_{k}, Γ_{k} ∣ Y_{1 : k - 1})) + const . \\ & = E_{X_{k}} (\ln [p (Y_{k} ∣ X_{k}, Γ_{k}) * p (X_{k} ∣ Y_{1 : k - 1}) * p (Γ_{k} ∣ Y_{1 : k - 1})]) + const . \\ & = E_{X_{k}} (\ln [p (Y_{k} ∣ X_{k}, Γ_{k}) * p (Γ_{k} ∣ Y_{1 : k - 1})]) + const . \end{array}

(17)

Based on the known conditions, $q_{Γ_{k}} (Γ_{k})$ is still a product of inverse-Gamma distributions and can be expressed as

\begin{matrix} q_{Γ_{k}} (Γ_{k}) = \prod_{i = 1}^{m} IG (σ_{k, i}^{2} ∣ α_{k, i}, β_{k, i}) . \end{matrix}

(18)

In the iterations, parameters are calculated as follows:

\begin{matrix} α_{k, i}^{(s + 1)} = \frac{1}{2} + α_{k, i}^{(s)}, \end{matrix}

(19)

\begin{matrix} β_{k, i}^{(s + 1)} = β_{k, i}^{(s)} + \frac{1}{2} [{(Y_{k} - N_{k} μ_{k}^{(s + 1)})}_{i}^{2} + {(N_{k} P_{k}^{(s + 1)} N_{k}^{T})}_{i i}] . \end{matrix}

(20)

The initial values are obtained from the predict step: $α_{k, i}^{(0)} = α_{k, i}^{'}$ and $β_{k, i}^{(0)} = β_{k, i}^{'}$ .

We can see that (15), (16), (19), and (20) form an iterative process.
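The parameter updates (19) and (20) act elementwise on the m observation components. A minimal sketch, assuming NumPy arrays of length m for the shape and scale parameters; the function name `noise_update` and the example values are hypothetical.

```python
import numpy as np

def noise_update(alpha, beta, mu_new, P_new, Y, N):
    """Update the inverse-Gamma parameters of the observation noise variances."""
    resid = Y - N @ mu_new                      # Y_k - N_k mu_k^(s+1)
    alpha_new = alpha + 0.5                     # equation (19)
    # Equation (20): squared residual plus the diagonal of N P N^T.
    beta_new = beta + 0.5 * (resid ** 2 + np.diag(N @ P_new @ N.T))
    return alpha_new, beta_new

# Scalar illustration with residual 1 and projected variance 0.5.
alpha_new, beta_new = noise_update(np.array([1.0]), np.array([1.0]),
                                   np.array([1.0]), np.array([[0.5]]),
                                   np.array([2.0]), np.array([[1.0]]))
gamma_diag = beta_new / alpha_new   # diagonal of Gamma' for the next pass
```

The ratio `beta_new / alpha_new` is the diagonal of $Γ^{'}$ that enters (15) and (16) in the next pass.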

The variational method is an approximation and does not guarantee the best estimate. Through simulations, we find that the algorithm performs better if we set $α_{k, i}^{(s + 1)} = 1 / 2 + α_{k, i}^{(0)}$ . Therefore, $α_{k, i}$ does not participate in the iterative process in our algorithm.

The iterative process should be executed a sufficient number of times to reach a steady state. A simple stop condition is to test the difference of the inferred parameters between two successive iterations, that is, $S = {∥ μ_{k}^{(s + 1)} - μ_{k}^{(s)} ∥}^{2}$ . If the difference is smaller than a threshold ξ, the iteration is considered to have reached a stable state. The choice of ξ influences the accuracy of the algorithm: the smaller ξ is, the higher the accuracy, at the cost of more iterations and more computational time.
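The stop condition can be written as a generic loop around any update map. The sketch below is illustrative only: `iterate_until_stable`, the iteration cap `max_iter`, and the toy map $μ \mapsto 0.5 μ + 1$ (whose fixed point is 2) are assumptions introduced here and stand in for the actual updates (15), (16), and (20).

```python
import numpy as np

def iterate_until_stable(step, mu0, xi=1e-12, max_iter=1000):
    """Apply `step` repeatedly until S = ||mu_next - mu||^2 drops below xi.

    max_iter is a safety cap (an assumption, not part of the stop rule above).
    Returns the final iterate and the number of iterations performed.
    """
    mu = np.asarray(mu0, dtype=float)
    for s in range(max_iter):
        mu_next = step(mu)
        S = float(np.sum((mu_next - mu) ** 2))  # squared change between iterations
        mu = mu_next
        if S < xi:
            return mu, s + 1
    return mu, max_iter

# Toy contraction with fixed point 2, used only to exercise the stop rule.
mu_final, n_iter = iterate_until_stable(lambda mu: 0.5 * mu + 1.0, [0.0])
```

A smaller `xi` moves `mu_final` closer to the fixed point and increases `n_iter`, which mirrors the trade-off between accuracy and computation described above.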

3.3. The Algorithm of VIKF

VIKF includes two steps; the first step is to predict the present state with the initial information or the results of previous states, and the second step is the update step, which includes an iterative process to update the state with new observations.

The process of the algorithm is as shown in Algorithm 1.

Algorithm 1

Input the observations $Y_{1 : K}$ ; set the initial hyperparameters $μ_{0}, P_{0}$ , $α_{0, i}, β_{0, i}, i = 1, \dots, m$ .

Set ξ to a small value, which is used to judge whether the loop of the update step has come to an end.

For $k = 1, \dots, K$

Predict:

Calculate $μ_{k}^{'}, P_{k}^{'}$ by (5);

Calculate $α_{k, i}^{'}, β_{k, i}^{'}$ by (7);

Update:

Set $μ_{k}^{(0)} = μ_{k}^{'}$ , $P_{k}^{(0)} = P_{k}^{'}$ , $α_{k, i} = 1 / 2 + α_{k, i}^{'}$ , $β_{k, i} = β_{k, i}^{'}$ , and $Γ^{'} = diag (β_{k, 1} / α_{k, 1}, \dots, β_{k, m} / α_{k, m})$ ;

Set Flag = 1;

While Flag

Update $μ_{k}$ by (15); update $P_{k}$ by (16);

Update $β_{k, i}$ by (20);

Calculate $Γ^{'} = diag (β_{k, 1} / α_{k, 1}, \dots, β_{k, m} / α_{k, m})$ and $S = \sum_{i} {(μ_{k, i}^{(s + 1)} - μ_{k, i}^{(s)})}^{2}$ ;

If $S < ξ$ , then set Flag = 0;

End While

Output $μ_{k}$ from the last iteration as the inferred value of $X_{k}$ .

End For
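One time step of Algorithm 1 can be sketched in code as follows. This is an illustration under stated assumptions, not the implementation used for the simulations: the innovation covariance is written in the standard Kalman form $N_{k} P N_{k}^{T} + Γ^{'}$ (an assumption, since (15) and (16) are printed with an inverse on $P_{k}^{(s)}$ in that term), the cap `max_iter` is added as a safeguard, and the function name `vikf_step` and all example values are hypothetical.

```python
import numpy as np

def vikf_step(mu_prev, P_prev, alpha_prev, beta_prev, Y, M, N, Q,
              xi=1e-6, max_iter=100):
    """One predict/update cycle following the structure of Algorithm 1."""
    # Predict: equations (5) and (7).
    mu = M @ mu_prev
    P = M @ P_prev @ M.T + Q
    alpha = alpha_prev + 0.5     # alpha is set once and kept out of the loop
    beta = beta_prev.copy()

    # Update: iterate the state and beta updates until mu stabilizes.
    for _ in range(max_iter):    # max_iter is an assumed safety cap
        gamma = beta / alpha                         # diagonal of Gamma'
        S_cov = N @ P @ N.T + np.diag(gamma)         # assumed standard form
        G = P @ N.T @ np.linalg.inv(S_cov)
        mu_new = mu + G @ (Y - N @ mu)               # cf. (15)
        P = P - G @ N @ P                            # cf. (16)
        resid = Y - N @ mu_new
        beta = beta + 0.5 * (resid ** 2 + np.diag(N @ P @ N.T))  # (20)
        S = np.sum((mu_new - mu) ** 2)               # stop statistic
        mu = mu_new
        if S < xi:
            break
    return mu, P, alpha, beta

# Scalar illustration: prior N(0, 1), static state, observation 2.
M = np.array([[1.0]])
N = np.array([[1.0]])
Q = np.array([[0.0]])
mu1, P1, alpha1, beta1 = vikf_step(np.array([0.0]), np.array([[1.0]]),
                                   np.array([1.0]), np.array([1.0]),
                                   np.array([2.0]), M, N, Q)
```

To filter a whole sequence, the returned `mu`, `P`, `alpha`, and `beta` would be fed back in as the previous-step arguments for the next observation.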

4. Simulations

The performance of the proposed algorithm (VIKF) was simulated and compared with that of the standard Kalman filter (KF) and VBAKF. We first applied the three algorithms to a high-dimensional discrete state space with Gaussian or Student's t-distributed noise. The residual mean square error (RMSE) between the true state and the inferred state was calculated in all simulations and used as the main evaluation criterion. We also applied the three algorithms to tracking problems in wireless sensor networks. In these simulations, time-varying parameters are considered for the Student's t-distributed noise.

4.1. High-Dimensional Discrete State Space Simulation

We considered a 40-dimensional discrete linear state space; the dimension of the observation vector was 80, and there were 30 samples. The state transition matrix and the observation operator matrix were both sparse matrices. In each simulation case, 20 replicates were conducted and the results were averaged to obtain the RMSE.

In the case of Gaussian noise, we assumed that the process noise and the observation noise both followed $N (0,1)$ . In the case of Student's t-distributed noise, the degrees of freedom were 2 for the process noise, while for the observation noise they were selected randomly from the set $\{1, 2, 3, 4\}$ . The noise intensities were 1 in all simulation cases.
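The noise settings above can be reproduced along the following lines. This is a hedged sketch: the helper names `student_t_noise` and `rmse_per_step` are hypothetical, the degrees of freedom of the observation noise are drawn once per run (how often they are redrawn is an assumption), and the RMSE is computed as the square root of the mean squared error over the state components, which is also an assumption about the exact definition.

```python
import numpy as np

def student_t_noise(rng, df, size, scale=1.0):
    """Student's t noise with `df` degrees of freedom, scaled by `scale`."""
    return scale * rng.standard_t(df, size=size)

def rmse_per_step(X_true, X_est):
    """RMSE over the state components at each time step (assumed definition)."""
    return np.sqrt(np.mean((X_true - X_est) ** 2, axis=1))

rng = np.random.default_rng(0)
K, n, m = 30, 40, 80                       # samples, state and observation dimensions
process_noise = student_t_noise(rng, df=2, size=(K, n))
df_obs = rng.choice([1, 2, 3, 4])          # assumed: drawn once per run
obs_noise = student_t_noise(rng, df=df_obs, size=(K, m))

# Sanity check of the error measure on a trivial pair of trajectories.
err = rmse_per_step(np.zeros((3, 2)), np.ones((3, 2)))
```

With few degrees of freedom the t-distribution has heavy tails, so occasional very large noise samples are expected.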

The RMSE performances of VIKF, VBAKF, and KF are shown in Figures 1 and 2 (for Student's t-distribution) and Figure 3 (for Gaussian distribution). Figure 1(a) shows the RMSE of each state, obtained by averaging the results of the 20 replicates. Figure 1(b) shows the accumulated RMSE of all steps for each replicate. A typical simulation result is shown in Figure 2. In these figures, we restrict the display range of the Y-axis to show the results clearly, so some points with high RMSE cannot be seen. The performance of VIKF is clearly better than that of VBAKF and KF in the case of Student's t-distributed noise. KF and VBAKF both have a higher probability of producing extreme RMSE values in certain circumstances, while VIKF has a relatively lower probability. VIKF reduces the probability of a large RMSE because it can perform sufficient iterations.

Figure 1

Performances of VIKF, VBAKF, and KF for the case of Student's t-distribution. (a) The RMSE of each state obtained by averaging the results of 20 replicates. (b) The accumulated RMSE of all 30 steps for each replicate.

Figure 2

A typical simulation result with Student's t-distribution in high-dimensional linear state space.

Figure 3

Performances of VIKF, VBAKF, and KF for the case of Gaussian distribution. (a) The RMSE of each state obtained by averaging the results of 20 replicates. (b) The accumulated RMSE of all 30 steps for each replicate.

In the case of Gaussian noise, the performances of these three algorithms are close, as shown in Figure 3.

4.2. Tracking in WSN

The state space for the tracking problem in WSN was set as in Section 2 and included two-dimensional coordinates and two-dimensional velocities. As in the simulations of Section 4.1, we considered two kinds of noise: Gaussian and Student's t-distributed. We assumed 15 sensor nodes and 50 sampling points. The noise was set as in Section 4.1.

As in Figures 1 and 3, Figure 4 shows the RMSE performances with Student's t-distributed noise, where Figure 4(a) shows the RMSE of each state obtained by averaging the results of 20 replicates and Figure 4(b) shows the accumulated RMSE of all steps for each replicate. Figure 5 shows the corresponding results with Gaussian noise. Figure 6 shows the tracking performance of the three algorithms with Student's t-distributed noise.

Figure 4

Figure 5

Figure 6

Tracking performances of VIKF, VBAKF, and KF with Student's t-distribution noise.

With Student's t-distributed noise, the tracking performance of VIKF is better than that of VBAKF and KF. The probability of a high RMSE is also lower for VIKF than for KF and VBAKF. In the Gaussian case, the performance of KF appears to be better than that of VBAKF and VIKF.

5. Conclusions

This paper used the variational approximation method to address problems arising in high-dimensional discrete state spaces and tracking problems in WSN with a linear state model. We proposed a modified variational filtering algorithm, in which an iterative process is formed to obtain the optimized solutions.

The performance of the variational methods is slightly lower than that of KF in a Gaussian environment. In non-Gaussian and time-varying noise environments, however, the average performances of VIKF and VBAKF are better than that of KF. Since VIKF can carry out more effective iterations than VBAKF in the update step, the performance of VIKF is better than that of VBAKF. VIKF also has a lower probability of high RMSE than KF and VBAKF in non-Gaussian environments.

Acknowledgments

This work was jointly supported by The National Natural Science Foundation of China (No. 61271207, No. 61174013), the Natural Science Foundation of Jiangsu Province, China (No. BK2011398), the Jiangsu Overseas Research & Training Program for University Prominent Young & Middle-aged Teachers and Presidents, and the Priority Academic Program Development of Jiangsu Higher Education Institutions.

References

1. S. L. Lauritzen, "Time series analysis in 1880: a discussion of contributions made by TN Thiele," International Statistical Review, vol. 49, pp. 319–331, 1981.

2. Bar-Shalom, X. R. Li, and Kirubarajan, Estimation with Applications to Tracking and Navigation: Theory Algorithms and Software, John Wiley & Sons, New York, NY, USA, 2004.

3. Einicke, Ralston, Hargrave, D. C. Reid, and Hainsworth, "Longwall mining automation: an application of minimum-variance smoothing [Applications of Control]," IEEE Control Systems, vol. 28, no. 6, pp. 28–37, 2008. doi:10.1109/MCS.2008.929281

4. R. S. Bucy and P. D. Joseph, Filtering for Stochastic Processes with Applications to Guidance, vol. 326, The American Mathematical Society, 1987.

5. P. D. Groves, Principles of GNSS, Inertial, and Multisensor Integrated Navigation Systems, Artech House, 2008.

6. Paliwal and Basu, "A speech enhancement method based on Kalman filtering," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '87), pp. 177–180, 1987.

7. A. C. Harvey, "Application of the Kalman filter in econometrics," in Advances in Econometrics: Fifth World Congress, Cambridge University Press, Cambridge, Mass, USA, 1987.

8. E. A. Wan and van der Merwe, "The unscented Kalman filter," in Kalman Filtering and Neural Networks, pp. 221–280, 2001.

9. M. R. Rajamani, Data-Based Techniques to Improve State Estimation in Model Predictive Control, ProQuest, 2007.

10. M. R. Rajamani and J. B. Rawlings, "Estimation of the disturbance structure from data using semidefinite programming and optimal weighting," Automatica, vol. 45, no. 1, pp. 142–148, 2009. doi:10.1016/j.automatica.2008.05.032

11. G. J. Bierman, Factorization Methods for Discrete Sequential Estimation, Courier Dover, Mineola, NY, USA, 2006.

12. G. A. Einicke, "Optimal and robust noncausal filter formulations," IEEE Transactions on Signal Processing, vol. 54, no. 3, pp. 1069–1077, 2006. doi:10.1109/TSP.2005.863042

13. G. A. Einicke, "Asymptotic optimality of the minimum-variance fixed-interval smoother," IEEE Transactions on Signal Processing, vol. 55, no. 4, pp. 1543–1547, 2007. doi:10.1109/TSP.2006.889402

14. Auvinen, Bardsley, Haario, and Kauranne, "The variational Kalman filter and an efficient implementation using limited memory BFGS," International Journal for Numerical Methods in Fluids, vol. 64, no. 3, pp. 314–335, 2010. doi:10.1002/fld.2153

15. J.-F. Cardoso, "Infomax and maximum likelihood for blind source separation," IEEE Signal Processing Letters, vol. 4, no. 4, pp. 112–114, 1997. doi:10.1109/97.566704

16. Sarkka and Nummenmaa, "Recursive noise adaptive Kalman filtering by variational Bayesian approximations," IEEE Transactions on Automatic Control, vol. 54, no. 3, pp. 596–600, 2009. doi:10.1109/TAC.2008.2008348

17. Qin, "A dynamic neural network approach for solving nonlinear inequalities defined on a graph and its application to distributed, routing-free, range-free localization of WSNs," Neurocomputing, vol. 117, pp. 72–80, 2013.

18. Liu, Chen, and Lou, "Neural network based mobile phone localization using Bluetooth connectivity," Neural Computing and Applications, vol. 23, no. 3-4, pp. 667–675, 2013. doi:10.1007/s00521-012-0950-1

19. Kusý, Amundson, Sallai, Völgyesi, Lédeczi, and Koutsoukos, "RF doppler shift-based mobile sensor tracking and navigation," ACM Transactions on Sensor Networks, vol. 7, no. 1, article 1, 2010. doi:10.1145/1806895.1806896

20. Ghahramani and M. J. Beal, "Variational inference for Bayesian mixtures of factor analysers," in Advances in Neural Information Processing Systems, vol. 12, pp. 449–455, MIT Press, 2000.

21. C. W. Fox and S. J. Roberts, "A tutorial on variational Bayesian inference," Artificial Intelligence Review, vol. 38, no. 2, pp. 85–95, 2012.

22. M. W. Seeger, "Sparse linear models: variational approximate inference and Bayesian experimental design," Journal of Physics: Conference Series, vol. 197, no. 1, 2009. doi:10.1088/1742-6596/197/1/012001

23. M. J. Beal, Variational algorithms for approximate Bayesian inference [Ph.D. thesis], University of London, 2003.

24. A. H. Jazwinski, Stochastic Processes and Filtering Theory, Dover, Mineola, NY, USA, 2007.