Sage Journals: Discover world-class research

Abstract

Most methods for reconstructing nonrigid motion are shape-based; however, trajectory-based methods that use the discrete cosine transform (DCT) are also prominent. Because the DCT extracts the correlation of the data well, it achieves good data compression. However, the number and sequence of DCT trajectory basis must be selected appropriately. A high value of K (the number of trajectory basis) increases the reconstruction error and the computational cost, whereas a low value of K excludes information components, so the structure of the object cannot be fully represented and accuracy is poor. Once the number of trajectory basis is determined, their combination also has a considerable influence on the reconstruction algorithm. This article selects an appropriate number and combination of trajectory basis by analyzing the spectrum of re-projection errors and thereby realizes the automatic selection of trajectory basis. Then, combined with the probabilistic framework of a matrix normal distribution over a low-order model, the energy information of the high-frequency part is retained, which not only helps maintain accuracy but also improves reconstruction efficiency. The proposed method can be used to reconstruct the three-dimensional structure of sparse data under more precise prior conditions and at lower computational cost.

Keywords

Probabilistic trajectory space, nonrigid motion, automatic selection trajectory basis, matrix normal distribution, low-order model

Introduction

Three-dimensional (3-D) reconstruction encompasses many fields such as image processing, stereoscopic vision, and biological engineering, and has attracted considerable research interest in computer vision. The 3-D motion reconstruction of a nonrigid body is an important technique for virtual representation of the objective world. Generally, 3-D motion reconstruction involves recovering camera rotation matrix R and 3-D structure S of a nonrigid body from a given set of 2-D dynamic image sequences.

There are currently four mainstream reconstruction schemes: shape-based 3-D reconstruction,¹ force-based 3-D reconstruction,² shape-trajectory-based 3-D reconstruction,³ and trajectory-based 3-D reconstruction.⁴ Shape-based 3-D reconstruction is simple and convenient, but a shape basis must be reconstructed for every sequence, which introduces a large number of unknowns, makes the algorithm complex, and limits its scope. Force-based 3-D reconstruction formulates the problem in the low-rank force space of the deformation, which better explains the acquired prior information and represents the behavior of the actual object more accurately; however, during reconstruction, in addition to determining the forces, the rotation, and the structure S, the elastic model of the object must also be estimated, which increases the uncertainty of the reconstruction result. Shape-trajectory-based 3-D reconstruction combines the advantages of the trajectory basis and the shape basis, but at the cost of additional prior unknowns. Although the trajectory-based method overcomes the limitations of the above three methods, the number and type of its predefined trajectory basis are difficult to select, and both directly affect the reconstruction accuracy. In particular, the main difficulty in solving the nonrigid structure-from-motion (NRSFM) problem is that many different 3-D shapes can produce similar observed images, so the re-projection constraints alone are not sufficient to obtain a unique solution for the shape. Therefore, more prior knowledge of the deformation of the structure and the motion of the camera is needed.
The method based on automatic selection basis-probability model proposed in this article can effectively solve the problems caused by the number of trajectory basis, combination of trajectory basis, and complex prior knowledge.

Related work

Most existing methods adopt the matrix decomposition algorithm to decompose a rigid reconstruction^5–9 and use prior information in the form of low-rank shape basis.^10,11 Similarly, a low-rank model is proposed to constrain the motion of every point on the object through a predefined trajectory basis.³ The disadvantage of these methods is that they must factorize a matrix whose size is proportional to the number of input points, so they can only be applied to relatively low-resolution shapes.

In NRSFM, recovering the camera motion and the time-varying 3-D shape from 2-D point tracks alone is an under-constrained problem, because different 3-D objects can produce similar 2-D images when observed by the camera. Several algorithms have been proposed that exploit rigid constraints when solving the 3-D motion reconstruction of a nonrigid body by factorization. Costeira and Kanade¹² constructed an orthogonal projection model and then applied the factorization method to reconstruct the structure and motion of independently moving objects. However, its application scope was limited, and it could not handle variable linear combinations. Bascle and Blake¹³ proposed decomposing the shape of a reconstruction target into a linear combination of basic shapes, which simplified the reconstruction problem into solving for the basic shape coefficients; this was also the prototype of the shape-basis model for the 3-D motion reconstruction of a nonrigid body. Torresani et al.¹⁴ adopted low-rank constraints to track a nonrigid body, that is, a temporal constraint method that models the shape basis coefficients as a linear dynamic system¹⁵ and establishes a nonrigid deformation distribution model with hierarchical factors.¹⁰ Rabaud and Belongie¹⁶ removed the linear basis representation and proposed a method for learning the shape structure from a video. The advantage of this method is that it incorporates temporal constraints to prevent the camera and structure from changing excessively and ambiguously between frames. Agudo and Moreno-Noguer¹⁷ introduced a force model into 3-D nonrigid reconstruction, which has the advantage of formulating the problem in the low-rank force space of the deformation and giving a better physical interpretation of the obtained prior information. However, if force and real data are lacking, the reconstruction becomes difficult. Recently, NRSFM research has introduced the concept of compressibility to enforce a union of subspaces, in which each shape instance¹⁸ uses a different set of shape basis. The above models turn NRSFM into a trilinear problem, which can be solved using factorization techniques¹¹ or optimization strategies that enforce spatially smooth,¹⁰ temporally smooth,^19,20 or compact 3-D shapes.²¹ The advantage of this approach is that it can reduce excessive and ambiguous changes of the camera and structure between frames.

Akhter et al.²² proposed that the trajectory of each point can be restricted to a low-dimensional subspace. The advantage of this trajectory-based method is that a predefined, object-independent basis, such as the discrete cosine transform (DCT) basis, can be used to reduce the number of unknown parameters and improve the accuracy of the 3-D structure. Gotardo and Martinez²³ proposed shape basis coefficients based on the DCT by combining the shape basis and the trajectory basis. Zhu et al.²⁴ pointed out that sequences with poor reconstruction ability could be remedied by adding rigid key frames. They also emphasized the necessity of selecting the number of trajectory basis K, instead of using all the DCT basis or applying normalization rules to coefficient vectors to obtain a sparse solution set. However, Zhu et al.²⁴ could not exploit the known distribution of DCT coefficients in natural signals.²⁵ In addition, many methods use a predefined basis to restrict the trajectory of each target point, thus transforming the trilinear problem into a bilinear problem,²² which greatly simplifies the computation.²⁶ In the study of Valmadre and Lucey,²⁷ a prior on the trajectory is introduced by using the differential of 3-D points. Combining the shape basis with the trajectory basis has the advantage that it can generate a temporally smooth trajectory of a nonrigid shape in the linear shape space.³

In this study, we first select the trajectory basis automatically, which avoids both the overly large and the overly small K values used in previous methods and also selects a reasonable combination of trajectory basis, so that the efficiency and accuracy of the reconstructed structure are maximized. Second, a matrix normal distribution is used to model the known trajectory space,^10,16 which combines spatial smoothness with the inherent temporal smoothness of the subspace. Building on the probabilistic model proposed by Agudo and Moreno-Noguer,²⁸ this study adds more accurate prior information and provides a more accurate decomposition. The experiments show the accuracy, versatility, and efficiency of our approach on sparse data sets.

Low-rank model NRSFM

The standard matrix decomposition method is generally used for the NRSFM problem. Suppose the motion model being studied consists of F frame images, P feature points are marked on each frame, and the 2-D position of each feature point of the image is marked as $\{(u_{j}^{i}, v_{j}^{i}) \mid 1 \leq i \leq F; 1 \leq j \leq P\}$ , where $u_{j}^{i}$ is the 2-D horizontal coordinate, $v_{j}^{i}$ is the 2-D vertical coordinate, and $ο_{j}^{i} = {[u_{j}^{i}, v_{j}^{i}]}^{T}$ is the 2-D image projection. All frames and feature points are represented together and arranged into a $2 F \times P$ observation matrix W as follows

\underbrace{[\begin{matrix} κ_{1}^{1} & \dots & κ_{P}^{1} \\ \vdots & \ddots & \vdots \\ κ_{1}^{F} & \dots & κ_{P}^{F} \end{matrix}]}_{W} = \underbrace{[\begin{matrix} R^{1} & & \\ & \ddots & \\ & & R^{F} \end{matrix}]}_{R} \underbrace{[\begin{matrix} x_{1}^{1} & \dots & x_{P}^{1} \\ \vdots & \ddots & \vdots \\ x_{1}^{F} & \dots & x_{P}^{F} \end{matrix}]}_{S}

According to Agudo and Moreno-Noguer,²⁸ $κ_{j}^{i}$ is the zero-mean coordinate, which is obtained by subtracting the average translation vector from the original coordinate, that is, $κ_{j}^{i} = ο_{j}^{i} - η$ , $η = \sum_{j} ο_{j}^{i} / P$ . Unlike other sparse techniques,²² the method proposed in this article can deal with track points lost owing to occlusion or outliers. We initially set the low-rank track prediction model with the 2-D value of $κ_{j}^{i}$ . These initial predictions are then used to calculate R and Φ, where Φ is a matrix of unknown coefficient vectors $ϕ_{j} \in ℝ^{3 K \times 1}$ , one for each of the points $j = \{1, \dots, P\}$ . After estimating R and Φ, $κ_{j}^{i}$ can be further refined as follows

B = [\begin{matrix} I_{3} \otimes {(b^{1})}^{T} \\ \vdots \\ I_{3} \otimes {(b^{F})}^{T} \end{matrix}]

κ_{j}^{i} = R^{i} (I_{3} \otimes {(b^{i})}^{T}) ϕ_{j} + η

where ⊗ represents the Kronecker product. Matrix B is the trajectory basis matrix obtained after automatically selecting the basis, and matrix R is block diagonal, consisting of the rotations $R^{i} \in ℝ^{2 \times 3}$ of an orthographic camera, one per frame. In short, the NRSFM problem can be expressed as follows: given a 2-D trajectory matrix W, recover the pose parameters R and the 3-D shape S simultaneously. The observation matrix W can be decomposed as $W = R S$ , with $R \in ℝ^{2 F \times 3 F}$ and $S \in ℝ^{3 F \times P}$ . We then decompose it further into
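The construction of the trajectory basis matrix B can be sketched numerically. The snippet below is a minimal illustration, assuming an orthonormal DCT-II basis and 0-based frequency indices; the function names `dct_basis` and `trajectory_basis` are hypothetical and are not taken from the article.

```python
import numpy as np

def dct_basis(F):
    """Return the F x F orthonormal DCT-II matrix; column k is the k-th basis vector."""
    n = np.arange(F)[:, None]            # frame index
    k = np.arange(F)[None, :]            # frequency index
    D = np.sqrt(2.0 / F) * np.cos(np.pi * (2 * n + 1) * k / (2 * F))
    D[:, 0] /= np.sqrt(2.0)              # rescale the constant column to unit norm
    return D

def trajectory_basis(F, indices):
    """Stack the blocks I_3 (x) (b^i)^T, i = 1..F, into a 3F x 3K matrix B.

    `indices` lists the K selected DCT frequencies (assumed 0-based here).
    """
    D = dct_basis(F)[:, list(indices)]   # F x K, row i is (b^i)^T
    return np.vstack([np.kron(np.eye(3), D[i][None, :]) for i in range(F)])

F, K = 8, 3
B = trajectory_basis(F, range(K))        # sequential choice of the first K basis

# One point trajectory from a coefficient vector phi_j of length 3K.
rng = np.random.default_rng(0)
phi_j = rng.standard_normal(3 * K)
trajectory = (B @ phi_j).reshape(F, 3)   # row i holds (x, y, z) in frame i
print(B.shape, trajectory.shape)
```

Passing a non-sequential index list, for example `trajectory_basis(8, [0, 2, 5])`, gives the kind of basis combination that the automatic selection is meant to produce.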

W = R S = R Θ A = Λ A

where $Λ = R Θ$ , $Λ \in ℝ^{2 F \times 3 K}$ , $Θ \in ℝ^{3 F \times 3 K}$ , $A \in ℝ^{3 K \times P}$ , and the maximum rank of matrix W is 3K. We can use singular value decomposition (SVD) to decompose it as follows

W = \tilde{Λ} \tilde{A}

Updating the transition matrix

Because the factorization obtained from the SVD is not unique, the matrices $\tilde{Λ}$ and $\tilde{A}$ generally differ from Λ and A, respectively. For any invertible matrix $Q \in ℝ^{3 K \times 3 K}$ , $\tilde{Λ} Q$ and $Q^{- 1} \tilde{A}$ are also a valid factorization. Therefore, to restore the true structure, we seek a correction matrix Q such that

Λ = \tilde{Λ} Q, A = Q^{- 1} \tilde{A}

Instead of the entire matrix Q, we only need to estimate three columns of Q, denoted $Q_{|||}$ , to correct $\tilde{Λ}$ and $\tilde{A}$ . Thus, for F frames, there are $3 F$ constraints and $9 K$ unknown parameters in $Q_{|||}$ . The rotation matrix R can be estimated by determining $Q_{|||}$ .
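The rank-3K factorization and its ambiguity can be checked on synthetic data. The following is a small sketch, assuming random stand-ins for Λ and A rather than real point tracks; splitting the singular values evenly between the two factors is one common convention, not necessarily the one used in the article.

```python
import numpy as np

rng = np.random.default_rng(0)
F, P, K = 20, 15, 2

# Synthetic rank-3K observation matrix W = Lam @ A (random stand-ins, not real tracks).
Lam = rng.standard_normal((2 * F, 3 * K))
A = rng.standard_normal((3 * K, P))
W = Lam @ A

# Rank-3K factorization from the SVD, splitting the singular values evenly.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
r = 3 * K
Lam_tilde = U[:, :r] * np.sqrt(s[:r])
A_tilde = np.sqrt(s[:r])[:, None] * Vt[:r]

# Any invertible Q yields another factorization of the same W.
Q = rng.standard_normal((r, r)) + 3.0 * np.eye(r)
Lam_q = Lam_tilde @ Q
A_q = np.linalg.solve(Q, A_tilde)        # Q^{-1} A_tilde without forming the inverse

print(np.allclose(Lam_tilde @ A_tilde, W), np.allclose(Lam_q @ A_q, W))
```

Both products reproduce W even though the factors differ, which is why additional constraints on Q are needed to recover the rotations.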

Automatic base selection

The type, number, and combination of trajectory basis considerably influence the performance of the NRSFM algorithm, and the DCT trajectory basis is the optimal general trajectory basis. After the type of trajectory basis is determined, it is important to select the number and combination of trajectory basis.

In this article, an automatic selection algorithm for the trajectory basis is proposed.²⁹ The error between the actual 3-D structure S and the structure $S^{1}$ obtained from the first SVD is analyzed in the frequency domain. In addition, the K value is expanded and the result is compared with the 3-D shape error of the restoration based on the sequential trajectory basis. The optimal trajectory basis is then selected, which greatly reduces the 3-D reconstruction error.

Automatically select the trajectory basis algorithm

A nonrigid body structure $S^{1}$ is obtained by SVD of the observation matrix W.

Calculate, for each frame, the error between $S^{1}$ obtained from the SVD and the actual nonrigid body structure S, where p is the number of columns in the observation matrix

err S (j) = (\sum_{i} \sqrt{{(S_{(3 i - 1) j}^{1} - S_{(3 i - 1) j})}^{2} + {(S_{(3 i - 2) j}^{1} - S_{(3 i - 2) j})}^{2} + {(S_{3 i j}^{1} - S_{3 i j})}^{2}}) / p, i = 1, \dots, f, j = 1, \dots, p

Apply a 1-D DCT to the error $err S (j)$ , where z is the number of frames

G_{i} = \frac{1}{2} err S (1) + \sum_{k = 2}^{z} err S (k) \cos [\frac{π}{z} (i + \frac{1}{2}) k], z = 1, \dots, f

Determine the spectrum of $G_{i}$ and its magnitude $|G_{i}|$ .

Select a combination of trajectory basis according to the obtained error spectrum amplitude. Then, select K frequency points with the largest amplitude and use them to represent the actual error level τ. If the corresponding amplitude of the selected K points satisfies the following expression

\frac{\sum_{i}^{K} |G_{i}|}{\sum_{i}^{n} |G_{i}|} \geq τ

then the position of the frequency point corresponding to the maximum amplitude of K points can be determined.

Extract the corresponding K trajectory basis from the trajectory basis space according to the K frequency points. The final trajectory basis is further determined in the trajectory space so as to determine the number of trajectory basis and their combination.

Compare the structural errors after the automatic selection of trajectory basis and sequential reconstruction of trajectory basis, and select the combination of trajectory basis with lowest reconstruction error as the final combination of trajectory basis.
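The selection steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the per-frame error is taken as the mean Euclidean point error, an unnormalized DCT-II is used for the spectrum, and frequency indices are 0-based. The function names and the toy error curve are hypothetical, and the final comparison against the sequential basis is left out.

```python
import numpy as np

def frame_errors(S_true, S_est):
    """Mean Euclidean point error per frame; both inputs have shape (3F, P)."""
    F = S_true.shape[0] // 3
    diff = (S_est - S_true).reshape(F, 3, -1)
    return np.sqrt((diff ** 2).sum(axis=1)).mean(axis=1)

def dct2(x):
    """Unnormalized DCT-II of a 1-D signal."""
    z = len(x)
    n = np.arange(z)
    return np.array([np.sum(x * np.cos(np.pi / z * (n + 0.5) * k)) for k in range(z)])

def select_basis(err, tau):
    """Keep the fewest largest-amplitude frequencies whose amplitude share reaches tau."""
    amp = np.abs(dct2(err))
    order = np.argsort(amp)[::-1]                    # frequencies by decreasing amplitude
    share = np.cumsum(amp[order]) / amp.sum()
    K = min(int(np.searchsorted(share, tau)) + 1, len(amp))
    return sorted(order[:K].tolist())

# Toy error curve: a constant level plus two cosine components (frequencies 3 and 7).
z = 32
n = np.arange(z)
err = 3.0 + 2.0 * np.cos(np.pi / z * (n + 0.5) * 3) + np.cos(np.pi / z * (n + 0.5) * 7)
print(select_basis(err, 0.8), select_basis(err, 0.96))
```

Raising the threshold adds frequencies in order of decreasing amplitude, so the result is a combination of basis indices rather than simply the first K.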

Automatically select trajectory basis-probability model

Solution and improvement of correlation matrix C

In real life, the deformations observed in motion, such as the movement of a face or an entire body, are often not isolated, and the points of a moving object behave similarly. Therefore, in the matrix normal distribution, we use a symmetric matrix C as a covariance matrix. We then assume that the observation matrix W is formed by a low-rank matrix C combined with a noise term E, which leads to the following idealized robust principal components analysis (PCA) problem.

For the observation matrix W, W = C + E, according to Liu et al.,³⁰ where C is a low-rank matrix and E is a sparse matrix. We can obtain a conceptual solution to the above problem, which can be expressed as follows

\min_{C, E} \|C\|_{*} + λ \|E\|_{1} \quad \text{subject to} \quad C + E = W

where $\|\cdot\|_{*}$ is the nuclear norm and $\|\cdot\|_{1}$ is the L₁ norm. Both the L₁ norm and the L₀ norm promote sparsity, and L₁ is widely used because it has better optimization properties than L₀. Agudo and Moreno-Noguer²⁸ introduced C into the covariance matrix of the probability model. According to Costeira and Kanade,¹² the constraint used for solving the covariance matrix is modified to $X = X C + E$ . Therefore, we convert the robust PCA problem to

\underset{C, E}{\arg\min} \|C\|_{*} + λ \|E\|_{1} \quad \text{subject to} \quad X = X C + E

For this problem, we can use the exact augmented Lagrange multiplier method to solve the problem. However, the operational cost is relatively high. Therefore, an inexact augmented Lagrange multiplier (IALM) method is used to solve the problem. Compared with the exact algorithm, the inexact Lagrange multiplier method³¹ has a considerable improvement in computing speed while maintaining accuracy.

When $μ_{k}$ increases linearly, the exact augmented Lagrange multiplier method converges linearly, and when $μ_{k}$ increases more rapidly, it converges faster. However, when $μ_{k}$ is very large, the subproblem is solved slowly. Therefore, the IALM algorithm aims to reduce the computation time of the subproblems.

In the exact augmented Lagrange multiplier method, the iterative thresholding (IT) algorithm is used to solve the subproblem $(C_{k + 1}, E_{k + 1}) = \underset{C, E}{\arg\min} L (C, E, Y_{k}, μ_{k})$ iteratively, where $C_{k + 1}$ and $E_{k + 1}$ are, respectively, the values of C and E updated after the kth iteration, $μ_{k}$ is the penalty parameter, and $Y_{k}$ is the Lagrange multiplier. According to the experiments,³¹ the subproblem does not need to be solved exactly for $C_{k + 1}$ and $E_{k + 1}$ . A single SVD step yields an approximate solution that is sufficient to achieve the desired effect. Therefore, in the IALM algorithm, we remove the iterative solution of the subproblem by the IT algorithm and replace it with a direct SVD solution, as follows

(U, S, V) = SVD (W - E_{k} + μ_{k}^{- 1} Y_{k})

C_{k + 1} = U χ_{μ_{k}^{- 1}} [S] V^{T}

E_{k + 1} = χ_{λ μ_{k}^{- 1}} [W - C_{k + 1} + μ_{k}^{- 1} Y_{k}]

where

χ_{ε} [x] = \begin{cases} x - ε & \text{if } x > ε \\ x + ε & \text{if } x < - ε \\ 0 & \text{otherwise} \end{cases}

is the soft-thresholding operator, applied elementwise; in the update of E, its argument is $x = W - C_{k + 1} + μ_{k}^{- 1} Y_{k}$ .
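The three updates can be put together in a short loop. The sketch below solves the basic form W = C + E and is only an illustration: the default weight `lam`, the initial `mu`, the growth factor `rho`, the multiplier initialization, and the stopping rule are commonly used choices that are assumed here, not values stated in the article, and the function name `rpca_ialm` is hypothetical.

```python
import numpy as np

def soft_threshold(x, eps):
    """Elementwise soft-thresholding operator chi_eps[x]."""
    return np.sign(x) * np.maximum(np.abs(x) - eps, 0.0)

def rpca_ialm(W, lam=None, rho=1.5, tol=1e-7, max_iter=500):
    """Inexact ALM sketch for min ||C||_* + lam * ||E||_1 subject to C + E = W."""
    m, n = W.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))               # assumed default weight
    norm_W = np.linalg.norm(W, "fro")
    spec = np.linalg.norm(W, 2)                      # largest singular value
    Y = W / max(spec, np.abs(W).max() / lam)         # assumed multiplier initialization
    mu = 1.25 / spec                                 # assumed initial penalty
    E = np.zeros_like(W)
    for _ in range(max_iter):
        # One SVD per iteration replaces the inner iterative-thresholding loop.
        U, s, Vt = np.linalg.svd(W - E + Y / mu, full_matrices=False)
        C = (U * soft_threshold(s, 1.0 / mu)) @ Vt
        E = soft_threshold(W - C + Y / mu, lam / mu)
        R = W - C - E                                # constraint residual
        Y = Y + mu * R
        mu = rho * mu
        if np.linalg.norm(R, "fro") <= tol * norm_W:
            break
    return C, E

# Synthetic check: low-rank matrix plus sparse corruptions.
rng = np.random.default_rng(1)
n, r = 60, 2
L_true = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
S_true = np.zeros((n, n))
mask = rng.random((n, n)) < 0.05
S_true[mask] = rng.uniform(-10, 10, size=mask.sum())
C, E = rpca_ialm(L_true + S_true)
print(np.linalg.norm(C - L_true) / np.linalg.norm(L_true))
```

Solving the modified constraint X = XC + E requires a different update for C, which this sketch does not attempt.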

In this manner, one layer of the iterative loop is removed, which considerably reduces the calculation time. After obtaining the accelerated correlation matrix $C_{k + 1}$ , we normalize it and force its diagonal entries to be 1

C^{*} = C_{k + 1} ⊙ (1_{N} 1_{N}^{T} - I_{N}) + I_{N}

where ⊙ represents the Hadamard (elementwise) product, $1_{N}$ represents an N-dimensional vector of ones, and $I_{N}$ is the $N \times N$ identity matrix.
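The diagonal correction amounts to masking out the diagonal and adding the identity. A minimal sketch, with the hypothetical helper name `unit_diagonal` and an arbitrary 2 × 2 input chosen only for illustration:

```python
import numpy as np

def unit_diagonal(C):
    """Return C with its off-diagonal entries kept and its diagonal set to 1."""
    N = C.shape[0]
    ones = np.ones((N, 1))
    return C * (ones @ ones.T - np.eye(N)) + np.eye(N)

C_example = np.array([[5.0, 2.0],
                      [3.0, 7.0]])
C_star = unit_diagonal(C_example)
print(C_star)
```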

Adjusting prior rows and column covariance

The matrix normal distribution uses a Kronecker-structured covariance, which provides a natural way to model matrix-valued data. Specifically, a matrix normal random variable X is itself a matrix. The distribution is parameterized by a mean matrix and two covariance matrices, which represent the covariance of the rows and columns of X.

The prior covariance matrix has a considerable influence on the 3-D structure restoration. A properly processed prior covariance can increase the stability and accuracy of 3-D structure restoration. In other words, a more accurate prior covariance provides a better restoration of the 3-D structure and its movement. Therefore, we adjust the initial row covariance $I_{3 K}$ and the inverse column covariance ${(C^{*})}^{- 1}$ accordingly, where K is the number of trajectory basis

vec (Φ) \sim N (vec (μ); Σ_{c} \otimes Σ_{s})

Then, for a $p \times q$ matrix Φ, the matrix normal distribution of Φ is

P (Φ; μ, Σ_{c}, Σ_{s}) = {(2 π)}^{- p q / 2} {|{(Σ_{c} \otimes Σ_{s})}^{- 1}|}^{1 / 2} \times \exp \{- \frac{1}{2} {vec (Φ - μ)}^{T} {(Σ_{c} \otimes Σ_{s})}^{- 1} vec (Φ - μ)\}

According to Gupta and Nagar,³² equation (18) can be simplified as follows

P (Φ; μ, Σ_{c}, Σ_{s}) = {(2 π)}^{- p q / 2} {|Σ_{s}|}^{- q / 2} {|Σ_{c}|}^{- p / 2} \times \exp \{- \frac{1}{2} tr [Σ_{s}^{- 1} (Φ - μ) Σ_{c}^{- 1} {(Φ - μ)}^{T}]\}

Then, up to an additive constant, the log-likelihood function of the parameters $(μ, Σ_{c}, Σ_{s})$ is

\ln P (Φ; μ, Σ_{c}, Σ_{s}) = \frac{p}{2} \ln |Σ_{c}^{- 1}| + \frac{q}{2} \ln |Σ_{s}^{- 1}| - \frac{1}{2} D_{Σ} (Φ, μ)

where $D_{Σ} (Φ, μ) = tr [Σ_{s}^{- 1} (Φ - μ) Σ_{c}^{- 1} {(Φ - μ)}^{T}]$ . According to equation (20), the maximum likelihood estimation (MLE) can be obtained as follows

{\hat{Σ}}_{c} = \frac{1}{p} {(Φ - μ)}^{T} {\hat{Σ}}_{s}^{- 1} (Φ - μ), {\hat{Σ}}_{s} = \frac{1}{q} (Φ - μ) {\hat{Σ}}_{c}^{- 1} {(Φ - μ)}^{T}

However, for any $k \neq 0$ , if we define ${\tilde{Σ}}_{c} = k Σ_{c}, {\tilde{Σ}}_{s} = Σ_{s} / k$ , then ${\tilde{Σ}}_{c} \otimes {\tilde{Σ}}_{s} = Σ_{c} \otimes Σ_{s}$ . Both pairs of estimates therefore produce the same overall covariance matrix, so the scale of $Σ_{c}$ and $Σ_{s}$ is not uniquely determined. To solve this problem, according to Glanz and Carvalho,³³ we propose the following modification to the model

vec (Φ) \sim N (vec (μ), ς^{2} Σ_{c} \otimes Σ_{s})

To estimate $ς^{2}$ , similar to equation (20), the new likelihood function can be obtained

\ln P (Φ; ς^{2}, μ, Σ_{c}, Σ_{s}) = \frac{p}{2} \ln |Σ_{c}^{- 1}| + \frac{q}{2} \ln |Σ_{s}^{- 1}| - \frac{p q}{2} \ln ς^{2} - \frac{1}{2 ς^{2}} D_{Σ} (Φ, μ)

The resulting MLE of $ς^{2}$ is

{\hat{ς}}^{2} = \frac{1}{p q} {vec (Φ - μ)}^{T} {({\hat{Σ}}_{c} \otimes {\hat{Σ}}_{s})}^{- 1} vec (Φ - μ)

However, because the variance scale is determined by $ς^{2}$ , when $ς^{2} \leq 1$ , the update of the matrices ${\hat{Σ}}_{c}$ and ${\hat{Σ}}_{s}$ will be abnormal. This study uses the empirical value instead. We need to consider the scale constraints of $Σ_{c}$ and $Σ_{s}$ when deriving their MLE. For this, we use the results of Glanz and Carvalho³³

\begin{array}{l} {\hat{Σ}}_{c} = ADJUST \{p, {\hat{ς}}^{2}, {(Φ - μ)}^{T} {\hat{Σ}}_{s}^{- 1} (Φ - μ)\} \\ {\hat{Σ}}_{s} = ADJUST \{q, {\hat{ς}}^{2}, (Φ - μ) {\hat{Σ}}_{c}^{- 1} {(Φ - μ)}^{T}\} \end{array}
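Two facts used in this derivation can be verified numerically: the vectorized density and the trace form agree, and rescaling the two covariance factors in opposite directions leaves the Kronecker product unchanged. The check below is a sketch that assumes column-stacking for the vec operator, a p × q random matrix with a p × p row covariance and a q × q column covariance, and arbitrary random test matrices; the function names are hypothetical. The ADJUST step is not reproduced because its definition is not given in this text.

```python
import numpy as np

def logpdf_vec(X, M, Sc, Ss):
    """Log density from the vectorized form with covariance Sc (x) Ss."""
    p, q = X.shape
    v = (X - M).flatten(order="F")                   # column-stacking vec
    Sigma = np.kron(Sc, Ss)
    _, logdet = np.linalg.slogdet(Sigma)
    quad = v @ np.linalg.solve(Sigma, v)
    return -0.5 * (p * q * np.log(2 * np.pi) + logdet + quad)

def logpdf_trace(X, M, Sc, Ss):
    """Log density from the simplified trace form."""
    p, q = X.shape
    R = X - M
    _, logdet_c = np.linalg.slogdet(Sc)
    _, logdet_s = np.linalg.slogdet(Ss)
    D = np.trace(np.linalg.solve(Ss, R) @ np.linalg.solve(Sc, R.T))
    return -0.5 * (p * q * np.log(2 * np.pi) + q * logdet_s + p * logdet_c + D)

rng = np.random.default_rng(0)
p, q = 4, 3
Gs = rng.standard_normal((p, p))
Gc = rng.standard_normal((q, q))
Ss = Gs @ Gs.T + p * np.eye(p)                       # row covariance, p x p
Sc = Gc @ Gc.T + q * np.eye(q)                       # column covariance, q x q
X = rng.standard_normal((p, q))
M = np.zeros((p, q))

k = 2.5
same_density = np.isclose(logpdf_vec(X, M, Sc, Ss), logpdf_trace(X, M, Sc, Ss))
same_kron = np.allclose(np.kron(k * Sc, Ss / k), np.kron(Sc, Ss))
print(same_density, same_kron)
```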

Parameter solution

The observed 2-D trajectory matrix W is corrupted by Gaussian noise, which is represented by the matrix $N \in ℝ^{2 F \times P}$ . W can be redefined as follows

W = A Φ + N

According to Agudo and Moreno-Noguer,²⁸ we can include the accelerated correlation matrix $C^{*}$ in the probability model in the form of covariance

p (vec (Φ)) \sim N (vec (0); {\hat{Σ}}_{c} \otimes {\hat{Σ}}_{s})

p (vec (N)) \sim N (vec (0); σ^{2} I_{2 F} \otimes {\hat{Σ}}_{s})

where $vec (\cdot)$ represents the vectorization operator of a matrix.

E step: In step E, we estimate the conditional distribution of the latent variable $Φ$ . We apply Bayes’ rule to the equation. According to some properties of the matrix-variate normal distribution,^32,34 this distribution is also a Gaussian distribution

p (vec (Φ | W, A, σ^{2})) \sim N (vec (γ_{Φ}); Σ_{Φ} \otimes {\hat{Σ}}_{s})

γ_{Φ} = {(A^{T} A + σ^{2} {\hat{Σ}}_{c})}^{- 1} A^{T} W

Σ_{Φ} = σ^{2} N {(A^{T} A + σ^{2} {\hat{Σ}}_{s})}^{- 1}

M step: We update model parameters A and $σ^{2}$ and obtain

\begin{array}{l} A_{j + 1} \leftarrow D A_{j} {(σ^{2} {\hat{Σ}}_{c} + {(A_{j}^{T} A_{j} + σ^{2} {\hat{Σ}}_{c})}^{- 1} A_{j}^{T} D A_{j})}^{- 1} \\ σ_{j + 1}^{2} \leftarrow \frac{1}{2 T} t r (D - D A_{j} {(A_{j}^{T} A_{j} + σ^{2} {\hat{Σ}}_{c})}^{- 1} A_{j + 1}^{T}) \end{array}

where matrix D, which is a non-singular matrix, corresponds to the covariance matrix of the central observation as

D = N^{- 1} W {\hat{Σ}}_{s}^{- 1} W^{T}

Experimental results

Selection of base

The number and combination of trajectory basis have a considerable influence on the structural errors of reconstructing 3-D nonrigid bodies. This study uses spectrum analysis to analyze the frame error between the actual nonrigid structure S of a known data set and the nonrigid structure $S^{1}$ obtained by SVD of the observation matrix W. The number of trajectory basis K and the combination of trajectory basis are determined from this analysis. Then, the structural errors obtained with the automatically selected trajectory basis and with the sequential trajectory basis are compared, and the combination of trajectory basis with the smaller reconstruction error is taken as the final trajectory basis.

The recovery method of Figure 1 is based on previous studies,^6–9 which also employ a similar nonrigid reconstruction method. Figure 1 shows the trajectory basis combination of the yoga 3-D structure of the nonrigid body restoration shown in the literature.^{3,5,10–12,14–16,19} The first image on the left is the elevation diagram of frame 50, the second image is of frame 140, the third image is of frame 210, and the fourth image is of frame 240.

Figure 1.

3-D structure diagram reconstructed by automatically selecting the trajectory basis. 3-D: three-dimensional.

Figure 2 compares the mean error of the 3-D points of the yoga data set at frames 50, 100, 150, 200, 250, and 300 for the automatic basis selection method and the sequential selection method; for the latter, the mean errors were obtained with the first K trajectory basis. It can be clearly seen that the automatic selection of basis is superior to the sequential selection of basis.

Figure 2.

Mean error of 3-D points of yoga data set and trajectory basis obtained sequentially. 3-D: three-dimensional.

However, different data sets adopt different trajectory basis forms. For example, in the drink data set, the sequential selection of basis performs better than the automatic selection of basis, as shown in Figure 3.

Figure 3.

Mean error of 3-D points in drink data set and trajectory basis obtained sequentially. 3-D: three-dimensional.

Method comparison

For the quantitative evaluation, we follow the indicators used in the studies of Dai et al.³⁵ and Gotardo and Martinez³ to show the average rotation error e_R and standardized average 3-D error e_S , which are defined as

e_{R} = \frac{1}{F} \sum_{f = 1}^{F} \| {\bar{R}}^{f} - R^{f} \|_{F}

In frame f, $R^{f}$ is the estimated rotation matrix and ${\bar{R}}^{f}$ is the corresponding ground-truth rotation. $e_{S}$ is calculated as

e_{S} = \frac{1}{υ F P} \sum_{f = 1}^{F} \sum_{p = 1}^{P} e_{p}^{f}, υ = \frac{1}{3 F} \sum_{f = 1}^{F} (σ_{x}^{f} + σ_{y}^{f} + σ_{z}^{f})

where $e_{p}^{f}$ is the 3-D reconstruction error of point p in frame f, and $σ_{x}^{f}$ , $σ_{y}^{f}$ , and $σ_{z}^{f}$ represent the standard deviations of the x, y, and z coordinates of the original shape in frame f. When ground-truth 3-D data or rotation data are available, we provide $e_{S}$ and $e_{R}$ , respectively, where K is the number of trajectory basis. Table 1 shows the errors of the five 3-D reconstruction methods. Because the code of the compared papers^10,22,35 is not available, their errors are taken from the study of Agudo and Moreno-Noguer.²⁸
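Both metrics are straightforward to compute once the array layout is fixed. The sketch below assumes rotations stored as an (F, 2, 3) array and shapes as an (F, 3, P) array, takes $e_{p}^{f}$ to be the Euclidean distance between the true and estimated point, and normalizes υ by the number of frames F; the function names and the random test data are hypothetical.

```python
import numpy as np

def rotation_error(R_true, R_est):
    """e_R: mean Frobenius distance between rotations; inputs have shape (F, 2, 3)."""
    return float(np.mean(np.linalg.norm(R_true - R_est, axis=(1, 2))))

def shape_error(S_true, S_est):
    """e_S: mean 3-D point error divided by the average coordinate spread.

    Inputs have shape (F, 3, P).
    """
    F, _, P = S_true.shape
    e = np.linalg.norm(S_true - S_est, axis=1)       # (F, P) point errors
    nu = S_true.std(axis=2).sum() / (3 * F)          # average std of x, y, z over frames
    return float(e.sum() / (nu * F * P))

rng = np.random.default_rng(0)
R = rng.standard_normal((5, 2, 3))
S = rng.standard_normal((5, 3, 10))
shift = np.array([0.1, 0.0, 0.0])[None, :, None]     # move every point 0.1 along x
print(rotation_error(R, R), shape_error(S, S + shift))
```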

Table 1.

Average rotation error e_R .^a

Method		EM-PPCA¹⁰	PTA²²	PCPTA²⁸	BMM³⁵	Ours
Data	Drink	0.186	0.006	0.006	0.007	0.004
	Stretch	0.749	0.055	0.055	0.068	0.057
	Yoga	0.688	0.162	0.106	0.088	0.105
	Pick-up	0.417	0.154	0.155	0.121	0.151
	Dance	—	—	—	—	—

EM-PPCA: expectation-maximization-probabilistic principal components analysis; PTA: point trajectory approach.

^a The entries for the dance data set are missing because no ground-truth rotation matrix R is available for it, so the error cannot be computed.

In the experiment, the K values of the five data sets in the subspace are yoga (K = 11), pick-up (K = 12), drink (K = 13), stretch (K = 12), and dance (K = 5).

Table 1 shows average rotation error e_R and Table 2 shows standardized average 3-D error $e_{S} (K)$ .

Further, the stretch is used as the experimental data set. The blue trajectory represents the error curve after the reconstruction of the automatic selection of basis, and the red trajectory represents the error curve after the reconstruction of the sequential selection of trajectory basis. The graph shows the coordinates of each feature point in each dimension. The reconstruction error of the automatic trajectory basis selection method was low. The reconstruction error shown in the figure was obtained using the following equation

err X_{j} = \sqrt{\sum_{i} {\| T x_{i j} - \hat{T} x_{i j} \|}^{2}} / n, i = 1, \dots, m; j = 1, \dots, n

err Y_{j} = \sqrt{\sum_{i} {\| T y_{i j} - \hat{T} y_{i j} \|}^{2}} / n, i = 1, \dots, m; j = 1, \dots, n

err Z_{j} = \sqrt{\sum_{i} {\| T z_{i j} - \hat{T} z_{i j} \|}^{2}} / n, i = 1, \dots, m; j = 1, \dots, n

where $T x, T y, T z$ are the initial motion trajectories before reconstruction and $\hat{T} x, \hat{T} y, \hat{T} z$ are the motion trajectories obtained after reconstruction.
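As a small illustration of these per-coordinate errors, the snippet below assumes that each coordinate is stored as an m × n array with one column per index j, sums the squared differences over i, and divides the root by n. The function name and the toy arrays are hypothetical.

```python
import numpy as np

def coordinate_error(T, T_hat):
    """Per-column error sqrt(sum_i (T_ij - T_hat_ij)^2) / n for m x n arrays."""
    n = T.shape[1]
    return np.sqrt(((T - T_hat) ** 2).sum(axis=0)) / n

Tx = np.zeros((4, 2))                                # toy trajectories before reconstruction
Tx_hat = np.ones((4, 2))                             # toy trajectories after reconstruction
err_x = coordinate_error(Tx, Tx_hat)
print(err_x)
```

The same function applies unchanged to the y and z coordinates.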

Table 2.

Standardized average 3-D error $e_{S} (K)$ .

Method		EM-PPCA¹⁰	PTA²²	PCPTA²⁸	BMM³⁵	Ours
Data	Drink	0.261 (7)	0.025 (13)	0.0274 (13)	0.027 (12)	0.0277 (10)
	Stretch	0.458 (7)	0.109 (12)	0.1620 (12)	0.103 (11)	0.0785 (11)
	Yoga	0.445 (8)	0.1447 (12)	0.1623 (12)	0.115 (10)	0.1659 (10)
	Pick-up	0.423 (14)	0.237 (12)	0.2368 (12)	0.173 (12)	0.2870 (9)
	Dance	0.339 (4)	0.296 (5)	0.257 (4)	0.188 (10)	0.205 (5)
	Average error	0.385	0.166	0.1691	0.121	0.152

3-D: three-dimensional.

Figures 4 and 5 show an analysis of the reconstruction errors of the trajectory basis^{3,5,10–12,14–18,20} and of the first 11 trajectory basis in order; moreover, a comparison of the reconstruction errors and overall errors of each point in the X, Y, and Z coordinates is shown. Figure 6 shows, for frames 50, 100, 150, 200, 250, and 300, the percentage of points whose error is lower than with the sequentially selected basis. It can be clearly seen that the error level of the 11 automatically selected trajectory basis is much lower than that of the trajectory basis selected in sequence. That is, the reconstruction accuracy and time are considerably improved, which demonstrates that the method proposed in this study can improve efficiency while maintaining reconstruction accuracy.

Figure 4.

Errors at frame 50 of the stretch data set in the X, Y, and Z coordinates of each feature point, and overall reconstruction errors.

Figure 5.

Errors at frame 200 of the stretch data set in the X, Y, and Z coordinates of each feature point, and overall reconstruction errors.

Figure 6.

Percentage of feature points in frames 50, 100, 150, 200, 250, and 300 of the stretch data set for which automatic basis selection gives a lower error than sequential DCT basis selection. DCT: discrete cosine transform.
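One plausible reading of the percentage plotted in Figure 6 is the share of feature points in a frame whose error under automatic selection is strictly lower than under sequential selection. The sketch below computes that quantity; the interpretation and the function name `percent_lower` are assumptions, and the numbers are toy values rather than results from the article.

```python
def percent_lower(err_auto, err_seq):
    """Percentage of feature points where err_auto is strictly below err_seq.

    err_auto, err_seq: per-point errors for one frame, in the same point order.
    """
    if not err_auto or len(err_auto) != len(err_seq):
        raise ValueError("error lists must be non-empty and of equal length")
    wins = sum(1 for a, s in zip(err_auto, err_seq) if a < s)
    return 100.0 * wins / len(err_auto)


# Toy per-point errors for a single frame (not data from the article).
share = percent_lower([0.1, 0.2, 0.3, 0.4], [0.2, 0.1, 0.4, 0.5])
```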

Table 3 compares the execution time (in seconds) of the proposed method with that of the probabilistic correlation point trajectory approach (PCPTA)²⁸ and the block matrix method (BMM),³⁵ which are two highly accurate and advanced methods for recovering 3-D nonrigid body structures. All methods are executed in MATLAB 2018b, where K is the number of trajectory basis.

Table 3.

Reconstruction errors and reconstruction time of the four methods.

Metric	BMM³⁵	PCPTA²⁸	Proposed method without acceleration	Proposed method with acceleration
Expected reconstruction error (K: number of trajectory basis)	0.103 (K = 11)	0.1620 (K = 12)	0.0785 (K = 11)	0.0998 (K = 11)
Reconstruction time (s)	46.23	16.32	28.34	18.45

From Table 3, we can see that the proposed method performs better overall in terms of time and accuracy. Before acceleration, the method shows the lowest error level, mainly because the correlation matrix C is solved with higher accuracy. Although the accelerated variant slightly increases the error, its speed increases significantly. Unfortunately, the source code of the first two methods is not open; thus, we cannot complete the comparison on noisy observations. Our unaccelerated method takes longer than the PCPTA method because it uses the automatic basis selection and the ADJUST method.
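The trade-off introduced by the acceleration can be quantified directly from the Table 3 values. The short Python sketch below only redoes that arithmetic; the variable names are illustrative.

```python
# Values copied from Table 3 (proposed method, K = 11).
time_plain, time_fast = 28.34, 18.45      # reconstruction time in seconds
error_plain, error_fast = 0.0785, 0.0998  # expected reconstruction error
time_bmm = 46.23                          # BMM reconstruction time in seconds

# Relative time saved by the accelerated variant, in percent.
time_saving_pct = 100.0 * (time_plain - time_fast) / time_plain

# Relative error increase caused by the acceleration, in percent.
error_increase_pct = 100.0 * (error_fast - error_plain) / error_plain

# Speed-up of the accelerated variant over BMM.
speedup_vs_bmm = time_bmm / time_fast

print(round(time_saving_pct, 1), round(error_increase_pct, 1), round(speedup_vs_bmm, 2))
```

With these numbers the acceleration saves about 35% of the running time while the error rises by about 27%, and the accelerated error (0.0998) remains below the BMM and PCPTA errors in Table 3.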

Figures 7 to 9 show the structure of the pick-up data set recovered by the proposed automatic trajectory basis selection and probability model after acceleration. The sparse 3-D nonrigid body structures recovered for frames 5, 50, 150, and 200 are shown from left to right. Here, $τ = 0.96$, the number of trajectory basis is K = 9, and the combination of the trajectory basis positions is selected as described in the literature.^{3,5,10,11,14,15,18,19} It can be seen that, compared with the previous methods,^{10,22,28,35} the reduction in the number of trajectory basis does not significantly affect the recovery accuracy but improves the recovery efficiency.

Figure 7.

Pick-up.

Figure 8.

Pick-up left view.

Figure 9.

Pick-up top view.

Figure 10 shows an example of the dinosaur toy, with the recovered structures of frames 1, 8, 42, and 67. Figure 11 shows an example of the cubes toy, with the recovered structures of frames 1, 72, 197, and 200. In the cubes model, the number of trajectory basis is K = 2, and the trajectory is selected as described in the literature.^{5,10} It should be mentioned that although both cubes are rigid individually, they are connected as a whole by a wire. So, when another wire that is connected to one cube is pulled, they move like a nonrigid body as a whole. This kind of movement is simpler than that of the human body or the dinosaur toy. Therefore, the value of K here is relatively low. In the dinosaur model, the trajectory is selected as described in the literature.^{3,5,10,12–16,18,19,24} It can be seen from the figures that the method in this article can recover the 3-D nonrigid structure of the two toy models well. The data set is from http://mocap.cs.cmu.edu. In particular, the dinosaur toy and cubes toy examples lack the ground-truth 3-D structure matrix S, so the standardized average 3-D error $e_{S}$ cannot be computed.

Figure 10.

Examples of dinosaur toy.

Figure 11.

Examples of cubes toy.

Adjustment to $ς^{2}$

For the dance data set, the parameter $ς^{2}$ in equations (5) to (13) is obtained as 0.2072, which is less than 1, and the updates of the ${\hat{Σ}}_{c}$ and ${\hat{Σ}}_{s}$ matrices become abnormal. In this case, we can manually adjust the value. Setting $ς^{2}$ = 1000, we took the number of trajectory basis as K = 5 and selected the basis sequentially for restoration. We obtained the standardized average 3-D error $e_{S}$ = 0.2057. When the estimated $ς^{2} \leq 1$, the range for manually adjusting $ς^{2}$ is $1 \leq ς^{2} \leq 1500$.
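The manual adjustment rule can be written as a small guard. This is a minimal sketch under stated assumptions: the function name `adjust_sigma2` and the default replacement value of 1000 (the value used for the dance data set) are illustrative, and in the article the replacement value is chosen by hand rather than by code.

```python
SIGMA2_MIN = 1.0     # lower end of the manual adjustment range
SIGMA2_MAX = 1500.0  # upper end of the manual adjustment range


def adjust_sigma2(estimated, manual_value=1000.0):
    """Return the estimate unless it is <= 1, in which case use a manual value.

    The manual value must lie in [SIGMA2_MIN, SIGMA2_MAX]; 1000 is only the
    value reported for the dance data set, not a general recommendation.
    """
    if estimated > 1.0:
        return estimated
    if not SIGMA2_MIN <= manual_value <= SIGMA2_MAX:
        raise ValueError("manual value must lie in [1, 1500]")
    return manual_value


# Dance data set: the estimate 0.2072 is replaced by the manual value.
sigma2_dance = adjust_sigma2(0.2072)
```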

Conclusion

This study adopts a model that combines automatic selection of trajectory basis with a probability framework. The automatic selection determines the number and combination of trajectory basis that maximize the reconstruction accuracy for recovering nonrigid structures. The probability framework incorporates the low-rank trajectory model into a matrix normal distribution and, by using more precise prior conditions, improves the restoration efficiency of a 3-D nonrigid structure within the allowable error range. Together, the two components achieve accurate reconstruction of sparse data sets. More importantly, the proposed reconstruction method is more accurate and efficient than most previous methods. In future work, we want to improve the solution of the correlation matrix C in both accuracy and speed. In addition, we hope to find or synthesize useful dense data sets for 3-D nonrigid reconstruction. Dense data sets are closer to actual daily activities and more realistic; however, obtaining them will be a considerable challenge.

Footnotes

Acknowledgement

Thanks to the test data set provided by .

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Natural Science Foundation of Zhejiang Province (LZ20F020003, LY17F020034, LSZ19F010001), the National Natural Science Foundation of China (61272311, 61672466), and the 521 Project of Zhejiang Sci-Tech University.

ORCID iD

Wenqing Huang

References

1. Del Bue, Lladó, Agapito. Non-rigid metric shape and motion recovery from uncalibrated images using priors. In: IEEE conference on computer vision and pattern recognition, New York, NY, USA, 17–22 June 2006, pp. 1191–1198. IEEE.
2. Antonio, Moreno-Noguer. Force-based representation for non-rigid shape and elastic model estimation. IEEE Trans Patt Anal Mach Intell 2018; 40(9): 2137–2150.
3. Gotardo PFU, Martinez. Computing smooth time-trajectories for camera and deformable shape in structure from motion with occlusion. IEEE Trans Patt Anal Mach Intell 2011; 33(10): 2051–2065.
4. Zaheer, Akhter, Baig, et al. Multiview structure from motion in trajectory space. In: IEEE international conference on computer vision, Barcelona, Spain, 6–13 November 2011, pp. 2447–2453. IEEE.
5. Tomasi, Kanade. Shape and motion from image streams under orthography: a factorization approach. Int J Comput Vis 1992; 9(2): 137–154.
6. de Jesus Rubio, Enrique. Recursive least squares for a manipulator which learns by demonstration [mínimos cuadrados recursivos para un manipulator que aprende por demostración]. Rev Iberoam Autom Inform Ind 2019; 16(2): 147–158.
7. de Jesus Rubio. Discrete time control based in neural networks for pendulums. Appl Soft Comput 2018; 68: 821–832.
8. Kumar, Rana KPS. Design of robust fractional order fuzzy sliding mode PID controller for two link robotic manipulator system. J Intell Fuzzy Syst 2018; 35(5): 5301–5315.
9. Rubio. Modified optimal control with a back propagation network for robotic arms. IET Control Theory Appl 2012; 6(14): 2216–2225.
10. Torresani, Hertzmann, Bregler. Nonrigid structure-from-motion: estimating shape and motion with hierarchical priors. IEEE Trans Patt Anal Mach Intell 2008; 30(5): 878–892.
11. Bregler, Hertzmann, Biermann. Recovering non-rigid 3D shape from image streams. In: IEEE conference on computer vision and pattern recognition, Hilton Head Island, SC, USA, 15 June 2000, pp. 690–696. IEEE.
12. Costeira, Kanade. A multibody factorization method for independent moving objects. Int J Comput Vis 1998; 29(3): 159–179.
13. Bascle, Blake. Separability of pose and expression in facial tracking and animation. In: Proceedings of international conference on computer vision, Bombay, India, 7 January 1998, pp. 323–328. IEEE.
14. Torresani, Yang, Alexander, et al. Tracking and modeling non-rigid objects with rank constraints. In: Proceedings of IEEE conference on computer vision and pattern recognition, Kauai, HI, USA, 8–14 December 2001, pp. 493–500. IEEE.
15. Torresani, Hertzmann, Bregler. Learning non-rigid 3D shape from 2D motion. In: Advances in neural information processing systems, British Columbia, Canada, 13–16 December 2004, pp. 1555–1562.
16. Rabaud, Belongie. Re-thinking non-rigid structure from motion. In: IEEE conference on computer vision and pattern recognition, Anchorage, AK, USA, 23 June 2008, pp. 1–8. IEEE.
17. Agudo, Moreno-Noguer. Learning shape, motion and elastic models in force space. In: IEEE international conference on computer vision, Araucano Park, Las Condes, Chile, 11–18 December 2015, pp. 756–764. IEEE.
18. Kong, Lucey. Prior-less compressible structure from motion. In: IEEE conference on computer vision and pattern recognition, Las Vegas, NV, USA, 27–30 June 2016, pp. 4123–4131. IEEE.
19. Agudo, Moreno-Noguer. Combining local-physical and global-statistical models for sequential deformable shape from motion. Int J Comput Vis 2017; 122(2): 371–387.
20. Agudo, Moreno-Noguer, Calvo, et al. Real-time 3D reconstruction of non-rigid shapes from single moving camera. Comput Vis Image Underst 2016; 153(12): 37–54.
21. Lee, Cho, Choi, et al. Procrustean normal distribution for non-rigid structure from motion. In: IEEE conference on computer vision and pattern recognition, Portland, OR, USA, 23–28 June 2013, pp. 1280–1287. IEEE.
22. Akhter, Sheikh, Khan, et al. Nonrigid structure from motion in trajectory space. In: Advances in neural information processing systems, Vancouver, Canada, 8–10 December 2008, pp. 41–48.
23. Gotardo PFU, Martinez. Non-rigid structure from motion with complementary rank-3 space. In: Computer vision and pattern recognition, Colorado Springs, CO, USA, 20 June 2011, pp. 3065–3072. IEEE.
24. Zhu, Cox, Lucey. 3D motion reconstruction for real-world camera. In: Computer vision and pattern recognition, Colorado Springs, CO, USA, 20 June 2011, pp. 1–8. IEEE.
25. Gonzalez, Woods. Digital image processing. 2nd ed. Upper Saddle River: Prentice Hall, 2002.
26. Park, Shiratori, Matthews, et al. 3D reconstruction of a moving point from a series of 2D projections. In: European conference on computer vision, Heraklion, Crete, Greece, 5 September 2010, pp. 158–171. Springer.
27. Valmadre, Lucey. General trajectory prior for non-rigid reconstruction. In: IEEE conference on computer vision and pattern recognition, Providence, RI, USA, 16 June 2012, pp. 1394–1401. IEEE.
28. Agudo, Moreno-Noguer. Scalable, efficient, and accurate solution to non-rigid structure from motion. Comput Vis Image Underst 2018; 167: 121–133.
29. Wang, Cheng, Zheng, et al. Analysis of wavelet basis selection in optimal trajectory space finding for 3D non-rigid structure from motion. Int J Wavelets Multiresolution Inf Process 2014; 12(2): 1450023.
30. Liu, Lin, Yan, et al. Robust recovery of subspace structures by low-rank representation. IEEE Trans Patt Anal Mach Intell 2012; 35(1): 171–184.
31. Lin, Chen. The augmented Lagrange multiplier for exact recovery of corrupted low-rank matrices. ArXiv preprint arXiv:1009.5055, 2010.
32. Gupta, Nagar. Matrix variate distributions. London: Chapman & Hall/CRC, 2000.
33. Glanz, Carvalho. An expectation–maximization algorithm for the matrix normal distribution with an application in remote sensing. J Multivar Anal 2018; 167: 31–48.
34. Tipping, Bishop. Mixtures of probabilistic principal component analysers. Neural Comput 1999; 11(2): 443–482.
35. Dai. A simple prior-free method for non-rigid structure from motion factorization. Int J Comput Vis 2014; 107(2): 101–122.

Three-dimensional nonrigid reconstruction based on probability model
