Abstract
Most approaches to nonrigid motion recovery are shape-based; however, trajectory-based methods built on the discrete cosine transform (DCT) are also prominent. Because the DCT decorrelates the signal well, it extracts the correlation in the data and achieves good compression. However, the number and combination of DCT trajectory basis vectors must be selected appropriately. A high value of K (the number of trajectory basis vectors) increases both the reconstruction error and the computational cost. A low value of K, on the other hand, excludes information components, so the structure of the object cannot be fully represented and accuracy suffers. Once the number of trajectory basis vectors is fixed, their combination also has a considerable influence on the reconstruction algorithm. This article selects an appropriate number and combination of trajectory basis vectors automatically by analyzing the spectrum of the re-projection errors. Then, by combining this selection with a probabilistic framework based on the matrix normal distribution of a low-rank model, the energy of the high-frequency part is retained, which helps maintain accuracy while improving reconstruction efficiency. The proposed method can reconstruct the three-dimensional structure of sparse data under more precise prior conditions and at lower computational cost.
Introduction
Three-dimensional (3-D) reconstruction encompasses many fields such as image processing, stereoscopic vision, and biological engineering, and has attracted considerable research interest in computer vision. The 3-D motion reconstruction of a nonrigid body is an important technique for virtual representation of the objective world. Generally, 3-D motion reconstruction involves recovering camera rotation matrix R and 3-D structure S of a nonrigid body from a given set of 2-D dynamic image sequences.
There are currently four mainstream reconstruction schemes: shape-based 3-D reconstruction, 1 force-based 3-D reconstruction, 2 shape-trajectory-based 3-D reconstruction, 3 and trajectory-based 3-D reconstruction. 4 Shape-based 3-D reconstruction is simple and convenient, but because a shape basis must be estimated for every sequence, it introduces a large number of unknowns, which makes the algorithm complex and limits its scope. Force-based 3-D reconstruction formulates the problem in a low-rank force space of deformations, which gives the acquired prior information a clearer physical interpretation and represents the behavior of the actual object more accurately. However, in addition to the forces, the camera rotation, and the structure S, the elastic model of the object must also be estimated, which increases the uncertainty of the reconstruction result. Shape-trajectory-based 3-D reconstruction combines the advantages of the trajectory basis and the shape basis, but at the cost of additional unknown priors. The trajectory-based method avoids the limitations of the other three methods, but the number and type of its predefined trajectory basis vectors are difficult to select, and both choices directly affect the reconstruction accuracy. In particular, the main difficulty of the nonrigid structure from motion (NRSFM) problem is that many different 3-D shapes can produce similar observed images, so the re-projection constraints alone are not sufficient to obtain a unique shape. Therefore, more prior knowledge about the deformation of the structure and the motion of the camera is needed.
The method proposed in this article, which is based on an automatic basis selection-probability model, can effectively address the problems caused by the number of trajectory basis vectors, their combination, and complex prior knowledge.
Related work
Most existing methods extend the matrix factorization algorithm used for rigid reconstruction 5–9 and use prior information in the form of a low-rank shape basis. 10,11 Similarly, a low-rank model has been proposed to constrain the motion of every point on the object through a predefined trajectory basis. 3 The disadvantage of these methods is that they must factorize a matrix whose size is proportional to the number of input points, so they can only be applied to relatively low-resolution shapes.
In NRSFM, recovering the camera motion and the time-varying 3-D shape from 2-D point tracks alone is an under-constrained problem, because different 3-D objects can produce similar 2-D images. Several factorization-based algorithms have been proposed for the 3-D motion reconstruction of nonrigid bodies. Costeira and Kanade 12 constructed an orthographic projection model and then applied the factorization method to reconstruct the structure and motion of independently moving objects. However, its application scope was limited, and it could not accommodate varying linear combinations of shapes. Bascle and Blake 13 proposed decomposing the shape of the reconstruction target into a linear combination of basic shapes, which reduced the reconstruction problem to solving for the basic shape coefficients; this was the prototype of the shape-basis model for nonrigid 3-D motion reconstruction. Torresani et al. 14 adopted low-rank constraints to track a nonrigid body, combining a temporal constraint that models the shape basis coefficients as a linear dynamic system 15 with a hierarchical model of the distribution of nonrigid deformations. 10 Rabaud and Belongie 16 abandoned the linear basis representation and proposed a method for learning the shape structure from a video. The advantage of this method is that it adds temporal constraints that prevent the camera and structure from changing excessively and ambiguously between frames. Agudo and Moreno-Noguer 17 introduced a force model into 3-D nonrigid reconstruction, which formulates the problem in a low-rank force space of deformations and gives the obtained prior information a better physical interpretation.
However, if force data and ground-truth data are lacking, the reconstruction becomes difficult. Recently, NRSFM research has introduced the concept of compressibility to enforce a union of subspaces, in which each shape instance 18 uses a different set of shape basis vectors. Applying the above models turns NRSFM into a trilinear problem, which can be solved using factorization 11 or optimization strategies that enforce spatially smooth, 10 temporally smooth, 19,20 or compact 3-D shapes. 21 The advantage of this approach is that it reduces excessive and ambiguous changes of the camera and structure between frames.
Akhter et al. 22 proposed that the trajectory of each point can be restricted to a low-dimensional subspace. The advantage of this trajectory-based method is that an object-independent basis, such as the discrete cosine transform (DCT), can be used to reduce the number of unknown parameters and improve the accuracy of the 3-D structure. Gotardo and Martinez 23 combined the shape basis and the trajectory basis by modeling the shape basis coefficients with the DCT. Zhu et al. 24 pointed out that sequences with poor reconstructability could be remedied by adding rigid key frames. They also emphasized the need to select the number of trajectory basis vectors K, instead of using the entire DCT basis or applying a norm penalty to the coefficient vectors to obtain a sparse solution. However, Zhu et al. 24 did not exploit the known distribution of DCT coefficients in natural signals. 25 In addition, many methods use a predefined basis to restrict the trajectory of each point, which turns the trilinear problem into a bilinear one 22 and greatly simplifies it. 26 Valmadre and Lucey 27 introduced a prior on the trajectory by using the differentials of the 3-D points; the advantage of their approach is that it combines the shape basis and the trajectory space. Combining the shape basis with the trajectory basis generates a temporally smooth trajectory of a nonrigid shape in the linear shape space. 3
In this study, we first select the trajectory basis automatically, which not only avoids the excessively large or small K values used in previous methods but also chooses a reasonable combination of trajectory basis vectors, so that the efficiency and accuracy of the reconstructed structure are maximized. Second, we use the matrix normal distribution to model the known trajectory space, 10,16 which combines spatial smoothness with the inherent temporal smoothness of the subspace. Building on the probabilistic model proposed by Agudo and Moreno-Noguer, 28 this study adds more accurate prior information and obtains a more accurate decomposition. The experiments show the accuracy, versatility, and efficiency of our approach on sparse data sets.
Low-rank model NRSFM
The standard matrix decomposition method is generally used for the NRSFM problem. Suppose the motion model being studied consists of F frame images, P feature points are marked on each frame, and the 2-D position of each feature point of the image is marked as
According to Agudo and Moreno-Noguer, 28
where ⊗ represents the Kronecker product. Matrix B is the trajectory basis matrix obtained after automatically selecting the basis, and matrix R is a block diagonal, consisting of a
where
Updating the transition matrix
Because the SVD-based factorization of a matrix is not unique, matrices
Instead of the entire matrix Q, we only need to estimate the three columns of Q to correct for
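The structure of the low-rank trajectory model can be illustrated with a short NumPy sketch. The defining equations are not reproduced in this text, so the layout below is an assumption that follows the common trajectory-space formulation: the 3-D structure is S = (B ⊗ I3) A, and the observations are W = R S with a block-diagonal matrix R of 2 × 3 orthographic cameras. The function names, the coefficient matrix A, and the synthetic camera path are hypothetical and serve only as an illustration.

```python
import numpy as np

def dct_columns(num_frames, num_basis):
    """First `num_basis` orthonormal DCT-II vectors as columns (F x K)."""
    t = np.arange(num_frames)
    k = np.arange(num_basis)
    basis = np.cos(np.pi * (2 * t[:, None] + 1) * k[None, :] / (2 * num_frames))
    basis[:, 0] *= np.sqrt(1.0 / num_frames)
    basis[:, 1:] *= np.sqrt(2.0 / num_frames)
    return basis

def orthographic_camera(angle_y):
    """First two rows of a rotation about the y axis (2 x 3), assumed camera model."""
    c, s = np.cos(angle_y), np.sin(angle_y)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0]])

def block_diag_cameras(cameras):
    """Stack per-frame 2 x 3 cameras into a 2F x 3F block-diagonal matrix."""
    num_frames = len(cameras)
    R = np.zeros((2 * num_frames, 3 * num_frames))
    for f, R_f in enumerate(cameras):
        R[2 * f:2 * f + 2, 3 * f:3 * f + 3] = R_f
    return R

F, P, K = 40, 12, 4                      # frames, points, basis vectors
rng = np.random.default_rng(1)
B = dct_columns(F, K)                    # trajectory basis, F x K
A = rng.normal(size=(3 * K, P))          # hypothetical trajectory coefficients
S = np.kron(B, np.eye(3)) @ A            # 3F x P, rows ordered (x, y, z) per frame
cameras = [orthographic_camera(0.02 * f) for f in range(F)]
R = block_diag_cameras(cameras)          # 2F x 3F
W = R @ S                                # 2F x P observation matrix
```

Each pair of rows of W is the orthographic projection of the corresponding frame of S, so the number of unknowns in the structure drops from 3FP to 3KP once B is fixed.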
Automatic basis selection
The type, number, and combination of trajectory basis vectors considerably influence the performance of the NRSFM algorithm, and the DCT basis is regarded as the best general-purpose trajectory basis. Once the type of trajectory basis is determined, it is important to select the number and combination of basis vectors.
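To make the notion of a basis combination concrete, the following sketch builds an orthonormal DCT-II trajectory basis from an arbitrary list of frequency indices. It is a minimal illustration, not the authors' implementation; the function name `dct_basis`, the 1-based index convention, and the example index list are assumptions.

```python
import numpy as np

def dct_basis(num_frames, indices):
    """Return an F x K matrix whose columns are orthonormal DCT-II vectors.

    `indices` are 1-based frequency indices (1 is the constant vector),
    so range(1, K + 1) gives the usual sequential basis.
    """
    t = np.arange(num_frames)                       # frame index 0..F-1
    k = np.asarray(indices) - 1                     # 0-based frequency
    basis = np.cos(np.pi * (2 * t[:, None] + 1) * k[None, :] / (2 * num_frames))
    scale = np.where(k == 0, np.sqrt(1.0 / num_frames), np.sqrt(2.0 / num_frames))
    return basis * scale[None, :]

F = 100
sequential = dct_basis(F, range(1, 6))              # first five basis vectors
selected = dct_basis(F, [3, 5, 10, 11, 12])         # a non-sequential combination

# A trajectory proportional to the DCT vector with 1-based index 3.
t = np.arange(F)
trajectory = np.cos(np.pi * (2 * t + 1) * 2 / (2 * F))

# Least-squares fit reduces to a projection because the columns are orthonormal.
coeffs = selected.T @ trajectory
reconstruction = selected @ coeffs
```

Because the columns are orthonormal for any subset of indices, changing the combination only changes which frequency components the K coefficients can represent.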
In this article, an automatic selection algorithm for the trajectory basis is proposed. 29 The error between the actual 3-D structure S and the structure S1 obtained from the first SVD is analyzed in the frequency domain. In addition, the value of K is extended, and the resulting 3-D shape error is compared with that obtained with the sequential trajectory basis. The trajectory basis with the smaller error is selected, which greatly reduces the 3-D reconstruction error.
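One plausible way to realize a spectrum-based selection is to rank the DCT frequencies of a signal by energy and keep the K strongest ones. The sketch below does this and compares the result with the sequential choice of the first K basis vectors. The exact selection criterion is not spelled out in this text, so the energy ranking, the function names, and the synthetic signal are all assumptions.

```python
import numpy as np

def full_dct_basis(num_frames):
    """Complete orthonormal DCT-II basis as an F x F matrix (columns are vectors)."""
    t = np.arange(num_frames)
    k = np.arange(num_frames)
    basis = np.cos(np.pi * (2 * t[:, None] + 1) * k[None, :] / (2 * num_frames))
    basis[:, 0] *= np.sqrt(1.0 / num_frames)
    basis[:, 1:] *= np.sqrt(2.0 / num_frames)
    return basis

def select_basis_indices(signal, num_basis):
    """1-based indices of the `num_basis` DCT frequencies with the most energy.

    `signal` is F x N: each column is one coordinate trajectory over F frames.
    """
    basis = full_dct_basis(signal.shape[0])
    energy = np.sum((basis.T @ signal) ** 2, axis=1)   # energy per frequency
    strongest = np.argsort(energy)[::-1][:num_basis]
    return np.sort(strongest) + 1

def projection_error(signal, indices):
    """Frobenius norm of the part of `signal` outside the chosen basis."""
    basis = full_dct_basis(signal.shape[0])[:, np.asarray(indices) - 1]
    return np.linalg.norm(signal - basis @ (basis.T @ signal))

# Synthetic trajectories that only contain the frequencies 1, 3, and 7.
rng = np.random.default_rng(0)
F, N = 120, 30
true_indices = np.array([1, 3, 7])
signal = full_dct_basis(F)[:, true_indices - 1] @ rng.normal(size=(3, N))

auto_indices = select_basis_indices(signal, 3)
sequential_indices = np.arange(1, 4)
auto_error = projection_error(signal, auto_indices)
sequential_error = projection_error(signal, sequential_indices)
```

On this synthetic signal the energy ranking recovers the frequencies that were actually used, whereas the sequential basis of the same size misses one of them and leaves a residual.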
Automatic trajectory basis selection-probability model
Solution and improvement of correlation matrix C
In real life, the deformations observed in motion, such as the movement of the face or the entire body, are rarely isolated, and the points of a moving object move in similar ways. Therefore, in the matrix normal distribution, we use a symmetric matrix C as a covariance matrix. We then assume that the observation matrix W is formed by a low-rank matrix C combined with a noise term E. This leads to the following idealized robust principal components analysis (PCA) problem.
Following Liu et al., 30 the observation matrix is written as W = C + E, where C is a low-rank matrix and E is a sparse matrix. A conceptual solution to the above problem can be expressed as follows
where
This problem can be solved with the exact augmented Lagrange multiplier method, but its computational cost is relatively high. Therefore, an inexact augmented Lagrange multiplier (IALM) method is used instead. Compared with the exact algorithm, the inexact method 31 is considerably faster while maintaining accuracy.
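The decomposition W = C + E can be sketched with a generic IALM iteration for robust PCA. This is a minimal sketch of the standard textbook scheme, not the accelerated variant described in this article; the default weight 1/sqrt(max(m, n)), the step-size schedule, and the synthetic test matrix are assumptions.

```python
import numpy as np

def soft_threshold(x, tau):
    """Element-wise shrinkage operator sign(x) * max(|x| - tau, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def rpca_ialm(W, lam=None, tol=1e-7, max_iter=500):
    """Split W into a low-rank part C and a sparse part E with inexact ALM."""
    W = np.asarray(W, dtype=float)
    m, n = W.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))              # assumed default weight
    norm_two = np.linalg.norm(W, 2)
    norm_fro = np.linalg.norm(W, "fro")
    Y = W / max(norm_two, np.abs(W).max() / lam)    # Lagrange multiplier
    mu = 1.25 / norm_two                            # penalty parameter
    mu_max = mu * 1e7
    rho = 1.5                                       # penalty growth factor
    C = np.zeros_like(W)
    E = np.zeros_like(W)
    for _ in range(max_iter):
        # Sparse part: one soft-thresholding step.
        E = soft_threshold(W - C + Y / mu, lam / mu)
        # Low-rank part: one singular value thresholding step.
        U, s, Vt = np.linalg.svd(W - E + Y / mu, full_matrices=False)
        C = (U * soft_threshold(s, 1.0 / mu)) @ Vt
        residual = W - C - E
        Y = Y + mu * residual
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(residual, "fro") / norm_fro < tol:
            break
    return C, E

# Synthetic check: rank-3 matrix corrupted by 5% large sparse errors.
rng = np.random.default_rng(0)
low_rank = rng.normal(size=(60, 3)) @ rng.normal(size=(3, 60))
sparse = np.zeros((60, 60))
mask = rng.random((60, 60)) < 0.05
sparse[mask] = rng.uniform(-10, 10, size=mask.sum())
observed = low_rank + sparse

C_hat, E_hat = rpca_ialm(observed)
rel_error = np.linalg.norm(C_hat - low_rank) / np.linalg.norm(low_rank)
```

Each outer iteration performs a single update of E and C rather than solving the subproblem to convergence, which is what distinguishes the inexact scheme from the exact one.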
In the IALM algorithm, when
In the exact augmented Lagrange multiplier method, the iterative thresholding (IT) algorithm is used to solve the subproblem
Among them
where x is represented as a soft threshold,
In this manner, one layer of the iterative loop can be removed, which considerably reduces the calculation time. After obtaining the accelerated correlation matrix
where ⊙ represents the Hadamard product and
Adjusting prior rows and column covariance
The matrix normal distribution uses a Kronecker-structured covariance, which provides a natural way to model matrix-valued data. Specifically, a matrix normal random variable X is itself a matrix. The distribution is parameterized by a mean matrix and two covariance matrices, which represent the covariance among the rows and among the columns of X.
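The parameterization by a mean matrix and two covariance matrices can be illustrated with a short NumPy sketch that draws a sample and evaluates the log-density. The function names and the random covariances are assumptions used only for illustration; the density is the standard matrix normal form, which is equivalent to a multivariate normal on vec(X) with covariance equal to the Kronecker product of the column and row covariances.

```python
import numpy as np

def sample_matrix_normal(mean, row_cov, col_cov, rng):
    """Draw X = M + L_row Z L_col^T with Z standard normal (same shape as M)."""
    L_row = np.linalg.cholesky(row_cov)
    L_col = np.linalg.cholesky(col_cov)
    Z = rng.standard_normal(mean.shape)
    return mean + L_row @ Z @ L_col.T

def matrix_normal_logpdf(X, mean, row_cov, col_cov):
    """Log-density of the matrix normal distribution for an n x p matrix X."""
    n, p = X.shape
    D = X - mean
    # Quadratic term tr(V^{-1} D^T U^{-1} D), U = row_cov, V = col_cov.
    quad = np.trace(np.linalg.solve(col_cov, D.T) @ np.linalg.solve(row_cov, D))
    _, logdet_row = np.linalg.slogdet(row_cov)
    _, logdet_col = np.linalg.slogdet(col_cov)
    return -0.5 * (n * p * np.log(2 * np.pi)
                   + p * logdet_row + n * logdet_col + quad)

rng = np.random.default_rng(0)
n, p = 4, 3
G_row = rng.normal(size=(n, n))
G_col = rng.normal(size=(p, p))
U = G_row @ G_row.T + n * np.eye(n)      # row covariance (n x n, positive definite)
V = G_col @ G_col.T + p * np.eye(p)      # column covariance (p x p, positive definite)
M = rng.normal(size=(n, p))              # mean matrix

X = sample_matrix_normal(M, U, V, rng)
logp = matrix_normal_logpdf(X, M, U, V)
```

The practical benefit of the Kronecker structure is that only an n × n and a p × p matrix need to be stored and inverted instead of the full np × np covariance.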
The prior covariance matrix has a considerable influence on the 3-D structure restoration. The processed prior covariance can increase the stability and accuracy of 3-D structure restoration. In other words, prior covariance with higher accuracy can provide better restoration to the 3-D structure movement. Therefore, we adjusted the initial row covariance
Then, the matrix normal distribution of Θ is
According to Gupta and Nagar, 32 equation (18) can be simplified as follows
Then, the logarithmic likelihood function of parameter
where
However, for any
To estimate
The MLE generating
However, because the variance scale is determined by
Parameter solution
The observed 2-D trajectory point matrix W is accompanied by Gaussian noise, which is represented by the matrix
According to Agudo and Moreno-Noguer, 28
we can include the accelerated correlation matrix
where
E step: We estimate the conditional distribution of the latent variable
M step: We update model parameters A and
where matrix D, which is a non-singular matrix, corresponds to the covariance matrix of the central observation as
Experimental results
Selection of basis
The number and combination of trajectory basis vectors have a considerable influence on the structural error of the reconstructed 3-D nonrigid body. This study uses spectrum analysis to analyze the per-frame error between the actual nonrigid structure S of a known data set and the nonrigid structure S1 obtained from the SVD of the observation matrix W. From this analysis, the number of trajectory basis vectors K and their combination are determined. Then, the structural errors obtained with the automatically selected and the sequential trajectory basis are compared, and the combination with the smaller reconstruction error is taken as the final trajectory basis.
The recovery method of Figure 1 is based on previous studies, 6–9 which employ a similar nonrigid reconstruction method. Figure 1 shows the yoga 3-D nonrigid structure recovered with the trajectory basis combination shown in the literature. 3,5,10–12,14–16,19 From left to right, the images show the elevation views of frames 50, 140, 210, and 240.

Figure 1. 3-D structure reconstructed by automatically selecting the trajectory basis. 3-D: three-dimensional.
Figure 2 compares the mean error of the 3-D points of the yoga data set at frames 50, 100, 150, 200, 250, and 300 for the automatic basis selection method and the sequential selection method, where the sequential method uses the first K trajectory basis vectors. The automatic selection of the basis is clearly superior to the sequential selection.

Figure 2. Mean error of 3-D points on the yoga data set for the automatically selected and the sequentially selected trajectory basis. 3-D: three-dimensional.
However, different data sets adopt different trajectory basis forms. For example, in the drink data set, the sequential selection of basis performs better than the automatic selection of basis, as shown in Figure 3.

Figure 3. Mean error of 3-D points on the drink data set for the automatically selected and the sequentially selected trajectory basis. 3-D: three-dimensional.
Method comparison
For the quantitative evaluation, we follow the indicators used in the studies of Dai et al. 35 and Gotardo and Martinez 3 to show the average rotation error eR and standardized average 3-D error eS , which are defined as
In frame f, Rf is the estimated rotation matrix and
where
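The two error measures can be sketched in a few lines of NumPy. Because the defining equations are not reproduced in this text, the sketch assumes the forms commonly used with these benchmarks: eR is the mean Frobenius distance between true and estimated per-frame rotations, and eS is the mean per-point Euclidean error divided by the average standard deviation of the ground-truth coordinates. The function names and array layouts are hypothetical, and any alignment of the estimate to the ground truth is assumed to have been done beforehand.

```python
import numpy as np

def average_rotation_error(R_true, R_est):
    """Mean Frobenius norm of R_f - R_hat_f over frames.

    Both inputs have shape (F, 2, 3): one orthographic camera per frame.
    """
    return np.mean(np.linalg.norm(R_true - R_est, axis=(1, 2)))

def normalized_3d_error(S_true, S_est):
    """Mean point error divided by the average coordinate spread.

    Both inputs have shape (F, 3, P): x, y, z rows for P points per frame.
    """
    per_point = np.linalg.norm(S_true - S_est, axis=1)     # F x P distances
    sigma = np.mean(np.std(S_true, axis=2))                # mean over frames and axes
    return per_point.mean() / sigma

# Small synthetic example.
rng = np.random.default_rng(0)
F, P = 5, 8
S_true = rng.normal(size=(F, 3, P))
S_shifted = S_true + np.array([1.0, 0.0, 0.0])[None, :, None]   # unit offset in x
R_true = np.tile(np.array([[1.0, 0.0, 0.0],
                           [0.0, 1.0, 0.0]]), (F, 1, 1))

e_S = normalized_3d_error(S_true, S_shifted)
e_R = average_rotation_error(R_true, -R_true)
```

Dividing by the coordinate spread makes eS comparable across data sets whose shapes differ in scale.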
Table 1. Average rotation error eR. a
EM-PPCA: expectation-maximization-probabilistic principal components analysis; PTA: point trajectory approach.
a The entry for the dance data set is missing from Table 1 because no ground-truth rotation matrix R is available for it, so the error cannot be computed.
In the experiment, the K values of the five data sets in the subspace are yoga (K = 11), pick-up (K = 12), drink (K = 13), stretch (K = 12), and dance (K = 5).
Table 1 shows the average rotation error eR, and Table 2 shows the standardized average 3-D error eS
Further, the stretch data set is used as the experimental data set. The blue curve represents the error after reconstruction with the automatically selected basis, and the red curve represents the error after reconstruction with the sequentially selected trajectory basis. The graph shows the coordinates of each feature point in each dimension. The reconstruction error of the automatic trajectory basis selection method is lower. The reconstruction error shown in the figure was obtained using the following equation
where
Table 2. Standardized average 3-D error eS.
3-D: three-dimensional.
Figures 4 and 5 compare the reconstruction errors obtained with the trajectory basis {3, 5, 10–12, 14–18, 20} and with the first 11 trajectory basis vectors taken in order, showing the error of each point in the X, Y, and Z coordinates as well as the overall error. Figure 6 shows, for frames 50, 100, 150, 200, 250, and 300, the percentage of points whose error is lower than with the sequentially selected basis. The error level of the 11 automatically selected trajectory basis vectors is clearly much lower than that of the sequentially selected ones. That is, both reconstruction accuracy and time are considerably improved, which demonstrates that the proposed method can improve efficiency while preserving reconstruction accuracy.

Figure 4. Errors at the X, Y, and Z coordinates of each feature point and overall reconstruction errors for frame 50 of the stretch data set.

Figure 5. Errors at the X, Y, and Z coordinates of each feature point and overall reconstruction errors for frame 200 of the stretch data set.

Figure 6. Percentage of points in frames 50, 100, 150, 200, 250, and 300 of the stretch data set whose error is lower than with the sequential DCT basis. DCT: discrete cosine transform.
Table 3 compares the execution time (in seconds) of the probabilistic correlation point trajectory approach (PCPTA) 28 with that of the block matrix method (BMM), 25 two highly accurate, state-of-the-art methods for recovering 3-D nonrigid structures. All methods are executed in MATLAB 2018b, where K is the number of trajectory basis vectors.
Table 3. Reconstruction errors and reconstruction time of the four methods.
From Table 2, we can see that the proposed method performs better in terms of time and accuracy. Before acceleration, the method shows a lower error level, mainly because the correlation matrix C is estimated more accurately. Although the accelerated method slightly increases the error, it is significantly faster. Unfortunately, the source code of the first two methods is not publicly available, so we could not complete the comparison under noisy observations. Our unaccelerated method takes longer than the PCPTA method because it uses the automatic basis selection and the ADJUST method.
Figures 7 to 9 show the pick-up data set recovered by the proposed automatic trajectory basis selection-probability model after acceleration. The sparse 3-D nonrigid structures recovered for frames 5, 50, 150, and 200 are shown from left to right. Here,

Figure 7. Pick-up.

Figure 8. Pick-up, left view.

Figure 9. Pick-up, top view.
Figure 10 shows the recovered structures of the dinosaur toy for frames 1, 8, 42, and 67. Figure 11 shows the recovered structures of the cubes toy for frames 1, 72, 197, and 200. In the cubes model, the number of trajectory basis vectors is K = 2, and the trajectory basis is selected as described in the literature. 5,10 Although both cubes are individually rigid, they are connected by a wire, so when another wire attached to one cube is pulled, they move together as a nonrigid body. This kind of movement is simpler than that of the human body or the dinosaur toy; therefore, K is relatively low here. In the dinosaur model, the trajectory basis is selected as described in the literature. 3,5,10,12–16,18,19,24 It can be seen from the figures that the proposed method recovers the 3-D nonrigid structure of both toy models well. The data set is from http://mocap.cs.cmu.edu. Because the dinosaur toy and cubes toy examples lack the ground-truth 3-D structure matrix S, the standardized average 3-D error eS cannot be computed for them.

Figure 10. Examples of the dinosaur toy.

Figure 11. Examples of the cubes toy.
Adjustment to
For the dance data set, when the parameter
Conclusion
This study adopts a model that combines automatic selection of the trajectory basis with a probabilistic framework. The automatic selection determines the number and combination of trajectory basis vectors so that the reconstruction accuracy for nonrigid structures is maximized. The probabilistic framework incorporates the low-rank trajectory model into a matrix normal distribution and, by using more precise prior conditions, improves the efficiency of recovering a 3-D nonrigid structure within the allowable error range. Together, the two components achieve accurate reconstruction of sparse data sets. More importantly, the proposed reconstruction method is more accurate and efficient than most previous methods. In future work, we want to improve the solution of the correlation matrix C in both accuracy and speed. In addition, we hope to find or synthesize useful dense data sets for 3-D nonrigid reconstruction. Dense data sets are closer to actual daily activities and more realistic; however, handling them will be a major challenge.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Natural Science Foundation of Zhejiang Province (LZ20F020003, LY17F020034, LSZ19F010001), the National Natural Science Foundation of China (61272311, 61672466), and the 521 Project of Zhejiang Sci-Tech University.
