
Abstract

Triangle meshes are a crucial and powerful data type for three-dimensional (3D) shapes, extensively studied in the fields of computer vision and computer graphics. In this paper, we delve into the challenge of analyzing deforming 3D triangle meshes using deep neural networks. While existing methods extend graph-based deep learning to 3D triangle meshes using graph convolution, the lack of effective graph convolutional structures and pooling operations limits the learning capacity of their networks. We propose a variational autoencoder structure that integrates graph convolutional residual blocks with multilayer pooling to explore the latent space of 3D shapes for generation. This framework introduces graph convolutional residual blocks to address the issue of gradient vanishing in deep networks. By employing multilayer pooling and unpooling structures based on triangle mesh simplification, gradually reducing the spatial dimensions of the input, the model can extract more general features. This enables it to handle denser mesh models effectively. Extensive experiments demonstrate that our generalized framework can learn reasonable representations of deformable shape collections with minimal training examples. It produces competitive results across various applications, including shape generation and interpolation, requiring fewer training samples and outperforming state-of-the-art techniques.

Keywords

Mesh generation, variational autoencoder, mesh interpolation, graph convolution

1. Introduction

With advancements in three-dimensional (3D) capture, modeling, and rendering, point clouds and triangular meshes have found widespread applications in augmented reality, virtual reality, gaming, and film production. Although point cloud data provides raw spatial position information, it does not capture the topological structure of 3D shapes, which limits its applicability. Triangular meshes, on the other hand, as illustrated in Figure 1, are made up of a set of interconnected vertices, edges, and faces that represent and define the geometric shape of 3D objects. As a result, an increasing number of researchers prefer to analyze triangular mesh data using neural networks, and this paper focuses on deep neural networks in this area. Unlike images, triangular meshes have complicated and irregular connectivity. Most existing works preserve the mesh connectivity between layers, which limits the possibility of enlarging the receptive field when pooling techniques are used.

Figure 1.

Different Representations of 3D Data. Note. 3D = three-dimensional.

Deep learning has significantly advanced domains such as image processing, natural language processing, and speech recognition. It has also become increasingly significant in 3D data analysis in recent years. ShapeNets (Wu et al., 2015) initially proposed using voxels for 3D volumetric representation of shapes, with each voxel recording whether the corresponding spatial position is occupied by an object. By introducing a voxel representation and applying 3D convolutions, the network can learn representations that capture shape data in 3D space. PointNet (Qi et al., 2017a) demonstrated good performance in 3D classification and segmentation tasks using a deep learning architecture that operates on 3D point clouds. PointNet++ (Qi et al., 2017b) added new sampling and grouping procedures to better gather local information. It efficiently handles the local structure of point clouds by sampling key points and assigning other points to their neighborhoods. PointCNN (Li et al., 2018) introduced the X-transformation operation, which learns an adaptive transformation matrix to alter the coordinates of input point clouds. This improves the network’s resistance to transformations in the input point cloud. Furthermore, several recent methods (Feng et al., 2019; Hanocka et al., 2019; Hu et al., 2021) use mesh topological information to improve the 3D data representation performance of neural networks. Despite the considerable gains these methods have made in processing 3D point cloud data, there has been comparatively little progress on methods based on triangular meshes.

Recently, several significant studies have extended 3D mesh analysis methods. For example, TPNet (Li et al., 2023b) proposed a novel mesh analysis method that achieves a more accurate analysis of triangular meshes by preserving topology and enhancing perceptual capabilities. Additionally, MBA (Fan et al., 2023) explored backdoor attacks on 3D mesh classifiers, which is crucial for understanding and improving the robustness of deep learning models. Meanwhile, 3D reconstruction based on hierarchical reinforcement learning (Li et al., 2023a) demonstrated efficient transferability across different categories, indicating that task decomposition through hierarchical structures can enhance generalization in 3D reconstruction tasks.

The variational autoencoder (VAE; Kingma and Welling, 2014), a type of generative network, has been widely used in a variety of applications, including two-dimensional (2D) image restoration and 3D tasks such as triangular mesh generation and interpolation. Mesh VAE (Tan et al., 2018) initially explored the latent space of 3D meshes using fully connected operations; however, this model required a large number of parameters and had poor generalization capabilities. Although fully connected layers permit changes in mesh connectivity between layers, convolutional layers cannot be applied directly after such irregular modifications. Litany et al. (2018) added a convolutional layer structure to the VAE model. However, such convolutional techniques cannot change the mesh’s connectivity. Ranjan et al. (2018) proposed sampling operations for meshes in neural networks, but their sampling approach does not gather all local neighborhood information when reducing the number of vertices. To address the correspondence between coarser and denser mesh structures in the hierarchical structure of network models, Yuan et al. (2020) proposed a VAE with edge contraction pooling based on Mesh VAE. Because that model has a single-layer pooling structure, excessive averaging of some local features may occur, resulting in the loss of important information. TetGAN (Gao et al., 2022) is a convolutional generative adversarial network for tetrahedral mesh generation. This approach learns deep features on each tetrahedron, encodes them as latent space vectors, and can be used to edit and synthesize shapes. While these methods are effective for 3D data processing, they have drawbacks such as the limited influence of graph convolution operations on connectivity, insufficient aggregation of local information in pooling processes, and the inability to build deep networks.

We designed a Mesh VAE framework to investigate the latent space representation of triangular meshes, with the aim of accommodating denser triangular mesh models and improving the network’s reconstruction capabilities. We used the edge collapse procedure from mesh simplification to build a hierarchical mesh structure with variable levels of detail, enabling efficient pooling by tracking mappings between coarse and fine-grained meshes. The network framework employs multilevel pooling procedures, which effectively aggregate local neighborhood information to reduce the number of mesh parameters. To address issues such as gradient vanishing in deep networks, we designed graph convolutional residual modules. The model is easy to train, requiring only a small number of training samples for the encoder to map triangular mesh information to a latent space representation. This in turn makes interpolation in the latent space feasible, enhancing the effectiveness of 3D shape manipulations.

This paper’s contributions can be summarized as follows:

We propose a Mesh VAE capable of learning a meaningful representation of deformable shape collections to investigate latent space operations. It produces competitive performance in a variety of applications, including shape generation, interpolation, and exploration.

We create an effective and fast feature extraction mechanism that better captures structural correlation information in input data, avoids gradient vanishing in deep networks and improves precision.

We adopt a multilayer mesh simplification-based pooling and unpooling structure that gradually reduces the spatial dimensions of the input. The model can progressively extract more general features, enhancing the generative model’s ability to generalize to different shapes and structures.

2. Related Work

2.1. Neural Networks Based on Multiview, Point Clouds, and Voxels

Following the successful application of deep learning to 2D images, many researchers (Hanocka et al., 2019; Maturana and Scherer, 2015; Qi et al., 2017a) have begun to investigate how neural networks can be used for 3D geometric learning. 3D data admits more representations than 2D data, including multiview images, point clouds, voxels, meshes, and more (Xiao et al., 2020). As neural networks improve, researchers are gradually applying them directly to 3D objects (Qi et al., 2017a, 2017b; Wu et al., 2015). Point clouds, unlike images, are unstructured and unordered. A straightforward way (Maturana and Scherer, 2015; Riegler et al., 2017; Wu et al., 2015) to use point clouds in 3D convolutional neural networks is to convert the input into a structured voxel representation. Although such regular grid representations are extensively employed in 2D image processing, the computing and storage requirements grow rapidly as the number of voxels increases in 3D. To eliminate redundant computations, Graham et al. (2018) used sparse convolution. Wang et al. (2021) further improved on this method by using a patch-guided partitioning scheme and output-guided skip connections.

Due to the higher computational costs associated with using voxels, other studies (Qi et al., 2017a, 2017b; Wang et al., 2018) have begun to directly explore methods utilizing point clouds. PointNet (Qi et al., 2017a) used multilayer perceptrons and max pooling to overcome the lack of structure in point clouds, while PointNet++ (Qi et al., 2017b) improved point-based network performance by taking hierarchical structures and set abstraction into account. PointCNN (Li et al., 2018) developed point cloud convolution operators, which address the unstructured nature of point clouds via feature transformation matrices, while KPConv (Thomas et al., 2019) improved point cloud network performance even further by proposing deformable convolutions. Additionally, Zhao et al. (2021) built a self-attention network that used attention mechanisms to improve point cloud processing results. These methods offer a versatile and strong paradigm for dealing with 3D geometric data, opening new opportunities for a variety of application scenarios.

2.2. Neural Networks Based on Triangular Meshes and Graphs

Because voxel and point cloud representations incur high computing costs or lack topological information, some methods (Dong et al., 2023; Feng et al., 2019; Hanocka et al., 2019; Liang et al., 2022) have begun to apply neural networks to meshes. Some researchers (Haim et al., 2019; Maron et al., 2017) do this by employing parameterization methods to map geometric information onto regular structures, allowing the mesh to be used with 2D convolutions. TangentConv (Tatarchenko et al., 2018) pioneered the notion of tangent convolution, which enables neural networks to handle large-scale meshes. Through a parallel framework, PFCNN (Yang et al., 2020) improves the neural network’s surface processing capabilities even further.

However, these methods often come with relatively high computational costs. As a result, many researchers (Hanocka et al., 2019; Monti et al., 2017) regard meshes as graphs and employ graph convolutional networks. MeshNet (Feng et al., 2019) performs surface convolution directly using mesh adjacency connections; however, unlike neural networks on images, it does not generate a hierarchical structure for pooling. DiffusionNet (Sharp et al., 2022) and HodgeNet (Smirnov and Solomon, 2021) investigate the use of spectral algorithms on meshes to learn geometric properties. Some research uses mesh simplification to create a hierarchical structure in order to increase network performance (Feng et al., 2019; Hanocka et al., 2019; Hu et al., 2021). Other research looks into novel ways to build hierarchical structures, such as random walks (RWs; Lahav and Tal, 2020), loop subdivision (Hu et al., 2022), parallel vertex clustering, and adaptive edge contraction (Hanocka et al., 2019). However, due to the lack of a precise mapping, these simplification methods cannot guarantee a consistent receptive field for the network. Subdivision-based approaches (Hu et al., 2022) place substantial demands on the mesh, restricting the applicability of mesh networks.

Despite the fact that pooling operations are commonly employed in deep networks for image processing, some existing mesh-based deep learning models either do not support pooling (Litany et al., 2018; Tan et al., 2018) or use simple sampling processes (Ranjan et al., 2018) that fail to aggregate all local neighborhood information. Rather, we use multilayered pooling techniques based on mesh simplification to assist the network in learning hierarchical features. The network may gradually capture both global and local features on the mesh, augmenting the learned 3D feature representation. This is especially important when dealing with large-scale mesh data since it can boost computational efficiency. Multilayered pooling can preserve important global structural information by aggregating information from neighboring nodes, aiding in preventing overfitting and enhancing the model’s generalization ability.

2.3. Representation and Applications of Deformable Meshes

To better represent a triangular mesh, a direct approach is to use its vertex coordinates. However, vertex coordinates lack both translational and rotational invariance, posing challenges for learning large-scale deformations. Instead, we employ a recent method for representing triangular mesh shapes (Gao et al., 2021), which involves recording deformations at the vertices, making graph convolution and pooling operations easier to implement.

Kingma and Welling (2014) originally proposed the VAE model, which effectively blends variational Bayesian approaches and deep neural networks, making it a classic deep generative model. In contrast to standard autoencoder models, VAE injects noise into the latent space, causing it to follow a specific distribution, hence increasing the latent space’s expressive potential and enabling the model to generate data. VAE can be used in mesh generation to learn the latent representation of 3D objects and generate new mesh models. Shape generation and interpolation are common applications of mesh data. Leveraging the VAE structure, Mesh VAE (Tan et al., 2018) can generate more deformable mesh shapes, and the method proposed by Ranjan et al. (2018) can generate vividly expressive 3D faces from latent space. These VAE-based methods can also be used for shape interpolation. However, the method of Tan et al. (2018) cannot handle shapes with too many vertices, limiting the resolution of the generated mesh shapes. Although the method by Ranjan et al. (2018) performs well on facial shapes, it lacks generality. Our work is based on the VAE architecture, which has a rich latent space expressive capacity. By training the model to capture the latent patterns in the data, we can naturally apply it to shape generation and interpolation in triangular meshes, with improved generalization capabilities.

3. Method

This paper introduces a novel VAE architecture specifically tailored for triangular mesh generation, as illustrated in Figure 2. Our approach integrates several advanced techniques, including residual blocks, graph convolution, and a triangular mesh simplification algorithm, to create a flexible and effective framework. The residual blocks are designed to enhance the network’s expressive capacity, enabling it to capture complex relationships between local features and graph structures. To facilitate multiscale feature extraction, we employ a hierarchical structure where pooling is achieved by carefully tracking the mapping between coarse and fine meshes. The entire residual block, combined with pooling operations based on a triangular mesh simplification algorithm, forms a hierarchical feature extraction structure, further enhancing the network’s ability to perceive multiscale features.

Figure 2.

Our network architecture.

The input to our network encoder is the preprocessed as-consistent-as-possible feature representation (Gao et al., 2021), $X \in R^{V \times 9}$, where $V$ is the number of vertices and 9 is the dimensionality of the deformation representation. Among these dimensions, 3 represent rotation and 6 represent scaling, which effectively encodes local deformations and handles large-scale rotations well. The network encoder consists of three graph convolutional residual blocks and two pooling layers. The output of the last residual block is mapped to a mean vector and a deviation vector through two different fully connected layers. Sampling a random variable $ε \sim N (0, I)$, where $I$ is the identity matrix, completes the reparameterization process of the VAE. The decoder mirrors the encoder, as depicted in Figure 2, allowing the model to recreate the original data from the latent space. The overall network output is $\hat{X} \in R^{V \times 9}$, which has the same dimensions as the input. To accurately recreate the model shape in its original form, the output needs to be rescaled to match the deformation representation that was initially used for the input. This rescaling ensures that the reconstructed model retains the correct proportions and aligns with the deformation characteristics present in the original data. The model contains graph convolutional residual blocks, as well as pooling and unpooling processes based on triangular mesh simplification, giving it significant modeling capabilities for complicated graph data. It achieves smooth interpolation and efficient generation in the latent space at the same time.
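The reparameterization step described above can be sketched as follows. This is a minimal NumPy illustration rather than the authors' implementation: it assumes the second fully connected layer outputs a log-variance (the text does not state how the spread is parameterized), and the names `reparameterize`, `mean`, and `log_var` are hypothetical.

```python
import numpy as np


def reparameterize(mean, log_var, rng):
    """Draw z = mean + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mean.shape)  # noise sample, eps ~ N(0, I)
    sigma = np.exp(0.5 * log_var)          # standard deviation from log-variance (assumed)
    return mean + sigma * eps


rng = np.random.default_rng(0)
mean = np.zeros(128)      # latent dimension 128, as reported in Section 4.2
log_var = np.zeros(128)   # log-variance 0 corresponds to sigma = 1
z = reparameterize(mean, log_var, rng)
```

Because the randomness enters only through `eps`, the sample `z` remains a differentiable function of the encoder outputs, which is what allows a VAE to be trained with gradient descent.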

3.1. Graph Conv ResNet

We present a novel graph convolutional residual block as part of a VAE model for producing triangular meshes. This architecture seeks to better capture structural information and correlations in incoming data, hence improving the algorithm’s power for representation learning. One of the important components of our model is the integration of an RW (Lahav and Tal, 2020) process within the graph convolutional layer, which enhances the model’s ability to capture complex local geometric relationships.

As shown in Figure 2(c), this graph convolutional residual block integrates numerous critical components, such as graph convolutional layers (Pei et al., 2019), RW procedures, and ReLU activation functions, to provide a strong and efficient feature extraction approach. With this design, our model can not only extract local features from the input triangular meshes and capture correlations between nodes, but it can also introduce some randomness into the network structure exploration by incorporating RWs, which aids in the capture of global features and trends.

Specifically, the graph convolutional layer allows the model to extract local features from the input triangular mesh and capture node relationships. Assume $x$ is the input matrix and $y$ is the result of the convolution process. Each row of $x$ and $y$ represents a vertex and each column represents a feature dimension. The normalized graph Laplacian operator is denoted as $L$ . The network’s frequency domain graph convolution may be represented as

$$y = g_{θ} (L) x = \sum_{h = 0}^{H - 1} θ_{h} T_{h} (\tilde{L}) x \qquad (1)$$

where $\tilde{L} = 2 L / λ_{max} - I$, $λ_{max}$ is the maximum eigenvalue of $L$, $θ \in R^{H}$ is the vector of polynomial coefficients, $T_{h} (\tilde{L}) \in R^{V \times V}$ is the $h$th Chebyshev polynomial evaluated at $\tilde{L}$, and $H - 1$ is the highest order of the Chebyshev polynomial.
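To make Equation (1) concrete, the sketch below evaluates the Chebyshev filter with the standard recurrence $T_{0} = I$, $T_{1} = \tilde{L}$, $T_{h} = 2 \tilde{L} T_{h - 1} - T_{h - 2}$. It is an illustrative NumPy version under stated assumptions: the symmetric normalized Laplacian is used for $L$, the coefficient values in `theta` are arbitrary, and the function names are hypothetical rather than taken from the paper.

```python
import numpy as np


def normalized_laplacian(adj):
    """L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    return np.eye(adj.shape[0]) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]


def chebyshev_conv(lap, x, theta):
    """y = sum_h theta_h T_h(L~) x, with L~ = 2 L / lambda_max - I."""
    lam_max = np.linalg.eigvalsh(lap).max()
    lap_t = 2.0 * lap / lam_max - np.eye(lap.shape[0])
    t_prev, t_curr = x, lap_t @ x            # T_0 x = x, T_1 x = L~ x
    y = theta[0] * t_prev
    if len(theta) > 1:
        y = y + theta[1] * t_curr
    for h in range(2, len(theta)):
        t_next = 2.0 * (lap_t @ t_curr) - t_prev   # Chebyshev recurrence
        y = y + theta[h] * t_next
        t_prev, t_curr = t_curr, t_next
    return y


# A single triangle: three mutually connected vertices with 9 features each.
adj = np.array([[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]])
lap = normalized_laplacian(adj)
x = np.random.default_rng(0).standard_normal((3, 9))
theta = np.array([0.5, 0.3, 0.2])            # H = 3 coefficients (illustrative values)
y = chebyshev_conv(lap, x, theta)
```

The recurrence never forms $T_{h} (\tilde{L})$ explicitly, so each additional order costs one multiplication by $\tilde{L}$, which is sparse for a mesh.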

Then, the RW transition matrix $P$ is applied to $y$ to simulate the long-distance propagation of information in the graph:

$$z = P^{k} \cdot y \qquad (2)$$

where $z$ is the feature matrix after $k$ steps of RW.

By introducing the RW process, the residual block can better adapt to different triangular mesh topologies, enhancing the model’s generalization capability. In terms of activation functions, ReLU is employed to introduce nonlinearity, allowing the model to better adapt to complex data distributions. By further processing $z$ through the weight matrix $W$ and bias $b$ , the aforementioned operation can be represented as

$$Y = \mathrm{ReLU} (z \cdot W + b) \qquad (3)$$
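Equations (2) and (3) can be illustrated together. The sketch below is a hedged NumPy example: the paper does not define $P$ explicitly, so the common row-stochastic choice $P = D^{-1} A$ is assumed here, the step count `k = 2` is arbitrary, and the helper names are hypothetical.

```python
import numpy as np


def random_walk_matrix(adj):
    """Row-stochastic transition matrix P = D^{-1} A (an assumed definition)."""
    deg = adj.sum(axis=1, keepdims=True)
    return adj / np.maximum(deg, 1e-12)


def rw_layer(adj, y, weight, bias, k=2):
    """Eq. (2) followed by Eq. (3): Y = ReLU(P^k y W + b)."""
    p = random_walk_matrix(adj)
    z = np.linalg.matrix_power(p, k) @ y       # k-step random-walk propagation
    return np.maximum(z @ weight + bias, 0.0)  # linear map, then ReLU


# Two triangles sharing the edge (1, 2): vertices 0-1-2 and 1-2-3.
adj = np.array([
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
])
y = np.ones((4, 9))
out = rw_layer(adj, y, weight=np.eye(9), bias=np.zeros(9), k=2)
```

Since every row of $P$ sums to one, a constant feature field is left unchanged by the propagation, which is a quick sanity check on the transition matrix.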

This multilevel, multicomponent structure of the residual block helps to enhance the model’s expressive power, thereby better learning and generating triangular meshes of higher quality. This innovative design plays a crucial role in model training, providing more robust capabilities for generating high-quality, diverse triangular meshes.

3.2. Mesh Simplification and Pooling–Depooling Fusion

For mesh simplification, we adopt an edge contraction algorithm (Yuan et al., 2020). This technique not only generates a multilevel mesh structure to accommodate varying degrees of complexity, but also ensures consistency across coarser and finer meshes. Standard edge contraction methods apply contractions iteratively, in an order determined by a measure of shape change. However, to achieve more effective pooling, it is crucial to ensure that each vertex in the coarser mesh represents a region of similar area or volume in the original mesh.

Since edge length is a crucial metric throughout the simplification process, it is used as one of the criteria for sorting edges to be contracted, helping to prevent the formation of overly long edges during contraction. We adopt the edge contraction algorithm based on the quadric error metrics introduced by Garland and Heckbert (1997). In this approach, the error at a vertex $v = [v_{x}, v_{y}, v_{z}, 1]^{T}$ is defined as a quadratic form $v^{T} Q v$, where $Q$ is the sum of the basic quadratic error matrices. For a given edge contraction $(v_{1}, v_{2}) \to \bar{v}$, using $\bar{Q} = Q_{1} + Q_{2}$ as the new matrix, $\bar{Q}$ approximates the error at $\bar{v}$. Therefore, the error at $\bar{v}$ is ${\bar{v}}^{T} \bar{Q} \bar{v}$. Subsequently, the new edge length is added to the original simplification error metric. Specifically, given an edge $(i, j)$ contracted to a new vertex ${\bar{v}}_{k}$, the total error is defined as:

$$E = {\bar{v}}_{k}^{T} {\bar{Q}}_{k} {\bar{v}}_{k} + λ \max \{ L_{k m}, L_{k n} ∣ m \in N_{i}, n \in N_{j}, m \neq j, n \neq i \} \qquad (4)$$

where $L_{k m}$ and $L_{k n}$ are the new edge lengths between vertex $k$ and vertices $m$ and $n$, respectively, $N_{i}$ and $N_{j}$ are the sets of adjacent vertices for vertices $i$ and $j$, respectively, and $λ$ is the weight. Figure 3(a) depicts the simplification process, in which the orange vertices are simplified into green vertices by edge contraction, and the features of the green vertices are averaged to produce the features of the purple vertex.
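The contraction cost in Equation (4) can be sketched as follows. This NumPy example is illustrative only and makes several assumptions not stated in the paper: the new vertex is placed at the edge midpoint (Garland and Heckbert also describe solving for an optimal position), the weight `lam = 0.5` is an arbitrary value, and `vertex_quadrics` and `contraction_error` are hypothetical helper names.

```python
import numpy as np


def vertex_quadrics(vertices, faces):
    """Sum the plane quadrics K_p = p p^T of the faces around each vertex."""
    quadrics = np.zeros((len(vertices), 4, 4))
    for face in faces:
        a, b, c = vertices[face]
        normal = np.cross(b - a, c - a)
        norm = np.linalg.norm(normal)
        if norm == 0.0:
            continue  # skip degenerate faces
        normal = normal / norm
        plane = np.append(normal, -(normal @ a))  # plane [a, b, c, d]: ax + by + cz + d = 0
        k_p = np.outer(plane, plane)
        for idx in face:
            quadrics[idx] += k_p
    return quadrics


def contraction_error(vertices, quadrics, neighbors, i, j, lam=0.5):
    """Eq. (4) for contracting edge (i, j) to its midpoint (assumed placement)."""
    v_bar = 0.5 * (vertices[i] + vertices[j])
    v_h = np.append(v_bar, 1.0)                    # homogeneous coordinates
    q_bar = quadrics[i] + quadrics[j]
    quadric_term = v_h @ q_bar @ v_h
    ring = (neighbors[i] | neighbors[j]) - {i, j}  # vertices that gain an edge to the new vertex
    longest = max(np.linalg.norm(vertices[m] - v_bar) for m in ring)
    return quadric_term + lam * longest


# A flat unit square split into two triangles.
vertices = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 0.0]])
faces = np.array([[0, 1, 2], [0, 2, 3]])
neighbors = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
quadrics = vertex_quadrics(vertices, faces)
err = contraction_error(vertices, quadrics, neighbors, 0, 1)
```

On this planar example the quadric term vanishes, so the cost is driven entirely by the edge-length penalty, which is the term that discourages overly long edges.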

Figure 3.

Pooling and depooling based on triangular mesh simplification.

Mesh simplification is accomplished by combining two nearby vertices into a new vertex through repeated edge contraction procedures. This procedure is used to define mesh pooling operations in our network framework. The characteristics of the new vertex are specified as the average features of the contracted vertices following the edge contraction phase, guaranteeing the efficacy of pooling operations in the relevant simplified region. This retains the correct topology, allowing numerous layers of convolution or pooling to be supported while producing a well-defined receptive field.

As the VAE network has a decoder structure, it is crucial to properly define the unpooling operation. Here, we utilize the vertex relationships recorded during the simplification process and define unpooling as the inverse of edge contraction: the features of vertices on the simplified mesh are evenly redistributed to the corresponding contracted vertices on the dense mesh. This process ensures the accurate restoration of the original mesh details during the network decoding phase. Specifically, during the simplification operation, each edge contraction records the information of the corresponding vertex pairs and their connections. In the unpooling operation, based on these records, we identify which vertices must be expanded back into edges after $k$ simplification steps, and reassign the connections from each simplified vertex back to the two vertices on the edge. Furthermore, to maintain consistency in the mesh topology, we use the same adjacency matrix as in the corresponding encoder block. This approach not only ensures the accurate restoration of vertex features but also avoids cumulative errors that may arise from multiple transformations. As shown in Figure 3(b) and (c), in all our experiments, the first layer of pooling contracts 50% of the original vertices, and the second layer contracts 25% of the original vertices to support effective pooling.
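The pooling and unpooling operations described above can be sketched with a recorded fine-to-coarse vertex map. This is a minimal NumPy illustration under assumptions: the map `fine_to_coarse` stands in for the records kept during simplification, "evenly redistributed" is interpreted here as copying the coarse feature to every fine vertex it replaced, and the function names are hypothetical.

```python
import numpy as np


def pool(features, fine_to_coarse, n_coarse):
    """Average the features of all fine vertices merged into each coarse vertex."""
    pooled = np.zeros((n_coarse, features.shape[1]))
    counts = np.zeros(n_coarse)
    np.add.at(pooled, fine_to_coarse, features)  # accumulate per coarse vertex
    np.add.at(counts, fine_to_coarse, 1.0)       # number of fine vertices per coarse vertex
    return pooled / counts[:, None]              # assumes every coarse vertex is non-empty


def unpool(coarse_features, fine_to_coarse):
    """Copy each coarse vertex's features back to the fine vertices it replaced."""
    return coarse_features[fine_to_coarse]


# Fine vertices 0 and 1 were contracted into coarse vertex 0; 2 and 3 are kept.
fine_to_coarse = np.array([0, 0, 1, 2])
features = np.array([[1.0, 1.0], [3.0, 3.0], [5.0, 5.0], [7.0, 7.0]])
pooled = pool(features, fine_to_coarse, n_coarse=3)
restored = unpool(pooled, fine_to_coarse)
```

Applying `pool` once per simplification level, each with its own map, gives the multilayer hierarchy; the decoder reuses the same maps in reverse order.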

3.3. Loss Function Design

When synthesizing a new 3D shape, it is often desired to control the type of shape generated, especially when the training dataset contains different shapes. In our improved VAE-based network training, we utilize mean-squared error (MSE) loss and Kullback-Leibler (KL) divergence loss (Tan et al., 2018; Van Erven and Harremos, 2014).

MSE loss is primarily used to measure the positional reconstruction error between the generated mesh model and the target shape. By calculating the squared differences between the vertex positions of the generated mesh and the corresponding vertex positions of the target shape, the network is encouraged to learn to generate vertex positions as close as possible to the input shape. This ensures that the created mesh has vertices positioned similarly to those in the target shape, thereby optimizing the positional accuracy of the generated shape. However, it is important to note that MSE loss focuses on vertex positions and does not directly measure the preservation of the overall geometric structure. This approach is effective as our network maintains the topology of the mesh, ensuring that the overall geometric structure remains consistent during the generation.

$$L_{MSE} = \frac{1}{2 M} \sum_{i = 1}^{M} {‖ X^{i} - {\hat{X}}^{i} ‖}_{F}^{2} \qquad (5)$$

In the equation, $X^{i}$ and ${\hat{X}}^{i}$ represent the preprocessed features and network output for the $i$th model, respectively, and $M$ is the total number of shapes in the dataset.

The difference between the generated latent space distribution and the standard normal distribution is measured using the KL divergence loss. By introducing KL divergence, it forces representations in the latent space to be more regular, assisting in the production of more continuous and interpretable shapes. Its formula is as follows:

$$L_{KL} = D_{KL} (q (z ∣ X, c) ‖ p (z ∣ c)) \qquad (6)$$

where $z$ represents the latent variable, which is obtained by mapping the input data through the encoder, $p (z ∣ c)$ represents the prior probability, $q (z ∣ X, c)$ represents the posterior probability distribution, and $D_{KL}$ represents the KL divergence.

Finally, the loss function $Loss$ we use is the weighted sum of the two sublosses:

$$Loss = L_{MSE} + α L_{KL} \qquad (7)$$

where $α$ is a parameter that balances the MSE loss and the KL divergence; its selection is analyzed in the experimental section.
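The combined objective in Equations (5) to (7) can be sketched as below. This NumPy example is a simplification under stated assumptions: the posterior is taken to be a diagonal Gaussian parameterized by a mean and log-variance, the prior is taken to be a standard normal (the conditioning variable $c$ is ignored), and the KL term is averaged over shapes. None of these choices is confirmed by the paper, and the function names are hypothetical.

```python
import numpy as np


def mse_loss(x, x_hat):
    """Eq. (5): x and x_hat have shape (M, V, 9)."""
    m = x.shape[0]
    return np.sum((x - x_hat) ** 2) / (2.0 * m)


def kl_loss(mean, log_var):
    """Closed-form KL(N(mean, diag(exp(log_var))) || N(0, I)), averaged over shapes."""
    per_shape = 0.5 * np.sum(mean**2 + np.exp(log_var) - 1.0 - log_var, axis=1)
    return per_shape.mean()


def total_loss(x, x_hat, mean, log_var, alpha=0.1):
    """Eq. (7) with alpha = 0.1, the value selected in Section 4.2."""
    return mse_loss(x, x_hat) + alpha * kl_loss(mean, log_var)


x = np.ones((2, 4, 9))        # M = 2 shapes, V = 4 vertices, 9 features each
x_hat = np.zeros((2, 4, 9))
mean = np.zeros((2, 128))
log_var = np.zeros((2, 128))
loss = total_loss(x, x_hat, mean, log_var)
```

With a posterior equal to the prior, the KL term is zero and the loss reduces to the reconstruction term alone.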

4. Experiment and Result Analysis

4.1. Mesh Dataset

We used several deforming shape datasets, including the SCAPE dataset (Anguelov et al., 2005), Swing dataset (Vlasic et al., 2008), Camel dataset, Horse dataset (Sumner and Popović, 2004), Dense dataset, and Dyna dataset samples (Pons-Moll et al., 2015). Unless otherwise specified, each dataset was divided into two sections for training and testing at random. This splitting strategy aids in evaluating the network’s reconstruction performance on geometries not seen during training. We normalized the mesh models of the same category to ensure fairness in error comparison. Specifically, each model was normalized based on the proportion of the length of its bounding box diagonal, allowing for geometric error comparison among models of the same category. This approach effectively eliminates size differences within the same category, making the error calculation more reasonable. We calculated the reconstruction error, root MSE (RMSE), and geodesic distance to assess the network’s performance on the test set. The geodesic distance (Bouttier et al., 2003) was calculated using the following formula:

$$d_{geo} (u, v) = \min_{γ \in Γ_{u, v}} \int_{γ} ‖ \dot{γ} (t) ‖ \, d t \qquad (8)$$

where $d_{geo} (u, v)$ represents the geodesic distance between vertices $u$ and $v$, $Γ_{u, v}$ is the set of all possible paths between $u$ and $v$, $γ (t)$ is a specific path parameterized by $t$, and $‖ \dot{γ} (t) ‖$ is the norm of the velocity along the path at point $t$. This measure provides a more accurate reflection of the true geometric distance between points on the mesh compared to Euclidean distance.
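On a discrete mesh, Equation (8) has to be approximated. One common approximation, shown below as a sketch, restricts paths to mesh edges and runs Dijkstra's algorithm over the edge graph, which yields an upper bound on the true surface geodesic. The paper does not state which geodesic solver was used, so this is an assumed stand-in, and `edge_graph` and `geodesic_distance` are hypothetical names.

```python
import heapq
import math


def edge_graph(vertices, faces):
    """Build a weighted adjacency map from triangle edges."""
    graph = {i: {} for i in range(len(vertices))}
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            w = math.dist(vertices[u], vertices[v])  # Euclidean edge length
            graph[u][v] = w
            graph[v][u] = w
    return graph


def geodesic_distance(graph, source, target):
    """Shortest path length along mesh edges (Dijkstra's algorithm)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf  # target not reachable


# A flat unit square split along the diagonal (0, 2).
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(0, 1, 2), (0, 2, 3)]
graph = edge_graph(vertices, faces)
d02 = geodesic_distance(graph, 0, 2)  # follows the diagonal edge
d13 = geodesic_distance(graph, 1, 3)  # must detour through vertex 0 or 2
```

The second query shows the limitation of the edge-restricted approximation: vertices 1 and 3 are a diagonal apart on the flat surface, but the shortest edge path is longer because no edge connects them directly.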

4.2. Environment Settings and Implementation Details

The experiments were conducted on a machine equipped with an Intel Xeon(R) CPU E5-2620 v4 @ 2.10 GHz (16 cores), an NVIDIA GTX 1080Ti GPU, 16 GB of RAM, and 2 TB of solid-state drive storage. The deep learning framework used for the experiments was PyTorch. Additionally, the operating system was Ubuntu 20.04, with CUDA Toolkit 11.0 installed to enable integration with PyTorch, optimizing the training speed and performance of neural networks.

In our experiments, we set the graph convolution hyperparameter to $H = 3$ and the weight in the total loss function to $α = 0.1$. The dimension of the latent space vectors in the network model is set to 128; the choice of feature dimension is examined in detail in the experiments. To avoid overfitting, L2 regularization is applied to the network weights. We employ the Adam optimizer (Kingma and Ba, 2014) with a learning rate of 0.0001. Training our model on the datasets described in Section 4.1 for 200 iterations took approximately 6.5 h on average.

Table 1.
Generated Mesh RMSE Values for Different $α$ Values.

Dataset	$α = 0.1$	$α = 0.3$	$α = 0.5$	$α = 0.7$
Camel	0.0064	0.0077	0.0091	0.0097
Dyna	0.0079	0.0085	0.0092	0.0103
Swing	0.0067	0.0069	0.0068	0.0092
Dense	0.0098	0.0106	0.0139	0.0190

Note. RMSE = root mean-squared error. The bold font highlights the results obtained by the method in this paper.

Because our mesh generation model uses both MSE and KL divergence as loss functions, we place a weighting factor $α$ before the KL divergence to balance these two parts of the loss. As shown in Table 1, the experiment compares the reconstruction errors for various values of $α$ (0.1, 0.3, 0.5, 0.7). The setting $α = 0.1$ gives the lowest RMSE on all four datasets. During training, $α$ balances the contributions of the reconstruction error and the latent space regularization term. Smaller values of $α$ reduce the impact of KL divergence on the overall loss, emphasizing reconstruction quality. This balance ensures that the generated meshes retain high quality while better fitting the input data, providing a solid basis for further model refinement and applications.

Table 2.

RMSE and Geodesic Distance Values of Meshes Generated by Different Methods.

Dataset	Vertex count	Mesh VAE (Tan et al., 2018)		MeshPooling (Yuan et al., 2020)		DGNet (Li et al., 2023c)		Ours
		RMSE	Geodesic distance	RMSE	Geodesic distance	RMSE	Geodesic distance	RMSE	Geodesic distance
Camel	11063	0.0159	0.0306	0.0137	0.0293	0.0097	0.0104	0.0064	0.0079
Dyna	6890	0.0418	0.0685	0.0342	0.0529	0.0158	0.0201	0.0079	0.0092
Swing	9971	0.0397	0.0477	0.0281	0.0414	0.0114	0.0157	0.0067	0.0088
Horse	8431	0.0193	0.0361	0.0174	0.0206	0.0106	0.0177	0.0083	0.0091
Scape	12500	0.0954	0.1108	0.0790	0.1032	0.0450	0.0588	0.0164	0.0199
Elephant	42321	0.1082	0.1504	0.0922	0.1492	0.0812	0.0951	0.0583	0.0501
Dense	10002	0.0611	0.0652	0.0441	0.0533	0.0203	0.0294	0.0098	0.0120

Note. RMSE = root mean-squared error; VAE = variational autoencoder. The bold font highlights the results obtained by the method in this paper.

4.3. Experimental Method Comparison

We compared our mesh generation method against the baseline network Mesh VAE (Tan et al., 2018), MeshPooling (Yuan et al., 2020), and DGNet (Li et al., 2023c) in terms of reconstruction error (RMSE) and geodesic distance. As shown in Table 2, our framework achieves the lowest RMSE and the lowest geodesic distance on all seven datasets. For example, on Dyna the RMSE falls from 0.0158 (DGNet) to 0.0079, and on Scape from 0.0450 to 0.0164. These results indicate that our method reconstructs the original mesh structure more accurately than the compared approaches, including complex geometry and fine detail.
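As an illustration of the reconstruction metric, the sketch below computes an RMSE between a reconstructed mesh and its reference, assuming both share the same vertex ordering. It takes the mean of the squared Euclidean vertex displacements before the square root; whether the paper averages per vertex or per coordinate is not stated here, so this normalization is an assumption, as is the function name.

```python
import math

def vertex_rmse(reconstructed, reference):
    """RMSE of vertex positions between two meshes with identical vertex order.

    Each mesh is a sequence of (x, y, z) tuples.
    """
    if len(reconstructed) != len(reference):
        raise ValueError("meshes must have the same number of vertices")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(reconstructed, reference):
        # Squared Euclidean displacement of one vertex.
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(reference))
```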

Figure 4.

The heatmap visualization of the reconstruction results in the comparison of methods.

We also conducted a set of qualitative studies to evaluate our mesh generation method. Figure 4 visualizes the reconstruction errors and Figure 5 the geodesic distances, comparing our approach with existing mesh generation methods. In both figures, our technique shows lower reconstruction errors and geodesic distances across multiple datasets, indicating that the generated meshes better preserve the original structures. The heatmaps show that our results contain larger regions of low distortion and capture fine details better than those of the other methods.

Figure 5.

Visualization comparison of geodesic distances across different methods.

For a closer comparison of reconstruction details, we visually contrast our generated results with those of DGNet (Li et al., 2023c). Figure 6 shows that the meshes produced by DGNet have visible flaws within the orange circles, particularly in the intricate regions of the face, indicating information loss. Together, the quantitative and qualitative results show that our method outperforms the compared state-of-the-art methods in reconstructing mesh details.

Figure 6.

Visual comparison of reconstruction results between the DGNet (Li et al., 2023c) approach and our method.

4.4. Ablation Experiment

We conducted a series of ablation experiments to compare different network architectures and settings and to assess the factors that significantly affect encoding and decoding quality. These experiments were intended to validate the contribution of each component of the method to overall performance.

4.4.1. The Ablation Experiment of Pooling Layer Based on Triangle Mesh Simplification

Our model incorporates pooling operations based on triangular mesh simplification. To validate the effect of pooling on the network, we compared three configurations: no pooling, single-layer pooling, and double-layer pooling (ours). Table 3 reports the reconstruction errors for these three configurations. Multilayer pooling and unpooling give a considerable advantage in model performance and in the network’s ability to reconstruct unseen shapes. Multilayer pooling helps the network learn more complex shape structures and capture more features, and it is especially beneficial for mesh models with hierarchical structures.
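The following sketch illustrates one common way to realize pooling and unpooling on top of a mesh simplification. It assumes that the simplification step has already produced a `cluster` list that maps every fine vertex to the coarse vertex it was merged into; pooling then averages the features of merged vertices and unpooling copies each coarse feature back. The names and the averaging rule are assumptions for illustration and are not taken from this paper’s implementation.

```python
def pool(features, cluster):
    """Average the features of all fine vertices merged into one coarse vertex.

    features: list of per-vertex feature lists (fine level).
    cluster:  cluster[i] is the coarse vertex index of fine vertex i;
              every coarse index is assumed to occur at least once.
    """
    n_coarse = max(cluster) + 1
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(n_coarse)]
    counts = [0] * n_coarse
    for feat, c in zip(features, cluster):
        counts[c] += 1
        for k, value in enumerate(feat):
            sums[c][k] += value
    return [[s / counts[c] for s in sums[c]] for c in range(n_coarse)]

def unpool(coarse_features, cluster):
    """Copy each coarse feature back to the fine vertices it represents."""
    return [list(coarse_features[c]) for c in cluster]
```

Stacking two such steps, each with its own `cluster` list, gives the double-layer configuration: four vertices pooled with `[0, 0, 1, 1]` and then with `[0, 0]` end up as a single coarse vertex.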

Table 3.
Ablation Experiment of Pooling Layer.

Dataset	No pooling layer	Single-layer pooling	Ours
Camel	0.1032	0.0172	0.0064
Dyna	0.0674	0.0199	0.0079
Swing	0.0896	0.0181	0.0067
Dense	0.1013	0.0994	0.0098

The bold font highlights the results obtained by the method in this paper.

We extended the experiment with error heatmaps for the no-pooling, single-layer pooling, and proposed multilayer pooling models. The heatmaps give a visual comparison of the pooling configurations on several shape reconstruction tasks. Figure 7 shows that without pooling, the reconstruction error is relatively large. Single-layer pooling reduces the error, whereas our multilayer pooling model achieves the lowest error in all cases. This further confirms the benefit of multilayer pooling for shape reconstruction.

Figure 7.

Visualization of ablation experiments on the pooling layer based on triangle mesh simplification.

4.4.2. Ablation Experiment on Graph Convolutional Residual Blocks

The goal of the ablation experiment on the graph convolutional residual block was to understand how each component of this module contributes to overall model performance. First, we built a baseline model that included the complete graph convolutional residual block and measured its performance on the dataset. We then retrained and evaluated two variants: one in which the graph convolutional residual block was replaced with plain graph convolution, and one in which the random walks (RWs) were replaced with batch normalization.
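The difference between a plain graph convolution and a residual block can be sketched as follows. This toy version uses mean aggregation over a vertex and its neighbors, a single square weight matrix, and a ReLU before the skip connection; all of these are simplifying assumptions, and the sketch omits the random-walk component entirely, so it is not the block used in this paper.

```python
def graph_conv(features, neighbors, weight):
    """Mean-aggregate each vertex with its neighbors, then apply `weight`.

    features:  list of per-vertex feature lists.
    neighbors: neighbors[i] lists the vertex indices adjacent to vertex i.
    weight:    square matrix given as a list of rows.
    """
    out = []
    for i, x in enumerate(features):
        group = [x] + [features[j] for j in neighbors[i]]
        mean = [sum(col) / len(group) for col in zip(*group)]
        out.append([sum(w * m for w, m in zip(row, mean)) for row in weight])
    return out

def residual_block(features, neighbors, weight):
    """Skip connection: output = input + ReLU(graph_conv(input))."""
    conv = graph_conv(features, neighbors, weight)
    return [[x + max(0.0, h) for x, h in zip(xi, hi)]
            for xi, hi in zip(features, conv)]
```

With an all-zero weight matrix the residual block returns its input unchanged, which is the identity path that lets gradients pass through deep stacks of such blocks.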

Table 4 shows the results of these ablation tests. Incorporating graph convolutional residual blocks into the model greatly improves reconstruction performance: the full model has the lowest reconstruction error on every dataset, and removing the residual blocks produces the largest errors. Removing the RWs also increases the error on every dataset, which indicates that RWs contribute substantially to the accuracy of the network when dealing with graph data.

Table 4.
Ablation Experiments on Graph Residual Blocks and Random Walks.

Dataset	No graph residual blocks	No random walks	Ours
Camel	0.0251	0.0112	0.0064
Dyna	0.0194	0.0103	0.0079
Swing	0.0197	0.0152	0.0067
Dense	0.0399	0.0235	0.0098

The bold font highlights the results obtained by the method in this paper.

We also conducted an ablation experiment to investigate the effect of graph convolutional residual blocks over the course of training. At different training epochs, we compared the reconstruction errors of two models, one with graph convolutional residual blocks and one without. The experiment used the Dense dataset, and the results are shown in Figure 8. At the same epoch, the model with graph convolutional residual blocks has a much lower reconstruction error than the model without them.

Figure 8.

The impact of graph convolutional residual blocks on the training epochs.

These experimental results show that the graph convolutional residual block benefits the training process. First, it accelerates convergence, allowing the model to learn the data representation faster. Second, it improves stability by reducing fluctuations during training, so the model reaches better performance in fewer training epochs.

Table 5.

Model Complexity and Parameter Comparison With Baseline Networks.

Model name	Parameters (millions)	Training time (h)	RMSE	Geodesic distance	Generation capability
Baseline network (Mesh VAE; Tan et al., 2018)	1651	5.2	0.0545	0.0728	Average
Our network	2156	6.5	0.0096	0.0185	Excellent

Note. Mesh VAE = Mesh variational autoencoder; RMSE = root mean-squared error. The bold font highlights the results obtained by the method in this paper.

4.4.3. Model Complexity and Baseline Comparison

To further evaluate the difference in model complexity between our proposed method and the baseline method (Mesh VAE; Tan et al., 2018), we analyzed the model parameters. Our model has about 31% more parameters than the baseline network (2156 vs. 1651 in Table 5). This increase gives the model greater representation capacity, enabling higher accuracy and better generalization on complex mesh data. In reconstruction tasks, for example, our method is better at capturing subtle geometric features and at generating new mesh instances. The additional parameters also make the latent space more expressive, which benefits shape interpolation and generation. In contrast, the baseline network does not use residual modules and employs only a single-layer pooling structure, which limits its ability to handle complex meshes. Although our model requires a longer training time than the baseline network (6.5 vs. 5.2 in Table 5), this cost is offset by the significantly lower RMSE (0.0096 vs. 0.0545) and geodesic distance (0.0185 vs. 0.0728), making our method more competitive in practical applications.

4.5. Framework Evaluation

We validated our framework’s design choices, including the residual blocks and the pooling layer based on triangular mesh simplification, by comparing results under different parameter settings and input conditions.

Latent Space Vector Dimensions

We compared the reconstruction errors (i.e., the RMSE of the vertex positions) of meshes generated with different latent vector dimensions. The results in Table 6 suggest that 128 dimensions is a good trade-off: 64 dimensions do not capture enough information, whereas 256 dimensions lower the RMSE only marginally (by at most 0.0003) and may lead to overfitting.

Table 6.
Mesh RMSE Values for Different Latent Space Vector Dimensions.

	Latent space vector dimension
Dataset	64	128	256
Camel	0.0071	0.0064	0.0062
Dyna	0.0088	0.0079	0.0078
Swing	0.0080	0.0067	0.0064
Horse	0.0094	0.0083	0.0081

Note. RMSE = root mean-squared error.

Impact of Dataset Size on Reconstruction Error

We also ran mesh generation tests on the Dyna and Swing datasets and plotted the resulting RMSE against dataset size. As shown in Figure 9, our technique performs well on both datasets. Notably, even with a smaller dataset it achieves relatively low RMSE values. This property matters for small-scale data scenarios in practice and highlights the robustness and efficiency of our method under low-data conditions. These results support the generalization performance of our approach in mesh generation and its feasibility in real-world applications.

Figure 9.
The effect of various dataset sizes on reconstruction error (RMSE). Note. RMSE = root mean-squared error.

4.6. Generation of New Shapes

Because the network learns a latent representation of the data during training, we can use the learned latent space and the decoder to generate new shapes. Feeding a sample from the standard normal distribution, $ε \sim N (0, I)$, to the trained decoder produces the shapes shown in Figure 10, which demonstrate the network’s generative capability. The generated shapes are visually plausible and varied.
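The sampling step can be sketched in a few lines. The snippet below draws a latent vector with independent standard normal entries and passes it to a decoder; `decoder` stands for any callable that maps a latent vector to a mesh and is a hypothetical placeholder, not the trained network. The default dimension of 128 follows the setting reported above.

```python
import random

def sample_latent(dim=128, seed=None):
    """Draw one latent vector with independent N(0, 1) entries."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def generate_shape(decoder, dim=128, seed=None):
    """Decode a random latent sample; `decoder` is a hypothetical callable."""
    return decoder(sample_latent(dim, seed))
```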

Figure 10.

Randomly generated new shapes and their nearest neighbors in the original collection.

We used a simple procedure to check whether the generated shapes differ from those in the training dataset. For each generated shape, we computed the average Euclidean distance between corresponding vertices and identified the closest sample in the training dataset, that is, the nearest-neighbor mesh model shown in Figure 10. The visual comparison shows that the generated shapes clearly differ from every shape in the training dataset.
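A nearest-neighbor search of this kind can be sketched as follows, assuming that all meshes share the same vertex ordering so that vertices can be compared index by index. The function names are illustrative.

```python
import math

def mean_vertex_distance(mesh_a, mesh_b):
    """Average Euclidean distance between corresponding vertices."""
    return sum(math.dist(p, q) for p, q in zip(mesh_a, mesh_b)) / len(mesh_a)

def nearest_training_shape(generated, training_set):
    """Return (index, distance) of the training mesh closest to `generated`."""
    distances = [mean_vertex_distance(generated, mesh) for mesh in training_set]
    best = min(range(len(distances)), key=distances.__getitem__)
    return best, distances[best]
```

A generated shape whose nearest-neighbor distance is clearly above zero is not a copy of a training sample.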

4.7. Shape Interpolation

We ran shape interpolation experiments on the Dense dataset to explore the shape space learned by the model. We selected two shape representations in the latent space, shown in the orange box on the left side of Figure 11, and obtained a sequence of intermediate shapes by gradually interpolating along a continuous path between them. We also overlaid the generated shapes, as illustrated on the right side of Figure 11, which indicates the potential for animation effects. The experiment demonstrates the continuity and smoothness of the learned latent space and how well the model transitions between shapes, giving a more complete picture of the model’s generative capabilities.
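One simple way to obtain such a path is linear interpolation between the two latent vectors, sketched below. The paper does not state which interpolation scheme it uses, so the linear form and the function name are assumptions; each returned vector would then be passed through the decoder to obtain an intermediate shape.

```python
def interpolate_latents(z_start, z_end, steps):
    """Return `steps` latent vectors evenly spaced from z_start to z_end."""
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    path = []
    for k in range(steps):
        t = k / (steps - 1)  # t runs from 0.0 to 1.0 inclusive
        path.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path
```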

Figure 11.

Shape interpolation results of our framework.

5. Conclusion

In this paper, we present a mesh VAE framework that addresses the complexity and irregularity of triangular meshes. Multilayer pooling operations based on a triangular mesh simplification algorithm improve the network’s generalization capability, and graph convolutional residual modules mitigate gradient vanishing in deep networks and speed up convergence; together they improve triangular mesh reconstruction accuracy. Experimental results show that our framework outperforms the compared methods on deformable shape collections in applications such as shape generation and interpolation. A limitation of our mesh generation model is that it can only handle homogeneous meshes. Future work should develop a framework that accepts shapes with diverse topologies as input or that generates sequences of deformed data through data augmentation.

Footnotes

ORCID iD

Cheng Han

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the National Key R&D Program of China (grant no. 2020YFB1709200).

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

Anguelov, Srinivasan, Koller, Thrun, Rodgers, Davis (2005). Scape: Shape completion and animation of people. In ACM SIGGRAPH 2005 papers (pp. 408–416). ACM.

Bouttier, Di Francesco, Guitter (2003). Geodesic distance in planar graphs. Nuclear Physics B, 663(3), 535–567. https://doi.org/10.1016/S0550-3213(03)00355-9

Dong, Wang, Gao, Chen, Shu, Xin, Wang (2023). Laplacian2mesh: Laplacian-based mesh understanding. IEEE Transactions on Visualization and Computer Graphics, 30(7), 4349–4361. https://doi.org/10.1109/TVCG.2023.3259044

Fan (2023). MBA: Backdoor attacks against 3D mesh classifier. IEEE Transactions on Information Forensics and Security, 19, 2127–2142. https://doi.org/10.1109/TIFS.2023.3346644

Feng, You, Zhao, Gao (2019). MeshNet: Mesh neural network for 3d shape representation. Proceedings of the AAAI Conference on Artificial Intelligence, 33, 8279–8286. https://doi.org/10.1609/aaai.v33i01.33018279

Gao, Lai, Y.-K., Yang, Zhang, L.-X., Xia, Kobbelt (2021). Sparse data driven mesh deformation. IEEE Transactions on Visualization and Computer Graphics, 27(3), 2085–2100. https://doi.org/10.1109/TVCG.2019.2941200

Gao, Wang, Metzer, Yeh, R. A., Hanocka (2022). Tetgan: A convolutional neural network for tetrahedral mesh generation. arXiv preprint arXiv:2210.05735. https://doi.org/10.48550/arXiv.2210.05735

Garland, Heckbert, P. S. (1997). Surface simplification using quadric error metrics. In Proceedings of the 24th annual conference on computer graphics and interactive techniques (pp. 209–216). ACM.

Graham, Engelcke, Van Der Maaten (2018). 3D semantic segmentation with submanifold sparse convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 9224–9232). IEEE.

Haim, Segol, Ben-Hamu, Maron, Lipman (2019). Surface networks via general covers. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 632–641). IEEE.

Hanocka, Hertz, Fish, Giryes, Fleishman, Cohen-Or (2019). Meshcnn: A network with an edge. ACM Transactions on Graphics (TOG), 38(4), 1–12. https://doi.org/10.1145/3306346.3322959

Bai, Shang, Zhang, Dong, Wang, Sun, Tai, C.-L. (2021). VMNet: Voxel-mesh network for geodesic-aware 3D semantic segmentation. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 15488–15498). IEEE.

S.-M., Liu, Z.-N., Guo, M.-H., Cai, J.-X., Huang, T.-J., Martin, R. R. (2022). Subdivision-based mesh convolution networks. ACM Transactions on Graphics (TOG), 41(3), 1–16. https://doi.org/10.1145/3506694

Kingma, D. P., Ba (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. https://doi.org/10.48550/arXiv.1412.6980

Kingma, D. P., Welling (2014). Auto-encoding variational Bayes. Stat, 1050, 1. https://doi.org/10.48550/arXiv.1312.6114

Lahav, Tal (2020). MeshWalker: Deep mesh understanding by random walks. ACM Transactions on Graphics (TOG), 39(6), 1–13. https://doi.org/10.1145/3414685.3417806

Sun, Chen (2018). PointCNN: Convolution on X-transformed points. In Bengio, S., Wallach, H., Larochelle, H., et al. (Eds.), Advances in neural information processing systems (Vol. 31). MIT Press.

Fan, Yan (2023a). 3D reconstruction based on hierarchical reinforcement learning with transferability. Integrated Computer-Aided Engineering, 30(4), 327–339. https://doi.org/10.3233/ICA-230710

Fan, Song (2023b). TPNet: A novel mesh analysis method via topology preservation and perception enhancement. Computer Aided Geometric Design, 104, 102219. https://doi.org/10.1016/j.cagd.2023.102219

Li, X.-L., Liu, Z.-N., Chen, T.-J., Martin, R. R., S.-M. (2023c). Mesh neural networks based on dual graph pyramids. IEEE Transactions on Visualization and Computer Graphics, 30(7), 4211–4224. https://doi.org/10.1109/TVCG.2023.3257035

Liang, Zhao, Zhang (2022). MeshMAE: Masked autoencoders for 3D mesh data analysis. In European conference on computer vision (pp. 37–54). Springer.

Litany, Bronstein, Makadia (2018). Deformable shape completion with graph convolutional autoencoders. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1886–1895). IEEE.

Maron, Galun, Aigerman, Trope, Dym, Yumer, Kim, V. G., Lipman (2017). Convolutional neural networks on surfaces via seamless toric covers. ACM Transactions on Graphics, 36(4), Article No. 71. https://doi.org/10.1145/3072959.3073616

Maturana, Scherer (2015). VoxNet: A 3D convolutional neural network for real-time object recognition. In 2015 IEEE/RSJ international conference on intelligent robots and systems (IROS) (pp. 922–928). IEEE.

Monti, Boscaini, Masci, Rodola, Svoboda, Bronstein, M. M. (2017). Geometric deep learning on graphs and manifolds using mixture model CNNs. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5115–5124). IEEE.

Pei, Wei, Chang, K. C.-C., Lei, Yang (2019). Geom-GCN: Geometric graph convolutional networks. In International conference on learning representations. ACM.

Pons-Moll, Romero, Mahmood, Black, M. J. (2015). Dyna: A model of dynamic human shape in motion. ACM Transactions on Graphics (TOG), 34(4), 1–14. https://doi.org/10.1145/2766993

C. R., Guibas, L. J. (2017a). PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 652–660). IEEE.

C. R., Guibas, L. J. (2017b). PointNet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in neural information processing systems (Vol. 30). MIT Press.

Ranjan, Bolkart, Sanyal, Black, M. J. (2018). Generating 3D faces using convolutional mesh autoencoders. In Proceedings of the European conference on computer vision (ECCV) (pp. 704–720). Springer.

Riegler, Osman Ulusoy, Geiger (2017). OctNet: Learning deep 3D representations at high resolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3577–3586). IEEE.

Sharp, Attaiki, Crane, Ovsjanikov (2022). DiffusionNet: Discretization agnostic learning on surfaces. ACM Transactions on Graphics (TOG), 41(3), 1–16. https://doi.org/10.1145/3507905

Smirnov, Solomon (2021). HodgeNet: Learning spectral geometry on triangle meshes. ACM Transactions on Graphics (TOG), 40(4), 1–11. https://doi.org/10.1145/3450626.3459797

Sumner, R. W., Popović (2004). Deformation transfer for triangle meshes. ACM Transactions on Graphics (TOG), 23(3), 399–405. https://doi.org/10.1145/1015706.1015736

Tan, Gao, Lai, Y.-K., Xia (2018). Variational autoencoders for deforming 3D mesh models. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5841–5850). IEEE.

Tatarchenko, Park, Koltun, Zhou, Q.-Y. (2018). Tangent convolutions for dense prediction in 3D. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3887–3896). IEEE.

Thomas, C. R., Deschaud, J.-E., Marcotegui, Goulette, Guibas, L. J. (2019). Kpconv: Flexible and deformable convolution for point clouds. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 6411–6420). IEEE.

Van Erven, Harremos (2014). Rényi divergence and Kullback–Leibler divergence. IEEE Transactions on Information Theory, 60(7), 3797–3820. https://doi.org/10.1109/TIT.2014.2320500

Vlasic, Baran, Matusik, Popović (2008). Articulated mesh animation from multi-view silhouettes. In ACM SIGGRAPH 2008 papers (pp. 1–9). ACM.

Wang, P.-S., Sun, C.-Y., Liu, Tong (2018). Adaptive O-CNN: A patch-based deep representation of 3D shapes. ACM Transactions on Graphics (TOG), 37(6), 1–11. https://doi.org/10.1145/3272127.3275050

Wang, P.-S., Yang, Y.-Q., Zou, Q.-F., Liu, Tong (2021). Unsupervised 3D learning for shape analysis via multiresolution instance discrimination. In Proceedings of the AAAI conference on artificial intelligence (Vol. 35, pp. 2773–2781). AAAI.

Song, Khosla, Zhang, Tang, Xiao (2015). 3D ShapeNets: A deep representation for volumetric shapes. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1912–1920). IEEE.

Xiao, Y.-P., Lai, Y.-K., Zhang, F.-L., Gao (2020). A survey on deep geometry learning: From a representation perspective. Computational Visual Media, 6, 113–133. https://doi.org/10.1007/s41095-020-0174-8

Yang, Liu, Pan, Liu, Tong (2020). PFCNN: Convolutional neural networks on 3D surfaces using parallel frames. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 13578–13587). IEEE.

Yuan, Y.-J., Lai, Y.-K., Yang, Duan, Gao (2020). Mesh variational autoencoders with edge contraction pooling. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops (pp. 274–275). IEEE.

Zhao, Jiang, Jia, Torr, P. H., Koltun (2021). Point transformer. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 16259–16268). IEEE.

Triangle Mesh Reconstruction by Fusion of Residual Graph Convolution and Edge Contraction Pooling
