Recognition to weightlifting postures using convolutional neural networks with evaluation mechanism

Abstract

In modern sports training, recognizing an athlete’s critical postures can support coaching. This paper proposes convolutional neural networks combined with a two-stage evaluation mechanism to recognize four critical postures of a weightlifter: force releasing, knee flexion, knee extension and highest point. The proposed convolutional neural networks classify images and extract image features. A two-stage evaluation mechanism then calculates scores for the image features, and the four critical postures are recognized from these scores. Experimental results show that our method reaches an accuracy of 92.85% on the validation set in recognizing the four critical postures, which outperforms the competing methods. Moreover, the training time of the proposed method grows linearly rather than exponentially with data volume, so the method can be applied to large-scale image datasets. We demonstrate that the two-stage mechanism calculates the scores of image features independently of specific scenarios, which helps the neural networks improve their classification capability. In addition, the two-stage mechanism simplifies the design of the neural network architecture, thus reducing the number of training parameters in critical posture recognition.

Keywords

Convolutional neural networks, posture recognition, two-stage evaluated mechanism

Introduction

In weightlifting training and competition, coaches need to design training plans tailored to the individual situation of each athlete to improve their skills.¹ Traditional training methods build such plans from the coach’s training theory and experience, combined with the technical level of the weightlifter. This approach is subjective; moreover, coaches spend much time analyzing athletes’ postures, which makes it difficult to evaluate training effectiveness objectively.² However, the core of modern sports training is precision and efficiency, which requires coaches to evaluate athletes’ postures accurately in order to improve training effectiveness. As such, accurately identifying the training postures of weightlifters is of great significance for making coaches’ training plans more scientific and for improving athletes’ training effectiveness.³

Weightlifting posture recognition is a form of human posture recognition. Currently, there are two main approaches to human posture recognition. One is inertial sensor recognition: the athlete’s body is fitted with simple, lightweight data acquisition sensors, which send the collected data in real time to a processing terminal, where the athlete’s postures are recognized from the posture data. Its disadvantage is the large number of devices required, which hinders adoption. The other is image recognition. According to the number of image acquisition devices, image-based posture recognition can be divided into monocular video recognition and multi-camera video recognition. The general idea is to capture images or video of the athlete with a camera, then extract the motion features contained in the images or video, and finally design a classifier to recognize the athlete’s postures.

Considerable progress in action recognition has been made in the computer vision and multimedia communities. Most existing methods focus on human posture estimation, foreground segmentation, and local foreground motion modeling to explore the characteristics of action performers. For example, Simonyan and Zisserman⁴ proposed a two-stream convolutional network that combines spatial and temporal information for action recognition. Li et al.⁵ utilized a temporal attention model to create global guidance for fusing visual and motion features. Additionally, the methods in references^6–11 address posture estimation and action recognition. To achieve action recognition, Li et al.¹² took into account hand motion features, gaze features, and object features. To identify different actions, Cai et al.¹³ considered the relationship between grasp types and object attributes. For detecting object-instance actions, Baradel et al.¹⁴ proposed an object relation network to infer semantically meaningful spatiotemporal interactions.

For action recognition, some methods use full action sequences for pattern learning,¹⁵ such as those in references^16–18, but they often overlook semantics. In fact, action recognition and motion prediction reinforce each other. With the development of deep learning techniques, some recurrent neural network (RNN) models, such as those in references^22–25, have captured temporal correlations across consecutive frames,¹⁹ and convolutional neural networks have also achieved remarkable results.^20,21 These methods rarely study joint relationships and lack key dynamics. To capture richer features, Yan et al.¹⁵ proposed a skeleton graph with nodes. The methods in references^26,27 established relationships between coarser body parts. These methods are essentially based on aggregating body structure information, while ignoring some implicit relationships related to movements. In addition, graph-based methods have also attracted attention.^28–32 Graphs can effectively represent data with non-grid structures;^33,34 for instance, Simonyan and Zisserman⁴ leveraged graph convolution for action recognition. Besides these methods, references^35–37 provide insightful suggestions for image recognition.

Motivations

Our research goal is to recognize the four critical postures of a weightlifter, that is, force releasing, knee flexion, knee extension and highest point, thus providing insights for weightlifter training. To achieve this goal, we designed convolutional neural networks with four convolutional layers, named F-CNNs. Furthermore, we proposed a two-stage evaluation mechanism to assist the model in recognizing the four critical postures. From the perspective of model structure, convolutional neural networks are capable of image classification and feature extraction, which allows us to classify images and capture image features. From the perspective of method, the two-stage evaluation mechanism evaluates the four critical postures by calculating the scores of image features. In summary, the four critical postures can be recognized well by combining convolutional neural networks with a two-stage evaluation mechanism.

Contributions

We summarize the main contributions of this work.

(1) We proposed a two-stage evaluation mechanism to recognize the four critical postures. The mechanism calculates the scores of image features independently of specific scenarios. The scores it yields help the neural networks improve classification accuracy.

(2) We argue that the training time of our method grows linearly rather than exponentially with data volume, since we simplified the design of the neural network and the two-stage evaluation mechanism adds little overhead during training. Therefore, our method can be applied to large-scale image datasets.

(3) Our convolutional neural networks do not adopt complex network structures, although such structures can improve the learning ability of a model. Instead, we use a two-stage mechanism in place of a complex network architecture, which reduces the time consumed during training.

The rest of this paper is organized as follows. Section 2 presents the method and the model. Section 3 describes the experimental design, Section 4 analyzes the experimental results and Section 5 discusses them. Section 6 concludes the paper and outlines future work.

Methodology

This section introduces posture evaluation and model implementation. To recognize the four postures accurately, we designed a two-stage evaluation mechanism, which constructs an evaluation indicator computed from feature regions to score the different postures. For model implementation, we chose a network architecture based on convolutional neural networks, which can learn the features in images well and classify the images.

The idea of the proposed method is that the convolutional neural networks classify the input image set into four categories and, at the same time, label the feature regions of the images (feature labeling). Then, scores are calculated for the features obtained from the images; based on these scores, it can be determined which category of images corresponds to which posture.

Posture evaluation

We analyzed the four postures of a weightlifter, namely force releasing (Posture 1), knee flexion (Posture 2), knee extension (Posture 3) and highest point (Posture 4), as shown in Figure 1. To recognize the four postures, we construct an evaluation indicator for them.

Figure 1.

Four postures of a weightlifter.

We use the athlete’s body region as a marker, that is, a feature region, and identify the four postures by calculating a posture score. Feature regions are marked with blue rectangles, and the horizontal position of the barbell is labeled with a yellow line, as illustrated in Figure 2. We calculate the vertical distance d between the yellow line labeling the barbell and the center of the blue rectangle labeling the body region. Taking the athlete’s sole as a horizontal reference baseline, we also calculate the vertical distance D between the barbell and the athlete’s sole; that is, D equals the vertical distance from the yellow line to the bottom of the blue rectangle. The height H of the marked feature region equals the vertical distance between the athlete’s head and feet; that is, H is the height of the blue rectangle. The evaluation indicator is defined as follows.

$$
Score = \begin{cases}
P_{m}^{S},\ m = 2 \leftarrow \arg\min \left\{ S_{i}^{I} = 1 - \frac{1}{d},\ i = 1, 2, 3, 4 \right\} \\
P_{n}^{S},\ n = 1, 3, 4 \leftarrow S_{j}^{II} = 1 - \frac{1}{D - H + \theta},\ j = 1, 2, 3, 4,\ j \neq i,\ \theta > 1
\end{cases} \tag{1}
$$

where $S_{i}^{I}$ and $S_{j}^{II}$ are the scores for the i-th and j-th image classes, respectively, with $j \neq i$; $P_{m}^{S}$ and $P_{n}^{S}$ are the m-th and n-th postures; and $\theta$ is a constant greater than 1, introduced to avoid a zero denominator.
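The two indicators in equation (1) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors’ implementation: the `Annotation` class, its field names, the image-coordinate convention (y grows downward) and the default `theta=2.0` are introduced here for illustration, since the paper only requires θ > 1. The explicit checks for a zero denominator are likewise additions.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """Hypothetical labeling of one image, in pixel rows (y grows downward)."""
    box_top: float     # top edge of the blue rectangle (athlete's head)
    box_bottom: float  # bottom edge of the blue rectangle (athlete's sole)
    bar_y: float       # row of the yellow line marking the barbell


def stage_one_score(a: Annotation) -> float:
    """S^I = 1 - 1/d, with d the distance from the barbell to the box center."""
    center = (a.box_top + a.box_bottom) / 2.0
    d = abs(a.bar_y - center)
    if d == 0:
        raise ValueError("d is zero; S^I is undefined")
    return 1.0 - 1.0 / d


def stage_two_score(a: Annotation, theta: float = 2.0) -> float:
    """S^II = 1 - 1/(D - H + theta), with theta > 1."""
    if theta <= 1:
        raise ValueError("theta must be greater than 1")
    D = abs(a.box_bottom - a.bar_y)    # barbell height above the sole
    H = abs(a.box_bottom - a.box_top)  # height of the blue rectangle
    denominator = D - H + theta
    if denominator == 0:
        raise ValueError("D - H + theta is zero; S^II is undefined")
    return 1.0 - 1.0 / denominator


# Hypothetical coordinates: barbell near the body center vs. above the head.
bar_at_waist = Annotation(box_top=100, box_bottom=500, bar_y=320)
bar_overhead = Annotation(box_top=100, box_bottom=500, bar_y=60)
print(stage_one_score(bar_at_waist), stage_one_score(bar_overhead))
print(stage_two_score(bar_at_waist), stage_two_score(bar_overhead))
```

With these hypothetical coordinates, the barbell close to the center of the rectangle gives the smaller first-stage score, and the barbell above the head gives the smaller second-stage score, which matches the ordering the two stages rely on.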

Figure 2.

Posture evaluation. Blue rectangles mark the feature regions, and yellow lines mark the horizontal position of the barbell.

Two-stage evaluated mechanism

The principle is to calculate scores for the features obtained from the images and then identify the postures from these scores. The two stages use different evaluation indicators. In the first stage, we use $S_{i}^{I}$ to calculate the first-stage scores of the four types of features obtained from the four classes of images. We then select the type of features with the minimum score, and the images corresponding to it are treated as Posture 2. The first stage thus determines which images correspond to Posture 2.

Thereafter, in the second stage, the second-stage scores of the features remaining after the first stage are obtained using $S_{j}^{II}$. The images corresponding to the features with the maximum score are regarded as Posture 1, whereas those with the minimum score are regarded as Posture 4. Finally, the remaining images are assigned to Posture 3. When the second stage ends, all images have been recognized.
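The two-stage assignment described above can be sketched as a small Python function. This is an illustrative sketch only: it assumes one aggregated score per image class in each stage (the paper does not specify how per-image scores are combined), and the function name, class identifiers and score values are hypothetical.

```python
def assign_postures(stage_one, stage_two):
    """Map each of four image classes to a posture number (1 to 4).

    stage_one: dict of class id -> first-stage score S^I
    stage_two: dict of class id -> second-stage score S^II
    """
    if len(stage_one) != 4:
        raise ValueError("expected exactly four image classes")

    # Stage 1: the class with the minimum S^I is Posture 2.
    posture_2 = min(stage_one, key=stage_one.get)

    # Stage 2: only the three remaining classes are scored with S^II.
    remaining = {c: stage_two[c] for c in stage_one if c != posture_2}
    posture_1 = max(remaining, key=remaining.get)  # maximum S^II
    posture_4 = min(remaining, key=remaining.get)  # minimum S^II
    if posture_1 == posture_4:
        raise ValueError("second-stage scores do not separate the classes")

    # The single class left over is Posture 3.
    (posture_3,) = [c for c in remaining if c not in (posture_1, posture_4)]
    return {posture_1: 1, posture_2: 2, posture_3: 3, posture_4: 4}


# Hypothetical class-level scores for four CNN output classes "a" to "d".
s1 = {"a": 0.98, "b": 0.50, "c": 0.97, "d": 0.99}
s2 = {"a": 1.004, "b": 0.0, "c": 0.990, "d": 0.976}
print(assign_postures(s1, s2))
```

Because Posture 3 is whatever remains after the other three are fixed, any ambiguity between the second-stage scores of two classes directly affects the Posture 3 and Posture 4 assignments.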

Model implementations

Convolutional neural networks excel at image classification and feature extraction; we therefore designed convolutional neural networks with four convolutional layers, named F-CNNs. Figure 3 displays the structure of F-CNNs, which uses four convolutional layers with $3 \times 3$ convolutional kernels, as follows.

(1) $3 \times 3 \times 64$ convolution and ReLU, followed by $3 \times 3 \times 128$ convolution and ReLU.

(2) $3 \times 3 \times 256$ convolution and ReLU, followed by $3 \times 3 \times 512$ convolution and ReLU.

Figure 3.

F-CNNs structure. The structure consists of four convolutional layers, four pooling layers and a fully connected layer.

The four pooling layers use max pooling, followed by a fully connected layer:

(3) $2 \times 2$ max pooling with stride 1.

(4) a fully connected layer.
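To make the layer list concrete, the following Python sketch traces output shapes and parameter counts through such a network. Several details are assumptions rather than facts from the paper: a three-channel input, a hypothetical 64 × 64 input size, "same" padding and stride 1 for the convolutions, one pooling layer after each convolution, no padding in the pooling layers, and a four-way fully connected output.

```python
FILTERS = (64, 128, 256, 512)  # output channels of the four 3x3 convolutions


def conv_params(kernel, c_in, c_out):
    """Weights plus biases of one kernel x kernel convolution layer."""
    return kernel * kernel * c_in * c_out + c_out


def trace_fcnn(height, width, channels=3, num_classes=4):
    """Return (name, output shape, parameter count) for each block."""
    layers = []
    h, w, c_in = height, width, channels
    for index, c_out in enumerate(FILTERS, start=1):
        params = conv_params(3, c_in, c_out)
        # Assumed 'same' padding keeps h and w through the convolution;
        # 2x2 max pooling with stride 1 and no padding shrinks each by 1.
        h, w = h - 1, w - 1
        layers.append((f"conv{index}+pool{index}", (h, w, c_out), params))
        c_in = c_out
    # Fully connected layer on the flattened feature map.
    fc_params = h * w * c_in * num_classes + num_classes
    layers.append(("fc", (num_classes,), fc_params))
    return layers


for name, shape, params in trace_fcnn(64, 64):
    print(name, shape, params)
```

The convolutional parameter counts do not depend on the input size, whereas the fully connected count does; under these assumptions, stride-1 pooling barely reduces the feature map, so the fully connected layer holds most of the parameters.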

The hyper parameters of F-CNNs, such as batch size and base learning rate, need to be determined. We carefully studied the following hyper parameters.

(i) Learning rate and training epoch. If the loss declines rapidly at the start of training and then fluctuates significantly, the learning rate may be too high; in this scenario, we suggest tuning the learning rate. If the loss curve turns slightly upward, we suggest reducing the number of training epochs, since there may be a risk of over-fitting. Figure 4³⁸ shows the relations between learning rate, loss and epoch. In this work, we tuned the learning rate and training epoch within a fixed range, where the learning rate $L \in \{10^{-2}, 10^{-3}, 10^{-4}, 10^{-5}, 10^{-6}, 10^{-7}\}$ and the training epoch $E \in [100, 1000]$.

(ii) Batch size. An inappropriate batch size may lead to a local optimum. For instance, a small batch size introduces greater randomness, which makes convergence difficult. Within a certain range, the larger the batch size, the more accurate the descent direction and the smaller the training oscillation. However, once the batch size reaches a certain level, the descent direction essentially stops changing. Hoffer et al.³⁹ pointed out that the performance degradation with large batches is due to insufficient training time. Smith et al.⁴⁰ and Smith and Le⁴¹ demonstrated that, for a fixed learning rate, there is an optimal batch size that maximizes test accuracy. Additionally, the optimal batch size is positively correlated with the learning rate and the size of the training set. Figure 5^39–41 shows the relationship among batch size, training epoch, and accuracy. It can be seen that the speed of processing the same amount of data increases with batch size. Taking Figure 5 as a reference, we dynamically tune the batch size during training based on the relation between epoch and accuracy.

(iii) Convergence. Currently, it is difficult to determine accurately whether a neural network is fully trained and has reached an optimal value. However, there are still ways to evaluate whether the network has converged, which help us stop training at an appropriate point (i.e. obtaining better results without over-fitting). One method is to monitor the loss curves: when the training loss and testing loss remain relatively stable and the gap between them is almost constant, the network is regarded as having essentially converged.

(iv) Activation function. The ReLU function is used as the activation function of the convolutional layers.
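The convergence criterion in item (iii) can be expressed as a simple check on recorded loss values. The sketch below is one possible reading of "relatively stable" and "almost constant"; the function name, the window length and the tolerance are assumptions introduced here, not values from the paper.

```python
def has_converged(train_losses, test_losses, window=10, tol=1e-2):
    """Heuristic convergence test on per-epoch loss histories.

    Returns True when, over the last `window` epochs, the training loss,
    the testing loss, and the gap between them each vary by at most `tol`.
    """
    if len(train_losses) != len(test_losses):
        raise ValueError("loss histories must have the same length")
    if len(train_losses) < window:
        return False  # not enough epochs to judge stability

    recent_train = train_losses[-window:]
    recent_test = test_losses[-window:]
    gaps = [te - tr for tr, te in zip(recent_train, recent_test)]

    def spread(values):
        return max(values) - min(values)

    return (spread(recent_train) <= tol
            and spread(recent_test) <= tol
            and spread(gaps) <= tol)


# Hypothetical histories: a plateau after a few epochs of decline.
train = [1.0, 0.5, 0.3] + [0.20] * 10
test = [1.1, 0.7, 0.5] + [0.25] * 10
print(has_converged(train, test))
```

A check of this kind only signals that the curves have flattened; it does not by itself show that the plateau is a good optimum, which is the caveat item (iii) raises.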

Figure 4.

Relation of learning rate, loss and epoch.³⁸

Figure 5.

Batch size.^39–41

Experiments

Datasets and competitive methods

We split the video into frames and extracted images from them. In total, there are 583 training images (the training set), 182 validation images (the validation set), and 184 test images (the testing set).

To evaluate the recognition ability of F-CNNs, four mainstream approaches with different neural network structures were selected for comparison: graph convolutional neural networks (G-CNNs),³⁴ feature-steered graph convolutions (F-SGC),³³ a graph convolutional LSTM network (GC-lstm),²⁶ and a recurrent neural network model (RNN-M).²³ We implemented all five methods (our method and the four competing methods) in Python with the TensorFlow framework on a Linux operating system. Unless otherwise stated, all experiments were run on the same GPU in the same environment.

Experimental designs

We carried out multiple experiments to verify the proposed method from different perspectives, as follows.

Experiment (1). Visualization analysis. To observe how the four postures are recognized, F-CNNs was trained on the training set, and we visualized how the four convolutional layers learn the four postures. The visualized results were then examined.

Experiment (2). Posture recognition. To verify the ability to recognize the four postures, our method was run on the testing set and validation set. Then, the results were analyzed.

Experiment (3). Performance comparisons. To evaluate the performance of the method, the five methods (our method and four competing methods) were run on the testing set and validation set. Then, the comparison results were evaluated.

Experiment (4). Execution efficiency. To analyze the efficiency of the algorithms, we compared the training time and testing time of the five algorithms corresponding to the five methods.

Results analysis and discussions

Visualization analysis

We visualized what the four convolutional layers learned from the four postures, as illustrated in Figure 6. Visualizing convolutional layers is useful because well-trained network weights typically appear as smooth filters, whereas noisy patterns may indicate that the network is insufficiently trained or has over-fitted. Figure 6 shows that the filters of the first convolutional layer, conv1, in F-CNNs are smooth, indicating good convergence. We also observe that one portion of the network weights is responsible for color feature extraction, and the other for grayscale feature extraction.

Figure 6.

Visualizations of four convolutional layers in F-CNNs.

Posture recognition

This experiment used the 182 validation images to verify F-CNNs and the 184 testing images to test its performance. In the testing set, 41 images are Posture 1, 40 are Posture 2, 52 are Posture 3, and 51 are Posture 4. The validation and testing results are given in Table 1, showing that the validation accuracy and testing accuracy are 92.85% and 89.67%, respectively. Compared with Posture 1 and Posture 2, the validation and testing accuracies of Posture 3 and Posture 4 are relatively low because these two postures are very similar (see Figure 2).

Table 1.

Recognitions of four postures using F-CNNs.

Postures	Validation				Testing
	Total	Correct	Incorrect	Accuracy (%)	Total	Correct	Incorrect	Accuracy (%)
Posture 1	41	40	1	97.56	41	38	3	92.68
Posture 2	40	38	2	95.00	40	37	3	92.50
Posture 3	51	46	5	90.20	52	46	6	88.46
Posture 4	50	45	5	90.00	51	44	7	86.27
Posture 1–4	182	169	13	92.85	184	165	19	89.67
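The accuracies in Table 1 follow directly from the correct and total counts, and the short Python check below recomputes them. The counts are copied from the table; the dictionary layout and function names are illustrative. The overall validation figure evaluates to about 92.857%, which the table reports as 92.85.

```python
# (correct, total) per posture, copied from Table 1.
TABLE_1 = {
    "validation": {
        "Posture 1": (40, 41),
        "Posture 2": (38, 40),
        "Posture 3": (46, 51),
        "Posture 4": (45, 50),
    },
    "testing": {
        "Posture 1": (38, 41),
        "Posture 2": (37, 40),
        "Posture 3": (46, 52),
        "Posture 4": (44, 51),
    },
}


def accuracy_percent(correct, total):
    """Accuracy as a percentage of correctly recognized images."""
    return 100.0 * correct / total


def overall_counts(split):
    """Sum correct and total counts over the four postures of one split."""
    rows = TABLE_1[split].values()
    return sum(c for c, _ in rows), sum(t for _, t in rows)


for split, postures in TABLE_1.items():
    for posture, (correct, total) in postures.items():
        print(split, posture, f"{accuracy_percent(correct, total):.2f}%")
    correct, total = overall_counts(split)
    print(split, "overall", f"{correct}/{total}")
```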

Performance comparisons

Table 2 reports the accuracy of F-CNNs and the competing methods in recognizing the four postures, indicating that F-CNNs outperforms the four competitors. Compared with convolutional neural networks, recurrent neural networks are less suited to image classification; therefore, RNN-M obtains a relatively low accuracy.

Table 2.

Performance of different methods.

Methods	F-CNNs (%)	G-CNNs³⁴ (%)	F-SGC³³ (%)	GC-lstm²⁶ (%)	RNN-M²³ (%)
Testing accuracy	89.67±1.22	85.24±2.09	85.06±3.11	84.77±2.53	84.11±1.22
Validation accuracy	92.85±3.19	90.11±5.11	89.02±4.07	88.23±1.07	87.99±2.55

Execution efficiency

We compared the training time and testing time of the five algorithms, as shown in Table 3. F-CNNs does not show a significant advantage in training time; nevertheless, its training efficiency is superior to that of both F-SGC and GC-lstm. The training time of a neural network is related to its structure and the number of parameters. The network architectures of F-SGC and GC-lstm are more complex than that of F-CNNs, so they spend more time on training network parameters. Additionally, F-CNNs calculates the two-stage posture scores during training, which is also a main contributor to its training time. Therefore, G-CNNs and RNN-M train faster than F-CNNs. For testing time, there is little difference among the five algorithms. Consequently, the difference in efficiency between the models is mainly reflected in the training time; although the testing time has an effect, it is not a critical factor.

Table 3.

Efficiency of algorithms. The five algorithms were trained on the training set until they converged, and were then tested on the testing set.

Algorithms	F-CNNs	F-SGC³³	GC-lstm²⁶	RNN-M²³	G-CNNs³⁴
Training time (s)	238.55	387.23	401.05	165.11	177.38
Testing time (s)	57.23	59.17	56.09	56.32	54.39
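As a reading aid for Table 3, the training times can be expressed relative to F-CNNs. The Python snippet below does only this arithmetic; the times are copied from the table, and the variable and function names are illustrative.

```python
# Training times in seconds, copied from Table 3.
TRAINING_SECONDS = {
    "F-CNNs": 238.55,
    "F-SGC": 387.23,
    "GC-lstm": 401.05,
    "RNN-M": 165.11,
    "G-CNNs": 177.38,
}


def relative_to(baseline, times):
    """Divide every time by the baseline algorithm's time."""
    base = times[baseline]
    return {name: seconds / base for name, seconds in times.items()}


ratios = relative_to("F-CNNs", TRAINING_SECONDS)
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name}: {ratio:.2f}x the F-CNNs training time")
```

On these figures, F-SGC and GC-lstm take roughly 1.6 to 1.7 times as long to train as F-CNNs, while RNN-M and G-CNNs take roughly 0.7 times as long.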

The computational complexity of these methods, which are based on neural network architectures, depends on the number of hidden layers to be trained and the scale of the image set. Generally, more hidden layers may be used to learn more meaningful features, so the computational complexity increases with the number of hidden layers. For our method, the time complexity likewise depends on the number of hidden layers and the scale of the image set.

Some observations can be obtained from Figure 6 and Tables 1 to 3.

(1) The two-stage evaluation mechanism effectively recognizes the four critical postures. The scores it produces help the model improve classification accuracy.

(2) The training time of the model grows linearly rather than exponentially with data volume, since the two-stage evaluation mechanism adds little overhead. Consequently, the model is suitable for large-scale image datasets.

(3) Although complex network structures can improve the learning ability of a model, we used the two-stage mechanism instead of designing a complex network architecture, which reduces the time consumed during training.

Discussions

Advantages

Our method outperforms the four competitors because it adopts the two-stage evaluation mechanism. The details are as follows.

The proposed two-stage evaluation mechanism is used to recognize the four critical postures, that is, force releasing, knee flexion, knee extension and highest point, and it calculates the scores of the features for these postures. Using the scores to evaluate postures is more accurate than using the features directly to classify postures. From the perspective of network structure, both our method and the four competing methods adopt the structure of convolutional neural networks, and the network structures used by the four competitors are more complex than ours. However, we introduced the two-stage evaluation mechanism instead of designing a complex network architecture, which is a distinctive advantage of our method.

Limitations

Although our method outperforms the competing methods, it has relatively low accuracy in identifying Posture 3 and Posture 4, since their second-stage scores are close when the two postures are very similar. The competing methods encounter the same difficulty. In addition, the two-stage evaluation mechanism increases the training time of our model, so the training time may still grow linearly on large-scale datasets.

Insights

Image recognition relies on the features extracted from images. Recognition accuracy depends not only on the method itself but also on image quality; for example, noise interference and image distortion can reduce recognition accuracy. Convolutional neural networks, as a mainstream image recognition method, extract features from images and then classify or recognize the images based on those features. Hence, feature extraction is the most important step, since it determines how well the extracted features distinguish the images. Due to the local perception characteristics of convolution operations, convolutional neural networks are relatively sensitive to changes in position and scale in images. Handling objects at different positions and scales may therefore require additional processing and special model design to improve the robustness and generalization ability of the model.

Conclusion

This paper proposed convolutional neural networks combined with a two-stage evaluation mechanism to recognize four critical postures of a weightlifter. We designed convolutional neural networks with four convolutional layers to classify images and extract their feature regions. The two-stage evaluation mechanism then calculates the scores of the feature regions, and the different postures are recognized from these scores. Experimental results show that our method outperforms the competitors in critical posture recognition. The training time of our method grows linearly rather than exponentially with data volume, which suggests that the method can be applied to large-scale image datasets. In addition, the two-stage evaluation mechanism is independent of specific scenarios. In future work, we will explore posture recognition under noise interference, since noise hidden in images may mislead convolutional neural networks and weaken their classification ability.

Footnotes

Contributions

Quantao He and Wenquan Tang proposed the methodology. Quantao He wrote the manuscript. Wenjuan Li and Baoguan Xu implemented the source code, designed the experiments and analyzed experimental results.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by The second batch of industry-university cooperative education project of the Higher Education Department of the Ministry of Education in 2021, “Virtual Simulation Teaching Reform and Exploration of Physical Education Based on VR/AR/MR Technology ”(202102509004), and the Research on AI+ Online Open Course Learning Support Service Innovation, General topic of Online Open Course in Universities of Guangdong Province (2022ZXKC401), and 2022 Shenzhen Stability Support Program Project “Development and Research of AI Intelligent Sports Training System” (20220811094439002), and “Research on the Exercise Control of Physical Education Teaching by Modern Technology,” the Mentoring Project of Shenzhen University.

Ethical approval

This manuscript does not contain any studies with human participants or animals performed by any of the authors.

ORCID iD

Wenquan Tang

Data availability

Data will be made available on request.

References

1. Verhoeven, Newell. Coordination and control of posture and ball release in basketball free-throw shooting. Hum Mov Sci 2016; 49: 216–224.

2. Wei, Jiao, Wang, et al. Sports motion recognition using MCMR features based on interclass symbolic distance. Int J Distrib Sens Netw 2016; 12(5): 1–15.

3. Zhang. Automatic detection technology of sports athletes based on image recognition technology. EURASIP J Image Video Process 2019; 2019(1): 138–150.

4. Simonyan, Zisserman. Two-stream convolutional networks for action recognition in videos. In: Proceedings advances in neural information processing systems, San Francisco, 2014, pp.568–576.

5. Yao, Duan L-Y, et al. Unified spatiotemporal attention networks for action recognition in videos. IEEE Trans Multimedia 2019; 21(2): 416–428.

6. Yang, Shen, et al. Arbitrary-view human action recognition: a varying-view RGB-D action dataset. IEEE Trans Circuits Syst Video Technol 2021; 31(1): 289–300.

7. Liu, Akhtar, Mian. Adversarial attack on skeleton-based human action recognition. IEEE Trans Neural Netw Learn Syst 2022; 33(4): 1609–1622.

8. Angelini, Long, et al. 2D pose-based real-time human action recognition with occlusion-handling. IEEE Trans Multimedia 2020; 22(6): 1433–1446.

9. Luvizon, Picard, Tabia. 2d/3d pose estimation and action recognition using multitask deep learning. In: Proceeding IEEE/CVF conference on computer vision and pattern recognition, Washington, 2018, pp.5137–5146.

10. Cong, Wang, Yan, et al. CRB weighted source localization method based on deep neural networks in MultiUAV network. IEEE Internet Things J 2023; 10(7): 5747–5759.

11. Cong, Wang, Lan, et al. A generalized noise reconstruction approach for robust DOA estimation. IEEE Trans Radar Syst 2023; 1: 382–394.

12. Rehg. Delving into egocentric actions. In: Proceeding IEEE conference on computer vision and pattern recognition (CVPR), Boston, MA, USA, 2015, pp.287–295.

13. Cai, Kitani, Sato. Understanding hand-object manipulation with grasp types and object attributes. In: Proceedings of robotics: science and systems (RSS’16), vol. 3, 2016.

14. Baradel, Neverova, Wolf, et al. Object level visual reasoning in videos. In: Proceedings European conference on computer vision, London, 2018, pp.105–121.

15. Yan, Xiong, Lin. Spatial temporal graph convolutional networks for skeleton-based action recognition. In: Proceedings of the AAAI conference on artificial intelligence, Hawaii, 2018, pp.7444–7452.

16. Jain, Zamir, Savarese, et al. Structural-RNN: deep learning on spatio-temporal graphs. In: The IEEE conference on computer vision and pattern recognition (CVPR), Las Vegas, NV, USA, June 2016, pp.5308–5317.

17. Martinez, Black, Romero. On human motion prediction using recurrent neural networks. In: The IEEE conference on computer vision and pattern recognition (CVPR), Honolulu, HI, USA, July 2017, pp.4674–4683.

18. Gui, Wang, Liang, et al. Adversarial geometry-aware human motion prediction. In: The European conference on computer vision (ECCV), Hawaii, September 2018, pp.786–803.

19. Liu, Shahroudy, et al. Spatio-temporal LSTM with trust gates for 3D human action recognition. In: The European conference on computer vision (ECCV), London, 2016, pp.816–833.

20. Kim, Reiter. Interpretable 3D human action analysis with temporal convolutional networks. In: IEEE conference on computer vision and pattern recognition workshops (CVPRW), Hawaii, 2017, pp.1623–1631.

21. Liu, Chen. Enhanced skeleton visualization for view invariant human action recognition. Pattern Recognit 2017; 68: 346–362.

22. Pavllo, Grangier, Auli. Quaternet: a quaternion based recurrent model for human motion. In: British machine vision conference (BMVC), Chicago, 2018, pp.1–14.

23. Gui, Wang, Ramanan, et al. Few-shot human motion prediction via meta-learning. In: The European conference on computer vision (ECCV), Paris, 2018, pp.432–450.

24. Kundu, Gor, Babu. BiHMP-GAN: Bidirectional 3D human motion prediction gan. In: AAAI conference on artificial intelligence, Kuala Lumpur, 2019.

25. Zhang, Lee, et al. Convolutional sequence to sequence model for human dynamics. In: The IEEE conference on computer vision and pattern recognition (CVPR), Salt Lake City, UT, USA, 2018, pp.5226–5234.

26. Chen, Wang, et al. An attention enhanced graph convolutional LSTM network for skeleton-based action recognition. In: The IEEE conference on computer vision and pattern recognition (CVPR), Long Beach, CA, USA, 2019.

27. Huang, Wan, Probst, et al. Deep learning on lie groups for skeleton-based action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), Honolulu, HI, USA, 2017.

28. Chen, et al. Actional-structural graph convolutional networks for skeleton-based action recognition. In: The IEEE conference on computer vision and pattern recognition (CVPR), Long Beach, CA, USA, 2019.

29. Jing, Wang, et al. Skeleton-based action recognition with spatial reasoning and temporal stack learning. In: The European conference on computer vision (ECCV), Birmingham, 2018.

30. Thakkar, Narayanan. Part-based graph convolutional network for action recognition. In: British machine vision conference (BMVC), London, 2018.

31. Zhang, et al. Spatio-temporal graph routing for skeleton-based action recognition. In: AAAI conference on artificial intelligence, Hawaii, 2019, pp.8561–8568.

32. Wen, Gao, et al. Graph CNNs with motif and variable temporal block for skeleton-based action recognition. In: Proceedings of the AAAI conference on artificial intelligence, 2019, Honolulu, HI, USA, pp.8989–8996.

33. Verma, Boyer, Verbeek. FeaStNet: feature-steered graph convolutions for 3D shape analysis. In: The IEEE conference on computer vision and pattern recognition (CVPR), Salt Lake City, UT, USA, 2018, pp.2598–2606.

34. Valsesia, Fracastoro, Magli. Learning localized generative models for 3D point clouds via graph convolution. In: International conference on learning representations (ICLR), Tokyo, 2019, pp.1–15.

35. Özçelik, Altan. Overcoming nonlinear dynamics in diabetic retinopathy classification: a robust AI-based model with chaotic swarm intelligence optimization and recurrent long short-term memory. Fractal Fract 2023; 7(8): 598–634.

36. Yağ, Altan. Artificial intelligence-based robust hybrid algorithm design and implementation for real-time detection of plant diseases in agricultural environments. Biology 2022; 11: 1732.

37. Özçelik, Altan. Classification of diabetic retinopathy by machine learning algorithm using entorpy-based features. In: Çankaya International Congress on Scientific Research, Ankara, Turkey, 2023, pp.523–535.

38. Chen. Classification for fruit fly images using convolutional neural networks. Inf Technol Control 2022; 51(2): 376–389.

39. Hoffer, Hubara, Soudry. Train longer, generalize better: closing the generalization gap in large batch training of neural networks. In: Advances in neural information processing systems, Ottawa, 2017, pp.1731–1741.

40. Smith, Kindermans, Ying, et al. Don’t decay the learning rate, increase the batch size, arXiv:1711.00489, 2017.

41. Smith. A bayesian perspective on generalization and stochastic gradient descent, arXiv:1710.06451, 2017.