Abstract
Human action recognition is a popular field in computer vision research, and its results are widely used in everyday life. This paper explores a Kinect-based human action recognition algorithm and applies it to the quality evaluation of cardiopulmonary resuscitation (CPR) operation. At present, CPR training relies mainly on physical auxiliary equipment, which is highly limiting and can only be used under specific conditions. Computer vision makes it possible to carry out CPR simulation training effectively under general conditions, which is a strategy worth popularizing. We use Kinect's skeleton tracking capability to obtain key human skeleton data and then perform fine-grained human action analysis. Our model obtains the critical compression depth (CCD) and compression frequency (CCF) of CPR. Compared with the state of the art, our algorithm has better stability and real-time performance. It improves time efficiency by about 60% while maintaining high accuracy. In addition, we guide the trainer to perform standard movements by setting joint angle specifications. Moreover, our system has been validated by professional medical staff.
Introduction
Video-based human action recognition is a very popular direction in computer vision,1–6 because it is widely used in everyday life, for example in video surveillance, human-computer interaction, video understanding and smart medicine. In particular, Microsoft's launch of the Kinect sensor camera triggered a surge in the use of human skeleton information to analyze human action. Thanks to its skeleton tracking function, Kinect enables researchers to analyze human action from skeleton motion trajectories, and this approach has been very successful in many fields.
According to the survey, 544,000 people die of cardiac arrest every year in China.7 Most of these cases occur at home, in the workplace or at school, where bystanders are often present.8 Because CPR knowledge is far from widespread, these patients cannot be rescued in time and miss the best window for treatment. Research shows that the most effective first aid measure for patients with cardiac arrest is effective CPR: when it is performed within 3–5 minutes, the rescue success rate can reach 49%–75%.9 However, more than 90% of people in China have not received professional CPR training, and about 60% are not aware of the importance of CPR.7–10 At the same time, about 20% of people have no access to relevant training courses. Therefore, the popularization of CPR is urgent.
The basic process of CPR is: (1) chest compression; (2) opening the airway; (3) artificial respiration.8,9 The most challenging part is chest compression, which requires rescuers to perform compressions for at least 20 minutes. Chest compression is also subject to strict standards, and non-standard compression may cause secondary injury to the patient. It is therefore very important to standardize the chest compression steps of CPR. The traditional CPR training method relies on dummies with data monitoring functions. Its biggest disadvantage is that ordinary people cannot obtain expensive, professional training dummies. In addition, it gives no effective feedback on the operating posture of trainees, and its human-computer interaction is very limited. Thanks to the wide use of sensor cameras in computer vision, their skeleton tracking function can be used to capture human skeleton parameters and analyze human behavior effectively. In this paper, a CPR training guidance algorithm is proposed. The CCF and CCD of the CPR training process are analyzed in real time from the skeleton data obtained by Kinect, and incorrect posture during training is corrected in real time according to the CPR training specification.
The contribution of this paper can be summarized as follows:
To make CPR training more widely accessible, this paper proposes a convenient and effective algorithm for training and evaluating CPR operations, with higher time efficiency, precision and stability. The algorithm accurately and stably obtains the key parameters CCF and CCD during CPR operation. At the same time, it improves time efficiency by about 60% compared with RESKIN. In addition, our algorithm calculates joint angles in real time and gives timely corrective suggestions for irregular posture. This paper combines computer vision with CPR training and presents a complete training and evaluation system based on human skeleton data, which has been validated by medical professionals.
Related work
A large number of researchers work on video-based human action recognition and have achieved many remarkable results,1–6 so that current methods reach very high recognition accuracy. Video-based human action recognition has mainly developed from the understanding of color video. Convolutional neural networks excel at understanding image information, and many researchers have improved on the traditional convolutional neural network. Karen et al.11 proposed a two-stream network architecture, which uses color images to extract the appearance characteristics of the human body and optical flow to extract its motion information. This architecture not only performs well, but has also led many researchers to carry out further research on this basis.12–15 LSTM network architectures have achieved great success in understanding sequence information.16–18 Swathikiran et al.4 proposed an LSTM network with an attention mechanism for video understanding and achieved impressive results. With the wide use of sensor cameras, human action recognition has reached a new level, and many breakthroughs have been made in analyzing human motion through skeleton motion trajectories. This approach reduces the impact of background and noise on recognition, and the human skeleton can represent human motion well under certain conditions. Song et al.19 proposed an end-to-end LSTM network based on a spatio-temporal attention mechanism, which uses human skeleton information to recognize human movements, with good results. Kinect can be used to track the human skeleton and extract the behavioral characteristics of the human body during CPR, which makes it possible to analyze the quality of CPR operation. Wang et al.20 proposed a real-time feedback system for CPR training based on Kinect.
The system displays the CCF and CCD of the CPR training process on the computer screen so that trainers can adjust their compression frequency and depth. The authors place physical markers on the trainer's hands and on the chest of the training model to obtain CCF and CCD and to provide guidance for trainers. Hanners et al.21 proposed using a telephone video stream to assist cardiopulmonary resuscitation. By establishing a direct video call between callers and EMS dispatchers, professional personnel can evaluate the rescuer's CPR operation and guide it remotely, which improves the chance of successful treatment. Semeraro et al.22 proposed a system called RELIVE, which extracts the CCF and CCD of CPR training from the color and depth images acquired by Kinect. This system can be regarded as a means of evaluating the quality of CPR training. It also requires trainers to place markers on their hands, which is a limitation. Li et al.23 assessed the impact of fatigue caused by water rescue on the quality of subsequent CPR, and the impact of bystander participation on CPR quality in lifeguard rescue, and concluded that untrained bystanders are unlikely to be helpful in CPR in drowning events. Therefore, professional CPR training is particularly important. Xu et al.24 combined Kinect with fitness exercise, using wearable equipment to obtain real-time data on the state of human movement, give real-time guidance on training actions and correct improper posture during exercise, and achieved good human-computer interaction. Christian et al.25 proposed a system called RESKIN. This system uses the skeleton tracking function of Kinect to obtain human skeleton data in real time. From the motion data captured during CPR training, it searches for the target parameters CCF and CCD with a heuristic swarm intelligence algorithm.
Verification with a large number of participants showed that the system is effective. Our algorithm uses the skeleton tracking function of Kinect to capture the skeleton data of human motion in real time, extracts the data with a sliding window mechanism, represents the real-time state of CPR training with a cosine curve model, and uses a gradient descent optimization strategy to find the optimal model parameters, from which the desired CCF and CCD are obtained. Our algorithm fits faster and achieves higher accuracy; it avoids the problem that RESKIN fits the model slowly, which causes part of the Kinect data to be lost. In addition, we use the acquired skeleton data and spatial geometry to correct the trainer's irregular posture in real time. Combined with CCF and CCD, this allows us to guide the trainer through standard CPR training and to correct wrong posture, compression speed and compression depth, forming a complete CPR training system.
Proposed CPR training model
According to the standard CPR rules, the trainer's body naturally performs simple harmonic motion during CPR training, so a cosine curve model can represent the real-time state of the training process well. The algorithm in this paper uses the skeleton tracking function of Kinect to capture human skeleton data in real time and extract human motion features. Combined with a sliding window mechanism for data extraction, it fits the cosine curve model and extracts the corresponding model parameters to obtain the desired CCF and CCD, as shown in Figure 1. By combining the acquired skeleton data with spatial geometry to calculate specific angles, the algorithm also corrects the trainer's incorrect CPR training posture.

Figure 1. Display of model fitting effect.
The Kinect depth camera acquires a depth map of the scene and then calculates the three-dimensional spatial coordinates of each pixel through a spatial geometric transformation. With this technique, Kinect can estimate the coordinates of 20 joint points when the human body is within its field of view. Figure 2 shows the joint positions estimated by Kinect. The origin of the three-dimensional coordinate system is the location of the Kinect lens. In addition, Kinect acquires 30 frames of skeleton data per second, which allows it to track the dynamic information of the human body in real time. As shown in Figure 3, as long as the human body is within the field of view of Kinect, the skeleton data can be acquired in real time.

Figure 2. Kinect skeleton tracking for human skeleton data.

Figure 3. Kinect skeleton tracking diagram.
Motion trajectory feature
Kinect collects human skeleton data at 30 frames per second. In this paper, the skeleton data of each frame is collected and its distance to the ground is calculated. Combined with the change over time, this describes the motion features of the trainer during CPR training. Experimental comparison shows that when the human body faces Kinect, the limbs are tracked most easily and the result is stable and effective. We extract the three-dimensional data points of the shoulder joints as feature data, recording the left shoulder as q and the right shoulder as p. To simplify the feature description and reduce the error, the midpoint of q and p is taken as the key feature point and recorded as k=(x,y,z). The ground plane is selected as the reference plane. From the depth map of the environment obtained by Kinect we get the point cloud of the current environment, and with the RANSAC algorithm the parameters of the ground plane model are easily obtained and recorded as N=(nx,ny,nz,a). According to the geometric relationship between a point and a plane in 3D space, the distance d between the point k and the plane is given by equation (1):

$$d = \frac{|n_x x + n_y y + n_z z + a|}{\sqrt{n_x^2 + n_y^2 + n_z^2}} \tag{1}$$
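The shoulder midpoint and the point-to-plane distance can be sketched in a few lines. This is a minimal illustration, assuming the plane N=(nx,ny,nz,a) follows the convention nx·x + ny·y + nz·z + a = 0; the function names and the sample coordinates are hypothetical and not taken from the paper.

```python
import math

def midpoint(p, q):
    """Midpoint k of the right shoulder p and left shoulder q (3-D tuples)."""
    return tuple((a + b) / 2.0 for a, b in zip(p, q))

def point_plane_distance(k, plane):
    """Distance from k=(x, y, z) to the plane nx*x + ny*y + nz*z + a = 0."""
    nx, ny, nz, a = plane
    x, y, z = k
    return abs(nx * x + ny * y + nz * z + a) / math.sqrt(nx * nx + ny * ny + nz * nz)

# Hypothetical values: a Kinect mounted 1.5 m above a horizontal floor, so in
# camera coordinates the floor is the plane y = -1.5 (0*x + 1*y + 0*z + 1.5 = 0).
ground = (0.0, 1.0, 0.0, 1.5)
right_shoulder = (0.20, -0.90, 1.50)
left_shoulder = (-0.20, -0.90, 1.50)

k = midpoint(right_shoulder, left_shoulder)
d = point_plane_distance(k, ground)
print(k, round(d, 3))
```

Because the denominator normalizes the plane normal, the RANSAC output does not need to be a unit vector for the distance to be in metres.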
The distances d collected within a sliding window form the dataset D, to which we fit the cosine curve model of equation (2):

$$y(t) = A\cos(2\pi f t + \rho) + h \tag{2}$$

where A represents the amplitude, f the frequency, ρ the initial phase and h the offset distance. We extract the parameters of the cosine curve model fitted to D to obtain the CCF and CCD that we need. The parameters of each fitted model approximately represent the CCF and CCD at the current time.
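To make the role of the four parameters concrete, the sketch below evaluates a cosine model and reads CCF and CCD from its parameters. It assumes the model has the form y(t) = A·cos(2πft + ρ) + h with t in seconds and f in Hz. The mapping from A to depth is also an assumption: the text does not state it, and the sketch takes the peak-to-peak travel 2|A|. All function names are hypothetical.

```python
import math

def cosine_model(t, A, f, rho, h):
    """Assumed model form: y(t) = A*cos(2*pi*f*t + rho) + h (t in s, f in Hz)."""
    return A * math.cos(2.0 * math.pi * f * t + rho) + h

def ccf_cpm(f):
    """Compression frequency in compressions per minute, from f in Hz."""
    return 60.0 * f

def ccd_cm(A):
    """ASSUMPTION: depth is the peak-to-peak travel 2*|A|, converted from m to cm."""
    return 200.0 * abs(A)

# Hypothetical fitted parameters for one window (not measured values).
A, f, rho, h = 0.0275, 100.0 / 60.0, 0.0, 0.60
print(ccf_cpm(f), ccd_cm(A), cosine_model(0.0, A, f, rho, h))
```

If the depth were instead defined as |A| itself, only `ccd_cm` would change; the frequency conversion is independent of that choice.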
For CPR training, our goal is real-time guidance for trainers, so the real-time requirements are high. We use the gradient descent algorithm to optimize the parameters of the cosine curve model. It needs fewer iterations and less time to fit the target model, and it meets the real-time requirements with high accuracy and stability.
Body joint angle feature
For CPR training posture guidance, we use the skeleton tracking function of Kinect to obtain the three-dimensional skeleton data points of the human body in real time, calculate angle features using spatial geometry, and correct the trainer's incorrect posture in real time according to the CPR training posture rules. In 3D space, the segment between two adjacent joint points can be represented by a vector. For example, the segment between the shoulder and elbow joints can be represented as v. The angle θ<v1,v2> at a joint point can be calculated by equation (3):

$$\theta_{\langle v_1, v_2 \rangle} = \arccos\left(\frac{v_1 \cdot v_2}{|v_1|\,|v_2|}\right) \tag{3}$$

where v1 represents the segment between pi and pj, v2 represents the segment between pi and pk, and i ≠ j ≠ k. v1·v2 is the dot product of the two vectors, and |v1| and |v2| are the magnitudes of v1 and v2 respectively.
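A minimal sketch of the joint angle computation follows, using the standard arccos of the normalized dot product. The joint coordinates are hypothetical. The direction convention of the two vectors is an assumption: with shoulder-to-elbow and elbow-to-wrist vectors a straight arm gives 0°, while two vectors that both start at the elbow give 180° for the same arm.

```python
import math

def vec(p_from, p_to):
    """Vector pointing from joint p_from to joint p_to."""
    return tuple(b - a for a, b in zip(p_from, p_to))

def angle_deg(v1, v2):
    """Angle between two 3-D vectors in degrees."""
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    # Clamp to [-1, 1] so rounding noise cannot push acos out of its domain.
    c = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(c))

# Hypothetical joints of a fully straight arm (metres, camera coordinates).
shoulder = (0.20, 0.60, 1.50)
elbow = (0.20, 0.30, 1.50)
wrist = (0.20, 0.00, 1.50)

bend = angle_deg(vec(shoulder, elbow), vec(elbow, wrist))    # chained vectors
inner = angle_deg(vec(elbow, shoulder), vec(elbow, wrist))   # both from elbow
print(bend, inner)
```

The chained convention is the one consistent with a 0° target for straight arms; whichever is used, the threshold comparison must use the matching target value.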
The angle β between a three-dimensional vector v and the plane can be calculated by equation (4):

$$\beta = \arcsin\left(\frac{|v \cdot n|}{|v|\,|n|}\right) \tag{4}$$

where v is any space vector, n is the normal vector of the ground plane model, v·n is the dot product of the two vectors, and |v| and |n| are the magnitudes of v and n respectively.
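The angle between a vector and a plane can be sketched as the complement of the angle between the vector and the plane normal. The sketch assumes the usual definition β = arcsin(|v·n| / (|v||n|)), so that a vector parallel to the normal gives 90°; the function name is hypothetical.

```python
import math

def vector_plane_angle_deg(v, n):
    """Angle in degrees between vector v and the plane whose normal is n."""
    dot = sum(a * b for a, b in zip(v, n))
    nv = math.sqrt(sum(a * a for a in v))
    nn = math.sqrt(sum(b * b for b in n))
    # abs() makes the result independent of the sign of the normal;
    # min() guards asin against rounding slightly above 1.
    s = min(1.0, abs(dot) / (nv * nn))
    return math.degrees(math.asin(s))

ground_normal = (0.0, 1.0, 0.0)          # hypothetical horizontal floor
arms_vertical = (0.0, -1.0, 0.0)         # pointing straight down
arms_tilted = (1.0, -1.0, 0.0)           # leaning 45 degrees

print(vector_plane_angle_deg(arms_vertical, ground_normal))
print(vector_plane_angle_deg(arms_tilted, ground_normal))
```

Taking the absolute value of the dot product matters in practice, because a plane fitted by RANSAC may return a normal pointing either up or down.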
Loss function
In this paper, the mean square error (MSE) is used to measure the difference between the estimated model and the real samples, and the most appropriate cosine curve model parameters are obtained by minimizing the MSE, which is defined in equation (5):

$$J = \frac{1}{|D|}\sum_{t=1}^{|D|}\left(d[t] - y(t)\right)^2 \tag{5}$$

where |D| represents the length of the sliding window, d[t] represents the absolute distance between k = (x,y,z) and the ground at time t, and y(t) represents the value of the cosine curve model at time t.
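The loss and its partial derivatives with respect to the four model parameters can be sketched as follows. This assumes the model y(t) = A·cos(2πft + ρ) + h and an MSE averaged over the window; the synthetic data and the function names are illustrative only.

```python
import math

def model(t, A, f, rho, h):
    """Assumed cosine model y(t) = A*cos(2*pi*f*t + rho) + h."""
    return A * math.cos(2.0 * math.pi * f * t + rho) + h

def mse(params, ts, ds):
    """Mean squared error between samples ds and the model at times ts."""
    A, f, rho, h = params
    return sum((d - model(t, A, f, rho, h)) ** 2 for t, d in zip(ts, ds)) / len(ts)

def mse_grad(params, ts, ds):
    """Analytic gradient of the MSE with respect to [A, f, rho, h]."""
    A, f, rho, h = params
    gA = gf = grho = gh = 0.0
    for t, d in zip(ts, ds):
        phase = 2.0 * math.pi * f * t + rho
        err = model(t, A, f, rho, h) - d          # y(t) - d[t]
        gA += err * math.cos(phase)
        gf += err * (-A * math.sin(phase)) * 2.0 * math.pi * t
        grho += err * (-A * math.sin(phase))
        gh += err
    scale = 2.0 / len(ts)
    return [scale * gA, scale * gf, scale * grho, scale * gh]

# Synthetic 5 s window at 30 frames per second (hypothetical motion).
ts = [i / 30.0 for i in range(150)]
truth = [0.0275, 100.0 / 60.0, 0.3, 0.60]
ds = [model(t, *truth) for t in ts]
start = [0.05, 1.5, 1.5, 0.5]
print(mse(start, ts, ds), mse_grad(start, ts, ds))
```

The derivative with respect to f carries the factor 2πt, so its scale grows with the window length; this is one reason the four parameters can need quite different effective step sizes.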
Parameter optimization
The gradient descent algorithm is a search-based optimization algorithm that is well suited to convex optimization problems. Its purpose is to find an appropriate iterative direction for the input data, so that the output value reaches a local minimum. The gradient update formula is defined as follows:

$$\omega_1 = \omega_0 - \alpha \nabla J(\omega_0)$$

where ω0 represents the current position of parameter ω, ω1 represents the position after the next step, α represents the learning rate, and ∇J(·) represents the gradient.
Momentum can also be used to reduce the oscillation of the gradient descent process. In the basic gradient descent algorithm, ω1 = ω0 − v, where v = α∇J(ω) and ∇J denotes the gradient. When momentum is added, v is the vector sum of the current gradient step and a fraction of the previous update v', so v = α∇J(ω) + mv', where m is the momentum. If the current gradient direction is the same as the previous one, momentum accelerates the update; otherwise, it slows the update down. Therefore, adding a momentum mechanism to the traditional gradient update makes the algorithm converge better and faster.
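The momentum update can be sketched as a small generic routine. To keep the behaviour easy to check, the example minimizes a toy convex quadratic rather than the cosine loss; the objective, the function names and the hyperparameters (learning rate 0.1, momentum 0.9) are illustrative assumptions and not the tuned values reported in the experiments.

```python
def momentum_descent(grad, w0, lr, m, steps):
    """Gradient descent with momentum: v = lr*grad(w) + m*v_prev, then w = w - v."""
    w = list(w0)
    v = [0.0] * len(w)
    for _ in range(steps):
        g = grad(w)
        v = [lr * gi + m * vi for gi, vi in zip(g, v)]
        w = [wi - vi for wi, vi in zip(w, v)]
    return w

def toy_grad(w):
    """Gradient of the toy objective J(w) = (w[0] - 3)^2 + (w[1] + 1)^2."""
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]

w_plain = momentum_descent(toy_grad, [0.0, 0.0], lr=0.1, m=0.0, steps=500)
w_mom = momentum_descent(toy_grad, [0.0, 0.0], lr=0.1, m=0.9, steps=500)
print(w_plain, w_mom)
```

Setting m to 0 recovers the basic update ω1 = ω0 − α∇J(ω0), so the same routine covers both variants. For the cosine model, `grad` would be replaced by the gradient of the MSE over the current window.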
The overall algorithm proposed in this paper is shown in Table 1. Kinect acquires human skeleton data and human motion information in real time at 30 frames per second. During CPR training, the frequency and depth of compression do not stay constant. Therefore, the cosine curve model is repeatedly fitted with the above algorithm at a time interval of fupdate, and the model parameters are extracted to obtain approximately real-time CCF and CCD as indices to guide CPR training. In addition, three-dimensional human skeleton data is acquired in real time, the corresponding angles are calculated, and human pose features are extracted; combined with CPR training standards, this allows real-time correction of the trainer's incorrect posture. By combining numerical guidance on CCF and CCD with guidance on posture, we obtain a complete guidance system.
Table 1. The algorithm flow chart of this paper.
Experiments
Experiments setting
In this section, we describe the experimental setup of the proposed algorithm in detail, and present and analyze the experimental results. We used Matlab 2017 and Kinect V2 to complete all experiments on a computer equipped with an Intel i7 8700 CPU. The Kinect is placed horizontally at a height of 1.5 m above the ground. The trainer and the simulated patient carry out the training operation 1.5 m in front of the Kinect lens. The trainer faces the Kinect in a kneeling posture, with the simulated patient lying horizontally at the trainer's knees. The average CCF was 100 cpm, and the average CCD was 5.5 cm. We select a sliding window of s = 5 seconds; at Kinect's acquisition rate of 30 frames per second, 150 data points take part in each cosine curve fit. We use fupdate = 1 second as the time interval for repeatedly fitting the target cosine curve model. Each time the model parameters are updated, they approximate the real-time CCF and CCD of the time segment between this update and the previous one.
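The window arithmetic above (5 s × 30 frames per second = 150 samples, refit every 30 frames) can be sketched as an index generator. The function name is hypothetical, and the sketch only emits full windows, so on a cold start the first fit becomes available after 5 s.

```python
FPS = 30        # Kinect skeleton frames per second (from the text)
WINDOW_S = 5    # sliding window length s, in seconds
UPDATE_S = 1    # refit interval f_update, in seconds

def window_slices(n_frames, fps=FPS, window_s=WINDOW_S, update_s=UPDATE_S):
    """Return (start, end) frame indices of every full sliding window."""
    size = fps * window_s     # samples per fit
    step = fps * update_s     # frames between consecutive fits
    slices = []
    end = size
    while end <= n_frames:
        slices.append((end - size, end))
        end += step
    return slices

slices = window_slices(60 * FPS)   # one minute of frames
print(len(slices), slices[0], slices[-1])
```

Consecutive windows share 120 of their 150 samples, which is why successive fits change smoothly when the compression rhythm is steady.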
Learning rate and momentum
We need to optimize four parameters: A, f, ρ and h. The evaluation criteria follow the AHA (American Heart Association) guidelines for CPR, so the CCF reference value is 100–120 cycles per minute and the CCD reference value is 0.05 m. We set the initial ranges of the parameters to (−0.1,0.1), (0,3), (0,3) and (0,1) respectively, and the experimental initial values to 0.05, 1.5, 1.5 and 0.5 respectively. The values of the learning rate α and the momentum m are determined by experimental comparison, with the cosine curve model fitting time as the evaluation standard. We first run basic gradient descent to determine the relationship between the learning rate α and the fitting time, as shown in Figure 4. When α is greater than 1, the model fitting time tends to be stable. We then initially set the learning rate to α = 1.9 and add momentum m to the gradient update. The result is shown in Figure 5. As the value of m increases from 0 to 1, the time required for model fitting decreases steadily. However, when m is greater than 1, the model cannot be fitted successfully. We therefore choose m = 0.9 and increase the learning rate further. The experimental results are shown in Figure 6. When α = 2.7, the model fitting speed is the fastest. However, if the learning rate α is too large, the fitting is faster but the model is prone to oscillating near the optimal solution and failing to fit. Therefore, we set the initial learning rate α to 2.7 and the momentum m to 0.9, and halve the learning rate every 300 iterations.
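The step decay described above can be written as a one-line schedule. This is a minimal sketch assuming that it is the learning rate (not the momentum) that is halved every 300 iterations; the function name is hypothetical.

```python
def learning_rate(iteration, initial=2.7, halve_every=300):
    """Learning rate after `iteration` updates, halved every `halve_every` updates."""
    return initial * 0.5 ** (iteration // halve_every)

for it in (0, 299, 300, 600, 900):
    print(it, learning_rate(it))
```

A large initial rate moves quickly toward the optimum, and the later halvings damp the oscillation near it that the text warns about.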

Figure 4. Comparison of learning rate.

Figure 5. Comparison of momentum.

Figure 6. Determine learning rate.
Time efficiency comparison
To compare the time efficiency of a single cosine curve fit, we use the same time window and the same set of data, and measure the time that RESKIN, PSO,24 GWO25 and our proposed algorithm need to reach the same fitness value. The experimental results are shown in Figure 7. PSO, GWO and our algorithm all reach the preset fitness value after about 0.1 second; our algorithm is already close to it at about 0.01 second and then approaches it with a smaller gradient. In contrast, RESKIN only reaches the preset fitness value at about 0.25 second, so our algorithm is more than twice as fast as RESKIN. Our goal is real-time posture tracking and guidance of CPR training, so real-time performance is very important.26,27
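As a rough check on how these timings relate to the efficiency figures quoted elsewhere in the paper, the approximate values of 0.25 second for RESKIN and 0.1 second for our algorithm give:

```latex
\[
\frac{0.25 - 0.1}{0.25} = 0.6 \quad (\text{about a 60\% reduction in fitting time}),
\qquad
\frac{0.25}{0.1} = 2.5 \quad (\text{a speed-up factor of about } 2.5).
\]
```

Both numbers are read from approximate timings, so they should be taken as indicative rather than exact.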

Figure 7. Comparison of time efficiency.
Stability comparison
Our system needs to update the target model in real time to show the trainer's current compression frequency and depth. We update our cosine curve model at the time interval fupdate and extract the desired CCF and CCD from it. Such frequent fitting places high demands on the stability of the algorithm. In this section, we use RESKIN, GWO, PSO and our algorithm to fit the cosine curve model at the time interval fupdate on the same set of one-minute data. The data is uniform, with an average compression frequency of 100 cpm and a compression depth of 5.5 cm. The experimental results are shown in Figures 8 and 9. In one minute, each of the four algorithms fits the model 60 times. The results of PSO and GWO are jittery and easily fall into local optima, so these algorithms are not suitable for this system. RESKIN and our algorithm are relatively stable in both compression frequency and compression depth. Their CCF output is stable at 100 cpm, and their CCD output stays close to 5.5 cm, rarely showing large jitter. The CCF and CCD results are consistent, which shows that these algorithms can find the most suitable model parameters and are more stable and applicable for this system. Considering both model fitting time and stability, our algorithm is superior.

Figure 8. CCF stability comparison of algorithms.

Figure 9. CCD stability comparison of algorithms.
Model prediction error analysis
In this section, 10 groups of standard data are collected. Each group lasts one minute, with an average compression frequency of 100 cpm and an average compression depth of 5.5 cm. The compression frequency and depth per second within each group are not exactly equal, so speed and depth are unevenly distributed. The model gives the CCF and CCD of each group at each time within the minute. We then compute the error between the predicted value and the ground truth and average it over every second to obtain the average prediction error, as shown in Table 2. The comparison shows that the errors of the PSO and GWO algorithms in compression frequency and compression depth are much higher than those of the other two algorithms, which indicates a large error in their actual estimation accuracy. For both CCF and CCD, the accuracy of our algorithm is higher than that of RESKIN.
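The averaging step can be sketched as a mean absolute error over the per-second estimates. Using the absolute error is an assumption, since the text does not say how positive and negative deviations are combined; the numbers below are hypothetical and are not the values of Table 2.

```python
def mean_abs_error(predicted, truth):
    """Average absolute difference between per-second predictions and ground truth."""
    if len(predicted) != len(truth) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - g) for p, g in zip(predicted, truth)) / len(predicted)

# Hypothetical per-second CCF estimates (cpm) against a 100 cpm ground truth.
ccf_pred = [99.0, 101.0, 100.5, 98.5]
ccf_true = [100.0, 100.0, 100.0, 100.0]
ccf_error = mean_abs_error(ccf_pred, ccf_true)
print(ccf_error)
```

The same function applies unchanged to CCD values in centimetres, which yields the two figures reported per algorithm.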
Table 2. Average error between CCF (cpm)/CCD (cm) and ground truth of the four algorithms.
Posture analysis of CPR training
Another important function of the CPR training system proposed in this paper is real-time posture guidance. Using the skeleton tracking function of Kinect, we obtain the three-dimensional coordinates of the shoulders, elbows and wrists in real time, namely points 5, 6, 7, 9, 10 and 11 in Figure 2. We mark the midpoint of points 5 and 9 as k and the midpoint of points 7 and 11 as r, and use the RANSAC algorithm to calculate the parameters of the ground plane model in the current environment, whose normal vector is n. According to the CPR training standard, the hands and arms of the trainer should be perpendicular to the chest plane of the simulated patient at all times during training, so the angle β calculated by equation (4) for vkr and n should be 90°. In addition, the trainer's arms should be kept straight at all times, so the angles θr and θl of the joints at points 6 and 10, obtained by equation (3), should be kept at 0°. Considering the inherent error of Kinect, we set an error threshold ε. By comparing the absolute errors of θr, θl and β with respect to the standard values, we can evaluate whether the current training posture conforms to the standard and correct irregular posture in time.
Figure 10(a) shows the standard posture, in which the absolute errors of θr, θl and β with respect to the standard values are within the preset threshold ε = 8°. Figure 10(b) and (c) show cases in which the absolute error of the right or left elbow joint angle, θr or θl respectively, exceeds the threshold ε. Figure 10(d) shows a case in which the absolute errors of both elbow joint angles θr and θl exceed the threshold ε, and Figure 10(e) shows a case in which the absolute error of β exceeds the preset threshold ε.
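The threshold comparison can be sketched as a small rule check. The target values (0° for the elbow angles, 90° for β) and the threshold ε = 8° come from the text; the function name and the feedback messages are hypothetical.

```python
EPSILON_DEG = 8.0          # error threshold from the text
TARGET_ELBOW_DEG = 0.0     # straight arms
TARGET_BETA_DEG = 90.0     # arms perpendicular to the reference plane

def posture_feedback(theta_r, theta_l, beta, eps=EPSILON_DEG):
    """Return a list of corrective hints; an empty list means the posture is standard."""
    hints = []
    if abs(theta_r - TARGET_ELBOW_DEG) > eps:
        hints.append("straighten right arm")
    if abs(theta_l - TARGET_ELBOW_DEG) > eps:
        hints.append("straighten left arm")
    if abs(beta - TARGET_BETA_DEG) > eps:
        hints.append("keep arms vertical")
    return hints

print(posture_feedback(3.0, 5.0, 88.0))    # within tolerance
print(posture_feedback(20.0, 5.0, 88.0))   # right elbow bent
print(posture_feedback(20.0, 25.0, 70.0))  # both elbows bent and arms tilted
```

Returning every violated rule, rather than only the first, lets the interface report combined errors such as both elbows being bent at once.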

Figure 10. Analysis of standard posture.
Conclusion
The main purpose of this article is to address the problem that CPR is not sufficiently widespread in China and that the general public cannot easily obtain CPR training. We therefore provide a convenient CPR training method that meets the training requirements and can be widely obtained and used by the public.
Our algorithm fits the model faster and meets the real-time requirements. It has higher CCF and CCD estimation accuracy and meets the accuracy requirements. It fits each model more stably and effectively, and meets the stability requirements well. Finally, compared with RESKIN, our algorithm provides real-time posture correction, which enhances human-computer interaction, gives good visual feedback and improves practical value. Operation and evaluation by professional medical staff show that our algorithm is feasible.
Footnotes
Acknowledgements
We thank Ms. Lin Jie of the Fuzhou Second Hospital Affiliated to Xiamen University for her technical guidance on CPR. This paper is a research result of the Innovation Team Project of the Fuzhou Health Committee, subject no. 2018-S-WT3.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: the support of the National Natural Science Foundation of China under Grant 61473330 for this research.
