Abstract
Health smart home, as a typical application of the Internet of Things, provides a new solution for remote medical treatment. It can effectively relieve the pressure on medical resources caused by an aging population and help elderly people live at home more independently and safely. Activity recognition is the core of the health smart home. This technology aims to recognize the activity patterns of users from a series of observations of the users’ actions and the environmental conditions, so as to avoid distress situations as much as possible. However, most existing research focuses on offline activity recognition and does not handle online real-time activity recognition well. Besides, the feature representation techniques used for offline activity recognition are generally not suitable for online scenarios. In this article, the authors propose a real-time online activity recognition approach based on the genetic algorithm–optimized support vector machine classifier. In order to support online real-time activity recognition, a new sliding window-based feature representation technique enhanced by mutual information between sensors is devised. In addition, the genetic algorithm is used to automatically select optimal hyperparameters for the support vector machine model, thereby reducing the recognition inaccuracy caused by manual tuning of hyperparameters. Finally, a series of comprehensive experiments is conducted on freely available data sets to validate the effectiveness of the proposed approach.
Introduction
Nowadays, improvements in medicine have increased the average age of the world’s population. The United Nations predicts that by 2050, 22% of the world’s population will be above 65 years of age. 1 As a result, most nations have to face this demographic change and need to develop a series of healthcare technologies to help elderly people live in good conditions. In modern society, the distance between family members means that elderly people often live alone and have to be autonomous. Moreover, with the increase in life expectancy, diseases such as Alzheimer’s become more and more prevalent. To avoid distress situations (a fall, for instance) as much as possible, telemonitoring technologies should be developed to detect significant changes in the activities or habits of a person and help elderly people stay at home independently and safely.
Health smart homes equipped with various sensors provide an excellent solution for telemonitoring. 2 The MIT (Cambridge, MA, USA) project House_n is a good example. In this project, hundreds of sensors are installed in a flat to monitor users’ daily activities. 3 In addition, users are provided with a series of human–machine interfaces to control their environments, which can help people maintain physical and mental health. The Georgia Institute of Technology works on “The Aware Home Research Initiative,” 4 a two-floor smart home designed to consider the different requirements of children with mental disabilities and the elderly people of a family. Researchers utilize motion and environmental sensors, video cameras and Radio Frequency IDentification (RFID) tags deployed in the house to explore ways to help people live independently and safely at home when they are old or handicapped. In France, the researchers of both AILISA 5 and PROSAFE 6 projects use presence infrared sensors to monitor users’ activities and raise alarms in case of anomalies.
A large number of sensors deployed in a smart home environment definitely produce a large volume of heterogeneous and multidimensional streaming data. Therefore, developing effective data processing technologies to detect anomalies inside the large amount of information is a big challenge. This article presents our research and contribution to the automatic activity recognition technology in health smart homes. We propose a novel activity recognition approach based on a multi-class support vector machine (SVM) framework. A sliding window–based feature representation method enhanced by sensor mutual information is designed to support online real-time activity recognition. In addition, the SVM classification model is optimized by the genetic algorithm to automatically select the optimal hyperparameters. This approach can effectively identify multiple categories of daily activities of the elderly people. The key contributions of this work are summarized as follows:
We design a sliding window–based feature extraction method, which can effectively reduce the influence of irrelevant information contained in a time window of sensor events by incorporating sensor mutual information into the feature vector, thereby improving the accuracy of activity recognition.
We propose a multi-class SVM classification framework based on the above feature extraction technique to realize online real-time activity recognition. The genetic algorithm is employed to automatically select optimal hyperparameters for the SVM classifier, thereby reducing the recognition inaccuracy caused by overdependence on human experience.
Comprehensive experiments are conducted over freely available data sets to validate the effectiveness of the proposed SVM-based activity recognition approach.
The rest of the article is organized as follows. The research literature about activity recognition is presented in section “Related work.” Section “Preliminaries” introduces the SVM theory, and then presents the problem statement and the approach overview. In section “Activity recognition based on genetic algorithm–optimized SVM classifier,” we elaborate on the proposed activity recognition technology, including segmentation of sensor event sequence, feature extraction, and genetic algorithm–optimized SVM for activity recognition. Comprehensive experiments are conducted in section “Experiments and discussion” to verify the effectiveness of the proposed approach. Finally, we draw a conclusion in section “Conclusion.”
Related work
Activity monitoring technology
Monitoring user actions and environmental changes is fundamental to human activity recognition. Depending on the monitoring equipment used, activity monitoring technology is generally divided into two categories: video-based and sensor-based.
Video-based activity monitoring
Video-based activity monitoring 7 continuously tracks and records user actions through cameras installed in a smart environment. Then, a series of two- or three-dimensional (2D or 3D) images are processed by specific algorithms for activity recognition, where 2D images can be generated by a single camera and 3D images require at least two cameras.8,9 Although the video-based activity monitoring technology is perceived as very intuitive, it still has some shortcomings. First, the video quality is susceptible to environmental light intensity and the range of viewing angle of a camera, so it is difficult to maintain satisfactory video quality at different times of the day. Besides, if privacy is not addressed, users’ sensitive information can easily be leaked to attackers during video transmission over networks. 10 Moreover, video storage and transmission require a lot of physical resources, for example, memory and bandwidth, thereby further limiting the wider application of this technology. Fortunately, the sensor-based activity monitoring technology can alleviate these problems to some extent, so it is favored by most researchers and is more widely used in the field of activity recognition.
Sensor-based activity monitoring
With the emergence of a variety of low-cost sensors, sensor-based activity monitoring becomes more and more prevalent. Depending on the sensors used, this technology also falls into two categories: portable sensor-based and non-intrusive sensor-based. The former monitors user actions mainly based on RFID technology and acceleration sensors. RFID tags attached to objects can provide information about different objects. Therefore, it is easy to monitor user actions related to objects in the surrounding environment. Fox et al. 11 design an RFID-based smart kitchen, in which RFID tags are attached to about 60 objects, for example, tableware, coffee machines, cabinet doors, ovens, dishwashers, and refrigerators. In this application, user activities at different times of the day are collected and analyzed. In addition to RFID technology, acceleration sensors are also commonly used for activity monitoring. This kind of sensor is sensitive to activities that involve repetitive movements, for example, walking, running, standing, and climbing stairs. Zhang et al. 12 acquire user activity data by attaching acceleration sensors to the hands of users, and then use a back-propagation (BP) neural network to analyze the sensor data for activity recognition.
However, the portable sensor-based technology still has some disadvantages. First, users are required to wear a series of sensors for most of the day, which may cause inconvenience to most users, especially the elderly. Second, some technical problems of portable sensors, for example, sensor size, wearing comfort, water resistance, and battery life, also limit the application of this technology to some extent. To alleviate these problems, smart phones are used for daily activity monitoring. 13
Compared with portable sensors, non-intrusive sensors do not impose any burden on users. They are usually low-cost and can be deployed at different positions in a smart home to record locations of users at any time. Then, the time and location data can be further used for activity recognition. Van Kasteren et al. 14 establish a smart home environment with a variety of non-intrusive sensors, without interfering with users’ daily life. In this application, reed switch sensors are used to detect the door states of rooms, wardrobes, refrigerators, and ovens. Mercury contact sensors installed on objects such as medicine boxes, tableware, and books are used to detect object movements. Floating sensors installed in the toilet can detect whether the user is using the toilet or the bathtub.
The Center for Advanced Studies in Adaptive Systems (CASAS) at Washington State University has also built a smart home with non-intrusive sensors15,16 to enhance home-based medical technologies. In this project, passive infrared motion sensors are used to detect whether the target user appears in a certain area. Temperature sensors record the environmental temperature in real time. Object sensors monitor whether the objects are being used by the user. Water flow meters calculate the amount of water used by the user. The opening and closing status of doors is tracked by door sensors. Mobile phone use sensors detect whether the user is using a mobile phone.
In summary, the activity monitoring technology provides data support for the subsequent activity recognition, so it is an essential prerequisite for high-quality activity recognition. In the following subsection, we present some existing technologies of activity recognition.
Activity recognition technology
Sensor data collected by activity monitoring devices are then processed by specific algorithms to realize activity recognition. In this section, we introduce some typical activity recognition technologies.
Generally, the collected sensor data can be seen as a time series. By dividing the time series, a series of fixed-length windows are obtained. Then, some statistical techniques are used to extract a feature vector from each time window. The most commonly used features for activity recognition include time and locations of sensor events, and the order of appearance of sensors in a window. 17 Wu 18 proposed a mixed feature extraction method based on time segment coding. Time segment is Gray-encoded and combined with other features to enrich the feature set and improve the recognition accuracy. In addition, some researchers combine environmental information (e.g. time, locations and traffic routes) of users with acceleration sensor data collected by smartphones, 19 in order to enrich identifiable activity categories.
After feature extraction, a group of feature vectors should be manually labeled to build a training set for model training. Typical supervised model training techniques include template matching, discriminant, and generative methods. The template matching techniques calculate the distance between each pair of feature vectors and determine the activity category of a new feature vector according to the labels of its nearest neighbors.20,21 The discriminant techniques, which mainly depend on machine learning algorithms (e.g. artificial neural networks (ANNs) and decision trees), identify different activity categories by searching for boundaries between different categories of feature vectors. ANN22–24 mainly trains a complex network to model the non-linear relationships between feature vectors and activity categories. However, the high complexity of the network (objective function) usually makes the parameter tuning process time-consuming. Moreover, training can easily become trapped in a local minimum of the objective function, resulting in poor recognition ability. Therefore, it is important to design a reasonable network topology before model training. 25 The decision tree algorithm repeatedly selects the features that best differentiate the activities according to information gain.26–28 The generative methods, for example, naive Bayes (NB) classifier 29 and hidden Markov model, 30 generally construct a joint probability distribution of feature vectors and labels, and then calculate the association probability of a new feature vector and different labels. Finally, the label with the highest probability is selected as the activity recognition result.
Based on the above analysis, we can see that the current activity recognition technologies in smart environments still have some shortcomings. First, activity recognition is mostly performed offline in existing works. However, in real-world applications, it remains a challenge to realize real-time activity recognition based on the online streaming sensor data. Second, sensor data cannot be divided into different segments according to manually assigned class labels, since manual labeling in real-time environments is impossible. In addition, when processing the latest sensor data in real-time applications, only historical data are available, while future data are not available, which is very different from offline data processing techniques.
In this work, we propose a novel approach for online real-time activity recognition in smart homes. This approach only depends on historical data when analyzing the latest sensor data, so it is suitable for online real-time applications. Moreover, this approach adopts an advanced feature vector extraction technique enhanced by mutual information between sensors, which can effectively reduce the impacts of irrelevant information contained in a window of sensor events and further improve the accuracy of activity recognition. Finally, a genetic algorithm–optimized multi-class SVM classifier is used to realize activity recognition. Here, the genetic algorithm can automatically select optimal hyperparameters for the SVM classifier, thereby reducing the inaccuracy and inefficiency caused by manual tuning of hyperparameters.
Preliminaries
This section first introduces the SVM theory for classification, and then presents the problem statement and an overview of the proposed approach for activity recognition.
Introduction to SVMs
SVM31–35 is a popular machine learning algorithm that provides solutions for classification and regression problems. Here, we mainly focus on the classification problem. Given a group of training samples, SVM aims to find the training cases that lie on the class boundaries, that is, the support vectors. These support vectors can determine an optimal separating hyperplane (OSH) between different classes. In other words, only the training cases that lie on the class boundaries are necessary for discrimination and other training samples can be discarded.
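For orientation, the hard-margin problem described above can be written in the common textbook form below. The symbols ($\mathbf{x}_i$, $y_i$, $\mathbf{w}$, $b$, $\alpha_i$, $K$, $\phi$) are the usual textbook ones and are an assumption here; they may differ from the notation used in the remainder of this section.

```latex
% Training set: (x_i, y_i), i = 1, ..., n, with labels y_i in {-1, +1}.
% Maximizing the margin 2 / ||w|| is equivalent to:
\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1, \qquad i = 1,\dots,n

% Solution in terms of the Lagrange multipliers \alpha_i,
% which are non-zero only for the support vectors:
\mathbf{w} = \sum_{i=1}^{n} \alpha_i\, y_i\, \mathbf{x}_i,
\qquad
b = y_k - \mathbf{w}\cdot\mathbf{x}_k \ \text{ for any support vector } \mathbf{x}_k

% Kernelized decision function, with K(x_i, x) = \phi(x_i) \cdot \phi(x):
f(\mathbf{x}) = \operatorname{sign}\!\Big(\sum_{i=1}^{n} \alpha_i\, y_i\, K(\mathbf{x}_i, \mathbf{x}) + b\Big)
```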
Suppose that a training set of
SVM aims to maximize the discrimination margin
under the constraint denoted by
Solving this quadratic problem yields the hyperplane parameter as follows
and
where
The entire procedure can be generalized to nonlinearly separable training samples. These samples should be mapped into a high-dimensional space, thus to yield a linear OSH. Suppose that a mapping function
thereby leading to the following decision function
The commonly used kernel functions 36 include linear kernel, radial basis function (RBF) kernel, polynomial kernel, and sigmoid kernel.
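As a minimal sketch, the four kernels listed above can be written as plain Python functions. The parameter names `gamma`, `coef0`, and `degree` and their default values follow a common LIBSVM-style parameterization and are assumptions, not values taken from this article.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    # K(u, v) = u . v
    return dot(u, v)


def rbf_kernel(u, v, gamma=0.5):
    # K(u, v) = exp(-gamma * ||u - v||^2)
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def polynomial_kernel(u, v, gamma=1.0, coef0=1.0, degree=3):
    # K(u, v) = (gamma * u . v + coef0) ^ degree
    return (gamma * dot(u, v) + coef0) ** degree


def sigmoid_kernel(u, v, gamma=0.1, coef0=0.0):
    # K(u, v) = tanh(gamma * u . v + coef0)
    return math.tanh(gamma * dot(u, v) + coef0)
```

Only the polynomial kernel uses `degree`, which is why that hyperparameter matters only when the polynomial kernel is chosen.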
SVMs were originally designed for binary classification, but can be extended for multi-class classification. A multi-class classification problem should be first reduced to a group of binary classification problems, then the basic SVM approach can be applied. There are generally two main approaches for multi-class classification: “one-against-all” and “one-against-one.” The former trains a group of binary SVM classifiers, each separating one class from the rest. In this approach,
Problem statement and approach overview
Problem statement
We aim to realize online real-time activity recognition in health smart homes using an optimized SVM classifier. Suppose that a total of
Approach overview
In order to address the problem stated above, we propose a novel activity recognition approach based on genetic algorithm–optimized SVM. Procedures involved in this approach are presented in Figure 1. First, the collected sensor events are divided into a sequence of overlapping sliding windows. Then, a feature vector amended by mutual information between sensors is extracted from each sliding window. Afterward, a group of feature vectors with manually labeled categories are used to train a multi-class SVM classifier, and the classification accuracy of the trained model is tested on a test set. In addition, the genetic algorithm is employed to automatically select optimal hyperparameters for the SVM model. Finally, the trained model is used to classify newly generated sensor events to realize real-time activity recognition.

Figure 1. Approach overview.
Activity recognition based on genetic algorithm–optimized SVM classifier
In this section, we elaborate on the proposed activity recognition approach, mainly including sensor event sequence segmentation, feature extraction, and genetic algorithm–optimized SVM for activity recognition.
Sensor event sequence segmentation
In order to build a training set, the collected sensor event sequence should be first segmented into a series of fixed-length overlapping sliding windows from which feature vectors are extracted. Formally, a sequence of
Window lengths of different activities.
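The segmentation step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: events are represented as hypothetical `(sensor_id, activity_label)` pairs, the window advances one event at a time, and each window takes the label of its last event.

```python
from typing import List, Sequence, Tuple

# Hypothetical record layout: (sensor_id, activity_label)
Event = Tuple[str, str]


def sliding_windows(events: Sequence[Event], length: int) -> List[Tuple[List[str], str]]:
    """Cut an event sequence into overlapping fixed-length windows.

    Each window ends at one event and is labeled with that event's
    activity, so only past events are needed to classify the newest one.
    """
    windows = []
    for end in range(length - 1, len(events)):
        chunk = events[end - length + 1 : end + 1]
        sensors = [sensor for sensor, _ in chunk]
        label = chunk[-1][1]  # label of the last event in the window
        windows.append((sensors, label))
    return windows


events = [
    ("M1", "Leaving"),
    ("D1", "Leaving"),
    ("M2", "Other_Activity"),
    ("M3", "Toilet"),
    ("M3", "Toilet"),
]
wins = sliding_windows(events, length=3)
```

Because a window depends only on the current event and the events before it, the same function can be applied to a live stream as each new event arrives.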
Feature extraction
After obtaining a set of sliding windows, we can extract a feature vector from each window. Activity recognition aims to build a mapping relationship between feature vectors and activity categories, so extracting appropriate feature vectors is essential for high-quality activity recognition. Traditionally, the number of occurrences of each sensor in a window is regarded as an important feature. However, this feature extraction technique has an obvious disadvantage. When a window contains one or more activity transitions, most sensor events in the preceding context are irrelevant to the last sensor event, so a lot of irrelevant sensor information will be incorporated into this feature. Figure 2 illustrates a sample window containing two activity transitions from “Leaving” to “OtherActivity” and from “OtherActivity” to “Toilet.” Note that the class label of the last sensor event in the window is “Toilet,” so all the other sensor events in the window are considered as the preceding context of the “Toilet” activity. However, most sensor events in the window belong to the “Leaving” or the “OtherActivity” categories, both of which are irrelevant to the “Toilet” activity. Therefore, using the number of occurrences of each sensor as a feature is likely to result in misclassification of the last sensor event.

Figure 2. A sample window containing activity transitions.
In order to alleviate this problem, we use mutual information between sensors to amend the number of occurrences of each sensor.37,38 Mutual information measures the interdependence between two random variables. Here, the mutual information between two sensors is defined as the probability that the two sensors appear next to each other in the entire sensor event sequence. Supposing that there are
where
If two sensors frequently appear next to each other, their mutual information is relatively high. Therefore, the mutual information between sensors can be used to measure the correlation between a preceding sensor event and the last sensor event in a sliding window. Accordingly, the amended feature vector
where
where
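One way to realize the idea described in this subsection is sketched below. It is a hedged illustration, not the authors' exact formula: mutual information is estimated as the fraction of adjacent event pairs in the whole sequence that involve the two sensors (in either order, which is an assumption), and each event in a window contributes its mutual information with the window's last sensor instead of a plain count of 1. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def adjacency_mutual_information(
    sequence: Sequence[str], sensors: Sequence[str]
) -> Dict[Tuple[str, str], float]:
    """Fraction of adjacent event pairs in which sensors i and j are neighbors."""
    pairs = Counter()
    for a, b in zip(sequence, sequence[1:]):
        pairs[(a, b)] += 1
        if a != b:
            pairs[(b, a)] += 1  # assumption: treat the measure as symmetric
    total = max(len(sequence) - 1, 1)
    return {(i, j): pairs[(i, j)] / total for i in sensors for j in sensors}


def weighted_features(
    window: Sequence[str],
    sensors: Sequence[str],
    mi: Dict[Tuple[str, str], float],
) -> List[float]:
    """Sensor counts in a window, each event weighted by its MI with the last sensor."""
    last = window[-1]
    features = {s: 0.0 for s in sensors}
    for s in window:
        features[s] += mi[(last, s)]
    return [features[s] for s in sensors]


sensors = ["M1", "M2", "M3"]
history = ["M1", "M2", "M1", "M2", "M3"]
mi = adjacency_mutual_information(history, sensors)
vector = weighted_features(["M1", "M2", "M3"], sensors, mi)
```

In this toy sequence the plain count vector for the window would be `[1, 1, 1]`, whereas the weighted vector suppresses `M1`, which never fires next to the last sensor `M3`. Note that under this particular estimate a sensor that never fires twice in a row also receives zero weight with itself; a real implementation would need to decide how to treat that case.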
Genetic algorithm–optimized SVM for activity recognition
The SVM model has four hyperparameters: kernel function, penalty coefficient, kernel function coefficient, and polynomial degree. The last parameter is only useful when a polynomial kernel function is used. Different parameter configurations will undoubtedly affect the accuracy of the activity recognition model. Therefore, the genetic algorithm is used to automatically select the optimal hyperparameters for the SVM classifier.
The genetic algorithm 39 is a well-known heuristic search algorithm, which is good at handling large-scale and high-dimensional search problems. It starts by selecting the fittest individuals from an initial population, where each individual represents a candidate solution to the selection problem. Then, the reproduction, crossover, and mutation operators are applied to the selected individuals to produce offspring with higher fitness values. This process iterates over the search space until a termination condition is met. In this work, the genetic algorithm is used to select optimal hyperparameters for the SVM model.
Before performing the genetic algorithm, each hyperparameter should be encoded by a consecutive series of binary genes on a chromosome. For example, the hyperparameter “kernel function” occupies two consecutive genes, which can encode four different values corresponding to the four candidate kernel functions. Then, the initial population is randomly generated and the fitness value of each chromosome is calculated by
where
As mentioned above, the genetic algorithm includes three fundamental operators: reproduction, crossover, and mutation. The reproduction operator copies the chromosomes with the highest fitness values from the current generation to the next one. It is an elite strategy aimed at producing better solutions for the next generation based on the high-quality chromosomes of the current generation. Crossover is the most important operator. It first selects chromosomes with the highest fitness values as parents and then swaps the parents’ sections after a selected crossover point to produce two offspring. The mutation operator ensures diversification of solutions and helps avoid a local optimum by randomly changing some genes on a chromosome. The three operators are repeated for many generations until a termination condition is satisfied. Once the optimal hyperparameters are determined, the corresponding SVM classifier can be used for activity recognition.
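The encoding and the three operators can be sketched in a few lines of standard-library Python. Several details are assumptions made for illustration: the 12-bit chromosome layout, the value ranges for the penalty coefficient `C` and kernel coefficient `gamma`, and `toy_fitness`, which is a hypothetical stand-in for the fitness of a trained SVM. The population size, number of generations, parent ratio, and mutation probability default to the settings reported in the experiments.

```python
import math
import random

KERNELS = ["linear", "rbf", "poly", "sigmoid"]
# Hypothetical layout: 2 bits kernel, 4 bits C, 4 bits gamma, 2 bits degree
N_BITS = 12


def bits_to_int(bits):
    value = 0
    for b in bits:
        value = value * 2 + b
    return value


def decode(chrom):
    """Map a binary chromosome to SVM hyperparameters (ranges are assumptions)."""
    return {
        "kernel": KERNELS[bits_to_int(chrom[0:2])],
        "C": 2.0 ** (bits_to_int(chrom[2:6]) - 5),        # 2^-5 .. 2^10
        "gamma": 2.0 ** (bits_to_int(chrom[6:10]) - 12),  # 2^-12 .. 2^3
        "degree": bits_to_int(chrom[10:12]) + 2,          # 2 .. 5
    }


def run_ga(fitness, pop_size=20, generations=20, parent_ratio=0.6,
           mutation_rate=0.01, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(N_BITS)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: max(2, round(parent_ratio * pop_size))]
        next_gen = [ranked[0][:]]  # reproduction: keep the fittest unchanged
        while len(next_gen) < pop_size:
            mother, father = rng.sample(parents, 2)
            point = rng.randint(1, N_BITS - 1)  # single crossover point
            for child in (mother[:point] + father[point:],
                          father[:point] + mother[point:]):
                # mutation: flip each gene with a small probability
                child = [1 - g if rng.random() < mutation_rate else g
                         for g in child]
                if len(next_gen) < pop_size:
                    next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)


def toy_fitness(chrom):
    """Stand-in for validation accuracy; peaks at rbf, C = 8, gamma = 2^-3."""
    p = decode(chrom)
    score = 1.0 if p["kernel"] == "rbf" else 0.0
    score -= abs(math.log2(p["C"]) - 3) / 15
    score -= abs(math.log2(p["gamma"]) + 3) / 15
    return score


best = run_ga(toy_fitness)
best_params = decode(best)
```

In practice `toy_fitness` would be replaced by a function that trains an SVM with the decoded hyperparameters and returns its score on held-out data, which makes each fitness evaluation far more expensive than in this sketch.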
Experiments and discussion
In this section, we conduct a series of experiments on a real-world data set to validate the effectiveness of the proposed activity recognition approach.
Data set
We use a freely available data set “Human Activity Recognition from Continuous Ambient Sensor Data” (https://archive.ics.uci.edu/ml/datasets.php) provided by the Center for Advanced Studies in Adaptive Systems (CASAS) at Washington State University 15 for experiments. The entire data set records the daily activities of 15 volunteers in 15 smart homes within a month, resulting in 15 sub–data sets: CSH101–CSH115. In the CASAS project, motion sensors, door sensors, light sensors, temperature sensors, and other kinds of sensors are deployed in locations throughout a smart home for activity and environment monitoring. In our experiments, for simplicity, only the data generated by motion sensors and door sensors are used.
Data preprocessing
We analyzed the common activities in the 15 data sets, then merged some similar activities and finally obtained 11 activity categories. For example, the three activities “Eating Breakfast,” “Eating lunch,” and “Eating dinner” can be merged into the “Eat” activity. The final 11 activity categories are “Bathe,” “Bed_Toilet_Transition,” “Cook,” “Eat,” “Leaving,” “Personal_Hygiene,” “Sleep,” “Toilet,” “Wash_Dishes,” “Work_At_Table,” and “Other_Activity.” The distribution of different activities in the first data set CSH101 is shown in Figure 3. The distribution of the categories is clearly not uniform, so a robust SVM method that handles unbalanced classes by means of class weights is employed. 40

Figure 3. Distribution of activities in the CSH101 data set.
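To illustrate how class weights can compensate for an uneven activity distribution, the sketch below uses the inverse-frequency heuristic n / (k × count), where n is the number of samples and k the number of classes. This is one common choice and is an assumption here; the weighting used by the cited method 40 may differ. The label counts are made up for the example.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n / (k * count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


# Made-up label counts, for illustration only
labels = ["Sleep"] * 6 + ["Cook"] * 3 + ["Toilet"] * 1
weights = balanced_class_weights(labels)
```

Rare classes such as "Toilet" in this toy example receive larger weights, so misclassifying them costs more during training than misclassifying a frequent class.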
Evaluation metric
In our experiments, we use three metrics: macro-precision
where
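A minimal sketch of macro-averaged metrics follows. It assumes that the three metrics are macro-precision, macro-recall, and a macro-averaged F1 score, and that the macro F1 is the unweighted mean of the per-class F1 values; another common convention takes the harmonic mean of macro-precision and macro-recall instead. The labels in the example are made up.

```python
def macro_scores(actual, predicted):
    """Return (macro-precision, macro-recall, macro-F1) over all classes."""
    classes = sorted(set(actual) | set(predicted))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        pairs = list(zip(actual, predicted))
        tp = sum(1 for a, p in pairs if a == c and p == c)
        fp = sum(1 for a, p in pairs if a != c and p == c)
        fn = sum(1 for a, p in pairs if a == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


actual = ["Eat", "Eat", "Sleep", "Sleep", "Cook"]
predicted = ["Eat", "Sleep", "Sleep", "Sleep", "Cook"]
macro_p, macro_r, macro_f1 = macro_scores(actual, predicted)
```

Because every class contributes equally to a macro average, rare activities influence the score as much as frequent ones, which suits an unbalanced data set.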
Performance comparison
In order to evaluate the performance of the proposed approach, a repeated random-split form of cross-validation is used. Specifically, 70% of the data set is randomly selected for model training and the remaining 30% is used for testing. This procedure is repeated 10 times and the average results are reported.
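The evaluation protocol can be sketched as a repeated random hold-out loop. The function name `repeated_holdout` and the `evaluate` callback are hypothetical; `evaluate` stands for training a model on the first part and scoring it on the second.

```python
import random


def repeated_holdout(samples, evaluate, train_ratio=0.7, repeats=10, seed=0):
    """Average evaluate(train, test) over several random train/test splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        shuffled = list(samples)
        rng.shuffle(shuffled)
        cut = round(train_ratio * len(shuffled))
        scores.append(evaluate(shuffled[:cut], shuffled[cut:]))
    return sum(scores) / len(scores)


# Placeholder callback that only reports the training-set size
average_train_size = repeated_holdout(list(range(100)), lambda train, test: len(train))
```

Averaging over several random splits reduces the chance that a single lucky or unlucky partition dominates the reported result.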
First, in order to verify the effectiveness of the sensor mutual information-amended feature extraction technique, the performances (

Performance comparison of different feature extraction methods.
When performing the genetic algorithm for automatic hyperparameter selection, 60% of individuals in each generation are selected as parents, and the mutation probability is set to 1%. The population size of each generation is set to 20 and the maximum number of iterations is set to 20. We performed the genetic algorithm on the CSH101 data set as an example. The average fitness value of each generation is depicted in Figure 5. The fitness curve stabilizes after 12 iterations, so we take the genetic algorithm to have converged. The chromosome with the highest fitness value in the final generation represents the selected hyperparameters of the SVM classifier.

Figure 5. Iteration of genetic algorithm.
Figure 6 illustrates the confusion matrix of the SVM model for activity recognition on the CSH101 data set. The x-axis and the y-axis denote the predicted and the actual activity class labels, respectively. The saturation of the (

Figure 6. Confusion matrix of the SVM model.
In addition, the precisions, recalls, and
Performance of SVM on different activity classes.
SVM: support vector machine.
Next, in order to investigate the impact of the training set size on the generalization ability of the SVM classifier, we conducted experiments on a series of training sets of different sizes. The corresponding experimental results are reported in Figure 7. The x-axis denotes the ratio of training samples randomly selected from the CSH101 data set, and the y-axis represents the macro-

Macro-
In addition, the impacts of the training set size on the model training time and the recognition accuracy on the test set are depicted in Figure 8. The training time increases approximately linearly with the size of the training set. The macro-

Figure 8. Balance between training time and classification accuracy.
Figure 9 shows the performance comparison of five different activity recognition models: the genetic algorithm–optimized SVM proposed in this work, back-propagation–artificial neural network (BP-ANN), logistic regression (LR), NB, and C4.5 decision tree. BP-ANN employs a classic three-layer BP neural network. LR uses a multi-class logistic regression model based on the “one vs one” strategy. NB is based on a Gaussian model. C4.5 is a decision tree–based classification algorithm. The experimental results show that the classification accuracy of SVM is significantly better than that of LR, NB, and C4.5. BP-ANN and SVM yield comparable classification accuracy, but the training time of BP-ANN is usually longer than that of SVM.

Figure 9. Performance comparison of different activity recognition algorithms.
Figure 10 reports the macro-

Figure 10. Activity recognition performance of the proposed approach on different data sets.
Conclusion
In this work, we proposed a real-time activity recognition approach based on the genetic algorithm–optimized SVM classifier. Mutual information between sensors is utilized to amend feature vectors, thereby reducing the impact of irrelevant information contained in a sliding window of sensor events and further improving the accuracy of activity recognition. In addition, the SVM classifier is enhanced by the genetic algorithm for automatic hyperparameter selection, thereby avoiding the costly manual selection of hyperparameters. This approach can realize high-quality real-time activity recognition for elderly people in smart home environments and allow them to live more safely and independently at home.
However, the work presented in this article still has some limitations. First, training an SVM classifier requires a large number of labeled data samples, which makes manual labeling a costly process. In addition, each smart home has to train a specific SVM classifier for activity recognition, since people in different environments have different activity patterns. In other words, it is difficult to share a common activity recognition model between different smart environments. In future work, we will try to use transfer learning to address these problems, so as to realize knowledge sharing between different environments, reduce the burden of manual labeling, and further improve the efficiency of activity recognition.
Footnotes
Acknowledgements
The authors thank the reviewers for their helpful comments and suggestions for the improvement of this paper.
Handling Editor: Francesc Pozo
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported, in part, by the National Natural Science Foundation of China under grant 61802016 and 61702506; in part, by the Interdisciplinary Research Project for Young Teachers of USTB (Fundamental Research Funds for the Central Universities) under grant FRF-IDRY-19-016; in part, by National Key Research and Development Project under grant 2017YFB0802805; and in part, by the National Social Science Foundation of China under grant 17ZDA331.
