Log-Viterbi algorithm applied on second-order hidden Markov model for human activity recognition

Abstract

Recognition of human activities is attracting growing attention from researchers in pervasive computing, ambient intelligence, robotics, and monitoring applications such as assisted living, elderly care, and health care. Many platforms, models, and algorithms have been developed and implemented to recognize human activities. However, existing approaches suffer from low recognition accuracy and high time complexity. Therefore, we propose a probabilistic log-Viterbi algorithm on a second-order hidden Markov model that reduces time complexity while increasing accuracy. The second-order hidden Markov model relates the previous two activities, the current activity, and the current observation, which incorporates more information into the recognition procedure. The log-Viterbi algorithm converts the products of a large number of probabilities into additions and finds the most likely activity sequence for an observation sequence under a given model. This approach therefore maximizes the probability of activity recognition with improved accuracy and reduced time complexity. We compared our proposed algorithm with other well-known probabilistic models, namely Naïve Bayes, conditional random field, hidden Markov model, and hidden semi-Markov model, using three datasets from smart home environments. The proposed method is significantly better in accuracy and time complexity than the earlier methods. Moreover, the improved algorithm is effective for most dynamic environments such as assisted living, elderly care, healthcare applications, and home automation.

Keywords

Activity recognition, second-order hidden Markov model, log-Viterbi algorithm, time complexity, smart home

Introduction

People increasingly adopt independent lifestyles and emphasize quality of life. They wish to have easygoing, assisted living. Thus, systems with the intelligence to provide various services in accordance with the user’s preferences should be developed. To provide intelligent services, however, the system first needs to understand and recognize human activity and behavior.^1,2 The objective of the proposed work is to predict and recognize user activities in a smart home environment more conveniently and accurately, with less complexity. If we are able to create an accurate and fast activity recognition method, the smart home can provide appropriate services in accordance with the user’s desires. A person’s activity represents that person’s functional status. Humans perform numerous activities from morning to night, such as brushing, toileting, eating, watching TV, listening to music, cooking, sleeping, going outside, and taking medication. Activity recognition in the real world is challenging because the diversity and complexity of user activities reduce recognition accuracy. Many researchers have proposed ideas and methods, but accuracy and complexity still fall short of the targets for activity recognition.

Human activity recognition has broad application areas such as elderly care,³ remote health monitoring,⁴ emergencies such as natural disasters, robotics,⁵ comfort applications in smart homes, and energy-efficient urban spaces. Among these, activity recognition in the smart home has gained popularity. A number of existing studies have focused on recognizing residents’ activities based on sensor data in the smart home.⁶ Activity recognition systems use several types of sensors, such as microphones, video cameras, RFID readers, wearable sensors, and sensors embedded in appliances, to determine the state of the physical world. We mainly focus on external sensors rather than inertial (wearable) sensors. Smartphones are also becoming popular as sensing tools for monitoring an individual’s activity. Audio- and video-based activity recognition is complicated to use because it requires processing multidimensional data and may raise privacy issues for users.^7–9

A human activity recognition system needs access to sensors, a computational process, and an inference mechanism to recognize activities.¹⁰ Accuracy and computational complexity are the main parameters considered for sustainable recognition of human activity. Inter-activity and intra-activity variations influence both accuracy and complexity. Likewise, sensor errors, variations in how activities are executed, and limited training data may affect the performance of recognition methods. Different machine learning approaches and probabilistic reasoning algorithms have been proposed for accurate recognition. Conditional random field,¹¹ Bayesian network,^12,13 hidden Markov model (HMM),¹⁴ hidden semi-Markov model (HSMM),¹⁵ and adaptive HMM¹⁶ techniques are probabilistic in nature and have been widely used for detecting the spatiotemporal aspects of the data. However, their recognition accuracy is only satisfactory and lower than it needs to be, and their computational complexity is very high for large datasets. These methods follow the data-driven approach, which has two root families: discriminative and generative. Discriminative approaches are computationally efficient and good at prediction, but they may suffer from overfitting. Generative approaches tolerate uncertainty in the data, but they require a large amount of data for recognition and prediction. Many researchers have used one of the two, but some¹⁷ have used both. In this article, we present the log-Viterbi (LoV) algorithm applied to a second-order HMM for recognizing activities in the smart home. The second-order HMM is an extension of the first-order HMM that contains additional information about the previous activity, which leads to better accuracy. The second-order HMM relies on one observation in the current state and a transition probability function based on the two previous states. Increasing the order of the HMM may raise the computational complexity, but using the logarithm function in the Viterbi algorithm counters this complexity problem.

The LoV algorithm is a dynamic programming algorithm that finds the most likely sequence of hidden states for an observation sequence by converting the products of a large number of probabilities into additions. The cost of processing the products is therefore reduced, which ultimately speeds up the system.
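The conversion from products to additions can be illustrated with a minimal sketch. The probability values below are hypothetical and chosen only to show the effect; they are not taken from the datasets used in this article.

```python
import math

# Hypothetical per-step probabilities: 2000 factors of 0.01 (illustrative only).
probs = [0.01] * 2000

# Direct product: the value 1e-4000 is below the smallest positive double,
# so the running product collapses to 0.0.
direct = 1.0
for p in probs:
    direct *= p

# Log domain: the same quantity as a sum of logarithms stays finite.
log_total = sum(math.log(p) for p in probs)

print(direct)     # 0.0
print(log_total)  # roughly -9210.34
```

Because the logarithm is monotonic, comparing two candidate paths by their summed log-probabilities selects the same path as comparing their products would.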

So far, to our knowledge, applying the LoV algorithm to the second-order HMM makes our approach novel compared with existing ones. We evaluated the activity recognition performance of our proposed method using three fully annotated real-world datasets from the Kyoto and Van Kasteren collections and compared it with other probabilistic activity recognition algorithms: Naïve Bayes (NB), conditional random field (CRF), HMM, and HSMM.

The remainder of the paper is organized as follows. Section “Related work” reviews related work on activity recognition. Section “The proposed method” presents the proposed activity recognition method and the algorithm for the whole process. Section “Evaluation and results” describes the experimental results and evaluation. Section “Time complexity” covers the time complexity, followed by the conclusion in section “Conclusion.”

Related work

Since the late 1990s, human activity recognition has been a topic of interest for many researchers worldwide. However, developing an accurate technique that meets real-world conditions poses serious challenges. Realistic data collection; the development of unobtrusive, portable, and inexpensive data acquisition systems; the design of extraction and reasoning algorithms; and the flexibility to support new and multiple users are the major challenges faced in activity recognition.¹⁸

Smart home projects such as the Center for Advanced Studies in Adaptive Systems (CASAS)¹⁹ at Washington State University (WSU), the Aware Home Research Initiative (AHRI)²⁰ at Georgia Institute of Technology, the work of Tim Van Kasteren,²¹ and PlaceLab at Massachusetts Institute of Technology (MIT)²² monitor residents’ activities and improve comfort by installing heterogeneous sensors, but their efficiency depends on a large number of sensor inputs and on processing time. In a smart home, sensors play a vital role in observing and understanding daily human activities. External sensors (pressure, light, heat, etc.) and wearable sensors (accelerometer, ECG, etc.) have been used to recognize human activities (cooking, sleeping, walking, running, etc.) in smart homes.²³ Kasteren et al. used temporal probabilistic models (NB, HMM, CRF) to recognize activities from sensor data by dividing the data into time slices of constant length, which may not suit all activities. Tapia et al.¹ used NB to learn time slices of different lengths for different activities during training. Long-range dependencies between the observations within activity segments can be modeled by integrating sequential pattern mining, which characterizes the time spans during activity execution, with the HSMM for activity recognition. Video-based depth image compensation with colored markers and joint trajectories on an HMM was implemented⁵ and achieved high accuracy, but the complexity remains high because of image processing.

In the NB classifier, an activity is recognized by assigning the activity class label with the highest probability given the sequence of sensor readings.²⁴ A dynamic Bayesian network (DBN) has been tested for recognizing activities by rebuilding already learned models, using differences in activities for an incremental learning approach. Support vector machines (SVMs),^25–27 applied to the classification of activities, consider the redundancy of activities during the feature selection process. Multilayer perceptrons^25,28 and decision trees²⁹ are also used in activity classification. The combination of a pattern clustering method such as the K-pattern clustering algorithm, an activity decision algorithm, and an artificial neural network (ANN) in an IoT-based smart home environment³⁰ showed good accuracy, but the two-step process and the complexity of IoT may increase the runtime. The decision tree algorithm also offers better interpretability. For big structured datasets, neural network techniques such as deep learning³¹ and deep belief networks are widely used. The restricted Boltzmann machine (RBM)³² and the conditional restricted Boltzmann machine (CRBM)³³ have also been presented as probabilistic modeling tools. Discriminative models such as conditional random fields (CRFs)³⁴ use an independence assumption and learn the model parameters by optimizing the conditional likelihood rather than the joint likelihood. However, CRFs can be computationally intractable and often depend on approximation techniques. Activity recognition is also used to analyze residents’ behavior in the smart home. Audio³⁵ and video^36,37 activity recognition is well suited to healthcare and monitoring. However, audio-visual methods raise privacy, complexity, and pervasiveness issues.³⁸

A deep neural network can facilitate only limited temporal modeling because it operates on a fixed-size sliding window of activity frames. The convolutional neural network (CNN), a broadly applied deep neural network, can extract features and learn multiple layers of features, but it is somewhat slower and more complicated. Long short-term memory networks are recurrent neural networks with memory units called cells that model long-range temporal dependencies in time series problems³⁹ using multimodal wearable sensors. In this work, we use the probabilistic LoV algorithm applied to a second-order HMM to reduce the time complexity of existing algorithms. It also addresses the accuracy issue.

The proposed method

The system architecture used for activity recognition is shown in Figure 1. Although the figure describes the overall system architecture, our main focus is on developing a reliable algorithm using the second-order HMM. Hence, we propose the LoV algorithm.

Figure 1.

Block diagram of activity recognition.

Raw data are taken from the sensors, and features are extracted from the sensor data. Leave-one-day-out cross-validation is used for the evaluation, in which a single day of data is used for testing. Meanwhile, 50% of the data are used as training data and the remaining 50% as test data. The LoV algorithm produces the optimal path, that is, the most likely sequence of activities, from the given model. Finally, the desired activity is recognized.

Second-order HMM

A normal (first-order) HMM assumes that the hidden state $y_{t}$ depends on the previous state $y_{t - 1}$ and emits the observed variable $x_{t}$ at time t. In a second-order HMM, however, the state at any time step t depends on the two previous states at t – 1 and t – 2, as shown in Figure 2.

Figure 2.

State transition of second-order HMM.

Unlike the transition probability distribution $a_{ij}$ of a normal HMM, the transition probability distribution of a second-order HMM is represented by $a_{ijk}$. Three elements characterize the second-order HMM:

1. The initial state distribution

π_{i} = P (y_{1} = i)

(1)

where $P (y_{1} = i)$ is the probability that i is the initial state.

2. The transition probability distribution of k given i and j

a_{ijk} = P (y_{t} = k | y_{t - 1} = j, y_{t - 2} = i)

(2)

\sum_{k = 1}^{N} a_{ijk} = 1, 1 \leq i \leq N, 1 \leq j \leq N

(3)

where N is the number of states in the model and y_t is the hidden state at time t.

3. The observation distribution, also known as the emission probability distribution, determined by

b_{k} (x_{t}) = P (x_{t} | y_{t} = k)

(4)

where $P (x_{t} | y_{t} = k)$ is the probability of observation x_t in state k, and $b_{k}$ is the observation probability.
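As an illustration of these three distributions, the following minimal sketch estimates them from labeled sequences by relative-frequency counting. The function name `estimate_second_order`, the dictionary layout, and the toy activity and sensor labels are assumptions made for this example; no smoothing is applied, so unseen events simply have no entry.

```python
from collections import Counter, defaultdict


def estimate_second_order(sequences):
    """Estimate (pi, a, b) by counting.

    sequences: list of labeled days; each day is a list of
    (state, observation) pairs in time order.
    """
    init = Counter()
    trans = defaultdict(Counter)  # (i, j) -> counts of the next state k
    emit = defaultdict(Counter)   # k -> counts of observations seen in k
    for seq in sequences:
        states = [s for s, _ in seq]
        init[states[0]] += 1
        for s, x in seq:
            emit[s][x] += 1
        # Every consecutive state triple (i, j, k) is one second-order transition.
        for i, j, k in zip(states, states[1:], states[2:]):
            trans[(i, j)][k] += 1

    def normalize(counter):
        total = sum(counter.values())
        return {key: count / total for key, count in counter.items()}

    pi = normalize(init)
    a = {pair: normalize(counts) for pair, counts in trans.items()}
    b = {state: normalize(counts) for state, counts in emit.items()}
    return pi, a, b


# Hypothetical two-day toy data.
days = [
    [("sleeping", "bed"), ("sleeping", "bed"),
     ("toileting", "flush"), ("breakfast", "fridge")],
    [("sleeping", "bed"), ("toileting", "flush"),
     ("toileting", "flush"), ("breakfast", "fridge")],
]
pi, a, b = estimate_second_order(days)
print(pi)
print(a[("sleeping", "toileting")])
```

Each row `a[(i, j)]` sums to 1 over k, which is the normalization constraint on the transition distribution stated above.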

The second-order HMM finds a hidden state sequence that maximizes the joint probability $P (x, y)$, which combines the transition probability $a_{ijk}$ and the observation probability $b_{k}$, that is, the probability that outcome x_t is observed in state $y_{t}$

\begin{matrix} P (x, y) = \prod_{i, j, k = 1}^{N} π_{i} a_{ijk} \cdot b_{k} \\ = P (y_{1}) \prod_{t = 1}^{T} P (y_{t} | y_{t - 1}, y_{t - 2}) \cdot P (x_{t} | y_{t}) \end{matrix}

(5)

The parameters that maximize this joint probability are found by simply counting the number of occurrences of transitions, observations, and states. However, the second-order HMM parameters need to be adjusted to solve the evaluation, decoding, and learning problems. New forward and backward functions are defined to address the evaluation problem, in which the probability of the observations given the model $λ = (π_{i}, a_{ijk}, b_{k})$ is calculated. The forward function $α_{t} (j, k)$ defines the probability of the partial observation sequence $x_{1}, x_{2}, \dots, x_{t}$ of $X = x_{1}, x_{2}, x_{3}, \dots, x_{T}$ together with the transition from state j to state k between times t – 1 and t

\begin{matrix} α_{t} (j, k) = P (x_{1} x_{2} x_{3} \dots x_{t}, y_{t - 1} = j, y_{t} = k | λ), \\ 2 \leq t \leq T, 1 \leq j, k \leq N \end{matrix}

(6)

$α_{t + 1} (j, k)$ can be obtained from $α_{t} (i, j)$, where $a_{ij}$ and $a_{jk}$ are the two transitions leading from state i to state k

\begin{matrix} α_{t + 1} (j, k) = \sum_{i = 1}^{N} [α_{t} (i, j) \cdot a_{ijk} \cdot b_{k} (x_{t + 1})], \\ 2 \leq t \leq T - 1, 1 \leq j, k \leq N \end{matrix}

(7)

Taking the logarithm of $α_{t + 1} (j, k)$ turns each product inside the sum into an addition

\log α_{t + 1} (j, k) = \log \sum_{i = 1}^{N} \exp [\log α_{t} (i, j) + \log a_{ijk} + \log b_{k} (x_{t + 1})]

(8)

α_{t + 1} (j, k) = \sum_{i = 1}^{N} \exp [\log α_{t} (i, j) + \log a_{ijk} + \log b_{k} (x_{t + 1})]

(9)

P (X | λ) = \sum_{j = 1}^{N} \sum_{k = 1}^{N} α_{T} (j, k)

(10)
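The evaluation step can be sketched in the log domain as follows. This is a minimal illustration, not the implementation used in the experiments: the toy parameters `PI`, `A1`, `A2`, and `B` are hypothetical, the sum over the earlier state i is computed with a log-sum-exp helper, and the recursion is assumed to start at t = 2 from $π_{i} b_{i} (x_{1}) a_{ij} b_{j} (x_{2})$ with a first-order table `A1` for the first transition.

```python
import math
from itertools import product

# Hypothetical toy model: N = 2 states, 2 observation symbols.
PI = [0.6, 0.4]                            # pi_i
A1 = [[0.7, 0.3], [0.4, 0.6]]              # a_ij, used only for t = 1 -> 2
A2 = [[[0.8, 0.2], [0.3, 0.7]],
      [[0.5, 0.5], [0.1, 0.9]]]            # a_ijk
B = [[0.9, 0.1], [0.2, 0.8]]               # b_k(x)


def logsumexp(values):
    """Stable log(sum(exp(v))) for a non-empty list of finite log values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_forward2(obs):
    """Return log P(X | lambda) for a sequence with len(obs) >= 2.

    Assumes strictly positive probabilities, since math.log(0) raises.
    """
    n = len(PI)
    # alpha[i][j] holds log alpha_2(i, j) after initialization.
    alpha = [[math.log(PI[i]) + math.log(B[i][obs[0]])
              + math.log(A1[i][j]) + math.log(B[j][obs[1]])
              for j in range(n)] for i in range(n)]
    for t in range(2, len(obs)):
        # New entry [j][k] sums over the state i two steps back.
        alpha = [[logsumexp([alpha[i][j] + math.log(A2[i][j][k])
                             for i in range(n)])
                  + math.log(B[k][obs[t]])
                  for k in range(n)] for j in range(n)]
    # Termination: sum over all final state pairs.
    return logsumexp([v for row in alpha for v in row])


# Sanity check: the probabilities of all length-4 observation sequences sum to 1.
total = sum(math.exp(log_forward2(list(x)))
            for x in product(range(2), repeat=4))
print(total)
```

Only additions and one exponentiation per summand are needed inside the recursion, and the running values stay in a range that does not underflow for long sequences.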

Likewise, the backward function $β_{t} (i, j)$ can be expressed in logarithmic form as

\begin{matrix} β_{t} (i, j) = P (x_{t + 1} x_{t + 2} x_{t + 3} \dots x_{T} | y_{t - 1} = i, y_{t} = j, λ), \\ 2 \leq t \leq T - 1, 1 \leq i, j \leq N \end{matrix}

(11)

\begin{matrix} \log β_{t} (i, j) = \log \sum_{k = 1}^{N} [β_{t + 1} (j, k) \cdot a_{ijk} \cdot b_{k} (x_{t + 1})], \\ 2 \leq t \leq T - 1, 1 \leq i, j \leq N \end{matrix}

(12)

β_{t} (i, j) = \sum_{k = 1}^{N} \exp [\log β_{t + 1} (j, k) + \log a_{ijk} + \log b_{k} (x_{t + 1})]

(13)

Proposed LoV algorithm

The decoding problem can be addressed using the Viterbi algorithm. However, its time complexity increases for the second-order model. Therefore, we use the LoV algorithm. Given the model and the observation sequence $X = x_{1}, x_{2}, x_{3}, \dots, x_{T}$ , we have to choose a corresponding state sequence $Y = y_{1}, y_{2}, y_{3}, \dots, y_{T}$ that is optimal, that is, the state sequence that best explains the observations. This optimality criterion maximizes the probability of the complete state sequence. The most likely state sequence is found using the probability of the best partial alignment ending with the transition (j, k) at times (t – 1, t). To implement this, we define the variable

\begin{matrix} δ_{t} (j, k) = \max_{y_{1}, \dots, y_{t - 2}} P (y_{1}, y_{2}, \dots, y_{t - 2}, y_{t - 1} = j, \\ y_{t} = k, x_{1} x_{2} x_{3} \dots x_{t} | λ), 2 \leq t \leq T, 1 \leq j, k \leq N \end{matrix}

(14)

$δ_{t} (j, k)$ is the highest probability along a single path at time t that accounts for the first t observations and ends in state k, preceded by state j. By induction, we have

\begin{matrix} δ_{t} (j, k) = \max_{1 \leq i \leq N} [δ_{t - 1} (i, j) \cdot a_{ijk}] \cdot b_{k} (x_{t}), \\ 3 \leq t \leq T, 1 \leq j, k \leq N \end{matrix}

(15)

Taking the logarithm of $δ_{t} (j, k)$ gives

\begin{matrix} \log δ_{t} (j, k) = \max_{1 \leq i \leq N} \log [δ_{t - 1} (i, j) \cdot a_{ijk}] + \log b_{k} (x_{t}), \\ 3 \leq t \leq T, 1 \leq j, k \leq N \end{matrix}

(16)

δ_{t} (j, k) = \exp [\max_{1 \leq i \leq N} [\log δ_{t - 1} (i, j) + \log a_{ijk}] + \log b_{k} (x_{t})]

(17)

δ_{2} (i, j) = \exp [\log π_{i} + \log a_{ij} + \log b_{j} (x_{2})], 1 \leq i, j \leq N

(18)

\begin{matrix} φ_{3} (i, j) = 0 \\ δ_{t} (j, k) = \exp [\max_{1 \leq i \leq N} [\log δ_{t - 1} (i, j) + \log a_{ijk}] + \log b_{k} (x_{t})], \\ 3 \leq t \leq T, 1 \leq j, k \leq N \end{matrix}

(19)

We also keep track of the most likely preceding state i for each state pair (j, k) that the most probable path may end in

\begin{matrix} φ_{t} (j, k) = \arg\max_{1 \leq i \leq N} [\log δ_{t - 1} (i, j) + \log a_{ijk}], \\ 3 \leq t \leq T, 1 \leq j, k \leq N \end{matrix}

(20)

We terminate by computing the most probable final state pair $(y_{T - 1}^{*}, y_{T}^{*})$

(y_{T - 1}^{*}, y_{T}^{*}) = \arg\max_{1 \leq j, k \leq N} [δ_{T} (j, k)]

(21)

We can then compute the most probable sequence of states by backtracking (traceback)

y_{t}^{*} = φ_{t + 2} (y_{t + 1}^{*}, y_{t + 2}^{*}), t = T - 2, T - 3, \dots, 1

(22)
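The decoding steps can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the toy parameters `PI`, `A1`, `A2`, and `B` are hypothetical, scores are kept as log-probabilities throughout (no exponentiation is needed to compare paths), the initialization at t = 2 also includes the emission of the first observation, and the back-pointer stores the best state two steps back for each state pair.

```python
import math
from itertools import product

# Hypothetical toy model: N = 2 states, 2 observation symbols.
PI = [0.6, 0.4]                            # pi_i
A1 = [[0.7, 0.3], [0.4, 0.6]]              # a_ij, used only for t = 1 -> 2
A2 = [[[0.8, 0.2], [0.3, 0.7]],
      [[0.5, 0.5], [0.1, 0.9]]]            # a_ijk
B = [[0.9, 0.1], [0.2, 0.8]]               # b_k(x)


def path_log_score(states, obs):
    """Joint log-probability of one state path and its observations."""
    s = math.log(PI[states[0]]) + math.log(B[states[0]][obs[0]])
    s += math.log(A1[states[0]][states[1]]) + math.log(B[states[1]][obs[1]])
    for t in range(2, len(obs)):
        i, j, k = states[t - 2], states[t - 1], states[t]
        s += math.log(A2[i][j][k]) + math.log(B[k][obs[t]])
    return s


def log_viterbi2(obs):
    """Return (best_path, best_log_score) for a sequence with len(obs) >= 2."""
    n = len(PI)
    # Initialization: log-score of every state pair (i, j) at t = 2.
    delta = [[path_log_score([i, j], obs[:2]) for j in range(n)]
             for i in range(n)]
    backptrs = []
    for t in range(2, len(obs)):
        new_delta = [[0.0] * n for _ in range(n)]
        ptr = [[0] * n for _ in range(n)]
        for j in range(n):
            for k in range(n):
                scores = [delta[i][j] + math.log(A2[i][j][k]) for i in range(n)]
                best_i = max(range(n), key=scores.__getitem__)
                ptr[j][k] = best_i
                new_delta[j][k] = scores[best_i] + math.log(B[k][obs[t]])
        delta = new_delta
        backptrs.append(ptr)
    # Termination: best final state pair.
    j, k = max(product(range(n), repeat=2), key=lambda jk: delta[jk[0]][jk[1]])
    best = delta[j][k]
    # Backtracking: each pointer gives the state two steps before the pair.
    path = [j, k]
    for ptr in reversed(backptrs):
        path.insert(0, ptr[path[0]][path[1]])
    return path, best


obs = [0, 0, 1, 1, 0]
path, best = log_viterbi2(obs)
# Brute-force check over all 2**5 state paths.
brute = max(path_log_score(list(ys), obs)
            for ys in product(range(2), repeat=len(obs)))
print(path, best, brute)
```

The brute-force maximum over all state paths matches the score returned by the dynamic program on this toy input, which is a useful sanity check when adapting the sketch to real parameters.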

Maximum likelihood estimation

The Baum–Welch algorithm is used³⁷ to solve the learning problem, that is, to maximize the likelihood of the training observations under the model. In the Baum–Welch algorithm for the second-order HMM, the forward-backward algorithm determines the expected number of state transitions and emissions based on the current model. These expected counts are used to re-estimate the model parameters with the estimation formulas in each iteration. The iteration continues until convergence to a stationary point. From the definition of the forward-backward algorithm, the probability of being in state j at time t and state k at time t + 1, given the model λ, is

ξ_{t} (j, k) = \frac{α_{t} (j) \cdot a_{ijk} \cdot b_{k} (x_{t + 1}) \cdot β_{t + 1} (j)}{\sum_{j = 1}^{N} \sum_{k = 1}^{N} α_{t} (j) \cdot a_{ijk} \cdot b_{k} (x_{t + 1}) \cdot β_{t + 1} (j)}

(23)

The probability of being in state j at time t, given the observation sequence X and the model λ, is

γ_{t} (j) = \sum_{k = 1}^{N} ξ_{t} (j, k)

(24)

The maximum likelihood model parameters can be re-estimated as

{\bar{π}}_{i} = γ_{1} (i)

(25)

{\bar{a}}_{ij} = \frac{\sum_{t = 1}^{T - 1} ξ_{t} (i, j)}{\sum_{t = 1}^{T - 1} γ_{t} (i)}

(26)

{\bar{a}}_{ijk} = \frac{\sum_{t = 1}^{T - 2} ξ_{t} (j, k)}{\sum_{t = 1}^{T - 2} γ_{t} (j)}

(27)

{\bar{b}}_{k} = \frac{\sum_{t = 1}^{T} γ_{t} (k) δ_{t} y_{t} (k)}{\sum_{t = 1}^{T} γ_{t} (j)}

(28)

Evaluation and results

In our experiments, the states are the activities and the observations are the sensor data. We evaluate the proposed recognition model using three publicly available real-world datasets. The results of our approach are compared with NB, HMM, CRF, and HSMM. Furthermore, we present an activity-level performance analysis through confusion matrices. Leave-one-day-out cross-validation is used for evaluation, in which a single day of data is used for testing and the remaining days for training. The process is repeated for all days, and the average is calculated. The evaluation measures are F-score and accuracy. Finally, the time complexity is calculated from the experimental processing.
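The leave-one-day-out protocol described above can be sketched as follows. The function name `leave_one_day_out` and the toy day labels and samples are hypothetical; in a real run, each fold would train the model on `train`, score it on `test`, and the per-fold scores would then be averaged.

```python
def leave_one_day_out(days):
    """Yield (held_out_day, train_samples, test_samples) once per day.

    days: dict mapping a day label to that day's list of labeled samples.
    """
    for held_out in days:
        train = [sample
                 for day, samples in days.items() if day != held_out
                 for sample in samples]
        test = list(days[held_out])
        yield held_out, train, test


# Hypothetical data: three days of (sensor_event, activity) pairs.
days = {
    "day1": [("bed", "sleeping"), ("flush", "toileting")],
    "day2": [("fridge", "breakfast")],
    "day3": [("bed", "sleeping"), ("fridge", "dinner"), ("door", "leaving")],
}

folds = list(leave_one_day_out(days))
fold_sizes = [(day, len(train), len(test)) for day, train, test in folds]
print(fold_sizes)  # [('day1', 4, 2), ('day2', 5, 1), ('day3', 3, 3)]
```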

LoV algorithm

1. Input: Y is the set of activities (states); $X$ is the set of sensor observations
2. Initialize: initial distribution π, transition distribution A, emission distribution B
3. for y_t $\in$ Y do
     calculate the initial probability $α_{1} (i) \leftarrow π_{i} b_{i} (x_{1})$
4. end for
5. do
   for y_t $\in$ Y do
     calculate the forward probability at time t + 1, $α_{t + 1} (j, k) \leftarrow \sum_{i = 1}^{N} \exp [\log α_{t} (i, j) + \log a_{ijk} + \log b_{k} (x_{t + 1})]$, where $2 \leq t \leq T - 1, 1 \leq j, k \leq N$
   end for
   end for
6. for y_t $\in$ Y do
     calculate the backward probability at t from the given value at t + 1, $β_{t} (i, j) \leftarrow \sum_{k = 1}^{N} \exp [\log β_{t + 1} (j, k) + \log a_{ijk} + \log b_{k} (x_{t + 1})]$, where $2 \leq t \leq T - 1, 1 \leq i, j \leq N$
   end for
7. for t = 3 to T do
8.   for y_t $\in$ Y do
       calculate the state sequence, $δ_{t} (j, k) \leftarrow \exp [\max_{1 \leq i \leq N} [\log δ_{t - 1} (i, j) + \log a_{ijk}] + \log b_{k} (x_{t})]$
       calculate the most probable path, $φ_{t} (j, k) \leftarrow \arg\max_{1 \leq i \leq N} [\log δ_{t - 1} (i, j) + \log a_{ijk}]$
9.   end for
10. end for
    terminate by computing the most probable final state pair $(y_{T - 1}^{*}, y_{T}^{*}) \leftarrow \arg\max_{1 \leq j, k \leq N} [δ_{T} (j, k)]$
11. for t = T - 2 down to 1 do
      backtracking $y_{t}^{*} \leftarrow φ_{t + 2} (y_{t + 1}^{*}, y_{t + 2}^{*})$
12. end for
13. for y_t $\in$ Y do
14.   for t = 1 to T - 1 do
15.     max = 0
        probability from the forward-backward algorithm $ξ_{t} (j, k) \leftarrow \frac{α_{t} (j) \cdot a_{ijk} \cdot b_{k} (x_{t + 1}) \cdot β_{t + 1} (j)}{\sum_{j = 1}^{N} \sum_{k = 1}^{N} α_{t} (j) \cdot a_{ijk} \cdot b_{k} (x_{t + 1}) \cdot β_{t + 1} (j)}$
16.   end for
      $γ_{t} (j) \leftarrow \sum_{k = 1}^{N} ξ_{t} (j, k)$
17. end for
18. update: obtain the parameters of the hidden Markov model
    $π = {\bar{π}}_{i} \leftarrow γ_{1} (i)$
    ${\bar{a}}_{ij} \leftarrow \frac{\sum_{t = 1}^{T - 1} ξ_{t} (i, j)}{\sum_{t = 1}^{T - 1} γ_{t} (i)}$
    $A = {\bar{a}}_{ijk} \leftarrow \frac{\sum_{t = 1}^{T - 2} ξ_{t} (j, k)}{\sum_{t = 1}^{T - 2} γ_{t} (j)}$
    $B = b_{k} \leftarrow \frac{\sum_{t = 1}^{T} γ_{t} (k) δ_{t} y_{t} (k)}{\sum_{t = 1}^{T} γ_{t} (j)}$
19. end for
20. end for
21. end

Evaluation terminology

We define the evaluation terminology as precision, recall, F-score, and accuracy. F-score is the harmonic mean of precision and recall. These measures are computed from the true positives (TP), false positives (FP), and false negatives (FN) in the confusion matrix

F\text{-score} = \frac{2 \times Precision \times Recall}{Precision + Recall}

Precision = \frac{TP}{TP + FP} \times 100

Recall = \frac{TP}{TP + FN} \times 100

Accuracy = \frac{TP}{N} \times 100

N denotes the number of states. F-score and accuracy values close to 1 (that is, 100%) indicate good performance of the proposed method; lower values indicate degraded performance.

Datasets

Many smart home datasets are publicly available; among them, the Kasteren and WSU CASAS datasets are widely used for evaluating activity recognition in smart home projects. An overview of the Kasteren and WSU CASAS datasets is shown in Table 1. The Kasteren datasets contain three smart home datasets (Kasteren-House-A, Kasteren-House-B, and Kasteren-House-C), of which we chose two. Kyoto1 is the one we selected from the many WSU CASAS datasets. Each Kasteren dataset was collected from activities performed by one resident. WSU CASAS, however, comprises multiple residents, which means that the WSU CASAS datasets provide high inter-subject variation.

Table 1.

Dataset information used in the evaluation of proposed method.

	Kasteren-House-A	Kasteren-House-C	Kyoto1
Rooms	3	6	4
Sensors	14	21	38
Activities	10	16	5
Residents	1	1	20
Period	25 days	20 days	15 days
Instances	245	257	120
Activities performed	Leaving, toileting, showering, sleeping, breakfast, dinner, drinking	Eating, brushing teeth, shaving, medication, breakfast, dinner, relaxing, dishwashing, toileting, drinking, snacks, laundering, lunch, sleeping, leave house	Calling, washing hand, eating, cooking, cleaning

Activity recognition analysis

Table 2 presents the confusion matrix of the Kasteren-House-A activities. The proposed method recognizes activities with an average accuracy of more than 85%. Breakfast and dinner have 13% of their instances confused with each other because they are performed at the same location (kitchen). Likewise, toileting and showering share 12% of instances because they share the same location (restroom). Drinking generates the most confusion with other activities because drinking can occur concurrently with breakfast, dinner, and leaving. Sleeping has the highest accuracy, 91.3%, because no other activity is performed while sleeping. Table 3 shows the confusion matrix of the activities in the Kasteren-House-C dataset. The activities are generally recognized with high accuracy. Some activities, such as dinner, have a recognition accuracy of 79%. Dinner has 5%, 5.5%, 1.5%, 0.5%, and 4.2% of instances confused with eating, breakfast, dishwashing, lunch, and other, respectively. Of the shaving instances, 85% are recognized correctly, but 5%, 7.2%, 4.6%, and 5.1% are confused with brushing teeth, toileting, doing laundry, and other, respectively.

Table 2.

Confusion matrix for Kasteren-House-A.

	Leaving	Toileting	Showering	Sleeping	Breakfast	Dinner	Drinking	Recall
Leaving	82.5	0	0	0	0	0	17.5	82.50
Toileting	0	88	12	0	0	0	0	88.00
Showering	0	15	85	0	0	0	0	85.00
Sleeping	0	0	0	91.3	1.3	3.2	4.2	91.30
Breakfast	0	0	0	0	79.5	15.1	5.4	79.50
Dinner	0	0	0	0	13	77	10	77.00
Drinking	6.5	0	1.2	0	6.9	4.4	81	81.00
Precision	92.70	85.44	86.56	100.00	78.95	77.23	68.59

Table 3.

Confusion matrix for Kasteren-House-C.

	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	Recall
Eating (1)	72	0	0	0	8	5	0	0	0	0	5	0	7	0	0	3	72.00
Brushing teeth (2)	0	78	5	0	0	0	0	0	7	0	0	0	0	0	0	10	78.00
Shaving (3)	0	3	85	0	0	0	0	0	1.5	0	0	1.2	0	0	0	9.3	85.00
Medication (4)	0	0	0	81.6	0	0	1.9	0	0	0	0	0	0	10	0	6.5	81.60
Breakfast (5)	6	0	0	0	67	5.5	0	0	0	4.5	3	0	5	0	0	9	67.00
Dinner (6)	4.3	0	0	1	5.2	79	0	2.8	0	0	0	0	3	0	0	4.7	79.00
Relaxing (7)	0	0	0	0	0	0	92	0	0	0	0	0	0	2.7	0	5.3	92.00
Dishwashing (8)	0	2.3	0	0	0	0	0	81	5.6	3.6	0	0	0	0	0	7.5	81.00
Toileting (9)	0	7.2	0	0	0	0	0	0	78	0	0	0	0	0	0	14.8	78.00
Drinking (10)	6.2	0	0	4.3	2.1	1.5	0	0	0	77.5	1.6	0	0	0	0	6.8	77.50
Snacks (11)	2.3	0	0	1.3	2.5	0	0	1.6	0	0	78	0	5	0	0	9.3	78.00
Laundering (12)	0	0	7.2	0	0	0	0	0	0	0	0	81.8	0	0	0	11	81.80
Lunch (13)	9	0	4.6	4	0	0.5	0	0	0	4.9	2	0	69	0	0	6	69.00
Sleeping (14)	0	0	2	0	0	0	2	0	0	0	0	0	1	92	0	3	92.00
Leave house (15)	0	0	0	0	0	0	0	0	0	0	0	0	0	0	95	5	95.00
Others (16)	2.2	1.5	5.1	3	6	4.2	3.3	1.9	3.2	3.7	0	5.9	2	6.5	3	48.5	48.50
Precision	70.59	84.78	78.05	85.71	73.79	82.55	92.74	92.78	81.85	82.27	87.05	92.01	75.00	82.73	96.94	30.37

Confusion between different activities occurs because they share the same location and platform; for example, eating activities such as lunch, dinner, and breakfast commonly occur in the kitchen and share common sensors such as the gas sensor and the heat sensor. Kasteren-House-C has many confusable activities; therefore, its F-score and accuracy are lower than those of Kasteren-House-A and Kyoto1. Other similar activities such as drinking, eating, lunch, and snacks exhibit a similar trend of shared errors and mutual confusion. The Kyoto1 confusion matrix is shown in Table 4; its recognition accuracy is higher than that of the other two datasets. Calling is recognized correctly 95% of the time but still has 6.5% confusion with eating because a call can be taken while eating. Washing hand is recognized correctly 96% of the time, with 3.7% and 2% confusion with cooking and cleaning, since hands can be washed after or between cleaning and cooking. Cooking is recognized correctly 94% of the time but is still confused with calling, washing hand, eating, and cleaning at 3%, 1.6%, 4.1%, and 1%, respectively. The overall accuracy of the proposed method on Kyoto1 is 94.3% across all activities. For eating, 89.4% of instances are correctly identified, while 4% and 2% of instances are erroneously identified as washing hand and calling, and 2.3% as cooking. Although the accuracy is high, a larger dataset would be needed to find the actual recognition distribution. The Kasteren and CASAS datasets have a small number of instances; therefore, the actual distribution is easy to find. On the available datasets, our analysis shows the proposed method to be more reliable and effective than the other methods.

Table 4.

Confusion matrix for Kyoto1.

	Calling	Washing hand	Eating	Cooking	Cleaning	Recall
Calling	95	0	2	3	0	95
Washing hand	0	96	0	1.6	2.4	96
Eating	6.5	0	89.4	4.1	0	89.4
Cooking	0	3.7	2.3	94	0	94
Cleaning	0	2	0	1	97	97
Precision	93.6	94.395	95.41	90.6	97.5855
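The Recall column and Precision row of Table 4 can be recomputed from the matrix entries with a short sketch. The variable names are illustrative, and macro-averaging over the five activities is an assumption about how the per-dataset precision and recall were aggregated.

```python
labels = ["Calling", "Washing hand", "Eating", "Cooking", "Cleaning"]

# Rows: actual activity; columns: recognized activity (values from Table 4).
cm = [
    [95, 0, 2, 3, 0],
    [0, 96, 0, 1.6, 2.4],
    [6.5, 0, 89.4, 4.1, 0],
    [0, 3.7, 2.3, 94, 0],
    [0, 2, 0, 1, 97],
]
n = len(cm)

# Recall: diagonal entry divided by its row sum (TP / (TP + FN)).
recall = [100 * cm[i][i] / sum(cm[i]) for i in range(n)]

# Precision: diagonal entry divided by its column sum (TP / (TP + FP)).
precision = [100 * cm[i][i] / sum(cm[r][i] for r in range(n)) for i in range(n)]

macro_precision = sum(precision) / n
macro_recall = sum(recall) / n
f_score = (2 * macro_precision * macro_recall
           / (macro_precision + macro_recall))

for name, p, r in zip(labels, precision, recall):
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
print(f"macro precision={macro_precision:.2f} "
      f"macro recall={macro_recall:.2f} F-score={f_score:.2f}")
```

Under this assumption, the averages round to 94.33 for precision, 94.28 for recall, and 94.30 for F-score, and the per-activity precision values agree with the Precision row of Table 4 after rounding.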

Activity recognition comparison

The performance of our proposed method on the Kasteren-House-A dataset is shown in Table 5. Activities such as breakfast, dinner, and drinking, and likewise toileting and showering, are performed in the same location, which means that the location sensor is shared. Thus, there is less inter-class variation. The precision, recall, F-score, and accuracy on Kasteren-House-A obtained with our proposed method are 84.21%, 83.47%, 0.84, and 85.47%, respectively. These values are better than those of the existing approaches NB, CRF, HMM, and HSMM. On Kasteren-House-C, the precision, recall, F-score, and accuracy of the proposed method are 80.58%, 78.46%, 0.79, and 78.46%, respectively. The precision, recall, F-score, and accuracy on Kasteren-House-C are lower, which is related to its inter-class variation, but these values are still high compared with the other approaches. Table 3 shows the confusion matrix of Kasteren-House-C. Kyoto1 is a CASAS dataset that has high intra-class and inter-subject variation, and its activity classes are discriminative. Kyoto1 shows the excellent performance of our proposed method, with an F-score of 0.94 and an accuracy of 94.28%. The precision and recall on Kyoto1 are 94.33% and 94.28%, respectively, and remain higher for our proposed method than for the other four approaches. The comparison of all activity recognition results is shown in Table 5.

Table 5.

Performance evaluation of the smart home datasets with the proposed method.

Datasets	Method	Precision	Recall	F-score	Accuracy
Kasteren-House-A	NB	76.61	69.11	72.67	71.56
	CRF	79.52	72.23	75.70	75.12
	HMM	81.23	68.2	74.15	73.25
	HSMM	83.31	81.82	82.56	83.89
	Proposed method	84.21	84.21	84.21	83.47
Kyoto1	NB	78.45	73.56	75.93	74.88
	CRF	86.22	87.18	86.70	86.12
	HMM	80.23	90.55	85.08	85.01
	HSMM	89.65	95	92.25	90.26
	Proposed method	94.33	94.28	94.30	94.28
Kasteren-House-C	NB	54.36	53.55	53.95	53.12
	CRF	65.63	68.96	67.25	66.92
	HMM	66.25	64.35	65.29	65.01
	HSMM	77.36	80.23	78.77	77.32
	Proposed method	80.58	78.46	79.51	78.46

NB: Naïve Bayes; CRF: conditional random field; HMM: hidden Markov model; HSMM: hidden semi-Markov model.
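
The F-score column of Table 5 is consistent with the harmonic mean of precision and recall, F = 2PR / (P + R). The sketch below checks this for three rows copied from the table; the helper name f_score and the choice of rows are illustrative assumptions.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (both in the same unit)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# (precision, recall, reported F-score) copied from Table 5, in percent.
rows = {
    ("Kasteren-House-A", "HSMM"): (83.31, 81.82, 82.56),
    ("Kyoto1", "Proposed method"): (94.33, 94.28, 94.30),
    ("Kasteren-House-C", "Proposed method"): (80.58, 78.46, 79.51),
}

for (dataset, method), (p, r, reported) in rows.items():
    computed = f_score(p, r)
    print(f"{dataset:17s} {method:16s} computed={computed:.2f} reported={reported:.2f}")
```

Because the harmonic mean is pulled toward the smaller of its two arguments, a method cannot reach a high F-score with high precision alone.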

Activity level performance comparison

The activity-level performance of the proposed method is compared with that of the existing methods NB, HMM, CRF, and HSMM in Figure 3. We measured the F-score of each activity on the Kasteren-House-A, Kasteren-House-C, and Kyoto1 datasets individually. The individual activity performance also shows better accuracy: Kasteren-House-A, shown in Figure 3(a), reaches an accuracy of 85.65%, Kasteren-House-C reaches 80.87%, as displayed in Figure 3(c), and Kyoto1 reaches about 95.67%, as in Figure 3(b). High accuracy can still be obtained on an imbalanced dataset, in which some activities are performed much more frequently than others. The datasets used also suffer from overlapping activities, a small number of activity instances, and sensors shared between activities. The F-score is less sensitive to these drawbacks and therefore gives a more reliable measure. The proposed method has the highest F-scores, 0.84, 0.79, and 0.94 on Kasteren-House-A, Kasteren-House-C, and Kyoto1, respectively, compared with the existing methods. This shows that our method is also the least affected by the problem of intra-class and inter-class instances.
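
The difference between accuracy and F-score on imbalanced data can be illustrated with a small, purely hypothetical example. The labels, the 95/5 class split, and the trivial recognizer that always outputs the frequent activity are assumptions made for illustration only and are not taken from the datasets used here.

```python
def per_class_f_scores(actual, predicted):
    """Return {label: F-score} computed from two equal-length label lists."""
    scores = {}
    for label in sorted(set(actual) | set(predicted)):
        pairs = list(zip(actual, predicted))
        tp = sum(a == label and p == label for a, p in pairs)
        fp = sum(a != label and p == label for a, p in pairs)
        fn = sum(a == label and p != label for a, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores[label] = 2 * precision * recall / (precision + recall)
        else:
            scores[label] = 0.0
    return scores


# Hypothetical imbalanced day: 95 "sleeping" windows, 5 "toileting" windows.
actual = ["sleeping"] * 95 + ["toileting"] * 5
# A trivial recognizer that always outputs the frequent activity.
predicted = ["sleeping"] * 100

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
scores = per_class_f_scores(actual, predicted)
macro_f = sum(scores.values()) / len(scores)

print(f"accuracy={accuracy:.2f} macro F-score={macro_f:.2f}")
print(scores)
```

In this toy case the accuracy stays high although the rare activity is never recognized, whereas the per-activity F-score exposes the failure.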

Figure 3.

The F-score comparison of the proposed method with the existing approaches in the activity level performance: (a) Kasteren-House-A, (b) Kyoto1, and (c) Kasteren-House-C.

Time complexity

Time complexity arises from calculating the joint probability of hidden state sequences with the observed series of activities. The regular time complexity of HMM is O(N²L) for N states with N² state transition probabilities and 2^N output probabilities, for an output sequence of length L. The proposed method has a time complexity of O(LN² log N), which means that the time complexity is reduced by log N. The effect of the reduced time complexity is shown in Figure 4.
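
As a minimal sketch of where the N²L term of the regular HMM comes from, the following first-order Viterbi decoder works in the log domain, so that products of probabilities become sums. It is not the second-order method proposed here; the two-state model, its probabilities, the step counter, and the function name log_viterbi are illustrative assumptions.

```python
import math


def log_viterbi(log_pi, log_A, log_B, obs):
    """First-order Viterbi decoding with log probabilities.

    log_pi[i]   : log initial probability of state i
    log_A[i][j] : log transition probability from state i to state j
    log_B[i][o] : log probability of observing symbol o in state i
    obs         : list of observation symbol indices
    Returns (best_path, best_log_prob, inner_steps).
    """
    n = len(log_pi)
    delta = [log_pi[i] + log_B[i][obs[0]] for i in range(n)]
    backptr = []
    steps = 0
    for o in obs[1:]:                      # L - 1 time steps
        new_delta, ptr = [], []
        for j in range(n):                 # N target states
            best_i, best = 0, -math.inf
            for i in range(n):             # N source states
                steps += 1
                score = delta[i] + log_A[i][j]   # addition, not product
                if score > best:
                    best_i, best = i, score
            new_delta.append(best + log_B[j][o])
            ptr.append(best_i)
        delta = new_delta
        backptr.append(ptr)
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for ptr in reversed(backptr):          # follow back-pointers
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last], steps


# Toy two-state, two-symbol model (all values are assumptions).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
log_pi = [math.log(p) for p in pi]
log_A = [[math.log(p) for p in row] for row in A]
log_B = [[math.log(p) for p in row] for row in B]

path, log_prob, steps = log_viterbi(log_pi, log_A, log_B, [0, 0, 1])
print(path, math.exp(log_prob), steps)

# On a long sequence the log score stays finite although the
# corresponding probability underflows to 0.0 in double precision.
long_obs = [0, 1] * 1000
_, long_log_prob, long_steps = log_viterbi(log_pi, log_A, log_B, long_obs)
print(long_log_prob, math.exp(long_log_prob), long_steps)
```

The counter grows as (L - 1)·N², which matches the N²L order of the regular HMM. The long sequence shows the numerical reason for working with logarithms.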

Figure 4.

The time complexity comparison of proposed method with the HMM.

Conclusion

We have presented the LoV algorithm on a second-order HMM to recognize human activities in the smart home. The proposed method uses the two previous activities and the current observation to recognize the current activity. Three real datasets were used to compare the recognition performance of the proposed method with that of other recognition methods. The proposed method achieves F-scores of 0.84, 0.79, and 0.94 on Kasteren-House-A, Kasteren-House-C, and Kyoto1, respectively, which are higher than those of the existing methods. In addition, it reduces the time complexity by log N compared with the others. Therefore, both the accuracy and the time complexity of recognition are better than those of the other approaches. In future work, the accuracy may be increased further by applying machine learning schemes to learn the model parameters.

Footnotes

Acknowledgements

The work reported in this article was conducted during the sabbatical year of Kwangwoon University in 2016.

Handling Editor: Francesco Longo

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Keshav Thapa
