Abstract
Object-usage-based human activity recognition systems require activity data for learning. Acquiring such data from the real world is expensive and time consuming. To overcome these difficulties, the exploitation of web activity data is gaining popularity. However, because web data lacks much of the information present in real-world data, existing activity models are not well suited to it. In this paper, we propose a hidden Markov model- (HMM-) based activity model specially designed to use web activity data for activity recognition. It utilizes a sequence of object-usage information for activity recognition. We also propose a web activity data mining algorithm for this model. It mines data on demand and is considerably faster than existing algorithms, which take hours or days. We perform three experiments to validate the proposed model. We show that the model can be effectively utilized by an activity recognition system.
1. Introduction
Real-world activity data collection is used for engineering human activity recognition systems [1–6], but the process is cumbersome. Expertise and resources are required to design and install sensors, controllers, network components, and middleware just to perform basic data collections. As a result, very little real sensor activity data has been collected and analyzed, and only rarely is this data made available to the research community [7].
To overcome such difficulties, we need an alternative source of this data that is inexpensive and readily available. One of the most promising sources is the World Wide Web (WWW). Numerous web pages explain how to perform activities of daily living. Each of these pages provides details about an activity, such as what objects to use, how to use them, and in what sequence.
Researchers have been working to mine this data to train activity models. For example, Perkowitz et al. introduced the notion of mining generic activity models from the web. They showed that it is possible to convert web data into activity models that can be used in conjunction with RFID tags to recognize activity [8]. Wyatt et al. improved the system by introducing a model that includes idiosyncrasies of the environment in which it will be deployed [9]. Hu and Yang showed how to use web knowledge as a bridge to help link different activity label spaces in transfer learning for activity recognition [10].
Although these systems show an important direction, their activity models are not designed for web activity data. They are mostly independent and identically distributed (i.i.d.) object-usage models. The i.i.d. assumption would be sufficient for a model trained using real-world data. However, it is not sufficient when using web data, which lacks real-world information such as time. An activity model should include as much information as possible, because web data can only offer a generic view of an activity. Sarkar et al. used room locations (e.g., kitchen and living room) and objects together to build an activity model [3, 11]. They showed that the addition of extra information leads to higher recognition accuracy. However, their system would not perform well in a home with only one room, such as a studio apartment.
In this paper, we propose an activity model suited to training with web activity data. The model is based on a hidden Markov model (HMM) in which the output of a state depends on the sequence of object usage at a given time. The model relaxes the i.i.d. assumption and exploits the sequential pattern of object usage. This model is not suitable for training with real-world data in a conventional way, because determining the object-usage sequence probabilities would be very complex. This is not the case for web activity data, since we propose that the model be trained while the system is online and recognizing activity. We propose an algorithm to mine and determine the object-usage sequence probability on demand. We perform three experiments to validate the performance of the proposed model. We show that the system achieves class accuracies between 65.50% and 80.46% on three real-world datasets.
The remainder of the paper is organized as follows. In Section 2, we discuss the advantages and disadvantages of the various previous studies related to this work. In Section 3, we introduce the activity recognition system and discuss the properties of the model the system used. In Section 4, we describe the web activity data mining algorithm. In Section 5, we show the validity of the system with the help of experimental results and discussions. In Section 6, we conclude the paper with a direction of future work.
2. Related Work
Object-usage-based activity recognition has long been a goal of researchers because of its ability to support diverse healthcare applications [1, 2, 12–15]. A variety of activity models have been proposed for this purpose. For example, Tapia et al. [1] proposed a Naive Bayes activity model to recognize activities in a home setting. Their approach showed promise, although it suffers from low recognition accuracy. Van Kasteren et al. [2] used similar settings with a hidden Markov model and a conditional random field.
Although the parameters of some of these models can be learned from web activity data, the activity recognition accuracy will not be high, since web activity data lacks real-world information. To the best of our knowledge, Perkowitz et al. [8] first introduced a technique for mining generic activity models from the web. They converted natural-language recipes into activity models and used them in conjunction with RFID tags to recognize activity. Their model consists of a sequence of states and is based on a particle filter implementation of Bayesian reasoning. Their model extractor works as follows.
Select a set of websites describing activities, such as http://www.ehow.com/ and http://www.epicurious.com/, and understand the HTML structure of such websites. Search each of the pages for the activity direction and extract the direction. Set the label of an activity as the title of the direction. Extract the object phrases from the direction. Remove the phrases without noun sense. Determine the object-usage probability as a Google Conditional Probability (GCP):
where hitcount Finally select only the tagged objects (objects with embedded RFID tags) from the phrases.
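As a rough illustration of a hit-count-based conditional probability, the sketch below assumes that the GCP of an object given an activity is the number of pages matching both the activity and the object divided by the number of pages matching the activity alone. This form, the query syntax, the function names, and the hit counts are all assumptions made for illustration; they are not taken from [8].

```python
from typing import Callable, Dict


def gcp(obj: str, activity: str, hitcount: Callable[[str], int]) -> float:
    """Assumed form: hitcount(activity AND object) / hitcount(activity)."""
    activity_hits = hitcount(f'"{activity}"')
    if activity_hits == 0:
        # No page mentions the activity, so no estimate can be formed.
        return 0.0
    joint_hits = hitcount(f'"{activity}" "{obj}"')
    return joint_hits / activity_hits


# Hypothetical hit counts standing in for a real search engine.
FAKE_COUNTS: Dict[str, int] = {
    '"making tea"': 1000,
    '"making tea" "kettle"': 400,
    '"making tea" "towel"': 10,
}


def fake_hitcount(query: str) -> int:
    """Look up a query in the hypothetical table; unknown queries get 0 hits."""
    return FAKE_COUNTS.get(query, 0)


kettle_probability = gcp("kettle", "making tea", fake_hitcount)
towel_probability = gcp("towel", "making tea", fake_hitcount)
```

With these made-up counts, an object that co-occurs with the activity on many pages ("kettle") receives a much larger probability than one that rarely does ("towel").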
They use a sequential Monte Carlo (SMC) approximation to infer activities probabilistically. They borrowed the inference engine from a previous study [15]. Despite their good performance in classifying hand-segmented object-use data, they suffered from low accuracy and limited applicability. In addition to this, they used specific web sites whose formats were known before mining the activity models [9].
Wyatt et al. [9] proposed an unsupervised activity recognition system (UARS) using web activity data mining. They developed two algorithms: a document genre classifier that identifies the pages describing an activity and an object identifier that extracts all the objects from a page and calculates the object's weights. Their mining algorithm works as follows.
It queries Google with the activity name along with “how to” as the discriminating phrase. Google returns the number of pages it has indexed in its server. It then retrieves P pages as the top z pages within the total pages returned by Google. They did not define the optimal value of z. The efficiency of mining clearly depended on z, with a larger value of z meaning more efficient mining. The algorithm uses the genre classifier to determine Using the object identification technique, for each page p in Finally, the algorithm calculates the objects usage probabilities as
From the mined information they assembled an HMM, M, which has the traditional 3 parameters:
Although these systems perform well in mining activity models from the web, they take hours or days to do so. Additionally, the accuracy of activity recognition is not satisfactory. This is because the models are only based on object-usage information. As the web provides a general sense of object usage for an activity, using them in the real world where an activity is individual-specific would not provide high-accuracy activity recognition. We need to use as much information as possible to bridge the gap between general view and individual-specific view of an activity.
Sarkar et al. [3, 11] used the location of an activity along with object usage and showed that the addition of location provides better accuracy. Their model works well in environments with many rooms. However, it would not perform equally well in a home with only one or two rooms, since in such a situation location could not offer significant information about an activity.
In this paper, we propose an object-usage-based activity model. The difference from the existing model is that it uses an object-usage sequence instead of treating each of the objects independently. The model is applicable to diverse homes regardless of the number of rooms. The model requires object-usage sequence data for training. We also propose a web activity mining algorithm for extracting such sequential data from the web.
3. Activity Recognition System
3.1. Overview
We consider an environment in which a set of objects (e.g., light, door, and faucet) are embedded with sensors. A sensor is attached to an object in a way such that it is possible to determine the state of the object when used. Given a set of activities to monitor and object names (with embedded sensors), the purpose of the activity recognition system is to recognize the current activity of a person depending on the sequence of objects used at a given time.
The system does not require training before deployment. It will be trained online while a person is doing daily activities. The system determines the probability of a pair of object-usage sequences (e.g., refrigerator and cabinet) each time it observes a new pair. It reduces the system's complexity since it does not need to know every possible pair of object usage. An overview of the system is shown in Figure 1.

Overview of the activity recognition system.
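The on-demand behaviour described above can be sketched as follows. The sketch turns a stream of used objects into consecutive pairs and looks up each pair's probability only the first time it is seen. The class name, the `mine` callback, and the in-memory dictionary are hypothetical stand-ins for the mining engine and its local store.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def consecutive_pairs(objects: Iterable[str]) -> List[Pair]:
    """Turn an ordered stream of used objects into consecutive pairs."""
    items = list(objects)
    return list(zip(items, items[1:]))


class PairProbabilityCache:
    """Mine a pair probability only the first time the pair is observed."""

    def __init__(self, mine: Callable[[str, Pair], float]) -> None:
        self._mine = mine  # hypothetical web-mining callback
        self._store: Dict[Tuple[str, Pair], float] = {}
        self.web_lookups = 0  # counts how often the callback was needed

    def probability(self, activity: str, pair: Pair) -> float:
        key = (activity, pair)
        if key not in self._store:
            self._store[key] = self._mine(activity, pair)
            self.web_lookups += 1
        return self._store[key]


def constant_miner(activity: str, pair: Pair) -> float:
    """Placeholder miner that returns a fixed, made-up probability."""
    return 0.25


cache = PairProbabilityCache(constant_miner)
for observed in consecutive_pairs(["refrigerator", "cabinet", "stove", "cabinet"]):
    cache.probability("cooking", observed)
```

Because only observed pairs are ever looked up, the system never has to enumerate every possible pair of objects in advance.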
Let
3.2. The Activity Model
The activity model is based on HMM. Each of the states is an activity, and the observation probabilities are the sequence of object-usages. The graphical representation of the model is shown in Figure 2. It consists of a hidden state (i.e., activity),

The graphical representation of the activity model.
In (3),
There are n distinct observation symbols per state. The observation symbols correspond to the physical output of the system being modeled. We consider the object-usage sequences as the observation symbols of each state.
During training, we determine the
During inference, the Viterbi algorithm is used to find the most likely labels for the new observation sequences [2]. This algorithm has been successfully applied with HMM to solve many activity recognition problems.
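For reference, a minimal log-space sketch of standard Viterbi decoding is given below. It is a generic textbook formulation, not the exact implementation used in this work; the function signature and the `emit` callback, which returns the probability of an observation (here an object-usage pair) given an activity, are assumptions made for illustration.

```python
import math
from typing import Callable, Dict, Hashable, List, Sequence


def _log(p: float) -> float:
    """Logarithm that maps zero probability to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(
    states: Sequence[str],
    prior: Dict[str, float],
    trans: Dict[str, Dict[str, float]],
    emit: Callable[[str, Hashable], float],
    observations: Sequence[Hashable],
) -> List[str]:
    """Return the most likely state sequence for the observations."""
    if not observations:
        return []
    # Initialise with the prior and the first observation.
    score = {s: _log(prior[s]) + _log(emit(s, observations[0])) for s in states}
    backpointers: List[Dict[str, str]] = []
    for obs in observations[1:]:
        new_score: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for s in states:
            # Best predecessor of state s at this step.
            best_prev = max(states, key=lambda r: score[r] + _log(trans[r][s]))
            new_score[s] = score[best_prev] + _log(trans[best_prev][s]) + _log(emit(s, obs))
            pointers[s] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

Working in log space avoids numerical underflow when the observation sequence is long.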
4. Web Activity Data Mining
As we can see in (4), to train the system we need to know two types of probabilities: the probability of using an object given an activity, that is,
4.1. Web Activity Pages
There are two types of activity pages on the web: explicit activity page (EAP) and implicit activity page (IAP).
Definition 1 (explicit activity page).
A web page is called an explicit activity page if it provides detailed instructions about performing an activity. It has a title, which in most cases contains the activity name. It has a text section that provides details of an activity such as what objects to use and their sequence.
For example, the web page [17] shown in Box 1 is an EAP that contains the activity name, “Bathing,” in the title and contains a detailed description of the activity in the body. The text has a set of object names (such as towels and shampoo) and their usage sequence related to “Bathing.”
When bathing a person with dementia, … Prepare the bathroom in advance by: Have large warmth), Pad the that the room temperature is pleasant.
Definition 2 (implicit activity page).
A web page is called an implicit activity page if it does not provide explicit instructions about an activity but instead provides information that is implicitly related to an activity.
For example, the web page [18] shown in Box 2 is an IAP that contains the activity name, “Bathing,” in the title and contains implicit information in the text related to an activity. It also refers to a set of object names (such as Door and Bathtub) and their usage sequence related to an activity.
Not all safety walk-in tubs are The same. When comparing safety bath tubs from different manufacturers, here are some of the differences you should know about.
Safety
What is Seabridge Dual Draining?
Why do some walk-in baths have
Are safety bathtubs that hold less
4.2. Mining
The goals of mining for a given set of activities are to find EAPs and IAPs and to extract object-usage information from them. One way of accomplishing this would be to search for these pages with a search engine (e.g., Google), download the pages, and obtain object information from them using a natural language processing (NLP) algorithm. However, this is not feasible for us, since downloading the pages could take hours or even days for a single activity. We need an algorithm that extracts the desired information in real time without downloading the web pages.
Web search engines (e.g., Google and Bing) have already downloaded the pages and stored the information on their servers. The mining will be very fast if we can extract the desired data from those servers. Fortunately, almost all search engines (SEs) provide special mechanisms for querying the required information. For example, Table 1 provides three query modifiers and operators that can be used along with queries. Our objective is to use these to get the desired information.
Search engine (SE) query modifiers and operators.
Algorithm 1 shows the way of achieving this. It takes the list of activities, A, and the set of object-usage,
Sequences (POS): /* Check in the local database, if not exists get it from web */;
pages indexed by the search engine for the given query */;
For each activity
4.2.1. Number of Queries Required for Mining
As we can see in Algorithm 1, the number of queries needed, r, depends on two factors:
5. Evaluation and Results
We perform three experiments to evaluate the performance of the proposed system. In the first experiment, we evaluate the system's accuracy in recognizing activities of daily life. In the second experiment, we compare its performance with that of previous systems [3, 9]. In the third experiment, we estimate the time required to mine web activity data.
We use similar settings to those used in another study [19]. We use two popular search engines, Google and Bing, for mining web activity data.
We use three real-world activity datasets: two gathered by Tapia et al. [1] at the MIT PlaceLab (Placelab 1 and Placelab 2) and one gathered by Van Kasteren et al. [2] at the Intelligent Systems Lab Amsterdam (ISLA). We consider the same set of activities as in [3]. We set γ to 5, which fixes the self-transition probabilities at 80%. This ensures that the object-usage sequences play the central role in transitions between states.
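The relation between γ and the self-transition probability can be illustrated with a short sketch. It assumes that the self-transition probability is 1 − 1/γ, which is consistent with γ = 5 giving 80%, and that the remaining probability mass is spread uniformly over the other activities. Both the formula and the uniform spread are assumptions made for illustration, as are the function name and the activity labels.

```python
from typing import Dict, Sequence


def transition_matrix(activities: Sequence[str], gamma: float) -> Dict[str, Dict[str, float]]:
    """Constant self-transition matrix under the assumed 1 - 1/gamma rule."""
    if gamma < 1:
        raise ValueError("gamma must be at least 1")
    n = len(activities)
    if n == 1:
        # With a single activity, the only possible transition is to itself.
        only = activities[0]
        return {only: {only: 1.0}}
    stay = 1.0 - 1.0 / gamma
    move = (1.0 - stay) / (n - 1)  # assumed uniform spread over other activities
    return {
        a: {b: (stay if a == b else move) for b in activities}
        for a in activities
    }


# Hypothetical activity labels; gamma = 5 as in the experimental setup.
matrix = transition_matrix(["bathing", "cooking", "sleeping"], gamma=5)
```

Under this assumption, the γ values from 2 to 8 correspond to self-transition probabilities from 50% to 87.5%.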
5.1. Experiment 1: Activity Recognition Accuracy
In this experiment, we verify the accuracy of activity recognition. Figure 3 summarizes the results for the three datasets. In each of the three groups of bars, the first two bars from the left represent the class accuracy of the system when using Google and Bing, respectively. Using Google, the system achieves overall class accuracies of 68.12%, 65.50%, and 79.12%, respectively, for the three datasets. Using Bing, the system achieves overall class accuracies of 69.12%, 67.35%, and 80.46%, respectively. The accuracy is therefore about 1 to 2 percentage points higher when using Bing's data for training. This suggests that the Bing activity data is somewhat more organized than that of Google.

The accuracy of activity recognition.
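The following sketch shows one common way to compute class accuracy, assuming it is defined as the per-class fraction of correctly labeled samples averaged over all classes, as is often done for these datasets. The definition is an assumption here, and the labels in the example are made up.

```python
from collections import defaultdict
from typing import Dict, Sequence


def class_accuracy(true_labels: Sequence[str], predicted_labels: Sequence[str]) -> float:
    """Average of per-class accuracies (assumed definition)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for truth, guess in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1
    # Each class contributes equally, regardless of how many samples it has.
    return sum(correct[c] / total[c] for c in total) / len(total)


# Made-up example: a frequent class and a rare class.
truth = ["idle"] * 8 + ["cooking"] * 2
always_idle = ["idle"] * 10
score = class_accuracy(truth, always_idle)
```

Unlike the plain fraction of correct samples, this measure does not reward a classifier for always predicting the most frequent activity.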
5.2. Experiment 2: Performance Comparison with Other Systems
We compare the performance of our system (HMMaM) with two existing systems, a general-purpose activity recognition system (GPARS) [3] and unsupervised activity recognition using automatically mined common sense (UARS) [9].
We compare two versions of GPARS: the first uses a naive Bayesian two-layer classifier to classify an activity, and the second uses a one-layer classifier (also naive Bayesian). The two-layer classifier works as follows: in the first layer it selects a group of potential activities using a location-and-object-usage-based model, and in the second layer it classifies an activity from that group using an object-usage-based model. In the one-layer classifier, although GPARS uses a location-and-object-based model, by setting the parameter,
The comparison results are shown in Figure 4. The second and third bars represent the accuracies of the two versions of GPARS, respectively. Even though the proposed system does not use information about the location in which an activity is performed, it performs as well as the two-layer GPARS and outperforms the one-layer GPARS. This is because the proposed system utilizes object-usage sequences, which give more realistic information about an activity.

Performance comparison with the existing systems.
The accuracies with UARS are shown in the fourth bar of Figure 4. The proposed system also outperforms UARS in classifying an activity. This is expected, since HMMaM uses a more robust model for classifying an activity.
5.3. Experiment 3: Time Required for Mining
In this experiment, we evaluate the efficiency of the mining engine in extracting web activity data. We inspect how long it takes to mine data for each day in each of the three testing datasets. Figure 5 shows the time required per day for each of the datasets. The figure represents 10 days of data.

Web activity mining time per day.
The mining time decreases gradually over time. Figure 5 shows that, after 3–5 days, the mining time goes down to nearly zero for all the datasets. This is expected, since the mining engine stores mined data locally and uses them for future reference.
The mining time is generally higher for a dataset acquired from an environment containing more tagged objects (objects with embedded sensors). Figure 5 shows that the mining time for the Placelab 1 dataset is higher than for the other datasets, since its number of tagged objects is higher. This is expected, because that dataset generally has more objects per activity.
5.4. Effect of Constant Self-Transition Probability
We have evaluated the algorithm with different γ values, ranging from 2 to 8. As with UARS [9], the accuracy of activity recognition was not affected much. The mean accuracy across these values for three datasets was 63.5%, 60.34%, and
6. Conclusion
We have introduced a novel activity model for human activity recognition in a home setting. It is an HMM-based model in which the transition to the next state at a given time depends on the current state and the observation sequence. The states are the activities, and an observation is an object-usage sequence. The transition from an activity to the same activity or to another activity depends on the prior probabilities and the object-usage sequence probability.
Although the use of an object-usage sequence gives better understanding of an activity, it is very complex and time-consuming to learn sequence probabilities from real-world activity data. A substantial number of objects can be used in an environment, and therefore, the number of possible sequences can be enormous. Instead of using real-world data, we used web activity data and proposed an efficient web mining algorithm to learn the sequence probabilities on demand.
We performed three experiments to verify the activity model and to validate the performance of the mining algorithm. We showed that the model can be applied to recognize the activities of daily life, and the mining algorithm can efficiently mine activity data from the web.
Footnotes
Conflict of Interests
The author has no conflict of interests regarding the publication of this paper.
Acknowledgment
This work was supported by the Hankuk University of Foreign Studies Research Fund of 2014.
