Abstract
The present work focuses on predicting a user's next place of visit using their past tweets. We hypothesize that a person's tweets have predictive power over their location and can therefore be used to predict their next place of visit. This problem is important for location-based advertising and recommender services. To predict the next place of visit, we calculate the probabilities of visiting different types of places using a bank of binary classifiers and Markov models. More specifically, we train a bank of binary classifiers on past tweets and calculate the probabilities of visiting each type of place next. Since the bank of binary classifiers is based on a bag-of-words model, to account for the last visited place and the time since it was visited, we build Markov models over different time durations to calculate the probabilities of visiting the next place. Empirical evaluation shows that by combining the probabilities obtained from the bank of binary classifiers and the Markov models, the accuracy of predicting the next place increases from 65% to 80%.
1. Introduction
What would life be like if we knew the location of any desired place around us, anywhere in the world? Though this may seem far-fetched, developments in wireless and location-acquisition technology help us build systems that solve this problem to some extent. Thanks to location-acquisition technology, it is no longer difficult to find places within a given proximity. The bigger challenge is predicting places for a user according to their needs and intent within that proximity. Microblogging services such as Twitter and Facebook provide a means to understand a person's values and needs in a better way.
People share their thoughts and daily happenings on Twitter along with mundane information. In recent years, the popularity of Twitter has grown exponentially. At the time of writing this paper, Twitter has 300 million users producing 500 million tweets per day (https://about.twitter.com/company). Along with mundane information, people share their activities: what they are planning, places they have visited, their experiences at those places, whom they are going with, how they are feeling, and so forth. The availability of such valuable information motivates us to build a system that can predict a user's future places of visit according to their intent and behavior.
In this paper, we hypothesize that future places of visit can be predicted by considering two important factors: (i) the previously visited place and (ii) recent tweets. What people write on their timelines reflects their intent and needs, which play a significant role in deciding the next activity to be performed. Similarly, the previously visited place also plays a major role in deciding the next place to be visited; for example, after having lunch, people often have coffee. Based on this hypothesis, we propose a novel approach for predicting the next place of visit. We use a bank of binary classifiers (BBC) to predict the probability of visiting the next place using recent tweets only. To account for the last visited place and the time since it was visited, we build Markov models (MMs) to predict these probabilities. Both BBC and MMs compute the probability of visiting the next place independently with high accuracy. We show that by combining these two probabilities we can further improve the prediction accuracy. In pursuit of this goal, we also propose two algorithms: one for tagging tweets with a location and one for finding the optimal number of past tweets to use for prediction. Our approach consists of the following steps: (a) assigning a location to tweets where relevant information is present; (b) extracting features from past tweets to train the models; (c) building models using a bank of binary classifiers; (d) using contextual information, namely the previously visited place and the time since it was visited, to enhance the accuracy of predicting the next place.
We crawled more than 4600 Twitter timelines for exhaustive experimentation. Ground truths were extracted from these timelines using the Google Places API and by analyzing tweets for mentions of visited places. We extract features from users' timelines using our proposed algorithm for finding the optimal window of past tweets for prediction. We build models and analyze their performance; our best model yields 80% accuracy for the top 5 predictions. We also evaluate our approach on new users and show that model performance is similar to that on users seen during training. The major contributions of this work are as follows:
(i) Building generic models for predicting future places of visit using recent tweets only.
(ii) Building Markov models for predicting the next place to be visited given the previously visited place and the time since it was visited.
(iii) Ensembling both models to enhance accuracy in predicting the next place.
The remainder of the paper is organized as follows. Section 2 presents the proposed approach for predicting the next place of visit in detail. Next, Section 3 discusses the experiments and analyzes the results. In Section 4 we review the relevant literature. Finally, Section 5 concludes the paper.
2. Proposed Approach
There are two points we want to mention before explaining the four steps of the proposed approach. First, we build a generic prediction model that captures the relationship between the vocabulary used in past tweets and the places visited, without considering user demographics. The main advantage of this approach is that we do not need user-specific training data to predict a user's next location.
Second, our focus is on the category of establishment, such as restaurant, supermarket, pub, or gym, rather than the specific establishment. In this way, both establishment owners and users benefit mutually. For example, if the proposed system predicts a restaurant within a given spatial proximity, then all restaurant owners in the proximity of the user's location can approach the user with their available promotional offers, and the user can choose a place according to their own interests.
The four steps of our proposed approach are as follows.
2.1. Assigning Location to Tweets
From here onwards, we use location and place of visit interchangeably. Assigning a location to a tweet is a two-step process. First, we filter all tweets having geocoordinates and location information; for the location information, we use a regular expression ("I'm at" or "@"). Then, using the Google Places API [1] (GPA), we get all the places around the geocoordinates of the given tweet. If the place name present in a tweet is among the places returned by GPA, then that tweet is labeled with the name and categories of that location. Note that GPA also returns the categories of each place, which may be more than one. For matching the place name, we extract the three words following the regular expression from the tweet and compute their ratio r with the total words in each place name returned by GPA. If
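The two-step labeling described above can be sketched as follows. This is a minimal illustration, assuming the Google Places API results have already been fetched and are passed in as `nearby_places`; the `label_tweet` name and the 0.5 match threshold are our own choices for the sketch, not values from the paper.

```python
import re

# Marker pattern ("I'm at" or "@") followed by up to three words,
# as in the place-name matching step of Section 2.1.
LOCATION_PATTERN = re.compile(r"(?:I'm at|@)\s+(\S+(?:\s+\S+){0,2})")

def label_tweet(text, nearby_places, threshold=0.5):
    """Return (name, categories) of the matched place, or None.

    nearby_places: list of (place_name, categories) pairs, standing in
    for the places GPA would return around the tweet's geocoordinates.
    """
    m = LOCATION_PATTERN.search(text)
    if m is None:
        return None  # tweet carries no location marker
    mention = m.group(1).lower().split()  # up to three words after the marker
    for name, categories in nearby_places:
        place_words = name.lower().split()
        overlap = len(set(mention) & set(place_words))
        r = overlap / len(place_words)  # ratio against place-name words
        if r >= threshold:
            return name, categories
    return None
```

A tweet such as "I'm at Central Park with friends" would then be labeled with the matching GPA place and its categories, while a tweet with no marker is left unlabeled.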
2.2. Extracted Features
To predict the next place of visit, we use past tweets to infer the intent and interests of the user. From the ground truths recovered from timelines, we observe that people post activity-related text before performing that activity. For example, a user tweeted, "We are planning to go for some fun" before visiting Central Park in New York. The time between an activity-related post and the activity itself may vary from weeks to hours depending on the activity. For example, a user interested in a cricket match happening next month may start tweeting about the event well in advance, whereas a user going to a restaurant in the evening may tweet just a few hours before visiting it. Therefore, we build an independent binary classifier for each category using an appropriately sized window of past tweets. Here, window refers to the time window of past tweets used to form feature vectors. The window size is found empirically for each classifier/category independently, as explained in the next section.
To form feature vectors, we first label the past tweets (the concatenation of past tweets in the given window) with the categories of the location visited by the user just after posting those tweets. Note that the location a user has visited can have more than one category; therefore, the past tweets may carry more than one label. It is worth mentioning that while concatenating past tweets, we do not include the location tweet itself (the tweet that carries the location information). To build a binary classifier for category
Table 1: Toy data set with five instances and four different categories.
Table 2: Four binary data sets constructed for each category, using the data set given in Table 1. Here − denotes the negative samples.
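The one-vs-rest construction behind Tables 1 and 2 can be sketched as follows; the function name and data layout are illustrative rather than the paper's exact implementation.

```python
# From multi-label instances (concatenated past tweets, visited categories),
# build one binary data set per category: an instance is positive for a
# category if that category is among its labels, negative otherwise.
def get_binary_datasets(instances):
    """instances: list of (text, set_of_categories).
    Returns {category: list of (text, label)} with label 1 for positives."""
    categories = set()
    for _, cats in instances:
        categories |= cats
    datasets = {}
    for c in sorted(categories):
        datasets[c] = [(text, 1 if c in cats else 0)
                       for text, cats in instances]
    return datasets
```

Each resulting binary data set then trains one classifier in the bank, mirroring how Table 2 is derived from Table 1.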
2.3. Bank of Binary Classifiers (BBC) for Predicting Future Location
For each category, we build an independent binary classifier, and the size of the window of past tweets for each classifier is determined empirically. The steps for determining the window size of past tweets for each category are given in Algorithm 1.
Algorithm 1: Determining the window size of past tweets for each category.
In Algorithm 1, for each category, we vary the window size w (in hours) from 6 to 600 hours to construct feature vectors. Then, for each window size w, we form feature vectors and use them to train the binary classifier. Finally, the window size for a particular category is set based on the best classifier performance among all window sizes. For constructing the binary data set of each category, the function getBinaryDataSet
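The window-size search of Algorithm 1 can be sketched as follows; `train_and_score` is a hypothetical stand-in for building feature vectors with window w, training the binary classifier, and measuring its accuracy.

```python
# For one category, try each candidate window size, train and score a
# classifier, and keep the window that performs best (Algorithm 1's core loop).
def best_window_size(category, candidates, train_and_score):
    best_w, best_acc = None, -1.0
    for w in candidates:  # window size in hours, e.g. 6..600
        acc = train_and_score(category, w)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc
```

Running this loop independently per category yields the per-category window sizes used to train the bank of binary classifiers.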
Figure 1 shows the block diagram of predicting places of visit using past tweets. From the past tweets we form the feature vector FV(

Figure 1: Predicting the probability of visiting category
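The BBC prediction step can be sketched as follows; `classifiers` is a hypothetical mapping from each category to its trained binary classifier's probability function (Naive Bayes in the paper).

```python
# Each category's binary classifier independently scores the feature vector
# built from recent tweets; categories are then ranked by predicted
# probability and the top N are returned as the predicted next places.
def predict_top_n(features, classifiers, n=5):
    scored = [(c, clf(features)) for c, clf in classifiers.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # nonascending order
    return scored[:n]
```

Because each classifier scores independently, the probabilities need not sum to one across categories; only their relative ranking matters for top-N prediction.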
2.4. Considering Previous Visited Place and Time
Along with the recent tweets, we consider two other important factors in predicting the next place of visit: the previously visited place and the amount of time since it was visited (i.e., the time duration). For example, predicting the next place of visit as a restaurant within an hour of the user visiting a restaurant is not a good prediction. To track the duration between visits to consecutive places, we discretize time into one-hour slots. We capture human visiting behavior in a simplified manner by building a Markov chain for each time duration. Therefore, for each time duration, we form a Markov chain and a transition matrix. In the transition matrix
We form a transition matrix for each time duration (in hours) in the set
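The construction of per-duration transition matrices can be sketched as follows, assuming visits are given as (category, hour) pairs; the data layout is illustrative, and the paper's exact matrix indexing may differ.

```python
from collections import defaultdict

# Each consecutive pair of visits contributes a count to the transition
# matrix for the (one-hour-slot discretized) gap between them; counts are
# then normalized per row into transition probabilities.
def build_transition_matrices(visit_sequences):
    """visit_sequences: list of per-user visit lists [(category, hour), ...].
    Returns {duration_hours: {prev_cat: {next_cat: probability}}}."""
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for visits in visit_sequences:
        for (c_prev, t_prev), (c_next, t_next) in zip(visits, visits[1:]):
            h = int(t_next - t_prev)  # discretize gap into one-hour slots
            counts[h][c_prev][c_next] += 1
    matrices = {}
    for h, rows in counts.items():
        matrices[h] = {}
        for c_prev, row in rows.items():
            total = sum(row.values())
            matrices[h][c_prev] = {c: n / total for c, n in row.items()}
    return matrices
```

At prediction time, the observed gap since the last visit selects which matrix to consult, which is exactly what distinguishes the MMs from a single time-agnostic Markov chain.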
Assuming independence between the previously visited categories, we can write (2) as
In order to predict the next place of visit using both the past tweets and the amount of time since the categories were visited by the user, we combine these two probabilities, which are
As
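Under the independence assumption above, the ensemble step reduces to multiplying the two per-category probabilities and re-ranking. This sketch omits normalization, which does not affect the top-N ranking; the function name is ours.

```python
# Combine the BBC (tweet-based) and MM (place/time-based) probabilities
# per category by taking their product, then rank in nonascending order.
def combine_probabilities(p_bbc, p_mm):
    """p_bbc, p_mm: {category: probability}. Returns ranked (cat, score)."""
    combined = {c: p_bbc.get(c, 0.0) * p_mm.get(c, 0.0)
                for c in set(p_bbc) | set(p_mm)}
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
```

A category must score well under both models to rank highly, which is how the ensemble suppresses predictions that the recent tweets support but the visit history contradicts (or vice versa).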
3. Experiments
We conducted exhaustive experiments on a data set crawled from Twitter using the publicly available Twitter API. We collected the timelines of 4606 users who tweeted at least once from New York between April 24, 2014, and April 29, 2014. Each timeline contains approximately 3200 recent tweets, and the tweet rate of these users is at least 20 tweets per day. Among the 4606 users, the number of location tweets per timeline is highly variable. Though we started from New York, the locations visited by the users in our data set span the world.
3.1. Data Sets
For evaluation, we divide our data set into two subsets according to the number of location tweets on each user's timeline. Among the 4606 users, those who have more than 60 location tweets (tweets that carry location information) on their Twitter timeline form the first subset, named Data Set 1, and the remaining users form the second subset, named Data Set 2. Table 3 shows both data sets.
Table 3: Division of users according to the number of location tweets on their timelines.
Description of training and testing data sets.
Training Data Set (TDS). From Data Set 1, we use the oldest
Testing Data Set. For evaluating the performance of models on both seen and unseen users, we use two different sets for testing, named as Test Set 1 and Test Set 2.
Test Set 1 (TS1): we use the remaining, latest 50 location tweets of each user from Data Set 1 to form feature vectors.
Test Set 2 (TS2): we use all location tweets from Data Set 2 to form feature vectors.
3.2. Evaluation
First, we present the methods used to compute the accuracies of the proposed models, one by one, in the following subsections, and then propose a few baselines against which to compare our proposed models.
3.2.1. Using Markov Models (MMs) Only
We form the transition matrices as described in Section 2.4. For every test instance, we first find both the time duration and the categories the user visited just before the prediction; here, a test instance is a location tweet whose location we want to predict. The duration is then discretized to an integer value h such that it lies within the corresponding time interval. Given the previously visited categories, we compute the probability of each candidate category from the transition matrix for h. Considering the top N categories in nonascending order, if we recover the ground truth, then we take this test instance as correctly classified. Hence, the accuracy of the model is defined as follows:
3.2.2. Using Bank of Binary Classifiers (BBC)
Using the ground truth of the test sets, we compute the accuracy of the model as follows. For a given test instance (location tweet), we form the feature vector from the tweets in the preceding window. Considering the top N categories according to the probabilities predicted by the binary classifiers, if we recover the ground truth, then we take this test instance as correctly classified. The accuracy of this classifier is then calculated in the same way as in (11).
3.2.3. Using the Combined Classifier (CC)
The evaluation method is similar to that illustrated for BBC. For every test instance, we compute the probability of visiting each category using (10). Considering the top N categories in nonascending order, if we recover the ground truth, then we take this test instance as correctly classified. Accuracies of this model are calculated in the same fashion as in (11).
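The top-N accuracy used in all three evaluations above can be sketched as follows; a test instance counts as correct if any of its ground-truth categories appears among the top N predictions.

```python
# Fraction of test instances whose ground-truth categories intersect the
# model's top-N predicted categories (the accuracy measure of Section 3.2).
def top_n_accuracy(predictions, ground_truths, n=5):
    """predictions: list of ranked category lists, one per test instance;
    ground_truths: list of sets of true categories (a place may have
    more than one category)."""
    correct = sum(1 for ranked, truth in zip(predictions, ground_truths)
                  if set(ranked[:n]) & truth)
    return correct / len(ground_truths)
```

Using sets for the ground truth handles the multi-category places returned by GPA without any special casing.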
3.2.4. Baselines
Our proposed approach is generic and can be applied to both seen and unseen (new) users. To the best of our knowledge, there is no existing work that predicts the future place of visit based only on recently used words and the most recently visited location, without using user demographics. Hence, to evaluate the performance of our models, we propose the following baselines.
Baseline Model 1 (BM1). In this baseline, we consider the most frequent check-in categories in the training data. Let
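A minimal sketch of BM1, assuming the frequency baseline simply predicts the N most frequent check-in categories from the training data regardless of the test instance; the function name is ours.

```python
from collections import Counter

# BM1: rank categories by raw check-in frequency in the training data and
# always predict the top N, ignoring the test instance entirely.
def bm1_top_n(training_categories, n=5):
    counts = Counter(training_categories)
    return [c for c, _ in counts.most_common(n)]
```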
Baseline Model 2 (BM2). Markov models are built to predict the next place of visit based on the latest visited location, say,
Here, we want to point out that, in Section 2.4, we form Markov models (MMs) that consider the amount of time between consecutive visits. If we use this baseline model to predict the future place, we have only one transition matrix, whereas in MMs we have a different transition matrix depending on the granularity of time.
Baseline Model 3 (BM3). As described in Section 2.3, we discussed the need for a different window size for each category when predicting the future place of visit. This baseline is defined to show the importance of our approach for deriving a window size per category. Here, we use a fixed window size for all categories and compare the performance against our approach in Algorithm 1. As with BBC in Section 2.3, Naive Bayes is used as the binary classifier.
3.3. Results
We conduct two sets of experiments. In the first set, we compare the performance of model BBC proposed in Section 2.3 with Baseline Model 3 (BM3). In the second set of experiments, we compare the performances of proposed models with Baseline Models 1 and 2.
3.3.1. BBC versus BM3
The objective of this set of experiments is to show that BBC performs better when feature vectors for training and testing are constructed using the window sizes derived by Algorithm 1. Recall that the window size derived for category
To this end, we first show the performance of the classifier using the same window size for every category (BM3). One by one, we use five different window sizes: 5, 10, 20, 30, and 100 hours. Results are shown in Figure 2, where WinN represents the performance of BM3 when the window size of past tweets is set to N hours. From these results, we observe that BM3 performs best when the window size is 10 hours for each category. Also, BM3 performs similarly on both test sets, seen (TS1) and unseen (TS2), which validates that this classifier can be applied to a wide variety of users. These results also support our hypothesis that the words in tweets posted by a user are highly correlated with the user's future activity or its location. By learning the relation between words and locations, BM3 infers the future location from words with high accuracy.

Figure 2: Performance of BM3 on different window sizes for top 5 accuracies. WinN represents a window size of N hours, where
The results in Figure 2 validate the existence of a relationship between words and future location. By inspecting the text within a window, we found that using the same window size for all categories (BM3) does not model the problem appropriately. If the window size is 5 hours, there are very few words, which results in erroneous predictions; but if we increase the window size, words may enter the window that play no role in predicting the current location. For example, a tweet like "now looking for some fun" posted 30 hours earlier has no significance in deciding the current activity. From the above results, we find the optimum window size for BM3 to be 10 hours.
For appropriate modeling, as discussed in Section 2.2, we derive the window size for each category using Algorithm 1 for training and testing. We compare these two approaches, BM3 (10 hours) and

Figure 3: Performance of BM3 and
3.3.2. Comparison of Proposed Models with BM1 and BM2
In this section, we compare the proposed models with BM1 and BM2. Our proposed models outperform both baselines, as shown in Figure 4. We can see that the MMs perform much better than BM2. The reason for this improvement is that BM2 has only one transition matrix, with no time information in it; thus, the only input to BM2 is the previously visited category. When we use MMs for prediction, we have two input parameters: the previously visited category and the time duration since it was visited. Based on the time duration, we pick the corresponding transition matrix for computing predictions. The information about visiting preferences according to time is lost in BM2, resulting in erroneous predictions.

Figure 4: Comparison of the proposed models with the baselines.
4. Related Work
Over the last decade, researchers have used Twitter data for various prediction tasks. Bollen et al. [2] used tweets to predict a stock market index with high accuracy. Similarly, Asur and Huberman [3] predicted the box-office revenues of movies in advance using related tweets.
Tweets have also been explored heavily for inferring users' interests for commercial purposes. References [4-6] apply different techniques to tweets to infer user interest and show that tweets contain a great deal of valuable information related to it.
Twitter data has also been used for recommendation. For example, Sadilek et al. [7] used tweets to recommend restaurants the user should avoid. References [8-10] proposed recommender systems for news by modeling the user profile and exploiting the tweet-news relationship.
With the exponential growth in smartphone usage, users now publish millions of tweets from anywhere, at any time. As a result, another field has emerged that studies the mobility prediction of users: where will the user be, or where was the user when given tweets were published? References [11-13] have shown that Twitter data can be used to predict the location of a user with high accuracy, but the granularity of these predictions is at the country or regional level. For example, Han et al. [14] predict user location, that is, country or region, by identifying location-indicative words (frequent words used at a specific location). In contrast, our approach predicts locations such as a shop, church, or restaurant, at a much finer granularity.
Some efforts have been made to predict user locations such as restaurants and shops, but the data used there is generated by Location-Based Social Networks (LBSN). Using LBSN data for prediction is very different from using Twitter, as Twitter data is unstructured and challenging compared to the structured data of LBSN. References [15-19] have explored LBSN (Foursquare) data for recommending or predicting the next location for the user.
Others have used text published by users to derive personality traits, making recommendations based on common traits. For deriving personality traits, Linguistic Inquiry and Word Count (LIWC) [20] has been used frequently. References [20-29] have shown that the lexicons people use can reveal their personal values and how these traits can drive recommendations. Though these approaches have been used extensively in analyzing personality traits, they suffer from relying on predefined word-category correlations. Alternatively, Schwartz et al. [30] used a rather different, open-vocabulary technique (using the full set of words available on social media), in contrast to the closed-vocabulary technique of LIWC [20], where predefined sets of words are used to derive the personality traits. In their study, they showed that the open-vocabulary approach achieved higher, state-of-the-art accuracy in predicting gender compared to LIWC, by exploiting latent factors not captured by the closed-vocabulary approach. Motivated by this approach, we use all the words available on timelines to model users' behavior.
5. Conclusion
In the present work, we study the problem of predicting the next place of visit using tweets, and we propose a methodology for doing so. For experiments, we crawled more than 4600 users' timelines from Twitter. For modeling and generating ground truths, we labeled tweets with visited locations where the relevant information was present, using a simple pattern-matching technique that labels tweets with high accuracy. We also proposed an algorithm for deriving the optimal window length per category when building the feature vectors used in training and testing the BBC (bank of binary classifiers). From this trained model, BBC (Naive Bayes), we compute the probabilities of visiting the next place. To account for the last visited place and the time since it was visited, we trained Markov models that also compute these probabilities. Assuming independence of the probabilities produced by the bank of binary classifiers and the Markov models, we combined the two. From the experiments, we found that the accuracy of predicting the next place increased from 65% to 80%. This shows that our model can potentially be used in location-based advertising, intelligent resource allocation, and so forth. Also, the similar performance of the proposed model on both seen and unseen data sets makes it applicable to a wide variety of users.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
