Abstract
The mobile human social network may well be the largest and best “sensor network” because of the explosive growth in social network content. More and more mobile social applications offer an easy way for people to share what they see, feel, hear, and smell, together with location information, through words, images, or even videos. These new ways of sharing in mobile social networks give us a precious chance to sense the world: dedicated systems specialized in a particular kind of sensing no longer need to be built, because the specific sensing data can be acquired from social networks by processing heterogeneous data. The contribution of this paper is a model that collects samples from different mobile social networks, taking their relevance with respect to location into account, and estimates the occurrence likelihood of the perceived event from the collected samples. Simulations and real-world case studies verify the reliability of the model and the effectiveness of the Location Aware EM algorithm.
1. Introduction
Mobile social networks such as Twitter and Weibo, where people upload sensing information together with location information, offer a precious chance to provide a Location Aware Service based on crowdsourcing. Humans are regarded as the most complex “sensors,” and using humans as sensors has become a hot topic in recent years. With the rapid development of our society, the need to monitor the living environment has never been greater. However, constructing a dedicated system to monitor one or several events is costly. For example, installing cameras on every street to monitor traffic conditions is expensive, and so is deploying acoustic sensors everywhere; humans, however, are already everywhere. People who are suffering from a traffic jam or noise pollution may complain about it by uploading messages with location information. Human sensing can therefore be viewed as an extension of participatory sensing systems and sensor networks.
Heterogeneous sensing data can usually help us estimate particular problems through a problem-specific mathematical model. A meaningful research direction is whether a general mathematical model can be built to reconstruct the accurate state of the physical environment from such human sensing data. This would yield a kind of service similar to the Location Based Service (LBS), which we call the Location Aware Service (LAS).
Readers may wonder how to process so much subjective information, or how to convert different kinds of data into a homogeneous form. These questions are outside the scope of this paper; they belong to natural language processing, image recognition, and related fields.
Another significant challenge in mobile social sensing applications is ensuring the correctness of the collected sensing data. Data mining usually extracts the essential information from big data contributed by unknown people, so the participants and their reliability are generally not known beforehand. On the other hand, uploaded mobile social data with location information is more convincing, because the location is acquired from positioning sensors. Moreover, people are often unaware that their public social network information has become highly useful environmental sensing data.
This paper focuses on how to construct a prediction model that uses sensing data with location information to estimate whether events will happen in the future. Future sensing systems may not exist independently; they may instead rely on other mobile social networks or mobile applications through crowdsourcing. The concepts of crowdsourcing and human sensing are inseparable and reinforce each other.
From a social networked sensing perspective, the paper addresses the following questions: (1) how heterogeneous Location Aware Human Sensing Data can be converted into homogeneous human sensing data and (2) how to estimate whether an event will happen at a given place.
The rest of this paper is organized as follows. Related works are reviewed in Section 2. In Section 3, we present a basic framework of the Location Aware Human Sensing System and elaborate the Location Aware Prediction Model in the Human Sensing System. Section 4 formulates the problem to be solved. The proposed Location Aware EM algorithm is detailed in Section 5. Evaluation results are presented in Section 6, and Section 7 concludes the paper. Acknowledgments are given at the end of the paper.
2. Related Works
Human sensing has attracted much more attention than before because of the great increase in the number of smartphones equipped with location sensors such as GPS. With the rapid development of wireless communication techniques (e.g., 3G and 4G networks and WiFi), sharing feelings by uploading texts, images, and videos to mobile social networks has become common among smartphone users.
Abdelzaher et al. [1] survey the primary participatory sensing applications in depth. A number of early participatory sensing prototype systems have been built, such as BikeNet [2], SoundSense [3], CenceMe [4, 5], MetroSense [6], Bubble-Sensing [7], Urban Tomography [8, 9], CarTel [10], Darwin [11], and Microblog [12]. These prototype systems lay the foundation of human sensing.
In practice, humans can perceive much more than ordinary sensors. The optimal way to obtain data may be one in which the data providers are unaware that they have supplied essential sensing data to the Human Sensing System.
Human sensing is a branch of participatory sensing; it applies the idea of using humans as sensors to perceive the ground truth. Khan et al. [13] summarize mobile phone sensing systems and divide participatory sensing systems into personal, social, and public participatory sensing systems. Personal sensing focuses on personal monitoring: such systems collect information about the custodian's daily life patterns and physical activities, heart rate, blood pressure, sugar level, personal contacts, location, and so forth. Social sensing shares information within social groups, which is its main difference from the other two kinds of sensing. Public sensing shares data with everyone for the public good; it senses environments or events of public concern, such as traffic jams, noise pollution, and temperature.
However, most researchers have focused on constructing novel participatory sensing systems and have ignored how to make more people use these systems more frequently. In fact, it is hard to attract enough people to use a participatory sensing system. How to give current large scale network systems the ability to sense is therefore an interesting, useful, and hard problem. Mobile social networks have never been as popular as they are now, and many people are willing to upload human sensing data to share what they see and hear in daily life. It would be valuable to extract meaningful information from these data to serve others.
This paper proposes a Location Aware Prediction Model that regards humans as sensors and treats words, images, and videos as the sensing data from these “human sensors.” A system processing this kind of data can be regarded as a Human Sensing System.
In previous studies, Wang et al. [14] propose using humans as sensors and elaborate how to model networked human sources as participatory sensors and how to estimate whether a ground truth happened, from an estimation-theoretic perspective.
Sakaki et al. treat Twitter users as social sensors to detect earthquakes [15]. Zhao et al. reported their experience of using Twitter to monitor US National Football League (NFL) games in real time [16]. Tumasjan et al. [17] present an interesting way to predict elections using Twitter. Mathioudakis and Koudas [18] present Twitter Monitor, a system that performs trend detection over the Twitter stream. The system identifies emerging topics on Twitter in real time and provides meaningful analytics that synthesize an accurate description of each topic.
As communication technology progresses, people increasingly want detailed information about specific places, so a Location Aware Model for human sensing deserves careful study.
In this paper, a Location Aware Prediction Model in a Human Sensing System is proposed to predict public events at a specific location, such as traffic jams and air conditions.
The question of whether a ground truth will happen is treated as a joint maximum likelihood estimation problem and solved using the Expectation Maximization (EM) algorithm.
Expectation Maximization (EM) [19, 20] is a general method for finding maximum likelihood estimates of the parameters of a statistical model when the data are incomplete or depend on latent variables. It alternates between two steps (the E-step and the M-step) until the estimates converge.
Dependence among different locations is fully considered, and the time factor is also taken into account. The next section introduces the main framework of the Human Sensing System and elaborates the mathematical problem addressed in this paper.
3. Location Related Human Sensing Model
To convert the real-world problem into a mathematical problem, the complex real world must be simplified. Before formulating the mathematical problem, we describe the basic framework of the Location Aware Human Sensing Model.
3.1. Human Sensing Information Chain
People use mobile social networks to share what they have seen and heard. Servers collect data from these heterogeneous sources, convert it into a homogeneous form, and store it as human sensing data. Human Sensing Servers then compute the instant information that people want, such as whether there is a traffic jam at a specific place, from the relevant stored human sensing data.
From the point of view of information transfer, these steps form a chain, shown in Figure 1. Inspired by the food chain in ecosystems, we name it the Human Sensing Information Chain (HSIC).

Figure 1: The Human Sensing Information Chain.
The specific information that people want comes from other people's uploaded data, and the cycle along which the information travels resembles a food chain in biology. This cycle may be the main feature of a Human Sensing System, which is a branch of participatory sensing systems and embodies the idea of crowdsourcing.
There are many interesting research areas in the Human Sensing Information Chain, for example, how to promote the circulation of the chain, how to protect the privacy of people who use the Human Sensing System, how to convert heterogeneous data into homogeneous data, how to integrate different mobile social networks, and how to make servers more intelligent.
3.2. Software Architecture of Human Sensing System
The network architecture is not the focus of this paper; networks of any kind, including sensor networks, wireless networks, and ad hoc networks, as well as the trust models mentioned in [21, 22], are outside its scope. Many of the research points of interest lie in the servers, and the general software architecture of a Human Sensing System may be as shown in Figure 2.

Figure 2: Software architecture of Human Sensing System.
The fetcher of the Human Sensing System obtains mass data from different social sensing systems following the Incoming Protocol. Because big data must be handled, a data warehouse is used in the Human Sensing System. After the Human Sensing Data Process Module processes the incoming data, the homogeneous data is stored in the Human Sensing Data Warehouse. Several Human Sensing Data Markets, each supplying data to particular Human Sensing Applications, are built for the different applications. The Human Sensing Applications respond to users' requests according to the data acquired from the data markets, following the Outgoing Protocol.
Both the Incoming Protocol and the Outgoing Protocol ensure the robustness of the Human Sensing System, offer mechanisms to protect user privacy, ensure that human sensing data is used rationally, and so forth. The dashed part of Figure 2 is the internal design of the Human Sensing System; further studies are needed to guarantee its security with high efficiency, to encourage users to promote the circulation of the chain, and so forth.
All these areas of the Human Sensing System are potential topics for future research.
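The data flow above (fetcher, process module, warehouse, data markets, applications) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation. The class and function names are hypothetical, and the "process" step is only a keyword match standing in for the NLP and image recognition that the paper leaves out of scope.

```python
from dataclasses import dataclass, field


@dataclass
class HumanSensingWarehouse:
    """Hypothetical stand-in for the Human Sensing Data Warehouse."""
    rows: list = field(default_factory=list)

    def store(self, rows):
        self.rows.extend(rows)

    def build_market(self, keyword):
        # A "data market" here is simply the subset of rows one application needs.
        return [row for row in self.rows if row["keyword"] == keyword]


def process(raw_posts, keywords):
    """Placeholder for the Human Sensing Data Process Module.

    A post counts as a positive observation of a keyword if the keyword
    appears in its text (a deliberately naive assumption).
    """
    rows = []
    for post in raw_posts:
        for kw in keywords:
            if kw in post["text"]:
                rows.append({"keyword": kw, "value": True, "cell": post["cell"]})
    return rows


def traffic_app(market, cell):
    """Toy application: count positive reports for one location cell."""
    return sum(1 for row in market if row["cell"] == cell and row["value"])


# Hypothetical fetcher output gathered from several social networks.
posts = [
    {"text": "Bad luck, traffic jams on this road!", "cell": (0, 0)},
    {"text": "traffic jams again near the bridge", "cell": (0, 0)},
    {"text": "lovely weather today", "cell": (1, 0)},
]
warehouse = HumanSensingWarehouse()
warehouse.store(process(posts, ["traffic jams"]))
market = warehouse.build_market("traffic jams")
print(traffic_app(market, (0, 0)))  # 2
```

Keeping one market per keyword mirrors the idea that each application reads only the slice of the warehouse it needs.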
3.3. Location Aware Human Sensing Model
Location Based Service (LBS) [23, 24] has become popular as positioning techniques and smart mobile devices have matured. A Location Based Service uses wireless networks or other localization methods, such as GPS, to obtain the location of mobile users and, with the support of a GIS platform, provides value-added services based on geographical or geodetic coordinates. There are many studies of LBS, and many commercial firms have developed LBS applications, which offer convenience to people.
As people's needs grow, a person may also want to know the current conditions at a particular place even when he/she is not there. This is one reason for proposing the Location Aware Human Sensing Model, and we call this kind of service a Location Aware Service.
Deploying various sensors in every corner to meet people's different needs is impossible. From another point of view, however, humans are the best “sensors,” already deployed everywhere.
Large scale mobile social networks provide a precious opportunity for us to “deploy” the biggest human sensor network all over the world. People can obtain information that is needed from the Location Aware Human Sensing System at any time and in any place.
The Location Aware Human Sensing Model is a branch of the Human Sensing Model, and its major feature is that it analyzes human sensing data that carries location information. Nowadays, more and more statuses, images, and even videos that people upload to mobile social networks include personal location information. This makes it possible to use mobile social networks as human sensing networks; a typical example of human sensing data with location information from Weibo is shown in Figure 3.

Figure 3: Location Aware Human Sensing Data from Weibo.
However, the unpredictability of the human mind makes it difficult to determine whether the ground truth happened. The next section formulates this as a mathematical problem.
4. The Problem Formulation of Location Aware Human Sensing System
Compared with ordinary sensors, humans have distinct characteristics that must be fully considered. Humans are modeled as sources that generate binary observations with location information.
To formulate the Location Aware Prediction Model, the real world is first simplified. The mathematical problem to be solved is then introduced based on the proposed model.
4.1. Geographic Location Discretization
Geographic location is continuous in the real world, and this continuity makes it difficult to analyze. Many mature discretization methods are available [25–27]; however, they are either too complicated or unsuitable for discretizing locations in the plane.
Take the location point at which a person wants to know whether an event will happen as the center of a circle with fixed radius r, and discretize the land by covering it with countably many circles of the same radius. An event that happens at any point inside a circle is regarded as happening in that circle.
However, circles that cover the whole plane must overlap one another. Location points in an overlap area are ambiguous, because they belong to more than one circle. To avoid this ambiguity, polygons are used for location discretization.
The intuitive way is to lay the same grids, whose shape is square with
The visual contrast between using square and regular hexagon is shown in Figure 4, and the detailed computed comparison is shown in Table 1.
Table 1: Comparison among different attributes in different polygons.

Figure 4: The discretization contrast between using square and regular hexagon.
The ambiguous overlap area is much smaller when regular hexagons are used to discretize the geographic location than when squares are used.
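The advantage of hexagons can be quantified with a short calculation. This is one plausible quantification and does not reproduce the values in Table 1. Assume each cell is the regular polygon inscribed in the circle of radius r around the query point, so the polygon's circumradius equals r. The fraction of that circle left uncovered by the polygon is then 1 − (n/2)·sin(2π/n)/π for an n-sided polygon, independent of r.

```python
import math


def uncovered_fraction(n_sides, radius=1.0):
    """Fraction of a circle not covered by the regular n-gon inscribed in it.

    Assumes the polygon's circumradius equals the circle's radius.
    """
    polygon_area = 0.5 * n_sides * radius ** 2 * math.sin(2 * math.pi / n_sides)
    circle_area = math.pi * radius ** 2
    return 1.0 - polygon_area / circle_area


square_gap = uncovered_fraction(4)   # 1 - 2/pi, roughly 0.363
hexagon_gap = uncovered_fraction(6)  # 1 - 3*sqrt(3)/(2*pi), roughly 0.173
```

Under this assumption the hexagon leaves about half as much of the circle uncovered as the square does. This is consistent with the qualitative comparison above.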
4.2. Data Collection
Data collection is performed by a program called Eplis, written in Java. Like a web spider, it fetches data by querying keywords on Weibo without using the Weibo API, which shows that data can be fetched from any mobile social network that provides a search function. Eplis follows the basic Incoming Protocol to protect user privacy, and all users are anonymous.
Many events in daily life recur periodically, especially the location-specific events that we want to predict, such as traffic jams, air conditions, and temperature.
For example, traffic jams tend to happen when people go to or leave work, so data collected every afternoon can be treated as similar data. The predictions in this paper rely mainly on this periodic nature of the data. To predict traffic conditions in the afternoon, we use only the sensing data about traffic jams collected in the afternoon of each day.
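The paper states only that users are anonymous. One common way to anonymize them while still grouping reports from the same account is keyed hashing. The sketch below uses HMAC-SHA256 from Python's standard library. The function name, the secret key, and the truncation length are assumptions, not part of Eplis.

```python
import hashlib
import hmac


def pseudonymize(user_id: str, secret_key: bytes, length: int = 16) -> str:
    """Replace a raw account id with a keyed hash (HMAC-SHA256).

    Under one key, the same id always maps to the same pseudonym, so reports
    from one account can still be grouped without storing the raw id.
    """
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]
```

A keyed hash is used rather than a plain hash so that someone without the key cannot recompute pseudonyms from guessed account names.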
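Restricting the data to one daily time window can be sketched as a simple filter. This assumes each report carries a `datetime` timestamp under a hypothetical `time` key. The half-open window [start, end) and the wrap-past-midnight case are illustrative choices.

```python
from datetime import datetime, time


def in_daily_window(ts: datetime, start: time, end: time) -> bool:
    """True if the timestamp's time of day lies in [start, end)."""
    t = ts.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end  # window wraps past midnight


def select_period(reports, start, end):
    """Keep reports from the same time-of-day window across all days."""
    return [r for r in reports if in_daily_window(r["time"], start, end)]
```

For example, `select_period(reports, time(7), time(9))` keeps every morning-rush report, whatever its date. This matches the idea of treating the same period on different days as similar data.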
4.3. The Location Aware Binary Data
In a physical sensing system, sensors upload objective data periodically. Humans, however, have feelings and emotions and tend to report events emotionally, so not all data uploaded by humans can be regarded as binary sensing data.
People usually upload texts, images, or videos that express emotional feelings, and the human sensing data is simplified by converting it into a binary model: each event one observes is classified as either true or false.
All the uploaded information in Table 2 can be treated as binary sensing variables. Location Aware Binary Data consists of a keyword, a binary value, and a location. For example, the first status, “Bad luck, traffic jams on this road!”, is translated into “traffic jams,” “true,” and “40.4N, 116.3E.” To answer a query about traffic jams at a specific location, the system uses only the data whose keyword is “traffic jams.”
Table 2: Examples of Location Aware Binary Data Model.
The innovation of this paper is to consider the correlation among events at different positions. The relationships among users in mobile social networks are not considered, because in a large scale Human Sensing System data is collected from different mobile social networks, where one person may have multiple user accounts.
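The keyword/binary/location triple described above maps naturally onto a small record type. The field and function names below are illustrative assumptions. The first example row is the Weibo status quoted in the text, and the second is a hypothetical non-matching report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocationBinaryObservation:
    keyword: str   # e.g. "traffic jams"
    value: bool    # True if the event was observed
    lat: float     # degrees north
    lon: float     # degrees east


def select_keyword(observations, keyword):
    """Keep only the observations relevant to one query keyword."""
    return [o for o in observations if o.keyword == keyword]


obs = [
    LocationBinaryObservation("traffic jams", True, 40.4, 116.3),
    LocationBinaryObservation("noise", True, 40.0, 116.4),  # hypothetical
]
```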
4.4. Problem Formulation of Location Aware Human Sensing
To formulate the Location Aware truth discovery problem in Human Sensing System, the group participants,
Generally speaking, people tend to report things that are happening. In other words, an observation of an event is generated only when the claim is positive.
Each source can claim a subset of claims set considering the location information. Make
The probability
Based on the Bayesian theory, the probability of
The input of the system is who uploaded which claims (including location information) based on their observations. The output of the algorithm is the statistical probability that claim C indeed happens at the specific location, which can be regarded as the predicted probability that C will happen within a time period.
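The Bayesian step can be illustrated for a single claim. The sketch assumes the following, which the truncated text does not confirm: sources report independently, a_i is the probability that source i asserts the claim when it is true, b_i is that probability when it is false, and the prior is the probability that the claim is true. The names are hypothetical.

```python
def claim_posterior(reports, a, b, prior):
    """P(claim true | reports) for one claim, assuming independent sources.

    reports[i] is 1 if source i asserted the claim, 0 otherwise;
    a[i] = P(source i asserts | claim true),
    b[i] = P(source i asserts | claim false),
    prior = P(claim true).
    """
    like_true, like_false = prior, 1.0 - prior
    for s, ai, bi in zip(reports, a, b):
        like_true *= ai if s else 1.0 - ai
        like_false *= bi if s else 1.0 - bi
    return like_true / (like_true + like_false)


# Two sources; only the first one reports the event.
p = claim_posterior([1, 0], a=[0.8, 0.7], b=[0.1, 0.2], prior=0.3)
# numerator 0.3 * 0.8 * 0.3 = 0.072; denominator 0.072 + 0.7 * 0.1 * 0.8 = 0.128
# p = 0.5625
```

Silence from the second source lowers the posterior relative to the first source's report alone. This is why the model needs each source's reliability and not just a count of reports.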
So, the problem transforms into calculating the probability that

Figure 5: (a) The map discretization in the simulation. (b) An example of the probability of traffic jams in one time period.
5. The Location Aware EM Algorithm in Human Sensing System
The main challenge in applying the Expectation Maximization (EM) algorithm is to cast the problem of computing the probability that claims are correct as a maximum likelihood estimation problem.
Given the observed data set I, both the set of latent variables Z and the vector of unknown parameters θ must be estimated jointly.
The latent variable Z is a vector where
The expression that needed maximization is
This problem can be solved with the EM algorithm. However, the original EM algorithm assumes that all events and claims are independent. In this section, a Location Aware claim dependency model is built to remove this limitation.
Our aim is to obtain the maximum likelihood estimates of θ and the latent vector Z; if all observations are independent, the likelihood function is given by (5):
Unfortunately,
In theory, a location is related to all other discrete locations. In practice, however, it is mostly associated with nearby locations, so each position is assumed to be independent of remote locations.
Let
The probability of
The prediction may therefore depend on the surrounding locations. In other words, an event is likely to happen at a location when it happens in most of the surrounding areas.
With the hexagonal discretization proposed above, each location has six neighbors. To reduce computational complexity, each location is regarded as dependent only on its neighbors and independent of all other locations.
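One way to index hexagonal cells and enumerate their six neighbors is the axial-coordinate scheme for pointy-top hexagons. This is a swapped-in, standard technique, not necessarily the paper's indexing. The sketch assumes that r is the hexagon's circumradius (called `size` here) and that latitude/longitude have already been projected to planar coordinates in the same unit.

```python
import math

# Axial offsets of the six neighbours of a pointy-top hexagon.
HEX_DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def hex_center(q, r, size):
    """Planar centre of axial cell (q, r); size is the circumradius."""
    return size * math.sqrt(3) * (q + r / 2), size * 1.5 * r


def point_to_hex(x, y, size):
    """Axial cell containing the planar point (x, y)."""
    qf = (math.sqrt(3) / 3 * x - y / 3) / size
    rf = (2 / 3 * y) / size
    sf = -qf - rf
    q, r, s = round(qf), round(rf), round(sf)
    dq, dr, ds = abs(q - qf), abs(r - rf), abs(s - sf)
    # Recompute the coordinate with the largest rounding error so q + r + s == 0.
    if dq > dr and dq > ds:
        q = -r - s
    elif dr > ds:
        r = -q - s
    return q, r


def hex_neighbors(q, r):
    """The six cells a location is assumed to depend on."""
    return [(q + dq, r + dr) for dq, dr in HEX_DIRECTIONS]
```

Every neighbor center lies at distance √3·size from the cell center. So the six-neighbor dependency treats all adjacent cells symmetrically, which a square grid with diagonal neighbors would not.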
Before deriving the E-M algorithm, the relevant probabilities that will be used in the following calculation are shown as
Generally speaking, the EM algorithm is used to find the maximum likelihood estimation by performing the following steps.
E-step: calculate the expected log likelihood function with respect to the distribution of the latent variable Z computed from the current settings θ and the observed set I
M-step: find the parameters that maximize the Q function computed in the E-step; the maximum is found by calculating the partial derivatives of the Q function:
The E-step and M-step are iterated until the calculated values of θ converge. The values that need to be obtained eventually are
Apparently, the probability of
After giving the above formulation, the Q function, which plays an important role in the EM algorithm, can be written as
The M-step is as follows: setting the derivatives
On the basis of the above equations,
The set
Equations (13) and (16) are calculated iteratively through the EM algorithm. When the parameters converge, the optimal decision can be got from the converged
The main step of the Expectation Maximization algorithm is shown in Algorithm 1.
Algorithm 1: Location Aware EM algorithm.
(1) Input the observed set and the time quantum to be predicted.
(2) Choose the sensing data set I from that time quantum in each period.
(3) Initialize the parameters θ with values between 0 and 1.
(4)–(18) Iterate the E-step and M-step, computing Equations (13) and (16), until θ converges.
(19) Return the event prediction at a specific place y.
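The iteration in Algorithm 1 can be sketched as follows. This is not a reconstruction of Equations (13) and (16), which are not reproduced here. The E-step and M-step use the standard closed-form updates for a binary source model: a_i = P(report | true), b_i = P(report | false), and a global prior d. On top of that sits a hypothetical neighbor-smoothed prior, weighted by an assumed parameter `alpha`, as a stand-in for the paper's location dependence.

```python
import math

EPS = 1e-6  # keeps probabilities away from 0 and 1 so logs stay finite


def _clamp(p):
    return min(max(p, EPS), 1.0 - EPS)


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def location_aware_em(S, neighbors=None, alpha=0.5, max_iter=200, tol=1e-6):
    """EM truth discovery on binary reports with an optional neighbour prior.

    S[i][j] is 1 if participant i reported claim j (an event in one cell).
    neighbors[j] lists the claim indices of the cells adjacent to claim j.
    alpha is a hypothetical weight on the neighbours' previous estimates.
    Returns (Z, a, b, d, iterations).
    """
    m, n = len(S), len(S[0])
    a, b, d = [0.6] * m, [0.4] * m, 0.5  # initial values in (0, 1)
    Z = [0.5] * n
    for it in range(1, max_iter + 1):
        # E-step: posterior probability that each claim is true.
        new_Z = []
        for j in range(n):
            prior = d
            if neighbors and neighbors[j]:
                nb = sum(Z[k] for k in neighbors[j]) / len(neighbors[j])
                prior = (1 - alpha) * d + alpha * nb
            prior = _clamp(prior)
            log_odds = math.log(prior) - math.log(1 - prior)
            for i in range(m):
                if S[i][j]:
                    log_odds += math.log(a[i]) - math.log(b[i])
                else:
                    log_odds += math.log(1 - a[i]) - math.log(1 - b[i])
            new_Z.append(_sigmoid(log_odds))
        # M-step: closed-form updates for a_i, b_i and d.
        z_sum = sum(new_Z)
        new_a, new_b = [], []
        for i in range(m):
            z_rep = sum(z for z, s in zip(new_Z, S[i]) if s)
            k_i = sum(S[i])
            new_a.append(_clamp(z_rep / max(z_sum, EPS)))
            new_b.append(_clamp((k_i - z_rep) / max(n - z_sum, EPS)))
        new_d = _clamp(z_sum / n)
        change = max(abs(x - y) for x, y in zip(new_a + new_b + [new_d], a + b + [d]))
        a, b, d, Z = new_a, new_b, new_d, new_Z
        if change < tol:  # stop once theta = (a, b, d) has converged
            break
    return Z, a, b, d, it
```

Passing `neighbors=None` gives a location-unaware EM. Passing hexagon adjacency lists lets each claim's prior lean toward its neighbors' current estimates, in the spirit of the dependency assumption above.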
6. Evaluation
The performance of LAP is evaluated in this section. First, simulation experiments are carried out to assess the estimation accuracy of the LAP algorithm; then LAP is applied to data from practical mobile social networks. In both the simulations and the practical application, the Location Aware EM algorithm outperforms the other algorithms.
6.1. Simulation Analysis
The simulator is built in MATLAB, and the simulation scenario is set as follows.
The simulation area is 50 × 50 km², and the probability that the event happens at each location over the daily cycle is predefined. The simulated events are ones such as air conditions and traffic jams, which happen periodically. r of the discrete hexagon is
The discretization graph is shown in Figure 4. Each day is divided equally into eight parts, and different locations have different probabilities of the event happening in each part. If an event happens with high odds at a location, it is also likely to happen at its neighbors because of the dependence. The ground truth is then generated as time goes on. After the ground truth has been obtained, the sensing data is simulated as well.
A random probability
Each participant reports an event that actually happens with a certain probability, and each participant can periodically be at any location with a certain probability.
The first algorithm compared is also an EM algorithm [28], a regular EM algorithm that does not consider location information. The second reference is voting, whose core idea is that the more participants report an event, the more likely it is to happen.
Three criteria are used to verify the effectiveness of the Location Aware EM algorithm in a Human Sensing System with location sensing data: Estimation Error of Participant Reliability, True Positives, and True Negatives.
In the first simulation, the estimation accuracy of the Location Aware EM algorithm is compared with that of the other algorithms for different numbers of participants in the Human Sensing System. The number of reported measured variables is fixed at 10 per day, and the number of participants varies from 100 to 500. The results, shown in Figure 6, indicate that, because it exploits the relevance among locations, the Location Aware EM algorithm outperforms the other algorithms, and its advantage grows as the number of participants increases.
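A data generator in the spirit of this setup might look as follows. It is a hypothetical sketch, not the MATLAB simulator. The cell adjacency is passed in, each day is split into `n_slots` parts (eight in the paper), and spatial dependence is induced by blending each cell's odds with its neighbors'. The reliability ranges and the 0.3 presence probability are assumed values.

```python
import random


def simulate(neighbors, n_slots, n_participants, seed=0):
    """Generate spatially correlated ground truth and binary reports.

    neighbors maps each cell id to a list of neighbouring cell ids.
    Returns (claims, truth, reports, reliability); claims is a list of
    (cell, slot) pairs and reports[i][j] is participant i's report on claims[j].
    """
    rng = random.Random(seed)
    cells = sorted(neighbors)
    base = {(c, t): rng.random() for c in cells for t in range(n_slots)}
    claims = [(c, t) for c in cells for t in range(n_slots)]
    truth = []
    for c, t in claims:
        nb = [base[(k, t)] for k in neighbors[c]] or [base[(c, t)]]
        # Blend a cell's own odds with its neighbours' to induce dependence.
        p = 0.5 * base[(c, t)] + 0.5 * sum(nb) / len(nb)
        truth.append(int(rng.random() < p))
    # (P(report | event happened), P(report | it did not)) per participant.
    reliability = [(rng.uniform(0.3, 0.7), rng.uniform(0.0, 0.1))
                   for _ in range(n_participants)]
    visit = 0.3  # assumed chance a participant is in a cell during a slot
    reports = []
    for a_i, b_i in reliability:
        row = []
        for happened in truth:
            present = rng.random() < visit
            p_report = a_i if happened else b_i
            row.append(int(present and rng.random() < p_report))
        reports.append(row)
    return claims, truth, reports, reliability
```

Seeding a private `random.Random` keeps each simulated scenario reproducible across the algorithms being compared.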
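The voting baseline can be sketched as follows. Normalizing each claim's report count by the largest count and thresholding at 0.5 are assumptions; the paper states only that more reports mean a higher probability.

```python
def vote(S, threshold=0.5):
    """Voting baseline: score each claim by its share of the top report count.

    S[i][j] is 1 if participant i reported claim j.
    Returns (scores, predictions).
    """
    counts = [sum(column) for column in zip(*S)]
    top = max(counts) or 1  # avoid dividing by zero when nobody reported
    scores = [c / top for c in counts]
    return scores, [int(s >= threshold) for s in scores]
```

Unlike the EM algorithms, this baseline weights every participant equally, regardless of how reliable each one is.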
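The three criteria could be computed as follows. The exact definitions in the paper are not given. This sketch assumes two things. First, the reliability error is the mean absolute difference between estimated and true participant reliabilities. Second, True Positives and True Negatives are reported as the fractions of actually true or actually false claims that are predicted correctly.

```python
def reliability_error(estimated, actual):
    """Mean absolute error between estimated and true participant reliabilities."""
    return sum(abs(e - a) for e, a in zip(estimated, actual)) / len(actual)


def true_positive_rate(pred, truth):
    """Fraction of claims that actually happened and were predicted true."""
    positives = [p for p, t in zip(pred, truth) if t == 1]
    return sum(positives) / len(positives) if positives else 0.0


def true_negative_rate(pred, truth):
    """Fraction of claims that did not happen and were predicted false."""
    negatives = [p for p, t in zip(pred, truth) if t == 0]
    return sum(1 - p for p in negatives) / len(negatives) if negatives else 0.0
```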

Figure 6: (a) Estimation Error of Participant Reliability with different numbers of participants. (b) True Positives of Measured Variables with different numbers of participants. (c) True Negatives of Measured Variables with different numbers of participants.
In the second simulation, the algorithms are compared as the average number of observations per participant changes, with the number of participants set to 200. The results are shown in Figure 7. As in the first simulation, the Location Aware EM algorithm performs better than the other two algorithms as the number of observations per participant increases, in terms of Estimation Error of Participant Reliability, True Positives of Measured Values, and True Negatives of Measured Values.

Figure 7: (a) Estimation Error of Participant Reliability as the number of observations per participant changes. (b) True Positives of Measured Variables as the number of observations per participant changes. (c) True Negatives of Measured Variables as the number of observations per participant changes.
6.2. Real-World Case Study
In this subsection, the Location Aware EM algorithm is studied with real-world sensing data collected from Weibo.
Human sensing data about traffic jams in Beijing, with time and location information, was collected from 5/1/2014 to 10/1/2014. According to our framework of the Human Sensing System, sensing data should ideally be collected from several mobile sensing systems; this makes no difference to the computation, since the main difference is the size of the data set. The criteria and reference algorithms described in the previous subsection are also used in this experiment. If a participant did not upload information about traffic jams, we assume that he/she did not observe a traffic jam. In other words, we consider only the situation in which a participant claims to observe a traffic jam when he/she is indeed stuck in one.
The first experiment compares the accuracy of traffic jam prediction among the three algorithms. Based on the obtained data set, traffic jams in Beijing from 7:00 AM to 9:00 AM are predicted. The sensing data from 7:00 AM to 9:00 AM in Beijing before September 1, 2014, is used, and the ground truths of where traffic jams happened are taken from September 1, 2014. The results, shown in Figure 8(a), indicate that the Location Aware EM algorithm performs better than the regular EM algorithm and voting.

Figure 8: (a) Accuracy rate of traffic jam prediction. (b) Convergence property of the EM algorithms.
In the second experiment, the convergence properties of the two EM algorithms are compared. The results are shown in Figure 8(b): both algorithms converge, but the Location Aware EM algorithm converges in fewer iterations than the regular EM algorithm in the Location Aware Human Sensing System.
7. Conclusion
The paper presents a software architecture that may be used in future Human Sensing Systems. It also elaborates the Location Aware Prediction Model and proposes the Location Aware EM algorithm based on that model.
Simulations and real-world case studies show the effectiveness of the Location Aware EM algorithm, which considers location dependence in Human Sensing Systems with location information.
Footnotes
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is supported by the National Natural Science Foundation of China under Grant no. 61272529; the Fundamental Research Funds for the Central Universities under Grants no. N120417002 and no. N130817003. The authors would like to thank all editors of this paper. They read the paper very carefully and provided valuable feedback, which is helpful to improve the quality of the drafts.
