Semantic Outbreak Power Based Evolution of Web Event in Large-Scale Ubiquitous Contexts

Abstract

Nowadays, emergency events have a great impact on people's daily lives. With the popularization and development of web technology, the web has become an important platform for covering, transmitting, and releasing news, and for the diffusion and evolution of social events. Information holders can use the Internet to broadcast news in real time. However, it is difficult for people to extract the development trend of an event from web information. In this paper, we introduce a method to measure the evolution process of a web event based on the semantic outbreak power. We then propose an approach to distinguish the event type based on fuzzy recognition. Finally, we validate the semantic outbreak power on several web events and verify the correctness of the event type recognition.

1. Introduction

With the popularization and development of web technology around the world, the web is becoming an important platform to cover, transmit, and release news. Information holders can use the Internet as a medium to broadcast news in real time, and web users can quickly and comprehensively follow events as they unfold [1].

A web event is an event that is reported and discussed continuously and extensively in online media (such as BBS, news sites, and blogs) within a short time [2, 3]. It is likely to cause some harm in the real world. The reports and discussions take various forms, such as news commentary on a web page, posts and replies in a BBS, and entries and comments in a blog. They have a major impact on social stability and on a large number of Internet users, and they often accompany the occurrence, development, and change of hot topics and social events. A web event has some typical features: (a) it may spread widely and quickly; (b) it may have devastating effects on society; (c) its breaking point is not easy to find and control. How to detect and compute the propagation speed, predict the direction of evolution, and effectively control a web event is therefore a major challenge, and it is necessary to analyze the future of a web event and measure its evolution.

With the development of the Internet of Things [4–6], big data [7–10], and cloud computing [11–14], evolution has become a basic feature of web events and an important research topic in the field of topic detection and tracking (TDT) [3]. The main research tasks of TDT [15, 16] are topic detection and information collection, segmentation of event information, detection of the first report of an event, and topic tracking. Generally speaking, TDT technologies try to detect unknown events and cluster the news related to these events.

TDT traces the development of events, but it does not measure their dynamic evolution process and does not consider the semantic characteristics of an event during that process. TDT therefore cannot offer a global and clear understanding of a web event. To address this shortcoming, we propose a method to measure the evolution process of a web event based on the semantic outbreak power. We then put forward an approach to distinguish the event type based on fuzzy recognition.

The paper is organized as follows. Section 2 briefly reviews topic detection and topic tracking. Section 3 introduces a method to measure the evolution process of a web event based on the semantic outbreak power. Section 4 puts forward an approach to distinguish the event type based on fuzzy recognition. Section 5 validates the semantic outbreak power on several web events and verifies the correctness of the event type recognition.

2. Related Work

TDT [17–19] was initiated by the Defense Advanced Research Projects Agency (DARPA) and the National Institute of Standards and Technology (NIST). It aims to develop a series of event-based information organization technologies that help people cope with information overload. Below, we introduce the research on its two main subtasks: topic detection and topic tracking.

2.1. Topic Detection

Topic detection assigns the reports in a data stream from newswires and other news sources to different topics and, when necessary, establishes a new topic. Topic detection can be thought of as event clustering, and most of these clustering methods are incremental [20].

Most current research on topic detection uses traditional natural language processing methods: the center vector method [17], KNN [21], K-means [22], the single pass clustering algorithm [23], and so forth. The object processed by topic detection is a stream of reports that changes over time rather than a static text set, so most studies combine a variety of clustering strategies to extend and improve these basic clustering algorithms.
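To make the incremental nature of these methods concrete, the following is a minimal sketch of single pass clustering over a stream of reports. It is an illustration under stated assumptions, not the method of any cited work: the whitespace tokenizer, the raw term-count vectors, the summed centroid, and the `threshold` value are all choices made here for brevity.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (dict-like)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass(reports, threshold=0.5):
    """Assign each incoming report to its most similar topic, or open a new one.

    `threshold` is an assumed similarity cut-off, not a value from the paper.
    """
    topics = []  # each topic: {"centroid": Counter, "members": [report indices]}
    for idx, text in enumerate(reports):
        vec = Counter(text.lower().split())
        best, best_sim = None, 0.0
        for topic in topics:
            sim = cosine(vec, topic["centroid"])
            if sim > best_sim:
                best, best_sim = topic, sim
        if best is not None and best_sim >= threshold:
            best["centroid"].update(vec)  # accumulate term counts in the centroid
            best["members"].append(idx)
        else:
            topics.append({"centroid": Counter(vec), "members": [idx]})
    return topics


reports = [
    "earthquake hits japan coast",
    "japan earthquake tsunami coast",
    "maldives coup president resigns",
]
topics = single_pass(reports)
```

Each report is seen exactly once, which is why the approach suits a report stream; a time window, as in the CMU work mentioned next, would additionally restrict which topics a new report may join.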

Representative research includes the following: Yang et al. [24] combined an agglomerative clustering algorithm with an average clustering algorithm; the researchers of CMU [25] mainly used the single pass clustering method with time windows to detect topics; [26] proposed a hierarchical community discovering algorithm based on an event network, which exploits the semantic properties of event nodes and the edge-weight information in the network to discover fine-grained communities that are semantically meaningful.

In addition, modeling topics and reports is another important line of study in topic detection. Related models include the vector space model, the probabilistic model, the lexical chain model, and the graph model. The vector space model is the most commonly used. Many formulas can be used to calculate similarity based on this model, such as Okapi, Clarity, Hellinger, Tanimoto, Weightsumt, and the cosine similarity formula.

2.2. Topic Tracking

Topic tracking identifies the remaining news associated with a topic in the news stream, given a small number of training reports associated with that topic. The essence of this task is supervised learning. Topic tracking uses a few positive instances (one or more samples) and a large number of negative instances (history data) to train a classifier, which is then used to decide whether a new report is associated with the topic. Topic tracking can therefore be regarded as a special kind of binary classification problem, and many text categorization techniques form its foundation.

In topic tracking studies, the commonly used methods include k nearest neighbours (KNN) [27, 28], the decision tree (D-Tree) [29], the Rocchio algorithm [30], the SVM algorithm [31], and the language model [32]. Most of the research improves a basic algorithm and then applies it to topic tracking. Several representative studies are briefly introduced below.

Allan et al. used the Rocchio algorithm to implement topic tracking. The core idea of the algorithm is the strategy for constructing the topic model: if a feature is conducive to the topic description, its weight is strengthened; if a feature tends to mislead the topic description, its weight is weakened [33]. Leek and Sista used a probability model in their topic tracking and recognition system, based mainly on the naive Bayes algorithm. The system combines multiple classifiers and is organized to present the results of each classifier to the user [32].

In addition, other research on web events or topics includes [34], which presents a novel hot event discovery framework to detect hot events online. It contains three stages: document preprocessing, threshold-resilient document classification, and adaptive splitting document clustering. Deng and Xu [35] proposed a method to measure the influence and represent the event evolution graph. Lee et al. [36] proposed an incremental tracking framework for cluster evolution over highly dynamic networks; they considered the event evolution tracking task in social streams as an application, where a social stream and an event are modeled as a dynamic post network and a post cluster, respectively.

Generally speaking, the traditional TDT technologies just detect unknown events and put related news reports into different topics based on text clustering and text classification. The TDT also traces the development of the events but does not analyze the event deeply. So it cannot provide us with a global and clear understanding of the web event.

3. The Semantic Outbreak Power

3.1. Semantic Features of the Web Event

Our goal is to measure the evolution process of the web event based on semantic outbreak power. So, first of all, we need to get the semantic features set of the web events in the evolution process (Table 2).

Definition 1 (the semantic features set of web event Fe).

The semantic features set of web event Fe consists of three parts: the event seed set $S (t_{i}, t_{j})$ , the web page set $φ (t_{i}, t_{j})$ , and the event keywords set $K (t_{i}, t_{j})$ .

The event seed set $S (t_{i}, t_{j})$ contains the core attribute words of an event, which are usually used as the keywords to search for the event. The web page set $φ (t_{i}, t_{j})$ is a set of resources on the web: a cluster of related web documents that describe the event in the time period $[t_{i}, t_{j}]$ , which can be labelled as $φ (t_{i}, t_{j}) = \{φ_{i}, φ_{i + 1}, \dots, φ_{j}\}$ . We can use a search engine (e.g., Google) as the interface and the seed set as the query to search for the event, and in this way crawl time-series data of hot issues from the Internet. The event keywords set $K (t_{i}, t_{j})$ is the set of event attributes extracted from the web page set $φ (t_{i}, t_{j})$ , and it can be labelled as $K (t_{i}, t_{j}) = \{k_{i}, k_{i + 1}, \dots, k_{j}\}$ .

The basic steps to get the semantic features set of web event Fe are as follows:

(1) using the event seed set $S (t_{i}, t_{j})$ as the search terms, the search engine returns the web page set $φ (t_{i}, t_{j})$ ;

(2) extracting the event keywords set $K (t_{i}, t_{j})$ from the web page set $φ (t_{i}, t_{j})$ ; the weight of event keyword $k_{j}$ in web page $φ_{i}$ can be computed by the TF-IDF method [37].
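The keyword weighting in step (2) can be sketched as follows. This is one common TF-IDF variant (relative term frequency times the logarithm of the inverse page frequency); the exact variant used in [37] is not stated here, so the formula, the function name `tf_idf`, and the pre-tokenized input are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(pages):
    """Return one {keyword: weight} dict per page.

    pages: list of token lists, one per web page (tokenization is assumed done).
    Weight = (count / page length) * log(number of pages / pages containing term).
    """
    n_pages = len(pages)
    # Page frequency: number of pages in which each keyword occurs at least once.
    df = Counter(term for tokens in pages for term in set(tokens))
    weights = []
    for tokens in pages:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_pages / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical two-page example.
example_weights = tf_idf([["japan", "earthquake"], ["japan", "tsunami"]])
```

In this variant a keyword that occurs on every page (here "japan") gets weight 0, while a keyword that occurs on only one page keeps a positive weight.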

Then, we will discuss several basic semantic features of the web event set.

Definition 2 (the new increased web page set: $Δ φ (t_{i}, t_{j})$ ).

From time $t_{i}$ to $t_{j}$ , the new increased web page set of event e can be labelled as $Δ φ (t_{i}, t_{j}) = \{φ_{i}, φ_{i + 1}, \dots, φ_{j}\}$ .

It means that n new web pages related to event e appear during the time $[t_{i}, t_{j}]$ , none of which appeared in the previous period $[t_{s}, t_{i}]$ ; namely, $Δ φ (t_{i}, t_{j}) \cap φ (t_{s}, t_{i}) = \emptyset$ .

Table 1 shows the time-series data of the new increased web pages of an event. The columns to the right of the date give the number of new Chinese web pages returned by the search engine for each source on that day.

Table 1

The new increased web page number of the events of “Libya unrest” every day.

Date	News $\| Δ φ (t_{i}, t_{j}) \|$	Blog $\| Δ φ (t_{i}, t_{j}) \|$	BBS $\| Δ φ (t_{i}, t_{j}) \|$
2011-2-22	104	377	616
2011-2-23	132	383	602
2011-2-24	100	781	13300
⋮	⋮	⋮	⋮
2011-3-29	3820	256	745
2011-3-30	4110	72	367
⋮	⋮	⋮	⋮

Table 2

The semantic features original set of the web event evolution process.

Topic ID	date	$\| Δ φ (t_{i}, t_{j}) \|$	$\| Δ K (t_{i}, t_{j}) \|$
33	0	103	55
33	0	107	67
33	1	114	111
33	1	114	115
33	1	116	111
33	⋮	⋮	⋮

Definition 3 (the new increased keywords set: $Δ K (t_{i}, t_{j})$ ).

From time $t_{i}$ to $t_{j}$ , the new increased keywords set of event e is extracted from the new increased web page set $Δ φ (t_{i}, t_{j})$ and can be labelled as $Δ K (t_{i}, t_{j}) = \{k_{i}, k_{i + 1}, \dots, k_{j}\}$ . It means that m new event attributes of event e appear during the time $[t_{i}, t_{j}]$ , none of which appeared in the previous period $[t_{s}, t_{i}]$ ; namely, $Δ K (t_{i}, t_{j}) \cap K (t_{s}, t_{i}) = \emptyset$ .
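Definitions 2 and 3 both describe a set difference against everything seen earlier. A minimal sketch, with hypothetical page identifiers and keywords chosen only for illustration:

```python
def new_increase(seen, current):
    """Items observed in the current interval that never appeared before."""
    return set(current) - set(seen)


# Hypothetical page identifiers and keywords, for illustration only.
pages_before = {"p1", "p2"}              # phi(t_s, t_i)
pages_today = {"p2", "p3", "p4"}         # pages crawled in [t_i, t_j]
delta_pages = new_increase(pages_before, pages_today)

keywords_before = {"libya", "unrest"}    # K(t_s, t_i)
keywords_today = {"unrest", "sanctions"}
delta_keywords = new_increase(keywords_before, keywords_today)
```

By construction the result is disjoint from the earlier set, which is the empty-intersection condition stated in both definitions.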

Definition 4 (the distribution of event attributes in the new increased web page: $ψ (t_{i}, t_{j})$ ).

For a web event e, each web page of $Δ φ (t_{i}, t_{j})$ can be represented as a vector over the event attributes of $Δ K (t_{i}, t_{j})$ , namely, $φ_{n} = \{w_{n 1}, w_{n 2}, \dots, w_{n m}\}$ , where $w_{n m}$ is the weight of event keyword $k_{m}$ in the nth page and m is the number of event attributes. These vectors constitute a matrix. The distribution of event attributes in the web pages from time $t_{i}$ to $t_{j}$ is

$$ ψ (t_{i}, t_{j}) = \begin{pmatrix} w_{11} & \dots & w_{1 m} \\ \vdots & \ddots & \vdots \\ w_{n 1} & \dots & w_{n m} \end{pmatrix}. \tag{1} $$
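As an illustration of (1), the sketch below arranges per-page keyword weights into an n × m matrix and reads off, from its nonzero entries, which keywords each page contains and which pages contain each keyword. The function names and the dict-based input are assumptions made for this example.

```python
def build_psi(page_weights, keywords):
    """Arrange per-page keyword weights into an n x m matrix (list of rows).

    page_weights: list of {keyword: weight} dicts, one per page.
    keywords: ordered list of the m event keywords (the column order).
    """
    return [[weights.get(k, 0.0) for k in keywords] for weights in page_weights]


def adjacency(psi):
    """Derive page-to-keyword and keyword-to-page index lists from psi."""
    n = len(psi)
    m = len(psi[0]) if n else 0
    page_keywords = [[j for j in range(m) if psi[i][j] != 0] for i in range(n)]
    keyword_pages = [[i for i in range(n) if psi[i][j] != 0] for j in range(m)]
    return page_keywords, keyword_pages


# Hypothetical weights for two pages and two keywords.
psi = build_psi([{"a": 0.3}, {"a": 0.1, "b": 0.2}], ["a", "b"])
page_keywords, keyword_pages = adjacency(psi)
```

The two index lists are the edges of the page-keyword bipartite graph viewed from either side.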

The new increased web page set $Δ φ (t_{i}, t_{j})$ , the new increased event keywords set $Δ K (t_{i}, t_{j})$ , and the distribution of event attributes in the new increased web pages $ψ (t_{i}, t_{j})$ can be obtained from the source data summarized in Table 3 and used to construct the bipartite graph shown in Figure 1. The figure reflects the relationship among web pages, event keywords, and the distribution of event attributes in the web pages.

Table 3

The details of dataset 1 (50 events).

Feature	Value
Average number of seeds per event	2
Average number of web pages per event	1763
Average number of event attributes per event	6118
Average number of days per event	42
Average number of web pages per day	59
Average number of event attributes per day	1011

Figure 1

The relationship of the event seed set $S (t_{i}, t_{j})$ , the new increased web page set $Δ φ (t_{i}, t_{j})$ , the new increased event keywords set $Δ K (t_{i}, t_{j})$ , and the distribution of event attributes in the new increased web pages $ψ (t_{i}, t_{j})$ .

3.2. The Semantic Outbreak Power

Definition 5 (the outbreak power of web event: $o p (t_{i}, t_{j})$ ).

From time $t_{i}$ to $t_{j}$ , the outbreak power of a web event can be labelled as $o p (t_{i}, t_{j})$ . It represents how severe the harm caused by event e may be and how urgently the emergency must be dealt with during the time $[t_{i}, t_{j}]$ .

Having discussed the semantic features set of web events, we can understand the life process of a web event more comprehensively. We now propose an iterative algorithm that combines these characteristics to measure the life process of a web event.

In the previous section, the semantic features set of web events was defined and obtained. Now, we use these features to calculate the outbreak power. Before proposing the calculation method, we introduce two important propositions: the representability of event keywords and the credibility of a web page.

Proposition 6 (the representability of event keywords: $e r (k)$ ).

It describes the ability of the event keywords to express the event.

Proposition 7 (the credibility of a web page: $c (φ)$ ).

It describes how believable a web page is as a description of the event, judged by the keywords it uses to express the event.

Based on observation of the actual data set and on domain knowledge, we give the following deductions as the basis of the algorithm for calculating the semantic outbreak power (Algorithm 1).

Algorithm 1: The steps of computing the semantic outbreak power.

Input: the set of pages $φ (t_{i}, t_{j})$ , the set of keywords $K (t_{i}, t_{j})$ , and the distribution of keywords among pages $ψ (t_{i}, t_{j})$ .

Output: the outbreak power $op (t_{i}, t_{j})$ .

for each $φ \in φ (t_{i}, t_{j})$

$c (φ) \leftarrow c_{0}$ ; /* setting initial state */

$\vec{T} = \{c_{0} (φ_{1}), c_{0} (φ_{2}), \dots, c_{0} (φ_{n})\}$ ;

repeat /* iterative computation */

$\vec{e r (k)} \leftarrow \vec{c (φ)}$

$\vec{c^{'} (φ)} \leftarrow \vec{e r (k)}$

$\vec{T^{'}} = \{c^{'} (φ_{1}), c^{'} (φ_{2}), \dots, c^{'} (φ_{n})\}$

until the cosine similarity of $\vec{T}$ and $\vec{T^{'}}$ is greater than β.

Compute $op (t_{i}, t_{j})$ by (4)

Deduction 1.

The more web pages discuss the event, the higher the semantic outbreak power during the time $[t_{i}, t_{j}]$ , namely, $o p (t_{i}, t_{j}) \propto |Δ φ (t_{i}, t_{j})|$ .

Deduction 2.

The more keywords appear in the discussion of the event, the higher the semantic outbreak power during the time $[t_{i}, t_{j}]$ , namely, $o p (t_{i}, t_{j}) \propto |Δ K (t_{i}, t_{j})|$ .

Deduction 3.

When the disagreement about the event is larger, the distribution of event keywords in the new increased web pages $ψ (t_{i}, t_{j})$ tends toward a one-to-one mapping between pages and keywords, and the semantic outbreak power is higher. Conversely, when every page contains every keyword, the outbreak power is at its minimum, namely, $(\forall w_{n m} \in ψ (t_{i}, t_{j}) \to w_{n m} \neq 0) \to o p (t_{i}, t_{j})_{\min}$ .

According to Deduction 3, when each event keyword is provided by exactly one web page and each page offers only one keyword, the similarity between any two pages is 0. This means that all the web pages are different. At this point, the disagreement is large and the event is likely to deteriorate further. In contrast, we can make Deduction 4.

Deduction 4.

If the numbers of web pages and keywords in the discussion of the event decay during the time $[t_{i}, t_{j}]$ , the distribution of event attributes in the new increased web pages (the bipartite graph in Figure 1) tends to be a complete graph. At this time, the outbreak power of the event is low.

According to Deduction 4, if all keywords exist in every web page, namely, the event keywords set and the web page set constitute a complete graph, the similarity between any two pages is 1. This means that all web pages are reproduced from a single web page. At this point, opinion about the web event is relatively uniform and the event is less likely to generate further threats.

From the previous discussion, we can know that the greatest influences on the semantic outbreak of web events are the new increased web page set $Δ φ (t_{i}, t_{j})$ , the new increased keywords set $Δ K (t_{i}, t_{j})$ , and the distribution of event attributes in the new increased web page: $ψ (t_{i}, t_{j})$ .

Considering the above discussion, we can use the representability of event keywords $e r (k)$ to infer the credibility of a web page $c (φ)$ , and vice versa. This is shown in Figure 2, which consists of three parts: the new increased web page set $Δ φ (t_{i}, t_{j})$ , the new increased keywords set $Δ K (t_{i}, t_{j})$ , and the distribution of event attributes in the new increased web pages $ψ (t_{i}, t_{j})$ .

Figure 2

The process of iteratively calculating the credibility of web pages and the representability of event keywords.

According to Figure 2, we get the following two deductions.

Deduction 5.

For a web page φ, if most of the event keywords have the strong representability $e r (k)$ , the web page will have high credibility $c (φ)$ .

Deduction 6.

If a keyword k is provided by a web page with a higher $c (φ)$ , the keyword k will have a strong $e r (k)$ . $e r (k)$ and $c (φ)$ affect each other, so we can use an iterative algorithm to compute the semantic outbreak power, considering the new increased web pages, the new increased keywords, and the distribution of keywords in the web pages.

According to Deduction 5, the credibility of a web page φ can be computed as the average representability of its event keywords:

$$ c (φ) = \frac{\sum_{k \in K w (φ)} e r (k)}{|K w (φ)|}, \tag{2} $$

where $K w (φ)$ is the event keywords set of web page φ.

According to [38], we use a probability function to calculate the representability of an event keyword:

$$ e r (k) = 1 - \prod_{φ \in W p (k)} (1 - c (φ)), \tag{3} $$

where $W p (k)$ is the set of web pages that contain event keyword k.
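As a worked instance of (3), take illustrative credibilities that are assumed here and not drawn from the data set: a keyword k that appears on a single page with $c (φ) = 0.5$, and the same keyword appearing on two pages with credibilities 0.5 and 0.8.

```latex
\[
  er(k) = 1 - (1 - 0.5) = 0.5
  \qquad \text{(one page, } c = 0.5\text{)}
\]
\[
  er(k) = 1 - (1 - 0.5)(1 - 0.8) = 1 - 0.5 \cdot 0.2 = 0.9
  \qquad \text{(two pages, } c = 0.5 \text{ and } c = 0.8\text{)}
\]
```

Under (3), each additional page that provides the keyword can only raise its representability, and a single highly credible page is enough to make it large.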

The semantic outbreak power $o p (t_{i}, t_{j})$ during the time $[t_{i}, t_{j}]$ is

$$ o p (t_{i}, t_{j}) = \sum_{q = 1}^{n} (1 - c (φ_{q})), \tag{4} $$

where $n = |Δ φ (t_{i}, t_{j})|$ is the number of new increased web pages during the time $[t_{i}, t_{j}]$ .
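A short derivation makes the range of (4) explicit. It assumes the initial credibility $c_{0}$ is chosen in $[0, 1]$; under (2) and (3) every credibility then stays in $[0, 1]$. The numeric values in the second line are illustrative assumptions, not results from the data sets.

```latex
\[
  0 \le c(\varphi_q) \le 1 \ \text{for all } q
  \;\Longrightarrow\;
  0 \le op(t_i, t_j) = \sum_{q=1}^{n} \bigl(1 - c(\varphi_q)\bigr) \le n .
\]
\[
  n = 3,\; c = (0.9,\, 0.6,\, 0.5):
  \quad op(t_i, t_j) = 0.1 + 0.4 + 0.5 = 1.0 .
\]
```

The lower bound is approached when all pages are highly credible, as in the complete-graph case of Deduction 4, and the upper bound grows with the number of new pages, in line with Deduction 1.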

Therefore, the detailed steps of the semantic outbreak power algorithm are as follows:

(1) give each page an initial credibility; the initial credibilities of all web pages form a vector $\vec{T}$ , namely, $\vec{T} = \{c (φ_{1}), c (φ_{2}), \dots, c (φ_{n})\}$ ;

(2) according to formula (3), compute the representability of each event keyword $e r (k)$ ;

(3) according to formula (2), compute the credibility of each web page: $c {(φ)}^{'}$ ;

(4) the credibilities calculated using (2) constitute a vector $\vec{T^{'}}$ , namely, $\vec{T^{'}} = \{c {(φ_{1})}^{'}, c {(φ_{2})}^{'}, \dots, c {(φ_{n})}^{'}\}$ ;

(5) the computation reaches a stable state when the cosine similarity of $\vec{T}$ and $\vec{T^{'}}$ is greater than β; at this point, the representability of event keywords $e r (k)$ and the credibility of a web page $c (φ)$ no longer change; if this condition is not satisfied, set $\vec{T} = \vec{T^{'}}$ and return to step (2); otherwise, end the iteration;

(6) according to formula (4), calculate the semantic outbreak power.
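The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the values of `c0`, `beta`, and `max_iter` are not given in the text and are chosen here arbitrarily, pages without keywords keep their previous credibility, and the final credibilities are taken from the last computed vector.

```python
import math


def cosine(u, v):
    """Cosine similarity of two dense vectors; zero vectors count as converged."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 1.0


def outbreak_power(psi, c0=0.5, beta=0.9999, max_iter=100):
    """Iterate formulas (3) and (2), then apply formula (4).

    psi: n x m list of rows; psi[q][k] != 0 means page q contains keyword k.
    c0, beta, max_iter: assumed settings, not values from the paper.
    Returns (outbreak power, final page credibilities).
    """
    n = len(psi)
    m = len(psi[0]) if n else 0
    kw = [[k for k in range(m) if psi[q][k] != 0] for q in range(n)]  # Kw(phi)
    wp = [[q for q in range(n) if psi[q][k] != 0] for k in range(m)]  # Wp(k)
    cred = [c0] * n  # step (1): initial credibility vector T
    for _ in range(max_iter):
        # step (2): representability of each keyword, formula (3)
        er = [1.0 - math.prod(1.0 - cred[q] for q in wp[k]) for k in range(m)]
        # step (3): credibility of each page, formula (2)
        new_cred = [
            sum(er[k] for k in kw[q]) / len(kw[q]) if kw[q] else cred[q]
            for q in range(n)
        ]
        # steps (4)-(5): stop once T and T' are nearly parallel
        converged = cosine(cred, new_cred) > beta
        cred = new_cred
        if converged:
            break
    # step (6): formula (4)
    return sum(1.0 - c for c in cred), cred


# Two extreme 3 x 3 cases: one keyword per page, and every keyword on every page.
op_one_to_one, _ = outbreak_power([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
op_complete, _ = outbreak_power([[1, 1, 1], [1, 1, 1], [1, 1, 1]])
```

With these assumed settings the one-to-one case yields 1.5 and the complete-graph case yields 0.375, which agrees with the ordering described in Deductions 3 and 4. Because cosine similarity ignores vector length, uniform credibility vectors satisfy the stopping rule after a single pass; a criterion based on the difference between the two vectors would be a possible alternative, though it is not the one described in the text.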

4. Experiment

4.1. The Verification of the Semantics Outbreak Power

The experiments use events collected from Baidu news sites. We chose 50 web events with about 450000 web pages and 100 events with about 900000 web pages as the experimental data sets, which cover various areas such as politics, accidents, disasters, and terrorist attacks. Tables 3 and 4 show the detailed statistics of the experimental data sets. The start time stamp of each web event is determined according to the method in [26], and the end time stamp is generally set to the time at which we crawled the event-related web pages; the average sampling period of each event is 30~40 days. In addition, to determine the evolution process of an event, we also obtain the semantic features of the event, including the event seed set, the event web page set, and the event keywords set. We get the event seeds from Baidu, which provides hot issues together with hot search words to help users search for events.

Table 4

The details of dataset 2 (100 events).

Feature	Value
Average number of seeds per event	2
Average number of web pages per event	5556
Average number of event attributes per event	16856
Average number of days per event	40
Average number of web pages per day	146
Average number of event attributes per day	469

For example, after Japan's catastrophic earthquake and tsunami disaster in 2011, many web users wanted to search for related web pages. Baidu provides a set of keywords about such an event, for example, “Japan, earthquake, tsunami.” After getting the event seeds from the search engine, we use them as search terms and crawl the event-related web pages from the Internet.

The detailed steps for collecting the web resources are as follows:

(1) obtaining the event seed set $S (t_{i}, t_{j})$ of the web event from the news website Baidu, such as “Japan, earthquake, tsunami”;

(2) using the event seed set as the search keywords, crawling the event-related web page set from the Internet;

(3) determining the event start time stamp $t_{s}$ according to the web page set $φ (t_{i}, t_{j})$ and the event end time stamp $t_{e}$ according to the page download time;

(4) obtaining the daily time-series source data of the event from the web page set $φ (t_{i}, t_{j})$ , including the number of new increased web pages $| Δ φ (t_{i}, t_{j}) |$ , the number of new increased keywords $| Δ K (t_{i}, t_{j}) |$ , and the distribution of event keywords in the new increased web pages $ψ (t_{i}, t_{j})$ ;

(5) repeating step (4) for the different sources of information (including news, blogs, and BBS).
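Step (4) amounts to a per-day aggregation over the crawled pages of one source. The sketch below assumes a hypothetical record layout `(date, page_id, keywords)` with ISO-formatted dates; none of these names or formats come from the paper, and crawling and keyword extraction are taken as already done.

```python
from collections import defaultdict


def daily_new_counts(records):
    """Compute (date, |delta phi|, |delta K|) per day for one source.

    records: iterable of (date, page_id, keywords) tuples in any order,
    where date is an ISO string such as "2011-02-22" (assumed format).
    """
    by_day = defaultdict(list)
    for date, page_id, keywords in records:
        by_day[date].append((page_id, keywords))
    seen_pages, seen_keywords = set(), set()
    series = []
    for date in sorted(by_day):  # ISO-formatted dates sort chronologically
        new_pages = {page_id for page_id, _ in by_day[date]} - seen_pages
        new_keywords = set()
        for page_id, keywords in by_day[date]:
            if page_id in new_pages:  # keywords are taken from new pages only
                new_keywords.update(keywords)
        new_keywords -= seen_keywords
        series.append((date, len(new_pages), len(new_keywords)))
        seen_pages |= new_pages
        seen_keywords |= new_keywords
    return series


# Hypothetical records for a single source.
records = [
    ("2011-02-22", "p1", ["libya", "unrest"]),
    ("2011-02-22", "p2", ["libya"]),
    ("2011-02-23", "p2", ["libya"]),
    ("2011-02-23", "p3", ["unrest", "sanctions"]),
]
series = daily_new_counts(records)
```

Step (5) then corresponds to calling the function once per source (news, blogs, BBS) on that source's records.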

We took one day as the minimum time granularity. Every day, we collected the source data of the semantic features of each web event from the different sources and used the iterative algorithm to calculate the daily semantic outbreak power from these data. In this way we obtained the time-series data of the semantic outbreak power of each event over a period of time, as shown in Figures 3, 4, and 5.

Figure 3

The event “Japan nuclear leak” evolution based on semantic outbreak power.

Figure 4

The event “Maldives coup” evolution based on semantic outbreak power.

Figure 5

The event “Libya unrest” evolution based on semantic outbreak power.

5. Conclusions

Nowadays, emergency events have a great impact on people's daily lives. With the popularization and development of web technology worldwide, the web is acting as a platform for the diffusion and evolution of social events. However, faced with huge, disordered, and continuously growing web resources, it is impossible for people to efficiently recognize, collect, and organize the events. Therefore, automatically collecting and organizing the information about events and then tracking their evolution process has become a hot research field. Generally, the traditional topic detection and tracking (TDT) techniques attempt to detect events or cluster news stories into them, without discussing or interpreting the evolution process of the events.

In this paper, we proposed the semantic outbreak power and an algorithm for measuring the event process. For any event, we can calculate the time-series data of its semantic outbreak power and measure its evolution process by analysing the information about the event on the web. This can help people clearly understand the evolution process of a web event. We also proposed an approach to distinguish the event type based on fuzzy recognition.

Footnotes

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Science and Technology Major Project under Grant 2013ZX01033002-003, in part by the National High Technology Research and Development Program of China (863 Program) under Grants 2012AA011504, 2013AA014601, and 2013AA014603, in part by National Key Technology Support Program under Grant 2012BAH07B01, in part by the National Science Foundation of China under Grants 61300202 and 61300028, in part by the Science Foundation of Shanghai under Grant 13ZR1452900, in part by the Major Research Project of the Ministry of the Public Security under Grant 2014JSYJB009, in part by the China Postdoctoral Science Foundation under Grant 2014M560085, in part by the China National Social Science Fund 06BFX051, and in part by the Shanghai University Training and Selection of Outstanding Young Teachers in Special Research Fund hzf05046.

References

1. Xuan, Luo. Mining websites preferences on web events in big data environment. Proceedings of the 16th International Conference on Computational Science and Engineering (CSE ′13), 2013, pp. 1043–1050.

2. 2012, http://definitions.uslegal.com/e/emergency-event

3. Yang C. C., Shi, Wei C.-P. Discovering event evolution graphs from news corpora. IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems and Humans, 2009, 39(4), pp. 850–863. doi:10.1109/tsmca.2009.2015885

4. Liu. A generalized probabilistic topology control for wireless sensor networks. IEEE Journal on Selected Areas in Communications, 2012, 30(9), pp. 1780–1788. doi:10.1109/jsac.2012.121023

5. Luo, Chen. Building association link network for semantic link on web resources. IEEE Transactions on Automation Science and Engineering, 2011, 8(3), pp. 482–494. doi:10.1109/TASE.2010.2094608

6. Liu, Mei, Chen, Luo. Semantic link network based model for organizing multimedia big data. IEEE Transactions on Emerging Topics in Computing, 2014, 2(3), pp. 376–387. doi:10.1109/TETC.2014.2316525

7. Liu, Zhu, Xue. A reliability-oriented transmission service in wireless sensor networks. IEEE Transactions on Parallel and Distributed Systems, 2011, 22(12), pp. 2100–2107. doi:10.1109/TPDS.2011.113

8. Liu, Zhang. Opportunity-based topology control in wireless sensor networks. IEEE Transactions on Parallel and Distributed Systems, 2010, 21(3), pp. 405–416. doi:10.1109/TPDS.2009.57

9. Yen N. Y., Shih T. K., Jin. LONET: an interactive search network for intelligent lecture path generation. ACM Transactions on Intelligent Systems and Technology, 2013, 4(2), article 30. doi:10.1145/2438653.2438665

10. Lau R. W. H., Yen N. Y., Wah. Recent development in multimedia e-learning technologies. World Wide Web, 2014, 17(2), pp. 189–198. doi:10.1007/s11280-013-0206-8

11. Wei, Luo, Liu, Mei, Chen. Knowle: a semantic link network based system for organizing large scale online news events. Future Generation Computer Systems, 2015, 43–44, pp. 40–50. doi:10.1016/j.future.2014.04.002

12. Luo, Zhang, Wei, Mei. Mining temporal explicit and implicit semantic relations between entities using web search engines. Future Generation Computer Systems, 2014, 37, pp. 468–477. doi:10.1016/j.future.2013.09.027

13. Chen C.-C., Huang T.-C., Park J. J., Tseng H.-H., Yen N. Y. A smart assistant toward product-awareness shopping. Personal and Ubiquitous Computing, 2014, 18(2), pp. 339–349. doi:10.1007/s00779-013-0649-z

14. Yen N. Y., Wang C.-L., Hussain, Park J. H. Computational awareness towards green environments. The Journal of Supercomputing, 2014, 69(3), pp. 1007–1012.

15. Allan, Carbonell, Doddington, Yamron, Yang. Topic detection and tracking pilot study final report. Proceedings of the Broadcast News Transcription and Understanding Workshop, 1998, pp. 194–218.

16. Makkonen. Investigation on event evolution in TDT. Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics on Human Language, 2003, pp. 43–48.

17. Larkey L. S., Feng, Connell, Lavrenko. Language-specific models in multilingual topic tracking. Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ′04), July 2004, Sheffield, UK, pp. 402–409.

18. Yang, Rogati. Applying CLIR techniques to event tracking. Information Retrieval Technology: Proceedings of the Asia Information Retrieval Symposium, AIRS 2004, Beijing, China, October 18–20, 2004, Lecture Notes in Computer Science, vol. 3411, Springer, Berlin, Germany, 2005, pp. 24–35. doi:10.1007/978-3-540-31871-2_3

19. Allan, Lavrenko, Frey, Khandelwa. UMass at TDT 2000. Proceedings of the Topic Detection and Tracking Workshop, 2000, Gaithersburg, Md, USA, National Institute of Standard and Technology, pp. 109–115.

20. Koulali, El-Haj, Meziane. Arabic Topic Detection using automatic text summarisation. Proceedings of the International Conference on Computer Systems and Applications (AICCSA ′13), May 2013, Ifrane, Morocco, IEEE, pp. 1–4. doi:10.1109/aiccsa.2013.6616460

21. Allan, Lavrenko. UMass at TDT 2000. http://www.nist.gov/speech/tests/tdt/tdt2000/papers.html

22. Walls, Jin, Sista, Schwartz. Topic detection in broadcast news. Proceedings of the DARPA Broadcast News Workshop, 1999, pp. 248–255.

23. Xiaolin, Xiao, Nan, Fengchao. An improved Single-Pass clustering algorithm internet-oriented network topic detection. Proceedings of the 4th International Conference on Intelligent Control and Information Processing (ICICIP ′13), June 2013, pp. 560–564. doi:10.1109/icicip.2013.6568138

24. Yang, Pierce, Carbonell. A study on retrospective and on-line event detection. Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1998, Carnegie Mellon University-ACM, pp. 28–36.

25. Yang, Carbonell, Brown. Multi-strategy learning for topic detection and tracking. Proceedings of the Topic Detection and Tracking Workshop (TDT ′02), 2002, pp. 85–114.

26. Wan, Liu. A community discovering method based on event network for topic detection. Proceedings of the 16th International Conference on Advanced Communication Technology (ICACT ′14), February 2014, Pyeongchang, Republic of Korea, pp. 1242–1246. doi:10.1109/ICACT.2014.6779157

27. Xiaolin, Xiao, Nan, Fengchao. An improved Single-Pass clustering algorithm internet-oriented network topic detection. Proceedings of the 4th International Conference on Intelligent Control and Information Processing (ICICIP ′13), June 2013, Beijing, China, pp. 560–564. doi:10.1109/icicip.2013.6568138

28. Papka. On-line new event detection, clustering, and tracking [Ph.D. thesis]. Department of Computer Science, University of Massachusetts, Boston, Mass, USA, 1999.

29. Xiaowei. Application of decision tree classification method based on information entropy to web marketing. Proceedings of the 6th International Conference on Measuring Technology and Mechatronics Automation (ICMTMA ′14), January 2014, Zhangjiajie, China, IEEE, pp. 121–127. doi:10.1109/ICMTMA.2014.34

30. Yang, Ault, Pierce, Lattimer C. W. Improving text categorization methods for event tracking. Data Mining and Knowledge Discovery, 2004, 5(3), pp. 167–175.

31. Liu, Tang. Mass classification in mammograms using selected geometry and texture features, and a new SVM-based feature selection method. IEEE Systems Journal, 2014, 8(3), pp. 910–920. doi:10.1109/jsyst.2013.2286539

32. Leek, Sista. Probabilistic approaches to topic detection and tracking. Data Mining and Knowledge Discovery, 2003, 7(3), pp. 67–83.

33. Allan, Papka, Lavrenko. On-line new event detection and tracking. Proceedings of the 21st International ACM SIGIR Conference on Research and Development in Information Retrieval, 1998, Melbourne, Australia, pp. 37–45.

34. Liu, Luo. A framework for online hot event discovery on the web. Proceedings of the 16th IEEE International Conference on Computational Science and Engineering (CSE ′13), December 2013, Sydney, Australia, pp. 989–996. doi:10.1109/cse.2013.145

35. Deng. Event evolution analysis in microblogging based on a view of public opinion field. Proceedings of the 6th International Symposium on Computational Intelligence and Design (ISCID ′13)

2013

193 197

36.

Lee

Lakshmanan

L. V. S.

Milios

E. E.

Incremental cluster evolution tracking from highly dynamic network data

Proceedings of the 30th IEEE International Conference on Data Engineering (ICDE ′14)

April 2014

3 14

10.1109/icde.2014.6816635

2-s2.0-84901774773

37.

Salton

Buckley

Term-weighting approaches in automatic text retrieval

Information Processing and Management 1988 24 5 513 523

10.1016/0306-4573(88)90021-0

2-s2.0-45549117987

38.

Yin

Han

P. S.

Truth discovery with multiple conflicting information providers on the Web

IEEE Transactions on Knowledge and Data Engineering 2008 20 6 796 808

10.1109/TKDE.2007.190745

2-s2.0-42949104529

