Toll Fraud Detection of VoIP Service Networks in Ubiquitous Computing Environments

Abstract

Voice over Internet Protocol (VoIP) is an emerging communication service that has advanced in ubiquitous computing environments. Although VoIP is inexpensive and offers additional services, little provision has been made against attacks on its weak points. With the advances of Wireless Sensor Network (WSN) technologies, the risk is increasing. The resource constraints of WSNs make attacks easier and protection of the network more difficult. In this work, we attempt to distinguish fraud call attacks as outliers from normal calls on the basis of call detail records. We applied the Local Outlier Factor (LOF) method to real call data that include actual fraud call attacks. Our results show that the outlier detection method can be effective in detecting fraud calls. Moreover, introducing two additional attributes related to fraud call characteristics enhanced the detection performance.

1. Introduction

Voice over Internet Protocol (VoIP) is an emerging communication service that has advanced both technologically and commercially. The prevalent usage of VoIP has led to increased attempts at toll fraud, which is defined as the unauthorized use of a telecommunications system by an unauthorized party [1]. Toll fraud often results in substantial additional charges for telecommunications services. The Communications Fraud Control Association estimated global toll fraud losses in 2013 to be $46.3 billion (USD) [2].

The risk of attack is increasing in ubiquitous computing environments with the emergence of Wireless Sensor Network (WSN) technologies. Recently, outlier detection in WSNs has attracted much attention [3], as WSNs are prone to outliers [4]. In particular, the resource constraints of WSNs “make it easy to attack and hard to protect” [5].

Despite significant losses from attacks, there has been little provision for preventing attacks on the weak points of the networks. Existing fraud analysis applications rely on rule-based systems [6], in which fraud patterns are predefined by a set of multiple conditions. Because it relies on the knowledge of domain experts, a rule-based approach is ineffective in providing early warning and is vulnerable to unknown and abnormal fraud patterns [7, 8]. As a result, its fraud detection effectiveness is often limited. In addition, as threats to VoIP networks have increased in ubiquitous computing environments, preventing VoIP fraud is critical to network service providers.

To overcome the limitations of existing approaches, we propose utilizing the Local Outlier Factor (LOF) method to detect toll fraud attacks. LOF is a density-based outlier detection algorithm, which we apply to call detail records (CDRs) from a VoIP service provider. CDRs document the details of phone calls that pass through a facility or device. Recently, the LOF method has been successfully applied to outlier detection [9]. In [10], it is shown that LOF typically achieves better performance in network intrusion identification than other outlier detection algorithms.

Comparative experiments based on actual CDR from the VoIP companies have verified the effectiveness of the proposed approach. We expect that our proposed method will increase both efficiency and effectiveness of toll fraud detection in VoIP services, by overcoming the limitation of existing rule-based approaches.

The rest of this paper is organized as follows. In Section 2, relevant literature is reviewed and the LOF method is described. In Section 3, the structure of the CDR data and the experimental settings are described. In Section 4, experimental results are presented and their implications are discussed. In Section 5, we conclude our work and discuss directions for future work.

2. Background

2.1. Fraud Detection

Rule-based approaches and neural networks are examples of methods previously employed in toll fraud prevention. A rule-based approach uses predefined rules developed by experts. A notification is triggered when a rule is satisfied. As long as an effective set of rules can be defined, the rule-based method can be effective against fraud attacks. However, this method is ineffective for unknown types of fraud [7]. Rosset et al. [6] proposed a rule-discovery framework for fraud detection, in which candidate rules are identified first and the most relevant rules are then selected on the basis of a suggested algorithm. Ruiz-Agundez et al. [11] proposed a rule-based fraud detection framework for VoIP services, in which a rule engine is generated using a knowledge base. Olszewski [12] attempted to construct user profiles on the basis of Kullback-Leibler divergence in order to detect fraud.

Other fraud detection approaches include applications of neural networks, which can overcome the limitation of the rule-based approaches. While they can be more effective against unknown types of toll fraud, they also have limitations. In particular, neural networks have difficulty explaining the cause-and-effect relationship behind a detection. Burge and Shawe-Taylor [13] used a recurrent neural network technique based on unsupervised learning. Taniguchi et al. [14] proposed a feedforward neural network technique, a Gaussian mixture model, and a Bayesian network.

2.2. Local Outlier Factor

The Local Outlier Factor (LOF) is an outlier detection algorithm proposed by Breunig et al. [15]. The LOF has been applied in a variety of fields [16]. The LOF assumes that outlier instances are located far from their neighbor instances in a multidimensional space, whereas normal instances gather relatively close to each other. Therefore, an instance with low proximity to its neighbor instances within a certain range can be regarded as an outlier. The relative index of isolation for the subject instance is defined as its LOF, which can be calculated by the following procedure.

(1) Calculation of k-Distance. For the subject instance q, calculate the $k\text{-distance}(q)$ as the distance between q and its kth nearest neighbor.

(2) Calculation of Reachability Distance. As depicted in Figure 1, the reachability distance is determined by the $k\text{-distance}(q)$. In detail, the reachability distance from an instance p to an instance q is defined as the maximum of the $k\text{-distance}(q)$ and the simple distance $d(p, q)$ between the two instances. It can be written as follows:

$$ \text{reach-dist}(p, q) = \max\bigl(k\text{-distance}(q),\; d(p, q)\bigr). \tag{1} $$

Figure 1: Reachability distance.

(3) Calculation of the Local Reachability Density. When MinPts indicates the number of neighbors considered, the local reachability density (lrd) can be calculated using the following equation:

$$ \text{lrd}(q) = \frac{MinPts}{\sum_{p \,\in\, \text{MinPts nearest neighbors of } q} \text{reach-dist}_{MinPts}(p, q)}. \tag{2} $$

In other words, the lrd is the reciprocal of the average reachability distance to the neighbor instances.

(4) Derivation of LOF. Finally, the LOF is derived by comparing the densities of the subject instance and its neighbors. This relative index is defined as the average ratio of the lrd of a neighbor instance $(lrd (p))$ over the lrd of the subject $(lrd (q))$ :

$$ \text{LOF}(q) = \frac{1}{MinPts} \cdot \sum_{p \,\in\, \text{MinPts nearest neighbors of } q} \frac{\text{lrd}(p)}{\text{lrd}(q)}. \tag{3} $$
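The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this study (the experiments were run in MATLAB). It assumes Euclidean distance, takes exactly MinPts neighbors without special handling of ties, and follows the formulation of Breunig et al. [15], in which the reachability distance of the subject q with respect to a neighbor p uses the k-distance of p. That reading of the argument order in (1) and (2) is an assumption, and the function name `lof_scores` and the toy points are hypothetical.

```python
import math


def lof_scores(points, min_pts):
    """Return the LOF value of every point in `points`.

    Assumes Euclidean distance and no duplicate points (duplicates
    could make a reachability sum zero and cause a division by zero).
    """
    n = len(points)
    # Pairwise simple distances d(p, q).
    dist = [[math.dist(a, b) for b in points] for a in points]

    neighbors = []   # indices of the MinPts nearest neighbors of each point
    k_distance = []  # step (1): distance to the MinPts-th nearest neighbor
    for q in range(n):
        order = sorted((i for i in range(n) if i != q), key=lambda i: dist[q][i])
        knn = order[:min_pts]
        neighbors.append(knn)
        k_distance.append(dist[q][knn[-1]])

    def reach_dist(q, p):
        # Step (2): reachability distance of q with respect to neighbor p.
        return max(k_distance[p], dist[q][p])

    # Step (3): reciprocal of the average reachability distance.
    lrd = [min_pts / sum(reach_dist(q, p) for p in neighbors[q]) for q in range(n)]

    # Step (4): average ratio of each neighbor's lrd to the subject's lrd.
    return [sum(lrd[p] / lrd[q] for p in neighbors[q]) / min_pts for q in range(n)]


# Hypothetical toy data: four clustered points and one isolated point.
toy_points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
for point, score in zip(toy_points, lof_scores(toy_points, min_pts=2)):
    print(point, round(score, 2))
```

In this toy example, the four clustered points receive an LOF of 1, whereas the isolated point receives a value well above 1, which is the behavior that the ranking of instances by LOF relies on.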

Even though the LOF method is known to be effective in outlier detection, its application to CDR requires understanding and preprocessing of CDR data, which is explained in the next section.

3. Data and Experiment Setting

In our study, call detail records (CDRs) of a VoIP service were used. We collected two samples, including actual fraud call attacks. A specific product class of the target service provider was selected, and then the CDR was obtained by routing those calls to an Asterisk server (http://www.asterisk.org). Figure 2 depicts a portion of the sample CDR data.

Figure 2: Call detail records data.

The aforementioned sample data consists of two separate datasets. Dataset #1 includes 105 fraud calls within 1,159 instances from June 29 to July 2, 2012. Dataset #2 includes 87 attacks within 2,062 calls from July 18 to July 20, 2012.

For each dataset, normalization was applied to resolve the differences in scale across attributes. Each column vector corresponding to an attribute was rescaled into the range $[-1, 1]$.
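As an illustration of this rescaling, the following sketch applies a linear min-max transformation to a single attribute column. The exact normalization formula is not specified here, so min-max scaling is an assumption, and the function name `rescale_column` and the sample durations are hypothetical.

```python
def rescale_column(values, lo=-1.0, hi=1.0):
    """Linearly rescale a list of numbers into the range [lo, hi]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # Constant column: map every value to the midpoint to avoid dividing by zero.
        return [(lo + hi) / 2.0 for _ in values]
    scale = (hi - lo) / (v_max - v_min)
    return [lo + (v - v_min) * scale for v in values]


# Hypothetical call durations in seconds (not taken from the datasets).
durations = [0, 30, 60, 120]
print(rescale_column(durations))
```

After this step, the smallest value of each attribute maps to -1 and the largest to 1, so that no single attribute dominates the distance computation merely because of its unit.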

The original CDR data contained 18 unique attribute columns. In order to reduce complexity that might hamper the analysis and increase fraud detection effectiveness, we reduced the number of columns to six, as shown in Table 1.

Table 1: Fundamental attributes.

Attribute	Description	Note
Call time	Start time of a call	Unix timestamp
dst_length	Length of destination extension	Excluding international call number
Duration	Duration of a call (after call time)	Seconds
Billsec	Duration of a call (after answering)	Seconds
Disposition	Code for call status	0 (answered), 1 (no answer), 2 (busy), and 3 (failed)
Uniqueid	Instance's ID

In addition to the six primary columns, we introduced two additional variables for better detection, as shown in Table 2.

Table 2: Additional attributes.

Attribute	Description
RTP address	Caller's RTP IP address
country_code	Caller's country code

According to the attributes used, the experiment was conducted in two separate steps:

(1) Experiment 1: analysis of datasets #1 and #2 using the six fundamental attributes;

(2) Experiment 2: analysis of datasets #1 and #2 using the six fundamental and two additional attributes.

Tests were performed in MATLAB with different MinPts values: 10, 20, 30, and 40. The top 2% to 10% of instances (with respect to the LOF values) were then categorized as outliers. We assessed the performance by verifying whether the detected outliers corresponded to actual threats.
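The selection of the top-ranked instances can be sketched as follows. This is only an illustration: rounding the outlier count up is an assumption (the rounding rule is not stated here), and the function name `top_fraction_outliers` and the sample LOF values are hypothetical.

```python
import math


def top_fraction_outliers(lof_values, fraction):
    """Return the indices of the instances with the highest LOF values.

    `fraction` is the share of instances to flag, e.g. 0.02 for the top 2%.
    """
    n_outliers = math.ceil(len(lof_values) * fraction)  # assumed rounding rule
    ranked = sorted(range(len(lof_values)), key=lambda i: lof_values[i], reverse=True)
    return set(ranked[:n_outliers])


# Hypothetical LOF values for ten calls; the top 20% (two calls) are flagged.
lof_values = [1.0, 1.1, 0.9, 3.2, 1.05, 2.4, 1.0, 0.95, 1.2, 1.0]
print(sorted(top_fraction_outliers(lof_values, 0.2)))
```

Because the threshold is a fixed share of the instances rather than a fixed LOF value, the number of notifications is known in advance, which makes precision and recall directly comparable across MinPts settings.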

In general, outlier detection performance is assessed by two criteria. The first, precision, is the proportion of detected outliers (that is, notifications) that are actual fraud call attacks:

$$ \text{precision} = \frac{n(FC \mid \text{outliers})}{n(\text{outliers})}, \tag{4} $$

where $n(\text{outliers})$ denotes the number of outliers and $n(FC \mid \text{outliers})$ indicates the number of actual attacks among the outliers. The second criterion, recall, is the percentage of actual attacks that are captured by the detection procedure:

$$ \text{recall} = \frac{n(FC \mid \text{outliers})}{n(FC)}, \tag{5} $$

where $n(FC)$ is the total number of fraud calls in the test sample.
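Both measures can be computed directly from the set of detected outliers and the set of known fraud calls. The sketch below is illustrative only; the function name `precision_recall` and the call identifiers are hypothetical and do not come from the datasets.

```python
def precision_recall(outliers, fraud_calls):
    """Compute precision (4) and recall (5) from two collections of call IDs."""
    outliers, fraud_calls = set(outliers), set(fraud_calls)
    hits = len(outliers & fraud_calls)  # n(FC | outliers)
    precision = hits / len(outliers) if outliers else 0.0
    recall = hits / len(fraud_calls) if fraud_calls else 0.0
    return precision, recall


# Hypothetical example: four calls flagged, three actual fraud calls, two in common.
precision, recall = precision_recall(outliers={3, 5, 8, 9}, fraud_calls={3, 5, 7})
print(precision, recall)
```

Here two of the four flagged calls are fraudulent, giving a precision of 0.5, and two of the three fraud calls are flagged, giving a recall of about 0.67. Flagging more instances can only keep or raise recall, while precision typically falls.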

4. Results and Discussion

4.1. Experiment 1

In the first experiment, the six attributes in Table 1 were utilized to produce LOF values. As seen in the performance measures shown in Figures 3 and 4, when the number of outliers increased from 2% to 10% of all instances, higher recall and lower precision were obtained. We did not observe a conclusive trend with regard to MinPts, although the best performance was achieved when MinPts was set to 30 for both datasets.

Figure 3: Result for Experiment 1 on dataset #1.

Figure 4: Result for Experiment 1 on dataset #2.

With regard to outlier selection, we observed a significant difference between the datasets. When a small set of outliers was selected (2–4% of total instances), the detection procedure exposed few actual threats for dataset #2, while the precision was 10%–25% for dataset #1. However, for an outlier size of 8%–10%, recall on dataset #2 was considerably higher than on dataset #1. This indicates that some instances with the highest LOF values in dataset #2 were not actual fraud calls.

4.2. Experiment 2

In Experiment 2, we used two additional attributes, RTP address and country code, to calculate the LOF values.

Figures 5 and 6 show a notable improvement in performance compared to the first experiment. Particularly for dataset #2, even with a small number of outliers, the precision and recall were quite satisfactory. Moreover, the difference between the datasets was not as pronounced as in Experiment 1; the recall measures were found to be better on dataset #2 for Experiment 2. We also observed a relationship between the MinPts value and performance. On both datasets, a MinPts value of 10 resulted in poor precision and recall values.

Figure 5: Result for Experiment 2 on dataset #1.

Figure 6: Result for Experiment 2 on dataset #2.

To verify the difference between the results of the two experiments, we performed a paired t-test on the recall and precision measures. As summarized in Tables 3 and 4, the two additional attributes induced improvements at the 1% significance level in terms of both recall and precision on both datasets. The largest difference was found in dataset #2, in which recall improved by a factor of about nine and precision improved by a factor of about 15.

Table 3: Comparison between Experiments 1 and 2 in terms of recall.

	Total	Data #1	Data #2
Experiment 1	6.9%	9.9%	4.0%
Experiment 2	32.5%	27.8%	37.2%
Significant at the 1% level	Yes	Yes	Yes

Table 4: Comparison between Experiments 1 and 2 in terms of precision.

	Total	Data #1	Data #2
Experiment 1	9.2%	16.3%	2.0%
Experiment 2	41.3%	51.4%	31.1%
Significant at the 1% level	Yes	Yes	Yes
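For readers unfamiliar with the paired t-test, the statistic can be sketched as follows. The sketch assumes that each pair consists of the same measure obtained under the same configuration (for example, the same MinPts value and outlier share) in Experiments 1 and 2; this pairing scheme is an assumption, and the function name `paired_t_statistic` and the sample values are hypothetical rather than results from this study.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return the paired t statistic and its degrees of freedom."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    diffs = [a - b for b, a in zip(before, after)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)          # sample standard deviation
    n = len(diffs)
    t = mean_diff / (sd_diff / math.sqrt(n))   # mean difference over its standard error
    return t, n - 1


# Hypothetical recall values for four matched configurations.
recall_exp1 = [0.02, 0.05, 0.08, 0.10]
recall_exp2 = [0.20, 0.30, 0.35, 0.45]
t_value, dof = paired_t_statistic(recall_exp1, recall_exp2)
print(round(t_value, 2), dof)
```

The resulting statistic is then compared with the critical value of the t distribution for the given degrees of freedom at the chosen significance level (1% in this study).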

Overall, the experimental results have shown that the proposed approach can be effective for outlier detection of VoIP services, overcoming the limitation of existing rule-based approaches which are often ineffective for unknown types of fraud.

5. Conclusion

VoIP services have advanced both technologically and commercially with the emergence of broadband internet. However, due to the characteristics of the network it uses, the service is inherently vulnerable to various attacks. Additionally, detection and planning for these threats have not kept pace with advances in technology.

Our study proposed a detection method for fraud call attacks based on VoIP CDRs. The main idea of the suggested approach was that a fraud call exhibits a different CDR form; thus, it can be regarded as an outlier.

Among various techniques for outlier detection, we utilized the LOF method considering the difference in local density between the target instances and neighboring instances.

The LOF method produced limited results during the first empirical experiment on two sample datasets, which included actual fraud call attacks. However, these results improved considerably through the introduction of two additional attributes. Satisfactory performance in the second experiment demonstrated the LOF as an effective method to detect attacks, in addition to emphasizing the importance of selecting or designing meaningful attributes.

As our research on VoIP services was successful in fraud call detection, applying the LOF method on similar services would be an interesting research opportunity to investigate the differences according to the characteristics of target services. In particular, outlier detection in Wireless Sensor Networks (WSNs) is noteworthy, as WSNs are inherently prone to attacks and the LOF method can be easily applied to the outlier detection of WSNs.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science, and Technology (MEST) (NRF-2013R1A1A22011169).

References

1. Avaya, Avaya Toll Fraud Security Guide, Avaya, 2010.

2. CFCA, Global Telecom Fraud Increases by 0.21% from 2011, Still Near 5-Year Low, Communications Fraud Control Association, Roseland, NJ, USA, 2013.

3. Zhang, Meratnia, and Havinga, "Outlier detection techniques for wireless sensor networks: a survey," IEEE Communications Surveys & Tutorials, vol. 12, no. 2, pp. 159–170, 2010. doi:10.1109/surv.2010.021510.00088

4. J. W. Branch, Giannella, Szymanski, Wolff, and Kargupta, "In-network outlier detection in wireless sensor networks," Knowledge and Information Systems, vol. 34, no. 1, pp. 23–54, 2013. doi:10.1007/s10115-011-0474-5

5. Huang and Tan, "Behavior-based trust in wireless sensor network," in Advanced Web and Network Technologies, and Applications, vol. 3842 of Lecture Notes in Computer Science, pp. 214–223, Springer, Berlin, Germany, 2006. doi:10.1007/11610496_27

6. Rosset, Murad, Neumann, Idan, and Pinkas, "Discovery of fraud rules for telecommunications—challenges and solutions," in Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 409–413, ACM, San Diego, Calif, USA, August 1999. doi:10.1145/312129.312303

7. Kim, N. W. Cho, Y. J. Lee, S.-H. Kang, Kim, Hwang, and Mun, "Application of density-based outlier detection to database activity monitoring," Information Systems Frontiers, vol. 15, no. 1, pp. 55–65, 2013. doi:10.1007/s10796-010-9266-9

8. Yeon, Shim, and Lee, "Outlier detection techniques for biased opinion discovery," The Journal of Society for e-Business Studies, vol. 18, no. 4, pp. 315–326, 2013. doi:10.7838/jsebs.2013.18.4.315

9. Pokrajac, Lazarevic, and L. J. Latecki, "Incremental local outlier detection for data streams," in Proceedings of the 1st IEEE Symposium on Computational Intelligence and Data Mining (CIDM '07), pp. 504–515, IEEE Press, Honolulu, Hawaii, USA, April 2007. doi:10.1109/cidm.2007.368917

10. Lazarevic, Ertoz, Kumar, Ozgur, and Srivastava, "A comparative study of anomaly detection schemes in network intrusion detection," in Proceedings of the 3rd SIAM International Conference on Data Mining, San Francisco, Calif, USA, 2003. doi:10.1137/1.9781611972733

11. Ruiz-Agundez, Y. K. Penya, and Garcia Bringas, "Fraud detection for voice over IP services on next-generation networks," in Information Security Theory and Practices. Security and Privacy of Pervasive Systems and Smart Devices, Samarati, Tunstall, Posegga, Markantonakis, and Sauveron, Eds., vol. 6033 of Lecture Notes in Computer Science, pp. 199–212, Springer, Berlin, Germany, 2010. doi:10.1007/978-3-642-12368-9_14

12. Olszewski, "A probabilistic approach to fraud detection in telecommunications," Knowledge-Based Systems, vol. 26, pp. 246–258, 2012. doi:10.1016/j.knosys.2011.08.018

13. Burge and Shawe-Taylor, "Detecting cellular fraud using adaptive prototypes," in Proceedings of the AAAI-97 Workshop on AI Approaches to Fraud Detection and Risk Management, pp. 9–13, 1997.

14. Taniguchi, Haft, Hollmen, and Tresp, "Fraud detection in communication networks using neural and probabilistic methods," in Proceedings of the 23rd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '98), pp. 1241–1244, Seattle, Wash, USA, May 1998. doi:10.1109/icassp.1998.675496

15. M. M. Breunig, H.-P. Kriegel, R. T. Ng, and Sander, "LOF: identifying density-based local outliers," SIGMOD Record, vol. 29, no. 2, pp. 93–104, 2000.

16. B.-Y. Kang, D.-S. Kim, and S.-H. Kang, "Extended KNN imputation based LOF prediction algorithm for real-time business process monitoring method," The Journal of Society for e-Business Studies, vol. 15, pp. 303–317, 2010.