
Abstract

This paper studies the problem of 3G downloads in vehicles on the move. Although 3G offers wide coverage and instant data access, it may also incur high cost. We observe that many applications of vehicular 3G users can tolerate a certain data access latency. In addition, vehicle-to-vehicle communications have become practical and can be exploited for intervehicle data delivery. Based on these observations, we propose to augment vehicular 3G downloads with data sharing through vehicle-to-vehicle communications. We formulate an optimization problem whose objective is to minimize the cost of 3G data communications while maximizing the probability that all 3G users successfully obtain their data. The two-hop transmission process and the bandwidth limitation of the vehicular network are both modeled in the optimization problem. To lower the 3G cost and meet the delivery ratio and delay constraints, we propose one single-stage algorithm and one multistage algorithm for selecting seed vehicles (those that download the data via the 3G channel). We have evaluated our algorithms through simulations driven by real vehicular traces, and the results show that they reduce the 3G cost and achieve good download performance.

1. Introduction

It is becoming increasingly common to use 3G in vehicles on the move. Vehicular 3G users often need to download files from the Internet through the 3G data network. For example, digital map updates should be downloaded to keep the navigation system aware of the latest changes to roads. Similarly, in a mobile advertisement application, the latest advertisements should be downloaded to vehicles to attract potential customers. In many such scenarios, a large number of vehicular 3G users may be interested in the same files.

3G cellular networks have been widely deployed, and their large coverage allows 3G users to download files easily from the Internet with modest latency [1]. However, downloading data through 3G may incur high cost. For example, 100 MB of data traffic costs around 2–8 US dollars in Shanghai, China. Moreover, 3G link quality may degrade significantly when vehicles move at high speed or when they are in tunnels or distant regions with poor 3G coverage.

Recently, vehicle-to-vehicle communications have become practical through dedicated short range communications (DSRC). As a result, vehicular ad hoc networks (VANETs) bring a new paradigm to data communications among mobile vehicles. Vehicle-to-vehicle communications have several salient advantages over 3G. First, data transmission via vehicle-to-vehicle communications is free of charge. Second, the bandwidth is high while the link between two vehicles exists. Finally, data communication is usually possible whenever two vehicles come into proximity of each other, independent of their moving speeds. One potential problem with VANETs is that a nonnegligible latency may be introduced for end-to-end data delivery.

We observe that many applications of vehicular 3G users can often tolerate certain data access delays. For example, the update of digital maps can tolerate up to tens of minutes to hours before the map update is made to the navigation system. Motivated by this observation, we propose to harness vehicle-to-vehicle communications for 3G downloads, as illustrated in Figure 1. The basic idea is as follows. Each node is equipped with two radio interfaces, that is, the 3G radio and the intervehicle short range radio. First, a small set of nodes download the file from the Internet through the 3G network, and then these nodes share the files to those nodes which also request the same file through the VANET.

Figure 1

Illustration of vehicular 3G users exploiting vehicle-to-vehicle communications in the urban area of Shanghai, China.

In this paper, we consider a system consisting of moving vehicle nodes. A subset of the vehicle nodes request a common set M of data objects, which can be downloaded from the Internet. A node in the subset can either download a file from the Internet or get it through vehicle-to-vehicle communications from other vehicles that have already obtained it. The objective is to minimize the total cost of file downloads via 3G while ensuring that all requesting nodes obtain the files successfully. The key challenge is to determine which nodes should download the files from the Internet through 3G, and when.

There have been a number of data dissemination protocols in DTNs [2, 3]. However, due to the nature of DTNs, proper delivery cannot always be guaranteed. A few approaches consider the integration of 3G and VANET. However, they assume that either the network has a good connectivity [4] or the trajectories of the nodes are known [5].

We formulate the problem as an optimization problem of selecting seed vehicles (that download files via 3G). According to the number of opportunities that a node can download data from the Internet within the TTL (time to live), we propose a single-stage algorithm and a multistage algorithm for solving the optimization problem. The main idea is to reduce the 3G cost by exploiting the vehicle-to-vehicle communications. We have evaluated our algorithms and the results show that both the single-stage and the multistage algorithms outperform the baseline algorithm.

The key technical contributions are summarized as follows.

(i) We formulate the use of vehicle-to-vehicle communications to assist 3G downloads as an optimization problem of minimizing the 3G cost with the objective of meeting all demands of requesting nodes.

(ii) We design two algorithms that capture the property of the equal importance of 3G users, bandwidth limitation of vehicle-to-vehicle communication channel, and time-dependent transmission situation.

(iii) Comparisons of algorithms have been performed based on the real vehicular traces that have been collected from taxis in Shanghai and Shenzhen. Results show that our proposed algorithms have made significant improvements on reducing the 3G cost.

The remainder of this paper is organized as follows. The related work is discussed in Section 2. Empirical study is presented in Section 3. Section 4 presents the network model, the analysis of data transmissions in VANETs, and the problem formulation. We propose the single-stage and the multistage algorithms in Section 5. The performance evaluation based on real vehicular traces is presented in Section 6. Finally, we conclude our work in Section 7.

2. Related Work

In recent years, VANETs have attracted increasing research attention [6]. Data sharing has been widely studied in both cellular networks and VANETs [4, 5, 7–9]. In this section, we briefly review the related work from three aspects: cellular networks, VANETs, and VANET-3G integrated networks.

2.1. Data Sharing in Cellular Networks

Cellular networks are ubiquitous owing to their convenience and efficiency. To reduce the cost of communication over cellular networks, the seminal work [10] proposes a unified cellular and ad hoc network (UCAN) architecture that enhances the throughput of the cellular link with IEEE 802.11 links. Some studies [11–13] exploit secondary channels such as WiFi or Bluetooth to help mobile users share data “pulled” via the cellular network. In [12], a proxy scheduling scheme is investigated to deliver multimedia content to mobile peers. In [11], a multimedia content distribution problem is solved in a similar scenario, and a fully distributed and cost-effective protocol is proposed to distribute the content to mobile devices in a peer-to-peer manner. With an emphasis on fairness, the work in [13] minimizes the energy consumption of file sharing over cellular/Bluetooth networks. To augment the 3G downlink rate among multicast receivers, the approach in [14] proposes a localized greedy algorithm to discover a proxy that forwards packets to the receivers through an IEEE 802.11-based ad hoc network. All these scenarios share some similarity with our problem, in which we focus on data pulled from the cellular network and then shared among mobile vehicular users.

2.2. Data Sharing in DTNs and VANETs

A good survey on routing and data dissemination can be found in [15]. Setting aside the process of 3G downloads, VANET-assisted data dissemination in a sparse vehicular network is similar to the problem of data dissemination in DTNs. Epidemic routing [16] proposes a basic idea for data dissemination in DTNs. Reference [17] proposes a “communication efficient” swarming protocol which uses a gossip mechanism that leverages the inherent broadcast nature of the wireless medium and a data-selection strategy that takes proximity into account in decisions to exchange data. Recently, numerous forwarding strategies have been introduced in DTNs [18, 19]. Based on the social concepts of “centrality” and “community,” [2, 20] exploit the community property to select relays or forwarding paths, and centrality is used in [2, 3]. Inspired by these works, we also exploit social forwarding paths in the vehicular network to analyze the nature of data dissemination in a vehicular network driven by real traces.

2.3. Integration of 3G and VANET

In recent years, WiFi is exploited to assist and enhance 3G communication in [1, 21] in mobile vehicular scenarios. Furthermore, the approach in [4] proposes a VANET-3G integrated network architecture and investigates a cluster-based gateway selection strategy to link VANET to 3G in a good connectivity scenario that is different from our sparse scenario. In [5], Mongiovì et al. study a similar problem to our problem with the goal of minimizing the cost of remote communications (e.g., 3G). However, they assume the mobility trajectories of nodes in the network are known in advance.

Different from [4, 5], our problem is modeled as the initial data seeds selection via 3G channel and the data dissemination in vehicular networks based on the social forward path considering bandwidth limitation.

2.4. Routing in Vehicular Ad Hoc Networks

Vehicular trajectories have been exploited for routing in vehicular ad hoc networks [22] since the availability of future trajectories significantly reduces the uncertainty with vehicular mobility.

TBD [23] is a routing approach for using trajectories to forward data from vehicles to a given roadside access point (AP) in a light traffic vehicular network. Each node estimates the delivery delay to the AP based on its trajectory, which is then used as the metric for making forwarding decisions. Wu et al. [24] propose to predict the future location of a vehicle by modeling the mobility of a vehicle as a multiorder Markov chain and then estimate the encounter probability of each pair of vehicles. TSF [25] makes use of roadside units (RSUs) and trajectories to forward data from a fixed roadside unit (RSU) to a moving vehicle.

3. Empirical Study

In this section we show, through simulations based on real traces collected from taxis in two metropolitan cities in China, that different vehicles have very different capabilities of broadcasting a data packet in the network.

3.1. Datasets of Real Traces

We have two large-scale GPS datasets collected from taxis in Shanghai and Shenzhen, two representative cities in China. The details of the two datasets are summarized in Table 1. We randomly choose 500 nodes from each dataset. From these trace data, we extract the real driving trajectory of each taxi and then derive the contacts between each pair of vehicles. Thus, we can use the traces to simulate the mobility of vehicles.

Table 1

Dataset information.

Parameter	Shanghai taxi	Shenzhen taxi
Number of vehicles	2600	12959
Duration (hour)	17280	8544
Granularity	15–60	9–65
Number of encounters per node (per hour)	352	256

3.2. Revealing Different Broadcasting Capabilities of Vehicles

To reveal that the broadcasting (sending a data packet to all other nodes) capabilities of different vehicles can be very different, we have conducted extensive simulations based on the real vehicle traces. In each simulation, a vehicle (called the seed vehicle) downloads a data item and broadcasts the single data packet to all the other vehicles in the network through intervehicle communications. For both datasets, the number of forwarding hops is limited to two. A simple greedy forwarding strategy is adopted in simulation.
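The two-hop broadcast experiment can be mimicked on a toy contact list. The sketch below is only an illustration, not the simulation used in this section: the `(time, a, b)` contact format, the function name, and the rule that every holder whose copy has used fewer than two hops forwards on each contact are all assumptions.

```python
def two_hop_delivery_ratio(contacts, seed, nodes, ttl):
    """Fraction of non-seed nodes reached within two hops before ttl.

    contacts: iterable of (time, a, b) meetings (hypothetical format).
    """
    hops = {seed: 0}  # node -> hop count of the copy it holds
    for t, a, b in sorted(contacts):
        if t > ttl:
            break
        for src, dst in ((a, b), (b, a)):
            # A copy may be forwarded only if it has used fewer than two hops.
            if src in hops and hops[src] < 2:
                # Keep the smallest hop count seen for the receiver.
                if hops[src] + 1 < hops.get(dst, 3):
                    hops[dst] = hops[src] + 1
    return (len(hops) - 1) / (len(nodes) - 1)
```

Running this once per candidate seed and sorting the results gives the kind of per-seed delivery ratio ranking discussed in this section.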

In Figures 2(a) and 2(b), the vehicle IDs (identity numbers) of 150 seed vehicles are sorted by their corresponding delivery ratios, and the IDs are then reassigned (e.g., after reassignment, the seed vehicle with ID = 1 has the lowest delivery ratio and the seed vehicle with ID = 150 has the largest delivery ratio). We report the delivery ratios achieved by different vehicles. In Figure 2(a), we can see that the broadcasting capability of the vehicles ranges from 0 to 60% in the Shanghai dataset, and in Figure 2(b), it ranges from 0 to 50% in the Shenzhen dataset. Clearly, the results suggest that it is essential to select a good set of seed vehicles to achieve good performance. Based on this observation, we propose an approach that selects vehicles contributing larger delivery ratios and thereby helps reduce the cost of 3G downloads.

Figure 2

Delivery ratios of different seed vehicles.

4. Preliminaries and Problem Formulation

The major notations adopted in the paper are summarized in Table 2.

Table 2

Major notations adopted in the paper.

Notation	Explanation
$V$	Set of all vehicles
$v_{i}$	A vehicle in set $V$
λ	Contact rate of a pair of vehicles
TTL	The time to live of a data message
$U$	The set of all downloading nodes
$R$	The set of all relay nodes
$p_{s r d} (T)$	Path weight
$p_{s r}^{(k)} (T)$	One-hop forwarding probability
$S$	The set of all seed nodes

4.1. Preliminaries

We consider a network of mobile nodes (e.g., vehicles), denoted by $V = \{v_{1}, v_{2}, v_{3}, \dots, v_{n}\}$ . Each node is equipped with two wireless interfaces: a long range cellular interface (3G) and a short range vehicular communication interface. Each node can communicate with the Internet through 3G at any time at a fixed cost. Communications between two vehicles are free of charge, but the number of data items delivered in one contact is limited by the capacity of the short range wireless channel. Furthermore, the buffer of each vehicle is assumed to be sufficiently large, which is reasonable in vehicular network scenarios. We assume that a pair of nodes encounter each other at random time intervals and that the ICT (intercontact time) between two consecutive contacts is exponentially distributed with contact rate λ. The contact rate, and hence the ICT distribution, differs from one node pair to another.

In our model, each shared data object in the network has to be fresh; that is, a TTL value T is attached to each data item, and a data object has to be delivered to a demanding node within T. We consider diverse kinds of data objects, $P = \{p_{1}, p_{2}, \dots, p_{k}\}$ . According to their demands for different data, nodes are divided into different groups. For simplicity, this paper studies the case of one group, denoted by $U = \{u_{1}, u_{2}, u_{3}, \dots, u_{m}\}$ , with $U \subseteq V$ . Every user in $U$ needs to obtain all of $P$ before the deadline T. Each data item $p_{k}$ has a set of sources $S^{(k)}$ , $S^{(k)} \subseteq U$ , and a set of destinations $D^{(k)} = U - S^{(k)}$ . A 3G user can thus act as both a source and a destination. Most importantly, we assume that the downloading strategy can be planned in a centralized way and that the information collected by the vehicles in $V$ can be viewed as global knowledge, thanks to information sharing through the 3G channel. In the vehicular channel, the effective two-hop relay strategy in [26] is adopted, and the relay set is denoted by $R = \{r_{1}, r_{2}, r_{3}, \dots, r_{l}\}$ .

4.2. Problem Formulation

In this paper, we want to select a minimum set of nodes that download data through 3G such that all data demands of the nodes in the network are met under the time constraint of the TTL. Next, we introduce some basic concepts for modeling the VANET-assisted data download and sharing problem.

4.2.1. Two-Hop Forwarding Path

We introduce the concept of 2-hop social forwarding path proposed in [2]. A two-hop social forwarding path from a source s to a destination d via a relay r is denoted by $P_{s r d} = (s, r, d)$ . The path weight is the probability $p_{s r d} (T)$ that a data item is delivered from s to d via r within time T.

The intercontact time $X_{i j}$ between two nodes i and j is exponentially distributed with density $p_{X_{i j}} (x) = λ_{i j} e^{- λ_{i j} x}$ ( $x \geq 0$ ). The total time Y needed to deliver a data item along the path $P_{s r d}$ is then $Y = X_{s r} + X_{r d}$ , and its PDF (probability density function) $p_{Y}$ is obtained as the convolution of $p_{X_{s r}} (x)$ and $p_{X_{r d}} (x)$ :

\begin{matrix} p_{Y} (x) = p_{X_{s r}} (x) \otimes p_{X_{r d}} (x) . \end{matrix}

(1)

Theorem 1.

For a 2-hop social forwarding path $P_{s r d} = (s, r, d)$ with exponential parameters $λ_{s r}, λ_{r d}$ , and $p_{X_{i j}} (x) = λ_{i j} e^{- λ_{i j} x}$ , one has

\begin{matrix} p_{Y} (x) = \frac{λ_{r d}}{λ_{r d} - λ_{s r}} p_{X_{s r}} (x) + \frac{λ_{s r}}{λ_{s r} - λ_{r d}} p_{X_{r d}} (x) . \end{matrix}

(2)

Owing to space limitations, the proof is omitted here; the details can be found in the Appendix of [2].

As a result, the probability $p_{s r d} (T)$ that a data item is delivered via path $P_{s r d}$ within time T is

\begin{array}{l} p_{s r d} (T) = P (Y < T) = \int_{- \infty}^{T} p_{Y} (x) d x \\ = \frac{λ_{r d}}{λ_{r d} - λ_{s r}} (1 - e^{- λ_{s r} T}) + \frac{λ_{s r}}{λ_{s r} - λ_{r d}} (1 - e^{- λ_{r d} T}) . \end{array}

(3)
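As a numerical check of (3), the following minimal sketch evaluates the two-hop delivery probability. The function name is hypothetical. The closed form in (3) is undefined when the two rates coincide, so the sketch falls back to the Erlang-2 distribution function in that case; this fallback is an addition for illustration and is not part of the formulation above.

```python
import math

def two_hop_prob(lam_sr, lam_rd, T):
    """P(X_sr + X_rd < T) for independent exponential intercontact times."""
    if T <= 0:
        return 0.0
    if math.isclose(lam_sr, lam_rd):
        # Equal rates: the sum is Erlang-2, the limit of the closed form.
        lam = lam_sr
        return 1.0 - math.exp(-lam * T) * (1.0 + lam * T)
    a = lam_rd / (lam_rd - lam_sr)
    b = lam_sr / (lam_sr - lam_rd)
    return a * (1.0 - math.exp(-lam_sr * T)) + b * (1.0 - math.exp(-lam_rd * T))
```

For instance, with $λ_{s r} = 1$ , $λ_{r d} = 2$ , and $T = 1$ , the expression evaluates to $2 (1 - e^{-1}) - (1 - e^{-2}) \approx 0.3996$ , and swapping the two rates gives the same value.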

4.2.2. Modeling Bandwidth Limitation

We assume that the transmission capacity is limited and that all data items have the same size, denoted by z bytes. On average, C bytes can be transmitted during one contact; that is, at most $a = C / z$ data items are transmitted at one time.

Definition 2.

Suppose that there are $B_{i}$ data items in the buffer of node $u_{i}$ ; the one-hop forwarding probability of transmitting data item $p_{k}$ between nodes s and r, within time T, denoted by $p_{s r}^{(k)} (T)$ , is rewritten as

\begin{matrix} p_{s r}^{(k)} (T) = 1 - e^{- (a / B_{s}) λ_{s r} T} . \end{matrix}

(4)

The number of data items in a relay's buffer varies with time t ( $t < T$ ) and is denoted by $B_{r} (t)$ . The expected probability $p_{r d}^{(k)} (T)$ that data item $p_{k}$ is transferred from r to d within time T is then

\begin{matrix} p_{r d}^{(k)} (T) = 1 - e^{- (a / E [B_{r} (t)]) λ_{r d} T} . \end{matrix}

(5)

We estimate the expectation $E [B_{r} (t)]$ of the number of data items on relay r over $t \in [0, T]$ as

\begin{matrix} E [B_{r} (t)] = \frac{1}{T} (\sum_{i \in U} (\frac{λ_{i r} T}{2} \cdot \min \{\sum_{k \in P} x_{i}^{(k)}, a\})), \end{matrix}

(6)

where $E [B_{r} (t)]$ is estimated by the number of encountered seeds multiplied by the number of data items each of them has downloaded. As a result, the probability $p_{s r d}^{(k)} (T)$ that data item $p_{k}$ is transferred through the two-hop social forwarding path within time T is

\begin{array}{l} p_{s r d}^{(k)} (T) = \frac{h_{r}^{(k)} λ_{r d}}{h_{r}^{(k)} λ_{r d} - g_{s}^{(k)} λ_{s r}} (1 - e^{- g_{s}^{(k)} λ_{s r} T}) \\ + \frac{g_{s}^{(k)} λ_{s r}}{g_{s}^{(k)} λ_{s r} - h_{r}^{(k)} λ_{r d}} (1 - e^{- h_{r}^{(k)} λ_{r d} T}), \end{array}

(7)

where $g_{s}^{(k)} = a / B_{s}$ and $h_{r}^{(k)} = a / E [B_{r} (t)]$ . Here $B_{s} = \sum_{k \in P} x_{s}^{(k)}$ is the number of data items downloaded by node s.
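The bandwidth-limited quantities in (6) and (7) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the dictionary-based inputs are hypothetical, the caller is assumed to pass positive values for $B_{s}$ and $E [B_{r} (t)]$ , and the equal-rate branch (Erlang-2) is added only to avoid a division by zero.

```python
import math

def expected_relay_buffer(lam_to_relay, items_downloaded, a, T):
    """Eq. (6). lam_to_relay[i] is the contact rate of user i with relay r;
    items_downloaded[i] is the number of items user i downloaded via 3G."""
    total = sum(lam * T / 2.0 * min(items_downloaded[i], a)
                for i, lam in lam_to_relay.items())
    return total / T

def limited_two_hop_prob(lam_sr, lam_rd, B_s, EB_r, a, T):
    """Eq. (7): two-hop probability with rates scaled by g = a / B_s
    and h = a / E[B_r(t)]. B_s and EB_r must be positive."""
    r1 = (a / B_s) * lam_sr   # g * lambda_sr
    r2 = (a / EB_r) * lam_rd  # h * lambda_rd
    if math.isclose(r1, r2):
        return 1.0 - math.exp(-r1 * T) * (1.0 + r1 * T)  # equal-rate limit
    first = (r2 / (r2 - r1)) * (1.0 - math.exp(-r1 * T))
    second = (r1 / (r1 - r2)) * (1.0 - math.exp(-r2 * T))
    return first + second
```

With $a = B_{s} = E [B_{r} (t)] = 1$ the scaled rates equal the raw contact rates and the result coincides with (3); a larger source buffer lowers g and therefore the delivery probability.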

4.2.3. Weight of Seeds

We call the set of nodes that are chosen to download data through 3G the seed set, denoted by $S$ .

Definition 3.

The weight $w_{i j}^{(k)} (T)$ indicates the probability that a node i downloads the data item $p_{k}$ through 3G and cannot deliver it to j via all the relays; that is, it is a weight of failure. Formally,

\begin{matrix} w_{i j}^{(k)} (T) = \prod_{r \in R} (1 - p_{i r j}^{(k)} (T)) . \end{matrix}

(8)

Obviously, $w_{i j}^{(k)} (T) = 0$ if $j \in S^{(k)}$ . Then we have the following failure weight vector $W^{(k)} (T)$ , with one entry for each $u_{j} \in U$ , which indicates the weight of the 3G downloads decision on seed set $S^{(k)}$ for data item $p_{k}$ :

\begin{matrix} W^{(k)} (T) = (\prod_{u_{i} \in S^{(k)}} w_{i 1}^{(k)} (T), \dots, \prod_{u_{i} \in S^{(k)}} w_{i m}^{(k)} (T)) . \end{matrix}

(9)

$W^{(k)} (T)$ in (9) is the vector of probabilities that the downloaded data item $p_{k}$ cannot be delivered to each user before time T when the seeds are selected as the set $S^{(k)}$ , which denotes the set of seeds for data item $p_{k}$ .

We define an indicator variable $x_{i}^{(k)}$ to denote whether node $u_{i}$ downloads data object $p_{k}$ through 3G. That is,

\begin{matrix} x_{i}^{(k)} = \begin{cases} 1, & \text{user } u_{i} \text{ downloads data item } p_{k} \text{ via 3G}, \\ 0, & \text{otherwise} . \end{cases} \end{matrix}

(10)

Obviously, $u_{i} \in S^{(k)}$ if and only if $x_{i}^{(k)} = 1$ . Then the total number of 3G downloads is $\sum_{u_{i} \in U} \sum_{p_{k} \in P} x_{i}^{(k)}$ .
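The failure weight in (8) and the failure weight vector in (9) can be computed as in the sketch below. The function names and the dictionary keyed by `(seed, user)` pairs are hypothetical choices made for illustration.

```python
from math import prod

def failure_weight(path_probs):
    """Eq. (8): path_probs holds the two-hop delivery probability via each relay."""
    return prod(1.0 - p for p in path_probs)

def failure_vector(seeds, users, w):
    """Eq. (9): w[(i, j)] is the failure weight from seed i to user j."""
    vec = []
    for j in users:
        if j in seeds:
            vec.append(0.0)  # a seed already holds the item
        else:
            vec.append(prod(w[(i, j)] for i in seeds))
    return vec
```

For example, two relays with delivery probabilities 0.5 and 0.2 give a failure weight of $0.5 \times 0.8 = 0.4$ , and adding a second seed multiplies the corresponding entries of the vector, so each entry can only decrease.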

Definition 4 (seed selection problem).

Given the vehicular network with nodes $V$ , a group of users $U$ , and the shared data object set $P$ , the objective is to minimize the size of the seed set while meeting the demands of all nodes within a TTL value T. The seed selection problem is formally defined as

\begin{array}{l} \text{Minimize } \sum_{u_{i} \in U} \sum_{p_{k} \in P} x_{i}^{(k)} \\ \text{subject to } \frac{‖W_{avg}^{(k)} (T)‖}{‖W_{c}‖} \cdot \text{Cos\_Sim} (W_{avg}^{(k)} (T), W_{c}) \geq 1 - δ, \\ W_{c} = \{ϵ, ϵ, ϵ, \dots\}, \end{array}

(11)

where δ and ϵ are small numbers and $W_{avg}^{(k)} (T) = (1 / | P |) \sum_{k \in P} W^{(k)} (T)$ denotes the average of the vectors $W^{(k)} (T)$ in (9) over all data items. We use a vector of weights, with one entry per destination, to capture the multidestination nature of the problem. The constraint vector, denoted by $W_{c} = \{ϵ, ϵ, \dots\}$ , gives the probability that data could not be delivered to each destination. Measuring the similarity between the weight vector of the selected seed sets and the constraint vector emphasizes the equal importance of each 3G user destination for all data items. The similarity is calculated as the cosine similarity multiplied by the Euclidean length ratio of the two vectors, so that both the length and the angle of the vectors are considered. In (11), $‖W_{avg}^{(k)} (T)‖ / ‖W_{c}‖$ is the Euclidean length ratio, where $‖\cdot‖$ denotes the Euclidean length of a vector, and $\text{Cos\_Sim} (W_{avg}^{(k)} (T), W_{c})$ is the cosine similarity between $W_{avg}^{(k)} (T)$ and $W_{c}$ . Finally, the similarity must be no less than $1 - δ$ to meet the demand of data sharing within the time constraint T.
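The similarity measure used in the constraint of (11) can be written out directly. The sketch below is a minimal illustration with a hypothetical function name; returning 0 for a zero weight vector, where the cosine is undefined, is an assumption of the sketch.

```python
import math

def similarity(w_avg, w_c):
    """Cosine similarity times the Euclidean length ratio |w_avg| / |w_c|."""
    dot = sum(x * y for x, y in zip(w_avg, w_c))
    norm_w = math.sqrt(sum(x * x for x in w_avg))
    norm_c = math.sqrt(sum(y * y for y in w_c))
    if norm_w == 0.0:
        return 0.0  # cosine is undefined for a zero vector; treated as 0 here
    cos_sim = dot / (norm_w * norm_c)
    return (norm_w / norm_c) * cos_sim
```

Algebraically the product simplifies to the inner product of the two vectors divided by the squared length of the constraint vector, and when every entry of the constraint vector equals ϵ this is the mean entry of the weight vector divided by ϵ. For example, the weight vector (0.2, 0.4) with ϵ = 0.2 gives 0.3 / 0.2 = 1.5.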

5. Design of Algorithms

In this section we present the design of the two algorithms. First, we give the basic idea of the algorithms and then describe the details of the two algorithms.

5.1. Basic Idea

We divide the process of data sharing into stages of equal duration according to the TTL value T of the data objects. At the start of each stage, a decision has to be made whether to download data objects through 3G or not. We assume that the generation times of data objects and their corresponding TTL values are known in advance.

Figure 3 shows an example. One data item is created at 0 s, and the corresponding TTL value of the data is T seconds. There are n stages in Figure 3; that is, T is divided into n equal periods. The vertically downward arrow at the beginning of each time period denotes that a node has a chance of downloading data through 3G. For example, there are 3 decision chances during T when $n = 3$ , and in general there are n decision chances when there are n stages. Notice that if there is only one stage, the decision time is the beginning (0 s), and afterwards nodes in the network can obtain shared data only through V2V communication until the last second. Regardless of the number of stages, each node can still download data through 3G at the last second to meet its demand for the shared data.

Figure 3

Basic idea of our approach.

From the perspective of the seed downloading process, we can classify download strategies into three categories: (1) only one user is randomly selected as the seed for each data item and then shares it through VANETs, and 3G users who do not obtain the data via the vehicular communication channel download the data at the last second before the time constraint T; (2) 3G users make download decisions at the beginning, and a set of users are selected as seeds; (3) 3G users make download decisions over time rather than only at the beginning. Obviously, all the strategies require that 3G users obtain the required data no later than the last second before the deadline.

As we can see, strategy 1 is a naive method that can be easily implemented in the integrated network. We mainly focus on the second and third downloading strategies, which can be regarded as single-stage and multistage decisions over time, respectively. In our problem, the single-stage decision is made at the beginning, when the data are injected into the network. For the multistage decision, we study the two-stage case, which makes download decisions at times 0 and $T / 2$ , as a special case.
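The stage structure can be made concrete with a short sketch that lists the decision times of an n-stage schedule. The function name is hypothetical; the single-stage and two-stage cases correspond to `n = 1` and `n = 2`.

```python
def decision_times(T, n):
    """Start times of n equal stages within a TTL of T seconds."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [i * T / n for i in range(n)]
```

For a TTL of 7200 s, a two-stage schedule makes decisions at 0 s and 3600 s, in addition to the last-second download that every strategy allows.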

5.2. Single-Stage Algorithm

In this subsection, we consider a simple scenario: the single-stage strategy. In this strategy, users make the 3G downloads decision at time zero, when new data need to be pulled via the 3G channel. Recall that all the 3G users need to obtain the fresh data through either the 3G channel or the vehicular channel within the time constraint T. To transmit as much data as possible through the vehicular network, the seeds have to be chosen carefully. We have formulated the seed selection problem (11) and solve it with an efficient greedy algorithm. The download decisions $x_{i}^{(k)}$ are made at the beginning.

Greedy Algorithm. A greedy algorithm, given in Algorithm 1, is proposed to solve the seed selection problem in (11). In each loop, the best seed for one data item is selected based on the improvement in the similarity between the current weight vector of the selected seed sets and the constraint vector. Vehicles with high centrality [2], that is, a high average probability of contacting any other node, tend to be chosen as seeds, since such a seed yields a large improvement in delivery probability over the two-hop social forwarding paths. The algorithm thus captures the equal importance of each user rather than only the average delivery probability.

Algorithm 1: Greedy seeds selection algorithm.

INPUT: $λ_{i j}$ (contact rate between any two vehicles), $W_{c}$ (constraint vector), small number δ, and T (TTL of data).

OUTPUT: 3G downloads decision $x_{i}^{(k)}$ .

    Prev_sim := Similarity(W_avg, W_c)    {similarity before any new seed is committed}
    repeat
        Delta_max := 0
        seed := null    {seed candidate}
        data := null    {data item candidate}
        for every data item p in P do
            for every 3G user u in U do
                Tentatively set u as the seed of p and compute the resulting W_avg
                Delta_tmp := |Similarity(W_avg, W_c) - Prev_sim|
                if Delta_tmp > Delta_max then
                    Delta_max := Delta_tmp
                    seed := u
                    data := p
                end if
            end for
        end for
        COMMIT: user seed downloads data item data, and W_avg is updated accordingly
        x_seed^(data) := true
        S^(data).put(seed)    {add seed to the set S^(data)}
        Prev_sim := Similarity(W_avg, W_c)
    until Similarity(W_avg, W_c) > 1 - δ
    return x_i^(k)
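The greedy loop can be sketched in a few lines for the simplified case of a single data item. Everything below is an assumption-laden illustration rather than the implementation evaluated in this paper: the failure weights are given as a matrix `fail[i][j]`, the similarity is computed in its simplified form (mean failure weight divided by ϵ), and the termination test is left as a caller-supplied predicate `stop`.

```python
from math import prod

def greedy_seeds(fail, eps, stop):
    """fail[i][j]: failure weight from user i to user j for one data item.
    eps: entry of the constraint vector. stop(sim) -> bool ends the loop."""
    n = len(fail)
    seeds = []

    def sim_of(chosen):
        # Failure entry per user: 0 for seeds, product over seeds otherwise.
        w = [0.0 if j in chosen else prod(fail[i][j] for i in chosen)
             for j in range(n)]
        # Length ratio times cosine similarity against (eps, ..., eps).
        return sum(w) / (n * eps)

    prev = sim_of(seeds)
    while not stop(prev) and len(seeds) < n:
        best, best_delta = None, -1.0
        for u in range(n):
            if u in seeds:
                continue
            # Tentatively add u and measure the change in similarity.
            delta = abs(sim_of(seeds + [u]) - prev)
            if delta > best_delta:
                best, best_delta = u, delta
        seeds.append(best)  # commit the best candidate of this round
        prev = sim_of(seeds)
    return seeds
```

A possible call is `greedy_seeds(fail, 0.2, lambda sim: sim <= 1.0)`, which stops once the average failure weight has dropped to ϵ. That predicate is an illustrative choice and is not the until condition of Algorithm 1.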

5.3. Multistage Algorithm

Although the single-stage strategy is simple to implement, it may not be cost-effective enough, since it relies only on the empirical values $λ_{i j}$ and cannot capture the actual state of the overall network. In addition, it cannot adjust its strategy based on the up-to-date situation. Thus, we further propose a more practical scheme: a multistage strategy, which enhances the single-stage algorithm by making 3G downloads decisions more than once. A constraint value is set for each stage, and the values are in descending order. For example, for a 3-stage algorithm, the constraint value is $ϵ = 0.7$ , 0.4, and 0.2 for the three stages, respectively. The multistage strategy can therefore tune its download decision based on the performance of the previous stage.

Similar to the single-stage strategy, the multistage strategy works in a centralized way. In the multistage mode, our algorithm needs to collect and update some information about each node in a timely manner, for example, its buffer condition. We mainly care about two kinds of buffer information: the currently used buffer size and the IDs of the buffered data. The former is an essential factor in modeling the transmission opportunity under bandwidth limitation, and the latter helps to recalculate the weights.

Recalculate Weight. Suppose that there are n stages in the total process and that the TTL of the shared data is T. Before each decision is made, the up-to-date information mentioned above is collected from the 3G destination users and relays, and the new decision is based on the updated weights. In the following, we take $n = 2$ as an example and show how the weight is updated at the second stage. First, $h_{r}^{(k)}$ is updated using the observed buffer size of each relay,

\begin{matrix} h_{r}^{(k)} = \frac{a}{E [B_{r} (T / 2)] + B}, \end{matrix}

(12)

where B is the observation of buffer size at time $T / 2$ . As the value of $h_{r}^{(k)}$ changes, $p_{s r d}^{(k)}$ also needs to be recalculated. Then, the weight $w_{i j}^{(k)}$ is calculated in a new way,

\begin{matrix} w_{i j}^{(k)} = \prod_{r \in R, z_{r}^{(k)} = 0} (1 - p_{i r j}^{(k)} (\frac{T}{2})) \times \prod_{r \in R, z_{r}^{(k)} = 1} (1 - p_{r j}^{(k)} (\frac{T}{2})), \end{matrix}

(13)

where $z_{r}^{(k)} \in \{0, 1\}$ indicates whether the relay r has data item $p_{k}$ in its buffer or not. The weight is recalculated based on the up-to-date buffer information, which results in a more accurate estimate of the delivery probability.
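The second-stage update in (12) and (13) can be sketched as follows. The function names and the per-relay dictionary layout are hypothetical; the one-hop term for a relay that already buffers the item is taken in the form of (5) with the remaining time $T / 2$ .

```python
import math

def updated_h(a, expected_buffer_half, observed_buffer):
    """Eq. (12): h_r = a / (E[B_r(T/2)] + B)."""
    return a / (expected_buffer_half + observed_buffer)

def recalculated_weight(relays, half_ttl):
    """Eq. (13) for one (i, j) pair.

    relays: list of dicts (hypothetical layout) with keys
      'has_item'  : z_r, True if the relay already buffers the item
      'p_two_hop' : two-hop probability over T/2, used when has_item is False
      'h', 'lam_rj': used for the one-hop term when has_item is True
    """
    w = 1.0
    for r in relays:
        if r["has_item"]:
            # One-hop delivery from relay to destination, as in eq. (5).
            p = 1.0 - math.exp(-r["h"] * r["lam_rj"] * half_ttl)
        else:
            p = r["p_two_hop"]
        w *= 1.0 - p
    return w
```

A relay that already holds the item contributes a one-hop term, which is why the recalculated failure weight can be noticeably smaller than the estimate made at time 0.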

We utilize a new greedy algorithm, illustrated in Algorithm 2, in which Algorithm 1 uses the new weight $w_{i j}^{(k)}$ in (13) and produces the new seed set $S_{new}$ (3G downloads decision $new\_x_{i}^{(k)}$ ). It is an iterative process: the set $S$ calculated at the previous stage is the input of Algorithm 1, and the new seed set $S_{new}$ is calculated for the current stage. After that, the freshly selected seeds download the data through the 3G channel and continue to share them through the vehicular channel. Notice that Algorithm 2 describes only the update process of one stage out of the total n stages. In the next stage, this process is repeated with updated input values.

Algorithm 2: Multistage greedy seeds selection algorithm.

INPUT: $λ_{i j}$ (contact rate between any two vehicles), $W_{c}$ (specific constraint vector for stage m), n (total number of stages), m (number of the current stage), small number δ, T (TTL of data), and $x_{i}^{(k)}$ (3G downloads decision of the previous stage).

OUTPUT: 3G downloads decision $new\_x_{i}^{(k)}$ .

    T := T - mT/n    {calculate the remaining time of the TTL}
    Collect the current situation
    new_x_i^(k) := Algorithm 1 (T, x_i^(k))    {run Algorithm 1 with T and x_i^(k) as input to get the new 3G downloads decision}
    return new_x_i^(k)
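One stage of the multistage update can be sketched as a thin wrapper around any seed-selection routine. The sketch rests on several assumptions that the algorithm listing does not settle: `m` is read as the number of stages already elapsed (so the $T / 2$ decision of a two-stage run has `m = 1`), decisions are represented as sets of `(user, item)` pairs, and earlier downloads are merged into the returned decision.

```python
def run_stage(m, n, ttl, prev_decision, select):
    """One update step of a multistage schedule.

    m: number of stages already elapsed (assumption).
    select(remaining, prev): any seed-selection routine that returns a set
    of (user, item) pairs to download via 3G at this stage.
    """
    remaining = ttl - m * ttl / n  # remaining time of the TTL
    new_decision = select(remaining, prev_decision)
    # Earlier downloads stay valid; only newly chosen seeds use 3G now.
    return remaining, prev_decision | new_decision
```

In a two-stage run with a TTL of 7200 s, the call with `m = 1` passes 3600 s of remaining time to the selection routine.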

6. Performance Evaluation

In this section we first present the evaluation methodology and the simulation setup and then discuss the evaluation results.

6.1. Methodology and Settings

We conduct simulations driven by real vehicle traces to evaluate the performance of the proposed schemes. In addition, we compare the proposed schemes with several competing schemes.

In the simulations, we use two datasets of real vehicle mobility traces. The traces were collected from taxis in two major cities in China, Shanghai and Shenzhen. The Shanghai dataset has around 4,000 taxis, and the Shenzhen dataset has around 12,000 taxis. The trace of a taxi records its GPS coordinates every 30 seconds. From the real traces, the exponential distribution rate $λ_{i j}$ for each pair of vehicles is computed beforehand.
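A minimal sketch of how such a rate might be estimated is given below. It follows the description in the conclusion (the average interval between contacts of a pair is used to estimate the distribution parameter) and uses the standard fact that the maximum-likelihood rate of an exponential distribution is the reciprocal of the sample mean. The function names and the list-of-timestamps input are assumptions for illustration.

```python
import math


def estimate_contact_rate(contact_times):
    """Estimate lambda_ij from the sorted contact timestamps (in seconds)
    of one vehicle pair. Returns None when fewer than two contacts exist."""
    if len(contact_times) < 2:
        return None
    gaps = [b - a for a, b in zip(contact_times, contact_times[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return 1.0 / mean_gap if mean_gap > 0 else None


def contact_probability(rate, t):
    """Probability that an exponentially distributed inter-contact time
    with the given rate is at most t, i.e. 1 - exp(-rate * t)."""
    return 1.0 - math.exp(-rate * t)


# Contacts at 0 s, 600 s and 1800 s give a mean interval of 900 s.
rate = estimate_contact_rate([0, 600, 1800])
```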

We use the ONE simulator [27] to simulate the vehicular ad hoc network and the data sharing process. The real traces are fed into the ONE simulator to drive the vehicles. We do not simulate radio signal propagation with this simulator. In the simulations, two vehicles can communicate when their distance is smaller than the communication range. We simplify the data transmission over a wireless link by assuming that one data message can be transmitted when two vehicles encounter each other. On the link layer, we adopt CSMA as the medium access control (MAC) protocol. The communication range is set to 200 m.
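The distance-based contact rule can be illustrated with a short sketch. It assumes positions are (latitude, longitude) pairs in degrees, as in GPS traces, and uses the haversine great-circle distance with a mean Earth radius of 6,371 km. This is not how the ONE simulator computes contacts internally; the names below are chosen for the example only.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (assumed constant)
COMM_RANGE_M = 200.0          # communication range used in the simulations


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def in_contact(pos_a, pos_b, comm_range_m=COMM_RANGE_M):
    """True when two (lat, lon) positions are closer than the range."""
    return haversine_m(*pos_a, *pos_b) < comm_range_m
```

Since positions are sampled only every 30 seconds, a check of this kind on raw samples can miss short encounters between samples; interpolating positions between samples is one possible refinement.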

The default number of data items that each user needs to download is five. The default number of 3G users in group $U$ is 50. For the Shanghai dataset, the simulated time is T = 7200 s, and for the Shenzhen dataset, it is T = 3600 s.

We compare the following three schemes.

(i) Random. One copy of each unique data item is downloaded at the beginning.

(ii) Single-Stage. The one-stage algorithm, which makes the 3G downloads decision at the beginning, with parameters $δ = 0.2$ and $ϵ = 0.2$.

(iii) Two-Stage. The algorithm in which decisions are made at the beginning and at $T / 2$, with parameters $δ = 0.2$, $ϵ = 0.4$ at stage one, and $ϵ = 0.05$ at stage two.

We use the following four performance metrics.

(i) Actual Average Delay. It is the average delay over all data items, including those delivered through the vehicular network and those downloaded via the 3G channel. Note that the delay of data downloaded at the end equals the TTL.

(ii) Delivery Data Number. It is the number of data items delivered through the vehicular channel.

(iii) Ratio of 3G Downloads. It is the ratio of the number of data items downloaded through the 3G channel to the total number of data items that need to reach the 3G users.

(iv) Overhead Ratio. It is the ratio of the number of relayed data items to the number of delivered data items during vehicle-to-vehicle communication.
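The metrics above might be computed from simulation counters roughly as follows. The function names and argument layout are assumptions for illustration; in particular, the delay helper assumes that the caller supplies the delays of all items received before the TTL and that every item fetched at the end is charged a delay equal to the TTL.

```python
def actual_average_delay(delays_before_ttl, num_fetched_at_ttl, ttl):
    """Average delay over all items; items fetched at the end count as ttl."""
    total_items = len(delays_before_ttl) + num_fetched_at_ttl
    if total_items == 0:
        raise ValueError("no data items")
    return (sum(delays_before_ttl) + num_fetched_at_ttl * ttl) / total_items


def ratio_of_3g_downloads(num_3g_downloads, num_required):
    """Items downloaded via 3G divided by all items the 3G users need."""
    return num_3g_downloads / num_required


def overhead_ratio(num_relayed, num_delivered):
    """Relayed items divided by delivered items on the vehicular channel."""
    if num_delivered == 0:
        return float("inf")  # nothing delivered: overhead is unbounded
    return num_relayed / num_delivered


# Two items arrive after 100 s and 300 s; two more are fetched at TTL = 7200 s.
avg = actual_average_delay([100, 300], 2, 7200)
```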

Each data point reported below is the average over 20 simulation runs with the same configuration.

6.2. Impact of Number of 3G Users

We increase the number of 3G users in group $U$ to compare the performance of the different algorithms. Figures 4 to 6 show the simulation results for the Shanghai dataset, and Figures 7 to 9 show the corresponding results for the Shenzhen dataset. The actual average delay under different numbers of 3G users is shown in Figure 4, which indicates that the one-stage algorithm performs best, since it makes the 3G downloads decision at the beginning. In other words, more copies of the shared data are available from the very beginning than with the other algorithms, so the delay is much shorter. The average delay of the two-stage algorithm increases until the number of 3G users reaches 50 and then decreases. The average delay for the Shenzhen dataset in Figure 7 is smoother, and the one-stage algorithm is still the best. The random algorithm is the worst in terms of average delay. Figure 5 compares the algorithms using the ratio of 3G downloads. As the number of 3G users increases, both proposed algorithms have a lower ratio of 3G downloads than the random one; the ratio of the two-stage algorithm decreases like that of the one-stage algorithm, and the two are finally almost equal. In Figure 8, the two proposed algorithms perform closely, with the one-stage algorithm being better. Figures 6 and 9 examine the utilization of the intervehicle communication channel, where the proposed algorithms perform better than the random one.

Figure 4: Average delay versus number of 3G users (Shanghai).

Figure 5: Ratio of 3G downloads versus number of 3G users (Shanghai).

Figure 6: Delivered number of items through vehicular networks versus number of 3G users (Shanghai).

Figure 7: Average delay versus number of 3G users (Shenzhen).

Figure 8: Ratio of 3G downloads versus number of 3G users (Shenzhen).

Figure 9: Delivered number of items through vehicular networks versus number of 3G users (Shenzhen).

6.3. Impact of Traffic Load

We consider the impact of traffic load on performance. As the total number of data items to be downloaded increases, each user needs to download more data, and thus the traffic load in the network increases. Figures 10 and 13 illustrate that the one-stage and multistage algorithms have lower average delay than the random one, and the one-stage algorithm is still the best. Figure 11 shows that the ratio of 3G downloads of the two-stage algorithm is only slightly lower than that of the random one. This is consistent with Figure 5, where the two-stage algorithm performs close to the random one at the default number of 3G users (50). The one-stage algorithm performs best as the traffic load increases. In Figure 14, the two proposed algorithms perform close to each other and better than the random one. In Figures 12 and 15, the one-stage and two-stage algorithms utilize more of the capacity of the vehicular channel as the traffic load increases.

Figure 10: Average delay versus total number of data items (Shanghai).

Figure 11: Ratio of 3G downloads versus total number of data items (Shanghai).

Figure 12: Delivered items through vehicular networks versus total number of data items (Shanghai).

Figure 13: Average delay versus total number of data items (Shenzhen).

Figure 14: Ratio of 3G downloads versus total number of data items (Shenzhen).

Figure 15: Delivered items through vehicular networks versus total number of data items (Shenzhen).

6.4. Communication Overhead

The results so far suggest that the one-stage algorithm performs better than the multistage one, since it has more time to utilize the capacity of vehicle-to-vehicle communication. Although vehicle-to-vehicle communication has a low monetary cost, its communication cost should be taken into consideration when evaluating an algorithm in vehicular networks. When we consider the cost of delivering data in the vehicular network, we find that the multistage algorithm is less costly. According to Figures 16 and 17, the one-stage algorithm has a larger overhead ratio in most cases, because the two-stage algorithm collects information about the transmission process in the vehicular network and makes up-to-date downloads decisions that lead to less overhead in vehicle-to-vehicle communication.

Figure 16: Overhead of delivered data under different number of 3G users.

Figure 17: Overhead of delivered data under different traffic load.

6.5. Summary

In most situations, the proposed one-stage and multistage algorithms both perform better than the random one. The one-stage algorithm achieves the lowest delay because it downloads earliest. The two-stage algorithm has a lower vehicle-to-vehicle communication cost, since it takes the network situation at each stage into consideration. Both proposed algorithms lower the cost of 3G downloads. Our simulations also show that the performance of the proposed algorithms is sensitive to the parameters δ and ϵ. In addition, we assume that all stages have equal duration, which is a limitation of our approach. Approaches that choose the stages dynamically are left for future work.

7. Conclusion

In this paper we have modeled the 3G downloading and sharing problem for vehicular 3G users, in which vehicle-to-vehicle communications are exploited. A two-hop social forwarding path in vehicular networks is analyzed as the basis of the sharing problem, and on this basis single-stage and multistage downloading approaches are proposed to solve the key problem of seed selection. The single-stage and multistage algorithms are evaluated against alternative algorithms with simulations based on real traces. The results demonstrate that the proposed single-stage and multistage algorithms make significant improvements on data delay and the ratio of 3G downloads in different scenarios and achieve the goal of minimizing the cost of 3G data downloads.

In our work, we have assumed that the inter-contact time (ICT) of a pair of vehicles is exponentially distributed. To obtain the distribution parameter, we rely on the contact records between the two vehicles and compute the average interval. Clearly, a longer history of contact records yields a better estimate of the ICT parameter.

Footnotes

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

The research is supported by China 863 Program (no. 2013AA01A601).

References

1. Balasubramanian, Mahajan, Venkataramani. Augmenting mobile 3G using WiFi. Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services (MobiSys '10), June 2010, pp. 209-222. doi:10.1145/1814433.1814456. Scopus: 2-s2.0-77955011171.

2. Gao, Zhao, Cao. Multicasting in delay tolerant networks: a social network perspective. Proceedings of the 10th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc '09), May 2009, pp. 299-308. doi:10.1145/1530748.1530790. Scopus: 2-s2.0-70450161107.

3. Gao, Cao. User-centric data dissemination in disruption tolerant networks. Proceedings of the IEEE INFOCOM, April 2011, Shanghai, China, pp. 3119-3127. doi:10.1109/INFCOM.2011.5935157. Scopus: 2-s2.0-79960866603.

4. Benslimane, Taleb, Sivaraj. Dynamic clustering-based adaptive mobile gateway management in integrated VANET 3G heterogeneous wireless networks. IEEE Journal on Selected Areas in Communications, vol. 29, no. 3, pp. 559-570, 2011. doi:10.1109/JSAC.2011.110306. Scopus: 2-s2.0-79951989493.

5. Mongiovì, Singh A. K., Yan, Zong, Psounis. Efficient multicasting for delay tolerant networks using graph indexing. Proceedings of the IEEE Conference on Computer Communications (INFOCOM '12), March 2012, Orlando, Fla, USA, pp. 1386-1394. doi:10.1109/INFCOM.2012.6195503. Scopus: 2-s2.0-84861625680.

6. Hartenstein, Laberteaux K. P. A tutorial survey on vehicular ad hoc networks. IEEE Communications Magazine, vol. 46, no. 6, pp. 164-171, 2008. doi:10.1109/MCOM.2008.4539481. Scopus: 2-s2.0-45749099297.

7. Sahu P. K., E. H.-K., Sahoo, Gerla. BAHG: back-bone-assisted hop greedy routing for VANET's city environments. IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 1, pp. 199-213, 2013. doi:10.1109/TITS.2012.2212189. Scopus: 2-s2.0-84879299317.

8. Zeng, Xiang, Vasilakos A. V. Directional routing and scheduling for green vehicular delay tolerant networks. Wireless Networks, vol. 19, no. 2, pp. 161-173, 2013. doi:10.1007/s11276-012-0457-9. Scopus: 2-s2.0-84878547961.

9. Jeong, D. H. C. TMA: trajectory-based Multi-Anycast forwarding for efficient multicast data delivery in vehicular networks. Computer Networks, vol. 57, no. 13, pp. 2549-2563, 2013. doi:10.1016/j.comnet.2013.05.002. Scopus: 2-s2.0-84880922386.

10. Luo, Ramjee, Sinha. UCAN: a unified cellular and ad-hoc network architecture. Proceedings of the 9th Annual International Conference on Mobile Computing and Networking (MobiCom '03), September 2003, pp. 353-367. doi:10.1145/938985.939021. Scopus: 2-s2.0-1542358969.

11. Leung M.-F., Chan S.-H. G. Broadcast-based peer-to-peer collaborative video streaming among mobiles. IEEE Transactions on Broadcasting, vol. 53, no. 1, pp. 350-361, 2007. doi:10.1109/TBC.2006.889093. Scopus: 2-s2.0-33847708411.

12. Kang S.-S., Mutka M. W. A mobile peer-to-peer approach for multimedia content sharing using 3G/WLAN dual mode channels. Wireless Communications and Mobile Computing, vol. 5, no. 6, pp. 633-645, 2005. doi:10.1002/wcm.332. Scopus: 2-s2.0-25444436689.

13. Iera, Militano, Romeo L. P., Scarcello. Fair cost allocation in cellular-bluetooth cooperation scenarios. IEEE Transactions on Wireless Communications, vol. 10, no. 8, pp. 2566-2576, 2011. doi:10.1109/TWC.2011.052511.100749. Scopus: 2-s2.0-84860395563.

14. Bhatia, Luo, Ramjee. ICAM: integrated cellular and ad hoc multicast. IEEE Transactions on Mobile Computing, vol. 5, no. 8, pp. 1004-1015, 2006. doi:10.1109/TMC.2006.116. Scopus: 2-s2.0-33746344974.

15. Chen, Guha R. K., Kwon T. J., Lee, Hsu Y.-Y. A survey and challenges in routing and data dissemination in vehicular ad hoc networks. Wireless Communications and Mobile Computing, vol. 11, no. 7, pp. 787-795, 2011. doi:10.1002/wcm.862. Scopus: 2-s2.0-79960552217.

16. Vahdat, Becker. Epidemic routing for partially connected ad hoc networks. CS-2000-06, Duke University, 2000.

17. Nandan, Das, Pau, Gerla, Sanadidi M. Y. Co-operative downloading in vehicular ad-hoc wireless networks. Proceedings of the 2nd Annual International Conference on Wireless On-Demand Network Systems and Services (WONS '05), January 2005, pp. 32-41. doi:10.1109/WONS.2005.7. Scopus: 2-s2.0-84890060973.

18. Daly E. M., Haahr. Social network analysis for routing in disconnected delay-tolerant MANETs. Proceedings of the 8th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc '07), September 2007, pp. 32-40. doi:10.1145/1288107.1288113. Scopus: 2-s2.0-37849017498.

19. Hui, Crowcroft, Yoneki. BUBBLE rap: social-based forwarding in delay tolerant networks. Proceedings of the 9th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc '08), May 2008, pp. 241-250. doi:10.1145/1374618.1374652. Scopus: 2-s2.0-57349133691.

20. Boldrini, Conti, Passarella. ContentPlace: social-aware data dissemination in opportunistic networks. Proceedings of the 11th International Symposium on Modeling, Analysis and Simulation of Wireless and Mobile Systems (MSWiM '08), October 2008, pp. 203-210. doi:10.1145/1454503.1454541. Scopus: 2-s2.0-63449110577.

21. Hou, Deshpande, Das S. R. Moving bits from 3G to metro-scale WiFi for vehicular network access: an integrated transport layer solution. Proceedings of the 19th IEEE International Conference on Network Protocols (ICNP '11), October 2011, Vancouver, Canada, pp. 353-362. doi:10.1109/ICNP.2011.6089074. Scopus: 2-s2.0-84055187762.

22. Niculescu, Nath. Trajectory based forwarding and its applications. Proceedings of the 9th Annual International Conference on Mobile Computing and Networking (MobiCom '03), September 2003, pp. 260-272. Scopus: 2-s2.0-1542358974.

23. Jeong, Guo, D. H. C. Trajectory-based data forwarding for light-traffic vehicular Ad Hoc networks. IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 5, pp. 743-757, 2011. doi:10.1109/TPDS.2010.103. Scopus: 2-s2.0-79953310708.

24. Zhu. Trajectory improves data delivery in vehicular networks. Proceedings of the IEEE Computer and Communications Societies (INFOCOM '11), April 2011, pp. 2183-2191. doi:10.1109/INFCOM.2011.5935031. Scopus: 2-s2.0-79960867822.

25. Jeong, Guo. TSF: trajectory-based statistical forwarding for infrastructure-to-vehicle data delivery in vehicular networks. Proceedings of the 30th IEEE International Conference on Distributed Computing Systems (ICDCS '10), June 2010, Genova, Italy, IEEE, pp. 557-566. doi:10.1109/ICDCS.2010.24. Scopus: 2-s2.0-77955861101.

26. Gunawardena, Karagiannis, Proutiere, Santos-Neto, Vojnovic. Scoop: decentralized and opportunistic multicasting of information streams. Proceedings of the 17th Annual International Conference on Mobile Computing and Networking (MobiCom '11), September 2011, pp. 169-180. doi:10.1145/2030613.2030633. Scopus: 2-s2.0-80053612689.

27. Keranen, Ott, Karkkainen. The ONE simulator for DTN protocol evaluation. Proceedings of the 2nd International Conference on Simulation Tools and Techniques, 2009.

Harnessing Vehicle-to-Vehicle Communications for 3G Downloads on the Move
