Abstract
This paper studies the problem of 3G downloads in vehicles on the move. Although the 3G brings larger coverage and instant access to data transfer, it may also incur high cost. We observe that many applications of vehicular 3G users can actually tolerate certain data access latency. In addition, vehicle-to-vehicle communications have been practical and can be exploited for intervehicle data delivery. Based on these observations, we propose to augment vehicular 3G users by data sharing through vehicle-to-vehicle communications. We formulate an optimization problem. The objective is to minimize the cost of 3G data communications, meanwhile maximizing the success probability of downloading all 3G user data. The two-hop transmission process and the bandwidth limitation in vehicular network are both modeled in the optimization problem. To lower the cost of 3G and meet the delivery ratio and delay constraints of data, one single-stage algorithm and one multistage algorithm are proposed for selection of seed vehicles (that download the data via 3G channel). We have evaluated our algorithm with simulations with real vehicular traces and the results show that our algorithms reduce the 3G cost and achieve good performance of data downloads.
1. Introduction
It becomes more and more common to use 3G in vehicles on the move. Vehicular 3G users often need to download files from the Internet through the 3G data network. For example, the update of the digital map should be downloaded to keep the navigation system aware of the latest changes to roads. Besides, in a mobile advertisement application, latest advertisements should be downloaded to vehicles to attract potential customers. In many of such vehicular application scenarios, many of the vehicular 3G users may be interested in common files.
3G cellular networks have been widely deployed and the large coverage of 3G networks allows 3G users to download files easily from the Internet with modest latency [1]. However, downloading data though 3G may incur high cost. For example, 100 MB of data traffic costs around 2–8 US dollars in Shanghai, China. Moreover, the quality of the 3G may degrade significantly as the vehicles move at a high speed or when they are in tunnels or distant regions with poor 3G coverage.
Recently, vehicle-to-vehicle communications have been ready through dedicated short range radio communications (DSRC). As a result, vehicular ad hoc networks (VANETs) would bring a new paradigm to data communications among mobile vehicles. Vehicle-to-vehicle communications have several salient advantages over the 3G. First, data transmission via vehicle-to-vehicle communications is free and no cost would be incurred. Second, the bandwidth is high when the link between two vehicles appears. And finally, the data communication is often possible when two vehicles approach the proximity of each other, independent of their moving speeds. One potential problem with VANETs is that a nonnegligible latency may be introduced for an end-to-end data delivery.
We observe that many applications of vehicular 3G users can often tolerate certain data access delays. For example, the update of digital maps can tolerate up to tens of minutes to hours before the map update is made to the navigation system. Motivated by this observation, we propose to harness vehicle-to-vehicle communications for 3G downloads, as illustrated in Figure 1. The basic idea is as follows. Each node is equipped with two radio interfaces, that is, the 3G radio and the intervehicle short range radio. First, a small set of nodes download the file from the Internet through the 3G network, and then these nodes share the files to those nodes which also request the same file through the VANET.

Illustration of vehicular 3G users exploiting vehicle-to-vehicle communications in the urban area of Shanghai, China.
In this paper, we consider a system consisting of moving vehicle nodes. A subset of the vehicle nodes are requesting a common set M of data objects, which can be downloaded from the Internet. A node in the subset can either download the file from the Internet or get it through vehicle-to-vehicle communications from other vehicles which have already obtained the file. The objective is to minimize the total cost of file downloads via the 3G while ensuring all requesting nodes obtained the files successfully. The key challenge is to determine which nodes to use and when to download the files from the Internet through the 3G.
There have been a number of data dissemination protocols in DTNs [2, 3]. However, due to the nature of DTNs, proper delivery cannot always be guaranteed. A few approaches consider the integration of 3G and VANET. However, they assume that either the network has a good connectivity [4] or the trajectories of the nodes are known [5].
We formulate the problem as an optimization problem of selecting seed vehicles (that download files via 3G). According to the number of opportunities that a node can download data from the Internet within the TTL (time to live), we propose a single-stage algorithm and a multistage algorithm for solving the optimization problem. The main idea is to reduce the 3G cost by exploiting the vehicle-to-vehicle communications. We have evaluated our algorithms and the results show that both the single-stage and the multistage algorithms outperform the baseline algorithm.
The key technical contributions are summarized as follows.
We formulate the use of vehicle-to-vehicle communications to assist 3G downloads as an optimization problem of minimizing the 3G cost with the objective of meeting all demands of requesting nodes. We design two algorithms that capture the property of the equal importance of 3G users, bandwidth limitation of vehicle-to-vehicle communication channel, and time-dependent transmission situation. Comparisons of algorithms have been performed based on the real vehicular traces that have been collected from taxis in Shanghai and Shenzhen. Results show that our proposed algorithms have made significant improvements on reducing the 3G cost.
The remainder of this paper is organized as follows. The related work is discussed in Section 2. Empirical study is presented in Section 3. Section 4 presents the network model, the analysis of data transmissions in VANETs, and the problem formulation. We propose the single-stage and the multistage algorithms in Section 5. The performance evaluation based on real vehicular traces is presented in Section 6. Finally, we conclude our work in Section 7.
2. Related Work
In recent years there is an increasing research attention to VANETs [6]. Data sharing has widely been studied in both cellular networks and VANETs [4, 5, 7–9]. In this section, we briefly review the related work from three aspects: cellular networks, VANETs, and VANET-3G integrated networks.
2.1. Data Sharing in Cellular Networks
The cellular networks are ubiquitous for their convenience and efficiency. To reduce the cost of the telecommunication using cellular network, seminal work [10] proposes a unified cellular and ad hoc network (UCAN) architecture for enhancing throughput of cellular link with IEEE 802.11 link. Some studies [11–13] exploit secondary channels such as WiFi or Bluetooth to assist the mobile users with sharing data “pulled” via cellular network. In [12], a proxy scheduling scheme is investigated to deliver multimedia content to mobile peers. In [11], a multimedia content distribution problem is solved in a similar scenario and a fully distributed and cost-effective protocol is proposed to distribute the content to mobiles in a peer-to-peer manner. To emphasize the fairness issue, work [13] minimizes the energy assumption of file-sharing under cellular/Bluetooth networks. To augment 3G downlink rate among multicast receivers, the approach in [14] proposes a localized greedy algorithm to discover proxy that forwards the packets to the receivers through an IEEE 802.11-based ad hoc network. And all the scenarios share some similarity with our problem, in which we focus on the data pulled from the cellular network and then shared among the mobile vehicular users.
2.2. Data Sharing in DTNs and VANETs
A good survey on routing and data dissemination can be found in [15]. Neglecting the process of 3G downloads, VANETs assisted data dissemination in a sparse vehicular network is similar to the problem of data dissemination in DTN. Epidemic routing [16] proposes a basic idea for data dissemination in DTN. Reference [17] proposes a “communication efficient” swarming protocol which uses a gossip mechanism that leverages the inherent broadcast nature of the wireless medium and a data-selection strategy that takes proximity into account in decisions to exchange data. Recently, numerous forwarding strategies have been introduced in DTNs [18, 19]. Based on the social concepts “centrality” and “community,” [2, 20] exploit the property of community to select relays or forwarding path, and centrality is implemented in [2, 3]. Inspired by the aforementioned works, we also exploit the social forwarding path of vehicular network to analyze the nature of data dissimilation in the real trace based vehicular network.
2.3. Integration of 3G and VANET
In recent years, WiFi is exploited to assist and enhance 3G communication in [1, 21] in mobile vehicular scenarios. Furthermore, the approach in [4] proposes a VANET-3G integrated network architecture and investigates a cluster-based gateway selection strategy to link VANET to 3G in a good connectivity scenario that is different from our sparse scenario. In [5], Mongiovì et al. study a similar problem to our problem with the goal of minimizing the cost of remote communications (e.g., 3G). However, they assume the mobility trajectories of nodes in the network are known in advance.
Different from [4, 5], our problem is modeled as the initial data seeds selection via 3G channel and the data dissemination in vehicular networks based on the social forward path considering bandwidth limitation.
2.4. Routing in Vehicular Ad Hoc Networks
Vehicular trajectories have been exploited for routing in vehicular ad hoc networks [22] since the availability of future trajectories significantly reduces the uncertainty with vehicular mobility.
TBD [23] is a routing approach for using trajectories to forward data from vehicles to a given roadside access point (AP) in a light traffic vehicular network. Each node estimates the delivery delay to the AP based on its trajectory, which is then used as the metric for making forwarding decisions. Wu et al. [24] propose to predict the future location of a vehicle by modeling the mobility of a vehicle as a multiorder Markov chain and then estimate the encounter probability of each pair of vehicles. TSF [25] makes use of roadside units (RSUs) and trajectories to forward data from a fixed roadside unit (RSU) to a moving vehicle.
3. Empirical Study
In this section we show the fact that different vehicles have very different capabilities of broadcasting a data packet in the network through simulations based on real traces collected from taxis in two metro cities in China.
3.1. Datasets of Real Traces
We have two large-scale GPS datasets collected from taxis in Shanghai and Shenzhen, two representative cities in China. The details of two datasets are summarized in Table 1. We randomly choose 500 nodes from two datasets, respectively. With these traces data, we can extract the real driving trajectory of each taxi and then export the contacts between each pair of two vehicles. Thus, we can use the traces to simulate the mobility of vehicles.
Dataset information.
3.2. Revealing Different Broadcasting Capabilities of Vehicles
To reveal the fact that the broadcasting (sending a data packet to all other nodes) capabilities of different vehicles can be very different, we have conducted extensive simulations based on the real vehicle traces. In simulation, a vehicle (called seed vehicle) downloads a data item and broadcasts the single data packet to all the other vehicles in the network through in-vehicle communications. For both of these datasets, the number of forwarding hops is limited to two. The simple greedy forwarding strategy is adopted in simulation.
In Figures 2(a) and 2(b), the vehicle IDs (identity number) of 150 seed vehicles are sorted by their corresponding delivery ratios, and the IDs are then revised (e.g., after revising IDs, seed vehicle with ID = 1 has lowest delivery ratio and seed vehicle with ID = 150 has the largest delivery ratio). We report the delivery ratios achieved by different vehicles. In Figure 2(a), we can see that the capability of broadcasting of each vehicle ranges from 0 to 60% in the Shanghai dataset, and in Figure 2(b), it ranges from 0 to 50% in the Shenzhen dataset. Clearly, the results suggest that it is essential to select the good set of seed vehicles for achieving good performance. Based on such observation, we propose an approach to select vehicles contributing larger delivery ratios and helping reduce the cost of 3G downloads.

Delivery ratio of different seed.
4. Preliminaries and Problem Formulation
The major notations adopted in the paper are summarized in Table 2.
Major notations adopted in the paper.
4.1. Preliminaries
We consider a network of mobile nodes (e.g., vehicles), denoted by
In our model, each sharing data object in the network has to be the newest; in other words, a TTL value T is attached to each data item. A data object has to be transmitted to a demand node within T. We consider there are diverse kinds of data objects,
4.2. Problem Formulation
In this paper, we want to select a minimized set of nodes that would download data through the 3G, meeting all data demands of the nodes in the network, under the time constraint of TTL. Next, we will introduce some basic concepts for modeling the VANET-assisted data download and sharing problem.
4.2.1. Two-Hop Forwarding Path
We introduce the concept of 2-hop social forwarding path proposed in [2]. A two-hop social forwarding path from a source s to a destination d via a relay r is denoted by
The intercontact time between two nodes is exponentially distributed,
Theorem 1.
For a 2-hop social forwarding path
Due to the limited pages, the details of the proof can be found in the Appendix of [2].
As a result, the probability
4.2.2. Modeling Bandwidth Limitation
It is assumed that the capacity of transmission is limited in this paper, and the size of each data item is identical, denoted by z bytes. We assume that there are average C bytes that can be transmitted during one transmission. Namely, there are at most
Definition 2.
Suppose that there are
The number of data items in relays' buffers is different at different time t (
4.2.3. Weight of Seeds
We denote the set of nodes that are chosen to download data through 3G by a seed set, denoted by
Definition 3.
The weight
Obviously,
We define an indicator variable
Definition 4 (seed selection problem).
Given the vehicular network with nodes
where δ and ϵ are small number and
5. Design of Algorithms
In this section we present the design of the two algorithms. First, we give the basic idea of the algorithms and then describe the details of the two algorithms.
5.1. Basic Idea
We divide the process of data sharing into different stages according to the TTL value of data objects, T, and each stage has equal time period. During the start of each stage, a decision has to be made whether to download data objects through 3G or not. We assume the generated time of data objects and their corresponding TTL values are known in advance.
Figure 3 shows an example. One data item is created at 0 s, and the corresponding TTL value of the data is T seconds. There are n stages in Figure 3. T is divided into n equal periods. The vertically downward arrow at the beginning of each time period denotes that a node has a chance of downloading data through 3G. For example, there are 3 decision chances during T when

Basic idea of our approach.
In perspective of seed downloading process, we can classify download strategies into three categories: (1) only one user is randomly selected as seed for each data item and then shares it through VANETs, and 3G users who do not obtain the data via vehicular communication channel download the data at the last second before time constraint T; (2) 3G users implement download decisions at the beginning of time and a set of users are selected as seeds; (3) 3G users make downloading strategies along the time other than just at the beginning. Obviously, all the strategies request that 3G users have to download the required data before the last second of deadline.
As we can see, strategy 1 is a naive method which can be easily implemented in the integration networks. We mainly focus on the second and third downloading strategies. These two strategies can be regarded as single-stage decision and multistage decision along the time, respectively. In our problem, single-stage decision is made at the beginning when the data are injected into the networks. And for the multistage decision, we study the two-stage decision that makes download decision at times 0 and
5.2. Single-Stage Algorithm
In this subsection, we consider a simple scenario: single-stage strategy. In this strategy, users make 3G downloads decision at the zero time when there are new data requiring to be pulled via the 3G channel. Recall that all the 3G users need to obtain the fresh data through either 3G channel or vehicular channel within the time constraint T. To transmit data as much as possible through vehicular networks, seeds of data have to be chosen carefully and responsibly. We have formulated the seed selection problem (11) and solve it by using an efficient greedy algorithm. And the download decisions
Greedy Algorithm. A greedy algorithm is proposed in Algorithm 1 to solve the seed selection problem presented in (11). In each loop, a best seed for one data item is selected based on the improvement of
(1) (2) (3) (4) (5) (6) (7) Set u as the seed of p, and update (8) (9) (10) (11) (12) (13) (14) (15) (16) COMMIT user (17) (18) (19) (20) (21) return
5.3. Multistage Algorithm
Although the single-stage strategy is simple to implement, it may not be cost-effective enough, since single-stage strategy could not capture the situation of overall network just based on the empirical value
Similar to the single-stage strategy, the multistage strategy still works in a centralized way. In this multistage mode, our algorithm needs to collect and update some information about a node timely, for example, the buffer condition. We mainly care about two kinds of information of the buffer: the current used buffer size and data IDs. The former is an essential factor on modeling transmission opportunity under bandwidth limitation and the latter can help to recalculate weights.
Recalculate Weight. Suppose that there are n stages in the total process; the TTL of the sharing data is T. Before each decision is made, up-to-date information mentioned before is collected by 3G destination users and relays. New decision is based on the weights that are up-to-date. In the following formula, we take
We utilize a new greedy algorithm, illustrated in Algorithm 2, in which greedy Algorithm 1 uses new weight
n (total number of stages), m (number of current stage), small number δ, T (TTL of data). (1) (2) Collect current situation (3) {Using T and (4) Return
6. Performance Evaluation
In this section we first present the evaluation methodology and the simulation setup and then discuss the evaluation results.
6.1. Methodology and Settings
We conduct simulations driven by real vehicle traces to evaluate the performance of the proposed schemes. In addition, we compare the proposed schemes with several competing schemes.
In simulation, we used two datasets of real vehicle mobility traces. The traces were collected from taxis from two major cities in China, that is, Shanghai and Shenzhen. The Shanghai dataset has around 4,000 taxis, and the Shenzhen dataset has around 12,000 taxis. The trace of a taxi records the GPS coordinates of the taxi on the interval of every 30 seconds. With the real traces, the exponential distribution rate
We use the ONE simulator [27] to simulate the vehicular ad hoc network and the data sharing process. The real traces are fed into the ONE simulator to drive the vehicles. With this simulator, we do not simulate the radio signal propagation. In simulation, two vehicles can communicate as their distance is smaller than the communication range. We simplify the data transmission of a wireless link by assuming that one data message can be transmitted as two vehicles encounter each other. On the link layer, we adopt the CSMA as the media access control (MAC) protocol. The communication range is set to 200 m.
The default number of data items that each user needs to download is five. The number of
We compare the proposed schemes with the following three competing schemes.
(i) Random. Downloading one data item for each unique data item at the beginning.
(ii) Single-Stage. Algorithm (one-stage algorithm) making the 3G downloads decision at the beginning: for the algorithm,
(iii) Two-Stage. Algorithm in which decisions are made at the beginning and
We use the following four important performance metrics.
(i) Actual Average Delay. Actual average delay is the average value of the delivered data through vehicular network and data via 3G channel. Note that the delay of data downloaded at the end is actually the time of TTL.
(ii) Delivery Data Number. It is the number of data delivered through vehicular channel.
(iii) Ratio of 3G Downloads. It is the ratio of downloaded data through 3G channel to the sum number of data that needed to reach the 3G users.
(iv) Overhead Ratio. It is the ratio of number of relayed data over the number of delivered data during the process of vehicle-to-vehicle communication.
For all reported data points in the following, we repeated the simulation with the same configuration 20 times and the average is reported.
6.2. Impact of Number of 3G Users
We increase the number of 3G users in group

Average delay versus number of 3G users (Shanghai).

Ratio of 3G downloads versus number of 3G users (Shanghai).

Delivered number of items through vehicular networks versus number of 3G users (Shanghai).

Average delay versus number of 3G users (Shenzhen).

Ratio of 3G downloads versus number of 3G users (Shenzhen).

Delivered number of items through vehicular networks versus number of 3G users (Shenzhen).
6.3. Impact of Traffic Load
We consider the impact of traffic load on the performances. Each user needs to download more data when increasing the total number of data that need to be downloaded. Thus the traffic load in the network increases. Figures 10 and 13 illustrate that one-stage and multistage algorithms have lower average delay than the random one, and one-stage algorithm is still the best. Ratio of 3G downloads in Figure 11 indicates that two-stage algorithm is slightly less than the random one, since the default number of 3G users is 50, which indicates that two-stage algorithm performs closely to the random one in Figure 5, and one-stage algorithm performs best as the traffic load increases. In Figure 14, the two proposed algorithms perform closely to and better than the random one. In Figures 12 and 15, one-stage and two-stage algorithms utilize the capacity of vehicle channel more as the traffic load increases.

Average delay versus total number of data items (Shanghai).

Ratio of 3G downloads versus total number of data items (Shanghai).

Delivered items through vehicular networks versus total number of data items (Shanghai).

Average delay versus total number of data items (Shenzhen).

Ratio of 3G downloads versus total number of data items (Shenzhen).

Delivered items through vehicular networks versus total number of data items (Shenzhen).
6.4. Communication Overhead
It seems that one-stage algorithm performs better than the multistage one, since it has more time to utilize the capacity of vehicle-to-vehicle communication. Though vehicle-to-vehicle communication is of low cost on money, cost on communication should be taken into consideration when evaluating an algorithm in vehicular networks. When we consider the cost of delivering data in the vehicular networks, we find that multistage algorithm is less costly. According to Figures 16 and 17, one-stage algorithm has larger overhead ratio under most cases, because two-stage algorithm collects the information about the transmission process of vehicular networks and makes up-to-date downloads decisions that lead to less overhead in vehicle-to-vehicle communication.

Overhead of delivered data under different number of 3G users.

Overhead of delivered data under different traffic load.
6.5. Summary
In most situations, the proposed one-stage and multistage algorithms both perform better than the random one. One-stage algorithm achieves the lowest delay because of its property of earliest downloading. And two-stage algorithm has lower cost on vehicle-to-vehicle communication since it takes network's situation of each stage into consideration. Both proposed algorithms lower the cost of 3G downloads. From our simulation, the performance of the proposed algorithm is also sensitive to factors δ and ϵ. Meanwhile, we assume the time period of each stage is equal, which is limited in our approaches. Dynamic stage chosen approaches are left for future work.
7. Conclusion
In this paper we have modeled the 3G downloading and sharing problem for 3G vehicular users where vehicle-to-vehicle communications are exploited. A two-hop social forwarding path in vehicular networks is analyzed as the basis of the sharing problem, based on which single-stage and multistage downloading approaches are proposed to solve the key problem of seed selection. The single-stage and multistage algorithms are evaluated with real trace based simulations against other alternative algorithms. The results demonstrate that the proposed single-stage and multistage algorithms can make significant improvements on data delay and ratio of 3G downloads in different scenarios and achieve the goal of minimizing the cost of 3G data downloads.
In our work, we have assumed exponentially distributed ICT for the contact time in a pair of vehicles. To gain this information, we rely on the contact records between two vehicles and compute the average interval to estimate the distribution parameter. It is clear that as we have a longer history of contact records, we could have a better estimate for the ICT parameter.
Footnotes
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
The research is supported by China 863 Program (no. 2013AA01A601).
