Abstract
We study the problem of transmitting scalable video coded streams in a hybrid scenario including a global cellular network and a local peer-to-peer network. Given desirable channel condition in cellular downlink, some cellular user equipments can fetch the video chunks via the cellular base station directly, and also provide relay service via the local peer-to-peer network for other user equipments unable to meet their basic video quality. To take the advantage of channel diversity gains, the method of random linear coding is adopted to generate linear combinations of the video chunks for transmission in both the cellular and peer-to-peer networks. To optimize the cellular and peer-to-peer transmission arrangement, we first formulate it as a mixed integer nonlinear programming problem, which becomes more complicated when the number of user equipments increases. Then, we convert it to a quasi-convex optimization problem by approximating the indicator function with a continuous step function, and it can be solved by a centralized approach. Furthermore, a primal-dual decomposition approach is developed and a distributed algorithm is proposed accordingly. Simulation results show that the proposed approach achieves a near-optimal solution.
Introduction
According to a recent statistics, global Internet video traffic accounted for 57% of all consumer traffic in 2012 and will even rise to 69% in 2017. 1 More than a half of watching online video activities are via various wireless devices, for example, smartphone and tablet. Supporting video streaming in wireless cellular networks becomes a challenging issue, given the large bandwidth consumption, time-varying wireless channels, and more higher video quality requirements from various wireless terminals.2–4 Recently, the scalable video coding (SVC) technology has been proposed.5,6 It is quite attractive since the video streaming under wireless networks can be adaptive according to the network conditions, user equipment (UE) constraints, and so on. 7 However, streaming video content in wireless networks is still one of the most bandwidth-hungry applications compared with other Internet applications.
In a loss-prone wireless network, cooperative transmission of nearby users is an effective way to achieve the desirable video quality.8,9 User cooperation can be implemented via low-cost local connections, for example, local peer-to-peer (P2P) connections and device-to-device (D2D) connections, which is a promising technology in cellular networks. Considering the varying wireless channel conditions, traditional cooperative transmission requires frequent feedbacks among the cooperative users, and thus it can consume much communication resources. To conquer this problem, network coding is often used in the broadcast/multicast scenarios.10–14 In our work, we also adopt one of network coding approaches, namely, random linear coding (RLC), for encoding each video chunk by the base station (BS) or server. Different from the existing work where the BS unicasts videos to users, we employ broadcast in cellular transmission, where the transmission rate does not cater to the lowest receiving rate of UEs for broadcast efficiency. Correspondingly, some UEs with much worse-downlink channel conditions, that is, such UEs’ receiving rates are lower than the cellular transmission rate, cannot receive the cellular broadcasted chunks at all. Those incapable of obtaining video chunks from the cellular network can purchase relaying service from other UEs with better cellular downlink conditions at the monetary cost. Another difference is that some existing works15–18 utilize the D2D connections in the local transmissions without considering transmission delay, for only one-hop transmission is allowed. In contrast, there is no one-hop limitation when we multicast the video streams in the local P2P network, but the transmission duration has to be addressed. As a result, both the BS and UEs can cooperate under the broadcast/multicast manner, which will be more efficient when a large number of terminals request the same video.
Although we enable all UEs to multicast their peer nodes, the achieved video rate will be affected by both the cellular network and local P2P network, which plays a significant role on video playback quality. For multi-layer SVC video streaming, different layers have different effectiveness in users’ perceived video quality, and one high layer usually relies on all its lower layers to decode. Intuitionally, the lower layers should be allocated with more transmission resources, which guarantees the higher layers can be decoded. Otherwise, the higher layers become invalid. Furthermore, only the UEs, that received sufficient chunks from both the BS and their peers in advance, are allowed to multicast. The UEs, that received sufficient RLC-encoded chunks to reconstruct one entire layer, are called “matured” nodes. After reconstructing a layer successfully, the matured UE can be re-encoded for multicasting its unmatured peers. However, unmatured to matured state transfer occurs in the midway of transmission, the precise time is hardly known, which results in an intractable constraint in the system modeling. Those difficulties are the main concerns that we need to deal with in this work. Our objective is to tradeoff the achieved video playback quality and transmission cost, and design an optimal transmission scheduling algorithm among video layers considering the cooperation of cellular and local P2P transmissions. Our main contributions are summarized as follows:
We formulate the problem of cooperative transmission in cellular network and local P2P network for multi-layer SVC streaming as a mixed integer nonlinear programming (MINLP) problem, in which the cellular transmission duration and the local P2P transmission duration are coupled.
By introducing a continuous step function, we convert the problem of cooperative transmission to a quasi-convex problem, where the UE’s unmatured to matured state transfer can be approximated by the quasi-convex step function. To decouple the transmission durations of cellular and local P2P networks, we design a pipeline work mode at the cost of postponed start-up video playback.
A near-optimal solution can be obtained by employing the dual-decomposition approach, and we also propose a distributed algorithm such that the BS and each UE can calculate its own transmission duration for each layer independently.
We conduct extensive experiments to verify our solution and evaluate the performance of our distributed approach.
The rest of this article is organized as follows. Section “Related work” summarizes the state-of-art of the video streaming in wireless networks. Section “System overview” presents the system model and formulates the optimization problem with respect to the reward and traffic cost in section “Problem formulation.” Section “Approximation of the primal optimization problem” decouples the proposed optimization problem by a quasi-convex step function and relaxing the start-up playback time, and solve it in a centralized way. In section “Distributed algorithm design,” we design a distributed algorithm for each node to calculate its own broadcast traffic duration independently. Section “Performance evaluation” evaluates the proposed approach with extensive simulations, followed by the concluding remarks in section “Conclusion.”
Related work
To improve the perceived video quality in the loss-prone wireless networks, much more attentions have been focused on transmission cooperation.8,10–12,19 Jointly considering the cellular transmission and local UEs’ transmission, Abedini et al. 10 proposed a provable minimum-cost algorithm to guarantee each user’s quality of service (QoS), which took the BS stopping time under unicast and local UEs’ broadcast scheduling into account. Bethanabhotla et al. 11 designed a dynamic scheduling policy for transmitting video streaming in wireless networks, where multiple nearby nodes can help each to fulfill their video quality requirement. They formulated it as a network utility maximization (NUM) problem, and then solved it using the Lyapunov drift plus penalty approach. Le and colleagues 8 proposed a network utility framework for maximizing video quality using cellular network and local network communication resources, in which i was formed by the D2D links. Xing and colleagues12,19 investigated the problem of improving video streaming quality while reducing the wireless service cost by utilizing multiple links simultaneously. Most of the existing approaches assume that all nearby terminals are completely cooperative and can also communicate with each other within one hop.
Since the transmission cost in wireless networks is a significant issue that greatly affects the video quality, lots of efforts have been paid on the energy cost arising from streaming video in wireless scenarios.13,20–22 Duong et al. 20 proposed an energy-aware approach for bit-level rate allocation in cellular networks with the assistance of D2D communications, in which they considered how to save communication resources for video streaming with higher transmission rates and larger sizes. Siekkinen et al. 21 investigated the tradeoff in energy waste between prefetching small and large chunks of video content, and proposed an algorithm called eSchedule that used viewing statistics to predict viewer behavior and designed an energy optimal download strategy. In order to minimize the energy consumption of mobile devices and the load on cellular networks, Almowuena et al. 13 proposed a strategy to concurrently utilize unicast and multicast for streaming video. Atawia et al. 22 addressed the problem of predictive resource allocation for energy-efficient video streaming. In their work, by offering a mechanism to control the probability constraint satisfaction, operators may control the tradeoff between energy savings and the risks associated with erroneous predictions.
In order to take advantage of the diversity of wireless channels, some network coding approaches, for example, RLC, are usually adopted while streaming video content in wireless networks.8,10,11,14 As discussed in section “Video model,” one video content consists of a sequence of a group of pictures/frames (GoPs). Accordingly, one common way is to perform RLC across all frames within one GoP, and then the resulting coded chunks are delivered to all users via the wireless networks. All the video frames contained by one GoP can be decoded at terminals once they have received a certain number of coded chunks, no matter what the arriving order is.13,15,16 Thanks to the flexibility and adaptability of RLC; the users incapable of receiving sufficient coded chunks can be made up by the neighbors.
The previous works indicated that using both the cellular and local network resources can improve video transmission efficiency substantially. However, they often used unicast in the cellular downlink and broadcast in one-hop D2D network separately. In this case, the BS has to separately consider each UE’s requirement and prepare a unique coded-chunk sequence. Consequently, there are no duplicated chunks on any UE even if they are received from multiple sources.17,23 Such an approach is not scalable with the increasing number of UEs requesting the same video content simultaneously. Different from the previous approaches, in this work, both the BS and UEs can broadcast/multicast coded chunks to the neighbors and there is no need for an exclusive video sequence preparation in advance and no one-hop constraint as well. Thus, our approach can be more scalable.
System overview
Network model
Considering a cellular network scenario, multiple UEs are in a cell with one BS, where all UEs, denoted by
Note that some video chunks broadcasted to the better-downlink UEs are also possibly lost due to the dynamic cellular network scenario. Although the BS can be used to identify the cellular downlink condition, the receiving probabilities of UEs could be different, which are independent and identically distributed (i.i.d.).
8
Correspondingly, we assume that the UEs’ receiving probabilities from the cellular network, denoted by
Through the local network interface, all the UEs can connect with each other directly or via local network access points indirectly, which forms a P2P local network. Therefore, each UE can receive video streams from not only the BS via the cellular downlink but also from peer UEs via the P2P network. In the local P2P network, the receiving probabilities, denoted by
Video model
In a multi-layer SVC system, the GoP is the basic unit for video playback, which specifies the order of intra- and inter-frames being arranged.5,24 Generally, one video stream consists of a sequence of GoPs, and each GoP has a unique time for playback. This time is called playback deadline, meaning that the referred GoP has to be delivered to the terminals before this playback deadline, and it becomes invalid otherwise. Assume that each GoP has an equal playback duration, denoted by
We can take H.264 SVC stream as an example. 24 As shown in Figure 1, it illustrates a two-layer structure with temporal and spatial scalability, which includes one base layer (lower layer) and one enhancement layer (upper layer). The temporal scalability means that the frame rate of stream can be scalable according to the network conditions or the terminal constraint. It can be found that the frame rate is higher in the upper layer than that in the lower layer. Furthermore, each frame in the upper layer owns a larger size since it enhances the video quality by increasing the spatial resolution, and thus such scalability can be suitable for UEs with larger screens.

Illustration of a two-layer structure with additional inter-layer prediction. One frame connected with two adjacent frames via dotted line means that it can be predicted by the two connected frames.
As discussed in the introduction, the RLC is adopted in our model, where each layer is partitioned into a number of equal-size chunks, and correspondingly, an RLC-encoded chunk is a linear combination of the partitioned chunks. With such a technique, the UE can recover one layer once it has received sufficient RLC-encoded chunks with respect to the specific layer. The number required to recover one layer is called decoding threshold. Given the decoding threshold
where
System process and problem description
Based on the network model and video model, the system process can be described as follows:
The RLC-encoded video chunks fetched from the video server are first broadcasted to all the UEs under the cellular network. The BS needs to optimize the transmission duration for each layer considering the playback deadline.
When one UE has received sufficient number of UEs that exceeds the decoding threshold, they can recover this specific layer correctly. In the following, such UE that can recover Layer
For an l-matured UE, it can then re-encode Layer l with RLC for multicasting its peers in the local P2P network (to avoid duplicated RLC-encoded chunks, each node including the BS and UEs will be pre-allocated with an exclusive linear coefficient field, such that each RLC-encoded chunk is unique. Thus, once the number of received encoded chunks for one layer reaches the decoding threshold, that is,
Note that there is no feedback for packet loss during both the cellular and P2P transmissions, and the BS or UEs simply generate new RLC-encoded chunks for transmission.
The addressed problem in this work is described as follows. Naturally, to maximize the overall video quality of all UEs is the main objective. Each UE’s video stream can come from two sources, one is from the cellular downlink and the other is from the peer UEs via P2P network. However, maximizing the video quality on the whole would sacrifice some UEs’ video quality for their experienced poor cellular downlink or P2P link conditions. There are several constraints that should be taken into account when we maximize the overall video quality. The first constraint is the transmission rate of one multicast source, which should be no larger than the minimum receiving rate of all its peer UEs. Considering the transmission power limitation, the multicast source rate should take the minimum value from the minimum receiving rate and its supportable maximum transmission rate. The second constraint is the minimum transmission rate for base layer delivery, which can guarantee the smooth playback of all the UEs with the basic video quality. The third constraint is the bound on the traffic volume, which means that the multicast source has a preferred traffic volume due to its transmission resources limitation. The fourth constraint is the transmission duration that each GoP should be delivered to the UEs before its playback deadline, and it becomes invalid otherwise. The last one is the budget constraint, which means that the peer UE pays an equivalent price for its receiving traffic from the multicast source, and the the total pay has to meet with the budget constraint. Considering these mentioned constraint, how to jointly optimize the BS and local P2P transmissions to maximize the video playback quality is the main objective of this work.
Problem formulation
Receiving rate and traffic volume
Since all the UEs can work cooperatively under the cellular network and the local P2P network, the total received chunks of Layer
where
where
Correspondingly, the achieved video rate at Layer l can be calculated by
which can also be achieved by aggregating all the incoming rates of Layer l.
The constraint for base layer delivery
If one UE cannot recover the base layer before its playback time, one stall event occurs and the UE has to stop the playback and wait for the chunks of base layer, even when the higher layers have been delivered. To guarantee no stall event happening in the video playback, it is necessary to guarantee that the base layer rates of all UEs are not less than the required minimum rate, which is given by
where
The constraint to be a multicast source
Note that UE
The bounds on the traffic volume
For an l-matured UE
where
Note that the upper bound of
Transmission duration constraint
Due to the fact that each GoP has its own playback deadline, that is, it has to be delivered to all the UEs before this moment, otherwise, it becomes invalid for missing its playback deadline. According to the proposed system model, each GoP may experience two periods, namely, transmission in the cellular network and the local P2P network. The transmission durations in two parts are denoted by
where
The budget constraint
According to the proposed system model, the UEs incapable of being matured from the cellular network can get help from the matured UEs at the monetary cost. The total received lth traffic of UE
and thus the total payoff of UE
Generally, each UE will have a higher budget on the lower layer, and a smaller one on the higher layer for the lower layers, playing a more significant role in video playback. Therefore, we have
Objective function
Our objective is to maximize the video quality reward of all UEs considering the traffic cost and other constraints. The primal optimization problem can be formulated as
where
Remark 1
The weight of cost function
Remark 2
In the primal optimization problem P0, it can be found that the reward function
In the primal optimization problem P0, the constraints (5), (9), and (12) are linear. In constraint (7), the value of function
Approximation of the primal optimization problem
Decouple the cellular and P2P transmission durations
In constraint (10), the tolerance-efficient
We use an example to illustrate our basic idea

Sketch of transmission pipeline for multiple GoPs.
As a start point, we use
Approximation of the indicator function
The objective function and all constraints to be differentiable is the necessary condition as gradient-based solution techniques for convex/concave programming. Because constraint (7) is discontinuous, Problem P0 cannot be solved by the gradient-based method directly. Inspired by the work of Gaudette et al.,
26
and as an extension of our previous work,
27
we introduce a continuous step function to approximate the binary function in constraint (7), The continuous step function is expressed by

Schematic diagram of the step function.
The step function
Correspondingly, constraint (7) can be rewritten as
Correspondingly, constraint (7) can be approximated by constraint (16), where the step function is quasi-convex.
The approximation of Problem P0
Since
Since the step function is quasi-convex and all the variables are continuous, the above formulation is a concave optimization problem, which can be solved by the gradient-based method.28,29 In the following section, we develop a distributed traffic allocation algorithm for cellular and P2P transmissions.
Distributed algorithm design
In order to simplify the development of the distributed algorithm, we first neglect the budget constraint (12). If the solution cannot satisfy the constraint, we can deduct the purchasing traffic from the higher layer to the lower one correspondingly. Second, the transmission duration constraints (14) and (15) can also be neglected, because the
Let
where
To attain a distributed algorithm, we can decompose Problem P1 to be
Based on the designed model, the basic video quality should be guaranteed, that is, inequality (5) holds for base layer. Other layers
Accordingly, the sub-gradient method is employed to update the Lagrangian multipliers
where
Performance evaluation
Simulation settings
In this section, numerical simulation results are provided to demonstrate the performance of the proposed approach. In our simulations, there are 10 UEs in one cellular network with one BS. Considering the dynamic local P2P network, the P2P connections are generated randomly, and each UE is served by at most two peers. The topology is shown in Figure 4, where the UEs under the dotted red line cannot be satisfied for their low receiving probabilities. The transmission rates and receiving probabilities are set as shown in Table 1. Note that in the cellular receiving probability settings, we only show the receiving probabilities of UEs 1–5, other UEs under the dotted red line in Figure 4 with receiving probabilities lower than the minimum receiving probability

Experimental topology.
Transmission/receiving rate/probability settings.
Considering load balance of the network, we use
Simulation results and analysis
Note that all the receiving probabilities of UEs are generated randomly, and each UE receives the video chunks according to the generated probability. For instance, the receiving probability, 0.95, of one UE from the BS means that this probability is randomly distributed in interval
Figure 5 shows the achieved rewards by the UEs under different receiving probabilities, where

Rewards of UEs under different receiving probabilities from the cellular network.

Received traffic of UEs from the BS under different receiving probabilities of the cellular network.
The traffic costs including UEs and BS are shown in Figure 7, where

Traffic costs of UEs and BS under different receiving probabilities from the cellular network.

Traffic costs of UEs and BS under different receiving probabilities in the P2P network.
The following simulation is conducted with different cellular transmission rates, as shown in Figure 9, where

Achieved rewards of UEs under different cellular transmission rates.
In this simulation, we use a manually configured P2P network topology with different P2P transmission rates, where the P2P connections are as follows: UE 1 → UE 6–10, UE 2 → UE 7–10, UE 3 → UE 8–10, UE 4 → UE 9–10, and UE 5 → UE 10, such that the unmatured UEs can be served by multiple peers. The simulation result is shown in Figure 10, where

Achieved rewards of UEs under different P2P transmission rates.
To verify the convergence of the proposed distributed approach, we conduct the last simulation with Probability 4 in both P2P and cellular networks. The simulation result is shown in Figure 11, where the line with legend Convergence rate 1 stands for the proposed approach, and the line with legend Convergence rate 2 stands for the approach in Feng et al. 27 It can be found that the proposed approach can converge to the optimal value obtained by the centralized approach. Since more factors are considered in our model, for example, budget limit and dynamic P2P connections, the convergence rate of the proposed approach is a little slower than that in Feng et al., 27 but it is still rational considering the more complicated network scenario.

Convergence of the proposed approach.
Conclusion
In this article, we have studied the problem of transmitting multi-layered SVC video in the cellular network. Our objective is to tradeoff the achieved video playback quality and traffic cost considering the local P2P connections formed by the UEs. We first construct an MINLP model, in which only the matured UEs can serve as multicast source in the local P2P network. Then, we convert it to a convex model by introducing a step function and decouple the transmission durations between cellular and P2P networks by postponing the UEs’ start-up playback. Thereafter, we developed a distributed algorithm to obtain its near-optimal value. Extensive simulations are performed to demonstrate the efficiency of our approach. For the future work, we will focus on investigating incentive mechanism for UEs’ cooperative transmission in wireless network.
Footnotes
Acknowledgements
The authors thank Prof. Jizhong Han’s efforts on smoothing the whole paper language, who is a full stuff in the Institute of Information Engineering, Chinese Academy of Sciences.
Academic Editor: Wei Yu
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (nos 61502118, 61402127, and 61370212), the Fundamental Research Fund for the Central Universities (no. HEUCFM160602), and the Natural Science Foundation of Heilongjiang Province in China (nos F2016028, F2016009, and F2015029).
