Abstract
Wireless local area network–based broadcasting is a mobile Internet Protocol television technology that transmits multimedia content to local users simultaneously. Unlike existing wireless local area network–based multimedia transmission systems, which deliver multimedia data to each user in unicast packets, a wireless local area network–based broadcasting system can deliver multimedia data to many users in a single broadcast packet. Consequently, network resource usage does not grow with the number of users. However, IEEE 802.11 provides no recovery mechanism for broadcast packet loss, which is unavoidable; therefore, the forward error correction technique is required to address broadcast packet loss. Because the broadcast packet loss rate of a wireless local area network–based broadcasting system transmitting compressed multimedia data is not proportional to the quality deterioration of the received video, it is difficult to predict the received video quality while accounting for the effect of broadcast packet loss. In this situation, allocating equal numbers of forward error correction packets to all compressed frames is not an effective recovery method, and several studies on unequal loss protection have therefore been conducted. This study proposes an effective, prediction-based unequal loss protection algorithm that can be applied to wireless local area network–based broadcasting systems. The proposed algorithm takes the novel approach of adding forward error correction packets to every transmitted frame while considering frame loss; it uses a new metric to predict video quality deterioration, and based on this metric an unequal loss protection structure was designed, implemented, and verified.
The effectiveness of the quality deterioration model and the validity of the unequal loss protection algorithm were demonstrated through experiments.
Introduction
High-speed wireless network technologies used in smartphones, tablet personal computers (PCs), and laptops have rapidly expanded into industrial applications and have enabled users to watch multimedia broadcasts anywhere and at any time. This type of broadcasting is called mobile multimedia broadcasting.
Unlike digital multimedia broadcasting (DMB), digital video broadcasting—handheld (DVB-H), and multimedia broadcast multicast service (MBMS), which require large-scale systems for mobile multimedia broadcasting, a wireless local area network (WLAN)-based broadcasting system can be set up easily.1–6 Hereafter, the WLAN-based broadcasting system is referred to as a Wi-Fi broadcasting system.
The WLAN-based broadcasting system is a broadcast communication technology suited to providing a digital broadcasting service temporarily and within a small area. The Wi-Fi broadcasting system uses Wi-Fi data transfer technology to deliver one stream of multimedia data to multiple users. The system is composed of a broadcasting server, an access point (AP), and various receiving devices. The broadcasting server performs the encoding required to create and transfer the multimedia data.7,8 The AP must then select a transmission mode for the encoded multimedia data. Wi-Fi offers both unicast and broadcast transmission modes. The unicast mode, which transfers multimedia data over Wi-Fi to users, must transmit the same multimedia data separately to each user. Where wireless resources are limited, the unicast mode is therefore constrained, because its resource usage grows with the number of users. Because the broadcast mode transmits one piece of data to multiple users simultaneously, its bandwidth requirement does not increase with the number of users.9
There are two causes of packet loss during data transmission over Wi-Fi: congestion in the medium access control (MAC) layer, and a mismatched modulation or coding scheme in the physical (PHY) layer. This article does not aim to reduce the loss itself but to improve the received video quality when frame loss occurs.
When losses occur while transmitting multimedia data over Wi-Fi, the two modes recover packet losses in different ways. In the unicast mode, a user who receives a unicast packet from the AP returns an acknowledgment (ACK) packet to the AP. When either the unicast packet sent by the AP or the ACK packet sent by the user is lost, the AP receives no ACK from the user. The AP assumes that a loss occurred if it receives no response to a unicast packet and then retransmits the packet. In unicast transmission mode, this recovery method is called loss recovery by retransmission, and it is performed at the MAC level of the Wi-Fi network.
However, a Wi-Fi AP receives no ACK packet from users in response to a broadcast packet: if a user fails to receive the broadcast packet transmitted by the AP, the loss goes undetected by the sender. In other words, the broadcast packet is more vulnerable to packet loss than the unicast packet. To address this simplex multimedia packet loss problem, the forward error correction (FEC) scheme is generally used. The FEC scheme anticipates packet losses and adds FEC data packets to the multimedia data packets to be transferred. For example, if three original data packets are scheduled for transfer, one packet can be added to the batch to compensate for packet loss. When users receive at least three of the four transmitted packets, the original data packets are recoverable; if they receive fewer than three, the original packets cannot be recovered. To recover the original data packets under a given packet loss rate, FEC must determine the sizes of the original data set and the FEC data set appropriately.
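The recovery idea above can be sketched with the simplest possible erasure code, a single XOR parity packet, used here purely as an illustrative stand-in for a general FEC(n, k) code:

```python
# Sketch of the 3-source + 1-repair FEC example above: the repair packet
# is the byte-wise XOR of the source packets (a single-parity code; a
# real system would use a general erasure code such as Reed-Solomon).

def xor_parity(packets):
    """Build one repair packet as the byte-wise XOR of the source packets."""
    repair = bytearray(len(packets[0]))
    for p in packets:
        for i, b in enumerate(p):
            repair[i] ^= b
    return bytes(repair)

def recover(received):
    """Recover the single missing packet by XOR-ing all survivors together."""
    missing = bytearray(len(next(iter(received.values()))))
    for p in received.values():
        for i, b in enumerate(p):
            missing[i] ^= b
    return bytes(missing)

source = [b"pkt-A", b"pkt-B", b"pkt-C"]   # 3 original data packets
parity = xor_parity(source)               # 1 added FEC packet -> 4 sent

# Suppose packet 1 ("pkt-B") is lost: any 3 of the 4 packets suffice.
received = {0: source[0], 2: source[2], 3: parity}
print(recover(received))                  # b'pkt-B'
```

With a single parity packet, any one loss out of the four packets is recoverable; losing two or more packets exceeds the code's capability, matching the "fewer than three received" failure case described above.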
Multimedia data transferred through a Wi-Fi broadcasting system is encoded before transmission to ensure efficient transfer. A main goal of video encoding is to eliminate temporal redundancy; therefore, encoded video contains both standalone frames and frames that reference other frames. The location where packet loss occurs in a Wi-Fi broadcasting system cannot be predicted. When packet losses are recovered by an FEC scheme at the same recovery rate for all video frames, the scheme is called equal loss protection (ELP). However, considering the principles of encoding and decoding, recovering packet losses equally is not efficient in terms of received video quality, because the decoded quality differs depending on where the loss occurs, even at the same packet loss rate. Therefore, the unequal loss protection (ULP) scheme has been proposed. Among encoded video frames, we can distinguish frames whose loss affects every subsequent frame during decoding from frames whose loss affects no other frame. According to the basic principle of ULP, a higher FEC rate is allocated to the former.
Under circumstances in which network resources are limited and the location of packet loss is unknown, allocating FEC packets only to the most important frames is not optimal in terms of the average received video quality. In this study, to differentiate the FEC rate allocated to each frame according to its importance, the expected received video quality under frame loss is used as a parameter. First, it must be possible to predict the received video quality while considering the effect of frame loss. Given the packet loss rate and the number of packets that make up a frame, the frame loss rate can be calculated with a binomial distribution, and this frame loss rate can be used to predict the received video quality. The prediction varies with the number of allocated FEC packets.
This study proposes a framework that adaptively predicts the distortion of received video quality, together with a quality prediction model for packet losses, in order to transfer multimedia video data effectively in a Wi-Fi broadcasting system. The proposed ULP system reflects the scene transition characteristics of the encoded video. By the principles of video encoding, static video such as news and dynamic video such as sports produce encoded streams with different characteristics. Previous studies have used different weights depending on the frame type, frame size, and frame position within the group of pictures (GOP); this article instead reflects the scene transition characteristics of the video. Based on the proposed quality prediction model, a ULP system is designed and implemented, and the validity of the model and the effectiveness of the proposed algorithm are demonstrated.
The remainder of this article is organized as follows. Section “Related works” describes related work, including the packet transmission and reception characteristics of a Wi-Fi broadcasting system and packet recovery techniques; it then describes the basic principles of video compression and the structure of video data, followed by existing studies on video quality distortion models and ULP. Section “Quality prediction model–based ULP system” presents the key parameters for system design, the video quality criteria, the quality prediction model, and the design of the ULP system based on that model. Section “Performance evaluation” discusses the implementation of the proposed algorithm, the experimental environment, and the evaluation procedure, which validates the quality prediction model and the quality prediction model–based ULP system and compares them to existing studies. Finally, our conclusions are presented in section “Conclusion.”
Related works
In a Wi-Fi broadcasting system that does not perform MAC-level retransmissions, packet losses are unavoidable. Moreover, the packet loss rate increases as the distance between an AP and a receiver increases, and as the data transmission rate over a fixed distance increases.9
The Wi-Fi broadcasting system uses packet-level FEC schemes to recover lost packets. An FEC(n, k) scheme transmits n packets in total, the k source packets plus (n − k) additional repair packets, in order to deliver the k source packets. Frame loss is defined with respect to a frame transmitted as k packets: for n transmitted packets, if a user receives at least k packets, no frame loss occurs; if the user receives fewer than k packets, the frame is lost. The frame loss probability is calculated with a binomial distribution as shown in equation (1), and Figure 1 presents a frame recovery concept map using packet-level FEC.9
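Under the common assumption of independent packet losses at rate p, the frame loss probability described above is the probability of receiving fewer than k of the n packets. A minimal sketch (the exact notation of equation (1) may differ):

```python
# Frame loss probability for packet-level FEC(n, k) under an independent
# packet loss rate p: the frame is lost when fewer than k of the n
# transmitted packets arrive.
from math import comb

def frame_loss_prob(n, k, p):
    """P(frame loss) = P(received < k) = sum_{i=0}^{k-1} C(n,i)(1-p)^i p^(n-i)."""
    return sum(comb(n, i) * (1 - p) ** i * p ** (n - i) for i in range(k))

# Example: one frame in k = 3 packets, one repair packet (n = 4), 10% loss.
print(frame_loss_prob(4, 3, 0.1))   # ~0.0523
print(frame_loss_prob(3, 3, 0.1))   # ~0.271 with no FEC (n = k = 3)
```

Adding a single repair packet in this example cuts the frame loss probability roughly fivefold, which is why the FEC rate per frame is the key control knob of the ULP scheme.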

Figure 1. Concept map of broadcast packet loss recovery using the FEC scheme in a Wi-Fi broadcasting system.
The video compression technology used here is H.264, which eliminates temporal and spatial redundancy. H.264 was designed for network transmission from the outset. The encoded video is transmitted in GOP units to limit the quality deterioration caused by transmission losses. One GOP is composed of several frames, the composition of which depends on the video profile; for example, the baseline profile consists of one I frame and multiple P frames. Each frame is divided into packets (the unit of Wi-Fi transmission) before being transferred, and if one of these packets is lost, the entire frame is lost. Depending on the number of packets that make up a frame, a different frame loss rate is obtained from the packet loss rate. Figure 2 shows the video encoding process (the division of frames into packets and their transmission in GOP units).

Figure 2. Data structure map for data transmission of encoded video.
GOP(j) denotes the jth GOP. One GOP contains M frames, composed of one I frame and M − 1 P frames. The I frame of the jth GOP is denoted I(j), and the mth P frame of the jth GOP is denoted
Research on video quality distortion modeling is divided into work that optimizes rate–distortion when encoding the source video and work that predicts the received video quality when losses occur after the encoded video is transmitted.
The purpose of rate–distortion optimization (RDO) is to select the macroblock mode with the minimum distortion cost as the optimal mode for encoding the video, achieving the minimum bit rate without losses and the maximum quality. One factor behind the improved compression rate of H.264 is the RDO mode, which selects the macroblock mode with the minimum distortion cost by calculating the bit rate and degree of distortion, rather than by the motion cost calculated during motion estimation.10 This mode has the advantage of reducing the bit rate but the disadvantage of increasing coding time.11,12 Recent studies in the RDO area have focused on reducing the bit rate, minimizing distortion, and reducing the computational load, while studies on the discrete cosine transform (DCT) have tried to overcome these problems by predicting the bit rate and degree of distortion.13
When encoded video is transmitted, transmission losses occur, and the quality distortion caused by these losses is called channel distortion. Research that predicts channel distortion from the packet loss rate has focused on estimating the amount of channel distortion under various coding technologies.
Existing studies have predicted video quality using pixels, macroblocks, frames, and GOPs. The loss unit is the unit of transmission, and the prediction unit is the unit at which video quality is predicted from those losses. Loss units include the group of blocks (GOB), frame, macroblock, pixel, and slice; prediction units include pixels, macroblocks, frames, and GOPs.13–16
Zhang et al. proposed recursive optimal per-pixel estimation (ROPE) to predict each pixel of the decoded video, using the predicted pixels to estimate overall frame distortion via the mean squared error (MSE). Pixel-based prediction research has since expanded to modeling the mutual relations between pairs of pixels.13,14,17
Ekmekci and Sikora predicted the effect of frame losses at the macroblock level: the frame loss caused by packet loss was evaluated using a method that calculates distortion per macroblock.15
He et al. used the intra-refresh rate and DCT coefficient characteristics to predict, at the frame level, the degree of video quality deterioration when a macroblock or packet fails to be delivered. Frame distortion is based on MSE, and they modeled the video quality for the cases in which the I and P frames are transferred successfully and unsuccessfully.18
Wang et al. proposed a method that predicts frame loss from the slice losses inside a frame. Their method resembles ours in that it predicts loss at the frame level. Using a baseline profile, they predicted the distortion depending on whether the I and P frames of a video were lost. Then, to determine the quality distortion resulting from channel distortion, they generalized the model by adding the distortion of the current frame to the distortion propagated from transmission loss.16
In existing research, distortion is typically calculated by distinguishing the distortion of the current frame from the distortion propagated from previous frames and summing the two amounts to predict the video quality distortion caused by packet loss. For precise prediction, the model is built around the specific encoding technique, with parameters that generally remain constant. Such methods make it possible to approximate the quality of an entire video in one calculation, but they struggle to reflect the motion characteristics of the video. To solve this problem, we propose a method that adaptively calculates the average video quality in every GOP.19
Research on how to transmit high-capacity video effectively under packet loss and delay has been performed in either the application layer or the transport layer.
In the application layer, research has focused on multimedia transmission technologies that decrease the size of the transmitted data or minimize the impact of packet loss. Representative technologies that minimize video quality deterioration under packet loss include error concealment and recovery technology,20 multi-layered compression technology,21 and scalable compression technology.22,23
Error concealment and recovery technologies are classified into those that use inter-frame information and those that use intra-frame information. The inter-frame method uses the motion information of adjacent frames to conceal the error. It is effective for recovering lost blocks because it can exploit both the spatial surroundings of the lost block and information close in time. However, it is not applicable to video without motion or to video compressed using only intra-frames, and it is difficult to apply to video with fast, complex motion.
The scalable compression technique is a video encoding technique that serves different network environments and different kinds of devices with a single bit stream. The scalable video compression algorithm is a next-generation video compression technique that can serve televisions, PCs, and cellular phones from one source. It provides spatial scalability over display resolutions (QCIF, CIF, 4CIF, etc.), temporal scalability over frame rates (7.5, 15, and 30 Hz), and quality scalability (bit rate) to meet user demand.
Alongside these scalabilities, the H.264 video compression standard24 minimizes the amount of data and achieves a higher compression rate than previous standards. However, this higher compression rate relies on high spatial and temporal correlation between video frames, which makes it more vulnerable to packet losses than the previous standards.
Representative packet loss recovery techniques at the transport layer are automatic repeat request (ARQ) and FEC. ARQ is not appropriate for real-time multimedia services because retransmitting lost packets increases the transmission volume, which causes additional delay and further congestion. FEC, by contrast, anticipates losses and transmits additional packets so that receivers can recover the losses themselves; it therefore requires neither retransmission nor feedback and recovers losses with less delay.25
Under circumstances where network resources are limited, using FEC to recover packet losses at the same recovery rate for all encoded frames is not effective in terms of received video quality. Therefore, ULP has been proposed.26
Existing ULP studies have classified the data units used to assign priority, according to the characteristics of the network and the multimedia stream, into packets, macroblocks, and frames, and have then used macroblocks and frames to decide priority with respect to video quality.27,28
Cavusoglu et al.27 exploit the fact that each MPEG-2 picture contributes differently to video quality in order to transmit MPEG-2 video effectively. Under packet loss, each picture carries a different quality weight, and these weights are calculated in advance to predict the video quality at a given packet loss rate. Using this prediction, different numbers of FEC packets are assigned to the I, P, and B pictures. The approach is limited to a specific encoding technique (MPEG-2), and even for MPEG-2 video, the received quality can change with the scene change characteristics of the video, so the precomputed weights cannot be applied reliably to packet loss recovery.
Marx and Farah29 propose a method for transmitting H.263 video streams over 3G networks. Using the baseline profile, the I and P frames of the GOP are classified by their temporal position and assigned FEC packets accordingly. Within the GOP, priorities of high, normal, and low are assigned, and different numbers of FEC packets are allocated by location. Because the frames are rearranged according to their location within the GOP before transmission, packet loss can be recovered but the GOP must be reorganized at the receiver. The method is therefore not appropriate for real-time multimedia, and it also requires a separate frame classification and reassembly system.
Chang et al.28 suggest a technique similar to that of Marx and Farah.29 The macroblocks of a frame are classified by priority as high, medium, and low and are then assigned different FEC rates for transmission. This method preserves real-time operation by performing FEC encoding before channel encoding. However, to minimize quality distortion, it assigns a different FEC rate by frame location, as in Marx and Farah.29 The transmission system therefore becomes complex, and FEC packets accumulate at the front of the GOP.
Hartanto and Sirisena present a fixed FEC rate that is proportional to the size of intra- and inter-frames. This technique does not consider quality distortion resulting from loss propagation in the GOP frame, and instead only considers the size of the frame. Therefore, it is not an effective algorithm. 30
Diaz et al. suggest the decision frame set (DFS) as a frame transmission unit, together with an algorithm that applies a different FEC rate per frame, in order to transmit frames effectively while preserving the real-time nature of MPEG-2 transport stream (TS) packets. The amount of FEC is decided by a quality distortion model for the I frame DFS (I-DFS) and the P and B frame DFS (PB-DFS); because the model fixes the weights for frame type, size, and location as constants, it does not reflect the scene change characteristics of different videos.31
The research of Diaz et al. is the closest to this study. Diaz et al. fixed the weights according to frame type, frame size, and frame position in the GOP as constants. By the principles of video encoding, however, static video such as news and dynamic video such as sports produce encoded streams with different characteristics; for example, the relative sizes of the I and P frames in a GOP differ. Diaz et al. do not reflect these characteristics, whereas this article reflects the scene transition characteristics of the video.
Recently, Jose and Sameer32 proposed new Luby transform (LT) codes for application layer forward error correction (AL-FEC) to transmit large-capacity multimedia data in next-generation cellular wireless networks and demonstrated improved video quality in simulation. Haghifam et al.33 classified packets into two importance classes for the burst losses occurring in real-time interactive applications such as audio and video streaming and allocated additional FEC packets to the more important class. Lv et al.34 proposed a ULP scheme that assigns more FEC packets to the field of view (FOV) than to non-FOV regions, which have less impact on video quality, in order to transmit large-capacity virtual reality (VR) video efficiently.
ULP-related studies have sought to improve the received video quality under packet loss for particular encoding technologies and network environments, covering topics from the loss-sensitive factors of an encoding technology to the prediction of received video quality. Research on predicting received video quality under packet loss has been generalized for specific techniques used in the encoding process. However, the received video quality under packet loss depends on both the encoding technology and the motion characteristics of the video in question, and ULP-related research has not yet considered video motion characteristics.
Therefore, this study predicts per-GOP video quality under packet loss, proposes a quality prediction model–based ULP system for Wi-Fi that calculates the predicted received quality, and finds the FEC rate that yields superior video quality when different FEC rates are applied.
Quality prediction model–based ULP system
Video quality metric of the proposed system
The peak signal-to-noise ratio (PSNR) is a measure of the quality difference between two frames. PSNR expresses the pixel-wise difference between corresponding locations of the compared frames using MSE. Calculating PSNR requires the signal power, which differs for every video; because the true signal power is unavailable, the square of the maximum 8-bit pixel value, 255, is used instead. Equation (4) defines MSE and equation (5) defines PSNR
where
While PSNR is a quantitative video metric, the mean opinion score (MOS) is a qualitative one. MOS expresses the quality difference between two frames on a five-grade scale.
As the video quality deterioration metric for frame loss, this study uses PSNR to compare the quality of the encoded video with that of the received video in which loss has occurred.
Maximum quality occurs when there is no loss, and minimum quality occurs when the video is unrecognizable due to loss.
In a case where no frame loss occurs, MSE is equal to zero; therefore, PSNR is not defined. However, because
In addition,
In this study, a new video quality metric, Q, is used. It ranges from 0 to 1 and is represented as follows
Q is calculated with equation (8), where PSNR is measured between the video before and after frame transmission
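A small sketch of these metrics: the MSE and PSNR formulas are standard, while the normalization of PSNR into a [0, 1] quality value is only an assumed stand-in for equation (8), with illustrative bounds.

```python
# MSE and PSNR between two 8-bit frames, plus a normalized quality value
# in [0, 1] in the spirit of Q. The clipping bounds psnr_min/psnr_max are
# illustrative assumptions, NOT the paper's equation (8).
import math

def mse(a, b):
    """Mean squared error between two equal-size 8-bit frames (flat lists)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def psnr(a, b):
    """PSNR in dB using 255^2 as the peak signal power; undefined (inf) if MSE = 0."""
    e = mse(a, b)
    return float("inf") if e == 0 else 10 * math.log10(255 ** 2 / e)

def q_value(p, psnr_min=20.0, psnr_max=50.0):
    """Clip PSNR into [psnr_min, psnr_max] and scale to [0, 1] (assumed form)."""
    p = min(max(p, psnr_min), psnr_max)
    return (p - psnr_min) / (psnr_max - psnr_min)

sent     = [100, 120, 130, 140]      # toy 4-pixel "frame"
received = [100, 122, 127, 140]      # small distortion after transmission
print(psnr(sent, received))          # ~43.0 dB
print(q_value(psnr(sent, received)))
```

The infinite-PSNR case for identical frames corresponds to the undefined-PSNR situation noted above, which is exactly why a bounded metric such as Q is convenient.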
The relation between Q, PSNR, and MOS is described in Table 1. 37
Table 1. Values of PSNR and MOS according to Q value.
PSNR: peak signal-to-noise ratio; MOS: mean opinion score.
Quality prediction model
The quality prediction model predicts the video quality degradation after transmission of encoded I and P frames in an environment with frame loss. The expression of quality degradation uses Q. At this time, Q of the I frame is defined as
In this study, we use Q to predict the quality of the receiving frame where loss is likely to occur before transmission. The predicted quality of the I frame is

Figure 3. Relation concept map between Q(j) and
As shown in Figure 2, frame loss caused by broadcast packet loss is difficult to predict. Because the received video quality degrades differently depending on whether an I frame or a P frame is lost, we distinguish and analyze the two cases.
To predict the received video quality under frame loss, Q was measured for videos with actually lost frames. To identify the quality deterioration characteristics of videos with various motion characteristics, frame loss experiments were conducted on eight videos.
Information on the eight videos is presented in Table 2. The videos have varying resolutions and frame rates. Because the experiment examined quality deterioration across various videos, the encoding parameters were fixed: the encoding rate is 512 kbps, the GOP size is 15, and the profile is the baseline profile. Because the sizes and roles of the I and P frames in a GOP differ, the effect of losing them also differs. Therefore, we examine the quality deterioration of every frame in the GOP depending on whether the I frame is lost.
Table 2. Video information and experimental parameters used to identify quality deterioration features of videos.35
FPS: frames per second; GOP: group of pictures.
In case of lossless I frame
To identify the quality deterioration characteristics of frame loss over the entire GOP for videos with various resolutions, frame rates (fps), and motion, randomly chosen frames (the 5th and 10th) were intentionally dropped, and the quality characteristic Q of the entire GOP was then measured.
The frames preceding the lost frame suffer no loss; therefore, their Q is 1. When a frame loss occurs, the lost frame is recovered by the H.264 frame loss recovery algorithm. If the recovered frame were identical to the lost frame, the quality would remain the same; in practice this is impossible, so distortion occurs. Because H.264 encodes each frame with reference to the previous frame, the frames after the loss inherit the distortion of the recovered frame even though they themselves arrive intact.
Figure 4 shows the Q dispersion of the frames before and after the loss when only the 5th P frame is lost in the GOP of each video.

Figure 4. Q dispersion of the GOP due to loss of only the 5th P frame.
In this experiment, the frames after the lost frame suffered no losses themselves but showed a Q distribution similar to that of the lost frame. The amount of quality deterioration resulting from frame loss is defined as α. For successfully received frames before the lost frame, α = 0 and hence Q = 1; for successfully received frames after the lost frame, 0 ≤ α ≤ 1 and hence Q = (1 − α). An additional experiment was conducted to identify the Q distribution of successfully received frames as a function of the number of lost frames: the 5th and 10th P frames of the GOP were dropped and the Q distribution of all frames was measured.
Figure 5 shows the Q dispersion of the frames before and after the losses when the 5th and 10th P frames are lost in each video. The Q of the frames before the 5th frame is 1 because α is 0. From the 5th to the 9th frame, Q is equal to

Figure 5. Q dispersion of the GOP due to loss of the 5th and 10th P frames.
If the predicted quality of a successfully received (m + 1)th P frame were unrelated to the quality of the previous frames, it could be represented by equation (9)
However, even when the (m + 1)th P frame is successfully received, its quality is determined by that of the previous frames, so the prediction is modified as shown in equation (10)
Equation (10) applies when the frame is successfully received; if it is lost, Q decreases by the factor (1 − α), and the predicted quality of the P frame is calculated with equation (11). Here,
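The per-frame recursion can be sketched as follows, under the simplifying assumptions that each loss scales quality multiplicatively by (1 − α), that α is a known constant per GOP, and that received frames simply inherit the quality of their reference (a hedged reading of equations (9)–(11), not their exact form):

```python
# Sketch of per-frame quality prediction within a GOP: a successfully
# received P frame inherits the quality of its reference, and each lost
# frame multiplies quality by (1 - alpha). The multiplicative form and
# the alpha value are illustrative assumptions.

def predict_gop_quality(num_frames, lost, alpha):
    """Return the predicted Q of each frame in a GOP given lost frame indices."""
    q = []
    prev_q = 1.0                      # quality before any loss is 1
    for m in range(num_frames):
        if m in lost:
            prev_q *= (1 - alpha)     # loss: quality drops by factor (1 - alpha)
        q.append(prev_q)              # received frames inherit the current quality
    return q

# GOP of 15 frames; the 5th and 10th frames lost, alpha = 0.2 (assumed).
qs = predict_gop_quality(15, {5, 10}, 0.2)
print(qs[:5])    # frames before the first loss: Q = 1.0
print(qs[5:10])  # after the first loss:  Q = (1 - alpha)
print(qs[10:])   # after the second loss: Q = (1 - alpha)**2
```

This reproduces the step pattern observed in Figures 4 and 5: Q is 1 before the loss, drops to (1 − α) between the two losses, and drops again after the second loss.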
To explain the Q distribution, the lost-frame recovery technique of the video compression algorithm must be understood. H.264 generally uses frame copying to recover lost frames. Figure 6 shows the copying concept for P frame loss recovery: when the I frame is intact and a P frame is lost, H.264 copies the most recent successfully received frame in place of the lost one.

Figure 6. Frame copying concept map for P frame loss recovery with a lossless I frame.
If no frame loss occurs, the encoded video and the received video are identical and Q is equal to 1. If the nth P frame is lost, the nth frame of the received video becomes a copy of the (n − 1)th frame. Then, Q is decreased by
Predicting the size of α when a frame is lost is not easy. Existing quality prediction models predict α with various coefficients derived from various coding technologies.
To identify the characteristics of α under frame loss, the values of α in each GOP of the Akiyo and Harbour videos were measured. Figure 7 shows the dispersion of Q for the 5th and 10th P frame losses in the Akiyo video in GOP units. The shape of the Q dispersion under frame loss within one GOP was maintained in each GOP, but the value of α differed from GOP to GOP.

Dispersion of Q for the 5th and 10th P frame losses over the whole Akiyo video.
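The per-frame quality behavior described above (Q = 1 before a loss, Q reduced by the deterioration coefficient α for the lost frame and every frame after it, until the next I frame) can be sketched as follows. This is a minimal illustration, not the paper's exact model: combining multiple losses within one GOP multiplicatively is an assumption made here, and `gop_size` and `alpha` are hypothetical parameters.

```python
def predict_gop_quality(gop_size, lost_p_frames, alpha):
    """Predict per-frame quality Q within one GOP under P frame losses.

    Frame 0 is the I frame (assumed received); frames 1..gop_size-1 are
    P frames. Q = 1 until the first loss; each loss reduces Q of that
    frame and all following frames by a factor of (1 - alpha).
    Combining several losses multiplicatively is an assumption made
    here for illustration only.
    """
    q = []
    current = 1.0
    for n in range(gop_size):
        if n in lost_p_frames:          # frame-copy concealment kicks in here
            current *= (1.0 - alpha)
        q.append(current)               # later frames inherit the drop
    return q

# Loss of only the 5th P frame: Q is 1 before it, (1 - alpha) from it onward.
print(predict_gop_quality(15, {5}, 0.2))
```

With losses at both the 5th and 10th frames, frames 5–9 carry one degradation step and frames 10 onward carry both, mirroring the dispersion shown in Figure 5.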
Figure 8 shows the dispersion of Q for the 5th and 10th frame losses in the Harbour video. Compared with the Akiyo video, the Q dispersion of successfully received P frames within a GOP looks similar, but α is significantly smaller.

Dispersion of Q for the 5th and 10th P frame losses over the whole Harbour video.
Figure 9 shows a comparison of the average P frame sizes of the Akiyo and Harbour videos.

Average data size of P frame in Akiyo and Harbour video.
In other words, when there is movement, the shape of the object does not change; it only shifts to one side, so the difference between consecutive scenes is small, and encoding consists of coding this small difference. Accordingly, the larger the data size of a P frame, the larger the predicted residual it carries, and the more its recovery after a loss affects the video quality. This relation could be used to predict the amount of distortion, but in this study Q was calculated directly for each frame loss. The relation is therefore not investigated further here; it is used only to explain the difference in the P frame sizes of the two videos.
In case of I frame loss
I frame loss affects every frame of the GOP. Therefore, when the quality of each frame is predicted per GOP, the I frame loss makes the quality of the individual frames difficult to predict. Figure 10 shows the Q dispersion of the entire GOP when the I frame of the jth GOP is lost. The loss of the jth I frame is related to the (j + 1)th I frame: if the (j + 1)th I frame is lossless while the jth I frame is lost, the entire jth GOP is concealed by the copying algorithm until the (j + 1)th I frame. Because this depends on the losses between I frames, predicting the frame quality of the entire GOP is difficult. In this study, we use the average Q value of the GOP under I frame loss and denote it β. Figure 11 shows the recovery process for I frame loss.

Q dispersion in GOP due to I frame loss.

Frame copying concept map for I frame loss in the whole video.
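The role that β plays in an overall GOP-level prediction can be sketched as follows: with some probability the I frame is lost and the GOP degrades to the average quality β; otherwise quality is governed by the P frame term. The linear combination below is an illustrative assumption standing in for the paper's exact formula, and all arguments are hypothetical values.

```python
def predicted_gop_quality(p_i_loss, beta, q_p_given_i_ok):
    """Overall predicted GOP quality conditioned on I frame loss.

    p_i_loss       : probability that the I frame is lost
    beta           : average GOP quality when the I frame is lost
    q_p_given_i_ok : predicted quality from the P frame model when
                     the I frame arrives intact
    The linear combination is an assumption for illustration.
    """
    return p_i_loss * beta + (1.0 - p_i_loss) * q_p_given_i_ok

# Hypothetical numbers: 5% I frame loss, beta = 0.4, P-frame term 0.95.
print(predicted_gop_quality(0.05, 0.4, 0.95))
```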
As shown in Figure 10, the Q dispersion of the frames of each GOP under I frame loss is difficult to predict because Q varies with the position of the copied I frame. In this study, the predicted quality for the distortion of the middle frame between the I and P frames is defined as
Based on the overall result of whether or not the I frame is lost, equation (15) can be expressed. Here,
Quality prediction model–based ULP system
Video quality prediction model–based ULP finds the FEC packet allocation case in which
The encoded video monitoring module monitors the numbers of packets of the I and P frames. Using the packet loss rate and the encoded video information, the I and P frame loss rates for each FEC packet allocation case can be calculated. Using the I and P frame loss rates calculated for each FEC packet allocation case and

System structure map of quality prediction model–based ULP.
There is only one I frame in a GOP; therefore, only one FEC packet allocation method exists for it. For the P frames of the GOP, however, FEC packets can be allocated either to each individual P frame or to the P frames as a whole. The former method is not effective when the location of a lost packet cannot be identified. Hence, we allocate FEC packets to the P frames as a whole and then calculate the loss rate of each P frame.
As explained earlier, the I frame loss rate is calculated using a binomial distribution over the number of I frame packets,
Using equation (17), the frame loss rate of each P frame can be calculated as shown in equation (18)
Figure 13 shows the P frame loss rate calculation concept map according to FEC packet allocation.

Concept map of P frame loss rate calculation.
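The binomial frame loss calculation can be sketched as follows: a frame protected by FEC(n, k) is unrecoverable when more than n − k of its n packets are lost, so the frame loss rate is a binomial tail. This is a minimal sketch under the paper's random-loss assumption; the example numbers are hypothetical.

```python
from math import comb

def frame_loss_rate(n, k, p):
    """Loss rate of a frame protected by FEC(n, k) under random packet
    loss with rate p: the frame is unrecoverable when fewer than k of
    the n transmitted packets arrive (i.e. more than n - k losses)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(n - k + 1, n + 1))

# Example: a frame of k = 3 data packets with 2 FEC packets (n = 5)
# at a 10% packet loss rate.
print(frame_loss_rate(5, 3, 0.1))
```

With no FEC packets (n = k), the expression reduces to 1 − (1 − p)^k, the probability that any packet of the frame is lost.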
Performance evaluation
System development, experimental environment, and capability evaluation
To develop a ULP system for a Wi-Fi broadcasting system, this study identified the real network abstraction layer (NAL) unit of each frame in an H.264 stream. Compared with the standard, the start pattern of the I frame in the real NAL units was 00 00 01 B3 and that of the P frame was 00 00 01 B6. Figure 14 shows a captured packet containing a NAL unit with the real frame headers of the I and P frames. Based on this information, different FEC rates were applied to each frame type. The Wi-Fi broadcasting system creates the video data and encodes FEC packets in order to transmit them through the AP.

Head structure of the I and P frames in an NAL unit.
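Classifying frames by their start patterns, as described above, amounts to scanning the byte stream for the two headers the paper reports (00 00 01 B3 for I frames, 00 00 01 B6 for P frames). The sketch below is a naive linear scan for illustration; the sample stream is fabricated.

```python
I_FRAME_START = b"\x00\x00\x01\xb3"   # I frame start pattern reported in the paper
P_FRAME_START = b"\x00\x00\x01\xb6"   # P frame start pattern reported in the paper

def classify_frames(stream: bytes):
    """Scan a raw byte stream and record (offset, frame_type) for each
    frame start pattern, so different FEC rates can be applied per type."""
    frames = []
    for offset in range(len(stream) - 3):
        chunk = stream[offset:offset + 4]
        if chunk == I_FRAME_START:
            frames.append((offset, "I"))
        elif chunk == P_FRAME_START:
            frames.append((offset, "P"))
    return frames

# Fabricated stream: an I frame header, 8 payload bytes, then a P frame header.
sample = b"\x00\x00\x01\xb3" + b"\xaa" * 8 + b"\x00\x00\x01\xb6" + b"\xbb" * 4
print(classify_frames(sample))  # → [(0, 'I'), (12, 'P')]
```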
Packet loss during transmission is unavoidable, but users may recover lost packets using FEC decoding technology. Evaluating the system as is poses a problem caused by loss: data are transmitted through the AP, and different losses occur in different environments, whereas evaluating video quality requires a precise packet loss rate. Hence, in this study an experimental environment was constructed as follows. A broadcasting server encodes the video, creates broadcast packets, and transmits the data to the FEC & error creating server. The FEC & error creating server creates FEC packets according to the proposed method, adds them to the video, and transmits the video data to the users according to the specified loss rate. This method ensures a precise packet loss rate, and packet loss is assumed to occur randomly. To realize the FEC function, an erasure code based on the ZFEC library was used (Figure 15).36

Experimental environment concept map for evaluating performance of Wi-Fi broadcasting system.
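The FEC & error creating server's controlled-loss behavior can be sketched as follows: to guarantee the realized loss rate matches the target exactly, it drops a fixed number of uniformly chosen packets rather than dropping each packet independently. This is an illustrative sketch of that idea, not the paper's implementation; `inject_loss` and its parameters are hypothetical names.

```python
import random

def inject_loss(packets, loss_rate, seed=None):
    """Drop a precise fraction of packets uniformly at random, mimicking
    the FEC & error creating server: exactly round(loss_rate * N)
    packets are removed so the realized loss rate matches the target."""
    rng = random.Random(seed)
    n_drop = round(loss_rate * len(packets))
    dropped = set(rng.sample(range(len(packets)), n_drop))
    return [pkt for i, pkt in enumerate(packets) if i not in dropped]

pkts = list(range(100))
survivors = inject_loss(pkts, 0.10, seed=1)
print(len(survivors))  # → 90
```

Fixing the drop count (instead of sampling each packet independently) is what makes the measured quality attributable to a known loss rate.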
Figure 15 shows an experimental structure that corresponds to the composition of the Wi-Fi broadcasting system. Each component is classified into a sending part and a receiving part. Table 3 shows the video encoding parameters. For packet creation, packet transmission, packet reception, and performance evaluation, the EvalVid framework was used.37 The EvalVid framework was installed on the broadcasting server and the user devices to evaluate video quality under broadcast packet loss.
Encoding parameters for capability evaluation experiment.
GOP: group of pictures.
The scene-change features of the Akiyo and Foreman videos used in the experiment are discussed here. The Akiyo video shows a female announcer sitting and reporting the news; nothing in the scene changes except her eyes and mouth. An inspection of the I and P frame sizes over the entire video shows the same proportion of I to P frame size in every GOP. The Foreman video shows a male construction worker introducing his work. At the beginning, the background does not change while the man gestures; at the end, he disappears and only the construction site is shown, producing a significant scene change. An inspection of the I and P frame size changes over the entire video shows that they are larger than those of the Akiyo video because of the scene changes.
The Akiyo and Foreman videos were chosen for the capability evaluation to verify the validity of the quality prediction model for frame loss by comparing a video with few scene changes to one with many. Based on this experiment, the videos were then used to verify the ULP system.
In addition, two videos with different scene-change characteristics are needed to identify the quality deterioration coefficients by which FEC packets are proportionally allocated. Accordingly, we characterize the proposed algorithm with respect to scene changes. Figure 16 shows a screenshot of the Akiyo and Foreman videos and the frame sizes of each video.

(a) Screen of Akiyo and Foreman videos and (b) frame data size of each whole video.
For the performance evaluation, we compare the proposed
The simulation model supplements the results obtained from the actual experiment: it calculates the theoretical Q(j) for each FEC packet allocation by enumerating every theoretically possible loss situation for every frame of the GOP. In this study, Q(j) obtained by simulation is defined as
Here,
For example, when the GOP is equal to 5,
Then,

Concept map for
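The enumeration that the simulation performs can be sketched as follows: list every loss pattern of the GOP (each frame lost or not), weight the quality of each pattern by its probability under the per-frame loss rates, and sum. The per-pattern quality model used in the example is a fabricated placeholder; only the enumeration structure reflects the simulation described above.

```python
from itertools import product

def expected_gop_quality(frame_loss_rates, quality_of_pattern):
    """Enumerate every loss pattern of the GOP (each frame lost or not),
    weight its quality by the pattern's probability, and sum.

    frame_loss_rates[i] : loss probability of frame i
    quality_of_pattern  : maps a tuple of booleans (True = lost) to Q
    """
    n = len(frame_loss_rates)
    expected = 0.0
    for pattern in product([False, True], repeat=n):
        prob = 1.0
        for lost, rate in zip(pattern, frame_loss_rates):
            prob *= rate if lost else (1.0 - rate)
        expected += prob * quality_of_pattern(pattern)
    return expected

# Placeholder quality model: Q = 1 minus 0.2 per lost frame (floored at 0).
q_model = lambda pat: max(0.0, 1.0 - 0.2 * sum(pat))
print(expected_gop_quality([0.1] * 5, q_model))  # GOP of 5 frames
```

The cost grows as 2^GOP, which is why this exhaustive form is practical only for the small GOP sizes used in the simulation.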
Validity of the quality prediction model
To verify the quality prediction model, characteristics of

The X-axis in Figure 18 represents
To confirm the validity of the quality prediction model proposed in this study,

From Figures 18 and 19, the validity of the proposed quality prediction model can be verified.
In another example,
We have seen the validity of the proposed algorithm through the values of

Difference between
When the packet loss rate is small, the difference is small as well; as the loss rate increases, the difference becomes significantly larger, and beyond a loss rate of 40% it increases sharply. Even with this increased difference, FEC packets can still be allocated, given that the validity of the model has been established.
Validity of the quality prediction model–based ULP system
To verify the proposed quality prediction model for the loss rates of the I and P frames, the quality of the received video under frame loss was measured experimentally. In this section, we examine the quality of the received video as a function of the packet loss rate and the number of FEC data packets. Using the Akiyo video, experiments predicting the quality dispersion with the proposed algorithm were conducted for packet loss rates of 10%, 20%, and 30% and FEC packet rates of 64 and 128 kbps. Figure 21 shows

The Akiyo video is 30 fps. Because the GOP size was set to 15 and the packet size to 1024 bytes, an FEC rate of 64 kbps provides four FEC packets per GOP and can thus recover the loss of up to four packets. The four FEC packets admit five allocation patterns between the I and P frames. For those five patterns, we calculated
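Searching over the five allocation patterns can be sketched as follows: for each split of the four FEC packets between the I frame and the pooled P frames, compute the binomial frame loss rates and score the predicted quality, keeping the best split. The scoring function below, combining β for I frame loss and α per P frame loss, is an illustrative assumption, and all packet counts are hypothetical; only the enumerate-and-pick-best structure reflects the ULP described above.

```python
from math import comb

def loss_rate(k, fec, p):
    # A frame of k packets with fec FEC packets is lost when more than
    # fec of the k + fec transmitted packets are lost (binomial tail).
    n = k + fec
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(fec + 1, n + 1))

def best_allocation(i_pkts, p_pkts, total_fec, p, beta=0.5, alpha=0.1):
    """Try every split of total_fec packets between the I frame and the
    pooled P frames and return the split with the highest predicted Q.
    The quality expression below is an illustrative assumption, not the
    paper's equation."""
    best = None
    for fec_i in range(total_fec + 1):
        fec_p = total_fec - fec_i
        q = (1 - loss_rate(i_pkts, fec_i, p) * (1 - beta)) \
            * (1 - loss_rate(p_pkts, fec_p, p) * alpha)
        if best is None or q > best[1]:
            best = ((fec_i, fec_p), q)
    return best

# Hypothetical GOP: 6 I frame packets, 14 P frame packets, 4 FEC packets.
print(best_allocation(i_pkts=6, p_pkts=14, total_fec=4, p=0.1))
```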
Also, in every
Figure 22 shows

Figures 23 and 24 show


The reason why quality decreases drastically when
Figure 25 shows

From the experiment, the dispersion of
Figure 26 shows



The average of


Predicting video quality is computationally costly. In this study, the values of α and β are predicted from the frame information. We examined
When allocating α and β dynamically, the average of

Comparison of
Characteristics of Wi-Fi ULP based on the quality prediction model
To identify the characteristics of the

The size of

Comparison with existing research
In this section, existing lost packet recovery algorithms are compared with the quality prediction model–based ULP system proposed in this study. Figure 34 shows the no-recovery, ELP, and Hartanto30 algorithms at an FEC data rate of 64 kbps in the Akiyo video and the average of

Q curves of each loss recovery method in the Akiyo video, when the FEC packet rate is 64 kbps.
The no-recovery algorithm measures the quality of the received video after losses occur, without FEC packets. The ELP algorithm allocates FEC packets equally to every packet without distinguishing frames. The Hartanto algorithm assigns different FEC rates according to frame size. For the Wi-Fi broadcasting system, we compare the no-recovery, ELP, and Hartanto algorithms with
In this article, we implement the proposed algorithm on an embedded board and construct a real Wi-Fi broadcasting system to show its performance. Since the FEC of the Wi-Fi broadcasting system is implemented in the application layer, we compare it with the Hartanto, ELP, and no-recovery algorithms, which are the related methods that can be implemented at that layer.
For the ELP algorithm, the selection of k in FEC(n, k) is important; in this study, k was set to the number of packets per second of ELP. Hence, under circumstances where FEC packets are limited, the ELP and Hartanto algorithms produced similar allocations, which in turn resulted in similar received video quality. At low values of
Figure 35 shows the no-recovery, ELP, and Hartanto30 algorithms at an FEC data rate of 64 kbps in the Foreman video and the average of

Q curves of each loss recovery method in the Foreman video, when the FEC packet rate is 64 kbps.
Because the Foreman video has more I frame packets than the Akiyo video, the quality of the received video under packet loss was lower. In the Foreman video, an average
When
Finally, we compared the Q of ELP, Hartanto, and

Q comparison of ELP, Hartanto, and
Conclusion
With the development of high-performance smartphones, tablet PCs, and laptops, various high-speed network technologies can offer real-time multimedia services to users. Consequently, the demand for broadband Wi-Fi technology has been rising.
Unlike the unicast-based Wi-Fi system that transmits wireless data to individual users for multimedia services, the Wi-Fi broadcasting system uses broadcast packets to offer multimedia broadcasting services to multiple users simultaneously. However, Wi-Fi broadcast packets receive neither MAC-level retransmission nor any lost packet protection, and thus packet loss is unavoidable. Hence, an additional technology such as FEC is required to recover lost packets.
The multimedia data transmitted through the Wi-Fi broadcasting system are first encoded for efficient transmission, and the encoded frames are assigned different roles and features; as a result, frame losses distort the video quality to different degrees. Given these frame characteristics, recovering lost packets with FEC packets without distinguishing frames is not an effective way to improve video quality. Therefore, ULP has been proposed.
If the quality of the received video after a certain frame recovery could be predicted, lost packets could be recovered effectively based on the predicted video quality. Therefore, this research modeled the predicted quality of the received video under frame loss in a Wi-Fi broadcasting system. To propose a quality prediction model that accounts for frame loss, we first analyzed the quality deterioration caused by the loss of an I frame and then proposed a quality prediction model applicable to ULP. In addition, we proposed a quality prediction model for frame loss and a ULP algorithm based on frame loss probability.
To verify the performance of the proposed algorithm, the quality prediction model–based ULP system was implemented on a Linux-based embedded board. The implemented algorithm was first evaluated for validity through actual experiments and simulations. To examine the validity of the proposed quality prediction model, an experiment was conducted on the loss rates of the I and P frames and the allocation of FEC packets.
The average quality of the proposed algorithm differed across the various loss rates of the I and P frames. Through the experiment, the validity of the algorithm as a quality metric over the I and P frame loss rates was examined. The algorithm was then examined through the experimental and simulation results for FEC packet allocation. Because this study predicts the quality under frame loss with a simple calculation, two different videos were played to compare the quality obtained with a fixed quality deterioration coefficient against that obtained by calculating the coefficient adaptively.
Finally, the algorithm was compared with the ELP and Hartanto algorithms, which are existing lost packet protection techniques. For the Akiyo video, the quality of the received video with the proposed algorithm showed good results. In particular, when the packet loss rate was 25%, the proposed algorithm showed a Q improvement of 0.3 over Hartanto; this improvement corresponds to raising the subjective video quality from poor to good. For the Foreman video, Q was improved by an average of 0.12 over all packet loss rates. In particular, when
When predicting the received video quality at the GOP level, whether the I frame is transmitted successfully determines the difficulty of the prediction. This research used a simple method to predict the quality of a GOP under I frame loss. However, in the experiments, consecutive I frame losses drastically decreased the received video quality. In addition, in existing studies on ULP, researchers designed packet transmission techniques for specific multimedia data and evaluated them through simulation, which made it difficult to find a comparison model for the one proposed in this study, which was designed for a Wi-Fi-based broadcasting system.
Multimedia data are becoming higher in capacity, and Wi-Fi techniques are becoming faster. Under these circumstances, effectively transmitting high-capacity multimedia data will remain challenging. Future research in this area can be divided into network technology and multimedia data compression techniques.
Footnotes
Handling Editor: Pascal Lorenz
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2014R1A1A2009092).
