Abstract
Various applications based on IoT real-time multimedia are under the spotlight. To implement real-time multimedia service, motion estimation in video compression service has a high computational complexity. In this paper, an efficient motion search method based on content awareness is proposed consisting of three steps. The first step is motion classification using the center position cost distribution. The second step is calculation of a predictor based motion classification. The third step is setting the arm size of the search pattern based on adaptive use of the distance between the predictor and the center position. Experimental results show that the proposed algorithm achieves speed-up factors of up to 48.57% and 16.03%, on average, with good bitrate performance, compared with fast integer-pel and fractional-pel motion estimation for H.264/AVC (UMHexagonS), and an enhanced predictive zonal search for single and multiple frame motion estimation (EPZS) methods using JM 18.5, respectively. In addition, the proposed algorithm achieves a speed-up factor of up to 42.61%, on average, with negligible bitrate degradation, compared with the TZ search motion estimation algorithm for the multiview video coding (TZS) method on HM 10.0.
1. Introduction
The rapid development of Internet of Things (IoT) technology makes it possible for connecting various smart objects together through the Internet and providing more data interoperability methods for application purpose. Recent research shows more potential applications of IoT in information intensive industrial sectors such as healthcare services [1]. Also, with the vigorous development of the Internet of Things (IoT) technology, the wave of the Internet of Things applications followed by applications based on IoT real-time multimedia is under the spotlight [2].
Most viewers prefer digital television delivered via terrestrial, cable, satellite, or the Internet over analog service due to a greater choice of channels, electronic program guides, and high definition services. Analog TV has been switched off in many countries. Many TV programs can be provided via the Internet. However, due to the huge size of data for a video signal, in practical scenarios it is not possible to transmit raw data and video compression is required. To generalize the video data compression technique for use with different kinds of media, a global standard for video compression has been developed by the ISO/IEC motion picture experts group (MPEG) and the ITU-T video coding experts group (VCEG).
The ITU-T video coding expert group and the ISO/IEC moving picture experts group together proposed the H.264/AVC video coding standard in 2003 [3]. This standard is widely used for many applications, including broadcast of HD TV signals, video content acquisition, camcorders, and security applications. However, due to increasing demands for ultrahigh definition over and above HD video formats VCEG and MPEG established a Joint Collaborative Team on Video Coding (JCT-VC) January 2010 to implement the next-generation video compression standard, named high efficiency video coding (HEVC) [4]. The aim of this new standard is to further reduce the bitrate by 50% with the same reconstructed video quality. With these standards, Liu and Yan have introduced the multiview video technology, explained the principle and coding structure of multiview video, and discussed two kinds of video monitoring applications mode in Internet of Things, focusing on the application of the intelligent security systems in the IoT environment [5].
Both the H.264/AVC and HEVC video coding standards use both temporal and spatial redundancy to achieve compression. In the temporal domain, there is usually a high degree of correlation or similarity between video frames that were captured at nearly the same time. Temporally adjacent frames (successive frames in time order) are often highly correlated, especially if the temporal sampling rate or frame rate is high. In the spatial domain, there is usually a high degree of correlation between pixels that are close to each other as values of neighboring pixel are often similar.
In general, a video system comprises the two major modules of intraframe coding and interframe coding. The block based motion estimation (BME) algorithm plays a key role in interframe coding. In the H.264/AVC video coding standard, BME has 7 different block sizes that are used for interframe motion estimation and compensation [6]. These different block sizes actually form a two-level hierarchy inside a macroblock (MB). The first level comprises block sizes of

Different macroblock partitions.
For HEVC, sizes of coding blocks are not fixed, as are macroblocks in H.264/AVC. A flexible coding block (CB) and a novel quadtree based coding tree unit (CTU) have been proposed for HEVC. Prediction of each block is performed in the prediction unit (PU). Illustration of the CTB structure for a CTU is shown in Figure 2(a) where

(a) CTB structure that provides the lowest RD cost for the CTU; (b) the corresponding CTU partitioning for the best CTB structure [9].
Among traditional motion estimation (ME) algorithms, a full search (FS) algorithm requires much time due to evaluation of all candidate macroblocks to achieve optimal performance. However, the FS algorithm is not suitable in real-time video coding applications because high degree of computational complexity is involved.
To overcome this problem in use of the FS algorithm, variable fast block-matching algorithms (FBMA) have been proposed to increase the speed of the motion estimation process based on reduction of the number of search points using three different approaches. The first approach uses coarse-to-fine searching as a three-step search algorithm (TSS) [7], a 2D logarithmic search algorithm [8], and a four-step search algorithm (4SS) [9]. The second approach uses center-biased characteristics of the error distortion as a block based gradient descent search algorithm (BBGDS) [10] and a diamond search algorithm (DS) [11]. BBGDS is difficult to apply fast motion and simply falls to a local minimum position. DS is efficient for use in midlevel motion. However, many search points are required in slow and fast motion. The third approach uses temporal and/or spatial correlation for calculation of a predicted motion vector (MV). An adaptive rood pattern search (ARPS) and an enhanced predictive zonal search for single and multiple frame motion estimation (EPZS) are also used. These algorithms use a set pattern size or start position from a previous frame and/or a neighboring current block MV. The EPZS and ARPS approaches also preserve the peak signal-to-noise ratio (PSNR), as does FS, while reducing the time required and maintaining the bitrate.
An adaptive FBMA has also been proposed based on pattern switching among TSS, DS, and BBGDS. However, some proposed algorithms only consider a fixed size MB [12]. In [13], a clustering-based approach has been proposed in which an algorithm periodically counts the motion vectors of past blocks to accumulate progressive clustering statistics; then it uses the clusters as MV predictors for following blocks.
To reduce the computational complexity of the motion estimation module, an efficient algorithm is herein proposed for classification of video motion content using an adaptive size for a search pattern. The proposed algorithm reduces computational complexity by use of a predictor.
The rest of the paper is organized as follows: Section 2 presents related motion estimation algorithms. The proposed algorithm is described in Section 3. Experimental results are presented in Section 4, and conclusions are drawn in Section 5.
2. Related Research
Block based encoding structure is used in the H.264/AVC and HEVC video standard. For interprediction, a motion estimation technique is at the core of video compression and for different video processing applications that extract motion information from a video sequence. Typically using motion estimation, a motion vector is generated for a block (MB or CU) in the video compression standard. The motion vector indicates displacement of a block of pixels from a current location due to motion of an object or the camera. This information is used to determine the best matching block in the reference frame that minimizes the rate-distortion cost. This technique is known as the block-matching algorithm (BMA). Based on study of different motion estimation algorithms used in the H.264/AVC and HEVC standards, existing BMAs can be classified into the three categories of (1) fixed search patterns, (2) search patterns based on block correlation, and (3) search patterns based on motion classification.
2.1. Fixed Search Patterns
Most methods are based on the assumption that the ME matching error decreases monotonically as search moves toward the position of the global minimum error. The motion vector of each block is searched independently using fixed search patterns. Examples are displacement measurement and applications in interframe image coding (2-LOGS), motion compensated interframe coding for video conferencing (TSS), a novel four-step search algorithm for fast block motion estimation (4SS), a block based gradient descent search algorithm for block motion estimation in video coding (BBGDS), a hexagon-based search pattern for fast block motion estimation (HEXBS) [14], a new diamond search algorithm for fast block-matching motion estimation (DS), and fast integer-pel and fractional-pel motion estimation for H.264/AVC (UMHexagonS) [15]. These algorithms reduce the number of search points with a tradeoff between complexity reduction and image quality.
Both 4SS and TSS are efficient for fast motion video sequences because MVs in fast motion sequences are far removed from the center point of a macroblock. However, in other cases, such as medium and slow motion sequences, the MV can be trapped in a local minimum. TSS also uses a constantly allocated checking point pattern in the first step, which becomes inefficient for estimation of slow motion. A new three-step search for block motion estimation (NTSS) [16], an efficient three-step search algorithm for block motion estimation (ETSS) [17], and a simple and efficient search algorithm for block-matching motion estimation (SES) have been proposed in order to improve the performance of simple fixed search pattern algorithms [18].
2.2. Search Patterns Based on Block Correlation
Instead of using predetermined search patterns, methods for exploitation of the correlation between the current block and an adjacent block in the spatial and/or temporal domains for prediction of candidate MVs are being considered. Predicted MVs are obtained based on calculation of statistical average (such as a mean, a median, or weighted mean/median) for neighboring MVs or selection of one of the neighboring MVs according to predetermined criteria [19]. One such candidate method, named accelerator MVs, is based on differentially increased or decreased MVs after consideration of the motion vector of the collocated frame in the previous frame and also of the frame before that.
The concept behind selection of such predictor is that a block may not follow a constant velocity but may be accelerated. This kind of approach uses spatial or/and temporal correlation to calculate the predictor as the ARPS and EPZS. These types of algorithms set pattern sizes or estimate positions from a previous frame or/and a neighboring current block MVs. Both EPZS and ARPS preserve the peak signal-to-noise ratio (PSNR) as does FS, and the time required is reduced with a similar bitrate. However, these procedures create much overhead in memory resources since they use Spatio-temproal information.
2.3. Search Patterns Based on Motion Classification
Apart from fixed or variable search patterns, the motion activity of a video sequence has been used for a block-matching algorithm. Video sequences can be broadly divided into the three categories of frame slow, medium, and fast sequences based on motion activity in successive frames. Algorithms use different schemes to classify video sequences.
A proposed search patterns switching algorithm for block motion estimation (SPS) [20] has combined two approaches of motion estimation. The first approach uses a coarse-to-fine technique for reduction of the number of search points, similar to 2-DLG and TSS. This approach is efficient for fast motion video sequences because, in these sequences, search points are evenly distributed over the search window and, thus, global minima far distant from window centers can be located more efficiently. The second approach uses the center-biased characteristic of MVs. Algorithms such as N3SS, 4SS, BBGDS, and DS use center-biased search patterns to use the center-biased global minima distribution. Compared with the first approach, a substantial reduction in the number of search points can be achieved for slow motion sequences. SPS algorithms combine the advantages of these two approaches by use of different search patterns according to the motion content of a block. The performance of an adaptive algorithm depends on the accuracy of motion content classification.
In real video sequences with slow, medium, and fast motion, different motions frequently occur together. The adaptive fast block-matching algorithm using switching search patterns for sequences with a wide-range of motion (A-TDB) can efficiently remove temporal redundancy of sequences containing a wide range of motion. Based on characteristics of a predicted profit list, A-TDB can adaptively switch search patterns among TSS, DS, and BBGDS according to the motion content [21].
An adaptive motion estimation scheme for video coding (NUMHexagonS) using statistics of the MV distribution has been developed. The algorithm uses method for prediction of the MV distribution and makes full use of MV characteristics. A combined MV distribution prediction with new search patterns also makes the search position more accurate [22].
3. Proposed Work
Most real world video sequences contain a substantial amount of background and many motionless objects that have a high temporal correlation between successive frames. In order to estimate movement accurately, different pattern sizes are used based on the type of motion content.
3.1. Motion Classification Based on Video Content
There are several measurements for classifying the motion degree of video content: RD cost, sum of absolute difference (SAD), and the mean of absolute difference (MAD). Usually, analysis of MAD value is robust to some noise components for separating the motion degree. Based on this, we also employ the MAD value to classify the motion degree.
We have analyzed the distribution of the mean absolute difference (MAD) of the center position for the Akiyo, Foreman, and Football sequences, typical representatives of slow, medium, and fast motion, respectively. Mean and standard deviation values were calculated and used to model Gaussian distribution as shown in Figure 3.

Intersection of Akiyo, Foreman, and Soccer sequence distributions.
Only intermotion estimation was decomposed. Calculated MAD values are represented as
Analysis was performed for a particular QP value that varies from 20 to 36. Corresponding
Change in threshold points for different QP values.
To effectively locate the global minimum, predictor is designed to use the motion content. Predictor designs in many ME algorithms have been proposed and most use temporal and spatial information [23, 24]. While accurate predictors can be located, resource overhead and much memory are required. In the next section, calculation of the predictors to solve this problem is explained.
3.2. Calculation of Predictors
Based on a unimodal error surface assumption, block distortion monotonically decreases towards the global minimum. A predictor is calculated using three steps. First, distortion of the center point is compared to the threshold value. If the determined motion is slow, calculation of the predictor is not performed. Otherwise, distortions at the center and four adjacent center points are calculated. Among these five points, the one with the minimum distortion is recognized as the best point and not
From these
Experimental results for determination of the jumping distance.
For the Foreman and Football sequences, two minima are calculated initially for a candidate point where the candidate point is shown as the center point and in black color in Figure 4, two separate loops are generated for each minima (

Candidates with consideration of directional prediction.
From
3.3. Selection of a Search Pattern Size
A pattern size decision algorithm is introduced (Figure 5). Initially, the maximum distance from the predictor to the center is calculated, which is 3 in Figure 5. Using this maximum distance, a square region is considered with 4 points (Figure 5, marked as

An adaptive diamond search pattern and a small diamond search pattern.
SAD values of these 4 new points are calculated and centers are checked. If the minimum SAD value is the center of the diamond, then the large search pattern size decision algorithm terminates. For this case, a small diamond search region is considered again with 4 new points (shown in Figure 5 as

Overall procedure of the proposed algorithm.
3.4. Overall Procedure
The proposed algorithm can be divided into the steps shown in Figure 6:
Initial decision for predictor candidates. Estimation of predictors. Setting a search pattern size. Performing search patterns.
Initially, values
For slow motion, only a small diamond search is considered in the proposed algorithm. On the other hand, medium and fast motion based blocks require calculation of the predictor. The overall procedure is shown in Figure 6.
4. Experimental Results
The proposed algorithm was implemented using JM 18.5 [25] and HM 10.0 [26] of HEVC. Measurement of
4.1. Performance Evaluation Using H.264/AVC Standard
The experimental conditions were as follows:
Frames To Be Encoded 100. Search Range 32. QP 24, 28, and 32. Framerate 20. Entropy coding method CABAC. The format of YUV 4:2:0. The coding environment IPPP frame structure. Five reference frames per sequence. Only integer motion estimation.
Video sequences considered were News, Foreman, Soccer, Stefan, and Tractor. News is representative of slow motion. Foreman and Tractor are medium motion, and fast motion representative sequences are Soccer and Stefan. The frame size of Tractor was
Experimental results are shown in Table 3 in which the proposed algorithm is compared with the full search (FS), UMHexagon search, and EPZ search algorithms. Deviations between the proposed algorithm and the comparison algorithms are shown. The proposed algorithm achieved reductions of 97.10% for total time, on average, compared with FS and achieved, on average, 29.43% and 8.28% more total time reductions than for UMHexagonS and EPZS, respectively. Considering the motion estimation time (
The performance comparison of the proposed algorithm with the FS, UMHexagonS, and the EPZS on the JM 18.5.
Quality degradation of the proposed algorithm is also shown based on the
To show PSNR deviation from a low to a high bitrate, RD curves of the proposed and compared algorithms are plotted in Figure 7 for all sequences. All of the algorithms provided almost the same RD curve. For the Forman sequence, the proposed and UMHexagonS algorithms produced similar RD performance. The proposed algorithm achieved a better result than UMHexagonS for the Tractor sequence and almost the same performance as FS for the same sequence.

Rate-distortion (RD) curves for (a) News, (b) Foreman, (c) Soccer, and (d) Tractor sequences for slow motion, middle motion, and fast motion.
4.2. Performance Evaluation Using HEVC Standard
The proposed algorithm was tested in HM 10.0 reference software using the HEVC encoder. The test environment using the same hardware as for the H.264/AVC comparison was based on the following configuration:
QP values are 24, 28, and 32. The first 30 frames of each sequences were considered. All analyses were performed in low-delay main mode. Fast modes were set in the order of FEN: 1, ECU: 0, FDM: 1, CFM: 0, ESD: 0. Sequences were tested, including Cactus (Class B), BasketballDrive (Class B), BQMall (Class C), PartyScene (Class C), and BasketballPass (Class D).
Results are shown in Table 4 in which the proposed algorithm was compared with the FS and the TZS algorithms [27]. The proposed algorithm reduced the overall encoding time, on average, 95.57% and 14.27%, compared with FS and TZS, respectively. Considering only the motion estimation time, the proposed algorithm achieves 99% and 42.61% time reductions, compared with the FS and TZS algorithms.
The performance comparison of the proposed algorithm with the FS and the TZS algorithm on the HM 10.0.
To evaluate quality degradation,
RD curves of all sequences are shown in Figures 8, 9, and 10 for Class B, Class C, and Class D. The proposed algorithm achieved almost the same result as the FS and TZS algorithms. For the BasketballDrive sequence, the proposed method achieved a degradation of the bitrate, compared with FS, due to the complexity of the sequence.

Rate-distortion (RD) curves for (a) Cactus and (b) BasketballDrive sequences for Class B in low-delay-p, main condition.

Rate-distortion (RD) curves for (a) BQMall and (b) PartyScene sequences for Class C in low-delay-P, main condition.

Rate-distortion (RD) curves for BasketballPass sequences for Class D in low-delay-P, main condition.
In IoT environment, multimedia contents can be also generated locally or downloaded from the Internet through any WiFi, DSRC, WiMAX, 4G LTE connections available to the mobile/smart devices and enable the end-user or end-system to take appropriate actions and be aware of the environmental conditions based on rich visual information. In view of this, real-time content delivery is very important, especially based on HEVC video standard. The proposed algorithm can provide a scheme for supporting the real-time UHD video content distribution system.
5. Conclusions
A video content-based fast motion estimation algorithm is herein proposed. Based on motion classification, a predictor is used and an adaptive search pattern size is set. Using H.264/AVC, experimental results showed that the proposed algorithm achieved speed-up factors of up to 48.57% and 16.03%, on average, compared with UMHexagonS and EPZS, respectively, while maintaining good bitrate performance.
For HEVC, the proposed algorithm achieved a speed-up factor of up to 42.61%, on average, compared with TZS. Also, for medium and fast motion sequences, the proposed algorithm substantially reduced the consumed time for motion estimation. The proposed scheme will be helpful for implementation of a real-time video encoding system because of no need for large memory comparing with EPZS and UMHexagonS algorithms. It means that the proposed algorithm can give easy hardware implementation of HEVC encoder on portable (embedded) devices.
Footnotes
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
