Abstract
In some business applications, many kinds of camera sensors are deployed in a distributed way to capture video for tasks such as surveillance. When an illegal act occurs, a person or an organization may try to forge or replace surveillance video clips in order to destroy evidence or obtain illegal profits. Authenticating the genuineness and integrity of the source video, or tracing the source of a video information leak, is therefore a growing requirement in these small businesses. Video watermarking provides an effective technology for this problem. This paper proposes a real-time video watermarking scheme for MPEG, which first applies fast scene segmentation to the original video sequence and adaptively selects appropriate scenes for embedding. A visual model is then used to modulate the watermark strength. Watermark bits are embedded by adjusting the number of 1 bits (bit1) in the bitstream through changes to the level of run-level pairs. Experimental results show little loss of video quality and good robustness against noise, temporal desynchronization, and recompression attacks. Because the watermark is detected directly in the bitstream domain, real-time detection is feasible. In addition, the embedding strategy is designed so that the bit rate does not increase, which the experiments confirm.
1. Introduction
The rapid development of internet technologies has greatly accelerated information exchange and widened its channels. Moreover, as the saying goes, “a picture is worth a thousand words”; audio and video are semantically much richer than text and/or images alone. Thus, image and video applications have been growing since the beginning of this century. With the development of multimedia technology and the increase in internet bandwidth, people now prefer audio and video, the most expressive forms of media, to single media such as text or images. In recent years, an increasing number of websites have provided video services, for example, video downloads, online video, and video sharing services such as YouTube [1–3]. In particular, the development of digital imaging and network technologies makes cheap digital video production and convenient, fast video transmission possible. Unlike a few years ago, social networks such as Facebook, Twitter, QQ, and Weixin now all provide video sharing and playback services [4–6]. However, wide dissemination and the ease of making perfect copies make privacy protection, authentication, and access control a growing concern. Video watermarking provides a potential remedy for these problems and has been a research hotspot in recent years [7, 8].
Let us take video surveillance as an example. Nowadays, video surveillance is widespread. A security supervisor must guarantee that the surveillance videos recorded every day are not altered by outsiders or employees.
This is not an easy task for a security supervisor, and once forgery happens it can lead to inestimable loss. This has happened in a real case: an employee stole money from a company and replaced the original surveillance video with a fake one to deceive the security supervisor. Some researchers have even held that video surveillance used as evidence in court is absolutely reliable [9]. Although surveillance video can be authenticated by witnesses acquainted with the video subject, the genuineness of the video must also be guaranteed by technical means. Otherwise, forged video will cause serious trouble. Achieving this goal is therefore urgent. Video watermarking, or video data hiding, provides a very promising solution. In particular, there have been some very interesting attempts in privacy-preserving environments [10, 11].
Generally speaking, invisibility, robustness, and real-time processing are the major challenges of video watermarking technology [12]. A video watermarking scheme aims for better invisibility, stronger robustness, and less processing time, but these three goals conflict with each other. Hence, a good video watermarking algorithm achieves the best tradeoff among them under the constraints of its application environment. Invisibility means that the distortion caused by watermark embedding is imperceptible to human eyes. Robustness requires that the watermark resist a variety of intentional and unintentional attacks. Because the watermark can essentially be regarded as noise added to the original signal by the watermarking algorithm, stronger robustness leads to weaker invisibility. Real-time processing requires low time complexity so that watermark embedding and extraction do not noticeably delay normal video operations such as playback and download; otherwise, the watermark degrades the user experience. Doërr and Dugelay [13] regarded real-time operation as a major challenge of video watermarking and identified two ways to address it: one is to lower the algorithm's complexity, and the other is to shift the computational burden to the video provider, that is, the watermark embedding side, so that the complexity on the client or detection side is reduced.
In the early period of watermarking research, invisibility and robustness were heavily emphasized and real-time processing was neglected. For example, Swanson et al. [14] proposed a video watermarking algorithm based on a human visual system model and scene segmentation, which achieved a good tradeoff between invisibility and robustness by adjusting the wavelet coefficients of video frames; however, its computational complexity is too high. Niu et al. [15] proposed applying the wavelet transform to both the watermark and the original video frames and then used an error correction code to improve robustness, but the algorithm is time-consuming. Nowadays, thanks to increased internet bandwidth, digital video applications are widely used in daily life; VOD (video on demand), real-time interactive video, live streaming, and similar services require real-time processing. Even ordinary video applications demand low delay; hence real-time watermark processing is more important than ever.
Real-time video watermarking usually embeds the watermark in the compressed domain to avoid the computational burden of video coding. Hartung and Girod [16] proposed a compressed-domain watermarking algorithm based on the spread-spectrum method and used drift compensation to improve the robustness of the watermark. In [17], Langelaar and Lagendijk proposed Differential Energy Watermarking (DEW), which embeds a watermark bit by adjusting the energy relationship between two blocks of DCT coefficients. If the energy relationship does not match the one required to embed a bit, it is enforced by removing some high-frequency DCT coefficients in one of the blocks. The algorithm operates in a low bit-rate environment. Both methods embed the watermark in the DCT domain, and watermark extraction requires entropy decoding of the bit-stream followed by inverse quantization. Hence, they are computationally expensive.
To improve real-time performance, Langelaar et al. [18] further proposed a watermarking algorithm that changes the level value of run-level pairs in the entropy coding stage of video coding. Because the method changes only the LSB of the level value, it obtains good invisibility and real-time performance at the cost of robustness. Similarly, Lu et al. [19] proposed a watermarking algorithm in the VLC domain that embeds a watermark bit by adjusting the mean of all level values in a whole macroblock (MB); however, it does not provide effective control of quality degradation, and its robustness to time-synchronization attacks is poor.
In recent years, real-time video watermarking in the compressed domain has become one of the main trends in watermarking technology, and many works have been proposed. Ye et al. [20] proposed an improved adaptive real-time video watermarking algorithm that selects suitable watermarking positions based on visual characteristics and embeds watermark bits redundantly into the video stream by exchanging the EQSP (equal quantization step position). Lu et al. [21] proposed a real-time frame-dependent video watermarking scheme in the VLC domain. Roy et al. proposed a hardware implementation for video authentication [22].
These methods usually consider only some of the three aspects of invisibility, robustness, and real-time processing. Watermarking in the compressed domain embeds the watermark in either the DCT domain or the VLC domain of the video bit-stream. This paper considers all three aspects and proposes a new watermarking algorithm that achieves real-time detection and processing. The main contributions of the paper are summarized as follows.
First, fast video scene segmentation is applied to choose scenes with abundant texture and large variance as candidate scenes in which the watermark will be embedded.
Second, once the candidate scenes are determined, a visual model is used to determine the largest allowable change to the value of a (Run, Level) pair in order to guarantee the invisibility of the watermark.
Third, the watermark bits are embedded by changing the value of the level and thereby adjusting the relationship between the numbers of bit1 in two groups of subblocks in each macroblock. Hence, watermark detection can be done simply by counting the number of bit1.
Fourth, to resist time-desynchronization and collusion attacks, the same watermark information is embedded repeatedly into every frame of a candidate scene, and different watermark information is embedded into different candidate scenes.
The rest of the paper is organized as follows. Section 2 gives an overview of the algorithm and then describes each component in detail. Section 3 presents the experiments and discussion. Section 4 concludes the paper.
2. The Proposed Watermark Algorithm
To support real-time applications, the watermark can be embedded during video coding or directly in an already compressed video, depending on the application. The watermark can be extracted during decoding or computed independently by a separate extraction algorithm.
The framework of the proposed algorithm is shown in Figure 1. The figure takes the basic block-based video coding framework as its basis and emphasizes the four key components of the proposed algorithm: the detection and selection of candidate scenes, the partition and ordering of Huffman tables, visual modeling, and watermark embedding. Where

The framework of watermark embedding.
Figure 2 illustrates the extraction process, which is relatively simple. The bit-stream first passes through a filter that removes the motion vectors, header information, and side information; then the number of bit1 in groups A and D is compared with that in groups B and C to determine the extracted bit (the four subblocks A, B, C, and D are arranged from left to right and from top to bottom).

The extraction framework of watermark.
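The comparison step of the extraction process can be illustrated with a minimal Python sketch. It assumes that the filtered coefficient bits of each subblock are available as a string of '0' and '1' characters; the function names and, in particular, the rule that maps the comparison result to a bit value are hypothetical and chosen only for illustration.

```python
def count_bit1(bits: str) -> int:
    """Count the 1 bits in a bitstring such as '101101'."""
    return bits.count("1")


def extract_bit(a: str, b: str, c: str, d: str) -> int:
    """Compare the bit1 count of subblocks A and D with that of B and C.

    Assumption for illustration only: a larger count in A and D is read
    as watermark bit 1, anything else as 0. The actual decision rule of
    the scheme may differ.
    """
    n_ad = count_bit1(a) + count_bit1(d)
    n_bc = count_bit1(b) + count_bit1(c)
    return 1 if n_ad > n_bc else 0
```

Because the decision needs only bit counting, no inverse quantization or inverse DCT is involved, which is consistent with the real-time goal of the scheme.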
In the following section, each key component is given in detail.
2.1. Segmentation and Selection of Candidate Scenes
A scene is defined as a series of frames taken in one shot (or in several shots with slow movement). A meaningful scene cannot be deleted completely without loss of semantic meaning; hence, repeatedly embedding the same watermark within one scene provides robustness against various time-desynchronization attacks (such as averaging, deleting, and regrouping frames).
Although there are some mature scene segmentation methods [23–25], a fast scene segmentation method suited to the real-time requirement of video watermarking is proposed here. It is well known that DC coefficients in each
Let
Then calculate the growth rate of changed amount as
In order to decrease the effect caused by adjacent frames with still or nearly still scenes which will lead to some errors in scene segmentation, the changed amount
The first frame of a scene differs dramatically from the last frame of the immediately preceding scene. That is, the changed amount of DC at a scene boundary is much greater than that between adjacent frames within the same scene, which yields a large α. Similarly, the changed amount of DC at the second frame of a scene is much less than that at the first frame, which yields a small α. Hence, scene segmentation can be viewed as alternately seeking the start frame and the end frame of a scene:
In our experiments,
According to the characteristics of the HVS (human visual system), scenes with high complexity and high variance between frames have high redundancy; thus, when the same amount of watermark information is embedded, they offer better invisibility than scenes with low complexity and low variance. To find appropriate candidate scenes quickly, a parameter p, which indicates how suitable a scene is for watermark embedding, is proposed as

The result of scene segmentation.
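The idea behind the segmentation can be sketched in Python under stated assumptions. Here the changed amount of DC is taken to be the sum of absolute differences between the DC coefficients of adjacent frames, α is taken to be the ratio of consecutive changed amounts, and a lower bound `floor` limits the influence of still or nearly still frames. These definitions, the function names, and the thresholds `t_high` and `t_low` are illustrative assumptions and are not the exact formulas or parameter values of the scheme.

```python
def dc_change(prev_dc, cur_dc):
    """Sum of absolute DC differences between two frames (assumed measure)."""
    return sum(abs(c - p) for p, c in zip(prev_dc, cur_dc))


def segment_scenes(frames_dc, floor=1.0, t_high=3.0, t_low=0.5):
    """Return the indices of frames that start a scene.

    frames_dc: one list of DC coefficients per frame.
    floor, t_high, t_low: hypothetical parameters chosen for illustration.
    A change at frame 1 cannot be detected because no earlier change
    exists to compare against.
    """
    n = len(frames_dc)
    # d[k] is the changed amount between frame k-1 and frame k (d[0] unused).
    d = [floor] * n
    for k in range(1, n):
        d[k] = max(dc_change(frames_dc[k - 1], frames_dc[k]), floor)

    starts = [0] if n else []
    for k in range(2, n):
        alpha = d[k] / d[k - 1]
        # At the last frame there is no following change; treat it as small.
        next_alpha = d[k + 1] / d[k] if k + 1 < n else 0.0
        # Large growth followed by a sharp drop marks the start of a scene.
        if alpha > t_high and next_alpha < t_low:
            starts.append(k)
    return starts
```

Working on DC coefficients only keeps the cost low, since they can be read from the compressed stream without full decoding.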
2.2. Partitioning and Ordering of Huffman Tables
AC coefficients are coded by run coding in most MPEG video compression standards. The two-tuples in run coding can be denoted as
After sorting partitioned sets in ascending order by the value of l, the number of
For example, MPEG-4 has different run-level code mechanisms for different frame types: inter, intra, interlast, and intralast. An example of the partitioned results is shown in Box 1.
(1 2 2 1 0) (2 F 4 4 0) (3 15 6 3 0) (4 17 7 4 0) (5 1F 8 5 0) (6 25 9 3 0) (7 24 9 2 0) (8 21 10 2 0) (9 20 10 1 0) (10 7 11 3 0) (11 6 11 2 0) (12 20 11 1 0)
(1 6 3 2 1) (2 14 6 2 0) (3 1E 8 4 0) (4 F 10 4 0) (5 21 11 2 0) (6 50 12 2 1)
(1 E 4 3 0) (2 1D 8 4 0) (3 E 10 3 0) (4 51 12 3 1)
(1 D 5 3 1) (2 23 9 3 1) (3 D 10 3 1)
(1 C 5 2 1) (2 22 9 2 0) (3 52 12 3 0)
(1 B 5 3 0) (2 C 10 2 0) (3 53 12 4 0)
(1 13 6 3 1) (2 B 10 3 1) (3 54 12 3 1)
(1 12 6 2 1) (2 A 10 2 1)
(1 11 6 2 1) (2 9 10 2 1)
inter
(1 2 2 1 0) (2 6 3 2 0) (3 F 4 4 0) (4 D 5 3 0) (5 C 5 2 0) (6 15 6 3 0) (7 13 6 3 0) (8 12 6 2 0) (9 17 7 4 0)
(1 E 4 3 0) (2 14 6 2 0) (3 16 7 3 0) (4 1C 8 3 0) (5 20 9 1 0) (6 1F 9 5 0) (7 D 10 3 0)
(1 B 5 3 1) (2 15 7 3 0) (3 1E 9 4 0) (4 C 10 2 0) (5 56 12 4 0)
(1 11 6 2 0) (2 1B 8 4 0) (3 1D 9 4 0) (4 B 10 3 0)
(1 10 6 1 0) (2 22 9 2 0) (3 A 10 2 1)
(1 D 6 3 1) (2 1C 9 3 0) (3 8 10 1 0)
(1 12 7 2 0) (2 1B 9 4 0) (3 54 12 3 0)
(1 14 7 2 0) (2 1A 9 3 0) (3 57 12 5 0)
(1 19 8 3 0) (2 9 10 2 0)
intra
This partition is in fact an equivalence partition in the set-theoretic sense. A similar idea has been proposed in [26].
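To make the role of bit1 concrete, the following Python sketch counts the 1 bits of a few codewords. It assumes that each tuple in Box 1 can be read as (level, codeword in hexadecimal, code length in bits, ...); this reading is our interpretation, suggested by the observation that the fourth field of the sampled entries equals the number of 1 bits in the codeword. The helper names are hypothetical.

```python
def bit1_count(code_hex: str) -> int:
    """Number of 1 bits in a codeword given as a hexadecimal string."""
    return bin(int(code_hex, 16)).count("1")


def to_bits(code_hex: str, length: int) -> str:
    """Expand a hexadecimal codeword to a bitstring of the given length."""
    return format(int(code_hex, 16), "0{}b".format(length))


# A few entries taken from the first partition of the inter table in Box 1,
# read as (level, codeword in hex, code length). This reading is an assumption.
sample = [(1, "2", 2), (3, "15", 6), (5, "1F", 8), (6, "25", 9), (7, "24", 9)]

# Pair each level with the bit1 count of its codeword.
level_to_bit1 = {level: bit1_count(code) for level, code, _ in sample}
```

For instance, `to_bits("15", 6)` gives `010101`, which contains three 1 bits. Changing a level to a neighbouring level in the same partition therefore changes the number of bit1 in the bitstream, which is the quantity the embedding manipulates.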
2.3. Visual Model
Human eyes have different sensitivities to changes in coefficients at different positions in a frame. Hence, a position selection criterion is designed according to the HVS so that the watermark information is masked as natural noise, which gives the watermark good invisibility.
Invisibility is closely related to the embedding strength. In this paper, scene complexity, spatial complexity, and temporal complexity are considered in determining the embedding strength. Scene complexity is represented by the parameter p in Section 2.1. Regarding spatial complexity, textured areas are generally considered more appropriate for embedding a watermark than smooth areas and edge areas. Thus, the energy of the high-frequency components of the DCT coefficients of a video frame describes, in some sense, the spatial complexity of the frame. Suppose
In addition, human eyes are sensitive to moving parts of a video. Thus, a motion factor is used to adjust the embedding strength. Consider the motion information in video coding, namely, motion vector
2.4. Watermark Embedding
After the appropriate candidate scenes are selected based on scene segmentation, the embedding strength is determined by the HVS model. The same watermark is embedded into each frame of a scene. The detailed algorithm for embedding one bit of information in one chosen macroblock is introduced below.
For confidentiality of watermark information, a key K is chosen to generate a pseudo random sequence
The percentage of macroblocks satisfying (10) under different N.
In the following we introduce how to change the number of bit1 in specific bit-stream. Taking
It is observed that when
In Figure 4, the specific embedding framework is shown. In each selected level, there is an interval determined by

The embedding of watermark bits.
2.5. Watermark Extracting
Compared with embedding, the detection and extraction of the watermark are straightforward. The extraction procedure is illustrated in Figure 2. The relationship of the number of bit1 in
3. Experiments
As mentioned earlier, invisibility, robustness, and real-time processing are the three important factors in video watermarking. The following experiments evaluate the algorithm on these three aspects. The video codec is MPEG-4; the encoder and decoder are provided by Project Mayo of DivX Advance Research Center. The test sequences are the standard CIF sequences.
3.1. Invisibility
Video watermarking requires that the watermarking process not dramatically degrade the perceptual quality of the video. In our experiments, the bus_cif sequence is used for watermark embedding. Figure 5 shows the 50th frame, where the compression bit-rate is 7.12 Mbps.

The 50th frame in Bus test sequence.
In the images of Figure 5, no distortion caused by watermark embedding is visible to the human eye. Besides subjective tests, PSNR is used to measure objective quality. The corresponding PSNR is plotted in Figure 6. The average PSNR is larger than 40 dB, which indicates good video quality. Moreover, the difference in PSNR between compressed video with and without watermark embedding is very small, which indicates that watermark embedding has a negligible effect on the quality of the compressed video.

The PSNR difference between compressed video with and without watermark embedding.
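For reference, PSNR for 8-bit samples is conventionally computed from the mean squared error (MSE) between a reference frame and a distorted frame as 10·log10(255²/MSE). The Python sketch below follows this standard definition; representing a frame as a flat list of luminance samples and the function name `psnr` are assumptions made for illustration.

```python
import math


def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames.

    Frames are given as flat sequences of sample values (assumed layout).
    """
    if len(reference) != len(distorted) or not reference:
        raise ValueError("frames must be non-empty and of equal size")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak * peak / mse)
```

With this definition, an MSE of 1 corresponds to about 48.13 dB, so an average above 40 dB means the mean squared error per sample stays below roughly 6.5.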
3.2. Robustness
Robustness is the capability to resist all kinds of attacks, both active and passive. For example, quality degradation caused by a noise signal is a type of passive attack, whereas removing the watermark by deleting some frames is a typical active attack.
In the experiments, the watermarked compressed video is decompressed, subjected to the attack, and finally recompressed. Hence, every attack is accompanied by a recompression attack. In the following, we discuss the robustness of the proposed algorithm from several aspects.
3.2.1. The Detection Performance without Any Attacks
If
The detection of watermarked scenes.
The toto sequence is a mixed video sequence formed from many different video sequences and contains 3979 frames. The table shows that most of the watermarked scenes can be detected when no attack is applied.

The BCR curve of Bus sequence.
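The following Python sketch shows how a BCR value can be computed, assuming that BCR denotes the bit correct rate, that is, the fraction of extracted watermark bits that equal the embedded bits. This reading of the abbreviation and the function name are assumptions for illustration.

```python
def bit_correct_rate(embedded, extracted):
    """Fraction of extracted bits that match the embedded bits (assumed BCR)."""
    if len(embedded) != len(extracted) or not embedded:
        raise ValueError("bit sequences must be non-empty and of equal length")
    correct = sum(1 for e, x in zip(embedded, extracted) if e == x)
    return correct / len(embedded)
```

For example, if one of four embedded bits is extracted incorrectly, the function returns 0.75.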
3.2.2. The Robustness against Noise
Video sequences are often distorted by transmission noise. Hence, robustness against noise is one requirement of a good video watermarking algorithm. In this experiment, Gaussian noise of different intensities is added to the luminance component to test the performance.
Suppose the noise x follows a Gaussian distribution, namely,
The first 50 frames of the Bus sequence are used as samples. The PSNR of the sequence under different noise levels is shown in Figure 8, where

The PSNR curve of sequence with different noise level.
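A noise attack of this kind can be simulated as in the Python sketch below. It assumes zero-mean Gaussian noise with standard deviation `sigma` added independently to each 8-bit luminance sample, followed by rounding and clipping to the range 0 to 255. The zero mean, the clipping, and all names are assumptions for illustration; the noise parameters used in the experiment are not reproduced here.

```python
import random


def add_gaussian_noise(luma, sigma, seed=None):
    """Add zero-mean Gaussian noise to a flat list of 8-bit luminance samples.

    sigma: standard deviation of the noise (hypothetical parameter name).
    seed: optional seed so that an experiment can be repeated.
    """
    rng = random.Random(seed)
    noisy = []
    for value in luma:
        sample = value + rng.gauss(0.0, sigma)
        # Round to an integer and clip to the valid 8-bit range.
        noisy.append(min(255, max(0, int(round(sample)))))
    return noisy
```

Only the luminance samples are modified, so the chrominance components of the attacked video stay identical to those of the watermarked video.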
Figure 9 shows the BCR curve of the video under different noise levels. P frames always have a much lower BCR than I frames, which is caused by the high compression efficiency of P frames. Thus, the watermark in I frames is much more robust than that in P frames (the GOP includes 15 frames).

The BCR value with different noise level.
This algorithm embeds the same watermark in all frames of a scene. Thus, even if the watermark is not detected in the P frames, it can still be detected in the I frames.
3.2.3. The Robustness against Temporal-Desynchronized Attack
This attack includes inserting, deleting, averaging, and regrouping frames. Because it is easy to carry out, it is a commonly used attack on video watermarks. It destroys temporal synchronization, so that the watermark cannot be detected. Many video watermarking algorithms include sophisticated mechanisms to resist this attack, among which scene segmentation is the most effective measure.
Inserting or deleting frames may change the coding type of a frame. It is well known that P frames use prediction to improve compression efficiency; thus detection on P frames is weaker than on I frames. If the I frames are not changed, the watermark can be detected with high probability. Figure 10 shows the BCR for different GOP sizes after some frames of the Bus sequence are randomly deleted. From that figure, when the GOP includes 3, 5, and 8 frames, the detected frames (

The BCR after deleting some frames.
This experiment shows that the proposed algorithm can resist frame-deletion attacks.
Figure 11 shows the BCR when 5%, 10%, and 20% of the frames are regrouped. Frame regrouping does not change the position of most I frames; thus the BCR of the I frames and of the P frames that follow them remains unchanged.

The BCR with frame regrouping.
3.2.4. The Robustness against Reencoding
Figure 12 shows the BCR with one, two, and three recompression attacks. From this figure, I frames have the highest stability.

The BCR with recompression.
3.3. Real-Time
In some applications, real-time performance plays an important role, although different applications have different requirements. In this experiment, the mixed video toto, which includes 3979 frames, is used. The processing time with and without watermark embedding is compared. The experimental environment is a Pentium 4 at 2.66 GHz with 512 MB of memory. The results are shown in Figure 13. The watermarking algorithm has only a small effect on the processing time. The decoding speed reaches 22.3 frames per second without any code optimization.

The time of video compression and decompression with and without watermark embedding (from right to left: the time of compression with and without watermark and decompression with and without watermark).
3.4. Bit-Rate
Watermarking often increases the file size of a video sequence; controlling this increase is therefore a common concern in video watermarking. The proposed algorithm embeds a watermark bit by substituting Huffman codes that have fewer bit1. Moreover, a Huffman code with fewer bit1 is more likely to reduce the total number of bits. Hence, in a probabilistic sense, the algorithm does not increase the bit-rate. Five sequences, Bus, foreman, news, basketball, and football, are used as test sequences. The results are shown in Figure 14. The light bars indicate the original bit-rate without watermark embedding, and the dark bars indicate the bit-rate with watermark embedding. The bit-rate in fact decreases slightly, which agrees with the preceding analysis.

The bit-rate of different sequences with and without watermark.
4. Conclusions
This paper proposes a real-time video watermarking algorithm based on scene segmentation. The experiments indicate that the proposed algorithm not only preserves the quality of the watermarked video but also provides strong robustness against recompression, noise, and time-desynchronization attacks. At the same time, the additional time cost is no more than 10% on the encoding side and no more than 2% on the decoding side. Moreover, the algorithm does not increase the bit-rate of the compressed video. The algorithm is applicable to a wide range of video-related applications. For example, it can be used in video surveillance to prevent the forging of surveillance video in distributed sensor networks.
Footnotes
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work was supported in part by a scholarship from the China Scholarship Council of the People's Republic of China (file no. 201203070360), the Natural Science Foundation of China (no. 60803147), and the Major State Basic Research Development Program of China (no. 2015CB351804).
