
Abstract

Abnormal event detection plays an important role in video surveillance and remains a challenging problem in intelligent detection. In this paper, we propose an algorithm to detect abnormal events in crowded scenes based on a novel motion feature descriptor, the histogram of maximal optical flow projection (HMOFP). After the HMOFP features of the training frames are extracted, a one-class support vector machine (SVM) is used to classify the testing frames as normal or abnormal. Experiments on several benchmark datasets show that our algorithm is effective and compares well with other methods based on optical flow.

1. Introduction

Nowadays, more and more surveillance cameras are used in public places, and behavior analysis in crowded scenes [1–5] is becoming increasingly important for public safety. To eliminate the world representation layer, which can be a significant source of modeling errors, an approach that models directly at the pixel level was described in [6]. In [7, 8], the social force model was used for abnormal crowd behavior detection. In [9, 10], a social attribute-aware force model was proposed, which takes the social characteristics of crowd behavior into account to improve performance on interaction behavior in crowds.

In [11], SIFT features were extracted for a Bag of Words (BoW) model with a Spatial Pyramid Matching (SPM) kernel, and an SVM classifier was then used for cross-scene abnormal event detection. In [12], based on the observation that abnormal events occur rarely whereas frequently occurring events are generally perceived as normal, proximity clustering was proposed for abnormal event detection in video sequences. In [13], for the case where labeled information about normal events is limited and no information about abnormal events is available, a projection subspace associated with detectors was discovered by using both labeled and unlabeled segments. In wireless sensor networks, it has been observed that most abnormal events are not transient but persist over a considerable period of time; thus, a technique for handling data in a segment-based manner was introduced in [14]. In [15], a feature extraction and event detection method that uses no tracking or motion features was presented, in which features are extracted from foreground blobs and fed to SVM-based models for real-time event detection.

Unlike most existing approaches to abnormal event detection, approaches based on sparse representation have attracted many researchers in recent years. In [16], a method to detect abnormal events by sparse subspace clustering was proposed. In [17, 18], a model based on optical flow was described, which uses the sparse reconstruction cost (SRC) over the normal dictionary to measure the normalness of the tested samples. Optical flow is the approximated motion vector at each pixel location and can reflect the relative distances of moving objects. Therefore, it is important and useful in video surveillance and abnormal event detection. Other methods based on the histogram of optical flow were described in [19–21]; the descriptor proposed in this paper builds on and improves this idea.

Although the above approaches can detect abnormal events successfully, they are limited in some respects: some models are complicated to build, and others take a long time in the detection process. To address these limitations, we propose a novel detection model for crowded scenes that is relatively simple and computationally inexpensive. Similar to the approach introduced in [21], our algorithm is mainly based on suitable processing of the optical flow field.

The rest of the paper is organized as follows. In Section 2, we present how to acquire the motion features. In Section 3, the theory of one-class SVM is reviewed. In Section 4, the algorithm of abnormal events detection is introduced in detail. Section 5 presents our experiment results. Finally, some conclusions are presented in Section 6.

2. Motion Feature Extraction

The optical flow field describes the apparent movement of grayscale patterns on the image surface and reflects the motion between two consecutive frames. Optical flow provides the direction and amplitude of moving objects in a scene, which describes the behavior of people well. It is derived from the following basic equation:

$$I_{x} u + I_{y} v + I_{t} = 0, \tag{1}$$

where $I_{x}$, $I_{y}$, and $I_{t}$ are the partial derivatives of the image grayscale value along the x, y, and t dimensions, respectively; u and v are the horizontal (x dimension) and vertical (y dimension) components of the optical flow. Equation (1) alone is an ill-posed problem, since it is a single equation in the two unknowns u and v. In [22], Horn and Schunck proposed an algorithm, known as the HS algorithm, that computes the optical flow by introducing a global smoothness constraint, which is equivalent to the additional condition

$$\min \nabla^{2} u + \nabla^{2} v, \tag{2}$$

where $\nabla^{2} u$ and $\nabla^{2} v$ are the Laplacians of u and v, respectively. The optical flow problem can then be formulated as follows:

$$\min \iint \left[ (I_{x} u + I_{y} v + I_{t})^{2} + \alpha^{2} (\nabla^{2} u + \nabla^{2} v)^{2} \right] dx \, dy, \tag{3}$$

where α is the weight of the regularization term. The corresponding Euler-Lagrange equations are solved with the Gauss-Seidel method, which yields the following iterative scheme for computing the optical flow:

$$\begin{aligned} u^{n + 1} &= \bar{u}^{n} - \frac{I_{x} (I_{x} \bar{u}^{n} + I_{y} \bar{v}^{n} + I_{t})}{\alpha^{2} + I_{x}^{2} + I_{y}^{2}}, \\ v^{n + 1} &= \bar{v}^{n} - \frac{I_{y} (I_{x} \bar{u}^{n} + I_{y} \bar{v}^{n} + I_{t})}{\alpha^{2} + I_{x}^{2} + I_{y}^{2}}, \end{aligned} \tag{4}$$

where $\bar{u}$ and $\bar{v}$ are the weighted averages of u and v, respectively, calculated in a neighborhood around the pixel location, and n denotes the iteration number.
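The update rule (4) can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: the derivative estimates (`np.gradient` on the mean of the two frames, and a plain frame difference for $I_{t}$), the default values of `alpha` and `n_iter`, and the helper name `_avg3` are assumptions made here. The averaging weights (1/6 for the four direct neighbors, 1/12 for the diagonals) are the ones given by Horn and Schunck [22]. For simplicity, all pixels are updated simultaneously from the previous iterate (a Jacobi-style sweep) rather than in place as in Gauss-Seidel.

```python
import numpy as np

# Neighborhood weights used for the local averages u_bar and v_bar.
AVG = np.array([[1 / 12, 1 / 6, 1 / 12],
                [1 / 6,  0.0,   1 / 6],
                [1 / 12, 1 / 6, 1 / 12]])


def _avg3(field):
    """Weighted 3x3 neighborhood average with replicated borders."""
    rows, cols = field.shape
    padded = np.pad(field, 1, mode="edge")
    out = np.zeros((rows, cols), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += AVG[dy, dx] * padded[dy:dy + rows, dx:dx + cols]
    return out


def horn_schunck(f1, f2, alpha=1.0, n_iter=100):
    """Estimate the flow (u, v) from frame f1 to frame f2 with update rule (4)."""
    f1 = np.asarray(f1, dtype=float)
    f2 = np.asarray(f2, dtype=float)
    # Assumed derivative estimates: axis 0 is y (rows), axis 1 is x (columns).
    Iy, Ix = np.gradient((f1 + f2) / 2.0)
    It = f2 - f1
    u = np.zeros_like(f1)
    v = np.zeros_like(f1)
    denom = alpha ** 2 + Ix ** 2 + Iy ** 2
    for _ in range(n_iter):
        u_bar = _avg3(u)
        v_bar = _avg3(v)
        common = (Ix * u_bar + Iy * v_bar + It) / denom
        u = u_bar - Ix * common
        v = v_bar - Iy * common
    return u, v
```

Calling `horn_schunck(f1, f2)` on two grayscale frames of equal size returns two arrays of the same size holding the horizontal and vertical flow components at each pixel.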

In this paper, we propose a novel motion feature descriptor, called histogram of maximal optical flow projection (HMOFP). Figure 1 briefly shows the process for computing the HMOFP.

Figure 1

The process for computing the HMOFP feature.

As shown in Figure 2, the optical flow field of frame s is divided into m overlapping image patches, each containing $B \times B$ pixels. The optical flow in each patch is processed as follows. The range $0^{\circ}$–$360^{\circ}$ is divided into p bins, and the optical flow vector of each pixel in the patch is assigned to a bin according to its direction, so each bin may contain several optical flow vectors. We project all optical flow vectors in the same bin onto the angle bisector of this bin and select the maximal projection as the feature descriptor. For example, in Figure 3(a), two vectors $\vec{o n_{1}}$ and $\vec{o n_{2}}$ fall into the first bin. The projection of $\vec{o n_{2}}$ is longer than the projection of $\vec{o n_{1}}$, so the length of the projection vector $\vec{o n_{2}'}$ is selected as the feature descriptor of the first bin. After processing all m patches, we obtain the feature descriptor vector of each image patch, denoted as $[h_{1}, h_{2}, \dots, h_{m}]_{p \times m}$, where $h_{i} = [h_{i}^{1}, \dots, h_{i}^{p}]_{p \times 1}$. For the ith patch, $h_{i}^{j}$, $1 \leq j \leq p$, $1 \leq i \leq m$, denotes the maximal amplitude among all projection vectors in the jth bin. As shown in Figure 3(b), we concatenate the m feature descriptor vectors into $H_{s}$, the global HMOFP feature of frame s.

Figure 2

Block-division of the optical flow field belonging to the frame s.

Figure 3

(a) The calculation of HMOFP in each bin. (b) Components of the global feature descriptor of the frame s.
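The descriptor described above can be sketched as follows. This is a hedged illustration under stated assumptions rather than the authors' code: the function names `hmofp_patch` and `hmofp_frame` are hypothetical, bins are taken to start at 0° and to be of equal width, zero-length flow vectors are skipped, an empty bin is given the value 0, and patches are placed on a regular grid with only fully contained patches kept.

```python
import numpy as np


def hmofp_patch(u, v, p=18):
    """Per-bin maximal projection of the flow vectors onto the bin bisector."""
    u = np.ravel(u).astype(float)
    v = np.ravel(v).astype(float)
    width = 2 * np.pi / p
    magnitude = np.hypot(u, v)
    angle = np.mod(np.arctan2(v, u), 2 * np.pi)           # direction in [0, 2*pi)
    bins = np.minimum((angle // width).astype(int), p - 1)  # guard against rounding
    bisector = (bins + 0.5) * width
    projection = magnitude * np.cos(angle - bisector)      # non-negative for p >= 2
    moving = magnitude > 0                                  # assumption: skip zero vectors
    h = np.zeros(p)                                         # assumption: empty bin -> 0
    np.maximum.at(h, bins[moving], projection[moving])
    return h


def hmofp_frame(U, V, B=64, p=18, overlap=0.5):
    """Concatenate the patch descriptors of one flow field into H_s."""
    step = int(B * (1 - overlap))
    rows, cols = U.shape
    features = []
    for r in range(0, rows - B + 1, step):
        for c in range(0, cols - B + 1, step):
            features.append(hmofp_patch(U[r:r + B, c:c + B], V[r:r + B, c:c + B], p))
    return np.concatenate(features)
```

With two vectors in the first bin, as in Figure 3(a), `hmofp_patch` keeps only the longer of the two projections for that bin and leaves the other bins at zero.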

To describe a crowd scene well, sufficient crowd movement information is required. On the other hand, to distinguish two different scenes, they must be compared in detail, and information that does not help separate them should be eliminated. In the classification process, overlapping block division increases the number of significant motion features in two different frames, which makes the frames more distinguishable. We therefore adopt it in our algorithm so that the optical flow information is used fully. Moreover, describing the motion of a crowd requires two factors: explicit directions and the moving distance along each direction. Segmenting the 2D plane into p bins provides ample information about the directions of moving people. To make the direction of each bin unique, we select the p angle bisectors as the reference directions.

Since a bin may contain far more than one optical flow vector, we select the maximal projection, rather than the sum of all projections on the bisector, as the motion feature descriptor in order to enhance the distinction between the normal scene and the abnormal scene. If we ignore the background area, the motion vectors in a normal frame have very small amplitudes, whereas the motion vectors of the abnormal area in an abnormal frame are large. Usually, normal motion vectors far outnumber abnormal ones. If the sum of all projections on the angle bisector were used as the feature descriptor of each bin, the accumulated contribution of the many small motion vectors in a normal frame could mask the few large motion vectors in an abnormal frame; that is, the sum in each bin of a normal frame is likely to be close to that of an abnormal frame. Thus, to improve the distinguishability between abnormal and normal frames, we select the maximal projection as the feature descriptor of each bin, as illustrated in Figure 3.
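A toy calculation illustrates this argument. The projection values below are hypothetical numbers chosen only for illustration (they are not measurements from any dataset): one bin of a normal frame holds many slow vectors, while the same bin of an abnormal frame holds fewer slow vectors and two fast ones.

```python
# Hypothetical projections onto one bin's bisector, in pixels per frame.
normal_bin = [0.2] * 50                   # many slowly walking people
abnormal_bin = [0.2] * 20 + [3.0] * 2     # fewer people, two of them running

sum_normal, sum_abnormal = sum(normal_bin), sum(abnormal_bin)
max_normal, max_abnormal = max(normal_bin), max(abnormal_bin)

# The sums are practically identical, so a sum-based descriptor cannot
# separate the two frames, whereas the maxima differ by a large factor.
print("sum:", sum_normal, "vs", sum_abnormal)
print("max:", max_normal, "vs", max_abnormal)
```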

3. One-Class SVM

SVM was initiated by Vapnik and Lerner [23]. Since kernel methods were introduced, SVM has been applied extensively to nonlinear classification problems [24–26]. In the one-class classification problem, the goal is to determine a boundary, that is, an appropriate region in the data space $X$, that contains most of the samples drawn from an unknown probability distribution D. This goal can be realized by searching for an optimal decision hyperplane in the feature space, which is a Hilbert space $H$. This hyperplane maximizes its distance from the origin, while only a small part of the data falls between the hyperplane and the origin [27]. The relationship between $X$ and $H$ is shown in Figure 4.

Figure 4

The correspondence between data space and feature space. (a) The boundary in the data space $X$ . (b) The hyperplane in the feature space $H$ .

The one-class SVM problem can be formulated as the following optimization model:

$$\begin{aligned} \min_{w, \xi, \rho} \quad & \frac{1}{2} w^{T} w - \rho + \frac{1}{\nu l} \sum_{i = 1}^{l} \xi_{i} \\ \text{s.t.} \quad & w^{T} \phi(x_{i}) \geq \rho - \xi_{i}, \quad \xi_{i} \geq 0, \end{aligned} \tag{5}$$

where $x_{i} \in X$, $i \in [1 \dots l]$, are training samples in the input data space $X$, and $\phi : X \to H$ maps a vector $x_{i}$ into the feature space $H$. $w^{T} \phi(x_{i}) - \rho = 0$ is the decision hyperplane. $\xi_{i}$ is the slack variable for penalizing the outliers. $\nu \in (0, 1]$ is the hyperparameter that weights the slack variables and tunes the number of acceptable outliers. The mapping function $\phi$ allows the nonlinear classification problem in the space $X$ to be solved by a linear solution in the space $H$. By calculating the dot product in $H$, the kernel function is defined as $k(x_{i}, x_{j}) = \phi^{T}(x_{i}) \phi(x_{j})$. The decision function in the space $X$ with Lagrangian multipliers $\alpha_{i}$ is defined as

$$f(x) = \operatorname{sgn} \left( \sum_{i = 1}^{l} \alpha_{i} k(x_{i}, x) - \rho \right). \tag{6}$$

According to [28], with appropriately selected parameters, polynomial and sigmoid kernels give results similar to those of the Gaussian kernel. We choose the Gaussian kernel in our algorithm. This kernel is defined as

$$k(x_{i}, x_{j}) = \exp \left( - \frac{\| x_{i} - x_{j} \|^{2}}{2 \sigma^{2}} \right), \tag{7}$$

where $x_{i}$ and $x_{j}$ belong to the space $X$ and σ is the scale factor at which the data should be clustered.
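The Gaussian kernel (7) and the decision function (6) can be written out directly. The sketch below is illustrative only: it assumes the multipliers $\alpha_{i}$ and the offset ρ have already been obtained by some solver, treats a decision value of exactly zero as normal, and uses hypothetical function names and toy values.

```python
import math


def gaussian_kernel(xi, xj, sigma):
    """Equation (7): exp(-||xi - xj||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def decision(x, support_vectors, alphas, rho, sigma):
    """Equation (6): +1 (normal) inside the boundary, -1 (abnormal) outside."""
    score = sum(a * gaussian_kernel(sv, x, sigma)
                for sv, a in zip(support_vectors, alphas))
    return 1 if score - rho >= 0 else -1   # assumption: sgn(0) counts as normal


# Toy model with two support vectors (hypothetical values).
support_vectors = [[0.0, 0.0], [1.0, 0.0]]
alphas = [0.5, 0.5]
rho, sigma = 0.5, 1.0
print(decision([0.5, 0.0], support_vectors, alphas, rho, sigma))   # close to the data
print(decision([5.0, 5.0], support_vectors, alphas, rho, sigma))   # far from the data
```

A sample close to the support vectors has kernel values near 1 and a score above ρ, whereas the kernel values of a distant sample decay towards 0, so its score falls below ρ.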

In our method, one-class SVM is used as follows. First, the training set is used to establish a model, which determines an appropriate boundary in the data space. New incoming frames are then classified by the following rule: if the HMOFP feature of the testing frame falls inside the boundary, the frame is classified as normal; otherwise, it is classified as abnormal.

4. Abnormal Events Detection

In this section, an algorithm for abnormal event detection in surveillance video is described in detail. Suppose that for a given scene, there is a set of training frames $[f_{1}, \dots, f_{l}]$, which describe the normal behavior of the crowd. The general procedure for abnormal event detection based on the histogram of maximal optical flow projection (HMOFP) is as follows.

Step 1.

Calculate the optical flow, that is, $[O P_{1}, \dots, O P_{l - 1}]$ , by the HS method at each pixel of the first $l - 1$ frames:

$$[f_{1}, \dots, f_{l}]_{a \times b \times l} \overset{\mathrm{HS}}{\longrightarrow} [{OP}_{1}, \dots, {OP}_{l - 1}]_{a \times b \times (l - 1)}, \tag{8}$$

where $a \times b$ is the size of the frame image and l is the number of frames in the training set. Because the optical flow is computed from two consecutive frames and is assigned to the first frame of each pair, the last frame has no flow field; hence the maximal subscript on the right side of (8) is $l - 1$.

Step 2.

Extract the motion features of the first $l - 1$ training frames to obtain their HMOFP feature vectors, denoted as the set ${[H_{1}, \dots, H_{l - 1}]}^{T}$:

$$[{OP}_{1}, \dots, {OP}_{l - 1}]_{a \times b \times (l - 1)} \overset{\mathrm{HMOFP}}{\longrightarrow} [H_{1}, \dots, H_{l - 1}]_{p \times m \times (l - 1)}^{T}. \tag{9}$$

Step 3.

Based on HMOFP, one-class SVM is utilized to calculate the optimal boundary of the set ${[H_{1}, \dots, H_{l - 1}]}^{T}$ , which corresponds to the set of support vectors or the optimal hyperplane in the feature space.

Step 4.

Classify the HMOFP features of the testing frames with the model trained on the motion features of the first $l - 1$ training frames.

The whole procedure is illustrated in Figure 5.

Figure 5

The flowchart of the proposed abnormal events detection algorithm.
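Steps 3 and 4 could be realized with an off-the-shelf one-class SVM. The sketch below uses scikit-learn's `OneClassSVM`, which is a choice made here for illustration; the paper does not state which solver or which values of ν and σ were used, so `nu=0.05` and `sigma=1.0` are placeholders. scikit-learn parameterizes the RBF kernel as exp(-gamma·‖x − y‖²), so gamma = 1/(2σ²) reproduces the Gaussian kernel (7). The feature matrices are assumed to hold one HMOFP vector per row.

```python
import numpy as np
from sklearn.svm import OneClassSVM


def train_detector(H_train, nu=0.05, sigma=1.0):
    """Step 3: fit the one-class boundary on the training features."""
    gamma = 1.0 / (2.0 * sigma ** 2)   # matches the Gaussian kernel (7)
    return OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(H_train)


def classify(model, H_test):
    """Step 4: +1 for frames inside the boundary (normal), -1 otherwise."""
    return model.predict(H_test)


# Synthetic stand-in for HMOFP features (not real data): small motion only.
rng = np.random.default_rng(0)
H_train = rng.normal(0.0, 0.1, size=(200, 10))
model = train_detector(H_train)
```

In this arrangement, `nu` plays the role of ν in (5): it bounds the fraction of training frames that may fall outside the learned boundary.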

5. Experimental Results

In this section, based on the UMN dataset [29] and PETS2009 dataset [30], we evaluate our method for abnormal event detection. Image patch size is set as $64 \times 64$ and $128 \times 128$ , respectively, in the UMN dataset and PETS2009 dataset. 0°–360° are divided into 18 bins, that is, $p = 18$ . The overlapping proportion of two neighboring blocks is 50%. In the UMN dataset, the length of the HMOFP feature of each frame is 972 with a $320 \times 240$ resolution. In the PETS2009 dataset, the resolution of each frame is $768 \times 576$ , and the length of the HMOFP feature is 1584.
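The reported feature lengths can be reproduced with a short calculation, under the assumption (not stated explicitly in the text) that patches are placed on a regular grid with a stride of half the patch size and that only patches lying entirely inside the frame are counted. The function names are hypothetical.

```python
def patches_per_axis(length, patch, overlap=0.5):
    """Number of fully contained patches along one image axis."""
    stride = int(patch * (1 - overlap))
    return (length - patch) // stride + 1


def hmofp_length(width, height, patch, bins=18, overlap=0.5):
    """Feature length p * m for one frame."""
    m = patches_per_axis(width, patch, overlap) * patches_per_axis(height, patch, overlap)
    return bins * m


print(hmofp_length(320, 240, 64))    # UMN: 9 x 6 patches, 18 bins each
print(hmofp_length(768, 576, 128))   # PETS2009: 11 x 8 patches, 18 bins each
```

Under this assumption the UMN frames give 54 patches and the PETS2009 frames give 88 patches, which matches the lengths 972 and 1584 stated above.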

5.1. Experiments on the UMN Dataset

There are three different crowded scenes in the UMN dataset, which are named lawn, indoor, and plaza, respectively. In our experiments, we select a part of the normal frames of each scene as the training set and take the rest of the video sequence as the testing set.

5.1.1. Detection in the Lawn Scene

The video sequence of the lawn scene contains 1453 frames in total. The first 480 frames are taken as the training set. As shown in Figure 6, in the lawn scene, the normal event is that individuals walk in different directions. The abnormal event is that individuals suddenly run away. The detection results of the lawn scene are shown in Figure 7. The accuracy of the detection results is 95.5141%.

Figure 6

Two different scenes in the sequence of lawn.

Figure 7

Classification results of the lawn scene.

5.1.2. Detection in the Indoor Scene

The video sequence of the indoor scene contains 4144 frames in total. The first 319 frames are taken as the training set. As shown in Figure 8, in the indoor scene, the normal event is that some people are talking and standing in a relatively fixed location while some others are walking along the road in the hall. The abnormal event is that people run out of the doors suddenly. The detection results of the indoor scene are shown in Figure 9. The accuracy of the detection results is 91.2857%.

Figure 8

Two different scenes in the sequence of indoor.

Figure 9

Classification results of the indoor scene.

5.1.3. Detection in the Plaza Scene

The video sequence of the plaza scene contains 2412 frames in total. The first 550 frames are taken as the training set. As shown in Figure 10, in the plaza scene, the normal event is that people walk around the center of the square. The abnormal event is that people suddenly run away from the square. The detection results of the plaza scene are shown in Figure 11. The accuracy of the detection results is 94.3352%.

Figure 10

Two different scenes in the sequence of plaza.

Figure 11

Classification results of the plaza scene.

5.2. Experiments on the PETS2009 Dataset

In the following experiments, we choose specific scenes of interest as the targets of the detection process. In the PETS2009 dataset, we first select the training set and the normal testing set in the same scene. Then a video clip of a different scene is taken as the corresponding abnormal testing set. The experiments and detection results are as follows.

5.2.1. People Scatter Detection

In this part, the training set is the video sequence Time 14-16 (Frame 0 to Frame 222), where people are walking or running towards one direction. The normal testing set includes 41 frames (Frame 48 to Frame 88) of Time 14-17. 41 frames (Frame 337 to Frame 377) of Time 14-33 are labeled as abnormal for testing, in which people are scattered in all directions. The two different scenes are shown in Figure 12. The accuracy of the detection results is 97.5%, as shown in Figure 13.

Figure 12

Two different scenes in the same location.

Figure 13

The detection results of the sequence Time 14-33. “1” means normal and “−1” means abnormal.
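With the label convention above, an accuracy figure can be computed as the fraction of test frames whose predicted label matches the ground truth. That definition is an assumption made here, since the text does not spell out how accuracy is calculated, and the label lists below are made-up examples rather than results from the experiments.

```python
def accuracy(predicted, truth):
    """Fraction of frames whose predicted label (+1 or -1) equals the ground truth."""
    if len(predicted) != len(truth):
        raise ValueError("label sequences must have the same length")
    correct = sum(1 for p, t in zip(predicted, truth) if p == t)
    return correct / len(truth)


# Hypothetical labels: 1 means normal and -1 means abnormal.
truth = [1, 1, 1, -1, -1, -1, -1, -1]
predicted = [1, 1, -1, -1, -1, -1, -1, -1]
print(f"{100 * accuracy(predicted, truth):.2f}%")
```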

5.2.2. Crowd Movement Direction Detection

In this part, the training set is the video sequence Time 14-55 (Frame 0 to Frame 399), where people are walking towards all directions. The normal testing set includes 89 frames (Frame 400 to Frame 488) of Time 14-55. 89 frames (Frame 0 to Frame 88) of the video sequence Time 14-17 are labeled as abnormal for testing, in which people are walking towards one direction. The two different scenes are shown in Figure 14. The accuracy of the detection results is 92.6136%, as shown in Figure 15.

Figure 14

Two different scenes in the same location.

Figure 15

The detection results of the sequence Time 14-17. “1” means normal and “−1” means abnormal.

5.2.3. People Running Detection

In this part, the training set contains 50 frames (Frame 0 to Frame 49) of the video sequence Time 14-31 and 61 frames (Frame 0 to Frame 60) of the video sequence Time 14-17, where people are walking from right to left and from left to right, respectively. The normal testing set includes 104 frames (Frame 0 to Frame 37 and Frame 108 to Frame 173) of Time 14-16. 119 frames (Frame 38 to Frame 107 and Frame 174 to Frame 222) of Time 14-16 are labeled as abnormal for testing, in which people are running towards one direction. The two different scenes are shown in Figure 16. The accuracy of the detection results is 93.6937%, as shown in Figure 17.

Figure 16

Two different scenes in the same location.

Figure 17

The detection results of the sequence Time 14-16. “1” means normal and “−1” means abnormal.

5.2.4. People Splitting Detection

In this part, the training set contains Frames 0 to 40 of the video sequence Time 14-16, where people are walking towards the same direction. The normal testing set includes 64 frames (Frame 0 to Frame 63) of Time 14-31. 66 frames (Frame 64 to Frame 129) of the video sequence Time 14-31 are labeled as abnormal for testing, in which the crowd is splitting. The normal scene and abnormal scene are shown in Figure 18. The accuracy of the detection results is 96.1538%, as shown in Figure 19.

Figure 18

Two different scenes in the same location.

Figure 19

The detection results of the sequence Time 14-31. “1” means normal and “−1” means abnormal.

5.2.5. Comparison

We compared our algorithm with the histogram of optical flow orientation (HOFO) method proposed in [21], as shown in Table 1. Our algorithm matches HOFO on the sequence Time 14-33 and is more accurate on the other three sequences.

Table 1

The comparison of HMOFP with HOFO.

Accuracy by sequence:

Method	Time 14-33	Time 14-17	Time 14-16	Time 14-31
HOFO	97.5%	90%	93.24%	94.6154%
HMOFP (ours)	97.5%	92.6136%	93.6937%	96.1538%

6. Conclusion

In this paper, we proposed an algorithm for abnormal event detection in crowded scenes at the global-frame scale. Our method consists of two main procedures. First, the histogram of maximal optical flow projection (HMOFP) descriptor of the input video sequence is computed. Second, a one-class SVM classifier is used for nonlinear classification of the testing sets. The proposed method has been tested on several surveillance video datasets and achieves good detection accuracy.

Footnotes

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is supported by the NSFC (nos. 61273274, 61370127, 61572067, and 61272028), 973 Program (no. 2011CB302203), National Key Technology R&D Program of China (nos. 2012BAH01F03, NSFB4123104, FRFCU 2014JBZ004, and Z131110001913143), and Tsinghua-Tencent Joint Lab. for IIT.

References

1. Mehran, Moore B. E., Shah. A streakline representation of flow in crowded scenes. Computer Vision – ECCV 2010: 11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5–11, 2010, Proceedings, Part III. Lecture Notes in Computer Science, vol. 6313. Berlin, Germany: Springer; 2010. pp. 439–452. doi:10.1007/978-3-642-15558-1_32

2. Cong, Yuan, Tang. Video anomaly search in crowded scenes via spatio-temporal motion context. IEEE Transactions on Information Forensics and Security. 2013;8(10):1590–1599. doi:10.1109/TIFS.2013.2272243

3. Daniyal, Cavallaro. Abnormal motion detection in crowded scenes using local spatio-temporal analysis. Proceedings of the 36th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '11); May 2011. pp. 1944–1947. doi:10.1109/icassp.2011.5946889

4. Mahadevan, Vasconcelos. Anomaly detection and localization in crowded scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2014;36(1):18–32. doi:10.1109/TPAMI.2013.111

5. Thida, Eng H.-L., Remagnino. Laplacian eigenmap with temporal constraints for local abnormality detection in crowded scenes. IEEE Transactions on Cybernetics. 2013;43(6):2147–2156. doi:10.1109/tcyb.2013.2242059

6. Kosmopoulos, Chatzis S. P. Robust visual behavior recognition. IEEE Signal Processing Magazine. 2010;27(5):34–45. doi:10.1109/MSP.2010.937392

7. Mehran, Oyama, Shah. Abnormal crowd behavior detection using social force model. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '09); June 2009; Miami, Fla, USA. IEEE. pp. 935–942. doi:10.1109/cvprw.2009.5206641

8. Yen S.-H., Wang C.-H. Abnormal event detection using HOSF. Proceedings of the 3rd International Conference on IT Convergence and Security (ICITCS '13); December 2013; Macao, China. IEEE. pp. 1–4. doi:10.1109/icitcs.2013.6717798

9. Zhang, Qin, Yao, Huang. Abnormal crowd behavior detection based on social attribute-aware force model. Proceedings of the 19th IEEE International Conference on Image Processing (ICIP '12); October 2012. pp. 2689–2692. doi:10.1109/icip.2012.6467453

10. Zhang, Qin, Yao, Huang. Social attribute-aware force model: exploiting richness of interaction for abnormal crowd detection. IEEE Transactions on Circuits and Systems for Video Technology. 2015;25(7):1231–1245. doi:10.1109/tcsvt.2014.2355711

11. Hung T.-Y., Tan Y.-P. Cross-scene abnormal event detection. Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '13); May 2013. pp. 2844–2847. doi:10.1109/iscas.2013.6572471

12. Sandhan, Sethi, Srivastava, Choi J. Y. Unsupervised learning approach for abnormal event detection in surveillance video by revealing infrequent patterns. Proceedings of the 28th International Conference on Image and Vision Computing New Zealand (IVCNZ '13); November 2013. pp. 494–499. doi:10.1109/ivcnz.2013.6727064

13. Tziakos, Cavallaro. Local abnormal detection in video using subspace learning. Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS '10); 2010. pp. 519–525.

14. Xie, Guo. Segment-based anomaly detection with approximated sample covariance matrix in wireless sensor networks. IEEE Transactions on Parallel and Distributed Systems. 2015;26(2):574–583. doi:10.1109/TPDS.2014.2308198

15. Haque, Murshed. Panic-driven event detection from surveillance video stream without track and motion features. Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '10); July 2010; Singapore. IEEE. pp. 173–178. doi:10.1109/icme.2010.5583057

16. Ren, Moeslund T. B. Abnormal event detection using local sparse representation. Proceedings of the 11th IEEE International Conference on Advanced Video and Signal-Based Surveillance (AVSS '14); August 2014; Seoul, Republic of Korea. IEEE. pp. 125–130. doi:10.1109/avss.2014.6918655

17. Cong, Yuan, Liu. Sparse reconstruction cost for abnormal event detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '11); June 2011. pp. 3449–3456. doi:10.1109/cvpr.2011.5995434

18. Cong, Yuan, Liu. Abnormal event detection in crowded scenes using sparse representation. Pattern Recognition. 2013;46(7):1851–1864. doi:10.1016/j.patcog.2012.11.021

19. Wang, Snoussi. Histograms of optical flow orientation for visual abnormal events detection. Proceedings of the IEEE 9th International Conference on Advanced Video and Signal-Based Surveillance (AVSS '12); September 2012. pp. 13–18. doi:10.1109/avss.2012.39

20. Wang, Snoussi. Histograms of optical flow orientation for abnormal events detection. Proceedings of the IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS '13); January 2013. pp. 45–52. doi:10.1109/pets.2013.6523794

21. Wang, Snoussi. Detection of abnormal visual events via global optical flow orientation histogram. IEEE Transactions on Information Forensics and Security. 2014;9(6):988–998. doi:10.1109/tifs.2014.2315971

22. Horn B. K. P., Schunck B. G. Determining optical flow. Artificial Intelligence. 1981;17(1–3):185–203. doi:10.1016/0004-3702(81)90024-2

23. Vapnik V. N., Lerner. Pattern recognition using generalized portrait method. Automation and Remote Control. 1963;24(6):774–780.

24. Boser B. E., Guyon I. M., Vapnik V. N. Training algorithm for optimal margin classifiers. Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory; July 1992. pp. 144–152.

25. Piciarelli, Micheloni, Foresti G. L. Trajectory-based anomalous event detection. IEEE Transactions on Circuits and Systems for Video Technology. 2008;18(11):1544–1554. doi:10.1109/tcsvt.2008.2005599

26. Cristianini, Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge, UK: Cambridge University Press; 2000. doi:10.1017/cbo9780511801389

27. Schölkopf, Platt J. C., Shawe-Taylor, Smola A. J., Williamson R. C. Estimating the support of a high-dimensional distribution. Neural Computation. 2001;13(7):1443–1471. doi:10.1162/089976601750264965

28. Schölkopf, Smola A. J. Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond. Cambridge, Mass, USA: MIT Press; 2002.

29. UMN. Unusual crowd activity dataset of University of Minnesota, Department of Computer Science and Engineering. 2006.

30. PETS. Performance evaluation of tracking and surveillance (PETS) 2009 benchmark data. Multisensor sequences containing different crowd activities. 2009. http://www.cvg.reading.ac.uk/PETS2009/a.html

Histogram of Maximal Optical Flow Projection for Abnormal Events Detection in Crowded Scenes
