Abstract
To address the shortcoming that current metro video surveillance systems do not handle crowd density estimation effectively, a Metro Crowd density estimation Network (MCNet) is proposed to automatically classify passenger crowd density levels. First, an Integrating Multi-scale Attention (IMA) module is introduced to enhance the ability of plain classifiers to extract semantic crowd texture features. The novelty of the IMA module lies in fusing dilated convolution, multi-scale feature extraction, and attention mechanisms to obtain multi-scale crowd feature activations from a larger receptive field at lower computational cost, strengthening the activation of crowd features in the top layers. Second, a novel lightweight crowd texture feature extraction network is proposed to automatically extract image texture features for crowd density estimation; its high efficiency and small parameter count make it suitable for deployment on resource-constrained embedded platforms. Finally, the IMA module and the lightweight feature extraction network are combined to form MCNet, whose feasibility is tested on the CIFAR-10 image classification dataset and on four crowd density datasets. Experimental results demonstrate that, with the IMA module, the prediction accuracy of MCNet improves across these datasets, outperforming competing methods in accuracy, total network parameters, and inference speed. Furthermore, experiments on power consumption and inference speed support the feasibility of deploying MCNet on embedded metro platforms. These results show that MCNet is a suitable solution for crowd density estimation in metro video surveillance under complex real-life scenes.
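The core idea behind the IMA module, fusing dilated-convolution branches at several dilation rates and weighting them with an attention mechanism, can be illustrated with a minimal sketch. This is not the paper's actual implementation: the single-channel convolution, the softmax fusion rule, and all function names here are illustrative assumptions.

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation):
    """Valid-mode 2D convolution of a single-channel map with a dilated kernel."""
    kh, kw = kernel.shape
    eff_h = (kh - 1) * dilation + 1  # effective receptive field height
    eff_w = (kw - 1) * dilation + 1
    H, W = x.shape
    out = np.zeros((H - eff_h + 1, W - eff_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + eff_h:dilation, j:j + eff_w:dilation]
            out[i, j] = np.sum(patch * kernel)
    return out

def ima_block(x, kernel, dilations=(1, 2, 3)):
    """Sketch of an IMA-style block: multi-scale dilated branches
    fused by a softmax attention weight per branch (an assumption)."""
    branches = []
    for d in dilations:
        # Pad so every branch keeps the input's spatial size.
        pad = ((kernel.shape[0] - 1) * d) // 2
        xp = np.pad(x, pad, mode="edge")
        branches.append(dilated_conv2d(xp, kernel, d))
    stack = np.stack(branches)                       # (n_branches, H, W)
    scores = stack.mean(axis=(1, 2))                 # global average pooling per branch
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax attention weights
    return np.tensordot(weights, stack, axes=1)      # weighted multi-scale fusion

feat = np.random.default_rng(0).random((16, 16))
smooth = np.ones((3, 3)) / 9.0
fused = ima_block(feat, smooth)  # same spatial size as the input
```

Each dilated branch sees a larger receptive field at the same kernel cost, which is what allows the module to capture crowd texture at multiple scales without a heavy parameter budget.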
