Abstract

Object tracking follows an object’s movement through a video; its main challenge is to estimate or forecast the locations and other pertinent details of moving objects. Typically, object tracking entails object detection. In computer vision applications, the detection, classification, and tracking of objects play a vital role, so knowledge of the available techniques is valuable. In this research, a systematic literature review of object detection techniques is performed by analyzing, summarizing, and examining existing works. State-of-the-art works are collected from standard journals; their methods, advantages, drawbacks, and challenges are determined, and the research questions are formulated on this basis. Overall, around 50 research articles are collected, and the evaluation based on various metrics shows that most of the works used deep convolutional neural networks (Deep CNN) and that object detection helps enhance the performance of these networks during tracking. The important issues that remain to be resolved are also discussed, which helps in advancing object-tracking techniques.

Keywords

Object tracking, computer vision, convolutional neural network, object detection

1. Introduction

Object detection is a crucial and challenging research subject in computer vision and digital image processing [29]. The goal of object tracking is to automatically locate the target across an entire video. In aerial and satellite image processing, object detection is likewise crucial but challenging [34]. For a human brain, detecting objects in an image may be quite simple, but it is not for a machine. A machine needs computer vision, which demands large memory and substantial graphics capabilities to process the data, to distinguish things in an image [2]. The term “object detection” refers to the process of identifying objects based on characteristics such as color, shape, and size. Typically, to recognize vehicles, vehicle queues are extracted from satellite images [52]. With the continued advancement of compressed sensing technologies in recent years, satellite video is increasingly used in several settings, including humanistic surveys, education, emergency rescue, disaster assistance, and military objectives [21]. Some approaches treat object detection as a classification problem and, owing to advances in machine learning, notably sophisticated feature representations and classifiers, have demonstrated good performance on several specific object recognition tasks in recent years [44].

Long-term monitoring is impossible for human operators. It can be challenging to identify items such as vehicles, trucks, planes, and ships in high-resolution satellite images. There is no universally accepted solution to this problem, even though numerous approaches seek to tackle it [42]. CNNs, a type of neural network, can be used to detect cars and other vehicles [52, 56, 51, 48, 32, 28, 17]. The primary difficulty in tracking moving vehicles is the small size of the vehicle, which makes it impossible to obtain stable tracking features. Up to this point, vehicles have mostly been the targets of moving object detection and tracking in satellite videos [6]. Object tracking is extensively employed in numerous application domains, including robotic navigation, intelligent video surveillance, industrial detection, aerospace, military surveillance, homeland security, transportation planning and management, and intelligent traffic guidance systems, among others [29, 4]. It is challenging for conventional detectors and trackers to make targets visible in images from satellites and high-altitude drones [35, 37]. As deep CNNs have attained better performance for image classification [42], CNN-based approaches to object identification have been drawing more and more attention from researchers [15, 47, 24, 20, 3, 30, 38, 54, 5, 18, 33, 14, 12, 8, 45, 13, 19, 7, 49, 55, 27, 9, 51].

Much research has emerged recently on object detection together with classification and tracking. Accurate detection and classification matter most when the research is applied in real-time scenarios. Developed models need to prove their sufficiency on multiple datasets. Developing a novel object detection framework requires deeper insight into the available detection techniques. This research aims to provide a systematic review of the methods in order to answer the research questions stated below.

RQ1: What are the techniques that are widely used for the tracking of objects in recent research? RQ2: Do efficient object detection and classification play an effective role in object tracking mechanisms? RQ3: What are the challenges that are associated with the object tracking mechanisms?

A systematic literature review of object detection techniques is carried out in this research: papers from 2015 to 2022 are analyzed and observations are made. The study aims to provide information about recent advancements in object tracking mechanisms and concentrates on methods implemented in software alone.

The research questions are formulated from the information gathered from the various articles and are described as follows.

RQ1: What are the techniques that are widely used for the tracking of objects in recent research?

• What methods have been used for tracking objects since 2015?
• What metrics are used to prove the efficacy of the models?

RQ2: Do efficient object detection and classification play an effective role in object tracking mechanisms?

• What methods are used for object detection and classification?
• How do the existing works deal with the issues arising in object tracking?

RQ3: What are the challenges that are associated with the object tracking mechanisms?

• What challenges exist in the research?
• Which challenges have researchers resolved, and which factors still need improvement?

The paper is organized as follows: the related works on object detection, classification, and tracking are reviewed in Section 2. The analyses based on methods and datasets are presented in Section 3. The potential challenges to be overcome are discussed in Section 4, and the research is concluded in Section 5.

2. Related works

The works on object detection, classification, and tracking methods are reviewed in the following subsections. The systematic flow of the research across the different techniques is shown in Fig. 1.

Figure 1.

Taxonomy representation.

2.1 Object tracking

Kamini Goyal and Dapinder Kaur [52] implemented a deep neural network (DNN) model for traffic surveillance, using a median filter to remove salt-and-pepper noise and reducing the descriptor size through non-negative matrix factorization (NMF). The hybrid DNN proved advantageous for pedestrian classification; however, the method’s performance is hindered because the detection and tracking networks work in isolation. Other approaches, such as that of Alisa Makhmutavo et al. [15], addressed occlusion issues but are limited to non-moving objects. ShiJie Sun et al. [47] used deep learning for crowded vehicle tracking, yet the separate operation of the detection and tracking networks reduced overall efficiency. Yujia Guo et al. [10] introduced an algorithm combining a correlation filter and a Kalman filter, which increased tracking speed but occasionally produced low confidence scores. Bo Du et al. [20] fused the Kernel Correlation Filter with a three-frame-difference algorithm, improving small object detection but with potential limitations in tracking efficiency.
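To make the three-frame-difference idea mentioned above concrete, the following is a minimal sketch in plain Python. It is not the implementation used in [20]; the function name, the nested-list frame format, and the threshold value are assumptions chosen for illustration. A pixel is marked as moving only when the middle frame differs from both its neighboring frames.

```python
def three_frame_difference(prev, curr, nxt, threshold):
    """Return a binary motion mask for the middle frame.

    Frames are nested lists of grayscale values (an assumed format).
    A pixel counts as moving only if the middle frame differs from BOTH
    the previous and the next frame by more than `threshold`.
    """
    rows, cols = len(curr), len(curr[0])
    mask = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            diff_back = abs(curr[r][c] - prev[r][c])
            diff_forward = abs(nxt[r][c] - curr[r][c])
            if diff_back > threshold and diff_forward > threshold:
                mask[r][c] = 1
    return mask


# Toy example: one bright pixel moves one column to the right per frame.
frame1 = [[0, 0, 0, 0], [200, 0, 0, 0], [0, 0, 0, 0]]
frame2 = [[0, 0, 0, 0], [0, 200, 0, 0], [0, 0, 0, 0]]
frame3 = [[0, 0, 0, 0], [0, 0, 200, 0], [0, 0, 0, 0]]
mask = three_frame_difference(frame1, frame2, frame3, threshold=50)
print(mask)  # [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
```

Requiring agreement between the backward and forward differences suppresses the "ghost" left at the object's old position, which a plain two-frame difference would also flag.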

Da Zhang et al. [30] applied deep reinforcement learning for offline object tracking, ensuring suitability beyond real-time scenarios. Xingping Dong et al. [54] implemented a real-time tracking method with a kernel classifier, addressing drifting issues but encountering challenges with low confidence scores. Kuan Fang et al. [33] implemented an autoregressive method with internal and external memory, demonstrating high robustness in occluded and crowded areas. Ahilan Appathurai et al. [14] developed a hybrid method using an artificial neural network and an oppositional gravitational search optimization algorithm, enhancing performance with optimally selected weight values. Guanghan Ning et al. [6] tracked visual objects using a recurrent network, effectively handling occlusion challenges. Xu Chen and Haigang Sui [35] presented an efficient method for detecting moving objects in real-time satellite videos, using a discriminative correlation filter and a Kalman filter for precise position detection. Fahime Farahi and Hadi Sadoghi Yazdi [45] employed a probabilistic Kalman filter for improved tracking estimation, demonstrating the ability to handle occlusion and track abnormal behavior. Jahongir Azimjonov et al. [19] used You Only Look Once (YOLO) and Kalman filters for vehicle detection and tracking, achieving effective tracking but with lower accuracy in estimating trucks. Xu Chen et al. [9] introduced adaptive motion separation and a differential accumulated trajectory for moving vehicle detection in satellite videos, improving accuracy in distinguishing moving vehicles from pseudo-motion backgrounds. Shiyu Xuan et al. [11] developed a novel tracking algorithm combining a correlation filter and motion estimation, overcoming challenges in tracking fast-moving objects. Renxi Chen et al. [36] employed adaptive filtering and lightweight CNN models for moving vehicle detection, achieving noise reduction but experiencing a slight degradation in recall.
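Several of the trackers above rely on a Kalman filter to smooth and predict object positions. As a hedged illustration only, the sketch below implements a one-dimensional constant-velocity Kalman filter in plain Python. The state layout (position, velocity), the diagonal process noise `q`, the measurement noise `r`, and the function names are assumptions for this example and are not taken from any of the reviewed papers.

```python
def kalman_predict(state, cov, dt, q):
    """Predict step for a constant-velocity model, F = [[1, dt], [0, 1]]."""
    p, v = state
    (p00, p01), (p10, p11) = cov
    new_state = (p + dt * v, v)
    # P' = F P F^T + Q, with Q = q * I (assumed process noise).
    n00 = p00 + dt * (p10 + p01) + dt * dt * p11 + q
    n01 = p01 + dt * p11
    n10 = p10 + dt * p11
    n11 = p11 + q
    return new_state, ((n00, n01), (n10, n11))


def kalman_update(state, cov, z, r):
    """Update step for a position-only measurement, H = [1, 0]."""
    p, v = state
    (p00, p01), (p10, p11) = cov
    innovation = z - p          # measured minus predicted position
    s = p00 + r                 # innovation variance
    k0, k1 = p00 / s, p10 / s   # Kalman gain
    new_state = (p + k0 * innovation, v + k1 * innovation)
    # P = (I - K H) P
    n00 = (1 - k0) * p00
    n01 = (1 - k0) * p01
    n10 = p10 - k1 * p00
    n11 = p11 - k1 * p01
    return new_state, ((n00, n01), (n10, n11))


# Track an object that moves one unit per frame, starting from a vague prior.
state, cov = (0.0, 0.0), ((100.0, 0.0), (0.0, 100.0))
for frame in range(1, 21):
    state, cov = kalman_predict(state, cov, dt=1.0, q=1e-3)
    state, cov = kalman_update(state, cov, z=float(frame), r=1.0)
print(state)  # position estimate near 20, velocity estimate near 1
```

In a tracker, the predict step supplies a position when the detector misses the object (for example during occlusion), and the update step corrects the estimate once a detection is available again.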

Shiyu Xuan et al. [22] addressed tracking a moving, rotating object in satellite video using an adaptive correlation filter tracking algorithm, demonstrating efficiency in handling changes in bounding boxes due to rotation. Bing Sui et al. [16] presented a lightweight network for object detection in satellite videos, overcoming the limitations of CNN-based trackers and achieving efficient object tracking with parameter identification and network parameter identification. Niharika Goswami et al. [39] utilized the U-set deep learning system for object detection in high-resolution satellite images, demonstrating simplicity but facing challenges with low scores in detecting certain objects. Xiaofeng Li et al. [46] presented a real-time algorithm for tracking vehicles in aerial images, employing image offset calibration, transfer learning, and filter set construction for accurate target motion detection. Hyungjun Kim [41] developed a traffic monitoring system using a CNN for vehicle type classification, background modeling, edge detection, and object tracking, effectively tracking vehicles but facing low-score issues for certain objects. Eric Price et al. [43] presented real-time continuous DNN-based tracking and detection from multiple cooperating robots, using an optimal control problem (OCP) formulation and graphics processing units (GPUs) for multi-robot cooperative detection and tracking. Sayed Majid Azimi et al. [53] addressed multi-object tracking in aerial images using a Siamese neural network, long short-term memory, and a graph convolutional network, achieving accurate and stable tracking. Zhaopeng Hu et al. [25] extended deep learning for object tracking in satellite videos using a regression network that integrates a gradient descent algorithm, convolutional layers, and a regression model for improved performance, leveraging the Visual Geometry Group (VGG-16) network for effective feature extraction.

2.2 Object detection

Chenchen Jiang et al. [24] applied the You Only Look Once (YOLO) model to object detection in Unmanned Aerial Vehicle (UAV) Thermal Infrared (TIR) images and videos, performing multi-scenario object detection with various YOLO models. YOLOv5 detected small objects efficiently in real time against frequently changing and complex backgrounds in UAV TIR videos; however, it works only within certain viewing angles. Peng Ding et al. [26] enhanced a deep CNN for optical remote sensing by employing dilated convolution and Online Hard Example Mining (OHEM) for efficient bootstrapping. The Faster R-CNN technique, combined with an enhanced VGG16-net, improved accuracy in detecting objects; nonetheless, the detection ability of the network is lower. Ying Ya et al. [29] utilized an arbitrary-oriented region CNN along with a fusion object detection framework for detecting objects in satellite images. The method used pan-sharpening to fuse multi-source images and Faster R-CNN to detect objects in large-scale satellite images. However, the complexity of deep learning techniques, with multi-layer models, poses computational challenges.
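Dilated convolution, mentioned above in connection with [26], enlarges the receptive field of a kernel without adding weights by skipping input samples between kernel taps. The following one-dimensional sketch in plain Python is only an illustration of that definition; the function name and the "valid" (no padding) boundary handling are assumptions, and real detectors apply the two-dimensional analogue inside a deep learning framework.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid 1-D dilated convolution (cross-correlation form, as in CNNs).

    output[i] = sum_k kernel[k] * signal[i + k * dilation]
    """
    span = (len(kernel) - 1) * dilation  # receptive field size minus one
    output = []
    for i in range(len(signal) - span):
        total = 0
        for k, weight in enumerate(kernel):
            total += weight * signal[i + k * dilation]
        output.append(total)
    return output


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]
print(dilated_conv1d(signal, kernel, dilation=1))  # [-2, -2, -2, -2, -2]
print(dilated_conv1d(signal, kernel, dilation=2))  # [-4, -4, -4]
```

With the same three weights, a dilation of 2 covers five input samples instead of three, which is why the response to the linear ramp doubles.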

Atakan Körez and Necaattin Barışçı [34] presented a multi-scale Faster R-CNN method for graphics processing unit (GPU) systems, using the Weight Standardization (WS) technique to normalize the weights. The combination of deformable convolution and ResNet-50 extracted high-resolution features efficiently, but the model is applicable only to small batch sizes. Wenming Cao et al. [3] developed a method for real-time video object detection using a fast DNN with knowledge-guided training. The deep NN is trained with a cross-network knowledge projection framework and a Support Vector Machine (SVM) for low-complexity object detection; however, the model’s applicability is limited to small batch sizes. Ivan V. Saetchnikov et al. [38] compared various CNN methods for object detection, highlighting YOLO v3’s superior performance. The inclusion of additional dropout layers with empirical optimization mitigated over-learning during the segmentation task, but the methods apply only to a limited dataset. Gong Cheng et al. [44] introduced a Rotation Invariant CNN (RICNN) model for improved object detection in very high-resolution (VHR) optical remote sensing images. Using a simple rotation function and generic object proposal detection, RICNN efficiently detected vehicles; however, the model’s effectiveness is not guaranteed for all scenarios.
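As a rough illustration of Weight Standardization, the sketch below rescales the weights of each output channel to zero mean and unit variance before they are used. It is a simplified plain-Python version written for this review, not the code of [34]: each inner list is assumed to hold the flattened weights of one output channel, and the placement of the small constant `eps` inside the square root is an assumption.

```python
import math


def weight_standardize(weights, eps=1e-5):
    """Standardize each row (one output channel) to zero mean, unit variance."""
    standardized = []
    for row in weights:
        mean = sum(row) / len(row)
        variance = sum((w - mean) ** 2 for w in row) / len(row)
        std = math.sqrt(variance + eps)  # eps avoids division by zero
        standardized.append([(w - mean) / std for w in row])
    return standardized


# Two output channels with very different weight scales.
weights = [[1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 20.0, 20.0]]
for row in weight_standardize(weights):
    print([round(w, 3) for w in row])
```

Because the normalization acts on the weights rather than on batch statistics, it does not depend on the batch size, which is consistent with its use on low-capacity GPU systems.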

Joshua Bapu et al. [2] employed an adaptive CNN with N-grams for spatial object recognition, using Sobel edge detection and gray-level co-occurrence matrices (GLCM) for object detection in satellite images. The complexity of deep learning techniques and the need for multiple noise-reduction steps may reduce computational efficiency. Junfeng Lei et al. [21] presented a method for detecting tiny vehicles in satellite video using spatial-temporal information. The use of a Gaussian filter in the detection process, together with constraints aimed at eliminating false detections, adds complexity and requires more constraint details. Xiaofei Liu et al. [5] used a CNN-based method for real-time ground vehicle detection in infrared images, capturing a greater number of features in infrared imagery. However, manually labeling training samples increased processing time. Tao Yang et al. [18] presented a detection method for small moving vehicles in satellite video of urban areas. The use of a saliency background model improved accuracy in moving vehicle detection; however, the model’s effectiveness depends on pre-segmented regions, which reduce false detections.
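A gray-level co-occurrence matrix counts how often a pixel with gray level i has a neighbor with gray level j at a fixed offset, and texture features such as contrast are then derived from those counts. The sketch below is a minimal, non-symmetric GLCM in plain Python, given only to illustrate the concept. The function names, the default horizontal offset, and the small four-level test image are assumptions and do not come from [2].

```python
def glcm(image, levels, dr=0, dc=1):
    """Count co-occurrences of gray levels at offset (dr, dc).

    matrix[i][j] is the number of pixels with level i whose neighbor at
    the given offset has level j (non-symmetric, unnormalized).
    """
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def glcm_contrast(matrix):
    """Contrast feature: sum of (i - j)^2 weighted by pair probability."""
    total = sum(sum(row) for row in matrix)
    weighted = sum(
        (i - j) ** 2 * count
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
    )
    return weighted / total


image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
matrix = glcm(image, levels=4)
print(matrix)  # [[2, 2, 1, 0], [0, 2, 0, 0], [0, 0, 3, 1], [0, 0, 0, 1]]
print(glcm_contrast(matrix))  # 7/12, about 0.583
```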

Saleh Javadi et al. [12] introduced a method for heavy vehicle detection in aerial images using a DNN and depth maps. While depth map analysis improved detection, the model’s effectiveness is contingent on the modified CNN architecture and the selected detector network. Yuanlin Zhang et al. [11] utilized a Hierarchical and Robust CNN (HRCNN) for enhanced object detection accuracy in remote sensing images. HRCNN efficiently performed four tasks using a combination of the greedy algorithm, AlexNet for feature extraction, and a Support Vector Machine; however, the model’s applicability may be limited to specific datasets. Gong Cheng et al. [37] presented a Rotation Invariant and Fisher Discriminative CNN (RIFD-CNN) for object detection with improved performance. Optimizing a new objective function that applies a rotation-invariant regularizer and a Fisher discrimination regularizer to the CNN proved efficient, particularly on datasets where rotation invariance is essential. Yapeng Guo et al. [7] introduced the orientation-aware feature fusion single-stage detection (OAFF-SSD) deep learning technique for dense construction vehicle detection from unmanned aerial vehicles (UAVs). The model’s incorporation of multilevel feature extraction and orientation-aware bounding box regression contributed to more precise detection.

Wenhua Zhang et al. [49] presented the Laplacian Feature Pyramid Network (LFPN) for combining low and high-frequency features, enhancing object detection performance in very high-resolution optical remote sensing (VHR-ORS) images. The use of the Feature Pyramid Network (FPN) and CNN demonstrated efficiency in the NWPU VHR-10 dataset. Ali Tourani et al. [40] presented a Faster Region-based CNN (Faster R-CNN) for vehicle detection from video, using a low-pass filter in image pre-processing. The residual learning framework with ResNet-50 and Faster R-CNN demonstrated effectiveness in vehicle detection, though further improvement is required. Yongzheng Xu et al. [1] introduced the Faster R-CNN method for detecting cars from low-altitude UAV imagery. The method’s two-module approach, incorporating a Fast R-CNN detector and Region Proposal Network (RPN), demonstrated high-speed vehicle detection but with potential limitations in completeness.

2.3 Object classification

S. Vasavi et al. [42] presented a neural network-based classification method that overcomes the overfitting and low-performance problems of deep learning techniques. An appearance-based multi-block local binary pattern and a model-based algorithm were implemented, and the objects present in satellite images were detected and classified. The concept of invariant features was added to the Darknet architecture of YOLO and combined with Faster Region-Based CNN (Faster RCNN) at different spatial locations to count the total number of vehicles. Prediction of more vehicle classes and small object detection were the advantages of combining YOLO with Faster RCNN.

2.4 Object detection and tracking

Hyochang Ahn et al. [4] used a knowledge-based CNN that tracked objects using an optical flow algorithm. The position of the objects is updated frequently from frame to frame and more accurate features are extracted, but the processing time is considerably high. Chandan G. et al. [55] detected and tracked objects using a Faster RCNN. This model can be used in different situations to locate, follow, and react to targeted objects in video surveillance. The trained model produced good detection and tracking outcomes. Camilo Aguilar et al. [27] described a two-step method for detecting and tracking objects in satellite videos based on a motion-informed CNN. First, the rough target location was identified with a lightweight motion detection operator, and the detected results were then refined and combined with a CNN. A Probability Hypothesis Density (PHD) filter used the detections to track the vehicles. The multi-object Bayesian data-association framework performed well, continuing to track missed targets better than other Bayesian filters.

Muhammad Rashid et al. [13] presented a method combining CNN and Scale Invariant Feature Transform (SIFT) features that overcomes complex backgrounds, congested situations, and similarity problems. Features were extracted with the deep CNN models VGG and AlexNet; DCNN pooling and the SIFT point matrix were then applied, and a Rényi entropy-controlled method selected the robust features. These were arranged into a matrix, passed to an ensemble classifier for recognition, and evaluated on the Barkley 3D, Caltech101, and Pascal 3D datasets.

2.5 Others

Jiasong Zhu et al. [31] presented a UAV-based method that uses deep learning for the detection, tracking, and counting of vehicles to estimate traffic in urban areas. The counting framework included two parts: deep learning-based detection and identification with a single shot multibox detector (SSD), and vehicle tracking and counting. The framework was evaluated on the UAV city Traffic Video Dataset (UAV CT). Seonkyeong Seong et al. [23] presented a method that tracked vehicle direction using optical bounding boxes obtained with a CNN. Vehicle trajectories were extracted from the camera images: the YOLOv2 model was applied for object detection, and the trajectories were estimated with an Intersection-over-Union (IOU) tracker and a Kalman filter vehicle tracking algorithm. Debojit Biswas et al. [50] described a three-step method to estimate the speed of multiple moving objects from a UAV platform. Faster R-CNN was applied to detect objects, and channel and spatial reliability tracking (CSRT), which includes a discriminative correlation filter, was applied to track them. Feature-based image alignment (FBIA) was then used to obtain the object location in each frame.
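The IOU tracker mentioned above links detections across frames by bounding box overlap. The following plain-Python sketch shows the overlap measure and a simple greedy association step. It is only an illustration under stated assumptions: boxes are given as (x1, y1, x2, y2) corner tuples, and the `associate` helper, its greedy strategy, and the 0.5 threshold are hypothetical choices rather than the procedure of [23].

```python
def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, min_iou=0.5):
    """Greedily match each track (id -> last box) to its best unused detection.

    Returns a dict mapping track id to the index of the matched detection.
    Tracks without a detection above `min_iou` stay unmatched.
    """
    matches = {}
    used = set()
    for track_id, box in tracks.items():
        best_index, best_score = None, min_iou
        for index, detection in enumerate(detections):
            if index in used:
                continue
            score = iou(box, detection)
            if score >= best_score:
                best_index, best_score = index, score
        if best_index is not None:
            matches[track_id] = best_index
            used.add(best_index)
    return matches


tracks = {1: (0, 0, 10, 10), 2: (50, 50, 60, 60)}
detections = [(51, 50, 61, 60), (1, 0, 11, 10)]
print(associate(tracks, detections))  # {1: 1, 2: 0}
```

Greedy matching is the simplest option; trackers that need globally optimal assignments typically replace it with the Hungarian algorithm over the same IoU scores.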

3. Bibliographic analysis

Two analyses are performed, one based on methods and one based on datasets, as follows.

3.1 Analysis based on methods

This review intends to give new researchers the backing needed to better understand the object tracking methods now being developed. For scholars and researchers, it covers the most prevalent methods available for object detection, classification, and tracking. This section examines 50 cutting-edge deep learning methods. Because every study focuses on a distinct set of parameters, it is challenging to pinpoint which approaches are preferable. To answer that question, we first examine each study’s methodology and then identify the strategies that are most effective overall. Each paper’s architecture information is extracted based on its framework. The interpretations are shown in Table 1 and Fig. 2.

Table 1
Analysis concerning methods

Methods	Papers
DNN	[52, 20, 14, 12, 31, 43]
Algorithm	[15, 13, 9]
Deep Affinity Network	[47]
YOLO	[24]
CNN	[24, 42, 29, 34, 4, 38, 44, 2, 5, 35, 8, 37, 19, 7, 27, 36, 22, 16, 53, 40, 23, 1, 46, 41, 50]
Dense convoluted networks	[26]
VGG16	[26]
Darknet	[42]
Correlation filter	[10, 20, 54, 6, 45, 11]
Deep Reinforcement Learning	[30]
Recurrent Autoregressive Networks	[33]
Mobile net	[55]
Convolutional Regression Network	[25]
Others	[21, 18, 49]

Table 2

Analysis based on dataset

Dataset	Links	Papers
Stanford University, Stanford, United States	Sky at night: amazing photographs from London’s aerial photographer | London Evening Standard | Evening Standard	[52, 26, 38, 53, 50]
Not Defined	–	[15, 20, 4, 2, 21, 54, 14, 49, 40, 31]
Private	–	[47, 12, 6, 13, 19, 27, 39, 46, 25]
FLIR	https://www.flir.com	[24]
Cars Overhead with Context	Cars Overhead With Context at LLNL	[42]
DOTA (Dataset for Object Detection in Aerial Images)	https://paperswithcode.com/dataset/dota	[29, 1]
SkySat Constellation	SkySat Constellation (eoportal.org)	[10, 18, 11]
NWPU VHR-10	researchgate.net/figure/A-detailed-introduction-to-NWPU-VHR-10-dataset_tbl1_348516661	[34, 44, 8, 7]
CIFAR-100	https://www.cs.toronto.edu/~kriz/cifar.html	[3]
VOC	https://paperswithcode.com/dataset/pascal-voc	[30, 6, 45, 43]
NPU_CS_UAV_IR_DATA	NPU_CS_UAV_IR_DATA (shanxiliuxiaofei.github.io)	[5]
MOT	https://paperswithcode.com/dataset/motchallenge	[33]
Australian Football League (AFL) Database	Australian Football League (AFL) Database	[37]
CGSTL dataset	mall.charmingglobe.com	[55]
Jilin-1	https://www.satimagingcorp.com/satellite-sensors/jilin-1-satellite-sensor-1m/	[9, 36]
UAV123 dataset	https://paperswithcode.com/dataset/uav123	[22, 41, 23]
AerialMPT	https://ailb-web.ing.unimore.it/icpr/media/posters/11143.pdf	[46]

Figure 2.

Analysis concerning methods.

3.2 Analysis based on dataset

The analysis relies on the various databases and provides details about the availability of the datasets. Most of the methods used data from standard public repositories, some used manually collected data, and others did not provide information about the data, as summarized in Table 2.

4. Potential challenges

Handling occlusions between frames, especially in scenarios involving complex motions, poses a significant challenge for both linear and non-linear models [47]. Object detection is hindered by the presence of complex scene information, low resolution, and the absence of publicly available datasets and training models, making it a challenging task [24]. Achieving accurate dense item detection with a strong classifier proves to be a demanding task, requiring robust methods [42]. Recognizing objects in vehicles is complicated due to obscured objects and shadow zones, adding a layer of difficulty to the object recognition process [42]. Tracking objects in satellite videos is challenging due to factors such as the tiny size of moving objects, lack of texture, and background similarity [10]. Object detection in remote sensing images, applied in various fields including agriculture, city monitoring, and traffic monitoring, faces challenges such as limited datasets and high costs [34]. The scarcity of datasets for specific object classes presents an obstacle to object detection, with aerial imaging facing additional restrictions due to high costs [38].

Despite their adaptability, online classifiers often struggle with the drifting issue caused by noisy updates [54]. Poor resolution in aerial images and the complexity of vehicle recognition make it difficult to extract notable features and to handle stance variations, view changes, and ambient radiation [5]. Accurately recognizing moving vehicles while suppressing false alarms from objects of the same size remains a challenging task [18]. In some scenarios, tracking algorithms must rely on visual cues rather than bounding box motions, for example when pedestrians in front of the camera have similar speeds and sizes [33]. Addressing occlusion and significant appearance fluctuation in visual object tracking is challenging because unknown features are difficult to evaluate [17]. Rapidly and precisely identifying cars in aerial images remains a challenging aspect of object tracking [12]. Visual tracking in computer vision is challenged by target deformations, lighting variations, size changes, rapid motions, occlusions, motion blur, and background clutter [6]. Object tracking encounters difficulties in crowded environments and complicated backgrounds, where distinguishing between various objects becomes challenging [13]. Choosing the threshold that separates foreground from background across different satellite video images poses a significant detection challenge [9].

5. Conclusion

The main goal of this systematic literature review is to give new researchers a starting point for their object-tracking study. After detailed filtering, 50 research articles relevant to object detection, classification, and tracking are analyzed, and the conclusion is that deep CNNs are widely used for object tracking. Object detection and classification play a major role in tracking objects in satellite images. The analysis is based on factors such as the methods used, the publishing journal, the year of publication, and the metrics used; the potential challenges associated with object tracking are also interpreted. The quality of the papers is also analyzed based on the number of citations they received. Interestingly, the majority of research papers in this domain were published in 2019, marking a pivotal year for advances in deep learning-based object tracking mechanisms. These findings underscore the central role of the Stanford University dataset, CNNs, and precision in the evolution of this field, offering valuable insights into the trajectory of research in this area. In the future, optimization-based techniques will also be included in the review of object tracking mechanisms.

Author’s Bios

Dr. Nilesh J. Uke received the B.E. degree in Computer Science and Engineering from Sant Gadge Baba Amravati University, India, in 1995, the M.E. degree from Bharati Vidyapeeth in 2005, and the Ph.D. degree in Computer Science from SRT Marathwada University, Nanded, India, in 2014. He is currently Principal and Professor in Computer Engineering at Trinity Academy of Engineering, Pune, affiliated to Savitribai Phule Pune University. His current research interests include Visual Computing, Artificial Intelligence, Human Computer Interface, and Multimedia Systems. He is a member of IEEE and ACM, a Life Member of the Indian Society for Technical Education (ISTE), and a Fellow of the Institute of Engineers and the Computer Society of India (CSI). ORCID: https://orcid.org/0009-0006-8459-816X.

Pravin Futane received the B.E. degree (Computer Science and Engineering) from SGBAU, Amravati University, in 1997 and the M.E. degree in Electronics (Computer Engineering) from COEP, Pune University, in 2002. He received his Ph.D. in Computer Science and Engineering from Amaravati University in 2015. He has 23 years of teaching and research experience. He is a Life Member of ISTE (Indian Society of Technical Education), the International Association of Engineers (IAENG), and UACEE (Universal Association of Computer and Electronics Engineers). His current research interests are Artificial Intelligence, Visual Computing, Sign Gesture Recognition, Image Processing, and Databases. ORCID: https://orcid.org/0000-0003-0641-8603.

Dr. Neeta Deshpande received the B.E. degree in Computer Science and Engineering from Sant Gadge Baba Amravati University, India, in 1995, the M.E. degree from Walchand College of Engineering, Sangli, in 2004, and the Ph.D. degree in Computer Science from SRT Marathwada University, Nanded, India, in 2014. She is currently Associate Professor in Computer Engineering at Gokhale Education Society’s R H Sapat College of Engineering, Nasik, affiliated to Savitribai Phule Pune University. Her current research interests include Artificial Intelligence, Visual Computing, Data Science, and Multimedia Systems. She is a life member of the Indian Society for Technical Education (ISTE), the Computer Society of India (CSI), and the Institute of Engineers (IEI). ORCID: https://orcid.org/0000-0002-0529-439X.

Shailaja Uke received the B.E. degree in Information Technology in 2003 from Savitribai Phule Pune University, India, and the M.Tech. degree in Information Technology in 2008 from Bharati Vidyapeeth, Pune. She is pursuing her Ph.D. at GH Raisoni University, Amaravati. Her current research interests include Artificial Intelligence, Computer Vision, Machine Learning, and Object-Oriented Modelling. She is currently working as Assistant Professor in the Computer Department, Vishwakarma Institute of Technology, Pune, India. ORCID: https://orcid.org/0000-0001-5185-627X.

A review on deep learning-based object tracking methods
