Abstract
Real-time vehicle detection is a challenging problem in automotive and autonomous driving applications. Object detection using the Deformable Parts Model (DPM) has proved to be a promising approach that provides high detection accuracy. However, the baseline DPM scheme spends 98% of its execution time in loop processing, which highlights its high computational cost as a drawback for real-time applications. In this paper, we propose a real-time vehicle detection scheme for a low-powered embedded Graphics Processing Unit (GPU). The proposed scheme is based on the DPM approach and uses CUDA programming with different parallelization and loop unrolling schemes to reduce the computational cost of DPM. Three loop unrolling schemes, i.e. loosely unrolled, tightly unrolled and hybrid unrolled, are proposed and evaluated on two different datasets. Finally, we provide an optimal solution for vehicle detection with minimum execution time and no impact on vehicle detection accuracy. We achieve a speedup of 3x to 5x compared to a state-of-the-art GPU implementation and 30x compared to the baseline CPU implementation of DPM on a low-powered automotive-grade embedded computing platform featuring a Tegra K1 System on Chip (SoC), thus taking advantage of the improved efficiency of parallel computation with CUDA.
