Abstract
With the rapid proliferation of intelligent applications in edge computing, the efficient scheduling and lightweight deployment of deep learning models have emerged as critical challenges. This study presents a unified framework that integrates adaptive deep neural network compression with priority-aware task scheduling, specifically tailored for edge computing environments. A hierarchical joint compression strategy is proposed, combining sparsity and quantization through learnable parameters to achieve substantial model size reduction while preserving predictive accuracy. Concurrently, a two-stage load redistribution scheduling algorithm is developed, guided by task latency tolerance and completion time deviation, to enhance resource utilization and achieve balanced server load distribution. Experimental evaluations conducted on benchmark deep learning models and real-world power IoT scenarios demonstrate that the proposed method attains a 143× compression ratio with only a 1.3% loss in accuracy. Furthermore, it reduces task latency by over 18% and improves load-balancing efficiency by over 21% compared to conventional methods. These results substantiate the effectiveness of the proposed framework in facilitating lightweight, responsive, and resource-efficient edge intelligence.
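The joint sparsity-and-quantization idea mentioned above can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's learnable-parameter formulation: the `compress` function, the fixed magnitude-pruning threshold, and the simplified compression-ratio estimate (which ignores sparse-index overhead) are all hypothetical stand-ins for demonstration.

```python
import numpy as np

def compress(weights, sparsity=0.9, bits=4):
    """Toy joint pruning + quantization (illustrative only).

    Magnitude-prunes the smallest `sparsity` fraction of weights,
    then uniformly quantizes the survivors to `bits`-bit integers.
    In the paper's method the threshold and bit-width are learned;
    here they are fixed hyperparameters for simplicity.
    """
    w = weights.astype(np.float32).ravel()
    k = int(w.size * sparsity)
    # Prune: keep only the largest-magnitude (1 - sparsity) fraction.
    thresh = np.sort(np.abs(w))[k] if k < w.size else np.inf
    pruned = np.where(np.abs(w) >= thresh, w, 0.0)
    # Quantize survivors: uniform symmetric b-bit quantization.
    scale = np.abs(pruned).max() / (2 ** (bits - 1) - 1)
    if scale == 0.0:
        scale = 1.0
    q = np.round(pruned / scale)
    dequant = (q * scale).reshape(weights.shape)
    # Rough compression ratio: dense fp32 vs. sparse b-bit values
    # (index-storage overhead ignored for this sketch).
    nonzero = int(np.count_nonzero(pruned))
    ratio = (w.size * 32) / max(nonzero * bits, 1)
    return dequant, ratio

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)
dq, ratio = compress(w, sparsity=0.9, bits=4)
```

With 90% sparsity and 4-bit quantization, the nominal ratio is roughly 32 / (0.1 × 4) = 80×; the paper's 143× figure reflects its adaptive, layer-wise scheme rather than these fixed settings.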