Abstract
With the rapid proliferation of intelligent applications in edge computing, the efficient scheduling and lightweight deployment of deep learning models have emerged as critical challenges. This study presents a unified framework that integrates adaptive deep neural network compression with priority-aware task scheduling, specifically tailored for edge computing environments. A hierarchical joint compression strategy is proposed, combining sparsity and quantization through learnable parameters to achieve substantial model size reduction while preserving predictive accuracy. Concurrently, a two-stage load redistribution scheduling algorithm is developed, guided by task latency tolerance and completion time deviation, to enhance resource utilization and achieve balanced server load distribution. Experimental evaluations conducted on benchmark deep learning models and real-world power IoT scenarios demonstrate that the proposed method attains a 143× compression ratio with only a 1.3% loss in accuracy. Furthermore, it reduces task latency by over 18% and improves load-balancing efficiency by over 21% compared to conventional methods. These results substantiate the effectiveness of the proposed framework in facilitating lightweight, responsive, and resource-efficient edge intelligence.
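The joint sparsity-and-quantization idea mentioned above can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's learnable-parameter formulation: the `compress` function, the fixed magnitude-pruning threshold, and the simplified compression-ratio estimate (which ignores sparse-index overhead) are all hypothetical stand-ins for demonstration.

```python
import numpy as np

def compress(weights, sparsity=0.9, bits=4):
    """Toy joint pruning + quantization (illustrative only).

    Magnitude-prunes the smallest `sparsity` fraction of weights,
    then uniformly quantizes the survivors to `bits`-bit integers.
    In the paper's method the threshold and bit-width are learned;
    here they are fixed hyperparameters for simplicity.
    """
    w = weights.astype(np.float32).ravel()
    k = int(w.size * sparsity)
    # Prune: keep only the largest-magnitude (1 - sparsity) fraction.
    thresh = np.sort(np.abs(w))[k] if k < w.size else np.inf
    pruned = np.where(np.abs(w) >= thresh, w, 0.0)
    # Quantize survivors: uniform symmetric b-bit quantization.
    scale = np.abs(pruned).max() / (2 ** (bits - 1) - 1)
    if scale == 0.0:
        scale = 1.0
    q = np.round(pruned / scale)
    dequant = (q * scale).reshape(weights.shape)
    # Rough compression ratio: dense fp32 vs. sparse b-bit values
    # (index-storage overhead ignored for this sketch).
    nonzero = int(np.count_nonzero(pruned))
    ratio = (w.size * 32) / max(nonzero * bits, 1)
    return dequant, ratio

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)
dq, ratio = compress(w, sparsity=0.9, bits=4)
```

With 90% sparsity and 4-bit quantization, the nominal ratio is roughly 32 / (0.1 × 4) = 80×; the paper's 143× figure reflects its adaptive, layer-wise scheme rather than these fixed settings.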