Malicious program patch control method based on improved DDPV algorithm

Abstract

With widespread information and data sharing and the rapid development of communication and intelligent technologies, malicious program patch control has become widely used. To control the propagation of malicious programs, this study designs a malicious program patch control method based on the deep deterministic gradient algorithm. After the propagation mechanism of malicious programs is analyzed, intrusion detection systems are used to model them. The susceptible-infected-recovered (SIR) model is applied to describe malware transmission and to construct a composite malware patch propagation model. By incorporating this composite model, a malicious program patch control method based on the double deep Q-network algorithm is designed. The double deep Q-network algorithm reached network equilibrium within 45 time steps. Under attacks by malicious programs with three different hit rates, the peak proportions of susceptible devices were 0.07, 0.02, and 0.286, respectively. The number of devices infected by high-hit-rate malicious programs was 2.81 times the number infected by low-hit-rate malicious programs. In dynamic network environments, the DDPV method adapted well, effectively controlled the propagation of malicious programs under various dynamic conditions, and maintained high network gains and patch success rates. The proposed method can therefore suppress the spread of malicious programs by quickly identifying them and dispatching patches, providing strong support for building a secure network environment.
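The abstract models malware spread with a susceptible/infected/recovered (SIR) compartmental model but does not state its equations or parameters. The following is a minimal sketch of a generic discrete-time SIR model, assuming (hypothetically, not from the paper) that the effective infection rate is `hit_rate * contact_rate` and that infected devices are patched into the recovered compartment at a constant `patch_rate`. It only illustrates why a higher hit rate produces more infections; it is not the authors' composite patch propagation model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SIRState:
    s: float  # fraction of susceptible devices
    i: float  # fraction of infected devices
    r: float  # fraction of recovered (patched) devices


def simulate_sir(hit_rate: float, contact_rate: float = 0.5, patch_rate: float = 0.1,
                 i0: float = 0.01, steps: int = 200, dt: float = 0.1) -> list[SIRState]:
    """Forward-Euler integration of a basic SIR model.

    Hypothetical parameterisation: beta = hit_rate * contact_rate is the
    infection rate; patch_rate moves infected devices into R each step.
    """
    beta = hit_rate * contact_rate
    state = SIRState(1.0 - i0, i0, 0.0)
    history = [state]
    for _ in range(steps):
        new_infections = beta * state.s * state.i * dt  # S -> I
        new_patches = patch_rate * state.i * dt          # I -> R
        state = SIRState(
            state.s - new_infections,
            state.i + new_infections - new_patches,
            state.r + new_patches,
        )
        history.append(state)
    return history


if __name__ == "__main__":
    # Compare peak infected fraction across a few hypothetical hit rates.
    for h in (0.2, 0.5, 0.9):
        hist = simulate_sir(h)
        print(f"hit_rate={h}: peak infected={max(st.i for st in hist):.3f}")
```

Because the update only moves mass between compartments, `s + i + r` stays at 1; raising `hit_rate` raises `beta` relative to `patch_rate`, which is what drives a larger outbreak.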
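The abstract names the double deep Q-network algorithm but does not describe the agent's states, actions, or rewards. The core of any double DQN is its bootstrap target: the online network picks the greedy next action and the target network scores it, which reduces the overestimation caused by taking a plain max. The sketch below shows only that standard target computation in pure Python; the Q-value lists stand in for network outputs, and reading actions as, for example, "which device to patch" is a hypothetical interpretation.

```python
from __future__ import annotations

from typing import Sequence


def double_dqn_target(reward: float, next_q_online: Sequence[float],
                      next_q_target: Sequence[float], gamma: float = 0.99,
                      done: bool = False) -> float:
    """Double DQN target for a single transition.

    next_q_online / next_q_target: Q-values of the next state from the
    online and target networks (one entry per action).
    """
    if done:
        return reward  # no bootstrapping from a terminal state
    # Action selection uses the online network...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ...but action evaluation uses the target network.
    return reward + gamma * next_q_target[best_action]


def dqn_target(reward: float, next_q_target: Sequence[float],
               gamma: float = 0.99, done: bool = False) -> float:
    """Standard DQN target, for comparison: max over the target network."""
    return reward if done else reward + gamma * max(next_q_target)


if __name__ == "__main__":
    online = [0.2, 0.9, 0.5]   # hypothetical online-network Q-values
    target = [0.8, 0.3, 0.6]   # hypothetical target-network Q-values
    print(double_dqn_target(1.0, online, target, gamma=0.5))  # 1 + 0.5 * 0.3
    print(dqn_target(1.0, target, gamma=0.5))                 # 1 + 0.5 * 0.8
```

In this toy transition the standard DQN target uses the largest target-network value (0.8), while the double DQN target uses the value of the action the online network prefers (0.3), giving a lower, less optimistic estimate.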

Keywords

malicious programs; deep deterministic gradient algorithm; patch control method; composite malicious program patch propagation model; double deep Q-network algorithm

