Abstract
Deep neural networks (DNNs) are vulnerable to backdoor attacks, one of the most serious security risks they face: an adversary implants hidden triggers that, when present at inference time, alter or control model predictions. Many recent backdoor mitigation techniques apply their mitigation strategy to a model without knowing whether it has actually been poisoned, and operate without regard for the target class or classes and the associated triggers, which can degrade the model's accuracy on benign inputs. In this paper, we propose a Multi-phase Hybrid Defense Framework for Backdoor Detection, Mitigation, and Fine-Tuning in DNNs. The first phase employs a combined strategy, Confidence-Weighted Scaled Prediction Consistency and Parameter-oriented Scaling Consistency (CW-SPC-PSC), to accurately detect poisoned data. In the second phase, we apply a three-step mitigation pipeline: initial training with a hybrid loss to break the association between the backdoor target label and the trigger pattern learned by the attacked model; targeted retraining on the detected clean samples to reinforce benign patterns; and backdoor unlearning on the detected poisoned data through adversarial gradient manipulation, so that the model forgets potential trigger patterns. The final phase introduces an improved teacher-student fine-tuning technique that uses Selective Neural Attention Distillation with entropy minimization to remove residual backdoor behavior from the model while preserving its performance on clean samples. Experimental results on standard benchmark datasets against three backdoor attacks show that our hybrid defense significantly reduces backdoor attack success rates while maintaining high classification accuracy, yielding an effective and generalizable defense for backdoor-compromised DNNs.
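The abstract does not spell out the CW-SPC-PSC detector, so the following is only a minimal PyTorch sketch of the scaled-prediction-consistency idea the first phase builds on, with a confidence-weighting term added as we read it. The scaling factors, the [0, 1] clamping, and the interpretation of the score are illustrative assumptions, and the parameter-oriented scaling consistency (PSC) component is omitted.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def cw_spc_score(model, x, scales=(2.0, 3.0, 4.0, 5.0)):
    """Confidence-weighted scaled prediction consistency (CW-SPC) score (sketch).

    Amplifies each input's pixel values by several factors and checks whether
    the model's prediction survives the scaling; backdoored samples tend to
    keep their (target-label) prediction under pixel amplification, so a high
    score flags a sample as suspicious.
    """
    model.eval()
    base_pred = model(x).argmax(dim=1)           # predictions on the raw batch
    score = torch.zeros(x.size(0), device=x.device)
    for s in scales:
        scaled = torch.clamp(x * s, 0.0, 1.0)    # pixel amplification, kept in [0, 1]
        probs = F.softmax(model(scaled), dim=1)
        conf, pred = probs.max(dim=1)
        score += conf * (pred == base_pred).float()  # confidence-weighted agreement
    return score / len(scales)                   # in [0, 1]; larger => more suspicious
```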
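Likewise, the hybrid unlearning loss of the second phase is not specified in the abstract. One common way to realize "backdoor unlearning through adversarial gradient manipulation" is to ascend on the loss of detected poisoned samples while descending on the loss of detected clean samples; the sketch below assumes that form, with `lam` a hypothetical weighting coefficient.

```python
import torch

def hybrid_unlearning_step(model, optimizer, clean_batch, poison_batch, lam=0.5):
    """One optimization step of an assumed hybrid objective: minimize
    cross-entropy on detected clean samples while maximizing it on detected
    poisoned samples, so the trigger-to-target association is gradually
    forgotten without sacrificing benign accuracy."""
    criterion = torch.nn.CrossEntropyLoss()
    xc, yc = clean_batch    # detected clean samples and their labels
    xp, yp = poison_batch   # detected poisoned samples and their (attacked) labels

    optimizer.zero_grad()
    loss_clean = criterion(model(xc), yc)        # reinforce benign patterns
    loss_poison = criterion(model(xp), yp)       # backdoor association to erase
    (loss_clean - lam * loss_poison).backward()  # gradient ascent on the poison term
    optimizer.step()
    return loss_clean.item(), loss_poison.item()
```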
