Abstract
Deep neural networks (DNNs) are vulnerable to adversarial examples, which are crafted by adding imperceptible perturbations to benign inputs. Notably, adversarial examples generated on white-box models often exhibit black-box transferability. Targeted attacks, which require fooling a model into predicting a specific target class, are more challenging than non-targeted attacks. A representative approach to targeted attacks is the Self-Universality (SU) method, which improves targeted transferability by enhancing the universality of adversarial perturbations. SU achieves this by maximizing the feature similarity between adversarially perturbed global images and randomly cropped local regions. However, because the pair of images used for the similarity calculation is drawn from the same domain, the naturally high similarity between local regions and global images diminishes the prominence of the dominant features introduced by the perturbations. This limitation compromises universality, ultimately reducing targeted transferability. To address this issue, we propose Style Augmentation Domain-Universality (SADU), a method that enhances perturbation universality across domain-augmented images of the same source image. Specifically, we apply style augmentation to the source domain images and mix them with generated images to create style domain images. We then introduce a feature similarity loss that maximizes the feature similarity between adversarially perturbed source domain images and style domain images, encouraging the learned perturbations to be more universal. This approach amplifies the dominance of features introduced by adversarial perturbations compared to SU, thereby improving perturbation universality and targeted transferability. Experiments on the ImageNet-Compatible dataset demonstrate the effectiveness of SADU, boosting the average targeted attack success rate from 25.6% to 36.8% compared to state-of-the-art methods.
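The feature similarity loss described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the linear mixing of style-augmented and generated images, and the use of cosine similarity over pooled feature vectors are all assumptions made for illustration.

```python
import numpy as np

def mix_style_domain(aug_img, gen_img, lam=0.5):
    """Hypothetical style-domain construction: linearly mix a
    style-augmented source image with a generated image."""
    return lam * aug_img + (1.0 - lam) * gen_img

def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two flattened feature vectors."""
    a, b = a.ravel(), b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def feature_similarity_loss(src_feats, style_feats):
    """Sketch of the SADU-style objective: negative mean cosine
    similarity between features of perturbed source-domain images
    and their style-domain counterparts. Minimizing this loss
    maximizes cross-domain feature similarity, encouraging the
    perturbation's features to dominate."""
    sims = [cosine_similarity(s, t) for s, t in zip(src_feats, style_feats)]
    return -float(np.mean(sims))

# Toy usage with random "features" standing in for a backbone's activations.
rng = np.random.default_rng(0)
src = [rng.standard_normal(128) for _ in range(4)]
style = [f + 0.1 * rng.standard_normal(128) for f in src]
loss = feature_similarity_loss(src, style)  # close to -1 for similar features
```

In an actual attack loop, `src_feats` and `style_feats` would come from an intermediate layer of the white-box model, and this loss would be combined with the targeted classification loss when updating the perturbation.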
