Abstract
Consumer electronic devices with voice assistants are becoming increasingly popular in modern smart homes. However, directly uploading unprocessed speech data, which may contain sensitive attributes, to a cloud server poses a significant risk to user privacy. To address this issue, this paper proposes PSEGAN, a privacy-enhancing model based on generative adversarial networks (GANs) that protects the emotional attributes of speech. The model aims to prevent the inference of emotional attributes while maintaining the accuracy and utility of speech features. PSEGAN comprises three modules: (1) a pre-trained speaker matcher, which imposes generative constraints during training so that the generated speech retains the information needed for speaker recognition; (2) attribute adversarial networks, which generate perturbed speech that transforms emotional attributes while preserving the utility of the speech; and (3) gated recurrent networks (GRN), which handle the long- and short-term dependencies of speech signals. PSEGAN thereby addresses the utility loss of traditional GAN-based speech privacy preservation methods. Experimental results on the RAVDESS dataset show that PSEGAN reduces emotion recognition accuracy by 80.7%, while speaker recognition accuracy decreases by only 1.1%. These findings demonstrate that PSEGAN effectively mitigates the leakage of emotional attributes while maintaining high utility.
