Neural network Trojan

Abstract

This paper presents a proof of concept of a neural network Trojan: a neural network that has been trained with a compromised dataset and modified code. The Trojan is implemented by inserting a malicious payload, encoded into the weights, alongside the data of the intended application. It is designed so that feeding a specific input into the trained neural network triggers the interpretation of the data as payload. The paper gives the background on which the attack is based and states the assumptions that make it possible. Two embodiments of the attack are presented: a basic backpropagation network and a Neural Network Trojan with Sequence Processing Connections (NNTSPC). The choice between the two depends on the circumstances under which the compromise is launched. Experiments are carried out with synthetic payloads as well as a chosen existing binary payload. Practical issues of the attack and detection techniques are also discussed.

Keywords

Neural network Trojan, malware, artificial intelligence, machine learning
