Bidirectional IndRNN malicious webpages detection algorithm based on convolutional neural network and attention mechanism

Abstract

This paper proposes CATTB, a parallel joint model that combines a convolutional neural network (CNN) with an attention mechanism and a bidirectional independently recurrent neural network (BiIndRNN) for malicious web page detection. The algorithm extracts the relocation feature and the "texture fingerprint" feature, which express the similarity of the binary file content of a malicious web page's URL (Uniform Resource Locator). It also uses the word vector tool word2vec to train URL word vector features and extract static URL vocabulary features. First, the CNN extracts deep local features. Second, the attention mechanism adjusts the weights and the BiIndRNN extracts global features. Finally, softmax is used for classification. Features are thus extracted from different angles and with different methods, which gives a more comprehensive representation. Experimental results show that the proposed CATTB algorithm achieves higher malicious web page detection accuracy than the results reported by other researchers and than the other algorithms it was compared with.
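The two building blocks named in the abstract, the bidirectional IndRNN and the attention weighting, can be illustrated with a minimal pure-Python sketch. This is not the authors' CATTB implementation: the function names, the `query` vector, and the omission of the input projection and the CNN stage are assumptions made for illustration. It uses the standard IndRNN recurrence, in which each hidden unit depends only on its own previous value through an element-wise recurrent weight `u`.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def indrnn_pass(inputs, u, reverse=False):
    """One IndRNN direction: h_t = relu(x_t + u * h_{t-1}), element-wise.

    Assumption: the input projection (W x_t + b) has already been applied,
    so each x_t has the same length as the hidden state.
    """
    steps = list(reversed(inputs)) if reverse else list(inputs)
    h = [0.0] * len(u)
    outputs = []
    for x in steps:
        h = [max(0.0, xi + ui * hi) for xi, ui, hi in zip(x, u, h)]
        outputs.append(h)
    if reverse:
        outputs.reverse()  # realign outputs with the original time order
    return outputs

def bi_indrnn(inputs, u_fwd, u_bwd):
    """Concatenate forward and backward hidden states at each time step."""
    fwd = indrnn_pass(inputs, u_fwd)
    bwd = indrnn_pass(inputs, u_bwd, reverse=True)
    return [f + b for f, b in zip(fwd, bwd)]

def attention_pool(states, query):
    """Weight each state by softmax(state . query) and sum them.

    `query` is a hypothetical learned vector, fixed here for illustration.
    """
    scores = [sum(s * q for s, q in zip(state, query)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    pooled = [sum(w * state[i] for w, state in zip(weights, states))
              for i in range(dim)]
    return pooled, weights

# Toy sequence of three 2-dimensional feature vectors.
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
states = bi_indrnn(sequence, u_fwd=[0.5, 0.5], u_bwd=[0.5, 0.5])
pooled, weights = attention_pool(states, query=[1.0, 1.0, 1.0, 1.0])
```

In a full model the pooled vector would be passed to a softmax classifier. Because the recurrent weight is element-wise rather than a full matrix, each hidden unit in an IndRNN layer evolves independently of the others.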

Keywords

Malicious webpages, convolutional neural network, attention mechanism, bidirectional independently recurrent neural network

