Clustering Android ransomware families using fuzzy hashing similarities

Abstract

Android device usage has grown significantly in recent years, and so has the number of malicious applications targeting the Android ecosystem. Security researchers have therefore treated Android malware analysis as an emerging issue. Existing methods combine static, dynamic, or hybrid analysis with Machine Learning (ML) algorithms to detect malware and classify it into families. Members of a family often share similarities with each other and with members of other families. This paper presents a new method that combines Fuzzy Hashing and Natural Language Processing (NLP) techniques to find Android malware families based on these similarities: the samples are reverse engineered to extract features, and fuzzy hashes are computed over the preprocessed code. The resulting similarity relationships allow us to identify families according to their features. The method was evaluated on a test dataset of 2,288 samples from diverse ransomware families and achieved an accuracy of up to 98.46% in classifying Android ransomware.
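The abstract does not spell out the pipeline, so the following is only a minimal sketch of the general idea of grouping samples by similarity of their preprocessed code. It assumes the code has already been decompiled to text. The token-shingle digest is a simple stand-in for a real fuzzy hash (it is not ssdeep or sdhash), and the function names, the 0.5 threshold, and the toy samples are all hypothetical.

```python
import hashlib
import re
from itertools import combinations


def shingle_digest(code: str, k: int = 3) -> frozenset:
    """Stand-in similarity digest: the set of hashed k-token shingles."""
    tokens = re.findall(r"[\w/$]+", code.lower())
    windows = range(max(len(tokens) - k + 1, 1))
    shingles = (" ".join(tokens[i:i + k]) for i in windows)
    return frozenset(hashlib.md5(s.encode()).hexdigest() for s in shingles)


def similarity(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity between two digests, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(samples: dict, threshold: float = 0.5) -> list:
    """Single-link grouping: samples whose similarity meets the threshold are merged."""
    digests = {name: shingle_digest(code) for name, code in samples.items()}
    parent = {name: name for name in samples}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(samples, 2):
        if similarity(digests[a], digests[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in samples:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Toy, hand-written smali-like snippets (hypothetical, not real malware code).
samples = {
    "locker_a": "invoke-virtual Landroid/app/admin/DevicePolicyManager lockNow "
                "const-string pay ransom invoke-static Ljavax/crypto/Cipher getInstance",
    "locker_b": "invoke-virtual Landroid/app/admin/DevicePolicyManager lockNow "
                "const-string pay ransom invoke-static Ljavax/crypto/Cipher getInstance "
                "move-result-object",
    "crypto_a": "new-instance Ljava/io/File listFiles invoke-virtual "
                "Ljavax/crypto/CipherOutputStream write rename encrypted",
}
families = cluster(samples)
```

On these toy inputs the two locker snippets land in one group and the file-encrypting snippet in another. A real evaluation would replace the stand-in digest with an actual fuzzy hashing tool and tune the threshold on labeled data.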

Keywords

Android malware analysis, Android ransomware, cybersecurity, fuzzy hashing, natural language processing

