Association rule-based malware classification using common subsequences of API calls

Abstract Emerging malware poses increasing challenges to detection systems as its variety and sophistication continue to grow. Malware developers use complex techniques to produce variants by removing, replacing, and adding useless API calls in the code; these changes are specifically designed to evade detection mechanisms without affecting the original functionality of the malicious code. In this work, a new algorithm based on the alignment of recurring subsequences, which exploits association rules, is proposed to infer malware behaviors. The proposed approach exploits the probabilities of transitioning between two API invocations in the call sequence and also considers their order in time, by extracting subsequences of API calls that are not necessarily consecutive and that are representative of the common malicious behaviors of specific subsets of malware. The resulting malware classification scheme, capable of operating in dynamic analysis scenarios in which API calls are traced at runtime, is inherently robust against evasion and obfuscation techniques based on perturbing the API call flow. It has been experimentally compared with two detectors based on Markov chains and on API call sequence alignment algorithms, which are among the most widely adopted approaches for malware classification. In this experimental assessment, the proposed approach showed excellent classification performance, outperforming both competitors.
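The two ingredients named in the abstract, transition probabilities between API invocations and common subsequences that need not be consecutive, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function names, the use of a plain longest-common-subsequence computation, and the API call traces are all assumptions chosen for illustration. It only shows why a non-consecutive subsequence can survive the insertion of useless calls while consecutive-pair transitions are disturbed.

```python
from collections import Counter


def longest_common_subsequence(a, b):
    """Return one longest common subsequence of a and b.

    Elements keep their relative order but need not be adjacent.
    """
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forward to rebuild one optimal subsequence.
    result, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


def transition_probabilities(trace):
    """Estimate P(next call = y | current call = x) from adjacent pairs."""
    pair_counts = Counter(zip(trace, trace[1:]))
    source_counts = Counter(trace[:-1])
    return {(x, y): c / source_counts[x] for (x, y), c in pair_counts.items()}


# Hypothetical traces: the variant pads the original with useless calls.
original = ["CreateFile", "WriteFile", "RegSetValue", "CreateProcess"]
variant = [
    "CreateFile", "GetTickCount", "WriteFile", "Sleep",
    "RegSetValue", "GetTickCount", "CreateProcess",
]

common = longest_common_subsequence(original, variant)
variant_transitions = transition_probabilities(variant)
```

In this toy case `common` equals `original`, because the padded trace still contains every original call in the same order, whereas the adjacent transition from `CreateFile` to `WriteFile` no longer appears in `variant_transitions`.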
