Effective classification of android malware families through dynamic features and neural networks

Due to their open nature and popularity, Android-based devices have attracted many end users around the world and are one of the main targets for attackers. For these reasons, it is necessary to build tools that can reliably detect zero-day malware on these devices. Many of the frameworks proposed so far to detect malicious applications leverage Machine Learning (ML) techniques. However, building such frameworks requires very large and sophisticated datasets for model construction and training. Their success also depends strongly on choosing the right features, so that the resulting classification model has adequate generalisation capability. Furthermore, creating a training dataset that faithfully represents malware properties and behaviour is one of the most critical challenges in malware analysis. The main aim of this paper is therefore to propose a new dataset, the Unisa Malware Dataset (UMD), available at http://antlab.di.unisa.it/malware/, which is based on the extraction of static and dynamic features characterising malware activity. Additionally, we present experiments with common ML tools showing how efficient ML-based malware classification frameworks can be built using the proposed dataset.
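The abstract does not specify UMD's feature schema or the exact models used in the experiments. The following sketch is therefore only an illustration, under stated assumptions, of the general idea of combining static and dynamic features for family classification. It assumes hypothetical vocabularies of requested permissions (static features) and system calls observed at runtime (dynamic features), plus invented toy samples and family names. Each app becomes one concatenated vector, which is fed to a multinomial logistic-regression classifier (a single-layer neural network) trained with stochastic gradient descent on cross-entropy. This is a swapped-in technique for illustration, not the authors' pipeline.

```python
import math
from collections import Counter

# Hypothetical feature vocabularies for illustration only; this is NOT the UMD schema.
PERMISSIONS = ["SEND_SMS", "READ_CONTACTS", "INTERNET"]      # static features
SYSCALLS = ["open", "read", "write", "connect", "sendto"]   # dynamic features
FAMILIES = ["sms_fraud", "spyware"]                         # hypothetical family labels


def featurise(permissions, syscall_trace):
    """Concatenate binary permission flags with relative system-call frequencies."""
    static = [1.0 if p in permissions else 0.0 for p in PERMISSIONS]
    counts = Counter(syscall_trace)
    total = max(len(syscall_trace), 1)  # avoid division by zero on an empty trace
    dynamic = [counts[s] / total for s in SYSCALLS]
    return static + dynamic


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def logits(W, b, x):
    return [sum(w * xi for w, xi in zip(W[k], x)) + b[k] for k in range(len(W))]


def train(X, y, n_classes, lr=0.5, epochs=300):
    """Multinomial logistic regression (single-layer network) trained with SGD on cross-entropy."""
    d = len(X[0])
    W = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, label in zip(X, y):
            p = softmax(logits(W, b, x))  # probabilities computed before any update
            for k in range(n_classes):
                g = p[k] - (1.0 if k == label else 0.0)  # d(loss)/d(logit_k)
                for j in range(d):
                    W[k][j] -= lr * g * x[j]
                b[k] -= lr * g
    return W, b


def predict(W, b, x):
    scores = logits(W, b, x)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy, invented samples: (requested permissions, system-call trace, family index).
samples = [
    ({"SEND_SMS"}, ["open", "write", "write", "read"], 0),
    ({"SEND_SMS"}, ["write", "write", "open"], 0),
    ({"READ_CONTACTS", "INTERNET"}, ["read", "connect", "sendto", "sendto"], 1),
    ({"INTERNET", "READ_CONTACTS"}, ["connect", "sendto", "read"], 1),
]
X = [featurise(perms, trace) for perms, trace, _ in samples]
y = [label for _, _, label in samples]
W, b = train(X, y, n_classes=len(FAMILIES))

unknown = featurise({"SEND_SMS"}, ["write", "open", "write"])
print(FAMILIES[predict(W, b, unknown)])  # expected: sms_fraud
```

Concatenating the two views lets a single model weigh a permission request against observed runtime behaviour. A realistic setup would use far larger vocabularies, a held-out evaluation split, and a deeper network.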
