Visualization of big data security: a case study on the KDD99 cup data set

Abstract Cyber security has come to prominence in the modern technological era because a wide array of attacks often bypass untrained intrusion detection systems (IDSs). Consequently, greater attention has been directed toward finding better methods for identifying attack types so that IDSs can be trained more effectively. Key cyber-attack insights exist in big data; however, an efficient approach is required to determine which attack types are strong, so that IDSs can be trained to become more effective in key areas. Despite the growth in IDS research, few studies involve big data visualization, which is key to obtaining such insights. The KDD99 data set has served as a strong benchmark since 1999; therefore, we used this data set in our experiment. In this study, we used a hash algorithm, a weight table, and a sampling method to deal with the inherent problems of analyzing big data: volume, variety, and velocity. By applying a visualization algorithm, we gained insights into the KDD99 data set, clearly identifying “normal” clusters and describing distinct clusters of effective attacks.
