The Holy Grail of: Teaming humans and machine learning for detecting cyber threats

Although there is a large corpus of research on using machine learning to detect cyber threats, the proposed solutions are rarely adopted in practice. In this paper, we discuss the challenges that currently limit the adoption of machine learning in security operations, with a special focus on label acquisition, model deployment, and the integration of model findings into existing investigation workflows. Moreover, we posit that the conventional approach to developing machine learning models, whereby researchers work offline on representative datasets to build accurate models, is not valid for many cybersecurity use cases. Instead, a different approach is needed: integrating the creation and maintenance of machine learning models into security operations themselves.
