Coverage-Based Greybox Fuzzing as Markov Chain

Coverage-based Greybox Fuzzing (CGF) is a random testing approach that requires no program analysis. A new test is generated by slightly mutating a seed input. If the test exercises a new and interesting path, it is added to the set of seeds; otherwise, it is discarded. We observe that most tests exercise the same few “high-frequency” paths and develop strategies to explore significantly more paths with the same number of tests by gravitating towards low-frequency paths. We explain the challenges and opportunities of CGF using a Markov chain model which specifies the probability that fuzzing the seed that exercises path $i$ i generates an input that exercises path $j$ j . Each state (i.e., seed) has an energy that specifies the number of inputs to be generated from that seed. We show that CGF is considerably more efficient if energy is inversely proportional to the density of the stationary distribution and increases monotonically every time that seed is chosen. Energy is controlled with a power schedule. We implemented several schedules by extending AFL. In 24 hours, AFLFast exposes 3 previously unreported CVEs that are not exposed by AFL and exposes 6 previously unreported CVEs 7x faster than AFL. AFLFast produces at least an order of magnitude more unique crashes than AFL. We compared AFLFast to the symbolic executor Klee. In terms of vulnerability detection, AFLFast is significantly more effective than Klee on the same subject programs that were discussed in the original Klee paper. In terms of code coverage, AFLFast only slightly outperforms Klee while a combination of both tools achieves best results by mitigating the individual weaknesses.

[1]  Guofei Gu,et al.  TaintScope: A Checksum-Aware Directed Fuzzing Tool for Automatic Software Vulnerability Detection , 2010, 2010 IEEE Symposium on Security and Privacy.

[2]  Abhik Roychoudhury,et al.  Directed Greybox Fuzzing , 2017, CCS.

[3]  Bruno C. d. S. Oliveira,et al.  Regression tests to expose change interaction errors , 2013, ESEC/FSE 2013.

[4]  David Brumley,et al.  Optimizing Seed Selection for Fuzzing , 2014, USENIX Security Symposium.

[5]  Dawson R. Engler,et al.  KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs , 2008, OSDI.

[6]  Fabrice Bellard,et al.  QEMU, a Fast and Portable Dynamic Translator , 2005, USENIX Annual Technical Conference, FREENIX Track.

[7]  Martin C. Rinard,et al.  Taint-based directed whitebox fuzzing , 2009, 2009 IEEE 31st International Conference on Software Engineering.

[8]  David Brumley,et al.  Scheduling black-box mutational fuzzing , 2013, CCS.

[9]  Abhik Roychoudhury,et al.  Model-based whitebox fuzzing for program binaries , 2016, 2016 31st IEEE/ACM International Conference on Automated Software Engineering (ASE).

[10]  Derek Bruening,et al.  AddressSanitizer: A Fast Address Sanity Checker , 2012, USENIX Annual Technical Conference.

[11]  Barton P. Miller,et al.  An empirical study of the reliability of UNIX utilities , 1990, Commun. ACM.

[12]  Matthew B. Dwyer,et al.  Probabilistic symbolic execution , 2012, ISSTA 2012.

[13]  Matthew B. Dwyer,et al.  On the Techniques We Create, the Tools We Build, and Their Misalignments: A Study of KLEE , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[14]  Brian S. Pak,et al.  Hybrid Fuzz Testing: Discovering Software Bugs via Fuzzing and Symbolic Execution , 2012 .

[15]  George Candea,et al.  S2E: a platform for in-vivo multi-path analysis of software systems , 2011, ASPLOS XVI.

[16]  Zhendong Su,et al.  Coverage-directed differential testing of JVM implementations , 2016, PLDI.

[17]  David Brumley,et al.  Program-Adaptive Mutational Fuzzing , 2015, 2015 IEEE Symposium on Security and Privacy.

[18]  Christopher Krügel,et al.  Driller: Augmenting Fuzzing Through Selective Symbolic Execution , 2016, NDSS.

[19]  Patrice Godefroid,et al.  SAGE: Whitebox Fuzzing for Security Testing , 2012, ACM Queue.

[20]  Abhik Roychoudhury,et al.  Coverage-Based Greybox Fuzzing as Markov Chain , 2017, IEEE Trans. Software Eng..

[21]  Ryan Cunningham,et al.  Automated Vulnerability Analysis: Leveraging Control Flow for Evolutionary Input Crafting , 2007, Twenty-Third Annual Computer Security Applications Conference (ACSAC 2007).

[22]  C. D. Gelatt,et al.  Optimization by Simulated Annealing , 1983, Science.

[23]  Soumya Paul,et al.  A Probabilistic Analysis of the Efficiency of Automated Software Testing , 2016, IEEE Transactions on Software Engineering.

[24]  Sergey Brin,et al.  The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.

[25]  П. Довгалюк,et al.  Два способа организации механизма полносистемного детерминированного воспроизведения в симуляторе QEMU , 2012 .