Smart Greybox Fuzzing

Coverage-based greybox fuzzing (CGF) is one of the most successful methods for automated vulnerability detection. Given a seed file (as a sequence of bits), CGF randomly flips, deletes or bits to generate new files. CGF iteratively constructs (and fuzzes) a seed corpus by retaining those generated files which enhance coverage. However, random bitflips are unlikely to produce valid files (or valid chunks in files), for applications processing complex file formats. In this work, we introduce smart greybox fuzzing (SGF) which leverages a high-level structural representation of the seed file to generate new files. We define innovative mutation operators that work on the virtual file structure rather than on the bit level which allows SGF to explore completely new input domains while maintaining file validity. We introduce a novel validity-based power schedule that enables SGF to spend more time generating files that are more likely to pass the parsing stage of the program, which can expose vulnerabilities much deeper in the processing logic. Our evaluation demonstrates the effectiveness of SGF. On several libraries that parse structurally complex files, our tool AFLSmart explores substantially more paths (up to 200%) and exposes more vulnerabilities than baseline AFL. Our tool AFLSmart has discovered 42 zero-day vulnerabilities in widely-used, well-tested tools and libraries; so far 17 CVEs were assigned.

[1]  Christopher Krügel,et al.  Driller: Augmenting Fuzzing Through Selective Symbolic Execution , 2016, NDSS.

[2]  Chao Zhang,et al.  CollAFL: Path Sensitive Fuzzing , 2018, 2018 IEEE Symposium on Security and Privacy (SP).

[3]  Dawson R. Engler,et al.  KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs , 2008, OSDI.

[4]  Rishabh Singh,et al.  Learn&Fuzz: Machine learning for input fuzzing , 2017, 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE).

[5]  Lionel C. Briand,et al.  A Hitchhiker's guide to statistical tests for assessing randomized algorithms in software engineering , 2014, Softw. Test. Verification Reliab..

[6]  Patrice Godefroid,et al.  SAGE: Whitebox Fuzzing for Security Testing , 2012, ACM Queue.

[7]  Herbert Bos,et al.  VUzzer: Application-aware Evolutionary Fuzzing , 2017, NDSS.

[8]  Andreas Zeller,et al.  Parser-directed fuzzing , 2019, PLDI.

[9]  Mathias Payer,et al.  T-Fuzz: Fuzzing by Program Transformation , 2018, 2018 IEEE Symposium on Security and Privacy (SP).

[10]  George Candea,et al.  S2E: a platform for in-vivo multi-path analysis of software systems , 2011, ASPLOS XVI.

[11]  Angelos D. Keromytis,et al.  SlowFuzz: Automated Domain-Independent Detection of Algorithmic Complexity Vulnerabilities , 2017, CCS.

[12]  Xiangyu Zhang,et al.  SLF: Fuzzing without Valid Seed Inputs , 2019, 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE).

[13]  Andreas Zeller,et al.  Fuzzing with Code Fragments , 2012, USENIX Security Symposium.

[14]  Yang Liu,et al.  Steelix: program-state based binary fuzzing , 2017, ESEC/SIGSOFT FSE.

[15]  Yang Liu,et al.  Superion: Grammar-Aware Greybox Fuzzing , 2018, 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE).

[16]  Hao Chen,et al.  Angora: Efficient Fuzzing by Principled Search , 2018, 2018 IEEE Symposium on Security and Privacy (SP).

[17]  Dongmei Zhang,et al.  ReBucket: A method for clustering duplicate crash reports based on call stack similarity , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[18]  David Brumley,et al.  Program-Adaptive Mutational Fuzzing , 2015, 2015 IEEE Symposium on Security and Privacy.

[19]  Xiangyu Zhang,et al.  Deriving input syntactic structure from execution , 2008, SIGSOFT '08/FSE-16.

[20]  Abhik Roychoudhury,et al.  Model-based whitebox fuzzing for program binaries , 2016, 2016 31st IEEE/ACM International Conference on Automated Software Engineering (ASE).

[21]  Koushik Sen,et al.  FairFuzz: A Targeted Mutation Strategy for Increasing Greybox Fuzz Testing Coverage , 2018, 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE).

[22]  Adam Kiezun,et al.  Grammar-based whitebox fuzzing , 2008, PLDI '08.

[23]  William K. Robertson,et al.  LAVA: Large-Scale Automated Vulnerability Addition , 2016, 2016 IEEE Symposium on Security and Privacy (SP).

[24]  Choongwoo Han,et al.  The Art, Science, and Engineering of Fuzzing: A Survey , 2018, IEEE Transactions on Software Engineering.

[25]  Yves Le Traon,et al.  Semantic fuzzing with zest , 2018, ISSTA.

[26]  Xiangyu Zhang,et al.  ProFuzzer: On-the-fly Input Type Probing for Better Zero-Day Vulnerability Discovery , 2019, 2019 IEEE Symposium on Security and Privacy (SP).

[27]  Soumya Paul,et al.  A Probabilistic Analysis of the Efficiency of Automated Software Testing , 2016, IEEE Transactions on Software Engineering.

[28]  Marcel Bohme,et al.  STADS: Software Testing as Species Discovery , 2018, 1803.02130.

[29]  Yves Le Traon,et al.  Validity Fuzzing and Parametric Generators for Effective Random Testing , 2018, 2019 IEEE/ACM 41st International Conference on Software Engineering: Companion Proceedings (ICSE-Companion).

[30]  Abhik Roychoudhury,et al.  Directed Greybox Fuzzing , 2017, CCS.

[31]  Paul Walton Purdom,et al.  A sentence generator for testing parsers , 1972 .

[32]  David Brumley,et al.  All You Ever Wanted to Know about Dynamic Taint Analysis and Forward Symbolic Execution (but Might Have Been Afraid to Ask) , 2010, 2010 IEEE Symposium on Security and Privacy.

[33]  Marcel Böhme,et al.  Assurances in Software Testing: A Roadmap , 2018, 2019 IEEE/ACM 41st International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER).

[34]  Koushik Sen,et al.  FairFuzz: Targeting Rare Branches to Rapidly Increase Greybox Fuzz Testing Coverage , 2017, ArXiv.

[35]  Abhik Roychoudhury,et al.  Coverage-Based Greybox Fuzzing as Markov Chain , 2016, IEEE Transactions on Software Engineering.

[36]  Ahmad-Reza Sadeghi,et al.  NAUTILUS: Fishing for Deep Bugs with Grammars , 2019, NDSS.

[37]  Andreas Zeller,et al.  Mining input grammars from dynamic taints , 2016, 2016 31st IEEE/ACM International Conference on Automated Software Engineering (ASE).

[38]  Derek Bruening,et al.  An infrastructure for adaptive dynamic optimization , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003..