QSYM : A Practical Concolic Execution Engine Tailored for Hybrid Fuzzing

Recently, hybrid fuzzing has been proposed to address the limitations of fuzzing and concolic execution by combining both approaches. The hybrid approach has shown its effectiveness in various synthetic benchmarks such as DARPA Cyber Grand Challenge (CGC) binaries, but it still suffers from scaling to find bugs in complex, realworld software. We observed that the performance bottleneck of the existing concolic executor is the main limiting factor for its adoption beyond a small-scale study. To overcome this problem, we design a fast concolic execution engine, called QSYM, to support hybrid fuzzing. The key idea is to tightly integrate the symbolic emulation with the native execution using dynamic binary translation, making it possible to implement more fine-grained, so faster, instruction-level symbolic emulation. Additionally, QSYM loosens the strict soundness requirements of conventional concolic executors for better performance, yet takes advantage of a faster fuzzer for validation, providing unprecedented opportunities for performance optimizations, e.g., optimistically solving constraints and pruning uninteresting basic blocks. Our evaluation shows that QSYM does not just outperform state-of-the-art fuzzers (i.e., found 14× more bugs than VUzzer in the LAVA-M dataset, and outperformed Driller in 104 binaries out of 126), but also found 13 previously unknown security bugs in eight real-world programs like Dropbox Lepton, ffmpeg, and OpenJPEG, which have already been intensively tested by the state-of-the-art fuzzers, AFL and OSS-Fuzz.

[1]  Angelos D. Keromytis,et al.  A General Approach for Efficiently Accelerating Software-based Dynamic Data Flow Tracking on Commodity Hardware , 2012, NDSS.

[2]  Rishabh Singh,et al.  Not all bytes are equal: Neural byte sieve for fuzzing , 2017, ArXiv.

[3]  Christopher Krügel,et al.  Driller: Augmenting Fuzzing Through Selective Symbolic Execution , 2016, NDSS.

[4]  Yang Liu,et al.  Steelix: program-state based binary fuzzing , 2017, ESEC/SIGSOFT FSE.

[5]  Abhik Roychoudhury,et al.  Coverage-Based Greybox Fuzzing as Markov Chain , 2017, IEEE Trans. Software Eng..

[6]  Patrice Godefroid,et al.  Automated Whitebox Fuzz Testing , 2008, NDSS.

[7]  Marcelo d'Amorim,et al.  A Comparative Study of Incremental Constraint Solving Approaches in Symbolic Execution , 2014, Haifa Verification Conference.

[8]  Koushik Sen,et al.  CUTE: a concolic unit testing engine for C , 2005, ESEC/FSE-13.

[9]  Chao Zhang,et al.  CollAFL: Path Sensitive Fuzzing , 2018, 2018 IEEE Symposium on Security and Privacy (SP).

[10]  Alexander Aiken,et al.  Stratified synthesis: automatically learning the x86-64 instruction set , 2016, PLDI.

[11]  Rick Chen,et al.  End-to-End Verification of Processors with ISA-Formal , 2016, CAV.

[12]  Koushik Sen,et al.  FairFuzz: Targeting Rare Branches to Rapidly Increase Greybox Fuzz Testing Coverage , 2017, ArXiv.

[13]  Brian S. Pak,et al.  Hybrid Fuzz Testing: Discovering Software Bugs via Fuzzing and Symbolic Execution , 2012 .

[14]  George Candea,et al.  S2E: a platform for in-vivo multi-path analysis of software systems , 2011, ASPLOS XVI.

[15]  Herbert Bos,et al.  VUzzer: Application-aware Evolutionary Fuzzing , 2017, NDSS.

[16]  Koushik Sen DART: Directed Automated Random Testing , 2009, Haifa Verification Conference.

[17]  Nicholas Nethercote,et al.  Valgrind: a framework for heavyweight dynamic binary instrumentation , 2007, PLDI '07.

[18]  William K. Robertson,et al.  LAVA: Large-Scale Automated Vulnerability Addition , 2016, 2016 IEEE Symposium on Security and Privacy (SP).

[19]  Herbert Bos,et al.  Dowsing for Overflows: A Guided Fuzzer to Find Buffer Boundary Violations , 2013, USENIX Security Symposium.

[20]  Mathias Payer,et al.  T-Fuzz: Fuzzing by Program Transformation , 2018, 2018 IEEE Symposium on Security and Privacy (SP).

[21]  Jean-Yves Marion,et al.  Specification of concretization and symbolization policies in symbolic execution , 2016, ISSTA.

[22]  Dawson R. Engler,et al.  KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs , 2008, OSDI.

[23]  Joe Hendrix,et al.  Bounded Integer Linear Constraint Solving via Lattice Search , 2015 .

[24]  David Brumley,et al.  Unleashing Mayhem on Binary Code , 2012, 2012 IEEE Symposium on Security and Privacy.

[25]  Harish Patil,et al.  Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[26]  Patrice Godefroid,et al.  Billions and billions of constraints: Whitebox fuzz testing in production , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[27]  Christopher Krügel,et al.  SOK: (State of) The Art of War: Offensive Techniques in Binary Analysis , 2016, 2016 IEEE Symposium on Security and Privacy (SP).

[28]  David Brumley,et al.  Your Exploit is Mine: Automatic Shellcode Transplant for Remote Exploits , 2017, 2017 IEEE Symposium on Security and Privacy (SP).

[29]  Dawson R. Engler,et al.  EXE: Automatically Generating Inputs of Death , 2008, TSEC.

[30]  Hao Chen,et al.  Angora: Efficient Fuzzing by Principled Search , 2018, 2018 IEEE Symposium on Security and Privacy (SP).

[31]  Rupak Majumdar,et al.  Hybrid Concolic Testing , 2007, 29th International Conference on Software Engineering (ICSE'07).

[32]  Stephen McCamant,et al.  Path-exploration lifting: hi-fi tests for lo-fi emulators , 2012, ASPLOS XVII.