KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs

We present a new symbolic execution tool, KLEE, capable of automatically generating tests that achieve high coverage on a diverse set of complex and environmentally-intensive programs. We used KLEE to thoroughly check all 89 stand-alone programs in the GNU COREUTILS utility suite, which form the core user-level environment installed on millions of Unix systems, and arguably are the single most heavily tested set of open-source programs in existence. KLEE-generated tests achieve high line coverage -- on average over 90% per tool (median: over 94%) -- and significantly beat the coverage of the developers' own hand-written test suites. When we did the same for 75 equivalent tools in the BUSYBOX embedded system suite, results were even better, including 100% coverage on 31 of them. We also used KLEE as a bug finding tool, applying it to 452 applications (over 430K total lines of code), where it found 56 serious bugs, including three in COREUTILS that had been missed for over 15 years. Finally, we used KLEE to crosscheck purportedly identical BUSYBOX and COREUTILS utilities, finding functional correctness errors and a myriad of inconsistencies.
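To make the crosschecking idea concrete, the sketch below shows the usual KLEE usage pattern in C: inputs are marked symbolic with klee_make_symbolic, and an ordinary assert() states that two supposedly equivalent implementations agree on every input. The two count routines are hypothetical stand-ins for a matching pair of BUSYBOX/COREUTILS functions, not code from the paper; this is a minimal sketch of the technique under those assumptions, not KLEE's own test driver.

    /* Minimal crosschecking sketch, assuming KLEE's klee.h header is available. */
    #include <assert.h>
    #include <klee/klee.h>

    /* Hypothetical implementation A: count bytes in buf equal to c. */
    static int count_a(const char *buf, unsigned len, char c) {
      int n = 0;
      for (unsigned i = 0; i < len; i++)
        if (buf[i] == c)
          n++;
      return n;
    }

    /* Hypothetical implementation B of the same function. */
    static int count_b(const char *buf, unsigned len, char c) {
      int n = 0;
      const char *end = buf + len;
      while (buf != end)
        n += (*buf++ == c);
      return n;
    }

    int main(void) {
      char buf[8];
      char c;

      /* Mark the inputs symbolic; KLEE explores paths over all their values. */
      klee_make_symbolic(buf, sizeof(buf), "buf");
      klee_make_symbolic(&c, sizeof(c), "c");

      /* Crosscheck: any input on which the two implementations disagree
       * fails this assertion, and KLEE emits a concrete test case that
       * reproduces the divergence. */
      assert(count_a(buf, sizeof(buf), c) == count_b(buf, sizeof(buf), c));
      return 0;
    }

Compiled to LLVM bitcode and run under klee, each explored path ends in a concrete test case; an input on which the two routines diverge surfaces as an assertion failure together with the generated test that reproduces it.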
