NEZHA: Efficient Domain-Independent Differential Testing

Differential testing uses similar programs as cross-referencing oracles to find semantic bugs that do not exhibit explicit erroneous behaviors like crashes or assertion failures. Unfortunately, existing differential testing tools are domain-specific and inefficient, requiring large numbers of test inputs to find a single bug. In this paper, we address these issues by designing and implementing NEZHA, an efficient input-format-agnostic differential testing framework. The key insight behind NEZHA's design is that current tools generate inputs by simply borrowing techniques designed for finding crash or memory corruption bugs in individual programs (e.g., maximizing code coverage). By contrast, NEZHA exploits the behavioral asymmetries between multiple test programs to focus on inputs that are more likely to trigger semantic bugs. We introduce the notion of δ-diversity, which summarizes the observed asymmetries between the behaviors of multiple test applications. Based on δ-diversity, we design two efficient domain-independent input generation mechanisms for differential testing, one gray-box and one black-box. We demonstrate that both of these input generation schemes are significantly more efficient than existing tools at finding semantic bugs in real-world, complex software. NEZHA's average rate of finding differences is 52 times and 27 times higher than that of Frankencerts and Mucerts, respectively, two popular domain-specific differential testing tools that check SSL/TLS certificate validation implementations. Moreover, differential testing with NEZHA finds 6 times more semantic bugs per tested input than adapting state-of-the-art general-purpose fuzzers like American Fuzzy Lop (AFL) to differential testing by running them on individual test programs for input generation.
NEZHA discovered 778 unique, previously unknown discrepancies across a wide variety of applications (ELF and XZ parsers, PDF viewers, and SSL/TLS libraries), many of which constitute critical security vulnerabilities. In particular, we found two critical evasion attacks against ClamAV that allow arbitrary malicious ELF/XZ files to evade detection. The discrepancies NEZHA found in the X.509 certificate validation implementations of the tested SSL/TLS libraries range from mishandling of certain types of KeyUsage extensions to incorrect acceptance of specially crafted expired certificates, which enables man-in-the-middle attacks. All of our reported vulnerabilities were confirmed and fixed within a week of being reported.
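
As a rough illustration of the black-box idea, the following Python sketch runs two toy validators on mutated inputs and keeps an input in the corpus whenever it produces a tuple of outputs that has not been seen before. This is only one simplified reading of output-based guidance, not NEZHA's actual implementation. The validators `parse_strict` and `parse_lenient`, the `mutate` helper, and the `differential_fuzz` driver are all hypothetical names introduced for this example.

```python
import random


def parse_strict(data: bytes) -> int:
    """Toy validator (hypothetical): accept (0) only b'OK'-prefixed inputs of at most 4 bytes."""
    return 0 if data.startswith(b"OK") and len(data) <= 4 else 1


def parse_lenient(data: bytes) -> int:
    """Toy validator (hypothetical) with a seeded semantic bug: the length check is missing."""
    return 0 if data.startswith(b"OK") else 1


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Apply one random byte-level mutation: overwrite, insert, or delete a byte."""
    buf = bytearray(data)
    choice = rng.randrange(3)
    if choice == 0 and buf:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    elif choice == 1:
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    elif buf:
        del buf[rng.randrange(len(buf))]
    return bytes(buf)


def differential_fuzz(programs, seeds, iterations, rng_seed=0):
    """Mutate corpus inputs, run every program on each candidate, and
    (a) keep candidates that yield a previously unseen output tuple,
    (b) report candidates on which the programs disagree."""
    rng = random.Random(rng_seed)
    corpus = list(seeds)
    seen_tuples = {tuple(p(s) for p in programs) for s in seeds}
    discrepancies = []
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        outputs = tuple(p(candidate) for p in programs)
        if outputs not in seen_tuples:
            # New combination of behaviors across programs: keep as a future seed.
            seen_tuples.add(outputs)
            corpus.append(candidate)
        if len(set(outputs)) > 1:
            # The programs disagree, so at least one of them may be wrong.
            discrepancies.append((candidate, outputs))
    return corpus, discrepancies
```

Calling `differential_fuzz([parse_strict, parse_lenient], [b"OK!!"], 1000)` returns the grown corpus and a list of `(input, outputs)` pairs on which the two validators disagree. In this toy setup, the disagreements are inputs longer than 4 bytes that the lenient validator accepts. The corpus grows only when the combination of outputs is new, rather than when a single program's coverage increases, which is the asymmetry-driven intuition the abstract describes.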
