Formal Analysis and Redesign of a Neural Network-Based Aircraft Taxiing System with VerifAI

We demonstrate a unified approach to rigorous design of safety-critical autonomous systems using the VerifAI toolkit for formal analysis of AI-based systems. VerifAI provides an integrated toolchain for tasks spanning the design process, including modeling, falsification, debugging, and ML component retraining. We evaluate all of these applications in an industrial case study on an experimental autonomous aircraft taxiing system developed by Boeing, which uses a neural network to track the centerline of a runway. We define runway scenarios using the Scenic probabilistic programming language, and use them to drive tests in the X-Plane flight simulator. We first perform falsification, automatically finding environment conditions causing the system to violate its specification by deviating significantly from the centerline (or even leaving the runway entirely). Next, we use counterexample analysis to identify distinct failure cases, and confirm their root causes with specialized testing. Finally, we use the results of falsification and debugging to retrain the network, eliminating several failure cases and improving the overall performance of the closed-loop system.

[1]  Suman Jana,et al.  DeepTest: Automated Testing of Deep-Neural-Network-Driven Autonomous Cars , 2017, 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE).

[2]  Ron Koymans,et al.  Specifying real-time properties with metric temporal logic , 1990, Real-Time Systems.

[3]  J. Halton On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals , 1960 .

[4]  Sriram Sankaranarayanan,et al.  Falsification of temporal properties of hybrid systems using the cross-entropy method , 2012, HSCC '12.

[5]  Lih-Yuan Deng,et al.  The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning , 2006, Technometrics.

[6]  Jonathon Shlens,et al.  Explaining and Harnessing Adversarial Examples , 2014, ICLR.

[7]  George J. Pappas,et al.  Robustness of Temporal Logic Specifications , 2006, FATES/RV.

[8]  Alberto L. Sangiovanni-Vincentelli,et al.  Scenic: a language for scenario specification and scene generation , 2018, PLDI.

[9]  Somesh Jha,et al.  Semantic Adversarial Deep Learning , 2018, IEEE Design & Test.

[10]  Sanjit A. Seshia,et al.  VerifAI: A Toolkit for the Formal Design and Analysis of Artificial Intelligence-Based Systems , 2019, CAV.

[11]  Stuart J. Russell,et al.  Research Priorities for Robust and Beneficial Artificial Intelligence , 2015, AI Mag..

[12]  Sanjit A. Seshia,et al.  Towards Verified Artificial Intelligence , 2016, ArXiv.

[13]  Divya Gopinath,et al.  A Programmatic and Semantic Approach to Explaining and Debugging Neural Network Based Object Detectors , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Wes McKinney,et al.  Data Structures for Statistical Computing in Python , 2010, SciPy.

[15]  Alberto L. Sangiovanni-Vincentelli,et al.  Counterexample-Guided Data Augmentation , 2018, IJCAI.