Scalable Robustness

Complex computational systems can suffer undetected faults that produce incorrect outputs. Error measures can be used to quantify how incorrect those outputs are and thereby to evaluate computational robustness. This paper offers an approach to assessing the worst-case scalable robustness (WCSR) of an algorithm paired with an error measure, as well as its i.i.d. average-case scalable robustness (ACSRiid). In a case study of four linearithmic and quadratic pairwise sorting algorithms subjected to faulty comparisons, we confirm that algorithm efficiency is inversely correlated with algorithm robustness and, more unexpectedly, that only round robin sort -- a quadratic algorithm rarely used in computing -- achieves both ACSRiid and WCSR.
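
To make the setting concrete, the following is a minimal sketch of sorting with faulty comparisons and then scoring the output with an error measure. It is illustrative only and is not taken from the paper: the fault model (each comparison flips independently with probability p), the round robin formulation (compare every pair once and order items by number of wins), the fault-free final placement step, and the positional error measure are all assumptions, as are the names faulty_less, round_robin_sort, and positional_error.

```python
import random


def faulty_less(a, b, p, rng):
    """Return a < b, but flip the answer with probability p (assumed i.i.d. fault model)."""
    truth = a < b
    return (not truth) if rng.random() < p else truth


def round_robin_sort(items, p, rng):
    """Compare every pair once, count wins per item, and order items by win count.

    Assumption: only the pairwise comparisons are faulty; the final placement
    by win count is carried out without faults.
    """
    n = len(items)
    wins = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if faulty_less(items[i], items[j], p, rng):
                wins[j] += 1  # items[j] was judged the larger of the pair
            else:
                wins[i] += 1  # items[i] was judged the larger of the pair
    order = sorted(range(n), key=lambda k: wins[k])
    return [items[k] for k in order]


def positional_error(output):
    """Total displacement of a permutation of 0..n-1 from sorted order (hypothetical error measure)."""
    return sum(abs(pos - value) for pos, value in enumerate(output))


if __name__ == "__main__":
    rng = random.Random(0)
    data = list(range(50))
    rng.shuffle(data)
    for p in (0.0, 0.05, 0.2):
        result = round_robin_sort(data, p, rng)
        print(p, positional_error(result))
```

Running the same harness over several input sizes and several sorting algorithms would give a rough empirical picture of how error grows with scale under this assumed fault model, which is the kind of question the robustness measures in the abstract address.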
