Experience Mining Google's Production Console Logs

We describe our early experience in applying our console log mining techniques [19, 20] to logs from production Google systems with thousands of nodes. This data set is five orders of magnitude larger and contains almost 20 times as many message types as the Hadoop data set we used in [19]. It also has many properties that are unique to large-scale production deployments (e.g., the system stays on for several months, and multiple versions of the software can run concurrently). Our early experience shows that our techniques, including source-code-based log parsing, state- and sequence-based feature creation, and problem detection, work well on this production data set. We also discuss our experience in using our log parser to assist log sanitization.

[1]  Si-zhao Joe Qin, et al. Multi-dimensional Fault Diagnosis Using a Subspace Approach, 1997.

[2]  Risto Vaarandi, et al. A Breadth-First Algorithm for Mining Frequent Patterns from Event Logs, 2004, INTELLCOMM.

[3]  Randy H. Katz, et al. X-Trace: A Pervasive Network Tracing Framework, 2007, NSDI.

[4]  Michael I. Jordan, et al. Detecting large-scale system problems by mining console logs, 2009, SOSP '09.

[5]  S. J. Qin. Multi-dimensional Fault Diagnosis Using a Subspace Approach, 1997.

[6]  David A. Patterson, et al. Path-Based Failure and Evolution Management, 2004, NSDI.

[7]  James E. Prewett. Analyzing cluster log files using Logsurfer, 2003.

[8]  David Walker, et al. From dirt to shovels: fully automatic tool generation from ad hoc data, 2008, POPL '08.

[9]  Peter Sommerlad, et al. Refactoring support for the C++ development tooling, 2007, OOPSLA '07.

[10]  Qiang Fu, et al. Execution Anomaly Detection in Distributed Systems through Unstructured Log Analysis, 2009, 2009 Ninth IEEE International Conference on Data Mining.

[11]  Ling Huang, et al. Large-Scale System Problems Detection by Mining Console Logs, 2009.

[12]  Yuan Yao, et al. Splitting-merging model of Chinese word tokenization and segmentation, 1998, Nat. Lang. Eng.

[13]  Jon Stearley, et al. What Supercomputers Say: A Study of Five System Logs, 2007, 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07).

[14]  Robert Griesemer. Parallelism by design: data analysis with Sawzall, 2008, CGO '08.

[15]  Eric A. Brewer, et al. Pinpoint: problem determination in large, dynamic Internet services, 2002, Proceedings International Conference on Dependable Systems and Networks.

[16]  Ling Huang, et al. Online System Problem Detection by Mining Patterns of Console Logs, 2009, 2009 Ninth IEEE International Conference on Data Mining.

[17]  Stephen E. Hansen, et al. Automated System Monitoring and Notification with Swatch, 1993, LISA.

[18]  J. E. Jackson, et al. Control Procedures for Residuals Associated With Principal Component Analysis, 1979.

[19]  Evangelos E. Milios, et al. Clustering event logs using iterative partitioning, 2009, KDD.

[20]  Ding Yuan, et al. SherLog: error diagnosis by connecting clues from run-time logs, 2010, ASPLOS XV.

[21]  Risto Vaarandi, et al. A data clustering algorithm for mining patterns from event logs, 2003, Proceedings of the 3rd IEEE Workshop on IP Operations & Management (IPOM 2003).

[22]  Chunyu Kit, et al. Learning Case-based Knowledge for Disambiguating Chinese Word Segmentation: A Preliminary Study, 2002, SIGHAN@COLING.

[23]  Jon Stearley. Towards informatic analysis of syslogs, 2004, 2004 IEEE International Conference on Cluster Computing.