AQEyes: Visual Analytics for Anomaly Detection and Examination of Air Quality Data

Anomaly detection plays a key role in air quality analysis by enhancing situational awareness and alerting users to potential hazards. However, existing anomaly detection approaches for air quality analysis are limited in parameter selection (e.g., they require extensive domain knowledge), computational cost, general applicability (e.g., they require labeled data), interpretability, and efficiency of analysis. Furthermore, the poor quality of collected air quality data, which is inconsistently formatted and sometimes missing, makes analysis substantially more difficult. In this paper, we systematically formulate design requirements for a system that addresses these limitations and then propose AQEyes, an integrated visual analytics system for efficiently monitoring, detecting, and examining anomalies in air quality data. In particular, we propose a unified, end-to-end, tunable machine learning pipeline that includes several data pre-processors and featurizers to deal with data quality issues. The pipeline integrates an efficient unsupervised anomaly detection method that works without labeled data and overcomes the limitations of existing approaches. Further, we develop an interactive visualization system to visualize the outputs of the pipeline. The system incorporates a set of novel visualization and interaction designs that allow analysts to visually examine air quality dynamics and anomalous events at multiple scales and from multiple facets. We demonstrate the performance of the pipeline through a quantitative evaluation and show the effectiveness of the visualization system through qualitative case studies on real-world datasets.
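
To make the pre-processor, featurizer, and unsupervised detector stages concrete, the following is a minimal sketch of such a pipeline. It is not the AQEyes implementation: the abstract does not specify the components, so the linear interpolation, the rolling-median residual feature, the MAD-based robust z-score detector, and all function names, window sizes, thresholds, and readings here are assumptions chosen purely for illustration.

```python
from statistics import median


def fill_missing(values):
    """Pre-processor (assumed): linearly interpolate None gaps.

    Gaps at the edges take the nearest observed value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(float(v))
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(float(values[right]))
        elif right is None:
            filled.append(float(values[left]))
        else:
            weight = (i - left) / (right - left)
            filled.append(values[left] + weight * (values[right] - values[left]))
    return filled


def rolling_residuals(values, window=3):
    """Featurizer (assumed): value minus the median of the preceding window."""
    residuals = []
    for i, v in enumerate(values):
        history = values[max(0, i - window):i]
        residuals.append(v - median(history) if history else 0.0)
    return residuals


def robust_outliers(features, threshold=3.5):
    """Detector (assumed): flag indices whose MAD-based robust z-score
    exceeds the threshold. No labels are needed."""
    center = median(features)
    deviations = [abs(f - center) for f in features]
    mad = median(deviations)
    if mad == 0:
        return []  # no spread to scale by; this sketch flags nothing
    return [i for i, d in enumerate(deviations) if 0.6745 * d / mad > threshold]


def detect_anomalies(readings, window=3, threshold=3.5):
    """Chain the three stages and return indices of suspected anomalies."""
    filled = fill_missing(readings)
    features = rolling_residuals(filled, window)
    return robust_outliers(features, threshold)


# Hypothetical hourly pollutant readings with two missing values and one spike.
readings = [12, 13, None, 12, 14, 13, 95, 13, 12, None, 13, 14]
print(detect_anomalies(readings))  # [6], the index of the 95 reading
```

In this sketch, `window` and `threshold` are the kind of parameters a tunable pipeline would expose to automated search, so that analysts do not have to set them by hand.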