Machine-tool condition monitoring with Gaussian mixture models-based dynamic probabilistic clustering

Abstract The combination of artificial intelligence with data, computing power, and new algorithms can provide important tools for solving engineering problems, such as machine-tool condition monitoring. However, many of these problems require algorithms that can perform in highly dynamic scenarios, where the data streams combine extremely high sampling rates with variables of different types. The unsupervised learning algorithm based on Gaussian mixture models called Gaussian-based dynamic probabilistic clustering (GDPC) is one of these tools. However, GDPC becomes unstable when a large number of concept drifts associated with transients occur within the data stream, so we propose a new algorithm called GDPC+ to increase its robustness. GDPC+ introduces two improvements: (a) automatic selection of the number of mixture components based on the Bayesian information criterion (BIC), and (b) stabilization of concept drift transitions based on the Cauchy–Schwarz divergence integrated with the Dickey–Fuller test. Thus, GDPC+ can outperform GDPC in highly dynamic scenarios, as measured by the number of false positives. The behavior of GDPC+ was investigated using random synthetic data streams and a real condition-monitoring data stream obtained from a machine-tool that produces engine crankshafts at high speed. We found that the initial temporal window size can be used to adapt the algorithm to different analytical requirements. The clustering results were also investigated by inducing rules with the repeated incremental pruning to produce error reduction (RIPPER) algorithm, in order to provide insights into the underlying monitored process and its associated concept drifts.
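To make the Cauchy–Schwarz divergence mentioned in the abstract more concrete, the following is a minimal sketch for one-dimensional Gaussian mixtures. It relies on the standard identity that the integral of the product of two Gaussian densities is itself a Gaussian density evaluated at the difference of the means, which gives the divergence in closed form. The function names and the `(weight, mean, variance)` representation are assumptions made for illustration only; this is not the GDPC+ implementation, and the abstract does not describe how the divergence is combined with the Dickey–Fuller test.

```python
import math

# Illustrative sketch only: a mixture is a list of (weight, mean, variance)
# tuples for one-dimensional Gaussian components (hypothetical representation).

def gaussian_overlap(m1, v1, m2, v2):
    """Integral over x of N(x; m1, v1) * N(x; m2, v2), equal to N(m1; m2, v1 + v2)."""
    var = v1 + v2
    return math.exp(-0.5 * (m1 - m2) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mixture_overlap(p, q):
    """Integral over x of p(x) * q(x) for two Gaussian mixtures."""
    return sum(
        wp * wq * gaussian_overlap(mp, vp, mq, vq)
        for wp, mp, vp in p
        for wq, mq, vq in q
    )

def cauchy_schwarz_divergence(p, q):
    """D_CS(p, q) = -log( int(p*q) / sqrt(int(p*p) * int(q*q)) ).

    Symmetric, non-negative, and zero when p and q coincide. If the mixtures
    are extremely far apart the cross term can underflow to 0.0, in which case
    math.log raises ValueError; a production version would work in log space.
    """
    cross = mixture_overlap(p, q)
    norm = math.sqrt(mixture_overlap(p, p) * mixture_overlap(q, q))
    return -math.log(cross / norm)

# Reference mixture and two shifted copies standing in for drifted models.
reference = [(0.5, 0.0, 1.0), (0.5, 4.0, 1.0)]
small_shift = [(0.5, 1.0, 1.0), (0.5, 5.0, 1.0)]
large_shift = [(0.5, 3.0, 1.0), (0.5, 7.0, 1.0)]

d_same = cauchy_schwarz_divergence(reference, reference)
d_small = cauchy_schwarz_divergence(reference, small_shift)
d_large = cauchy_schwarz_divergence(reference, large_shift)
```

In this toy setup the divergence is numerically zero for identical mixtures and is larger for the larger shift. A sequence of such values computed between consecutive models could then be examined with a stationarity test, though the exact procedure used by GDPC+ is not specified in the abstract.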
