Spectral Clustering of Customer Transaction Data With a Two-Level Subspace Weighting Method

Finding customer groups in transaction data is important for retail and e-commerce companies. Recently, a “Purchase Tree” (PurTree) data structure was proposed to compress customer transaction data, and a local PurTree spectral clustering method was proposed to cluster it. However, the PurTree distance assigns equal weights to all child nodes of a parent node, so it does not distinguish the importance of different nodes. In this paper, we propose a two-level subspace weighting spectral clustering (TSW) algorithm for customer transaction data. The new method introduces a PurTree subspace metric to measure the dissimilarity between two customers represented by purchase trees: a set of level weights distinguishes the importance of different tree levels, and a set of sparse node weights distinguishes the importance of different nodes within a purchase tree. TSW learns an adaptive similarity matrix from the local distances to better uncover the cluster structure in the customer transaction data. Simultaneously, it learns the level weights and the sparse node weights of the PurTree subspace distance. An iterative algorithm is proposed to optimize the model, and we also present an efficient method to compute a regularization parameter in TSW. TSW was compared with six clustering algorithms on ten benchmark data sets, and the experimental results show the superiority of the new method.
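To make the two ingredients named above more concrete, the following minimal Python sketch combines a level-weighted, node-weighted tree distance with a similarity matrix built from local distances. It is an illustration under stated assumptions, not the formulation used in the paper: the per-level weighted set-overlap distance, the closed-form k-nearest-neighbor similarity (a form commonly used in adaptive-neighbor graph learning), the fixed weights, and all names (`level_distance`, `tree_distance`, `adaptive_similarity`, the toy customers) are hypothetical. In TSW the level weights and node weights are learned, whereas here they are simply supplied.

```python
import numpy as np


def level_distance(nodes_a, nodes_b, node_weight):
    """Weighted set-overlap distance between two node sets on one tree level.

    Assumption: nodes missing from `node_weight` get weight 1.0.
    """
    union = nodes_a | nodes_b
    total = sum(node_weight.get(v, 1.0) for v in union)
    if total == 0:
        return 0.0
    shared = sum(node_weight.get(v, 1.0) for v in nodes_a & nodes_b)
    return 1.0 - shared / total


def tree_distance(tree_a, tree_b, level_weights, node_weight):
    """Sum of per-level distances, each scaled by its level weight."""
    return sum(
        w * level_distance(a, b, node_weight)
        for w, a, b in zip(level_weights, tree_a, tree_b)
    )


def adaptive_similarity(dist, k):
    """Row-wise similarity from local distances (requires k < n - 1).

    Each row keeps its k nearest neighbors; closer neighbors get larger
    similarity, and each row sums to 1.
    """
    n = dist.shape[0]
    sim = np.zeros((n, n))
    for i in range(n):
        others = np.delete(np.arange(n), i)
        order = others[np.argsort(dist[i, others], kind="stable")]
        nearest = order[:k]
        d_next = dist[i, order[k]]  # distance to the (k+1)-th neighbor
        denom = k * d_next - dist[i, nearest].sum()
        if denom > 0:
            sim[i, nearest] = (d_next - dist[i, nearest]) / denom
        else:
            sim[i, nearest] = 1.0 / k  # all candidates equally far
    return sim


# Toy customers: one set of purchased nodes per tree level
# (level 1 = category, level 2 = product). Hypothetical data.
trees = [
    [{"food", "toys"}, {"food/milk", "toys/lego"}],
    [{"food"}, {"food/milk"}],
    [{"food", "toys"}, {"food/bread", "toys/lego"}],
    [{"books"}, {"books/novel"}],
]
level_weights = [0.3, 0.7]  # fixed here; learned in TSW
node_weight = {}            # empty dict means uniform node weights

n = len(trees)
dist = np.array([
    [tree_distance(trees[i], trees[j], level_weights, node_weight)
     for j in range(n)]
    for i in range(n)
])
sim = adaptive_similarity(dist, k=2)
print(np.round(dist, 3))
print(np.round(sim, 3))
```

The resulting matrix `sim` is generally not symmetric; a common follow-up step before spectral clustering is to symmetrize it, for example with `(sim + sim.T) / 2`.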
