Implications of Probabilistic Data Modeling for Mining Association Rules

Mining association rules is an important technique for discovering meaningful patterns in transaction databases. The current literature discusses the properties of algorithms for mining association rules in great detail. We present a simple probabilistic framework for transaction data which can be used to simulate transaction data when no associations are present. We use such simulated data and a real-world grocery database to explore the behavior of confidence and lift, two popular interest measures used for rule mining. The results show that confidence is systematically influenced by the frequency of the items in the left-hand side of rules and that lift performs poorly at filtering random noise in transaction data. The probabilistic data modeling approach presented in this paper is not only a valuable framework for analyzing interest measures but also a starting point for further research on new interest measures which are based on statistical tests and geared towards the specific properties of transaction data.
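To make the measures concrete, the following is a minimal sketch of how association-free transaction data might be simulated and how confidence and lift are computed on it. It assumes that each item enters a transaction independently with a fixed probability; this is an illustrative simplification, not necessarily the exact model of the paper. All function names, item names, and probabilities are hypothetical.

```python
import random


def simulate_transactions(item_probs, n_transactions, seed=0):
    """Draw transactions in which every item occurs independently.

    item_probs maps an item name to its (assumed) occurrence probability.
    """
    rng = random.Random(seed)
    return [
        {item for item, p in item_probs.items() if rng.random() < p}
        for _ in range(n_transactions)
    ]


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, lhs, rhs):
    """Estimate of P(rhs | lhs): supp(lhs and rhs) / supp(lhs)."""
    s_lhs = support(transactions, lhs)
    if s_lhs == 0:
        return float("nan")  # rule is undefined if lhs never occurs
    return support(transactions, set(lhs) | set(rhs)) / s_lhs


def lift(transactions, lhs, rhs):
    """confidence(lhs -> rhs) / supp(rhs); 1 indicates no association."""
    s_rhs = support(transactions, rhs)
    if s_rhs == 0:
        return float("nan")  # rule is undefined if rhs never occurs
    return confidence(transactions, lhs, rhs) / s_rhs


if __name__ == "__main__":
    # Hypothetical items and probabilities, chosen only for illustration.
    probs = {"milk": 0.30, "bread": 0.20, "caviar": 0.001}
    data = simulate_transactions(probs, n_transactions=20000, seed=1)
    for lhs, rhs in [("milk", "bread"), ("caviar", "bread")]:
        print(lhs, "->", rhs,
              "confidence:", confidence(data, [lhs], [rhs]),
              "lift:", lift(data, [lhs], [rhs]))
```

Because the items are generated independently, the expected lift of every rule is 1. Any deviation from 1 in the simulated data is sampling noise, and that noise grows as the items involved become rarer, since the estimates then rest on only a handful of transactions.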
