Fast algorithms for mining association rules

We consider the problem of discovering association rules between items in a large database of sales transactions. We present two new algorithms for solving this problem that are fundamentally different from the known algorithms. Empirical evaluation shows that these algorithms outperform the known algorithms by factors ranging from three for small problems to more than an order of magnitude for large problems. We also show how the best features of the two proposed algorithms can be combined into a hybrid algorithm, called AprioriHybrid. Scale-up experiments show that AprioriHybrid scales linearly with the number of transactions. AprioriHybrid also has excellent scale-up properties with respect to the transaction size and the number of items in the database.
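
The abstract does not describe how the algorithms work, so the following is only a minimal Python sketch of the underlying problem: finding itemsets that occur in at least a minimum number of transactions, then deriving rules whose confidence meets a threshold. It uses a generic level-wise search with subset pruning, which is an assumption about the general approach and not the paper's exact method. The function names (`frequent_itemsets`, `rules`), the thresholds, and the toy data are all hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search: map each itemset to its support count,
    keeping only itemsets found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        candidates = set()
        # Join: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune: every (k-1)-subset of a candidate must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count each surviving candidate with one pass over the transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        result.update(current)
        k += 1
    return result


def rules(freq, min_confidence):
    """Return (antecedent, consequent, confidence) triples, where
    confidence = support(itemset) / support(antecedent)."""
    out = []
    for itemset, count in freq.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                conf = count / freq[a]  # subsets of a frequent itemset are frequent
                if conf >= min_confidence:
                    out.append((a, itemset - a, conf))
    return out


# Toy data, invented purely for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
freq = frequent_itemsets(baskets, min_support=2)
strong = rules(freq, min_confidence=0.9)
```

On this toy data, the only rule that clears the 0.9 confidence threshold is butter → bread, since both baskets containing butter also contain bread. The sketch rescans every transaction for every candidate, which is the kind of cost a production algorithm would work to reduce.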
