Accelerating Support Count for Association Rule Mining on GPUs

In this work, we present a highly parallel, work-efficient algorithm for performing support count on a GPU. We develop a compressed data layout scheme that enables high utilization of off-chip memory bandwidth. This layout keeps the overhead of parallel coordination low while reducing the memory requirements of support count. We evaluate our algorithm through extensive experiments on both synthetically generated and real data. Our parallel two-phase algorithm achieves a maximum throughput of 50 billion evaluations per second and outperforms non-work-efficient implementations on a multi-core CPU and a GPU by almost 40×. Resolving bank conflicts reduces the execution time per iteration of our algorithm by up to 6%. Additional optimizations such as loop unrolling improve execution time by up to 18%.
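For readers unfamiliar with the operation, the support count of an itemset is the number of transactions that contain every item in it. The sketch below is a minimal CPU illustration in Python, assuming a vertical bitmap layout (one bit per transaction for each item). It is not the compressed layout or the two-phase GPU algorithm described above; the names `build_bitmaps` and `support_count` are hypothetical.

```python
def build_bitmaps(transactions):
    """Map each item to an integer whose bit t is set if transaction t contains it.

    Assumption: a plain vertical bitmap, chosen only for illustration.
    """
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support_count(itemset, bitmaps, n_transactions):
    """Count the transactions that contain every item in `itemset`."""
    acc = (1 << n_transactions) - 1  # start with all transactions selected
    for item in itemset:
        acc &= bitmaps.get(item, 0)  # keep only transactions that also hold `item`
    return bin(acc).count("1")       # population count of the surviving bits


transactions = [
    {"a", "b", "c"},
    {"a", "c"},
    {"b", "d"},
    {"a", "b", "c", "d"},
]
bitmaps = build_bitmaps(transactions)
count_ac = support_count({"a", "c"}, bitmaps, len(transactions))
count_bd = support_count({"b", "d"}, bitmaps, len(transactions))
```

Each candidate itemset is evaluated independently of the others, which is what makes the operation a natural fit for data-parallel hardware.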
