Building on distributed data mining, this paper introduces a parallel and distributed computing architecture that stores partitioned data on sub-nodes, drawing on the idea of database partitioning and on improved Apriori algorithms. It focuses on data skew in the distributed environment and proposes a converse clustering method to solve the data skew problem. The corresponding parallel and distributed data mining algorithms are designed for large-scale transaction databases, and their computation steps are described in detail. Because the data are processed in parallel after effective partitioning, and the nodes communicate efficiently, the volume of transmitted data is greatly reduced. The proposed algorithms provide a flexible and extensible computing platform, reduce communication overhead, and scale well. They aim to carry out network computing on fairly inexpensive computers and to exploit its advantages. They can be applied in large parallel or distributed environments made up of individual computers.
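The general idea of mining a partitioned transaction database with an Apriori-style algorithm can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper: it assumes a plain count-distribution scheme in which each node counts candidate itemsets on its own partition and only the counts are merged, and it leaves out the converse clustering step for data skew. All function names and the in-memory "nodes" are hypothetical.

```python
from collections import Counter
from itertools import combinations


def local_counts(partition, candidates):
    """On one node: count the local transactions that contain each candidate."""
    counts = Counter()
    for transaction in partition:
        items = set(transaction)
        for cand in candidates:
            if cand <= items:
                counts[cand] += 1
    return counts


def next_candidates(frequent, k):
    """Build k-itemsets whose (k-1)-subsets are all frequent (Apriori property)."""
    items = sorted({i for s in frequent for i in s})
    result = set()
    for combo in combinations(items, k):
        if all(frozenset(sub) in frequent for sub in combinations(combo, k - 1)):
            result.add(frozenset(combo))
    return result


def distributed_apriori(partitions, min_support):
    """Level-wise mining; only per-candidate counts cross node boundaries."""
    total = sum(len(p) for p in partitions)
    threshold = min_support * total
    candidates = {frozenset([i]) for p in partitions for t in p for i in t}
    frequent_all = {}
    k = 1
    while candidates:
        global_counts = Counter()
        for p in partitions:  # each loop iteration stands in for one node
            global_counts.update(local_counts(p, candidates))
        frequent = {c: n for c, n in global_counts.items() if n >= threshold}
        frequent_all.update(frequent)
        k += 1
        candidates = next_candidates(set(frequent), k)
    return frequent_all
```

In this sketch the raw transactions never leave their partition; the coordinator sees only one integer per candidate per node, which is one simple way the amount of transmitted data can be kept small.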