A Study of Parallel and Distributed Data Mining Algorithms and Their Computation Process

Building on distributed data mining, this paper introduces a parallel and distributed computing architecture that stores partitioned data on sub-nodes, combining a partitioned-database approach with an improved Apriori algorithm. The work focuses on data skew in distributed environments and proposes a converse clustering method to address it. Corresponding parallel and distributed data mining algorithms are designed for large-scale transaction databases, and their computation processes are described in detail. Because the data are processed in parallel after effective partitioning, and nodes communicate efficiently, the volume of data transmitted between nodes is greatly reduced. The proposed algorithms provide a flexible and extensible computing platform, reduce communication overhead, and scale well. They aim to carry out network computing, and to realize its advantages, on relatively inexpensive computers. The algorithms can be applied to large parallel or distributed computing environments as well as to a single computer.
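The abstract does not specify the improved Apriori variant or the converse clustering method. The sketch below is only a rough, hypothetical illustration of why partitioning cuts traffic. It assumes that each sub-node holds a horizontal partition of the transactions and that the nodes exchange only per-candidate support counts rather than raw transactions, in the style of the well-known count-distribution approach. The function names and the absolute `min_support` threshold are illustrative assumptions, and data-skew handling is not modeled.

```python
from collections import Counter
from itertools import combinations


def local_counts(partition, candidates):
    """Count support of each candidate itemset within one partition (one sub-node)."""
    counts = Counter()
    for txn in partition:
        items = set(txn)
        for cand in candidates:
            if cand <= items:  # candidate itemset is contained in this transaction
                counts[cand] += 1
    return counts


def generate_candidates(frequent_prev, k):
    """Apriori join + prune: build size-k candidates from frequent (k-1)-itemsets."""
    prev = set(frequent_prev)
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            # Keep the union only if it has size k and every (k-1)-subset is frequent.
            if len(union) == k and all(
                frozenset(s) in prev for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates


def distributed_apriori(partitions, min_support):
    """Hypothetical count-distribution sketch (not the paper's exact algorithm):
    each sub-node counts candidates locally, and only those counts are merged."""
    # Level-1 candidates: every item that appears in any partition.
    candidates = {frozenset([item]) for part in partitions for txn in part for item in txn}
    frequent = {}
    k = 1
    while candidates:
        total = Counter()
        for part in partitions:
            # In a real deployment this Counter would be sent over the network
            # instead of the partition's transactions.
            total.update(local_counts(part, candidates))
        level = {c: n for c, n in total.items() if n >= min_support}
        frequent.update(level)
        k += 1
        candidates = generate_candidates(level.keys(), k)
    return frequent


# Example: two sub-nodes, each holding part of a small transaction database.
partitions = [
    [["a", "b", "c"], ["a", "b"]],
    [["a", "c"], ["b", "c"], ["a", "b", "c"]],
]
result = distributed_apriori(partitions, min_support=3)
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

In each pass, the data sent per node scales with the number of candidate itemsets, not with the number of transactions it stores. This is the kind of communication saving the abstract attributes to effective partitioning.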