Abstract
Traditional parallel algorithms for mining frequent itemsets suffer from load imbalance across nodes, excessive synchronization overhead, and heavy communication. To address these problems, this paper proposes MRPD, a parallel algorithm that redistributes the data and transmits it multiple times. At step l, MRPD repartitions the database into several groups and transmits the groups multiple times according to the needs of each node. Each node computes frequent itemsets asynchronously once it has received a complete group, and the full set of frequent itemsets is obtained after all nodes have finished. Theoretical analysis and experimental results indicate that MRPD is effective.
Source
《计算机应用研究》
CSCD
PKU Core (Peking University Core Journals)
2008, No. 11, pp. 3332-3334 (3 pages)
Application Research of Computers
Funding
Supported by the Natural Science Foundation of Anhui Province (050420207)
Keywords
data mining
parallel algorithm
frequent itemsets