Abstract
As distributed computing environments become widespread, traditional centralized data mining algorithms based on concept lattices cannot make full use of distributed computing resources to speed up concept lattice construction, which limits the performance of the mining algorithms. This paper first further analyzes the inherent parallelism of the apposition assembly of Iceberg concept lattices. Second, it takes the direct product of frequent concepts together with its lower cover as the minimal unit of computation, so that the apposition assembly process can be decomposed into units that are distributed, processed at separate sites, and finally integrated into a global Iceberg concept lattice. The distributed assembly procedure is proved correct theoretically. On this basis, a new algorithm is proposed for mining global closed frequent itemsets in a heterogeneous distributed environment. The algorithm exploits the semilattice structure of the Iceberg concept lattice and its support for apposition assembly, and thereby makes full use of the computing resources available in the distributed environment. Experiments show that the algorithm performs well on both dense and sparse data sets.
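The idea of assembling a global Iceberg concept lattice from per-site lattices can be illustrated with a small sketch. This is not the algorithm proposed in the paper, which decomposes the work into units built from frequent concept direct products and their lower covers. It only illustrates the underlying apposition property, under the assumption that "heterogeneous" means every site describes the same objects with a different attribute set. The function names `closed_frequent` and `appose` and the toy data are hypothetical.

```python
def closed_frequent(context, min_support):
    """Map each frequent extent to its closed itemset (intent).

    context: dict mapping an object id to the set of attributes it has.
    """
    objects = frozenset(context)
    attributes = set().union(*context.values())
    # Extent of a single attribute: the objects that carry it.
    attr_extent = {m: frozenset(g for g in objects if m in context[g])
                   for m in attributes}
    # Every extent is an intersection of attribute extents (or all objects).
    extents = {objects}
    for ext_m in attr_extent.values():
        extents |= {e & ext_m for e in extents}
    # Keep frequent extents only; the intent is every attribute shared by the extent.
    return {e: frozenset(m for m in attributes if e <= attr_extent[m])
            for e in extents if len(e) >= min_support}


def appose(ice1, ice2, min_support):
    """Combine two site-level results over the same objects (assumed setting).

    Each global extent is an intersection of one extent per site; its intent
    is the union of the intents of all pairs that produce that extent.
    """
    merged = {}
    for e1, i1 in ice1.items():
        for e2, i2 in ice2.items():
            e = e1 & e2
            if len(e) >= min_support:
                merged[e] = merged.get(e, frozenset()) | i1 | i2
    return merged


# Hypothetical toy data: two sites, same four objects, disjoint attribute sets.
site_a = {1: {"a", "b"}, 2: {"a"}, 3: {"a", "b"}, 4: {"b"}}
site_b = {1: {"c"}, 2: {"c", "d"}, 3: {"c"}, 4: {"d"}}

ice_a = closed_frequent(site_a, 2)
ice_b = closed_frequent(site_b, 2)
merged = appose(ice_a, ice_b, 2)
for extent, intent in sorted(merged.items(), key=lambda kv: -len(kv[0])):
    print(sorted(extent), sorted(intent))
```

Because a frequent intersection can only come from two extents that are themselves frequent, each site needs to share only its Iceberg (frequent) part, which is what makes the per-site work independent in this simplified setting. The pairwise loop is quadratic and is meant for illustration, not for performance.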
Source
Chinese Journal of Computers (《计算机学报》)
Indexed in: EI, CSCD, PKU Core Journals (北大核心)
2012, No. 5, pp. 990-1001 (12 pages)
Keywords
Iceberg concept lattice
distributed data mining
apposition assembly
heterogeneous databases (heterogeneous data scenario)
closed frequent itemsets