Abstract
In the big data era, knowledge discovery over massive, distributed data has become a focus of both academia and industry, and load balancing is one of the key factors that must be considered when developing a distributed mining algorithm. This paper therefore proposes VFP-LBDM, a distributed association rule mining algorithm with load balancing based on the vertical frequent pattern tree (vertical FP-tree). The algorithm uses a vertical FP-tree to store items and their associations, so local mining results do not need to be merged, which reduces communication cost and simplifies the processing flow. It also adopts a hybrid architecture in which the central site assigns tasks according to the processing capacity of each local site, which balances the load and improves the algorithm's performance. Experimental results show that the proposed algorithm is feasible and achieves relatively high efficiency.
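The capacity-based task assignment described above can be illustrated with a minimal sketch. The abstract does not state the actual assignment rule used by VFP-LBDM, so everything here is an assumption made for illustration: the function name `assign_by_capacity`, the idea that a "task" is one frequent item to mine, the numeric capacity score per site, and the largest-remainder rounding.

```python
from typing import Dict, List


def assign_by_capacity(items: List[str], capacity: Dict[str, float]) -> Dict[str, List[str]]:
    """Split `items` among sites in proportion to each site's capacity score.

    Hypothetical helper: the capacity scores and the proportional rule are
    assumptions, not the rule published for VFP-LBDM.
    """
    if not capacity or any(c <= 0 for c in capacity.values()):
        raise ValueError("every site needs a positive capacity score")
    total = sum(capacity.values())
    sites = list(capacity)
    # Ideal (fractional) share of the items for each site, then rounded down.
    quotas = {s: len(items) * capacity[s] / total for s in sites}
    counts = {s: int(quotas[s]) for s in sites}
    # Give the leftover items to the sites with the largest fractional remainders.
    leftover = len(items) - sum(counts.values())
    by_remainder = sorted(sites, key=lambda s: quotas[s] - counts[s], reverse=True)
    for s in by_remainder[:leftover]:
        counts[s] += 1
    # Slice the item list into consecutive chunks of the computed sizes.
    plan: Dict[str, List[str]] = {}
    start = 0
    for s in sites:
        plan[s] = items[start:start + counts[s]]
        start += counts[s]
    return plan


if __name__ == "__main__":
    # Site names and capacity scores are made up for the example.
    print(assign_by_capacity(list("abcdefgh"), {"s1": 2.0, "s2": 1.0, "s3": 1.0}))
```

In this sketch a site with twice the capacity score receives roughly twice as many items, which is one simple way a central site could balance load across heterogeneous local sites.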
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journals (北大核心)
2014, Issue 2, pp. 396-400 (5 pages)
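Funding
Youth Fund Project for Humanities and Social Sciences Research, Ministry of Education of China (12YJCZH048)
Liaoning "Hundred, Thousand, Ten Thousand Talents Project" Training Fund (2011921033)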
Keywords
association rule mining
distributed
Vertical Frequent Pattern (VFP)
load balancing
serialization