Abstract
A frequent pattern mining algorithm based on adaptive hash chain structures is proposed for distributed systems. The algorithm first generates the local frequent 1-itemsets at each site and then derives the global frequent 1-itemsets, from which each site builds its projection database. Each site then scans the transactions in its projection database and constructs a hash chain structure sized to fit the memory available at that site. Mining the hash chain structures at all sites yields the global frequent itemsets. The basic steps and the mining algorithm are presented. The study shows that the algorithm has higher efficiency and adaptability than existing approaches.
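The steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the hash chain layout or the mining procedure, so the sketch assumes a plain chained hash table whose bucket count stands in for "size fitted to available memory", and it replaces the mining step with brute-force subset counting. All function names and the parameters `min_count` and `buckets_per_site` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def local_item_counts(transactions):
    """Count how many transactions at one site contain each item."""
    counts = Counter()
    for t in transactions:
        counts.update(set(t))
    return counts


def global_frequent_items(sites, min_count):
    """Sum the per-site counts and keep items that meet the global threshold."""
    total = Counter()
    for transactions in sites:
        total.update(local_item_counts(transactions))
    return {item for item, c in total.items() if c >= min_count}


def project(transactions, frequent):
    """Drop globally infrequent items; keep each transaction as a sorted tuple."""
    projected = (tuple(sorted(set(t) & frequent)) for t in transactions)
    return [t for t in projected if t]


def build_hash_chains(projected, n_buckets):
    """Chained hash table: each bucket is a list of [transaction, count] pairs.

    Assumption: n_buckets is chosen by the caller from the site's free memory.
    """
    buckets = [[] for _ in range(n_buckets)]
    for t in projected:
        chain = buckets[hash(t) % n_buckets]
        for entry in chain:
            if entry[0] == t:
                entry[1] += 1  # identical projected transaction seen again
                break
        else:
            chain.append([t, 1])
    return buckets


def mine(site_tables, min_count):
    """Brute-force stand-in for mining: sum subset supports over all sites."""
    support = Counter()
    for buckets in site_tables:
        for chain in buckets:
            for t, c in chain:
                for k in range(1, len(t) + 1):
                    for subset in combinations(t, k):
                        support[subset] += c
    return {s: c for s, c in support.items() if c >= min_count}


def mine_distributed(sites, min_count, buckets_per_site):
    """Run all steps: global 1-itemsets, projection, hash chains, mining."""
    frequent = global_frequent_items(sites, min_count)
    tables = [build_hash_chains(project(tx, frequent), n)
              for tx, n in zip(sites, buckets_per_site)]
    return mine(tables, min_count)
```

In this sketch the bucket count affects only chain lengths, not the mined result, so a site with little memory can use fewer buckets and still contribute the same supports. The subset enumeration is exponential in transaction length and is meant only to make the data flow concrete.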
Source
Systems Engineering and Electronics (《系统工程与电子技术》)
EI
CSCD
Peking University Core Journals (北大核心)
2005, No. 3, pp. 560-564 (5 pages)
Funding
Supported by the Natural Science Research Program of Jiangsu Higher Education Institutions (04KJB46003)
Keywords
data mining
frequent pattern
distributed
adaptive
hash chain