Abstract
Aims: FP-growth spends considerable time looking up item nodes and must repeatedly build conditional FP-trees while mining frequent itemsets. To address these problems, a fast frequent itemset mining algorithm based on an array and an auxiliary item header table is proposed. Methods: First, the FP-tree is replaced with an Array-structure. Next, an auxiliary item header table with a two-layer hashable structure replaces the frequent item header table and stores the position of each item node in the Array-structure; combining array indexing with hash lookup allows item nodes to be located quickly. Finally, frequent itemsets are mined directly from the item node information stored in the auxiliary item header table, without generating conditional FP-trees. Results: Compared with FP-growth and other algorithms, the proposed algorithm greatly shortens execution time on different types of data sets. Conclusions: The fast frequent itemset mining algorithm based on an array and an auxiliary item header table achieves better mining performance and higher execution efficiency on both dense and sparse data sets.
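The three steps in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the layout of the Array-structure or of the auxiliary item header table, so the sketch assumes that prefix-tree nodes are stored in three parallel arrays (item, count, parent index) and that the header table is a two-level hash of the form item -> {parent index -> node index}. The function names `build_array_structure`, `mine` and `_grow` are hypothetical.

```python
from collections import Counter


def build_array_structure(transactions, min_support):
    """Store the prefix tree in flat arrays plus a two-level hash header.

    Assumed layout (not taken from the paper): items[i], node_counts[i] and
    parents[i] describe node i; header[item][parent_index] -> node_index.
    """
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i for i, c in counts.items() if c >= min_support}
    items, node_counts, parents = [None], [0], [-1]  # index 0 is the root
    header = {}
    for t in transactions:
        parent = 0
        # Insert frequent items in descending support order (ties by name).
        for item in sorted(set(t) & frequent, key=lambda i: (-counts[i], i)):
            by_parent = header.setdefault(item, {})
            node = by_parent.get(parent)  # hash lookup instead of a child scan
            if node is None:
                node = len(items)
                items.append(item)
                node_counts.append(0)
                parents.append(parent)
                by_parent[parent] = node
            node_counts[node] += 1
            parent = node
    return items, node_counts, parents, header


def _grow(suffix, support, base, min_support, result):
    """Extend `suffix` using weighted prefix paths; no conditional tree is built."""
    result[frozenset(suffix)] = support
    counts = Counter()
    for path, c in base:
        for item in path:
            counts[item] += c
    for item, c in counts.items():
        if c < min_support:
            continue
        # Keep only the ancestors that lie above `item` on each path.
        new_base = [(p[p.index(item) + 1:], n) for p, n in base if item in p]
        _grow(suffix + (item,), c, new_base, min_support, result)


def mine(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    items, node_counts, parents, header = build_array_structure(
        transactions, min_support)
    result = {}
    for item, by_parent in header.items():
        base = []
        for node in by_parent.values():  # node positions come from the header
            path, p = [], parents[node]
            while p != 0:  # follow parent indices up to the root
                path.append(items[p])
                p = parents[p]
            base.append((path, node_counts[node]))
        _grow((item,), sum(c for _, c in base), base, min_support, result)
    return result
```

Calling `mine(transactions, min_support)` with a list of item lists and an absolute support threshold returns a dictionary that maps each frequent itemset to its support count. In this sketch the header table serves both purposes named in the abstract: it locates a child node by hash lookup during construction, and it supplies every position of an item during mining.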
Authors
杜媛
张世伟
DU Yuan; ZHANG Shiwei (College of Information Engineering, China Jiliang University, Hangzhou 310018, China; Modern Educational Technology Center, China Jiliang University, Hangzhou 310018, China)
Source
Journal of China University of Metrology (《中国计量大学学报》)
2019, No. 1, pp. 78-84 (7 pages)
Keywords
metrology
association rules
frequent itemsets
minimum support
frequent pattern growth