Abstract
Most frequent pattern mining algorithms based on the FP-growth idea suffer from a complex tree-construction process and cumbersome support calculation. To address these problems, this paper proposes BCLFIM (BC-List Frequent Itemsets Mining), a frequent itemset mining algorithm based on the Bitmap-Code List (BC-List). First, the algorithm uses a node coding model based on bitmap representation to generate a bitmap tree (BC-tree); using the BC-tree node information as the data structure, it obtains the node sets of the BC-List quickly through bitwise operations, which avoids complicated intersection operations and improves connection efficiency. Second, it narrows the search space for frequent patterns by applying superset equivalence and support-count pruning strategies. Experimental results show that the algorithm mines faster than the FIN and DFIN algorithms.
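To illustrate why bitwise operations can stand in for explicit intersection when computing support, here is a minimal Python sketch. It is not the paper's BC-tree or BC-List structure, whose encoding details are not given in this abstract. It assumes a simpler, generic representation in which each item owns a bitmap over transaction IDs, and it applies only support-count pruning. The function names `build_item_bitmaps`, `support`, and `frequent_itemsets` are hypothetical.

```python
def build_item_bitmaps(transactions):
    """Map each item to an int whose bit i is set when transaction i contains it.

    Assumption: a plain per-item transaction bitmap, not the BC-tree node code.
    """
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def popcount(bits):
    """Count the set bits of a non-negative int."""
    return bin(bits).count("1")


def support(itemset, bitmaps):
    """Support count of a non-empty itemset: AND the bitmaps, count set bits."""
    items = list(itemset)
    if not items:
        raise ValueError("itemset must not be empty")
    acc = bitmaps.get(items[0], 0)
    for item in items[1:]:
        acc &= bitmaps.get(item, 0)  # bitwise AND replaces set intersection
    return popcount(acc)


def frequent_itemsets(transactions, min_support):
    """Depth-first enumeration with support-count pruning only."""
    bitmaps = build_item_bitmaps(transactions)
    items = sorted(i for i, b in bitmaps.items() if popcount(b) >= min_support)
    result = {}

    def extend(prefix, prefix_bits, start):
        for k in range(start, len(items)):
            bits = prefix_bits & bitmaps[items[k]]
            count = popcount(bits)
            # An infrequent itemset has no frequent superset, so stop here.
            if count >= min_support:
                itemset = prefix + (items[k],)
                result[itemset] = count
                extend(itemset, bits, k + 1)

    all_bits = (1 << len(transactions)) - 1  # every transaction matches the empty prefix
    extend((), all_bits, 0)
    return result
```

Calling `frequent_itemsets(transactions, min_support)` returns a dictionary from itemsets (tuples of items in sorted order) to their support counts. Each extension step costs one AND and one bit count, which is the general effect the abstract attributes to bitwise operations.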
Authors
GU Junhua (顾军华)
SU Ming (苏鸣)
ZHANG Yajuan (张亚娟)
ZHANG Danhong (张丹红)
School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401, China; Hebei Province Key Laboratory of Big Data Computing, Tianjin 300401, China
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
PKU Core Journal (北大核心)
2020, Issue 19, pp. 86-93 (8 pages)
Funding
Key Project of the Natural Science Foundation of Tianjin (No. 19JCZDJC40000).
Keywords
frequent itemset mining
association rules
pruning strategy
bitmap encoding