摘要
频繁项集挖掘是关联规则挖掘的关键步骤。FP-Growth算法是一种有效的频繁项集挖掘算法,它以自底向上的方式探索频繁模式树FP-tree,由FP-tree产生频繁项集。但是由于需要递归生成大量的条件FP-tree,其时间复杂度和空间复杂度都较高。针对这一问题,设计了一种基于单向频繁模式树的频繁项集挖掘算法UFIM。此算法首先构造一种单向频繁模式树UFP-tree结构,然后在UFP-tree上引入被约束子树,并对指向不同端点和指向相同端点的被约束子树分别采用递归和非递归的方法来挖掘频繁项集。非递归的方法判断端点的支持度计数是否小于最小支持度计数,若小于最小支持度计数则该棵被约束子树无频繁项集,否则其频繁项集是除根节点外的节点的排列组合。在mushroom数据集上的实验结果表明,UFIM算法的运行速度高于同类算法。
Mining frequent itemset is a key step in mining association rules.The FP-Growth algorithm is an efficient frequent itemset mining algorithm which explores the frequent pattern tree(FP-tree)by a bottom-up way,and generates frequent items by mining the FP-tree.However,its time complexity and space complexity are high because of needing to recursively generate a large number of conditional FP-tree.Aiming at this problem,we design a frequent itemset mining algorithm named UFIM based on unidirectional frequent pattern tree.This algorithm first constructs a unidirectional frequent pattern tree(UFP-tree)structure,then introduces a constrained sub-tree on the constructed UFP-tree;divides the constrained sub-tree into two cases:pointing to different endpoints and pointing to the same endpoints,and respectively uses recursive method and non-recursive method to mine frequent itemset.The non-recursive method determines whether the endpoint’s support count is smaller than the minimum support count.If it is smaller,the restricted sub-tree does not have frequent itemset,otherwise the frequent itemset of the restricted subtree is a node arrangement combination of the nodes besides root node.The experiment of mining frequent item set on the mushroom dataset shows that the running speed of the UFIM algorithm is higher than similar algorithms.
作者
蒋东洁
李玲娟
JIANG Dong-jie;LI Ling-juan(School of Computer Science,Nanjing University of Posts and Telecommunications,Nanjing 210023,China)
出处
《计算机技术与发展》
2019年第10期175-180,共6页
Computer Technology and Development
基金
国家自然科学基金(61302158,61571238)