Abstract
To address two problems of the FP-Growth algorithm, namely that it scans the transaction database twice to build the FP-tree and that it generates a large number of conditional pattern bases and conditional pattern trees when mining frequent itemsets from the FP-tree, an improved FP-Growth algorithm is proposed. The algorithm scans the transaction database only once to build a new FP-tree that contains no duplicate nodes. It discards the item header table and adds a node table associated with the new FP-tree, which stores the "redundant" item information produced while the new FP-tree is being built. Frequent itemsets are then mined from the new FP-tree together with the node table. Experimental results demonstrate the feasibility and effectiveness of the algorithm and show that it improves the efficiency of data mining.
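For context, the two-scan construction that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration of the classic FP-tree build, not the improved algorithm proposed in the paper; the names `FPNode`, `build_fp_tree`, and the sample transactions are assumptions made for this example.

```python
from collections import Counter


class FPNode:
    """One node of a classic FP-tree: an item, its count, and child links."""

    def __init__(self, item=None, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode


def build_fp_tree(transactions, min_support):
    """Build a classic FP-tree using two passes over the transactions."""
    # Scan 1: count the support of every item.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_support}

    # Scan 2: insert each transaction, keeping only frequent items,
    # ordered by descending support (ties broken by item name).
    root = FPNode()
    header = {}  # item header table: item -> list of nodes holding that item
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
            child.count += 1
            node = child
    return root, header, frequent


# Hypothetical sample data, chosen only to illustrate the structure.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "d"],
    ["b", "c"],
]
root, header, frequent = build_fp_tree(transactions, min_support=2)
```

In this sample, item `b` ends up at two different nodes (one under `a`, one directly under the root), and the header table is what links them. The classic build therefore needs both a second database scan and a header table, which are the two costs the abstract says the improved algorithm removes.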
Source
Computer Engineering and Design (《计算机工程与设计》)
Peking University Core Journal (北大核心)
2018, No. 1, pp. 140-145 (6 pages)