Abstract
FP-growth is a classic association-rule algorithm that mines frequent itemsets from a frequent-pattern tree (FP-tree) and is widely used in many fields. Because the traditional FP-growth algorithm may generate a large number of frequent itemsets, this paper improves the mining process on the FP-tree by proposing an item-merging and pruning strategy. It then analyzes mining methods for single-path and multi-path trees, which reduces the number of times some branches must be mined. Finally, the improved algorithm is parallelized using the MapReduce programming model. Experimental results show that the method improves execution efficiency and achieves a good speedup and good scalability.
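The abstract only summarizes the method, so what follows is a minimal Python sketch of the standard FP-growth algorithm the paper builds on, not the paper's improved algorithm. It includes the classic single-path shortcut: when a (conditional) FP-tree is a single chain, every combination of its nodes is frequent, and the support of each combination is the count of its deepest node, so no further recursion is needed. It does not reproduce the authors' item-merging pruning or their MapReduce parallelization, whose details are not given here. The names `Node`, `build_tree`, `single_path` and `fp_growth`, and the weighted `(items, count)` input format, are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations

# Illustrative sketch: standard FP-growth with the single-path shortcut.
# This is NOT the paper's item-merging/pruning variant (hypothetical names throughout).


class Node:
    """One FP-tree node: an item, its count, a parent link and a child map."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs.

    Returns (header, frequent, root); root is None if no item is frequent.
    """
    counts = defaultdict(int)
    for items, n in transactions:
        for item in set(items):
            counts[item] += n
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    if not frequent:
        return None, None, None
    root = Node(None, None)
    header = defaultdict(list)  # item -> all tree nodes carrying that item
    for items, n in transactions:
        # Keep frequent items, ordered by descending count (ties broken by item).
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += n
            node = child
    return header, frequent, root


def single_path(root):
    """Return the chain of nodes below root if the tree is a single path, else None."""
    path, node = [], root
    while node.children:
        if len(node.children) > 1:
            return None
        node = next(iter(node.children.values()))
        path.append(node)
    return path


def fp_growth(transactions, min_support, suffix=()):
    """Yield (frozenset itemset, support) for every frequent itemset."""
    header, frequent, root = build_tree(transactions, min_support)
    if root is None:
        return
    path = single_path(root)
    if path is not None:
        # Single-path shortcut: emit every combination directly; its support
        # is the count of the deepest node in the combination.
        for r in range(1, len(path) + 1):
            for combo in combinations(path, r):
                yield frozenset(suffix) | {n.item for n in combo}, combo[-1].count
        return
    for item, support in frequent.items():
        new_suffix = suffix + (item,)
        yield frozenset(new_suffix), support
        # Conditional pattern base: the prefix path above each node carrying `item`.
        base = []
        for node in header[item]:
            prefix, p = [], node.parent
            while p.item is not None:
                prefix.append(p.item)
                p = p.parent
            if prefix:
                base.append((prefix, node.count))
        yield from fp_growth(base, min_support, new_suffix)


if __name__ == "__main__":
    data = [(["bread", "milk"], 1), (["bread", "butter"], 1),
            (["bread", "milk", "butter"], 1)]
    for itemset, support in fp_growth(data, min_support=2):
        print(sorted(itemset), support)
```

Passing weighted `(items, count)` pairs lets the same function consume the conditional pattern bases it builds during recursion, which is why the top-level call wraps each transaction with a count of 1.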
Source
Journal of Luoyang Institute of Science and Technology: Natural Science Edition (《洛阳理工学院学报(自然科学版)》)
2016, No. 4, pp. 59-62 (4 pages)
Funding
Key Science and Technology Research Project of the Henan Provincial Department of Science and Technology (162102210113)
Keywords
frequent itemset
association rules
item merging and pruning