Abstract
To improve the efficiency of mining on-shelf utility itemsets with negative item values, this paper proposes DTP-Houn (distributed TPHoun algorithm), a parallel algorithm for mining on-shelf utility itemsets with negative item values. Built on the MapReduce framework, the algorithm exploits the on-shelf time-period factor by partitioning the original transaction database according to time period. It casts the mining process as a MapReduce job: the Map phase mines candidate itemsets in each database fragment, and the Reduce phase computes the on-shelf utility values of the candidates in parallel. Experimental results show that DTP-Houn achieves high mining efficiency.
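The partition, Map, and Reduce steps described in the abstract can be illustrated with a minimal single-process sketch. It is not the DTP-Houn implementation: the toy data, the function names (`partition_by_period`, `map_phase`, `reduce_phase`, `mine`), the brute-force enumeration of itemsets up to size 2, and the simplified ratio (an itemset's utility divided by the absolute total utility of the periods in which it occurs) are all assumptions made for illustration, and a real deployment would run the phases as MapReduce tasks.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: unit profits may be negative (item "b" is sold at a loss).
PROFIT = {"a": 5, "b": -2, "c": 3}
# Each transaction is (time period, {item: quantity}).
TRANSACTIONS = [
    (1, {"a": 1, "b": 2}),
    (1, {"a": 2, "c": 1}),
    (2, {"b": 1, "c": 2}),
    (2, {"a": 1, "b": 1, "c": 1}),
]


def partition_by_period(transactions):
    """Split the database into one fragment per time period."""
    parts = defaultdict(list)
    for period, items in transactions:
        parts[period].append(items)
    return dict(parts)


def map_phase(period, part, max_size=2):
    """Emit (itemset, (period, utility)) for every itemset found in one fragment.

    Assumption: candidates are enumerated by brute force up to max_size items.
    """
    local = defaultdict(int)
    for items in part:
        for k in range(1, max_size + 1):
            for itemset in combinations(sorted(items), k):
                local[itemset] += sum(PROFIT[i] * items[i] for i in itemset)
    return [(itemset, (period, u)) for itemset, u in local.items()]


def period_totals(parts):
    """Total utility of all transactions in each period."""
    return {
        p: sum(PROFIT[i] * q for items in part for i, q in items.items())
        for p, part in parts.items()
    }


def reduce_phase(values, totals):
    """Combine per-period utilities of one itemset into a simplified ratio."""
    utility = sum(u for _, u in values)
    denom = abs(sum(totals[p] for p, _ in values))
    return utility / denom if denom else 0.0


def mine(transactions, min_ratio):
    parts = partition_by_period(transactions)
    totals = period_totals(parts)
    grouped = defaultdict(list)
    for period, part in parts.items():  # each fragment could be a separate map task
        for key, value in map_phase(period, part):
            grouped[key].append(value)
    result = {}
    for itemset, values in grouped.items():  # each key could be a separate reduce task
        ratio = reduce_phase(values, totals)
        if ratio >= min_ratio:
            result[itemset] = ratio
    return result
```

Calling `mine(TRANSACTIONS, 0.5)` keeps only the itemsets whose simplified ratio reaches the threshold. Grouping the map output by itemset stands in for the shuffle step that a MapReduce framework would perform between the two phases.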
Source
Computer and Modernization (《计算机与现代化》)
2018, No. 4, pp. 13-16, 21 (5 pages in total)
Funding
Supported by the Natural Science Foundation of Fujian Province (Grant No. 2014J01229)