Abstract
Existing methods for mining data locations cannot obtain global frequent itemsets, which leads to low efficiency and long delays. To address this, an intelligent method for mining data point locations based on a parallel FP-Growth algorithm is proposed. First, the parallel FP-Growth algorithm is combined with MapReduce to obtain an optimized FPPM algorithm. Second, the reducer functions of the FPPM algorithm compute the local frequent itemsets of the transaction data set, and these are merged into the global frequent itemsets. Third, the attributes of each itemset are computed and the best attributes are selected by incremental classification. The occurrence probability of each attribute is then computed and a decision tree is built, which completes the mining of data point locations. Simulations were designed with CPU usage, mining delay, information entropy, and scalability as test metrics. The experimental results indicate that the proposed method mines efficiently while keeping mining delay low, and that it scales well.
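The local-to-global step described in the abstract can be illustrated with a minimal sketch. This is not the FPPM algorithm itself: it swaps FP-tree construction for brute-force enumeration of small itemsets, and it sums raw per-shard counts before applying the support threshold, which is an assumption made here so that no globally frequent itemset is missed. The names `local_counts`, `global_frequent`, `max_size`, and `min_support` are hypothetical and do not come from the paper.

```python
from collections import Counter
from itertools import combinations


def local_counts(shard, max_size=2):
    """Map step (illustrative): count every itemset of up to max_size items in one shard."""
    counts = Counter()
    for transaction in shard:
        items = sorted(set(transaction))  # canonical order, duplicates removed
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return counts


def global_frequent(shards, min_support, max_size=2):
    """Reduce step (illustrative): sum per-shard counts, keep itemsets meeting min_support."""
    total = Counter()
    for shard in shards:
        total.update(local_counts(shard, max_size))
    return {itemset: n for itemset, n in total.items() if n >= min_support}


# Two shards stand in for the partitions that separate workers would process.
shards = [
    [["a", "b", "c"], ["a", "b"]],
    [["a", "c"], ["b", "c"], ["a", "b", "c"]],
]
result = global_frequent(shards, min_support=3)
```

In a real MapReduce job each call to `local_counts` would run on a separate worker; the loop in `global_frequent` only imitates that sequentially.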
Authors
TANG Wen-wei (唐雯炜)
LI Zhi-min (李志敏)
Information Technology Center, Zhejiang Chinese Medical University, Hangzhou, Zhejiang 310053, China
Source
Computer Simulation (《计算机仿真》)
Peking University Core Journal
2022, Issue 8, pp. 519-523 (5 pages)
Keywords
Algorithm
Item set
Mining delay
Location of data point
Incremental classification