Abstract
Frequent itemset mining is a key step in association rule mining, and its performance is of great importance to association rule mining as a whole. The classical Apriori algorithm generates candidate itemsets inefficiently and scans the database repeatedly. To address these shortcomings, we analyze the principle and efficiency of the Apriori algorithm and propose a frequent itemset mining algorithm based on a prejudgment and screening strategy. The algorithm draws a random sample from the original dataset and uses the set of supports of the sample's frequent itemsets for prejudgment and screening, so that the candidate itemsets of the original dataset are pruned a second time. A damping factor and a compensation factor are introduced to correct the error caused by prejudgment and screening, so as to control the misjudgment rate and the omission rate of the algorithm. Experimental results show that the proposed algorithm has better time efficiency.
Authors
李德辰
吕一帆
赵学健
LI De-chen; LYU Yi-fan; ZHAO Xue-jian (School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing 210023, China; School of Modern Post, Nanjing University of Posts and Telecommunications, Nanjing 210003, China)
Source
Computer Technology and Development (《计算机技术与发展》)
2018, No. 5, pp. 99-102 (4 pages)
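Funding
National Natural Science Foundation of China (61373135, 61401225, 61572262, 61502246, 61672299)
China Postdoctoral Science Foundation (2015M581844)
Basic Research Program of Jiangsu Province (Natural Science Foundation) (BK20140883, BK20140894, BK20150869)
Jiangsu Province Postdoctoral Research Funding Program (1501125B)
Nanjing University of Posts and Telecommunications University-Level Research Fund (NY214101, NY215147)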