
Study of the Apriori Algorithm Based on Candidate Itemset Pruning (cited by: 4)

Abstract: When the classic Apriori algorithm is applied to big-data mining, it generates a large number of candidate itemsets and must scan every transaction in the database, which sharply reduces its efficiency in practice. To make Apriori more efficient, this paper (1) builds a linear linked list that records the number of items in each transaction, so that transactions can be compressed; (2) sets an up value that raises the threshold for combining frequent 1-itemsets, so that only 2-item candidate sets with relatively high support are generated, which prunes the candidate space; and (3) determines the admissible range of up experimentally, so that the error in the resulting frequent itemsets stays within an acceptable range. Experiments show that the improved method still finds the vast majority of association rules while improving the algorithm's running efficiency.
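The abstract does not spell out the exact rules, so the following is only a minimal Python sketch of how the two ideas could fit into a standard Apriori loop, under stated assumptions. It assumes that the per-transaction item count is used to skip any transaction shorter than k during the k-th counting pass (a transaction with fewer than k items cannot contain a k-itemset), and that `up` acts as a multiplier on the minimum support count: a frequent 1-itemset is paired into a 2-item candidate only if its count reaches `up * min_count`. The function name `apriori_pruned` is hypothetical, and a Python list of `(length, itemset)` pairs stands in for the paper's linked list.

```python
from itertools import combinations


def apriori_pruned(transactions, min_count, up=1.0):
    """Hypothetical sketch: Apriori with length-based transaction skipping
    and an assumed 'up' multiplier applied when pairing frequent 1-itemsets."""
    # Assumed form of transaction compression: keep each transaction's item count
    # alongside its items so that short transactions are skipped in later passes.
    records = [(len(t), frozenset(t)) for t in transactions]

    def count(candidates, k):
        counts = dict.fromkeys(candidates, 0)
        for length, items in records:
            if length < k:
                continue  # too short to contain any k-itemset
            for c in candidates:
                if c <= items:
                    counts[c] += 1
        return counts

    # Frequent 1-itemsets, exactly as in classic Apriori.
    singles = {frozenset([i]) for _, t in records for i in t}
    l1 = {c: n for c, n in count(singles, 1).items() if n >= min_count}
    frequent = dict(l1)

    # Assumed form of the 'up' pruning: only 1-itemsets whose count reaches
    # up * min_count are combined into 2-item candidates (up=1.0 disables it).
    strong = [c for c, n in l1.items() if n >= up * min_count]
    candidates = {a | b for a, b in combinations(strong, 2)}

    k = 2
    while candidates:
        lk = {c: n for c, n in count(candidates, k).items() if n >= min_count}
        frequent.update(lk)
        prev = set(lk)
        k += 1
        # Standard Apriori join step, then drop candidates with an infrequent (k-1)-subset.
        joined = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        candidates = {c for c in joined if all(c - {x} in prev for x in c)}
    return frequent


tx = [
    ["bread", "milk"],
    ["bread", "diaper", "beer", "eggs"],
    ["milk", "diaper", "beer", "cola"],
    ["bread", "milk", "diaper", "beer"],
    ["bread", "milk", "diaper", "cola"],
]
print(len(apriori_pruned(tx, 3, up=1.0)))  # 8 frequent itemsets
print(len(apriori_pruned(tx, 3, up=1.3)))  # 7: {diaper, beer} is no longer generated
```

With `up=1.0` the sketch reduces to classic Apriori. On this toy data, `up=1.3` excludes `beer` (count 3, below 3.9) from the 2-candidate step, so `{diaper, beer}` is missed even though it is frequent. This illustrates the trade-off the abstract describes: a smaller candidate set in exchange for a bounded loss of frequent itemsets, which is why the range of up has to be tuned experimentally.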
Source: Journal of Fuyang Normal University (Natural Science), 2014, No. 4, pp. 79-83 (5 pages)
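Funding: Anhui Provincial Key Research Base Project (SK2012B625); Anhui Provincial Pilot Project for Comprehensive Program Reform (2013zy167); Fuyang Normal University Pilot Project for Comprehensive Program Reform (2013ZYSD05)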
Keywords: transaction compression; candidate set pruning; association rules; Apriori algorithm

