Mining positively correlated frequent itemsets (挖掘正相关的频繁项集)


Abstract: When association rules are generated from frequent itemsets, using lift to test for positive correlation between a rule's antecedent and consequent avoids some uninteresting associations. However, this neither guarantees that the items within the antecedent or within the consequent are positively correlated, nor reduces the time spent mining frequent itemsets. When the antecedent or consequent of a rule contains negatively correlated items, uninteresting association rules may still be produced. To address these problems, this paper proposes the concept of positively correlated frequent itemsets, defined on the basis of mathematical expectation, and improves an algorithm that mines frequent itemsets directly in the FP-tree so that it mines positively correlated frequent itemsets. Experiments show that the algorithm substantially reduces the number of frequent itemsets generated and markedly lowers the time cost of mining them. The algorithm performs well on large datasets, especially dense ones.
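The gap described in the abstract can be illustrated with a small sketch. The abstract does not give the paper's exact definition, so the check below rests on an assumption: an itemset counts as positively correlated when its observed support exceeds the support expected if its items were independent (the product of the single-item supports). The function names and the toy transactions are hypothetical and are not taken from the paper.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def lift(transactions, antecedent, consequent):
    """Lift of the rule antecedent -> consequent; a value above 1 suggests positive correlation."""
    joint = support(transactions, set(antecedent) | set(consequent))
    denom = support(transactions, antecedent) * support(transactions, consequent)
    if denom == 0:
        raise ValueError("antecedent or consequent never occurs")
    return joint / denom


def expected_support(transactions, itemset):
    """Support expected under independence: product of the single-item supports."""
    expected = 1.0
    for item in itemset:
        expected *= support(transactions, [item])
    return expected


def is_positively_correlated(transactions, itemset):
    # Assumed criterion (not confirmed by the source): observed > expected.
    return support(transactions, itemset) > expected_support(transactions, itemset)


# Toy data: a and b rarely occur together, yet the rule {a, b} -> {c} has high lift.
transactions = [frozenset(t) for t in (
    ["a", "b", "c"],
    ["a"], ["a"], ["a"], ["a"],
    ["b"], ["b"], ["b"], ["b"],
    ["c"],
)]

if __name__ == "__main__":
    print("lift({a,b} -> {c}):", lift(transactions, {"a", "b"}, {"c"}))
    print("{a,b} positively correlated:", is_positively_correlated(transactions, {"a", "b"}))
```

In this toy data the rule passes a lift test even though the items in its antecedent co-occur less often than independence would predict, which is the situation the abstract says a lift check alone cannot rule out.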

Authors: Wang Chunkai (王春凯), Li Ruinan (李睿楠), Fan Ming (范明)

Affiliation: School of Information Engineering, Zhengzhou University

Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list; 2007, No. 1, pp. 108-110, 142 (4 pages)

Funding: Natural Science Foundation of Henan Province (Grant No. 0211050100)

Keywords: association rules; frequent itemsets; FP-tree; positive correlation

Classification: TP311.131 [Automation and Computer Technology: Computer Software and Theory]


