Mining positively correlated frequent itemsets
Abstract: When association rules are generated from frequent itemsets, using lift to test for positive correlation between a rule's antecedent and consequent avoids some uninteresting rules. However, lift neither guarantees that the items within the antecedent or within the consequent are positively correlated with one another, nor reduces the time spent mining frequent itemsets. A rule can still be uninteresting if its antecedent or consequent contains negatively correlated items. To address these problems, this paper defines positively correlated frequent itemsets on the basis of mathematical expectation, and improves an algorithm that mines frequent itemsets directly in the FP-tree so that it mines positively correlated frequent itemsets. Experiments show that the algorithm substantially reduces the number of frequent itemsets generated and markedly lowers the time spent mining them. It performs well on large datasets, especially dense ones.
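The gap the abstract describes can be illustrated with a small sketch. The abstract does not give the paper's exact definition, so the code below assumes one common expectation-based reading: an itemset counts as positively correlated when its observed support exceeds the support expected if its items were independent (the product of the single-item supports). The function names and the toy transactions are hypothetical, and the FP-tree mining step itself is not reproduced.

```python
from math import prod


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def lift(antecedent, consequent, transactions):
    """Lift of the rule antecedent -> consequent; values above 1 suggest positive correlation."""
    joint = support(frozenset(antecedent) | frozenset(consequent), transactions)
    denom = support(antecedent, transactions) * support(consequent, transactions)
    return joint / denom if denom else 0.0


def expected_support(itemset, transactions):
    """Support expected under independence (assumed definition, not taken from the paper)."""
    return prod(support({item}, transactions) for item in itemset)


def is_positively_correlated(itemset, transactions):
    """Assumed test: observed support is greater than the independence expectation."""
    return support(itemset, transactions) > expected_support(itemset, transactions)


# Hypothetical toy data: a and b rarely occur together, but whenever they do, c is present.
transactions = (
    [frozenset("abc")]
    + [frozenset("a")] * 4
    + [frozenset("b")] * 4
    + [frozenset("d")]
)

# The rule {a, b} -> {c} passes the lift test ...
print(lift({"a", "b"}, {"c"}, transactions) > 1)
# ... even though the items of its antecedent are negatively correlated.
print(is_positively_correlated({"a", "b"}, transactions))
```

In this toy data the lift filter accepts the rule while the antecedent {a, b} fails the itemset-level test, which is the kind of case the abstract says lift alone cannot exclude. Applying such a test while itemsets are being mined, rather than after rules are generated, is what would allow fewer itemsets to be produced in the first place.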
Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2007, No. 1, pp. 108-110, 142 (4 pages)
Funding: Natural Science Foundation of Henan Province (0211050100)
Keywords: association rules; frequent itemsets; FP-tree; positive correlation