
A Fast Mining Algorithm for Frequent Essential Itemsets (cited by 6)
Abstract: Traditional methods for mining frequent essential itemsets must repeatedly generate candidate itemsets and scan the database, which makes them inefficient. To address this, a fast algorithm for mining frequent essential itemsets, FMEP, is proposed. FMEP uses the Rymon enumeration tree as its search space and applies a divide-and-conquer strategy that selects particular paths for pruning. By exploiting an anti-monotone property specific to frequent essential itemsets, it can quickly decide whether a candidate itemset is a frequent essential itemset without comparing its disjunctive support with that of every direct subset, which speeds up mining. Experimental results show that the algorithm finds all elements of the concise representation based on frequent essential itemsets while reducing running time: compared with the MEP algorithm, it reduces running time by more than a factor of two on dense datasets and by at least 30% on sparse datasets.
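The abstract describes the search only at a high level. The Python code below is a minimal sketch, not FMEP itself: the paper's exact path-selection and pruning rules are not reproduced. It illustrates the two ingredients the abstract names, a depth-first walk of a Rymon enumeration tree and pruning on an anti-monotone test.

The sketch assumes the definition of essential patterns given by Casali et al.: an itemset X is essential when removing any item lowers its disjunctive support. Equivalently, every item of X occurs in at least one transaction that contains no other item of X. It also assumes that `minsup` is an absolute transaction count.

Rather than recomputing the disjunctive supports of all direct subsets, the sketch keeps, for each item, the transactions in which it is the only member of X, and updates these sets incrementally. The function name `mine_frequent_essential` and the `private` bookkeeping are hypothetical illustrations.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = Tuple[str, ...]
TidSet = FrozenSet[int]


def mine_frequent_essential(transactions: List[Set[str]], minsup: int) -> List[Itemset]:
    """Depth-first walk of a Rymon (set-enumeration) tree over the sorted items.

    Hypothetical sketch, not the FMEP algorithm. A node X is reported when it is
      * frequent: at least `minsup` transactions contain every item of X, and
      * essential: each item of X occurs in some transaction containing no other
        item of X, so dropping any item lowers the disjunctive support.
    Both properties are anti-monotone, so a failing node's whole subtree is skipped.
    """
    items = sorted({i for t in transactions for i in t})
    tids: Dict[str, TidSet] = {
        i: frozenset(k for k, t in enumerate(transactions) if i in t) for i in items
    }
    result: List[Itemset] = []

    def expand(prefix: Itemset, conj: TidSet, union: TidSet,
               private: Dict[str, TidSet], start: int) -> None:
        # conj:    tids containing every item of prefix (conjunctive cover)
        # union:   tids containing at least one item of prefix; len(union) is
        #          the disjunctive support of prefix
        # private: item -> tids in which that item is the only item of prefix
        for j in range(start, len(items)):  # Rymon tree: extend only with later items
            i = items[j]
            t_i = tids[i]
            new_conj = conj & t_i
            if len(new_conj) < minsup:
                continue  # infrequent: no superset in this subtree can be frequent
            own = t_i - union
            if not own:
                continue  # i adds no new transaction, so the disjunctive support does not grow
            new_private = {x: p - t_i for x, p in private.items()}
            if any(not p for p in new_private.values()):
                continue  # adding i removes the last private transaction of an earlier item
            new_private[i] = own
            child = prefix + (i,)
            result.append(child)
            expand(child, new_conj, union | t_i, new_private, j + 1)

    expand((), frozenset(range(len(transactions))), frozenset(), {}, 0)
    return result


if __name__ == "__main__":
    db = [{"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}, {"d"}]
    print(mine_frequent_essential(db, minsup=2))
    # [('a',), ('a', 'b'), ('a', 'c'), ('b',), ('b', 'c'), ('c',)]
```

On this toy database, {d} is pruned at `minsup=2` because it occurs in only one transaction. {a, b, c} is never reported, even at `minsup=1`, because every transaction containing a also contains b or c, so a has no private transaction.

Pruning cannot lose results. In a Rymon tree every itemset is reached by exactly one path, through its prefixes in item order. Each prefix is a subset, so an anti-monotone test that rejects a prefix also rejects every itemset below it.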
Authors: 田卫东 (Tian Weidong), 纪允 (Ji Yun)
Source: Computer Engineering (《计算机工程》), indexed in CAS and CSCD, 2014, no. 6, pp. 120-124 (5 pages)
Funding: National Natural Science Foundation of China (grant no. 60603068)
Keywords: data mining; frequent itemsets; concise representation; frequent essential itemsets; Rymon enumeration tree

References (12)

  • [1] Han Jiawei, Kamber M. Data Mining: Concepts and Techniques (Chinese edition)[M]. Beijing: China Machine Press, 2004.
  • [2] Li Jinfeng, Wang Huaibin. Correlation Analysis of Network Fault Alarms Based on Association Rules[J]. Computer Engineering, 2012, 38(5): 44-46. (in Chinese)
  • [3] Liu Guimei, Li J, Wong L. Positive Borders or Negative Borders: How to Make Lossless Generator Based Representations Concise[C]//Proc. of the 6th SIAM International Conference on Data Mining. [S. l.]: IEEE Press, 2006: 469-473.
  • [4] Calders T, Goethals B. Non-derivable Itemset Mining[J]. Data Mining and Knowledge Discovery, 2007, 14(1): 171-206.
  • [5] Pasquier N, Bastide Y, Taouil R. Discovering Frequent Closed Itemsets for Association Rules[C]//Proc. of ICDT'99. [S. l.]: IEEE Press, 1999: 398-416.
  • [6] Cheng Zhuanliu, Hu Xuegang. Mining Frequent Closed Patterns in Data Streams[J]. Computer Engineering, 2008, 34(16): 50-52. (in Chinese)
  • [7] Bykowski A, Rigotti C. A Condensed Representation to Find Frequent Patterns[C]//Proc. of PODS'01. [S. l.]: IEEE Press, 2001: 56-63.
  • [8] Kryszkiewicz M. Concise Representation of Frequent Patterns Based on Disjunction-free Generators[C]//Proc. of ICDM'01. [S. l.]: IEEE Press, 2001: 305-312.
  • [9] Kryszkiewicz M, Gajek M. Concise Representation of Frequent Patterns Based on Generalized Disjunction-free Generators[C]//Proc. of PAKDD'02. [S. l.]: IEEE Press, 2002: 159-171.
  • [10] Casali A, Cicchetti R, Lakhal L. Essential Patterns: A Perfect Cover of Frequent Patterns[C]//Proc. of the 7th International Conference on Data Warehousing and Knowledge Discovery. Copenhagen, Denmark: Springer-Verlag, 2005: 428-437.