Frequent pattern mining algorithm from uncertain data based on pattern-growth

Cited by: 7
Abstract To improve the time and space efficiency of frequent pattern (FP) mining over uncertain data, the Uncertain Frequent Pattern Mining based on Max Probability (UFPM-MP) algorithm is proposed. First, the expected support number of an itemset is estimated using the maximum probability value in the transaction itemset. This estimate is then compared with the minimum expected support number threshold to decide whether the itemset is a candidate frequent itemset, and a sub-tree is built for each candidate itemset to mine frequent patterns recursively. UFPM-MP was compared with the state-of-the-art AT-Mine (Array based Tail node Tree structure) algorithm on 6 typical datasets. The results show improved time and space efficiency: about 30% on sparse datasets and a more pronounced gain, about 3 to 4 times, on dense datasets. The estimation strategy effectively reduces the number of sub-trees and header-table items, which accounts for the gain; the time-efficiency improvement grows as the minimum expected support number decreases or as the number of frequent patterns to be mined increases.
Source Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2015, No. 7, pp. 1921-1926 (6 pages)
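Funding National Natural Science Foundation of China (61370200); Ningbo Natural Science Foundation (2013A610115, 2014A610073); General Research Project of the Zhejiang Provincial Department of Education (Y201432717); Bulk Commodity Special Project of Ningbo Dahongying University (1320133004)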
Keywords uncertain data; Frequent Pattern (FP); frequent itemset; pattern-growth
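
The pruning idea summarized in the abstract can be illustrated with a small sketch. It is not the UFPM-MP implementation: the paper builds sub-trees and header tables, while this sketch scans a plain list of transactions. The exact form of the estimate is an assumption made here for illustration: when a prefix is extended by one item, that item's probability in each transaction is replaced by the largest probability among the transaction's items outside the prefix, which can only overestimate the true expected support. The names `expected_support`, `max_prob_estimate`, `mine`, and the toy database `db` are all hypothetical.

```python
from math import prod  # Python 3.8+


def expected_support(itemset, db):
    """Exact expected support: over transactions containing every item,
    sum the product of the item probabilities."""
    return sum(
        prod(t[i] for i in itemset)
        for t in db
        if all(i in t for i in itemset)
    )


def max_prob_estimate(prefix, item, db):
    """Assumed upper bound on the expected support of prefix + (item,).

    The probability of `item` is replaced by the maximum probability among
    the transaction's items outside the prefix, so the estimate is never
    smaller than the exact value.
    """
    total = 0.0
    for t in db:
        if item in t and all(i in t for i in prefix):
            rest = [p for i, p in t.items() if i not in prefix]
            total += prod(t[i] for i in prefix) * max(rest)
    return total


def mine(db, min_exp_sup, prefix=(), items=None):
    """Pattern-growth style enumeration with estimate-based pruning."""
    if items is None:
        items = sorted({i for t in db for i in t})
    found = {}
    for k, item in enumerate(items):
        # Cheap test first: if even the overestimate is below the
        # threshold, the extension cannot be frequent.
        if max_prob_estimate(prefix, item, db) < min_exp_sup:
            continue
        candidate = prefix + (item,)
        support = expected_support(candidate, db)
        if support >= min_exp_sup:
            found[candidate] = support
            # Only frequent itemsets are grown further (downward closure).
            found.update(mine(db, min_exp_sup, candidate, items[k + 1:]))
    return found


# Toy uncertain database: each transaction maps item -> existence probability.
db = [
    {"a": 0.9, "b": 0.8, "c": 0.5},
    {"a": 0.6, "b": 0.7},
    {"b": 0.9, "c": 0.4},
    {"a": 0.5, "c": 0.3},
]
frequent = mine(db, min_exp_sup=1.0)
for itemset, support in sorted(frequent.items()):
    print(itemset, round(support, 2))
```

In this toy run the extension of `("a",)` by `"c"` is discarded by the estimate alone, while the extension of `("b",)` by `"c"` passes the estimate and is rejected only by the exact computation, which shows that the estimate is a filter for candidates rather than a replacement for the exact expected support.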
