Efficient mining probabilistic frequent itemset in uncertain databases (一种有效的不确定数据概率频繁项集挖掘算法)

Cited by: 8
Abstract: The method that the PFIM algorithm uses to compute frequentness probability limits its applicability, and the algorithm must scan the database many times and generates a large number of candidate sets. To address these shortcomings, this paper proposes the EPFIM (efficient probabilistic frequent itemset mining) algorithm. First, a new method of computing frequentness probability makes it easier to update the frequentness probability of an itemset, so the algorithm can adapt to settings such as data streams, where itemset probabilities change. Second, the uncertain database is stored in a probability matrix, which reduces the number of database scans. In addition, exploiting the ordering of items and progressively deleting transactions that are no longer useful improves mining efficiency. Theoretical analysis and experimental results show that EPFIM performs better.
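Source: Application Research of Computers (《计算机应用研究》), 2012, No. 3, pp. 841-843 (3 pages). Indexed in CSCD and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (61163015); Ministry of Education "Chunhui Plan" (Z2009-1-01024)
Keywords: uncertain databases; possible worlds; expected support; probabilistic frequent itemset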
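
The abstract does not give the EPFIM formulas, so the following is only a minimal sketch of the quantities it names, not the authors' method. It assumes a probability matrix with one row per transaction and one column per item, assumes items and transactions are independent, and computes frequentness probability with the standard dynamic-programming recurrence for a sum of independent Bernoulli variables. The function names, the `min_sup` parameter, and the sample matrix are hypothetical.

```python
from math import prod


def itemset_probs(matrix, itemset):
    """Probability that each transaction contains every item in `itemset`.

    Assumption: items within a transaction are independent, so the
    probabilities in the selected columns are multiplied.
    """
    return [prod(row[i] for i in itemset) for row in matrix]


def expected_support(probs):
    """Expected number of transactions that contain the itemset."""
    return sum(probs)


def frequentness_probability(probs, min_sup):
    """P(support >= min_sup), assuming transactions are independent."""
    # dist[k] = probability that exactly k of the transactions seen so far
    # contain the itemset; counts of min_sup or more share dist[min_sup].
    dist = [1.0] + [0.0] * min_sup
    for p in probs:
        if p == 0.0:
            continue  # this transaction can never contain the itemset
        new = [0.0] * (min_sup + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1.0 - p)            # itemset absent
            new[min(k + 1, min_sup)] += mass * p  # itemset present
        dist = new
    return dist[min_sup]


if __name__ == "__main__":
    # Hypothetical 3-transaction, 2-item probability matrix.
    matrix = [
        [0.5, 1.0],
        [1.0, 0.5],
        [0.0, 0.9],
    ]
    probs = itemset_probs(matrix, (0, 1))
    print(expected_support(probs))
    print(frequentness_probability(probs, min_sup=1))
```

Because the matrix is held in memory, the per-transaction probabilities of any itemset can be read off without rescanning the database, and a transaction whose probability for an itemset is zero contributes nothing to any superset of that itemset, which is one plausible reading of "deleting transactions that are no longer useful".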