期刊文献+

不确定性数据流上频繁项集挖掘的有效算法 被引量:14

Efficient Algorithm for Mining of Frequent Itemsets over Uncertain Data Streams
下载PDF
导出
摘要 在很多应用中,不确定性数据都是以流的形式产生,例如传感器网络数据,移动对象跟踪数据等等.已有的基于不确定性数据流的频繁项集挖掘算法往往具有数据流储存结构繁琐、维护困难以及算法的计算量大等缺点.针对这种情况,提出了一种有效的数据结构SRUF-tree用于储存不确定性数据事务流的项集,该结构由全局树SRtree、临时表Table和窗口队列Queue三部分组成,其中全局树压缩着最近窗口容纳的所有的项集,临时表存储着每批项集的信息.基于该结构设计了一种新的算法SRUF-mine,它挖掘流频繁项集时只需要深度遍历全局树,动态维护SRUF-tree结构只需要处理窗口队列中最旧一批项集的临时表.理论和实验结果表明,SRUF-mine算法是一种有效的挖掘不确定性数据流频繁项集的算法,时空效率和扩展性均优于UF-streaming算法. 在很多应用中,不确定性数据都是以流的形式产生,例如传感器网络数据,移动对象跟踪数据等等.已有的基于不确定性数据流的频繁项集挖掘算法往往具有数据流储存结构繁琐、维护困难以及算法的计算量大等缺点.针对这种情况,提出了一种有效的数据结构SRUF-tree用于储存不确定性数据事务流的项集,该结构由全局树SRtree、临时表Table和窗口队列Queue三部分组成,其中全局树压缩着最近窗口容纳的所有的项集,临时表存储着每批项集的信息.基于该结构设计了一种新的算法SRUF-mine,它挖掘流频繁项集时只需要深度遍历全局树,动态维护SRUF-tree结构只需要处理窗口队列中最旧一批项集的临时表.理论和实验结果表明,SRUF-mine算法是一种有效的挖掘不确定性数据流频繁项集的算法,时空效率和扩展性均优于UF-streaming算法.
出处 《计算机研究与发展》 EI CSCD 北大核心 2011年第S3期1-7,共7页 Journal of Computer Research and Development
基金 国家自然科学基金项目(60703111 61070005 61033010) 广东省科技计划项目(2010B080701062) 高校基本科研业务费中山大学青年教师培育项目(11lgpy63)
关键词 不确定数据 数据流 频繁项集 uncertain data data streams frequent itemsets
  • 相关文献

参考文献10

  • 1周傲英,金澈清,王国仁,李建中.不确定性数据管理技术研究综述[J].计算机学报,2009,32(1):1-16. 被引量:185
  • 2Agrawal R,Srikant R.Fast Algorithms for Mining Association Rules. Proceedings of the 20th International Conference on Very Large Databases(VLDB’94) . 1994
  • 3J. Pei,J. Han,H. Lu,S. Nishio,S. Tang,D. Yang.H-Mine: Hyper-structure mining of frequent patterns in large databases. Proceedings of the 2001 International Conference on Data Mining . 2001
  • 4LEUNG C K S,CARMICHAEL C L,HAO Bo-yu.Efficient miningof frequent patterns from uncertain data. Proc of the 17th IEEEInternational Conference on Data Mining Workshops . 2007
  • 5Leung C K S,Hao B.Mining of frequent items from streams of uncertain data. Proc IEEE Computer Society . 2009
  • 6Yun.C,Haixun W.Maintaining Closed Frequent Itemsets over a Stream Sliding Window. The 4th IEEE International Conference on Data Mining . 2004
  • 7Giannella C,Han J,Pei J, et al.Mining frequent Patterns in Data Streams at Multiple Time Granularities. Next Generation Challenges and Future Directions . 2003
  • 8C.C. Aggarwal,Y. Li,J. Wang, et al.Frequent pattern mining with uncertain data. KDD‘09 . 2009
  • 9Green TJ,Tannen V.Models for incomplete and probabilistic information. IEEE Transactions on Knowledge and Data Engineering . 2006
  • 10Abiteboul S,Kanellakis P,Grahne G.On the representation and querying of sets of possible worlds. Proc of the ACM SIGMOD Int Conf on Management of Data . 1987

二级参考文献98

  • 1金澈清,钱卫宁,周傲英.流数据分析与管理综述[J].软件学报,2004,15(8):1172-1181. 被引量:161
  • 2谷峪,于戈,张天成.RFID复杂事件处理技术[J].计算机科学与探索,2007,1(3):255-267. 被引量:54
  • 3Deshpande A, Guestrin C, Madden S, Hellerstein J M, Hong W. Model-driven data acquisition in sensor networks// Proceedings of the 30th International Conference on Very Large Data Bases. Toronto, 2004:588-599
  • 4Madhavan J, Cohen S, Xin D, Halevy A, Jeffery S, Ko D, Yu C. Web-scale data integration: You can afford to pay as you go//Proceedings of the 33rd Biennial Conference on Innovative Data Systems Research. Asilomar, 2007:342-350
  • 5Liu Ling. From data privacy to location privacy: Models and algorithms (tutorial)//Proceedings of the 33rd International Conference on Very Large Data bases. Vienna, 2007: 1429- 1430
  • 6Samarati P, Sweeney L. Generalizing data to provide anonymity when disclosing information (abstract)//Proeeedings of the 17th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. Seattle, 1998:188
  • 7Cavallo R, Pittarelli M. The theory of probabilistic databases//Proceedings of the 13th International Conference on Very Large Data Bases. Brighton, 1987:71-81
  • 8Barbara D, Garcia-Molina H, Porter D. The management of probabilistic data. IEEE Transactions on Knowledge and Data Engineering, 1992, 4(5): 487-502
  • 9Fuhr N, Rolleke T. A probabilistic relational algebra for the integration of information retrieval and database systems. ACM Transactions on Information Systems, 1997, 15(1): 32-66
  • 10Zimanyi E. Query evaluation in probabilistic databases. Theoretical Computer Science, 1997, 171(1-2): 179-219

共引文献184

同被引文献163

  • 1谈恒贵,王文杰,李克双.频繁项集挖掘算法综述[J].计算机仿真,2005,22(11):1-4. 被引量:6
  • 2谢洁锐,胡月明,刘才兴,刘兰.无线传感器网络的时间同步技术[J].计算机工程与设计,2007,28(1):76-77. 被引量:9
  • 3王伟平,李建中,张冬冬,郭龙江.一种有效的挖掘数据流近似频繁项算法[J].软件学报,2007,18(4):884-892. 被引量:33
  • 4Jin Che-Qing, Yi Ke,Chen Lei,Yu Xu,LinXue Min. Slieling Window Top-K Queries on Uncertain Stream. Proceedings of the VLDB Endowment, 2008, 1(1):301-312.
  • 5G.Cormade,M.Garofalakis. Sketching Probabilistic Data Stre- am. Proceeding of the 2007 ACM SIGMOD International Conference on Management of Data. Beijing, 2007:281-292.
  • 6T.S.Jayram,S.kale,E.Vee. Efficient Aggregation Algorithms for Probabilistic Data. Proceeding of the 18th Annual ACM- SIAM Symposium on Discrete Algorithms New-Orleans,2007: 346-355.
  • 7D.Pfoser, C.S.Jensen. Capturing the Uncertainty of Moving- Object Representation. In SSD,1999:111-132.
  • 8G.Trajcevski, O.Wolfson. Managing Uncertain Trajectories of Moving Objects with DOMINO. In ICEIS,2002:217-224.
  • 9G.Trajcerski. Probabilistic Range Queries in Moving Objects Databases with Uncertainty. In MobiDE,2003:39-45.
  • 10Y.Tao,R.Cheng,X.Xiao. Indexing Multi-Dimensional Uncer- tain Data with Arbitrary Probability Density Function. In VLDB 2005:922-933.

引证文献14

二级引证文献69

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部