
A Top-K query algorithm for uncertain data streams based on sliding windows

Cited by: 2
Abstract: Uncertain data streams arise widely in real-world applications such as mobile computing, radio frequency identification (RFID), and wireless sensor networks, so answering queries quickly with limited storage is an important problem in uncertain data stream management. This paper studies maximal probabilistic Top-K tuple set (MPTopKTS) queries over uncertain data streams under the sliding-window model and presents an algorithm for processing them. The algorithm stores the stream in a sliding window and maintains three synopsis tables. Each tuple consists of a data item x, a score f(x), and an existential probability p(x), and the tuples of the current window are stored in the three tables ordered by arrival time, by score, and by probability, respectively. For successively larger sets of the highest-scoring tuples, the algorithm selects the k tuples with the highest probabilities and computes the probability that this k-tuple set occurs. It then returns the most probable candidate set as the MPTopKTS answer. We prove theoretically that the most probable of these candidate k-tuple sets is the Top-K query result. Experimental results show that the proposed algorithm has lower time and space complexity than other similar algorithms.
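The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on assumptions not stated above: tuples are mutually independent, the window is count-based with a fixed size, and the probability of a candidate set S within a score prefix is taken to be the product of p(x) over S times the product of 1 - p(x) over every other tuple in that prefix (higher-scoring tuples outside S must be absent). Under that model, taking the k highest-probability tuples of each prefix maximizes the prefix probability. The names SlidingWindowTopK, UTuple, and the tid field are hypothetical; the three lists stand in for the three synopsis tables, are simply re-sorted on each insert for clarity, and the query is quadratic in the window size.

```python
from collections import deque
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class UTuple:
    tid: int       # arrival sequence number (hypothetical field, assumed unique)
    x: object      # data item x
    score: float   # score f(x)
    prob: float    # existential probability p(x), assumed independent


class SlidingWindowTopK:
    """Hypothetical count-based sliding window with three synopsis views."""

    def __init__(self, window_size: int, k: int):
        self.w = window_size
        self.k = k
        self.by_arrival = deque()  # table 1: arrival order, oldest first
        self.by_score = []         # table 2: descending score
        self.by_prob = []          # table 3: descending probability

    def insert(self, t: UTuple) -> None:
        self.by_arrival.append(t)
        self.by_score.append(t)
        self.by_prob.append(t)
        # Re-sort for clarity; ties are broken by arrival order.
        self.by_score.sort(key=lambda u: (-u.score, u.tid))
        self.by_prob.sort(key=lambda u: (-u.prob, u.tid))
        if len(self.by_arrival) > self.w:
            old = self.by_arrival.popleft()  # expire the oldest tuple
            self.by_score.remove(old)
            self.by_prob.remove(old)

    def query(self):
        """Return (most probable top-k tuple set, its probability), or None."""
        n = len(self.by_score)
        if n < self.k:
            return None
        rank = {t.tid: i for i, t in enumerate(self.by_score)}
        best_set, best_p = None, -1.0
        for i in range(self.k, n + 1):  # prefix = the i highest-scoring tuples
            # Walk the probability table, keeping the first k tuples in the prefix.
            chosen = [t for t in self.by_prob if rank[t.tid] < i][: self.k]
            ids = {t.tid for t in chosen}
            # Chosen tuples must occur; the other prefix tuples must be absent.
            p = prod(t.prob for t in chosen) * prod(
                1 - t.prob for t in self.by_score[:i] if t.tid not in ids
            )
            if p > best_p:
                best_set, best_p = chosen, p
        return sorted(best_set, key=lambda u: -u.score), best_p
```

For example, with a window holding four tuples whose (score, probability) pairs are (10, 0.3), (9, 0.9), (8, 0.8), and (7, 0.5), and k = 2, the sketch returns the tuples scored 9 and 8 with probability 0.9 * 0.8 * 0.7 = 0.504. Taking the maximum over all prefixes gives the same value as checking every k-subset, because a set's value in a longer prefix can only be smaller than in the prefix that ends at its lowest-scored member.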
Source: Journal of Nanjing University (Natural Science), 2012, no. 3, pp. 351-359 (9 pages). Indexed in CSCD and the Peking University core journal list.
Funding: National Natural Science Foundation of China (61070047); Natural Science Foundation of Jiangsu Province (BK2008206)
Keywords: uncertain data; data streams; Top-K queries; sliding window
