期刊文献+

不确定数据上两种查询的分布式聚集算法 被引量:11

Distributed Aggregations for Two Queries over Uncertain Data
下载PDF
导出
摘要 不确定数据查询技术在军事、金融、电信等领域中起到了越来越重要的作用.不确定性数据在传感器网络、分布式Web Server及P2P系统等分布式系统中广泛存在.从这些系统中收集所有数据进行集中式查询将带来巨大的通信开销、时间延迟和存储代价.同时,由于不确定数据的特点,大多数集中式不确定查询算法在分布式环境下并不适用.给出不确定数据的最大值和Top-k聚集查询定义,并分别提出了基于过滤策略的分布式聚集算法.算法根据给出的3个过滤策略,利用数据的分布区间和概率进行筛选概率上限的计算,尽可能将不影响查询结果的数据抛弃.同时,算法以相对较小的代价归并保存并传输了计算最终查询结果所需要的"不可丢弃"数据.实验结果表明,在各类系统和数据条件下,过滤算法都能够正确地得到查询结果并显著降低系统的数据通信开销. The technique of querying uncertain data is playing an increasingly important role in the fields of military affairs,finance,and telecom.Actually,a lot of uncertain data is generated in distributed systems such as wireless sensor networks,distributed Web servers and P2P systems.Collecting all the uncertain data from such a system to perform centralized queries will lead to immense communication cost,time delay and storage cost.Also,most centralized query methods are not applicable to distributed systems due to the features of uncertain data.In this paper,distributed uncertain Max (Min) and Top-k queries are defined,and designed distributed aggregation algorithms are designed based on several filtering strategies.The algorithms compute the upper bound of the candidate probability for each data entry using the uncertain ranges and probability distributions of the data,and filter the data entries as much as possible,which have no chance to influence the query result.Also,the algorithms merge and keep a relatively small amount of necessary data in the transmission,to compute the final results.The correctness of the filtering strategies has been proved.Simulations show that the algorithms can process the queries correctly and reduce communication cost efficiently with various data and system circumstances.
出处 《计算机研究与发展》 EI CSCD 北大核心 2010年第5期762-771,共10页 Journal of Computer Research and Development
基金 国家自然科学基金项目(60773068 60703012 60773063) 国家"九七三"重点基础研究发展计划基金项目(2006CB303000) 国家自然科学基金重点项目(60533110) 黑龙江省青年科技专项基金项目(QC06C033) NSFC/RGC联合科研基金项目(60831160525)~~
关键词 不确定数据 分布式聚集 TOP-K查询 过滤策略 传感器网络 uncertain data distributed aggregation Top-k query filtering strategy sensor networks
  • 相关文献

参考文献12

  • 1周傲英,金澈清,王国仁,李建中.不确定性数据管理技术研究综述[J].计算机学报,2009,32(1):1-16. 被引量:185
  • 2Singh S,Mayfield C,Shah R,et al.Query selectivity estimation for uncertain data[C] //Proc of the 20th Int Conf on Scientific and Statistical Database Management (SSDBM'08).New York:Springer,2008:61-78.
  • 3Abiteboul S,Kanellakis P,Grahne G.On the representation and querying of set of possible worlds[J].ACM SIGMOD Record,1987,16(3):34-48.
  • 4Cheng R,Kalashnikov D V,Prabhakar S.Evaluating probabilistic queries over imprecise data[C] //Proc of the 2003 ACM SIGMOD Int Conf on Management of Data (SIGMOD'03).New York:ACM,2003:551-562.
  • 5Soliman M A,Ilyas I F,Chang K C -C.Top-k query processing in uncertain databases[C] //Proc of the 23rd Int Conf on Data Engineering (ICDE'07).Piscataway,NJ:IEEE,2007:896-905.
  • 6Hua M,Pei J,Zhang W,et al.Ranking queries on uncertain data:A probabilistic threshold approach[C] //Proc ACM Int Conf on Management of Data (SIGMOD'08).New York:ACM,2008:673-686.
  • 7Jin Che-Qing,Yi Ke,Chen Lei,et al.Sliding-window top-k queries on uncertain streams[J].Proc of VLDB Endowment,2008,1(1):301-312.
  • 8Cheng R,Chen J,Mokbel M,et al.Probabilistic Verifiers:Evaluating constrained nearest-neighbor queries over uncertain data[C] //Proc of Int Conf on Data Engineering (ICDE 2008).Piscataway,NJ:IEEE,2008:973-982.
  • 9Qi Y,Singh S,Shah R,et al.Indexing probabilistic nearest-neighbor threshold queries[C] //Proc of the 1st Workshop on Management of Uncertain Data (MUD'08).Enschede,Netherlands:Centre for Telematics and Information Technology,2008:87-102.
  • 10George Beskales,Mohamed A.Soliman,Ihab F.Ilyas.Efficient search for the top-k probable nearest neighbors in uncertain databases[J].Proc of VLDB Endowment,2008,1(1):326-339.

二级参考文献98

  • 1金澈清,钱卫宁,周傲英.流数据分析与管理综述[J].软件学报,2004,15(8):1172-1181. 被引量:161
  • 2谷峪,于戈,张天成.RFID复杂事件处理技术[J].计算机科学与探索,2007,1(3):255-267. 被引量:54
  • 3Deshpande A, Guestrin C, Madden S, Hellerstein J M, Hong W. Model-driven data acquisition in sensor networks// Proceedings of the 30th International Conference on Very Large Data Bases. Toronto, 2004:588-599
  • 4Madhavan J, Cohen S, Xin D, Halevy A, Jeffery S, Ko D, Yu C. Web-scale data integration: You can afford to pay as you go//Proceedings of the 33rd Biennial Conference on Innovative Data Systems Research. Asilomar, 2007:342-350
  • 5Liu Ling. From data privacy to location privacy: Models and algorithms (tutorial)//Proceedings of the 33rd International Conference on Very Large Data bases. Vienna, 2007: 1429- 1430
  • 6Samarati P, Sweeney L. Generalizing data to provide anonymity when disclosing information (abstract)//Proeeedings of the 17th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. Seattle, 1998:188
  • 7Cavallo R, Pittarelli M. The theory of probabilistic databases//Proceedings of the 13th International Conference on Very Large Data Bases. Brighton, 1987:71-81
  • 8Barbara D, Garcia-Molina H, Porter D. The management of probabilistic data. IEEE Transactions on Knowledge and Data Engineering, 1992, 4(5): 487-502
  • 9Fuhr N, Rolleke T. A probabilistic relational algebra for the integration of information retrieval and database systems. ACM Transactions on Information Systems, 1997, 15(1): 32-66
  • 10Zimanyi E. Query evaluation in probabilistic databases. Theoretical Computer Science, 1997, 171(1-2): 179-219

共引文献184

同被引文献211

  • 1陈爱东,刘国华,费凡,周宇,万小妹,貟慧.满足均匀分布的不确定数据关联规则挖掘算法[J].计算机研究与发展,2013,50(S1):186-195. 被引量:18
  • 2李再进,谢勇,邬方,王红卫.物联网中PML服务器的设计和实现[J].物流技术,2004,23(11):80-82. 被引量:7
  • 3陈明,杨广文,刘学铮,史树明,王鼎兴.面向点对点的安全可靠存储系统[J].软件学报,2005,16(10):1790-1797. 被引量:4
  • 4张冬冬,李建中,王伟平,郭龙江.数据流历史数据的存储与聚集查询处理算法[J].软件学报,2005,16(12):2089-2098. 被引量:17
  • 5高宏 张炜.不确定图数据管理研究现状[J].中国计算机学会通讯,2009,5(4):31-36.
  • 6Michalis P, Francesco B, Aristides G. k -Nearest neighbors in uncertain graphs [J]. Proc of the VLDB Endowment, 2010, 3(1): 997-1008.
  • 7Hintsanen P, Toivonen H. Finding reliable subgraphs from large probabilistie graphs [J]. Data Mining and Knowledge Discovery, 2008, 17(1) :3-23.
  • 8Zou Zhaonian, Li Jianzhong, Gao Hong, et al. Frequent subgraph pattern mining on uncertain graph data [C] // Proc of the ACM Int Conf on Information and Knowledge Management. New York: ACM, 2009 : 583-592.
  • 9Zou Zhaonian, Li Jianzhong, Gao Hong, et al. Finding top-k maximal cliques in an uncertain graph [C] //Proc of the Annual IEEE Int Conf on Data Engineering. Piscataway, NJ: IEEE, 2010:649-652.
  • 10Zou Zhaonian, Gao Hong, Li Jianzhong. Discovering frequent subgraphs over uncertain graph databases under probabilistic semantics [C] //Proc of the ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining. New York: ACM, 2010:633-642.

引证文献11

二级引证文献44

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部