
Query processing on uncertain data stream based on frequency density distribution pattern

Cited by: 2
Abstract: To address the inaccurate modeling of uncertain objects in similarity queries over uncertain data streams, a similarity query method for uncertain data streams, HB-UTS, is proposed. Non-parametric estimation is used to model the objects in an uncertain data stream and obtain the density function of each uncertain object. Frequent patterns of the density functions are mined by spectral clustering, and the mined patterns are abstracted into a semantically represented uncertain data stream sequence. In the similarity query phase, an index structure for the uncertain data stream is built from a high-order Markov state transition matrix model. The index records the storage probability of each sequence element along with the storage address of the uncertain data stream, which improves the efficiency of step-by-step input queries. The method is evaluated by combining real and simulated data: experiments on a randomized real dataset, together with comparisons against other similarity query methods, verify that HB-UTS handles large-scale uncertain data streams effectively.
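The abstract names two building blocks that are easy to illustrate in isolation: a non-parametric density estimate for one uncertain object, and transition probabilities for a high-order Markov model over a sequence of pattern labels. The following is a minimal sketch under stated assumptions, not the HB-UTS implementation. The Gaussian kernel, the fixed bandwidth, the function names `gaussian_kde` and `transition_table`, and the sample data are all hypothetical, and the spectral clustering step is skipped by assuming the pattern labels are already given.

```python
import math
from collections import Counter, defaultdict


def gaussian_kde(samples, bandwidth):
    """Return a density function estimated from samples with a Gaussian kernel.

    Assumption: the abstract only says "non-parametric estimation"; the
    Gaussian kernel and fixed bandwidth are illustrative choices.
    """
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def transition_table(symbols, order=2):
    """Estimate order-k Markov transition probabilities from a symbol sequence.

    Maps each length-`order` context (a tuple) to {next_symbol: probability}.
    """
    counts = defaultdict(Counter)
    for i in range(len(symbols) - order):
        context = tuple(symbols[i:i + order])
        counts[context][symbols[i + order]] += 1
    return {
        ctx: {sym: c / sum(nxt.values()) for sym, c in nxt.items()}
        for ctx, nxt in counts.items()
    }


# Hypothetical uncertain object: repeated noisy readings of one value.
readings = [9.8, 10.1, 10.0, 10.3, 9.9]
pdf = gaussian_kde(readings, bandwidth=0.2)

# Hypothetical stream of pattern labels, standing in for the clustered
# density patterns described in the abstract.
stream = ["A", "B", "A", "B", "C", "A", "B", "A"]
table = transition_table(stream, order=2)
```

Here `table[("A", "B")]` gives the estimated distribution of the label that follows the context A, B. A table of this kind is one simple way to attach probabilities to sequence elements; how HB-UTS combines them with storage addresses in its index is not specified in the abstract.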
Authors: CHI Ronghua (迟荣华); HUANG Shaobin (黄少滨); LYU Tianyang (吕天阳) (College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China; Audit Research Institute of Chinese National Audit Office, Beijing 100073, China)
Source: Journal of Harbin Engineering University (《哈尔滨工程大学学报》), 2018, No. 6, pp. 1052-1058 (7 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: Major Research Plan of the National Natural Science Foundation of China (91546110)
Keywords: uncertainty; data stream; similarity query; non-parametric estimation; data mining; Markov