
Bias Sampling Algorithm for Data Streams Based on Sliding-Window Density Clustering (cited by: 2)

Published English title: Bias Sampling Data Stream Based on Sliding Window Density Clustering Algorithm Research
Abstract: In mobile computing, sampling is the most widely used technique for managing trajectory data streams of moving objects. Traditional uniform sampling, however, easily drops key changes in the data and so loses information. To address this problem, we propose a bias sampling algorithm for data streams based on probability density clustering. Working under the sliding-window model, the algorithm exploits the distribution characteristics of the trajectory data stream itself and combines them with the idea of bias sampling to overcome the data loss of uniform sampling. The algorithm first applies a clustering technique based on data existence density to partition the sliding window into strong clusters, weak clusters, and excessive clusters; it then assigns a different sampling rate to each kind of cluster and performs biased sampling to obtain the final summary of the data stream. Experiments on a real data set show that the algorithm preserves sampling quality well and processes data quickly.
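The abstract only outlines the pipeline (sliding window, density clustering into strong, weak, and excessive clusters, per-cluster sampling rates, biased sample), so the Python sketch below illustrates that pipeline under stated assumptions; it is not the authors' algorithm. It swaps in simple grid-cell point counts for the paper's density-based clustering. The cell size, the thresholds `strong_min` and `weak_min`, the rates in `DEFAULT_RATES`, and the mapping of dense, medium, and sparse cells to strong, weak, and excessive clusters are all hypothetical, as are the names `classify_cells`, `biased_sample`, and `SlidingWindowSampler`. Sparse cells get the highest rate on the assumption that, as in density-biased sampling generally, rare points carry the key changes that uniform sampling tends to lose.

```python
import random
from collections import Counter, deque
from typing import Deque, Dict, List, Optional, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]

# Hypothetical per-class sampling rates (an assumption, not the paper's values):
# sparser classes get higher rates so rare changes survive in the summary.
DEFAULT_RATES: Dict[str, float] = {"strong": 0.1, "weak": 0.5, "excessive": 1.0}


def cell_of(p: Point, cell: float) -> Cell:
    """Grid cell containing p; a simple stand-in for the paper's density measure."""
    return (int(p[0] // cell), int(p[1] // cell))


def classify_cells(window: List[Point], cell: float,
                   strong_min: int, weak_min: int) -> Dict[Cell, str]:
    """Label each occupied cell by how many window points fall in it."""
    counts = Counter(cell_of(p, cell) for p in window)
    labels: Dict[Cell, str] = {}
    for c, n in counts.items():
        if n >= strong_min:
            labels[c] = "strong"      # dense region
        elif n >= weak_min:
            labels[c] = "weak"        # medium density
        else:
            labels[c] = "excessive"   # sparse region (hypothetical mapping)
    return labels


def biased_sample(window: List[Point], cell: float = 1.0,
                  strong_min: int = 5, weak_min: int = 2,
                  rates: Optional[Dict[str, float]] = None,
                  rng: Optional[random.Random] = None) -> List[Point]:
    """Keep each point with the sampling rate assigned to its cell's class."""
    rates = DEFAULT_RATES if rates is None else rates
    rng = rng if rng is not None else random.Random()
    labels = classify_cells(window, cell, strong_min, weak_min)
    return [p for p in window if rng.random() < rates[labels[cell_of(p, cell)]]]


class SlidingWindowSampler:
    """Count-based sliding window over a point stream (window model is an assumption)."""

    def __init__(self, size: int, **sample_kwargs) -> None:
        # deque with maxlen drops the oldest point once the window is full
        self.window: Deque[Point] = deque(maxlen=size)
        self.sample_kwargs = sample_kwargs

    def push(self, p: Point) -> None:
        self.window.append(p)

    def summary(self) -> List[Point]:
        """Biased-sample summary of the points currently in the window."""
        return biased_sample(list(self.window), **self.sample_kwargs)
```

For example, `SlidingWindowSampler(size=1000, cell=0.01, rng=random.Random(7))` would keep the latest 1000 positions and return a reproducible summary on each `summary()` call. Recomputing the grid on every call keeps the sketch short; an incremental version would update cell counts as points enter and leave the window.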
Source: Computer Science (CSCD; Peking University Core Journal), 2013, no. 9, pp. 254-256, 269 (4 pages)
Funding: Supported by the Liaoning Provincial Plan Project Fund (2012232001) and the Natural Science Foundation of Liaoning Province (201202119)
Keywords: Trajectory data stream; Sliding window; Density clustering; Bias sampling