
Wavelet Synopsis Based Clustering of Parallel Data Streams (基于小波概要的并行数据流聚类)

Cited by: 7
Abstract: Many applications, such as sensor networks, real-time stock quotes, and network and communication monitoring, continuously generate large volumes of sequence data that evolve over time, forming time-series data streams; often many such streams arrive in parallel. Clustering is a powerful tool for analyzing these parallel data streams, but because the streams are unbounded in length, evolve over time, and carry large data volumes, traditional clustering methods cannot be applied directly. This paper exploits the amnesic feature of data streams and applies the Discrete Wavelet Transform to maintain, hierarchically and dynamically, a synopsis structure for each stream. Using this synopsis, approximate distances between streams and cluster centers can be computed quickly, which yields an efficient online version of the classical K-means clustering algorithm suited to parallel data streams. Experiments verify the effectiveness of the proposed method.
Source: Journal of Software (《软件学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2010, No. 4, pp. 644-658 (15 pages).
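Funding: National Natural Science Foundation of China (Nos. 60803021, 60973047); Natural Science Foundation of Zhejiang Province (No. Y1091189); Natural Science Foundation of Ningbo (Nos. 2007A610007, 2009A610072).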
Keywords: clustering; synopsis; amnesic feature; discrete wavelet transform; data stream
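
The abstract's core idea (compress each stream with a wavelet transform, then run K-means on approximate distances between the compressed forms) can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names (`haar_dwt`, `synopsis`, `approx_distance`, `kmeans_step`) are hypothetical, the Haar wavelet and the "keep the first k coefficients" rule are assumptions chosen for simplicity, and the amnesic, hierarchical, incremental maintenance described in the abstract is omitted entirely.

```python
import math

def haar_dwt(x):
    """Orthonormal Haar DWT of a sequence whose length is a power of two.

    Returns [approximation, coarsest detail, ..., finest details].
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = list(x)
    details = []
    s = math.sqrt(2.0)
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Finer details end up at the back, coarser ones are prepended.
        details = [(a - b) / s for a, b in pairs] + details
        approx = [(a + b) / s for a, b in pairs]
    return approx + details

def synopsis(x, k):
    """Assumed synopsis: the k coarsest Haar coefficients of a window."""
    return haar_dwt(x)[:k]

def approx_distance(s1, s2):
    """Euclidean distance between two synopses.

    Because the transform is orthonormal, this never exceeds the true
    Euclidean distance between the original windows.
    """
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s1, s2)))

def kmeans_step(synopses, centers):
    """One K-means iteration (assign, then update) in synopsis space."""
    labels = [
        min(range(len(centers)), key=lambda j: approx_distance(s, centers[j]))
        for s in synopses
    ]
    new_centers = []
    for j, old in enumerate(centers):
        members = [s for s, lab in zip(synopses, labels) if lab == j]
        if members:
            # The DWT is linear, so averaging synopses equals taking the
            # synopsis of the averaged windows.
            new_centers.append([sum(c) / len(members) for c in zip(*members)])
        else:
            new_centers.append(list(old))  # keep an empty cluster's center
    return labels, new_centers

# Four toy stream windows of length 8: two low-valued, two high-valued.
streams = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 2, 2, 2, 2],
    [9, 9, 9, 9, 9, 9, 9, 9],
    [9, 9, 8, 8, 9, 9, 9, 9],
]
syn = [synopsis(x, 4) for x in streams]
labels, centers = kmeans_step(syn, [syn[0], syn[2]])
```

In this sketch each window of 8 values is reduced to 4 coefficients, and the assignment step never touches the raw data. Since truncation can only drop energy, the synopsis distance is a lower bound on the true distance, which is one reason coarse wavelet coefficients are a common choice for approximate stream comparison.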
