
Clustering Evolving Data Streams over Sliding Windows

Times cited: 60
Abstract: To support clustering over sliding windows, this paper introduces two types of exponential histogram of cluster features, false positive and false negative, which support clustering analysis over windows with false-positive error and false-negative error, respectively. Building on these structures, it proposes a clustering method for data streams over sliding windows. Using memory sublinear in the window size, the method maintains an up-to-date summary of the distribution of the most recent records and can therefore cluster the data within the sliding window. It can also be extended to clustering over N-n windows, an extension of the sliding window model. The evolving data streams used in the experiments are built from the real KDD-CUP'99 and KDD-CUP'98 data sets and from synthetic data sets with changing Gaussian distributions. Theoretical analysis and experimental results show that the method achieves good clustering quality with low memory overhead and fast processing.
Source: Journal of Software (《软件学报》), 2007, Issue 4, pp. 905-918 (14 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60496325 and 60496327.
Keywords: evolving data stream; clustering; sliding window
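The abstract names the exponential histogram of cluster features but does not show how such a structure behaves. As a rough illustration only, the Python sketch below assumes the classic exponential-histogram merge rule of Datar et al. applied to additive cluster features (count, linear sum, squared sum) for a single cluster over a count-based window of the last N records. Because it keeps the oldest bucket until that bucket has expired completely, its summary may include a few expired records, which resembles the false-positive flavor; it is not the authors' algorithm. The names `ClusterFeature`, `CFExponentialHistogram`, `max_per_size`, and `query` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class ClusterFeature:
    """Additive cluster feature: count, linear sum, squared sum, plus
    (hypothetical) timestamps of the newest and oldest record covered."""
    n: int
    ls: List[float]
    ss: List[float]
    t_new: int
    t_old: int

    @staticmethod
    def of(x: Sequence[float], t: int) -> "ClusterFeature":
        return ClusterFeature(1, list(x), [v * v for v in x], t, t)

    def merge(self, other: "ClusterFeature") -> "ClusterFeature":
        # Cluster features are additive, so two buckets combine component-wise.
        return ClusterFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            [a + b for a, b in zip(self.ss, other.ss)],
            max(self.t_new, other.t_new),
            min(self.t_old, other.t_old),
        )

    def centroid(self) -> List[float]:
        return [v / self.n for v in self.ls]


class CFExponentialHistogram:
    """Exponential histogram of cluster features over the last `window` records.

    Assumption: at most `max_per_size` buckets of each size are allowed; when a
    size class overflows, its two oldest buckets are merged. The oldest bucket is
    kept until it expires completely, so counts are over-estimates."""

    def __init__(self, window: int, max_per_size: int = 2):
        self.window = window
        self.max_per_size = max_per_size
        self.buckets: List[ClusterFeature] = []  # newest bucket first
        self.now = 0

    def insert(self, x: Sequence[float]) -> None:
        self.now += 1
        self.buckets.insert(0, ClusterFeature.of(x, self.now))
        self._merge()
        self._expire()

    def _merge(self) -> None:
        i = 0
        while i < len(self.buckets):
            size = self.buckets[i].n
            j = i
            while j < len(self.buckets) and self.buckets[j].n == size:
                j += 1
            if j - i > self.max_per_size:
                # Merge the two oldest buckets of this size (adjacent, at j-2 and j-1);
                # the doubled bucket may now overflow the next size class.
                self.buckets[j - 2:j] = [self.buckets[j - 2].merge(self.buckets[j - 1])]
                i = j - 2
            else:
                i = j

    def _expire(self) -> None:
        # Drop buckets whose newest record has left the window.
        horizon = self.now - self.window
        while self.buckets and self.buckets[-1].t_new <= horizon:
            self.buckets.pop()

    def query(self, n: Optional[int] = None) -> Optional[ClusterFeature]:
        """Approximate cluster feature of the last n records (n <= window)."""
        n = self.window if n is None else min(n, self.window)
        horizon = self.now - n
        total: Optional[ClusterFeature] = None
        for b in self.buckets:  # newest first, so stop at the first stale bucket
            if b.t_new <= horizon:
                break
            total = b if total is None else total.merge(b)
        return total
```

In this sketch the over-count is at most the size of the oldest bucket minus one, and the number of buckets grows only logarithmically with the window. Calling `query(n)` with n smaller than the window sums only the buckets whose newest record falls within the last n records, which is one simple way to approximate an N-n window query from the same summary. A false-negative style variant could instead discard the oldest bucket as soon as its oldest record expires, under-counting rather than over-counting.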