
Efficient Data Stream Clustering Algorithm Based on k-Means Partitioning and Density (cited 8 times)
Abstract: Data stream clustering is an important topic in data stream mining. Most existing data stream clustering algorithms use k-median (or k-means) methods, which cannot effectively cluster data streams that are irregularly distributed or high-dimensional. This article proposes CLUSMD, a data stream clustering algorithm based on k-means partitioning and density. CLUSMD first divides the data stream into partitions and applies k-means clustering to each partition, producing an intermediate clustering result: a set of mean reference points. It then applies density-based clustering to these reference points to obtain the clustering result for each period. Theoretical analysis and experimental results show that CLUSMD effectively handles irregularly distributed and high-dimensional data streams, and that it is efficient and practical.
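The two-stage pipeline described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' CLUSMD implementation: it assumes plain Lloyd's k-means for the per-partition step and standard, unweighted DBSCAN as the density-based step, because the abstract does not specify the exact density criterion or whether reference points carry weights. The names `partitions`, `kmeans`, `dbscan` and `clusmd_sketch`, and the parameters `partition_size`, `k`, `eps` and `min_pts`, are illustrative assumptions. Pure Python is used so the sketch has no dependencies.

```python
# Hypothetical sketch of a k-means-partition + density clustering pipeline.
# Not the authors' CLUSMD code; DBSCAN stands in for the unspecified density step.
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def partitions(stream, size):
    """Cut an (unbounded) iterable into consecutive chunks of `size` points."""
    buf = []
    for p in stream:
        buf.append(p)
        if len(buf) == size:
            yield buf
            buf = []
    if buf:  # trailing, possibly shorter, partition
        yield buf


def kmeans(points, k, iters=20, seed=0):
    """Lloyd's k-means on one partition; returns k centroids (mean reference points)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct input points
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            groups[nearest].append(p)
        # new centroid = mean of its group; an empty group keeps its old centroid
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def dbscan(points, eps, min_pts):
    """Plain DBSCAN over a small point set; returns one label per point (-1 = noise)."""
    def neighbours(i):
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # reachable noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:  # j is a core point: keep expanding
                queue.extend(more)
    return labels


def clusmd_sketch(stream, partition_size, k, eps, min_pts):
    """Stage 1: k-means per partition -> reference points. Stage 2: DBSCAN on them."""
    refs = []
    for part in partitions(stream, partition_size):
        refs.extend(kmeans(part, min(k, len(part))))
    return refs, dbscan(refs, eps, min_pts)
```

For instance, a shuffled stream of 400 two-dimensional points drawn from two well-separated Gaussian blobs, passed as `clusmd_sketch(iter(points), partition_size=100, k=5, eps=3.0, min_pts=3)`, yields 4 × 5 = 20 mean reference points, which DBSCAN is expected to group into two clusters. Because only the reference points reach the density stage, its cost depends on the number of reference points rather than on the raw stream length. This sketch clusters all reference points at once; a per-period variant would run the second stage over the reference points of each period.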
Source: Journal of Chinese Computer Systems, 2007, No. 1, pp. 83-87 (5 pages). Indexed in CSCD; Peking University core journal.
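Funding: Supported by the National Natural Science Foundation of China (70371015) and the Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education (20040286009).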
Keywords: data stream clustering; mean reference point; density-based clustering