
A Grid-based Subspace Clustering Algorithm for High-dimensional Data Streams
Abstract Based on an analysis of grid-based clustering methods, we propose a subspace clustering algorithm that processes high-dimensional data streams online and finds clusters located in different subspaces. The algorithm combines a bottom-up grid method with a top-down grid method: a uniformly partitioned grid data structure summarizes (compresses) the data stream online, and a top-down grid partition is used to find the subspaces in which clusters are located. Because it draws on the compression ability of the bottom-up grid and the ability of the top-down grid to handle high-dimensional data, the algorithm needs only a single scan of the stream to identify clusters quickly. Theoretical analysis and experiments on real and synthetic datasets show that the algorithm is both accurate and efficient.
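The two stages named in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm, whose details are not given here. It assumes a fixed value range per dimension, keeps only non-empty cells of a uniform grid during a single pass, and then uses a simple per-dimension density test as a stand-in for the top-down partition step. The names `GridSummary`, `dense_intervals`, `relevant_dimensions` and the `factor` threshold are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]


class GridSummary:
    """One-pass summary of a stream on a uniform grid (illustrative only)."""

    def __init__(self, lows: Sequence[float], highs: Sequence[float], bins: int) -> None:
        self.lows = list(lows)      # assumed lower bound of each dimension
        self.highs = list(highs)    # assumed upper bound of each dimension
        self.bins = bins            # number of equal-width intervals per dimension
        self.counts: Counter = Counter()  # only non-empty cells are stored
        self.n = 0                  # number of points seen so far

    def cell_of(self, point: Sequence[float]) -> Cell:
        """Map a point to the index tuple of the grid cell that contains it."""
        idx = []
        for x, lo, hi in zip(point, self.lows, self.highs):
            k = int((x - lo) / (hi - lo) * self.bins)
            idx.append(min(max(k, 0), self.bins - 1))  # clamp out-of-range values
        return tuple(idx)

    def insert(self, point: Sequence[float]) -> None:
        """Absorb one stream element; the raw point is not kept."""
        self.counts[self.cell_of(point)] += 1
        self.n += 1

    def projection(self, dim: int) -> List[int]:
        """Histogram of the summarized points projected onto one dimension."""
        hist = [0] * self.bins
        for cell, count in self.counts.items():
            hist[cell[dim]] += count
        return hist


def dense_intervals(hist: Sequence[int], n: int, factor: float = 2.0) -> List[int]:
    """Bins whose count exceeds `factor` times the uniform expectation."""
    expected = n / len(hist)
    return [i for i, count in enumerate(hist) if count > factor * expected]


def relevant_dimensions(summary: GridSummary, factor: float = 2.0) -> Dict[int, List[int]]:
    """Dimensions with at least one dense interval, mapped to those intervals."""
    result = {}
    for dim in range(len(summary.lows)):
        dense = dense_intervals(summary.projection(dim), summary.n, factor)
        if dense:
            result[dim] = dense
    return result
```

In this sketch, a dimension in which the points are spread evenly produces no dense interval and is treated as irrelevant, while a dimension in which the points concentrate in a few intervals is reported as part of a candidate subspace. A full method would go on to combine dense intervals across dimensions and handle evolving streams, which this example does not attempt.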
Source Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2007, No. 4, pp. 199-203, 221 (6 pages)
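Funding Supported by the Natural Science Foundation of Hubei Province project "Research and Experiments on Key Technologies for Spatio-temporal Databases" (ABA048)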
Keywords Grid, Subspace clustering, Data stream, High-dimensional data
