Random Projection Based Clustering Method of Parallel Data Streams
Cited by: 3
Abstract: Exploiting the amnesic feature of data streams, a synopsis is maintained hierarchically and dynamically for each stream by means of random projections. Using the synopsis, the approximate distances between streams and cluster centers can be computed quickly, which yields an efficient online version of the classical K-means clustering algorithm suited to multiple parallel data streams. The experimental results show that the method is effective and achieves good clustering quality.
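The distance approximation that the abstract relies on can be illustrated with a minimal sketch. This is not the paper's synopsis structure: it omits the hierarchical, amnesic maintenance entirely and only shows how a Gaussian random projection roughly preserves Euclidean distances, so that a K-means assignment step can run on short sketches instead of full windows. All names (`make_projection`, `project`, `distance`, `assign`) and the choice of a Gaussian matrix are assumptions made for illustration.

```python
import math
import random


def make_projection(n, k, seed=0):
    """Return a k x n matrix with i.i.d. N(0, 1/k) entries (assumed Gaussian variant)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(k)]


def project(matrix, window):
    """Map a length-n window of stream values to its k-dimensional sketch."""
    return [sum(r * x for r, x in zip(row, window)) for row in matrix]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(sketches, centers):
    """K-means assignment step carried out on sketches only.

    Returns, for each sketch, the index of the nearest center sketch.
    """
    return [
        min(range(len(centers)), key=lambda j: distance(s, centers[j]))
        for s in sketches
    ]


if __name__ == "__main__":
    n, k = 256, 64  # window length and sketch length (illustrative values)
    R = make_projection(n, k, seed=1)
    a = [math.sin(0.1 * i) for i in range(n)]
    b = [-v for v in a]
    print("true distance:     ", distance(a, b))
    print("estimated distance:", distance(project(R, a), project(R, b)))
```

Because the projection is linear, the sketch of a cluster center (a mean of windows) equals the mean of the member sketches, so in this simplified setting centers can be updated without revisiting the raw windows.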
Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》; indexed in EI, CSCD, and the Peking University Core Journals list), 2009, No. 1, pp. 113-122 (10 pages)
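Funding: Supported by the National Natural Science Foundation of China (No. 60773072), the Natural Science Foundation of Zhejiang Province (No. Y104144), and the Zhejiang Provincial Department of Education (No. 20051737)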
Keywords: Synopsis, Amnesic Feature, Random Projection, Data Stream