Rotation Grid: A New Cluster Ensemble Method
Abstract: Grid-based clustering learns clusters at the granularity of grid cells, which makes it fast and efficient. However, it depends heavily on the choice of the density threshold, and every cluster it constructs has a zigzag boundary, so smooth boundary surfaces are recognized poorly. To address this problem, this paper proposes a new grid-oriented cluster ensemble algorithm, Rotation Grid (RG). Instead of creating diverse partitions by randomly sampling the data set or randomly initializing the parameters of the underlying algorithm, RG randomly splits the feature set into K subsets, applies a feature transformation to the subsets to obtain K different rotation bases, and forms a new feature space from them. A grid clustering algorithm is then applied in this new feature space to build diverse partitions. Experimental results show that RG effectively partitions data sets of arbitrary shape and size, reduces the dependence of grid clustering on the density threshold, smooths its coarse cluster boundaries, and achieves clearly higher accuracy than a single grid clustering.
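The pipeline summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the feature transformation, so PCA on each feature subset is assumed here; the grid clustering step is a toy version that uses one shared cell width for all axes; and the consensus step (majority co-association) is likewise an assumption. All function and parameter names (`random_rotation`, `grid_cluster`, `rotation_grid`, `bins`, `min_pts`) are hypothetical.

```python
import numpy as np

def random_rotation(X, k, rng):
    """Randomly split the features into k subsets (k <= number of features)
    and build a block rotation from a PCA basis of each subset (assumed)."""
    d = X.shape[1]
    R = np.zeros((d, d))
    for idx in np.array_split(rng.permutation(d), k):
        cov = np.atleast_2d(np.cov(X[:, idx], rowvar=False))
        _, vecs = np.linalg.eigh(cov)      # orthonormal eigenvectors
        R[np.ix_(idx, idx)] = vecs
    return R

def grid_cluster(X, bins, min_pts):
    """Toy grid clustering: cells holding at least min_pts points are dense,
    and dense cells that share a face are merged. Noise gets label -1."""
    lo = X.min(axis=0)
    width = max((X.max(axis=0) - lo).max(), 1e-12) / bins  # shared cell width
    cells = np.minimum(((X - lo) / width).astype(int), bins - 1)
    keys = [tuple(c) for c in cells.tolist()]
    counts = {}
    for key in keys:
        counts[key] = counts.get(key, 0) + 1
    dense = {key for key, n in counts.items() if n >= min_pts}
    cell_label, next_label = {}, 0
    for start in dense:
        if start in cell_label:
            continue
        cell_label[start] = next_label
        stack = [start]
        while stack:                        # flood fill over face neighbours
            cur = stack.pop()
            for axis in range(len(cur)):
                for step in (-1, 1):
                    nb = cur[:axis] + (cur[axis] + step,) + cur[axis + 1:]
                    if nb in dense and nb not in cell_label:
                        cell_label[nb] = next_label
                        stack.append(nb)
        next_label += 1
    return np.array([cell_label.get(key, -1) for key in keys])

def rotation_grid(X, k=2, n_partitions=10, bins=8, min_pts=3, seed=0):
    """Cluster X in several randomly rotated spaces, then link points that
    share a cluster in more than half of the partitions (assumed consensus)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    co = np.zeros((n, n))
    for _ in range(n_partitions):
        labels = grid_cluster(X @ random_rotation(X, k, rng), bins, min_pts)
        co += (labels[:, None] == labels[None, :]) & (labels[:, None] >= 0)
    linked = co / n_partitions > 0.5
    final, cur = np.full(n, -1), 0
    for i in range(n):
        if final[i] >= 0 or not linked[i, i]:
            continue                        # already labeled, or mostly noise
        final[i] = cur
        stack = [i]
        while stack:                        # connected components of `linked`
            j = stack.pop()
            for m in np.flatnonzero(linked[j] & (final < 0)):
                final[m] = cur
                stack.append(m)
        cur += 1
    return final
```

Calling `rotation_grid(X, k=2)` on an `(n_samples, n_features)` array returns one integer label per point, with `-1` for points treated as noise in most partitions. Because a rotation preserves distances, the sketch keeps the cells hypercubic by using a single cell width on every axis; a per-axis width would be an equally plausible reading of the abstract.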
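Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2011, No. 7, pp. 157-161 (5 pages)
Funding: Supported by the National Natural Science Foundation of China (Grant No. 60773048)
Keywords: Grid clustering; Clustering algorithm; Clustering ensemble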