
A Dual Clustering Method for Mixed-Attribute Data Sets (cited by: 2)

Dual clustering method of mixed data set
Abstract: To meet the data preprocessing needs of complex information environments, this paper proposes a dual clustering method that can handle mixed-attribute data sets. The method consists of a construction algorithm for a dual near-neighbor undirected graph (or its improved variant), followed by one of three graph clustering algorithms: one based on merging disjoint sets, one based on breadth-first search, or one based on depth-first search. Simulation experiments on artificial data sets and UCI standard data sets verify that the three clustering algorithms produce identical results even though they use different search strategies. The experiments also show that, on data sets with an apparent cluster structure and no near-neighbor noise, the method often achieves better clustering accuracy than the K-means and AP algorithms, which indicates that it is effective to a certain degree. Finally, some directions for further research are outlined, with a view to extending the method and finding practical applications for it.
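The abstract's claim that the three search strategies agree can be illustrated with a minimal sketch. Once an undirected neighbor graph exists, merging disjoint sets, breadth-first search, and depth-first search all recover its connected components, so they yield the same partition. The abstract does not define the graph construction, so `build_graph`, its thresholds `eps_num` and `max_mismatch`, and the sample points are hypothetical stand-ins: two records are linked only when they are close on both the numeric part and the categorical part. This is an assumption, not the paper's algorithm.

```python
from collections import deque


def build_graph(points, eps_num, max_mismatch):
    """Hypothetical construction (an assumption, not the paper's algorithm):
    link i and j when they are near on BOTH the numeric and categorical parts."""
    adj = {i: set() for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            (num_i, cat_i), (num_j, cat_j) = points[i], points[j]
            d_num = sum((a - b) ** 2 for a, b in zip(num_i, num_j)) ** 0.5
            d_cat = sum(a != b for a, b in zip(cat_i, cat_j))  # mismatch count
            if d_num <= eps_num and d_cat <= max_mismatch:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def clusters_union_find(adj):
    """Cluster by merging disjoint sets along every edge."""
    parent = list(range(len(adj)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u in adj:
        for v in adj[u]:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
    groups = {}
    for u in adj:
        groups.setdefault(find(u), set()).add(u)
    return {frozenset(g) for g in groups.values()}


def clusters_search(adj, breadth_first=True):
    """Cluster by graph traversal: queue-based (breadth-first) or
    stack-based (depth-first style), depending on the flag."""
    seen, result = set(), set()
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        frontier, component = deque([start]), {start}
        while frontier:
            u = frontier.popleft() if breadth_first else frontier.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    component.add(v)
                    frontier.append(v)
        result.add(frozenset(component))
    return result


# Each record: (numeric attributes, categorical attributes). Made-up data.
points = [
    ((0.0, 0.0), ("a",)),
    ((0.5, 0.0), ("a",)),
    ((5.0, 5.0), ("b",)),
    ((5.2, 5.1), ("b",)),
    ((0.4, 0.1), ("b",)),  # numerically near 0 and 1, but category differs
]
adj = build_graph(points, eps_num=1.0, max_mismatch=0)
by_sets = clusters_union_find(adj)
by_bfs = clusters_search(adj, breadth_first=True)
by_dfs = clusters_search(adj, breadth_first=False)
print(by_sets == by_bfs == by_dfs)
```

In this sketch the three functions differ only in the order in which they visit vertices; the partition depends on the graph alone, which is consistent with the agreement the abstract reports.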
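Author: Chen Xinquan (陈新泉)
Source: Computer Engineering & Science (《计算机工程与科学》), indexed in CSCD and the Peking University Core Journals list, 2013, No. 2, pp. 127-132 (6 pages)
Funding: Supported by the Scientific Research Project Program of Chongqing Three Gorges University (11ZZ-058)
Keywords: mixed data set; disjoint-set; breadth-first search; depth-first search; dual clustering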
