
A clustering algorithm for categorical variables based on connected components (基于连通分量的分类变量聚类算法)
Cited by: 4
Abstract: Existing similarity definitions for categorical variables have shortcomings, so a new similarity definition is proposed. Under this definition, a data set is abstracted as an undirected graph, and clustering is converted into the problem of finding the connected components of that graph. This yields a clustering algorithm for categorical variables based on connected components. To analyze clustering quality quantitatively, a new evaluation index is proposed for data sets whose class labels are known. Experimental results show that the proposed algorithm achieves higher clustering precision and faster execution than several existing algorithms.
Source: Control and Decision (《控制与决策》), indexed in EI, CSCD, and the Peking University Core Journals list, 2015, No. 1, pp. 39-45 (7 pages)
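Funding: National Natural Science Foundation of China (61402363, 61272284); Shaanxi Provincial Industrial Research Project (2014K05-49); Natural Science Basic Research Plan of Shaanxi Province (2014JQ8361); Xi'an Beilin District Science and Technology Plan Project (GX1405); Xi'an Science Plan Project (CXY1339(5)); University Characteristic Research Plan Project (116-211302)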
Keywords: clustering; categorical variables; similarity; connected components; clustering precision
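
The graph formulation described in the abstract can be illustrated with a minimal sketch. The abstract does not state the paper's new similarity definition, so this example substitutes plain simple matching (the fraction of attributes on which two records agree) and a hypothetical threshold `theta`. The function names and the toy records are also assumptions made for illustration. Records are vertices, an edge joins two records whose similarity reaches `theta`, and each connected component, found here with union-find, becomes one cluster.

```python
from itertools import combinations


def simple_matching(a, b):
    """Fraction of attributes on which two records agree.

    Assumption: a stand-in measure, not the similarity defined in the paper.
    """
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_by_components(records, theta):
    """Link records with similarity >= theta and return the connected
    components as sorted lists of record indices."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Each sufficiently similar pair is an edge; merge its two endpoints.
    for i, j in combinations(range(len(records)), 2):
        if simple_matching(records[i], records[j]) >= theta:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[max(ri, rj)] = min(ri, rj)

    # Group record indices by the root of their component.
    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical toy data: three categorical attributes per record.
records = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "large", "oval"),
    ("blue", "large", "square"),
    ("blue", "medium", "square"),
]
print(cluster_by_components(records, theta=0.6))  # [[0, 1, 2], [3, 4]]
```

Records 0 and 2 agree on only one attribute, yet they fall into the same cluster because both are linked to record 1. This transitivity is what distinguishes a connected-components formulation from a direct pairwise threshold. The pairwise loop is quadratic in the number of records, and the sketch makes no claim about the efficiency reported in the paper.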

