
Generalized Centroids Clustering Algorithm for Categorical Data
Abstract: Classic partition-based clustering algorithms represent each cluster by a single point. To address this limitation, a clustering algorithm for categorical data based on generalized centroids is proposed. The algorithm represents a cluster by a generalized centroid that contains multiple points and therefore reflects the distribution of the data within the cluster. New measures are proposed for the distance between generalized centroids and for the distance between clusters, together with a method for determining the generalized centroids and a clustering strategy that assigns objects to clusters based on them; the final clustering result is generally obtained after a single partition iteration. The generalized centroids algorithm is applied to four benchmark data sets and compared with the well-known partition clustering algorithm K-modes and two of its improved variants. The experimental results show that the generalized centroids algorithm achieves higher clustering accuracy with fewer iterations, and is effective and feasible.
Source: Operations Research and Management Science (《运筹与管理》), indexed in CSSCI, CSCD, and the Peking University Core Journals list, 2014, No. 6, pp. 37-43 (7 pages)
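Funding: National Natural Science Foundation of China (71271027); Fundamental Research Funds for the Central Universities (FRF-TP-10-006B); Specialized Research Fund for the Doctoral Program of Higher Education (20120006110037)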
Keywords: clustering algorithm; generalized centroid; categorical attribute; K-modes
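
The abstract contrasts a single-point cluster representative, as in K-modes, with a representative made of several points. The following is a minimal Python sketch of that contrast, not the paper's method: the simple matching dissimilarity and the per-attribute mode follow the standard K-modes formulation, while `distance_to_point_set` and `assign` are hypothetical helpers that assume a multi-point representative is a plain list of points and that an object's distance to it is the distance to its nearest point. The paper's own definitions of the generalized centroid and its distance measures are not given in this record.

```python
from collections import Counter


def matching_distance(a, b):
    """Simple matching dissimilarity: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def mode_of(cluster):
    """K-modes style single representative: the most frequent value of each attribute."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*cluster))


def distance_to_point_set(obj, points):
    """Hypothetical (assumed) distance to a multi-point representative:
    the distance from obj to the nearest point in the set."""
    return min(matching_distance(obj, p) for p in points)


def assign(objects, representatives):
    """Assign each object to the index of its nearest multi-point representative."""
    return [
        min(range(len(representatives)),
            key=lambda k: distance_to_point_set(obj, representatives[k]))
        for obj in objects
    ]


if __name__ == "__main__":
    cluster = [("red", "small", "round"),
               ("red", "large", "round"),
               ("blue", "small", "round")]
    print(mode_of(cluster))

    representatives = [
        [("red", "small", "round"), ("red", "large", "round")],
        [("blue", "small", "square"), ("green", "large", "square")],
    ]
    objects = [("red", "large", "round"), ("green", "small", "square")]
    print(assign(objects, representatives))
```

A single mode collapses each attribute to one value, so it cannot express that a cluster contains both "small" and "large" red objects; a set of points can, which is the kind of distributional information the abstract says a generalized centroid is meant to carry.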


