Abstract
Classic partition-based clustering algorithms have the limitation of representing each cluster by a single point. To address this, a clustering algorithm for categorical data based on generalized centroids is proposed. The algorithm represents each cluster by a generalized centroid that contains multiple points and can therefore reflect the data distribution of the cluster. New distance measures are proposed between generalized centroids and between clusters, together with a method for determining the generalized centroids and a clustering strategy that assigns objects to clusters on the basis of those centroids. In general, a single partition iteration is enough to obtain the final clustering result. The generalized centroid algorithm is applied to four benchmark data sets and compared with the well-known partition clustering algorithm K-modes and two of its improved variants. The experimental results show that the generalized centroid algorithm achieves higher clustering accuracy with fewer iterations, and that it is effective and feasible.
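For readers unfamiliar with the baseline, the following is a minimal sketch of plain K-modes, the single-mode partition scheme the abstract compares against. It is not the generalized centroid algorithm, whose definitions are not given in this record. The function names, the toy data, and the choice of initial modes are all assumptions made for illustration.

```python
from collections import Counter

def matching_distance(a, b):
    """Simple matching dissimilarity: count of attributes where a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def mode_of(records):
    """Attribute-wise most frequent value (ties go to the value seen first)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))

def k_modes(data, init_modes, max_iter=100):
    """Baseline K-modes: one mode per cluster, alternate assignment and update."""
    modes = [tuple(m) for m in init_modes]
    labels = None
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Assignment step: each object goes to the nearest mode.
        new_labels = [
            min(range(len(modes)), key=lambda j: matching_distance(x, modes[j]))
            for x in data
        ]
        if new_labels == labels:  # no object changed cluster, so stop
            break
        labels = new_labels
        # Update step: recompute each mode from its current members.
        for j in range(len(modes)):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # keep the old mode if a cluster becomes empty
                modes[j] = mode_of(members)
    return labels, modes, iteration

# Hypothetical toy data: six objects with three categorical attributes.
data = [
    ("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
    ("b", "z", "r"), ("b", "z", "s"), ("c", "z", "r"),
]
labels, modes, n_iter = k_modes(data, init_modes=[data[0], data[3]])
print(labels, modes, n_iter)
```

Each cluster here is summarized by exactly one mode, which is the single-point representation the abstract describes as a limitation. The pass count returned includes the final pass that confirms no assignment changed.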
Source
《运筹与管理》
CSSCI
CSCD
PKU Core Journals (北大核心)
2014, No. 6, pp. 37-43 (7 pages)
Operations Research and Management Science
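Funding
National Natural Science Foundation of China (71271027)
Fundamental Research Funds for the Central Universities (FRF-TP-10-006B)
Specialized Research Fund for the Doctoral Program of Higher Education (20120006110037)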
Keywords
clustering algorithm
generalized centroid
categorical attribute
K-modes