Improved K-modes clustering algorithm based on rough entropy

Abstract: The K-modes clustering algorithm is widely used in artificial intelligence, data mining, and related fields. The traditional K-modes algorithm clusters well, but it needs many iterations, is computationally expensive, and is easily disturbed by redundant attributes. It also defines the distance between the attribute values of two samples by simple 0-1 matching, which ignores how strongly each attribute influences the clustering result. To address these problems, this paper introduces rough entropy into the K-modes algorithm. First, a rough set attribute reduction algorithm eliminates redundant attributes and determines the importance of each attribute. Then rough entropy is used to determine the weight of each attribute, and these weights define a new intra-class distance. The proposed algorithm was compared with the traditional K-modes algorithm on four public data sets. The experimental results show that it achieves higher clustering accuracy than the traditional K-modes algorithm.
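The abstract describes the pipeline only in outline. The Python sketch below makes the three steps concrete, but under assumptions of our own that are not necessarily the paper's definitions:

(1) Attribute reduction is a greedy backward elimination. It keeps the partition U/IND(A) induced by all attributes unchanged.
(2) Rough entropy is taken as E_r(P) = sum over classes X of U/IND(P) of |X|/|U| * log2|X|. A kept attribute's weight is its normalized entropy increase when it is removed from the reduct.
(3) The intra-class distance is 0-1 matching with each mismatch scaled by that weight.

All function names and the toy data set are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def partition(data, attrs):
    """Equivalence classes of U/IND(attrs): row indices grouped by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(data):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def rough_entropy(data, attrs):
    """Assumed definition: E_r(P) = sum over classes X of |X|/|U| * log2|X|."""
    n = len(data)
    return sum(len(b) / n * math.log2(len(b)) for b in partition(data, attrs))


def reduct(data):
    """Hypothetical greedy backward elimination: drop an attribute if the remaining ones
    still induce the same partition as the full attribute set. Dropping attributes can
    only merge classes, so comparing the number of classes is enough."""
    attrs = list(range(len(data[0])))
    target = len(partition(data, attrs))
    for a in list(attrs):
        rest = [b for b in attrs if b != a]
        if len(partition(data, rest)) == target:
            attrs = rest
    return attrs


def attribute_weights(data, attrs):
    """Assumed weighting: a kept attribute's weight is proportional to the rough-entropy
    increase caused by removing it; attributes outside `attrs` get weight 0."""
    m = len(data[0])
    base = rough_entropy(data, attrs)
    sig = {a: rough_entropy(data, [b for b in attrs if b != a]) - base for a in attrs}
    total = sum(sig.values())
    if total <= 0:  # degenerate case (e.g. empty reduct): fall back to uniform weights
        return [1.0 / m] * m
    return [sig.get(a, 0.0) / total for a in range(m)]


def weighted_distance(x, mode, w):
    """Weighted simple matching: add w[a] for every attribute where x and mode differ."""
    return sum(wa for xa, ma, wa in zip(x, mode, w) if xa != ma)


def k_modes(data, k, w, max_iter=100, seed=0):
    """Batch K-modes loop (assign to nearest mode, recompute per-attribute modes)
    using the weighted distance above. Initial modes: k distinct rows at random."""
    rng = random.Random(seed)
    distinct = list(dict.fromkeys(data))  # avoid starting with two identical modes
    modes = [list(r) for r in rng.sample(distinct, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: weighted_distance(x, modes[c], w))
                      for x in data]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:  # an empty cluster keeps its previous mode
                modes[c] = [Counter(col).most_common(1)[0][0] for col in zip(*members)]
    return labels, modes


# Toy categorical data (hypothetical): column 0 is an exact copy of column 3.
data = [
    ("a", "x", "p", "a"),
    ("a", "x", "p", "a"),
    ("a", "y", "p", "a"),
    ("b", "y", "p", "b"),
    ("b", "y", "q", "b"),
    ("b", "y", "q", "b"),
]
R = reduct(data)                  # column 0 is found redundant and dropped
w = attribute_weights(data, R)    # weight 0 for column 0, positive for the rest
labels, modes = k_modes(data, 2, w)
print(R, [round(x, 3) for x in w], labels)
```

On the toy data, the reduction step keeps columns 1, 2 and 3. Column 0 therefore gets weight 0 and no longer affects the distance. Columns 1 and 2 each merge more equivalence classes when removed than column 3 does, so they receive larger weights. As in ordinary K-modes, the final labels still depend on the randomly chosen initial modes.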

Authors: Liu Caihui, Zeng Xiong, Xie Dehua

Affiliations: School of Mathematics and Computer Science, Gannan Normal University, Ganzhou 341000, China; School of Mechanical and Electronic Engineering, Ji'an Vocational and Technical College, Ji'an 343000, China

Source: Journal of Nanjing University of Science and Technology, 2024, No. 3, pp. 335-341 (7 pages). Indexed in CAS, CSCD, and the Peking University core journal list.

Funding: National Natural Science Foundation of China (62166001); Natural Science Foundation of Jiangxi Province (20202BAB202010).

Keywords: clustering; K-modes algorithm; rough sets; rough entropy; attribute reduction; weight

CLC classification number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
