Three-Way Decision Clustering Algorithm for Incomplete Data Based on q-Nearest Neighbors (基于q近邻的不完备数据三支决策聚类方法)

Cited by: 5

Abstract: Clustering is one of the key techniques in data mining. In many practical applications, limitations on data acquisition, misread data, and random noise produce large numbers of missing values, making data sets incomplete, and traditional clustering methods cannot be applied to such data sets directly. For numerical data, this paper proposes a three-way decision clustering algorithm for incomplete data based on q-nearest neighbors. First, the algorithm finds the q nearest neighbors of each object with missing values and fills each missing value with the average of those q neighbors. Second, it applies a density-peaks-based clustering method to the now "complete" data set to obtain a partition into clusters. Within each cluster, data objects that carry uncertainty are assigned to the cluster's boundary region, following the idea of three-way decisions. The three-way clustering result is represented with interval sets, so a cluster is typically divided into a positive region, a boundary region, and a negative region, which describes soft clustering results better. Experimental results on UCI data sets and synthetic data sets give preliminary evidence of the algorithm's effectiveness.
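The imputation step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how distances between incomplete objects are measured, so the partial distance used here (Euclidean distance over the attributes both objects have observed, rescaled by the share of attributes used) is an assumption, and the function name `impute_q_nearest` is hypothetical rather than taken from the paper.

```python
import numpy as np

def impute_q_nearest(data, q):
    """Fill NaN entries with the mean of the q nearest neighbours.

    Illustrative only: the distance measure is an assumed partial
    distance, not necessarily the one used in the paper.
    """
    data = np.asarray(data, dtype=float)
    filled = data.copy()
    missing = np.isnan(data)
    n_attrs = data.shape[1]

    # Visit every object that has at least one missing attribute.
    for i in np.where(missing.any(axis=1))[0]:
        observed_i = ~missing[i]
        for j in np.where(missing[i])[0]:
            candidates = []
            for k in range(len(data)):
                # A neighbour must be another object with attribute j observed.
                if k == i or missing[k, j]:
                    continue
                shared = observed_i & ~missing[k]
                if not shared.any():
                    continue
                diff = data[i, shared] - data[k, shared]
                # Assumed partial distance, rescaled to the full attribute count.
                dist = np.sqrt(np.sum(diff ** 2) * n_attrs / shared.sum())
                candidates.append((dist, k))
            if candidates:
                candidates.sort()
                nearest = [k for _, k in candidates[:q]]
                # Average the neighbours' observed values of attribute j.
                filled[i, j] = data[nearest, j].mean()
    return filled
```

For example, calling `impute_q_nearest([[1.0, 2.0], [2.0, 4.0], [10.0, 20.0], [1.5, float("nan")]], q=2)` fills the missing entry from the two objects closest to 1.5 on the first attribute. The density-peaks clustering and the three-way assignment to boundary regions would then run on the filled array; they are not sketched here because the abstract gives no details on how uncertainty is detected.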

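Authors: Su Ting, Yu Hong

Affiliation: Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications

Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), CSCD, PKU Core, 2016, No. 6, pp. 875-883 (9 pages)

Funding: National Natural Science Foundation of China, Nos. 61379114, 61272060

Keywords: incomplete data; three-way decision; clustering; q-nearest neighbors

Classification: TP181.1 [Automation and Computer Technology: Control Theory and Control Engineering]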




