
Three-Way Decision Clustering Algorithm for Incomplete Data Based on q-Nearest Neighbors (cited by: 5)
Abstract: Clustering is one of the key techniques in data mining. In many practical applications, limitations of data acquisition, data misreading, random noise and similar causes produce large numbers of missing values, leaving the data set incomplete, and traditional clustering methods cannot be applied directly to such data sets. For numerical data, this paper proposes a three-way decision clustering method for incomplete data. First, the method finds the q nearest neighbors of each object with missing values and fills each missing value with the average over those q neighbors. Second, it applies a density-peak-based clustering method to the resulting "complete" data set to obtain a partition into clusters, and then uses the idea of three-way decisions to assign the objects in each cluster that carry uncertainty to the boundary region of that cluster. The three-way decision clustering result is represented with interval sets, so a cluster is typically divided into a positive region, a boundary region and a negative region, which describes soft clustering results better. Experimental results on several UCI data sets and synthetic data sets provide preliminary evidence of the effectiveness of the proposed algorithm.
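Authors: Su Ting (苏婷), Yu Hong (于洪)
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), CSCD, Peking University Core Journal (北大核心), 2016, No. 6, pp. 875-883 (9 pages)
Funding: National Natural Science Foundation of China, Nos. 61379114 and 61272060
Keywords: incomplete data; three-way decision clustering; q-nearest neighbors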
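
The imputation step described in the abstract (fill each missing value with the average over the q nearest neighbors) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the distance measure or which objects may act as neighbors, so the function name `q_nearest_neighbor_impute`, the distance over shared observed attributes, and the rule that only neighbors observing the attribute contribute are all assumptions made here.

```python
import numpy as np

def q_nearest_neighbor_impute(X, q=3):
    """Fill NaN entries with the mean over q nearest neighbors (sketch)."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    missing = np.isnan(X)
    n = len(X)
    for i in np.where(missing.any(axis=1))[0]:
        observed_i = ~missing[i]
        # Assumption: distance is the root mean squared difference over
        # the attributes observed in both objects.
        dists = np.full(n, np.inf)
        for j in range(n):
            if j == i:
                continue
            shared = observed_i & ~missing[j]
            if shared.any():
                diff = X[i, shared] - X[j, shared]
                dists[j] = np.sqrt(np.mean(diff ** 2))
        order = np.argsort(dists)
        for a in np.where(missing[i])[0]:
            # Assumption: only neighbors that observe attribute a contribute.
            donors = [j for j in order
                      if np.isfinite(dists[j]) and not missing[j, a]][:q]
            if donors:
                filled[i, a] = X[donors, a].mean()
    return filled

# Hypothetical toy data: the last object is missing its second attribute.
data = np.array([
    [1.0, 2.0],
    [1.1, 2.1],
    [0.9, 1.9],
    [10.0, 20.0],
    [1.0, np.nan],
])
completed = q_nearest_neighbor_impute(data, q=3)
```

In this toy example the three closest objects to the incomplete one are the first three rows, so the far-away fourth row does not influence the filled value. The density-peak clustering and the three-way assignment to positive and boundary regions would then run on `completed`; they are not sketched here because the abstract does not give their details.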

