
Kernel Subspace Clustering Algorithm for Categorical Data

Cited by: 5
Abstract: Most existing subspace clustering methods for categorical data assume that features are mutually independent and ignore linear or nonlinear correlations between attributes. This paper proposes a kernel subspace clustering method for categorical data. First, kernel functions originally designed for continuous data are introduced to project categorical data into a kernel space, and a feature-weighted similarity measure for categorical data in that space is defined. Second, a kernel subspace clustering objective function is derived from this measure, together with an efficient optimization method for solving it. Finally, a kernel subspace clustering algorithm for categorical data is defined. The algorithm accounts for relationships between attributes in a nonlinear space and, during clustering, assigns each attribute a feature weight that measures its relevance to the clusters, which yields embedded feature selection for categorical attributes. A cluster validity index is also defined to evaluate the quality of categorical clustering results. Experiments on synthetic and real-world datasets show that, compared with existing subspace clustering algorithms, the proposed algorithm uncovers nonlinear relationships among categorical attributes and effectively improves clustering quality.
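To make the idea of a feature-weighted similarity in kernel space concrete, here is a minimal sketch. It is not the measure defined in the paper, which the abstract does not spell out. It assumes each categorical attribute is one-hot encoded and compared with a Gaussian kernel, and that the attribute weights are given. The function names `attribute_kernel` and `weighted_kernel_distance` and the parameter `sigma` are hypothetical.

```python
import math


def attribute_kernel(a, b, sigma=1.0):
    """Gaussian kernel on the one-hot encodings of two category labels.

    Assumption for illustration: one-hot vectors of two different
    categories are at squared Euclidean distance 2, so the kernel is
    1 for a match and exp(-1 / sigma**2) for a mismatch.
    """
    return 1.0 if a == b else math.exp(-1.0 / sigma ** 2)


def weighted_kernel_distance(x, y, weights, sigma=1.0):
    """Feature-weighted squared distance between two objects in kernel space.

    Per attribute, the kernel trick gives
    ||phi(a) - phi(b)||^2 = k(a, a) + k(b, b) - 2 k(a, b) = 2 - 2 k(a, b),
    and each attribute's contribution is scaled by its weight.
    """
    return sum(
        w * (2.0 - 2.0 * attribute_kernel(a, b, sigma))
        for w, a, b in zip(weights, x, y)
    )


# Two objects that differ only in the second attribute.
x = ("red", "small", "round")
y = ("red", "large", "round")
weights = (0.5, 0.3, 0.2)  # hypothetical attribute weights summing to 1
d = weighted_kernel_distance(x, y, weights)
```

In this sketch only mismatching attributes contribute to the distance, each in proportion to its weight, and a larger `sigma` shrinks the penalty for a mismatch. A subspace clustering algorithm of the kind the abstract describes would additionally learn the weights per cluster during optimization, which this sketch does not attempt.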
Authors: XU Kun-Peng (徐鲲鹏); CHEN Li-Fei (陈黎飞); SUN Hao-Jun (孙浩军); WANG Bei-Zhan (王备战)
Affiliations: College of Mathematics and Informatics, Fujian Normal University, Fuzhou 350117, China; Digital Fujian Internet-of-Things Laboratory of Environmental Monitoring (Fujian Normal University), Fuzhou 350117, China; College of Engineering, Shantou University, Shantou 515063, China; College of Software, Xiamen University, Xiamen 361005, China
Source: Journal of Software (《软件学报》; indexed in EI, CSCD, and the Peking University Core Journals list), 2020, No. 11, pp. 3492-3505 (14 pages)
Funding: National Natural Science Foundation of China (U1805263, 61672157); Fujian Provincial Department of Science and Technology project (JK2017007); Fujian Normal University Innovation Team project (IRTL1704).
Keywords: clustering; categorical data; kernel method; nonlinear measure; subspace