摘要
谱嵌入聚类(SEC)算法要求样本满足流形假设,样本标签总是可以嵌入到一个线性空间中去,这为线性可分数据的谱嵌入聚类问题提供了新的思路,但该算法使用的线性映射函数不适用于处理高维非线性数据。针对这一问题,通过核化线性映射函数,建立了基于核函数的谱嵌入聚类(KSEC)模型,该模型既能解决线性映射函数不能处理非线性数据的问题,又实现了对高维数据的核降维。在真实数据集上的实验分析结果表明,使用所提算法后聚类正确率平均提高了13.11%,最高可提高31.62%,特别在高维数据上平均提高了16.53%,而且在算法关于参数的敏感度实验中发现算法的稳定性更好。所以改进后的算法对高维非线性数据具有很好的聚类效果,获得了比传统谱嵌入聚类算法更高的聚类准确率和更好的聚类性能。所提方法可以用于诸如遥感影像这类复杂图像的处理领域。
Samples are required to meet the manifold assumption in Spectral Embedded Clustering (SEC) algorithm, and class labels of samples can always be embedded in a linear space, which provides a new idea for spectral clustering of linearly separable data, but the linear mapping function used by the spectral embedded clustering algorithm is not available to process the nonlinear high-dimensional data. To solve this problem, this paper cored the linear mapping function, built a Spectral Embedded Clustering based on Kernel function (KSEC) model. This model can solve the problem that the linear mapping function can't deal with nonlinear data, as well as it can achieve kernel's dimension reduction synchronously. The experimental results on real data sets show that the improved algorithm can improve the clustering accuracy by 13.11% averagely, and the highest 31.62%, especially for high-dimensional data clustering accuracy can be increased by 16.53% on average. And the sensitive experiments on algorithm to parameters show the stability of the improved algorithm, so compared with traditional spectral clustering algorithms, higher accuracy and better clustering performance are obtained. And the method can be used for such complex image processing field as remote sensing image.
出处
《计算机应用》
CSCD
北大核心
2015年第3期761-765,810,共6页
journal of Computer Applications
基金
国家自然科学基金青年科学基金资助项目(61403394)
国家863计划项目(2012AA011004
2012AA0622022)
中央高校基本科研业务费专项资金资助项目(2014QNA45)
教育部博士点基金资助项目(20100095110003
20110095110010)
关键词
谱聚类
谱嵌入
核函数
高维数据
spectral clustering
spectral embedded
kernel function
high-dimensional data