Abstract
Existing semi-supervised spectral clustering algorithms cannot handle large-scale data effectively, and because they ignore constraint propagation they cannot make full use of the limited constraint information. To address these problems, this paper proposes a semi-supervised spectral clustering algorithm that combines sparse representation and constraint propagation. First, a constraint matrix is generated from the constraint information and introduced into spectral clustering. Next, the data points in the constraint set are used as landmarks to construct a sparse representation matrix that approximates the graph similarity matrix, thereby improving the constrained spectral clustering model. Meanwhile, connected regions are generated from the similarity matrix of the landmarks, the neighboring points are adjusted dynamically within each connected region, and constraint propagation is used to further improve clustering accuracy. Experiments show that, compared with constrained spectral clustering, the proposed algorithm is markedly more efficient with no significant loss of accuracy, and that, compared with fast spectral clustering methods, it achieves higher clustering accuracy.
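The first step described in the abstract, turning pairwise constraints into a constraint matrix, can be illustrated with a minimal sketch. The abstract does not give the encoding, so the +1 / -1 / 0 convention below is an assumption, and the function name `build_constraint_matrix` is hypothetical. The propagation shown is only the simplest form (transitive closure of must-links, with cannot-links extended to whole must-link groups); it is a stand-in and not the connected-region scheme the paper proposes.

```python
from itertools import combinations


def build_constraint_matrix(n, must_link, cannot_link):
    """Return an n x n list-of-lists Q: +1 must-link, -1 cannot-link, 0 unknown.

    Assumed encoding; the abstract does not specify one.
    """
    # Union-find over must-link pairs: must-link is treated as transitive.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in must_link:
        parent[find(i)] = find(j)

    # Group point indices by their must-link component.
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    Q = [[0] * n for _ in range(n)]

    # Every pair inside the same component is must-linked.
    for members in groups.values():
        for i, j in combinations(members, 2):
            Q[i][j] = Q[j][i] = 1

    # A cannot-link between two points extends to their whole components.
    for i, j in cannot_link:
        ri, rj = find(i), find(j)
        if ri == rj:
            raise ValueError(f"cannot-link ({i}, {j}) contradicts must-links")
        for a in groups[ri]:
            for b in groups[rj]:
                Q[a][b] = Q[b][a] = -1

    return Q


# Five points: 0-1 and 1-2 must link, 2-3 cannot link, point 4 is unconstrained.
Q = build_constraint_matrix(5, must_link=[(0, 1), (1, 2)], cannot_link=[(2, 3)])
```

In this toy input, the pair (0, 2) becomes a must-link by transitivity and the pairs (0, 3) and (1, 3) inherit the cannot-link from (2, 3), while point 4 stays unconstrained. A matrix of this kind is what a constrained spectral clustering model would take as its side information.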
Authors
ZHAO Xiaoxiao (赵晓晓); ZHOU Zhiping (周治平)
Engineering Research Center of Internet of Things Technology Applications, Ministry of Education, Jiangnan University, Wuxi 214122, China
Source
CAAI Transactions on Intelligent Systems (《智能系统学报》)
CSCD
Peking University Core Journal
2018, No. 5, pp. 855-862 (8 pages)
Funding
National Natural Science Foundation of China (61373126)
Keywords
data mining
cluster analysis
spectral clustering
semi-supervised learning
sparse representation
constraint propagation