Abstract
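In the era of information explosion, big data processing has become one of the most active research topics worldwide. Spectral analysis algorithms are widely used because of their distinctive performance, but the curse of dimensionality means that handling high-dimensional data remains a serious challenge for mainstream spectral methods. This paper proposes a clustering model that combines feature selection with a graph Laplacian constraint: a data clustering algorithm based on joint Laplacian regularization and adaptive feature learning (LRAFL). The graph Laplacian is learned from adaptive neighbors, and low-dimensional embedding, feature selection, and subspace clustering are integrated into a single framework. This replaces the two-stage procedure of traditional spectral clustering, which first constructs the graph Laplacian and then solves the spectral problem. By adding nonnegativity and sum constraints together with a low-rank constraint, LRAFL obtains a sparse feature weight vector and a Laplacian matrix with block-diagonal structure. An efficient solver is also proposed for optimizing the model parameters, and the convergence, complexity, and setting of the balance parameters are analyzed theoretically. Experimental results on synthetic data and several public datasets show that LRAFL outperforms other existing data clustering algorithms in effectiveness, efficiency, and ease of implementation.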
The explosion of information has driven a wave of big data research in recent years. Despite the many empirical successes of spectral clustering algorithms, clustering high-dimensional data remains challenging because of the curse of dimensionality. This study proposes a novel algorithm, joint Laplacian regularization and adaptive feature learning (LRAFL), which adaptively learns the feature weights and integrates feature selection and clustering into a unified framework, rather than following the two-phase strategy of typical approaches. With a new rank constraint imposed on the Laplacian matrix, the number of connected components in the resulting similarity matrix exactly equals the number of clusters. An effective approach is also proposed to solve the formulated optimization problem. Comprehensive analyses of convergence behavior, computational complexity, and parameter determination are also presented. Experiments on synthetic data and benchmark datasets show that the proposed algorithm compares favorably with related state-of-the-art clustering approaches.
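The link between a rank constraint on the Laplacian and the number of clusters rests on a standard fact from spectral graph theory: for a nonnegative symmetric similarity matrix over n points, the multiplicity of the zero eigenvalue of the Laplacian equals the number of connected components, so a graph with c components has a Laplacian of rank n - c. The sketch below only illustrates that fact on a toy matrix. It is not the LRAFL solver, and the function names `laplacian` and `count_components`, the tolerance, and the toy data are assumptions made for illustration.

```python
import numpy as np

def laplacian(S):
    """Unnormalized graph Laplacian L = D - W of a symmetrized similarity matrix."""
    W = (S + S.T) / 2.0
    return np.diag(W.sum(axis=1)) - W

def count_components(S, tol=1e-8):
    """Count connected components as the multiplicity of eigenvalue 0 of L."""
    eigvals = np.linalg.eigvalsh(laplacian(S))
    return int(np.sum(eigvals < tol))

# Toy block-diagonal similarity matrix (assumed data): two groups of sizes 3 and 2.
S = np.zeros((5, 5))
S[:3, :3] = 1.0
S[3:, 3:] = 1.0
np.fill_diagonal(S, 0.0)

n = S.shape[0]
c = count_components(S)
rank_L = np.linalg.matrix_rank(laplacian(S), tol=1e-8)
print(n, c, rank_L)
```

Here the two blocks give two zero eigenvalues, so the rank of the Laplacian is n minus the number of components. Constraining that rank during graph learning is what lets the cluster assignment be read directly from the connected components.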
Authors
郑建炜
李卓蓉
王万良
陈婉君
ZHENG Jian-Wei; LI Zhuo-Rong; WANG Wan-Liang; CHEN Wan-Jun (School of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China; School of Computer and Computing Science, Zhejiang University City College, Hangzhou 310015, China)
Source
《软件学报》
EI
CSCD
PKU Core Journal (北大核心)
2019, No. 12, pp. 3846-3861 (16 pages)
Journal of Software
Funding
National Natural Science Foundation of China (61602413, 61873240)
Natural Science Foundation of Zhejiang Province (LY19F030016)