Abstract
Manifold learning algorithms assume, when constructing a graph model, that the observed data are sampled from a smooth manifold, but real high-dimensional data often contain noise or outliers due to various factors. The concept factorization (CF) algorithm cannot handle such noise effectively and does not take the geometric structure of the data into account. To address both problems, this paper proposes a sparse constrained manifold regularized concept factorization (SMCF) algorithm. SMCF imposes a sparsity constraint on the CF objective function through the l2,1 norm to obtain more discriminative feature vectors, and it constructs a graph Laplacian regularizer to capture the intrinsic manifold structure of the samples, which further improves its discriminative power. The objective function is solved with an iterative multiplicative update algorithm, and its convergence is proved. Experimental results on the PIE and AT&T face databases and the Reuters and TDT2 text corpora show that the proposed algorithm improves clustering accuracy and normalized mutual information.
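The two ingredients named in the abstract, the l2,1 norm and the graph Laplacian regularizer, can be illustrated with a minimal NumPy sketch. This is not the SMCF objective or its update rules, which this record does not give. The row-wise convention for the l2,1 norm, the symmetric 0/1 k-nearest-neighbor weighting, the unnormalized Laplacian L = D - W, and the function names `l21_norm`, `knn_graph_laplacian` and `manifold_penalty` are all assumptions made for illustration.

```python
import numpy as np


def l21_norm(M):
    """l2,1 norm under the row-wise convention (an assumption here):
    the sum of the Euclidean norms of the rows of M."""
    return float(np.sqrt((M ** 2).sum(axis=1)).sum())


def knn_graph_laplacian(X, k=2):
    """Unnormalized Laplacian L = D - W of a symmetric 0/1 k-NN graph.

    X holds one sample per row; k must be smaller than the number of samples.
    The 0/1 weighting is an illustrative choice, not taken from the paper.
    """
    n = X.shape[0]
    sq = (X ** 2).sum(axis=1)
    # Pairwise squared Euclidean distances between samples.
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a sample is never its own neighbor
    nearest = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    W = np.maximum(W, W.T)  # connect i and j if either is a neighbor of the other
    L = np.diag(W.sum(axis=1)) - W
    return L, W


def manifold_penalty(V, L):
    """Graph smoothness term tr(V^T L V) for a representation V (one row per sample)."""
    return float(np.trace(V.T @ L @ V))
```

Under these assumptions, `manifold_penalty(V, L)` equals half the sum over all pairs of `W[i, j] * ||V[i] - V[j]||^2`, so it is small when samples that are neighbors in the graph receive similar representations, while a small `l21_norm` favors matrices in which entire rows are close to zero.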
Source
《计算机辅助设计与图形学学报》
EI
CSCD
Peking University Core Journals (北大核心)
2016, Issue 3, pp. 381-394 (14 pages)
Journal of Computer-Aided Design & Computer Graphics
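Funding
National Natural Science Foundation of China (61272220, 61101197)
China Postdoctoral Science Foundation (2014M551599)
Jiangsu Key Laboratory of Image and Video Understanding for Social Safety Fund (30920130122006)
Jiangsu Province Graduate Research and Innovation Program for Regular Universities (KYLX_0383)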
Keywords
non-negative matrix factorization
concept factorization
l2,1 norm
manifold learning
graph Laplacian
clustering