Abstract
In real-world machine learning tasks, an instance is usually associated with multiple labels. Acquiring complete label information, however, costs considerable human and material resources, so multi-label learning frequently faces missing labels (missing supervised information). If the observed labels are treated as the known entries of an incomplete label matrix and the instance features as side information, the problem can be solved with matrix completion. Previous research has mainly addressed the linearly separable case. In this paper, we propose the KernelMaxide algorithm. It handles missing supervised information in linearly inseparable multi-label data, exploits the nonlinear structure of the data, and also takes the correlations between labels into account. Specifically, based on the representer theorem for the matrix nuclear norm, we construct a nuclear-norm minimization objective built on the kernel matrix, together with a corresponding optimization algorithm. We further use the Nyström method to reduce the memory and computational cost of the kernel matrix. Experiments demonstrate the superior performance of KernelMaxide.
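The abstract combines two ingredients: nuclear-norm minimization to complete a partially observed label matrix using features as side information, and the Nyström method to avoid storing the full n×n kernel matrix. The NumPy code below is a minimal sketch of how these two pieces can fit together. It is not the KernelMaxide algorithm from the paper. It rests on several assumptions that are not taken from the source: a Gaussian (RBF) kernel, uniform landmark sampling, identity label-side information, and a regularized objective $\min_M \tfrac12\|P_\Omega(\Phi M - Y)\|_F^2 + \lambda\|M\|_*$ solved by plain proximal gradient with singular value thresholding. All function names (`rbf_kernel`, `nystrom_features`, `svt`, `complete_labels`) and parameters (`gamma`, `m`, `lam`) are hypothetical.

```python
# Illustrative sketch only: NOT the KernelMaxide algorithm described in the paper.
# Assumptions: RBF kernel, uniform landmarks, identity label-side information,
# nuclear-norm-regularized least squares solved by proximal gradient.
import numpy as np


def rbf_kernel(A, B, gamma):
    # Gaussian (RBF) kernel k(a, b) = exp(-gamma * ||a - b||^2)
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def nystrom_features(X, m, gamma, rng):
    # Pick m landmark rows uniformly at random; return Phi with Phi @ Phi.T ~= K
    idx = rng.choice(X.shape[0], size=m, replace=False)
    C = rbf_kernel(X, X[idx], gamma)  # n x m block of the kernel matrix
    W = C[idx]                        # m x m block K[idx][:, idx]
    w, V = np.linalg.eigh(W)
    keep = w > 1e-10 * w.max()        # drop numerically zero eigenvalues
    # Phi = C W^{-1/2}, so Phi Phi^T = C W^+ C^T (the Nystrom approximation)
    return C @ (V[:, keep] / np.sqrt(w[keep]))


def svt(M, tau):
    # Proximal operator of tau * ||.||_*: soft-threshold the singular values
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def complete_labels(Phi, Y, mask, lam, iters=300):
    # Proximal gradient on 0.5 * ||mask * (Phi @ M - Y)||_F^2 + lam * ||M||_*
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2  # 1 / Lipschitz constant of the gradient
    M = np.zeros((Phi.shape[1], Y.shape[1]))
    for _ in range(iters):
        R = mask * (Phi @ M - Y)              # residual on observed entries only
        M = svt(M - step * (Phi.T @ R), step * lam)
    return Phi @ M                            # predicted scores for every entry
```

For example, suppose `X` is an n×d feature matrix, `Y` is a ±1 label matrix with missing entries set to 0, and `mask` is the 0/1 indicator of observed entries. Then `complete_labels(nystrom_features(X, 200, 0.1, np.random.default_rng(0)), Y, mask, lam=1.0)` returns a score for every entry, and thresholding at 0 gives predicted labels. The Nyström step replaces the n×n kernel matrix with an n×m factor, so memory falls from O(n²) to O(nm).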
Source
Scientia Sinica Informationis (《中国科学:信息科学》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2018, Issue 1, pp. 47-59 (13 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant No. 61333014)