Abstract
Positive-unlabeled (PU) classification learns a classifier from positive samples and unlabeled samples only. Because no negative samples are labeled, traditional classification methods often perform poorly in this setting. This paper exploits the relationship between the AUC (area under the receiver operating characteristic curve) under PU classification and the AUC under traditional classification, and uses the traditional AUC as the objective function for PU classification. A Gaussian kernel function maps the original samples into a high-dimensional space so that the data become linearly separable. Optimizing the AUC objective yields an analytical solution, which avoids repeated iterations and allows an incremental update formula to be derived, speeding up computation. Experimental results show that the proposed algorithm performs comparably to an ideal support vector machine (SVM) trained with the true labels of all positive and negative examples in the training set, and that it supports fast incremental updates, making it an effective tool for real-world problems.
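The relationship between the PU AUC and the ordinary (positive-versus-negative) AUC can be made explicit. The derivation below is a standard form of that identity, assuming the unlabeled data are drawn from the mixture of the positive density $p_P$ and negative density $p_N$ with class prior $\theta_P$, and that the scores $f(x)$ are continuous so ties have probability zero; whether the paper states it in exactly this form is an assumption.

```latex
\begin{aligned}
p_U(x) &= \theta_P\, p_P(x) + \theta_N\, p_N(x), \qquad \theta_P + \theta_N = 1, \\
\mathrm{AUC}_{PN}(f) &= \Pr_{x \sim p_P,\; x' \sim p_N}\big[f(x) > f(x')\big], \\
\mathrm{AUC}_{PU}(f) &= \Pr_{x \sim p_P,\; x' \sim p_U}\big[f(x) > f(x')\big]
  = \theta_P \Pr_{x,\, x' \sim p_P}\big[f(x) > f(x')\big] + \theta_N\, \mathrm{AUC}_{PN}(f) \\
 &= \frac{\theta_P}{2} + \theta_N\, \mathrm{AUC}_{PN}(f).
\end{aligned}
```

The middle probability equals $1/2$ because $x$ and $x'$ are two independent draws from the same distribution. Since $\theta_N > 0$, $\mathrm{AUC}_{PU}$ is an increasing affine function of $\mathrm{AUC}_{PN}$, so a scorer that maximizes the AUC over positive-unlabeled pairs also maximizes the ordinary AUC, and $\theta_P$ does not need to be known to find that maximizer.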
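The abstract does not give the paper's model, loss, or incremental formula, so the following is only a hypothetical sketch of how a kernelized AUC objective can have both a closed-form solution and an incremental update. It assumes a model $f(x) = \sum_j \alpha_j k(x, c_j)$ over a fixed set of centers $c_j$ with Gaussian kernel $k(x, c) = \exp(-\lVert x - c\rVert^2 / (2\sigma^2))$, and a squared-loss surrogate $\frac{1}{n_P n_U}\sum_{i,j}\big(1 - f(x_i^P) + f(x_j^U)\big)^2 + \lambda\lVert\alpha\rVert^2$ over all positive-unlabeled pairs. Setting the gradient to zero gives $(A/N + \lambda I)\,\alpha = b/N$ with $N = n_P n_U$, $A = n_U S_P + n_P S_U - s_P s_U^\top - s_U s_P^\top$ and $b = n_U s_P - n_P s_U$, where $s$ and $S$ are the per-set sums of kernel feature vectors $\phi(x)$ and of their outer products. Because $A$ and $b$ depend on the data only through these sums, a new sample is absorbed by an $O(m^2)$ update followed by an $m \times m$ solve, without revisiting earlier samples. The names `IncrementalPUAUC`, `centers`, `sigma`, and `lam`, and the squared loss itself, are illustrative assumptions; the code is dependency-free Python.

```python
import math

# Hypothetical sketch: squared-loss kernel PU-AUC with a closed-form fit.
# Not the authors' formulation; the loss and fixed centers are assumptions.


def gaussian_kernel(x, c, sigma):
    """Gaussian (RBF) kernel k(x, c) = exp(-||x - c||^2 / (2 sigma^2))."""
    sq = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return math.exp(-sq / (2.0 * sigma ** 2))


def solve(M, v):
    """Solve M a = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    a = [0.0] * n
    for i in range(n - 1, -1, -1):
        rest = sum(aug[i][k] * a[k] for k in range(i + 1, n))
        a[i] = (aug[i][n] - rest) / aug[i][i]
    return a


class IncrementalPUAUC:
    """Hypothetical kernel PU-AUC model whose fit depends only on running sums."""

    def __init__(self, centers, sigma=1.0, lam=1e-3):
        self.centers, self.sigma, self.lam = list(centers), sigma, lam
        m = len(self.centers)
        self.n = {"P": 0, "U": 0}                                     # counts per set
        self.s = {k: [0.0] * m for k in self.n}                       # sums of phi(x)
        self.S = {k: [[0.0] * m for _ in range(m)] for k in self.n}   # sums of phi phi^T
        self.alpha = [0.0] * m

    def features(self, x):
        return [gaussian_kernel(x, c, self.sigma) for c in self.centers]

    def add(self, x, label):
        """Absorb one 'P' or 'U' sample in O(m^2); old samples are never revisited."""
        if label not in self.n:
            raise ValueError("label must be 'P' or 'U'")
        phi = self.features(x)
        self.n[label] += 1
        s, S = self.s[label], self.S[label]
        for i, pi in enumerate(phi):
            s[i] += pi
            for j, pj in enumerate(phi):
                S[i][j] += pi * pj

    def fit(self):
        """Solve (A / N + lam I) alpha = b / N, with N = n_P * n_U."""
        nP, nU = self.n["P"], self.n["U"]
        if nP == 0 or nU == 0:
            raise ValueError("need at least one positive and one unlabeled sample")
        sP, sU, SP, SU = self.s["P"], self.s["U"], self.S["P"], self.S["U"]
        m, N = len(self.centers), nP * nU
        M = [[(nU * SP[i][j] + nP * SU[i][j] - sP[i] * sU[j] - sU[i] * sP[j]) / N
              + (self.lam if i == j else 0.0) for j in range(m)] for i in range(m)]
        v = [(nU * sP[i] - nP * sU[i]) / N for i in range(m)]
        self.alpha = solve(M, v)
        return self

    def score(self, x):
        return sum(a * k for a, k in zip(self.alpha, self.features(x)))


if __name__ == "__main__":
    P = [(2.0, 2.0), (2.5, 1.5), (1.5, 2.5)]
    U = [(2.2, 1.8), (-2.0, -2.0), (-1.5, -2.5), (-2.5, -1.5)]
    model = IncrementalPUAUC(centers=P + U, sigma=1.0, lam=1e-3)
    for x in P:
        model.add(x, "P")
    for x in U:
        model.add(x, "U")
    print(model.fit().score((2.0, 2.0)), model.score((-2.0, -2.0)))
    model.add((-1.8, -2.2), "U")  # incremental step: update sums, then re-solve
    print(model.fit().score((2.0, 2.0)))
```

Storing sufficient statistics instead of pairwise terms is what keeps the update cheap: the number of positive-unlabeled pairs grows as $n_P n_U$, while the statistics stay $m \times m$. One new sample changes $A$ by more than a rank-one term (because $n_P$ or $n_U$ also changes), so this sketch simply re-solves the small linear system rather than applying a rank-one inverse update.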
Authors
MA Yumin
WANG Shitong
(School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China)
Source
Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》)
CSCD
Peking University Core Journal (北大核心)
2020, No. 11, pp. 1879-1887 (9 pages)
Funding
National Natural Science Foundation of China (No. 61572236)
Keywords
machine learning
positive-unlabeled (PU) classification
AUC
incremental algorithm