Abstract
Supervised learning to rank requires a large amount of labelled training data, which is hard to obtain. To address this, the paper introduces graph-based label propagation, a semi-supervised learning method that uses a limited amount of labelled data together with a large amount of unlabelled data to label the training data automatically, easing the time-consuming and labour-intensive task of labelling large training sets. The method first builds an εNN graph model with the training data as nodes and runs a label propagation algorithm on it to label the training data automatically, then applies Ranking SVM to the resulting training set to learn the ranking model. Its performance is measured on the OHSUMED data set under the MAP and NDCG@n evaluation criteria. Experimental results show that the method outperforms the ordinary pointwise learning-to-rank method and performs slightly below the ordinary pairwise method, and that it can save close to 60% of the training-set labelling workload while still meeting usability requirements.
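The labelling step described in the abstract can be illustrated with a minimal sketch of iterative label propagation on an ε-neighbourhood graph. This is not the paper's implementation: the abstract does not give the graph construction or the update rule, so the Euclidean distance threshold `eps`, the Gaussian edge weights with bandwidth `sigma`, the clamped row-normalised update, and the toy data are all assumptions made for illustration, and the function names `epsilon_graph` and `propagate_labels` are hypothetical.

```python
import numpy as np

def epsilon_graph(X, eps, sigma=1.0):
    """Weight matrix of an epsilon-neighbourhood graph (assumed construction).

    Two nodes are connected when their Euclidean distance is at most eps;
    connected pairs get a Gaussian weight, all other pairs get weight 0.
    """
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    W = np.exp(-dist ** 2 / (2.0 * sigma ** 2))
    W[dist > eps] = 0.0
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def propagate_labels(W, y, n_classes, max_iter=1000, tol=1e-6):
    """Propagate labels over the graph; y uses -1 for unlabelled nodes."""
    n = W.shape[0]
    idx = np.flatnonzero(y >= 0)           # indices of labelled nodes
    F = np.zeros((n, n_classes))
    F[idx, y[idx]] = 1.0                   # one-hot rows for labelled nodes
    clamp = F[idx].copy()
    deg = W.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0                    # avoid division by zero for isolated nodes
    P = W / deg                            # row-normalised transition matrix
    for _ in range(max_iter):
        F_new = P @ F                      # each node averages its neighbours
        F_new[idx] = clamp                 # keep the known labels fixed
        converged = np.abs(F_new - F).max() < tol
        F = F_new
        if converged:
            break
    # Note: an unlabelled node with no neighbours keeps an all-zero row,
    # so argmax assigns it class 0 arbitrarily.
    return F.argmax(axis=1)

# Toy data: two well-separated clusters, one labelled point in each.
# Three classes mirror the three relevance grades used in OHSUMED.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0],
              [5.0, 5.0], [5.1, 5.0], [5.2, 5.0]])
y = np.array([0, -1, -1, 2, -1, -1])
pred = propagate_labels(epsilon_graph(X, eps=1.0), y, n_classes=3)
print(pred)
```

In this toy run each unlabelled point ends up with the label of the single labelled point in its own cluster, because no edge crosses between the clusters. The resulting labels would then serve as training targets for a ranking model such as Ranking SVM.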
Source
Computer Applications and Software (《计算机应用与软件》)
CSCD
2016, Issue 1, pp. 286-290 (5 pages)