Abstract
Supervised ordinal regression algorithms require a sufficiently large number of labeled samples, yet in practice labeling the ordinal rank of each sample is time-consuming and labor-intensive, and sometimes infeasible. To address this, a semi-supervised ordinal regression algorithm that incorporates the nearest neighbor rule is proposed, which learns from both labeled and unlabeled examples. For each labeled sample, the method uses the nearest neighbor rule to select the most similar samples from the unlabeled dataset and assigns them the same rank. A supervised ordinal regression algorithm then trains the ranking model on the labeled samples together with the newly labeled ones. Experimental results on multiple datasets show that the method yields statistically significant improvements in ranking measures. In addition, a discount factor λ is introduced to evaluate the credibility of the newly labeled samples, and the effects of λ and of the labeled dataset size on the method's performance are discussed.
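The labeling step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `pseudo_label`, the Euclidean distance, the parameter `k`, the use of λ (`discount`) as a per-sample training weight, and the rule that an unlabeled sample claimed by several labeled samples takes the rank of the closest one are all assumptions made for illustration. The abstract states only that λ evaluates the credibility of newly labeled samples.

```python
import math

def pseudo_label(labeled, unlabeled, k=1, discount=0.5):
    """Propagate ordinal ranks from labeled samples to their nearest unlabeled neighbors.

    labeled   -- list of (feature_tuple, rank) pairs
    unlabeled -- list of feature tuples
    k         -- neighbors selected per labeled sample (hypothetical parameter)
    discount  -- stands in for the discount factor lambda; used here as a
                 training weight for newly labeled samples (an assumption)
    Returns a list of (feature_tuple, rank, weight) triples.
    """
    # Original labeled samples keep full weight.
    training = [(x, y, 1.0) for x, y in labeled]

    # best[j] = (distance, rank) of the closest labeled sample that selected j.
    best = {}
    for x, y in labeled:
        by_distance = sorted(range(len(unlabeled)),
                             key=lambda j: math.dist(x, unlabeled[j]))
        for j in by_distance[:k]:
            d = math.dist(x, unlabeled[j])
            # Assumed tie-break: the closest labeled sample wins.
            if j not in best or d < best[j][0]:
                best[j] = (d, y)

    # Newly labeled samples enter the training set with the discounted weight.
    for j in sorted(best):
        training.append((unlabeled[j], best[j][1], discount))
    return training

labeled = [((0.0, 0.0), 1), ((10.0, 10.0), 3)]
unlabeled = [(0.5, 0.0), (9.0, 10.0), (5.0, 5.2), (0.0, 1.0)]
for features, rank, weight in pseudo_label(labeled, unlabeled, k=1):
    print(features, rank, weight)
```

The resulting weighted triples could then be passed to any supervised ordinal regression learner that accepts per-sample weights. Which learner the paper uses is not stated in the abstract.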
Source
Journal of Computer Applications (《计算机应用》)
CSCD
PKU Core Journal (北大核心)
2010, No. 4, pp. 1022-1025 (4 pages)
Funding
Scientific Research Project of the Hunan Provincial Department of Education (07C133)
Keywords
semi-supervised ordinal regression
nearest neighbor
unlabeled sample
discount factor