Abstract
Noisy labels obtained by querying experts degrade the performance of active learning. To reduce the adverse effect of this noise, multi-expert active learning algorithms based on majority voting have been proposed, but their drawback is that the cost is often too high. This paper adopts a self-training method that directly assigns class labels to instances with high average confidence, without consulting the experts, so as to reduce the cost of learning. At the same time, confidence diversity is used as a measure to select the most uncertain instances for querying the experts, which improves learning efficiency. Experimental results on UCI data sets validate the effectiveness of the proposed method.
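The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: the function names, the 0.9 self-labeling threshold, and the reading of "confidence diversity" as the gap between the two largest averaged class probabilities are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def average_probs(committee_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the class-probability vectors produced by several models."""
    n_models = len(committee_probs)
    n_classes = len(committee_probs[0])
    return [sum(p[c] for p in committee_probs) / n_models for c in range(n_classes)]


def confidence_gap(avg: Sequence[float]) -> float:
    """Assumed uncertainty measure: gap between the two largest averaged
    probabilities. A small gap means the instance is uncertain."""
    top, second = sorted(avg, reverse=True)[:2]
    return top - second


def split_pool(
    pool: Dict[str, Sequence[Sequence[float]]],
    self_label_threshold: float = 0.9,  # hypothetical value, not from the paper
    n_queries: int = 1,
) -> Tuple[Dict[str, int], List[str]]:
    """Split an unlabeled pool into self-labeled instances and expert queries.

    pool maps an instance id to one probability vector per committee model.
    Returns (self_labeled, to_query), where self_labeled maps ids to the
    predicted class index and to_query lists the most uncertain ids.
    """
    self_labeled: Dict[str, int] = {}
    candidates: List[Tuple[float, str]] = []
    for idx, committee_probs in pool.items():
        avg = average_probs(committee_probs)
        best = max(range(len(avg)), key=avg.__getitem__)
        if avg[best] >= self_label_threshold:
            # High average confidence: label directly, no expert query.
            self_labeled[idx] = best
        else:
            candidates.append((confidence_gap(avg), idx))
    # Smallest gap first, so the most uncertain instances are queried.
    candidates.sort()
    to_query = [idx for _, idx in candidates[:n_queries]]
    return self_labeled, to_query
```

With a pool of three instances where one has averaged confidence above the threshold, that instance is self-labeled, and among the rest the one with the smallest gap is sent to the experts. How the expert answers are then combined (for example by majority vote) is outside this sketch.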
Source
Journal of Hubei Engineering University (《湖北工程学院学报》)
2013, No. 6, pp. 16-19 (4 pages)
Keywords
active learning
noisy data
confidence diversity
self-training