Active Learning Algorithm Based on Confidence Diversity Cost Sensitivity

Abstract: In active learning, noisy labels returned by the queried experts degrade learning performance. To reduce this noise, multi-expert active learning algorithms based on majority voting have been proposed, but their cost is often too high. This paper adopts a self-training method: samples with high average confidence are labeled directly, without consulting the experts, which reduces the cost of learning. At the same time, confidence diversity is used as the selection measure, so that the most uncertain samples are the ones sent to the experts, which improves learning efficiency. Experiments on UCI data sets validate the effectiveness of the proposed algorithm.
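The abstract does not say how the average confidence or the confidence diversity is computed, so the following is only a minimal sketch of one round of such a scheme under stated assumptions: a binary task, a committee of classifiers that each output a positive-class probability, average confidence taken as the larger of the mean probability and its complement, and confidence diversity taken as the standard deviation of the committee's probabilities. The function name `plan_round`, the threshold `confident`, and the parameter `n_queries` are hypothetical and do not come from the paper.

```python
from statistics import mean, pstdev


def plan_round(committee_probs, confident=0.9, n_queries=1):
    """Decide, for one round, which samples to self-label and which to query.

    committee_probs[m][i] is the positive-class probability that committee
    member m assigns to unlabeled sample i (assumed binary task).
    """
    n_samples = len(committee_probs[0])
    self_labeled = {}  # sample index -> label assigned without an expert
    candidates = []    # (confidence diversity, sample index)
    for i in range(n_samples):
        votes = [member[i] for member in committee_probs]
        avg = mean(votes)
        # Assumed definition: confidence in the majority class.
        avg_confidence = max(avg, 1.0 - avg)
        if avg_confidence >= confident:
            # Self-training step: accept the committee's label directly.
            self_labeled[i] = 1 if avg >= 0.5 else 0
        else:
            # Assumed definition: diversity = spread of member confidences.
            candidates.append((pstdev(votes), i))
    # Send the samples the committee disagrees on most to the experts.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    to_query = [i for _, i in candidates[:n_queries]]
    return self_labeled, to_query


# Three committee members scoring four unlabeled samples (made-up numbers).
probs = [
    [0.95, 0.10, 0.90, 0.55],
    [0.97, 0.05, 0.20, 0.50],
    [0.93, 0.08, 0.60, 0.45],
]
self_labeled, to_query = plan_round(probs)
print(self_labeled, to_query)
```

In this toy input, samples 0 and 1 are labeled without any expert query because the committee is confident on average, and sample 2 is queried because the members' confidences differ the most. Only the queried samples would incur expert cost in such a scheme.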

Author: Wu Yongcheng

Affiliation: School of Computer Engineering, Jingchu University of Technology

Source: Journal of Hubei Engineering University, 2013, No. 6, pp. 16-19 (4 pages)

Keywords: active learning; noisy data; confidence diversity; self-training

Classification number: TP391.41 [Automation and Computer Technology - Computer Application Technology]



