Abstract
Traditional active learning methods select examples based only on the predictions of the current model, neglecting the information contained in previously trained models, which reflects the stability of each unlabeled example's prediction sequence during the active learning process. Thus, a novel active learning method with instability sampling is proposed, which estimates the potential utility of each unlabeled example for improving model performance from the differences among the predictions of previous models. The proposed method measures the instability of an unlabeled example by the difference between the posterior probabilities predicted by the historical models, and queries the example with the largest instability. Extensive experiments conducted on multiple datasets with diverse classification models validate the effectiveness of the proposed method.
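The instability criterion described in the abstract can be sketched as follows. The paper's exact distance measure and aggregation over the model history are not specified here, so this minimal sketch assumes the L1 distance between consecutive models' posterior probability vectors, summed over the history; the function names are illustrative, not from the paper.

```python
import numpy as np

def instability_scores(history):
    """Estimate per-example instability from a model history.

    history: array-like of shape (T, N, C) -- posterior probabilities
    predicted for N unlabeled examples by T successive models over
    the active learning rounds.
    Returns an array of N instability scores.
    """
    h = np.asarray(history, dtype=float)
    # L1 difference between each pair of consecutive models' posteriors,
    # summed over the C classes: shape (T-1, N).
    diffs = np.abs(np.diff(h, axis=0)).sum(axis=2)
    # Aggregate over the history: total prediction fluctuation per example.
    return diffs.sum(axis=0)

def select_query(history):
    """Index of the most unstable unlabeled example to query next."""
    return int(np.argmax(instability_scores(history)))
```

For example, an example whose posterior swings from (0.5, 0.5) to (0.9, 0.1) between two rounds scores higher than one that stays near (0.9, 0.1), so the sampler prefers examples the evolving model keeps changing its mind about.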
Authors
HE Hua, XIE Mingkun, HUANG Shengjun (College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China)
Source
Journal of National University of Defense Technology (《国防科技大学学报》)
Indexed in: EI, CAS, CSCD, Peking University Core (北大核心)
2022, No. 3, pp. 50-56 (7 pages)
Funding
New Generation Artificial Intelligence Major Project (2020AAA0107000)
Natural Science Foundation of Jiangsu Province (BK20211517)
Keywords
active learning
labeling cost
instability
posterior probability
entropy