
Unsupervised Pool-Based Active Learning for Linear Regression
Cited by: 7
Abstract: In many real-world machine learning applications, unlabeled data are easy to obtain, but labeling them is time-consuming and/or expensive. It is therefore desirable to select the most valuable samples to label, so that a good machine learning model can be trained from as few labeled samples as possible. Active learning (AL) has been widely used for this purpose. However, most existing AL approaches are supervised: they train an initial model from a small number of labeled samples, query new samples based on the model, and then update the model iteratively. Few have considered the completely unsupervised AL problem, i.e., starting from zero, how to optimally select the very first few samples to label without knowing any label information at all. This problem is very challenging, as no label information can be utilized. This paper studies unsupervised pool-based AL for linear regression problems. We propose a novel AL approach that simultaneously considers informativeness, representativeness, and diversity, three essential criteria in AL. Extensive experiments on 12 datasets from various application domains, using three different linear regression models (ridge regression, LASSO (least absolute shrinkage and selection operator), and linear support vector regression), demonstrated the effectiveness of the proposed approach.
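To make the unsupervised pool-based setting concrete, the following is a minimal sketch of choosing the first samples to label from features alone. It is not the approach proposed in the paper, whose details are not given in this abstract. It is a simple stand-in heuristic that covers only two of the three criteria: it starts from the sample closest to the pool centroid (representativeness) and then greedily adds the sample farthest from those already chosen (diversity). The function name `select_initial_samples` and the toy pool are hypothetical.

```python
import math


def select_initial_samples(pool, n_select):
    """Pick n_select indices from an unlabeled pool using features only.

    Illustrative heuristic, NOT the paper's method:
    1. start with the sample closest to the pool centroid (representative);
    2. repeatedly add the sample farthest from all chosen ones (diverse).
    """
    if not 0 < n_select <= len(pool):
        raise ValueError("n_select must be between 1 and len(pool)")

    dim = len(pool[0])
    centroid = [sum(x[d] for x in pool) / len(pool) for d in range(dim)]

    # Representativeness: the first pick is the most central sample.
    first = min(range(len(pool)), key=lambda i: math.dist(pool[i], centroid))
    selected = [first]
    chosen = {first}

    # Distance from each pool sample to its nearest selected sample.
    nearest = [math.dist(x, pool[first]) for x in pool]

    while len(selected) < n_select:
        # Diversity: pick the unselected sample farthest from the selection.
        candidates = (i for i in range(len(pool)) if i not in chosen)
        nxt = max(candidates, key=lambda i: nearest[i])
        selected.append(nxt)
        chosen.add(nxt)
        nearest = [min(d, math.dist(x, pool[nxt])) for d, x in zip(nearest, pool)]

    return selected


# Hypothetical toy pool: two tight clusters and one point in between.
pool = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (5.0, 5.0), (5.1, 5.0)]
chosen_indices = select_initial_samples(pool, 3)
print(chosen_indices)
```

No labels are consulted at any point, which is what distinguishes this setting from the supervised AL loop described in the abstract. The selected samples would then be labeled and used to fit the first regression model.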
Authors: LIU Zi-Ang (刘子昂); JIANG Xue (蒋雪); WU Dong-Rui (伍冬睿) (Ministry of Education Key Laboratory on Image Information Processing and Intelligent Control, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074)
Source: Acta Automatica Sinica (《自动化学报》), 2021, No. 12, pp. 2771-2783 (13 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
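Funding: Supported by the Technology Innovation Project of Hubei Province (2019AEA171), the National Natural Science Foundation of China (61873321), the NSFC-Shenzhen Robotics Basic Research Center Key Project (U1913207), and the Ministry of Science and Technology Key Special Project for Intergovernmental International Science and Technology Innovation Cooperation (2017YFE0128300).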
Keywords: active learning (AL); unsupervised learning; linear regression; support vector regression; least absolute shrinkage and selection operator (LASSO); ridge regression
