
Label noise filtering method based on local probability sampling
Cited by: 3
Abstract: In classification tasks, noise is inevitably introduced during data acquisition. Label noise in particular not only makes the learning model more complex but also tends to cause overfitting and degrade the generalization ability of the classifier. Although existing label noise filtering algorithms can alleviate these problems to some extent, they still suffer from poor noise recognition ability, unsatisfactory classification performance, and low filtering efficiency. To address these issues, a local probability sampling method based on the label confidence distribution is proposed for label noise filtering. First, a random forest classifier votes on the label of each sample to obtain that sample's label confidence. Then, the samples are divided into easy-to-recognize and hard-to-recognize samples according to their label confidence. Finally, a different filtering strategy is applied to each group. Experimental results show that, in the presence of label noise, the proposed method maintains high noise recognition ability in most cases and has a clear advantage in classification generalization performance.
Authors: ZHANG Zenghui (张增辉); JIANG Gaoxia (姜高霞); WANG Wenjian (王文剑) (School of Computer and Information Technology, Shanxi University, Taiyuan Shanxi 030006, China; Key Laboratory of Computation Intelligence and Chinese Information Processing of Ministry of Education (Shanxi University), Taiyuan Shanxi 030006, China)
Source: Journal of Computer Applications (《计算机应用》), 2021, No. 1, pp. 67-73 (7 pages); indexed in CSCD and the Peking University Core Journals list
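Funding: National Natural Science Foundation of China (61673249, U1805263, 61906113); Shanxi Province Key R&D Program for International Cooperation (International Science and Technology Cooperation) (201903D421050); Scientific and Technological Innovation Project of Higher Education Institutions in Shanxi Province (2020L0007).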
Keywords: label noise; local probability sampling; noise filtering; Random Forest (RF); confidence estimation
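The abstract outlines three steps (forest voting, easy/hard split, per-group filtering) but does not give the thresholds or the exact local probability sampling rule. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It takes per-tree label votes as input (for example, the predictions of the individual trees of a random forest), defines label confidence as the fraction of trees that vote for a sample's observed label, and treats samples whose confidence is at or below a hypothetical `low` or at or above a hypothetical `high` as easy and the rest as hard. Easy samples with low confidence are removed outright; as a simple stand-in for the paper's sampling strategy, each hard sample is kept with probability equal to its confidence. The names `label_confidence`, `split_easy_hard`, `filter_noise`, `low`, `high`, and `seed` are all hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def label_confidence(tree_votes: Sequence[Sequence[int]],
                     labels: Sequence[int]) -> List[float]:
    """Fraction of trees whose vote agrees with each sample's observed label."""
    confidences = []
    for votes, y in zip(tree_votes, labels):
        if not votes:
            raise ValueError("each sample needs at least one tree vote")
        confidences.append(Counter(votes)[y] / len(votes))
    return confidences


def split_easy_hard(confidences: Sequence[float], low: float = 0.2,
                    high: float = 0.8) -> Tuple[List[int], List[int]]:
    """Hypothetical split: confidence near 0 or 1 is 'easy', the middle band is 'hard'."""
    if not 0.0 <= low < high <= 1.0:
        raise ValueError("require 0 <= low < high <= 1")
    easy, hard = [], []
    for i, c in enumerate(confidences):
        (easy if c <= low or c >= high else hard).append(i)
    return easy, hard


def filter_noise(tree_votes: Sequence[Sequence[int]], labels: Sequence[int],
                 low: float = 0.2, high: float = 0.8, seed: int = 0) -> List[int]:
    """Return indices of samples kept after filtering (hypothetical strategies)."""
    conf = label_confidence(tree_votes, labels)
    easy, hard = split_easy_hard(conf, low, high)
    rng = random.Random(seed)  # seeded so the probabilistic step is reproducible
    # Easy samples: deterministic rule, keep only those the forest clearly agrees with.
    kept = [i for i in easy if conf[i] >= high]
    # Hard samples: stand-in probabilistic rule, keep with probability = confidence.
    kept += [i for i in hard if rng.random() < conf[i]]
    return sorted(kept)


if __name__ == "__main__":
    # Votes of 5 hypothetical trees for 4 samples, plus the (possibly noisy) labels.
    votes = [[0, 0, 0, 0, 0], [1, 1, 1, 1, 1], [0, 1, 0, 1, 0], [1, 1, 1, 1, 1]]
    labels = [0, 0, 0, 1]
    print(label_confidence(votes, labels))  # [1.0, 0.0, 0.6, 1.0]
    print(filter_noise(votes, labels))
```

In this toy input, sample 1 carries label 0 while every tree votes 1, so its confidence is 0.0; it is treated as an easy case and removed. Sample 2 falls in the hard band, so whether it survives depends on the seeded random draw.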