Label noise filtering method based on dynamic probability sampling (times cited: 7)

Abstract: In machine learning problems, data quality has a profound impact on the accuracy of system predictions. Because information is hard to obtain and human cognition is subjective and limited, experts cannot label every sample correctly. Some probability sampling methods proposed in recent years cannot avoid the problem of manual sample partitioning, which is often unreasonable and highly subjective. To address this problem, a label noise filtering method based on Dynamic Probability Sampling (DPS) is proposed. DPS fully accounts for the differences between the samples of each dataset. It counts the frequency of label confidences falling in each interval, analyzes the trend of the information entropy of the confidence distribution across the intervals, and uses that trend to determine a reasonable threshold. Fourteen datasets were selected from the classic UCI repository, and the proposed method was compared experimentally with Random Forest (RF), High Agreement Random Forest Filter (HARF), Majority Vote Filter (MVF), and Local Probability Sampling (LPS). Experimental results show that the proposed method performs well in both label noise identification and classification generalization.
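The abstract only outlines how DPS chooses its threshold: bin the label confidences into intervals, count the frequencies, and read the threshold off the entropy of that distribution. The rough illustration below rests on several assumptions. Label confidence is taken to be the share of ensemble members (for example, random-forest trees) that agree with a sample's given label, and confidences are binned into ten equal intervals. The cut is then chosen with Kapur's maximum-entropy histogram criterion, a well-known technique swapped in here for illustration. It is not the DPS rule from the paper, and every function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def label_confidence(votes: Sequence[Sequence[int]], labels: Sequence[int]) -> List[float]:
    """Assumed definition: share of ensemble members (e.g. random-forest trees)
    whose prediction for sample i equals its given label labels[i]."""
    return [sum(1 for v in row if v == y) / len(row) for row, y in zip(votes, labels)]


def interval_frequencies(conf: Sequence[float], n_bins: int = 10) -> List[float]:
    """Relative frequency of confidences in each of n_bins equal-width intervals of [0, 1]."""
    counts = [0] * n_bins
    for c in conf:
        # Clamp so that c == 1.0 falls into the last interval.
        counts[min(int(c * n_bins), n_bins - 1)] += 1
    return [k / len(conf) for k in counts]


def _entropy(p: Sequence[float]) -> float:
    """Shannon entropy (natural log) of p after renormalizing it to sum to 1."""
    s = sum(p)
    if s == 0:
        return 0.0
    return -sum((x / s) * math.log(x / s) for x in p if x > 0)


def max_entropy_split(freq: Sequence[float]) -> int:
    """Kapur's criterion (a stand-in, not the DPS rule): return the split index k
    that maximizes H(freq[:k]) + H(freq[k:]); ties keep the smallest k."""
    best_k, best_h = 1, float("-inf")
    for k in range(1, len(freq)):
        h = _entropy(freq[:k]) + _entropy(freq[k:])
        if h > best_h:
            best_k, best_h = k, h
    return best_k


def filter_noise(conf: Sequence[float], n_bins: int = 10) -> Tuple[float, List[int]]:
    """Return the chosen confidence threshold and the indices of samples below it."""
    freq = interval_frequencies(conf, n_bins)
    threshold = max_entropy_split(freq) / n_bins
    suspects = [i for i, c in enumerate(conf) if c < threshold]
    return threshold, suspects


if __name__ == "__main__":
    # Toy confidences: four low-confidence samples and twelve high-confidence ones.
    conf = [0.05, 0.15, 0.15, 0.25] + [0.85] * 2 + [0.95] * 10
    print(filter_noise(conf))  # (0.3, [0, 1, 2, 3])
```

In this sketch the threshold can only fall on an interval edge, so the number of bins controls how finely the cut can move. The entropy-trend analysis described for DPS may settle on a different cut for the same data.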
Authors: 张增辉 (ZHANG Zenghui), 姜高霞 (JIANG Gaoxia), 王文剑 (WANG Wenjian) (School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006, China; Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education (Shanxi University), Taiyuan, Shanxi 030006, China)
Source: Journal of Computer Applications (《计算机应用》), a CSCD and Peking University core journal, 2021, No. 12, pp. 3485-3491 (7 pages)
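Funding: National Natural Science Foundation of China (62076154, U1805263, 61906113); Shanxi International Science and Technology Cooperation Program (201903D421050); Central Government Guided Local Science and Technology Development Fund (YDZX20201400001224); Scientific and Technological Innovation Programs of Higher Education Institutions in Shanxi (2020L0007).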
Keywords: label noise; Dynamic Probability Sampling (DPS); noise filtering; label confidence; confidence