A Novel Cost Sensitive Learning Algorithm Based on Re-sampling (cited by: 2)
Abstract: Most research on imbalanced data sets concentrates on either re-sampling the data or cost-sensitive learning alone, overlooking the fact that imbalanced class distributions and unequal misclassification costs often occur together. This paper proposes a cost-sensitive learning (CSL) algorithm based on hybrid re-sampling whose objective is minimal misclassification cost. The algorithm combines the two kinds of solution: it first reconstructs the sample space of both the majority and the minority class so that the two classes of the original data set are roughly balanced, and then applies cost-sensitive learning for classification. Classification is driven by minimal misclassification cost rather than maximal accuracy, with the cost of misclassifying a minority-class sample set much higher than the cost of misclassifying a majority-class sample. This improves classification accuracy on the minority class while effectively reducing the total misclassification cost. Experimental results show that the algorithm outperforms traditional algorithms on imbalanced-class problems.
Source: Computer Engineering & Science (《计算机工程与科学》), indexed in CSCD and the Peking University Core Journals list, 2011, No. 9, pp. 130-135 (6 pages)
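Funding: National Natural Science Foundation of China (61075063); National 863 Program (2009AA12Z117); Natural Science Foundation of Hubei Province (2010CDB05201); Young and Middle-aged Researchers Project of the Hubei Provincial Department of Education (Q20112604)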
Keywords: classification; imbalanced dataset; hybrid re-sampling; cost sensitive learning
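
The two stages named in the abstract (balance the classes first, then classify by minimal expected cost) can be illustrated with a minimal sketch. The abstract does not specify how the hybrid re-sampling is carried out or which cost values are used, so everything below is an assumption for illustration: plain random under-sampling and random over-sampling stand in for the paper's re-sampling step, and the function names `hybrid_resample` and `min_cost_label` and the costs `cost_fn` and `cost_fp` are hypothetical.

```python
import random


def hybrid_resample(majority, minority, seed=0):
    """Balance two classes by meeting in the middle (illustrative stand-in).

    The majority class is randomly under-sampled and the minority class is
    randomly over-sampled (with replacement) to the midpoint of the two sizes.
    Assumes len(majority) >= len(minority).
    """
    rng = random.Random(seed)
    target = (len(majority) + len(minority)) // 2
    # Under-sample: draw `target` majority samples without replacement.
    new_majority = rng.sample(majority, target)
    # Over-sample: keep every minority sample and add random duplicates.
    extra = [rng.choice(minority) for _ in range(target - len(minority))]
    return new_majority, minority + extra


def min_cost_label(p_minority, cost_fn, cost_fp):
    """Pick the label with the lower expected misclassification cost.

    p_minority: estimated probability that the sample is minority class.
    cost_fn:    assumed cost of labelling a minority sample as majority.
    cost_fp:    assumed cost of labelling a majority sample as minority.
    Returns 1 for minority, 0 for majority.
    """
    expected_cost_if_majority = p_minority * cost_fn
    expected_cost_if_minority = (1.0 - p_minority) * cost_fp
    return 1 if expected_cost_if_minority < expected_cost_if_majority else 0


if __name__ == "__main__":
    majority = list(range(90))        # 90 majority samples (toy data)
    minority = list(range(100, 110))  # 10 minority samples (toy data)
    maj, mino = hybrid_resample(majority, minority)
    print(len(maj), len(mino))
    # With a 10:1 cost ratio, a sample with only 20% minority probability
    # is still labelled minority; with equal costs it is not.
    print(min_cost_label(0.2, cost_fn=10, cost_fp=1))
    print(min_cost_label(0.2, cost_fn=1, cost_fp=1))
```

In this sketch the decision rule is the standard minimum-expected-cost rule with zero cost for correct predictions; raising `cost_fn` relative to `cost_fp` lowers the probability threshold at which a sample is labelled as minority class, which is how a higher minority-class cost trades overall accuracy for lower total cost.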