Research on the class overlapping problem in classification and methods for handling it (cited by: 9)

Towards classification with class overlapping
Abstract: Classification with class overlapping (CWCO) has long been regarded as one of the toughest yet most pervasive problems in the data mining and machine learning communities. When it is combined with the well-known class imbalance problem, the situation becomes even more complicated, and few works in the literature address this combination. To meet this challenge, this paper makes a systematic study of the CWCO problem and its interrelationship with the class imbalance problem. Specifically, we apply the support vector data description (SVDD) algorithm, for the first time on real-world data, to identify overlapping objects; we then summarize three class-overlap learning schemes from the existing literature and propose a new one, the separating scheme. Experiments on real-world data sets using five different classifiers show that the separating scheme: 1) performs the best among the four schemes in most cases; 2) is usually more effective for classifiers based on decision boundaries than for rule-based classifiers; and 3) performs well on class-imbalanced data, in particular when the base classifier is a support vector machine (SVM). Finally, we provide a theoretical analysis of the experimental results obtained with SVMs.
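Source: Journal of Management Sciences in China (《管理科学学报》), CSSCI, PKU Core Journal, 2013, No. 4, pp. 8-21 (14 pages)
Funding: National Natural Science Foundation of China (71201004, 70901002); Major Research Plan of the National Natural Science Foundation of China, cultivation project (90924020); General Program of the Science and Technology Development Plan of the Beijing Municipal Commission of Education (km201310011009); Beijing Undergraduate Scientific Research and Entrepreneurship Action Plan project (pxm2012_014213_000067)
Keywords: data mining; classification; class overlapping; class imbalance; support vector data description (SVDD)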
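The abstract names the separating scheme without spelling out its steps. One plausible reading is sketched below in Python: describe each class with a boundary, mark the region covered by every class as overlapping, train one base model per region, and route each new sample to the model of its region. Everything in the block is an assumption made for illustration. In particular, `fit_sphere` is a crude centroid-plus-radius stand-in for SVDD (not SVDD itself), the nearest-centroid base classifier replaces the five classifiers used in the paper, and all function names and the toy data are hypothetical.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def fit_sphere(points, quantile=0.9):
    """ASSUMPTION: crude stand-in for SVDD, not SVDD itself.

    Returns (center, radius) of a ball around the class centroid that
    covers roughly `quantile` of the class's points.
    """
    center = centroid(points)
    dists = sorted(math.dist(center, p) for p in points)
    radius = dists[min(len(dists) - 1, int(quantile * len(dists)))]
    return center, radius


def in_overlap(x, spheres):
    """A point counts as overlapping if it lies inside every class's ball."""
    return all(math.dist(center, x) <= radius for center, radius in spheres.values())


def group_by_label(X, y):
    """Collect the samples of each class into a dict keyed by label."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return groups


def fit_nearest_centroid(X, y):
    """Hypothetical base classifier: one centroid per class."""
    return {label: centroid(pts) for label, pts in group_by_label(X, y).items()}


def fit_separating(X, y):
    """Train one base model per region (overlapping / non-overlapping)."""
    spheres = {label: fit_sphere(pts) for label, pts in group_by_label(X, y).items()}
    regions = {True: ([], []), False: ([], [])}
    for x, label in zip(X, y):
        xs, ys = regions[in_overlap(x, spheres)]
        xs.append(x)
        ys.append(label)
    fallback = fit_nearest_centroid(X, y)  # used only if a region has no training data
    models = {flag: fit_nearest_centroid(xs, ys) if xs else fallback
              for flag, (xs, ys) in regions.items()}
    return spheres, models


def predict_separating(model, x):
    """Route x to the model of its region, then pick the nearest centroid."""
    spheres, models = model
    centroids = models[in_overlap(x, spheres)]
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


# Toy data: two 2-D Gaussian classes whose means are 3 standard deviations apart.
rng = random.Random(0)
X = [[rng.gauss(mx, 1.0), rng.gauss(0.0, 1.0)] for mx in (0.0, 3.0) for _ in range(200)]
y = [label for label in (0, 1) for _ in range(200)]
model = fit_separating(X, y)
print(predict_separating(model, [-5.0, 0.0]), predict_separating(model, [8.0, 0.0]))
```

The abstract reports that the scheme helps most with boundary-based classifiers such as SVMs, so a faithful experiment would swap in such a classifier and a real SVDD implementation; the nearest-centroid model is used here only to keep the sketch free of third-party dependencies.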