
A Novel Feature Transformation Method for Imbalanced Classification Problems (Cited by: 1)

Novel Feature Transformation for Imbalanced Problem
Abstract: By learning a feature transformation matrix, samples can be mapped into a new space in which a given distance metric is suitable for measuring distances between them. Building on this idea, this paper proposes a feature transformation method for k-nearest-neighbor (kNN) classification that improves its performance on imbalanced data sets. The method maximizes an objective function based on the g-mean metric to learn a linear transformation matrix such that, in the new space, same-class neighbors become as close as possible while different-class neighbors become as far apart as possible. Because the g-mean-based objective fully accounts for the characteristics of the rare class, kNN achieves better performance on the rare class in the new metric space. Experiments on UCI data sets show that the proposed method effectively improves the generalization ability of kNN on imbalanced problems and shows a clear advantage over the traditional PCA and LDA transformations.
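The two ingredients described in the abstract, the g-mean evaluation metric and kNN classification in a linearly transformed space, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the function names are invented for this sketch, the brute-force neighbor search is a simplification, and the paper's actual optimization of the transformation matrix against the g-mean objective is not reproduced here.

```python
import numpy as np

def g_mean(y_true, y_pred, minority=1):
    """Geometric mean of minority-class recall (sensitivity) and
    majority-class recall (specificity) for a binary problem."""
    tp = np.sum((y_true == minority) & (y_pred == minority))
    fn = np.sum((y_true == minority) & (y_pred != minority))
    tn = np.sum((y_true != minority) & (y_pred != minority))
    fp = np.sum((y_true != minority) & (y_pred == minority))
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return np.sqrt(sensitivity * specificity)

def knn_predict(X_train, y_train, X_test, L, k=3):
    """Majority-vote kNN in the transformed space z = L x.
    L is the learned linear transformation matrix; here it is
    supplied by the caller rather than optimized."""
    Zt = X_train @ L.T
    Zq = X_test @ L.T
    preds = []
    for z in Zq:
        idx = np.argsort(np.linalg.norm(Zt - z, axis=1))[:k]
        vals, counts = np.unique(y_train[idx], return_counts=True)
        preds.append(vals[np.argmax(counts)])
    return np.array(preds)
```

In the paper's setting, L would be chosen to maximize the g-mean of the resulting kNN classifier, so that the rare class is not sacrificed to overall accuracy; the identity matrix recovers plain Euclidean kNN.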
Published in: Journal of Chinese Computer Systems (《小型微型计算机系统》, CSCD, Peking University Core), 2015, No. 5, pp. 1037-1041 (5 pages)
Funding: Supported by the National Natural Science Foundation of China (61170223) and the Key Science and Technology Research Project of the Henan Provincial Department of Education (14A520016)
Keywords: imbalanced data sets; feature transformation; KNN; PCA; LDA

References (1)

Secondary References (21)

  • 1 Su Jin-shu, Zhang Bo-feng, Xu Xin. Advances in machine learning based text categorization [J]. Journal of Software, 2006, 17(9): 1848-1859. (Cited by: 378)
  • 2 Japkowicz N. Learning from imbalanced data sets: A comparison of various strategies, WS-00-05 [R]. Menlo Park, CA: AAAI Press, 2000
  • 3 Chawla N V, Japkowicz N, Kotcz A. Editorial: Special issue on learning from imbalanced data sets [J]. SIGKDD Explorations Newsletter, 2004, 6(1): 1-6
  • 4 Weiss G M. Mining with rarity: A unifying framework [J]. SIGKDD Explorations Newsletter, 2004, 6(1): 7-19
  • 5 Maloof M A. Learning when data sets are imbalanced and when costs are unequal and unknown [OL]. [2008-01-06]. http://www.site.uottawa.ca/~nat/workshop2003/workshop2003.html
  • 6 Chawla N V, Hall L O, Bowyer K W, et al. SMOTE: Synthetic minority over-sampling technique [J]. Journal of Artificial Intelligence Research, 2002, 16: 321-357
  • 7 Jo T, Japkowicz N. Class imbalances versus small disjuncts [J]. SIGKDD Explorations Newsletter, 2004, 6(1): 40-49
  • 8 Batista G E A P A, Prati R C, Monard M C. A study of the behavior of several methods for balancing machine learning training data [J]. SIGKDD Explorations Newsletter, 2004, 6(1): 20-29
  • 9 Guo H, Viktor H L. Learning from imbalanced data sets with boosting and data generation: The DataBoost-IM approach [J]. SIGKDD Explorations Newsletter, 2004, 6(1): 30-39
  • 10 Chawla N V, Lazarevic A, Hall L O, et al. SMOTEBoost: Improving prediction of the minority class in boosting [C] // Proc of the Seventh European Conf on Principles and Practice of Knowledge Discovery in Databases. Berlin: Springer, 2003: 107-119

Co-cited Literature: 32

Shared Citing Literature: 2

Citing Literature: 1
