KPCA-Based Under-Sampling Algorithm for Unbalanced Data
Abstract: In real-world classification tasks, unbalanced data are usually nonlinearly distributed, and traditional sampling methods handle this nonlinearity poorly, which leads to unsatisfactory classification results. To address this problem, this paper proposes an under-sampling method based on kernel principal component analysis (KPCA). The method uses a nonlinear kernel function to map the original data into a suitable high-dimensional space in which they become linearly separable, and then selectively removes redundant majority-class samples according to each sample's score on the kernel principal components, thereby achieving under-sampling. Nine datasets with different balance rates were pre-processed with the proposed under-sampling method and then classified with a Logistic Regression classifier. The experimental results show that the proposed method obtains the highest Accuracy, F1-measure, and AUC scores on 7, 8, and 9 of the datasets, respectively, which indicates that it achieves good classification performance on unbalanced datasets.
Source: Advances in Applied Mathematics (《应用数学进展》), 2024, No. 9, pp. 4108-4118 (11 pages)
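The under-sampling procedure outlined in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not state the kernel, the number of components, or the exact removal rule, so the sketch assumes an RBF kernel, fits KPCA on the majority class only, uses only the first kernel principal component (found by power iteration), and keeps majority samples that are evenly spaced along the score ordering until the classes are balanced. All function names and the `gamma` parameter are hypothetical, and the Logistic Regression step is omitted.

```python
import math
import random


def rbf_kernel_matrix(X, gamma):
    """Gram matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2) (assumed kernel)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj)))
             for xj in X] for xi in X]


def center_kernel(K):
    """Double-center K so the mapped samples have zero mean in feature space."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]
    total_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - row_mean[j] + total_mean
             for j in range(n)] for i in range(n)]


def first_kpc_scores(K, iters=500, seed=0):
    """Scores of the training samples on the first kernel principal component.

    Power iteration finds the leading eigenpair (lam, v) of the centered
    Gram matrix; the projection of sample i is then sqrt(lam) * v[i].
    """
    n = len(K)
    rng = random.Random(seed)
    v = [rng.random() - 0.5 for _ in range(n)]
    lam = 0.0
    for _ in range(iters):
        w = [sum(K[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        lam = norm
    return [math.sqrt(lam) * x for x in v]


def kpca_undersample(X, y, majority_label, gamma=1.0):
    """Return sorted indices of the samples kept after under-sampling.

    Assumed rule (not from the paper): rank majority samples by their score
    on the first kernel principal component and keep evenly spaced ones,
    so that near-duplicate scores are thinned out, until the majority class
    is the same size as the rest of the data.
    """
    maj = [i for i, lab in enumerate(y) if lab == majority_label]
    mino = [i for i, lab in enumerate(y) if lab != majority_label]
    n_keep = len(mino)
    if n_keep == 0 or n_keep >= len(maj):
        return list(range(len(y)))  # nothing to balance
    K = center_kernel(rbf_kernel_matrix([X[i] for i in maj], gamma))
    scores = first_kpc_scores(K)
    order = sorted(range(len(maj)), key=lambda k: scores[k])
    step = len(maj) / n_keep  # > 1, so the picked positions are distinct
    kept = [maj[order[int(k * step)]] for k in range(n_keep)]
    return sorted(kept + mino)


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy data: 20 majority points on a noisy ring, 5 minority points near the origin.
    X = [[math.cos(t) + rng.gauss(0, 0.05), math.sin(t) + rng.gauss(0, 0.05)]
         for t in (2 * math.pi * k / 20 for k in range(20))]
    X += [[rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(5)]
    y = [0] * 20 + [1] * 5
    print(kpca_undersample(X, y, majority_label=0))
```

The returned indices can be used to subset the training data before fitting any classifier; minority samples are always kept unchanged.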