Abstract
Support vector machines (SVMs) classify the minority class poorly when trained on imbalanced datasets. To improve SVM classification on imbalanced data, this paper presents an algorithm that combines kernel function selection with under-sampling: it raises the accuracy on the minority class while keeping the loss of accuracy on the majority class as small as possible. The method first selects the best kernel function according to class separability in the feature space, and then under-samples the majority class by deleting part of it according to feature-space distance. Simulation experiments on standard UCI datasets show that the algorithm is reasonable and effective.
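The abstract does not state the paper's separability criterion or its exact under-sampling rule, so the following is only a minimal sketch of the two-step idea under two loudly labeled assumptions. (1) Separability is measured with a kernel Fisher-style ratio: the squared distance between the class means in feature space divided by the summed within-class scatter, computed entirely from kernel matrices. (2) Under-sampling keeps the majority samples whose feature-space distance to the minority-class mean is smallest. The function names (`separability`, `select_kernel`, `undersample`), the candidate kernels and their parameters, and the synthetic data are all hypothetical.

```python
import numpy as np

# Hypothetical candidate kernels: each maps (A: n x d, B: m x d) to an n x m Gram matrix.
def linear_kernel(A, B):
    return A @ B.T

def rbf_kernel(A, B, gamma=0.5):
    # k(a, b) = exp(-gamma * ||a - b||^2); clip tiny negatives caused by round-off
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def poly_kernel(A, B, degree=2, coef0=1.0):
    return (A @ B.T + coef0) ** degree

def separability(X_pos, X_neg, kernel):
    """ASSUMED criterion: ||m_pos - m_neg||^2 / (S_pos + S_neg) in feature space,
    where S_c is the mean squared distance of class c's samples to the class mean.
    Every term is written with kernel evaluations only (kernel trick)."""
    K_pp = kernel(X_pos, X_pos)
    K_nn = kernel(X_neg, X_neg)
    K_pn = kernel(X_pos, X_neg)
    between = K_pp.mean() - 2.0 * K_pn.mean() + K_nn.mean()
    within = (np.diag(K_pp).mean() - K_pp.mean()) + (np.diag(K_nn).mean() - K_nn.mean())
    return between / max(within, 1e-12)  # guard against zero scatter

def select_kernel(X_pos, X_neg, candidates):
    """Return the name of the candidate kernel with the highest separability."""
    return max(candidates, key=lambda name: separability(X_pos, X_neg, candidates[name]))

def undersample(X_maj, X_min, kernel, n_keep):
    """ASSUMED rule: keep the n_keep majority samples closest, in feature space,
    to the minority-class mean; return their indices in ascending order."""
    # ||phi(x) - m_min||^2 = k(x, x) - 2 * mean_j k(x, z_j) + mean_ij k(z_i, z_j)
    self_k = np.diag(kernel(X_maj, X_maj))
    cross = kernel(X_maj, X_min).mean(axis=1)
    d2 = self_k - 2.0 * cross + kernel(X_min, X_min).mean()
    return np.sort(np.argsort(d2, kind="stable")[:n_keep])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_min = rng.normal(2.0, 1.0, size=(20, 2))    # synthetic minority class
    X_maj = rng.normal(0.0, 1.0, size=(200, 2))   # synthetic majority class
    candidates = {
        "linear": linear_kernel,
        "rbf(gamma=0.5)": lambda A, B: rbf_kernel(A, B, gamma=0.5),
        "poly(degree=2)": lambda A, B: poly_kernel(A, B, degree=2),
    }
    best = select_kernel(X_min, X_maj, candidates)
    kept = undersample(X_maj, X_min, candidates[best], n_keep=2 * len(X_min))
    print("selected kernel:", best)
    print("majority samples kept:", len(kept))
```

In this sketch the reduced majority set `X_maj[kept]` together with `X_min` would then be used to train an SVM with the selected kernel; the keep ratio (`n_keep`) is an arbitrary illustrative choice, not a value taken from the paper.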
Source
Computer Engineering and Design (《计算机工程与设计》)
CSCD
Peking University Core Journal (北大核心)
2013, Issue 12, pp. 4345-4350 (6 pages)
Funding
Natural Science Foundation of Shandong Province (2009ZRB019CE)
Keywords
classification
support vector machine
imbalanced dataset
under-sampling algorithm
kernel function