Abstract
To address the inaccurate classification of imbalanced datasets by a Support Vector Machine (SVM) near its separating hyperplane, an improved SVM-KNN algorithm combining SVM with K Nearest Neighbor (KNN) is proposed. In the classification phase, the algorithm computes the distance from the test sample to the optimal SVM hyperplane in the feature space. If this distance is greater than a given threshold, the test sample is classified directly by the SVM. Otherwise, all support vectors (from both classes) are treated as the test sample's neighbors and the sample is classified by KNN. Extensive experiments on UCI datasets show that the algorithm significantly improves both the recognition rate of minority-class samples and the overall performance of the classifier.
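The abstract describes only the classification phase. The Python sketch below shows that rule under several stated assumptions, and it is not the authors' code:

- The SVM is already trained and binary. It is given as support vectors, labels in {-1, +1}, dual coefficients alpha and a bias b.
- "Distance to the hyperplane" is taken to be the geometric distance |g(x)|/||w|| in feature space.
- Distances from the test sample to the support vectors are kernel-induced feature-space distances.
- The names `SvmKnnClassifier`, `threshold`, `k`, `linear_kernel` and `rbf_kernel` are hypothetical, as is the tie-breaking. The paper's own distance measure, threshold and k are not stated here.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(u: Vector, v: Vector) -> float:
    """K(u, v) = <u, v>."""
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(gamma: float) -> Kernel:
    """Gaussian kernel K(u, v) = exp(-gamma * ||u - v||^2) (kernel choice is an assumption)."""
    def k(u: Vector, v: Vector) -> float:
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))
    return k


class SvmKnnClassifier:
    """Hypothetical classification phase of a hybrid SVM-KNN rule.

    Expects an already-trained binary SVM: support vectors x_i, labels
    y_i in {-1, +1}, dual coefficients alpha_i and bias b.
    """

    def __init__(self, svs: List[Vector], labels: List[int], alphas: List[float],
                 bias: float, kernel: Kernel, threshold: float, k: int = 1):
        self.svs, self.labels, self.alphas = svs, labels, alphas
        self.bias, self.kernel = bias, kernel
        self.threshold, self.k = threshold, k
        # ||w||^2 = sum_i sum_j (alpha_i y_i)(alpha_j y_j) K(x_i, x_j)
        coef = [a * y for a, y in zip(alphas, labels)]
        w_sq = sum(ci * cj * kernel(xi, xj)
                   for ci, xi in zip(coef, svs)
                   for cj, xj in zip(coef, svs))
        self.w_norm = math.sqrt(max(w_sq, 0.0))

    def decision(self, x: Vector) -> float:
        """g(x) = sum_i alpha_i y_i K(x_i, x) + b; the hyperplane is g(x) = 0."""
        return sum(a * y * self.kernel(sv, x)
                   for a, y, sv in zip(self.alphas, self.labels, self.svs)) + self.bias

    def feature_distance(self, x: Vector, sv: Vector) -> float:
        """Squared feature-space distance ||phi(x) - phi(sv)||^2 via the kernel trick."""
        return self.kernel(x, x) - 2.0 * self.kernel(x, sv) + self.kernel(sv, sv)

    def predict(self, x: Vector) -> int:
        g = self.decision(x)
        # Geometric distance |g(x)| / ||w|| from x to the hyperplane in feature space
        dist = abs(g) / self.w_norm if self.w_norm > 0 else 0.0
        if dist > self.threshold:
            # Far from the hyperplane: keep the plain SVM decision sign(g)
            return 1 if g > 0 else -1
        # Near the hyperplane: k-NN vote over all support vectors of both classes;
        # ties go to the label of the nearest support vector
        ranked = sorted(zip(self.svs, self.labels),
                        key=lambda pair: self.feature_distance(x, pair[0]))
        votes = Counter(label for _, label in ranked[:self.k])
        return votes.most_common(1)[0][0]


# Hand-built hard-margin SVM: one minority point (+1) vs. two majority points (-1)
clf = SvmKnnClassifier(svs=[(1.0, 0.0), (-1.0, 1.0), (-1.0, -1.0)],
                       labels=[1, -1, -1], alphas=[0.5, 0.25, 0.25],
                       bias=0.0, kernel=linear_kernel, threshold=0.5, k=1)
print(clf.predict((3.0, 0.0)))   # far from the hyperplane, SVM branch → 1
print(clf.predict((-0.1, 0.0)))  # g(x) = -0.1, but 1-NN over the SVs → 1
```

In the toy example the SVM's hyperplane is x = 0 (w = (1, 0), b = 0), so ||w|| = 1. The point (-0.1, 0) lies on the majority side, so sign(g) alone would return -1. Because it falls inside the threshold band, the nearest-support-vector vote assigns it to the minority class instead. Dividing |g(x)| by ||w|| makes the threshold a distance in feature space rather than a raw decision value; this is one plausible reading of the abstract, not a confirmed detail of the paper.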
Source
Computer Engineering and Applications (《计算机工程与应用》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2016, No. 4, pp. 51-55, 103 (6 pages)
Funding
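National Natural Science Foundation of China (No. 31170393)
Natural Science Foundation of Shaanxi Province (No. 2012JM8023)
Special Natural Science Fund of the Shaanxi Provincial Department of Education (No. 12JK0726)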
Keywords
Support Vector Machine (SVM)
K Nearest Neighbor (KNN)
imbalanced datasets