
Algorithm for Imbalanced Dataset Based on K-Nearest Neighbor in Kernel Space
Cited by: 9
Abstract: To reduce the overfitting of traditional classifiers and thereby improve classification performance, this paper proposes a hybrid-sampling classification algorithm for imbalanced datasets based on K-nearest neighbors in kernel space. The algorithm first computes, in kernel space, the k nearest neighbors of each sample among the samples of the opposite class, together with the average distance between the two classes, i.e., the distance between the two class centers. It then deletes samples far from the classification boundary according to control parameters and applies the SMOTE over-sampling algorithm to the minority class, producing a new, balanced training set. Finally, the final decision function is determined on this new training set. The algorithm is intended to address class imbalance and improve the classification performance of SVM. Experiments on an artificial dataset and four UCI datasets show that the algorithm is effective for resampling imbalanced datasets.
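The abstract describes the pipeline only at a high level, so the following NumPy sketch is an illustration under stated assumptions, not the author's implementation. It assumes a Gaussian (RBF) kernel with a hypothetical width `gamma`, obtains kernel-space distances through the kernel trick ||φ(x) − φ(y)||² = k(x,x) − 2k(x,y) + k(y,y), and computes the squared distance between the two class centers from mean kernel values. The deletion rule in `prune_majority` (drop a majority sample when its mean distance to its k nearest minority samples exceeds `alpha` times the center distance), its safeguard, and the 0/1 labels for majority/minority are all hypothetical stand-ins for the paper's unspecified control parameters. The SMOTE step is the standard interpolation scheme, applied in input space.

```python
import numpy as np


def rbf_kernel(A, B, gamma=0.5):
    """Gaussian kernel matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2).
    Assumed kernel choice: the abstract does not name the kernel."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def kernel_sq_dist(A, B, gamma=0.5):
    """Squared feature-space distances via the kernel trick:
    ||phi(a) - phi(b)||^2 = k(a, a) - 2 k(a, b) + k(b, b)."""
    kaa = np.array([rbf_kernel(a[None], a[None], gamma)[0, 0] for a in A])
    kbb = np.array([rbf_kernel(b[None], b[None], gamma)[0, 0] for b in B])
    d2 = kaa[:, None] - 2.0 * rbf_kernel(A, B, gamma) + kbb[None, :]
    return np.maximum(d2, 0.0)  # clip tiny negative round-off


def center_sq_dist(P, N, gamma=0.5):
    """Squared distance between the two class means in feature space."""
    return (rbf_kernel(P, P, gamma).mean()
            - 2.0 * rbf_kernel(P, N, gamma).mean()
            + rbf_kernel(N, N, gamma).mean())


def prune_majority(X_maj, X_min, k=3, alpha=1.0, gamma=0.5):
    """HYPOTHETICAL pruning rule (not taken from the paper): drop majority samples
    whose mean kernel distance to their k nearest minority samples exceeds
    alpha times the distance between the two class centers."""
    k = min(k, len(X_min))
    d = np.sqrt(kernel_sq_dist(X_maj, X_min, gamma))
    knn_mean = np.sort(d, axis=1)[:, :k].mean(axis=1)
    center_dist = np.sqrt(max(center_sq_dist(X_maj, X_min, gamma), 0.0))
    keep = knn_mean <= alpha * center_dist
    # Assumed safeguard: keep at least as many majority samples as minority ones.
    n_floor = min(len(X_min), len(X_maj))
    if keep.sum() < n_floor:
        keep = knn_mean <= np.sort(knn_mean)[n_floor - 1]
    return X_maj[keep]


def smote(X_min, n_new, k=3, seed=None):
    """Standard SMOTE: interpolate between a random minority sample and one of its
    k nearest minority neighbors (Euclidean distance in input space)."""
    if n_new <= 0 or len(X_min) < 2:
        return np.empty((0, X_min.shape[1]))
    rng = np.random.default_rng(seed)
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbor
    k = min(k, len(X_min) - 1)
    nn = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    partner = nn[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[partner] - X_min[base])


def hybrid_resample(X, y, k=3, alpha=1.0, gamma=0.5, seed=0):
    """Prune the majority class (assumed label 0), then SMOTE the minority
    class (assumed label 1) until both classes have the same size."""
    X_maj, X_min = X[y == 0], X[y == 1]
    X_maj = prune_majority(X_maj, X_min, k, alpha, gamma)
    X_syn = smote(X_min, len(X_maj) - len(X_min), k, seed)
    X_new = np.vstack([X_maj, X_min, X_syn])
    y_new = np.r_[np.zeros(len(X_maj), int), np.ones(len(X_min) + len(X_syn), int)]
    return X_new, y_new


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(0.0, 1.0, (60, 2)), rng.normal(2.5, 0.5, (10, 2))])
    y = np.r_[np.zeros(60, int), np.ones(10, int)]
    X_new, y_new = hybrid_resample(X, y)
    print(np.bincount(y_new))  # the two class counts are equal after resampling
```

The resampled `X_new, y_new` would then be passed to an SVM trainer to obtain the final decision function. Because the RBF feature-space distance is bounded by √2, a threshold rule like the hypothetical one above is sensitive to `gamma` and `alpha`, which is why the sketch includes a floor on the number of retained majority samples.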
Author: DU Hongle (杜红乐)
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), indexed in CSCD and the Peking University core journal list, 2015, No. 7, pp. 869-876 (8 pages)
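Funding: Science and Technology Program of the Shaanxi Provincial Department of Education; Science and Technology Research Project of Shangluo University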
Keywords: support vector machine; imbalanced dataset; over-sampling; under-sampling; K-nearest neighbor
