Abstract
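When a classifier is trained on a class-imbalanced data set, its results are skewed toward the majority class and the recognition rate of the minority class is low. To improve minority-class classification accuracy, an improved SMOTE method, the space interpolation method, is proposed. It uses each minority-class sample and its k nearest neighbors to construct a super geometry (a solid in the feature space) and randomly generates virtual minority-class samples inside it. When the k nearest neighbors include majority-class samples, the space used to construct virtual samples is shrunk. This strengthens training on easily misclassified samples and reduces the degree of class imbalance in the data set. The validity of the virtual samples is then verified. Simulations with multiple classifiers on real data sets show that the space interpolation method performs well in improving classification performance for both the minority class and the data set as a whole.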
This paper analyzes the problem that classification results are always biased toward the majority class in class-imbalanced data sets. An improved SMOTE method called Space Synthetic Minority Over-sampling Technique (S-SMOTE) is proposed. A super geometry is constructed from each minority-class sample and its k nearest neighbors, and new synthetic samples are generated inside it. The generation space is reduced to avoid noise if some of the k nearest neighbors belong to the majority class, which strengthens the training on minority-class samples that are hard to classify. The validity of the virtual samples is then confirmed. Experiments on real data sets show that this method performs better than SMOTE in classification performance for both the minority class and the whole data set.
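The abstract gives the idea of S-SMOTE but not its formulas, so the following Python sketch is only one possible reading, not the authors' implementation. It rests on three assumptions that do not come from the paper: the "super geometry" is taken to be the convex hull of a minority sample and its minority-class neighbors, points inside it are drawn with Dirichlet weights, and the generation space is shrunk toward the seed sample by the fraction of minority-class neighbors among the k nearest. The function name `s_smote_sketch` and its parameters are hypothetical.

```python
import numpy as np


def s_smote_sketch(X, y, minority_label, k=5, n_new_per_sample=1, seed=0):
    """Generate synthetic minority samples (illustrative sketch only).

    Assumptions (not taken from the paper):
      * the "super geometry" is the convex hull of a minority sample and
        its minority-class nearest neighbors;
      * the shrink factor is the share of minority samples among the
        k nearest neighbors.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    k = min(k, len(X) - 1)  # cannot have more neighbors than other samples
    synthetic = []

    for i in np.flatnonzero(y == minority_label):
        # k nearest neighbors of sample i among all samples (both classes)
        dist = np.linalg.norm(X - X[i], axis=1)
        dist[i] = np.inf  # exclude the sample itself
        neighbors = np.argsort(dist)[:k]
        minority_neighbors = neighbors[y[neighbors] == minority_label]
        if minority_neighbors.size == 0:
            continue  # no minority neighbor: skip this seed (assumption)

        # Shrink the generation space when majority samples are nearby
        shrink = minority_neighbors.size / k
        vertices = np.vstack([X[i], X[minority_neighbors]])

        for _ in range(n_new_per_sample):
            # Random convex combination = random point inside the hull
            weights = rng.dirichlet(np.ones(len(vertices)))
            point = weights @ vertices
            # Pull the point toward the seed sample by the shrink factor
            synthetic.append(X[i] + shrink * (point - X[i]))

    return np.array(synthetic).reshape(-1, X.shape[1])


if __name__ == "__main__":
    X_demo = [[0, 0], [1, 0], [0, 1], [1, 1],
              [5, 5], [6, 5], [5, 6], [6, 6], [5.5, 5.5], [6.5, 6.5]]
    y_demo = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    new_samples = s_smote_sketch(X_demo, y_demo, minority_label=1, k=3)
    print(new_samples.shape)
```

Because every synthetic point is a convex combination pulled toward its seed sample, it stays inside the hull of the minority neighborhood. Plain SMOTE, by contrast, interpolates only along the line segment between a sample and one neighbor.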
Source
《计算机仿真》
CSCD
PKU Core (Peking University Core Journals)
2012, Issue 12, pp. 175-179 (5 pages)
Computer Simulation
Keywords
Class imbalance
Super geometry
Over-sampling
Sample generation