Abstract
Classification of imbalanced data sets is an active research topic in data mining. To address the difficulty of classifying imbalanced data, and in particular the poor recognition of the minority class caused by the imbalanced distribution, this paper proposes an improved algorithm, SMOTE-SVM-KNN. Because traditional over-sampling algorithms over-sample blindly, a sampling algorithm based on the distribution of the points on the class boundary is proposed. The algorithm uses an SVM to obtain the support vectors, which serve as the boundary, and deletes noise points at the same time. For the points on the boundary, their distribution is computed, and the boundary points are then over-sampled according to that distribution. Experiments on real data sets, with the SMOTE-SVM algorithm as the baseline, show that the proposed algorithm effectively improves the classification accuracy of the minority class.
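The abstract does not say how the distribution of the boundary points is computed or how it drives the sampling, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the boundary minority points (for example, the support vectors of an SVM trained separately) are already known by index, treats a boundary point as noise when all of its k nearest neighbours belong to the majority class, and gives each remaining boundary point a share of the synthetic samples proportional to its majority-neighbour count plus one. New points are created by SMOTE-style interpolation towards nearby minority points. The names `knn_indices` and `borderline_oversample`, the weighting rule, and the toy data are all hypothetical.

```python
import math
import random


def knn_indices(i, points, k):
    """Indices of the k nearest neighbours of points[i], excluding i itself."""
    others = (j for j in range(len(points)) if j != i)
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k]


def borderline_oversample(points, labels, border_idx, n_new,
                          k=3, minority=1, seed=0):
    """Drop noisy boundary points, then oversample the rest by local density.

    border_idx: indices of minority points assumed to lie on the boundary
    (e.g. SVM support vectors obtained elsewhere).
    Returns (kept boundary indices, list of synthetic minority points).
    """
    rng = random.Random(seed)
    weights, noise = {}, set()
    for i in border_idx:
        n_major = sum(labels[j] != minority for j in knn_indices(i, points, k))
        if n_major == k:
            noise.add(i)             # assumed noise rule: every neighbour is majority
        else:
            weights[i] = n_major + 1  # assumed weighting: more majority neighbours, more samples
    if not weights:
        return [], []

    total = sum(weights.values())
    clean_minority = [j for j, y in enumerate(labels)
                      if y == minority and j not in noise]
    synthetic = []
    for i, w in weights.items():
        # Interpolation partners: nearest non-noise minority points.
        mates = sorted((j for j in clean_minority if j != i),
                       key=lambda j: math.dist(points[i], points[j]))[:k]
        if not mates:
            continue
        # Shares are rounded, so the total is only approximately n_new in general.
        for _ in range(round(n_new * w / total)):
            j = rng.choice(mates)
            t = rng.random()
            synthetic.append(tuple(a + t * (b - a)
                                   for a, b in zip(points[i], points[j])))
    return list(weights), synthetic


# Toy data: four minority points in the unit square, one minority point
# stranded among majority points, and six majority points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0),
          (2.0, 1.0), (2.0, 0.0), (5.0, 4.0), (4.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
kept, synthetic = borderline_oversample(points, labels,
                                        border_idx=[2, 3, 4], n_new=4)
```

In this toy run the stranded point at index 4 is discarded as noise, and the synthetic points are interpolated only between the remaining minority points, so they stay inside the minority region rather than being generated blindly across the whole class.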
Source
Logistics Sci-Tech (《物流科技》)
2015, No. 10, pp. 13-17 (5 pages)
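Funding
National Natural Science Foundation of China
Grant No. 70971089
Shanghai First-Class Discipline (Systems Science) Project
Grant No. XTKX2012
University of Shanghai for Science and Technology Graduate Innovation Fund
Grant No. JWCXSL1402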