
The Research of Classification of Imbalanced Data Based on the Distribution of the Data on Borderline

Original Chinese title: 基于边界分布的非平衡数据集分类研究
Abstract: Classification of imbalanced data sets is an active research topic in data mining. To address the difficulty of classifying imbalanced data, in particular the poor recognition of the minority class caused by the skewed class distribution, this paper proposes an improved algorithm, SMOTE-SVM-KNN. Because traditional oversampling algorithms oversample blindly, the paper proposes a sampling algorithm based on the distribution of the data on the borderline. The algorithm uses an SVM to obtain the support vectors, which serve as the borderline, and deletes noise points at the same time. For the points on the borderline, it computes their distribution and then oversamples them according to that distribution. Experiments on real data sets, with the SMOTE-SVM algorithm as the baseline, show that the algorithm effectively improves classification accuracy on the minority class.
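Authors: Yao Qian (姚倩), Zhang Ning (张宁)
Source: Logistics Sci-Tech (《物流科技》), 2015, No. 10, pp. 13-17 (5 pages)
Funding: National Natural Science Foundation of China (Grant No. 70971089); Shanghai First-Class Discipline (Systems Science) Project (Grant No. XTKX2012); University of Shanghai for Science and Technology Graduate Innovation Fund (Grant No. JWCXSL1402)
Keywords: support vector machine (SVM); imbalanced data sets; k-nearest neighbors (KNN)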
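
The sampling procedure summarized in the abstract (take the support vectors as the borderline, drop noise points, measure the distribution around each borderline point, oversample accordingly) can be illustrated with a minimal sketch. The abstract does not say how the distribution is computed, so this sketch assumes one plausible reading: the share of majority-class points among a borderline point's k nearest neighbors, in the spirit of Borderline-SMOTE. It also assumes the minority-class support vectors have already been found by some SVM implementation and are passed in as `borderline_idx`. All function and parameter names are hypothetical, and this is not the authors' implementation.

```python
import math
import random


def knn_indices(i, X, k, pool):
    """Indices of the k points in `pool` nearest to X[i], excluding i itself."""
    return sorted((j for j in pool if j != i),
                  key=lambda j: math.dist(X[i], X[j]))[:k]


def borderline_oversample(X, y, borderline_idx, minority, n_new, k=5, seed=0):
    """Generate roughly n_new synthetic minority samples around borderline points.

    X: list of feature tuples; y: list of labels.
    borderline_idx: indices of minority-class support vectors (assumed to be
    supplied by an SVM trained elsewhere).
    """
    rng = random.Random(seed)
    all_idx = range(len(X))
    minority_idx = [i for i in all_idx if y[i] == minority]

    # Assumed "distribution": majority share among the k nearest neighbors.
    weights = {}
    for i in borderline_idx:
        neigh = knn_indices(i, X, k, all_idx)
        if not neigh:
            continue
        ratio = sum(y[j] != minority for j in neigh) / len(neigh)
        if ratio < 1.0:  # ratio == 1.0: surrounded by majority, treated as noise
            weights[i] = ratio

    total = sum(weights.values())
    if total == 0:
        return []

    synthetic = []
    for i, w in weights.items():
        # Per-point quota proportional to its weight; rounding means the
        # grand total may differ slightly from n_new.
        count = round(n_new * w / total)
        partners = knn_indices(i, X, k, minority_idx)
        if not partners:
            continue
        for _ in range(count):
            j = rng.choice(partners)
            gap = rng.random()
            # SMOTE-style interpolation between X[i] and a minority neighbor.
            synthetic.append(tuple(a + gap * (b - a)
                                   for a, b in zip(X[i], X[j])))
    return synthetic
```

Under this assumed weighting, borderline points with more majority-class neighbors receive more synthetic samples, and points whose neighbors all belong to the majority class are discarded as noise rather than oversampled.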

