
A Semi-Supervised Classification Method for Imbalanced Data Based on Biased-SVM
Abstract: To address semi-supervised classification of imbalanced data, this paper proposes a semi-supervised classification algorithm based on Biased-SVM. The method first trains a Biased-SVM model, which is designed to handle imbalanced data, on the initial labeled sample set. The trained model then assigns labels to the unlabeled samples, the newly labeled samples are added to the initial labeled sample set, and the Biased-SVM model is retrained. Finally, the classifier is evaluated on a test set. Experiments were run on several data sets from public repositories. On binary imbalanced data sets, with labeled samples making up 20% to 80% of the data, the proposed method raised the F-value of the minority class without lowering the overall G-mean of the data set, and showed high stability. On multi-class imbalanced data sets, over the same 20% to 80% range, it raised the recognition rate of the minority class without lowering the overall EG-mean, again with high stability.
Authors: DU Limin (杜利敏), XU Yang (徐扬) (Intelligent Control Development Center, Southwest Jiaotong University, Chengdu 610031, China; Pharmacy College of Henan University, Kaifeng, Henan 475004, China)
Source: Journal of Henan University: Natural Science (《河南大学学报(自然科学版)》), CAS, 2017, No. 4, pp. 481-489 (9 pages)
Funding: National Natural Science Foundation of China (61175055, 61305074)
Keywords: semi-supervised learning; imbalanced data; classification algorithm; Biased-SVM
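The four-step procedure summarized in the abstract (train a Biased-SVM on the labeled set, pseudo-label the unlabeled samples, merge, retrain, then test) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a linear kernel, a simple stochastic subgradient solver, a single pseudo-labeling pass, and arbitrary penalty values `c_pos` and `c_neg`; the function names and the toy two-blob data are hypothetical. The only idea taken from Biased-SVM is that misclassifying the minority class (+1) is penalized more heavily than misclassifying the majority class (-1). G-mean is computed here as the geometric mean of the two per-class recalls.

```python
import math
import random


def train_biased_svm(X, y, c_pos=4.0, c_neg=1.0, epochs=100, lr=0.01, seed=0):
    """Fit a linear SVM by stochastic subgradient descent on
    0.5*||w||^2 + sum_i C(y_i) * max(0, 1 - y_i*(w.x_i + b)),
    where C(+1) = c_pos (minority) and C(-1) = c_neg (majority).
    The penalty values are illustrative assumptions, not from the paper."""
    rng = random.Random(seed)
    n = len(X)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1.0 - lr / n) for wj in w]  # step on the regularizer
            if margin < 1.0:  # hinge loss is active for this sample
                c = c_pos if y[i] == 1 else c_neg
                w = [wj + lr * c * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * c * y[i]
    return w, b


def predict(model, X):
    """Return +1 or -1 for each sample according to the sign of w.x + b."""
    w, b = model
    return [1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1 for x in X]


def g_mean(y_true, y_pred):
    """Geometric mean of the recalls of the two classes."""
    recalls = []
    for cls in (1, -1):
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / total if total else 0.0)
    return math.sqrt(recalls[0] * recalls[1])


def self_train(X_lab, y_lab, X_unl, **svm_args):
    """One round of the procedure: train, pseudo-label, merge, retrain."""
    model = train_biased_svm(X_lab, y_lab, **svm_args)
    pseudo = predict(model, X_unl)
    X_aug, y_aug = X_lab + X_unl, y_lab + pseudo
    return train_biased_svm(X_aug, y_aug, **svm_args), X_aug, y_aug


def make_blobs(n_neg, n_pos, rng):
    """Toy 2-D data: majority class (-1) near (0, 0), minority (+1) near (3, 3)."""
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_neg)]
    X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(n_pos)]
    return X, [-1] * n_neg + [1] * n_pos


rng = random.Random(42)
X_lab, y_lab = make_blobs(20, 4, rng)    # small labeled set
X_unl, _ = make_blobs(40, 6, rng)        # true labels are discarded
X_test, y_test = make_blobs(40, 8, rng)  # held-out test set

final_model, X_aug, y_aug = self_train(X_lab, y_lab, X_unl)
score = g_mean(y_test, predict(final_model, X_test))
print(f"G-mean on the test set: {score:.3f}")
```

Raising `c_pos` relative to `c_neg` pushes the decision boundary away from the minority class, which is the mechanism a biased penalty relies on. In practice a kernel SVM from an established library would likely replace the hand-written solver used here.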
