Classification algorithm for imbalance dataset based on quotient space theory (基于商空间理论的非平衡数据集分类算法)

Cited by: 2
Abstract: Imbalanced datasets are frequently encountered in machine learning classification problems. To improve classification performance on such datasets, an over-sampling classification algorithm based on quotient space theory, QMSVM, is proposed. The majority-class samples in the training set are partitioned according to their clustering structure, and the resulting granules are merged with the minority-class samples to train a linear Support Vector Machine (SVM). This step yields the support vectors and the misclassified granules of the majority class. The support vectors and misclassified samples of the minority class are obtained as well and are over-sampled with the Synthetic Minority Over-sampling Technique (SMOTE). The two resulting sample sets are then merged for a further round of SVM learning, which rebalances the training set and produces a more reasonable separating hyperplane. Experimental results show that, compared with several other algorithms, the proposed algorithm has a lower overall classification accuracy but substantially improves the g_means value and the acc+ value (classification accuracy on positives), and that the gain is larger on datasets with a higher imbalance ratio.
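The trade-off reported in the abstract (lower overall accuracy, higher g_means and acc+) can be illustrated with a small worked example. The sketch below is not taken from the paper. It assumes the commonly used definitions: acc+ is the fraction of positive (minority) samples classified correctly, acc- is the same for negatives, and g_means is the geometric mean of the two. The function names and the toy label vectors are hypothetical.

```python
import math

def class_accuracies(y_true, y_pred, positive=1):
    """Return (acc_plus, acc_minus): per-class accuracy on positives and negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    acc_plus = tp / (tp + fn) if (tp + fn) else 0.0
    acc_minus = tn / (tn + fp) if (tn + fp) else 0.0
    return acc_plus, acc_minus

def g_means(y_true, y_pred, positive=1):
    """Geometric mean of acc+ and acc- (assumed definition, see lead-in)."""
    acc_plus, acc_minus = class_accuracies(y_true, y_pred, positive)
    return math.sqrt(acc_plus * acc_minus)

def accuracy(y_true, y_pred):
    """Overall fraction of correctly classified samples."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Hypothetical toy data: 10 positives (minority) and 90 negatives (majority).
y_true = [1] * 10 + [0] * 90

# Classifier A labels everything as the majority class.
pred_a = [0] * 100

# Classifier B recovers 8 of the 10 positives but mislabels 15 negatives.
pred_b = [1] * 8 + [0] * 2 + [1] * 15 + [0] * 75

for name, pred in (("A", pred_a), ("B", pred_b)):
    acc_plus, _ = class_accuracies(y_true, pred)
    print(name, round(accuracy(y_true, pred), 3),
          round(acc_plus, 3), round(g_means(y_true, pred), 3))
```

In this toy case classifier A has the higher overall accuracy (0.90 against 0.83) while its acc+ and g_means are both zero, which is why accuracy alone is a poor yardstick on imbalanced data.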
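Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2012, No. 1, pp. 210-212 (3 pages)
Funding: National Natural Science Foundation of China (71071002); Natural Science Foundation of the Anhui Provincial Department of Education (05010428); Anhui University Talent Team Construction Project; Anhui University Academic Innovation Team Project (KJTD001B)
Keywords: imbalanced dataset; quotient space theory; Support Vector Machine (SVM); over-sampling; QMSVM algorithm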
References (8)

  • 1 WEISS G M. Mining with rarity: A unifying framework [J]. ACM SIGKDD Explorations Newsletter - Special Issue on Learning from Imbalanced Datasets, 2004, 6(1): 7-19.
  • 2 JIANG Sha, ZHANG Xiaolong. An SVM learning algorithm for imbalanced data (一种用于非平衡数据的SVM学习算法) [J]. Computer Engineering (计算机工程), 2008, 34(20): 198-199. Cited by: 7
  • 3 WU G, CHANG E. Class-boundary alignment for imbalanced dataset learning [C]// The Twentieth International Conference on Machine Learning Workshop on Learning from Imbalanced Datasets. Washington, DC: AAAI Press, 2003: 786-795.
  • 4 HUANG KAIZHU, YANG HAIQIN, KING I, et al. Imbalanced learning with a biased minimax probability machine [J]. IEEE Transactions on Systems, Man, and Cybernetics, 2006, 36(4): 913-923.
  • 5 CHAWLA N V, BOWYER K W, HALL L O, et al. SMOTE: Synthetic minority over-sampling technique [J]. Journal of Artificial Intelligence Research, 2002, 16(1): 321-357.
  • 6 GUO Husheng, QI Hui, WANG Wenjian. A granular SVM learning algorithm for imbalanced data (处理非平衡数据的粒度SVM学习算法) [J]. Computer Engineering (计算机工程), 2010, 36(2): 181-183. Cited by: 15
  • 7 KUBAT M, MATWIN S. Addressing the curse of imbalanced training sets: One-sided selection [C]// Proceedings of the 14th International Conference on Machine Learning. San Francisco: Morgan Kaufmann, 1997: 179-186.
  • 8 BLAKE C, MERZ C. UCI repository of machine learning databases [EB/OL]. [2011-03-25]. http://www.ics.uci.edu/~mlearn/MLRepository.html.
