
Support vector machine imbalanced data classification based on weighted clustering centroids (cited by: 4)
Abstract: Classifying imbalanced data is an active research topic in machine learning. Traditional classification algorithms assume that classes are evenly distributed or that misclassification costs are equal, so they rarely give satisfactory results on imbalanced data. This paper proposes a support vector machine (SVM) classification method based on weighted clustering centroids. First, unsupervised clustering is applied separately to the positive and the negative samples, and the centroid of each cluster is extracted as the most compact representative of that cluster's samples. The centroids, equal in number for the two classes, then form a new, balanced training set. To limit the information lost during clustering, each centroid is paired with a weight factor proportional to the number of samples in its cluster, so that the centroid and its weight together stand for the distribution and size of the cluster. Finally, all centroids and weight factors are used to train the improved SVM model. Experimental results show that the training samples selected this way are more representative and that classification performance improves over other sampling techniques for imbalanced data.
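The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the clustering algorithm or how the "improved SVM" uses the weights, so the sketch assumes plain k-means, a linear SVM whose hinge loss is scaled per sample, and weights normalised to sum to 1 within each class. All function names (`kmeans`, `weighted_centroid_set`, `train_weighted_linear_svm`, `predict`) and hyperparameters are hypothetical.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice). Returns (centroids, cluster sizes)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (squared Euclidean distance).
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        centroids = [
            ([sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[j])
            for j, cl in enumerate(clusters)
        ]
    return centroids, [len(cl) for cl in clusters]

def weighted_centroid_set(pos, neg, k, seed=0):
    """Cluster each class separately; keep k centroids per class plus a weight each."""
    X, y, s = [], [], []
    for label, pts in ((1, pos), (-1, neg)):
        centroids, sizes = kmeans(pts, k, seed=seed)
        for c, n in zip(centroids, sizes):
            X.append(c)
            y.append(label)
            # Assumption: weight = cluster size as a fraction of its class.
            s.append(n / len(pts))
    return X, y, s

def train_weighted_linear_svm(X, y, s, C=1.0, lr=0.01, epochs=500):
    """Subgradient descent on 0.5*||w||^2 + C * sum_i s_i * hinge(y_i * (w.x_i + b))."""
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = list(w), 0.0  # gradient of the regulariser is w itself
        for x, label, weight in zip(X, y, s):
            if label * (sum(wi * xi for wi, xi in zip(w, x)) + b) < 1:
                for d in range(dim):
                    gw[d] -= C * weight * label * x[d]
                gb -= C * weight * label
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return w, b

def predict(w, b, x):
    """Sign of the linear decision function."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Toy imbalanced data: 200 negative samples versus 20 positive samples.
rng = random.Random(42)
neg = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
pos = [(rng.gauss(4, 1), rng.gauss(4, 1)) for _ in range(20)]
X, y, s = weighted_centroid_set(pos, neg, k=5)
w, b = train_weighted_linear_svm(X, y, s)
```

With `k=5` the SVM sees ten training points, five per class, regardless of the original 10:1 imbalance; the per-centroid weights carry the cluster-size information that would otherwise be lost. Normalising the weights within each class is one possible design choice, made here so that neither class dominates the loss.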
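Authors: Hu Xiaosheng (胡小生), Zhong Yong (钟勇)
Source: CAAI Transactions on Intelligent Systems (《智能系统学报》), indexed in CSCD and the Peking University Core Journals list; 2013, No. 3, pp. 261-265 (5 pages)
Funding: Foshan Science and Technology Development Special Fund (2011AA100061); Foshan Industry-University-Research Special Fund (2012HC100272); Foshan Education Bureau research project on an intelligent evaluation index system (DX20120220)
Keywords: machine learning; imbalanced data classification; clustering centroid; support vector machine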


