
Large-scale data classification based on hierarchical clustering and re-sampling

Cited by: 5
Abstract: To address the classification of large-scale data, this paper combines supervised and unsupervised learning and proposes a Support Vector Machine (SVM) classification method based on hierarchical clustering and re-sampling. The method first uses k-means clustering, an unsupervised learning algorithm, to partition the dataset into several subsets. It then clusters each subset class by class and selects the samples in the neighborhood of each cluster center to form the final training set. Finally, an SVM is trained on these most representative samples. Experimental results show that the proposed method substantially reduces the learning cost of the SVM, achieves better classification accuracy than random under-sampling, and attains about the same accuracy as training on the complete dataset.
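Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2013, Issue 10, pp. 2801-2803 (3 pages)
Funding: National Natural Science Foundation of China (61373127); China Postdoctoral Science Foundation (20110491530); Liaoning Provincial Department of Education Fund (L2011186)
Keywords: large-scale data; classification; clustering; re-sampling; Support Vector Machine (SVM)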
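
The sample-selection procedure summarized in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say how the "neighborhood" of a cluster center is defined, so the sketch assumes it means the closest fraction (`keep`) of each cluster's members. The function names, parameter names, and default values are all hypothetical, and k-means is written in plain Python so the example has no dependencies.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means. Returns (centers, labels)."""
    rng = random.Random(seed)
    k = min(k, len(points))
    centers = [list(p) for p in rng.sample(points, k)]

    def assign():
        # Nearest center by Euclidean distance.
        return [min(range(k), key=lambda j: math.dist(p, centers[j]))
                for p in points]

    for _ in range(iters):
        labels = assign()
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster went empty
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return centers, assign()


def select_representatives(X, y, n_subsets=2, clusters_per_class=2,
                           keep=0.3, seed=0):
    """Return sorted indices of the samples kept for SVM training.

    Assumption: the neighborhood of a center is the closest `keep`
    fraction of that cluster's members (at least one sample).
    """
    # Level 1: partition the whole dataset into subsets, ignoring labels.
    _, subset_of = kmeans(X, n_subsets, seed=seed)
    chosen = []
    for s in set(subset_of):
        for cls in set(y):
            # Level 2: cluster each class separately inside the subset.
            idx = [i for i in range(len(X))
                   if subset_of[i] == s and y[i] == cls]
            if not idx:
                continue
            centers, labels = kmeans([X[i] for i in idx],
                                     clusters_per_class, seed=seed)
            for j, center in enumerate(centers):
                members = [i for i, l in zip(idx, labels) if l == j]
                if not members:
                    continue
                # Keep only the samples nearest to this cluster center.
                members.sort(key=lambda i: math.dist(X[i], center))
                n_keep = max(1, math.ceil(keep * len(members)))
                chosen.extend(members[:n_keep])
    return sorted(chosen)


def make_demo_data(n_per_class=100, seed=1):
    """Two synthetic 2-D Gaussian classes, for illustration only."""
    rng = random.Random(seed)
    X, y = [], []
    for cls, (cx, cy) in enumerate([(0.0, 0.0), (3.0, 3.0)]):
        for _ in range(n_per_class):
            X.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            y.append(cls)
    return X, y


if __name__ == "__main__":
    X, y = make_demo_data()
    kept = select_representatives(X, y)
    print(f"kept {len(kept)} of {len(X)} samples")
```

The indices returned by `select_representatives` would then be used to build the reduced training set that is handed to an SVM trainer. The cost saving claimed in the abstract comes from the SVM seeing only this subset rather than the full dataset.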

