一种缓解分类面交错的样本点扩散方法

Diffusion Method of Sample Points for Alleviating Staggered Situation of Classification
Abstract: A fixed similarity measure prevents a learner from using prior information to reveal the statistical regularities inherent in the data, so it is difficult to obtain good results on data sets whose classes are heavily interleaved along the decision boundary. To alleviate this interleaving and improve classification accuracy, this paper combines boundary detection with sample point diffusion. Boundary points are obtained from statistics on the label information and position information of the samples. Taking each boundary point as a center, a suitable control function is selected to diffuse the surrounding sample points, which makes the decision boundary clearer and thereby improves the accuracy of the classification algorithm. The method is validated with different classifiers on several data sets with interleaved decision boundaries, and the results show that accuracy improves to varying degrees. In a comparison with three classical supervised distance metric learning methods, the experimental results show that the proposed method is well suited to highly interleaved data sets and can effectively improve the performance of SVM.
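The abstract describes the procedure only at a high level, so the following is a minimal sketch under stated assumptions rather than the authors' algorithm. It assumes that a boundary point is a sample with at least one differently labeled sample among its k nearest neighbors, that the control function is a Gaussian decay, and that each sample is pushed away from boundary points of the other class. The names `find_boundary_points`, `diffuse`, `k`, `strength`, and `sigma` are hypothetical and do not come from the paper.

```python
import math


def find_boundary_points(points, labels, k=3):
    """Return indices of samples whose k nearest neighbours include another label.

    Assumption: this neighbourhood test stands in for the paper's
    label/position statistics, which the abstract does not specify.
    """
    boundary = []
    for i, p in enumerate(points):
        # Rank all other samples by Euclidean distance to p.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        if any(labels[j] != labels[i] for j in others[:k]):
            boundary.append(i)
    return boundary


def diffuse(points, labels, k=3, strength=0.5, sigma=1.0):
    """Push each sample away from boundary points of a different class.

    Assumption: a Gaussian control function, so the push is strongest
    close to the boundary point and fades with distance.
    """
    boundary = find_boundary_points(points, labels, k)
    moved = []
    for i, p in enumerate(points):
        shift = [0.0] * len(p)
        for b in boundary:
            if labels[b] == labels[i]:
                continue  # only opposite-class boundary points repel
            c = points[b]
            d = math.dist(p, c)
            if d == 0.0:
                continue  # coincident points have no push direction
            w = strength * math.exp(-(d * d) / (2.0 * sigma * sigma))
            for a in range(len(p)):
                shift[a] += w * (p[a] - c[a]) / d
        moved.append(tuple(x + s for x, s in zip(p, shift)))
    return moved


if __name__ == "__main__":
    # Two classes that nearly touch along the x axis.
    pts = [(0.0, 0.0), (1.0, 0.0), (1.5, 0.0), (2.5, 0.0)]
    lbl = [0, 0, 1, 1]
    print(find_boundary_points(pts, lbl, k=1))
    print(diffuse(pts, lbl, k=1))
```

In this sketch the diffused coordinates would be passed to a downstream classifier such as an SVM in place of the original ones. How the paper applies the transformation to unseen test samples is not stated in the abstract and is not modeled here.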
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal (北大核心), 2017, No. 9, pp. 286-289, 295 (5 pages)
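Funding: National 863 Program Major Project (2013AA01A212); National Natural Science Foundation of China (61272067, 61104156, 61402118); Natural Science Foundation of Guangdong Province (9451009001002777)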
Keywords: Distance metric learning, Sample point dispersion, Data preprocessing
