Improved K-nearest Neighbor Algorithm Based on Weight Search Tree for High-dimensional Classification
Cited by: 7
Abstract The ongoing development of information acquisition technology produces high-dimensional, large-scale data, which poses a major challenge for data mining. To address the low efficiency and high time cost of the K-nearest neighbor classification algorithm on high-dimensional data, an improved K-nearest neighbor algorithm based on a weight search tree (KNN-WST) is proposed for high-dimensional classification. According to the weights of the feature attributes, the algorithm selects a subset of attributes as nodes to construct a search tree, and the search tree divides the data set into different matrix regions. An unknown sample traverses the search tree to find the most "similar" matrix region, and distances are computed only to the data in that region, which reduces the data scale and therefore the time complexity. The Minkowski distance best suited to distance measurement on high-dimensional data is also studied and discussed. Simulation experiments on 6 standard high-dimensional data sets show that, compared with the K-nearest neighbor classification algorithm, decision tree, and support vector machine (SVM) algorithms, KNN-WST significantly reduces classification time while also achieving better classification accuracy. KNN-WST therefore performs better on high-dimensional classification and is expected to provide a reference for solving problems related to high-dimensional data.
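The procedure outlined in the abstract can be illustrated with a rough Python sketch. It is not the authors' implementation: the abstract does not say how attribute weights are computed or how tree nodes split the data, so this sketch assumes the weights are supplied by the caller and that each node splits at the median of its attribute. All function names (`minkowski`, `build_tree`, `find_region`, `classify`) and the `min_size` parameter are hypothetical, and the Minkowski order `p` is left as a free parameter.

```python
from collections import Counter

import numpy as np


def minkowski(a, b, p=2.0):
    """Minkowski distance of order p between two vectors."""
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))


def build_tree(X, idx, attrs, min_size=5):
    """Recursively partition row indices `idx` using `attrs`, one per level.

    Assumption: each node splits at the median of its attribute. The split
    rule actually used by KNN-WST is not stated in the abstract.
    """
    if len(attrs) == 0 or len(idx) <= min_size:
        return {"leaf": idx}
    attr = int(attrs[0])
    thr = float(np.median(X[idx, attr]))
    left = idx[X[idx, attr] <= thr]
    right = idx[X[idx, attr] > thr]
    if len(left) == 0 or len(right) == 0:
        # The attribute does not separate these rows, so stop here.
        return {"leaf": idx}
    return {
        "attr": attr,
        "thr": thr,
        "left": build_tree(X, left, attrs[1:], min_size),
        "right": build_tree(X, right, attrs[1:], min_size),
    }


def find_region(tree, x):
    """Walk the tree and return the row indices of the region x falls into."""
    while "leaf" not in tree:
        branch = "left" if x[tree["attr"]] <= tree["thr"] else "right"
        tree = tree[branch]
    return tree["leaf"]


def classify(X, y, tree, x, k=3, p=2.0):
    """Majority vote among the k nearest neighbours inside x's region only."""
    region = find_region(tree, x)
    dists = np.array([minkowski(X[i], x, p) for i in region])
    nearest = region[np.argsort(dists)[:k]]
    return Counter(y[nearest].tolist()).most_common(1)[0][0]


def top_attributes(weights, n):
    """Indices of the n largest weights, highest first (weights are assumed given)."""
    return np.argsort(np.asarray(weights))[::-1][:n]
```

A caller would compute or supply per-attribute weights, pick the top few with `top_attributes`, build the tree once with `build_tree(X, np.arange(len(X)), attrs)`, and then call `classify` per query. Because distances are computed only against the rows in one leaf region, the per-query cost depends on the region size rather than on the full data set, which is the saving the abstract describes.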
Authors LIANG Shu-rong (梁淑蓉); CHEN Ji-li (陈基漓); XIE Xiao-lan (谢晓兰) (College of Information Science and Engineering, Guilin University of Technology, Guilin 541004, China; Guangxi Key Laboratory of Embedded Technology and Intelligent Systems, Guilin 541004, China)
Source Science Technology and Engineering (《科学技术与工程》), Peking University Core Journal, 2021, No. 7, pp. 2760-2766 (7 pages)
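Funding National Natural Science Foundation of China (61762031); Guangxi Science and Technology Major Project (桂科AA19046004); Guangxi Key Research and Development Project (桂科AB18126006).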
Keywords high-dimensional data; K-nearest neighbor classification algorithm; feature attribute; search tree; Minkowski distance