
Application of Document Similarity Matrix in the Improvement of Classification Efficiency of KNN Classification Algorithm

Original title: 文档相似矩阵在提高KNN分类效率中的应用 (cited by 2)
Abstract: When the samples are numerous and high-dimensional, the traditional KNN classification algorithm must perform a large number of similarity computations. To address this problem, this paper proposes an improved KNN classification algorithm based on a similarity matrix. The algorithm computes the similarity between every pair of samples and builds a similarity matrix that accelerates the search for the K nearest neighbors. Using news documents from the text classification corpus of the Sogou Natural Language Laboratory as the experimental data, and the macro-averaged F-measure as the evaluation criterion, the paper compares the improved KNN method with the traditional KNN method. The experimental results show that, with suitably tuned parameters, the proposed method reduces the number of similarity computations needed to find the K nearest neighbors without loss of accuracy.
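The abstract says only that pairwise similarities are precomputed into a matrix that speeds up the K-nearest-neighbor search; it does not say how the matrix is used. The Python sketch below illustrates one well-known way such a matrix can cut query-time similarity computations: AESA-style (Approximating and Eliminating Search Algorithm) triangle-inequality pruning. It is swapped in here as an assumption and is not the paper's method. Because cosine similarity does not itself obey the triangle inequality, the sketch assumes unit-normalized document vectors (for example TF-IDF) and works with the angular distance arccos(similarity), which does. The names `angular_distance_matrix`, `aesa_knn` and `knn_classify` are hypothetical.

```python
import numpy as np
from collections import Counter


def angular_distance_matrix(X):
    """Precompute the n x n matrix of angular distances between unit-norm rows of X."""
    S = np.clip(X @ X.T, -1.0, 1.0)  # pairwise cosine similarities
    return np.arccos(S)              # arccos(similarity) obeys the triangle inequality


def aesa_knn(X, D, q, k, eps=1e-6):
    """Exact k nearest rows of X to unit vector q, pruned with the precomputed D.

    Returns (neighbor indices, nearest first; number of query similarities computed).
    """
    n = X.shape[0]
    lower = np.zeros(n)             # lower bounds on d(q, x_j)
    alive = np.ones(n, dtype=bool)  # not yet computed and not yet eliminated
    best = []                       # (distance, index) of the current k best
    calls = 0
    while alive.any():
        cand = np.flatnonzero(alive)
        i = int(cand[np.argmin(lower[cand])])  # most promising remaining candidate
        alive[i] = False
        d = float(np.arccos(np.clip(X[i] @ q, -1.0, 1.0)))  # one real computation
        calls += 1
        best = sorted(best + [(d, i)])[:k]
        # Triangle inequality: d(q, x_j) >= |d(q, x_i) - d(x_i, x_j)|
        lower = np.maximum(lower, np.abs(d - D[i]))
        if len(best) == k:
            # Drop candidates that provably lie farther than the current k-th neighbor;
            # eps absorbs floating-point error in arccos.
            alive &= lower <= best[-1][0] + eps
    return [i for _, i in best], calls


def knn_classify(X, D, labels, q, k):
    """Majority vote over the labels of the k nearest training documents."""
    neighbors, calls = aesa_knn(X, D, q, k)
    label = Counter(labels[i] for i in neighbors).most_common(1)[0][0]
    return label, calls
```

Building the matrix once costs n(n-1)/2 similarity evaluations and O(n^2) memory, an up-front cost that only pays off when amortized over many queries. This sketch is exact, so it cannot lose accuracy, but it has no tuning parameter besides k, unlike the parameter-controlled method the abstract describes. How many computations it saves depends on how strongly the documents cluster; on random high-dimensional vectors, whose angles concentrate near 90 degrees, pruning can be weak.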
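The abstract names the macro-averaged F-measure as its evaluation criterion but does not give the formula. A common definition, assumed here, computes F1 = 2PR/(P+R) for each class from that class's precision P and recall R, then takes the unweighted mean over classes, so small categories weigh as much as large ones. The function name `macro_f1`, the zero-division convention, and the toy category labels are illustrative assumptions.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro-averaged F-measure)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # Assumed convention: an undefined precision or recall counts as 0
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Toy example with hypothetical news categories
y_true = ["sports", "sports", "finance", "finance", "it"]
y_pred = ["sports", "finance", "finance", "finance", "sports"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.4333
```

In the toy example the per-class F1 scores are 0.5 for "sports", 0.8 for "finance" and 0 for "it" (never predicted correctly), so the macro average is 1.3 / 3.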
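Source: Information Studies: Theory & Application (《情报理论与实践》), 2014, No. 1, pp. 141-144 (4 pages). Indexed in CSSCI and the Peking University core journal list.
Funding: This work is an output of the project "Multi-source Information Sensing Technology and Product Development for the Whole Supply Chain of Agricultural Products", supported by the National High Technology Research and Development Program of China ("863" Program), project No. 2012AA101701.
Keywords: text classification; K-nearest neighbor (KNN) classification algorithm; similarity matrix; algorithm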

References (12)

  • [1] YANG Y, LIU X. A re-examination of text categorization methods [C] // Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '99). 1999: 42-49.
  • [2] COVER T M, HART P E. Nearest neighbor pattern classification [J]. IEEE Transactions on Information Theory, 1967, 13(1): 21-27.
  • [3] LIU Yu, CHEN Guisheng. KNN algorithm improving based on cloud model [C] // 2010 2nd International Conference on Advanced Computer Control (ICACC). Changsha, 2010: 63-66.
  • [4] ZHOU Lijuan, et al. A clustering-based KNN improved algorithm CLKNN for text classification [C] // Proceedings of the 2nd International Asia Conference on Informatics in Control, Automation and Robotics (CAR '10). Piscataway, NJ, USA: IEEE Press, 2010: 212-215.
  • [5] HUANG Hong, GUO Juan, WANG Ben. An improved KNN algorithm based on adaptive cluster distance bounding for high dimensional indexing [C] // 2012 Third Global Congress on Intelligent Systems. 2012: 213-217.
  • [6] ZHOU Yong, LI Youwen, XIA Shixiong. An improved KNN text classification algorithm based on clustering [J]. Journal of Computers, 2009, 3(4): 230-237.
  • [7] LIU Haifeng, LIU Shousheng, SU Zhan. An improved KNN text categorization on skew sort condition [C] // 2010 International Conference on Computer Application and System Modeling (ICCASM 2010). Taiyuan, 2010: 182-186.
  • [8] ZHAO Weidong, TANG Shuanglin, DAI Weihui. An improved KNN algorithm based on essential vector [J]. Electronics and Electrical Engineering, 2012(123): 119-122.
  • [9] LIANG Junjie, WANG Changlei. Fast KNN query in high-dimensional space using partitioning and distance [J]. Journal of Computer Research and Development, 2007, 44(11): 1980-1985.
  • [10] LIU Haibo, XI Yahui, WANG Yu. A fast KNN algorithm for text classification [J]. Journal of Hebei University (Natural Science Edition), 2008, 28(3): 322-326.

