Application of Document Similarity Matrix in the Improvement of Classification Efficiency of KNN Classification Algorithm (cited by: 2)

Abstract: When the samples are numerous and high-dimensional, the traditional KNN classification algorithm has to perform a large number of similarity computations. To address this, the paper proposes an improved KNN classification algorithm based on a similarity matrix. The algorithm computes the similarity between every pair of samples and builds a similarity matrix that accelerates the search for the K nearest neighbors. The experiments use news documents from the text classification corpus of the Sogou Natural Language Laboratory, with the macro-averaged F-measure as the evaluation criterion, and compare the improved KNN method with the traditional one. The results show that, with suitable parameter settings, the proposed method needs fewer similarity computations to find the K nearest neighbors, with no loss of accuracy.
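The abstract says that the pairwise similarity matrix speeds up the K-nearest-neighbor search. It does not say how the matrix is used at query time. The Python sketch below is therefore only an illustration under stated assumptions, not the authors' algorithm:

- Documents are assumed to be sparse term-weight dictionaries, compared by cosine similarity.
- The pruning rule is hypothetical. The query is first compared with a few `seeds`. The precomputed matrix row of the most similar seed then keeps only the training documents whose similarity to that seed is at least `theta`.
- All function names are illustrative.

The sketch counts query-time similarity computations for both searches. It also includes one common definition of the macro-averaged F-measure.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two sparse term-weight vectors (dict: term -> weight)."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_similarity_matrix(docs):
    """Offline step: pairwise similarities of the training documents.
    The matrix is symmetric, so only n*(n-1)/2 cosine calls are needed;
    the diagonal is set to 1.0 and never used by the search below."""
    n = len(docs)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim[i][j] = sim[j][i] = cosine(docs[i], docs[j])
    return sim


def vote(neighbours, labels):
    """Similarity-weighted vote over (similarity, training index) pairs."""
    score = Counter()
    for s, i in neighbours:
        score[labels[i]] += s
    return max(score, key=score.get)


def knn_bruteforce(query, docs, labels, k):
    """Traditional KNN: compare the query with every training document.
    Returns (predicted label, number of similarity computations)."""
    sims = [(cosine(query, d), i) for i, d in enumerate(docs)]
    top = sorted(sims, reverse=True)[:k]
    return vote(top, labels), len(docs)


def knn_with_matrix(query, docs, labels, sim, k, seeds, theta):
    """HYPOTHETICAL matrix-pruned search (an assumption, not the paper's rule):
    1. compare the query only with the seed documents;
    2. pick the most similar seed `best`;
    3. keep training documents j with sim[best][j] >= theta as candidates
       and compare the query with those candidates only."""
    seed_sims = {i: cosine(query, docs[i]) for i in seeds}
    best = max(seed_sims, key=seed_sims.get)
    candidates = [j for j in range(len(docs))
                  if j not in seed_sims and sim[best][j] >= theta]
    if len(seed_sims) + len(candidates) < k:
        # Too few candidates: fall back to a full scan (seed work is still counted).
        label, n = knn_bruteforce(query, docs, labels, k)
        return label, n + len(seed_sims)
    pool = [(s, i) for i, s in seed_sims.items()]
    pool += [(cosine(query, docs[j]), j) for j in candidates]
    top = sorted(pool, reverse=True)[:k]
    return vote(top, labels), len(seed_sims) + len(candidates)


def macro_f1(y_true, y_pred):
    """Macro-averaged F1: per-class F1 scores averaged with equal class weight."""
    classes = set(y_true) | set(y_pred)
    total = 0.0
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        total += 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return total / len(classes)


# Toy usage with made-up documents (illustrative only).
train = [{"stock": 2, "market": 1}, {"stock": 1, "bank": 2},
         {"goal": 2, "match": 1}, {"match": 2, "team": 1}]
labels = ["finance", "finance", "sports", "sports"]
S = build_similarity_matrix(train)
q = {"stock": 1, "market": 1}
print(knn_bruteforce(q, train, labels, k=1))                                 # ('finance', 4)
print(knn_with_matrix(q, train, labels, S, k=1, seeds=[0, 2], theta=0.3))  # ('finance', 3)
```

Raising the hypothetical `theta` shrinks the candidate set, and with it the number of query-time similarity computations. The cost is a greater risk of missing true neighbors. This is the kind of parameter trade-off the abstract describes. Building the matrix itself costs n(n-1)/2 similarity computations offline, and the query-time savings have to amortize that cost.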

Authors: Lu Yonghe, He Xinyu

Affiliation: School of Information Management, Sun Yat-sen University

Source: Information Studies: Theory & Application (indexed in CSSCI and the Peking University core journal list), 2014, no. 1, pp. 141-144 (4 pages)

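Funding: This work is a result of the project "Multi-source Information Sensing Technology and Product Development for the Whole Agricultural Product Supply Chain" (project no. 2012AA101701). The project is funded by the National High Technology Research and Development Program of China ("863" Program).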

Keywords: text classification; KNN classification algorithm; similarity matrix algorithm

CLC number: TP391.1 [Automation and Computer Technology: Computer Application Technology]
