An improved text classification algorithm (cited by: 5)
Abstract Text classification is one of the active research topics in text mining. However, the traditional KNN classification algorithm has high time complexity, and its classification accuracy is low on training samples of non-uniform density. To address these problems, an optimized KNN algorithm for non-uniform density samples, IKNN, is proposed. First, training samples whose class distribution is non-uniform are selected, and the high-density samples among them are pruned accordingly to improve accuracy. Then, projection pursuit theory is applied to the pruned training samples to select a smaller, more representative sample library, which lowers the time complexity of the classification algorithm. Experiments show that, with a large number of training samples, IKNN achieves higher efficiency and accuracy than the classic KNN algorithm.
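The pruning idea in the abstract can be illustrated with a small sketch. It is not the authors' IKNN: the abstract does not give the pruning rule, so the greedy rule below, the function names, and the `radius` and `max_neighbors` parameters are all assumptions made for illustration, and the projection pursuit step is not shown. The sketch only demonstrates how a dense class can outvote a sparse one under classic KNN, and how thinning the dense region changes the vote while also reducing the number of distance computations per query.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """Classic KNN: majority vote among the k nearest training samples."""
    neighbors = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def prune_dense(train, radius, max_neighbors):
    """Hypothetical pruning rule (an assumption, not taken from the paper):
    keep a sample only if fewer than max_neighbors already-kept samples
    of the same class lie within radius of it."""
    kept = []
    for point, label in train:
        close = sum(
            1 for p, l in kept
            if l == label and euclidean(p, point) <= radius
        )
        if close < max_neighbors:
            kept.append((point, label))
    return kept


# Toy data: class "A" is densely sampled, class "B" is sparse.
train = [
    ((0.0, 0.0), "A"), ((0.1, 0.0), "A"), ((0.0, 0.1), "A"),
    ((0.1, 0.1), "A"), ((0.05, 0.05), "A"), ((0.02, 0.08), "A"),
    ((1.0, 1.0), "B"), ((3.0, 3.0), "B"),
]
query = (0.7, 0.7)  # closest single sample is the "B" point at (1.0, 1.0)

pruned = prune_dense(train, radius=0.5, max_neighbors=1)

print(knn_predict(train, query, k=3))   # A  (two dense "A" samples outvote "B")
print(knn_predict(pruned, query, k=3))  # B
print(len(train), len(pruned))          # 8 3
```

In this toy case the unpruned vote is decided by the sheer number of "A" samples rather than by proximity, which is the kind of non-uniform density effect the abstract describes. The thresholds here were picked by hand for the example; how the paper chooses them is not stated in the abstract.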
Authors 任朋启 (REN Peng-qi), 王芳 (WANG Fang), 黄树成 (HUANG Shu-cheng) (School of Computer Science and Engineering, Jiangsu University of Science and Technology, Zhenjiang 212003, China)
Source 《电子设计工程》 (Electronic Design Engineering), 2017, No. 18, pp. 1-5 (5 pages)
Funding National Natural Science Foundation of China (61572498)
Keywords text classification; text mining; KNN algorithm; IKNN algorithm; sample pruning; projection pursuit theory