Research on the Framework of Text Classification Algorithms Based on Clustering
Abstract The KNN algorithm is widely used in text categorization because it is easy to understand and theoretically mature. However, KNN must traverse the sample space to compute distances, so the computational cost is very high when the training set is large or high-dimensional. To address this problem, the fitness-function design of the genetic algorithm is first combined with the idea of the K-medoids algorithm to form K-GA-medoids; K-GA-medoids is then combined with KNN to form an algorithm framework for text categorization. During classification, clustering is performed first and classification second, which reduces the number of training samples and thus the computational overhead. Experiments show that K-GA-medoids clusters noticeably better than traditional K-medoids, and that, compared with traditional KNN, the combined framework effectively improves the efficiency of text categorization without sacrificing classification precision.
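The cluster-first, classify-second idea can be illustrated with a minimal sketch. The abstract does not give the details of K-GA-medoids, so the code below substitutes plain K-medoids for it; the function names (`k_medoids`, `fitness`, `cluster_then_knn`), the Euclidean distance, and the particular fitness formula are all assumptions made for illustration, not the paper's method.

```python
import numpy as np

def pairwise_dist(a, b):
    # Euclidean distances between every row of a and every row of b.
    return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2))

def k_medoids(x, k, n_iter=20, seed=0):
    # Plain K-medoids (alternating assignment and medoid update).
    # Stand-in for K-GA-medoids, whose details are not in the abstract.
    rng = np.random.default_rng(seed)
    dist = pairwise_dist(x, x)
    medoids = rng.choice(len(x), size=k, replace=False)
    for _ in range(n_iter):
        labels = dist[:, medoids].argmin(axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # New medoid: the member with the smallest total
                # distance to the other members of its cluster.
                within = dist[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[within.argmin()]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = dist[:, medoids].argmin(axis=1)
    return medoids, labels

def fitness(x, medoids, labels):
    # Hypothetical GA-style fitness (an assumption): higher is better,
    # computed from the total point-to-medoid distance.
    cost = np.linalg.norm(x - x[medoids[labels]], axis=1).sum()
    return 1.0 / (1.0 + cost)

def cluster_then_knn(x_train, y_train, medoids, labels, query, n_neighbors=3):
    # Step 1: pick the cluster whose medoid is closest to the query.
    nearest = pairwise_dist(query[None, :], x_train[medoids]).argmin()
    idx = np.flatnonzero(labels == nearest)
    # Step 2: run KNN only over the samples in that cluster.
    d = np.linalg.norm(x_train[idx] - query, axis=1)
    top = idx[np.argsort(d)[:n_neighbors]]
    values, counts = np.unique(y_train[top], return_counts=True)
    return values[counts.argmax()]
```

In this sketch the saving comes from step 2: a query is compared against `k` medoids plus the members of one cluster, rather than against the whole training set. A fitness score such as the one above could be used to rank candidate medoid sets, which is one plausible reading of how a genetic algorithm might be coupled with K-medoids.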
Author HUANG Xifeng (黄细凤) (No.10 Research Institute of China Electronics Technology Group Corporation, Chengdu 610036)
Source Computer & Digital Engineering (《计算机与数字工程》), 2021, No. 1, pp. 21-25, 93 (6 pages)
Funding Supported by a project of the No.10 Research Institute of China Electronics Technology Group Corporation (No. 2018-557-05-01).
Keywords KNN, K-medoids, text categorization, cluster analysis, genetic algorithm
