
Performance Comparison of Algorithm for Text Classification Based on a Topic Model
Abstract This study uses the LDA model for text dimensionality reduction and feature extraction and trains traditional classification algorithms within an ensemble learning framework. The aim is to examine whether ensembling improves on the accuracy of a single classification algorithm and yields better classification results, so that the LDA model contributes more effectively to text classification accuracy. Using Web of Science as the data source and following its subject category scheme, an experimental text set covering 6 topics is built. With Weka as the experimental tool and the average F value as the evaluation metric, the study compares four traditional classification algorithms (naive Bayes, logistic regression, support vector machine (SVM), and K-nearest neighbors (KNN)) and three ensemble learning algorithms (AdaBoost, Bagging, and Random Subspace). Overall, classification accuracy after "homogeneous ensembling" is higher than that of a single classifier. Using the LDA model for dimensionality reduction and feature extraction, with naive Bayes as the base classifier and Bagging for ensemble training, gives the best classification performance and achieves the "global optimum".
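The evaluation metric named in the abstract, the average F value, can be illustrated with a minimal sketch. This is not the authors' code: the study used Weka, and the abstract does not say whether the average is unweighted (macro) or weighted by class size, so the sketch supports both. The function names `per_class_f1` and `average_f1` and the toy labels are hypothetical.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[t] += 1          # t was the true class but was missed
    scores = {}
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def average_f1(y_true, y_pred, weighted=False):
    """Average per-class F1, unweighted (macro) or weighted by true-class support."""
    scores = per_class_f1(y_true, y_pred)
    if not weighted:
        return sum(scores.values()) / len(scores)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


# Toy example with two hypothetical subject categories.
y_true = ["physics", "biology", "biology", "biology"]
y_pred = ["physics", "physics", "biology", "biology"]
macro = average_f1(y_true, y_pred)
weighted = average_f1(y_true, y_pred, weighted=True)
```

Comparing a single classifier with its Bagging, AdaBoost, or Random Subspace ensemble then amounts to computing this score for each model's predictions on the same test documents.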
Authors Wang Wanqi (王万起); Tian Zhongyu (田中雨); Dong Lanjun (董兰军) (Liaoning Technical University, Fuxin, Liaoning 123000, China)
Source Library Work in Colleges and Universities (《高校图书馆工作》), 2022, No. 2, pp. 41-46 (6 pages)
Keywords Text classification; Ensemble learning; Algorithm comparison; F value; Topic model