Research on Short Text Classification Method Based on LDA Feature Extension
Abstract: Short texts are brief, carry little information, and yield sparse features. To address this, the paper proposes a short text classification method based on feature extension with the LDA (Latent Dirichlet Allocation) topic model. The method uses the LDA model to obtain each document's topic distribution, then adds words from the corresponding topics to the features of the original short text as additional feature words, and finally classifies the texts with an SVM. Experimental results show that, compared with the traditional classification method based on the VSM (vector space model), the LDA feature extension method overcomes the feature sparsity problem and improves precision, recall, and F1 in every category, confirming the feasibility of the method for short text classification.
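The feature extension step described in the abstract can be illustrated with a minimal sketch. It assumes an LDA model has already been trained and is represented here by two hypothetical inputs: `doc_topic` (the topic distribution of one short text) and `topic_word` (one word-probability mapping per topic). The function name `expand_features`, the parameters `n_topics` and `n_words`, and the toy values are illustrative assumptions; the abstract does not say how many topics or words the authors select.

```python
from typing import Dict, List, Sequence


def expand_features(
    tokens: Sequence[str],
    doc_topic: Sequence[float],
    topic_word: Sequence[Dict[str, float]],
    n_topics: int = 1,
    n_words: int = 3,
) -> List[str]:
    """Append top words of the document's most probable topics to its tokens."""
    # Rank topic indices by their probability for this document, highest first.
    ranked_topics = sorted(
        range(len(doc_topic)), key=lambda k: doc_topic[k], reverse=True
    )
    expanded = list(tokens)
    seen = set(tokens)
    for k in ranked_topics[:n_topics]:
        # Rank the words of topic k by probability, highest first.
        ranked_words = sorted(topic_word[k], key=topic_word[k].get, reverse=True)
        added = 0
        for word in ranked_words:
            if added == n_words:
                break
            # Only add words the short text does not already contain.
            if word not in seen:
                expanded.append(word)
                seen.add(word)
                added += 1
    return expanded


# Toy values standing in for a trained LDA model (hypothetical, not from the paper).
topic_word = [
    {"match": 0.4, "team": 0.3, "goal": 0.2, "coach": 0.1},  # topic 0
    {"stock": 0.5, "market": 0.3, "bank": 0.2},              # topic 1
]
doc_topic = [0.8, 0.2]          # topic distribution of one short text
short_text = ["team", "wins"]   # tokens of the original short text

expanded = expand_features(short_text, doc_topic, topic_word, n_topics=1, n_words=2)
print(expanded)
```

In a full pipeline, the expanded token lists would then be turned into feature vectors and passed to an SVM classifier, as the abstract outlines; that part is omitted here.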
Source: Software Guide (《软件导刊》), 2018, No. 3, pp. 63-66 (4 pages)
Keywords: short text classification; Latent Dirichlet Allocation (LDA); feature expansion; SVM