Abstract
To address the brevity, low information content, and sparse features of short texts, this paper proposes a short text classification method based on feature expansion with the LDA (Latent Dirichlet Allocation) topic model. The method uses the LDA model to obtain the topic distribution of each document, then adds the words under the corresponding topics to the features of the original short text as additional feature words, and finally classifies the expanded texts with an SVM. Experimental results show that, compared with the traditional classification method based on the VSM model, the LDA feature expansion method overcomes the feature sparsity problem and improves precision, recall, and F1 in every category, demonstrating the feasibility of the method for short text classification.
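The feature expansion step can be illustrated with a minimal sketch. It assumes an LDA model has already been trained and supplies two things: a topic distribution for the short text and a ranked word list per topic. The function name `expand_features`, the parameters `top_topics` and `words_per_topic`, and the sample values are hypothetical; the abstract does not state how many topics or words the authors use, so this is only one plausible reading of the approach, not their implementation.

```python
from typing import Dict, List, Sequence


def expand_features(
    tokens: Sequence[str],
    doc_topic: Sequence[float],
    topic_words: Dict[int, List[str]],
    top_topics: int = 1,        # assumption: number of topics to draw words from
    words_per_topic: int = 3,   # assumption: number of words taken per topic
) -> List[str]:
    """Append high-probability topic words to a short text's tokens."""
    # Rank topic ids by the document's topic probability, highest first.
    ranked = sorted(range(len(doc_topic)), key=lambda k: doc_topic[k], reverse=True)
    expanded = list(tokens)
    seen = set(tokens)
    for k in ranked[:top_topics]:
        # Consider the leading words of topic k and add those not yet present.
        for word in topic_words.get(k, [])[:words_per_topic]:
            if word not in seen:
                expanded.append(word)
                seen.add(word)
    return expanded


# Hypothetical LDA output for one short text (illustrative values only).
doc_topic = [0.1, 0.7, 0.2]
topic_words = {
    0: ["match", "team", "goal"],
    1: ["stock", "market", "price", "fund"],
    2: ["film", "actor", "director"],
}
expanded = expand_features(["stock", "rises"], doc_topic, topic_words)
print(expanded)
```

The expanded token list would then be vectorized and passed to an SVM classifier, as the abstract describes; that stage is omitted here.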
Source
《软件导刊》
2018, No. 3, pp. 63-66 (4 pages)
Software Guide