Abstract
[Purpose/Significance] The LDA model plays an important role in text data mining and has become a research hotspot in the data mining field. To further improve the effectiveness of the LDA model in text mining applications, a comparative study of its topic extraction performance is needed. [Method/Process] This paper proposes a method for comparing and evaluating LDA-based topic extraction across different types of text data: topics are first mined from the text data with the LDA model, and the results are then compared using quantitative evaluation methods for topic extraction. [Results/Conclusions] Using journal articles, online public opinion event topics, microblog posts, and questionnaires as text data sources, the experimental results show that the LDA model extracts topics more effectively from long texts with clear semantic information and sound logical structure. This provides some theoretical basis for improving the mining efficiency of the LDA model.
Source
《情报科学》
CSSCI
PKU Core Journal
2018, Issue 1, pp. 102-107 (6 pages)
Information Science
Funding
Jilin Province Education Science "13th Five-Year Plan" Project (GH170061)
Beihua University Comprehensive and Design-Oriented Experiment Project