Abstract
As microblogs grow in popularity, searching microblog content has become a common way for people to obtain breaking news. Text clustering and topic discovery are effective methods in information retrieval, and choosing an appropriate method is a key factor in the retrieval quality of short microblog texts. Exploiting the complementary characteristics of text clustering and the LDA topic model, and taking into account both the distinctive style of microblog posts and the efficiency of short-text clustering, this paper proposes a microblog retrieval method that combines text clustering based on frequent word sets with LDA topic mining performed on the resulting clusters, and presents a new topic retrieval model tailored to microblog text. Experiments show that the method not only partitions microblog texts effectively but also clearly uncovers the latent topics within each cluster.
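The abstract does not give the details of the clustering step, so the following is only a minimal sketch of one way clustering on frequent word sets could work, not the authors' algorithm. It assumes posts are already tokenized, treats a word set as "frequent" when it appears in at least `min_support` posts, and assigns each post to the best-supported frequent set it contains. The function names, the `min_support` threshold, and the fixed set size are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def frequent_word_sets(docs, min_support=2, size=2):
    """Count word sets of the given size and keep those found in >= min_support docs."""
    counts = Counter()
    for tokens in docs:
        # Use each document's distinct words so a set is counted once per document.
        for combo in combinations(sorted(set(tokens)), size):
            counts[frozenset(combo)] += 1
    return {ws: c for ws, c in counts.items() if c >= min_support}


def cluster_by_frequent_sets(docs, min_support=2, size=2):
    """Assign each document index to the best-supported frequent word set it contains.

    Documents containing no frequent word set are grouped under the key None.
    """
    freq = frequent_word_sets(docs, min_support, size)
    clusters = defaultdict(list)
    for idx, tokens in enumerate(docs):
        words = set(tokens)
        covered = [ws for ws in freq if ws <= words]
        if not covered:
            clusters[None].append(idx)
            continue
        # Highest support wins; ties are broken by the sorted words for determinism.
        best = max(covered, key=lambda ws: (freq[ws], sorted(ws)))
        clusters[best].append(idx)
    return dict(clusters)


# Toy, made-up tokenized posts (illustrative data only).
docs = [
    ["earthquake", "rescue", "team"],
    ["earthquake", "rescue", "donation"],
    ["phone", "launch", "price"],
    ["phone", "launch", "camera"],
    ["weather", "sunny"],
]

for word_set, members in cluster_by_frequent_sets(docs).items():
    label = sorted(word_set) if word_set is not None else "unclustered"
    print(label, members)
```

In a pipeline of the kind the abstract describes, an LDA model would then be fitted separately on the posts of each cluster to extract that cluster's latent topics; that step is omitted here because it would normally rely on a topic-modelling library.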
Source
《情报理论与实践》
CSSCI
Peking University Core Journals (北大核心)
2013, No. 8, pp. 85-90 (6 pages)
Information Studies: Theory & Application
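Funding
An outcome of the National Natural Science Foundation of China project "Research on Integrated Retrieval and Semantic Analysis Methods for Social Media"
Project No. 71273194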
Keywords
text clustering
topic retrieval
microblog