Abstract
In recent years, microblogs have grown into an important platform for public opinion, with far-reaching effects on national security and social development, which makes extracting topics from microblog text particularly important. Probabilistic topic models are currently the mainstream technique for text topic mining. This paper first introduces the LDA (Latent Dirichlet Allocation) model. It then analyzes the characteristics of microblog data and surveys research on applying probabilistic topic models to microblog topic mining from three aspects: noisy vocabulary, the short length of microblog texts, and the temporal nature of microblogs. It further surveys research that uses topic models to discover topic-based community relationships. Finally, it summarizes the challenges that remain for topic models in mining microblog topics.
Source
Journal of Information Engineering University (《信息工程大学学报》)
2017, Issue 1, pp. 103-110 (8 pages)
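Funding
National Natural Science Foundation of China (61309007)
National High Technology Research and Development Program of China (863 Program) (2012AA012902)
National Key Technology R&D Program of China (2012BAH47B01)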
Keywords
microblog
probability topic model
topic
topic mining
community discovery