Abstract
Domain microblogs contain a large amount of domain-specific information and evolve markedly over time. To analyze how the topics of a domain evolve, a DM-HDP model based on the Hierarchical Dirichlet Process (HDP) is constructed. First, domain-related microblogs are extracted with the individual user as the unit. Then, using the domain features and temporal features of the microblogs, domain-related microblogs with distinct temporal features are extracted and their topic distributions are mined automatically. Finally, a domain topic evolution analysis process is built. Experimental results show that the analysis method based on the DM-HDP model captures the evolution of topics in domain microblogs and, compared with methods based on the LDA and HDP models, has clear advantages in content perplexity and model complexity.
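The abstract compares models by perplexity (困惑度) without stating how it is computed. As a minimal sketch, assuming the standard held-out definition (the exponential of the negative average per-token log-likelihood), the quantity could be computed as below. The function name and its inputs are hypothetical and are not taken from the paper; lower values indicate a better fit.

```python
import math

def perplexity(doc_log_likelihoods, doc_lengths):
    """Perplexity = exp(-sum_d log p(w_d) / sum_d N_d).

    doc_log_likelihoods: natural-log likelihood of each held-out document
    doc_lengths: number of tokens in each of those documents
    (Both inputs are assumed to come from some already-trained topic model.)
    """
    total_tokens = sum(doc_lengths)
    if total_tokens == 0:
        raise ValueError("perplexity is undefined for an empty corpus")
    return math.exp(-sum(doc_log_likelihoods) / total_tokens)

# Sanity check: a model that assigns every token probability 1/4
# has a perplexity equal to the vocabulary size, 4.
uniform_ll = 3 * math.log(0.25)  # one document with 3 tokens
print(perplexity([uniform_ll], [3]))
```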
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
PKU Core (北大核心)
2018, Issue 2, pp. 1-8 (8 pages)
Funding
National Natural Science Foundation of China (61163025)
Natural Science Foundation of Inner Mongolia Autonomous Region (2015MS0621)