Abstract
To address the challenge that the feature sparsity of short-text message streams poses for hot topic detection, this paper proposes a microblog hot topic detection method that combines term mutual information with a probabilistic topic model. A word co-occurrence matrix weighted by term mutual information is constructed, and Symmetric Nonnegative Matrix Factorization (SNMF) is applied to it to obtain the term-topic matrix. The Probabilistic Latent Semantic Analysis (pLSA) model is then used to model the topic-microblog relationship and discover topics. Finally, a topic hotness measure is defined and used to rank topics, supporting the detection of hot topics in microblogs. Experiments show that the method clusters topics effectively and detects hot topics.
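The pipeline in the abstract (mutual-information-weighted co-occurrence matrix, then SNMF, then pLSA) can be sketched in Python with NumPy as below. This is a minimal illustration under stated assumptions, not the paper's implementation: positive PMI stands in for the "term mutual information" weight, each microblog is treated as one co-occurrence window, SNMF uses one common damped multiplicative update, and pLSA is run as a fold-in that keeps the SNMF-derived P(w|z) fixed and estimates only P(z|d). All function names are hypothetical, and the hotness ranking is omitted because the abstract does not define it.

```python
import numpy as np


def build_vocab(docs):
    """Sorted vocabulary of all tokens in the corpus."""
    return sorted({w for doc in docs for w in doc})


def ppmi_cooccurrence(docs, vocab):
    """Symmetric word-word matrix weighted by positive PMI.

    Assumptions (not from the paper): each microblog is one window, a word
    pair is counted once per microblog, and negative PMI is clipped to 0 so
    the matrix stays non-negative for SNMF.
    """
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for doc in docs:
        ids = sorted({index[w] for w in doc if w in index})
        for a in ids:
            for b in ids:
                if a != b:
                    counts[a, b] += 1.0
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0  # unseen pairs and empty rows become 0
    return np.maximum(pmi, 0.0)


def snmf(A, k, iters=500, seed=0, eps=1e-10):
    """Symmetric NMF, A ~= H @ H.T with H >= 0, using the damped update
    H <- H * (1/2 + 1/2 * (A H) / (H H^T H)); the solver is an assumption."""
    rng = np.random.default_rng(seed)
    H = rng.random((A.shape[0], k))
    for _ in range(iters):
        H *= 0.5 + 0.5 * (A @ H) / (H @ (H.T @ H) + eps)
    return H


def doc_term_counts(docs, vocab):
    """(D, V) matrix of term counts n(d, w) per microblog."""
    index = {w: i for i, w in enumerate(vocab)}
    N = np.zeros((len(docs), len(vocab)))
    for d, doc in enumerate(docs):
        for w in doc:
            if w in index:
                N[d, index[w]] += 1.0
    return N


def plsa_fold_in(N, phi, iters=100, eps=1e-12):
    """pLSA EM for P(z|d) with the term-topic matrix P(w|z) held fixed.

    N: (D, V) counts; phi: (V, K) with columns summing to 1.
    """
    D, K = N.shape[0], phi.shape[1]
    theta = np.full((D, K), 1.0 / K)  # P(z|d), uniform start
    for _ in range(iters):
        # E- and M-step combined: sum_w n(d,w) P(z|d) P(w|z) / P(w|d)
        ratio = N / (theta @ phi.T + eps)
        theta = theta * (ratio @ phi)
        theta /= np.maximum(theta.sum(axis=1, keepdims=True), eps)
    return theta


docs = [["earthquake", "rescue", "death"], ["earthquake", "rescue"],
        ["rescue", "death"], ["match", "goal", "team"],
        ["goal", "team"], ["match", "team"]]
vocab = build_vocab(docs)
A = ppmi_cooccurrence(docs, vocab)
H = snmf(A, k=2)
phi = H / H.sum(axis=0, keepdims=True)  # term-topic matrix P(w|z)
theta = plsa_fold_in(doc_term_counts(docs, vocab), phi)
print(theta.argmax(axis=1))  # dominant topic per microblog
```

Clipping PMI at zero keeps the input valid for a non-negative factorization; whether the paper clips, smooths, or uses a different mutual-information variant is not stated in the abstract.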
Source
Computer Engineering and Applications (《计算机工程与应用》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2016, No. 6, pp. 61-66 (6 pages)
Funding
National Natural Science Foundation of China (No. 61163039, No. 61363058)
Project of the Education Department of Gansu Province (No. 2013A-016)
Keywords
term co-occurrence matrix
symmetric nonnegative matrix factorization
probabilistic latent semantic analysis
microblog hot topic detection