Abstract
Topic-sentiment unification models can effectively extract both the topic information and the sentiment orientation of a corpus. Existing topic/sentiment analysis methods produce topics with low discriminability. To address this, the paper proposes a weighted latent Dirichlet allocation algorithm (WLDA) that performs topic extraction and sentiment analysis without supervision. WLDA computes the distance between each term in the corpus and a set of sentiment seed words, and uses these distances to assign different weights to different terms during Gibbs sampling. It then uses the weights of the key words under each topic to estimate that topic's sentiment orientation, and from these obtains the sentiment distribution of each document. This increases the influence of sentiment-bearing words in the sampling process and therefore yields more discriminative topics. Experiments show that, compared with the joint sentiment/topic model (JST), WLDA not only iterates faster during sampling but also performs better at topic extraction and sentiment classification.
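The abstract describes the pipeline (term weights from the distance to sentiment seed words, weighted Gibbs sampling, topic polarity from key words, document sentiment distribution) but not its formulas. The Python sketch below illustrates one plausible reading under explicit assumptions. The toy 2-D embedding vectors, the use of cosine similarity as the "distance", the weight 1 + boost·max(0, similarity), and the functions `term_weights`, `weighted_gibbs_lda`, `topic_sentiment` and `doc_sentiment` are all hypothetical and are not the authors' definitions.

```python
import math
import random

# Sketch of term-weighted LDA in the spirit of WLDA. ASSUMPTIONS, not taken from
# the paper: every word has an embedding vector, "distance" to the seed words is
# measured by cosine similarity, and a word's weight is 1 + boost * max(0, sim).


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def term_weights(vocab, vectors, seeds, boost=1.0):
    """Hypothetical weighting: words closer to any sentiment seed weigh more."""
    return {w: 1.0 + boost * max([0.0] + [cosine(vectors[w], vectors[s]) for s in seeds])
            for w in vocab}


def weighted_gibbs_lda(docs, weights, K, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Collapsed Gibbs sampling for LDA in which each token adds weights[w] to the
    count tables instead of 1, so heavily weighted words pull harder on topics."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    ndk = [[0.0] * K for _ in docs]                      # weighted doc-topic counts
    nkw = [dict.fromkeys(vocab, 0.0) for _ in range(K)]  # weighted topic-word counts
    nk = [0.0] * K                                       # weighted topic totals
    z = [[rng.randrange(K) for _ in doc] for doc in docs]  # random initial topics

    def update(d, k, w, sign):
        c = sign * weights[w]
        ndk[d][k] += c
        nkw[k][w] += c
        nk[k] += c

    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            update(d, z[d][i], w, +1)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, z[d][i], w, -1)               # remove the current token
                p = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                     for t in range(K)]
                z[d][i] = rng.choices(range(K), weights=p)[0]  # resample its topic
                update(d, z[d][i], w, +1)
    theta = [[(row[t] + alpha) / (sum(row) + K * alpha) for t in range(K)] for row in ndk]
    phi = [{w: (nkw[t][w] + beta) / (nk[t] + V * beta) for w in vocab} for t in range(K)]
    return theta, phi


def topic_sentiment(phi, vectors, pos_seeds, neg_seeds, top_n=3):
    """Label each topic by comparing its top keywords' similarity to each seed set."""
    labels = []
    for topic in phi:
        top = sorted(topic, key=topic.get, reverse=True)[:top_n]
        pos = sum(max(cosine(vectors[w], vectors[s]) for s in pos_seeds) for w in top)
        neg = sum(max(cosine(vectors[w], vectors[s]) for s in neg_seeds) for w in top)
        labels.append("pos" if pos >= neg else "neg")
    return labels


def doc_sentiment(theta, labels):
    """Document sentiment distribution: sum theta over topics sharing a label."""
    dists = []
    for row in theta:
        dist = {"pos": 0.0, "neg": 0.0}
        for p, label in zip(row, labels):
            dist[label] += p
        dists.append(dist)
    return dists


# Toy data (hypothetical 2-D "embeddings"): x-axis ~ positive, y-axis ~ negative.
vectors = {"good": (1.0, 0.0), "great": (0.9, 0.1), "bad": (0.0, 1.0),
           "awful": (0.1, 0.9), "screen": (-1.0, -1.0), "battery": (-1.0, -0.5)}
docs = [["good", "great", "screen", "good"], ["bad", "awful", "battery", "bad"],
        ["good", "great", "battery"], ["bad", "awful", "screen"]]
pos_seeds, neg_seeds = ["good"], ["bad"]

weights = term_weights(sorted(vectors), vectors, pos_seeds + neg_seeds)
theta, phi = weighted_gibbs_lda(docs, weights, K=2)
labels = topic_sentiment(phi, vectors, pos_seeds, neg_seeds)
doc_dists = doc_sentiment(theta, labels)
```

With these toy vectors, `weights` gives 2.0 to the seed words "good" and "bad" and 1.0 to the neutral words "screen" and "battery"; the topic labels themselves depend on the random sampling. Scaling counts by a weight, rather than duplicating tokens, keeps the usual collapsed-Gibbs conditional unchanged in form while letting sentiment-bearing words dominate the count tables.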
Source
CAAI Transactions on Intelligent Systems (智能系统学报)
Indexed in CSCD and the Peking University core journal list (北大核心)
2016, No. 4, pp. 539-545 (7 pages)
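Funding
Shanxi Province Research Project for Returned Overseas Scholars (2015-045, 2013-033)
Shanxi Province Merit-Based Funding Program for Science and Technology Activities of Returned Overseas Scholars (2013)
Natural Science Foundation of Shanxi Province (2014011018-2)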
Keywords
sentiment classification
topic and sentiment unification model
topic model
LDA
weighting algorithm