Abstract
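Unsupervised sentiment orientation analysis methods based on traditional topic models do not handle the feature sparsity of microblog corpora well. To address this, a new unsupervised method for microblog sentiment orientation analysis is proposed. The corpus is preprocessed and the co-occurring word pairs in it are counted; the BTM model is used to mine the latent topics in the documents; an existing sentiment lexicon is used to analyze the sentiment distribution of the latent topics; and from this the sentiment orientation of the whole microblog is determined. Tests on the NLP&CC2012 corpus show that the method effectively identifies the sentiment orientation of microblogs, with an average F1 value 15% higher than that of traditional topic model methods.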
Sentiment orientation analysis of microblogs has become a research hotspot. Unsupervised methods based on traditional topic models fail to resolve the feature sparsity of microblog corpora, which results in poor performance in sentiment orientation analysis of microblogs. To solve this problem, this paper presents an unsupervised method for sentiment orientation analysis of microblogs based on the Biterm Topic Model (BTM). The corpus is preprocessed and the co-occurring word pairs are counted. BTM is used to mine the implicit topics in the documents. A sentiment dictionary is used to calculate the sentiment distributions of the topics, and the sentiment orientation of the whole microblog is obtained from these distributions. Experimental results on the NLP&CC2012 corpus show that the proposed method identifies microblog sentiment orientation more effectively, and that the average F1-measure is 15% higher than that of traditional topic model methods.
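The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract gives no formulas, so the sketch assumes that an already trained BTM supplies a topic prior `theta` and topic-word distributions `phi`, that a document's topic posterior follows the standard BTM inference P(z|d) = sum over biterms b of P(z|b) P(b|d), and that a topic's sentiment is the probability-weighted sum of lexicon polarities. All function names, the toy parameters, and the scoring rule are hypothetical.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair that co-occurs in one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def topic_sentiment(phi_z, lexicon):
    """Assumed scoring rule: probability-weighted sum of lexicon polarities."""
    return sum(p * lexicon.get(word, 0) for word, p in phi_z.items())


def doc_topic_posterior(tokens, theta, phi):
    """Standard BTM inference: P(z|d) = sum_b P(z|b) * P(b|d)."""
    counts = Counter(extract_biterms(tokens))
    total = sum(counts.values())
    posterior = [0.0] * len(theta)
    for (wi, wj), n in counts.items():
        # P(z|b) is proportional to theta_z * phi_z[wi] * phi_z[wj].
        joint = [theta[z] * phi[z].get(wi, 0.0) * phi[z].get(wj, 0.0)
                 for z in range(len(theta))]
        norm = sum(joint)
        if norm == 0:
            continue  # biterm not explained by any topic
        for z in range(len(theta)):
            posterior[z] += (joint[z] / norm) * (n / total)
    return posterior


def doc_sentiment(tokens, theta, phi, lexicon):
    """Combine topic posteriors with topic sentiment scores (assumed rule)."""
    posterior = doc_topic_posterior(tokens, theta, phi)
    score = sum(p * topic_sentiment(phi[z], lexicon)
                for z, p in enumerate(posterior))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Toy parameters standing in for a trained BTM and a sentiment lexicon.
theta = [0.5, 0.5]
phi = [
    {"great": 0.5, "phone": 0.3, "screen": 0.2},
    {"bad": 0.5, "battery": 0.3, "phone": 0.2},
]
lexicon = {"great": 1, "bad": -1}

print(doc_sentiment(["great", "phone", "screen"], theta, phi, lexicon))
```

Because biterms are counted over word pairs rather than single words, even a three-word post contributes three co-occurrence observations, which is the property BTM relies on to cope with sparse short texts.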
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
PKU Core
2015, No. 7, pp. 219-223, 229 (6 pages)
Funding
National "863" Program of China (2011AA7032030D)
National Ministry and Commission Fund Project
Keywords
microblog
short text
sentiment orientation analysis
unsupervised
Biterm Topic Model (BTM)