Abstract
Joint sentiment-topic models have been applied successfully to the analysis of online reviews. However, smart mobile devices are now widely used, and their screen and input constraints lead users to write ever shorter reviews. As a result, text sparsity in short reviews can no longer be ignored. In this paper, we propose SSTM (short-text sentiment-topic model), a joint sentiment-topic model for short text that addresses the sparsity problem. Conventional topic models describe the generative process of each document. SSTM instead directly models the generation of the whole review collection. In this generative process, a word pair is sampled at each step, and the two words in a pair share the same sentiment polarity and topic. We apply SSTM to two real-world online review datasets in three experimental tasks. Qualitative analysis demonstrates the effectiveness of SSTM for topic discovery. Quantitative comparison with classical methods shows that SSTM also considerably improves document-level sentiment classification.
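The corpus-level, word-pair generative process can be illustrated with a minimal Python sketch. It rests on several assumptions that do not come from the paper.

- Word pairs are taken to be all unordered pairs of tokens within one short text, as in biterm-style topic models.
- Each pair draws a sentiment label l from a corpus-level distribution `pi`, then a topic z from a per-sentiment distribution `theta[l]`, and then both words from `phi[l][z]`.
- Document-level sentiment is estimated by averaging P(l | pair) over the document's pairs.

All function names, parameter names and hyperparameter values here are hypothetical. The sketch is not the authors' implementation or inference procedure, and it omits parameter estimation (e.g. Gibbs sampling) entirely.

```python
import itertools
import random

# Hypothetical sketch of a word-pair (biterm-style) sentiment-topic generative process.
# The layout of pi/theta/phi and the hyperparameters are assumptions, not SSTM's exact model.

def extract_word_pairs(tokens):
    """Every unordered pair of token positions in one short text (assumed pairing rule)."""
    return [tuple(sorted(p)) for p in itertools.combinations(tokens, 2)]

def dirichlet(concentration, k, rng):
    """Symmetric Dirichlet draw via normalized Gamma variates."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def categorical(probs, rng):
    """Draw one index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def sample_parameters(vocab_size, num_sentiments, num_topics,
                      gamma=1.0, alpha=0.5, beta=0.1, rng=None):
    if rng is None:
        rng = random.Random(0)
    pi = dirichlet(gamma, num_sentiments, rng)                 # corpus-level sentiment mix
    theta = [dirichlet(alpha, num_topics, rng)                 # topic mix per sentiment
             for _ in range(num_sentiments)]
    phi = [[dirichlet(beta, vocab_size, rng)                   # word dist per (sentiment, topic)
            for _ in range(num_topics)] for _ in range(num_sentiments)]
    return pi, theta, phi

def generate_pairs(num_pairs, vocab, pi, theta, phi, rng=None):
    """Generate the whole corpus as a bag of word pairs; both words share (l, z)."""
    if rng is None:
        rng = random.Random(1)
    pairs = []
    for _ in range(num_pairs):
        l = categorical(pi, rng)          # one sentiment label per pair
        z = categorical(theta[l], rng)    # one topic per pair, conditioned on l
        w1 = vocab[categorical(phi[l][z], rng)]
        w2 = vocab[categorical(phi[l][z], rng)]
        pairs.append((l, z, w1, w2))
    return pairs

def doc_sentiment(tokens, vocab_index, pi, theta, phi):
    """Estimate P(l | d) as the average of P(l | pair) over the document's pairs."""
    pairs = extract_word_pairs(tokens)
    scores = [0.0] * len(pi)
    for w1, w2 in pairs:
        i, j = vocab_index[w1], vocab_index[w2]
        # Unnormalized P(l | pair): sum over topics of pi_l * theta_lz * phi_lz(w1) * phi_lz(w2)
        post = [pi[l] * sum(theta[l][z] * phi[l][z][i] * phi[l][z][j]
                            for z in range(len(theta[l])))
                for l in range(len(pi))]
        total = sum(post)
        for l in range(len(pi)):
            scores[l] += post[l] / total / len(pairs)
    return scores  # all zeros if the text has fewer than two tokens

if __name__ == "__main__":
    vocab = ["good", "great", "bad", "awful"]
    index = {w: k for k, w in enumerate(vocab)}
    pi, theta, phi = sample_parameters(len(vocab), num_sentiments=2, num_topics=3)
    print(generate_pairs(5, vocab, pi, theta, phi))
    print(doc_sentiment(["good", "great", "bad"], index, pi, theta, phi))
```

The key design point the sketch tries to capture is that sentiment and topic are attached to each word pair rather than to each document. Word co-occurrence statistics are therefore pooled across the whole collection, which is the abstract's stated remedy for sparsity in very short reviews.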
Source
Acta Automatica Sinica (《自动化学报》), 2016, No. 8, pp. 1227-1237 (11 pages)
Indexed in: EI, CSCD, Peking University Core Journals
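Funding
Supported by the National Natural Science Foundation of China (61373108, 61173062, 61133012) and the Major Bidding Program of the National Social Science Foundation of China (11&ZD189)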
Keywords
Sentiment classification
Sentiment-topic model
Topic model
Short-text topic model
Text sparsity