摘要
微博行文具有较大的自由性,其中情感对象识别是一个困难的问题,尤其是情感对象未显性出现情况下的情感对象识别,暂未发现有效解决方法。该文针对这一难题,结合中文微博的特点,提出了一种改进的条件随机场的模型。该模型把情感对象识别看作一个序列标记问题,通过在传统的CRF序列标记模型上增加情感对象的全局节点,有效地结合上下文信息、句法依赖以及情感词典,从而可以识别出微博中的情感对象。该方法的优势在于能够应用于情感对象未显性出现的情况。实验结果表明该方法比现有方法能更有效地识别出微博中的情感对象。
Owing to informal words and expressions widely used in micro-blogs, target recognition for the sentiment analysis of microblogs is difficult, especially when the targets are not clearly mentioned. An improved conditional random fields model is proposed to deal with this issue, treating sentiment target extraction as a sequence-labeling problem. Through adding global nodes, the contextual information, syntactic rules and opinion lexicon are consid- ered in the targets extraction. The major contribution of this method is that it can be applied to the texts in which the targets are mentioned in the sequence. Experimental results on the Sina microblog data demonstrate that this method outperforms the state-of-art methods.
出处
《中文信息学报》
CSCD
北大核心
2015年第4期50-58,66,共10页
Journal of Chinese Information Processing
基金
国家自然科学基金(61100148
61202269)
广东省自然科学基金(S2011040004804)
广东省科技计划项目(2010B050400011)