Abstract
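As a basic resource for sentiment analysis, the sentiment lexicon is an important basis for opinion discovery and sentiment polarity judgment. With the large-scale emergence of new Internet words, extracting new sentiment words has become a pressing problem. To address it, this paper proposes a method for extracting new sentiment words based on boundary features. The method uses the skip-gram model to mine the boundary features of sentiment words and build a boundary feature set, uses this set to extract a candidate set of new sentiment words, filters the candidate set with bigram collocations, sequential patterns, and similar methods, and scores each candidate string according to its frequency and the distribution in the corpus of the boundary features that co-occur with it. Experimental results on a microblog corpus show that the precision of new sentiment word identification is positively correlated with the candidate score, reaching 83.33% when the candidate score is 11. The experiments show that the boundary-feature-based method can effectively identify new sentiment words in large-scale corpora.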
A sentiment dictionary is one of the basic language resources. It is an important basis for opinion mining and sentiment orientation identification. With new words emerging in large numbers, the extraction of new sentiment words is a problem that urgently needs to be solved. To address this problem, this paper presents a method for extracting new sentiment words based on boundary features. It uses the skip-gram model and existing sentiment words to extract the boundary features of sentiment words and construct a boundary feature set. It then extracts candidate new sentiment words using the boundary features. After filtering with bigram and sequential pattern methods, the candidate words are scored. Experimental results on microblog data show that precision is positively related to the candidate score; the precision is 83.33% when the candidate score is 11. The experiments show that this method is able to extract new sentiment words effectively from large-scale data.
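The extraction and scoring steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the scoring formula, so the score used here (the number of distinct boundary-feature pairs a candidate occurs with) is an assumption, as are the function names `extract_candidates` and `score_candidates`, the parameters `max_len` and `min_freq`, and the toy data. The boundary feature sets are supplied by hand rather than mined with skip-gram, and the bigram and sequential-pattern filters are reduced to a simple lexicon and frequency check.

```python
from collections import defaultdict


def extract_candidates(sentences, left_feats, right_feats, max_len=2):
    """Collect token spans enclosed by a left and a right boundary feature.

    Returns a dict mapping each candidate string to the list of
    (left_feature, right_feature) contexts it was found in.
    """
    contexts = defaultdict(list)
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok not in left_feats:
                continue
            for n in range(1, max_len + 1):
                j = i + 1 + n  # index of the token that would close the span
                if j < len(tokens) and tokens[j] in right_feats:
                    candidate = "".join(tokens[i + 1:j])
                    contexts[candidate].append((tok, tokens[j]))
    return contexts


def score_candidates(contexts, known_words, min_freq=1):
    """Assumed scoring: count distinct boundary-feature pairs per candidate.

    Candidates already in the lexicon or seen fewer than min_freq times
    are dropped (a stand-in for the filtering step).
    """
    scores = {}
    for candidate, ctxs in contexts.items():
        if candidate in known_words or len(ctxs) < min_freq:
            continue
        scores[candidate] = len(set(ctxs))
    return scores


# Toy pre-segmented microblog-style sentences (hypothetical data).
sentences = [
    ["这", "电影", "太", "给力", "了"],
    ["今天", "真", "给力", "啊"],
    ["这", "也", "太", "坑", "爹", "了"],
    ["真", "开心", "啊"],
]
left_feats = {"太", "真"}    # hand-picked left boundary features
right_feats = {"了", "啊"}   # hand-picked right boundary features
known_words = {"开心"}       # already in the sentiment lexicon

contexts = extract_candidates(sentences, left_feats, right_feats)
scores = score_candidates(contexts, known_words)
print(scores)
```

Joining adjacent tokens lets the sketch recover a candidate that a segmenter has split (here "坑" + "爹"), which is one reason boundary context, rather than the segmenter's output alone, is useful for new words.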
Source
《重庆邮电大学学报(自然科学版)》
CSCD
PKU Core (Peking University Core Journals)
2014, Issue 6, pp. 796-802 (7 pages)
Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition)