Abstract
Traditional methods for constructing sentiment lexicons rely on semantic knowledge bases, have limited coverage, and adapt poorly to new domains. To address these problems, this paper proposes a corpus-based method that constructs a sentiment lexicon using label propagation. The method first manually selects sentiment seed words, then trains Word2Vec word embeddings on the corpus and takes words that are highly similar to the seed words as candidate sentiment words. It also identifies words in the corpus that stand in a conjunctive relation with the seed words and adds them as candidate sentiment words. A semantic association graph is then built from the similarities between seed words and candidate sentiment words, and a label propagation algorithm is used to determine the polarity of each candidate, yielding the sentiment lexicon. Experimental results show that, compared with the baseline method, the proposed method achieves higher accuracy and better robustness.
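The pipeline summarized in the abstract (similarity graph over seed and candidate words, then label propagation from the seeds) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two-dimensional toy vectors stand in for trained Word2Vec embeddings, the English example words, the similarity threshold of 0.9, the clamped-seed update rule, and the helper names `cosine`, `build_graph`, and `propagate` are all hypothetical. Edges derived from conjunctive relations are omitted for brevity.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_graph(vectors, threshold=0.5):
    """Connect two words with a weighted edge when their similarity
    reaches the threshold (the threshold value is an assumption)."""
    words = list(vectors)
    graph = {w: {} for w in words}
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            sim = cosine(vectors[a], vectors[b])
            if sim >= threshold:
                graph[a][b] = sim
                graph[b][a] = sim
    return graph

def propagate(graph, seeds, iterations=50):
    """Simple label propagation: seed scores stay clamped at +1 / -1,
    every other word takes the weighted mean of its neighbours' scores."""
    scores = {w: float(seeds.get(w, 0.0)) for w in graph}
    for _ in range(iterations):
        updated = {}
        for word, neighbours in graph.items():
            if word in seeds:
                updated[word] = float(seeds[word])
            elif neighbours:
                total = sum(neighbours.values())
                updated[word] = sum(
                    weight * scores[n] for n, weight in neighbours.items()
                ) / total
            else:
                updated[word] = 0.0  # isolated word: no evidence either way
        scores = updated
    return scores

# Toy vectors standing in for Word2Vec embeddings (hypothetical values).
toy_vectors = {
    "good": [1.0, 0.1],
    "excellent": [0.9, 0.2],
    "pleasant": [0.8, 0.3],
    "bad": [0.1, 1.0],
    "terrible": [0.2, 0.9],
    "awful": [0.3, 0.8],
}
seeds = {"good": 1, "bad": -1}  # manually chosen seed words

graph = build_graph(toy_vectors, threshold=0.9)
scores = propagate(graph, seeds)
lexicon = {
    word: "positive" if s > 0 else "negative" if s < 0 else "unknown"
    for word, s in scores.items()
}
print(lexicon)
```

In this toy setting the candidates inherit the polarity of the seed they are connected to. With real embeddings the graph would be denser and cross-polarity edges would appear, so the threshold and the number of iterations would need tuning on the target corpus.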
Authors
张璞
王俊霞
王英豪
ZHANG Pu; WANG Junxia; WANG Yinghao (College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China)
Source
《计算机工程》
CAS
CSCD
Peking University Core Journals (北大核心)
2018, No. 5, pp. 168-173 (6 pages)
Computer Engineering
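Funding
Youth Fund for Humanities and Social Sciences Research of the Ministry of Education (17YJCZH247)
Humanities and Social Sciences Research Project of the Chongqing Municipal Education Commission, "Research on Product Review Mining and Its Applications in the Social Media Context" (17SKG055)
Science and Technology Project of the Chongqing Municipal Education Commission (KJ1600440)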
Keywords
sentiment analysis
sentiment lexicon construction
word vector
conjunctive relation
label propagation