Abstract
Existing resources for Chinese sentiment analysis consist mainly of sentiment dictionaries; sentiment knowledge resources that target entities or their attributes are lacking. This paper studies how to acquire entity sentiment knowledge automatically from large-scale text corpora. In the proposed method, entity sentiment knowledge is represented as sentiment expression combinations. First, a ranking algorithm based on a bipartite graph ranks the candidate set of sentiment expressions. Then, a refining algorithm based on semantic similarity selects expressions from the low-ranked candidates; this selection takes into account constraints among entities and among sentiment words. Finally, experiments are conducted on three large-scale corpora from different domains, followed by manual evaluation. The results show that the accuracy of the entity sentiment expressions extracted from each of the three domain datasets exceeds 90%. The resulting large-scale sentiment knowledge dictionary contains about 300,000 sentiment expression pairs.
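The two steps named in the abstract (bipartite-graph ranking, then similarity-based refinement of low-ranked candidates) can be illustrated with a minimal Python sketch. The abstract does not give the actual scoring formulas, so everything here is an assumption: each expression is treated as an (entity, sentiment word) pair, the ranking uses a generic HITS-style mutual reinforcement, and the refinement keeps a low-ranked pair only when both its entity and its sentiment word are similar to those of some trusted pair. The names `rank_pairs`, `refine`, the threshold, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from math import sqrt


def _normalize(scores):
    """Scale a score dict to unit L2 norm so values stay bounded."""
    norm = sqrt(sum(v * v for v in scores.values())) or 1.0
    return {k: v / norm for k, v in scores.items()}


def rank_pairs(pair_counts, iterations=20):
    """Rank (entity, word) pairs, best first (assumed HITS-style scheme)."""
    entity_score = {e: 1.0 for e, _ in pair_counts}
    word_score = {w: 1.0 for _, w in pair_counts}
    for _ in range(iterations):
        # A sentiment word gains score from the entities it co-occurs with.
        new_word = defaultdict(float)
        for (e, w), count in pair_counts.items():
            new_word[w] += count * entity_score[e]
        word_score = _normalize(new_word)
        # An entity gains score from the sentiment words it co-occurs with.
        new_entity = defaultdict(float)
        for (e, w), count in pair_counts.items():
            new_entity[e] += count * word_score[w]
        entity_score = _normalize(new_entity)
    pair_score = {
        (e, w): count * entity_score[e] * word_score[w]
        for (e, w), count in pair_counts.items()
    }
    return sorted(pair_score, key=pair_score.get, reverse=True)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def refine(low_pairs, trusted_pairs, vectors, threshold=0.8):
    """Keep a low-ranked pair if BOTH its entity and its sentiment word are
    similar to those of some trusted pair (assumed form of the constraints)."""
    kept = []
    for e, w in low_pairs:
        for te, tw in trusted_pairs:
            if (cosine(vectors[e], vectors[te]) >= threshold
                    and cosine(vectors[w], vectors[tw]) >= threshold):
                kept.append((e, w))
                break
    return kept


# Hypothetical toy data: co-occurrence counts and 2-d "embeddings".
toy_counts = {
    ("screen", "clear"): 5, ("screen", "large"): 3,
    ("battery", "durable"): 4, ("battery", "clear"): 1,
    ("price", "durable"): 1,
}
ranked = rank_pairs(toy_counts)

toy_vectors = {
    "screen": [1.0, 0.0], "display": [0.9, 0.1], "price": [1.0, 1.0],
    "clear": [0.0, 1.0], "sharp": [0.1, 0.9],
}
kept = refine(
    low_pairs=[("display", "sharp"), ("price", "sharp")],
    trusted_pairs=[("screen", "clear")],
    vectors=toy_vectors,
)
```

Requiring similarity on both sides of the pair is one plausible reading of "constraints among entities and among sentiment words"; a real system would likely use learned word embeddings and a tuned threshold rather than the toy vectors above.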
Authors
卢奇
陈文亮
LU Qi; CHEN Wenliang (School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China; Collaborative Innovation Center of Novel Software Technology and Industrialization, Suzhou, Jiangsu 215006, China)
Source
《中文信息学报》
CSCD
Peking University Core Journal List (北大核心)
2018, Issue 8, pp. 32-41 (10 pages)
Journal of Chinese Information Processing
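Funding
National Natural Science Foundation of China (61572338)
Major Program of Natural Science Research of Jiangsu Higher Education Institutions (16KJA520001)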
Keywords
sentiment analysis
sentiment dictionary
sentiment mining
information extraction