Abstract
Target-level sentiment classification aims to determine the sentiment polarity of a specific evaluation target in a sentence. A sentence often contains multiple targets, and their sentiments may be consistent or inconsistent. However, in existing evaluation datasets for target-level sentiment classification: (1) most sentences contain only one target; (2) among the few sentences with multiple targets, the distribution of sentiment across targets is unbalanced, with sentences whose targets all share the same sentiment making up a large proportion. These shortcomings of the datasets limit how far models can improve on multi-target sentiment classification. To address these problems, this paper constructs a Chinese dataset for multi-target sentiment classification, comprising 2,071 items with 6,339 manually annotated evaluation targets. The dataset has the following properties: (1) a balanced distribution of the number of evaluation targets; (2) a balanced distribution of positive and negative sentiment polarity; (3) a balanced distribution of multi-target sentiment tendencies. The paper then runs experiments and a comparative analysis on this dataset using several mainstream models for target-level sentiment classification. The results show that existing mainstream models still cannot classify targets well in instances that contain multiple targets with inconsistent sentiments, especially when a target's sentiment is neutral. Multi-target sentiment classification therefore remains a difficult and challenging task.
Authors
LIU Pengyuan (刘鹏远)
TIAN Yongsheng (田永胜)
DU Chengyu (杜成玉)
QIU Likun (邱立坤)
LIU Pengyuan; TIAN Yongsheng; DU Chengyu; QIU Likun (School of Information Science, Beijing Language and Culture University, Beijing 100083, China; Language Resources Monitoring and Research Center, Print Media Language Branch, Beijing Language and Culture University, Beijing 100083, China; School of Computer and Control Engineering, Minjiang University, Fuzhou, Fujian 350108, China)
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journals (北大核心)
2021, No. 6, pp. 30-38 (9 pages)
Funding
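Beijing Natural Science Foundation (4192057)
Humanities and Social Sciences Research Planning Fund of the Ministry of Education (18YJA740030)
Beijing Language and Culture University institutional project (Fundamental Research Funds for the Central Universities) (17PT05)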
Keywords
target-level sentiment classification
Chinese dataset
multi-target