Abstract
Conventional sequence labeling methods extract Chinese opinion targets with low accuracy because they ignore Chinese semantic and syntactic information. To address this, a Chinese opinion target extraction model that fuses semantic and syntactic information is proposed. Starting from the original character vectors, the model strengthens semantic features through an optimized character-meaning strategy, recovering the internal information of characters and words that would otherwise be ignored. It also tags the part-of-speech sequence of each sentence and encodes that part-of-speech information, deepening the syntactic features of the input. The network is trained with a bidirectional long short-term memory (BiLSTM) network, and a conditional random field (CRF) is used to overcome bias in the predicted labels, which improves extraction accuracy. The model was validated on the BDCI2017 dataset. Compared with an extraction model that does not incorporate semantic and syntactic information, the extraction accuracy for Chinese topic words and sentiment words increased by 2.1% and 1.68%, respectively, and the accuracy of joint extraction was 77.16%, indicating good performance on Chinese opinion target extraction.
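The two ideas in the abstract, enriching each character's input with part-of-speech information and decoding labels jointly instead of one at a time, can be illustrated with a minimal sketch. This is not the authors' implementation. The tag set `TAGS`, the part-of-speech inventory `POS_TAGS`, and the helpers `build_input`, `allowed`, and `viterbi` are all hypothetical names introduced here. The BiLSTM is omitted, and the per-character emission scores are assumed to come from such a network. A trained CRF learns its transition scores; the hand-written BIO constraint below only stands in for them.

```python
from typing import Dict, List, Sequence

# Hypothetical tag set: B-/I- mark topic (T) and sentiment (S) spans, O is outside.
TAGS = ["O", "B-T", "I-T", "B-S", "I-S"]
# Hypothetical part-of-speech inventory (noun, adjective, adverb, verb).
POS_TAGS = ["n", "a", "d", "v"]


def build_input(char_vec: Sequence[float], pos: str) -> List[float]:
    """Concatenate a character vector with a one-hot part-of-speech feature."""
    one_hot = [1.0 if pos == p else 0.0 for p in POS_TAGS]
    return list(char_vec) + one_hot


def allowed(prev: str, cur: str) -> bool:
    """BIO constraint: I-X may only follow B-X or I-X."""
    if cur.startswith("I-"):
        return prev in ("B-" + cur[2:], "I-" + cur[2:])
    return True


def viterbi(emissions: List[Dict[str, float]]) -> List[str]:
    """Return the highest-scoring tag path that respects the BIO constraint."""
    neg_inf = float("-inf")
    # An I-X tag cannot open a sequence.
    score = {t: (neg_inf if t.startswith("I-") else emissions[0][t]) for t in TAGS}
    back: List[Dict[str, str]] = []
    for em in emissions[1:]:
        new_score, pointers = {}, {}
        for cur in TAGS:
            best_prev, best = None, neg_inf
            for prev in TAGS:
                if allowed(prev, cur) and score[prev] > best:
                    best_prev, best = prev, score[prev]
            new_score[cur] = best + em[cur] if best_prev is not None else neg_inf
            pointers[cur] = best_prev
        score = new_score
        back.append(pointers)
    # Follow back-pointers from the best final tag.
    path = [max(TAGS, key=lambda t: score[t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

Picking the best tag for each character independently can produce an invalid sequence such as `O` followed by `I-T`. Decoding over the whole sequence rules this out, which is the general motivation for placing a CRF layer on top of a BiLSTM.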
Authors
ZHOU Hao (周浩)
WANG Li (王莉)
College of Information and Computer Science, Taiyuan University of Technology, Jinzhong 030600, China; College of Big Data, Taiyuan University of Technology, Jinzhong 030600, China
Source
CAAI Transactions on Intelligent Systems (《智能系统学报》)
CSCD
Peking University Core Journal (北大核心)
2019, Issue 1, pp. 171-178 (8 pages)
Funding
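National Natural Science Foundation of China (61872260)
Shanxi Province Key Research and Development Program, International Cooperation Project (201703D421013)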
Keywords
Chinese opinion target
semantics
syntax
sequence labeling
bidirectional long short-term memory
conditional random field
extraction model