Abstract
[Objective] This paper proposes a defect-word recognition model for the consumer-product domain, based on sememes and multi-feature fusion, to address the insufficient accuracy of defect-word recognition in this domain. [Methods] The model takes as input distributed word vectors fused with sememe information, and enriches them with part-of-speech features and randomly embedded word position vectors. In the convolutional neural network (CNN), max pooling is removed so that the deep feature vectors output by the convolution kernels retain more information for word classification. [Results] Compared with a CNN model that adds only word position vectors, the proposed model improves precision, recall, and F1 by 0.021, 0.002, and 0.012, respectively. [Limitations] The model does not adequately recognize the polarity of identical expressions in different scenarios. [Conclusions] Ablation experiments show that sememe information, part-of-speech features, and the removal of the pooling layer help improve the performance of the domain word recognition model.
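The pipeline described in the abstract (feature concatenation, then convolution without max pooling, then per-word classification) can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the dimensions, the single convolution layer, the ReLU activation, the absolute position index, the function name `token_logits`, and the random matrices that stand in for the sememe-fused word vectors and the trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
VOCAB, N_POS, MAX_LEN = 50, 10, 20
D_WORD, D_POS, D_LOC = 16, 4, 4
N_FILTERS, KERNEL, N_CLASSES = 8, 3, 2

# Random stand-ins: in the described model the word vectors carry sememe information.
word_emb = rng.normal(size=(VOCAB, D_WORD))
pos_emb = rng.normal(size=(N_POS, D_POS))      # part-of-speech features
loc_emb = rng.normal(size=(MAX_LEN, D_LOC))    # randomly initialised position vectors

D_IN = D_WORD + D_POS + D_LOC
conv_w = rng.normal(size=(KERNEL, D_IN, N_FILTERS)) * 0.1
conv_b = np.zeros(N_FILTERS)
out_w = rng.normal(size=(N_FILTERS, N_CLASSES)) * 0.1
out_b = np.zeros(N_CLASSES)


def token_logits(word_ids, pos_ids):
    """Return one row of class scores per token; nothing is pooled over the sequence."""
    n = len(word_ids)
    # Fuse the three feature sources by concatenation along the feature axis.
    x = np.concatenate(
        [word_emb[word_ids], pos_emb[pos_ids], loc_emb[np.arange(n)]], axis=1
    )  # shape (n, D_IN)
    # Zero-pad so the convolution emits exactly one feature vector per token.
    pad = KERNEL // 2
    x_padded = np.pad(x, ((pad, pad), (0, 0)))
    windows = np.stack([x_padded[i:i + KERNEL] for i in range(n)])  # (n, KERNEL, D_IN)
    h = np.einsum("nkd,kdf->nf", windows, conv_w) + conv_b
    h = np.maximum(h, 0.0)  # ReLU (assumed activation)
    # No max pooling: every token keeps its own convolutional features.
    return h @ out_w + out_b  # shape (n, N_CLASSES)


logits = token_logits(np.array([1, 2, 3, 4, 5]), np.array([0, 1, 2, 1, 0]))
labels = logits.argmax(axis=1)  # one predicted class per word
```

With max pooling, the `(n, N_FILTERS)` feature map would collapse to a single `N_FILTERS`-dimensional vector for the whole sentence. Skipping it keeps one feature row per word, which is one plausible reading of how removing pooling preserves information for word-level classification.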
Authors
You Xindong (游新冬)
Yuan Menglong (袁梦龙)
Zhang Le (张乐)
Lv Xueqiang (吕学强)
Affiliation: Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China
Source
Data Analysis and Knowledge Discovery (《数据分析与知识发现》)
CSSCI
CSCD
PKU Core Journal (北大核心)
2022, No. 9, pp. 77-85 (9 pages)
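Funding
This work is one of the research outcomes of the following projects:
Beijing Natural Science Foundation (Grant No. 4212020)
National Natural Science Foundation of China (Grant No. 62171043)
President's Fund of the China National Institute of Standardization (Grant No. 282020Y-7511)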