Abstract
Deep-learning-based measurement of short-text semantic similarity underpins many modern natural language processing tasks. This paper proposes a text encoding model based on a convolutional neural network (CNN) and a bidirectional gated recurrent unit (BiGRU): the convolutional layer extracts the salient semantic features, the BiGRU preserves the order of those semantics, and a Siamese network structure ensures that both texts are encoded consistently. The model is compared with a conventional CNN, a long short-term memory (LSTM) network, and BERT. Results on the Quora, SICK, and MSRP datasets show that the proposed model performs well on precision and recall, and that its F1 score also exceeds that of the traditional models.
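The abstract reports precision, recall, and F1 on sentence-pair datasets. As a minimal sketch of how these three metrics relate, assuming binary labels where 1 marks a semantically similar pair, the function below computes them from gold and predicted labels. The function name `precision_recall_f1` and the sample labels are hypothetical illustrations, not the authors' evaluation code or data.

```python
def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall, and F1 (label 1 = similar pair)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Guard against empty denominators by falling back to 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical gold labels and model predictions for eight sentence pairs.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because F1 is a harmonic mean, it stays low unless precision and recall are both high, which is why it is commonly reported alongside them as a single summary figure.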
Authors
周圣凯
富丽贞
宋文爱
ZHOU Shengkai; FU Lizhen; SONG Wen'ai (School of Software, North University of China, Taiyuan, Shanxi 030051, China; Shanxi Military and Civilian Integration Software Engineering Technology Research Center, Taiyuan, Shanxi 030051, China)
Source
《广西师范大学学报(自然科学版)》
CAS
PKU Core Journal (北大核心)
2022, Issue 3, pp. 49-56 (8 pages)
Journal of Guangxi Normal University:Natural Science Edition
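Funding
National Natural Science Foundation of China (61602427)
Open Fund of the Shanxi Military and Civilian Integration Software Engineering Technology Research Center (2111400005HX)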
Keywords
natural language processing
semantic similarity
convolutional neural network
long short-term memory
gated recurrent unit