Abstract
Text similarity analysis is a core task in natural language processing, and deep text matching models are currently the mainstream approach to it. To address the shortcomings of the traditional MatchPyramid model in text feature extraction, this paper proposes a text similarity analysis method based on an enhanced MatchPyramid model. The method adds a multi-head self-attention mechanism and a mutual attention mechanism to the input encoding layer, and uses an autoencoder to reduce the dimensionality of the word vectors fed to the dual attention mechanism, which lowers the model's computational cost. The output of the dual attention mechanism is then concatenated with the original word vectors, which improves the ability of the word vectors to represent key information in the text. Finally, the single-channel image formed by the dot product of the two texts' word vector matrices is mapped to multiple feature subspaces to form a multi-channel image, and a densely connected convolutional neural network extracts features from that image. Experimental results show that, compared with the traditional MatchPyramid model, the proposed model improves accuracy by 1.59 percentage points and the F1 score by 2.49 percentage points.
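The data flow described in the abstract can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the attention here is single-head and has no learned projections, the autoencoder and the densely connected CNN are left out, and the per-channel affine map in `to_channels` (with its hypothetical `channel_params` argument) is only an assumed stand-in for the mapping to multiple feature subspaces. All function names are illustrative.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys_values):
    """Scaled dot-product attention (single head, no learned weights).

    Self-attention when both arguments are the same text,
    mutual attention when they are different texts.
    """
    scale = math.sqrt(len(queries[0]))
    dim = len(keys_values[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys_values])
        out.append([sum(w * k[i] for w, k in zip(weights, keys_values))
                    for i in range(dim)])
    return out


def enhance(text, other):
    """Concatenate each original word vector with its two attention outputs."""
    self_att = attend(text, text)
    mutual_att = attend(text, other)
    return [x + s + m for x, s, m in zip(text, self_att, mutual_att)]


def matching_matrix(a, b):
    """Single-channel matching image: dot product of every word pair."""
    return [[dot(u, v) for v in b] for u in a]


def to_channels(matrix, channel_params):
    """Assumed channel mapping: one (weight, bias) affine map per channel."""
    return [[[w * x + bias for x in row] for row in matrix]
            for w, bias in channel_params]


# Toy word vectors (made-up values): 3 words and 2 words, dimension 2.
text_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_b = [[1.0, 0.0], [0.5, 0.5]]

enh_a = enhance(text_a, text_b)   # each vector now has dimension 6
enh_b = enhance(text_b, text_a)
match = matching_matrix(enh_a, enh_b)   # 3 x 2 single-channel image
channels = to_channels(match, [(1.0, 0.0), (0.5, 0.1), (-1.0, 0.0)])
```

In a full model, `channels` would be the input to the convolutional feature extractor, and the weights that are fixed by hand here would be learned during training.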
Authors
DAI Xiang (代翔); SUN Haichun (孙海春); ZHU Rongchen (朱容辰); SUN Tianyang (孙天杨)
School of Information Network Security, People's Public Security University of China, Beijing 100038, China
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
PKU Core Journal (北大核心)
2022, No. 19, pp. 158-165 (8 pages)
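Funding
National Natural Science Foundation of China (41971367)
National Key Research and Development Program of China (2017YFC0803700)
Technical Research Program of the Ministry of Public Security (2020JSYJC22ok)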