Abstract
Semantic matching is a key component of natural language processing and directly constrains the efficiency of tasks such as question answering and information retrieval. Most existing semantic models use only words as the basic semantic unit for attention interaction. They rarely account for the fuzzy word boundaries of Chinese or for the insufficient use of character information, so the effect of linguistic granularity on the overall model is neglected. To address this, an enhanced multi-granularity feature fusion semantic matching model, EMGFM, is proposed. First, the BERT model and word2vec are combined to obtain enhanced character vector representations. Attention interaction is then performed at three granularities (character, word, and sentence), and the interaction results are fused by weighting to highlight the contribution of each kind of interaction information to the overall model. To reduce the information lost during interaction, the interaction information is enhanced by constructing difference features. Finally, max pooling and average pooling are used to obtain the final semantic representation of each text, from which the matching degree is computed. The model reaches 87% accuracy on the Chinese dataset of the CCKS question matching competition, which is higher than several classical semantic matching models and indicates that the method can effectively improve the accuracy of question semantic matching.
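The pipeline described in the abstract (attention interaction per granularity, enhancement by difference, max and average pooling, weighted fusion) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Everything in it is an assumption made for illustration: the function names, dot-product attention, the concatenation `[a, aligned, a - aligned]` as the "difference" enhancement, pooling each granularity before fusion, and uniform fusion logits. Random arrays stand in for the BERT and word2vec embeddings, and the final classifier that would produce a matching score is omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def soft_align(a, b):
    """For each row of a (len_a, d), build an attention-weighted sum of rows of b (len_b, d)."""
    scores = a @ b.T                      # (len_a, len_b) dot-product similarity (assumed)
    return softmax(scores, axis=1) @ b    # (len_a, d)

def enhance(a, aligned):
    """Assumed enhancement: concatenate the original, the aligned view, and their difference."""
    return np.concatenate([a, aligned, a - aligned], axis=1)   # (len_a, 3d)

def pool(x):
    """Concatenate max pooling and average pooling over the sequence axis."""
    return np.concatenate([x.max(axis=0), x.mean(axis=0)])     # (2 * width,)

def encode_pair(a, b):
    """Interact two sequences in both directions and return one fixed-size vector."""
    va = pool(enhance(a, soft_align(a, b)))
    vb = pool(enhance(b, soft_align(b, a)))
    return np.concatenate([va, vb])                            # (12d,)

def fuse(vectors, logits):
    """Weighted fusion: softmax-normalised weights over the granularities."""
    w = softmax(np.asarray(logits, dtype=float))
    return sum(wi * v for wi, v in zip(w, vectors)), w

# Hypothetical inputs: random stand-ins for character, word, and sentence embeddings.
rng = np.random.default_rng(0)
d = 8
q1 = {"char": rng.normal(size=(6, d)), "word": rng.normal(size=(4, d)), "sent": rng.normal(size=(1, d))}
q2 = {"char": rng.normal(size=(7, d)), "word": rng.normal(size=(5, d)), "sent": rng.normal(size=(1, d))}

vectors = [encode_pair(q1[g], q2[g]) for g in ("char", "word", "sent")]
fused, weights = fuse(vectors, logits=[0.0, 0.0, 0.0])  # uniform weights; learned in a real model
print(fused.shape)  # (96,)
```

In a trained model the fusion logits would be learned parameters, and `fused` would feed a classifier that outputs the matching probability. The order of fusion and pooling in the paper may differ from this sketch.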
Authors
尚福华
蒋毅文
曹茂俊
SHANG Fu-hua; JIANG Yi-wen; CAO Mao-jun (School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, China)
Source
《计算机技术与发展》
2022, No. 7, pp. 28-33 (6 pages)
Computer Technology and Development
Funding
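Natural Science Foundation of Heilongjiang Province (LH2019F004)
Youth Science Foundation of Northeast Petroleum University (2018QNL-25)
Outstanding Young and Middle-aged Scientific Research and Innovation Team of Northeast Petroleum University (KYCXTD201903)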