Abstract
Text matching is one of the key technologies in retrieval systems. To address the problem that existing text matching models do not accurately capture semantic differences between texts, this paper proposes a text matching method based on fine-grained difference features. First, a pre-trained model is used as the base model to extract the semantics of the texts to be matched and to produce a preliminary matching result. Then, the idea of adversarial learning is introduced in the encoding stage: virtual adversarial samples are constructed in the embedding layer and used for training, which improves the learning and generalization ability of the model. Finally, fine-grained difference features of the texts are introduced to correct the preliminary matching prediction, which effectively improves the model's ability to capture fine-grained differences and, in turn, the performance of the text matching model. Experiments were conducted on two datasets; on the LCQMC dataset, the proposed method achieves an accuracy (ACC) of 88.96%, outperforming the best previously known model.
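The abstract does not specify how the virtual adversarial samples are built. One widely used option is the Fast Gradient Method (FGM), which adds a perturbation ε·g/‖g‖₂ to the input embedding, where g is the gradient of the loss with respect to that embedding, and then trains on the clean loss plus the loss on the perturbed embedding. The sketch below assumes FGM and replaces the pre-trained encoder with a toy logistic matcher over a single pooled embedding vector; the function names, the toy weights, and the value of `epsilon` are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss_and_input_grad(x, w, b, y):
    """Binary cross-entropy of a toy logistic matcher (hypothetical stand-in
    for the pre-trained model) and its gradient w.r.t. the embedding x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)  # match probability
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad_x = [(p - y) * wi for wi in w]  # d(loss)/dx for logistic regression
    return loss, grad_x


def fgm_perturb(x, grad, epsilon):
    """FGM-style virtual adversarial embedding: x + epsilon * g / ||g||_2."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:  # no gradient signal, so leave the embedding unchanged
        return list(x)
    return [xi + epsilon * gi / norm for xi, gi in zip(x, grad)]


def adversarial_objective(x, w, b, y, epsilon=0.5):
    """Clean loss, adversarial loss, and their sum (the assumed training loss)."""
    clean_loss, grad_x = bce_loss_and_input_grad(x, w, b, y)
    x_adv = fgm_perturb(x, grad_x, epsilon)  # perturb in the loss-increasing direction
    adv_loss, _ = bce_loss_and_input_grad(x_adv, w, b, y)
    return clean_loss, adv_loss, clean_loss + adv_loss
```

For example, `adversarial_objective([0.2, -0.1, 0.4], [1.0, -2.0, 0.5], 0.0, 1)` gives a clean loss of about 0.44 and an adversarial loss of about 1.00: the perturbation moves the embedding in the direction that most increases the loss, so training on both terms pushes the model to stay correct in a small neighbourhood of each input. In a real implementation the summed loss would be back-propagated through the encoder rather than just returned.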
Authors
WANG Sheng, ZHANG Yang-sen, CHEN Ruo-yu, XIANG Ga
(Institute of Intelligent Information Processing, Beijing Information Science and Technology University, Beijing 100101, China)
Source
Computer Science (计算机科学)
CSCD
Peking University Core Journal (北大核心)
2021, No. 8, pp. 60-65 (6 pages)
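Funding
National Natural Science Foundation of China (61772081)
National Key Research and Development Program of China (2018YFB1403104)
Research Fund of Beijing Information Science and Technology University (2035008)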
Keywords
Text matching
Pre-trained model
Semantic similarity
Adversarial learning
Difference feature