
TP-TM: A Two-Phase Text Matching Model for Long-Form Texts
Abstract  Traditional text matching methods cannot learn deep semantic matching features between texts, and deep short-text matching models struggle to capture fine-grained matching signals in long texts. To address these problems, a two-phase text matching model for long-form texts, TP-TM (Two-Phase Text Matching), is proposed. First, a sentence-level filter removes noisy sentences and extracts key sentences; the key sentences are then fed into a word-level filter, which uses a BERT (Bidirectional Encoder Representations from Transformers) model incorporating an improved pruning strategy to mine deep interaction features between texts and to perform word-level noise filtering and fine-grained matching on the key sentences. Finally, the relationship between a text pair is predicted by concatenating features from different positions of BERT. Experiments on the public Chinese long-text news datasets CNSE (Chinese News Same Event) and CNSS (Chinese News Same Story) show that, compared with baseline models, TP-TM improves accuracy by 0.99 and 1.55 percentage points and F1 score by 0.98 and 1.46 percentage points on CNSE and CNSS respectively, effectively improving the accuracy of long-form text matching.
Authors  WANG Jiarui; PENG Cheng; FAN Min (Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu, Sichuan 610041, China; School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China)
Source  Journal of Computer Applications (CSCD, Peking University Core), 2023, No. S01, pp. 33-38 (6 pages)
Funding  Sichuan Science and Technology Program (2022ZHCG0007)
Keywords  text matching; long-form text; BERT (Bidirectional Encoder Representations from Transformers); filter; feature deletion
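The two-phase idea summarized in the abstract can be sketched as follows. This is a minimal illustrative sketch only: the overlap-based sentence scorer, all function names, and the `top_k` parameter are assumptions for demonstration, and the paper's actual sentence-level filter, BERT encoder, pruning strategy, and classifier are not reproduced here.

```python
# Illustrative sketch of the two-phase pipeline described in the abstract.
# The scoring heuristic and all names below are assumptions, not the
# paper's implementation.
import re

def split_sentences(text):
    # Naive splitter on Chinese and Western sentence-ending punctuation.
    return [s for s in re.split(r"[。.!?！？]", text) if s.strip()]

def tokens(text):
    # Lowercased word tokens with punctuation stripped.
    return set(re.findall(r"\w+", text.lower()))

def sentence_filter(doc, other_doc, top_k=2):
    """Phase 1 (assumed heuristic): rank the sentences of `doc` by token
    overlap with `other_doc` and keep the top_k as key sentences,
    discarding low-overlap "noise" sentences."""
    sents = split_sentences(doc)
    scored = sorted(sents, key=lambda s: -len(tokens(s) & tokens(other_doc)))
    return scored[:top_k]

def build_match_input(doc_a, doc_b, top_k=2):
    """Phase 2 stand-in: pair the two key-sentence sets. In TP-TM this
    pair would be fed to the BERT-based word-level filter (with the
    improved pruning strategy), and the text-pair relationship predicted
    by concatenating features from different BERT positions."""
    key_a = " ".join(sentence_filter(doc_a, doc_b, top_k))
    key_b = " ".join(sentence_filter(doc_b, doc_a, top_k))
    return key_a, key_b
```

The sketch replaces the learned sentence-level filter with a simple lexical-overlap ranking so that phase 1 is runnable in isolation; the actual model learns this filtering and the downstream matching jointly.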