Passage re-ranking model with autoencoder pre-training and multi-representation interaction

Abstract: In passage re-ranking, recent studies have proposed late-interaction architectures based on bi-encoders for faster computation. Because these models use a pre-trained model to encode queries and passages independently during both training and inference, ranking performance depends heavily on the encoding quality of the pre-trained model. Moreover, some multi-vector late-interaction approaches compute text similarity as the sum of the maximum similarities between token vectors, which is prone to partial matching. To address these limitations, this paper proposes a pre-training method called replacement paragraph prediction (RPP). It adopts a partially connected autoencoder architecture and uses a replaced-token prediction task, similar to ELECTRA's replaced token detection, so that the pre-trained model learns the semantic relationship between a given query and passage, which strengthens its representational capacity. For the interaction method, the paper designs a new late-interaction paradigm: different attention mechanisms guide multiple text representations of each passage to be ranked, and these representations are dynamically fused and then compared with the query vector by dot product, giving lower complexity and finer-grained features. Re-ranking experiments on the MS MARCO passage retrieval dataset show that the proposed model outperforms ColBERT and PreTTR on MRR@10 under different training conditions. With knowledge distillation, its performance approaches that of the teacher model, and ranking time is substantially reduced on both GPU and CPU.
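The abstract refers to two standard ingredients that are easy to make concrete: the sum-of-maximum-similarity scoring used by multi-vector late-interaction models such as ColBERT, and the MRR@10 metric. The following is a minimal sketch of both in plain Python, not the authors' model or code. The function names (`maxsim_score`, `mrr_at_k`), the toy 2-dimensional vectors, and the query and passage ids are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs: List[Vector], passage_vecs: List[Vector]) -> float:
    """Sum-of-max late interaction: for each query token vector, take its
    maximum similarity over all passage token vectors, then sum the maxima."""
    return sum(max(dot(q, p) for p in passage_vecs) for q in query_vecs)


def mrr_at_k(ranked: Dict[str, List[str]],
             relevant: Dict[str, Set[str]],
             k: int = 10) -> float:
    """Mean reciprocal rank of the first relevant passage within the top k.
    A query with no relevant passage in its top k contributes 0."""
    total = 0.0
    for qid, passage_ids in ranked.items():
        for rank, pid in enumerate(passage_ids[:k], start=1):
            if pid in relevant.get(qid, set()):
                total += 1.0 / rank
                break
    return total / len(ranked) if ranked else 0.0


# Toy token vectors (hypothetical values, illustration only).
query = [(1.0, 0.0), (0.0, 1.0)]
passage_a = [(1.0, 0.0), (0.0, 1.0)]  # matches both query tokens
passage_b = [(1.0, 0.0), (1.0, 0.0)]  # matches only the first query token

score_a = maxsim_score(query, passage_a)
score_b = maxsim_score(query, passage_b)

# Toy rankings: q1 finds its relevant passage at rank 2, q2 finds none.
ranked = {"q1": ["p3", "p1", "p2"], "q2": ["p9", "p8"]}
relevant = {"q1": {"p1"}, "q2": {"p7"}}
mrr = mrr_at_k(ranked, relevant, k=10)
```

In `maxsim_score` each query token is matched independently against the passage tokens, which is the property the abstract associates with partial matching. The paper's own interaction instead compares a fused passage representation with the query vector by dot product; that model is not reproduced here.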

Authors: Zhang Kang (张康); Chen Ming (陈明); Gu Fan (顾凡) (School of Information, Shanghai Ocean University, Shanghai 201306, China)

Affiliation: School of Information, Shanghai Ocean University

Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list; 2023, No. 12, pp. 3643-3650 (8 pages)

Funding: Shanghai Science and Technology Innovation Plan Project (20dz1203800).

Keywords: autoencoder; pre-training; re-ranking; late interaction

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
