Abstract
[Objective] This paper reviews the literature on multi-document summarization to outline its research frameworks and mainstream models. [Coverage] We searched the AI Open Index, Papers with Code, and CNKI databases with the queries “Multi-Document Summarization” and “多文档摘要”, and selected a total of 76 papers. [Methods] We summarize the mainstream frameworks for implementing multi-document summarization, classify and review recent models and algorithms by their key techniques, and outline prospects for future research. [Results] We compare the strengths and weaknesses of the latest multi-document summarization models with those of traditional methods, and summarize high-quality multi-document summarization datasets and current evaluation metrics. [Limitations] The comparison of experimental results covers only some widely used models evaluated on datasets such as Multi-News; it lacks a comparison of all models on the same dataset. [Conclusions] Many problems in multi-document summarization remain unsolved, such as the low factual accuracy of generated summaries and the poor generality of summarization models.
Authors
宝日彤
孙海春
Bao Ritong; Sun Haichun (School of Information and Cyber Security, People’s Public Security University of China, Beijing 100038, China; Key Laboratory of Security Technology & Risk Assessment, People’s Public Security University of China, Beijing 100026, China)
Source
Data Analysis and Knowledge Discovery (《数据分析与知识发现》)
EI
CSCD
Peking University Core Journal (北大核心)
2024, No. 2, pp. 17-32 (16 pages)
Funding
Technology Research Program of the Ministry of Public Security (Grant No. 2020JSYJC22)
Beijing Natural Science Foundation (Grant No. 4184099)