Abstract
Most recent automatic summarization algorithms rely on supervised learning and therefore overlook the cost of manually labeling corpora. In addition, most summarization models cannot draw on context when embedding sentences, so they express semantic information incompletely and ignore the overall information of the text. To address these problems, this paper proposes an extractive summarization model that combines an improved BERT bidirectional pre-trained language model with a graph ranking algorithm. The model maps each sentence to a structured sentence vector according to its position and context, then uses the graph ranking algorithm to select the most influential sentences as a temporary summary. To avoid a highly redundant result, redundancy is then eliminated from the temporary summary. Experimental results on the public CNN/DailyMail dataset show that the proposed model improves summary scores and outperforms other improved graph-ranking-based extractive summarization algorithms.
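The pipeline described in the abstract (sentence vectors, graph ranking, temporary summary, redundancy elimination) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sentence vectors are assumed to come from some encoder such as BERT and are passed in as plain lists of floats, the ranking step is a generic weighted PageRank over cosine similarities, and the names `rank_sentences`, `extract_summary`, `temp_size`, and `max_sim` are hypothetical. The abstract does not state the paper's actual redundancy criterion, so a simple similarity threshold stands in for it here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_sentences(vectors, damping=0.85, iterations=50):
    """Score sentences with a weighted PageRank over a similarity graph."""
    n = len(vectors)
    if n == 0:
        return []
    # Edge weight = cosine similarity clipped at 0; no self-loops.
    w = [[max(cosine(vectors[i], vectors[j]), 0.0) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out = [sum(row) for row in w]  # total outgoing weight per node
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(w[j][i] / out[j] * scores[j]
                            for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
    return scores


def extract_summary(vectors, temp_size=3, max_sim=0.9):
    """Pick a temporary summary, then drop near-duplicate sentences.

    temp_size and max_sim are illustrative parameters, not values
    taken from the paper.
    """
    scores = rank_sentences(vectors)
    order = sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)
    temporary = order[:temp_size]  # highest-ranked sentences
    kept = []
    for i in temporary:
        # Keep a sentence only if it is not too similar to one already kept.
        if all(cosine(vectors[i], vectors[j]) < max_sim for j in kept):
            kept.append(i)
    return sorted(kept)  # restore document order
```

Calling `extract_summary(vectors)` returns the indices of the retained sentences in document order. Ranking first and filtering afterwards mirrors the two-stage order given in the abstract.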
Authors
方萍
徐宁
Fang Ping; Xu Ning (School of Computer Science & Technology, Wuhan University of Technology, Wuhan 430070, China; School of Information Engineering, Wuhan University of Technology, Wuhan 430070, China)
Source
Application Research of Computers (《计算机应用研究》)
CSCD
PKU Core Journal (北大核心)
2021, No. 9, pp. 2657-2661 (5 pages)
Keywords
extractive summary
BERT
graph sorting algorithm
redundancy elimination