Abstract
Automatic text summarization is the process of generating a concise representation of a text while retaining its core information. Summarization algorithms fall broadly into two categories: extractive and abstractive. Extractive approaches select salient words, phrases, or sentences from the original text to form the summary, whereas abstractive methods rewrite the content without being constrained to reuse words or phrases from the original. Automatic summarization can aid many downstream applications (e.g., news digests, social media). In recent years, most neural-network-based work has modeled extractive summarization as a sequence labeling task. This creates a discrepancy between training and testing: the task is treated as classification during training but as ranking at test time. To address this, the paper proposes a neural-network-based regression model that learns during training to fit the ROUGE scores of sentences directly and uses the predicted scores for ranking. Experimental results show that the proposed model outperforms the previous best extractive summarization systems.
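The idea of fitting ROUGE and then ranking can be illustrated with a minimal sketch. It is not the authors' model: the abstract does not describe the network, so the code below only shows how per-sentence regression targets might be built and how predicted scores could be turned into a summary. The function names (`rouge1_f1`, `regression_targets`, `select_top_k`) are hypothetical, and ROUGE is approximated here by a simplified unigram-overlap F1 over whitespace tokens rather than the official ROUGE toolkit.

```python
from collections import Counter


def rouge1_f1(candidate, reference):
    """Simplified ROUGE-1 F1: unigram overlap over lowercased whitespace tokens.

    Assumption: this stands in for the real ROUGE metric for illustration only.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def regression_targets(sentences, reference):
    """Score every document sentence against the reference summary.

    In a regression setup, a model would be trained to predict these values
    (for example with a squared-error loss) instead of 0/1 labels.
    """
    return [rouge1_f1(s, reference) for s in sentences]


def select_top_k(sentences, scores, k):
    """Rank sentences by score and return the top k in document order."""
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    doc = ["the cat sat on the mat", "dogs bark loudly", "cat sat"]
    ref = "the cat sat"
    targets = regression_targets(doc, ref)
    # At test time the model's predicted scores would replace `targets`.
    print(select_top_k(doc, targets, k=2))
```

Because the training target and the test-time selection criterion are the same quantity (a sentence score), the mismatch between classification during training and ranking at test time does not arise in this setup.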
Authors
赵怀鹏
车万翔
刘挺
ZHAO Huaipeng; CHE Wanxiang; LIU Ting (School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China)
Source
《智能计算机与应用》
2019, Issue 2, pp. 200-203, 207 (5 pages in total)
Intelligent Computer and Applications
Keywords
neural networks
extractive summarization
regression model