Abstract
Neural network methods based on the Sequence-to-Sequence model with Recurrent Neural Networks (RNN) and an attention mechanism play an important role in information extraction and automatic summary generation. However, such methods cannot make full use of the linguistic features of the text, and the generated summaries suffer from the out-of-vocabulary problem, which reduces the accuracy and readability of the summaries. To address these problems, this paper uses linguistic features of the text to improve the input features and introduces a copy mechanism to alleviate the out-of-vocabulary problem during summary generation. On this basis, it proposes the Copy-Generator model, a new method based on the Sequence-to-Sequence model, to improve the quality of the generated summaries. Experiments on the Chinese summarization dataset LCSTS show that the proposed method effectively improves the accuracy of the generated summaries and can be applied to large-scale automatic text summarization tasks.
Authors
ZHOU Jian (周健)
TIAN Xuan (田萱)
CUI Xiaohui (崔晓晖)
School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China
Source
Computer Engineering and Applications (《计算机工程与应用》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2019, No. 1, pp. 128-134 (7 pages)
Funding
Fundamental Research Funds for the Central Universities (No. TD2014-02)
Fundamental Research Funds for the Central Universities (No. BLX2014-27)