Abstract
Automatic summarization uses natural language processing to compress and refine an original text, giving users a concise description that retains the document's core ideas. Traditional methods usually consider only shallow textual semantic information at the character, word, and sentence level, and neglect the guiding role that deeper discourse-structure information, such as primary and secondary relations, plays in extracting a document's core sentences. To address this, this paper proposes an automatic summarization method based on primary and secondary relation features. The method uses Long Short-Term Memory (LSTM) neural networks to build a single-document extractive summarization model based on these features: a bidirectional LSTM (Bi-LSTM) enriches and semantically encodes the sentence information and the primary and secondary relation information, and a unidirectional LSTM extracts the summary from the encoded information. Experimental results show that, compared with current mainstream single-document extractive summarization methods, the proposed method achieves significant improvements in summary accuracy, stability, and ROUGE scores.
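The encode-then-extract pipeline described in the abstract can be illustrated with a small forward-pass sketch. This is not the authors' implementation: the abstract does not say how the primary and secondary relation feature is represented, so the sketch assumes a single flag per sentence (1.0 for primary, 0.0 for secondary) concatenated to a toy sentence vector. The names `LSTMCell`, `score_sentences`, and `extract_summary`, the hidden size, and the sigmoid scoring layer are all hypothetical, and the weights are random and untrained, so the scores only show the data flow.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Minimal LSTM cell with randomly initialised (untrained) weights."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                      for _ in range(hidden_size)] for g in "ifoc"}
        self.b = {g: [0.0] * hidden_size for g in "ifoc"}

    def _gate(self, g, z, act):
        return [act(sum(w * v for w, v in zip(row, z)) + b)
                for row, b in zip(self.W[g], self.b[g])]

    def step(self, x, h, c):
        z = x + h  # concatenate input with previous hidden state
        i = self._gate("i", z, sigmoid)
        f = self._gate("f", z, sigmoid)
        o = self._gate("o", z, sigmoid)
        g = self._gate("c", z, math.tanh)
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

    def run(self, xs):
        """Return the hidden state after each element of the sequence xs."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        outputs = []
        for x in xs:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs


def score_sentences(sentence_vecs, relation_feats, hidden_size=4, seed=0):
    """Score each sentence in (0, 1) for inclusion in the summary."""
    rng = random.Random(seed)
    # Assumption: relation feature is appended to each sentence vector.
    inputs = [s + r for s, r in zip(sentence_vecs, relation_feats)]
    in_size = len(inputs[0])
    # Bidirectional encoder: one pass left-to-right, one right-to-left.
    fwd = LSTMCell(in_size, hidden_size, rng)
    bwd = LSTMCell(in_size, hidden_size, rng)
    h_fwd = fwd.run(inputs)
    h_bwd = bwd.run(inputs[::-1])[::-1]
    encoded = [a + b for a, b in zip(h_fwd, h_bwd)]
    # Unidirectional LSTM reads the encodings; a sigmoid layer scores them.
    extractor = LSTMCell(2 * hidden_size, hidden_size, rng)
    states = extractor.run(encoded)
    w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
    return [sigmoid(sum(w * v for w, v in zip(w_out, h))) for h in states]


def extract_summary(scores, k):
    """Indices of the k highest-scoring sentences, in document order."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


if __name__ == "__main__":
    # Toy inputs: four 3-dimensional sentence vectors and one flag each.
    sentence_vecs = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.2],
                     [0.5, 0.1, 0.1], [-0.3, 0.2, 0.0]]
    relation_feats = [[1.0], [0.0], [1.0], [0.0]]
    scores = score_sentences(sentence_vecs, relation_feats)
    print(extract_summary(scores, k=2))
```

In a trained model the weights would be learned from labelled summaries, and the sentence vectors would come from a learned sentence encoder rather than hand-written numbers.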
Authors
张迎
张宜飞
王中卿
王红玲
ZHANG Ying; ZHANG Yi-fei; WANG Zhong-qing; WANG Hong-ling (School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China)
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journals (北大核心)
2020, Issue S01, pp. 6-11 (6 pages)
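Funding
National Natural Science Foundation of China (61806137, 61976146)
Natural Science Research Project of Jiangsu Higher Education Institutions, General Program (18KJB520043)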
Keywords
Natural language processing
Extractive summarization
Primary and secondary relation
Neural network
LSTM