
BETES: An Extractive Summarization Method for Chinese Long Documents (cited by 3)

A Method for Extractive Summarization of Chinese Long Documents
Abstract  Text summarization is one of the most important research tasks in natural language processing, and it has become a research hotspot with the rise of deep learning. Extractive summarization of long Chinese documents faces greater challenges: long-document/summary corpora are scarce, extracted information is often inaccurate, target summaries are redundant, and summary sentences are missed. Taking extractive summarization of long Chinese documents as its subject, this paper proposes a method named BETES. First, a Chinese long-document/summary corpus is constructed through rules and manually assisted filtering. Second, the BERT pretrained model is used for text vectorization, which better captures the contextual semantics of long documents and improves the accuracy of information extraction. Third, on the basis of recognizing the elementary discourse units (EDUs) of a long Chinese document, EDUs are taken as the extraction objects, which reduces the redundancy of the summary. Finally, a Transformer-based neural extraction model selects the EDUs, improving the accuracy of summary-sentence extraction. Experiments show that the proposed BETES method improves accuracy and reduces redundancy in extractive summarization of long Chinese documents, and its ROUGE scores surpass those of mainstream extractive summarization methods.
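The pipeline the abstract describes — encode text units, score them against the document, and extract the top-ranked units — can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it substitutes character-bigram bag-of-words vectors for BERT embeddings and a simple cosine-similarity ranking for the Transformer extraction model, and all function names here are hypothetical.

```python
from collections import Counter
import math

def embed(text):
    # Stand-in for a BERT sentence embedding: a character-bigram
    # bag-of-words vector (works for Chinese text without tokenization).
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def extract_summary(edus, k=2):
    # Score each elementary discourse unit (EDU) against the whole
    # document and return the top-k units in their original order.
    doc_vec = embed("".join(edus))
    ranked = sorted(range(len(edus)),
                    key=lambda i: cosine(embed(edus[i]), doc_vec),
                    reverse=True)
    return [edus[i] for i in sorted(ranked[:k])]
```

Extracting at the EDU level rather than the sentence level, as the paper does, lets the selector drop redundant clauses inside otherwise-salient sentences; the ranking function above is merely a placeholder for that learned Transformer scorer.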
Authors  王宗辉 (WANG Zong-hui), 李宝安 (LI Bao-an), 吕学强 (LV Xue-qiang), 游新冬 (YOU Xin-dong) — Beijing Key Laboratory of Internet Culture & Digital Dissemination Research, Beijing Information Science & Technology University, Beijing 100101, China; School of Computer, Beijing Information Science & Technology University, Beijing 100101, China
Source  Journal of Chinese Computer Systems (《小型微型计算机系统》, CSCD, Peking University Core), 2022, No. 1, pp. 42-49 (8 pages)
Funding  National Natural Science Foundation of China (61671070); Key Project of the State Language Commission (ZDI135-53); Beijing Information Science & Technology University project for promoting connotative development and improving research capability (2019KYNH226); Beijing Information Science & Technology University "Qinxin Talent" Cultivation Program (QXTCP B201908)
Keywords  text summarization; extractive summarization; BERT; elementary discourse units; Transformer

References: 7 · Secondary references: 70 · Co-cited works: 88 · Works citing the same sources: 17 · Citing works: 3 · Secondary citing works: 6
