
An Automatic Summarization Model Based on Deep Learning for Chinese (cited by 5)

Abstract: Targeting the pictographic and structural characteristics of Chinese characters, this paper proposes a new abstractive automatic summarization solution. It consists of a stroke-based text vector technique and an abstractive summarization model. The stroke-based text vector method encodes strokes, which are the finest-grained units that make up Chinese characters. This highlights the characteristics specific to each word, tightens the relationships between words, and enriches the semantic information of the Chinese word vectors obtained with the Skip-Gram model. The Seq2Seq model is then optimized in three ways. First, a Bi-LSTM mitigates information loss on long text sequences and supplies information from the reverse direction. Second, an Attention mechanism added on the encoder side computes how strongly each input word influences the decoder. Third, Beam Search added on the decoder side improves the fluency of the generated sequence. Experiments with a model trained on the LCSTS dataset show that the proposed model improves both the quality and the readability of generated Chinese text summaries.
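The abstract does not specify how strokes are encoded. The Python below is a minimal sketch that assumes the five-category stroke scheme and the stroke n-grams popularized by cw2vec. It turns a word into stroke n-gram features and generates the (center, context) pairs that a Skip-Gram model trains on. The stroke table `STROKES`, the helper names, and the n-gram range are all hypothetical, and the table covers only a few characters.

```python
from typing import Dict, List, Tuple

# Hypothetical stroke table using five cw2vec-style categories:
# 1 = horizontal, 2 = vertical, 3 = left-falling,
# 4 = right-falling or dot, 5 = turning.
# Only a handful of characters are listed, for illustration.
STROKES: Dict[str, str] = {
    "大": "134",
    "人": "34",
    "木": "1234",
    "林": "12341234",
    "天": "1134",
}


def stroke_sequence(word: str) -> str:
    """Concatenate the stroke codes of every character in a word."""
    return "".join(STROKES[ch] for ch in word)


def stroke_ngrams(word: str, n_min: int = 3, n_max: int = 5) -> List[str]:
    """Return all stroke n-grams with n_min <= n <= n_max, shortest first."""
    seq = stroke_sequence(word)
    grams: List[str] = []
    for n in range(n_min, n_max + 1):
        for i in range(len(seq) - n + 1):
            grams.append(seq[i:i + n])
    return grams


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for tokens within `window` positions."""
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    sentence = ["大人", "天", "林"]  # a toy, pre-segmented sentence
    for center, context in skipgram_pairs(sentence, window=1):
        # In a stroke-based Skip-Gram, the center word is represented by its
        # stroke n-grams, whose vectors are trained to predict `context`.
        print(center, context, stroke_ngrams(center, 3, 4))
```

In a full stroke-based Skip-Gram model, each stroke n-gram gets its own vector. A center word is scored against a context word by summing the dot products of its n-gram vectors with the context vector, and training uses negative sampling. Those parts are omitted here.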
Authors: LI Weiyong (李维勇), LIU Bin (柳斌), ZHANG Wei (张伟), CHEN Yunfang (陈云芳). Affiliations: Institute of Computing Software, Nanjing Vocational College of Information Technology, Nanjing, Jiangsu 210023, China; School of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023, China.
Source: Journal of Guangxi Normal University: Natural Science Edition (《广西师范大学学报(自然科学版)》), CAS-indexed, Peking University core journal, 2020, No. 2, pp. 51-63 (13 pages).
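Funding: National Natural Science Foundation of China (61672297); 2019 Outstanding Teaching Team project of the universities' "Qing Lan Project" (苏教师[2019]3号).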
Keywords: deep learning; abstractive (generative) summarization; stroke embedding; Seq2Seq; attention mechanism
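The abstract names two further components: encoder-side Attention and decoder-side Beam Search. The dependency-free Python below is a toy sketch of both. It assumes dot-product attention scores, because the abstract does not say which scoring function the model uses. The `enc_states` argument stands in for Bi-LSTM hidden states. `toy_step` and `TOY_TABLE` are hypothetical, hard-coded next-token distributions and do not represent a trained decoder.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Hyp = Tuple[Tuple[str, ...], float]  # (token sequence, summed log-probability)


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(enc_states: List[List[float]],
                  dec_state: List[float]) -> Tuple[List[float], List[float]]:
    """Score each encoder state against the decoder state, normalize the
    scores into attention weights, and return (weights, context vector)."""
    scores = [sum(h * s for h, s in zip(h_i, dec_state)) for h_i in enc_states]
    weights = softmax(scores)
    dim = len(enc_states[0])
    context = [sum(w * h_i[d] for w, h_i in zip(weights, enc_states))
               for d in range(dim)]
    return weights, context


def beam_search(step: Callable[[Tuple[str, ...]], Dict[str, float]],
                beam_width: int, max_len: int, eos: str = "</s>") -> List[Hyp]:
    """Expand every live prefix, keep the `beam_width` best candidates by
    summed log-probability, and set aside those that end with `eos`."""
    beams: List[Hyp] = [((), 0.0)]
    finished: List[Hyp] = []
    for _ in range(max_len):
        candidates = [(prefix + (tok,), score + logp)
                      for prefix, score in beams
                      for tok, logp in step(prefix).items()]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            if seq[-1] == eos:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # keep unfinished hypotheses if max_len is hit
    return sorted(finished, key=lambda c: c[1], reverse=True)


# Hypothetical next-token probabilities standing in for a trained decoder.
TOY_TABLE = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"</s>": 0.4, "X": 0.3, "Y": 0.3},
    ("B",): {"</s>": 0.9, "X": 0.1},
}


def toy_step(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Log-probabilities of the next token; unknown prefixes must end."""
    probs = TOY_TABLE.get(prefix, {"</s>": 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}


if __name__ == "__main__":
    weights, context = dot_attention([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
    print("attention weights:", weights, "context:", context)
    print("greedy:", beam_search(toy_step, beam_width=1, max_len=5)[0])
    print("beam=2:", beam_search(toy_step, beam_width=2, max_len=5)[0])
```

With `beam_width=1`, the search reduces to greedy decoding and returns `A </s>` (probability 0.24). With `beam_width=2`, it keeps the initially weaker token `B` alive and finds `B </s>` (probability 0.36). This is the kind of sequence-level improvement that Beam Search is meant to provide.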