Abstract
To address the insufficient extraction of global semantic information from the source text by existing text summary generation methods, a parallel-encoder summary generation model incorporating a flow-attention mechanism is proposed. First, the source text is segmented with a single-granularity word segmentation method. Second, a multi-head flow-attention mechanism is introduced in the encoding stage to extract the global semantic information of the source text more comprehensively. Third, the model is trained with parallel encoders so that the semantic information in the input sequence receives greater weight. Finally, the encoded global semantic information is fed into a pointer-augmented decoder, which copies words from the source text through the pointer to reduce out-of-vocabulary words in the generated summary, so that the summary matches the semantics of the source text more completely and accurately. The model was evaluated on two datasets, CLTS and NLPCC, using ROUGE-1, ROUGE-2, and ROUGE-L as evaluation metrics. Compared with the baseline model, it improves the three metrics by 2.62%, 1.44%, and 0.87% on CLTS and by 2.82%, 1.84%, and 1.64% on NLPCC, respectively, indicating that the proposed model is more effective for Chinese summary generation.
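The pointer copy step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so this assumes the common pointer-generator mixture, in which a scalar gate `p_gen` blends the decoder's vocabulary distribution with the attention weights over source tokens. The function name `final_distribution` and all numbers below are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from collections import defaultdict


def final_distribution(p_vocab, attention, source_tokens, p_gen):
    """Blend a generation distribution with a copy distribution.

    Assumed formulation (not taken from the paper):
        P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention
               weights on the source positions whose token is w.

    p_vocab:       dict mapping each in-vocabulary word to its probability
    attention:     list of attention weights, one per source token
    source_tokens: list of source tokens (may contain out-of-vocabulary words)
    p_gen:         gate in [0, 1]; 1.0 means "generate only"
    """
    if len(attention) != len(source_tokens):
        raise ValueError("attention and source_tokens must have equal length")

    mixed = defaultdict(float)
    # Generation part: scale the fixed-vocabulary distribution by p_gen.
    for word, prob in p_vocab.items():
        mixed[word] += p_gen * prob
    # Copy part: add attention mass to each source token, so words that are
    # missing from the vocabulary still receive non-zero probability.
    for token, weight in zip(source_tokens, attention):
        mixed[token] += (1.0 - p_gen) * weight
    return dict(mixed)


# Hypothetical toy inputs: "CLTS" is not in the vocabulary but occurs twice
# in the source, so it can only be produced by copying.
p_vocab = {"the": 0.5, "model": 0.3, "<unk>": 0.2}
source_tokens = ["CLTS", "model", "CLTS"]
attention = [0.5, 0.25, 0.25]

dist = final_distribution(p_vocab, attention, source_tokens, p_gen=0.5)
best_word = max(dist, key=dist.get)
```

In this toy case the out-of-vocabulary token gathers the copy mass of both of its source positions and ends up as the most probable output, which is the effect the abstract attributes to the pointer: fewer unknown words in the generated summary.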
Authors
CUI Shao-guo (崔少国)
WANG Ao-di (王奥迪)
DU Xing (杜兴)
School of Computer and Information Sciences, Chongqing Normal University, Chongqing 401331, China
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
Peking University Core Journals
2023, Issue 12, pp. 2685-2691 (7 pages)
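Funding
Supported by the National Natural Science Foundation of China (62003065)
Supported by the Natural Science Foundation General Program of the Chongqing Science and Technology Bureau (CSTB2022NSCQ-MSX1206)
Supported by the Chongqing Technology Foresight and Institutional Innovation Project (CSTB2022TFII-OFX0042)
Supported by the Humanities and Social Sciences Planning Fund of the Ministry of Education (22YJA870005)
Supported by the Key Project of the Chongqing Municipal Education Commission (KJZD-K202200510)
Supported by the Humanities and Social Sciences Project of the Chongqing Municipal Education Commission (23SKGH072)
Supported by the Chongqing Social Science Planning Project (2022NDYB119)
Supported by the Chongqing Normal University Talent Fund (20XLB004)
Supported by the Chongqing Graduate Research and Innovation Project (CYS22558, CYS22555)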
Keywords
Chinese text
summary generation
flow-attention mechanism
parallel encoder
pointer network