Abstract
Exposure bias is a common problem in Chinese automatic text summarization. Sequence-to-sequence models are usually trained with teacher forcing under a maximum-likelihood objective: during training, every input to the decoder comes from the ground-truth summary, whereas at test time the current input is the word the model generated at the previous step. Predicted words are therefore inferred from different distributions at training and test time, and this inconsistency creates a gap between the model's training and testing behavior. This paper proposes a two-stage contrastive learning framework for abstractive summarization of Chinese text, in which contrastive learning is applied both to the training of the summarization model and to the modeling of summary evaluation. In addition, rather than constructing explicit positive and negative examples as in most existing contrastive learning work, the "contrast" in this paper comes from the differing quality of naturally generated candidate summaries. Experimental results on the Large-scale Chinese Short Text Summarization dataset (LCSTS) and the NLPCC (Natural Language Processing and Chinese Computing) dataset show that, compared with baseline models, the proposed method achieves higher ROUGE (Recall-Oriented Understudy for Gisting Evaluation) scores and better alleviates the exposure bias problem.
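To make the idea of contrast from "naturally generated summaries of different quality" concrete, below is a minimal sketch (not the authors' code) of a pairwise contrastive ranking loss over candidate summaries, assuming PyTorch and assuming the candidates for each source text have already been generated and sorted by descending ROUGE; the function name, margin value, and scoring scheme are illustrative assumptions.

```python
# Minimal sketch of a contrastive ranking loss over candidate summaries.
# Assumption: `scores` holds the model's scores (e.g., length-normalized
# log-likelihoods) for candidates of ONE source text, pre-sorted so that
# scores[i] belongs to a higher-ROUGE candidate than scores[j] when i < j.
import torch

def ranking_contrastive_loss(scores: torch.Tensor, margin: float = 0.01) -> torch.Tensor:
    loss = scores.new_zeros(())
    n = scores.size(0)
    for i in range(n - 1):
        for j in range(i + 1, n):
            # A better candidate should outscore a worse one by a margin
            # that grows with the rank gap (j - i).
            loss = loss + torch.relu(scores[j] - scores[i] + margin * (j - i))
    return loss

# Illustrative usage: four candidates, best first.
scores = torch.tensor([0.30, 0.25, 0.10, 0.05], requires_grad=True)
print(ranking_contrastive_loss(scores))
```

This pairwise margin form follows common contrastive re-ranking practice for summarization; the paper's actual two-stage formulation may differ in how scores are computed and how the evaluation model is trained.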
Authors
YANG Zijian; GUO Weibin (School of Information Science and Technology, East China University of Science and Technology, Shanghai 200237, China)
Source
Journal of East China University of Science and Technology (Natural Science Edition)
CAS
CSCD
Peking University Core Journal
2024, No. 4, pp. 586-593 (8 pages)
Funding
National Natural Science Foundation of China (62076094).
Keywords
Chinese automatic text summarization
contrastive learning
exposure bias
preprocessing model
ROUGE metric