Abstract
To address repetition and the lack of long-range coherence in stories produced by existing neural language generation models, a commonsense-enhanced training algorithm for Chinese story generation is proposed. The algorithm post-trains the Transformer-based GPT-2 model on a commonsense corpus denoised by a SimBERT module, and then fine-tunes the post-trained model on the OutGen story dataset. Commonsense-enhanced training on an external knowledge base improves the logical coherence of the generated text, while commonsense denoising training increases the diversity of commonsense expressions. Experimental results show that, compared with pretrained language models such as GPT-2, the proposed model avoids logical conflicts in the generated stories; compared with large pretrained language models such as ChatGPT, it maintains story quality while consuming fewer training resources.
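The abstract describes a two-stage pipeline: causal-LM post-training on a denoised commonsense corpus, followed by fine-tuning on OutGen story data. Below is a minimal sketch of that pipeline, not the authors' released code: the checkpoint id, the input file names, and all hyperparameters are illustrative assumptions, and the SimBERT denoising step is assumed to have already produced the filtered corpus file.

```python
# Minimal sketch of the two-stage training pipeline described in the abstract:
# (1) post-train a Chinese GPT-2 on a SimBERT-denoised commonsense corpus,
# (2) fine-tune the post-trained model on the OutGen story dataset.
# Checkpoint id, file names, and hyperparameters are illustrative assumptions.
from transformers import (AutoTokenizer, AutoModelForCausalLM,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)
from datasets import load_dataset

CHECKPOINT = "uer/gpt2-chinese-cluecorpussmall"  # assumed Chinese GPT-2 base
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForCausalLM.from_pretrained(CHECKPOINT)

def tokenize(batch):
    # Plain causal-LM tokenization; truncation keeps sequences inside the
    # model's context window.
    return tokenizer(batch["text"], truncation=True, max_length=512)

def train_stage(text_file, output_dir, epochs):
    # One generic causal-LM training pass, reused for both stages. Because the
    # same `model` object is passed in, stage 2 continues from the weights
    # produced by stage 1, matching the post-train-then-fine-tune pipeline.
    dataset = load_dataset("text", data_files=text_file)["train"].map(
        tokenize, batched=True, remove_columns=["text"])
    collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir=output_dir,
                               num_train_epochs=epochs,
                               per_device_train_batch_size=8),
        train_dataset=dataset,
        data_collator=collator)
    trainer.train()

# Stage 1: commonsense post-training on denoised commonsense sentences
# (one sentence per line; file name is hypothetical).
train_stage("commonsense_denoised.txt", "gpt2-commonsense", epochs=1)
# Stage 2: fine-tuning on OutGen outline-to-story pairs serialized as text.
train_stage("outgen_train.txt", "gpt2-outgen", epochs=3)
```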
Authors
HUANG Hong; LI Wei; ZENG Zhiqiang; SONG Yuping; YAN Rongyu; WANG Wenjie
(School of Computer & Information Engineering, Xiamen University of Technology, Xiamen 361024, China; School of Mathematical Sciences, Xiamen University, Xiamen 361005, China)
Source
Journal of Xiamen University of Technology
2024, No. 3, pp. 74-80 (7 pages)
Funding
Natural Science Foundation of Fujian Province, "Research on Semi-Supervised Multi-View Deep Discriminative Representation Learning" (2022J011233)
Humanities and Social Sciences Research Project of the Ministry of Education, "Research on Transformer-Based Monitoring and Early Warning of Systemic Financial Risk in China" (23YJAZH067)
Xiamen Science and Technology Plan Industry-University-Research Project, "Intelligent Assisted Review Management System" (2023CXY0409)
Keywords
story generation
pretrained language model
commonsense-enhancing training
external knowledge base
commonsense denoising training