Abstract
Text style transfer is an emerging task in natural language processing that aims to change the stylistic attributes of a text while keeping its semantics unchanged. This paper studies the transfer of writing style. A style transfer model trained on a small dataset can generate text in the target writing style from a source text, but it does not preserve the content of the source text well. This paper merges multiple datasets into a single training set, using the larger-scale data to strengthen the model's ability to extract high-level semantic features, and adds prompt language templates to distinguish the different datasets. In addition, the classification algorithm of the author attribution classifier is improved to quantify writing style. Experimental results show that text generated by the proposed method is, to a certain extent, closer to the target writing style, and that the method outperforms other models in both content preservation and fluency.
Authors
顾亦然
薛宇辰
张腾飞
GU Yiran; XUE Yuchen; ZHANG Tengfei (College of Automation & College of Artificial Intelligence, Nanjing University of Posts and Telecommunications, Nanjing 210023, China; Center of Smart Corpus Research, Nanjing University of Posts and Telecommunications, Nanjing 210023, China)
Source
《小型微型计算机系统》
CSCD
Peking University Core Journals (北大核心)
2024, No. 10, pp. 2338-2344 (7 pages)
Journal of Chinese Computer Systems
Funding
Supported by the National Natural Science Foundation of China (62073173).
Keywords
text style transfer
writing style
integrated dataset
prompt language template
classification algorithm