
A Practical Exploration of AIGC-Powered Digital Humanities Research: A SikuGPT-Driven Research of Ancient Poetry Generation

Cited by: 9
Abstract:
[Purpose/significance] Poetry composition is an important direction for natural language generation research in the digital humanities. It has applications such as adjudicating textual-variant disputes over the wording of ancient poems and automatic poetry question answering. However, no pre-trained model capable of automatically generating ancient poems in traditional Chinese has yet emerged; existing research has focused on composing ancient-style poems in simplified Chinese, in different styles according to user needs.
[Method/process] Using causal language modeling (CLM), this paper continues pre-training gpt2-chinese-cluecorpussmall on an unpunctuated traditional-Chinese Si Ku Quan Shu corpus and a traditional-Chinese ancient poetry corpus, producing the SikuGPT2 and SikuGPT2-poem models. Model performance is evaluated with perplexity, BLEU, expert scoring, and a Turing test.
[Result/conclusion] In the experiments, SikuGPT2-poem achieves relatively low perplexity. Its generated poems score about 0.053 lower in BLEU than the baseline model, but on average 1.93 points higher than the baseline in manual scoring. Overall, the proposed models perform well and pass the Turing test. Two limitations remain: the pre-training corpus for this series of classical-Chinese generative models is still small, and although the model generates ancient poems well, it cannot yet handle genres such as fu (rhapsodic prose-poetry) and qu (song-verse).
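The abstract names perplexity and BLEU as the two automatic metrics. As a rough illustration of what they measure, here is a minimal pure-Python sketch. It is not the authors' evaluation code: the paper's tokenization, n-gram order, and smoothing are not given in this record. The sketch assumes character-level tokens, unsmoothed sentence-level BLEU with uniform weights up to 4-grams, and perplexity computed from per-token natural-log probabilities that a causal (CLM) model would supply. The function names `perplexity` and `sentence_bleu` are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood) of a token sequence.

    Assumption: token_log_probs holds natural-log probabilities
    log p(x_t | x_<t) that a causal language model assigns to each token.
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def _ngrams(tokens: Sequence[str], n: int) -> Counter:
    # Count every contiguous n-gram in the token sequence.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: Sequence[str],
                  references: Sequence[Sequence[str]],
                  max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU with uniform n-gram weights (a sketch)."""
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = _ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate too short for this n-gram order
        # Clip each candidate n-gram by its maximum count in any single reference.
        max_ref: Counter = Counter()
        for ref in references:
            for gram, cnt in _ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], cnt)
        clipped = sum(min(cnt, max_ref[gram]) for gram, cnt in cand_counts.items())
        if clipped == 0:
            return 0.0  # no smoothing: any zero precision gives BLEU = 0
        log_precision_sum += math.log(clipped / total)
    # Brevity penalty against the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum / max_n)


if __name__ == "__main__":
    # Character-level tokens (an assumption) for a classical five-character line.
    print(sentence_bleu(list("床前明月光"), [list("床前明月光")]))
    print(perplexity([math.log(0.25)] * 3))
```

A model that assigns every token probability 0.25 has perplexity 4. An identical candidate and reference give a BLEU of 1.0. Unsmoothed sentence-level BLEU drops to 0 whenever any n-gram order has no matches, which is common for short poems. Published evaluations therefore usually apply smoothing or compute corpus-level BLEU.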
Source: Information Studies: Theory & Application (《情报理论与实践》; Peking University core journal), 2023, no. 5, pp. 23-31 (9 pages)
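Funding: An outcome of the National Social Science Fund of China major project "Research on the Construction and Application of a Cross-Lingual Knowledge Base of Ancient Chinese Classics" (project no. 21&ZD331).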
Keywords: Si Ku Quan Shu; SikuGPT; pre-trained language model; poetry generation; digital humanities
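Related literature: references (11); secondary references (91); co-citing literature (89); co-cited literature (275); citing literature (9); secondary citing literature (7)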