
Automatic Text Classification of the “Zi” Part of Siku Quanshu from the Perspective of Digital Humanities: Based on SikuBERT and SikuRoBERTa Pre-trained Models (times cited: 15)
Abstract: The Siku classification system has had a far-reaching influence. This paper addresses the difficulty of identifying the correct category of existing ancient books and aims to provide tools for research in the digital humanities. It builds automatic classification models for classical texts covering all 14 categories of the “Zi” part of Siku Quanshu. The models use SikuBERT and SikuRoBERTa, two pre-trained language models designed for natural language processing of ancient Chinese, and are compared against BERT, BERT-wwm, RoBERTa, and RoBERTa-wwm baselines. Both proposed models outperform the baselines. SikuBERT achieves an overall classification F-score of 90.39% and an F-score of 98.83% on books in the astronomy and calculation (天文算法) category. In the automatic category recognition task, SikuRoBERTa reaches a prediction accuracy of 95.30%. The resulting Siku classification system assigns classical texts to their categories within the “Zi” part. The classification tool it provides offers a new route to efficient, automated classification of classical texts.
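The abstract reports three kinds of figures: an overall F-score, a per-category F-score and a prediction accuracy. It does not state how the overall F-score is averaged across the 14 categories. The following Python sketch shows one common way to compute these quantities from gold and predicted labels. It assumes macro-averaging (the unweighted mean of per-category F1), which may differ from the paper's actual choice; micro- and weighted averaging are also common. The function names `per_class_prf`, `macro_f1` and `accuracy` are hypothetical, as are the toy labels. They are not the authors' code or data.

```python
from collections import Counter


def per_class_prf(y_true, y_pred):
    """Per-category precision, recall and F1 for single-label classification.

    Returns {label: (precision, recall, f1)} for every label that appears in
    either the gold labels or the predictions.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty sequences of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1  # correct prediction for this category
        else:
            fp[pred] += 1  # text wrongly assigned to the predicted category
            fn[gold] += 1  # text missed by its gold category
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-category F1 (an assumed averaging scheme)."""
    scores = per_class_prf(y_true, y_pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of texts whose predicted category equals the gold category."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(g == p for g, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the paper.
    gold = ["天文算法", "天文算法", "医家", "农家"]
    pred = ["天文算法", "医家", "医家", "农家"]
    print(per_class_prf(gold, pred))
    print(f"macro-F1 = {macro_f1(gold, pred):.4f}")  # 7/9, prints 0.7778
    print(f"accuracy = {accuracy(gold, pred):.2f}")  # prints 0.75
```

Under macro-averaging every category counts equally. A very high score on one well-separated category such as 天文算法 therefore lifts the overall figure only by its 1/14 share. A weighted average would instead give more influence to the categories with more texts.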
Authors: 胡昊天 (HU Haotian), 张逸勤 (ZHANG Yiqin), 邓三鸿 (DENG Sanhong), 王东波 (WANG Dongbo), 冯敏萱 (FENG Minxuan), 刘浏 (LIU Liu), 李斌 (LI Bin)
Source: Library Tribune (《图书馆论坛》), 2022, no. 12, pp. 138-148 (11 pages). Indexed in CSSCI and the Peking University core journals list.
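Funding: Research output of the Major Project of the National Social Science Fund of China, “Research on the Construction and Application of a Cross-lingual Knowledge Base of Ancient Chinese Classics” (project no. 21&ZD331).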
Keywords: pre-trained models; SikuBERT; text classification; digital humanities; “Zi” part of Siku Quanshu