Abstract
Applying the BERT model, this paper designs a tool for annotating the text of ancient books based on multi-task joint learning, which automatically labels both punctuation and proper names. Compared with previous tools of this kind, it recognizes personal names, place names, time expressions, and book titles more effectively, and it should help bring the "distant reading" approach to the study of ancient texts. Taking as an example the Yongzheng-era General Records of the Capital Area (Jifu Tongzhi, 《畿辅通志》) included in the Complete Library in the Four Branches of Literature (Siku Quanshu, 《四库全书》), automatic proper-name recognition can quickly extract historical information such as the sources of cited documents, the construction dates of buildings and facilities, and population distribution, as well as writers and their works and classic artistic conceptions. The accounts of water conservancy construction and of the Yellow River floods reveal the personal intentions of Li Wei, a renowned official in river management, in compiling the General Records of the Capital Area.
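The abstract describes one model that labels punctuation and proper names jointly but does not give its output format. Assuming, hypothetically, that such a model emits two parallel per-character label sequences (a punctuation mark or "O" after each character, plus a BIO entity tag such as B-PER or I-TIME), the minimal sketch below shows how the two outputs could be merged into punctuated text and a list of typed names, and how those names could then be counted by type as a first step toward distant reading. The label scheme, the functions `decode_joint_labels` and `tally_by_type`, and the sample sentence are illustrative assumptions, not the authors' implementation; the BERT model itself is not reproduced.

```python
from collections import Counter

# Hypothetical label scheme (an illustrative assumption, not the paper's):
#   punctuation labels: "O" = no mark after this character, otherwise the mark itself
#   entity labels: BIO tags, e.g. "B-PER"/"I-PER", "B-LOC", "B-TIME", "B-BOOK"


def decode_joint_labels(chars, punct_labels, entity_labels):
    """Merge two parallel per-character label sequences into
    punctuated text and a list of (entity_type, entity_text) spans."""
    if not (len(chars) == len(punct_labels) == len(entity_labels)):
        raise ValueError("chars and both label sequences must have equal length")

    pieces, entities = [], []
    span_type, span_chars = None, []

    def close_span():
        # Emit the currently open entity span, if any, and reset it.
        nonlocal span_type, span_chars
        if span_type is not None:
            entities.append((span_type, "".join(span_chars)))
        span_type, span_chars = None, []

    for ch, punct, tag in zip(chars, punct_labels, entity_labels):
        if tag == "O":
            close_span()
        else:
            prefix, etype = tag.split("-", 1)
            # "B-" always starts a new span; an "I-" whose type differs from
            # the open span is also treated as the start of a new span.
            if prefix == "B" or etype != span_type:
                close_span()
                span_type = etype
            span_chars.append(ch)
        pieces.append(ch)
        if punct != "O":
            pieces.append(punct)  # insert the predicted mark after the character

    close_span()
    return "".join(pieces), entities


def tally_by_type(entities):
    """Count recognized names per entity type."""
    return Counter(etype for etype, _ in entities)


if __name__ == "__main__":
    # Toy sentence with hand-written labels standing in for model output.
    chars = list("雍正年间李卫编纂畿辅通志")
    punct = ["O", "O", "O", "，", "O", "O", "O", "O", "O", "O", "O", "。"]
    tags = ["B-TIME", "I-TIME", "I-TIME", "I-TIME", "B-PER", "I-PER",
            "O", "O", "B-BOOK", "I-BOOK", "I-BOOK", "I-BOOK"]
    text, entities = decode_joint_labels(chars, punct, tags)
    print(text)  # 雍正年间，李卫编纂畿辅通志。
    print(entities)
    print(tally_by_type(entities))
```

Keeping both label sequences aligned one-to-one with the characters is what lets a single pass recover both layers of annotation. This toy scheme only places marks after a character, so opening marks such as 《 before a book title would need an extra label type.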
Authors
Zhu Yuchen (诸雨辰)
Li Shen (李绅)
Hu Renfen (胡韧奋)
Source
Journal of School of Chinese Language and Culture Nanjing Normal University (《南京师范大学文学院学报》)
2023, No. 1, pp. 53-61 (9 pages)
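Funding
National Natural Science Foundation of China, Young Scientists Fund project "Research on Knowledge Representation and Processing for Intelligent Collation of Ancient Books" (62006021)
Beijing Social Science Fund, key project "Research on Intelligent Analysis and Linking Technologies for Classical Documents" (21DTR037)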
Keywords
named entity recognition
distant reading
General Records of the Capital Area (《畿辅通志》)