Abstract
This study proposes the academic concept of data-driven organization of Chinese historical archives, designs technical reference models based on the Federal Enterprise Architecture Framework (FEAF), and offers several practical strategies for overcoming the obstacles this work currently faces. Data-driven organization of Chinese historical archives is a process that applies the theories, methods, and techniques of data science, combining manual annotation with machine learning, to transform the background information of historical archives, the semantic information of their natural-language texts, and the contextual information of the corresponding social systems into large-scale structured datasets that computers can process quickly and accurately. At the current stage, strategic planning should be carried out, relevant standards should be formulated, and infrastructure construction should begin on a forward-looking basis, with the work advanced tier by tier and with clear priorities.
Authors
Zhao Shenghui (赵生辉)
Xu Dandan (徐丹丹)
Ma Teng (马藤)
School of Urban Governance and Public Affairs, Suzhou City University, Suzhou 215104
Source
Shanxi Archives (《山西档案》)
Peking University Core Journal (北大核心)
2024, No. 3, pp. 5-11 (7 pages)
Keywords
Chinese historical archives (中文历史档案)
Archive organization (档案整理)
Data-driven archive organization (档案数据化)
Historical big data (历史大数据)