Data Integration and Visualization of Historical Catalogs Across Dynasties (历代史志目录的数据集成与可视化)

Cited by: 5

Diachronic Data Integration and Visualization of Ancient Book Catalogs Compiled for Imperial Collections

Abstract (translated from the Chinese): Ancient book catalogs and their classification systems are of great academic value, and the development of digital scholarship offers new opportunities for the digital preservation and use of these catalogs and for bibliographic research supported by digital tools. Taking eight historical catalogs spanning more than two thousand years as data sources, this paper uses an iterative human-machine approach that combines machine pre-processing with expert proofreading to perform record splitting and field extraction, data completion, normalization, and book identification, ultimately producing a structured and normalized integration of more than 110,000 bibliographic records. On the basis of this dataset, and starting from the research needs of domain experts, we combine statistics, visualization, and retrieval with human-computer interaction techniques to build a visual analysis system for ancient book catalogs across dynasties. The system has two main parts: bibliographic statistics and classification evolution analysis. On the one hand, it supports fine-grained statistics and visual presentation of the bibliographic data, helping scholars clearly compare and trace the growth and decline of categories. On the other hand, it supports visual analysis of how the classification of each work changed across the catalogs and of the sources of the works collected under each category, so that patterns in the splitting, merging, and transformation of categories can be identified more easily. This study provides a data foundation and analytical tools for bibliographic research in the context of digital scholarship. It saves scholars considerable time on data collection and collation, and it supports interpretive research such as analysis and comparison through new techniques and perspectives.

Extended English abstract: Ancient book catalogs record and classify a large number of ancient Chinese books. They are of great academic value for studying both ancient literature and traditional knowledge organization. The development of digital scholarship has shed new light on the digital preservation and reuse of these catalogs, as well as on domain research supported by digital tools. Digital scholarship facilitates the digitization and datafication of ancient book catalogs. Moreover, it provides new methods and computational tools for exploring large collections, so that new research questions can be raised from fresh perspectives. Recent studies have introduced computational methods to analyze the abstracts and classification systems of ancient book catalogs, but these studies were based on only one catalog or one particular category. It is therefore imperative to integrate the catalogs across history and to provide digital tools that let scholars explore and analyze them diachronically and holistically. In this study, we selected eight representative catalogs, mostly from official histories, as data sources: Hanshu Yiwenzhi, Suishu Jingjizhi, Jiutangshu Jingjizhi, Xintangshu Yiwenzhi, Songshi Yiwenzhi, Mingshi Yiwenzhi, Qingshigao Yiwenzhi and Siku Quanshu Zongmu. These catalogs cover the major dynasties in Chinese history, with a time span of more than two thousand years. We adopted a semi-automated data processing approach to integrate the book entries in the eight catalogs. The integration process iterated between machine pre-processing and manual correction by experts, and it contained three main steps: record splitting and field segmentation, field completion and normalization, and book identification. In the end we obtained more than 110,000 structured data records and identified over 7,000 books that were recorded in at least two catalogs. Based on the integrated data, we designed and developed an interactive visual analysis system with statistics, visualization and record query features. The system is designed mainly to meet two research requirements proposed by expert users. First, it provides granular statistics and graphs that help scholars compare and trace changes in book volumes across categories and catalogs. Second, it provides an interactive visualization tool for exploring how books are classified differently in each catalog, and thus reveals changes in knowledge organization as well as the origin and evolution of academic thought. In conclusion, this study provides a data foundation and analytic tools for the study of ancient book catalogs in the context of digital scholarship. It not only saves the effort of manual data collection and collation, but also offers new perspectives for identifying and solving hermeneutic problems with new techniques. 8 figs. 3 tabs. 36 refs.
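As a rough illustration of the book identification step mentioned in the abstract (finding works recorded in at least two catalogs), a minimal Python sketch follows. The field names (`catalog`, `title`, `author`), the exact-match key on normalized title and author, and the placeholder records are all assumptions made for illustration. The study itself combines machine pre-processing with expert correction, and its actual matching rules are not described here.

```python
import unicodedata
from collections import defaultdict


def normalize(text):
    """Hypothetical normalization: NFKC-fold, then drop spaces and punctuation."""
    folded = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in folded if ch.isalnum())


def identify_books(records):
    """Group records by normalized (title, author) and keep the keys
    that occur in at least two distinct catalogs."""
    catalogs_by_key = defaultdict(set)
    for rec in records:
        key = (normalize(rec["title"]), normalize(rec.get("author", "")))
        catalogs_by_key[key].add(rec["catalog"])
    return {
        key: sorted(catalogs)
        for key, catalogs in catalogs_by_key.items()
        if len(catalogs) >= 2
    }


# Placeholder records, not taken from the real dataset.
toy_records = [
    {"catalog": "Hanshu Yiwenzhi", "title": "Title A", "author": "Author X"},
    {"catalog": "Suishu Jingjizhi", "title": "Title  A。", "author": "Author X"},
    {"catalog": "Suishu Jingjizhi", "title": "Title B", "author": "Author Y"},
]

shared = identify_books(toy_records)
for (title, author), catalogs in shared.items():
    print(title, author, catalogs)
```

Exact matching on a normalized key is the simplest possible choice. Variant titles, differing attributions, and homonymous works would need fuzzier matching or manual review, which is presumably where the expert correction described in the abstract comes in.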

Authors: LI Wenqi (李文琦); WANG Fengxiang (王凤翔); SUN Xianbin (孙显斌); HUANG Zhixin (黄芷欣); LI Pengbei (李芃蓓)

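Affiliations: Department of Information Management, Peking University, and Research Center for Digital Humanities, Peking University; Institute for the History of Natural Sciences, Chinese Academy of Sciences; Department of Chinese Language and Literature, Peking University; Literature Editorial Office, Zhonghua Book Company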

Source: Journal of Library Science in China (《中国图书馆学报》, PKU Core Journal), 2023, No. 1, pp. 82-98 (17 pages)

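Funding: This work is an outcome of the National Natural Science Foundation of China key international cooperation project "Research on Constructing a Knowledge Graph of the History of Chinese Confucian Scholarship" (中国儒家学术史知识图谱构建研究; Grant No. 72010107003).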

Keywords: Ancient book catalogs; Digital scholarship; Digital humanities; Bibliography; Data integration; Visualization

Classification number: G257 [Culture and Science: Library Science]