Entity-Based Matching and Clustering of Chinese Personal Name Authority Records (times cited: 2)
Abstract: This paper addresses the entity-based matching and clustering of retrieval result sets from China's union database of personal name authority records. It analyzes the retrieval service of the Cooperation Committee of Chinese Name Authority (CCCNA) and the characteristics of the records in its database, and finds that each retrieval result set contains many records that need to be merged into clusters according to the personal entity they describe. The proposed approach first preprocesses each result set to remove repeated records and records with obvious semantic mismatches in the name. It then clusters the personal name authority records by the semantics of the personal entity, using extracted entity attributes: name, birth year, titles of associated books, and linked external records (control numbers). The resulting clusters are also linked to VIAF. Empirical statistics show that the number of clusters in each processed result set is notably lower than the number of records before processing, and the results of linking the clusters to VIAF confirm the effectiveness of the method. However, bibliographic matching here relies on title matching only, which loses some useful clustering information: because citation formats and languages differ across sources, book titles serve only as auxiliary evidence for identifying the same person among records with the same name. Future research will integrate the bibliographic databases of library institutions and extract more bibliographic information for clustering.
4 figs. 6 tabs. 16 refs.
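The abstract describes a two-stage pipeline: preprocessing that removes repeated records and obvious name mismatches, followed by clustering on name, birth year, associated book titles, and linked external records. Below is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation. The `NameRecord` fields, the normalization in `normalize`, and the merge rule in `same_person` are all hypothetical; union-find is used here only as one convenient way to turn pairwise matches into clusters.

```python
from __future__ import annotations

import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class NameRecord:
    """Hypothetical, simplified view of one personal name authority record."""
    record_id: str
    name: str
    birth_year: int | None = None
    titles: frozenset[str] = frozenset()  # titles of associated books (assumed field)
    links: frozenset[str] = frozenset()   # IDs of linked external records (assumed field)


def normalize(text: str) -> str:
    """NFKC-normalize, keep only letters/digits, and lowercase."""
    text = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in text if ch.isalnum()).lower()


def preprocess(records: list[NameRecord], query: str) -> list[NameRecord]:
    """Stage 1: drop records whose name does not match the query, then drop repeats."""
    target = normalize(query)
    seen: set[tuple] = set()
    kept: list[NameRecord] = []
    for rec in records:
        name = normalize(rec.name)
        if name != target:
            continue  # obvious name mismatch, e.g. a longer name containing the query
        key = (name, rec.birth_year, rec.titles, rec.links)
        if key in seen:
            continue  # repeated record: same content under another record ID
        seen.add(key)
        kept.append(rec)
    return kept


def same_person(a: NameRecord, b: NameRecord) -> bool:
    """Hypothetical pairwise rule for records that already share a normalized name."""
    if a.birth_year is not None and b.birth_year is not None and a.birth_year != b.birth_year:
        return False  # conflicting birth years block a direct merge
    if a.links & b.links:
        return True  # both records point to the same external record
    if {normalize(t) for t in a.titles} & {normalize(t) for t in b.titles}:
        return True  # at least one associated book title in common
    return a.birth_year is not None and a.birth_year == b.birth_year


def cluster(records: list[NameRecord]) -> list[list[str]]:
    """Stage 2: union-find over pairwise matches; returns record IDs per cluster."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if same_person(records[i], records[j]):
                parent[find(i)] = find(j)

    groups: dict[int, list[str]] = {}
    for i, rec in enumerate(records):
        groups.setdefault(find(i), []).append(rec.record_id)
    return sorted(groups.values())


if __name__ == "__main__":
    result_set = [
        NameRecord("r1", "Wang Lin", 1950, frozenset({"Book A"})),
        NameRecord("r2", "WANG LIN", 1950),
        NameRecord("r3", "Wang Lin", None, frozenset({"book a"})),
        NameRecord("r4", "Wang Lin", 1972, frozenset({"Book B"}), frozenset({"viaf:123"})),
        NameRecord("r5", "Wang Lin", links=frozenset({"viaf:123"})),
        NameRecord("r6", "Wang Linsen", 1960),                      # name mismatch
        NameRecord("r7", "Wang Lin", 1950, frozenset({"Book A"})),  # repeat of r1
    ]
    kept = preprocess(result_set, "Wang Lin")
    print(cluster(kept))  # [['r1', 'r2', 'r3'], ['r4', 'r5']]
```

In this toy result set, seven records shrink to five after preprocessing and then to two clusters, mirroring the kind of reduction the abstract reports. Because union-find takes the transitive closure of pairwise matches, two records with conflicting birth years can still land in one cluster through a chain of intermediate records; a stricter variant would check cluster-level constraints before each merge.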
Authors: Wang Ruiyun, Jia Junzhi
Source: Journal of the National Library of China (《国家图书馆学刊》), 2017, no. 2, pp. 79-86 (8 pages); indexed in CSSCI and the Peking University core journal list
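Funding: One of the outputs of the Key Project of the National Social Science Fund of China, "Semantic Description and Data Aggregation of Chinese Name Authority Files Based on Linked Data" (Project No. 15ATQ004)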
Keywords: Virtual International Authority File (VIAF); personal name authority files; entity matching; clustering