Abstract
Name sharing is widespread in large-scale databases and digital libraries, and it complicates many research tasks. This paper first proposes a new graph structure, the Attributed Relational Graph (ARG), to describe the features of entities and the links between them. It then presents ARG-Resolution, a name disambiguation algorithm built on the ARG framework, which analyzes the authors that share a name and clusters them according to a similarity measure, so that each resulting cluster corresponds to one real-world entity. Experiments on real datasets show that mining the latent links between authors further improves the quality of name disambiguation and successfully resolves the name-sharing problem.
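The abstract only names the ingredients of the approach (attributes, links, a similarity measure, hierarchical clustering), so the following is a minimal sketch of that general idea and not the ARG-Resolution algorithm itself. The record fields (`coauthors`, `venues`), the equal weighting in `record_similarity`, the average-link merge rule, and the `threshold` value are all assumptions made for illustration; the paper's actual graph construction and similarity measure are not given here.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def record_similarity(r1, r2):
    """Hypothetical measure: equal-weight mix of coauthor overlap
    (a stand-in for links) and venue overlap (a stand-in for attributes)."""
    return (0.5 * jaccard(r1["coauthors"], r2["coauthors"])
            + 0.5 * jaccard(r1["venues"], r2["venues"]))


def cluster_similarity(c1, c2, records):
    """Average-link similarity between two clusters of record indices."""
    pairs = [(i, j) for i in c1 for j in c2]
    total = sum(record_similarity(records[i], records[j]) for i, j in pairs)
    return total / len(pairs)


def resolve(records, threshold=0.3):
    """Agglomerative clustering: repeatedly merge the most similar pair of
    clusters until the best similarity falls below the (assumed) threshold."""
    clusters = [[i] for i in range(len(records))]
    while len(clusters) > 1:
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_similarity(clusters[p[0]], clusters[p[1]], records),
        )
        if cluster_similarity(clusters[a], clusters[b], records) < threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: combinations guarantees a < b
    return clusters


# Four made-up publication records that all carry the same author name.
records = [
    {"coauthors": {"Li", "Chen"}, "venues": {"VLDB"}},
    {"coauthors": {"Li", "Zhao"}, "venues": {"VLDB"}},
    {"coauthors": {"Smith"}, "venues": {"CVPR"}},
    {"coauthors": {"Smith", "Jones"}, "venues": {"CVPR"}},
]
print(resolve(records))
```

On this toy input the first two records end up in one cluster and the last two in another, which is the kind of per-entity grouping the abstract describes. A fuller treatment would presumably propagate similarity along graph links instead of comparing coauthor sets directly.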
Source
《计算机工程与科学》
CSCD
PKU Core (Peking University Core Journals)
2010, No. 9, pp. 61-64 (4 pages)
Computer Engineering & Science
Funding
National Natural Science Foundation of China (Grant No. 60673136)
Keywords
name sharing
attributes
links
similarity
hierarchical clustering