Abstract
Entity resolution is important for making data more concise and accurate, and it is a widely studied research topic in Web data integration. Building on the naive pairwise-matching entity resolution algorithm, this paper proposes a new feature-based incremental entity resolution method. By effectively distinguishing entity features and defining a new data structure, the method improves the algorithm's accuracy and reduces its time complexity. The method is applied to the DBLP bibliographic database, and the results show that it can effectively improve the data quality of Web data integration.
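The abstract does not specify which features are distinguished or what the new data structure is. The sketch below is an illustration under our own assumptions, not the authors' method. It shows one way a feature-based incremental resolver for DBLP-style author records could differ from naive pairwise matching. A hypothetical "strong" feature (last name plus first initial) indexes the entities and is used only for candidate lookup. Hypothetical "weak" features (coauthors and venue) are then scored against those candidates. The names `name_key`, `IncrementalResolver`, the 0.7/0.3 weights, and the 0.3 threshold are all illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass, field


def name_key(name: str) -> str:
    """Hypothetical strong feature: last name plus first initial, lowercased."""
    parts = name.lower().replace(".", " ").split()
    return f"{parts[-1]}_{parts[0][0]}" if parts else ""


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class Entity:
    eid: int
    coauthors: set = field(default_factory=set)
    venues: set = field(default_factory=set)


class IncrementalResolver:
    """Assigns each arriving record to an existing entity or creates a new one."""

    def __init__(self, threshold: float = 0.3):
        self.threshold = threshold
        self.entities: list = []
        self.index = defaultdict(list)  # strong-feature key -> entity ids
        self.comparisons = 0  # record-vs-entity comparisons performed

    def score(self, entity: Entity, record: dict) -> float:
        # Hypothetical weighting of weak features: coauthor overlap and venue match.
        co = jaccard(entity.coauthors, set(record["coauthors"]))
        venue = 1.0 if record["venue"] in entity.venues else 0.0
        return 0.7 * co + 0.3 * venue

    def add(self, record: dict) -> int:
        key = name_key(record["author"])
        best_id, best_score = None, 0.0
        # Compare only against entities sharing the strong feature,
        # not against every entity seen so far.
        for eid in self.index[key]:
            self.comparisons += 1
            s = self.score(self.entities[eid], record)
            if s > best_score:
                best_id, best_score = eid, s
        if best_id is not None and best_score >= self.threshold:
            target = self.entities[best_id]
        else:
            target = Entity(eid=len(self.entities))
            self.entities.append(target)
            self.index[key].append(target.eid)
        # Merge the record's weak features into the chosen entity.
        target.coauthors.update(record["coauthors"])
        target.venues.add(record["venue"])
        return target.eid


records = [
    {"author": "Wei Wang", "coauthors": ["Li Zhang", "Jun Liu"], "venue": "VLDB"},
    {"author": "Wei Wang", "coauthors": ["Li Zhang"], "venue": "SIGMOD"},
    {"author": "Wei Wang", "coauthors": ["Anna Smith"], "venue": "CHI"},
    {"author": "John Smith", "coauthors": ["Li Zhang"], "venue": "VLDB"},
]
resolver = IncrementalResolver()
ids = [resolver.add(r) for r in records]
print(ids, resolver.comparisons)
```

With these toy records, the two "Wei Wang" papers that share a coauthor are merged into one entity. The third "Wei Wang" becomes a separate entity, and "John Smith" is never compared with any "Wei Wang" entity. Because each record is resolved as it arrives, new records can be added without re-running the whole comparison.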
Source
Information Studies: Theory & Application (《情报理论与实践》)
CSSCI
Peking University Core Journal
2015, No. 7, pp. 119-122 (4 pages)
Keywords
entity resolution
features
domain data
application study