
Application of a Feature-Based Entity Resolution Algorithm in Data Integration (cited by: 2)

Published English title: Application Study of Entity Resolution Algorithm Based on Characteristics in Data Interaction
Abstract: Entity resolution is important for improving the conciseness and accuracy of data, and it is a widely studied research direction in Web data integration. Building on the naive pairwise-matching entity resolution algorithm, this paper proposes a new feature-based incremental entity resolution method. By effectively distinguishing entity features and defining a new data structure, the method improves the accuracy of the algorithm and reduces its time complexity. The method is applied to the DBLP paper database, and the results show that it can effectively improve data quality in Web data integration.
Authors: He Peng (何鹏), Chen Yu (陈豫)
Source: Information Studies: Theory & Application (《情报理论与实践》), CSSCI, Peking University Core Journal, 2015, No. 7, pp. 119-122 (4 pages)
Keywords: entity resolution; characteristics; field data; application study


