
Summary and Prospect on Entity Resolution

Cited by: 5
Abstract: Entity resolution (ER) is a key step in data cleaning, data integration, and data mining, and it safeguards data quality. This paper introduces the meaning of entity resolution, its background and origins, and its algorithmic foundations. It lists and explains the classic algorithms from the development of the field, including pairwise entity resolution, collective entity resolution, entity resolution on big data, and entity resolution on complex data, together with their characteristics and limitations. It also presents newer entity resolution algorithms derived for different requirements in new application environments. Finally, it discusses the current research hotspots and development directions of the field.
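Authors: Zhu Can (朱灿), Cao Jian (曹健)
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2015, No. 3, pp. 8-12, 18 (6 pages)
Funding: Supported by the National Natural Science Foundation of China (61272438) and projects of the Science and Technology Commission of Shanghai Municipality (12511502704, 14511107702)
Keywords: Entity resolution; Record linkage; Collective data; Complex data; Big data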
