
Entity Identification Method for Data ETL Process
数据ETL过程中的实体识别方法

Cited by: 2
Abstract: Entity identification determines which real-world entity a record corresponds to, based on the descriptive information the record contains. Matching similar records is the most challenging task in data integration. The paper analyzes common entity identification algorithms and proposes an entity identification process framework that implements the data reduction function of data ETL (extraction, transformation, and loading). The entity identification algorithms were tested in UCIS (UniCom Client Information System), a system developed to perform semantics-based data integration. Average recall and precision were 86.3% and 96.5% respectively, which meets the requirements of engineering applications.
Source: Modern Electronics Technique (《现代电子技术》), 2005, No. 7, pp. 44-46 (3 pages)
Keywords: data ETL; similar duplicate records; entity identification algorithms; entity identification process framework

