A Record Matching Algorithm Based on Clustering Collection  (times cited: 2)
Abstract: Record matching algorithms are widely used in fields such as heterogeneous data integration and data mining. Their main task is to find records from different data sources that represent the same real-world entity; such records have similar attributes and attribute values. To avoid combinatorial explosion, existing record matching algorithms no longer compare every pair of records in the database. Instead, they combine a sorting strategy with a static clustering-based matching method, but this static method does not adapt to dynamic changes in the data. This paper therefore proposes a record matching algorithm based on clustering collection. The algorithm solves the problem of matched records being lost under the static method, while also reducing the amount of computation and improving the efficiency of searching for matching records.
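The abstract contrasts exhaustive pair-wise matching with existing methods that sort the records and then match only within static clusters, and it states that the static approach can lose matched records. The Python sketch below illustrates only that baseline problem; it is not the paper's clustering collection algorithm, whose details the abstract does not give. Everything in it is assumed for illustration: records are plain strings, the sort key is the whole string, the static clusters are fixed-size blocks of the sorted list, similarity is `difflib.SequenceMatcher.ratio()` with a 0.8 threshold, and the function names `compare`, `all_pairs`, `static_blocks` and `blocked_pairs` are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical similarity threshold; the source does not specify one.
THRESHOLD = 0.8


def compare(pairs, threshold=THRESHOLD):
    """Compare candidate pairs; return (matched pairs, number of comparisons).

    Similarity is difflib's SequenceMatcher ratio, an assumed stand-in for
    whatever attribute-level measure a real record matcher would use.
    """
    matches, comparisons = [], 0
    for a, b in pairs:
        comparisons += 1
        if SequenceMatcher(None, a, b).ratio() >= threshold:
            matches.append(tuple(sorted((a, b))))  # store each pair in a canonical order
    return matches, comparisons


def all_pairs(records):
    """Exhaustive matching: n * (n - 1) / 2 comparisons."""
    return compare(combinations(records, 2))


def static_blocks(records, block_size):
    """Sort records (whole string as the key, an assumption) and cut the
    sorted list into fixed-size blocks that are never revised afterwards."""
    ordered = sorted(records)
    return [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]


def blocked_pairs(records, block_size):
    """Compare only records that fall into the same static block."""
    pairs = (pair for block in static_blocks(records, block_size)
             for pair in combinations(block, 2))
    return compare(pairs)


if __name__ == "__main__":
    records = ["bob jones", "carole white", "alice smith", "carol white", "alice smyth"]
    full, n_full = all_pairs(records)
    blocked, n_blocked = blocked_pairs(records, block_size=2)
    print(n_full, full)        # 10 comparisons, both near-duplicate pairs found
    print(n_blocked, blocked)  # 2 comparisons, the "carol white" pair is missed
```

With these five sample records, exhaustive matching makes 10 comparisons and finds both near-duplicate pairs, while the fixed blocks need only 2 comparisons but miss "carol white" / "carole white", because sorting places the two records on opposite sides of a block boundary. This is one concrete form of the matched-record loss the abstract attributes to static methods.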
Source: Computer Engineering & Science (CSCD), 2004, No. 9, pp. 62-63, 101 (3 pages)
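Funding: National High Performance Computing Foundation project (00303); Huazhong University of Science and Technology Scientific Research Foundation project (M99015)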
Keywords: clustering collection; record matching; pattern matching; duplicate record detection; clustering analysis

References: 4

  • 1. Jiawei Han, Micheline Kamber. Data Mining: Concepts and Techniques [M]. China Machine Press, 2001: 237-251.
  • 2. U.S. Congress, Office of Technology Assessment. Information Technologies for Control of Money Laundering, OTA-ITC-630 [Z]. Washington, DC: U.S. Government Printing Office, 1995.
  • 3. M. Hernandez, S. Stolfo. The Merge/Purge Problem for Large Databases [A]. Proc. of the ACM SIGMOD Int'l Conf. on Management of Data [C]. 1995: 127-138.
  • 4. Fang Liu, Zhengding Lu, Songfeng Lu. Mining Association Rules Using Clustering [J]. Intelligent Data Analysis, 2001, 5(4): 309-326.

Literature sharing references with this article: 50

Co-cited literature: 9

  • 1. Zhang Yong, Chi Zhongxian. Application of position coding in data warehouse ETL [J]. Computer Engineering, 2007, 33(1): 50-52. (Cited by 12)
  • 2. Bilenko M, Mooney R. Adaptive name matching in information integration [J]. IEEE Intelligent Systems, 2003, 18(5): 16-23.
  • 3. Monge A. An Adaptive and Efficient Algorithm for Detecting Approximately Duplicate Database Records [EB/OL]. (2007-09-02). http://citeseer.ist.psu.edu/mongeovadaptive.html.
  • 4. Khan H M, Maly K, Zubair M. Similarity and Duplicate Detection System for an OAI Compliant Federated Digital Library [C]//Proc. of ECDL'05. Vienna, Austria: [s.n.], 2005.
  • 5. Foulonneau M. Information Redundancy Across Metadata Collections [J]. Information Processing and Management, 2007, 43(3): 740-751.
  • 6. Li Xingyi, Bao Congjian, Shi Huaji. Method for detecting approximately duplicate records in data warehouses [J]. Journal of University of Electronic Science and Technology of China, 2007, 36(6): 1273-1277. (Cited by 25)
  • 7. Shi Nianyun, Zhang Jinming, Chu Xi. Approximately duplicate record detection based on the CURE algorithm [J]. Computer Engineering, 2009, 35(5): 56-58. (Cited by 11)
  • 8. Qiu Yuefeng, Tian Zengping, Ji Wenyun, Zhou Aoying. An efficient method for detecting approximately duplicate records [J]. Chinese Journal of Computers, 2001, 24(1): 69-77. (Cited by 72)

Citing literature: 2

Second-level citing literature: 8
