Matching Data Records Among Multi Data Sources Based on Clustering Techniques
Times cited: 5
Abstract: This paper proposes a record matching algorithm for multiple data sources based on clustering. The algorithm uses canopy clustering, a clustering technique designed specifically for large data sets. The algorithm is domain-independent, and compared with other models it greatly reduces the required computation while maintaining the original accuracy, thereby improving the efficiency of record matching.
Source: Journal of Chinese Computer Systems (CSCD-indexed; Peking University core journal), 2005, No. 9, pp. 1546-1550 (5 pages).
Funding: Supported by the Youth Fund of Guangxi Normal University.
Keywords: record matching; canopy clustering technique; entity clustering
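The abstract names canopy clustering but does not describe it. The Python sketch below illustrates the general canopy technique as published by McCallum, Nigam and Ungar (2000), applied to record matching. A cheap distance groups records into possibly overlapping canopies using a loose threshold and a tight threshold, and the expensive comparison runs only on pairs that share a canopy. This is not the authors' algorithm. The Jaccard token distance, the `difflib.SequenceMatcher` similarity, the threshold values, the sample records, and the function names (`cheap_distance`, `build_canopies`, `match_records`) are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def cheap_distance(a: str, b: str) -> float:
    """Assumed cheap metric: 1 - Jaccard similarity of lowercase word sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return 1.0 - len(ta & tb) / len(ta | tb)


def build_canopies(records: list, loose: float, tight: float) -> list:
    """Group record indices into possibly overlapping canopies (loose > tight)."""
    if not loose > tight:
        raise ValueError("loose threshold must exceed tight threshold")
    remaining = list(range(len(records)))
    canopies = []
    while remaining:
        # McCallum et al. pick the center at random; taking the first is deterministic.
        center = remaining[0]
        dists = [cheap_distance(records[center], r) for r in records]
        # Every record within the loose threshold joins this canopy.
        canopies.append([i for i, d in enumerate(dists) if d < loose])
        # Records within the tight threshold can no longer become centers;
        # the center itself is always removed so the loop terminates.
        remaining = [i for i in remaining if i != center and dists[i] >= tight]
    return canopies


def match_records(records: list, loose: float, tight: float,
                  threshold: float) -> set:
    """Run the expensive similarity only on pairs that share a canopy."""
    matches, compared = set(), set()
    for canopy in build_canopies(records, loose, tight):
        for i, j in combinations(sorted(canopy), 2):
            if (i, j) in compared:
                continue  # a pair may share several canopies; compare it once
            compared.add((i, j))
            a, b = records[i].lower(), records[j].lower()
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                matches.add((i, j))
    return matches


# Hypothetical records drawn from two sources.
records = [
    "john smith 12 main st springfield",
    "jon smith 12 main st springfield",
    "mary jones 4 oak ave shelbyville",
    "mary jones 4 oak avenue shelbyville",
    "acme corp 99 industrial rd ogdenville",
]
print(build_canopies(records, loose=0.6, tight=0.3))  # [[0, 1], [2, 3], [4]]
print(sorted(match_records(records, 0.6, 0.3, threshold=0.9)))  # [(0, 1), (2, 3)]
```

Records whose cheap distance falls between the tight and loose thresholds stay eligible as centers, so one record can belong to several canopies. For this reason the sketch remembers pairs it has already compared. With the sample data, the expensive comparison runs on 2 pairs instead of all 10.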