Identification Method of Approximately Duplicate Records Based on Fuzzy Integrated Estimation
Abstract: When approximately duplicate records are identified by character-string matching, the attribute weights are usually set too subjectively. To address this problem, this paper proposes a method that obtains attribute weights through fuzzy integrated estimation. Multiple users rate, on a graded scale, the factors that make up each attribute's importance. A fuzzy mapping of these ratings then yields weights that reflect each attribute's importance, and approximately duplicate records are identified on the basis of these weights. Theoretical analysis and experiments show that the method obtains attribute weights objectively and therefore identifies approximately duplicate records with higher precision.
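The abstract does not give the factor set, the grade scale, or the mapping operator, so the following is only a minimal Python sketch of how fuzzy integrated estimation could turn multi-user ratings into attribute weights and feed them into a weighted string-matching similarity. Everything specific here is an assumption, not the paper's method: the four-level grade scale and its scores, the three importance factors and their weights A, the weighted-average composition B = A·R, normalizing the defuzzified scores so the weights sum to 1, Levenshtein-based field similarity, and the 0.8 duplicate threshold. All names (`GRADES`, `attribute_weights`, `is_duplicate`, and so on) and the sample records are hypothetical.

```python
from collections import Counter

# Hypothetical four-level grade scale and defuzzification scores;
# the abstract does not say which grades or scores the paper uses.
GRADES = ["very important", "important", "average", "unimportant"]
GRADE_SCORES = [1.0, 0.75, 0.5, 0.25]


def membership_matrix(ratings):
    """ratings[f] lists the grades the users gave to factor f.
    Returns R, where R[f][g] is the share of users who chose grade g."""
    matrix = []
    for grades in ratings:
        counts = Counter(grades)
        matrix.append([counts[g] / len(grades) for g in GRADES])
    return matrix


def importance_score(factor_weights, ratings):
    """Fuzzy evaluation vector B = A . R (weighted-average operator),
    defuzzified into a single importance score for one attribute."""
    r = membership_matrix(ratings)
    b = [sum(a * r[f][g] for f, a in enumerate(factor_weights))
         for g in range(len(GRADES))]
    return sum(bg * s for bg, s in zip(b, GRADE_SCORES))


def attribute_weights(factor_weights, ratings_by_attr):
    """Normalize the importance scores so the attribute weights sum to 1."""
    scores = {attr: importance_score(factor_weights, ratings)
              for attr, ratings in ratings_by_attr.items()}
    total = sum(scores.values())
    return {attr: s / total for attr, s in scores.items()}


def edit_distance(a, b):
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def field_similarity(a, b):
    """Edit-distance similarity in [0, 1]; 1 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def record_similarity(r1, r2, weights):
    """Weighted sum of per-attribute string similarities."""
    return sum(w * field_similarity(r1[attr], r2[attr])
               for attr, w in weights.items())


def is_duplicate(r1, r2, weights, threshold=0.8):
    """Hypothetical decision rule: duplicates if similarity >= threshold."""
    return record_similarity(r1, r2, weights) >= threshold


# Illustrative data: three hypothetical importance factors with weights A,
# and the grades four users gave each factor of each attribute.
FACTOR_WEIGHTS = [0.5, 0.3, 0.2]
RATINGS = {
    "name": [["very important"] * 3 + ["important"],
             ["important"] * 4,
             ["very important"] * 2 + ["average"] * 2],
    "address": [["important"] * 2 + ["average"] * 2,
                ["average"] * 4,
                ["important"] * 4],
    "phone": [["very important"] * 4,
              ["very important"] * 3 + ["important"],
              ["important"] * 4],
}

WEIGHTS = attribute_weights(FACTOR_WEIGHTS, RATINGS)
r1 = {"name": "Zhang Wei", "address": "12 Yuelu Rd, Changsha", "phone": "0731-8888123"}
r2 = {"name": "Zhang Wei", "address": "12 Yuelu Rd., Changsha", "phone": "0731-8888132"}
r3 = {"name": "Li Na", "address": "5 Main St, Beijing", "phone": "010-5555000"}

if __name__ == "__main__":
    print(WEIGHTS)
    print(is_duplicate(r1, r2, WEIGHTS), is_duplicate(r1, r3, WEIGHTS))
```

With these sample ratings, phone receives the largest weight and address the smallest. Records r1 and r2 (one added period and one transposed digit pair) clear the threshold, while r1 and r3 do not. The weighted-average operator is used here because, unlike the max-min operator, it lets every factor contribute to the evaluation vector; the paper may use a different operator.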
Source: Computer Engineering (《计算机工程》), 2010, No. 13, pp. 51-53 (3 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
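Funding: Scientific Research Fund of the Hunan Provincial Department of Education (09C339); Hunan Provincial Science and Technology Plan Project (2008CK3083).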
Keywords: fuzzy integrated estimation; approximately duplicate records; attribute weight; similarity
