
Data Cleaning Methods (cited by: 4)

Methodological Research on Data Cleaning
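Abstract: Data cleaning is an important research area in data warehousing, and identifying approximately duplicate records is one of its main technical difficulties. This paper proposes several preprocessing techniques that bring approximately duplicate records close to one another when the records are sorted by key. Combined with a priority-queue strategy for identifying approximately duplicate records, it gives a method for computing record similarity and presents the results of the analysis.
Author: 佘春红 (She Chunhong)
Source: 《计算机应用》 (Journal of Computer Applications), indexed in CSCD and the Peking University Core Journals list, 2002, No. 12, pp. 128-130 (3 pages)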
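
The pipeline outlined in the abstract (preprocess, sort by key, then scan with a priority queue of recent clusters) can be illustrated with a minimal sketch. This is not the paper's implementation: the preprocessing rules, the function names, the threshold, and the queue size are all assumptions made for illustration, and the similarity measure is Python's `difflib.SequenceMatcher` ratio, used as a stand-in because the abstract does not give the paper's own formula.

```python
import re
from difflib import SequenceMatcher


def preprocess(record: str) -> str:
    """Assumed normalization: lowercase, punctuation to spaces, collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", record.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; not the measure defined in the paper."""
    return SequenceMatcher(None, a, b).ratio()


def find_duplicate_clusters(records, threshold=0.85, queue_size=4):
    """Group approximately duplicate records.

    Records are sorted by their preprocessed key so that near-duplicates
    end up adjacent. A bounded list of recent clusters acts as the
    priority queue: the most recently matched cluster sits at the front.
    """
    keyed = sorted((preprocess(r), r) for r in records)
    recent = []    # bounded queue of clusters, most recent first
    clusters = []  # every cluster ever created, in creation order
    for key, original in keyed:
        for i, cluster in enumerate(recent):
            if similarity(key, cluster["rep"]) >= threshold:
                cluster["members"].append(original)
                # Promote the matched cluster to the front of the queue.
                recent.insert(0, recent.pop(i))
                break
        else:
            # No cluster in the queue matched: start a new one at the front.
            cluster = {"rep": key, "members": [original]}
            clusters.append(cluster)
            recent.insert(0, cluster)
            # Drop the lowest-priority clusters beyond the queue size.
            del recent[queue_size:]
    return [c["members"] for c in clusters]


if __name__ == "__main__":
    sample = [
        "Smith, John  12 Main St.",
        "smith john 12 main st",
        "Jones, Mary 5 Oak Ave",
        "Smith John 12 Main Street",
    ]
    for members in find_duplicate_clusters(sample):
        print(members)
```

Because each record is compared only against the few clusters held in the queue, the scan avoids comparing every pair of records; the trade-off is that a duplicate sorted far from its cluster can be missed, which is why the preprocessing step matters.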


