
Study on a Process-Based Data Cleaning Framework (一个基于流程的数据清洗框架的研究)

Abstract: Earlier data cleaning approaches require rules to be encoded against a schema, which is time-consuming and difficult, and the rules are hard to modify afterwards. This paper proposes a new framework for eliminating approximately duplicate records that lets users carry out data cleaning simply and flexibly, without writing any code. The extensible framework provides an open algorithm library, an open function library, and a fuzzy inference system based on fuzzy rules and membership functions, which make it general and adaptable. Experimental results confirm the framework's effectiveness.
Source: Computer Applications and Software (《计算机应用与软件》), CSCD, 2009, No. 9, pp. 157-158, 171 (3 pages)
Keywords: data cleaning; duplicate record; extensible framework; fuzzy inference system
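
The abstract does not give the framework's rules or membership functions, so the following is only a minimal sketch of how a fuzzy inference step for duplicate detection could look. Everything in it is an assumption made for illustration: the field names `name` and `address`, the `low`/`high` membership shapes and their 0.4/0.8 breakpoints, the four-entry rule base, and the weighted-average (zero-order Sugeno-style) combination of rule outputs. `difflib.SequenceMatcher` from the Python standard library stands in for a similarity function that the paper's function library might supply.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (stand-in for a library function)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def low(x: float) -> float:
    """Membership in 'low similarity': 1 for x <= 0.4, falling linearly to 0 at x >= 0.8."""
    return min(1.0, max(0.0, (0.8 - x) / 0.4))


def high(x: float) -> float:
    """Membership in 'high similarity': 0 for x <= 0.4, rising linearly to 1 at x >= 0.8."""
    return min(1.0, max(0.0, (x - 0.4) / 0.4))


# Hypothetical rule base, not taken from the paper.
# Each rule: (membership for name, membership for address, crisp output).
RULES = [
    (high, high, 1.0),  # IF name is high AND address is high THEN duplicate
    (high, low, 0.5),   # IF name is high AND address is low THEN possibly duplicate
    (low, high, 0.3),   # IF name is low AND address is high THEN probably distinct
    (low, low, 0.0),    # IF name is low AND address is low THEN distinct
]


def duplicate_score(rec_a: dict, rec_b: dict) -> float:
    """Combine rule outputs into one score in [0, 1], weighted by firing strength."""
    s_name = similarity(rec_a["name"], rec_b["name"])
    s_addr = similarity(rec_a["address"], rec_b["address"])
    weighted_sum = 0.0
    total_strength = 0.0
    for mf_name, mf_addr, output in RULES:
        strength = min(mf_name(s_name), mf_addr(s_addr))  # fuzzy AND as minimum
        weighted_sum += strength * output
        total_strength += strength
    return weighted_sum / total_strength if total_strength else 0.0


if __name__ == "__main__":
    r1 = {"name": "John Smith", "address": "12 Main Street"}
    r2 = {"name": "Jon Smith", "address": "12 Main St."}
    print(duplicate_score(r1, r2))
```

In a sketch like this, changing the cleaning behavior means editing the `RULES` table or the membership functions rather than rewriting schema-specific matching code, which is the kind of flexibility the abstract attributes to the framework.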

References (5)

  • 1 Chen W, Ding Q L. 可扩展数据清理软件平台的研究 [Research on an extensible data cleaning software platform] [J]. Journal of University of Electronic Science and Technology of China (电子科技大学学报), 2006, 35(1): 100-103. Cited by: 10
  • 2 Zadeh L A. From Computing with Numbers to Computing with Words: From Manipulation of Measurements to Manipulation of Perceptions [J]. Int'l J. Applied Math. and Computer Science, 2002, 12(3): 307-324.
  • 3 Hernandez M A, Stolfo S J. Real-World Data is Dirty: Data Cleansing and the Merge/Purge Problem [J]. Data Mining and Knowledge Discovery, 1998, 2(1): 9-37.
  • 4 Takagi T, Sugeno M. Fuzzy Identification of Systems and its Applications to Modeling and Control [J]. IEEE Trans. Systems, Man, and Cybernetics, 1985, 15(1): 116-132.
  • 5 Jang J S R. ANFIS: Adaptive Network-based Fuzzy Inference Systems [J]. IEEE Trans. Systems, Man, and Cybernetics, 1993, 23(3): 665-685.

Second-level references (5)

  • 1 Chen W, Ding Q L. 数据清理中不完整数据的清理方法 [Methods for cleaning incomplete data in data cleaning] [J]. 微型机与应用 (Microcomputer & Its Applications), 2005, 24(2): 44-45. Cited by: 7
  • 2 Galhardas H, Florescu D, Shasha D. Declarative data cleaning: language, model, and algorithms [C]. In: Proceedings of the 27th VLDB Conference, Roma: Morgan Kaufmann, 2001: 371-380.
  • 3 Rahm E, Do H H. Data cleaning: problems and current approaches [J]. IEEE Data Engineering Bulletin, 2000, 23(4): 3-13.
  • 4 Lee M L, Ling T W, Low W L. IntelliClean: a knowledge-based intelligent data cleaner [C]. In: Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston: ACM Press, 2000: 290-294.
  • 5 Monge A E. Matching algorithms within a duplicate detection system [J]. IEEE Data Engineering Bulletin, 2000, 23(4): 14-20.
