Abstract
Earlier data-cleaning approaches required rules to be encoded against a specific schema. This was time-consuming and difficult, and the rules were hard to modify afterwards. This paper proposes a novel framework for eliminating approximately duplicate records that lets users clean data simply and flexibly, without writing any code. The extensible framework provides an open algorithm library, an open function library, and a fuzzy inference system based on fuzzy rules and membership functions, which together make it general-purpose and adaptable. Experimental results demonstrate the framework's effectiveness.
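The abstract names the framework's parts (a function library, fuzzy rules, membership functions) but not how they combine to decide whether two records are duplicates. The Python sketch below is a hypothetical illustration, not the authors' implementation. It assumes records with `name` and `address` fields and uses normalized Levenshtein similarity as the single "library" function. For brevity it uses a zero-order Sugeno inference, which takes a weighted average of crisp rule outputs. The membership-function breakpoints, rule outputs, and the 0.75 duplicate threshold are invented values chosen only for the example.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]  # hypothetical record shape: field name -> string value

def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb))) # substitute (free if equal)
        prev = curr
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """Normalized string similarity in [0, 1]; 1.0 means identical."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Trapezoidal membership: rises on (a, b), is 1 on [b, c], falls on (c, d)."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0

# Hypothetical fuzzy sets over a similarity value in [0, 1].
SETS: Dict[str, Tuple[float, float, float, float]] = {
    "LOW": (0.0, 0.0, 0.4, 0.6),
    "MEDIUM": (0.4, 0.6, 0.7, 0.85),
    "HIGH": (0.7, 0.85, 1.0, 1.0),
}

# Hypothetical AND-rules: (name label, address label, crisp duplicate score).
RULES: List[Tuple[str, str, float]] = [
    ("HIGH", "HIGH", 1.0),
    ("HIGH", "MEDIUM", 0.7),
    ("MEDIUM", "HIGH", 0.6),
    ("MEDIUM", "MEDIUM", 0.4),
]

def fuzzify(x: float) -> Dict[str, float]:
    """Degree of membership of x in each fuzzy set."""
    return {label: trapezoid(x, *params) for label, params in SETS.items()}

def duplicate_score(r1: Record, r2: Record) -> float:
    """Zero-order Sugeno inference: weighted average of the fired rules' outputs."""
    name = fuzzify(similarity(r1["name"], r2["name"]))
    addr = fuzzify(similarity(r1["address"], r2["address"]))
    fired = [(min(name[n], addr[a]), z) for n, a, z in RULES]  # AND = min
    fired.append((max(name["LOW"], addr["LOW"]), 0.0))         # LOW on either field (OR = max) -> 0
    total = sum(w for w, _ in fired)
    return sum(w * z for w, z in fired) / total if total > 0 else 0.0

def is_duplicate(r1: Record, r2: Record, threshold: float = 0.75) -> bool:
    return duplicate_score(r1, r2) >= threshold

def eliminate_duplicates(records: List[Record], threshold: float = 0.75) -> List[Record]:
    """Keep each record unless it is an approximate duplicate of one already kept."""
    kept: List[Record] = []
    for rec in records:
        if not any(is_duplicate(rec, k, threshold) for k in kept):
            kept.append(rec)
    return kept

if __name__ == "__main__":
    rows = [
        {"name": "John Smith", "address": "12 Main St"},
        {"name": "Jon Smith", "address": "12 Main St."},
        {"name": "Alice Wong", "address": "99 Oak Ave"},
    ]
    print(eliminate_duplicates(rows))  # keeps the first and third records
```

Because the rules and membership functions are plain data (`SETS`, `RULES`), a user could in principle retune them without touching the inference code, which is in the spirit of the "no coding" goal. The pairwise scan in `eliminate_duplicates` is quadratic in the number of records. Practical duplicate-detection systems commonly reduce comparisons with blocking or a sorted-neighborhood pass, which this sketch omits.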
Source
Computer Applications and Software (《计算机应用与软件》)
Indexed in CSCD (Chinese Science Citation Database)
2009, No. 9, pp. 157-158, 171 (3 pages)
Keywords
Data cleaning
Duplicate records
Extensible framework
Fuzzy inference system