Abstract
Existing data cleaning tools fall short in three respects. First, they offer little interaction between tool and user, so users can neither control the cleaning process nor handle the exceptions that arise during it. Second, data transformation rules and data cleaning rules lack a logical description, so the rules are not separated from their physical implementation. Third, they lack metadata management, which makes it hard for users to analyze the cleaning process and adjust it step by step. This paper proposes a new interactive data cleaning framework based on rule descriptions that addresses these three shortcomings, making data cleaning more efficient so that data quality can be assured. The structure of the framework is explained in detail by describing how cleaning rules are defined and executed.
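The three ideas named in the abstract (rules described separately from their execution, user interaction when an exception occurs, and recorded metadata about the run) can be illustrated with a small sketch. This is not the framework from the paper: the names `CleaningRule`, `CleaningRun`, `on_exception`, and the sample rules and records are all hypothetical, chosen only to show the general shape of such a design.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class CleaningRule:
    """Hypothetical logical description of a rule, independent of the engine."""
    name: str
    applies_to: Callable[[Record], bool]  # does the record need this rule?
    repair: Callable[[Record], Record]    # how to fix it


@dataclass
class CleaningRun:
    """Hypothetical engine: executes rules, defers exceptions to the user,
    and keeps a metadata log of what happened to each record."""
    rules: List[CleaningRule]
    on_exception: Callable[[CleaningRule, Record, Exception], Record]
    log: List[Dict[str, Any]] = field(default_factory=list)

    def clean(self, records: List[Record]) -> List[Record]:
        cleaned = []
        for index, record in enumerate(records):
            current = dict(record)
            for rule in self.rules:
                try:
                    if not rule.applies_to(current):
                        continue
                    current = rule.repair(current)
                    outcome = "repaired"
                except Exception as exc:
                    # Interaction point: the user decides how to resolve it.
                    current = self.on_exception(rule, current, exc)
                    outcome = "user-resolved"
                self.log.append(
                    {"record": index, "rule": rule.name, "outcome": outcome}
                )
            cleaned.append(current)
        return cleaned


# Illustrative rules (assumed, not taken from the paper).
rules = [
    CleaningRule(
        name="trim_name",
        applies_to=lambda r: r["name"] != r["name"].strip(),
        repair=lambda r: {**r, "name": r["name"].strip()},
    ),
    CleaningRule(
        name="parse_age",
        applies_to=lambda r: not isinstance(r["age"], int),
        repair=lambda r: {**r, "age": int(r["age"])},
    ),
]


def ask_user(rule: CleaningRule, record: Record, exc: Exception) -> Record:
    # Stand-in for an interactive prompt: here the "user" blanks the field.
    return {**record, "age": None}


run = CleaningRun(rules=rules, on_exception=ask_user)
cleaned = run.clean([
    {"name": " Li ", "age": "31"},
    {"name": "Wang", "age": "unknown"},
])
```

Because the rules are plain data handed to the engine, they can be changed without touching the execution code, and the `log` list gives the user a record to inspect when adjusting the process between runs.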
Source
Microcomputer Development (《微机发展》)
2005, No. 4, pp. 141-144 (4 pages)
Keywords
data warehouse
data cleaning
cleaning rule
interactive