期刊文献+

Design and Implementation of a Data-Quality-Oriented ETL Framework (cited 20 times)

Published English title: Design and realization of ETL architecture to data quality
Abstract: Traditional extract-transform-load (ETL) architectures offer weak support for data quality control. To address this, an ETL architecture built around data quality management is proposed. Based on the characteristics of the ETL process, four modules are designed: a multi-data-source interface module, an ETL metadata description module, an ETL task description module, and a data quality control module. The architecture places data quality at its core: a data analysis model is built, and a rule deduction engine uses the analysis results to generate a data cleaning scheme. This allows the quality of the data stream to be evaluated and managed effectively. Following this design, an ETL tool named Data Quality ETL (DQETL) was developed. DQETL is designed with the Unified Modeling Language (UML) and provides a user-friendly interface for managing ETL processes centrally. Finally, an example illustrates the general steps of data quality management within the framework.
Authors: Li Qingyang (李庆阳), Peng Hong (彭宏)
Source: Computer Engineering and Design, 2010, No. 9, pp. 2057-2060 (4 pages). Indexed in CSCD and listed as a Peking University core journal.
Funding: Natural Science Foundation of Guangdong Province (07006474); Guangdong Provincial Science and Technology Research Program (2007B010200044)
Keywords: data warehouse; data quality; extract-transform-load (ETL); rule deduction; data cleaning
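The abstract only outlines the core mechanism: a data analysis model summarizes the incoming data, and a rule deduction engine turns those results into a cleaning scheme. The Python sketch below illustrates that idea under stated assumptions and is not DQETL's implementation. The profile fields (`null_ratio`, `duplicate_count`, `format_error_ratio`), the rule names, thresholds and action labels are all hypothetical. The "engine" is a simple condition-action matcher, swapped in for whatever inference method the paper actually uses.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ColumnProfile:
    """Per-column statistics; a hypothetical stand-in for the 'data analysis model'."""
    name: str
    is_key: bool
    null_ratio: float
    duplicate_count: int
    format_error_ratio: float


@dataclass
class Rule:
    """Hypothetical rule format: a condition on a profile and the cleaning action it triggers."""
    name: str
    condition: Callable[[ColumnProfile], bool]
    action: str


def profile_column(name: str, values: List[Optional[str]],
                   pattern: re.Pattern, is_key: bool = False) -> ColumnProfile:
    """Analyse one column: missing values, duplicates, and values not matching the pattern."""
    total = len(values)
    present = [v for v in values if v not in (None, "")]
    bad_format = sum(1 for v in present if not pattern.fullmatch(v))
    return ColumnProfile(
        name=name,
        is_key=is_key,
        null_ratio=(total - len(present)) / total if total else 0.0,
        duplicate_count=len(present) - len(set(present)),
        format_error_ratio=bad_format / len(present) if present else 0.0,
    )


# Hypothetical rules and thresholds; a real system would load these from metadata storage.
RULES = [
    Rule("too_many_nulls", lambda p: p.null_ratio > 0.2, "flag_missing_for_review"),
    Rule("some_nulls", lambda p: 0.0 < p.null_ratio <= 0.2, "fill_default"),
    Rule("bad_format", lambda p: p.format_error_ratio > 0.0, "standardize_format"),
    Rule("duplicate_keys", lambda p: p.is_key and p.duplicate_count > 0, "deduplicate"),
]


def deduce_cleaning_plan(profile: ColumnProfile, rules: List[Rule]) -> List[str]:
    """Return the action of every rule whose condition holds, in rule order."""
    return [rule.action for rule in rules if rule.condition(profile)]


if __name__ == "__main__":
    phones = ["020-12345678", "", "02012345678", "020-12345678", None]
    prof = profile_column("phone", phones, re.compile(r"\d{3}-\d{8}"), is_key=True)
    print(deduce_cleaning_plan(prof, RULES))
    # -> ['flag_missing_for_review', 'standardize_format', 'deduplicate']
```

In this sketch the rules are data, kept separate from the profiling code. That loosely mirrors the abstract's split between analysis and rule deduction: a new quality check can be added to `RULES` without changing `profile_column`.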
Related literature: 9 references; 39 secondary references

Citation network: 194 co-citing documents; 147 co-cited documents; 20 citing documents; 118 secondary citing documents
