Abstract
Data quality management is a primary concern in building information systems. This paper first reviews the definitions of data quality and classifies the strategies for improving it. It then compares and analyzes the methods used in the two main areas of data quality research, data quality assessment and data quality improvement techniques, and introduces representative data quality improvement tools. Finally, it proposes an assessment-driven data quality improvement framework and discusses future research directions.
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal (北大核心)
2008, Issue 2, pp. 1-5, 12 (6 pages)
Funding
Jiangsu Province "Tenth Five-Year Plan" High-Tech Project (BG2001013)
Keywords
Data quality, Data cleansing, Machine learning, Data auditing