Abstract
This paper surveys research on Chinese data cleaning. It clarifies the relationship between total data quality management and data cleaning, and gives the definition and objects of data cleaning. It introduces the background of the Chinese data cleaning problem, the state of research in China and abroad, and current research hot spots, and briefly presents the basic principles, models, and existing algorithms. Methods for Chinese data cleaning are emphasized, with attention to national conditions and project requirements. Finally, the paper summarizes the shortcomings of current research on Chinese data cleaning and discusses prospects for future research and applications.
Source
《计算机工程与应用》
CSCD
2012, No. 14, pp. 121-129 (9 pages)
Computer Engineering and Applications
Funding
National 863 Program of China, Key Project (No. 2007AA010305)
Keywords
Chinese data cleaning (中文数据清洗)
data quality management (数据质量管理)
data integration (数据集成)