Abstract
For a variety of reasons, data often contain dirty records that must be cleaned (cleansed) before they can be relied upon. Cleaning is essential in data warehousing, KDD, and TDQM (total data quality management). This paper gives an overview of data cleaning, covering its scope, techniques, and implementation approaches, and focuses on two topics of current research and application: outlier detection and duplicate elimination.
Source
《计算机应用研究》
CSCD
PKU Core Journal (北大核心)
2002, Issue 3, pp. 3-5 (3 pages)
Application Research of Computers
Keywords
Data Cleaning
Data Quality
Data Warehouse
Database
Outlier
Duplicate Elimination