
An Overview of Data Quality Research (数据质量研究综述)
Cited by: 102
Abstract: Data quality management is a primary concern in building information systems. This paper first reviews the definitions of data quality and the classification of quality improvement strategies. It then compares and analyzes the methods used in the two main areas of data quality research, data quality assessment and data quality improvement techniques, and introduces representative data quality improvement tools. Finally, it proposes an assessment-driven data quality improvement framework and discusses future research directions.
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2008, No. 2, pp. 1-5, 12 (6 pages)
Funding: Jiangsu Province "Tenth Five-Year Plan" High-Tech Project (BG2001013)
Keywords: data quality, data cleansing, machine learning, data auditing

