
A Rule-based Interactive Data Cleaning Technique (cited by: 4)
Abstract: Existing data cleaning tools fall short in three respects. First, they lack interaction between the tool and the user, so users cannot control the cleaning process or handle exceptions that arise during it. Second, data transformation rules and data cleaning rules lack a logical description, so the rules are not separated from their physical implementation. Third, they lack metadata management, so users can hardly analyze the cleaning process or adjust it step by step. This paper proposes a new rule-based interactive data cleaning framework that addresses these three shortcomings, making data cleaning more efficient and helping to guarantee data quality. The architecture of the framework is explained in detail by describing how cleaning rules are defined and executed.
Source: Microcomputer Development (《微机发展》), 2005, No. 4, pp. 141-144 (4 pages)
Keywords: data warehouse; data cleaning; cleaning rule; interactive
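
The three ideas named in the abstract (rules described logically, user interaction when exceptions occur, and recorded metadata) can be illustrated with a small Python sketch. This is not the paper's implementation: the names `CleaningRule`, `CleaningLog`, `run_rules`, `demo_rules`, and `drop_on_error` are hypothetical, the sample rules are invented for illustration, and the `on_exception` callback only stands in for an interactive prompt.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Record = dict  # assumption: a record is a plain dict of field name to value


@dataclass
class CleaningRule:
    """Logical description of a rule, independent of where the data is stored."""
    name: str
    applies: Callable[[Record], bool]    # does this record need the rule?
    repair: Callable[[Record], Record]   # returns a repaired copy; may raise


@dataclass
class CleaningLog:
    """Minimal metadata store: one entry per rule application."""
    entries: list = field(default_factory=list)

    def record(self, rule: str, row: int, outcome: str) -> None:
        self.entries.append({"rule": rule, "row": row, "outcome": outcome})


def run_rules(records, rules, on_exception, log):
    """Apply each rule to each record; hand failures to the user callback."""
    cleaned = []
    for row, original in enumerate(records):
        current: Optional[Record] = dict(original)
        for rule in rules:
            if not rule.applies(current):
                continue
            try:
                current = rule.repair(current)
                log.record(rule.name, row, "repaired")
            except Exception as exc:
                # Interactive step: the callback returns a fixed record,
                # or None to reject the record entirely.
                current = on_exception(rule, current, exc)
                if current is None:
                    log.record(rule.name, row, "rejected")
                    break
                log.record(rule.name, row, "user_fixed")
        if current is not None:
            cleaned.append(current)
    return cleaned


def _parse_age(rec: Record) -> Record:
    # int() raises ValueError for non-numeric text such as "n/a"
    return {**rec, "age": int(rec["age"])}


# Hypothetical sample rules, declared as data rather than hard-coded steps.
demo_rules = [
    CleaningRule(
        name="trim_name",
        applies=lambda r: r["name"] != r["name"].strip(),
        repair=lambda r: {**r, "name": r["name"].strip()},
    ),
    CleaningRule(
        name="age_to_int",
        applies=lambda r: not isinstance(r["age"], int),
        repair=_parse_age,
    ),
]


def drop_on_error(rule, rec, exc):
    """Stand-in for a user decision: reject any record a rule cannot repair."""
    return None
```

Because the rules are plain data, the same `run_rules` loop could be pointed at a different rule list or a different exception policy without changing the loop itself, and the log gives the user a trace to inspect before adjusting the rules.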


