
A Functional Dependencies Checking Method in Relational Data (cited by: 6)
Abstract In data quality research, functional dependencies (FDs) are widely used to repair inconsistencies in relational data. A major challenge, however, is how to automatically discover valid FDs from relational data that itself contains errors. Existing automatic FD discovery methods, which rely on statistical confidence measures, often return many FDs that hold approximately but are in fact invalid. Using these FDs directly to repair data can introduce further errors. To address this problem, the paper proposes an FD checking method based on data semantics analysis. The method uses conditional probabilities to estimate the confidence of attribute values and tuples, and then aggregates these estimates to compute the confidence that a given FD holds. The paper also presents an efficient method for constructing a Markov blanket Bayesian network for each attribute of the relational data, which is then used to compute the conditional probabilities. Experiments on both synthetic and real-world data show that, in automatic checking, the semantics-based confidence measure is considerably more accurate than the statistics-based one. In an interactive checking scenario, where each iteration asks the user to verify the FDs with the highest confidence, the semantics-based measure also requires less user effort than the statistics-based approach.
Source Chinese Journal of Computers (《计算机学报》), indexed in EI, CSCD, and the Peking University Core Journals list, 2017, Issue 1, pp. 207-222 (16 pages)
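Funding Supported by the National "973" Key Basic Research Development Program of China (2012CB316203), the National Natural Science Foundation of China (61332006, 61472321), and the Northwestern Polytechnical University Basic Research Fund (3102014JSJ0013, 3102014JSJ0005).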
Keywords data quality; functional dependency; data confidence; conditional probability
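
To make the abstract's criticism of statistics-based confidence concrete, the following is a minimal sketch of one common statistical measure: the largest fraction of tuples that can be kept so that the FD holds exactly. The abstract does not say which statistical measure the paper compares against, so this choice, the function name `fd_confidence`, and the toy table are all assumptions made for illustration. This is not the paper's semantics-based method.

```python
from collections import Counter, defaultdict


def fd_confidence(rows, lhs, rhs):
    """Statistical confidence of the FD lhs -> rhs (illustrative assumption).

    Returns the largest fraction of rows that can be kept so that every
    combination of lhs values maps to a single rhs value.
    """
    if not rows:
        return 1.0
    # For each distinct left-hand-side value, count the right-hand-side values.
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        groups[key][row[rhs]] += 1
    # Within each group, keep only the rows carrying the most frequent rhs value.
    kept = sum(max(counter.values()) for counter in groups.values())
    return kept / len(rows)


# Hypothetical toy table; the third row contains a data error ("Newark").
rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "Newark"},
    {"zip": "60601", "city": "Chicago"},
]

print(fd_confidence(rows, ["zip"], "city"))  # 0.75
print(fd_confidence(rows, ["city"], "zip"))  # 1.0
```

On this sample, the plausible dependency zip → city scores 0.75 because of one erroneous value, while city → zip scores 1.0 even though a city generally has many zip codes. A purely statistical score therefore cannot separate valid FDs from coincidental ones, which is the gap the abstract says the semantics-based confidence is meant to close.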