
Reducing Explanations of Non-Answers Using Data Quality Rules
Abstract: Because of missing tuples, the answers users expect may not appear in query results. Existing techniques explain why an expected answer is not returned, known as the "why-not" problem, by enumerating the possible missing tuples. However, the number of enumerated explanations can be extremely large, so users cannot browse and verify them one by one. Many of these candidate explanations are in fact unreasonable, yet reducing their number is challenging: in our experiments on real data sets, hundreds of thousands of explanations still remain after reduction by unique constraints. This paper studies how to use data quality rules, such as functional dependencies, to reduce the explanations of non-answers efficiently. First, a Functional Dependencies-based Reduction (FDR) algorithm is proposed. Second, to help users explore the generated explanations, ranking the explanations by approximate functional dependencies is studied. Experiments on real data sets show that FDR reduces the number of explanations by 2 to 5 orders of magnitude compared with existing methods, from hundreds of thousands to a few thousand or even dozens, and that the top-1 explanations ranked by approximate functional dependencies reach a precision of over 90%.
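Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2013, Issue S1, pp. 221-229 (9 pages)
Funding: National Natural Science Foundation of China (61202008); National High Technology Research and Development Program of China ("863" Program) (2012AA040911)
Keywords: data quality; data dependencies; non-answer explanation; functional dependency; approximate functional dependency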
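
The two ideas named in the abstract, pruning candidate missing tuples with functional dependencies and ranking the survivors with approximate functional dependencies, can be illustrated with a minimal sketch. This is not the paper's FDR algorithm, whose details are not given here. The table, the attribute names, the dependencies `zip -> city` and `dept -> city`, and the helper names `violates_fd`, `reduce_by_fds`, `afd_support`, and `rank_by_afds` are all hypothetical. The scoring rule (the fraction of matching rows that agree with the candidate) is likewise an assumption chosen for illustration.

```python
from itertools import product

# A dependency is a pair (lhs, rhs) of attribute-name tuples, read as lhs -> rhs.

def violates_fd(candidate, table, lhs, rhs):
    """True if inserting `candidate` into `table` would break the FD lhs -> rhs."""
    key = tuple(candidate[a] for a in lhs)
    return any(
        tuple(row[a] for a in lhs) == key
        and any(row[a] != candidate[a] for a in rhs)
        for row in table
    )

def reduce_by_fds(candidates, table, fds):
    """Keep only candidate missing tuples that are consistent with every FD."""
    return [
        c for c in candidates
        if not any(violates_fd(c, table, lhs, rhs) for lhs, rhs in fds)
    ]

def afd_support(candidate, table, lhs, rhs):
    """Fraction of rows sharing the candidate's lhs values that also share its rhs.

    Returns 0.0 when no row shares the lhs values (an assumption of this sketch).
    """
    key = tuple(candidate[a] for a in lhs)
    matching = [row for row in table if tuple(row[a] for a in lhs) == key]
    if not matching:
        return 0.0
    agreeing = sum(all(row[a] == candidate[a] for a in rhs) for row in matching)
    return agreeing / len(matching)

def rank_by_afds(candidates, table, afds):
    """Order candidates by their total support under the approximate FDs."""
    return sorted(
        candidates,
        key=lambda c: sum(afd_support(c, table, lhs, rhs) for lhs, rhs in afds),
        reverse=True,
    )

# Hypothetical existing data; the tuple for "Ann" is assumed to be missing.
table = [
    {"name": "Bob", "dept": "Sales", "zip": "10001", "city": "NYC"},
    {"name": "Eve", "dept": "Sales", "zip": "10001", "city": "NYC"},
    {"name": "Dan", "dept": "Sales", "zip": "60601", "city": "Chicago"},
]

# Enumerate every (zip, city) combination as a possible explanation.
candidates = [
    {"name": "Ann", "dept": "Sales", "zip": z, "city": c}
    for z, c in product(["10001", "60601"], ["NYC", "Chicago"])
]

fds = [(("zip",), ("city",))]    # exact rule used for pruning
afds = [(("dept",), ("city",))]  # approximate rule used only for ranking

reduced = reduce_by_fds(candidates, table, fds)
ranked = rank_by_afds(reduced, table, afds)
print(len(candidates), len(reduced), ranked[0])
```

In this toy data, the exact rule discards the candidates whose `city` contradicts the city already recorded for the same `zip`, and the approximate rule then prefers the surviving candidate whose `city` is most common among rows with the same `dept`.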