
Detection and Repair of Error Data with Missing Attribute Values (cited by: 9)

Error data detection and repair in condition of field value missing
Abstract: To detect and repair error data more accurately when attribute (field) values are missing, a strategy is proposed that combines data quality rules with the Fellegi-Holt method for data quality checking. For different detection requirements, two detection algorithms are designed, one aimed at locating error data and the other at repairing it, and corresponding algorithms are proposed for repairing error data and filling in missing data. Experiments on both real-world and synthetic data show that the method clearly improves the recall and precision of error data detection, and that the two detection strategies differ considerably in efficiency.
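The abstract does not spell out the algorithms. As a hedged illustration of the Fellegi-Holt minimum-change principle the method builds on, the sketch below treats each data quality rule as a predicate on a complete record, always fills the missing fields, and searches for the smallest set of observed fields whose values must change so that every rule holds. The function name `localize_and_repair`, the rule format, and the example fields and domains are all hypothetical; this is a brute-force sketch, not the authors' algorithm.

```python
from itertools import combinations, product
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical representation: None marks a missing field value.
Record = Dict[str, Optional[str]]
# A rule returns True if a complete record satisfies it.
Rule = Callable[[Record], bool]


def localize_and_repair(
    record: Record, rules: List[Rule], domains: Dict[str, List[str]]
) -> Optional[Tuple[List[str], Record]]:
    """Brute-force Fellegi-Holt style error localization (illustrative only).

    Missing fields are always filled; among observed fields, the smallest
    subset whose values must change is searched first (unweighted count).
    Returns (changed_observed_fields, repaired_record), or None if no
    assignment over the given domains satisfies all rules.
    """
    missing = [f for f, v in record.items() if v is None]
    observed = [f for f, v in record.items() if v is not None]
    # Try change sets of increasing size: 0, 1, 2, ...
    for k in range(len(observed) + 1):
        for subset in combinations(observed, k):
            free = missing + list(subset)
            # Enumerate every value assignment for the free fields.
            for values in product(*(domains[f] for f in free)):
                candidate = dict(record)
                candidate.update(zip(free, values))
                if all(rule(candidate) for rule in rules):
                    return list(subset), candidate
    return None


# Hypothetical example: two edit rules and one missing field.
example_domains = {
    "age_group": ["child", "adult"],
    "marital": ["single", "married"],
    "relation": ["head", "spouse", "child"],
}
example_rules: List[Rule] = [
    # A child cannot be married.
    lambda r: not (r["age_group"] == "child" and r["marital"] == "married"),
    # A spouse must be married.
    lambda r: r["relation"] != "spouse" or r["marital"] == "married",
]
example_record: Record = {"age_group": "child", "marital": "married", "relation": None}

print(localize_and_repair(example_record, example_rules, example_domains))
# → (['age_group'], {'age_group': 'adult', 'marital': 'married', 'relation': 'head'})
```

Changing `marital` to `single` would be an equally small fix; this sketch breaks ties by field and domain order, whereas the Fellegi-Holt framework also allows per-field reliability weights. The enumeration is exponential in the number of fields, so it only suits tiny examples; practical Fellegi-Holt implementations avoid it by deriving implied edits and solving a covering problem.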
Source: Computer Engineering and Design (计算机工程与设计), a Peking University core journal, 2016, no. 3, pp. 643-649 (7 pages)
Funding: National Natural Science Foundation of China (61371196); China Postdoctoral Science Foundation Special Grant (201003797)
Keywords: field value missing; data rules; Fellegi-Holt; data repairing; data filling