Evaluation of Data Completeness (cited by: 11)
Abstract: With the development of information technology, data volumes are growing rapidly, and quality problems in data are widespread. Targeting the incompleteness that commonly occurs in massive relational data, this paper studies the problem of measuring the completeness of relational data. It proposes a computational model of data completeness, together with an exact algorithm and an approximate algorithm based on uniform sampling. Theoretical analysis shows that the approximate algorithm can meet any required precision and can evaluate data completeness efficiently. Experiments on data extracted from DBLP confirm the effectiveness and efficiency of the algorithms.
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2013, Issue S1, pp. 230-238 (9 pages)
Funding: National Basic Research Program of China ("973" Program), grant 2012CB316202
Keywords: data quality; data completeness; uniform sampling; approximate algorithm; data completeness model
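The abstract contrasts an exact algorithm with a sampling-based approximation but does not spell out either one. The following is a minimal sketch of that general idea, not the paper's actual model or algorithms. It assumes completeness means the fraction of non-missing cells in a relation, represents missing values as `None`, and picks the sample size with a standard Hoeffding bound. The function names and the `eps` and `delta` parameters are all hypothetical.

```python
import math
import random


def exact_completeness(rows):
    """Fraction of non-missing cells, found by scanning every row.

    Assumed definition for illustration; the paper's model may differ.
    """
    total = sum(len(r) for r in rows)
    filled = sum(1 for r in rows for v in r if v is not None)
    return filled / total if total else 1.0


def hoeffding_sample_size(eps, delta):
    """Number of samples so that |estimate - truth| <= eps
    holds with probability at least 1 - delta (Hoeffding bound)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps * eps))


def approx_completeness(rows, eps=0.05, delta=0.05, rng=None):
    """Estimate completeness from cells drawn uniformly with replacement.

    Assumes a non-empty relation whose rows all have the same width,
    so picking a uniform row and then a uniform column is uniform over cells.
    """
    rng = rng or random.Random()
    n_cols = len(rows[0])
    n = hoeffding_sample_size(eps, delta)
    hits = 0
    for _ in range(n):
        row = rows[rng.randrange(len(rows))]
        if row[rng.randrange(n_cols)] is not None:
            hits += 1
    return hits / n


if __name__ == "__main__":
    # Toy relation: every row has one missing value out of four attributes.
    table = [(1, "a", None, "x")] * 1000
    print(exact_completeness(table))
    print(approx_completeness(table, eps=0.05, delta=0.05, rng=random.Random(0)))
```

In this sketch the sample size depends only on `eps` and `delta`, not on the number of rows, which is one way to see how a sampling-based estimate can reach any chosen precision at a cost independent of the table size.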
References (4)

  • 1 Tomasz Imieliński, Witold Lipski. Incomplete Information in Relational Databases [J]. Journal of the ACM (JACM), 1984(4).
  • 2 Amihai Motro. Integrity = validity + completeness [J]. ACM Transactions on Database Systems (TODS), 1989(4).
  • 3 Guo Zhimao, Zhou Aoying. A survey of data quality and data cleaning research [J]. Journal of Software (软件学报), 2002, 13(11): 2076-2082. Cited by: 268.
  • 4 Wenfei Fan, Floris Geerts. Relative information completeness [J]. ACM Transactions on Database Systems (TODS), 2010(4).
