Big Data Quality Management: Problems and Progress (大数据质量管理:问题与研究进展)

Cited by: 34
Abstract: Big data now arise in many fields, and their quality is critical to using them effectively, so big data call for quality management. Although data quality management has already produced a body of results, the volume, velocity, and variety of big data make existing methods hard to apply. This paper surveys the problems and challenges of big data quality management across three tasks: error detection, error repair, and query processing over dirty data. It identifies three main challenges: computational intractability, mixed errors, and lack of knowledge. It then reviews current research progress according to the approaches that address each of these three challenges, and outlines directions for future research.
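Author: Wang Hongzhi (王宏志)
Source: Science & Technology Review (《科技导报》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2014, No. 34, pp. 78-84 (7 pages)
Funding: National Basic Research Program of China (973 Program) (2012CB316200); National Natural Science Foundation of China (61472099)
Keywords: data quality; big data; data cleaning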