Design and Implementation of Cloud Clean System on Big Data
Abstract Data cleaning is an important topic in big data. This paper designs and implements a cloud cleaning system for big data based on Hadoop. Using the Map-Reduce computation model, the system detects and repairs various kinds of data quality problems. The system has the following features: (1) support for cleaning multiple kinds of data quality problems in big data; (2) a visual tool for monitoring the progress of the cleaning process and tuning its parameters; (3) a friendly interface for inputting data sets and for outputting the cleaned data sets. The cloud clean system provides an effective and efficient data cleaning mechanism for big data held in either text files or databases.
Source Intelligent Computer and Applications (《智能计算机与应用》), 2015, No. 3, pp. 88-90 (3 pages)
Funding National Natural Science Foundation of China (61173022)
Keywords Big Data; Data Quality; Cloud Clean; Map-Reduce
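
The abstract says the system detects and repairs data quality problems with the Map-Reduce model but gives no algorithmic detail. As a rough illustration only, the sketch below simulates a map, shuffle, and reduce pass in memory for one such problem, duplicate records. Everything in it is an assumption made for illustration and is not taken from the paper: the record fields (`name`, `city`), the blocking key (a normalized name), the merge rule (keep the first non-empty value per field), and all function names. It does not use Hadoop itself.

```python
from collections import defaultdict


def map_record(record):
    """Map step: emit (key, record), where the key is a normalized name.

    Using the normalized name as the grouping key is a hypothetical choice.
    """
    key = " ".join(record["name"].lower().split())
    return key, record


def shuffle(pairs):
    """Shuffle step: group the mapped values by key, as a framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, records):
    """Reduce step: count duplicates in a group and merge them into one record.

    The merge keeps the first non-empty value seen for each field.
    """
    merged = {}
    for rec in records:
        for field, value in rec.items():
            if value not in (None, "") and field not in merged:
                merged[field] = value
    return {"key": key, "duplicates": len(records) - 1, "clean": merged}


def clean(records):
    """Run map, shuffle, and reduce over an in-memory list of records."""
    groups = shuffle(map_record(r) for r in records)
    return [reduce_group(k, v) for k, v in sorted(groups.items())]


if __name__ == "__main__":
    data = [
        {"name": "Alice  Smith", "city": ""},
        {"name": "alice smith", "city": "Harbin"},
        {"name": "Bob Lee", "city": "Beijing"},
    ]
    for row in clean(data):
        print(row)
```

In a real Hadoop job the map and reduce functions would run on separate nodes and the framework would perform the shuffle; the in-memory version only shows how detection (grouping by key) and repair (merging within a group) divide between the two phases.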
