

An inconsistency solution in big data based on MAP-REDUCE
Abstract: With the arrival of the big data era, data quality has attracted growing attention. An important part of improving data quality is resolving data inconsistency. To address data inconsistency in big data settings, this paper proposes a clustering algorithm under the MAP-REDUCE framework. Specifically, it improves the K-MEDOIDS clustering algorithm within the MAP-REDUCE framework to strengthen its applicability and accuracy. Simulation experiments on the HADOOP platform verify the parallelism and effectiveness of the algorithm in a big data environment.
Author: Fan Ling (范令)
Source: Microcomputer & Its Applications (《微型机与应用》), 2015, No. 15, pp. 18-21, 25 (5 pages)
Keywords: big data; data quality; data inconsistency; MAP-REDUCE; clustering algorithm
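
For orientation only, the following is a minimal single-process sketch of how one iteration of textbook K-MEDOIDS can be split into a map step (assign each point to its nearest medoid) and a reduce step (choose a new medoid per cluster). It is not the improved algorithm proposed in the paper, whose details are not given in this record. The function names (`map_assign`, `reduce_update`, `k_medoids`), the squared Euclidean distance, and the stopping rule are all assumptions made for illustration, and no HADOOP API is used.

```python
from collections import defaultdict


def distance(a, b):
    """Squared Euclidean distance between two equal-length numeric tuples (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_assign(point, medoids):
    """Map step: emit (index of the nearest medoid, point)."""
    nearest = min(range(len(medoids)), key=lambda i: distance(point, medoids[i]))
    return nearest, point


def reduce_update(members):
    """Reduce step: return the member with the smallest total distance to its cluster."""
    return min(members, key=lambda c: sum(distance(c, p) for p in members))


def k_medoids(points, medoids, max_iter=20):
    """Repeat map and reduce until the medoids stop changing or max_iter is reached."""
    medoids = list(medoids)
    for _ in range(max_iter):
        # Shuffle phase, simulated in memory: group mapped points by medoid index.
        groups = defaultdict(list)
        for p in points:
            key, value = map_assign(p, medoids)
            groups[key].append(value)
        # An empty cluster keeps its previous medoid.
        new_medoids = [
            reduce_update(groups[i]) if groups[i] else medoids[i]
            for i in range(len(medoids))
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(k_medoids(data, [(0, 1), (10, 11)]))
```

In an actual MAP-REDUCE job, the map function would run on separate data splits and the framework would group the emitted pairs by key before the reduce step; the in-memory dictionary here only stands in for that grouping.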








