Abstract
The simplicity and cost-effectiveness of the MapReduce programming model make it well suited to the parallel processing of massive data sets. However, MapReduce lacks support for multiple data sources, component reuse, and data visualization. When users apply the MapReduce framework to data mining, these shortcomings show up as low development efficiency and redundant coding. The design and implementation of a MapReduce-based data mining platform is proposed. The design serves as a reference for, and helps compensate for, the weaknesses of Hadoop as a large-scale data computing platform in data mining, data visualization, and business intelligence applications. Based on this approach, a large-scale data mining tool is implemented.
Source
Computer Engineering and Design (《计算机工程与设计》)
CSCD
Peking University Core Journal (北大核心)
2013, No. 2, pp. 495-501 (7 pages)
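Funding
National Science and Technology Major Project, Core Electronic Devices, High-end General-purpose Chips and Basic Software (核高基) program (2010ZX01042-001-001-05)
National Key Technology R&D Program (2012BAH05F02, 2011BAH15B03)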