Module-Based Big Data Analysis Service Platform (cited by: 7)

Module Based Big Data Analysis Platform
Abstract: As data volumes grow rapidly, data analysis tools that run on a single machine can no longer meet demand. To address the problem of big data analysis, we designed and implemented Haflow, a module-based big data analysis service platform. Haflow defines its own business flow model and an extensible module interface that supports the integration of heterogeneous tools. The system accepts a user-defined business flow, translates it into an execution flow instance, and submits that instance to a Hadoop distributed cluster for execution. Haflow is an extensible, distributed, service-oriented big data analysis platform that supports heterogeneous analysis tools. The platform serves two purposes. First, it encapsulates the work that is unrelated to the analysis business itself and supports heterogeneous modules, which speeds up the development of analysis applications. Second, its back end uses the Hadoop distributed system to run multiple tasks concurrently, which improves the average execution speed of applications.
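The abstract does not give implementation details, so the following is only a minimal sketch of the idea it describes: a user-defined flow of modules is translated into an ordered execution plan. Every name here (`Module`, `Flow`, `translate`, `execute`) is hypothetical and is not Haflow's actual interface, and the plan is run locally in place of submission to a Hadoop cluster.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Module:
    """Hypothetical component: a named wrapper around any analysis tool."""
    name: str
    run: Callable[[dict], dict]  # receives upstream outputs, returns its own


@dataclass
class Flow:
    """Hypothetical user-defined business flow: modules plus directed edges."""
    modules: Dict[str, Module] = field(default_factory=dict)
    edges: List[Tuple[str, str]] = field(default_factory=list)  # (upstream, downstream)

    def add(self, module: Module) -> None:
        self.modules[module.name] = module

    def connect(self, upstream: str, downstream: str) -> None:
        self.edges.append((upstream, downstream))


def translate(flow: Flow) -> List[str]:
    """Translate a flow into an execution order that respects dependencies."""
    sorter = TopologicalSorter({name: set() for name in flow.modules})
    for upstream, downstream in flow.edges:
        sorter.add(downstream, upstream)
    return list(sorter.static_order())  # raises CycleError on cyclic flows


def execute(flow: Flow) -> Dict[str, dict]:
    """Run the plan locally; a real platform would submit steps to a cluster."""
    results: Dict[str, dict] = {}
    for name in translate(flow):
        upstream = {u: results[u] for u, d in flow.edges if d == name}
        results[name] = flow.modules[name].run(upstream)
    return results


# Example flow with three assumed modules: load -> sort -> count.
flow = Flow()
flow.add(Module("load", lambda up: {"rows": [3, 1, 2]}))
flow.add(Module("sort", lambda up: {"rows": sorted(up["load"]["rows"])}))
flow.add(Module("count", lambda up: {"n": len(up["sort"]["rows"])}))
flow.connect("load", "sort")
flow.connect("sort", "count")
```

Keeping each module behind a single `run` callable is what would let differing tools sit behind one interface in this sketch; steps with no dependency between them could in principle be dispatched concurrently.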
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2014, No. 9, pp. 75-79 (5 pages)
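Funding: Supported by the National Natural Science Foundation of China (61202065, 61170074), the National 863 Program (2012AA011204), and the National Key Technology R&D Program (2012BAH05F02)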
Keywords: big data; data analysis; data mining; module; distributed; service; platform