
An Orthogonal Decomposition Based Design Method and Implementation for Big Data Processing System (一种正交分解大数据处理系统设计方法及实现)

Cited by: 12
Abstract: The emergence of computing frameworks such as MapReduce opened a new era of big data processing. Big data processing systems such as Hadoop and Spark offer high throughput, platform independence, and high scalability, and are widely used. However, to avoid being tied to a specific operating system or hardware platform, the design and optimization of these systems concentrate on computing models and scheduling algorithms, so they cannot fully exploit the strengths of the underlying platform. This paper proposes an orthogonal decomposition based method for designing and optimizing big data processing systems. The system is decomposed into loosely coupled modules with orthogonal functions, so that storage and processing are separated out and handed to storage and execution engines that can exploit the operating system and even the hardware resources of the underlying platform, while the original big data system is reduced to a scheduling platform. Two low-level optimization strategies are then proposed: a lock-free mechanism for the storage layer and instruction superoptimization for the execution engine. Guided by this method, and taking Hadoop as the target for compatibility and improvement, the authors implement Arion, a prototype big data processing system. Arion retains Hadoop's platform independence and high scalability while eliminating bottlenecks in task execution, and its localized design and optimization techniques are equally applicable to non-Hadoop platforms. Experiments on the prototype show that Arion improves the execution efficiency of big data processing jobs by up to 7.7%.
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2017, Issue 5, pp. 1097-1108 (12 pages)
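Funding: National Natural Science Foundation of China (61202061, 61202413); Innovation Project of the Institute of Computing Technology, Chinese Academy of Sciences (20146080)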
Keywords: big data processing system; computing framework; localization; lock-free; superoptimization; execution engine
