Big Data and OLAP Systems
Abstract: OLAP (online analytical processing) is a core technology for business intelligence built on relational data. In the big data era, there is strong demand for high-performance OLAP on large clusters of commodity machines, yet achieving that performance remains a major challenge. Encouragingly, progress in recent years has been rapid: many so-called SQL on Hadoop systems, which run OLAP over data stored in Hadoop, have emerged, and their performance keeps improving. This paper first surveys the development of OLAP technology, then tests and analyzes three representative SQL on Hadoop systems and presents their performance characteristics. Based on the results, such systems are expected to play an important role in the market for low-cost big data OLAP.
Source: Big Data Research (《大数据》), 2015, No. 1, pp. 48-60 (13 pages)
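Funding: National Natural Science Foundation of China, General Program, "Research on Highly Scalable Data Encoding Methods and New Query Processing Techniques for Data Warehouses" (No. 61170013); Research Funds of Renmin University of China (Fundamental Research Funds for the Central Universities) (No. 14XNLQ06); National Social Science Foundation of China, Major Program, "Research on Information Resource Integration and Services in Cloud Computing Environments" (No. 12&ZD220)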
Keywords: big data; online analytical processing (OLAP); SQL analysis; SQL on Hadoop
References (9)

  • 1. Codd E F, Codd S B, Salley C T. Providing OLAP (online analytical processing) to user-analysts: an IT mandate. E. F. Codd & Associates, 1998.
  • 2. Thomsen E. OLAP Solutions: Building Multidimensional Information Systems, 2nd Edition. Hoboken: John Wiley & Sons, 2002.
  • 3. Stonebraker M, Abadi D J, Batkin A, et al. C-store: a column-oriented DBMS. Proceedings of the 31st Very Large Data Bases (VLDB) Conference, Trondheim, Norway, 2005: 553-564.
  • 4. Kaufmann M, Manjili A A, Vagenas P, et al. Timeline index: a unified data structure for processing queries on temporal data in SAP HANA. Proceedings of the ACM Special Interest Group on Management of Data (SIGMOD) International Conference on Management of Data, New York, USA, 2013: 1173-1184.
  • 5. Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters. Proceedings of Operating Systems Design and Implementation (OSDI), San Francisco, CA, USA, 2004: 137-150.
  • 6. Pavlo A, Paulson E, Rasin A, et al. A comparison of approaches to large-scale data analysis. Proceedings of the ACM Special Interest Group on Management of Data (SIGMOD) International Conference on Management of Data, Providence, USA, 2009: 165-178.
  • 7. Chen Y G, Qin X P, Bian H Q, et al. A study of SQL-on-Hadoop systems. Lecture Notes in Computer Science, 2014(8807): 154-166.
  • 8. Hive cost based optimization, https://cwiki.apache.org/confluence/display/Hive/Cost-based+optimization+in+Hive, 2015.
  • 9. Ion Stoica. Berkeley data analytics stack (BDAS) overview, http://ampcamp.berkeley.edu/wp-content/uploads/2013/02/Berkeley-Data-Analytics-Stack-BDAS-Overview-Ion-Stoica-Strata-2013.pdf, 2013.
