
Research on Temporal Query Expansion and Temporal Index Optimization Based on Spark (cited by: 3)
Abstract: Most existing temporal databases and cluster-based temporal analysis tools are disk-oriented, and their performance degrades rapidly when they are applied to big-data workloads. To address this, the paper builds an easy-to-use and highly scalable temporal big-data query and analysis system on Spark. The system extends the Spark SQL parser to support SQL-like temporal operations. Following the approach of the open-source SIMBA project, which provides an index manager built on Spark SQL, it introduces two optimization strategies, global filtering and local temporal indexing, so that temporal queries run with high throughput and low latency. Evaluation experiments on temporal query efficiency show that, under varying parameters, the system outperforms the native Spark SQL query processing scheme.
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2017, Issue 7, pp. 22-28, 37 (8 pages in total)
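Funding: Key Project of Natural Science Research of Anhui Provincial Universities, "Research on Keyword-Based Query Methods for Large-Scale Geographic Data" (KJ2015A310)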
Keywords: temporal big data; Spark system; Spark SQL component; temporal query; temporal index; high throughput; low latency
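The abstract names two optimization strategies, global filtering and local temporal indexing, without giving details. The following is a minimal sketch of the general idea in plain Python, not the paper's implementation and not SIMBA's or Spark's API. The `Partition` class, the `temporal_overlap_query` function, and the sample data are all hypothetical, and the sketch assumes closed integer intervals and non-empty partitions.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Hypothetical data partition holding (start, end, payload) records."""
    intervals: list
    starts: list = field(init=False)
    lo: int = field(init=False)   # earliest start time in the partition
    hi: int = field(init=False)   # latest end time in the partition

    def __post_init__(self):
        # Local index: keep records sorted by start time so that a binary
        # search can bound the candidates for a query.
        self.intervals = sorted(self.intervals, key=lambda r: r[0])
        self.starts = [r[0] for r in self.intervals]
        self.lo = self.starts[0]
        self.hi = max(r[1] for r in self.intervals)

    def overlapping(self, qs, qe):
        # Only records starting at or before qe can overlap [qs, qe].
        cut = bisect_right(self.starts, qe)
        return [r for r in self.intervals[:cut] if r[1] >= qs]


def temporal_overlap_query(partitions, qs, qe):
    """Return records overlapping [qs, qe] and the number of partitions scanned."""
    hits, scanned = [], 0
    for p in partitions:
        # Global filter: skip partitions whose time bounds miss the query.
        if p.hi < qs or p.lo > qe:
            continue
        scanned += 1
        hits.extend(p.overlapping(qs, qe))
    return hits, scanned


partitions = [
    Partition([(1, 4, "a"), (2, 9, "b"), (6, 8, "c")]),
    Partition([(10, 15, "d"), (12, 13, "e"), (14, 20, "f")]),
    Partition([(21, 25, "g"), (22, 30, "h")]),
]
hits, scanned = temporal_overlap_query(partitions, 13, 14)
print(sorted(r[2] for r in hits), scanned)
```

In this toy data set the global filter lets the query touch only one of the three partitions, and the sorted start times limit how many records are examined inside it. A real system would more likely use an interval tree or a similar structure as the local index.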