
Research on Spatial Range Query Indexing Based on Spark (cited by: 5)

RESEARCH ON RANGE QUERIES IN SPATIAL INDEX BASED ON THE SPARK
Abstract: Traditional data processing systems have limited capacity to store and process data and cannot meet the demands of handling large data volumes. To extract value from data and process large data sets efficiently and with high performance, this paper proposes a big data analysis and processing system built on Spark and following the approach of SIMBA, with retrieval performed through Spark SQL queries. An index management mechanism is embedded in Spark and encapsulated inside the RDD to improve query efficiency, and data are stored in a segment tree to speed up retrieval. During preprocessing, data are partitioned with the Range Partitioner strategy, and queries are answered through global filtering combined with local indexes. The design aims to keep throughput high and latency low during query operations, improving overall query efficiency.
Source: Computer Applications and Software (《计算机应用与软件》), PKU Core Journal, 2018, No. 2, pp. 96-101 (6 pages)
Funding: Key Project of Natural Science Research of Anhui Provincial Universities (KJ2015A130)
Keywords: Spark system; big data; range queries; Spark SQL component
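
The query path named in the abstract (range partitioning, a global filter over partition bounds, then a local index inside each surviving partition) can be illustrated with a minimal sketch. This is plain Python, not Spark or SIMBA code, and it is not the authors' implementation: it assumes one-dimensional numeric keys rather than spatial objects, and the names `range_partition`, `SegmentTreeNode`, `range_query`, and `LEAF_SIZE` are hypothetical. The local index is a simple static tree over sorted keys in the spirit of a segment tree.

```python
LEAF_SIZE = 2  # assumed leaf capacity, chosen only for illustration


def range_partition(keys, num_partitions):
    """Split keys into contiguous, roughly equal-sized key ranges.

    Returns (bounds, partitions), where bounds[i] is the (min, max) key
    of partitions[i]. This mimics the idea of range partitioning only.
    """
    ordered = sorted(keys)
    if not ordered:
        return [], []
    size = -(-len(ordered) // num_partitions)  # ceiling division
    partitions = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    bounds = [(p[0], p[-1]) for p in partitions]
    return bounds, partitions


class SegmentTreeNode:
    """Static tree over a sorted, non-empty list of keys.

    Each node records the key interval [lo, hi] it covers, so a range
    query can skip whole subtrees that cannot contain a match.
    """

    def __init__(self, keys):
        self.lo, self.hi = keys[0], keys[-1]
        if len(keys) <= LEAF_SIZE:
            self.keys, self.left, self.right = keys, None, None
        else:
            mid = len(keys) // 2
            self.keys = None
            self.left = SegmentTreeNode(keys[:mid])
            self.right = SegmentTreeNode(keys[mid:])

    def query(self, low, high, out):
        """Append every key k with low <= k <= high to out."""
        if high < self.lo or low > self.hi:
            return  # this subtree lies entirely outside the query range
        if self.keys is not None:
            out.extend(k for k in self.keys if low <= k <= high)
            return
        self.left.query(low, high, out)
        self.right.query(low, high, out)


def range_query(bounds, indexes, low, high):
    """Global filter on partition bounds, then local index lookups."""
    result = []
    for (p_lo, p_hi), index in zip(bounds, indexes):
        if p_hi < low or p_lo > high:
            continue  # global filter: partition cannot overlap the range
        index.query(low, high, result)
    return result


keys = [17, 3, 42, 8, 25, 31, 1, 12, 50, 29, 36, 5]
bounds, partitions = range_partition(keys, 3)
indexes = [SegmentTreeNode(p) for p in partitions]
hits = range_query(bounds, indexes, 10, 30)
print(hits)  # [12, 17, 25, 29]
```

In this toy run the global filter discards two of the three partitions before any local index is touched, which is the effect the abstract attributes to combining global filtering with local indexes.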
