Abstract
This paper analyzes the MapReduce distributed programming model for big data platforms and the join algorithms used to implement data queries on it. For the Star Schema Benchmark (SSB) data model, it proposes a multi-table star join optimization based on distributed caching. Predicate vectors convert the data dependencies of the intermediate dimension-table joins into bitmap index filtering on the table, which reduces the large network overhead those dependencies cause. Distributed caching makes full use of the memory of the processing nodes, optimizing network transmission and reducing query cost.
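The predicate-vector idea can be illustrated with a minimal single-process sketch. It is not the paper's implementation: the table contents, the names `build_predicate_vector` and `map_filter`, and the assumption of dense integer dimension keys are all hypothetical, and the in-memory `vectors` dictionary merely stands in for data that a distributed cache would ship to each map task.

```python
from typing import Callable, Dict, Iterable, Iterator


def build_predicate_vector(dim_rows: Dict[int, dict],
                           predicate: Callable[[dict], bool],
                           max_key: int) -> bytearray:
    """Return one flag per dimension key: 1 if the row passes the predicate.

    Assumes dense integer keys in 1..max_key (an assumption of this sketch).
    """
    bits = bytearray(max_key + 1)  # index 0 is unused
    for key, row in dim_rows.items():
        if predicate(row):
            bits[key] = 1
    return bits


def map_filter(fact_rows: Iterable[dict],
               vectors: Dict[str, bytearray]) -> Iterator[dict]:
    """Map-side filter: keep a fact row only if every foreign key hits a set flag.

    `vectors` plays the role of the cached predicate vectors, so no
    dimension table has to be shuffled across the network.
    """
    for row in fact_rows:
        if all(vec[row[fk]] for fk, vec in vectors.items()):
            yield row


# Hypothetical toy data loosely shaped like an SSB star schema.
date_dim = {1: {"year": 1997}, 2: {"year": 1998}}
customer_dim = {1: {"region": "ASIA"}, 2: {"region": "EUROPE"}, 3: {"region": "ASIA"}}
lineorder = [
    {"datekey": 1, "custkey": 1, "revenue": 100},
    {"datekey": 2, "custkey": 1, "revenue": 200},
    {"datekey": 1, "custkey": 2, "revenue": 300},
    {"datekey": 1, "custkey": 3, "revenue": 400},
]

date_vec = build_predicate_vector(date_dim, lambda r: r["year"] == 1997, max_key=2)
cust_vec = build_predicate_vector(customer_dim, lambda r: r["region"] == "ASIA", max_key=3)

matched = list(map_filter(lineorder, {"datekey": date_vec, "custkey": cust_vec}))
total = sum(row["revenue"] for row in matched)
```

Each dimension predicate is evaluated once and reduced to a compact vector, so the fact-table scan needs only array lookups rather than a join against the full dimension rows. A real bitmap would pack eight flags per byte; the `bytearray` here trades space for readability.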
Source
Computer Engineering and Design (《计算机工程与设计》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2014, Issue 9, pp. 3107-3112 (6 pages)
Funding
Heilongjiang Province Key Science and Technology Research Fund Project, 2012 (GC12A307)