HC-Store: putting MapReduce's foot in two camps



Abstract: MapReduce is a popular framework for large-scale data analysis. Because data access is critical to MapReduce's performance, recent work has applied different storage models, such as column-store and PAX-store, to MapReduce platforms. However, different queries have very different data access patterns, and no single storage model achieves optimal performance for all of them. In this paper, we study how MapReduce can benefit from the presence of two different column-store models: pure column-store and PAX-store. We propose a hybrid storage system called hybrid column-store (HC-store). Based on the characteristics of each incoming MapReduce task, HC-store determines whether to access the underlying pure column-store or PAX-store. We study the properties of the different storage models and build a cost model that decides the data access strategy at runtime. We have implemented HC-store on top of Hadoop. Our experimental results show that HC-store outperforms both PAX-store and column-store, especially on diverse workloads.
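The abstract does not spell out HC-store's cost model. As a rough illustration of the kind of runtime decision it describes, the sketch below compares a toy I/O cost for a pure column-store against one for a PAX-store and picks the cheaper one. Everything here is hypothetical and not taken from the paper: the parameters (`disk_bw`, `net_bw`, `remote_fraction`), the function names, and the cost formulas (column-store pays a network penalty to reconstruct tuples from column files on different nodes; PAX-store reads whole row groups but reconstructs locally).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CostParams:
    """Hypothetical cluster parameters for a toy I/O cost model."""
    disk_bw: float          # local sequential read bandwidth, bytes/s
    net_bw: float           # remote (cross-node) read bandwidth, bytes/s
    remote_fraction: float  # assumed share of extra column data that is NOT co-located


def column_store_cost(sizes: dict[str, float], accessed: set[str], p: CostParams) -> float:
    """Pure column-store: read only the accessed columns, plus a network penalty
    for tuple reconstruction when column files live on different nodes."""
    read = sum(sizes[c] for c in accessed)
    if len(accessed) <= 1:
        return read / p.disk_bw  # a single column needs no reconstruction
    # Assumption: the largest accessed column is read locally and a fixed
    # fraction of the remaining accessed bytes must be fetched remotely.
    extra = read - max(sizes[c] for c in accessed)
    return read / p.disk_bw + p.remote_fraction * extra / p.net_bw


def pax_store_cost(sizes: dict[str, float], p: CostParams) -> float:
    """PAX-store: all columns of a row group sit in the same block, so
    reconstruction is local; this toy model assumes whole row groups are read."""
    return sum(sizes.values()) / p.disk_bw


def choose_store(sizes: dict[str, float], accessed: set[str], p: CostParams) -> str:
    """Return 'column' or 'pax', whichever has the lower estimated cost."""
    unknown = set(accessed) - set(sizes)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    col = column_store_cost(sizes, accessed, p)
    pax = pax_store_cost(sizes, p)
    return "column" if col <= pax else "pax"


if __name__ == "__main__":
    # Illustrative numbers only: ten 100 MB columns.
    sizes = {f"c{i}": 100e6 for i in range(10)}
    params = CostParams(disk_bw=100e6, net_bw=50e6, remote_fraction=0.5)
    print(choose_store(sizes, {"c0"}, params))      # narrow scan -> column
    print(choose_store(sizes, set(sizes), params))  # full-width scan -> pax
```

Under these made-up parameters, a narrow scan favors the pure column-store because it skips unneeded columns, while a scan touching every column favors PAX-store because tuple reconstruction stays on one node.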

Authors: Huiju WANG, Furong LI, Xuan ZHOU, Yu CAO, Xiongpai QIN, Jidong CHEN, Shan WANG

Affiliations: DEKE Lab; School of Information; EMC Labs China; School of Computing

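Source: Frontiers of Computer Science (indexed in SCIE, EI and CSCD), 2014, Issue 6, pp. 859-871 (13 pages). Chinese journal title: 中国计算机科学前沿（英文版）, i.e. Frontiers of Computer Science in China (English edition).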

Funding: This work was sponsored by the National Key Basic Research Program of China (973 Program) (2014CB340403) and the National Natural Science Foundation of China (Grant Nos. 61170013, 61272138 and 61232007).

Keywords: MapReduce, Hadoop, HC-store, cost model, column-store, PAX-store

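CLC classification: TP393.18 [Automation and Computer Technology: Computer Application Technology]; TU247.2 [Architecture Science: Architectural Design and Theory]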
