Load Feedback-Based Resource Scheduling and Dynamic Migration-Based Data Locality for Virtual Hadoop Clusters in OpenStack-Based Clouds (Cited by: 4)

Abstract: As cloud computing technology matures, it is essential to combine the big data processing tool Hadoop with the Infrastructure as a Service (IaaS) cloud platform. In this study, we first propose a new Dynamic Hadoop Cluster on IaaS (DHCI) architecture, which includes four key modules: monitoring, scheduling, Virtual Machine (VM) management, and VM migration. The monitoring module collects the load of both physical hosts and VMs, and this load information is used to design the resource scheduling and data locality solutions. Second, we present a simple load feedback-based resource scheduling scheme. It avoids allocating resources on overburdened physical hosts and achieves strong scalability of the virtual cluster by varying the number of VMs. To improve flexibility, we deploy the computation and storage VMs separately in the DHCI architecture, which negatively impacts data locality. Third, we reuse the VM migration method and propose a dynamic migration-based data locality scheme using parallel computing entropy. We migrate the computation nodes to the host(s) or rack(s) where the corresponding storage nodes are deployed to satisfy the data locality requirement. We evaluate our solutions in a realistic scenario based on OpenStack. Substantial experimental results demonstrate the effectiveness of our solutions, which contribute to workload balancing and performance improvement even under heavily loaded cloud system conditions.
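The load feedback idea described in the abstract can be illustrated with a minimal sketch. This record gives no algorithmic detail, so everything below is an assumption for illustration only: the `Host` type, the functions `pick_host` and `target_vm_count`, and all threshold values are hypothetical and do not reproduce the authors' scheme or any OpenStack API.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    load: float  # normalized load in [0, 1], as a monitoring module might report it


def pick_host(hosts, overload_threshold=0.8):
    """Return the least-loaded host below the threshold, or None if all are overburdened.

    The threshold value is a hypothetical choice, not taken from the paper.
    """
    candidates = [h for h in hosts if h.load < overload_threshold]
    if not candidates:
        return None
    return min(candidates, key=lambda h: h.load)


def target_vm_count(current, avg_vm_load, low=0.3, high=0.7, min_vms=1, max_vms=16):
    """Grow the virtual cluster by one VM when busy, shrink it by one when idle.

    The low/high bounds and the cluster size limits are assumed values.
    """
    if avg_vm_load > high:
        return min(current + 1, max_vms)
    if avg_vm_load < low:
        return max(current - 1, min_vms)
    return current
```

In this sketch, `pick_host` corresponds to avoiding allocation on overburdened physical hosts, and `target_vm_count` corresponds to varying the number of VMs in response to the measured VM load.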

Authors: Dan Tao, Zhaowen Lin, Bingxu Wang

Affiliations: School of Electronic and Information Engineering; Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks; Network and Information Center

Source: Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2017, No. 2, pp. 149-159 (11 pages); Journal of Tsinghua University (Natural Science Edition, English Edition)

Funding: Supported by the Open Project Program of Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks (No. WSNLBKF201503); the Fundamental Research Funds for the Central Universities (Nos. 2016JBM011 and 2014ZD03-03); the Priority Academic Program Development of Jiangsu Higher Education Institutions; and the Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology.

Keywords: Hadoop; resource scheduling; data locality; Infrastructure as a Service (IaaS); OpenStack

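Classification: TP302 [Automation and Computer Technology: Computer System Architecture]; TP393.09 [Automation and Computer Technology: Computer Application Technology]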



