Time Series Based Killer Task Online Recognition Approach (基于时间序列分析的杀手级任务在线识别方法)

Cited by: 2

Abstract: By analyzing the failure counts and failure patterns of tasks in the Google cluster dataset, this paper identifies killer tasks, which fail frequently and repeatedly in succession. Killer tasks not only reduce the reliability and availability of applications running on a cloud computing system, but also waste large amounts of resources and significantly increase scheduling load. Building on the resource usage patterns of killer tasks, the paper proposes an online recognition approach that uses resource usage time series to recognize killer tasks accurately at an early stage of failure and to notify the cloud system to take proactive failure recovery actions, thereby avoiding unnecessary rescheduling and resource waste. Experimental results show that the approach recognizes killer tasks with 98.5% precision within, on average, 3% of the failure duration, while saving 96.75% of system resources.
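The abstract does not give the thresholds or the recognition model itself, so the following is only a minimal sketch of the first step it describes: picking out tasks that show both a high failure count and a run of consecutive failures. The event format, the event names `"FAIL"` and `"FINISH"`, the function name `find_killer_tasks`, and both threshold values are assumptions made for illustration, not details from the paper.

```python
from collections import defaultdict

# Hypothetical thresholds; the abstract does not state the values the authors used.
MIN_FAILURES = 3
MIN_CONSECUTIVE = 3


def find_killer_tasks(events, min_failures=MIN_FAILURES,
                      min_consecutive=MIN_CONSECUTIVE):
    """Return the set of task ids whose total failure count and longest run
    of consecutive failures both reach the given thresholds.

    `events` is an iterable of (task_id, event_type) pairs in time order.
    Any event type other than "FAIL" ends the current failure run
    (an assumption of this sketch).
    """
    total = defaultdict(int)        # failures seen per task
    current_run = defaultdict(int)  # length of the ongoing failure run
    longest_run = defaultdict(int)  # longest failure run seen so far
    for task_id, event_type in events:
        if event_type == "FAIL":
            total[task_id] += 1
            current_run[task_id] += 1
            longest_run[task_id] = max(longest_run[task_id],
                                       current_run[task_id])
        else:
            current_run[task_id] = 0
    return {t for t in total
            if total[t] >= min_failures and longest_run[t] >= min_consecutive}


# Toy event log: task "a" fails three times in a row, task "c" fails three
# times in total but never more than twice in succession.
events = [
    ("a", "FAIL"), ("b", "FAIL"), ("a", "FAIL"), ("b", "FINISH"),
    ("a", "FAIL"), ("c", "FAIL"), ("c", "FINISH"), ("c", "FAIL"), ("c", "FAIL"),
]
killers = find_killer_tasks(events)
print(killers)
```

With the toy log above, only task `a` meets both thresholds. This offline labelling is separate from the online recognition the paper proposes, which works from resource usage time series early in a task's failure history.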

Authors: Tang Hongyan (唐红艳), Li Ying (李影), Jia Tong (贾统), Yuan Xiaoyong (袁小雍)

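Affiliations: School of Software and Microelectronics, Peking University; National Engineering Research Center for Software Engineering, Peking University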

Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2017, Issue 4, pp. 43-46 (4 pages)

Funding: Supported by the Key Project of the Shenzhen Science and Technology Program (JSGG20140516162852628)

Keywords: Cloud system; Killer tasks; Online recognition; Time series; Resource usage pattern; Failure frequency

Classification: TP301 [Automation and Computer Technology: Computer System Architecture]


