Optimizing checkpoint for scientific simulations
Abstract
It is extremely time-consuming to restart a long-running simulation from the beginning when a failure occurs. Checkpointing is a viable solution: it lets a simulation resume from its most recent checkpoint instead of starting over. We study three models to determine the optimal interval between consecutive checkpoints, that is, the interval that minimizes total execution time. We also demonstrate that optimal checkpointing can facilitate self-optimization. This study advances both our understanding and our practice of optimizing long-running scientific simulations.
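The abstract does not spell out its three models. The sketch below illustrates the trade-off they address using a standard textbook model, which is not necessarily one of the paper's. Checkpointing too often makes checkpoint overhead dominate, while checkpointing too rarely makes rework after failures dominate.

The model makes the following assumptions:

- Failures arrive as a Poisson process with mean time between failures M.
- Each checkpoint costs C seconds and each restart costs R seconds.
- A checkpoint is taken after every tau seconds of useful work.

Under these assumptions, the expected wall-clock time for one work-plus-checkpoint segment is M*e^(R/M)*(e^((tau+C)/M) - 1), following Daly's formulation. Young's first-order approximation gives tau ~ sqrt(2*C*M). The function names, the golden-section search, the search bracket, and the numbers in the usage line are illustrative choices, not taken from the paper.

```python
import math


def young_interval(checkpoint_cost: float, mtbf: float) -> float:
    """First-order optimal interval (Young): tau ~= sqrt(2 * C * M)."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def expected_runtime(tau: float, work: float, checkpoint_cost: float,
                     restart_cost: float, mtbf: float) -> float:
    """Expected wall-clock time to finish `work` seconds of computation when a
    checkpoint costing `checkpoint_cost` is taken after every `tau` seconds.

    Assumption (illustrative, not the paper's model): exponentially distributed
    failures with mean `mtbf`, a restart of cost `restart_cost` after each
    failure, and work / tau treated as a continuous number of segments.
    """
    m = mtbf
    # Expected time to complete one segment of length tau + C despite failures.
    per_segment = m * math.exp(restart_cost / m) * math.expm1((tau + checkpoint_cost) / m)
    return (work / tau) * per_segment


def optimal_interval(work: float, checkpoint_cost: float, restart_cost: float,
                     mtbf: float, tol: float = 1e-6) -> float:
    """Minimize expected_runtime over tau with a golden-section search.

    The objective is unimodal in tau for this model, so the search converges
    to the single minimum inside the (assumed) bracket [1e-9*M, 10*M].
    """
    def f(t: float) -> float:
        return expected_runtime(t, work, checkpoint_cost, restart_cost, mtbf)

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    a, b = 1e-9 * mtbf, 10.0 * mtbf
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while (b - a) > tol * (abs(c) + abs(d)):
        if fc < fd:
            # Minimum lies in [a, d]: shrink from the right.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]: shrink from the left.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


if __name__ == "__main__":
    # Illustrative values: 1-minute checkpoint and restart, 1-day MTBF, 30 days of work.
    C, R, M = 60.0, 60.0, 86400.0
    W = 30 * 86400.0
    print(f"Young: {young_interval(C, M):.0f} s, numerical: {optimal_interval(W, C, R, M):.0f} s")
```

When C is much smaller than M, the numerical optimum lands slightly below Young's estimate, because the exact model also charges for failures during checkpoints and restarts. The closed form is a quick rule of thumb. The numerical search lets the cost model be swapped for a richer one, such as one with age-dependent failure rates, without rederiving a formula.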
Funding
Project supported by the National Science Foundation of USA under the Information Technology Research (ITR/AP-DEB) program (No. 0112820)