Abstract
Compared with serial programs, debugging parallel programs raises new problems. First, parallel programs often need to run for a long time, which makes debugging them especially time-consuming. Second, an error that appears in one debugging run may not appear in the next, which makes errors very hard to track. To address these two problems, this paper designs and implements a middleware system that enables the BLCR checkpoint and restart system in the parallel debugging tool XMPI. With this middleware, debugging large MPI parallel programs in XMPI requires less program run time during the debugging phase, errors can be tracked more effectively, and parallel program development becomes more efficient.
Source
Computer Applications and Software (《计算机应用与软件》)
CSCD
Peking University Core Journal
2005, No. 10, pp. 17-18, 20 (3 pages)
Funding
National Natural Science Foundation of China (No. 604030347)
Science and Technology Commission of Shanghai Municipality (Nos. 03dz15026, 03dz15027, 035115029)