Abstract
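Workstation cluster systems have become one of the mainstream directions in the development of parallel processing. As the application areas of cluster systems gradually broaden and their scale keeps growing, demands on their availability rise accordingly. Designing a highly available cluster system requires a focused study of system reconfiguration techniques. This paper discusses the key techniques involved: the reconfiguration model for workstation clusters, the saving and restoring of system state, and fault detection. Drawing on ChaRM (Checkpoint-based Rollback Recovery and Migration System), which we developed, it then describes the design and implementation of the cluster reconfiguration mechanism.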
Clusters of workstations (COW) have become one of the leading technologies in the field of parallel processing. Implementing a COW with high availability requires research into system reconfiguration techniques. This paper first describes the reconfiguration model of COW, the checkpointing and rollback recovery mechanism, and fault detection. On this basis, we introduce ChaRM, a Checkpoint-based Rollback Recovery and Migration System implemented by the authors, and present the main design features that allow it to achieve high availability.
Source
《电子学报》
EI
CAS
CSCD
PKU Core Journal (北大核心)
2000, No. 5, pp. 13-16 (4 pages)
Acta Electronica Sinica
Funding
Supported by the National High Technology (863) Program (No. 863-306-ZD01-02-01)
National Basic Research Development Program (No. G1999032702)
Keywords
workstation cluster
parallel processing
computer
availability
reconfiguration
cluster of workstations (COW)
checkpointing