Abstract
The rapid development of information technology has made it difficult for a single stand-alone server to meet the multi-tenant, multi-task computing needs of enterprises and scientific research. A key challenge in current distributed systems research is how to organize and coordinate multiple machines effectively while hiding the underlying implementation details and lowering the cost of learning and using the system. To realize distributed job scheduling across multiple machines and reduce operation, maintenance, and learning costs, this paper designs and implements a lightweight distributed job management system. Theoretical analysis and practical results show that the system effectively schedules and executes jobs across multiple machines and offers good fault tolerance and scalability.
Authors
Zhang Yu; Niu Beifang (Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China)
Source
E-science Technology & Application (科研信息化技术与应用)
2019, No. 1, pp. 31-37 (7 pages)
Keywords
distributed job management system
job scheduling
lightweight
master-slave mode
web monitoring