Design and Implementation of a Fault Monitoring System for Meteorological HPC (气象高性能计算机故障监控系统的设计与实现)

Cited by: 2
Abstract: Meteorological high-performance computers (HPC) have greatly improved the numerical weather prediction (NWP) capability of meteorological departments, but they have also made operation and maintenance more demanding. This paper proposes a lightweight, scalable design for an HPC fault monitoring system and describes in detail how the system was developed and how the problems encountered during development were solved. After the system was deployed, the rate at which faults on the Ningbo meteorological HPC affected operational services fell from 60% to below 10%. Practice has shown that the system meets the actual needs of the Ningbo meteorological service, and it also offers ideas for further research and development of HPC fault monitoring systems.
Authors: Xu Haohao (许皓皓), Li Congchu (李从初), Yao Haoli (姚浩立), Xu Zhenyu (徐振宇) (Ningbo Meteorological Network and Equipment Support Center, Ningbo, Zhejiang 315012, China)
Source: Computer Era (《计算机时代》), 2017, No. 8, pp. 90-93 (4 pages)
Keywords: high-performance computer (HPC); meteorology; fault monitoring; monitoring system