Study and Analysis of the High Performance Computing Failures in China Meteorological Field

Abstract: The China Meteorological Administration (CMA) has used High Performance Computing Systems (HPCS) for over three decades. CMA's investment in HPCS provides the reliable HPC capabilities essential for running Numerical Weather Prediction (NWP) models and climate models, generating millions of weather guidance products daily and supporting the Coupled Model Intercomparison Project Phase 5 (CMIP5). Monitoring the HPCS and analyzing resource usage can improve performance and reliability for users, but doing so requires a good understanding of failure characteristics. Large-scale studies of failures in real production systems are scarce. This paper collects and analyzes all failures that occurred during the HPC operation period, focusing in particular on the relationship between the HPCS and NWP applications. We also present the challenges in developing a more effective monitoring system and summarize useful maintenance strategies. This work may considerably benefit online failure prediction for HPC and lead to better performance in the future.
Source: Journal of Geoscience and Environment Protection (English-language journal), 2017, No. 12, pp. 28-40 (13 pages)
Keywords: HPC Failure; Numerical Weather Prediction; Real-Time Monitoring; Resource Management