Supercomputers Memory Faults Analysis Based on Statistical Data
Cited by: 1
Abstract Using a large volume of statistical data on memory faults collected from the Sunway TaihuLight and Sunway BlueLight supercomputers, this paper builds a memory failure time model for petascale supercomputers. Sequential rule mining is applied to analyze the sequential patterns of memory failures, yielding the correlation between a memory failure sequence on a CPU node and the memory failures that follow it. A co-analysis method is then used to study the characteristics of memory faults and memory failures under parallel applications. The results show that computing-memory-I/O intensive applications have a large impact on memory faults, whereas application type has only a limited impact on memory failures, which may instead be related to the reliability of the memory chips themselves.
Authors LIU Ruitao (刘睿涛); CHEN Zuoning (陈左宁) (State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi, Jiangsu 214215, China; National Research Center of Parallel Computer Engineering and Technology, Beijing 100190, China)
Source Computer Engineering (《计算机工程》), 2019, No. 5, pp. 35-45 (11 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list (北大核心).
Funding National Key Research and Development Program of China (2016YFB0200502)
Keywords supercomputer; memory fault; memory failure; statistical data; failure model; correlation relationship; co-analysis