Fermilab Distributed Monitoring System (NGOP)

Abstract: A Distributed Monitoring System (NGOP) that will scale to the anticipated requirements for Run II computing has been under development at Fermilab. NGOP [1] provides a framework for creating Monitoring Agents that monitor the overall state of computers and the software running on them. Several Monitoring Agents are available within NGOP; they are capable of analyzing log files and of checking the existence of system daemons, CPU and memory utilization, etc. NGOP also provides customizable, hierarchical graphical representations of the monitored systems. NGOP is able to generate events when serious problems have occurred and to raise alarms when potential problems have been detected. NGOP allows corrective actions to be performed or notifications to be sent, and it provides persistent storage for collected events, alarms and actions. A first implementation of NGOP was recently deployed at Fermilab. This is a fully functional prototype that satisfies most of the existing requirements. At present the NGOP prototype is monitoring 512 nodes. During its first few months of operation, NGOP has proved to be a useful tool. Multiple problems, such as node resets, offline CPUs, and dead system daemons, have been detected. NGOP has provided system administrators with the information required for better system tuning and configuration. The current state of deployment and future steps to improve the prototype and to implement some new features will be presented.
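Source: International Conference on Computing in High Energy and Nuclear Physics, 2001, No. 1, pp. 102-105 (4 pages). Proceedings of the International Conference on Computing in High Energy and Nuclear Physics (English edition).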
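
The abstract distinguishes alarms (potential problems) from events (serious problems) and mentions actions or notifications triggered by them. The sketch below illustrates that distinction only. It is not NGOP's API: the names `Message`, `ThresholdRule`, `Agent`, `warn_at` and `fail_at` are hypothetical, the threshold-based classification is an assumption, and the in-memory list merely stands in for the persistent storage the abstract describes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Message:
    """One reported condition (hypothetical structure, not NGOP's format)."""
    node: str
    monitored: str   # name of the monitored quantity, e.g. "cpu_load"
    kind: str        # "alarm" = potential problem, "event" = serious problem
    value: float


@dataclass
class ThresholdRule:
    """Assumed rule: classify a measurement by two thresholds."""
    monitored: str
    warn_at: float   # at or above this value: raise an alarm
    fail_at: float   # at or above this value: generate an event

    def evaluate(self, node: str, value: float) -> Optional[Message]:
        if value >= self.fail_at:
            return Message(node, self.monitored, "event", value)
        if value >= self.warn_at:
            return Message(node, self.monitored, "alarm", value)
        return None  # measurement is within normal range


@dataclass
class Agent:
    """Toy monitoring agent: applies rules and forwards any message."""
    node: str
    rules: List[ThresholdRule]
    on_message: Callable[[Message], None]             # notification or action hook
    log: List[Message] = field(default_factory=list)  # stand-in for persistent storage

    def report(self, monitored: str, value: float) -> Optional[Message]:
        by_name: Dict[str, ThresholdRule] = {r.monitored: r for r in self.rules}
        rule = by_name.get(monitored)
        if rule is None:
            return None  # nothing configured for this quantity
        msg = rule.evaluate(self.node, value)
        if msg is not None:
            self.log.append(msg)
            self.on_message(msg)
        return msg
```

As a usage example, an agent built with `Agent("node01", [ThresholdRule("cpu_load", 0.8, 0.95)], print)` would stay silent for `report("cpu_load", 0.5)`, record an alarm for `0.85`, and record an event for `0.99`. The node name and threshold values are illustrative only.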