Abstract
Cloud platforms built on Hadoop are now widely used, for example at Amazon, Yahoo, and Facebook. The stability and reliability of the cluster strongly affect the service quality of such a platform, and as enterprise informatization places growing demands on real-time detection in production, mass storage, and scientific analysis and decision-making, cluster fault monitoring is becoming increasingly important. PDM (Integrated Parallel Mining) was developed against the background of China Mobile's business intelligence application requirements and is designed to provide efficient, accurate, and convenient analysis services for massive data; for such a system, monitoring the performance of the Hadoop cluster and raising fault alarms is essential. Ganglia and Nagios each have their own strengths in cluster fault monitoring. Combining the strengths of both and drawing on an enterprise project, I designed a relatively complete cluster fault monitoring platform.
Source
Software (《软件》)
2013, No. 12, pp. 73-77 (5 pages)
Keywords
Computer Application
Monitoring
Fault
Hadoop
Ganglia
Nagios