Abstract
To address the problems of effectively monitoring abnormal network events and locating performance bottlenecks and potential points of failure in large-scale data center networks, this paper analyzes the characteristics of InfiniBand (IB) networks in depth, introduces a feature selection strategy and an incremental learning strategy, and proposes IL_Bayes, an incremental-learning fault diagnosis method for large-scale IB networks. The method builds on Bayesian classification and adds an incremental learning mechanism, which effectively improves fault classification accuracy. Its diagnostic accuracy and misdiagnosis rate were verified in the real network environment of Tianhe-2, and the results show that IL_Bayes achieves relatively high fault classification accuracy and a relatively low misdiagnosis rate.
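The general idea of a Bayesian classifier with incremental updates can be illustrated with a minimal sketch. This is not the paper's IL_Bayes implementation: the class name `IncrementalNaiveBayes`, the Laplace smoothing constant `alpha`, and the feature values and fault labels in the demo are all hypothetical, and the feature selection step mentioned in the abstract is omitted. The sketch only shows how per-class counts can absorb new labeled samples without retraining from scratch.

```python
import math
from collections import defaultdict


class IncrementalNaiveBayes:
    """Categorical naive Bayes whose counts are updated one sample at a time.

    Illustrative only; not the IL_Bayes algorithm from the paper.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (an assumption)
        self.class_counts = defaultdict(int)
        # feature_counts[label][feature_index][value] -> count
        self.feature_counts = defaultdict(
            lambda: defaultdict(lambda: defaultdict(int))
        )
        # Distinct values seen per feature index, used for smoothing
        self.feature_values = defaultdict(set)

    def update(self, features, label):
        """Fold one labeled sample into the counts (the incremental step)."""
        self.class_counts[label] += 1
        for i, value in enumerate(features):
            self.feature_counts[label][i][value] += 1
            self.feature_values[i].add(value)

    def predict(self, features):
        """Return the label with the highest smoothed log-posterior."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            score = math.log(n_label / total)  # log prior
            for i, value in enumerate(features):
                count = self.feature_counts[label][i].get(value, 0)
                n_values = len(self.feature_values[i] | {value})
                score += math.log(
                    (count + self.alpha) / (n_label + self.alpha * n_values)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical samples: (error-counter level, link state) -> fault label
model = IncrementalNaiveBayes()
model.update(("high_errors", "link_down"), "cable_fault")
model.update(("high_errors", "link_up"), "cable_fault")
model.update(("low_errors", "link_up"), "normal")
model.update(("low_errors", "link_up"), "normal")
print(model.predict(("high_errors", "link_down")))
```

Because only counts are stored, each newly diagnosed sample costs one `update` call, which is the property that makes this kind of classifier attractive for a continuously monitored network.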
Source
《计算机应用》
CSCD
Peking University Core Journals (北大核心)
2015, No. 11, pp. 3092-3096 (5 pages)
Journal of Computer Applications
Funding
National High Technology Research and Development Program of China (863 Program) (2012AA01A50606)