
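Self-learning fault diagnosis method for large-scale InfiniBand networks (大规模InfiniBand网络自学习的故障诊断方法). Cited by: 2

Published English title: Incremental learning method for fault diagnosis in large-scale InfiniBand network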
Abstract: Large-scale data center networks need effective ways to monitor abnormal network events and to locate performance bottlenecks and potential points of failure. Building on an in-depth analysis of the characteristics of InfiniBand (IB) networks, and introducing a feature selection strategy and an incremental learning strategy, this paper proposes IL_Bayes, an incremental-learning fault diagnosis method for large-scale IB networks. IL_Bayes is based on Bayesian classification with an added incremental learning mechanism, which effectively improves fault classification accuracy. The method's diagnostic accuracy and misdiagnosis rate were evaluated in the real network environment of the Tianhe-2 supercomputer; the results show that IL_Bayes achieves high fault classification accuracy and a low misdiagnosis rate.
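The abstract states only that IL_Bayes combines Bayesian classification with an incremental learning mechanism; it does not give the model itself. As a minimal sketch under that assumption, a categorical naive Bayes classifier becomes incremental by storing class and feature-value counts and folding each newly labelled fault sample into them, instead of retraining on the full history. The class `IncrementalNaiveBayes`, the discretized port-counter feature names, and the fault labels are hypothetical illustrations, not the authors' IL_Bayes.

```python
import math
from collections import defaultdict


class IncrementalNaiveBayes:
    """Categorical naive Bayes updated one labelled sample at a time.

    Hypothetical illustration of "Bayes classification + incremental
    learning"; this is NOT the IL_Bayes algorithm from the paper.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                    # Laplace smoothing constant
        self.class_counts = defaultdict(int)  # N(c)
        # feature_counts[c][f][v] = N(feature f took value v | class c)
        self.feature_counts = defaultdict(lambda: defaultdict(dict))
        self.feature_values = defaultdict(set)  # distinct values seen per feature
        self.total = 0                          # samples absorbed so far

    def partial_fit(self, features, label):
        """Absorb one labelled sample by updating counts (no retraining)."""
        self.total += 1
        self.class_counts[label] += 1
        for f, v in features.items():
            counts = self.feature_counts[label][f]
            counts[v] = counts.get(v, 0) + 1
            self.feature_values[f].add(v)

    def log_score(self, features, label):
        """Unnormalized log P(label) + sum over f of log P(f = v | label)."""
        n_c = self.class_counts[label]
        score = math.log(n_c / self.total)
        for f, v in features.items():
            seen = self.feature_values.get(f, set())
            # Number of possible values, reserving one slot for an unseen value.
            k = len(seen) + (v not in seen)
            count = self.feature_counts[label].get(f, {}).get(v, 0)
            score += math.log((count + self.alpha) / (n_c + self.alpha * k))
        return score

    def predict(self, features):
        """Return the class with the highest posterior score."""
        return max(self.class_counts, key=lambda c: self.log_score(features, c))


if __name__ == "__main__":
    model = IncrementalNaiveBayes()
    # Hypothetical discretized IB port-counter observations and fault labels.
    stream = [
        ({"symbol_err": "high", "link_downed": "yes"}, "cable_fault"),
        ({"symbol_err": "high", "link_downed": "no"}, "cable_fault"),
        ({"symbol_err": "low", "xmit_wait": "high"}, "congestion"),
        ({"symbol_err": "low", "xmit_wait": "high"}, "congestion"),
        ({"symbol_err": "low", "xmit_wait": "low"}, "normal"),
    ]
    for x, y in stream:
        model.partial_fit(x, y)  # each sample is learned as it arrives
    print(model.predict({"symbol_err": "low", "xmit_wait": "high"}))  # prints "congestion"
```

Because the model keeps only counts, absorbing a new sample costs time proportional to its number of features, which is what makes this style of classifier incremental. How IL_Bayes selects features and weights new samples is not specified in the abstract, so the smoothing and feature encoding here are assumptions.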
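Authors: Hu Yinhui (胡银辉), Chen Lin (陈琳)
Source: Journal of Computer Applications (《计算机应用》), 2015, No. 11, pp. 3092-3096 (5 pages); indexed in CSCD and the Peking University Core Journals list
Funding: National High Technology Research and Development Program of China (863 Program), project 2012AA01A50606
Keywords: data center; InfiniBand; fault diagnosis; Bayes classification; incremental learning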