Abstract
Traditional evaluation models for cluster area networks (cLAN) consider mainly latency, bandwidth, routing, congestion, and network topology. Are these factors sufficient to describe the communication behavior of real applications on a cluster, or to predict their performance on it? Extensive tests of the NAS Parallel Benchmarks (version 2.4) on DeepComp 1800 (深腾1800), a supercomputer with a Linux cluster architecture, show that the communication performance of the cluster network can be severely affected by one particular communication pattern, the LU pattern. Further investigation reveals that the network's capacity to handle the LU pattern is independent of the known factors listed above (latency, bandwidth, routing, congestion, and network topology). It is therefore necessary to revisit the evaluation model for cluster networks and to add a new performance factor that reflects this newly observed phenomenon. The results also suggest new ideas, and pose new challenges, for parallel algorithm design and for optimizing the performance of large-scale scientific computing applications on cluster systems.
Source
Journal of Software (《软件学报》)
EI
CSCD
PKU Core Journal (北大核心)
2005, No. 6, pp. 1131-1139 (9 pages)
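Funding
National Natural Science Foundation of China
National High Technology Research and Development Program of China (863 Program)
National Key Basic Research Development Program of China (973 Program)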