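Abstract
Large-scale parallel simulation is an important method for studying big-data architectures and plays an irreplaceable role in advancing big-data applications and many-core architectures. However, current simulation techniques cannot meet the needs of big-data architecture research, mainly because of slow simulation speed, complex configuration, and poor scalability. To address this problem and to evaluate the performance and power consumption of high-throughput many-core architectures for big-data applications, this paper proposes BDSim, a parallel simulation framework for big-data applications. The framework is component-based: functional components and a framework service unit together form a parallel function unit, and the mapping between components and framework service units can be configured freely according to load. To improve the efficiency of communication and synchronization between components, the paper proposes a non-blocking lock-free communication optimization and NMTRT-CMB, an optimized variant of the CMB conservative synchronization algorithm. Experiments simulating 2D-Mesh-based many-core systems at different levels of concurrency show that, compared with lock-based parallel communication, the non-blocking lock-free communication method improves parallel simulation speed by about 10%, and that, compared with the CMB synchronization algorithm, NMTRT-CMB reduces the number of null messages by more than 90%.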
Large-scale parallel simulation is an important method for big-data architecture research and plays an irreplaceable role in promoting the development of big-data applications and many-core architectures. However, current simulation techniques cannot meet the needs of big-data architecture research, mainly because of low simulation speed, complicated configuration, and poor scalability. To address these problems, this paper proposes BDSim, a highly configurable parallel simulation framework for big-data application simulation. The framework can evaluate the performance and energy consumption of high-throughput computing architectures that target big-data applications. BDSim is built on a component-based design. In BDSim, a parallel function unit consists of several function components and a framework service (FS) unit. The FS unit is the service agent for the function components attached to it. The mapping between function components and a framework service unit depends on the load of the function units. To improve communication efficiency, this paper proposes an optimized non-blocking lock-free communication method. The NMTRT-CMB synchronization algorithm, based on the CMB conservative synchronization algorithm, is also presented to improve synchronization efficiency. The experiments were conducted on many-core architectures based on a 2D-Mesh NoC at different parallel scales. According to the results, the non-blocking lock-free communication method improves simulation speed by about 10% compared with lock-based communication. NMTRT-CMB reduces null messages by almost 90% when running with 16 threads, compared with CMB.
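To make the null-message overhead mentioned in the abstract concrete, the following is a minimal sketch of the classic CMB null-message rule only, not of NMTRT-CMB, whose details the abstract does not give. It assumes a hypothetical ring of logical processes (LPs) with one input channel each, no real events, and a fixed lookahead; the function name `count_null_messages` and the ring topology are illustrative assumptions, not part of BDSim.

```python
from collections import deque


def count_null_messages(num_lps, lookahead, end_time):
    """Count null messages sent by classic CMB on a ring of LPs.

    Assumptions (illustrative, not from the paper): LP i has a single
    input channel from LP i-1, there are no real events, and each LP
    promises not to send anything earlier than its clock + lookahead.
    """
    channel_clock = [0] * num_lps  # timestamp of last message on each LP's input
    local_clock = [0] * num_lps    # simulation time each LP has safely reached
    pending = deque()              # in-flight null messages: (destination, timestamp)
    null_count = 0

    # Every LP starts by sending one null message to its successor.
    for lp in range(num_lps):
        pending.append(((lp + 1) % num_lps, local_clock[lp] + lookahead))
        null_count += 1

    while pending:
        dest, timestamp = pending.popleft()
        channel_clock[dest] = max(channel_clock[dest], timestamp)
        # With one input channel, the safe time is that channel's clock.
        safe_time = min(channel_clock[dest], end_time)
        if safe_time > local_clock[dest]:
            local_clock[dest] = safe_time
            if safe_time < end_time:
                # The LP advanced, so it sends a new null message downstream.
                pending.append(((dest + 1) % num_lps, safe_time + lookahead))
                null_count += 1

    return null_count, local_clock


if __name__ == "__main__":
    for lookahead in (1, 2):
        count, clocks = count_null_messages(num_lps=2, lookahead=lookahead, end_time=4)
        print(f"lookahead={lookahead}: {count} null messages, clocks={clocks}")
```

In this toy setting the null-message count grows as the lookahead shrinks relative to the simulated time span, which is the kind of overhead an optimized variant of CMB aims to cut.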
Source
《计算机学报》
EI
CSCD
PKU Core (Peking University core journal list)
2015, No. 10, pp. 1959-1975 (17 pages)
Chinese Journal of Computers
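Funding
National Basic Research Program of China (973 Program) (2011CB302501)
National High Technology Research and Development Program of China (863 Program) (2012AA010901, 2015AA011204)
National Science and Technology Major Project of China ("核高基": Core Electronic Devices, High-end Generic Chips and Basic Software) (2013ZX0102-8001-001-001)
National Natural Science Foundation of China (61173007, 61204047, 61332009)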