Conventional dynamically scheduled processors often use a fully associative structure, the load/store queue (LSQ), to forward values from older in-flight stores to loads and to detect store-load ordering violations. However, such in-flight forwarding accounts for only about 15% of all store-load communication, which makes the CAM-based microarchitecture the major bottleneck to scaling store-load communication further. This paper presents a new microarchitecture named ASW (active store window). It provides a new structure, the speculative active store window, which implements more aggressive speculative store-load forwarding than a conventional LSQ. This structure can forward the data of committed stores to executing loads without accessing the L1 data cache, which this paper refers to as far forwarding. At the back end of the pipeline, ASW uses in-order load re-execution, filtered by a tagged SSBF (store sequence Bloom filter), to verify the correctness of store-load forwarding. The speculative active store window and the tagged store sequence Bloom filter are both set-associative structures, which are more efficient and scalable than fully associative ones. Experiments show that this simpler and faster design outperforms a conventional LSQ-based design and the NoSQ design on most benchmarks, by 10.22% and 8.71%, respectively.
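A toy Python model can make far forwarding and SSBF-filtered re-execution concrete. It is a sketch under stated assumptions, not the paper's design. The following are all hypothetical:

- the table sizes, the replacement policy and word-granular addresses;
- the per-set bound kept for evicted SSBF entries;
- all class and method names (`SetAssocTable`, `ASWModel`, `commit_store`, `execute_load`, `needs_reexecution`).

Only committed stores are modeled. Each load remembers the store sequence number (SSN) its value is current for. At commit, the load is re-executed only if the SSBF reports a newer store to the same address.

```python
from collections import OrderedDict


class SetAssocTable:
    """Hypothetical set-associative table: index = addr % num_sets, tag = addr // num_sets.
    Replacement evicts the entry written least recently in the set."""

    def __init__(self, num_sets, ways):
        self.num_sets, self.ways = num_sets, ways
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def put(self, addr, payload):
        """Insert or update an entry; return the evicted (tag, payload) pair, or None."""
        entries = self.sets[addr % self.num_sets]
        tag = addr // self.num_sets
        victim = None
        if tag in entries:
            entries.pop(tag)  # re-insert so it becomes the most recently written entry
        elif len(entries) >= self.ways:
            victim = entries.popitem(last=False)
        entries[tag] = payload
        return victim

    def get(self, addr):
        return self.sets[addr % self.num_sets].get(addr // self.num_sets)


class ASWModel:
    """Hypothetical toy model of far forwarding plus SSBF-filtered load re-execution."""

    def __init__(self, num_sets=4, ways=2):
        self.window = SetAssocTable(num_sets, ways)  # committed stores: addr -> (value, ssn)
        self.ssbf = SetAssocTable(num_sets, ways)    # tagged SSBF: addr -> ssn of youngest store
        self.evicted_ssn = [0] * num_sets            # per-set max SSN of evicted SSBF entries
        self.cache = {}                              # stands in for the L1 data cache
        self.ssn = 0                                 # SSN of the youngest committed store

    def commit_store(self, addr, value):
        self.ssn += 1
        self.cache[addr] = value
        self.window.put(addr, (value, self.ssn))
        victim = self.ssbf.put(addr, self.ssn)
        if victim is not None:
            idx = addr % self.ssbf.num_sets
            self.evicted_ssn[idx] = max(self.evicted_ssn[idx], victim[1])

    def execute_load(self, addr):
        """Return (value, ssn): the value and the store sequence number it is current for."""
        hit = self.window.get(addr)
        if hit is not None:
            return hit                            # far forwarding: L1 not accessed
        return self.cache.get(addr, 0), self.ssn  # value reflects every committed store

    def needs_reexecution(self, addr, load_ssn):
        """At load commit: re-execute only if a store newer than load_ssn may have written addr."""
        last = self.ssbf.get(addr)
        if last is None:  # entry evicted or never written: fall back to the conservative bound
            last = self.evicted_ssn[addr % self.ssbf.num_sets]
        return last > load_ssn


m = ASWModel()
m.commit_store(0x10, 7)
value, ssn = m.execute_load(0x10)  # far-forwarded from the store window
print(value, ssn, m.needs_reexecution(0x10, ssn))  # 7 1 False
m.commit_store(0x10, 9)            # an older store commits after the load executed
print(m.needs_reexecution(0x10, ssn))              # True
```

In this model, a load whose value came from SSN 1 skips re-execution until an older store to the same address commits after the load executed. That commit raises the SSBF entry to SSN 2. Evicted SSBF entries fall back to a per-set upper bound. Within this model, eviction can therefore cause extra re-executions but never a missed violation.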
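GRAPES (global/regional assimilation and prediction system), a numerical weather prediction model, is a typical nonlinear discretized system of the Earth's atmosphere and carries an enormous computational load. Accelerating the GRAPES model in parallel on low-cost, low-power, high-performance GPUs has therefore become an active research topic. An initial parallel implementation of GRAPES on the GPU showed that the performance gain was unsatisfactory. On this basis, performance optimization strategies are proposed, including reducing data transfer time, reducing the number of device memory loads and stores, and avoiding thread control-flow divergence. Experimental results show that these GPU optimization strategies effectively improve the performance of the GRAPES system.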
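The third strategy, avoiding control-flow divergence, can be illustrated with a toy cost model of SIMT execution. In this model, a warp executes every distinct branch path taken by any of its threads, one path after another. The sketch illustrates that general GPU behavior and is not GRAPES code. The function name `warp_passes` and the 32-thread warp width (the NVIDIA value) are assumptions.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs (assumed target)


def warp_passes(branch_ids, warp_size=WARP_SIZE):
    """Toy SIMT cost: each warp runs every distinct branch path any of its threads takes, serially."""
    total = 0
    for start in range(0, len(branch_ids), warp_size):
        warp = branch_ids[start:start + warp_size]
        total += len(set(warp))  # divergent paths are serialized within the warp
    return total


n_threads = 128
# Branch on thread parity: every warp contains both paths, so every warp diverges.
per_thread = [tid % 2 for tid in range(n_threads)]
# Branch on warp index: all threads in a warp agree, so no warp diverges.
per_warp = [(tid // WARP_SIZE) % 2 for tid in range(n_threads)]

print(warp_passes(per_thread))  # 8 passes for 4 warps
print(warp_passes(per_warp))    # 4 passes for 4 warps
```

Regrouping work so that the branch condition is uniform within each warp halves the serialized passes in this example. The count is an intuition aid, not a timing model of GRAPES kernels.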
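The Internet of Things (IoT), an emerging technology attracting wide attention in China and abroad, is profoundly affecting production and daily life. Alongside its many benefits, it poses new challenges for information storage. An IoT information storage center must design a storage scheme suited to the characteristics of its data while drawing on the advantages of distributed real-time database storage management. The data allocation strategy, as the key technology of such a scheme, is the focus of this research. IoT sensor data are massive, spatio-temporally correlated, unevenly accessed, and continuously changing, so a time-domain-based data allocation model is needed to match them. On this basis, a dynamic data allocation strategy based on adaptive time-domain load feedback is designed, called ATDA (adaptive time domain data allocation). Based on the data characteristics, the strategy reduces the static data allocation problem to a simple linear programming problem. It then feeds back load information over an adaptive time domain, and finally uses a dynamic load threshold function to allocate data dynamically. Experiments show that, compared with the similar Random and Bubba algorithms, the strategy performs better in terms of system short-time-domain load balance (LBST) and system data migration volume (DM).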
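The abstract does not give the ATDA formulation, so the pipeline can only be sketched in Python under explicit assumptions. Each of the three stages below is an assumed model:

- **Static allocation.** The static problem is taken to be: minimize the largest capacity-normalized node load, i.e. minimize t subject to Σ_j x_j = L and 0 ≤ x_j ≤ t·c_j. Its optimum assigns load in proportion to capacity.
- **Adaptive time domain.** This is modeled as a sliding window of load samples. The window shrinks to `min_window` after a sharp cluster-wide load change and grows back by one sample per step while load is stable.
- **Dynamic threshold.** This is taken as (1 + β) times the mean smoothed load.

All names (`static_allocation`, `TimeDomainFeedback`, `migration_plan`), the 20% change trigger and the parameter values are hypothetical.

```python
from collections import deque


def static_allocation(total_load, capacities):
    """Optimum of the assumed LP: min t s.t. sum(x) == total_load, 0 <= x_j <= t * c_j.
    Equalizing x_j / c_j across nodes is optimal, i.e. load proportional to capacity."""
    cap_sum = sum(capacities)
    return [total_load * c / cap_sum for c in capacities]


class TimeDomainFeedback:
    """Hypothetical adaptive time-domain load feedback with a dynamic threshold.
    Call observe() at least once before querying loads or plans."""

    def __init__(self, n_nodes, min_window=2, max_window=8, beta=0.25):
        self.history = [deque(maxlen=max_window) for _ in range(n_nodes)]
        self.min_window, self.max_window = min_window, max_window
        self.window = max_window  # current time-domain length, in samples
        self.beta = beta          # tolerance above the mean before data is migrated

    def observe(self, loads):
        """Record one load sample per node, then adapt the time-domain length."""
        prev_total = sum(h[-1] for h in self.history) if self.history[0] else None
        for h, x in zip(self.history, loads):
            h.append(x)
        if prev_total is None:
            return
        change = abs(sum(loads) - prev_total) / max(prev_total, 1e-9)
        # A sharp cluster-wide change shortens the window (react quickly);
        # a stable period lengthens it again (smooth out noise).
        if change > 0.2:
            self.window = self.min_window
        else:
            self.window = min(self.window + 1, self.max_window)

    def smoothed(self):
        """Mean load of each node over the current time domain."""
        result = []
        for h in self.history:
            recent = list(h)[-self.window:]
            result.append(sum(recent) / len(recent))
        return result

    def threshold(self):
        """Dynamic load threshold: (1 + beta) times the mean smoothed load."""
        s = self.smoothed()
        return (1 + self.beta) * sum(s) / len(s)

    def migration_plan(self):
        """Greedy plan: each node above the threshold sheds its excess over the mean
        to the node that is currently least loaded."""
        s = self.smoothed()
        avg, limit = sum(s) / len(s), self.threshold()
        plan = []
        for src in range(len(s)):
            if s[src] > limit:
                dst = min(range(len(s)), key=s.__getitem__)
                amount = s[src] - avg
                plan.append((src, dst, amount))
                s[src] -= amount
                s[dst] += amount
        return plan


print(static_allocation(100, [1, 1, 2]))  # [25.0, 25.0, 50.0]
fb = TimeDomainFeedback(n_nodes=3)
fb.observe([10, 10, 10])
fb.observe([10, 10, 40])                  # hotspot on node 2: window shrinks to 2 samples
print(fb.window, fb.migration_plan())     # 2 [(2, 0, 10.0)]
```

The proportional rule is optimal only for this assumed min-max formulation. The shrinking window lets a sudden hotspot cross the threshold quickly. The growing window keeps short load spikes from triggering unnecessary migrations.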
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) under Grant No. 2009ZX01029-001-002 and the Postdoctoral Science Foundation of China under Grant No. 20110490208.