A Concurrent Memory Race Recording Algorithm for Snoop-Based Coherence (面向监听一致性协议的并发内存竞争记录算法)
Abstract: Memory race recording is a key technique for addressing the execution nondeterminism of multi-core programs. However, existing point-to-point memory race recorders incur high hardware overhead, which makes them difficult to apply to practical chip multiprocessor (CMP) systems. To reduce this overhead, the paper proposes a point-to-point memory race recording algorithm based on a concurrent logging strategy for CMPs that use a snoop-based cache coherence protocol. When a memory conflict is detected, the algorithm concurrently records the current execution points of all threads, extending the point-to-point race relationship between two threads to all threads during the recording phase, which significantly reduces hardware overhead. It uses a distributed logging mechanism in which each thread keeps a memory race log made up of one side of each race relationship, reducing bandwidth overhead without increasing the size of the memory race log. During replay, the algorithm uses a simplified producer-consumer model and introduces a counting semaphore for each processor core to ensure deterministic replay, improving replay speed and reducing coherence bandwidth overhead. Simulation results on an 8-core CMP system show that the algorithm adds about 171 B of hardware per processor core, records about 2.3 B of log per thousand memory instructions, and adds less than 1.5% interconnect bandwidth overhead in both the recording and replay phases.
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list, 2016, No. 6, pp. 1238-1248 (11 pages)
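Funding: National Natural Science Foundation of China, Young Scientists Fund (61502123); National Natural Science Foundation of China (61173024); National Basic Research Program of China (973 Program) (2011CB302501); Heilongjiang Provincial Youth Science Foundation (QC2015084); China Postdoctoral Science Foundation (2015M571429)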
Keywords: chip multiprocessor (CMP); multi-core program; deterministic replay; memory race recording; memory conflict detection; snoop-based coherence protocol
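
The recording idea outlined in the abstract (on a detected memory conflict, capture the current execution points of all threads, and store the entry in one thread's own log) can be illustrated with a small software model. The sketch below is not the paper's hardware algorithm: the trace format, the function names `record`, `replay`, and `execute`, and the log layout are all assumptions made for illustration. It also swaps the per-core counting semaphores of the replay phase for a plain sequential scheduler that stalls a thread until every other thread has reached the logged execution point.

```python
from collections import defaultdict

def record(trace, n_threads):
    """Build one race log per thread from a globally ordered trace.

    trace: list of (tid, op, addr, value) with op in {'R', 'W'} (hypothetical format).
    A log entry is (own_ic, snapshot): before running its instruction number
    own_ic, the thread must see every other thread t at snapshot[t] or later.
    """
    ic = [0] * n_threads                 # instructions completed per thread
    last_writer = {}                     # addr -> tid of the most recent writer
    readers = defaultdict(set)           # addr -> tids that read since last write
    logs = [[] for _ in range(n_threads)]
    for tid, op, addr, _ in trace:
        w = last_writer.get(addr)
        conflict = w is not None and w != tid            # RAW or WAW
        if op == 'W' and any(r != tid for r in readers[addr]):
            conflict = True                              # WAR
        if conflict:
            # Concurrent strategy: snapshot the execution points of all threads.
            logs[tid].append((ic[tid], tuple(ic)))
        if op == 'W':
            last_writer[addr] = tid
            readers[addr].clear()
        else:
            readers[addr].add(tid)
        ic[tid] += 1
    return logs

def replay(programs, logs):
    """Return an execution order [(tid, index), ...] that respects the logs."""
    n = len(programs)
    ic = [0] * n
    pending = [list(log) for log in logs]
    order = []
    progressed = True
    while progressed:
        progressed = False
        for tid in range(n):
            while ic[tid] < len(programs[tid]):
                if pending[tid] and pending[tid][0][0] == ic[tid]:
                    snap = pending[tid][0][1]
                    # Stall until all other threads reached the logged points.
                    if any(ic[t] < snap[t] for t in range(n) if t != tid):
                        break
                    pending[tid].pop(0)
                order.append((tid, ic[tid]))
                ic[tid] += 1
                progressed = True
    return order

def execute(ops):
    """Run (tid, op, addr, value) operations in order; return memory and per-thread reads."""
    mem, reads = {}, defaultdict(list)
    for tid, op, addr, value in ops:
        if op == 'W':
            mem[addr] = value
        else:
            reads[tid].append(mem.get(addr, 0))
    return mem, dict(reads)
```

In this model, a replay is considered correct when running the operations in the order returned by `replay` yields the same final memory and the same values read by each thread as the recorded trace. The sketch logs every conflicting access without any reduction, so it says nothing about the log sizes or hardware costs reported in the abstract.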