
A Cyclic Memory Race Recording Algorithm Implemented with Hardware Signatures (cited by 2)
Abstract: Shared-memory multithreaded programs running on chip multiprocessors tend to be nondeterministic. Two-phase deterministic record-replay is an effective approach to this problem, and memory race recording is the key technology for replaying multithreaded programs deterministically. Existing memory race recording mechanisms produce large logs and limit replay speed, so it is important to develop an efficient scheme with both a low log growth rate and fast replay. This paper proposes CyclicMR, a cyclic memory race recording algorithm based on the point-to-point logging approach. CyclicMR represents each memory race as a new current dependency and detects races with small hardware signatures rather than the cache, so the existing cache structure needs no modification. A conflict direction detection mechanism reduces consecutive memory races that have the same direction, and the algorithm records a new cyclic dependency, which captures more transitivity, to the memory race log. As a result, all memory races recorded between any two threads are loop-shaped, which significantly reduces redundancy. The cyclic dependency is further optimized with an incremental instruction counter, which shrinks the memory race log considerably. In simulation on an 8-core chip multiprocessor system, CyclicMR is compared with earlier mainstream approaches. The results show that CyclicMR effectively reduces the memory race log while adding few hardware resources, achieving a small log growth rate with low hardware and bandwidth overhead, and that its memory race log scales well.
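To make the mechanisms named in the abstract more concrete, the following is a minimal Python sketch under stated assumptions; it is not CyclicMR's implementation. The signature width, hash scheme, log-entry layout, and every name (`Signature`, `conflicts`, `reduce_dependencies`, `delta_encode`, `delta_decode`) are hypothetical. The reduction step is a generic per-thread-pair transitive reduction standing in for the paper's conflict direction detection, and the incremental instruction counter is modeled simply as delta encoding of destination instruction counts.

```python
"""Hypothetical sketch: signature-based conflict detection, log reduction, delta counters.

Assumptions (not from the paper): a signature is a Bloom filter of SIG_BITS bits
with NUM_HASHES positions per address; a dependency (src, src_ic, dst, dst_ic)
means instruction src_ic of thread src happened before instruction dst_ic of dst.
"""
import hashlib
from collections.abc import Iterator

SIG_BITS = 256   # assumed signature width
NUM_HASHES = 2   # assumed number of hash positions per address

Dep = tuple[int, int, int, int]


class Signature:
    """Bloom-filter address set: false positives possible, no false negatives."""

    def __init__(self) -> None:
        self.bits = 0

    def _positions(self, addr: int) -> Iterator[int]:
        for i in range(NUM_HASHES):
            digest = hashlib.sha256(f"{i}:{addr}".encode()).digest()
            yield int.from_bytes(digest[:4], "little") % SIG_BITS

    def insert(self, addr: int) -> None:
        for pos in self._positions(addr):
            self.bits |= 1 << pos

    def may_contain(self, addr: int) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(addr))


def conflicts(read_sig: Signature, write_sig: Signature, addr: int, is_write: bool) -> bool:
    """A remote write conflicts with local reads or writes; a remote read only with local writes."""
    if is_write:
        return read_sig.may_contain(addr) or write_sig.may_contain(addr)
    return write_sig.may_contain(addr)


def reduce_dependencies(deps: list[Dep]) -> list[Dep]:
    """Drop dependencies implied by an edge already kept for the same (src, dst) pair.

    Assumes deps are listed in the order destinations observed them, so each
    destination's dst_ic is non-decreasing. A kept edge (i_k -> j_k) with
    i_k >= i and j_k <= j implies a new edge (i -> j) through program order.
    """
    last_src_ic: dict[tuple[int, int], int] = {}
    kept: list[Dep] = []
    for src, src_ic, dst, dst_ic in deps:
        if last_src_ic.get((src, dst), -1) >= src_ic:
            continue  # already implied, so not logged
        last_src_ic[(src, dst)] = src_ic
        kept.append((src, src_ic, dst, dst_ic))
    return kept


def delta_encode(deps: list[Dep]) -> list[Dep]:
    """Store each dst_ic as its increment over the previous entry of the same destination."""
    last: dict[int, int] = {}
    out: list[Dep] = []
    for src, src_ic, dst, dst_ic in deps:
        out.append((src, src_ic, dst, dst_ic - last.get(dst, 0)))
        last[dst] = dst_ic
    return out


def delta_decode(entries: list[Dep]) -> list[Dep]:
    """Invert delta_encode by accumulating increments per destination thread."""
    last: dict[int, int] = {}
    out: list[Dep] = []
    for src, src_ic, dst, delta in entries:
        last[dst] = last.get(dst, 0) + delta
        out.append((src, src_ic, dst, last[dst]))
    return out


if __name__ == "__main__":
    deps = [(0, 10, 1, 5), (0, 8, 1, 7), (0, 20, 1, 30), (1, 31, 0, 25)]
    kept = reduce_dependencies(deps)
    print(kept)  # [(0, 10, 1, 5), (0, 20, 1, 30), (1, 31, 0, 25)]
    assert delta_decode(delta_encode(kept)) == kept
```

Because a Bloom-filter signature never reports a false negative, every real conflict is detected; in a point-to-point scheme of this kind, a false positive only adds an unnecessary (but still execution-consistent) dependency, which over-constrains replay without making it wrong. This is the trade-off that lets small signatures stand in for cache-based conflict detection. Delta encoding helps because increments between consecutive log entries are usually much smaller than absolute instruction counts and so need fewer bits.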
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2014, No. 5, pp. 1149-1157 (9 pages)
Funding: National Natural Science Foundation of China (61173024); National Basic Research Program of China ("973" Program) (2011CB302501)
Keywords: chip multiprocessor; multi-core program; deterministic replay; memory race recording; conflict detection; hardware signature

References (16)

  • [1] Pancake C M, Paula S U. A bibliography of parallel debuggers [J]. ACM SIGPLAN Notices, 1991, 26(1): 21-37.
  • [2] Bhansali S, Chen W, De J S, et al. Framework for instruction-level tracing and analysis of programs [C]//Proc of the 2nd Int Conf on Virtual Execution Environments (VEE'06). New York: ACM, 2006: 154-163.
  • [3] Netzer R H B. Optimal tracing and replay for debugging shared-memory parallel programs [C]//Proc of the 1993 ACM/ONR Workshop on Parallel and Distributed Debugging (PADD'93). New York: ACM, 1993: 1-11.
  • [4] Srinivasan S, Kandula S, Andrews C. Flashback: A lightweight extension for rollback and deterministic replay for software debugging [C]//Proc of the Annual Conf on USENIX Annual Technical Conference (ATEC'04). Berkeley: USENIX Association, 2004: 3.
  • [5] Dunlap G, Lucchetti D, Fetterman M, et al. Execution replay of multiprocessor virtual machines [C]//Proc of the 4th ACM SIGPLAN/SIGOPS Int Conf on Virtual Execution Environments (VEE'08). New York: ACM, 2008: 121-130.
  • [6] Xu M, Bodik R, Hill M D. A "flight data recorder" for enabling full-system multiprocessor deterministic replay [C]//Proc of the 30th Annual Int Symp on Computer Architecture (ISCA'03). New York: ACM, 2003: 122-135.
  • [7] Prvulovic M. CORD: Cost-effective (and nearly overhead-free) order recording and data race detection [C]//Proc of the 12th Int Symp on High-Performance Computer Architecture (HPCA'06). New York: ACM, 2006: 232-243.
  • [8] Xu M, Bodik R, Hill M D. A regulated transitive reduction (RTR) for longer memory race recording [C]//Proc of the 12th Int Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS'06). New York: ACM, 2006: 49-60.
  • [9] Zhu Suxia, Ji Zhenzhou, Liu Tao, Wang Qing, Zhang Hao. Research on memory race recording mechanisms for deterministic replay of multi-core programs [J]. Acta Electronica Sinica, 2011, 39(12): 2748-2754 (in Chinese). Cited by 3.
  • [10] Narayanasamy S, Pereira C, Calder B. Recording shared memory dependencies using strata [C]//Proc of the 12th Int Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS'06). New York: ACM, 2006: 229-240.
