A Cyclic Memory Race Recording Algorithm Implemented with Hardware Signatures

Cited by: 2

Abstract: Shared-memory multithreaded programs running on chip multiprocessors execute nondeterministically, and memory race recording is the key technology for replaying such programs deterministically. Existing memory race recording schemes produce large logs and limit replay speed. To address these problems, a new cyclic point-to-point memory race recording algorithm, named CyclicMR, is proposed. CyclicMR represents each memory conflict as a current dependency and detects conflicts with small hardware signatures, so the existing cache structure needs no modification. A conflict direction detection mechanism reduces consecutive current dependencies that point in the same direction, and the resulting cyclic dependencies, which capture more transitivity, are written to the memory race log. As a result, the memory races recorded between any two threads are loop-shaped, which greatly reduces redundancy. The cyclic dependency is further optimized with an incremental instruction counter, which shrinks the memory race log even more. CyclicMR is compared with earlier mainstream approaches in simulation on an 8-core chip multiprocessor system. The results show that CyclicMR achieves a small log growth rate, low hardware overhead, and low bandwidth overhead, and that the memory race log scales well.
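As a rough illustration of signature-based conflict detection in general, the sketch below models a hardware signature as a fixed-size Bloom-filter-style bit vector. It is not taken from the paper: the class names `Signature` and `ThreadState`, the 1024-bit size, the two hash functions, and the use of BLAKE2 as the hash are all assumptions made for this example. The conflict direction detection and incremental counter are not modeled, because the abstract does not describe them in enough detail.

```python
import hashlib


class Signature:
    """Fixed-size Bloom-filter-style address signature (hypothetical model)."""

    def __init__(self, bits=1024, hashes=2):
        self.bits = bits        # assumed signature width
        self.hashes = hashes    # assumed number of hash functions
        self.field = 0          # bit vector stored as a Python int

    def _positions(self, addr):
        # Derive one bit position per hash function from the address.
        for k in range(self.hashes):
            digest = hashlib.blake2b(f"{k}:{addr}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "little") % self.bits

    def insert(self, addr):
        for p in self._positions(addr):
            self.field |= 1 << p

    def may_contain(self, addr):
        # True for every inserted address; may also be True for others
        # (false positives), but never False for an inserted address.
        return all((self.field >> p) & 1 for p in self._positions(addr))

    def clear(self):
        self.field = 0


class ThreadState:
    """Per-thread read and write signatures (hypothetical model)."""

    def __init__(self):
        self.read_sig = Signature()
        self.write_sig = Signature()

    def record(self, addr, is_write):
        # Remember a local access in the matching signature.
        (self.write_sig if is_write else self.read_sig).insert(addr)

    def conflicts_with(self, addr, is_write):
        # A remote access conflicts if it touches an address this thread
        # wrote, or if it writes an address this thread read.
        if self.write_sig.may_contain(addr):
            return True
        return is_write and self.read_sig.may_contain(addr)


if __name__ == "__main__":
    t = ThreadState()
    t.record(0x1000, is_write=False)
    t.record(0x2000, is_write=True)
    print(t.conflicts_with(0x2000, is_write=False))  # remote read of a written address
    print(t.conflicts_with(0x1000, is_write=True))   # remote write of a read address
```

Because a signature is a lossy summary, this kind of check can report conflicts that did not occur, but it never misses a real one. In a recorder, such false positives would only add unneeded log entries and would not affect replay correctness.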

Authors: Zhu Suxia, Ji Zhenzhou, Li Dong, Zhang Hao

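Affiliations: School of Computer Science and Technology, Harbin University of Science and Technology, Harbin; School of Computer Science and Technology, Harbin Institute of Technology, Harbin; Institute of Computing Technology, Chinese Academy of Sciences, Beijing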

Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2014, No. 5, pp. 1149-1157 (9 pages)

Funding: National Natural Science Foundation of China (61173024); National Basic Research Program of China (973 Program) (2011CB302501)

Keywords: chip multiprocessor; multi-core program; deterministic replay; memory race recording; conflict detection; hardware signature

Classification: TP303 [Automation and Computer Technology: Computer System Architecture]
