A Replay System for Performance Analysis of Multi-Threaded Programs

Cited by: 2


Abstract: Performance bugs in multi-threaded programs have become increasingly prominent in recent years. Traditional record/replay systems, which were designed to detect concurrency errors, suffer from replay overhead and imprecise replayed execution time, so they are ill-suited to studying performance bugs. To address these problems, this paper proposes PerfPlay, a replay system for the performance analysis of multi-threaded programs. First, we analyze the program information that performance analysis requires. Second, based on program execution traces, we discuss different replay strategies and propose a schedule-driven strategy that preserves the performance fidelity of the replay system. Finally, we use PerfPlay to study a classic performance problem, unnecessary inter-thread lock contention. Compared with traditional replay strategies, the experimental results show that PerfPlay preserves performance fidelity. A case study further uncovers and verifies several performance bugs in real-world multi-threaded programs.
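To make "unnecessary inter-thread lock contention" concrete, the following is a minimal sketch and not PerfPlay's actual algorithm, which the abstract does not describe. It assumes a hypothetical trace format in which each critical section records its thread, its lock, and the addresses it reads and writes. Two sections are flagged when different threads serialize on the same lock although their accesses do not conflict. The names `CriticalSection`, `conflicts`, and `unnecessary_contention` are illustrative assumptions.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass(frozen=True)
class CriticalSection:
    """One lock-protected region from a recorded trace (hypothetical format)."""
    thread: int
    lock: str
    reads: frozenset = field(default_factory=frozenset)
    writes: frozenset = field(default_factory=frozenset)


def conflicts(a: CriticalSection, b: CriticalSection) -> bool:
    """True if both sections touch a common address and at least one writes it."""
    return bool(a.writes & (b.reads | b.writes)) or bool(b.writes & a.reads)


def unnecessary_contention(trace):
    """Return pairs of sections that share a lock across threads without conflicting."""
    pairs = []
    for a, b in combinations(trace, 2):
        if a.lock == b.lock and a.thread != b.thread and not conflicts(a, b):
            pairs.append((a, b))
    return pairs


# Assumed example trace: three threads all take lock "m".
trace = [
    CriticalSection(thread=1, lock="m", writes=frozenset({"x"})),
    CriticalSection(thread=2, lock="m", writes=frozenset({"y"})),  # disjoint from thread 1
    CriticalSection(thread=3, lock="m", reads=frozenset({"x"})),   # conflicts with thread 1
]

for a, b in unnecessary_contention(trace):
    print(f"threads {a.thread} and {b.thread} serialize on lock {a.lock!r} "
          "without sharing data")
```

In this sketch, threads 1 and 3 genuinely need the lock because one writes `x` while the other reads it. The other pairs only wait on each other, which is the kind of serialization a performance-oriented replay could remove in order to estimate the potential speedup.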

Authors: Zheng Long (郑龙), Liao Xiaofei (廖小飞), Wu Song (吴松), Jin Hai (金海)

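Affiliations: Key Laboratory of Services Computing Technology and System, Ministry of Education (Huazhong University of Science and Technology); School of Computer Science and Technology, Huazhong University of Science and Technology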

Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2015, No. 1, pp. 45-55 (11 pages)

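Funding: National Natural Science Foundation of China (61272408, 61322210); Specialized Research Fund for the Doctoral Program of Higher Education of China (20130142110048); National High Technology Research and Development Program of China (863 Program) (2012AA010905)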

Keywords: performance bug; replay; case study; multi-threaded; unnecessary lock contention

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]

