
A Replay System for Performance Analysis of Multi-Threaded Programs

Cited by: 2
Abstract: In recent years, performance bugs in multi-threaded programs have become increasingly prominent, and detecting them has become a hot topic in program analysis. Traditional record/replay systems, which were designed to detect concurrency errors, suffer from replay overhead and imprecise replay-based execution times, which makes them unsuitable for studying performance bugs. To address these problems, this paper proposes PerfPlay, a replay system for the performance analysis of multi-threaded programs. First, we identify and analyze the program information required for performance analysis. Second, based on program execution traces, we discuss different replay strategies and propose a schedule-driven replay strategy that ensures the performance fidelity of the replay system. Finally, using PerfPlay, we study the classic performance problem of unnecessary lock contention between threads. Compared with traditional replay strategies, our experimental results demonstrate the performance fidelity of PerfPlay. Through case studies, we discovered and further verified several real-world performance bugs in multi-threaded programs, confirming the effectiveness of PerfPlay.
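The notion of "unnecessary lock contention" can be illustrated with a small sketch. It assumes a hypothetical trace format in which each recorded critical section lists its thread, the lock it holds, and the shared addresses it reads and writes, and it applies one common criterion: two critical sections from different threads that hold the same lock but have no write-write or read-write overlap are serialized by the lock without needing to be. This is not PerfPlay's implementation; the names `CriticalSection`, `conflicts`, and `unnecessary_contention` are illustrative, and a real analysis would also check that the two sections actually overlapped in time in the recorded schedule.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass(frozen=True)
class CriticalSection:
    """One recorded critical section (hypothetical trace format)."""
    thread: int
    lock: str
    reads: frozenset = field(default_factory=frozenset)
    writes: frozenset = field(default_factory=frozenset)


def conflicts(a: CriticalSection, b: CriticalSection) -> bool:
    """True if both sections touch a common address and at least one writes it."""
    return bool(
        (a.writes & b.writes)
        or (a.writes & b.reads)
        or (a.reads & b.writes)
    )


def unnecessary_contention(trace):
    """Return pairs of sections from different threads that hold the same lock
    but have no data conflict, i.e. the lock serializes them needlessly.
    Timing is ignored here: every pair in the trace is compared."""
    pairs = []
    for a, b in combinations(trace, 2):
        if a.thread != b.thread and a.lock == b.lock and not conflicts(a, b):
            pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    trace = [
        CriticalSection(1, "m", reads=frozenset({"x"})),
        CriticalSection(2, "m", reads=frozenset({"x"})),   # read/read on x: no conflict
        CriticalSection(3, "m", writes=frozenset({"y"})),  # disjoint from x
        CriticalSection(4, "m", writes=frozenset({"x"})),  # conflicts with T1 and T2
    ]
    for a, b in unnecessary_contention(trace):
        print(f"T{a.thread} and T{b.thread} contend on '{a.lock}' without a data conflict")
```

In this toy trace, the read-only sections of T1 and T2 and the section of T3 (which only touches `y`) are flagged against each other, while T4, which writes `x`, is correctly not paired with T1 or T2.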
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2015, No. 1, pp. 45-55 (11 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
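Funding: National Natural Science Foundation of China (61272408, 61322210); Specialized Research Fund for the Doctoral Program of Higher Education (20130142110048); National High Technology Research and Development Program of China ("863" Program) (2012AA010905)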
Keywords: performance bug; replay; case study; multi-threaded; unnecessary lock contention