
A Data Distribution-Consistency-Based Estimation Method for Multiplexing Processor Hardware Performance Counters

Cited by: 4
Abstract: The number of processor hardware events that can be recorded simultaneously is limited by the number of hardware performance counters. Mainstream processors support a large number of hardware events (hundreds), but because on-chip register resources are limited, they provide only a few hardware performance counters (usually 6 to 12). To ease this mismatch, hardware counter multiplexing (MPX) time-shares a small number of counter registers to estimate a large number of hardware events. In practice, however, existing MPX estimation algorithms based on time locality have low accuracy, so MPX has not been widely adopted. To improve MPX accuracy, this work makes three contributions: 1) using the Kolmogorov-Smirnov normality test, it finds that for the same hardware event and the same code, the data collected in one counter one event (OCOE) mode and in MPX mode follow consistent distributions; 2) based on this regularity, it proposes the outline estimation (OLE) method; 3) it implements OLE in the open-source MPX library NeoMPX and validates it on mainstream X86 and ARM processors. Experimental results show that, when 16 hardware events are collected simultaneously, OLE improves accuracy over PAPI's default MPX estimation algorithm by about 10.5% on average and by up to 46.6%, and improves accuracy by 18.8% and 17.7%, respectively, over existing state-of-the-art MPX estimation algorithms.
Authors: Lin Xinhua; Wang Jie; Wang Yichao; Zuo Sicheng (High Performance Computing Center, Shanghai Jiao Tong University, Shanghai 200240)
Source: Journal of Computer Research and Development (EI, CSCD, Peking University Core Journals), 2022, Issue 6, pp. 1192-1201 (10 pages)
Funding: National Natural Science Foundation of China (62072300).
Keywords: processor hardware performance counters; multiplexing (MPX); performance profiling; high performance computing; estimation method
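
The abstract attributes the low accuracy of existing MPX estimators to their reliance on time locality. The sketch below is only an illustration of that baseline idea, not the OLE algorithm and not the PAPI or NeoMPX implementation: it simulates round-robin sharing of a few counters among more events and scales each observed count by the fraction of time slices in which the event was actually counted. The function name `multiplex_estimate`, the per-slice input format, and the sample data are all assumptions made for this example.

```python
from typing import Dict, List


def multiplex_estimate(per_slice: Dict[str, List[int]],
                       num_counters: int) -> Dict[str, float]:
    """Estimate whole-run event totals under round-robin multiplexing.

    per_slice maps a (hypothetical) event name to its true count in each
    time slice. Events are split into groups of at most num_counters;
    in slice t only group (t mod number_of_groups) is counted.
    """
    events = list(per_slice)
    n_slices = len(per_slice[events[0]])
    groups = [events[i:i + num_counters]
              for i in range(0, len(events), num_counters)]

    observed = {e: 0 for e in events}  # counts seen while scheduled
    running = {e: 0 for e in events}   # number of slices scheduled

    for t in range(n_slices):
        for e in groups[t % len(groups)]:
            observed[e] += per_slice[e][t]
            running[e] += 1

    # Linear time scaling: assume the unobserved slices behave like the
    # observed ones. An event that was never scheduled gets 0.0.
    return {e: (observed[e] * n_slices / running[e]) if running[e] else 0.0
            for e in events}


if __name__ == "__main__":
    # Illustrative data: one steady event and one bursty event, 8 slices.
    data = {
        "steady": [100] * 8,                      # true total 800
        "bursty": [0, 0, 0, 0, 0, 0, 0, 1000],    # true total 1000
    }
    est = multiplex_estimate(data, num_counters=1)
    for name, counts in data.items():
        print(name, "true:", sum(counts), "estimated:", est[name])
```

With one counter shared by two events, the steady event is recovered exactly, while the bursty event is overestimated by a factor of two because its single burst happens to fall in a slice where it was being counted. This is the kind of error that motivates estimators which use more than time scaling.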
