
Research on On-Chip Synchronization Mechanisms and Evaluation Methods for Many-Core Processors (cited by: 10)

On Synchronization and Evaluation Method of Chipped Many-Core Processor
Abstract: Synchronization mechanisms are critical to correct execution and cooperative communication on on-chip multi-core and many-core processors, and their efficiency strongly affects processor performance. For on-chip many-core architectures, this paper proposes and implements two coarse-grained synchronization mechanisms and one fine-grained synchronization mechanism: a mechanism supported by dedicated on-chip hardware, an on-chip mutual-exclusion mechanism based on atomic primitives, and a fine-grained mechanism based on full/empty bits. It also proposes evaluation criteria and an evaluation method for coarse-grained synchronization and designs quantitative micro-benchmarks for that purpose. The hardware-supported mechanism and the primitive-based mechanism are evaluated and compared on two platforms: the simulator of Godson-T, a homogeneous on-chip many-core processor, and a commercial AMD Opteron on-chip multi-core processor programmed with the Pthreads multithreaded programming model. The results show that hardware support markedly improves synchronization performance on on-chip many-core processors, and that in traditional primitive-based synchronization most of the performance loss is waiting time caused by load imbalance and by serialization at synchronization points.
Source: Chinese Journal of Computers (《计算机学报》; indexed in EI, CSCD, and the Peking University Core Journals list), 2010, No. 10, pp. 1777-1787 (11 pages)
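Funding: Supported by the Key Program of the National Natural Science Foundation of China (60736012), the National Basic Research Program of China (973 Program) (2005CB321600), the National High Technology Research and Development Program of China (863 Program) (2009AA01Z103), the National Science Fund for Distinguished Young Scholars (60925009), the Science Fund for Creative Research Groups of the National Natural Science Foundation of China (60921002), and the Beijing Natural Science Foundation (4092044).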
Keywords: on-chip many-core processor; synchronization; hardware support; quantitative evaluation; micro-benchmark
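
The abstract attributes most of the loss in primitive-based synchronization to waiting caused by load imbalance and by serialization at the synchronization point. The following is a minimal sketch of that effect, not the paper's micro-benchmark: it uses Python rather than Pthreads, a lock stands in for the atomic primitive, and the names `SenseReversingBarrier`, `run`, and `skew` are illustrative assumptions. The dedicated hardware mechanism of Godson-T is not modeled.

```python
import threading
import time


class SenseReversingBarrier:
    """Centralized barrier. A lock-protected counter stands in for an atomic
    fetch-and-decrement (assumption); waiters spin on a shared sense flag."""

    def __init__(self, n_threads):
        self.n_threads = n_threads
        self.remaining = n_threads
        self.sense = False
        self._lock = threading.Lock()
        self._local = threading.local()

    def wait(self):
        """Block until all threads arrive; return the seconds spent waiting."""
        my_sense = not getattr(self._local, "sense", False)
        self._local.sense = my_sense
        start = time.perf_counter()
        with self._lock:  # arrivals are serialized here
            self.remaining -= 1
            if self.remaining == 0:
                self.remaining = self.n_threads  # reset for the next phase
                self.sense = my_sense            # release all waiters
        while self.sense != my_sense:
            time.sleep(0)  # yield the interpreter while spinning
        return time.perf_counter() - start


def run(n_threads=4, n_phases=3, skew=0.005):
    """Run imbalanced phases; return per-thread wait totals and an event log."""
    barrier = SenseReversingBarrier(n_threads)
    waits = [0.0] * n_threads
    log = []  # (phase, thread id), in the order the work finished
    log_lock = threading.Lock()

    def worker(tid):
        for phase in range(n_phases):
            time.sleep(skew * tid)  # thread 0 does the least work
            with log_lock:
                log.append((phase, tid))
            waits[tid] += barrier.wait()

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return waits, log


if __name__ == "__main__":
    waits, _ = run()
    for tid, w in enumerate(waits):
        print(f"thread {tid}: waited {w * 1000:.1f} ms at the barrier")
```

In this sketch the lightly loaded threads accumulate the most barrier wait time, which is the load-imbalance cost the abstract refers to; the lock around the counter is where arrivals serialize.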

References (21)

  • 1. Asanovic K et al. The landscape of parallel computing research: A view from Berkeley. UC Berkeley: Technical Report No. UCB/EECS-2006-183, 2006.
  • 2. Almasi G, Cascaval C, Castanos J G, Denneau M, Lieber D, Moreira J E, Warren H S, Jr. Dissecting Cyclops: A detailed analysis of a multithreaded architecture. ACM SIGARCH Computer Architecture News, 2003, 31(1): 26-38.
  • 3. Kongetira P, Aingaran K et al. Niagara: A 32-way multithreaded SPARC processor. IEEE Micro, 2005, 25(2): 21-29.
  • 4. Seiler Larry, Carmean Doug et al. Larrabee: A many-core x86 architecture for visual computing//Proceedings of the International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '08). Los Angeles, California, 2008.
  • 5. Jiang D, Singh J P. A methodology and an evaluation of the SGI Origin 2000//Proceedings of the ACM SIGMETRICS '98/PERFORMANCE '98. Madison, Wisconsin, United States, 1998: 171-181.
  • 6. Eichenberger A E, Abraham S G. Impact of load imbalance on the design of software barriers//Proceedings of the 1995 International Conference on Parallel Processing. 1995: 63-72.
  • 7. Lim B H, Agarwal A. Reactive synchronization algorithms for multiprocessors//Proceedings of the Architectural Support for Programming Languages and Operating Systems. San Jose, California, 1994: 25-35.
  • 8. Martin R P, Vahdat A M et al. Effect of communication latency, overhead, and bandwidth on a cluster architecture//Proceedings of the 24th Annual International Symposium on Computer Architecture. Denver, Colorado, United States, 1997: 85-97.
  • 9. Mellor-Crummey J M, Scott M L. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Transactions on Computer Systems, 1991, 9(1): 21-65.
  • 10. Mellor-Crummey J M, Scott M L. Synchronization without contention//Proceedings of the Architectural Support for Programming Languages and Operating Systems. Santa Clara, California, 1991: 269-278.


