Research on On-Chip Synchronization Mechanisms and Evaluation Methods for Many-Core Processors (cited by: 10)

On Synchronization and Evaluation Method of Chipped Many-Core Processor

Abstract: Synchronization mechanisms are critical to correct execution and cooperative communication on on-chip multi-core and many-core processors, and their efficiency strongly affects processor performance. For on-chip many-core architectures, this paper proposes and implements three synchronization mechanisms: two coarse-grained mechanisms, one supported by dedicated on-chip hardware and one providing on-chip mutually exclusive access based on atomic primitives, and a fine-grained mechanism based on full/empty bits. It also proposes evaluation criteria and methods for coarse-grained synchronization and designs micro-benchmarks for quantitative evaluation. The hardware-supported and primitive-based mechanisms are evaluated and compared on a simulator of the homogeneous on-chip many-core processor Godson-T and on a commercial AMD Opteron on-chip multi-core processor, using the Pthreads multithreaded programming model. The results show that hardware support markedly improves synchronization performance on on-chip many-core processors, and that in traditional primitive-based synchronization most of the performance loss comes from waiting time caused by load imbalance and by serialized operations at synchronization points.
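As an illustration of what a coarse-grained, atomic-instruction-based mechanism can look like on a Pthreads platform, the sketch below implements a centralized sense-reversing barrier with C11 atomics. It is not the authors' implementation: the names `spin_barrier_t`, `spin_barrier_wait`, and `run_barrier_demo` are hypothetical, and the barrier algorithm was chosen only because it makes the single serialized update at the synchronization point easy to see.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical centralized sense-reversing barrier (illustration only). */
typedef struct {
    atomic_int remaining;  /* threads that have not yet arrived this episode */
    atomic_bool sense;     /* flips each time the barrier opens */
    int nthreads;
} spin_barrier_t;

static void spin_barrier_init(spin_barrier_t *b, int nthreads) {
    atomic_init(&b->remaining, nthreads);
    atomic_init(&b->sense, false);
    b->nthreads = nthreads;
}

/* local_sense is per-thread state and must start as false. */
static void spin_barrier_wait(spin_barrier_t *b, bool *local_sense) {
    *local_sense = !*local_sense;
    /* Every arrival updates the same counter: this is the serialized step. */
    if (atomic_fetch_sub(&b->remaining, 1) == 1) {
        /* Last arrival: reset the counter first, then release the waiters. */
        atomic_store(&b->remaining, b->nthreads);
        atomic_store(&b->sense, *local_sense);
    } else {
        /* Early arrivals wait here; load imbalance lengthens this loop. */
        while (atomic_load(&b->sense) != *local_sense)
            sched_yield();
    }
}

typedef struct {
    spin_barrier_t *barrier;
    atomic_int *counter;
    atomic_int *violations;
    int nthreads;
    int rounds;
} worker_arg_t;

static void *worker(void *p) {
    worker_arg_t *a = p;
    bool local_sense = false;
    for (int r = 0; r < a->rounds; r++) {
        atomic_fetch_add(a->counter, 1);
        spin_barrier_wait(a->barrier, &local_sense);
        /* No thread increments between the two barriers, so the count is exact. */
        if (atomic_load(a->counter) != (r + 1) * a->nthreads)
            atomic_fetch_add(a->violations, 1);
        spin_barrier_wait(a->barrier, &local_sense);
    }
    return NULL;
}

/* Runs the demo and returns how many post-barrier checks failed. */
int run_barrier_demo(int nthreads, int rounds) {
    assert(nthreads > 0 && rounds >= 0);
    spin_barrier_t barrier;
    atomic_int counter, violations;
    spin_barrier_init(&barrier, nthreads);
    atomic_init(&counter, 0);
    atomic_init(&violations, 0);
    worker_arg_t arg = { &barrier, &counter, &violations, nthreads, rounds };
    pthread_t *tids = malloc((size_t)nthreads * sizeof *tids);
    assert(tids != NULL);
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, worker, &arg);
        assert(rc == 0);
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    free(tids);
    return atomic_load(&violations);
}
```

Calling `run_barrier_demo(4, 200)` from a `main` function starts four threads that cross the barrier 400 times in total per thread; a return value of 0 means every thread saw all increments from the current round after the barrier. A hardware-supported barrier would replace the shared counter and the spin loop with a dedicated on-chip mechanism, which is the kind of comparison the abstract describes.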

Authors: Xu Weizhi (徐卫志), Song Fenglong (宋风龙), Liu Zhiyong (刘志勇), Fan Dongrui (范东睿), Yu Lei (余磊), Zhang Shuai (张帅)

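Affiliations: Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences; Graduate University of Chinese Academy of Sciences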

Source: Chinese Journal of Computers (《计算机学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2010, Issue 10, pp. 1777-1787 (11 pages)

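Funding: Supported by the Key Program of the National Natural Science Foundation of China (60736012); the National "973" Key Basic Research Development Program (2005CB321600); the National "863" High-Tech Research and Development Program (2009AA01Z103); the National Science Fund for Distinguished Young Scholars (60925009); the Science Fund for Innovative Research Groups of the National Natural Science Foundation of China (60921002); and the Beijing Natural Science Foundation (4092044).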

Keywords: on-chip many-core processor; synchronization; hardware support; quantitative evaluation; micro-benchmark

Classification: TP302 [Automation and Computer Technology: Computer System Architecture]
