Event Level Parallelization Research of BESⅢ Experimental Software

Abstract: Job-level parallelization of the BESⅢ experimental software consumes a great deal of memory, and sequence-level parallelization requires a complex sorting process. To address these drawbacks, this paper proposes an event-level parallelization scheme. Because the data of each event are independent, data parallelism is adopted. Locking is coarse-grained, with the event group as the unit, which strikes the best balance between the performance gained from thread parallelism and the overhead of thread interaction. An event-group FIFO queue is created in memory. A semaphore is set for each of the three event-group states (idle, data ready, and processing complete), so that the file input thread, the file output thread, and the event-loop processing threads can interact. A mapping table then allocates events to the event processing threads and updates their context. Together, these mechanisms keep event data flowing in its original order and avoid complex sorting. To avoid memory wasted on invalid data, lazy loading is applied to data access. For tuple output under event-level parallelism, a three-layer mapping is built so that each thread only needs to fill its corresponding tree. Experimental results show that memory consumption is reduced by 46.5% and that execution performance improves significantly.
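The queue mechanism described in the abstract can be illustrated with the following minimal C++11 sketch. It rests on our own assumptions and is not the BESⅢ implementation. The names `Semaphore`, `EventGroupSlot` and `run_event_pipeline` are hypothetical. So are the integer "events", the doubling step that stands in for event processing, and the deque that stands in for the mapping table.

The sketch keeps a fixed ring of event-group slots in memory and uses one semaphore for each of the idle, data-ready and processing-complete states. Worker threads receive whole groups. The output thread drains slots strictly in group order, so results leave in input order without any sorting.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Counting semaphore built from a mutex and a condition variable
// (std::counting_semaphore needs C++20; this keeps the sketch C++11).
class Semaphore {
public:
    explicit Semaphore(int count = 0) : count_(count) {}
    void release() {
        std::lock_guard<std::mutex> lock(m_);
        ++count_;
        cv_.notify_one();
    }
    void acquire() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_;
};

// One slot of the in-memory FIFO queue; a whole event group is handed
// from thread to thread as a unit (coarse-grained, per-group hand-off).
struct EventGroupSlot {
    std::vector<long> raw;     // stand-in for event data read from the input file
    std::vector<long> result;  // stand-in for the processed event data
};

// Hypothetical pipeline: 1 input thread -> n_workers processing threads ->
// 1 output thread. Returns the processed events in output order.
std::vector<long> run_event_pipeline(int n_events, int group_size,
                                     int n_slots, int n_workers) {
    const int n_groups = (n_events + group_size - 1) / group_size;
    std::vector<EventGroupSlot> slots(n_slots);  // ring: group g lives in slot g % n_slots
    Semaphore idle(n_slots);                     // "idle" state: slots free to fill
    Semaphore ready(0);                          // "data ready" state: groups awaiting a worker
    std::vector<Semaphore> done(n_slots);        // "processing complete" state, per slot
    std::mutex table_mutex;
    std::deque<int> ready_groups;  // hypothetical stand-in for the mapping table
    std::vector<long> output;

    auto publish = [&](int g) {  // record group g in the table, then signal "data ready"
        { std::lock_guard<std::mutex> lock(table_mutex); ready_groups.push_back(g); }
        ready.release();
    };
    std::thread input([&] {
        for (int g = 0; g < n_groups; ++g) {
            idle.acquire();  // wait for a free (idle) slot
            EventGroupSlot& s = slots[g % n_slots];
            s.raw.clear();
            for (int e = g * group_size; e < n_events && e < (g + 1) * group_size; ++e)
                s.raw.push_back(e);  // "read" event e
            publish(g);
        }
        for (int w = 0; w < n_workers; ++w) publish(-1);  // one end marker per worker
    });

    std::vector<std::thread> workers;
    for (int w = 0; w < n_workers; ++w)
        workers.emplace_back([&] {
            for (;;) {
                ready.acquire();
                int g;
                {
                    std::lock_guard<std::mutex> lock(table_mutex);
                    g = ready_groups.front();  // claim the next ready group
                    ready_groups.pop_front();
                }
                if (g < 0) return;  // end of input
                EventGroupSlot& s = slots[g % n_slots];
                s.result.clear();
                for (long v : s.raw) s.result.push_back(2 * v);  // placeholder "reconstruction"
                done[g % n_slots].release();
            }
        });

    // Output thread drains groups strictly in order, so no sorting is needed
    // even though workers may finish groups out of order.
    std::thread out([&] {
        for (int g = 0; g < n_groups; ++g) {
            done[g % n_slots].acquire();
            const EventGroupSlot& s = slots[g % n_slots];
            output.insert(output.end(), s.result.begin(), s.result.end());
            idle.release();  // slot may now be refilled by the input thread
        }
    });

    input.join();
    for (std::thread& t : workers) t.join();
    out.join();
    return output;
}
```

For example, `run_event_pipeline(1000, 7, 4, 3)` processes 1000 events in groups of 7, using 4 slots and 3 workers. It returns 0, 2, 4, ..., 1998 in the original event order.

The per-slot "processing complete" semaphores are what make sorting unnecessary. Workers may finish groups in any order, but the output thread only ever waits on the next group in sequence. The "idle" semaphore bounds memory to `n_slots` groups in flight.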

Authors: MA Zhentai, ZHANG Xiaomei, SUN Gongxing (Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China; University of Chinese Academy of Sciences, Beijing 100049, China)

Source: Computer Engineering and Applications, 2021, Issue 20, pp. 253-262 (10 pages). The journal is indexed in CSCD and listed as a Peking University core journal.

Funding: National Natural Science Foundation of China (11775246).

Keywords: sort; event group FIFO queue; event group allocation; thread interaction; three layers of mapping

CLC number: TP311 [Automation and Computer Technology: Computer Software and Theory]
