
Event Level Parallelization Research of BESⅢ Experimental Software
Abstract: Job-level parallelism in the BESⅢ experimental software consumes a large amount of memory, and sequence-level parallelism requires a complex sorting step. To address these drawbacks, this paper proposes event-level parallelization. Because the data of each event are independent, data parallelism is used, with coarse-grained locking at the granularity of event groups, which strikes the best balance between the performance gained from thread parallelism and the overhead of thread interaction. An in-memory FIFO queue of event groups is created, and a semaphore is set for each of the three event-group states (idle, data ready, and processing complete), so that the file input thread, the file output thread, and the event-loop processing threads can interact. A mapping table then allocates events to the event-processing threads and updates the corresponding context. Together, these mechanisms keep the event data flowing in their original order and avoid complex sorting. To avoid memory wasted on invalid data, lazy loading is applied to data access. For tuple output under event-level parallelism, a three-layer mapping is established so that each thread only needs to fill its corresponding tree. As a result, memory consumption is reduced by 46.5% and execution performance improves significantly.
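The event-group FIFO queue can be sketched in C++ as a fixed ring of slots, each cycling through the three states the abstract names (idle, data ready, processing complete). This is a minimal sketch under stated assumptions, not the authors' implementation: `EventGroupQueue`, `SlotState`, and `runDemo` are hypothetical names, integers stand in for event data, there is exactly one input thread and one output thread, and a single mutex with a condition variable stands in for the three semaphores.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Each slot holds one event group and cycles Idle -> Ready -> Done -> Idle.
enum class SlotState { Idle, Ready, Done };
struct EventGroup {
    std::vector<int> events;  // stand-in for the group's event data (assumption)
    SlotState state = SlotState::Idle;
};

// Hypothetical FIFO ring shared by one input thread, N workers and one output thread.
// One mutex guards all slot states (locked per group transition, not per event).
class EventGroupQueue {
public:
    explicit EventGroupQueue(std::size_t slots) : ring_(slots) { assert(slots > 0); }
    // Input thread: wait until the next slot in ring order is Idle, fill it, mark Ready.
    void push(std::vector<int> events) {
        std::unique_lock<std::mutex> lk(m_);
        EventGroup& g = ring_[in_ % ring_.size()];
        cv_.wait(lk, [&] { return g.state == SlotState::Idle; });
        g.events = std::move(events);
        g.state = SlotState::Ready;
        ++in_;
        cv_.notify_all();
    }
    // Input exhausted: wake everyone so workers and the output thread can finish.
    void close() { std::lock_guard<std::mutex> lk(m_); closed_ = true; cv_.notify_all(); }
    // Worker: claim the oldest unclaimed Ready group, process it without holding the
    // lock, then mark it Done. Returns false once input is closed and all groups claimed.
    template <class F>
    bool processNext(F&& process) {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] { return next_ < in_ || closed_; });
        if (next_ == in_) return false;
        EventGroup& g = ring_[next_++ % ring_.size()];
        lk.unlock();
        process(g.events);  // no other thread touches a claimed group until it is Done
        lk.lock();
        g.state = SlotState::Done;
        cv_.notify_all();
        return true;
    }
    // Output thread: take groups strictly in input order, so no sorting is needed.
    bool popInOrder(std::vector<int>& out) {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] {
            return (out_ < in_ && ring_[out_ % ring_.size()].state == SlotState::Done) ||
                   (closed_ && out_ == in_);
        });
        if (out_ == in_) return false;
        EventGroup& g = ring_[out_++ % ring_.size()];
        out = std::move(g.events);
        g.events.clear();
        g.state = SlotState::Idle;
        cv_.notify_all();
        return true;
    }
private:
    std::vector<EventGroup> ring_;
    std::mutex m_;
    std::condition_variable cv_;
    std::size_t in_ = 0, next_ = 0, out_ = 0;  // groups pushed / claimed / written
    bool closed_ = false;
};

// Demo (hypothetical): group g holds events {2g, 2g+1}; "processing" multiplies by 10.
std::vector<int> runDemo(int nGroups, int nWorkers, std::size_t slots) {
    EventGroupQueue q(slots);
    std::vector<int> written;
    std::thread input([&] {
        for (int g = 0; g < nGroups; ++g) q.push({2 * g, 2 * g + 1});
        q.close();
    });
    std::vector<std::thread> workers;
    for (int w = 0; w < nWorkers; ++w)
        workers.emplace_back([&] {
            while (q.processNext([](std::vector<int>& ev) {
                std::this_thread::sleep_for(std::chrono::microseconds(ev[0] % 3 * 200));
                for (int& e : ev) e *= 10;
            })) {}
        });
    std::thread output([&] {
        std::vector<int> g;
        while (q.popInOrder(g)) written.insert(written.end(), g.begin(), g.end());
    });
    input.join();
    for (auto& t : workers) t.join();
    output.join();
    return written;
}
```

The sleep in the demo gives groups uneven processing times so they finish out of order. Because the output thread always waits on the slot at the head of the ring, groups are still written in input order, and the ring size bounds how many groups are held in memory at once.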
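Lazy loading of event data can be illustrated with a store that records a loader for each data object and runs it only on first access, so objects that no algorithm reads are never materialized. This is a hedged sketch, not the BESⅢ data-access layer: `LazyEventStore` and its methods are hypothetical, `std::vector<double>` stands in for real event data objects, and the store is assumed to belong to one event handled by one worker, so it takes no locks.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical lazy-loading store: each data object is declared with a loader,
// and the loader runs only when an algorithm first asks for that object.
// Assumes one store per event in flight, owned by a single worker (no locking).
class LazyEventStore {
public:
    using Loader = std::function<std::vector<double>()>;

    void declare(const std::string& key, Loader loader) {
        assert(loader && "loader must be callable");
        entries_[key] = Entry{std::move(loader), nullptr};
    }

    // First access: run the loader and cache the result; later accesses reuse the cache.
    const std::vector<double>& get(const std::string& key) {
        Entry& e = entries_.at(key);
        if (!e.data) {
            e.data = std::make_unique<std::vector<double>>(e.loader());
            ++loads_;
        }
        return *e.data;
    }

    bool isLoaded(const std::string& key) const { return entries_.at(key).data != nullptr; }
    int loadCount() const { return loads_; }

private:
    struct Entry {
        Loader loader;
        std::unique_ptr<std::vector<double>> data;  // null until first access
    };
    std::map<std::string, Entry> entries_;
    int loads_ = 0;
};
```

Usage: declare every object an event could provide, then let algorithms call `get`; anything never requested stays unloaded and costs only its loader entry.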
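The abstract does not spell out what the three layers of the tuple mapping are. One plausible reading, assumed here purely for illustration, is output file → tuple name → worker thread → private tree: each worker fills only its own tree, and the per-thread trees are concatenated after the workers join. `Tree` is a stand-in for a ROOT tree, and `TupleRegistry` and `demoFillAndMerge` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Stand-in for a ROOT tree (assumption): just a list of rows.
struct Tree {
    std::vector<std::vector<double>> rows;
    void fill(std::vector<double> row) { rows.push_back(std::move(row)); }
};

// Hypothetical three-layer map: output file -> tuple name -> worker index -> private Tree.
class TupleRegistry {
public:
    explicit TupleRegistry(std::size_t nThreads) : nThreads_(nThreads) { assert(nThreads > 0); }

    // Book every tuple before the workers start, so the maps never change concurrently.
    void book(const std::string& file, const std::string& tuple) {
        map_[file][tuple] = std::vector<Tree>(nThreads_);
    }

    // Each worker only fills the tree at its own index, so filling needs no lock
    // (the standard treats map::at and vector::at as const for data-race purposes).
    Tree& treeFor(const std::string& file, const std::string& tuple, std::size_t thread) {
        return map_.at(file).at(tuple).at(thread);
    }

    // After the workers join: concatenate the per-thread trees into one output tree.
    Tree merged(const std::string& file, const std::string& tuple) const {
        Tree out;
        for (const Tree& t : map_.at(file).at(tuple))
            out.rows.insert(out.rows.end(), t.rows.begin(), t.rows.end());
        return out;
    }

private:
    std::size_t nThreads_;
    std::map<std::string, std::map<std::string, std::vector<Tree>>> map_;
};

// Demo (hypothetical): each worker fills its own tree, then the trees are merged.
Tree demoFillAndMerge(std::size_t nThreads, int rowsPerThread) {
    TupleRegistry reg(nThreads);
    reg.book("ana.root", "tracks");
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < nThreads; ++t)
        workers.emplace_back([&reg, t, rowsPerThread] {
            Tree& mine = reg.treeFor("ana.root", "tracks", t);
            for (int i = 0; i < rowsPerThread; ++i)
                mine.fill({static_cast<double>(t), static_cast<double>(i)});
        });
    for (auto& w : workers) w.join();
    return reg.merged("ana.root", "tracks");
}
```

The merge here simply concatenates by thread index; whether the original scheme restores event order in the merged tuple is not stated in the abstract.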
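The mapping table that assigns events to processing threads and updates their context might look roughly like the following sketch. It assumes the table is keyed by thread id and stores only an event-group index and an event number; `ContextTable`, `EventContext`, and `demoContexts` are hypothetical, and the fields shown are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical per-thread context: the event group and event a worker is handling.
struct EventContext {
    std::size_t group = 0;
    std::uint64_t eventNumber = 0;
};

// Hypothetical mapping table: worker thread id -> that worker's current EventContext.
// Code running on a worker (e.g. a shared service) looks up "its" event by thread id
// instead of reading a single global "current event".
class ContextTable {
public:
    // A worker is handed a new event: record it under the worker's own thread id.
    void update(std::size_t group, std::uint64_t eventNumber) {
        std::lock_guard<std::mutex> lk(m_);
        table_[std::this_thread::get_id()] = EventContext{group, eventNumber};
    }
    // Return the calling thread's context; the thread must already have an event.
    EventContext current() const {
        std::lock_guard<std::mutex> lk(m_);
        auto it = table_.find(std::this_thread::get_id());
        assert(it != table_.end() && "calling thread has no assigned event");
        return it->second;
    }
private:
    mutable std::mutex m_;
    std::map<std::thread::id, EventContext> table_;
};

// Demo (hypothetical): each worker repeatedly records an event and reads it back;
// returns how many lookups returned exactly the context that thread had just set.
int demoContexts(int nThreads, int nEvents) {
    ContextTable table;
    std::vector<int> hits(static_cast<std::size_t>(nThreads), 0);  // one counter per worker
    std::vector<std::thread> workers;
    for (int t = 0; t < nThreads; ++t)
        workers.emplace_back([&table, &hits, t, nEvents] {
            for (int e = 0; e < nEvents; ++e) {
                table.update(static_cast<std::size_t>(t), static_cast<std::uint64_t>(e));
                EventContext c = table.current();
                if (c.group == static_cast<std::size_t>(t) &&
                    c.eventNumber == static_cast<std::uint64_t>(e))
                    ++hits[static_cast<std::size_t>(t)];
            }
        });
    for (auto& w : workers) w.join();
    int total = 0;
    for (int h : hits) total += h;
    return total;
}
```

Keying the table by thread id keeps each worker's context separate even though all workers share one table; the lock is held only for the brief map access, in keeping with the coarse-grained locking the abstract describes.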
Authors: MA Zhentai, ZHANG Xiaomei, SUN Gongxing (Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China; University of Chinese Academy of Sciences, Beijing 100049, China)
Source: Computer Engineering and Applications (《计算机工程与应用》), 2021, No. 20, pp. 253-262 (10 pages); indexed in CSCD and the Peking University core journal list.
Funding: National Natural Science Foundation of China (11775246).
Keywords: sort; event group FIFO queue; event group allocation; thread interaction; three layers of mapping
