
Data Management Challenges and Event Index Technologies in High Energy Physics
Cited by: 7
Abstract: New-generation high energy physics facilities produce data at the PB or even EB scale, which poses major challenges for data management technologies such as data acquisition, storage, transmission, sharing, analysis, and processing. The event is the basic data unit of a high energy physics experiment, and a single large experiment can produce trillions of events. Traditional high energy physics data processing uses the ROOT file as the basic unit of storage and processing, and each ROOT file can contain from thousands to hundreds of millions of events. This file-based approach simplifies the development of data management systems, but a physics analysis is interested in only a very small number of rare events, which leads to large data transfer volumes, I/O bottlenecks, and low processing efficiency. This paper proposes an event-oriented data management method for high energy physics, focusing on efficient indexing of the features of massive numbers of events. In this method, the feature quantities of the events that physicists are interested in are extracted and used to build a dedicated index, which is stored in a NoSQL database. To facilitate physics analysis, the raw event data remain in ROOT files. Finally, system validation and analysis show that event selection based on the event feature index is feasible, and that an optimized HBase system can meet the requirements of event indexing.
Source: 《计算机研究与发展》 (Journal of Computer Research and Development), indexed in EI, CSCD, and the Peking University Core Journals list; 2017, No. 2, pp. 258-266 (9 pages)
Funding: National Key Research and Development Program of China (2016YFB1000604)
Keywords: high energy physics; data management; event index; HBase; query optimization
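
The event-oriented idea summarized in the abstract can be illustrated with a minimal in-memory sketch. This is not the paper's implementation: a plain Python dictionary stands in for the NoSQL store, and the record layout, the feature names (`n_tracks`, `total_energy`), the file names, and the functions `build_index` and `select_events` are all hypothetical. The sketch only shows the general pattern: selection runs against a small feature index, and the result is a list of entry numbers per ROOT file, so that only the matching events would need to be read from the raw data.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class IndexRecord:
    """One index row: selected features plus a pointer into a ROOT file."""
    run_id: int
    event_id: int
    features: Dict[str, float]  # hypothetical feature names, for illustration
    root_file: str              # file that still holds the raw event data
    entry: int                  # entry number of the event inside that file


def build_index(records: Iterable[IndexRecord]) -> Dict[Tuple[int, int], IndexRecord]:
    """Key each record by (run_id, event_id); a stand-in for a NoSQL row key."""
    return {(r.run_id, r.event_id): r for r in records}


def select_events(
    index: Dict[Tuple[int, int], IndexRecord],
    predicate: Callable[[Dict[str, float]], bool],
) -> Dict[str, List[int]]:
    """Scan only the index and return matching entry numbers grouped by file."""
    hits: Dict[str, List[int]] = defaultdict(list)
    for record in index.values():
        if predicate(record.features):
            hits[record.root_file].append(record.entry)
    return {name: sorted(entries) for name, entries in hits.items()}


# Four made-up events spread over two made-up ROOT files.
index = build_index([
    IndexRecord(1, 1, {"n_tracks": 2, "total_energy": 1.2}, "run1_a.root", 0),
    IndexRecord(1, 2, {"n_tracks": 4, "total_energy": 3.6}, "run1_a.root", 1),
    IndexRecord(1, 3, {"n_tracks": 2, "total_energy": 0.9}, "run1_b.root", 0),
    IndexRecord(1, 4, {"n_tracks": 5, "total_energy": 3.1}, "run1_b.root", 7),
])

# Select the "rare" events using index features only; no raw data is touched.
selected = select_events(
    index, lambda f: f["n_tracks"] >= 4 and f["total_energy"] > 3.0
)
print(selected)  # {'run1_a.root': [1], 'run1_b.root': [7]}
```

In a real deployment the dictionary scan would presumably be replaced by a query against the index database, and the returned entry numbers would be used to read individual events from the ROOT files rather than transferring whole files.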
