
High Energy Physics Data Placement Strategy Based on Random Forest

Cited by: 1
Abstract As high energy physics experiments such as the Large High Altitude Air Shower Observatory (LHAASO) continue to grow, petabytes of physics data must be stored every year. Mass storage systems for high energy physics generally place data at random, without considering data access scenarios or the differences among server nodes and storage devices. To address this, a data placement strategy based on the random forest algorithm is proposed for heterogeneous storage environments. Storage devices are divided into a fast pool and a normal pool according to their performance. The algorithm predicts and identifies the read/write access scenario a new file will see later, then takes the current device load into account to find the best placement for the data. The effectiveness of the algorithm is verified with data samples collected from the production storage system of the LHAASO experiment.
Authors CHENG Zhenjing; CHENG Yaodong; CHEN Gang; WANG Lu; LI Haibo; HU Qingbao (Computing Center, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China; University of Chinese Academy of Sciences, Beijing 100049, China; Tianfu Cosmic Ray Research Center, Institute of High Energy Physics, Chinese Academy of Sciences, Chengdu 610041, China)
Source Computer Engineering and Applications (indexed in CSCD and the Peking University Core Journals list), 2020, No. 21, pp. 60-64 (5 pages)
Funding National Natural Science Foundation of China (No. 11675201, No. 11575223, No. 11605223, No. 11805226).
Keywords random forest; distributed storage system; heterogeneous storage; storage pool; data placement strategy; access scenario
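
The final placement step described in the abstract (choose a pool from the predicted access scenario, then pick a device by current load) can be illustrated with a minimal sketch. This is not the authors' implementation. The scenario labels `"hot"` and `"cold"`, the `Device` fields, the `max_load` threshold, and the fallback rule are all assumptions made for illustration. The predicted scenario is taken as an input here; in the paper it would come from a random forest classifier trained on file access records.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    pool: str    # "fast" or "normal" (pool names follow the abstract)
    load: float  # current utilisation in [0, 1]; hypothetical metric


# Hypothetical mapping from a predicted access scenario to a preferred pool.
PREFERRED_POOL = {"hot": "fast", "cold": "normal"}


def place(predicted_scenario, devices, max_load=0.9):
    """Return the least-loaded device in the preferred pool.

    If every device in the preferred pool is at or above max_load,
    fall back to the least-loaded device in any pool (an assumed rule).
    """
    preferred = PREFERRED_POOL[predicted_scenario]
    candidates = [d for d in devices
                  if d.pool == preferred and d.load < max_load]
    if not candidates:
        candidates = [d for d in devices if d.load < max_load]
    if not candidates:
        raise RuntimeError("no device below the load threshold")
    return min(candidates, key=lambda d: d.load)


# Made-up device inventory, for illustration only.
devices = [
    Device("ssd-01", "fast", 0.95),
    Device("ssd-02", "fast", 0.40),
    Device("hdd-01", "normal", 0.20),
    Device("hdd-02", "normal", 0.70),
]

print(place("hot", devices).name)
print(place("cold", devices).name)
```

Keeping prediction and placement separate means the classifier can be retrained or replaced without touching the load-balancing logic.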