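Abstract
A scientific computing dataset consists of data and metadata. In general, the data are large and the metadata are small. Traditional parallel file systems on high-performance computers read and write large contiguous blocks of data efficiently, but handle large numbers of small metadata blocks poorly. When the two access patterns, large data blocks and small metadata blocks, are mixed, the metadata seriously interfere with parallel I/O and degrade performance. To address this, the paper proposes a dual-channel parallel I/O method that handles data and metadata separately. The method builds a two-level store in the high-level I/O library, consisting of a memory file system and a parallel file system, and migrates scientific computing metadata between these storage resources in parallel. This lowers the I/O latency of the frequently accessed metadata and also changes the storage characteristics and storage pattern of the scientific data, thereby improving the I/O efficiency of scientific computing applications, especially read-intensive ones such as data analysis and visualization. Tests show that the dual-channel parallel I/O method improves write performance by 8% to 13% and read performance by 89% to 1.01 times.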
Datasets in scientific computing applications are composed of data and metadata, which on average have different I/O characteristics. Data are large and accessed at long intervals, whereas metadata are small but accessed at short intervals. Compared with a serial file system, a typical parallel file system offers higher I/O bandwidth but also higher I/O latency. When data and metadata are read from or written to the same file on a parallel file system in sequence, the overall I/O bandwidth degrades. This paper presents a dual-channel parallel I/O method, which stores data and metadata in different file systems. Data are written to and read from the parallel file system directly. Metadata are written to or read from a local memory file system and migrated to or from the parallel file system when the file is closed or opened. In the performance evaluation, the dual-channel I/O method improves parallel write bandwidth by 8% to 13% and parallel read bandwidth by 89% to 1.01 times.
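The split described in the abstract can be illustrated with a minimal single-process sketch. It is not the paper's implementation: the class `DualChannelFile`, its method names, the `.data`/`.meta` file naming, and the `key=value` metadata format are all hypothetical, two temporary directories stand in for the parallel file system and the memory file system, and migration here is a plain serial file move rather than the parallel migration the paper describes.

```python
import os
import shutil
import tempfile


class DualChannelFile:
    """Hypothetical sketch: bulk data and small metadata take separate paths."""

    def __init__(self, name, pfs_dir, mem_dir):
        # pfs_dir stands in for the parallel file system and mem_dir for the
        # memory file system (e.g. a tmpfs mount); both are assumptions.
        self.data_path = os.path.join(pfs_dir, name + ".data")
        self.meta_final = os.path.join(pfs_dir, name + ".meta")
        self.meta_staged = os.path.join(mem_dir, name + ".meta")
        self._data = None
        self._meta = None

    def open_write(self):
        self._data = open(self.data_path, "wb")
        self._meta = open(self.meta_staged, "w")
        return self

    def write_data(self, payload):
        # Large blocks go straight to the parallel file system.
        self._data.write(payload)

    def write_meta(self, key, value):
        # Small, frequent records stay in the memory file system.
        self._meta.write(f"{key}={value}\n")

    def close(self):
        # Migrate the staged metadata to the parallel file system on close.
        self._data.close()
        self._meta.close()
        shutil.move(self.meta_staged, self.meta_final)

    def open_read(self):
        # On open, copy metadata back into memory so lookups avoid the PFS.
        shutil.copy(self.meta_final, self.meta_staged)
        with open(self.meta_staged) as f:
            return dict(line.rstrip("\n").split("=", 1) for line in f)


def demo():
    # Write one dataset with interleaved metadata and data, then reopen it.
    with tempfile.TemporaryDirectory() as pfs, tempfile.TemporaryDirectory() as mem:
        f = DualChannelFile("step0001", pfs, mem).open_write()
        f.write_meta("dims", "1024x1024")
        f.write_data(b"\x00" * 4096)
        f.write_meta("dtype", "float64")
        f.close()
        meta = DualChannelFile("step0001", pfs, mem).open_read()
        size = os.path.getsize(os.path.join(pfs, "step0001.data"))
        return meta, size
```

In this sketch the interleaved `write_meta` calls never touch the directory that plays the parallel file system until `close`, so the bulk data file receives only large contiguous writes, which is the access pattern the abstract says parallel file systems handle well.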
Source
Chinese Journal of Computers (《计算机学报》)
EI
CSCD
PKU Core Journal (北大核心)
2015, No. 5, pp. 1035-1043 (9 pages)
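Funding
National High Technology Research and Development Program of China (863 Program) (2012AA01A309)
National Natural Science Foundation of China (61033009, 61170310)
National Basic Research Program of China (973 Program) (2011CB309702)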
Keywords
parallel I/O
high-level I/O library
performance optimization
data format
dual-channel parallel I/O