Abstract
For nearly 20 years, the neutron and synchrotron science community has sought a common data format for exchanging experimental results and the applications used to reduce and analyze data. HDF5 has become the standard data container at many facilities, but the main unresolved problem is standardizing how data are organized (the schema) within an HDF5 file. This article proposes a solution that introduces a new layer of indirection for data access, the Common Data Model Access (CDMA) framework, which separates responsibilities between data reduction developers and institutes: developers are responsible for the data reduction code, and each institute provides the plug-ins that access its data. CDMA is a core API that accesses data through a data format plug-in mechanism and scientific application definitions (sets of keywords) agreed on by scientists and institutes. A new mapping system between the application definition and the physical data organization lets data reduction applications be developed independently of the data file container and schema. Each institute develops a data access plug-in for its own data file format, together with the mapping between application definitions and its data files. As a result, data reduction applications can be developed from a strictly scientific perspective and can immediately process data from multiple institutes.
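The layering described in the abstract can be illustrated with a minimal Python sketch. It is not the CDMA API: the names `DataPlugin`, `NestedDictPlugin`, `Dataset`, `total_counts`, the keywords, and the paths are all hypothetical, and nested dictionaries stand in for real data files so that the example runs without an HDF5 library. It only shows the idea that reduction code asks for keywords while a per-institute mapping and plug-in resolve them to physical locations.

```python
from typing import Any, Dict, Protocol


class DataPlugin(Protocol):
    """Hypothetical interface an institute-provided plug-in would implement."""

    def read(self, source: Any, physical_path: str) -> Any: ...


class NestedDictPlugin:
    """Toy plug-in: the 'file' is a nested dict and paths are '/'-separated."""

    def read(self, source: Any, physical_path: str) -> Any:
        node = source
        for part in physical_path.strip("/").split("/"):
            node = node[part]
        return node


class Dataset:
    """Resolves application-definition keywords via an institute's mapping."""

    def __init__(self, source: Any, plugin: DataPlugin, mapping: Dict[str, str]):
        self._source = source
        self._plugin = plugin
        self._mapping = mapping  # keyword -> physical path (the "dictionary")

    def get(self, keyword: str) -> Any:
        return self._plugin.read(self._source, self._mapping[keyword])


# Two institutes store the same quantity under different (invented) schemas.
file_a = {"entry": {"instrument": {"detector": {"data": [1, 2, 3]}}}}
file_b = {"scan_001": {"images": [4, 5, 6]}}

mapping_a = {"detector_counts": "/entry/instrument/detector/data"}
mapping_b = {"detector_counts": "/scan_001/images"}


def total_counts(dataset: Dataset) -> int:
    """Reduction code: written against keywords only, never physical paths."""
    return sum(dataset.get("detector_counts"))


result_a = total_counts(Dataset(file_a, NestedDictPlugin(), mapping_a))
result_b = total_counts(Dataset(file_b, NestedDictPlugin(), mapping_b))
```

In this sketch `total_counts` never changes when a new institute is added; only a new mapping (and, for a different file format, a new plug-in class) is supplied.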
Source
《无线互联科技》
2018, Issue 2, pp. 104-107 (4 pages)
Wireless Internet Technology
Keywords
common data model access
data analysis
data visualization
data reduction
dictionary mechanism