Abstract
Provenance is metadata that describes the ancestry or history of a digital object. It enhances the value of the data it describes by answering questions such as: How was this object created? Which other objects does it depend on? How do the histories of these two objects differ? This paper analyzes the advantages of using an object-based storage system to store and manage provenance information, and designs and implements a way to collect and store provenance on an object-based storage architecture. On the object-based storage client, the system obtains kernel information from system-status files, calls the JHOVE application to analyze and encapsulate file formats, and uses the Linux audit facility to monitor ordinary user applications. It then encapsulates the collected provenance information into objects and stores them in a Berkeley DB database or in log files on the object-based storage devices. Test results show that the object-based provenance storage system performs well in collecting, storing, and querying different kinds of provenance information.
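The collection and storage path described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record fields, the function names, and the choice of `/proc/version` as the system-status file are assumptions made for illustration, and a JSON-lines log file stands in for the log-file backend (the Berkeley DB backend, JHOVE, and Linux audit are not shown).

```python
import json
import platform
import time


def read_kernel_info(status_path="/proc/version"):
    """Return kernel information read from a system-status file."""
    try:
        with open(status_path, "r", encoding="utf-8") as f:
            return f.read().strip()
    except OSError:
        # Assumption: fall back to platform.uname() where the file is missing.
        return " ".join(platform.uname())


def make_provenance_object(object_id, ancestors, kernel_info, timestamp=None):
    """Encapsulate provenance attributes into one record (hypothetical schema)."""
    return {
        "object_id": object_id,
        "ancestors": list(ancestors),
        "kernel": kernel_info,
        "timestamp": time.time() if timestamp is None else timestamp,
    }


def append_to_log(log_path, record):
    """Append one provenance record to the log file as a JSON line."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")


def query_ancestors(log_path, object_id):
    """Scan the log and return every ancestor recorded for object_id."""
    found = []
    with open(log_path, "r", encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            if record["object_id"] == object_id:
                found.extend(record["ancestors"])
    return found
```

A caller would build a record with `make_provenance_object("out.txt", ["a.txt", "b.txt"], read_kernel_info())`, pass it to `append_to_log`, and later answer "which objects does out.txt depend on?" with `query_ancestors`. The linear scan keeps the sketch short; a keyed store such as Berkeley DB would avoid rereading the whole log on each query.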
Source
《计算机科学与探索》
CSCD
Peking University Core Journals (北大核心)
2018, No. 2, pp. 218-230 (13 pages)
Journal of Frontiers of Computer Science and Technology
Funding
National Natural Science Foundation of China, No. 61402189
CCF-Venustech Hongyan Research Fund (CCF-启明星辰鸿雁基金), No. 2016-015
Keywords
provenance
object-based storage system
file format analysis
provenance-aware system