Abstract
Unstructured data generally refers to data that, in contrast to relational data, has no fixed, pre-defined, explicit structure, such as video, audio, images, and documents. According to forecasts from data consultancies and research institutions such as IDC and EMC, data volume will grow exponentially over the next 5 to 10 years, with unstructured data accounting for 70% to 85% of all digital information. Faced with data and information at this scale, managing unstructured data effectively and extracting valuable information or knowledge from it has become urgent. The goals of (unstructured) data management can be reduced to three capabilities: the data can be stored, managed, and used. This paper surveys current research on unstructured data storage management, focusing on the first two goals. It also presents the preliminary research and exploratory work of the Unstructured Data Management (UDM) research group at Renmin University of China, based on the Free-table data model and the BUD (Bank of Unstructured Data) reference architecture, including an adaptive storage approach and several storage management techniques implemented in the prototype platform myBUD.
Source
E-science Technology & Application (《科研信息化技术与应用》)
2013, Issue 1, pp. 30-40 (11 pages)
Funding
National Natural Science Foundation of China (61070054)
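National Science and Technology Major Project "Core Electronic Devices, High-end General-purpose Chips and Basic Software Products" (2010ZX01042-001-002)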
Keywords
Unstructured data management
Adaptive algorithm
Distributed storage system