
Research on Storage Management of Unstructured Data (非结构化数据存储管理研究)
Cited by: 8
Abstract: Unstructured data generally refers to data that, in contrast to relational data, has no predefined, fixed, explicit structure, such as video, audio, images, and documents. According to forecasts from data consultancies and research organizations such as IDC and EMC, data volume will grow exponentially over the coming 5 to 10 years, with unstructured data accounting for 70% to 85% of all digital information. Faced with this volume of data and information, managing unstructured data effectively and extracting valuable information or knowledge from it has become urgent. The goals of managing data, structured or unstructured, can be reduced to three: the data can be stored, can be managed, and can be used. This paper surveys current research on the storage management of unstructured data, focusing on the first two goals. It also presents the preliminary research and experimental work of the Unstructured Data Management (UDM) group at Renmin University of China, which is based on the Free-Table data model and the BUD (Bank of Unstructured Data) reference architecture, together with several storage management techniques, including an adaptive storage approach, used in the prototype platform myBUD.
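Authors: Zhang Xiao (张孝), Zhou Ningnan (周宁南)
Source: E-science Technology & Application (《科研信息化技术与应用》), 2013, No. 1, pp. 30-40 (11 pages)
Funding: National Natural Science Foundation of China (61070054); National Science and Technology Major Project "Core Electronic Devices, High-end General-purpose Chips and Basic Software Products" (2010ZX01042-001-002)
Keywords: Unstructured data management; Adaptive algorithm; Distributed storage system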

References (26)

  • 1. Gantz J, Reinsel D. The Digital Universe Decade: Are You Ready? White paper, IDC and EMC Corporation, May 2010.
  • 2. Sears R, van Ingen C, Gray J. To BLOB or not to BLOB: Large object storage in a database or a filesystem. MSR-TR-2006-45, 2006.
  • 3. Zhou Wenjing et al. A Database Approach for Accelerating Video Data Access. APWeb, 2009.
  • 4. IBM. Data links managing files using DB2 [EB/OL]. (2001) [2012-03]. http://www.redbooks.ibm.com/readbooks.
  • 5. http://uima.apache.org/.
  • 6. Zhang Xiao et al. Managing a large shared bank of data by using Free-Table [C]//Proceedings of the 12th Asia-Pacific Web Conference (APWeb 2010), Busan, Korea, Apr 6-8, 2010: 441-446.
  • 7. Li Wei, Lang Bo. A tetrahedral data model for unstructured databases [J]. Scientia Sinica Informationis (中国科学:信息科学), 2010, 40(8): 1039-1053. Cited by: 9
  • 8. Chang F, Dean J, Ghemawat S, Hsieh W C, et al. Bigtable: A Distributed Storage System for Structured Data. OSDI, 2006.
  • 9. Zhou Ningnan, Zhang Xiao, Sun Xinyun, Ju Xingxing, Liu Kuicheng, Du Xiaoyong, Wang Shan. Design and implementation of adaptive distributed storage management in MyBUD [J]. Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2012, 6(8): 673-683. Cited by: 2
  • 10. Koltsidas I et al. Flashing up the storage layer. PVLDB, 2008.


