Abstract
1 Introduction
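Semi-structured data are data that, unlike "raw data" such as speech and image files, have some degree of structure, yet are not governed by a strict schema the way data in traditional database systems are [1,2]. Semi-structured data are widespread in electronic data sources, especially on the Internet. On the WWW, for example, the HTML file format is itself built from structural units such as tags and anchors, so the data in these files often have an obvious structure. At the same time, that structure is highly irregular and does not meet the requirements of traditional databases, so existing database techniques and tools cannot simply be applied to it. New techniques and tools for describing and processing semi-structured data therefore need to be researched and developed.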
Semi-structured data are generally modeled as labeled graphs. Data in such models are self-describing and dynamically typed, capturing both schema and data information. Although flexible, such models incur severe efficiency penalties compared with querying structured databases such as relational databases. To improve the efficiency of data manipulation by exploiting structural information, we present a hybrid method that reorganizes semi-structured data according to their degree of structure. The method extracts highly structured data and stores them in relations, while leaving the rest in its original graph form. This paper gives algorithms for generating and dynamically updating the method's storage model, illustrates how queries can be executed on this storage model, and analyzes the resulting improvement in query processing over common execution methods. It also gives an algorithm that converts queries on semi-structured data into relational calculus, which makes it possible to apply the query optimization techniques of relational database systems.
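The abstract says highly structured data are moved into relations while the rest stays a labeled graph, but it does not say how the "degree of structure" is measured. The Python below is a minimal sketch under one assumed rule, not the authors' algorithm: an object counts as "flat" if each of its labels occurs once and leads to an atomic value, and flat objects sharing an identical label set with at least `min_support` members become rows of one relation. The names `Obj`, `is_flat`, `split_hybrid`, and `min_support` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Obj:
    """A graph node: atomic when `value` is set, complex when it has labeled edges."""
    oid: str
    value: Optional[object] = None
    edges: List[Tuple[str, str]] = field(default_factory=list)  # (label, target oid)


def is_flat(obj: Obj, graph: Dict[str, Obj]) -> bool:
    """Assumed test: every label occurs once and leads to an atomic value."""
    labels = [label for label, _ in obj.edges]
    return (
        bool(labels)
        and len(labels) == len(set(labels))
        and all(graph[t].value is not None for _, t in obj.edges)
    )


def split_hybrid(graph: Dict[str, Obj], min_support: int = 2):
    """Hypothetical structural-degree rule: flat objects that share an identical
    label set with at least `min_support` members become rows of one relation;
    all other objects stay in graph form."""
    groups: Dict[frozenset, List[Obj]] = defaultdict(list)
    for obj in graph.values():
        if is_flat(obj, graph):
            groups[frozenset(label for label, _ in obj.edges)].append(obj)

    relations: Dict[Tuple[str, ...], List[dict]] = {}
    extracted = set()
    for labels, members in groups.items():
        if len(members) < min_support:
            continue  # too irregular: leave these objects in the graph
        rows = []
        for m in members:
            row = {"oid": m.oid}  # keep the oid so graph edges can still refer to the row
            for label, target in m.edges:
                row[label] = graph[target].value
            rows.append(row)
        relations[tuple(sorted(labels))] = rows
        extracted.update(m.oid for m in members)

    # Atomic children of extracted objects are now stored as column values.
    absorbed = {t for oid in extracted for _, t in graph[oid].edges}
    residual = {oid: o for oid, o in graph.items()
                if oid not in extracted and oid not in absorbed}
    # Re-add absorbed atoms that a residual object still points to; edges from
    # residual objects to extracted oids are left as references into relations.
    for o in list(residual.values()):
        for _, t in o.edges:
            if t in absorbed:
                residual.setdefault(t, graph[t])
    return relations, residual


# Usage: two regular "person" objects and one irregular one (repeated "phone").
people = {
    "root": Obj("root", edges=[("person", "p1"), ("person", "p2"), ("person", "p3")]),
    "p1": Obj("p1", edges=[("name", "a1"), ("age", "a2")]),
    "p2": Obj("p2", edges=[("name", "a3"), ("age", "a4")]),
    "p3": Obj("p3", edges=[("name", "a5"), ("phone", "a6"), ("phone", "a7")]),
    "a1": Obj("a1", "Ann"), "a2": Obj("a2", 30),
    "a3": Obj("a3", "Bob"), "a4": Obj("a4", 25),
    "a5": Obj("a5", "Cy"), "a6": Obj("a6", "555-1"), "a7": Obj("a7", "555-2"),
}
relations, residual = split_hybrid(people)
print(relations)
print(sorted(residual))
```

Keeping the `oid` column is one way to let edges such as `root -> p1`, which remain in the graph, still resolve to relational rows, so a query could touch both parts; the paper's actual storage model and update algorithms may differ.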
Source
《计算机科学》
CSCD
Peking University Core Journal (PKU Core)
2002, No. 10, pp. 6-10 (5 pages)
Computer Science
Funding
Supported by the National Key Basic Research and Development Program of China ("973" Program) (G1998030414)