期刊文献+

半结构化数据相似搜索的索引技术研究 被引量:11

An Index Structure of Semi-Structure Data Set for Similarity Search
下载PDF
导出
摘要 为了在海量、高维、动态的半结构化数据集上进行有效的相似搜索,该文提出一种采用聚类技术进行索引构建与更新的多路平衡树——CSS-树以及基于CSS-树的相似搜索与动态更新的算法.CSS-树借鉴SS+-树基于聚类进行节点组织与分裂的基本思想,避免了根据坐标维进行分裂时所要求的维不相关性,同时在节点组织、分裂算法和搜索算法等方面进行了改进,提出了新的搜索剪枝策略.实验表明,该结构及算法对海量半结构化数据相似搜索的效率明显优于传统算法. A new index, called CSS-tree, is proposed to organize and search dynamic high-dimension vast semi-structure data set. The CSS-tree is a multi-way balance tree, which is combining the benefit of R-tree and SS-tree to deal with high-dimension vast data sets, and the benefit of M-tree to deal with 'metric space' data sets. This paper details the structure of CSS-tree, whose each inner node is composed of a group of index elements including cover center and cover radius of child tree and every leaf is in same level and all data indices is in leaves. The paper give algorithms for similarity search based CSS-tree both range search and k-NN search, and dynamic update algorithms of the CSS-tree. It describes the simply split policy which reference to CF-tree's split policy of BIRTH, and reorganizing algorithms which using clustering technique to keep the index elements that the similar elements are neighbor in the index tree, and avoid the need of independent between feather values. It also describes how to keep minimum cover space and overlap space. Using simulation data sets and using part of 'Chinese Encyclopedia Database' as data set, which is on XML document set, experiments show that the CSS-tree is close to SS+-tree and M-tree in building tree, but CSS-tree outperforms both SS+-tree and M-tree in similarity search in semi-structured data sets.
出处 《计算机学报》 EI CSCD 北大核心 2002年第11期1219-1226,共8页 Chinese Journal of Computers
关键词 半结构化数据 相似搜索 索引 相似索引 聚类 数据挖掘 数据库 多路平衡树 similarity indexing, similarity search, semi-structured document, cluster, XML
  • 相关文献

参考文献1

二级参考文献1

  • 1Lin K I,VLDB J,1994年,3卷,4期,517页

共引文献1

同被引文献86

引证文献11

二级引证文献61

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部