期刊文献+

An encoding-based dual distance tree high-dimensional index

An encoding-based dual distance tree high-dimensional index
原文传递
导出
摘要 The paper proposes a novel symmetrical encoding-based index structure, which is called EDD-tree (for encoding-based dual distance tree), to support fast k-nearest neighbor (k-NN) search in high-dimensional spaces. In the EDD-tree, all data points are first grouped into clusters by a k-means clustering algorithm. Then the uniform ID number of each data point is obtained by a dual-distance-driven encoding scheme, in which each cluster sphere is partitioned twice according to the dual distances of start- and centroid-distance. Finally, the uniform ID number and the centroid-distance of each data point are combined to get a uniform index key, the latter is then indexed through a partition-based B^+-tree. Thus, given a query point, its k-NN search in high-dimensional spaces can be transformed into search in a single dimensional space with the aid of the EDD-tree index. Extensive performance studies are conducted to evaluate the effectiveness and efficiency of our proposed scheme, and the results demonstrate that this method outperforms the state-of-the-art high-dimensional search techniques such as the X-tree, VA-file, iDistance and NB-tree, especially when the query radius is not very large. The paper proposes a novel symmetrical encoding-based index structure, which is called EDD-tree (for encoding-based dual distance tree), to support fast k-nearest neighbor (k-NN) search in high-dimensional spaces. In the EDD-tree, all data points are first grouped into clusters by a k-means clustering algorithm. Then the uniform ID number of each data point is obtained by a dual-distance-driven encoding scheme, in which each cluster sphere is partitioned twice according to the dual distances of start- and centroid-distance. Finally, the uniform ID number and the centroid-distance of each data point are combined to get a uniform index key, the latter is then indexed through a partition-based B^+-tree. Thus, given a query point, its k-NN search in high-dimensional spaces can be transformed into search in a single dimensional space with the aid of the EDD-tree index. Extensive performance studies are conducted to evaluate the effectiveness and efficiency of our proposed scheme, and the results demonstrate that this method outperforms the state-of-the-art high-dimensional search techniques such as the X-tree, VA-file, iDistance and NB-tree, especially when the query radius is not very large.
出处 《Science in China(Series F)》 2008年第10期1401-1414,共14页 中国科学(F辑英文版)
基金 the key program of the National Natural Science Foundation of China (Grant No.60533090) the National Natural Science Fund for Distinguished Young Scholars (Grant No.60525108) China-America Academic Digital Library Project
关键词 high-dimensional indexing centroid-distance start-distance k-nearest neighbor search high-dimensional indexing, centroid-distance, start-distance, k-nearest neighbor search
  • 相关文献

参考文献9

  • 1.
  • 2Beckmann N,,Kriegel H P,Schneider R, et al.The R*-tree: an efficient and robust access method for points and rectangles[].SIGMOD Record.1990
  • 3Bohm C,Berchtold S,Keim D.Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases[].ACM Computing Surveys.2001
  • 4Guttman A.R-tree: A dynamic index structure for spatial searching[].Procof the ACM SIGMOD Int’l Confon Management of Data(SIGMOD’).1984
  • 5Berchtold S,Keim D A,Kriegel H P.The X-tree: an index structure for high-dimensional data[].Proceedings of the nd International Conference on Very Large Data Bases.1996
  • 6Weber R,Schek H,Blott S.Aquantitative analysis and per-formance study for si milarity-search methods in high-di men-sional spaces[].Proceedings of the ACM Very Large Data Ba-ses.1998
  • 7S. Berchtold,C. Bohm,H. V. Jagadish,H. -P. Kriegel,and J. Sander.Independent quantization: An index compression technique for high-dimensional data spaces[].Proc th Int‘l Conf on Data Engineering (ICDE‘).2000
  • 8Fonseca MJ,,Jorge JA.NB-Tree:An indexing structure for content-based retrieval in large databases[].Procof theth Int’l Confon Database Systems for Advanced Applications.2003
  • 9Jagadish H V,Ooi B C,Tan K L,et al.iDistance:An adaptive B+-tree based indexing method for nearest neighbor search[].ACM Transactions on Database Systems.2005

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部