期刊文献+

MC-Tree: Dynamic Index Structure for Partially Clustered Multi-Dimensional Database

MC-Tree: Dynamic Index Structure for Partially Clustered Multi-Dimensional Database
原文传递
导出
摘要 Index structure that enables efficient similarity queries in high-dimensional space is crucial for many applications. This paper discusses the indexing problem in dataset composed of partially clustered data, which exists in many applications. Current index methods are inefficient with partially clustered datasets. The dynamic and adaptive index structure presented here, called a multi-cluster tree (MC-tree), consists of a set of height-balanced trees for indexing. This index structure improves the querying efficiency in three ways: 1) Most bounding regions achieve uniform distributions, which results in fewer splits and less overlap compared with a single indexing tree. 2) The clusters in the dataset are dynamically detected when the index is updated. 3) The query process does not involve a sequential scan. The MC-tree was shown to be better than hierarchical and cluster-based indexes for the partially clustered datasets. Index structure that enables efficient similarity queries in high-dimensional space is crucial for many applications. This paper discusses the indexing problem in dataset composed of partially clustered data, which exists in many applications. Current index methods are inefficient with partially clustered datasets. The dynamic and adaptive index structure presented here, called a multi-cluster tree (MC-tree), consists of a set of height-balanced trees for indexing. This index structure improves the querying efficiency in three ways: 1) Most bounding regions achieve uniform distributions, which results in fewer splits and less overlap compared with a single indexing tree. 2) The clusters in the dataset are dynamically detected when the index is updated. 3) The query process does not involve a sequential scan. The MC-tree was shown to be better than hierarchical and cluster-based indexes for the partially clustered datasets.
出处 《Tsinghua Science and Technology》 SCIE EI CAS 2003年第2期174-180,共7页 清华大学学报(自然科学版(英文版)
基金 Supported by the Chinese National Key FundamentalResearch Program(No.G1998030414) the National Natural Science Foundation of China (No.79990580) the"985" Program of Tsinghua University
关键词 MC-tree multi-dimensional index similarity query partially clustered dataset MC-tree multi-dimensional index similarity query partially clustered dataset
  • 相关文献

参考文献20

  • 1[1]Rafiei D, Mendelzon A O. Similarity-based queries for time series data. In: Proc. of ACM SIGMOD Int. Conf. on Management of Data. Tucson, 1997:13 - 24.
  • 2[2]Rakesh A, Christos F. Efficient similarity search in sequence databases. In: Proc. of FODO'93.Chicago, Illinois, 1993: 69-84.
  • 3[3]Hull D. Improving text retrieval for the routing problem using latent semantic indexing. In: Proc. of the 17th ACM-SIGIR Conference. 1994: 282- 291.
  • 4[4]Flckner M, Sawhney H, Niblack W, et al. Query by image and video content, the QBIC system. IEEE Computer, 1995, 28(9): 23 - 32.
  • 5[5]Chan K, Fu Ada Wai-Chee. Efficient time series matching by wavelets. In: Proc. of the 15th Intl.Conf. on Data Engineering. Sydney, Australia,1999: 126- 133.
  • 6[6]Ravi Kanth K V, Divyakant Agrawal, Amr EI Abbadi, et al. Dimensionality reduction for similarity searching in dynamic databases. Technical Report TRCS98-10. Department of Computer Science,University of California, Santa Barbara, 1998.
  • 7[7]Tsaparas P. Nearest neighbor search in multidimensional spaces. Qualifying Depth Oral Report, Department of Computer Science,University of Toronto, 1999.
  • 8[8]Weber R, Schek H J, Blott S. A quantitative analysis and performance study for similarity search methods in high dimensional space. In: Proc. of 24th Int. Conf. on Very Large Databases, VLDB.1998: 194-205.
  • 9[9]Guttman A. R-trees: A dynamic index structure for spatial searching. In: Proc. of the ACM SIGMOD Int. Conf. On Management of Data. 1984, 47- 57.
  • 10[10]Beckmann N, Kriegel H, Schneider R, et al. The R * tree: An efficient and robust access method for points and rectangles. In: Proc. of ACM SIGMOD Int. Conf. on Mana gement of Data. 1990:322 331.

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部