摘要
相似性搜索是从数据库中检索出同给定数据对象相似的数据对象 ,已有的基于R tree的相似性搜索 ,当搜索空间的维的个数较小时效率较高 ,但当搜索空间的维的个数较大时则效率很低 针对此问题 ,提出了新的度量空间分割方法和索引结构 pgh tree,利用数据对象与很少几个固定参考对象的距离之差进行数据分割和索引 ,产生一个平衡的索引树 在此基础上 ,提出了新的算法 ,利用查询数据对象与固定参考对象的距离之差过滤掉大部分的不相关数据 ,具有较小的I/O代价和距离计算复杂性 ,平均复杂性为θ(n0 58) ,是目前复杂性最小的相似性搜索算法 另外还讨论了基于 pgh tree的最近相邻点搜索策略 .
Similarity search queries find proximity objects in database with a fixed object. R-tree based methods of existing similarity search queries have high efficiency for low dimension, but have low efficiency for high dimension. To solve this problem, pgh-tree is proposed, which is a new indexing metric space. Using metrics information of object to only few fixed objects, indexing structure and partitioning metric spaces are made to create an indexing tree with balance. An algorithm suitable for similarity search queries is presented on metric spaces under the indexing structure. The algorithm overcomes the shortcomings of the exiting algorithms. It uses difference of distance between the two reference points. It reduces the number of passes of scanning database so that I/O overhead is reduced significantly. Average complex of it is θ(n 0.58). Analysis and experimental results show that the algorithm is more efficient than others. In addition, a strategy of K-nearest neighborhood search is also discussed under pgh-tree.
出处
《计算机研究与发展》
EI
CSCD
北大核心
2003年第8期1209-1215,共7页
Journal of Computer Research and Development
基金
国家自然科学基金 ( 60 2 73 0 82 )
国家"九七三"重点基础研究发展规划基金 (G19990 3 2 70 4)
国家"八六三"高技术研究发展计划( 2 0 0 1- AA -415 - 410 )
国家教委博士基金 ( 2 0 0 0 0 2 13 0 3 )
黑龙江省自然科学基金 (F0 0 - 11)
关键词
算法
相似性搜索
度量空间
数据库
数据挖掘
algorithm
similarity search queries
metric space
database
data mining