期刊文献+

超平面树:度量空间中相似性搜索的索引结构 被引量:2

Haperplane Tree: A Structure of Indexing Metric Spaces for Similarity Search Queries
下载PDF
导出
摘要 相似性搜索是从数据库中检索出同给定数据对象相似的数据对象 ,已有的基于R tree的相似性搜索 ,当搜索空间的维的个数较小时效率较高 ,但当搜索空间的维的个数较大时则效率很低 针对此问题 ,提出了新的度量空间分割方法和索引结构 pgh tree,利用数据对象与很少几个固定参考对象的距离之差进行数据分割和索引 ,产生一个平衡的索引树 在此基础上 ,提出了新的算法 ,利用查询数据对象与固定参考对象的距离之差过滤掉大部分的不相关数据 ,具有较小的I/O代价和距离计算复杂性 ,平均复杂性为θ(n0 58) ,是目前复杂性最小的相似性搜索算法 另外还讨论了基于 pgh tree的最近相邻点搜索策略 . Similarity search queries find proximity objects in database with a fixed object. R-tree based methods of existing similarity search queries have high efficiency for low dimension, but have low efficiency for high dimension. To solve this problem, pgh-tree is proposed, which is a new indexing metric space. Using metrics information of object to only few fixed objects, indexing structure and partitioning metric spaces are made to create an indexing tree with balance. An algorithm suitable for similarity search queries is presented on metric spaces under the indexing structure. The algorithm overcomes the shortcomings of the exiting algorithms. It uses difference of distance between the two reference points. It reduces the number of passes of scanning database so that I/O overhead is reduced significantly. Average complex of it is θ(n 0.58). Analysis and experimental results show that the algorithm is more efficient than others. In addition, a strategy of K-nearest neighborhood search is also discussed under pgh-tree.
出处 《计算机研究与发展》 EI CSCD 北大核心 2003年第8期1209-1215,共7页 Journal of Computer Research and Development
基金 国家自然科学基金 ( 60 2 73 0 82 ) 国家"九七三"重点基础研究发展规划基金 (G19990 3 2 70 4) 国家"八六三"高技术研究发展计划( 2 0 0 1- AA -415 - 410 ) 国家教委博士基金 ( 2 0 0 0 0 2 13 0 3 ) 黑龙江省自然科学基金 (F0 0 - 11)
关键词 算法 相似性搜索 度量空间 数据库 数据挖掘 algorithm similarity search queries metric space database data mining
  • 相关文献

参考文献15

  • 1张兆功,李建中.基于广义超曲面树的相似性搜索算法[J].软件学报,2002,13(10):1969-1976. 被引量:2
  • 2T Bozkaya, M Ozsoyoglu. Indexing large metric spaces for similarity search queries. ACM Trans on Database Systems,1999, 24(3): 361-404.
  • 3R Agrawal, Lin King-Ip, H S Sawhney et al. Fast similarity search in the presence of noise, scaling and translation in timeseries databases. In: Proc of the 21st VLDB Conf. Zurich:Morgan Kaufmann, 1995. 490-501.
  • 4J Sharer, R Agrawal. Parallel algorithms for hight-dimensional proximity joins. In: Matthias Jarke, Michael J Carey, Klaus R Dittrich eds. Proc of the 23rd Int' 1 Conf on Very Large Data Bases. Athens, Greece: Morgan Kaufmann, 1997. 176~185.
  • 5R Agrawal, C Faloutsos, A Swami. Efficient sinailarity search in sequence databases. In: David B Lomet ed. Proc of the 4th Int'l Conf Foundations of Data Organization and Algorithms. Berlin:Springer-Verlag, 1993. 69-84.
  • 6R Baeza-Yates, G Navarro. Block-addressing indices for approximate text retrieval. In: Forouzan Golshani, Kia Makki eds. Proc of the 6th Int'l Cord on Information and Knowledge Management. New York: ACM Press, 1997. 1-8.
  • 7G R Hjaltason, H Samet. Distance browsing in spatial databases.ACM Trans on Database Systems, 1999, 24(2) : 265 -318.
  • 8M Otterman. Approximate matching with high dimensionality Rtrees[ Master dissertation]. Department of Computer Science,University of Maryland, College Park, 1992.
  • 9B Moon, H V Jagadish, C Faloutsos et al. Arlalysis of the clustering properties of the Hilbert space-filling curve. IEEE Trans on Knowledge and Data Engineering, 2001, 13(1) : 124-141.
  • 10J K Uhlmann. Satisfying general proximity/similarity queries with metric trees. Information Processing Letters, 1991, 40(4) : 175-179.

二级参考文献13

  • 1Bozkaya, T., Ozsoyoglu M. Indexing large metric spaces for similarity sear ch queries. ACM Transactions on Database Systems, 1999,24(3):361~404.
  • 2Agrawal, R., Lin, K-I., Sawhney, H. S., et al. Fast similarity search in t he presence of noise, scaling, and translation in time-series databases. In: Day al, U., Gray, P.M.D., Nishio, S., eds. Proceedings of the 21st VLDB Conference. Zurich: Morgan Kaufmann Publishers, Inc., 1995. 490~501.
  • 3Shafer, J., Agrawal, R. Parallel algorithms for high-dimensional proximity joins. In: Jarke, M., Carey, M.J., Dittrich, K.R., et al., eds. Proceedings of the 23rd International Conference on Very Large Data Bases. Athens: Morgan Kaufm ann Publishers, Inc., 1997. 176~185.
  • 4Agrawal, R., Faloutsos, C., Swami, A. Efficient similarity search in seque nce databases. In: Lomet, D.B., ed. Proceedings of the 4th International Confere nce, Foundations of Data Organization and Algorithms. Heidelberg: Springer-Verla g, 1993. 69~84.
  • 5Baeza-Yates, R., Navarro, G. Block-Addressing indices for approximate text retrieval. In: Golshani, F., Makki, K., eds. Proceedings of the 6th Internation al Conference on Information and Knowledge Management. New York: ACM Press, 1997 . 1~8.
  • 6Hjaltason, G. R., Samet, H. Distance browsing in spatial databases. ACM Tr ansactions on Database Systems, 1999,24(2):265~ 318.
  • 7Otterman, M. Approximate matching with high dimensionality R-trees [MS. Th esis]. Department of Computer Science, University of Maryland, College Park, 1992.
  • 8Moon, B., Jagadish, H.V., Faloutsos, C., et al. Analysis of the clustering properties of the hilbert space-filling curve. IEEE Transactions on Knowledge a nd Data Engineering, 2001,13(1):124~141.
  • 9Uhlmann, J.K. Satisfying general proximity/similarity queries with metric trees. Information Processing Letters, 1991,40(4):175~ 179.
  • 10Patel, J.M., DeWitt, D.J. Partition based spatial merge join. In: Jagadis h, H.V., Mumick, I.S., eds. Proceedings of the 1996 ACM SIGMOD International Con ference on Management of Data. Montreal: ACM Press, 1996. 259~270.

共引文献1

同被引文献6

引证文献2

二级引证文献1

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部