期刊文献+

基于广义超曲面树的相似性搜索算法 被引量:2

An Algorithm Based on RGH-Tree for Similarity Search Queries
下载PDF
导出
摘要 相似性搜索是数据挖掘的主要领域之一.它在数据库中检索出相似的数据,发现数据间的相似性.它可以应用于图像数据库、空间数据库和时间序列分析.对于欧氏空间(一种特殊的度量空间),相似性搜索算法中基于R-tree的方法,在低维时是高效的,当维数增加时,R-tree的方法将退化为线性扫描.该现象被称为维数灾难(dimensionality curse),主要原因是存在数据重复.当数据量很大且维数很高时,距离计算和I/O操作将非常费时.提出了度量空间上新的空间分割方法和索引结构rgh-tree,利用数据库的数据对象与很少几个固定参考对象的距离信息进行数据分割和分布,产生一个各节点没有数据重复的平衡树.另外,在rgh-tree的基础上提出了相应的相似性搜索算法,该算法具有较小的I/O代价和距离计算次数,平均复杂性近似为o(n0.58).解决了目前算法存在的一些问题. Similarity search is a very important problem in data mining. It retrieves similar objects in database and finds proximity between objects. It can be applied to image/picture databases, spatial databases, and time-series databases. For Euclid space (a special metric space), similarity search algorithms based on R-tree are efficient in low-dimensional space, but degenerate into linear scan for high-dimensional space. This phenomenon is called dimensionality curse. This paper presents a new partition and index method of metric space, rgh-tree which distributes and partitions objects by using distance information of objects with few fixed reference. It produces a balance tree with no data overlay. In addition, an algorithm based on rgh-tree, which is suitable for similarity search in metric space, is presented in this paper. The algorithm overcomes the shortcomings of the exiting algorithms, which has less I/O cost and times of computing distance, with average complexity o(n0.58).
出处 《软件学报》 EI CSCD 北大核心 2002年第10期1969-1976,共8页 Journal of Software
基金 国家自然科学基金资助项目(69873014) 国家高技术研究发展计划资助项目(2001AA415410) 国家重点基础研究发展规划973资助项目(G1999032704) 国家教育部博士点基金资助项目(2000021303) 黑龙江省自然科学基金资助项目(F00-11)~
关键词 广义超曲面树 相似性搜索算法 数据库 数据挖掘 数据查询 algorithm similarity search query metric space database
  • 相关文献

参考文献13

  • 1Bozkaya, T., Ozsoyoglu M. Indexing large metric spaces for similarity sear ch queries. ACM Transactions on Database Systems, 1999,24(3):361~404.
  • 2Agrawal, R., Lin, K-I., Sawhney, H. S., et al. Fast similarity search in t he presence of noise, scaling, and translation in time-series databases. In: Day al, U., Gray, P.M.D., Nishio, S., eds. Proceedings of the 21st VLDB Conference. Zurich: Morgan Kaufmann Publishers, Inc., 1995. 490~501.
  • 3Shafer, J., Agrawal, R. Parallel algorithms for high-dimensional proximity joins. In: Jarke, M., Carey, M.J., Dittrich, K.R., et al., eds. Proceedings of the 23rd International Conference on Very Large Data Bases. Athens: Morgan Kaufm ann Publishers, Inc., 1997. 176~185.
  • 4Agrawal, R., Faloutsos, C., Swami, A. Efficient similarity search in seque nce databases. In: Lomet, D.B., ed. Proceedings of the 4th International Confere nce, Foundations of Data Organization and Algorithms. Heidelberg: Springer-Verla g, 1993. 69~84.
  • 5Baeza-Yates, R., Navarro, G. Block-Addressing indices for approximate text retrieval. In: Golshani, F., Makki, K., eds. Proceedings of the 6th Internation al Conference on Information and Knowledge Management. New York: ACM Press, 1997 . 1~8.
  • 6Hjaltason, G. R., Samet, H. Distance browsing in spatial databases. ACM Tr ansactions on Database Systems, 1999,24(2):265~ 318.
  • 7Otterman, M. Approximate matching with high dimensionality R-trees [MS. Th esis]. Department of Computer Science, University of Maryland, College Park, 1992.
  • 8Moon, B., Jagadish, H.V., Faloutsos, C., et al. Analysis of the clustering properties of the hilbert space-filling curve. IEEE Transactions on Knowledge a nd Data Engineering, 2001,13(1):124~141.
  • 9Uhlmann, J.K. Satisfying general proximity/similarity queries with metric trees. Information Processing Letters, 1991,40(4):175~ 179.
  • 10Patel, J.M., DeWitt, D.J. Partition based spatial merge join. In: Jagadis h, H.V., Mumick, I.S., eds. Proceedings of the 1996 ACM SIGMOD International Con ference on Management of Data. Montreal: ACM Press, 1996. 259~270.

同被引文献18

  • 1T Bozkaya, M Ozsoyoglu. Indexing large metric spaces for similarity search queries. ACM Trans on Database Systems,1999, 24(3): 361-404.
  • 2R Agrawal, Lin King-Ip, H S Sawhney et al. Fast similarity search in the presence of noise, scaling and translation in timeseries databases. In: Proc of the 21st VLDB Conf. Zurich:Morgan Kaufmann, 1995. 490-501.
  • 3J Sharer, R Agrawal. Parallel algorithms for hight-dimensional proximity joins. In: Matthias Jarke, Michael J Carey, Klaus R Dittrich eds. Proc of the 23rd Int' 1 Conf on Very Large Data Bases. Athens, Greece: Morgan Kaufmann, 1997. 176~185.
  • 4R Agrawal, C Faloutsos, A Swami. Efficient sinailarity search in sequence databases. In: David B Lomet ed. Proc of the 4th Int'l Conf Foundations of Data Organization and Algorithms. Berlin:Springer-Verlag, 1993. 69-84.
  • 5R Baeza-Yates, G Navarro. Block-addressing indices for approximate text retrieval. In: Forouzan Golshani, Kia Makki eds. Proc of the 6th Int'l Cord on Information and Knowledge Management. New York: ACM Press, 1997. 1-8.
  • 6G R Hjaltason, H Samet. Distance browsing in spatial databases.ACM Trans on Database Systems, 1999, 24(2) : 265 -318.
  • 7M Otterman. Approximate matching with high dimensionality Rtrees[ Master dissertation]. Department of Computer Science,University of Maryland, College Park, 1992.
  • 8B Moon, H V Jagadish, C Faloutsos et al. Arlalysis of the clustering properties of the Hilbert space-filling curve. IEEE Trans on Knowledge and Data Engineering, 2001, 13(1) : 124-141.
  • 9J K Uhlmann. Satisfying general proximity/similarity queries with metric trees. Information Processing Letters, 1991, 40(4) : 175-179.
  • 10J M Patel, D J DeWitt. Partition based spatial merge join. In: H V Jagadish, Inderpal Singh Mumick eds. Proc of the 1996 ACM SIGMOD. Montreal, Canada: ACM Press, 1996. 259-270.

引证文献2

二级引证文献6

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部