基于广义超曲面树的相似性搜索算法被引量：2

An Algorithm Based on RGH-Tree for Similarity Search Queries

下载PDF

导出

摘要相似性搜索是数据挖掘的主要领域之一.它在数据库中检索出相似的数据,发现数据间的相似性.它可以应用于图像数据库、空间数据库和时间序列分析.对于欧氏空间(一种特殊的度量空间),相似性搜索算法中基于R-tree的方法,在低维时是高效的,当维数增加时,R-tree的方法将退化为线性扫描.该现象被称为维数灾难(dimensionality curse),主要原因是存在数据重复.当数据量很大且维数很高时,距离计算和I/O操作将非常费时.提出了度量空间上新的空间分割方法和索引结构rgh-tree,利用数据库的数据对象与很少几个固定参考对象的距离信息进行数据分割和分布,产生一个各节点没有数据重复的平衡树.另外,在rgh-tree的基础上提出了相应的相似性搜索算法,该算法具有较小的I/O代价和距离计算次数,平均复杂性近似为o(n0.58).解决了目前算法存在的一些问题. Similarity search is a very important problem in data mining. It retrieves similar objects in database and finds proximity between objects. It can be applied to image/picture databases, spatial databases, and time-series databases. For Euclid space (a special metric space), similarity search algorithms based on R-tree are efficient in low-dimensional space, but degenerate into linear scan for high-dimensional space. This phenomenon is called dimensionality curse. This paper presents a new partition and index method of metric space, rgh-tree which distributes and partitions objects by using distance information of objects with few fixed reference. It produces a balance tree with no data overlay. In addition, an algorithm based on rgh-tree, which is suitable for similarity search in metric space, is presented in this paper. The algorithm overcomes the shortcomings of the exiting algorithms, which has less I/O cost and times of computing distance, with average complexity o(n0.58).

作者张兆功李建中

机构地区哈尔滨工业大学计算机科学与技术学院黑龙江大学

出处《软件学报》 EI CSCD 北大核心 2002年第10期1969-1976,共8页 Journal of Software

基金国家自然科学基金资助项目(69873014) 国家高技术研究发展计划资助项目(2001AA415410) 国家重点基础研究发展规划973资助项目(G1999032704) 国家教育部博士点基金资助项目(2000021303) 黑龙江省自然科学基金资助项目(F00-11)~

关键词广义超曲面树相似性搜索算法数据库数据挖掘数据查询 algorithm similarity search query metric space database

分类号 TP311.13 [自动化与计算机技术—计算机软件与理论]

引文网络
相关文献

参考文献13

1Bozkaya, T., Ozsoyoglu M. Indexing large metric spaces for similarity sear ch queries. ACM Transactions on Database Systems, 1999,24(3):361～404.
2Agrawal, R., Lin, K-I., Sawhney, H. S., et al. Fast similarity search in t he presence of noise, scaling, and translation in time-series databases. In: Day al, U., Gray, P.M.D., Nishio, S., eds. Proceedings of the 21st VLDB Conference. Zurich: Morgan Kaufmann Publishers, Inc., 1995. 490～501.
3Shafer, J., Agrawal, R. Parallel algorithms for high-dimensional proximity joins. In: Jarke, M., Carey, M.J., Dittrich, K.R., et al., eds. Proceedings of the 23rd International Conference on Very Large Data Bases. Athens: Morgan Kaufm ann Publishers, Inc., 1997. 176～185.
4Agrawal, R., Faloutsos, C., Swami, A. Efficient similarity search in seque nce databases. In: Lomet, D.B., ed. Proceedings of the 4th International Confere nce, Foundations of Data Organization and Algorithms. Heidelberg: Springer-Verla g, 1993. 69～84.
5Baeza-Yates, R., Navarro, G. Block-Addressing indices for approximate text retrieval. In: Golshani, F., Makki, K., eds. Proceedings of the 6th Internation al Conference on Information and Knowledge Management. New York: ACM Press, 1997 . 1～8.
6Hjaltason, G. R., Samet, H. Distance browsing in spatial databases. ACM Tr ansactions on Database Systems, 1999,24(2):265～ 318.
7Otterman, M. Approximate matching with high dimensionality R-trees [MS. Th esis]. Department of Computer Science, University of Maryland, College Park, 1992.
8Moon, B., Jagadish, H.V., Faloutsos, C., et al. Analysis of the clustering properties of the hilbert space-filling curve. IEEE Transactions on Knowledge a nd Data Engineering, 2001,13(1):124～141.
9Uhlmann, J.K. Satisfying general proximity/similarity queries with metric trees. Information Processing Letters, 1991,40(4):175～ 179.
10Patel, J.M., DeWitt, D.J. Partition based spatial merge join. In: Jagadis h, H.V., Mumick, I.S., eds. Proceedings of the 1996 ACM SIGMOD International Con ference on Management of Data. Montreal: ACM Press, 1996. 259～270.

同被引文献18

1T Bozkaya, M Ozsoyoglu. Indexing large metric spaces for similarity search queries. ACM Trans on Database Systems,1999, 24(3): 361-404.
2R Agrawal, Lin King-Ip, H S Sawhney et al. Fast similarity search in the presence of noise, scaling and translation in timeseries databases. In: Proc of the 21st VLDB Conf. Zurich:Morgan Kaufmann, 1995. 490-501.
3J Sharer, R Agrawal. Parallel algorithms for hight-dimensional proximity joins. In: Matthias Jarke, Michael J Carey, Klaus R Dittrich eds. Proc of the 23rd Int' 1 Conf on Very Large Data Bases. Athens, Greece: Morgan Kaufmann, 1997. 176~185.
4R Agrawal, C Faloutsos, A Swami. Efficient sinailarity search in sequence databases. In: David B Lomet ed. Proc of the 4th Int'l Conf Foundations of Data Organization and Algorithms. Berlin:Springer-Verlag, 1993. 69-84.
5R Baeza-Yates, G Navarro. Block-addressing indices for approximate text retrieval. In: Forouzan Golshani, Kia Makki eds. Proc of the 6th Int'l Cord on Information and Knowledge Management. New York: ACM Press, 1997. 1-8.
6G R Hjaltason, H Samet. Distance browsing in spatial databases.ACM Trans on Database Systems, 1999, 24(2) : 265 -318.
7M Otterman. Approximate matching with high dimensionality Rtrees[ Master dissertation]. Department of Computer Science,University of Maryland, College Park, 1992.
8B Moon, H V Jagadish, C Faloutsos et al. Arlalysis of the clustering properties of the Hilbert space-filling curve. IEEE Trans on Knowledge and Data Engineering, 2001, 13(1) : 124-141.
9J K Uhlmann. Satisfying general proximity/similarity queries with metric trees. Information Processing Letters, 1991, 40(4) : 175-179.
10J M Patel, D J DeWitt. Partition based spatial merge join. In: H V Jagadish, Inderpal Singh Mumick eds. Proc of the 1996 ACM SIGMOD. Montreal, Canada: ACM Press, 1996. 259-270.

引证文献2

1杨建武,陈晓鸥.基于倒排索引的文本相似搜索[J].计算机工程,2005,31(5):1-3. 被引量：4
2李建中,张兆功.超平面树:度量空间中相似性搜索的索引结构[J].计算机研究与发展,2003,40(8):1209-1215. 被引量：2

二级引证文献6

1王燕.多媒体数据库知识挖掘方法分析[J].图书馆学研究,2005(12):31-33. 被引量：1
2颜端武,成晓,甘利人.基于领域本体和概念向量的中文文本相似性测度研究[J].中国图书馆学报,2007,33(6):51-57. 被引量：5
3廉捷,刘云.网络舆情中的信息预处理与自动摘要算法[J].北京交通大学学报,2010,34(5):94-99. 被引量：8
4胡颖.基于四叉树的移动终端地图搜索算法研究与实现[J].地理空间信息,2016,14(5):89-91. 被引量：1
5华莹,侯志松.基于XML的站内搜索引擎研究[J].教育技术导刊,2008(3):61-62.
6杨俊升.多媒体教学中数据库知识挖掘方法分析[J].电子制作,2014,22(12X):170-171.

1孟均平,陈莉,马文宁,李华.图数据库中的相似性搜索算法研究与应用[J].计算机应用研究,2010,27(5):1813-1815. 被引量：5
2王忠伟,江虹.基于LSH的相似性搜索算法研究探讨[J].计算机光盘软件与应用,2015,18(2):89-90.
3李建中,张兆功.超平面树:度量空间中相似性搜索的索引结构[J].计算机研究与发展,2003,40(8):1209-1215. 被引量：2
4毛云建,杜秀华.基于形态特征的时间序列相似性搜索算法[J].计算机仿真,2008,25(1):80-83. 被引量：5
5苏.Intermec SR61手持式无线扫描器上市[J].信息与电脑,2007(2):8-8.
6陈湘涛,丁平尖,王晶.异构信息网中基于元路径的动态相似性搜索[J].计算机应用,2014,34(9):2604-2607. 被引量：2
7杨敏,王志坚,尹燕敏.时间序列相似性搜索算法研究[J].山东师范大学学报（自然科学版）,2001,16(4):373-377. 被引量：11
8岳晓峰,梁翠萍.基于分层模糊推理的坠机搜索方案研究[J].科技致富向导,2015,0(5):94-94.
9陈金科.确保票据上QR码品质的奥秘[J].印刷技术,2013(1):48-48.
10刘博宁,张建业,张鹏,王占磊.基于曲率距离的时间序列相似性搜索方法[J].电子与信息学报,2012,34(9):2200-2207. 被引量：9

软件学报

2002年第10期

浏览历史

内容加载中请稍等...

基于广义超曲面树的相似性搜索算法被引量：2

参考文献13

同被引文献18

引证文献2

二级引证文献6

相关作者

相关机构

相关主题

浏览历史

基于广义超曲面树的相似性搜索算法 被引量：2

参考文献13

同被引文献18

引证文献2

二级引证文献6

相关作者

相关机构

相关主题

浏览历史

基于广义超曲面树的相似性搜索算法被引量：2