Hyperplane Tree: A Structure of Indexing Metric Spaces for Similarity Search Queries

Cited by: 2

Abstract: Similarity search retrieves from a database the objects that are similar to a given query object. Existing R-tree-based similarity search methods are efficient when the search space has few dimensions but become very inefficient when it has many. To address this problem, the paper proposes a new method of partitioning metric spaces and a new index structure, the pgh-tree. The pgh-tree partitions and indexes the data using the differences between each data object's distances to a small number of fixed reference objects, which yields a balanced index tree. On top of this structure, a new search algorithm uses the differences between the query object's distances to the fixed reference objects to filter out most irrelevant data. This reduces the number of database scans, so I/O cost drops significantly and few distance computations are needed. The average complexity is θ(n^0.58), which the authors state is the lowest of any similarity search algorithm known at the time. Analysis and experimental results show that the algorithm is more efficient than the existing algorithms it is compared with. A K-nearest-neighbor search strategy based on the pgh-tree is also discussed.
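The filtering idea in the abstract can be illustrated with a minimal sketch. It is not the pgh-tree itself, whose construction is not given in this record. It assumes a single pair of reference objects and a flat sorted list instead of a tree. By the triangle inequality, |d(x, p) - d(q, p)| <= d(x, q) for any reference object p, so the key k(x) = d(x, a) - d(x, b) satisfies |k(x) - k(q)| <= 2 d(x, q). Any object within radius r of the query must therefore have a key within 2r of the query's key, and everything else can be skipped without computing its distance to the query. The names `DistanceDifferenceIndex`, `ref_a`, `ref_b`, and `range_query` are hypothetical and chosen only for this illustration.

```python
import bisect
import math


def euclidean(a, b):
    """Euclidean distance; any metric obeying the triangle inequality works."""
    return math.dist(a, b)


class DistanceDifferenceIndex:
    """Illustrative flat index keyed by d(x, ref_a) - d(x, ref_b).

    Assumption: one pair of reference objects and a sorted list,
    not the balanced tree described in the paper.
    """

    def __init__(self, points, ref_a, ref_b, dist=euclidean):
        self.ref_a, self.ref_b, self.dist = ref_a, ref_b, dist
        keyed = sorted((self._key(p), p) for p in points)
        self.keys = [k for k, _ in keyed]
        self.points = [p for _, p in keyed]
        self.distance_calls = 0  # distance computations against the query

    def _key(self, p):
        return self.dist(p, self.ref_a) - self.dist(p, self.ref_b)

    def range_query(self, q, radius):
        """Return all indexed points within `radius` of `q`."""
        kq = self._key(q)
        slack = 2 * radius + 1e-9  # small tolerance for floating-point rounding
        lo = bisect.bisect_left(self.keys, kq - slack)
        hi = bisect.bisect_right(self.keys, kq + slack)
        hits = []
        for p in self.points[lo:hi]:  # only candidates that survive the filter
            self.distance_calls += 1
            if self.dist(p, q) <= radius:
                hits.append(p)
        return hits
```

With points given as coordinate tuples, `DistanceDifferenceIndex(points, ref_a, ref_b).range_query(q, r)` returns the same set as a linear scan, while `distance_calls` counts how many candidates passed the key filter. How much is filtered depends on the data and on the choice of reference objects; this sketch makes no claim about the θ(n^0.58) bound reported in the abstract.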

Authors: Li Jianzhong, Zhang Zhaogong

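Affiliations: School of Computer Science and Technology, Harbin Institute of Technology; School of Computer Science and Technology, Heilongjiang University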

Source: Journal of Computer Research and Development (计算机研究与发展), indexed in EI, CSCD, and the Peking University Core Journals list; 2003, No. 8, pp. 1209-1215 (7 pages)

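Funding: National Natural Science Foundation of China (60273082); National "973" Key Basic Research Development Program (G1999032704); National "863" High-Technology Research and Development Program (2001-AA-415-410); Doctoral Fund of the State Education Commission (2000021303); Natural Science Foundation of Heilongjiang Province (F00-11)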

Keywords: algorithm; similarity search queries; metric space; database; data mining

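Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]; TP391 [Automation and Computer Technology: Computer Application Technology]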
