Sequential Search Method for Nearest Neighbor Query in High-Dimensional Dataset

Abstract: This paper studies sequential indexing methods and proposes a high-dimensional sequential index structure based on vector approximation, in which a k-nearest neighbor (k-NN) query is answered by sequentially accessing only part of the index file. Two one-dimensional mapping values speed up the search: a projection value, used to terminate the query process, and a distance, used to reject data points that cannot be among the results. To further reduce the fraction of data that must be accessed, the dataset is partitioned into ellipsoid-shaped clusters. The new index structure answers a k-NN query through several sequential scans, which reduces both the I/O cost and the CPU cost of the query. Experiments on a large high-dimensional image feature database show that the query performance of the new index structure is better than that of other high-dimensional indexing methods.
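The two one-dimensional pruning rules described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes Euclidean distance, a single partition whose projection axis `direction` and reference point `center` are supplied by the caller (in the paper these would presumably be derived from the ellipsoid clusters, which is an assumption here), and exact vectors held in memory rather than vector approximations read from disk. The class `ProjectionIndex` and its `knn` method are hypothetical names. Termination relies on the fact that projecting onto a unit vector u never increases distance, |u.p - u.q| <= ||p - q||; rejection relies on the triangle inequality | ||p - c|| - ||q - c|| | <= ||p - q||.

```python
import bisect
import heapq
import math
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (the metric assumed throughout this sketch)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ProjectionIndex:
    """Hypothetical index: points sorted by their projection onto a unit
    vector, each stored with its distance to a reference center."""

    def __init__(self, points: Sequence[Sequence[float]],
                 direction: Sequence[float], center: Sequence[float]) -> None:
        norm = math.sqrt(dot(direction, direction))
        self.u = [d / norm for d in direction]  # unit projection axis
        self.center = list(center)
        self.points = [list(p) for p in points]
        entries = sorted(
            (dot(p, self.u), dist(p, self.center), i)
            for i, p in enumerate(self.points)
        )
        self.proj = [e[0] for e in entries]  # 1-D key used for termination
        self.rad = [e[1] for e in entries]   # 1-D key used for rejection
        self.ids = [e[2] for e in entries]

    def knn(self, q: Sequence[float], k: int) -> Tuple[List[Tuple[float, int]], int]:
        """Return the k nearest (distance, id) pairs in ascending order and the
        number of full distance computations performed."""
        if k <= 0:
            return [], 0
        qp = dot(q, self.u)
        qc = dist(q, self.center)
        heap: List[Tuple[float, int]] = []  # max-heap stored as (-distance, id)
        hi = bisect.bisect_left(self.proj, qp)  # first entry right of the query
        lo = hi - 1                              # first entry left of the query
        computed = 0
        while lo >= 0 or hi < len(self.proj):
            # Two sequential scans outward from the query, merged so that
            # entries are visited in non-decreasing projection gap.
            gap_lo = qp - self.proj[lo] if lo >= 0 else math.inf
            gap_hi = self.proj[hi] - qp if hi < len(self.proj) else math.inf
            if gap_lo <= gap_hi:
                pos, gap, lo = lo, gap_lo, lo - 1
            else:
                pos, gap, hi = hi, gap_hi, hi + 1
            kth = -heap[0][0] if len(heap) == k else math.inf
            # Termination: all remaining entries have a gap >= this one, and
            # ||p - q|| >= gap, so none can beat the current k-th distance.
            if gap >= kth:
                break
            # Rejection: ||p - q|| >= | ||p - c|| - ||q - c|| |.
            if abs(self.rad[pos] - qc) >= kth:
                continue
            pid = self.ids[pos]
            d = dist(self.points[pid], q)
            computed += 1
            if len(heap) < k:
                heapq.heappush(heap, (-d, pid))
            elif d < kth:
                heapq.heapreplace(heap, (-d, pid))
        return sorted((-nd, pid) for nd, pid in heap), computed
```

For example, `ProjectionIndex(points, direction, center).knn(q, k)` returns the k nearest `(distance, point_id)` pairs together with the number of full distance computations, which makes it easy to compare the work done against a full scan. Scanning outward from the query's projection keeps each side a sequential pass over the sorted entries, loosely mirroring the idea of answering one query with several sequential scans.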

Authors: Cui Jiangtao, Xiao Bin, Zhan Haisheng

Affiliations: School of Computer Science, Xidian University; School of Network Education, Xidian University

Source: Journal of Frontiers of Computer Science and Technology (CSCD), 2010, No. 9, pp. 840-849 (10 pages)

Funding: Fundamental Research Funds for the Central Universities (No. JY10000903009)

Keywords: high-dimensional indexing; k-nearest neighbor search; ellipsoid-shaped clustering; sequential scan

CLC number: TP311.134.3 [Automation and Computer Technology: Computer Software and Theory]
