期刊文献+

LBD:基于局部位码比较的高维空间KNN搜索算法 被引量:3

LBD:Exploring Local Bit-code Difference for KNN Search in High-dimensional Spaces
下载PDF
导出
摘要 利用高维数据空间合理划分,提出一种简单有效的KNN检索算法-LBD。通过聚类将数据划分成多个子集空间,对每个聚类子集内的高维向量,利用距离和位码定义简化表示形式。KNN搜索时,首先利用距离信息确定候选范围,然后利用某些维上的位码不相同信息进一步缩小搜索范围,提高剪枝效率。位码字符串比较时,按照维度贡献优先顺序,大大加快非候选点过滤。LBD利用特殊的B+树组织,降低I/O和距离计算代价。采用模拟数据和真实数据,实验验证了LBD具有更高的检索效率。 Recent advances in research fields like multimedia and bioinformatics have brought about a new generation of high-dimensional databases. To support efficient querying and retrieval on such databases, we propose a methodology exploring Local Bit-code Difference (LBD)which can support k-nearest neighbors (KNN)queries on high-dimensional databases and yet co-exist with ubiquitous indices, such as B^+-trees. On clustering the data space into a number of partitions, LBD extracts a distance and a simple bitmap representation called Bit Code (BC)for each point in the database with respect to the corresponding reference point. Pruning during KNN search is performed by dynamically selecting only a subset of the bits from the BC based on which subsequent comparisons are performed. In doing so, expensive operations involved in computing L-norm distance functions between high-dimensional data can be avoided. Extensive experiments are conducted to show that our methodology offers significant performance advantages over other existing indexing methods on both real life and synthetic high-dimensional spaces.
出处 《计算机科学》 CSCD 北大核心 2007年第6期145-148,161,共5页 Computer Science
关键词 高维索引 KNN查询 位码 近似向量 High-dimensional index, KNN search, Bit code, Approximate vector
  • 相关文献

参考文献12

  • 1Chavez E,Navarro G,Baeza-Yates R,et al.Searching in metric spaces.ACM Computing Surveys,2001,33(3):273~321
  • 2Fonseca M J,Jorge J A.Indexing high-dimensional data for content-based retrieval in large databases.In:Proceedings of the Eighth International Conference on Database Systems for Advanced Applications (DASFAA'03),2003.267~274
  • 3Hjaltason G R,Hanan S.Index-driven similarity search in metric spaces.ACM Transactions on Database Systems,2003,28 (4):517~580
  • 4Weber R,Schek H,Blott S.A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces.In:Proc 24th International Conference on Very Large Data Bases,1998.194~205
  • 5Beyer K,Goldstein J,Ramakhrisnan R,et al.when is "Nearest neighbor" meaningful.In:Proc.the 7th Int'l Conf.Database Theory (ICDT'99),1999.1~11
  • 6Hinneburg A,Aggarwal C C,Keim D A.What is the nearest neighbor in high dimensional spaces.In:Proc.26th International Conference on Very Large Data Bases,2000.506~515
  • 7BinCui,et al.Exploring Bit -Difference for Approximate KNN Search in High-dimensional Databases.In:Proc.16th Australasian Database Conference,2005.165~174
  • 8Berchtold S,Bohm C,Kriegel H P.The pyramid-technique:Towards breaking the curse of dimensionality.In:Proc.ACM SIGMOD International Conference on Management of Data,1998.142~153
  • 9Yu Cui,et al.Indexing the Distance:An Efficient Method to KNN Processing.In:Proc.27th International Conference on Very Large Data Bases,2001.421~430
  • 10Chakrabarti K,Mehrotra S.Local Dimensionality Reduction:A New Approach to Indexing High Dimensional Spaces.In:Proc.Conf Very Large Data Bases,2000.89~100

同被引文献16

  • 1Bohm C, Krebs F. The k-nearest neighbor join: turbo charging the kdd process [J ]. Knowledge and Information Systems (KAIS), 2004,6 (6) :728-749.
  • 2Xia C,Lu J, Ooi B C, et al. GORDER: An efficient method for KNN join processing[C]//Proceedings of the 30th International Conference on Very Large Data Bases(VLDB'04). San Francisco, CA: Morgan Kaufmann, 2004 : 756-767.
  • 3Cui Y, Cui B, Wang S, et al. Efficient index-based KNN join processing for high-dimensional data[J]. Information and Software Technology, 2007,49 (4) : 332-344.
  • 4Emrich T, Graf F, Kriegel H-P, et al. Optimizing All-Nearest- Neighbor Queries with Trigonometric Pruning[C]///Proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM). Heidelberg, Germany: Springer, 2010 : 501-518.
  • 5Wang J J, Lin L, Huang T, et al. Effcient K-Nearest Neighbor Join Algorithms for High Dimensional Sparse Data. Computing Research Repository(CoRR)[R]. cs. DB/1011.2807. 2010-11.
  • 6Yu C, Zhang R, Huang Y, et al. High-dimensional k NN Joins with Incremental Updates[J]. GeoInformatica, 2010,14 ( 1 ):55- 82.
  • 7Cui B, Ooi B C, Su J W, et al. Indexing high-dimensional data for efficient in-memory similarity search[J]. IEEE Trans on Knowledge and Data Engineering, 2005,17 (3) : 339-353.
  • 8何洪辉,王丽珍,周丽华.pgi-distance:一种高效的并行KNN-join处理方法[J].计算机研究与发展,2007,44(10):1774-1781. 被引量:3
  • 9王永智,滕至阳,王鹏,聂江涛.基于LSA和SVM的文本分类模型的研究[J].计算机工程与设计,2009,30(3):729-731. 被引量:10
  • 10潘丽芳,杨炳儒.基于簇的K最近邻(KNN)分类算法研究[J].计算机工程与设计,2009,30(18):4260-4262. 被引量:27

引证文献3

二级引证文献12

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部