LBD:基于局部位码比较的高维空间KNN搜索算法被引量：3

LBD:Exploring Local Bit-code Difference for KNN Search in High-dimensional Spaces

下载PDF

导出

摘要利用高维数据空间合理划分,提出一种简单有效的KNN检索算法-LBD。通过聚类将数据划分成多个子集空间,对每个聚类子集内的高维向量,利用距离和位码定义简化表示形式。KNN搜索时,首先利用距离信息确定候选范围,然后利用某些维上的位码不相同信息进一步缩小搜索范围,提高剪枝效率。位码字符串比较时,按照维度贡献优先顺序,大大加快非候选点过滤。LBD利用特殊的B+树组织,降低I/O和距离计算代价。采用模拟数据和真实数据,实验验证了LBD具有更高的检索效率。 Recent advances in research fields like multimedia and bioinformatics have brought about a new generation of high-dimensional databases. To support efficient querying and retrieval on such databases, we propose a methodology exploring Local Bit-code Difference （LBD）which can support k-nearest neighbors （KNN）queries on high-dimensional databases and yet co-exist with ubiquitous indices, such as B^＋-trees. On clustering the data space into a number of partitions, LBD extracts a distance and a simple bitmap representation called Bit Code （BC）for each point in the database with respect to the corresponding reference point. Pruning during KNN search is performed by dynamically selecting only a subset of the bits from the BC based on which subsequent comparisons are performed. In doing so, expensive operations involved in computing L-norm distance functions between high-dimensional data can be avoided. Extensive experiments are conducted to show that our methodology offers significant performance advantages over other existing indexing methods on both real life and synthetic high-dimensional spaces.

作者梁俊杰冯玉才

机构地区华中科技大学计算机学院湖北大学数学与计算机科学学院武汉

出处《计算机科学》 CSCD 北大核心 2007年第6期145-148,161,共5页 Computer Science

关键词高维索引 KNN查询位码近似向量 High-dimensional index, KNN search, Bit code, Approximate vector

分类号 TP18 [自动化与计算机技术—控制理论与控制工程]

引文网络
相关文献

参考文献12

1Chavez E,Navarro G,Baeza-Yates R,et al.Searching in metric spaces.ACM Computing Surveys,2001,33(3):273～321
2Fonseca M J,Jorge J A.Indexing high-dimensional data for content-based retrieval in large databases.In:Proceedings of the Eighth International Conference on Database Systems for Advanced Applications (DASFAA'03),2003.267～274
3Hjaltason G R,Hanan S.Index-driven similarity search in metric spaces.ACM Transactions on Database Systems,2003,28 (4):517～580
4Weber R,Schek H,Blott S.A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces.In:Proc 24th International Conference on Very Large Data Bases,1998.194～205
5Beyer K,Goldstein J,Ramakhrisnan R,et al.when is "Nearest neighbor" meaningful.In:Proc.the 7th Int'l Conf.Database Theory (ICDT'99),1999.1～11
6Hinneburg A,Aggarwal C C,Keim D A.What is the nearest neighbor in high dimensional spaces.In:Proc.26th International Conference on Very Large Data Bases,2000.506～515
7BinCui,et al.Exploring Bit -Difference for Approximate KNN Search in High-dimensional Databases.In:Proc.16th Australasian Database Conference,2005.165～174
8Berchtold S,Bohm C,Kriegel H P.The pyramid-technique:Towards breaking the curse of dimensionality.In:Proc.ACM SIGMOD International Conference on Management of Data,1998.142～153
9Yu Cui,et al.Indexing the Distance:An Efficient Method to KNN Processing.In:Proc.27th International Conference on Very Large Data Bases,2001.421～430
10Chakrabarti K,Mehrotra S.Local Dimensionality Reduction:A New Approach to Indexing High Dimensional Spaces.In:Proc.Conf Very Large Data Bases,2000.89～100

同被引文献16

1Bohm C, Krebs F. The k-nearest neighbor join: turbo charging the kdd process [J ]. Knowledge and Information Systems (KAIS), 2004,6 (6) :728-749.
2Xia C,Lu J, Ooi B C, et al. GORDER: An efficient method for KNN join processing[C]//Proceedings of the 30th International Conference on Very Large Data Bases(VLDB'04). San Francisco, CA: Morgan Kaufmann, 2004 : 756-767.
3Cui Y, Cui B, Wang S, et al. Efficient index-based KNN join processing for high-dimensional data[J]. Information and Software Technology, 2007,49 (4) : 332-344.
4Emrich T, Graf F, Kriegel H-P, et al. Optimizing All-Nearest- Neighbor Queries with Trigonometric Pruning[C]///Proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM). Heidelberg, Germany: Springer, 2010 : 501-518.
5Wang J J, Lin L, Huang T, et al. Effcient K-Nearest Neighbor Join Algorithms for High Dimensional Sparse Data. Computing Research Repository(CoRR)[R]. cs. DB/1011.2807. 2010-11.
6Yu C, Zhang R, Huang Y, et al. High-dimensional k NN Joins with Incremental Updates[J]. GeoInformatica, 2010,14 ( 1 ):55- 82.
7Cui B, Ooi B C, Su J W, et al. Indexing high-dimensional data for efficient in-memory similarity search[J]. IEEE Trans on Knowledge and Data Engineering, 2005,17 (3) : 339-353.
8何洪辉,王丽珍,周丽华.pgi-distance:一种高效的并行KNN-join处理方法[J].计算机研究与发展,2007,44(10):1774-1781. 被引量：3
9王永智,滕至阳,王鹏,聂江涛.基于LSA和SVM的文本分类模型的研究[J].计算机工程与设计,2009,30(3):729-731. 被引量：10
10潘丽芳,杨炳儒.基于簇的K最近邻(KNN)分类算法研究[J].计算机工程与设计,2009,30(18):4260-4262. 被引量：27

引证文献3

1刘艳,郝忠孝.高维主存kNN连接索引结构的核心算法[J].计算机科学,2011,38(9):146-149.
2苟和平,景永霞,冯百明,李勇.基于DBSCAN聚类的改进KNN文本分类算法[J].科学技术与工程,2013,21(1):219-222. 被引量：5
3景永霞,王治和,苟和平.基于矩阵奇异值分解的文本分类算法研究[J].西北师范大学学报（自然科学版）,2018,54(3):51-56. 被引量：7

二级引证文献12

1罗贤锋,祝胜林,陈泽健,袁玉强.基于K-Medoids聚类的改进KNN文本分类算法[J].计算机工程与设计,2014,35(11):3864-3867. 被引量：25
2樊存佳,汪友生,边航.一种改进的KNN文本分类算法[J].国外电子测量技术,2015,34(12):39-43. 被引量：25
3李湘东,阮涛,潘练.融合去噪技术和动态主题数的新闻话题分析框架研究[J].情报科学,2018,36(4):14-21. 被引量：4
4马黎.基于最大二元组和综合特征的文本分类框架[J].西北师范大学学报（自然科学版）,2019,55(5):44-50.
5冯骥,冉瑞生,魏延.基于自然邻居邻域图的无参数离群检测算法[J].智能系统学报,2019,14(5):998-1006. 被引量：6
6张帅,杨雪霞.基于非方阵图像DWT-SVD的盲水印算法研究与仿真[J].中北大学学报（自然科学版）,2020,41(1):49-54. 被引量：1
7朱芳鹏,王晓峰.面向船舶工业新闻的文本分类[J].电子测量与仪器学报,2020,32(1):149-155. 被引量：11
8方侠旋.基于XGBoost模型的文本多分类研究[J].网络安全技术与应用,2020(6):50-52. 被引量：3
9孟鑫淼.基于大数据挖掘技术的文本分类研究[J].现代电子技术,2020,43(17):126-129. 被引量：6
10范云鹏,郭小娥.矩阵奇异值分解的应用[J].科技风,2021(18):53-54. 被引量：2

1刘啸,刘玉龙.一种改进的无线传感器网络节点定位算法[J].计算机技术与发展,2014,24(4):92-95. 被引量：3
2梁俊杰,王长磊.利用分区和距离实现高维空间快速KNN查询[J].计算机研究与发展,2007,44(11):1980-1985. 被引量：4
3吴斌,肖刚,陆佳炜.基于关联规则挖掘领域的Apriori算法的优化研究[J].计算机工程与科学,2009,31(6):116-118. 被引量：22
4张春芝,冯海明.电冰箱单片机控制系统的设计[J].北京工业职业技术学院学报,2002,1(3):21-26. 被引量：2
5秦龙,郑烇,桂舒婷,杨坚.近似逼近高维索引方法[J].小型微型计算机系统,2014,35(5):951-955. 被引量：2
6梁俊杰,冯玉才.BC-iDistance:基于位码的优化高维索引[J].小型微型计算机系统,2007,28(9):1647-1651. 被引量：3
7李华旸,冯玉才,张志斌.动态数据立方的范围查询[J].小型微型计算机系统,2003,24(11):2020-2023.
8徐峻岭,徐宝文,张卫丰,崔自峰.Heuristic feature selection method for clustering[J].Journal of Southeast University(English Edition),2006,22(2):169-175.
9徐林昊,钱卫宁,周傲英.非结构化对等计算系统中多维范围搜索[J].软件学报,2007,18(6):1443-1455. 被引量：3
10彭佳红.一种基于粗糙集的混合特征选择算法[J].计算机工程与科学,2005,27(11):57-58.

计算机科学

2007年第6期

浏览历史

内容加载中请稍等...

LBD:基于局部位码比较的高维空间KNN搜索算法被引量：3

参考文献12

同被引文献16

引证文献3

二级引证文献12

相关作者

相关机构

相关主题

浏览历史

LBD:基于局部位码比较的高维空间KNN搜索算法 被引量：3

参考文献12

同被引文献16

引证文献3

二级引证文献12

相关作者

相关机构

相关主题

浏览历史

LBD:基于局部位码比较的高维空间KNN搜索算法被引量：3