摘要
利用高维数据空间合理划分,提出一种简单有效的KNN检索算法-LBD。通过聚类将数据划分成多个子集空间,对每个聚类子集内的高维向量,利用距离和位码定义简化表示形式。KNN搜索时,首先利用距离信息确定候选范围,然后利用某些维上的位码不相同信息进一步缩小搜索范围,提高剪枝效率。位码字符串比较时,按照维度贡献优先顺序,大大加快非候选点过滤。LBD利用特殊的B+树组织,降低I/O和距离计算代价。采用模拟数据和真实数据,实验验证了LBD具有更高的检索效率。
Recent advances in research fields like multimedia and bioinformatics have brought about a new generation of high-dimensional databases. To support efficient querying and retrieval on such databases, we propose a methodology exploring Local Bit-code Difference (LBD)which can support k-nearest neighbors (KNN)queries on high-dimensional databases and yet co-exist with ubiquitous indices, such as B^+-trees. On clustering the data space into a number of partitions, LBD extracts a distance and a simple bitmap representation called Bit Code (BC)for each point in the database with respect to the corresponding reference point. Pruning during KNN search is performed by dynamically selecting only a subset of the bits from the BC based on which subsequent comparisons are performed. In doing so, expensive operations involved in computing L-norm distance functions between high-dimensional data can be avoided. Extensive experiments are conducted to show that our methodology offers significant performance advantages over other existing indexing methods on both real life and synthetic high-dimensional spaces.
出处
《计算机科学》
CSCD
北大核心
2007年第6期145-148,161,共5页
Computer Science
关键词
高维索引
KNN查询
位码
近似向量
High-dimensional index, KNN search, Bit code, Approximate vector