基于k-means特征的适应性近似最近邻搜索算法

Adaptive approximate nearest neighbor search algorithm based on k-means features

下载PDF

导出

摘要在基于倒排索引和HNSW索引结构的最近邻搜索算法中,由于所有查询点使用固定的终止条件进行近似最近邻搜索,从而导致某些查询点在搜索路径上访问了不必要的数据点。因此,本文针对十亿规模数据集,在IVF-HNSW算法的基础上,根据数据点的k-means特征和真实最小访问点,建立神经网络回归模型。通过模型,动态预测每个查询点在HNSW索引中找到最近邻所需要搜索的质心个数,以及在IVF中需要搜索的倒排列表的个数,最终每个查询点能够通过适应性搜索,减少需要访问的数据库向量的个数,进而降低总体搜索所需要的查询时间。实验结果表明,优化后的自适应搜索算法与原始IVF-HNSW算法相比,在最高召回率下,平均查询时间最多可降低27%。 In the nearest neighbor search algorithm based on the inverted index and HNSW index structure,because all query points use a fixed termination condition to search the nearest neighbor,some query points visit unnecessary data points on the search path.Therefore,for the billion-scale data set,based on the IVF-HNSW algorithm,this paper establishes a neural network regression model according to the k-means characteristics of the data points and the real minimum access point and dynamically predicts each query point through the model to find the nearest neighbor in the HNSW index.The number of centroids to search and the number of inverted lists to search in IVF.Finally,each query point can be adaptively searched to reduce the number of database vectors that need to be accessed,thereby reducing the query time required for the overall search.The experimental results show that the optimized adaptive search algorithm can reduce the average query time by up to 27%at the highest recall rate compared with the original IVF-HNSW algorithm.

作者胡文洁杨凯祥谭宗元 HU Wenjie;YANG Kaixiang;TAN Zongyuan(School of Computer Science and Technology,Donghua University,Shanghai 201620,China)

机构地区东华大学计算机科学与技术学院

出处《智能计算机与应用》 2023年第9期80-84,共5页 Intelligent Computer and Applications

关键词最近邻搜索倒排索引 HNSW索引适应性搜索 nearest neighbor search inverted index HNSW index adaptive search

分类号 TP399 [自动化与计算机技术—计算机应用技术]

引文网络
相关文献

1魏远征.数据库自适应查询优化技术研究[J].计算机应用文摘,2023,39(16):77-79.
2周泽峻,杜逆索,欧阳智.无监督随机优化乘积量化图像检索模型[J].小型微型计算机系统,2023,44(8):1758-1762.

智能计算机与应用

2023年第9期

浏览历史

内容加载中请稍等...

基于k-means特征的适应性近似最近邻搜索算法

相关作者

相关机构

相关主题

浏览历史