Survey of Learned Index (学习索引研究综述)


Abstract: In the era of big data, data volumes are growing explosively, and traditional index structures struggle to handle such large and complex data. Learned indexes emerged to address this problem and have become one of the active research topics in the database field. A learned index uses machine learning models to build the index: a model is trained on the relationship between the data and their physical locations, capturing the distribution characteristics between the two, and is then used to improve and optimize traditional indexes. Extensive experiments show that, compared with traditional indexes, learned indexes can adapt to large-scale data sets, provide better search performance, and require less space. This paper introduces the application background of learned indexes and reviews the existing learned index models. According to data type, learned indexes are divided into two categories, one-dimensional and multi-dimensional, and for each category the advantages, disadvantages, and supported queries of the models are introduced and analyzed in detail. Finally, future research directions for learned indexes are discussed, with the aim of providing a reference for related research.
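As an illustration of the idea summarized in the abstract (a model learns the mapping from keys to physical positions), the following is a minimal Python sketch. It is not a model from the surveyed paper: it assumes numeric keys held in a sorted in-memory array and fits a single least-squares line, and the class name `LinearLearnedIndex` and its methods are hypothetical names chosen for this example.

```python
import bisect


class LinearLearnedIndex:
    """Toy one-dimensional learned index (illustrative assumption, not the paper's model).

    A single least-squares line predicts the position of a key in a sorted
    array; the recorded maximum prediction error bounds the final search.
    """

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        if n == 0:
            raise ValueError("at least one key is required")

        # Fit position ~ slope * key + intercept by ordinary least squares.
        mean_key = sum(self.keys) / n
        mean_pos = (n - 1) / 2
        var_key = sum((k - mean_key) ** 2 for k in self.keys)
        cov = sum((k - mean_key) * (p - mean_pos)
                  for p, k in enumerate(self.keys))
        self.slope = cov / var_key if var_key else 0.0
        self.intercept = mean_pos - self.slope * mean_key

        # Worst-case error over the stored keys; lookups only need to
        # search within this distance of the predicted position.
        self.max_err = max(abs(self._predict(k) - p)
                           for p, k in enumerate(self.keys))

    def _predict(self, key):
        # Model output, rounded and clamped to a valid array position.
        pos = round(self.slope * key + self.intercept)
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of `key` in the sorted array, or None if absent."""
        guess = self._predict(key)
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        # Binary search restricted to the error window around the guess.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < hi and self.keys[i] == key:
            return i
        return None
```

The model here stores only two floats and one error bound, which is the intuition behind the lower space requirement mentioned in the abstract; the closer the key distribution is to the fitted line, the smaller the window that `lookup` has to search.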

Authors: WANG Yitan (王艺潭); WANG Yishu (王一舒); YUAN Ye (袁野) (School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China; School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China)

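Affiliations: School of Computer Science and Engineering, Northeastern University; School of Computer Science and Technology, Beijing Institute of Technology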

Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list; 2023, No. 1, pp. 1-8 (8 pages)

Funding: National Key Research and Development Program of China (2022YFB2702100); National Natural Science Foundation of China (61932004, 62225203, U21A20516).

Keywords: learned index; machine learning; index construction; data structure; database

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]

